Index: vendor/llvm/dist/docs/CMake.rst =================================================================== --- vendor/llvm/dist/docs/CMake.rst (revision 295845) +++ vendor/llvm/dist/docs/CMake.rst (revision 295846) @@ -1,652 +1,670 @@ ======================== Building LLVM with CMake ======================== .. contents:: :local: Introduction ============ `CMake `_ is a cross-platform build-generator tool. CMake does not build the project, it generates the files needed by your build tool (GNU make, Visual Studio, etc.) for building LLVM. If you are really anxious about getting a functional LLVM build, go to the `Quick start`_ section. If you are a CMake novice, start with `Basic CMake usage`_ and then go back to the `Quick start`_ section once you know what you are doing. The `Options and variables`_ section is a reference for customizing your build. If you already have experience with CMake, this is the recommended starting point. .. _Quick start: Quick start =========== We use here the command-line, non-interactive CMake interface. #. `Download `_ and install CMake. Version 2.8.8 is the minimum required, but if you're using the Ninja backend, CMake v3.2 or newer is required to `get interactive output `_ when running :doc:`Lit `. #. Open a shell. Your development tools must be reachable from this shell through the PATH environment variable. #. Create a build directory. Building LLVM in the source directory is not supported. cd to this directory: .. code-block:: console $ mkdir mybuilddir $ cd mybuilddir #. Execute this command in the shell replacing `path/to/llvm/source/root` with the path to the root of your LLVM source tree: .. code-block:: console $ cmake path/to/llvm/source/root CMake will detect your development environment, perform a series of tests, and generate the files required for building LLVM. CMake will use default values for all build parameters. See the `Options and variables`_ section for a list of build parameters that you can modify. This can fail if CMake can't detect your toolset, or if it thinks that the environment is not sane enough. In this case, make sure that the toolset that you intend to use is the only one reachable from the shell, and that the shell itself is the correct one for your development environment. CMake will refuse to build MinGW makefiles if you have a POSIX shell reachable through the PATH environment variable, for instance. You can force CMake to use a given build tool; for instructions, see the `Usage`_ section, below. #. After CMake has finished running, proceed to use IDE project files, or start the build from the build directory: .. code-block:: console $ cmake --build . The ``--build`` option tells ``cmake`` to invoke the underlying build tool (``make``, ``ninja``, ``xcodebuild``, ``msbuild``, etc.) The underlying build tool can be invoked directly, of course, but the ``--build`` option is portable. #. After LLVM has finished building, install it from the build directory: .. code-block:: console $ cmake --build . --target install The ``--target`` option with ``install`` parameter in addition to the ``--build`` option tells ``cmake`` to build the ``install`` target. It is possible to set a different install prefix at installation time by invoking the ``cmake_install.cmake`` script generated in the build directory: .. code-block:: console $ cmake -DCMAKE_INSTALL_PREFIX=/tmp/llvm -P cmake_install.cmake .. _Basic CMake usage: .. 
_Usage: Basic CMake usage ================= This section explains basic aspects of CMake which you may need in your day-to-day usage. CMake comes with extensive documentation, in the form of html files, and as online help accessible via the ``cmake`` executable itself. Execute ``cmake --help`` for further help options. CMake allows you to specify a build tool (e.g., GNU make, Visual Studio, or Xcode). If not specified on the command line, CMake tries to guess which build tool to use, based on your environment. Once it has identified your build tool, CMake uses the corresponding *Generator* to create files for your build tool (e.g., Makefiles or Visual Studio or Xcode project files). You can explicitly specify the generator with the command line option ``-G "Name of the generator"``. To see a list of the available generators on your system, execute .. code-block:: console $ cmake --help This will list the generator names at the end of the help text. Generators' names are case-sensitive, and may contain spaces. For this reason, you should enter them exactly as they are listed in the ``cmake --help`` output, in quotes. For example, to generate project files specifically for Visual Studio 12, you can execute: .. code-block:: console $ cmake -G "Visual Studio 12" path/to/llvm/source/root For a given development platform there can be more than one adequate generator. If you use Visual Studio, "NMake Makefiles" is a generator you can use for building with NMake. By default, CMake chooses the most specific generator supported by your development environment. If you want an alternative generator, you must tell this to CMake with the ``-G`` option. .. todo:: Explain variables and cache. Move explanation here from #options section. .. _Options and variables: Options and variables ===================== Variables customize how the build will be generated. Options are boolean variables, with possible values ON/OFF. Options and variables are defined on the CMake command line like this: .. code-block:: console $ cmake -DVARIABLE=value path/to/llvm/source You can set a variable after the initial CMake invocation to change its value. You can also undefine a variable: .. code-block:: console $ cmake -UVARIABLE path/to/llvm/source Variables are stored in the CMake cache. This is a file named ``CMakeCache.txt`` stored at the root of your build directory that is generated by ``cmake``. Editing it yourself is not recommended. Variables are listed in the CMake cache and later in this document with the variable name and type separated by a colon. You can also specify the variable and type on the CMake command line: .. code-block:: console $ cmake -DVARIABLE:TYPE=value path/to/llvm/source Frequently-used CMake variables ------------------------------- Here are some of the CMake variables that are used often, along with a brief explanation and LLVM-specific notes. For full documentation, consult the CMake manual, or execute ``cmake --help-variable VARIABLE_NAME``. **CMAKE_BUILD_TYPE**:STRING Sets the build type for ``make``-based generators. Possible values are Release, Debug, RelWithDebInfo and MinSizeRel. If you are using an IDE such as Visual Studio, you should use the IDE settings to set the build type. **CMAKE_INSTALL_PREFIX**:PATH Path where LLVM will be installed if "make install" is invoked or the "install" target is built. **LLVM_LIBDIR_SUFFIX**:STRING Extra suffix to append to the directory where libraries are to be installed. 
On a 64-bit architecture, one could use ``-DLLVM_LIBDIR_SUFFIX=64`` to install libraries to ``/usr/lib64``. **CMAKE_C_FLAGS**:STRING Extra flags to use when compiling C source files. **CMAKE_CXX_FLAGS**:STRING Extra flags to use when compiling C++ source files. -**BUILD_SHARED_LIBS**:BOOL - Flag indicating if shared libraries will be built. Its default value is - OFF. This option is only recommended for use by LLVM developers. - On Windows, shared libraries may be used when building with MinGW, including - mingw-w64, but not when building with the Microsoft toolchain. - .. _LLVM-specific variables: LLVM-specific variables ----------------------- **LLVM_TARGETS_TO_BUILD**:STRING Semicolon-separated list of targets to build, or *all* for building all targets. Case-sensitive. Defaults to *all*. Example: ``-DLLVM_TARGETS_TO_BUILD="X86;PowerPC"``. **LLVM_BUILD_TOOLS**:BOOL Build LLVM tools. Defaults to ON. Targets for building each tool are generated in any case. You can build a tool separately by invoking its target. For example, you can build *llvm-as* with a Makefile-based system by executing *make llvm-as* at the root of your build directory. **LLVM_INCLUDE_TOOLS**:BOOL Generate build targets for the LLVM tools. Defaults to ON. You can use this option to disable the generation of build targets for the LLVM tools. **LLVM_BUILD_EXAMPLES**:BOOL Build LLVM examples. Defaults to OFF. Targets for building each example are generated in any case. See documentation for *LLVM_BUILD_TOOLS* above for more details. **LLVM_INCLUDE_EXAMPLES**:BOOL Generate build targets for the LLVM examples. Defaults to ON. You can use this option to disable the generation of build targets for the LLVM examples. **LLVM_BUILD_TESTS**:BOOL Build LLVM unit tests. Defaults to OFF. Targets for building each unit test are generated in any case. You can build a specific unit test using the targets defined under *unittests*, such as ADTTests, IRTests, SupportTests, etc. (Search for ``add_llvm_unittest`` in the subdirectories of *unittests* for a complete list of unit tests.) It is possible to build all unit tests with the target *UnitTests*. **LLVM_INCLUDE_TESTS**:BOOL Generate build targets for the LLVM unit tests. Defaults to ON. You can use this option to disable the generation of build targets for the LLVM unit tests. **LLVM_APPEND_VC_REV**:BOOL Append version control revision info (svn revision number or Git revision id) to LLVM version string (stored in the PACKAGE_VERSION macro). For this to work cmake must be invoked before the build. Defaults to OFF. **LLVM_ENABLE_THREADS**:BOOL Build with threads support, if available. Defaults to ON. **LLVM_ENABLE_CXX1Y**:BOOL Build in C++1y mode, if available. Defaults to OFF. **LLVM_ENABLE_ASSERTIONS**:BOOL Enables code assertions. Defaults to ON if and only if ``CMAKE_BUILD_TYPE`` is *Debug*. **LLVM_ENABLE_EH**:BOOL Build LLVM with exception-handling support. This is necessary if you wish to link against LLVM libraries and make use of C++ exceptions in your own code that need to propagate through LLVM code. Defaults to OFF. **LLVM_ENABLE_PIC**:BOOL Add the ``-fPIC`` flag to the compiler command-line, if the compiler supports this flag. Some systems, like Windows, do not need this flag. Defaults to ON. **LLVM_ENABLE_RTTI**:BOOL Build LLVM with run-time type information. Defaults to OFF. **LLVM_ENABLE_WARNINGS**:BOOL Enable all compiler warnings. Defaults to ON. **LLVM_ENABLE_PEDANTIC**:BOOL Enable pedantic mode. This disables compiler-specific extensions, if possible. 
Defaults to ON. **LLVM_ENABLE_WERROR**:BOOL Stop and fail the build, if a compiler warning is triggered. Defaults to OFF. **LLVM_ABI_BREAKING_CHECKS**:STRING Used to decide if LLVM should be built with ABI breaking checks or not. Allowed values are `WITH_ASSERTS` (default), `FORCE_ON` and `FORCE_OFF`. `WITH_ASSERTS` turns on ABI breaking checks in an assertion enabled build. `FORCE_ON` (`FORCE_OFF`) turns them on (off) irrespective of whether normal (`NDEBUG`-based) assertions are enabled or not. A version of LLVM built with ABI breaking checks is not ABI compatible with a version built without it. **LLVM_BUILD_32_BITS**:BOOL Build 32-bit executables and libraries on 64-bit systems. This option is available only on some 64-bit Unix systems. Defaults to OFF. **LLVM_TARGET_ARCH**:STRING LLVM target to use for native code generation. This is required for JIT generation. It defaults to "host", meaning that it shall pick the architecture of the machine where LLVM is being built. If you are cross-compiling, set it to the target architecture name. **LLVM_TABLEGEN**:STRING Full path to a native TableGen executable (usually named ``llvm-tblgen``). This is intended for cross-compiling: if the user sets this variable, no native TableGen will be created. **LLVM_LIT_ARGS**:STRING Arguments given to lit. ``make check`` and ``make clang-test`` are affected. By default, ``'-sv --no-progress-bar'`` on Visual C++ and Xcode, ``'-sv'`` on others. **LLVM_LIT_TOOLS_DIR**:PATH The path to GnuWin32 tools for tests. Valid on Windows host. Defaults to the empty string, in which case lit will look for tools needed for tests (e.g. ``grep``, ``sort``, etc.) in your %PATH%. If GnuWin32 is not in your %PATH%, then you can set this variable to the GnuWin32 directory so that lit can find tools needed for tests in that directory. **LLVM_ENABLE_FFI**:BOOL Indicates whether the LLVM Interpreter will be linked with the Foreign Function Interface library (libffi) in order to enable calling external functions. If the library or its headers are installed in a custom location, you can also set the variables FFI_INCLUDE_DIR and FFI_LIBRARY_DIR to the directories where ffi.h and libffi.so can be found, respectively. Defaults to OFF. **LLVM_EXTERNAL_{CLANG,LLD,POLLY}_SOURCE_DIR**:PATH These variables specify the path to the source directory for the external LLVM projects Clang, lld, and Polly, respectively, relative to the top-level source directory. If the in-tree subdirectory for an external project exists (e.g., llvm/tools/clang for Clang), then the corresponding variable will not be used. If the variable for an external project does not point to a valid path, then that project will not be built. **LLVM_USE_OPROFILE**:BOOL Enable building OProfile JIT support. Defaults to OFF. **LLVM_PROFDATA_FILE**:PATH Path to a profdata file to pass into clang's -fprofile-instr-use flag. This can only be specified if you're building with clang. **LLVM_USE_INTEL_JITEVENTS**:BOOL Enable building support for Intel JIT Events API. Defaults to OFF. **LLVM_ENABLE_ZLIB**:BOOL Enable building with zlib to support compression/uncompression in LLVM tools. Defaults to ON. **LLVM_USE_SANITIZER**:STRING Define the sanitizer used to build LLVM binaries and tests. Possible values are ``Address``, ``Memory``, ``MemoryWithOrigins``, ``Undefined``, ``Thread``, and ``Address;Undefined``. Defaults to empty string. **LLVM_PARALLEL_COMPILE_JOBS**:STRING Define the maximum number of concurrent compilation jobs. 
**LLVM_PARALLEL_LINK_JOBS**:STRING
  Define the maximum number of concurrent link jobs.

**LLVM_BUILD_DOCS**:BOOL
  Enables all enabled documentation targets (i.e. Doxygen and Sphinx targets)
  to be built as part of the normal build. If the ``install`` target is run
  then this also enables all built documentation targets to be installed.
  Defaults to OFF.

**LLVM_ENABLE_DOXYGEN**:BOOL
  Enables the generation of browsable HTML documentation using doxygen.
  Defaults to OFF.

**LLVM_ENABLE_DOXYGEN_QT_HELP**:BOOL
  Enables the generation of a Qt Compressed Help file. Defaults to OFF. This
  affects the make target ``doxygen-llvm``. When enabled, apart from the
  normal HTML output generated by doxygen, this will produce a QCH file named
  ``org.llvm.qch``. You can then load this file into Qt Creator. This option
  is only useful in combination with ``-DLLVM_ENABLE_DOXYGEN=ON``; otherwise
  this has no effect.

**LLVM_DOXYGEN_QCH_FILENAME**:STRING
  The filename of the Qt Compressed Help file that will be generated when
  ``-DLLVM_ENABLE_DOXYGEN=ON`` and ``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON`` are
  given. Defaults to ``org.llvm.qch``. This option is only useful in
  combination with ``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON``; otherwise it has no
  effect.

**LLVM_DOXYGEN_QHP_NAMESPACE**:STRING
  Namespace under which the intermediate Qt Help Project file lives. See
  `Qt Help Project`_ for more information. Defaults to "org.llvm". This
  option is only useful in combination with
  ``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON``; otherwise it has no effect.

**LLVM_DOXYGEN_QHP_CUST_FILTER_NAME**:STRING
  See `Qt Help Project`_ for more information. Defaults to the CMake variable
  ``${PACKAGE_STRING}`` which is a combination of the package name and
  version string. This filter can then be used in Qt Creator to select only
  documentation from LLVM when browsing through all the help files that you
  might have loaded. This option is only useful in combination with
  ``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON``; otherwise it has no effect.

.. _Qt Help Project: http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom-filters

**LLVM_DOXYGEN_QHELPGENERATOR_PATH**:STRING
  The path to the ``qhelpgenerator`` executable. Defaults to whatever CMake's
  ``find_program()`` can find. This option is only useful in combination with
  ``-DLLVM_ENABLE_DOXYGEN_QT_HELP=ON``; otherwise it has no effect.

**LLVM_DOXYGEN_SVG**:BOOL
  Uses .svg files instead of .png files for graphs in the Doxygen output.
  Defaults to OFF.

**LLVM_ENABLE_SPHINX**:BOOL
  If enabled, CMake will search for the ``sphinx-build`` executable and will
  make the ``SPHINX_OUTPUT_HTML`` and ``SPHINX_OUTPUT_MAN`` CMake options
  available. Defaults to OFF.

**SPHINX_EXECUTABLE**:STRING
  The path to the ``sphinx-build`` executable detected by CMake.

**SPHINX_OUTPUT_HTML**:BOOL
  If enabled (and ``LLVM_ENABLE_SPHINX`` is enabled) then the targets for
  building the documentation as html are added (but not built by default
  unless ``LLVM_BUILD_DOCS`` is enabled). There is a target for each project
  in the source tree that uses sphinx (e.g. ``docs-llvm-html``,
  ``docs-clang-html`` and ``docs-lld-html``). Defaults to ON.

**SPHINX_OUTPUT_MAN**:BOOL
  If enabled (and ``LLVM_ENABLE_SPHINX`` is enabled) the targets for building
  the man pages are added (but not built by default unless
  ``LLVM_BUILD_DOCS`` is enabled). Currently the only target added is
  ``docs-llvm-man``. Defaults to ON.

**SPHINX_WARNINGS_AS_ERRORS**:BOOL
  If enabled, Sphinx documentation warnings will be treated as errors.
  Defaults to ON.
**LLVM_CREATE_XCODE_TOOLCHAIN**:BOOL
  OS X Only: If enabled CMake will generate a target named
  'install-xcode-toolchain'. This target will create a directory at
  $CMAKE_INSTALL_PREFIX/Toolchains containing an xctoolchain directory which
  can be used to override the default system tools.
+
+**LLVM_BUILD_LLVM_DYLIB**:BOOL
+  If enabled, the target for building the libLLVM shared library is added.
+  This library contains all of LLVM's components in a single shared library.
+  Defaults to OFF. This cannot be used in conjunction with BUILD_SHARED_LIBS.
+  Tools will only be linked to the libLLVM shared library if
+  LLVM_LINK_LLVM_DYLIB is also ON. The components in the library can be
+  customised by setting LLVM_DYLIB_COMPONENTS to a list of the desired
+  components.
+
+**LLVM_LINK_LLVM_DYLIB**:BOOL
+  If enabled, tools will be linked with the libLLVM shared library. Defaults
+  to OFF. Setting LLVM_LINK_LLVM_DYLIB to ON also sets LLVM_BUILD_LLVM_DYLIB
+  to ON.
+
+**BUILD_SHARED_LIBS**:BOOL
+  Flag indicating if each LLVM component (e.g. Support) is built as a shared
+  library (ON) or as a static library (OFF). Its default value is OFF. On
+  Windows, shared libraries may be used when building with MinGW, including
+  mingw-w64, but not when building with the Microsoft toolchain.
+
+  .. note:: BUILD_SHARED_LIBS is only recommended for use by LLVM developers.
+            If you want to build LLVM as a shared library, you should use the
+            ``LLVM_BUILD_LLVM_DYLIB`` option.
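As a sketch of how several of the variables documented above combine on one
command line (the generator choice, install prefix, and source path here are
placeholders, not recommendations):

.. code-block:: console

  $ cmake -G Ninja \
      -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_INSTALL_PREFIX=/opt/llvm \
      -DLLVM_TARGETS_TO_BUILD="X86;PowerPC" \
      -DLLVM_ENABLE_ASSERTIONS=ON \
      -DLLVM_BUILD_LLVM_DYLIB=ON \
      path/to/llvm/source/root

Each ``-D`` setting is recorded in ``CMakeCache.txt``, so a later ``cmake``
invocation in the same build directory only needs the variables you want to
change.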
Executing the test suite
========================

Testing is performed when the *check-all* target is built. For instance, if
you are using Makefiles, execute this command in the root of your build
directory:

.. code-block:: console

  $ make check-all

On Visual Studio, you may run tests by building the project "check-all". For
more information about testing, see the :doc:`TestingGuide`.

Cross compiling
===============

See `this wiki page `_ for generic instructions on how to cross-compile with
CMake. It goes into detailed explanations and may seem daunting, but it is
not. On the wiki page there are several examples including toolchain files.
Go directly to `this section `_ for a quick solution.

Also see the `LLVM-specific variables`_ section for variables used when
cross-compiling.

Embedding LLVM in your project
==============================

From LLVM 3.5 onwards both the CMake and autoconf/Makefile build systems
export LLVM libraries as importable CMake targets. This means that clients of
LLVM can now reliably use CMake to develop their own LLVM-based projects
against an installed version of LLVM regardless of how it was built.

Here is a simple example of a CMakeLists.txt file that imports the LLVM
libraries and uses them to build a simple application ``simple-tool``.

.. code-block:: cmake

  cmake_minimum_required(VERSION 2.8.8)
  project(SimpleProject)

  find_package(LLVM REQUIRED CONFIG)

  message(STATUS "Found LLVM ${LLVM_PACKAGE_VERSION}")
  message(STATUS "Using LLVMConfig.cmake in: ${LLVM_DIR}")

  # Set your project compile flags.
  # E.g. if using the C++ header files
  # you will need to enable C++11 support
  # for your compiler.

  include_directories(${LLVM_INCLUDE_DIRS})
  add_definitions(${LLVM_DEFINITIONS})

  # Now build our tools
  add_executable(simple-tool tool.cpp)

  # Find the libraries that correspond to the LLVM components
  # that we wish to use
  llvm_map_components_to_libnames(llvm_libs support core irreader)

  # Link against LLVM libraries
  target_link_libraries(simple-tool ${llvm_libs})

The ``find_package(...)`` directive when used in CONFIG mode (as in the above
example) will look for the ``LLVMConfig.cmake`` file in various locations (see
cmake manual for details). It creates a ``LLVM_DIR`` cache entry to save the
directory where ``LLVMConfig.cmake`` is found or allows the user to specify
the directory (e.g. by passing ``-DLLVM_DIR=/usr/share/llvm/cmake`` to the
``cmake`` command or by setting it directly in ``ccmake`` or ``cmake-gui``).

This file is available in two different locations.

* ``<INSTALL_PREFIX>/share/llvm/cmake/LLVMConfig.cmake`` where
  ``<INSTALL_PREFIX>`` is the install prefix of an installed version of LLVM.
  On Linux typically this is ``/usr/share/llvm/cmake/LLVMConfig.cmake``.

* ``<LLVM_BUILD_ROOT>/share/llvm/cmake/LLVMConfig.cmake`` where
  ``<LLVM_BUILD_ROOT>`` is the root of the LLVM build tree. **Note: this is
  only available when building LLVM with CMake.**

If LLVM is installed in your operating system's normal installation prefix
(e.g. on Linux this is usually ``/usr/``) ``find_package(LLVM ...)`` will
automatically find LLVM if it is installed correctly. If LLVM is not installed
or you wish to build directly against the LLVM build tree you can use
``LLVM_DIR`` as previously mentioned.

The ``LLVMConfig.cmake`` file sets various useful variables. Notable variables
include

``LLVM_CMAKE_DIR``
  The path to the LLVM CMake directory (i.e. the directory containing
  LLVMConfig.cmake).

``LLVM_DEFINITIONS``
  A list of preprocessor defines that should be used when building against
  LLVM.

``LLVM_ENABLE_ASSERTIONS``
  This is set to ON if LLVM was built with assertions, otherwise OFF.

``LLVM_ENABLE_EH``
  This is set to ON if LLVM was built with exception handling (EH) enabled,
  otherwise OFF.

``LLVM_ENABLE_RTTI``
  This is set to ON if LLVM was built with run time type information (RTTI),
  otherwise OFF.

``LLVM_INCLUDE_DIRS``
  A list of include paths to directories containing LLVM header files.

``LLVM_PACKAGE_VERSION``
  The LLVM version. This string can be used with CMake conditionals, e.g.,
  ``if (${LLVM_PACKAGE_VERSION} VERSION_LESS "3.5")``.

``LLVM_TOOLS_BINARY_DIR``
  The path to the directory containing the LLVM tools (e.g. ``llvm-as``).

Notice that in the above example we link ``simple-tool`` against several LLVM
libraries. The list of libraries is determined by using the
``llvm_map_components_to_libnames()`` CMake function. For a list of available
components look at the output of running ``llvm-config --components``.

Note that for LLVM < 3.5 ``llvm_map_components_to_libraries()`` was used
instead of ``llvm_map_components_to_libnames()``. This is now deprecated and
will be removed in a future version of LLVM.
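As a usage sketch for the example above (the project path and build directory
name are hypothetical), configuring and building ``simple-tool`` against an
installed LLVM could look like this:

.. code-block:: console

  $ mkdir build && cd build
  $ cmake -DLLVM_DIR=/usr/share/llvm/cmake path/to/SimpleProject
  $ cmake --build .

The ``-DLLVM_DIR`` setting is only needed when ``find_package(LLVM ...)``
cannot locate LLVM on its own; point it at whichever directory actually
contains ``LLVMConfig.cmake`` on your system.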
.. _cmake-out-of-source-pass:

Developing LLVM passes out of source
------------------------------------

It is possible to develop LLVM passes out of LLVM's source tree (i.e. against
an installed or built LLVM). An example of a project layout is provided below.

.. code-block:: none

  <project dir>/
      |
      CMakeLists.txt
      <pass name>/
          |
          CMakeLists.txt
          Pass.cpp
          ...

Contents of ``<project dir>/CMakeLists.txt``:

.. code-block:: cmake

  find_package(LLVM REQUIRED CONFIG)

  add_definitions(${LLVM_DEFINITIONS})
  include_directories(${LLVM_INCLUDE_DIRS})

  add_subdirectory(<pass name>)

Contents of ``<project dir>/<pass name>/CMakeLists.txt``:

.. code-block:: cmake

  add_library(LLVMPassname MODULE Pass.cpp)

Note if you intend for this pass to be merged into the LLVM source tree at
some point in the future it might make more sense to use LLVM's internal
``add_llvm_loadable_module`` function instead by...

Adding the following to ``<project dir>/CMakeLists.txt`` (after
``find_package(LLVM ...)``)

.. code-block:: cmake

  list(APPEND CMAKE_MODULE_PATH "${LLVM_CMAKE_DIR}")
  include(AddLLVM)

And then changing ``<project dir>/<pass name>/CMakeLists.txt`` to

.. code-block:: cmake

  add_llvm_loadable_module(LLVMPassname
    Pass.cpp
    )

When you are done developing your pass, you may wish to integrate it into the
LLVM source tree. You can achieve it in two easy steps:

#. Copying the ``<pass name>`` folder into the ``<LLVM root>/lib/Transform``
   directory.

#. Adding an ``add_subdirectory(<pass name>)`` line into
   ``<LLVM root>/lib/Transform/CMakeLists.txt``.
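Once the module is built, it can be loaded into ``opt`` for a quick smoke
test. This is a minimal sketch, assuming the module was built as
``LLVMPassname.so`` and registers a pass under the hypothetical command-line
name ``passname``:

.. code-block:: console

  $ opt -load <build dir>/<pass name>/LLVMPassname.so -passname < input.ll > /dev/null

Redirecting standard output to ``/dev/null`` simply discards the transformed
bitcode; any diagnostics the pass prints still go to standard error.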
Compiler/Platform-specific topics
=================================

Notes for specific compilers and/or platforms.

Microsoft Visual C++
--------------------

**LLVM_COMPILER_JOBS**:STRING
  Specifies the maximum number of parallel compiler jobs to use per project
  when building with msbuild or Visual Studio. Only supported for the Visual
  Studio 2010 CMake generator. 0 means use all processors. Default is 0.

Index: vendor/llvm/dist/docs/ReleaseNotes.rst
===================================================================
--- vendor/llvm/dist/docs/ReleaseNotes.rst	(revision 295845)
+++ vendor/llvm/dist/docs/ReleaseNotes.rst	(revision 295846)
@@ -1,274 +1,305 @@
======================
LLVM 3.8 Release Notes
======================

.. contents::
    :local:

Introduction
============

This document contains the release notes for the LLVM Compiler
Infrastructure, release 3.8. Here we describe the status of LLVM, including
major improvements from the previous release, improvements in various
subprojects of LLVM, and some of the current users of the code. All LLVM
releases may be downloaded from the `LLVM releases web site `_.

For more information about LLVM, including information about the latest
release, please check out the `main LLVM web site `_. If you have questions or
comments, the `LLVM Developer's Mailing List `_ is a good place to send them.

Non-comprehensive list of changes in this release
=================================================

* With this release, the minimum Windows version required for running LLVM is
  Windows 7. Earlier versions, including Windows Vista and XP, are no longer
  supported.

* With this release, the autoconf build system is deprecated. It will be
  removed in the 3.9 release. Please migrate to using CMake. For more
  information see: `Building LLVM with CMake `_

* The C API function LLVMLinkModules is deprecated. It will be removed in the
  3.9 release. Please migrate to LLVMLinkModules2. Unlike the old function,
  the new one:

  * Doesn't take an unused parameter.
  * Destroys the source instead of only damaging it.
  * Does not record a message. Use the diagnostic handler instead.

* The C API functions LLVMParseBitcode, LLVMParseBitcodeInContext,
  LLVMGetBitcodeModuleInContext and LLVMGetBitcodeModule have been deprecated.
  They will be removed in 3.9. Please migrate to the versions with a 2 suffix.
  Unlike the old ones, the new ones do not record a diagnostic message. Use
  the diagnostic handler instead.

* The deprecated C APIs LLVMGetBitcodeModuleProviderInContext and
  LLVMGetBitcodeModuleProvider have been removed.

* The deprecated C APIs LLVMCreateExecutionEngine, LLVMCreateInterpreter,
  LLVMCreateJITCompiler, LLVMAddModuleProvider and LLVMRemoveModuleProvider
  have been removed.

* With this release, the C API headers have been reorganized to improve build
  time. Type specific declarations have been moved to Type.h, and error
  handling routines have been moved to ErrorHandling.h. Both are included in
  Core.h so nothing should change for projects directly including the
  headers, but transitive dependencies may be affected.

* llvm-ar now supports thin archives.

* llvm doesn't produce .data.rel.ro.local or .data.rel sections anymore.

* aliases to available_externally globals are now rejected by the verifier.

* the IR Linker has been split into IRMover that moves bits from one module to
  another and Linker proper that decides what to link.

* Support for dematerializing has been dropped.

* RegisterScheduler::setDefault was removed. Targets that used to call into
  the command line parser to set the DAGScheduler, and that don't have enough
  control with setSchedulingPreference, should look into overriding the
  SubTargetHook "getDAGScheduler()".

* ``ilist_iterator`` no longer has implicit conversions to and from ``T*``,
  since ``ilist_iterator`` may be pointing at the sentinel (which is usually
  not of type ``T`` at all). To convert from an iterator ``I`` to a pointer,
  use ``&*I``; to convert from a pointer ``P`` to an iterator, use
  ``P->getIterator()``. Alternatively, explicit conversions via
  ``static_cast<T *>(U)`` are still available.

* ``ilist_node::getNextNode()`` and ``ilist_node::getPrevNode()`` now fail at
  compile time when the node cannot access its parent list. Previously, when
  the sentinel was an ``ilist_half_node``, this API could return the sentinel
  instead of ``nullptr``. Frustrated callers should be updated to use
  ``iplist::getNextNode(T*)`` instead. Alternatively, if the node ``N`` is
  guaranteed not to be the last in the list, it is safe to call
  ``&*++N->getIterator()`` directly.

+* The `Kaleidoscope tutorials `_ have been updated to use
+  the ORC JIT APIs.
+
+* ORC now has a basic set of C bindings.
+
+* Optional support for linking clang and the LLVM tools with a single libLLVM
+  shared library. To enable this, pass ``-DLLVM_LINK_LLVM_DYLIB=ON`` to CMake.
+  See `Building LLVM with CMake`_ for more details.
+
+* The optimization to move the prologue and epilogue of functions into colder
+  code paths (shrink-wrapping) is now enabled by default.
+
+* A new target-independent gcc-compatible emulated Thread Local Storage mode
+  is added. When the ``-femulated-tls`` flag is used, all accesses to TLS
+  variables are converted to calls to ``__emutls_get_address`` in the runtime
+  library.
+
+* MSVC-compatible exception handling has been completely overhauled. New
+  instructions have been introduced to facilitate this:
+  `New exception handling instructions `_.
+  While we have done our best to test this feature thoroughly, it would
+  not be completely surprising if there were a few lingering issues that
+  early adopters might bump into.
+

.. NOTE
   For small 1-3 sentence descriptions, just add an entry at the end of
   this list. If your description won't fit comfortably in one bullet
   point (e.g. maybe you would like to give an example of the
   functionality, or simply have a lot to talk about), see the `NOTE` below
   for adding a new subsection.

* ... next change ...

.. NOTE
   If you would like to document a larger change, then you can add a
   subsection about it right here. You can copy the following boilerplate
   and un-indent it (the indentation causes it to be inside this comment).

   Special New Feature
   -------------------

   Makes programs 10x faster by doing Special New Thing.
Changes to the ARM Backends
---------------------------

During this release the AArch64 target has:

* Added support for more sanitizers (MSAN, TSAN) and made them compatible with
-  all VMA kernel configurations (kurrently tested on 39 and 42 bits).
+  all VMA kernel configurations (currently tested on 39 and 42 bits).

* Gained initial LLD support in the new ELF back-end

* Extended the Load/Store optimiser and cleaned up some of the bad decisions
  made earlier.

* Expanded LLDB support, including watchpoints, native building, Renderscript,
  LLDB-server, debugging 32-bit applications.

* Added support for the ``Exynos M1`` chip.

During this release the ARM target has:

* Gained massive performance improvements on embedded benchmarks due to
  finally running the stride vectorizer in full form, incrementing the
  performance gains that we already had in the previous releases with limited
  stride vectorization.

* Expanded LLDB support, including watchpoints, unwind tables

* Extended the Load/Store optimiser and cleaned up some of the bad decisions
  made earlier.

* Simplified code generation for global variable addresses in ELF, resulting
  in a significant (4% in Chromium) reduction in code size.

* Gained some additional code size improvements, though there's still a long
  road ahead, especially for older cores.

* Added some EABI floating point comparison functions to Compiler-RT

* Added support for Windows+GNU triple, +features in -mcpu/-march options.

Changes to the MIPS Target
--------------------------

During this release the MIPS target has:

* Significantly extended support for the Integrated Assembler. See below for
  more information

* Added support for the ``P5600`` processor.

* Added support for the ``interrupt`` attribute for MIPS32R2 and later. This
  attribute will generate a function which can be used as an interrupt handler
  on bare metal MIPS targets using the static relocation model.

* Added support for the ``ERETNC`` instruction found in MIPS32R5 and later.

* Added support for OpenCL. See http://portablecl.org/.

* Address spaces 1 to 255 are now reserved for software use and conversions
  between them are no-op casts.

* Removed the ``mips16`` value for the -mcpu option since it is an
  :abbr:`ASE (Application Specific Extension)` and not a processor. If you
  were using this, please specify another CPU and use ``-mips16`` to enable
  MIPS16.

* Removed ``copy_u.w`` from 32-bit MSA and ``copy_u.d`` from 64-bit MSA since
  they have been removed from the MSA specification due to forward
  compatibility issues. For example, 32-bit MSA code containing ``copy_u.w``
  would behave differently on a 64-bit processor supporting MSA. The
  corresponding intrinsics are still available and may expand to
  ``copy_s.[wd]`` where this is appropriate for forward compatibility
  purposes.

* Relaxed the ``-mnan`` option to allow ``-mnan=2008`` on MIPS32R2/MIPS64R2
  for compatibility with GCC.

* Made MIPS64R6 the default CPU for 64-bit Android triples.

The MIPS target has also fixed various bugs including the following notable
fixes:

* Fixed reversed operands on ``mthi``/``mtlo`` in the DSP
  :abbr:`ASE (Application Specific Extension)`.

* The code generator no longer uses ``jal`` for calls to absolute immediate
  addresses.

* Disabled fast instruction selection on MIPS32R6 and MIPS64R6 since this is
  not yet supported.

* Corrected addend for ``R_MIPS_HI16`` and ``R_MIPS_PCHI16`` in MCJIT

* The code generator no longer crashes when handling subregisters of a 64-bit
  FPU register with undefined value.
* The code generator no longer attempts to use ``$zero`` for operands that do
  not permit ``$zero``.

* Corrected the opcode used for ``ll``/``sc`` when using MIPS32R6/MIPS64R6 and
  the Integrated Assembler.

* Added support for atomic load and atomic store.

* Corrected debug info when dynamically re-aligning the stack.

Integrated Assembler
^^^^^^^^^^^^^^^^^^^^

We have made a large number of improvements to the integrated assembler for
MIPS. In this release, the integrated assembler isn't quite production-ready
since there are a few known issues related to bare-metal support, checking
immediates on instructions, and the N32/N64 ABIs. However, the current support
should be sufficient for many users of the O32 ABI, particularly those
targeting MIPS32 on Linux or bare-metal MIPS32.

If you would like to try the integrated assembler, please use
``-fintegrated-as``.

Changes to the PowerPC Target
-----------------------------

During this release ...

Changes to the X86 Target
-------------------------

During this release ...

* TLS is enabled for Cygwin as emutls.

* Smaller code for materializing 32-bit 1 and -1 constants at ``-Os``.

* More efficient code for wide integer compares. (E.g. 64-bit compares on
  32-bit targets.)

-* Tail call support for ``thiscall``, ``stdcall`, ``vectorcall``, and
+* Tail call support for ``thiscall``, ``stdcall``, ``vectorcall``, and
   ``fastcall`` functions.
+
+Changes to the Hexagon Target
+-----------------------------
+
+In addition to general code size and performance improvements, the Hexagon
+target now has basic support for the Hexagon V60 architecture and Hexagon
+Vector Extensions (HVX).

Changes to the AVR Target
-------------------------

Slightly less than half of the AVR backend has been merged in at this point.
It is still missing a number of large parts which cause it to be unusable, but
it is well on the road to being completely merged and workable.

Changes to the OCaml bindings
-----------------------------

During this release ...

* The ocaml function link_modules has been replaced with link_modules' which
  uses LLVMLinkModules2.

External Open Source Projects Using LLVM 3.8
============================================

An exciting aspect of LLVM is that it is used as an enabling technology for a
lot of other language and tools projects. This section lists some of the
projects that have already been updated to work with LLVM 3.8.

LDC - the LLVM-based D compiler
-------------------------------

`D `_ is a language with C-like syntax and static typing. It pragmatically
combines efficiency, control, and modeling power, with safety and programmer
productivity. D supports powerful concepts like Compile-Time Function
Execution (CTFE) and Template Meta-Programming, provides an innovative
approach to concurrency and offers many classical paradigms.

`LDC `_ uses the frontend from the reference compiler combined with LLVM as
backend to produce efficient native code. LDC targets x86/x86_64 systems like
Linux, OS X and Windows and also PowerPC (32/64 bit) and ARM. Ports to other
architectures like AArch64 and MIPS64 are underway.

Additional Information
======================

A wide variety of additional information is available on the `LLVM web page
`_, in particular in the `documentation `_ section. The web page also contains
versions of the API documentation which is up-to-date with the Subversion
version of the source code. You can access versions of these documents
specific to this release by going into the ``llvm/docs/`` directory in the
LLVM tree.
If you have any questions or comments about LLVM, please feel free to contact
us via the `mailing lists `_.

Index: vendor/llvm/dist/include/llvm/CodeGen/LiveInterval.h
===================================================================
--- vendor/llvm/dist/include/llvm/CodeGen/LiveInterval.h	(revision 295845)
+++ vendor/llvm/dist/include/llvm/CodeGen/LiveInterval.h	(revision 295846)
@@ -1,868 +1,873 @@
//===-- llvm/CodeGen/LiveInterval.h - Interval representation ---*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file implements the LiveRange and LiveInterval classes. Given some
// numbering of each of the machine instructions, an interval [i, j) is said
// to be a live range for register v if there is no instruction with number
// j' >= j such that v is live at j' and there is no instruction with number
// i' < i such that v is live at i'. In this implementation ranges can have
// holes, i.e. a range might look like [1,20), [50,65), [1000,1001). Each
// individual segment is represented as an instance of LiveRange::Segment,
// and the whole range is represented as an instance of LiveRange.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_CODEGEN_LIVEINTERVAL_H
#define LLVM_CODEGEN_LIVEINTERVAL_H

#include "llvm/ADT/IntEqClasses.h"
#include "llvm/CodeGen/SlotIndexes.h"
#include "llvm/Support/AlignOf.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Target/TargetRegisterInfo.h"
#include <cassert>
#include <climits>
#include <set>

namespace llvm {
  class CoalescerPair;
  class LiveIntervals;
  class MachineInstr;
  class MachineRegisterInfo;
  class TargetRegisterInfo;
  class raw_ostream;
  template <typename T, unsigned> class SmallPtrSet;

  /// VNInfo - Value Number Information.
  /// This class holds information about a machine level value, including
  /// definition and use points.
  ///
  class VNInfo {
  public:
    typedef BumpPtrAllocator Allocator;

    /// The ID number of this value.
    unsigned id;

    /// The index of the defining instruction.
    SlotIndex def;

    /// VNInfo constructor.
    VNInfo(unsigned i, SlotIndex d) : id(i), def(d) {}

    /// VNInfo constructor, copies values from orig, except for the value
    /// number.
    VNInfo(unsigned i, const VNInfo &orig) : id(i), def(orig.def) {}

    /// Copy from the parameter into this VNInfo.
    void copyFrom(VNInfo &src) {
      def = src.def;
    }

    /// Returns true if this value is defined by a PHI instruction (or was,
    /// PHI instructions may have been eliminated).
    /// PHI-defs begin at a block boundary, all other defs begin at register
    /// or EC slots.
    bool isPHIDef() const { return def.isBlock(); }

    /// Returns true if this value is unused.
    bool isUnused() const { return !def.isValid(); }

    /// Mark this value as unused.
    void markUnused() { def = SlotIndex(); }
  };

  /// Result of a LiveRange query. This class hides the implementation details
  /// of live ranges, and it should be used as the primary interface for
  /// examining live ranges around instructions.
  class LiveQueryResult {
    VNInfo *const EarlyVal;
    VNInfo *const LateVal;
    const SlotIndex EndPoint;
    const bool Kill;

  public:
    LiveQueryResult(VNInfo *EarlyVal, VNInfo *LateVal, SlotIndex EndPoint,
                    bool Kill)
      : EarlyVal(EarlyVal), LateVal(LateVal), EndPoint(EndPoint), Kill(Kill)
    {}

    /// Return the value that is live-in to the instruction. This is the value
    /// that will be read by the instruction's use operands. Return NULL if no
    /// value is live-in.
    VNInfo *valueIn() const { return EarlyVal; }

    /// Return true if the live-in value is killed by this instruction. This
    /// means that either the live range ends at the instruction, or it
    /// changes value.
    bool isKill() const { return Kill; }

    /// Return true if this instruction has a dead def.
    bool isDeadDef() const { return EndPoint.isDead(); }

    /// Return the value leaving the instruction, if any. This can be a
    /// live-through value, or a live def. A dead def returns NULL.
    VNInfo *valueOut() const { return isDeadDef() ? nullptr : LateVal; }

    /// Returns the value alive at the end of the instruction, if any. This
    /// can be a live-through value, a live def or a dead def.
    VNInfo *valueOutOrDead() const { return LateVal; }

    /// Return the value defined by this instruction, if any. This includes
    /// dead defs, it is the value created by the instruction's def operands.
    VNInfo *valueDefined() const {
      return EarlyVal == LateVal ? nullptr : LateVal;
    }

    /// Return the end point of the last live range segment to interact with
    /// the instruction, if any.
    ///
    /// The end point is an invalid SlotIndex only if the live range doesn't
    /// intersect the instruction at all.
    ///
    /// The end point may be at or past the end of the instruction's basic
    /// block. That means the value was live out of the block.
    SlotIndex endPoint() const { return EndPoint; }
  };

  /// This class represents the liveness of a register, stack slot, etc.
  /// It manages an ordered list of Segment objects.
  /// The Segments are organized in a static single assignment form: At places
  /// where a new value is defined or different values reach a CFG join a new
  /// segment with a new value number is used.
  class LiveRange {
  public:
    /// This represents a simple continuous liveness interval for a value.
    /// The start point is inclusive, the end point exclusive. These intervals
    /// are rendered as [start,end).
    struct Segment {
      SlotIndex start;  // Start point of the interval (inclusive)
      SlotIndex end;    // End point of the interval (exclusive)
      VNInfo *valno;    // identifier for the value contained in this segment.

      Segment() : valno(nullptr) {}

      Segment(SlotIndex S, SlotIndex E, VNInfo *V)
        : start(S), end(E), valno(V) {
        assert(S < E && "Cannot create empty or backwards segment");
      }

      /// Return true if the index is covered by this segment.
      bool contains(SlotIndex I) const {
        return start <= I && I < end;
      }

      /// Return true if the given interval, [S, E), is covered by this
      /// segment.
      bool containsInterval(SlotIndex S, SlotIndex E) const {
        assert((S < E) && "Backwards interval?");
        return (start <= S && S < end) && (start < E && E <= end);
      }

      bool operator<(const Segment &Other) const {
        return std::tie(start, end) < std::tie(Other.start, Other.end);
      }
      bool operator==(const Segment &Other) const {
        return start == Other.start && end == Other.end;
      }

      void dump() const;
    };

    typedef SmallVector<Segment, 2> Segments;
    typedef SmallVector<VNInfo *, 2> VNInfoList;

    Segments segments;  // the liveness segments
    VNInfoList valnos;  // value#'s

    // The segment set is used temporarily to accelerate initial computation
    // of live ranges of physical registers in computeRegUnitRange.
    // After that the set is flushed to the segment vector and deleted.
    typedef std::set<Segment> SegmentSet;
    std::unique_ptr<SegmentSet> segmentSet;

    typedef Segments::iterator iterator;
    iterator begin() { return segments.begin(); }
    iterator end()   { return segments.end(); }

    typedef Segments::const_iterator const_iterator;
    const_iterator begin() const { return segments.begin(); }
    const_iterator end() const   { return segments.end(); }

    typedef VNInfoList::iterator vni_iterator;
    vni_iterator vni_begin() { return valnos.begin(); }
    vni_iterator vni_end()   { return valnos.end(); }

    typedef VNInfoList::const_iterator const_vni_iterator;
    const_vni_iterator vni_begin() const { return valnos.begin(); }
    const_vni_iterator vni_end() const   { return valnos.end(); }

    /// Constructs a new LiveRange object.
    LiveRange(bool UseSegmentSet = false)
        : segmentSet(UseSegmentSet ? llvm::make_unique<SegmentSet>()
                                   : nullptr) {}

    /// Constructs a new LiveRange object by copying segments and valnos from
    /// another LiveRange.
    LiveRange(const LiveRange &Other, BumpPtrAllocator &Allocator) {
      assert(Other.segmentSet == nullptr &&
             "Copying of LiveRanges with active SegmentSets is not supported");

      // Duplicate valnos.
      for (const VNInfo *VNI : Other.valnos) {
        createValueCopy(VNI, Allocator);
      }
      // Now we can copy segments and remap their valnos.
      for (const Segment &S : Other.segments) {
        segments.push_back(Segment(S.start, S.end, valnos[S.valno->id]));
      }
    }

    /// advanceTo - Advance the specified iterator to point to the Segment
    /// containing the specified position, or end() if the position is past
    /// the end of the range. If no Segment contains this position, but the
    /// position is in a hole, this method returns an iterator pointing to the
    /// Segment immediately after the hole.
    iterator advanceTo(iterator I, SlotIndex Pos) {
      assert(I != end());
      if (Pos >= endIndex())
        return end();
      while (I->end <= Pos) ++I;
      return I;
    }

    const_iterator advanceTo(const_iterator I, SlotIndex Pos) const {
      assert(I != end());
      if (Pos >= endIndex())
        return end();
      while (I->end <= Pos) ++I;
      return I;
    }

    /// find - Return an iterator pointing to the first segment that ends
    /// after Pos, or end(). This is the same as advanceTo(begin(), Pos), but
    /// faster when searching large ranges.
    ///
    /// If Pos is contained in a Segment, that segment is returned.
    /// If Pos is in a hole, the following Segment is returned.
    /// If Pos is beyond endIndex, end() is returned.
    iterator find(SlotIndex Pos);

    const_iterator find(SlotIndex Pos) const {
      return const_cast<LiveRange *>(this)->find(Pos);
    }

    void clear() {
      valnos.clear();
      segments.clear();
    }

    size_t size() const { return segments.size(); }

    bool hasAtLeastOneValue() const { return !valnos.empty(); }

    bool containsOneValue() const { return valnos.size() == 1; }

    unsigned getNumValNums() const { return (unsigned)valnos.size(); }

    /// getValNumInfo - Returns pointer to the specified val#.
    ///
    inline VNInfo *getValNumInfo(unsigned ValNo) {
      return valnos[ValNo];
    }
    inline const VNInfo *getValNumInfo(unsigned ValNo) const {
      return valnos[ValNo];
    }

    /// containsValue - Returns true if VNI belongs to this range.
    bool containsValue(const VNInfo *VNI) const {
      return VNI && VNI->id < getNumValNums() && VNI == getValNumInfo(VNI->id);
    }

    /// getNextValue - Create a new value number and return it. MIIdx
    /// specifies the instruction that defines the value number.
    VNInfo *getNextValue(SlotIndex def, VNInfo::Allocator &VNInfoAllocator) {
      VNInfo *VNI =
        new (VNInfoAllocator) VNInfo((unsigned)valnos.size(), def);
      valnos.push_back(VNI);
      return VNI;
    }

    /// createDeadDef - Make sure the range has a value defined at Def.
    /// If one already exists, return it. Otherwise allocate a new value and
    /// add liveness for a dead def.
    VNInfo *createDeadDef(SlotIndex Def, VNInfo::Allocator &VNInfoAllocator);

    /// Create a copy of the given value. The new value will be identical
    /// except for the Value number.
    VNInfo *createValueCopy(const VNInfo *orig,
                            VNInfo::Allocator &VNInfoAllocator) {
      VNInfo *VNI =
        new (VNInfoAllocator) VNInfo((unsigned)valnos.size(), *orig);
      valnos.push_back(VNI);
      return VNI;
    }

    /// RenumberValues - Renumber all values in order of appearance and remove
    /// unused values.
    void RenumberValues();

    /// MergeValueNumberInto - This method is called when two value numbers
    /// are found to be equivalent. This eliminates V1, replacing all
    /// segments with the V1 value number with the V2 value number. This can
    /// cause merging of V1/V2 values numbers and compaction of the value
    /// space.
    VNInfo* MergeValueNumberInto(VNInfo *V1, VNInfo *V2);

    /// Merge all of the live segments of a specific val# in RHS into this
    /// live range as the specified value number. The segments in RHS are
    /// allowed to overlap with segments in the current range, it will replace
    /// the value numbers of the overlapped live segments with the specified
    /// value number.
    void MergeSegmentsInAsValue(const LiveRange &RHS, VNInfo *LHSValNo);

    /// MergeValueInAsValue - Merge all of the segments of a specific val#
    /// in RHS into this live range as the specified value number.
    /// The segments in RHS are allowed to overlap with segments in the
    /// current range, but only if the overlapping segments have the
    /// specified value number.
    void MergeValueInAsValue(const LiveRange &RHS,
                             const VNInfo *RHSValNo, VNInfo *LHSValNo);

    bool empty() const { return segments.empty(); }

    /// beginIndex - Return the lowest numbered slot covered.
    SlotIndex beginIndex() const {
      assert(!empty() && "Call to beginIndex() on empty range.");
      return segments.front().start;
    }

    /// endNumber - return the maximum point of the range of the whole,
    /// exclusive.
    SlotIndex endIndex() const {
      assert(!empty() && "Call to endIndex() on empty range.");
      return segments.back().end;
    }

    bool expiredAt(SlotIndex index) const {
      return index >= endIndex();
    }

    bool liveAt(SlotIndex index) const {
      const_iterator r = find(index);
      return r != end() && r->start <= index;
    }

    /// Return the segment that contains the specified index, or null if there
    /// is none.
    const Segment *getSegmentContaining(SlotIndex Idx) const {
      const_iterator I = FindSegmentContaining(Idx);
      return I == end() ? nullptr : &*I;
    }

    /// Return the live segment that contains the specified index, or null if
    /// there is none.
    Segment *getSegmentContaining(SlotIndex Idx) {
      iterator I = FindSegmentContaining(Idx);
      return I == end() ? nullptr : &*I;
    }

    /// getVNInfoAt - Return the VNInfo that is live at Idx, or NULL.
    VNInfo *getVNInfoAt(SlotIndex Idx) const {
      const_iterator I = FindSegmentContaining(Idx);
      return I == end() ? nullptr : I->valno;
    }

    /// getVNInfoBefore - Return the VNInfo that is live up to but not
    /// necessarily including Idx, or NULL. Use this to find the reaching def
    /// used by an instruction at this SlotIndex position.
    VNInfo *getVNInfoBefore(SlotIndex Idx) const {
      const_iterator I = FindSegmentContaining(Idx.getPrevSlot());
      return I == end() ? nullptr : I->valno;
    }

    /// Return an iterator to the segment that contains the specified index,
    /// or end() if there is none.
    iterator FindSegmentContaining(SlotIndex Idx) {
      iterator I = find(Idx);
      return I != end() && I->start <= Idx ? I : end();
    }
    const_iterator FindSegmentContaining(SlotIndex Idx) const {
      const_iterator I = find(Idx);
      return I != end() && I->start <= Idx ? I : end();
    }

    /// overlaps - Return true if the intersection of the two live ranges is
    /// not empty.
    bool overlaps(const LiveRange &other) const {
      if (other.empty())
        return false;
      return overlapsFrom(other, other.begin());
    }

    /// overlaps - Return true if the two ranges have overlapping segments
    /// that are not coalescable according to CP.
    ///
    /// Overlapping segments where one range is defined by a coalescable
    /// copy are allowed.
    bool overlaps(const LiveRange &Other, const CoalescerPair &CP,
                  const SlotIndexes&) const;

    /// overlaps - Return true if the live range overlaps an interval
    /// specified by [Start, End).
    bool overlaps(SlotIndex Start, SlotIndex End) const;

    /// overlapsFrom - Return true if the intersection of the two live ranges
    /// is not empty. The specified iterator is a hint that we can begin
    /// scanning the Other range starting at I.
    bool overlapsFrom(const LiveRange &Other, const_iterator I) const;

    /// Returns true if all segments of the @p Other live range are completely
    /// covered by this live range.
    /// Adjacent live ranges do not affect the covering: the live range
    /// [1,5](5,10] covers (3,7].
    bool covers(const LiveRange &Other) const;

    /// Add the specified Segment to this range, merging segments as
    /// appropriate. This returns an iterator to the inserted segment (which
    /// may have grown since it was inserted).
    iterator addSegment(Segment S);

    /// If this range is live before @p Use in the basic block that starts at
    /// @p StartIdx, extend it to be live up to @p Use, and return the value.
    /// If there is no segment before @p Use, return nullptr.
    VNInfo *extendInBlock(SlotIndex StartIdx, SlotIndex Use);

    /// join - Join two live ranges (this, and other) together. This applies
    /// mappings to the value numbers in the LHS/RHS ranges as specified. If
    /// the ranges are not joinable, this aborts.
    void join(LiveRange &Other,
              const int *ValNoAssignments,
              const int *RHSValNoAssignments,
              SmallVectorImpl<VNInfo *> &NewVNInfo);

    /// True iff this segment is a single segment that lies between the
    /// specified boundaries, exclusively. Vregs live across a backedge are
    /// not considered local. The boundaries are expected to lie within an
    /// extended basic block, so vregs that are not live out should contain no
    /// holes.
    bool isLocal(SlotIndex Start, SlotIndex End) const {
      return beginIndex() > Start.getBaseIndex() &&
             endIndex() < End.getBoundaryIndex();
    }

    /// Remove the specified segment from this range. Note that the segment
    /// must be a single Segment in its entirety.
    void removeSegment(SlotIndex Start, SlotIndex End,
                       bool RemoveDeadValNo = false);

    void removeSegment(Segment S, bool RemoveDeadValNo = false) {
      removeSegment(S.start, S.end, RemoveDeadValNo);
    }

    /// Remove segment pointed to by iterator @p I from this range. This does
    /// not remove dead value numbers.
    iterator removeSegment(iterator I) {
      return segments.erase(I);
    }

    /// Query Liveness at Idx.
    /// The sub-instruction slot of Idx doesn't matter, only the instruction
    /// it refers to is considered.
    LiveQueryResult Query(SlotIndex Idx) const {
      // Find the segment that enters the instruction.
      const_iterator I = find(Idx.getBaseIndex());
      const_iterator E = end();
      if (I == E)
        return LiveQueryResult(nullptr, nullptr, SlotIndex(), false);

      // Is this an instruction live-in segment?
      // If Idx is the start index of a basic block, include live-in segments
      // that start at Idx.getBaseIndex().
      VNInfo *EarlyVal = nullptr;
      VNInfo *LateVal  = nullptr;
      SlotIndex EndPoint;
      bool Kill = false;
      if (I->start <= Idx.getBaseIndex()) {
        EarlyVal = I->valno;
        EndPoint = I->end;
        // Move to the potentially live-out segment.
        if (SlotIndex::isSameInstr(Idx, I->end)) {
          Kill = true;
          if (++I == E)
            return LiveQueryResult(EarlyVal, LateVal, EndPoint, Kill);
        }
        // Special case: A PHIDef value can have its def in the middle of a
        // segment if the value happens to be live out of the layout
        // predecessor.
        // Such a value is not live-in.
        if (EarlyVal->def == Idx.getBaseIndex())
          EarlyVal = nullptr;
      }
      // I now points to the segment that may be live-through, or defined by
      // this instr. Ignore segments starting after the current instr.
      if (!SlotIndex::isEarlierInstr(Idx, I->start)) {
        LateVal = I->valno;
        EndPoint = I->end;
      }
      return LiveQueryResult(EarlyVal, LateVal, EndPoint, Kill);
    }

    /// removeValNo - Remove all the segments defined by the specified value#.
    /// Also remove the value# from value# list.
    void removeValNo(VNInfo *ValNo);

    /// Returns true if the live range is zero length, i.e. no live segments
    /// span instructions. It doesn't pay to spill such a range.
    bool isZeroLength(SlotIndexes *Indexes) const {
      for (const Segment &S : segments)
        if (Indexes->getNextNonNullIndex(S.start).getBaseIndex() <
            S.end.getBaseIndex())
          return false;
      return true;
    }

+    // Returns true if any segment in the live range contains any of the
+    // provided slot indexes. Slots which occur in holes between
+    // segments will not cause the function to return true.
+    bool isLiveAtIndexes(ArrayRef<SlotIndex> Slots) const;
+
    bool operator<(const LiveRange& other) const {
      const SlotIndex &thisIndex = beginIndex();
      const SlotIndex &otherIndex = other.beginIndex();
      return thisIndex < otherIndex;
    }

    /// Flush segment set into the regular segment vector.
    /// The method is to be called after the live range
    /// has been created, if use of the segment set was
    /// activated in the constructor of the live range.
    void flushSegmentSet();

    void print(raw_ostream &OS) const;
    void dump() const;

    /// \brief Walk the range and assert if any invariants fail to hold.
    ///
    /// Note that this is a no-op when asserts are disabled.
#ifdef NDEBUG
    void verify() const {}
#else
    void verify() const;
#endif

  protected:
    /// Append a segment to the list of segments.
    void append(const LiveRange::Segment S);

  private:
    friend class LiveRangeUpdater;
    void addSegmentToSet(Segment S);
    void markValNoForDeletion(VNInfo *V);
  };

  inline raw_ostream &operator<<(raw_ostream &OS, const LiveRange &LR) {
    LR.print(OS);
    return OS;
  }

  /// LiveInterval - This class represents the liveness of a register,
  /// or stack slot.
  class LiveInterval : public LiveRange {
  public:
    typedef LiveRange super;

    /// A live range for subregisters. The LaneMask specifies which parts of
    /// the super register are covered by the interval.
    /// (@sa TargetRegisterInfo::getSubRegIndexLaneMask()).
    class SubRange : public LiveRange {
    public:
      SubRange *Next;
      LaneBitmask LaneMask;

      /// Constructs a new SubRange object.
      SubRange(LaneBitmask LaneMask)
        : Next(nullptr), LaneMask(LaneMask) {
      }

      /// Constructs a new SubRange object by copying liveness from @p Other.
      SubRange(LaneBitmask LaneMask, const LiveRange &Other,
               BumpPtrAllocator &Allocator)
        : LiveRange(Other, Allocator), Next(nullptr), LaneMask(LaneMask) {
      }
    };

  private:
    SubRange *SubRanges; ///< Single linked list of subregister live ranges.

  public:
    const unsigned reg;  // the register or stack slot of this interval.
  float weight;        // weight of this interval

  LiveInterval(unsigned Reg, float Weight)
    : SubRanges(nullptr), reg(Reg), weight(Weight) {}

  ~LiveInterval() {
    clearSubRanges();
  }

  template <typename T>
  class SingleLinkedListIterator {
    T *P;
  public:
    SingleLinkedListIterator(T *P) : P(P) {}
    SingleLinkedListIterator &operator++() {
      P = P->Next;
      return *this;
    }
    SingleLinkedListIterator &operator++(int) {
      SingleLinkedListIterator res = *this;
      ++*this;
      return res;
    }
    bool operator!=(const SingleLinkedListIterator &Other) {
      return P != Other.operator->();
    }
    bool operator==(const SingleLinkedListIterator &Other) {
      return P == Other.operator->();
    }
    T &operator*() const {
      return *P;
    }
    T *operator->() const {
      return P;
    }
  };

  typedef SingleLinkedListIterator<SubRange> subrange_iterator;
  subrange_iterator subrange_begin() {
    return subrange_iterator(SubRanges);
  }
  subrange_iterator subrange_end() {
    return subrange_iterator(nullptr);
  }

  typedef SingleLinkedListIterator<const SubRange> const_subrange_iterator;
  const_subrange_iterator subrange_begin() const {
    return const_subrange_iterator(SubRanges);
  }
  const_subrange_iterator subrange_end() const {
    return const_subrange_iterator(nullptr);
  }

  iterator_range<subrange_iterator> subranges() {
    return make_range(subrange_begin(), subrange_end());
  }

  iterator_range<const_subrange_iterator> subranges() const {
    return make_range(subrange_begin(), subrange_end());
  }

  /// Creates a new empty subregister live range. The range is added at the
  /// beginning of the subrange list; subrange iterators stay valid.
  SubRange *createSubRange(BumpPtrAllocator &Allocator,
                           LaneBitmask LaneMask) {
    SubRange *Range = new (Allocator) SubRange(LaneMask);
    appendSubRange(Range);
    return Range;
  }

  /// Like createSubRange() but the new range is filled with a copy of the
  /// liveness information in @p CopyFrom.
  SubRange *createSubRangeFrom(BumpPtrAllocator &Allocator,
                               LaneBitmask LaneMask,
                               const LiveRange &CopyFrom) {
    SubRange *Range = new (Allocator) SubRange(LaneMask, CopyFrom, Allocator);
    appendSubRange(Range);
    return Range;
  }

  /// Returns true if subregister liveness information is available.
  bool hasSubRanges() const {
    return SubRanges != nullptr;
  }

  /// Removes all subregister liveness information.
  void clearSubRanges();

  /// Removes all subranges without any segments (subranges without segments
  /// are not considered valid and should only exist temporarily).
  void removeEmptySubRanges();

  /// Construct main live range by merging the SubRanges of @p LI.
  void constructMainRangeFromSubranges(const SlotIndexes &Indexes,
                                       VNInfo::Allocator &VNIAllocator);

  /// getSize - Returns the sum of sizes of all the LiveRanges.
  ///
  unsigned getSize() const;

  /// isSpillable - Can this interval be spilled?
  bool isSpillable() const {
    return weight != llvm::huge_valf;
  }

  /// markNotSpillable - Mark interval as not spillable
  void markNotSpillable() {
    weight = llvm::huge_valf;
  }

  bool operator<(const LiveInterval& other) const {
    const SlotIndex &thisIndex = beginIndex();
    const SlotIndex &otherIndex = other.beginIndex();
    return std::tie(thisIndex, reg) < std::tie(otherIndex, other.reg);
  }

  void print(raw_ostream &OS) const;
  void dump() const;

  /// \brief Walks the interval and assert if any invariants fail to hold.
  ///
  /// Note that this is a no-op when asserts are disabled.
#ifdef NDEBUG
  void verify(const MachineRegisterInfo *MRI = nullptr) const {}
#else
  void verify(const MachineRegisterInfo *MRI = nullptr) const;
#endif

private:
  /// Appends @p Range to SubRanges list.
  void appendSubRange(SubRange *Range) {
    Range->Next = SubRanges;
    SubRanges = Range;
  }

  /// Free memory held by SubRange.
  void freeSubRange(SubRange *S);
};

inline raw_ostream &operator<<(raw_ostream &OS, const LiveInterval &LI) {
  LI.print(OS);
  return OS;
}

raw_ostream &operator<<(raw_ostream &OS, const LiveRange::Segment &S);

inline bool operator<(SlotIndex V, const LiveRange::Segment &S) {
  return V < S.start;
}

inline bool operator<(const LiveRange::Segment &S, SlotIndex V) {
  return S.start < V;
}

/// Helper class for performant LiveRange bulk updates.
///
/// Calling LiveRange::addSegment() repeatedly can be expensive on large
/// live ranges because segments after the insertion point may need to be
/// shifted. The LiveRangeUpdater class can defer the shifting when adding
/// many segments in order.
///
/// The LiveRange will be in an invalid state until flush() is called.
class LiveRangeUpdater {
  LiveRange *LR;
  SlotIndex LastStart;
  LiveRange::iterator WriteI;
  LiveRange::iterator ReadI;
  SmallVector<LiveRange::Segment, 16> Spills;
  void mergeSpills();

public:
  /// Create a LiveRangeUpdater for adding segments to LR.
  /// LR will temporarily be in an invalid state until flush() is called.
  LiveRangeUpdater(LiveRange *lr = nullptr) : LR(lr) {}

  ~LiveRangeUpdater() { flush(); }

  /// Add a segment to LR and coalesce when possible, just like
  /// LR.addSegment(). Segments should be added in increasing start order for
  /// best performance.
  void add(LiveRange::Segment);

  void add(SlotIndex Start, SlotIndex End, VNInfo *VNI) {
    add(LiveRange::Segment(Start, End, VNI));
  }

  /// Return true if the LR is currently in an invalid state, and flush()
  /// needs to be called.
  bool isDirty() const { return LastStart.isValid(); }

  /// Flush the updater state to LR so it is valid and contains all added
  /// segments.
  void flush();

  /// Select a different destination live range.
  void setDest(LiveRange *lr) {
    if (LR != lr && isDirty())
      flush();
    LR = lr;
  }

  /// Get the current destination live range.
  LiveRange *getDest() const { return LR; }

  void dump() const;
  void print(raw_ostream&) const;
};

inline raw_ostream &operator<<(raw_ostream &OS, const LiveRangeUpdater &X) {
  X.print(OS);
  return OS;
}

/// ConnectedVNInfoEqClasses - Helper class that can divide VNInfos in a
/// LiveInterval into equivalence classes of connected components. A
/// LiveInterval that has multiple connected components can be broken into
/// multiple LiveIntervals.
///
/// Given a LiveInterval that may have multiple connected components, run:
///
///   unsigned numComps = ConEQ.Classify(LI);
///   if (numComps > 1) {
///     // allocate numComps-1 new LiveIntervals into LIS[1..]
///     ConEQ.Distribute(LIS);
///   }
class ConnectedVNInfoEqClasses {
  LiveIntervals &LIS;
  IntEqClasses EqClass;

public:
  explicit ConnectedVNInfoEqClasses(LiveIntervals &lis) : LIS(lis) {}

  /// Classify the values in \p LR into connected components.
  /// Returns the number of connected components.
  unsigned Classify(const LiveRange &LR);

  /// getEqClass - Classify creates equivalence classes numbered 0..N. Return
  /// the equivalence class assigned the VNI.
  unsigned getEqClass(const VNInfo *VNI) const { return EqClass[VNI->id]; }

  /// Distribute values in \p LI into a separate LiveInterval
  /// for each connected component. LIV must have an empty LiveInterval for
  /// each additional connected component. The first connected component is
  /// left in \p LI.
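  ///
  /// A minimal sketch of the full flow, assuming createInterval() is the
  /// caller's own helper for allocating fresh empty LiveIntervals (not part
  /// of this class):
  ///
  ///   unsigned NumComps = ConEQ.Classify(LI);
  ///   if (NumComps > 1) {
  ///     SmallVector<LiveInterval*, 8> NewLIs;
  ///     for (unsigned I = 1; I != NumComps; ++I)
  ///       NewLIs.push_back(&createInterval());
  ///     ConEQ.Distribute(LI, NewLIs.data(), MRI);
  ///   }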
void Distribute(LiveInterval &LI, LiveInterval *LIV[], MachineRegisterInfo &MRI); }; } #endif Index: vendor/llvm/dist/include/llvm/IR/IRBuilder.h =================================================================== --- vendor/llvm/dist/include/llvm/IR/IRBuilder.h (revision 295845) +++ vendor/llvm/dist/include/llvm/IR/IRBuilder.h (revision 295846) @@ -1,1796 +1,1796 @@ //===---- llvm/IRBuilder.h - Builder for LLVM Instructions ------*- C++ -*-===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file defines the IRBuilder class, which is used as a convenient way // to create LLVM instructions with a consistent and simplified interface. // //===----------------------------------------------------------------------===// #ifndef LLVM_IR_IRBUILDER_H #define LLVM_IR_IRBUILDER_H #include "llvm/ADT/ArrayRef.h" #include "llvm/ADT/StringRef.h" #include "llvm/ADT/Twine.h" #include "llvm/IR/BasicBlock.h" #include "llvm/IR/ConstantFolder.h" #include "llvm/IR/DataLayout.h" #include "llvm/IR/Function.h" #include "llvm/IR/GlobalVariable.h" #include "llvm/IR/Instructions.h" #include "llvm/IR/Intrinsics.h" #include "llvm/IR/LLVMContext.h" #include "llvm/IR/Operator.h" #include "llvm/IR/ValueHandle.h" #include "llvm/Support/CBindingWrapping.h" namespace llvm { class MDNode; /// \brief This provides the default implementation of the IRBuilder /// 'InsertHelper' method that is called whenever an instruction is created by /// IRBuilder and needs to be inserted. /// /// By default, this inserts the instruction at the insertion point. template class IRBuilderDefaultInserter { protected: void InsertHelper(Instruction *I, const Twine &Name, BasicBlock *BB, BasicBlock::iterator InsertPt) const { if (BB) BB->getInstList().insert(InsertPt, I); if (preserveNames) I->setName(Name); } }; /// \brief Common base class shared among various IRBuilders. class IRBuilderBase { DebugLoc CurDbgLocation; protected: BasicBlock *BB; BasicBlock::iterator InsertPt; LLVMContext &Context; MDNode *DefaultFPMathTag; FastMathFlags FMF; ArrayRef DefaultOperandBundles; public: IRBuilderBase(LLVMContext &context, MDNode *FPMathTag = nullptr, ArrayRef OpBundles = None) : Context(context), DefaultFPMathTag(FPMathTag), FMF(), DefaultOperandBundles(OpBundles) { ClearInsertionPoint(); } //===--------------------------------------------------------------------===// // Builder configuration methods //===--------------------------------------------------------------------===// /// \brief Clear the insertion point: created instructions will not be /// inserted into a block. void ClearInsertionPoint() { BB = nullptr; InsertPt.reset(nullptr); } BasicBlock *GetInsertBlock() const { return BB; } BasicBlock::iterator GetInsertPoint() const { return InsertPt; } LLVMContext &getContext() const { return Context; } /// \brief This specifies that created instructions should be appended to the /// end of the specified block. void SetInsertPoint(BasicBlock *TheBB) { BB = TheBB; InsertPt = BB->end(); } /// \brief This specifies that created instructions should be inserted before /// the specified instruction. 
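  ///
  /// A minimal sketch, assuming Builder is an IRBuilder<>, I is an existing
  /// instruction, and X and Y are values visible at that point:
  ///
  ///   Builder.SetInsertPoint(I);
  ///   Value *Sum = Builder.CreateAdd(X, Y); // emitted immediately before I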
void SetInsertPoint(Instruction *I) { BB = I->getParent(); InsertPt = I->getIterator(); assert(InsertPt != BB->end() && "Can't read debug loc from end()"); SetCurrentDebugLocation(I->getDebugLoc()); } /// \brief This specifies that created instructions should be inserted at the /// specified point. void SetInsertPoint(BasicBlock *TheBB, BasicBlock::iterator IP) { BB = TheBB; InsertPt = IP; if (IP != TheBB->end()) SetCurrentDebugLocation(IP->getDebugLoc()); } /// \brief Set location information used by debugging information. void SetCurrentDebugLocation(DebugLoc L) { CurDbgLocation = std::move(L); } /// \brief Get location information used by debugging information. const DebugLoc &getCurrentDebugLocation() const { return CurDbgLocation; } /// \brief If this builder has a current debug location, set it on the /// specified instruction. void SetInstDebugLocation(Instruction *I) const { if (CurDbgLocation) I->setDebugLoc(CurDbgLocation); } /// \brief Get the return type of the current function that we're emitting /// into. Type *getCurrentFunctionReturnType() const; /// InsertPoint - A saved insertion point. class InsertPoint { BasicBlock *Block; BasicBlock::iterator Point; public: /// \brief Creates a new insertion point which doesn't point to anything. InsertPoint() : Block(nullptr) {} /// \brief Creates a new insertion point at the given location. InsertPoint(BasicBlock *InsertBlock, BasicBlock::iterator InsertPoint) : Block(InsertBlock), Point(InsertPoint) {} /// \brief Returns true if this insert point is set. bool isSet() const { return (Block != nullptr); } llvm::BasicBlock *getBlock() const { return Block; } llvm::BasicBlock::iterator getPoint() const { return Point; } }; /// \brief Returns the current insert point. InsertPoint saveIP() const { return InsertPoint(GetInsertBlock(), GetInsertPoint()); } /// \brief Returns the current insert point, clearing it in the process. InsertPoint saveAndClearIP() { InsertPoint IP(GetInsertBlock(), GetInsertPoint()); ClearInsertionPoint(); return IP; } /// \brief Sets the current insert point to a previously-saved location. void restoreIP(InsertPoint IP) { if (IP.isSet()) SetInsertPoint(IP.getBlock(), IP.getPoint()); else ClearInsertionPoint(); } /// \brief Get the floating point math metadata being used. MDNode *getDefaultFPMathTag() const { return DefaultFPMathTag; } /// \brief Get the flags to be applied to created floating point ops FastMathFlags getFastMathFlags() const { return FMF; } /// \brief Clear the fast-math flags. void clearFastMathFlags() { FMF.clear(); } /// \brief Set the floating point math metadata to be used. void setDefaultFPMathTag(MDNode *FPMathTag) { DefaultFPMathTag = FPMathTag; } /// \brief Set the fast-math flags to be used with generated fp-math operators void setFastMathFlags(FastMathFlags NewFMF) { FMF = NewFMF; } //===--------------------------------------------------------------------===// // RAII helpers. //===--------------------------------------------------------------------===// // \brief RAII object that stores the current insertion point and restores it // when the object is destroyed. This includes the debug location. 
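  //
  // A minimal sketch, assuming Builder is an IRBuilderBase and OtherBB is an
  // existing block to emit into temporarily:
  //
  //   {
  //     IRBuilderBase::InsertPointGuard Guard(Builder);
  //     Builder.SetInsertPoint(OtherBB);
  //     // ... emit instructions at the end of OtherBB ...
  //   } // insertion point and debug location are restored here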
class InsertPointGuard { IRBuilderBase &Builder; AssertingVH Block; BasicBlock::iterator Point; DebugLoc DbgLoc; InsertPointGuard(const InsertPointGuard &) = delete; InsertPointGuard &operator=(const InsertPointGuard &) = delete; public: InsertPointGuard(IRBuilderBase &B) : Builder(B), Block(B.GetInsertBlock()), Point(B.GetInsertPoint()), DbgLoc(B.getCurrentDebugLocation()) {} ~InsertPointGuard() { Builder.restoreIP(InsertPoint(Block, Point)); Builder.SetCurrentDebugLocation(DbgLoc); } }; // \brief RAII object that stores the current fast math settings and restores // them when the object is destroyed. class FastMathFlagGuard { IRBuilderBase &Builder; FastMathFlags FMF; MDNode *FPMathTag; FastMathFlagGuard(const FastMathFlagGuard &) = delete; FastMathFlagGuard &operator=( const FastMathFlagGuard &) = delete; public: FastMathFlagGuard(IRBuilderBase &B) : Builder(B), FMF(B.FMF), FPMathTag(B.DefaultFPMathTag) {} ~FastMathFlagGuard() { Builder.FMF = FMF; Builder.DefaultFPMathTag = FPMathTag; } }; //===--------------------------------------------------------------------===// // Miscellaneous creation methods. //===--------------------------------------------------------------------===// /// \brief Make a new global variable with initializer type i8* /// /// Make a new global variable with an initializer that has array of i8 type /// filled in with the null terminated string value specified. The new global /// variable will be marked mergable with any others of the same contents. If /// Name is specified, it is the name of the global variable created. GlobalVariable *CreateGlobalString(StringRef Str, const Twine &Name = "", unsigned AddressSpace = 0); /// \brief Get a constant value representing either true or false. ConstantInt *getInt1(bool V) { return ConstantInt::get(getInt1Ty(), V); } /// \brief Get the constant value for i1 true. ConstantInt *getTrue() { return ConstantInt::getTrue(Context); } /// \brief Get the constant value for i1 false. ConstantInt *getFalse() { return ConstantInt::getFalse(Context); } /// \brief Get a constant 8-bit value. ConstantInt *getInt8(uint8_t C) { return ConstantInt::get(getInt8Ty(), C); } /// \brief Get a constant 16-bit value. ConstantInt *getInt16(uint16_t C) { return ConstantInt::get(getInt16Ty(), C); } /// \brief Get a constant 32-bit value. ConstantInt *getInt32(uint32_t C) { return ConstantInt::get(getInt32Ty(), C); } /// \brief Get a constant 64-bit value. ConstantInt *getInt64(uint64_t C) { return ConstantInt::get(getInt64Ty(), C); } /// \brief Get a constant N-bit value, zero extended or truncated from /// a 64-bit value. ConstantInt *getIntN(unsigned N, uint64_t C) { return ConstantInt::get(getIntNTy(N), C); } /// \brief Get a constant integer value. ConstantInt *getInt(const APInt &AI) { return ConstantInt::get(Context, AI); } //===--------------------------------------------------------------------===// // Type creation methods //===--------------------------------------------------------------------===// /// \brief Fetch the type representing a single bit IntegerType *getInt1Ty() { return Type::getInt1Ty(Context); } /// \brief Fetch the type representing an 8-bit integer. IntegerType *getInt8Ty() { return Type::getInt8Ty(Context); } /// \brief Fetch the type representing a 16-bit integer. IntegerType *getInt16Ty() { return Type::getInt16Ty(Context); } /// \brief Fetch the type representing a 32-bit integer. IntegerType *getInt32Ty() { return Type::getInt32Ty(Context); } /// \brief Fetch the type representing a 64-bit integer. 
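  ///
  /// For example (a hedged sketch; Builder is an IRBuilder<> and the value
  /// 42 is arbitrary):
  ///
  ///   Value *C = ConstantInt::get(Builder.getInt64Ty(), 42);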
IntegerType *getInt64Ty() { return Type::getInt64Ty(Context); } /// \brief Fetch the type representing a 128-bit integer. IntegerType *getInt128Ty() { return Type::getInt128Ty(Context); } /// \brief Fetch the type representing an N-bit integer. IntegerType *getIntNTy(unsigned N) { return Type::getIntNTy(Context, N); } /// \brief Fetch the type representing a 16-bit floating point value. Type *getHalfTy() { return Type::getHalfTy(Context); } /// \brief Fetch the type representing a 32-bit floating point value. Type *getFloatTy() { return Type::getFloatTy(Context); } /// \brief Fetch the type representing a 64-bit floating point value. Type *getDoubleTy() { return Type::getDoubleTy(Context); } /// \brief Fetch the type representing void. Type *getVoidTy() { return Type::getVoidTy(Context); } /// \brief Fetch the type representing a pointer to an 8-bit integer value. PointerType *getInt8PtrTy(unsigned AddrSpace = 0) { return Type::getInt8PtrTy(Context, AddrSpace); } /// \brief Fetch the type representing a pointer to an integer value. IntegerType *getIntPtrTy(const DataLayout &DL, unsigned AddrSpace = 0) { return DL.getIntPtrType(Context, AddrSpace); } //===--------------------------------------------------------------------===// // Intrinsic creation methods //===--------------------------------------------------------------------===// /// \brief Create and insert a memset to the specified pointer and the /// specified value. /// /// If the pointer isn't an i8*, it will be converted. If a TBAA tag is /// specified, it will be added to the instruction. Likewise with alias.scope /// and noalias tags. CallInst *CreateMemSet(Value *Ptr, Value *Val, uint64_t Size, unsigned Align, bool isVolatile = false, MDNode *TBAATag = nullptr, MDNode *ScopeTag = nullptr, MDNode *NoAliasTag = nullptr) { return CreateMemSet(Ptr, Val, getInt64(Size), Align, isVolatile, TBAATag, ScopeTag, NoAliasTag); } CallInst *CreateMemSet(Value *Ptr, Value *Val, Value *Size, unsigned Align, bool isVolatile = false, MDNode *TBAATag = nullptr, MDNode *ScopeTag = nullptr, MDNode *NoAliasTag = nullptr); /// \brief Create and insert a memcpy between the specified pointers. /// /// If the pointers aren't i8*, they will be converted. If a TBAA tag is /// specified, it will be added to the instruction. Likewise with alias.scope /// and noalias tags. CallInst *CreateMemCpy(Value *Dst, Value *Src, uint64_t Size, unsigned Align, bool isVolatile = false, MDNode *TBAATag = nullptr, MDNode *TBAAStructTag = nullptr, MDNode *ScopeTag = nullptr, MDNode *NoAliasTag = nullptr) { return CreateMemCpy(Dst, Src, getInt64(Size), Align, isVolatile, TBAATag, TBAAStructTag, ScopeTag, NoAliasTag); } CallInst *CreateMemCpy(Value *Dst, Value *Src, Value *Size, unsigned Align, bool isVolatile = false, MDNode *TBAATag = nullptr, MDNode *TBAAStructTag = nullptr, MDNode *ScopeTag = nullptr, MDNode *NoAliasTag = nullptr); /// \brief Create and insert a memmove between the specified /// pointers. /// /// If the pointers aren't i8*, they will be converted. If a TBAA tag is /// specified, it will be added to the instruction. Likewise with alias.scope /// and noalias tags. 
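  ///
  /// A minimal sketch, assuming Dst and Src are pointer-typed Values:
  ///
  ///   Builder.CreateMemMove(Dst, Src, /*Size=*/64, /*Align=*/8);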
  CallInst *CreateMemMove(Value *Dst, Value *Src, uint64_t Size, unsigned Align,
                          bool isVolatile = false, MDNode *TBAATag = nullptr,
                          MDNode *ScopeTag = nullptr,
                          MDNode *NoAliasTag = nullptr) {
    return CreateMemMove(Dst, Src, getInt64(Size), Align, isVolatile,
                         TBAATag, ScopeTag, NoAliasTag);
  }

  CallInst *CreateMemMove(Value *Dst, Value *Src, Value *Size, unsigned Align,
                          bool isVolatile = false, MDNode *TBAATag = nullptr,
                          MDNode *ScopeTag = nullptr,
                          MDNode *NoAliasTag = nullptr);

  /// \brief Create a lifetime.start intrinsic.
  ///
  /// If the pointer isn't i8* it will be converted.
  CallInst *CreateLifetimeStart(Value *Ptr, ConstantInt *Size = nullptr);

  /// \brief Create a lifetime.end intrinsic.
  ///
  /// If the pointer isn't i8* it will be converted.
  CallInst *CreateLifetimeEnd(Value *Ptr, ConstantInt *Size = nullptr);

  /// \brief Create a call to Masked Load intrinsic
  CallInst *CreateMaskedLoad(Value *Ptr, unsigned Align, Value *Mask,
                             Value *PassThru = nullptr, const Twine &Name = "");

  /// \brief Create a call to Masked Store intrinsic
  CallInst *CreateMaskedStore(Value *Val, Value *Ptr, unsigned Align,
                              Value *Mask);

  /// \brief Create an assume intrinsic call that allows the optimizer to
  /// assume that the provided condition will be true.
  CallInst *CreateAssumption(Value *Cond);

  /// \brief Create a call to the experimental.gc.statepoint intrinsic to
  /// start a new statepoint sequence.
  CallInst *CreateGCStatepointCall(uint64_t ID, uint32_t NumPatchBytes,
                                   Value *ActualCallee,
                                   ArrayRef<Value *> CallArgs,
                                   ArrayRef<Value *> DeoptArgs,
                                   ArrayRef<Value *> GCArgs,
                                   const Twine &Name = "");

  /// \brief Create a call to the experimental.gc.statepoint intrinsic to
  /// start a new statepoint sequence.
  CallInst *CreateGCStatepointCall(uint64_t ID, uint32_t NumPatchBytes,
                                   Value *ActualCallee, uint32_t Flags,
                                   ArrayRef<Use> CallArgs,
                                   ArrayRef<Use> TransitionArgs,
                                   ArrayRef<Use> DeoptArgs,
                                   ArrayRef<Value *> GCArgs,
                                   const Twine &Name = "");

  // \brief Convenience function for the common case when CallArgs are filled
  // in using makeArrayRef(CS.arg_begin(), CS.arg_end()); Use needs to be
  // .get()'ed to get the Value pointer.
  CallInst *CreateGCStatepointCall(uint64_t ID, uint32_t NumPatchBytes,
                                   Value *ActualCallee, ArrayRef<Use> CallArgs,
                                   ArrayRef<Value *> DeoptArgs,
                                   ArrayRef<Value *> GCArgs,
                                   const Twine &Name = "");

  /// \brief Create an invoke to the experimental.gc.statepoint intrinsic to
  /// start a new statepoint sequence.
  InvokeInst *
  CreateGCStatepointInvoke(uint64_t ID, uint32_t NumPatchBytes,
                           Value *ActualInvokee, BasicBlock *NormalDest,
                           BasicBlock *UnwindDest,
                           ArrayRef<Value *> InvokeArgs,
                           ArrayRef<Value *> DeoptArgs,
                           ArrayRef<Value *> GCArgs, const Twine &Name = "");

  /// \brief Create an invoke to the experimental.gc.statepoint intrinsic to
  /// start a new statepoint sequence.
  InvokeInst *CreateGCStatepointInvoke(
      uint64_t ID, uint32_t NumPatchBytes, Value *ActualInvokee,
      BasicBlock *NormalDest, BasicBlock *UnwindDest, uint32_t Flags,
      ArrayRef<Use> InvokeArgs, ArrayRef<Use> TransitionArgs,
      ArrayRef<Use> DeoptArgs, ArrayRef<Value *> GCArgs,
      const Twine &Name = "");

  // Convenience function for the common case when CallArgs are filled in using
  // makeArrayRef(CS.arg_begin(), CS.arg_end()); Use needs to be .get()'ed to
  // get the Value *.
  InvokeInst *
  CreateGCStatepointInvoke(uint64_t ID, uint32_t NumPatchBytes,
                           Value *ActualInvokee, BasicBlock *NormalDest,
                           BasicBlock *UnwindDest,
                           ArrayRef<Value *> InvokeArgs,
                           ArrayRef<Value *> DeoptArgs,
                           ArrayRef<Value *> GCArgs, const Twine &Name = "");

  /// \brief Create a call to the experimental.gc.result intrinsic to extract
  /// the result from a call wrapped in a statepoint.
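  ///
  /// A minimal sketch, assuming SP is a statepoint call created with one of
  /// the CreateGCStatepointCall overloads above and the wrapped callee
  /// returns i32:
  ///
  ///   Value *Ret = Builder.CreateGCResult(SP, Builder.getInt32Ty());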
CallInst *CreateGCResult(Instruction *Statepoint, Type *ResultType, const Twine &Name = ""); /// \brief Create a call to the experimental.gc.relocate intrinsics to /// project the relocated value of one pointer from the statepoint. CallInst *CreateGCRelocate(Instruction *Statepoint, int BaseOffset, int DerivedOffset, Type *ResultType, const Twine &Name = ""); private: /// \brief Create a call to a masked intrinsic with given Id. /// Masked intrinsic has only one overloaded type - data type. CallInst *CreateMaskedIntrinsic(Intrinsic::ID Id, ArrayRef Ops, Type *DataTy, const Twine &Name = ""); Value *getCastedInt8PtrValue(Value *Ptr); }; /// \brief This provides a uniform API for creating instructions and inserting /// them into a basic block: either at the end of a BasicBlock, or at a specific /// iterator location in a block. /// /// Note that the builder does not expose the full generality of LLVM /// instructions. For access to extra instruction properties, use the mutators /// (e.g. setVolatile) on the instructions after they have been /// created. Convenience state exists to specify fast-math flags and fp-math /// tags. /// /// The first template argument handles whether or not to preserve names in the /// final instruction output. This defaults to on. The second template argument /// specifies a class to use for creating constants. This defaults to creating /// minimally folded constants. The third template argument allows clients to /// specify custom insertion hooks that are called on every newly created /// insertion. template > class IRBuilder : public IRBuilderBase, public Inserter { T Folder; public: IRBuilder(LLVMContext &C, const T &F, Inserter I = Inserter(), MDNode *FPMathTag = nullptr, ArrayRef OpBundles = None) : IRBuilderBase(C, FPMathTag, OpBundles), Inserter(std::move(I)), Folder(F) {} explicit IRBuilder(LLVMContext &C, MDNode *FPMathTag = nullptr, ArrayRef OpBundles = None) : IRBuilderBase(C, FPMathTag, OpBundles), Folder() {} explicit IRBuilder(BasicBlock *TheBB, const T &F, MDNode *FPMathTag = nullptr, ArrayRef OpBundles = None) : IRBuilderBase(TheBB->getContext(), FPMathTag, OpBundles), Folder(F) { SetInsertPoint(TheBB); } explicit IRBuilder(BasicBlock *TheBB, MDNode *FPMathTag = nullptr, ArrayRef OpBundles = None) : IRBuilderBase(TheBB->getContext(), FPMathTag, OpBundles), Folder() { SetInsertPoint(TheBB); } explicit IRBuilder(Instruction *IP, MDNode *FPMathTag = nullptr, ArrayRef OpBundles = None) : IRBuilderBase(IP->getContext(), FPMathTag, OpBundles), Folder() { SetInsertPoint(IP); } IRBuilder(BasicBlock *TheBB, BasicBlock::iterator IP, const T &F, MDNode *FPMathTag = nullptr, ArrayRef OpBundles = None) : IRBuilderBase(TheBB->getContext(), FPMathTag, OpBundles), Folder(F) { SetInsertPoint(TheBB, IP); } IRBuilder(BasicBlock *TheBB, BasicBlock::iterator IP, MDNode *FPMathTag = nullptr, ArrayRef OpBundles = None) : IRBuilderBase(TheBB->getContext(), FPMathTag, OpBundles), Folder() { SetInsertPoint(TheBB, IP); } /// \brief Get the constant folder being used. const T &getFolder() { return Folder; } /// \brief Return true if this builder is configured to actually add the /// requested names to IR created through it. bool isNamePreserving() const { return preserveNames; } /// \brief Insert and return the specified instruction. template InstTy *Insert(InstTy *I, const Twine &Name = "") const { this->InsertHelper(I, Name, BB, InsertPt); this->SetInstDebugLocation(I); return I; } /// \brief No-op overload to handle constants. 
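  ///
  /// For example (a hedged sketch): when both operands below are constants,
  /// the folder produces a Constant and this overload returns it without
  /// inserting anything into the block:
  ///
  ///   Value *V = Builder.CreateAdd(Builder.getInt32(2), Builder.getInt32(3));
  ///   // V is the ConstantInt 5; no instruction was emitted.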
  Constant *Insert(Constant *C, const Twine& = "") const {
    return C;
  }

  //===--------------------------------------------------------------------===//
  // Instruction creation methods: Terminators
  //===--------------------------------------------------------------------===//

private:
  /// \brief Helper to add branch weight and unpredictable metadata onto an
  /// instruction.
  /// \returns The annotated instruction.
  template <typename InstTy>
  InstTy *addBranchMetadata(InstTy *I, MDNode *Weights, MDNode *Unpredictable) {
    if (Weights)
      I->setMetadata(LLVMContext::MD_prof, Weights);
    if (Unpredictable)
      I->setMetadata(LLVMContext::MD_unpredictable, Unpredictable);
    return I;
  }

public:
  /// \brief Create a 'ret void' instruction.
  ReturnInst *CreateRetVoid() {
    return Insert(ReturnInst::Create(Context));
  }

  /// \brief Create a 'ret <val>' instruction.
  ReturnInst *CreateRet(Value *V) {
    return Insert(ReturnInst::Create(Context, V));
  }

  /// \brief Create a sequence of N insertvalue instructions,
  /// with one Value from the retVals array each, that build an aggregate
  /// return value one value at a time, and a ret instruction to return
  /// the resulting aggregate value.
  ///
  /// This is a convenience function for code that uses aggregate return values
  /// as a vehicle for having multiple return values.
  ReturnInst *CreateAggregateRet(Value *const *retVals, unsigned N) {
    Value *V = UndefValue::get(getCurrentFunctionReturnType());
    for (unsigned i = 0; i != N; ++i)
      V = CreateInsertValue(V, retVals[i], i, "mrv");
    return Insert(ReturnInst::Create(Context, V));
  }

  /// \brief Create an unconditional 'br label X' instruction.
  BranchInst *CreateBr(BasicBlock *Dest) {
    return Insert(BranchInst::Create(Dest));
  }

  /// \brief Create a conditional 'br Cond, TrueDest, FalseDest'
  /// instruction.
  BranchInst *CreateCondBr(Value *Cond, BasicBlock *True, BasicBlock *False,
                           MDNode *BranchWeights = nullptr,
                           MDNode *Unpredictable = nullptr) {
    return Insert(addBranchMetadata(BranchInst::Create(True, False, Cond),
                                    BranchWeights, Unpredictable));
  }

  /// \brief Create a switch instruction with the specified value, default dest,
  /// and with a hint for the number of cases that will be added (for efficient
  /// allocation).
  SwitchInst *CreateSwitch(Value *V, BasicBlock *Dest, unsigned NumCases = 10,
                           MDNode *BranchWeights = nullptr,
                           MDNode *Unpredictable = nullptr) {
    return Insert(addBranchMetadata(SwitchInst::Create(V, Dest, NumCases),
                                    BranchWeights, Unpredictable));
  }

  /// \brief Create an indirect branch instruction with the specified address
  /// operand, with an optional hint for the number of destinations that will be
  /// added (for efficient allocation).
  IndirectBrInst *CreateIndirectBr(Value *Addr, unsigned NumDests = 10) {
    return Insert(IndirectBrInst::Create(Addr, NumDests));
  }

  InvokeInst *CreateInvoke(Value *Callee, BasicBlock *NormalDest,
                           BasicBlock *UnwindDest, const Twine &Name = "") {
    return Insert(InvokeInst::Create(Callee, NormalDest, UnwindDest, None),
                  Name);
  }
  InvokeInst *CreateInvoke(Value *Callee, BasicBlock *NormalDest,
                           BasicBlock *UnwindDest, Value *Arg1,
                           const Twine &Name = "") {
    return Insert(InvokeInst::Create(Callee, NormalDest, UnwindDest, Arg1),
                  Name);
  }
  InvokeInst *CreateInvoke3(Value *Callee, BasicBlock *NormalDest,
                            BasicBlock *UnwindDest, Value *Arg1,
                            Value *Arg2, Value *Arg3,
                            const Twine &Name = "") {
    Value *Args[] = { Arg1, Arg2, Arg3 };
    return Insert(InvokeInst::Create(Callee, NormalDest, UnwindDest, Args),
                  Name);
  }
  /// \brief Create an invoke instruction.
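  ///
  /// A minimal sketch, assuming Callee, NormalBB, UnwindBB, Arg0 and Arg1
  /// already exist in the caller:
  ///
  ///   Value *Args[] = {Arg0, Arg1};
  ///   InvokeInst *II =
  ///       Builder.CreateInvoke(Callee, NormalBB, UnwindBB, Args, "call");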
InvokeInst *CreateInvoke(Value *Callee, BasicBlock *NormalDest, BasicBlock *UnwindDest, ArrayRef Args, const Twine &Name = "") { return Insert(InvokeInst::Create(Callee, NormalDest, UnwindDest, Args), Name); } InvokeInst *CreateInvoke(Value *Callee, BasicBlock *NormalDest, BasicBlock *UnwindDest, ArrayRef Args, ArrayRef OpBundles, const Twine &Name = "") { return Insert(InvokeInst::Create(Callee, NormalDest, UnwindDest, Args, OpBundles), Name); } ResumeInst *CreateResume(Value *Exn) { return Insert(ResumeInst::Create(Exn)); } CleanupReturnInst *CreateCleanupRet(CleanupPadInst *CleanupPad, BasicBlock *UnwindBB = nullptr) { return Insert(CleanupReturnInst::Create(CleanupPad, UnwindBB)); } CatchSwitchInst *CreateCatchSwitch(Value *ParentPad, BasicBlock *UnwindBB, unsigned NumHandlers, const Twine &Name = "") { return Insert(CatchSwitchInst::Create(ParentPad, UnwindBB, NumHandlers), Name); } CatchPadInst *CreateCatchPad(Value *ParentPad, ArrayRef Args, const Twine &Name = "") { return Insert(CatchPadInst::Create(ParentPad, Args), Name); } CleanupPadInst *CreateCleanupPad(Value *ParentPad, ArrayRef Args = None, const Twine &Name = "") { return Insert(CleanupPadInst::Create(ParentPad, Args), Name); } CatchReturnInst *CreateCatchRet(CatchPadInst *CatchPad, BasicBlock *BB) { return Insert(CatchReturnInst::Create(CatchPad, BB)); } UnreachableInst *CreateUnreachable() { return Insert(new UnreachableInst(Context)); } //===--------------------------------------------------------------------===// // Instruction creation methods: Binary Operators //===--------------------------------------------------------------------===// private: BinaryOperator *CreateInsertNUWNSWBinOp(BinaryOperator::BinaryOps Opc, Value *LHS, Value *RHS, const Twine &Name, bool HasNUW, bool HasNSW) { BinaryOperator *BO = Insert(BinaryOperator::Create(Opc, LHS, RHS), Name); if (HasNUW) BO->setHasNoUnsignedWrap(); if (HasNSW) BO->setHasNoSignedWrap(); return BO; } Instruction *AddFPMathAttributes(Instruction *I, MDNode *FPMathTag, FastMathFlags FMF) const { if (!FPMathTag) FPMathTag = DefaultFPMathTag; if (FPMathTag) I->setMetadata(LLVMContext::MD_fpmath, FPMathTag); I->setFastMathFlags(FMF); return I; } public: Value *CreateAdd(Value *LHS, Value *RHS, const Twine &Name = "", bool HasNUW = false, bool HasNSW = false) { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateAdd(LC, RC, HasNUW, HasNSW), Name); return CreateInsertNUWNSWBinOp(Instruction::Add, LHS, RHS, Name, HasNUW, HasNSW); } Value *CreateNSWAdd(Value *LHS, Value *RHS, const Twine &Name = "") { return CreateAdd(LHS, RHS, Name, false, true); } Value *CreateNUWAdd(Value *LHS, Value *RHS, const Twine &Name = "") { return CreateAdd(LHS, RHS, Name, true, false); } Value *CreateFAdd(Value *LHS, Value *RHS, const Twine &Name = "", MDNode *FPMathTag = nullptr) { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateFAdd(LC, RC), Name); return Insert(AddFPMathAttributes(BinaryOperator::CreateFAdd(LHS, RHS), FPMathTag, FMF), Name); } Value *CreateSub(Value *LHS, Value *RHS, const Twine &Name = "", bool HasNUW = false, bool HasNSW = false) { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateSub(LC, RC, HasNUW, HasNSW), Name); return CreateInsertNUWNSWBinOp(Instruction::Sub, LHS, RHS, Name, HasNUW, HasNSW); } Value *CreateNSWSub(Value *LHS, Value *RHS, const Twine &Name = "") { return CreateSub(LHS, RHS, Name, false, true); } Value *CreateNUWSub(Value *LHS, 
Value *RHS, const Twine &Name = "") { return CreateSub(LHS, RHS, Name, true, false); } Value *CreateFSub(Value *LHS, Value *RHS, const Twine &Name = "", MDNode *FPMathTag = nullptr) { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateFSub(LC, RC), Name); return Insert(AddFPMathAttributes(BinaryOperator::CreateFSub(LHS, RHS), FPMathTag, FMF), Name); } Value *CreateMul(Value *LHS, Value *RHS, const Twine &Name = "", bool HasNUW = false, bool HasNSW = false) { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateMul(LC, RC, HasNUW, HasNSW), Name); return CreateInsertNUWNSWBinOp(Instruction::Mul, LHS, RHS, Name, HasNUW, HasNSW); } Value *CreateNSWMul(Value *LHS, Value *RHS, const Twine &Name = "") { return CreateMul(LHS, RHS, Name, false, true); } Value *CreateNUWMul(Value *LHS, Value *RHS, const Twine &Name = "") { return CreateMul(LHS, RHS, Name, true, false); } Value *CreateFMul(Value *LHS, Value *RHS, const Twine &Name = "", MDNode *FPMathTag = nullptr) { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateFMul(LC, RC), Name); return Insert(AddFPMathAttributes(BinaryOperator::CreateFMul(LHS, RHS), FPMathTag, FMF), Name); } Value *CreateUDiv(Value *LHS, Value *RHS, const Twine &Name = "", bool isExact = false) { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateUDiv(LC, RC, isExact), Name); if (!isExact) return Insert(BinaryOperator::CreateUDiv(LHS, RHS), Name); return Insert(BinaryOperator::CreateExactUDiv(LHS, RHS), Name); } Value *CreateExactUDiv(Value *LHS, Value *RHS, const Twine &Name = "") { return CreateUDiv(LHS, RHS, Name, true); } Value *CreateSDiv(Value *LHS, Value *RHS, const Twine &Name = "", bool isExact = false) { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateSDiv(LC, RC, isExact), Name); if (!isExact) return Insert(BinaryOperator::CreateSDiv(LHS, RHS), Name); return Insert(BinaryOperator::CreateExactSDiv(LHS, RHS), Name); } Value *CreateExactSDiv(Value *LHS, Value *RHS, const Twine &Name = "") { return CreateSDiv(LHS, RHS, Name, true); } Value *CreateFDiv(Value *LHS, Value *RHS, const Twine &Name = "", MDNode *FPMathTag = nullptr) { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateFDiv(LC, RC), Name); return Insert(AddFPMathAttributes(BinaryOperator::CreateFDiv(LHS, RHS), FPMathTag, FMF), Name); } Value *CreateURem(Value *LHS, Value *RHS, const Twine &Name = "") { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateURem(LC, RC), Name); return Insert(BinaryOperator::CreateURem(LHS, RHS), Name); } Value *CreateSRem(Value *LHS, Value *RHS, const Twine &Name = "") { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateSRem(LC, RC), Name); return Insert(BinaryOperator::CreateSRem(LHS, RHS), Name); } Value *CreateFRem(Value *LHS, Value *RHS, const Twine &Name = "", MDNode *FPMathTag = nullptr) { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateFRem(LC, RC), Name); return Insert(AddFPMathAttributes(BinaryOperator::CreateFRem(LHS, RHS), FPMathTag, FMF), Name); } Value *CreateShl(Value *LHS, Value *RHS, const Twine &Name = "", bool HasNUW = false, bool HasNSW = false) { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateShl(LC, RC, HasNUW, 
HasNSW), Name); return CreateInsertNUWNSWBinOp(Instruction::Shl, LHS, RHS, Name, HasNUW, HasNSW); } Value *CreateShl(Value *LHS, const APInt &RHS, const Twine &Name = "", bool HasNUW = false, bool HasNSW = false) { return CreateShl(LHS, ConstantInt::get(LHS->getType(), RHS), Name, HasNUW, HasNSW); } Value *CreateShl(Value *LHS, uint64_t RHS, const Twine &Name = "", bool HasNUW = false, bool HasNSW = false) { return CreateShl(LHS, ConstantInt::get(LHS->getType(), RHS), Name, HasNUW, HasNSW); } Value *CreateLShr(Value *LHS, Value *RHS, const Twine &Name = "", bool isExact = false) { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateLShr(LC, RC, isExact), Name); if (!isExact) return Insert(BinaryOperator::CreateLShr(LHS, RHS), Name); return Insert(BinaryOperator::CreateExactLShr(LHS, RHS), Name); } Value *CreateLShr(Value *LHS, const APInt &RHS, const Twine &Name = "", bool isExact = false) { return CreateLShr(LHS, ConstantInt::get(LHS->getType(), RHS), Name,isExact); } Value *CreateLShr(Value *LHS, uint64_t RHS, const Twine &Name = "", bool isExact = false) { return CreateLShr(LHS, ConstantInt::get(LHS->getType(), RHS), Name,isExact); } Value *CreateAShr(Value *LHS, Value *RHS, const Twine &Name = "", bool isExact = false) { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateAShr(LC, RC, isExact), Name); if (!isExact) return Insert(BinaryOperator::CreateAShr(LHS, RHS), Name); return Insert(BinaryOperator::CreateExactAShr(LHS, RHS), Name); } Value *CreateAShr(Value *LHS, const APInt &RHS, const Twine &Name = "", bool isExact = false) { return CreateAShr(LHS, ConstantInt::get(LHS->getType(), RHS), Name,isExact); } Value *CreateAShr(Value *LHS, uint64_t RHS, const Twine &Name = "", bool isExact = false) { return CreateAShr(LHS, ConstantInt::get(LHS->getType(), RHS), Name,isExact); } Value *CreateAnd(Value *LHS, Value *RHS, const Twine &Name = "") { if (Constant *RC = dyn_cast(RHS)) { if (isa(RC) && cast(RC)->isAllOnesValue()) return LHS; // LHS & -1 -> LHS if (Constant *LC = dyn_cast(LHS)) return Insert(Folder.CreateAnd(LC, RC), Name); } return Insert(BinaryOperator::CreateAnd(LHS, RHS), Name); } Value *CreateAnd(Value *LHS, const APInt &RHS, const Twine &Name = "") { return CreateAnd(LHS, ConstantInt::get(LHS->getType(), RHS), Name); } Value *CreateAnd(Value *LHS, uint64_t RHS, const Twine &Name = "") { return CreateAnd(LHS, ConstantInt::get(LHS->getType(), RHS), Name); } Value *CreateOr(Value *LHS, Value *RHS, const Twine &Name = "") { if (Constant *RC = dyn_cast(RHS)) { if (RC->isNullValue()) return LHS; // LHS | 0 -> LHS if (Constant *LC = dyn_cast(LHS)) return Insert(Folder.CreateOr(LC, RC), Name); } return Insert(BinaryOperator::CreateOr(LHS, RHS), Name); } Value *CreateOr(Value *LHS, const APInt &RHS, const Twine &Name = "") { return CreateOr(LHS, ConstantInt::get(LHS->getType(), RHS), Name); } Value *CreateOr(Value *LHS, uint64_t RHS, const Twine &Name = "") { return CreateOr(LHS, ConstantInt::get(LHS->getType(), RHS), Name); } Value *CreateXor(Value *LHS, Value *RHS, const Twine &Name = "") { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateXor(LC, RC), Name); return Insert(BinaryOperator::CreateXor(LHS, RHS), Name); } Value *CreateXor(Value *LHS, const APInt &RHS, const Twine &Name = "") { return CreateXor(LHS, ConstantInt::get(LHS->getType(), RHS), Name); } Value *CreateXor(Value *LHS, uint64_t RHS, const Twine &Name = "") { return CreateXor(LHS, 
ConstantInt::get(LHS->getType(), RHS), Name); } Value *CreateBinOp(Instruction::BinaryOps Opc, Value *LHS, Value *RHS, const Twine &Name = "", MDNode *FPMathTag = nullptr) { if (Constant *LC = dyn_cast(LHS)) if (Constant *RC = dyn_cast(RHS)) return Insert(Folder.CreateBinOp(Opc, LC, RC), Name); llvm::Instruction *BinOp = BinaryOperator::Create(Opc, LHS, RHS); if (isa(BinOp)) BinOp = AddFPMathAttributes(BinOp, FPMathTag, FMF); return Insert(BinOp, Name); } Value *CreateNeg(Value *V, const Twine &Name = "", bool HasNUW = false, bool HasNSW = false) { if (Constant *VC = dyn_cast(V)) return Insert(Folder.CreateNeg(VC, HasNUW, HasNSW), Name); BinaryOperator *BO = Insert(BinaryOperator::CreateNeg(V), Name); if (HasNUW) BO->setHasNoUnsignedWrap(); if (HasNSW) BO->setHasNoSignedWrap(); return BO; } Value *CreateNSWNeg(Value *V, const Twine &Name = "") { return CreateNeg(V, Name, false, true); } Value *CreateNUWNeg(Value *V, const Twine &Name = "") { return CreateNeg(V, Name, true, false); } Value *CreateFNeg(Value *V, const Twine &Name = "", MDNode *FPMathTag = nullptr) { if (Constant *VC = dyn_cast(V)) return Insert(Folder.CreateFNeg(VC), Name); return Insert(AddFPMathAttributes(BinaryOperator::CreateFNeg(V), FPMathTag, FMF), Name); } Value *CreateNot(Value *V, const Twine &Name = "") { if (Constant *VC = dyn_cast(V)) return Insert(Folder.CreateNot(VC), Name); return Insert(BinaryOperator::CreateNot(V), Name); } //===--------------------------------------------------------------------===// // Instruction creation methods: Memory Instructions //===--------------------------------------------------------------------===// AllocaInst *CreateAlloca(Type *Ty, Value *ArraySize = nullptr, const Twine &Name = "") { return Insert(new AllocaInst(Ty, ArraySize), Name); } // \brief Provided to resolve 'CreateLoad(Ptr, "...")' correctly, instead of // converting the string to 'bool' for the isVolatile parameter. LoadInst *CreateLoad(Value *Ptr, const char *Name) { return Insert(new LoadInst(Ptr), Name); } LoadInst *CreateLoad(Value *Ptr, const Twine &Name = "") { return Insert(new LoadInst(Ptr), Name); } LoadInst *CreateLoad(Type *Ty, Value *Ptr, const Twine &Name = "") { return Insert(new LoadInst(Ty, Ptr), Name); } LoadInst *CreateLoad(Value *Ptr, bool isVolatile, const Twine &Name = "") { return Insert(new LoadInst(Ptr, nullptr, isVolatile), Name); } StoreInst *CreateStore(Value *Val, Value *Ptr, bool isVolatile = false) { return Insert(new StoreInst(Val, Ptr, isVolatile)); } // \brief Provided to resolve 'CreateAlignedLoad(Ptr, Align, "...")' // correctly, instead of converting the string to 'bool' for the isVolatile // parameter. 
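  //
  // For example (a hedged sketch): without this overload,
  //
  //   Builder.CreateAlignedLoad(Ptr, 16, "val")
  //
  // would prefer the standard pointer-to-bool conversion and bind "val" to
  // the isVolatile parameter instead of the Twine name.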
LoadInst *CreateAlignedLoad(Value *Ptr, unsigned Align, const char *Name) { LoadInst *LI = CreateLoad(Ptr, Name); LI->setAlignment(Align); return LI; } LoadInst *CreateAlignedLoad(Value *Ptr, unsigned Align, const Twine &Name = "") { LoadInst *LI = CreateLoad(Ptr, Name); LI->setAlignment(Align); return LI; } LoadInst *CreateAlignedLoad(Value *Ptr, unsigned Align, bool isVolatile, const Twine &Name = "") { LoadInst *LI = CreateLoad(Ptr, isVolatile, Name); LI->setAlignment(Align); return LI; } StoreInst *CreateAlignedStore(Value *Val, Value *Ptr, unsigned Align, bool isVolatile = false) { StoreInst *SI = CreateStore(Val, Ptr, isVolatile); SI->setAlignment(Align); return SI; } FenceInst *CreateFence(AtomicOrdering Ordering, SynchronizationScope SynchScope = CrossThread, const Twine &Name = "") { return Insert(new FenceInst(Context, Ordering, SynchScope), Name); } AtomicCmpXchgInst * CreateAtomicCmpXchg(Value *Ptr, Value *Cmp, Value *New, AtomicOrdering SuccessOrdering, AtomicOrdering FailureOrdering, SynchronizationScope SynchScope = CrossThread) { return Insert(new AtomicCmpXchgInst(Ptr, Cmp, New, SuccessOrdering, FailureOrdering, SynchScope)); } AtomicRMWInst *CreateAtomicRMW(AtomicRMWInst::BinOp Op, Value *Ptr, Value *Val, AtomicOrdering Ordering, SynchronizationScope SynchScope = CrossThread) { return Insert(new AtomicRMWInst(Op, Ptr, Val, Ordering, SynchScope)); } Value *CreateGEP(Value *Ptr, ArrayRef IdxList, const Twine &Name = "") { return CreateGEP(nullptr, Ptr, IdxList, Name); } Value *CreateGEP(Type *Ty, Value *Ptr, ArrayRef IdxList, const Twine &Name = "") { if (Constant *PC = dyn_cast(Ptr)) { // Every index must be constant. size_t i, e; for (i = 0, e = IdxList.size(); i != e; ++i) if (!isa(IdxList[i])) break; if (i == e) return Insert(Folder.CreateGetElementPtr(Ty, PC, IdxList), Name); } return Insert(GetElementPtrInst::Create(Ty, Ptr, IdxList), Name); } Value *CreateInBoundsGEP(Value *Ptr, ArrayRef IdxList, const Twine &Name = "") { return CreateInBoundsGEP(nullptr, Ptr, IdxList, Name); } Value *CreateInBoundsGEP(Type *Ty, Value *Ptr, ArrayRef IdxList, const Twine &Name = "") { if (Constant *PC = dyn_cast(Ptr)) { // Every index must be constant. 
size_t i, e; for (i = 0, e = IdxList.size(); i != e; ++i) if (!isa(IdxList[i])) break; if (i == e) return Insert(Folder.CreateInBoundsGetElementPtr(Ty, PC, IdxList), Name); } return Insert(GetElementPtrInst::CreateInBounds(Ty, Ptr, IdxList), Name); } Value *CreateGEP(Value *Ptr, Value *Idx, const Twine &Name = "") { return CreateGEP(nullptr, Ptr, Idx, Name); } Value *CreateGEP(Type *Ty, Value *Ptr, Value *Idx, const Twine &Name = "") { if (Constant *PC = dyn_cast(Ptr)) if (Constant *IC = dyn_cast(Idx)) return Insert(Folder.CreateGetElementPtr(Ty, PC, IC), Name); return Insert(GetElementPtrInst::Create(Ty, Ptr, Idx), Name); } Value *CreateInBoundsGEP(Type *Ty, Value *Ptr, Value *Idx, const Twine &Name = "") { if (Constant *PC = dyn_cast(Ptr)) if (Constant *IC = dyn_cast(Idx)) return Insert(Folder.CreateInBoundsGetElementPtr(Ty, PC, IC), Name); return Insert(GetElementPtrInst::CreateInBounds(Ty, Ptr, Idx), Name); } Value *CreateConstGEP1_32(Value *Ptr, unsigned Idx0, const Twine &Name = "") { return CreateConstGEP1_32(nullptr, Ptr, Idx0, Name); } Value *CreateConstGEP1_32(Type *Ty, Value *Ptr, unsigned Idx0, const Twine &Name = "") { Value *Idx = ConstantInt::get(Type::getInt32Ty(Context), Idx0); if (Constant *PC = dyn_cast(Ptr)) return Insert(Folder.CreateGetElementPtr(Ty, PC, Idx), Name); return Insert(GetElementPtrInst::Create(Ty, Ptr, Idx), Name); } Value *CreateConstInBoundsGEP1_32(Type *Ty, Value *Ptr, unsigned Idx0, const Twine &Name = "") { Value *Idx = ConstantInt::get(Type::getInt32Ty(Context), Idx0); if (Constant *PC = dyn_cast(Ptr)) return Insert(Folder.CreateInBoundsGetElementPtr(Ty, PC, Idx), Name); return Insert(GetElementPtrInst::CreateInBounds(Ty, Ptr, Idx), Name); } Value *CreateConstGEP2_32(Type *Ty, Value *Ptr, unsigned Idx0, unsigned Idx1, const Twine &Name = "") { Value *Idxs[] = { ConstantInt::get(Type::getInt32Ty(Context), Idx0), ConstantInt::get(Type::getInt32Ty(Context), Idx1) }; if (Constant *PC = dyn_cast(Ptr)) return Insert(Folder.CreateGetElementPtr(Ty, PC, Idxs), Name); return Insert(GetElementPtrInst::Create(Ty, Ptr, Idxs), Name); } Value *CreateConstInBoundsGEP2_32(Type *Ty, Value *Ptr, unsigned Idx0, unsigned Idx1, const Twine &Name = "") { Value *Idxs[] = { ConstantInt::get(Type::getInt32Ty(Context), Idx0), ConstantInt::get(Type::getInt32Ty(Context), Idx1) }; if (Constant *PC = dyn_cast(Ptr)) return Insert(Folder.CreateInBoundsGetElementPtr(Ty, PC, Idxs), Name); return Insert(GetElementPtrInst::CreateInBounds(Ty, Ptr, Idxs), Name); } Value *CreateConstGEP1_64(Value *Ptr, uint64_t Idx0, const Twine &Name = "") { Value *Idx = ConstantInt::get(Type::getInt64Ty(Context), Idx0); if (Constant *PC = dyn_cast(Ptr)) return Insert(Folder.CreateGetElementPtr(nullptr, PC, Idx), Name); return Insert(GetElementPtrInst::Create(nullptr, Ptr, Idx), Name); } Value *CreateConstInBoundsGEP1_64(Value *Ptr, uint64_t Idx0, const Twine &Name = "") { Value *Idx = ConstantInt::get(Type::getInt64Ty(Context), Idx0); if (Constant *PC = dyn_cast(Ptr)) return Insert(Folder.CreateInBoundsGetElementPtr(nullptr, PC, Idx), Name); return Insert(GetElementPtrInst::CreateInBounds(nullptr, Ptr, Idx), Name); } Value *CreateConstGEP2_64(Value *Ptr, uint64_t Idx0, uint64_t Idx1, const Twine &Name = "") { Value *Idxs[] = { ConstantInt::get(Type::getInt64Ty(Context), Idx0), ConstantInt::get(Type::getInt64Ty(Context), Idx1) }; if (Constant *PC = dyn_cast(Ptr)) return Insert(Folder.CreateGetElementPtr(nullptr, PC, Idxs), Name); return Insert(GetElementPtrInst::Create(nullptr, Ptr, Idxs), Name); } Value 
*CreateConstInBoundsGEP2_64(Value *Ptr, uint64_t Idx0, uint64_t Idx1, const Twine &Name = "") { Value *Idxs[] = { ConstantInt::get(Type::getInt64Ty(Context), Idx0), ConstantInt::get(Type::getInt64Ty(Context), Idx1) }; if (Constant *PC = dyn_cast(Ptr)) return Insert(Folder.CreateInBoundsGetElementPtr(nullptr, PC, Idxs), Name); return Insert(GetElementPtrInst::CreateInBounds(nullptr, Ptr, Idxs), Name); } Value *CreateStructGEP(Type *Ty, Value *Ptr, unsigned Idx, const Twine &Name = "") { return CreateConstInBoundsGEP2_32(Ty, Ptr, 0, Idx, Name); } /// \brief Same as CreateGlobalString, but return a pointer with "i8*" type /// instead of a pointer to array of i8. Value *CreateGlobalStringPtr(StringRef Str, const Twine &Name = "", unsigned AddressSpace = 0) { GlobalVariable *gv = CreateGlobalString(Str, Name, AddressSpace); Value *zero = ConstantInt::get(Type::getInt32Ty(Context), 0); Value *Args[] = { zero, zero }; return CreateInBoundsGEP(gv->getValueType(), gv, Args, Name); } //===--------------------------------------------------------------------===// // Instruction creation methods: Cast/Conversion Operators //===--------------------------------------------------------------------===// Value *CreateTrunc(Value *V, Type *DestTy, const Twine &Name = "") { return CreateCast(Instruction::Trunc, V, DestTy, Name); } Value *CreateZExt(Value *V, Type *DestTy, const Twine &Name = "") { return CreateCast(Instruction::ZExt, V, DestTy, Name); } Value *CreateSExt(Value *V, Type *DestTy, const Twine &Name = "") { return CreateCast(Instruction::SExt, V, DestTy, Name); } /// \brief Create a ZExt or Trunc from the integer value V to DestTy. Return /// the value untouched if the type of V is already DestTy. Value *CreateZExtOrTrunc(Value *V, Type *DestTy, const Twine &Name = "") { assert(V->getType()->isIntOrIntVectorTy() && DestTy->isIntOrIntVectorTy() && "Can only zero extend/truncate integers!"); Type *VTy = V->getType(); if (VTy->getScalarSizeInBits() < DestTy->getScalarSizeInBits()) return CreateZExt(V, DestTy, Name); if (VTy->getScalarSizeInBits() > DestTy->getScalarSizeInBits()) return CreateTrunc(V, DestTy, Name); return V; } /// \brief Create a SExt or Trunc from the integer value V to DestTy. Return /// the value untouched if the type of V is already DestTy. 
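  ///
  /// For example (a hedged sketch): if V is i64 this emits a trunc to i32,
  /// if V is i16 it emits a sext, and if V is already i32 it returns V:
  ///
  ///   Value *W = Builder.CreateSExtOrTrunc(V, Builder.getInt32Ty());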
  Value *CreateSExtOrTrunc(Value *V, Type *DestTy, const Twine &Name = "") {
    assert(V->getType()->isIntOrIntVectorTy() &&
           DestTy->isIntOrIntVectorTy() &&
           "Can only sign extend/truncate integers!");
    Type *VTy = V->getType();
    if (VTy->getScalarSizeInBits() < DestTy->getScalarSizeInBits())
      return CreateSExt(V, DestTy, Name);
    if (VTy->getScalarSizeInBits() > DestTy->getScalarSizeInBits())
      return CreateTrunc(V, DestTy, Name);
    return V;
  }
  Value *CreateFPToUI(Value *V, Type *DestTy, const Twine &Name = "") {
    return CreateCast(Instruction::FPToUI, V, DestTy, Name);
  }
  Value *CreateFPToSI(Value *V, Type *DestTy, const Twine &Name = "") {
    return CreateCast(Instruction::FPToSI, V, DestTy, Name);
  }
  Value *CreateUIToFP(Value *V, Type *DestTy, const Twine &Name = "") {
    return CreateCast(Instruction::UIToFP, V, DestTy, Name);
  }
  Value *CreateSIToFP(Value *V, Type *DestTy, const Twine &Name = "") {
    return CreateCast(Instruction::SIToFP, V, DestTy, Name);
  }
  Value *CreateFPTrunc(Value *V, Type *DestTy, const Twine &Name = "") {
    return CreateCast(Instruction::FPTrunc, V, DestTy, Name);
  }
  Value *CreateFPExt(Value *V, Type *DestTy, const Twine &Name = "") {
    return CreateCast(Instruction::FPExt, V, DestTy, Name);
  }
  Value *CreatePtrToInt(Value *V, Type *DestTy, const Twine &Name = "") {
    return CreateCast(Instruction::PtrToInt, V, DestTy, Name);
  }
  Value *CreateIntToPtr(Value *V, Type *DestTy, const Twine &Name = "") {
    return CreateCast(Instruction::IntToPtr, V, DestTy, Name);
  }
  Value *CreateBitCast(Value *V, Type *DestTy, const Twine &Name = "") {
    return CreateCast(Instruction::BitCast, V, DestTy, Name);
  }
  Value *CreateAddrSpaceCast(Value *V, Type *DestTy, const Twine &Name = "") {
    return CreateCast(Instruction::AddrSpaceCast, V, DestTy, Name);
  }
  Value *CreateZExtOrBitCast(Value *V, Type *DestTy, const Twine &Name = "") {
    if (V->getType() == DestTy)
      return V;
    if (Constant *VC = dyn_cast<Constant>(V))
      return Insert(Folder.CreateZExtOrBitCast(VC, DestTy), Name);
    return Insert(CastInst::CreateZExtOrBitCast(V, DestTy), Name);
  }
  Value *CreateSExtOrBitCast(Value *V, Type *DestTy, const Twine &Name = "") {
    if (V->getType() == DestTy)
      return V;
    if (Constant *VC = dyn_cast<Constant>(V))
      return Insert(Folder.CreateSExtOrBitCast(VC, DestTy), Name);
    return Insert(CastInst::CreateSExtOrBitCast(V, DestTy), Name);
  }
  Value *CreateTruncOrBitCast(Value *V, Type *DestTy, const Twine &Name = "") {
    if (V->getType() == DestTy)
      return V;
    if (Constant *VC = dyn_cast<Constant>(V))
      return Insert(Folder.CreateTruncOrBitCast(VC, DestTy), Name);
    return Insert(CastInst::CreateTruncOrBitCast(V, DestTy), Name);
  }
  Value *CreateCast(Instruction::CastOps Op, Value *V, Type *DestTy,
                    const Twine &Name = "") {
    if (V->getType() == DestTy)
      return V;
    if (Constant *VC = dyn_cast<Constant>(V))
      return Insert(Folder.CreateCast(Op, VC, DestTy), Name);
    return Insert(CastInst::Create(Op, V, DestTy), Name);
  }
  Value *CreatePointerCast(Value *V, Type *DestTy, const Twine &Name = "") {
    if (V->getType() == DestTy)
      return V;
    if (Constant *VC = dyn_cast<Constant>(V))
      return Insert(Folder.CreatePointerCast(VC, DestTy), Name);
    return Insert(CastInst::CreatePointerCast(V, DestTy), Name);
  }
  Value *CreatePointerBitCastOrAddrSpaceCast(Value *V, Type *DestTy,
                                             const Twine &Name = "") {
    if (V->getType() == DestTy)
      return V;
    if (Constant *VC = dyn_cast<Constant>(V)) {
      return Insert(Folder.CreatePointerBitCastOrAddrSpaceCast(VC, DestTy),
                    Name);
    }
    return Insert(CastInst::CreatePointerBitCastOrAddrSpaceCast(V, DestTy),
                  Name);
  }
  Value *CreateIntCast(Value *V, Type *DestTy, bool isSigned,
                       const Twine &Name = "") {
    if (V->getType() == DestTy)
      return V;
    if (Constant *VC = dyn_cast<Constant>(V))
      return Insert(Folder.CreateIntCast(VC, DestTy, isSigned), Name);
    return Insert(CastInst::CreateIntegerCast(V, DestTy, isSigned), Name);
  }
  Value *CreateBitOrPointerCast(Value *V, Type *DestTy,
                                const Twine &Name = "") {
    if (V->getType() == DestTy)
      return V;
    if (V->getType()->getScalarType()->isPointerTy() &&
        DestTy->getScalarType()->isIntegerTy())
      return CreatePtrToInt(V, DestTy, Name);
    if (V->getType()->getScalarType()->isIntegerTy() &&
        DestTy->getScalarType()->isPointerTy())
      return CreateIntToPtr(V, DestTy, Name);
    return CreateBitCast(V, DestTy, Name);
  }

private:
  // \brief Provided to resolve 'CreateIntCast(Ptr, Ptr, "...")', giving a
  // compile-time error instead of converting the string to bool for the
  // isSigned parameter.
  Value *CreateIntCast(Value *, Type *, const char *) = delete;

public:
  Value *CreateFPCast(Value *V, Type *DestTy, const Twine &Name = "") {
    if (V->getType() == DestTy)
      return V;
    if (Constant *VC = dyn_cast<Constant>(V))
      return Insert(Folder.CreateFPCast(VC, DestTy), Name);
    return Insert(CastInst::CreateFPCast(V, DestTy), Name);
  }

  //===--------------------------------------------------------------------===//
  // Instruction creation methods: Compare Instructions
  //===--------------------------------------------------------------------===//

  Value *CreateICmpEQ(Value *LHS, Value *RHS, const Twine &Name = "") {
    return CreateICmp(ICmpInst::ICMP_EQ, LHS, RHS, Name);
  }
  Value *CreateICmpNE(Value *LHS, Value *RHS, const Twine &Name = "") {
    return CreateICmp(ICmpInst::ICMP_NE, LHS, RHS, Name);
  }
  Value *CreateICmpUGT(Value *LHS, Value *RHS, const Twine &Name = "") {
    return CreateICmp(ICmpInst::ICMP_UGT, LHS, RHS, Name);
  }
  Value *CreateICmpUGE(Value *LHS, Value *RHS, const Twine &Name = "") {
    return CreateICmp(ICmpInst::ICMP_UGE, LHS, RHS, Name);
  }
  Value *CreateICmpULT(Value *LHS, Value *RHS, const Twine &Name = "") {
    return CreateICmp(ICmpInst::ICMP_ULT, LHS, RHS, Name);
  }
  Value *CreateICmpULE(Value *LHS, Value *RHS, const Twine &Name = "") {
    return CreateICmp(ICmpInst::ICMP_ULE, LHS, RHS, Name);
  }
  Value *CreateICmpSGT(Value *LHS, Value *RHS, const Twine &Name = "") {
    return CreateICmp(ICmpInst::ICMP_SGT, LHS, RHS, Name);
  }
  Value *CreateICmpSGE(Value *LHS, Value *RHS, const Twine &Name = "") {
    return CreateICmp(ICmpInst::ICMP_SGE, LHS, RHS, Name);
  }
  Value *CreateICmpSLT(Value *LHS, Value *RHS, const Twine &Name = "") {
    return CreateICmp(ICmpInst::ICMP_SLT, LHS, RHS, Name);
  }
  Value *CreateICmpSLE(Value *LHS, Value *RHS, const Twine &Name = "") {
    return CreateICmp(ICmpInst::ICMP_SLE, LHS, RHS, Name);
  }
  Value *CreateFCmpOEQ(Value *LHS, Value *RHS, const Twine &Name = "",
                       MDNode *FPMathTag = nullptr) {
    return CreateFCmp(FCmpInst::FCMP_OEQ, LHS, RHS, Name, FPMathTag);
  }
  Value *CreateFCmpOGT(Value *LHS, Value *RHS, const Twine &Name = "",
                       MDNode *FPMathTag = nullptr) {
    return CreateFCmp(FCmpInst::FCMP_OGT, LHS, RHS, Name, FPMathTag);
  }
  Value *CreateFCmpOGE(Value *LHS, Value *RHS, const Twine &Name = "",
                       MDNode *FPMathTag = nullptr) {
    return CreateFCmp(FCmpInst::FCMP_OGE, LHS, RHS, Name, FPMathTag);
  }
  Value *CreateFCmpOLT(Value *LHS, Value *RHS, const Twine &Name = "",
                       MDNode *FPMathTag = nullptr) {
    return CreateFCmp(FCmpInst::FCMP_OLT, LHS, RHS, Name, FPMathTag);
  }
  Value *CreateFCmpOLE(Value *LHS, Value *RHS, const Twine &Name = "",
                       MDNode *FPMathTag = nullptr) {
    return CreateFCmp(FCmpInst::FCMP_OLE, LHS, RHS, Name, FPMathTag);
  }
  Value *CreateFCmpONE(Value *LHS, Value *RHS, const Twine &Name = "",
                       MDNode *FPMathTag = nullptr) {
    return CreateFCmp(FCmpInst::FCMP_ONE, LHS, RHS, Name, FPMathTag);
  }
  Value *CreateFCmpORD(Value *LHS, Value *RHS, const Twine &Name = "",
                       MDNode *FPMathTag = nullptr) {
    return CreateFCmp(FCmpInst::FCMP_ORD, LHS, RHS, Name, FPMathTag);
  }
  Value *CreateFCmpUNO(Value *LHS, Value *RHS, const Twine &Name = "",
                       MDNode *FPMathTag = nullptr) {
    return CreateFCmp(FCmpInst::FCMP_UNO, LHS, RHS, Name, FPMathTag);
  }
  Value *CreateFCmpUEQ(Value *LHS, Value *RHS, const Twine &Name = "",
                       MDNode *FPMathTag = nullptr) {
    return CreateFCmp(FCmpInst::FCMP_UEQ, LHS, RHS, Name, FPMathTag);
  }
  Value *CreateFCmpUGT(Value *LHS, Value *RHS, const Twine &Name = "",
                       MDNode *FPMathTag = nullptr) {
    return CreateFCmp(FCmpInst::FCMP_UGT, LHS, RHS, Name, FPMathTag);
  }
  Value *CreateFCmpUGE(Value *LHS, Value *RHS, const Twine &Name = "",
                       MDNode *FPMathTag = nullptr) {
    return CreateFCmp(FCmpInst::FCMP_UGE, LHS, RHS, Name, FPMathTag);
  }
  Value *CreateFCmpULT(Value *LHS, Value *RHS, const Twine &Name = "",
                       MDNode *FPMathTag = nullptr) {
    return CreateFCmp(FCmpInst::FCMP_ULT, LHS, RHS, Name, FPMathTag);
  }
  Value *CreateFCmpULE(Value *LHS, Value *RHS, const Twine &Name = "",
                       MDNode *FPMathTag = nullptr) {
    return CreateFCmp(FCmpInst::FCMP_ULE, LHS, RHS, Name, FPMathTag);
  }
  Value *CreateFCmpUNE(Value *LHS, Value *RHS, const Twine &Name = "",
                       MDNode *FPMathTag = nullptr) {
    return CreateFCmp(FCmpInst::FCMP_UNE, LHS, RHS, Name, FPMathTag);
  }
  Value *CreateICmp(CmpInst::Predicate P, Value *LHS, Value *RHS,
                    const Twine &Name = "") {
    if (Constant *LC = dyn_cast<Constant>(LHS))
      if (Constant *RC = dyn_cast<Constant>(RHS))
        return Insert(Folder.CreateICmp(P, LC, RC), Name);
    return Insert(new ICmpInst(P, LHS, RHS), Name);
  }
  Value *CreateFCmp(CmpInst::Predicate P, Value *LHS, Value *RHS,
                    const Twine &Name = "", MDNode *FPMathTag = nullptr) {
    if (Constant *LC = dyn_cast<Constant>(LHS))
      if (Constant *RC = dyn_cast<Constant>(RHS))
        return Insert(Folder.CreateFCmp(P, LC, RC), Name);
    return Insert(AddFPMathAttributes(new FCmpInst(P, LHS, RHS),
                                      FPMathTag, FMF), Name);
  }

  //===--------------------------------------------------------------------===//
  // Instruction creation methods: Other Instructions
  //===--------------------------------------------------------------------===//

  PHINode *CreatePHI(Type *Ty, unsigned NumReservedValues,
                     const Twine &Name = "") {
    return Insert(PHINode::Create(Ty, NumReservedValues), Name);
  }

   CallInst *CreateCall(Value *Callee, ArrayRef<Value *> Args = None,
-                       ArrayRef<OperandBundleDef> OpBundles = None,
                        const Twine &Name = "", MDNode *FPMathTag = nullptr) {
-    CallInst *CI = CallInst::Create(Callee, Args, OpBundles);
-    if (isa<FPMathOperator>(CI))
-      CI = cast<CallInst>(AddFPMathAttributes(CI, FPMathTag, FMF));
-    return Insert(CI, Name);
-  }
-
-  CallInst *CreateCall(Value *Callee, ArrayRef<Value *> Args,
-                       const Twine &Name, MDNode *FPMathTag = nullptr) {
     PointerType *PTy = cast<PointerType>(Callee->getType());
     FunctionType *FTy = cast<FunctionType>(PTy->getElementType());
     return CreateCall(FTy, Callee, Args, Name, FPMathTag);
   }

   CallInst *CreateCall(llvm::FunctionType *FTy, Value *Callee,
                        ArrayRef<Value *> Args, const Twine &Name = "",
                        MDNode *FPMathTag = nullptr) {
     CallInst *CI = CallInst::Create(FTy, Callee, Args, DefaultOperandBundles);
+    if (isa<FPMathOperator>(CI))
+      CI = cast<CallInst>(AddFPMathAttributes(CI, FPMathTag, FMF));
+    return Insert(CI, Name);
+  }
+
+  CallInst *CreateCall(Value *Callee, ArrayRef<Value *> Args,
+                       ArrayRef<OperandBundleDef> OpBundles,
+                       const Twine &Name = "", MDNode *FPMathTag = nullptr) {
+    CallInst *CI = CallInst::Create(Callee, Args, OpBundles);
     if (isa<FPMathOperator>(CI))
       CI = cast<CallInst>(AddFPMathAttributes(CI, FPMathTag, FMF));
     return Insert(CI, Name);
   }

   CallInst *CreateCall(Function *Callee, ArrayRef<Value *> Args,
const Twine &Name = "", MDNode *FPMathTag = nullptr) { return CreateCall(Callee->getFunctionType(), Callee, Args, Name, FPMathTag); } Value *CreateSelect(Value *C, Value *True, Value *False, const Twine &Name = "") { if (Constant *CC = dyn_cast(C)) if (Constant *TC = dyn_cast(True)) if (Constant *FC = dyn_cast(False)) return Insert(Folder.CreateSelect(CC, TC, FC), Name); return Insert(SelectInst::Create(C, True, False), Name); } VAArgInst *CreateVAArg(Value *List, Type *Ty, const Twine &Name = "") { return Insert(new VAArgInst(List, Ty), Name); } Value *CreateExtractElement(Value *Vec, Value *Idx, const Twine &Name = "") { if (Constant *VC = dyn_cast(Vec)) if (Constant *IC = dyn_cast(Idx)) return Insert(Folder.CreateExtractElement(VC, IC), Name); return Insert(ExtractElementInst::Create(Vec, Idx), Name); } Value *CreateExtractElement(Value *Vec, uint64_t Idx, const Twine &Name = "") { return CreateExtractElement(Vec, getInt64(Idx), Name); } Value *CreateInsertElement(Value *Vec, Value *NewElt, Value *Idx, const Twine &Name = "") { if (Constant *VC = dyn_cast(Vec)) if (Constant *NC = dyn_cast(NewElt)) if (Constant *IC = dyn_cast(Idx)) return Insert(Folder.CreateInsertElement(VC, NC, IC), Name); return Insert(InsertElementInst::Create(Vec, NewElt, Idx), Name); } Value *CreateInsertElement(Value *Vec, Value *NewElt, uint64_t Idx, const Twine &Name = "") { return CreateInsertElement(Vec, NewElt, getInt64(Idx), Name); } Value *CreateShuffleVector(Value *V1, Value *V2, Value *Mask, const Twine &Name = "") { if (Constant *V1C = dyn_cast(V1)) if (Constant *V2C = dyn_cast(V2)) if (Constant *MC = dyn_cast(Mask)) return Insert(Folder.CreateShuffleVector(V1C, V2C, MC), Name); return Insert(new ShuffleVectorInst(V1, V2, Mask), Name); } Value *CreateShuffleVector(Value *V1, Value *V2, ArrayRef IntMask, const Twine &Name = "") { size_t MaskSize = IntMask.size(); SmallVector MaskVec(MaskSize); for (size_t i = 0; i != MaskSize; ++i) MaskVec[i] = getInt32(IntMask[i]); Value *Mask = ConstantVector::get(MaskVec); return CreateShuffleVector(V1, V2, Mask, Name); } Value *CreateExtractValue(Value *Agg, ArrayRef Idxs, const Twine &Name = "") { if (Constant *AggC = dyn_cast(Agg)) return Insert(Folder.CreateExtractValue(AggC, Idxs), Name); return Insert(ExtractValueInst::Create(Agg, Idxs), Name); } Value *CreateInsertValue(Value *Agg, Value *Val, ArrayRef Idxs, const Twine &Name = "") { if (Constant *AggC = dyn_cast(Agg)) if (Constant *ValC = dyn_cast(Val)) return Insert(Folder.CreateInsertValue(AggC, ValC, Idxs), Name); return Insert(InsertValueInst::Create(Agg, Val, Idxs), Name); } LandingPadInst *CreateLandingPad(Type *Ty, unsigned NumClauses, const Twine &Name = "") { return Insert(LandingPadInst::Create(Ty, NumClauses), Name); } //===--------------------------------------------------------------------===// // Utility creation methods //===--------------------------------------------------------------------===// /// \brief Return an i1 value testing if \p Arg is null. Value *CreateIsNull(Value *Arg, const Twine &Name = "") { return CreateICmpEQ(Arg, Constant::getNullValue(Arg->getType()), Name); } /// \brief Return an i1 value testing if \p Arg is not null. Value *CreateIsNotNull(Value *Arg, const Twine &Name = "") { return CreateICmpNE(Arg, Constant::getNullValue(Arg->getType()), Name); } /// \brief Return the i64 difference between two pointer values, dividing out /// the size of the pointed-to objects. /// /// This is intended to implement C-style pointer subtraction. 
As such, the /// pointers must be appropriately aligned for their element types and /// pointing into the same object. Value *CreatePtrDiff(Value *LHS, Value *RHS, const Twine &Name = "") { assert(LHS->getType() == RHS->getType() && "Pointer subtraction operand types must match!"); PointerType *ArgType = cast(LHS->getType()); Value *LHS_int = CreatePtrToInt(LHS, Type::getInt64Ty(Context)); Value *RHS_int = CreatePtrToInt(RHS, Type::getInt64Ty(Context)); Value *Difference = CreateSub(LHS_int, RHS_int); return CreateExactSDiv(Difference, ConstantExpr::getSizeOf(ArgType->getElementType()), Name); } /// \brief Create an invariant.group.barrier intrinsic call, that stops /// optimizer to propagate equality using invariant.group metadata. /// If Ptr type is different from i8*, it's casted to i8* before call /// and casted back to Ptr type after call. Value *CreateInvariantGroupBarrier(Value *Ptr) { Module *M = BB->getParent()->getParent(); Function *FnInvariantGroupBarrier = Intrinsic::getDeclaration(M, Intrinsic::invariant_group_barrier); Type *ArgumentAndReturnType = FnInvariantGroupBarrier->getReturnType(); assert(ArgumentAndReturnType == FnInvariantGroupBarrier->getFunctionType()->getParamType(0) && "InvariantGroupBarrier should take and return the same type"); Type *PtrType = Ptr->getType(); bool PtrTypeConversionNeeded = PtrType != ArgumentAndReturnType; if (PtrTypeConversionNeeded) Ptr = CreateBitCast(Ptr, ArgumentAndReturnType); CallInst *Fn = CreateCall(FnInvariantGroupBarrier, {Ptr}); if (PtrTypeConversionNeeded) return CreateBitCast(Fn, PtrType); return Fn; } /// \brief Return a vector value that contains \arg V broadcasted to \p /// NumElts elements. Value *CreateVectorSplat(unsigned NumElts, Value *V, const Twine &Name = "") { assert(NumElts > 0 && "Cannot splat to an empty vector!"); // First insert it into an undef vector so we can shuffle it. Type *I32Ty = getInt32Ty(); Value *Undef = UndefValue::get(VectorType::get(V->getType(), NumElts)); V = CreateInsertElement(Undef, V, ConstantInt::get(I32Ty, 0), Name + ".splatinsert"); // Shuffle the value across the desired number of elements. Value *Zeros = ConstantAggregateZero::get(VectorType::get(I32Ty, NumElts)); return CreateShuffleVector(V, Undef, Zeros, Name + ".splat"); } /// \brief Return a value that has been extracted from a larger integer type. Value *CreateExtractInteger(const DataLayout &DL, Value *From, IntegerType *ExtractedTy, uint64_t Offset, const Twine &Name) { IntegerType *IntTy = cast(From->getType()); assert(DL.getTypeStoreSize(ExtractedTy) + Offset <= DL.getTypeStoreSize(IntTy) && "Element extends past full value"); uint64_t ShAmt = 8 * Offset; Value *V = From; if (DL.isBigEndian()) ShAmt = 8 * (DL.getTypeStoreSize(IntTy) - DL.getTypeStoreSize(ExtractedTy) - Offset); if (ShAmt) { V = CreateLShr(V, ShAmt, Name + ".shift"); } assert(ExtractedTy->getBitWidth() <= IntTy->getBitWidth() && "Cannot extract to a larger integer!"); if (ExtractedTy != IntTy) { V = CreateTrunc(V, ExtractedTy, Name + ".trunc"); } return V; } /// \brief Create an assume intrinsic call that represents an alignment /// assumption on the provided pointer. /// /// An optional offset can be provided, and if it is provided, the offset /// must be subtracted from the provided pointer to get the pointer with the /// specified alignment. 
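  // Illustrative sketch (not part of this header): how the utility helpers
  // above compose.  Assumes an IRBuilder<> `B` positioned in a block, a
  // DataLayout `DL`, i32* values `P` and `Q`, and an i64 value `Wide64`;
  // all of these names are hypothetical.
  //
  //   Value *Diff  = B.CreatePtrDiff(P, Q);   // i64: (P - Q) in i32 units
  //   Value *Splat = B.CreateVectorSplat(4, B.getInt32(7)); // <4 x i32> of 7s
  //   Value *Low8  = B.CreateExtractInteger(DL, Wide64, B.getInt8Ty(), 0, "b");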
  /// \brief Create an assume intrinsic call that represents an alignment
  /// assumption on the provided pointer.
  ///
  /// An optional offset can be provided, and if it is provided, the offset
  /// must be subtracted from the provided pointer to get the pointer with the
  /// specified alignment.
  CallInst *CreateAlignmentAssumption(const DataLayout &DL, Value *PtrValue,
                                      unsigned Alignment,
                                      Value *OffsetValue = nullptr) {
    assert(isa<PointerType>(PtrValue->getType()) &&
           "trying to create an alignment assumption on a non-pointer?");
    PointerType *PtrTy = cast<PointerType>(PtrValue->getType());
    Type *IntPtrTy = getIntPtrTy(DL, PtrTy->getAddressSpace());
    Value *PtrIntValue = CreatePtrToInt(PtrValue, IntPtrTy, "ptrint");

    Value *Mask = ConstantInt::get(IntPtrTy,
                                   Alignment > 0 ? Alignment - 1 : 0);
    if (OffsetValue) {
      bool IsOffsetZero = false;
      if (ConstantInt *CI = dyn_cast<ConstantInt>(OffsetValue))
        IsOffsetZero = CI->isZero();

      if (!IsOffsetZero) {
        if (OffsetValue->getType() != IntPtrTy)
          OffsetValue = CreateIntCast(OffsetValue, IntPtrTy, /*isSigned*/ true,
                                      "offsetcast");
        PtrIntValue = CreateSub(PtrIntValue, OffsetValue, "offsetptr");
      }
    }

    Value *Zero = ConstantInt::get(IntPtrTy, 0);
    Value *MaskedPtr = CreateAnd(PtrIntValue, Mask, "maskedptr");
    Value *InvCond = CreateICmpEQ(MaskedPtr, Zero, "maskcond");

    return CreateAssumption(InvCond);
  }
};

// Create wrappers for C Binding types (see CBindingWrapping.h).
DEFINE_SIMPLE_CONVERSION_FUNCTIONS(IRBuilder<>, LLVMBuilderRef)

} // end namespace llvm

#endif // LLVM_IR_IRBUILDER_H
Index: vendor/llvm/dist/include/llvm/IR/Instructions.h
===================================================================
--- vendor/llvm/dist/include/llvm/IR/Instructions.h (revision 295845)
+++ vendor/llvm/dist/include/llvm/IR/Instructions.h (revision 295846)
@@ -1,4825 +1,4833 @@
//===-- llvm/Instructions.h - Instruction subclass definitions --*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file exposes the class definitions of all of the subclasses of the
// Instruction class.  This is meant to be an easy way to get access to all
// instruction subclasses.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_IR_INSTRUCTIONS_H
#define LLVM_IR_INSTRUCTIONS_H

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/iterator_range.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/CallingConv.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/Support/ErrorHandling.h"
#include <iterator>

namespace llvm {

class APInt;
class ConstantInt;
class ConstantRange;
class DataLayout;
class LLVMContext;

enum AtomicOrdering {
  NotAtomic = 0,
  Unordered = 1,
  Monotonic = 2,
  // Consume = 3,  // Not specified yet.
  Acquire = 4,
  Release = 5,
  AcquireRelease = 6,
  SequentiallyConsistent = 7
};

enum SynchronizationScope {
  SingleThread = 0,
  CrossThread = 1
};

/// Returns true if the ordering is at least as strong as acquire
/// (i.e. acquire, acq_rel or seq_cst)
inline bool isAtLeastAcquire(AtomicOrdering Ord) {
  return (Ord == Acquire ||
          Ord == AcquireRelease ||
          Ord == SequentiallyConsistent);
}

/// Returns true if the ordering is at least as strong as release
/// (i.e.
release, acq_rel or seq_cst) inline bool isAtLeastRelease(AtomicOrdering Ord) { return (Ord == Release || Ord == AcquireRelease || Ord == SequentiallyConsistent); } //===----------------------------------------------------------------------===// // AllocaInst Class //===----------------------------------------------------------------------===// /// AllocaInst - an instruction to allocate memory on the stack /// class AllocaInst : public UnaryInstruction { Type *AllocatedType; protected: // Note: Instruction needs to be a friend here to call cloneImpl. friend class Instruction; AllocaInst *cloneImpl() const; public: explicit AllocaInst(Type *Ty, Value *ArraySize = nullptr, const Twine &Name = "", Instruction *InsertBefore = nullptr); AllocaInst(Type *Ty, Value *ArraySize, const Twine &Name, BasicBlock *InsertAtEnd); AllocaInst(Type *Ty, const Twine &Name, Instruction *InsertBefore = nullptr); AllocaInst(Type *Ty, const Twine &Name, BasicBlock *InsertAtEnd); AllocaInst(Type *Ty, Value *ArraySize, unsigned Align, const Twine &Name = "", Instruction *InsertBefore = nullptr); AllocaInst(Type *Ty, Value *ArraySize, unsigned Align, const Twine &Name, BasicBlock *InsertAtEnd); // Out of line virtual method, so the vtable, etc. has a home. ~AllocaInst() override; /// isArrayAllocation - Return true if there is an allocation size parameter /// to the allocation instruction that is not 1. /// bool isArrayAllocation() const; /// getArraySize - Get the number of elements allocated. For a simple /// allocation of a single element, this will return a constant 1 value. /// const Value *getArraySize() const { return getOperand(0); } Value *getArraySize() { return getOperand(0); } /// getType - Overload to return most specific pointer type /// PointerType *getType() const { return cast(Instruction::getType()); } /// getAllocatedType - Return the type that is being allocated by the /// instruction. /// Type *getAllocatedType() const { return AllocatedType; } /// \brief for use only in special circumstances that need to generically /// transform a whole instruction (eg: IR linking and vectorization). void setAllocatedType(Type *Ty) { AllocatedType = Ty; } /// getAlignment - Return the alignment of the memory that is being allocated /// by the instruction. /// unsigned getAlignment() const { return (1u << (getSubclassDataFromInstruction() & 31)) >> 1; } void setAlignment(unsigned Align); /// isStaticAlloca - Return true if this alloca is in the entry block of the /// function and is a constant size. If so, the code generator will fold it /// into the prolog/epilog code, so it is basically free. bool isStaticAlloca() const; /// \brief Return true if this alloca is used as an inalloca argument to a /// call. Such allocas are never considered static even if they are in the /// entry block. bool isUsedWithInAlloca() const { return getSubclassDataFromInstruction() & 32; } /// \brief Specify whether this alloca is used to represent the arguments to /// a call. void setUsedWithInAlloca(bool V) { setInstructionSubclassData((getSubclassDataFromInstruction() & ~32) | (V ? 32 : 0)); } // Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return (I->getOpcode() == Instruction::Alloca); } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } private: // Shadow Instruction::setInstructionSubclassData with a private forwarding // method so that subclasses cannot accidentally use it. 
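  // Worked example of the alignment encoding used by getAlignment() above:
  // the low 5 bits of the subclass data store Log2(Alignment) + 1, with 0
  // meaning "no alignment specified".  Hence:
  //
  //   bits == 0  ->  (1u << 0) >> 1 == 0    (no alignment)
  //   bits == 1  ->  (1u << 1) >> 1 == 1    (align 1)
  //   bits == 5  ->  (1u << 5) >> 1 == 16   (align 16)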
void setInstructionSubclassData(unsigned short D) { Instruction::setInstructionSubclassData(D); } }; //===----------------------------------------------------------------------===// // LoadInst Class //===----------------------------------------------------------------------===// /// LoadInst - an instruction for reading from memory. This uses the /// SubclassData field in Value to store whether or not the load is volatile. /// class LoadInst : public UnaryInstruction { void AssertOK(); protected: // Note: Instruction needs to be a friend here to call cloneImpl. friend class Instruction; LoadInst *cloneImpl() const; public: LoadInst(Value *Ptr, const Twine &NameStr, Instruction *InsertBefore); LoadInst(Value *Ptr, const Twine &NameStr, BasicBlock *InsertAtEnd); LoadInst(Type *Ty, Value *Ptr, const Twine &NameStr, bool isVolatile = false, Instruction *InsertBefore = nullptr); LoadInst(Value *Ptr, const Twine &NameStr, bool isVolatile = false, Instruction *InsertBefore = nullptr) : LoadInst(cast(Ptr->getType())->getElementType(), Ptr, NameStr, isVolatile, InsertBefore) {} LoadInst(Value *Ptr, const Twine &NameStr, bool isVolatile, BasicBlock *InsertAtEnd); LoadInst(Value *Ptr, const Twine &NameStr, bool isVolatile, unsigned Align, Instruction *InsertBefore = nullptr) : LoadInst(cast(Ptr->getType())->getElementType(), Ptr, NameStr, isVolatile, Align, InsertBefore) {} LoadInst(Type *Ty, Value *Ptr, const Twine &NameStr, bool isVolatile, unsigned Align, Instruction *InsertBefore = nullptr); LoadInst(Value *Ptr, const Twine &NameStr, bool isVolatile, unsigned Align, BasicBlock *InsertAtEnd); LoadInst(Value *Ptr, const Twine &NameStr, bool isVolatile, unsigned Align, AtomicOrdering Order, SynchronizationScope SynchScope = CrossThread, Instruction *InsertBefore = nullptr) : LoadInst(cast(Ptr->getType())->getElementType(), Ptr, NameStr, isVolatile, Align, Order, SynchScope, InsertBefore) {} LoadInst(Type *Ty, Value *Ptr, const Twine &NameStr, bool isVolatile, unsigned Align, AtomicOrdering Order, SynchronizationScope SynchScope = CrossThread, Instruction *InsertBefore = nullptr); LoadInst(Value *Ptr, const Twine &NameStr, bool isVolatile, unsigned Align, AtomicOrdering Order, SynchronizationScope SynchScope, BasicBlock *InsertAtEnd); LoadInst(Value *Ptr, const char *NameStr, Instruction *InsertBefore); LoadInst(Value *Ptr, const char *NameStr, BasicBlock *InsertAtEnd); LoadInst(Type *Ty, Value *Ptr, const char *NameStr = nullptr, bool isVolatile = false, Instruction *InsertBefore = nullptr); explicit LoadInst(Value *Ptr, const char *NameStr = nullptr, bool isVolatile = false, Instruction *InsertBefore = nullptr) : LoadInst(cast(Ptr->getType())->getElementType(), Ptr, NameStr, isVolatile, InsertBefore) {} LoadInst(Value *Ptr, const char *NameStr, bool isVolatile, BasicBlock *InsertAtEnd); /// isVolatile - Return true if this is a load from a volatile memory /// location. /// bool isVolatile() const { return getSubclassDataFromInstruction() & 1; } /// setVolatile - Specify whether this is a volatile load or not. /// void setVolatile(bool V) { setInstructionSubclassData((getSubclassDataFromInstruction() & ~1) | (V ? 1 : 0)); } /// getAlignment - Return the alignment of the access that is being performed /// unsigned getAlignment() const { return (1 << ((getSubclassDataFromInstruction() >> 1) & 31)) >> 1; } void setAlignment(unsigned Align); /// Returns the ordering effect of this fence. 
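  // Layout of the load's subclass data, as implied by the shift/mask
  // arithmetic in the accessors above and below (a restatement, not a
  // separate API):
  //
  //   bit 0     : volatile flag
  //   bits 1-5  : Log2(Alignment) + 1
  //   bit 6     : SynchronizationScope
  //   bits 7-9  : AtomicOrdering
  //
  // e.g. an acquire (4), cross-thread (1), align-4, non-volatile load packs
  // as (4 << 7) | (1 << 6) | (3 << 1) | 0 == 0x246.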
AtomicOrdering getOrdering() const { return AtomicOrdering((getSubclassDataFromInstruction() >> 7) & 7); } /// Set the ordering constraint on this load. May not be Release or /// AcquireRelease. void setOrdering(AtomicOrdering Ordering) { setInstructionSubclassData((getSubclassDataFromInstruction() & ~(7 << 7)) | (Ordering << 7)); } SynchronizationScope getSynchScope() const { return SynchronizationScope((getSubclassDataFromInstruction() >> 6) & 1); } /// Specify whether this load is ordered with respect to all /// concurrently executing threads, or only with respect to signal handlers /// executing in the same thread. void setSynchScope(SynchronizationScope xthread) { setInstructionSubclassData((getSubclassDataFromInstruction() & ~(1 << 6)) | (xthread << 6)); } void setAtomic(AtomicOrdering Ordering, SynchronizationScope SynchScope = CrossThread) { setOrdering(Ordering); setSynchScope(SynchScope); } bool isSimple() const { return !isAtomic() && !isVolatile(); } bool isUnordered() const { return getOrdering() <= Unordered && !isVolatile(); } Value *getPointerOperand() { return getOperand(0); } const Value *getPointerOperand() const { return getOperand(0); } static unsigned getPointerOperandIndex() { return 0U; } /// \brief Returns the address space of the pointer operand. unsigned getPointerAddressSpace() const { return getPointerOperand()->getType()->getPointerAddressSpace(); } // Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return I->getOpcode() == Instruction::Load; } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } private: // Shadow Instruction::setInstructionSubclassData with a private forwarding // method so that subclasses cannot accidentally use it. void setInstructionSubclassData(unsigned short D) { Instruction::setInstructionSubclassData(D); } }; //===----------------------------------------------------------------------===// // StoreInst Class //===----------------------------------------------------------------------===// /// StoreInst - an instruction for storing to memory /// class StoreInst : public Instruction { void *operator new(size_t, unsigned) = delete; void AssertOK(); protected: // Note: Instruction needs to be a friend here to call cloneImpl. friend class Instruction; StoreInst *cloneImpl() const; public: // allocate space for exactly two operands void *operator new(size_t s) { return User::operator new(s, 2); } StoreInst(Value *Val, Value *Ptr, Instruction *InsertBefore); StoreInst(Value *Val, Value *Ptr, BasicBlock *InsertAtEnd); StoreInst(Value *Val, Value *Ptr, bool isVolatile = false, Instruction *InsertBefore = nullptr); StoreInst(Value *Val, Value *Ptr, bool isVolatile, BasicBlock *InsertAtEnd); StoreInst(Value *Val, Value *Ptr, bool isVolatile, unsigned Align, Instruction *InsertBefore = nullptr); StoreInst(Value *Val, Value *Ptr, bool isVolatile, unsigned Align, BasicBlock *InsertAtEnd); StoreInst(Value *Val, Value *Ptr, bool isVolatile, unsigned Align, AtomicOrdering Order, SynchronizationScope SynchScope = CrossThread, Instruction *InsertBefore = nullptr); StoreInst(Value *Val, Value *Ptr, bool isVolatile, unsigned Align, AtomicOrdering Order, SynchronizationScope SynchScope, BasicBlock *InsertAtEnd); /// isVolatile - Return true if this is a store to a volatile memory /// location. /// bool isVolatile() const { return getSubclassDataFromInstruction() & 1; } /// setVolatile - Specify whether this is a volatile store or not. 
/// void setVolatile(bool V) { setInstructionSubclassData((getSubclassDataFromInstruction() & ~1) | (V ? 1 : 0)); } /// Transparently provide more efficient getOperand methods. DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value); /// getAlignment - Return the alignment of the access that is being performed /// unsigned getAlignment() const { return (1 << ((getSubclassDataFromInstruction() >> 1) & 31)) >> 1; } void setAlignment(unsigned Align); /// Returns the ordering effect of this store. AtomicOrdering getOrdering() const { return AtomicOrdering((getSubclassDataFromInstruction() >> 7) & 7); } /// Set the ordering constraint on this store. May not be Acquire or /// AcquireRelease. void setOrdering(AtomicOrdering Ordering) { setInstructionSubclassData((getSubclassDataFromInstruction() & ~(7 << 7)) | (Ordering << 7)); } SynchronizationScope getSynchScope() const { return SynchronizationScope((getSubclassDataFromInstruction() >> 6) & 1); } /// Specify whether this store instruction is ordered with respect to all /// concurrently executing threads, or only with respect to signal handlers /// executing in the same thread. void setSynchScope(SynchronizationScope xthread) { setInstructionSubclassData((getSubclassDataFromInstruction() & ~(1 << 6)) | (xthread << 6)); } void setAtomic(AtomicOrdering Ordering, SynchronizationScope SynchScope = CrossThread) { setOrdering(Ordering); setSynchScope(SynchScope); } bool isSimple() const { return !isAtomic() && !isVolatile(); } bool isUnordered() const { return getOrdering() <= Unordered && !isVolatile(); } Value *getValueOperand() { return getOperand(0); } const Value *getValueOperand() const { return getOperand(0); } Value *getPointerOperand() { return getOperand(1); } const Value *getPointerOperand() const { return getOperand(1); } static unsigned getPointerOperandIndex() { return 1U; } /// \brief Returns the address space of the pointer operand. unsigned getPointerAddressSpace() const { return getPointerOperand()->getType()->getPointerAddressSpace(); } // Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return I->getOpcode() == Instruction::Store; } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } private: // Shadow Instruction::setInstructionSubclassData with a private forwarding // method so that subclasses cannot accidentally use it. void setInstructionSubclassData(unsigned short D) { Instruction::setInstructionSubclassData(D); } }; template <> struct OperandTraits : public FixedNumOperandTraits { }; DEFINE_TRANSPARENT_OPERAND_ACCESSORS(StoreInst, Value) //===----------------------------------------------------------------------===// // FenceInst Class //===----------------------------------------------------------------------===// /// FenceInst - an instruction for ordering other memory operations /// class FenceInst : public Instruction { void *operator new(size_t, unsigned) = delete; void Init(AtomicOrdering Ordering, SynchronizationScope SynchScope); protected: // Note: Instruction needs to be a friend here to call cloneImpl. friend class Instruction; FenceInst *cloneImpl() const; public: // allocate space for exactly zero operands void *operator new(size_t s) { return User::operator new(s, 0); } // Ordering may only be Acquire, Release, AcquireRelease, or // SequentiallyConsistent. 
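  // Illustrative sketch (assumes an LLVMContext `Ctx` and an insertion point
  // `IP`; not part of this header): creating an acquire-release fence, one
  // of the four orderings the constructors below accept.
  //
  //   FenceInst *F = new FenceInst(Ctx, AcquireRelease, CrossThread, IP);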
FenceInst(LLVMContext &C, AtomicOrdering Ordering, SynchronizationScope SynchScope = CrossThread, Instruction *InsertBefore = nullptr); FenceInst(LLVMContext &C, AtomicOrdering Ordering, SynchronizationScope SynchScope, BasicBlock *InsertAtEnd); /// Returns the ordering effect of this fence. AtomicOrdering getOrdering() const { return AtomicOrdering(getSubclassDataFromInstruction() >> 1); } /// Set the ordering constraint on this fence. May only be Acquire, Release, /// AcquireRelease, or SequentiallyConsistent. void setOrdering(AtomicOrdering Ordering) { setInstructionSubclassData((getSubclassDataFromInstruction() & 1) | (Ordering << 1)); } SynchronizationScope getSynchScope() const { return SynchronizationScope(getSubclassDataFromInstruction() & 1); } /// Specify whether this fence orders other operations with respect to all /// concurrently executing threads, or only with respect to signal handlers /// executing in the same thread. void setSynchScope(SynchronizationScope xthread) { setInstructionSubclassData((getSubclassDataFromInstruction() & ~1) | xthread); } // Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return I->getOpcode() == Instruction::Fence; } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } private: // Shadow Instruction::setInstructionSubclassData with a private forwarding // method so that subclasses cannot accidentally use it. void setInstructionSubclassData(unsigned short D) { Instruction::setInstructionSubclassData(D); } }; //===----------------------------------------------------------------------===// // AtomicCmpXchgInst Class //===----------------------------------------------------------------------===// /// AtomicCmpXchgInst - an instruction that atomically checks whether a /// specified value is in a memory location, and, if it is, stores a new value /// there. Returns the value that was loaded. /// class AtomicCmpXchgInst : public Instruction { void *operator new(size_t, unsigned) = delete; void Init(Value *Ptr, Value *Cmp, Value *NewVal, AtomicOrdering SuccessOrdering, AtomicOrdering FailureOrdering, SynchronizationScope SynchScope); protected: // Note: Instruction needs to be a friend here to call cloneImpl. friend class Instruction; AtomicCmpXchgInst *cloneImpl() const; public: // allocate space for exactly three operands void *operator new(size_t s) { return User::operator new(s, 3); } AtomicCmpXchgInst(Value *Ptr, Value *Cmp, Value *NewVal, AtomicOrdering SuccessOrdering, AtomicOrdering FailureOrdering, SynchronizationScope SynchScope, Instruction *InsertBefore = nullptr); AtomicCmpXchgInst(Value *Ptr, Value *Cmp, Value *NewVal, AtomicOrdering SuccessOrdering, AtomicOrdering FailureOrdering, SynchronizationScope SynchScope, BasicBlock *InsertAtEnd); /// isVolatile - Return true if this is a cmpxchg from a volatile memory /// location. /// bool isVolatile() const { return getSubclassDataFromInstruction() & 1; } /// setVolatile - Specify whether this is a volatile cmpxchg. /// void setVolatile(bool V) { setInstructionSubclassData((getSubclassDataFromInstruction() & ~1) | (unsigned)V); } /// Return true if this cmpxchg may spuriously fail. bool isWeak() const { return getSubclassDataFromInstruction() & 0x100; } void setWeak(bool IsWeak) { setInstructionSubclassData((getSubclassDataFromInstruction() & ~0x100) | (IsWeak << 8)); } /// Transparently provide more efficient getOperand methods. 
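  // Illustrative sketch (assumes values `Ptr`, `Expected`, `Desired` and an
  // insertion point `IP`; not part of this header): a strong seq_cst
  // cmpxchg.  The instruction's result is a { T, i1 } pair of the loaded
  // value and a success flag.
  //
  //   auto *CX = new AtomicCmpXchgInst(Ptr, Expected, Desired,
  //                                    SequentiallyConsistent,  // success
  //                                    SequentiallyConsistent,  // failure
  //                                    CrossThread, IP);
  //   CX->setWeak(false);  // must not fail spuriously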
DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value); /// Set the ordering constraint on this cmpxchg. void setSuccessOrdering(AtomicOrdering Ordering) { assert(Ordering != NotAtomic && "CmpXchg instructions can only be atomic."); setInstructionSubclassData((getSubclassDataFromInstruction() & ~0x1c) | (Ordering << 2)); } void setFailureOrdering(AtomicOrdering Ordering) { assert(Ordering != NotAtomic && "CmpXchg instructions can only be atomic."); setInstructionSubclassData((getSubclassDataFromInstruction() & ~0xe0) | (Ordering << 5)); } /// Specify whether this cmpxchg is atomic and orders other operations with /// respect to all concurrently executing threads, or only with respect to /// signal handlers executing in the same thread. void setSynchScope(SynchronizationScope SynchScope) { setInstructionSubclassData((getSubclassDataFromInstruction() & ~2) | (SynchScope << 1)); } /// Returns the ordering constraint on this cmpxchg. AtomicOrdering getSuccessOrdering() const { return AtomicOrdering((getSubclassDataFromInstruction() >> 2) & 7); } /// Returns the ordering constraint on this cmpxchg. AtomicOrdering getFailureOrdering() const { return AtomicOrdering((getSubclassDataFromInstruction() >> 5) & 7); } /// Returns whether this cmpxchg is atomic between threads or only within a /// single thread. SynchronizationScope getSynchScope() const { return SynchronizationScope((getSubclassDataFromInstruction() & 2) >> 1); } Value *getPointerOperand() { return getOperand(0); } const Value *getPointerOperand() const { return getOperand(0); } static unsigned getPointerOperandIndex() { return 0U; } Value *getCompareOperand() { return getOperand(1); } const Value *getCompareOperand() const { return getOperand(1); } Value *getNewValOperand() { return getOperand(2); } const Value *getNewValOperand() const { return getOperand(2); } /// \brief Returns the address space of the pointer operand. unsigned getPointerAddressSpace() const { return getPointerOperand()->getType()->getPointerAddressSpace(); } /// \brief Returns the strongest permitted ordering on failure, given the /// desired ordering on success. /// /// If the comparison in a cmpxchg operation fails, there is no atomic store /// so release semantics cannot be provided. So this function drops explicit /// Release requests from the AtomicOrdering. A SequentiallyConsistent /// operation would remain SequentiallyConsistent. static AtomicOrdering getStrongestFailureOrdering(AtomicOrdering SuccessOrdering) { switch (SuccessOrdering) { default: llvm_unreachable("invalid cmpxchg success ordering"); case Release: case Monotonic: return Monotonic; case AcquireRelease: case Acquire: return Acquire; case SequentiallyConsistent: return SequentiallyConsistent; } } // Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return I->getOpcode() == Instruction::AtomicCmpXchg; } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } private: // Shadow Instruction::setInstructionSubclassData with a private forwarding // method so that subclasses cannot accidentally use it. 
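  // The resulting mapping of getStrongestFailureOrdering() above, spelled
  // out (this follows directly from the switch):
  //
  //   Monotonic              -> Monotonic
  //   Release                -> Monotonic   (no atomic store on failure)
  //   Acquire                -> Acquire
  //   AcquireRelease         -> Acquire
  //   SequentiallyConsistent -> SequentiallyConsistent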
  void setInstructionSubclassData(unsigned short D) {
    Instruction::setInstructionSubclassData(D);
  }
};

template <>
struct OperandTraits<AtomicCmpXchgInst> :
    public FixedNumOperandTraits<AtomicCmpXchgInst, 3> {
};

DEFINE_TRANSPARENT_OPERAND_ACCESSORS(AtomicCmpXchgInst, Value)

//===----------------------------------------------------------------------===//
// AtomicRMWInst Class
//===----------------------------------------------------------------------===//

/// AtomicRMWInst - an instruction that atomically reads a memory location,
/// combines it with another value, and then stores the result back.  Returns
/// the old value.
///
class AtomicRMWInst : public Instruction {
  void *operator new(size_t, unsigned) = delete;

protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  AtomicRMWInst *cloneImpl() const;

public:
  /// This enumeration lists the possible modifications atomicrmw can make.
  /// In the descriptions, 'p' is the pointer to the instruction's memory
  /// location, 'old' is the initial value of *p, and 'v' is the other value
  /// passed to the instruction.  These instructions always return 'old'.
  enum BinOp {
    /// *p = v
    Xchg,
    /// *p = old + v
    Add,
    /// *p = old - v
    Sub,
    /// *p = old & v
    And,
    /// *p = ~(old & v)
    Nand,
    /// *p = old | v
    Or,
    /// *p = old ^ v
    Xor,
    /// *p = old >signed v ? old : v
    Max,
    /// *p = old <signed v ? old : v
    Min,
    /// *p = old >unsigned v ? old : v
    UMax,
    /// *p = old <unsigned v ? old : v
    UMin,

    FIRST_BINOP = Xchg,
    LAST_BINOP = UMin,
    BAD_BINOP
  };

  AtomicRMWInst(BinOp Operation, Value *Ptr, Value *Val,
                AtomicOrdering Ordering, SynchronizationScope SynchScope,
                Instruction *InsertBefore = nullptr);
  AtomicRMWInst(BinOp Operation, Value *Ptr, Value *Val,
                AtomicOrdering Ordering, SynchronizationScope SynchScope,
                BasicBlock *InsertAtEnd);

  // allocate space for exactly two operands
  void *operator new(size_t s) {
    return User::operator new(s, 2);
  }

  BinOp getOperation() const {
    return static_cast<BinOp>(getSubclassDataFromInstruction() >> 5);
  }

  void setOperation(BinOp Operation) {
    unsigned short SubclassData = getSubclassDataFromInstruction();
    setInstructionSubclassData((SubclassData & 31) |
                               (Operation << 5));
  }

  /// isVolatile - Return true if this is a RMW on a volatile memory location.
  ///
  bool isVolatile() const {
    return getSubclassDataFromInstruction() & 1;
  }

  /// setVolatile - Specify whether this is a volatile RMW or not.
  ///
  void setVolatile(bool V) {
    setInstructionSubclassData((getSubclassDataFromInstruction() & ~1) |
                               (unsigned)V);
  }

  /// Transparently provide more efficient getOperand methods.
  DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value);

  /// Set the ordering constraint on this RMW.
  void setOrdering(AtomicOrdering Ordering) {
    assert(Ordering != NotAtomic &&
           "atomicrmw instructions can only be atomic.");
    setInstructionSubclassData((getSubclassDataFromInstruction() & ~(7 << 2)) |
                               (Ordering << 2));
  }

  /// Specify whether this RMW orders other operations with respect to all
  /// concurrently executing threads, or only with respect to signal handlers
  /// executing in the same thread.
  void setSynchScope(SynchronizationScope SynchScope) {
    setInstructionSubclassData((getSubclassDataFromInstruction() & ~2) |
                               (SynchScope << 1));
  }

  /// Returns the ordering constraint on this RMW.
  AtomicOrdering getOrdering() const {
    return AtomicOrdering((getSubclassDataFromInstruction() >> 2) & 7);
  }

  /// Returns whether this RMW is atomic between threads or only within a
  /// single thread.
  SynchronizationScope getSynchScope() const {
    return SynchronizationScope((getSubclassDataFromInstruction() & 2) >> 1);
  }

  Value *getPointerOperand() { return getOperand(0); }
  const Value *getPointerOperand() const { return getOperand(0); }
  static unsigned getPointerOperandIndex() { return 0U; }

  Value *getValOperand() { return getOperand(1); }
  const Value *getValOperand() const { return getOperand(1); }

  /// \brief Returns the address space of the pointer operand.
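  // Illustrative sketch (assumes values `Ptr` and `Val` and an insertion
  // point `IP`; not part of this header): an atomic fetch-add.  The
  // instruction itself yields the value *Ptr held before the addition.
  //
  //   auto *RMW = new AtomicRMWInst(AtomicRMWInst::Add, Ptr, Val,
  //                                 SequentiallyConsistent, CrossThread, IP);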
unsigned getPointerAddressSpace() const { return getPointerOperand()->getType()->getPointerAddressSpace(); } // Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return I->getOpcode() == Instruction::AtomicRMW; } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } private: void Init(BinOp Operation, Value *Ptr, Value *Val, AtomicOrdering Ordering, SynchronizationScope SynchScope); // Shadow Instruction::setInstructionSubclassData with a private forwarding // method so that subclasses cannot accidentally use it. void setInstructionSubclassData(unsigned short D) { Instruction::setInstructionSubclassData(D); } }; template <> struct OperandTraits : public FixedNumOperandTraits { }; DEFINE_TRANSPARENT_OPERAND_ACCESSORS(AtomicRMWInst, Value) //===----------------------------------------------------------------------===// // GetElementPtrInst Class //===----------------------------------------------------------------------===// // checkGEPType - Simple wrapper function to give a better assertion failure // message on bad indexes for a gep instruction. // inline Type *checkGEPType(Type *Ty) { assert(Ty && "Invalid GetElementPtrInst indices for type!"); return Ty; } /// GetElementPtrInst - an instruction for type-safe pointer arithmetic to /// access elements of arrays and structs /// class GetElementPtrInst : public Instruction { Type *SourceElementType; Type *ResultElementType; void anchor() override; GetElementPtrInst(const GetElementPtrInst &GEPI); void init(Value *Ptr, ArrayRef IdxList, const Twine &NameStr); /// Constructors - Create a getelementptr instruction with a base pointer an /// list of indices. The first ctor can optionally insert before an existing /// instruction, the second appends the new instruction to the specified /// BasicBlock. inline GetElementPtrInst(Type *PointeeType, Value *Ptr, ArrayRef IdxList, unsigned Values, const Twine &NameStr, Instruction *InsertBefore); inline GetElementPtrInst(Type *PointeeType, Value *Ptr, ArrayRef IdxList, unsigned Values, const Twine &NameStr, BasicBlock *InsertAtEnd); protected: // Note: Instruction needs to be a friend here to call cloneImpl. friend class Instruction; GetElementPtrInst *cloneImpl() const; public: static GetElementPtrInst *Create(Type *PointeeType, Value *Ptr, ArrayRef IdxList, const Twine &NameStr = "", Instruction *InsertBefore = nullptr) { unsigned Values = 1 + unsigned(IdxList.size()); if (!PointeeType) PointeeType = cast(Ptr->getType()->getScalarType())->getElementType(); else assert( PointeeType == cast(Ptr->getType()->getScalarType())->getElementType()); return new (Values) GetElementPtrInst(PointeeType, Ptr, IdxList, Values, NameStr, InsertBefore); } static GetElementPtrInst *Create(Type *PointeeType, Value *Ptr, ArrayRef IdxList, const Twine &NameStr, BasicBlock *InsertAtEnd) { unsigned Values = 1 + unsigned(IdxList.size()); if (!PointeeType) PointeeType = cast(Ptr->getType()->getScalarType())->getElementType(); else assert( PointeeType == cast(Ptr->getType()->getScalarType())->getElementType()); return new (Values) GetElementPtrInst(PointeeType, Ptr, IdxList, Values, NameStr, InsertAtEnd); } /// Create an "inbounds" getelementptr. See the documentation for the /// "inbounds" flag in LangRef.html for details. 
static GetElementPtrInst *CreateInBounds(Value *Ptr, ArrayRef IdxList, const Twine &NameStr = "", Instruction *InsertBefore = nullptr){ return CreateInBounds(nullptr, Ptr, IdxList, NameStr, InsertBefore); } static GetElementPtrInst * CreateInBounds(Type *PointeeType, Value *Ptr, ArrayRef IdxList, const Twine &NameStr = "", Instruction *InsertBefore = nullptr) { GetElementPtrInst *GEP = Create(PointeeType, Ptr, IdxList, NameStr, InsertBefore); GEP->setIsInBounds(true); return GEP; } static GetElementPtrInst *CreateInBounds(Value *Ptr, ArrayRef IdxList, const Twine &NameStr, BasicBlock *InsertAtEnd) { return CreateInBounds(nullptr, Ptr, IdxList, NameStr, InsertAtEnd); } static GetElementPtrInst *CreateInBounds(Type *PointeeType, Value *Ptr, ArrayRef IdxList, const Twine &NameStr, BasicBlock *InsertAtEnd) { GetElementPtrInst *GEP = Create(PointeeType, Ptr, IdxList, NameStr, InsertAtEnd); GEP->setIsInBounds(true); return GEP; } /// Transparently provide more efficient getOperand methods. DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value); // getType - Overload to return most specific sequential type. SequentialType *getType() const { return cast(Instruction::getType()); } Type *getSourceElementType() const { return SourceElementType; } void setSourceElementType(Type *Ty) { SourceElementType = Ty; } void setResultElementType(Type *Ty) { ResultElementType = Ty; } Type *getResultElementType() const { assert(ResultElementType == cast(getType()->getScalarType())->getElementType()); return ResultElementType; } /// \brief Returns the address space of this instruction's pointer type. unsigned getAddressSpace() const { // Note that this is always the same as the pointer operand's address space // and that is cheaper to compute, so cheat here. return getPointerAddressSpace(); } /// getIndexedType - Returns the type of the element that would be loaded with /// a load instruction with the specified parameters. /// /// Null is returned if the indices are invalid for the specified /// pointer type. /// static Type *getIndexedType(Type *Ty, ArrayRef IdxList); static Type *getIndexedType(Type *Ty, ArrayRef IdxList); static Type *getIndexedType(Type *Ty, ArrayRef IdxList); inline op_iterator idx_begin() { return op_begin()+1; } inline const_op_iterator idx_begin() const { return op_begin()+1; } inline op_iterator idx_end() { return op_end(); } inline const_op_iterator idx_end() const { return op_end(); } Value *getPointerOperand() { return getOperand(0); } const Value *getPointerOperand() const { return getOperand(0); } static unsigned getPointerOperandIndex() { return 0U; // get index for modifying correct operand. } /// getPointerOperandType - Method to return the pointer operand as a /// PointerType. Type *getPointerOperandType() const { return getPointerOperand()->getType(); } /// \brief Returns the address space of the pointer operand. unsigned getPointerAddressSpace() const { return getPointerOperandType()->getPointerAddressSpace(); } /// GetGEPReturnType - Returns the pointer type returned by the GEP /// instruction, which may be a vector of pointers. 
static Type *getGEPReturnType(Value *Ptr, ArrayRef IdxList) { return getGEPReturnType( cast(Ptr->getType()->getScalarType())->getElementType(), Ptr, IdxList); } static Type *getGEPReturnType(Type *ElTy, Value *Ptr, ArrayRef IdxList) { Type *PtrTy = PointerType::get(checkGEPType(getIndexedType(ElTy, IdxList)), Ptr->getType()->getPointerAddressSpace()); // Vector GEP if (Ptr->getType()->isVectorTy()) { unsigned NumElem = Ptr->getType()->getVectorNumElements(); return VectorType::get(PtrTy, NumElem); } for (Value *Index : IdxList) if (Index->getType()->isVectorTy()) { unsigned NumElem = Index->getType()->getVectorNumElements(); return VectorType::get(PtrTy, NumElem); } // Scalar GEP return PtrTy; } unsigned getNumIndices() const { // Note: always non-negative return getNumOperands() - 1; } bool hasIndices() const { return getNumOperands() > 1; } /// hasAllZeroIndices - Return true if all of the indices of this GEP are /// zeros. If so, the result pointer and the first operand have the same /// value, just potentially different types. bool hasAllZeroIndices() const; /// hasAllConstantIndices - Return true if all of the indices of this GEP are /// constant integers. If so, the result pointer and the first operand have /// a constant offset between them. bool hasAllConstantIndices() const; /// setIsInBounds - Set or clear the inbounds flag on this GEP instruction. /// See LangRef.html for the meaning of inbounds on a getelementptr. void setIsInBounds(bool b = true); /// isInBounds - Determine whether the GEP has the inbounds flag. bool isInBounds() const; /// \brief Accumulate the constant address offset of this GEP if possible. /// /// This routine accepts an APInt into which it will accumulate the constant /// offset of this GEP if the GEP is in fact constant. If the GEP is not /// all-constant, it returns false and the value of the offset APInt is /// undefined (it is *not* preserved!). The APInt passed into this routine /// must be at least as wide as the IntPtr type for the address space of /// the base GEP pointer. 
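  // Worked example for accumulateConstantOffset() below (assumes a default
  // 64-bit DataLayout; illustrative only): for
  //
  //   %g = getelementptr { i32, i64 }, { i32, i64 }* %p, i32 1, i32 1
  //
  // the struct has size 16 and its i64 field sits at offset 8, so the
  // accumulated constant offset is 16 + 8 == 24 bytes.  The APInt passed in
  // must be at least pointer-width for the GEP's address space.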
bool accumulateConstantOffset(const DataLayout &DL, APInt &Offset) const; // Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return (I->getOpcode() == Instruction::GetElementPtr); } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } }; template <> struct OperandTraits : public VariadicOperandTraits { }; GetElementPtrInst::GetElementPtrInst(Type *PointeeType, Value *Ptr, ArrayRef IdxList, unsigned Values, const Twine &NameStr, Instruction *InsertBefore) : Instruction(getGEPReturnType(PointeeType, Ptr, IdxList), GetElementPtr, OperandTraits::op_end(this) - Values, Values, InsertBefore), SourceElementType(PointeeType), ResultElementType(getIndexedType(PointeeType, IdxList)) { assert(ResultElementType == cast(getType()->getScalarType())->getElementType()); init(Ptr, IdxList, NameStr); } GetElementPtrInst::GetElementPtrInst(Type *PointeeType, Value *Ptr, ArrayRef IdxList, unsigned Values, const Twine &NameStr, BasicBlock *InsertAtEnd) : Instruction(getGEPReturnType(PointeeType, Ptr, IdxList), GetElementPtr, OperandTraits::op_end(this) - Values, Values, InsertAtEnd), SourceElementType(PointeeType), ResultElementType(getIndexedType(PointeeType, IdxList)) { assert(ResultElementType == cast(getType()->getScalarType())->getElementType()); init(Ptr, IdxList, NameStr); } DEFINE_TRANSPARENT_OPERAND_ACCESSORS(GetElementPtrInst, Value) //===----------------------------------------------------------------------===// // ICmpInst Class //===----------------------------------------------------------------------===// /// This instruction compares its operands according to the predicate given /// to the constructor. It only operates on integers or pointers. The operands /// must be identical types. /// \brief Represent an integer comparison operator. class ICmpInst: public CmpInst { void anchor() override; void AssertOK() { assert(getPredicate() >= CmpInst::FIRST_ICMP_PREDICATE && getPredicate() <= CmpInst::LAST_ICMP_PREDICATE && "Invalid ICmp predicate value"); assert(getOperand(0)->getType() == getOperand(1)->getType() && "Both operands to ICmp instruction are not of the same type!"); // Check that the operands are the right type assert((getOperand(0)->getType()->isIntOrIntVectorTy() || getOperand(0)->getType()->isPtrOrPtrVectorTy()) && "Invalid operand types for ICmp instruction"); } protected: // Note: Instruction needs to be a friend here to call cloneImpl. friend class Instruction; /// \brief Clone an identical ICmpInst ICmpInst *cloneImpl() const; public: /// \brief Constructor with insert-before-instruction semantics. ICmpInst( Instruction *InsertBefore, ///< Where to insert Predicate pred, ///< The predicate to use for the comparison Value *LHS, ///< The left-hand-side of the expression Value *RHS, ///< The right-hand-side of the expression const Twine &NameStr = "" ///< Name of the instruction ) : CmpInst(makeCmpResultType(LHS->getType()), Instruction::ICmp, pred, LHS, RHS, NameStr, InsertBefore) { #ifndef NDEBUG AssertOK(); #endif } /// \brief Constructor with insert-at-end semantics. ICmpInst( BasicBlock &InsertAtEnd, ///< Block to insert into. 
Predicate pred, ///< The predicate to use for the comparison Value *LHS, ///< The left-hand-side of the expression Value *RHS, ///< The right-hand-side of the expression const Twine &NameStr = "" ///< Name of the instruction ) : CmpInst(makeCmpResultType(LHS->getType()), Instruction::ICmp, pred, LHS, RHS, NameStr, &InsertAtEnd) { #ifndef NDEBUG AssertOK(); #endif } /// \brief Constructor with no-insertion semantics ICmpInst( Predicate pred, ///< The predicate to use for the comparison Value *LHS, ///< The left-hand-side of the expression Value *RHS, ///< The right-hand-side of the expression const Twine &NameStr = "" ///< Name of the instruction ) : CmpInst(makeCmpResultType(LHS->getType()), Instruction::ICmp, pred, LHS, RHS, NameStr) { #ifndef NDEBUG AssertOK(); #endif } /// For example, EQ->EQ, SLE->SLE, UGT->SGT, etc. /// @returns the predicate that would be the result if the operand were /// regarded as signed. /// \brief Return the signed version of the predicate Predicate getSignedPredicate() const { return getSignedPredicate(getPredicate()); } /// This is a static version that you can use without an instruction. /// \brief Return the signed version of the predicate. static Predicate getSignedPredicate(Predicate pred); /// For example, EQ->EQ, SLE->ULE, UGT->UGT, etc. /// @returns the predicate that would be the result if the operand were /// regarded as unsigned. /// \brief Return the unsigned version of the predicate Predicate getUnsignedPredicate() const { return getUnsignedPredicate(getPredicate()); } /// This is a static version that you can use without an instruction. /// \brief Return the unsigned version of the predicate. static Predicate getUnsignedPredicate(Predicate pred); /// isEquality - Return true if this predicate is either EQ or NE. This also /// tests for commutativity. static bool isEquality(Predicate P) { return P == ICMP_EQ || P == ICMP_NE; } /// isEquality - Return true if this predicate is either EQ or NE. This also /// tests for commutativity. bool isEquality() const { return isEquality(getPredicate()); } /// @returns true if the predicate of this ICmpInst is commutative /// \brief Determine if this relation is commutative. bool isCommutative() const { return isEquality(); } /// isRelational - Return true if the predicate is relational (not EQ or NE). /// bool isRelational() const { return !isEquality(); } /// isRelational - Return true if the predicate is relational (not EQ or NE). /// static bool isRelational(Predicate P) { return !isEquality(P); } /// Initialize a set of values that all satisfy the predicate with C. /// \brief Make a ConstantRange for a relation with a constant value. static ConstantRange makeConstantRange(Predicate pred, const APInt &C); /// Exchange the two operands to this instruction in such a way that it does /// not modify the semantics of the instruction. The predicate value may be /// changed to retain the same result if the predicate is order dependent /// (e.g. ult). /// \brief Swap operands and adjust predicate. 
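  // Worked examples of the predicate helpers above (direct restatements):
  //
  //   getSignedPredicate(ICMP_UGT)   == ICMP_SGT  // signedness flips
  //   getSignedPredicate(ICMP_EQ)    == ICMP_EQ   // equality is unaffected
  //   getUnsignedPredicate(ICMP_SLE) == ICMP_ULE
  //
  // and for swapOperands() below: `icmp ult %a, %b` becomes
  // `icmp ugt %b, %a`, preserving the result.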
void swapOperands() { setPredicate(getSwappedPredicate()); Op<0>().swap(Op<1>()); } // Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return I->getOpcode() == Instruction::ICmp; } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } }; //===----------------------------------------------------------------------===// // FCmpInst Class //===----------------------------------------------------------------------===// /// This instruction compares its operands according to the predicate given /// to the constructor. It only operates on floating point values or packed /// vectors of floating point values. The operands must be identical types. /// \brief Represents a floating point comparison operator. class FCmpInst: public CmpInst { protected: // Note: Instruction needs to be a friend here to call cloneImpl. friend class Instruction; /// \brief Clone an identical FCmpInst FCmpInst *cloneImpl() const; public: /// \brief Constructor with insert-before-instruction semantics. FCmpInst( Instruction *InsertBefore, ///< Where to insert Predicate pred, ///< The predicate to use for the comparison Value *LHS, ///< The left-hand-side of the expression Value *RHS, ///< The right-hand-side of the expression const Twine &NameStr = "" ///< Name of the instruction ) : CmpInst(makeCmpResultType(LHS->getType()), Instruction::FCmp, pred, LHS, RHS, NameStr, InsertBefore) { assert(pred <= FCmpInst::LAST_FCMP_PREDICATE && "Invalid FCmp predicate value"); assert(getOperand(0)->getType() == getOperand(1)->getType() && "Both operands to FCmp instruction are not of the same type!"); // Check that the operands are the right type assert(getOperand(0)->getType()->isFPOrFPVectorTy() && "Invalid operand types for FCmp instruction"); } /// \brief Constructor with insert-at-end semantics. FCmpInst( BasicBlock &InsertAtEnd, ///< Block to insert into. Predicate pred, ///< The predicate to use for the comparison Value *LHS, ///< The left-hand-side of the expression Value *RHS, ///< The right-hand-side of the expression const Twine &NameStr = "" ///< Name of the instruction ) : CmpInst(makeCmpResultType(LHS->getType()), Instruction::FCmp, pred, LHS, RHS, NameStr, &InsertAtEnd) { assert(pred <= FCmpInst::LAST_FCMP_PREDICATE && "Invalid FCmp predicate value"); assert(getOperand(0)->getType() == getOperand(1)->getType() && "Both operands to FCmp instruction are not of the same type!"); // Check that the operands are the right type assert(getOperand(0)->getType()->isFPOrFPVectorTy() && "Invalid operand types for FCmp instruction"); } /// \brief Constructor with no-insertion semantics FCmpInst( Predicate pred, ///< The predicate to use for the comparison Value *LHS, ///< The left-hand-side of the expression Value *RHS, ///< The right-hand-side of the expression const Twine &NameStr = "" ///< Name of the instruction ) : CmpInst(makeCmpResultType(LHS->getType()), Instruction::FCmp, pred, LHS, RHS, NameStr) { assert(pred <= FCmpInst::LAST_FCMP_PREDICATE && "Invalid FCmp predicate value"); assert(getOperand(0)->getType() == getOperand(1)->getType() && "Both operands to FCmp instruction are not of the same type!"); // Check that the operands are the right type assert(getOperand(0)->getType()->isFPOrFPVectorTy() && "Invalid operand types for FCmp instruction"); } /// @returns true if the predicate of this instruction is EQ or NE. /// \brief Determine if this is an equality predicate. 
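  // NaN behavior of the FCmp predicates (standard IEEE semantics, stated
  // for clarity): ordered predicates (FCMP_OEQ, FCMP_OLT, ...) are false if
  // either operand is NaN; unordered ones (FCMP_UEQ, FCMP_ULT, ...) are
  // true.  So with %x == NaN:
  //
  //   fcmp oeq %x, %x  ->  false
  //   fcmp ueq %x, %x  ->  true
  //   fcmp uno %x, %x  ->  true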
static bool isEquality(Predicate Pred) { return Pred == FCMP_OEQ || Pred == FCMP_ONE || Pred == FCMP_UEQ || Pred == FCMP_UNE; } /// @returns true if the predicate of this instruction is EQ or NE. /// \brief Determine if this is an equality predicate. bool isEquality() const { return isEquality(getPredicate()); } /// @returns true if the predicate of this instruction is commutative. /// \brief Determine if this is a commutative predicate. bool isCommutative() const { return isEquality() || getPredicate() == FCMP_FALSE || getPredicate() == FCMP_TRUE || getPredicate() == FCMP_ORD || getPredicate() == FCMP_UNO; } /// @returns true if the predicate is relational (not EQ or NE). /// \brief Determine if this a relational predicate. bool isRelational() const { return !isEquality(); } /// Exchange the two operands to this instruction in such a way that it does /// not modify the semantics of the instruction. The predicate value may be /// changed to retain the same result if the predicate is order dependent /// (e.g. ult). /// \brief Swap operands and adjust predicate. void swapOperands() { setPredicate(getSwappedPredicate()); Op<0>().swap(Op<1>()); } /// \brief Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return I->getOpcode() == Instruction::FCmp; } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } }; //===----------------------------------------------------------------------===// /// CallInst - This class represents a function call, abstracting a target /// machine's calling convention. This class uses low bit of the SubClassData /// field to indicate whether or not this is a tail call. The rest of the bits /// hold the calling convention of the call. /// class CallInst : public Instruction, public OperandBundleUser { AttributeSet AttributeList; ///< parameter attributes for call FunctionType *FTy; CallInst(const CallInst &CI); void init(Value *Func, ArrayRef Args, ArrayRef Bundles, const Twine &NameStr) { init(cast( cast(Func->getType())->getElementType()), Func, Args, Bundles, NameStr); } void init(FunctionType *FTy, Value *Func, ArrayRef Args, ArrayRef Bundles, const Twine &NameStr); void init(Value *Func, const Twine &NameStr); /// Construct a CallInst given a range of arguments. /// \brief Construct a CallInst from a range of arguments inline CallInst(FunctionType *Ty, Value *Func, ArrayRef Args, ArrayRef Bundles, const Twine &NameStr, Instruction *InsertBefore); inline CallInst(Value *Func, ArrayRef Args, ArrayRef Bundles, const Twine &NameStr, Instruction *InsertBefore) : CallInst(cast( cast(Func->getType())->getElementType()), Func, Args, Bundles, NameStr, InsertBefore) {} inline CallInst(Value *Func, ArrayRef Args, const Twine &NameStr, Instruction *InsertBefore) : CallInst(Func, Args, None, NameStr, InsertBefore) {} /// Construct a CallInst given a range of arguments. /// \brief Construct a CallInst from a range of arguments inline CallInst(Value *Func, ArrayRef Args, ArrayRef Bundles, const Twine &NameStr, BasicBlock *InsertAtEnd); explicit CallInst(Value *F, const Twine &NameStr, Instruction *InsertBefore); CallInst(Value *F, const Twine &NameStr, BasicBlock *InsertAtEnd); friend class OperandBundleUser; bool hasDescriptor() const { return HasDescriptor; } protected: // Note: Instruction needs to be a friend here to call cloneImpl. 
friend class Instruction; CallInst *cloneImpl() const; public: static CallInst *Create(Value *Func, ArrayRef Args, ArrayRef Bundles = None, const Twine &NameStr = "", Instruction *InsertBefore = nullptr) { return Create(cast( cast(Func->getType())->getElementType()), Func, Args, Bundles, NameStr, InsertBefore); } static CallInst *Create(Value *Func, ArrayRef Args, const Twine &NameStr, Instruction *InsertBefore = nullptr) { return Create(cast( cast(Func->getType())->getElementType()), Func, Args, None, NameStr, InsertBefore); } static CallInst *Create(FunctionType *Ty, Value *Func, ArrayRef Args, const Twine &NameStr, Instruction *InsertBefore = nullptr) { return new (unsigned(Args.size() + 1)) CallInst(Ty, Func, Args, None, NameStr, InsertBefore); } static CallInst *Create(FunctionType *Ty, Value *Func, ArrayRef Args, ArrayRef Bundles = None, const Twine &NameStr = "", Instruction *InsertBefore = nullptr) { const unsigned TotalOps = unsigned(Args.size()) + CountBundleInputs(Bundles) + 1; const unsigned DescriptorBytes = Bundles.size() * sizeof(BundleOpInfo); return new (TotalOps, DescriptorBytes) CallInst(Ty, Func, Args, Bundles, NameStr, InsertBefore); } static CallInst *Create(Value *Func, ArrayRef Args, ArrayRef Bundles, const Twine &NameStr, BasicBlock *InsertAtEnd) { const unsigned TotalOps = unsigned(Args.size()) + CountBundleInputs(Bundles) + 1; const unsigned DescriptorBytes = Bundles.size() * sizeof(BundleOpInfo); return new (TotalOps, DescriptorBytes) CallInst(Func, Args, Bundles, NameStr, InsertAtEnd); } static CallInst *Create(Value *Func, ArrayRef Args, const Twine &NameStr, BasicBlock *InsertAtEnd) { return new (unsigned(Args.size() + 1)) CallInst(Func, Args, None, NameStr, InsertAtEnd); } static CallInst *Create(Value *F, const Twine &NameStr = "", Instruction *InsertBefore = nullptr) { return new(1) CallInst(F, NameStr, InsertBefore); } static CallInst *Create(Value *F, const Twine &NameStr, BasicBlock *InsertAtEnd) { return new(1) CallInst(F, NameStr, InsertAtEnd); } /// \brief Create a clone of \p CI with a different set of operand bundles and /// insert it before \p InsertPt. /// /// The returned call instruction is identical \p CI in every way except that /// the operand bundles for the new instruction are set to the operand bundles /// in \p Bundles. static CallInst *Create(CallInst *CI, ArrayRef Bundles, Instruction *InsertPt = nullptr); /// CreateMalloc - Generate the IR for a call to malloc: /// 1. Compute the malloc call's argument as the specified type's size, /// possibly multiplied by the array size if the array size is not /// constant 1. /// 2. Call malloc with that argument. /// 3. Bitcast the result of the malloc call to the specified type. static Instruction *CreateMalloc(Instruction *InsertBefore, Type *IntPtrTy, Type *AllocTy, Value *AllocSize, Value *ArraySize = nullptr, Function* MallocF = nullptr, const Twine &Name = ""); static Instruction *CreateMalloc(BasicBlock *InsertAtEnd, Type *IntPtrTy, Type *AllocTy, Value *AllocSize, Value *ArraySize = nullptr, Function* MallocF = nullptr, const Twine &Name = ""); /// CreateFree - Generate the IR for a call to the builtin free function. static Instruction* CreateFree(Value* Source, Instruction *InsertBefore); static Instruction* CreateFree(Value* Source, BasicBlock *InsertAtEnd); ~CallInst() override; FunctionType *getFunctionType() const { return FTy; } void mutateFunctionType(FunctionType *FTy) { mutateType(FTy->getReturnType()); this->FTy = FTy; } // Note that 'musttail' implies 'tail'. 
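  //
  // For example (illustrative sketch; CI is a hypothetical CallInst built as
  // above), a musttail call answers true to both tail-call predicates:
  //
  //   CI->setTailCallKind(CallInst::TCK_MustTail);
  //   assert(CI->isMustTailCall() && CI->isTailCall());
  //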
enum TailCallKind { TCK_None = 0, TCK_Tail = 1, TCK_MustTail = 2, TCK_NoTail = 3 }; TailCallKind getTailCallKind() const { return TailCallKind(getSubclassDataFromInstruction() & 3); } bool isTailCall() const { unsigned Kind = getSubclassDataFromInstruction() & 3; return Kind == TCK_Tail || Kind == TCK_MustTail; } bool isMustTailCall() const { return (getSubclassDataFromInstruction() & 3) == TCK_MustTail; } bool isNoTailCall() const { return (getSubclassDataFromInstruction() & 3) == TCK_NoTail; } void setTailCall(bool isTC = true) { setInstructionSubclassData((getSubclassDataFromInstruction() & ~3) | unsigned(isTC ? TCK_Tail : TCK_None)); } void setTailCallKind(TailCallKind TCK) { setInstructionSubclassData((getSubclassDataFromInstruction() & ~3) | unsigned(TCK)); } /// Provide fast operand accessors DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value); /// getNumArgOperands - Return the number of call arguments. /// unsigned getNumArgOperands() const { return getNumOperands() - getNumTotalBundleOperands() - 1; } /// getArgOperand/setArgOperand - Return/set the i-th call argument. /// Value *getArgOperand(unsigned i) const { assert(i < getNumArgOperands() && "Out of bounds!"); return getOperand(i); } void setArgOperand(unsigned i, Value *v) { assert(i < getNumArgOperands() && "Out of bounds!"); setOperand(i, v); } /// \brief Return the iterator pointing to the beginning of the argument list. op_iterator arg_begin() { return op_begin(); } /// \brief Return the iterator pointing to the end of the argument list. op_iterator arg_end() { // [ call args ], [ operand bundles ], callee return op_end() - getNumTotalBundleOperands() - 1; }; /// \brief Iteration adapter for range-for loops. iterator_range arg_operands() { return make_range(arg_begin(), arg_end()); } /// \brief Return the iterator pointing to the beginning of the argument list. const_op_iterator arg_begin() const { return op_begin(); } /// \brief Return the iterator pointing to the end of the argument list. const_op_iterator arg_end() const { // [ call args ], [ operand bundles ], callee return op_end() - getNumTotalBundleOperands() - 1; }; /// \brief Iteration adapter for range-for loops. iterator_range arg_operands() const { return make_range(arg_begin(), arg_end()); } /// \brief Wrappers for getting the \c Use of a call argument. const Use &getArgOperandUse(unsigned i) const { assert(i < getNumArgOperands() && "Out of bounds!"); return getOperandUse(i); } Use &getArgOperandUse(unsigned i) { assert(i < getNumArgOperands() && "Out of bounds!"); return getOperandUse(i); } /// getCallingConv/setCallingConv - Get or set the calling convention of this /// function call. CallingConv::ID getCallingConv() const { return static_cast(getSubclassDataFromInstruction() >> 2); } void setCallingConv(CallingConv::ID CC) { auto ID = static_cast(CC); assert(!(ID & ~CallingConv::MaxID) && "Unsupported calling convention"); setInstructionSubclassData((getSubclassDataFromInstruction() & 3) | (ID << 2)); } /// getAttributes - Return the parameter attributes for this call. /// const AttributeSet &getAttributes() const { return AttributeList; } /// setAttributes - Set the parameter attributes for this call. /// void setAttributes(const AttributeSet &Attrs) { AttributeList = Attrs; } /// addAttribute - adds the attribute to the list of attributes. void addAttribute(unsigned i, Attribute::AttrKind attr); /// addAttribute - adds the attribute to the list of attributes. 
void addAttribute(unsigned i, StringRef Kind, StringRef Value); /// removeAttribute - removes the attribute from the list of attributes. void removeAttribute(unsigned i, Attribute attr); /// \brief adds the dereferenceable attribute to the list of attributes. void addDereferenceableAttr(unsigned i, uint64_t Bytes); /// \brief adds the dereferenceable_or_null attribute to the list of /// attributes. void addDereferenceableOrNullAttr(unsigned i, uint64_t Bytes); /// \brief Determine whether this call has the given attribute. bool hasFnAttr(Attribute::AttrKind A) const { assert(A != Attribute::NoBuiltin && "Use CallInst::isNoBuiltin() to check for Attribute::NoBuiltin"); return hasFnAttrImpl(A); } /// \brief Determine whether this call has the given attribute. bool hasFnAttr(StringRef A) const { return hasFnAttrImpl(A); } /// \brief Determine whether the call or the callee has the given attributes. bool paramHasAttr(unsigned i, Attribute::AttrKind A) const; /// \brief Return true if the data operand at index \p i has the attribute \p /// A. /// /// Data operands include call arguments and values used in operand bundles, /// but does not include the callee operand. This routine dispatches to the /// underlying AttributeList or the OperandBundleUser as appropriate. /// /// The index \p i is interpreted as /// /// \p i == Attribute::ReturnIndex -> the return value /// \p i in [1, arg_size + 1) -> argument number (\p i - 1) /// \p i in [arg_size + 1, data_operand_size + 1) -> bundle operand at index /// (\p i - 1) in the operand list. bool dataOperandHasImpliedAttr(unsigned i, Attribute::AttrKind A) const; /// \brief Extract the alignment for a call or parameter (0=unknown). unsigned getParamAlignment(unsigned i) const { return AttributeList.getParamAlignment(i); } /// \brief Extract the number of dereferenceable bytes for a call or /// parameter (0=unknown). uint64_t getDereferenceableBytes(unsigned i) const { return AttributeList.getDereferenceableBytes(i); } /// \brief Extract the number of dereferenceable_or_null bytes for a call or /// parameter (0=unknown). uint64_t getDereferenceableOrNullBytes(unsigned i) const { return AttributeList.getDereferenceableOrNullBytes(i); } /// @brief Determine if the parameter or return value is marked with NoAlias /// attribute. /// @param n The parameter to check. 1 is the first parameter, 0 is the return bool doesNotAlias(unsigned n) const { return AttributeList.hasAttribute(n, Attribute::NoAlias); } /// \brief Return true if the call should not be treated as a call to a /// builtin. bool isNoBuiltin() const { return hasFnAttrImpl(Attribute::NoBuiltin) && !hasFnAttrImpl(Attribute::Builtin); } /// \brief Return true if the call should not be inlined. bool isNoInline() const { return hasFnAttr(Attribute::NoInline); } void setIsNoInline() { addAttribute(AttributeSet::FunctionIndex, Attribute::NoInline); } /// \brief Return true if the call can return twice bool canReturnTwice() const { return hasFnAttr(Attribute::ReturnsTwice); } void setCanReturnTwice() { addAttribute(AttributeSet::FunctionIndex, Attribute::ReturnsTwice); } /// \brief Determine if the call does not access memory. bool doesNotAccessMemory() const { return hasFnAttr(Attribute::ReadNone); } void setDoesNotAccessMemory() { addAttribute(AttributeSet::FunctionIndex, Attribute::ReadNone); } /// \brief Determine if the call does not access or only reads memory. 
  bool onlyReadsMemory() const {
    return doesNotAccessMemory() || hasFnAttr(Attribute::ReadOnly);
  }
  void setOnlyReadsMemory() {
    addAttribute(AttributeSet::FunctionIndex, Attribute::ReadOnly);
  }

  /// @brief Determine if the call can access memory only using pointers based
  /// on its arguments.
  bool onlyAccessesArgMemory() const {
    return hasFnAttr(Attribute::ArgMemOnly);
  }
  void setOnlyAccessesArgMemory() {
    addAttribute(AttributeSet::FunctionIndex, Attribute::ArgMemOnly);
  }

  /// \brief Determine if the call cannot return.
  bool doesNotReturn() const { return hasFnAttr(Attribute::NoReturn); }
  void setDoesNotReturn() {
    addAttribute(AttributeSet::FunctionIndex, Attribute::NoReturn);
  }

  /// \brief Determine if the call cannot unwind.
  bool doesNotThrow() const { return hasFnAttr(Attribute::NoUnwind); }
  void setDoesNotThrow() {
    addAttribute(AttributeSet::FunctionIndex, Attribute::NoUnwind);
  }

  /// \brief Determine if the call cannot be duplicated.
  bool cannotDuplicate() const { return hasFnAttr(Attribute::NoDuplicate); }
  void setCannotDuplicate() {
    addAttribute(AttributeSet::FunctionIndex, Attribute::NoDuplicate);
  }

  /// \brief Determine if the call is convergent.
  bool isConvergent() const { return hasFnAttr(Attribute::Convergent); }
  void setConvergent() {
    addAttribute(AttributeSet::FunctionIndex, Attribute::Convergent);
  }

  /// \brief Determine if the call returns a structure through the first
  /// pointer argument.
  bool hasStructRetAttr() const {
    if (getNumArgOperands() == 0)
      return false;

    // Be friendly and also check the callee.
    return paramHasAttr(1, Attribute::StructRet);
  }

  /// \brief Determine if any call argument is an aggregate passed by value.
  bool hasByValArgument() const {
    return AttributeList.hasAttrSomewhere(Attribute::ByVal);
  }

  /// getCalledFunction - Return the function called, or null if this is an
  /// indirect function invocation.
  ///
  Function *getCalledFunction() const {
    return dyn_cast<Function>(Op<-1>());
  }

  /// getCalledValue - Get a pointer to the function that is invoked by this
  /// instruction.
  const Value *getCalledValue() const { return Op<-1>(); }
        Value *getCalledValue()       { return Op<-1>(); }

  /// setCalledFunction - Set the function called.
  void setCalledFunction(Value* Fn) {
    setCalledFunction(
        cast<FunctionType>(cast<PointerType>(Fn->getType())->getElementType()),
        Fn);
  }
  void setCalledFunction(FunctionType *FTy, Value *Fn) {
    this->FTy = FTy;
    assert(FTy == cast<FunctionType>(
                      cast<PointerType>(Fn->getType())->getElementType()));
    Op<-1>() = Fn;
  }

  /// isInlineAsm - Check if this call is an inline asm statement.
  bool isInlineAsm() const { return isa<InlineAsm>(Op<-1>()); }

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == Instruction::Call;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }

private:
  template <typename AttrKind> bool hasFnAttrImpl(AttrKind A) const {
    if (AttributeList.hasAttribute(AttributeSet::FunctionIndex, A))
      return true;

    // Operand bundles override attributes on the called function, but don't
    // override attributes directly present on the call instruction.
    if (isFnAttrDisallowedByOpBundle(A))
      return false;

    if (const Function *F = getCalledFunction())
      return F->getAttributes().hasAttribute(AttributeSet::FunctionIndex, A);
    return false;
  }

  // Shadow Instruction::setInstructionSubclassData with a private forwarding
  // method so that subclasses cannot accidentally use it.
void setInstructionSubclassData(unsigned short D) { Instruction::setInstructionSubclassData(D); } }; template <> struct OperandTraits : public VariadicOperandTraits { }; CallInst::CallInst(Value *Func, ArrayRef Args, ArrayRef Bundles, const Twine &NameStr, BasicBlock *InsertAtEnd) : Instruction( cast(cast(Func->getType()) ->getElementType())->getReturnType(), Instruction::Call, OperandTraits::op_end(this) - (Args.size() + CountBundleInputs(Bundles) + 1), unsigned(Args.size() + CountBundleInputs(Bundles) + 1), InsertAtEnd) { init(Func, Args, Bundles, NameStr); } CallInst::CallInst(FunctionType *Ty, Value *Func, ArrayRef Args, ArrayRef Bundles, const Twine &NameStr, Instruction *InsertBefore) : Instruction(Ty->getReturnType(), Instruction::Call, OperandTraits::op_end(this) - (Args.size() + CountBundleInputs(Bundles) + 1), unsigned(Args.size() + CountBundleInputs(Bundles) + 1), InsertBefore) { init(Ty, Func, Args, Bundles, NameStr); } // Note: if you get compile errors about private methods then // please update your code to use the high-level operand // interfaces. See line 943 above. DEFINE_TRANSPARENT_OPERAND_ACCESSORS(CallInst, Value) //===----------------------------------------------------------------------===// // SelectInst Class //===----------------------------------------------------------------------===// /// SelectInst - This class represents the LLVM 'select' instruction. /// class SelectInst : public Instruction { void init(Value *C, Value *S1, Value *S2) { assert(!areInvalidOperands(C, S1, S2) && "Invalid operands for select"); Op<0>() = C; Op<1>() = S1; Op<2>() = S2; } SelectInst(Value *C, Value *S1, Value *S2, const Twine &NameStr, Instruction *InsertBefore) : Instruction(S1->getType(), Instruction::Select, &Op<0>(), 3, InsertBefore) { init(C, S1, S2); setName(NameStr); } SelectInst(Value *C, Value *S1, Value *S2, const Twine &NameStr, BasicBlock *InsertAtEnd) : Instruction(S1->getType(), Instruction::Select, &Op<0>(), 3, InsertAtEnd) { init(C, S1, S2); setName(NameStr); } protected: // Note: Instruction needs to be a friend here to call cloneImpl. friend class Instruction; SelectInst *cloneImpl() const; public: static SelectInst *Create(Value *C, Value *S1, Value *S2, const Twine &NameStr = "", Instruction *InsertBefore = nullptr) { return new(3) SelectInst(C, S1, S2, NameStr, InsertBefore); } static SelectInst *Create(Value *C, Value *S1, Value *S2, const Twine &NameStr, BasicBlock *InsertAtEnd) { return new(3) SelectInst(C, S1, S2, NameStr, InsertAtEnd); } const Value *getCondition() const { return Op<0>(); } const Value *getTrueValue() const { return Op<1>(); } const Value *getFalseValue() const { return Op<2>(); } Value *getCondition() { return Op<0>(); } Value *getTrueValue() { return Op<1>(); } Value *getFalseValue() { return Op<2>(); } /// areInvalidOperands - Return a string if the specified operands are invalid /// for a select operation, otherwise return null. static const char *areInvalidOperands(Value *Cond, Value *True, Value *False); /// Transparently provide more efficient getOperand methods. 
DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value); OtherOps getOpcode() const { return static_cast(Instruction::getOpcode()); } // Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return I->getOpcode() == Instruction::Select; } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } }; template <> struct OperandTraits : public FixedNumOperandTraits { }; DEFINE_TRANSPARENT_OPERAND_ACCESSORS(SelectInst, Value) //===----------------------------------------------------------------------===// // VAArgInst Class //===----------------------------------------------------------------------===// /// VAArgInst - This class represents the va_arg llvm instruction, which returns /// an argument of the specified type given a va_list and increments that list /// class VAArgInst : public UnaryInstruction { protected: // Note: Instruction needs to be a friend here to call cloneImpl. friend class Instruction; VAArgInst *cloneImpl() const; public: VAArgInst(Value *List, Type *Ty, const Twine &NameStr = "", Instruction *InsertBefore = nullptr) : UnaryInstruction(Ty, VAArg, List, InsertBefore) { setName(NameStr); } VAArgInst(Value *List, Type *Ty, const Twine &NameStr, BasicBlock *InsertAtEnd) : UnaryInstruction(Ty, VAArg, List, InsertAtEnd) { setName(NameStr); } Value *getPointerOperand() { return getOperand(0); } const Value *getPointerOperand() const { return getOperand(0); } static unsigned getPointerOperandIndex() { return 0U; } // Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return I->getOpcode() == VAArg; } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } }; //===----------------------------------------------------------------------===// // ExtractElementInst Class //===----------------------------------------------------------------------===// /// ExtractElementInst - This instruction extracts a single (scalar) /// element from a VectorType value /// class ExtractElementInst : public Instruction { ExtractElementInst(Value *Vec, Value *Idx, const Twine &NameStr = "", Instruction *InsertBefore = nullptr); ExtractElementInst(Value *Vec, Value *Idx, const Twine &NameStr, BasicBlock *InsertAtEnd); protected: // Note: Instruction needs to be a friend here to call cloneImpl. friend class Instruction; ExtractElementInst *cloneImpl() const; public: static ExtractElementInst *Create(Value *Vec, Value *Idx, const Twine &NameStr = "", Instruction *InsertBefore = nullptr) { return new(2) ExtractElementInst(Vec, Idx, NameStr, InsertBefore); } static ExtractElementInst *Create(Value *Vec, Value *Idx, const Twine &NameStr, BasicBlock *InsertAtEnd) { return new(2) ExtractElementInst(Vec, Idx, NameStr, InsertAtEnd); } /// isValidOperands - Return true if an extractelement instruction can be /// formed with the specified operands. static bool isValidOperands(const Value *Vec, const Value *Idx); Value *getVectorOperand() { return Op<0>(); } Value *getIndexOperand() { return Op<1>(); } const Value *getVectorOperand() const { return Op<0>(); } const Value *getIndexOperand() const { return Op<1>(); } VectorType *getVectorOperandType() const { return cast(getVectorOperand()->getType()); } /// Transparently provide more efficient getOperand methods. 
  DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value);

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == Instruction::ExtractElement;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

template <>
struct OperandTraits<ExtractElementInst> :
  public FixedNumOperandTraits<ExtractElementInst, 2> {
};

DEFINE_TRANSPARENT_OPERAND_ACCESSORS(ExtractElementInst, Value)

//===----------------------------------------------------------------------===//
//                                InsertElementInst Class
//===----------------------------------------------------------------------===//

/// InsertElementInst - This instruction inserts a single (scalar)
/// element into a VectorType value
///
class InsertElementInst : public Instruction {
  InsertElementInst(Value *Vec, Value *NewElt, Value *Idx,
                    const Twine &NameStr = "",
                    Instruction *InsertBefore = nullptr);
  InsertElementInst(Value *Vec, Value *NewElt, Value *Idx,
                    const Twine &NameStr, BasicBlock *InsertAtEnd);

protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  InsertElementInst *cloneImpl() const;

public:
  static InsertElementInst *Create(Value *Vec, Value *NewElt, Value *Idx,
                                   const Twine &NameStr = "",
                                   Instruction *InsertBefore = nullptr) {
    return new(3) InsertElementInst(Vec, NewElt, Idx, NameStr, InsertBefore);
  }
  static InsertElementInst *Create(Value *Vec, Value *NewElt, Value *Idx,
                                   const Twine &NameStr,
                                   BasicBlock *InsertAtEnd) {
    return new(3) InsertElementInst(Vec, NewElt, Idx, NameStr, InsertAtEnd);
  }

  /// isValidOperands - Return true if an insertelement instruction can be
  /// formed with the specified operands.
  static bool isValidOperands(const Value *Vec, const Value *NewElt,
                              const Value *Idx);

  /// getType - Overload to return most specific vector type.
  ///
  VectorType *getType() const {
    return cast<VectorType>(Instruction::getType());
  }

  /// Transparently provide more efficient getOperand methods.
  DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value);

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == Instruction::InsertElement;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

template <>
struct OperandTraits<InsertElementInst> :
  public FixedNumOperandTraits<InsertElementInst, 3> {
};

DEFINE_TRANSPARENT_OPERAND_ACCESSORS(InsertElementInst, Value)

//===----------------------------------------------------------------------===//
//                           ShuffleVectorInst Class
//===----------------------------------------------------------------------===//

/// ShuffleVectorInst - This instruction constructs a fixed permutation of two
/// input vectors.
///
class ShuffleVectorInst : public Instruction {
protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  ShuffleVectorInst *cloneImpl() const;

public:
  // allocate space for exactly three operands
  void *operator new(size_t s) {
    return User::operator new(s, 3);
  }
  ShuffleVectorInst(Value *V1, Value *V2, Value *Mask,
                    const Twine &NameStr = "",
                    Instruction *InsertBefore = nullptr);
  ShuffleVectorInst(Value *V1, Value *V2, Value *Mask,
                    const Twine &NameStr, BasicBlock *InsertAtEnd);

  /// isValidOperands - Return true if a shufflevector instruction can be
  /// formed with the specified operands.
  static bool isValidOperands(const Value *V1, const Value *V2,
                              const Value *Mask);

  /// getType - Overload to return most specific vector type.
/// VectorType *getType() const { return cast(Instruction::getType()); } /// Transparently provide more efficient getOperand methods. DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value); Constant *getMask() const { return cast(getOperand(2)); } /// getMaskValue - Return the index from the shuffle mask for the specified /// output result. This is either -1 if the element is undef or a number less /// than 2*numelements. static int getMaskValue(Constant *Mask, unsigned i); int getMaskValue(unsigned i) const { return getMaskValue(getMask(), i); } /// getShuffleMask - Return the full mask for this instruction, where each /// element is the element number and undef's are returned as -1. static void getShuffleMask(Constant *Mask, SmallVectorImpl &Result); void getShuffleMask(SmallVectorImpl &Result) const { return getShuffleMask(getMask(), Result); } SmallVector getShuffleMask() const { SmallVector Mask; getShuffleMask(Mask); return Mask; } // Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return I->getOpcode() == Instruction::ShuffleVector; } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } }; template <> struct OperandTraits : public FixedNumOperandTraits { }; DEFINE_TRANSPARENT_OPERAND_ACCESSORS(ShuffleVectorInst, Value) //===----------------------------------------------------------------------===// // ExtractValueInst Class //===----------------------------------------------------------------------===// /// ExtractValueInst - This instruction extracts a struct member or array /// element value from an aggregate value. /// class ExtractValueInst : public UnaryInstruction { SmallVector Indices; ExtractValueInst(const ExtractValueInst &EVI); void init(ArrayRef Idxs, const Twine &NameStr); /// Constructors - Create a extractvalue instruction with a base aggregate /// value and a list of indices. The first ctor can optionally insert before /// an existing instruction, the second appends the new instruction to the /// specified BasicBlock. inline ExtractValueInst(Value *Agg, ArrayRef Idxs, const Twine &NameStr, Instruction *InsertBefore); inline ExtractValueInst(Value *Agg, ArrayRef Idxs, const Twine &NameStr, BasicBlock *InsertAtEnd); // allocate space for exactly one operand void *operator new(size_t s) { return User::operator new(s, 1); } protected: // Note: Instruction needs to be a friend here to call cloneImpl. friend class Instruction; ExtractValueInst *cloneImpl() const; public: static ExtractValueInst *Create(Value *Agg, ArrayRef Idxs, const Twine &NameStr = "", Instruction *InsertBefore = nullptr) { return new ExtractValueInst(Agg, Idxs, NameStr, InsertBefore); } static ExtractValueInst *Create(Value *Agg, ArrayRef Idxs, const Twine &NameStr, BasicBlock *InsertAtEnd) { return new ExtractValueInst(Agg, Idxs, NameStr, InsertAtEnd); } /// getIndexedType - Returns the type of the element that would be extracted /// with an extractvalue instruction with the specified parameters. /// /// Null is returned if the indices are invalid for the specified type. 
  static Type *getIndexedType(Type *Agg, ArrayRef<unsigned> Idxs);

  typedef const unsigned* idx_iterator;
  inline idx_iterator idx_begin() const { return Indices.begin(); }
  inline idx_iterator idx_end()   const { return Indices.end(); }
  inline iterator_range<idx_iterator> indices() const {
    return make_range(idx_begin(), idx_end());
  }

  Value *getAggregateOperand() {
    return getOperand(0);
  }
  const Value *getAggregateOperand() const {
    return getOperand(0);
  }
  static unsigned getAggregateOperandIndex() {
    return 0U;                      // get index for modifying correct operand
  }

  ArrayRef<unsigned> getIndices() const {
    return Indices;
  }

  unsigned getNumIndices() const {
    return (unsigned)Indices.size();
  }

  bool hasIndices() const {
    return true;
  }

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == Instruction::ExtractValue;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

ExtractValueInst::ExtractValueInst(Value *Agg,
                                   ArrayRef<unsigned> Idxs,
                                   const Twine &NameStr,
                                   Instruction *InsertBefore)
  : UnaryInstruction(checkGEPType(getIndexedType(Agg->getType(), Idxs)),
                     ExtractValue, Agg, InsertBefore) {
  init(Idxs, NameStr);
}
ExtractValueInst::ExtractValueInst(Value *Agg,
                                   ArrayRef<unsigned> Idxs,
                                   const Twine &NameStr,
                                   BasicBlock *InsertAtEnd)
  : UnaryInstruction(checkGEPType(getIndexedType(Agg->getType(), Idxs)),
                     ExtractValue, Agg, InsertAtEnd) {
  init(Idxs, NameStr);
}

//===----------------------------------------------------------------------===//
//                                InsertValueInst Class
//===----------------------------------------------------------------------===//

/// InsertValueInst - This instruction inserts a struct field or array element
/// value into an aggregate value.
///
class InsertValueInst : public Instruction {
  SmallVector<unsigned, 4> Indices;

  void *operator new(size_t, unsigned) = delete;
  InsertValueInst(const InsertValueInst &IVI);
  void init(Value *Agg, Value *Val, ArrayRef<unsigned> Idxs,
            const Twine &NameStr);

  /// Constructors - Create an insertvalue instruction with a base aggregate
  /// value, a value to insert, and a list of indices.  The first ctor can
  /// optionally insert before an existing instruction, the second appends
  /// the new instruction to the specified BasicBlock.
  inline InsertValueInst(Value *Agg, Value *Val,
                         ArrayRef<unsigned> Idxs, const Twine &NameStr,
                         Instruction *InsertBefore);
  inline InsertValueInst(Value *Agg, Value *Val,
                         ArrayRef<unsigned> Idxs, const Twine &NameStr,
                         BasicBlock *InsertAtEnd);

  /// Constructors - These two constructors are convenience methods because one
  /// and two index insertvalue instructions are so common.
  InsertValueInst(Value *Agg, Value *Val, unsigned Idx,
                  const Twine &NameStr = "",
                  Instruction *InsertBefore = nullptr);
  InsertValueInst(Value *Agg, Value *Val, unsigned Idx,
                  const Twine &NameStr, BasicBlock *InsertAtEnd);

protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  InsertValueInst *cloneImpl() const;

public:
  // allocate space for exactly two operands
  void *operator new(size_t s) {
    return User::operator new(s, 2);
  }

  static InsertValueInst *Create(Value *Agg, Value *Val,
                                 ArrayRef<unsigned> Idxs,
                                 const Twine &NameStr = "",
                                 Instruction *InsertBefore = nullptr) {
    return new InsertValueInst(Agg, Val, Idxs, NameStr, InsertBefore);
  }
  static InsertValueInst *Create(Value *Agg, Value *Val,
                                 ArrayRef<unsigned> Idxs,
                                 const Twine &NameStr,
                                 BasicBlock *InsertAtEnd) {
    return new InsertValueInst(Agg, Val, Idxs, NameStr, InsertAtEnd);
  }

  /// Transparently provide more efficient getOperand methods.
DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value); typedef const unsigned* idx_iterator; inline idx_iterator idx_begin() const { return Indices.begin(); } inline idx_iterator idx_end() const { return Indices.end(); } inline iterator_range indices() const { return make_range(idx_begin(), idx_end()); } Value *getAggregateOperand() { return getOperand(0); } const Value *getAggregateOperand() const { return getOperand(0); } static unsigned getAggregateOperandIndex() { return 0U; // get index for modifying correct operand } Value *getInsertedValueOperand() { return getOperand(1); } const Value *getInsertedValueOperand() const { return getOperand(1); } static unsigned getInsertedValueOperandIndex() { return 1U; // get index for modifying correct operand } ArrayRef getIndices() const { return Indices; } unsigned getNumIndices() const { return (unsigned)Indices.size(); } bool hasIndices() const { return true; } // Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return I->getOpcode() == Instruction::InsertValue; } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } }; template <> struct OperandTraits : public FixedNumOperandTraits { }; InsertValueInst::InsertValueInst(Value *Agg, Value *Val, ArrayRef Idxs, const Twine &NameStr, Instruction *InsertBefore) : Instruction(Agg->getType(), InsertValue, OperandTraits::op_begin(this), 2, InsertBefore) { init(Agg, Val, Idxs, NameStr); } InsertValueInst::InsertValueInst(Value *Agg, Value *Val, ArrayRef Idxs, const Twine &NameStr, BasicBlock *InsertAtEnd) : Instruction(Agg->getType(), InsertValue, OperandTraits::op_begin(this), 2, InsertAtEnd) { init(Agg, Val, Idxs, NameStr); } DEFINE_TRANSPARENT_OPERAND_ACCESSORS(InsertValueInst, Value) //===----------------------------------------------------------------------===// // PHINode Class //===----------------------------------------------------------------------===// // PHINode - The PHINode class is used to represent the magical mystical PHI // node, that can not exist in nature, but can be synthesized in a computer // scientist's overactive imagination. // class PHINode : public Instruction { void anchor() override; void *operator new(size_t, unsigned) = delete; /// ReservedSpace - The number of operands actually allocated. NumOperands is /// the number actually in use. unsigned ReservedSpace; PHINode(const PHINode &PN); // allocate space for exactly zero operands void *operator new(size_t s) { return User::operator new(s); } explicit PHINode(Type *Ty, unsigned NumReservedValues, const Twine &NameStr = "", Instruction *InsertBefore = nullptr) : Instruction(Ty, Instruction::PHI, nullptr, 0, InsertBefore), ReservedSpace(NumReservedValues) { setName(NameStr); allocHungoffUses(ReservedSpace); } PHINode(Type *Ty, unsigned NumReservedValues, const Twine &NameStr, BasicBlock *InsertAtEnd) : Instruction(Ty, Instruction::PHI, nullptr, 0, InsertAtEnd), ReservedSpace(NumReservedValues) { setName(NameStr); allocHungoffUses(ReservedSpace); } protected: // allocHungoffUses - this is more complicated than the generic // User::allocHungoffUses, because we have to allocate Uses for the incoming // values and pointers to the incoming blocks, all in one allocation. void allocHungoffUses(unsigned N) { User::allocHungoffUses(N, /* IsPhi */ true); } // Note: Instruction needs to be a friend here to call cloneImpl. 
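  //
  // (Illustrative usage sketch; Ty, ValA/BlockA, ValB/BlockB, and InsertPt
  // are hypothetical names.)  A two-predecessor phi is typically built as
  //
  //   PHINode *PN = PHINode::Create(Ty, /*NumReservedValues=*/2, "merge",
  //                                 InsertPt);
  //   PN->addIncoming(ValA, BlockA);
  //   PN->addIncoming(ValB, BlockB);
  //
  // Reserving the expected number of incoming edges up front avoids
  // reallocating the hung-off use list in growOperands().
  //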
friend class Instruction; PHINode *cloneImpl() const; public: /// Constructors - NumReservedValues is a hint for the number of incoming /// edges that this phi node will have (use 0 if you really have no idea). static PHINode *Create(Type *Ty, unsigned NumReservedValues, const Twine &NameStr = "", Instruction *InsertBefore = nullptr) { return new PHINode(Ty, NumReservedValues, NameStr, InsertBefore); } static PHINode *Create(Type *Ty, unsigned NumReservedValues, const Twine &NameStr, BasicBlock *InsertAtEnd) { return new PHINode(Ty, NumReservedValues, NameStr, InsertAtEnd); } /// Provide fast operand accessors DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value); // Block iterator interface. This provides access to the list of incoming // basic blocks, which parallels the list of incoming values. typedef BasicBlock **block_iterator; typedef BasicBlock * const *const_block_iterator; block_iterator block_begin() { Use::UserRef *ref = reinterpret_cast(op_begin() + ReservedSpace); return reinterpret_cast(ref + 1); } const_block_iterator block_begin() const { const Use::UserRef *ref = reinterpret_cast(op_begin() + ReservedSpace); return reinterpret_cast(ref + 1); } block_iterator block_end() { return block_begin() + getNumOperands(); } const_block_iterator block_end() const { return block_begin() + getNumOperands(); } + iterator_range blocks() { + return make_range(block_begin(), block_end()); + } + + iterator_range blocks() const { + return make_range(block_begin(), block_end()); + } + op_range incoming_values() { return operands(); } const_op_range incoming_values() const { return operands(); } /// getNumIncomingValues - Return the number of incoming edges /// unsigned getNumIncomingValues() const { return getNumOperands(); } /// getIncomingValue - Return incoming value number x /// Value *getIncomingValue(unsigned i) const { return getOperand(i); } void setIncomingValue(unsigned i, Value *V) { assert(V && "PHI node got a null value!"); assert(getType() == V->getType() && "All operands to PHI node must be the same type as the PHI node!"); setOperand(i, V); } static unsigned getOperandNumForIncomingValue(unsigned i) { return i; } static unsigned getIncomingValueNumForOperand(unsigned i) { return i; } /// getIncomingBlock - Return incoming basic block number @p i. /// BasicBlock *getIncomingBlock(unsigned i) const { return block_begin()[i]; } /// getIncomingBlock - Return incoming basic block corresponding /// to an operand of the PHI. /// BasicBlock *getIncomingBlock(const Use &U) const { assert(this == U.getUser() && "Iterator doesn't point to PHI's Uses?"); return getIncomingBlock(unsigned(&U - op_begin())); } /// getIncomingBlock - Return incoming basic block corresponding /// to value use iterator. /// BasicBlock *getIncomingBlock(Value::const_user_iterator I) const { return getIncomingBlock(I.getUse()); } void setIncomingBlock(unsigned i, BasicBlock *BB) { assert(BB && "PHI node got a null basic block!"); block_begin()[i] = BB; } /// addIncoming - Add an incoming value to the end of the PHI list /// void addIncoming(Value *V, BasicBlock *BB) { if (getNumOperands() == ReservedSpace) growOperands(); // Get more space! // Initialize some new operands. setNumHungOffUseOperands(getNumOperands() + 1); setIncomingValue(getNumOperands() - 1, V); setIncomingBlock(getNumOperands() - 1, BB); } /// removeIncomingValue - Remove an incoming value. This is useful if a /// predecessor basic block is deleted. The value removed is returned. 
/// /// If the last incoming value for a PHI node is removed (and DeletePHIIfEmpty /// is true), the PHI node is destroyed and any uses of it are replaced with /// dummy values. The only time there should be zero incoming values to a PHI /// node is when the block is dead, so this strategy is sound. /// Value *removeIncomingValue(unsigned Idx, bool DeletePHIIfEmpty = true); Value *removeIncomingValue(const BasicBlock *BB, bool DeletePHIIfEmpty=true) { int Idx = getBasicBlockIndex(BB); assert(Idx >= 0 && "Invalid basic block argument to remove!"); return removeIncomingValue(Idx, DeletePHIIfEmpty); } /// getBasicBlockIndex - Return the first index of the specified basic /// block in the value list for this PHI. Returns -1 if no instance. /// int getBasicBlockIndex(const BasicBlock *BB) const { for (unsigned i = 0, e = getNumOperands(); i != e; ++i) if (block_begin()[i] == BB) return i; return -1; } Value *getIncomingValueForBlock(const BasicBlock *BB) const { int Idx = getBasicBlockIndex(BB); assert(Idx >= 0 && "Invalid basic block argument!"); return getIncomingValue(Idx); } /// hasConstantValue - If the specified PHI node always merges together the /// same value, return the value, otherwise return null. Value *hasConstantValue() const; /// Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return I->getOpcode() == Instruction::PHI; } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } private: void growOperands(); }; template <> struct OperandTraits : public HungoffOperandTraits<2> { }; DEFINE_TRANSPARENT_OPERAND_ACCESSORS(PHINode, Value) //===----------------------------------------------------------------------===// // LandingPadInst Class //===----------------------------------------------------------------------===// //===--------------------------------------------------------------------------- /// LandingPadInst - The landingpad instruction holds all of the information /// necessary to generate correct exception handling. The landingpad instruction /// cannot be moved from the top of a landing pad block, which itself is /// accessible only from the 'unwind' edge of an invoke. This uses the /// SubclassData field in Value to store whether or not the landingpad is a /// cleanup. /// class LandingPadInst : public Instruction { /// ReservedSpace - The number of operands actually allocated. NumOperands is /// the number actually in use. unsigned ReservedSpace; LandingPadInst(const LandingPadInst &LP); public: enum ClauseType { Catch, Filter }; private: void *operator new(size_t, unsigned) = delete; // Allocate space for exactly zero operands. void *operator new(size_t s) { return User::operator new(s); } void growOperands(unsigned Size); void init(unsigned NumReservedValues, const Twine &NameStr); explicit LandingPadInst(Type *RetTy, unsigned NumReservedValues, const Twine &NameStr, Instruction *InsertBefore); explicit LandingPadInst(Type *RetTy, unsigned NumReservedValues, const Twine &NameStr, BasicBlock *InsertAtEnd); protected: // Note: Instruction needs to be a friend here to call cloneImpl. friend class Instruction; LandingPadInst *cloneImpl() const; public: /// Constructors - NumReservedClauses is a hint for the number of incoming /// clauses that this landingpad will have (use 0 if you really have no idea). 
  static LandingPadInst *Create(Type *RetTy, unsigned NumReservedClauses,
                                const Twine &NameStr = "",
                                Instruction *InsertBefore = nullptr);
  static LandingPadInst *Create(Type *RetTy, unsigned NumReservedClauses,
                                const Twine &NameStr, BasicBlock *InsertAtEnd);

  /// Provide fast operand accessors
  DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value);

  /// isCleanup - Return 'true' if this landingpad instruction is a
  /// cleanup. I.e., it should be run when unwinding even if its landing pad
  /// doesn't catch the exception.
  bool isCleanup() const { return getSubclassDataFromInstruction() & 1; }

  /// setCleanup - Indicate that this landingpad instruction is a cleanup.
  void setCleanup(bool V) {
    setInstructionSubclassData((getSubclassDataFromInstruction() & ~1) |
                               (V ? 1 : 0));
  }

  /// Add a catch or filter clause to the landing pad.
  void addClause(Constant *ClauseVal);

  /// Get the value of the clause at index Idx. Use isCatch/isFilter to
  /// determine what type of clause this is.
  Constant *getClause(unsigned Idx) const {
    return cast<Constant>(getOperandList()[Idx]);
  }

  /// isCatch - Return 'true' if the clause at index Idx is a catch clause.
  bool isCatch(unsigned Idx) const {
    return !isa<ArrayType>(getOperandList()[Idx]->getType());
  }

  /// isFilter - Return 'true' if the clause at index Idx is a filter clause.
  bool isFilter(unsigned Idx) const {
    return isa<ArrayType>(getOperandList()[Idx]->getType());
  }

  /// getNumClauses - Get the number of clauses for this landing pad.
  unsigned getNumClauses() const { return getNumOperands(); }

  /// reserveClauses - Grow the size of the operand list to accommodate the new
  /// number of clauses.
  void reserveClauses(unsigned Size) { growOperands(Size); }

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == Instruction::LandingPad;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

template <>
struct OperandTraits<LandingPadInst> : public HungoffOperandTraits<1> {
};

DEFINE_TRANSPARENT_OPERAND_ACCESSORS(LandingPadInst, Value)

//===----------------------------------------------------------------------===//
//                               ReturnInst Class
//===----------------------------------------------------------------------===//

//===---------------------------------------------------------------------------
/// ReturnInst - Return a value (possibly void), from a function.  Execution
/// does not continue in this function any longer.
///
class ReturnInst : public TerminatorInst {
  ReturnInst(const ReturnInst &RI);

private:
  // ReturnInst constructors:
  // ReturnInst()                  - 'ret void' instruction
  // ReturnInst(    null)          - 'ret void' instruction
  // ReturnInst(Value* X)          - 'ret X' instruction
  // ReturnInst(    null, Inst *I) - 'ret void' instruction, insert before I
  // ReturnInst(Value* X, Inst *I) - 'ret X' instruction, insert before I
  // ReturnInst(    null, BB *B)   - 'ret void' instruction, insert @ end of B
  // ReturnInst(Value* X, BB *B)   - 'ret X' instruction, insert @ end of B
  //
  // NOTE: If the Value* passed is of type void then the constructor behaves as
  // if it was passed NULL.
  explicit ReturnInst(LLVMContext &C, Value *retVal = nullptr,
                      Instruction *InsertBefore = nullptr);
  ReturnInst(LLVMContext &C, Value *retVal, BasicBlock *InsertAtEnd);
  explicit ReturnInst(LLVMContext &C, BasicBlock *InsertAtEnd);

protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
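  //
  // (Illustrative usage sketch; Ctx, RV, and BB are hypothetical names.)
  // The Create functions below mirror the constructor table above, e.g.
  //
  //   ReturnInst::Create(Ctx, RV, BB); // 'ret <ty> %rv' at the end of BB
  //   ReturnInst::Create(Ctx, BB);     // 'ret void' at the end of BB
  //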
friend class Instruction; ReturnInst *cloneImpl() const; public: static ReturnInst* Create(LLVMContext &C, Value *retVal = nullptr, Instruction *InsertBefore = nullptr) { return new(!!retVal) ReturnInst(C, retVal, InsertBefore); } static ReturnInst* Create(LLVMContext &C, Value *retVal, BasicBlock *InsertAtEnd) { return new(!!retVal) ReturnInst(C, retVal, InsertAtEnd); } static ReturnInst* Create(LLVMContext &C, BasicBlock *InsertAtEnd) { return new(0) ReturnInst(C, InsertAtEnd); } ~ReturnInst() override; /// Provide fast operand accessors DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value); /// Convenience accessor. Returns null if there is no return value. Value *getReturnValue() const { return getNumOperands() != 0 ? getOperand(0) : nullptr; } unsigned getNumSuccessors() const { return 0; } // Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return (I->getOpcode() == Instruction::Ret); } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } private: BasicBlock *getSuccessorV(unsigned idx) const override; unsigned getNumSuccessorsV() const override; void setSuccessorV(unsigned idx, BasicBlock *B) override; }; template <> struct OperandTraits : public VariadicOperandTraits { }; DEFINE_TRANSPARENT_OPERAND_ACCESSORS(ReturnInst, Value) //===----------------------------------------------------------------------===// // BranchInst Class //===----------------------------------------------------------------------===// //===--------------------------------------------------------------------------- /// BranchInst - Conditional or Unconditional Branch instruction. /// class BranchInst : public TerminatorInst { /// Ops list - Branches are strange. The operands are ordered: /// [Cond, FalseDest,] TrueDest. This makes some accessors faster because /// they don't have to check for cond/uncond branchness. These are mostly /// accessed relative from op_end(). BranchInst(const BranchInst &BI); void AssertOK(); // BranchInst constructors (where {B, T, F} are blocks, and C is a condition): // BranchInst(BB *B) - 'br B' // BranchInst(BB* T, BB *F, Value *C) - 'br C, T, F' // BranchInst(BB* B, Inst *I) - 'br B' insert before I // BranchInst(BB* T, BB *F, Value *C, Inst *I) - 'br C, T, F', insert before I // BranchInst(BB* B, BB *I) - 'br B' insert at end // BranchInst(BB* T, BB *F, Value *C, BB *I) - 'br C, T, F', insert at end explicit BranchInst(BasicBlock *IfTrue, Instruction *InsertBefore = nullptr); BranchInst(BasicBlock *IfTrue, BasicBlock *IfFalse, Value *Cond, Instruction *InsertBefore = nullptr); BranchInst(BasicBlock *IfTrue, BasicBlock *InsertAtEnd); BranchInst(BasicBlock *IfTrue, BasicBlock *IfFalse, Value *Cond, BasicBlock *InsertAtEnd); protected: // Note: Instruction needs to be a friend here to call cloneImpl. 
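  //
  // (Illustrative usage sketch; the blocks and condition are hypothetical
  // names.)  The Create functions below cover both forms from the table
  // above, e.g.
  //
  //   BranchInst::Create(Target, BB);                // 'br label %target'
  //   BranchInst::Create(TrueBB, FalseBB, Cond, BB); // 'br i1 %cond, ...'
  //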
friend class Instruction; BranchInst *cloneImpl() const; public: static BranchInst *Create(BasicBlock *IfTrue, Instruction *InsertBefore = nullptr) { return new(1) BranchInst(IfTrue, InsertBefore); } static BranchInst *Create(BasicBlock *IfTrue, BasicBlock *IfFalse, Value *Cond, Instruction *InsertBefore = nullptr) { return new(3) BranchInst(IfTrue, IfFalse, Cond, InsertBefore); } static BranchInst *Create(BasicBlock *IfTrue, BasicBlock *InsertAtEnd) { return new(1) BranchInst(IfTrue, InsertAtEnd); } static BranchInst *Create(BasicBlock *IfTrue, BasicBlock *IfFalse, Value *Cond, BasicBlock *InsertAtEnd) { return new(3) BranchInst(IfTrue, IfFalse, Cond, InsertAtEnd); } /// Transparently provide more efficient getOperand methods. DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value); bool isUnconditional() const { return getNumOperands() == 1; } bool isConditional() const { return getNumOperands() == 3; } Value *getCondition() const { assert(isConditional() && "Cannot get condition of an uncond branch!"); return Op<-3>(); } void setCondition(Value *V) { assert(isConditional() && "Cannot set condition of unconditional branch!"); Op<-3>() = V; } unsigned getNumSuccessors() const { return 1+isConditional(); } BasicBlock *getSuccessor(unsigned i) const { assert(i < getNumSuccessors() && "Successor # out of range for Branch!"); return cast_or_null((&Op<-1>() - i)->get()); } void setSuccessor(unsigned idx, BasicBlock *NewSucc) { assert(idx < getNumSuccessors() && "Successor # out of range for Branch!"); *(&Op<-1>() - idx) = NewSucc; } /// \brief Swap the successors of this branch instruction. /// /// Swaps the successors of the branch instruction. This also swaps any /// branch weight metadata associated with the instruction so that it /// continues to map correctly to each operand. void swapSuccessors(); // Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return (I->getOpcode() == Instruction::Br); } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } private: BasicBlock *getSuccessorV(unsigned idx) const override; unsigned getNumSuccessorsV() const override; void setSuccessorV(unsigned idx, BasicBlock *B) override; }; template <> struct OperandTraits : public VariadicOperandTraits { }; DEFINE_TRANSPARENT_OPERAND_ACCESSORS(BranchInst, Value) //===----------------------------------------------------------------------===// // SwitchInst Class //===----------------------------------------------------------------------===// //===--------------------------------------------------------------------------- /// SwitchInst - Multiway switch /// class SwitchInst : public TerminatorInst { void *operator new(size_t, unsigned) = delete; unsigned ReservedSpace; // Operand[0] = Value to switch on // Operand[1] = Default basic block destination // Operand[2n ] = Value to match // Operand[2n+1] = BasicBlock to go to on match SwitchInst(const SwitchInst &SI); void init(Value *Value, BasicBlock *Default, unsigned NumReserved); void growOperands(); // allocate space for exactly zero operands void *operator new(size_t s) { return User::operator new(s); } /// SwitchInst ctor - Create a new switch instruction, specifying a value to /// switch on and a default destination. The number of additional cases can /// be specified here to make memory allocation more efficient. This /// constructor can also autoinsert before another instruction. 
SwitchInst(Value *Value, BasicBlock *Default, unsigned NumCases, Instruction *InsertBefore); /// SwitchInst ctor - Create a new switch instruction, specifying a value to /// switch on and a default destination. The number of additional cases can /// be specified here to make memory allocation more efficient. This /// constructor also autoinserts at the end of the specified BasicBlock. SwitchInst(Value *Value, BasicBlock *Default, unsigned NumCases, BasicBlock *InsertAtEnd); protected: // Note: Instruction needs to be a friend here to call cloneImpl. friend class Instruction; SwitchInst *cloneImpl() const; public: // -2 static const unsigned DefaultPseudoIndex = static_cast(~0L-1); template class CaseIteratorT { protected: SwitchInstTy *SI; unsigned Index; public: typedef CaseIteratorT Self; /// Initializes case iterator for given SwitchInst and for given /// case number. CaseIteratorT(SwitchInstTy *SI, unsigned CaseNum) { this->SI = SI; Index = CaseNum; } /// Initializes case iterator for given SwitchInst and for given /// TerminatorInst's successor index. static Self fromSuccessorIndex(SwitchInstTy *SI, unsigned SuccessorIndex) { assert(SuccessorIndex < SI->getNumSuccessors() && "Successor index # out of range!"); return SuccessorIndex != 0 ? Self(SI, SuccessorIndex - 1) : Self(SI, DefaultPseudoIndex); } /// Resolves case value for current case. ConstantIntTy *getCaseValue() { assert(Index < SI->getNumCases() && "Index out the number of cases."); return reinterpret_cast(SI->getOperand(2 + Index*2)); } /// Resolves successor for current case. BasicBlockTy *getCaseSuccessor() { assert((Index < SI->getNumCases() || Index == DefaultPseudoIndex) && "Index out the number of cases."); return SI->getSuccessor(getSuccessorIndex()); } /// Returns number of current case. unsigned getCaseIndex() const { return Index; } /// Returns TerminatorInst's successor index for current case successor. unsigned getSuccessorIndex() const { assert((Index == DefaultPseudoIndex || Index < SI->getNumCases()) && "Index out the number of cases."); return Index != DefaultPseudoIndex ? Index + 1 : 0; } Self operator++() { // Check index correctness after increment. // Note: Index == getNumCases() means end(). assert(Index+1 <= SI->getNumCases() && "Index out the number of cases."); ++Index; return *this; } Self operator++(int) { Self tmp = *this; ++(*this); return tmp; } Self operator--() { // Check index correctness after decrement. // Note: Index == getNumCases() means end(). // Also allow "-1" iterator here. That will became valid after ++. assert((Index == 0 || Index-1 <= SI->getNumCases()) && "Index out the number of cases."); --Index; return *this; } Self operator--(int) { Self tmp = *this; --(*this); return tmp; } bool operator==(const Self& RHS) const { assert(RHS.SI == SI && "Incompatible operators."); return RHS.Index == Index; } bool operator!=(const Self& RHS) const { assert(RHS.SI == SI && "Incompatible operators."); return RHS.Index != Index; } Self &operator*() { return *this; } }; typedef CaseIteratorT ConstCaseIt; class CaseIt : public CaseIteratorT { typedef CaseIteratorT ParentTy; public: CaseIt(const ParentTy &Src) : ParentTy(Src) {} CaseIt(SwitchInst *SI, unsigned CaseNum) : ParentTy(SI, CaseNum) {} /// Sets the new value for current case. void setValue(ConstantInt *V) { assert(Index < SI->getNumCases() && "Index out the number of cases."); SI->setOperand(2 + Index*2, reinterpret_cast(V)); } /// Sets the new successor for current case. 
void setSuccessor(BasicBlock *S) { SI->setSuccessor(getSuccessorIndex(), S); } }; static SwitchInst *Create(Value *Value, BasicBlock *Default, unsigned NumCases, Instruction *InsertBefore = nullptr) { return new SwitchInst(Value, Default, NumCases, InsertBefore); } static SwitchInst *Create(Value *Value, BasicBlock *Default, unsigned NumCases, BasicBlock *InsertAtEnd) { return new SwitchInst(Value, Default, NumCases, InsertAtEnd); } /// Provide fast operand accessors DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value); // Accessor Methods for Switch stmt Value *getCondition() const { return getOperand(0); } void setCondition(Value *V) { setOperand(0, V); } BasicBlock *getDefaultDest() const { return cast(getOperand(1)); } void setDefaultDest(BasicBlock *DefaultCase) { setOperand(1, reinterpret_cast(DefaultCase)); } /// getNumCases - return the number of 'cases' in this switch instruction, /// except the default case unsigned getNumCases() const { return getNumOperands()/2 - 1; } /// Returns a read/write iterator that points to the first /// case in SwitchInst. CaseIt case_begin() { return CaseIt(this, 0); } /// Returns a read-only iterator that points to the first /// case in the SwitchInst. ConstCaseIt case_begin() const { return ConstCaseIt(this, 0); } /// Returns a read/write iterator that points one past the last /// in the SwitchInst. CaseIt case_end() { return CaseIt(this, getNumCases()); } /// Returns a read-only iterator that points one past the last /// in the SwitchInst. ConstCaseIt case_end() const { return ConstCaseIt(this, getNumCases()); } /// cases - iteration adapter for range-for loops. iterator_range cases() { return make_range(case_begin(), case_end()); } /// cases - iteration adapter for range-for loops. iterator_range cases() const { return make_range(case_begin(), case_end()); } /// Returns an iterator that points to the default case. /// Note: this iterator allows to resolve successor only. Attempt /// to resolve case value causes an assertion. /// Also note, that increment and decrement also causes an assertion and /// makes iterator invalid. CaseIt case_default() { return CaseIt(this, DefaultPseudoIndex); } ConstCaseIt case_default() const { return ConstCaseIt(this, DefaultPseudoIndex); } /// findCaseValue - Search all of the case values for the specified constant. /// If it is explicitly handled, return the case iterator of it, otherwise /// return default case iterator to indicate /// that it is handled by the default handler. CaseIt findCaseValue(const ConstantInt *C) { for (CaseIt i = case_begin(), e = case_end(); i != e; ++i) if (i.getCaseValue() == C) return i; return case_default(); } ConstCaseIt findCaseValue(const ConstantInt *C) const { for (ConstCaseIt i = case_begin(), e = case_end(); i != e; ++i) if (i.getCaseValue() == C) return i; return case_default(); } /// findCaseDest - Finds the unique case value for a given successor. Returns /// null if the successor is not found, not unique, or is the default case. ConstantInt *findCaseDest(BasicBlock *BB) { if (BB == getDefaultDest()) return nullptr; ConstantInt *CI = nullptr; for (CaseIt i = case_begin(), e = case_end(); i != e; ++i) { if (i.getCaseSuccessor() == BB) { if (CI) return nullptr; // Multiple cases lead to BB. else CI = i.getCaseValue(); } } return CI; } /// addCase - Add an entry to the switch instruction... /// Note: /// This action invalidates case_end(). Old case_end() iterator will /// point to the added case. 
  void addCase(ConstantInt *OnVal, BasicBlock *Dest);

  /// removeCase - This method removes the specified case and its successor
  /// from the switch instruction. Note that this operation may reorder the
  /// remaining cases at index idx and above.
  /// Note:
  /// This action invalidates iterators for all cases following the one
  /// removed, including the case_end() iterator.
  void removeCase(CaseIt i);

  unsigned getNumSuccessors() const { return getNumOperands()/2; }
  BasicBlock *getSuccessor(unsigned idx) const {
    assert(idx < getNumSuccessors() &&
           "Successor idx out of range for switch!");
    return cast<BasicBlock>(getOperand(idx*2+1));
  }
  void setSuccessor(unsigned idx, BasicBlock *NewSucc) {
    assert(idx < getNumSuccessors() && "Successor # out of range for switch!");
    setOperand(idx * 2 + 1, NewSucc);
  }

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == Instruction::Switch;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }

private:
  BasicBlock *getSuccessorV(unsigned idx) const override;
  unsigned getNumSuccessorsV() const override;
  void setSuccessorV(unsigned idx, BasicBlock *B) override;
};

template <>
struct OperandTraits<SwitchInst> : public HungoffOperandTraits<2> {
};

DEFINE_TRANSPARENT_OPERAND_ACCESSORS(SwitchInst, Value)

//===----------------------------------------------------------------------===//
//                             IndirectBrInst Class
//===----------------------------------------------------------------------===//

//===---------------------------------------------------------------------------
/// IndirectBrInst - Indirect Branch Instruction.
///
class IndirectBrInst : public TerminatorInst {
  void *operator new(size_t, unsigned) = delete;
  unsigned ReservedSpace;

  // Operand[0]   = Address to jump through
  // Operand[i+1] = i-th destination BasicBlock
  IndirectBrInst(const IndirectBrInst &IBI);
  void init(Value *Address, unsigned NumDests);
  void growOperands();

  // allocate space for exactly zero operands
  void *operator new(size_t s) {
    return User::operator new(s);
  }

  /// IndirectBrInst ctor - Create a new indirectbr instruction, specifying an
  /// Address to jump to. The number of expected destinations can be specified
  /// here to make memory allocation more efficient. This constructor can also
  /// autoinsert before another instruction.
  IndirectBrInst(Value *Address, unsigned NumDests, Instruction *InsertBefore);

  /// IndirectBrInst ctor - Create a new indirectbr instruction, specifying an
  /// Address to jump to. The number of expected destinations can be specified
  /// here to make memory allocation more efficient. This constructor also
  /// autoinserts at the end of the specified BasicBlock.
  IndirectBrInst(Value *Address, unsigned NumDests, BasicBlock *InsertAtEnd);

protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  IndirectBrInst *cloneImpl() const;

public:
  static IndirectBrInst *Create(Value *Address, unsigned NumDests,
                                Instruction *InsertBefore = nullptr) {
    return new IndirectBrInst(Address, NumDests, InsertBefore);
  }
  static IndirectBrInst *Create(Value *Address, unsigned NumDests,
                                BasicBlock *InsertAtEnd) {
    return new IndirectBrInst(Address, NumDests, InsertAtEnd);
  }

  /// Provide fast operand accessors.
  DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value);

  // Accessor Methods for IndirectBrInst instruction.
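  // A usage sketch (assuming F is an existing Function* and TargetBB a block
  // inside it):
  //
  //   Value *Addr = BlockAddress::get(F, TargetBB);
  //   IndirectBrInst *IBI = IndirectBrInst::Create(Addr, /*NumDests=*/1);
  //   IBI->addDestination(TargetBB);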
  Value *getAddress() { return getOperand(0); }
  const Value *getAddress() const { return getOperand(0); }
  void setAddress(Value *V) { setOperand(0, V); }

  /// getNumDestinations - return the number of possible destinations in this
  /// indirectbr instruction.
  unsigned getNumDestinations() const { return getNumOperands()-1; }

  /// getDestination - Return the specified destination.
  BasicBlock *getDestination(unsigned i) { return getSuccessor(i); }
  const BasicBlock *getDestination(unsigned i) const { return getSuccessor(i); }

  /// addDestination - Add a destination.
  ///
  void addDestination(BasicBlock *Dest);

  /// removeDestination - This method removes the specified successor from the
  /// indirectbr instruction.
  void removeDestination(unsigned i);

  unsigned getNumSuccessors() const { return getNumOperands()-1; }
  BasicBlock *getSuccessor(unsigned i) const {
    return cast<BasicBlock>(getOperand(i+1));
  }
  void setSuccessor(unsigned i, BasicBlock *NewSucc) {
    setOperand(i + 1, NewSucc);
  }

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == Instruction::IndirectBr;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }

private:
  BasicBlock *getSuccessorV(unsigned idx) const override;
  unsigned getNumSuccessorsV() const override;
  void setSuccessorV(unsigned idx, BasicBlock *B) override;
};

template <>
struct OperandTraits<IndirectBrInst> : public HungoffOperandTraits<1> {
};

DEFINE_TRANSPARENT_OPERAND_ACCESSORS(IndirectBrInst, Value)

//===----------------------------------------------------------------------===//
//                               InvokeInst Class
//===----------------------------------------------------------------------===//

/// InvokeInst - Invoke instruction. The SubclassData field is used to hold the
/// calling convention of the call.
///
class InvokeInst : public TerminatorInst,
                   public OperandBundleUser<InvokeInst, User::op_iterator> {
  AttributeSet AttributeList;
  FunctionType *FTy;

  InvokeInst(const InvokeInst &BI);

  void init(Value *Func, BasicBlock *IfNormal, BasicBlock *IfException,
            ArrayRef<Value *> Args, ArrayRef<OperandBundleDef> Bundles,
            const Twine &NameStr) {
    init(cast<FunctionType>(
             cast<PointerType>(Func->getType())->getElementType()),
         Func, IfNormal, IfException, Args, Bundles, NameStr);
  }
  void init(FunctionType *FTy, Value *Func, BasicBlock *IfNormal,
            BasicBlock *IfException, ArrayRef<Value *> Args,
            ArrayRef<OperandBundleDef> Bundles, const Twine &NameStr);

  /// Construct an InvokeInst given a range of arguments.
  ///
  /// \brief Construct an InvokeInst from a range of arguments
  inline InvokeInst(Value *Func, BasicBlock *IfNormal, BasicBlock *IfException,
                    ArrayRef<Value *> Args, ArrayRef<OperandBundleDef> Bundles,
                    unsigned Values, const Twine &NameStr,
                    Instruction *InsertBefore)
      : InvokeInst(cast<FunctionType>(
                       cast<PointerType>(Func->getType())->getElementType()),
                   Func, IfNormal, IfException, Args, Bundles, Values, NameStr,
                   InsertBefore) {}

  inline InvokeInst(FunctionType *Ty, Value *Func, BasicBlock *IfNormal,
                    BasicBlock *IfException, ArrayRef<Value *> Args,
                    ArrayRef<OperandBundleDef> Bundles, unsigned Values,
                    const Twine &NameStr, Instruction *InsertBefore);

  /// Construct an InvokeInst given a range of arguments.
  ///
  /// \brief Construct an InvokeInst from a range of arguments
  inline InvokeInst(Value *Func, BasicBlock *IfNormal, BasicBlock *IfException,
                    ArrayRef<Value *> Args, ArrayRef<OperandBundleDef> Bundles,
                    unsigned Values, const Twine &NameStr,
                    BasicBlock *InsertAtEnd);

  friend class OperandBundleUser<InvokeInst, User::op_iterator>;
  bool hasDescriptor() const { return HasDescriptor; }

protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  InvokeInst *cloneImpl() const;

public:
  static InvokeInst *Create(Value *Func, BasicBlock *IfNormal,
                            BasicBlock *IfException, ArrayRef<Value *> Args,
                            const Twine &NameStr,
                            Instruction *InsertBefore = nullptr) {
    return Create(cast<FunctionType>(
                      cast<PointerType>(Func->getType())->getElementType()),
                  Func, IfNormal, IfException, Args, None, NameStr,
                  InsertBefore);
  }
  static InvokeInst *Create(Value *Func, BasicBlock *IfNormal,
                            BasicBlock *IfException, ArrayRef<Value *> Args,
                            ArrayRef<OperandBundleDef> Bundles = None,
                            const Twine &NameStr = "",
                            Instruction *InsertBefore = nullptr) {
    return Create(cast<FunctionType>(
                      cast<PointerType>(Func->getType())->getElementType()),
                  Func, IfNormal, IfException, Args, Bundles, NameStr,
                  InsertBefore);
  }
  static InvokeInst *Create(FunctionType *Ty, Value *Func,
                            BasicBlock *IfNormal, BasicBlock *IfException,
                            ArrayRef<Value *> Args, const Twine &NameStr,
                            Instruction *InsertBefore = nullptr) {
    unsigned Values = unsigned(Args.size()) + 3;
    return new (Values) InvokeInst(Ty, Func, IfNormal, IfException, Args, None,
                                   Values, NameStr, InsertBefore);
  }
  static InvokeInst *Create(FunctionType *Ty, Value *Func,
                            BasicBlock *IfNormal, BasicBlock *IfException,
                            ArrayRef<Value *> Args,
                            ArrayRef<OperandBundleDef> Bundles = None,
                            const Twine &NameStr = "",
                            Instruction *InsertBefore = nullptr) {
    unsigned Values = unsigned(Args.size()) + CountBundleInputs(Bundles) + 3;
    unsigned DescriptorBytes = Bundles.size() * sizeof(BundleOpInfo);

    return new (Values, DescriptorBytes)
        InvokeInst(Ty, Func, IfNormal, IfException, Args, Bundles, Values,
                   NameStr, InsertBefore);
  }
  static InvokeInst *Create(Value *Func, BasicBlock *IfNormal,
                            BasicBlock *IfException, ArrayRef<Value *> Args,
                            const Twine &NameStr, BasicBlock *InsertAtEnd) {
    unsigned Values = unsigned(Args.size()) + 3;
    return new (Values) InvokeInst(Func, IfNormal, IfException, Args, None,
                                   Values, NameStr, InsertAtEnd);
  }
  static InvokeInst *Create(Value *Func, BasicBlock *IfNormal,
                            BasicBlock *IfException, ArrayRef<Value *> Args,
                            ArrayRef<OperandBundleDef> Bundles,
                            const Twine &NameStr, BasicBlock *InsertAtEnd) {
    unsigned Values = unsigned(Args.size()) + CountBundleInputs(Bundles) + 3;
    unsigned DescriptorBytes = Bundles.size() * sizeof(BundleOpInfo);

    return new (Values, DescriptorBytes)
        InvokeInst(Func, IfNormal, IfException, Args, Bundles, Values, NameStr,
                   InsertAtEnd);
  }

  /// \brief Create a clone of \p II with a different set of operand bundles
  /// and insert it before \p InsertPt.
  ///
  /// The returned invoke instruction is identical to \p II in every way except
  /// that the operand bundles for the new instruction are set to the operand
  /// bundles in \p Bundles.
  static InvokeInst *Create(InvokeInst *II, ArrayRef<OperandBundleDef> Bundles,
                            Instruction *InsertPt = nullptr);

  /// Provide fast operand accessors
  DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value);

  FunctionType *getFunctionType() const { return FTy; }

  void mutateFunctionType(FunctionType *FTy) {
    mutateType(FTy->getReturnType());
    this->FTy = FTy;
  }

  /// getNumArgOperands - Return the number of invoke arguments.
  ///
  unsigned getNumArgOperands() const {
    return getNumOperands() - getNumTotalBundleOperands() - 3;
  }

  /// getArgOperand/setArgOperand - Return/set the i-th invoke argument.
  ///
  Value *getArgOperand(unsigned i) const {
    assert(i < getNumArgOperands() && "Out of bounds!");
    return getOperand(i);
  }
  void setArgOperand(unsigned i, Value *v) {
    assert(i < getNumArgOperands() && "Out of bounds!");
    setOperand(i, v);
  }

  /// \brief Return the iterator pointing to the beginning of the argument
  /// list.
  op_iterator arg_begin() { return op_begin(); }

  /// \brief Return the iterator pointing to the end of the argument list.
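  /// A usage sketch (assuming II is an existing InvokeInst): iterate the
  /// plain call arguments, skipping bundle operands and the three trailing
  /// operands (callee and the two successors):
  /// \code
  ///   for (Value *Arg : II->arg_operands())
  ///     Arg->dump();
  /// \endcode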
  op_iterator arg_end() {
    // [ invoke args ], [ operand bundles ], normal dest, unwind dest, callee
    return op_end() - getNumTotalBundleOperands() - 3;
  }

  /// \brief Iteration adapter for range-for loops.
  iterator_range<op_iterator> arg_operands() {
    return make_range(arg_begin(), arg_end());
  }

  /// \brief Return the iterator pointing to the beginning of the argument
  /// list.
  const_op_iterator arg_begin() const { return op_begin(); }

  /// \brief Return the iterator pointing to the end of the argument list.
  const_op_iterator arg_end() const {
    // [ invoke args ], [ operand bundles ], normal dest, unwind dest, callee
    return op_end() - getNumTotalBundleOperands() - 3;
  }

  /// \brief Iteration adapter for range-for loops.
  iterator_range<const_op_iterator> arg_operands() const {
    return make_range(arg_begin(), arg_end());
  }

  /// \brief Wrappers for getting the \c Use of an invoke argument.
  const Use &getArgOperandUse(unsigned i) const {
    assert(i < getNumArgOperands() && "Out of bounds!");
    return getOperandUse(i);
  }
  Use &getArgOperandUse(unsigned i) {
    assert(i < getNumArgOperands() && "Out of bounds!");
    return getOperandUse(i);
  }

  /// getCallingConv/setCallingConv - Get or set the calling convention of this
  /// function call.
  CallingConv::ID getCallingConv() const {
    return static_cast<CallingConv::ID>(getSubclassDataFromInstruction());
  }
  void setCallingConv(CallingConv::ID CC) {
    auto ID = static_cast<unsigned>(CC);
    assert(!(ID & ~CallingConv::MaxID) && "Unsupported calling convention");
    setInstructionSubclassData(ID);
  }

  /// getAttributes - Return the parameter attributes for this invoke.
  ///
  const AttributeSet &getAttributes() const { return AttributeList; }

  /// setAttributes - Set the parameter attributes for this invoke.
  ///
  void setAttributes(const AttributeSet &Attrs) { AttributeList = Attrs; }

  /// addAttribute - adds the attribute to the list of attributes.
  void addAttribute(unsigned i, Attribute::AttrKind attr);

  /// removeAttribute - removes the attribute from the list of attributes.
  void removeAttribute(unsigned i, Attribute attr);

  /// \brief adds the dereferenceable attribute to the list of attributes.
  void addDereferenceableAttr(unsigned i, uint64_t Bytes);

  /// \brief adds the dereferenceable_or_null attribute to the list of
  /// attributes.
  void addDereferenceableOrNullAttr(unsigned i, uint64_t Bytes);

  /// \brief Determine whether this call has the given attribute.
  bool hasFnAttr(Attribute::AttrKind A) const {
    assert(A != Attribute::NoBuiltin &&
           "Use CallInst::isNoBuiltin() to check for Attribute::NoBuiltin");
    return hasFnAttrImpl(A);
  }

  /// \brief Determine whether this call has the given attribute.
  bool hasFnAttr(StringRef A) const {
    return hasFnAttrImpl(A);
  }

  /// \brief Determine whether the call or the callee has the given attributes.
  bool paramHasAttr(unsigned i, Attribute::AttrKind A) const;

  /// \brief Return true if the data operand at index \p i has the attribute
  /// \p A.
  ///
  /// Data operands include invoke arguments and values used in operand
  /// bundles, but do not include the invokee operand or the two successor
  /// blocks. This routine dispatches to the underlying AttributeList or the
  /// OperandBundleUser as appropriate.
  ///
  /// The index \p i is interpreted as
  ///
  ///  \p i == Attribute::ReturnIndex  ->  the return value
  ///  \p i in [1, arg_size + 1)  ->  argument number (\p i - 1)
  ///  \p i in [arg_size + 1, data_operand_size + 1)  ->  bundle operand at
  ///     index (\p i - 1) in the operand list.
  bool dataOperandHasImpliedAttr(unsigned i, Attribute::AttrKind A) const;

  /// \brief Extract the alignment for a call or parameter (0=unknown).
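  /// A usage sketch (assuming II is an existing InvokeInst); attribute
  /// indices are 1-based for parameters, with 0 denoting the return value:
  /// \code
  ///   if (unsigned Align = II->getParamAlignment(1))
  ///     ; // the first argument carries an explicit alignment
  /// \endcode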
  unsigned getParamAlignment(unsigned i) const {
    return AttributeList.getParamAlignment(i);
  }

  /// \brief Extract the number of dereferenceable bytes for a call or
  /// parameter (0=unknown).
  uint64_t getDereferenceableBytes(unsigned i) const {
    return AttributeList.getDereferenceableBytes(i);
  }

  /// \brief Extract the number of dereferenceable_or_null bytes for a call or
  /// parameter (0=unknown).
  uint64_t getDereferenceableOrNullBytes(unsigned i) const {
    return AttributeList.getDereferenceableOrNullBytes(i);
  }

  /// @brief Determine if the parameter or return value is marked with the
  /// NoAlias attribute.
  /// @param n The parameter to check. 1 is the first parameter, 0 is the
  ///          return value.
  bool doesNotAlias(unsigned n) const {
    return AttributeList.hasAttribute(n, Attribute::NoAlias);
  }

  /// \brief Return true if the call should not be treated as a call to a
  /// builtin.
  bool isNoBuiltin() const {
    // We assert in hasFnAttr if one passes in Attribute::NoBuiltin, so we have
    // to check it by hand.
    return hasFnAttrImpl(Attribute::NoBuiltin) &&
           !hasFnAttrImpl(Attribute::Builtin);
  }

  /// \brief Return true if the call should not be inlined.
  bool isNoInline() const { return hasFnAttr(Attribute::NoInline); }
  void setIsNoInline() {
    addAttribute(AttributeSet::FunctionIndex, Attribute::NoInline);
  }

  /// \brief Determine if the call does not access memory.
  bool doesNotAccessMemory() const {
    return hasFnAttr(Attribute::ReadNone);
  }
  void setDoesNotAccessMemory() {
    addAttribute(AttributeSet::FunctionIndex, Attribute::ReadNone);
  }

  /// \brief Determine if the call does not access or only reads memory.
  bool onlyReadsMemory() const {
    return doesNotAccessMemory() || hasFnAttr(Attribute::ReadOnly);
  }
  void setOnlyReadsMemory() {
    addAttribute(AttributeSet::FunctionIndex, Attribute::ReadOnly);
  }

  /// @brief Determine if the call accesses memory only through its pointer
  /// arguments.
  bool onlyAccessesArgMemory() const {
    return hasFnAttr(Attribute::ArgMemOnly);
  }
  void setOnlyAccessesArgMemory() {
    addAttribute(AttributeSet::FunctionIndex, Attribute::ArgMemOnly);
  }

  /// \brief Determine if the call cannot return.
  bool doesNotReturn() const { return hasFnAttr(Attribute::NoReturn); }
  void setDoesNotReturn() {
    addAttribute(AttributeSet::FunctionIndex, Attribute::NoReturn);
  }

  /// \brief Determine if the call cannot unwind.
  bool doesNotThrow() const { return hasFnAttr(Attribute::NoUnwind); }
  void setDoesNotThrow() {
    addAttribute(AttributeSet::FunctionIndex, Attribute::NoUnwind);
  }

  /// \brief Determine if the invoke cannot be duplicated.
  bool cannotDuplicate() const { return hasFnAttr(Attribute::NoDuplicate); }
  void setCannotDuplicate() {
    addAttribute(AttributeSet::FunctionIndex, Attribute::NoDuplicate);
  }

  /// \brief Determine if the call returns a structure through its first
  /// pointer argument.
  bool hasStructRetAttr() const {
    if (getNumArgOperands() == 0)
      return false;

    // Be friendly and also check the callee.
    return paramHasAttr(1, Attribute::StructRet);
  }

  /// \brief Determine if any call argument is an aggregate passed by value.
  bool hasByValArgument() const {
    return AttributeList.hasAttrSomewhere(Attribute::ByVal);
  }

  /// getCalledFunction - Return the function called, or null if this is an
  /// indirect function invocation.
  ///
  Function *getCalledFunction() const {
    return dyn_cast<Function>(Op<-3>());
  }

  /// getCalledValue - Get a pointer to the function that is invoked by this
  /// instruction
  const Value *getCalledValue() const { return Op<-3>(); }
        Value *getCalledValue()       { return Op<-3>(); }

  /// setCalledFunction - Set the function called.
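  /// A usage sketch (assuming II is an existing InvokeInst and NewCallee a
  /// Function whose type matches the current callee's signature):
  /// \code
  ///   II->setCalledFunction(NewCallee);
  /// \endcode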
  void setCalledFunction(Value* Fn) {
    setCalledFunction(
        cast<FunctionType>(
            cast<PointerType>(Fn->getType())->getElementType()),
        Fn);
  }
  void setCalledFunction(FunctionType *FTy, Value *Fn) {
    this->FTy = FTy;
    assert(FTy == cast<FunctionType>(
                      cast<PointerType>(Fn->getType())->getElementType()));
    Op<-3>() = Fn;
  }

  // get*Dest - Return the destination basic blocks...
  BasicBlock *getNormalDest() const {
    return cast<BasicBlock>(Op<-2>());
  }
  BasicBlock *getUnwindDest() const {
    return cast<BasicBlock>(Op<-1>());
  }
  void setNormalDest(BasicBlock *B) {
    Op<-2>() = reinterpret_cast<Value*>(B);
  }
  void setUnwindDest(BasicBlock *B) {
    Op<-1>() = reinterpret_cast<Value*>(B);
  }

  /// getLandingPadInst - Get the landingpad instruction from the landing pad
  /// block (the unwind destination).
  LandingPadInst *getLandingPadInst() const;

  BasicBlock *getSuccessor(unsigned i) const {
    assert(i < 2 && "Successor # out of range for invoke!");
    return i == 0 ? getNormalDest() : getUnwindDest();
  }

  void setSuccessor(unsigned idx, BasicBlock *NewSucc) {
    assert(idx < 2 && "Successor # out of range for invoke!");
    *(&Op<-2>() + idx) = reinterpret_cast<Value*>(NewSucc);
  }

  unsigned getNumSuccessors() const { return 2; }

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return (I->getOpcode() == Instruction::Invoke);
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }

private:
  BasicBlock *getSuccessorV(unsigned idx) const override;
  unsigned getNumSuccessorsV() const override;
  void setSuccessorV(unsigned idx, BasicBlock *B) override;

  template <typename AttrKind> bool hasFnAttrImpl(AttrKind A) const {
    if (AttributeList.hasAttribute(AttributeSet::FunctionIndex, A))
      return true;

    // Operand bundles override attributes on the called function, but don't
    // override attributes directly present on the invoke instruction.
    if (isFnAttrDisallowedByOpBundle(A))
      return false;

    if (const Function *F = getCalledFunction())
      return F->getAttributes().hasAttribute(AttributeSet::FunctionIndex, A);
    return false;
  }

  // Shadow Instruction::setInstructionSubclassData with a private forwarding
  // method so that subclasses cannot accidentally use it.
  void setInstructionSubclassData(unsigned short D) {
    Instruction::setInstructionSubclassData(D);
  }
};

template <>
struct OperandTraits<InvokeInst>
    : public VariadicOperandTraits<InvokeInst, 3> {
};

InvokeInst::InvokeInst(FunctionType *Ty, Value *Func, BasicBlock *IfNormal,
                       BasicBlock *IfException, ArrayRef<Value *> Args,
                       ArrayRef<OperandBundleDef> Bundles, unsigned Values,
                       const Twine &NameStr, Instruction *InsertBefore)
    : TerminatorInst(Ty->getReturnType(), Instruction::Invoke,
                     OperandTraits<InvokeInst>::op_end(this) - Values, Values,
                     InsertBefore) {
  init(Ty, Func, IfNormal, IfException, Args, Bundles, NameStr);
}

InvokeInst::InvokeInst(Value *Func, BasicBlock *IfNormal,
                       BasicBlock *IfException, ArrayRef<Value *> Args,
                       ArrayRef<OperandBundleDef> Bundles, unsigned Values,
                       const Twine &NameStr, BasicBlock *InsertAtEnd)
    : TerminatorInst(
          cast<FunctionType>(cast<PointerType>(Func->getType())
                                 ->getElementType())->getReturnType(),
          Instruction::Invoke,
          OperandTraits<InvokeInst>::op_end(this) - Values, Values,
          InsertAtEnd) {
  init(Func, IfNormal, IfException, Args, Bundles, NameStr);
}

DEFINE_TRANSPARENT_OPERAND_ACCESSORS(InvokeInst, Value)

//===----------------------------------------------------------------------===//
//                              ResumeInst Class
//===----------------------------------------------------------------------===//

//===---------------------------------------------------------------------------
/// ResumeInst - Resume the propagation of an exception.
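///
/// A usage sketch (assuming Exn is the exception aggregate produced by a
/// landingpad and BB the block being terminated):
/// \code
///   ResumeInst::Create(Exn, /*InsertAtEnd=*/BB);
/// \endcode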
///
class ResumeInst : public TerminatorInst {
  ResumeInst(const ResumeInst &RI);

  explicit ResumeInst(Value *Exn, Instruction *InsertBefore=nullptr);
  ResumeInst(Value *Exn, BasicBlock *InsertAtEnd);

protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  ResumeInst *cloneImpl() const;

public:
  static ResumeInst *Create(Value *Exn, Instruction *InsertBefore = nullptr) {
    return new(1) ResumeInst(Exn, InsertBefore);
  }
  static ResumeInst *Create(Value *Exn, BasicBlock *InsertAtEnd) {
    return new(1) ResumeInst(Exn, InsertAtEnd);
  }

  /// Provide fast operand accessors
  DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value);

  /// Convenience accessor.
  Value *getValue() const { return Op<0>(); }

  unsigned getNumSuccessors() const { return 0; }

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == Instruction::Resume;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }

private:
  BasicBlock *getSuccessorV(unsigned idx) const override;
  unsigned getNumSuccessorsV() const override;
  void setSuccessorV(unsigned idx, BasicBlock *B) override;
};

template <>
struct OperandTraits<ResumeInst>
    : public FixedNumOperandTraits<ResumeInst, 1> {
};

DEFINE_TRANSPARENT_OPERAND_ACCESSORS(ResumeInst, Value)

//===----------------------------------------------------------------------===//
//                          CatchSwitchInst Class
//===----------------------------------------------------------------------===//
class CatchSwitchInst : public TerminatorInst {
  void *operator new(size_t, unsigned) = delete;

  /// ReservedSpace - The number of operands actually allocated. NumOperands
  /// is the number actually in use.
  unsigned ReservedSpace;

  // Operand[0] = Outer scope
  // Operand[1] = Unwind block destination
  // Operand[n] = BasicBlock to go to on match
  CatchSwitchInst(const CatchSwitchInst &CSI);
  void init(Value *ParentPad, BasicBlock *UnwindDest, unsigned NumReserved);
  void growOperands(unsigned Size);

  // allocate space for exactly zero operands
  void *operator new(size_t s) { return User::operator new(s); }

  /// CatchSwitchInst ctor - Create a new switch instruction, specifying a
  /// default destination. The number of additional handlers can be specified
  /// here to make memory allocation more efficient.
  /// This constructor can also autoinsert before another instruction.
  CatchSwitchInst(Value *ParentPad, BasicBlock *UnwindDest,
                  unsigned NumHandlers, const Twine &NameStr,
                  Instruction *InsertBefore);

  /// CatchSwitchInst ctor - Create a new switch instruction, specifying a
  /// default destination. The number of additional handlers can be specified
  /// here to make memory allocation more efficient.
  /// This constructor also autoinserts at the end of the specified BasicBlock.
  CatchSwitchInst(Value *ParentPad, BasicBlock *UnwindDest,
                  unsigned NumHandlers, const Twine &NameStr,
                  BasicBlock *InsertAtEnd);

protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  CatchSwitchInst *cloneImpl() const;

public:
  static CatchSwitchInst *Create(Value *ParentPad, BasicBlock *UnwindDest,
                                 unsigned NumHandlers,
                                 const Twine &NameStr = "",
                                 Instruction *InsertBefore = nullptr) {
    return new CatchSwitchInst(ParentPad, UnwindDest, NumHandlers, NameStr,
                               InsertBefore);
  }
  static CatchSwitchInst *Create(Value *ParentPad, BasicBlock *UnwindDest,
                                 unsigned NumHandlers, const Twine &NameStr,
                                 BasicBlock *InsertAtEnd) {
    return new CatchSwitchInst(ParentPad, UnwindDest, NumHandlers, NameStr,
                               InsertAtEnd);
  }

  /// Provide fast operand accessors
  DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value);

  // Accessor Methods for CatchSwitch stmt
  Value *getParentPad() const { return getOperand(0); }
  void setParentPad(Value *ParentPad) {
    setOperand(0, ParentPad);
  }

  // Accessor Methods for CatchSwitch stmt
  bool hasUnwindDest() const { return getSubclassDataFromInstruction() & 1; }
  bool unwindsToCaller() const { return !hasUnwindDest(); }
  BasicBlock *getUnwindDest() const {
    if (hasUnwindDest())
      return cast<BasicBlock>(getOperand(1));
    return nullptr;
  }
  void setUnwindDest(BasicBlock *UnwindDest) {
    assert(UnwindDest);
    assert(hasUnwindDest());
    setOperand(1, UnwindDest);
  }

  /// getNumHandlers - return the number of 'handlers' in this catchswitch
  /// instruction, except the default handler
  unsigned getNumHandlers() const {
    if (hasUnwindDest())
      return getNumOperands() - 2;
    return getNumOperands() - 1;
  }

private:
  static BasicBlock *handler_helper(Value *V) { return cast<BasicBlock>(V); }
  static const BasicBlock *handler_helper(const Value *V) {
    return cast<BasicBlock>(V);
  }

public:
  typedef std::pointer_to_unary_function<Value *, BasicBlock *> DerefFnTy;
  typedef mapped_iterator<op_iterator, DerefFnTy> handler_iterator;
  typedef iterator_range<handler_iterator> handler_range;
  typedef std::pointer_to_unary_function<const Value *, const BasicBlock *>
      ConstDerefFnTy;
  typedef mapped_iterator<const_op_iterator, ConstDerefFnTy>
      const_handler_iterator;
  typedef iterator_range<const_handler_iterator> const_handler_range;

  /// Returns an iterator that points to the first handler in CatchSwitchInst.
  handler_iterator handler_begin() {
    op_iterator It = op_begin() + 1;
    if (hasUnwindDest())
      ++It;
    return handler_iterator(It, DerefFnTy(handler_helper));
  }

  /// Returns an iterator that points to the first handler in the
  /// CatchSwitchInst.
  const_handler_iterator handler_begin() const {
    const_op_iterator It = op_begin() + 1;
    if (hasUnwindDest())
      ++It;
    return const_handler_iterator(It, ConstDerefFnTy(handler_helper));
  }

  /// Returns a read/write iterator that points one past the last
  /// handler in the CatchSwitchInst.
  handler_iterator handler_end() {
    return handler_iterator(op_end(), DerefFnTy(handler_helper));
  }

  /// Returns an iterator that points one past the last handler in the
  /// CatchSwitchInst.
  const_handler_iterator handler_end() const {
    return const_handler_iterator(op_end(), ConstDerefFnTy(handler_helper));
  }

  /// handlers - iteration adapter for range-for loops.
  handler_range handlers() {
    return make_range(handler_begin(), handler_end());
  }

  /// handlers - iteration adapter for range-for loops.
  const_handler_range handlers() const {
    return make_range(handler_begin(), handler_end());
  }

  /// addHandler - Add an entry to the switch instruction.
  /// Note:
  /// This action invalidates handler_end(). The old handler_end() iterator
  /// will point to the added handler.
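  /// A usage sketch (assuming ParentPad, UnwindBB and HandlerBB already
  /// exist in the caller's context):
  /// \code
  ///   auto *CS = CatchSwitchInst::Create(ParentPad, UnwindBB,
  ///                                      /*NumHandlers=*/1);
  ///   CS->addHandler(HandlerBB);
  /// \endcode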
  void addHandler(BasicBlock *Dest);

  void removeHandler(handler_iterator HI);

  unsigned getNumSuccessors() const { return getNumOperands() - 1; }
  BasicBlock *getSuccessor(unsigned Idx) const {
    assert(Idx < getNumSuccessors() &&
           "Successor # out of range for catchswitch!");
    return cast<BasicBlock>(getOperand(Idx + 1));
  }
  void setSuccessor(unsigned Idx, BasicBlock *NewSucc) {
    assert(Idx < getNumSuccessors() &&
           "Successor # out of range for catchswitch!");
    setOperand(Idx + 1, NewSucc);
  }

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == Instruction::CatchSwitch;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }

private:
  BasicBlock *getSuccessorV(unsigned Idx) const override;
  unsigned getNumSuccessorsV() const override;
  void setSuccessorV(unsigned Idx, BasicBlock *B) override;
};

template <>
struct OperandTraits<CatchSwitchInst> : public HungoffOperandTraits<2> {};

DEFINE_TRANSPARENT_OPERAND_ACCESSORS(CatchSwitchInst, Value)

//===----------------------------------------------------------------------===//
//                           CleanupPadInst Class
//===----------------------------------------------------------------------===//
class CleanupPadInst : public FuncletPadInst {
private:
  explicit CleanupPadInst(Value *ParentPad, ArrayRef<Value *> Args,
                          unsigned Values, const Twine &NameStr,
                          Instruction *InsertBefore)
      : FuncletPadInst(Instruction::CleanupPad, ParentPad, Args, Values,
                       NameStr, InsertBefore) {}
  explicit CleanupPadInst(Value *ParentPad, ArrayRef<Value *> Args,
                          unsigned Values, const Twine &NameStr,
                          BasicBlock *InsertAtEnd)
      : FuncletPadInst(Instruction::CleanupPad, ParentPad, Args, Values,
                       NameStr, InsertAtEnd) {}

public:
  static CleanupPadInst *Create(Value *ParentPad,
                                ArrayRef<Value *> Args = None,
                                const Twine &NameStr = "",
                                Instruction *InsertBefore = nullptr) {
    unsigned Values = 1 + Args.size();
    return new (Values)
        CleanupPadInst(ParentPad, Args, Values, NameStr, InsertBefore);
  }
  static CleanupPadInst *Create(Value *ParentPad, ArrayRef<Value *> Args,
                                const Twine &NameStr,
                                BasicBlock *InsertAtEnd) {
    unsigned Values = 1 + Args.size();
    return new (Values)
        CleanupPadInst(ParentPad, Args, Values, NameStr, InsertAtEnd);
  }

  /// \brief Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == Instruction::CleanupPad;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

//===----------------------------------------------------------------------===//
//                           CatchPadInst Class
//===----------------------------------------------------------------------===//
class CatchPadInst : public FuncletPadInst {
private:
  explicit CatchPadInst(Value *CatchSwitch, ArrayRef<Value *> Args,
                        unsigned Values, const Twine &NameStr,
                        Instruction *InsertBefore)
      : FuncletPadInst(Instruction::CatchPad, CatchSwitch, Args, Values,
                       NameStr, InsertBefore) {}
  explicit CatchPadInst(Value *CatchSwitch, ArrayRef<Value *> Args,
                        unsigned Values, const Twine &NameStr,
                        BasicBlock *InsertAtEnd)
      : FuncletPadInst(Instruction::CatchPad, CatchSwitch, Args, Values,
                       NameStr, InsertAtEnd) {}

public:
  static CatchPadInst *Create(Value *CatchSwitch, ArrayRef<Value *> Args,
                              const Twine &NameStr = "",
                              Instruction *InsertBefore = nullptr) {
    unsigned Values = 1 + Args.size();
    return new (Values)
        CatchPadInst(CatchSwitch, Args, Values, NameStr, InsertBefore);
  }
  static CatchPadInst *Create(Value *CatchSwitch, ArrayRef<Value *> Args,
                              const Twine &NameStr, BasicBlock *InsertAtEnd) {
    unsigned Values = 1 + Args.size();
    return new (Values)
        CatchPadInst(CatchSwitch, Args, Values, NameStr, InsertAtEnd);
  }

  /// Convenience accessors
  CatchSwitchInst *getCatchSwitch() const {
    return cast<CatchSwitchInst>(Op<-1>());
  }
  void setCatchSwitch(Value *CatchSwitch) {
    assert(CatchSwitch);
    Op<-1>() = CatchSwitch;
  }

  /// \brief Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == Instruction::CatchPad;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

//===----------------------------------------------------------------------===//
//                           CatchReturnInst Class
//===----------------------------------------------------------------------===//

class CatchReturnInst : public TerminatorInst {
  CatchReturnInst(const CatchReturnInst &RI);

  void init(Value *CatchPad, BasicBlock *BB);
  CatchReturnInst(Value *CatchPad, BasicBlock *BB, Instruction *InsertBefore);
  CatchReturnInst(Value *CatchPad, BasicBlock *BB, BasicBlock *InsertAtEnd);

protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  CatchReturnInst *cloneImpl() const;

public:
  static CatchReturnInst *Create(Value *CatchPad, BasicBlock *BB,
                                 Instruction *InsertBefore = nullptr) {
    assert(CatchPad);
    assert(BB);
    return new (2) CatchReturnInst(CatchPad, BB, InsertBefore);
  }
  static CatchReturnInst *Create(Value *CatchPad, BasicBlock *BB,
                                 BasicBlock *InsertAtEnd) {
    assert(CatchPad);
    assert(BB);
    return new (2) CatchReturnInst(CatchPad, BB, InsertAtEnd);
  }

  /// Provide fast operand accessors
  DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value);

  /// Convenience accessors.
  CatchPadInst *getCatchPad() const { return cast<CatchPadInst>(Op<0>()); }
  void setCatchPad(CatchPadInst *CatchPad) {
    assert(CatchPad);
    Op<0>() = CatchPad;
  }

  BasicBlock *getSuccessor() const { return cast<BasicBlock>(Op<1>()); }
  void setSuccessor(BasicBlock *NewSucc) {
    assert(NewSucc);
    Op<1>() = NewSucc;
  }
  unsigned getNumSuccessors() const { return 1; }

  Value *getParentPad() const {
    return getCatchPad()->getCatchSwitch()->getParentPad();
  }

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return (I->getOpcode() == Instruction::CatchRet);
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }

private:
  BasicBlock *getSuccessorV(unsigned Idx) const override;
  unsigned getNumSuccessorsV() const override;
  void setSuccessorV(unsigned Idx, BasicBlock *B) override;
};

template <>
struct OperandTraits<CatchReturnInst>
    : public FixedNumOperandTraits<CatchReturnInst, 2> {};

DEFINE_TRANSPARENT_OPERAND_ACCESSORS(CatchReturnInst, Value)

//===----------------------------------------------------------------------===//
//                           CleanupReturnInst Class
//===----------------------------------------------------------------------===//

class CleanupReturnInst : public TerminatorInst {
private:
  CleanupReturnInst(const CleanupReturnInst &RI);

  void init(Value *CleanupPad, BasicBlock *UnwindBB);
  CleanupReturnInst(Value *CleanupPad, BasicBlock *UnwindBB, unsigned Values,
                    Instruction *InsertBefore = nullptr);
  CleanupReturnInst(Value *CleanupPad, BasicBlock *UnwindBB, unsigned Values,
                    BasicBlock *InsertAtEnd);

protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  CleanupReturnInst *cloneImpl() const;

public:
  static CleanupReturnInst *Create(Value *CleanupPad,
                                   BasicBlock *UnwindBB = nullptr,
                                   Instruction *InsertBefore = nullptr) {
    assert(CleanupPad);
    unsigned Values = 1;
    if (UnwindBB)
      ++Values;
    return new (Values)
        CleanupReturnInst(CleanupPad, UnwindBB, Values, InsertBefore);
  }
  static CleanupReturnInst *Create(Value *CleanupPad, BasicBlock *UnwindBB,
                                   BasicBlock *InsertAtEnd) {
    assert(CleanupPad);
    unsigned Values = 1;
    if (UnwindBB)
      ++Values;
    return new (Values)
        CleanupReturnInst(CleanupPad, UnwindBB, Values, InsertAtEnd);
  }

  /// Provide fast operand accessors
  DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value);

  bool hasUnwindDest() const { return getSubclassDataFromInstruction() & 1; }
  bool unwindsToCaller() const { return !hasUnwindDest(); }

  /// Convenience accessor.
  CleanupPadInst *getCleanupPad() const {
    return cast<CleanupPadInst>(Op<0>());
  }
  void setCleanupPad(CleanupPadInst *CleanupPad) {
    assert(CleanupPad);
    Op<0>() = CleanupPad;
  }

  unsigned getNumSuccessors() const { return hasUnwindDest() ? 1 : 0; }

  BasicBlock *getUnwindDest() const {
    return hasUnwindDest() ? cast<BasicBlock>(Op<1>()) : nullptr;
  }
  void setUnwindDest(BasicBlock *NewDest) {
    assert(NewDest);
    assert(hasUnwindDest());
    Op<1>() = NewDest;
  }

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return (I->getOpcode() == Instruction::CleanupRet);
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }

private:
  BasicBlock *getSuccessorV(unsigned Idx) const override;
  unsigned getNumSuccessorsV() const override;
  void setSuccessorV(unsigned Idx, BasicBlock *B) override;

  // Shadow Instruction::setInstructionSubclassData with a private forwarding
  // method so that subclasses cannot accidentally use it.
  void setInstructionSubclassData(unsigned short D) {
    Instruction::setInstructionSubclassData(D);
  }
};

template <>
struct OperandTraits<CleanupReturnInst>
    : public VariadicOperandTraits<CleanupReturnInst, /*MINARITY=*/1> {};

DEFINE_TRANSPARENT_OPERAND_ACCESSORS(CleanupReturnInst, Value)

//===----------------------------------------------------------------------===//
//                           UnreachableInst Class
//===----------------------------------------------------------------------===//

//===---------------------------------------------------------------------------
/// UnreachableInst - This function has undefined behavior. In particular, the
/// presence of this instruction indicates some higher level knowledge that the
/// end of the block cannot be reached.
///
class UnreachableInst : public TerminatorInst {
  void *operator new(size_t, unsigned) = delete;

protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  UnreachableInst *cloneImpl() const;

public:
  // allocate space for exactly zero operands
  void *operator new(size_t s) {
    return User::operator new(s, 0);
  }

  explicit UnreachableInst(LLVMContext &C,
                           Instruction *InsertBefore = nullptr);
  explicit UnreachableInst(LLVMContext &C, BasicBlock *InsertAtEnd);

  unsigned getNumSuccessors() const { return 0; }

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == Instruction::Unreachable;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }

private:
  BasicBlock *getSuccessorV(unsigned idx) const override;
  unsigned getNumSuccessorsV() const override;
  void setSuccessorV(unsigned idx, BasicBlock *B) override;
};

//===----------------------------------------------------------------------===//
//                                 TruncInst Class
//===----------------------------------------------------------------------===//

/// \brief This class represents a truncation of integer types.
class TruncInst : public CastInst {
protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  /// \brief Clone an identical TruncInst
  TruncInst *cloneImpl() const;

public:
  /// \brief Constructor with insert-before-instruction semantics
  TruncInst(
    Value *S,                           ///< The value to be truncated
    Type *Ty,                           ///< The (smaller) type to truncate to
    const Twine &NameStr = "",          ///< A name for the new instruction
    Instruction *InsertBefore = nullptr ///< Where to insert the new instruction
  );

  /// \brief Constructor with insert-at-end-of-block semantics
  TruncInst(
    Value *S,                     ///< The value to be truncated
    Type *Ty,                     ///< The (smaller) type to truncate to
    const Twine &NameStr,         ///< A name for the new instruction
    BasicBlock *InsertAtEnd       ///< The block to insert the instruction into
  );

  /// \brief Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == Trunc;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

//===----------------------------------------------------------------------===//
//                                 ZExtInst Class
//===----------------------------------------------------------------------===//

/// \brief This class represents zero extension of integer types.
class ZExtInst : public CastInst {
protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  /// \brief Clone an identical ZExtInst
  ZExtInst *cloneImpl() const;

public:
  /// \brief Constructor with insert-before-instruction semantics
  ZExtInst(
    Value *S,                           ///< The value to be zero extended
    Type *Ty,                           ///< The type to zero extend to
    const Twine &NameStr = "",          ///< A name for the new instruction
    Instruction *InsertBefore = nullptr ///< Where to insert the new instruction
  );

  /// \brief Constructor with insert-at-end semantics.
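  /// A usage sketch (assuming V is an i8 Value, Ctx its LLVMContext, and BB
  /// an existing BasicBlock):
  /// \code
  ///   Value *Wide = new ZExtInst(V, Type::getInt32Ty(Ctx), "wide", BB);
  /// \endcode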
  ZExtInst(
    Value *S,                     ///< The value to be zero extended
    Type *Ty,                     ///< The type to zero extend to
    const Twine &NameStr,         ///< A name for the new instruction
    BasicBlock *InsertAtEnd       ///< The block to insert the instruction into
  );

  /// \brief Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == ZExt;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

//===----------------------------------------------------------------------===//
//                                 SExtInst Class
//===----------------------------------------------------------------------===//

/// \brief This class represents a sign extension of integer types.
class SExtInst : public CastInst {
protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  /// \brief Clone an identical SExtInst
  SExtInst *cloneImpl() const;

public:
  /// \brief Constructor with insert-before-instruction semantics
  SExtInst(
    Value *S,                           ///< The value to be sign extended
    Type *Ty,                           ///< The type to sign extend to
    const Twine &NameStr = "",          ///< A name for the new instruction
    Instruction *InsertBefore = nullptr ///< Where to insert the new instruction
  );

  /// \brief Constructor with insert-at-end-of-block semantics
  SExtInst(
    Value *S,                     ///< The value to be sign extended
    Type *Ty,                     ///< The type to sign extend to
    const Twine &NameStr,         ///< A name for the new instruction
    BasicBlock *InsertAtEnd       ///< The block to insert the instruction into
  );

  /// \brief Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == SExt;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

//===----------------------------------------------------------------------===//
//                                 FPTruncInst Class
//===----------------------------------------------------------------------===//

/// \brief This class represents a truncation of floating point types.
class FPTruncInst : public CastInst {
protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  /// \brief Clone an identical FPTruncInst
  FPTruncInst *cloneImpl() const;

public:
  /// \brief Constructor with insert-before-instruction semantics
  FPTruncInst(
    Value *S,                           ///< The value to be truncated
    Type *Ty,                           ///< The type to truncate to
    const Twine &NameStr = "",          ///< A name for the new instruction
    Instruction *InsertBefore = nullptr ///< Where to insert the new instruction
  );

  /// \brief Constructor with insert-at-end-of-block semantics
  FPTruncInst(
    Value *S,                     ///< The value to be truncated
    Type *Ty,                     ///< The type to truncate to
    const Twine &NameStr,         ///< A name for the new instruction
    BasicBlock *InsertAtEnd       ///< The block to insert the instruction into
  );

  /// \brief Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == FPTrunc;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

//===----------------------------------------------------------------------===//
//                                 FPExtInst Class
//===----------------------------------------------------------------------===//

/// \brief This class represents an extension of floating point types.
class FPExtInst : public CastInst {
protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  /// \brief Clone an identical FPExtInst
  FPExtInst *cloneImpl() const;

public:
  /// \brief Constructor with insert-before-instruction semantics
  FPExtInst(
    Value *S,                           ///< The value to be extended
    Type *Ty,                           ///< The type to extend to
    const Twine &NameStr = "",          ///< A name for the new instruction
    Instruction *InsertBefore = nullptr ///< Where to insert the new instruction
  );

  /// \brief Constructor with insert-at-end-of-block semantics
  FPExtInst(
    Value *S,                     ///< The value to be extended
    Type *Ty,                     ///< The type to extend to
    const Twine &NameStr,         ///< A name for the new instruction
    BasicBlock *InsertAtEnd       ///< The block to insert the instruction into
  );

  /// \brief Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == FPExt;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

//===----------------------------------------------------------------------===//
//                                 UIToFPInst Class
//===----------------------------------------------------------------------===//

/// \brief This class represents a cast from unsigned integer to floating
/// point.
class UIToFPInst : public CastInst {
protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  /// \brief Clone an identical UIToFPInst
  UIToFPInst *cloneImpl() const;

public:
  /// \brief Constructor with insert-before-instruction semantics
  UIToFPInst(
    Value *S,                           ///< The value to be converted
    Type *Ty,                           ///< The type to convert to
    const Twine &NameStr = "",          ///< A name for the new instruction
    Instruction *InsertBefore = nullptr ///< Where to insert the new instruction
  );

  /// \brief Constructor with insert-at-end-of-block semantics
  UIToFPInst(
    Value *S,                     ///< The value to be converted
    Type *Ty,                     ///< The type to convert to
    const Twine &NameStr,         ///< A name for the new instruction
    BasicBlock *InsertAtEnd       ///< The block to insert the instruction into
  );

  /// \brief Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == UIToFP;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

//===----------------------------------------------------------------------===//
//                                 SIToFPInst Class
//===----------------------------------------------------------------------===//

/// \brief This class represents a cast from signed integer to floating point.
class SIToFPInst : public CastInst {
protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  /// \brief Clone an identical SIToFPInst
  SIToFPInst *cloneImpl() const;

public:
  /// \brief Constructor with insert-before-instruction semantics
  SIToFPInst(
    Value *S,                           ///< The value to be converted
    Type *Ty,                           ///< The type to convert to
    const Twine &NameStr = "",          ///< A name for the new instruction
    Instruction *InsertBefore = nullptr ///< Where to insert the new instruction
  );

  /// \brief Constructor with insert-at-end-of-block semantics
  SIToFPInst(
    Value *S,                     ///< The value to be converted
    Type *Ty,                     ///< The type to convert to
    const Twine &NameStr,         ///< A name for the new instruction
    BasicBlock *InsertAtEnd       ///< The block to insert the instruction into
  );

  /// \brief Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == SIToFP;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

//===----------------------------------------------------------------------===//
//                                 FPToUIInst Class
//===----------------------------------------------------------------------===//

/// \brief This class represents a cast from floating point to unsigned
/// integer.
class FPToUIInst : public CastInst {
protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  /// \brief Clone an identical FPToUIInst
  FPToUIInst *cloneImpl() const;

public:
  /// \brief Constructor with insert-before-instruction semantics
  FPToUIInst(
    Value *S,                           ///< The value to be converted
    Type *Ty,                           ///< The type to convert to
    const Twine &NameStr = "",          ///< A name for the new instruction
    Instruction *InsertBefore = nullptr ///< Where to insert the new instruction
  );

  /// \brief Constructor with insert-at-end-of-block semantics
  FPToUIInst(
    Value *S,                     ///< The value to be converted
    Type *Ty,                     ///< The type to convert to
    const Twine &NameStr,         ///< A name for the new instruction
    BasicBlock *InsertAtEnd       ///< The block to insert the instruction into
  );

  /// \brief Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == FPToUI;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

//===----------------------------------------------------------------------===//
//                                 FPToSIInst Class
//===----------------------------------------------------------------------===//

/// \brief This class represents a cast from floating point to signed integer.
class FPToSIInst : public CastInst {
protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  /// \brief Clone an identical FPToSIInst
  FPToSIInst *cloneImpl() const;

public:
  /// \brief Constructor with insert-before-instruction semantics
  FPToSIInst(
    Value *S,                           ///< The value to be converted
    Type *Ty,                           ///< The type to convert to
    const Twine &NameStr = "",          ///< A name for the new instruction
    Instruction *InsertBefore = nullptr ///< Where to insert the new instruction
  );

  /// \brief Constructor with insert-at-end-of-block semantics
  FPToSIInst(
    Value *S,                     ///< The value to be converted
    Type *Ty,                     ///< The type to convert to
    const Twine &NameStr,         ///< A name for the new instruction
    BasicBlock *InsertAtEnd       ///< The block to insert the instruction into
  );

  /// \brief Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == FPToSI;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

//===----------------------------------------------------------------------===//
//                                 IntToPtrInst Class
//===----------------------------------------------------------------------===//

/// \brief This class represents a cast from an integer to a pointer.
class IntToPtrInst : public CastInst {
public:
  /// \brief Constructor with insert-before-instruction semantics
  IntToPtrInst(
    Value *S,                           ///< The value to be converted
    Type *Ty,                           ///< The type to convert to
    const Twine &NameStr = "",          ///< A name for the new instruction
    Instruction *InsertBefore = nullptr ///< Where to insert the new instruction
  );

  /// \brief Constructor with insert-at-end-of-block semantics
  IntToPtrInst(
    Value *S,                     ///< The value to be converted
    Type *Ty,                     ///< The type to convert to
    const Twine &NameStr,         ///< A name for the new instruction
    BasicBlock *InsertAtEnd       ///< The block to insert the instruction into
  );

  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  /// \brief Clone an identical IntToPtrInst
  IntToPtrInst *cloneImpl() const;

  /// \brief Returns the address space of this instruction's pointer type.
  unsigned getAddressSpace() const {
    return getType()->getPointerAddressSpace();
  }

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == IntToPtr;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

//===----------------------------------------------------------------------===//
//                                 PtrToIntInst Class
//===----------------------------------------------------------------------===//

/// \brief This class represents a cast from a pointer to an integer.
class PtrToIntInst : public CastInst {
protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  /// \brief Clone an identical PtrToIntInst
  PtrToIntInst *cloneImpl() const;

public:
  /// \brief Constructor with insert-before-instruction semantics
  PtrToIntInst(
    Value *S,                           ///< The value to be converted
    Type *Ty,                           ///< The type to convert to
    const Twine &NameStr = "",          ///< A name for the new instruction
    Instruction *InsertBefore = nullptr ///< Where to insert the new instruction
  );

  /// \brief Constructor with insert-at-end-of-block semantics
  PtrToIntInst(
    Value *S,                     ///< The value to be converted
    Type *Ty,                     ///< The type to convert to
    const Twine &NameStr,         ///< A name for the new instruction
    BasicBlock *InsertAtEnd       ///< The block to insert the instruction into
  );

  /// \brief Gets the pointer operand.
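  /// A usage sketch (assuming PTI is an existing PtrToIntInst):
  /// \code
  ///   Value *Ptr = PTI->getPointerOperand();
  ///   unsigned AS = PTI->getPointerAddressSpace();
  /// \endcode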
  Value *getPointerOperand() { return getOperand(0); }
  /// \brief Gets the pointer operand.
  const Value *getPointerOperand() const { return getOperand(0); }
  /// \brief Gets the operand index of the pointer operand.
  static unsigned getPointerOperandIndex() { return 0U; }

  /// \brief Returns the address space of the pointer operand.
  unsigned getPointerAddressSpace() const {
    return getPointerOperand()->getType()->getPointerAddressSpace();
  }

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == PtrToInt;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

//===----------------------------------------------------------------------===//
//                             BitCastInst Class
//===----------------------------------------------------------------------===//

/// \brief This class represents a no-op cast from one type to another.
class BitCastInst : public CastInst {
protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
  friend class Instruction;
  /// \brief Clone an identical BitCastInst
  BitCastInst *cloneImpl() const;

public:
  /// \brief Constructor with insert-before-instruction semantics
  BitCastInst(
    Value *S,                           ///< The value to be casted
    Type *Ty,                           ///< The type to cast to
    const Twine &NameStr = "",          ///< A name for the new instruction
    Instruction *InsertBefore = nullptr ///< Where to insert the new instruction
  );

  /// \brief Constructor with insert-at-end-of-block semantics
  BitCastInst(
    Value *S,                     ///< The value to be casted
    Type *Ty,                     ///< The type to cast to
    const Twine &NameStr,         ///< A name for the new instruction
    BasicBlock *InsertAtEnd       ///< The block to insert the instruction into
  );

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static inline bool classof(const Instruction *I) {
    return I->getOpcode() == BitCast;
  }
  static inline bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};

//===----------------------------------------------------------------------===//
//                           AddrSpaceCastInst Class
//===----------------------------------------------------------------------===//

/// \brief This class represents a conversion between pointers from
/// one address space to another.
class AddrSpaceCastInst : public CastInst {
protected:
  // Note: Instruction needs to be a friend here to call cloneImpl.
friend class Instruction; /// \brief Clone an identical AddrSpaceCastInst AddrSpaceCastInst *cloneImpl() const; public: /// \brief Constructor with insert-before-instruction semantics AddrSpaceCastInst( Value *S, ///< The value to be casted Type *Ty, ///< The type to casted to const Twine &NameStr = "", ///< A name for the new instruction Instruction *InsertBefore = nullptr ///< Where to insert the new instruction ); /// \brief Constructor with insert-at-end-of-block semantics AddrSpaceCastInst( Value *S, ///< The value to be casted Type *Ty, ///< The type to casted to const Twine &NameStr, ///< A name for the new instruction BasicBlock *InsertAtEnd ///< The block to insert the instruction into ); // Methods for support type inquiry through isa, cast, and dyn_cast: static inline bool classof(const Instruction *I) { return I->getOpcode() == AddrSpaceCast; } static inline bool classof(const Value *V) { return isa(V) && classof(cast(V)); } }; } // End llvm namespace #endif Index: vendor/llvm/dist/lib/CodeGen/CalcSpillWeights.cpp =================================================================== --- vendor/llvm/dist/lib/CodeGen/CalcSpillWeights.cpp (revision 295845) +++ vendor/llvm/dist/lib/CodeGen/CalcSpillWeights.cpp (revision 295846) @@ -1,230 +1,233 @@ //===------------------------ CalcSpillWeights.cpp ------------------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// #include "llvm/CodeGen/VirtRegMap.h" #include "llvm/CodeGen/CalcSpillWeights.h" #include "llvm/CodeGen/LiveIntervalAnalysis.h" #include "llvm/CodeGen/MachineBlockFrequencyInfo.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineLoopInfo.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/Support/Debug.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Target/TargetInstrInfo.h" #include "llvm/Target/TargetRegisterInfo.h" #include "llvm/Target/TargetSubtargetInfo.h" using namespace llvm; #define DEBUG_TYPE "calcspillweights" void llvm::calculateSpillWeightsAndHints(LiveIntervals &LIS, MachineFunction &MF, VirtRegMap *VRM, const MachineLoopInfo &MLI, const MachineBlockFrequencyInfo &MBFI, VirtRegAuxInfo::NormalizingFn norm) { DEBUG(dbgs() << "********** Compute Spill Weights **********\n" << "********** Function: " << MF.getName() << '\n'); MachineRegisterInfo &MRI = MF.getRegInfo(); VirtRegAuxInfo VRAI(MF, LIS, VRM, MLI, MBFI, norm); for (unsigned i = 0, e = MRI.getNumVirtRegs(); i != e; ++i) { unsigned Reg = TargetRegisterInfo::index2VirtReg(i); if (MRI.reg_nodbg_empty(Reg)) continue; VRAI.calculateSpillWeightAndHint(LIS.getInterval(Reg)); } } // Return the preferred allocation register for reg, given a COPY instruction. static unsigned copyHint(const MachineInstr *mi, unsigned reg, const TargetRegisterInfo &tri, const MachineRegisterInfo &mri) { unsigned sub, hreg, hsub; if (mi->getOperand(0).getReg() == reg) { sub = mi->getOperand(0).getSubReg(); hreg = mi->getOperand(1).getReg(); hsub = mi->getOperand(1).getSubReg(); } else { sub = mi->getOperand(1).getSubReg(); hreg = mi->getOperand(0).getReg(); hsub = mi->getOperand(0).getSubReg(); } if (!hreg) return 0; if (TargetRegisterInfo::isVirtualRegister(hreg)) return sub == hsub ? hreg : 0; const TargetRegisterClass *rc = mri.getRegClass(reg); // Only allow physreg hints in rc. if (sub == 0) return rc->contains(hreg) ? 
hreg : 0; // reg:sub should match the physreg hreg. return tri.getMatchingSuperReg(hreg, sub, rc); } // Check if all values in LI are rematerializable static bool isRematerializable(const LiveInterval &LI, const LiveIntervals &LIS, VirtRegMap *VRM, const TargetInstrInfo &TII) { unsigned Reg = LI.reg; unsigned Original = VRM ? VRM->getOriginal(Reg) : 0; for (LiveInterval::const_vni_iterator I = LI.vni_begin(), E = LI.vni_end(); I != E; ++I) { const VNInfo *VNI = *I; if (VNI->isUnused()) continue; if (VNI->isPHIDef()) return false; MachineInstr *MI = LIS.getInstructionFromIndex(VNI->def); assert(MI && "Dead valno in interval"); // Trace copies introduced by live range splitting. The inline // spiller can rematerialize through these copies, so the spill // weight must reflect this. if (VRM) { while (MI->isFullCopy()) { // The copy destination must match the interval register. if (MI->getOperand(0).getReg() != Reg) return false; // Get the source register. Reg = MI->getOperand(1).getReg(); // If the original (pre-splitting) registers match, this // copy came from a split. if (!TargetRegisterInfo::isVirtualRegister(Reg) || VRM->getOriginal(Reg) != Original) return false; // Follow the copy live-in value. const LiveInterval &SrcLI = LIS.getInterval(Reg); LiveQueryResult SrcQ = SrcLI.Query(VNI->def); VNI = SrcQ.valueIn(); assert(VNI && "Copy from non-existing value"); if (VNI->isPHIDef()) return false; MI = LIS.getInstructionFromIndex(VNI->def); assert(MI && "Dead valno in interval"); } } if (!TII.isTriviallyReMaterializable(MI, LIS.getAliasAnalysis())) return false; } return true; } void VirtRegAuxInfo::calculateSpillWeightAndHint(LiveInterval &li) { MachineRegisterInfo &mri = MF.getRegInfo(); const TargetRegisterInfo &tri = *MF.getSubtarget().getRegisterInfo(); MachineBasicBlock *mbb = nullptr; MachineLoop *loop = nullptr; bool isExiting = false; float totalWeight = 0; unsigned numInstr = 0; // Number of instructions using li SmallPtrSet<MachineInstr*, 8> visited; // Find the best physreg hint and the best virtreg hint. float bestPhys = 0, bestVirt = 0; unsigned hintPhys = 0, hintVirt = 0; // Don't recompute a target specific hint. bool noHint = mri.getRegAllocationHint(li.reg).first != 0; // Don't recompute spill weight for an unspillable register. bool Spillable = li.isSpillable(); for (MachineRegisterInfo::reg_instr_iterator I = mri.reg_instr_begin(li.reg), E = mri.reg_instr_end(); I != E; ) { MachineInstr *mi = &*(I++); numInstr++; if (mi->isIdentityCopy() || mi->isImplicitDef() || mi->isDebugValue()) continue; if (!visited.insert(mi).second) continue; float weight = 1.0f; if (Spillable) { // Get loop info for mi. if (mi->getParent() != mbb) { mbb = mi->getParent(); loop = Loops.getLoopFor(mbb); isExiting = loop ? loop->isLoopExiting(mbb) : false; } // Calculate instr weight. bool reads, writes; std::tie(reads, writes) = mi->readsWritesVirtualRegister(li.reg); weight = LiveIntervals::getSpillWeight( writes, reads, &MBFI, mi); // Give extra weight to what looks like a loop induction variable update. if (writes && isExiting && LIS.isLiveOutOfMBB(li, mbb)) weight *= 3; totalWeight += weight; } // Get allocation hints from copies. if (noHint || !mi->isCopy()) continue; unsigned hint = copyHint(mi, li.reg, tri, mri); if (!hint) continue; // Force hweight onto the stack so that x86 doesn't add hidden precision, // making the comparison incorrectly pass (i.e., 1 > 1 == true??). // // FIXME: we probably shouldn't use floats at all.
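    // (Illustrative note, not from the original source: without the volatile,
    // on 32-bit x86 "hweight" could live in an 80-bit x87 register while
    // bestPhys/bestVirt are rounded 32-bit values, so two mathematically
    // equal weights could still compare as hweight > bestPhys. The volatile
    // forces a store/reload that rounds hweight to true float precision
    // before the comparisons below.)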
volatile float hweight = Hint[hint] += weight; if (TargetRegisterInfo::isPhysicalRegister(hint)) { if (hweight > bestPhys && mri.isAllocatable(hint)) bestPhys = hweight, hintPhys = hint; } else { if (hweight > bestVirt) bestVirt = hweight, hintVirt = hint; } } Hint.clear(); // Always prefer the physreg hint. if (unsigned hint = hintPhys ? hintPhys : hintVirt) { mri.setRegAllocationHint(li.reg, 0, hint); // Weakly boost the spill weight of hinted registers. totalWeight *= 1.01F; } // If the live interval was already unspillable, leave it that way. if (!Spillable) return; - // Mark li as unspillable if all live ranges are tiny. - if (li.isZeroLength(LIS.getSlotIndexes())) { + // Mark li as unspillable if all live ranges are tiny and the interval + // is not live at any reg mask. If the interval is live at a reg mask, + // spilling may be required. + if (li.isZeroLength(LIS.getSlotIndexes()) && + !li.isLiveAtIndexes(LIS.getRegMaskSlots())) { li.markNotSpillable(); return; } // If all of the definitions of the interval are re-materializable, // it is a preferred candidate for spilling. // FIXME: this gets much more complicated once we support non-trivial // re-materialization. if (isRematerializable(li, LIS, VRM, *MF.getSubtarget().getInstrInfo())) totalWeight *= 0.5F; li.weight = normalize(totalWeight, li.getSize(), numInstr); } Index: vendor/llvm/dist/lib/CodeGen/LiveInterval.cpp =================================================================== --- vendor/llvm/dist/lib/CodeGen/LiveInterval.cpp (revision 295845) +++ vendor/llvm/dist/lib/CodeGen/LiveInterval.cpp (revision 295846) @@ -1,1468 +1,1502 @@ //===-- LiveInterval.cpp - Live Interval Representation -------------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file implements the LiveRange and LiveInterval classes. Given some // numbering of each of the machine instructions, an interval [i, j) is said to be a // live range for register v if there is no instruction with number j' >= j // such that v is live at j' and there is no instruction with number i' < i such // that v is live at i'. In this implementation ranges can have holes, // i.e. a range might look like [1,20), [50,65), [1000,1001). Each // individual segment is represented as an instance of LiveRange::Segment, // and the whole range is represented as an instance of LiveRange. // //===----------------------------------------------------------------------===// #include "llvm/CodeGen/LiveInterval.h" #include "RegisterCoalescer.h" #include "llvm/ADT/DenseMap.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallSet.h" #include "llvm/CodeGen/LiveIntervalAnalysis.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/Support/Debug.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Target/TargetRegisterInfo.h" #include <algorithm> using namespace llvm; namespace { //===----------------------------------------------------------------------===// // Implementation of various methods necessary for calculation of live ranges. // The implementation of the methods abstracts from the concrete type of the // segment collection. // // Implementation of the class follows the Template design pattern. The base // class contains generic algorithms that call collection-specific methods, // which are provided in concrete subclasses.
In order to avoid virtual calls // these methods are provided by means of C++ template instantiation. // The base class calls the methods of the subclass through method impl(), // which casts the 'this' pointer to the type of the subclass. // //===----------------------------------------------------------------------===// template <typename ImplT, typename IteratorT, typename CollectionT> class CalcLiveRangeUtilBase { protected: LiveRange *LR; protected: CalcLiveRangeUtilBase(LiveRange *LR) : LR(LR) {} public: typedef LiveRange::Segment Segment; typedef IteratorT iterator; VNInfo *createDeadDef(SlotIndex Def, VNInfo::Allocator &VNInfoAllocator) { assert(!Def.isDead() && "Cannot define a value at the dead slot"); iterator I = impl().find(Def); if (I == segments().end()) { VNInfo *VNI = LR->getNextValue(Def, VNInfoAllocator); impl().insertAtEnd(Segment(Def, Def.getDeadSlot(), VNI)); return VNI; } Segment *S = segmentAt(I); if (SlotIndex::isSameInstr(Def, S->start)) { assert(S->valno->def == S->start && "Inconsistent existing value def"); // It is possible to have both normal and early-clobber defs of the same // register on an instruction. It doesn't make a lot of sense, but it is // possible to specify in inline assembly. // // Just convert everything to early-clobber. Def = std::min(Def, S->start); if (Def != S->start) S->start = S->valno->def = Def; return S->valno; } assert(SlotIndex::isEarlierInstr(Def, S->start) && "Already live at def"); VNInfo *VNI = LR->getNextValue(Def, VNInfoAllocator); segments().insert(I, Segment(Def, Def.getDeadSlot(), VNI)); return VNI; } VNInfo *extendInBlock(SlotIndex StartIdx, SlotIndex Use) { if (segments().empty()) return nullptr; iterator I = impl().findInsertPos(Segment(Use.getPrevSlot(), Use, nullptr)); if (I == segments().begin()) return nullptr; --I; if (I->end <= StartIdx) return nullptr; if (I->end < Use) extendSegmentEndTo(I, Use); return I->valno; } /// This method is used when we want to extend the segment specified /// by I to end at the specified endpoint. To do this, we should /// merge and eliminate all segments that this will overlap /// with. The iterator is not invalidated. void extendSegmentEndTo(iterator I, SlotIndex NewEnd) { assert(I != segments().end() && "Not a valid segment!"); Segment *S = segmentAt(I); VNInfo *ValNo = I->valno; // Search for the first segment that we can't merge with. iterator MergeTo = std::next(I); for (; MergeTo != segments().end() && NewEnd >= MergeTo->end; ++MergeTo) assert(MergeTo->valno == ValNo && "Cannot merge with differing values!"); // If NewEnd was in the middle of a segment, make sure to get its endpoint. S->end = std::max(NewEnd, std::prev(MergeTo)->end); // If the newly formed segment now touches the segment after it and if they // have the same value number, merge the two segments into one segment. if (MergeTo != segments().end() && MergeTo->start <= I->end && MergeTo->valno == ValNo) { S->end = MergeTo->end; ++MergeTo; } // Erase any dead segments. segments().erase(std::next(I), MergeTo); } /// This method is used when we want to extend the segment specified /// by I to start at the specified endpoint. To do this, we should /// merge and eliminate all segments that this will overlap with. iterator extendSegmentStartTo(iterator I, SlotIndex NewStart) { assert(I != segments().end() && "Not a valid segment!"); Segment *S = segmentAt(I); VNInfo *ValNo = I->valno; // Search for the first segment that we can't merge with.
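    // (Added explanatory note, not in the original: the loop below walks
    // backwards from I; every earlier segment whose start is still >= NewStart
    // must carry the same value number and is folded into the extended
    // segment. E.g. extending [8,12) back to 4 over a neighbouring [6,8) of
    // the same valno yields the single segment [4,12).)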
iterator MergeTo = I; do { if (MergeTo == segments().begin()) { S->start = NewStart; segments().erase(MergeTo, I); return I; } assert(MergeTo->valno == ValNo && "Cannot merge with differing values!"); --MergeTo; } while (NewStart <= MergeTo->start); // If we start in the middle of another segment, just delete a range and // extend that segment. if (MergeTo->end >= NewStart && MergeTo->valno == ValNo) { segmentAt(MergeTo)->end = S->end; } else { // Otherwise, extend the segment right after. ++MergeTo; Segment *MergeToSeg = segmentAt(MergeTo); MergeToSeg->start = NewStart; MergeToSeg->end = S->end; } segments().erase(std::next(MergeTo), std::next(I)); return MergeTo; } iterator addSegment(Segment S) { SlotIndex Start = S.start, End = S.end; iterator I = impl().findInsertPos(S); // If the inserted segment starts in the middle or right at the end of // another segment, just extend that segment to contain the segment of S. if (I != segments().begin()) { iterator B = std::prev(I); if (S.valno == B->valno) { if (B->start <= Start && B->end >= Start) { extendSegmentEndTo(B, End); return B; } } else { // Check to make sure that we are not overlapping two live segments with // different valno's. assert(B->end <= Start && "Cannot overlap two segments with differing ValID's" " (did you def the same reg twice in a MachineInstr?)"); } } // Otherwise, if this segment ends in the middle of, or right next // to, another segment, merge it into that segment. if (I != segments().end()) { if (S.valno == I->valno) { if (I->start <= End) { I = extendSegmentStartTo(I, Start); // If S is a complete superset of a segment, we may need to grow its // endpoint as well. if (End > I->end) extendSegmentEndTo(I, End); return I; } } else { // Check to make sure that we are not overlapping two live segments with // different valno's. assert(I->start >= End && "Cannot overlap two segments with differing ValID's"); } } // Otherwise, this is just a new segment that doesn't interact with // anything. // Insert it. return segments().insert(I, S); } private: ImplT &impl() { return *static_cast<ImplT *>(this); } CollectionT &segments() { return impl().segmentsColl(); } Segment *segmentAt(iterator I) { return const_cast<Segment *>(&(*I)); } }; //===----------------------------------------------------------------------===// // Instantiation of the methods for calculation of live ranges // based on a segment vector. //===----------------------------------------------------------------------===// class CalcLiveRangeUtilVector; typedef CalcLiveRangeUtilBase<CalcLiveRangeUtilVector, LiveRange::iterator, LiveRange::Segments> CalcLiveRangeUtilVectorBase; class CalcLiveRangeUtilVector : public CalcLiveRangeUtilVectorBase { public: CalcLiveRangeUtilVector(LiveRange *LR) : CalcLiveRangeUtilVectorBase(LR) {} private: friend CalcLiveRangeUtilVectorBase; LiveRange::Segments &segmentsColl() { return LR->segments; } void insertAtEnd(const Segment &S) { LR->segments.push_back(S); } iterator find(SlotIndex Pos) { return LR->find(Pos); } iterator findInsertPos(Segment S) { return std::upper_bound(LR->begin(), LR->end(), S.start); } }; //===----------------------------------------------------------------------===// // Instantiation of the methods for calculation of live ranges // based on a segment set.
//===----------------------------------------------------------------------===// class CalcLiveRangeUtilSet; typedef CalcLiveRangeUtilBase<CalcLiveRangeUtilSet, LiveRange::SegmentSet::iterator, LiveRange::SegmentSet> CalcLiveRangeUtilSetBase; class CalcLiveRangeUtilSet : public CalcLiveRangeUtilSetBase { public: CalcLiveRangeUtilSet(LiveRange *LR) : CalcLiveRangeUtilSetBase(LR) {} private: friend CalcLiveRangeUtilSetBase; LiveRange::SegmentSet &segmentsColl() { return *LR->segmentSet; } void insertAtEnd(const Segment &S) { LR->segmentSet->insert(LR->segmentSet->end(), S); } iterator find(SlotIndex Pos) { iterator I = LR->segmentSet->upper_bound(Segment(Pos, Pos.getNextSlot(), nullptr)); if (I == LR->segmentSet->begin()) return I; iterator PrevI = std::prev(I); if (Pos < (*PrevI).end) return PrevI; return I; } iterator findInsertPos(Segment S) { iterator I = LR->segmentSet->upper_bound(S); if (I != LR->segmentSet->end() && !(S.start < *I)) ++I; return I; } }; } // namespace //===----------------------------------------------------------------------===// // LiveRange methods //===----------------------------------------------------------------------===// LiveRange::iterator LiveRange::find(SlotIndex Pos) { // This algorithm is basically std::upper_bound. // Unfortunately, std::upper_bound cannot be used with mixed types until we // adopt C++0x. Many libraries can do it, but not all. if (empty() || Pos >= endIndex()) return end(); iterator I = begin(); size_t Len = size(); do { size_t Mid = Len >> 1; if (Pos < I[Mid].end) Len = Mid; else I += Mid + 1, Len -= Mid + 1; } while (Len); return I; } VNInfo *LiveRange::createDeadDef(SlotIndex Def, VNInfo::Allocator &VNInfoAllocator) { // Use the segment set, if it is available. if (segmentSet != nullptr) return CalcLiveRangeUtilSet(this).createDeadDef(Def, VNInfoAllocator); // Otherwise use the segment vector. return CalcLiveRangeUtilVector(this).createDeadDef(Def, VNInfoAllocator); } // overlaps - Return true if the intersection of the two live ranges is // not empty. // // An example for overlaps(): // // 0: A = ... // 4: B = ... // 8: C = A + B ;; last use of A // // The live ranges should look like: // // A = [3, 11) // B = [7, x) // C = [11, y) // // A->overlaps(C) should return false since we want to be able to join // A and C. // bool LiveRange::overlapsFrom(const LiveRange& other, const_iterator StartPos) const { assert(!empty() && "empty range"); const_iterator i = begin(); const_iterator ie = end(); const_iterator j = StartPos; const_iterator je = other.end(); assert((StartPos->start <= i->start || StartPos == other.begin()) && StartPos != other.end() && "Bogus start position hint!"); if (i->start < j->start) { i = std::upper_bound(i, ie, j->start); if (i != begin()) --i; } else if (j->start < i->start) { ++StartPos; if (StartPos != other.end() && StartPos->start <= i->start) { assert(StartPos < other.end() && i < end()); j = std::upper_bound(j, je, i->start); if (j != other.begin()) --j; } } else { return true; } if (j == je) return false; while (i != ie) { if (i->start > j->start) { std::swap(i, j); std::swap(ie, je); } if (i->end > j->start) return true; ++i; } return false; } bool LiveRange::overlaps(const LiveRange &Other, const CoalescerPair &CP, const SlotIndexes &Indexes) const { assert(!empty() && "empty range"); if (Other.empty()) return false; // Use binary searches to find initial positions.
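  // (Added note, not in the original: the loop below is a classic two-pointer
  // sweep. I walks this range and J walks Other; each iteration re-establishes
  // J->end >= I->start, and any J->start < I->end then proves an overlap,
  // unless the later of the two start points is exactly the def of a
  // coalescable copy, which is permitted.)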
const_iterator I = find(Other.beginIndex()); const_iterator IE = end(); if (I == IE) return false; const_iterator J = Other.find(I->start); const_iterator JE = Other.end(); if (J == JE) return false; for (;;) { // J has just been advanced to satisfy: assert(J->end >= I->start); // Check for an overlap. if (J->start < I->end) { // I and J are overlapping. Find the later start. SlotIndex Def = std::max(I->start, J->start); // Allow the overlap if Def is a coalescable copy. if (Def.isBlock() || !CP.isCoalescable(Indexes.getInstructionFromIndex(Def))) return true; } // Advance the iterator that ends first to check for more overlaps. if (J->end > I->end) { std::swap(I, J); std::swap(IE, JE); } // Advance J until J->end >= I->start. do if (++J == JE) return false; while (J->end < I->start); } } /// overlaps - Return true if the live range overlaps an interval specified /// by [Start, End). bool LiveRange::overlaps(SlotIndex Start, SlotIndex End) const { assert(Start < End && "Invalid range"); const_iterator I = std::lower_bound(begin(), end(), End); return I != begin() && (--I)->end > Start; } bool LiveRange::covers(const LiveRange &Other) const { if (empty()) return Other.empty(); const_iterator I = begin(); for (const Segment &O : Other.segments) { I = advanceTo(I, O.start); if (I == end() || I->start > O.start) return false; // Check adjacent live segments and see if we can get behind O.end. while (I->end < O.end) { const_iterator Last = I; // Get next segment and abort if it was not adjacent. ++I; if (I == end() || Last->end != I->start) return false; } } return true; } /// ValNo is dead, remove it. If it is the largest value number, just nuke it /// (and any other deleted values neighboring it), otherwise mark it as ~1U so /// it can be nuked later. void LiveRange::markValNoForDeletion(VNInfo *ValNo) { if (ValNo->id == getNumValNums()-1) { do { valnos.pop_back(); } while (!valnos.empty() && valnos.back()->isUnused()); } else { ValNo->markUnused(); } } /// RenumberValues - Renumber all values in order of appearance and delete the /// remaining unused values. void LiveRange::RenumberValues() { SmallPtrSet<VNInfo*, 8> Seen; valnos.clear(); for (const Segment &S : segments) { VNInfo *VNI = S.valno; if (!Seen.insert(VNI).second) continue; assert(!VNI->isUnused() && "Unused valno used by live segment"); VNI->id = (unsigned)valnos.size(); valnos.push_back(VNI); } } void LiveRange::addSegmentToSet(Segment S) { CalcLiveRangeUtilSet(this).addSegment(S); } LiveRange::iterator LiveRange::addSegment(Segment S) { // Use the segment set, if it is available. if (segmentSet != nullptr) { addSegmentToSet(S); return end(); } // Otherwise use the segment vector. return CalcLiveRangeUtilVector(this).addSegment(S); } void LiveRange::append(const Segment S) { // Check that the segment belongs to the back of the list. assert(segments.empty() || segments.back().end <= S.start); segments.push_back(S); } /// extendInBlock - If this range is live before Kill in the basic /// block that starts at StartIdx, extend it to be live up to Kill and return /// the value. If there is no live range before Kill, return NULL. VNInfo *LiveRange::extendInBlock(SlotIndex StartIdx, SlotIndex Kill) { // Use the segment set, if it is available. if (segmentSet != nullptr) return CalcLiveRangeUtilSet(this).extendInBlock(StartIdx, Kill); // Otherwise use the segment vector. return CalcLiveRangeUtilVector(this).extendInBlock(StartIdx, Kill); } /// Remove the specified segment from this range.
Note that the segment must /// be in a single Segment in its entirety. void LiveRange::removeSegment(SlotIndex Start, SlotIndex End, bool RemoveDeadValNo) { // Find the Segment containing this span. iterator I = find(Start); assert(I != end() && "Segment is not in range!"); assert(I->containsInterval(Start, End) && "Segment is not entirely in range!"); // If the span we are removing is at the start of the Segment, adjust it. VNInfo *ValNo = I->valno; if (I->start == Start) { if (I->end == End) { if (RemoveDeadValNo) { // Check if val# is dead. bool isDead = true; for (const_iterator II = begin(), EE = end(); II != EE; ++II) if (II != I && II->valno == ValNo) { isDead = false; break; } if (isDead) { // Now that ValNo is dead, remove it. markValNoForDeletion(ValNo); } } segments.erase(I); // Removed the whole Segment. } else I->start = End; return; } // Otherwise if the span we are removing is at the end of the Segment, // adjust the other way. if (I->end == End) { I->end = Start; return; } // Otherwise, we are splitting the Segment into two pieces. SlotIndex OldEnd = I->end; I->end = Start; // Trim the old segment. // Insert the new one. segments.insert(std::next(I), Segment(End, OldEnd, ValNo)); } /// removeValNo - Remove all the segments defined by the specified value#. /// Also remove the value# from value# list. void LiveRange::removeValNo(VNInfo *ValNo) { if (empty()) return; segments.erase(std::remove_if(begin(), end(), [ValNo](const Segment &S) { return S.valno == ValNo; }), end()); // Now that ValNo is dead, remove it. markValNoForDeletion(ValNo); } void LiveRange::join(LiveRange &Other, const int *LHSValNoAssignments, const int *RHSValNoAssignments, SmallVectorImpl<VNInfo *> &NewVNInfo) { verify(); // Determine if any of our values are mapped. This is uncommon, so we want // to avoid the range scan if not. bool MustMapCurValNos = false; unsigned NumVals = getNumValNums(); unsigned NumNewVals = NewVNInfo.size(); for (unsigned i = 0; i != NumVals; ++i) { unsigned LHSValID = LHSValNoAssignments[i]; if (i != LHSValID || (NewVNInfo[LHSValID] && NewVNInfo[LHSValID] != getValNumInfo(i))) { MustMapCurValNos = true; break; } } // If we have to apply a mapping to our base range assignment, rewrite it now. if (MustMapCurValNos && !empty()) { // Map the first live range. iterator OutIt = begin(); OutIt->valno = NewVNInfo[LHSValNoAssignments[OutIt->valno->id]]; for (iterator I = std::next(OutIt), E = end(); I != E; ++I) { VNInfo* nextValNo = NewVNInfo[LHSValNoAssignments[I->valno->id]]; assert(nextValNo && "Huh?"); // If this live range has the same value # as its immediate predecessor, // and if they are neighbors, remove one Segment. This happens when we // have [0,4:0)[4,7:1) and map 0/1 onto the same value #. if (OutIt->valno == nextValNo && OutIt->end == I->start) { OutIt->end = I->end; } else { // Didn't merge. Move OutIt to the next segment. ++OutIt; OutIt->valno = nextValNo; if (OutIt != I) { OutIt->start = I->start; OutIt->end = I->end; } } } // If we merge some segments, chop off the end. ++OutIt; segments.erase(OutIt, end()); } // Rewrite Other values before changing the VNInfo ids. // This can leave Other in an invalid state because we're not coalescing // touching segments that now have identical values. That's OK since Other is // not supposed to be valid after calling join(). for (Segment &S : Other.segments) S.valno = NewVNInfo[RHSValNoAssignments[S.valno->id]]; // Update val# info. Renumber them and make sure they all belong to this // LiveRange now. Also remove dead val#'s.
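  // (Added note, not in the original: NewVNInfo may contain null entries for
  // value numbers that died during the join; the loop below compacts the
  // survivors into valnos and renumbers their ids densely from 0.)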
unsigned NumValNos = 0; for (unsigned i = 0; i < NumNewVals; ++i) { VNInfo *VNI = NewVNInfo[i]; if (VNI) { if (NumValNos >= NumVals) valnos.push_back(VNI); else valnos[NumValNos] = VNI; VNI->id = NumValNos++; // Renumber val#. } } if (NumNewVals < NumVals) valnos.resize(NumNewVals); // shrinkify // Okay, now insert the RHS live segments into the LHS. LiveRangeUpdater Updater(this); for (Segment &S : Other.segments) Updater.add(S); } /// Merge all of the segments in RHS into this live range as the specified /// value number. The segments in RHS are allowed to overlap with segments in /// the current range, but only if the overlapping segments have the /// specified value number. void LiveRange::MergeSegmentsInAsValue(const LiveRange &RHS, VNInfo *LHSValNo) { LiveRangeUpdater Updater(this); for (const Segment &S : RHS.segments) Updater.add(S.start, S.end, LHSValNo); } /// MergeValueInAsValue - Merge all of the live segments of a specific val# /// in RHS into this live range as the specified value number. /// The segments in RHS are allowed to overlap with segments in the /// current range; it will replace the value numbers of the overlapped /// segments with the specified value number. void LiveRange::MergeValueInAsValue(const LiveRange &RHS, const VNInfo *RHSValNo, VNInfo *LHSValNo) { LiveRangeUpdater Updater(this); for (const Segment &S : RHS.segments) if (S.valno == RHSValNo) Updater.add(S.start, S.end, LHSValNo); } /// MergeValueNumberInto - This method is called when two value numbers /// are found to be equivalent. This eliminates V1, replacing all /// segments with the V1 value number with the V2 value number. This can /// cause merging of V1/V2 value numbers and compaction of the value space. VNInfo *LiveRange::MergeValueNumberInto(VNInfo *V1, VNInfo *V2) { assert(V1 != V2 && "Identical value#'s are always equivalent!"); // This code actually merges the (numerically) larger value number into the // smaller value number, which is likely to allow us to compactify the value // space. The only thing we have to be careful of is to preserve the // instruction that defines the result value. // Make sure V2 is smaller than V1. if (V1->id < V2->id) { V1->copyFrom(*V2); std::swap(V1, V2); } // Merge V1 segments into V2. for (iterator I = begin(); I != end(); ) { iterator S = I++; if (S->valno != V1) continue; // Not a V1 Segment. // Okay, we found a V1 live range. If it had a previous, touching, V2 live // range, extend it. if (S != begin()) { iterator Prev = S-1; if (Prev->valno == V2 && Prev->end == S->start) { Prev->end = S->end; // Erase this live-range. segments.erase(S); I = Prev+1; S = Prev; } } // Okay, now we have a V1 or V2 live range that is maximally merged forward. // Ensure that it is a V2 live-range. S->valno = V2; // If we can merge it into later V2 segments, do so now. We ignore any // following V1 segments, as they will be merged in subsequent iterations // of the loop. if (I != end()) { if (I->start == S->end && I->valno == V2) { S->end = I->end; segments.erase(I); I = S+1; } } } // Now that V1 is dead, remove it.
markValNoForDeletion(V1); return V2; } void LiveRange::flushSegmentSet() { assert(segmentSet != nullptr && "segment set must have been created"); assert( segments.empty() && "segment set can be used only initially before switching to the array"); segments.append(segmentSet->begin(), segmentSet->end()); segmentSet = nullptr; verify(); } +bool LiveRange::isLiveAtIndexes(ArrayRef<SlotIndex> Slots) const { + ArrayRef<SlotIndex>::iterator SlotI = Slots.begin(); + ArrayRef<SlotIndex>::iterator SlotE = Slots.end(); + + // If there are no regmask slots, we have nothing to search. + if (SlotI == SlotE) + return false; + + // Start our search at the first segment that ends after the first slot. + const_iterator SegmentI = find(*SlotI); + const_iterator SegmentE = end(); + + // If there are no segments that end after the first slot, we're done. + if (SegmentI == SegmentE) + return false; + + // Look for each slot in the live range. + for ( ; SlotI != SlotE; ++SlotI) { + // Go to the next segment that ends after the current slot. + // The slot may be within a hole in the range. + SegmentI = advanceTo(SegmentI, *SlotI); + if (SegmentI == SegmentE) + return false; + + // If this segment contains the slot, we're done. + if (SegmentI->contains(*SlotI)) + return true; + // Otherwise, look for the next slot. + } + + // We didn't find a segment containing any of the slots. + return false; +} + void LiveInterval::freeSubRange(SubRange *S) { S->~SubRange(); // Memory was allocated with BumpPtr allocator and is not freed here. } void LiveInterval::removeEmptySubRanges() { SubRange **NextPtr = &SubRanges; SubRange *I = *NextPtr; while (I != nullptr) { if (!I->empty()) { NextPtr = &I->Next; I = *NextPtr; continue; } // Skip empty subranges until we find the first nonempty one. do { SubRange *Next = I->Next; freeSubRange(I); I = Next; } while (I != nullptr && I->empty()); *NextPtr = I; } } void LiveInterval::clearSubRanges() { for (SubRange *I = SubRanges, *Next; I != nullptr; I = Next) { Next = I->Next; freeSubRange(I); } SubRanges = nullptr; } /// Helper function for constructMainRangeFromSubranges(): Search the CFG /// backwards until we find a place covered by a LiveRange segment that actually /// has a valno set. static VNInfo *searchForVNI(const SlotIndexes &Indexes, LiveRange &LR, const MachineBasicBlock *MBB, SmallPtrSetImpl<const MachineBasicBlock *> &Visited) { // We start the search at the end of MBB. SlotIndex EndIdx = Indexes.getMBBEndIdx(MBB); // In our use case we can't leave the area covered by the live segments without // finding an actual VNI def. LiveRange::iterator I = LR.find(EndIdx.getPrevSlot()); assert(I != LR.end()); LiveRange::Segment &S = *I; if (S.valno != nullptr) return S.valno; VNInfo *VNI = nullptr; // Continue at predecessors (we could even go to idom with domtree available). for (const MachineBasicBlock *Pred : MBB->predecessors()) { // Avoid going in circles. if (!Visited.insert(Pred).second) continue; VNI = searchForVNI(Indexes, LR, Pred, Visited); if (VNI != nullptr) { S.valno = VNI; break; } } return VNI; } static void determineMissingVNIs(const SlotIndexes &Indexes, LiveInterval &LI) { SmallPtrSet<const MachineBasicBlock *, 5> Visited; LiveRange::iterator OutIt; VNInfo *PrevValNo = nullptr; for (LiveRange::iterator I = LI.begin(), E = LI.end(); I != E; ++I) { LiveRange::Segment &S = *I; // Determine final VNI if necessary. if (S.valno == nullptr) { // This can only happen at the beginning of a basic block.
assert(S.start.isBlock() && "valno should only be missing at block begin"); Visited.clear(); const MachineBasicBlock *MBB = Indexes.getMBBFromIndex(S.start); for (const MachineBasicBlock *Pred : MBB->predecessors()) { VNInfo *VNI = searchForVNI(Indexes, LI, Pred, Visited); if (VNI != nullptr) { S.valno = VNI; break; } } assert(S.valno != nullptr && "could not determine valno"); } // Merge with previous segment if it has the same VNI. if (PrevValNo == S.valno && OutIt->end == S.start) { OutIt->end = S.end; } else { // Didn't merge. Move OutIt to next segment. if (PrevValNo == nullptr) OutIt = LI.begin(); else ++OutIt; if (OutIt != I) *OutIt = *I; PrevValNo = S.valno; } } // If we merged some segments chop off the end. ++OutIt; LI.segments.erase(OutIt, LI.end()); } void LiveInterval::constructMainRangeFromSubranges( const SlotIndexes &Indexes, VNInfo::Allocator &VNIAllocator) { // The basic observations on which this algorithm is based: // - Each Def/ValNo in a subrange must have a corresponding def on the main // range, but no further defs/valnos are necessary. // - If any of the subranges is live at a point, the main liverange has to be // live too; conversely, if no subrange is live, the main range mustn't be // live either. // We do this by scanning through all the subranges simultaneously, creating new // segments in the main range as segments start/ends come up in the subranges. assert(hasSubRanges() && "expected subranges to be present"); assert(segments.empty() && valnos.empty() && "expected empty main range"); // Collect subrange, iterator pairs for the walk and determine first and last // SlotIndex involved. SmallVector<std::pair<const SubRange *, const_iterator>, 4> SRs; SlotIndex First; SlotIndex Last; for (const SubRange &SR : subranges()) { if (SR.empty()) continue; SRs.push_back(std::make_pair(&SR, SR.begin())); if (!First.isValid() || SR.segments.front().start < First) First = SR.segments.front().start; if (!Last.isValid() || SR.segments.back().end > Last) Last = SR.segments.back().end; } // Walk over all subranges simultaneously. Segment CurrentSegment; bool ConstructingSegment = false; bool NeedVNIFixup = false; LaneBitmask ActiveMask = 0; SlotIndex Pos = First; while (true) { SlotIndex NextPos = Last; enum { NOTHING, BEGIN_SEGMENT, END_SEGMENT, } Event = NOTHING; // Which subregister lanes are affected by the current event. LaneBitmask EventMask = 0; // Whether a BEGIN_SEGMENT is also a valno definition point. bool IsDef = false; // Find the next begin or end of a subrange segment. Combine masks if we // have multiple begins/ends at the same position. Ends take precedence over // Begins. for (auto &SRP : SRs) { const SubRange &SR = *SRP.first; const_iterator &I = SRP.second; // Advance iterator of subrange to a segment involving Pos; the earlier // segments are already merged at this point. while (I != SR.end() && (I->end < Pos || (I->end == Pos && (ActiveMask & SR.LaneMask) == 0))) ++I; if (I == SR.end()) continue; if ((ActiveMask & SR.LaneMask) == 0 && Pos <= I->start && I->start <= NextPos) { // Merge multiple begins at the same position. if (I->start == NextPos && Event == BEGIN_SEGMENT) { EventMask |= SR.LaneMask; IsDef |= I->valno->def == I->start; } else if (I->start < NextPos || Event != END_SEGMENT) { Event = BEGIN_SEGMENT; NextPos = I->start; EventMask = SR.LaneMask; IsDef = I->valno->def == I->start; } } if ((ActiveMask & SR.LaneMask) != 0 && Pos <= I->end && I->end <= NextPos) { // Merge multiple ends at the same position.
if (I->end == NextPos && Event == END_SEGMENT) EventMask |= SR.LaneMask; else { Event = END_SEGMENT; NextPos = I->end; EventMask = SR.LaneMask; } } } // Advance scan position. Pos = NextPos; if (Event == BEGIN_SEGMENT) { if (ConstructingSegment && IsDef) { // Finish previous segment because we have to start a new one. CurrentSegment.end = Pos; append(CurrentSegment); ConstructingSegment = false; } // Start a new segment if necessary. if (!ConstructingSegment) { // Determine value number for the segment. VNInfo *VNI; if (IsDef) { VNI = getNextValue(Pos, VNIAllocator); } else { // We have to reuse an existing value number. If we are lucky, we // already passed one of the predecessor blocks and determined // its value number (with blocks in reverse postorder this would // always be true, but we have no such guarantee). assert(Pos.isBlock()); const MachineBasicBlock *MBB = Indexes.getMBBFromIndex(Pos); // See if any of the predecessor blocks has a lower number and a VNI for (const MachineBasicBlock *Pred : MBB->predecessors()) { SlotIndex PredEnd = Indexes.getMBBEndIdx(Pred); VNI = getVNInfoBefore(PredEnd); if (VNI != nullptr) break; } // Def will come later: We have to do an extra fixup pass. if (VNI == nullptr) NeedVNIFixup = true; } // In rare cases we can produce adjacent segments with the same value // number (if they come from different subranges, but happen to have // the same defining instruction). VNIFixup will fix those cases. if (!empty() && segments.back().end == Pos && segments.back().valno == VNI) NeedVNIFixup = true; CurrentSegment.start = Pos; CurrentSegment.valno = VNI; ConstructingSegment = true; } ActiveMask |= EventMask; } else if (Event == END_SEGMENT) { assert(ConstructingSegment); // Finish segment if no lane is active anymore. ActiveMask &= ~EventMask; if (ActiveMask == 0) { CurrentSegment.end = Pos; append(CurrentSegment); ConstructingSegment = false; } } else { // We reached the end of the last subrange and can stop. assert(Event == NOTHING); break; } } // We might not be able to assign new valnos for all segments if the basic // block containing the definition comes after a segment using the valno. // Do a fixup pass for this uncommon case. if (NeedVNIFixup) determineMissingVNIs(Indexes, *this); assert(ActiveMask == 0 && !ConstructingSegment && "all segments ended"); verify(); } unsigned LiveInterval::getSize() const { unsigned Sum = 0; for (const Segment &S : segments) Sum += S.start.distance(S.end); return Sum; } raw_ostream& llvm::operator<<(raw_ostream& os, const LiveRange::Segment &S) { return os << '[' << S.start << ',' << S.end << ':' << S.valno->id << ")"; } #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) void LiveRange::Segment::dump() const { dbgs() << *this << "\n"; } #endif void LiveRange::print(raw_ostream &OS) const { if (empty()) OS << "EMPTY"; else { for (const Segment &S : segments) { OS << S; assert(S.valno == getValNumInfo(S.valno->id) && "Bad VNInfo"); } } // Print value number info.
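// (Illustrative, not in the original: the output produced here looks like
//   [16r,32r:0)[32r,144r:1)  0@16r 1@32r
// i.e. the segments tagged with their value-number ids, followed by one
// "id@def" entry per value number, with "-phi" marking PHI-defs and "x"
// marking unused value numbers.)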
if (getNumValNums()) { OS << " "; unsigned vnum = 0; for (const_vni_iterator i = vni_begin(), e = vni_end(); i != e; ++i, ++vnum) { const VNInfo *vni = *i; if (vnum) OS << " "; OS << vnum << "@"; if (vni->isUnused()) { OS << "x"; } else { OS << vni->def; if (vni->isPHIDef()) OS << "-phi"; } } } } void LiveInterval::print(raw_ostream &OS) const { OS << PrintReg(reg) << ' '; super::print(OS); // Print subranges for (const SubRange &SR : subranges()) { OS << " L" << PrintLaneMask(SR.LaneMask) << ' ' << SR; } } #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) void LiveRange::dump() const { dbgs() << *this << "\n"; } void LiveInterval::dump() const { dbgs() << *this << "\n"; } #endif #ifndef NDEBUG void LiveRange::verify() const { for (const_iterator I = begin(), E = end(); I != E; ++I) { assert(I->start.isValid()); assert(I->end.isValid()); assert(I->start < I->end); assert(I->valno != nullptr); assert(I->valno->id < valnos.size()); assert(I->valno == valnos[I->valno->id]); if (std::next(I) != E) { assert(I->end <= std::next(I)->start); if (I->end == std::next(I)->start) assert(I->valno != std::next(I)->valno); } } } void LiveInterval::verify(const MachineRegisterInfo *MRI) const { super::verify(); // Make sure SubRanges are fine and LaneMasks are disjoint. LaneBitmask Mask = 0; LaneBitmask MaxMask = MRI != nullptr ? MRI->getMaxLaneMaskForVReg(reg) : ~0u; for (const SubRange &SR : subranges()) { // Subrange lanemask should be disjoint from any previous subrange masks. assert((Mask & SR.LaneMask) == 0); Mask |= SR.LaneMask; // subrange mask should be contained in the maximum lane mask for the vreg. assert((Mask & ~MaxMask) == 0); // empty subranges must be removed. assert(!SR.empty()); SR.verify(); // Main liverange should cover subrange. assert(covers(SR)); } } #endif //===----------------------------------------------------------------------===// // LiveRangeUpdater class //===----------------------------------------------------------------------===// // // The LiveRangeUpdater class always maintains these invariants: // // - When LastStart is invalid, Spills is empty and the iterators are invalid. // This is the initial state, and the state created by flush(). // In this state, isDirty() returns false. // // Otherwise, segments are kept in three separate areas: // // 1. [begin; WriteI) at the front of LR. // 2. [ReadI; end) at the back of LR. // 3. Spills. // // - LR.begin() <= WriteI <= ReadI <= LR.end(). // - Segments in all three areas are fully ordered and coalesced. // - Segments in area 1 precede and can't coalesce with segments in area 2. // - Segments in Spills precede and can't coalesce with segments in area 2. // - No coalescing is possible between segments in Spills and segments in area // 1, and there are no overlapping segments. // // The segments in Spills are not ordered with respect to the segments in area // 1. They need to be merged. // // When they exist, Spills.back().start <= LastStart, // and WriteI[-1].start <= LastStart.
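// (Added diagram, not in the original comment: schematically the updater
// partitions the segment vector as
//     LR.segments: [ area 1 )[ unused gap )[ area 2 )
//                  ^begin    ^WriteI       ^ReadI    ^end
// with Spills holding the out-of-order segments that mergeSpills()/flush()
// later merge into the gap.)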
void LiveRangeUpdater::print(raw_ostream &OS) const { if (!isDirty()) { if (LR) OS << "Clean updater: " << *LR << '\n'; else OS << "Null updater.\n"; return; } assert(LR && "Can't have null LR in dirty updater."); OS << " updater with gap = " << (ReadI - WriteI) << ", last start = " << LastStart << ":\n Area 1:"; for (const auto &S : make_range(LR->begin(), WriteI)) OS << ' ' << S; OS << "\n Spills:"; for (unsigned I = 0, E = Spills.size(); I != E; ++I) OS << ' ' << Spills[I]; OS << "\n Area 2:"; for (const auto &S : make_range(ReadI, LR->end())) OS << ' ' << S; OS << '\n'; } void LiveRangeUpdater::dump() const { print(errs()); } // Determine if A and B should be coalesced. static inline bool coalescable(const LiveRange::Segment &A, const LiveRange::Segment &B) { assert(A.start <= B.start && "Unordered live segments."); if (A.end == B.start) return A.valno == B.valno; if (A.end < B.start) return false; assert(A.valno == B.valno && "Cannot overlap different values"); return true; } void LiveRangeUpdater::add(LiveRange::Segment Seg) { assert(LR && "Cannot add to a null destination"); // Fall back to the regular add method if the live range // is using the segment set instead of the segment vector. if (LR->segmentSet != nullptr) { LR->addSegmentToSet(Seg); return; } // Flush the state if Start moves backwards. if (!LastStart.isValid() || LastStart > Seg.start) { if (isDirty()) flush(); // This brings us to an uninitialized state. Reinitialize. assert(Spills.empty() && "Leftover spilled segments"); WriteI = ReadI = LR->begin(); } // Remember start for next time. LastStart = Seg.start; // Advance ReadI until it ends after Seg.start. LiveRange::iterator E = LR->end(); if (ReadI != E && ReadI->end <= Seg.start) { // First try to close the gap between WriteI and ReadI with spills. if (ReadI != WriteI) mergeSpills(); // Then advance ReadI. if (ReadI == WriteI) ReadI = WriteI = LR->find(Seg.start); else while (ReadI != E && ReadI->end <= Seg.start) *WriteI++ = *ReadI++; } assert(ReadI == E || ReadI->end > Seg.start); // Check if the ReadI segment begins early. if (ReadI != E && ReadI->start <= Seg.start) { assert(ReadI->valno == Seg.valno && "Cannot overlap different values"); // Bail if Seg is completely contained in ReadI. if (ReadI->end >= Seg.end) return; // Coalesce into Seg. Seg.start = ReadI->start; ++ReadI; } // Coalesce as much as possible from ReadI into Seg. while (ReadI != E && coalescable(Seg, *ReadI)) { Seg.end = std::max(Seg.end, ReadI->end); ++ReadI; } // Try coalescing Spills.back() into Seg. if (!Spills.empty() && coalescable(Spills.back(), Seg)) { Seg.start = Spills.back().start; Seg.end = std::max(Spills.back().end, Seg.end); Spills.pop_back(); } // Try coalescing Seg into WriteI[-1]. if (WriteI != LR->begin() && coalescable(WriteI[-1], Seg)) { WriteI[-1].end = std::max(WriteI[-1].end, Seg.end); return; } // Seg doesn't coalesce with anything, and needs to be inserted somewhere. if (WriteI != ReadI) { *WriteI++ = Seg; return; } // Finally, append to LR or Spills. if (WriteI == E) { LR->segments.push_back(Seg); WriteI = ReadI = LR->end(); } else Spills.push_back(Seg); } // Merge as many spilled segments as possible into the gap between WriteI // and ReadI. Advance WriteI to reflect the inserted instructions. void LiveRangeUpdater::mergeSpills() { // Perform a backwards merge of Spills and [SpillI;WriteI). 
size_t GapSize = ReadI - WriteI; size_t NumMoved = std::min(Spills.size(), GapSize); LiveRange::iterator Src = WriteI; LiveRange::iterator Dst = Src + NumMoved; LiveRange::iterator SpillSrc = Spills.end(); LiveRange::iterator B = LR->begin(); // This is the new WriteI position after merging spills. WriteI = Dst; // Now merge Src and Spills backwards. while (Src != Dst) { if (Src != B && Src[-1].start > SpillSrc[-1].start) *--Dst = *--Src; else *--Dst = *--SpillSrc; } assert(NumMoved == size_t(Spills.end() - SpillSrc)); Spills.erase(SpillSrc, Spills.end()); } void LiveRangeUpdater::flush() { if (!isDirty()) return; // Clear the dirty state. LastStart = SlotIndex(); assert(LR && "Cannot add to a null destination"); // Nothing to merge? if (Spills.empty()) { LR->segments.erase(WriteI, ReadI); LR->verify(); return; } // Resize the WriteI - ReadI gap to match Spills. size_t GapSize = ReadI - WriteI; if (GapSize < Spills.size()) { // The gap is too small. Make some room. size_t WritePos = WriteI - LR->begin(); LR->segments.insert(ReadI, Spills.size() - GapSize, LiveRange::Segment()); // This also invalidated ReadI, but it is recomputed below. WriteI = LR->begin() + WritePos; } else { // Shrink the gap if necessary. LR->segments.erase(WriteI + Spills.size(), ReadI); } ReadI = WriteI + Spills.size(); mergeSpills(); LR->verify(); } unsigned ConnectedVNInfoEqClasses::Classify(const LiveRange &LR) { // Create initial equivalence classes. EqClass.clear(); EqClass.grow(LR.getNumValNums()); const VNInfo *used = nullptr, *unused = nullptr; // Determine connections. for (const VNInfo *VNI : LR.valnos) { // Group all unused values into one class. if (VNI->isUnused()) { if (unused) EqClass.join(unused->id, VNI->id); unused = VNI; continue; } used = VNI; if (VNI->isPHIDef()) { const MachineBasicBlock *MBB = LIS.getMBBFromIndex(VNI->def); assert(MBB && "Phi-def has no defining MBB"); // Connect to values live out of predecessors. for (MachineBasicBlock::const_pred_iterator PI = MBB->pred_begin(), PE = MBB->pred_end(); PI != PE; ++PI) if (const VNInfo *PVNI = LR.getVNInfoBefore(LIS.getMBBEndIdx(*PI))) EqClass.join(VNI->id, PVNI->id); } else { // Normal value defined by an instruction. Check for two-addr redef. // FIXME: This could be coincidental. Should we really check for a tied // operand constraint? // Note that VNI->def may be a use slot for an early clobber def. if (const VNInfo *UVNI = LR.getVNInfoBefore(VNI->def)) EqClass.join(VNI->id, UVNI->id); } } // Lump all the unused values in with the last used value. if (used && unused) EqClass.join(used->id, unused->id); EqClass.compress(); return EqClass.getNumClasses(); } template <typename LiveRangeT, typename EqClassesT> static void DistributeRange(LiveRangeT &LR, LiveRangeT *SplitLRs[], EqClassesT VNIClasses) { // Move segments to new intervals. LiveRange::iterator J = LR.begin(), E = LR.end(); while (J != E && VNIClasses[J->valno->id] == 0) ++J; for (LiveRange::iterator I = J; I != E; ++I) { if (unsigned eq = VNIClasses[I->valno->id]) { assert((SplitLRs[eq-1]->empty() || SplitLRs[eq-1]->expiredAt(I->start)) && "New intervals should be empty"); SplitLRs[eq-1]->segments.push_back(*I); } else *J++ = *I; } LR.segments.erase(J, E); // Transfer VNInfos to their new owners and renumber them.
unsigned j = 0, e = LR.getNumValNums(); while (j != e && VNIClasses[j] == 0) ++j; for (unsigned i = j; i != e; ++i) { VNInfo *VNI = LR.getValNumInfo(i); if (unsigned eq = VNIClasses[i]) { VNI->id = SplitLRs[eq-1]->getNumValNums(); SplitLRs[eq-1]->valnos.push_back(VNI); } else { VNI->id = j; LR.valnos[j++] = VNI; } } LR.valnos.resize(j); } void ConnectedVNInfoEqClasses::Distribute(LiveInterval &LI, LiveInterval *LIV[], MachineRegisterInfo &MRI) { // Rewrite instructions. for (MachineRegisterInfo::reg_iterator RI = MRI.reg_begin(LI.reg), RE = MRI.reg_end(); RI != RE;) { MachineOperand &MO = *RI; MachineInstr *MI = RI->getParent(); ++RI; // DBG_VALUE instructions don't have slot indexes, so get the index of the // instruction before them. // Normally, DBG_VALUE instructions are removed before this function is // called, but it is not a requirement. SlotIndex Idx; if (MI->isDebugValue()) Idx = LIS.getSlotIndexes()->getIndexBefore(MI); else Idx = LIS.getInstructionIndex(MI); LiveQueryResult LRQ = LI.Query(Idx); const VNInfo *VNI = MO.readsReg() ? LRQ.valueIn() : LRQ.valueDefined(); // In the case of an <undef> use that isn't tied to any def, VNI will be // NULL. If the use is tied to a def, VNI will be the defined value. if (!VNI) continue; if (unsigned EqClass = getEqClass(VNI)) MO.setReg(LIV[EqClass-1]->reg); } // Distribute subregister liveranges. if (LI.hasSubRanges()) { unsigned NumComponents = EqClass.getNumClasses(); SmallVector<unsigned, 8> VNIMapping; SmallVector<SubRange*, 8> SubRanges; BumpPtrAllocator &Allocator = LIS.getVNInfoAllocator(); for (LiveInterval::SubRange &SR : LI.subranges()) { // Create new subranges in the split intervals and construct a mapping // for the VNInfos in the subrange. unsigned NumValNos = SR.valnos.size(); VNIMapping.clear(); VNIMapping.reserve(NumValNos); SubRanges.clear(); SubRanges.resize(NumComponents-1, nullptr); for (unsigned I = 0; I < NumValNos; ++I) { const VNInfo &VNI = *SR.valnos[I]; const VNInfo *MainRangeVNI = LI.getVNInfoAt(VNI.def); assert(MainRangeVNI != nullptr && "SubRange def must have corresponding main range def"); unsigned ComponentNum = getEqClass(MainRangeVNI); VNIMapping.push_back(ComponentNum); if (ComponentNum > 0 && SubRanges[ComponentNum-1] == nullptr) { SubRanges[ComponentNum-1] = LIV[ComponentNum-1]->createSubRange(Allocator, SR.LaneMask); } } DistributeRange(SR, SubRanges.data(), VNIMapping); } LI.removeEmptySubRanges(); } // Distribute main liverange. DistributeRange(LI, LIV, EqClass); } Index: vendor/llvm/dist/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp =================================================================== --- vendor/llvm/dist/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (revision 295845) +++ vendor/llvm/dist/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp (revision 295846) @@ -1,4681 +1,4707 @@ //===-- LegalizeDAG.cpp - Implement SelectionDAG::Legalize ----------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file implements the SelectionDAG::Legalize method.
// //===----------------------------------------------------------------------===// #include "llvm/CodeGen/SelectionDAG.h" #include "llvm/ADT/SetVector.h" #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/SmallSet.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/Triple.h" #include "llvm/CodeGen/Analysis.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineJumpTableInfo.h" #include "llvm/IR/CallingConv.h" #include "llvm/IR/Constants.h" #include "llvm/IR/DataLayout.h" #include "llvm/IR/DebugInfo.h" #include "llvm/IR/DerivedTypes.h" #include "llvm/IR/Function.h" #include "llvm/IR/LLVMContext.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Target/TargetFrameLowering.h" #include "llvm/Target/TargetLowering.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Target/TargetSubtargetInfo.h" using namespace llvm; #define DEBUG_TYPE "legalizedag" namespace { struct FloatSignAsInt; //===----------------------------------------------------------------------===// /// This takes an arbitrary SelectionDAG as input and /// hacks on it until the target machine can handle it. This involves /// eliminating value sizes the machine cannot handle (promoting small sizes to /// large sizes or splitting up large values into small values) as well as /// eliminating operations the machine cannot handle. /// /// This code also does a small amount of optimization and recognition of idioms /// as part of its processing. For example, if a target does not support a /// 'setcc' instruction efficiently, but does support the 'brcc' instruction, this /// will attempt to merge setcc and brc instructions into brcc's. /// class SelectionDAGLegalize { const TargetMachine &TM; const TargetLowering &TLI; SelectionDAG &DAG; /// \brief The set of nodes which have already been legalized. We hold a /// reference to it in order to update as necessary on node deletion. SmallPtrSetImpl<SDNode *> &LegalizedNodes; /// \brief A set of all the nodes updated during legalization. SmallSetVector<SDNode *, 16> *UpdatedNodes; EVT getSetCCResultType(EVT VT) const { return TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT); } // Libcall insertion helpers. public: SelectionDAGLegalize(SelectionDAG &DAG, SmallPtrSetImpl<SDNode *> &LegalizedNodes, SmallSetVector<SDNode *, 16> *UpdatedNodes = nullptr) : TM(DAG.getTarget()), TLI(DAG.getTargetLoweringInfo()), DAG(DAG), LegalizedNodes(LegalizedNodes), UpdatedNodes(UpdatedNodes) {} /// \brief Legalizes the given operation. void LegalizeOp(SDNode *Node); private: SDValue OptimizeFloatStore(StoreSDNode *ST); void LegalizeLoadOps(SDNode *Node); void LegalizeStoreOps(SDNode *Node); /// Some targets cannot handle a variable /// insertion index for the INSERT_VECTOR_ELT instruction. In this case, it /// is necessary to spill the vector being inserted into to memory, perform /// the insert there, and then read the result back. SDValue PerformInsertVectorEltInMemory(SDValue Vec, SDValue Val, SDValue Idx, SDLoc dl); SDValue ExpandINSERT_VECTOR_ELT(SDValue Vec, SDValue Val, SDValue Idx, SDLoc dl); /// Return a vector shuffle operation which /// performs the same shuffle in terms of order or result bytes, but on a type /// whose vector element type is narrower than the original shuffle type. /// e.g.
<0, 1, 0, 1> -> v8i16 <0, 1, 2, 3, 0, 1, 2, 3> SDValue ShuffleWithNarrowerEltType(EVT NVT, EVT VT, SDLoc dl, SDValue N1, SDValue N2, ArrayRef<int> Mask) const; bool LegalizeSetCCCondCode(EVT VT, SDValue &LHS, SDValue &RHS, SDValue &CC, bool &NeedInvert, SDLoc dl); SDValue ExpandLibCall(RTLIB::Libcall LC, SDNode *Node, bool isSigned); SDValue ExpandLibCall(RTLIB::Libcall LC, EVT RetVT, const SDValue *Ops, unsigned NumOps, bool isSigned, SDLoc dl); std::pair<SDValue, SDValue> ExpandChainLibCall(RTLIB::Libcall LC, SDNode *Node, bool isSigned); SDValue ExpandFPLibCall(SDNode *Node, RTLIB::Libcall Call_F32, RTLIB::Libcall Call_F64, RTLIB::Libcall Call_F80, RTLIB::Libcall Call_F128, RTLIB::Libcall Call_PPCF128); SDValue ExpandIntLibCall(SDNode *Node, bool isSigned, RTLIB::Libcall Call_I8, RTLIB::Libcall Call_I16, RTLIB::Libcall Call_I32, RTLIB::Libcall Call_I64, RTLIB::Libcall Call_I128); void ExpandDivRemLibCall(SDNode *Node, SmallVectorImpl<SDValue> &Results); void ExpandSinCosLibCall(SDNode *Node, SmallVectorImpl<SDValue> &Results); SDValue EmitStackConvert(SDValue SrcOp, EVT SlotVT, EVT DestVT, SDLoc dl); SDValue ExpandBUILD_VECTOR(SDNode *Node); SDValue ExpandSCALAR_TO_VECTOR(SDNode *Node); void ExpandDYNAMIC_STACKALLOC(SDNode *Node, SmallVectorImpl<SDValue> &Results); void getSignAsIntValue(FloatSignAsInt &State, SDLoc DL, SDValue Value) const; SDValue modifySignAsInt(const FloatSignAsInt &State, SDLoc DL, SDValue NewIntValue) const; SDValue ExpandFCOPYSIGN(SDNode *Node) const; SDValue ExpandFABS(SDNode *Node) const; SDValue ExpandLegalINT_TO_FP(bool isSigned, SDValue LegalOp, EVT DestVT, SDLoc dl); SDValue PromoteLegalINT_TO_FP(SDValue LegalOp, EVT DestVT, bool isSigned, SDLoc dl); SDValue PromoteLegalFP_TO_INT(SDValue LegalOp, EVT DestVT, bool isSigned, SDLoc dl); SDValue ExpandBITREVERSE(SDValue Op, SDLoc dl); SDValue ExpandBSWAP(SDValue Op, SDLoc dl); SDValue ExpandBitCount(unsigned Opc, SDValue Op, SDLoc dl); SDValue ExpandExtractFromVectorThroughStack(SDValue Op); SDValue ExpandInsertToVectorThroughStack(SDValue Op); SDValue ExpandVectorBuildThroughStack(SDNode* Node); SDValue ExpandConstantFP(ConstantFPSDNode *CFP, bool UseCP); SDValue ExpandConstant(ConstantSDNode *CP); // if ExpandNode returns false, LegalizeOp falls back to ConvertNodeToLibcall bool ExpandNode(SDNode *Node); void ConvertNodeToLibcall(SDNode *Node); void PromoteNode(SDNode *Node); public: // Node replacement helpers void ReplacedNode(SDNode *N) { LegalizedNodes.erase(N); if (UpdatedNodes) UpdatedNodes->insert(N); } void ReplaceNode(SDNode *Old, SDNode *New) { DEBUG(dbgs() << " ... replacing: "; Old->dump(&DAG); dbgs() << " with: "; New->dump(&DAG)); assert(Old->getNumValues() == New->getNumValues() && "Replacing one node with another that produces a different number " "of values!"); DAG.ReplaceAllUsesWith(Old, New); for (unsigned i = 0, e = Old->getNumValues(); i != e; ++i) DAG.TransferDbgValues(SDValue(Old, i), SDValue(New, i)); if (UpdatedNodes) UpdatedNodes->insert(New); ReplacedNode(Old); } void ReplaceNode(SDValue Old, SDValue New) { DEBUG(dbgs() << " ... replacing: "; Old->dump(&DAG); dbgs() << " with: "; New->dump(&DAG)); DAG.ReplaceAllUsesWith(Old, New); DAG.TransferDbgValues(Old, New); if (UpdatedNodes) UpdatedNodes->insert(New.getNode()); ReplacedNode(Old.getNode()); } void ReplaceNode(SDNode *Old, const SDValue *New) { DEBUG(dbgs() << " ... replacing: "; Old->dump(&DAG)); DAG.ReplaceAllUsesWith(Old, New); for (unsigned i = 0, e = Old->getNumValues(); i != e; ++i) { DEBUG(dbgs() << (i == 0 ?
" with: " : " and: "); New[i]->dump(&DAG)); DAG.TransferDbgValues(SDValue(Old, i), New[i]); if (UpdatedNodes) UpdatedNodes->insert(New[i].getNode()); } ReplacedNode(Old); } }; } /// Return a vector shuffle operation which /// performs the same shuffe in terms of order or result bytes, but on a type /// whose vector element type is narrower than the original shuffle type. /// e.g. <0, 1, 0, 1> -> v8i16 <0, 1, 2, 3, 0, 1, 2, 3> SDValue SelectionDAGLegalize::ShuffleWithNarrowerEltType(EVT NVT, EVT VT, SDLoc dl, SDValue N1, SDValue N2, ArrayRef Mask) const { unsigned NumMaskElts = VT.getVectorNumElements(); unsigned NumDestElts = NVT.getVectorNumElements(); unsigned NumEltsGrowth = NumDestElts / NumMaskElts; assert(NumEltsGrowth && "Cannot promote to vector type with fewer elts!"); if (NumEltsGrowth == 1) return DAG.getVectorShuffle(NVT, dl, N1, N2, &Mask[0]); SmallVector NewMask; for (unsigned i = 0; i != NumMaskElts; ++i) { int Idx = Mask[i]; for (unsigned j = 0; j != NumEltsGrowth; ++j) { if (Idx < 0) NewMask.push_back(-1); else NewMask.push_back(Idx * NumEltsGrowth + j); } } assert(NewMask.size() == NumDestElts && "Non-integer NumEltsGrowth?"); assert(TLI.isShuffleMaskLegal(NewMask, NVT) && "Shuffle not legal?"); return DAG.getVectorShuffle(NVT, dl, N1, N2, &NewMask[0]); } /// Expands the ConstantFP node to an integer constant or /// a load from the constant pool. SDValue SelectionDAGLegalize::ExpandConstantFP(ConstantFPSDNode *CFP, bool UseCP) { bool Extend = false; SDLoc dl(CFP); // If a FP immediate is precise when represented as a float and if the // target can do an extending load from float to double, we put it into // the constant pool as a float, even if it's is statically typed as a // double. This shrinks FP constants and canonicalizes them for targets where // an FP extending load is the same cost as a normal load (such as on the x87 // fp stack or PPC FP unit). EVT VT = CFP->getValueType(0); ConstantFP *LLVMC = const_cast(CFP->getConstantFPValue()); if (!UseCP) { assert((VT == MVT::f64 || VT == MVT::f32) && "Invalid type expansion"); return DAG.getConstant(LLVMC->getValueAPF().bitcastToAPInt(), dl, (VT == MVT::f64) ? MVT::i64 : MVT::i32); } EVT OrigVT = VT; EVT SVT = VT; while (SVT != MVT::f32 && SVT != MVT::f16) { SVT = (MVT::SimpleValueType)(SVT.getSimpleVT().SimpleTy - 1); if (ConstantFPSDNode::isValueValidForType(SVT, CFP->getValueAPF()) && // Only do this if the target has a native EXTLOAD instruction from // smaller type. TLI.isLoadExtLegal(ISD::EXTLOAD, OrigVT, SVT) && TLI.ShouldShrinkFPConstant(OrigVT)) { Type *SType = SVT.getTypeForEVT(*DAG.getContext()); LLVMC = cast(ConstantExpr::getFPTrunc(LLVMC, SType)); VT = SVT; Extend = true; } } SDValue CPIdx = DAG.getConstantPool(LLVMC, TLI.getPointerTy(DAG.getDataLayout())); unsigned Alignment = cast(CPIdx)->getAlignment(); if (Extend) { SDValue Result = DAG.getExtLoad( ISD::EXTLOAD, dl, OrigVT, DAG.getEntryNode(), CPIdx, MachinePointerInfo::getConstantPool(DAG.getMachineFunction()), VT, false, false, false, Alignment); return Result; } SDValue Result = DAG.getLoad(OrigVT, dl, DAG.getEntryNode(), CPIdx, MachinePointerInfo::getConstantPool(DAG.getMachineFunction()), false, false, false, Alignment); return Result; } /// Expands the Constant node to a load from the constant pool. 
SDValue SelectionDAGLegalize::ExpandConstant(ConstantSDNode *CP) {
  SDLoc dl(CP);
  EVT VT = CP->getValueType(0);
  SDValue CPIdx = DAG.getConstantPool(CP->getConstantIntValue(),
                                      TLI.getPointerTy(DAG.getDataLayout()));
  unsigned Alignment = cast<ConstantPoolSDNode>(CPIdx)->getAlignment();
  SDValue Result =
      DAG.getLoad(VT, dl, DAG.getEntryNode(), CPIdx,
                  MachinePointerInfo::getConstantPool(DAG.getMachineFunction()),
                  false, false, false, Alignment);
  return Result;
}

/// Expands an unaligned store to 2 half-size stores.
static void ExpandUnalignedStore(StoreSDNode *ST, SelectionDAG &DAG,
                                 const TargetLowering &TLI,
                                 SelectionDAGLegalize *DAGLegalize) {
  assert(ST->getAddressingMode() == ISD::UNINDEXED &&
         "unaligned indexed stores not implemented!");
  SDValue Chain = ST->getChain();
  SDValue Ptr = ST->getBasePtr();
  SDValue Val = ST->getValue();
  EVT VT = Val.getValueType();
  int Alignment = ST->getAlignment();
  unsigned AS = ST->getAddressSpace();

  SDLoc dl(ST);
  if (ST->getMemoryVT().isFloatingPoint() ||
      ST->getMemoryVT().isVector()) {
    EVT intVT = EVT::getIntegerVT(*DAG.getContext(), VT.getSizeInBits());
    if (TLI.isTypeLegal(intVT)) {
      // Expand to a bitconvert of the value to the integer type of the
      // same size, then a (misaligned) int store.
      // FIXME: Does not handle truncating floating point stores!
      SDValue Result = DAG.getNode(ISD::BITCAST, dl, intVT, Val);
      Result = DAG.getStore(Chain, dl, Result, Ptr, ST->getPointerInfo(),
                            ST->isVolatile(), ST->isNonTemporal(), Alignment);
      DAGLegalize->ReplaceNode(SDValue(ST, 0), Result);
      return;
    }
    // Do a (aligned) store to a stack slot, then copy from the stack slot
    // to the final destination using (unaligned) integer loads and stores.
    EVT StoredVT = ST->getMemoryVT();
    MVT RegVT =
      TLI.getRegisterType(*DAG.getContext(),
                          EVT::getIntegerVT(*DAG.getContext(),
                                            StoredVT.getSizeInBits()));
    unsigned StoredBytes = StoredVT.getSizeInBits() / 8;
    unsigned RegBytes = RegVT.getSizeInBits() / 8;
    unsigned NumRegs = (StoredBytes + RegBytes - 1) / RegBytes;

    // Make sure the stack slot is also aligned for the register type.
    SDValue StackPtr = DAG.CreateStackTemporary(StoredVT, RegVT);

    // Perform the original store, only redirected to the stack slot.
    SDValue Store = DAG.getTruncStore(Chain, dl, Val, StackPtr,
                                      MachinePointerInfo(), StoredVT,
                                      false, false, 0);
    SDValue Increment = DAG.getConstant(
        RegBytes, dl, TLI.getPointerTy(DAG.getDataLayout(), AS));
    SmallVector<SDValue, 8> Stores;
    unsigned Offset = 0;

    // Do all but one of the copies using the full register width.
    for (unsigned i = 1; i < NumRegs; i++) {
      // Load one integer register's worth from the stack slot.
      SDValue Load = DAG.getLoad(RegVT, dl, Store, StackPtr,
                                 MachinePointerInfo(),
                                 false, false, false, 0);
      // Store it to the final location.  Remember the store.
      Stores.push_back(DAG.getStore(Load.getValue(1), dl, Load, Ptr,
                                    ST->getPointerInfo().getWithOffset(Offset),
                                    ST->isVolatile(), ST->isNonTemporal(),
                                    MinAlign(ST->getAlignment(), Offset)));
      // Increment the pointers.
      Offset += RegBytes;
      StackPtr = DAG.getNode(ISD::ADD, dl, StackPtr.getValueType(), StackPtr,
                             Increment);
      Ptr = DAG.getNode(ISD::ADD, dl, Ptr.getValueType(), Ptr, Increment);
    }

    // The last store may be partial.  Do a truncating store.  On big-endian
    // machines this requires an extending load from the stack slot to ensure
    // that the bits are in the right place.
    EVT MemVT = EVT::getIntegerVT(*DAG.getContext(),
                                  8 * (StoredBytes - Offset));

    // Load from the stack slot.
    SDValue Load = DAG.getExtLoad(ISD::EXTLOAD, dl, RegVT, Store, StackPtr,
                                  MachinePointerInfo(), MemVT, false, false,
                                  false, 0);

    Stores.push_back(DAG.getTruncStore(Load.getValue(1), dl, Load, Ptr,
                                       ST->getPointerInfo()
                                         .getWithOffset(Offset),
                                       MemVT, ST->isVolatile(),
                                       ST->isNonTemporal(),
                                       MinAlign(ST->getAlignment(), Offset),
                                       ST->getAAInfo()));
    // The order of the stores doesn't matter - say it with a TokenFactor.
    SDValue Result = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Stores);
    DAGLegalize->ReplaceNode(SDValue(ST, 0), Result);
    return;
  }
  assert(ST->getMemoryVT().isInteger() &&
         !ST->getMemoryVT().isVector() &&
         "Unaligned store of unknown type.");
  // Get the half-size VT
  EVT NewStoredVT = ST->getMemoryVT().getHalfSizedIntegerVT(*DAG.getContext());
  int NumBits = NewStoredVT.getSizeInBits();
  int IncrementSize = NumBits / 8;

  // Divide the stored value in two parts.
  SDValue ShiftAmount =
      DAG.getConstant(NumBits, dl, TLI.getShiftAmountTy(Val.getValueType(),
                                                        DAG.getDataLayout()));
  SDValue Lo = Val;
  SDValue Hi = DAG.getNode(ISD::SRL, dl, VT, Val, ShiftAmount);

  // Store the two parts
  SDValue Store1, Store2;
  Store1 = DAG.getTruncStore(Chain, dl,
                             DAG.getDataLayout().isLittleEndian() ? Lo : Hi,
                             Ptr, ST->getPointerInfo(), NewStoredVT,
                             ST->isVolatile(), ST->isNonTemporal(), Alignment);

  Ptr = DAG.getNode(ISD::ADD, dl, Ptr.getValueType(), Ptr,
                    DAG.getConstant(IncrementSize, dl,
                                    TLI.getPointerTy(DAG.getDataLayout(),
                                                     AS)));
  Alignment = MinAlign(Alignment, IncrementSize);
  Store2 = DAG.getTruncStore(
      Chain, dl, DAG.getDataLayout().isLittleEndian() ? Hi : Lo, Ptr,
      ST->getPointerInfo().getWithOffset(IncrementSize), NewStoredVT,
      ST->isVolatile(), ST->isNonTemporal(), Alignment, ST->getAAInfo());

  SDValue Result =
    DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Store1, Store2);
  DAGLegalize->ReplaceNode(SDValue(ST, 0), Result);
}

/// Expands an unaligned load to 2 half-size loads.
static void
ExpandUnalignedLoad(LoadSDNode *LD, SelectionDAG &DAG,
                    const TargetLowering &TLI,
                    SDValue &ValResult, SDValue &ChainResult) {
  assert(LD->getAddressingMode() == ISD::UNINDEXED &&
         "unaligned indexed loads not implemented!");
  SDValue Chain = LD->getChain();
  SDValue Ptr = LD->getBasePtr();
  EVT VT = LD->getValueType(0);
  EVT LoadedVT = LD->getMemoryVT();
  SDLoc dl(LD);
  if (VT.isFloatingPoint() || VT.isVector()) {
    EVT intVT = EVT::getIntegerVT(*DAG.getContext(), LoadedVT.getSizeInBits());
    if (TLI.isTypeLegal(intVT) && TLI.isTypeLegal(LoadedVT)) {
      // Expand to a (misaligned) integer load of the same size,
      // then bitconvert to floating point or vector.
      SDValue newLoad = DAG.getLoad(intVT, dl, Chain, Ptr,
                                    LD->getMemOperand());
      SDValue Result = DAG.getNode(ISD::BITCAST, dl, LoadedVT, newLoad);
      if (LoadedVT != VT)
        Result = DAG.getNode(VT.isFloatingPoint() ? ISD::FP_EXTEND :
                             ISD::ANY_EXTEND, dl, VT, Result);

      ValResult = Result;
      ChainResult = newLoad.getValue(1);
      return;
    }

    // Copy the value to a (aligned) stack slot using (unaligned) integer
    // loads and stores, then do a (aligned) load from the stack slot.
    MVT RegVT = TLI.getRegisterType(*DAG.getContext(), intVT);
    unsigned LoadedBytes = LoadedVT.getSizeInBits() / 8;
    unsigned RegBytes = RegVT.getSizeInBits() / 8;
    unsigned NumRegs = (LoadedBytes + RegBytes - 1) / RegBytes;

    // Make sure the stack slot is also aligned for the register type.
    SDValue StackBase = DAG.CreateStackTemporary(LoadedVT, RegVT);

    SDValue Increment =
        DAG.getConstant(RegBytes, dl, TLI.getPointerTy(DAG.getDataLayout()));

    SmallVector<SDValue, 8> Stores;
    SDValue StackPtr = StackBase;
    unsigned Offset = 0;

    // Do all but one of the copies using the full register width.
    for (unsigned i = 1; i < NumRegs; i++) {
      // Load one integer register's worth from the original location.
      SDValue Load = DAG.getLoad(RegVT, dl, Chain, Ptr,
                                 LD->getPointerInfo().getWithOffset(Offset),
                                 LD->isVolatile(), LD->isNonTemporal(),
                                 LD->isInvariant(),
                                 MinAlign(LD->getAlignment(), Offset),
                                 LD->getAAInfo());
      // Follow the load with a store to the stack slot.  Remember the store.
      Stores.push_back(DAG.getStore(Load.getValue(1), dl, Load, StackPtr,
                                    MachinePointerInfo(), false, false, 0));
      // Increment the pointers.
      Offset += RegBytes;
      Ptr = DAG.getNode(ISD::ADD, dl, Ptr.getValueType(), Ptr, Increment);
      StackPtr = DAG.getNode(ISD::ADD, dl, StackPtr.getValueType(), StackPtr,
                             Increment);
    }

    // The last copy may be partial.  Do an extending load.
    EVT MemVT = EVT::getIntegerVT(*DAG.getContext(),
                                  8 * (LoadedBytes - Offset));
    SDValue Load = DAG.getExtLoad(ISD::EXTLOAD, dl, RegVT, Chain, Ptr,
                                  LD->getPointerInfo().getWithOffset(Offset),
                                  MemVT, LD->isVolatile(),
                                  LD->isNonTemporal(),
                                  LD->isInvariant(),
                                  MinAlign(LD->getAlignment(), Offset),
                                  LD->getAAInfo());
    // Follow the load with a store to the stack slot.  Remember the store.
    // On big-endian machines this requires a truncating store to ensure
    // that the bits end up in the right place.
    Stores.push_back(DAG.getTruncStore(Load.getValue(1), dl, Load, StackPtr,
                                       MachinePointerInfo(), MemVT,
                                       false, false, 0));

    // The order of the stores doesn't matter - say it with a TokenFactor.
    SDValue TF = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Stores);

    // Finally, perform the original load only redirected to the stack slot.
    Load = DAG.getExtLoad(LD->getExtensionType(), dl, VT, TF, StackBase,
                          MachinePointerInfo(), LoadedVT, false, false, false,
                          0);

    // Callers expect a MERGE_VALUES node.
    ValResult = Load;
    ChainResult = TF;
    return;
  }
  assert(LoadedVT.isInteger() && !LoadedVT.isVector() &&
         "Unaligned load of unsupported type.");

  // Compute the new VT that is half the size of the old one.  This is an
  // integer MVT.
  unsigned NumBits = LoadedVT.getSizeInBits();
  EVT NewLoadedVT;
  NewLoadedVT = EVT::getIntegerVT(*DAG.getContext(), NumBits/2);
  NumBits >>= 1;

  unsigned Alignment = LD->getAlignment();
  unsigned IncrementSize = NumBits / 8;
  ISD::LoadExtType HiExtType = LD->getExtensionType();

  // If the original load is NON_EXTLOAD, the hi part load must be ZEXTLOAD.
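  // For illustration: with NumBits == 16 (an unaligned i32 load on a
  // little-endian target), the code below produces
  //   Lo = zextload i16 [Ptr],  Hi = extload i16 [Ptr+2],
  //   Result = (Hi << 16) | Lo,
  // and a TokenFactor ties the two half-load chains back together.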
  if (HiExtType == ISD::NON_EXTLOAD)
    HiExtType = ISD::ZEXTLOAD;

  // Load the value in two parts
  SDValue Lo, Hi;
  if (DAG.getDataLayout().isLittleEndian()) {
    Lo = DAG.getExtLoad(ISD::ZEXTLOAD, dl, VT, Chain, Ptr,
                        LD->getPointerInfo(), NewLoadedVT, LD->isVolatile(),
                        LD->isNonTemporal(), LD->isInvariant(), Alignment,
                        LD->getAAInfo());
    Ptr = DAG.getNode(ISD::ADD, dl, Ptr.getValueType(), Ptr,
                      DAG.getConstant(IncrementSize, dl, Ptr.getValueType()));
    Hi = DAG.getExtLoad(HiExtType, dl, VT, Chain, Ptr,
                        LD->getPointerInfo().getWithOffset(IncrementSize),
                        NewLoadedVT, LD->isVolatile(),
                        LD->isNonTemporal(), LD->isInvariant(),
                        MinAlign(Alignment, IncrementSize), LD->getAAInfo());
  } else {
    Hi = DAG.getExtLoad(HiExtType, dl, VT, Chain, Ptr, LD->getPointerInfo(),
                        NewLoadedVT, LD->isVolatile(),
                        LD->isNonTemporal(), LD->isInvariant(), Alignment,
                        LD->getAAInfo());
    Ptr = DAG.getNode(ISD::ADD, dl, Ptr.getValueType(), Ptr,
                      DAG.getConstant(IncrementSize, dl, Ptr.getValueType()));
    Lo = DAG.getExtLoad(ISD::ZEXTLOAD, dl, VT, Chain, Ptr,
                        LD->getPointerInfo().getWithOffset(IncrementSize),
                        NewLoadedVT, LD->isVolatile(),
                        LD->isNonTemporal(), LD->isInvariant(),
                        MinAlign(Alignment, IncrementSize), LD->getAAInfo());
  }

  // Aggregate the two parts.
  SDValue ShiftAmount =
      DAG.getConstant(NumBits, dl, TLI.getShiftAmountTy(Hi.getValueType(),
                                                        DAG.getDataLayout()));
  SDValue Result = DAG.getNode(ISD::SHL, dl, VT, Hi, ShiftAmount);
  Result = DAG.getNode(ISD::OR, dl, VT, Result, Lo);

  SDValue TF = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1),
                           Hi.getValue(1));

  ValResult = Result;
  ChainResult = TF;
}

/// Some targets cannot handle a variable insertion index for the
/// INSERT_VECTOR_ELT instruction.  In this case, it
/// is necessary to spill the vector being inserted into to memory, perform
/// the insert there, and then read the result back.
SDValue SelectionDAGLegalize::
PerformInsertVectorEltInMemory(SDValue Vec, SDValue Val, SDValue Idx,
                               SDLoc dl) {
  SDValue Tmp1 = Vec;
  SDValue Tmp2 = Val;
  SDValue Tmp3 = Idx;

  // If the target doesn't support this, we have to spill the input vector
  // to a temporary stack slot, update the element, then reload it.  This is
  // badness.  We could also load the value into a vector register (either
  // with a "move to register" or "extload into register" instruction, then
  // permute it into place, if the idx is a constant and if the idx is
  // supported by the target.
  EVT VT    = Tmp1.getValueType();
  EVT EltVT = VT.getVectorElementType();
  EVT IdxVT = Tmp3.getValueType();
  EVT PtrVT = TLI.getPointerTy(DAG.getDataLayout());
  SDValue StackPtr = DAG.CreateStackTemporary(VT);

  int SPFI = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();

  // Store the vector.
  SDValue Ch = DAG.getStore(
      DAG.getEntryNode(), dl, Tmp1, StackPtr,
      MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), SPFI), false,
      false, 0);

  // Truncate or zero extend offset to target pointer type.
  Tmp3 = DAG.getZExtOrTrunc(Tmp3, dl, PtrVT);
  // Add the offset to the index.
  unsigned EltSize = EltVT.getSizeInBits()/8;
  Tmp3 = DAG.getNode(ISD::MUL, dl, IdxVT, Tmp3,
                     DAG.getConstant(EltSize, dl, IdxVT));
  SDValue StackPtr2 = DAG.getNode(ISD::ADD, dl, IdxVT, Tmp3, StackPtr);
  // Store the scalar value.
  Ch = DAG.getTruncStore(Ch, dl, Tmp2, StackPtr2, MachinePointerInfo(), EltVT,
                         false, false, 0);
  // Load the updated vector.
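  // (The full expansion is thus: store Vec to a fresh stack temporary,
  // truncstore Val at StackPtr + Idx * EltSize, then reload the whole
  // vector; each step is chained through Ch so the stores stay ordered.)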
  return DAG.getLoad(VT, dl, Ch, StackPtr,
                     MachinePointerInfo::getFixedStack(
                         DAG.getMachineFunction(), SPFI),
                     false, false, false, 0);
}

SDValue SelectionDAGLegalize::
ExpandINSERT_VECTOR_ELT(SDValue Vec, SDValue Val, SDValue Idx, SDLoc dl) {
  if (ConstantSDNode *InsertPos = dyn_cast<ConstantSDNode>(Idx)) {
    // SCALAR_TO_VECTOR requires that the type of the value being inserted
    // match the element type of the vector being created, except for
    // integers in which case the inserted value can be over width.
    EVT EltVT = Vec.getValueType().getVectorElementType();
    if (Val.getValueType() == EltVT ||
        (EltVT.isInteger() && Val.getValueType().bitsGE(EltVT))) {
      SDValue ScVec = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl,
                                  Vec.getValueType(), Val);

      unsigned NumElts = Vec.getValueType().getVectorNumElements();
      // We generate a shuffle of InVec and ScVec, so the shuffle mask
      // should be 0,1,2,3,4,5... with the appropriate element replaced with
      // elt 0 of the RHS.
      SmallVector<int, 8> ShufOps;
      for (unsigned i = 0; i != NumElts; ++i)
        ShufOps.push_back(i != InsertPos->getZExtValue() ? i : NumElts);

      return DAG.getVectorShuffle(Vec.getValueType(), dl, Vec, ScVec,
                                  &ShufOps[0]);
    }
  }
  return PerformInsertVectorEltInMemory(Vec, Val, Idx, dl);
}

SDValue SelectionDAGLegalize::OptimizeFloatStore(StoreSDNode* ST) {
  // Turn 'store float 1.0, Ptr' -> 'store int 0x12345678, Ptr'
  // FIXME: We shouldn't do this for TargetConstantFP's.
  // FIXME: move this to the DAG Combiner!  Note that we can't regress due
  // to phase ordering between legalized code and the dag combiner.  This
  // probably means that we need to integrate dag combiner and legalizer
  // together.
  // We generally can't do this one for long doubles.
  SDValue Chain = ST->getChain();
  SDValue Ptr = ST->getBasePtr();
  unsigned Alignment = ST->getAlignment();
  bool isVolatile = ST->isVolatile();
  bool isNonTemporal = ST->isNonTemporal();
  AAMDNodes AAInfo = ST->getAAInfo();
  SDLoc dl(ST);
  if (ConstantFPSDNode *CFP = dyn_cast<ConstantFPSDNode>(ST->getValue())) {
    if (CFP->getValueType(0) == MVT::f32 &&
        TLI.isTypeLegal(MVT::i32)) {
      SDValue Con = DAG.getConstant(CFP->getValueAPF().
                                      bitcastToAPInt().zextOrTrunc(32),
                                    SDLoc(CFP), MVT::i32);
      return DAG.getStore(Chain, dl, Con, Ptr, ST->getPointerInfo(),
                          isVolatile, isNonTemporal, Alignment, AAInfo);
    }

    if (CFP->getValueType(0) == MVT::f64) {
      // If this target supports 64-bit registers, do a single 64-bit store.
      if (TLI.isTypeLegal(MVT::i64)) {
        SDValue Con = DAG.getConstant(CFP->getValueAPF().bitcastToAPInt().
                                        zextOrTrunc(64), SDLoc(CFP), MVT::i64);
        return DAG.getStore(Chain, dl, Con, Ptr, ST->getPointerInfo(),
                            isVolatile, isNonTemporal, Alignment, AAInfo);
      }

      if (TLI.isTypeLegal(MVT::i32) && !ST->isVolatile()) {
        // Otherwise, if the target supports 32-bit registers, use 2 32-bit
        // stores.  If the target supports neither 32- nor 64-bits, this
        // xform is certainly not worth it.
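        // Concrete example: an f64 store of 1.0 has the bit pattern
        // 0x3FF0000000000000, so Lo becomes 0x00000000 and Hi becomes
        // 0x3FF00000; the swap below keeps the memory image correct on
        // big-endian targets, where the high word must come first.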
        const APInt &IntVal = CFP->getValueAPF().bitcastToAPInt();
        SDValue Lo = DAG.getConstant(IntVal.trunc(32), dl, MVT::i32);
        SDValue Hi = DAG.getConstant(IntVal.lshr(32).trunc(32), dl, MVT::i32);
        if (DAG.getDataLayout().isBigEndian())
          std::swap(Lo, Hi);

        Lo = DAG.getStore(Chain, dl, Lo, Ptr, ST->getPointerInfo(), isVolatile,
                          isNonTemporal, Alignment, AAInfo);
        Ptr = DAG.getNode(ISD::ADD, dl, Ptr.getValueType(), Ptr,
                          DAG.getConstant(4, dl, Ptr.getValueType()));
        Hi = DAG.getStore(Chain, dl, Hi, Ptr,
                          ST->getPointerInfo().getWithOffset(4),
                          isVolatile, isNonTemporal, MinAlign(Alignment, 4U),
                          AAInfo);

        return DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo, Hi);
      }
    }
  }
  return SDValue(nullptr, 0);
}

void SelectionDAGLegalize::LegalizeStoreOps(SDNode *Node) {
  StoreSDNode *ST = cast<StoreSDNode>(Node);
  SDValue Chain = ST->getChain();
  SDValue Ptr = ST->getBasePtr();
  SDLoc dl(Node);

  unsigned Alignment = ST->getAlignment();
  bool isVolatile = ST->isVolatile();
  bool isNonTemporal = ST->isNonTemporal();
  AAMDNodes AAInfo = ST->getAAInfo();

  if (!ST->isTruncatingStore()) {
    if (SDNode *OptStore = OptimizeFloatStore(ST).getNode()) {
      ReplaceNode(ST, OptStore);
      return;
    }

    {
      SDValue Value = ST->getValue();
      MVT VT = Value.getSimpleValueType();
      switch (TLI.getOperationAction(ISD::STORE, VT)) {
      default: llvm_unreachable("This action is not supported yet!");
      case TargetLowering::Legal: {
        // If this is an unaligned store and the target doesn't support it,
        // expand it.
        EVT MemVT = ST->getMemoryVT();
        unsigned AS = ST->getAddressSpace();
        unsigned Align = ST->getAlignment();
        const DataLayout &DL = DAG.getDataLayout();
        if (!TLI.allowsMemoryAccess(*DAG.getContext(), DL, MemVT, AS, Align))
          ExpandUnalignedStore(cast<StoreSDNode>(Node), DAG, TLI, this);
        break;
      }
      case TargetLowering::Custom: {
        SDValue Res = TLI.LowerOperation(SDValue(Node, 0), DAG);
        if (Res && Res != SDValue(Node, 0))
          ReplaceNode(SDValue(Node, 0), Res);
        return;
      }
      case TargetLowering::Promote: {
        MVT NVT = TLI.getTypeToPromoteTo(ISD::STORE, VT);
        assert(NVT.getSizeInBits() == VT.getSizeInBits() &&
               "Can only promote stores to same size type");
        Value = DAG.getNode(ISD::BITCAST, dl, NVT, Value);
        SDValue Result =
          DAG.getStore(Chain, dl, Value, Ptr,
                       ST->getPointerInfo(), isVolatile,
                       isNonTemporal, Alignment, AAInfo);
        ReplaceNode(SDValue(Node, 0), Result);
        break;
      }
      }
      return;
    }
  } else {
    SDValue Value = ST->getValue();

    EVT StVT = ST->getMemoryVT();
    unsigned StWidth = StVT.getSizeInBits();
    auto &DL = DAG.getDataLayout();

    if (StWidth != StVT.getStoreSizeInBits()) {
      // Promote to a byte-sized store with upper bits zero if not
      // storing an integral number of bytes.  For example, promote
      // TRUNCSTORE:i1 X -> TRUNCSTORE:i8 (and X, 1)
      EVT NVT = EVT::getIntegerVT(*DAG.getContext(),
                                  StVT.getStoreSizeInBits());
      Value = DAG.getZeroExtendInReg(Value, dl, StVT);
      SDValue Result =
        DAG.getTruncStore(Chain, dl, Value, Ptr, ST->getPointerInfo(),
                          NVT, isVolatile, isNonTemporal, Alignment, AAInfo);
      ReplaceNode(SDValue(Node, 0), Result);
    } else if (StWidth & (StWidth - 1)) {
      // If not storing a power-of-2 number of bits, expand as two stores.
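      // For example, a truncating store of i24 splits into RoundWidth = 16
      // and ExtraWidth = 8: on little-endian, an i16 truncstore of X at Ptr
      // plus an i8 truncstore of (X >> 16) at Ptr+2, as the per-branch
      // comments below spell out.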
      assert(!StVT.isVector() && "Unsupported truncstore!");
      unsigned RoundWidth = 1 << Log2_32(StWidth);
      assert(RoundWidth < StWidth);
      unsigned ExtraWidth = StWidth - RoundWidth;
      assert(ExtraWidth < RoundWidth);
      assert(!(RoundWidth % 8) && !(ExtraWidth % 8) &&
             "Store size not an integral number of bytes!");
      EVT RoundVT = EVT::getIntegerVT(*DAG.getContext(), RoundWidth);
      EVT ExtraVT = EVT::getIntegerVT(*DAG.getContext(), ExtraWidth);
      SDValue Lo, Hi;
      unsigned IncrementSize;

      if (DL.isLittleEndian()) {
        // TRUNCSTORE:i24 X -> TRUNCSTORE:i16 X, TRUNCSTORE@+2:i8 (srl X, 16)
        // Store the bottom RoundWidth bits.
        Lo = DAG.getTruncStore(Chain, dl, Value, Ptr, ST->getPointerInfo(),
                               RoundVT, isVolatile, isNonTemporal, Alignment,
                               AAInfo);

        // Store the remaining ExtraWidth bits.
        IncrementSize = RoundWidth / 8;
        Ptr = DAG.getNode(ISD::ADD, dl, Ptr.getValueType(), Ptr,
                          DAG.getConstant(IncrementSize, dl,
                                          Ptr.getValueType()));
        Hi = DAG.getNode(
            ISD::SRL, dl, Value.getValueType(), Value,
            DAG.getConstant(RoundWidth, dl,
                            TLI.getShiftAmountTy(Value.getValueType(), DL)));
        Hi = DAG.getTruncStore(Chain, dl, Hi, Ptr,
                             ST->getPointerInfo().getWithOffset(IncrementSize),
                               ExtraVT, isVolatile, isNonTemporal,
                               MinAlign(Alignment, IncrementSize), AAInfo);
      } else {
        // Big endian - avoid unaligned stores.
        // TRUNCSTORE:i24 X -> TRUNCSTORE:i16 (srl X, 8), TRUNCSTORE@+2:i8 X
        // Store the top RoundWidth bits.
        Hi = DAG.getNode(
            ISD::SRL, dl, Value.getValueType(), Value,
            DAG.getConstant(ExtraWidth, dl,
                            TLI.getShiftAmountTy(Value.getValueType(), DL)));
        Hi = DAG.getTruncStore(Chain, dl, Hi, Ptr, ST->getPointerInfo(),
                               RoundVT, isVolatile, isNonTemporal, Alignment,
                               AAInfo);

        // Store the remaining ExtraWidth bits.
        IncrementSize = RoundWidth / 8;
        Ptr = DAG.getNode(ISD::ADD, dl, Ptr.getValueType(), Ptr,
                          DAG.getConstant(IncrementSize, dl,
                                          Ptr.getValueType()));
        Lo = DAG.getTruncStore(Chain, dl, Value, Ptr,
                             ST->getPointerInfo().getWithOffset(IncrementSize),
                               ExtraVT, isVolatile, isNonTemporal,
                               MinAlign(Alignment, IncrementSize), AAInfo);
      }

      // The order of the stores doesn't matter.
      SDValue Result = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo, Hi);
      ReplaceNode(SDValue(Node, 0), Result);
    } else {
      switch (TLI.getTruncStoreAction(ST->getValue().getValueType(), StVT)) {
      default: llvm_unreachable("This action is not supported yet!");
      case TargetLowering::Legal: {
        EVT MemVT = ST->getMemoryVT();
        unsigned AS = ST->getAddressSpace();
        unsigned Align = ST->getAlignment();
        // If this is an unaligned store and the target doesn't support it,
        // expand it.
        if (!TLI.allowsMemoryAccess(*DAG.getContext(), DL, MemVT, AS, Align))
          ExpandUnalignedStore(cast<StoreSDNode>(Node), DAG, TLI, this);
        break;
      }
      case TargetLowering::Custom: {
        SDValue Res = TLI.LowerOperation(SDValue(Node, 0), DAG);
        if (Res && Res != SDValue(Node, 0))
          ReplaceNode(SDValue(Node, 0), Res);
        return;
      }
      case TargetLowering::Expand:
        assert(!StVT.isVector() &&
               "Vector Stores are handled in LegalizeVectorOps");

        // TRUNCSTORE:i16 i32 -> STORE i16
        assert(TLI.isTypeLegal(StVT) &&
               "Do not know how to expand this store!");
        Value = DAG.getNode(ISD::TRUNCATE, dl, StVT, Value);
        SDValue Result =
          DAG.getStore(Chain, dl, Value, Ptr, ST->getPointerInfo(),
                       isVolatile, isNonTemporal, Alignment, AAInfo);
        ReplaceNode(SDValue(Node, 0), Result);
        break;
      }
    }
  }
}

void SelectionDAGLegalize::LegalizeLoadOps(SDNode *Node) {
  LoadSDNode *LD = cast<LoadSDNode>(Node);
  SDValue Chain = LD->getChain();  // The chain.
  SDValue Ptr = LD->getBasePtr();  // The base pointer.
  SDValue Value;                   // The value returned by the load op.
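  // (A load node carries two results: result 0 is the loaded value and
  // result 1 is the output chain, so every replacement below has to remap
  // both.)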
  SDLoc dl(Node);

  ISD::LoadExtType ExtType = LD->getExtensionType();
  if (ExtType == ISD::NON_EXTLOAD) {
    MVT VT = Node->getSimpleValueType(0);
    SDValue RVal = SDValue(Node, 0);
    SDValue RChain = SDValue(Node, 1);

    switch (TLI.getOperationAction(Node->getOpcode(), VT)) {
    default: llvm_unreachable("This action is not supported yet!");
    case TargetLowering::Legal: {
      EVT MemVT = LD->getMemoryVT();
      unsigned AS = LD->getAddressSpace();
      unsigned Align = LD->getAlignment();
      const DataLayout &DL = DAG.getDataLayout();
      // If this is an unaligned load and the target doesn't support it,
      // expand it.
      if (!TLI.allowsMemoryAccess(*DAG.getContext(), DL, MemVT, AS, Align))
        ExpandUnalignedLoad(cast<LoadSDNode>(Node), DAG, TLI, RVal, RChain);
      break;
    }
    case TargetLowering::Custom: {
      SDValue Res = TLI.LowerOperation(RVal, DAG);
      if (Res.getNode()) {
        RVal = Res;
        RChain = Res.getValue(1);
      }
      break;
    }
    case TargetLowering::Promote: {
      MVT NVT = TLI.getTypeToPromoteTo(Node->getOpcode(), VT);
      assert(NVT.getSizeInBits() == VT.getSizeInBits() &&
             "Can only promote loads to same size type");

      SDValue Res = DAG.getLoad(NVT, dl, Chain, Ptr, LD->getMemOperand());
      RVal = DAG.getNode(ISD::BITCAST, dl, VT, Res);
      RChain = Res.getValue(1);
      break;
    }
    }
    if (RChain.getNode() != Node) {
      assert(RVal.getNode() != Node && "Load must be completely replaced");
      DAG.ReplaceAllUsesOfValueWith(SDValue(Node, 0), RVal);
      DAG.ReplaceAllUsesOfValueWith(SDValue(Node, 1), RChain);
      if (UpdatedNodes) {
        UpdatedNodes->insert(RVal.getNode());
        UpdatedNodes->insert(RChain.getNode());
      }
      ReplacedNode(Node);
    }
    return;
  }

  EVT SrcVT = LD->getMemoryVT();
  unsigned SrcWidth = SrcVT.getSizeInBits();
  unsigned Alignment = LD->getAlignment();
  bool isVolatile = LD->isVolatile();
  bool isNonTemporal = LD->isNonTemporal();
  bool isInvariant = LD->isInvariant();
  AAMDNodes AAInfo = LD->getAAInfo();

  if (SrcWidth != SrcVT.getStoreSizeInBits() &&
      // Some targets pretend to have an i1 loading operation, and actually
      // load an i8.  This trick is correct for ZEXTLOAD because the top 7
      // bits are guaranteed to be zero; it helps the optimizers understand
      // that these bits are zero.  It is also useful for EXTLOAD, since it
      // tells the optimizers that those bits are undefined.  It would be
      // nice to have an effective generic way of getting these benefits...
      // Until such a way is found, don't insist on promoting i1 here.
      (SrcVT != MVT::i1 ||
       TLI.getLoadExtAction(ExtType, Node->getValueType(0), MVT::i1) ==
         TargetLowering::Promote)) {
    // Promote to a byte-sized load if not loading an integral number of
    // bytes.  For example, promote EXTLOAD:i20 -> EXTLOAD:i24.
    unsigned NewWidth = SrcVT.getStoreSizeInBits();
    EVT NVT = EVT::getIntegerVT(*DAG.getContext(), NewWidth);
    SDValue Ch;

    // The extra bits are guaranteed to be zero, since we stored them that
    // way.  A zext load from NVT thus automatically gives zext from SrcVT.

    ISD::LoadExtType NewExtType =
      ExtType == ISD::ZEXTLOAD ? ISD::ZEXTLOAD : ISD::EXTLOAD;

    SDValue Result =
      DAG.getExtLoad(NewExtType, dl, Node->getValueType(0),
                     Chain, Ptr, LD->getPointerInfo(),
                     NVT, isVolatile, isNonTemporal, isInvariant, Alignment,
                     AAInfo);

    Ch = Result.getValue(1); // The chain.

    if (ExtType == ISD::SEXTLOAD)
      // Having the top bits zero doesn't help when sign extending.
      Result = DAG.getNode(ISD::SIGN_EXTEND_INREG, dl,
                           Result.getValueType(),
                           Result, DAG.getValueType(SrcVT));
    else if (ExtType == ISD::ZEXTLOAD || NVT == Result.getValueType())
      // All the top bits are guaranteed to be zero - inform the optimizers.
      Result = DAG.getNode(ISD::AssertZext, dl,
                           Result.getValueType(), Result,
                           DAG.getValueType(SrcVT));

    Value = Result;
    Chain = Ch;
  } else if (SrcWidth & (SrcWidth - 1)) {
    // If not loading a power-of-2 number of bits, expand as two loads.
    assert(!SrcVT.isVector() && "Unsupported extload!");
    unsigned RoundWidth = 1 << Log2_32(SrcWidth);
    assert(RoundWidth < SrcWidth);
    unsigned ExtraWidth = SrcWidth - RoundWidth;
    assert(ExtraWidth < RoundWidth);
    assert(!(RoundWidth % 8) && !(ExtraWidth % 8) &&
           "Load size not an integral number of bytes!");
    EVT RoundVT = EVT::getIntegerVT(*DAG.getContext(), RoundWidth);
    EVT ExtraVT = EVT::getIntegerVT(*DAG.getContext(), ExtraWidth);
    SDValue Lo, Hi, Ch;
    unsigned IncrementSize;
    auto &DL = DAG.getDataLayout();

    if (DL.isLittleEndian()) {
      // EXTLOAD:i24 -> ZEXTLOAD:i16 | (shl EXTLOAD@+2:i8, 16)
      // Load the bottom RoundWidth bits.
      Lo = DAG.getExtLoad(ISD::ZEXTLOAD, dl, Node->getValueType(0), Chain, Ptr,
                          LD->getPointerInfo(), RoundVT, isVolatile,
                          isNonTemporal, isInvariant, Alignment, AAInfo);

      // Load the remaining ExtraWidth bits.
      IncrementSize = RoundWidth / 8;
      Ptr = DAG.getNode(ISD::ADD, dl, Ptr.getValueType(), Ptr,
                        DAG.getConstant(IncrementSize, dl,
                                        Ptr.getValueType()));
      Hi = DAG.getExtLoad(ExtType, dl, Node->getValueType(0), Chain, Ptr,
                          LD->getPointerInfo().getWithOffset(IncrementSize),
                          ExtraVT, isVolatile, isNonTemporal, isInvariant,
                          MinAlign(Alignment, IncrementSize), AAInfo);

      // Build a factor node to remember that this load is independent of
      // the other one.
      Ch = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1),
                       Hi.getValue(1));

      // Move the top bits to the right place.
      Hi = DAG.getNode(
          ISD::SHL, dl, Hi.getValueType(), Hi,
          DAG.getConstant(RoundWidth, dl,
                          TLI.getShiftAmountTy(Hi.getValueType(), DL)));

      // Join the hi and lo parts.
      Value = DAG.getNode(ISD::OR, dl, Node->getValueType(0), Lo, Hi);
    } else {
      // Big endian - avoid unaligned loads.
      // EXTLOAD:i24 -> (shl EXTLOAD:i16, 8) | ZEXTLOAD@+2:i8
      // Load the top RoundWidth bits.
      Hi = DAG.getExtLoad(ExtType, dl, Node->getValueType(0), Chain, Ptr,
                          LD->getPointerInfo(), RoundVT, isVolatile,
                          isNonTemporal, isInvariant, Alignment, AAInfo);

      // Load the remaining ExtraWidth bits.
      IncrementSize = RoundWidth / 8;
      Ptr = DAG.getNode(ISD::ADD, dl, Ptr.getValueType(), Ptr,
                        DAG.getConstant(IncrementSize, dl,
                                        Ptr.getValueType()));
      Lo = DAG.getExtLoad(ISD::ZEXTLOAD, dl, Node->getValueType(0), Chain, Ptr,
                          LD->getPointerInfo().getWithOffset(IncrementSize),
                          ExtraVT, isVolatile, isNonTemporal, isInvariant,
                          MinAlign(Alignment, IncrementSize), AAInfo);

      // Build a factor node to remember that this load is independent of
      // the other one.
      Ch = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1),
                       Hi.getValue(1));

      // Move the top bits to the right place.
      Hi = DAG.getNode(
          ISD::SHL, dl, Hi.getValueType(), Hi,
          DAG.getConstant(ExtraWidth, dl,
                          TLI.getShiftAmountTy(Hi.getValueType(), DL)));

      // Join the hi and lo parts.
      Value = DAG.getNode(ISD::OR, dl, Node->getValueType(0), Lo, Hi);
    }

    Chain = Ch;
  } else {
    bool isCustom = false;
    switch (TLI.getLoadExtAction(ExtType, Node->getValueType(0),
                                 SrcVT.getSimpleVT())) {
    default: llvm_unreachable("This action is not supported yet!");
    case TargetLowering::Custom:
      isCustom = true;
      // FALLTHROUGH
    case TargetLowering::Legal: {
      Value = SDValue(Node, 0);
      Chain = SDValue(Node, 1);

      if (isCustom) {
        SDValue Res = TLI.LowerOperation(SDValue(Node, 0), DAG);
        if (Res.getNode()) {
          Value = Res;
          Chain = Res.getValue(1);
        }
      } else {
        // If this is an unaligned load and the target doesn't support it,
        // expand it.
        EVT MemVT = LD->getMemoryVT();
        unsigned AS = LD->getAddressSpace();
        unsigned Align = LD->getAlignment();
        const DataLayout &DL = DAG.getDataLayout();
        if (!TLI.allowsMemoryAccess(*DAG.getContext(), DL, MemVT, AS, Align))
          ExpandUnalignedLoad(cast<LoadSDNode>(Node), DAG, TLI, Value, Chain);
      }
      break;
    }
    case TargetLowering::Expand:
      EVT DestVT = Node->getValueType(0);
      if (!TLI.isLoadExtLegal(ISD::EXTLOAD, DestVT, SrcVT)) {
        // If the source type is not legal, see if there is a legal extload to
        // an intermediate type that we can then extend further.
        EVT LoadVT = TLI.getRegisterType(SrcVT.getSimpleVT());
        if (TLI.isTypeLegal(SrcVT) || // Same as SrcVT == LoadVT?
            TLI.isLoadExtLegal(ExtType, LoadVT, SrcVT)) {
          // If we are loading a legal type, this is a non-extload followed by
          // a full extend.
          ISD::LoadExtType MidExtType =
              (LoadVT == SrcVT) ? ISD::NON_EXTLOAD : ExtType;

          SDValue Load = DAG.getExtLoad(MidExtType, dl, LoadVT, Chain, Ptr,
                                        SrcVT, LD->getMemOperand());
          unsigned ExtendOp =
              ISD::getExtForLoadExtType(SrcVT.isFloatingPoint(), ExtType);
          Value = DAG.getNode(ExtendOp, dl, Node->getValueType(0), Load);
          Chain = Load.getValue(1);
          break;
        }

        // Handle the special case of fp16 extloads. EXTLOAD doesn't have the
        // normal undefined upper bits behavior to allow using an in-reg extend
        // with the illegal FP type, so load as an integer and do the
        // from-integer conversion.
        if (SrcVT.getScalarType() == MVT::f16) {
          EVT ISrcVT = SrcVT.changeTypeToInteger();
          EVT IDestVT = DestVT.changeTypeToInteger();
          EVT LoadVT = TLI.getRegisterType(IDestVT.getSimpleVT());

          SDValue Result = DAG.getExtLoad(ISD::ZEXTLOAD, dl, LoadVT,
                                          Chain, Ptr, ISrcVT,
                                          LD->getMemOperand());
          Value = DAG.getNode(ISD::FP16_TO_FP, dl, DestVT, Result);
          Chain = Result.getValue(1);
          break;
        }
      }

      assert(!SrcVT.isVector() &&
             "Vector Loads are handled in LegalizeVectorOps");

      // FIXME: This does not work for vectors on most targets.  Sign-
      // and zero-extend operations are currently folded into extending
      // loads, whether they are legal or not, and then we end up here
      // without any support for legalizing them.
      assert(ExtType != ISD::EXTLOAD &&
             "EXTLOAD should always be supported!");
      // Turn the unsupported load into an EXTLOAD followed by an
      // explicit zero/sign extend inreg.
      SDValue Result = DAG.getExtLoad(ISD::EXTLOAD, dl,
                                      Node->getValueType(0),
                                      Chain, Ptr, SrcVT,
                                      LD->getMemOperand());
      SDValue ValRes;
      if (ExtType == ISD::SEXTLOAD)
        ValRes = DAG.getNode(ISD::SIGN_EXTEND_INREG, dl,
                             Result.getValueType(),
                             Result, DAG.getValueType(SrcVT));
      else
        ValRes = DAG.getZeroExtendInReg(Result, dl, SrcVT.getScalarType());
      Value = ValRes;
      Chain = Result.getValue(1);
      break;
    }
  }

  // Since loads produce two values, make sure to remember that we legalized
  // both of them.
  if (Chain.getNode() != Node) {
    assert(Value.getNode() != Node && "Load must be completely replaced");
    DAG.ReplaceAllUsesOfValueWith(SDValue(Node, 0), Value);
    DAG.ReplaceAllUsesOfValueWith(SDValue(Node, 1), Chain);
    if (UpdatedNodes) {
      UpdatedNodes->insert(Value.getNode());
      UpdatedNodes->insert(Chain.getNode());
    }
    ReplacedNode(Node);
  }
}

/// Return a legal replacement for the given operation, with all legal
/// operands.
void SelectionDAGLegalize::LegalizeOp(SDNode *Node) {
  DEBUG(dbgs() << "\nLegalizing: "; Node->dump(&DAG));

  if (Node->getOpcode() == ISD::TargetConstant) // Allow illegal target nodes.
    return;

#ifndef NDEBUG
  for (unsigned i = 0, e = Node->getNumValues(); i != e; ++i)
    assert((TLI.getTypeAction(*DAG.getContext(), Node->getValueType(i)) ==
              TargetLowering::TypeLegal ||
            TLI.isTypeLegal(Node->getValueType(i))) &&
           "Unexpected illegal type!");

  for (const SDValue &Op : Node->op_values())
    assert((TLI.getTypeAction(*DAG.getContext(), Op.getValueType()) ==
              TargetLowering::TypeLegal ||
            TLI.isTypeLegal(Op.getValueType()) ||
            Op.getOpcode() == ISD::TargetConstant) &&
           "Unexpected illegal type!");
#endif

  // Figure out the correct action; the way to query this varies by opcode
  TargetLowering::LegalizeAction Action = TargetLowering::Legal;
  bool SimpleFinishLegalizing = true;
  switch (Node->getOpcode()) {
  case ISD::INTRINSIC_W_CHAIN:
  case ISD::INTRINSIC_WO_CHAIN:
  case ISD::INTRINSIC_VOID:
  case ISD::STACKSAVE:
    Action = TLI.getOperationAction(Node->getOpcode(), MVT::Other);
    break;
  case ISD::GET_DYNAMIC_AREA_OFFSET:
    Action = TLI.getOperationAction(Node->getOpcode(),
                                    Node->getValueType(0));
    break;
  case ISD::VAARG:
    Action = TLI.getOperationAction(Node->getOpcode(),
                                    Node->getValueType(0));
    if (Action != TargetLowering::Promote)
      Action = TLI.getOperationAction(Node->getOpcode(), MVT::Other);
    break;
  case ISD::FP_TO_FP16:
  case ISD::SINT_TO_FP:
  case ISD::UINT_TO_FP:
  case ISD::EXTRACT_VECTOR_ELT:
    Action = TLI.getOperationAction(Node->getOpcode(),
                                    Node->getOperand(0).getValueType());
    break;
  case ISD::FP_ROUND_INREG:
  case ISD::SIGN_EXTEND_INREG: {
    EVT InnerType = cast<VTSDNode>(Node->getOperand(1))->getVT();
    Action = TLI.getOperationAction(Node->getOpcode(), InnerType);
    break;
  }
  case ISD::ATOMIC_STORE: {
    Action = TLI.getOperationAction(Node->getOpcode(),
                                    Node->getOperand(2).getValueType());
    break;
  }
  case ISD::SELECT_CC:
  case ISD::SETCC:
  case ISD::BR_CC: {
    unsigned CCOperand = Node->getOpcode() == ISD::SELECT_CC ? 4 :
                         Node->getOpcode() == ISD::SETCC ? 2 :
                         Node->getOpcode() == ISD::SETCCE ? 3 : 1;
    unsigned CompareOperand = Node->getOpcode() == ISD::BR_CC ? 2 : 0;
    MVT OpVT = Node->getOperand(CompareOperand).getSimpleValueType();
    ISD::CondCode CCCode =
        cast<CondCodeSDNode>(Node->getOperand(CCOperand))->get();
    Action = TLI.getCondCodeAction(CCCode, OpVT);
    if (Action == TargetLowering::Legal) {
      if (Node->getOpcode() == ISD::SELECT_CC)
        Action = TLI.getOperationAction(Node->getOpcode(),
                                        Node->getValueType(0));
      else
        Action = TLI.getOperationAction(Node->getOpcode(), OpVT);
    }
    break;
  }
  case ISD::LOAD:
  case ISD::STORE:
    // FIXME: Model these properly.  LOAD and STORE are complicated, and
    // STORE expects the unlegalized operand in some cases.
    SimpleFinishLegalizing = false;
    break;
  case ISD::CALLSEQ_START:
  case ISD::CALLSEQ_END:
    // FIXME: This shouldn't be necessary.  These nodes have special properties
    // dealing with the recursive nature of legalization.  Removing this
    // special case should be done as part of making LegalizeDAG non-recursive.
    SimpleFinishLegalizing = false;
    break;
  case ISD::EXTRACT_ELEMENT:
  case ISD::FLT_ROUNDS_:
  case ISD::FPOWI:
  case ISD::MERGE_VALUES:
  case ISD::EH_RETURN:
  case ISD::FRAME_TO_ARGS_OFFSET:
  case ISD::EH_SJLJ_SETJMP:
  case ISD::EH_SJLJ_LONGJMP:
  case ISD::EH_SJLJ_SETUP_DISPATCH:
    // These operations lie about being legal: when they claim to be legal,
    // they should actually be expanded.
    Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));
    if (Action == TargetLowering::Legal)
      Action = TargetLowering::Expand;
    break;
  case ISD::INIT_TRAMPOLINE:
  case ISD::ADJUST_TRAMPOLINE:
  case ISD::FRAMEADDR:
  case ISD::RETURNADDR:
    // These operations lie about being legal: when they claim to be legal,
    // they should actually be custom-lowered.
    Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));
    if (Action == TargetLowering::Legal)
      Action = TargetLowering::Custom;
    break;
  case ISD::READCYCLECOUNTER:
    // READCYCLECOUNTER returns an i64, even if type legalization might have
    // expanded that to several smaller types.
    Action = TLI.getOperationAction(Node->getOpcode(), MVT::i64);
    break;
  case ISD::READ_REGISTER:
  case ISD::WRITE_REGISTER:
    // Named register is legal in the DAG, but blocked by register name
    // selection if not implemented by target (to choose the correct register).
    // They'll be converted to Copy(To/From)Reg.
    Action = TargetLowering::Legal;
    break;
  case ISD::DEBUGTRAP:
    Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));
    if (Action == TargetLowering::Expand) {
      // replace ISD::DEBUGTRAP with ISD::TRAP
      SDValue NewVal;
      NewVal = DAG.getNode(ISD::TRAP, SDLoc(Node), Node->getVTList(),
                           Node->getOperand(0));
      ReplaceNode(Node, NewVal.getNode());
      LegalizeOp(NewVal.getNode());
      return;
    }
    break;

  default:
    if (Node->getOpcode() >= ISD::BUILTIN_OP_END) {
      Action = TargetLowering::Legal;
    } else {
      Action = TLI.getOperationAction(Node->getOpcode(),
                                      Node->getValueType(0));
    }
    break;
  }

  if (SimpleFinishLegalizing) {
    SDNode *NewNode = Node;
    switch (Node->getOpcode()) {
    default: break;
    case ISD::SHL:
    case ISD::SRL:
    case ISD::SRA:
    case ISD::ROTL:
    case ISD::ROTR:
      // Legalizing shifts/rotates requires adjusting the shift amount
      // to the appropriate width.
      if (!Node->getOperand(1).getValueType().isVector()) {
        SDValue SAO =
          DAG.getShiftAmountOperand(Node->getOperand(0).getValueType(),
                                    Node->getOperand(1));
        HandleSDNode Handle(SAO);
        LegalizeOp(SAO.getNode());
        NewNode = DAG.UpdateNodeOperands(Node, Node->getOperand(0),
                                         Handle.getValue());
      }
      break;
    case ISD::SRL_PARTS:
    case ISD::SRA_PARTS:
    case ISD::SHL_PARTS:
      // Legalizing shifts/rotates requires adjusting the shift amount
      // to the appropriate width.
      if (!Node->getOperand(2).getValueType().isVector()) {
        SDValue SAO =
          DAG.getShiftAmountOperand(Node->getOperand(0).getValueType(),
                                    Node->getOperand(2));
        HandleSDNode Handle(SAO);
        LegalizeOp(SAO.getNode());
        NewNode = DAG.UpdateNodeOperands(Node, Node->getOperand(0),
                                         Node->getOperand(1),
                                         Handle.getValue());
      }
      break;
    }

    if (NewNode != Node) {
      ReplaceNode(Node, NewNode);
      Node = NewNode;
    }
    switch (Action) {
    case TargetLowering::Legal:
      return;
    case TargetLowering::Custom: {
      // FIXME: The handling for custom lowering with multiple results is
      // a complete mess.
      SDValue Res = TLI.LowerOperation(SDValue(Node, 0), DAG);
      if (Res.getNode()) {
        if (!(Res.getNode() != Node || Res.getResNo() != 0))
          return;

        if (Node->getNumValues() == 1) {
          // We can just directly replace this node with the lowered value.
          ReplaceNode(SDValue(Node, 0), Res);
          return;
        }

        SmallVector<SDValue, 8> ResultVals;
        for (unsigned i = 0, e = Node->getNumValues(); i != e; ++i)
          ResultVals.push_back(Res.getValue(i));
        ReplaceNode(Node, ResultVals.data());
        return;
      }
    }
      // FALL THROUGH
    case TargetLowering::Expand:
      if (ExpandNode(Node))
        return;
      // FALL THROUGH
    case TargetLowering::LibCall:
      ConvertNodeToLibcall(Node);
      return;
    case TargetLowering::Promote:
      PromoteNode(Node);
      return;
    }
  }

  switch (Node->getOpcode()) {
  default:
#ifndef NDEBUG
    dbgs() << "NODE: ";
    Node->dump(&DAG);
    dbgs() << "\n";
#endif
    llvm_unreachable("Do not know how to legalize this operator!");

  case ISD::CALLSEQ_START:
  case ISD::CALLSEQ_END:
    break;
  case ISD::LOAD: {
    return LegalizeLoadOps(Node);
  }
  case ISD::STORE: {
    return LegalizeStoreOps(Node);
  }
  }
}

SDValue SelectionDAGLegalize::ExpandExtractFromVectorThroughStack(SDValue Op) {
  SDValue Vec = Op.getOperand(0);
  SDValue Idx = Op.getOperand(1);
  SDLoc dl(Op);

  // Before we generate a new store to a temporary stack slot, see if there is
  // already one that we can use. There often is because when we scalarize
  // vector operations (using SelectionDAG::UnrollVectorOp for example) a whole
  // series of EXTRACT_VECTOR_ELT nodes are generated, one for each element in
  // the vector. If all are expanded here, we don't want one store per vector
  // element.

  // Caches for hasPredecessorHelper
  SmallPtrSet<const SDNode *, 32> Visited;
  SmallVector<const SDNode *, 16> Worklist;

  SDValue StackPtr, Ch;
  for (SDNode::use_iterator UI = Vec.getNode()->use_begin(),
       UE = Vec.getNode()->use_end(); UI != UE; ++UI) {
    SDNode *User = *UI;
    if (StoreSDNode *ST = dyn_cast<StoreSDNode>(User)) {
      if (ST->isIndexed() || ST->isTruncatingStore() ||
          ST->getValue() != Vec)
        continue;

      // Make sure that nothing else could have stored into the destination of
      // this store.
      if (!ST->getChain().reachesChainWithoutSideEffects(DAG.getEntryNode()))
        continue;

      // If the index is dependent on the store we will introduce a cycle when
      // creating the load (the load uses the index, and by replacing the chain
      // we will make the index dependent on the load).
      if (Idx.getNode()->hasPredecessorHelper(ST, Visited, Worklist))
        continue;

      StackPtr = ST->getBasePtr();
      Ch = SDValue(ST, 0);
      break;
    }
  }

  if (!Ch.getNode()) {
    // Store the value to a temporary stack slot, then LOAD the returned part.
    StackPtr = DAG.CreateStackTemporary(Vec.getValueType());
    Ch = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr,
                      MachinePointerInfo(), false, false, 0);
  }

  // Add the offset to the index.
  unsigned EltSize =
      Vec.getValueType().getVectorElementType().getSizeInBits()/8;
  Idx = DAG.getNode(ISD::MUL, dl, Idx.getValueType(), Idx,
                    DAG.getConstant(EltSize, SDLoc(Vec), Idx.getValueType()));

  Idx = DAG.getZExtOrTrunc(Idx, dl, TLI.getPointerTy(DAG.getDataLayout()));
  StackPtr = DAG.getNode(ISD::ADD, dl, Idx.getValueType(), Idx, StackPtr);

  SDValue NewLoad;

  if (Op.getValueType().isVector())
    NewLoad = DAG.getLoad(Op.getValueType(), dl, Ch, StackPtr,
                          MachinePointerInfo(), false, false, false, 0);
  else
    NewLoad = DAG.getExtLoad(
        ISD::EXTLOAD, dl, Op.getValueType(), Ch, StackPtr,
        MachinePointerInfo(),
        Vec.getValueType().getVectorElementType(), false, false, false, 0);

  // Replace the chain going out of the store, by the one out of the load.
  DAG.ReplaceAllUsesOfValueWith(Ch, SDValue(NewLoad.getNode(), 1));

  // We introduced a cycle though, so update the load's operands, making sure
  // to use the original store's chain as an incoming chain.
  SmallVector<SDValue, 6> NewLoadOperands(NewLoad->op_begin(),
                                          NewLoad->op_end());
  NewLoadOperands[0] = Ch;
  NewLoad =
      SDValue(DAG.UpdateNodeOperands(NewLoad.getNode(), NewLoadOperands), 0);
  return NewLoad;
}

SDValue SelectionDAGLegalize::ExpandInsertToVectorThroughStack(SDValue Op) {
  assert(Op.getValueType().isVector() && "Non-vector insert subvector!");

  SDValue Vec  = Op.getOperand(0);
  SDValue Part = Op.getOperand(1);
  SDValue Idx  = Op.getOperand(2);
  SDLoc dl(Op);

  // Store the value to a temporary stack slot, then LOAD the returned part.

  SDValue StackPtr = DAG.CreateStackTemporary(Vec.getValueType());
  int FI = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
  MachinePointerInfo PtrInfo =
      MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), FI);

  // First store the whole vector.
  SDValue Ch = DAG.getStore(DAG.getEntryNode(), dl, Vec, StackPtr, PtrInfo,
                            false, false, 0);

  // Then store the inserted part.

  // Add the offset to the index.
  unsigned EltSize =
      Vec.getValueType().getVectorElementType().getSizeInBits()/8;

  Idx = DAG.getNode(ISD::MUL, dl, Idx.getValueType(), Idx,
                    DAG.getConstant(EltSize, SDLoc(Vec), Idx.getValueType()));
  Idx = DAG.getZExtOrTrunc(Idx, dl, TLI.getPointerTy(DAG.getDataLayout()));

  SDValue SubStackPtr = DAG.getNode(ISD::ADD, dl, Idx.getValueType(), Idx,
                                    StackPtr);

  // Store the subvector.
  Ch = DAG.getStore(Ch, dl, Part, SubStackPtr,
                    MachinePointerInfo(), false, false, 0);

  // Finally, load the updated vector.
  return DAG.getLoad(Op.getValueType(), dl, Ch, StackPtr, PtrInfo, false,
                     false, false, 0);
}

SDValue SelectionDAGLegalize::ExpandVectorBuildThroughStack(SDNode* Node) {
  // We can't handle this case efficiently.  Allocate a sufficiently
  // aligned object on the stack, store each element into it, then load
  // the result as a vector.
  // Create the stack frame object.
  EVT VT = Node->getValueType(0);
  EVT EltVT = VT.getVectorElementType();
  SDLoc dl(Node);
  SDValue FIPtr = DAG.CreateStackTemporary(VT);
  int FI = cast<FrameIndexSDNode>(FIPtr.getNode())->getIndex();
  MachinePointerInfo PtrInfo =
      MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), FI);

  // Emit a store of each element to the stack slot.
  SmallVector<SDValue, 8> Stores;
  unsigned TypeByteSize = EltVT.getSizeInBits() / 8;
  // Store (in the right endianness) the elements to memory.
  for (unsigned i = 0, e = Node->getNumOperands(); i != e; ++i) {
    // Ignore undef elements.
    if (Node->getOperand(i).getOpcode() == ISD::UNDEF) continue;

    unsigned Offset = TypeByteSize*i;

    SDValue Idx = DAG.getConstant(Offset, dl, FIPtr.getValueType());
    Idx = DAG.getNode(ISD::ADD, dl, FIPtr.getValueType(), FIPtr, Idx);

    // If the destination vector element type is narrower than the source
    // element type, only store the bits necessary.
    if (EltVT.bitsLT(Node->getOperand(i).getValueType().getScalarType())) {
      Stores.push_back(DAG.getTruncStore(DAG.getEntryNode(), dl,
                                         Node->getOperand(i), Idx,
                                         PtrInfo.getWithOffset(Offset),
                                         EltVT, false, false, 0));
    } else
      Stores.push_back(DAG.getStore(DAG.getEntryNode(), dl,
                                    Node->getOperand(i), Idx,
                                    PtrInfo.getWithOffset(Offset),
                                    false, false, 0));
  }

  SDValue StoreChain;
  if (!Stores.empty())    // Not all undef elements?
    StoreChain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Stores);
  else
    StoreChain = DAG.getEntryNode();

  // Result is a load from the stack slot.
  return DAG.getLoad(VT, dl, StoreChain, FIPtr, PtrInfo,
                     false, false, false, 0);
}

namespace {
/// Keeps track of state when getting the sign of a floating-point value as an
/// integer.
struct FloatSignAsInt {
  EVT FloatVT;
  SDValue Chain;
  SDValue FloatPtr;
  SDValue IntPtr;
  MachinePointerInfo IntPointerInfo;
  MachinePointerInfo FloatPointerInfo;
  SDValue IntValue;
  APInt SignMask;
+  uint8_t SignBit;
};
}

/// Bitcast a floating-point value to an integer value. Only bitcast the part
/// containing the sign bit if the target has no integer value capable of
/// holding all bits of the floating-point value.
void SelectionDAGLegalize::getSignAsIntValue(FloatSignAsInt &State,
                                             SDLoc DL, SDValue Value) const {
  EVT FloatVT = Value.getValueType();
  unsigned NumBits = FloatVT.getSizeInBits();
  State.FloatVT = FloatVT;
  EVT IVT = EVT::getIntegerVT(*DAG.getContext(), NumBits);
  // Convert to an integer of the same size.
  if (TLI.isTypeLegal(IVT)) {
    State.IntValue = DAG.getNode(ISD::BITCAST, DL, IVT, Value);
    State.SignMask = APInt::getSignBit(NumBits);
+    State.SignBit = NumBits - 1;
    return;
  }

  auto &DataLayout = DAG.getDataLayout();
  // Store the float to memory, then load the sign part out as an integer.
  MVT LoadTy = TLI.getRegisterType(*DAG.getContext(), MVT::i8);
  // First create a temporary that is aligned for both the load and store.
  SDValue StackPtr = DAG.CreateStackTemporary(FloatVT, LoadTy);
  int FI = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
  // Then store the float to it.
  State.FloatPtr = StackPtr;
  MachineFunction &MF = DAG.getMachineFunction();
  State.FloatPointerInfo = MachinePointerInfo::getFixedStack(MF, FI);
  State.Chain = DAG.getStore(DAG.getEntryNode(), DL, Value, State.FloatPtr,
                             State.FloatPointerInfo, false, false, 0);

  SDValue IntPtr;
  if (DataLayout.isBigEndian()) {
    assert(FloatVT.isByteSized() && "Unsupported floating point type!");
    // Load out a legal integer with the same sign bit as the float.
    IntPtr = StackPtr;
    State.IntPointerInfo = State.FloatPointerInfo;
  } else {
    // Advance the pointer so that the loaded byte will contain the sign bit.
    unsigned ByteOffset = (FloatVT.getSizeInBits() / 8) - 1;
    IntPtr = DAG.getNode(ISD::ADD, DL, StackPtr.getValueType(), StackPtr,
                      DAG.getConstant(ByteOffset, DL, StackPtr.getValueType()));
    State.IntPointerInfo = MachinePointerInfo::getFixedStack(MF, FI,
                                                             ByteOffset);
  }

  State.IntPtr = IntPtr;
  State.IntValue = DAG.getExtLoad(ISD::EXTLOAD, DL, LoadTy, State.Chain,
                                  IntPtr, State.IntPointerInfo, MVT::i8,
                                  false, false, false, 0);
  State.SignMask = APInt::getOneBitSet(LoadTy.getSizeInBits(), 7);
+  State.SignBit = 7;
}

/// Replace the integer value produced by getSignAsIntValue() with a new value
/// and cast the result back to a floating-point type.
SDValue SelectionDAGLegalize::modifySignAsInt(const FloatSignAsInt &State,
                                              SDLoc DL,
                                              SDValue NewIntValue) const {
  if (!State.Chain)
    return DAG.getNode(ISD::BITCAST, DL, State.FloatVT, NewIntValue);

  // Override the part containing the sign bit in the value stored on the
  // stack.
  SDValue Chain = DAG.getTruncStore(State.Chain, DL, NewIntValue, State.IntPtr,
                                    State.IntPointerInfo, MVT::i8, false,
                                    false, 0);
  return DAG.getLoad(State.FloatVT, DL, Chain, State.FloatPtr,
                     State.FloatPointerInfo, false, false, false, 0);
}

SDValue SelectionDAGLegalize::ExpandFCOPYSIGN(SDNode *Node) const {
  SDLoc DL(Node);
  SDValue Mag = Node->getOperand(0);
  SDValue Sign = Node->getOperand(1);

  // Get sign bit into an integer value.
  FloatSignAsInt SignAsInt;
  getSignAsIntValue(SignAsInt, DL, Sign);

  EVT IntVT = SignAsInt.IntValue.getValueType();
  SDValue SignMask = DAG.getConstant(SignAsInt.SignMask, DL, IntVT);
  SDValue SignBit = DAG.getNode(ISD::AND, DL, IntVT, SignAsInt.IntValue,
                                SignMask);

  // If FABS is legal transform FCOPYSIGN(x, y) => sign(x) ?
  //                                              -FABS(x) : FABS(x)
  EVT FloatVT = Mag.getValueType();
  if (TLI.isOperationLegalOrCustom(ISD::FABS, FloatVT) &&
      TLI.isOperationLegalOrCustom(ISD::FNEG, FloatVT)) {
    SDValue AbsValue = DAG.getNode(ISD::FABS, DL, FloatVT, Mag);
    SDValue NegValue = DAG.getNode(ISD::FNEG, DL, FloatVT, AbsValue);
    SDValue Cond = DAG.getSetCC(DL, getSetCCResultType(IntVT), SignBit,
                                DAG.getConstant(0, DL, IntVT), ISD::SETNE);
    return DAG.getSelect(DL, FloatVT, Cond, NegValue, AbsValue);
  }

-  // Transform values to integer, copy the sign bit and transform back.
+  // Transform Mag value to integer, and clear the sign bit.
  FloatSignAsInt MagAsInt;
  getSignAsIntValue(MagAsInt, DL, Mag);
-  assert(SignAsInt.SignMask == MagAsInt.SignMask);
-  SDValue ClearSignMask = DAG.getConstant(~SignAsInt.SignMask, DL, IntVT);
-  SDValue ClearedSign = DAG.getNode(ISD::AND, DL, IntVT, MagAsInt.IntValue,
+  EVT MagVT = MagAsInt.IntValue.getValueType();
+  SDValue ClearSignMask = DAG.getConstant(~MagAsInt.SignMask, DL, MagVT);
+  SDValue ClearedSign = DAG.getNode(ISD::AND, DL, MagVT, MagAsInt.IntValue,
                                    ClearSignMask);
-  SDValue CopiedSign = DAG.getNode(ISD::OR, DL, IntVT, ClearedSign, SignBit);

+  // Get the signbit at the right position for MagAsInt.
+  int ShiftAmount = SignAsInt.SignBit - MagAsInt.SignBit;
+  if (SignBit.getValueSizeInBits() > ClearedSign.getValueSizeInBits()) {
+    if (ShiftAmount > 0) {
+      SDValue ShiftCnst = DAG.getConstant(ShiftAmount, DL, IntVT);
+      SignBit = DAG.getNode(ISD::SRL, DL, IntVT, SignBit, ShiftCnst);
+    } else if (ShiftAmount < 0) {
+      SDValue ShiftCnst = DAG.getConstant(-ShiftAmount, DL, IntVT);
+      SignBit = DAG.getNode(ISD::SHL, DL, IntVT, SignBit, ShiftCnst);
+    }
+    SignBit = DAG.getNode(ISD::TRUNCATE, DL, MagVT, SignBit);
+  } else if (SignBit.getValueSizeInBits() < ClearedSign.getValueSizeInBits()) {
+    SignBit = DAG.getNode(ISD::ZERO_EXTEND, DL, MagVT, SignBit);
+    if (ShiftAmount > 0) {
+      SDValue ShiftCnst = DAG.getConstant(ShiftAmount, DL, MagVT);
+      SignBit = DAG.getNode(ISD::SRL, DL, MagVT, SignBit, ShiftCnst);
+    } else if (ShiftAmount < 0) {
+      SDValue ShiftCnst = DAG.getConstant(-ShiftAmount, DL, MagVT);
+      SignBit = DAG.getNode(ISD::SHL, DL, MagVT, SignBit, ShiftCnst);
+    }
+  }
+
+  // Store the part with the modified sign and convert back to float.
+  SDValue CopiedSign = DAG.getNode(ISD::OR, DL, MagVT, ClearedSign, SignBit);
  return modifySignAsInt(MagAsInt, DL, CopiedSign);
}

SDValue SelectionDAGLegalize::ExpandFABS(SDNode *Node) const {
  SDLoc DL(Node);
  SDValue Value = Node->getOperand(0);

  // Transform FABS(x) => FCOPYSIGN(x, 0.0) if FCOPYSIGN is legal.
  EVT FloatVT = Value.getValueType();
  if (TLI.isOperationLegalOrCustom(ISD::FCOPYSIGN, FloatVT)) {
    SDValue Zero = DAG.getConstantFP(0.0, DL, FloatVT);
    return DAG.getNode(ISD::FCOPYSIGN, DL, FloatVT, Value, Zero);
  }

  // Transform value to integer, clear the sign bit and transform back.
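  // For f64 with a legal i64, for instance, SignMask is 1 << 63, so the AND
  // below computes IntValue & 0x7FFFFFFFFFFFFFFF, the usual bit trick for
  // fabs; on targets without a wide enough integer type, only the byte
  // holding the sign bit is masked instead.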
  FloatSignAsInt ValueAsInt;
  getSignAsIntValue(ValueAsInt, DL, Value);
  EVT IntVT = ValueAsInt.IntValue.getValueType();
  SDValue ClearSignMask = DAG.getConstant(~ValueAsInt.SignMask, DL, IntVT);
  SDValue ClearedSign = DAG.getNode(ISD::AND, DL, IntVT, ValueAsInt.IntValue,
                                    ClearSignMask);
  return modifySignAsInt(ValueAsInt, DL, ClearedSign);
}

void SelectionDAGLegalize::ExpandDYNAMIC_STACKALLOC(SDNode* Node,
                                           SmallVectorImpl<SDValue> &Results) {
  unsigned SPReg = TLI.getStackPointerRegisterToSaveRestore();
  assert(SPReg && "Target cannot require DYNAMIC_STACKALLOC expansion and"
          " not tell us which reg is the stack pointer!");
  SDLoc dl(Node);
  EVT VT = Node->getValueType(0);
  SDValue Tmp1 = SDValue(Node, 0);
  SDValue Tmp2 = SDValue(Node, 1);
  SDValue Tmp3 = Node->getOperand(2);
  SDValue Chain = Tmp1.getOperand(0);

  // Chain the dynamic stack allocation so that it doesn't modify the stack
  // pointer when other instructions are using the stack.
  Chain = DAG.getCALLSEQ_START(Chain, DAG.getIntPtrConstant(0, dl, true), dl);

  SDValue Size  = Tmp2.getOperand(1);
  SDValue SP = DAG.getCopyFromReg(Chain, dl, SPReg, VT);
  Chain = SP.getValue(1);
  unsigned Align = cast<ConstantSDNode>(Tmp3)->getZExtValue();
  unsigned StackAlign =
      DAG.getSubtarget().getFrameLowering()->getStackAlignment();
  Tmp1 = DAG.getNode(ISD::SUB, dl, VT, SP, Size);       // Value
  if (Align > StackAlign)
    Tmp1 = DAG.getNode(ISD::AND, dl, VT, Tmp1,
                       DAG.getConstant(-(uint64_t)Align, dl, VT));
  Chain = DAG.getCopyToReg(Chain, dl, SPReg, Tmp1);     // Output chain

  Tmp2 = DAG.getCALLSEQ_END(Chain, DAG.getIntPtrConstant(0, dl, true),
                            DAG.getIntPtrConstant(0, dl, true), SDValue(), dl);

  Results.push_back(Tmp1);
  Results.push_back(Tmp2);
}

/// Legalize a SETCC with given LHS and RHS and condition code CC on the
/// current target.
///
/// If the SETCC has been legalized using AND / OR, then the legalized node
/// will be stored in LHS. RHS and CC will be set to SDValue(). NeedInvert
/// will be set to false.
///
/// If the SETCC has been legalized by using getSetCCSwappedOperands(),
/// then the values of LHS and RHS will be swapped, CC will be set to the
/// new condition, and NeedInvert will be set to false.
///
/// If the SETCC has been legalized using the inverse condcode, then LHS and
/// RHS will be unchanged, CC will be set to the inverted condcode, and
/// NeedInvert will be set to true. The caller must invert the result of the
/// SETCC with SelectionDAG::getLogicalNOT() or take equivalent action to
/// swap the effect of a true/false result.
///
/// \returns true if the SetCC has been legalized, false if it hasn't.
bool SelectionDAGLegalize::LegalizeSetCCCondCode(EVT VT,
                                                 SDValue &LHS, SDValue &RHS,
                                                 SDValue &CC,
                                                 bool &NeedInvert,
                                                 SDLoc dl) {
  MVT OpVT = LHS.getSimpleValueType();
  ISD::CondCode CCCode = cast<CondCodeSDNode>(CC)->get();
  NeedInvert = false;
  switch (TLI.getCondCodeAction(CCCode, OpVT)) {
  default: llvm_unreachable("Unknown condition code action!");
  case TargetLowering::Legal:
    // Nothing to do.
    break;
  case TargetLowering::Expand: {
    ISD::CondCode InvCC = ISD::getSetCCSwappedOperands(CCCode);
    if (TLI.isCondCodeLegal(InvCC, OpVT)) {
      std::swap(LHS, RHS);
      CC = DAG.getCondCode(InvCC);
      return true;
    }
    ISD::CondCode CC1 = ISD::SETCC_INVALID, CC2 = ISD::SETCC_INVALID;
    unsigned Opc = 0;
    switch (CCCode) {
    default: llvm_unreachable("Don't know how to expand this condition!");
    case ISD::SETO:
        assert(TLI.getCondCodeAction(ISD::SETOEQ, OpVT)
            == TargetLowering::Legal
            && "If SETO is expanded, SETOEQ must be legal!");
        CC1 = ISD::SETOEQ; CC2 = ISD::SETOEQ; Opc = ISD::AND; break;
    case ISD::SETUO:
        assert(TLI.getCondCodeAction(ISD::SETUNE, OpVT)
            == TargetLowering::Legal
            && "If SETUO is expanded, SETUNE must be legal!");
        CC1 = ISD::SETUNE; CC2 = ISD::SETUNE; Opc = ISD::OR; break;
    case ISD::SETOEQ:
    case ISD::SETOGT:
    case ISD::SETOGE:
    case ISD::SETOLT:
    case ISD::SETOLE:
    case ISD::SETONE:
    case ISD::SETUEQ:
    case ISD::SETUNE:
    case ISD::SETUGT:
    case ISD::SETUGE:
    case ISD::SETULT:
    case ISD::SETULE:
        // If we are floating point, assign and break, otherwise fall through.
        if (!OpVT.isInteger()) {
          // We can use the 4th bit to tell if we are the unordered
          // or ordered version of the opcode.
          CC2 = ((unsigned)CCCode & 0x8U) ? ISD::SETUO : ISD::SETO;
          Opc = ((unsigned)CCCode & 0x8U) ? ISD::OR : ISD::AND;
          CC1 = (ISD::CondCode)(((int)CCCode & 0x7) | 0x10);
          break;
        }
        // Fallthrough if we are unsigned integer.
    case ISD::SETLE:
    case ISD::SETGT:
    case ISD::SETGE:
    case ISD::SETLT:
      // We only support using the inverted operation, which is computed above
      // and not a different manner of supporting expanding these cases.
      llvm_unreachable("Don't know how to expand this condition!");
    case ISD::SETNE:
    case ISD::SETEQ:
      // Try inverting the result of the inverse condition.
      InvCC = CCCode == ISD::SETEQ ? ISD::SETNE : ISD::SETEQ;
      if (TLI.isCondCodeLegal(InvCC, OpVT)) {
        CC = DAG.getCondCode(InvCC);
        NeedInvert = true;
        return true;
      }
      // If inverting the condition didn't work then we have no means to
      // expand the condition.
      llvm_unreachable("Don't know how to expand this condition!");
    }

    SDValue SetCC1, SetCC2;
    if (CCCode != ISD::SETO && CCCode != ISD::SETUO) {
      // If we aren't the ordered or unordered operation,
      // then the pattern is (LHS CC1 RHS) Opc (LHS CC2 RHS).
      SetCC1 = DAG.getSetCC(dl, VT, LHS, RHS, CC1);
      SetCC2 = DAG.getSetCC(dl, VT, LHS, RHS, CC2);
    } else {
      // Otherwise, the pattern is (LHS CC1 LHS) Opc (RHS CC2 RHS)
      SetCC1 = DAG.getSetCC(dl, VT, LHS, LHS, CC1);
      SetCC2 = DAG.getSetCC(dl, VT, RHS, RHS, CC2);
    }
    LHS = DAG.getNode(Opc, dl, VT, SetCC1, SetCC2);
    RHS = SDValue();
    CC  = SDValue();
    return true;
  }
  }
  return false;
}

/// Emit a store/load combination to the stack.  This stores
/// SrcOp to a stack slot of type SlotVT, truncating it if needed.  It then
/// does a load from the stack slot to DestVT, extending it if needed.
/// The resultant code need not be legal.
SDValue SelectionDAGLegalize::EmitStackConvert(SDValue SrcOp,
                                               EVT SlotVT,
                                               EVT DestVT,
                                               SDLoc dl) {
  // Create the stack frame object.
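  // For example, converting an f32 value to f64 through the stack is a plain
  // f32 store followed by an f32->f64 extending load; a shrinking conversion
  // instead uses the truncating store emitted below.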
  unsigned SrcAlign = DAG.getDataLayout().getPrefTypeAlignment(
      SrcOp.getValueType().getTypeForEVT(*DAG.getContext()));
  SDValue FIPtr = DAG.CreateStackTemporary(SlotVT, SrcAlign);

  FrameIndexSDNode *StackPtrFI = cast<FrameIndexSDNode>(FIPtr);
  int SPFI = StackPtrFI->getIndex();
  MachinePointerInfo PtrInfo =
      MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), SPFI);

  unsigned SrcSize = SrcOp.getValueType().getSizeInBits();
  unsigned SlotSize = SlotVT.getSizeInBits();
  unsigned DestSize = DestVT.getSizeInBits();
  Type *DestType = DestVT.getTypeForEVT(*DAG.getContext());
  unsigned DestAlign = DAG.getDataLayout().getPrefTypeAlignment(DestType);

  // Emit a store to the stack slot.  Use a truncstore if the input value is
  // larger than SlotVT.
  SDValue Store;
  if (SrcSize > SlotSize)
    Store = DAG.getTruncStore(DAG.getEntryNode(), dl, SrcOp, FIPtr,
                              PtrInfo, SlotVT, false, false, SrcAlign);
  else {
    assert(SrcSize == SlotSize && "Invalid store");
    Store = DAG.getStore(DAG.getEntryNode(), dl, SrcOp, FIPtr,
                         PtrInfo, false, false, SrcAlign);
  }

  // Result is a load from the stack slot.
  if (SlotSize == DestSize)
    return DAG.getLoad(DestVT, dl, Store, FIPtr, PtrInfo,
                       false, false, false, DestAlign);

  assert(SlotSize < DestSize && "Unknown extension!");
  return DAG.getExtLoad(ISD::EXTLOAD, dl, DestVT, Store, FIPtr,
                        PtrInfo, SlotVT, false, false, false, DestAlign);
}

SDValue SelectionDAGLegalize::ExpandSCALAR_TO_VECTOR(SDNode *Node) {
  SDLoc dl(Node);
  // Create a vector sized/aligned stack slot, store the value to element #0,
  // then load the whole vector back out.
  SDValue StackPtr = DAG.CreateStackTemporary(Node->getValueType(0));

  FrameIndexSDNode *StackPtrFI = cast<FrameIndexSDNode>(StackPtr);
  int SPFI = StackPtrFI->getIndex();

  SDValue Ch = DAG.getTruncStore(
      DAG.getEntryNode(), dl, Node->getOperand(0), StackPtr,
      MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), SPFI),
      Node->getValueType(0).getVectorElementType(), false, false, 0);
  return DAG.getLoad(
      Node->getValueType(0), dl, Ch, StackPtr,
      MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), SPFI), false,
      false, false, 0);
}

static bool
ExpandBVWithShuffles(SDNode *Node, SelectionDAG &DAG,
                     const TargetLowering &TLI, SDValue &Res) {
  unsigned NumElems = Node->getNumOperands();
  SDLoc dl(Node);
  EVT VT = Node->getValueType(0);

  // Try to group the scalars into pairs, shuffle the pairs together, then
  // shuffle the pairs of pairs together, etc. until the vector has
  // been built. This will work only if all of the necessary shuffle masks
  // are legal.

  // We do this in two phases; first to check the legality of the shuffles,
  // and next, assuming that all shuffles are legal, to create the new nodes.
  for (int Phase = 0; Phase < 2; ++Phase) {
    SmallVector<std::pair<SDValue, SmallVector<int, 16>>, 16> IntermedVals,
                                                              NewIntermedVals;
    for (unsigned i = 0; i < NumElems; ++i) {
      SDValue V = Node->getOperand(i);
      if (V.getOpcode() == ISD::UNDEF)
        continue;

      SDValue Vec;
      if (Phase)
        Vec = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, VT, V);
      IntermedVals.push_back(std::make_pair(Vec, SmallVector<int, 16>(1, i)));
    }

    while (IntermedVals.size() > 2) {
      NewIntermedVals.clear();
      for (unsigned i = 0, e = (IntermedVals.size() & ~1u); i < e; i += 2) {
        // This vector and the next vector are shuffled together (simply to
        // append the one to the other).
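        // (Editor's note, for illustration: with eight defined scalars the
        // reduction proceeds (v0)(v1)...(v7) -> (v0 v1)(v2 v3)(v4 v5)(v6 v7)
        // -> (v0..v3)(v4..v7), i.e. roughly log2(N) rounds of pairwise
        // shuffles, each round concatenating two partial vectors.)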
        SmallVector<int, 16> ShuffleVec(NumElems, -1);

        SmallVector<int, 16> FinalIndices;
        FinalIndices.reserve(IntermedVals[i].second.size() +
                             IntermedVals[i+1].second.size());

        int k = 0;
        for (unsigned j = 0, f = IntermedVals[i].second.size(); j != f;
             ++j, ++k) {
          ShuffleVec[k] = j;
          FinalIndices.push_back(IntermedVals[i].second[j]);
        }
        for (unsigned j = 0, f = IntermedVals[i+1].second.size(); j != f;
             ++j, ++k) {
          ShuffleVec[k] = NumElems + j;
          FinalIndices.push_back(IntermedVals[i+1].second[j]);
        }

        SDValue Shuffle;
        if (Phase)
          Shuffle = DAG.getVectorShuffle(VT, dl, IntermedVals[i].first,
                                         IntermedVals[i+1].first,
                                         ShuffleVec.data());
        else if (!TLI.isShuffleMaskLegal(ShuffleVec, VT))
          return false;
        NewIntermedVals.push_back(
            std::make_pair(Shuffle, std::move(FinalIndices)));
      }

      // If we had an odd number of defined values, then append the last
      // element to the array of new vectors.
      if ((IntermedVals.size() & 1) != 0)
        NewIntermedVals.push_back(IntermedVals.back());

      IntermedVals.swap(NewIntermedVals);
    }

    assert(IntermedVals.size() <= 2 && IntermedVals.size() > 0 &&
           "Invalid number of intermediate vectors");
    SDValue Vec1 = IntermedVals[0].first;
    SDValue Vec2;
    if (IntermedVals.size() > 1)
      Vec2 = IntermedVals[1].first;
    else if (Phase)
      Vec2 = DAG.getUNDEF(VT);

    SmallVector<int, 16> ShuffleVec(NumElems, -1);
    for (unsigned i = 0, e = IntermedVals[0].second.size(); i != e; ++i)
      ShuffleVec[IntermedVals[0].second[i]] = i;
    for (unsigned i = 0, e = IntermedVals[1].second.size(); i != e; ++i)
      ShuffleVec[IntermedVals[1].second[i]] = NumElems + i;

    if (Phase)
      Res = DAG.getVectorShuffle(VT, dl, Vec1, Vec2, ShuffleVec.data());
    else if (!TLI.isShuffleMaskLegal(ShuffleVec, VT))
      return false;
  }

  return true;
}

/// Expand a BUILD_VECTOR node on targets that don't
/// support the operation, but do support the resultant vector type.
SDValue SelectionDAGLegalize::ExpandBUILD_VECTOR(SDNode *Node) {
  unsigned NumElems = Node->getNumOperands();
  SDValue Value1, Value2;
  SDLoc dl(Node);
  EVT VT = Node->getValueType(0);
  EVT OpVT = Node->getOperand(0).getValueType();
  EVT EltVT = VT.getVectorElementType();

  // If the only non-undef value is the low element, turn this into a
  // SCALAR_TO_VECTOR node.  If this is { X, X, X, X }, determine X.
  bool isOnlyLowElement = true;
  bool MoreThanTwoValues = false;
  bool isConstant = true;
  for (unsigned i = 0; i < NumElems; ++i) {
    SDValue V = Node->getOperand(i);
    if (V.getOpcode() == ISD::UNDEF)
      continue;
    if (i > 0)
      isOnlyLowElement = false;
    if (!isa<ConstantFPSDNode>(V) && !isa<ConstantSDNode>(V))
      isConstant = false;

    if (!Value1.getNode()) {
      Value1 = V;
    } else if (!Value2.getNode()) {
      if (V != Value1)
        Value2 = V;
    } else if (V != Value1 && V != Value2) {
      MoreThanTwoValues = true;
    }
  }

  if (!Value1.getNode())
    return DAG.getUNDEF(VT);

  if (isOnlyLowElement)
    return DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, VT, Node->getOperand(0));

  // If all elements are constants, create a load from the constant pool.
  if (isConstant) {
    SmallVector<Constant*, 16> CV;
    for (unsigned i = 0, e = NumElems; i != e; ++i) {
      if (ConstantFPSDNode *V =
          dyn_cast<ConstantFPSDNode>(Node->getOperand(i))) {
        CV.push_back(const_cast<ConstantFP *>(V->getConstantFPValue()));
      } else if (ConstantSDNode *V =
                 dyn_cast<ConstantSDNode>(Node->getOperand(i))) {
        if (OpVT==EltVT)
          CV.push_back(const_cast<ConstantInt *>(V->getConstantIntValue()));
        else {
          // If OpVT and EltVT don't match, EltVT is not legal and the
          // element values have been promoted/truncated earlier.  Undo this;
          // we don't want a v16i8 to become a v16i32 for example.
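          // (Editor's note: e.g. a v16i8 BUILD_VECTOR whose operands were
          // promoted to i32 still wants i8 pool elements, so each value is
          // re-truncated through getZExtValue() below.)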
          const ConstantInt *CI = V->getConstantIntValue();
          CV.push_back(ConstantInt::get(EltVT.getTypeForEVT(*DAG.getContext()),
                                        CI->getZExtValue()));
        }
      } else {
        assert(Node->getOperand(i).getOpcode() == ISD::UNDEF);
        Type *OpNTy = EltVT.getTypeForEVT(*DAG.getContext());
        CV.push_back(UndefValue::get(OpNTy));
      }
    }
    Constant *CP = ConstantVector::get(CV);
    SDValue CPIdx =
        DAG.getConstantPool(CP, TLI.getPointerTy(DAG.getDataLayout()));
    unsigned Alignment = cast<ConstantPoolSDNode>(CPIdx)->getAlignment();
    return DAG.getLoad(
        VT, dl, DAG.getEntryNode(), CPIdx,
        MachinePointerInfo::getConstantPool(DAG.getMachineFunction()), false,
        false, false, Alignment);
  }

  SmallSet<SDValue, 16> DefinedValues;
  for (unsigned i = 0; i < NumElems; ++i) {
    if (Node->getOperand(i).getOpcode() == ISD::UNDEF)
      continue;
    DefinedValues.insert(Node->getOperand(i));
  }

  if (TLI.shouldExpandBuildVectorWithShuffles(VT, DefinedValues.size())) {
    if (!MoreThanTwoValues) {
      SmallVector<int, 8> ShuffleVec(NumElems, -1);
      for (unsigned i = 0; i < NumElems; ++i) {
        SDValue V = Node->getOperand(i);
        if (V.getOpcode() == ISD::UNDEF)
          continue;
        ShuffleVec[i] = V == Value1 ? 0 : NumElems;
      }
      if (TLI.isShuffleMaskLegal(ShuffleVec, Node->getValueType(0))) {
        // Get the splatted value into the low element of a vector register.
        SDValue Vec1 = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, VT, Value1);
        SDValue Vec2;
        if (Value2.getNode())
          Vec2 = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, VT, Value2);
        else
          Vec2 = DAG.getUNDEF(VT);

        // Return shuffle(LowValVec, undef, <0,0,0,0>)
        return DAG.getVectorShuffle(VT, dl, Vec1, Vec2, ShuffleVec.data());
      }
    } else {
      SDValue Res;
      if (ExpandBVWithShuffles(Node, DAG, TLI, Res))
        return Res;
    }
  }

  // Otherwise, we can't handle this case efficiently.
  return ExpandVectorBuildThroughStack(Node);
}

// Expand a node into a call to a libcall.  If the result value
// does not fit into a register, return the lo part and set the hi part to the
// by-reg argument.  If it does fit into a single register, return the result
// and leave the Hi part unset.
SDValue SelectionDAGLegalize::ExpandLibCall(RTLIB::Libcall LC, SDNode *Node,
                                            bool isSigned) {
  TargetLowering::ArgListTy Args;
  TargetLowering::ArgListEntry Entry;
  for (const SDValue &Op : Node->op_values()) {
    EVT ArgVT = Op.getValueType();
    Type *ArgTy = ArgVT.getTypeForEVT(*DAG.getContext());
    Entry.Node = Op;
    Entry.Ty = ArgTy;
    Entry.isSExt = isSigned;
    Entry.isZExt = !isSigned;
    Args.push_back(Entry);
  }
  SDValue Callee = DAG.getExternalSymbol(TLI.getLibcallName(LC),
                                         TLI.getPointerTy(DAG.getDataLayout()));

  Type *RetTy = Node->getValueType(0).getTypeForEVT(*DAG.getContext());

  // By default, the input chain to this libcall is the entry node of the
  // function. If the libcall is going to be emitted as a tail call then
  // TLI.isUsedByReturnOnly will change it to the right chain if the return
  // node which is being folded has a non-entry input chain.
  SDValue InChain = DAG.getEntryNode();

  // isTailCall may be true since the callee does not reference caller stack
  // frame. Check if it's in the right position.
  SDValue TCChain = InChain;
  bool isTailCall = TLI.isInTailCallPosition(DAG, Node, TCChain);
  if (isTailCall)
    InChain = TCChain;

  TargetLowering::CallLoweringInfo CLI(DAG);
  CLI.setDebugLoc(SDLoc(Node)).setChain(InChain)
    .setCallee(TLI.getLibcallCallingConv(LC), RetTy, Callee, std::move(Args), 0)
    .setTailCall(isTailCall).setSExtResult(isSigned).setZExtResult(!isSigned);

  std::pair<SDValue, SDValue> CallInfo = TLI.LowerCallTo(CLI);

  if (!CallInfo.second.getNode())
    // It's a tailcall, return the chain (which is the DAG root).
    return DAG.getRoot();

  return CallInfo.first;
}

/// Generate a libcall taking the given operands as arguments
/// and returning a result of type RetVT.
SDValue SelectionDAGLegalize::ExpandLibCall(RTLIB::Libcall LC, EVT RetVT,
                                            const SDValue *Ops, unsigned NumOps,
                                            bool isSigned, SDLoc dl) {
  TargetLowering::ArgListTy Args;
  Args.reserve(NumOps);

  TargetLowering::ArgListEntry Entry;
  for (unsigned i = 0; i != NumOps; ++i) {
    Entry.Node = Ops[i];
    Entry.Ty = Entry.Node.getValueType().getTypeForEVT(*DAG.getContext());
    Entry.isSExt = isSigned;
    Entry.isZExt = !isSigned;
    Args.push_back(Entry);
  }
  SDValue Callee = DAG.getExternalSymbol(TLI.getLibcallName(LC),
                                         TLI.getPointerTy(DAG.getDataLayout()));

  Type *RetTy = RetVT.getTypeForEVT(*DAG.getContext());

  TargetLowering::CallLoweringInfo CLI(DAG);
  CLI.setDebugLoc(dl).setChain(DAG.getEntryNode())
    .setCallee(TLI.getLibcallCallingConv(LC), RetTy, Callee, std::move(Args), 0)
    .setSExtResult(isSigned).setZExtResult(!isSigned);

  std::pair<SDValue, SDValue> CallInfo = TLI.LowerCallTo(CLI);

  return CallInfo.first;
}

// Expand a node into a call to a libcall. Similar to
// ExpandLibCall except that the first operand is the in-chain.
std::pair<SDValue, SDValue>
SelectionDAGLegalize::ExpandChainLibCall(RTLIB::Libcall LC,
                                         SDNode *Node,
                                         bool isSigned) {
  SDValue InChain = Node->getOperand(0);

  TargetLowering::ArgListTy Args;
  TargetLowering::ArgListEntry Entry;
  for (unsigned i = 1, e = Node->getNumOperands(); i != e; ++i) {
    EVT ArgVT = Node->getOperand(i).getValueType();
    Type *ArgTy = ArgVT.getTypeForEVT(*DAG.getContext());
    Entry.Node = Node->getOperand(i);
    Entry.Ty = ArgTy;
    Entry.isSExt = isSigned;
    Entry.isZExt = !isSigned;
    Args.push_back(Entry);
  }
  SDValue Callee = DAG.getExternalSymbol(TLI.getLibcallName(LC),
                                         TLI.getPointerTy(DAG.getDataLayout()));

  Type *RetTy = Node->getValueType(0).getTypeForEVT(*DAG.getContext());

  TargetLowering::CallLoweringInfo CLI(DAG);
  CLI.setDebugLoc(SDLoc(Node)).setChain(InChain)
    .setCallee(TLI.getLibcallCallingConv(LC), RetTy, Callee, std::move(Args), 0)
    .setSExtResult(isSigned).setZExtResult(!isSigned);

  std::pair<SDValue, SDValue> CallInfo = TLI.LowerCallTo(CLI);

  return CallInfo;
}

SDValue SelectionDAGLegalize::ExpandFPLibCall(SDNode* Node,
                                              RTLIB::Libcall Call_F32,
                                              RTLIB::Libcall Call_F64,
                                              RTLIB::Libcall Call_F80,
                                              RTLIB::Libcall Call_F128,
                                              RTLIB::Libcall Call_PPCF128) {
  RTLIB::Libcall LC;
  switch (Node->getSimpleValueType(0).SimpleTy) {
  default: llvm_unreachable("Unexpected request for libcall!");
  case MVT::f32: LC = Call_F32; break;
  case MVT::f64: LC = Call_F64; break;
  case MVT::f80: LC = Call_F80; break;
  case MVT::f128: LC = Call_F128; break;
  case MVT::ppcf128: LC = Call_PPCF128; break;
  }
  return ExpandLibCall(LC, Node, false);
}

SDValue SelectionDAGLegalize::ExpandIntLibCall(SDNode* Node, bool isSigned,
                                               RTLIB::Libcall Call_I8,
                                               RTLIB::Libcall Call_I16,
                                               RTLIB::Libcall Call_I32,
                                               RTLIB::Libcall Call_I64,
                                               RTLIB::Libcall Call_I128) {
  RTLIB::Libcall LC;
  switch (Node->getSimpleValueType(0).SimpleTy) {
  default: llvm_unreachable("Unexpected request for libcall!");
  case MVT::i8:   LC = Call_I8; break;
  case MVT::i16:  LC = Call_I16; break;
  case MVT::i32:  LC = Call_I32; break;
  case MVT::i64:  LC = Call_I64; break;
  case MVT::i128: LC = Call_I128; break;
  }
  return ExpandLibCall(LC, Node, isSigned);
}

/// Issue libcalls to __{u}divmod to compute div / rem pairs.
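/// (Editor's note: the callee is assumed to follow the usual compiler-rt /
/// libgcc shape, e.g. "int __divmodsi4(int a, int b, int *rem)", which returns
/// the quotient and stores the remainder through the extra pointer argument;
/// hence the stack temporary appended as the final call argument below.)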
void
SelectionDAGLegalize::ExpandDivRemLibCall(SDNode *Node,
                                          SmallVectorImpl<SDValue> &Results) {
  unsigned Opcode = Node->getOpcode();
  bool isSigned = Opcode == ISD::SDIVREM;

  RTLIB::Libcall LC;
  switch (Node->getSimpleValueType(0).SimpleTy) {
  default: llvm_unreachable("Unexpected request for libcall!");
  case MVT::i8:   LC= isSigned ? RTLIB::SDIVREM_I8  : RTLIB::UDIVREM_I8;  break;
  case MVT::i16:  LC= isSigned ? RTLIB::SDIVREM_I16 : RTLIB::UDIVREM_I16; break;
  case MVT::i32:  LC= isSigned ? RTLIB::SDIVREM_I32 : RTLIB::UDIVREM_I32; break;
  case MVT::i64:  LC= isSigned ? RTLIB::SDIVREM_I64 : RTLIB::UDIVREM_I64; break;
  case MVT::i128: LC= isSigned ? RTLIB::SDIVREM_I128:RTLIB::UDIVREM_I128; break;
  }

  // The input chain to this libcall is the entry node of the function.
  // Legalizing the call will automatically add the previous call to the
  // dependence.
  SDValue InChain = DAG.getEntryNode();

  EVT RetVT = Node->getValueType(0);
  Type *RetTy = RetVT.getTypeForEVT(*DAG.getContext());

  TargetLowering::ArgListTy Args;
  TargetLowering::ArgListEntry Entry;
  for (const SDValue &Op : Node->op_values()) {
    EVT ArgVT = Op.getValueType();
    Type *ArgTy = ArgVT.getTypeForEVT(*DAG.getContext());
    Entry.Node = Op;
    Entry.Ty = ArgTy;
    Entry.isSExt = isSigned;
    Entry.isZExt = !isSigned;
    Args.push_back(Entry);
  }

  // Also pass the return address of the remainder.
  SDValue FIPtr = DAG.CreateStackTemporary(RetVT);
  Entry.Node = FIPtr;
  Entry.Ty = RetTy->getPointerTo();
  Entry.isSExt = isSigned;
  Entry.isZExt = !isSigned;
  Args.push_back(Entry);

  SDValue Callee = DAG.getExternalSymbol(TLI.getLibcallName(LC),
                                         TLI.getPointerTy(DAG.getDataLayout()));

  SDLoc dl(Node);
  TargetLowering::CallLoweringInfo CLI(DAG);
  CLI.setDebugLoc(dl).setChain(InChain)
    .setCallee(TLI.getLibcallCallingConv(LC), RetTy, Callee, std::move(Args), 0)
    .setSExtResult(isSigned).setZExtResult(!isSigned);

  std::pair<SDValue, SDValue> CallInfo = TLI.LowerCallTo(CLI);

  // Remainder is loaded back from the stack frame.
  SDValue Rem = DAG.getLoad(RetVT, dl, CallInfo.second, FIPtr,
                            MachinePointerInfo(), false, false, false, 0);
  Results.push_back(CallInfo.first);
  Results.push_back(Rem);
}

/// Return true if sincos libcall is available.
static bool isSinCosLibcallAvailable(SDNode *Node, const TargetLowering &TLI) {
  RTLIB::Libcall LC;
  switch (Node->getSimpleValueType(0).SimpleTy) {
  default: llvm_unreachable("Unexpected request for libcall!");
  case MVT::f32:     LC = RTLIB::SINCOS_F32; break;
  case MVT::f64:     LC = RTLIB::SINCOS_F64; break;
  case MVT::f80:     LC = RTLIB::SINCOS_F80; break;
  case MVT::f128:    LC = RTLIB::SINCOS_F128; break;
  case MVT::ppcf128: LC = RTLIB::SINCOS_PPCF128; break;
  }
  return TLI.getLibcallName(LC) != nullptr;
}

/// Return true if sincos libcall is available and can be used to combine sin
/// and cos.
static bool canCombineSinCosLibcall(SDNode *Node, const TargetLowering &TLI,
                                    const TargetMachine &TM) {
  if (!isSinCosLibcallAvailable(Node, TLI))
    return false;
  // GNU sin/cos functions set errno while sincos does not. Therefore
  // combining sin and cos is only safe if unsafe-fpmath is enabled.
  bool isGNU = Triple(TM.getTargetTriple()).getEnvironment() == Triple::GNU;
  if (isGNU && !TM.Options.UnsafeFPMath)
    return false;
  return true;
}

/// Only issue sincos libcall if both sin and cos are needed.
static bool useSinCos(SDNode *Node) {
  unsigned OtherOpcode = Node->getOpcode() == ISD::FSIN ?
    ISD::FCOS : ISD::FSIN;

  SDValue Op0 = Node->getOperand(0);
  for (SDNode::use_iterator UI = Op0.getNode()->use_begin(),
       UE = Op0.getNode()->use_end(); UI != UE; ++UI) {
    SDNode *User = *UI;
    if (User == Node)
      continue;
    // The other user might have been turned into sincos already.
    if (User->getOpcode() == OtherOpcode || User->getOpcode() == ISD::FSINCOS)
      return true;
  }
  return false;
}

/// Issue libcalls to sincos to compute sin / cos pairs.
void
SelectionDAGLegalize::ExpandSinCosLibCall(SDNode *Node,
                                          SmallVectorImpl<SDValue> &Results) {
  RTLIB::Libcall LC;
  switch (Node->getSimpleValueType(0).SimpleTy) {
  default: llvm_unreachable("Unexpected request for libcall!");
  case MVT::f32:     LC = RTLIB::SINCOS_F32; break;
  case MVT::f64:     LC = RTLIB::SINCOS_F64; break;
  case MVT::f80:     LC = RTLIB::SINCOS_F80; break;
  case MVT::f128:    LC = RTLIB::SINCOS_F128; break;
  case MVT::ppcf128: LC = RTLIB::SINCOS_PPCF128; break;
  }

  // The input chain to this libcall is the entry node of the function.
  // Legalizing the call will automatically add the previous call to the
  // dependence.
  SDValue InChain = DAG.getEntryNode();

  EVT RetVT = Node->getValueType(0);
  Type *RetTy = RetVT.getTypeForEVT(*DAG.getContext());

  TargetLowering::ArgListTy Args;
  TargetLowering::ArgListEntry Entry;

  // Pass the argument.
  Entry.Node = Node->getOperand(0);
  Entry.Ty = RetTy;
  Entry.isSExt = false;
  Entry.isZExt = false;
  Args.push_back(Entry);

  // Pass the return address of sin.
  SDValue SinPtr = DAG.CreateStackTemporary(RetVT);
  Entry.Node = SinPtr;
  Entry.Ty = RetTy->getPointerTo();
  Entry.isSExt = false;
  Entry.isZExt = false;
  Args.push_back(Entry);

  // Also pass the return address of the cos.
  SDValue CosPtr = DAG.CreateStackTemporary(RetVT);
  Entry.Node = CosPtr;
  Entry.Ty = RetTy->getPointerTo();
  Entry.isSExt = false;
  Entry.isZExt = false;
  Args.push_back(Entry);

  SDValue Callee = DAG.getExternalSymbol(TLI.getLibcallName(LC),
                                         TLI.getPointerTy(DAG.getDataLayout()));

  SDLoc dl(Node);
  TargetLowering::CallLoweringInfo CLI(DAG);
  CLI.setDebugLoc(dl).setChain(InChain)
    .setCallee(TLI.getLibcallCallingConv(LC),
               Type::getVoidTy(*DAG.getContext()), Callee, std::move(Args), 0);

  std::pair<SDValue, SDValue> CallInfo = TLI.LowerCallTo(CLI);

  Results.push_back(DAG.getLoad(RetVT, dl, CallInfo.second, SinPtr,
                                MachinePointerInfo(), false, false, false, 0));
  Results.push_back(DAG.getLoad(RetVT, dl, CallInfo.second, CosPtr,
                                MachinePointerInfo(), false, false, false, 0));
}

/// This function is responsible for legalizing a
/// INT_TO_FP operation of the specified operand when the target requests that
/// we expand it.  At this point, we know that the result and operand types are
/// legal for the target.
SDValue SelectionDAGLegalize::ExpandLegalINT_TO_FP(bool isSigned,
                                                   SDValue Op0,
                                                   EVT DestVT,
                                                   SDLoc dl) {
  // TODO: Should any fast-math-flags be set for the created nodes?

  if (Op0.getValueType() == MVT::i32 && TLI.isTypeLegal(MVT::f64)) {
    // simple 32-bit [signed|unsigned] integer to float/double expansion

    // Get the stack frame index of an 8-byte buffer.
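    // (Editor's sketch of the trick used below: an IEEE double with exponent
    // field 0x433 represents 2^52 * 1.mantissa, so storing the 32-bit integer
    // into the low word and 0x43300000 into the high word constructs the value
    // 2^52 + x as a double. Subtracting the bias 2^52 (or 2^52 + 2^31 for the
    // sign-flipped signed case) then leaves exactly (double)x.)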
    SDValue StackSlot = DAG.CreateStackTemporary(MVT::f64);

    // word offset constant for Hi/Lo address computation
    SDValue WordOff = DAG.getConstant(sizeof(int), dl,
                                      StackSlot.getValueType());
    // set up Hi and Lo (into buffer) address based on endian
    SDValue Hi = StackSlot;
    SDValue Lo = DAG.getNode(ISD::ADD, dl, StackSlot.getValueType(),
                             StackSlot, WordOff);
    if (DAG.getDataLayout().isLittleEndian())
      std::swap(Hi, Lo);

    // if signed map to unsigned space
    SDValue Op0Mapped;
    if (isSigned) {
      // constant used to invert sign bit (signed to unsigned mapping)
      SDValue SignBit = DAG.getConstant(0x80000000u, dl, MVT::i32);
      Op0Mapped = DAG.getNode(ISD::XOR, dl, MVT::i32, Op0, SignBit);
    } else {
      Op0Mapped = Op0;
    }
    // store the lo of the constructed double - based on integer input
    SDValue Store1 = DAG.getStore(DAG.getEntryNode(), dl, Op0Mapped, Lo,
                                  MachinePointerInfo(), false, false, 0);
    // initial hi portion of constructed double
    SDValue InitialHi = DAG.getConstant(0x43300000u, dl, MVT::i32);
    // store the hi of the constructed double - biased exponent
    SDValue Store2 = DAG.getStore(Store1, dl, InitialHi, Hi,
                                  MachinePointerInfo(), false, false, 0);
    // load the constructed double
    SDValue Load = DAG.getLoad(MVT::f64, dl, Store2, StackSlot,
                               MachinePointerInfo(), false, false, false, 0);
    // FP constant to bias correct the final result
    SDValue Bias = DAG.getConstantFP(isSigned ?
                                     BitsToDouble(0x4330000080000000ULL) :
                                     BitsToDouble(0x4330000000000000ULL),
                                     dl, MVT::f64);
    // subtract the bias
    SDValue Sub = DAG.getNode(ISD::FSUB, dl, MVT::f64, Load, Bias);
    // final result
    SDValue Result;
    // handle final rounding
    if (DestVT == MVT::f64) {
      // do nothing
      Result = Sub;
    } else if (DestVT.bitsLT(MVT::f64)) {
      Result = DAG.getNode(ISD::FP_ROUND, dl, DestVT, Sub,
                           DAG.getIntPtrConstant(0, dl));
    } else if (DestVT.bitsGT(MVT::f64)) {
      Result = DAG.getNode(ISD::FP_EXTEND, dl, DestVT, Sub);
    }
    return Result;
  }
  assert(!isSigned && "Legalize cannot Expand SINT_TO_FP for i64 yet");
  // Code below here assumes !isSigned without checking again.

  // Implementation of unsigned i64 to f64 following the algorithm in
  // __floatundidf in compiler_rt. This implementation has the advantage
  // of performing rounding correctly, both in the default rounding mode
  // and in all alternate rounding modes.
  // TODO: Generalize this for use with other types.
  if (Op0.getValueType() == MVT::i64 && DestVT == MVT::f64) {
    SDValue TwoP52 =
      DAG.getConstant(UINT64_C(0x4330000000000000), dl, MVT::i64);
    SDValue TwoP84PlusTwoP52 =
      DAG.getConstantFP(BitsToDouble(UINT64_C(0x4530000000100000)), dl,
                        MVT::f64);
    SDValue TwoP84 =
      DAG.getConstant(UINT64_C(0x4530000000000000), dl, MVT::i64);

    SDValue Lo = DAG.getZeroExtendInReg(Op0, dl, MVT::i32);
    SDValue Hi = DAG.getNode(ISD::SRL, dl, MVT::i64, Op0,
                             DAG.getConstant(32, dl, MVT::i64));
    SDValue LoOr = DAG.getNode(ISD::OR, dl, MVT::i64, Lo, TwoP52);
    SDValue HiOr = DAG.getNode(ISD::OR, dl, MVT::i64, Hi, TwoP84);
    SDValue LoFlt = DAG.getNode(ISD::BITCAST, dl, MVT::f64, LoOr);
    SDValue HiFlt = DAG.getNode(ISD::BITCAST, dl, MVT::f64, HiOr);
    SDValue HiSub = DAG.getNode(ISD::FSUB, dl, MVT::f64, HiFlt,
                                TwoP84PlusTwoP52);
    return DAG.getNode(ISD::FADD, dl, MVT::f64, LoFlt, HiSub);
  }

  // Implementation of unsigned i64 to f32.
  // TODO: Generalize this for use with other types.
  if (Op0.getValueType() == MVT::i64 && DestVT == MVT::f32) {
    // For unsigned conversions, convert them to signed conversions using the
    // algorithm from the x86_64 __floatundidf in compiler_rt.
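    // (Editor's sketch: when the sign bit of the u64 is set, a plain signed
    // conversion would see a negative number. The "slow" path below halves the
    // input while OR-ing the shifted-out bit back in as a sticky bit --
    // (Op0 >> 1) | (Op0 & 1) -- converts that, and doubles the result; the
    // sticky bit keeps the final rounding identical to converting Op0
    // directly. A select on the sign bit picks between this and the direct
    // "fast" signed conversion.)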
    if (!isSigned) {
      SDValue Fast = DAG.getNode(ISD::SINT_TO_FP, dl, MVT::f32, Op0);

      SDValue ShiftConst = DAG.getConstant(
          1, dl,
          TLI.getShiftAmountTy(Op0.getValueType(), DAG.getDataLayout()));
      SDValue Shr = DAG.getNode(ISD::SRL, dl, MVT::i64, Op0, ShiftConst);
      SDValue AndConst = DAG.getConstant(1, dl, MVT::i64);
      SDValue And = DAG.getNode(ISD::AND, dl, MVT::i64, Op0, AndConst);
      SDValue Or = DAG.getNode(ISD::OR, dl, MVT::i64, And, Shr);

      SDValue SignCvt = DAG.getNode(ISD::SINT_TO_FP, dl, MVT::f32, Or);
      SDValue Slow = DAG.getNode(ISD::FADD, dl, MVT::f32, SignCvt, SignCvt);

      // TODO: This really should be implemented using a branch rather than a
      // select.  We happen to get lucky and machinesink does the right
      // thing most of the time.  This would be a good candidate for a
      // pseudo-op, or, even better, for whole-function isel.
      SDValue SignBitTest = DAG.getSetCC(dl, getSetCCResultType(MVT::i64),
                                         Op0, DAG.getConstant(0, dl, MVT::i64),
                                         ISD::SETLT);
      return DAG.getSelect(dl, MVT::f32, SignBitTest, Slow, Fast);
    }

    // Otherwise, implement the fully general conversion.

    SDValue And = DAG.getNode(ISD::AND, dl, MVT::i64, Op0,
         DAG.getConstant(UINT64_C(0xfffffffffffff800), dl, MVT::i64));
    SDValue Or = DAG.getNode(ISD::OR, dl, MVT::i64, And,
         DAG.getConstant(UINT64_C(0x800), dl, MVT::i64));
    SDValue And2 = DAG.getNode(ISD::AND, dl, MVT::i64, Op0,
         DAG.getConstant(UINT64_C(0x7ff), dl, MVT::i64));
    SDValue Ne = DAG.getSetCC(dl, getSetCCResultType(MVT::i64), And2,
                              DAG.getConstant(UINT64_C(0), dl, MVT::i64),
                              ISD::SETNE);
    SDValue Sel = DAG.getSelect(dl, MVT::i64, Ne, Or, Op0);
    SDValue Ge = DAG.getSetCC(dl, getSetCCResultType(MVT::i64), Op0,
                              DAG.getConstant(UINT64_C(0x0020000000000000), dl,
                                              MVT::i64),
                              ISD::SETUGE);
    SDValue Sel2 = DAG.getSelect(dl, MVT::i64, Ge, Sel, Op0);
    EVT SHVT = TLI.getShiftAmountTy(Sel2.getValueType(), DAG.getDataLayout());

    SDValue Sh = DAG.getNode(ISD::SRL, dl, MVT::i64, Sel2,
                             DAG.getConstant(32, dl, SHVT));
    SDValue Trunc = DAG.getNode(ISD::TRUNCATE, dl, MVT::i32, Sh);
    SDValue Fcvt = DAG.getNode(ISD::UINT_TO_FP, dl, MVT::f64, Trunc);
    SDValue TwoP32 =
      DAG.getConstantFP(BitsToDouble(UINT64_C(0x41f0000000000000)), dl,
                        MVT::f64);
    SDValue Fmul = DAG.getNode(ISD::FMUL, dl, MVT::f64, TwoP32, Fcvt);
    SDValue Lo = DAG.getNode(ISD::TRUNCATE, dl, MVT::i32, Sel2);
    SDValue Fcvt2 = DAG.getNode(ISD::UINT_TO_FP, dl, MVT::f64, Lo);
    SDValue Fadd = DAG.getNode(ISD::FADD, dl, MVT::f64, Fmul, Fcvt2);
    return DAG.getNode(ISD::FP_ROUND, dl, MVT::f32, Fadd,
                       DAG.getIntPtrConstant(0, dl));
  }

  SDValue Tmp1 = DAG.getNode(ISD::SINT_TO_FP, dl, DestVT, Op0);

  SDValue SignSet = DAG.getSetCC(dl, getSetCCResultType(Op0.getValueType()),
                                 Op0,
                                 DAG.getConstant(0, dl, Op0.getValueType()),
                                 ISD::SETLT);
  SDValue Zero = DAG.getIntPtrConstant(0, dl),
          Four = DAG.getIntPtrConstant(4, dl);
  SDValue CstOffset = DAG.getSelect(dl, Zero.getValueType(),
                                    SignSet, Four, Zero);

  // If the sign bit of the integer is set, the large number will be treated
  // as a negative number.  To counteract this, the dynamic code adds an
  // offset depending on the data type.
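  // (Editor's example: for an i32 input with the sign bit set, the
  // SINT_TO_FP above produced (DestVT)(x - 2^32); adding back the 2^32 entry
  // of the two-entry constant pool built below -- which holds 0.0f at byte
  // offset 0 and 2^N as an f32 at byte offset 4, regardless of endianness --
  // recovers the intended unsigned value. CstOffset is the 0-or-4 byte
  // offset selecting which entry to load.)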
  uint64_t FF;
  switch (Op0.getSimpleValueType().SimpleTy) {
  default: llvm_unreachable("Unsupported integer type!");
  case MVT::i8 : FF = 0x43800000ULL; break;  // 2^8  (as a float)
  case MVT::i16: FF = 0x47800000ULL; break;  // 2^16 (as a float)
  case MVT::i32: FF = 0x4F800000ULL; break;  // 2^32 (as a float)
  case MVT::i64: FF = 0x5F800000ULL; break;  // 2^64 (as a float)
  }
  if (DAG.getDataLayout().isLittleEndian())
    FF <<= 32;
  Constant *FudgeFactor = ConstantInt::get(
                                      Type::getInt64Ty(*DAG.getContext()), FF);

  SDValue CPIdx =
      DAG.getConstantPool(FudgeFactor, TLI.getPointerTy(DAG.getDataLayout()));
  unsigned Alignment = cast<ConstantPoolSDNode>(CPIdx)->getAlignment();
  CPIdx = DAG.getNode(ISD::ADD, dl, CPIdx.getValueType(), CPIdx, CstOffset);
  Alignment = std::min(Alignment, 4u);
  SDValue FudgeInReg;
  if (DestVT == MVT::f32)
    FudgeInReg = DAG.getLoad(
        MVT::f32, dl, DAG.getEntryNode(), CPIdx,
        MachinePointerInfo::getConstantPool(DAG.getMachineFunction()), false,
        false, false, Alignment);
  else {
    SDValue Load = DAG.getExtLoad(
        ISD::EXTLOAD, dl, DestVT, DAG.getEntryNode(), CPIdx,
        MachinePointerInfo::getConstantPool(DAG.getMachineFunction()), MVT::f32,
        false, false, false, Alignment);
    HandleSDNode Handle(Load);
    LegalizeOp(Load.getNode());
    FudgeInReg = Handle.getValue();
  }

  return DAG.getNode(ISD::FADD, dl, DestVT, Tmp1, FudgeInReg);
}

/// This function is responsible for legalizing a
/// *INT_TO_FP operation of the specified operand when the target requests that
/// we promote it.  At this point, we know that the result and operand types
/// are legal for the target, and that there is a legal UINT_TO_FP or
/// SINT_TO_FP operation that takes a larger input.
SDValue SelectionDAGLegalize::PromoteLegalINT_TO_FP(SDValue LegalOp,
                                                    EVT DestVT,
                                                    bool isSigned,
                                                    SDLoc dl) {
  // First step, figure out the appropriate *INT_TO_FP operation to use.
  EVT NewInTy = LegalOp.getValueType();

  unsigned OpToUse = 0;

  // Scan for the appropriate larger type to use.
  while (1) {
    NewInTy = (MVT::SimpleValueType)(NewInTy.getSimpleVT().SimpleTy+1);
    assert(NewInTy.isInteger() && "Ran out of possibilities!");

    // If the target supports SINT_TO_FP of this type, use it.
    if (TLI.isOperationLegalOrCustom(ISD::SINT_TO_FP, NewInTy)) {
      OpToUse = ISD::SINT_TO_FP;
      break;
    }
    if (isSigned) continue;

    // If the target supports UINT_TO_FP of this type, use it.
    if (TLI.isOperationLegalOrCustom(ISD::UINT_TO_FP, NewInTy)) {
      OpToUse = ISD::UINT_TO_FP;
      break;
    }

    // Otherwise, try a larger type.
  }

  // Okay, we found the operation and type to use.  Extend our input (zero-
  // or sign-extending as appropriate) to the desired type, then run the
  // operation on it.
  return DAG.getNode(OpToUse, dl, DestVT,
                     DAG.getNode(isSigned ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND,
                                 dl, NewInTy, LegalOp));
}

/// This function is responsible for legalizing a
/// FP_TO_*INT operation of the specified operand when the target requests that
/// we promote it.  At this point, we know that the result and operand types
/// are legal for the target, and that there is a legal FP_TO_UINT or
/// FP_TO_SINT operation that returns a larger result.
SDValue SelectionDAGLegalize::PromoteLegalFP_TO_INT(SDValue LegalOp,
                                                    EVT DestVT,
                                                    bool isSigned,
                                                    SDLoc dl) {
  // First step, figure out the appropriate FP_TO*INT operation to use.
  EVT NewOutTy = DestVT;

  unsigned OpToUse = 0;

  // Scan for the appropriate larger type to use.
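  // (Editor's example: a target with no unsigned i16 conversion but a legal
  // i32 FP_TO_SINT -- common on 32-bit x86, for instance -- promotes an
  // unsigned f32 -> i16 conversion to a signed f32 -> i32 conversion and
  // truncates the result back to i16 below; the wider signed type can
  // represent every unsigned i16 value.)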
  while (1) {
    NewOutTy = (MVT::SimpleValueType)(NewOutTy.getSimpleVT().SimpleTy+1);
    assert(NewOutTy.isInteger() && "Ran out of possibilities!");

    // A larger signed type can hold all unsigned values of the requested type,
    // so using FP_TO_SINT is valid
    if (TLI.isOperationLegalOrCustom(ISD::FP_TO_SINT, NewOutTy)) {
      OpToUse = ISD::FP_TO_SINT;
      break;
    }

    // Otherwise, if the conversion is unsigned and the target has FP_TO_UINT
    // at this width, use that instead.
    if (!isSigned && TLI.isOperationLegalOrCustom(ISD::FP_TO_UINT, NewOutTy)) {
      OpToUse = ISD::FP_TO_UINT;
      break;
    }

    // Otherwise, try a larger type.
  }

  // Okay, we found the operation and type to use.
  SDValue Operation = DAG.getNode(OpToUse, dl, NewOutTy, LegalOp);

  // Truncate the result of the extended FP_TO_*INT operation to the desired
  // size.
  return DAG.getNode(ISD::TRUNCATE, dl, DestVT, Operation);
}

/// Open code the operations for BITREVERSE.
SDValue SelectionDAGLegalize::ExpandBITREVERSE(SDValue Op, SDLoc dl) {
  EVT VT = Op.getValueType();
  EVT SHVT = TLI.getShiftAmountTy(VT, DAG.getDataLayout());
  unsigned Sz = VT.getScalarSizeInBits();

  SDValue Tmp, Tmp2;
  Tmp = DAG.getConstant(0, dl, VT);
  // Move bit J of the input into bit I of the result, one bit at a time:
  // shift left when I < J, right otherwise, mask to the single target bit,
  // then OR into the accumulated result.
  for (unsigned I = 0, J = Sz-1; I < Sz; ++I, --J) {
    if (I < J)
      Tmp2 = DAG.getNode(ISD::SHL, dl, VT, Op,
                         DAG.getConstant(J - I, dl, SHVT));
    else
      Tmp2 = DAG.getNode(ISD::SRL, dl, VT, Op,
                         DAG.getConstant(I - J, dl, SHVT));

    APInt Shift(Sz, 1);
    Shift = Shift.shl(J);
    Tmp2 = DAG.getNode(ISD::AND, dl, VT, Tmp2, DAG.getConstant(Shift, dl, VT));
    Tmp = DAG.getNode(ISD::OR, dl, VT, Tmp, Tmp2);
  }

  return Tmp;
}

/// Open code the operations for BSWAP of the specified operation.
SDValue SelectionDAGLegalize::ExpandBSWAP(SDValue Op, SDLoc dl) {
  EVT VT = Op.getValueType();
  EVT SHVT = TLI.getShiftAmountTy(VT, DAG.getDataLayout());
  SDValue Tmp1, Tmp2, Tmp3, Tmp4, Tmp5, Tmp6, Tmp7, Tmp8;
  switch (VT.getSimpleVT().SimpleTy) {
  default: llvm_unreachable("Unhandled Expand type in BSWAP!");
  case MVT::i16:
    Tmp2 = DAG.getNode(ISD::SHL, dl, VT, Op, DAG.getConstant(8, dl, SHVT));
    Tmp1 = DAG.getNode(ISD::SRL, dl, VT, Op, DAG.getConstant(8, dl, SHVT));
    return DAG.getNode(ISD::OR, dl, VT, Tmp1, Tmp2);
  case MVT::i32:
    Tmp4 = DAG.getNode(ISD::SHL, dl, VT, Op, DAG.getConstant(24, dl, SHVT));
    Tmp3 = DAG.getNode(ISD::SHL, dl, VT, Op, DAG.getConstant(8, dl, SHVT));
    Tmp2 = DAG.getNode(ISD::SRL, dl, VT, Op, DAG.getConstant(8, dl, SHVT));
    Tmp1 = DAG.getNode(ISD::SRL, dl, VT, Op, DAG.getConstant(24, dl, SHVT));
    Tmp3 = DAG.getNode(ISD::AND, dl, VT, Tmp3,
                       DAG.getConstant(0xFF0000, dl, VT));
    Tmp2 = DAG.getNode(ISD::AND, dl, VT, Tmp2,
                       DAG.getConstant(0xFF00, dl, VT));
    Tmp4 = DAG.getNode(ISD::OR, dl, VT, Tmp4, Tmp3);
    Tmp2 = DAG.getNode(ISD::OR, dl, VT, Tmp2, Tmp1);
    return DAG.getNode(ISD::OR, dl, VT, Tmp4, Tmp2);
  case MVT::i64:
    Tmp8 = DAG.getNode(ISD::SHL, dl, VT, Op, DAG.getConstant(56, dl, SHVT));
    Tmp7 = DAG.getNode(ISD::SHL, dl, VT, Op, DAG.getConstant(40, dl, SHVT));
    Tmp6 = DAG.getNode(ISD::SHL, dl, VT, Op, DAG.getConstant(24, dl, SHVT));
    Tmp5 = DAG.getNode(ISD::SHL, dl, VT, Op, DAG.getConstant(8, dl, SHVT));
    Tmp4 = DAG.getNode(ISD::SRL, dl, VT, Op, DAG.getConstant(8, dl, SHVT));
    Tmp3 = DAG.getNode(ISD::SRL, dl, VT, Op, DAG.getConstant(24, dl, SHVT));
    Tmp2 = DAG.getNode(ISD::SRL, dl, VT, Op, DAG.getConstant(40, dl, SHVT));
    Tmp1 = DAG.getNode(ISD::SRL, dl, VT, Op, DAG.getConstant(56, dl, SHVT));
    Tmp7 = DAG.getNode(ISD::AND, dl, VT, Tmp7,
                       DAG.getConstant(255ULL<<48, dl, VT));
    Tmp6 = DAG.getNode(ISD::AND, dl, VT, Tmp6,
                       DAG.getConstant(255ULL<<40, dl, VT));
    Tmp5 = DAG.getNode(ISD::AND, dl, VT, Tmp5,
                       DAG.getConstant(255ULL<<32, dl, VT));
    Tmp4 = DAG.getNode(ISD::AND, dl, VT, Tmp4,
                       DAG.getConstant(255ULL<<24, dl, VT));
    Tmp3 = DAG.getNode(ISD::AND, dl, VT, Tmp3,
                       DAG.getConstant(255ULL<<16, dl, VT));
    Tmp2 = DAG.getNode(ISD::AND, dl, VT, Tmp2,
                       DAG.getConstant(255ULL<<8 , dl, VT));
    Tmp8 = DAG.getNode(ISD::OR, dl, VT, Tmp8, Tmp7);
    Tmp6 = DAG.getNode(ISD::OR, dl, VT, Tmp6, Tmp5);
    Tmp4 = DAG.getNode(ISD::OR, dl, VT, Tmp4, Tmp3);
    Tmp2 = DAG.getNode(ISD::OR, dl, VT, Tmp2, Tmp1);
    Tmp8 = DAG.getNode(ISD::OR, dl, VT, Tmp8, Tmp6);
    Tmp4 = DAG.getNode(ISD::OR, dl, VT, Tmp4, Tmp2);
    return DAG.getNode(ISD::OR, dl, VT, Tmp8, Tmp4);
  }
}

/// Expand the specified bitcount instruction into operations.
SDValue SelectionDAGLegalize::ExpandBitCount(unsigned Opc, SDValue Op,
                                             SDLoc dl) {
  switch (Opc) {
  default: llvm_unreachable("Cannot expand this yet!");
  case ISD::CTPOP: {
    EVT VT = Op.getValueType();
    EVT ShVT = TLI.getShiftAmountTy(VT, DAG.getDataLayout());
    unsigned Len = VT.getSizeInBits();

    assert(VT.isInteger() && Len <= 128 && Len % 8 == 0 &&
           "CTPOP not implemented for this type.");

    // This is the "best" algorithm from
    // http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel

    SDValue Mask55 = DAG.getConstant(APInt::getSplat(Len, APInt(8, 0x55)),
                                     dl, VT);
    SDValue Mask33 = DAG.getConstant(APInt::getSplat(Len, APInt(8, 0x33)),
                                     dl, VT);
    SDValue Mask0F = DAG.getConstant(APInt::getSplat(Len, APInt(8, 0x0F)),
                                     dl, VT);
    SDValue Mask01 = DAG.getConstant(APInt::getSplat(Len, APInt(8, 0x01)),
                                     dl, VT);

    // v = v - ((v >> 1) & 0x55555555...)
    Op = DAG.getNode(ISD::SUB, dl, VT, Op,
                     DAG.getNode(ISD::AND, dl, VT,
                                 DAG.getNode(ISD::SRL, dl, VT, Op,
                                             DAG.getConstant(1, dl, ShVT)),
                                 Mask55));
    // v = (v & 0x33333333...) + ((v >> 2) & 0x33333333...)
    Op = DAG.getNode(ISD::ADD, dl, VT,
                     DAG.getNode(ISD::AND, dl, VT, Op, Mask33),
                     DAG.getNode(ISD::AND, dl, VT,
                                 DAG.getNode(ISD::SRL, dl, VT, Op,
                                             DAG.getConstant(2, dl, ShVT)),
                                 Mask33));
    // v = (v + (v >> 4)) & 0x0F0F0F0F...
    Op = DAG.getNode(ISD::AND, dl, VT,
                     DAG.getNode(ISD::ADD, dl, VT, Op,
                                 DAG.getNode(ISD::SRL, dl, VT, Op,
                                             DAG.getConstant(4, dl, ShVT))),
                     Mask0F);
    // v = (v * 0x01010101...) >> (Len - 8)
    Op = DAG.getNode(ISD::SRL, dl, VT,
                     DAG.getNode(ISD::MUL, dl, VT, Op, Mask01),
                     DAG.getConstant(Len - 8, dl, ShVT));

    return Op;
  }
  case ISD::CTLZ_ZERO_UNDEF:
    // This trivially expands to CTLZ.
    return DAG.getNode(ISD::CTLZ, dl, Op.getValueType(), Op);
  case ISD::CTLZ: {
    EVT VT = Op.getValueType();
    unsigned len = VT.getSizeInBits();

    if (TLI.isOperationLegalOrCustom(ISD::CTLZ_ZERO_UNDEF, VT)) {
      EVT SetCCVT = getSetCCResultType(VT);
      SDValue CTLZ = DAG.getNode(ISD::CTLZ_ZERO_UNDEF, dl, VT, Op);
      SDValue Zero = DAG.getConstant(0, dl, VT);
      SDValue SrcIsZero = DAG.getSetCC(dl, SetCCVT, Op, Zero, ISD::SETEQ);
      return DAG.getNode(ISD::SELECT, dl, VT, SrcIsZero,
                         DAG.getConstant(len, dl, VT), CTLZ);
    }

    // for now, we do this:
    // x = x | (x >> 1);
    // x = x | (x >> 2);
    // ...
    // x = x | (x >> 16);
    // x = x | (x >> 32); // for 64-bit input
    // return popcount(~x);
    //
    // Ref: "Hacker's Delight" by Henry Warren
    EVT ShVT = TLI.getShiftAmountTy(VT, DAG.getDataLayout());
    for (unsigned i = 0; (1U << i) <= (len / 2); ++i) {
      SDValue Tmp3 = DAG.getConstant(1ULL << i, dl, ShVT);
      Op = DAG.getNode(ISD::OR, dl, VT, Op,
                       DAG.getNode(ISD::SRL, dl, VT, Op, Tmp3));
    }
    Op = DAG.getNOT(dl, Op, VT);
    return DAG.getNode(ISD::CTPOP, dl, VT, Op);
  }
  case ISD::CTTZ_ZERO_UNDEF:
    // This trivially expands to CTTZ.
    return DAG.getNode(ISD::CTTZ, dl, Op.getValueType(), Op);
  case ISD::CTTZ: {
    // for now, we use: { return popcount(~x & (x - 1)); }
    // unless the target has ctlz but not ctpop, in which case we use:
    // { return 32 - nlz(~x & (x-1)); }
    // Ref: "Hacker's Delight" by Henry Warren
    EVT VT = Op.getValueType();
    SDValue Tmp3 = DAG.getNode(ISD::AND, dl, VT,
                               DAG.getNOT(dl, Op, VT),
                               DAG.getNode(ISD::SUB, dl, VT, Op,
                                           DAG.getConstant(1, dl, VT)));
    // If ISD::CTLZ is legal and CTPOP isn't, then do that instead.
    if (!TLI.isOperationLegalOrCustom(ISD::CTPOP, VT) &&
        TLI.isOperationLegalOrCustom(ISD::CTLZ, VT))
      return DAG.getNode(ISD::SUB, dl, VT,
                         DAG.getConstant(VT.getSizeInBits(), dl, VT),
                         DAG.getNode(ISD::CTLZ, dl, VT, Tmp3));
    return DAG.getNode(ISD::CTPOP, dl, VT, Tmp3);
  }
  }
}

bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
  SmallVector<SDValue, 8> Results;
  SDLoc dl(Node);
  SDValue Tmp1, Tmp2, Tmp3, Tmp4;
  bool NeedInvert;
  switch (Node->getOpcode()) {
  case ISD::CTPOP:
  case ISD::CTLZ:
  case ISD::CTLZ_ZERO_UNDEF:
  case ISD::CTTZ:
  case ISD::CTTZ_ZERO_UNDEF:
    Tmp1 = ExpandBitCount(Node->getOpcode(), Node->getOperand(0), dl);
    Results.push_back(Tmp1);
    break;
  case ISD::BITREVERSE:
    Results.push_back(ExpandBITREVERSE(Node->getOperand(0), dl));
    break;
  case ISD::BSWAP:
    Results.push_back(ExpandBSWAP(Node->getOperand(0), dl));
    break;
  case ISD::FRAMEADDR:
  case ISD::RETURNADDR:
  case ISD::FRAME_TO_ARGS_OFFSET:
    Results.push_back(DAG.getConstant(0, dl, Node->getValueType(0)));
    break;
  case ISD::FLT_ROUNDS_:
    Results.push_back(DAG.getConstant(1, dl, Node->getValueType(0)));
    break;
  case ISD::EH_RETURN:
  case ISD::EH_LABEL:
  case ISD::PREFETCH:
  case ISD::VAEND:
  case ISD::EH_SJLJ_LONGJMP:
    // If the target didn't expand these, there's nothing to do, so just
    // preserve the chain and be done.
    Results.push_back(Node->getOperand(0));
    break;
  case ISD::READCYCLECOUNTER:
    // If the target didn't expand this, just return 'zero' and preserve the
    // chain.
    Results.append(Node->getNumValues() - 1,
                   DAG.getConstant(0, dl, Node->getValueType(0)));
    Results.push_back(Node->getOperand(0));
    break;
  case ISD::EH_SJLJ_SETJMP:
    // If the target didn't expand this, just return 'zero' and preserve the
    // chain.
    Results.push_back(DAG.getConstant(0, dl, MVT::i32));
    Results.push_back(Node->getOperand(0));
    break;
  case ISD::ATOMIC_LOAD: {
    // There is no libcall for atomic load; fake it with ATOMIC_CMP_SWAP.
    SDValue Zero = DAG.getConstant(0, dl, Node->getValueType(0));
    SDVTList VTs = DAG.getVTList(Node->getValueType(0), MVT::Other);
    SDValue Swap = DAG.getAtomicCmpSwap(
        ISD::ATOMIC_CMP_SWAP, dl, cast<AtomicSDNode>(Node)->getMemoryVT(), VTs,
        Node->getOperand(0), Node->getOperand(1), Zero, Zero,
        cast<AtomicSDNode>(Node)->getMemOperand(),
        cast<AtomicSDNode>(Node)->getOrdering(),
        cast<AtomicSDNode>(Node)->getOrdering(),
        cast<AtomicSDNode>(Node)->getSynchScope());
    Results.push_back(Swap.getValue(0));
    Results.push_back(Swap.getValue(1));
    break;
  }
  case ISD::ATOMIC_STORE: {
    // There is no libcall for atomic store; fake it with ATOMIC_SWAP.
    SDValue Swap = DAG.getAtomic(ISD::ATOMIC_SWAP, dl,
                                 cast<AtomicSDNode>(Node)->getMemoryVT(),
                                 Node->getOperand(0),
                                 Node->getOperand(1), Node->getOperand(2),
                                 cast<AtomicSDNode>(Node)->getMemOperand(),
                                 cast<AtomicSDNode>(Node)->getOrdering(),
                                 cast<AtomicSDNode>(Node)->getSynchScope());
    Results.push_back(Swap.getValue(1));
    break;
  }
  case ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS: {
    // Expanding an ATOMIC_CMP_SWAP_WITH_SUCCESS produces an ATOMIC_CMP_SWAP
    // and splits out the success value as a comparison. Expanding the
    // resulting ATOMIC_CMP_SWAP will produce a libcall.
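    // (Editor's sketch of the shape produced below, in pseudo-IR:
    //   {Old, Chain} = ATOMIC_CMP_SWAP(Ptr, Expected, New)
    //   Success      = SETCC(Old, Expected, eq)
    // i.e. the success flag is recomputed from the loaded value rather than
    // carried as a second result of the atomic node.)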
    SDVTList VTs = DAG.getVTList(Node->getValueType(0), MVT::Other);
    SDValue Res = DAG.getAtomicCmpSwap(
        ISD::ATOMIC_CMP_SWAP, dl, cast<AtomicSDNode>(Node)->getMemoryVT(), VTs,
        Node->getOperand(0), Node->getOperand(1), Node->getOperand(2),
        Node->getOperand(3), cast<AtomicSDNode>(Node)->getMemOperand(),
        cast<AtomicSDNode>(Node)->getSuccessOrdering(),
        cast<AtomicSDNode>(Node)->getFailureOrdering(),
        cast<AtomicSDNode>(Node)->getSynchScope());

    SDValue Success = DAG.getSetCC(SDLoc(Node), Node->getValueType(1),
                                   Res, Node->getOperand(2), ISD::SETEQ);

    Results.push_back(Res.getValue(0));
    Results.push_back(Success);
    Results.push_back(Res.getValue(1));
    break;
  }
  case ISD::DYNAMIC_STACKALLOC:
    ExpandDYNAMIC_STACKALLOC(Node, Results);
    break;
  case ISD::MERGE_VALUES:
    for (unsigned i = 0; i < Node->getNumValues(); i++)
      Results.push_back(Node->getOperand(i));
    break;
  case ISD::UNDEF: {
    EVT VT = Node->getValueType(0);
    if (VT.isInteger())
      Results.push_back(DAG.getConstant(0, dl, VT));
    else {
      assert(VT.isFloatingPoint() && "Unknown value type!");
      Results.push_back(DAG.getConstantFP(0, dl, VT));
    }
    break;
  }
  case ISD::FP_ROUND:
  case ISD::BITCAST:
    Tmp1 = EmitStackConvert(Node->getOperand(0), Node->getValueType(0),
                            Node->getValueType(0), dl);
    Results.push_back(Tmp1);
    break;
  case ISD::FP_EXTEND:
    Tmp1 = EmitStackConvert(Node->getOperand(0),
                            Node->getOperand(0).getValueType(),
                            Node->getValueType(0), dl);
    Results.push_back(Tmp1);
    break;
  case ISD::SIGN_EXTEND_INREG: {
    // NOTE: we could fall back on load/store here too for targets without
    // SAR.  However, it is doubtful that any exist.
    EVT ExtraVT = cast<VTSDNode>(Node->getOperand(1))->getVT();
    EVT VT = Node->getValueType(0);
    EVT ShiftAmountTy = TLI.getShiftAmountTy(VT, DAG.getDataLayout());
    if (VT.isVector())
      ShiftAmountTy = VT;
    unsigned BitsDiff = VT.getScalarType().getSizeInBits() -
                        ExtraVT.getScalarType().getSizeInBits();
    SDValue ShiftCst = DAG.getConstant(BitsDiff, dl, ShiftAmountTy);
    Tmp1 = DAG.getNode(ISD::SHL, dl, Node->getValueType(0),
                       Node->getOperand(0), ShiftCst);
    Tmp1 = DAG.getNode(ISD::SRA, dl, Node->getValueType(0), Tmp1, ShiftCst);
    Results.push_back(Tmp1);
    break;
  }
  case ISD::FP_ROUND_INREG: {
    // The only way we can lower this is to turn it into a TRUNCSTORE,
    // EXTLOAD pair, targeting a temporary location (a stack slot).

    // NOTE: there is a choice here between constantly creating new stack
    // slots and always reusing the same one.  We currently always create
    // new ones, as reuse may inhibit scheduling.
    EVT ExtraVT = cast<VTSDNode>(Node->getOperand(1))->getVT();
    Tmp1 = EmitStackConvert(Node->getOperand(0), ExtraVT,
                            Node->getValueType(0), dl);
    Results.push_back(Tmp1);
    break;
  }
  case ISD::SINT_TO_FP:
  case ISD::UINT_TO_FP:
    Tmp1 = ExpandLegalINT_TO_FP(Node->getOpcode() == ISD::SINT_TO_FP,
                                Node->getOperand(0), Node->getValueType(0),
                                dl);
    Results.push_back(Tmp1);
    break;
  case ISD::FP_TO_SINT:
    if (TLI.expandFP_TO_SINT(Node, Tmp1, DAG))
      Results.push_back(Tmp1);
    break;
  case ISD::FP_TO_UINT: {
    SDValue True, False;
    EVT VT =  Node->getOperand(0).getValueType();
    EVT NVT = Node->getValueType(0);
    APFloat apf(DAG.EVTToAPFloatSemantics(VT),
                APInt::getNullValue(VT.getSizeInBits()));
    APInt x = APInt::getSignBit(NVT.getSizeInBits());
    (void)apf.convertFromAPInt(x, false, APFloat::rmNearestTiesToEven);
    Tmp1 = DAG.getConstantFP(apf, dl, VT);
    Tmp2 = DAG.getSetCC(dl, getSetCCResultType(VT),
                        Node->getOperand(0),
                        Tmp1, ISD::SETLT);
    True = DAG.getNode(ISD::FP_TO_SINT, dl, NVT, Node->getOperand(0));
    // TODO: Should any fast-math-flags be set for the FSUB?
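    // (Editor's sketch: with NVT == i64 the expansion below computes
    //   x < 2^63 ? (i64)fptosi(x) : (i64)fptosi(x - 2^63) ^ 0x8000000000000000
    // -- values that overflow the signed range are rebased by 2^63 before the
    // signed conversion, and the sign bit is XOR'ed back in afterwards.)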
    False = DAG.getNode(ISD::FP_TO_SINT, dl, NVT,
                        DAG.getNode(ISD::FSUB, dl, VT,
                                    Node->getOperand(0), Tmp1));
    False = DAG.getNode(ISD::XOR, dl, NVT, False,
                        DAG.getConstant(x, dl, NVT));
    Tmp1 = DAG.getSelect(dl, NVT, Tmp2, True, False);
    Results.push_back(Tmp1);
    break;
  }
  case ISD::VAARG:
    Results.push_back(DAG.expandVAArg(Node));
    Results.push_back(Results[0].getValue(1));
    break;
  case ISD::VACOPY:
    Results.push_back(DAG.expandVACopy(Node));
    break;
  case ISD::EXTRACT_VECTOR_ELT:
    if (Node->getOperand(0).getValueType().getVectorNumElements() == 1)
      // This must be an access of the only element.  Return it.
      Tmp1 = DAG.getNode(ISD::BITCAST, dl, Node->getValueType(0),
                         Node->getOperand(0));
    else
      Tmp1 = ExpandExtractFromVectorThroughStack(SDValue(Node, 0));
    Results.push_back(Tmp1);
    break;
  case ISD::EXTRACT_SUBVECTOR:
    Results.push_back(ExpandExtractFromVectorThroughStack(SDValue(Node, 0)));
    break;
  case ISD::INSERT_SUBVECTOR:
    Results.push_back(ExpandInsertToVectorThroughStack(SDValue(Node, 0)));
    break;
  case ISD::CONCAT_VECTORS: {
    Results.push_back(ExpandVectorBuildThroughStack(Node));
    break;
  }
  case ISD::SCALAR_TO_VECTOR:
    Results.push_back(ExpandSCALAR_TO_VECTOR(Node));
    break;
  case ISD::INSERT_VECTOR_ELT:
    Results.push_back(ExpandINSERT_VECTOR_ELT(Node->getOperand(0),
                                              Node->getOperand(1),
                                              Node->getOperand(2), dl));
    break;
  case ISD::VECTOR_SHUFFLE: {
    SmallVector<int, 32> NewMask;
    ArrayRef<int> Mask = cast<ShuffleVectorSDNode>(Node)->getMask();

    EVT VT = Node->getValueType(0);
    EVT EltVT = VT.getVectorElementType();
    SDValue Op0 = Node->getOperand(0);
    SDValue Op1 = Node->getOperand(1);
    if (!TLI.isTypeLegal(EltVT)) {
      EVT NewEltVT = TLI.getTypeToTransformTo(*DAG.getContext(), EltVT);

      // BUILD_VECTOR operands are allowed to be wider than the element type.
      // But if NewEltVT is smaller than EltVT the BUILD_VECTOR does not accept
      // it.
      if (NewEltVT.bitsLT(EltVT)) {
        // Convert shuffle node.
        // If original node was v4i64 and the new EltVT is i32,
        // cast operands to v8i32 and re-build the mask.
        // (e.g. each original mask entry m becomes factor consecutive entries
        // m*factor .. m*factor+factor-1 in the widened mask.)

        // Calculate the new VT; its size should equal the original one's.
        EVT NewVT = EVT::getVectorVT(*DAG.getContext(), NewEltVT,
                                     VT.getSizeInBits() /
                                         NewEltVT.getSizeInBits());
        assert(NewVT.bitsEq(VT));

        // cast operands to new VT
        Op0 = DAG.getNode(ISD::BITCAST, dl, NewVT, Op0);
        Op1 = DAG.getNode(ISD::BITCAST, dl, NewVT, Op1);

        // Convert the shuffle mask
        unsigned int factor =
            NewVT.getVectorNumElements() / VT.getVectorNumElements();

        // EltVT gets smaller
        assert(factor > 0);

        for (unsigned i = 0; i < VT.getVectorNumElements(); ++i) {
          if (Mask[i] < 0) {
            for (unsigned fi = 0; fi < factor; ++fi)
              NewMask.push_back(Mask[i]);
          }
          else {
            for (unsigned fi = 0; fi < factor; ++fi)
              NewMask.push_back(Mask[i]*factor+fi);
          }
        }
        Mask = NewMask;
        VT = NewVT;
      }
      EltVT = NewEltVT;
    }
    unsigned NumElems = VT.getVectorNumElements();
    SmallVector<SDValue, 16> Ops;
    for (unsigned i = 0; i != NumElems; ++i) {
      if (Mask[i] < 0) {
        Ops.push_back(DAG.getUNDEF(EltVT));
        continue;
      }
      unsigned Idx = Mask[i];
      if (Idx < NumElems)
        Ops.push_back(DAG.getNode(
            ISD::EXTRACT_VECTOR_ELT, dl, EltVT, Op0,
            DAG.getConstant(Idx, dl,
                            TLI.getVectorIdxTy(DAG.getDataLayout()))));
      else
        Ops.push_back(DAG.getNode(
            ISD::EXTRACT_VECTOR_ELT, dl, EltVT, Op1,
            DAG.getConstant(Idx - NumElems, dl,
                            TLI.getVectorIdxTy(DAG.getDataLayout()))));
    }

    Tmp1 = DAG.getNode(ISD::BUILD_VECTOR, dl, VT, Ops);
    // We may have changed the BUILD_VECTOR type. Cast it back to the Node
    // type.
    Tmp1 = DAG.getNode(ISD::BITCAST, dl, Node->getValueType(0), Tmp1);
    Results.push_back(Tmp1);
    break;
  }
  case ISD::EXTRACT_ELEMENT: {
    EVT OpTy = Node->getOperand(0).getValueType();
    if (cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue()) {
      // 1 -> Hi
      Tmp1 = DAG.getNode(ISD::SRL, dl, OpTy, Node->getOperand(0),
                         DAG.getConstant(OpTy.getSizeInBits() / 2, dl,
                                         TLI.getShiftAmountTy(
                                             Node->getOperand(0).getValueType(),
                                             DAG.getDataLayout())));
      Tmp1 = DAG.getNode(ISD::TRUNCATE, dl, Node->getValueType(0), Tmp1);
    } else {
      // 0 -> Lo
      Tmp1 = DAG.getNode(ISD::TRUNCATE, dl, Node->getValueType(0),
                         Node->getOperand(0));
    }
    Results.push_back(Tmp1);
    break;
  }
  case ISD::STACKSAVE:
    // Expand to CopyFromReg if the target set
    // StackPointerRegisterToSaveRestore.
    if (unsigned SP = TLI.getStackPointerRegisterToSaveRestore()) {
      Results.push_back(DAG.getCopyFromReg(Node->getOperand(0), dl, SP,
                                           Node->getValueType(0)));
      Results.push_back(Results[0].getValue(1));
    } else {
      Results.push_back(DAG.getUNDEF(Node->getValueType(0)));
      Results.push_back(Node->getOperand(0));
    }
    break;
  case ISD::STACKRESTORE:
    // Expand to CopyToReg if the target set
    // StackPointerRegisterToSaveRestore.
    if (unsigned SP = TLI.getStackPointerRegisterToSaveRestore()) {
      Results.push_back(DAG.getCopyToReg(Node->getOperand(0), dl, SP,
                                         Node->getOperand(1)));
    } else {
      Results.push_back(Node->getOperand(0));
    }
    break;
  case ISD::GET_DYNAMIC_AREA_OFFSET:
    Results.push_back(DAG.getConstant(0, dl, Node->getValueType(0)));
    Results.push_back(Results[0].getValue(0));
    break;
  case ISD::FCOPYSIGN:
    Results.push_back(ExpandFCOPYSIGN(Node));
    break;
  case ISD::FNEG:
    // Expand Y = FNEG(X) ->  Y = SUB -0.0, X
    Tmp1 = DAG.getConstantFP(-0.0, dl, Node->getValueType(0));
    // TODO: If FNEG has fast-math-flags, propagate them to the FSUB.
    Tmp1 = DAG.getNode(ISD::FSUB, dl, Node->getValueType(0), Tmp1,
                       Node->getOperand(0));
    Results.push_back(Tmp1);
    break;
  case ISD::FABS:
    Results.push_back(ExpandFABS(Node));
    break;
  case ISD::SMIN:
  case ISD::SMAX:
  case ISD::UMIN:
  case ISD::UMAX: {
    // Expand Y = MAX(A, B) -> Y = (A > B) ? A : B
    ISD::CondCode Pred;
    switch (Node->getOpcode()) {
    default: llvm_unreachable("How did we get here?");
    case ISD::SMAX: Pred = ISD::SETGT; break;
    case ISD::SMIN: Pred = ISD::SETLT; break;
    case ISD::UMAX: Pred = ISD::SETUGT; break;
    case ISD::UMIN: Pred = ISD::SETULT; break;
    }
    Tmp1 = Node->getOperand(0);
    Tmp2 = Node->getOperand(1);
    Tmp1 = DAG.getSelectCC(dl, Tmp1, Tmp2, Tmp1, Tmp2, Pred);
    Results.push_back(Tmp1);
    break;
  }
  case ISD::FSIN:
  case ISD::FCOS: {
    EVT VT = Node->getValueType(0);
    // Turn fsin / fcos into ISD::FSINCOS node if there are a pair of fsin /
    // fcos which share the same operand and both are used.
    if ((TLI.isOperationLegalOrCustom(ISD::FSINCOS, VT) ||
         canCombineSinCosLibcall(Node, TLI, TM))
        && useSinCos(Node)) {
      SDVTList VTs = DAG.getVTList(VT, VT);
      Tmp1 = DAG.getNode(ISD::FSINCOS, dl, VTs, Node->getOperand(0));
      if (Node->getOpcode() == ISD::FCOS)
        Tmp1 = Tmp1.getValue(1);
      Results.push_back(Tmp1);
    }
    break;
  }
  case ISD::FMAD:
    llvm_unreachable("Illegal fmad should never be formed");

  case ISD::FP16_TO_FP:
    if (Node->getValueType(0) != MVT::f32) {
      // We can extend to types bigger than f32 in two steps without changing
      // the result. Since "f16 -> f32" is much more commonly available, give
      // CodeGen the option of emitting that before resorting to a libcall.
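      // (Editor's note: this is safe because every f16 value is exactly
      // representable in f32, so f16 -> f32 -> f64 (or wider) produces the
      // same result as a direct one-step extension.)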
      SDValue Res =
          DAG.getNode(ISD::FP16_TO_FP, dl, MVT::f32, Node->getOperand(0));
      Results.push_back(
          DAG.getNode(ISD::FP_EXTEND, dl, Node->getValueType(0), Res));
    }
    break;
  case ISD::FP_TO_FP16:
    if (!TLI.useSoftFloat() && TM.Options.UnsafeFPMath) {
      SDValue Op = Node->getOperand(0);
      MVT SVT = Op.getSimpleValueType();
      if ((SVT == MVT::f64 || SVT == MVT::f80) &&
          TLI.isOperationLegalOrCustom(ISD::FP_TO_FP16, MVT::f32)) {
        // Under fastmath, we can expand this node into a fround followed by
        // a float-half conversion.
        SDValue FloatVal = DAG.getNode(ISD::FP_ROUND, dl, MVT::f32, Op,
                                       DAG.getIntPtrConstant(0, dl));
        Results.push_back(
            DAG.getNode(ISD::FP_TO_FP16, dl, MVT::i16, FloatVal));
      }
    }
    break;
  case ISD::ConstantFP: {
    ConstantFPSDNode *CFP = cast<ConstantFPSDNode>(Node);
    // Check to see if this FP immediate is already legal.
    // If this is a legal constant, turn it into a TargetConstantFP node.
    if (!TLI.isFPImmLegal(CFP->getValueAPF(), Node->getValueType(0)))
      Results.push_back(ExpandConstantFP(CFP, true));
    break;
  }
  case ISD::Constant: {
    ConstantSDNode *CP = cast<ConstantSDNode>(Node);
    Results.push_back(ExpandConstant(CP));
    break;
  }
  case ISD::FSUB: {
    EVT VT = Node->getValueType(0);
    if (TLI.isOperationLegalOrCustom(ISD::FADD, VT) &&
        TLI.isOperationLegalOrCustom(ISD::FNEG, VT)) {
      const SDNodeFlags *Flags = &cast<BinaryWithFlagsSDNode>(Node)->Flags;
      Tmp1 = DAG.getNode(ISD::FNEG, dl, VT, Node->getOperand(1));
      Tmp1 = DAG.getNode(ISD::FADD, dl, VT, Node->getOperand(0), Tmp1, Flags);
      Results.push_back(Tmp1);
    }
    break;
  }
  case ISD::SUB: {
    EVT VT = Node->getValueType(0);
    assert(TLI.isOperationLegalOrCustom(ISD::ADD, VT) &&
           TLI.isOperationLegalOrCustom(ISD::XOR, VT) &&
           "Don't know how to expand this subtraction!");
    Tmp1 = DAG.getNode(ISD::XOR, dl, VT, Node->getOperand(1),
               DAG.getConstant(APInt::getAllOnesValue(VT.getSizeInBits()), dl,
                               VT));
    Tmp1 = DAG.getNode(ISD::ADD, dl, VT, Tmp1, DAG.getConstant(1, dl, VT));
    Results.push_back(DAG.getNode(ISD::ADD, dl, VT, Node->getOperand(0),
                                  Tmp1));
    break;
  }
  case ISD::UREM:
  case ISD::SREM: {
    EVT VT = Node->getValueType(0);
    bool isSigned = Node->getOpcode() == ISD::SREM;
    unsigned DivOpc = isSigned ? ISD::SDIV : ISD::UDIV;
    unsigned DivRemOpc = isSigned ? ISD::SDIVREM : ISD::UDIVREM;
    Tmp2 = Node->getOperand(0);
    Tmp3 = Node->getOperand(1);
    if (TLI.isOperationLegalOrCustom(DivRemOpc, VT)) {
      SDVTList VTs = DAG.getVTList(VT, VT);
      Tmp1 = DAG.getNode(DivRemOpc, dl, VTs, Tmp2, Tmp3).getValue(1);
      Results.push_back(Tmp1);
    } else if (TLI.isOperationLegalOrCustom(DivOpc, VT)) {
      // X % Y -> X-X/Y*Y
      Tmp1 = DAG.getNode(DivOpc, dl, VT, Tmp2, Tmp3);
      Tmp1 = DAG.getNode(ISD::MUL, dl, VT, Tmp1, Tmp3);
      Tmp1 = DAG.getNode(ISD::SUB, dl, VT, Tmp2, Tmp1);
      Results.push_back(Tmp1);
    }
    break;
  }
  case ISD::UDIV:
  case ISD::SDIV: {
    bool isSigned = Node->getOpcode() == ISD::SDIV;
    unsigned DivRemOpc = isSigned ? ISD::SDIVREM : ISD::UDIVREM;
    EVT VT = Node->getValueType(0);
    if (TLI.isOperationLegalOrCustom(DivRemOpc, VT)) {
      SDVTList VTs = DAG.getVTList(VT, VT);
      Tmp1 = DAG.getNode(DivRemOpc, dl, VTs, Node->getOperand(0),
                         Node->getOperand(1));
      Results.push_back(Tmp1);
    }
    break;
  }
  case ISD::MULHU:
  case ISD::MULHS: {
    unsigned ExpandOpcode = Node->getOpcode() == ISD::MULHU ? ISD::UMUL_LOHI :
                                                              ISD::SMUL_LOHI;
    EVT VT = Node->getValueType(0);
    SDVTList VTs = DAG.getVTList(VT, VT);
    assert(TLI.isOperationLegalOrCustom(ExpandOpcode, VT) &&
           "If this wasn't legal, it shouldn't have been created!");
    Tmp1 = DAG.getNode(ExpandOpcode, dl, VTs, Node->getOperand(0),
                       Node->getOperand(1));
    Results.push_back(Tmp1.getValue(1));
    break;
  }
  case ISD::MUL: {
    EVT VT = Node->getValueType(0);
    SDVTList VTs = DAG.getVTList(VT, VT);
    // See if multiply or divide can be lowered using two-result operations.
    // We just need the low half of the multiply; try both the signed
    // and unsigned forms. If the target supports both SMUL_LOHI and
    // UMUL_LOHI, form a preference by checking which forms of plain
    // MULH it supports.
    bool HasSMUL_LOHI = TLI.isOperationLegalOrCustom(ISD::SMUL_LOHI, VT);
    bool HasUMUL_LOHI = TLI.isOperationLegalOrCustom(ISD::UMUL_LOHI, VT);
    bool HasMULHS = TLI.isOperationLegalOrCustom(ISD::MULHS, VT);
    bool HasMULHU = TLI.isOperationLegalOrCustom(ISD::MULHU, VT);
    unsigned OpToUse = 0;
    if (HasSMUL_LOHI && !HasMULHS) {
      OpToUse = ISD::SMUL_LOHI;
    } else if (HasUMUL_LOHI && !HasMULHU) {
      OpToUse = ISD::UMUL_LOHI;
    } else if (HasSMUL_LOHI) {
      OpToUse = ISD::SMUL_LOHI;
    } else if (HasUMUL_LOHI) {
      OpToUse = ISD::UMUL_LOHI;
    }
    if (OpToUse) {
      Results.push_back(DAG.getNode(OpToUse, dl, VTs, Node->getOperand(0),
                                    Node->getOperand(1)));
      break;
    }

    SDValue Lo, Hi;
    EVT HalfType = VT.getHalfSizedIntegerVT(*DAG.getContext());
    if (TLI.isOperationLegalOrCustom(ISD::ZERO_EXTEND, VT) &&
        TLI.isOperationLegalOrCustom(ISD::ANY_EXTEND, VT) &&
        TLI.isOperationLegalOrCustom(ISD::SHL, VT) &&
        TLI.isOperationLegalOrCustom(ISD::OR, VT) &&
        TLI.expandMUL(Node, Lo, Hi, HalfType, DAG)) {
      Lo = DAG.getNode(ISD::ZERO_EXTEND, dl, VT, Lo);
      Hi = DAG.getNode(ISD::ANY_EXTEND, dl, VT, Hi);
      SDValue Shift =
          DAG.getConstant(HalfType.getSizeInBits(), dl,
                          TLI.getShiftAmountTy(HalfType, DAG.getDataLayout()));
      Hi = DAG.getNode(ISD::SHL, dl, VT, Hi, Shift);
      Results.push_back(DAG.getNode(ISD::OR, dl, VT, Lo, Hi));
    }
    break;
  }
  case ISD::SADDO:
  case ISD::SSUBO: {
    SDValue LHS = Node->getOperand(0);
    SDValue RHS = Node->getOperand(1);
    SDValue Sum = DAG.getNode(Node->getOpcode() == ISD::SADDO ?
                              ISD::ADD : ISD::SUB, dl, LHS.getValueType(),
                              LHS, RHS);
    Results.push_back(Sum);
    EVT ResultType = Node->getValueType(1);
    EVT OType = getSetCCResultType(Node->getValueType(0));

    SDValue Zero = DAG.getConstant(0, dl, LHS.getValueType());

    //   LHSSign -> LHS >= 0
    //   RHSSign -> RHS >= 0
    //   SumSign -> Sum >= 0
    //
    //   Add:
    //   Overflow -> (LHSSign == RHSSign) && (LHSSign != SumSign)
    //   Sub:
    //   Overflow -> (LHSSign != RHSSign) && (LHSSign != SumSign)
    //
    SDValue LHSSign = DAG.getSetCC(dl, OType, LHS, Zero, ISD::SETGE);
    SDValue RHSSign = DAG.getSetCC(dl, OType, RHS, Zero, ISD::SETGE);
    SDValue SignsMatch = DAG.getSetCC(dl, OType, LHSSign, RHSSign,
                                      Node->getOpcode() == ISD::SADDO ?
                                      ISD::SETEQ : ISD::SETNE);

    SDValue SumSign = DAG.getSetCC(dl, OType, Sum, Zero, ISD::SETGE);
    SDValue SumSignNE = DAG.getSetCC(dl, OType, LHSSign, SumSign, ISD::SETNE);

    SDValue Cmp = DAG.getNode(ISD::AND, dl, OType, SignsMatch, SumSignNE);
    Results.push_back(DAG.getBoolExtOrTrunc(Cmp, dl, ResultType, ResultType));
    break;
  }
  case ISD::UADDO:
  case ISD::USUBO: {
    SDValue LHS = Node->getOperand(0);
    SDValue RHS = Node->getOperand(1);
    SDValue Sum = DAG.getNode(Node->getOpcode() == ISD::UADDO ?
                              ISD::ADD : ISD::SUB, dl, LHS.getValueType(),
                              LHS, RHS);
    Results.push_back(Sum);

    EVT ResultType = Node->getValueType(1);
    EVT SetCCType = getSetCCResultType(Node->getValueType(0));
    ISD::CondCode CC
      = Node->getOpcode() == ISD::UADDO ? ISD::SETULT : ISD::SETUGT;
    SDValue SetCC = DAG.getSetCC(dl, SetCCType, Sum, LHS, CC);

    Results.push_back(DAG.getBoolExtOrTrunc(SetCC, dl, ResultType,
                                            ResultType));
    break;
  }
  case ISD::UMULO:
  case ISD::SMULO: {
    EVT VT = Node->getValueType(0);
    EVT WideVT = EVT::getIntegerVT(*DAG.getContext(), VT.getSizeInBits() * 2);
    SDValue LHS = Node->getOperand(0);
    SDValue RHS = Node->getOperand(1);
    SDValue BottomHalf;
    SDValue TopHalf;
    static const unsigned Ops[2][3] =
        { { ISD::MULHU, ISD::UMUL_LOHI, ISD::ZERO_EXTEND },
          { ISD::MULHS, ISD::SMUL_LOHI, ISD::SIGN_EXTEND }};
    bool isSigned = Node->getOpcode() == ISD::SMULO;
    if (TLI.isOperationLegalOrCustom(Ops[isSigned][0], VT)) {
      BottomHalf = DAG.getNode(ISD::MUL, dl, VT, LHS, RHS);
      TopHalf = DAG.getNode(Ops[isSigned][0], dl, VT, LHS, RHS);
    } else if (TLI.isOperationLegalOrCustom(Ops[isSigned][1], VT)) {
      BottomHalf = DAG.getNode(Ops[isSigned][1], dl, DAG.getVTList(VT, VT),
                               LHS, RHS);
      TopHalf = BottomHalf.getValue(1);
    } else if (TLI.isTypeLegal(WideVT)) {
      LHS = DAG.getNode(Ops[isSigned][2], dl, WideVT, LHS);
      RHS = DAG.getNode(Ops[isSigned][2], dl, WideVT, RHS);
      Tmp1 = DAG.getNode(ISD::MUL, dl, WideVT, LHS, RHS);
      BottomHalf = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, VT, Tmp1,
                               DAG.getIntPtrConstant(0, dl));
      TopHalf = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, VT, Tmp1,
                            DAG.getIntPtrConstant(1, dl));
    } else {
      // We can fall back to a libcall with an illegal type for the MUL if we
      // have a libcall big enough.
      // Also, we can fall back to a division in some cases, but that's a big
      // performance hit in the general case.
      RTLIB::Libcall LC = RTLIB::UNKNOWN_LIBCALL;
      if (WideVT == MVT::i16)
        LC = RTLIB::MUL_I16;
      else if (WideVT == MVT::i32)
        LC = RTLIB::MUL_I32;
      else if (WideVT == MVT::i64)
        LC = RTLIB::MUL_I64;
      else if (WideVT == MVT::i128)
        LC = RTLIB::MUL_I128;
      assert(LC != RTLIB::UNKNOWN_LIBCALL && "Cannot expand this operation!");

      // The high part is obtained by SRA'ing all but one of the bits of low
      // part.
      unsigned LoSize = VT.getSizeInBits();
      SDValue HiLHS =
          DAG.getNode(ISD::SRA, dl, VT, RHS,
                      DAG.getConstant(LoSize - 1, dl,
                                      TLI.getPointerTy(DAG.getDataLayout())));
      SDValue HiRHS =
          DAG.getNode(ISD::SRA, dl, VT, LHS,
                      DAG.getConstant(LoSize - 1, dl,
                                      TLI.getPointerTy(DAG.getDataLayout())));

      // Here we're passing the 2 arguments explicitly as 4 arguments that are
      // pre-lowered to the correct types. This all depends upon WideVT not
      // being a legal type for the architecture and thus has to be split to
      // two arguments.
      SDValue Args[] = { LHS, HiLHS, RHS, HiRHS };
      SDValue Ret = ExpandLibCall(LC, WideVT, Args, 4, isSigned, dl);
      BottomHalf = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, VT, Ret,
                               DAG.getIntPtrConstant(0, dl));
      TopHalf = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, VT, Ret,
                            DAG.getIntPtrConstant(1, dl));
      // Ret is a node with an illegal type. Because such things are not
      // generally permitted during this phase of legalization, make sure the
      // node has no more uses. The above EXTRACT_ELEMENT nodes should have
      // been folded.
      assert(Ret->use_empty() &&
             "Unexpected uses of illegally typed node from expanded lib call.");
    }

    if (isSigned) {
      Tmp1 = DAG.getConstant(
          VT.getSizeInBits() - 1, dl,
          TLI.getShiftAmountTy(BottomHalf.getValueType(), DAG.getDataLayout()));
      Tmp1 = DAG.getNode(ISD::SRA, dl, VT, BottomHalf, Tmp1);
      TopHalf = DAG.getSetCC(dl, getSetCCResultType(VT), TopHalf, Tmp1,
                             ISD::SETNE);
    } else {
      TopHalf = DAG.getSetCC(dl, getSetCCResultType(VT), TopHalf,
                             DAG.getConstant(0, dl, VT), ISD::SETNE);
    }
    Results.push_back(BottomHalf);
    Results.push_back(TopHalf);
    break;
  }
  case ISD::BUILD_PAIR: {
    EVT PairTy = Node->getValueType(0);
    Tmp1 = DAG.getNode(ISD::ZERO_EXTEND, dl, PairTy, Node->getOperand(0));
    Tmp2 = DAG.getNode(ISD::ANY_EXTEND, dl, PairTy, Node->getOperand(1));
    Tmp2 = DAG.getNode(
        ISD::SHL, dl, PairTy, Tmp2,
        DAG.getConstant(PairTy.getSizeInBits() / 2, dl,
                        TLI.getShiftAmountTy(PairTy, DAG.getDataLayout())));
    Results.push_back(DAG.getNode(ISD::OR, dl, PairTy, Tmp1, Tmp2));
    break;
  }
  case ISD::SELECT:
    Tmp1 = Node->getOperand(0);
    Tmp2 = Node->getOperand(1);
    Tmp3 = Node->getOperand(2);
    if (Tmp1.getOpcode() == ISD::SETCC) {
      Tmp1 = DAG.getSelectCC(dl, Tmp1.getOperand(0), Tmp1.getOperand(1),
                             Tmp2, Tmp3,
                             cast<CondCodeSDNode>(Tmp1.getOperand(2))->get());
    } else {
      Tmp1 = DAG.getSelectCC(dl, Tmp1,
                             DAG.getConstant(0, dl, Tmp1.getValueType()),
                             Tmp2, Tmp3, ISD::SETNE);
    }
    Results.push_back(Tmp1);
    break;
  case ISD::BR_JT: {
    SDValue Chain = Node->getOperand(0);
    SDValue Table = Node->getOperand(1);
    SDValue Index = Node->getOperand(2);
    EVT PTy = TLI.getPointerTy(DAG.getDataLayout());

    const DataLayout &TD = DAG.getDataLayout();
    unsigned EntrySize =
        DAG.getMachineFunction().getJumpTableInfo()->getEntrySize(TD);

    Index = DAG.getNode(ISD::MUL, dl, Index.getValueType(), Index,
                        DAG.getConstant(EntrySize, dl, Index.getValueType()));
    SDValue Addr = DAG.getNode(ISD::ADD, dl, Index.getValueType(),
                               Index, Table);

    EVT MemVT = EVT::getIntegerVT(*DAG.getContext(), EntrySize * 8);
    SDValue LD = DAG.getExtLoad(
        ISD::SEXTLOAD, dl, PTy, Chain, Addr,
        MachinePointerInfo::getJumpTable(DAG.getMachineFunction()), MemVT,
        false, false, false, 0);
    Addr = LD;
    if (TM.getRelocationModel() == Reloc::PIC_) {
      // For PIC, the sequence is:
      // BRIND(load(Jumptable + index) + RelocBase)
      // RelocBase can be JumpTable, GOT or some sort of global base.
      Addr = DAG.getNode(ISD::ADD, dl, PTy, Addr,
                         TLI.getPICJumpTableRelocBase(Table, DAG));
    }
    Tmp1 = DAG.getNode(ISD::BRIND, dl, MVT::Other, LD.getValue(1), Addr);
    Results.push_back(Tmp1);
    break;
  }
  case ISD::BRCOND:
    // Expand brcond's setcc into its constituent parts and create a BR_CC
    // Node.
    Tmp1 = Node->getOperand(0);
    Tmp2 = Node->getOperand(1);
    if (Tmp2.getOpcode() == ISD::SETCC) {
      Tmp1 = DAG.getNode(ISD::BR_CC, dl, MVT::Other, Tmp1, Tmp2.getOperand(2),
                         Tmp2.getOperand(0), Tmp2.getOperand(1),
                         Node->getOperand(2));
    } else {
      // We test only the i1 bit. Skip the AND if UNDEF.
      Tmp3 = (Tmp2.getOpcode() == ISD::UNDEF) ? Tmp2 :
             DAG.getNode(ISD::AND, dl, Tmp2.getValueType(), Tmp2,
                         DAG.getConstant(1, dl, Tmp2.getValueType()));
      Tmp1 = DAG.getNode(ISD::BR_CC, dl, MVT::Other, Tmp1,
                         DAG.getCondCode(ISD::SETNE), Tmp3,
                         DAG.getConstant(0, dl, Tmp3.getValueType()),
                         Node->getOperand(2));
    }
    Results.push_back(Tmp1);
    break;
  case ISD::SETCC: {
    Tmp1 = Node->getOperand(0);
    Tmp2 = Node->getOperand(1);
    Tmp3 = Node->getOperand(2);
    bool Legalized = LegalizeSetCCCondCode(Node->getValueType(0), Tmp1, Tmp2,
                                           Tmp3, NeedInvert, dl);

    if (Legalized) {
      // If we expanded the SETCC by swapping LHS and RHS, or by inverting the
      // condition code, create a new SETCC node.
      if (Tmp3.getNode())
        Tmp1 = DAG.getNode(ISD::SETCC, dl, Node->getValueType(0),
                           Tmp1, Tmp2, Tmp3);

      // If we expanded the SETCC by inverting the condition code, then wrap
      // the existing SETCC in a NOT to restore the intended condition.
      if (NeedInvert)
        Tmp1 = DAG.getLogicalNOT(dl, Tmp1, Tmp1->getValueType(0));

      Results.push_back(Tmp1);
      break;
    }

    // Otherwise, SETCC for the given comparison type must be completely
    // illegal; expand it into a SELECT_CC.
    EVT VT = Node->getValueType(0);
    int TrueValue;
    switch (TLI.getBooleanContents(Tmp1->getValueType(0))) {
    case TargetLowering::ZeroOrOneBooleanContent:
    case TargetLowering::UndefinedBooleanContent:
      TrueValue = 1;
      break;
    case TargetLowering::ZeroOrNegativeOneBooleanContent:
      TrueValue = -1;
      break;
    }
    Tmp1 = DAG.getNode(ISD::SELECT_CC, dl, VT, Tmp1, Tmp2,
                       DAG.getConstant(TrueValue, dl, VT),
                       DAG.getConstant(0, dl, VT),
                       Tmp3);
    Results.push_back(Tmp1);
    break;
  }
  case ISD::SELECT_CC: {
    Tmp1 = Node->getOperand(0);   // LHS
    Tmp2 = Node->getOperand(1);   // RHS
    Tmp3 = Node->getOperand(2);   // True
    Tmp4 = Node->getOperand(3);   // False
    EVT VT = Node->getValueType(0);
    SDValue CC = Node->getOperand(4);
    ISD::CondCode CCOp = cast<CondCodeSDNode>(CC)->get();

    if (TLI.isCondCodeLegal(CCOp, Tmp1.getSimpleValueType())) {
      // If the condition code is legal, then we need to expand this
      // node using SETCC and SELECT.
      EVT CmpVT = Tmp1.getValueType();
      assert(!TLI.isOperationExpand(ISD::SELECT, VT) &&
             "Cannot expand ISD::SELECT_CC when ISD::SELECT also needs to be "
             "expanded.");
      EVT CCVT =
          TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), CmpVT);
      SDValue Cond = DAG.getNode(ISD::SETCC, dl, CCVT, Tmp1, Tmp2, CC);
      Results.push_back(DAG.getSelect(dl, VT, Cond, Tmp3, Tmp4));
      break;
    }

    // SELECT_CC is legal, so the condition code must not be.
    bool Legalized = false;
    // Try to legalize by inverting the condition. This is for targets that
    // might support an ordered version of a condition, but not the unordered
    // version (or vice versa).
    ISD::CondCode InvCC = ISD::getSetCCInverse(CCOp,
                                               Tmp1.getValueType().isInteger());
    if (TLI.isCondCodeLegal(InvCC, Tmp1.getSimpleValueType())) {
      // Use the new condition code and swap true and false.
      Legalized = true;
      Tmp1 = DAG.getSelectCC(dl, Tmp1, Tmp2, Tmp4, Tmp3, InvCC);
    } else {
      // If the inverse is not legal, then try to swap the arguments using
      // the inverse condition code.
      ISD::CondCode SwapInvCC = ISD::getSetCCSwappedOperands(InvCC);
      if (TLI.isCondCodeLegal(SwapInvCC, Tmp1.getSimpleValueType())) {
        // The swapped inverse condition is legal, so swap true and false,
        // lhs and rhs.
        Legalized = true;
        Tmp1 = DAG.getSelectCC(dl, Tmp2, Tmp1, Tmp4, Tmp3, SwapInvCC);
      }
    }

    if (!Legalized) {
      Legalized = LegalizeSetCCCondCode(
          getSetCCResultType(Tmp1.getValueType()), Tmp1, Tmp2, CC, NeedInvert,
          dl);

      assert(Legalized && "Can't legalize SELECT_CC with legal condition!");

      // If we expanded the SETCC by inverting the condition code, then swap
      // the True/False operands to match.
      if (NeedInvert)
        std::swap(Tmp3, Tmp4);

      // If we expanded the SETCC by swapping LHS and RHS, or by inverting the
      // condition code, create a new SELECT_CC node.
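// A runnable sketch of the two rewrites tried above when a SELECT_CC
// condition code is illegal, using plain integer comparisons. The enum and
// helpers are invented stand-ins for ISD::CondCode, getSetCCInverse and
// getSetCCSwappedOperands.
#include <cassert>

enum CC { LT, GE, GT, LE };

static CC invert(CC C) {        // cmp(a,b,C) == !cmp(a,b,invert(C))
  switch (C) {
  case LT: return GE;
  case GE: return LT;
  case GT: return LE;
  default: return GT;           // LE
  }
}
static CC swapOperands(CC C) {  // cmp(a,b,C) == cmp(b,a,swapOperands(C))
  switch (C) {
  case LT: return GT;
  case GT: return LT;
  case GE: return LE;
  default: return GE;           // LE
  }
}
static bool cmp(int A, int B, CC C) {
  switch (C) {
  case LT: return A < B;
  case GE: return A >= B;
  case GT: return A > B;
  default: return A <= B;       // LE
  }
}
static int selectCC(int A, int B, int T, int F, CC C) {
  return cmp(A, B, C) ? T : F;
}

int main() {
  for (int A = -2; A <= 2; ++A)
    for (int B = -2; B <= 2; ++B) {
      // Rewrite 1: inverted condition with true/false swapped.
      assert(selectCC(A, B, 1, 0, LT) == selectCC(A, B, 0, 1, invert(LT)));
      // Rewrite 2: swapped operands with the swapped inverse condition.
      assert(selectCC(A, B, 1, 0, LT) ==
             selectCC(B, A, 0, 1, swapOperands(invert(LT))));
    }
}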
if (CC.getNode()) { Tmp1 = DAG.getNode(ISD::SELECT_CC, dl, Node->getValueType(0), Tmp1, Tmp2, Tmp3, Tmp4, CC); } else { Tmp2 = DAG.getConstant(0, dl, Tmp1.getValueType()); CC = DAG.getCondCode(ISD::SETNE); Tmp1 = DAG.getNode(ISD::SELECT_CC, dl, Node->getValueType(0), Tmp1, Tmp2, Tmp3, Tmp4, CC); } } Results.push_back(Tmp1); break; } case ISD::BR_CC: { Tmp1 = Node->getOperand(0); // Chain Tmp2 = Node->getOperand(2); // LHS Tmp3 = Node->getOperand(3); // RHS Tmp4 = Node->getOperand(1); // CC bool Legalized = LegalizeSetCCCondCode(getSetCCResultType( Tmp2.getValueType()), Tmp2, Tmp3, Tmp4, NeedInvert, dl); (void)Legalized; assert(Legalized && "Can't legalize BR_CC with legal condition!"); // If we expanded the SETCC by inverting the condition code, then wrap // the existing SETCC in a NOT to restore the intended condition. if (NeedInvert) Tmp4 = DAG.getNOT(dl, Tmp4, Tmp4->getValueType(0)); // If we expanded the SETCC by swapping LHS and RHS, create a new BR_CC // node. if (Tmp4.getNode()) { Tmp1 = DAG.getNode(ISD::BR_CC, dl, Node->getValueType(0), Tmp1, Tmp4, Tmp2, Tmp3, Node->getOperand(4)); } else { Tmp3 = DAG.getConstant(0, dl, Tmp2.getValueType()); Tmp4 = DAG.getCondCode(ISD::SETNE); Tmp1 = DAG.getNode(ISD::BR_CC, dl, Node->getValueType(0), Tmp1, Tmp4, Tmp2, Tmp3, Node->getOperand(4)); } Results.push_back(Tmp1); break; } case ISD::BUILD_VECTOR: Results.push_back(ExpandBUILD_VECTOR(Node)); break; case ISD::SRA: case ISD::SRL: case ISD::SHL: { // Scalarize vector SRA/SRL/SHL. EVT VT = Node->getValueType(0); assert(VT.isVector() && "Unable to legalize non-vector shift"); assert(TLI.isTypeLegal(VT.getScalarType())&& "Element type must be legal"); unsigned NumElem = VT.getVectorNumElements(); SmallVector Scalars; for (unsigned Idx = 0; Idx < NumElem; Idx++) { SDValue Ex = DAG.getNode( ISD::EXTRACT_VECTOR_ELT, dl, VT.getScalarType(), Node->getOperand(0), DAG.getConstant(Idx, dl, TLI.getVectorIdxTy(DAG.getDataLayout()))); SDValue Sh = DAG.getNode( ISD::EXTRACT_VECTOR_ELT, dl, VT.getScalarType(), Node->getOperand(1), DAG.getConstant(Idx, dl, TLI.getVectorIdxTy(DAG.getDataLayout()))); Scalars.push_back(DAG.getNode(Node->getOpcode(), dl, VT.getScalarType(), Ex, Sh)); } SDValue Result = DAG.getNode(ISD::BUILD_VECTOR, dl, Node->getValueType(0), Scalars); ReplaceNode(SDValue(Node, 0), Result); break; } case ISD::GLOBAL_OFFSET_TABLE: case ISD::GlobalAddress: case ISD::GlobalTLSAddress: case ISD::ExternalSymbol: case ISD::ConstantPool: case ISD::JumpTable: case ISD::INTRINSIC_W_CHAIN: case ISD::INTRINSIC_WO_CHAIN: case ISD::INTRINSIC_VOID: // FIXME: Custom lowering for these operations shouldn't return null! break; } // Replace the original node with the legalized result. if (Results.empty()) return false; ReplaceNode(Node, Results.data()); return true; } void SelectionDAGLegalize::ConvertNodeToLibcall(SDNode *Node) { SmallVector Results; SDLoc dl(Node); SDValue Tmp1, Tmp2, Tmp3, Tmp4; unsigned Opc = Node->getOpcode(); switch (Opc) { case ISD::ATOMIC_FENCE: { // If the target didn't lower this, lower it to '__sync_synchronize()' call // FIXME: handle "fence singlethread" more efficiently. 
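// A sketch of the scalarization fallback used above for vector shifts when
// no legal vector SRA/SRL/SHL exists: each element and its shift amount are
// extracted, shifted as scalars, and the results rebuilt into a vector.
// Plain std::array stands in for the BUILD_VECTOR/EXTRACT_VECTOR_ELT nodes.
#include <array>
#include <cstddef>
#include <cstdint>

template <std::size_t N>
static std::array<uint32_t, N> shlVector(const std::array<uint32_t, N> &Val,
                                         const std::array<uint32_t, N> &Amt) {
  std::array<uint32_t, N> Out{};
  for (std::size_t I = 0; I != N; ++I)
    Out[I] = Val[I] << (Amt[I] & 31); // mask keeps the scalar shift defined
  return Out;
}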
TargetLowering::ArgListTy Args; TargetLowering::CallLoweringInfo CLI(DAG); CLI.setDebugLoc(dl) .setChain(Node->getOperand(0)) .setCallee(CallingConv::C, Type::getVoidTy(*DAG.getContext()), DAG.getExternalSymbol("__sync_synchronize", TLI.getPointerTy(DAG.getDataLayout())), std::move(Args), 0); std::pair CallResult = TLI.LowerCallTo(CLI); Results.push_back(CallResult.second); break; } // By default, atomic intrinsics are marked Legal and lowered. Targets // which don't support them directly, however, may want libcalls, in which // case they mark them Expand, and we get here. case ISD::ATOMIC_SWAP: case ISD::ATOMIC_LOAD_ADD: case ISD::ATOMIC_LOAD_SUB: case ISD::ATOMIC_LOAD_AND: case ISD::ATOMIC_LOAD_OR: case ISD::ATOMIC_LOAD_XOR: case ISD::ATOMIC_LOAD_NAND: case ISD::ATOMIC_LOAD_MIN: case ISD::ATOMIC_LOAD_MAX: case ISD::ATOMIC_LOAD_UMIN: case ISD::ATOMIC_LOAD_UMAX: case ISD::ATOMIC_CMP_SWAP: { MVT VT = cast(Node)->getMemoryVT().getSimpleVT(); RTLIB::Libcall LC = RTLIB::getATOMIC(Opc, VT); assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unexpected atomic op or value type!"); std::pair Tmp = ExpandChainLibCall(LC, Node, false); Results.push_back(Tmp.first); Results.push_back(Tmp.second); break; } case ISD::TRAP: { // If this operation is not supported, lower it to 'abort()' call TargetLowering::ArgListTy Args; TargetLowering::CallLoweringInfo CLI(DAG); CLI.setDebugLoc(dl) .setChain(Node->getOperand(0)) .setCallee(CallingConv::C, Type::getVoidTy(*DAG.getContext()), DAG.getExternalSymbol("abort", TLI.getPointerTy(DAG.getDataLayout())), std::move(Args), 0); std::pair CallResult = TLI.LowerCallTo(CLI); Results.push_back(CallResult.second); break; } case ISD::FMINNUM: Results.push_back(ExpandFPLibCall(Node, RTLIB::FMIN_F32, RTLIB::FMIN_F64, RTLIB::FMIN_F80, RTLIB::FMIN_F128, RTLIB::FMIN_PPCF128)); break; case ISD::FMAXNUM: Results.push_back(ExpandFPLibCall(Node, RTLIB::FMAX_F32, RTLIB::FMAX_F64, RTLIB::FMAX_F80, RTLIB::FMAX_F128, RTLIB::FMAX_PPCF128)); break; case ISD::FSQRT: Results.push_back(ExpandFPLibCall(Node, RTLIB::SQRT_F32, RTLIB::SQRT_F64, RTLIB::SQRT_F80, RTLIB::SQRT_F128, RTLIB::SQRT_PPCF128)); break; case ISD::FSIN: Results.push_back(ExpandFPLibCall(Node, RTLIB::SIN_F32, RTLIB::SIN_F64, RTLIB::SIN_F80, RTLIB::SIN_F128, RTLIB::SIN_PPCF128)); break; case ISD::FCOS: Results.push_back(ExpandFPLibCall(Node, RTLIB::COS_F32, RTLIB::COS_F64, RTLIB::COS_F80, RTLIB::COS_F128, RTLIB::COS_PPCF128)); break; case ISD::FSINCOS: // Expand into sincos libcall. 
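// A sketch of the width dispatch performed by the ExpandFPLibCall calls in
// this function: one runtime routine is chosen per floating-point type. The
// string names are the conventional libm spellings and are illustrative
// only; the real mapping lives in the RTLIB enumerators shown in the table.
static const char *sinLibcallForBits(unsigned Bits) {
  switch (Bits) {
  case 32: return "sinf";   // RTLIB::SIN_F32
  case 64: return "sin";    // RTLIB::SIN_F64
  case 80: return "sinl";   // RTLIB::SIN_F80
  default: return nullptr;  // F128/PPCF128 spellings are target-dependent
  }
}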
ExpandSinCosLibCall(Node, Results); break; case ISD::FLOG: Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_F32, RTLIB::LOG_F64, RTLIB::LOG_F80, RTLIB::LOG_F128, RTLIB::LOG_PPCF128)); break; case ISD::FLOG2: Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG2_F32, RTLIB::LOG2_F64, RTLIB::LOG2_F80, RTLIB::LOG2_F128, RTLIB::LOG2_PPCF128)); break; case ISD::FLOG10: Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG10_F32, RTLIB::LOG10_F64, RTLIB::LOG10_F80, RTLIB::LOG10_F128, RTLIB::LOG10_PPCF128)); break; case ISD::FEXP: Results.push_back(ExpandFPLibCall(Node, RTLIB::EXP_F32, RTLIB::EXP_F64, RTLIB::EXP_F80, RTLIB::EXP_F128, RTLIB::EXP_PPCF128)); break; case ISD::FEXP2: Results.push_back(ExpandFPLibCall(Node, RTLIB::EXP2_F32, RTLIB::EXP2_F64, RTLIB::EXP2_F80, RTLIB::EXP2_F128, RTLIB::EXP2_PPCF128)); break; case ISD::FTRUNC: Results.push_back(ExpandFPLibCall(Node, RTLIB::TRUNC_F32, RTLIB::TRUNC_F64, RTLIB::TRUNC_F80, RTLIB::TRUNC_F128, RTLIB::TRUNC_PPCF128)); break; case ISD::FFLOOR: Results.push_back(ExpandFPLibCall(Node, RTLIB::FLOOR_F32, RTLIB::FLOOR_F64, RTLIB::FLOOR_F80, RTLIB::FLOOR_F128, RTLIB::FLOOR_PPCF128)); break; case ISD::FCEIL: Results.push_back(ExpandFPLibCall(Node, RTLIB::CEIL_F32, RTLIB::CEIL_F64, RTLIB::CEIL_F80, RTLIB::CEIL_F128, RTLIB::CEIL_PPCF128)); break; case ISD::FRINT: Results.push_back(ExpandFPLibCall(Node, RTLIB::RINT_F32, RTLIB::RINT_F64, RTLIB::RINT_F80, RTLIB::RINT_F128, RTLIB::RINT_PPCF128)); break; case ISD::FNEARBYINT: Results.push_back(ExpandFPLibCall(Node, RTLIB::NEARBYINT_F32, RTLIB::NEARBYINT_F64, RTLIB::NEARBYINT_F80, RTLIB::NEARBYINT_F128, RTLIB::NEARBYINT_PPCF128)); break; case ISD::FROUND: Results.push_back(ExpandFPLibCall(Node, RTLIB::ROUND_F32, RTLIB::ROUND_F64, RTLIB::ROUND_F80, RTLIB::ROUND_F128, RTLIB::ROUND_PPCF128)); break; case ISD::FPOWI: Results.push_back(ExpandFPLibCall(Node, RTLIB::POWI_F32, RTLIB::POWI_F64, RTLIB::POWI_F80, RTLIB::POWI_F128, RTLIB::POWI_PPCF128)); break; case ISD::FPOW: Results.push_back(ExpandFPLibCall(Node, RTLIB::POW_F32, RTLIB::POW_F64, RTLIB::POW_F80, RTLIB::POW_F128, RTLIB::POW_PPCF128)); break; case ISD::FDIV: Results.push_back(ExpandFPLibCall(Node, RTLIB::DIV_F32, RTLIB::DIV_F64, RTLIB::DIV_F80, RTLIB::DIV_F128, RTLIB::DIV_PPCF128)); break; case ISD::FREM: Results.push_back(ExpandFPLibCall(Node, RTLIB::REM_F32, RTLIB::REM_F64, RTLIB::REM_F80, RTLIB::REM_F128, RTLIB::REM_PPCF128)); break; case ISD::FMA: Results.push_back(ExpandFPLibCall(Node, RTLIB::FMA_F32, RTLIB::FMA_F64, RTLIB::FMA_F80, RTLIB::FMA_F128, RTLIB::FMA_PPCF128)); break; case ISD::FADD: Results.push_back(ExpandFPLibCall(Node, RTLIB::ADD_F32, RTLIB::ADD_F64, RTLIB::ADD_F80, RTLIB::ADD_F128, RTLIB::ADD_PPCF128)); break; case ISD::FMUL: Results.push_back(ExpandFPLibCall(Node, RTLIB::MUL_F32, RTLIB::MUL_F64, RTLIB::MUL_F80, RTLIB::MUL_F128, RTLIB::MUL_PPCF128)); break; case ISD::FP16_TO_FP: if (Node->getValueType(0) == MVT::f32) { Results.push_back(ExpandLibCall(RTLIB::FPEXT_F16_F32, Node, false)); } break; case ISD::FP_TO_FP16: { RTLIB::Libcall LC = RTLIB::getFPROUND(Node->getOperand(0).getValueType(), MVT::f16); assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unable to expand fp_to_fp16"); Results.push_back(ExpandLibCall(LC, Node, false)); break; } case ISD::FSUB: Results.push_back(ExpandFPLibCall(Node, RTLIB::SUB_F32, RTLIB::SUB_F64, RTLIB::SUB_F80, RTLIB::SUB_F128, RTLIB::SUB_PPCF128)); break; case ISD::SREM: Results.push_back(ExpandIntLibCall(Node, true, RTLIB::SREM_I8, RTLIB::SREM_I16, RTLIB::SREM_I32, RTLIB::SREM_I64, RTLIB::SREM_I128)); break; case 
ISD::UREM: Results.push_back(ExpandIntLibCall(Node, false, RTLIB::UREM_I8, RTLIB::UREM_I16, RTLIB::UREM_I32, RTLIB::UREM_I64, RTLIB::UREM_I128)); break; case ISD::SDIV: Results.push_back(ExpandIntLibCall(Node, true, RTLIB::SDIV_I8, RTLIB::SDIV_I16, RTLIB::SDIV_I32, RTLIB::SDIV_I64, RTLIB::SDIV_I128)); break; case ISD::UDIV: Results.push_back(ExpandIntLibCall(Node, false, RTLIB::UDIV_I8, RTLIB::UDIV_I16, RTLIB::UDIV_I32, RTLIB::UDIV_I64, RTLIB::UDIV_I128)); break; case ISD::SDIVREM: case ISD::UDIVREM: // Expand into divrem libcall ExpandDivRemLibCall(Node, Results); break; case ISD::MUL: Results.push_back(ExpandIntLibCall(Node, false, RTLIB::MUL_I8, RTLIB::MUL_I16, RTLIB::MUL_I32, RTLIB::MUL_I64, RTLIB::MUL_I128)); break; } // Replace the original node with the legalized result. if (!Results.empty()) ReplaceNode(Node, Results.data()); } // Determine the vector type to use in place of an original scalar element when // promoting equally sized vectors. static MVT getPromotedVectorElementType(const TargetLowering &TLI, MVT EltVT, MVT NewEltVT) { unsigned OldEltsPerNewElt = EltVT.getSizeInBits() / NewEltVT.getSizeInBits(); MVT MidVT = MVT::getVectorVT(NewEltVT, OldEltsPerNewElt); assert(TLI.isTypeLegal(MidVT) && "unexpected"); return MidVT; } void SelectionDAGLegalize::PromoteNode(SDNode *Node) { SmallVector Results; MVT OVT = Node->getSimpleValueType(0); if (Node->getOpcode() == ISD::UINT_TO_FP || Node->getOpcode() == ISD::SINT_TO_FP || Node->getOpcode() == ISD::SETCC || Node->getOpcode() == ISD::EXTRACT_VECTOR_ELT || Node->getOpcode() == ISD::INSERT_VECTOR_ELT) { OVT = Node->getOperand(0).getSimpleValueType(); } if (Node->getOpcode() == ISD::BR_CC) OVT = Node->getOperand(2).getSimpleValueType(); MVT NVT = TLI.getTypeToPromoteTo(Node->getOpcode(), OVT); SDLoc dl(Node); SDValue Tmp1, Tmp2, Tmp3; switch (Node->getOpcode()) { case ISD::CTTZ: case ISD::CTTZ_ZERO_UNDEF: case ISD::CTLZ: case ISD::CTLZ_ZERO_UNDEF: case ISD::CTPOP: // Zero extend the argument. Tmp1 = DAG.getNode(ISD::ZERO_EXTEND, dl, NVT, Node->getOperand(0)); // Perform the larger operation. For CTPOP and CTTZ_ZERO_UNDEF, this is // already the correct result. Tmp1 = DAG.getNode(Node->getOpcode(), dl, NVT, Tmp1); if (Node->getOpcode() == ISD::CTTZ) { // FIXME: This should set a bit in the zero extended value instead. 
Tmp2 = DAG.getSetCC(dl, getSetCCResultType(NVT), Tmp1, DAG.getConstant(NVT.getSizeInBits(), dl, NVT), ISD::SETEQ); Tmp1 = DAG.getSelect(dl, NVT, Tmp2, DAG.getConstant(OVT.getSizeInBits(), dl, NVT), Tmp1); } else if (Node->getOpcode() == ISD::CTLZ || Node->getOpcode() == ISD::CTLZ_ZERO_UNDEF) { // Tmp1 = Tmp1 - (sizeinbits(NVT) - sizeinbits(Old VT)) Tmp1 = DAG.getNode(ISD::SUB, dl, NVT, Tmp1, DAG.getConstant(NVT.getSizeInBits() - OVT.getSizeInBits(), dl, NVT)); } Results.push_back(DAG.getNode(ISD::TRUNCATE, dl, OVT, Tmp1)); break; case ISD::BSWAP: { unsigned DiffBits = NVT.getSizeInBits() - OVT.getSizeInBits(); Tmp1 = DAG.getNode(ISD::ZERO_EXTEND, dl, NVT, Node->getOperand(0)); Tmp1 = DAG.getNode(ISD::BSWAP, dl, NVT, Tmp1); Tmp1 = DAG.getNode( ISD::SRL, dl, NVT, Tmp1, DAG.getConstant(DiffBits, dl, TLI.getShiftAmountTy(NVT, DAG.getDataLayout()))); Results.push_back(Tmp1); break; } case ISD::FP_TO_UINT: case ISD::FP_TO_SINT: Tmp1 = PromoteLegalFP_TO_INT(Node->getOperand(0), Node->getValueType(0), Node->getOpcode() == ISD::FP_TO_SINT, dl); Results.push_back(Tmp1); break; case ISD::UINT_TO_FP: case ISD::SINT_TO_FP: Tmp1 = PromoteLegalINT_TO_FP(Node->getOperand(0), Node->getValueType(0), Node->getOpcode() == ISD::SINT_TO_FP, dl); Results.push_back(Tmp1); break; case ISD::VAARG: { SDValue Chain = Node->getOperand(0); // Get the chain. SDValue Ptr = Node->getOperand(1); // Get the pointer. unsigned TruncOp; if (OVT.isVector()) { TruncOp = ISD::BITCAST; } else { assert(OVT.isInteger() && "VAARG promotion is supported only for vectors or integer types"); TruncOp = ISD::TRUNCATE; } // Perform the larger operation, then convert back Tmp1 = DAG.getVAArg(NVT, dl, Chain, Ptr, Node->getOperand(2), Node->getConstantOperandVal(3)); Chain = Tmp1.getValue(1); Tmp2 = DAG.getNode(TruncOp, dl, OVT, Tmp1); // Modified the chain result - switch anything that used the old chain to // use the new one. DAG.ReplaceAllUsesOfValueWith(SDValue(Node, 0), Tmp2); DAG.ReplaceAllUsesOfValueWith(SDValue(Node, 1), Chain); if (UpdatedNodes) { UpdatedNodes->insert(Tmp2.getNode()); UpdatedNodes->insert(Chain.getNode()); } ReplacedNode(Node); break; } case ISD::AND: case ISD::OR: case ISD::XOR: { unsigned ExtOp, TruncOp; if (OVT.isVector()) { ExtOp = ISD::BITCAST; TruncOp = ISD::BITCAST; } else { assert(OVT.isInteger() && "Cannot promote logic operation"); ExtOp = ISD::ANY_EXTEND; TruncOp = ISD::TRUNCATE; } // Promote each of the values to the new type. Tmp1 = DAG.getNode(ExtOp, dl, NVT, Node->getOperand(0)); Tmp2 = DAG.getNode(ExtOp, dl, NVT, Node->getOperand(1)); // Perform the larger operation, then convert back Tmp1 = DAG.getNode(Node->getOpcode(), dl, NVT, Tmp1, Tmp2); Results.push_back(DAG.getNode(TruncOp, dl, OVT, Tmp1)); break; } case ISD::SELECT: { unsigned ExtOp, TruncOp; if (Node->getValueType(0).isVector() || Node->getValueType(0).getSizeInBits() == NVT.getSizeInBits()) { ExtOp = ISD::BITCAST; TruncOp = ISD::BITCAST; } else if (Node->getValueType(0).isInteger()) { ExtOp = ISD::ANY_EXTEND; TruncOp = ISD::TRUNCATE; } else { ExtOp = ISD::FP_EXTEND; TruncOp = ISD::FP_ROUND; } Tmp1 = Node->getOperand(0); // Promote each of the values to the new type. Tmp2 = DAG.getNode(ExtOp, dl, NVT, Node->getOperand(1)); Tmp3 = DAG.getNode(ExtOp, dl, NVT, Node->getOperand(2)); // Perform the larger operation, then round down. 
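// A sketch of the bit-width corrections above on concrete types: an i8
// CTLZ/CTTZ or an i16 BSWAP is performed in 32 bits and then fixed up.
// clz32 is hand-rolled to keep the sketch self-contained.
#include <cstdint>

static unsigned clz32(uint32_t V) {
  unsigned N = 0;
  for (uint32_t Bit = 1u << 31; Bit != 0 && (V & Bit) == 0; Bit >>= 1)
    ++N;
  return N; // 32 when V == 0
}

static unsigned ctlz8(uint8_t V) {
  // Zero-extend, count in 32 bits, then subtract the 24 extra high zeros:
  // Tmp1 = Tmp1 - (sizeinbits(NVT) - sizeinbits(OVT)).
  return clz32(V) - (32 - 8);
}

static unsigned cttz8(uint8_t V) {
  // The promoted count is already right for nonzero inputs; for zero, the
  // 32-bit count (32) must be replaced by the original width (8), which is
  // what the SETCC/SELECT pair above does.
  if (V == 0)
    return 8;
  unsigned N = 0;
  while ((V & 1) == 0) { V >>= 1; ++N; }
  return N;
}

static uint16_t bswap16(uint16_t V) {
  // Promote, byte-swap in 32 bits, then shift right by DiffBits = 32 - 16.
  uint32_t Z = V;
  uint32_t S = ((Z & 0x000000FFu) << 24) | ((Z & 0x0000FF00u) << 8) |
               ((Z & 0x00FF0000u) >> 8) | ((Z & 0xFF000000u) >> 24);
  return (uint16_t)(S >> 16);
}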
Tmp1 = DAG.getSelect(dl, NVT, Tmp1, Tmp2, Tmp3); if (TruncOp != ISD::FP_ROUND) Tmp1 = DAG.getNode(TruncOp, dl, Node->getValueType(0), Tmp1); else Tmp1 = DAG.getNode(TruncOp, dl, Node->getValueType(0), Tmp1, DAG.getIntPtrConstant(0, dl)); Results.push_back(Tmp1); break; } case ISD::VECTOR_SHUFFLE: { ArrayRef Mask = cast(Node)->getMask(); // Cast the two input vectors. Tmp1 = DAG.getNode(ISD::BITCAST, dl, NVT, Node->getOperand(0)); Tmp2 = DAG.getNode(ISD::BITCAST, dl, NVT, Node->getOperand(1)); // Convert the shuffle mask to the right # elements. Tmp1 = ShuffleWithNarrowerEltType(NVT, OVT, dl, Tmp1, Tmp2, Mask); Tmp1 = DAG.getNode(ISD::BITCAST, dl, OVT, Tmp1); Results.push_back(Tmp1); break; } case ISD::SETCC: { unsigned ExtOp = ISD::FP_EXTEND; if (NVT.isInteger()) { ISD::CondCode CCCode = cast(Node->getOperand(2))->get(); ExtOp = isSignedIntSetCC(CCCode) ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND; } Tmp1 = DAG.getNode(ExtOp, dl, NVT, Node->getOperand(0)); Tmp2 = DAG.getNode(ExtOp, dl, NVT, Node->getOperand(1)); Results.push_back(DAG.getNode(ISD::SETCC, dl, Node->getValueType(0), Tmp1, Tmp2, Node->getOperand(2))); break; } case ISD::BR_CC: { unsigned ExtOp = ISD::FP_EXTEND; if (NVT.isInteger()) { ISD::CondCode CCCode = cast(Node->getOperand(1))->get(); ExtOp = isSignedIntSetCC(CCCode) ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND; } Tmp1 = DAG.getNode(ExtOp, dl, NVT, Node->getOperand(2)); Tmp2 = DAG.getNode(ExtOp, dl, NVT, Node->getOperand(3)); Results.push_back(DAG.getNode(ISD::BR_CC, dl, Node->getValueType(0), Node->getOperand(0), Node->getOperand(1), Tmp1, Tmp2, Node->getOperand(4))); break; } case ISD::FADD: case ISD::FSUB: case ISD::FMUL: case ISD::FDIV: case ISD::FREM: case ISD::FMINNUM: case ISD::FMAXNUM: case ISD::FPOW: { Tmp1 = DAG.getNode(ISD::FP_EXTEND, dl, NVT, Node->getOperand(0)); Tmp2 = DAG.getNode(ISD::FP_EXTEND, dl, NVT, Node->getOperand(1)); Tmp3 = DAG.getNode(Node->getOpcode(), dl, NVT, Tmp1, Tmp2, Node->getFlags()); Results.push_back(DAG.getNode(ISD::FP_ROUND, dl, OVT, Tmp3, DAG.getIntPtrConstant(0, dl))); break; } case ISD::FMA: { Tmp1 = DAG.getNode(ISD::FP_EXTEND, dl, NVT, Node->getOperand(0)); Tmp2 = DAG.getNode(ISD::FP_EXTEND, dl, NVT, Node->getOperand(1)); Tmp3 = DAG.getNode(ISD::FP_EXTEND, dl, NVT, Node->getOperand(2)); Results.push_back( DAG.getNode(ISD::FP_ROUND, dl, OVT, DAG.getNode(Node->getOpcode(), dl, NVT, Tmp1, Tmp2, Tmp3), DAG.getIntPtrConstant(0, dl))); break; } case ISD::FCOPYSIGN: case ISD::FPOWI: { Tmp1 = DAG.getNode(ISD::FP_EXTEND, dl, NVT, Node->getOperand(0)); Tmp2 = Node->getOperand(1); Tmp3 = DAG.getNode(Node->getOpcode(), dl, NVT, Tmp1, Tmp2); // fcopysign doesn't change anything but the sign bit, so // (fp_round (fcopysign (fpext a), b)) // is as precise as // (fp_round (fpext a)) // which is a no-op. Mark it as a TRUNCating FP_ROUND. 
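// A sketch of the float-op promotion above: when an f32 operation is not
// legal, the operands are FP_EXTENDed to f64, the operation is performed
// there, and the result is FP_ROUNDed back. For fcopysign the round-back
// cannot change the value (only the sign bit was touched), which is why the
// code marks that FP_ROUND as truncating/exact.
#include <cmath>

static float promotedFdiv(float A, float B) {
  return (float)((double)A / (double)B); // fpext, fdiv, fp_round
}

static float promotedCopysign(float A, float B) {
  // (fp_round (fcopysign (fpext A), B)) is exact: same bits as copysignf.
  return (float)std::copysign((double)A, (double)B);
}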
const bool isTrunc = (Node->getOpcode() == ISD::FCOPYSIGN); Results.push_back(DAG.getNode(ISD::FP_ROUND, dl, OVT, Tmp3, DAG.getIntPtrConstant(isTrunc, dl))); break; } case ISD::FFLOOR: case ISD::FCEIL: case ISD::FRINT: case ISD::FNEARBYINT: case ISD::FROUND: case ISD::FTRUNC: case ISD::FNEG: case ISD::FSQRT: case ISD::FSIN: case ISD::FCOS: case ISD::FLOG: case ISD::FLOG2: case ISD::FLOG10: case ISD::FABS: case ISD::FEXP: case ISD::FEXP2: { Tmp1 = DAG.getNode(ISD::FP_EXTEND, dl, NVT, Node->getOperand(0)); Tmp2 = DAG.getNode(Node->getOpcode(), dl, NVT, Tmp1); Results.push_back(DAG.getNode(ISD::FP_ROUND, dl, OVT, Tmp2, DAG.getIntPtrConstant(0, dl))); break; } case ISD::BUILD_VECTOR: { MVT EltVT = OVT.getVectorElementType(); MVT NewEltVT = NVT.getVectorElementType(); // Handle bitcasts to a different vector type with the same total bit size // // e.g. v2i64 = build_vector i64:x, i64:y => v4i32 // => // v4i32 = concat_vectors (v2i32 (bitcast i64:x)), (v2i32 (bitcast i64:y)) assert(NVT.isVector() && OVT.getSizeInBits() == NVT.getSizeInBits() && "Invalid promote type for build_vector"); assert(NewEltVT.bitsLT(EltVT) && "not handled"); MVT MidVT = getPromotedVectorElementType(TLI, EltVT, NewEltVT); SmallVector NewOps; for (unsigned I = 0, E = Node->getNumOperands(); I != E; ++I) { SDValue Op = Node->getOperand(I); NewOps.push_back(DAG.getNode(ISD::BITCAST, SDLoc(Op), MidVT, Op)); } SDLoc SL(Node); SDValue Concat = DAG.getNode(ISD::CONCAT_VECTORS, SL, NVT, NewOps); SDValue CvtVec = DAG.getNode(ISD::BITCAST, SL, OVT, Concat); Results.push_back(CvtVec); break; } case ISD::EXTRACT_VECTOR_ELT: { MVT EltVT = OVT.getVectorElementType(); MVT NewEltVT = NVT.getVectorElementType(); // Handle bitcasts to a different vector type with the same total bit size. // // e.g. v2i64 = extract_vector_elt x:v2i64, y:i32 // => // v4i32:castx = bitcast x:v2i64 // // i64 = bitcast // (v2i32 build_vector (i32 (extract_vector_elt castx, (2 * y))), // (i32 (extract_vector_elt castx, (2 * y + 1))) // assert(NVT.isVector() && OVT.getSizeInBits() == NVT.getSizeInBits() && "Invalid promote type for extract_vector_elt"); assert(NewEltVT.bitsLT(EltVT) && "not handled"); MVT MidVT = getPromotedVectorElementType(TLI, EltVT, NewEltVT); unsigned NewEltsPerOldElt = MidVT.getVectorNumElements(); SDValue Idx = Node->getOperand(1); EVT IdxVT = Idx.getValueType(); SDLoc SL(Node); SDValue Factor = DAG.getConstant(NewEltsPerOldElt, SL, IdxVT); SDValue NewBaseIdx = DAG.getNode(ISD::MUL, SL, IdxVT, Idx, Factor); SDValue CastVec = DAG.getNode(ISD::BITCAST, SL, NVT, Node->getOperand(0)); SmallVector NewOps; for (unsigned I = 0; I < NewEltsPerOldElt; ++I) { SDValue IdxOffset = DAG.getConstant(I, SL, IdxVT); SDValue TmpIdx = DAG.getNode(ISD::ADD, SL, IdxVT, NewBaseIdx, IdxOffset); SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, NewEltVT, CastVec, TmpIdx); NewOps.push_back(Elt); } SDValue NewVec = DAG.getNode(ISD::BUILD_VECTOR, SL, MidVT, NewOps); Results.push_back(DAG.getNode(ISD::BITCAST, SL, EltVT, NewVec)); break; } case ISD::INSERT_VECTOR_ELT: { MVT EltVT = OVT.getVectorElementType(); MVT NewEltVT = NVT.getVectorElementType(); // Handle bitcasts to a different vector type with the same total bit size // // e.g. 
    // v2i64 = insert_vector_elt x:v2i64, y:i64, z:i32
    // =>
    // v4i32:castx = bitcast x:v2i64
    // v2i32:casty = bitcast y:i64
    //
    // v2i64 = bitcast
    //   (v4i32 insert_vector_elt
    //       (v4i32 insert_vector_elt v4i32:castx,
    //           (extract_vector_elt casty, 0), 2 * z),
    //       (extract_vector_elt casty, 1), (2 * z + 1))

    assert(NVT.isVector() && OVT.getSizeInBits() == NVT.getSizeInBits() &&
           "Invalid promote type for insert_vector_elt");
    assert(NewEltVT.bitsLT(EltVT) && "not handled");

    MVT MidVT = getPromotedVectorElementType(TLI, EltVT, NewEltVT);
    unsigned NewEltsPerOldElt = MidVT.getVectorNumElements();

    SDValue Val = Node->getOperand(1);
    SDValue Idx = Node->getOperand(2);
    EVT IdxVT = Idx.getValueType();
    SDLoc SL(Node);

    SDValue Factor = DAG.getConstant(NewEltsPerOldElt, SDLoc(), IdxVT);
    SDValue NewBaseIdx = DAG.getNode(ISD::MUL, SL, IdxVT, Idx, Factor);

    SDValue CastVec = DAG.getNode(ISD::BITCAST, SL, NVT, Node->getOperand(0));
    SDValue CastVal = DAG.getNode(ISD::BITCAST, SL, MidVT, Val);

    SDValue NewVec = CastVec;
    for (unsigned I = 0; I < NewEltsPerOldElt; ++I) {
      SDValue IdxOffset = DAG.getConstant(I, SL, IdxVT);
      SDValue InEltIdx = DAG.getNode(ISD::ADD, SL, IdxVT, NewBaseIdx, IdxOffset);

      SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, NewEltVT,
                                CastVal, IdxOffset);
      NewVec = DAG.getNode(ISD::INSERT_VECTOR_ELT, SL, NVT,
                           NewVec, Elt, InEltIdx);
    }

    Results.push_back(DAG.getNode(ISD::BITCAST, SL, OVT, NewVec));
    break;
  }
  case ISD::SCALAR_TO_VECTOR: {
    MVT EltVT = OVT.getVectorElementType();
    MVT NewEltVT = NVT.getVectorElementType();

    // Handle bitcasts to a different vector type with the same total bit
    // size.
    //
    // e.g. v2i64 = scalar_to_vector x:i64
    //   =>
    //  concat_vectors (v2i32 bitcast x:i64), (v2i32 undef)
    //
    MVT MidVT = getPromotedVectorElementType(TLI, EltVT, NewEltVT);
    SDValue Val = Node->getOperand(0);
    SDLoc SL(Node);

    SDValue CastVal = DAG.getNode(ISD::BITCAST, SL, MidVT, Val);
    SDValue Undef = DAG.getUNDEF(MidVT);

    SmallVector<SDValue, 8> NewElts;
    NewElts.push_back(CastVal);
    for (unsigned I = 1, NElts = OVT.getVectorNumElements(); I != NElts; ++I)
      NewElts.push_back(Undef);

    SDValue Concat = DAG.getNode(ISD::CONCAT_VECTORS, SL, NVT, NewElts);
    SDValue CvtVec = DAG.getNode(ISD::BITCAST, SL, OVT, Concat);
    Results.push_back(CvtVec);
    break;
  }
  }

  // Replace the original node with the legalized result.
  if (!Results.empty())
    ReplaceNode(Node, Results.data());
}

/// This is the entry point for the file.
void SelectionDAG::Legalize() {
  AssignTopologicalOrder();

  SmallPtrSet<SDNode *, 16> LegalizedNodes;
  SelectionDAGLegalize Legalizer(*this, LegalizedNodes);

  // Visit all the nodes. We start in topological order, so that we see
  // nodes with their original operands intact. Legalization can produce
  // new nodes which may themselves need to be legalized. Iterate until all
  // nodes have been legalized.
  for (;;) {
    bool AnyLegalized = false;
    for (auto NI = allnodes_end(); NI != allnodes_begin();) {
      --NI;

      SDNode *N = &*NI;
      if (N->use_empty() && N != getRoot().getNode()) {
        ++NI;
        DeleteNode(N);
        continue;
      }

      if (LegalizedNodes.insert(N).second) {
        AnyLegalized = true;
        Legalizer.LegalizeOp(N);

        if (N->use_empty() && N != getRoot().getNode()) {
          ++NI;
          DeleteNode(N);
        }
      }
    }
    if (!AnyLegalized)
      break;
  }

  // Remove dead nodes now.
  RemoveDeadNodes();
}

bool SelectionDAG::LegalizeOp(SDNode *N,
                              SmallSetVector<SDNode *, 16> &UpdatedNodes) {
  SmallPtrSet<SDNode *, 16> LegalizedNodes;
  SelectionDAGLegalize Legalizer(*this, LegalizedNodes, &UpdatedNodes);

  // Directly insert the node in question, and legalize it. This will recurse
  // as needed through operands.
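// A sketch of the sub-element index arithmetic above, with v2i64 viewed as
// v4i32: old element Y occupies new elements Y*Factor .. Y*Factor+Factor-1,
// where Factor = 64/32 = 2. memcpy stands in for the BITCASTs, and
// little-endian sub-element order is assumed for the layout.
#include <cstdint>
#include <cstring>

static uint64_t extractElt64(const uint32_t Words[4], unsigned Y) {
  uint32_t Pair[2] = {Words[2 * Y], Words[2 * Y + 1]}; // NewBaseIdx = 2*Y
  uint64_t Out;
  std::memcpy(&Out, Pair, sizeof(Out)); // bitcast v2i32 -> i64
  return Out;
}

static void insertElt64(uint32_t Words[4], uint64_t Val, unsigned Z) {
  uint32_t Pair[2];
  std::memcpy(Pair, &Val, sizeof(Val)); // bitcast i64 -> v2i32
  Words[2 * Z] = Pair[0];               // new indices 2*Z and 2*Z + 1
  Words[2 * Z + 1] = Pair[1];
}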
LegalizedNodes.insert(N); Legalizer.LegalizeOp(N); return LegalizedNodes.count(N); } Index: vendor/llvm/dist/lib/CodeGen/WinEHPrepare.cpp =================================================================== --- vendor/llvm/dist/lib/CodeGen/WinEHPrepare.cpp (revision 295845) +++ vendor/llvm/dist/lib/CodeGen/WinEHPrepare.cpp (revision 295846) @@ -1,1222 +1,1230 @@ //===-- WinEHPrepare - Prepare exception handling for code generation ---===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This pass lowers LLVM IR exception handling into something closer to what the // backend wants for functions using a personality function from a runtime // provided by MSVC. Functions with other personality functions are left alone // and may be prepared by other passes. In particular, all supported MSVC // personality functions require cleanup code to be outlined, and the C++ // personality requires catch handler code to be outlined. // //===----------------------------------------------------------------------===// #include "llvm/CodeGen/Passes.h" #include "llvm/ADT/DenseMap.h" #include "llvm/ADT/MapVector.h" #include "llvm/ADT/STLExtras.h" #include "llvm/Analysis/CFG.h" #include "llvm/Analysis/EHPersonalities.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/WinEHFuncInfo.h" #include "llvm/IR/Verifier.h" #include "llvm/MC/MCSymbol.h" #include "llvm/Pass.h" #include "llvm/Support/Debug.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Transforms/Utils/BasicBlockUtils.h" #include "llvm/Transforms/Utils/Cloning.h" #include "llvm/Transforms/Utils/Local.h" #include "llvm/Transforms/Utils/SSAUpdater.h" using namespace llvm; #define DEBUG_TYPE "winehprepare" static cl::opt DisableDemotion( "disable-demotion", cl::Hidden, cl::desc( "Clone multicolor basic blocks but do not demote cross funclet values"), cl::init(false)); static cl::opt DisableCleanups( "disable-cleanups", cl::Hidden, cl::desc("Do not remove implausible terminators or other similar cleanups"), cl::init(false)); namespace { class WinEHPrepare : public FunctionPass { public: static char ID; // Pass identification, replacement for typeid. WinEHPrepare(const TargetMachine *TM = nullptr) : FunctionPass(ID) {} bool runOnFunction(Function &Fn) override; bool doFinalization(Module &M) override; void getAnalysisUsage(AnalysisUsage &AU) const override; const char *getPassName() const override { return "Windows exception handling preparation"; } private: void insertPHIStores(PHINode *OriginalPHI, AllocaInst *SpillSlot); void insertPHIStore(BasicBlock *PredBlock, Value *PredVal, AllocaInst *SpillSlot, SmallVectorImpl> &Worklist); AllocaInst *insertPHILoads(PHINode *PN, Function &F); void replaceUseWithLoad(Value *V, Use &U, AllocaInst *&SpillSlot, DenseMap &Loads, Function &F); bool prepareExplicitEH(Function &F); void colorFunclets(Function &F); void demotePHIsOnFunclets(Function &F); void cloneCommonBlocks(Function &F); void removeImplausibleInstructions(Function &F); void cleanupPreparedFunclets(Function &F); void verifyPreparedFunclets(Function &F); // All fields are reset by runOnFunction. 
EHPersonality Personality = EHPersonality::Unknown; DenseMap BlockColors; MapVector> FuncletBlocks; }; } // end anonymous namespace char WinEHPrepare::ID = 0; INITIALIZE_TM_PASS(WinEHPrepare, "winehprepare", "Prepare Windows exceptions", false, false) FunctionPass *llvm::createWinEHPass(const TargetMachine *TM) { return new WinEHPrepare(TM); } bool WinEHPrepare::runOnFunction(Function &Fn) { if (!Fn.hasPersonalityFn()) return false; // Classify the personality to see what kind of preparation we need. Personality = classifyEHPersonality(Fn.getPersonalityFn()); // Do nothing if this is not a funclet-based personality. if (!isFuncletEHPersonality(Personality)) return false; return prepareExplicitEH(Fn); } bool WinEHPrepare::doFinalization(Module &M) { return false; } void WinEHPrepare::getAnalysisUsage(AnalysisUsage &AU) const {} static int addUnwindMapEntry(WinEHFuncInfo &FuncInfo, int ToState, const BasicBlock *BB) { CxxUnwindMapEntry UME; UME.ToState = ToState; UME.Cleanup = BB; FuncInfo.CxxUnwindMap.push_back(UME); return FuncInfo.getLastStateNumber(); } static void addTryBlockMapEntry(WinEHFuncInfo &FuncInfo, int TryLow, int TryHigh, int CatchHigh, ArrayRef Handlers) { WinEHTryBlockMapEntry TBME; TBME.TryLow = TryLow; TBME.TryHigh = TryHigh; TBME.CatchHigh = CatchHigh; assert(TBME.TryLow <= TBME.TryHigh); for (const CatchPadInst *CPI : Handlers) { WinEHHandlerType HT; Constant *TypeInfo = cast(CPI->getArgOperand(0)); if (TypeInfo->isNullValue()) HT.TypeDescriptor = nullptr; else HT.TypeDescriptor = cast(TypeInfo->stripPointerCasts()); HT.Adjectives = cast(CPI->getArgOperand(1))->getZExtValue(); HT.Handler = CPI->getParent(); if (auto *AI = dyn_cast(CPI->getArgOperand(2)->stripPointerCasts())) HT.CatchObj.Alloca = AI; else HT.CatchObj.Alloca = nullptr; TBME.HandlerArray.push_back(HT); } FuncInfo.TryBlockMap.push_back(TBME); } static BasicBlock *getCleanupRetUnwindDest(const CleanupPadInst *CleanupPad) { for (const User *U : CleanupPad->users()) if (const auto *CRI = dyn_cast(U)) return CRI->getUnwindDest(); return nullptr; } static void calculateStateNumbersForInvokes(const Function *Fn, WinEHFuncInfo &FuncInfo) { auto *F = const_cast(Fn); DenseMap BlockColors = colorEHFunclets(*F); for (BasicBlock &BB : *F) { auto *II = dyn_cast(BB.getTerminator()); if (!II) continue; auto &BBColors = BlockColors[&BB]; assert(BBColors.size() == 1 && "multi-color BB not removed by preparation"); BasicBlock *FuncletEntryBB = BBColors.front(); BasicBlock *FuncletUnwindDest; auto *FuncletPad = dyn_cast(FuncletEntryBB->getFirstNonPHI()); assert(FuncletPad || FuncletEntryBB == &Fn->getEntryBlock()); if (!FuncletPad) FuncletUnwindDest = nullptr; else if (auto *CatchPad = dyn_cast(FuncletPad)) FuncletUnwindDest = CatchPad->getCatchSwitch()->getUnwindDest(); else if (auto *CleanupPad = dyn_cast(FuncletPad)) FuncletUnwindDest = getCleanupRetUnwindDest(CleanupPad); else llvm_unreachable("unexpected funclet pad!"); BasicBlock *InvokeUnwindDest = II->getUnwindDest(); int BaseState = -1; if (FuncletUnwindDest == InvokeUnwindDest) { auto BaseStateI = FuncInfo.FuncletBaseStateMap.find(FuncletPad); if (BaseStateI != FuncInfo.FuncletBaseStateMap.end()) BaseState = BaseStateI->second; } if (BaseState != -1) { FuncInfo.InvokeStateMap[II] = BaseState; } else { Instruction *PadInst = InvokeUnwindDest->getFirstNonPHI(); assert(FuncInfo.EHPadStateMap.count(PadInst) && "EH Pad has no state!"); FuncInfo.InvokeStateMap[II] = FuncInfo.EHPadStateMap[PadInst]; } } } // Given BB which ends in an unwind edge, return the EHPad that this 
// BB belongs to. If the unwind edge came from an invoke, return null.
static const BasicBlock *getEHPadFromPredecessor(const BasicBlock *BB,
                                                 Value *ParentPad) {
  const TerminatorInst *TI = BB->getTerminator();
  if (isa<InvokeInst>(TI))
    return nullptr;
  if (auto *CatchSwitch = dyn_cast<CatchSwitchInst>(TI)) {
    if (CatchSwitch->getParentPad() != ParentPad)
      return nullptr;
    return BB;
  }
  assert(!TI->isEHPad() && "unexpected EHPad!");
  auto *CleanupPad = cast<CleanupReturnInst>(TI)->getCleanupPad();
  if (CleanupPad->getParentPad() != ParentPad)
    return nullptr;
  return CleanupPad->getParent();
}

static void calculateCXXStateNumbers(WinEHFuncInfo &FuncInfo,
                                     const Instruction *FirstNonPHI,
                                     int ParentState) {
  const BasicBlock *BB = FirstNonPHI->getParent();
  assert(BB->isEHPad() && "not a funclet!");

  if (auto *CatchSwitch = dyn_cast<CatchSwitchInst>(FirstNonPHI)) {
    assert(FuncInfo.EHPadStateMap.count(CatchSwitch) == 0 &&
           "shouldn't revisit catch funclets!");

    SmallVector<const CatchPadInst *, 2> Handlers;
    for (const BasicBlock *CatchPadBB : CatchSwitch->handlers()) {
      auto *CatchPad = cast<CatchPadInst>(CatchPadBB->getFirstNonPHI());
      Handlers.push_back(CatchPad);
    }
    int TryLow = addUnwindMapEntry(FuncInfo, ParentState, nullptr);
    FuncInfo.EHPadStateMap[CatchSwitch] = TryLow;
    for (const BasicBlock *PredBlock : predecessors(BB))
      if ((PredBlock = getEHPadFromPredecessor(PredBlock,
                                               CatchSwitch->getParentPad())))
        calculateCXXStateNumbers(FuncInfo, PredBlock->getFirstNonPHI(),
                                 TryLow);
    int CatchLow = addUnwindMapEntry(FuncInfo, ParentState, nullptr);

    // catchpads are separate funclets in C++ EH due to the way rethrow works.
    int TryHigh = CatchLow - 1;
    for (const auto *CatchPad : Handlers) {
      FuncInfo.FuncletBaseStateMap[CatchPad] = CatchLow;
      for (const User *U : CatchPad->users()) {
        const auto *UserI = cast<Instruction>(U);
        if (auto *InnerCatchSwitch = dyn_cast<CatchSwitchInst>(UserI))
          if (InnerCatchSwitch->getUnwindDest() == CatchSwitch->getUnwindDest())
            calculateCXXStateNumbers(FuncInfo, UserI, CatchLow);
-       if (auto *InnerCleanupPad = dyn_cast<CleanupPadInst>(UserI))
-         if (getCleanupRetUnwindDest(InnerCleanupPad) ==
-             CatchSwitch->getUnwindDest())
+       if (auto *InnerCleanupPad = dyn_cast<CleanupPadInst>(UserI)) {
+         BasicBlock *UnwindDest = getCleanupRetUnwindDest(InnerCleanupPad);
+         // If a nested cleanup pad reports a null unwind destination and the
+         // enclosing catch pad doesn't, it must be post-dominated by an
+         // unreachable instruction.
+         if (!UnwindDest || UnwindDest == CatchSwitch->getUnwindDest())
            calculateCXXStateNumbers(FuncInfo, UserI, CatchLow);
+       }
      }
    }
    int CatchHigh = FuncInfo.getLastStateNumber();
    addTryBlockMapEntry(FuncInfo, TryLow, TryHigh, CatchHigh, Handlers);
    DEBUG(dbgs() << "TryLow[" << BB->getName() << "]: " << TryLow << '\n');
    DEBUG(dbgs() << "TryHigh[" << BB->getName() << "]: " << TryHigh << '\n');
    DEBUG(dbgs() << "CatchHigh[" << BB->getName() << "]: " << CatchHigh
                 << '\n');
  } else {
    auto *CleanupPad = cast<CleanupPadInst>(FirstNonPHI);

    // It's possible for a cleanup to be visited twice: it might have multiple
    // cleanupret instructions.
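// A sketch of the map-building convention used by addUnwindMapEntry and the
// addSEH* helpers nearby: each entry records the state unwound to next, and
// a state's number is simply its index in the map, so appending returns the
// new state. The struct is a simplified stand-in for CxxUnwindMapEntry.
#include <vector>

struct UnwindEntry {
  int ToState;         // state to transition to when this one unwinds
  const void *Handler; // cleanup/handler block; may be null
};

static int addEntry(std::vector<UnwindEntry> &Map, int ToState,
                    const void *Handler) {
  Map.push_back({ToState, Handler});
  return (int)Map.size() - 1; // the newly assigned state number
}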
    if (FuncInfo.EHPadStateMap.count(CleanupPad))
      return;

    int CleanupState = addUnwindMapEntry(FuncInfo, ParentState, BB);
    FuncInfo.EHPadStateMap[CleanupPad] = CleanupState;
    DEBUG(dbgs() << "Assigning state #" << CleanupState << " to BB "
                 << BB->getName() << '\n');
    for (const BasicBlock *PredBlock : predecessors(BB)) {
      if ((PredBlock = getEHPadFromPredecessor(PredBlock,
                                               CleanupPad->getParentPad()))) {
        calculateCXXStateNumbers(FuncInfo, PredBlock->getFirstNonPHI(),
                                 CleanupState);
      }
    }
    for (const User *U : CleanupPad->users()) {
      const auto *UserI = cast<Instruction>(U);
      if (UserI->isEHPad())
        report_fatal_error("Cleanup funclets for the MSVC++ personality cannot "
                           "contain exceptional actions");
    }
  }
}

static int addSEHExcept(WinEHFuncInfo &FuncInfo, int ParentState,
                        const Function *Filter, const BasicBlock *Handler) {
  SEHUnwindMapEntry Entry;
  Entry.ToState = ParentState;
  Entry.IsFinally = false;
  Entry.Filter = Filter;
  Entry.Handler = Handler;
  FuncInfo.SEHUnwindMap.push_back(Entry);
  return FuncInfo.SEHUnwindMap.size() - 1;
}

static int addSEHFinally(WinEHFuncInfo &FuncInfo, int ParentState,
                         const BasicBlock *Handler) {
  SEHUnwindMapEntry Entry;
  Entry.ToState = ParentState;
  Entry.IsFinally = true;
  Entry.Filter = nullptr;
  Entry.Handler = Handler;
  FuncInfo.SEHUnwindMap.push_back(Entry);
  return FuncInfo.SEHUnwindMap.size() - 1;
}

static void calculateSEHStateNumbers(WinEHFuncInfo &FuncInfo,
                                     const Instruction *FirstNonPHI,
                                     int ParentState) {
  const BasicBlock *BB = FirstNonPHI->getParent();
  assert(BB->isEHPad() && "not a funclet!");

  if (auto *CatchSwitch = dyn_cast<CatchSwitchInst>(FirstNonPHI)) {
    assert(FuncInfo.EHPadStateMap.count(CatchSwitch) == 0 &&
           "shouldn't revisit catch funclets!");

    // Extract the filter function and the __except basic block and create a
    // state for them.
    assert(CatchSwitch->getNumHandlers() == 1 &&
           "SEH doesn't have multiple handlers per __try");
    const auto *CatchPad =
        cast<CatchPadInst>((*CatchSwitch->handler_begin())->getFirstNonPHI());
    const BasicBlock *CatchPadBB = CatchPad->getParent();
    const Constant *FilterOrNull =
        cast<Constant>(CatchPad->getArgOperand(0)->stripPointerCasts());
    const Function *Filter = dyn_cast<Function>(FilterOrNull);
    assert((Filter || FilterOrNull->isNullValue()) &&
           "unexpected filter value");
    int TryState = addSEHExcept(FuncInfo, ParentState, Filter, CatchPadBB);

    // Everything in the __try block uses TryState as its parent state.
    FuncInfo.EHPadStateMap[CatchSwitch] = TryState;
    DEBUG(dbgs() << "Assigning state #" << TryState << " to BB "
                 << CatchPadBB->getName() << '\n');
    for (const BasicBlock *PredBlock : predecessors(BB))
      if ((PredBlock = getEHPadFromPredecessor(PredBlock,
                                               CatchSwitch->getParentPad())))
        calculateSEHStateNumbers(FuncInfo, PredBlock->getFirstNonPHI(),
                                 TryState);

    // Everything in the __except block unwinds to ParentState, just like code
    // outside the __try.
    for (const User *U : CatchPad->users()) {
      const auto *UserI = cast<Instruction>(U);
      if (auto *InnerCatchSwitch = dyn_cast<CatchSwitchInst>(UserI))
        if (InnerCatchSwitch->getUnwindDest() == CatchSwitch->getUnwindDest())
          calculateSEHStateNumbers(FuncInfo, UserI, ParentState);
-     if (auto *InnerCleanupPad = dyn_cast<CleanupPadInst>(UserI))
-       if (getCleanupRetUnwindDest(InnerCleanupPad) ==
-           CatchSwitch->getUnwindDest())
+     if (auto *InnerCleanupPad = dyn_cast<CleanupPadInst>(UserI)) {
+       BasicBlock *UnwindDest = getCleanupRetUnwindDest(InnerCleanupPad);
+       // If a nested cleanup pad reports a null unwind destination and the
+       // enclosing catch pad doesn't, it must be post-dominated by an
+       // unreachable instruction.
+ if (!UnwindDest || UnwindDest == CatchSwitch->getUnwindDest()) calculateSEHStateNumbers(FuncInfo, UserI, ParentState); + } } } else { auto *CleanupPad = cast(FirstNonPHI); // It's possible for a cleanup to be visited twice: it might have multiple // cleanupret instructions. if (FuncInfo.EHPadStateMap.count(CleanupPad)) return; int CleanupState = addSEHFinally(FuncInfo, ParentState, BB); FuncInfo.EHPadStateMap[CleanupPad] = CleanupState; DEBUG(dbgs() << "Assigning state #" << CleanupState << " to BB " << BB->getName() << '\n'); for (const BasicBlock *PredBlock : predecessors(BB)) if ((PredBlock = getEHPadFromPredecessor(PredBlock, CleanupPad->getParentPad()))) calculateSEHStateNumbers(FuncInfo, PredBlock->getFirstNonPHI(), CleanupState); for (const User *U : CleanupPad->users()) { const auto *UserI = cast(U); if (UserI->isEHPad()) report_fatal_error("Cleanup funclets for the SEH personality cannot " "contain exceptional actions"); } } } static bool isTopLevelPadForMSVC(const Instruction *EHPad) { if (auto *CatchSwitch = dyn_cast(EHPad)) return isa(CatchSwitch->getParentPad()) && CatchSwitch->unwindsToCaller(); if (auto *CleanupPad = dyn_cast(EHPad)) return isa(CleanupPad->getParentPad()) && getCleanupRetUnwindDest(CleanupPad) == nullptr; if (isa(EHPad)) return false; llvm_unreachable("unexpected EHPad!"); } void llvm::calculateSEHStateNumbers(const Function *Fn, WinEHFuncInfo &FuncInfo) { // Don't compute state numbers twice. if (!FuncInfo.SEHUnwindMap.empty()) return; for (const BasicBlock &BB : *Fn) { if (!BB.isEHPad()) continue; const Instruction *FirstNonPHI = BB.getFirstNonPHI(); if (!isTopLevelPadForMSVC(FirstNonPHI)) continue; ::calculateSEHStateNumbers(FuncInfo, FirstNonPHI, -1); } calculateStateNumbersForInvokes(Fn, FuncInfo); } void llvm::calculateWinCXXEHStateNumbers(const Function *Fn, WinEHFuncInfo &FuncInfo) { // Return if it's already been done. if (!FuncInfo.EHPadStateMap.empty()) return; for (const BasicBlock &BB : *Fn) { if (!BB.isEHPad()) continue; const Instruction *FirstNonPHI = BB.getFirstNonPHI(); if (!isTopLevelPadForMSVC(FirstNonPHI)) continue; calculateCXXStateNumbers(FuncInfo, FirstNonPHI, -1); } calculateStateNumbersForInvokes(Fn, FuncInfo); } static int addClrEHHandler(WinEHFuncInfo &FuncInfo, int HandlerParentState, int TryParentState, ClrHandlerType HandlerType, uint32_t TypeToken, const BasicBlock *Handler) { ClrEHUnwindMapEntry Entry; Entry.HandlerParentState = HandlerParentState; Entry.TryParentState = TryParentState; Entry.Handler = Handler; Entry.HandlerType = HandlerType; Entry.TypeToken = TypeToken; FuncInfo.ClrEHUnwindMap.push_back(Entry); return FuncInfo.ClrEHUnwindMap.size() - 1; } void llvm::calculateClrEHStateNumbers(const Function *Fn, WinEHFuncInfo &FuncInfo) { // Return if it's already been done. if (!FuncInfo.EHPadStateMap.empty()) return; // This numbering assigns one state number to each catchpad and cleanuppad. // It also computes two tree-like relations over states: // 1) Each state has a "HandlerParentState", which is the state of the next // outer handler enclosing this state's handler (same as nearest ancestor // per the ParentPad linkage on EH pads, but skipping over catchswitches). // 2) Each state has a "TryParentState", which: // a) for a catchpad that's not the last handler on its catchswitch, is // the state of the next catchpad on that catchswitch // b) for all other pads, is the state of the pad whose try region is the // next outer try region enclosing this state's try region. 
The "try // regions are not present as such in the IR, but will be inferred // based on the placement of invokes and pads which reach each other // by exceptional exits // Catchswitches do not get their own states, but each gets mapped to the // state of its first catchpad. // Step one: walk down from outermost to innermost funclets, assigning each // catchpad and cleanuppad a state number. Add an entry to the // ClrEHUnwindMap for each state, recording its HandlerParentState and // handler attributes. Record the TryParentState as well for each catchpad // that's not the last on its catchswitch, but initialize all other entries' // TryParentStates to a sentinel -1 value that the next pass will update. // Seed a worklist with pads that have no parent. SmallVector, 8> Worklist; for (const BasicBlock &BB : *Fn) { const Instruction *FirstNonPHI = BB.getFirstNonPHI(); const Value *ParentPad; if (const auto *CPI = dyn_cast(FirstNonPHI)) ParentPad = CPI->getParentPad(); else if (const auto *CSI = dyn_cast(FirstNonPHI)) ParentPad = CSI->getParentPad(); else continue; if (isa(ParentPad)) Worklist.emplace_back(FirstNonPHI, -1); } // Use the worklist to visit all pads, from outer to inner. Record // HandlerParentState for all pads. Record TryParentState only for catchpads // that aren't the last on their catchswitch (setting all other entries' // TryParentStates to an initial value of -1). This loop is also responsible // for setting the EHPadStateMap entry for all catchpads, cleanuppads, and // catchswitches. while (!Worklist.empty()) { const Instruction *Pad; int HandlerParentState; std::tie(Pad, HandlerParentState) = Worklist.pop_back_val(); if (const auto *Cleanup = dyn_cast(Pad)) { // Create the entry for this cleanup with the appropriate handler // properties. Finaly and fault handlers are distinguished by arity. ClrHandlerType HandlerType = (Cleanup->getNumArgOperands() ? ClrHandlerType::Fault : ClrHandlerType::Finally); int CleanupState = addClrEHHandler(FuncInfo, HandlerParentState, -1, HandlerType, 0, Pad->getParent()); // Queue any child EH pads on the worklist. for (const User *U : Cleanup->users()) if (const auto *I = dyn_cast(U)) if (I->isEHPad()) Worklist.emplace_back(I, CleanupState); // Remember this pad's state. FuncInfo.EHPadStateMap[Cleanup] = CleanupState; } else { // Walk the handlers of this catchswitch in reverse order since all but // the last need to set the following one as its TryParentState. const auto *CatchSwitch = cast(Pad); int CatchState = -1, FollowerState = -1; SmallVector CatchBlocks(CatchSwitch->handlers()); for (auto CBI = CatchBlocks.rbegin(), CBE = CatchBlocks.rend(); CBI != CBE; ++CBI, FollowerState = CatchState) { const BasicBlock *CatchBlock = *CBI; // Create the entry for this catch with the appropriate handler // properties. const auto *Catch = cast(CatchBlock->getFirstNonPHI()); uint32_t TypeToken = static_cast( cast(Catch->getArgOperand(0))->getZExtValue()); CatchState = addClrEHHandler(FuncInfo, HandlerParentState, FollowerState, ClrHandlerType::Catch, TypeToken, CatchBlock); // Queue any child EH pads on the worklist. for (const User *U : Catch->users()) if (const auto *I = dyn_cast(U)) if (I->isEHPad()) Worklist.emplace_back(I, CatchState); // Remember this catch's state. FuncInfo.EHPadStateMap[Catch] = CatchState; } // Associate the catchswitch with the state of its first catch. assert(CatchSwitch->getNumHandlers()); FuncInfo.EHPadStateMap[CatchSwitch] = CatchState; } } // Step two: record the TryParentState of each state. 
// For cleanuppads that don't have cleanuprets, we may need to infer this
// from their child pads, so visit pads in descendant-most to ancestor-most
// order.
  for (auto Entry = FuncInfo.ClrEHUnwindMap.rbegin(),
            End = FuncInfo.ClrEHUnwindMap.rend();
       Entry != End; ++Entry) {
    const Instruction *Pad =
        Entry->Handler.get<const BasicBlock *>()->getFirstNonPHI();
    // For most pads, the TryParentState is the state associated with the
    // unwind dest of exceptional exits from it.
    const BasicBlock *UnwindDest;
    if (const auto *Catch = dyn_cast<CatchPadInst>(Pad)) {
      // If a catch is not the last in its catchswitch, its TryParentState is
      // the state associated with the next catch in the switch, even though
      // that's not the unwind dest of exceptions escaping the catch. Those
      // cases were already assigned a TryParentState in the first pass, so
      // skip them.
      if (Entry->TryParentState != -1)
        continue;
      // Otherwise, get the unwind dest from the catchswitch.
      UnwindDest = Catch->getCatchSwitch()->getUnwindDest();
    } else {
      const auto *Cleanup = cast<CleanupPadInst>(Pad);
      UnwindDest = nullptr;
      for (const User *U : Cleanup->users()) {
        if (auto *CleanupRet = dyn_cast<CleanupReturnInst>(U)) {
          // Common and unambiguous case -- cleanupret indicates cleanup's
          // unwind dest.
          UnwindDest = CleanupRet->getUnwindDest();
          break;
        }

        // Get an unwind dest for the user.
        const BasicBlock *UserUnwindDest = nullptr;
        if (auto *Invoke = dyn_cast<InvokeInst>(U)) {
          UserUnwindDest = Invoke->getUnwindDest();
        } else if (auto *CatchSwitch = dyn_cast<CatchSwitchInst>(U)) {
          UserUnwindDest = CatchSwitch->getUnwindDest();
        } else if (auto *ChildCleanup = dyn_cast<CleanupPadInst>(U)) {
          int UserState = FuncInfo.EHPadStateMap[ChildCleanup];
          int UserUnwindState =
              FuncInfo.ClrEHUnwindMap[UserState].TryParentState;
          if (UserUnwindState != -1)
            UserUnwindDest = FuncInfo.ClrEHUnwindMap[UserUnwindState]
                                 .Handler.get<const BasicBlock *>();
        }

        // Not having an unwind dest for this user might indicate that it
        // doesn't unwind, so can't be taken as proof that the cleanup itself
        // may unwind to caller (see e.g. SimplifyUnreachable and
        // RemoveUnwindEdge).
        if (!UserUnwindDest)
          continue;

        // Now we have an unwind dest for the user, but we need to see if it
        // unwinds all the way out of the cleanup or if it stays within it.
        const Instruction *UserUnwindPad = UserUnwindDest->getFirstNonPHI();
        const Value *UserUnwindParent;
        if (auto *CSI = dyn_cast<CatchSwitchInst>(UserUnwindPad))
          UserUnwindParent = CSI->getParentPad();
        else
          UserUnwindParent =
              cast<FuncletPadInst>(UserUnwindPad)->getParentPad();

        // The unwind stays within the cleanup iff it targets a child of the
        // cleanup.
        if (UserUnwindParent == Cleanup)
          continue;

        // This unwind exits the cleanup, so its dest is the cleanup's dest.
        UnwindDest = UserUnwindDest;
        break;
      }
    }

    // Record the state of the unwind dest as the TryParentState.
    int UnwindDestState;

    // If UnwindDest is null at this point, either the pad in question can
    // be exited by unwind to caller, or it cannot be exited by unwind. In
    // either case, reporting such cases as unwinding to caller is correct.
    // This can lead to EH tables that "look strange" -- if this pad is in
    // a parent funclet which has other children that do unwind to an enclosing
    // pad, the try region for this pad will be missing the "duplicate" EH
    // clause entries that you'd expect to see covering the whole parent. That
    // should be benign, since the unwind never actually happens. If it were
    // an issue, we could add a subsequent pass that pushes unwind dests down
    // from parents that have them to children that appear to unwind to caller.
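// A sketch of the outer-to-inner worklist walk described above: each pad is
// popped with the state of its enclosing handler, gets the next state
// number, and queues its children with that new state. A toy pad tree
// stands in for the IR; the names here are invented.
#include <utility>
#include <vector>

struct Pad {
  std::vector<Pad *> Children;
  int State = -1;
  int HandlerParentState = -1;
};

static void numberPads(const std::vector<Pad *> &Roots) {
  std::vector<std::pair<Pad *, int>> Worklist; // (pad, parent handler state)
  for (Pad *R : Roots)
    Worklist.emplace_back(R, -1); // pads with no parent seed the list
  int NextState = 0;
  while (!Worklist.empty()) {
    std::pair<Pad *, int> Top = Worklist.back();
    Worklist.pop_back();
    Top.first->HandlerParentState = Top.second;
    Top.first->State = NextState++; // assign this pad its state number
    for (Pad *C : Top.first->Children)
      Worklist.emplace_back(C, Top.first->State);
  }
}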
    if (!UnwindDest) {
      UnwindDestState = -1;
    } else {
      UnwindDestState = FuncInfo.EHPadStateMap[UnwindDest->getFirstNonPHI()];
    }

    Entry->TryParentState = UnwindDestState;
  }

  // Step three: transfer information from pads to invokes.
  calculateStateNumbersForInvokes(Fn, FuncInfo);
}

void WinEHPrepare::colorFunclets(Function &F) {
  BlockColors = colorEHFunclets(F);

  // Invert the map from BB to colors to color to BBs.
  for (BasicBlock &BB : F) {
    ColorVector &Colors = BlockColors[&BB];
    for (BasicBlock *Color : Colors)
      FuncletBlocks[Color].push_back(&BB);
  }
}

void WinEHPrepare::demotePHIsOnFunclets(Function &F) {
  // Strip PHI nodes off of EH pads.
  SmallVector<PHINode *, 16> PHINodes;
  for (Function::iterator FI = F.begin(), FE = F.end(); FI != FE;) {
    BasicBlock *BB = &*FI++;
    if (!BB->isEHPad())
      continue;
    for (BasicBlock::iterator BI = BB->begin(), BE = BB->end(); BI != BE;) {
      Instruction *I = &*BI++;
      auto *PN = dyn_cast<PHINode>(I);
      // Stop at the first non-PHI.
      if (!PN)
        break;

      AllocaInst *SpillSlot = insertPHILoads(PN, F);
      if (SpillSlot)
        insertPHIStores(PN, SpillSlot);

      PHINodes.push_back(PN);
    }
  }

  for (auto *PN : PHINodes) {
    // There may be lingering uses on other EH PHIs being removed.
    PN->replaceAllUsesWith(UndefValue::get(PN->getType()));
    PN->eraseFromParent();
  }
}

void WinEHPrepare::cloneCommonBlocks(Function &F) {
  // We need to clone all blocks which belong to multiple funclets. Values are
  // remapped throughout the funclet to propagate both the new instructions
  // *and* the new basic blocks themselves.
  for (auto &Funclets : FuncletBlocks) {
    BasicBlock *FuncletPadBB = Funclets.first;
    std::vector<BasicBlock *> &BlocksInFunclet = Funclets.second;
    Value *FuncletToken;
    if (FuncletPadBB == &F.getEntryBlock())
      FuncletToken = ConstantTokenNone::get(F.getContext());
    else
      FuncletToken = FuncletPadBB->getFirstNonPHI();

    std::vector<std::pair<BasicBlock *, BasicBlock *>> Orig2Clone;
    ValueToValueMapTy VMap;
    for (BasicBlock *BB : BlocksInFunclet) {
      ColorVector &ColorsForBB = BlockColors[BB];
      // We don't need to do anything if the block is monochromatic.
      size_t NumColorsForBB = ColorsForBB.size();
      if (NumColorsForBB == 1)
        continue;

      DEBUG_WITH_TYPE("winehprepare-coloring",
                      dbgs() << "  Cloning block \'" << BB->getName()
                             << "\' for funclet \'" << FuncletPadBB->getName()
                             << "\'.\n");

      // Create a new basic block and copy instructions into it!
      BasicBlock *CBB =
          CloneBasicBlock(BB, VMap, Twine(".for.", FuncletPadBB->getName()));
      // Insert the clone immediately after the original to ensure determinism
      // and to keep the same relative ordering of any funclet's blocks.
      CBB->insertInto(&F, BB->getNextNode());

      // Add basic block mapping.
      VMap[BB] = CBB;

      // Record delta operations that we need to perform to our color mappings.
      Orig2Clone.emplace_back(BB, CBB);
    }

    // If nothing was cloned, we're done cloning in this funclet.
    if (Orig2Clone.empty())
      continue;

    // Update our color mappings to reflect that one block has lost a color and
    // another has gained a color.
    for (auto &BBMapping : Orig2Clone) {
      BasicBlock *OldBlock = BBMapping.first;
      BasicBlock *NewBlock = BBMapping.second;

      BlocksInFunclet.push_back(NewBlock);
      ColorVector &NewColors = BlockColors[NewBlock];
      assert(NewColors.empty() && "A new block should only have one color!");
      NewColors.push_back(FuncletPadBB);
      DEBUG_WITH_TYPE("winehprepare-coloring",
                      dbgs() << "  Assigned color \'" << FuncletPadBB->getName()
                             << "\' to block \'" << NewBlock->getName()
                             << "\'.\n");

      BlocksInFunclet.erase(
          std::remove(BlocksInFunclet.begin(), BlocksInFunclet.end(),
                      OldBlock),
          BlocksInFunclet.end());
      ColorVector &OldColors = BlockColors[OldBlock];
      OldColors.erase(
          std::remove(OldColors.begin(), OldColors.end(), FuncletPadBB),
          OldColors.end());

      DEBUG_WITH_TYPE("winehprepare-coloring",
                      dbgs() << "  Removed color \'" << FuncletPadBB->getName()
                             << "\' from block \'" << OldBlock->getName()
                             << "\'.\n");
    }

    // Loop over all of the instructions in this funclet, fixing up operand
    // references as we go.  This uses VMap to do all the hard work.
    for (BasicBlock *BB : BlocksInFunclet)
      // Loop over all instructions, fixing each one as we find it...
      for (Instruction &I : *BB)
        RemapInstruction(&I, VMap,
                         RF_IgnoreMissingEntries | RF_NoModuleLevelChanges);

    // Catchrets targeting cloned blocks need to be updated separately from
    // the loop above because they are not in the current funclet.
    SmallVector<CatchReturnInst *, 2> FixupCatchrets;
    for (auto &BBMapping : Orig2Clone) {
      BasicBlock *OldBlock = BBMapping.first;
      BasicBlock *NewBlock = BBMapping.second;

      FixupCatchrets.clear();
      for (BasicBlock *Pred : predecessors(OldBlock))
        if (auto *CatchRet = dyn_cast<CatchReturnInst>(Pred->getTerminator()))
          if (CatchRet->getParentPad() == FuncletToken)
            FixupCatchrets.push_back(CatchRet);

      for (CatchReturnInst *CatchRet : FixupCatchrets)
        CatchRet->setSuccessor(NewBlock);
    }

    auto UpdatePHIOnClonedBlock = [&](PHINode *PN, bool IsForOldBlock) {
      unsigned NumPreds = PN->getNumIncomingValues();
      for (unsigned PredIdx = 0, PredEnd = NumPreds; PredIdx != PredEnd;
           ++PredIdx) {
        BasicBlock *IncomingBlock = PN->getIncomingBlock(PredIdx);
        bool EdgeTargetsFunclet;
        if (auto *CRI =
                dyn_cast<CatchReturnInst>(IncomingBlock->getTerminator())) {
          EdgeTargetsFunclet = (CRI->getParentPad() == FuncletToken);
        } else {
          ColorVector &IncomingColors = BlockColors[IncomingBlock];
          assert(!IncomingColors.empty() && "Block not colored!");
          assert((IncomingColors.size() == 1 ||
                  llvm::all_of(IncomingColors,
                               [&](BasicBlock *Color) {
                                 return Color != FuncletPadBB;
                               })) &&
                 "Cloning should leave this funclet's blocks monochromatic");
          EdgeTargetsFunclet = (IncomingColors.front() == FuncletPadBB);
        }
        if (IsForOldBlock != EdgeTargetsFunclet)
          continue;
        PN->removeIncomingValue(IncomingBlock, /*DeletePHIIfEmpty=*/false);
        // Revisit the next entry.
        --PredIdx;
        --PredEnd;
      }
    };

    for (auto &BBMapping : Orig2Clone) {
      BasicBlock *OldBlock = BBMapping.first;
      BasicBlock *NewBlock = BBMapping.second;
      for (Instruction &OldI : *OldBlock) {
        auto *OldPN = dyn_cast<PHINode>(&OldI);
        if (!OldPN)
          break;
        UpdatePHIOnClonedBlock(OldPN, /*IsForOldBlock=*/true);
      }
      for (Instruction &NewI : *NewBlock) {
        auto *NewPN = dyn_cast<PHINode>(&NewI);
        if (!NewPN)
          break;
        UpdatePHIOnClonedBlock(NewPN, /*IsForOldBlock=*/false);
      }
    }

    // Check to see if SuccBB has PHI nodes. If so, we need to add entries to
    // the PHI nodes for NewBB now.
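    // (Cloned blocks branch to the same successors as their originals, so
    //  each PHI in a successor needs an incoming entry for NewBlock, with the
    //  value remapped through VMap when the original incoming value was
    //  itself cloned.)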
    for (auto &BBMapping : Orig2Clone) {
      BasicBlock *OldBlock = BBMapping.first;
      BasicBlock *NewBlock = BBMapping.second;
      for (BasicBlock *SuccBB : successors(NewBlock)) {
        for (Instruction &SuccI : *SuccBB) {
          auto *SuccPN = dyn_cast<PHINode>(&SuccI);
          if (!SuccPN)
            break;

          // Ok, we have a PHI node.  Figure out what the incoming value was
          // for the OldBlock.
          int OldBlockIdx = SuccPN->getBasicBlockIndex(OldBlock);
          if (OldBlockIdx == -1)
            break;
          Value *IV = SuccPN->getIncomingValue(OldBlockIdx);

          // Remap the value if necessary.
          if (auto *Inst = dyn_cast<Instruction>(IV)) {
            ValueToValueMapTy::iterator I = VMap.find(Inst);
            if (I != VMap.end())
              IV = I->second;
          }

          SuccPN->addIncoming(IV, NewBlock);
        }
      }
    }

    for (ValueToValueMapTy::value_type VT : VMap) {
      // If there were values defined in BB that are used outside the funclet,
      // then we now have to update all uses of the value to use either the
      // original value, the cloned value, or some PHI derived value.  This
      // can require arbitrary PHI insertion, which we are prepared to do;
      // clean these up now.
      SmallVector<Use *, 16> UsesToRename;

      auto *OldI = dyn_cast<Instruction>(const_cast<Value *>(VT.first));
      if (!OldI)
        continue;
      auto *NewI = cast<Instruction>(VT.second);
      // Scan all uses of this instruction to see if it is used outside of its
      // funclet, and if so, record them in UsesToRename.
      for (Use &U : OldI->uses()) {
        Instruction *UserI = cast<Instruction>(U.getUser());
        BasicBlock *UserBB = UserI->getParent();
        ColorVector &ColorsForUserBB = BlockColors[UserBB];
        assert(!ColorsForUserBB.empty());
        if (ColorsForUserBB.size() > 1 ||
            *ColorsForUserBB.begin() != FuncletPadBB)
          UsesToRename.push_back(&U);
      }

      // If there are no uses outside the funclet, we're done with this
      // instruction.
      if (UsesToRename.empty())
        continue;

      // We found a use of OldI outside of the funclet.  Rename all uses of
      // OldI that are outside its funclet to be uses of the appropriate PHI
      // node etc.
      SSAUpdater SSAUpdate;
      SSAUpdate.Initialize(OldI->getType(), OldI->getName());
      SSAUpdate.AddAvailableValue(OldI->getParent(), OldI);
      SSAUpdate.AddAvailableValue(NewI->getParent(), NewI);

      while (!UsesToRename.empty())
        SSAUpdate.RewriteUseAfterInsertions(*UsesToRename.pop_back_val());
    }
  }
}

void WinEHPrepare::removeImplausibleInstructions(Function &F) {
  // Remove implausible terminators and replace them with UnreachableInst.
  for (auto &Funclet : FuncletBlocks) {
    BasicBlock *FuncletPadBB = Funclet.first;
    std::vector<BasicBlock *> &BlocksInFunclet = Funclet.second;
    Instruction *FirstNonPHI = FuncletPadBB->getFirstNonPHI();
    auto *FuncletPad = dyn_cast<FuncletPadInst>(FirstNonPHI);
    auto *CatchPad = dyn_cast_or_null<CatchPadInst>(FuncletPad);
    auto *CleanupPad = dyn_cast_or_null<CleanupPadInst>(FuncletPad);

    for (BasicBlock *BB : BlocksInFunclet) {
      for (Instruction &I : *BB) {
        CallSite CS(&I);
        if (!CS)
          continue;

        Value *FuncletBundleOperand = nullptr;
        if (auto BU = CS.getOperandBundle(LLVMContext::OB_funclet))
          FuncletBundleOperand = BU->Inputs.front();

        if (FuncletBundleOperand == FuncletPad)
          continue;

        // Skip call sites which are nounwind intrinsics.
        auto *CalledFn =
            dyn_cast<Function>(CS.getCalledValue()->stripPointerCasts());
        if (CalledFn && CalledFn->isIntrinsic() && CS.doesNotThrow())
          continue;

        // This call site was not part of this funclet, remove it.
        if (CS.isInvoke()) {
          // Remove the unwind edge if it was an invoke.
          removeUnwindEdge(BB);

          // Get a pointer to the new call.
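          // (removeUnwindEdge lowers the invoke to a plain call followed by
          //  an unconditional branch, so the call is now the instruction
          //  immediately before the block's new terminator.)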
          BasicBlock::iterator CallI =
              std::prev(BB->getTerminator()->getIterator());
          auto *CI = cast<CallInst>(&*CallI);
          changeToUnreachable(CI, /*UseLLVMTrap=*/false);
        } else {
          changeToUnreachable(&I, /*UseLLVMTrap=*/false);
        }

        // There are no more instructions in the block (except for
        // unreachable), we are done.
        break;
      }

      TerminatorInst *TI = BB->getTerminator();
      // CatchPadInst and CleanupPadInst can't transfer control to a
      // ReturnInst.
      bool IsUnreachableRet = isa<ReturnInst>(TI) && FuncletPad;
      // The token consumed by a CatchReturnInst must match the funclet token.
      bool IsUnreachableCatchret = false;
      if (auto *CRI = dyn_cast<CatchReturnInst>(TI))
        IsUnreachableCatchret = CRI->getCatchPad() != CatchPad;
      // The token consumed by a CleanupReturnInst must match the funclet
      // token.
      bool IsUnreachableCleanupret = false;
      if (auto *CRI = dyn_cast<CleanupReturnInst>(TI))
        IsUnreachableCleanupret = CRI->getCleanupPad() != CleanupPad;
      if (IsUnreachableRet || IsUnreachableCatchret ||
          IsUnreachableCleanupret) {
        changeToUnreachable(TI, /*UseLLVMTrap=*/false);
      } else if (isa<InvokeInst>(TI)) {
        if (Personality == EHPersonality::MSVC_CXX && CleanupPad) {
          // Invokes within a cleanuppad for the MSVC++ personality never
          // transfer control to their unwind edge: the personality will
          // terminate the program.
          removeUnwindEdge(BB);
        }
      }
    }
  }
}

void WinEHPrepare::cleanupPreparedFunclets(Function &F) {
  // Clean-up some of the mess we made by removing useless PHI nodes, trivial
  // branches, etc.
  for (Function::iterator FI = F.begin(), FE = F.end(); FI != FE;) {
    BasicBlock *BB = &*FI++;
    SimplifyInstructionsInBlock(BB);
    ConstantFoldTerminator(BB, /*DeleteDeadConditions=*/true);
    MergeBlockIntoPredecessor(BB);
  }

  // We might have some unreachable blocks after cleaning up some impossible
  // control flow.
  removeUnreachableBlocks(F);
}

void WinEHPrepare::verifyPreparedFunclets(Function &F) {
  for (BasicBlock &BB : F) {
    size_t NumColors = BlockColors[&BB].size();
    assert(NumColors == 1 && "Expected monochromatic BB!");
    if (NumColors == 0)
      report_fatal_error("Uncolored BB!");
    if (NumColors > 1)
      report_fatal_error("Multicolor BB!");
    assert((DisableDemotion || !(BB.isEHPad() && isa<PHINode>(BB.begin()))) &&
           "EH Pad still has a PHI!");
  }
}

bool WinEHPrepare::prepareExplicitEH(Function &F) {
  // Remove unreachable blocks.  It is not valuable to assign them a color and
  // their existence can trick us into thinking values are alive when they are
  // not.
  removeUnreachableBlocks(F);

  // Determine which blocks are reachable from which funclet entries.
  colorFunclets(F);

  cloneCommonBlocks(F);

  if (!DisableDemotion)
    demotePHIsOnFunclets(F);

  if (!DisableCleanups) {
    DEBUG(verifyFunction(F));
    removeImplausibleInstructions(F);

    DEBUG(verifyFunction(F));
    cleanupPreparedFunclets(F);
  }

  DEBUG(verifyPreparedFunclets(F));
  // Recolor the CFG to verify that all is well.
  DEBUG(colorFunclets(F));
  DEBUG(verifyPreparedFunclets(F));

  BlockColors.clear();
  FuncletBlocks.clear();

  return true;
}

// TODO: Share loads when one use dominates another, or when a catchpad exit
// dominates uses (needs dominators).
AllocaInst *WinEHPrepare::insertPHILoads(PHINode *PN, Function &F) {
  BasicBlock *PHIBlock = PN->getParent();
  AllocaInst *SpillSlot = nullptr;
  Instruction *EHPad = PHIBlock->getFirstNonPHI();

  if (!isa<TerminatorInst>(EHPad)) {
    // If the EHPad isn't a terminator, then we can insert a load in this
    // block that will dominate all uses.
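    // (The spill slot is a static alloca in the entry block, so a single
    //  load at the PHI block's first insertion point dominates every use.
    //  A catchswitch is a terminator, so that case takes the per-use path
    //  below instead.)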
    SpillSlot = new AllocaInst(PN->getType(), nullptr,
                               Twine(PN->getName(), ".wineh.spillslot"),
                               &F.getEntryBlock().front());
    Value *V = new LoadInst(SpillSlot, Twine(PN->getName(), ".wineh.reload"),
                            &*PHIBlock->getFirstInsertionPt());
    PN->replaceAllUsesWith(V);
    return SpillSlot;
  }

  // Otherwise, we have a PHI on a terminator EHPad, and we give up and insert
  // loads of the slot before every use.
  DenseMap<BasicBlock *, Value *> Loads;
  for (Value::use_iterator UI = PN->use_begin(), UE = PN->use_end();
       UI != UE;) {
    Use &U = *UI++;
    auto *UsingInst = cast<Instruction>(U.getUser());
    if (isa<PHINode>(UsingInst) && UsingInst->getParent()->isEHPad()) {
      // Use is on an EH pad phi.  Leave it alone; we'll insert loads and
      // stores for it separately.
      continue;
    }
    replaceUseWithLoad(PN, U, SpillSlot, Loads, F);
  }
  return SpillSlot;
}

// TODO: improve store placement.  Inserting at def is probably good, but need
// to be careful not to introduce interfering stores (needs liveness analysis).
// TODO: identify related phi nodes that can share spill slots, and share them
// (also needs liveness).
void WinEHPrepare::insertPHIStores(PHINode *OriginalPHI,
                                   AllocaInst *SpillSlot) {
  // Use a worklist of (Block, Value) pairs -- the given Value needs to be
  // stored to the spill slot by the end of the given Block.
  SmallVector<std::pair<BasicBlock *, Value *>, 4> Worklist;

  Worklist.push_back({OriginalPHI->getParent(), OriginalPHI});

  while (!Worklist.empty()) {
    BasicBlock *EHBlock;
    Value *InVal;
    std::tie(EHBlock, InVal) = Worklist.pop_back_val();

    PHINode *PN = dyn_cast<PHINode>(InVal);
    if (PN && PN->getParent() == EHBlock) {
      // The value is defined by another PHI we need to remove, with no room
      // to insert a store after the PHI, so each predecessor needs to store
      // its incoming value.
      for (unsigned i = 0, e = PN->getNumIncomingValues(); i < e; ++i) {
        Value *PredVal = PN->getIncomingValue(i);

        // Undef can safely be skipped.
        if (isa<UndefValue>(PredVal))
          continue;

        insertPHIStore(PN->getIncomingBlock(i), PredVal, SpillSlot, Worklist);
      }
    } else {
      // We need to store InVal, which dominates EHBlock, but can't put a
      // store in EHBlock, so need to put stores in each predecessor.
      for (BasicBlock *PredBlock : predecessors(EHBlock)) {
        insertPHIStore(PredBlock, InVal, SpillSlot, Worklist);
      }
    }
  }
}

void WinEHPrepare::insertPHIStore(
    BasicBlock *PredBlock, Value *PredVal, AllocaInst *SpillSlot,
    SmallVectorImpl<std::pair<BasicBlock *, Value *>> &Worklist) {

  if (PredBlock->isEHPad() &&
      isa<TerminatorInst>(PredBlock->getFirstNonPHI())) {
    // Pred is unsplittable, so we need to queue it on the worklist.
    Worklist.push_back({PredBlock, PredVal});
    return;
  }

  // Otherwise, insert the store at the end of the basic block.
  new StoreInst(PredVal, SpillSlot, PredBlock->getTerminator());
}

void WinEHPrepare::replaceUseWithLoad(Value *V, Use &U, AllocaInst *&SpillSlot,
                                      DenseMap<BasicBlock *, Value *> &Loads,
                                      Function &F) {
  // Lazily create the spill slot.
  if (!SpillSlot)
    SpillSlot = new AllocaInst(V->getType(), nullptr,
                               Twine(V->getName(), ".wineh.spillslot"),
                               &F.getEntryBlock().front());

  auto *UsingInst = cast<Instruction>(U.getUser());
  if (auto *UsingPHI = dyn_cast<PHINode>(UsingInst)) {
    // If this is a PHI node, we can't insert a load of the value before
    // the use.  Instead insert the load in the predecessor block
    // corresponding to the incoming value.
    //
    // Note that if there are multiple edges from a basic block to this
    // PHI node, we cannot have multiple loads.  The problem is that
    // the resulting PHI node will have multiple values (from each load)
    // coming in from the same block, which is illegal SSA form.
    // For this reason, we keep track of and reuse loads we insert.
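    // (Concretely: with two edges %bb -> %phiblock, a fresh load per edge
    //  would produce "phi [ %load1, %bb ], [ %load2, %bb ]", where the
    //  duplicate edges disagree on the incoming value -- invalid IR.  Reusing
    //  the cached load yields "phi [ %load, %bb ], [ %load, %bb ]".)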
    BasicBlock *IncomingBlock = UsingPHI->getIncomingBlock(U);
    if (auto *CatchRet =
            dyn_cast<CatchReturnInst>(IncomingBlock->getTerminator())) {
      // Putting a load above a catchret and use on the phi would still leave
      // a cross-funclet def/use.  We need to split the edge, change the
      // catchret to target the new block, and put the load there.
      BasicBlock *PHIBlock = UsingInst->getParent();
      BasicBlock *NewBlock = SplitEdge(IncomingBlock, PHIBlock);
      // SplitEdge gives us:
      //   IncomingBlock:
      //     ...
      //     br label %NewBlock
      //   NewBlock:
      //     catchret label %PHIBlock
      // But we need:
      //   IncomingBlock:
      //     ...
      //     catchret label %NewBlock
      //   NewBlock:
      //     br label %PHIBlock
      // So move the terminators to each others' blocks and swap their
      // successors.
      BranchInst *Goto = cast<BranchInst>(IncomingBlock->getTerminator());
      Goto->removeFromParent();
      CatchRet->removeFromParent();
      IncomingBlock->getInstList().push_back(CatchRet);
      NewBlock->getInstList().push_back(Goto);
      Goto->setSuccessor(0, PHIBlock);
      CatchRet->setSuccessor(NewBlock);
      // Update the color mapping for the newly split edge.
      ColorVector &ColorsForPHIBlock = BlockColors[PHIBlock];
      BlockColors[NewBlock] = ColorsForPHIBlock;
      for (BasicBlock *FuncletPad : ColorsForPHIBlock)
        FuncletBlocks[FuncletPad].push_back(NewBlock);
      // Treat the new block as incoming for load insertion.
      IncomingBlock = NewBlock;
    }
    Value *&Load = Loads[IncomingBlock];
    // Insert the load into the predecessor block.
    if (!Load)
      Load = new LoadInst(SpillSlot, Twine(V->getName(), ".wineh.reload"),
                          /*Volatile=*/false, IncomingBlock->getTerminator());

    U.set(Load);
  } else {
    // Reload right before the old use.
    auto *Load = new LoadInst(SpillSlot, Twine(V->getName(), ".wineh.reload"),
                              /*Volatile=*/false, UsingInst);
    U.set(Load);
  }
}

void WinEHFuncInfo::addIPToStateRange(const InvokeInst *II,
                                      MCSymbol *InvokeBegin,
                                      MCSymbol *InvokeEnd) {
  assert(InvokeStateMap.count(II) &&
         "should get invoke with precomputed state");
  LabelToStateMap[InvokeBegin] = std::make_pair(InvokeStateMap[II], InvokeEnd);
}

WinEHFuncInfo::WinEHFuncInfo() {}
Index: vendor/llvm/dist/lib/ExecutionEngine/IntelJITEvents/CMakeLists.txt
===================================================================
--- vendor/llvm/dist/lib/ExecutionEngine/IntelJITEvents/CMakeLists.txt	(revision 295845)
+++ vendor/llvm/dist/lib/ExecutionEngine/IntelJITEvents/CMakeLists.txt	(revision 295846)
@@ -1,8 +1,17 @@
include_directories( ${CMAKE_CURRENT_SOURCE_DIR}/.. )

+if( HAVE_LIBDL )
+  set(LLVM_INTEL_JIT_LIBS ${CMAKE_DL_LIBS})
+endif()
+if( HAVE_LIBPTHREAD )
+  set(LLVM_INTEL_JIT_LIBS pthread ${LLVM_INTEL_JIT_LIBS})
+endif()
+
add_llvm_library(LLVMIntelJITEvents
  IntelJITEventListener.cpp
  jitprofiling.c
-  LINK_LIBS pthread ${CMAKE_DL_LIBS}
+  LINK_LIBS ${LLVM_INTEL_JIT_LIBS}
  )
+
+add_dependencies(LLVMIntelJITEvents LLVMCodeGen)
Index: vendor/llvm/dist/lib/ExecutionEngine/IntelJITEvents/LLVMBuild.txt
===================================================================
--- vendor/llvm/dist/lib/ExecutionEngine/IntelJITEvents/LLVMBuild.txt	(revision 295845)
+++ vendor/llvm/dist/lib/ExecutionEngine/IntelJITEvents/LLVMBuild.txt	(revision 295846)
@@ -1,24 +1,24 @@
;===- ./lib/ExecutionEngine/JITProfileAmplifier/LLVMBuild.txt --*- Conf -*--===;
;
; The LLVM Compiler Infrastructure
;
; This file is distributed under the University of Illinois Open Source
; License. See LICENSE.TXT for details.
;
;===------------------------------------------------------------------------===;
;
; This is an LLVMBuild description file for the components in this subdirectory.
;
; For more information on the LLVMBuild system, please see:
;
;   http://llvm.org/docs/LLVMBuild.html
;
;===------------------------------------------------------------------------===;

[common]

[component_0]
type = OptionalLibrary
name = IntelJITEvents
parent = ExecutionEngine
-required_libraries = Core DebugInfoDWARF Support Object ExecutionEngine
+required_libraries = CodeGen Core DebugInfoDWARF Support Object ExecutionEngine
Index: vendor/llvm/dist/lib/Support/Triple.cpp
===================================================================
--- vendor/llvm/dist/lib/Support/Triple.cpp	(revision 295845)
+++ vendor/llvm/dist/lib/Support/Triple.cpp	(revision 295846)
@@ -1,1441 +1,1441 @@
//===--- Triple.cpp - Target triple helper class --------------------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

#include "llvm/ADT/Triple.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringSwitch.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/TargetParser.h"
#include "llvm/Support/Host.h"
#include <cstring>
using namespace llvm;

const char *Triple::getArchTypeName(ArchType Kind) {
  switch (Kind) {
  case UnknownArch: return "unknown";
  case aarch64: return "aarch64";
  case aarch64_be: return "aarch64_be";
  case arm: return "arm";
  case armeb: return "armeb";
  case avr: return "avr";
  case bpfel: return "bpfel";
  case bpfeb: return "bpfeb";
  case hexagon: return "hexagon";
  case mips: return "mips";
  case mipsel: return "mipsel";
  case mips64: return "mips64";
  case mips64el: return "mips64el";
  case msp430: return "msp430";
  case ppc64: return "powerpc64";
  case ppc64le: return "powerpc64le";
  case ppc: return "powerpc";
  case r600: return "r600";
  case amdgcn: return "amdgcn";
  case sparc: return "sparc";
  case sparcv9: return "sparcv9";
  case sparcel: return "sparcel";
  case systemz: return "s390x";
  case tce: return "tce";
  case thumb: return "thumb";
  case thumbeb: return "thumbeb";
  case x86: return "i386";
  case x86_64: return "x86_64";
  case xcore: return "xcore";
  case nvptx: return "nvptx";
  case nvptx64: return "nvptx64";
  case le32: return "le32";
  case le64: return "le64";
  case amdil: return "amdil";
  case amdil64: return "amdil64";
  case hsail: return "hsail";
  case hsail64: return "hsail64";
  case spir: return "spir";
  case spir64: return "spir64";
  case kalimba: return "kalimba";
  case shave: return "shave";
  case wasm32: return "wasm32";
  case wasm64: return "wasm64";
  }

  llvm_unreachable("Invalid ArchType!");
}

const char *Triple::getArchTypePrefix(ArchType Kind) {
  switch (Kind) {
  default:
    return nullptr;

  case aarch64:
  case aarch64_be:
    return "aarch64";

  case arm:
  case armeb:
  case thumb:
  case thumbeb:
    return "arm";

  case avr:
    return "avr";

  case ppc64:
  case ppc64le:
  case ppc:
    return "ppc";

  case mips:
  case mipsel:
  case mips64:
  case mips64el:
    return "mips";

  case hexagon:
    return "hexagon";

  case amdgcn:
  case r600:
    return "amdgpu";

  case bpfel:
  case bpfeb:
    return "bpf";

  case sparcv9:
  case sparcel:
  case sparc:
    return "sparc";

  case systemz:
    return "s390";

  case x86:
  case x86_64:
    return "x86";

  case xcore:
    return "xcore";

  case nvptx:
    return "nvptx";
  case nvptx64:
    return "nvptx";

  case le32:
    return "le32";
  case le64:
    return "le64";

  case amdil:
  case amdil64:
    return "amdil";

  case hsail:
  case hsail64:
    return "hsail";

  case spir:
  case spir64:
    return "spir";
  case kalimba:
    return "kalimba";
  case shave:
return "shave"; case wasm32: case wasm64: return "wasm"; } } const char *Triple::getVendorTypeName(VendorType Kind) { switch (Kind) { case UnknownVendor: return "unknown"; case Apple: return "apple"; case PC: return "pc"; case SCEI: return "scei"; case BGP: return "bgp"; case BGQ: return "bgq"; case Freescale: return "fsl"; case IBM: return "ibm"; case ImaginationTechnologies: return "img"; case MipsTechnologies: return "mti"; case NVIDIA: return "nvidia"; case CSR: return "csr"; case Myriad: return "myriad"; } llvm_unreachable("Invalid VendorType!"); } const char *Triple::getOSTypeName(OSType Kind) { switch (Kind) { case UnknownOS: return "unknown"; case CloudABI: return "cloudabi"; case Darwin: return "darwin"; case DragonFly: return "dragonfly"; case FreeBSD: return "freebsd"; case IOS: return "ios"; case KFreeBSD: return "kfreebsd"; case Linux: return "linux"; case Lv2: return "lv2"; case MacOSX: return "macosx"; case NetBSD: return "netbsd"; case OpenBSD: return "openbsd"; case Solaris: return "solaris"; case Win32: return "windows"; case Haiku: return "haiku"; case Minix: return "minix"; case RTEMS: return "rtems"; case NaCl: return "nacl"; case CNK: return "cnk"; case Bitrig: return "bitrig"; case AIX: return "aix"; case CUDA: return "cuda"; case NVCL: return "nvcl"; case AMDHSA: return "amdhsa"; case PS4: return "ps4"; case ELFIAMCU: return "elfiamcu"; case TvOS: return "tvos"; case WatchOS: return "watchos"; } llvm_unreachable("Invalid OSType"); } const char *Triple::getEnvironmentTypeName(EnvironmentType Kind) { switch (Kind) { case UnknownEnvironment: return "unknown"; case GNU: return "gnu"; case GNUEABIHF: return "gnueabihf"; case GNUEABI: return "gnueabi"; case GNUX32: return "gnux32"; case CODE16: return "code16"; case EABI: return "eabi"; case EABIHF: return "eabihf"; case Android: return "android"; case MSVC: return "msvc"; case Itanium: return "itanium"; case Cygnus: return "cygnus"; case AMDOpenCL: return "amdopencl"; case CoreCLR: return "coreclr"; } llvm_unreachable("Invalid EnvironmentType!"); } static Triple::ArchType parseBPFArch(StringRef ArchName) { if (ArchName.equals("bpf")) { if (sys::IsLittleEndianHost) return Triple::bpfel; else return Triple::bpfeb; } else if (ArchName.equals("bpf_be") || ArchName.equals("bpfeb")) { return Triple::bpfeb; } else if (ArchName.equals("bpf_le") || ArchName.equals("bpfel")) { return Triple::bpfel; } else { return Triple::UnknownArch; } } Triple::ArchType Triple::getArchTypeForLLVMName(StringRef Name) { Triple::ArchType BPFArch(parseBPFArch(Name)); return StringSwitch(Name) .Case("aarch64", aarch64) .Case("aarch64_be", aarch64_be) .Case("arm64", aarch64) // "arm64" is an alias for "aarch64" .Case("arm", arm) .Case("armeb", armeb) .Case("avr", avr) .StartsWith("bpf", BPFArch) .Case("mips", mips) .Case("mipsel", mipsel) .Case("mips64", mips64) .Case("mips64el", mips64el) .Case("msp430", msp430) .Case("ppc64", ppc64) .Case("ppc32", ppc) .Case("ppc", ppc) .Case("ppc64le", ppc64le) .Case("r600", r600) .Case("amdgcn", amdgcn) .Case("hexagon", hexagon) .Case("sparc", sparc) .Case("sparcel", sparcel) .Case("sparcv9", sparcv9) .Case("systemz", systemz) .Case("tce", tce) .Case("thumb", thumb) .Case("thumbeb", thumbeb) .Case("x86", x86) .Case("x86-64", x86_64) .Case("xcore", xcore) .Case("nvptx", nvptx) .Case("nvptx64", nvptx64) .Case("le32", le32) .Case("le64", le64) .Case("amdil", amdil) .Case("amdil64", amdil64) .Case("hsail", hsail) .Case("hsail64", hsail64) .Case("spir", spir) .Case("spir64", spir64) .Case("kalimba", kalimba) 
.Case("shave", shave) .Case("wasm32", wasm32) .Case("wasm64", wasm64) .Default(UnknownArch); } static Triple::ArchType parseARMArch(StringRef ArchName) { unsigned ISA = ARM::parseArchISA(ArchName); unsigned ENDIAN = ARM::parseArchEndian(ArchName); Triple::ArchType arch = Triple::UnknownArch; switch (ENDIAN) { case ARM::EK_LITTLE: { switch (ISA) { case ARM::IK_ARM: arch = Triple::arm; break; case ARM::IK_THUMB: arch = Triple::thumb; break; case ARM::IK_AARCH64: arch = Triple::aarch64; break; } break; } case ARM::EK_BIG: { switch (ISA) { case ARM::IK_ARM: arch = Triple::armeb; break; case ARM::IK_THUMB: arch = Triple::thumbeb; break; case ARM::IK_AARCH64: arch = Triple::aarch64_be; break; } break; } } ArchName = ARM::getCanonicalArchName(ArchName); if (ArchName.empty()) return Triple::UnknownArch; // Thumb only exists in v4+ if (ISA == ARM::IK_THUMB && (ArchName.startswith("v2") || ArchName.startswith("v3"))) return Triple::UnknownArch; // Thumb only for v6m unsigned Profile = ARM::parseArchProfile(ArchName); unsigned Version = ARM::parseArchVersion(ArchName); if (Profile == ARM::PK_M && Version == 6) { if (ENDIAN == ARM::EK_BIG) return Triple::thumbeb; else return Triple::thumb; } return arch; } static Triple::ArchType parseArch(StringRef ArchName) { auto AT = StringSwitch(ArchName) .Cases("i386", "i486", "i586", "i686", Triple::x86) // FIXME: Do we need to support these? .Cases("i786", "i886", "i986", Triple::x86) .Cases("amd64", "x86_64", "x86_64h", Triple::x86_64) - .Case("powerpc", Triple::ppc) - .Cases("powerpc64", "ppu", Triple::ppc64) - .Case("powerpc64le", Triple::ppc64le) + .Cases("powerpc", "ppc32", Triple::ppc) + .Cases("powerpc64", "ppu", "ppc64", Triple::ppc64) + .Cases("powerpc64le", "ppc64le", Triple::ppc64le) .Case("xscale", Triple::arm) .Case("xscaleeb", Triple::armeb) .Case("aarch64", Triple::aarch64) .Case("aarch64_be", Triple::aarch64_be) .Case("arm64", Triple::aarch64) .Case("arm", Triple::arm) .Case("armeb", Triple::armeb) .Case("thumb", Triple::thumb) .Case("thumbeb", Triple::thumbeb) .Case("avr", Triple::avr) .Case("msp430", Triple::msp430) .Cases("mips", "mipseb", "mipsallegrex", Triple::mips) .Cases("mipsel", "mipsallegrexel", Triple::mipsel) .Cases("mips64", "mips64eb", Triple::mips64) .Case("mips64el", Triple::mips64el) .Case("r600", Triple::r600) .Case("amdgcn", Triple::amdgcn) .Case("hexagon", Triple::hexagon) - .Case("s390x", Triple::systemz) + .Cases("s390x", "systemz", Triple::systemz) .Case("sparc", Triple::sparc) .Case("sparcel", Triple::sparcel) .Cases("sparcv9", "sparc64", Triple::sparcv9) .Case("tce", Triple::tce) .Case("xcore", Triple::xcore) .Case("nvptx", Triple::nvptx) .Case("nvptx64", Triple::nvptx64) .Case("le32", Triple::le32) .Case("le64", Triple::le64) .Case("amdil", Triple::amdil) .Case("amdil64", Triple::amdil64) .Case("hsail", Triple::hsail) .Case("hsail64", Triple::hsail64) .Case("spir", Triple::spir) .Case("spir64", Triple::spir64) .StartsWith("kalimba", Triple::kalimba) .Case("shave", Triple::shave) .Case("wasm32", Triple::wasm32) .Case("wasm64", Triple::wasm64) .Default(Triple::UnknownArch); // Some architectures require special parsing logic just to compute the // ArchType result. 
if (AT == Triple::UnknownArch) { if (ArchName.startswith("arm") || ArchName.startswith("thumb") || ArchName.startswith("aarch64")) return parseARMArch(ArchName); if (ArchName.startswith("bpf")) return parseBPFArch(ArchName); } return AT; } static Triple::VendorType parseVendor(StringRef VendorName) { return StringSwitch(VendorName) .Case("apple", Triple::Apple) .Case("pc", Triple::PC) .Case("scei", Triple::SCEI) .Case("bgp", Triple::BGP) .Case("bgq", Triple::BGQ) .Case("fsl", Triple::Freescale) .Case("ibm", Triple::IBM) .Case("img", Triple::ImaginationTechnologies) .Case("mti", Triple::MipsTechnologies) .Case("nvidia", Triple::NVIDIA) .Case("csr", Triple::CSR) .Case("myriad", Triple::Myriad) .Default(Triple::UnknownVendor); } static Triple::OSType parseOS(StringRef OSName) { return StringSwitch(OSName) .StartsWith("cloudabi", Triple::CloudABI) .StartsWith("darwin", Triple::Darwin) .StartsWith("dragonfly", Triple::DragonFly) .StartsWith("freebsd", Triple::FreeBSD) .StartsWith("ios", Triple::IOS) .StartsWith("kfreebsd", Triple::KFreeBSD) .StartsWith("linux", Triple::Linux) .StartsWith("lv2", Triple::Lv2) .StartsWith("macosx", Triple::MacOSX) .StartsWith("netbsd", Triple::NetBSD) .StartsWith("openbsd", Triple::OpenBSD) .StartsWith("solaris", Triple::Solaris) .StartsWith("win32", Triple::Win32) .StartsWith("windows", Triple::Win32) .StartsWith("haiku", Triple::Haiku) .StartsWith("minix", Triple::Minix) .StartsWith("rtems", Triple::RTEMS) .StartsWith("nacl", Triple::NaCl) .StartsWith("cnk", Triple::CNK) .StartsWith("bitrig", Triple::Bitrig) .StartsWith("aix", Triple::AIX) .StartsWith("cuda", Triple::CUDA) .StartsWith("nvcl", Triple::NVCL) .StartsWith("amdhsa", Triple::AMDHSA) .StartsWith("ps4", Triple::PS4) .StartsWith("elfiamcu", Triple::ELFIAMCU) .StartsWith("tvos", Triple::TvOS) .StartsWith("watchos", Triple::WatchOS) .Default(Triple::UnknownOS); } static Triple::EnvironmentType parseEnvironment(StringRef EnvironmentName) { return StringSwitch(EnvironmentName) .StartsWith("eabihf", Triple::EABIHF) .StartsWith("eabi", Triple::EABI) .StartsWith("gnueabihf", Triple::GNUEABIHF) .StartsWith("gnueabi", Triple::GNUEABI) .StartsWith("gnux32", Triple::GNUX32) .StartsWith("code16", Triple::CODE16) .StartsWith("gnu", Triple::GNU) .StartsWith("android", Triple::Android) .StartsWith("msvc", Triple::MSVC) .StartsWith("itanium", Triple::Itanium) .StartsWith("cygnus", Triple::Cygnus) .StartsWith("amdopencl", Triple::AMDOpenCL) .StartsWith("coreclr", Triple::CoreCLR) .Default(Triple::UnknownEnvironment); } static Triple::ObjectFormatType parseFormat(StringRef EnvironmentName) { return StringSwitch(EnvironmentName) .EndsWith("coff", Triple::COFF) .EndsWith("elf", Triple::ELF) .EndsWith("macho", Triple::MachO) .Default(Triple::UnknownObjectFormat); } static Triple::SubArchType parseSubArch(StringRef SubArchName) { StringRef ARMSubArch = ARM::getCanonicalArchName(SubArchName); // For now, this is the small part. Early return. if (ARMSubArch.empty()) return StringSwitch(SubArchName) .EndsWith("kalimba3", Triple::KalimbaSubArch_v3) .EndsWith("kalimba4", Triple::KalimbaSubArch_v4) .EndsWith("kalimba5", Triple::KalimbaSubArch_v5) .Default(Triple::NoSubArch); // ARM sub arch. 
switch(ARM::parseArch(ARMSubArch)) { case ARM::AK_ARMV4: return Triple::NoSubArch; case ARM::AK_ARMV4T: return Triple::ARMSubArch_v4t; case ARM::AK_ARMV5T: return Triple::ARMSubArch_v5; case ARM::AK_ARMV5TE: case ARM::AK_IWMMXT: case ARM::AK_IWMMXT2: case ARM::AK_XSCALE: case ARM::AK_ARMV5TEJ: return Triple::ARMSubArch_v5te; case ARM::AK_ARMV6: return Triple::ARMSubArch_v6; case ARM::AK_ARMV6K: case ARM::AK_ARMV6KZ: return Triple::ARMSubArch_v6k; case ARM::AK_ARMV6T2: return Triple::ARMSubArch_v6t2; case ARM::AK_ARMV6M: return Triple::ARMSubArch_v6m; case ARM::AK_ARMV7A: case ARM::AK_ARMV7R: return Triple::ARMSubArch_v7; case ARM::AK_ARMV7K: return Triple::ARMSubArch_v7k; case ARM::AK_ARMV7M: return Triple::ARMSubArch_v7m; case ARM::AK_ARMV7S: return Triple::ARMSubArch_v7s; case ARM::AK_ARMV7EM: return Triple::ARMSubArch_v7em; case ARM::AK_ARMV8A: return Triple::ARMSubArch_v8; case ARM::AK_ARMV8_1A: return Triple::ARMSubArch_v8_1a; case ARM::AK_ARMV8_2A: return Triple::ARMSubArch_v8_2a; default: return Triple::NoSubArch; } } static const char *getObjectFormatTypeName(Triple::ObjectFormatType Kind) { switch (Kind) { case Triple::UnknownObjectFormat: return ""; case Triple::COFF: return "coff"; case Triple::ELF: return "elf"; case Triple::MachO: return "macho"; } llvm_unreachable("unknown object format type"); } static Triple::ObjectFormatType getDefaultFormat(const Triple &T) { switch (T.getArch()) { case Triple::UnknownArch: case Triple::aarch64: case Triple::arm: case Triple::thumb: case Triple::x86: case Triple::x86_64: if (T.isOSDarwin()) return Triple::MachO; else if (T.isOSWindows()) return Triple::COFF; return Triple::ELF; case Triple::aarch64_be: case Triple::amdgcn: case Triple::amdil: case Triple::amdil64: case Triple::armeb: case Triple::avr: case Triple::bpfeb: case Triple::bpfel: case Triple::hexagon: case Triple::hsail: case Triple::hsail64: case Triple::kalimba: case Triple::le32: case Triple::le64: case Triple::mips: case Triple::mips64: case Triple::mips64el: case Triple::mipsel: case Triple::msp430: case Triple::nvptx: case Triple::nvptx64: case Triple::ppc64le: case Triple::r600: case Triple::shave: case Triple::sparc: case Triple::sparcel: case Triple::sparcv9: case Triple::spir: case Triple::spir64: case Triple::systemz: case Triple::tce: case Triple::thumbeb: case Triple::wasm32: case Triple::wasm64: case Triple::xcore: return Triple::ELF; case Triple::ppc: case Triple::ppc64: if (T.isOSDarwin()) return Triple::MachO; return Triple::ELF; } llvm_unreachable("unknown architecture"); } /// \brief Construct a triple from the string representation provided. /// /// This stores the string representation and parses the various pieces into /// enum members. Triple::Triple(const Twine &Str) : Data(Str.str()), Arch(UnknownArch), SubArch(NoSubArch), Vendor(UnknownVendor), OS(UnknownOS), Environment(UnknownEnvironment), ObjectFormat(UnknownObjectFormat) { // Do minimal parsing by hand here. 
SmallVector Components; StringRef(Data).split(Components, '-', /*MaxSplit*/ 3); if (Components.size() > 0) { Arch = parseArch(Components[0]); SubArch = parseSubArch(Components[0]); if (Components.size() > 1) { Vendor = parseVendor(Components[1]); if (Components.size() > 2) { OS = parseOS(Components[2]); if (Components.size() > 3) { Environment = parseEnvironment(Components[3]); ObjectFormat = parseFormat(Components[3]); } } } } if (ObjectFormat == UnknownObjectFormat) ObjectFormat = getDefaultFormat(*this); } /// \brief Construct a triple from string representations of the architecture, /// vendor, and OS. /// /// This joins each argument into a canonical string representation and parses /// them into enum members. It leaves the environment unknown and omits it from /// the string representation. Triple::Triple(const Twine &ArchStr, const Twine &VendorStr, const Twine &OSStr) : Data((ArchStr + Twine('-') + VendorStr + Twine('-') + OSStr).str()), Arch(parseArch(ArchStr.str())), SubArch(parseSubArch(ArchStr.str())), Vendor(parseVendor(VendorStr.str())), OS(parseOS(OSStr.str())), Environment(), ObjectFormat(Triple::UnknownObjectFormat) { ObjectFormat = getDefaultFormat(*this); } /// \brief Construct a triple from string representations of the architecture, /// vendor, OS, and environment. /// /// This joins each argument into a canonical string representation and parses /// them into enum members. Triple::Triple(const Twine &ArchStr, const Twine &VendorStr, const Twine &OSStr, const Twine &EnvironmentStr) : Data((ArchStr + Twine('-') + VendorStr + Twine('-') + OSStr + Twine('-') + EnvironmentStr).str()), Arch(parseArch(ArchStr.str())), SubArch(parseSubArch(ArchStr.str())), Vendor(parseVendor(VendorStr.str())), OS(parseOS(OSStr.str())), Environment(parseEnvironment(EnvironmentStr.str())), ObjectFormat(parseFormat(EnvironmentStr.str())) { if (ObjectFormat == Triple::UnknownObjectFormat) ObjectFormat = getDefaultFormat(*this); } std::string Triple::normalize(StringRef Str) { bool IsMinGW32 = false; bool IsCygwin = false; // Parse into components. SmallVector Components; Str.split(Components, '-'); // If the first component corresponds to a known architecture, preferentially // use it for the architecture. If the second component corresponds to a // known vendor, preferentially use it for the vendor, etc. This avoids silly // component movement when a component parses as (eg) both a valid arch and a // valid os. ArchType Arch = UnknownArch; if (Components.size() > 0) Arch = parseArch(Components[0]); VendorType Vendor = UnknownVendor; if (Components.size() > 1) Vendor = parseVendor(Components[1]); OSType OS = UnknownOS; if (Components.size() > 2) { OS = parseOS(Components[2]); IsCygwin = Components[2].startswith("cygwin"); IsMinGW32 = Components[2].startswith("mingw"); } EnvironmentType Environment = UnknownEnvironment; if (Components.size() > 3) Environment = parseEnvironment(Components[3]); ObjectFormatType ObjectFormat = UnknownObjectFormat; if (Components.size() > 4) ObjectFormat = parseFormat(Components[4]); // Note which components are already in their final position. These will not // be moved. bool Found[4]; Found[0] = Arch != UnknownArch; Found[1] = Vendor != UnknownVendor; Found[2] = OS != UnknownOS; Found[3] = Environment != UnknownEnvironment; // If they are not there already, permute the components into their canonical // positions by seeing if they parse as a valid architecture, and if so moving // the component to the architecture position etc. 
for (unsigned Pos = 0; Pos != array_lengthof(Found); ++Pos) { if (Found[Pos]) continue; // Already in the canonical position. for (unsigned Idx = 0; Idx != Components.size(); ++Idx) { // Do not reparse any components that already matched. if (Idx < array_lengthof(Found) && Found[Idx]) continue; // Does this component parse as valid for the target position? bool Valid = false; StringRef Comp = Components[Idx]; switch (Pos) { default: llvm_unreachable("unexpected component type!"); case 0: Arch = parseArch(Comp); Valid = Arch != UnknownArch; break; case 1: Vendor = parseVendor(Comp); Valid = Vendor != UnknownVendor; break; case 2: OS = parseOS(Comp); IsCygwin = Comp.startswith("cygwin"); IsMinGW32 = Comp.startswith("mingw"); Valid = OS != UnknownOS || IsCygwin || IsMinGW32; break; case 3: Environment = parseEnvironment(Comp); Valid = Environment != UnknownEnvironment; if (!Valid) { ObjectFormat = parseFormat(Comp); Valid = ObjectFormat != UnknownObjectFormat; } break; } if (!Valid) continue; // Nope, try the next component. // Move the component to the target position, pushing any non-fixed // components that are in the way to the right. This tends to give // good results in the common cases of a forgotten vendor component // or a wrongly positioned environment. if (Pos < Idx) { // Insert left, pushing the existing components to the right. For // example, a-b-i386 -> i386-a-b when moving i386 to the front. StringRef CurrentComponent(""); // The empty component. // Replace the component we are moving with an empty component. std::swap(CurrentComponent, Components[Idx]); // Insert the component being moved at Pos, displacing any existing // components to the right. for (unsigned i = Pos; !CurrentComponent.empty(); ++i) { // Skip over any fixed components. while (i < array_lengthof(Found) && Found[i]) ++i; // Place the component at the new position, getting the component // that was at this position - it will be moved right. std::swap(CurrentComponent, Components[i]); } } else if (Pos > Idx) { // Push right by inserting empty components until the component at Idx // reaches the target position Pos. For example, pc-a -> -pc-a when // moving pc to the second position. do { // Insert one empty component at Idx. StringRef CurrentComponent(""); // The empty component. for (unsigned i = Idx; i < Components.size();) { // Place the component at the new position, getting the component // that was at this position - it will be moved right. std::swap(CurrentComponent, Components[i]); // If it was placed on top of an empty component then we are done. if (CurrentComponent.empty()) break; // Advance to the next component, skipping any fixed components. while (++i < array_lengthof(Found) && Found[i]) ; } // The last component was pushed off the end - append it. if (!CurrentComponent.empty()) Components.push_back(CurrentComponent); // Advance Idx to the component's new position. while (++Idx < array_lengthof(Found) && Found[Idx]) ; } while (Idx < Pos); // Add more until the final position is reached. } assert(Pos < Components.size() && Components[Pos] == Comp && "Component moved wrong!"); Found[Pos] = true; break; } } // Special case logic goes here. At this point Arch, Vendor and OS have the // correct values for the computed components. 
std::string NormalizedEnvironment; if (Environment == Triple::Android && Components[3].startswith("androideabi")) { StringRef AndroidVersion = Components[3].drop_front(strlen("androideabi")); if (AndroidVersion.empty()) { Components[3] = "android"; } else { NormalizedEnvironment = Twine("android", AndroidVersion).str(); Components[3] = NormalizedEnvironment; } } if (OS == Triple::Win32) { Components.resize(4); Components[2] = "windows"; if (Environment == UnknownEnvironment) { if (ObjectFormat == UnknownObjectFormat || ObjectFormat == Triple::COFF) Components[3] = "msvc"; else Components[3] = getObjectFormatTypeName(ObjectFormat); } } else if (IsMinGW32) { Components.resize(4); Components[2] = "windows"; Components[3] = "gnu"; } else if (IsCygwin) { Components.resize(4); Components[2] = "windows"; Components[3] = "cygnus"; } if (IsMinGW32 || IsCygwin || (OS == Triple::Win32 && Environment != UnknownEnvironment)) { if (ObjectFormat != UnknownObjectFormat && ObjectFormat != Triple::COFF) { Components.resize(5); Components[4] = getObjectFormatTypeName(ObjectFormat); } } // Stick the corrected components back together to form the normalized string. std::string Normalized; for (unsigned i = 0, e = Components.size(); i != e; ++i) { if (i) Normalized += '-'; Normalized += Components[i]; } return Normalized; } StringRef Triple::getArchName() const { return StringRef(Data).split('-').first; // Isolate first component } StringRef Triple::getVendorName() const { StringRef Tmp = StringRef(Data).split('-').second; // Strip first component return Tmp.split('-').first; // Isolate second component } StringRef Triple::getOSName() const { StringRef Tmp = StringRef(Data).split('-').second; // Strip first component Tmp = Tmp.split('-').second; // Strip second component return Tmp.split('-').first; // Isolate third component } StringRef Triple::getEnvironmentName() const { StringRef Tmp = StringRef(Data).split('-').second; // Strip first component Tmp = Tmp.split('-').second; // Strip second component return Tmp.split('-').second; // Strip third component } StringRef Triple::getOSAndEnvironmentName() const { StringRef Tmp = StringRef(Data).split('-').second; // Strip first component return Tmp.split('-').second; // Strip second component } static unsigned EatNumber(StringRef &Str) { assert(!Str.empty() && Str[0] >= '0' && Str[0] <= '9' && "Not a number"); unsigned Result = 0; do { // Consume the leading digit. Result = Result*10 + (Str[0] - '0'); // Eat the digit. Str = Str.substr(1); } while (!Str.empty() && Str[0] >= '0' && Str[0] <= '9'); return Result; } static void parseVersionFromName(StringRef Name, unsigned &Major, unsigned &Minor, unsigned &Micro) { // Any unset version defaults to 0. Major = Minor = Micro = 0; // Parse up to three components. unsigned *Components[3] = {&Major, &Minor, &Micro}; for (unsigned i = 0; i != 3; ++i) { if (Name.empty() || Name[0] < '0' || Name[0] > '9') break; // Consume the leading number. *Components[i] = EatNumber(Name); // Consume the separator, if present. 
if (Name.startswith(".")) Name = Name.substr(1); } } void Triple::getEnvironmentVersion(unsigned &Major, unsigned &Minor, unsigned &Micro) const { StringRef EnvironmentName = getEnvironmentName(); StringRef EnvironmentTypeName = getEnvironmentTypeName(getEnvironment()); if (EnvironmentName.startswith(EnvironmentTypeName)) EnvironmentName = EnvironmentName.substr(EnvironmentTypeName.size()); parseVersionFromName(EnvironmentName, Major, Minor, Micro); } void Triple::getOSVersion(unsigned &Major, unsigned &Minor, unsigned &Micro) const { StringRef OSName = getOSName(); // Assume that the OS portion of the triple starts with the canonical name. StringRef OSTypeName = getOSTypeName(getOS()); if (OSName.startswith(OSTypeName)) OSName = OSName.substr(OSTypeName.size()); parseVersionFromName(OSName, Major, Minor, Micro); } bool Triple::getMacOSXVersion(unsigned &Major, unsigned &Minor, unsigned &Micro) const { getOSVersion(Major, Minor, Micro); switch (getOS()) { default: llvm_unreachable("unexpected OS for Darwin triple"); case Darwin: // Default to darwin8, i.e., MacOSX 10.4. if (Major == 0) Major = 8; // Darwin version numbers are skewed from OS X versions. if (Major < 4) return false; Micro = 0; Minor = Major - 4; Major = 10; break; case MacOSX: // Default to 10.4. if (Major == 0) { Major = 10; Minor = 4; } if (Major != 10) return false; break; case IOS: case TvOS: case WatchOS: // Ignore the version from the triple. This is only handled because the // the clang driver combines OS X and IOS support into a common Darwin // toolchain that wants to know the OS X version number even when targeting // IOS. Major = 10; Minor = 4; Micro = 0; break; } return true; } void Triple::getiOSVersion(unsigned &Major, unsigned &Minor, unsigned &Micro) const { switch (getOS()) { default: llvm_unreachable("unexpected OS for Darwin triple"); case Darwin: case MacOSX: // Ignore the version from the triple. This is only handled because the // the clang driver combines OS X and IOS support into a common Darwin // toolchain that wants to know the iOS version number even when targeting // OS X. Major = 5; Minor = 0; Micro = 0; break; case IOS: case TvOS: getOSVersion(Major, Minor, Micro); // Default to 5.0 (or 7.0 for arm64). if (Major == 0) Major = (getArch() == aarch64) ? 7 : 5; break; case WatchOS: llvm_unreachable("conflicting triple info"); } } void Triple::getWatchOSVersion(unsigned &Major, unsigned &Minor, unsigned &Micro) const { switch (getOS()) { default: llvm_unreachable("unexpected OS for Darwin triple"); case Darwin: case MacOSX: // Ignore the version from the triple. This is only handled because the // the clang driver combines OS X and IOS support into a common Darwin // toolchain that wants to know the iOS version number even when targeting // OS X. 
Major = 2; Minor = 0; Micro = 0; break; case WatchOS: getOSVersion(Major, Minor, Micro); if (Major == 0) Major = 2; break; case IOS: llvm_unreachable("conflicting triple info"); } } void Triple::setTriple(const Twine &Str) { *this = Triple(Str); } void Triple::setArch(ArchType Kind) { setArchName(getArchTypeName(Kind)); } void Triple::setVendor(VendorType Kind) { setVendorName(getVendorTypeName(Kind)); } void Triple::setOS(OSType Kind) { setOSName(getOSTypeName(Kind)); } void Triple::setEnvironment(EnvironmentType Kind) { if (ObjectFormat == getDefaultFormat(*this)) return setEnvironmentName(getEnvironmentTypeName(Kind)); setEnvironmentName((getEnvironmentTypeName(Kind) + Twine("-") + getObjectFormatTypeName(ObjectFormat)).str()); } void Triple::setObjectFormat(ObjectFormatType Kind) { if (Environment == UnknownEnvironment) return setEnvironmentName(getObjectFormatTypeName(Kind)); setEnvironmentName((getEnvironmentTypeName(Environment) + Twine("-") + getObjectFormatTypeName(Kind)).str()); } void Triple::setArchName(StringRef Str) { // Work around a miscompilation bug for Twines in gcc 4.0.3. SmallString<64> Triple; Triple += Str; Triple += "-"; Triple += getVendorName(); Triple += "-"; Triple += getOSAndEnvironmentName(); setTriple(Triple); } void Triple::setVendorName(StringRef Str) { setTriple(getArchName() + "-" + Str + "-" + getOSAndEnvironmentName()); } void Triple::setOSName(StringRef Str) { if (hasEnvironment()) setTriple(getArchName() + "-" + getVendorName() + "-" + Str + "-" + getEnvironmentName()); else setTriple(getArchName() + "-" + getVendorName() + "-" + Str); } void Triple::setEnvironmentName(StringRef Str) { setTriple(getArchName() + "-" + getVendorName() + "-" + getOSName() + "-" + Str); } void Triple::setOSAndEnvironmentName(StringRef Str) { setTriple(getArchName() + "-" + getVendorName() + "-" + Str); } static unsigned getArchPointerBitWidth(llvm::Triple::ArchType Arch) { switch (Arch) { case llvm::Triple::UnknownArch: return 0; case llvm::Triple::avr: case llvm::Triple::msp430: return 16; case llvm::Triple::arm: case llvm::Triple::armeb: case llvm::Triple::hexagon: case llvm::Triple::le32: case llvm::Triple::mips: case llvm::Triple::mipsel: case llvm::Triple::nvptx: case llvm::Triple::ppc: case llvm::Triple::r600: case llvm::Triple::sparc: case llvm::Triple::sparcel: case llvm::Triple::tce: case llvm::Triple::thumb: case llvm::Triple::thumbeb: case llvm::Triple::x86: case llvm::Triple::xcore: case llvm::Triple::amdil: case llvm::Triple::hsail: case llvm::Triple::spir: case llvm::Triple::kalimba: case llvm::Triple::shave: case llvm::Triple::wasm32: return 32; case llvm::Triple::aarch64: case llvm::Triple::aarch64_be: case llvm::Triple::amdgcn: case llvm::Triple::bpfel: case llvm::Triple::bpfeb: case llvm::Triple::le64: case llvm::Triple::mips64: case llvm::Triple::mips64el: case llvm::Triple::nvptx64: case llvm::Triple::ppc64: case llvm::Triple::ppc64le: case llvm::Triple::sparcv9: case llvm::Triple::systemz: case llvm::Triple::x86_64: case llvm::Triple::amdil64: case llvm::Triple::hsail64: case llvm::Triple::spir64: case llvm::Triple::wasm64: return 64; } llvm_unreachable("Invalid architecture value"); } bool Triple::isArch64Bit() const { return getArchPointerBitWidth(getArch()) == 64; } bool Triple::isArch32Bit() const { return getArchPointerBitWidth(getArch()) == 32; } bool Triple::isArch16Bit() const { return getArchPointerBitWidth(getArch()) == 16; } Triple Triple::get32BitArchVariant() const { Triple T(*this); switch (getArch()) { case Triple::UnknownArch: case 
Triple::amdgcn: case Triple::avr: case Triple::bpfel: case Triple::bpfeb: case Triple::msp430: case Triple::systemz: case Triple::ppc64le: T.setArch(UnknownArch); break; case Triple::amdil: case Triple::hsail: case Triple::spir: case Triple::arm: case Triple::armeb: case Triple::hexagon: case Triple::kalimba: case Triple::le32: case Triple::mips: case Triple::mipsel: case Triple::nvptx: case Triple::ppc: case Triple::r600: case Triple::sparc: case Triple::sparcel: case Triple::tce: case Triple::thumb: case Triple::thumbeb: case Triple::x86: case Triple::xcore: case Triple::shave: case Triple::wasm32: // Already 32-bit. break; case Triple::aarch64: T.setArch(Triple::arm); break; case Triple::aarch64_be: T.setArch(Triple::armeb); break; case Triple::le64: T.setArch(Triple::le32); break; case Triple::mips64: T.setArch(Triple::mips); break; case Triple::mips64el: T.setArch(Triple::mipsel); break; case Triple::nvptx64: T.setArch(Triple::nvptx); break; case Triple::ppc64: T.setArch(Triple::ppc); break; case Triple::sparcv9: T.setArch(Triple::sparc); break; case Triple::x86_64: T.setArch(Triple::x86); break; case Triple::amdil64: T.setArch(Triple::amdil); break; case Triple::hsail64: T.setArch(Triple::hsail); break; case Triple::spir64: T.setArch(Triple::spir); break; case Triple::wasm64: T.setArch(Triple::wasm32); break; } return T; } Triple Triple::get64BitArchVariant() const { Triple T(*this); switch (getArch()) { case Triple::UnknownArch: case Triple::avr: case Triple::hexagon: case Triple::kalimba: case Triple::msp430: case Triple::r600: case Triple::tce: case Triple::xcore: case Triple::sparcel: case Triple::shave: T.setArch(UnknownArch); break; case Triple::aarch64: case Triple::aarch64_be: case Triple::bpfel: case Triple::bpfeb: case Triple::le64: case Triple::amdil64: case Triple::amdgcn: case Triple::hsail64: case Triple::spir64: case Triple::mips64: case Triple::mips64el: case Triple::nvptx64: case Triple::ppc64: case Triple::ppc64le: case Triple::sparcv9: case Triple::systemz: case Triple::x86_64: case Triple::wasm64: // Already 64-bit. 
break; case Triple::arm: T.setArch(Triple::aarch64); break; case Triple::armeb: T.setArch(Triple::aarch64_be); break; case Triple::le32: T.setArch(Triple::le64); break; case Triple::mips: T.setArch(Triple::mips64); break; case Triple::mipsel: T.setArch(Triple::mips64el); break; case Triple::nvptx: T.setArch(Triple::nvptx64); break; case Triple::ppc: T.setArch(Triple::ppc64); break; case Triple::sparc: T.setArch(Triple::sparcv9); break; case Triple::x86: T.setArch(Triple::x86_64); break; case Triple::amdil: T.setArch(Triple::amdil64); break; case Triple::hsail: T.setArch(Triple::hsail64); break; case Triple::spir: T.setArch(Triple::spir64); break; case Triple::thumb: T.setArch(Triple::aarch64); break; case Triple::thumbeb: T.setArch(Triple::aarch64_be); break; case Triple::wasm32: T.setArch(Triple::wasm64); break; } return T; } Triple Triple::getBigEndianArchVariant() const { Triple T(*this); switch (getArch()) { case Triple::UnknownArch: case Triple::amdgcn: case Triple::amdil64: case Triple::amdil: case Triple::avr: case Triple::hexagon: case Triple::hsail64: case Triple::hsail: case Triple::kalimba: case Triple::le32: case Triple::le64: case Triple::msp430: case Triple::nvptx64: case Triple::nvptx: case Triple::r600: case Triple::shave: case Triple::spir64: case Triple::spir: case Triple::wasm32: case Triple::wasm64: case Triple::x86: case Triple::x86_64: case Triple::xcore: // ARM is intentionally unsupported here, changing the architecture would // drop any arch suffixes. case Triple::arm: case Triple::thumb: T.setArch(UnknownArch); break; case Triple::aarch64_be: case Triple::armeb: case Triple::bpfeb: case Triple::mips64: case Triple::mips: case Triple::ppc64: case Triple::ppc: case Triple::sparc: case Triple::sparcv9: case Triple::systemz: case Triple::tce: case Triple::thumbeb: // Already big endian. break; case Triple::aarch64: T.setArch(Triple::aarch64_be); break; case Triple::bpfel: T.setArch(Triple::bpfeb); break; case Triple::mips64el:T.setArch(Triple::mips64); break; case Triple::mipsel: T.setArch(Triple::mips); break; case Triple::ppc64le: T.setArch(Triple::ppc64); break; case Triple::sparcel: T.setArch(Triple::sparc); break; } return T; } Triple Triple::getLittleEndianArchVariant() const { Triple T(*this); switch (getArch()) { case Triple::UnknownArch: case Triple::ppc: case Triple::sparcv9: case Triple::systemz: case Triple::tce: // ARM is intentionally unsupported here, changing the architecture would // drop any arch suffixes. case Triple::armeb: case Triple::thumbeb: T.setArch(UnknownArch); break; case Triple::aarch64: case Triple::amdgcn: case Triple::amdil64: case Triple::amdil: case Triple::arm: case Triple::avr: case Triple::bpfel: case Triple::hexagon: case Triple::hsail64: case Triple::hsail: case Triple::kalimba: case Triple::le32: case Triple::le64: case Triple::mips64el: case Triple::mipsel: case Triple::msp430: case Triple::nvptx64: case Triple::nvptx: case Triple::ppc64le: case Triple::r600: case Triple::shave: case Triple::sparcel: case Triple::spir64: case Triple::spir: case Triple::thumb: case Triple::wasm32: case Triple::wasm64: case Triple::x86: case Triple::x86_64: case Triple::xcore: // Already little endian. 
break; case Triple::aarch64_be: T.setArch(Triple::aarch64); break; case Triple::bpfeb: T.setArch(Triple::bpfel); break; case Triple::mips64: T.setArch(Triple::mips64el); break; case Triple::mips: T.setArch(Triple::mipsel); break; case Triple::ppc64: T.setArch(Triple::ppc64le); break; case Triple::sparc: T.setArch(Triple::sparcel); break; } return T; } StringRef Triple::getARMCPUForArch(StringRef MArch) const { if (MArch.empty()) MArch = getArchName(); MArch = ARM::getCanonicalArchName(MArch); // Some defaults are forced. switch (getOS()) { case llvm::Triple::FreeBSD: case llvm::Triple::NetBSD: if (!MArch.empty() && MArch == "v6") return "arm1176jzf-s"; break; case llvm::Triple::Win32: // FIXME: this is invalid for WindowsCE return "cortex-a9"; case llvm::Triple::MacOSX: case llvm::Triple::IOS: case llvm::Triple::WatchOS: if (MArch == "v7k") return "cortex-a7"; break; default: break; } if (MArch.empty()) return StringRef(); StringRef CPU = ARM::getDefaultCPU(MArch); if (!CPU.empty()) return CPU; // If no specific architecture version is requested, return the minimum CPU // required by the OS and environment. switch (getOS()) { case llvm::Triple::NetBSD: switch (getEnvironment()) { case llvm::Triple::GNUEABIHF: case llvm::Triple::GNUEABI: case llvm::Triple::EABIHF: case llvm::Triple::EABI: return "arm926ej-s"; default: return "strongarm"; } case llvm::Triple::NaCl: return "cortex-a8"; default: switch (getEnvironment()) { case llvm::Triple::EABIHF: case llvm::Triple::GNUEABIHF: return "arm1176jzf-s"; default: return "arm7tdmi"; } } llvm_unreachable("invalid arch name"); } Index: vendor/llvm/dist/lib/Target/Sparc/SparcInstrAliases.td =================================================================== --- vendor/llvm/dist/lib/Target/Sparc/SparcInstrAliases.td (revision 295845) +++ vendor/llvm/dist/lib/Target/Sparc/SparcInstrAliases.td (revision 295846) @@ -1,461 +1,461 @@ //===-- SparcInstrAliases.td - Instruction Aliases for Sparc Target -------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file contains instruction aliases for Sparc. //===----------------------------------------------------------------------===// // Instruction aliases for conditional moves. // mov rs2, rd multiclass intcond_mov_alias { // mov (%icc|%xcc), rs2, rd def : InstAlias; // mov (%icc|%xcc), simm11, rd def : InstAlias; // fmovs (%icc|%xcc), $rs2, $rd def : InstAlias; // fmovd (%icc|%xcc), $rs2, $rd def : InstAlias; } // mov rs2, rd multiclass fpcond_mov_alias { // mov %fcc[0-3], rs2, rd def : InstAlias; // mov %fcc[0-3], simm11, rd def : InstAlias; // fmovs %fcc[0-3], $rs2, $rd def : InstAlias; // fmovd %fcc[0-3], $rs2, $rd def : InstAlias; } // Instruction aliases for integer conditional branches and moves. 
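// (The integer parameter of these alias multiclasses is the 4-bit cond field
// of the SPARC branch instruction formats; e.g. "a" is 0b1000 and "n" is
// 0b0000 for both Bicc and FBfcc, which is what the fp_cond_alias encoding
// change further below restores.)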
multiclass int_cond_alias { // b $imm def : InstAlias; // b,a $imm def : InstAlias; // b %icc, $imm def : InstAlias, Requires<[HasV9]>; // b,pt %icc, $imm def : InstAlias, Requires<[HasV9]>; // b,a %icc, $imm def : InstAlias, Requires<[HasV9]>; // b,a,pt %icc, $imm def : InstAlias, Requires<[HasV9]>; // b,pn %icc, $imm def : InstAlias, Requires<[HasV9]>; // b,a,pn %icc, $imm def : InstAlias, Requires<[HasV9]>; // b %xcc, $imm def : InstAlias, Requires<[Is64Bit]>; // b,pt %xcc, $imm def : InstAlias, Requires<[Is64Bit]>; // b,a %xcc, $imm def : InstAlias, Requires<[Is64Bit]>; // b,a,pt %xcc, $imm def : InstAlias, Requires<[Is64Bit]>; // b,pn %xcc, $imm def : InstAlias, Requires<[Is64Bit]>; // b,a,pn %xcc, $imm def : InstAlias, Requires<[Is64Bit]>; defm : intcond_mov_alias, Requires<[HasV9]>; defm : intcond_mov_alias, Requires<[Is64Bit]>; // fmovq (%icc|%xcc), $rs2, $rd def : InstAlias, Requires<[HasV9, HasHardQuad]>; def : InstAlias, Requires<[Is64Bit, HasHardQuad]>; // t %icc, rs1 + rs2 def : InstAlias, Requires<[HasV9]>; // t %icc, rs => t %icc, G0 + rs def : InstAlias, Requires<[HasV9]>; // t %xcc, rs1 + rs2 def : InstAlias, Requires<[HasV9]>; // t %xcc, rs => t %xcc, G0 + rs def : InstAlias, Requires<[HasV9]>; // t rs1 + rs2 => t %icc, rs1 + rs2 def : InstAlias; // t rs=> t %icc, G0 + rs2 def : InstAlias; // t %icc, rs1 + imm def : InstAlias, Requires<[HasV9]>; // t %icc, imm => t %icc, G0 + imm def : InstAlias, Requires<[HasV9]>; // t %xcc, rs1 + imm def : InstAlias, Requires<[HasV9]>; // t %xcc, imm => t %xcc, G0 + imm def : InstAlias, Requires<[HasV9]>; // t rs1 + imm => t %icc, rs1 + imm def : InstAlias; // t imm => t %icc, G0 + imm def : InstAlias; } // Instruction aliases for floating point conditional branches and moves. multiclass fp_cond_alias { // fb $imm def : InstAlias; // fb,a $imm def : InstAlias; // fb %fcc0, $imm def : InstAlias, Requires<[HasV9]>; // fb,pt %fcc0, $imm def : InstAlias, Requires<[HasV9]>; // fb,a %fcc0, $imm def : InstAlias, Requires<[HasV9]>; // fb,a,pt %fcc0, $imm def : InstAlias, Requires<[HasV9]>; // fb,pn %fcc0, $imm def : InstAlias, Requires<[HasV9]>; // fb,a,pn %fcc0, $imm def : InstAlias, Requires<[HasV9]>; defm : fpcond_mov_alias, Requires<[HasV9]>; // fmovq %fcc0, $rs2, $rd def : InstAlias, Requires<[HasV9, HasHardQuad]>; } defm : int_cond_alias<"a", 0b1000>; defm : int_cond_alias<"", 0b1000>; // same as a; gnu asm, not in manual defm : int_cond_alias<"n", 0b0000>; defm : int_cond_alias<"ne", 0b1001>; defm : int_cond_alias<"nz", 0b1001>; // same as ne defm : int_cond_alias<"e", 0b0001>; defm : int_cond_alias<"eq", 0b0001>; // same as e defm : int_cond_alias<"z", 0b0001>; // same as e defm : int_cond_alias<"g", 0b1010>; defm : int_cond_alias<"le", 0b0010>; defm : int_cond_alias<"ge", 0b1011>; defm : int_cond_alias<"l", 0b0011>; defm : int_cond_alias<"gu", 0b1100>; defm : int_cond_alias<"leu", 0b0100>; defm : int_cond_alias<"cc", 0b1101>; defm : int_cond_alias<"geu", 0b1101>; // same as cc defm : int_cond_alias<"cs", 0b0101>; defm : int_cond_alias<"lu", 0b0101>; // same as cs defm : int_cond_alias<"pos", 0b1110>; defm : int_cond_alias<"neg", 0b0110>; defm : int_cond_alias<"vc", 0b1111>; defm : int_cond_alias<"vs", 0b0111>; -defm : fp_cond_alias<"a", 0b0000>; -defm : fp_cond_alias<"", 0b0000>; // same as a; gnu asm, not in manual -defm : fp_cond_alias<"n", 0b1000>; +defm : fp_cond_alias<"a", 0b1000>; +defm : fp_cond_alias<"", 0b1000>; // same as a; gnu asm, not in manual +defm : fp_cond_alias<"n", 0b0000>; defm : fp_cond_alias<"u", 0b0111>; defm : 
fp_cond_alias<"g", 0b0110>; defm : fp_cond_alias<"ug", 0b0101>; defm : fp_cond_alias<"l", 0b0100>; defm : fp_cond_alias<"ul", 0b0011>; defm : fp_cond_alias<"lg", 0b0010>; defm : fp_cond_alias<"ne", 0b0001>; defm : fp_cond_alias<"nz", 0b0001>; // same as ne defm : fp_cond_alias<"e", 0b1001>; defm : fp_cond_alias<"z", 0b1001>; // same as e defm : fp_cond_alias<"ue", 0b1010>; defm : fp_cond_alias<"ge", 0b1011>; defm : fp_cond_alias<"uge", 0b1100>; defm : fp_cond_alias<"le", 0b1101>; defm : fp_cond_alias<"ule", 0b1110>; defm : fp_cond_alias<"o", 0b1111>; // Section A.3 Synthetic Instructions // Most are marked as Emit=0, so that they are not used for disassembly. This is // an aesthetic issue, but the chosen policy is to typically prefer using the // non-alias form, except for the most obvious and clarifying aliases: cmp, jmp, // call, tst, ret, retl. // Note: cmp is handled in SparcInstrInfo. // jmp/call/ret/retl have special case handling for output in // SparcInstPrinter.cpp // jmp addr -> jmpl addr, %g0 def : InstAlias<"jmp $addr", (JMPLrr G0, MEMrr:$addr), 0>; def : InstAlias<"jmp $addr", (JMPLri G0, MEMri:$addr), 0>; // call addr -> jmpl addr, %o7 def : InstAlias<"call $addr", (JMPLrr O7, MEMrr:$addr), 0>; def : InstAlias<"call $addr", (JMPLri O7, MEMri:$addr), 0>; // tst reg -> orcc %g0, reg, %g0 def : InstAlias<"tst $rs2", (ORCCrr G0, IntRegs:$rs2, G0)>; // ret -> jmpl %i7+8, %g0 (aka RET 8) def : InstAlias<"ret", (RET 8)>; // retl -> jmpl %o7+8, %g0 (aka RETL 8) def : InstAlias<"retl", (RETL 8)>; // restore -> restore %g0, %g0, %g0 def : InstAlias<"restore", (RESTORErr G0, G0, G0)>; // save -> restore %g0, %g0, %g0 def : InstAlias<"save", (SAVErr G0, G0, G0)>; // set value, rd // (turns into a sequence of sethi+or, depending on the value) // def : InstAlias<"set $val, $rd", (ORri IntRegs:$rd, (SETHIi (HI22 imm:$val)), (LO10 imm:$val))>; def SET : AsmPseudoInst<(outs IntRegs:$rd), (ins i32imm:$val), "set $val, $rd">; // not rd -> xnor rd, %g0, rd def : InstAlias<"not $rd", (XNORrr IntRegs:$rd, IntRegs:$rd, G0), 0>; // not reg, rd -> xnor reg, %g0, rd def : InstAlias<"not $rs1, $rd", (XNORrr IntRegs:$rd, IntRegs:$rs1, G0), 0>; // neg rd -> sub %g0, rd, rd def : InstAlias<"neg $rd", (SUBrr IntRegs:$rd, G0, IntRegs:$rd), 0>; // neg reg, rd -> sub %g0, reg, rd def : InstAlias<"neg $rs2, $rd", (SUBrr IntRegs:$rd, G0, IntRegs:$rs2), 0>; // inc rd -> add rd, 1, rd def : InstAlias<"inc $rd", (ADDri IntRegs:$rd, IntRegs:$rd, 1), 0>; // inc simm13, rd -> add rd, simm13, rd def : InstAlias<"inc $simm13, $rd", (ADDri IntRegs:$rd, IntRegs:$rd, i32imm:$simm13), 0>; // inccc rd -> addcc rd, 1, rd def : InstAlias<"inccc $rd", (ADDCCri IntRegs:$rd, IntRegs:$rd, 1), 0>; // inccc simm13, rd -> addcc rd, simm13, rd def : InstAlias<"inccc $simm13, $rd", (ADDCCri IntRegs:$rd, IntRegs:$rd, i32imm:$simm13), 0>; // dec rd -> sub rd, 1, rd def : InstAlias<"dec $rd", (SUBri IntRegs:$rd, IntRegs:$rd, 1), 0>; // dec simm13, rd -> sub rd, simm13, rd def : InstAlias<"dec $simm13, $rd", (SUBri IntRegs:$rd, IntRegs:$rd, i32imm:$simm13), 0>; // deccc rd -> subcc rd, 1, rd def : InstAlias<"deccc $rd", (SUBCCri IntRegs:$rd, IntRegs:$rd, 1), 0>; // deccc simm13, rd -> subcc rd, simm13, rd def : InstAlias<"deccc $simm13, $rd", (SUBCCri IntRegs:$rd, IntRegs:$rd, i32imm:$simm13), 0>; // btst reg_or_imm, reg -> andcc reg,reg_or_imm,%g0 def : InstAlias<"btst $rs2, $rs1", (ANDCCrr G0, IntRegs:$rs1, IntRegs:$rs2), 0>; def : InstAlias<"btst $simm13, $rs1", (ANDCCri G0, IntRegs:$rs1, i32imm:$simm13), 0>; // bset reg_or_imm, rd -> 
or rd,reg_or_imm,rd def : InstAlias<"bset $rs2, $rd", (ORrr IntRegs:$rd, IntRegs:$rd, IntRegs:$rs2), 0>; def : InstAlias<"bset $simm13, $rd", (ORri IntRegs:$rd, IntRegs:$rd, i32imm:$simm13), 0>; // bclr reg_or_imm, rd -> andn rd,reg_or_imm,rd def : InstAlias<"bclr $rs2, $rd", (ANDNrr IntRegs:$rd, IntRegs:$rd, IntRegs:$rs2), 0>; def : InstAlias<"bclr $simm13, $rd", (ANDNri IntRegs:$rd, IntRegs:$rd, i32imm:$simm13), 0>; // btog reg_or_imm, rd -> xor rd,reg_or_imm,rd def : InstAlias<"btog $rs2, $rd", (XORrr IntRegs:$rd, IntRegs:$rd, IntRegs:$rs2), 0>; def : InstAlias<"btog $simm13, $rd", (XORri IntRegs:$rd, IntRegs:$rd, i32imm:$simm13), 0>; // clr rd -> or %g0, %g0, rd def : InstAlias<"clr $rd", (ORrr IntRegs:$rd, G0, G0), 0>; // clr{b,h,} [addr] -> st{b,h,} %g0, [addr] def : InstAlias<"clrb [$addr]", (STBrr MEMrr:$addr, G0), 0>; def : InstAlias<"clrb [$addr]", (STBri MEMri:$addr, G0), 0>; def : InstAlias<"clrh [$addr]", (STHrr MEMrr:$addr, G0), 0>; def : InstAlias<"clrh [$addr]", (STHri MEMri:$addr, G0), 0>; def : InstAlias<"clr [$addr]", (STrr MEMrr:$addr, G0), 0>; def : InstAlias<"clr [$addr]", (STri MEMri:$addr, G0), 0>; // mov reg_or_imm, rd -> or %g0, reg_or_imm, rd def : InstAlias<"mov $rs2, $rd", (ORrr IntRegs:$rd, G0, IntRegs:$rs2)>; def : InstAlias<"mov $simm13, $rd", (ORri IntRegs:$rd, G0, i32imm:$simm13)>; // mov specialreg, rd -> rd specialreg, rd def : InstAlias<"mov $asr, $rd", (RDASR IntRegs:$rd, ASRRegs:$asr), 0>; def : InstAlias<"mov %psr, $rd", (RDPSR IntRegs:$rd), 0>; def : InstAlias<"mov %wim, $rd", (RDWIM IntRegs:$rd), 0>; def : InstAlias<"mov %tbr, $rd", (RDTBR IntRegs:$rd), 0>; // mov reg_or_imm, specialreg -> wr %g0, reg_or_imm, specialreg def : InstAlias<"mov $rs2, $asr", (WRASRrr ASRRegs:$asr, G0, IntRegs:$rs2), 0>; def : InstAlias<"mov $simm13, $asr", (WRASRri ASRRegs:$asr, G0, i32imm:$simm13), 0>; def : InstAlias<"mov $rs2, %psr", (WRPSRrr G0, IntRegs:$rs2), 0>; def : InstAlias<"mov $simm13, %psr", (WRPSRri G0, i32imm:$simm13), 0>; def : InstAlias<"mov $rs2, %wim", (WRWIMrr G0, IntRegs:$rs2), 0>; def : InstAlias<"mov $simm13, %wim", (WRWIMri G0, i32imm:$simm13), 0>; def : InstAlias<"mov $rs2, %tbr", (WRTBRrr G0, IntRegs:$rs2), 0>; def : InstAlias<"mov $simm13, %tbr", (WRTBRri G0, i32imm:$simm13), 0>; // End of Section A.3 // wr reg_or_imm, specialreg -> wr %g0, reg_or_imm, specialreg // (aka: omit the first arg when it's g0. 
This is not in the manual, but is // supported by gnu and solaris as) def : InstAlias<"wr $rs2, $asr", (WRASRrr ASRRegs:$asr, G0, IntRegs:$rs2), 0>; def : InstAlias<"wr $simm13, $asr", (WRASRri ASRRegs:$asr, G0, i32imm:$simm13), 0>; def : InstAlias<"wr $rs2, %psr", (WRPSRrr G0, IntRegs:$rs2), 0>; def : InstAlias<"wr $simm13, %psr", (WRPSRri G0, i32imm:$simm13), 0>; def : InstAlias<"wr $rs2, %wim", (WRWIMrr G0, IntRegs:$rs2), 0>; def : InstAlias<"wr $simm13, %wim", (WRWIMri G0, i32imm:$simm13), 0>; def : InstAlias<"wr $rs2, %tbr", (WRTBRrr G0, IntRegs:$rs2), 0>; def : InstAlias<"wr $simm13, %tbr", (WRTBRri G0, i32imm:$simm13), 0>; // flush -> flush %g0 def : InstAlias<"flush", (FLUSH), 0>; def : MnemonicAlias<"lduw", "ld">, Requires<[HasV9]>; def : MnemonicAlias<"lduwa", "lda">, Requires<[HasV9]>; def : MnemonicAlias<"return", "rett">, Requires<[HasV9]>; def : MnemonicAlias<"addc", "addx">, Requires<[HasV9]>; def : MnemonicAlias<"addccc", "addxcc">, Requires<[HasV9]>; def : MnemonicAlias<"subc", "subx">, Requires<[HasV9]>; def : MnemonicAlias<"subccc", "subxcc">, Requires<[HasV9]>; def : InstAlias<"fcmps $rs1, $rs2", (V9FCMPS FCC0, FPRegs:$rs1, FPRegs:$rs2)>; def : InstAlias<"fcmpd $rs1, $rs2", (V9FCMPD FCC0, DFPRegs:$rs1, DFPRegs:$rs2)>; def : InstAlias<"fcmpq $rs1, $rs2", (V9FCMPQ FCC0, QFPRegs:$rs1, QFPRegs:$rs2)>, Requires<[HasHardQuad]>; def : InstAlias<"fcmpes $rs1, $rs2", (V9FCMPES FCC0, FPRegs:$rs1, FPRegs:$rs2)>; def : InstAlias<"fcmped $rs1, $rs2", (V9FCMPED FCC0, DFPRegs:$rs1, DFPRegs:$rs2)>; def : InstAlias<"fcmpeq $rs1, $rs2", (V9FCMPEQ FCC0, QFPRegs:$rs1, QFPRegs:$rs2)>, Requires<[HasHardQuad]>; // signx rd -> sra rd, %g0, rd def : InstAlias<"signx $rd", (SRArr IntRegs:$rd, IntRegs:$rd, G0), 0>, Requires<[HasV9]>; // signx reg, rd -> sra reg, %g0, rd def : InstAlias<"signx $rs1, $rd", (SRArr IntRegs:$rd, IntRegs:$rs1, G0), 0>, Requires<[HasV9]>; Index: vendor/llvm/dist/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp =================================================================== --- vendor/llvm/dist/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp (revision 295845) +++ vendor/llvm/dist/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp (revision 295846) @@ -1,855 +1,837 @@ //===-- X86AsmBackend.cpp - X86 Assembler Backend -------------------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. 
// //===----------------------------------------------------------------------===// #include "MCTargetDesc/X86BaseInfo.h" #include "MCTargetDesc/X86FixupKinds.h" #include "llvm/ADT/StringSwitch.h" #include "llvm/MC/MCAsmBackend.h" #include "llvm/MC/MCELFObjectWriter.h" #include "llvm/MC/MCExpr.h" #include "llvm/MC/MCFixupKindInfo.h" #include "llvm/MC/MCInst.h" #include "llvm/MC/MCMachObjectWriter.h" #include "llvm/MC/MCObjectWriter.h" #include "llvm/MC/MCRegisterInfo.h" #include "llvm/MC/MCSectionCOFF.h" #include "llvm/MC/MCSectionELF.h" #include "llvm/MC/MCSectionMachO.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/ELF.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/MachO.h" #include "llvm/Support/TargetRegistry.h" #include "llvm/Support/raw_ostream.h" using namespace llvm; static unsigned getFixupKindLog2Size(unsigned Kind) { switch (Kind) { default: llvm_unreachable("invalid fixup kind!"); case FK_PCRel_1: case FK_SecRel_1: case FK_Data_1: return 0; case FK_PCRel_2: case FK_SecRel_2: case FK_Data_2: return 1; case FK_PCRel_4: case X86::reloc_riprel_4byte: case X86::reloc_riprel_4byte_movq_load: case X86::reloc_signed_4byte: case X86::reloc_global_offset_table: case FK_SecRel_4: case FK_Data_4: return 2; case FK_PCRel_8: case FK_SecRel_8: case FK_Data_8: case X86::reloc_global_offset_table8: return 3; } } namespace { class X86ELFObjectWriter : public MCELFObjectTargetWriter { public: X86ELFObjectWriter(bool is64Bit, uint8_t OSABI, uint16_t EMachine, bool HasRelocationAddend, bool foobar) : MCELFObjectTargetWriter(is64Bit, OSABI, EMachine, HasRelocationAddend) {} }; class X86AsmBackend : public MCAsmBackend { const StringRef CPU; bool HasNopl; - uint64_t MaxNopLength; + const uint64_t MaxNopLength; public: - X86AsmBackend(const Target &T, StringRef CPU) : MCAsmBackend(), CPU(CPU) { + X86AsmBackend(const Target &T, StringRef CPU) + : MCAsmBackend(), CPU(CPU), MaxNopLength(CPU == "slm" ? 7 : 15) { HasNopl = CPU != "generic" && CPU != "i386" && CPU != "i486" && CPU != "i586" && CPU != "pentium" && CPU != "pentium-mmx" && CPU != "i686" && CPU != "k6" && CPU != "k6-2" && CPU != "k6-3" && CPU != "geode" && CPU != "winchip-c6" && CPU != "winchip2" && CPU != "c3" && CPU != "c3-2"; - // Max length of true long nop instruction is 15 bytes. - // Max length of long nop replacement instruction is 7 bytes. - // Taking into account SilverMont architecture features max length of nops - // is reduced for it to achieve better performance. - MaxNopLength = (!HasNopl || CPU == "slm") ? 7 : 15; } unsigned getNumFixupKinds() const override { return X86::NumTargetFixupKinds; } const MCFixupKindInfo &getFixupKindInfo(MCFixupKind Kind) const override { const static MCFixupKindInfo Infos[X86::NumTargetFixupKinds] = { { "reloc_riprel_4byte", 0, 4 * 8, MCFixupKindInfo::FKF_IsPCRel }, { "reloc_riprel_4byte_movq_load", 0, 4 * 8, MCFixupKindInfo::FKF_IsPCRel}, { "reloc_signed_4byte", 0, 4 * 8, 0}, { "reloc_global_offset_table", 0, 4 * 8, 0} }; if (Kind < FirstTargetFixupKind) return MCAsmBackend::getFixupKindInfo(Kind); assert(unsigned(Kind - FirstTargetFixupKind) < getNumFixupKinds() && "Invalid kind!"); return Infos[Kind - FirstTargetFixupKind]; } void applyFixup(const MCFixup &Fixup, char *Data, unsigned DataSize, uint64_t Value, bool IsPCRel) const override { unsigned Size = 1 << getFixupKindLog2Size(Fixup.getKind()); assert(Fixup.getOffset() + Size <= DataSize && "Invalid fixup offset!"); // Check that upper bits are either all zeros or all ones.
// Specifically ignore overflow/underflow as long as the leakage is // limited to the lower bits. This is to remain compatible with // other assemblers. assert(isIntN(Size * 8 + 1, Value) && "Value does not fit in the Fixup field"); for (unsigned i = 0; i != Size; ++i) Data[Fixup.getOffset() + i] = uint8_t(Value >> (i * 8)); } bool mayNeedRelaxation(const MCInst &Inst) const override; bool fixupNeedsRelaxation(const MCFixup &Fixup, uint64_t Value, const MCRelaxableFragment *DF, const MCAsmLayout &Layout) const override; void relaxInstruction(const MCInst &Inst, MCInst &Res) const override; bool writeNopData(uint64_t Count, MCObjectWriter *OW) const override; }; } // end anonymous namespace static unsigned getRelaxedOpcodeBranch(unsigned Op) { switch (Op) { default: return Op; case X86::JAE_1: return X86::JAE_4; case X86::JA_1: return X86::JA_4; case X86::JBE_1: return X86::JBE_4; case X86::JB_1: return X86::JB_4; case X86::JE_1: return X86::JE_4; case X86::JGE_1: return X86::JGE_4; case X86::JG_1: return X86::JG_4; case X86::JLE_1: return X86::JLE_4; case X86::JL_1: return X86::JL_4; case X86::JMP_1: return X86::JMP_4; case X86::JNE_1: return X86::JNE_4; case X86::JNO_1: return X86::JNO_4; case X86::JNP_1: return X86::JNP_4; case X86::JNS_1: return X86::JNS_4; case X86::JO_1: return X86::JO_4; case X86::JP_1: return X86::JP_4; case X86::JS_1: return X86::JS_4; } } static unsigned getRelaxedOpcodeArith(unsigned Op) { switch (Op) { default: return Op; // IMUL case X86::IMUL16rri8: return X86::IMUL16rri; case X86::IMUL16rmi8: return X86::IMUL16rmi; case X86::IMUL32rri8: return X86::IMUL32rri; case X86::IMUL32rmi8: return X86::IMUL32rmi; case X86::IMUL64rri8: return X86::IMUL64rri32; case X86::IMUL64rmi8: return X86::IMUL64rmi32; // AND case X86::AND16ri8: return X86::AND16ri; case X86::AND16mi8: return X86::AND16mi; case X86::AND32ri8: return X86::AND32ri; case X86::AND32mi8: return X86::AND32mi; case X86::AND64ri8: return X86::AND64ri32; case X86::AND64mi8: return X86::AND64mi32; // OR case X86::OR16ri8: return X86::OR16ri; case X86::OR16mi8: return X86::OR16mi; case X86::OR32ri8: return X86::OR32ri; case X86::OR32mi8: return X86::OR32mi; case X86::OR64ri8: return X86::OR64ri32; case X86::OR64mi8: return X86::OR64mi32; // XOR case X86::XOR16ri8: return X86::XOR16ri; case X86::XOR16mi8: return X86::XOR16mi; case X86::XOR32ri8: return X86::XOR32ri; case X86::XOR32mi8: return X86::XOR32mi; case X86::XOR64ri8: return X86::XOR64ri32; case X86::XOR64mi8: return X86::XOR64mi32; // ADD case X86::ADD16ri8: return X86::ADD16ri; case X86::ADD16mi8: return X86::ADD16mi; case X86::ADD32ri8: return X86::ADD32ri; case X86::ADD32mi8: return X86::ADD32mi; case X86::ADD64ri8: return X86::ADD64ri32; case X86::ADD64mi8: return X86::ADD64mi32; // ADC case X86::ADC16ri8: return X86::ADC16ri; case X86::ADC16mi8: return X86::ADC16mi; case X86::ADC32ri8: return X86::ADC32ri; case X86::ADC32mi8: return X86::ADC32mi; case X86::ADC64ri8: return X86::ADC64ri32; case X86::ADC64mi8: return X86::ADC64mi32; // SUB case X86::SUB16ri8: return X86::SUB16ri; case X86::SUB16mi8: return X86::SUB16mi; case X86::SUB32ri8: return X86::SUB32ri; case X86::SUB32mi8: return X86::SUB32mi; case X86::SUB64ri8: return X86::SUB64ri32; case X86::SUB64mi8: return X86::SUB64mi32; // SBB case X86::SBB16ri8: return X86::SBB16ri; case X86::SBB16mi8: return X86::SBB16mi; case X86::SBB32ri8: return X86::SBB32ri; case X86::SBB32mi8: return X86::SBB32mi; case X86::SBB64ri8: return X86::SBB64ri32; case X86::SBB64mi8: return X86::SBB64mi32; // CMP case 
X86::CMP16ri8: return X86::CMP16ri; case X86::CMP16mi8: return X86::CMP16mi; case X86::CMP32ri8: return X86::CMP32ri; case X86::CMP32mi8: return X86::CMP32mi; case X86::CMP64ri8: return X86::CMP64ri32; case X86::CMP64mi8: return X86::CMP64mi32; // PUSH case X86::PUSH32i8: return X86::PUSHi32; case X86::PUSH16i8: return X86::PUSHi16; case X86::PUSH64i8: return X86::PUSH64i32; } } static unsigned getRelaxedOpcode(unsigned Op) { unsigned R = getRelaxedOpcodeArith(Op); if (R != Op) return R; return getRelaxedOpcodeBranch(Op); } bool X86AsmBackend::mayNeedRelaxation(const MCInst &Inst) const { // Branches can always be relaxed. if (getRelaxedOpcodeBranch(Inst.getOpcode()) != Inst.getOpcode()) return true; // Check if this instruction is ever relaxable. if (getRelaxedOpcodeArith(Inst.getOpcode()) == Inst.getOpcode()) return false; // Check if the relaxable operand has an expression. For the current set of // relaxable instructions, the relaxable operand is always the last operand. unsigned RelaxableOp = Inst.getNumOperands() - 1; if (Inst.getOperand(RelaxableOp).isExpr()) return true; return false; } bool X86AsmBackend::fixupNeedsRelaxation(const MCFixup &Fixup, uint64_t Value, const MCRelaxableFragment *DF, const MCAsmLayout &Layout) const { // Relax if the value is too big for a (signed) i8. return int64_t(Value) != int64_t(int8_t(Value)); } // FIXME: Can tblgen help at all here to verify there aren't other instructions // we can relax? void X86AsmBackend::relaxInstruction(const MCInst &Inst, MCInst &Res) const { // The only relaxation X86 does is from a 1-byte pcrel to a 4-byte pcrel. unsigned RelaxedOp = getRelaxedOpcode(Inst.getOpcode()); if (RelaxedOp == Inst.getOpcode()) { SmallString<256> Tmp; raw_svector_ostream OS(Tmp); Inst.dump_pretty(OS); OS << "\n"; report_fatal_error("unexpected instruction to relax: " + OS.str()); } Res = Inst; Res.setOpcode(RelaxedOp); } /// \brief Write a sequence of optimal nops to the output, covering \p Count /// bytes. /// \return - true on success, false on failure bool X86AsmBackend::writeNopData(uint64_t Count, MCObjectWriter *OW) const { - static const uint8_t TrueNops[10][10] = { + static const uint8_t Nops[10][10] = { // nop {0x90}, // xchg %ax,%ax {0x66, 0x90}, // nopl (%[re]ax) {0x0f, 0x1f, 0x00}, // nopl 0(%[re]ax) {0x0f, 0x1f, 0x40, 0x00}, // nopl 0(%[re]ax,%[re]ax,1) {0x0f, 0x1f, 0x44, 0x00, 0x00}, // nopw 0(%[re]ax,%[re]ax,1) {0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00}, // nopl 0L(%[re]ax) {0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00}, // nopl 0L(%[re]ax,%[re]ax,1) {0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00}, // nopw 0L(%[re]ax,%[re]ax,1) {0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00}, // nopw %cs:0L(%[re]ax,%[re]ax,1) {0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00}, }; - // Alternative nop instructions for CPUs which don't support long nops. - static const uint8_t AltNops[7][10] = { - // nop - {0x90}, - // xchg %ax,%ax - {0x66, 0x90}, - // lea 0x0(%esi),%esi - {0x8d, 0x76, 0x00}, - // lea 0x0(%esi),%esi - {0x8d, 0x74, 0x26, 0x00}, - // nop + lea 0x0(%esi),%esi - {0x90, 0x8d, 0x74, 0x26, 0x00}, - // lea 0x0(%esi),%esi - {0x8d, 0xb6, 0x00, 0x00, 0x00, 0x00 }, - // lea 0x0(%esi),%esi - {0x8d, 0xb4, 0x26, 0x00, 0x00, 0x00, 0x00}, - }; + // This CPU doesn't support long nops. If needed, add more. + // FIXME: Can we get this from the subtarget somehow? + // FIXME: We could generate something better than plain 0x90.
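+ // (Illustrative: with this fallback a 4-byte pad is emitted as 0x90 0x90
+ // 0x90 0x90, i.e. four one-byte nops; correct, just slower to decode than
+ // a single long nop would be.)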
+ if (!HasNopl) { + for (uint64_t i = 0; i < Count; ++i) + OW->write8(0x90); + return true; + } - // Select the right NOP table. - // FIXME: Can we get if CPU supports long nops from the subtarget somehow? - const uint8_t (*Nops)[10] = HasNopl ? TrueNops : AltNops; - assert(HasNopl || MaxNopLength <= 7); - - // Emit as many largest nops as needed, then emit a nop of the remaining - // length. + // 15 is the longest single nop instruction. Emit as many 15-byte nops as + // needed, then emit a nop of the remaining length. do { const uint8_t ThisNopLength = (uint8_t) std::min(Count, MaxNopLength); const uint8_t Prefixes = ThisNopLength <= 10 ? 0 : ThisNopLength - 10; for (uint8_t i = 0; i < Prefixes; i++) OW->write8(0x66); const uint8_t Rest = ThisNopLength - Prefixes; for (uint8_t i = 0; i < Rest; i++) OW->write8(Nops[Rest - 1][i]); Count -= ThisNopLength; } while (Count != 0); return true; } /* *** */ namespace { class ELFX86AsmBackend : public X86AsmBackend { public: uint8_t OSABI; ELFX86AsmBackend(const Target &T, uint8_t OSABI, StringRef CPU) : X86AsmBackend(T, CPU), OSABI(OSABI) {} }; class ELFX86_32AsmBackend : public ELFX86AsmBackend { public: ELFX86_32AsmBackend(const Target &T, uint8_t OSABI, StringRef CPU) : ELFX86AsmBackend(T, OSABI, CPU) {} MCObjectWriter *createObjectWriter(raw_pwrite_stream &OS) const override { return createX86ELFObjectWriter(OS, /*IsELF64*/ false, OSABI, ELF::EM_386); } }; class ELFX86_X32AsmBackend : public ELFX86AsmBackend { public: ELFX86_X32AsmBackend(const Target &T, uint8_t OSABI, StringRef CPU) : ELFX86AsmBackend(T, OSABI, CPU) {} MCObjectWriter *createObjectWriter(raw_pwrite_stream &OS) const override { return createX86ELFObjectWriter(OS, /*IsELF64*/ false, OSABI, ELF::EM_X86_64); } }; class ELFX86_IAMCUAsmBackend : public ELFX86AsmBackend { public: ELFX86_IAMCUAsmBackend(const Target &T, uint8_t OSABI, StringRef CPU) : ELFX86AsmBackend(T, OSABI, CPU) {} MCObjectWriter *createObjectWriter(raw_pwrite_stream &OS) const override { return createX86ELFObjectWriter(OS, /*IsELF64*/ false, OSABI, ELF::EM_IAMCU); } }; class ELFX86_64AsmBackend : public ELFX86AsmBackend { public: ELFX86_64AsmBackend(const Target &T, uint8_t OSABI, StringRef CPU) : ELFX86AsmBackend(T, OSABI, CPU) {} MCObjectWriter *createObjectWriter(raw_pwrite_stream &OS) const override { return createX86ELFObjectWriter(OS, /*IsELF64*/ true, OSABI, ELF::EM_X86_64); } }; class WindowsX86AsmBackend : public X86AsmBackend { bool Is64Bit; public: WindowsX86AsmBackend(const Target &T, bool is64Bit, StringRef CPU) : X86AsmBackend(T, CPU) , Is64Bit(is64Bit) { } MCObjectWriter *createObjectWriter(raw_pwrite_stream &OS) const override { return createX86WinCOFFObjectWriter(OS, Is64Bit); } }; namespace CU { /// Compact unwind encoding values. enum CompactUnwindEncodings { /// [RE]BP based frame where [RE]BP is pushed on the stack immediately after /// the return address, then [RE]SP is moved to [RE]BP. UNWIND_MODE_BP_FRAME = 0x01000000, /// A frameless function with a small constant stack size. UNWIND_MODE_STACK_IMMD = 0x02000000, /// A frameless function with a large constant stack size. UNWIND_MODE_STACK_IND = 0x03000000, /// No compact unwind encoding is available. UNWIND_MODE_DWARF = 0x04000000, /// Mask for encoding the frame registers. UNWIND_BP_FRAME_REGISTERS = 0x00007FFF, /// Mask for encoding the frameless registers.
UNWIND_FRAMELESS_STACK_REG_PERMUTATION = 0x000003FF }; } // end CU namespace class DarwinX86AsmBackend : public X86AsmBackend { const MCRegisterInfo &MRI; /// \brief Number of registers that can be saved in a compact unwind encoding. enum { CU_NUM_SAVED_REGS = 6 }; mutable unsigned SavedRegs[CU_NUM_SAVED_REGS]; bool Is64Bit; unsigned OffsetSize; ///< Offset of a "push" instruction. unsigned MoveInstrSize; ///< Size of a "move" instruction. unsigned StackDivide; ///< Amount to adjust stack size by. protected: /// \brief Size of a "push" instruction for the given register. unsigned PushInstrSize(unsigned Reg) const { switch (Reg) { case X86::EBX: case X86::ECX: case X86::EDX: case X86::EDI: case X86::ESI: case X86::EBP: case X86::RBX: case X86::RBP: return 1; case X86::R12: case X86::R13: case X86::R14: case X86::R15: return 2; } return 1; } /// \brief Implementation of algorithm to generate the compact unwind encoding /// for the CFI instructions. uint32_t generateCompactUnwindEncodingImpl(ArrayRef Instrs) const { if (Instrs.empty()) return 0; // Reset the saved registers. unsigned SavedRegIdx = 0; memset(SavedRegs, 0, sizeof(SavedRegs)); bool HasFP = false; // Encode that we are using EBP/RBP as the frame pointer. uint32_t CompactUnwindEncoding = 0; unsigned SubtractInstrIdx = Is64Bit ? 3 : 2; unsigned InstrOffset = 0; unsigned StackAdjust = 0; unsigned StackSize = 0; unsigned PrevStackSize = 0; unsigned NumDefCFAOffsets = 0; for (unsigned i = 0, e = Instrs.size(); i != e; ++i) { const MCCFIInstruction &Inst = Instrs[i]; switch (Inst.getOperation()) { default: // Any other CFI directives indicate a frame that we aren't prepared // to represent via compact unwind, so just bail out. return 0; case MCCFIInstruction::OpDefCfaRegister: { // Defines a frame pointer. E.g. // // movq %rsp, %rbp // L0: // .cfi_def_cfa_register %rbp // HasFP = true; assert(MRI.getLLVMRegNum(Inst.getRegister(), true) == (Is64Bit ? X86::RBP : X86::EBP) && "Invalid frame pointer!"); // Reset the counts. memset(SavedRegs, 0, sizeof(SavedRegs)); StackAdjust = 0; SavedRegIdx = 0; InstrOffset += MoveInstrSize; break; } case MCCFIInstruction::OpDefCfaOffset: { // Defines a new offset for the CFA. E.g. // // With frame: // // pushq %rbp // L0: // .cfi_def_cfa_offset 16 // // Without frame: // // subq $72, %rsp // L0: // .cfi_def_cfa_offset 80 // PrevStackSize = StackSize; StackSize = std::abs(Inst.getOffset()) / StackDivide; ++NumDefCFAOffsets; break; } case MCCFIInstruction::OpOffset: { // Defines a "push" of a callee-saved register. E.g. // // pushq %r15 // pushq %r14 // pushq %rbx // L0: // subq $120, %rsp // L1: // .cfi_offset %rbx, -40 // .cfi_offset %r14, -32 // .cfi_offset %r15, -24 // if (SavedRegIdx == CU_NUM_SAVED_REGS) // If there are too many saved registers, we cannot use a compact // unwind encoding. return CU::UNWIND_MODE_DWARF; unsigned Reg = MRI.getLLVMRegNum(Inst.getRegister(), true); SavedRegs[SavedRegIdx++] = Reg; StackAdjust += OffsetSize; InstrOffset += PushInstrSize(Reg); break; } } } StackAdjust /= StackDivide; if (HasFP) { if ((StackAdjust & 0xFF) != StackAdjust) // Offset was too big for a compact unwind encoding. return CU::UNWIND_MODE_DWARF; // Get the encoding of the saved registers when we have a frame pointer. 
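// (Here, with a frame pointer, the saved registers are packed three bits
// apiece in save order; the permutation scheme described further below is
// only for the frameless case.)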
uint32_t RegEnc = encodeCompactUnwindRegistersWithFrame(); if (RegEnc == ~0U) return CU::UNWIND_MODE_DWARF; CompactUnwindEncoding |= CU::UNWIND_MODE_BP_FRAME; CompactUnwindEncoding |= (StackAdjust & 0xFF) << 16; CompactUnwindEncoding |= RegEnc & CU::UNWIND_BP_FRAME_REGISTERS; } else { // If the amount of the stack allocation is the size of a register, then // we "push" the RAX/EAX register onto the stack instead of adjusting the // stack pointer with a SUB instruction. We don't support the push of the // RAX/EAX register with compact unwind. So we check for that situation // here. if ((NumDefCFAOffsets == SavedRegIdx + 1 && StackSize - PrevStackSize == 1) || (Instrs.size() == 1 && NumDefCFAOffsets == 1 && StackSize == 2)) return CU::UNWIND_MODE_DWARF; SubtractInstrIdx += InstrOffset; ++StackAdjust; if ((StackSize & 0xFF) == StackSize) { // Frameless stack with a small stack size. CompactUnwindEncoding |= CU::UNWIND_MODE_STACK_IMMD; // Encode the stack size. CompactUnwindEncoding |= (StackSize & 0xFF) << 16; } else { if ((StackAdjust & 0x7) != StackAdjust) // The extra stack adjustments are too big for us to handle. return CU::UNWIND_MODE_DWARF; // Frameless stack with an offset too large for us to encode compactly. CompactUnwindEncoding |= CU::UNWIND_MODE_STACK_IND; // Encode the offset to the nnnnnn value in the 'subl $nnnnnn, ESP' // instruction. CompactUnwindEncoding |= (SubtractInstrIdx & 0xFF) << 16; // Encode any extra stack adjustments (done via push // instructions). CompactUnwindEncoding |= (StackAdjust & 0x7) << 13; } // Encode the number of registers saved. (Reverse the list first.) std::reverse(&SavedRegs[0], &SavedRegs[SavedRegIdx]); CompactUnwindEncoding |= (SavedRegIdx & 0x7) << 10; // Get the encoding of the saved registers when we don't have a frame // pointer. uint32_t RegEnc = encodeCompactUnwindRegistersWithoutFrame(SavedRegIdx); if (RegEnc == ~0U) return CU::UNWIND_MODE_DWARF; // Encode the register encoding. CompactUnwindEncoding |= RegEnc & CU::UNWIND_FRAMELESS_STACK_REG_PERMUTATION; } return CompactUnwindEncoding; } private: /// \brief Get the compact unwind number for a given register. The number /// corresponds to the enum lists in compact_unwind_encoding.h. int getCompactUnwindRegNum(unsigned Reg) const { static const MCPhysReg CU32BitRegs[7] = { X86::EBX, X86::ECX, X86::EDX, X86::EDI, X86::ESI, X86::EBP, 0 }; static const MCPhysReg CU64BitRegs[] = { X86::RBX, X86::R12, X86::R13, X86::R14, X86::R15, X86::RBP, 0 }; const MCPhysReg *CURegs = Is64Bit ? CU64BitRegs : CU32BitRegs; for (int Idx = 1; *CURegs; ++CURegs, ++Idx) if (*CURegs == Reg) return Idx; return -1; } /// \brief Return the registers encoded for a compact encoding with a frame /// pointer. uint32_t encodeCompactUnwindRegistersWithFrame() const { // Encode the registers in the order they were saved --- 3 bits per // register. The list of saved registers is assumed to be in reverse // order. The registers are numbered from 1 to CU_NUM_SAVED_REGS. uint32_t RegEnc = 0; for (int i = 0, Idx = 0; i != CU_NUM_SAVED_REGS; ++i) { unsigned Reg = SavedRegs[i]; if (Reg == 0) break; int CURegNum = getCompactUnwindRegNum(Reg); if (CURegNum == -1) return ~0U; // Encode the 3-bit register number in order, skipping over 3 bits for // each register. RegEnc |= (CURegNum & 0x7) << (Idx++ * 3); } assert((RegEnc & 0x3FFFF) == RegEnc && "Invalid compact register encoding!"); return RegEnc; } /// \brief Create the permutation encoding used with frameless stacks.
It is /// passed the number of registers to be saved and an array of the registers /// saved. uint32_t encodeCompactUnwindRegistersWithoutFrame(unsigned RegCount) const { // The saved registers are numbered from 1 to 6. In order to encode the // order in which they were saved, we re-number them according to their // place in the register order. The re-numbering is relative to the last // re-numbered register. E.g., if we have registers {6, 2, 4, 5} saved in // that order: // // Orig Re-Num // ---- ------ // 6 6 // 2 2 // 4 3 // 5 3 // for (unsigned i = 0; i < RegCount; ++i) { int CUReg = getCompactUnwindRegNum(SavedRegs[i]); if (CUReg == -1) return ~0U; SavedRegs[i] = CUReg; } // Reverse the list. std::reverse(&SavedRegs[0], &SavedRegs[CU_NUM_SAVED_REGS]); uint32_t RenumRegs[CU_NUM_SAVED_REGS]; for (unsigned i = CU_NUM_SAVED_REGS - RegCount; i < CU_NUM_SAVED_REGS; ++i){ unsigned Countless = 0; for (unsigned j = CU_NUM_SAVED_REGS - RegCount; j < i; ++j) if (SavedRegs[j] < SavedRegs[i]) ++Countless; RenumRegs[i] = SavedRegs[i] - Countless - 1; } // Take the renumbered values and encode them into a 10-bit number. uint32_t permutationEncoding = 0; switch (RegCount) { case 6: permutationEncoding |= 120 * RenumRegs[0] + 24 * RenumRegs[1] + 6 * RenumRegs[2] + 2 * RenumRegs[3] + RenumRegs[4]; break; case 5: permutationEncoding |= 120 * RenumRegs[1] + 24 * RenumRegs[2] + 6 * RenumRegs[3] + 2 * RenumRegs[4] + RenumRegs[5]; break; case 4: permutationEncoding |= 60 * RenumRegs[2] + 12 * RenumRegs[3] + 3 * RenumRegs[4] + RenumRegs[5]; break; case 3: permutationEncoding |= 20 * RenumRegs[3] + 4 * RenumRegs[4] + RenumRegs[5]; break; case 2: permutationEncoding |= 5 * RenumRegs[4] + RenumRegs[5]; break; case 1: permutationEncoding |= RenumRegs[5]; break; } assert((permutationEncoding & 0x3FF) == permutationEncoding && "Invalid compact register encoding!"); return permutationEncoding; } public: DarwinX86AsmBackend(const Target &T, const MCRegisterInfo &MRI, StringRef CPU, bool Is64Bit) : X86AsmBackend(T, CPU), MRI(MRI), Is64Bit(Is64Bit) { memset(SavedRegs, 0, sizeof(SavedRegs)); OffsetSize = Is64Bit ? 8 : 4; MoveInstrSize = Is64Bit ? 3 : 2; StackDivide = Is64Bit ? 8 : 4; } }; class DarwinX86_32AsmBackend : public DarwinX86AsmBackend { public: DarwinX86_32AsmBackend(const Target &T, const MCRegisterInfo &MRI, StringRef CPU) : DarwinX86AsmBackend(T, MRI, CPU, false) {} MCObjectWriter *createObjectWriter(raw_pwrite_stream &OS) const override { return createX86MachObjectWriter(OS, /*Is64Bit=*/false, MachO::CPU_TYPE_I386, MachO::CPU_SUBTYPE_I386_ALL); } /// \brief Generate the compact unwind encoding for the CFI instructions. uint32_t generateCompactUnwindEncoding( ArrayRef Instrs) const override { return generateCompactUnwindEncodingImpl(Instrs); } }; class DarwinX86_64AsmBackend : public DarwinX86AsmBackend { const MachO::CPUSubTypeX86 Subtype; public: DarwinX86_64AsmBackend(const Target &T, const MCRegisterInfo &MRI, StringRef CPU, MachO::CPUSubTypeX86 st) : DarwinX86AsmBackend(T, MRI, CPU, true), Subtype(st) {} MCObjectWriter *createObjectWriter(raw_pwrite_stream &OS) const override { return createX86MachObjectWriter(OS, /*Is64Bit=*/true, MachO::CPU_TYPE_X86_64, Subtype); } /// \brief Generate the compact unwind encoding for the CFI instructions. 
uint32_t generateCompactUnwindEncoding( ArrayRef Instrs) const override { return generateCompactUnwindEncodingImpl(Instrs); } }; } // end anonymous namespace MCAsmBackend *llvm::createX86_32AsmBackend(const Target &T, const MCRegisterInfo &MRI, const Triple &TheTriple, StringRef CPU) { if (TheTriple.isOSBinFormatMachO()) return new DarwinX86_32AsmBackend(T, MRI, CPU); if (TheTriple.isOSWindows() && !TheTriple.isOSBinFormatELF()) return new WindowsX86AsmBackend(T, false, CPU); uint8_t OSABI = MCELFObjectTargetWriter::getOSABI(TheTriple.getOS()); if (TheTriple.isOSIAMCU()) return new ELFX86_IAMCUAsmBackend(T, OSABI, CPU); return new ELFX86_32AsmBackend(T, OSABI, CPU); } MCAsmBackend *llvm::createX86_64AsmBackend(const Target &T, const MCRegisterInfo &MRI, const Triple &TheTriple, StringRef CPU) { if (TheTriple.isOSBinFormatMachO()) { MachO::CPUSubTypeX86 CS = StringSwitch(TheTriple.getArchName()) .Case("x86_64h", MachO::CPU_SUBTYPE_X86_64_H) .Default(MachO::CPU_SUBTYPE_X86_64_ALL); return new DarwinX86_64AsmBackend(T, MRI, CPU, CS); } if (TheTriple.isOSWindows() && !TheTriple.isOSBinFormatELF()) return new WindowsX86AsmBackend(T, true, CPU); uint8_t OSABI = MCELFObjectTargetWriter::getOSABI(TheTriple.getOS()); if (TheTriple.getEnvironment() == Triple::GNUX32) return new ELFX86_X32AsmBackend(T, OSABI, CPU); return new ELFX86_64AsmBackend(T, OSABI, CPU); } Index: vendor/llvm/dist/lib/Target/X86/X86FrameLowering.cpp =================================================================== --- vendor/llvm/dist/lib/Target/X86/X86FrameLowering.cpp (revision 295845) +++ vendor/llvm/dist/lib/Target/X86/X86FrameLowering.cpp (revision 295846) @@ -1,2715 +1,2714 @@ //===-- X86FrameLowering.cpp - X86 Frame Information ----------------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file contains the X86 implementation of TargetFrameLowering class. // //===----------------------------------------------------------------------===// #include "X86FrameLowering.h" #include "X86InstrBuilder.h" #include "X86InstrInfo.h" #include "X86MachineFunctionInfo.h" #include "X86Subtarget.h" #include "X86TargetMachine.h" #include "llvm/ADT/SmallSet.h" #include "llvm/Analysis/EHPersonalities.h" #include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/CodeGen/MachineModuleInfo.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/WinEHFuncInfo.h" #include "llvm/IR/DataLayout.h" #include "llvm/IR/Function.h" #include "llvm/MC/MCAsmInfo.h" #include "llvm/MC/MCSymbol.h" #include "llvm/Target/TargetOptions.h" #include "llvm/Support/Debug.h" #include using namespace llvm; X86FrameLowering::X86FrameLowering(const X86Subtarget &STI, unsigned StackAlignOverride) : TargetFrameLowering(StackGrowsDown, StackAlignOverride, STI.is64Bit() ? -8 : -4), STI(STI), TII(*STI.getInstrInfo()), TRI(STI.getRegisterInfo()) { // Cache a bunch of frame-related predicates for this subtarget. SlotSize = TRI->getSlotSize(); Is64Bit = STI.is64Bit(); IsLP64 = STI.isTarget64BitLP64(); // standard x86_64 and NaCl use 64-bit frame/stack pointers, x32 - 32-bit. 
Uses64BitFramePtr = STI.isTarget64BitLP64() || STI.isTargetNaCl64(); StackPtr = TRI->getStackRegister(); } bool X86FrameLowering::hasReservedCallFrame(const MachineFunction &MF) const { return !MF.getFrameInfo()->hasVarSizedObjects() && !MF.getInfo()->getHasPushSequences(); } /// canSimplifyCallFramePseudos - If there is a reserved call frame, the /// call frame pseudos can be simplified. Having a FP, as in the default /// implementation, is not sufficient here since we can't always use it. /// Use a more nuanced condition. bool X86FrameLowering::canSimplifyCallFramePseudos(const MachineFunction &MF) const { return hasReservedCallFrame(MF) || (hasFP(MF) && !TRI->needsStackRealignment(MF)) || TRI->hasBasePointer(MF); } // needsFrameIndexResolution - Do we need to perform FI resolution for // this function. Normally, this is required only when the function // has any stack objects. However, FI resolution actually has another job, // not apparent from the title - it resolves callframesetup/destroy // that were not simplified earlier. // So, this is required for x86 functions that have push sequences even // when there are no stack objects. bool X86FrameLowering::needsFrameIndexResolution(const MachineFunction &MF) const { return MF.getFrameInfo()->hasStackObjects() || MF.getInfo()->getHasPushSequences(); } /// hasFP - Return true if the specified function should have a dedicated frame /// pointer register. This is true if the function has variable sized allocas /// or if frame pointer elimination is disabled. bool X86FrameLowering::hasFP(const MachineFunction &MF) const { const MachineFrameInfo *MFI = MF.getFrameInfo(); const MachineModuleInfo &MMI = MF.getMMI(); return (MF.getTarget().Options.DisableFramePointerElim(MF) || TRI->needsStackRealignment(MF) || MFI->hasVarSizedObjects() || MFI->isFrameAddressTaken() || MFI->hasOpaqueSPAdjustment() || MF.getInfo()->getForceFramePointer() || MMI.callsUnwindInit() || MMI.hasEHFunclets() || MMI.callsEHReturn() || MFI->hasStackMap() || MFI->hasPatchPoint() || MFI->hasCopyImplyingStackAdjustment()); } static unsigned getSUBriOpcode(unsigned IsLP64, int64_t Imm) { if (IsLP64) { if (isInt<8>(Imm)) return X86::SUB64ri8; return X86::SUB64ri32; } else { if (isInt<8>(Imm)) return X86::SUB32ri8; return X86::SUB32ri; } } static unsigned getADDriOpcode(unsigned IsLP64, int64_t Imm) { if (IsLP64) { if (isInt<8>(Imm)) return X86::ADD64ri8; return X86::ADD64ri32; } else { if (isInt<8>(Imm)) return X86::ADD32ri8; return X86::ADD32ri; } } static unsigned getSUBrrOpcode(unsigned isLP64) { return isLP64 ? X86::SUB64rr : X86::SUB32rr; } static unsigned getADDrrOpcode(unsigned isLP64) { return isLP64 ? X86::ADD64rr : X86::ADD32rr; } static unsigned getANDriOpcode(bool IsLP64, int64_t Imm) { if (IsLP64) { if (isInt<8>(Imm)) return X86::AND64ri8; return X86::AND64ri32; } if (isInt<8>(Imm)) return X86::AND32ri8; return X86::AND32ri; } static unsigned getLEArOpcode(unsigned IsLP64) { return IsLP64 ? X86::LEA64r : X86::LEA32r; } /// findDeadCallerSavedReg - Return a caller-saved register that isn't live /// when it reaches the "return" instruction. We can then pop a stack object /// to this register without worry about clobbering it. 
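/// For example (an illustration, not an exhaustive contract): on a RETQ that
/// returns a value, RAX and its aliases appear as implicit uses, so a
/// different caller-saved register such as RCX or RDX is handed back instead.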
static unsigned findDeadCallerSavedReg(MachineBasicBlock &MBB, MachineBasicBlock::iterator &MBBI, const X86RegisterInfo *TRI, bool Is64Bit) { const MachineFunction *MF = MBB.getParent(); const Function *F = MF->getFunction(); if (!F || MF->getMMI().callsEHReturn()) return 0; const TargetRegisterClass &AvailableRegs = *TRI->getGPRsForTailCall(*MF); unsigned Opc = MBBI->getOpcode(); switch (Opc) { default: return 0; case X86::RETL: case X86::RETQ: case X86::RETIL: case X86::RETIQ: case X86::TCRETURNdi: case X86::TCRETURNri: case X86::TCRETURNmi: case X86::TCRETURNdi64: case X86::TCRETURNri64: case X86::TCRETURNmi64: case X86::EH_RETURN: case X86::EH_RETURN64: { SmallSet Uses; for (unsigned i = 0, e = MBBI->getNumOperands(); i != e; ++i) { MachineOperand &MO = MBBI->getOperand(i); if (!MO.isReg() || MO.isDef()) continue; unsigned Reg = MO.getReg(); if (!Reg) continue; for (MCRegAliasIterator AI(Reg, TRI, true); AI.isValid(); ++AI) Uses.insert(*AI); } for (auto CS : AvailableRegs) if (!Uses.count(CS) && CS != X86::RIP) return CS; } } return 0; } -static bool isEAXLiveIn(MachineFunction &MF) { - for (MachineRegisterInfo::livein_iterator II = MF.getRegInfo().livein_begin(), - EE = MF.getRegInfo().livein_end(); II != EE; ++II) { - unsigned Reg = II->first; +static bool isEAXLiveIn(MachineBasicBlock &MBB) { + for (MachineBasicBlock::RegisterMaskPair RegMask : MBB.liveins()) { + unsigned Reg = RegMask.PhysReg; if (Reg == X86::RAX || Reg == X86::EAX || Reg == X86::AX || Reg == X86::AH || Reg == X86::AL) return true; } return false; } /// Check if the flags need to be preserved before the terminators. /// This would be the case if the eflags is live-in of the region /// composed by the terminators or live-out of that region, without /// being defined by a terminator. static bool flagsNeedToBePreservedBeforeTheTerminators(const MachineBasicBlock &MBB) { for (const MachineInstr &MI : MBB.terminators()) { bool BreakNext = false; for (const MachineOperand &MO : MI.operands()) { if (!MO.isReg()) continue; unsigned Reg = MO.getReg(); if (Reg != X86::EFLAGS) continue; // This terminator needs an eflags that is not defined // by an earlier terminator: // EFLAGS is live-in of the region composed by the terminators. if (!MO.isDef()) return true; // This terminator defines the eflags, i.e., we don't need to preserve it. // However, we still need to check this specific terminator does not // read a live-in value. BreakNext = true; } // We found a definition of the eflags, no need to preserve them. if (BreakNext) return false; } // None of the terminators use or define the eflags. // Check if they are live-out; that would imply we need to preserve them. for (const MachineBasicBlock *Succ : MBB.successors()) if (Succ->isLiveIn(X86::EFLAGS)) return true; return false; } /// emitSPUpdate - Emit a series of instructions to increment / decrement the /// stack pointer by a constant value. void X86FrameLowering::emitSPUpdate(MachineBasicBlock &MBB, MachineBasicBlock::iterator &MBBI, int64_t NumBytes, bool InEpilogue) const { bool isSub = NumBytes < 0; uint64_t Offset = isSub ? -NumBytes : NumBytes; uint64_t Chunk = (1LL << 31) - 1; DebugLoc DL = MBB.findDebugLoc(MBBI); while (Offset) { if (Offset > Chunk) { // Rather than emit a long series of instructions for large offsets, // load the offset into a register and do one sub/add. unsigned Reg = 0; - if (isSub && !isEAXLiveIn(*MBB.getParent())) + if (isSub && !isEAXLiveIn(MBB)) Reg = (unsigned)(Is64Bit ?
X86::RAX : X86::EAX); else Reg = findDeadCallerSavedReg(MBB, MBBI, TRI, Is64Bit); if (Reg) { unsigned Opc = Is64Bit ? X86::MOV64ri : X86::MOV32ri; BuildMI(MBB, MBBI, DL, TII.get(Opc), Reg) .addImm(Offset); Opc = isSub ? getSUBrrOpcode(Is64Bit) : getADDrrOpcode(Is64Bit); MachineInstr *MI = BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr) .addReg(StackPtr) .addReg(Reg); MI->getOperand(3).setIsDead(); // The EFLAGS implicit def is dead. Offset = 0; continue; } } uint64_t ThisVal = std::min(Offset, Chunk); if (ThisVal == (Is64Bit ? 8 : 4)) { // Use push / pop instead. unsigned Reg = isSub ? (unsigned)(Is64Bit ? X86::RAX : X86::EAX) : findDeadCallerSavedReg(MBB, MBBI, TRI, Is64Bit); if (Reg) { unsigned Opc = isSub ? (Is64Bit ? X86::PUSH64r : X86::PUSH32r) : (Is64Bit ? X86::POP64r : X86::POP32r); MachineInstr *MI = BuildMI(MBB, MBBI, DL, TII.get(Opc)) .addReg(Reg, getDefRegState(!isSub) | getUndefRegState(isSub)); if (isSub) MI->setFlag(MachineInstr::FrameSetup); else MI->setFlag(MachineInstr::FrameDestroy); Offset -= ThisVal; continue; } } MachineInstrBuilder MI = BuildStackAdjustment( MBB, MBBI, DL, isSub ? -ThisVal : ThisVal, InEpilogue); if (isSub) MI.setMIFlag(MachineInstr::FrameSetup); else MI.setMIFlag(MachineInstr::FrameDestroy); Offset -= ThisVal; } } MachineInstrBuilder X86FrameLowering::BuildStackAdjustment( MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, DebugLoc DL, int64_t Offset, bool InEpilogue) const { assert(Offset != 0 && "zero offset stack adjustment requested"); // On Atom, using LEA to adjust SP is preferred, but using it in the epilogue // is tricky. bool UseLEA; if (!InEpilogue) { // Check if inserting the prologue at the beginning // of MBB would require to use LEA operations. // We need to use LEA operations if EFLAGS is live in, because // it means an instruction will read it before it gets defined. UseLEA = STI.useLeaForSP() || MBB.isLiveIn(X86::EFLAGS); } else { // If we can use LEA for SP but we shouldn't, check that none // of the terminators uses the eflags. Otherwise we will insert // a ADD that will redefine the eflags and break the condition. // Alternatively, we could move the ADD, but this may not be possible // and is an optimization anyway. UseLEA = canUseLEAForSPInEpilogue(*MBB.getParent()); if (UseLEA && !STI.useLeaForSP()) UseLEA = flagsNeedToBePreservedBeforeTheTerminators(MBB); // If that assert breaks, that means we do not do the right thing // in canUseAsEpilogue. assert((UseLEA || !flagsNeedToBePreservedBeforeTheTerminators(MBB)) && "We shouldn't have allowed this insertion point"); } MachineInstrBuilder MI; if (UseLEA) { MI = addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(getLEArOpcode(Uses64BitFramePtr)), StackPtr), StackPtr, false, Offset); } else { bool IsSub = Offset < 0; uint64_t AbsOffset = IsSub ? -Offset : Offset; unsigned Opc = IsSub ? getSUBriOpcode(Uses64BitFramePtr, AbsOffset) : getADDriOpcode(Uses64BitFramePtr, AbsOffset); MI = BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr) .addReg(StackPtr) .addImm(AbsOffset); MI->getOperand(3).setIsDead(); // The EFLAGS implicit def is dead. } return MI; } int X86FrameLowering::mergeSPUpdates(MachineBasicBlock &MBB, MachineBasicBlock::iterator &MBBI, bool doMergeWithPrevious) const { if ((doMergeWithPrevious && MBBI == MBB.begin()) || (!doMergeWithPrevious && MBBI == MBB.end())) return 0; MachineBasicBlock::iterator PI = doMergeWithPrevious ? std::prev(MBBI) : MBBI; MachineBasicBlock::iterator NI = doMergeWithPrevious ? 
nullptr : std::next(MBBI); unsigned Opc = PI->getOpcode(); int Offset = 0; if ((Opc == X86::ADD64ri32 || Opc == X86::ADD64ri8 || Opc == X86::ADD32ri || Opc == X86::ADD32ri8 || Opc == X86::LEA32r || Opc == X86::LEA64_32r) && PI->getOperand(0).getReg() == StackPtr){ Offset += PI->getOperand(2).getImm(); MBB.erase(PI); if (!doMergeWithPrevious) MBBI = NI; } else if ((Opc == X86::SUB64ri32 || Opc == X86::SUB64ri8 || Opc == X86::SUB32ri || Opc == X86::SUB32ri8) && PI->getOperand(0).getReg() == StackPtr) { Offset -= PI->getOperand(2).getImm(); MBB.erase(PI); if (!doMergeWithPrevious) MBBI = NI; } return Offset; } void X86FrameLowering::BuildCFI(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, DebugLoc DL, MCCFIInstruction CFIInst) const { MachineFunction &MF = *MBB.getParent(); unsigned CFIIndex = MF.getMMI().addFrameInst(CFIInst); BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION)) .addCFIIndex(CFIIndex); } void X86FrameLowering::emitCalleeSavedFrameMoves(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, DebugLoc DL) const { MachineFunction &MF = *MBB.getParent(); MachineFrameInfo *MFI = MF.getFrameInfo(); MachineModuleInfo &MMI = MF.getMMI(); const MCRegisterInfo *MRI = MMI.getContext().getRegisterInfo(); // Add callee saved registers to move list. const std::vector &CSI = MFI->getCalleeSavedInfo(); if (CSI.empty()) return; // Calculate offsets. for (std::vector::const_iterator I = CSI.begin(), E = CSI.end(); I != E; ++I) { int64_t Offset = MFI->getObjectOffset(I->getFrameIdx()); unsigned Reg = I->getReg(); unsigned DwarfReg = MRI->getDwarfRegNum(Reg, true); BuildCFI(MBB, MBBI, DL, MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset)); } } MachineInstr *X86FrameLowering::emitStackProbe(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, DebugLoc DL, bool InProlog) const { const X86Subtarget &STI = MF.getSubtarget(); if (STI.isTargetWindowsCoreCLR()) { if (InProlog) { return emitStackProbeInlineStub(MF, MBB, MBBI, DL, true); } else { return emitStackProbeInline(MF, MBB, MBBI, DL, false); } } else { return emitStackProbeCall(MF, MBB, MBBI, DL, InProlog); } } void X86FrameLowering::inlineStackProbe(MachineFunction &MF, MachineBasicBlock &PrologMBB) const { const StringRef ChkStkStubSymbol = "__chkstk_stub"; MachineInstr *ChkStkStub = nullptr; for (MachineInstr &MI : PrologMBB) { if (MI.isCall() && MI.getOperand(0).isSymbol() && ChkStkStubSymbol == MI.getOperand(0).getSymbolName()) { ChkStkStub = &MI; break; } } if (ChkStkStub != nullptr) { MachineBasicBlock::iterator MBBI = std::next(ChkStkStub->getIterator()); assert(std::prev(MBBI).operator==(ChkStkStub) && "MBBI expected after __chkstk_stub."); DebugLoc DL = PrologMBB.findDebugLoc(MBBI); emitStackProbeInline(MF, PrologMBB, MBBI, DL, true); ChkStkStub->eraseFromParent(); } } MachineInstr *X86FrameLowering::emitStackProbeInline( MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, DebugLoc DL, bool InProlog) const { const X86Subtarget &STI = MF.getSubtarget(); assert(STI.is64Bit() && "different expansion needed for 32 bit"); assert(STI.isTargetWindowsCoreCLR() && "custom expansion expects CoreCLR"); const TargetInstrInfo &TII = *STI.getInstrInfo(); const BasicBlock *LLVM_BB = MBB.getBasicBlock(); // RAX contains the number of bytes of desired stack adjustment. // The handling here assumes this value has already been updated so as to // maintain stack alignment. 
// // We need to exit with RSP modified by this amount and execute suitable // page touches to notify the OS that we're growing the stack responsibly. // All stack probing must be done without modifying RSP. // // MBB: // SizeReg = RAX; // ZeroReg = 0 // CopyReg = RSP // Flags, TestReg = CopyReg - SizeReg // FinalReg = !Flags.Ovf ? TestReg : ZeroReg // LimitReg = gs magic thread env access // if FinalReg >= LimitReg goto ContinueMBB // RoundBB: // RoundReg = page address of FinalReg // LoopMBB: // LoopReg = PHI(LimitReg,ProbeReg) // ProbeReg = LoopReg - PageSize // [ProbeReg] = 0 // if (ProbeReg > RoundReg) goto LoopMBB // ContinueMBB: // RSP = RSP - RAX // [rest of original MBB] // Set up the new basic blocks MachineBasicBlock *RoundMBB = MF.CreateMachineBasicBlock(LLVM_BB); MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(LLVM_BB); MachineBasicBlock *ContinueMBB = MF.CreateMachineBasicBlock(LLVM_BB); MachineFunction::iterator MBBIter = std::next(MBB.getIterator()); MF.insert(MBBIter, RoundMBB); MF.insert(MBBIter, LoopMBB); MF.insert(MBBIter, ContinueMBB); // Split MBB and move the tail portion down to ContinueMBB. MachineBasicBlock::iterator BeforeMBBI = std::prev(MBBI); ContinueMBB->splice(ContinueMBB->begin(), &MBB, MBBI, MBB.end()); ContinueMBB->transferSuccessorsAndUpdatePHIs(&MBB); // Some useful constants const int64_t ThreadEnvironmentStackLimit = 0x10; const int64_t PageSize = 0x1000; const int64_t PageMask = ~(PageSize - 1); // Registers we need. For the normal case we use virtual // registers. For the prolog expansion we use RAX, RCX and RDX. MachineRegisterInfo &MRI = MF.getRegInfo(); const TargetRegisterClass *RegClass = &X86::GR64RegClass; const unsigned SizeReg = InProlog ? (unsigned)X86::RAX : MRI.createVirtualRegister(RegClass), ZeroReg = InProlog ? (unsigned)X86::RCX : MRI.createVirtualRegister(RegClass), CopyReg = InProlog ? (unsigned)X86::RDX : MRI.createVirtualRegister(RegClass), TestReg = InProlog ? (unsigned)X86::RDX : MRI.createVirtualRegister(RegClass), FinalReg = InProlog ? (unsigned)X86::RDX : MRI.createVirtualRegister(RegClass), RoundedReg = InProlog ? (unsigned)X86::RDX : MRI.createVirtualRegister(RegClass), LimitReg = InProlog ? (unsigned)X86::RCX : MRI.createVirtualRegister(RegClass), JoinReg = InProlog ? (unsigned)X86::RCX : MRI.createVirtualRegister(RegClass), ProbeReg = InProlog ? (unsigned)X86::RCX : MRI.createVirtualRegister(RegClass); // SP-relative offsets where we can save RCX and RDX. int64_t RCXShadowSlot = 0; int64_t RDXShadowSlot = 0; // If inlining in the prolog, save RCX and RDX. // Future optimization: don't save or restore if not live in. if (InProlog) { // Compute the offsets. We need to account for things already // pushed onto the stack at this point: return address, frame // pointer (if used), and callee saves. X86MachineFunctionInfo *X86FI = MF.getInfo(); const int64_t CalleeSaveSize = X86FI->getCalleeSavedFrameSize(); const bool HasFP = hasFP(MF); RCXShadowSlot = 8 + CalleeSaveSize + (HasFP ? 8 : 0); RDXShadowSlot = RCXShadowSlot + 8; // Emit the saves. addRegOffset(BuildMI(&MBB, DL, TII.get(X86::MOV64mr)), X86::RSP, false, RCXShadowSlot) .addReg(X86::RCX); addRegOffset(BuildMI(&MBB, DL, TII.get(X86::MOV64mr)), X86::RSP, false, RDXShadowSlot) .addReg(X86::RDX); } else { // Not in the prolog. Copy RAX to a virtual reg. BuildMI(&MBB, DL, TII.get(X86::MOV64rr), SizeReg).addReg(X86::RAX); } // Add code to MBB to check for overflow and set the new target stack pointer // to zero if so. 
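// (In brief: XOR zeroes ZeroReg, SUB computes RSP - SizeReg and sets the carry
// flag on unsigned wraparound, and CMOVB then selects ZeroReg in that case, so
// an overflowing request clamps the target stack pointer to zero.)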
BuildMI(&MBB, DL, TII.get(X86::XOR64rr), ZeroReg) .addReg(ZeroReg, RegState::Undef) .addReg(ZeroReg, RegState::Undef); BuildMI(&MBB, DL, TII.get(X86::MOV64rr), CopyReg).addReg(X86::RSP); BuildMI(&MBB, DL, TII.get(X86::SUB64rr), TestReg) .addReg(CopyReg) .addReg(SizeReg); BuildMI(&MBB, DL, TII.get(X86::CMOVB64rr), FinalReg) .addReg(TestReg) .addReg(ZeroReg); // FinalReg now holds final stack pointer value, or zero if // allocation would overflow. Compare against the current stack // limit from the thread environment block. Note this limit is the // lowest touched page on the stack, not the point at which the OS // will cause an overflow exception, so this is just an optimization // to avoid unnecessarily touching pages that are below the current // SP but already committed to the stack by the OS. BuildMI(&MBB, DL, TII.get(X86::MOV64rm), LimitReg) .addReg(0) .addImm(1) .addReg(0) .addImm(ThreadEnvironmentStackLimit) .addReg(X86::GS); BuildMI(&MBB, DL, TII.get(X86::CMP64rr)).addReg(FinalReg).addReg(LimitReg); // Jump if the desired stack pointer is at or above the stack limit. BuildMI(&MBB, DL, TII.get(X86::JAE_1)).addMBB(ContinueMBB); // Add code to roundMBB to round the final stack pointer to a page boundary. BuildMI(RoundMBB, DL, TII.get(X86::AND64ri32), RoundedReg) .addReg(FinalReg) .addImm(PageMask); BuildMI(RoundMBB, DL, TII.get(X86::JMP_1)).addMBB(LoopMBB); // LimitReg now holds the current stack limit, RoundedReg the page-rounded // final RSP value. Add code to loopMBB to decrement LimitReg page-by-page // and probe until we reach RoundedReg. if (!InProlog) { BuildMI(LoopMBB, DL, TII.get(X86::PHI), JoinReg) .addReg(LimitReg) .addMBB(RoundMBB) .addReg(ProbeReg) .addMBB(LoopMBB); } addRegOffset(BuildMI(LoopMBB, DL, TII.get(X86::LEA64r), ProbeReg), JoinReg, false, -PageSize); // Probe by storing a byte onto the stack. BuildMI(LoopMBB, DL, TII.get(X86::MOV8mi)) .addReg(ProbeReg) .addImm(1) .addReg(0) .addImm(0) .addReg(0) .addImm(0); BuildMI(LoopMBB, DL, TII.get(X86::CMP64rr)) .addReg(RoundedReg) .addReg(ProbeReg); BuildMI(LoopMBB, DL, TII.get(X86::JNE_1)).addMBB(LoopMBB); MachineBasicBlock::iterator ContinueMBBI = ContinueMBB->getFirstNonPHI(); // If in prolog, restore RDX and RCX. if (InProlog) { addRegOffset(BuildMI(*ContinueMBB, ContinueMBBI, DL, TII.get(X86::MOV64rm), X86::RCX), X86::RSP, false, RCXShadowSlot); addRegOffset(BuildMI(*ContinueMBB, ContinueMBBI, DL, TII.get(X86::MOV64rm), X86::RDX), X86::RSP, false, RDXShadowSlot); } // Now that the probing is done, add code to continueMBB to update // the stack pointer for real. BuildMI(*ContinueMBB, ContinueMBBI, DL, TII.get(X86::SUB64rr), X86::RSP) .addReg(X86::RSP) .addReg(SizeReg); // Add the control flow edges we need. MBB.addSuccessor(ContinueMBB); MBB.addSuccessor(RoundMBB); RoundMBB->addSuccessor(LoopMBB); LoopMBB->addSuccessor(ContinueMBB); LoopMBB->addSuccessor(LoopMBB); // Mark all the instructions added to the prolog as frame setup. if (InProlog) { for (++BeforeMBBI; BeforeMBBI != MBB.end(); ++BeforeMBBI) { BeforeMBBI->setFlag(MachineInstr::FrameSetup); } for (MachineInstr &MI : *RoundMBB) { MI.setFlag(MachineInstr::FrameSetup); } for (MachineInstr &MI : *LoopMBB) { MI.setFlag(MachineInstr::FrameSetup); } for (MachineBasicBlock::iterator CMBBI = ContinueMBB->begin(); CMBBI != ContinueMBBI; ++CMBBI) { CMBBI->setFlag(MachineInstr::FrameSetup); } } // Possible TODO: physreg liveness for InProlog case.
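// Hand back an iterator into ContinueMBB so the caller can resume expansion
// right after the probe sequence.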
  return ContinueMBBI;
}

MachineInstr *X86FrameLowering::emitStackProbeCall(
    MachineFunction &MF, MachineBasicBlock &MBB,
    MachineBasicBlock::iterator MBBI, DebugLoc DL, bool InProlog) const {
  bool IsLargeCodeModel = MF.getTarget().getCodeModel() == CodeModel::Large;

  unsigned CallOp;
  if (Is64Bit)
    CallOp = IsLargeCodeModel ? X86::CALL64r : X86::CALL64pcrel32;
  else
    CallOp = X86::CALLpcrel32;

  const char *Symbol;
  if (Is64Bit) {
    if (STI.isTargetCygMing()) {
      Symbol = "___chkstk_ms";
    } else {
      Symbol = "__chkstk";
    }
  } else if (STI.isTargetCygMing())
    Symbol = "_alloca";
  else
    Symbol = "_chkstk";

  MachineInstrBuilder CI;
  MachineBasicBlock::iterator ExpansionMBBI = std::prev(MBBI);

  // All current stack probes take AX and SP as input, clobber flags, and
  // preserve all registers. x86_64 probes leave RSP unmodified.
  if (Is64Bit && MF.getTarget().getCodeModel() == CodeModel::Large) {
    // For the large code model, we have to call through a register. Use R11,
    // as it is scratch in all supported calling conventions.
    BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64ri), X86::R11)
        .addExternalSymbol(Symbol);
    CI = BuildMI(MBB, MBBI, DL, TII.get(CallOp)).addReg(X86::R11);
  } else {
    CI = BuildMI(MBB, MBBI, DL, TII.get(CallOp)).addExternalSymbol(Symbol);
  }

  unsigned AX = Is64Bit ? X86::RAX : X86::EAX;
  unsigned SP = Is64Bit ? X86::RSP : X86::ESP;
  CI.addReg(AX, RegState::Implicit)
      .addReg(SP, RegState::Implicit)
      .addReg(AX, RegState::Define | RegState::Implicit)
      .addReg(SP, RegState::Define | RegState::Implicit)
      .addReg(X86::EFLAGS, RegState::Define | RegState::Implicit);

  if (Is64Bit) {
    // MSVC x64's __chkstk and cygwin/mingw's ___chkstk_ms do not adjust %rsp
    // themselves. They also do not clobber %rax, so we can reuse it when
    // adjusting %rsp.
    BuildMI(MBB, MBBI, DL, TII.get(X86::SUB64rr), X86::RSP)
        .addReg(X86::RSP)
        .addReg(X86::RAX);
  }

  if (InProlog) {
    // Apply the frame setup flag to all inserted instrs.
    for (++ExpansionMBBI; ExpansionMBBI != MBBI; ++ExpansionMBBI)
      ExpansionMBBI->setFlag(MachineInstr::FrameSetup);
  }

  return MBBI;
}

MachineInstr *X86FrameLowering::emitStackProbeInlineStub(
    MachineFunction &MF, MachineBasicBlock &MBB,
    MachineBasicBlock::iterator MBBI, DebugLoc DL, bool InProlog) const {

  assert(InProlog && "ChkStkStub called outside prolog!");

  BuildMI(MBB, MBBI, DL, TII.get(X86::CALLpcrel32))
      .addExternalSymbol("__chkstk_stub");

  return MBBI;
}

static unsigned calculateSetFPREG(uint64_t SPAdjust) {
  // Win64 ABI has a less restrictive limitation of 240; 128 works equally well
  // and might require smaller successive adjustments.
  const uint64_t Win64MaxSEHOffset = 128;
  uint64_t SEHFrameOffset = std::min(SPAdjust, Win64MaxSEHOffset);
  // Win64 ABI requires 16-byte alignment for the UWOP_SET_FPREG opcode.
  return SEHFrameOffset & -16;
}

// If we're forcing a stack realignment we can't rely on just the frame
// info, we need to know the ABI stack alignment as well in case we
// have a call out. Otherwise just make sure we have some alignment - we'll
// go with the minimum SlotSize.
uint64_t
X86FrameLowering::calculateMaxStackAlign(const MachineFunction &MF) const {
  const MachineFrameInfo *MFI = MF.getFrameInfo();
  uint64_t MaxAlign = MFI->getMaxAlignment(); // Desired stack alignment.
  unsigned StackAlign = getStackAlignment();
  if (MF.getFunction()->hasFnAttribute("stackrealign")) {
    if (MFI->hasCalls())
      MaxAlign = (StackAlign > MaxAlign) ? StackAlign : MaxAlign;
    else if (MaxAlign < SlotSize)
      MaxAlign = SlotSize;
  }
  return MaxAlign;
}

void X86FrameLowering::BuildStackAlignAND(MachineBasicBlock &MBB,
                                          MachineBasicBlock::iterator MBBI,
                                          DebugLoc DL, unsigned Reg,
                                          uint64_t MaxAlign) const {
  uint64_t Val = -MaxAlign;
  unsigned AndOp = getANDriOpcode(Uses64BitFramePtr, Val);
  MachineInstr *MI = BuildMI(MBB, MBBI, DL, TII.get(AndOp), Reg)
                         .addReg(Reg)
                         .addImm(Val)
                         .setMIFlag(MachineInstr::FrameSetup);

  // The EFLAGS implicit def is dead.
  MI->getOperand(3).setIsDead();
}

/// emitPrologue - Push callee-saved registers onto the stack, which
/// automatically adjusts the stack pointer. Adjust the stack pointer to
/// allocate space for local variables. Also emit labels used by the exception
/// handler to generate the exception handling frames.

/*
  Here's a gist of what gets emitted:

  ; Establish frame pointer, if needed
  [if needs FP]
      push  %rbp
      .cfi_def_cfa_offset 16
      .cfi_offset %rbp, -16
      .seh_pushreg %rbp
      mov  %rsp, %rbp
      .cfi_def_cfa_register %rbp

  ; Spill general-purpose registers
  [for all callee-saved GPRs]
      pushq %<reg>
      [if not needs FP]
         .cfi_def_cfa_offset (offset from RETADDR)
      .seh_pushreg %<reg>

  ; If the required stack alignment > default stack alignment
  ; rsp needs to be re-aligned.  This creates a "re-alignment gap"
  ; of unknown size in the stack frame.
  [if stack needs re-alignment]
      and  $MASK, %rsp

  ; Allocate space for locals
  [if target is Windows and allocated space > 4096 bytes]
      ; Windows needs special care for allocations larger
      ; than one page.
      mov $NNN, %rax
      call ___chkstk_ms/___chkstk
      sub  %rax, %rsp
  [else]
      sub  $NNN, %rsp

  [if needs FP]
      .seh_stackalloc (size of XMM spill slots)
      .seh_setframe %rbp, SEHFrameOffset ; = size of all spill slots
  [else]
      .seh_stackalloc NNN

  ; Spill XMMs
  ; Note, that while only Windows 64 ABI specifies XMMs as callee-preserved,
  ; they may get spilled on any platform, if the current function
  ; calls @llvm.eh.unwind.init
  [if needs FP]
      [for all callee-saved XMM registers]
          movaps  %<xmm reg>, -MMM(%rbp)
      [for all callee-saved XMM registers]
          .seh_savexmm %<xmm reg>, (-MMM + SEHFrameOffset)
              ; i.e. the offset relative to (%rbp - SEHFrameOffset)
  [else]
      [for all callee-saved XMM registers]
          movaps  %<xmm reg>, KKK(%rsp)
      [for all callee-saved XMM registers]
          .seh_savexmm %<xmm reg>, KKK

  .seh_endprologue

  [if needs base pointer]
      mov  %rsp, %rbx
      [if needs to restore base pointer]
          mov %rsp, -MMM(%rbp)

  ; Emit CFI info
  [if needs FP]
      [for all callee-saved registers]
          .cfi_offset %<reg>, (offset from %rbp)
  [else]
      .cfi_def_cfa_offset (offset from RETADDR)
      [for all callee-saved registers]
          .cfi_offset %<reg>, (offset from %rsp)

  Notes:
  - .seh directives are emitted only for Windows 64 ABI
  - .cfi directives are emitted for all other ABIs
  - for 32-bit code, substitute %e?? registers for %r??
*/

void X86FrameLowering::emitPrologue(MachineFunction &MF,
                                    MachineBasicBlock &MBB) const {
  assert(&STI == &MF.getSubtarget<X86Subtarget>() &&
         "MF used frame lowering for wrong subtarget");
  MachineBasicBlock::iterator MBBI = MBB.begin();
  MachineFrameInfo *MFI = MF.getFrameInfo();
  const Function *Fn = MF.getFunction();
  MachineModuleInfo &MMI = MF.getMMI();
  X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();
  uint64_t MaxAlign = calculateMaxStackAlign(MF); // Desired stack alignment.
  uint64_t StackSize = MFI->getStackSize(); // Number of bytes to allocate.
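  // For orientation, a hypothetical Win64 frame (numbers invented for
  // illustration): StackSize = 0x188 with one pushed CSR gives, in the HasFP
  // path below, FrameSize = 0x180 and NumBytes = 0x178; calculateSetFPREG
  // (defined above) clamps that to min(0x178, 128) = 128, already 16-byte
  // aligned, so the prologue establishes RBP = RSP + 128 and records
  // ".seh_setframe %rbp, 128" in the unwind info.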
  bool IsFunclet = MBB.isEHFuncletEntry();
  EHPersonality Personality = EHPersonality::Unknown;
  if (Fn->hasPersonalityFn())
    Personality = classifyEHPersonality(Fn->getPersonalityFn());
  bool FnHasClrFunclet =
      MMI.hasEHFunclets() && Personality == EHPersonality::CoreCLR;
  bool IsClrFunclet = IsFunclet && FnHasClrFunclet;
  bool HasFP = hasFP(MF);
  bool IsWin64CC = STI.isCallingConvWin64(Fn->getCallingConv());
  bool IsWin64Prologue = MF.getTarget().getMCAsmInfo()->usesWindowsCFI();
  bool NeedsWinCFI = IsWin64Prologue && Fn->needsUnwindTableEntry();
  bool NeedsDwarfCFI =
      !IsWin64Prologue && (MMI.hasDebugInfo() || Fn->needsUnwindTableEntry());
  unsigned FramePtr = TRI->getFrameRegister(MF);
  const unsigned MachineFramePtr =
      STI.isTarget64BitILP32() ? getX86SubSuperRegister(FramePtr, 64)
                               : FramePtr;
  unsigned BasePtr = TRI->getBaseRegister();

  // Debug location must be unknown since the first debug location is used
  // to determine the end of the prologue.
  DebugLoc DL;

  // Add RETADDR move area to callee saved frame size.
  int TailCallReturnAddrDelta = X86FI->getTCReturnAddrDelta();
  if (TailCallReturnAddrDelta && IsWin64Prologue)
    report_fatal_error("Can't handle guaranteed tail call under win64 yet");

  if (TailCallReturnAddrDelta < 0)
    X86FI->setCalleeSavedFrameSize(
        X86FI->getCalleeSavedFrameSize() - TailCallReturnAddrDelta);

  bool UseStackProbe = (STI.isOSWindows() && !STI.isTargetMachO());

  // The default stack probe size is 4096 if the function has no
  // "stack-probe-size" attribute.
  unsigned StackProbeSize = 4096;
  if (Fn->hasFnAttribute("stack-probe-size"))
    Fn->getFnAttribute("stack-probe-size")
        .getValueAsString()
        .getAsInteger(0, StackProbeSize);

  // If this is x86-64 and the Red Zone is not disabled, if we are a leaf
  // function, and use up to 128 bytes of stack space, don't have a frame
  // pointer, calls, or dynamic alloca then we do not need to adjust the
  // stack pointer (we fit in the Red Zone). We also check that we don't
  // push and pop from the stack.
  if (Is64Bit && !Fn->hasFnAttribute(Attribute::NoRedZone) &&
      !TRI->needsStackRealignment(MF) &&
      !MFI->hasVarSizedObjects() &&             // No dynamic alloca.
      !MFI->adjustsStack() &&                   // No calls.
      !IsWin64CC &&                             // Win64 has no Red Zone
      !MFI->hasCopyImplyingStackAdjustment() && // Don't push and pop.
      !MF.shouldSplitStack()) {                 // Regular stack
    uint64_t MinSize = X86FI->getCalleeSavedFrameSize();
    if (HasFP)
      MinSize += SlotSize;
    StackSize = std::max(MinSize, StackSize > 128 ? StackSize - 128 : 0);
    MFI->setStackSize(StackSize);
  }

  // Insert stack pointer adjustment for later moving of return addr. Only
  // applies to tail call optimized functions where the callee argument stack
  // size is bigger than the caller's.
  if (TailCallReturnAddrDelta < 0) {
    BuildStackAdjustment(MBB, MBBI, DL, TailCallReturnAddrDelta,
                         /*InEpilogue=*/false)
        .setMIFlag(MachineInstr::FrameSetup);
  }

  // Mapping for machine moves:
  //
  //   DST: VirtualFP AND
  //        SRC: VirtualFP              => DW_CFA_def_cfa_offset
  //        ELSE                        => DW_CFA_def_cfa
  //
  //   SRC: VirtualFP AND
  //        DST: Register               => DW_CFA_def_cfa_register
  //
  //   ELSE
  //        OFFSET < 0                  => DW_CFA_offset_extended_sf
  //        REG < 64                    => DW_CFA_offset + Reg
  //        ELSE                        => DW_CFA_offset_extended

  uint64_t NumBytes = 0;
  int stackGrowth = -SlotSize;

  // Find the funclet establisher parameter
  unsigned Establisher = X86::NoRegister;
  if (IsClrFunclet)
    Establisher = Uses64BitFramePtr ? X86::RCX : X86::ECX;
  else if (IsFunclet)
    Establisher = Uses64BitFramePtr ? X86::RDX : X86::EDX;

  if (IsWin64Prologue && IsFunclet && !IsClrFunclet) {
    // Immediately spill establisher into the home slot.
    // The runtime cares about this.
    // MOV64mr %rdx, 16(%rsp)
    unsigned MOVmr = Uses64BitFramePtr ? X86::MOV64mr : X86::MOV32mr;
    addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(MOVmr)), StackPtr, true, 16)
        .addReg(Establisher)
        .setMIFlag(MachineInstr::FrameSetup);
    MBB.addLiveIn(Establisher);
  }

  if (HasFP) {
    // Calculate required stack adjustment.
    uint64_t FrameSize = StackSize - SlotSize;
    // If required, include space for extra hidden slot for stashing base
    // pointer.
    if (X86FI->getRestoreBasePointer())
      FrameSize += SlotSize;

    NumBytes = FrameSize - X86FI->getCalleeSavedFrameSize();

    // Callee-saved registers are pushed on stack before the stack is realigned.
    if (TRI->needsStackRealignment(MF) && !IsWin64Prologue)
      NumBytes = RoundUpToAlignment(NumBytes, MaxAlign);

    // Get the offset of the stack slot for the EBP register, which is
    // guaranteed to be the last slot by processFunctionBeforeFrameFinalized.
    // Update the frame offset adjustment.
    if (!IsFunclet)
      MFI->setOffsetAdjustment(-NumBytes);
    else
      assert(MFI->getOffsetAdjustment() == -(int)NumBytes &&
             "should calculate same local variable offset for funclets");

    // Save EBP/RBP into the appropriate stack slot.
    BuildMI(MBB, MBBI, DL, TII.get(Is64Bit ? X86::PUSH64r : X86::PUSH32r))
        .addReg(MachineFramePtr, RegState::Kill)
        .setMIFlag(MachineInstr::FrameSetup);

    if (NeedsDwarfCFI) {
      // Mark the place where EBP/RBP was saved.
      // Define the current CFA rule to use the provided offset.
      assert(StackSize);
      BuildCFI(MBB, MBBI, DL,
               MCCFIInstruction::createDefCfaOffset(nullptr, 2 * stackGrowth));

      // Change the rule for the FramePtr to be an "offset" rule.
      unsigned DwarfFramePtr = TRI->getDwarfRegNum(MachineFramePtr, true);
      BuildCFI(MBB, MBBI, DL, MCCFIInstruction::createOffset(
                                  nullptr, DwarfFramePtr, 2 * stackGrowth));
    }

    if (NeedsWinCFI) {
      BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg))
          .addImm(FramePtr)
          .setMIFlag(MachineInstr::FrameSetup);
    }

    if (!IsWin64Prologue && !IsFunclet) {
      // Update EBP with the new base value.
      BuildMI(MBB, MBBI, DL,
              TII.get(Uses64BitFramePtr ? X86::MOV64rr : X86::MOV32rr),
              FramePtr)
          .addReg(StackPtr)
          .setMIFlag(MachineInstr::FrameSetup);

      if (NeedsDwarfCFI) {
        // Mark effective beginning of when frame pointer becomes valid.
        // Define the current CFA to use the EBP/RBP register.
        unsigned DwarfFramePtr = TRI->getDwarfRegNum(MachineFramePtr, true);
        BuildCFI(MBB, MBBI, DL, MCCFIInstruction::createDefCfaRegister(
                                    nullptr, DwarfFramePtr));
      }
    }

    // Mark the FramePtr as live-in in every block. Don't do this again for
    // funclet prologues.
    if (!IsFunclet) {
      for (MachineBasicBlock &EveryMBB : MF)
        EveryMBB.addLiveIn(MachineFramePtr);
    }
  } else {
    assert(!IsFunclet && "funclets without FPs not yet implemented");
    NumBytes = StackSize - X86FI->getCalleeSavedFrameSize();
  }

  // For EH funclets, only allocate enough space for outgoing calls. Save the
  // NumBytes value that we would've used for the parent frame.
  unsigned ParentFrameNumBytes = NumBytes;
  if (IsFunclet)
    NumBytes = getWinEHFuncletFrameSize(MF);

  // Skip the callee-saved push instructions.
  bool PushedRegs = false;
  int StackOffset = 2 * stackGrowth;

  while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup) &&
         (MBBI->getOpcode() == X86::PUSH32r ||
          MBBI->getOpcode() == X86::PUSH64r)) {
    PushedRegs = true;
    unsigned Reg = MBBI->getOperand(0).getReg();
    ++MBBI;

    if (!HasFP && NeedsDwarfCFI) {
      // Mark callee-saved push instruction.
      // Define the current CFA rule to use the provided offset.
      assert(StackSize);
      BuildCFI(MBB, MBBI, DL,
               MCCFIInstruction::createDefCfaOffset(nullptr, StackOffset));
      StackOffset += stackGrowth;
    }

    if (NeedsWinCFI) {
      BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg)).addImm(Reg).setMIFlag(
          MachineInstr::FrameSetup);
    }
  }

  // Realign stack after we pushed callee-saved registers (so that we'll be
  // able to calculate their offsets from the frame pointer).
  // Don't do this for Win64, it needs to realign the stack after the prologue.
  if (!IsWin64Prologue && !IsFunclet && TRI->needsStackRealignment(MF)) {
    assert(HasFP && "There should be a frame pointer if stack is realigned.");
    BuildStackAlignAND(MBB, MBBI, DL, StackPtr, MaxAlign);
  }

  // If there is a SUB32ri of ESP immediately before this instruction, merge
  // the two. This can be the case when tail call elimination is enabled and
  // the callee has more arguments than the caller.
  NumBytes -= mergeSPUpdates(MBB, MBBI, true);

  // Adjust stack pointer: ESP -= numbytes.
  //
  // Windows and cygwin/mingw require a prologue helper routine when allocating
  // more than 4K bytes on the stack. Windows uses __chkstk and cygwin/mingw
  // uses __alloca. __alloca and the 32-bit version of __chkstk will probe the
  // stack and adjust the stack pointer in one go. The 64-bit version of
  // __chkstk is only responsible for probing the stack. The 64-bit prologue is
  // responsible for adjusting the stack pointer. Touching the stack at 4K
  // increments is necessary to ensure that the guard pages used by the OS
  // virtual memory manager are allocated in correct sequence.
  uint64_t AlignedNumBytes = NumBytes;
  if (IsWin64Prologue && !IsFunclet && TRI->needsStackRealignment(MF))
    AlignedNumBytes = RoundUpToAlignment(AlignedNumBytes, MaxAlign);
  if (AlignedNumBytes >= StackProbeSize && UseStackProbe) {
-    // Check whether EAX is livein for this function.
-    bool isEAXAlive = isEAXLiveIn(MF);
+    // Check whether EAX is livein for this block.
+    bool isEAXAlive = isEAXLiveIn(MBB);

    if (isEAXAlive) {
      // Sanity check that EAX is not livein for this function.
      // It should not be, so throw an assert.
      assert(!Is64Bit && "EAX is livein in x64 case!");

      // Save EAX
      BuildMI(MBB, MBBI, DL, TII.get(X86::PUSH32r))
          .addReg(X86::EAX, RegState::Kill)
          .setMIFlag(MachineInstr::FrameSetup);
    }

    if (Is64Bit) {
      // Handle the 64-bit Windows ABI case where we need to call __chkstk.
      // Function prologue is responsible for adjusting the stack pointer.
      if (isUInt<32>(NumBytes)) {
        BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32ri), X86::EAX)
            .addImm(NumBytes)
            .setMIFlag(MachineInstr::FrameSetup);
      } else if (isInt<32>(NumBytes)) {
        BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64ri32), X86::RAX)
            .addImm(NumBytes)
            .setMIFlag(MachineInstr::FrameSetup);
      } else {
        BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64ri), X86::RAX)
            .addImm(NumBytes)
            .setMIFlag(MachineInstr::FrameSetup);
      }
    } else {
      // Allocate NumBytes-4 bytes on stack in case of isEAXAlive.
      // We'll also use 4 already allocated bytes for EAX.
      BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32ri), X86::EAX)
          .addImm(isEAXAlive ? NumBytes - 4 : NumBytes)
          .setMIFlag(MachineInstr::FrameSetup);
    }

    // Call __chkstk, __chkstk_ms, or __alloca.
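    // For example (illustrative only), a Win64 function with NumBytes =
    // 0x20000 of locals would get:
    //     movl $0x20000, %eax      # size; writing EAX zero-extends into RAX
    //     callq __chkstk           # probes the pages, leaves RSP untouched
    //     subq %rax, %rsp          # emitted by the probe-call expansion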
    emitStackProbe(MF, MBB, MBBI, DL, true);

    if (isEAXAlive) {
      // Restore EAX
      MachineInstr *MI =
          addRegOffset(BuildMI(MF, DL, TII.get(X86::MOV32rm), X86::EAX),
                       StackPtr, false, NumBytes - 4);
      MI->setFlag(MachineInstr::FrameSetup);
      MBB.insert(MBBI, MI);
    }
  } else if (NumBytes) {
    emitSPUpdate(MBB, MBBI, -(int64_t)NumBytes, /*InEpilogue=*/false);
  }

  if (NeedsWinCFI && NumBytes)
    BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_StackAlloc))
        .addImm(NumBytes)
        .setMIFlag(MachineInstr::FrameSetup);

  int SEHFrameOffset = 0;
  unsigned SPOrEstablisher;
  if (IsFunclet) {
    if (IsClrFunclet) {
      // The establisher parameter passed to a CLR funclet is actually a pointer
      // to the (mostly empty) frame of its nearest enclosing funclet; we have
      // to find the root function establisher frame by loading the PSPSym from
      // the intermediate frame.
      unsigned PSPSlotOffset = getPSPSlotOffsetFromSP(MF);
      MachinePointerInfo NoInfo;
      MBB.addLiveIn(Establisher);
      addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64rm), Establisher),
                   Establisher, false, PSPSlotOffset)
          .addMemOperand(MF.getMachineMemOperand(
              NoInfo, MachineMemOperand::MOLoad, SlotSize, SlotSize));
      // Save the root establisher back into the current funclet's (mostly
      // empty) frame, in case a sub-funclet or the GC needs it.
      addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64mr)), StackPtr,
                   false, PSPSlotOffset)
          .addReg(Establisher)
          .addMemOperand(
              MF.getMachineMemOperand(NoInfo, MachineMemOperand::MOStore |
                                                  MachineMemOperand::MOVolatile,
                                      SlotSize, SlotSize));
    }
    SPOrEstablisher = Establisher;
  } else {
    SPOrEstablisher = StackPtr;
  }

  if (IsWin64Prologue && HasFP) {
    // Set RBP to a small fixed offset from RSP. In the funclet case, we base
    // this calculation on the incoming establisher, which holds the value of
    // RSP from the parent frame at the end of the prologue.
    SEHFrameOffset = calculateSetFPREG(ParentFrameNumBytes);
    if (SEHFrameOffset)
      addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(X86::LEA64r), FramePtr),
                   SPOrEstablisher, false, SEHFrameOffset);
    else
      BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64rr), FramePtr)
          .addReg(SPOrEstablisher);

    // If this is not a funclet, emit the CFI describing our frame pointer.
    if (NeedsWinCFI && !IsFunclet) {
      BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_SetFrame))
          .addImm(FramePtr)
          .addImm(SEHFrameOffset)
          .setMIFlag(MachineInstr::FrameSetup);
      if (isAsynchronousEHPersonality(Personality))
        MF.getWinEHFuncInfo()->SEHSetFrameOffset = SEHFrameOffset;
    }
  } else if (IsFunclet && STI.is32Bit()) {
    // Reset EBP / ESI to something good for funclets.
    MBBI = restoreWin32EHStackPointers(MBB, MBBI, DL);
    // If we're a catch funclet, we can be returned to via catchret. Save ESP
    // into the registration node so that the runtime will restore it for us.
    if (!MBB.isCleanupFuncletEntry()) {
      assert(Personality == EHPersonality::MSVC_CXX);
      unsigned FrameReg;
      int FI = MF.getWinEHFuncInfo()->EHRegNodeFrameIndex;
      int64_t EHRegOffset = getFrameIndexReference(MF, FI, FrameReg);
      // ESP is the first field, so no extra displacement is needed.
      addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32mr)), FrameReg,
                   false, EHRegOffset)
          .addReg(X86::ESP);
    }
  }

  while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup)) {
    const MachineInstr *FrameInstr = &*MBBI;
    ++MBBI;

    if (NeedsWinCFI) {
      int FI;
      if (unsigned Reg = TII.isStoreToStackSlot(FrameInstr, FI)) {
        if (X86::FR64RegClass.contains(Reg)) {
          unsigned IgnoredFrameReg;
          int Offset = getFrameIndexReference(MF, FI, IgnoredFrameReg);
          Offset += SEHFrameOffset;

          BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_SaveXMM))
              .addImm(Reg)
              .addImm(Offset)
              .setMIFlag(MachineInstr::FrameSetup);
        }
      }
    }
  }

  if (NeedsWinCFI)
    BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_EndPrologue))
        .setMIFlag(MachineInstr::FrameSetup);

  if (FnHasClrFunclet && !IsFunclet) {
    // Save the so-called Initial-SP (i.e. the value of the stack pointer
    // immediately after the prolog) into the PSPSlot so that funclets
    // and the GC can recover it.
    unsigned PSPSlotOffset = getPSPSlotOffsetFromSP(MF);
    auto PSPInfo = MachinePointerInfo::getFixedStack(
        MF, MF.getWinEHFuncInfo()->PSPSymFrameIdx);
    addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64mr)), StackPtr,
                 false, PSPSlotOffset)
        .addReg(StackPtr)
        .addMemOperand(MF.getMachineMemOperand(
            PSPInfo, MachineMemOperand::MOStore | MachineMemOperand::MOVolatile,
            SlotSize, SlotSize));
  }

  // Realign stack after we spilled callee-saved registers (so that we'll be
  // able to calculate their offsets from the frame pointer).
  // Win64 requires aligning the stack after the prologue.
  if (IsWin64Prologue && TRI->needsStackRealignment(MF)) {
    assert(HasFP && "There should be a frame pointer if stack is realigned.");
    BuildStackAlignAND(MBB, MBBI, DL, SPOrEstablisher, MaxAlign);
  }

  // We already dealt with stack realignment and funclets above.
  if (IsFunclet && STI.is32Bit())
    return;

  // If we need a base pointer, set it up here. It's whatever the value
  // of the stack pointer is at this point. Any variable size objects
  // will be allocated after this, so we can still use the base pointer
  // to reference locals.
  if (TRI->hasBasePointer(MF)) {
    // Update the base pointer with the current stack pointer.
    unsigned Opc = Uses64BitFramePtr ? X86::MOV64rr : X86::MOV32rr;
    BuildMI(MBB, MBBI, DL, TII.get(Opc), BasePtr)
        .addReg(SPOrEstablisher)
        .setMIFlag(MachineInstr::FrameSetup);
    if (X86FI->getRestoreBasePointer()) {
      // Stash value of base pointer. Saving RSP instead of EBP shortens
      // dependence chain. Used by SjLj EH.
      unsigned Opm = Uses64BitFramePtr ? X86::MOV64mr : X86::MOV32mr;
      addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(Opm)), FramePtr, true,
                   X86FI->getRestoreBasePointerOffset())
          .addReg(SPOrEstablisher)
          .setMIFlag(MachineInstr::FrameSetup);
    }

    if (X86FI->getHasSEHFramePtrSave() && !IsFunclet) {
      // Stash the value of the frame pointer relative to the base pointer for
      // Win32 EH. This supports Win32 EH, which does the inverse of the above:
      // it recovers the frame pointer from the base pointer rather than the
      // other way around.
      unsigned Opm = Uses64BitFramePtr ? X86::MOV64mr : X86::MOV32mr;
      unsigned UsedReg;
      int Offset =
          getFrameIndexReference(MF, X86FI->getSEHFramePtrSaveIndex(), UsedReg);
      assert(UsedReg == BasePtr);
      addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(Opm)), UsedReg, true, Offset)
          .addReg(FramePtr)
          .setMIFlag(MachineInstr::FrameSetup);
    }
  }

  if (((!HasFP && NumBytes) || PushedRegs) && NeedsDwarfCFI) {
    // Mark end of stack pointer adjustment.
    if (!HasFP && NumBytes) {
      // Define the current CFA rule to use the provided offset.
      assert(StackSize);
      BuildCFI(MBB, MBBI, DL, MCCFIInstruction::createDefCfaOffset(
                                  nullptr, -StackSize + stackGrowth));
    }

    // Emit DWARF info specifying the offsets of the callee-saved registers.
    if (PushedRegs)
      emitCalleeSavedFrameMoves(MBB, MBBI, DL);
  }
}

bool X86FrameLowering::canUseLEAForSPInEpilogue(
    const MachineFunction &MF) const {
  // We can't use LEA instructions for adjusting the stack pointer if this is a
  // leaf function in the Win64 ABI.  Only ADD instructions may be used to
  // deallocate the stack.
  // This means that we can use LEA for SP in two situations:
  // 1. We *aren't* using the Win64 ABI which means we are free to use LEA.
  // 2. We *have* a frame pointer which means we are permitted to use LEA.
  return !MF.getTarget().getMCAsmInfo()->usesWindowsCFI() || hasFP(MF);
}

static bool isFuncletReturnInstr(MachineInstr *MI) {
  switch (MI->getOpcode()) {
  case X86::CATCHRET:
  case X86::CLEANUPRET:
    return true;
  default:
    return false;
  }
  llvm_unreachable("impossible");
}

// CLR funclets use a special "Previous Stack Pointer Symbol" slot on the
// stack. It holds a pointer to the bottom of the root function frame.  The
// establisher frame pointer passed to a nested funclet may point to the
// (mostly empty) frame of its parent funclet, but it will need to find
// the frame of the root function to access locals.  To facilitate this,
// every funclet copies the pointer to the bottom of the root function
// frame into a PSPSym slot in its own (mostly empty) stack frame.  Using the
// same offset for the PSPSym in the root function frame that's used in the
// funclets' frames allows each funclet to dynamically accept any ancestor
// frame as its establisher argument (the runtime doesn't guarantee the
// immediate parent for some reason lost to history), and also allows the GC,
// which uses the PSPSym for some bookkeeping, to find it in any funclet's
// frame with only a single offset reported for the entire method.
unsigned
X86FrameLowering::getPSPSlotOffsetFromSP(const MachineFunction &MF) const {
  const WinEHFuncInfo &Info = *MF.getWinEHFuncInfo();
  // getFrameIndexReferenceFromSP has an out ref parameter for the stack
  // pointer register; pass a dummy that we ignore
  unsigned SPReg;
  int Offset = getFrameIndexReferenceFromSP(MF, Info.PSPSymFrameIdx, SPReg);
  assert(Offset >= 0);
  return static_cast<unsigned>(Offset);
}

unsigned
X86FrameLowering::getWinEHFuncletFrameSize(const MachineFunction &MF) const {
  // This is the size of the pushed CSRs.
  unsigned CSSize =
      MF.getInfo<X86MachineFunctionInfo>()->getCalleeSavedFrameSize();
  // This is the amount of stack a funclet needs to allocate.
  unsigned UsedSize;
  EHPersonality Personality =
      classifyEHPersonality(MF.getFunction()->getPersonalityFn());
  if (Personality == EHPersonality::CoreCLR) {
    // CLR funclets need to hold enough space to include the PSPSym, at the
    // same offset from the stack pointer (immediately after the prolog) as it
    // resides at in the main function.
    UsedSize = getPSPSlotOffsetFromSP(MF) + SlotSize;
  } else {
    // Other funclets just need enough stack for outgoing call arguments.
    UsedSize = MF.getFrameInfo()->getMaxCallFrameSize();
  }
  // RBP is not included in the callee saved register block. After pushing RBP,
  // everything is 16 byte aligned. Everything we allocate before an outgoing
  // call must also be 16 byte aligned.
  unsigned FrameSizeMinusRBP =
      RoundUpToAlignment(CSSize + UsedSize, getStackAlignment());
  // Subtract out the size of the callee saved registers. This is how much
  // stack each funclet will allocate.
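  // Worked example with invented numbers: CSSize = 16 (two pushed GPRs) and
  // UsedSize = 40 give RoundUpToAlignment(56, 16) = 64, so the funclet
  // allocates 64 - 16 = 48 bytes; the 16 bytes of pushes plus the 48-byte
  // allocation keep the outgoing-call area 16-byte aligned.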
  return FrameSizeMinusRBP - CSSize;
}

void X86FrameLowering::emitEpilogue(MachineFunction &MF,
                                    MachineBasicBlock &MBB) const {
  const MachineFrameInfo *MFI = MF.getFrameInfo();
  X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();
  MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
  DebugLoc DL;
  if (MBBI != MBB.end())
    DL = MBBI->getDebugLoc();
  // standard x86_64 and NaCl use 64-bit frame/stack pointers; x32 uses 32-bit.
  const bool Is64BitILP32 = STI.isTarget64BitILP32();
  unsigned FramePtr = TRI->getFrameRegister(MF);
  unsigned MachineFramePtr =
      Is64BitILP32 ? getX86SubSuperRegister(FramePtr, 64) : FramePtr;

  bool IsWin64Prologue = MF.getTarget().getMCAsmInfo()->usesWindowsCFI();
  bool NeedsWinCFI =
      IsWin64Prologue && MF.getFunction()->needsUnwindTableEntry();
  bool IsFunclet = isFuncletReturnInstr(MBBI);
  MachineBasicBlock *TargetMBB = nullptr;

  // Get the number of bytes to allocate from the FrameInfo.
  uint64_t StackSize = MFI->getStackSize();
  uint64_t MaxAlign = calculateMaxStackAlign(MF);
  unsigned CSSize = X86FI->getCalleeSavedFrameSize();
  uint64_t NumBytes = 0;

  if (MBBI->getOpcode() == X86::CATCHRET) {
    // SEH shouldn't use catchret.
    assert(!isAsynchronousEHPersonality(
               classifyEHPersonality(MF.getFunction()->getPersonalityFn())) &&
           "SEH should not use CATCHRET");

    NumBytes = getWinEHFuncletFrameSize(MF);
    assert(hasFP(MF) && "EH funclets without FP not yet implemented");
    TargetMBB = MBBI->getOperand(0).getMBB();

    // Pop EBP.
    BuildMI(MBB, MBBI, DL, TII.get(Is64Bit ? X86::POP64r : X86::POP32r),
            MachineFramePtr)
        .setMIFlag(MachineInstr::FrameDestroy);
  } else if (MBBI->getOpcode() == X86::CLEANUPRET) {
    NumBytes = getWinEHFuncletFrameSize(MF);
    assert(hasFP(MF) && "EH funclets without FP not yet implemented");
    BuildMI(MBB, MBBI, DL, TII.get(Is64Bit ? X86::POP64r : X86::POP32r),
            MachineFramePtr)
        .setMIFlag(MachineInstr::FrameDestroy);
  } else if (hasFP(MF)) {
    // Calculate required stack adjustment.
    uint64_t FrameSize = StackSize - SlotSize;
    NumBytes = FrameSize - CSSize;

    // Callee-saved registers were pushed on stack before the stack was
    // realigned.
    if (TRI->needsStackRealignment(MF) && !IsWin64Prologue)
      NumBytes = RoundUpToAlignment(FrameSize, MaxAlign);

    // Pop EBP.
    BuildMI(MBB, MBBI, DL, TII.get(Is64Bit ? X86::POP64r : X86::POP32r),
            MachineFramePtr)
        .setMIFlag(MachineInstr::FrameDestroy);
  } else {
    NumBytes = StackSize - CSSize;
  }
  uint64_t SEHStackAllocAmt = NumBytes;

  // Skip the callee-saved pop instructions.
  while (MBBI != MBB.begin()) {
    MachineBasicBlock::iterator PI = std::prev(MBBI);
    unsigned Opc = PI->getOpcode();

    if ((Opc != X86::POP32r || !PI->getFlag(MachineInstr::FrameDestroy)) &&
        (Opc != X86::POP64r || !PI->getFlag(MachineInstr::FrameDestroy)) &&
        Opc != X86::DBG_VALUE && !PI->isTerminator())
      break;

    --MBBI;
  }
  MachineBasicBlock::iterator FirstCSPop = MBBI;

  if (TargetMBB) {
    // Fill EAX/RAX with the address of the target block.
    unsigned ReturnReg = STI.is64Bit() ? X86::RAX : X86::EAX;
    if (STI.is64Bit()) {
      // LEA64r TargetMBB(%rip), %rax
      BuildMI(MBB, FirstCSPop, DL, TII.get(X86::LEA64r), ReturnReg)
          .addReg(X86::RIP)
          .addImm(0)
          .addReg(0)
          .addMBB(TargetMBB)
          .addReg(0);
    } else {
      // MOV32ri $TargetMBB, %eax
      BuildMI(MBB, FirstCSPop, DL, TII.get(X86::MOV32ri), ReturnReg)
          .addMBB(TargetMBB);
    }
    // Record that we've taken the address of TargetMBB and no longer just
    // reference it in a terminator.
    TargetMBB->setHasAddressTaken();
  }

  if (MBBI != MBB.end())
    DL = MBBI->getDebugLoc();

  // If there is an ADD32ri or SUB32ri of ESP immediately before this
  // instruction, merge the two instructions.
  if (NumBytes || MFI->hasVarSizedObjects())
    NumBytes += mergeSPUpdates(MBB, MBBI, true);

  // If dynamic alloca is used, then reset esp to point to the last callee-saved
  // slot before popping them off! The same applies when the stack was
  // realigned. Don't do this if this was a funclet epilogue, since the funclets
  // will not do realignment or dynamic stack allocation.
  if ((TRI->needsStackRealignment(MF) || MFI->hasVarSizedObjects()) &&
      !IsFunclet) {
    if (TRI->needsStackRealignment(MF))
      MBBI = FirstCSPop;
    unsigned SEHFrameOffset = calculateSetFPREG(SEHStackAllocAmt);
    uint64_t LEAAmount =
        IsWin64Prologue ? SEHStackAllocAmt - SEHFrameOffset : -CSSize;

    // There are only two legal forms of epilogue:
    // - add SEHAllocationSize, %rsp
    // - lea SEHAllocationSize(%FramePtr), %rsp
    //
    // 'mov %FramePtr, %rsp' will not be recognized as an epilogue sequence.
    // However, we may use this sequence if we have a frame pointer because the
    // effects of the prologue can safely be undone.
    if (LEAAmount != 0) {
      unsigned Opc = getLEArOpcode(Uses64BitFramePtr);
      addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr),
                   FramePtr, false, LEAAmount);
      --MBBI;
    } else {
      unsigned Opc = (Uses64BitFramePtr ? X86::MOV64rr : X86::MOV32rr);
      BuildMI(MBB, MBBI, DL, TII.get(Opc), StackPtr)
          .addReg(FramePtr);
      --MBBI;
    }
  } else if (NumBytes) {
    // Adjust stack pointer back: ESP += numbytes.
    emitSPUpdate(MBB, MBBI, NumBytes, /*InEpilogue=*/true);
    --MBBI;
  }

  // Windows unwinder will not invoke function's exception handler if IP is
  // either in prologue or in epilogue.  This behavior causes a problem when a
  // call immediately precedes an epilogue, because the return address points
  // into the epilogue.  To cope with that, we insert an epilogue marker here,
  // then replace it with a 'nop' if it ends up immediately after a CALL in the
  // final emitted code.
  if (NeedsWinCFI)
    BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_Epilogue));

  // Add the return addr area delta back since we are not tail calling.
  int Offset = -1 * X86FI->getTCReturnAddrDelta();
  assert(Offset >= 0 && "TCDelta should never be positive");
  if (Offset) {
    MBBI = MBB.getFirstTerminator();

    // Check for possible merge with preceding ADD instruction.
    Offset += mergeSPUpdates(MBB, MBBI, true);
    emitSPUpdate(MBB, MBBI, Offset, /*InEpilogue=*/true);
  }
}

// NOTE: this only has a subset of the full frame index logic. In
// particular, the FI < 0 and AfterFPPop logic is handled in
// X86RegisterInfo::eliminateFrameIndex, but not here. Possibly
// (probably?) it should be moved into here.
int X86FrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,
                                             unsigned &FrameReg) const {
  const MachineFrameInfo *MFI = MF.getFrameInfo();

  // We can't calculate offset from frame pointer if the stack is realigned,
  // so enforce usage of stack/base pointer.  The base pointer is used when we
  // have dynamic allocas in addition to dynamic realignment.
  if (TRI->hasBasePointer(MF))
    FrameReg = TRI->getBaseRegister();
  else if (TRI->needsStackRealignment(MF))
    FrameReg = TRI->getStackRegister();
  else
    FrameReg = TRI->getFrameRegister(MF);

  // Offset will hold the offset from the stack pointer at function entry to
  // the object.
  // We need to factor in additional offsets applied during the prologue to the
  // frame, base, and stack pointer depending on which is used.
  int Offset = MFI->getObjectOffset(FI) - getOffsetOfLocalArea();
  const X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();
  unsigned CSSize = X86FI->getCalleeSavedFrameSize();
  uint64_t StackSize = MFI->getStackSize();
  bool HasFP = hasFP(MF);
  bool IsWin64Prologue = MF.getTarget().getMCAsmInfo()->usesWindowsCFI();
  int64_t FPDelta = 0;

  if (IsWin64Prologue) {
    assert(!MFI->hasCalls() || (StackSize % 16) == 8);

    // Calculate required stack adjustment.
    uint64_t FrameSize = StackSize - SlotSize;
    // If required, include space for extra hidden slot for stashing base
    // pointer.
    if (X86FI->getRestoreBasePointer())
      FrameSize += SlotSize;
    uint64_t NumBytes = FrameSize - CSSize;

    uint64_t SEHFrameOffset = calculateSetFPREG(NumBytes);
    if (FI && FI == X86FI->getFAIndex())
      return -SEHFrameOffset;

    // FPDelta is the offset from the "traditional" FP location of the old base
    // pointer followed by return address and the location required by the
    // restricted Win64 prologue.
    // Add FPDelta to all offsets below that go through the frame pointer.
    FPDelta = FrameSize - SEHFrameOffset;
    assert((!MFI->hasCalls() || (FPDelta % 16) == 0) &&
           "FPDelta isn't aligned per the Win64 ABI!");
  }

  if (TRI->hasBasePointer(MF)) {
    assert(HasFP && "VLAs and dynamic stack realign, but no FP?!");
    if (FI < 0) {
      // Skip the saved EBP.
      return Offset + SlotSize + FPDelta;
    } else {
      assert((-(Offset + StackSize)) % MFI->getObjectAlignment(FI) == 0);
      return Offset + StackSize;
    }
  } else if (TRI->needsStackRealignment(MF)) {
    if (FI < 0) {
      // Skip the saved EBP.
      return Offset + SlotSize + FPDelta;
    } else {
      assert((-(Offset + StackSize)) % MFI->getObjectAlignment(FI) == 0);
      return Offset + StackSize;
    }
    // FIXME: Support tail calls
  } else {
    if (!HasFP)
      return Offset + StackSize;

    // Skip the saved EBP.
    Offset += SlotSize;

    // Skip the RETADDR move area
    int TailCallReturnAddrDelta = X86FI->getTCReturnAddrDelta();
    if (TailCallReturnAddrDelta < 0)
      Offset -= TailCallReturnAddrDelta;
  }

  return Offset + FPDelta;
}

// Simplified from getFrameIndexReference keeping only StackPointer cases
int X86FrameLowering::getFrameIndexReferenceFromSP(const MachineFunction &MF,
                                                   int FI,
                                                   unsigned &FrameReg) const {
  const MachineFrameInfo *MFI = MF.getFrameInfo();
  // Does not include any dynamic realign.
  const uint64_t StackSize = MFI->getStackSize();
  {
#ifndef NDEBUG
    // LLVM arranges the stack as follows:
    //   ...
    //   ARG2
    //   ARG1
    //   RETADDR
    //   PUSH RBP   <-- RBP points here
    //   PUSH CSRs
    //   ~~~~~~~    <-- possible stack realignment (non-win64)
    //   ...
    //   STACK OBJECTS
    //   ...        <-- RSP after prologue points here
    //   ~~~~~~~    <-- possible stack realignment (win64)
    //
    // if (hasVarSizedObjects()):
    //   ...        <-- "base pointer" (ESI/RBX) points here
    //   DYNAMIC ALLOCAS
    //   ...        <-- RSP points here
    //
    // Case 1: In the simple case of no stack realignment and no dynamic
    // allocas, both "fixed" stack objects (arguments and CSRs) are addressable
    // with fixed offsets from RSP.
    //
    // Case 2: In the case of stack realignment with no dynamic allocas, fixed
    // stack objects are addressed with RBP and regular stack objects with RSP.
    //
    // Case 3: In the case of dynamic allocas and stack realignment, RSP is
    // used to address stack arguments for outgoing calls and nothing else. The
    // "base pointer" points to local variables, and RBP points to fixed
    // objects.
    //
    // In cases 2 and 3, we can only answer for non-fixed stack objects, and
    // the answer we give is relative to the SP after the prologue, and not the
    // SP in the middle of the function.
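    // A concrete (hypothetical) instance of case 1: with StackSize = 0x38,
    // getOffsetOfLocalArea() = -8, and an object at getObjectOffset(FI) =
    // -0x20, the code below computes Offset = -0x20 - (-8) = -0x18 and
    // returns -0x18 + 0x38 = 0x20, i.e. the object is at 0x20(%rsp) once the
    // prologue has run.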
    assert((!MFI->isFixedObjectIndex(FI) || !TRI->needsStackRealignment(MF) ||
            STI.isTargetWin64()) &&
           "offset from fixed object to SP is not static");

    // We don't handle tail calls, and shouldn't be seeing them either.
    int TailCallReturnAddrDelta =
        MF.getInfo<X86MachineFunctionInfo>()->getTCReturnAddrDelta();
    assert(!(TailCallReturnAddrDelta < 0) && "we don't handle this case!");
#endif
  }

  // Fill in FrameReg output argument.
  FrameReg = TRI->getStackRegister();

  // This is how the math works out:
  //
  //  %rsp grows (i.e. gets lower) left to right. Each box below is
  //  one word (eight bytes).  Obj0 is the stack slot we're trying to
  //  get to.
  //
  //    ----------------------------------
  //    | BP | Obj0 | Obj1 | ... | ObjN |
  //    ----------------------------------
  //    ^    ^      ^                   ^
  //    A    B      C                   E
  //
  // A is the incoming stack pointer.
  // (B - A) is the local area offset (-8 for x86-64) [1]
  // (C - A) is the Offset returned by MFI->getObjectOffset for Obj0 [2]
  //
  // |(E - B)| is the StackSize (absolute value, positive).  For a
  // stack that grows down, this works out to be (B - E). [3]
  //
  // E is also the value of %rsp after stack has been set up, and we
  // want (C - E) -- the value we can add to %rsp to get to Obj0.  Now
  // (C - E) == (C - A) - (B - A) + (B - E)
  //           { Using [1], [2] and [3] above }
  //         == getObjectOffset - LocalAreaOffset + StackSize
  //

  // Get the Offset from the StackPointer
  int Offset = MFI->getObjectOffset(FI) - getOffsetOfLocalArea();

  return Offset + StackSize;
}

bool X86FrameLowering::assignCalleeSavedSpillSlots(
    MachineFunction &MF, const TargetRegisterInfo *TRI,
    std::vector<CalleeSavedInfo> &CSI) const {
  MachineFrameInfo *MFI = MF.getFrameInfo();
  X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();

  unsigned CalleeSavedFrameSize = 0;
  int SpillSlotOffset = getOffsetOfLocalArea() + X86FI->getTCReturnAddrDelta();

  if (hasFP(MF)) {
    // emitPrologue always spills frame register the first thing.
    SpillSlotOffset -= SlotSize;
    MFI->CreateFixedSpillStackObject(SlotSize, SpillSlotOffset);

    // Since emitPrologue and emitEpilogue will handle spilling and restoring of
    // the frame register, we can delete it from CSI list and not have to worry
    // about avoiding it later.
    unsigned FPReg = TRI->getFrameRegister(MF);
    for (unsigned i = 0; i < CSI.size(); ++i) {
      if (TRI->regsOverlap(CSI[i].getReg(), FPReg)) {
        CSI.erase(CSI.begin() + i);
        break;
      }
    }
  }

  // Assign slots for GPRs. It increases frame size.
  for (unsigned i = CSI.size(); i != 0; --i) {
    unsigned Reg = CSI[i - 1].getReg();

    if (!X86::GR64RegClass.contains(Reg) && !X86::GR32RegClass.contains(Reg))
      continue;

    SpillSlotOffset -= SlotSize;
    CalleeSavedFrameSize += SlotSize;

    int SlotIndex = MFI->CreateFixedSpillStackObject(SlotSize, SpillSlotOffset);
    CSI[i - 1].setFrameIdx(SlotIndex);
  }

  X86FI->setCalleeSavedFrameSize(CalleeSavedFrameSize);

  // Assign slots for XMMs.
  for (unsigned i = CSI.size(); i != 0; --i) {
    unsigned Reg = CSI[i - 1].getReg();
    if (X86::GR64RegClass.contains(Reg) || X86::GR32RegClass.contains(Reg))
      continue;

    const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(Reg);
    // ensure alignment
    SpillSlotOffset -= std::abs(SpillSlotOffset) % RC->getAlignment();
    // spill into slot
    SpillSlotOffset -= RC->getSize();
    int SlotIndex =
        MFI->CreateFixedSpillStackObject(RC->getSize(), SpillSlotOffset);
    CSI[i - 1].setFrameIdx(SlotIndex);
    MFI->ensureMaxAlignment(RC->getAlignment());
  }

  return true;
}

bool X86FrameLowering::spillCalleeSavedRegisters(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
    const std::vector<CalleeSavedInfo> &CSI,
    const TargetRegisterInfo *TRI) const {
  DebugLoc DL = MBB.findDebugLoc(MI);

  // Don't save CSRs in 32-bit EH funclets. The caller saves EBX, EBP, ESI, EDI
  // for us, and there are no XMM CSRs on Win32.
  if (MBB.isEHFuncletEntry() && STI.is32Bit() && STI.isOSWindows())
    return true;

  // Push GPRs. It increases frame size.
  unsigned Opc = STI.is64Bit() ? X86::PUSH64r : X86::PUSH32r;
  for (unsigned i = CSI.size(); i != 0; --i) {
    unsigned Reg = CSI[i - 1].getReg();

    if (!X86::GR64RegClass.contains(Reg) && !X86::GR32RegClass.contains(Reg))
      continue;
    // Add the callee-saved register as live-in. It's killed at the spill.
    MBB.addLiveIn(Reg);

    BuildMI(MBB, MI, DL, TII.get(Opc)).addReg(Reg, RegState::Kill)
        .setMIFlag(MachineInstr::FrameSetup);
  }

  // X86 has no push/pop instructions for XMM registers, so spill them with
  // explicit stores to the stack frame instead.
  for (unsigned i = CSI.size(); i != 0; --i) {
    unsigned Reg = CSI[i - 1].getReg();
    if (X86::GR64RegClass.contains(Reg) || X86::GR32RegClass.contains(Reg))
      continue;
    // Add the callee-saved register as live-in. It's killed at the spill.
    MBB.addLiveIn(Reg);
    const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(Reg);

    TII.storeRegToStackSlot(MBB, MI, Reg, true, CSI[i - 1].getFrameIdx(), RC,
                            TRI);
    --MI;
    MI->setFlag(MachineInstr::FrameSetup);
    ++MI;
  }

  return true;
}

bool X86FrameLowering::restoreCalleeSavedRegisters(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
    const std::vector<CalleeSavedInfo> &CSI,
    const TargetRegisterInfo *TRI) const {
  if (CSI.empty())
    return false;

  if (isFuncletReturnInstr(MI) && STI.isOSWindows()) {
    // Don't restore CSRs in 32-bit EH funclets. Matches
    // spillCalleeSavedRegisters.
    if (STI.is32Bit())
      return true;
    // Don't restore CSRs before an SEH catchret. SEH except blocks do not form
    // funclets. emitEpilogue transforms these to normal jumps.
    if (MI->getOpcode() == X86::CATCHRET) {
      const Function *Func = MBB.getParent()->getFunction();
      bool IsSEH = isAsynchronousEHPersonality(
          classifyEHPersonality(Func->getPersonalityFn()));
      if (IsSEH)
        return true;
    }
  }

  DebugLoc DL = MBB.findDebugLoc(MI);

  // Reload XMMs from stack frame.
  for (unsigned i = 0, e = CSI.size(); i != e; ++i) {
    unsigned Reg = CSI[i].getReg();
    if (X86::GR64RegClass.contains(Reg) || X86::GR32RegClass.contains(Reg))
      continue;

    const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(Reg);
    TII.loadRegFromStackSlot(MBB, MI, Reg, CSI[i].getFrameIdx(), RC, TRI);
  }

  // POP GPRs.
  unsigned Opc = STI.is64Bit() ? X86::POP64r : X86::POP32r;
  for (unsigned i = 0, e = CSI.size(); i != e; ++i) {
    unsigned Reg = CSI[i].getReg();
    if (!X86::GR64RegClass.contains(Reg) && !X86::GR32RegClass.contains(Reg))
      continue;

    BuildMI(MBB, MI, DL, TII.get(Opc), Reg)
        .setMIFlag(MachineInstr::FrameDestroy);
  }
  return true;
}

void X86FrameLowering::determineCalleeSaves(MachineFunction &MF,
                                            BitVector &SavedRegs,
                                            RegScavenger *RS) const {
  TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);

  MachineFrameInfo *MFI = MF.getFrameInfo();

  X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();
  int64_t TailCallReturnAddrDelta = X86FI->getTCReturnAddrDelta();

  if (TailCallReturnAddrDelta < 0) {
    // create RETURNADDR area
    //   arg
    //   arg
    //   RETADDR
    //   { ...
    //     RETADDR area
    //     ...
    //   }
    //   [EBP]
    MFI->CreateFixedObject(-TailCallReturnAddrDelta,
                           TailCallReturnAddrDelta - SlotSize, true);
  }

  // Spill the BasePtr if it's used.
  if (TRI->hasBasePointer(MF)) {
    SavedRegs.set(TRI->getBaseRegister());

    // Allocate a spill slot for EBP if we have a base pointer and EH funclets.
    if (MF.getMMI().hasEHFunclets()) {
      int FI = MFI->CreateSpillStackObject(SlotSize, SlotSize);
      X86FI->setHasSEHFramePtrSave(true);
      X86FI->setSEHFramePtrSaveIndex(FI);
    }
  }
}

static bool HasNestArgument(const MachineFunction *MF) {
  const Function *F = MF->getFunction();
  for (Function::const_arg_iterator I = F->arg_begin(), E = F->arg_end();
       I != E; I++) {
    if (I->hasNestAttr())
      return true;
  }
  return false;
}

/// GetScratchRegister - Get a temp register for performing work in the
/// segmented stack and the Erlang/HiPE stack prologue. Depending on platform
/// and the properties of the function either one or two registers will be
/// needed. Set primary to true for the first register, false for the second.
static unsigned GetScratchRegister(bool Is64Bit, bool IsLP64,
                                   const MachineFunction &MF, bool Primary) {
  CallingConv::ID CallingConvention = MF.getFunction()->getCallingConv();

  // Erlang stuff.
  if (CallingConvention == CallingConv::HiPE) {
    if (Is64Bit)
      return Primary ? X86::R14 : X86::R13;
    else
      return Primary ? X86::EBX : X86::EDI;
  }

  if (Is64Bit) {
    if (IsLP64)
      return Primary ? X86::R11 : X86::R12;
    else
      return Primary ? X86::R11D : X86::R12D;
  }

  bool IsNested = HasNestArgument(&MF);

  if (CallingConvention == CallingConv::X86_FastCall ||
      CallingConvention == CallingConv::Fast) {
    if (IsNested)
      report_fatal_error("Segmented stacks does not support fastcall with "
                         "nested function.");
    return Primary ? X86::EAX : X86::ECX;
  }
  if (IsNested)
    return Primary ? X86::EDX : X86::EAX;
  return Primary ? X86::ECX : X86::EAX;
}

// The stack limit in the TCB is set to this many bytes above the actual stack
// limit.
static const uint64_t kSplitStackAvailable = 256;

void X86FrameLowering::adjustForSegmentedStacks(
    MachineFunction &MF, MachineBasicBlock &PrologueMBB) const {
  MachineFrameInfo *MFI = MF.getFrameInfo();
  uint64_t StackSize;
  unsigned TlsReg, TlsOffset;
  DebugLoc DL;

  // To support shrink-wrapping we would need to insert the new blocks
  // at the right place and update the branches to PrologueMBB.
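  // Shape of the check this function emits, sketched with invented numbers
  // for x86-64 Linux (stacklet limit cached at %fs:0x70; StackSize = 0x400,
  // which is >= kSplitStackAvailable, so the limit is compared against
  // RSP - StackSize rather than RSP itself):
  //     leaq -0x400(%rsp), %r11
  //     cmpq %fs:0x70, %r11
  //     ja   <function body>     # enough stack; continue normally
  //     ...                      # otherwise fall into allocMBB / __morestack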
  assert(&(*MF.begin()) == &PrologueMBB && "Shrink-wrapping not supported yet");

  unsigned ScratchReg = GetScratchRegister(Is64Bit, IsLP64, MF, true);
  assert(!MF.getRegInfo().isLiveIn(ScratchReg) &&
         "Scratch register is live-in");

  if (MF.getFunction()->isVarArg())
    report_fatal_error("Segmented stacks do not support vararg functions.");
  if (!STI.isTargetLinux() && !STI.isTargetDarwin() && !STI.isTargetWin32() &&
      !STI.isTargetWin64() && !STI.isTargetFreeBSD() &&
      !STI.isTargetDragonFly())
    report_fatal_error("Segmented stacks not supported on this platform.");

  // Eventually StackSize will be calculated by a link-time pass, which will
  // also decide whether checking code needs to be injected into this particular
  // prologue.
  StackSize = MFI->getStackSize();

  // Do not generate a prologue for functions with a stack of size zero
  if (StackSize == 0)
    return;

  MachineBasicBlock *allocMBB = MF.CreateMachineBasicBlock();
  MachineBasicBlock *checkMBB = MF.CreateMachineBasicBlock();
  X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();
  bool IsNested = false;

  // We need to know if the function has a nest argument only in 64 bit mode.
  if (Is64Bit)
    IsNested = HasNestArgument(&MF);

  // The MOV R10, RAX needs to be in a different block, since the RET we emit in
  // allocMBB needs to be last (terminating) instruction.

  for (const auto &LI : PrologueMBB.liveins()) {
    allocMBB->addLiveIn(LI);
    checkMBB->addLiveIn(LI);
  }

  if (IsNested)
    allocMBB->addLiveIn(IsLP64 ? X86::R10 : X86::R10D);

  MF.push_front(allocMBB);
  MF.push_front(checkMBB);

  // When the frame size is less than 256 we just compare the stack
  // boundary directly to the value of the stack pointer, per gcc.
  bool CompareStackPointer = StackSize < kSplitStackAvailable;

  // Read the limit of the current stacklet from the stack_guard location.
  if (Is64Bit) {
    if (STI.isTargetLinux()) {
      TlsReg = X86::FS;
      TlsOffset = IsLP64 ? 0x70 : 0x40;
    } else if (STI.isTargetDarwin()) {
      TlsReg = X86::GS;
      TlsOffset = 0x60 + 90*8; // See pthread_machdep.h. Steal TLS slot 90.
    } else if (STI.isTargetWin64()) {
      TlsReg = X86::GS;
      TlsOffset = 0x28; // pvArbitrary, reserved for application use
    } else if (STI.isTargetFreeBSD()) {
      TlsReg = X86::FS;
      TlsOffset = 0x18;
    } else if (STI.isTargetDragonFly()) {
      TlsReg = X86::FS;
      TlsOffset = 0x20; // use tls_tcb.tcb_segstack
    } else {
      report_fatal_error("Segmented stacks not supported on this platform.");
    }

    if (CompareStackPointer)
      ScratchReg = IsLP64 ? X86::RSP : X86::ESP;
    else
      BuildMI(checkMBB, DL, TII.get(IsLP64 ? X86::LEA64r : X86::LEA64_32r),
              ScratchReg)
          .addReg(X86::RSP)
          .addImm(1)
          .addReg(0)
          .addImm(-StackSize)
          .addReg(0);

    BuildMI(checkMBB, DL, TII.get(IsLP64 ? X86::CMP64rm : X86::CMP32rm))
        .addReg(ScratchReg)
        .addReg(0)
        .addImm(1)
        .addReg(0)
        .addImm(TlsOffset)
        .addReg(TlsReg);
  } else {
    if (STI.isTargetLinux()) {
      TlsReg = X86::GS;
      TlsOffset = 0x30;
    } else if (STI.isTargetDarwin()) {
      TlsReg = X86::GS;
      TlsOffset = 0x48 + 90*4;
    } else if (STI.isTargetWin32()) {
      TlsReg = X86::FS;
      TlsOffset = 0x14; // pvArbitrary, reserved for application use
    } else if (STI.isTargetDragonFly()) {
      TlsReg = X86::FS;
      TlsOffset = 0x10; // use tls_tcb.tcb_segstack
    } else if (STI.isTargetFreeBSD()) {
      report_fatal_error("Segmented stacks not supported on FreeBSD i386.");
    } else {
      report_fatal_error("Segmented stacks not supported on this platform.");
    }

    if (CompareStackPointer)
      ScratchReg = X86::ESP;
    else
      BuildMI(checkMBB, DL, TII.get(X86::LEA32r), ScratchReg)
          .addReg(X86::ESP)
          .addImm(1)
          .addReg(0)
          .addImm(-StackSize)
          .addReg(0);

    if (STI.isTargetLinux() || STI.isTargetWin32() || STI.isTargetWin64() ||
        STI.isTargetDragonFly()) {
      BuildMI(checkMBB, DL, TII.get(X86::CMP32rm))
          .addReg(ScratchReg)
          .addReg(0)
          .addImm(0)
          .addReg(0)
          .addImm(TlsOffset)
          .addReg(TlsReg);
    } else if (STI.isTargetDarwin()) {

      // TlsOffset doesn't fit into a mod r/m byte so we need an extra register.
      unsigned ScratchReg2;
      bool SaveScratch2;
      if (CompareStackPointer) {
        // The primary scratch register is available for holding the TLS offset.
        ScratchReg2 = GetScratchRegister(Is64Bit, IsLP64, MF, true);
        SaveScratch2 = false;
      } else {
        // Need to use a second register to hold the TLS offset
        ScratchReg2 = GetScratchRegister(Is64Bit, IsLP64, MF, false);

        // Unfortunately, with fastcc the second scratch register may hold an
        // argument.
        SaveScratch2 = MF.getRegInfo().isLiveIn(ScratchReg2);
      }

      // If Scratch2 is live-in then it needs to be saved.
      assert((!MF.getRegInfo().isLiveIn(ScratchReg2) || SaveScratch2) &&
             "Scratch register is live-in and not saved");

      if (SaveScratch2)
        BuildMI(checkMBB, DL, TII.get(X86::PUSH32r))
            .addReg(ScratchReg2, RegState::Kill);

      BuildMI(checkMBB, DL, TII.get(X86::MOV32ri), ScratchReg2)
          .addImm(TlsOffset);
      BuildMI(checkMBB, DL, TII.get(X86::CMP32rm))
          .addReg(ScratchReg)
          .addReg(ScratchReg2)
          .addImm(1)
          .addReg(0)
          .addImm(0)
          .addReg(TlsReg);

      if (SaveScratch2)
        BuildMI(checkMBB, DL, TII.get(X86::POP32r), ScratchReg2);
    }
  }

  // This jump is taken if SP >= (Stacklet Limit + Stack Space required).
  // It jumps to normal execution of the function body.
  BuildMI(checkMBB, DL, TII.get(X86::JA_1)).addMBB(&PrologueMBB);

  // On 32 bit we first push the arguments size and then the frame size. On 64
  // bit, we pass the stack frame size in r10 and the argument size in r11.
  if (Is64Bit) {
    // Functions with nested arguments use R10, so it needs to be saved across
    // the call to _morestack
    const unsigned RegAX = IsLP64 ? X86::RAX : X86::EAX;
    const unsigned Reg10 = IsLP64 ? X86::R10 : X86::R10D;
    const unsigned Reg11 = IsLP64 ? X86::R11 : X86::R11D;
    const unsigned MOVrr = IsLP64 ? X86::MOV64rr : X86::MOV32rr;
    const unsigned MOVri = IsLP64 ? X86::MOV64ri : X86::MOV32ri;

    if (IsNested)
      BuildMI(allocMBB, DL, TII.get(MOVrr), RegAX).addReg(Reg10);

    BuildMI(allocMBB, DL, TII.get(MOVri), Reg10)
        .addImm(StackSize);
    BuildMI(allocMBB, DL, TII.get(MOVri), Reg11)
        .addImm(X86FI->getArgumentStackSize());
  } else {
    BuildMI(allocMBB, DL, TII.get(X86::PUSHi32))
        .addImm(X86FI->getArgumentStackSize());
    BuildMI(allocMBB, DL, TII.get(X86::PUSHi32))
        .addImm(StackSize);
  }

  // __morestack is in libgcc
  if (Is64Bit && MF.getTarget().getCodeModel() == CodeModel::Large) {
    // Under the large code model, we cannot assume that __morestack lives
    // within 2^31 bytes of the call site, so we cannot use pc-relative
    // addressing. We cannot perform the call via a temporary register,
    // as the rax register may be used to store the static chain, and all
    // other suitable registers may be either callee-save or used for
    // parameter passing. We cannot use the stack at this point either
    // because __morestack manipulates the stack directly.
    //
    // To avoid these issues, perform an indirect call via a read-only memory
    // location containing the address.
    //
    // This solution is not perfect, as it assumes that the .rodata section
    // is laid out within 2^31 bytes of each function body, but this seems
    // to be sufficient for JIT.
    BuildMI(allocMBB, DL, TII.get(X86::CALL64m))
        .addReg(X86::RIP)
        .addImm(0)
        .addReg(0)
        .addExternalSymbol("__morestack_addr")
        .addReg(0);
    MF.getMMI().setUsesMorestackAddr(true);
  } else {
    if (Is64Bit)
      BuildMI(allocMBB, DL, TII.get(X86::CALL64pcrel32))
          .addExternalSymbol("__morestack");
    else
      BuildMI(allocMBB, DL, TII.get(X86::CALLpcrel32))
          .addExternalSymbol("__morestack");
  }

  if (IsNested)
    BuildMI(allocMBB, DL, TII.get(X86::MORESTACK_RET_RESTORE_R10));
  else
    BuildMI(allocMBB, DL, TII.get(X86::MORESTACK_RET));

  allocMBB->addSuccessor(&PrologueMBB);

  checkMBB->addSuccessor(allocMBB);
  checkMBB->addSuccessor(&PrologueMBB);

#ifdef XDEBUG
  MF.verify();
#endif
}

/// Erlang programs may need a special prologue to handle the stack size they
/// might need at runtime. That is because Erlang/OTP does not implement a C
/// stack but uses a custom implementation of a hybrid stack/heap architecture.
/// (for more information see Eric Stenman's Ph.D. thesis:
/// http://publications.uu.se/uu/fulltext/nbn_se_uu_diva-2688.pdf)
///
/// CheckStack:
///       temp0 = sp - MaxStack
///       if( temp0 < SP_LIMIT(P) ) goto IncStack else goto OldStart
/// OldStart:
///       ...
/// IncStack:
///       call inc_stack   # doubles the stack space
///       temp0 = sp - MaxStack
///       if( temp0 < SP_LIMIT(P) ) goto IncStack else goto OldStart
void X86FrameLowering::adjustForHiPEPrologue(
    MachineFunction &MF, MachineBasicBlock &PrologueMBB) const {
  MachineFrameInfo *MFI = MF.getFrameInfo();
  DebugLoc DL;

  // To support shrink-wrapping we would need to insert the new blocks
  // at the right place and update the branches to PrologueMBB.
  assert(&(*MF.begin()) == &PrologueMBB && "Shrink-wrapping not supported yet");

  // HiPE-specific values
  const unsigned HipeLeafWords = 24;
  const unsigned CCRegisteredArgs = Is64Bit ? 6 : 5;
  const unsigned Guaranteed = HipeLeafWords * SlotSize;
  unsigned CallerStkArity = MF.getFunction()->arg_size() > CCRegisteredArgs ?
                            MF.getFunction()->arg_size() - CCRegisteredArgs : 0;
  unsigned MaxStack = MFI->getStackSize() + CallerStkArity*SlotSize + SlotSize;

  assert(STI.isTargetLinux() &&
         "HiPE prologue is only supported on Linux operating systems.");

  // Compute the largest caller's frame that is needed to fit the callees'
This 'MaxStack' is computed from: // // a) the fixed frame size, which is the space needed for all spilled temps, // b) outgoing on-stack parameter areas, and // c) the minimum stack space this function needs to make available for the // functions it calls (a tunable ABI property). if (MFI->hasCalls()) { unsigned MoreStackForCalls = 0; for (MachineFunction::iterator MBBI = MF.begin(), MBBE = MF.end(); MBBI != MBBE; ++MBBI) for (MachineBasicBlock::iterator MI = MBBI->begin(), ME = MBBI->end(); MI != ME; ++MI) { if (!MI->isCall()) continue; // Get callee operand. const MachineOperand &MO = MI->getOperand(0); // Only take account of global function calls (no closures etc.). if (!MO.isGlobal()) continue; const Function *F = dyn_cast(MO.getGlobal()); if (!F) continue; // Do not update 'MaxStack' for primitive and built-in functions // (encoded with names either starting with "erlang."/"bif_" or not // having a ".", such as a simple .., or an // "_", such as the BIF "suspend_0") as they are executed on another // stack. if (F->getName().find("erlang.") != StringRef::npos || F->getName().find("bif_") != StringRef::npos || F->getName().find_first_of("._") == StringRef::npos) continue; unsigned CalleeStkArity = F->arg_size() > CCRegisteredArgs ? F->arg_size()-CCRegisteredArgs : 0; if (HipeLeafWords - 1 > CalleeStkArity) MoreStackForCalls = std::max(MoreStackForCalls, (HipeLeafWords - 1 - CalleeStkArity) * SlotSize); } MaxStack += MoreStackForCalls; } // If the stack frame needed is larger than the guaranteed then runtime checks // and calls to "inc_stack_0" BIF should be inserted in the assembly prologue. if (MaxStack > Guaranteed) { MachineBasicBlock *stackCheckMBB = MF.CreateMachineBasicBlock(); MachineBasicBlock *incStackMBB = MF.CreateMachineBasicBlock(); for (const auto &LI : PrologueMBB.liveins()) { stackCheckMBB->addLiveIn(LI); incStackMBB->addLiveIn(LI); } MF.push_front(incStackMBB); MF.push_front(stackCheckMBB); unsigned ScratchReg, SPReg, PReg, SPLimitOffset; unsigned LEAop, CMPop, CALLop; if (Is64Bit) { SPReg = X86::RSP; PReg = X86::RBP; LEAop = X86::LEA64r; CMPop = X86::CMP64rm; CALLop = X86::CALL64pcrel32; SPLimitOffset = 0x90; } else { SPReg = X86::ESP; PReg = X86::EBP; LEAop = X86::LEA32r; CMPop = X86::CMP32rm; CALLop = X86::CALLpcrel32; SPLimitOffset = 0x4c; } ScratchReg = GetScratchRegister(Is64Bit, IsLP64, MF, true); assert(!MF.getRegInfo().isLiveIn(ScratchReg) && "HiPE prologue scratch register is live-in"); // Create new MBB for StackCheck: addRegOffset(BuildMI(stackCheckMBB, DL, TII.get(LEAop), ScratchReg), SPReg, false, -MaxStack); // SPLimitOffset is in a fixed heap location (pointed by BP). addRegOffset(BuildMI(stackCheckMBB, DL, TII.get(CMPop)) .addReg(ScratchReg), PReg, false, SPLimitOffset); BuildMI(stackCheckMBB, DL, TII.get(X86::JAE_1)).addMBB(&PrologueMBB); // Create new MBB for IncStack: BuildMI(incStackMBB, DL, TII.get(CALLop)). 
addExternalSymbol("inc_stack_0"); addRegOffset(BuildMI(incStackMBB, DL, TII.get(LEAop), ScratchReg), SPReg, false, -MaxStack); addRegOffset(BuildMI(incStackMBB, DL, TII.get(CMPop)) .addReg(ScratchReg), PReg, false, SPLimitOffset); BuildMI(incStackMBB, DL, TII.get(X86::JLE_1)).addMBB(incStackMBB); stackCheckMBB->addSuccessor(&PrologueMBB, {99, 100}); stackCheckMBB->addSuccessor(incStackMBB, {1, 100}); incStackMBB->addSuccessor(&PrologueMBB, {99, 100}); incStackMBB->addSuccessor(incStackMBB, {1, 100}); } #ifdef XDEBUG MF.verify(); #endif } bool X86FrameLowering::adjustStackWithPops(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, DebugLoc DL, int Offset) const { if (Offset <= 0) return false; if (Offset % SlotSize) return false; int NumPops = Offset / SlotSize; // This is only worth it if we have at most 2 pops. if (NumPops != 1 && NumPops != 2) return false; // Handle only the trivial case where the adjustment directly follows // a call. This is the most common one, anyway. if (MBBI == MBB.begin()) return false; MachineBasicBlock::iterator Prev = std::prev(MBBI); if (!Prev->isCall() || !Prev->getOperand(1).isRegMask()) return false; unsigned Regs[2]; unsigned FoundRegs = 0; auto RegMask = Prev->getOperand(1); auto &RegClass = Is64Bit ? X86::GR64_NOREX_NOSPRegClass : X86::GR32_NOREX_NOSPRegClass; // Try to find up to NumPops free registers. for (auto Candidate : RegClass) { // Poor man's liveness: // Since we're immediately after a call, any register that is clobbered // by the call and not defined by it can be considered dead. if (!RegMask.clobbersPhysReg(Candidate)) continue; bool IsDef = false; for (const MachineOperand &MO : Prev->implicit_operands()) { if (MO.isReg() && MO.isDef() && MO.getReg() == Candidate) { IsDef = true; break; } } if (IsDef) continue; Regs[FoundRegs++] = Candidate; if (FoundRegs == (unsigned)NumPops) break; } if (FoundRegs == 0) return false; // If we found only one free register, but need two, reuse the same one twice. while (FoundRegs < (unsigned)NumPops) Regs[FoundRegs++] = Regs[0]; for (int i = 0; i < NumPops; ++i) BuildMI(MBB, MBBI, DL, TII.get(STI.is64Bit() ? X86::POP64r : X86::POP32r), Regs[i]); return true; } void X86FrameLowering:: eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const { bool reserveCallFrame = hasReservedCallFrame(MF); unsigned Opcode = I->getOpcode(); bool isDestroy = Opcode == TII.getCallFrameDestroyOpcode(); DebugLoc DL = I->getDebugLoc(); uint64_t Amount = !reserveCallFrame ? I->getOperand(0).getImm() : 0; uint64_t InternalAmt = (isDestroy || Amount) ? I->getOperand(1).getImm() : 0; I = MBB.erase(I); if (!reserveCallFrame) { // If the stack pointer can be changed after prologue, turn the // adjcallstackup instruction into a 'sub ESP, ' and the // adjcallstackdown instruction into 'add ESP, ' // We need to keep the stack aligned properly. To do this, we round the // amount of space needed for the outgoing arguments up to the next // alignment boundary. unsigned StackAlign = getStackAlignment(); Amount = RoundUpToAlignment(Amount, StackAlign); MachineModuleInfo &MMI = MF.getMMI(); const Function *Fn = MF.getFunction(); bool WindowsCFI = MF.getTarget().getMCAsmInfo()->usesWindowsCFI(); bool DwarfCFI = !WindowsCFI && (MMI.hasDebugInfo() || Fn->needsUnwindTableEntry()); // If we have any exception handlers in this function, and we adjust // the SP before calls, we may need to indicate this to the unwinder // using GNU_ARGS_SIZE. 
void X86FrameLowering::
eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,
                              MachineBasicBlock::iterator I) const {
  bool reserveCallFrame = hasReservedCallFrame(MF);
  unsigned Opcode = I->getOpcode();
  bool isDestroy = Opcode == TII.getCallFrameDestroyOpcode();
  DebugLoc DL = I->getDebugLoc();
  uint64_t Amount = !reserveCallFrame ? I->getOperand(0).getImm() : 0;
  uint64_t InternalAmt = (isDestroy || Amount) ? I->getOperand(1).getImm() : 0;
  I = MBB.erase(I);

  if (!reserveCallFrame) {
    // If the stack pointer can be changed after prologue, turn the
    // adjcallstackup instruction into a 'sub ESP, <amt>' and the
    // adjcallstackdown instruction into 'add ESP, <amt>'

    // We need to keep the stack aligned properly.  To do this, we round the
    // amount of space needed for the outgoing arguments up to the next
    // alignment boundary.
    unsigned StackAlign = getStackAlignment();
    Amount = RoundUpToAlignment(Amount, StackAlign);

    MachineModuleInfo &MMI = MF.getMMI();
    const Function *Fn = MF.getFunction();
    bool WindowsCFI = MF.getTarget().getMCAsmInfo()->usesWindowsCFI();
    bool DwarfCFI = !WindowsCFI &&
                    (MMI.hasDebugInfo() || Fn->needsUnwindTableEntry());

    // If we have any exception handlers in this function, and we adjust
    // the SP before calls, we may need to indicate this to the unwinder
    // using GNU_ARGS_SIZE. Note that this may be necessary even when
    // Amount == 0, because the preceding function may have set a non-0
    // GNU_ARGS_SIZE.
    // TODO: We don't need to reset this between subsequent functions,
    // if it didn't change.
    bool HasDwarfEHHandlers = !WindowsCFI &&
                              !MF.getMMI().getLandingPads().empty();

    if (HasDwarfEHHandlers && !isDestroy &&
        MF.getInfo<X86MachineFunctionInfo>()->getHasPushSequences())
      BuildCFI(MBB, I, DL,
               MCCFIInstruction::createGnuArgsSize(nullptr, Amount));

    if (Amount == 0)
      return;

    // Factor out the amount that gets handled inside the sequence
    // (Pushes of argument for frame setup, callee pops for frame destroy)
    Amount -= InternalAmt;

    // TODO: This is needed only if we require precise CFA.
    // If this is a callee-pop calling convention, emit a CFA adjust for
    // the amount the callee popped.
    if (isDestroy && InternalAmt && DwarfCFI && !hasFP(MF))
      BuildCFI(MBB, I, DL,
               MCCFIInstruction::createAdjustCfaOffset(nullptr, -InternalAmt));

    if (Amount) {
      // Add Amount to SP to destroy a frame, and subtract to setup.
      int Offset = isDestroy ? Amount : -Amount;

      if (!(Fn->optForMinSize() &&
            adjustStackWithPops(MBB, I, DL, Offset)))
        BuildStackAdjustment(MBB, I, DL, Offset, /*InEpilogue=*/false);
    }

    if (DwarfCFI && !hasFP(MF)) {
      // If we don't have FP, but need to generate unwind information,
      // we need to set the correct CFA offset after the stack adjustment.
      // How much we adjust the CFA offset depends on whether we're emitting
      // CFI only for EH purposes or for debugging. EH only requires the CFA
      // offset to be correct at each call site, while for debugging we want
      // it to be more precise.
      int CFAOffset = Amount;
      // TODO: When not using precise CFA, we also need to adjust for the
      // InternalAmt here.
      if (CFAOffset) {
        CFAOffset = isDestroy ? -CFAOffset : CFAOffset;
        BuildCFI(MBB, I, DL,
                 MCCFIInstruction::createAdjustCfaOffset(nullptr, CFAOffset));
      }
    }

    return;
  }

  if (isDestroy && InternalAmt) {
    // If we are performing frame pointer elimination and if the callee pops
    // something off the stack pointer, add it back.  We do this until we have
    // more advanced stack pointer tracking ability.
    // We are not tracking the stack pointer adjustment by the callee, so make
    // sure we restore the stack pointer immediately after the call, there may
    // be spill code inserted between the CALL and ADJCALLSTACKUP instructions.
    MachineBasicBlock::iterator B = MBB.begin();
    while (I != B && !std::prev(I)->isCall())
      --I;
    BuildStackAdjustment(MBB, I, DL, -InternalAmt, /*InEpilogue=*/false);
  }
}

bool X86FrameLowering::canUseAsEpilogue(const MachineBasicBlock &MBB) const {
  assert(MBB.getParent() && "Block is not attached to a function!");

  // Win64 has strict requirements in terms of epilogue and we are
  // not taking a chance at messing with them.
  // I.e., unless this block is already an exit block, we can't use
  // it as an epilogue.
  if (STI.isTargetWin64() && !MBB.succ_empty() && !MBB.isReturnBlock())
    return false;

  if (canUseLEAForSPInEpilogue(*MBB.getParent()))
    return true;

  // If we cannot use LEA to adjust SP, we may need to use ADD, which
  // clobbers the EFLAGS. Check that we do not need to preserve it,
  // otherwise, conservatively assume this is not
  // safe to insert the epilogue here.
  return !flagsNeedToBePreservedBeforeTheTerminators(MBB);
}
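// Illustration of the EFLAGS concern checked above: restoring SP with
// "lea esp, [ebp - 8]" leaves EFLAGS untouched, whereas "add esp, 8"
// rewrites them. The latter is unsafe if a live comparison result (e.g.
// from a preceding cmp/test) still feeds a conditional branch in this
// block's terminators.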
bool X86FrameLowering::enableShrinkWrapping(const MachineFunction &MF) const {
  // If we may need to emit frameless compact unwind information, give
  // up as this is currently broken: PR25614.
  return (MF.getFunction()->hasFnAttribute(Attribute::NoUnwind) || hasFP(MF)) &&
         // The lowering of segmented stack and HiPE only support entry blocks
         // as prologue blocks: PR26107.
         // This limitation may be lifted if we fix:
         // - adjustForSegmentedStacks
         // - adjustForHiPEPrologue
         MF.getFunction()->getCallingConv() != CallingConv::HiPE &&
         !MF.shouldSplitStack();
}

MachineBasicBlock::iterator X86FrameLowering::restoreWin32EHStackPointers(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, DebugLoc DL,
    bool RestoreSP) const {
  assert(STI.isTargetWindowsMSVC() && "funclets only supported in MSVC env");
  assert(STI.isTargetWin32() && "EBP/ESI restoration only required on win32");
  assert(STI.is32Bit() && !Uses64BitFramePtr &&
         "restoring EBP/ESI on non-32-bit target");

  MachineFunction &MF = *MBB.getParent();
  unsigned FramePtr = TRI->getFrameRegister(MF);
  unsigned BasePtr = TRI->getBaseRegister();
  WinEHFuncInfo &FuncInfo = *MF.getWinEHFuncInfo();
  X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();
  MachineFrameInfo *MFI = MF.getFrameInfo();

  // FIXME: Don't set FrameSetup flag in catchret case.

  int FI = FuncInfo.EHRegNodeFrameIndex;
  int EHRegSize = MFI->getObjectSize(FI);

  if (RestoreSP) {
    // MOV32rm -EHRegSize(%ebp), %esp
    addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32rm), X86::ESP),
                 X86::EBP, true, -EHRegSize)
        .setMIFlag(MachineInstr::FrameSetup);
  }

  unsigned UsedReg;
  int EHRegOffset = getFrameIndexReference(MF, FI, UsedReg);
  int EndOffset = -EHRegOffset - EHRegSize;
  FuncInfo.EHRegNodeEndOffset = EndOffset;

  if (UsedReg == FramePtr) {
    // ADD $offset, %ebp
    unsigned ADDri = getADDriOpcode(false, EndOffset);
    BuildMI(MBB, MBBI, DL, TII.get(ADDri), FramePtr)
        .addReg(FramePtr)
        .addImm(EndOffset)
        .setMIFlag(MachineInstr::FrameSetup)
        ->getOperand(3)
        .setIsDead();
    assert(EndOffset >= 0 &&
           "end of registration object above normal EBP position!");
  } else if (UsedReg == BasePtr) {
    // LEA offset(%ebp), %esi
    addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(X86::LEA32r), BasePtr),
                 FramePtr, false, EndOffset)
        .setMIFlag(MachineInstr::FrameSetup);
    // MOV32rm SavedEBPOffset(%esi), %ebp
    assert(X86FI->getHasSEHFramePtrSave());
    int Offset =
        getFrameIndexReference(MF, X86FI->getSEHFramePtrSaveIndex(), UsedReg);
    assert(UsedReg == BasePtr);
    addRegOffset(BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32rm), FramePtr),
                 UsedReg, true, Offset)
        .setMIFlag(MachineInstr::FrameSetup);
  } else {
    llvm_unreachable("32-bit frames with WinEH must use FramePtr or BasePtr");
  }
  return MBBI;
}

unsigned
X86FrameLowering::getWinEHParentFrameOffset(const MachineFunction &MF) const {
  // RDX, the parent frame pointer, is homed into 16(%rsp) in the prologue.
  unsigned Offset = 16;
  // RBP is immediately pushed.
  Offset += SlotSize;
  // All callee-saved registers are then pushed.
  Offset += MF.getInfo<X86MachineFunctionInfo>()->getCalleeSavedFrameSize();
  // Every funclet allocates enough stack space for the largest outgoing call.
  Offset += getWinEHFuncletFrameSize(MF);
  return Offset;
}
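// Worked example (hypothetical frame): with SlotSize == 8, two callee-saved
// GPRs (16 bytes) and a 32-byte funclet frame, the parent-frame offset is
// 16 + 8 + 16 + 32 == 72 bytes above the funclet's RSP.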
void X86FrameLowering::processFunctionBeforeFrameFinalized(
    MachineFunction &MF, RegScavenger *RS) const {
  // If this function isn't doing Win64-style C++ EH, we don't need to do
  // anything.
  const Function *Fn = MF.getFunction();
  if (!STI.is64Bit() || !MF.getMMI().hasEHFunclets() ||
      classifyEHPersonality(Fn->getPersonalityFn()) != EHPersonality::MSVC_CXX)
    return;

  // Win64 C++ EH needs to allocate the UnwindHelp object at some fixed offset
  // relative to RSP after the prologue.  Find the offset of the last fixed
  // object, so that we can allocate a slot immediately following it. If there
  // were no fixed objects, use offset -SlotSize, which is immediately after
  // the return address. Fixed objects have negative frame indices.
  MachineFrameInfo *MFI = MF.getFrameInfo();
  int64_t MinFixedObjOffset = -SlotSize;
  for (int I = MFI->getObjectIndexBegin(); I < 0; ++I)
    MinFixedObjOffset = std::min(MinFixedObjOffset, MFI->getObjectOffset(I));

  int64_t UnwindHelpOffset = MinFixedObjOffset - SlotSize;
  int UnwindHelpFI =
      MFI->CreateFixedObject(SlotSize, UnwindHelpOffset, /*Immutable=*/false);
  MF.getWinEHFuncInfo()->UnwindHelpFrameIdx = UnwindHelpFI;

  // Store -2 into UnwindHelp on function entry. We have to scan forwards past
  // other frame setup instructions.
  MachineBasicBlock &MBB = MF.front();
  auto MBBI = MBB.begin();
  while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
    ++MBBI;

  DebugLoc DL = MBB.findDebugLoc(MBBI);
  addFrameReference(BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64mi32)),
                    UnwindHelpFI)
      .addImm(-2);
}
Index: vendor/llvm/dist/lib/Target/X86/X86InstrAVX512.td
===================================================================
--- vendor/llvm/dist/lib/Target/X86/X86InstrAVX512.td (revision 295845)
+++ vendor/llvm/dist/lib/Target/X86/X86InstrAVX512.td (revision 295846)
@@ -1,7487 +1,7487 @@
//===-- X86InstrAVX512.td - AVX512 Instruction Set ---------*- tablegen -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file describes the X86 AVX512 instruction set, defining the
// instructions, and properties of the instructions which are needed for code
// generation, machine code emission, and analysis.
//
//===----------------------------------------------------------------------===//

// Group template arguments that can be derived from the vector type (EltNum x
// EltVT). These are things like the register class for the writemask, etc.
// The idea is to pass one of these as the template argument rather than the
// individual arguments.
// The template is also used for scalar types, in this case numelts is 1.
class X86VectorVTInfo<int numelts, ValueType eltvt, RegisterClass rc,
                      string suffix = ""> {
  RegisterClass RC = rc;
  ValueType EltVT = eltvt;
  int NumElts = numelts;

  // Corresponding mask register class.
  RegisterClass KRC = !cast<RegisterClass>("VK" # NumElts);

  // Corresponding write-mask register class.
  RegisterClass KRCWM = !cast<RegisterClass>("VK" # NumElts # "WM");

  // The GPR register class that can hold the write mask.  Use GR8 for fewer
  // than 8 elements.  Use shift-right and equal to work around the lack of
  // !lt in tablegen.
  RegisterClass MRC =
    !cast<RegisterClass>("GR" # !if (!eq (!srl(NumElts, 3), 0), 8, NumElts));

  // Suffix used in the instruction mnemonic.
  string Suffix = suffix;

  // VTName is a string name for vector VT. For vector types it will be
  // v # NumElts # EltVT, so for vector of 8 elements of i32 it will be v8i32
  // It is a little bit complex for scalar types, where NumElts = 1.
  // In this case we build v4f32 or v2f64
  string VTName = "v" # !if (!eq (NumElts, 1),
                        !if (!eq (EltVT.Size, 32), 4,
                        !if (!eq (EltVT.Size, 64), 2, NumElts)), NumElts) # EltVT;

  // The vector VT.
  ValueType VT = !cast<ValueType>(VTName);

  string EltTypeName = !cast<string>(EltVT);
  // Size of the element type in bits, e.g. 32 for v16i32.
  string EltSizeName = !subst("i", "", !subst("f", "", EltTypeName));
  int EltSize = EltVT.Size;

  // "i" for integer types and "f" for floating-point types
  string TypeVariantName = !subst(EltSizeName, "", EltTypeName);

  // Size of RC in bits, e.g. 512 for VR512.
int Size = VT.Size; // The corresponding memory operand, e.g. i512mem for VR512. X86MemOperand MemOp = !cast(TypeVariantName # Size # "mem"); X86MemOperand ScalarMemOp = !cast(EltVT # "mem"); // Load patterns // Note: For 128/256-bit integer VT we choose loadv2i64/loadv4i64 // due to load promotion during legalization PatFrag LdFrag = !cast("load" # !if (!eq (TypeVariantName, "i"), !if (!eq (Size, 128), "v2i64", !if (!eq (Size, 256), "v4i64", VTName)), VTName)); PatFrag AlignedLdFrag = !cast("alignedload" # !if (!eq (TypeVariantName, "i"), !if (!eq (Size, 128), "v2i64", !if (!eq (Size, 256), "v4i64", !if (!eq (Size, 512), !if (!eq (EltSize, 64), "v8i64", "v16i32"), VTName))), VTName)); PatFrag ScalarLdFrag = !cast("load" # EltVT); // The corresponding float type, e.g. v16f32 for v16i32 // Note: For EltSize < 32, FloatVT is illegal and TableGen // fails to compile, so we choose FloatVT = VT ValueType FloatVT = !cast( !if (!eq (!srl(EltSize,5),0), VTName, !if (!eq(TypeVariantName, "i"), "v" # NumElts # "f" # EltSize, VTName))); // The string to specify embedded broadcast in assembly. string BroadcastStr = "{1to" # NumElts # "}"; // 8-bit compressed displacement tuple/subvector format. This is only // defined for NumElts <= 8. CD8VForm CD8TupleForm = !if (!eq (!srl(NumElts, 4), 0), !cast("CD8VT" # NumElts), ?); SubRegIndex SubRegIdx = !if (!eq (Size, 128), sub_xmm, !if (!eq (Size, 256), sub_ymm, ?)); Domain ExeDomain = !if (!eq (EltTypeName, "f32"), SSEPackedSingle, !if (!eq (EltTypeName, "f64"), SSEPackedDouble, SSEPackedInt)); RegisterClass FRC = !if (!eq (EltTypeName, "f32"), FR32X, FR64X); // A vector type of the same width with element type i32. This is used to // create the canonical constant zero node ImmAllZerosV. ValueType i32VT = !cast("v" # !srl(Size, 5) # "i32"); dag ImmAllZerosV = (VT (bitconvert (i32VT immAllZerosV))); string ZSuffix = !if (!eq (Size, 128), "Z128", !if (!eq (Size, 256), "Z256", "Z")); } def v64i8_info : X86VectorVTInfo<64, i8, VR512, "b">; def v32i16_info : X86VectorVTInfo<32, i16, VR512, "w">; def v16i32_info : X86VectorVTInfo<16, i32, VR512, "d">; def v8i64_info : X86VectorVTInfo<8, i64, VR512, "q">; def v16f32_info : X86VectorVTInfo<16, f32, VR512, "ps">; def v8f64_info : X86VectorVTInfo<8, f64, VR512, "pd">; // "x" in v32i8x_info means RC = VR256X def v32i8x_info : X86VectorVTInfo<32, i8, VR256X, "b">; def v16i16x_info : X86VectorVTInfo<16, i16, VR256X, "w">; def v8i32x_info : X86VectorVTInfo<8, i32, VR256X, "d">; def v4i64x_info : X86VectorVTInfo<4, i64, VR256X, "q">; def v8f32x_info : X86VectorVTInfo<8, f32, VR256X, "ps">; def v4f64x_info : X86VectorVTInfo<4, f64, VR256X, "pd">; def v16i8x_info : X86VectorVTInfo<16, i8, VR128X, "b">; def v8i16x_info : X86VectorVTInfo<8, i16, VR128X, "w">; def v4i32x_info : X86VectorVTInfo<4, i32, VR128X, "d">; def v2i64x_info : X86VectorVTInfo<2, i64, VR128X, "q">; def v4f32x_info : X86VectorVTInfo<4, f32, VR128X, "ps">; def v2f64x_info : X86VectorVTInfo<2, f64, VR128X, "pd">; // We map scalar types to the smallest (128-bit) vector type // with the appropriate element type. This allows to use the same masking logic. 
def i32x_info : X86VectorVTInfo<1, i32, GR32, "si">; def i64x_info : X86VectorVTInfo<1, i64, GR64, "sq">; def f32x_info : X86VectorVTInfo<1, f32, VR128X, "ss">; def f64x_info : X86VectorVTInfo<1, f64, VR128X, "sd">; class AVX512VLVectorVTInfo { X86VectorVTInfo info512 = i512; X86VectorVTInfo info256 = i256; X86VectorVTInfo info128 = i128; } def avx512vl_i8_info : AVX512VLVectorVTInfo; def avx512vl_i16_info : AVX512VLVectorVTInfo; def avx512vl_i32_info : AVX512VLVectorVTInfo; def avx512vl_i64_info : AVX512VLVectorVTInfo; def avx512vl_f32_info : AVX512VLVectorVTInfo; def avx512vl_f64_info : AVX512VLVectorVTInfo; // This multiclass generates the masking variants from the non-masking // variant. It only provides the assembly pieces for the masking variants. // It assumes custom ISel patterns for masking which can be provided as // template arguments. multiclass AVX512_maskable_custom O, Format F, dag Outs, dag Ins, dag MaskingIns, dag ZeroMaskingIns, string OpcodeStr, string AttSrcAsm, string IntelSrcAsm, list Pattern, list MaskingPattern, list ZeroMaskingPattern, string MaskingConstraint = "", InstrItinClass itin = NoItinerary, bit IsCommutable = 0> { let isCommutable = IsCommutable in def NAME: AVX512; // Prefer over VMOV*rrk Pat<> let AddedComplexity = 20 in def NAME#k: AVX512, EVEX_K { // In case of the 3src subclass this is overridden with a let. string Constraints = MaskingConstraint; } let AddedComplexity = 30 in // Prefer over VMOV*rrkz Pat<> def NAME#kz: AVX512, EVEX_KZ; } // Common base class of AVX512_maskable and AVX512_maskable_3src. multiclass AVX512_maskable_common O, Format F, X86VectorVTInfo _, dag Outs, dag Ins, dag MaskingIns, dag ZeroMaskingIns, string OpcodeStr, string AttSrcAsm, string IntelSrcAsm, dag RHS, dag MaskingRHS, SDNode Select = vselect, string MaskingConstraint = "", InstrItinClass itin = NoItinerary, bit IsCommutable = 0> : AVX512_maskable_custom; // This multiclass generates the unconditional/non-masking, the masking and // the zero-masking variant of the vector instruction. In the masking case, the // perserved vector elements come from a new dummy input operand tied to $dst. multiclass AVX512_maskable O, Format F, X86VectorVTInfo _, dag Outs, dag Ins, string OpcodeStr, string AttSrcAsm, string IntelSrcAsm, dag RHS, InstrItinClass itin = NoItinerary, bit IsCommutable = 0> : AVX512_maskable_common; // This multiclass generates the unconditional/non-masking, the masking and // the zero-masking variant of the scalar instruction. multiclass AVX512_maskable_scalar O, Format F, X86VectorVTInfo _, dag Outs, dag Ins, string OpcodeStr, string AttSrcAsm, string IntelSrcAsm, dag RHS, InstrItinClass itin = NoItinerary, bit IsCommutable = 0> : AVX512_maskable_common; // Similar to AVX512_maskable but in this case one of the source operands // ($src1) is already tied to $dst so we just use that for the preserved // vector elements. NOTE that the NonTiedIns (the ins dag) should exclude // $src1. multiclass AVX512_maskable_3src O, Format F, X86VectorVTInfo _, dag Outs, dag NonTiedIns, string OpcodeStr, string AttSrcAsm, string IntelSrcAsm, dag RHS> : AVX512_maskable_common; // Similar to AVX512_maskable_3rc but in this case the input VT for the tied // operand differs from the output VT. This requires a bitconvert on // the preserved vector going into the vselect. 
multiclass AVX512_maskable_3src_cast O, Format F, X86VectorVTInfo OutVT, X86VectorVTInfo InVT, dag Outs, dag NonTiedIns, string OpcodeStr, string AttSrcAsm, string IntelSrcAsm, dag RHS> : AVX512_maskable_common; multiclass AVX512_maskable_3src_scalar O, Format F, X86VectorVTInfo _, dag Outs, dag NonTiedIns, string OpcodeStr, string AttSrcAsm, string IntelSrcAsm, dag RHS> : AVX512_maskable_common; multiclass AVX512_maskable_in_asm O, Format F, X86VectorVTInfo _, dag Outs, dag Ins, string OpcodeStr, string AttSrcAsm, string IntelSrcAsm, list Pattern> : AVX512_maskable_custom; // Instruction with mask that puts result in mask register, // like "compare" and "vptest" multiclass AVX512_maskable_custom_cmp O, Format F, dag Outs, dag Ins, dag MaskingIns, string OpcodeStr, string AttSrcAsm, string IntelSrcAsm, list Pattern, list MaskingPattern> { def NAME: AVX512; def NAME#k: AVX512, EVEX_K; } multiclass AVX512_maskable_common_cmp O, Format F, X86VectorVTInfo _, dag Outs, dag Ins, dag MaskingIns, string OpcodeStr, string AttSrcAsm, string IntelSrcAsm, dag RHS, dag MaskingRHS> : AVX512_maskable_custom_cmp; multiclass AVX512_maskable_cmp O, Format F, X86VectorVTInfo _, dag Outs, dag Ins, string OpcodeStr, string AttSrcAsm, string IntelSrcAsm, dag RHS> : AVX512_maskable_common_cmp; multiclass AVX512_maskable_cmp_alt O, Format F, X86VectorVTInfo _, dag Outs, dag Ins, string OpcodeStr, string AttSrcAsm, string IntelSrcAsm> : AVX512_maskable_custom_cmp; // Bitcasts between 512-bit vector types. Return the original type since // no instruction is needed for the conversion let Predicates = [HasAVX512] in { def : Pat<(v8f64 (bitconvert (v8i64 VR512:$src))), (v8f64 VR512:$src)>; def : Pat<(v8f64 (bitconvert (v16i32 VR512:$src))), (v8f64 VR512:$src)>; def : Pat<(v8f64 (bitconvert (v32i16 VR512:$src))), (v8f64 VR512:$src)>; def : Pat<(v8f64 (bitconvert (v64i8 VR512:$src))), (v8f64 VR512:$src)>; def : Pat<(v8f64 (bitconvert (v16f32 VR512:$src))), (v8f64 VR512:$src)>; def : Pat<(v16f32 (bitconvert (v8i64 VR512:$src))), (v16f32 VR512:$src)>; def : Pat<(v16f32 (bitconvert (v16i32 VR512:$src))), (v16f32 VR512:$src)>; def : Pat<(v16f32 (bitconvert (v32i16 VR512:$src))), (v16f32 VR512:$src)>; def : Pat<(v16f32 (bitconvert (v64i8 VR512:$src))), (v16f32 VR512:$src)>; def : Pat<(v16f32 (bitconvert (v8f64 VR512:$src))), (v16f32 VR512:$src)>; def : Pat<(v8i64 (bitconvert (v16i32 VR512:$src))), (v8i64 VR512:$src)>; def : Pat<(v8i64 (bitconvert (v32i16 VR512:$src))), (v8i64 VR512:$src)>; def : Pat<(v8i64 (bitconvert (v64i8 VR512:$src))), (v8i64 VR512:$src)>; def : Pat<(v8i64 (bitconvert (v8f64 VR512:$src))), (v8i64 VR512:$src)>; def : Pat<(v8i64 (bitconvert (v16f32 VR512:$src))), (v8i64 VR512:$src)>; def : Pat<(v16i32 (bitconvert (v8i64 VR512:$src))), (v16i32 VR512:$src)>; def : Pat<(v16i32 (bitconvert (v16f32 VR512:$src))), (v16i32 VR512:$src)>; def : Pat<(v16i32 (bitconvert (v32i16 VR512:$src))), (v16i32 VR512:$src)>; def : Pat<(v16i32 (bitconvert (v64i8 VR512:$src))), (v16i32 VR512:$src)>; def : Pat<(v16i32 (bitconvert (v8f64 VR512:$src))), (v16i32 VR512:$src)>; def : Pat<(v32i16 (bitconvert (v8i64 VR512:$src))), (v32i16 VR512:$src)>; def : Pat<(v32i16 (bitconvert (v16i32 VR512:$src))), (v32i16 VR512:$src)>; def : Pat<(v32i16 (bitconvert (v64i8 VR512:$src))), (v32i16 VR512:$src)>; def : Pat<(v32i16 (bitconvert (v8f64 VR512:$src))), (v32i16 VR512:$src)>; def : Pat<(v32i16 (bitconvert (v16f32 VR512:$src))), (v32i16 VR512:$src)>; def : Pat<(v32i16 (bitconvert (v16f32 VR512:$src))), (v32i16 VR512:$src)>; def : Pat<(v64i8 
(bitconvert (v8i64 VR512:$src))), (v64i8 VR512:$src)>; def : Pat<(v64i8 (bitconvert (v16i32 VR512:$src))), (v64i8 VR512:$src)>; def : Pat<(v64i8 (bitconvert (v32i16 VR512:$src))), (v64i8 VR512:$src)>; def : Pat<(v64i8 (bitconvert (v8f64 VR512:$src))), (v64i8 VR512:$src)>; def : Pat<(v64i8 (bitconvert (v16f32 VR512:$src))), (v64i8 VR512:$src)>; def : Pat<(v2i64 (bitconvert (v4i32 VR128X:$src))), (v2i64 VR128X:$src)>; def : Pat<(v2i64 (bitconvert (v8i16 VR128X:$src))), (v2i64 VR128X:$src)>; def : Pat<(v2i64 (bitconvert (v16i8 VR128X:$src))), (v2i64 VR128X:$src)>; def : Pat<(v2i64 (bitconvert (v2f64 VR128X:$src))), (v2i64 VR128X:$src)>; def : Pat<(v2i64 (bitconvert (v4f32 VR128X:$src))), (v2i64 VR128X:$src)>; def : Pat<(v4i32 (bitconvert (v2i64 VR128X:$src))), (v4i32 VR128X:$src)>; def : Pat<(v4i32 (bitconvert (v8i16 VR128X:$src))), (v4i32 VR128X:$src)>; def : Pat<(v4i32 (bitconvert (v16i8 VR128X:$src))), (v4i32 VR128X:$src)>; def : Pat<(v4i32 (bitconvert (v2f64 VR128X:$src))), (v4i32 VR128X:$src)>; def : Pat<(v4i32 (bitconvert (v4f32 VR128X:$src))), (v4i32 VR128X:$src)>; def : Pat<(v8i16 (bitconvert (v2i64 VR128X:$src))), (v8i16 VR128X:$src)>; def : Pat<(v8i16 (bitconvert (v4i32 VR128X:$src))), (v8i16 VR128X:$src)>; def : Pat<(v8i16 (bitconvert (v16i8 VR128X:$src))), (v8i16 VR128X:$src)>; def : Pat<(v8i16 (bitconvert (v2f64 VR128X:$src))), (v8i16 VR128X:$src)>; def : Pat<(v8i16 (bitconvert (v4f32 VR128X:$src))), (v8i16 VR128X:$src)>; def : Pat<(v16i8 (bitconvert (v2i64 VR128X:$src))), (v16i8 VR128X:$src)>; def : Pat<(v16i8 (bitconvert (v4i32 VR128X:$src))), (v16i8 VR128X:$src)>; def : Pat<(v16i8 (bitconvert (v8i16 VR128X:$src))), (v16i8 VR128X:$src)>; def : Pat<(v16i8 (bitconvert (v2f64 VR128X:$src))), (v16i8 VR128X:$src)>; def : Pat<(v16i8 (bitconvert (v4f32 VR128X:$src))), (v16i8 VR128X:$src)>; def : Pat<(v4f32 (bitconvert (v2i64 VR128X:$src))), (v4f32 VR128X:$src)>; def : Pat<(v4f32 (bitconvert (v4i32 VR128X:$src))), (v4f32 VR128X:$src)>; def : Pat<(v4f32 (bitconvert (v8i16 VR128X:$src))), (v4f32 VR128X:$src)>; def : Pat<(v4f32 (bitconvert (v16i8 VR128X:$src))), (v4f32 VR128X:$src)>; def : Pat<(v4f32 (bitconvert (v2f64 VR128X:$src))), (v4f32 VR128X:$src)>; def : Pat<(v2f64 (bitconvert (v2i64 VR128X:$src))), (v2f64 VR128X:$src)>; def : Pat<(v2f64 (bitconvert (v4i32 VR128X:$src))), (v2f64 VR128X:$src)>; def : Pat<(v2f64 (bitconvert (v8i16 VR128X:$src))), (v2f64 VR128X:$src)>; def : Pat<(v2f64 (bitconvert (v16i8 VR128X:$src))), (v2f64 VR128X:$src)>; def : Pat<(v2f64 (bitconvert (v4f32 VR128X:$src))), (v2f64 VR128X:$src)>; // Bitcasts between 256-bit vector types. 
Return the original type since // no instruction is needed for the conversion def : Pat<(v4f64 (bitconvert (v8f32 VR256X:$src))), (v4f64 VR256X:$src)>; def : Pat<(v4f64 (bitconvert (v8i32 VR256X:$src))), (v4f64 VR256X:$src)>; def : Pat<(v4f64 (bitconvert (v4i64 VR256X:$src))), (v4f64 VR256X:$src)>; def : Pat<(v4f64 (bitconvert (v16i16 VR256X:$src))), (v4f64 VR256X:$src)>; def : Pat<(v4f64 (bitconvert (v32i8 VR256X:$src))), (v4f64 VR256X:$src)>; def : Pat<(v8f32 (bitconvert (v8i32 VR256X:$src))), (v8f32 VR256X:$src)>; def : Pat<(v8f32 (bitconvert (v4i64 VR256X:$src))), (v8f32 VR256X:$src)>; def : Pat<(v8f32 (bitconvert (v4f64 VR256X:$src))), (v8f32 VR256X:$src)>; def : Pat<(v8f32 (bitconvert (v32i8 VR256X:$src))), (v8f32 VR256X:$src)>; def : Pat<(v8f32 (bitconvert (v16i16 VR256X:$src))), (v8f32 VR256X:$src)>; def : Pat<(v4i64 (bitconvert (v8f32 VR256X:$src))), (v4i64 VR256X:$src)>; def : Pat<(v4i64 (bitconvert (v8i32 VR256X:$src))), (v4i64 VR256X:$src)>; def : Pat<(v4i64 (bitconvert (v4f64 VR256X:$src))), (v4i64 VR256X:$src)>; def : Pat<(v4i64 (bitconvert (v32i8 VR256X:$src))), (v4i64 VR256X:$src)>; def : Pat<(v4i64 (bitconvert (v16i16 VR256X:$src))), (v4i64 VR256X:$src)>; def : Pat<(v32i8 (bitconvert (v4f64 VR256X:$src))), (v32i8 VR256X:$src)>; def : Pat<(v32i8 (bitconvert (v4i64 VR256X:$src))), (v32i8 VR256X:$src)>; def : Pat<(v32i8 (bitconvert (v8f32 VR256X:$src))), (v32i8 VR256X:$src)>; def : Pat<(v32i8 (bitconvert (v8i32 VR256X:$src))), (v32i8 VR256X:$src)>; def : Pat<(v32i8 (bitconvert (v16i16 VR256X:$src))), (v32i8 VR256X:$src)>; def : Pat<(v8i32 (bitconvert (v32i8 VR256X:$src))), (v8i32 VR256X:$src)>; def : Pat<(v8i32 (bitconvert (v16i16 VR256X:$src))), (v8i32 VR256X:$src)>; def : Pat<(v8i32 (bitconvert (v8f32 VR256X:$src))), (v8i32 VR256X:$src)>; def : Pat<(v8i32 (bitconvert (v4i64 VR256X:$src))), (v8i32 VR256X:$src)>; def : Pat<(v8i32 (bitconvert (v4f64 VR256X:$src))), (v8i32 VR256X:$src)>; def : Pat<(v16i16 (bitconvert (v8f32 VR256X:$src))), (v16i16 VR256X:$src)>; def : Pat<(v16i16 (bitconvert (v8i32 VR256X:$src))), (v16i16 VR256X:$src)>; def : Pat<(v16i16 (bitconvert (v4i64 VR256X:$src))), (v16i16 VR256X:$src)>; def : Pat<(v16i16 (bitconvert (v4f64 VR256X:$src))), (v16i16 VR256X:$src)>; def : Pat<(v16i16 (bitconvert (v32i8 VR256X:$src))), (v16i16 VR256X:$src)>; } // // AVX-512: VPXOR instruction writes zero to its upper part, it's safe build zeros. 
// let isReMaterializable = 1, isAsCheapAsAMove = 1, canFoldAsLoad = 1, isPseudo = 1, Predicates = [HasAVX512] in { def AVX512_512_SET0 : I<0, Pseudo, (outs VR512:$dst), (ins), "", [(set VR512:$dst, (v16f32 immAllZerosV))]>; } let Predicates = [HasAVX512] in { def : Pat<(v8i64 immAllZerosV), (AVX512_512_SET0)>; def : Pat<(v16i32 immAllZerosV), (AVX512_512_SET0)>; def : Pat<(v8f64 immAllZerosV), (AVX512_512_SET0)>; } //===----------------------------------------------------------------------===// // AVX-512 - VECTOR INSERT // multiclass vinsert_for_size { let hasSideEffects = 0, ExeDomain = To.ExeDomain in { defm rr : AVX512_maskable, AVX512AIi8Base, EVEX_4V; let mayLoad = 1 in defm rm : AVX512_maskable, AVX512AIi8Base, EVEX_4V, EVEX_CD8; } } multiclass vinsert_for_size_lowering p> { let Predicates = p in { def : Pat<(vinsert_insert:$ins (To.VT To.RC:$src1), (From.VT From.RC:$src2), (iPTR imm)), (To.VT (!cast(InstrStr#"rr") To.RC:$src1, From.RC:$src2, (INSERT_get_vinsert_imm To.RC:$ins)))>; def : Pat<(vinsert_insert:$ins (To.VT To.RC:$src1), (From.VT (bitconvert (From.LdFrag addr:$src2))), (iPTR imm)), (To.VT (!cast(InstrStr#"rm") To.RC:$src1, addr:$src2, (INSERT_get_vinsert_imm To.RC:$ins)))>; } } multiclass vinsert_for_type { let Predicates = [HasVLX] in defm NAME # "32x4Z256" : vinsert_for_size, X86VectorVTInfo< 8, EltVT32, VR256X>, vinsert128_insert>, EVEX_V256; defm NAME # "32x4Z" : vinsert_for_size, X86VectorVTInfo<16, EltVT32, VR512>, vinsert128_insert>, EVEX_V512; defm NAME # "64x4Z" : vinsert_for_size, X86VectorVTInfo< 8, EltVT64, VR512>, vinsert256_insert>, VEX_W, EVEX_V512; let Predicates = [HasVLX, HasDQI] in defm NAME # "64x2Z256" : vinsert_for_size, X86VectorVTInfo< 4, EltVT64, VR256X>, vinsert128_insert>, VEX_W, EVEX_V256; let Predicates = [HasDQI] in { defm NAME # "64x2Z" : vinsert_for_size, X86VectorVTInfo< 8, EltVT64, VR512>, vinsert128_insert>, VEX_W, EVEX_V512; defm NAME # "32x8Z" : vinsert_for_size, X86VectorVTInfo<16, EltVT32, VR512>, vinsert256_insert>, EVEX_V512; } } defm VINSERTF : vinsert_for_type; defm VINSERTI : vinsert_for_type; // Codegen pattern with the alternative types, // Only add this if 64x2 and its friends are not supported natively via AVX512DQ. 
defm : vinsert_for_size_lowering<"VINSERTF32x4Z256", v2f64x_info, v4f64x_info, vinsert128_insert, INSERT_get_vinsert128_imm, [HasVLX, NoDQI]>; defm : vinsert_for_size_lowering<"VINSERTI32x4Z256", v2i64x_info, v4i64x_info, vinsert128_insert, INSERT_get_vinsert128_imm, [HasVLX, NoDQI]>; defm : vinsert_for_size_lowering<"VINSERTF32x4Z", v2f64x_info, v8f64_info, vinsert128_insert, INSERT_get_vinsert128_imm, [HasAVX512, NoDQI]>; defm : vinsert_for_size_lowering<"VINSERTI32x4Z", v2i64x_info, v8i64_info, vinsert128_insert, INSERT_get_vinsert128_imm, [HasAVX512, NoDQI]>; defm : vinsert_for_size_lowering<"VINSERTF64x4Z", v8f32x_info, v16f32_info, vinsert256_insert, INSERT_get_vinsert256_imm, [HasAVX512, NoDQI]>; defm : vinsert_for_size_lowering<"VINSERTI64x4Z", v8i32x_info, v16i32_info, vinsert256_insert, INSERT_get_vinsert256_imm, [HasAVX512, NoDQI]>; // Codegen pattern with the alternative types insert VEC128 into VEC256 defm : vinsert_for_size_lowering<"VINSERTI32x4Z256", v8i16x_info, v16i16x_info, vinsert128_insert, INSERT_get_vinsert128_imm, [HasVLX]>; defm : vinsert_for_size_lowering<"VINSERTI32x4Z256", v16i8x_info, v32i8x_info, vinsert128_insert, INSERT_get_vinsert128_imm, [HasVLX]>; // Codegen pattern with the alternative types insert VEC128 into VEC512 defm : vinsert_for_size_lowering<"VINSERTI32x4Z", v8i16x_info, v32i16_info, vinsert128_insert, INSERT_get_vinsert128_imm, [HasAVX512]>; defm : vinsert_for_size_lowering<"VINSERTI32x4Z", v16i8x_info, v64i8_info, vinsert128_insert, INSERT_get_vinsert128_imm, [HasAVX512]>; // Codegen pattern with the alternative types insert VEC256 into VEC512 defm : vinsert_for_size_lowering<"VINSERTI64x4Z", v16i16x_info, v32i16_info, vinsert256_insert, INSERT_get_vinsert256_imm, [HasAVX512]>; defm : vinsert_for_size_lowering<"VINSERTI64x4Z", v32i8x_info, v64i8_info, vinsert256_insert, INSERT_get_vinsert256_imm, [HasAVX512]>; // vinsertps - insert f32 to XMM def VINSERTPSzrr : AVX512AIi8<0x21, MRMSrcReg, (outs VR128X:$dst), (ins VR128X:$src1, VR128X:$src2, u8imm:$src3), "vinsertps\t{$src3, $src2, $src1, $dst|$dst, $src1, $src2, $src3}", [(set VR128X:$dst, (X86insertps VR128X:$src1, VR128X:$src2, imm:$src3))]>, EVEX_4V; def VINSERTPSzrm: AVX512AIi8<0x21, MRMSrcMem, (outs VR128X:$dst), (ins VR128X:$src1, f32mem:$src2, u8imm:$src3), "vinsertps\t{$src3, $src2, $src1, $dst|$dst, $src1, $src2, $src3}", [(set VR128X:$dst, (X86insertps VR128X:$src1, (v4f32 (scalar_to_vector (loadf32 addr:$src2))), imm:$src3))]>, EVEX_4V, EVEX_CD8<32, CD8VT1>; //===----------------------------------------------------------------------===// // AVX-512 VECTOR EXTRACT //--- multiclass vextract_for_size_first_position_lowering { // A subvector extract from the first vector position is // a subregister copy that needs no instruction. def NAME # To.NumElts: Pat<(To.VT (extract_subvector (From.VT From.RC:$src),(iPTR 0))), (To.VT (EXTRACT_SUBREG (From.VT From.RC:$src), To.SubRegIdx))>; } multiclass vextract_for_size : vextract_for_size_first_position_lowering { let hasSideEffects = 0, ExeDomain = To.ExeDomain in { // use AVX512_maskable_in_asm (AVX512_maskable can't be used due to // vextract_extract), we interesting only in patterns without mask, // intrinsics pattern match generated bellow. defm rr : AVX512_maskable_in_asm, AVX512AIi8Base, EVEX; let mayStore = 1 in { def rm : AVX512AIi8, EVEX; def rmk : AVX512AIi8, EVEX_K, EVEX; }//mayStore = 1 } // Intrinsic call with masking. 
def : Pat<(!cast("int_x86_avx512_mask_vextract" # To.EltTypeName # "x" # To.NumElts # "_" # From.Size) From.RC:$src1, (iPTR imm:$idx), To.RC:$src0, To.MRC:$mask), (!cast(NAME # To.EltSize # "x" # To.NumElts # From.ZSuffix # "rrk") To.RC:$src0, (COPY_TO_REGCLASS To.MRC:$mask, To.KRCWM), From.RC:$src1, imm:$idx)>; // Intrinsic call with zero-masking. def : Pat<(!cast("int_x86_avx512_mask_vextract" # To.EltTypeName # "x" # To.NumElts # "_" # From.Size) From.RC:$src1, (iPTR imm:$idx), To.ImmAllZerosV, To.MRC:$mask), (!cast(NAME # To.EltSize # "x" # To.NumElts # From.ZSuffix # "rrkz") (COPY_TO_REGCLASS To.MRC:$mask, To.KRCWM), From.RC:$src1, imm:$idx)>; // Intrinsic call without masking. def : Pat<(!cast("int_x86_avx512_mask_vextract" # To.EltTypeName # "x" # To.NumElts # "_" # From.Size) From.RC:$src1, (iPTR imm:$idx), To.ImmAllZerosV, (i8 -1)), (!cast(NAME # To.EltSize # "x" # To.NumElts # From.ZSuffix # "rr") From.RC:$src1, imm:$idx)>; } // Codegen pattern for the alternative types multiclass vextract_for_size_lowering p> : vextract_for_size_first_position_lowering { let Predicates = p in def : Pat<(vextract_extract:$ext (From.VT From.RC:$src1), (iPTR imm)), (To.VT (!cast(InstrStr#"rr") From.RC:$src1, (EXTRACT_get_vextract_imm To.RC:$ext)))>; } multiclass vextract_for_type { defm NAME # "32x4Z" : vextract_for_size, X86VectorVTInfo< 4, EltVT32, VR128X>, vextract128_extract>, EVEX_V512, EVEX_CD8<32, CD8VT4>; defm NAME # "64x4Z" : vextract_for_size, X86VectorVTInfo< 4, EltVT64, VR256X>, vextract256_extract>, VEX_W, EVEX_V512, EVEX_CD8<64, CD8VT4>; let Predicates = [HasVLX] in defm NAME # "32x4Z256" : vextract_for_size, X86VectorVTInfo< 4, EltVT32, VR128X>, vextract128_extract>, EVEX_V256, EVEX_CD8<32, CD8VT4>; let Predicates = [HasVLX, HasDQI] in defm NAME # "64x2Z256" : vextract_for_size, X86VectorVTInfo< 2, EltVT64, VR128X>, vextract128_extract>, VEX_W, EVEX_V256, EVEX_CD8<64, CD8VT2>; let Predicates = [HasDQI] in { defm NAME # "64x2Z" : vextract_for_size, X86VectorVTInfo< 2, EltVT64, VR128X>, vextract128_extract>, VEX_W, EVEX_V512, EVEX_CD8<64, CD8VT2>; defm NAME # "32x8Z" : vextract_for_size, X86VectorVTInfo< 8, EltVT32, VR256X>, vextract256_extract>, EVEX_V512, EVEX_CD8<32, CD8VT8>; } } defm VEXTRACTF : vextract_for_type; defm VEXTRACTI : vextract_for_type; // extract_subvector codegen patterns with the alternative types. // Only add this if 64x2 and its friends are not supported natively via AVX512DQ. 
defm : vextract_for_size_lowering<"VEXTRACTF32x4Z", v8f64_info, v2f64x_info, vextract128_extract, EXTRACT_get_vextract128_imm, [HasAVX512, NoDQI]>; defm : vextract_for_size_lowering<"VEXTRACTI32x4Z", v8i64_info, v2i64x_info, vextract128_extract, EXTRACT_get_vextract128_imm, [HasAVX512, NoDQI]>; defm : vextract_for_size_lowering<"VEXTRACTF64x4Z", v16f32_info, v8f32x_info, vextract256_extract, EXTRACT_get_vextract256_imm, [HasAVX512, NoDQI]>; defm : vextract_for_size_lowering<"VEXTRACTI64x4Z", v16i32_info, v8i32x_info, vextract256_extract, EXTRACT_get_vextract256_imm, [HasAVX512, NoDQI]>; defm : vextract_for_size_lowering<"VEXTRACTF32x4Z256", v4f64x_info, v2f64x_info, vextract128_extract, EXTRACT_get_vextract128_imm, [HasVLX, NoDQI]>; defm : vextract_for_size_lowering<"VEXTRACTI32x4Z256", v4i64x_info, v2i64x_info, vextract128_extract, EXTRACT_get_vextract128_imm, [HasVLX, NoDQI]>; // Codegen pattern with the alternative types extract VEC128 from VEC512 defm : vextract_for_size_lowering<"VEXTRACTI32x4Z", v32i16_info, v8i16x_info, vextract128_extract, EXTRACT_get_vextract128_imm, [HasAVX512]>; defm : vextract_for_size_lowering<"VEXTRACTI32x4Z", v64i8_info, v16i8x_info, vextract128_extract, EXTRACT_get_vextract128_imm, [HasAVX512]>; // Codegen pattern with the alternative types extract VEC256 from VEC512 defm : vextract_for_size_lowering<"VEXTRACTI64x4Z", v32i16_info, v16i16x_info, vextract256_extract, EXTRACT_get_vextract256_imm, [HasAVX512]>; defm : vextract_for_size_lowering<"VEXTRACTI64x4Z", v64i8_info, v32i8x_info, vextract256_extract, EXTRACT_get_vextract256_imm, [HasAVX512]>; // A 128-bit subvector insert to the first 512-bit vector position // is a subregister copy that needs no instruction. def : Pat<(insert_subvector undef, (v2i64 VR128X:$src), (iPTR 0)), (INSERT_SUBREG (v8i64 (IMPLICIT_DEF)), (INSERT_SUBREG (v4i64 (IMPLICIT_DEF)), VR128X:$src, sub_xmm), sub_ymm)>; def : Pat<(insert_subvector undef, (v2f64 VR128X:$src), (iPTR 0)), (INSERT_SUBREG (v8f64 (IMPLICIT_DEF)), (INSERT_SUBREG (v4f64 (IMPLICIT_DEF)), VR128X:$src, sub_xmm), sub_ymm)>; def : Pat<(insert_subvector undef, (v4i32 VR128X:$src), (iPTR 0)), (INSERT_SUBREG (v16i32 (IMPLICIT_DEF)), (INSERT_SUBREG (v8i32 (IMPLICIT_DEF)), VR128X:$src, sub_xmm), sub_ymm)>; def : Pat<(insert_subvector undef, (v4f32 VR128X:$src), (iPTR 0)), (INSERT_SUBREG (v16f32 (IMPLICIT_DEF)), (INSERT_SUBREG (v8f32 (IMPLICIT_DEF)), VR128X:$src, sub_xmm), sub_ymm)>; def : Pat<(insert_subvector undef, (v4i64 VR256X:$src), (iPTR 0)), (INSERT_SUBREG (v8i64 (IMPLICIT_DEF)), VR256X:$src, sub_ymm)>; def : Pat<(insert_subvector undef, (v4f64 VR256X:$src), (iPTR 0)), (INSERT_SUBREG (v8f64 (IMPLICIT_DEF)), VR256X:$src, sub_ymm)>; def : Pat<(insert_subvector undef, (v8i32 VR256X:$src), (iPTR 0)), (INSERT_SUBREG (v16i32 (IMPLICIT_DEF)), VR256X:$src, sub_ymm)>; def : Pat<(insert_subvector undef, (v8f32 VR256X:$src), (iPTR 0)), (INSERT_SUBREG (v16f32 (IMPLICIT_DEF)), VR256X:$src, sub_ymm)>; def : Pat<(insert_subvector undef, (v16i16 VR256X:$src), (iPTR 0)), (INSERT_SUBREG (v32i16 (IMPLICIT_DEF)), VR256X:$src, sub_ymm)>; def : Pat<(insert_subvector undef, (v32i8 VR256X:$src), (iPTR 0)), (INSERT_SUBREG (v64i8 (IMPLICIT_DEF)), VR256X:$src, sub_ymm)>; // vextractps - extract 32 bits from XMM def VEXTRACTPSzrr : AVX512AIi8<0x17, MRMDestReg, (outs GR32:$dst), (ins VR128X:$src1, u8imm:$src2), "vextractps\t{$src2, $src1, $dst|$dst, $src1, $src2}", [(set GR32:$dst, (extractelt (bc_v4i32 (v4f32 VR128X:$src1)), imm:$src2))]>, EVEX; def VEXTRACTPSzmr : AVX512AIi8<0x17, MRMDestMem, 
(outs), (ins f32mem:$dst, VR128X:$src1, u8imm:$src2), "vextractps\t{$src2, $src1, $dst|$dst, $src1, $src2}", [(store (extractelt (bc_v4i32 (v4f32 VR128X:$src1)), imm:$src2), addr:$dst)]>, EVEX, EVEX_CD8<32, CD8VT1>; //===---------------------------------------------------------------------===// // AVX-512 BROADCAST //--- multiclass avx512_broadcast_rm opc, string OpcodeStr, X86VectorVTInfo DestInfo, X86VectorVTInfo SrcInfo> { defm r : AVX512_maskable, T8PD, EVEX; let mayLoad = 1 in defm m : AVX512_maskable, T8PD, EVEX, EVEX_CD8; } multiclass avx512_fp_broadcast_vl opc, string OpcodeStr, AVX512VLVectorVTInfo _> { defm Z : avx512_broadcast_rm, EVEX_V512; let Predicates = [HasVLX] in { defm Z256 : avx512_broadcast_rm, EVEX_V256; } } let ExeDomain = SSEPackedSingle in { defm VBROADCASTSS : avx512_fp_broadcast_vl<0x18, "vbroadcastss", avx512vl_f32_info>; let Predicates = [HasVLX] in { defm VBROADCASTSSZ128 : avx512_broadcast_rm<0x18, "vbroadcastss", v4f32x_info, v4f32x_info>, EVEX_V128; } } let ExeDomain = SSEPackedDouble in { defm VBROADCASTSD : avx512_fp_broadcast_vl<0x19, "vbroadcastsd", avx512vl_f64_info>, VEX_W; } // avx512_broadcast_pat introduces patterns for broadcast with a scalar argument. // Later, we can canonize broadcast instructions before ISel phase and // eliminate additional patterns on ISel. // SrcRC_v and SrcRC_s are RegisterClasses for vector and scalar // representations of source multiclass avx512_broadcast_pat { def : Pat<(_.VT (OpNode (_.EltVT SrcRC_s:$src))), (!cast(InstName##"r") (COPY_TO_REGCLASS SrcRC_s:$src, SrcRC_v))>; let AddedComplexity = 30 in { def : Pat<(_.VT (vselect _.KRCWM:$mask, (OpNode (_.EltVT SrcRC_s:$src)), _.RC:$src0)), (!cast(InstName##"rk") _.RC:$src0, _.KRCWM:$mask, (COPY_TO_REGCLASS SrcRC_s:$src, SrcRC_v))>; def : Pat<(_.VT(vselect _.KRCWM:$mask, (OpNode (_.EltVT SrcRC_s:$src)), _.ImmAllZerosV)), (!cast(InstName##"rkz") _.KRCWM:$mask, (COPY_TO_REGCLASS SrcRC_s:$src, SrcRC_v))>; } } defm : avx512_broadcast_pat<"VBROADCASTSSZ", X86VBroadcast, v16f32_info, VR128X, FR32X>; defm : avx512_broadcast_pat<"VBROADCASTSDZ", X86VBroadcast, v8f64_info, VR128X, FR64X>; let Predicates = [HasVLX] in { defm : avx512_broadcast_pat<"VBROADCASTSSZ256", X86VBroadcast, v8f32x_info, VR128X, FR32X>; defm : avx512_broadcast_pat<"VBROADCASTSSZ128", X86VBroadcast, v4f32x_info, VR128X, FR32X>; defm : avx512_broadcast_pat<"VBROADCASTSDZ256", X86VBroadcast, v4f64x_info, VR128X, FR64X>; } def : Pat<(v16f32 (X86VBroadcast (loadf32 addr:$src))), (VBROADCASTSSZm addr:$src)>; def : Pat<(v8f64 (X86VBroadcast (loadf64 addr:$src))), (VBROADCASTSDZm addr:$src)>; def : Pat<(int_x86_avx512_vbroadcast_ss_512 addr:$src), (VBROADCASTSSZm addr:$src)>; def : Pat<(int_x86_avx512_vbroadcast_sd_512 addr:$src), (VBROADCASTSDZm addr:$src)>; multiclass avx512_int_broadcast_reg opc, X86VectorVTInfo _, RegisterClass SrcRC> { defm r : AVX512_maskable_in_asm, T8PD, EVEX; } multiclass avx512_int_broadcast_reg_vl opc, AVX512VLVectorVTInfo _, RegisterClass SrcRC, Predicate prd> { let Predicates = [prd] in defm Z : avx512_int_broadcast_reg, EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : avx512_int_broadcast_reg, EVEX_V256; defm Z128 : avx512_int_broadcast_reg, EVEX_V128; } } defm VPBROADCASTBr : avx512_int_broadcast_reg_vl<0x7A, avx512vl_i8_info, GR32, HasBWI>; defm VPBROADCASTWr : avx512_int_broadcast_reg_vl<0x7B, avx512vl_i16_info, GR32, HasBWI>; defm VPBROADCASTDr : avx512_int_broadcast_reg_vl<0x7C, avx512vl_i32_info, GR32, HasAVX512>; defm VPBROADCASTQr : avx512_int_broadcast_reg_vl<0x7C, 
avx512vl_i64_info, GR64, HasAVX512>, VEX_W; def : Pat <(v16i32 (X86vzext VK16WM:$mask)), (VPBROADCASTDrZrkz VK16WM:$mask, (i32 (MOV32ri 0x1)))>; def : Pat <(v8i64 (X86vzext VK8WM:$mask)), (VPBROADCASTQrZrkz VK8WM:$mask, (i64 (MOV64ri 0x1)))>; def : Pat<(v16i32 (X86VBroadcast (i32 GR32:$src))), (VPBROADCASTDrZr GR32:$src)>; def : Pat<(v8i64 (X86VBroadcast (i64 GR64:$src))), (VPBROADCASTQrZr GR64:$src)>; def : Pat<(v16i32 (int_x86_avx512_pbroadcastd_i32_512 (i32 GR32:$src))), (VPBROADCASTDrZr GR32:$src)>; def : Pat<(v8i64 (int_x86_avx512_pbroadcastq_i64_512 (i64 GR64:$src))), (VPBROADCASTQrZr GR64:$src)>; def : Pat<(v16i32 (int_x86_avx512_mask_pbroadcast_d_gpr_512 (i32 GR32:$src), (v16i32 immAllZerosV), (i16 GR16:$mask))), (VPBROADCASTDrZrkz (COPY_TO_REGCLASS GR16:$mask, VK16WM), GR32:$src)>; def : Pat<(v8i64 (int_x86_avx512_mask_pbroadcast_q_gpr_512 (i64 GR64:$src), (bc_v8i64 (v16i32 immAllZerosV)), (i8 GR8:$mask))), (VPBROADCASTQrZrkz (COPY_TO_REGCLASS GR8:$mask, VK8WM), GR64:$src)>; // Provide aliases for broadcast from the same register class that // automatically does the extract. multiclass avx512_int_broadcast_rm_lowering { def : Pat<(DestInfo.VT (X86VBroadcast (SrcInfo.VT SrcInfo.RC:$src))), (!cast(NAME#DestInfo.ZSuffix#"r") (EXTRACT_SUBREG (SrcInfo.VT SrcInfo.RC:$src), sub_xmm))>; } multiclass avx512_int_broadcast_rm_vl opc, string OpcodeStr, AVX512VLVectorVTInfo _, Predicate prd> { let Predicates = [prd] in { defm Z : avx512_broadcast_rm, avx512_int_broadcast_rm_lowering<_.info512, _.info256>, EVEX_V512; // Defined separately to avoid redefinition. defm Z_Alt : avx512_int_broadcast_rm_lowering<_.info512, _.info512>; } let Predicates = [prd, HasVLX] in { defm Z256 : avx512_broadcast_rm, avx512_int_broadcast_rm_lowering<_.info256, _.info256>, EVEX_V256; defm Z128 : avx512_broadcast_rm, EVEX_V128; } } defm VPBROADCASTB : avx512_int_broadcast_rm_vl<0x78, "vpbroadcastb", avx512vl_i8_info, HasBWI>; defm VPBROADCASTW : avx512_int_broadcast_rm_vl<0x79, "vpbroadcastw", avx512vl_i16_info, HasBWI>; defm VPBROADCASTD : avx512_int_broadcast_rm_vl<0x58, "vpbroadcastd", avx512vl_i32_info, HasAVX512>; defm VPBROADCASTQ : avx512_int_broadcast_rm_vl<0x59, "vpbroadcastq", avx512vl_i64_info, HasAVX512>, VEX_W; multiclass avx512_subvec_broadcast_rm opc, string OpcodeStr, X86VectorVTInfo _Dst, X86VectorVTInfo _Src> { let mayLoad = 1 in defm rm : AVX512_maskable, AVX5128IBase, EVEX; } defm VBROADCASTI32X4 : avx512_subvec_broadcast_rm<0x5a, "vbroadcasti32x4", v16i32_info, v4i32x_info>, EVEX_V512, EVEX_CD8<32, CD8VT4>; defm VBROADCASTF32X4 : avx512_subvec_broadcast_rm<0x1a, "vbroadcastf32x4", v16f32_info, v4f32x_info>, EVEX_V512, EVEX_CD8<32, CD8VT4>; defm VBROADCASTI64X4 : avx512_subvec_broadcast_rm<0x5b, "vbroadcasti64x4", v8i64_info, v4i64x_info>, VEX_W, EVEX_V512, EVEX_CD8<64, CD8VT4>; defm VBROADCASTF64X4 : avx512_subvec_broadcast_rm<0x1b, "vbroadcastf64x4", v8f64_info, v4f64x_info>, VEX_W, EVEX_V512, EVEX_CD8<64, CD8VT4>; let Predicates = [HasVLX] in { defm VBROADCASTI32X4Z256 : avx512_subvec_broadcast_rm<0x5a, "vbroadcasti32x4", v8i32x_info, v4i32x_info>, EVEX_V256, EVEX_CD8<32, CD8VT4>; defm VBROADCASTF32X4Z256 : avx512_subvec_broadcast_rm<0x1a, "vbroadcastf32x4", v8f32x_info, v4f32x_info>, EVEX_V256, EVEX_CD8<32, CD8VT4>; } let Predicates = [HasVLX, HasDQI] in { defm VBROADCASTI64X2Z128 : avx512_subvec_broadcast_rm<0x5a, "vbroadcasti64x2", v4i64x_info, v2i64x_info>, VEX_W, EVEX_V256, EVEX_CD8<64, CD8VT2>; defm VBROADCASTF64X2Z128 : avx512_subvec_broadcast_rm<0x1a, "vbroadcastf64x2", v4f64x_info, 
v2f64x_info>, VEX_W, EVEX_V256, EVEX_CD8<64, CD8VT2>; } let Predicates = [HasDQI] in { defm VBROADCASTI64X2 : avx512_subvec_broadcast_rm<0x5a, "vbroadcasti64x2", v8i64_info, v2i64x_info>, VEX_W, EVEX_V512, EVEX_CD8<64, CD8VT2>; defm VBROADCASTI32X8 : avx512_subvec_broadcast_rm<0x5b, "vbroadcasti32x8", v16i32_info, v8i32x_info>, EVEX_V512, EVEX_CD8<32, CD8VT8>; defm VBROADCASTF64X2 : avx512_subvec_broadcast_rm<0x1a, "vbroadcastf64x2", v8f64_info, v2f64x_info>, VEX_W, EVEX_V512, EVEX_CD8<64, CD8VT2>; defm VBROADCASTF32X8 : avx512_subvec_broadcast_rm<0x1b, "vbroadcastf32x8", v16f32_info, v8f32x_info>, EVEX_V512, EVEX_CD8<32, CD8VT8>; } multiclass avx512_broadcast_32x2 opc, string OpcodeStr, X86VectorVTInfo _Dst, X86VectorVTInfo _Src, SDNode OpNode = X86SubVBroadcast> { defm r : AVX512_maskable, T8PD, EVEX; let mayLoad = 1 in defm m : AVX512_maskable, T8PD, EVEX, EVEX_CD8<_Src.EltSize, CD8VT2>; } multiclass avx512_common_broadcast_32x2 opc, string OpcodeStr, AVX512VLVectorVTInfo _> { let Predicates = [HasDQI] in defm Z : avx512_broadcast_32x2, EVEX_V512; let Predicates = [HasDQI, HasVLX] in defm Z256 : avx512_broadcast_32x2, EVEX_V256; } multiclass avx512_common_broadcast_i32x2 opc, string OpcodeStr, AVX512VLVectorVTInfo _> : avx512_common_broadcast_32x2 { let Predicates = [HasDQI, HasVLX] in defm Z128 : avx512_broadcast_32x2, EVEX_V128; } defm VPBROADCASTI32X2 : avx512_common_broadcast_i32x2<0x59, "vbroadcasti32x2", avx512vl_i32_info>; defm VPBROADCASTF32X2 : avx512_common_broadcast_32x2<0x19, "vbroadcastf32x2", avx512vl_f32_info>; def : Pat<(v16f32 (X86VBroadcast (v16f32 VR512:$src))), (VBROADCASTSSZr (EXTRACT_SUBREG (v16f32 VR512:$src), sub_xmm))>; def : Pat<(v16f32 (X86VBroadcast (v8f32 VR256X:$src))), (VBROADCASTSSZr (EXTRACT_SUBREG (v8f32 VR256X:$src), sub_xmm))>; def : Pat<(v8f64 (X86VBroadcast (v8f64 VR512:$src))), (VBROADCASTSDZr (EXTRACT_SUBREG (v8f64 VR512:$src), sub_xmm))>; def : Pat<(v8f64 (X86VBroadcast (v4f64 VR256X:$src))), (VBROADCASTSDZr (EXTRACT_SUBREG (v4f64 VR256X:$src), sub_xmm))>; // Provide fallback in case the load node that is used in the patterns above // is used by additional users, which prevents the pattern selection. 
def : Pat<(v16f32 (X86VBroadcast FR32X:$src)), (VBROADCASTSSZr (COPY_TO_REGCLASS FR32X:$src, VR128X))>; def : Pat<(v8f64 (X86VBroadcast FR64X:$src)), (VBROADCASTSDZr (COPY_TO_REGCLASS FR64X:$src, VR128X))>; //===----------------------------------------------------------------------===// // AVX-512 BROADCAST MASK TO VECTOR REGISTER //--- multiclass avx512_mask_broadcastm opc, string OpcodeStr, X86VectorVTInfo _, RegisterClass KRC> { def rr : AVX512XS8I, EVEX; } multiclass avx512_mask_broadcast opc, string OpcodeStr, AVX512VLVectorVTInfo VTInfo, RegisterClass KRC> { let Predicates = [HasCDI] in defm Z : avx512_mask_broadcastm, EVEX_V512; let Predicates = [HasCDI, HasVLX] in { defm Z256 : avx512_mask_broadcastm, EVEX_V256; defm Z128 : avx512_mask_broadcastm, EVEX_V128; } } defm VPBROADCASTMW2D : avx512_mask_broadcast<0x3A, "vpbroadcastmw2d", avx512vl_i32_info, VK16>; defm VPBROADCASTMB2Q : avx512_mask_broadcast<0x2A, "vpbroadcastmb2q", avx512vl_i64_info, VK8>, VEX_W; //===----------------------------------------------------------------------===// // -- VPERMI2 - 3 source operands form -- multiclass avx512_perm_i opc, string OpcodeStr, X86VectorVTInfo _, X86VectorVTInfo IdxVT> { let Constraints = "$src1 = $dst" in { defm rr: AVX512_maskable_3src_cast, EVEX_4V, AVX5128IBase; let mayLoad = 1 in defm rm: AVX512_maskable_3src_cast, EVEX_4V, AVX5128IBase; } } multiclass avx512_perm_i_mb opc, string OpcodeStr, X86VectorVTInfo _, X86VectorVTInfo IdxVT> { let mayLoad = 1, Constraints = "$src1 = $dst" in defm rmb: AVX512_maskable_3src_cast, AVX5128IBase, EVEX_4V, EVEX_B; } multiclass avx512_perm_i_sizes opc, string OpcodeStr, AVX512VLVectorVTInfo VTInfo, AVX512VLVectorVTInfo ShuffleMask> { defm NAME: avx512_perm_i, avx512_perm_i_mb, EVEX_V512; let Predicates = [HasVLX] in { defm NAME#128: avx512_perm_i, avx512_perm_i_mb, EVEX_V128; defm NAME#256: avx512_perm_i, avx512_perm_i_mb, EVEX_V256; } } multiclass avx512_perm_i_sizes_w opc, string OpcodeStr, AVX512VLVectorVTInfo VTInfo, AVX512VLVectorVTInfo Idx> { let Predicates = [HasBWI] in defm NAME: avx512_perm_i, EVEX_V512; let Predicates = [HasBWI, HasVLX] in { defm NAME#128: avx512_perm_i, EVEX_V128; defm NAME#256: avx512_perm_i, EVEX_V256; } } defm VPERMI2D : avx512_perm_i_sizes<0x76, "vpermi2d", avx512vl_i32_info, avx512vl_i32_info>, EVEX_CD8<32, CD8VF>; defm VPERMI2Q : avx512_perm_i_sizes<0x76, "vpermi2q", avx512vl_i64_info, avx512vl_i64_info>, VEX_W, EVEX_CD8<64, CD8VF>; defm VPERMI2W : avx512_perm_i_sizes_w<0x75, "vpermi2w", avx512vl_i16_info, avx512vl_i16_info>, VEX_W, EVEX_CD8<16, CD8VF>; defm VPERMI2PS : avx512_perm_i_sizes<0x77, "vpermi2ps", avx512vl_f32_info, avx512vl_i32_info>, EVEX_CD8<32, CD8VF>; defm VPERMI2PD : avx512_perm_i_sizes<0x77, "vpermi2pd", avx512vl_f64_info, avx512vl_i64_info>, VEX_W, EVEX_CD8<64, CD8VF>; // VPERMT2 multiclass avx512_perm_t opc, string OpcodeStr, X86VectorVTInfo _, X86VectorVTInfo IdxVT> { let Constraints = "$src1 = $dst" in { defm rr: AVX512_maskable_3src, EVEX_4V, AVX5128IBase; let mayLoad = 1 in defm rm: AVX512_maskable_3src, EVEX_4V, AVX5128IBase; } } multiclass avx512_perm_t_mb opc, string OpcodeStr, X86VectorVTInfo _, X86VectorVTInfo IdxVT> { let mayLoad = 1, Constraints = "$src1 = $dst" in defm rmb: AVX512_maskable_3src, AVX5128IBase, EVEX_4V, EVEX_B; } multiclass avx512_perm_t_sizes opc, string OpcodeStr, AVX512VLVectorVTInfo VTInfo, AVX512VLVectorVTInfo ShuffleMask> { defm NAME: avx512_perm_t, avx512_perm_t_mb, EVEX_V512; let Predicates = [HasVLX] in { defm NAME#128: avx512_perm_t, avx512_perm_t_mb, 
EVEX_V128; defm NAME#256: avx512_perm_t, avx512_perm_t_mb, EVEX_V256; } } multiclass avx512_perm_t_sizes_w opc, string OpcodeStr, AVX512VLVectorVTInfo VTInfo, AVX512VLVectorVTInfo Idx> { let Predicates = [HasBWI] in defm NAME: avx512_perm_t, EVEX_V512; let Predicates = [HasBWI, HasVLX] in { defm NAME#128: avx512_perm_t, EVEX_V128; defm NAME#256: avx512_perm_t, EVEX_V256; } } defm VPERMT2D : avx512_perm_t_sizes<0x7E, "vpermt2d", avx512vl_i32_info, avx512vl_i32_info>, EVEX_CD8<32, CD8VF>; defm VPERMT2Q : avx512_perm_t_sizes<0x7E, "vpermt2q", avx512vl_i64_info, avx512vl_i64_info>, VEX_W, EVEX_CD8<64, CD8VF>; defm VPERMT2W : avx512_perm_t_sizes_w<0x7D, "vpermt2w", avx512vl_i16_info, avx512vl_i16_info>, VEX_W, EVEX_CD8<16, CD8VF>; defm VPERMT2PS : avx512_perm_t_sizes<0x7F, "vpermt2ps", avx512vl_f32_info, avx512vl_i32_info>, EVEX_CD8<32, CD8VF>; defm VPERMT2PD : avx512_perm_t_sizes<0x7F, "vpermt2pd", avx512vl_f64_info, avx512vl_i64_info>, VEX_W, EVEX_CD8<64, CD8VF>; //===----------------------------------------------------------------------===// // AVX-512 - BLEND using mask // multiclass avx512_blendmask opc, string OpcodeStr, X86VectorVTInfo _> { let ExeDomain = _.ExeDomain in { def rr : AVX5128I, EVEX_4V; def rrk : AVX5128I, EVEX_4V, EVEX_K; def rrkz : AVX5128I, EVEX_4V, EVEX_KZ; let mayLoad = 1 in { def rm : AVX5128I, EVEX_4V, EVEX_CD8<_.EltSize, CD8VF>; def rmk : AVX5128I, EVEX_4V, EVEX_K, EVEX_CD8<_.EltSize, CD8VF>; def rmkz : AVX5128I, EVEX_4V, EVEX_KZ, EVEX_CD8<_.EltSize, CD8VF>; } } } multiclass avx512_blendmask_rmb opc, string OpcodeStr, X86VectorVTInfo _> { def rmbk : AVX5128I, EVEX_4V, EVEX_K, EVEX_B, EVEX_CD8<_.EltSize, CD8VF>; def rmb : AVX5128I, EVEX_4V, EVEX_B, EVEX_CD8<_.EltSize, CD8VF>; } multiclass blendmask_dq opc, string OpcodeStr, AVX512VLVectorVTInfo VTInfo> { defm Z : avx512_blendmask , avx512_blendmask_rmb , EVEX_V512; let Predicates = [HasVLX] in { defm Z256 : avx512_blendmask, avx512_blendmask_rmb , EVEX_V256; defm Z128 : avx512_blendmask, avx512_blendmask_rmb , EVEX_V128; } } multiclass blendmask_bw opc, string OpcodeStr, AVX512VLVectorVTInfo VTInfo> { let Predicates = [HasBWI] in defm Z : avx512_blendmask , EVEX_V512; let Predicates = [HasBWI, HasVLX] in { defm Z256 : avx512_blendmask , EVEX_V256; defm Z128 : avx512_blendmask , EVEX_V128; } } defm VBLENDMPS : blendmask_dq <0x65, "vblendmps", avx512vl_f32_info>; defm VBLENDMPD : blendmask_dq <0x65, "vblendmpd", avx512vl_f64_info>, VEX_W; defm VPBLENDMD : blendmask_dq <0x64, "vpblendmd", avx512vl_i32_info>; defm VPBLENDMQ : blendmask_dq <0x64, "vpblendmq", avx512vl_i64_info>, VEX_W; defm VPBLENDMB : blendmask_bw <0x66, "vpblendmb", avx512vl_i8_info>; defm VPBLENDMW : blendmask_bw <0x66, "vpblendmw", avx512vl_i16_info>, VEX_W; let Predicates = [HasAVX512] in { def : Pat<(v8f32 (vselect (v8i1 VK8WM:$mask), (v8f32 VR256X:$src1), (v8f32 VR256X:$src2))), (EXTRACT_SUBREG (v16f32 (VBLENDMPSZrrk (COPY_TO_REGCLASS VK8WM:$mask, VK16WM), (v16f32 (SUBREG_TO_REG (i32 0), VR256X:$src2, sub_ymm)), (v16f32 (SUBREG_TO_REG (i32 0), VR256X:$src1, sub_ymm)))), sub_ymm)>; def : Pat<(v8i32 (vselect (v8i1 VK8WM:$mask), (v8i32 VR256X:$src1), (v8i32 VR256X:$src2))), (EXTRACT_SUBREG (v16i32 (VPBLENDMDZrrk (COPY_TO_REGCLASS VK8WM:$mask, VK16WM), (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src2, sub_ymm)), (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src1, sub_ymm)))), sub_ymm)>; } //===----------------------------------------------------------------------===// // Compare Instructions 
//===----------------------------------------------------------------------===// // avx512_cmp_scalar - AVX512 CMPSS and CMPSD multiclass avx512_cmp_scalar{ defm rr_Int : AVX512_maskable_cmp<0xC2, MRMSrcReg, _, (outs _.KRC:$dst), (ins _.RC:$src1, _.RC:$src2, AVXCC:$cc), "vcmp${cc}"#_.Suffix, "$src2, $src1", "$src1, $src2", (OpNode (_.VT _.RC:$src1), (_.VT _.RC:$src2), imm:$cc)>, EVEX_4V; let mayLoad = 1 in defm rm_Int : AVX512_maskable_cmp<0xC2, MRMSrcMem, _, (outs _.KRC:$dst), (ins _.RC:$src1, _.MemOp:$src2, AVXCC:$cc), "vcmp${cc}"#_.Suffix, "$src2, $src1", "$src1, $src2", (OpNode (_.VT _.RC:$src1), (_.VT (scalar_to_vector (_.ScalarLdFrag addr:$src2))), imm:$cc)>, EVEX_4V, EVEX_CD8<_.EltSize, CD8VT1>; defm rrb_Int : AVX512_maskable_cmp<0xC2, MRMSrcReg, _, (outs _.KRC:$dst), (ins _.RC:$src1, _.RC:$src2, AVXCC:$cc), "vcmp${cc}"#_.Suffix, "{sae}, $src2, $src1", "$src1, $src2, {sae}", (OpNodeRnd (_.VT _.RC:$src1), (_.VT _.RC:$src2), imm:$cc, (i32 FROUND_NO_EXC))>, EVEX_4V, EVEX_B; // Accept explicit immediate argument form instead of comparison code. let isAsmParserOnly = 1, hasSideEffects = 0 in { defm rri_alt : AVX512_maskable_cmp_alt<0xC2, MRMSrcReg, _, (outs VK1:$dst), (ins _.RC:$src1, _.RC:$src2, u8imm:$cc), "vcmp"#_.Suffix, "$cc, $src2, $src1", "$src1, $src2, $cc">, EVEX_4V; defm rmi_alt : AVX512_maskable_cmp_alt<0xC2, MRMSrcMem, _, (outs _.KRC:$dst), (ins _.RC:$src1, _.MemOp:$src2, u8imm:$cc), "vcmp"#_.Suffix, "$cc, $src2, $src1", "$src1, $src2, $cc">, EVEX_4V, EVEX_CD8<_.EltSize, CD8VT1>; defm rrb_alt : AVX512_maskable_cmp_alt<0xC2, MRMSrcReg, _, (outs _.KRC:$dst), (ins _.RC:$src1, _.RC:$src2, u8imm:$cc), "vcmp"#_.Suffix, "$cc, {sae}, $src2, $src1","$src1, $src2, {sae}, $cc">, EVEX_4V, EVEX_B; }// let isAsmParserOnly = 1, hasSideEffects = 0 let isCodeGenOnly = 1 in { def rr : AVX512Ii8<0xC2, MRMSrcReg, (outs _.KRC:$dst), (ins _.FRC:$src1, _.FRC:$src2, AVXCC:$cc), !strconcat("vcmp${cc}", _.Suffix, "\t{$src2, $src1, $dst|$dst, $src1, $src2}"), [(set _.KRC:$dst, (OpNode _.FRC:$src1, _.FRC:$src2, imm:$cc))], IIC_SSE_ALU_F32S_RR>, EVEX_4V; let mayLoad = 1 in def rm : AVX512Ii8<0xC2, MRMSrcMem, (outs _.KRC:$dst), (ins _.FRC:$src1, _.ScalarMemOp:$src2, AVXCC:$cc), !strconcat("vcmp${cc}", _.Suffix, "\t{$src2, $src1, $dst|$dst, $src1, $src2}"), [(set _.KRC:$dst, (OpNode _.FRC:$src1, (_.ScalarLdFrag addr:$src2), imm:$cc))], IIC_SSE_ALU_F32P_RM>, EVEX_4V, EVEX_CD8<_.EltSize, CD8VT1>; } } let Predicates = [HasAVX512] in { defm VCMPSSZ : avx512_cmp_scalar, AVX512XSIi8Base; defm VCMPSDZ : avx512_cmp_scalar, AVX512XDIi8Base, VEX_W; } multiclass avx512_icmp_packed opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { def rr : AVX512BI, EVEX_4V; let mayLoad = 1 in def rm : AVX512BI, EVEX_4V; def rrk : AVX512BI, EVEX_4V, EVEX_K; let mayLoad = 1 in def rmk : AVX512BI, EVEX_4V, EVEX_K; } multiclass avx512_icmp_packed_rmb opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> : avx512_icmp_packed { let mayLoad = 1 in { def rmb : AVX512BI, EVEX_4V, EVEX_B; def rmbk : AVX512BI, EVEX_4V, EVEX_K, EVEX_B; } } multiclass avx512_icmp_packed_vl opc, string OpcodeStr, SDNode OpNode, AVX512VLVectorVTInfo VTInfo, Predicate prd> { let Predicates = [prd] in defm Z : avx512_icmp_packed, EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : avx512_icmp_packed, EVEX_V256; defm Z128 : avx512_icmp_packed, EVEX_V128; } } multiclass avx512_icmp_packed_rmb_vl opc, string OpcodeStr, SDNode OpNode, AVX512VLVectorVTInfo VTInfo, Predicate prd> { let Predicates = [prd] in defm Z : avx512_icmp_packed_rmb, EVEX_V512; let 
Predicates = [prd, HasVLX] in { defm Z256 : avx512_icmp_packed_rmb, EVEX_V256; defm Z128 : avx512_icmp_packed_rmb, EVEX_V128; } } defm VPCMPEQB : avx512_icmp_packed_vl<0x74, "vpcmpeqb", X86pcmpeqm, avx512vl_i8_info, HasBWI>, EVEX_CD8<8, CD8VF>; defm VPCMPEQW : avx512_icmp_packed_vl<0x75, "vpcmpeqw", X86pcmpeqm, avx512vl_i16_info, HasBWI>, EVEX_CD8<16, CD8VF>; defm VPCMPEQD : avx512_icmp_packed_rmb_vl<0x76, "vpcmpeqd", X86pcmpeqm, avx512vl_i32_info, HasAVX512>, EVEX_CD8<32, CD8VF>; defm VPCMPEQQ : avx512_icmp_packed_rmb_vl<0x29, "vpcmpeqq", X86pcmpeqm, avx512vl_i64_info, HasAVX512>, T8PD, VEX_W, EVEX_CD8<64, CD8VF>; defm VPCMPGTB : avx512_icmp_packed_vl<0x64, "vpcmpgtb", X86pcmpgtm, avx512vl_i8_info, HasBWI>, EVEX_CD8<8, CD8VF>; defm VPCMPGTW : avx512_icmp_packed_vl<0x65, "vpcmpgtw", X86pcmpgtm, avx512vl_i16_info, HasBWI>, EVEX_CD8<16, CD8VF>; defm VPCMPGTD : avx512_icmp_packed_rmb_vl<0x66, "vpcmpgtd", X86pcmpgtm, avx512vl_i32_info, HasAVX512>, EVEX_CD8<32, CD8VF>; defm VPCMPGTQ : avx512_icmp_packed_rmb_vl<0x37, "vpcmpgtq", X86pcmpgtm, avx512vl_i64_info, HasAVX512>, T8PD, VEX_W, EVEX_CD8<64, CD8VF>; def : Pat<(v8i1 (X86pcmpgtm (v8i32 VR256X:$src1), (v8i32 VR256X:$src2))), (COPY_TO_REGCLASS (VPCMPGTDZrr (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src1, sub_ymm)), (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src2, sub_ymm))), VK8)>; def : Pat<(v8i1 (X86pcmpeqm (v8i32 VR256X:$src1), (v8i32 VR256X:$src2))), (COPY_TO_REGCLASS (VPCMPEQDZrr (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src1, sub_ymm)), (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src2, sub_ymm))), VK8)>; multiclass avx512_icmp_cc opc, string Suffix, SDNode OpNode, X86VectorVTInfo _> { def rri : AVX512AIi8, EVEX_4V; let mayLoad = 1 in def rmi : AVX512AIi8, EVEX_4V; def rrik : AVX512AIi8, EVEX_4V, EVEX_K; let mayLoad = 1 in def rmik : AVX512AIi8, EVEX_4V, EVEX_K; // Accept explicit immediate argument form instead of comparison code. let isAsmParserOnly = 1, hasSideEffects = 0 in { def rri_alt : AVX512AIi8, EVEX_4V; let mayLoad = 1 in def rmi_alt : AVX512AIi8, EVEX_4V; def rrik_alt : AVX512AIi8, EVEX_4V, EVEX_K; let mayLoad = 1 in def rmik_alt : AVX512AIi8, EVEX_4V, EVEX_K; } } multiclass avx512_icmp_cc_rmb opc, string Suffix, SDNode OpNode, X86VectorVTInfo _> : avx512_icmp_cc { def rmib : AVX512AIi8, EVEX_4V, EVEX_B; def rmibk : AVX512AIi8, EVEX_4V, EVEX_K, EVEX_B; // Accept explicit immediate argument form instead of comparison code. 
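// For illustration: comparison predicate 2 is "less-than-or-equal", so the
// assembler accepts both of these AT&T-syntax spellings for the same
// encoding, and the *_alt forms below provide the explicit-immediate one:
//   vpcmpled %zmm1, %zmm0, %k1
//   vpcmpd   $2, %zmm1, %zmm0, %k1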
let isAsmParserOnly = 1, hasSideEffects = 0, mayLoad = 1 in { def rmib_alt : AVX512AIi8, EVEX_4V, EVEX_B; def rmibk_alt : AVX512AIi8, EVEX_4V, EVEX_K, EVEX_B; } } multiclass avx512_icmp_cc_vl opc, string Suffix, SDNode OpNode, AVX512VLVectorVTInfo VTInfo, Predicate prd> { let Predicates = [prd] in defm Z : avx512_icmp_cc, EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : avx512_icmp_cc, EVEX_V256; defm Z128 : avx512_icmp_cc, EVEX_V128; } } multiclass avx512_icmp_cc_rmb_vl opc, string Suffix, SDNode OpNode, AVX512VLVectorVTInfo VTInfo, Predicate prd> { let Predicates = [prd] in defm Z : avx512_icmp_cc_rmb, EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : avx512_icmp_cc_rmb, EVEX_V256; defm Z128 : avx512_icmp_cc_rmb, EVEX_V128; } } defm VPCMPB : avx512_icmp_cc_vl<0x3F, "b", X86cmpm, avx512vl_i8_info, HasBWI>, EVEX_CD8<8, CD8VF>; defm VPCMPUB : avx512_icmp_cc_vl<0x3E, "ub", X86cmpmu, avx512vl_i8_info, HasBWI>, EVEX_CD8<8, CD8VF>; defm VPCMPW : avx512_icmp_cc_vl<0x3F, "w", X86cmpm, avx512vl_i16_info, HasBWI>, VEX_W, EVEX_CD8<16, CD8VF>; defm VPCMPUW : avx512_icmp_cc_vl<0x3E, "uw", X86cmpmu, avx512vl_i16_info, HasBWI>, VEX_W, EVEX_CD8<16, CD8VF>; defm VPCMPD : avx512_icmp_cc_rmb_vl<0x1F, "d", X86cmpm, avx512vl_i32_info, HasAVX512>, EVEX_CD8<32, CD8VF>; defm VPCMPUD : avx512_icmp_cc_rmb_vl<0x1E, "ud", X86cmpmu, avx512vl_i32_info, HasAVX512>, EVEX_CD8<32, CD8VF>; defm VPCMPQ : avx512_icmp_cc_rmb_vl<0x1F, "q", X86cmpm, avx512vl_i64_info, HasAVX512>, VEX_W, EVEX_CD8<64, CD8VF>; defm VPCMPUQ : avx512_icmp_cc_rmb_vl<0x1E, "uq", X86cmpmu, avx512vl_i64_info, HasAVX512>, VEX_W, EVEX_CD8<64, CD8VF>; multiclass avx512_vcmp_common { defm rri : AVX512_maskable_cmp<0xC2, MRMSrcReg, _, (outs _.KRC:$dst), (ins _.RC:$src1, _.RC:$src2,AVXCC:$cc), "vcmp${cc}"#_.Suffix, "$src2, $src1", "$src1, $src2", (X86cmpm (_.VT _.RC:$src1), (_.VT _.RC:$src2), imm:$cc)>; let mayLoad = 1 in { defm rmi : AVX512_maskable_cmp<0xC2, MRMSrcMem, _, (outs _.KRC:$dst),(ins _.RC:$src1, _.MemOp:$src2, AVXCC:$cc), "vcmp${cc}"#_.Suffix, "$src2, $src1", "$src1, $src2", (X86cmpm (_.VT _.RC:$src1), (_.VT (bitconvert (_.LdFrag addr:$src2))), imm:$cc)>; defm rmbi : AVX512_maskable_cmp<0xC2, MRMSrcMem, _, (outs _.KRC:$dst), (ins _.RC:$src1, _.ScalarMemOp:$src2, AVXCC:$cc), "vcmp${cc}"#_.Suffix, "${src2}"##_.BroadcastStr##", $src1", "$src1, ${src2}"##_.BroadcastStr, (X86cmpm (_.VT _.RC:$src1), (_.VT (X86VBroadcast(_.ScalarLdFrag addr:$src2))), imm:$cc)>,EVEX_B; } // Accept explicit immediate argument form instead of comparison code. let isAsmParserOnly = 1, hasSideEffects = 0 in { defm rri_alt : AVX512_maskable_cmp_alt<0xC2, MRMSrcReg, _, (outs _.KRC:$dst), (ins _.RC:$src1, _.RC:$src2, u8imm:$cc), "vcmp"#_.Suffix, "$cc, $src2, $src1", "$src1, $src2, $cc">; let mayLoad = 1 in { defm rmi_alt : AVX512_maskable_cmp_alt<0xC2, MRMSrcMem, _, (outs _.KRC:$dst), (ins _.RC:$src1, _.MemOp:$src2, u8imm:$cc), "vcmp"#_.Suffix, "$cc, $src2, $src1", "$src1, $src2, $cc">; defm rmbi_alt : AVX512_maskable_cmp_alt<0xC2, MRMSrcMem, _, (outs _.KRC:$dst), (ins _.RC:$src1, _.ScalarMemOp:$src2, u8imm:$cc), "vcmp"#_.Suffix, "$cc, ${src2}"##_.BroadcastStr##", $src1", "$src1, ${src2}"##_.BroadcastStr##", $cc">,EVEX_B; } } } multiclass avx512_vcmp_sae { // comparison code form (VCMP[EQ/LT/LE/...] 
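// Note: {sae} ("suppress all exceptions") is only encodable on the 512-bit
// register-register form, e.g. in AT&T syntax:
//   vcmpleps {sae}, %zmm1, %zmm0, %k1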
defm rrib : AVX512_maskable_cmp<0xC2, MRMSrcReg, _, (outs _.KRC:$dst),(ins _.RC:$src1, _.RC:$src2, AVXCC:$cc), "vcmp${cc}"#_.Suffix, "{sae}, $src2, $src1", "$src1, $src2, {sae}", (X86cmpmRnd (_.VT _.RC:$src1), (_.VT _.RC:$src2), imm:$cc, (i32 FROUND_NO_EXC))>, EVEX_B; let isAsmParserOnly = 1, hasSideEffects = 0 in { defm rrib_alt : AVX512_maskable_cmp_alt<0xC2, MRMSrcReg, _, (outs _.KRC:$dst), (ins _.RC:$src1, _.RC:$src2, u8imm:$cc), "vcmp"#_.Suffix, "$cc, {sae}, $src2, $src1", "$src1, $src2, {sae}, $cc">, EVEX_B; } } multiclass avx512_vcmp { let Predicates = [HasAVX512] in { defm Z : avx512_vcmp_common<_.info512>, avx512_vcmp_sae<_.info512>, EVEX_V512; } let Predicates = [HasAVX512,HasVLX] in { defm Z128 : avx512_vcmp_common<_.info128>, EVEX_V128; defm Z256 : avx512_vcmp_common<_.info256>, EVEX_V256; } } defm VCMPPD : avx512_vcmp, AVX512PDIi8Base, EVEX_4V, EVEX_CD8<64, CD8VF>, VEX_W; defm VCMPPS : avx512_vcmp, AVX512PSIi8Base, EVEX_4V, EVEX_CD8<32, CD8VF>; def : Pat<(v8i1 (X86cmpm (v8f32 VR256X:$src1), (v8f32 VR256X:$src2), imm:$cc)), (COPY_TO_REGCLASS (VCMPPSZrri (v16f32 (SUBREG_TO_REG (i32 0), VR256X:$src1, sub_ymm)), (v16f32 (SUBREG_TO_REG (i32 0), VR256X:$src2, sub_ymm)), imm:$cc), VK8)>; def : Pat<(v8i1 (X86cmpm (v8i32 VR256X:$src1), (v8i32 VR256X:$src2), imm:$cc)), (COPY_TO_REGCLASS (VPCMPDZrri (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src1, sub_ymm)), (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src2, sub_ymm)), imm:$cc), VK8)>; def : Pat<(v8i1 (X86cmpmu (v8i32 VR256X:$src1), (v8i32 VR256X:$src2), imm:$cc)), (COPY_TO_REGCLASS (VPCMPUDZrri (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src1, sub_ymm)), (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src2, sub_ymm)), imm:$cc), VK8)>; // ---------------------------------------------------------------- // FPClass //handle fpclass instruction mask = op(reg_scalar,imm) // op(mem_scalar,imm) multiclass avx512_scalar_fpclass opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _, Predicate prd> { let Predicates = [prd] in { def rr : AVX512; def rrk : AVX512, EVEX_K; let mayLoad = 1, AddedComplexity = 20 in { def rm : AVX512; def rmk : AVX512, EVEX_K; } } } //handle fpclass instruction mask = fpclass(reg_vec, reg_vec, imm) // fpclass(reg_vec, mem_vec, imm) // fpclass(reg_vec, broadcast(eltVt), imm) multiclass avx512_vector_fpclass opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _, string mem, string broadcast>{ def rr : AVX512; def rrk : AVX512, EVEX_K; let mayLoad = 1 in { def rm : AVX512; def rmk : AVX512, EVEX_K; def rmb : AVX512,EVEX_B; def rmbk : AVX512, EVEX_B, EVEX_K; } } multiclass avx512_vector_fpclass_all opc, SDNode OpNode, Predicate prd, string broadcast>{ let Predicates = [prd] in { defm Z : avx512_vector_fpclass, EVEX_V512; } let Predicates = [prd, HasVLX] in { defm Z128 : avx512_vector_fpclass, EVEX_V128; defm Z256 : avx512_vector_fpclass, EVEX_V256; } } multiclass avx512_fp_fpclass_all opcVec, bits<8> opcScalar, SDNode VecOpNode, SDNode ScalarOpNode, Predicate prd>{ defm PS : avx512_vector_fpclass_all, EVEX_CD8<32, CD8VF>; defm PD : avx512_vector_fpclass_all,EVEX_CD8<64, CD8VF> , VEX_W; defm SS : avx512_scalar_fpclass, EVEX_CD8<32, CD8VT1>; defm SD : avx512_scalar_fpclass, EVEX_CD8<64, CD8VT1>, VEX_W; } defm VFPCLASS : avx512_fp_fpclass_all<"vfpclass", 0x66, 0x67, X86Vfpclass, X86Vfpclasss, HasDQI>, AVX512AIi8Base,EVEX; //----------------------------------------------------------------- // Mask register copy, including // - copy between mask registers // - load/store mask registers // - copy from GPR to mask register and vice versa // 
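// For illustration, a typical round trip through a mask register uses the
// kk/km/mk/kr/rk forms defined below:
//   kmovw %eax, %k1          # GR32 -> mask   (kr)
//   kmovw %k1, (%rdi)        # mask -> memory (mk)
//   kmovw (%rdi), %k2        # memory -> mask (km)
//   kmovw %k2, %eax          # mask -> GR32   (rk)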
multiclass avx512_mask_mov opc_kk, bits<8> opc_km, bits<8> opc_mk, string OpcodeStr, RegisterClass KRC, ValueType vvt, X86MemOperand x86memop> { let hasSideEffects = 0 in { def kk : I; let mayLoad = 1 in def km : I; let mayStore = 1 in def mk : I; } } multiclass avx512_mask_mov_gpr opc_kr, bits<8> opc_rk, string OpcodeStr, RegisterClass KRC, RegisterClass GRC> { let hasSideEffects = 0 in { def kr : I; def rk : I; } } let Predicates = [HasDQI] in defm KMOVB : avx512_mask_mov<0x90, 0x90, 0x91, "kmovb", VK8, v8i1, i8mem>, avx512_mask_mov_gpr<0x92, 0x93, "kmovb", VK8, GR32>, VEX, PD; let Predicates = [HasAVX512] in defm KMOVW : avx512_mask_mov<0x90, 0x90, 0x91, "kmovw", VK16, v16i1, i16mem>, avx512_mask_mov_gpr<0x92, 0x93, "kmovw", VK16, GR32>, VEX, PS; let Predicates = [HasBWI] in { defm KMOVD : avx512_mask_mov<0x90, 0x90, 0x91, "kmovd", VK32, v32i1,i32mem>, VEX, PD, VEX_W; defm KMOVD : avx512_mask_mov_gpr<0x92, 0x93, "kmovd", VK32, GR32>, VEX, XD; } let Predicates = [HasBWI] in { defm KMOVQ : avx512_mask_mov<0x90, 0x90, 0x91, "kmovq", VK64, v64i1, i64mem>, VEX, PS, VEX_W; defm KMOVQ : avx512_mask_mov_gpr<0x92, 0x93, "kmovq", VK64, GR64>, VEX, XD, VEX_W; } // GR from/to mask register let Predicates = [HasDQI] in { def : Pat<(v8i1 (bitconvert (i8 GR8:$src))), (KMOVBkr (SUBREG_TO_REG (i32 0), GR8:$src, sub_8bit))>; def : Pat<(i8 (bitconvert (v8i1 VK8:$src))), (EXTRACT_SUBREG (KMOVBrk VK8:$src), sub_8bit)>; } let Predicates = [HasAVX512] in { def : Pat<(v16i1 (bitconvert (i16 GR16:$src))), (KMOVWkr (SUBREG_TO_REG (i32 0), GR16:$src, sub_16bit))>; def : Pat<(i16 (bitconvert (v16i1 VK16:$src))), (EXTRACT_SUBREG (KMOVWrk VK16:$src), sub_16bit)>; } let Predicates = [HasBWI] in { def : Pat<(v32i1 (bitconvert (i32 GR32:$src))), (KMOVDkr GR32:$src)>; def : Pat<(i32 (bitconvert (v32i1 VK32:$src))), (KMOVDrk VK32:$src)>; } let Predicates = [HasBWI] in { def : Pat<(v64i1 (bitconvert (i64 GR64:$src))), (KMOVQkr GR64:$src)>; def : Pat<(i64 (bitconvert (v64i1 VK64:$src))), (KMOVQrk VK64:$src)>; } // Load/store kreg let Predicates = [HasDQI] in { def : Pat<(store (i8 (bitconvert (v8i1 VK8:$src))), addr:$dst), (KMOVBmk addr:$dst, VK8:$src)>; def : Pat<(v8i1 (bitconvert (i8 (load addr:$src)))), (KMOVBkm addr:$src)>; def : Pat<(store VK4:$src, addr:$dst), (KMOVBmk addr:$dst, (COPY_TO_REGCLASS VK4:$src, VK8))>; def : Pat<(store VK2:$src, addr:$dst), (KMOVBmk addr:$dst, (COPY_TO_REGCLASS VK2:$src, VK8))>; } let Predicates = [HasAVX512, NoDQI] in { def : Pat<(store (i8 (bitconvert (v8i1 VK8:$src))), addr:$dst), (KMOVWmk addr:$dst, (COPY_TO_REGCLASS VK8:$src, VK16))>; def : Pat<(v8i1 (bitconvert (i8 (load addr:$src)))), (COPY_TO_REGCLASS (KMOVWkm addr:$src), VK8)>; } let Predicates = [HasAVX512] in { def : Pat<(store (i16 (bitconvert (v16i1 VK16:$src))), addr:$dst), (KMOVWmk addr:$dst, VK16:$src)>; def : Pat<(i1 (load addr:$src)), (COPY_TO_REGCLASS (AND16ri (i16 (SUBREG_TO_REG (i32 0), (MOV8rm addr:$src), sub_8bit)), (i16 1)), VK1)>; def : Pat<(v16i1 (bitconvert (i16 (load addr:$src)))), (KMOVWkm addr:$src)>; } let Predicates = [HasBWI] in { def : Pat<(store (i32 (bitconvert (v32i1 VK32:$src))), addr:$dst), (KMOVDmk addr:$dst, VK32:$src)>; def : Pat<(v32i1 (bitconvert (i32 (load addr:$src)))), (KMOVDkm addr:$src)>; } let Predicates = [HasBWI] in { def : Pat<(store (i64 (bitconvert (v64i1 VK64:$src))), addr:$dst), (KMOVQmk addr:$dst, VK64:$src)>; def : Pat<(v64i1 (bitconvert (i64 (load addr:$src)))), (KMOVQkm addr:$src)>; } let Predicates = [HasAVX512] in { def : Pat<(i1 (trunc (i64 GR64:$src))), (COPY_TO_REGCLASS 
(KMOVWkr (AND32ri (EXTRACT_SUBREG $src, sub_32bit), (i32 1))), VK1)>; def : Pat<(i1 (trunc (i32 GR32:$src))), (COPY_TO_REGCLASS (KMOVWkr (AND32ri $src, (i32 1))), VK1)>; def : Pat<(i1 (trunc (i8 GR8:$src))), (COPY_TO_REGCLASS (KMOVWkr (AND32ri (SUBREG_TO_REG (i32 0), GR8:$src, sub_8bit), (i32 1))), VK1)>; def : Pat<(i1 (trunc (i16 GR16:$src))), (COPY_TO_REGCLASS (KMOVWkr (AND32ri (SUBREG_TO_REG (i32 0), $src, sub_16bit), (i32 1))), VK1)>; def : Pat<(i32 (zext VK1:$src)), (AND32ri (KMOVWrk (COPY_TO_REGCLASS VK1:$src, VK16)), (i32 1))>; def : Pat<(i32 (anyext VK1:$src)), (KMOVWrk (COPY_TO_REGCLASS VK1:$src, VK16))>; def : Pat<(i8 (zext VK1:$src)), (EXTRACT_SUBREG (AND32ri (KMOVWrk (COPY_TO_REGCLASS VK1:$src, VK16)), (i32 1)), sub_8bit)>; def : Pat<(i8 (anyext VK1:$src)), (EXTRACT_SUBREG (KMOVWrk (COPY_TO_REGCLASS VK1:$src, VK16)), sub_8bit)>; def : Pat<(i64 (zext VK1:$src)), (AND64ri8 (SUBREG_TO_REG (i64 0), (KMOVWrk (COPY_TO_REGCLASS VK1:$src, VK16)), sub_32bit), (i64 1))>; def : Pat<(i16 (zext VK1:$src)), (EXTRACT_SUBREG (AND32ri (KMOVWrk (COPY_TO_REGCLASS VK1:$src, VK16)), (i32 1)), sub_16bit)>; } def : Pat<(v16i1 (scalar_to_vector VK1:$src)), (COPY_TO_REGCLASS VK1:$src, VK16)>; def : Pat<(v8i1 (scalar_to_vector VK1:$src)), (COPY_TO_REGCLASS VK1:$src, VK8)>; def : Pat<(v4i1 (scalar_to_vector VK1:$src)), (COPY_TO_REGCLASS VK1:$src, VK4)>; def : Pat<(v2i1 (scalar_to_vector VK1:$src)), (COPY_TO_REGCLASS VK1:$src, VK2)>; def : Pat<(v32i1 (scalar_to_vector VK1:$src)), (COPY_TO_REGCLASS VK1:$src, VK32)>; def : Pat<(v64i1 (scalar_to_vector VK1:$src)), (COPY_TO_REGCLASS VK1:$src, VK64)>; // With AVX-512 only, 8-bit mask is promoted to 16-bit mask. let Predicates = [HasAVX512, NoDQI] in { // GR from/to 8-bit mask without native support def : Pat<(v8i1 (bitconvert (i8 GR8:$src))), (COPY_TO_REGCLASS (KMOVWkr (MOVZX32rr8 GR8 :$src)), VK8)>; def : Pat<(i8 (bitconvert (v8i1 VK8:$src))), (EXTRACT_SUBREG (KMOVWrk (COPY_TO_REGCLASS VK8:$src, VK16)), sub_8bit)>; } let Predicates = [HasAVX512] in { def : Pat<(i1 (X86Vextract VK16:$src, (iPTR 0))), (COPY_TO_REGCLASS VK16:$src, VK1)>; def : Pat<(i1 (X86Vextract VK8:$src, (iPTR 0))), (COPY_TO_REGCLASS VK8:$src, VK1)>; } let Predicates = [HasBWI] in { def : Pat<(i1 (X86Vextract VK32:$src, (iPTR 0))), (COPY_TO_REGCLASS VK32:$src, VK1)>; def : Pat<(i1 (X86Vextract VK64:$src, (iPTR 0))), (COPY_TO_REGCLASS VK64:$src, VK1)>; } // Mask unary operation // - KNOT multiclass avx512_mask_unop opc, string OpcodeStr, RegisterClass KRC, SDPatternOperator OpNode, Predicate prd> { let Predicates = [prd] in def rr : I; } multiclass avx512_mask_unop_all opc, string OpcodeStr, SDPatternOperator OpNode> { defm B : avx512_mask_unop, VEX, PD; defm W : avx512_mask_unop, VEX, PS; defm D : avx512_mask_unop, VEX, PD, VEX_W; defm Q : avx512_mask_unop, VEX, PS, VEX_W; } defm KNOT : avx512_mask_unop_all<0x44, "knot", not>; multiclass avx512_mask_unop_int { let Predicates = [HasAVX512] in def : Pat<(!cast("int_x86_avx512_"##IntName##"_w") (i16 GR16:$src)), (COPY_TO_REGCLASS (!cast(InstName##"Wrr") (v16i1 (COPY_TO_REGCLASS GR16:$src, VK16))), GR16)>; } defm : avx512_mask_unop_int<"knot", "KNOT">; let Predicates = [HasDQI] in def : Pat<(xor VK8:$src1, (v8i1 immAllOnesV)), (KNOTBrr VK8:$src1)>; let Predicates = [HasAVX512] in def : Pat<(xor VK16:$src1, (v16i1 immAllOnesV)), (KNOTWrr VK16:$src1)>; let Predicates = [HasBWI] in def : Pat<(xor VK32:$src1, (v32i1 immAllOnesV)), (KNOTDrr VK32:$src1)>; let Predicates = [HasBWI] in def : Pat<(xor VK64:$src1, (v64i1 immAllOnesV)), (KNOTQrr 
VK64:$src1)>; // KNL does not support KMOVB, 8-bit mask is promoted to 16-bit let Predicates = [HasAVX512, NoDQI] in { def : Pat<(xor VK8:$src1, (v8i1 immAllOnesV)), (COPY_TO_REGCLASS (KNOTWrr (COPY_TO_REGCLASS VK8:$src1, VK16)), VK8)>; def : Pat<(not VK8:$src), (COPY_TO_REGCLASS (KNOTWrr (COPY_TO_REGCLASS VK8:$src, VK16)), VK8)>; } def : Pat<(xor VK4:$src1, (v4i1 immAllOnesV)), (COPY_TO_REGCLASS (KNOTWrr (COPY_TO_REGCLASS VK4:$src1, VK16)), VK4)>; def : Pat<(xor VK2:$src1, (v2i1 immAllOnesV)), (COPY_TO_REGCLASS (KNOTWrr (COPY_TO_REGCLASS VK2:$src1, VK16)), VK2)>; // Mask binary operation // - KAND, KANDN, KOR, KXNOR, KXOR multiclass avx512_mask_binop opc, string OpcodeStr, RegisterClass KRC, SDPatternOperator OpNode, Predicate prd, bit IsCommutable> { let Predicates = [prd], isCommutable = IsCommutable in def rr : I; } multiclass avx512_mask_binop_all opc, string OpcodeStr, SDPatternOperator OpNode, bit IsCommutable, Predicate prdW = HasAVX512> { defm B : avx512_mask_binop, VEX_4V, VEX_L, PD; defm W : avx512_mask_binop, VEX_4V, VEX_L, PS; defm D : avx512_mask_binop, VEX_4V, VEX_L, VEX_W, PD; defm Q : avx512_mask_binop, VEX_4V, VEX_L, VEX_W, PS; } def andn : PatFrag<(ops node:$i0, node:$i1), (and (not node:$i0), node:$i1)>; def xnor : PatFrag<(ops node:$i0, node:$i1), (not (xor node:$i0, node:$i1))>; defm KAND : avx512_mask_binop_all<0x41, "kand", and, 1>; defm KOR : avx512_mask_binop_all<0x45, "kor", or, 1>; defm KXNOR : avx512_mask_binop_all<0x46, "kxnor", xnor, 1>; defm KXOR : avx512_mask_binop_all<0x47, "kxor", xor, 1>; defm KANDN : avx512_mask_binop_all<0x42, "kandn", andn, 0>; defm KADD : avx512_mask_binop_all<0x4A, "kadd", add, 1, HasDQI>; multiclass avx512_mask_binop_int { let Predicates = [HasAVX512] in def : Pat<(!cast("int_x86_avx512_"##IntName##"_w") (i16 GR16:$src1), (i16 GR16:$src2)), (COPY_TO_REGCLASS (!cast(InstName##"Wrr") (v16i1 (COPY_TO_REGCLASS GR16:$src1, VK16)), (v16i1 (COPY_TO_REGCLASS GR16:$src2, VK16))), GR16)>; } defm : avx512_mask_binop_int<"kand", "KAND">; defm : avx512_mask_binop_int<"kandn", "KANDN">; defm : avx512_mask_binop_int<"kor", "KOR">; defm : avx512_mask_binop_int<"kxnor", "KXNOR">; defm : avx512_mask_binop_int<"kxor", "KXOR">; multiclass avx512_binop_pat { // With AVX512F, 8-bit mask is promoted to 16-bit mask, // for the DQI set, this type is legal and KxxxB instruction is used let Predicates = [NoDQI] in def : Pat<(OpNode VK8:$src1, VK8:$src2), (COPY_TO_REGCLASS (Inst (COPY_TO_REGCLASS VK8:$src1, VK16), (COPY_TO_REGCLASS VK8:$src2, VK16)), VK8)>; // All types smaller than 8 bits require conversion anyway def : Pat<(OpNode VK1:$src1, VK1:$src2), (COPY_TO_REGCLASS (Inst (COPY_TO_REGCLASS VK1:$src1, VK16), (COPY_TO_REGCLASS VK1:$src2, VK16)), VK1)>; def : Pat<(OpNode VK2:$src1, VK2:$src2), (COPY_TO_REGCLASS (Inst (COPY_TO_REGCLASS VK2:$src1, VK16), (COPY_TO_REGCLASS VK2:$src2, VK16)), VK1)>; def : Pat<(OpNode VK4:$src1, VK4:$src2), (COPY_TO_REGCLASS (Inst (COPY_TO_REGCLASS VK4:$src1, VK16), (COPY_TO_REGCLASS VK4:$src2, VK16)), VK1)>; } defm : avx512_binop_pat; defm : avx512_binop_pat; defm : avx512_binop_pat; defm : avx512_binop_pat; defm : avx512_binop_pat; def : Pat<(xor (xor VK16:$src1, VK16:$src2), (v16i1 immAllOnesV)), (KXNORWrr VK16:$src1, VK16:$src2)>; def : Pat<(xor (xor VK8:$src1, VK8:$src2), (v8i1 immAllOnesV)), (KXNORBrr VK8:$src1, VK8:$src2)>, Requires<[HasDQI]>; def : Pat<(xor (xor VK32:$src1, VK32:$src2), (v32i1 immAllOnesV)), (KXNORDrr VK32:$src1, VK32:$src2)>, Requires<[HasBWI]>; def : Pat<(xor (xor VK64:$src1, VK64:$src2), (v64i1 
                               immAllOnesV)),
          (KXNORQrr VK64:$src1, VK64:$src2)>, Requires<[HasBWI]>;

let Predicates = [NoDQI] in
def : Pat<(xor (xor VK8:$src1, VK8:$src2), (v8i1 immAllOnesV)),
          (COPY_TO_REGCLASS (KXNORWrr (COPY_TO_REGCLASS VK8:$src1, VK16),
                                      (COPY_TO_REGCLASS VK8:$src2, VK16)), VK8)>;
def : Pat<(xor (xor VK4:$src1, VK4:$src2), (v4i1 immAllOnesV)),
          (COPY_TO_REGCLASS (KXNORWrr (COPY_TO_REGCLASS VK4:$src1, VK16),
                                      (COPY_TO_REGCLASS VK4:$src2, VK16)), VK4)>;
def : Pat<(xor (xor VK2:$src1, VK2:$src2), (v2i1 immAllOnesV)),
          (COPY_TO_REGCLASS (KXNORWrr (COPY_TO_REGCLASS VK2:$src1, VK16),
                                      (COPY_TO_REGCLASS VK2:$src2, VK16)), VK2)>;
def : Pat<(xor (xor VK1:$src1, VK1:$src2), (i1 1)),
          (COPY_TO_REGCLASS (KXNORWrr (COPY_TO_REGCLASS VK1:$src1, VK16),
                                      (COPY_TO_REGCLASS VK1:$src2, VK16)), VK1)>;

// Mask unpacking
multiclass avx512_mask_unpck<string Suffix, RegisterClass KRC, ValueType VT,
                             RegisterClass KRCSrc, Predicate prd> {
  let Predicates = [prd] in {
    let hasSideEffects = 0 in
    def rr : I<0x4b, MRMSrcReg, (outs KRC:$dst), (ins KRC:$src1, KRC:$src2),
               "kunpck"#Suffix#"\t{$src2, $src1, $dst|$dst, $src1, $src2}", []>,
               VEX_4V, VEX_L;

    def : Pat<(VT (concat_vectors KRCSrc:$src1, KRCSrc:$src2)),
              (!cast<Instruction>(NAME##rr)
                         (COPY_TO_REGCLASS KRCSrc:$src2, KRC),
                         (COPY_TO_REGCLASS KRCSrc:$src1, KRC))>;
  }
}

defm KUNPCKBW : avx512_mask_unpck<"bw", VK16, v16i1, VK8, HasAVX512>, PD;
defm KUNPCKWD : avx512_mask_unpck<"wd", VK32, v32i1, VK16, HasBWI>, PS;
defm KUNPCKDQ : avx512_mask_unpck<"dq", VK64, v64i1, VK32, HasBWI>, PS, VEX_W;

// Mask bit testing
multiclass avx512_mask_testop<bits<8> opc, string OpcodeStr, RegisterClass KRC,
                              SDNode OpNode, Predicate prd> {
  let Predicates = [prd], Defs = [EFLAGS] in
  def rr : I;
}

multiclass avx512_mask_testop_w<bits<8> opc, string OpcodeStr, SDNode OpNode,
                                Predicate prdW = HasAVX512> {
  defm B : avx512_mask_testop, VEX, PD;
  defm W : avx512_mask_testop, VEX, PS;
  defm Q : avx512_mask_testop, VEX, PS, VEX_W;
  defm D : avx512_mask_testop, VEX, PD, VEX_W;
}

defm KORTEST : avx512_mask_testop_w<0x98, "kortest", X86kortest>;
defm KTEST   : avx512_mask_testop_w<0x99, "ktest", X86ktest, HasDQI>;

// Mask shift
multiclass avx512_mask_shiftop<bits<8> opc, string OpcodeStr, RegisterClass KRC,
                               SDNode OpNode> {
  let Predicates = [HasAVX512] in
  def ri : Ii8;
}

multiclass avx512_mask_shiftop_w<bits<8> opc1, bits<8> opc2, string OpcodeStr,
                                 SDNode OpNode> {
  defm W : avx512_mask_shiftop, VEX, TAPD, VEX_W;
  let Predicates = [HasDQI] in
  defm B : avx512_mask_shiftop, VEX, TAPD;
  let Predicates = [HasBWI] in {
    defm Q : avx512_mask_shiftop, VEX, TAPD, VEX_W;
    let Predicates = [HasDQI] in
    defm D : avx512_mask_shiftop, VEX, TAPD;
  }
}

defm KSHIFTL : avx512_mask_shiftop_w<0x32, 0x33, "kshiftl", X86vshli>;
defm KSHIFTR : avx512_mask_shiftop_w<0x30, 0x31, "kshiftr", X86vsrli>;

// Mask setting all 0s or 1s
multiclass avx512_mask_setop {
  let Predicates = [HasAVX512] in
  let isReMaterializable = 1, isAsCheapAsAMove = 1, isPseudo = 1 in
  def #NAME# : I<0, Pseudo, (outs KRC:$dst), (ins), "",
                 [(set KRC:$dst, (VT Val))]>;
}

multiclass avx512_mask_setop_w {
  defm B : avx512_mask_setop;
  defm W : avx512_mask_setop;
  defm D : avx512_mask_setop;
  defm Q : avx512_mask_setop;
}

defm KSET0 : avx512_mask_setop_w;
defm KSET1 : avx512_mask_setop_w;

// With AVX-512 only, 8-bit mask is promoted to 16-bit mask.
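// For example, the first pattern below materializes a v8i1 all-zeros mask
// with the 16-bit KSET0W and simply retags the result as VK8:
//   (v8i1 immAllZerosV) --> (COPY_TO_REGCLASS (KSET0W), VK8)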
let Predicates = [HasAVX512] in {
  def : Pat<(v8i1 immAllZerosV), (COPY_TO_REGCLASS (KSET0W), VK8)>;
  def : Pat<(v8i1 immAllOnesV),  (COPY_TO_REGCLASS (KSET1W), VK8)>;
  def : Pat<(v4i1 immAllOnesV),  (COPY_TO_REGCLASS (KSET1W), VK4)>;
  def : Pat<(v2i1 immAllOnesV),  (COPY_TO_REGCLASS (KSET1W), VK2)>;
  def : Pat<(i1 0), (COPY_TO_REGCLASS (KSET0W), VK1)>;
  def : Pat<(i1 1), (COPY_TO_REGCLASS (KSHIFTRWri (KSET1W), (i8 15)), VK1)>;
  def : Pat<(i1 -1), (COPY_TO_REGCLASS (KSHIFTRWri (KSET1W), (i8 15)), VK1)>;
}

def : Pat<(v8i1 (extract_subvector (v16i1 VK16:$src), (iPTR 0))),
          (v8i1 (COPY_TO_REGCLASS VK16:$src, VK8))>;
def : Pat<(v16i1 (insert_subvector undef, (v8i1 VK8:$src), (iPTR 0))),
          (v16i1 (COPY_TO_REGCLASS VK8:$src, VK16))>;
def : Pat<(v8i1 (extract_subvector (v16i1 VK16:$src), (iPTR 8))),
          (v8i1 (COPY_TO_REGCLASS (KSHIFTRWri VK16:$src, (i8 8)), VK8))>;
def : Pat<(v16i1 (extract_subvector (v32i1 VK32:$src), (iPTR 0))),
          (v16i1 (COPY_TO_REGCLASS VK32:$src, VK16))>;
def : Pat<(v16i1 (extract_subvector (v32i1 VK32:$src), (iPTR 16))),
          (v16i1 (COPY_TO_REGCLASS (KSHIFTRDri VK32:$src, (i8 16)), VK16))>;
def : Pat<(v32i1 (extract_subvector (v64i1 VK64:$src), (iPTR 0))),
          (v32i1 (COPY_TO_REGCLASS VK64:$src, VK32))>;
def : Pat<(v32i1 (extract_subvector (v64i1 VK64:$src), (iPTR 32))),
          (v32i1 (COPY_TO_REGCLASS (KSHIFTRQri VK64:$src, (i8 32)), VK32))>;
def : Pat<(v4i1 (extract_subvector (v8i1 VK8:$src), (iPTR 0))),
          (v4i1 (COPY_TO_REGCLASS VK8:$src, VK4))>;
def : Pat<(v2i1 (extract_subvector (v8i1 VK8:$src), (iPTR 0))),
          (v2i1 (COPY_TO_REGCLASS VK8:$src, VK2))>;
def : Pat<(v4i1 (insert_subvector undef, (v2i1 VK2:$src), (iPTR 0))),
          (v4i1 (COPY_TO_REGCLASS VK2:$src, VK4))>;
def : Pat<(v8i1 (insert_subvector undef, (v4i1 VK4:$src), (iPTR 0))),
          (v8i1 (COPY_TO_REGCLASS VK4:$src, VK8))>;
def : Pat<(v8i1 (insert_subvector undef, (v2i1 VK2:$src), (iPTR 0))),
          (v8i1 (COPY_TO_REGCLASS VK2:$src, VK8))>;
def : Pat<(v32i1 (insert_subvector undef, VK2:$src, (iPTR 0))),
          (v32i1 (COPY_TO_REGCLASS VK2:$src, VK32))>;
def : Pat<(v32i1 (insert_subvector undef, VK4:$src, (iPTR 0))),
          (v32i1 (COPY_TO_REGCLASS VK4:$src, VK32))>;
def : Pat<(v32i1 (insert_subvector undef, VK8:$src, (iPTR 0))),
          (v32i1 (COPY_TO_REGCLASS VK8:$src, VK32))>;
def : Pat<(v32i1 (insert_subvector undef, VK16:$src, (iPTR 0))),
          (v32i1 (COPY_TO_REGCLASS VK16:$src, VK32))>;
def : Pat<(v64i1 (insert_subvector undef, VK2:$src, (iPTR 0))),
          (v64i1 (COPY_TO_REGCLASS VK2:$src, VK64))>;
def : Pat<(v64i1 (insert_subvector undef, VK4:$src, (iPTR 0))),
          (v64i1 (COPY_TO_REGCLASS VK4:$src, VK64))>;
def : Pat<(v64i1 (insert_subvector undef, VK8:$src, (iPTR 0))),
          (v64i1 (COPY_TO_REGCLASS VK8:$src, VK64))>;
def : Pat<(v64i1 (insert_subvector undef, VK16:$src, (iPTR 0))),
          (v64i1 (COPY_TO_REGCLASS VK16:$src, VK64))>;
def : Pat<(v64i1 (insert_subvector undef, VK32:$src, (iPTR 0))),
          (v64i1 (COPY_TO_REGCLASS VK32:$src, VK64))>;

def : Pat<(v8i1 (X86vshli VK8:$src, (i8 imm:$imm))),
          (v8i1 (COPY_TO_REGCLASS
                  (KSHIFTLWri (COPY_TO_REGCLASS VK8:$src, VK16),
                              (I8Imm $imm)), VK8))>,
          Requires<[HasAVX512, NoDQI]>;
def : Pat<(v8i1 (X86vsrli VK8:$src, (i8 imm:$imm))),
          (v8i1 (COPY_TO_REGCLASS
                  (KSHIFTRWri (COPY_TO_REGCLASS VK8:$src, VK16),
                              (I8Imm $imm)), VK8))>,
          Requires<[HasAVX512, NoDQI]>;
def : Pat<(v4i1 (X86vshli VK4:$src, (i8 imm:$imm))),
          (v4i1 (COPY_TO_REGCLASS
                  (KSHIFTLWri (COPY_TO_REGCLASS VK4:$src, VK16),
                              (I8Imm $imm)), VK4))>,
          Requires<[HasAVX512]>;
def : Pat<(v4i1 (X86vsrli VK4:$src, (i8 imm:$imm))),
          (v4i1 (COPY_TO_REGCLASS
                  (KSHIFTRWri (COPY_TO_REGCLASS VK4:$src, VK16),
                              (I8Imm $imm)), VK4))>,
Requires<[HasAVX512]>; //===----------------------------------------------------------------------===// // AVX-512 - Aligned and unaligned load and store // multiclass avx512_load opc, string OpcodeStr, X86VectorVTInfo _, PatFrag ld_frag, PatFrag mload, bit IsReMaterializable = 1> { let hasSideEffects = 0 in { def rr : AVX512PI, EVEX; def rrkz : AVX512PI, EVEX, EVEX_KZ; let canFoldAsLoad = 1, isReMaterializable = IsReMaterializable, SchedRW = [WriteLoad] in def rm : AVX512PI, EVEX; let Constraints = "$src0 = $dst" in { def rrk : AVX512PI, EVEX, EVEX_K; let mayLoad = 1, SchedRW = [WriteLoad] in def rmk : AVX512PI, EVEX, EVEX_K; } let mayLoad = 1, SchedRW = [WriteLoad] in def rmkz : AVX512PI, EVEX, EVEX_KZ; } def : Pat<(_.VT (mload addr:$ptr, _.KRCWM:$mask, undef)), (!cast(NAME#_.ZSuffix##rmkz) _.KRCWM:$mask, addr:$ptr)>; def : Pat<(_.VT (mload addr:$ptr, _.KRCWM:$mask, _.ImmAllZerosV)), (!cast(NAME#_.ZSuffix##rmkz) _.KRCWM:$mask, addr:$ptr)>; def : Pat<(_.VT (mload addr:$ptr, _.KRCWM:$mask, (_.VT _.RC:$src0))), (!cast(NAME#_.ZSuffix##rmk) _.RC:$src0, _.KRCWM:$mask, addr:$ptr)>; } multiclass avx512_alignedload_vl opc, string OpcodeStr, AVX512VLVectorVTInfo _, Predicate prd, bit IsReMaterializable = 1> { let Predicates = [prd] in defm Z : avx512_load, EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : avx512_load, EVEX_V256; defm Z128 : avx512_load, EVEX_V128; } } multiclass avx512_load_vl opc, string OpcodeStr, AVX512VLVectorVTInfo _, Predicate prd, bit IsReMaterializable = 1> { let Predicates = [prd] in defm Z : avx512_load, EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : avx512_load, EVEX_V256; defm Z128 : avx512_load, EVEX_V128; } } multiclass avx512_store opc, string OpcodeStr, X86VectorVTInfo _, PatFrag st_frag, PatFrag mstore> { def rr_REV : AVX512PI, EVEX; def rrk_REV : AVX512PI, EVEX, EVEX_K; def rrkz_REV : AVX512PI, EVEX, EVEX_KZ; let mayStore = 1 in { def mr : AVX512PI, EVEX; def mrk : AVX512PI, EVEX, EVEX_K; } def: Pat<(mstore addr:$ptr, _.KRCWM:$mask, (_.VT _.RC:$src)), (!cast(NAME#_.ZSuffix##mrk) addr:$ptr, _.KRCWM:$mask, _.RC:$src)>; } multiclass avx512_store_vl< bits<8> opc, string OpcodeStr, AVX512VLVectorVTInfo _, Predicate prd> { let Predicates = [prd] in defm Z : avx512_store, EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : avx512_store, EVEX_V256; defm Z128 : avx512_store, EVEX_V128; } } multiclass avx512_alignedstore_vl opc, string OpcodeStr, AVX512VLVectorVTInfo _, Predicate prd> { let Predicates = [prd] in defm Z : avx512_store, EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : avx512_store, EVEX_V256; defm Z128 : avx512_store, EVEX_V128; } } defm VMOVAPS : avx512_alignedload_vl<0x28, "vmovaps", avx512vl_f32_info, HasAVX512>, avx512_alignedstore_vl<0x29, "vmovaps", avx512vl_f32_info, HasAVX512>, PS, EVEX_CD8<32, CD8VF>; defm VMOVAPD : avx512_alignedload_vl<0x28, "vmovapd", avx512vl_f64_info, HasAVX512>, avx512_alignedstore_vl<0x29, "vmovapd", avx512vl_f64_info, HasAVX512>, PD, VEX_W, EVEX_CD8<64, CD8VF>; defm VMOVUPS : avx512_load_vl<0x10, "vmovups", avx512vl_f32_info, HasAVX512>, avx512_store_vl<0x11, "vmovups", avx512vl_f32_info, HasAVX512>, PS, EVEX_CD8<32, CD8VF>; defm VMOVUPD : avx512_load_vl<0x10, "vmovupd", avx512vl_f64_info, HasAVX512, 0>, avx512_store_vl<0x11, "vmovupd", avx512vl_f64_info, HasAVX512>, PD, VEX_W, EVEX_CD8<64, CD8VF>; def: Pat<(int_x86_avx512_mask_storeu_ps_512 addr:$ptr, (v16f32 VR512:$src), GR16:$mask), (VMOVUPSZmrk addr:$ptr, (v16i1 (COPY_TO_REGCLASS GR16:$mask, VK16WM)), VR512:$src)>; def: 
Pat<(int_x86_avx512_mask_storeu_pd_512 addr:$ptr, (v8f64 VR512:$src), GR8:$mask), (VMOVUPDZmrk addr:$ptr, (v8i1 (COPY_TO_REGCLASS GR8:$mask, VK8WM)), VR512:$src)>; def: Pat<(int_x86_avx512_mask_store_ps_512 addr:$ptr, (v16f32 VR512:$src), GR16:$mask), (VMOVAPSZmrk addr:$ptr, (v16i1 (COPY_TO_REGCLASS GR16:$mask, VK16WM)), VR512:$src)>; def: Pat<(int_x86_avx512_mask_store_pd_512 addr:$ptr, (v8f64 VR512:$src), GR8:$mask), (VMOVAPDZmrk addr:$ptr, (v8i1 (COPY_TO_REGCLASS GR8:$mask, VK8WM)), VR512:$src)>; defm VMOVDQA32 : avx512_alignedload_vl<0x6F, "vmovdqa32", avx512vl_i32_info, HasAVX512>, avx512_alignedstore_vl<0x7F, "vmovdqa32", avx512vl_i32_info, HasAVX512>, PD, EVEX_CD8<32, CD8VF>; defm VMOVDQA64 : avx512_alignedload_vl<0x6F, "vmovdqa64", avx512vl_i64_info, HasAVX512>, avx512_alignedstore_vl<0x7F, "vmovdqa64", avx512vl_i64_info, HasAVX512>, PD, VEX_W, EVEX_CD8<64, CD8VF>; defm VMOVDQU8 : avx512_load_vl<0x6F, "vmovdqu8", avx512vl_i8_info, HasBWI>, avx512_store_vl<0x7F, "vmovdqu8", avx512vl_i8_info, HasBWI>, XD, EVEX_CD8<8, CD8VF>; defm VMOVDQU16 : avx512_load_vl<0x6F, "vmovdqu16", avx512vl_i16_info, HasBWI>, avx512_store_vl<0x7F, "vmovdqu16", avx512vl_i16_info, HasBWI>, XD, VEX_W, EVEX_CD8<16, CD8VF>; defm VMOVDQU32 : avx512_load_vl<0x6F, "vmovdqu32", avx512vl_i32_info, HasAVX512>, avx512_store_vl<0x7F, "vmovdqu32", avx512vl_i32_info, HasAVX512>, XS, EVEX_CD8<32, CD8VF>; defm VMOVDQU64 : avx512_load_vl<0x6F, "vmovdqu64", avx512vl_i64_info, HasAVX512>, avx512_store_vl<0x7F, "vmovdqu64", avx512vl_i64_info, HasAVX512>, XS, VEX_W, EVEX_CD8<64, CD8VF>; def: Pat<(v16i32 (int_x86_avx512_mask_loadu_d_512 addr:$ptr, (v16i32 immAllZerosV), GR16:$mask)), (VMOVDQU32Zrmkz (v16i1 (COPY_TO_REGCLASS GR16:$mask, VK16WM)), addr:$ptr)>; def: Pat<(v8i64 (int_x86_avx512_mask_loadu_q_512 addr:$ptr, (bc_v8i64 (v16i32 immAllZerosV)), GR8:$mask)), (VMOVDQU64Zrmkz (v8i1 (COPY_TO_REGCLASS GR8:$mask, VK8WM)), addr:$ptr)>; def: Pat<(int_x86_avx512_mask_storeu_d_512 addr:$ptr, (v16i32 VR512:$src), GR16:$mask), (VMOVDQU32Zmrk addr:$ptr, (v16i1 (COPY_TO_REGCLASS GR16:$mask, VK16WM)), VR512:$src)>; def: Pat<(int_x86_avx512_mask_storeu_q_512 addr:$ptr, (v8i64 VR512:$src), GR8:$mask), (VMOVDQU64Zmrk addr:$ptr, (v8i1 (COPY_TO_REGCLASS GR8:$mask, VK8WM)), VR512:$src)>; let AddedComplexity = 20 in { def : Pat<(v8i64 (vselect VK8WM:$mask, (v8i64 VR512:$src), (bc_v8i64 (v16i32 immAllZerosV)))), (VMOVDQU64Zrrkz VK8WM:$mask, VR512:$src)>; def : Pat<(v8i64 (vselect VK8WM:$mask, (bc_v8i64 (v16i32 immAllZerosV)), (v8i64 VR512:$src))), (VMOVDQU64Zrrkz (COPY_TO_REGCLASS (KNOTWrr (COPY_TO_REGCLASS VK8:$mask, VK16)), VK8), VR512:$src)>; def : Pat<(v16i32 (vselect VK16WM:$mask, (v16i32 VR512:$src), (v16i32 immAllZerosV))), (VMOVDQU32Zrrkz VK16WM:$mask, VR512:$src)>; def : Pat<(v16i32 (vselect VK16WM:$mask, (v16i32 immAllZerosV), (v16i32 VR512:$src))), (VMOVDQU32Zrrkz (KNOTWrr VK16WM:$mask), VR512:$src)>; } // Move Int Doubleword to Packed Double Int // def VMOVDI2PDIZrr : AVX512BI<0x6E, MRMSrcReg, (outs VR128X:$dst), (ins GR32:$src), "vmovd\t{$src, $dst|$dst, $src}", [(set VR128X:$dst, (v4i32 (scalar_to_vector GR32:$src)))], IIC_SSE_MOVDQ>, EVEX; def VMOVDI2PDIZrm : AVX512BI<0x6E, MRMSrcMem, (outs VR128X:$dst), (ins i32mem:$src), "vmovd\t{$src, $dst|$dst, $src}", [(set VR128X:$dst, (v4i32 (scalar_to_vector (loadi32 addr:$src))))], IIC_SSE_MOVDQ>, EVEX, EVEX_CD8<32, CD8VT1>; def VMOV64toPQIZrr : AVX512BI<0x6E, MRMSrcReg, (outs VR128X:$dst), (ins GR64:$src), "vmovq\t{$src, $dst|$dst, $src}", [(set VR128X:$dst, (v2i64 (scalar_to_vector 
GR64:$src)))], IIC_SSE_MOVDQ>, EVEX, VEX_W; let isCodeGenOnly = 1, ForceDisassemble = 1, hasSideEffects = 0, mayLoad = 1 in def VMOV64toPQIZrm : AVX512BI<0x6E, MRMSrcMem, (outs VR128X:$dst), (ins i64mem:$src), "vmovq\t{$src, $dst|$dst, $src}", []>, EVEX, VEX_W, EVEX_CD8<64, CD8VT1>; let isCodeGenOnly = 1 in { def VMOV64toSDZrr : AVX512BI<0x6E, MRMSrcReg, (outs FR64X:$dst), (ins GR64:$src), "vmovq\t{$src, $dst|$dst, $src}", [(set FR64X:$dst, (bitconvert GR64:$src))], IIC_SSE_MOVDQ>, EVEX, VEX_W, Sched<[WriteMove]>; def VMOVSDto64Zrr : AVX512BI<0x7E, MRMDestReg, (outs GR64:$dst), (ins FR64X:$src), "vmovq\t{$src, $dst|$dst, $src}", [(set GR64:$dst, (bitconvert FR64X:$src))], IIC_SSE_MOVDQ>, EVEX, VEX_W, Sched<[WriteMove]>; def VMOVSDto64Zmr : AVX512BI<0x7E, MRMDestMem, (outs), (ins i64mem:$dst, FR64X:$src), "vmovq\t{$src, $dst|$dst, $src}", [(store (i64 (bitconvert FR64X:$src)), addr:$dst)], IIC_SSE_MOVDQ>, EVEX, VEX_W, Sched<[WriteStore]>, EVEX_CD8<64, CD8VT1>; } // Move Int Doubleword to Single Scalar // let isCodeGenOnly = 1 in { def VMOVDI2SSZrr : AVX512BI<0x6E, MRMSrcReg, (outs FR32X:$dst), (ins GR32:$src), "vmovd\t{$src, $dst|$dst, $src}", [(set FR32X:$dst, (bitconvert GR32:$src))], IIC_SSE_MOVDQ>, EVEX; def VMOVDI2SSZrm : AVX512BI<0x6E, MRMSrcMem, (outs FR32X:$dst), (ins i32mem:$src), "vmovd\t{$src, $dst|$dst, $src}", [(set FR32X:$dst, (bitconvert (loadi32 addr:$src)))], IIC_SSE_MOVDQ>, EVEX, EVEX_CD8<32, CD8VT1>; } // Move doubleword from xmm register to r/m32 // def VMOVPDI2DIZrr : AVX512BI<0x7E, MRMDestReg, (outs GR32:$dst), (ins VR128X:$src), "vmovd\t{$src, $dst|$dst, $src}", [(set GR32:$dst, (extractelt (v4i32 VR128X:$src), (iPTR 0)))], IIC_SSE_MOVD_ToGP>, EVEX; def VMOVPDI2DIZmr : AVX512BI<0x7E, MRMDestMem, (outs), (ins i32mem:$dst, VR128X:$src), "vmovd\t{$src, $dst|$dst, $src}", [(store (i32 (extractelt (v4i32 VR128X:$src), (iPTR 0))), addr:$dst)], IIC_SSE_MOVDQ>, EVEX, EVEX_CD8<32, CD8VT1>; // Move quadword from xmm1 register to r/m64 // def VMOVPQIto64Zrr : I<0x7E, MRMDestReg, (outs GR64:$dst), (ins VR128X:$src), "vmovq\t{$src, $dst|$dst, $src}", [(set GR64:$dst, (extractelt (v2i64 VR128X:$src), (iPTR 0)))], IIC_SSE_MOVD_ToGP>, PD, EVEX, VEX_W, Requires<[HasAVX512, In64BitMode]>; let isCodeGenOnly = 1, ForceDisassemble = 1, hasSideEffects = 0, mayStore = 1 in def VMOVPQIto64Zmr : I<0x7E, MRMDestMem, (outs), (ins i64mem:$dst, VR128X:$src), "vmovq\t{$src, $dst|$dst, $src}", [], IIC_SSE_MOVD_ToGP>, PD, EVEX, VEX_W, Requires<[HasAVX512, In64BitMode]>; def VMOVPQI2QIZmr : I<0xD6, MRMDestMem, (outs), (ins i64mem:$dst, VR128X:$src), "vmovq\t{$src, $dst|$dst, $src}", [(store (extractelt (v2i64 VR128X:$src), (iPTR 0)), addr:$dst)], IIC_SSE_MOVDQ>, EVEX, PD, VEX_W, EVEX_CD8<64, CD8VT1>, Sched<[WriteStore]>, Requires<[HasAVX512, In64BitMode]>; let hasSideEffects = 0 in def VMOVPQI2QIZrr : AVX512BI<0xD6, MRMDestReg, (outs VR128X:$dst), (ins VR128X:$src), "vmovq.s\t{$src, $dst|$dst, $src}",[]>, EVEX, VEX_W; // Move Scalar Single to Double Int // let isCodeGenOnly = 1 in { def VMOVSS2DIZrr : AVX512BI<0x7E, MRMDestReg, (outs GR32:$dst), (ins FR32X:$src), "vmovd\t{$src, $dst|$dst, $src}", [(set GR32:$dst, (bitconvert FR32X:$src))], IIC_SSE_MOVD_ToGP>, EVEX; def VMOVSS2DIZmr : AVX512BI<0x7E, MRMDestMem, (outs), (ins i32mem:$dst, FR32X:$src), "vmovd\t{$src, $dst|$dst, $src}", [(store (i32 (bitconvert FR32X:$src)), addr:$dst)], IIC_SSE_MOVDQ>, EVEX, EVEX_CD8<32, CD8VT1>; } // Move Quadword Int to Packed Quadword Int // def VMOVQI2PQIZrm : AVX512XSI<0x7E, MRMSrcMem, (outs VR128X:$dst), (ins 
i64mem:$src), "vmovq\t{$src, $dst|$dst, $src}", [(set VR128X:$dst, (v2i64 (scalar_to_vector (loadi64 addr:$src))))]>, EVEX, VEX_W, EVEX_CD8<8, CD8VT8>; //===----------------------------------------------------------------------===// // AVX-512 MOVSS, MOVSD //===----------------------------------------------------------------------===// multiclass avx512_move_scalar { defm rr_Int : AVX512_maskable_scalar<0x10, MRMSrcReg, _, (outs _.RC:$dst), (ins _.RC:$src1, _.RC:$src2), asm, "$src2, $src1","$src1, $src2", (_.VT (OpNode (_.VT _.RC:$src1), (_.VT _.RC:$src2))), IIC_SSE_MOV_S_RR>, EVEX_4V; let Constraints = "$src1 = $dst" , mayLoad = 1 in defm rm_Int : AVX512_maskable_3src_scalar<0x10, MRMSrcMem, _, (outs _.RC:$dst), (ins _.ScalarMemOp:$src), asm,"$src","$src", (_.VT (OpNode (_.VT _.RC:$src1), (_.VT (scalar_to_vector (_.ScalarLdFrag addr:$src)))))>, EVEX; let isCodeGenOnly = 1 in { def rr : AVX512PI<0x10, MRMSrcReg, (outs _.RC:$dst), (ins _.RC:$src1, _.FRC:$src2), !strconcat(asm, "\t{$src2, $src1, $dst|$dst, $src1, $src2}"), [(set _.RC:$dst, (_.VT (OpNode _.RC:$src1, (scalar_to_vector _.FRC:$src2))))], _.ExeDomain,IIC_SSE_MOV_S_RR>, EVEX_4V; let mayLoad = 1 in def rm : AVX512PI<0x10, MRMSrcMem, (outs _.FRC:$dst), (ins _.ScalarMemOp:$src), !strconcat(asm, "\t{$src, $dst|$dst, $src}"), [(set _.FRC:$dst, (_.ScalarLdFrag addr:$src))], _.ExeDomain, IIC_SSE_MOV_S_RM>, EVEX; } let mayStore = 1 in { def mr: AVX512PI<0x11, MRMDestMem, (outs), (ins _.ScalarMemOp:$dst, _.FRC:$src), !strconcat(asm, "\t{$src, $dst|$dst, $src}"), [(store _.FRC:$src, addr:$dst)], _.ExeDomain, IIC_SSE_MOV_S_MR>, EVEX; def mrk: AVX512PI<0x11, MRMDestMem, (outs), (ins _.ScalarMemOp:$dst, VK1WM:$mask, _.FRC:$src), !strconcat(asm, "\t{$src, $dst {${mask}}|$dst {${mask}}, $src}"), [], _.ExeDomain, IIC_SSE_MOV_S_MR>, EVEX, EVEX_K; } // mayStore } defm VMOVSSZ : avx512_move_scalar<"vmovss", X86Movss, f32x_info>, VEX_LIG, XS, EVEX_CD8<32, CD8VT1>; defm VMOVSDZ : avx512_move_scalar<"vmovsd", X86Movsd, f64x_info>, VEX_LIG, XD, VEX_W, EVEX_CD8<64, CD8VT1>; def : Pat<(f32 (X86select VK1WM:$mask, (f32 FR32X:$src1), (f32 FR32X:$src2))), (COPY_TO_REGCLASS (VMOVSSZrr_Intk (COPY_TO_REGCLASS FR32X:$src2, VR128X), VK1WM:$mask, (v4f32 (IMPLICIT_DEF)),(COPY_TO_REGCLASS FR32X:$src1, VR128X)), FR32X)>; def : Pat<(f64 (X86select VK1WM:$mask, (f64 FR64X:$src1), (f64 FR64X:$src2))), (COPY_TO_REGCLASS (VMOVSDZrr_Intk (COPY_TO_REGCLASS FR64X:$src2, VR128X), VK1WM:$mask, (v2f64 (IMPLICIT_DEF)), (COPY_TO_REGCLASS FR64X:$src1, VR128X)), FR64X)>; def : Pat<(int_x86_avx512_mask_store_ss addr:$dst, VR128X:$src, GR8:$mask), (VMOVSSZmrk addr:$dst, (i1 (COPY_TO_REGCLASS GR8:$mask, VK1WM)), (COPY_TO_REGCLASS VR128X:$src, FR32X))>; defm VMOVSSZrr_REV : AVX512_maskable_in_asm<0x11, MRMDestReg, f32x_info, (outs VR128X:$dst), (ins VR128X:$src1, VR128X:$src2), "vmovss.s", "$src2, $src1", "$src1, $src2", []>, XS, EVEX_4V, VEX_LIG; defm VMOVSSDrr_REV : AVX512_maskable_in_asm<0x11, MRMDestReg, f64x_info, (outs VR128X:$dst), (ins VR128X:$src1, VR128X:$src2), "vmovsd.s", "$src2, $src1", "$src1, $src2", []>, XD, EVEX_4V, VEX_LIG, VEX_W; let Predicates = [HasAVX512] in { let AddedComplexity = 15 in { // Move scalar to XMM zero-extended, zeroing a VR128X then do a // MOVS{S,D} to the lower bits. 
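// Roughly (an illustrative lowering, not taken from this file), the first
// pattern below produces:
//   vxorps %xmm1, %xmm1, %xmm1      # V_SET0
//   vmovss %xmm0, %xmm1, %xmm1      # merge the scalar into the zeroed vector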
def : Pat<(v4f32 (X86vzmovl (v4f32 (scalar_to_vector FR32X:$src)))),
          (VMOVSSZrr (v4f32 (V_SET0)), FR32X:$src)>;
def : Pat<(v4f32 (X86vzmovl (v4f32 VR128X:$src))),
          (VMOVSSZrr (v4f32 (V_SET0)), (COPY_TO_REGCLASS VR128X:$src, FR32X))>;
def : Pat<(v4i32 (X86vzmovl (v4i32 VR128X:$src))),
          (VMOVSSZrr (v4i32 (V_SET0)), (COPY_TO_REGCLASS VR128X:$src, FR32X))>;
def : Pat<(v2f64 (X86vzmovl (v2f64 (scalar_to_vector FR64X:$src)))),
          (VMOVSDZrr (v2f64 (V_SET0)), FR64X:$src)>;

// Move low f32 and clear high bits.
def : Pat<(v8f32 (X86vzmovl (v8f32 VR256X:$src))),
          (SUBREG_TO_REG (i32 0),
           (VMOVSSZrr (v4f32 (V_SET0)),
            (EXTRACT_SUBREG (v8f32 VR256X:$src), sub_xmm)), sub_xmm)>;
def : Pat<(v8i32 (X86vzmovl (v8i32 VR256X:$src))),
          (SUBREG_TO_REG (i32 0),
           (VMOVSSZrr (v4i32 (V_SET0)),
            (EXTRACT_SUBREG (v8i32 VR256X:$src), sub_xmm)), sub_xmm)>;
}

let AddedComplexity = 20 in {
// MOVSSrm zeros the high parts of the register; represent this
// with SUBREG_TO_REG. The AVX versions also write: DST[255:128] <- 0
def : Pat<(v4f32 (X86vzmovl (v4f32 (scalar_to_vector (loadf32 addr:$src))))),
          (COPY_TO_REGCLASS (VMOVSSZrm addr:$src), VR128X)>;
def : Pat<(v4f32 (scalar_to_vector (loadf32 addr:$src))),
          (COPY_TO_REGCLASS (VMOVSSZrm addr:$src), VR128X)>;
def : Pat<(v4f32 (X86vzmovl (loadv4f32 addr:$src))),
          (COPY_TO_REGCLASS (VMOVSSZrm addr:$src), VR128X)>;

// MOVSDrm zeros the high parts of the register; represent this
// with SUBREG_TO_REG. The AVX versions also write: DST[255:128] <- 0
def : Pat<(v2f64 (X86vzmovl (v2f64 (scalar_to_vector (loadf64 addr:$src))))),
          (COPY_TO_REGCLASS (VMOVSDZrm addr:$src), VR128X)>;
def : Pat<(v2f64 (scalar_to_vector (loadf64 addr:$src))),
          (COPY_TO_REGCLASS (VMOVSDZrm addr:$src), VR128X)>;
def : Pat<(v2f64 (X86vzmovl (loadv2f64 addr:$src))),
          (COPY_TO_REGCLASS (VMOVSDZrm addr:$src), VR128X)>;
def : Pat<(v2f64 (X86vzmovl (bc_v2f64 (loadv4f32 addr:$src)))),
          (COPY_TO_REGCLASS (VMOVSDZrm addr:$src), VR128X)>;
def : Pat<(v2f64 (X86vzload addr:$src)),
          (COPY_TO_REGCLASS (VMOVSDZrm addr:$src), VR128X)>;

// Represent the same patterns above but in the form they appear for
// 256-bit types
def : Pat<(v8i32 (X86vzmovl (insert_subvector undef,
                  (v4i32 (scalar_to_vector (loadi32 addr:$src))), (iPTR 0)))),
          (SUBREG_TO_REG (i32 0), (VMOVDI2PDIZrm addr:$src), sub_xmm)>;
def : Pat<(v8f32 (X86vzmovl (insert_subvector undef,
                  (v4f32 (scalar_to_vector (loadf32 addr:$src))), (iPTR 0)))),
          (SUBREG_TO_REG (i32 0), (VMOVSSZrm addr:$src), sub_xmm)>;
def : Pat<(v4f64 (X86vzmovl (insert_subvector undef,
                  (v2f64 (scalar_to_vector (loadf64 addr:$src))), (iPTR 0)))),
          (SUBREG_TO_REG (i32 0), (VMOVSDZrm addr:$src), sub_xmm)>;
}
def : Pat<(v8f32 (X86vzmovl (insert_subvector undef,
                  (v4f32 (scalar_to_vector FR32X:$src)), (iPTR 0)))),
          (SUBREG_TO_REG (i32 0),
                         (v4f32 (VMOVSSZrr (v4f32 (V_SET0)), FR32X:$src)),
                         sub_xmm)>;
def : Pat<(v4f64 (X86vzmovl (insert_subvector undef,
                  (v2f64 (scalar_to_vector FR64X:$src)), (iPTR 0)))),
          (SUBREG_TO_REG (i64 0),
                         (v2f64 (VMOVSDZrr (v2f64 (V_SET0)), FR64X:$src)),
                         sub_xmm)>;
def : Pat<(v4i64 (X86vzmovl (insert_subvector undef,
                  (v2i64 (scalar_to_vector (loadi64 addr:$src))), (iPTR 0)))),
          (SUBREG_TO_REG (i64 0), (VMOVQI2PQIZrm addr:$src), sub_xmm)>;

// Move low f64 and clear high bits.
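// i.e. the ymm value is reduced to its low xmm half, merged into a zeroed
// register with VMOVSD, and re-inserted as the low subvector, with
// SUBREG_TO_REG asserting that the upper bits are already zero:
//   (v4f64 (X86vzmovl VR256X:$src))
//     --> (SUBREG_TO_REG (i32 0),
//            (VMOVSDZrr (V_SET0), (EXTRACT_SUBREG VR256X:$src, sub_xmm)),
//            sub_xmm)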
def : Pat<(v4f64 (X86vzmovl (v4f64 VR256X:$src))), (SUBREG_TO_REG (i32 0), (VMOVSDZrr (v2f64 (V_SET0)), (EXTRACT_SUBREG (v4f64 VR256X:$src), sub_xmm)), sub_xmm)>; def : Pat<(v4i64 (X86vzmovl (v4i64 VR256X:$src))), (SUBREG_TO_REG (i32 0), (VMOVSDZrr (v2i64 (V_SET0)), (EXTRACT_SUBREG (v4i64 VR256X:$src), sub_xmm)), sub_xmm)>; // Extract and store. def : Pat<(store (f32 (extractelt (v4f32 VR128X:$src), (iPTR 0))), addr:$dst), (VMOVSSZmr addr:$dst, (COPY_TO_REGCLASS (v4f32 VR128X:$src), FR32X))>; def : Pat<(store (f64 (extractelt (v2f64 VR128X:$src), (iPTR 0))), addr:$dst), (VMOVSDZmr addr:$dst, (COPY_TO_REGCLASS (v2f64 VR128X:$src), FR64X))>; // Shuffle with VMOVSS def : Pat<(v4i32 (X86Movss VR128X:$src1, VR128X:$src2)), (VMOVSSZrr (v4i32 VR128X:$src1), (COPY_TO_REGCLASS (v4i32 VR128X:$src2), FR32X))>; def : Pat<(v4f32 (X86Movss VR128X:$src1, VR128X:$src2)), (VMOVSSZrr (v4f32 VR128X:$src1), (COPY_TO_REGCLASS (v4f32 VR128X:$src2), FR32X))>; // 256-bit variants def : Pat<(v8i32 (X86Movss VR256X:$src1, VR256X:$src2)), (SUBREG_TO_REG (i32 0), (VMOVSSZrr (EXTRACT_SUBREG (v8i32 VR256X:$src1), sub_xmm), (EXTRACT_SUBREG (v8i32 VR256X:$src2), sub_xmm)), sub_xmm)>; def : Pat<(v8f32 (X86Movss VR256X:$src1, VR256X:$src2)), (SUBREG_TO_REG (i32 0), (VMOVSSZrr (EXTRACT_SUBREG (v8f32 VR256X:$src1), sub_xmm), (EXTRACT_SUBREG (v8f32 VR256X:$src2), sub_xmm)), sub_xmm)>; // Shuffle with VMOVSD def : Pat<(v2i64 (X86Movsd VR128X:$src1, VR128X:$src2)), (VMOVSDZrr VR128X:$src1, (COPY_TO_REGCLASS VR128X:$src2, FR64X))>; def : Pat<(v2f64 (X86Movsd VR128X:$src1, VR128X:$src2)), (VMOVSDZrr VR128X:$src1, (COPY_TO_REGCLASS VR128X:$src2, FR64X))>; def : Pat<(v4f32 (X86Movsd VR128X:$src1, VR128X:$src2)), (VMOVSDZrr VR128X:$src1, (COPY_TO_REGCLASS VR128X:$src2, FR64X))>; def : Pat<(v4i32 (X86Movsd VR128X:$src1, VR128X:$src2)), (VMOVSDZrr VR128X:$src1, (COPY_TO_REGCLASS VR128X:$src2, FR64X))>; // 256-bit variants def : Pat<(v4i64 (X86Movsd VR256X:$src1, VR256X:$src2)), (SUBREG_TO_REG (i32 0), (VMOVSDZrr (EXTRACT_SUBREG (v4i64 VR256X:$src1), sub_xmm), (EXTRACT_SUBREG (v4i64 VR256X:$src2), sub_xmm)), sub_xmm)>; def : Pat<(v4f64 (X86Movsd VR256X:$src1, VR256X:$src2)), (SUBREG_TO_REG (i32 0), (VMOVSDZrr (EXTRACT_SUBREG (v4f64 VR256X:$src1), sub_xmm), (EXTRACT_SUBREG (v4f64 VR256X:$src2), sub_xmm)), sub_xmm)>; def : Pat<(v2f64 (X86Movlpd VR128X:$src1, VR128X:$src2)), (VMOVSDZrr VR128X:$src1, (COPY_TO_REGCLASS VR128X:$src2, FR64X))>; def : Pat<(v2i64 (X86Movlpd VR128X:$src1, VR128X:$src2)), (VMOVSDZrr VR128X:$src1, (COPY_TO_REGCLASS VR128X:$src2, FR64X))>; def : Pat<(v4f32 (X86Movlps VR128X:$src1, VR128X:$src2)), (VMOVSDZrr VR128X:$src1, (COPY_TO_REGCLASS VR128X:$src2, FR64X))>; def : Pat<(v4i32 (X86Movlps VR128X:$src1, VR128X:$src2)), (VMOVSDZrr VR128X:$src1, (COPY_TO_REGCLASS VR128X:$src2, FR64X))>; } let AddedComplexity = 15 in def VMOVZPQILo2PQIZrr : AVX512XSI<0x7E, MRMSrcReg, (outs VR128X:$dst), (ins VR128X:$src), "vmovq\t{$src, $dst|$dst, $src}", [(set VR128X:$dst, (v2i64 (X86vzmovl (v2i64 VR128X:$src))))], IIC_SSE_MOVQ_RR>, EVEX, VEX_W; let AddedComplexity = 20 , isCodeGenOnly = 1 in def VMOVZPQILo2PQIZrm : AVX512XSI<0x7E, MRMSrcMem, (outs VR128X:$dst), (ins i128mem:$src), "vmovq\t{$src, $dst|$dst, $src}", [(set VR128X:$dst, (v2i64 (X86vzmovl (loadv2i64 addr:$src))))], IIC_SSE_MOVDQ>, EVEX, VEX_W, EVEX_CD8<8, CD8VT8>; let Predicates = [HasAVX512] in { // AVX 128-bit movd/movq instruction write zeros in the high 128-bit part. 
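// For example, "vmovd %eax, %xmm0" sets xmm0[31:0] = eax and clears the rest
// of the full-width register, so the zero-extending patterns below can select
// a plain VMOVDI2PDIZrr/VMOVDI2PDIZrm with no extra masking.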
let AddedComplexity = 20 in { def : Pat<(v4i32 (X86vzmovl (v4i32 (scalar_to_vector (loadi32 addr:$src))))), (VMOVDI2PDIZrm addr:$src)>; def : Pat<(v2i64 (X86vzmovl (v2i64 (scalar_to_vector GR64:$src)))), (VMOV64toPQIZrr GR64:$src)>; def : Pat<(v4i32 (X86vzmovl (v4i32 (scalar_to_vector GR32:$src)))), (VMOVDI2PDIZrr GR32:$src)>; def : Pat<(v4i32 (X86vzmovl (bc_v4i32 (loadv4f32 addr:$src)))), (VMOVDI2PDIZrm addr:$src)>; def : Pat<(v4i32 (X86vzmovl (bc_v4i32 (loadv2i64 addr:$src)))), (VMOVDI2PDIZrm addr:$src)>; def : Pat<(v2i64 (X86vzmovl (loadv2i64 addr:$src))), (VMOVZPQILo2PQIZrm addr:$src)>; def : Pat<(v2f64 (X86vzmovl (v2f64 VR128X:$src))), (VMOVZPQILo2PQIZrr VR128X:$src)>; def : Pat<(v2i64 (X86vzload addr:$src)), (VMOVZPQILo2PQIZrm addr:$src)>; } // Use regular 128-bit instructions to match 256-bit scalar_to_vec+zext. def : Pat<(v8i32 (X86vzmovl (insert_subvector undef, (v4i32 (scalar_to_vector GR32:$src)),(iPTR 0)))), (SUBREG_TO_REG (i32 0), (VMOVDI2PDIZrr GR32:$src), sub_xmm)>; def : Pat<(v4i64 (X86vzmovl (insert_subvector undef, (v2i64 (scalar_to_vector GR64:$src)),(iPTR 0)))), (SUBREG_TO_REG (i64 0), (VMOV64toPQIZrr GR64:$src), sub_xmm)>; } def : Pat<(v16i32 (X86Vinsert (v16i32 immAllZerosV), GR32:$src2, (iPTR 0))), (SUBREG_TO_REG (i32 0), (VMOVDI2PDIZrr GR32:$src2), sub_xmm)>; def : Pat<(v8i64 (X86Vinsert (bc_v8i64 (v16i32 immAllZerosV)), GR64:$src2, (iPTR 0))), (SUBREG_TO_REG (i32 0), (VMOV64toPQIZrr GR64:$src2), sub_xmm)>; def : Pat<(v16i32 (X86Vinsert undef, GR32:$src2, (iPTR 0))), (SUBREG_TO_REG (i32 0), (VMOVDI2PDIZrr GR32:$src2), sub_xmm)>; def : Pat<(v8i64 (X86Vinsert undef, GR64:$src2, (iPTR 0))), (SUBREG_TO_REG (i32 0), (VMOV64toPQIZrr GR64:$src2), sub_xmm)>; //===----------------------------------------------------------------------===// // AVX-512 - Non-temporals //===----------------------------------------------------------------------===// let SchedRW = [WriteLoad] in { def VMOVNTDQAZrm : AVX512PI<0x2A, MRMSrcMem, (outs VR512:$dst), (ins i512mem:$src), "vmovntdqa\t{$src, $dst|$dst, $src}", [(set VR512:$dst, (int_x86_avx512_movntdqa addr:$src))], SSEPackedInt>, EVEX, T8PD, EVEX_V512, EVEX_CD8<64, CD8VF>; let Predicates = [HasAVX512, HasVLX] in { def VMOVNTDQAZ256rm : AVX512PI<0x2A, MRMSrcMem, (outs VR256X:$dst), (ins i256mem:$src), "vmovntdqa\t{$src, $dst|$dst, $src}", [], SSEPackedInt>, EVEX, T8PD, EVEX_V256, EVEX_CD8<64, CD8VF>; def VMOVNTDQAZ128rm : AVX512PI<0x2A, MRMSrcMem, (outs VR128X:$dst), (ins i128mem:$src), "vmovntdqa\t{$src, $dst|$dst, $src}", [], SSEPackedInt>, EVEX, T8PD, EVEX_V128, EVEX_CD8<64, CD8VF>; } } multiclass avx512_movnt opc, string OpcodeStr, PatFrag st_frag, ValueType OpVT, RegisterClass RC, X86MemOperand memop, Domain d, InstrItinClass itin = IIC_SSE_MOVNT> { let SchedRW = [WriteStore], mayStore = 1, AddedComplexity = 400 in def mr : AVX512PI, EVEX; } multiclass avx512_movnt_vl opc, string OpcodeStr, PatFrag st_frag, string elty, string elsz, string vsz512, string vsz256, string vsz128, Domain d, Predicate prd, InstrItinClass itin = IIC_SSE_MOVNT> { let Predicates = [prd] in defm Z : avx512_movnt("v"##vsz512##elty##elsz), VR512, !cast(elty##"512mem"), d, itin>, EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : avx512_movnt("v"##vsz256##elty##elsz), VR256X, !cast(elty##"256mem"), d, itin>, EVEX_V256; defm Z128 : avx512_movnt("v"##vsz128##elty##elsz), VR128X, !cast(elty##"128mem"), d, itin>, EVEX_V128; } } defm VMOVNTDQ : avx512_movnt_vl<0xE7, "vmovntdq", alignednontemporalstore, "i", "64", "8", "4", "2", SSEPackedInt, HasAVX512>, PD, 
EVEX_CD8<64, CD8VF>; defm VMOVNTPD : avx512_movnt_vl<0x2B, "vmovntpd", alignednontemporalstore, "f", "64", "8", "4", "2", SSEPackedDouble, HasAVX512>, PD, VEX_W, EVEX_CD8<64, CD8VF>; defm VMOVNTPS : avx512_movnt_vl<0x2B, "vmovntps", alignednontemporalstore, "f", "32", "16", "8", "4", SSEPackedSingle, HasAVX512>, PS, EVEX_CD8<32, CD8VF>; //===----------------------------------------------------------------------===// // AVX-512 - Integer arithmetic // multiclass avx512_binop_rm opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _, OpndItins itins, bit IsCommutable = 0> { defm rr : AVX512_maskable, AVX512BIBase, EVEX_4V; let mayLoad = 1 in defm rm : AVX512_maskable, AVX512BIBase, EVEX_4V; } multiclass avx512_binop_rmb opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _, OpndItins itins, bit IsCommutable = 0> : avx512_binop_rm { let mayLoad = 1 in defm rmb : AVX512_maskable, AVX512BIBase, EVEX_4V, EVEX_B; } multiclass avx512_binop_rm_vl opc, string OpcodeStr, SDNode OpNode, AVX512VLVectorVTInfo VTInfo, OpndItins itins, Predicate prd, bit IsCommutable = 0> { let Predicates = [prd] in defm Z : avx512_binop_rm, EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : avx512_binop_rm, EVEX_V256; defm Z128 : avx512_binop_rm, EVEX_V128; } } multiclass avx512_binop_rmb_vl opc, string OpcodeStr, SDNode OpNode, AVX512VLVectorVTInfo VTInfo, OpndItins itins, Predicate prd, bit IsCommutable = 0> { let Predicates = [prd] in defm Z : avx512_binop_rmb, EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : avx512_binop_rmb, EVEX_V256; defm Z128 : avx512_binop_rmb, EVEX_V128; } } multiclass avx512_binop_rm_vl_q opc, string OpcodeStr, SDNode OpNode, OpndItins itins, Predicate prd, bit IsCommutable = 0> { defm NAME : avx512_binop_rmb_vl, VEX_W, EVEX_CD8<64, CD8VF>; } multiclass avx512_binop_rm_vl_d opc, string OpcodeStr, SDNode OpNode, OpndItins itins, Predicate prd, bit IsCommutable = 0> { defm NAME : avx512_binop_rmb_vl, EVEX_CD8<32, CD8VF>; } multiclass avx512_binop_rm_vl_w opc, string OpcodeStr, SDNode OpNode, OpndItins itins, Predicate prd, bit IsCommutable = 0> { defm NAME : avx512_binop_rm_vl, EVEX_CD8<16, CD8VF>; } multiclass avx512_binop_rm_vl_b opc, string OpcodeStr, SDNode OpNode, OpndItins itins, Predicate prd, bit IsCommutable = 0> { defm NAME : avx512_binop_rm_vl, EVEX_CD8<8, CD8VF>; } multiclass avx512_binop_rm_vl_dq opc_d, bits<8> opc_q, string OpcodeStr, SDNode OpNode, OpndItins itins, Predicate prd, bit IsCommutable = 0> { defm Q : avx512_binop_rm_vl_q; defm D : avx512_binop_rm_vl_d; } multiclass avx512_binop_rm_vl_bw opc_b, bits<8> opc_w, string OpcodeStr, SDNode OpNode, OpndItins itins, Predicate prd, bit IsCommutable = 0> { defm W : avx512_binop_rm_vl_w; defm B : avx512_binop_rm_vl_b; } multiclass avx512_binop_rm_vl_all opc_b, bits<8> opc_w, bits<8> opc_d, bits<8> opc_q, string OpcodeStr, SDNode OpNode, OpndItins itins, bit IsCommutable = 0> { defm NAME : avx512_binop_rm_vl_dq, avx512_binop_rm_vl_bw; } multiclass avx512_binop_rm2 opc, string OpcodeStr, OpndItins itins, SDNode OpNode,X86VectorVTInfo _Src, X86VectorVTInfo _Dst, bit IsCommutable = 0> { defm rr : AVX512_maskable, AVX512BIBase, EVEX_4V; let mayLoad = 1 in { defm rm : AVX512_maskable, AVX512BIBase, EVEX_4V; defm rmb : AVX512_maskable, AVX512BIBase, EVEX_4V, EVEX_B; } } defm VPADD : avx512_binop_rm_vl_all<0xFC, 0xFD, 0xFE, 0xD4, "vpadd", add, SSE_INTALU_ITINS_P, 1>; defm VPSUB : avx512_binop_rm_vl_all<0xF8, 0xF9, 0xFA, 0xFB, "vpsub", sub, SSE_INTALU_ITINS_P, 0>; defm VPADDS : avx512_binop_rm_vl_bw<0xEC, 0xED, 
"vpadds", X86adds, SSE_INTALU_ITINS_P, HasBWI, 1>; defm VPSUBS : avx512_binop_rm_vl_bw<0xE8, 0xE9, "vpsubs", X86subs, SSE_INTALU_ITINS_P, HasBWI, 0>; defm VPADDUS : avx512_binop_rm_vl_bw<0xDC, 0xDD, "vpaddus", X86addus, SSE_INTALU_ITINS_P, HasBWI, 1>; defm VPSUBUS : avx512_binop_rm_vl_bw<0xD8, 0xD9, "vpsubus", X86subus, SSE_INTALU_ITINS_P, HasBWI, 0>; defm VPMULLD : avx512_binop_rm_vl_d<0x40, "vpmulld", mul, SSE_INTALU_ITINS_P, HasAVX512, 1>, T8PD; defm VPMULLW : avx512_binop_rm_vl_w<0xD5, "vpmullw", mul, SSE_INTALU_ITINS_P, HasBWI, 1>; defm VPMULLQ : avx512_binop_rm_vl_q<0x40, "vpmullq", mul, SSE_INTALU_ITINS_P, HasDQI, 1>, T8PD; defm VPMULHW : avx512_binop_rm_vl_w<0xE5, "vpmulhw", mulhs, SSE_INTALU_ITINS_P, HasBWI, 1>; defm VPMULHUW : avx512_binop_rm_vl_w<0xE4, "vpmulhuw", mulhu, SSE_INTMUL_ITINS_P, HasBWI, 1>; defm VPMULHRSW : avx512_binop_rm_vl_w<0x0B, "vpmulhrsw", X86mulhrs, SSE_INTMUL_ITINS_P, HasBWI, 1>, T8PD; defm VPAVG : avx512_binop_rm_vl_bw<0xE0, 0xE3, "vpavg", X86avg, SSE_INTALU_ITINS_P, HasBWI, 1>; multiclass avx512_binop_all opc, string OpcodeStr, OpndItins itins, SDNode OpNode, bit IsCommutable = 0> { defm NAME#Z : avx512_binop_rm2, EVEX_V512, EVEX_CD8<64, CD8VF>, VEX_W; let Predicates = [HasVLX] in { defm NAME#Z256 : avx512_binop_rm2, EVEX_V256, EVEX_CD8<64, CD8VF>, VEX_W; defm NAME#Z128 : avx512_binop_rm2, EVEX_V128, EVEX_CD8<64, CD8VF>, VEX_W; } } defm VPMULDQ : avx512_binop_all<0x28, "vpmuldq", SSE_INTALU_ITINS_P, X86pmuldq, 1>,T8PD; defm VPMULUDQ : avx512_binop_all<0xF4, "vpmuludq", SSE_INTMUL_ITINS_P, X86pmuludq, 1>; multiclass avx512_packs_rmb opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _Src, X86VectorVTInfo _Dst> { let mayLoad = 1 in { defm rmb : AVX512_maskable, EVEX_4V, EVEX_B, EVEX_CD8<_Src.EltSize, CD8VF>; } } multiclass avx512_packs_rm opc, string OpcodeStr, SDNode OpNode,X86VectorVTInfo _Src, X86VectorVTInfo _Dst> { defm rr : AVX512_maskable, EVEX_CD8<_Src.EltSize, CD8VF>, EVEX_4V; let mayLoad = 1 in { defm rm : AVX512_maskable, EVEX_4V, EVEX_CD8<_Src.EltSize, CD8VF>; } } multiclass avx512_packs_all_i32_i16 opc, string OpcodeStr, SDNode OpNode> { defm NAME#Z : avx512_packs_rm, avx512_packs_rmb, EVEX_V512; let Predicates = [HasVLX] in { defm NAME#Z256 : avx512_packs_rm, avx512_packs_rmb, EVEX_V256; defm NAME#Z128 : avx512_packs_rm, avx512_packs_rmb, EVEX_V128; } } multiclass avx512_packs_all_i16_i8 opc, string OpcodeStr, SDNode OpNode> { defm NAME#Z : avx512_packs_rm, EVEX_V512; let Predicates = [HasVLX] in { defm NAME#Z256 : avx512_packs_rm, EVEX_V256; defm NAME#Z128 : avx512_packs_rm, EVEX_V128; } } multiclass avx512_vpmadd opc, string OpcodeStr, SDNode OpNode, AVX512VLVectorVTInfo _Src, AVX512VLVectorVTInfo _Dst> { defm NAME#Z : avx512_packs_rm, EVEX_V512; let Predicates = [HasVLX] in { defm NAME#Z256 : avx512_packs_rm, EVEX_V256; defm NAME#Z128 : avx512_packs_rm, EVEX_V128; } } let Predicates = [HasBWI] in { defm VPACKSSDW : avx512_packs_all_i32_i16<0x6B, "vpackssdw", X86Packss>, PD; defm VPACKUSDW : avx512_packs_all_i32_i16<0x2b, "vpackusdw", X86Packus>, T8PD; defm VPACKSSWB : avx512_packs_all_i16_i8 <0x63, "vpacksswb", X86Packss>, AVX512BIBase, VEX_W; defm VPACKUSWB : avx512_packs_all_i16_i8 <0x67, "vpackuswb", X86Packus>, AVX512BIBase, VEX_W; defm VPMADDUBSW : avx512_vpmadd<0x04, "vpmaddubsw", X86vpmaddubsw, avx512vl_i8_info, avx512vl_i16_info>, AVX512BIBase, T8PD; defm VPMADDWD : avx512_vpmadd<0xF5, "vpmaddwd", X86vpmaddwd, avx512vl_i16_info, avx512vl_i32_info>, AVX512BIBase; } defm VPMAXSB : avx512_binop_rm_vl_b<0x3C, "vpmaxsb", smax, 
SSE_INTALU_ITINS_P, HasBWI, 1>, T8PD; defm VPMAXSW : avx512_binop_rm_vl_w<0xEE, "vpmaxsw", smax, SSE_INTALU_ITINS_P, HasBWI, 1>; defm VPMAXS : avx512_binop_rm_vl_dq<0x3D, 0x3D, "vpmaxs", smax, SSE_INTALU_ITINS_P, HasAVX512, 1>, T8PD; defm VPMAXUB : avx512_binop_rm_vl_b<0xDE, "vpmaxub", umax, SSE_INTALU_ITINS_P, HasBWI, 1>; defm VPMAXUW : avx512_binop_rm_vl_w<0x3E, "vpmaxuw", umax, SSE_INTALU_ITINS_P, HasBWI, 1>, T8PD; defm VPMAXU : avx512_binop_rm_vl_dq<0x3F, 0x3F, "vpmaxu", umax, SSE_INTALU_ITINS_P, HasAVX512, 1>, T8PD; defm VPMINSB : avx512_binop_rm_vl_b<0x38, "vpminsb", smin, SSE_INTALU_ITINS_P, HasBWI, 1>, T8PD; defm VPMINSW : avx512_binop_rm_vl_w<0xEA, "vpminsw", smin, SSE_INTALU_ITINS_P, HasBWI, 1>; defm VPMINS : avx512_binop_rm_vl_dq<0x39, 0x39, "vpmins", smin, SSE_INTALU_ITINS_P, HasAVX512, 1>, T8PD; defm VPMINUB : avx512_binop_rm_vl_b<0xDA, "vpminub", umin, SSE_INTALU_ITINS_P, HasBWI, 1>; defm VPMINUW : avx512_binop_rm_vl_w<0x3A, "vpminuw", umin, SSE_INTALU_ITINS_P, HasBWI, 1>, T8PD; defm VPMINU : avx512_binop_rm_vl_dq<0x3B, 0x3B, "vpminu", umin, SSE_INTALU_ITINS_P, HasAVX512, 1>, T8PD; //===----------------------------------------------------------------------===// // AVX-512 Logical Instructions //===----------------------------------------------------------------------===// defm VPAND : avx512_binop_rm_vl_dq<0xDB, 0xDB, "vpand", and, SSE_INTALU_ITINS_P, HasAVX512, 1>; defm VPOR : avx512_binop_rm_vl_dq<0xEB, 0xEB, "vpor", or, SSE_INTALU_ITINS_P, HasAVX512, 1>; defm VPXOR : avx512_binop_rm_vl_dq<0xEF, 0xEF, "vpxor", xor, SSE_INTALU_ITINS_P, HasAVX512, 1>; defm VPANDN : avx512_binop_rm_vl_dq<0xDF, 0xDF, "vpandn", X86andnp, SSE_INTALU_ITINS_P, HasAVX512, 0>; //===----------------------------------------------------------------------===// // AVX-512 FP arithmetic //===----------------------------------------------------------------------===// multiclass avx512_fp_scalar opc, string OpcodeStr,X86VectorVTInfo _, SDNode OpNode, SDNode VecNode, OpndItins itins, bit IsCommutable> { defm rr_Int : AVX512_maskable_scalar; defm rm_Int : AVX512_maskable_scalar; let isCodeGenOnly = 1, isCommutable = IsCommutable, Predicates = [HasAVX512] in { def rr : I< opc, MRMSrcReg, (outs _.FRC:$dst), (ins _.FRC:$src1, _.FRC:$src2), OpcodeStr#"\t{$src2, $src1, $dst|$dst, $src1, $src2}", [(set _.FRC:$dst, (OpNode _.FRC:$src1, _.FRC:$src2))], itins.rr>; def rm : I< opc, MRMSrcMem, (outs _.FRC:$dst), (ins _.FRC:$src1, _.ScalarMemOp:$src2), OpcodeStr#"\t{$src2, $src1, $dst|$dst, $src1, $src2}", [(set _.FRC:$dst, (OpNode _.FRC:$src1, (_.ScalarLdFrag addr:$src2)))], itins.rr>; } } multiclass avx512_fp_scalar_round opc, string OpcodeStr,X86VectorVTInfo _, SDNode VecNode, OpndItins itins, bit IsCommutable = 0> { defm rrb : AVX512_maskable_scalar, EVEX_B, EVEX_RC; } multiclass avx512_fp_scalar_sae opc, string OpcodeStr,X86VectorVTInfo _, SDNode VecNode, OpndItins itins, bit IsCommutable> { defm rrb : AVX512_maskable_scalar, EVEX_B; } multiclass avx512_binop_s_round opc, string OpcodeStr, SDNode OpNode, SDNode VecNode, SizeItins itins, bit IsCommutable> { defm SSZ : avx512_fp_scalar, avx512_fp_scalar_round, XS, EVEX_4V, VEX_LIG, EVEX_CD8<32, CD8VT1>; defm SDZ : avx512_fp_scalar, avx512_fp_scalar_round, XD, VEX_W, EVEX_4V, VEX_LIG, EVEX_CD8<64, CD8VT1>; } multiclass avx512_binop_s_sae opc, string OpcodeStr, SDNode OpNode, SDNode VecNode, SizeItins itins, bit IsCommutable> { defm SSZ : avx512_fp_scalar, avx512_fp_scalar_sae, XS, EVEX_4V, VEX_LIG, EVEX_CD8<32, CD8VT1>; defm SDZ : avx512_fp_scalar, avx512_fp_scalar_sae, 
XD, VEX_W, EVEX_4V, VEX_LIG, EVEX_CD8<64, CD8VT1>; } defm VADD : avx512_binop_s_round<0x58, "vadd", fadd, X86faddRnd, SSE_ALU_ITINS_S, 1>; defm VMUL : avx512_binop_s_round<0x59, "vmul", fmul, X86fmulRnd, SSE_ALU_ITINS_S, 1>; defm VSUB : avx512_binop_s_round<0x5C, "vsub", fsub, X86fsubRnd, SSE_ALU_ITINS_S, 0>; defm VDIV : avx512_binop_s_round<0x5E, "vdiv", fdiv, X86fdivRnd, SSE_ALU_ITINS_S, 0>; defm VMIN : avx512_binop_s_sae <0x5D, "vmin", X86fmin, X86fminRnd, SSE_ALU_ITINS_S, 1>; defm VMAX : avx512_binop_s_sae <0x5F, "vmax", X86fmax, X86fmaxRnd, SSE_ALU_ITINS_S, 1>; multiclass avx512_fp_packed opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _, bit IsCommutable> { defm rr: AVX512_maskable, EVEX_4V; let mayLoad = 1 in { defm rm: AVX512_maskable, EVEX_4V; defm rmb: AVX512_maskable, EVEX_4V, EVEX_B; }//let mayLoad = 1 } multiclass avx512_fp_round_packed opc, string OpcodeStr, SDNode OpNodeRnd, X86VectorVTInfo _> { defm rb: AVX512_maskable, EVEX_4V, EVEX_B, EVEX_RC; } multiclass avx512_fp_sae_packed opc, string OpcodeStr, SDNode OpNodeRnd, X86VectorVTInfo _> { defm rb: AVX512_maskable, EVEX_4V, EVEX_B; } multiclass avx512_fp_binop_p opc, string OpcodeStr, SDNode OpNode, bit IsCommutable = 0> { defm PSZ : avx512_fp_packed, EVEX_V512, PS, EVEX_CD8<32, CD8VF>; defm PDZ : avx512_fp_packed, EVEX_V512, PD, VEX_W, EVEX_CD8<64, CD8VF>; // Define only if AVX512VL feature is present. let Predicates = [HasVLX] in { defm PSZ128 : avx512_fp_packed, EVEX_V128, PS, EVEX_CD8<32, CD8VF>; defm PSZ256 : avx512_fp_packed, EVEX_V256, PS, EVEX_CD8<32, CD8VF>; defm PDZ128 : avx512_fp_packed, EVEX_V128, PD, VEX_W, EVEX_CD8<64, CD8VF>; defm PDZ256 : avx512_fp_packed, EVEX_V256, PD, VEX_W, EVEX_CD8<64, CD8VF>; } } multiclass avx512_fp_binop_p_round opc, string OpcodeStr, SDNode OpNodeRnd> { defm PSZ : avx512_fp_round_packed, EVEX_V512, PS, EVEX_CD8<32, CD8VF>; defm PDZ : avx512_fp_round_packed, EVEX_V512, PD, VEX_W,EVEX_CD8<64, CD8VF>; } multiclass avx512_fp_binop_p_sae opc, string OpcodeStr, SDNode OpNodeRnd> { defm PSZ : avx512_fp_sae_packed, EVEX_V512, PS, EVEX_CD8<32, CD8VF>; defm PDZ : avx512_fp_sae_packed, EVEX_V512, PD, VEX_W,EVEX_CD8<64, CD8VF>; } defm VADD : avx512_fp_binop_p<0x58, "vadd", fadd, 1>, avx512_fp_binop_p_round<0x58, "vadd", X86faddRnd>; defm VMUL : avx512_fp_binop_p<0x59, "vmul", fmul, 1>, avx512_fp_binop_p_round<0x59, "vmul", X86fmulRnd>; defm VSUB : avx512_fp_binop_p<0x5C, "vsub", fsub>, avx512_fp_binop_p_round<0x5C, "vsub", X86fsubRnd>; defm VDIV : avx512_fp_binop_p<0x5E, "vdiv", fdiv>, avx512_fp_binop_p_round<0x5E, "vdiv", X86fdivRnd>; defm VMIN : avx512_fp_binop_p<0x5D, "vmin", X86fmin, 1>, avx512_fp_binop_p_sae<0x5D, "vmin", X86fminRnd>; defm VMAX : avx512_fp_binop_p<0x5F, "vmax", X86fmax, 1>, avx512_fp_binop_p_sae<0x5F, "vmax", X86fmaxRnd>; let Predicates = [HasDQI] in { defm VAND : avx512_fp_binop_p<0x54, "vand", X86fand, 1>; defm VANDN : avx512_fp_binop_p<0x55, "vandn", X86fandn, 0>; defm VOR : avx512_fp_binop_p<0x56, "vor", X86for, 1>; defm VXOR : avx512_fp_binop_p<0x57, "vxor", X86fxor, 1>; } multiclass avx512_fp_scalef_p opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { defm rr: AVX512_maskable, EVEX_4V; let mayLoad = 1 in { defm rm: AVX512_maskable, EVEX_4V; defm rmb: AVX512_maskable, EVEX_4V, EVEX_B; }//let mayLoad = 1 } multiclass avx512_fp_scalef_scalar opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { defm rr: AVX512_maskable_scalar; let mayLoad = 1 in { defm rm: AVX512_maskable_scalar; }//let mayLoad = 1 } multiclass avx512_fp_scalef_all opc, bits<8> 
opcScaler, string OpcodeStr, SDNode OpNode> { defm PSZ : avx512_fp_scalef_p, avx512_fp_round_packed, EVEX_V512, EVEX_CD8<32, CD8VF>; defm PDZ : avx512_fp_scalef_p, avx512_fp_round_packed, EVEX_V512, VEX_W, EVEX_CD8<64, CD8VF>; defm SSZ128 : avx512_fp_scalef_scalar, avx512_fp_scalar_round, EVEX_4V,EVEX_CD8<32, CD8VT1>; defm SDZ128 : avx512_fp_scalef_scalar, avx512_fp_scalar_round, EVEX_4V, EVEX_CD8<64, CD8VT1>, VEX_W; // Define only if AVX512VL feature is present. let Predicates = [HasVLX] in { defm PSZ128 : avx512_fp_scalef_p, EVEX_V128, EVEX_CD8<32, CD8VF>; defm PSZ256 : avx512_fp_scalef_p, EVEX_V256, EVEX_CD8<32, CD8VF>; defm PDZ128 : avx512_fp_scalef_p, EVEX_V128, VEX_W, EVEX_CD8<64, CD8VF>; defm PDZ256 : avx512_fp_scalef_p, EVEX_V256, VEX_W, EVEX_CD8<64, CD8VF>; } } defm VSCALEF : avx512_fp_scalef_all<0x2C, 0x2D, "vscalef", X86scalef>, T8PD; //===----------------------------------------------------------------------===// // AVX-512 VPTESTM instructions //===----------------------------------------------------------------------===// multiclass avx512_vptest opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { defm rr : AVX512_maskable_cmp, EVEX_4V; let mayLoad = 1 in defm rm : AVX512_maskable_cmp, EVEX_4V, EVEX_CD8<_.EltSize, CD8VF>; } multiclass avx512_vptest_mb opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { let mayLoad = 1 in defm rmb : AVX512_maskable_cmp, EVEX_B, EVEX_4V, EVEX_CD8<_.EltSize, CD8VF>; } multiclass avx512_vptest_dq_sizes opc, string OpcodeStr, SDNode OpNode, AVX512VLVectorVTInfo _> { let Predicates = [HasAVX512] in defm Z : avx512_vptest, avx512_vptest_mb, EVEX_V512; let Predicates = [HasAVX512, HasVLX] in { defm Z256 : avx512_vptest, avx512_vptest_mb, EVEX_V256; defm Z128 : avx512_vptest, avx512_vptest_mb, EVEX_V128; } } multiclass avx512_vptest_dq opc, string OpcodeStr, SDNode OpNode> { defm D : avx512_vptest_dq_sizes; defm Q : avx512_vptest_dq_sizes, VEX_W; } multiclass avx512_vptest_wb opc, string OpcodeStr, SDNode OpNode> { let Predicates = [HasBWI] in { defm WZ: avx512_vptest, EVEX_V512, VEX_W; defm BZ: avx512_vptest, EVEX_V512; } let Predicates = [HasVLX, HasBWI] in { defm WZ256: avx512_vptest, EVEX_V256, VEX_W; defm WZ128: avx512_vptest, EVEX_V128, VEX_W; defm BZ256: avx512_vptest, EVEX_V256; defm BZ128: avx512_vptest, EVEX_V128; } } multiclass avx512_vptest_all_forms opc_wb, bits<8> opc_dq, string OpcodeStr, SDNode OpNode> : avx512_vptest_wb , avx512_vptest_dq; defm VPTESTM : avx512_vptest_all_forms<0x26, 0x27, "vptestm", X86testm>, T8PD; defm VPTESTNM : avx512_vptest_all_forms<0x26, 0x27, "vptestnm", X86testnm>, T8XS; def : Pat <(i16 (int_x86_avx512_mask_ptestm_d_512 (v16i32 VR512:$src1), (v16i32 VR512:$src2), (i16 -1))), (COPY_TO_REGCLASS (VPTESTMDZrr VR512:$src1, VR512:$src2), GR16)>; def : Pat <(i8 (int_x86_avx512_mask_ptestm_q_512 (v8i64 VR512:$src1), (v8i64 VR512:$src2), (i8 -1))), (COPY_TO_REGCLASS (VPTESTMQZrr VR512:$src1, VR512:$src2), GR8)>; //===----------------------------------------------------------------------===// // AVX-512 Shift instructions //===----------------------------------------------------------------------===// multiclass avx512_shift_rmi opc, Format ImmFormR, Format ImmFormM, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { defm ri : AVX512_maskable; let mayLoad = 1 in defm mi : AVX512_maskable; } multiclass avx512_shift_rmbi opc, Format ImmFormM, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { let mayLoad = 1 in defm mbi : AVX512_maskable, EVEX_B; } multiclass avx512_shift_rrm opc, string OpcodeStr, 
SDNode OpNode, ValueType SrcVT, PatFrag bc_frag, X86VectorVTInfo _> { // src2 is always 128-bit defm rr : AVX512_maskable, AVX512BIBase, EVEX_4V; defm rm : AVX512_maskable, AVX512BIBase, EVEX_4V; } multiclass avx512_shift_sizes opc, string OpcodeStr, SDNode OpNode, ValueType SrcVT, PatFrag bc_frag, AVX512VLVectorVTInfo VTInfo, Predicate prd> { let Predicates = [prd] in defm Z : avx512_shift_rrm, EVEX_V512, EVEX_CD8 ; let Predicates = [prd, HasVLX] in { defm Z256 : avx512_shift_rrm, EVEX_V256, EVEX_CD8; defm Z128 : avx512_shift_rrm, EVEX_V128, EVEX_CD8; } } multiclass avx512_shift_types opcd, bits<8> opcq, bits<8> opcw, string OpcodeStr, SDNode OpNode> { defm D : avx512_shift_sizes; defm Q : avx512_shift_sizes, VEX_W; defm W : avx512_shift_sizes; } multiclass avx512_shift_rmi_sizes opc, Format ImmFormR, Format ImmFormM, string OpcodeStr, SDNode OpNode, AVX512VLVectorVTInfo VTInfo> { let Predicates = [HasAVX512] in defm Z: avx512_shift_rmi, avx512_shift_rmbi, EVEX_V512; let Predicates = [HasAVX512, HasVLX] in { defm Z256: avx512_shift_rmi, avx512_shift_rmbi, EVEX_V256; defm Z128: avx512_shift_rmi, avx512_shift_rmbi, EVEX_V128; } } multiclass avx512_shift_rmi_w opcw, Format ImmFormR, Format ImmFormM, string OpcodeStr, SDNode OpNode> { let Predicates = [HasBWI] in defm WZ: avx512_shift_rmi, EVEX_V512; let Predicates = [HasVLX, HasBWI] in { defm WZ256: avx512_shift_rmi, EVEX_V256; defm WZ128: avx512_shift_rmi, EVEX_V128; } } multiclass avx512_shift_rmi_dq opcd, bits<8> opcq, Format ImmFormR, Format ImmFormM, string OpcodeStr, SDNode OpNode> { defm D: avx512_shift_rmi_sizes, EVEX_CD8<32, CD8VF>; defm Q: avx512_shift_rmi_sizes, EVEX_CD8<64, CD8VF>, VEX_W; } defm VPSRL : avx512_shift_rmi_dq<0x72, 0x73, MRM2r, MRM2m, "vpsrl", X86vsrli>, avx512_shift_rmi_w<0x71, MRM2r, MRM2m, "vpsrlw", X86vsrli>, AVX512BIi8Base, EVEX_4V; defm VPSLL : avx512_shift_rmi_dq<0x72, 0x73, MRM6r, MRM6m, "vpsll", X86vshli>, avx512_shift_rmi_w<0x71, MRM6r, MRM6m, "vpsllw", X86vshli>, AVX512BIi8Base, EVEX_4V; defm VPSRA : avx512_shift_rmi_dq<0x72, 0x72, MRM4r, MRM4m, "vpsra", X86vsrai>, avx512_shift_rmi_w<0x71, MRM4r, MRM4m, "vpsraw", X86vsrai>, AVX512BIi8Base, EVEX_4V; defm VPROR : avx512_shift_rmi_dq<0x72, 0x72, MRM0r, MRM0m, "vpror", X86vrotri>, AVX512BIi8Base, EVEX_4V; defm VPROL : avx512_shift_rmi_dq<0x72, 0x72, MRM1r, MRM1m, "vprol", X86vrotli>, AVX512BIi8Base, EVEX_4V; defm VPSLL : avx512_shift_types<0xF2, 0xF3, 0xF1, "vpsll", X86vshl>; defm VPSRA : avx512_shift_types<0xE2, 0xE2, 0xE1, "vpsra", X86vsra>; defm VPSRL : avx512_shift_types<0xD2, 0xD3, 0xD1, "vpsrl", X86vsrl>; //===-------------------------------------------------------------------===// // Variable Bit Shifts //===-------------------------------------------------------------------===// multiclass avx512_var_shift opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { defm rr : AVX512_maskable, AVX5128IBase, EVEX_4V; let mayLoad = 1 in defm rm : AVX512_maskable, AVX5128IBase, EVEX_4V, EVEX_CD8<_.EltSize, CD8VF>; } multiclass avx512_var_shift_mb opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { let mayLoad = 1 in defm rmb : AVX512_maskable, AVX5128IBase, EVEX_B, EVEX_4V, EVEX_CD8<_.EltSize, CD8VF>; } multiclass avx512_var_shift_sizes opc, string OpcodeStr, SDNode OpNode, AVX512VLVectorVTInfo _> { let Predicates = [HasAVX512] in defm Z : avx512_var_shift, avx512_var_shift_mb, EVEX_V512; let Predicates = [HasAVX512, HasVLX] in { defm Z256 : avx512_var_shift, avx512_var_shift_mb, EVEX_V256; defm Z128 : avx512_var_shift, avx512_var_shift_mb, 
EVEX_V128; } } multiclass avx512_var_shift_types opc, string OpcodeStr, SDNode OpNode> { defm D : avx512_var_shift_sizes; defm Q : avx512_var_shift_sizes, VEX_W; } // Use 512bit version to implement 128/256 bit in case NoVLX. multiclass avx512_var_shift_w_lowering { let Predicates = [HasBWI, NoVLX] in { def : Pat<(_.info256.VT (OpNode (_.info256.VT _.info256.RC:$src1), (_.info256.VT _.info256.RC:$src2))), (EXTRACT_SUBREG (!cast(NAME#"WZrr") (INSERT_SUBREG (_.info512.VT (IMPLICIT_DEF)), VR256X:$src1, sub_ymm), (INSERT_SUBREG (_.info512.VT (IMPLICIT_DEF)), VR256X:$src2, sub_ymm)), sub_ymm)>; def : Pat<(_.info128.VT (OpNode (_.info128.VT _.info128.RC:$src1), (_.info128.VT _.info128.RC:$src2))), (EXTRACT_SUBREG (!cast(NAME#"WZrr") (INSERT_SUBREG (_.info512.VT (IMPLICIT_DEF)), VR128X:$src1, sub_xmm), (INSERT_SUBREG (_.info512.VT (IMPLICIT_DEF)), VR128X:$src2, sub_xmm)), sub_xmm)>; } } multiclass avx512_var_shift_w opc, string OpcodeStr, SDNode OpNode> { let Predicates = [HasBWI] in defm WZ: avx512_var_shift, EVEX_V512, VEX_W; let Predicates = [HasVLX, HasBWI] in { defm WZ256: avx512_var_shift, EVEX_V256, VEX_W; defm WZ128: avx512_var_shift, EVEX_V128, VEX_W; } } defm VPSLLV : avx512_var_shift_types<0x47, "vpsllv", shl>, avx512_var_shift_w<0x12, "vpsllvw", shl>, avx512_var_shift_w_lowering; defm VPSRAV : avx512_var_shift_types<0x46, "vpsrav", sra>, avx512_var_shift_w<0x11, "vpsravw", sra>, avx512_var_shift_w_lowering; defm VPSRLV : avx512_var_shift_types<0x45, "vpsrlv", srl>, avx512_var_shift_w<0x10, "vpsrlvw", srl>, avx512_var_shift_w_lowering; defm VPRORV : avx512_var_shift_types<0x14, "vprorv", rotr>; defm VPROLV : avx512_var_shift_types<0x15, "vprolv", rotl>; //===-------------------------------------------------------------------===// // 1-src variable permutation VPERMW/D/Q //===-------------------------------------------------------------------===// multiclass avx512_vperm_dq_sizes opc, string OpcodeStr, SDNode OpNode, AVX512VLVectorVTInfo _> { let Predicates = [HasAVX512] in defm Z : avx512_var_shift, avx512_var_shift_mb, EVEX_V512; let Predicates = [HasAVX512, HasVLX] in defm Z256 : avx512_var_shift, avx512_var_shift_mb, EVEX_V256; } multiclass avx512_vpermi_dq_sizes opc, Format ImmFormR, Format ImmFormM, string OpcodeStr, SDNode OpNode, AVX512VLVectorVTInfo VTInfo> { let Predicates = [HasAVX512] in defm Z: avx512_shift_rmi, avx512_shift_rmbi, EVEX_V512; let Predicates = [HasAVX512, HasVLX] in defm Z256: avx512_shift_rmi, avx512_shift_rmbi, EVEX_V256; } defm VPERM : avx512_var_shift_w<0x8D, "vpermw", X86VPermv>; defm VPERMD : avx512_vperm_dq_sizes<0x36, "vpermd", X86VPermv, avx512vl_i32_info>; defm VPERMQ : avx512_vperm_dq_sizes<0x36, "vpermq", X86VPermv, avx512vl_i64_info>, VEX_W; defm VPERMPS : avx512_vperm_dq_sizes<0x16, "vpermps", X86VPermv, avx512vl_f32_info>; defm VPERMPD : avx512_vperm_dq_sizes<0x16, "vpermpd", X86VPermv, avx512vl_f64_info>, VEX_W; defm VPERMQ : avx512_vpermi_dq_sizes<0x00, MRMSrcReg, MRMSrcMem, "vpermq", X86VPermi, avx512vl_i64_info>, EVEX, AVX512AIi8Base, EVEX_CD8<64, CD8VF>, VEX_W; defm VPERMPD : avx512_vpermi_dq_sizes<0x01, MRMSrcReg, MRMSrcMem, "vpermpd", X86VPermi, avx512vl_f64_info>, EVEX, AVX512AIi8Base, EVEX_CD8<64, CD8VF>, VEX_W; //===----------------------------------------------------------------------===// // AVX-512 - VPERMIL //===----------------------------------------------------------------------===// multiclass avx512_permil_vec OpcVar, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _, X86VectorVTInfo Ctrl> { defm rr: AVX512_maskable, T8PD, 
EVEX_4V;
  let mayLoad = 1 in {
  defm rm: AVX512_maskable, T8PD, EVEX_4V, EVEX_CD8<_.EltSize, CD8VF>;
  defm rmb: AVX512_maskable, T8PD, EVEX_4V, EVEX_B, EVEX_CD8<_.EltSize, CD8VF>;
  }//let mayLoad = 1
}

multiclass avx512_permil_vec_common OpcVar, AVX512VLVectorVTInfo _,
                                    AVX512VLVectorVTInfo Ctrl>{
  let Predicates = [HasAVX512] in {
    defm Z : avx512_permil_vec, EVEX_V512;
  }
  let Predicates = [HasAVX512, HasVLX] in {
    defm Z128 : avx512_permil_vec, EVEX_V128;
    defm Z256 : avx512_permil_vec, EVEX_V256;
  }
}

multiclass avx512_permil OpcImm, bits<8> OpcVar, AVX512VLVectorVTInfo _,
                         AVX512VLVectorVTInfo Ctrl>{
  defm NAME: avx512_permil_vec_common;
  defm NAME: avx512_shift_rmi_sizes, EVEX, AVX512AIi8Base,
             EVEX_CD8<_.info128.EltSize, CD8VF>;
}

defm VPERMILPS : avx512_permil<"vpermilps", 0x04, 0x0C, avx512vl_f32_info,
                               avx512vl_i32_info>;
defm VPERMILPD : avx512_permil<"vpermilpd", 0x05, 0x0D, avx512vl_f64_info,
                               avx512vl_i64_info>, VEX_W;

//===----------------------------------------------------------------------===//
// AVX-512 - VPSHUFD, VPSHUFLW, VPSHUFHW
//===----------------------------------------------------------------------===//
defm VPSHUFD : avx512_shift_rmi_sizes<0x70, MRMSrcReg, MRMSrcMem, "vpshufd",
                                      X86PShufd, avx512vl_i32_info>,
               EVEX, AVX512BIi8Base, EVEX_CD8<32, CD8VF>;
defm VPSHUFH : avx512_shift_rmi_w<0x70, MRMSrcReg, MRMSrcMem, "vpshufhw",
                                  X86PShufhw>, EVEX, AVX512XSIi8Base;
defm VPSHUFL : avx512_shift_rmi_w<0x70, MRMSrcReg, MRMSrcMem, "vpshuflw",
                                  X86PShuflw>, EVEX, AVX512XDIi8Base;

multiclass avx512_pshufb_sizes opc, string OpcodeStr, SDNode OpNode> {
  let Predicates = [HasBWI] in
    defm Z: avx512_var_shift, EVEX_V512;
  let Predicates = [HasVLX, HasBWI] in {
    defm Z256: avx512_var_shift, EVEX_V256;
    defm Z128: avx512_var_shift, EVEX_V128;
  }
}

defm VPSHUFB: avx512_pshufb_sizes<0x00, "vpshufb", X86pshufb>;

//===----------------------------------------------------------------------===//
// Move Low to High and High to Low packed FP Instructions
//===----------------------------------------------------------------------===//
def VMOVLHPSZrr : AVX512PSI<0x16, MRMSrcReg, (outs VR128X:$dst),
          (ins VR128X:$src1, VR128X:$src2),
          "vmovlhps\t{$src2, $src1, $dst|$dst, $src1, $src2}",
          [(set VR128X:$dst, (v4f32 (X86Movlhps VR128X:$src1, VR128X:$src2)))],
          IIC_SSE_MOV_LH>, EVEX_4V;
def VMOVHLPSZrr : AVX512PSI<0x12, MRMSrcReg, (outs VR128X:$dst),
          (ins VR128X:$src1, VR128X:$src2),
          "vmovhlps\t{$src2, $src1, $dst|$dst, $src1, $src2}",
          [(set VR128X:$dst, (v4f32 (X86Movhlps VR128X:$src1, VR128X:$src2)))],
          IIC_SSE_MOV_LH>, EVEX_4V;

let Predicates = [HasAVX512] in {
  // MOVLHPS patterns
  def : Pat<(v4i32 (X86Movlhps VR128X:$src1, VR128X:$src2)),
            (VMOVLHPSZrr VR128X:$src1, VR128X:$src2)>;
  def : Pat<(v2i64 (X86Movlhps VR128X:$src1, VR128X:$src2)),
            (VMOVLHPSZrr (v2i64 VR128X:$src1), VR128X:$src2)>;
  // MOVHLPS patterns
  def : Pat<(v4i32 (X86Movhlps VR128X:$src1, VR128X:$src2)),
            (VMOVHLPSZrr VR128X:$src1, VR128X:$src2)>;
}

//===----------------------------------------------------------------------===//
// VMOVHPS/PD VMOVLPS Instructions
// All patterns were taken from the SSE implementation.
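// Note: these instructions always move exactly 64 bits (two packed f32
// elements or one f64), which is why the store forms below use f64mem
// even for the PS variants.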
//===----------------------------------------------------------------------===// multiclass avx512_mov_hilo_packed opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { let mayLoad = 1 in def rm : AVX512, EVEX_4V; } defm VMOVHPSZ128 : avx512_mov_hilo_packed<0x16, "vmovhps", X86Movlhps, v4f32x_info>, EVEX_CD8<32, CD8VT2>, PS; defm VMOVHPDZ128 : avx512_mov_hilo_packed<0x16, "vmovhpd", X86Movlhpd, v2f64x_info>, EVEX_CD8<64, CD8VT1>, PD, VEX_W; defm VMOVLPSZ128 : avx512_mov_hilo_packed<0x12, "vmovlps", X86Movlps, v4f32x_info>, EVEX_CD8<32, CD8VT2>, PS; defm VMOVLPDZ128 : avx512_mov_hilo_packed<0x12, "vmovlpd", X86Movlpd, v2f64x_info>, EVEX_CD8<64, CD8VT1>, PD, VEX_W; let Predicates = [HasAVX512] in { // VMOVHPS patterns def : Pat<(X86Movlhps VR128X:$src1, (bc_v4f32 (v2i64 (scalar_to_vector (loadi64 addr:$src2))))), (VMOVHPSZ128rm VR128X:$src1, addr:$src2)>; def : Pat<(X86Movlhps VR128X:$src1, (bc_v4i32 (v2i64 (X86vzload addr:$src2)))), (VMOVHPSZ128rm VR128X:$src1, addr:$src2)>; // VMOVHPD patterns def : Pat<(v2f64 (X86Unpckl VR128X:$src1, (scalar_to_vector (loadf64 addr:$src2)))), (VMOVHPDZ128rm VR128X:$src1, addr:$src2)>; def : Pat<(v2f64 (X86Unpckl VR128X:$src1, (bc_v2f64 (v2i64 (scalar_to_vector (loadi64 addr:$src2)))))), (VMOVHPDZ128rm VR128X:$src1, addr:$src2)>; // VMOVLPS patterns def : Pat<(v4f32 (X86Movlps VR128X:$src1, (load addr:$src2))), (VMOVLPSZ128rm VR128X:$src1, addr:$src2)>; def : Pat<(v4i32 (X86Movlps VR128X:$src1, (load addr:$src2))), (VMOVLPSZ128rm VR128X:$src1, addr:$src2)>; // VMOVLPD patterns def : Pat<(v2f64 (X86Movlpd VR128X:$src1, (load addr:$src2))), (VMOVLPDZ128rm VR128X:$src1, addr:$src2)>; def : Pat<(v2i64 (X86Movlpd VR128X:$src1, (load addr:$src2))), (VMOVLPDZ128rm VR128X:$src1, addr:$src2)>; def : Pat<(v2f64 (X86Movsd VR128X:$src1, (v2f64 (scalar_to_vector (loadf64 addr:$src2))))), (VMOVLPDZ128rm VR128X:$src1, addr:$src2)>; } let mayStore = 1 in { def VMOVHPSZ128mr : AVX512PSI<0x17, MRMDestMem, (outs), (ins f64mem:$dst, VR128X:$src), "vmovhps\t{$src, $dst|$dst, $src}", [(store (f64 (vector_extract (X86Unpckh (bc_v2f64 (v4f32 VR128X:$src)), (bc_v2f64 (v4f32 VR128X:$src))), (iPTR 0))), addr:$dst)], IIC_SSE_MOV_LH>, EVEX, EVEX_CD8<32, CD8VT2>; def VMOVHPDZ128mr : AVX512PDI<0x17, MRMDestMem, (outs), (ins f64mem:$dst, VR128X:$src), "vmovhpd\t{$src, $dst|$dst, $src}", [(store (f64 (vector_extract (v2f64 (X86Unpckh VR128X:$src, VR128X:$src)), (iPTR 0))), addr:$dst)], IIC_SSE_MOV_LH>, EVEX, EVEX_CD8<64, CD8VT1>, VEX_W; def VMOVLPSZ128mr : AVX512PSI<0x13, MRMDestMem, (outs), (ins f64mem:$dst, VR128X:$src), "vmovlps\t{$src, $dst|$dst, $src}", [(store (f64 (vector_extract (bc_v2f64 (v4f32 VR128X:$src)), (iPTR 0))), addr:$dst)], IIC_SSE_MOV_LH>, EVEX, EVEX_CD8<32, CD8VT2>; def VMOVLPDZ128mr : AVX512PDI<0x13, MRMDestMem, (outs), (ins f64mem:$dst, VR128X:$src), "vmovlpd\t{$src, $dst|$dst, $src}", [(store (f64 (vector_extract (v2f64 VR128X:$src), (iPTR 0))), addr:$dst)], IIC_SSE_MOV_LH>, EVEX, EVEX_CD8<64, CD8VT1>, VEX_W; } let Predicates = [HasAVX512] in { // VMOVHPD patterns def : Pat<(store (f64 (vector_extract (v2f64 (X86VPermilpi VR128X:$src, (i8 1))), (iPTR 0))), addr:$dst), (VMOVHPDZ128mr addr:$dst, VR128X:$src)>; // VMOVLPS patterns def : Pat<(store (v4f32 (X86Movlps (load addr:$src1), VR128X:$src2)), addr:$src1), (VMOVLPSZ128mr addr:$src1, VR128X:$src2)>; def : Pat<(store (v4i32 (X86Movlps (bc_v4i32 (loadv2i64 addr:$src1)), VR128X:$src2)), addr:$src1), (VMOVLPSZ128mr addr:$src1, VR128X:$src2)>; // VMOVLPD patterns def : Pat<(store (v2f64 (X86Movlpd (load 
addr:$src1), VR128X:$src2)), addr:$src1), (VMOVLPDZ128mr addr:$src1, VR128X:$src2)>; def : Pat<(store (v2i64 (X86Movlpd (load addr:$src1), VR128X:$src2)), addr:$src1), (VMOVLPDZ128mr addr:$src1, VR128X:$src2)>; } //===----------------------------------------------------------------------===// // FMA - Fused Multiply Operations // let Constraints = "$src1 = $dst" in { multiclass avx512_fma3p_213_rm opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { defm r: AVX512_maskable_3src, AVX512FMA3Base; let mayLoad = 1 in { defm m: AVX512_maskable_3src, AVX512FMA3Base; defm mb: AVX512_maskable_3src, AVX512FMA3Base, EVEX_B; } } multiclass avx512_fma3_213_round opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { defm rb: AVX512_maskable_3src, AVX512FMA3Base, EVEX_B, EVEX_RC; } } // Constraints = "$src1 = $dst" multiclass avx512_fma3p_213_common opc, string OpcodeStr, SDNode OpNode, SDNode OpNodeRnd, AVX512VLVectorVTInfo _> { let Predicates = [HasAVX512] in { defm Z : avx512_fma3p_213_rm, avx512_fma3_213_round, EVEX_V512, EVEX_CD8<_.info512.EltSize, CD8VF>; } let Predicates = [HasVLX, HasAVX512] in { defm Z256 : avx512_fma3p_213_rm, EVEX_V256, EVEX_CD8<_.info256.EltSize, CD8VF>; defm Z128 : avx512_fma3p_213_rm, EVEX_V128, EVEX_CD8<_.info128.EltSize, CD8VF>; } } multiclass avx512_fma3p_213_f opc, string OpcodeStr, SDNode OpNode, SDNode OpNodeRnd > { defm PS : avx512_fma3p_213_common; defm PD : avx512_fma3p_213_common, VEX_W; } defm VFMADD213 : avx512_fma3p_213_f<0xA8, "vfmadd213", X86Fmadd, X86FmaddRnd>; defm VFMSUB213 : avx512_fma3p_213_f<0xAA, "vfmsub213", X86Fmsub, X86FmsubRnd>; defm VFMADDSUB213 : avx512_fma3p_213_f<0xA6, "vfmaddsub213", X86Fmaddsub, X86FmaddsubRnd>; defm VFMSUBADD213 : avx512_fma3p_213_f<0xA7, "vfmsubadd213", X86Fmsubadd, X86FmsubaddRnd>; defm VFNMADD213 : avx512_fma3p_213_f<0xAC, "vfnmadd213", X86Fnmadd, X86FnmaddRnd>; defm VFNMSUB213 : avx512_fma3p_213_f<0xAE, "vfnmsub213", X86Fnmsub, X86FnmsubRnd>; let Constraints = "$src1 = $dst" in { multiclass avx512_fma3p_231_rm opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { defm r: AVX512_maskable_3src, AVX512FMA3Base; let mayLoad = 1 in { defm m: AVX512_maskable_3src, AVX512FMA3Base; defm mb: AVX512_maskable_3src, AVX512FMA3Base, EVEX_B; } } multiclass avx512_fma3_231_round opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { defm rb: AVX512_maskable_3src, AVX512FMA3Base, EVEX_B, EVEX_RC; } } // Constraints = "$src1 = $dst" multiclass avx512_fma3p_231_common opc, string OpcodeStr, SDNode OpNode, SDNode OpNodeRnd, AVX512VLVectorVTInfo _> { let Predicates = [HasAVX512] in { defm Z : avx512_fma3p_231_rm, avx512_fma3_231_round, EVEX_V512, EVEX_CD8<_.info512.EltSize, CD8VF>; } let Predicates = [HasVLX, HasAVX512] in { defm Z256 : avx512_fma3p_231_rm, EVEX_V256, EVEX_CD8<_.info256.EltSize, CD8VF>; defm Z128 : avx512_fma3p_231_rm, EVEX_V128, EVEX_CD8<_.info128.EltSize, CD8VF>; } } multiclass avx512_fma3p_231_f opc, string OpcodeStr, SDNode OpNode, SDNode OpNodeRnd > { defm PS : avx512_fma3p_231_common; defm PD : avx512_fma3p_231_common, VEX_W; } defm VFMADD231 : avx512_fma3p_231_f<0xB8, "vfmadd231", X86Fmadd, X86FmaddRnd>; defm VFMSUB231 : avx512_fma3p_231_f<0xBA, "vfmsub231", X86Fmsub, X86FmsubRnd>; defm VFMADDSUB231 : avx512_fma3p_231_f<0xB6, "vfmaddsub231", X86Fmaddsub, X86FmaddsubRnd>; defm VFMSUBADD231 : avx512_fma3p_231_f<0xB7, "vfmsubadd231", X86Fmsubadd, X86FmsubaddRnd>; defm VFNMADD231 : avx512_fma3p_231_f<0xBC, "vfnmadd231", X86Fnmadd, X86FnmaddRnd>; defm VFNMSUB231 : avx512_fma3p_231_f<0xBE, 
"vfnmsub231", X86Fnmsub, X86FnmsubRnd>; let Constraints = "$src1 = $dst" in { multiclass avx512_fma3p_132_rm opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { defm r: AVX512_maskable_3src, AVX512FMA3Base; let mayLoad = 1 in { defm m: AVX512_maskable_3src, AVX512FMA3Base; defm mb: AVX512_maskable_3src, AVX512FMA3Base, EVEX_B; } } multiclass avx512_fma3_132_round opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { defm rb: AVX512_maskable_3src, AVX512FMA3Base, EVEX_B, EVEX_RC; } } // Constraints = "$src1 = $dst" multiclass avx512_fma3p_132_common opc, string OpcodeStr, SDNode OpNode, SDNode OpNodeRnd, AVX512VLVectorVTInfo _> { let Predicates = [HasAVX512] in { defm Z : avx512_fma3p_132_rm, avx512_fma3_132_round, EVEX_V512, EVEX_CD8<_.info512.EltSize, CD8VF>; } let Predicates = [HasVLX, HasAVX512] in { defm Z256 : avx512_fma3p_132_rm, EVEX_V256, EVEX_CD8<_.info256.EltSize, CD8VF>; defm Z128 : avx512_fma3p_132_rm, EVEX_V128, EVEX_CD8<_.info128.EltSize, CD8VF>; } } multiclass avx512_fma3p_132_f opc, string OpcodeStr, SDNode OpNode, SDNode OpNodeRnd > { defm PS : avx512_fma3p_132_common; defm PD : avx512_fma3p_132_common, VEX_W; } defm VFMADD132 : avx512_fma3p_132_f<0x98, "vfmadd132", X86Fmadd, X86FmaddRnd>; defm VFMSUB132 : avx512_fma3p_132_f<0x9A, "vfmsub132", X86Fmsub, X86FmsubRnd>; defm VFMADDSUB132 : avx512_fma3p_132_f<0x96, "vfmaddsub132", X86Fmaddsub, X86FmaddsubRnd>; defm VFMSUBADD132 : avx512_fma3p_132_f<0x97, "vfmsubadd132", X86Fmsubadd, X86FmsubaddRnd>; defm VFNMADD132 : avx512_fma3p_132_f<0x9C, "vfnmadd132", X86Fnmadd, X86FnmaddRnd>; defm VFNMSUB132 : avx512_fma3p_132_f<0x9E, "vfnmsub132", X86Fnmsub, X86FnmsubRnd>; // Scalar FMA let Constraints = "$src1 = $dst" in { multiclass avx512_fma3s_common opc, string OpcodeStr, X86VectorVTInfo _, dag RHS_VEC_r, dag RHS_VEC_m, dag RHS_VEC_rb, dag RHS_r, dag RHS_m > { defm r_Int: AVX512_maskable_3src_scalar, AVX512FMA3Base; let mayLoad = 1 in defm m_Int: AVX512_maskable_3src_scalar, AVX512FMA3Base; defm rb_Int: AVX512_maskable_3src_scalar, AVX512FMA3Base, EVEX_B, EVEX_RC; let isCodeGenOnly = 1 in { def r : AVX512FMA3; let mayLoad = 1 in def m : AVX512FMA3; }// isCodeGenOnly = 1 } }// Constraints = "$src1 = $dst" multiclass avx512_fma3s_all opc213, bits<8> opc231, bits<8> opc132, string OpcodeStr, SDNode OpNode, SDNode OpNodeRnd, X86VectorVTInfo _ , string SUFF> { defm NAME#213#SUFF: avx512_fma3s_common; defm NAME#231#SUFF: avx512_fma3s_common; defm NAME#132#SUFF: avx512_fma3s_common; } multiclass avx512_fma3s opc213, bits<8> opc231, bits<8> opc132, string OpcodeStr, SDNode OpNode, SDNode OpNodeRnd>{ let Predicates = [HasAVX512] in { defm NAME : avx512_fma3s_all, EVEX_CD8<32, CD8VT1>, VEX_LIG; defm NAME : avx512_fma3s_all, EVEX_CD8<64, CD8VT1>, VEX_LIG, VEX_W; } } defm VFMADD : avx512_fma3s<0xA9, 0xB9, 0x99, "vfmadd", X86Fmadd, X86FmaddRnd>; defm VFMSUB : avx512_fma3s<0xAB, 0xBB, 0x9B, "vfmsub", X86Fmsub, X86FmsubRnd>; defm VFNMADD : avx512_fma3s<0xAD, 0xBD, 0x9D, "vfnmadd", X86Fnmadd, X86FnmaddRnd>; defm VFNMSUB : avx512_fma3s<0xAF, 0xBF, 0x9F, "vfnmsub", X86Fnmsub, X86FnmsubRnd>; //===----------------------------------------------------------------------===// // AVX-512 Scalar convert from sign integer to float/double //===----------------------------------------------------------------------===// multiclass avx512_vcvtsi opc, SDNode OpNode, RegisterClass SrcRC, X86VectorVTInfo DstVT, X86MemOperand x86memop, PatFrag ld_frag, string asm> { let hasSideEffects = 0 in { def rr : SI, EVEX_4V; let mayLoad = 1 in def rm : SI, EVEX_4V; 
} // hasSideEffects = 0 let isCodeGenOnly = 1 in { def rr_Int : SI, EVEX_4V; def rm_Int : SI, EVEX_4V; }//isCodeGenOnly = 1 } multiclass avx512_vcvtsi_round opc, SDNode OpNode, RegisterClass SrcRC, X86VectorVTInfo DstVT, string asm> { def rrb_Int : SI, EVEX_4V, EVEX_B, EVEX_RC; } multiclass avx512_vcvtsi_common opc, SDNode OpNode, RegisterClass SrcRC, X86VectorVTInfo DstVT, X86MemOperand x86memop, PatFrag ld_frag, string asm> { defm NAME : avx512_vcvtsi_round, avx512_vcvtsi, VEX_LIG; } let Predicates = [HasAVX512] in { defm VCVTSI2SSZ : avx512_vcvtsi_common<0x2A, X86SintToFpRnd, GR32, v4f32x_info, i32mem, loadi32, "cvtsi2ss{l}">, XS, EVEX_CD8<32, CD8VT1>; defm VCVTSI642SSZ: avx512_vcvtsi_common<0x2A, X86SintToFpRnd, GR64, v4f32x_info, i64mem, loadi64, "cvtsi2ss{q}">, XS, VEX_W, EVEX_CD8<64, CD8VT1>; defm VCVTSI2SDZ : avx512_vcvtsi_common<0x2A, X86SintToFpRnd, GR32, v2f64x_info, i32mem, loadi32, "cvtsi2sd{l}">, XD, EVEX_CD8<32, CD8VT1>; defm VCVTSI642SDZ: avx512_vcvtsi_common<0x2A, X86SintToFpRnd, GR64, v2f64x_info, i64mem, loadi64, "cvtsi2sd{q}">, XD, VEX_W, EVEX_CD8<64, CD8VT1>; def : Pat<(f32 (sint_to_fp (loadi32 addr:$src))), (VCVTSI2SSZrm (f32 (IMPLICIT_DEF)), addr:$src)>; def : Pat<(f32 (sint_to_fp (loadi64 addr:$src))), (VCVTSI642SSZrm (f32 (IMPLICIT_DEF)), addr:$src)>; def : Pat<(f64 (sint_to_fp (loadi32 addr:$src))), (VCVTSI2SDZrm (f64 (IMPLICIT_DEF)), addr:$src)>; def : Pat<(f64 (sint_to_fp (loadi64 addr:$src))), (VCVTSI642SDZrm (f64 (IMPLICIT_DEF)), addr:$src)>; def : Pat<(f32 (sint_to_fp GR32:$src)), (VCVTSI2SSZrr (f32 (IMPLICIT_DEF)), GR32:$src)>; def : Pat<(f32 (sint_to_fp GR64:$src)), (VCVTSI642SSZrr (f32 (IMPLICIT_DEF)), GR64:$src)>; def : Pat<(f64 (sint_to_fp GR32:$src)), (VCVTSI2SDZrr (f64 (IMPLICIT_DEF)), GR32:$src)>; def : Pat<(f64 (sint_to_fp GR64:$src)), (VCVTSI642SDZrr (f64 (IMPLICIT_DEF)), GR64:$src)>; defm VCVTUSI2SSZ : avx512_vcvtsi_common<0x7B, X86UintToFpRnd, GR32, v4f32x_info, i32mem, loadi32, "cvtusi2ss{l}">, XS, EVEX_CD8<32, CD8VT1>; defm VCVTUSI642SSZ : avx512_vcvtsi_common<0x7B, X86UintToFpRnd, GR64, v4f32x_info, i64mem, loadi64, "cvtusi2ss{q}">, XS, VEX_W, EVEX_CD8<64, CD8VT1>; defm VCVTUSI2SDZ : avx512_vcvtsi<0x7B, X86UintToFpRnd, GR32, v2f64x_info, i32mem, loadi32, "cvtusi2sd{l}">, XD, VEX_LIG, EVEX_CD8<32, CD8VT1>; defm VCVTUSI642SDZ : avx512_vcvtsi_common<0x7B, X86UintToFpRnd, GR64, v2f64x_info, i64mem, loadi64, "cvtusi2sd{q}">, XD, VEX_W, EVEX_CD8<64, CD8VT1>; def : Pat<(f32 (uint_to_fp (loadi32 addr:$src))), (VCVTUSI2SSZrm (f32 (IMPLICIT_DEF)), addr:$src)>; def : Pat<(f32 (uint_to_fp (loadi64 addr:$src))), (VCVTUSI642SSZrm (f32 (IMPLICIT_DEF)), addr:$src)>; def : Pat<(f64 (uint_to_fp (loadi32 addr:$src))), (VCVTUSI2SDZrm (f64 (IMPLICIT_DEF)), addr:$src)>; def : Pat<(f64 (uint_to_fp (loadi64 addr:$src))), (VCVTUSI642SDZrm (f64 (IMPLICIT_DEF)), addr:$src)>; def : Pat<(f32 (uint_to_fp GR32:$src)), (VCVTUSI2SSZrr (f32 (IMPLICIT_DEF)), GR32:$src)>; def : Pat<(f32 (uint_to_fp GR64:$src)), (VCVTUSI642SSZrr (f32 (IMPLICIT_DEF)), GR64:$src)>; def : Pat<(f64 (uint_to_fp GR32:$src)), (VCVTUSI2SDZrr (f64 (IMPLICIT_DEF)), GR32:$src)>; def : Pat<(f64 (uint_to_fp GR64:$src)), (VCVTUSI642SDZrr (f64 (IMPLICIT_DEF)), GR64:$src)>; } //===----------------------------------------------------------------------===// // AVX-512 Scalar convert from float/double to integer //===----------------------------------------------------------------------===// multiclass avx512_cvt_s_int_round opc, RegisterClass SrcRC, RegisterClass DstRC, Intrinsic Int, Operand memop, ComplexPattern 
mem_cpat, string asm> { let hasSideEffects = 0, Predicates = [HasAVX512] in { def rr : SI, EVEX, VEX_LIG; def rb : SI, EVEX, VEX_LIG, EVEX_B, EVEX_RC; let mayLoad = 1 in def rm : SI, EVEX, VEX_LIG; } // hasSideEffects = 0, Predicates = [HasAVX512] } // Convert float/double to signed/unsigned int 32/64 defm VCVTSS2SIZ: avx512_cvt_s_int_round<0x2D, VR128X, GR32, int_x86_sse_cvtss2si, ssmem, sse_load_f32, "cvtss2si">, XS, EVEX_CD8<32, CD8VT1>; defm VCVTSS2SI64Z: avx512_cvt_s_int_round<0x2D, VR128X, GR64, int_x86_sse_cvtss2si64, ssmem, sse_load_f32, "cvtss2si">, XS, VEX_W, EVEX_CD8<32, CD8VT1>; defm VCVTSS2USIZ: avx512_cvt_s_int_round<0x79, VR128X, GR32, int_x86_avx512_cvtss2usi, ssmem, sse_load_f32, "cvtss2usi">, XS, EVEX_CD8<32, CD8VT1>; defm VCVTSS2USI64Z: avx512_cvt_s_int_round<0x79, VR128X, GR64, int_x86_avx512_cvtss2usi64, ssmem, sse_load_f32, "cvtss2usi">, XS, VEX_W, EVEX_CD8<32, CD8VT1>; defm VCVTSD2SIZ: avx512_cvt_s_int_round<0x2D, VR128X, GR32, int_x86_sse2_cvtsd2si, sdmem, sse_load_f64, "cvtsd2si">, XD, EVEX_CD8<64, CD8VT1>; defm VCVTSD2SI64Z: avx512_cvt_s_int_round<0x2D, VR128X, GR64, int_x86_sse2_cvtsd2si64, sdmem, sse_load_f64, "cvtsd2si">, XD, VEX_W, EVEX_CD8<64, CD8VT1>; defm VCVTSD2USIZ: avx512_cvt_s_int_round<0x79, VR128X, GR32, int_x86_avx512_cvtsd2usi, sdmem, sse_load_f64, "cvtsd2usi">, XD, EVEX_CD8<64, CD8VT1>; defm VCVTSD2USI64Z: avx512_cvt_s_int_round<0x79, VR128X, GR64, int_x86_avx512_cvtsd2usi64, sdmem, sse_load_f64, "cvtsd2usi">, XD, VEX_W, EVEX_CD8<64, CD8VT1>; let isCodeGenOnly = 1 , Predicates = [HasAVX512] in { defm Int_VCVTSI2SSZ : sse12_cvt_sint_3addr<0x2A, GR32, VR128X, int_x86_sse_cvtsi2ss, i32mem, loadi32, "cvtsi2ss{l}", SSE_CVT_Scalar, 0>, XS, EVEX_4V; defm Int_VCVTSI2SS64Z : sse12_cvt_sint_3addr<0x2A, GR64, VR128X, int_x86_sse_cvtsi642ss, i64mem, loadi64, "cvtsi2ss{q}", SSE_CVT_Scalar, 0>, XS, EVEX_4V, VEX_W; defm Int_VCVTSI2SDZ : sse12_cvt_sint_3addr<0x2A, GR32, VR128X, int_x86_sse2_cvtsi2sd, i32mem, loadi32, "cvtsi2sd{l}", SSE_CVT_Scalar, 0>, XD, EVEX_4V; defm Int_VCVTSI2SD64Z : sse12_cvt_sint_3addr<0x2A, GR64, VR128X, int_x86_sse2_cvtsi642sd, i64mem, loadi64, "cvtsi2sd{q}", SSE_CVT_Scalar, 0>, XD, EVEX_4V, VEX_W; defm Int_VCVTUSI2SDZ : sse12_cvt_sint_3addr<0x2A, GR32, VR128X, int_x86_avx512_cvtusi2sd, i32mem, loadi32, "cvtusi2sd{l}", SSE_CVT_Scalar, 0>, XD, EVEX_4V; } // isCodeGenOnly = 1, Predicates = [HasAVX512] // Convert float/double to signed/unsigned int 32/64 with truncation multiclass avx512_cvt_s_all opc, string asm, X86VectorVTInfo _SrcRC, X86VectorVTInfo _DstRC, SDNode OpNode, SDNode OpNodeRnd>{ let Predicates = [HasAVX512] in { def rr : SI, EVEX; def rb : SI, EVEX, EVEX_B; def rm : SI, EVEX; let isCodeGenOnly = 1,hasSideEffects = 0 in { def rr_Int : SI, EVEX, VEX_LIG; def rb_Int : SI, EVEX,VEX_LIG , EVEX_B; let mayLoad = 1 in def rm_Int : SI, EVEX, VEX_LIG; } // isCodeGenOnly = 1, hasSideEffects = 0 } //HasAVX512 } defm VCVTTSS2SIZ: avx512_cvt_s_all<0x2C, "cvttss2si", f32x_info, i32x_info, fp_to_sint,X86cvttss2IntRnd>, XS, EVEX_CD8<32, CD8VT1>; defm VCVTTSS2SI64Z: avx512_cvt_s_all<0x2C, "cvttss2si", f32x_info, i64x_info, fp_to_sint,X86cvttss2IntRnd>, VEX_W, XS, EVEX_CD8<32, CD8VT1>; defm VCVTTSD2SIZ: avx512_cvt_s_all<0x2C, "cvttsd2si", f64x_info, i32x_info, fp_to_sint,X86cvttsd2IntRnd>, XD, EVEX_CD8<64, CD8VT1>; defm VCVTTSD2SI64Z: avx512_cvt_s_all<0x2C, "cvttsd2si", f64x_info, i64x_info, fp_to_sint,X86cvttsd2IntRnd>, VEX_W, XD, EVEX_CD8<64, CD8VT1>; defm VCVTTSS2USIZ: avx512_cvt_s_all<0x78, "cvttss2usi", f32x_info, i32x_info, 
fp_to_uint,X86cvttss2UIntRnd>, XS, EVEX_CD8<32, CD8VT1>;
defm VCVTTSS2USI64Z: avx512_cvt_s_all<0x78, "cvttss2usi", f32x_info, i64x_info,
                           fp_to_uint,X86cvttss2UIntRnd>, XS,VEX_W,
                           EVEX_CD8<32, CD8VT1>;
defm VCVTTSD2USIZ: avx512_cvt_s_all<0x78, "cvttsd2usi", f64x_info, i32x_info,
                           fp_to_uint,X86cvttsd2UIntRnd>, XD,
                           EVEX_CD8<64, CD8VT1>;
defm VCVTTSD2USI64Z: avx512_cvt_s_all<0x78, "cvttsd2usi", f64x_info, i64x_info,
                           fp_to_uint,X86cvttsd2UIntRnd>, XD, VEX_W,
                           EVEX_CD8<64, CD8VT1>;

let Predicates = [HasAVX512] in {
  def : Pat<(i32 (int_x86_sse_cvttss2si (v4f32 VR128X:$src))),
            (VCVTTSS2SIZrr_Int (COPY_TO_REGCLASS VR128X:$src, FR32X))>;
  def : Pat<(i64 (int_x86_sse_cvttss2si64 (v4f32 VR128X:$src))),
            (VCVTTSS2SI64Zrr_Int (COPY_TO_REGCLASS VR128X:$src, FR32X))>;
  def : Pat<(i32 (int_x86_sse2_cvttsd2si (v2f64 VR128X:$src))),
            (VCVTTSD2SIZrr_Int (COPY_TO_REGCLASS VR128X:$src, FR64X))>;
  def : Pat<(i64 (int_x86_sse2_cvttsd2si64 (v2f64 VR128X:$src))),
            (VCVTTSD2SI64Zrr_Int (COPY_TO_REGCLASS VR128X:$src, FR64X))>;
} // HasAVX512

//===----------------------------------------------------------------------===//
// AVX-512 Convert from float to double and back
//===----------------------------------------------------------------------===//
multiclass avx512_cvt_fp_scalar opc, string OpcodeStr, X86VectorVTInfo _,
                                X86VectorVTInfo _Src, SDNode OpNode> {
  defm rr : AVX512_maskable_scalar, EVEX_4V, VEX_LIG, Sched<[WriteCvtF2F]>;
  defm rm : AVX512_maskable_scalar, EVEX_4V, VEX_LIG,
            Sched<[WriteCvtF2FLd, ReadAfterLd]>;
}

// Scalar Conversion with SAE - suppress all exceptions
multiclass avx512_cvt_fp_sae_scalar opc, string OpcodeStr, X86VectorVTInfo _,
                                    X86VectorVTInfo _Src, SDNode OpNodeRnd> {
  defm rrb : AVX512_maskable_scalar, EVEX_4V, VEX_LIG, EVEX_B;
}

// Scalar Conversion with rounding control (RC)
multiclass avx512_cvt_fp_rc_scalar opc, string OpcodeStr, X86VectorVTInfo _,
                                   X86VectorVTInfo _Src, SDNode OpNodeRnd> {
  defm rrb : AVX512_maskable_scalar, EVEX_4V, VEX_LIG,
             Sched<[WriteCvtF2FLd, ReadAfterLd]>, EVEX_B, EVEX_RC;
}

multiclass avx512_cvt_fp_scalar_sd2ss opc, string OpcodeStr, SDNode OpNode,
                                      SDNode OpNodeRnd, X86VectorVTInfo _src,
                                      X86VectorVTInfo _dst> {
  let Predicates = [HasAVX512] in {
    defm Z : avx512_cvt_fp_scalar, avx512_cvt_fp_rc_scalar, VEX_W,
             EVEX_CD8<64, CD8VT1>, EVEX_V512, XD;
  }
}

multiclass avx512_cvt_fp_scalar_ss2sd opc, string OpcodeStr, SDNode OpNode,
                                      SDNode OpNodeRnd, X86VectorVTInfo _src,
                                      X86VectorVTInfo _dst> {
  let Predicates = [HasAVX512] in {
    defm Z : avx512_cvt_fp_scalar, avx512_cvt_fp_sae_scalar,
             EVEX_CD8<32, CD8VT1>, XS, EVEX_V512;
  }
}
defm VCVTSD2SS : avx512_cvt_fp_scalar_sd2ss<0x5A, "vcvtsd2ss", X86fround,
                                            X86froundRnd, f64x_info, f32x_info>;
defm VCVTSS2SD : avx512_cvt_fp_scalar_ss2sd<0x5A, "vcvtss2sd", X86fpext,
                                            X86fpextRnd,f32x_info, f64x_info >;

def : Pat<(f64 (fextend FR32X:$src)),
          (COPY_TO_REGCLASS (VCVTSS2SDZrr (COPY_TO_REGCLASS FR32X:$src, VR128X),
                            (COPY_TO_REGCLASS FR32X:$src, VR128X)), VR128X)>,
          Requires<[HasAVX512]>;
def : Pat<(f64 (fextend (loadf32 addr:$src))),
          (COPY_TO_REGCLASS (VCVTSS2SDZrm (v4f32 (IMPLICIT_DEF)), addr:$src), VR128X)>,
          Requires<[HasAVX512]>;

def : Pat<(f64 (extloadf32 addr:$src)),
          (COPY_TO_REGCLASS (VCVTSS2SDZrm (v4f32 (IMPLICIT_DEF)), addr:$src), VR128X)>,
          Requires<[HasAVX512, OptForSize]>;

def : Pat<(f64 (extloadf32 addr:$src)),
          (COPY_TO_REGCLASS (VCVTSS2SDZrr (v4f32 (IMPLICIT_DEF)),
                            (COPY_TO_REGCLASS (VMOVSSZrm addr:$src), VR128X)), VR128X)>,
          Requires<[HasAVX512, OptForSpeed]>;

def : Pat<(f32 (fround FR64X:$src)),
          (COPY_TO_REGCLASS (VCVTSD2SSZrr
          (COPY_TO_REGCLASS FR64X:$src, VR128X),
          (COPY_TO_REGCLASS FR64X:$src, VR128X)), VR128X)>,
          Requires<[HasAVX512]>;

//===----------------------------------------------------------------------===//
// AVX-512 Vector convert from signed/unsigned integer to float/double
// and from float/double to signed/unsigned integer
//===----------------------------------------------------------------------===//

multiclass avx512_vcvt_fp opc, string OpcodeStr, X86VectorVTInfo _,
                          X86VectorVTInfo _Src, SDNode OpNode,
                          string Broadcast = _.BroadcastStr,
                          string Alias = ""> {
  defm rr : AVX512_maskable, EVEX;
  defm rm : AVX512_maskable, EVEX;
  defm rmb : AVX512_maskable, EVEX, EVEX_B;
}

// Conversion with SAE - suppress all exceptions
multiclass avx512_vcvt_fp_sae opc, string OpcodeStr, X86VectorVTInfo _,
                              X86VectorVTInfo _Src, SDNode OpNodeRnd> {
  defm rrb : AVX512_maskable, EVEX, EVEX_B;
}

// Conversion with rounding control (RC)
multiclass avx512_vcvt_fp_rc opc, string OpcodeStr, X86VectorVTInfo _,
                             X86VectorVTInfo _Src, SDNode OpNodeRnd> {
  defm rrb : AVX512_maskable, EVEX, EVEX_B, EVEX_RC;
}

// Extend Float to Double
multiclass avx512_cvtps2pd opc, string OpcodeStr> {
  let Predicates = [HasAVX512] in {
    defm Z : avx512_vcvt_fp, avx512_vcvt_fp_sae, EVEX_V512;
  }
  let Predicates = [HasVLX] in {
    defm Z128 : avx512_vcvt_fp, EVEX_V128;
    defm Z256 : avx512_vcvt_fp, EVEX_V256;
  }
}

// Truncate Double to Float
multiclass avx512_cvtpd2ps opc, string OpcodeStr> {
  let Predicates = [HasAVX512] in {
    defm Z : avx512_vcvt_fp, avx512_vcvt_fp_rc, EVEX_V512;
  }
  let Predicates = [HasVLX] in {
    defm Z128 : avx512_vcvt_fp, EVEX_V128;
    defm Z256 : avx512_vcvt_fp, EVEX_V256;
  }
}

defm VCVTPD2PS : avx512_cvtpd2ps<0x5A, "vcvtpd2ps">, VEX_W, PD, EVEX_CD8<64, CD8VF>;
defm VCVTPS2PD : avx512_cvtps2pd<0x5A, "vcvtps2pd">, PS, EVEX_CD8<32, CD8VH>;

def : Pat<(v8f64 (extloadv8f32 addr:$src)), (VCVTPS2PDZrm addr:$src)>;
let Predicates = [HasVLX] in {
  def : Pat<(v4f64 (extloadv4f32 addr:$src)), (VCVTPS2PDZ256rm addr:$src)>;
}

// Convert Signed/Unsigned Doubleword to Double
multiclass avx512_cvtdq2pd opc, string OpcodeStr, SDNode OpNode,
                           SDNode OpNode128> {
  // No rounding in this op
  let Predicates = [HasAVX512] in
    defm Z : avx512_vcvt_fp, EVEX_V512;
  let Predicates = [HasVLX] in {
    defm Z128 : avx512_vcvt_fp, EVEX_V128;
    defm Z256 : avx512_vcvt_fp, EVEX_V256;
  }
}

// Convert Signed/Unsigned Doubleword to Float
multiclass avx512_cvtdq2ps opc, string OpcodeStr, SDNode OpNode,
                           SDNode OpNodeRnd> {
  let Predicates = [HasAVX512] in
    defm Z : avx512_vcvt_fp, avx512_vcvt_fp_rc, EVEX_V512;
  let Predicates = [HasVLX] in {
    defm Z128 : avx512_vcvt_fp, EVEX_V128;
    defm Z256 : avx512_vcvt_fp, EVEX_V256;
  }
}

// Convert Float to Signed/Unsigned Doubleword with truncation
multiclass avx512_cvttps2dq opc, string OpcodeStr, SDNode OpNode,
                            SDNode OpNodeRnd> {
  let Predicates = [HasAVX512] in {
    defm Z : avx512_vcvt_fp, avx512_vcvt_fp_sae, EVEX_V512;
  }
  let Predicates = [HasVLX] in {
    defm Z128 : avx512_vcvt_fp, EVEX_V128;
    defm Z256 : avx512_vcvt_fp, EVEX_V256;
  }
}

// Convert Float to Signed/Unsigned Doubleword
multiclass avx512_cvtps2dq opc, string OpcodeStr, SDNode OpNode,
                           SDNode OpNodeRnd> {
  let Predicates = [HasAVX512] in {
    defm Z : avx512_vcvt_fp, avx512_vcvt_fp_rc, EVEX_V512;
  }
  let Predicates = [HasVLX] in {
    defm Z128 : avx512_vcvt_fp, EVEX_V128;
    defm Z256 : avx512_vcvt_fp, EVEX_V256;
  }
}

// Convert Double to Signed/Unsigned Doubleword with truncation
multiclass avx512_cvttpd2dq opc, string OpcodeStr, SDNode OpNode,
                            SDNode OpNodeRnd> {
  let Predicates = [HasAVX512] in {
    defm Z :
             avx512_vcvt_fp, avx512_vcvt_fp_sae, EVEX_V512;
  }
  let Predicates = [HasVLX] in {
    // We need "x"/"y" suffixes in order to distinguish between the 128- and
    // 256-bit memory forms of these instructions in the Asm Parser. They have
    // the same dest type - 'v4i32x_info'. We also specify the broadcast string
    // explicitly due to the same reason.
    defm Z128 : avx512_vcvt_fp, EVEX_V128;
    defm Z256 : avx512_vcvt_fp, EVEX_V256;
  }
}

// Convert Double to Signed/Unsigned Doubleword
multiclass avx512_cvtpd2dq opc, string OpcodeStr, SDNode OpNode,
                           SDNode OpNodeRnd> {
  let Predicates = [HasAVX512] in {
    defm Z : avx512_vcvt_fp, avx512_vcvt_fp_rc, EVEX_V512;
  }
  let Predicates = [HasVLX] in {
    // We need "x"/"y" suffixes in order to distinguish between the 128- and
    // 256-bit memory forms of these instructions in the Asm Parser. They have
    // the same dest type - 'v4i32x_info'. We also specify the broadcast string
    // explicitly due to the same reason.
    defm Z128 : avx512_vcvt_fp, EVEX_V128;
    defm Z256 : avx512_vcvt_fp, EVEX_V256;
  }
}

// Convert Double to Signed/Unsigned Quadword
multiclass avx512_cvtpd2qq opc, string OpcodeStr, SDNode OpNode,
                           SDNode OpNodeRnd> {
  let Predicates = [HasDQI] in {
    defm Z : avx512_vcvt_fp, avx512_vcvt_fp_rc, EVEX_V512;
  }
  let Predicates = [HasDQI, HasVLX] in {
    defm Z128 : avx512_vcvt_fp, EVEX_V128;
    defm Z256 : avx512_vcvt_fp, EVEX_V256;
  }
}

// Convert Double to Signed/Unsigned Quadword with truncation
multiclass avx512_cvttpd2qq opc, string OpcodeStr, SDNode OpNode,
                            SDNode OpNodeRnd> {
  let Predicates = [HasDQI] in {
    defm Z : avx512_vcvt_fp, avx512_vcvt_fp_sae, EVEX_V512;
  }
  let Predicates = [HasDQI, HasVLX] in {
    defm Z128 : avx512_vcvt_fp, EVEX_V128;
    defm Z256 : avx512_vcvt_fp, EVEX_V256;
  }
}

// Convert Signed/Unsigned Quadword to Double
multiclass avx512_cvtqq2pd opc, string OpcodeStr, SDNode OpNode,
                           SDNode OpNodeRnd> {
  let Predicates = [HasDQI] in {
    defm Z : avx512_vcvt_fp, avx512_vcvt_fp_rc, EVEX_V512;
  }
  let Predicates = [HasDQI, HasVLX] in {
    defm Z128 : avx512_vcvt_fp, EVEX_V128;
    defm Z256 : avx512_vcvt_fp, EVEX_V256;
  }
}

// Convert Float to Signed/Unsigned Quadword
multiclass avx512_cvtps2qq opc, string OpcodeStr, SDNode OpNode,
                           SDNode OpNodeRnd> {
  let Predicates = [HasDQI] in {
    defm Z : avx512_vcvt_fp, avx512_vcvt_fp_rc, EVEX_V512;
  }
  let Predicates = [HasDQI, HasVLX] in {
    // Explicitly specified broadcast string, since we take only 2 elements
    // from v4f32x_info source
    defm Z128 : avx512_vcvt_fp, EVEX_V128;
    defm Z256 : avx512_vcvt_fp, EVEX_V256;
  }
}

// Convert Float to Signed/Unsigned Quadword with truncation
multiclass avx512_cvttps2qq opc, string OpcodeStr, SDNode OpNode,
                            SDNode OpNodeRnd> {
  let Predicates = [HasDQI] in {
    defm Z : avx512_vcvt_fp, avx512_vcvt_fp_sae, EVEX_V512;
  }
  let Predicates = [HasDQI, HasVLX] in {
    // Explicitly specified broadcast string, since we take only 2 elements
    // from v4f32x_info source
    defm Z128 : avx512_vcvt_fp, EVEX_V128;
    defm Z256 : avx512_vcvt_fp, EVEX_V256;
  }
}

// Convert Signed/Unsigned Quadword to Float
multiclass avx512_cvtqq2ps opc, string OpcodeStr, SDNode OpNode,
                           SDNode OpNodeRnd> {
  let Predicates = [HasDQI] in {
    defm Z : avx512_vcvt_fp, avx512_vcvt_fp_rc, EVEX_V512;
  }
  let Predicates = [HasDQI, HasVLX] in {
    // We need "x"/"y" suffixes in order to distinguish between the 128- and
    // 256-bit memory forms of these instructions in the Asm Parser. They have
    // the same dest type - 'v4f32x_info'. We also specify the broadcast string
    // explicitly due to the same reason.
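    // For example, with VL enabled both memory forms of vcvtqq2ps write an
    // XMM register, so the assembler uses "vcvtqq2psx" (128-bit source) and
    // "vcvtqq2psy" (256-bit source) to tell them apart.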
defm Z128 : avx512_vcvt_fp, EVEX_V128; defm Z256 : avx512_vcvt_fp, EVEX_V256; } } defm VCVTDQ2PD : avx512_cvtdq2pd<0xE6, "vcvtdq2pd", sint_to_fp, X86cvtdq2pd>, XS, EVEX_CD8<32, CD8VH>; defm VCVTDQ2PS : avx512_cvtdq2ps<0x5B, "vcvtdq2ps", sint_to_fp, X86VSintToFpRnd>, PS, EVEX_CD8<32, CD8VF>; defm VCVTTPS2DQ : avx512_cvttps2dq<0x5B, "vcvttps2dq", fp_to_sint, X86VFpToSintRnd>, XS, EVEX_CD8<32, CD8VF>; defm VCVTTPD2DQ : avx512_cvttpd2dq<0xE6, "vcvttpd2dq", fp_to_sint, X86VFpToSintRnd>, PD, VEX_W, EVEX_CD8<64, CD8VF>; defm VCVTTPS2UDQ : avx512_cvttps2dq<0x78, "vcvttps2udq", fp_to_uint, X86VFpToUintRnd>, PS, EVEX_CD8<32, CD8VF>; defm VCVTTPD2UDQ : avx512_cvttpd2dq<0x78, "vcvttpd2udq", fp_to_uint, X86VFpToUintRnd>, PS, VEX_W, EVEX_CD8<64, CD8VF>; defm VCVTUDQ2PD : avx512_cvtdq2pd<0x7A, "vcvtudq2pd", uint_to_fp, X86cvtudq2pd>, XS, EVEX_CD8<32, CD8VH>; defm VCVTUDQ2PS : avx512_cvtdq2ps<0x7A, "vcvtudq2ps", uint_to_fp, X86VUintToFpRnd>, XD, EVEX_CD8<32, CD8VF>; defm VCVTPS2DQ : avx512_cvtps2dq<0x5B, "vcvtps2dq", X86cvtps2Int, X86cvtps2IntRnd>, PD, EVEX_CD8<32, CD8VF>; defm VCVTPD2DQ : avx512_cvtpd2dq<0xE6, "vcvtpd2dq", X86cvtpd2Int, X86cvtpd2IntRnd>, XD, VEX_W, EVEX_CD8<64, CD8VF>; defm VCVTPS2UDQ : avx512_cvtps2dq<0x79, "vcvtps2udq", X86cvtps2UInt, X86cvtps2UIntRnd>, PS, EVEX_CD8<32, CD8VF>; defm VCVTPD2UDQ : avx512_cvtpd2dq<0x79, "vcvtpd2udq", X86cvtpd2UInt, X86cvtpd2UIntRnd>, VEX_W, PS, EVEX_CD8<64, CD8VF>; defm VCVTPD2QQ : avx512_cvtpd2qq<0x7B, "vcvtpd2qq", X86cvtpd2Int, X86cvtpd2IntRnd>, VEX_W, PD, EVEX_CD8<64, CD8VF>; defm VCVTPS2QQ : avx512_cvtps2qq<0x7B, "vcvtps2qq", X86cvtps2Int, X86cvtps2IntRnd>, PD, EVEX_CD8<32, CD8VH>; defm VCVTPD2UQQ : avx512_cvtpd2qq<0x79, "vcvtpd2uqq", X86cvtpd2UInt, X86cvtpd2UIntRnd>, VEX_W, PD, EVEX_CD8<64, CD8VF>; defm VCVTPS2UQQ : avx512_cvtps2qq<0x79, "vcvtps2uqq", X86cvtps2UInt, X86cvtps2UIntRnd>, PD, EVEX_CD8<32, CD8VH>; defm VCVTTPD2QQ : avx512_cvttpd2qq<0x7A, "vcvttpd2qq", fp_to_sint, X86VFpToSlongRnd>, VEX_W, PD, EVEX_CD8<64, CD8VF>; defm VCVTTPS2QQ : avx512_cvttps2qq<0x7A, "vcvttps2qq", fp_to_sint, X86VFpToSlongRnd>, PD, EVEX_CD8<32, CD8VH>; defm VCVTTPD2UQQ : avx512_cvttpd2qq<0x78, "vcvttpd2uqq", fp_to_uint, X86VFpToUlongRnd>, VEX_W, PD, EVEX_CD8<64, CD8VF>; defm VCVTTPS2UQQ : avx512_cvttps2qq<0x78, "vcvttps2uqq", fp_to_uint, X86VFpToUlongRnd>, PD, EVEX_CD8<32, CD8VH>; defm VCVTQQ2PD : avx512_cvtqq2pd<0xE6, "vcvtqq2pd", sint_to_fp, X86VSlongToFpRnd>, VEX_W, XS, EVEX_CD8<64, CD8VF>; defm VCVTUQQ2PD : avx512_cvtqq2pd<0x7A, "vcvtuqq2pd", uint_to_fp, X86VUlongToFpRnd>, VEX_W, XS, EVEX_CD8<64, CD8VF>; defm VCVTQQ2PS : avx512_cvtqq2ps<0x5B, "vcvtqq2ps", sint_to_fp, X86VSlongToFpRnd>, VEX_W, PS, EVEX_CD8<64, CD8VF>; defm VCVTUQQ2PS : avx512_cvtqq2ps<0x7A, "vcvtuqq2ps", uint_to_fp, X86VUlongToFpRnd>, VEX_W, XD, EVEX_CD8<64, CD8VF>; let Predicates = [HasAVX512, NoVLX] in { def : Pat<(v8i32 (fp_to_uint (v8f32 VR256X:$src1))), (EXTRACT_SUBREG (v16i32 (VCVTTPS2UDQZrr (v16f32 (SUBREG_TO_REG (i32 0), VR256X:$src1, sub_ymm)))), sub_ymm)>; def : Pat<(v4i32 (fp_to_uint (v4f32 VR128X:$src1))), (EXTRACT_SUBREG (v16i32 (VCVTTPS2UDQZrr (v16f32 (SUBREG_TO_REG (i32 0), VR128X:$src1, sub_xmm)))), sub_xmm)>; def : Pat<(v8f32 (uint_to_fp (v8i32 VR256X:$src1))), (EXTRACT_SUBREG (v16f32 (VCVTUDQ2PSZrr (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src1, sub_ymm)))), sub_ymm)>; def : Pat<(v4f32 (uint_to_fp (v4i32 VR128X:$src1))), (EXTRACT_SUBREG (v16f32 (VCVTUDQ2PSZrr (v16i32 (SUBREG_TO_REG (i32 0), VR128X:$src1, sub_xmm)))), sub_xmm)>; def : Pat<(v4f64 (uint_to_fp (v4i32 VR128X:$src1))), 
(EXTRACT_SUBREG (v8f64 (VCVTUDQ2PDZrr
            (v8i32 (SUBREG_TO_REG (i32 0), VR128X:$src1, sub_xmm)))), sub_ymm)>;
}

let Predicates = [HasAVX512] in {
  def : Pat<(v8f32 (fround (loadv8f64 addr:$src))),
            (VCVTPD2PSZrm addr:$src)>;
  def : Pat<(v8f64 (extloadv8f32 addr:$src)),
            (VCVTPS2PDZrm addr:$src)>;
}

//===----------------------------------------------------------------------===//
// Half precision conversion instructions
//===----------------------------------------------------------------------===//
multiclass avx512_cvtph2ps {
  defm rr : AVX512_maskable<0x13, MRMSrcReg, _dest ,(outs _dest.RC:$dst),
                            (ins _src.RC:$src), "vcvtph2ps",
                            "$src", "$src",
                            (X86cvtph2ps (_src.VT _src.RC:$src),
                                         (i32 FROUND_CURRENT))>, T8PD;
  let hasSideEffects = 0, mayLoad = 1 in {
    defm rm : AVX512_maskable<0x13, MRMSrcMem, _dest, (outs _dest.RC:$dst),
                              (ins x86memop:$src), "vcvtph2ps",
                              "$src", "$src",
                              (X86cvtph2ps (_src.VT
                                            (bitconvert (ld_frag addr:$src))),
                                           (i32 FROUND_CURRENT))>, T8PD;
  }
}

multiclass avx512_cvtph2ps_sae {
  defm rb : AVX512_maskable<0x13, MRMSrcReg, _dest ,(outs _dest.RC:$dst),
                            (ins _src.RC:$src), "vcvtph2ps",
                            "{sae}, $src", "$src, {sae}",
                            (X86cvtph2ps (_src.VT _src.RC:$src),
                                         (i32 FROUND_NO_EXC))>, T8PD, EVEX_B;
}

let Predicates = [HasAVX512] in {
  defm VCVTPH2PSZ : avx512_cvtph2ps, avx512_cvtph2ps_sae,
                    EVEX, EVEX_V512, EVEX_CD8<32, CD8VH>;
  let Predicates = [HasVLX] in {
    defm VCVTPH2PSZ256 : avx512_cvtph2ps,EVEX, EVEX_V256, EVEX_CD8<32, CD8VH>;
    defm VCVTPH2PSZ128 : avx512_cvtph2ps, EVEX, EVEX_V128, EVEX_CD8<32, CD8VH>;
  }
}

multiclass avx512_cvtps2ph {
  defm rr : AVX512_maskable<0x1D, MRMDestReg, _dest ,(outs _dest.RC:$dst),
                            (ins _src.RC:$src1, i32u8imm:$src2),
                            "vcvtps2ph", "$src2, $src1", "$src1, $src2",
                            (X86cvtps2ph (_src.VT _src.RC:$src1),
                                         (i32 imm:$src2),
                                         (i32 FROUND_CURRENT))>, AVX512AIi8Base;
  let hasSideEffects = 0, mayStore = 1 in {
    def mr : AVX512AIi8<0x1D, MRMDestMem, (outs),
               (ins x86memop:$dst, _src.RC:$src1, i32u8imm:$src2),
               "vcvtps2ph\t{$src2, $src1, $dst|$dst, $src1, $src2}",
               [(store (_dest.VT (X86cvtps2ph (_src.VT _src.RC:$src1),
                                 (i32 imm:$src2), (i32 FROUND_CURRENT) )),
                        addr:$dst)]>;
    def mrk : AVX512AIi8<0x1D, MRMDestMem, (outs),
               (ins x86memop:$dst, _dest.KRCWM:$mask, _src.RC:$src1, i32u8imm:$src2),
               "vcvtps2ph\t{$src2, $src1, $dst {${mask}}|$dst {${mask}}, $src1, $src2}",
               []>, EVEX_K;
  }
}

multiclass avx512_cvtps2ph_sae {
  defm rb : AVX512_maskable<0x1D, MRMDestReg, _dest ,(outs _dest.RC:$dst),
                            (ins _src.RC:$src1, i32u8imm:$src2),
                            "vcvtps2ph", "$src2, {sae}, $src1", "$src1, $src2, {sae}",
                            (X86cvtps2ph (_src.VT _src.RC:$src1),
                                         (i32 imm:$src2),
                                         (i32 FROUND_NO_EXC))>, EVEX_B, AVX512AIi8Base;
}

let Predicates = [HasAVX512] in {
  defm VCVTPS2PHZ : avx512_cvtps2ph, avx512_cvtps2ph_sae,
                    EVEX, EVEX_V512, EVEX_CD8<32, CD8VH>;
  let Predicates = [HasVLX] in {
    defm VCVTPS2PHZ256 : avx512_cvtps2ph, EVEX, EVEX_V256, EVEX_CD8<32, CD8VH>;
    defm VCVTPS2PHZ128 : avx512_cvtps2ph, EVEX, EVEX_V128, EVEX_CD8<32, CD8VH>;
  }
}

// Unordered/Ordered scalar fp compare with SAE and set EFLAGS
multiclass avx512_ord_cmp_sae opc, X86VectorVTInfo _, SDNode OpNode,
                              string OpcodeStr> {
  def rb: AVX512, EVEX, EVEX_B, VEX_LIG, EVEX_V128, Sched<[WriteFAdd]>;
}

let Defs = [EFLAGS], Predicates = [HasAVX512] in {
  defm VUCOMISSZ : avx512_ord_cmp_sae<0x2E, v4f32x_info, X86ucomiSae, "vucomiss">,
                   AVX512PSIi8Base, EVEX_CD8<32, CD8VT1>;
  defm VUCOMISDZ : avx512_ord_cmp_sae<0x2E, v2f64x_info, X86ucomiSae, "vucomisd">,
                   AVX512PDIi8Base, VEX_W, EVEX_CD8<64, CD8VT1>;
  defm VCOMISSZ : avx512_ord_cmp_sae<0x2F, v4f32x_info, X86comiSae, "vcomiss">,
AVX512PSIi8Base, EVEX_CD8<32, CD8VT1>; defm VCOMISDZ : avx512_ord_cmp_sae<0x2F, v2f64x_info, X86comiSae, "vcomisd">, AVX512PDIi8Base, VEX_W, EVEX_CD8<64, CD8VT1>; } let Defs = [EFLAGS], Predicates = [HasAVX512] in { defm VUCOMISSZ : sse12_ord_cmp<0x2E, FR32X, X86cmp, f32, f32mem, loadf32, "ucomiss">, PS, EVEX, VEX_LIG, EVEX_CD8<32, CD8VT1>; defm VUCOMISDZ : sse12_ord_cmp<0x2E, FR64X, X86cmp, f64, f64mem, loadf64, "ucomisd">, PD, EVEX, VEX_LIG, VEX_W, EVEX_CD8<64, CD8VT1>; let Pattern = [] in { defm VCOMISSZ : sse12_ord_cmp<0x2F, FR32X, undef, f32, f32mem, loadf32, "comiss">, PS, EVEX, VEX_LIG, EVEX_CD8<32, CD8VT1>; defm VCOMISDZ : sse12_ord_cmp<0x2F, FR64X, undef, f64, f64mem, loadf64, "comisd">, PD, EVEX, VEX_LIG, VEX_W, EVEX_CD8<64, CD8VT1>; } let isCodeGenOnly = 1 in { defm Int_VUCOMISSZ : sse12_ord_cmp<0x2E, VR128X, X86ucomi, v4f32, f128mem, load, "ucomiss">, PS, EVEX, VEX_LIG, EVEX_CD8<32, CD8VT1>; defm Int_VUCOMISDZ : sse12_ord_cmp<0x2E, VR128X, X86ucomi, v2f64, f128mem, load, "ucomisd">, PD, EVEX, VEX_LIG, VEX_W, EVEX_CD8<64, CD8VT1>; defm Int_VCOMISSZ : sse12_ord_cmp<0x2F, VR128X, X86comi, v4f32, f128mem, load, "comiss">, PS, EVEX, VEX_LIG, EVEX_CD8<32, CD8VT1>; defm Int_VCOMISDZ : sse12_ord_cmp<0x2F, VR128X, X86comi, v2f64, f128mem, load, "comisd">, PD, EVEX, VEX_LIG, VEX_W, EVEX_CD8<64, CD8VT1>; } } /// avx512_fp14_s rcp14ss, rcp14sd, rsqrt14ss, rsqrt14sd multiclass avx512_fp14_s opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { let hasSideEffects = 0, AddedComplexity = 20 , Predicates = [HasAVX512] in { defm rr : AVX512_maskable_scalar, EVEX_4V; let mayLoad = 1 in { defm rm : AVX512_maskable_scalar, EVEX_4V; } } } defm VRCP14SS : avx512_fp14_s<0x4D, "vrcp14ss", X86frcp14s, f32x_info>, EVEX_CD8<32, CD8VT1>, T8PD; defm VRCP14SD : avx512_fp14_s<0x4D, "vrcp14sd", X86frcp14s, f64x_info>, VEX_W, EVEX_CD8<64, CD8VT1>, T8PD; defm VRSQRT14SS : avx512_fp14_s<0x4F, "vrsqrt14ss", X86frsqrt14s, f32x_info>, EVEX_CD8<32, CD8VT1>, T8PD; defm VRSQRT14SD : avx512_fp14_s<0x4F, "vrsqrt14sd", X86frsqrt14s, f64x_info>, VEX_W, EVEX_CD8<64, CD8VT1>, T8PD; /// avx512_fp14_p rcp14ps, rcp14pd, rsqrt14ps, rsqrt14pd multiclass avx512_fp14_p opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { defm r: AVX512_maskable, EVEX, T8PD; let mayLoad = 1 in { defm m: AVX512_maskable, EVEX, T8PD; defm mb: AVX512_maskable, EVEX, T8PD, EVEX_B; } } multiclass avx512_fp14_p_vl_all opc, string OpcodeStr, SDNode OpNode> { defm PSZ : avx512_fp14_p, EVEX_V512, EVEX_CD8<32, CD8VF>; defm PDZ : avx512_fp14_p, EVEX_V512, VEX_W, EVEX_CD8<64, CD8VF>; // Define only if AVX512VL feature is present. 
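VRCP14PS/VRSQRT14PS above return estimates accurate to roughly 2^-14, which is why callers typically follow them with one Newton-Raphson refinement step; x1 = x0 * (2 - a*x0) roughly doubles the accurate bits. A user-level sketch of that idiom, assuming an AVX-512F toolchain (the intrinsic names come from immintrin.h, not from this file):

#include <immintrin.h>

// Approximate 1.0f/a with VRCP14PS, then refine once:
// x1 = x0 * (2 - a*x0), computed with FNMADD as x0 * fnmadd(a, x0, 2).
static inline __m512 recip_refined(__m512 a) {
  __m512 x0 = _mm512_rcp14_ps(a);                              // ~14 accurate bits
  __m512 t  = _mm512_fnmadd_ps(a, x0, _mm512_set1_ps(2.0f));   // 2 - a*x0
  return _mm512_mul_ps(x0, t);                                 // ~28 accurate bits
}

The HasVLX-guarded 128/256-bit variants of the same multiclass follow.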
let Predicates = [HasVLX] in { defm PSZ128 : avx512_fp14_p, EVEX_V128, EVEX_CD8<32, CD8VF>; defm PSZ256 : avx512_fp14_p, EVEX_V256, EVEX_CD8<32, CD8VF>; defm PDZ128 : avx512_fp14_p, EVEX_V128, VEX_W, EVEX_CD8<64, CD8VF>; defm PDZ256 : avx512_fp14_p, EVEX_V256, VEX_W, EVEX_CD8<64, CD8VF>; } } defm VRSQRT14 : avx512_fp14_p_vl_all<0x4E, "vrsqrt14", X86frsqrt>; defm VRCP14 : avx512_fp14_p_vl_all<0x4C, "vrcp14", X86frcp>; /// avx512_fp28_s rcp28ss, rcp28sd, rsqrt28ss, rsqrt28sd multiclass avx512_fp28_s opc, string OpcodeStr,X86VectorVTInfo _, SDNode OpNode> { defm r : AVX512_maskable_scalar; defm rb : AVX512_maskable_scalar, EVEX_B; defm m : AVX512_maskable_scalar; } multiclass avx512_eri_s opc, string OpcodeStr, SDNode OpNode> { defm SS : avx512_fp28_s, EVEX_CD8<32, CD8VT1>; defm SD : avx512_fp28_s, EVEX_CD8<64, CD8VT1>, VEX_W; } let hasSideEffects = 0, Predicates = [HasERI] in { defm VRCP28 : avx512_eri_s<0xCB, "vrcp28", X86rcp28s>, T8PD, EVEX_4V; defm VRSQRT28 : avx512_eri_s<0xCD, "vrsqrt28", X86rsqrt28s>, T8PD, EVEX_4V; } defm VGETEXP : avx512_eri_s<0x43, "vgetexp", X86fgetexpRnds>, T8PD, EVEX_4V; /// avx512_fp28_p rcp28ps, rcp28pd, rsqrt28ps, rsqrt28pd multiclass avx512_fp28_p opc, string OpcodeStr, X86VectorVTInfo _, SDNode OpNode> { defm r : AVX512_maskable; defm m : AVX512_maskable; defm mb : AVX512_maskable, EVEX_B; } multiclass avx512_fp28_p_round opc, string OpcodeStr, X86VectorVTInfo _, SDNode OpNode> { defm rb : AVX512_maskable, EVEX_B; } multiclass avx512_eri opc, string OpcodeStr, SDNode OpNode> { defm PS : avx512_fp28_p, avx512_fp28_p_round, T8PD, EVEX_V512, EVEX_CD8<32, CD8VF>; defm PD : avx512_fp28_p, avx512_fp28_p_round, T8PD, EVEX_V512, VEX_W, EVEX_CD8<64, CD8VF>; } multiclass avx512_fp_unaryop_packed opc, string OpcodeStr, SDNode OpNode> { // Define only if AVX512VL feature is present. let Predicates = [HasVLX] in { defm PSZ128 : avx512_fp28_p, EVEX_V128, T8PD, EVEX_CD8<32, CD8VF>; defm PSZ256 : avx512_fp28_p, EVEX_V256, T8PD, EVEX_CD8<32, CD8VF>; defm PDZ128 : avx512_fp28_p, EVEX_V128, VEX_W, T8PD, EVEX_CD8<64, CD8VF>; defm PDZ256 : avx512_fp28_p, EVEX_V256, VEX_W, T8PD, EVEX_CD8<64, CD8VF>; } } let Predicates = [HasERI], hasSideEffects = 0 in { defm VRSQRT28 : avx512_eri<0xCC, "vrsqrt28", X86rsqrt28>, EVEX; defm VRCP28 : avx512_eri<0xCA, "vrcp28", X86rcp28>, EVEX; defm VEXP2 : avx512_eri<0xC8, "vexp2", X86exp2>, EVEX; } defm VGETEXP : avx512_eri<0x42, "vgetexp", X86fgetexpRnd>, avx512_fp_unaryop_packed<0x42, "vgetexp", X86fgetexpRnd> , EVEX; multiclass avx512_sqrt_packed_round opc, string OpcodeStr, SDNode OpNodeRnd, X86VectorVTInfo _>{ defm rb: AVX512_maskable, EVEX, EVEX_B, EVEX_RC; } multiclass avx512_sqrt_packed opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _>{ defm r: AVX512_maskable, EVEX; let mayLoad = 1 in { defm m: AVX512_maskable, EVEX; defm mb: AVX512_maskable, EVEX, EVEX_B; } } multiclass avx512_sqrt_packed_all opc, string OpcodeStr, SDNode OpNode> { defm PSZ : avx512_sqrt_packed, EVEX_V512, PS, EVEX_CD8<32, CD8VF>; defm PDZ : avx512_sqrt_packed, EVEX_V512, VEX_W, PD, EVEX_CD8<64, CD8VF>; // Define only if AVX512VL feature is present. 
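The EVEX_RC forms declared by avx512_sqrt_packed_round encode a static rounding mode in the instruction itself, bypassing MXCSR. A sketch of how that surfaces to users, assuming an AVX-512F toolchain; the intrinsic is from immintrin.h:

#include <immintrin.h>

// VSQRTPD with embedded round-toward-zero; _MM_FROUND_NO_EXC is required
// whenever a static rounding override is supplied.
static inline __m512d sqrt_rtz(__m512d v) {
  return _mm512_sqrt_round_pd(v, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
}

The AVX512VL 128/256-bit sqrt variants, which have no rounding-override form (EVEX.b at those widths means broadcast, not rounding control), follow.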
let Predicates = [HasVLX] in { defm PSZ128 : avx512_sqrt_packed, EVEX_V128, PS, EVEX_CD8<32, CD8VF>; defm PSZ256 : avx512_sqrt_packed, EVEX_V256, PS, EVEX_CD8<32, CD8VF>; defm PDZ128 : avx512_sqrt_packed, EVEX_V128, VEX_W, PD, EVEX_CD8<64, CD8VF>; defm PDZ256 : avx512_sqrt_packed, EVEX_V256, VEX_W, PD, EVEX_CD8<64, CD8VF>; } } multiclass avx512_sqrt_packed_all_round opc, string OpcodeStr, SDNode OpNodeRnd> { defm PSZ : avx512_sqrt_packed_round, EVEX_V512, PS, EVEX_CD8<32, CD8VF>; defm PDZ : avx512_sqrt_packed_round, EVEX_V512, VEX_W, PD, EVEX_CD8<64, CD8VF>; } multiclass avx512_sqrt_scalar opc, string OpcodeStr,X86VectorVTInfo _, string SUFF, SDNode OpNode, SDNode OpNodeRnd> { defm r_Int : AVX512_maskable_scalar; let mayLoad = 1 in defm m_Int : AVX512_maskable_scalar; defm rb_Int : AVX512_maskable_scalar, EVEX_B, EVEX_RC; let isCodeGenOnly = 1 in { def r : I; let mayLoad = 1 in def m : I; } def : Pat<(_.EltVT (OpNode _.FRC:$src)), (!cast(NAME#SUFF#Zr) (_.EltVT (IMPLICIT_DEF)), _.FRC:$src)>; def : Pat<(_.EltVT (OpNode (load addr:$src))), (!cast(NAME#SUFF#Zm) - (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[OptForSize]>; + (_.EltVT (IMPLICIT_DEF)), addr:$src)>, Requires<[HasAVX512, OptForSize]>; } multiclass avx512_sqrt_scalar_all opc, string OpcodeStr> { defm SSZ : avx512_sqrt_scalar, EVEX_CD8<32, CD8VT1>, EVEX_4V, XS; defm SDZ : avx512_sqrt_scalar, EVEX_CD8<64, CD8VT1>, EVEX_4V, XD, VEX_W; } defm VSQRT : avx512_sqrt_packed_all<0x51, "vsqrt", fsqrt>, avx512_sqrt_packed_all_round<0x51, "vsqrt", X86fsqrtRnd>; defm VSQRT : avx512_sqrt_scalar_all<0x51, "vsqrt">, VEX_LIG; let Predicates = [HasAVX512] in { def : Pat<(f32 (X86frsqrt FR32X:$src)), (COPY_TO_REGCLASS (VRSQRT14SSrr (v4f32 (IMPLICIT_DEF)), (COPY_TO_REGCLASS FR32X:$src, VR128X)), VR128X)>; def : Pat<(f32 (X86frsqrt (load addr:$src))), (COPY_TO_REGCLASS (VRSQRT14SSrm (v4f32 (IMPLICIT_DEF)), addr:$src), VR128X)>, Requires<[OptForSize]>; def : Pat<(f32 (X86frcp FR32X:$src)), (COPY_TO_REGCLASS (VRCP14SSrr (v4f32 (IMPLICIT_DEF)), (COPY_TO_REGCLASS FR32X:$src, VR128X)), VR128X )>; def : Pat<(f32 (X86frcp (load addr:$src))), (COPY_TO_REGCLASS (VRCP14SSrm (v4f32 (IMPLICIT_DEF)), addr:$src), VR128X)>, Requires<[OptForSize]>; } multiclass avx512_rndscale_scalar opc, string OpcodeStr, X86VectorVTInfo _> { let ExeDomain = _.ExeDomain in { defm r : AVX512_maskable_scalar; defm rb : AVX512_maskable_scalar, EVEX_B; let mayLoad = 1 in defm m : AVX512_maskable_scalar; } let Predicates = [HasAVX512] in { def : Pat<(ffloor _.FRC:$src), (COPY_TO_REGCLASS (_.VT (!cast(NAME##r) (_.VT (IMPLICIT_DEF)), (_.VT (COPY_TO_REGCLASS _.FRC:$src, _.RC)), (i32 0x1))), _.FRC)>; def : Pat<(fceil _.FRC:$src), (COPY_TO_REGCLASS (_.VT (!cast(NAME##r) (_.VT (IMPLICIT_DEF)), (_.VT (COPY_TO_REGCLASS _.FRC:$src, _.RC)), (i32 0x2))), _.FRC)>; def : Pat<(ftrunc _.FRC:$src), (COPY_TO_REGCLASS (_.VT (!cast(NAME##r) (_.VT (IMPLICIT_DEF)), (_.VT (COPY_TO_REGCLASS _.FRC:$src, _.RC)), (i32 0x3))), _.FRC)>; def : Pat<(frint _.FRC:$src), (COPY_TO_REGCLASS (_.VT (!cast(NAME##r) (_.VT (IMPLICIT_DEF)), (_.VT (COPY_TO_REGCLASS _.FRC:$src, _.RC)), (i32 0x4))), _.FRC)>; def : Pat<(fnearbyint _.FRC:$src), (COPY_TO_REGCLASS (_.VT (!cast(NAME##r) (_.VT (IMPLICIT_DEF)), (_.VT (COPY_TO_REGCLASS _.FRC:$src, _.RC)), (i32 0xc))), _.FRC)>; def : Pat<(ffloor (_.ScalarLdFrag addr:$src)), (COPY_TO_REGCLASS (_.VT (!cast(NAME##m) (_.VT (IMPLICIT_DEF)), addr:$src, (i32 0x1))), _.FRC)>; def : Pat<(fceil (_.ScalarLdFrag addr:$src)), (COPY_TO_REGCLASS (_.VT (!cast(NAME##m) (_.VT (IMPLICIT_DEF)), addr:$src, 
(i32 0x2))), _.FRC)>; def : Pat<(ftrunc (_.ScalarLdFrag addr:$src)), (COPY_TO_REGCLASS (_.VT (!cast(NAME##m) (_.VT (IMPLICIT_DEF)), addr:$src, (i32 0x3))), _.FRC)>; def : Pat<(frint (_.ScalarLdFrag addr:$src)), (COPY_TO_REGCLASS (_.VT (!cast(NAME##m) (_.VT (IMPLICIT_DEF)), addr:$src, (i32 0x4))), _.FRC)>; def : Pat<(fnearbyint (_.ScalarLdFrag addr:$src)), (COPY_TO_REGCLASS (_.VT (!cast(NAME##m) (_.VT (IMPLICIT_DEF)), addr:$src, (i32 0xc))), _.FRC)>; } } defm VRNDSCALESS : avx512_rndscale_scalar<0x0A, "vrndscaless", f32x_info>, AVX512AIi8Base, EVEX_4V, EVEX_CD8<32, CD8VT1>; defm VRNDSCALESD : avx512_rndscale_scalar<0x0B, "vrndscalesd", f64x_info>, VEX_W, AVX512AIi8Base, EVEX_4V, EVEX_CD8<64, CD8VT1>; //------------------------------------------------- // Integer truncate and extend operations //------------------------------------------------- multiclass avx512_trunc_common opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo SrcInfo, X86VectorVTInfo DestInfo, X86MemOperand x86memop> { defm rr : AVX512_maskable, EVEX, T8XS; // for intrinsic patter match def : Pat<(DestInfo.VT (X86select DestInfo.KRCWM:$mask, (DestInfo.VT (OpNode (SrcInfo.VT SrcInfo.RC:$src1))), undef)), (!cast(NAME#SrcInfo.ZSuffix##rrkz) DestInfo.KRCWM:$mask , SrcInfo.RC:$src1)>; def : Pat<(DestInfo.VT (X86select DestInfo.KRCWM:$mask, (DestInfo.VT (OpNode (SrcInfo.VT SrcInfo.RC:$src1))), DestInfo.ImmAllZerosV)), (!cast(NAME#SrcInfo.ZSuffix##rrkz) DestInfo.KRCWM:$mask , SrcInfo.RC:$src1)>; def : Pat<(DestInfo.VT (X86select DestInfo.KRCWM:$mask, (DestInfo.VT (OpNode (SrcInfo.VT SrcInfo.RC:$src1))), DestInfo.RC:$src0)), (!cast(NAME#SrcInfo.ZSuffix##rrk) DestInfo.RC:$src0, DestInfo.KRCWM:$mask , SrcInfo.RC:$src1)>; let mayStore = 1 in { def mr : AVX512XS8I, EVEX; def mrk : AVX512XS8I, EVEX, EVEX_K; }//mayStore = 1 } multiclass avx512_trunc_mr_lowering { def : Pat<(truncFrag (SrcInfo.VT SrcInfo.RC:$src), addr:$dst), (!cast(NAME#SrcInfo.ZSuffix##mr) addr:$dst, SrcInfo.RC:$src)>; def : Pat<(mtruncFrag addr:$dst, SrcInfo.KRCWM:$mask, (SrcInfo.VT SrcInfo.RC:$src)), (!cast(NAME#SrcInfo.ZSuffix##mrk) addr:$dst, SrcInfo.KRCWM:$mask, SrcInfo.RC:$src)>; } multiclass avx512_trunc_sat_mr_lowering { def: Pat<(!cast("int_x86_avx512_mask_pmov"#sat#"_"#SrcInfo.Suffix# DestInfo.Suffix#"_mem_"#SrcInfo.Size) addr:$ptr, (SrcInfo.VT SrcInfo.RC:$src), SrcInfo.MRC:$mask), (!cast(NAME#SrcInfo.ZSuffix##mrk) addr:$ptr, (COPY_TO_REGCLASS SrcInfo.MRC:$mask, SrcInfo.KRCWM), (SrcInfo.VT SrcInfo.RC:$src))>; def: Pat<(!cast("int_x86_avx512_mask_pmov"#sat#"_"#SrcInfo.Suffix# DestInfo.Suffix#"_mem_"#SrcInfo.Size) addr:$ptr, (SrcInfo.VT SrcInfo.RC:$src), -1), (!cast(NAME#SrcInfo.ZSuffix##mr) addr:$ptr, (SrcInfo.VT SrcInfo.RC:$src))>; } multiclass avx512_trunc opc, string OpcodeStr, SDNode OpNode, AVX512VLVectorVTInfo VTSrcInfo, X86VectorVTInfo DestInfoZ128, X86VectorVTInfo DestInfoZ256, X86VectorVTInfo DestInfoZ, X86MemOperand x86memopZ128, X86MemOperand x86memopZ256, X86MemOperand x86memopZ, PatFrag truncFrag, PatFrag mtruncFrag, Predicate prd = HasAVX512>{ let Predicates = [HasVLX, prd] in { defm Z128: avx512_trunc_common, avx512_trunc_mr_lowering, EVEX_V128; defm Z256: avx512_trunc_common, avx512_trunc_mr_lowering, EVEX_V256; } let Predicates = [prd] in defm Z: avx512_trunc_common, avx512_trunc_mr_lowering, EVEX_V512; } multiclass avx512_trunc_sat opc, string OpcodeStr, SDNode OpNode, AVX512VLVectorVTInfo VTSrcInfo, X86VectorVTInfo DestInfoZ128, X86VectorVTInfo DestInfoZ256, X86VectorVTInfo DestInfoZ, X86MemOperand x86memopZ128, X86MemOperand x86memopZ256, 
X86MemOperand x86memopZ, string sat, Predicate prd = HasAVX512>{ let Predicates = [HasVLX, prd] in { defm Z128: avx512_trunc_common, avx512_trunc_sat_mr_lowering, EVEX_V128; defm Z256: avx512_trunc_common, avx512_trunc_sat_mr_lowering, EVEX_V256; } let Predicates = [prd] in defm Z: avx512_trunc_common, avx512_trunc_sat_mr_lowering, EVEX_V512; } multiclass avx512_trunc_qb opc, string OpcodeStr, SDNode OpNode> { defm NAME: avx512_trunc, EVEX_CD8<8, CD8VO>; } multiclass avx512_trunc_sat_qb opc, string sat, SDNode OpNode> { defm NAME: avx512_trunc_sat, EVEX_CD8<8, CD8VO>; } multiclass avx512_trunc_qw opc, string OpcodeStr, SDNode OpNode> { defm NAME: avx512_trunc, EVEX_CD8<16, CD8VQ>; } multiclass avx512_trunc_sat_qw opc, string sat, SDNode OpNode> { defm NAME: avx512_trunc_sat, EVEX_CD8<16, CD8VQ>; } multiclass avx512_trunc_qd opc, string OpcodeStr, SDNode OpNode> { defm NAME: avx512_trunc, EVEX_CD8<32, CD8VH>; } multiclass avx512_trunc_sat_qd opc, string sat, SDNode OpNode> { defm NAME: avx512_trunc_sat, EVEX_CD8<32, CD8VH>; } multiclass avx512_trunc_db opc, string OpcodeStr, SDNode OpNode> { defm NAME: avx512_trunc, EVEX_CD8<8, CD8VQ>; } multiclass avx512_trunc_sat_db opc, string sat, SDNode OpNode> { defm NAME: avx512_trunc_sat, EVEX_CD8<8, CD8VQ>; } multiclass avx512_trunc_dw opc, string OpcodeStr, SDNode OpNode> { defm NAME: avx512_trunc, EVEX_CD8<16, CD8VH>; } multiclass avx512_trunc_sat_dw opc, string sat, SDNode OpNode> { defm NAME: avx512_trunc_sat, EVEX_CD8<16, CD8VH>; } multiclass avx512_trunc_wb opc, string OpcodeStr, SDNode OpNode> { defm NAME: avx512_trunc, EVEX_CD8<16, CD8VH>; } multiclass avx512_trunc_sat_wb opc, string sat, SDNode OpNode> { defm NAME: avx512_trunc_sat, EVEX_CD8<16, CD8VH>; } defm VPMOVQB : avx512_trunc_qb<0x32, "vpmovqb", X86vtrunc>; defm VPMOVSQB : avx512_trunc_sat_qb<0x22, "s", X86vtruncs>; defm VPMOVUSQB : avx512_trunc_sat_qb<0x12, "us", X86vtruncus>; defm VPMOVQW : avx512_trunc_qw<0x34, "vpmovqw", X86vtrunc>; defm VPMOVSQW : avx512_trunc_sat_qw<0x24, "s", X86vtruncs>; defm VPMOVUSQW : avx512_trunc_sat_qw<0x14, "us", X86vtruncus>; defm VPMOVQD : avx512_trunc_qd<0x35, "vpmovqd", X86vtrunc>; defm VPMOVSQD : avx512_trunc_sat_qd<0x25, "s", X86vtruncs>; defm VPMOVUSQD : avx512_trunc_sat_qd<0x15, "us", X86vtruncus>; defm VPMOVDB : avx512_trunc_db<0x31, "vpmovdb", X86vtrunc>; defm VPMOVSDB : avx512_trunc_sat_db<0x21, "s", X86vtruncs>; defm VPMOVUSDB : avx512_trunc_sat_db<0x11, "us", X86vtruncus>; defm VPMOVDW : avx512_trunc_dw<0x33, "vpmovdw", X86vtrunc>; defm VPMOVSDW : avx512_trunc_sat_dw<0x23, "s", X86vtruncs>; defm VPMOVUSDW : avx512_trunc_sat_dw<0x13, "us", X86vtruncus>; defm VPMOVWB : avx512_trunc_wb<0x30, "vpmovwb", X86vtrunc>; defm VPMOVSWB : avx512_trunc_sat_wb<0x20, "s", X86vtruncs>; defm VPMOVUSWB : avx512_trunc_sat_wb<0x10, "us", X86vtruncus>; let Predicates = [HasAVX512, NoVLX] in { def: Pat<(v8i16 (X86vtrunc (v8i32 VR256X:$src))), (v8i16 (EXTRACT_SUBREG (v16i16 (VPMOVDWZrr (v16i32 (SUBREG_TO_REG (i32 0), VR256X:$src, sub_ymm)))), sub_xmm))>; def: Pat<(v4i32 (X86vtrunc (v4i64 VR256X:$src))), (v4i32 (EXTRACT_SUBREG (v8i32 (VPMOVQDZrr (v8i64 (SUBREG_TO_REG (i32 0), VR256X:$src, sub_ymm)))), sub_xmm))>; } let Predicates = [HasBWI, NoVLX] in { def: Pat<(v16i8 (X86vtrunc (v16i16 VR256X:$src))), (v16i8 (EXTRACT_SUBREG (VPMOVWBZrr (v32i16 (SUBREG_TO_REG (i32 0), VR256X:$src, sub_ymm))), sub_xmm))>; } multiclass avx512_extend_common opc, string OpcodeStr, X86VectorVTInfo DestInfo, X86VectorVTInfo SrcInfo, X86MemOperand x86memop, PatFrag LdFrag, SDNode 
OpNode>{ defm rr : AVX512_maskable, EVEX; let mayLoad = 1 in { defm rm : AVX512_maskable, EVEX; } } multiclass avx512_extend_BW opc, string OpcodeStr, SDNode OpNode, string ExtTy,PatFrag LdFrag = !cast(ExtTy#"extloadvi8")> { let Predicates = [HasVLX, HasBWI] in { defm Z128: avx512_extend_common, EVEX_CD8<8, CD8VH>, T8PD, EVEX_V128; defm Z256: avx512_extend_common, EVEX_CD8<8, CD8VH>, T8PD, EVEX_V256; } let Predicates = [HasBWI] in { defm Z : avx512_extend_common, EVEX_CD8<8, CD8VH>, T8PD, EVEX_V512; } } multiclass avx512_extend_BD opc, string OpcodeStr, SDNode OpNode, string ExtTy,PatFrag LdFrag = !cast(ExtTy#"extloadvi8")> { let Predicates = [HasVLX, HasAVX512] in { defm Z128: avx512_extend_common, EVEX_CD8<8, CD8VQ>, T8PD, EVEX_V128; defm Z256: avx512_extend_common, EVEX_CD8<8, CD8VQ>, T8PD, EVEX_V256; } let Predicates = [HasAVX512] in { defm Z : avx512_extend_common, EVEX_CD8<8, CD8VQ>, T8PD, EVEX_V512; } } multiclass avx512_extend_BQ opc, string OpcodeStr, SDNode OpNode, string ExtTy,PatFrag LdFrag = !cast(ExtTy#"extloadvi8")> { let Predicates = [HasVLX, HasAVX512] in { defm Z128: avx512_extend_common, EVEX_CD8<8, CD8VO>, T8PD, EVEX_V128; defm Z256: avx512_extend_common, EVEX_CD8<8, CD8VO>, T8PD, EVEX_V256; } let Predicates = [HasAVX512] in { defm Z : avx512_extend_common, EVEX_CD8<8, CD8VO>, T8PD, EVEX_V512; } } multiclass avx512_extend_WD opc, string OpcodeStr, SDNode OpNode, string ExtTy,PatFrag LdFrag = !cast(ExtTy#"extloadvi16")> { let Predicates = [HasVLX, HasAVX512] in { defm Z128: avx512_extend_common, EVEX_CD8<16, CD8VH>, T8PD, EVEX_V128; defm Z256: avx512_extend_common, EVEX_CD8<16, CD8VH>, T8PD, EVEX_V256; } let Predicates = [HasAVX512] in { defm Z : avx512_extend_common, EVEX_CD8<16, CD8VH>, T8PD, EVEX_V512; } } multiclass avx512_extend_WQ opc, string OpcodeStr, SDNode OpNode, string ExtTy,PatFrag LdFrag = !cast(ExtTy#"extloadvi16")> { let Predicates = [HasVLX, HasAVX512] in { defm Z128: avx512_extend_common, EVEX_CD8<16, CD8VQ>, T8PD, EVEX_V128; defm Z256: avx512_extend_common, EVEX_CD8<16, CD8VQ>, T8PD, EVEX_V256; } let Predicates = [HasAVX512] in { defm Z : avx512_extend_common, EVEX_CD8<16, CD8VQ>, T8PD, EVEX_V512; } } multiclass avx512_extend_DQ opc, string OpcodeStr, SDNode OpNode, string ExtTy,PatFrag LdFrag = !cast(ExtTy#"extloadvi32")> { let Predicates = [HasVLX, HasAVX512] in { defm Z128: avx512_extend_common, EVEX_CD8<32, CD8VH>, T8PD, EVEX_V128; defm Z256: avx512_extend_common, EVEX_CD8<32, CD8VH>, T8PD, EVEX_V256; } let Predicates = [HasAVX512] in { defm Z : avx512_extend_common, EVEX_CD8<32, CD8VH>, T8PD, EVEX_V512; } } defm VPMOVZXBW : avx512_extend_BW<0x30, "vpmovzxbw", X86vzext, "z">; defm VPMOVZXBD : avx512_extend_BD<0x31, "vpmovzxbd", X86vzext, "z">; defm VPMOVZXBQ : avx512_extend_BQ<0x32, "vpmovzxbq", X86vzext, "z">; defm VPMOVZXWD : avx512_extend_WD<0x33, "vpmovzxwd", X86vzext, "z">; defm VPMOVZXWQ : avx512_extend_WQ<0x34, "vpmovzxwq", X86vzext, "z">; defm VPMOVZXDQ : avx512_extend_DQ<0x35, "vpmovzxdq", X86vzext, "z">; defm VPMOVSXBW: avx512_extend_BW<0x20, "vpmovsxbw", X86vsext, "s">; defm VPMOVSXBD: avx512_extend_BD<0x21, "vpmovsxbd", X86vsext, "s">; defm VPMOVSXBQ: avx512_extend_BQ<0x22, "vpmovsxbq", X86vsext, "s">; defm VPMOVSXWD: avx512_extend_WD<0x23, "vpmovsxwd", X86vsext, "s">; defm VPMOVSXWQ: avx512_extend_WQ<0x24, "vpmovsxwq", X86vsext, "s">; defm VPMOVSXDQ: avx512_extend_DQ<0x25, "vpmovsxdq", X86vsext, "s">; //===----------------------------------------------------------------------===// // GATHER - SCATTER Operations multiclass avx512_gather 
opc, string OpcodeStr, X86VectorVTInfo _, X86MemOperand memop, PatFrag GatherNode> { let Constraints = "@earlyclobber $dst, $src1 = $dst, $mask = $mask_wb", ExeDomain = _.ExeDomain in def rm : AVX5128I, EVEX, EVEX_K, EVEX_CD8<_.EltSize, CD8VT1>; } multiclass avx512_gather_q_pd dopc, bits<8> qopc, AVX512VLVectorVTInfo _, string OpcodeStr, string SUFF> { defm NAME##D##SUFF##Z: avx512_gather, EVEX_V512, VEX_W; defm NAME##Q##SUFF##Z: avx512_gather, EVEX_V512, VEX_W; let Predicates = [HasVLX] in { defm NAME##D##SUFF##Z256: avx512_gather, EVEX_V256, VEX_W; defm NAME##Q##SUFF##Z256: avx512_gather, EVEX_V256, VEX_W; defm NAME##D##SUFF##Z128: avx512_gather, EVEX_V128, VEX_W; defm NAME##Q##SUFF##Z128: avx512_gather, EVEX_V128, VEX_W; } } multiclass avx512_gather_d_ps dopc, bits<8> qopc, AVX512VLVectorVTInfo _, string OpcodeStr, string SUFF> { defm NAME##D##SUFF##Z: avx512_gather, EVEX_V512; defm NAME##Q##SUFF##Z: avx512_gather, EVEX_V512; let Predicates = [HasVLX] in { defm NAME##D##SUFF##Z256: avx512_gather, EVEX_V256; defm NAME##Q##SUFF##Z256: avx512_gather, EVEX_V256; defm NAME##D##SUFF##Z128: avx512_gather, EVEX_V128; defm NAME##Q##SUFF##Z128: avx512_gather, EVEX_V128; } } defm VGATHER : avx512_gather_q_pd<0x92, 0x93, avx512vl_f64_info, "vgather", "PD">, avx512_gather_d_ps<0x92, 0x93, avx512vl_f32_info, "vgather", "PS">; defm VPGATHER : avx512_gather_q_pd<0x90, 0x91, avx512vl_i64_info, "vpgather", "Q">, avx512_gather_d_ps<0x90, 0x91, avx512vl_i32_info, "vpgather", "D">; multiclass avx512_scatter opc, string OpcodeStr, X86VectorVTInfo _, X86MemOperand memop, PatFrag ScatterNode> { let mayStore = 1, Constraints = "$mask = $mask_wb", ExeDomain = _.ExeDomain in def mr : AVX5128I, EVEX, EVEX_K, EVEX_CD8<_.EltSize, CD8VT1>; } multiclass avx512_scatter_q_pd dopc, bits<8> qopc, AVX512VLVectorVTInfo _, string OpcodeStr, string SUFF> { defm NAME##D##SUFF##Z: avx512_scatter, EVEX_V512, VEX_W; defm NAME##Q##SUFF##Z: avx512_scatter, EVEX_V512, VEX_W; let Predicates = [HasVLX] in { defm NAME##D##SUFF##Z256: avx512_scatter, EVEX_V256, VEX_W; defm NAME##Q##SUFF##Z256: avx512_scatter, EVEX_V256, VEX_W; defm NAME##D##SUFF##Z128: avx512_scatter, EVEX_V128, VEX_W; defm NAME##Q##SUFF##Z128: avx512_scatter, EVEX_V128, VEX_W; } } multiclass avx512_scatter_d_ps dopc, bits<8> qopc, AVX512VLVectorVTInfo _, string OpcodeStr, string SUFF> { defm NAME##D##SUFF##Z: avx512_scatter, EVEX_V512; defm NAME##Q##SUFF##Z: avx512_scatter, EVEX_V512; let Predicates = [HasVLX] in { defm NAME##D##SUFF##Z256: avx512_scatter, EVEX_V256; defm NAME##Q##SUFF##Z256: avx512_scatter, EVEX_V256; defm NAME##D##SUFF##Z128: avx512_scatter, EVEX_V128; defm NAME##Q##SUFF##Z128: avx512_scatter, EVEX_V128; } } defm VSCATTER : avx512_scatter_q_pd<0xA2, 0xA3, avx512vl_f64_info, "vscatter", "PD">, avx512_scatter_d_ps<0xA2, 0xA3, avx512vl_f32_info, "vscatter", "PS">; defm VPSCATTER : avx512_scatter_q_pd<0xA0, 0xA1, avx512vl_i64_info, "vpscatter", "Q">, avx512_scatter_d_ps<0xA0, 0xA1, avx512vl_i32_info, "vpscatter", "D">; // prefetch multiclass avx512_gather_scatter_prefetch opc, Format F, string OpcodeStr, RegisterClass KRC, X86MemOperand memop> { let Predicates = [HasPFI], hasSideEffects = 1 in def m : AVX5128I, EVEX, EVEX_K; } defm VGATHERPF0DPS: avx512_gather_scatter_prefetch<0xC6, MRM1m, "vgatherpf0dps", VK16WM, vz32mem>, EVEX_V512, EVEX_CD8<32, CD8VT1>; defm VGATHERPF0QPS: avx512_gather_scatter_prefetch<0xC7, MRM1m, "vgatherpf0qps", VK8WM, vz64mem>, EVEX_V512, EVEX_CD8<64, CD8VT1>; defm VGATHERPF0DPD: avx512_gather_scatter_prefetch<0xC6, MRM1m, 
"vgatherpf0dpd", VK8WM, vy32mem>, EVEX_V512, VEX_W, EVEX_CD8<32, CD8VT1>; defm VGATHERPF0QPD: avx512_gather_scatter_prefetch<0xC7, MRM1m, "vgatherpf0qpd", VK8WM, vz64mem>, EVEX_V512, VEX_W, EVEX_CD8<64, CD8VT1>; defm VGATHERPF1DPS: avx512_gather_scatter_prefetch<0xC6, MRM2m, "vgatherpf1dps", VK16WM, vz32mem>, EVEX_V512, EVEX_CD8<32, CD8VT1>; defm VGATHERPF1QPS: avx512_gather_scatter_prefetch<0xC7, MRM2m, "vgatherpf1qps", VK8WM, vz64mem>, EVEX_V512, EVEX_CD8<64, CD8VT1>; defm VGATHERPF1DPD: avx512_gather_scatter_prefetch<0xC6, MRM2m, "vgatherpf1dpd", VK8WM, vy32mem>, EVEX_V512, VEX_W, EVEX_CD8<32, CD8VT1>; defm VGATHERPF1QPD: avx512_gather_scatter_prefetch<0xC7, MRM2m, "vgatherpf1qpd", VK8WM, vz64mem>, EVEX_V512, VEX_W, EVEX_CD8<64, CD8VT1>; defm VSCATTERPF0DPS: avx512_gather_scatter_prefetch<0xC6, MRM5m, "vscatterpf0dps", VK16WM, vz32mem>, EVEX_V512, EVEX_CD8<32, CD8VT1>; defm VSCATTERPF0QPS: avx512_gather_scatter_prefetch<0xC7, MRM5m, "vscatterpf0qps", VK8WM, vz64mem>, EVEX_V512, EVEX_CD8<64, CD8VT1>; defm VSCATTERPF0DPD: avx512_gather_scatter_prefetch<0xC6, MRM5m, "vscatterpf0dpd", VK8WM, vy32mem>, EVEX_V512, VEX_W, EVEX_CD8<32, CD8VT1>; defm VSCATTERPF0QPD: avx512_gather_scatter_prefetch<0xC7, MRM5m, "vscatterpf0qpd", VK8WM, vz64mem>, EVEX_V512, VEX_W, EVEX_CD8<64, CD8VT1>; defm VSCATTERPF1DPS: avx512_gather_scatter_prefetch<0xC6, MRM6m, "vscatterpf1dps", VK16WM, vz32mem>, EVEX_V512, EVEX_CD8<32, CD8VT1>; defm VSCATTERPF1QPS: avx512_gather_scatter_prefetch<0xC7, MRM6m, "vscatterpf1qps", VK8WM, vz64mem>, EVEX_V512, EVEX_CD8<64, CD8VT1>; defm VSCATTERPF1DPD: avx512_gather_scatter_prefetch<0xC6, MRM6m, "vscatterpf1dpd", VK8WM, vy32mem>, EVEX_V512, VEX_W, EVEX_CD8<32, CD8VT1>; defm VSCATTERPF1QPD: avx512_gather_scatter_prefetch<0xC7, MRM6m, "vscatterpf1qpd", VK8WM, vz64mem>, EVEX_V512, VEX_W, EVEX_CD8<64, CD8VT1>; // Helper fragments to match sext vXi1 to vXiY. 
def v16i1sextv16i32 : PatLeaf<(v16i32 (X86vsrai VR512:$src, (i8 31)))>; def v8i1sextv8i64 : PatLeaf<(v8i64 (X86vsrai VR512:$src, (i8 63)))>; def : Pat<(store (i1 -1), addr:$dst), (MOV8mi addr:$dst, (i8 1))>; def : Pat<(store (i1 1), addr:$dst), (MOV8mi addr:$dst, (i8 1))>; def : Pat<(store (i1 0), addr:$dst), (MOV8mi addr:$dst, (i8 0))>; def : Pat<(store VK1:$src, addr:$dst), (MOV8mr addr:$dst, (EXTRACT_SUBREG (KMOVWrk (COPY_TO_REGCLASS VK1:$src, VK16)), sub_8bit))>, Requires<[HasAVX512, NoDQI]>; def : Pat<(store VK8:$src, addr:$dst), (MOV8mr addr:$dst, (EXTRACT_SUBREG (KMOVWrk (COPY_TO_REGCLASS VK8:$src, VK16)), sub_8bit))>, Requires<[HasAVX512, NoDQI]>; def truncstorei1 : PatFrag<(ops node:$val, node:$ptr), (truncstore node:$val, node:$ptr), [{ return cast(N)->getMemoryVT() == MVT::i1; }]>; def : Pat<(truncstorei1 GR8:$src, addr:$dst), (MOV8mr addr:$dst, GR8:$src)>; multiclass cvt_by_vec_width opc, X86VectorVTInfo Vec, string OpcodeStr > { def rr : AVX512XS8I, EVEX; } multiclass cvt_mask_by_elt_width opc, AVX512VLVectorVTInfo VTInfo, string OpcodeStr, Predicate prd> { let Predicates = [prd] in defm Z : cvt_by_vec_width, EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : cvt_by_vec_width, EVEX_V256; defm Z128 : cvt_by_vec_width, EVEX_V128; } } multiclass avx512_convert_mask_to_vector { defm NAME##B : cvt_mask_by_elt_width<0x28, avx512vl_i8_info, OpcodeStr, HasBWI>; defm NAME##W : cvt_mask_by_elt_width<0x28, avx512vl_i16_info, OpcodeStr, HasBWI>, VEX_W; defm NAME##D : cvt_mask_by_elt_width<0x38, avx512vl_i32_info, OpcodeStr, HasDQI>; defm NAME##Q : cvt_mask_by_elt_width<0x38, avx512vl_i64_info, OpcodeStr, HasDQI>, VEX_W; } defm VPMOVM2 : avx512_convert_mask_to_vector<"vpmovm2">; multiclass convert_vector_to_mask_common opc, X86VectorVTInfo _, string OpcodeStr > { def rr : AVX512XS8I, EVEX; } multiclass avx512_convert_vector_to_mask opc, string OpcodeStr, AVX512VLVectorVTInfo VTInfo, Predicate prd> { let Predicates = [prd] in defm Z : convert_vector_to_mask_common , EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : convert_vector_to_mask_common, EVEX_V256; defm Z128 : convert_vector_to_mask_common, EVEX_V128; } } defm VPMOVB2M : avx512_convert_vector_to_mask<0x29, "vpmovb2m", avx512vl_i8_info, HasBWI>; defm VPMOVW2M : avx512_convert_vector_to_mask<0x29, "vpmovw2m", avx512vl_i16_info, HasBWI>, VEX_W; defm VPMOVD2M : avx512_convert_vector_to_mask<0x39, "vpmovd2m", avx512vl_i32_info, HasDQI>; defm VPMOVQ2M : avx512_convert_vector_to_mask<0x39, "vpmovq2m", avx512vl_i64_info, HasDQI>, VEX_W; //===----------------------------------------------------------------------===// // AVX-512 - COMPRESS and EXPAND // multiclass compress_by_vec_width opc, X86VectorVTInfo _, string OpcodeStr> { defm rr : AVX512_maskable, AVX5128IBase; let mayStore = 1 in { def mr : AVX5128I, EVEX_CD8<_.EltSize, CD8VT1>; def mrk : AVX5128I, EVEX_K, EVEX_CD8<_.EltSize, CD8VT1>; } } multiclass compress_by_elt_width opc, string OpcodeStr, AVX512VLVectorVTInfo VTInfo> { defm Z : compress_by_vec_width, EVEX_V512; let Predicates = [HasVLX] in { defm Z256 : compress_by_vec_width, EVEX_V256; defm Z128 : compress_by_vec_width, EVEX_V128; } } defm VPCOMPRESSD : compress_by_elt_width <0x8B, "vpcompressd", avx512vl_i32_info>, EVEX; defm VPCOMPRESSQ : compress_by_elt_width <0x8B, "vpcompressq", avx512vl_i64_info>, EVEX, VEX_W; defm VCOMPRESSPS : compress_by_elt_width <0x8A, "vcompressps", avx512vl_f32_info>, EVEX; defm VCOMPRESSPD : compress_by_elt_width <0x8A, "vcompresspd", avx512vl_f64_info>, EVEX, VEX_W; // expand 
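Before the expand forms, a usage sketch for the compress side: VCOMPRESSPS left-packs the lanes selected by the mask, the classic stream-compaction primitive. This assumes an AVX-512F toolchain; the intrinsics are from immintrin.h, and a popcount of the mask gives the element count written:

#include <immintrin.h>

// Store only the positive floats of v, packed contiguously at dst;
// returns how many were written.
static inline unsigned keep_positive(float *dst, __m512 v) {
  __mmask16 m = _mm512_cmp_ps_mask(v, _mm512_setzero_ps(), _CMP_GT_OQ);
  _mm512_mask_compressstoreu_ps(dst, m, v);      // VCOMPRESSPS to memory
  return (unsigned)_mm_popcnt_u32((unsigned)m);  // lanes actually stored
}

The expand multiclasses below implement the inverse, scattering packed elements back into the selected lanes of a vector.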
multiclass expand_by_vec_width opc, X86VectorVTInfo _, string OpcodeStr> { defm rr : AVX512_maskable, AVX5128IBase; let mayLoad = 1 in defm rm : AVX512_maskable, AVX5128IBase, EVEX_CD8<_.EltSize, CD8VT1>; } multiclass expand_by_elt_width opc, string OpcodeStr, AVX512VLVectorVTInfo VTInfo> { defm Z : expand_by_vec_width, EVEX_V512; let Predicates = [HasVLX] in { defm Z256 : expand_by_vec_width, EVEX_V256; defm Z128 : expand_by_vec_width, EVEX_V128; } } defm VPEXPANDD : expand_by_elt_width <0x89, "vpexpandd", avx512vl_i32_info>, EVEX; defm VPEXPANDQ : expand_by_elt_width <0x89, "vpexpandq", avx512vl_i64_info>, EVEX, VEX_W; defm VEXPANDPS : expand_by_elt_width <0x88, "vexpandps", avx512vl_f32_info>, EVEX; defm VEXPANDPD : expand_by_elt_width <0x88, "vexpandpd", avx512vl_f64_info>, EVEX, VEX_W; //handle instruction reg_vec1 = op(reg_vec,imm) // op(mem_vec,imm) // op(broadcast(eltVt),imm) //all instruction created with FROUND_CURRENT multiclass avx512_unary_fp_packed_imm opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _>{ defm rri : AVX512_maskable; let mayLoad = 1 in { defm rmi : AVX512_maskable; defm rmbi : AVX512_maskable, EVEX_B; } } //handle instruction reg_vec1 = op(reg_vec2,reg_vec3,imm),{sae} multiclass avx512_unary_fp_sae_packed_imm opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _>{ defm rrib : AVX512_maskable, EVEX_B; } multiclass avx512_common_unary_fp_sae_packed_imm opc, SDNode OpNode, Predicate prd>{ let Predicates = [prd] in { defm Z : avx512_unary_fp_packed_imm, avx512_unary_fp_sae_packed_imm, EVEX_V512; } let Predicates = [prd, HasVLX] in { defm Z128 : avx512_unary_fp_packed_imm, EVEX_V128; defm Z256 : avx512_unary_fp_packed_imm, EVEX_V256; } } //handle instruction reg_vec1 = op(reg_vec2,reg_vec3,imm) // op(reg_vec2,mem_vec,imm) // op(reg_vec2,broadcast(eltVt),imm) //all instruction created with FROUND_CURRENT multiclass avx512_fp_packed_imm opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _>{ defm rri : AVX512_maskable; let mayLoad = 1 in { defm rmi : AVX512_maskable; defm rmbi : AVX512_maskable, EVEX_B; } } //handle instruction reg_vec1 = op(reg_vec2,reg_vec3,imm) // op(reg_vec2,mem_vec,imm) multiclass avx512_3Op_rm_imm8 opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo DestInfo, X86VectorVTInfo SrcInfo>{ defm rri : AVX512_maskable; let mayLoad = 1 in defm rmi : AVX512_maskable; } //handle instruction reg_vec1 = op(reg_vec2,reg_vec3,imm) // op(reg_vec2,mem_vec,imm) // op(reg_vec2,broadcast(eltVt),imm) multiclass avx512_3Op_imm8 opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _>: avx512_3Op_rm_imm8{ let mayLoad = 1 in defm rmbi : AVX512_maskable, EVEX_B; } //handle scalar instruction reg_vec1 = op(reg_vec2,reg_vec3,imm) // op(reg_vec2,mem_scalar,imm) //all instruction created with FROUND_CURRENT multiclass avx512_fp_scalar_imm opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { defm rri : AVX512_maskable_scalar; let mayLoad = 1 in { defm rmi : AVX512_maskable_scalar; let isAsmParserOnly = 1 in { defm rmi_alt :AVX512_maskable_in_asm; } } } //handle instruction reg_vec1 = op(reg_vec2,reg_vec3,imm),{sae} multiclass avx512_fp_sae_packed_imm opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _>{ defm rrib : AVX512_maskable, EVEX_B; } //handle scalar instruction reg_vec1 = op(reg_vec2,reg_vec3,imm),{sae} multiclass avx512_fp_sae_scalar_imm opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { defm NAME#rrib : AVX512_maskable_scalar, EVEX_B; } multiclass avx512_common_fp_sae_packed_imm opc, SDNode OpNode, Predicate prd>{ let Predicates = 
[prd] in { defm Z : avx512_fp_packed_imm, avx512_fp_sae_packed_imm, EVEX_V512; } let Predicates = [prd, HasVLX] in { defm Z128 : avx512_fp_packed_imm, EVEX_V128; defm Z256 : avx512_fp_packed_imm, EVEX_V256; } } multiclass avx512_common_3Op_rm_imm8 opc, SDNode OpNode, string OpStr, AVX512VLVectorVTInfo DestInfo, AVX512VLVectorVTInfo SrcInfo>{ let Predicates = [HasBWI] in { defm Z : avx512_3Op_rm_imm8, EVEX_V512, AVX512AIi8Base, EVEX_4V; } let Predicates = [HasBWI, HasVLX] in { defm Z128 : avx512_3Op_rm_imm8, EVEX_V128, AVX512AIi8Base, EVEX_4V; defm Z256 : avx512_3Op_rm_imm8, EVEX_V256, AVX512AIi8Base, EVEX_4V; } } multiclass avx512_common_3Op_imm8 opc, SDNode OpNode>{ let Predicates = [HasAVX512] in { defm Z : avx512_3Op_imm8, EVEX_V512; } let Predicates = [HasAVX512, HasVLX] in { defm Z128 : avx512_3Op_imm8, EVEX_V128; defm Z256 : avx512_3Op_imm8, EVEX_V256; } } multiclass avx512_common_fp_sae_scalar_imm opc, SDNode OpNode, Predicate prd>{ let Predicates = [prd] in { defm Z128 : avx512_fp_scalar_imm, avx512_fp_sae_scalar_imm; } } multiclass avx512_common_unary_fp_sae_packed_imm_all opcPs, bits<8> opcPd, SDNode OpNode, Predicate prd>{ defm PS : avx512_common_unary_fp_sae_packed_imm, EVEX_CD8<32, CD8VF>; defm PD : avx512_common_unary_fp_sae_packed_imm, EVEX_CD8<64, CD8VF>, VEX_W; } defm VFIXUPIMMPD : avx512_common_fp_sae_packed_imm<"vfixupimmpd", avx512vl_f64_info, 0x54, X86VFixupimm, HasAVX512>, AVX512AIi8Base, EVEX_4V, EVEX_CD8<64, CD8VF>, VEX_W; defm VFIXUPIMMPS : avx512_common_fp_sae_packed_imm<"vfixupimmps", avx512vl_f32_info, 0x54, X86VFixupimm, HasAVX512>, AVX512AIi8Base, EVEX_4V, EVEX_CD8<32, CD8VF>; defm VFIXUPIMMSD: avx512_common_fp_sae_scalar_imm<"vfixupimmsd", f64x_info, 0x55, X86VFixupimm, HasAVX512>, AVX512AIi8Base, VEX_LIG, EVEX_4V, EVEX_CD8<64, CD8VT1>, VEX_W; defm VFIXUPIMMSS: avx512_common_fp_sae_scalar_imm<"vfixupimmss", f32x_info, 0x55, X86VFixupimm, HasAVX512>, AVX512AIi8Base, VEX_LIG, EVEX_4V, EVEX_CD8<32, CD8VT1>; defm VREDUCE : avx512_common_unary_fp_sae_packed_imm_all<"vreduce", 0x56, 0x56, X86VReduce, HasDQI>, AVX512AIi8Base, EVEX; defm VRNDSCALE : avx512_common_unary_fp_sae_packed_imm_all<"vrndscale", 0x08, 0x09, X86VRndScale, HasAVX512>, AVX512AIi8Base, EVEX; defm VGETMANT : avx512_common_unary_fp_sae_packed_imm_all<"vgetmant", 0x26, 0x26, X86VGetMant, HasAVX512>, AVX512AIi8Base, EVEX; defm VRANGEPD : avx512_common_fp_sae_packed_imm<"vrangepd", avx512vl_f64_info, 0x50, X86VRange, HasDQI>, AVX512AIi8Base, EVEX_4V, EVEX_CD8<64, CD8VF>, VEX_W; defm VRANGEPS : avx512_common_fp_sae_packed_imm<"vrangeps", avx512vl_f32_info, 0x50, X86VRange, HasDQI>, AVX512AIi8Base, EVEX_4V, EVEX_CD8<32, CD8VF>; defm VRANGESD: avx512_common_fp_sae_scalar_imm<"vrangesd", f64x_info, 0x51, X86VRange, HasDQI>, AVX512AIi8Base, VEX_LIG, EVEX_4V, EVEX_CD8<64, CD8VT1>, VEX_W; defm VRANGESS: avx512_common_fp_sae_scalar_imm<"vrangess", f32x_info, 0x51, X86VRange, HasDQI>, AVX512AIi8Base, VEX_LIG, EVEX_4V, EVEX_CD8<32, CD8VT1>; defm VREDUCESD: avx512_common_fp_sae_scalar_imm<"vreducesd", f64x_info, 0x57, X86Reduces, HasDQI>, AVX512AIi8Base, VEX_LIG, EVEX_4V, EVEX_CD8<64, CD8VT1>, VEX_W; defm VREDUCESS: avx512_common_fp_sae_scalar_imm<"vreducess", f32x_info, 0x57, X86Reduces, HasDQI>, AVX512AIi8Base, VEX_LIG, EVEX_4V, EVEX_CD8<32, CD8VT1>; defm VGETMANTSD: avx512_common_fp_sae_scalar_imm<"vgetmantsd", f64x_info, 0x27, X86GetMants, HasAVX512>, AVX512AIi8Base, VEX_LIG, EVEX_4V, EVEX_CD8<64, CD8VT1>, VEX_W; defm VGETMANTSS: avx512_common_fp_sae_scalar_imm<"vgetmantss", f32x_info, 0x27, X86GetMants, 
HasAVX512>, AVX512AIi8Base, VEX_LIG, EVEX_4V, EVEX_CD8<32, CD8VT1>; multiclass avx512_shuff_packed_128 opc, SDNode OpNode = X86Shuf128>{ let Predicates = [HasAVX512] in { defm Z : avx512_3Op_imm8, EVEX_V512; } let Predicates = [HasAVX512, HasVLX] in { defm Z256 : avx512_3Op_imm8, EVEX_V256; } } let Predicates = [HasAVX512] in { def : Pat<(v16f32 (ffloor VR512:$src)), (VRNDSCALEPSZrri VR512:$src, (i32 0x1))>; def : Pat<(v16f32 (fnearbyint VR512:$src)), (VRNDSCALEPSZrri VR512:$src, (i32 0xC))>; def : Pat<(v16f32 (fceil VR512:$src)), (VRNDSCALEPSZrri VR512:$src, (i32 0x2))>; def : Pat<(v16f32 (frint VR512:$src)), (VRNDSCALEPSZrri VR512:$src, (i32 0x4))>; def : Pat<(v16f32 (ftrunc VR512:$src)), (VRNDSCALEPSZrri VR512:$src, (i32 0x3))>; def : Pat<(v8f64 (ffloor VR512:$src)), (VRNDSCALEPDZrri VR512:$src, (i32 0x1))>; def : Pat<(v8f64 (fnearbyint VR512:$src)), (VRNDSCALEPDZrri VR512:$src, (i32 0xC))>; def : Pat<(v8f64 (fceil VR512:$src)), (VRNDSCALEPDZrri VR512:$src, (i32 0x2))>; def : Pat<(v8f64 (frint VR512:$src)), (VRNDSCALEPDZrri VR512:$src, (i32 0x4))>; def : Pat<(v8f64 (ftrunc VR512:$src)), (VRNDSCALEPDZrri VR512:$src, (i32 0x3))>; } defm VSHUFF32X4 : avx512_shuff_packed_128<"vshuff32x4",avx512vl_f32_info, 0x23>, AVX512AIi8Base, EVEX_4V, EVEX_CD8<32, CD8VF>; defm VSHUFF64X2 : avx512_shuff_packed_128<"vshuff64x2",avx512vl_f64_info, 0x23>, AVX512AIi8Base, EVEX_4V, EVEX_CD8<64, CD8VF>, VEX_W; defm VSHUFI32X4 : avx512_shuff_packed_128<"vshufi32x4",avx512vl_i32_info, 0x43>, AVX512AIi8Base, EVEX_4V, EVEX_CD8<32, CD8VF>; defm VSHUFI64X2 : avx512_shuff_packed_128<"vshufi64x2",avx512vl_i64_info, 0x43>, AVX512AIi8Base, EVEX_4V, EVEX_CD8<64, CD8VF>, VEX_W; multiclass avx512_valign { defm NAME: avx512_common_3Op_imm8, AVX512AIi8Base, EVEX_4V; } defm VALIGND: avx512_valign<"valignd", avx512vl_i32_info>, EVEX_CD8<32, CD8VF>; defm VALIGNQ: avx512_valign<"valignq", avx512vl_i64_info>, EVEX_CD8<64, CD8VF>, VEX_W; multiclass avx512_vpalign_lowering p>{ let Predicates = p in def NAME#_.VTName#rri: Pat<(_.VT (X86PAlignr _.RC:$src1, _.RC:$src2, (i8 imm:$imm))), (!cast(NAME#_.ZSuffix#rri) _.RC:$src1, _.RC:$src2, imm:$imm)>; } multiclass avx512_vpalign_lowering_common: avx512_vpalign_lowering<_.info512, [HasBWI]>, avx512_vpalign_lowering<_.info128, [HasBWI, HasVLX]>, avx512_vpalign_lowering<_.info256, [HasBWI, HasVLX]>; defm VPALIGN: avx512_common_3Op_rm_imm8<0x0F, X86PAlignr, "vpalignr" , avx512vl_i8_info, avx512vl_i8_info>, avx512_vpalign_lowering_common, avx512_vpalign_lowering_common, avx512_vpalign_lowering_common, avx512_vpalign_lowering_common, avx512_vpalign_lowering_common, EVEX_CD8<8, CD8VF>; defm VDBPSADBW: avx512_common_3Op_rm_imm8<0x42, X86dbpsadbw, "vdbpsadbw" , avx512vl_i16_info, avx512vl_i8_info>, EVEX_CD8<8, CD8VF>; multiclass avx512_unary_rm opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { defm rr : AVX512_maskable, EVEX, AVX5128IBase; let mayLoad = 1 in defm rm : AVX512_maskable, EVEX, AVX5128IBase, EVEX_CD8<_.EltSize, CD8VF>; } multiclass avx512_unary_rmb opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> : avx512_unary_rm { let mayLoad = 1 in defm rmb : AVX512_maskable, EVEX, AVX5128IBase, EVEX_B, EVEX_CD8<_.EltSize, CD8VF>; } multiclass avx512_unary_rm_vl opc, string OpcodeStr, SDNode OpNode, AVX512VLVectorVTInfo VTInfo, Predicate prd> { let Predicates = [prd] in defm Z : avx512_unary_rm, EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : avx512_unary_rm, EVEX_V256; defm Z128 : avx512_unary_rm, EVEX_V128; } } multiclass avx512_unary_rmb_vl opc, string OpcodeStr, 
SDNode OpNode, AVX512VLVectorVTInfo VTInfo, Predicate prd> { let Predicates = [prd] in defm Z : avx512_unary_rmb, EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : avx512_unary_rmb, EVEX_V256; defm Z128 : avx512_unary_rmb, EVEX_V128; } } multiclass avx512_unary_rm_vl_dq opc_d, bits<8> opc_q, string OpcodeStr, SDNode OpNode, Predicate prd> { defm Q : avx512_unary_rmb_vl, VEX_W; defm D : avx512_unary_rmb_vl; } multiclass avx512_unary_rm_vl_bw opc_b, bits<8> opc_w, string OpcodeStr, SDNode OpNode, Predicate prd> { defm W : avx512_unary_rm_vl; defm B : avx512_unary_rm_vl; } multiclass avx512_unary_rm_vl_all opc_b, bits<8> opc_w, bits<8> opc_d, bits<8> opc_q, string OpcodeStr, SDNode OpNode> { defm NAME : avx512_unary_rm_vl_dq, avx512_unary_rm_vl_bw; } defm VPABS : avx512_unary_rm_vl_all<0x1C, 0x1D, 0x1E, 0x1F, "vpabs", X86Abs>; def : Pat<(xor (bc_v16i32 (v16i1sextv16i32)), (bc_v16i32 (add (v16i32 VR512:$src), (v16i1sextv16i32)))), (VPABSDZrr VR512:$src)>; def : Pat<(xor (bc_v8i64 (v8i1sextv8i64)), (bc_v8i64 (add (v8i64 VR512:$src), (v8i1sextv8i64)))), (VPABSQZrr VR512:$src)>; multiclass avx512_ctlz opc, string OpcodeStr, Predicate prd>{ defm NAME : avx512_unary_rm_vl_dq; } defm VPLZCNT : avx512_ctlz<0x44, "vplzcnt", HasCDI>; defm VPCONFLICT : avx512_unary_rm_vl_dq<0xC4, 0xC4, "vpconflict", X86Conflict, HasCDI>; //===---------------------------------------------------------------------===// // Replicate Single FP - MOVSHDUP and MOVSLDUP //===---------------------------------------------------------------------===// multiclass avx512_replicate opc, string OpcodeStr, SDNode OpNode>{ defm NAME: avx512_unary_rm_vl, XS; } defm VMOVSHDUP : avx512_replicate<0x16, "vmovshdup", X86Movshdup>; defm VMOVSLDUP : avx512_replicate<0x12, "vmovsldup", X86Movsldup>; //===----------------------------------------------------------------------===// // AVX-512 - MOVDDUP //===----------------------------------------------------------------------===// multiclass avx512_movddup_128 opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { defm rr : AVX512_maskable, EVEX; let mayLoad = 1 in defm rm : AVX512_maskable, EVEX, EVEX_CD8<_.EltSize, CD8VH>; } multiclass avx512_movddup_common opc, string OpcodeStr, SDNode OpNode, AVX512VLVectorVTInfo VTInfo> { defm Z : avx512_unary_rm, EVEX_V512; let Predicates = [HasAVX512, HasVLX] in { defm Z256 : avx512_unary_rm, EVEX_V256; defm Z128 : avx512_movddup_128, EVEX_V128; } } multiclass avx512_movddup opc, string OpcodeStr, SDNode OpNode>{ defm NAME: avx512_movddup_common, XD, VEX_W; } defm VMOVDDUP : avx512_movddup<0x12, "vmovddup", X86Movddup>; def : Pat<(X86Movddup (loadv2f64 addr:$src)), (VMOVDDUPZ128rm addr:$src)>, Requires<[HasAVX512, HasVLX]>; def : Pat<(v2f64 (X86VBroadcast (loadf64 addr:$src))), (VMOVDDUPZ128rm addr:$src)>, Requires<[HasAVX512, HasVLX]>; //===----------------------------------------------------------------------===// // AVX-512 - Unpack Instructions //===----------------------------------------------------------------------===// defm VUNPCKH : avx512_fp_binop_p<0x15, "vunpckh", X86Unpckh>; defm VUNPCKL : avx512_fp_binop_p<0x14, "vunpckl", X86Unpckl>; defm VPUNPCKLBW : avx512_binop_rm_vl_b<0x60, "vpunpcklbw", X86Unpckl, SSE_INTALU_ITINS_P, HasBWI>; defm VPUNPCKHBW : avx512_binop_rm_vl_b<0x68, "vpunpckhbw", X86Unpckh, SSE_INTALU_ITINS_P, HasBWI>; defm VPUNPCKLWD : avx512_binop_rm_vl_w<0x61, "vpunpcklwd", X86Unpckl, SSE_INTALU_ITINS_P, HasBWI>; defm VPUNPCKHWD : avx512_binop_rm_vl_w<0x69, "vpunpckhwd", X86Unpckh, SSE_INTALU_ITINS_P, HasBWI>; defm 
VPUNPCKLDQ : avx512_binop_rm_vl_d<0x62, "vpunpckldq", X86Unpckl, SSE_INTALU_ITINS_P, HasAVX512>; defm VPUNPCKHDQ : avx512_binop_rm_vl_d<0x6A, "vpunpckhdq", X86Unpckh, SSE_INTALU_ITINS_P, HasAVX512>; defm VPUNPCKLQDQ : avx512_binop_rm_vl_q<0x6C, "vpunpcklqdq", X86Unpckl, SSE_INTALU_ITINS_P, HasAVX512>; defm VPUNPCKHQDQ : avx512_binop_rm_vl_q<0x6D, "vpunpckhqdq", X86Unpckh, SSE_INTALU_ITINS_P, HasAVX512>; //===----------------------------------------------------------------------===// // AVX-512 - Extract & Insert Integer Instructions //===----------------------------------------------------------------------===// multiclass avx512_extract_elt_bw_m opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _> { let mayStore = 1 in def mr : AVX512Ii8, EVEX, EVEX_CD8<_.EltSize, CD8VT1>; } multiclass avx512_extract_elt_b { let Predicates = [HasBWI] in { def rr : AVX512Ii8<0x14, MRMDestReg, (outs GR32orGR64:$dst), (ins _.RC:$src1, u8imm:$src2), OpcodeStr#"\t{$src2, $src1, $dst|$dst, $src1, $src2}", [(set GR32orGR64:$dst, (X86pextrb (_.VT _.RC:$src1), imm:$src2))]>, EVEX, TAPD; defm NAME : avx512_extract_elt_bw_m<0x14, OpcodeStr, X86pextrb, _>, TAPD; } } multiclass avx512_extract_elt_w { let Predicates = [HasBWI] in { def rr : AVX512Ii8<0xC5, MRMSrcReg, (outs GR32orGR64:$dst), (ins _.RC:$src1, u8imm:$src2), OpcodeStr#"\t{$src2, $src1, $dst|$dst, $src1, $src2}", [(set GR32orGR64:$dst, (X86pextrw (_.VT _.RC:$src1), imm:$src2))]>, EVEX, PD; def rr_REV : AVX512Ii8<0x15, MRMDestReg, (outs GR32orGR64:$dst), (ins _.RC:$src1, u8imm:$src2), OpcodeStr#".s\t{$src2, $src1, $dst|$dst, $src1, $src2}", []>, EVEX, TAPD; defm NAME : avx512_extract_elt_bw_m<0x15, OpcodeStr, X86pextrw, _>, TAPD; } } multiclass avx512_extract_elt_dq { let Predicates = [HasDQI] in { def rr : AVX512Ii8<0x16, MRMDestReg, (outs GRC:$dst), (ins _.RC:$src1, u8imm:$src2), OpcodeStr#"\t{$src2, $src1, $dst|$dst, $src1, $src2}", [(set GRC:$dst, (extractelt (_.VT _.RC:$src1), imm:$src2))]>, EVEX, TAPD; let mayStore = 1 in def mr : AVX512Ii8<0x16, MRMDestMem, (outs), (ins _.ScalarMemOp:$dst, _.RC:$src1, u8imm:$src2), OpcodeStr#"\t{$src2, $src1, $dst|$dst, $src1, $src2}", [(store (extractelt (_.VT _.RC:$src1), imm:$src2),addr:$dst)]>, EVEX, EVEX_CD8<_.EltSize, CD8VT1>, TAPD; } } defm VPEXTRBZ : avx512_extract_elt_b<"vpextrb", v16i8x_info>; defm VPEXTRWZ : avx512_extract_elt_w<"vpextrw", v8i16x_info>; defm VPEXTRDZ : avx512_extract_elt_dq<"vpextrd", v4i32x_info, GR32>; defm VPEXTRQZ : avx512_extract_elt_dq<"vpextrq", v2i64x_info, GR64>, VEX_W; multiclass avx512_insert_elt_m opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _, PatFrag LdFrag> { def rm : AVX512Ii8, EVEX_4V, EVEX_CD8<_.EltSize, CD8VT1>; } multiclass avx512_insert_elt_bw opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _, PatFrag LdFrag> { let Predicates = [HasBWI] in { def rr : AVX512Ii8, EVEX_4V; defm NAME : avx512_insert_elt_m; } } multiclass avx512_insert_elt_dq opc, string OpcodeStr, X86VectorVTInfo _, RegisterClass GRC> { let Predicates = [HasDQI] in { def rr : AVX512Ii8, EVEX_4V, TAPD; defm NAME : avx512_insert_elt_m, TAPD; } } defm VPINSRBZ : avx512_insert_elt_bw<0x20, "vpinsrb", X86pinsrb, v16i8x_info, extloadi8>, TAPD; defm VPINSRWZ : avx512_insert_elt_bw<0xC4, "vpinsrw", X86pinsrw, v8i16x_info, extloadi16>, PD; defm VPINSRDZ : avx512_insert_elt_dq<0x22, "vpinsrd", v4i32x_info, GR32>; defm VPINSRQZ : avx512_insert_elt_dq<0x22, "vpinsrq", v2i64x_info, GR64>, VEX_W; //===----------------------------------------------------------------------===// // VSHUFPS - VSHUFPD 
Operations //===----------------------------------------------------------------------===// multiclass avx512_shufp{ defm NAME: avx512_common_3Op_imm8, EVEX_CD8, AVX512AIi8Base, EVEX_4V; } defm VSHUFPS: avx512_shufp<"vshufps", avx512vl_i32_info, avx512vl_f32_info>, PS; defm VSHUFPD: avx512_shufp<"vshufpd", avx512vl_i64_info, avx512vl_f64_info>, PD, VEX_W; //===----------------------------------------------------------------------===// // AVX-512 - Byte shift Left/Right //===----------------------------------------------------------------------===// multiclass avx512_shift_packed opc, SDNode OpNode, Format MRMr, Format MRMm, string OpcodeStr, X86VectorVTInfo _>{ def rr : AVX512; let mayLoad = 1 in def rm : AVX512; } multiclass avx512_shift_packed_all opc, SDNode OpNode, Format MRMr, Format MRMm, string OpcodeStr, Predicate prd>{ let Predicates = [prd] in defm Z512 : avx512_shift_packed, EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : avx512_shift_packed, EVEX_V256; defm Z128 : avx512_shift_packed, EVEX_V128; } } defm VPSLLDQ : avx512_shift_packed_all<0x73, X86vshldq, MRM7r, MRM7m, "vpslldq", HasBWI>, AVX512PDIi8Base, EVEX_4V; defm VPSRLDQ : avx512_shift_packed_all<0x73, X86vshrdq, MRM3r, MRM3m, "vpsrldq", HasBWI>, AVX512PDIi8Base, EVEX_4V; multiclass avx512_psadbw_packed opc, SDNode OpNode, string OpcodeStr, X86VectorVTInfo _dst, X86VectorVTInfo _src>{ def rr : AVX512BI; let mayLoad = 1 in def rm : AVX512BI; } multiclass avx512_psadbw_packed_all opc, SDNode OpNode, string OpcodeStr, Predicate prd> { let Predicates = [prd] in defm Z512 : avx512_psadbw_packed, EVEX_V512; let Predicates = [prd, HasVLX] in { defm Z256 : avx512_psadbw_packed, EVEX_V256; defm Z128 : avx512_psadbw_packed, EVEX_V128; } } defm VPSADBW : avx512_psadbw_packed_all<0xf6, X86psadbw, "vpsadbw", HasBWI>, EVEX_4V; multiclass avx512_ternlog opc, string OpcodeStr, SDNode OpNode, X86VectorVTInfo _>{ let Constraints = "$src1 = $dst" in { defm rri : AVX512_maskable_3src, AVX512AIi8Base, EVEX_4V; let mayLoad = 1 in { defm rmi : AVX512_maskable_3src, AVX512AIi8Base, EVEX_4V, EVEX_CD8<_.EltSize, CD8VF>; defm rmbi : AVX512_maskable_3src, EVEX_B, AVX512AIi8Base, EVEX_4V, EVEX_CD8<_.EltSize, CD8VF>; } }// Constraints = "$src1 = $dst" } multiclass avx512_common_ternlog{ let Predicates = [HasAVX512] in defm Z : avx512_ternlog<0x25, OpcodeStr, X86vpternlog, _.info512>, EVEX_V512; let Predicates = [HasAVX512, HasVLX] in { defm Z128 : avx512_ternlog<0x25, OpcodeStr, X86vpternlog, _.info128>, EVEX_V128; defm Z256 : avx512_ternlog<0x25, OpcodeStr, X86vpternlog, _.info256>, EVEX_V256; } } defm VPTERNLOGD : avx512_common_ternlog<"vpternlogd", avx512vl_i32_info>; defm VPTERNLOGQ : avx512_common_ternlog<"vpternlogq", avx512vl_i64_info>, VEX_W; Index: vendor/llvm/dist/lib/Transforms/IPO/PruneEH.cpp =================================================================== --- vendor/llvm/dist/lib/Transforms/IPO/PruneEH.cpp (revision 295845) +++ vendor/llvm/dist/lib/Transforms/IPO/PruneEH.cpp (revision 295846) @@ -1,273 +1,266 @@ //===- PruneEH.cpp - Pass which deletes unused exception handlers ---------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. 
//
//===----------------------------------------------------------------------===//
//
// This file implements a simple interprocedural pass which walks the
// call-graph, turning invoke instructions into calls, iff the callee cannot
// throw an exception, and marking functions 'nounwind' if they cannot throw.
// It implements this as a bottom-up traversal of the call-graph.
//
//===----------------------------------------------------------------------===//

#include "llvm/Transforms/IPO.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Analysis/CallGraph.h"
#include "llvm/Analysis/CallGraphSCCPass.h"
#include "llvm/Analysis/EHPersonalities.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/LLVMContext.h"
+#include "llvm/Transforms/Utils/Local.h"
#include <algorithm>
using namespace llvm;

#define DEBUG_TYPE "prune-eh"

STATISTIC(NumRemoved, "Number of invokes removed");
STATISTIC(NumUnreach, "Number of noreturn calls optimized");

namespace {
  struct PruneEH : public CallGraphSCCPass {
    static char ID; // Pass identification, replacement for typeid
    PruneEH() : CallGraphSCCPass(ID) {
      initializePruneEHPass(*PassRegistry::getPassRegistry());
    }

    // runOnSCC - Analyze the SCC, performing the transformation if possible.
    bool runOnSCC(CallGraphSCC &SCC) override;

    bool SimplifyFunction(Function *F);
    void DeleteBasicBlock(BasicBlock *BB);
  };
}

char PruneEH::ID = 0;
INITIALIZE_PASS_BEGIN(PruneEH, "prune-eh",
                "Remove unused exception handling info", false, false)
INITIALIZE_PASS_DEPENDENCY(CallGraphWrapperPass)
INITIALIZE_PASS_END(PruneEH, "prune-eh",
                "Remove unused exception handling info", false, false)

Pass *llvm::createPruneEHPass() { return new PruneEH(); }

bool PruneEH::runOnSCC(CallGraphSCC &SCC) {
  SmallPtrSet<CallGraphNode *, 8> SCCNodes;
  CallGraph &CG = getAnalysis<CallGraphWrapperPass>().getCallGraph();
  bool MadeChange = false;

  // Fill SCCNodes with the elements of the SCC.  Used for quickly
  // looking up whether a given CallGraphNode is in this SCC.
  for (CallGraphSCC::iterator I = SCC.begin(), E = SCC.end(); I != E; ++I)
    SCCNodes.insert(*I);

  // First pass, scan all of the functions in the SCC, simplifying them
  // according to what we know.
  for (CallGraphSCC::iterator I = SCC.begin(), E = SCC.end(); I != E; ++I)
    if (Function *F = (*I)->getFunction())
      MadeChange |= SimplifyFunction(F);

  // Next, check to see if any callees might throw or if there are any external
  // functions in this SCC: if so, we cannot prune any functions in this SCC.
  // Definitions that are weak and not declared non-throwing might be
  // overridden at linktime with something that throws, so assume that.
  // If this SCC includes the unwind instruction, we KNOW it throws, so
  // obviously the SCC might throw.
  //
  bool SCCMightUnwind = false, SCCMightReturn = false;
  for (CallGraphSCC::iterator I = SCC.begin(), E = SCC.end();
       (!SCCMightUnwind || !SCCMightReturn) && I != E; ++I) {
    Function *F = (*I)->getFunction();
    if (!F) {
      SCCMightUnwind = true;
      SCCMightReturn = true;
    } else if (F->isDeclaration() || F->mayBeOverridden()) {
      SCCMightUnwind |= !F->doesNotThrow();
      SCCMightReturn |= !F->doesNotReturn();
    } else {
      bool CheckUnwind = !SCCMightUnwind && !F->doesNotThrow();
      bool CheckReturn = !SCCMightReturn && !F->doesNotReturn();
      // Determine if we should scan for InlineAsm in a naked function as it
      // is the only way to return without a ReturnInst.  Only do this for
      // no-inline functions as functions which may be inlined cannot
      // meaningfully return via assembly.
      bool CheckReturnViaAsm = CheckReturn &&
                               F->hasFnAttribute(Attribute::Naked) &&
                               F->hasFnAttribute(Attribute::NoInline);

      if (!CheckUnwind && !CheckReturn)
        continue;

      for (const BasicBlock &BB : *F) {
        const TerminatorInst *TI = BB.getTerminator();
        if (CheckUnwind && TI->mayThrow()) {
          SCCMightUnwind = true;
        } else if (CheckReturn && isa<ReturnInst>(TI)) {
          SCCMightReturn = true;
        }

        for (const Instruction &I : BB) {
          if ((!CheckUnwind || SCCMightUnwind) &&
              (!CheckReturnViaAsm || SCCMightReturn))
            break;

          // Check to see if this function performs an unwind or calls an
          // unwinding function.
          if (CheckUnwind && !SCCMightUnwind && I.mayThrow()) {
            bool InstMightUnwind = true;
            if (const auto *CI = dyn_cast<CallInst>(&I)) {
              if (Function *Callee = CI->getCalledFunction()) {
                CallGraphNode *CalleeNode = CG[Callee];
                // If the callee is outside our current SCC then we may throw
                // because it might.  If it is inside, do nothing.
                if (SCCNodes.count(CalleeNode) > 0)
                  InstMightUnwind = false;
              }
            }
            SCCMightUnwind |= InstMightUnwind;
          }
          if (CheckReturnViaAsm && !SCCMightReturn)
            if (auto ICS = ImmutableCallSite(&I))
              if (const auto *IA = dyn_cast<InlineAsm>(ICS.getCalledValue()))
                if (IA->hasSideEffects())
                  SCCMightReturn = true;
        }

        if (SCCMightUnwind && SCCMightReturn)
          break;
      }
    }
  }

  // If the SCC doesn't unwind or doesn't return, note this fact.
  if (!SCCMightUnwind || !SCCMightReturn)
    for (CallGraphSCC::iterator I = SCC.begin(), E = SCC.end(); I != E; ++I) {
      Function *F = (*I)->getFunction();
      if (!SCCMightUnwind && !F->hasFnAttribute(Attribute::NoUnwind)) {
        F->addFnAttr(Attribute::NoUnwind);
        MadeChange = true;
      }
      if (!SCCMightReturn && !F->hasFnAttribute(Attribute::NoReturn)) {
        F->addFnAttr(Attribute::NoReturn);
        MadeChange = true;
      }
    }

  for (CallGraphSCC::iterator I = SCC.begin(), E = SCC.end(); I != E; ++I) {
    // Convert any invoke instructions to non-throwing functions in this node
    // into call instructions with a branch.  This makes the exception blocks
    // dead.
    if (Function *F = (*I)->getFunction())
      MadeChange |= SimplifyFunction(F);
  }

  return MadeChange;
}

// SimplifyFunction - Given information about callees, simplify the specified
// function if we have invokes to non-unwinding functions or code after calls to
// no-return functions.
bool PruneEH::SimplifyFunction(Function *F) {
  bool MadeChange = false;
  for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB) {
    if (InvokeInst *II = dyn_cast<InvokeInst>(BB->getTerminator()))
      if (II->doesNotThrow() && canSimplifyInvokeNoUnwind(F)) {
-       SmallVector<Value *, 8> Args(II->arg_begin(), II->arg_end());
-       SmallVector<OperandBundleDef, 1> OpBundles;
-       II->getOperandBundlesAsDefs(OpBundles);
-
-       // Insert a call instruction before the invoke.
-       CallInst *Call = CallInst::Create(II->getCalledValue(), Args, OpBundles,
-                                         "", II);
-       Call->takeName(II);
-       Call->setCallingConv(II->getCallingConv());
-       Call->setAttributes(II->getAttributes());
-       Call->setDebugLoc(II->getDebugLoc());
-
-       // Anything that used the value produced by the invoke instruction
-       // now uses the value produced by the call instruction.  Note that we
-       // do this even for void functions and calls with no uses so that the
-       // callgraph edge is updated.
-       II->replaceAllUsesWith(Call);
        BasicBlock *UnwindBlock = II->getUnwindDest();
-       UnwindBlock->removePredecessor(II->getParent());
+       removeUnwindEdge(&*BB);

-       // Insert a branch to the normal destination right before the
-       // invoke.
-       BranchInst::Create(II->getNormalDest(), II);
-
-       // Finally, delete the invoke instruction!
-       BB->getInstList().pop_back();
-
        // If the unwind block is now dead, nuke it.
        if (pred_empty(UnwindBlock))
          DeleteBasicBlock(UnwindBlock); // Delete the new BB.

        ++NumRemoved;
        MadeChange = true;
      }

    for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; )
      if (CallInst *CI = dyn_cast<CallInst>(I++))
        if (CI->doesNotReturn() && !isa<UnreachableInst>(I)) {
          // This call calls a function that cannot return.  Insert an
          // unreachable instruction after it and simplify the code.  Do this
          // by splitting the BB, adding the unreachable, then deleting the
          // new BB.
          BasicBlock *New = BB->splitBasicBlock(I);

          // Remove the uncond branch and add an unreachable.
          BB->getInstList().pop_back();
          new UnreachableInst(BB->getContext(), &*BB);

          DeleteBasicBlock(New); // Delete the new BB.
          MadeChange = true;
          ++NumUnreach;
          break;
        }
  }
  return MadeChange;
}

/// DeleteBasicBlock - remove the specified basic block from the program,
/// updating the callgraph to reflect any now-obsolete edges due to calls that
/// exist in the BB.
void PruneEH::DeleteBasicBlock(BasicBlock *BB) {
  assert(pred_empty(BB) && "BB is not dead!");
  CallGraph &CG = getAnalysis<CallGraphWrapperPass>().getCallGraph();

+ Instruction *TokenInst = nullptr;
+
  CallGraphNode *CGN = CG[BB->getParent()];
  for (BasicBlock::iterator I = BB->end(), E = BB->begin(); I != E; ) {
    --I;
-   if (CallInst *CI = dyn_cast<CallInst>(I)) {
-     if (!isa<IntrinsicInst>(I))
-       CGN->removeCallEdgeFor(CI);
-   } else if (InvokeInst *II = dyn_cast<InvokeInst>(I))
-     CGN->removeCallEdgeFor(II);
+
+   if (I->getType()->isTokenTy()) {
+     TokenInst = &*I;
+     break;
+   }
+
+   if (auto CS = CallSite(&*I)) {
+     const Function *Callee = CS.getCalledFunction();
+     if (!Callee || !Intrinsic::isLeaf(Callee->getIntrinsicID()))
+       CGN->removeCallEdgeFor(CS);
+     else if (!Callee->isIntrinsic())
+       CGN->removeCallEdgeFor(CS);
+   }
+
    if (!I->use_empty())
      I->replaceAllUsesWith(UndefValue::get(I->getType()));
  }

- // Get the list of successors of this block.
- std::vector<BasicBlock *> Succs(succ_begin(BB), succ_end(BB));
+ if (TokenInst) {
+   if (!isa<TerminatorInst>(TokenInst))
+     changeToUnreachable(TokenInst->getNextNode(), /*UseLLVMTrap=*/false);
+ } else {
+   // Get the list of successors of this block.
Index: vendor/llvm/dist/lib/Transforms/Scalar/LoopStrengthReduce.cpp
===================================================================
--- vendor/llvm/dist/lib/Transforms/Scalar/LoopStrengthReduce.cpp	(revision 295845)
+++ vendor/llvm/dist/lib/Transforms/Scalar/LoopStrengthReduce.cpp	(revision 295846)
@@ -1,5024 +1,5035 @@
//===- LoopStrengthReduce.cpp - Strength Reduce IVs in Loops --------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This transformation analyzes and transforms the induction variables (and
// computations derived from them) into forms suitable for efficient execution
// on the target.
//
// This pass performs a strength reduction on array references inside loops
// that have as one or more of their components the loop induction variable;
// it rewrites expressions to take advantage of scaled-index addressing modes
// available on the target, and it performs a variety of other optimizations
// related to loop induction variables.
//
// Terminology note: this code has a lot of handling for "post-increment" or
// "post-inc" users. This is not talking about post-increment addressing modes;
// it is instead talking about code like this:
//
//   %i = phi [ 0, %entry ], [ %i.next, %latch ]
//   ...
//   %i.next = add %i, 1
//   %c = icmp eq %i.next, %n
//
// The SCEV for %i is {0,+,1}<%L>. The SCEV for %i.next is {1,+,1}<%L>; however,
// it's useful to think about these as the same register, with some uses using
// the value of the register before the add and some using it after. In this
// example, the icmp is a post-increment user, since it uses %i.next, which is
// the value of the induction variable after the increment. The other common
// case of post-increment users is users outside the loop.
//
// TODO: More sophistication in the way Formulae are generated and filtered.
//
// TODO: Handle multiple loops at a time.
//
// TODO: Should the addressing mode BaseGV be changed to a ConstantExpr
//       instead of a GlobalValue?
//
// TODO: When truncation is free, truncate ICmp users' operands to make it a
//       smaller encoding (on x86 at least).
//
// TODO: When a negated register is used by an add (such as in a list of
//       multiple base registers, or as the increment expression in an addrec),
//       we may not actually need both reg and (-1 * reg) in registers; the
//       negation can be implemented by using a sub instead of an add. The
//       lack of support for taking this into consideration when making
//       register pressure decisions is partly worked around by the "Special"
//       use kind.
//
//===----------------------------------------------------------------------===//
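// [Editorial illustration -- not part of the original header] The classic
// strength reduction the description above refers to, in C-like form:
//
//   // before                        // after (conceptually)
//   for (i = 0; i != n; ++i)         for (p = a; p != a + n; ++p)
//     use(a[i]);                       use(*p);
//
// The multiply hidden in a[i] (address a + i * sizeof(*a)) is replaced by a
// cheap pointer increment, or folded into a scaled-index addressing mode
// where the target supports one.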
#include "llvm/Transforms/Scalar.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallBitVector.h"
#include "llvm/Analysis/IVUsers.h"
#include "llvm/Analysis/LoopPass.h"
#include "llvm/Analysis/ScalarEvolutionExpander.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/ValueHandle.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Local.h"
#include <algorithm>
using namespace llvm;

#define DEBUG_TYPE "loop-reduce"

/// MaxIVUsers is an arbitrary threshold that provides an early opportunity
/// to bail out. This threshold is far beyond the number of users that LSR can
/// conceivably solve, so it should not affect generated code, but catches the
/// worst cases before LSR burns too much compile time and stack space.
static const unsigned MaxIVUsers = 200;

// Temporary flag to cleanup congruent phis after LSR phi expansion.
// It's currently disabled until we can determine whether it's truly useful or
// not. The flag should be removed after the v3.0 release.
// This is now needed for ivchains.
static cl::opt<bool> EnablePhiElim(
  "enable-lsr-phielim", cl::Hidden, cl::init(true),
  cl::desc("Enable LSR phi elimination"));

#ifndef NDEBUG
// Stress test IV chain generation.
static cl::opt<bool> StressIVChain(
  "stress-ivchain", cl::Hidden, cl::init(false),
  cl::desc("Stress test LSR IV chains"));
#else
static bool StressIVChain = false;
#endif

namespace {

struct MemAccessTy {
  /// Used in situations where the accessed memory type is unknown.
  static const unsigned UnknownAddressSpace = ~0u;

  Type *MemTy;
  unsigned AddrSpace;

  MemAccessTy() : MemTy(nullptr), AddrSpace(UnknownAddressSpace) {}

  MemAccessTy(Type *Ty, unsigned AS) : MemTy(Ty), AddrSpace(AS) {}

  bool operator==(MemAccessTy Other) const {
    return MemTy == Other.MemTy && AddrSpace == Other.AddrSpace;
  }

  bool operator!=(MemAccessTy Other) const { return !(*this == Other); }

  static MemAccessTy getUnknown(LLVMContext &Ctx) {
    return MemAccessTy(Type::getVoidTy(Ctx), UnknownAddressSpace);
  }
};

/// This class holds data which is used to order reuse candidates.
class RegSortData {
public:
  /// This represents the set of LSRUse indices which reference
  /// a particular register.
  SmallBitVector UsedByIndices;

  void print(raw_ostream &OS) const;
  void dump() const;
};

}

void RegSortData::print(raw_ostream &OS) const {
  OS << "[NumUses=" << UsedByIndices.count() << ']';
}

LLVM_DUMP_METHOD
void RegSortData::dump() const {
  print(errs()); errs() << '\n';
}
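// [Editorial illustration -- not part of the original source] RegUseTracker
// (below) keys RegSortData::UsedByIndices by LSRUse index, so reuse queries
// reduce to bit tests. For example, assuming a SCEV expression Reg:
//
//   RegUseTracker RU;
//   RU.countRegister(Reg, /*LUIdx=*/0);
//   RU.countRegister(Reg, /*LUIdx=*/2);
//   RU.isRegUsedByUsesOtherThan(Reg, 0);  // true: use #2 also references Reg
//   RU.isRegUsedByUsesOtherThan(Reg, 1);  // true: uses #0 and #2 do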
namespace {

/// Map register candidates to information about how they are used.
class RegUseTracker {
  typedef DenseMap<const SCEV *, RegSortData> RegUsesTy;

  RegUsesTy RegUsesMap;
  SmallVector<const SCEV *, 16> RegSequence;

public:
  void countRegister(const SCEV *Reg, size_t LUIdx);
  void dropRegister(const SCEV *Reg, size_t LUIdx);
  void swapAndDropUse(size_t LUIdx, size_t LastLUIdx);

  bool isRegUsedByUsesOtherThan(const SCEV *Reg, size_t LUIdx) const;

  const SmallBitVector &getUsedByIndices(const SCEV *Reg) const;

  void clear();

  typedef SmallVectorImpl<const SCEV *>::iterator iterator;
  typedef SmallVectorImpl<const SCEV *>::const_iterator const_iterator;
  iterator begin() { return RegSequence.begin(); }
  iterator end() { return RegSequence.end(); }
  const_iterator begin() const { return RegSequence.begin(); }
  const_iterator end() const { return RegSequence.end(); }
};

}

void
RegUseTracker::countRegister(const SCEV *Reg, size_t LUIdx) {
  std::pair<RegUsesTy::iterator, bool> Pair =
    RegUsesMap.insert(std::make_pair(Reg, RegSortData()));
  RegSortData &RSD = Pair.first->second;
  if (Pair.second)
    RegSequence.push_back(Reg);
  RSD.UsedByIndices.resize(std::max(RSD.UsedByIndices.size(), LUIdx + 1));
  RSD.UsedByIndices.set(LUIdx);
}

void
RegUseTracker::dropRegister(const SCEV *Reg, size_t LUIdx) {
  RegUsesTy::iterator It = RegUsesMap.find(Reg);
  assert(It != RegUsesMap.end());
  RegSortData &RSD = It->second;
  assert(RSD.UsedByIndices.size() > LUIdx);
  RSD.UsedByIndices.reset(LUIdx);
}

void
RegUseTracker::swapAndDropUse(size_t LUIdx, size_t LastLUIdx) {
  assert(LUIdx <= LastLUIdx);

  // Update RegUses. The data structure is not optimized for this purpose;
  // we must iterate through it and update each of the bit vectors.
  for (auto &Pair : RegUsesMap) {
    SmallBitVector &UsedByIndices = Pair.second.UsedByIndices;
    if (LUIdx < UsedByIndices.size())
      UsedByIndices[LUIdx] =
        LastLUIdx < UsedByIndices.size() ? UsedByIndices[LastLUIdx] : 0;
    UsedByIndices.resize(std::min(UsedByIndices.size(), LastLUIdx));
  }
}

bool
RegUseTracker::isRegUsedByUsesOtherThan(const SCEV *Reg, size_t LUIdx) const {
  RegUsesTy::const_iterator I = RegUsesMap.find(Reg);
  if (I == RegUsesMap.end())
    return false;
  const SmallBitVector &UsedByIndices = I->second.UsedByIndices;
  int i = UsedByIndices.find_first();
  if (i == -1) return false;
  if ((size_t)i != LUIdx) return true;
  return UsedByIndices.find_next(i) != -1;
}

const SmallBitVector &RegUseTracker::getUsedByIndices(const SCEV *Reg) const {
  RegUsesTy::const_iterator I = RegUsesMap.find(Reg);
  assert(I != RegUsesMap.end() && "Unknown register!");
  return I->second.UsedByIndices;
}

void RegUseTracker::clear() {
  RegUsesMap.clear();
  RegSequence.clear();
}

namespace {

/// This class holds information that describes a formula for computing a
/// value satisfying a use. It may include broken-out immediates and scaled
/// registers.
struct Formula {
  /// Global base address used for complex addressing.
  GlobalValue *BaseGV;

  /// Base offset for complex addressing.
  int64_t BaseOffset;

  /// Whether any complex addressing has a base register.
  bool HasBaseReg;

  /// The scale of any complex addressing.
  int64_t Scale;

  /// The list of "base" registers for this use. When this is non-empty, the
  /// canonical representation of a formula is
  /// 1. BaseRegs.size > 1 implies ScaledReg != NULL and
  /// 2. ScaledReg != NULL implies Scale != 1 || !BaseRegs.empty().
  /// #1 enforces that the scaled register is always used when at least two
  /// registers are needed by the formula: e.g., reg1 + reg2 is reg1 + 1 * reg2.
  /// #2 enforces that 1 * reg is reg.
  /// This invariant can be temporarily broken while building a formula.
  /// However, every formula inserted into the LSRInstance must be in canonical
  /// form.
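  /// [Editorial illustration -- not part of the original comment] For
  /// example, the two-register sum reg1 + reg2 is canonically stored as
  ///
  ///   BaseRegs  = { reg1 }
  ///   ScaledReg = reg2
  ///   Scale     = 1        // reg1 + 1*reg2, satisfying invariant #1
  ///
  /// while a lone register is stored as BaseRegs = { reg } with a null
  /// ScaledReg, satisfying invariant #2.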
SmallVector BaseRegs; /// The 'scaled' register for this use. This should be non-null when Scale is /// not zero. const SCEV *ScaledReg; /// An additional constant offset which added near the use. This requires a /// temporary register, but the offset itself can live in an add immediate /// field rather than a register. int64_t UnfoldedOffset; Formula() : BaseGV(nullptr), BaseOffset(0), HasBaseReg(false), Scale(0), ScaledReg(nullptr), UnfoldedOffset(0) {} void initialMatch(const SCEV *S, Loop *L, ScalarEvolution &SE); bool isCanonical() const; void canonicalize(); bool unscale(); size_t getNumRegs() const; Type *getType() const; void deleteBaseReg(const SCEV *&S); bool referencesReg(const SCEV *S) const; bool hasRegsUsedByUsesOtherThan(size_t LUIdx, const RegUseTracker &RegUses) const; void print(raw_ostream &OS) const; void dump() const; }; } /// Recursion helper for initialMatch. static void DoInitialMatch(const SCEV *S, Loop *L, SmallVectorImpl &Good, SmallVectorImpl &Bad, ScalarEvolution &SE) { // Collect expressions which properly dominate the loop header. if (SE.properlyDominates(S, L->getHeader())) { Good.push_back(S); return; } // Look at add operands. if (const SCEVAddExpr *Add = dyn_cast(S)) { for (const SCEV *S : Add->operands()) DoInitialMatch(S, L, Good, Bad, SE); return; } // Look at addrec operands. if (const SCEVAddRecExpr *AR = dyn_cast(S)) if (!AR->getStart()->isZero()) { DoInitialMatch(AR->getStart(), L, Good, Bad, SE); DoInitialMatch(SE.getAddRecExpr(SE.getConstant(AR->getType(), 0), AR->getStepRecurrence(SE), // FIXME: AR->getNoWrapFlags() AR->getLoop(), SCEV::FlagAnyWrap), L, Good, Bad, SE); return; } // Handle a multiplication by -1 (negation) if it didn't fold. if (const SCEVMulExpr *Mul = dyn_cast(S)) if (Mul->getOperand(0)->isAllOnesValue()) { SmallVector Ops(Mul->op_begin()+1, Mul->op_end()); const SCEV *NewMul = SE.getMulExpr(Ops); SmallVector MyGood; SmallVector MyBad; DoInitialMatch(NewMul, L, MyGood, MyBad, SE); const SCEV *NegOne = SE.getSCEV(ConstantInt::getAllOnesValue( SE.getEffectiveSCEVType(NewMul->getType()))); for (const SCEV *S : MyGood) Good.push_back(SE.getMulExpr(NegOne, S)); for (const SCEV *S : MyBad) Bad.push_back(SE.getMulExpr(NegOne, S)); return; } // Ok, we can't do anything interesting. Just stuff the whole thing into a // register and hope for the best. Bad.push_back(S); } /// Incorporate loop-variant parts of S into this Formula, attempting to keep /// all loop-invariant and loop-computable values in a single base register. void Formula::initialMatch(const SCEV *S, Loop *L, ScalarEvolution &SE) { SmallVector Good; SmallVector Bad; DoInitialMatch(S, L, Good, Bad, SE); if (!Good.empty()) { const SCEV *Sum = SE.getAddExpr(Good); if (!Sum->isZero()) BaseRegs.push_back(Sum); HasBaseReg = true; } if (!Bad.empty()) { const SCEV *Sum = SE.getAddExpr(Bad); if (!Sum->isZero()) BaseRegs.push_back(Sum); HasBaseReg = true; } canonicalize(); } /// \brief Check whether or not this formula statisfies the canonical /// representation. /// \see Formula::BaseRegs. bool Formula::isCanonical() const { if (ScaledReg) return Scale != 1 || !BaseRegs.empty(); return BaseRegs.size() <= 1; } /// \brief Helper method to morph a formula into its canonical representation. /// \see Formula::BaseRegs. /// Every formula having more than one base register, must use the ScaledReg /// field. Otherwise, we would have to do special cases everywhere in LSR /// to treat reg1 + reg2 + ... the same way as reg1 + 1*reg2 + ... 
/// On the other hand, 1*reg should be canonicalized into reg. void Formula::canonicalize() { if (isCanonical()) return; // So far we did not need this case. This is easy to implement but it is // useless to maintain dead code. Beside it could hurt compile time. assert(!BaseRegs.empty() && "1*reg => reg, should not be needed."); // Keep the invariant sum in BaseRegs and one of the variant sum in ScaledReg. ScaledReg = BaseRegs.back(); BaseRegs.pop_back(); Scale = 1; size_t BaseRegsSize = BaseRegs.size(); size_t Try = 0; // If ScaledReg is an invariant, try to find a variant expression. while (Try < BaseRegsSize && !isa(ScaledReg)) std::swap(ScaledReg, BaseRegs[Try++]); } /// \brief Get rid of the scale in the formula. /// In other words, this method morphes reg1 + 1*reg2 into reg1 + reg2. /// \return true if it was possible to get rid of the scale, false otherwise. /// \note After this operation the formula may not be in the canonical form. bool Formula::unscale() { if (Scale != 1) return false; Scale = 0; BaseRegs.push_back(ScaledReg); ScaledReg = nullptr; return true; } /// Return the total number of register operands used by this formula. This does /// not include register uses implied by non-constant addrec strides. size_t Formula::getNumRegs() const { return !!ScaledReg + BaseRegs.size(); } /// Return the type of this formula, if it has one, or null otherwise. This type /// is meaningless except for the bit size. Type *Formula::getType() const { return !BaseRegs.empty() ? BaseRegs.front()->getType() : ScaledReg ? ScaledReg->getType() : BaseGV ? BaseGV->getType() : nullptr; } /// Delete the given base reg from the BaseRegs list. void Formula::deleteBaseReg(const SCEV *&S) { if (&S != &BaseRegs.back()) std::swap(S, BaseRegs.back()); BaseRegs.pop_back(); } /// Test if this formula references the given register. bool Formula::referencesReg(const SCEV *S) const { return S == ScaledReg || std::find(BaseRegs.begin(), BaseRegs.end(), S) != BaseRegs.end(); } /// Test whether this formula uses registers which are used by uses other than /// the use with the given index. bool Formula::hasRegsUsedByUsesOtherThan(size_t LUIdx, const RegUseTracker &RegUses) const { if (ScaledReg) if (RegUses.isRegUsedByUsesOtherThan(ScaledReg, LUIdx)) return true; for (const SCEV *BaseReg : BaseRegs) if (RegUses.isRegUsedByUsesOtherThan(BaseReg, LUIdx)) return true; return false; } void Formula::print(raw_ostream &OS) const { bool First = true; if (BaseGV) { if (!First) OS << " + "; else First = false; BaseGV->printAsOperand(OS, /*PrintType=*/false); } if (BaseOffset != 0) { if (!First) OS << " + "; else First = false; OS << BaseOffset; } for (const SCEV *BaseReg : BaseRegs) { if (!First) OS << " + "; else First = false; OS << "reg(" << *BaseReg << ')'; } if (HasBaseReg && BaseRegs.empty()) { if (!First) OS << " + "; else First = false; OS << "**error: HasBaseReg**"; } else if (!HasBaseReg && !BaseRegs.empty()) { if (!First) OS << " + "; else First = false; OS << "**error: !HasBaseReg**"; } if (Scale != 0) { if (!First) OS << " + "; else First = false; OS << Scale << "*reg("; if (ScaledReg) OS << *ScaledReg; else OS << ""; OS << ')'; } if (UnfoldedOffset != 0) { if (!First) OS << " + "; OS << "imm(" << UnfoldedOffset << ')'; } } LLVM_DUMP_METHOD void Formula::dump() const { print(errs()); errs() << '\n'; } /// Return true if the given addrec can be sign-extended without changing its /// value. 
static bool isAddRecSExtable(const SCEVAddRecExpr *AR, ScalarEvolution &SE) { Type *WideTy = IntegerType::get(SE.getContext(), SE.getTypeSizeInBits(AR->getType()) + 1); return isa(SE.getSignExtendExpr(AR, WideTy)); } /// Return true if the given add can be sign-extended without changing its /// value. static bool isAddSExtable(const SCEVAddExpr *A, ScalarEvolution &SE) { Type *WideTy = IntegerType::get(SE.getContext(), SE.getTypeSizeInBits(A->getType()) + 1); return isa(SE.getSignExtendExpr(A, WideTy)); } /// Return true if the given mul can be sign-extended without changing its /// value. static bool isMulSExtable(const SCEVMulExpr *M, ScalarEvolution &SE) { Type *WideTy = IntegerType::get(SE.getContext(), SE.getTypeSizeInBits(M->getType()) * M->getNumOperands()); return isa(SE.getSignExtendExpr(M, WideTy)); } /// Return an expression for LHS /s RHS, if it can be determined and if the /// remainder is known to be zero, or null otherwise. If IgnoreSignificantBits /// is true, expressions like (X * Y) /s Y are simplified to Y, ignoring that /// the multiplication may overflow, which is useful when the result will be /// used in a context where the most significant bits are ignored. static const SCEV *getExactSDiv(const SCEV *LHS, const SCEV *RHS, ScalarEvolution &SE, bool IgnoreSignificantBits = false) { // Handle the trivial case, which works for any SCEV type. if (LHS == RHS) return SE.getConstant(LHS->getType(), 1); // Handle a few RHS special cases. const SCEVConstant *RC = dyn_cast(RHS); if (RC) { const APInt &RA = RC->getAPInt(); // Handle x /s -1 as x * -1, to give ScalarEvolution a chance to do // some folding. if (RA.isAllOnesValue()) return SE.getMulExpr(LHS, RC); // Handle x /s 1 as x. if (RA == 1) return LHS; } // Check for a division of a constant by a constant. if (const SCEVConstant *C = dyn_cast(LHS)) { if (!RC) return nullptr; const APInt &LA = C->getAPInt(); const APInt &RA = RC->getAPInt(); if (LA.srem(RA) != 0) return nullptr; return SE.getConstant(LA.sdiv(RA)); } // Distribute the sdiv over addrec operands, if the addrec doesn't overflow. if (const SCEVAddRecExpr *AR = dyn_cast(LHS)) { if (IgnoreSignificantBits || isAddRecSExtable(AR, SE)) { const SCEV *Step = getExactSDiv(AR->getStepRecurrence(SE), RHS, SE, IgnoreSignificantBits); if (!Step) return nullptr; const SCEV *Start = getExactSDiv(AR->getStart(), RHS, SE, IgnoreSignificantBits); if (!Start) return nullptr; // FlagNW is independent of the start value, step direction, and is // preserved with smaller magnitude steps. // FIXME: AR->getNoWrapFlags(SCEV::FlagNW) return SE.getAddRecExpr(Start, Step, AR->getLoop(), SCEV::FlagAnyWrap); } return nullptr; } // Distribute the sdiv over add operands, if the add doesn't overflow. if (const SCEVAddExpr *Add = dyn_cast(LHS)) { if (IgnoreSignificantBits || isAddSExtable(Add, SE)) { SmallVector Ops; for (const SCEV *S : Add->operands()) { const SCEV *Op = getExactSDiv(S, RHS, SE, IgnoreSignificantBits); if (!Op) return nullptr; Ops.push_back(Op); } return SE.getAddExpr(Ops); } return nullptr; } // Check for a multiply operand that we can pull RHS out of. if (const SCEVMulExpr *Mul = dyn_cast(LHS)) { if (IgnoreSignificantBits || isMulSExtable(Mul, SE)) { SmallVector Ops; bool Found = false; for (const SCEV *S : Mul->operands()) { if (!Found) if (const SCEV *Q = getExactSDiv(S, RHS, SE, IgnoreSignificantBits)) { S = Q; Found = true; } Ops.push_back(S); } return Found ? SE.getMulExpr(Ops) : nullptr; } return nullptr; } // Otherwise we don't know. 
return nullptr; } /// If S involves the addition of a constant integer value, return that integer /// value, and mutate S to point to a new SCEV with that value excluded. static int64_t ExtractImmediate(const SCEV *&S, ScalarEvolution &SE) { if (const SCEVConstant *C = dyn_cast(S)) { if (C->getAPInt().getMinSignedBits() <= 64) { S = SE.getConstant(C->getType(), 0); return C->getValue()->getSExtValue(); } } else if (const SCEVAddExpr *Add = dyn_cast(S)) { SmallVector NewOps(Add->op_begin(), Add->op_end()); int64_t Result = ExtractImmediate(NewOps.front(), SE); if (Result != 0) S = SE.getAddExpr(NewOps); return Result; } else if (const SCEVAddRecExpr *AR = dyn_cast(S)) { SmallVector NewOps(AR->op_begin(), AR->op_end()); int64_t Result = ExtractImmediate(NewOps.front(), SE); if (Result != 0) S = SE.getAddRecExpr(NewOps, AR->getLoop(), // FIXME: AR->getNoWrapFlags(SCEV::FlagNW) SCEV::FlagAnyWrap); return Result; } return 0; } /// If S involves the addition of a GlobalValue address, return that symbol, and /// mutate S to point to a new SCEV with that value excluded. static GlobalValue *ExtractSymbol(const SCEV *&S, ScalarEvolution &SE) { if (const SCEVUnknown *U = dyn_cast(S)) { if (GlobalValue *GV = dyn_cast(U->getValue())) { S = SE.getConstant(GV->getType(), 0); return GV; } } else if (const SCEVAddExpr *Add = dyn_cast(S)) { SmallVector NewOps(Add->op_begin(), Add->op_end()); GlobalValue *Result = ExtractSymbol(NewOps.back(), SE); if (Result) S = SE.getAddExpr(NewOps); return Result; } else if (const SCEVAddRecExpr *AR = dyn_cast(S)) { SmallVector NewOps(AR->op_begin(), AR->op_end()); GlobalValue *Result = ExtractSymbol(NewOps.front(), SE); if (Result) S = SE.getAddRecExpr(NewOps, AR->getLoop(), // FIXME: AR->getNoWrapFlags(SCEV::FlagNW) SCEV::FlagAnyWrap); return Result; } return nullptr; } /// Returns true if the specified instruction is using the specified value as an /// address. static bool isAddressUse(Instruction *Inst, Value *OperandVal) { bool isAddress = isa(Inst); if (StoreInst *SI = dyn_cast(Inst)) { if (SI->getOperand(1) == OperandVal) isAddress = true; } else if (IntrinsicInst *II = dyn_cast(Inst)) { // Addressing modes can also be folded into prefetches and a variety // of intrinsics. switch (II->getIntrinsicID()) { default: break; case Intrinsic::prefetch: case Intrinsic::x86_sse_storeu_ps: case Intrinsic::x86_sse2_storeu_pd: case Intrinsic::x86_sse2_storeu_dq: case Intrinsic::x86_sse2_storel_dq: if (II->getArgOperand(0) == OperandVal) isAddress = true; break; } } return isAddress; } /// Return the type of the memory being accessed. static MemAccessTy getAccessType(const Instruction *Inst) { MemAccessTy AccessTy(Inst->getType(), MemAccessTy::UnknownAddressSpace); if (const StoreInst *SI = dyn_cast(Inst)) { AccessTy.MemTy = SI->getOperand(0)->getType(); AccessTy.AddrSpace = SI->getPointerAddressSpace(); } else if (const LoadInst *LI = dyn_cast(Inst)) { AccessTy.AddrSpace = LI->getPointerAddressSpace(); } else if (const IntrinsicInst *II = dyn_cast(Inst)) { // Addressing modes can also be folded into prefetches and a variety // of intrinsics. switch (II->getIntrinsicID()) { default: break; case Intrinsic::x86_sse_storeu_ps: case Intrinsic::x86_sse2_storeu_pd: case Intrinsic::x86_sse2_storeu_dq: case Intrinsic::x86_sse2_storel_dq: AccessTy.MemTy = II->getArgOperand(0)->getType(); break; } } // All pointers have the same requirements, so canonicalize them to an // arbitrary pointer type to minimize variation. 
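// [Editorial note -- illustrative, not in the original source] E.g., i32*
// and float* in the same address space both become i1 addrspace(AS)* below,
// so formulae for accesses with different pointee types can share registers.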
if (PointerType *PTy = dyn_cast(AccessTy.MemTy)) AccessTy.MemTy = PointerType::get(IntegerType::get(PTy->getContext(), 1), PTy->getAddressSpace()); return AccessTy; } /// Return true if this AddRec is already a phi in its loop. static bool isExistingPhi(const SCEVAddRecExpr *AR, ScalarEvolution &SE) { for (BasicBlock::iterator I = AR->getLoop()->getHeader()->begin(); PHINode *PN = dyn_cast(I); ++I) { if (SE.isSCEVable(PN->getType()) && (SE.getEffectiveSCEVType(PN->getType()) == SE.getEffectiveSCEVType(AR->getType())) && SE.getSCEV(PN) == AR) return true; } return false; } /// Check if expanding this expression is likely to incur significant cost. This /// is tricky because SCEV doesn't track which expressions are actually computed /// by the current IR. /// /// We currently allow expansion of IV increments that involve adds, /// multiplication by constants, and AddRecs from existing phis. /// /// TODO: Allow UDivExpr if we can find an existing IV increment that is an /// obvious multiple of the UDivExpr. static bool isHighCostExpansion(const SCEV *S, SmallPtrSetImpl &Processed, ScalarEvolution &SE) { // Zero/One operand expressions switch (S->getSCEVType()) { case scUnknown: case scConstant: return false; case scTruncate: return isHighCostExpansion(cast(S)->getOperand(), Processed, SE); case scZeroExtend: return isHighCostExpansion(cast(S)->getOperand(), Processed, SE); case scSignExtend: return isHighCostExpansion(cast(S)->getOperand(), Processed, SE); } if (!Processed.insert(S).second) return false; if (const SCEVAddExpr *Add = dyn_cast(S)) { for (const SCEV *S : Add->operands()) { if (isHighCostExpansion(S, Processed, SE)) return true; } return false; } if (const SCEVMulExpr *Mul = dyn_cast(S)) { if (Mul->getNumOperands() == 2) { // Multiplication by a constant is ok if (isa(Mul->getOperand(0))) return isHighCostExpansion(Mul->getOperand(1), Processed, SE); // If we have the value of one operand, check if an existing // multiplication already generates this expression. if (const SCEVUnknown *U = dyn_cast(Mul->getOperand(1))) { Value *UVal = U->getValue(); for (User *UR : UVal->users()) { // If U is a constant, it may be used by a ConstantExpr. Instruction *UI = dyn_cast(UR); if (UI && UI->getOpcode() == Instruction::Mul && SE.isSCEVable(UI->getType())) { return SE.getSCEV(UI) == Mul; } } } } } if (const SCEVAddRecExpr *AR = dyn_cast(S)) { if (isExistingPhi(AR, SE)) return false; } // Fow now, consider any other type of expression (div/mul/min/max) high cost. return true; } /// If any of the instructions is the specified set are trivially dead, delete /// them and see if this makes any of their operands subsequently dead. static bool DeleteTriviallyDeadInstructions(SmallVectorImpl &DeadInsts) { bool Changed = false; while (!DeadInsts.empty()) { Value *V = DeadInsts.pop_back_val(); Instruction *I = dyn_cast_or_null(V); if (!I || !isInstructionTriviallyDead(I)) continue; for (Use &O : I->operands()) if (Instruction *U = dyn_cast(O)) { O = nullptr; if (U->use_empty()) DeadInsts.emplace_back(U); } I->eraseFromParent(); Changed = true; } return Changed; } namespace { class LSRUse; } /// \brief Check if the addressing mode defined by \p F is completely /// folded in \p LU at isel time. /// This includes address-mode folding and special icmp tricks. /// This function returns true if \p LU can accommodate what \p F /// defines and up to 1 base + 1 scaled + offset. /// In other words, if \p F has several base registers, this function may /// still return true. 
Therefore, users still need to account for /// additional base registers and/or unfolded offsets to derive an /// accurate cost model. static bool isAMCompletelyFolded(const TargetTransformInfo &TTI, const LSRUse &LU, const Formula &F); // Get the cost of the scaling factor used in F for LU. static unsigned getScalingFactorCost(const TargetTransformInfo &TTI, const LSRUse &LU, const Formula &F); namespace { /// This class is used to measure and compare candidate formulae. class Cost { /// TODO: Some of these could be merged. Also, a lexical ordering /// isn't always optimal. unsigned NumRegs; unsigned AddRecCost; unsigned NumIVMuls; unsigned NumBaseAdds; unsigned ImmCost; unsigned SetupCost; unsigned ScaleCost; public: Cost() : NumRegs(0), AddRecCost(0), NumIVMuls(0), NumBaseAdds(0), ImmCost(0), SetupCost(0), ScaleCost(0) {} bool operator<(const Cost &Other) const; void Lose(); #ifndef NDEBUG // Once any of the metrics loses, they must all remain losers. bool isValid() { return ((NumRegs | AddRecCost | NumIVMuls | NumBaseAdds | ImmCost | SetupCost | ScaleCost) != ~0u) || ((NumRegs & AddRecCost & NumIVMuls & NumBaseAdds & ImmCost & SetupCost & ScaleCost) == ~0u); } #endif bool isLoser() { assert(isValid() && "invalid cost"); return NumRegs == ~0u; } void RateFormula(const TargetTransformInfo &TTI, const Formula &F, SmallPtrSetImpl &Regs, const DenseSet &VisitedRegs, const Loop *L, const SmallVectorImpl &Offsets, ScalarEvolution &SE, DominatorTree &DT, const LSRUse &LU, SmallPtrSetImpl *LoserRegs = nullptr); void print(raw_ostream &OS) const; void dump() const; private: void RateRegister(const SCEV *Reg, SmallPtrSetImpl &Regs, const Loop *L, ScalarEvolution &SE, DominatorTree &DT); void RatePrimaryRegister(const SCEV *Reg, SmallPtrSetImpl &Regs, const Loop *L, ScalarEvolution &SE, DominatorTree &DT, SmallPtrSetImpl *LoserRegs); }; } /// Tally up interesting quantities from the given register. void Cost::RateRegister(const SCEV *Reg, SmallPtrSetImpl &Regs, const Loop *L, ScalarEvolution &SE, DominatorTree &DT) { if (const SCEVAddRecExpr *AR = dyn_cast(Reg)) { // If this is an addrec for another loop, don't second-guess its addrec phi // nodes. LSR isn't currently smart enough to reason about more than one // loop at a time. LSR has already run on inner loops, will not run on outer // loops, and cannot be expected to change sibling loops. if (AR->getLoop() != L) { // If the AddRec exists, consider it's register free and leave it alone. if (isExistingPhi(AR, SE)) return; // Otherwise, do not consider this formula at all. Lose(); return; } AddRecCost += 1; /// TODO: This should be a function of the stride. // Add the step value register, if it needs one. // TODO: The non-affine case isn't precisely modeled here. if (!AR->isAffine() || !isa(AR->getOperand(1))) { if (!Regs.count(AR->getOperand(1))) { RateRegister(AR->getOperand(1), Regs, L, SE, DT); if (isLoser()) return; } } } ++NumRegs; // Rough heuristic; favor registers which don't require extra setup // instructions in the preheader. if (!isa(Reg) && !isa(Reg) && !(isa(Reg) && (isa(cast(Reg)->getStart()) || isa(cast(Reg)->getStart())))) ++SetupCost; NumIVMuls += isa(Reg) && SE.hasComputableLoopEvolution(Reg, L); } /// Record this register in the set. If we haven't seen it before, rate /// it. Optional LoserRegs provides a way to declare any formula that refers to /// one of those regs an instant loser. 
void Cost::RatePrimaryRegister(const SCEV *Reg, SmallPtrSetImpl &Regs, const Loop *L, ScalarEvolution &SE, DominatorTree &DT, SmallPtrSetImpl *LoserRegs) { if (LoserRegs && LoserRegs->count(Reg)) { Lose(); return; } if (Regs.insert(Reg).second) { RateRegister(Reg, Regs, L, SE, DT); if (LoserRegs && isLoser()) LoserRegs->insert(Reg); } } void Cost::RateFormula(const TargetTransformInfo &TTI, const Formula &F, SmallPtrSetImpl &Regs, const DenseSet &VisitedRegs, const Loop *L, const SmallVectorImpl &Offsets, ScalarEvolution &SE, DominatorTree &DT, const LSRUse &LU, SmallPtrSetImpl *LoserRegs) { assert(F.isCanonical() && "Cost is accurate only for canonical formula"); // Tally up the registers. if (const SCEV *ScaledReg = F.ScaledReg) { if (VisitedRegs.count(ScaledReg)) { Lose(); return; } RatePrimaryRegister(ScaledReg, Regs, L, SE, DT, LoserRegs); if (isLoser()) return; } for (const SCEV *BaseReg : F.BaseRegs) { if (VisitedRegs.count(BaseReg)) { Lose(); return; } RatePrimaryRegister(BaseReg, Regs, L, SE, DT, LoserRegs); if (isLoser()) return; } // Determine how many (unfolded) adds we'll need inside the loop. size_t NumBaseParts = F.getNumRegs(); if (NumBaseParts > 1) // Do not count the base and a possible second register if the target // allows to fold 2 registers. NumBaseAdds += NumBaseParts - (1 + (F.Scale && isAMCompletelyFolded(TTI, LU, F))); NumBaseAdds += (F.UnfoldedOffset != 0); // Accumulate non-free scaling amounts. ScaleCost += getScalingFactorCost(TTI, LU, F); // Tally up the non-zero immediates. for (int64_t O : Offsets) { int64_t Offset = (uint64_t)O + F.BaseOffset; if (F.BaseGV) ImmCost += 64; // Handle symbolic values conservatively. // TODO: This should probably be the pointer size. else if (Offset != 0) ImmCost += APInt(64, Offset, true).getMinSignedBits(); } assert(isValid() && "invalid cost"); } /// Set this cost to a losing value. void Cost::Lose() { NumRegs = ~0u; AddRecCost = ~0u; NumIVMuls = ~0u; NumBaseAdds = ~0u; ImmCost = ~0u; SetupCost = ~0u; ScaleCost = ~0u; } /// Choose the lower cost. bool Cost::operator<(const Cost &Other) const { return std::tie(NumRegs, AddRecCost, NumIVMuls, NumBaseAdds, ScaleCost, ImmCost, SetupCost) < std::tie(Other.NumRegs, Other.AddRecCost, Other.NumIVMuls, Other.NumBaseAdds, Other.ScaleCost, Other.ImmCost, Other.SetupCost); } void Cost::print(raw_ostream &OS) const { OS << NumRegs << " reg" << (NumRegs == 1 ? "" : "s"); if (AddRecCost != 0) OS << ", with addrec cost " << AddRecCost; if (NumIVMuls != 0) OS << ", plus " << NumIVMuls << " IV mul" << (NumIVMuls == 1 ? "" : "s"); if (NumBaseAdds != 0) OS << ", plus " << NumBaseAdds << " base add" << (NumBaseAdds == 1 ? "" : "s"); if (ScaleCost != 0) OS << ", plus " << ScaleCost << " scale cost"; if (ImmCost != 0) OS << ", plus " << ImmCost << " imm cost"; if (SetupCost != 0) OS << ", plus " << SetupCost << " setup cost"; } LLVM_DUMP_METHOD void Cost::dump() const { print(errs()); errs() << '\n'; } namespace { /// An operand value in an instruction which is to be replaced with some /// equivalent, possibly strength-reduced, replacement. struct LSRFixup { /// The instruction which will be updated. Instruction *UserInst; /// The operand of the instruction which will be replaced. The operand may be /// used more than once; every instance will be replaced. Value *OperandValToReplace; /// If this user is to use the post-incremented value of an induction /// variable, this variable is non-null and holds the loop associated with the /// induction variable. 
PostIncLoopSet PostIncLoops; /// The index of the LSRUse describing the expression which this fixup needs, /// minus an offset (below). size_t LUIdx; /// A constant offset to be added to the LSRUse expression. This allows /// multiple fixups to share the same LSRUse with different offsets, for /// example in an unrolled loop. int64_t Offset; bool isUseFullyOutsideLoop(const Loop *L) const; LSRFixup(); void print(raw_ostream &OS) const; void dump() const; }; } LSRFixup::LSRFixup() : UserInst(nullptr), OperandValToReplace(nullptr), LUIdx(~size_t(0)), Offset(0) {} /// Test whether this fixup always uses its value outside of the given loop. bool LSRFixup::isUseFullyOutsideLoop(const Loop *L) const { // PHI nodes use their value in their incoming blocks. if (const PHINode *PN = dyn_cast(UserInst)) { for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i) if (PN->getIncomingValue(i) == OperandValToReplace && L->contains(PN->getIncomingBlock(i))) return false; return true; } return !L->contains(UserInst); } void LSRFixup::print(raw_ostream &OS) const { OS << "UserInst="; // Store is common and interesting enough to be worth special-casing. if (StoreInst *Store = dyn_cast(UserInst)) { OS << "store "; Store->getOperand(0)->printAsOperand(OS, /*PrintType=*/false); } else if (UserInst->getType()->isVoidTy()) OS << UserInst->getOpcodeName(); else UserInst->printAsOperand(OS, /*PrintType=*/false); OS << ", OperandValToReplace="; OperandValToReplace->printAsOperand(OS, /*PrintType=*/false); for (const Loop *PIL : PostIncLoops) { OS << ", PostIncLoop="; PIL->getHeader()->printAsOperand(OS, /*PrintType=*/false); } if (LUIdx != ~size_t(0)) OS << ", LUIdx=" << LUIdx; if (Offset != 0) OS << ", Offset=" << Offset; } LLVM_DUMP_METHOD void LSRFixup::dump() const { print(errs()); errs() << '\n'; } namespace { /// A DenseMapInfo implementation for holding DenseMaps and DenseSets of sorted /// SmallVectors of const SCEV*. struct UniquifierDenseMapInfo { static SmallVector getEmptyKey() { SmallVector V; V.push_back(reinterpret_cast(-1)); return V; } static SmallVector getTombstoneKey() { SmallVector V; V.push_back(reinterpret_cast(-2)); return V; } static unsigned getHashValue(const SmallVector &V) { return static_cast(hash_combine_range(V.begin(), V.end())); } static bool isEqual(const SmallVector &LHS, const SmallVector &RHS) { return LHS == RHS; } }; /// This class holds the state that LSR keeps for each use in IVUsers, as well /// as uses invented by LSR itself. It includes information about what kinds of /// things can be folded into the user, information about the user itself, and /// information about how the use may be satisfied. TODO: Represent multiple /// users of the same expression in common? class LSRUse { DenseSet, UniquifierDenseMapInfo> Uniquifier; public: /// An enum for a kind of use, indicating what types of scaled and immediate /// operands it might support. enum KindType { Basic, ///< A normal use, with no folding. Special, ///< A special case of basic, allowing -1 scales. Address, ///< An address use; folding according to TargetLowering ICmpZero ///< An equality icmp with both operands folded into one. // TODO: Add a generic icmp too? }; typedef PointerIntPair SCEVUseKindPair; KindType Kind; MemAccessTy AccessTy; SmallVector Offsets; int64_t MinOffset; int64_t MaxOffset; /// This records whether all of the fixups using this LSRUse are outside of /// the loop, in which case some special-case heuristics may be used. 
bool AllFixupsOutsideLoop; /// RigidFormula is set to true to guarantee that this use will be associated /// with a single formula--the one that initially matched. Some SCEV /// expressions cannot be expanded. This allows LSR to consider the registers /// used by those expressions without the need to expand them later after /// changing the formula. bool RigidFormula; /// This records the widest use type for any fixup using this /// LSRUse. FindUseWithSimilarFormula can't consider uses with different max /// fixup widths to be equivalent, because the narrower one may be relying on /// the implicit truncation to truncate away bogus bits. Type *WidestFixupType; /// A list of ways to build a value that can satisfy this user. After the /// list is populated, one of these is selected heuristically and used to /// formulate a replacement for OperandValToReplace in UserInst. SmallVector Formulae; /// The set of register candidates used by all formulae in this LSRUse. SmallPtrSet Regs; LSRUse(KindType K, MemAccessTy AT) : Kind(K), AccessTy(AT), MinOffset(INT64_MAX), MaxOffset(INT64_MIN), AllFixupsOutsideLoop(true), RigidFormula(false), WidestFixupType(nullptr) {} bool HasFormulaWithSameRegs(const Formula &F) const; bool InsertFormula(const Formula &F); void DeleteFormula(Formula &F); void RecomputeRegs(size_t LUIdx, RegUseTracker &Reguses); void print(raw_ostream &OS) const; void dump() const; }; } /// Test whether this use as a formula which has the same registers as the given /// formula. bool LSRUse::HasFormulaWithSameRegs(const Formula &F) const { SmallVector Key = F.BaseRegs; if (F.ScaledReg) Key.push_back(F.ScaledReg); // Unstable sort by host order ok, because this is only used for uniquifying. std::sort(Key.begin(), Key.end()); return Uniquifier.count(Key); } /// If the given formula has not yet been inserted, add it to the list, and /// return true. Return false otherwise. The formula must be in canonical form. bool LSRUse::InsertFormula(const Formula &F) { assert(F.isCanonical() && "Invalid canonical representation"); if (!Formulae.empty() && RigidFormula) return false; SmallVector Key = F.BaseRegs; if (F.ScaledReg) Key.push_back(F.ScaledReg); // Unstable sort by host order ok, because this is only used for uniquifying. std::sort(Key.begin(), Key.end()); if (!Uniquifier.insert(Key).second) return false; // Using a register to hold the value of 0 is not profitable. assert((!F.ScaledReg || !F.ScaledReg->isZero()) && "Zero allocated in a scaled register!"); #ifndef NDEBUG for (const SCEV *BaseReg : F.BaseRegs) assert(!BaseReg->isZero() && "Zero allocated in a base register!"); #endif // Add the formula to the list. Formulae.push_back(F); // Record registers now being used by this use. Regs.insert(F.BaseRegs.begin(), F.BaseRegs.end()); if (F.ScaledReg) Regs.insert(F.ScaledReg); return true; } /// Remove the given formula from this use's list. void LSRUse::DeleteFormula(Formula &F) { if (&F != &Formulae.back()) std::swap(F, Formulae.back()); Formulae.pop_back(); } /// Recompute the Regs field, and update RegUses. void LSRUse::RecomputeRegs(size_t LUIdx, RegUseTracker &RegUses) { // Now that we've filtered out some formulae, recompute the Regs set. SmallPtrSet OldRegs = std::move(Regs); Regs.clear(); for (const Formula &F : Formulae) { if (F.ScaledReg) Regs.insert(F.ScaledReg); Regs.insert(F.BaseRegs.begin(), F.BaseRegs.end()); } // Update the RegTracker. 
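// [Editorial note -- illustrative, not in the original source] Any register
// referenced only by the formulae just deleted must give up this use's bit
// in its UsedByIndices vector, which is what dropRegister() does below.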
for (const SCEV *S : OldRegs) if (!Regs.count(S)) RegUses.dropRegister(S, LUIdx); } void LSRUse::print(raw_ostream &OS) const { OS << "LSR Use: Kind="; switch (Kind) { case Basic: OS << "Basic"; break; case Special: OS << "Special"; break; case ICmpZero: OS << "ICmpZero"; break; case Address: OS << "Address of "; if (AccessTy.MemTy->isPointerTy()) OS << "pointer"; // the full pointer type could be really verbose else { OS << *AccessTy.MemTy; } OS << " in addrspace(" << AccessTy.AddrSpace << ')'; } OS << ", Offsets={"; bool NeedComma = false; for (int64_t O : Offsets) { if (NeedComma) OS << ','; OS << O; NeedComma = true; } OS << '}'; if (AllFixupsOutsideLoop) OS << ", all-fixups-outside-loop"; if (WidestFixupType) OS << ", widest fixup type: " << *WidestFixupType; } LLVM_DUMP_METHOD void LSRUse::dump() const { print(errs()); errs() << '\n'; } static bool isAMCompletelyFolded(const TargetTransformInfo &TTI, LSRUse::KindType Kind, MemAccessTy AccessTy, GlobalValue *BaseGV, int64_t BaseOffset, bool HasBaseReg, int64_t Scale) { switch (Kind) { case LSRUse::Address: return TTI.isLegalAddressingMode(AccessTy.MemTy, BaseGV, BaseOffset, HasBaseReg, Scale, AccessTy.AddrSpace); case LSRUse::ICmpZero: // There's not even a target hook for querying whether it would be legal to // fold a GV into an ICmp. if (BaseGV) return false; // ICmp only has two operands; don't allow more than two non-trivial parts. if (Scale != 0 && HasBaseReg && BaseOffset != 0) return false; // ICmp only supports no scale or a -1 scale, as we can "fold" a -1 scale by // putting the scaled register in the other operand of the icmp. if (Scale != 0 && Scale != -1) return false; // If we have low-level target information, ask the target if it can fold an // integer immediate on an icmp. if (BaseOffset != 0) { // We have one of: // ICmpZero BaseReg + BaseOffset => ICmp BaseReg, -BaseOffset // ICmpZero -1*ScaleReg + BaseOffset => ICmp ScaleReg, BaseOffset // Offs is the ICmp immediate. if (Scale == 0) // The cast does the right thing with INT64_MIN. BaseOffset = -(uint64_t)BaseOffset; return TTI.isLegalICmpImmediate(BaseOffset); } // ICmpZero BaseReg + -1*ScaleReg => ICmp BaseReg, ScaleReg return true; case LSRUse::Basic: // Only handle single-register values. return !BaseGV && Scale == 0 && BaseOffset == 0; case LSRUse::Special: // Special case Basic to handle -1 scales. return !BaseGV && (Scale == 0 || Scale == -1) && BaseOffset == 0; } llvm_unreachable("Invalid LSRUse Kind!"); } static bool isAMCompletelyFolded(const TargetTransformInfo &TTI, int64_t MinOffset, int64_t MaxOffset, LSRUse::KindType Kind, MemAccessTy AccessTy, GlobalValue *BaseGV, int64_t BaseOffset, bool HasBaseReg, int64_t Scale) { // Check for overflow. if (((int64_t)((uint64_t)BaseOffset + MinOffset) > BaseOffset) != (MinOffset > 0)) return false; MinOffset = (uint64_t)BaseOffset + MinOffset; if (((int64_t)((uint64_t)BaseOffset + MaxOffset) > BaseOffset) != (MaxOffset > 0)) return false; MaxOffset = (uint64_t)BaseOffset + MaxOffset; return isAMCompletelyFolded(TTI, Kind, AccessTy, BaseGV, MinOffset, HasBaseReg, Scale) && isAMCompletelyFolded(TTI, Kind, AccessTy, BaseGV, MaxOffset, HasBaseReg, Scale); } static bool isAMCompletelyFolded(const TargetTransformInfo &TTI, int64_t MinOffset, int64_t MaxOffset, LSRUse::KindType Kind, MemAccessTy AccessTy, const Formula &F) { // For the purpose of isAMCompletelyFolded either having a canonical formula // or a scale not equal to zero is correct. // Problems may arise from non canonical formulae having a scale == 0. 
// Strictly speaking it would best to just rely on canonical formulae. // However, when we generate the scaled formulae, we first check that the // scaling factor is profitable before computing the actual ScaledReg for // compile time sake. assert((F.isCanonical() || F.Scale != 0)); return isAMCompletelyFolded(TTI, MinOffset, MaxOffset, Kind, AccessTy, F.BaseGV, F.BaseOffset, F.HasBaseReg, F.Scale); } /// Test whether we know how to expand the current formula. static bool isLegalUse(const TargetTransformInfo &TTI, int64_t MinOffset, int64_t MaxOffset, LSRUse::KindType Kind, MemAccessTy AccessTy, GlobalValue *BaseGV, int64_t BaseOffset, bool HasBaseReg, int64_t Scale) { // We know how to expand completely foldable formulae. return isAMCompletelyFolded(TTI, MinOffset, MaxOffset, Kind, AccessTy, BaseGV, BaseOffset, HasBaseReg, Scale) || // Or formulae that use a base register produced by a sum of base // registers. (Scale == 1 && isAMCompletelyFolded(TTI, MinOffset, MaxOffset, Kind, AccessTy, BaseGV, BaseOffset, true, 0)); } static bool isLegalUse(const TargetTransformInfo &TTI, int64_t MinOffset, int64_t MaxOffset, LSRUse::KindType Kind, MemAccessTy AccessTy, const Formula &F) { return isLegalUse(TTI, MinOffset, MaxOffset, Kind, AccessTy, F.BaseGV, F.BaseOffset, F.HasBaseReg, F.Scale); } static bool isAMCompletelyFolded(const TargetTransformInfo &TTI, const LSRUse &LU, const Formula &F) { return isAMCompletelyFolded(TTI, LU.MinOffset, LU.MaxOffset, LU.Kind, LU.AccessTy, F.BaseGV, F.BaseOffset, F.HasBaseReg, F.Scale); } static unsigned getScalingFactorCost(const TargetTransformInfo &TTI, const LSRUse &LU, const Formula &F) { if (!F.Scale) return 0; // If the use is not completely folded in that instruction, we will have to // pay an extra cost only for scale != 1. if (!isAMCompletelyFolded(TTI, LU.MinOffset, LU.MaxOffset, LU.Kind, LU.AccessTy, F)) return F.Scale != 1; switch (LU.Kind) { case LSRUse::Address: { // Check the scaling factor cost with both the min and max offsets. int ScaleCostMinOffset = TTI.getScalingFactorCost( LU.AccessTy.MemTy, F.BaseGV, F.BaseOffset + LU.MinOffset, F.HasBaseReg, F.Scale, LU.AccessTy.AddrSpace); int ScaleCostMaxOffset = TTI.getScalingFactorCost( LU.AccessTy.MemTy, F.BaseGV, F.BaseOffset + LU.MaxOffset, F.HasBaseReg, F.Scale, LU.AccessTy.AddrSpace); assert(ScaleCostMinOffset >= 0 && ScaleCostMaxOffset >= 0 && "Legal addressing mode has an illegal cost!"); return std::max(ScaleCostMinOffset, ScaleCostMaxOffset); } case LSRUse::ICmpZero: case LSRUse::Basic: case LSRUse::Special: // The use is completely folded, i.e., everything is folded into the // instruction. return 0; } llvm_unreachable("Invalid LSRUse Kind!"); } static bool isAlwaysFoldable(const TargetTransformInfo &TTI, LSRUse::KindType Kind, MemAccessTy AccessTy, GlobalValue *BaseGV, int64_t BaseOffset, bool HasBaseReg) { // Fast-path: zero is always foldable. if (BaseOffset == 0 && !BaseGV) return true; // Conservatively, create an address with an immediate and a // base and a scale. int64_t Scale = Kind == LSRUse::ICmpZero ? -1 : 1; // Canonicalize a scale of 1 to a base register if the formula doesn't // already have a base register. 
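// [Editorial note -- illustrative, not in the original source] That is, the
// conservative probe below asks the target about "base + offset" rather than
// "1*reg + offset" when no base register is present yet; the two denote the
// same address, and the unscaled form is the canonical shape for the query.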
if (!HasBaseReg && Scale == 1) { Scale = 0; HasBaseReg = true; } return isAMCompletelyFolded(TTI, Kind, AccessTy, BaseGV, BaseOffset, HasBaseReg, Scale); } static bool isAlwaysFoldable(const TargetTransformInfo &TTI, ScalarEvolution &SE, int64_t MinOffset, int64_t MaxOffset, LSRUse::KindType Kind, MemAccessTy AccessTy, const SCEV *S, bool HasBaseReg) { // Fast-path: zero is always foldable. if (S->isZero()) return true; // Conservatively, create an address with an immediate and a // base and a scale. int64_t BaseOffset = ExtractImmediate(S, SE); GlobalValue *BaseGV = ExtractSymbol(S, SE); // If there's anything else involved, it's not foldable. if (!S->isZero()) return false; // Fast-path: zero is always foldable. if (BaseOffset == 0 && !BaseGV) return true; // Conservatively, create an address with an immediate and a // base and a scale. int64_t Scale = Kind == LSRUse::ICmpZero ? -1 : 1; return isAMCompletelyFolded(TTI, MinOffset, MaxOffset, Kind, AccessTy, BaseGV, BaseOffset, HasBaseReg, Scale); } namespace { /// An individual increment in a Chain of IV increments. Relate an IV user to /// an expression that computes the IV it uses from the IV used by the previous /// link in the Chain. /// /// For the head of a chain, IncExpr holds the absolute SCEV expression for the /// original IVOperand. The head of the chain's IVOperand is only valid during /// chain collection, before LSR replaces IV users. During chain generation, /// IncExpr can be used to find the new IVOperand that computes the same /// expression. struct IVInc { Instruction *UserInst; Value* IVOperand; const SCEV *IncExpr; IVInc(Instruction *U, Value *O, const SCEV *E): UserInst(U), IVOperand(O), IncExpr(E) {} }; // The list of IV increments in program order. We typically add the head of a // chain without finding subsequent links. struct IVChain { SmallVector Incs; const SCEV *ExprBase; IVChain() : ExprBase(nullptr) {} IVChain(const IVInc &Head, const SCEV *Base) : Incs(1, Head), ExprBase(Base) {} typedef SmallVectorImpl::const_iterator const_iterator; // Return the first increment in the chain. const_iterator begin() const { assert(!Incs.empty()); return std::next(Incs.begin()); } const_iterator end() const { return Incs.end(); } // Returns true if this chain contains any increments. bool hasIncs() const { return Incs.size() >= 2; } // Add an IVInc to the end of this chain. void add(const IVInc &X) { Incs.push_back(X); } // Returns the last UserInst in the chain. Instruction *tailUserInst() const { return Incs.back().UserInst; } // Returns true if IncExpr can be profitably added to this chain. bool isProfitableIncrement(const SCEV *OperExpr, const SCEV *IncExpr, ScalarEvolution&); }; /// Helper for CollectChains to track multiple IV increment uses. Distinguish /// between FarUsers that definitely cross IV increments and NearUsers that may /// be used between IV increments. struct ChainUsers { SmallPtrSet FarUsers; SmallPtrSet NearUsers; }; /// This class holds state for the main loop strength reduction logic. class LSRInstance { IVUsers &IU; ScalarEvolution &SE; DominatorTree &DT; LoopInfo &LI; const TargetTransformInfo &TTI; Loop *const L; bool Changed; /// This is the insert position that the current loop's induction variable /// increment should be placed. In simple loops, this is the latch block's /// terminator. But in more complicated cases, this is a position which will /// dominate all the in-loop post-increment users. Instruction *IVIncInsertPos; /// Interesting factors between use strides. 
SmallSetVector Factors; /// Interesting use types, to facilitate truncation reuse. SmallSetVector Types; /// The list of operands which are to be replaced. SmallVector Fixups; /// The list of interesting uses. SmallVector Uses; /// Track which uses use which register candidates. RegUseTracker RegUses; // Limit the number of chains to avoid quadratic behavior. We don't expect to // have more than a few IV increment chains in a loop. Missing a Chain falls // back to normal LSR behavior for those uses. static const unsigned MaxChains = 8; /// IV users can form a chain of IV increments. SmallVector IVChainVec; /// IV users that belong to profitable IVChains. SmallPtrSet IVIncSet; void OptimizeShadowIV(); bool FindIVUserForCond(ICmpInst *Cond, IVStrideUse *&CondUse); ICmpInst *OptimizeMax(ICmpInst *Cond, IVStrideUse* &CondUse); void OptimizeLoopTermCond(); void ChainInstruction(Instruction *UserInst, Instruction *IVOper, SmallVectorImpl &ChainUsersVec); void FinalizeChain(IVChain &Chain); void CollectChains(); void GenerateIVChain(const IVChain &Chain, SCEVExpander &Rewriter, SmallVectorImpl &DeadInsts); void CollectInterestingTypesAndFactors(); void CollectFixupsAndInitialFormulae(); LSRFixup &getNewFixup() { Fixups.push_back(LSRFixup()); return Fixups.back(); } // Support for sharing of LSRUses between LSRFixups. typedef DenseMap UseMapTy; UseMapTy UseMap; bool reconcileNewOffset(LSRUse &LU, int64_t NewOffset, bool HasBaseReg, LSRUse::KindType Kind, MemAccessTy AccessTy); std::pair getUse(const SCEV *&Expr, LSRUse::KindType Kind, MemAccessTy AccessTy); void DeleteUse(LSRUse &LU, size_t LUIdx); LSRUse *FindUseWithSimilarFormula(const Formula &F, const LSRUse &OrigLU); void InsertInitialFormula(const SCEV *S, LSRUse &LU, size_t LUIdx); void InsertSupplementalFormula(const SCEV *S, LSRUse &LU, size_t LUIdx); void CountRegisters(const Formula &F, size_t LUIdx); bool InsertFormula(LSRUse &LU, unsigned LUIdx, const Formula &F); void CollectLoopInvariantFixupsAndFormulae(); void GenerateReassociations(LSRUse &LU, unsigned LUIdx, Formula Base, unsigned Depth = 0); void GenerateReassociationsImpl(LSRUse &LU, unsigned LUIdx, const Formula &Base, unsigned Depth, size_t Idx, bool IsScaledReg = false); void GenerateCombinations(LSRUse &LU, unsigned LUIdx, Formula Base); void GenerateSymbolicOffsetsImpl(LSRUse &LU, unsigned LUIdx, const Formula &Base, size_t Idx, bool IsScaledReg = false); void GenerateSymbolicOffsets(LSRUse &LU, unsigned LUIdx, Formula Base); void GenerateConstantOffsetsImpl(LSRUse &LU, unsigned LUIdx, const Formula &Base, const SmallVectorImpl &Worklist, size_t Idx, bool IsScaledReg = false); void GenerateConstantOffsets(LSRUse &LU, unsigned LUIdx, Formula Base); void GenerateICmpZeroScales(LSRUse &LU, unsigned LUIdx, Formula Base); void GenerateScales(LSRUse &LU, unsigned LUIdx, Formula Base); void GenerateTruncates(LSRUse &LU, unsigned LUIdx, Formula Base); void GenerateCrossUseConstantOffsets(); void GenerateAllReuseFormulae(); void FilterOutUndesirableDedicatedRegisters(); size_t EstimateSearchSpaceComplexity() const; void NarrowSearchSpaceByDetectingSupersets(); void NarrowSearchSpaceByCollapsingUnrolledCode(); void NarrowSearchSpaceByRefilteringUndesirableDedicatedRegisters(); void NarrowSearchSpaceByPickingWinnerRegs(); void NarrowSearchSpaceUsingHeuristics(); void SolveRecurse(SmallVectorImpl &Solution, Cost &SolutionCost, SmallVectorImpl &Workspace, const Cost &CurCost, const SmallPtrSet &CurRegs, DenseSet &VisitedRegs) const; void Solve(SmallVectorImpl &Solution) const; 
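  // [Editorial sketch -- not part of the original source; the method names
  // are the ones declared above.] Solve() drives a pruned backtracking
  // search over SolveRecurse(), conceptually:
  //
  //   SolveRecurse(Depth):
  //     for each Formula F of Uses[Depth]:
  //       Cost = CurCost; Cost.RateFormula(F, ...)
  //       if (Cost < SolutionCost)
  //         Depth+1 == Uses.size() ? record Workspace as the new Solution
  //                                : SolveRecurse(Depth + 1)
  //
  // so the chosen Solution holds exactly one Formula per LSRUse.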
BasicBlock::iterator HoistInsertPosition(BasicBlock::iterator IP, const SmallVectorImpl &Inputs) const; BasicBlock::iterator AdjustInsertPositionForExpand(BasicBlock::iterator IP, const LSRFixup &LF, const LSRUse &LU, SCEVExpander &Rewriter) const; Value *Expand(const LSRFixup &LF, const Formula &F, BasicBlock::iterator IP, SCEVExpander &Rewriter, SmallVectorImpl &DeadInsts) const; void RewriteForPHI(PHINode *PN, const LSRFixup &LF, const Formula &F, SCEVExpander &Rewriter, SmallVectorImpl &DeadInsts) const; void Rewrite(const LSRFixup &LF, const Formula &F, SCEVExpander &Rewriter, SmallVectorImpl &DeadInsts) const; void ImplementSolution(const SmallVectorImpl &Solution); public: LSRInstance(Loop *L, IVUsers &IU, ScalarEvolution &SE, DominatorTree &DT, LoopInfo &LI, const TargetTransformInfo &TTI); bool getChanged() const { return Changed; } void print_factors_and_types(raw_ostream &OS) const; void print_fixups(raw_ostream &OS) const; void print_uses(raw_ostream &OS) const; void print(raw_ostream &OS) const; void dump() const; }; } /// If IV is used in a int-to-float cast inside the loop then try to eliminate /// the cast operation. void LSRInstance::OptimizeShadowIV() { const SCEV *BackedgeTakenCount = SE.getBackedgeTakenCount(L); if (isa(BackedgeTakenCount)) return; for (IVUsers::const_iterator UI = IU.begin(), E = IU.end(); UI != E; /* empty */) { IVUsers::const_iterator CandidateUI = UI; ++UI; Instruction *ShadowUse = CandidateUI->getUser(); Type *DestTy = nullptr; bool IsSigned = false; /* If shadow use is a int->float cast then insert a second IV to eliminate this cast. for (unsigned i = 0; i < n; ++i) foo((double)i); is transformed into double d = 0.0; for (unsigned i = 0; i < n; ++i, ++d) foo(d); */ if (UIToFPInst *UCast = dyn_cast(CandidateUI->getUser())) { IsSigned = false; DestTy = UCast->getDestTy(); } else if (SIToFPInst *SCast = dyn_cast(CandidateUI->getUser())) { IsSigned = true; DestTy = SCast->getDestTy(); } if (!DestTy) continue; // If target does not support DestTy natively then do not apply // this transformation. if (!TTI.isTypeLegal(DestTy)) continue; PHINode *PH = dyn_cast(ShadowUse->getOperand(0)); if (!PH) continue; if (PH->getNumIncomingValues() != 2) continue; Type *SrcTy = PH->getType(); int Mantissa = DestTy->getFPMantissaWidth(); if (Mantissa == -1) continue; if ((int)SE.getTypeSizeInBits(SrcTy) > Mantissa) continue; unsigned Entry, Latch; if (PH->getIncomingBlock(0) == L->getLoopPreheader()) { Entry = 0; Latch = 1; } else { Entry = 1; Latch = 0; } ConstantInt *Init = dyn_cast(PH->getIncomingValue(Entry)); if (!Init) continue; Constant *NewInit = ConstantFP::get(DestTy, IsSigned ? (double)Init->getSExtValue() : (double)Init->getZExtValue()); BinaryOperator *Incr = dyn_cast(PH->getIncomingValue(Latch)); if (!Incr) continue; if (Incr->getOpcode() != Instruction::Add && Incr->getOpcode() != Instruction::Sub) continue; /* Initialize new IV, double d = 0.0 in above example. */ ConstantInt *C = nullptr; if (Incr->getOperand(0) == PH) C = dyn_cast(Incr->getOperand(1)); else if (Incr->getOperand(1) == PH) C = dyn_cast(Incr->getOperand(0)); else continue; if (!C) continue; // Ignore negative constants, as the code below doesn't handle them // correctly. TODO: Remove this restriction. if (!C->getValue().isStrictlyPositive()) continue; /* Add new PHINode. */ PHINode *NewPH = PHINode::Create(DestTy, 2, "IV.S.", PH); /* create new increment. '++d' in above example. 
*/ Constant *CFP = ConstantFP::get(DestTy, C->getZExtValue()); BinaryOperator *NewIncr = BinaryOperator::Create(Incr->getOpcode() == Instruction::Add ? Instruction::FAdd : Instruction::FSub, NewPH, CFP, "IV.S.next.", Incr); NewPH->addIncoming(NewInit, PH->getIncomingBlock(Entry)); NewPH->addIncoming(NewIncr, PH->getIncomingBlock(Latch)); /* Remove cast operation */ ShadowUse->replaceAllUsesWith(NewPH); ShadowUse->eraseFromParent(); Changed = true; break; } } /// If Cond has an operand that is an expression of an IV, set the IV user and /// stride information and return true, otherwise return false. bool LSRInstance::FindIVUserForCond(ICmpInst *Cond, IVStrideUse *&CondUse) { for (IVStrideUse &U : IU) if (U.getUser() == Cond) { // NOTE: we could handle setcc instructions with multiple uses here, but // InstCombine does it as well for simple uses, it's not clear that it // occurs enough in real life to handle. CondUse = &U; return true; } return false; } /// Rewrite the loop's terminating condition if it uses a max computation. /// /// This is a narrow solution to a specific, but acute, problem. For loops /// like this: /// /// i = 0; /// do { /// p[i] = 0.0; /// } while (++i < n); /// /// the trip count isn't just 'n', because 'n' might not be positive. And /// unfortunately this can come up even for loops where the user didn't use /// a C do-while loop. For example, seemingly well-behaved top-test loops /// will commonly be lowered like this: // /// if (n > 0) { /// i = 0; /// do { /// p[i] = 0.0; /// } while (++i < n); /// } /// /// and then it's possible for subsequent optimization to obscure the if /// test in such a way that indvars can't find it. /// /// When indvars can't find the if test in loops like this, it creates a /// max expression, which allows it to give the loop a canonical /// induction variable: /// /// i = 0; /// max = n < 1 ? 1 : n; /// do { /// p[i] = 0.0; /// } while (++i != max); /// /// Canonical induction variables are necessary because the loop passes /// are designed around them. The most obvious example of this is the /// LoopInfo analysis, which doesn't remember trip count values. It /// expects to be able to rediscover the trip count each time it is /// needed, and it does this using a simple analysis that only succeeds if /// the loop has a canonical induction variable. /// /// However, when it comes time to generate code, the maximum operation /// can be quite costly, especially if it's inside of an outer loop. /// /// This function solves this problem by detecting this type of loop and /// rewriting their conditions from ICMP_NE back to ICMP_SLT, and deleting /// the instructions for the maximum computation. /// ICmpInst *LSRInstance::OptimizeMax(ICmpInst *Cond, IVStrideUse* &CondUse) { // Check that the loop matches the pattern we're looking for. if (Cond->getPredicate() != CmpInst::ICMP_EQ && Cond->getPredicate() != CmpInst::ICMP_NE) return Cond; SelectInst *Sel = dyn_cast(Cond->getOperand(1)); if (!Sel || !Sel->hasOneUse()) return Cond; const SCEV *BackedgeTakenCount = SE.getBackedgeTakenCount(L); if (isa(BackedgeTakenCount)) return Cond; const SCEV *One = SE.getConstant(BackedgeTakenCount->getType(), 1); // Add one to the backedge-taken count to get the trip count. const SCEV *IterationCount = SE.getAddExpr(One, BackedgeTakenCount); if (IterationCount != SE.getSCEV(Sel)) return Cond; // Check for a max calculation that matches the pattern. There's no check // for ICMP_ULE here because the comparison would be with zero, which // isn't interesting. 
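  // For illustration (not part of the original comments): for the
  // canonicalized loop shown above, ScalarEvolution typically produces
  //   BackedgeTakenCount = (-1 + (1 smax %n))
  //   IterationCount     = (1 smax %n)
  // so the max is found through IterationCount with Pred = ICMP_SLT, and the
  // select being matched is the materialized form of that smax. The exact
  // expression shapes depend on how SE canonicalizes %n.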
CmpInst::Predicate Pred = ICmpInst::BAD_ICMP_PREDICATE; const SCEVNAryExpr *Max = nullptr; if (const SCEVSMaxExpr *S = dyn_cast(BackedgeTakenCount)) { Pred = ICmpInst::ICMP_SLE; Max = S; } else if (const SCEVSMaxExpr *S = dyn_cast(IterationCount)) { Pred = ICmpInst::ICMP_SLT; Max = S; } else if (const SCEVUMaxExpr *U = dyn_cast(IterationCount)) { Pred = ICmpInst::ICMP_ULT; Max = U; } else { // No match; bail. return Cond; } // To handle a max with more than two operands, this optimization would // require additional checking and setup. if (Max->getNumOperands() != 2) return Cond; const SCEV *MaxLHS = Max->getOperand(0); const SCEV *MaxRHS = Max->getOperand(1); // ScalarEvolution canonicalizes constants to the left. For < and >, look // for a comparison with 1. For <= and >=, a comparison with zero. if (!MaxLHS || (ICmpInst::isTrueWhenEqual(Pred) ? !MaxLHS->isZero() : (MaxLHS != One))) return Cond; // Check the relevant induction variable for conformance to // the pattern. const SCEV *IV = SE.getSCEV(Cond->getOperand(0)); const SCEVAddRecExpr *AR = dyn_cast(IV); if (!AR || !AR->isAffine() || AR->getStart() != One || AR->getStepRecurrence(SE) != One) return Cond; assert(AR->getLoop() == L && "Loop condition operand is an addrec in a different loop!"); // Check the right operand of the select, and remember it, as it will // be used in the new comparison instruction. Value *NewRHS = nullptr; if (ICmpInst::isTrueWhenEqual(Pred)) { // Look for n+1, and grab n. if (AddOperator *BO = dyn_cast(Sel->getOperand(1))) if (ConstantInt *BO1 = dyn_cast(BO->getOperand(1))) if (BO1->isOne() && SE.getSCEV(BO->getOperand(0)) == MaxRHS) NewRHS = BO->getOperand(0); if (AddOperator *BO = dyn_cast(Sel->getOperand(2))) if (ConstantInt *BO1 = dyn_cast(BO->getOperand(1))) if (BO1->isOne() && SE.getSCEV(BO->getOperand(0)) == MaxRHS) NewRHS = BO->getOperand(0); if (!NewRHS) return Cond; } else if (SE.getSCEV(Sel->getOperand(1)) == MaxRHS) NewRHS = Sel->getOperand(1); else if (SE.getSCEV(Sel->getOperand(2)) == MaxRHS) NewRHS = Sel->getOperand(2); else if (const SCEVUnknown *SU = dyn_cast(MaxRHS)) NewRHS = SU->getValue(); else // Max doesn't match expected pattern. return Cond; // Determine the new comparison opcode. It may be signed or unsigned, // and the original comparison may be either equality or inequality. if (Cond->getPredicate() == CmpInst::ICMP_EQ) Pred = CmpInst::getInversePredicate(Pred); // Ok, everything looks ok to change the condition into an SLT or SGE and // delete the max calculation. ICmpInst *NewCond = new ICmpInst(Cond, Pred, Cond->getOperand(0), NewRHS, "scmp"); // Delete the max calculation instructions. Cond->replaceAllUsesWith(NewCond); CondUse->setUser(NewCond); Instruction *Cmp = cast(Sel->getOperand(0)); Cond->eraseFromParent(); Sel->eraseFromParent(); if (Cmp->use_empty()) Cmp->eraseFromParent(); return NewCond; } /// Change loop terminating condition to use the postinc iv when possible. void LSRInstance::OptimizeLoopTermCond() { SmallPtrSet PostIncs; BasicBlock *LatchBlock = L->getLoopLatch(); SmallVector ExitingBlocks; L->getExitingBlocks(ExitingBlocks); for (BasicBlock *ExitingBlock : ExitingBlocks) { // Get the terminating condition for the loop if possible. If we // can, we want to change it to use a post-incremented version of its // induction variable, to allow coalescing the live ranges for the IV into // one register value. 
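    // Illustrative example (names are hypothetical, not from the original
    // source): for an exiting block ending in
    //   %i.next = add i64 %i, 1
    //   %cmp = icmp ne i64 %i, %n      ; compares the pre-incremented value
    // both %i and %i.next are live across the backedge. Rewriting the compare
    // to use the post-incremented %i.next lets the register allocator
    // coalesce the two live ranges into a single register.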
BranchInst *TermBr = dyn_cast(ExitingBlock->getTerminator()); if (!TermBr) continue; // FIXME: Overly conservative, termination condition could be an 'or' etc.. if (TermBr->isUnconditional() || !isa(TermBr->getCondition())) continue; // Search IVUsesByStride to find Cond's IVUse if there is one. IVStrideUse *CondUse = nullptr; ICmpInst *Cond = cast(TermBr->getCondition()); if (!FindIVUserForCond(Cond, CondUse)) continue; // If the trip count is computed in terms of a max (due to ScalarEvolution // being unable to find a sufficient guard, for example), change the loop // comparison to use SLT or ULT instead of NE. // One consequence of doing this now is that it disrupts the count-down // optimization. That's not always a bad thing though, because in such // cases it may still be worthwhile to avoid a max. Cond = OptimizeMax(Cond, CondUse); // If this exiting block dominates the latch block, it may also use // the post-inc value if it won't be shared with other uses. // Check for dominance. if (!DT.dominates(ExitingBlock, LatchBlock)) continue; // Conservatively avoid trying to use the post-inc value in non-latch // exits if there may be pre-inc users in intervening blocks. if (LatchBlock != ExitingBlock) for (IVUsers::const_iterator UI = IU.begin(), E = IU.end(); UI != E; ++UI) // Test if the use is reachable from the exiting block. This dominator // query is a conservative approximation of reachability. if (&*UI != CondUse && !DT.properlyDominates(UI->getUser()->getParent(), ExitingBlock)) { // Conservatively assume there may be reuse if the quotient of their // strides could be a legal scale. const SCEV *A = IU.getStride(*CondUse, L); const SCEV *B = IU.getStride(*UI, L); if (!A || !B) continue; if (SE.getTypeSizeInBits(A->getType()) != SE.getTypeSizeInBits(B->getType())) { if (SE.getTypeSizeInBits(A->getType()) > SE.getTypeSizeInBits(B->getType())) B = SE.getSignExtendExpr(B, A->getType()); else A = SE.getSignExtendExpr(A, B->getType()); } if (const SCEVConstant *D = dyn_cast_or_null(getExactSDiv(B, A, SE))) { const ConstantInt *C = D->getValue(); // Stride of one or negative one can have reuse with non-addresses. if (C->isOne() || C->isAllOnesValue()) goto decline_post_inc; // Avoid weird situations. if (C->getValue().getMinSignedBits() >= 64 || C->getValue().isMinSignedValue()) goto decline_post_inc; // Check for possible scaled-address reuse. MemAccessTy AccessTy = getAccessType(UI->getUser()); int64_t Scale = C->getSExtValue(); if (TTI.isLegalAddressingMode(AccessTy.MemTy, /*BaseGV=*/nullptr, /*BaseOffset=*/0, /*HasBaseReg=*/false, Scale, AccessTy.AddrSpace)) goto decline_post_inc; Scale = -Scale; if (TTI.isLegalAddressingMode(AccessTy.MemTy, /*BaseGV=*/nullptr, /*BaseOffset=*/0, /*HasBaseReg=*/false, Scale, AccessTy.AddrSpace)) goto decline_post_inc; } } DEBUG(dbgs() << " Change loop exiting icmp to use postinc iv: " << *Cond << '\n'); // It's possible for the setcc instruction to be anywhere in the loop, and // possible for it to have multiple users. If it is not immediately before // the exiting block branch, move it. if (&*++BasicBlock::iterator(Cond) != TermBr) { if (Cond->hasOneUse()) { Cond->moveBefore(TermBr); } else { // Clone the terminating condition and insert into the loopend. ICmpInst *OldCond = Cond; Cond = cast(Cond->clone()); Cond->setName(L->getHeader()->getName() + ".termcond"); ExitingBlock->getInstList().insert(TermBr->getIterator(), Cond); // Clone the IVUse, as the old use still exists! 
CondUse = &IU.AddUser(Cond, CondUse->getOperandValToReplace()); TermBr->replaceUsesOfWith(OldCond, Cond); } } // If we get to here, we know that we can transform the setcc instruction to // use the post-incremented version of the IV, allowing us to coalesce the // live ranges for the IV correctly. CondUse->transformToPostInc(L); Changed = true; PostIncs.insert(Cond); decline_post_inc:; } // Determine an insertion point for the loop induction variable increment. It // must dominate all the post-inc comparisons we just set up, and it must // dominate the loop latch edge. IVIncInsertPos = L->getLoopLatch()->getTerminator(); for (Instruction *Inst : PostIncs) { BasicBlock *BB = DT.findNearestCommonDominator(IVIncInsertPos->getParent(), Inst->getParent()); if (BB == Inst->getParent()) IVIncInsertPos = Inst; else if (BB != IVIncInsertPos->getParent()) IVIncInsertPos = BB->getTerminator(); } } /// Determine if the given use can accommodate a fixup at the given offset and /// other details. If so, update the use and return true. bool LSRInstance::reconcileNewOffset(LSRUse &LU, int64_t NewOffset, bool HasBaseReg, LSRUse::KindType Kind, MemAccessTy AccessTy) { int64_t NewMinOffset = LU.MinOffset; int64_t NewMaxOffset = LU.MaxOffset; MemAccessTy NewAccessTy = AccessTy; // Check for a mismatched kind. It's tempting to collapse mismatched kinds to // something conservative, however this can pessimize in the case that one of // the uses will have all its uses outside the loop, for example. if (LU.Kind != Kind) return false; // Check for a mismatched access type, and fall back conservatively as needed. // TODO: Be less conservative when the type is similar and can use the same // addressing modes. if (Kind == LSRUse::Address) { if (AccessTy != LU.AccessTy) NewAccessTy = MemAccessTy::getUnknown(AccessTy.MemTy->getContext()); } // Conservatively assume HasBaseReg is true for now. if (NewOffset < LU.MinOffset) { if (!isAlwaysFoldable(TTI, Kind, NewAccessTy, /*BaseGV=*/nullptr, LU.MaxOffset - NewOffset, HasBaseReg)) return false; NewMinOffset = NewOffset; } else if (NewOffset > LU.MaxOffset) { if (!isAlwaysFoldable(TTI, Kind, NewAccessTy, /*BaseGV=*/nullptr, NewOffset - LU.MinOffset, HasBaseReg)) return false; NewMaxOffset = NewOffset; } // Update the use. LU.MinOffset = NewMinOffset; LU.MaxOffset = NewMaxOffset; LU.AccessTy = NewAccessTy; if (NewOffset != LU.Offsets.back()) LU.Offsets.push_back(NewOffset); return true; } /// Return an LSRUse index and an offset value for a fixup which needs the given /// expression, with the given kind and optional access type. Either reuse an /// existing use or create a new one, as needed. std::pair LSRInstance::getUse(const SCEV *&Expr, LSRUse::KindType Kind, MemAccessTy AccessTy) { const SCEV *Copy = Expr; int64_t Offset = ExtractImmediate(Expr, SE); // Basic uses can't accept any offset, for example. if (!isAlwaysFoldable(TTI, Kind, AccessTy, /*BaseGV=*/ nullptr, Offset, /*HasBaseReg=*/ true)) { Expr = Copy; Offset = 0; } std::pair P = UseMap.insert(std::make_pair(LSRUse::SCEVUseKindPair(Expr, Kind), 0)); if (!P.second) { // A use already existed with this base. size_t LUIdx = P.first->second; LSRUse &LU = Uses[LUIdx]; if (reconcileNewOffset(LU, Offset, /*HasBaseReg=*/true, Kind, AccessTy)) // Reuse this use. return std::make_pair(LUIdx, Offset); } // Create a new use. 
  size_t LUIdx = Uses.size();
  P.first->second = LUIdx;
  Uses.push_back(LSRUse(Kind, AccessTy));
  LSRUse &LU = Uses[LUIdx];

  // We don't need to track redundant offsets, but we don't need to go out
  // of our way here to avoid them.
  if (LU.Offsets.empty() || Offset != LU.Offsets.back())
    LU.Offsets.push_back(Offset);

  LU.MinOffset = Offset;
  LU.MaxOffset = Offset;
  return std::make_pair(LUIdx, Offset);
}

/// Delete the given use from the Uses list.
void LSRInstance::DeleteUse(LSRUse &LU, size_t LUIdx) {
  if (&LU != &Uses.back())
    std::swap(LU, Uses.back());
  Uses.pop_back();

  // Update RegUses.
  RegUses.swapAndDropUse(LUIdx, Uses.size());
}

/// Look for a use distinct from OrigLU which has a formula that has the same
/// registers as the given formula.
LSRUse *
LSRInstance::FindUseWithSimilarFormula(const Formula &OrigF,
                                       const LSRUse &OrigLU) {
  // Search all uses for the formula. This could be more clever.
  for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {
    LSRUse &LU = Uses[LUIdx];
    // Check whether this use is close enough to OrigLU, to see whether it's
    // worthwhile looking through its formulae.
    // Ignore ICmpZero uses because they may contain formulae generated by
    // GenerateICmpZeroScales, in which case adding fixup offsets may
    // be invalid.
    if (&LU != &OrigLU &&
        LU.Kind != LSRUse::ICmpZero &&
        LU.Kind == OrigLU.Kind && OrigLU.AccessTy == LU.AccessTy &&
        LU.WidestFixupType == OrigLU.WidestFixupType &&
        LU.HasFormulaWithSameRegs(OrigF)) {
      // Scan through this use's formulae.
      for (const Formula &F : LU.Formulae) {
        // Check to see if this formula has the same registers and symbols
        // as OrigF.
        if (F.BaseRegs == OrigF.BaseRegs &&
            F.ScaledReg == OrigF.ScaledReg &&
            F.BaseGV == OrigF.BaseGV &&
            F.Scale == OrigF.Scale &&
            F.UnfoldedOffset == OrigF.UnfoldedOffset) {
          if (F.BaseOffset == 0)
            return &LU;
          // This is the formula where all the registers and symbols matched;
          // there aren't going to be any others. Since we declined it, we
          // can skip the rest of the formulae and proceed to the next LSRUse.
          break;
        }
      }
    }
  }

  // Nothing looked good.
  return nullptr;
}

void LSRInstance::CollectInterestingTypesAndFactors() {
  SmallSetVector<const SCEV *, 4> Strides;

  // Collect interesting types and strides.
  SmallVector<const SCEV *, 4> Worklist;
  for (const IVStrideUse &U : IU) {
    const SCEV *Expr = IU.getExpr(U);

    // Collect interesting types.
    Types.insert(SE.getEffectiveSCEVType(Expr->getType()));

    // Add strides for mentioned loops.
    Worklist.push_back(Expr);
    do {
      const SCEV *S = Worklist.pop_back_val();
      if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(S)) {
        if (AR->getLoop() == L)
          Strides.insert(AR->getStepRecurrence(SE));
        Worklist.push_back(AR->getStart());
      } else if (const SCEVAddExpr *Add = dyn_cast<SCEVAddExpr>(S)) {
        Worklist.append(Add->op_begin(), Add->op_end());
      }
    } while (!Worklist.empty());
  }

  // Compute interesting factors from the set of interesting strides.
  for (SmallSetVector<const SCEV *, 4>::const_iterator
       I = Strides.begin(), E = Strides.end(); I != E; ++I)
    for (SmallSetVector<const SCEV *, 4>::const_iterator NewStrideIter =
         std::next(I); NewStrideIter != E; ++NewStrideIter) {
      const SCEV *OldStride = *I;
      const SCEV *NewStride = *NewStrideIter;

      if (SE.getTypeSizeInBits(OldStride->getType()) !=
          SE.getTypeSizeInBits(NewStride->getType())) {
        if (SE.getTypeSizeInBits(OldStride->getType()) >
            SE.getTypeSizeInBits(NewStride->getType()))
          NewStride = SE.getSignExtendExpr(NewStride, OldStride->getType());
        else
          OldStride = SE.getSignExtendExpr(OldStride, NewStride->getType());
      }
      if (const SCEVConstant *Factor =
            dyn_cast_or_null<SCEVConstant>(getExactSDiv(NewStride, OldStride,
                                                        SE, true))) {
        if (Factor->getAPInt().getMinSignedBits() <= 64)
          Factors.insert(Factor->getAPInt().getSExtValue());
      } else if (const SCEVConstant *Factor =
                   dyn_cast_or_null<SCEVConstant>(getExactSDiv(OldStride,
                                                               NewStride,
                                                               SE, true))) {
        if (Factor->getAPInt().getMinSignedBits() <= 64)
          Factors.insert(Factor->getAPInt().getSExtValue());
      }
    }

  // If all uses use the same type, don't bother looking for truncation-based
  // reuse.
  if (Types.size() == 1)
    Types.clear();

  DEBUG(print_factors_and_types(dbgs()));
}

/// Helper for CollectChains that finds an IV operand (computed by an AddRec in
/// this loop) within [OI,OE) or returns OE. If IVUsers mapped Instructions to
/// IVStrideUses, we could partially skip this.
static User::op_iterator
findIVOperand(User::op_iterator OI, User::op_iterator OE,
              Loop *L, ScalarEvolution &SE) {
  for(; OI != OE; ++OI) {
    if (Instruction *Oper = dyn_cast<Instruction>(*OI)) {
      if (!SE.isSCEVable(Oper->getType()))
        continue;

      if (const SCEVAddRecExpr *AR =
          dyn_cast<SCEVAddRecExpr>(SE.getSCEV(Oper))) {
        if (AR->getLoop() == L)
          break;
      }
    }
  }
  return OI;
}

/// IVChain logic must consistently peek base TruncInst operands, so wrap it in
/// a convenient helper.
static Value *getWideOperand(Value *Oper) {
  if (TruncInst *Trunc = dyn_cast<TruncInst>(Oper))
    return Trunc->getOperand(0);
  return Oper;
}

/// Return true if we allow an IV chain to include both types.
static bool isCompatibleIVType(Value *LVal, Value *RVal) {
  Type *LType = LVal->getType();
  Type *RType = RVal->getType();
  return (LType == RType) || (LType->isPointerTy() && RType->isPointerTy());
}

/// Return an approximation of this SCEV expression's "base", or NULL for any
/// constant. Returning the expression itself is conservative. Returning a
/// deeper subexpression is more precise and valid as long as it isn't less
/// complex than another subexpression. For expressions involving multiple
/// unscaled values, we need to return the pointer-type SCEVUnknown. This avoids
/// forming chains across objects, such as: PrevOper==a[i], IVOper==b[i],
/// IVInc==b-a.
///
/// Since SCEVUnknown is the rightmost type, and pointers are the rightmost
/// SCEVUnknown, we simply return the rightmost SCEV operand.
static const SCEV *getExprBase(const SCEV *S) {
  switch (S->getSCEVType()) {
  default: // including scUnknown.
    return S;
  case scConstant:
    return nullptr;
  case scTruncate:
    return getExprBase(cast<SCEVTruncateExpr>(S)->getOperand());
  case scZeroExtend:
    return getExprBase(cast<SCEVZeroExtendExpr>(S)->getOperand());
  case scSignExtend:
    return getExprBase(cast<SCEVSignExtendExpr>(S)->getOperand());
  case scAddExpr: {
    // Skip over scaled operands (scMulExpr) to follow add operands as long as
    // there's nothing more complex.
    // FIXME: not sure if we want to recognize negation.
    const SCEVAddExpr *Add = cast<SCEVAddExpr>(S);
    for (std::reverse_iterator<SCEVAddExpr::op_iterator> I(Add->op_end()),
           E(Add->op_begin()); I != E; ++I) {
      const SCEV *SubExpr = *I;
      if (SubExpr->getSCEVType() == scAddExpr)
        return getExprBase(SubExpr);

      if (SubExpr->getSCEVType() != scMulExpr)
        return SubExpr;
    }
    return S; // all operands are scaled, be conservative.
  }
  case scAddRecExpr:
    return getExprBase(cast<SCEVAddRecExpr>(S)->getStart());
  }
}

/// Return true if the chain increment is profitable to expand into a loop
/// invariant value, which may require its own register. A profitable chain
/// increment will be an offset relative to the same base. We allow such offsets
/// to potentially be used as chain increment as long as it's not obviously
/// expensive to expand using real instructions.
bool IVChain::isProfitableIncrement(const SCEV *OperExpr,
                                    const SCEV *IncExpr,
                                    ScalarEvolution &SE) {
  // Aggressively form chains when -stress-ivchain.
  if (StressIVChain)
    return true;

  // Do not replace a constant offset from IV head with a nonconstant IV
  // increment.
  if (!isa<SCEVConstant>(IncExpr)) {
    const SCEV *HeadExpr = SE.getSCEV(getWideOperand(Incs[0].IVOperand));
    if (isa<SCEVConstant>(SE.getMinusSCEV(OperExpr, HeadExpr)))
      return 0;
  }

  SmallPtrSet<const SCEV*, 8> Processed;
  return !isHighCostExpansion(IncExpr, Processed, SE);
}

/// Return true if the number of registers needed for the chain is estimated to
/// be less than the number required for the individual IV users. First prohibit
/// any IV users that keep the IV live across increments (the Users set should
/// be empty). Next count the number and type of increments in the chain.
///
/// Chaining IVs can lead to considerable code bloat if ISEL doesn't
/// effectively use postinc addressing modes. Only consider it profitable if the
/// increments can be computed in fewer registers when chained.
///
/// TODO: Consider IVInc free if it's already used in another chain.
static bool
isProfitableChain(IVChain &Chain, SmallPtrSetImpl<Instruction*> &Users,
                  ScalarEvolution &SE, const TargetTransformInfo &TTI) {
  if (StressIVChain)
    return true;

  if (!Chain.hasIncs())
    return false;

  if (!Users.empty()) {
    DEBUG(dbgs() << "Chain: " << *Chain.Incs[0].UserInst << " users:\n";
          for (Instruction *Inst : Users) {
            dbgs() << "  " << *Inst << "\n";
          });
    return false;
  }
  assert(!Chain.Incs.empty() && "empty IV chains are not allowed");

  // The chain itself may require a register, so initialize cost to 1.
  int cost = 1;

  // A complete chain likely eliminates the need for keeping the original IV in
  // a register. LSR does not currently know how to form a complete chain unless
  // the header phi already exists.
  if (isa<PHINode>(Chain.tailUserInst())
      && SE.getSCEV(Chain.tailUserInst()) == Chain.Incs[0].IncExpr) {
    --cost;
  }
  const SCEV *LastIncExpr = nullptr;
  unsigned NumConstIncrements = 0;
  unsigned NumVarIncrements = 0;
  unsigned NumReusedIncrements = 0;
  for (const IVInc &Inc : Chain) {
    if (Inc.IncExpr->isZero())
      continue;

    // Incrementing by zero or some constant is neutral. We assume constants can
    // be folded into an addressing mode or an add's immediate operand.
    if (isa<SCEVConstant>(Inc.IncExpr)) {
      ++NumConstIncrements;
      continue;
    }

    if (Inc.IncExpr == LastIncExpr)
      ++NumReusedIncrements;
    else
      ++NumVarIncrements;

    LastIncExpr = Inc.IncExpr;
  }
  // An IV chain with a single increment is handled by LSR's postinc
  // uses. However, a chain with multiple increments requires keeping the IV's
  // value live longer than it needs to be if chained.
  if (NumConstIncrements > 1)
    --cost;

  // Materializing increment expressions in the preheader that didn't exist in
  // the original code may cost a register. For example, sign-extended array
  // indices can produce ridiculous increments like this:
  // IV + ((sext i32 (2 * %s) to i64) + (-1 * (sext i32 %s to i64)))
  cost += NumVarIncrements;

  // Reusing variable increments likely saves a register to hold the multiple of
  // the stride.
  cost -= NumReusedIncrements;

  DEBUG(dbgs() << "Chain: " << *Chain.Incs[0].UserInst << " Cost: " << cost
               << "\n");

  return cost < 0;
}

/// Add this IV user to an existing chain or make it the head of a new chain.
void LSRInstance::ChainInstruction(Instruction *UserInst, Instruction *IVOper,
                                   SmallVectorImpl<ChainUsers> &ChainUsersVec) {
  // When IVs are used as types of varying widths, they are generally converted
  // to a wider type with some uses remaining narrow under a (free) trunc.
  Value *const NextIV = getWideOperand(IVOper);
  const SCEV *const OperExpr = SE.getSCEV(NextIV);
  const SCEV *const OperExprBase = getExprBase(OperExpr);

  // Visit all existing chains. Check if its IVOper can be computed as a
  // profitable loop invariant increment from the last link in the Chain.
  unsigned ChainIdx = 0, NChains = IVChainVec.size();
  const SCEV *LastIncExpr = nullptr;
  for (; ChainIdx < NChains; ++ChainIdx) {
    IVChain &Chain = IVChainVec[ChainIdx];

    // Prune the solution space aggressively by checking that both IV operands
    // are expressions that operate on the same unscaled SCEVUnknown. This
    // "base" will be canceled by the subsequent getMinusSCEV call. Checking
    // first avoids creating extra SCEV expressions.
    if (!StressIVChain && Chain.ExprBase != OperExprBase)
      continue;

    Value *PrevIV = getWideOperand(Chain.Incs.back().IVOperand);
    if (!isCompatibleIVType(PrevIV, NextIV))
      continue;

    // A phi node terminates a chain.
    if (isa<PHINode>(UserInst) && isa<PHINode>(Chain.tailUserInst()))
      continue;

    // The increment must be loop-invariant so it can be kept in a register.
    const SCEV *PrevExpr = SE.getSCEV(PrevIV);
    const SCEV *IncExpr = SE.getMinusSCEV(OperExpr, PrevExpr);
    if (!SE.isLoopInvariant(IncExpr, L))
      continue;

    if (Chain.isProfitableIncrement(OperExpr, IncExpr, SE)) {
      LastIncExpr = IncExpr;
      break;
    }
  }
  // If we haven't found a chain, create a new one, unless we hit the max. Don't
  // bother for phi nodes, because they must be last in the chain.
  if (ChainIdx == NChains) {
    if (isa<PHINode>(UserInst))
      return;
    if (NChains >= MaxChains && !StressIVChain) {
      DEBUG(dbgs() << "IV Chain Limit\n");
      return;
    }
    LastIncExpr = OperExpr;
    // IVUsers may have skipped over sign/zero extensions. We don't currently
    // attempt to form chains involving extensions unless they can be hoisted
    // into this loop's AddRec.
    if (!isa<SCEVAddRecExpr>(LastIncExpr))
      return;
    ++NChains;
    IVChainVec.push_back(IVChain(IVInc(UserInst, IVOper, LastIncExpr),
                                 OperExprBase));
    ChainUsersVec.resize(NChains);
    DEBUG(dbgs() << "IV Chain#" << ChainIdx << " Head: (" << *UserInst
                 << ") IV=" << *LastIncExpr << "\n");
  } else {
    DEBUG(dbgs() << "IV Chain#" << ChainIdx << " Inc: (" << *UserInst
                 << ") IV+" << *LastIncExpr << "\n");
    // Add this IV user to the end of the chain.
    IVChainVec[ChainIdx].add(IVInc(UserInst, IVOper, LastIncExpr));
  }
  IVChain &Chain = IVChainVec[ChainIdx];

  SmallPtrSet<Instruction*,4> &NearUsers = ChainUsersVec[ChainIdx].NearUsers;
  // This chain's NearUsers become FarUsers.
  if (!LastIncExpr->isZero()) {
    ChainUsersVec[ChainIdx].FarUsers.insert(NearUsers.begin(),
                                            NearUsers.end());
    NearUsers.clear();
  }

  // All other uses of IVOperand become near uses of the chain.
  // We currently ignore intermediate values within SCEV expressions, assuming
  // they will eventually be used by the current chain, or can be computed
  // from one of the chain increments. To be more precise we could
  // transitively follow its user and only add leaf IV users to the set.
  for (User *U : IVOper->users()) {
    Instruction *OtherUse = dyn_cast<Instruction>(U);
    if (!OtherUse)
      continue;
    // Uses in the chain will no longer be uses if the chain is formed.
    // Include the head of the chain in this iteration (not Chain.begin()).
    IVChain::const_iterator IncIter = Chain.Incs.begin();
    IVChain::const_iterator IncEnd = Chain.Incs.end();
    for( ; IncIter != IncEnd; ++IncIter) {
      if (IncIter->UserInst == OtherUse)
        break;
    }
    if (IncIter != IncEnd)
      continue;

    if (SE.isSCEVable(OtherUse->getType())
        && !isa<SCEVUnknown>(SE.getSCEV(OtherUse))
        && IU.isIVUserOrOperand(OtherUse)) {
      continue;
    }
    NearUsers.insert(OtherUse);
  }

  // Since this user is part of the chain, it's no longer considered a use
  // of the chain.
  ChainUsersVec[ChainIdx].FarUsers.erase(UserInst);
}

/// Populate the vector of Chains.
///
/// This decreases ILP at the architecture level. Targets with ample registers,
/// multiple memory ports, and no register renaming probably don't want
/// this. However, such targets should probably disable LSR altogether.
///
/// The job of LSR is to make a reasonable choice of induction variables across
/// the loop. Subsequent passes can easily "unchain" computation exposing more
/// ILP *within the loop* if the target wants it.
///
/// Finding the best IV chain is potentially a scheduling problem. Since LSR
/// will not reorder memory operations, it will recognize this as a chain, but
/// will generate redundant IV increments. Ideally this would be corrected later
/// by a smart scheduler:
///        = A[i]
///        = A[i+x]
/// A[i]   =
/// A[i+x] =
///
/// TODO: Walk the entire domtree within this loop, not just the path to the
/// loop latch. This will discover chains on side paths, but requires
/// maintaining multiple copies of the Chains state.
void LSRInstance::CollectChains() {
  DEBUG(dbgs() << "Collecting IV Chains.\n");
  SmallVector<ChainUsers, 8> ChainUsersVec;

  SmallVector<BasicBlock *,8> LatchPath;
  BasicBlock *LoopHeader = L->getHeader();
  for (DomTreeNode *Rung = DT.getNode(L->getLoopLatch());
       Rung->getBlock() != LoopHeader; Rung = Rung->getIDom()) {
    LatchPath.push_back(Rung->getBlock());
  }
  LatchPath.push_back(LoopHeader);

  // Walk the instruction stream from the loop header to the loop latch.
  for (SmallVectorImpl<BasicBlock *>::reverse_iterator
         BBIter = LatchPath.rbegin(), BBEnd = LatchPath.rend();
       BBIter != BBEnd; ++BBIter) {
    for (BasicBlock::iterator I = (*BBIter)->begin(), E = (*BBIter)->end();
         I != E; ++I) {
      // Skip instructions that weren't seen by IVUsers analysis.
      if (isa<PHINode>(I) || !IU.isIVUserOrOperand(&*I))
        continue;

      // Ignore users that are part of a SCEV expression. This way we only
      // consider leaf IV Users. This effectively rediscovers a portion of
      // IVUsers analysis but in program order this time.
      if (SE.isSCEVable(I->getType()) && !isa<SCEVUnknown>(SE.getSCEV(&*I)))
        continue;

      // Remove this instruction from any NearUsers set it may be in.
      for (unsigned ChainIdx = 0, NChains = IVChainVec.size();
           ChainIdx < NChains; ++ChainIdx) {
        ChainUsersVec[ChainIdx].NearUsers.erase(&*I);
      }
      // Search for operands that can be chained.
      SmallPtrSet<Instruction*, 4> UniqueOperands;
      User::op_iterator IVOpEnd = I->op_end();
      User::op_iterator IVOpIter = findIVOperand(I->op_begin(), IVOpEnd, L, SE);
      while (IVOpIter != IVOpEnd) {
        Instruction *IVOpInst = cast<Instruction>(*IVOpIter);
        if (UniqueOperands.insert(IVOpInst).second)
          ChainInstruction(&*I, IVOpInst, ChainUsersVec);
        IVOpIter = findIVOperand(std::next(IVOpIter), IVOpEnd, L, SE);
      }
    } // Continue walking down the instructions.
  } // Continue walking down the domtree.
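  // A sketch of what the walk above collects (illustrative only): for a body
  // containing loads of a[i], a[i+4], and a[i+8], all three IV operands share
  // the SCEVUnknown base 'a', so they join one chain whose increments are the
  // constant 4, allowing each address to be computed from the previous one
  // rather than keeping a separate scaled index live.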
// Visit phi backedges to determine if the chain can generate the IV postinc. for (BasicBlock::iterator I = L->getHeader()->begin(); PHINode *PN = dyn_cast(I); ++I) { if (!SE.isSCEVable(PN->getType())) continue; Instruction *IncV = dyn_cast(PN->getIncomingValueForBlock(L->getLoopLatch())); if (IncV) ChainInstruction(PN, IncV, ChainUsersVec); } // Remove any unprofitable chains. unsigned ChainIdx = 0; for (unsigned UsersIdx = 0, NChains = IVChainVec.size(); UsersIdx < NChains; ++UsersIdx) { if (!isProfitableChain(IVChainVec[UsersIdx], ChainUsersVec[UsersIdx].FarUsers, SE, TTI)) continue; // Preserve the chain at UsesIdx. if (ChainIdx != UsersIdx) IVChainVec[ChainIdx] = IVChainVec[UsersIdx]; FinalizeChain(IVChainVec[ChainIdx]); ++ChainIdx; } IVChainVec.resize(ChainIdx); } void LSRInstance::FinalizeChain(IVChain &Chain) { assert(!Chain.Incs.empty() && "empty IV chains are not allowed"); DEBUG(dbgs() << "Final Chain: " << *Chain.Incs[0].UserInst << "\n"); for (const IVInc &Inc : Chain) { DEBUG(dbgs() << " Inc: " << Inc.UserInst << "\n"); auto UseI = std::find(Inc.UserInst->op_begin(), Inc.UserInst->op_end(), Inc.IVOperand); assert(UseI != Inc.UserInst->op_end() && "cannot find IV operand"); IVIncSet.insert(UseI); } } /// Return true if the IVInc can be folded into an addressing mode. static bool canFoldIVIncExpr(const SCEV *IncExpr, Instruction *UserInst, Value *Operand, const TargetTransformInfo &TTI) { const SCEVConstant *IncConst = dyn_cast(IncExpr); if (!IncConst || !isAddressUse(UserInst, Operand)) return false; if (IncConst->getAPInt().getMinSignedBits() > 64) return false; MemAccessTy AccessTy = getAccessType(UserInst); int64_t IncOffset = IncConst->getValue()->getSExtValue(); if (!isAlwaysFoldable(TTI, LSRUse::Address, AccessTy, /*BaseGV=*/nullptr, IncOffset, /*HaseBaseReg=*/false)) return false; return true; } /// Generate an add or subtract for each IVInc in a chain to materialize the IV /// user's operand from the previous IV user's operand. void LSRInstance::GenerateIVChain(const IVChain &Chain, SCEVExpander &Rewriter, SmallVectorImpl &DeadInsts) { // Find the new IVOperand for the head of the chain. It may have been replaced // by LSR. const IVInc &Head = Chain.Incs[0]; User::op_iterator IVOpEnd = Head.UserInst->op_end(); // findIVOperand returns IVOpEnd if it can no longer find a valid IV user. User::op_iterator IVOpIter = findIVOperand(Head.UserInst->op_begin(), IVOpEnd, L, SE); Value *IVSrc = nullptr; while (IVOpIter != IVOpEnd) { IVSrc = getWideOperand(*IVOpIter); // If this operand computes the expression that the chain needs, we may use // it. (Check this after setting IVSrc which is used below.) // // Note that if Head.IncExpr is wider than IVSrc, then this phi is too // narrow for the chain, so we can no longer use it. We do allow using a // wider phi, assuming the LSR checked for free truncation. In that case we // should already have a truncate on this operand such that // getSCEV(IVSrc) == IncExpr. if (SE.getSCEV(*IVOpIter) == Head.IncExpr || SE.getSCEV(IVSrc) == Head.IncExpr) { break; } IVOpIter = findIVOperand(std::next(IVOpIter), IVOpEnd, L, SE); } if (IVOpIter == IVOpEnd) { // Gracefully give up on this chain. 
DEBUG(dbgs() << "Concealed chain head: " << *Head.UserInst << "\n"); return; } DEBUG(dbgs() << "Generate chain at: " << *IVSrc << "\n"); Type *IVTy = IVSrc->getType(); Type *IntTy = SE.getEffectiveSCEVType(IVTy); const SCEV *LeftOverExpr = nullptr; for (const IVInc &Inc : Chain) { Instruction *InsertPt = Inc.UserInst; if (isa(InsertPt)) InsertPt = L->getLoopLatch()->getTerminator(); // IVOper will replace the current IV User's operand. IVSrc is the IV // value currently held in a register. Value *IVOper = IVSrc; if (!Inc.IncExpr->isZero()) { // IncExpr was the result of subtraction of two narrow values, so must // be signed. const SCEV *IncExpr = SE.getNoopOrSignExtend(Inc.IncExpr, IntTy); LeftOverExpr = LeftOverExpr ? SE.getAddExpr(LeftOverExpr, IncExpr) : IncExpr; } if (LeftOverExpr && !LeftOverExpr->isZero()) { // Expand the IV increment. Rewriter.clearPostInc(); Value *IncV = Rewriter.expandCodeFor(LeftOverExpr, IntTy, InsertPt); const SCEV *IVOperExpr = SE.getAddExpr(SE.getUnknown(IVSrc), SE.getUnknown(IncV)); IVOper = Rewriter.expandCodeFor(IVOperExpr, IVTy, InsertPt); // If an IV increment can't be folded, use it as the next IV value. if (!canFoldIVIncExpr(LeftOverExpr, Inc.UserInst, Inc.IVOperand, TTI)) { assert(IVTy == IVOper->getType() && "inconsistent IV increment type"); IVSrc = IVOper; LeftOverExpr = nullptr; } } Type *OperTy = Inc.IVOperand->getType(); if (IVTy != OperTy) { assert(SE.getTypeSizeInBits(IVTy) >= SE.getTypeSizeInBits(OperTy) && "cannot extend a chained IV"); IRBuilder<> Builder(InsertPt); IVOper = Builder.CreateTruncOrBitCast(IVOper, OperTy, "lsr.chain"); } Inc.UserInst->replaceUsesOfWith(Inc.IVOperand, IVOper); DeadInsts.emplace_back(Inc.IVOperand); } // If LSR created a new, wider phi, we may also replace its postinc. We only // do this if we also found a wide value for the head of the chain. if (isa(Chain.tailUserInst())) { for (BasicBlock::iterator I = L->getHeader()->begin(); PHINode *Phi = dyn_cast(I); ++I) { if (!isCompatibleIVType(Phi, IVSrc)) continue; Instruction *PostIncV = dyn_cast( Phi->getIncomingValueForBlock(L->getLoopLatch())); if (!PostIncV || (SE.getSCEV(PostIncV) != SE.getSCEV(IVSrc))) continue; Value *IVOper = IVSrc; Type *PostIncTy = PostIncV->getType(); if (IVTy != PostIncTy) { assert(PostIncTy->isPointerTy() && "mixing int/ptr IV types"); IRBuilder<> Builder(L->getLoopLatch()->getTerminator()); Builder.SetCurrentDebugLocation(PostIncV->getDebugLoc()); IVOper = Builder.CreatePointerCast(IVSrc, PostIncTy, "lsr.chain"); } Phi->replaceUsesOfWith(PostIncV, IVOper); DeadInsts.emplace_back(PostIncV); } } } void LSRInstance::CollectFixupsAndInitialFormulae() { for (const IVStrideUse &U : IU) { Instruction *UserInst = U.getUser(); // Skip IV users that are part of profitable IV Chains. User::op_iterator UseI = std::find(UserInst->op_begin(), UserInst->op_end(), U.getOperandValToReplace()); assert(UseI != UserInst->op_end() && "cannot find IV operand"); if (IVIncSet.count(UseI)) continue; // Record the uses. LSRFixup &LF = getNewFixup(); LF.UserInst = UserInst; LF.OperandValToReplace = U.getOperandValToReplace(); LF.PostIncLoops = U.getPostIncLoops(); LSRUse::KindType Kind = LSRUse::Basic; MemAccessTy AccessTy; if (isAddressUse(LF.UserInst, LF.OperandValToReplace)) { Kind = LSRUse::Address; AccessTy = getAccessType(LF.UserInst); } const SCEV *S = IU.getExpr(U); // Equality (== and !=) ICmps are special. 
We can rewrite (i == N) as // (N - i == 0), and this allows (N - i) to be the expression that we work // with rather than just N or i, so we can consider the register // requirements for both N and i at the same time. Limiting this code to // equality icmps is not a problem because all interesting loops use // equality icmps, thanks to IndVarSimplify. if (ICmpInst *CI = dyn_cast(LF.UserInst)) if (CI->isEquality()) { // Swap the operands if needed to put the OperandValToReplace on the // left, for consistency. Value *NV = CI->getOperand(1); if (NV == LF.OperandValToReplace) { CI->setOperand(1, CI->getOperand(0)); CI->setOperand(0, NV); NV = CI->getOperand(1); Changed = true; } // x == y --> x - y == 0 const SCEV *N = SE.getSCEV(NV); if (SE.isLoopInvariant(N, L) && isSafeToExpand(N, SE)) { // S is normalized, so normalize N before folding it into S // to keep the result normalized. N = TransformForPostIncUse(Normalize, N, CI, nullptr, LF.PostIncLoops, SE, DT); Kind = LSRUse::ICmpZero; S = SE.getMinusSCEV(N, S); } // -1 and the negations of all interesting strides (except the negation // of -1) are now also interesting. for (size_t i = 0, e = Factors.size(); i != e; ++i) if (Factors[i] != -1) Factors.insert(-(uint64_t)Factors[i]); Factors.insert(-1); } // Set up the initial formula for this use. std::pair P = getUse(S, Kind, AccessTy); LF.LUIdx = P.first; LF.Offset = P.second; LSRUse &LU = Uses[LF.LUIdx]; LU.AllFixupsOutsideLoop &= LF.isUseFullyOutsideLoop(L); if (!LU.WidestFixupType || SE.getTypeSizeInBits(LU.WidestFixupType) < SE.getTypeSizeInBits(LF.OperandValToReplace->getType())) LU.WidestFixupType = LF.OperandValToReplace->getType(); // If this is the first use of this LSRUse, give it a formula. if (LU.Formulae.empty()) { InsertInitialFormula(S, LU, LF.LUIdx); CountRegisters(LU.Formulae.back(), LF.LUIdx); } } DEBUG(print_fixups(dbgs())); } /// Insert a formula for the given expression into the given use, separating out /// loop-variant portions from loop-invariant and loop-computable portions. void LSRInstance::InsertInitialFormula(const SCEV *S, LSRUse &LU, size_t LUIdx) { // Mark uses whose expressions cannot be expanded. if (!isSafeToExpand(S, SE)) LU.RigidFormula = true; Formula F; F.initialMatch(S, L, SE); bool Inserted = InsertFormula(LU, LUIdx, F); assert(Inserted && "Initial formula already exists!"); (void)Inserted; } /// Insert a simple single-register formula for the given expression into the /// given use. void LSRInstance::InsertSupplementalFormula(const SCEV *S, LSRUse &LU, size_t LUIdx) { Formula F; F.BaseRegs.push_back(S); F.HasBaseReg = true; bool Inserted = InsertFormula(LU, LUIdx, F); assert(Inserted && "Supplemental formula already exists!"); (void)Inserted; } /// Note which registers are used by the given formula, updating RegUses. void LSRInstance::CountRegisters(const Formula &F, size_t LUIdx) { if (F.ScaledReg) RegUses.countRegister(F.ScaledReg, LUIdx); for (const SCEV *BaseReg : F.BaseRegs) RegUses.countRegister(BaseReg, LUIdx); } /// If the given formula has not yet been inserted, add it to the list, and /// return true. Return false otherwise. bool LSRInstance::InsertFormula(LSRUse &LU, unsigned LUIdx, const Formula &F) { // Do not insert formula that we will not be able to expand. assert(isLegalUse(TTI, LU.MinOffset, LU.MaxOffset, LU.Kind, LU.AccessTy, F) && "Formula is illegal"); if (!LU.InsertFormula(F)) return false; CountRegisters(F, LUIdx); return true; } /// Check for other uses of loop-invariant values which we're tracking. 
These /// other uses will pin these values in registers, making them less profitable /// for elimination. /// TODO: This currently misses non-constant addrec step registers. /// TODO: Should this give more weight to users inside the loop? void LSRInstance::CollectLoopInvariantFixupsAndFormulae() { SmallVector Worklist(RegUses.begin(), RegUses.end()); SmallPtrSet Visited; while (!Worklist.empty()) { const SCEV *S = Worklist.pop_back_val(); // Don't process the same SCEV twice if (!Visited.insert(S).second) continue; if (const SCEVNAryExpr *N = dyn_cast(S)) Worklist.append(N->op_begin(), N->op_end()); else if (const SCEVCastExpr *C = dyn_cast(S)) Worklist.push_back(C->getOperand()); else if (const SCEVUDivExpr *D = dyn_cast(S)) { Worklist.push_back(D->getLHS()); Worklist.push_back(D->getRHS()); } else if (const SCEVUnknown *US = dyn_cast(S)) { const Value *V = US->getValue(); if (const Instruction *Inst = dyn_cast(V)) { // Look for instructions defined outside the loop. if (L->contains(Inst)) continue; } else if (isa(V)) // Undef doesn't have a live range, so it doesn't matter. continue; for (const Use &U : V->uses()) { const Instruction *UserInst = dyn_cast(U.getUser()); // Ignore non-instructions. if (!UserInst) continue; // Ignore instructions in other functions (as can happen with // Constants). if (UserInst->getParent()->getParent() != L->getHeader()->getParent()) continue; // Ignore instructions not dominated by the loop. const BasicBlock *UseBB = !isa(UserInst) ? UserInst->getParent() : cast(UserInst)->getIncomingBlock( PHINode::getIncomingValueNumForOperand(U.getOperandNo())); if (!DT.dominates(L->getHeader(), UseBB)) continue; // Don't bother if the instruction is in a BB which ends in an EHPad. if (UseBB->getTerminator()->isEHPad()) continue; // Ignore uses which are part of other SCEV expressions, to avoid // analyzing them multiple times. if (SE.isSCEVable(UserInst->getType())) { const SCEV *UserS = SE.getSCEV(const_cast(UserInst)); // If the user is a no-op, look through to its uses. if (!isa(UserS)) continue; if (UserS == US) { Worklist.push_back( SE.getUnknown(const_cast(UserInst))); continue; } } // Ignore icmp instructions which are already being analyzed. if (const ICmpInst *ICI = dyn_cast(UserInst)) { unsigned OtherIdx = !U.getOperandNo(); Value *OtherOp = const_cast(ICI->getOperand(OtherIdx)); if (SE.hasComputableLoopEvolution(SE.getSCEV(OtherOp), L)) continue; } LSRFixup &LF = getNewFixup(); LF.UserInst = const_cast(UserInst); LF.OperandValToReplace = U; std::pair P = getUse( S, LSRUse::Basic, MemAccessTy()); LF.LUIdx = P.first; LF.Offset = P.second; LSRUse &LU = Uses[LF.LUIdx]; LU.AllFixupsOutsideLoop &= LF.isUseFullyOutsideLoop(L); if (!LU.WidestFixupType || SE.getTypeSizeInBits(LU.WidestFixupType) < SE.getTypeSizeInBits(LF.OperandValToReplace->getType())) LU.WidestFixupType = LF.OperandValToReplace->getType(); InsertSupplementalFormula(US, LU, LF.LUIdx); CountRegisters(LU.Formulae.back(), Uses.size() - 1); break; } } } } /// Split S into subexpressions which can be pulled out into separate /// registers. If C is non-null, multiply each subexpression by C. /// /// Return remainder expression after factoring the subexpressions captured by /// Ops. If Ops is complete, return NULL. static const SCEV *CollectSubexprs(const SCEV *S, const SCEVConstant *C, SmallVectorImpl &Ops, const Loop *L, ScalarEvolution &SE, unsigned Depth = 0) { // Arbitrarily cap recursion to protect compile time. 
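  // (Illustrative note, not from the original source: with the cap below, an
  // expression like (((a + b) + c) + d) is only decomposed through three
  // levels of recursion; anything deeper is kept whole as the remainder.)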
if (Depth >= 3) return S; if (const SCEVAddExpr *Add = dyn_cast(S)) { // Break out add operands. for (const SCEV *S : Add->operands()) { const SCEV *Remainder = CollectSubexprs(S, C, Ops, L, SE, Depth+1); if (Remainder) Ops.push_back(C ? SE.getMulExpr(C, Remainder) : Remainder); } return nullptr; } else if (const SCEVAddRecExpr *AR = dyn_cast(S)) { // Split a non-zero base out of an addrec. if (AR->getStart()->isZero()) return S; const SCEV *Remainder = CollectSubexprs(AR->getStart(), C, Ops, L, SE, Depth+1); // Split the non-zero AddRec unless it is part of a nested recurrence that // does not pertain to this loop. if (Remainder && (AR->getLoop() == L || !isa(Remainder))) { Ops.push_back(C ? SE.getMulExpr(C, Remainder) : Remainder); Remainder = nullptr; } if (Remainder != AR->getStart()) { if (!Remainder) Remainder = SE.getConstant(AR->getType(), 0); return SE.getAddRecExpr(Remainder, AR->getStepRecurrence(SE), AR->getLoop(), //FIXME: AR->getNoWrapFlags(SCEV::FlagNW) SCEV::FlagAnyWrap); } } else if (const SCEVMulExpr *Mul = dyn_cast(S)) { // Break (C * (a + b + c)) into C*a + C*b + C*c. if (Mul->getNumOperands() != 2) return S; if (const SCEVConstant *Op0 = dyn_cast(Mul->getOperand(0))) { C = C ? cast(SE.getMulExpr(C, Op0)) : Op0; const SCEV *Remainder = CollectSubexprs(Mul->getOperand(1), C, Ops, L, SE, Depth+1); if (Remainder) Ops.push_back(SE.getMulExpr(C, Remainder)); return nullptr; } } return S; } /// \brief Helper function for LSRInstance::GenerateReassociations. void LSRInstance::GenerateReassociationsImpl(LSRUse &LU, unsigned LUIdx, const Formula &Base, unsigned Depth, size_t Idx, bool IsScaledReg) { const SCEV *BaseReg = IsScaledReg ? Base.ScaledReg : Base.BaseRegs[Idx]; SmallVector AddOps; const SCEV *Remainder = CollectSubexprs(BaseReg, nullptr, AddOps, L, SE); if (Remainder) AddOps.push_back(Remainder); if (AddOps.size() == 1) return; for (SmallVectorImpl::const_iterator J = AddOps.begin(), JE = AddOps.end(); J != JE; ++J) { // Loop-variant "unknown" values are uninteresting; we won't be able to // do anything meaningful with them. if (isa(*J) && !SE.isLoopInvariant(*J, L)) continue; // Don't pull a constant into a register if the constant could be folded // into an immediate field. if (isAlwaysFoldable(TTI, SE, LU.MinOffset, LU.MaxOffset, LU.Kind, LU.AccessTy, *J, Base.getNumRegs() > 1)) continue; // Collect all operands except *J. SmallVector InnerAddOps( ((const SmallVector &)AddOps).begin(), J); InnerAddOps.append(std::next(J), ((const SmallVector &)AddOps).end()); // Don't leave just a constant behind in a register if the constant could // be folded into an immediate field. if (InnerAddOps.size() == 1 && isAlwaysFoldable(TTI, SE, LU.MinOffset, LU.MaxOffset, LU.Kind, LU.AccessTy, InnerAddOps[0], Base.getNumRegs() > 1)) continue; const SCEV *InnerSum = SE.getAddExpr(InnerAddOps); if (InnerSum->isZero()) continue; Formula F = Base; // Add the remaining pieces of the add back into the new formula. const SCEVConstant *InnerSumSC = dyn_cast(InnerSum); if (InnerSumSC && SE.getTypeSizeInBits(InnerSumSC->getType()) <= 64 && TTI.isLegalAddImmediate((uint64_t)F.UnfoldedOffset + InnerSumSC->getValue()->getZExtValue())) { F.UnfoldedOffset = (uint64_t)F.UnfoldedOffset + InnerSumSC->getValue()->getZExtValue(); if (IsScaledReg) F.ScaledReg = nullptr; else F.BaseRegs.erase(F.BaseRegs.begin() + Idx); } else if (IsScaledReg) F.ScaledReg = InnerSum; else F.BaseRegs[Idx] = InnerSum; // Add J as its own register, or an unfolded immediate. 
const SCEVConstant *SC = dyn_cast(*J); if (SC && SE.getTypeSizeInBits(SC->getType()) <= 64 && TTI.isLegalAddImmediate((uint64_t)F.UnfoldedOffset + SC->getValue()->getZExtValue())) F.UnfoldedOffset = (uint64_t)F.UnfoldedOffset + SC->getValue()->getZExtValue(); else F.BaseRegs.push_back(*J); // We may have changed the number of register in base regs, adjust the // formula accordingly. F.canonicalize(); if (InsertFormula(LU, LUIdx, F)) // If that formula hadn't been seen before, recurse to find more like // it. GenerateReassociations(LU, LUIdx, LU.Formulae.back(), Depth + 1); } } /// Split out subexpressions from adds and the bases of addrecs. void LSRInstance::GenerateReassociations(LSRUse &LU, unsigned LUIdx, Formula Base, unsigned Depth) { assert(Base.isCanonical() && "Input must be in the canonical form"); // Arbitrarily cap recursion to protect compile time. if (Depth >= 3) return; for (size_t i = 0, e = Base.BaseRegs.size(); i != e; ++i) GenerateReassociationsImpl(LU, LUIdx, Base, Depth, i); if (Base.Scale == 1) GenerateReassociationsImpl(LU, LUIdx, Base, Depth, /* Idx */ -1, /* IsScaledReg */ true); } /// Generate a formula consisting of all of the loop-dominating registers added /// into a single register. void LSRInstance::GenerateCombinations(LSRUse &LU, unsigned LUIdx, Formula Base) { // This method is only interesting on a plurality of registers. if (Base.BaseRegs.size() + (Base.Scale == 1) <= 1) return; // Flatten the representation, i.e., reg1 + 1*reg2 => reg1 + reg2, before // processing the formula. Base.unscale(); Formula F = Base; F.BaseRegs.clear(); SmallVector Ops; for (const SCEV *BaseReg : Base.BaseRegs) { if (SE.properlyDominates(BaseReg, L->getHeader()) && !SE.hasComputableLoopEvolution(BaseReg, L)) Ops.push_back(BaseReg); else F.BaseRegs.push_back(BaseReg); } if (Ops.size() > 1) { const SCEV *Sum = SE.getAddExpr(Ops); // TODO: If Sum is zero, it probably means ScalarEvolution missed an // opportunity to fold something. For now, just ignore such cases // rather than proceed with zero in a register. if (!Sum->isZero()) { F.BaseRegs.push_back(Sum); F.canonicalize(); (void)InsertFormula(LU, LUIdx, F); } } } /// \brief Helper function for LSRInstance::GenerateSymbolicOffsets. void LSRInstance::GenerateSymbolicOffsetsImpl(LSRUse &LU, unsigned LUIdx, const Formula &Base, size_t Idx, bool IsScaledReg) { const SCEV *G = IsScaledReg ? Base.ScaledReg : Base.BaseRegs[Idx]; GlobalValue *GV = ExtractSymbol(G, SE); if (G->isZero() || !GV) return; Formula F = Base; F.BaseGV = GV; if (!isLegalUse(TTI, LU.MinOffset, LU.MaxOffset, LU.Kind, LU.AccessTy, F)) return; if (IsScaledReg) F.ScaledReg = G; else F.BaseRegs[Idx] = G; (void)InsertFormula(LU, LUIdx, F); } /// Generate reuse formulae using symbolic offsets. void LSRInstance::GenerateSymbolicOffsets(LSRUse &LU, unsigned LUIdx, Formula Base) { // We can't add a symbolic offset if the address already contains one. if (Base.BaseGV) return; for (size_t i = 0, e = Base.BaseRegs.size(); i != e; ++i) GenerateSymbolicOffsetsImpl(LU, LUIdx, Base, i); if (Base.Scale == 1) GenerateSymbolicOffsetsImpl(LU, LUIdx, Base, /* Idx */ -1, /* IsScaledReg */ true); } /// \brief Helper function for LSRInstance::GenerateConstantOffsets. void LSRInstance::GenerateConstantOffsetsImpl( LSRUse &LU, unsigned LUIdx, const Formula &Base, const SmallVectorImpl &Worklist, size_t Idx, bool IsScaledReg) { const SCEV *G = IsScaledReg ? 
                                  Base.ScaledReg : Base.BaseRegs[Idx];
  for (int64_t Offset : Worklist) {
    Formula F = Base;
    F.BaseOffset = (uint64_t)Base.BaseOffset - Offset;
    if (isLegalUse(TTI, LU.MinOffset - Offset, LU.MaxOffset - Offset, LU.Kind,
                   LU.AccessTy, F)) {
      // Add the offset to the base register.
      const SCEV *NewG = SE.getAddExpr(SE.getConstant(G->getType(), Offset), G);
      // If it cancelled out, drop the base register, otherwise update it.
      if (NewG->isZero()) {
        if (IsScaledReg) {
          F.Scale = 0;
          F.ScaledReg = nullptr;
        } else
          F.deleteBaseReg(F.BaseRegs[Idx]);
        F.canonicalize();
      } else if (IsScaledReg)
        F.ScaledReg = NewG;
      else
        F.BaseRegs[Idx] = NewG;

      (void)InsertFormula(LU, LUIdx, F);
    }
  }

  int64_t Imm = ExtractImmediate(G, SE);
  if (G->isZero() || Imm == 0)
    return;
  Formula F = Base;
  F.BaseOffset = (uint64_t)F.BaseOffset + Imm;
  if (!isLegalUse(TTI, LU.MinOffset, LU.MaxOffset, LU.Kind, LU.AccessTy, F))
    return;
  if (IsScaledReg)
    F.ScaledReg = G;
  else
    F.BaseRegs[Idx] = G;
  (void)InsertFormula(LU, LUIdx, F);
}

/// GenerateConstantOffsets - Generate reuse formulae using constant offsets.
void LSRInstance::GenerateConstantOffsets(LSRUse &LU, unsigned LUIdx,
                                          Formula Base) {
  // TODO: For now, just add the min and max offset, because it usually isn't
  // worthwhile looking at everything in between.
  SmallVector<int64_t, 2> Worklist;
  Worklist.push_back(LU.MinOffset);
  if (LU.MaxOffset != LU.MinOffset)
    Worklist.push_back(LU.MaxOffset);

  for (size_t i = 0, e = Base.BaseRegs.size(); i != e; ++i)
    GenerateConstantOffsetsImpl(LU, LUIdx, Base, Worklist, i);
  if (Base.Scale == 1)
    GenerateConstantOffsetsImpl(LU, LUIdx, Base, Worklist, /* Idx */ -1,
                                /* IsScaledReg */ true);
}

/// For ICmpZero, check to see if we can scale up the comparison. For example, x
/// == y -> x*c == y*c.
void LSRInstance::GenerateICmpZeroScales(LSRUse &LU, unsigned LUIdx,
                                         Formula Base) {
  if (LU.Kind != LSRUse::ICmpZero) return;

  // Determine the integer type for the base formula.
  Type *IntTy = Base.getType();
  if (!IntTy) return;
  if (SE.getTypeSizeInBits(IntTy) > 64) return;

  // Don't do this if there is more than one offset.
  if (LU.MinOffset != LU.MaxOffset) return;

  assert(!Base.BaseGV && "ICmpZero use is not legal!");

  // Check each interesting stride.
  for (int64_t Factor : Factors) {
    // Check that the multiplication doesn't overflow.
    if (Base.BaseOffset == INT64_MIN && Factor == -1)
      continue;
    int64_t NewBaseOffset = (uint64_t)Base.BaseOffset * Factor;
    if (NewBaseOffset / Factor != Base.BaseOffset)
      continue;
    // If the offset will be truncated at this use, check that it is in bounds.
    if (!IntTy->isPointerTy() &&
        !ConstantInt::isValueValidForType(IntTy, NewBaseOffset))
      continue;

    // Check that multiplying with the use offset doesn't overflow.
    int64_t Offset = LU.MinOffset;
    if (Offset == INT64_MIN && Factor == -1)
      continue;
    Offset = (uint64_t)Offset * Factor;
    if (Offset / Factor != LU.MinOffset)
      continue;
    // If the offset will be truncated at this use, check that it is in bounds.
    if (!IntTy->isPointerTy() &&
        !ConstantInt::isValueValidForType(IntTy, Offset))
      continue;

    Formula F = Base;
    F.BaseOffset = NewBaseOffset;

    // Check that this scale is legal.
    if (!isLegalUse(TTI, Offset, Offset, LU.Kind, LU.AccessTy, F))
      continue;

    // Compensate for the use having MinOffset built into it.
    F.BaseOffset = (uint64_t)F.BaseOffset + Offset - LU.MinOffset;

    const SCEV *FactorS = SE.getConstant(IntTy, Factor);

    // Check that multiplying with each base register doesn't overflow.
    for (size_t i = 0, e = F.BaseRegs.size(); i != e; ++i) {
      F.BaseRegs[i] = SE.getMulExpr(F.BaseRegs[i], FactorS);
      if (getExactSDiv(F.BaseRegs[i], FactorS, SE) != Base.BaseRegs[i])
        goto next;
    }

    // Check that multiplying with the scaled register doesn't overflow.
    if (F.ScaledReg) {
      F.ScaledReg = SE.getMulExpr(F.ScaledReg, FactorS);
      if (getExactSDiv(F.ScaledReg, FactorS, SE) != Base.ScaledReg)
        continue;
    }

    // Check that multiplying with the unfolded offset doesn't overflow.
    if (F.UnfoldedOffset != 0) {
      if (F.UnfoldedOffset == INT64_MIN && Factor == -1)
        continue;
      F.UnfoldedOffset = (uint64_t)F.UnfoldedOffset * Factor;
      if (F.UnfoldedOffset / Factor != Base.UnfoldedOffset)
        continue;
      // If the offset will be truncated, check that it is in bounds.
      if (!IntTy->isPointerTy() &&
          !ConstantInt::isValueValidForType(IntTy, F.UnfoldedOffset))
        continue;
    }

    // If we make it here and it's legal, add it.
    (void)InsertFormula(LU, LUIdx, F);
  next:;
  }
}

/// Generate stride factor reuse formulae by making use of scaled-offset address
/// modes, for example.
void LSRInstance::GenerateScales(LSRUse &LU, unsigned LUIdx, Formula Base) {
  // Determine the integer type for the base formula.
  Type *IntTy = Base.getType();
  if (!IntTy) return;

  // If this Formula already has a scaled register, we can't add another one.
  // Try to unscale the formula to generate a better scale.
  if (Base.Scale != 0 && !Base.unscale())
    return;

  assert(Base.Scale == 0 && "unscale did not do its job!");

  // Check each interesting stride.
  for (int64_t Factor : Factors) {
    Base.Scale = Factor;
    Base.HasBaseReg = Base.BaseRegs.size() > 1;
    // Check whether this scale is going to be legal.
    if (!isLegalUse(TTI, LU.MinOffset, LU.MaxOffset, LU.Kind, LU.AccessTy,
                    Base)) {
      // As a special-case, handle special out-of-loop Basic users specially.
      // TODO: Reconsider this special case.
      if (LU.Kind == LSRUse::Basic &&
          isLegalUse(TTI, LU.MinOffset, LU.MaxOffset, LSRUse::Special,
                     LU.AccessTy, Base) &&
          LU.AllFixupsOutsideLoop)
        LU.Kind = LSRUse::Special;
      else
        continue;
    }
    // For an ICmpZero, negating a solitary base register won't lead to
    // new solutions.
    if (LU.Kind == LSRUse::ICmpZero &&
        !Base.HasBaseReg && Base.BaseOffset == 0 && !Base.BaseGV)
      continue;
    // For each addrec base reg, apply the scale, if possible.
    for (size_t i = 0, e = Base.BaseRegs.size(); i != e; ++i)
      if (const SCEVAddRecExpr *AR =
            dyn_cast<SCEVAddRecExpr>(Base.BaseRegs[i])) {
        const SCEV *FactorS = SE.getConstant(IntTy, Factor);
        if (FactorS->isZero())
          continue;
        // Divide out the factor, ignoring high bits, since we'll be
        // scaling the value back up in the end.
        if (const SCEV *Quotient = getExactSDiv(AR, FactorS, SE, true)) {
          // TODO: This could be optimized to avoid all the copying.
          Formula F = Base;
          F.ScaledReg = Quotient;
          F.deleteBaseReg(F.BaseRegs[i]);
          // The canonical representation of 1*reg is reg, which is already in
          // Base. In that case, do not try to insert the formula, it will be
          // rejected anyway.
          if (F.Scale == 1 && F.BaseRegs.empty())
            continue;
          (void)InsertFormula(LU, LUIdx, F);
        }
      }
  }
}

/// Generate reuse formulae from different IV types.
void LSRInstance::GenerateTruncates(LSRUse &LU, unsigned LUIdx, Formula Base) {
  // Don't bother truncating symbolic values.
  if (Base.BaseGV) return;

  // Determine the integer type for the base formula.
  Type *DstTy = Base.getType();
  if (!DstTy) return;
  DstTy = SE.getEffectiveSCEVType(DstTy);

  for (Type *SrcTy : Types) {
    if (SrcTy != DstTy && TTI.isTruncateFree(SrcTy, DstTy)) {
      Formula F = Base;

      if (F.ScaledReg) F.ScaledReg = SE.getAnyExtendExpr(F.ScaledReg, SrcTy);
      for (const SCEV *&BaseReg : F.BaseRegs)
        BaseReg = SE.getAnyExtendExpr(BaseReg, SrcTy);

      // TODO: This assumes we've done basic processing on all uses and
      // have an idea what the register usage is.
      if (!F.hasRegsUsedByUsesOtherThan(LUIdx, RegUses))
        continue;

      (void)InsertFormula(LU, LUIdx, F);
    }
  }
}

namespace {

/// Helper class for GenerateCrossUseConstantOffsets. It's used to defer
/// modifications so that the search phase doesn't have to worry about the data
/// structures moving underneath it.
struct WorkItem {
  size_t LUIdx;
  int64_t Imm;
  const SCEV *OrigReg;

  WorkItem(size_t LI, int64_t I, const SCEV *R)
    : LUIdx(LI), Imm(I), OrigReg(R) {}

  void print(raw_ostream &OS) const;
  void dump() const;
};

}

void WorkItem::print(raw_ostream &OS) const {
  OS << "in formulae referencing " << *OrigReg << " in use " << LUIdx
     << " , add offset " << Imm;
}

LLVM_DUMP_METHOD
void WorkItem::dump() const {
  print(errs()); errs() << '\n';
}

/// Look for registers which are a constant distance apart and try to form
/// reuse opportunities between them.
void LSRInstance::GenerateCrossUseConstantOffsets() {
  // Group the registers by their value without any added constant offset.
  typedef std::map<int64_t, const SCEV *> ImmMapTy;
  DenseMap<const SCEV *, ImmMapTy> Map;
  DenseMap<const SCEV *, SmallBitVector> UsedByIndicesMap;
  SmallVector<const SCEV *, 8> Sequence;
  for (const SCEV *Use : RegUses) {
    const SCEV *Reg = Use; // Make a copy for ExtractImmediate to modify.
    int64_t Imm = ExtractImmediate(Reg, SE);
    auto Pair = Map.insert(std::make_pair(Reg, ImmMapTy()));
    if (Pair.second)
      Sequence.push_back(Reg);
    Pair.first->second.insert(std::make_pair(Imm, Use));
    UsedByIndicesMap[Reg] |= RegUses.getUsedByIndices(Use);
  }

  // Now examine each set of registers with the same base value. Build up
  // a list of work to do and do the work in a separate step so that we're
  // not adding formulae and register counts while we're searching.
  SmallVector<WorkItem, 32> WorkItems;
  SmallSet<std::pair<size_t, int64_t>, 32> UniqueItems;
  for (const SCEV *Reg : Sequence) {
    const ImmMapTy &Imms = Map.find(Reg)->second;

    // It's not worthwhile looking for reuse if there's only one offset.
    if (Imms.size() == 1)
      continue;

    DEBUG(dbgs() << "Generating cross-use offsets for " << *Reg << ':';
          for (const auto &Entry : Imms)
            dbgs() << ' ' << Entry.first;
          dbgs() << '\n');

    // Examine each offset.
    for (ImmMapTy::const_iterator J = Imms.begin(), JE = Imms.end();
         J != JE; ++J) {
      const SCEV *OrigReg = J->second;

      int64_t JImm = J->first;
      const SmallBitVector &UsedByIndices = RegUses.getUsedByIndices(OrigReg);

      if (!isa<SCEVConstant>(OrigReg) &&
          UsedByIndicesMap[Reg].count() == 1) {
        DEBUG(dbgs() << "Skipping cross-use reuse for " << *OrigReg << '\n');
        continue;
      }

      // Conservatively examine offsets between this orig reg and a few
      // selected other orig regs.
      ImmMapTy::const_iterator OtherImms[] = {
        Imms.begin(), std::prev(Imms.end()),
        Imms.lower_bound((Imms.begin()->first + std::prev(Imms.end())->first) /
                         2)
      };
      for (size_t i = 0, e = array_lengthof(OtherImms); i != e; ++i) {
        ImmMapTy::const_iterator M = OtherImms[i];
        if (M == J || M == JE) continue;

        // Compute the difference between the two.
        int64_t Imm = (uint64_t)JImm - M->first;
        for (int LUIdx = UsedByIndices.find_first(); LUIdx != -1;
             LUIdx = UsedByIndices.find_next(LUIdx))
          // Make a memo of this use, offset, and register tuple.
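          // (Exposition: each (use, offset) pair is queued at most once.
          // E.g. if RegUses holds both {A} and {A+4}, they share the
          // stripped base A, and a WorkItem is queued so that formulae
          // referencing one register can be rewritten in terms of the other
          // plus the constant distance 4.)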
          if (UniqueItems.insert(std::make_pair(LUIdx, Imm)).second)
            WorkItems.push_back(WorkItem(LUIdx, Imm, OrigReg));
      }
    }
  }

  Map.clear();
  Sequence.clear();
  UsedByIndicesMap.clear();
  UniqueItems.clear();

  // Now iterate through the worklist and add new formulae.
  for (const WorkItem &WI : WorkItems) {
    size_t LUIdx = WI.LUIdx;
    LSRUse &LU = Uses[LUIdx];
    int64_t Imm = WI.Imm;
    const SCEV *OrigReg = WI.OrigReg;

    Type *IntTy = SE.getEffectiveSCEVType(OrigReg->getType());
    const SCEV *NegImmS = SE.getSCEV(ConstantInt::get(IntTy, -(uint64_t)Imm));
    unsigned BitWidth = SE.getTypeSizeInBits(IntTy);

    // TODO: Use a more targeted data structure.
    for (size_t L = 0, LE = LU.Formulae.size(); L != LE; ++L) {
      Formula F = LU.Formulae[L];
      // FIXME: The code for the scaled and unscaled registers looks
      // very similar but slightly different. Investigate if they
      // could be merged. That way, we would not have to unscale the
      // Formula.
      F.unscale();
      // Use the immediate in the scaled register.
      if (F.ScaledReg == OrigReg) {
        int64_t Offset = (uint64_t)F.BaseOffset + Imm * (uint64_t)F.Scale;
        // Don't create 50 + reg(-50).
        if (F.referencesReg(SE.getSCEV(
                   ConstantInt::get(IntTy, -(uint64_t)Offset))))
          continue;
        Formula NewF = F;
        NewF.BaseOffset = Offset;
        if (!isLegalUse(TTI, LU.MinOffset, LU.MaxOffset, LU.Kind, LU.AccessTy,
                        NewF))
          continue;
        NewF.ScaledReg = SE.getAddExpr(NegImmS, NewF.ScaledReg);

        // If the new scale is a constant in a register, and adding the
        // constant value to the immediate would produce a value closer to
        // zero than the immediate itself, then the formula isn't worthwhile.
        if (const SCEVConstant *C = dyn_cast<SCEVConstant>(NewF.ScaledReg))
          if (C->getValue()->isNegative() != (NewF.BaseOffset < 0) &&
              (C->getAPInt().abs() * APInt(BitWidth, F.Scale))
                  .ule(std::abs(NewF.BaseOffset)))
            continue;

        // OK, looks good.
        NewF.canonicalize();
        (void)InsertFormula(LU, LUIdx, NewF);
      } else {
        // Use the immediate in a base register.
        for (size_t N = 0, NE = F.BaseRegs.size(); N != NE; ++N) {
          const SCEV *BaseReg = F.BaseRegs[N];
          if (BaseReg != OrigReg)
            continue;
          Formula NewF = F;
          NewF.BaseOffset = (uint64_t)NewF.BaseOffset + Imm;
          if (!isLegalUse(TTI, LU.MinOffset, LU.MaxOffset,
                          LU.Kind, LU.AccessTy, NewF)) {
            if (!TTI.isLegalAddImmediate((uint64_t)NewF.UnfoldedOffset + Imm))
              continue;
            NewF = F;
            NewF.UnfoldedOffset = (uint64_t)NewF.UnfoldedOffset + Imm;
          }
          NewF.BaseRegs[N] = SE.getAddExpr(NegImmS, BaseReg);

          // If the new formula has a constant in a register, and adding the
          // constant value to the immediate would produce a value closer to
          // zero than the immediate itself, then the formula isn't worthwhile.
          for (const SCEV *NewReg : NewF.BaseRegs)
            if (const SCEVConstant *C = dyn_cast<SCEVConstant>(NewReg))
              if ((C->getAPInt() + NewF.BaseOffset)
                      .abs()
                      .slt(std::abs(NewF.BaseOffset)) &&
                  (C->getAPInt() + NewF.BaseOffset).countTrailingZeros() >=
                      countTrailingZeros<uint64_t>(NewF.BaseOffset))
                goto skip_formula;

          // Ok, looks good.
          NewF.canonicalize();
          (void)InsertFormula(LU, LUIdx, NewF);
          break;
        skip_formula:;
        }
      }
    }
  }
}

/// Generate formulae for each use.
void LSRInstance::GenerateAllReuseFormulae() {
  // This is split into multiple loops so that hasRegsUsedByUsesOtherThan
  // queries are more precise.
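  // (Exposition: later generators such as GenerateTruncates consult RegUses
  // via hasRegsUsedByUsesOtherThan, so running the earlier generators over
  // every use first gives those queries a more complete picture of which
  // registers are actually shared between uses.)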
  for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {
    LSRUse &LU = Uses[LUIdx];
    for (size_t i = 0, f = LU.Formulae.size(); i != f; ++i)
      GenerateReassociations(LU, LUIdx, LU.Formulae[i]);
    for (size_t i = 0, f = LU.Formulae.size(); i != f; ++i)
      GenerateCombinations(LU, LUIdx, LU.Formulae[i]);
  }
  for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {
    LSRUse &LU = Uses[LUIdx];
    for (size_t i = 0, f = LU.Formulae.size(); i != f; ++i)
      GenerateSymbolicOffsets(LU, LUIdx, LU.Formulae[i]);
    for (size_t i = 0, f = LU.Formulae.size(); i != f; ++i)
      GenerateConstantOffsets(LU, LUIdx, LU.Formulae[i]);
    for (size_t i = 0, f = LU.Formulae.size(); i != f; ++i)
      GenerateICmpZeroScales(LU, LUIdx, LU.Formulae[i]);
    for (size_t i = 0, f = LU.Formulae.size(); i != f; ++i)
      GenerateScales(LU, LUIdx, LU.Formulae[i]);
  }
  for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {
    LSRUse &LU = Uses[LUIdx];
    for (size_t i = 0, f = LU.Formulae.size(); i != f; ++i)
      GenerateTruncates(LU, LUIdx, LU.Formulae[i]);
  }

  GenerateCrossUseConstantOffsets();

  DEBUG(dbgs() << "\n"
                  "After generating reuse formulae:\n";
        print_uses(dbgs()));
}

/// If there are multiple formulae with the same set of registers used
/// by other uses, pick the best one and delete the others.
void LSRInstance::FilterOutUndesirableDedicatedRegisters() {
  DenseSet<const SCEV *> VisitedRegs;
  SmallPtrSet<const SCEV *, 16> Regs;
  SmallPtrSet<const SCEV *, 16> LoserRegs;
#ifndef NDEBUG
  bool ChangedFormulae = false;
#endif

  // Collect the best formula for each unique set of shared registers. This
  // is reset for each use.
  typedef DenseMap<SmallVector<const SCEV *, 4>, size_t, UniquifierDenseMapInfo>
    BestFormulaeTy;
  BestFormulaeTy BestFormulae;

  for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {
    LSRUse &LU = Uses[LUIdx];
    DEBUG(dbgs() << "Filtering for use "; LU.print(dbgs()); dbgs() << '\n');

    bool Any = false;
    for (size_t FIdx = 0, NumForms = LU.Formulae.size();
         FIdx != NumForms; ++FIdx) {
      Formula &F = LU.Formulae[FIdx];

      // Some formulas are instant losers. For example, they may depend on
      // nonexistent AddRecs from other loops. These need to be filtered
      // immediately, otherwise heuristics could choose them over others,
      // leading to an unsatisfactory solution. Passing LoserRegs into
      // RateFormula here avoids the need to recompute this information across
      // formulae using the same bad AddRec. Passing LoserRegs is also
      // essential unless we remove the corresponding bad register from the
      // Regs set.
      Cost CostF;
      Regs.clear();
      CostF.RateFormula(TTI, F, Regs, VisitedRegs, L, LU.Offsets, SE, DT, LU,
                        &LoserRegs);
      if (CostF.isLoser()) {
        // During initial formula generation, undesirable formulae are
        // generated by uses within other loops that have some non-trivial
        // address mode or use the postinc form of the IV. LSR needs to provide
        // these formulae as the basis of rediscovering the desired formula
        // that uses an AddRec corresponding to the existing phi. Once all
        // formulae have been generated, these initial losers may be pruned.
        DEBUG(dbgs() << "  Filtering loser "; F.print(dbgs());
              dbgs() << "\n");
      }
      else {
        SmallVector<const SCEV *, 4> Key;
        for (const SCEV *Reg : F.BaseRegs) {
          if (RegUses.isRegUsedByUsesOtherThan(Reg, LUIdx))
            Key.push_back(Reg);
        }
        if (F.ScaledReg &&
            RegUses.isRegUsedByUsesOtherThan(F.ScaledReg, LUIdx))
          Key.push_back(F.ScaledReg);
        // Unstable sort by host order ok, because this is only used for
        // uniquifying.
        std::sort(Key.begin(), Key.end());

        std::pair<BestFormulaeTy::const_iterator, bool> P =
          BestFormulae.insert(std::make_pair(Key, FIdx));
        if (P.second)
          continue;

        Formula &Best = LU.Formulae[P.first->second];

        Cost CostBest;
        Regs.clear();
        CostBest.RateFormula(TTI, Best, Regs, VisitedRegs, L, LU.Offsets, SE,
                             DT, LU);
        if (CostF < CostBest)
          std::swap(F, Best);
        DEBUG(dbgs() << "  Filtering out formula "; F.print(dbgs());
              dbgs() << "\n"
                        "    in favor of formula "; Best.print(dbgs());
              dbgs() << '\n');
      }
#ifndef NDEBUG
      ChangedFormulae = true;
#endif
      LU.DeleteFormula(F);
      --FIdx;
      --NumForms;
      Any = true;
    }

    // Now that we've filtered out some formulae, recompute the Regs set.
    if (Any)
      LU.RecomputeRegs(LUIdx, RegUses);

    // Reset this to prepare for the next use.
    BestFormulae.clear();
  }

  DEBUG(if (ChangedFormulae) {
          dbgs() << "\n"
                    "After filtering out undesirable candidates:\n";
          print_uses(dbgs());
        });
}

// This is a rough guess that seems to work fairly well.
static const size_t ComplexityLimit = UINT16_MAX;

/// Estimate the worst-case number of solutions the solver might have to
/// consider. It almost never considers this many solutions because it prunes
/// the search space, but the pruning isn't always sufficient.
size_t LSRInstance::EstimateSearchSpaceComplexity() const {
  size_t Power = 1;
  for (const LSRUse &LU : Uses) {
    size_t FSize = LU.Formulae.size();
    if (FSize >= ComplexityLimit) {
      Power = ComplexityLimit;
      break;
    }
    Power *= FSize;
    if (Power >= ComplexityLimit)
      break;
  }
  return Power;
}

/// When one formula uses a superset of the registers of another formula, it
/// won't help reduce register pressure (though it may not necessarily hurt
/// register pressure); remove it to simplify the system.
void LSRInstance::NarrowSearchSpaceByDetectingSupersets() {
  if (EstimateSearchSpaceComplexity() >= ComplexityLimit) {
    DEBUG(dbgs() << "The search space is too complex.\n");

    DEBUG(dbgs() << "Narrowing the search space by eliminating formulae "
                    "which use a superset of registers used by other "
                    "formulae.\n");

    for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {
      LSRUse &LU = Uses[LUIdx];
      bool Any = false;
      for (size_t i = 0, e = LU.Formulae.size(); i != e; ++i) {
        Formula &F = LU.Formulae[i];
        // Look for a formula with a constant or GV in a register. If the use
        // also has a formula with that same value in an immediate field,
        // delete the one that uses a register.
        for (SmallVectorImpl<const SCEV *>::const_iterator
             I = F.BaseRegs.begin(), E = F.BaseRegs.end(); I != E; ++I) {
          if (const SCEVConstant *C = dyn_cast<SCEVConstant>(*I)) {
            Formula NewF = F;
            NewF.BaseOffset += C->getValue()->getSExtValue();
            NewF.BaseRegs.erase(NewF.BaseRegs.begin() +
                                (I - F.BaseRegs.begin()));
            if (LU.HasFormulaWithSameRegs(NewF)) {
              DEBUG(dbgs() << "  Deleting "; F.print(dbgs()); dbgs() << '\n');
              LU.DeleteFormula(F);
              --i;
              --e;
              Any = true;
              break;
            }
          } else if (const SCEVUnknown *U = dyn_cast<SCEVUnknown>(*I)) {
            if (GlobalValue *GV = dyn_cast<GlobalValue>(U->getValue()))
              if (!F.BaseGV) {
                Formula NewF = F;
                NewF.BaseGV = GV;
                NewF.BaseRegs.erase(NewF.BaseRegs.begin() +
                                    (I - F.BaseRegs.begin()));
                if (LU.HasFormulaWithSameRegs(NewF)) {
                  DEBUG(dbgs() << "  Deleting "; F.print(dbgs());
                        dbgs() << '\n');
                  LU.DeleteFormula(F);
                  --i;
                  --e;
                  Any = true;
                  break;
                }
              }
          }
        }
      }
      if (Any)
        LU.RecomputeRegs(LUIdx, RegUses);
    }

    DEBUG(dbgs() << "After pre-selection:\n";
          print_uses(dbgs()));
  }
}

/// When there are many registers for expressions like A, A+1, A+2, etc.,
/// allocate a single register for them.
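/// (E.g. after 4x unrolling, the addresses A, A+1, A+2, and A+3 would
/// otherwise each claim their own register; folding the small constant
/// differences into the fixup offsets lets all four share the register
/// for A.)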
void LSRInstance::NarrowSearchSpaceByCollapsingUnrolledCode() {
  if (EstimateSearchSpaceComplexity() < ComplexityLimit)
    return;

  DEBUG(dbgs() << "The search space is too complex.\n"
                  "Narrowing the search space by assuming that uses separated "
                  "by a constant offset will use the same registers.\n");

  // This is especially useful for unrolled loops.

  for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {
    LSRUse &LU = Uses[LUIdx];
    for (const Formula &F : LU.Formulae) {
      if (F.BaseOffset == 0 || (F.Scale != 0 && F.Scale != 1))
        continue;

      LSRUse *LUThatHas = FindUseWithSimilarFormula(F, LU);
      if (!LUThatHas)
        continue;

      if (!reconcileNewOffset(*LUThatHas, F.BaseOffset, /*HasBaseReg=*/ false,
                              LU.Kind, LU.AccessTy))
        continue;

      DEBUG(dbgs() << "  Deleting use "; LU.print(dbgs()); dbgs() << '\n');

      LUThatHas->AllFixupsOutsideLoop &= LU.AllFixupsOutsideLoop;

      // Update the relocs to reference the new use.
      for (LSRFixup &Fixup : Fixups) {
        if (Fixup.LUIdx == LUIdx) {
          Fixup.LUIdx = LUThatHas - &Uses.front();
          Fixup.Offset += F.BaseOffset;
          // Add the new offset to LUThatHas' offset list.
          if (LUThatHas->Offsets.back() != Fixup.Offset) {
            LUThatHas->Offsets.push_back(Fixup.Offset);
            if (Fixup.Offset > LUThatHas->MaxOffset)
              LUThatHas->MaxOffset = Fixup.Offset;
            if (Fixup.Offset < LUThatHas->MinOffset)
              LUThatHas->MinOffset = Fixup.Offset;
          }
          DEBUG(dbgs() << "New fixup has offset " << Fixup.Offset << '\n');
        }
        if (Fixup.LUIdx == NumUses-1)
          Fixup.LUIdx = LUIdx;
      }

      // Delete formulae from the new use which are no longer legal.
      bool Any = false;
      for (size_t i = 0, e = LUThatHas->Formulae.size(); i != e; ++i) {
        Formula &F = LUThatHas->Formulae[i];
        if (!isLegalUse(TTI, LUThatHas->MinOffset, LUThatHas->MaxOffset,
                        LUThatHas->Kind, LUThatHas->AccessTy, F)) {
          DEBUG(dbgs() << "  Deleting "; F.print(dbgs());
                dbgs() << '\n');
          LUThatHas->DeleteFormula(F);
          --i;
          --e;
          Any = true;
        }
      }

      if (Any)
        LUThatHas->RecomputeRegs(LUThatHas - &Uses.front(), RegUses);

      // Delete the old use.
      DeleteUse(LU, LUIdx);
      --LUIdx;
      --NumUses;
      break;
    }
  }

  DEBUG(dbgs() << "After pre-selection:\n"; print_uses(dbgs()));
}

/// Call FilterOutUndesirableDedicatedRegisters again, if necessary, now that
/// we've done more filtering, as it may be able to find more formulae to
/// eliminate.
void LSRInstance::NarrowSearchSpaceByRefilteringUndesirableDedicatedRegisters(){
  if (EstimateSearchSpaceComplexity() >= ComplexityLimit) {
    DEBUG(dbgs() << "The search space is too complex.\n");

    DEBUG(dbgs() << "Narrowing the search space by re-filtering out "
                    "undesirable dedicated registers.\n");

    FilterOutUndesirableDedicatedRegisters();

    DEBUG(dbgs() << "After pre-selection:\n";
          print_uses(dbgs()));
  }
}

/// Pick a register which seems likely to be profitable, and then in any use
/// which has any reference to that register, delete all formulae which do not
/// reference that register.
void LSRInstance::NarrowSearchSpaceByPickingWinnerRegs() {
  // With all other options exhausted, loop until the system is simple
  // enough to handle.
  SmallPtrSet<const SCEV *, 4> Taken;
  while (EstimateSearchSpaceComplexity() >= ComplexityLimit) {
    // Ok, we have too many formulae on our hands to conveniently handle.
    // Use a rough heuristic to thin out the list.
    DEBUG(dbgs() << "The search space is too complex.\n");

    // Pick the register which is used by the most LSRUses, which is likely
    // to be a good reuse register candidate.
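    // (Exposition: "most used" is measured below with
    // RegUses.getUsedByIndices(Reg).count(), i.e. the number of distinct
    // uses whose formulae mention Reg; committing to the most shared
    // register first tends to prune the largest part of the remaining
    // search space.)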
    const SCEV *Best = nullptr;
    unsigned BestNum = 0;
    for (const SCEV *Reg : RegUses) {
      if (Taken.count(Reg))
        continue;
      if (!Best)
        Best = Reg;
      else {
        unsigned Count = RegUses.getUsedByIndices(Reg).count();
        if (Count > BestNum) {
          Best = Reg;
          BestNum = Count;
        }
      }
    }

    DEBUG(dbgs() << "Narrowing the search space by assuming " << *Best
                 << " will yield profitable reuse.\n");
    Taken.insert(Best);

    // In any use with formulae which reference this register, delete formulae
    // which don't reference it.
    for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx) {
      LSRUse &LU = Uses[LUIdx];
      if (!LU.Regs.count(Best)) continue;

      bool Any = false;
      for (size_t i = 0, e = LU.Formulae.size(); i != e; ++i) {
        Formula &F = LU.Formulae[i];
        if (!F.referencesReg(Best)) {
          DEBUG(dbgs() << "  Deleting "; F.print(dbgs()); dbgs() << '\n');
          LU.DeleteFormula(F);
          --e;
          --i;
          Any = true;
          assert(e != 0 && "Use has no formulae left! Is Regs inconsistent?");
          continue;
        }
      }

      if (Any)
        LU.RecomputeRegs(LUIdx, RegUses);
    }

    DEBUG(dbgs() << "After pre-selection:\n";
          print_uses(dbgs()));
  }
}

/// If there are an extraordinary number of formulae to choose from, use some
/// rough heuristics to prune down the number of formulae. This keeps the main
/// solver from taking an extraordinary amount of time in some worst-case
/// scenarios.
void LSRInstance::NarrowSearchSpaceUsingHeuristics() {
  NarrowSearchSpaceByDetectingSupersets();
  NarrowSearchSpaceByCollapsingUnrolledCode();
  NarrowSearchSpaceByRefilteringUndesirableDedicatedRegisters();
  NarrowSearchSpaceByPickingWinnerRegs();
}

/// This is the recursive solver.
void LSRInstance::SolveRecurse(SmallVectorImpl<const Formula *> &Solution,
                               Cost &SolutionCost,
                               SmallVectorImpl<const Formula *> &Workspace,
                               const Cost &CurCost,
                               const SmallPtrSet<const SCEV *, 16> &CurRegs,
                               DenseSet<const SCEV *> &VisitedRegs) const {
  // Some ideas:
  //  - prune more:
  //    - use more aggressive filtering
  //    - sort the formulae so that the most profitable solutions are found
  //      first
  //    - sort the uses too
  //  - search faster:
  //    - don't compute a cost, and then compare. compare while computing a
  //      cost and bail early.
  //    - track register sets with SmallBitVector

  const LSRUse &LU = Uses[Workspace.size()];

  // If this use references any register that's already a part of the
  // in-progress solution, consider it a requirement that a formula must
  // reference that register in order to be considered. This prunes out
  // unprofitable searching.
  SmallSetVector<const SCEV *, 4> ReqRegs;
  for (const SCEV *S : CurRegs)
    if (LU.Regs.count(S))
      ReqRegs.insert(S);

  SmallPtrSet<const SCEV *, 16> NewRegs;
  Cost NewCost;
  for (const Formula &F : LU.Formulae) {
    // Ignore formulae which may not be ideal in terms of register reuse of
    // ReqRegs. The formula should use all required registers before
    // introducing new ones.
    int NumReqRegsToFind = std::min(F.getNumRegs(), ReqRegs.size());
    for (const SCEV *Reg : ReqRegs) {
      if ((F.ScaledReg && F.ScaledReg == Reg) ||
          std::find(F.BaseRegs.begin(), F.BaseRegs.end(), Reg) !=
          F.BaseRegs.end()) {
        --NumReqRegsToFind;
        if (NumReqRegsToFind == 0)
          break;
      }
    }
    if (NumReqRegsToFind != 0) {
      // If none of the formulae satisfied the required registers, then we
      // could clear ReqRegs and try again. Currently, we simply give up in
      // this case.
      continue;
    }

    // Evaluate the cost of the current formula. If it's already worse than
    // the current best, prune the search at that point.
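    // (Exposition: this makes the solver a branch-and-bound search. CurCost
    // is the cost of the partial assignment built so far; any formula whose
    // cumulative cost already reaches SolutionCost has its entire subtree
    // skipped.)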
    NewCost = CurCost;
    NewRegs = CurRegs;
    NewCost.RateFormula(TTI, F, NewRegs, VisitedRegs, L, LU.Offsets, SE, DT,
                        LU);
    if (NewCost < SolutionCost) {
      Workspace.push_back(&F);
      if (Workspace.size() != Uses.size()) {
        SolveRecurse(Solution, SolutionCost, Workspace, NewCost, NewRegs,
                     VisitedRegs);
        if (F.getNumRegs() == 1 && Workspace.size() == 1)
          VisitedRegs.insert(F.ScaledReg ? F.ScaledReg : F.BaseRegs[0]);
      } else {
        DEBUG(dbgs() << "New best at "; NewCost.print(dbgs());
              dbgs() << ".\n Regs:";
              for (const SCEV *S : NewRegs)
                dbgs() << ' ' << *S;
              dbgs() << '\n');

        SolutionCost = NewCost;
        Solution = Workspace;
      }
      Workspace.pop_back();
    }
  }
}

/// Choose one formula from each use. Return the results in the given Solution
/// vector.
void LSRInstance::Solve(SmallVectorImpl<const Formula *> &Solution) const {
  SmallVector<const Formula *, 8> Workspace;
  Cost SolutionCost;
  SolutionCost.Lose();
  Cost CurCost;
  SmallPtrSet<const SCEV *, 16> CurRegs;
  DenseSet<const SCEV *> VisitedRegs;
  Workspace.reserve(Uses.size());

  // SolveRecurse does all the work.
  SolveRecurse(Solution, SolutionCost, Workspace, CurCost, CurRegs,
               VisitedRegs);
  if (Solution.empty()) {
    DEBUG(dbgs() << "\nNo Satisfactory Solution\n");
    return;
  }

  // Ok, we've now made all our decisions.
  DEBUG(dbgs() << "\n"
                  "The chosen solution requires "; SolutionCost.print(dbgs());
        dbgs() << ":\n";
        for (size_t i = 0, e = Uses.size(); i != e; ++i) {
          dbgs() << "  ";
          Uses[i].print(dbgs());
          dbgs() << "\n"
                    "    ";
          Solution[i]->print(dbgs());
          dbgs() << '\n';
        });

  assert(Solution.size() == Uses.size() && "Malformed solution!");
}

/// Helper for AdjustInsertPositionForExpand. Climb up the dominator tree as
/// far as we can go while still being dominated by the input positions. This
/// helps canonicalize the insert position, which encourages sharing.
BasicBlock::iterator
LSRInstance::HoistInsertPosition(BasicBlock::iterator IP,
                                 const SmallVectorImpl<Instruction *> &Inputs)
                                                                         const {
  for (;;) {
    const Loop *IPLoop = LI.getLoopFor(IP->getParent());
    unsigned IPLoopDepth = IPLoop ? IPLoop->getLoopDepth() : 0;

    BasicBlock *IDom;
    for (DomTreeNode *Rung = DT.getNode(IP->getParent()); ; ) {
      if (!Rung) return IP;
      Rung = Rung->getIDom();
      if (!Rung) return IP;
      IDom = Rung->getBlock();

      // Don't climb into a loop though.
      const Loop *IDomLoop = LI.getLoopFor(IDom);
      unsigned IDomDepth = IDomLoop ? IDomLoop->getLoopDepth() : 0;
      if (IDomDepth <= IPLoopDepth &&
          (IDomDepth != IPLoopDepth || IDomLoop == IPLoop))
        break;
    }

    bool AllDominate = true;
    Instruction *BetterPos = nullptr;
    Instruction *Tentative = IDom->getTerminator();
    for (Instruction *Inst : Inputs) {
      if (Inst == Tentative || !DT.dominates(Inst, Tentative)) {
        AllDominate = false;
        break;
      }
      // Attempt to find an insert position in the middle of the block,
      // instead of at the end, so that it can be used for other expansions.
      if (IDom == Inst->getParent() &&
          (!BetterPos || !DT.dominates(Inst, BetterPos)))
        BetterPos = &*std::next(BasicBlock::iterator(Inst));
    }
    if (!AllDominate)
      break;
    if (BetterPos)
      IP = BetterPos->getIterator();
    else
      IP = Tentative->getIterator();
  }

  return IP;
}

/// Determine an input position which will be dominated by the operands and
/// which will dominate the result.
BasicBlock::iterator
LSRInstance::AdjustInsertPositionForExpand(BasicBlock::iterator LowestIP,
                                           const LSRFixup &LF,
                                           const LSRUse &LU,
                                           SCEVExpander &Rewriter) const {
  // Collect some instructions which must be dominated by the
  // expanding replacement. These must be dominated by any operands that
  // will be required in the expansion.
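  // (Exposition: the expansion point must sit below everything it reads and
  // above the user it feeds. The Inputs list gathered below captures the
  // "below" constraints; HoistInsertPosition then walks the dominator tree
  // upward to the highest point that still satisfies all of them.)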
  SmallVector<Instruction *, 4> Inputs;
  if (Instruction *I = dyn_cast<Instruction>(LF.OperandValToReplace))
    Inputs.push_back(I);
  if (LU.Kind == LSRUse::ICmpZero)
    if (Instruction *I =
          dyn_cast<Instruction>(cast<ICmpInst>(LF.UserInst)->getOperand(1)))
      Inputs.push_back(I);
  if (LF.PostIncLoops.count(L)) {
    if (LF.isUseFullyOutsideLoop(L))
      Inputs.push_back(L->getLoopLatch()->getTerminator());
    else
      Inputs.push_back(IVIncInsertPos);
  }
  // The expansion must also be dominated by the increment positions of any
  // loops for which it is using post-inc mode.
  for (const Loop *PIL : LF.PostIncLoops) {
    if (PIL == L) continue;

    // Be dominated by the loop exit.
    SmallVector<BasicBlock *, 4> ExitingBlocks;
    PIL->getExitingBlocks(ExitingBlocks);
    if (!ExitingBlocks.empty()) {
      BasicBlock *BB = ExitingBlocks[0];
      for (unsigned i = 1, e = ExitingBlocks.size(); i != e; ++i)
        BB = DT.findNearestCommonDominator(BB, ExitingBlocks[i]);
      Inputs.push_back(BB->getTerminator());
    }
  }

  assert(!isa<PHINode>(LowestIP) && !LowestIP->isEHPad()
         && !isa<DbgInfoIntrinsic>(LowestIP) &&
         "Insertion point must be a normal instruction");

  // Then, climb up the immediate dominator tree as far as we can go while
  // still being dominated by the input positions.
  BasicBlock::iterator IP = HoistInsertPosition(LowestIP, Inputs);

  // Don't insert instructions before PHI nodes.
  while (isa<PHINode>(IP)) ++IP;

  // Ignore landingpad instructions.
  while (!isa<TerminatorInst>(IP) && IP->isEHPad()) ++IP;

  // Ignore debug intrinsics.
  while (isa<DbgInfoIntrinsic>(IP)) ++IP;

  // Set IP below instructions recently inserted by SCEVExpander. This keeps
  // the IP consistent across expansions and allows the previously inserted
  // instructions to be reused by subsequent expansion.
  while (Rewriter.isInsertedInstruction(&*IP) && IP != LowestIP) ++IP;

  return IP;
}

/// Emit instructions for the leading candidate expression for this LSRUse
/// (this is called "expanding").
Value *LSRInstance::Expand(const LSRFixup &LF,
                           const Formula &F,
                           BasicBlock::iterator IP,
                           SCEVExpander &Rewriter,
                           SmallVectorImpl<WeakVH> &DeadInsts) const {
  const LSRUse &LU = Uses[LF.LUIdx];
  if (LU.RigidFormula)
    return LF.OperandValToReplace;

  // Determine an input position which will be dominated by the operands and
  // which will dominate the result.
  IP = AdjustInsertPositionForExpand(IP, LF, LU, Rewriter);

  // Inform the Rewriter if we have a post-increment use, so that it can
  // perform an advantageous expansion.
  Rewriter.setPostInc(LF.PostIncLoops);

  // This is the type that the user actually needs.
  Type *OpTy = LF.OperandValToReplace->getType();
  // This will be the type that we'll initially expand to.
  Type *Ty = F.getType();
  if (!Ty)
    // No type known; just expand directly to the ultimate type.
    Ty = OpTy;
  else if (SE.getEffectiveSCEVType(Ty) == SE.getEffectiveSCEVType(OpTy))
    // Expand directly to the ultimate type if it's the right size.
    Ty = OpTy;
  // This is the type to do integer arithmetic in.
  Type *IntTy = SE.getEffectiveSCEVType(Ty);

  // Build up a list of operands to add together to form the full base.
  SmallVector<const SCEV *, 8> Ops;

  // Expand the BaseRegs portion.
  for (const SCEV *Reg : F.BaseRegs) {
    assert(!Reg->isZero() && "Zero allocated in a base register!");

    // If we're expanding for a post-inc user, make the post-inc adjustment.
    PostIncLoopSet &Loops = const_cast<PostIncLoopSet &>(LF.PostIncLoops);
    Reg = TransformForPostIncUse(Denormalize, Reg,
                                 LF.UserInst, LF.OperandValToReplace,
                                 Loops, SE, DT);

    Ops.push_back(SE.getUnknown(Rewriter.expandCodeFor(Reg, nullptr, &*IP)));
  }

  // Expand the ScaledReg portion.
  Value *ICmpScaledV = nullptr;
  if (F.Scale != 0) {
    const SCEV *ScaledS = F.ScaledReg;

    // If we're expanding for a post-inc user, make the post-inc adjustment.
    PostIncLoopSet &Loops = const_cast<PostIncLoopSet &>(LF.PostIncLoops);
    ScaledS = TransformForPostIncUse(Denormalize, ScaledS,
                                     LF.UserInst, LF.OperandValToReplace,
                                     Loops, SE, DT);

    if (LU.Kind == LSRUse::ICmpZero) {
      // Expand ScaledReg as if it was part of the base regs.
      if (F.Scale == 1)
        Ops.push_back(
            SE.getUnknown(Rewriter.expandCodeFor(ScaledS, nullptr, &*IP)));
      else {
        // An interesting way of "folding" with an icmp is to use a negated
        // scale, which we'll implement by inserting it into the other operand
        // of the icmp.
        assert(F.Scale == -1 &&
               "The only scale supported by ICmpZero uses is -1!");
        ICmpScaledV = Rewriter.expandCodeFor(ScaledS, nullptr, &*IP);
      }
    } else {
      // Otherwise just expand the scaled register and an explicit scale,
      // which is expected to be matched as part of the address.

      // Flush the operand list to suppress SCEVExpander hoisting address
      // modes. Unless the addressing mode will not be folded.
      if (!Ops.empty() && LU.Kind == LSRUse::Address &&
          isAMCompletelyFolded(TTI, LU, F)) {
        Value *FullV = Rewriter.expandCodeFor(SE.getAddExpr(Ops), Ty, &*IP);
        Ops.clear();
        Ops.push_back(SE.getUnknown(FullV));
      }
      ScaledS = SE.getUnknown(Rewriter.expandCodeFor(ScaledS, nullptr, &*IP));
      if (F.Scale != 1)
        ScaledS =
            SE.getMulExpr(ScaledS, SE.getConstant(ScaledS->getType(), F.Scale));
      Ops.push_back(ScaledS);
    }
  }

  // Expand the GV portion.
  if (F.BaseGV) {
    // Flush the operand list to suppress SCEVExpander hoisting.
    if (!Ops.empty()) {
      Value *FullV = Rewriter.expandCodeFor(SE.getAddExpr(Ops), Ty, &*IP);
      Ops.clear();
      Ops.push_back(SE.getUnknown(FullV));
    }
    Ops.push_back(SE.getUnknown(F.BaseGV));
  }

  // Flush the operand list to suppress SCEVExpander hoisting of both folded
  // and unfolded offsets. LSR assumes they both live next to their uses.
  if (!Ops.empty()) {
    Value *FullV = Rewriter.expandCodeFor(SE.getAddExpr(Ops), Ty, &*IP);
    Ops.clear();
    Ops.push_back(SE.getUnknown(FullV));
  }

  // Expand the immediate portion.
  int64_t Offset = (uint64_t)F.BaseOffset + LF.Offset;
  if (Offset != 0) {
    if (LU.Kind == LSRUse::ICmpZero) {
      // The other interesting way of "folding" with an ICmpZero is to use a
      // negated immediate.
      if (!ICmpScaledV)
        ICmpScaledV = ConstantInt::get(IntTy, -(uint64_t)Offset);
      else {
        Ops.push_back(SE.getUnknown(ICmpScaledV));
        ICmpScaledV = ConstantInt::get(IntTy, Offset);
      }
    } else {
      // Just add the immediate values. These again are expected to be matched
      // as part of the address.
      Ops.push_back(SE.getUnknown(ConstantInt::getSigned(IntTy, Offset)));
    }
  }

  // Expand the unfolded offset portion.
  int64_t UnfoldedOffset = F.UnfoldedOffset;
  if (UnfoldedOffset != 0) {
    // Just add the immediate values.
    Ops.push_back(SE.getUnknown(ConstantInt::getSigned(IntTy,
                                                       UnfoldedOffset)));
  }

  // Emit instructions summing all the operands.
  const SCEV *FullS = Ops.empty() ?
                      SE.getConstant(IntTy, 0) :
                      SE.getAddExpr(Ops);
  Value *FullV = Rewriter.expandCodeFor(FullS, Ty, &*IP);

  // We're done expanding now, so reset the rewriter.
  Rewriter.clearPostInc();

  // An ICmpZero Formula represents an ICmp which we're handling as a
  // comparison against zero. Now that we've expanded an expression for that
  // form, update the ICmp's other operand.
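  // (Exposition: an ICmpZero use models a comparison of an expression
  // against zero, e.g. an exit test of the rough form "icmp eq i64 %lsr.iv,
  // 0"; the negated immediate or negated-scale value prepared above becomes
  // the icmp's second operand below.)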
  if (LU.Kind == LSRUse::ICmpZero) {
    ICmpInst *CI = cast<ICmpInst>(LF.UserInst);
    DeadInsts.emplace_back(CI->getOperand(1));
    assert(!F.BaseGV && "ICmp does not support folding a global value and "
                        "a scale at the same time!");
    if (F.Scale == -1) {
      if (ICmpScaledV->getType() != OpTy) {
        Instruction *Cast =
          CastInst::Create(CastInst::getCastOpcode(ICmpScaledV, false,
                                                   OpTy, false),
                           ICmpScaledV, OpTy, "tmp", CI);
        ICmpScaledV = Cast;
      }
      CI->setOperand(1, ICmpScaledV);
    } else {
      // A scale of 1 means that the scale has been expanded as part of the
      // base regs.
      assert((F.Scale == 0 || F.Scale == 1) &&
             "ICmp does not support folding a global value and "
             "a scale at the same time!");
      Constant *C = ConstantInt::getSigned(SE.getEffectiveSCEVType(OpTy),
                                           -(uint64_t)Offset);
      if (C->getType() != OpTy)
        C = ConstantExpr::getCast(CastInst::getCastOpcode(C, false,
                                                          OpTy, false),
                                  C, OpTy);

      CI->setOperand(1, C);
    }
  }

  return FullV;
}

/// Helper for Rewrite. PHI nodes are special because the use of their operands
/// effectively happens in their predecessor blocks, so the expression may need
/// to be expanded in multiple places.
void LSRInstance::RewriteForPHI(PHINode *PN,
                                const LSRFixup &LF,
                                const Formula &F,
                                SCEVExpander &Rewriter,
                                SmallVectorImpl<WeakVH> &DeadInsts) const {
  DenseMap<BasicBlock *, Value *> Inserted;
  for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i)
    if (PN->getIncomingValue(i) == LF.OperandValToReplace) {
      BasicBlock *BB = PN->getIncomingBlock(i);

      // If this is a critical edge, split the edge so that we do not insert
      // the code on all predecessor/successor paths.  We do this unless this
      // is the canonical backedge for this loop, which complicates post-inc
      // users.
      if (e != 1 && BB->getTerminator()->getNumSuccessors() > 1 &&
          !isa<IndirectBrInst>(BB->getTerminator())) {
        BasicBlock *Parent = PN->getParent();
        Loop *PNLoop = LI.getLoopFor(Parent);
        if (!PNLoop || Parent != PNLoop->getHeader()) {
          // Split the critical edge.
          BasicBlock *NewBB = nullptr;
          if (!Parent->isLandingPad()) {
            NewBB = SplitCriticalEdge(BB, Parent,
                                      CriticalEdgeSplittingOptions(&DT, &LI)
                                          .setMergeIdenticalEdges()
                                          .setDontDeleteUselessPHIs());
          } else {
            SmallVector<BasicBlock*, 2> NewBBs;
            SplitLandingPadPredecessors(Parent, BB, "", "", NewBBs, &DT, &LI);
            NewBB = NewBBs[0];
          }
          // If NewBB==NULL, then SplitCriticalEdge refused to split because
          // all phi predecessors are identical. The simple thing to do is
          // skip splitting in this case rather than complicate the API.
          if (NewBB) {
            // If PN is outside of the loop and BB is in the loop, we want to
            // move the block to be immediately before the PHI block, not
            // immediately after BB.
            if (L->contains(BB) && !L->contains(PN))
              NewBB->moveBefore(PN->getParent());

            // Splitting the edge can reduce the number of PHI entries we have.
            e = PN->getNumIncomingValues();
            BB = NewBB;
            i = PN->getBasicBlockIndex(BB);
          }
        }
      }

      std::pair<DenseMap<BasicBlock *, Value *>::iterator, bool> Pair =
        Inserted.insert(std::make_pair(BB, static_cast<Value *>(nullptr)));
      if (!Pair.second)
        PN->setIncomingValue(i, Pair.first->second);
      else {
        Value *FullV = Expand(LF, F, BB->getTerminator()->getIterator(),
                              Rewriter, DeadInsts);

        // If this is reuse-by-noop-cast, insert the noop cast.
        Type *OpTy = LF.OperandValToReplace->getType();
        if (FullV->getType() != OpTy)
          FullV =
            CastInst::Create(CastInst::getCastOpcode(FullV, false,
                                                     OpTy, false),
                             FullV, LF.OperandValToReplace->getType(),
                             "tmp", BB->getTerminator());

        PN->setIncomingValue(i, FullV);
        Pair.first->second = FullV;
      }
    }
}

/// Emit instructions for the leading candidate expression for this LSRUse
/// (this is called "expanding"), and update the UserInst to reference the
/// newly expanded value.
void LSRInstance::Rewrite(const LSRFixup &LF,
                          const Formula &F,
                          SCEVExpander &Rewriter,
                          SmallVectorImpl<WeakVH> &DeadInsts) const {
  // First, find an insertion point that dominates UserInst. For PHI nodes,
  // find the nearest block which dominates all the relevant uses.
  if (PHINode *PN = dyn_cast<PHINode>(LF.UserInst)) {
    RewriteForPHI(PN, LF, F, Rewriter, DeadInsts);
  } else {
    Value *FullV =
      Expand(LF, F, LF.UserInst->getIterator(), Rewriter, DeadInsts);

    // If this is reuse-by-noop-cast, insert the noop cast.
    Type *OpTy = LF.OperandValToReplace->getType();
    if (FullV->getType() != OpTy) {
      Instruction *Cast =
        CastInst::Create(CastInst::getCastOpcode(FullV, false, OpTy, false),
                         FullV, OpTy, "tmp", LF.UserInst);
      FullV = Cast;
    }

    // Update the user. ICmpZero is handled specially here (for now) because
    // Expand may have updated one of the operands of the icmp already, and
    // its new value may happen to be equal to LF.OperandValToReplace, in
    // which case doing replaceUsesOfWith leads to replacing both operands
    // with the same value. TODO: Reorganize this.
    if (Uses[LF.LUIdx].Kind == LSRUse::ICmpZero)
      LF.UserInst->setOperand(0, FullV);
    else
      LF.UserInst->replaceUsesOfWith(LF.OperandValToReplace, FullV);
  }

  DeadInsts.emplace_back(LF.OperandValToReplace);
}

/// Rewrite all the fixup locations with new values, following the chosen
/// solution.
void LSRInstance::ImplementSolution(
    const SmallVectorImpl<const Formula *> &Solution) {
  // Keep track of instructions we may have made dead, so that
  // we can remove them after we are done working.
  SmallVector<WeakVH, 16> DeadInsts;

  SCEVExpander Rewriter(SE, L->getHeader()->getModule()->getDataLayout(),
                        "lsr");
#ifndef NDEBUG
  Rewriter.setDebugType(DEBUG_TYPE);
#endif
  Rewriter.disableCanonicalMode();
  Rewriter.enableLSRMode();
  Rewriter.setIVIncInsertPos(L, IVIncInsertPos);

  // Mark phi nodes that terminate chains so the expander tries to reuse them.
  for (const IVChain &Chain : IVChainVec) {
    if (PHINode *PN = dyn_cast<PHINode>(Chain.tailUserInst()))
      Rewriter.setChainedPhi(PN);
  }

  // Expand the new value definitions and update the users.
  for (const LSRFixup &Fixup : Fixups) {
    Rewrite(Fixup, *Solution[Fixup.LUIdx], Rewriter, DeadInsts);
    Changed = true;
  }

  for (const IVChain &Chain : IVChainVec) {
    GenerateIVChain(Chain, Rewriter, DeadInsts);
    Changed = true;
  }

  // Clean up after ourselves. This must be done before deleting any
  // instructions.
  Rewriter.clear();

  Changed |= DeleteTriviallyDeadInstructions(DeadInsts);
}

LSRInstance::LSRInstance(Loop *L, IVUsers &IU, ScalarEvolution &SE,
                         DominatorTree &DT, LoopInfo &LI,
                         const TargetTransformInfo &TTI)
    : IU(IU), SE(SE), DT(DT), LI(LI), TTI(TTI), L(L), Changed(false),
      IVIncInsertPos(nullptr) {
  // If LoopSimplify form is not available, stay out of trouble.
  if (!L->isLoopSimplifyForm())
    return;

  // If there's no interesting work to be done, bail early.
  if (IU.empty()) return;

  // If there's too much analysis to be done, bail early. We won't be able to
  // model the problem anyway.
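  // (Exposition: MaxIVUsers caps how many IV users one LSRInstance will
  // model; beyond it, the formula-generation and solving phases would grow
  // combinatorially, so the loop is left untransformed.)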
  unsigned NumUsers = 0;
  for (const IVStrideUse &U : IU) {
    if (++NumUsers > MaxIVUsers) {
      (void)U;
      DEBUG(dbgs() << "LSR skipping loop, too many IV Users in " << U
            << "\n");
      return;
    }
+    // Bail out if we have a PHI on an EHPad that gets a value from a
+    // CatchSwitchInst. Because the CatchSwitchInst cannot be split, there is
+    // no good place to stick any instructions.
+    if (auto *PN = dyn_cast<PHINode>(U.getUser())) {
+      auto *FirstNonPHI = PN->getParent()->getFirstNonPHI();
+      if (isa<FuncletPadInst>(FirstNonPHI) ||
+          isa<CatchSwitchInst>(FirstNonPHI))
+        for (BasicBlock *PredBB : PN->blocks())
+          if (isa<CatchSwitchInst>(PredBB->getFirstNonPHI()))
+            return;
+    }
  }

#ifndef NDEBUG
  // All dominating loops must have preheaders, or SCEVExpander may not be able
  // to materialize an AddRecExpr whose Start is an outer AddRecExpr.
  //
  // IVUsers analysis should only create users that are dominated by simple
  // loop headers. Since this loop should dominate all of its users, its user
  // list should be empty if this loop itself is not within a simple loop nest.
  for (DomTreeNode *Rung = DT.getNode(L->getLoopPreheader());
       Rung; Rung = Rung->getIDom()) {
    BasicBlock *BB = Rung->getBlock();
    const Loop *DomLoop = LI.getLoopFor(BB);
    if (DomLoop && DomLoop->getHeader() == BB) {
      assert(DomLoop->getLoopPreheader() && "LSR needs a simplified loop nest");
    }
  }
#endif // DEBUG

  DEBUG(dbgs() << "\nLSR on loop ";
        L->getHeader()->printAsOperand(dbgs(), /*PrintType=*/false);
        dbgs() << ":\n");

  // First, perform some low-level loop optimizations.
  OptimizeShadowIV();
  OptimizeLoopTermCond();

  // If loop preparation eliminates all interesting IV users, bail.
  if (IU.empty()) return;

  // Skip nested loops until we can model them better with formulae.
  if (!L->empty()) {
    DEBUG(dbgs() << "LSR skipping outer loop " << *L << "\n");
    return;
  }

  // Start collecting data and preparing for the solver.
  CollectChains();
  CollectInterestingTypesAndFactors();
  CollectFixupsAndInitialFormulae();
  CollectLoopInvariantFixupsAndFormulae();

  assert(!Uses.empty() && "IVUsers reported at least one use");
  DEBUG(dbgs() << "LSR found " << Uses.size() << " uses:\n";
        print_uses(dbgs()));

  // Now use the reuse data to generate a bunch of interesting ways
  // to formulate the values needed for the uses.
  GenerateAllReuseFormulae();

  FilterOutUndesirableDedicatedRegisters();
  NarrowSearchSpaceUsingHeuristics();

  SmallVector<const Formula *, 8> Solution;
  Solve(Solution);

  // Release memory that is no longer needed.
  Factors.clear();
  Types.clear();
  RegUses.clear();

  if (Solution.empty())
    return;

#ifndef NDEBUG
  // Formulae should be legal.
  for (const LSRUse &LU : Uses) {
    for (const Formula &F : LU.Formulae)
      assert(isLegalUse(TTI, LU.MinOffset, LU.MaxOffset, LU.Kind, LU.AccessTy,
                        F) && "Illegal formula generated!");
  };
#endif

  // Now that we've decided what we want, make it so.
  ImplementSolution(Solution);
}

void LSRInstance::print_factors_and_types(raw_ostream &OS) const {
  if (Factors.empty() && Types.empty()) return;

  OS << "LSR has identified the following interesting factors and types: ";
  bool First = true;

  for (int64_t Factor : Factors) {
    if (!First) OS << ", ";
    First = false;
    OS << '*' << Factor;
  }

  for (Type *Ty : Types) {
    if (!First) OS << ", ";
    First = false;
    OS << '(' << *Ty << ')';
  }
  OS << '\n';
}

void LSRInstance::print_fixups(raw_ostream &OS) const {
  OS << "LSR is examining the following fixup sites:\n";
  for (const LSRFixup &LF : Fixups) {
    dbgs() << "  ";
    LF.print(OS);
    OS << '\n';
  }
}

void LSRInstance::print_uses(raw_ostream &OS) const {
  OS << "LSR is examining the following uses:\n";
  for (const LSRUse &LU : Uses) {
    dbgs() << "  ";
    LU.print(OS);
    OS << '\n';
    for (const Formula &F : LU.Formulae) {
      OS << "    ";
      F.print(OS);
      OS << '\n';
    }
  }
}

void LSRInstance::print(raw_ostream &OS) const {
  print_factors_and_types(OS);
  print_fixups(OS);
  print_uses(OS);
}

LLVM_DUMP_METHOD
void LSRInstance::dump() const {
  print(errs()); errs() << '\n';
}

namespace {

class LoopStrengthReduce : public LoopPass {
public:
  static char ID; // Pass ID, replacement for typeid
  LoopStrengthReduce();

private:
  bool runOnLoop(Loop *L, LPPassManager &LPM) override;
  void getAnalysisUsage(AnalysisUsage &AU) const override;
};

}

char LoopStrengthReduce::ID = 0;
INITIALIZE_PASS_BEGIN(LoopStrengthReduce, "loop-reduce",
                "Loop Strength Reduction", false, false)
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(ScalarEvolutionWrapperPass)
INITIALIZE_PASS_DEPENDENCY(IVUsers)
INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(LoopSimplify)
INITIALIZE_PASS_END(LoopStrengthReduce, "loop-reduce",
                "Loop Strength Reduction", false, false)

Pass *llvm::createLoopStrengthReducePass() { return new LoopStrengthReduce(); }

LoopStrengthReduce::LoopStrengthReduce() : LoopPass(ID) {
  initializeLoopStrengthReducePass(*PassRegistry::getPassRegistry());
}

void LoopStrengthReduce::getAnalysisUsage(AnalysisUsage &AU) const {
  // We split critical edges, so we change the CFG.  However, we do update
  // many analyses if they are around.
  AU.addPreservedID(LoopSimplifyID);

  AU.addRequired<LoopInfoWrapperPass>();
  AU.addPreserved<LoopInfoWrapperPass>();
  AU.addRequiredID(LoopSimplifyID);
  AU.addRequired<DominatorTreeWrapperPass>();
  AU.addPreserved<DominatorTreeWrapperPass>();
  AU.addRequired<ScalarEvolutionWrapperPass>();
  AU.addPreserved<ScalarEvolutionWrapperPass>();
  // Requiring LoopSimplify a second time here prevents IVUsers from running
  // twice, since LoopSimplify was invalidated by running ScalarEvolution.
  AU.addRequiredID(LoopSimplifyID);
  AU.addRequired<IVUsers>();
  AU.addPreserved<IVUsers>();
  AU.addRequired<TargetTransformInfoWrapperPass>();
}

bool LoopStrengthReduce::runOnLoop(Loop *L, LPPassManager & /*LPM*/) {
  if (skipOptnoneFunction(L))
    return false;

  auto &IU = getAnalysis<IVUsers>();
  auto &SE = getAnalysis<ScalarEvolutionWrapperPass>().getSE();
  auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
  auto &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
  const auto &TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(
      *L->getHeader()->getParent());
  bool Changed = false;

  // Run the main LSR transformation.
  Changed |= LSRInstance(L, IU, SE, DT, LI, TTI).getChanged();

  // Remove any extra phis created by processing inner loops.
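  // (Exposition: processing inner loops can leave congruent or dead
  // induction PHIs behind; DeleteDeadPHIs below removes the dead ones, and
  // the optional replaceCongruentIVs pass then folds IVs that now compute
  // identical values.)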
  Changed |= DeleteDeadPHIs(L->getHeader());
  if (EnablePhiElim && L->isLoopSimplifyForm()) {
    SmallVector<WeakVH, 16> DeadInsts;
    const DataLayout &DL = L->getHeader()->getModule()->getDataLayout();
    SCEVExpander Rewriter(getAnalysis<ScalarEvolutionWrapperPass>().getSE(),
                          DL, "lsr");
#ifndef NDEBUG
    Rewriter.setDebugType(DEBUG_TYPE);
#endif
    unsigned numFolded = Rewriter.replaceCongruentIVs(
        L, &getAnalysis<DominatorTreeWrapperPass>().getDomTree(), DeadInsts,
        &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(
            *L->getHeader()->getParent()));
    if (numFolded) {
      Changed = true;
      DeleteTriviallyDeadInstructions(DeadInsts);
      DeleteDeadPHIs(L->getHeader());
    }
  }
  return Changed;
}
Index: vendor/llvm/dist/lib/Transforms/Vectorize/LoopVectorize.cpp
===================================================================
--- vendor/llvm/dist/lib/Transforms/Vectorize/LoopVectorize.cpp	(revision 295845)
+++ vendor/llvm/dist/lib/Transforms/Vectorize/LoopVectorize.cpp	(revision 295846)
@@ -1,5823 +1,5758 @@
//===- LoopVectorize.cpp - A Loop Vectorizer ------------------------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This is the LLVM loop vectorizer. This pass modifies 'vectorizable' loops
// and generates target-independent LLVM-IR.
// The vectorizer uses the TargetTransformInfo analysis to estimate the costs
// of instructions in order to estimate the profitability of vectorization.
//
// The loop vectorizer combines consecutive loop iterations into a single
// 'wide' iteration. After this transformation the index is incremented
// by the SIMD vector width, and not by one.
//
// This pass has four parts:
// 1. The main loop pass that drives the different parts.
// 2. LoopVectorizationLegality - A unit that checks for the legality
//    of the vectorization.
// 3. InnerLoopVectorizer - A unit that performs the actual
//    widening of instructions.
// 4. LoopVectorizationCostModel - A unit that checks for the profitability
//    of vectorization. It decides on the optimal vector width, which
//    can be one, if vectorization is not profitable.
//
//===----------------------------------------------------------------------===//
//
// The reduction-variable vectorization is based on the paper:
//  D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.
//
// Variable uniformity checks are inspired by:
//  Karrenberg, R. and Hack, S. Whole Function Vectorization.
//
// The interleaved access vectorization is based on the paper:
//  Dorit Nuzman, Ira Rosen and Ayal Zaks. Auto-Vectorization of Interleaved
//  Data for SIMD
//
// Other ideas/concepts are from:
//  A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.
//
//  S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of
//  Vectorizing Compilers.
//
//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Vectorize.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/BasicAliasAnalysis.h"
#include "llvm/Analysis/AliasSetTracker.h"
#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/CodeMetrics.h"
#include "llvm/Analysis/DemandedBits.h"
#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/LoopAccessAnalysis.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopIterator.h"
#include "llvm/Analysis/LoopPass.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpander.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"
#include "llvm/IR/ValueHandle.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Pass.h"
#include "llvm/Support/BranchProbability.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Analysis/VectorUtils.h"
#include "llvm/Transforms/Utils/LoopUtils.h"
#include <algorithm>
#include <functional>
#include <map>
#include <tuple>

using namespace llvm;
using namespace llvm::PatternMatch;

#define LV_NAME "loop-vectorize"
#define DEBUG_TYPE LV_NAME

STATISTIC(LoopsVectorized, "Number of loops vectorized");
STATISTIC(LoopsAnalyzed, "Number of loops analyzed for vectorization");

static cl::opt<bool>
    EnableIfConversion("enable-if-conversion", cl::init(true), cl::Hidden,
                       cl::desc("Enable if-conversion during vectorization."));

/// We don't vectorize loops with a known constant trip count below this
/// number.
static cl::opt<unsigned> TinyTripCountVectorThreshold(
    "vectorizer-min-trip-count", cl::init(16), cl::Hidden,
    cl::desc("Don't vectorize loops with a constant "
             "trip count that is smaller than this "
             "value."));

static cl::opt<bool> MaximizeBandwidth(
    "vectorizer-maximize-bandwidth", cl::init(false), cl::Hidden,
    cl::desc("Maximize bandwidth when selecting vectorization factor which "
             "will be determined by the smallest type in loop."));

/// This enables versioning on the strides of symbolically striding memory
/// accesses in code like the following.
///   for (i = 0; i < N; ++i)
///     A[i * Stride1] += B[i * Stride2] ...
///
/// Will be roughly translated to
///    if (Stride1 == 1 && Stride2 == 1) {
///      for (i = 0; i < N; i+=4)
///       A[i:i+3] += ...
///    } else
///      ...
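/// (The stride==1 tests are emitted as a runtime guard: the vectorized body
/// is entered only when the symbolic strides really are unit strides, and
/// execution otherwise falls back to the scalar loop.)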
static cl::opt<bool> EnableMemAccessVersioning(
    "enable-mem-access-versioning", cl::init(true), cl::Hidden,
    cl::desc("Enable symbolic stride memory access versioning"));

static cl::opt<bool> EnableInterleavedMemAccesses(
    "enable-interleaved-mem-accesses", cl::init(false), cl::Hidden,
    cl::desc("Enable vectorization on interleaved memory accesses in a loop"));

/// Maximum factor for an interleaved memory access.
static cl::opt<unsigned> MaxInterleaveGroupFactor(
    "max-interleave-group-factor", cl::Hidden,
    cl::desc("Maximum factor for an interleaved access group (default = 8)"),
    cl::init(8));

/// We don't interleave loops with a known constant trip count below this
/// number.
static const unsigned TinyTripCountInterleaveThreshold = 128;

static cl::opt<unsigned> ForceTargetNumScalarRegs(
    "force-target-num-scalar-regs", cl::init(0), cl::Hidden,
    cl::desc("A flag that overrides the target's number of scalar registers."));

static cl::opt<unsigned> ForceTargetNumVectorRegs(
    "force-target-num-vector-regs", cl::init(0), cl::Hidden,
    cl::desc("A flag that overrides the target's number of vector registers."));

/// Maximum vectorization interleave count.
static const unsigned MaxInterleaveFactor = 16;

static cl::opt<unsigned> ForceTargetMaxScalarInterleaveFactor(
    "force-target-max-scalar-interleave", cl::init(0), cl::Hidden,
    cl::desc("A flag that overrides the target's max interleave factor for "
             "scalar loops."));

static cl::opt<unsigned> ForceTargetMaxVectorInterleaveFactor(
    "force-target-max-vector-interleave", cl::init(0), cl::Hidden,
    cl::desc("A flag that overrides the target's max interleave factor for "
             "vectorized loops."));

static cl::opt<unsigned> ForceTargetInstructionCost(
    "force-target-instruction-cost", cl::init(0), cl::Hidden,
    cl::desc("A flag that overrides the target's expected cost for "
             "an instruction to a single constant value. Mostly "
             "useful for getting consistent testing."));

static cl::opt<unsigned> SmallLoopCost(
    "small-loop-cost", cl::init(20), cl::Hidden,
    cl::desc(
        "The cost of a loop that is considered 'small' by the interleaver."));

static cl::opt<bool> LoopVectorizeWithBlockFrequency(
    "loop-vectorize-with-block-frequency", cl::init(false), cl::Hidden,
    cl::desc("Enable the use of the block frequency analysis to access PGO "
             "heuristics minimizing code growth in cold regions and being more "
             "aggressive in hot regions."));

// Runtime interleave loops for load/store throughput.
static cl::opt<bool> EnableLoadStoreRuntimeInterleave(
    "enable-loadstore-runtime-interleave", cl::init(true), cl::Hidden,
    cl::desc(
        "Enable runtime interleaving until load/store ports are saturated"));

/// The number of stores in a loop that are allowed to need predication.
static cl::opt<unsigned> NumberOfStoresToPredicate(
    "vectorize-num-stores-pred", cl::init(1), cl::Hidden,
    cl::desc("Max number of stores to be predicated behind an if."));

static cl::opt<bool> EnableIndVarRegisterHeur(
    "enable-ind-var-reg-heur", cl::init(true), cl::Hidden,
    cl::desc("Count the induction variable only once when interleaving"));

static cl::opt<bool> EnableCondStoresVectorization(
    "enable-cond-stores-vec", cl::init(false), cl::Hidden,
    cl::desc("Enable if predication of stores during vectorization."));

static cl::opt<unsigned> MaxNestedScalarReductionIC(
    "max-nested-scalar-reduction-interleave", cl::init(2), cl::Hidden,
    cl::desc("The maximum interleave count to use when interleaving a scalar "
             "reduction in a nested loop."));

static cl::opt<unsigned> PragmaVectorizeMemoryCheckThreshold(
    "pragma-vectorize-memory-check-threshold", cl::init(128), cl::Hidden,
    cl::desc("The maximum allowed number of runtime memory checks with a "
             "vectorize(enable) pragma."));

static cl::opt<unsigned> VectorizeSCEVCheckThreshold(
    "vectorize-scev-check-threshold", cl::init(16), cl::Hidden,
    cl::desc("The maximum number of SCEV checks allowed."));

static cl::opt<unsigned> PragmaVectorizeSCEVCheckThreshold(
    "pragma-vectorize-scev-check-threshold", cl::init(128), cl::Hidden,
    cl::desc("The maximum number of SCEV checks allowed with a "
             "vectorize(enable) pragma"));

namespace {

// Forward declarations.
class LoopVectorizeHints;
class LoopVectorizationLegality;
class LoopVectorizationCostModel;
class LoopVectorizationRequirements;

/// \brief This modifies LoopAccessReport to initialize message with
/// loop-vectorizer-specific part.
class VectorizationReport : public LoopAccessReport {
public:
  VectorizationReport(Instruction *I = nullptr)
      : LoopAccessReport("loop not vectorized: ", I) {}

  /// \brief This allows promotion of the loop-access analysis report into the
  /// loop-vectorizer report. It modifies the message to add the
  /// loop-vectorizer-specific part of the message.
  explicit VectorizationReport(const LoopAccessReport &R)
      : LoopAccessReport(Twine("loop not vectorized: ") + R.str(),
                         R.getInstr()) {}
};

/// A helper function for converting Scalar types to vector types.
/// If the incoming type is void, we return void. If the VF is 1, we return
/// the scalar type.
static Type* ToVectorTy(Type *Scalar, unsigned VF) {
  if (Scalar->isVoidTy() || VF == 1)
    return Scalar;
  return VectorType::get(Scalar, VF);
}

/// A helper function that returns GEP instruction and knows to skip a
/// 'bitcast'. The 'bitcast' may be skipped if the source and the destination
/// pointee types of the 'bitcast' have the same size.
/// For example:
///   bitcast double** %var to i64* - can be skipped
///   bitcast double** %var to i8*  - can not
static GetElementPtrInst *getGEPInstruction(Value *Ptr) {

  if (isa<GetElementPtrInst>(Ptr))
    return cast<GetElementPtrInst>(Ptr);

  if (isa<BitCastInst>(Ptr) &&
      isa<GetElementPtrInst>(cast<BitCastInst>(Ptr)->getOperand(0))) {
    Type *BitcastTy = Ptr->getType();
    Type *GEPTy = cast<BitCastInst>(Ptr)->getSrcTy();
    if (!isa<PointerType>(BitcastTy) || !isa<PointerType>(GEPTy))
      return nullptr;
    Type *Pointee1Ty = cast<PointerType>(BitcastTy)->getPointerElementType();
    Type *Pointee2Ty = cast<PointerType>(GEPTy)->getPointerElementType();
    const DataLayout &DL = cast<BitCastInst>(Ptr)->getModule()->getDataLayout();
    if (DL.getTypeSizeInBits(Pointee1Ty) == DL.getTypeSizeInBits(Pointee2Ty))
      return cast<GetElementPtrInst>(cast<BitCastInst>(Ptr)->getOperand(0));
  }
  return nullptr;
}

/// InnerLoopVectorizer vectorizes loops which contain only one basic
/// block to a specified vectorization factor (VF).
/// This class performs the widening of scalars into vectors, or multiple
/// scalars.
/// This class also implements the following features:
/// * It inserts an epilogue loop for handling loops that don't have iteration
///   counts that are known to be a multiple of the vectorization factor.
/// * It handles the code generation for reduction variables.
/// * Scalarization (implementation using scalars) of un-vectorizable
///   instructions.
/// InnerLoopVectorizer does not perform any vectorization-legality
/// checks, and relies on the caller to check for the different legality
/// aspects. The InnerLoopVectorizer relies on the
/// LoopVectorizationLegality class to provide information about the induction
/// and reduction variables that were found to a given vectorization factor.
class InnerLoopVectorizer {
public:
  InnerLoopVectorizer(Loop *OrigLoop, PredicatedScalarEvolution &PSE,
                      LoopInfo *LI, DominatorTree *DT,
                      const TargetLibraryInfo *TLI,
                      const TargetTransformInfo *TTI, unsigned VecWidth,
                      unsigned UnrollFactor)
      : OrigLoop(OrigLoop), PSE(PSE), LI(LI), DT(DT), TLI(TLI), TTI(TTI),
        VF(VecWidth), UF(UnrollFactor), Builder(PSE.getSE()->getContext()),
        Induction(nullptr), OldInduction(nullptr), WidenMap(UnrollFactor),
        TripCount(nullptr), VectorTripCount(nullptr), Legal(nullptr),
        AddedSafetyChecks(false) {}

  // Perform the actual loop widening (vectorization).
  // MinimumBitWidths maps scalar integer values to the smallest bitwidth they
  // can be validly truncated to. The cost model has assumed this truncation
  // will happen when vectorizing.
  void vectorize(LoopVectorizationLegality *L,
                 MapVector<Instruction *, uint64_t> MinimumBitWidths) {
    MinBWs = MinimumBitWidths;
    Legal = L;
    // Create a new empty loop. Unlink the old loop and connect the new one.
    createEmptyLoop();
    // Widen each instruction in the old loop to a new one in the new loop.
    // Use the Legality module to find the induction and reduction variables.
    vectorizeLoop();
  }

  // Return true if any runtime check is added.
  bool IsSafetyChecksAdded() { return AddedSafetyChecks; }

  virtual ~InnerLoopVectorizer() {}

protected:
  /// A small list of PHINodes.
  typedef SmallVector<PHINode *, 4> PhiVector;

  /// When we unroll loops we have multiple vector values for each scalar.
  /// This data structure holds the unrolled and vectorized values that
  /// originated from one scalar instruction.
  typedef SmallVector<Value *, 2> VectorParts;

  // When we if-convert we need to create edge masks. We have to cache values
  // so that we don't end up with exponential recursion/IR.
  typedef DenseMap<std::pair<BasicBlock *, BasicBlock *>, VectorParts>
      EdgeMaskCache;

  /// Create an empty loop, based on the loop ranges of the old loop.
  void createEmptyLoop();
  /// Create a new induction variable inside L.
  PHINode *createInductionVariable(Loop *L, Value *Start, Value *End,
                                   Value *Step, Instruction *DL);
  /// Copy and widen the instructions from the old loop.
  virtual void vectorizeLoop();

  /// \brief The Loop exit block may have single value PHI nodes where the
  /// incoming value is 'Undef'. While vectorizing we only handled real values
  /// that were defined inside the loop. Here we fix the 'undef case'.
  /// See PR14725.
  void fixLCSSAPHIs();

  /// Shrinks vector element sizes based on information in "MinBWs".
  void truncateToMinimalBitwidths();

  /// A helper function that computes the predicate of the block BB, assuming
  /// that the header block of the loop is set to True. It returns the *entry*
  /// mask for the block BB.
  VectorParts createBlockInMask(BasicBlock *BB);
  /// A helper function that computes the predicate of the edge between SRC
  /// and DST.
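  /// (Exposition: under if-conversion, the mask of edge Src->Dst is in
  /// effect Src's entry mask, combined with the branch condition or its
  /// negation when Src ends in a conditional branch; masks are cached in
  /// EdgeMaskCache so diamond-shaped CFGs don't trigger exponential
  /// recomputation.)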
VectorParts createEdgeMask(BasicBlock *Src, BasicBlock *Dst); /// A helper function to vectorize a single BB within the innermost loop. void vectorizeBlockInLoop(BasicBlock *BB, PhiVector *PV); /// Vectorize a single PHINode in a block. This method handles the induction /// variable canonicalization. It supports both VF = 1 for unrolled loops and /// arbitrary length vectors. void widenPHIInstruction(Instruction *PN, VectorParts &Entry, unsigned UF, unsigned VF, PhiVector *PV); /// Insert the new loop to the loop hierarchy and pass manager /// and update the analysis passes. void updateAnalysis(); /// This instruction is un-vectorizable. Implement it as a sequence /// of scalars. If \p IfPredicateStore is true we need to 'hide' each /// scalarized instruction behind an if block predicated on the control /// dependence of the instruction. virtual void scalarizeInstruction(Instruction *Instr, bool IfPredicateStore=false); /// Vectorize Load and Store instructions, virtual void vectorizeMemoryInstruction(Instruction *Instr); /// Create a broadcast instruction. This method generates a broadcast /// instruction (shuffle) for loop invariant values and for the induction /// value. If this is the induction variable then we extend it to N, N+1, ... /// this is needed because each iteration in the loop corresponds to a SIMD /// element. virtual Value *getBroadcastInstrs(Value *V); /// This function adds (StartIdx, StartIdx + Step, StartIdx + 2*Step, ...) /// to each vector element of Val. The sequence starts at StartIndex. virtual Value *getStepVector(Value *Val, int StartIdx, Value *Step); /// When we go over instructions in the basic block we rely on previous /// values within the current basic block or on loop invariant values. /// When we widen (vectorize) values we place them in the map. If the values /// are not within the map, they have to be loop invariant, so we simply /// broadcast them into a vector. VectorParts &getVectorValue(Value *V); /// Try to vectorize the interleaved access group that \p Instr belongs to. void vectorizeInterleaveGroup(Instruction *Instr); /// Generate a shuffle sequence that will reverse the vector Vec. virtual Value *reverseVector(Value *Vec); /// Returns (and creates if needed) the original loop trip count. Value *getOrCreateTripCount(Loop *NewLoop); /// Returns (and creates if needed) the trip count of the widened loop. Value *getOrCreateVectorTripCount(Loop *NewLoop); /// Emit a bypass check to see if the trip count would overflow, or we /// wouldn't have enough iterations to execute one vector loop. void emitMinimumIterationCountCheck(Loop *L, BasicBlock *Bypass); /// Emit a bypass check to see if the vector trip count is nonzero. void emitVectorLoopEnteredCheck(Loop *L, BasicBlock *Bypass); /// Emit a bypass check to see if all of the SCEV assumptions we've /// had to make are correct. void emitSCEVChecks(Loop *L, BasicBlock *Bypass); /// Emit bypass checks to check any memory assumptions we may have made. void emitMemRuntimeChecks(Loop *L, BasicBlock *Bypass); /// This is a helper class that holds the vectorizer state. It maps scalar /// instructions to vector instructions. When the code is 'unrolled' then /// then a single scalar value is mapped to multiple vector parts. The parts /// are stored in the VectorPart type. struct ValueMap { /// C'tor. UnrollFactor controls the number of vectors ('parts') that /// are mapped. ValueMap(unsigned UnrollFactor) : UF(UnrollFactor) {} /// \return True if 'Key' is saved in the Value Map. 
bool has(Value *Key) const { return MapStorage.count(Key); } /// Initializes a new entry in the map. Sets all of the vector parts to the /// save value in 'Val'. /// \return A reference to a vector with splat values. VectorParts &splat(Value *Key, Value *Val) { VectorParts &Entry = MapStorage[Key]; Entry.assign(UF, Val); return Entry; } ///\return A reference to the value that is stored at 'Key'. VectorParts &get(Value *Key) { VectorParts &Entry = MapStorage[Key]; if (Entry.empty()) Entry.resize(UF); assert(Entry.size() == UF); return Entry; } private: /// The unroll factor. Each entry in the map stores this number of vector /// elements. unsigned UF; /// Map storage. We use std::map and not DenseMap because insertions to a /// dense map invalidates its iterators. std::map MapStorage; }; /// The original loop. Loop *OrigLoop; /// A wrapper around ScalarEvolution used to add runtime SCEV checks. Applies /// dynamic knowledge to simplify SCEV expressions and converts them to a /// more usable form. PredicatedScalarEvolution &PSE; /// Loop Info. LoopInfo *LI; /// Dominator Tree. DominatorTree *DT; /// Alias Analysis. AliasAnalysis *AA; /// Target Library Info. const TargetLibraryInfo *TLI; /// Target Transform Info. const TargetTransformInfo *TTI; /// The vectorization SIMD factor to use. Each vector will have this many /// vector elements. unsigned VF; protected: /// The vectorization unroll factor to use. Each scalar is vectorized to this /// many different vector instructions. unsigned UF; /// The builder that we use IRBuilder<> Builder; // --- Vectorization state --- /// The vector-loop preheader. BasicBlock *LoopVectorPreHeader; /// The scalar-loop preheader. BasicBlock *LoopScalarPreHeader; /// Middle Block between the vector and the scalar. BasicBlock *LoopMiddleBlock; ///The ExitBlock of the scalar loop. BasicBlock *LoopExitBlock; ///The vector loop body. SmallVector LoopVectorBody; ///The scalar loop body. BasicBlock *LoopScalarBody; /// A list of all bypass blocks. The first block is the entry of the loop. SmallVector LoopBypassBlocks; /// The new Induction variable which was added to the new block. PHINode *Induction; /// The induction variable of the old basic block. PHINode *OldInduction; /// Maps scalars to widened vectors. ValueMap WidenMap; /// Store instructions that should be predicated, as a pair /// SmallVector, 4> PredicatedStores; EdgeMaskCache MaskCache; /// Trip count of the original loop. Value *TripCount; /// Trip count of the widened loop (TripCount - TripCount % (VF*UF)) Value *VectorTripCount; /// Map of scalar integer values to the smallest bitwidth they can be legally /// represented as. The vector equivalents of these values should be truncated /// to this type. MapVector MinBWs; LoopVectorizationLegality *Legal; // Record whether runtime check is added. 
  bool AddedSafetyChecks;
};

class InnerLoopUnroller : public InnerLoopVectorizer {
public:
  InnerLoopUnroller(Loop *OrigLoop, PredicatedScalarEvolution &PSE,
                    LoopInfo *LI, DominatorTree *DT,
                    const TargetLibraryInfo *TLI,
                    const TargetTransformInfo *TTI, unsigned UnrollFactor)
      : InnerLoopVectorizer(OrigLoop, PSE, LI, DT, TLI, TTI, 1, UnrollFactor) {}

private:
  void scalarizeInstruction(Instruction *Instr,
                            bool IfPredicateStore = false) override;
  void vectorizeMemoryInstruction(Instruction *Instr) override;
  Value *getBroadcastInstrs(Value *V) override;
  Value *getStepVector(Value *Val, int StartIdx, Value *Step) override;
  Value *reverseVector(Value *Vec) override;
};

/// \brief Look for a meaningful debug location on the instruction or its
/// operands.
static Instruction *getDebugLocFromInstOrOperands(Instruction *I) {
  if (!I)
    return I;

  DebugLoc Empty;
  if (I->getDebugLoc() != Empty)
    return I;

  for (User::op_iterator OI = I->op_begin(), OE = I->op_end(); OI != OE; ++OI) {
    if (Instruction *OpInst = dyn_cast<Instruction>(*OI))
      if (OpInst->getDebugLoc() != Empty)
        return OpInst;
  }

  return I;
}

/// \brief Set the debug location in the builder using the debug location in
/// the instruction.
static void setDebugLocFromInst(IRBuilder<> &B, const Value *Ptr) {
  if (const Instruction *Inst = dyn_cast_or_null<Instruction>(Ptr))
    B.SetCurrentDebugLocation(Inst->getDebugLoc());
  else
    B.SetCurrentDebugLocation(DebugLoc());
}

#ifndef NDEBUG
/// \return string containing a file name and a line # for the given loop.
static std::string getDebugLocString(const Loop *L) {
  std::string Result;
  if (L) {
    raw_string_ostream OS(Result);
    if (const DebugLoc LoopDbgLoc = L->getStartLoc())
      LoopDbgLoc.print(OS);
    else
      // Just print the module name.
      OS << L->getHeader()->getParent()->getParent()->getModuleIdentifier();
    OS.flush();
  }
  return Result;
}
#endif

/// \brief Propagate known metadata from one instruction to another.
static void propagateMetadata(Instruction *To, const Instruction *From) {
  SmallVector<std::pair<unsigned, MDNode *>, 4> Metadata;
  From->getAllMetadataOtherThanDebugLoc(Metadata);

  for (auto M : Metadata) {
    unsigned Kind = M.first;

    // These are safe to transfer (this is safe for TBAA, even when we
    // if-convert, because should that metadata have had a control dependency
    // on the condition, and thus actually aliased with some other
    // non-speculated memory access when the condition was false, this would be
    // caught by the runtime overlap checks).
    if (Kind != LLVMContext::MD_tbaa && Kind != LLVMContext::MD_alias_scope &&
        Kind != LLVMContext::MD_noalias && Kind != LLVMContext::MD_fpmath &&
        Kind != LLVMContext::MD_nontemporal)
      continue;

    To->setMetadata(Kind, M.second);
  }
}

/// \brief Propagate known metadata from one instruction to a vector of others.
static void propagateMetadata(SmallVectorImpl<Value *> &To,
                              const Instruction *From) {
  for (Value *V : To)
    if (Instruction *I = dyn_cast<Instruction>(V))
      propagateMetadata(I, From);
}

/// \brief The group of interleaved loads/stores sharing the same stride and
/// close to each other.
///
/// Each member in this group has an index starting from 0, and the largest
/// index should be less than the interleave factor, which is equal to the
/// absolute value of the access's stride.
///
/// E.g. An interleaved load group of factor 4:
///        for (unsigned i = 0; i < 1024; i+=4) {
///          a = A[i];                           // Member of index 0
///          b = A[i+1];                         // Member of index 1
///          d = A[i+3];                         // Member of index 3
///          ...
///        }
///
///      An interleaved store group of factor 4:
///        for (unsigned i = 0; i < 1024; i+=4) {
///          ...
/// A[i] = a; // Member of index 0 /// A[i+1] = b; // Member of index 1 /// A[i+2] = c; // Member of index 2 /// A[i+3] = d; // Member of index 3 /// } /// /// Note: the interleaved load group could have gaps (missing members), but /// the interleaved store group doesn't allow gaps. class InterleaveGroup { public: InterleaveGroup(Instruction *Instr, int Stride, unsigned Align) : Align(Align), SmallestKey(0), LargestKey(0), InsertPos(Instr) { assert(Align && "The alignment should be non-zero"); Factor = std::abs(Stride); assert(Factor > 1 && "Invalid interleave factor"); Reverse = Stride < 0; Members[0] = Instr; } bool isReverse() const { return Reverse; } unsigned getFactor() const { return Factor; } unsigned getAlignment() const { return Align; } unsigned getNumMembers() const { return Members.size(); } /// \brief Try to insert a new member \p Instr with index \p Index and /// alignment \p NewAlign. The index is related to the leader and it could be /// negative if it is the new leader. /// /// \returns false if the instruction doesn't belong to the group. bool insertMember(Instruction *Instr, int Index, unsigned NewAlign) { assert(NewAlign && "The new member's alignment should be non-zero"); int Key = Index + SmallestKey; // Skip if there is already a member with the same index. if (Members.count(Key)) return false; if (Key > LargestKey) { // The largest index is always less than the interleave factor. if (Index >= static_cast(Factor)) return false; LargestKey = Key; } else if (Key < SmallestKey) { // The largest index is always less than the interleave factor. if (LargestKey - Key >= static_cast(Factor)) return false; SmallestKey = Key; } // It's always safe to select the minimum alignment. Align = std::min(Align, NewAlign); Members[Key] = Instr; return true; } /// \brief Get the member with the given index \p Index /// /// \returns nullptr if contains no such member. Instruction *getMember(unsigned Index) const { int Key = SmallestKey + Index; if (!Members.count(Key)) return nullptr; return Members.find(Key)->second; } /// \brief Get the index for the given member. Unlike the key in the member /// map, the index starts from 0. unsigned getIndex(Instruction *Instr) const { for (auto I : Members) if (I.second == Instr) return I.first - SmallestKey; llvm_unreachable("InterleaveGroup contains no such member"); } Instruction *getInsertPos() const { return InsertPos; } void setInsertPos(Instruction *Inst) { InsertPos = Inst; } private: unsigned Factor; // Interleave Factor. bool Reverse; unsigned Align; DenseMap Members; int SmallestKey; int LargestKey; // To avoid breaking dependences, vectorized instructions of an interleave // group should be inserted at either the first load or the last store in // program order. // // E.g. %even = load i32 // Insert Position // %add = add i32 %even // Use of %even // %odd = load i32 // // store i32 %even // %odd = add i32 // Def of %odd // store i32 %odd // Insert Position Instruction *InsertPos; }; /// \brief Drive the analysis of interleaved memory accesses in the loop. /// /// Use this class to analyze interleaved accesses only when we can vectorize /// a loop. Otherwise it's meaningless to do analysis as the vectorization /// on interleaved accesses is unsafe. /// /// The analysis collects interleave groups and records the relationships /// between the member and the group in a map. 
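/// For illustration only: for a factor-2 group over the accesses A[2*i] and
/// A[2*i+1], both member instructions map to the same InterleaveGroup, with
/// member indices 0 and 1 respectively.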
class InterleavedAccessInfo { public: InterleavedAccessInfo(PredicatedScalarEvolution &PSE, Loop *L, DominatorTree *DT) : PSE(PSE), TheLoop(L), DT(DT) {} ~InterleavedAccessInfo() { SmallSet DelSet; // Avoid releasing a pointer twice. for (auto &I : InterleaveGroupMap) DelSet.insert(I.second); for (auto *Ptr : DelSet) delete Ptr; } /// \brief Analyze the interleaved accesses and collect them in interleave /// groups. Substitute symbolic strides using \p Strides. void analyzeInterleaving(const ValueToValueMap &Strides); /// \brief Check if \p Instr belongs to any interleave group. bool isInterleaved(Instruction *Instr) const { return InterleaveGroupMap.count(Instr); } /// \brief Get the interleave group that \p Instr belongs to. /// /// \returns nullptr if doesn't have such group. InterleaveGroup *getInterleaveGroup(Instruction *Instr) const { if (InterleaveGroupMap.count(Instr)) return InterleaveGroupMap.find(Instr)->second; return nullptr; } private: /// A wrapper around ScalarEvolution, used to add runtime SCEV checks. /// Simplifies SCEV expressions in the context of existing SCEV assumptions. /// The interleaved access analysis can also add new predicates (for example /// by versioning strides of pointers). PredicatedScalarEvolution &PSE; Loop *TheLoop; DominatorTree *DT; /// Holds the relationships between the members and the interleave group. DenseMap InterleaveGroupMap; /// \brief The descriptor for a strided memory access. struct StrideDescriptor { StrideDescriptor(int Stride, const SCEV *Scev, unsigned Size, unsigned Align) : Stride(Stride), Scev(Scev), Size(Size), Align(Align) {} StrideDescriptor() : Stride(0), Scev(nullptr), Size(0), Align(0) {} int Stride; // The access's stride. It is negative for a reverse access. const SCEV *Scev; // The scalar expression of this access unsigned Size; // The size of the memory object. unsigned Align; // The alignment of this access. }; /// \brief Create a new interleave group with the given instruction \p Instr, /// stride \p Stride and alignment \p Align. /// /// \returns the newly created interleave group. InterleaveGroup *createInterleaveGroup(Instruction *Instr, int Stride, unsigned Align) { assert(!InterleaveGroupMap.count(Instr) && "Already in an interleaved access group"); InterleaveGroupMap[Instr] = new InterleaveGroup(Instr, Stride, Align); return InterleaveGroupMap[Instr]; } /// \brief Release the group and remove all the relationships. void releaseGroup(InterleaveGroup *Group) { for (unsigned i = 0; i < Group->getFactor(); i++) if (Instruction *Member = Group->getMember(i)) InterleaveGroupMap.erase(Member); delete Group; } /// \brief Collect all the accesses with a constant stride in program order. void collectConstStridedAccesses( MapVector &StrideAccesses, const ValueToValueMap &Strides); }; /// Utility class for getting and setting loop vectorizer hints in the form /// of loop metadata. /// This class keeps a number of loop annotations locally (as member variables) /// and can, upon request, write them back as metadata on the loop. It will /// initially scan the loop for existing metadata, and will update the local /// values based on information in the loop. /// We cannot write all values to metadata, as the mere presence of some info, /// for example 'force', means a decision has been made. So, we need to be /// careful NOT to add them if the user hasn't specifically asked so. class LoopVectorizeHints { enum HintKind { HK_WIDTH, HK_UNROLL, HK_FORCE }; /// Hint - associates name and validation with the hint value. 
struct Hint { const char * Name; unsigned Value; // This may have to change for non-numeric values. HintKind Kind; Hint(const char * Name, unsigned Value, HintKind Kind) : Name(Name), Value(Value), Kind(Kind) { } bool validate(unsigned Val) { switch (Kind) { case HK_WIDTH: return isPowerOf2_32(Val) && Val <= VectorizerParams::MaxVectorWidth; case HK_UNROLL: return isPowerOf2_32(Val) && Val <= MaxInterleaveFactor; case HK_FORCE: return (Val <= 1); } return false; } }; /// Vectorization width. Hint Width; /// Vectorization interleave factor. Hint Interleave; /// Vectorization forced Hint Force; /// Return the loop metadata prefix. static StringRef Prefix() { return "llvm.loop."; } public: enum ForceKind { FK_Undefined = -1, ///< Not selected. FK_Disabled = 0, ///< Forcing disabled. FK_Enabled = 1, ///< Forcing enabled. }; LoopVectorizeHints(const Loop *L, bool DisableInterleaving) : Width("vectorize.width", VectorizerParams::VectorizationFactor, HK_WIDTH), Interleave("interleave.count", DisableInterleaving, HK_UNROLL), Force("vectorize.enable", FK_Undefined, HK_FORCE), TheLoop(L) { // Populate values with existing loop metadata. getHintsFromMetadata(); // force-vector-interleave overrides DisableInterleaving. if (VectorizerParams::isInterleaveForced()) Interleave.Value = VectorizerParams::VectorizationInterleave; DEBUG(if (DisableInterleaving && Interleave.Value == 1) dbgs() << "LV: Interleaving disabled by the pass manager\n"); } /// Mark the loop L as already vectorized by setting the width to 1. void setAlreadyVectorized() { Width.Value = Interleave.Value = 1; Hint Hints[] = {Width, Interleave}; writeHintsToMetadata(Hints); } bool allowVectorization(Function *F, Loop *L, bool AlwaysVectorize) const { if (getForce() == LoopVectorizeHints::FK_Disabled) { DEBUG(dbgs() << "LV: Not vectorizing: #pragma vectorize disable.\n"); emitOptimizationRemarkAnalysis(F->getContext(), vectorizeAnalysisPassName(), *F, L->getStartLoc(), emitRemark()); return false; } if (!AlwaysVectorize && getForce() != LoopVectorizeHints::FK_Enabled) { DEBUG(dbgs() << "LV: Not vectorizing: No #pragma vectorize enable.\n"); emitOptimizationRemarkAnalysis(F->getContext(), vectorizeAnalysisPassName(), *F, L->getStartLoc(), emitRemark()); return false; } if (getWidth() == 1 && getInterleave() == 1) { // FIXME: Add a separate metadata to indicate when the loop has already // been vectorized instead of setting width and count to 1. DEBUG(dbgs() << "LV: Not vectorizing: Disabled/already vectorized.\n"); // FIXME: Add interleave.disable metadata. This will allow // vectorize.disable to be used without disabling the pass and errors // to differentiate between disabled vectorization and a width of 1. emitOptimizationRemarkAnalysis( F->getContext(), vectorizeAnalysisPassName(), *F, L->getStartLoc(), "loop not vectorized: vectorization and interleaving are explicitly " "disabled, or vectorize width and interleave count are both set to " "1"); return false; } return true; } /// Dumps all the hint information. 
std::string emitRemark() const { VectorizationReport R; if (Force.Value == LoopVectorizeHints::FK_Disabled) R << "vectorization is explicitly disabled"; else { R << "use -Rpass-analysis=loop-vectorize for more info"; if (Force.Value == LoopVectorizeHints::FK_Enabled) { R << " (Force=true"; if (Width.Value != 0) R << ", Vector Width=" << Width.Value; if (Interleave.Value != 0) R << ", Interleave Count=" << Interleave.Value; R << ")"; } } return R.str(); } unsigned getWidth() const { return Width.Value; } unsigned getInterleave() const { return Interleave.Value; } enum ForceKind getForce() const { return (ForceKind)Force.Value; } const char *vectorizeAnalysisPassName() const { // If hints are provided that don't disable vectorization use the // AlwaysPrint pass name to force the frontend to print the diagnostic. if (getWidth() == 1) return LV_NAME; if (getForce() == LoopVectorizeHints::FK_Disabled) return LV_NAME; if (getForce() == LoopVectorizeHints::FK_Undefined && getWidth() == 0) return LV_NAME; return DiagnosticInfo::AlwaysPrint; } bool allowReordering() const { // When enabling loop hints are provided we allow the vectorizer to change // the order of operations that is given by the scalar loop. This is not // enabled by default because can be unsafe or inefficient. For example, // reordering floating-point operations will change the way round-off // error accumulates in the loop. return getForce() == LoopVectorizeHints::FK_Enabled || getWidth() > 1; } private: /// Find hints specified in the loop metadata and update local values. void getHintsFromMetadata() { MDNode *LoopID = TheLoop->getLoopID(); if (!LoopID) return; // First operand should refer to the loop id itself. assert(LoopID->getNumOperands() > 0 && "requires at least one operand"); assert(LoopID->getOperand(0) == LoopID && "invalid loop id"); for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i) { const MDString *S = nullptr; SmallVector Args; // The expected hint is either a MDString or a MDNode with the first // operand a MDString. if (const MDNode *MD = dyn_cast(LoopID->getOperand(i))) { if (!MD || MD->getNumOperands() == 0) continue; S = dyn_cast(MD->getOperand(0)); for (unsigned i = 1, ie = MD->getNumOperands(); i < ie; ++i) Args.push_back(MD->getOperand(i)); } else { S = dyn_cast(LoopID->getOperand(i)); assert(Args.size() == 0 && "too many arguments for MDString"); } if (!S) continue; // Check if the hint starts with the loop metadata prefix. StringRef Name = S->getString(); if (Args.size() == 1) setHint(Name, Args[0]); } } /// Checks string hint with one operand and set value if valid. void setHint(StringRef Name, Metadata *Arg) { if (!Name.startswith(Prefix())) return; Name = Name.substr(Prefix().size(), StringRef::npos); const ConstantInt *C = mdconst::dyn_extract(Arg); if (!C) return; unsigned Val = C->getZExtValue(); Hint *Hints[] = {&Width, &Interleave, &Force}; for (auto H : Hints) { if (Name == H->Name) { if (H->validate(Val)) H->Value = Val; else DEBUG(dbgs() << "LV: ignoring invalid hint '" << Name << "'\n"); break; } } } /// Create a new hint from name / value pair. MDNode *createHintMetadata(StringRef Name, unsigned V) const { LLVMContext &Context = TheLoop->getHeader()->getContext(); Metadata *MDs[] = {MDString::get(Context, Name), ConstantAsMetadata::get( ConstantInt::get(Type::getInt32Ty(Context), V))}; return MDNode::get(Context, MDs); } /// Matches metadata with hint name. 
bool matchesHintMetadataName(MDNode *Node, ArrayRef HintTypes) { MDString* Name = dyn_cast(Node->getOperand(0)); if (!Name) return false; for (auto H : HintTypes) if (Name->getString().endswith(H.Name)) return true; return false; } /// Sets current hints into loop metadata, keeping other values intact. void writeHintsToMetadata(ArrayRef HintTypes) { if (HintTypes.size() == 0) return; // Reserve the first element to LoopID (see below). SmallVector MDs(1); // If the loop already has metadata, then ignore the existing operands. MDNode *LoopID = TheLoop->getLoopID(); if (LoopID) { for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i) { MDNode *Node = cast(LoopID->getOperand(i)); // If node in update list, ignore old value. if (!matchesHintMetadataName(Node, HintTypes)) MDs.push_back(Node); } } // Now, add the missing hints. for (auto H : HintTypes) MDs.push_back(createHintMetadata(Twine(Prefix(), H.Name).str(), H.Value)); // Replace current metadata node with new one. LLVMContext &Context = TheLoop->getHeader()->getContext(); MDNode *NewLoopID = MDNode::get(Context, MDs); // Set operand 0 to refer to the loop id itself. NewLoopID->replaceOperandWith(0, NewLoopID); TheLoop->setLoopID(NewLoopID); } /// The loop these hints belong to. const Loop *TheLoop; }; static void emitAnalysisDiag(const Function *TheFunction, const Loop *TheLoop, const LoopVectorizeHints &Hints, const LoopAccessReport &Message) { const char *Name = Hints.vectorizeAnalysisPassName(); LoopAccessReport::emitAnalysis(Message, TheFunction, TheLoop, Name); } static void emitMissedWarning(Function *F, Loop *L, const LoopVectorizeHints &LH) { emitOptimizationRemarkMissed(F->getContext(), LV_NAME, *F, L->getStartLoc(), LH.emitRemark()); if (LH.getForce() == LoopVectorizeHints::FK_Enabled) { if (LH.getWidth() != 1) emitLoopVectorizeWarning( F->getContext(), *F, L->getStartLoc(), "failed explicitly specified loop vectorization"); else if (LH.getInterleave() != 1) emitLoopInterleaveWarning( F->getContext(), *F, L->getStartLoc(), "failed explicitly specified loop interleaving"); } } /// LoopVectorizationLegality checks if it is legal to vectorize a loop, and /// to what vectorization factor. /// This class does not look at the profitability of vectorization, only the /// legality. This class has two main kinds of checks: /// * Memory checks - The code in canVectorizeMemory checks if vectorization /// will change the order of memory accesses in a way that will change the /// correctness of the program. /// * Scalars checks - The code in canVectorizeInstrs and canVectorizeMemory /// checks for a number of different conditions, such as the availability of a /// single induction variable, that all types are supported and vectorize-able, /// etc. This code reflects the capabilities of InnerLoopVectorizer. /// This class is also used by InnerLoopVectorizer for identifying /// induction variable and the different reduction variables. 
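/// For illustration only: in a loop such as
///   for (i = 0; i < n; ++i) A[i+1] = A[i] + 1;
/// each iteration reads the value the previous iteration wrote, so executing
/// several iterations at once would load elements before the scalar loop
/// would have stored them; the memory checks reject such loops.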
class LoopVectorizationLegality { public: LoopVectorizationLegality(Loop *L, PredicatedScalarEvolution &PSE, DominatorTree *DT, TargetLibraryInfo *TLI, AliasAnalysis *AA, Function *F, const TargetTransformInfo *TTI, LoopAccessAnalysis *LAA, LoopVectorizationRequirements *R, const LoopVectorizeHints *H) : NumPredStores(0), TheLoop(L), PSE(PSE), TLI(TLI), TheFunction(F), TTI(TTI), DT(DT), LAA(LAA), LAI(nullptr), InterleaveInfo(PSE, L, DT), Induction(nullptr), WidestIndTy(nullptr), HasFunNoNaNAttr(false), Requirements(R), Hints(H) {} /// ReductionList contains the reduction descriptors for all /// of the reductions that were found in the loop. typedef DenseMap ReductionList; /// InductionList saves induction variables and maps them to the /// induction descriptor. typedef MapVector InductionList; /// Returns true if it is legal to vectorize this loop. /// This does not mean that it is profitable to vectorize this /// loop, only that it is legal to do so. bool canVectorize(); /// Returns the Induction variable. PHINode *getInduction() { return Induction; } /// Returns the reduction variables found in the loop. ReductionList *getReductionVars() { return &Reductions; } /// Returns the induction variables found in the loop. InductionList *getInductionVars() { return &Inductions; } /// Returns the widest induction type. Type *getWidestInductionType() { return WidestIndTy; } /// Returns True if V is an induction variable in this loop. bool isInductionVariable(const Value *V); /// Returns True if PN is a reduction variable in this loop. bool isReductionVariable(PHINode *PN) { return Reductions.count(PN); } /// Return true if the block BB needs to be predicated in order for the loop /// to be vectorized. bool blockNeedsPredication(BasicBlock *BB); /// Check if this pointer is consecutive when vectorizing. This happens /// when the last index of the GEP is the induction variable, or that the /// pointer itself is an induction variable. /// This check allows us to vectorize A[idx] into a wide load/store. /// Returns: /// 0 - Stride is unknown or non-consecutive. /// 1 - Address is consecutive. /// -1 - Address is consecutive, and decreasing. int isConsecutivePtr(Value *Ptr); /// Returns true if the value V is uniform within the loop. bool isUniform(Value *V); /// Returns true if this instruction will remain scalar after vectorization. bool isUniformAfterVectorization(Instruction* I) { return Uniforms.count(I); } /// Returns the information that we collected about runtime memory check. const RuntimePointerChecking *getRuntimePointerChecking() const { return LAI->getRuntimePointerChecking(); } const LoopAccessInfo *getLAI() const { return LAI; } /// \brief Check if \p Instr belongs to any interleaved access group. bool isAccessInterleaved(Instruction *Instr) { return InterleaveInfo.isInterleaved(Instr); } /// \brief Get the interleaved access group that \p Instr belongs to. const InterleaveGroup *getInterleavedAccessGroup(Instruction *Instr) { return InterleaveInfo.getInterleaveGroup(Instr); } unsigned getMaxSafeDepDistBytes() { return LAI->getMaxSafeDepDistBytes(); } bool hasStride(Value *V) { return StrideSet.count(V); } bool mustCheckStrides() { return !StrideSet.empty(); } SmallPtrSet::iterator strides_begin() { return StrideSet.begin(); } SmallPtrSet::iterator strides_end() { return StrideSet.end(); } /// Returns true if the target machine supports masked store operation /// for the given \p DataType and kind of access to \p Ptr. 
bool isLegalMaskedStore(Type *DataType, Value *Ptr) { return isConsecutivePtr(Ptr) && TTI->isLegalMaskedStore(DataType); } /// Returns true if the target machine supports masked load operation /// for the given \p DataType and kind of access to \p Ptr. bool isLegalMaskedLoad(Type *DataType, Value *Ptr) { return isConsecutivePtr(Ptr) && TTI->isLegalMaskedLoad(DataType); } /// Returns true if vector representation of the instruction \p I /// requires mask. bool isMaskRequired(const Instruction* I) { return (MaskedOp.count(I) != 0); } unsigned getNumStores() const { return LAI->getNumStores(); } unsigned getNumLoads() const { return LAI->getNumLoads(); } unsigned getNumPredStores() const { return NumPredStores; } private: /// Check if a single basic block loop is vectorizable. /// At this point we know that this is a loop with a constant trip count /// and we only need to check individual instructions. bool canVectorizeInstrs(); /// When we vectorize loops we may change the order in which /// we read and write from memory. This method checks if it is /// legal to vectorize the code, considering only memory constrains. /// Returns true if the loop is vectorizable bool canVectorizeMemory(); /// Return true if we can vectorize this loop using the IF-conversion /// transformation. bool canVectorizeWithIfConvert(); /// Collect the variables that need to stay uniform after vectorization. void collectLoopUniforms(); /// Return true if all of the instructions in the block can be speculatively /// executed. \p SafePtrs is a list of addresses that are known to be legal /// and we know that we can read from them without segfault. bool blockCanBePredicated(BasicBlock *BB, SmallPtrSetImpl &SafePtrs); /// \brief Collect memory access with loop invariant strides. /// /// Looks for accesses like "a[i * StrideA]" where "StrideA" is loop /// invariant. void collectStridedAccess(Value *LoadOrStoreInst); /// Report an analysis message to assist the user in diagnosing loops that are /// not vectorized. These are handled as LoopAccessReport rather than /// VectorizationReport because the << operator of VectorizationReport returns /// LoopAccessReport. void emitAnalysis(const LoopAccessReport &Message) const { emitAnalysisDiag(TheFunction, TheLoop, *Hints, Message); } unsigned NumPredStores; /// The loop that we evaluate. Loop *TheLoop; /// A wrapper around ScalarEvolution used to add runtime SCEV checks. /// Applies dynamic knowledge to simplify SCEV expressions in the context /// of existing SCEV assumptions. The analysis will also add a minimal set /// of new predicates if this is required to enable vectorization and /// unrolling. PredicatedScalarEvolution &PSE; /// Target Library Info. TargetLibraryInfo *TLI; /// Parent function Function *TheFunction; /// Target Transform Info const TargetTransformInfo *TTI; /// Dominator Tree. DominatorTree *DT; // LoopAccess analysis. LoopAccessAnalysis *LAA; // And the loop-accesses info corresponding to this loop. This pointer is // null until canVectorizeMemory sets it up. const LoopAccessInfo *LAI; /// The interleave access information contains groups of interleaved accesses /// with the same stride and close to each other. InterleavedAccessInfo InterleaveInfo; // --- vectorization state --- // /// Holds the integer induction variable. This is the counter of the /// loop. PHINode *Induction; /// Holds the reduction variables. ReductionList Reductions; /// Holds all of the induction variables that we found in the loop. 
/// Notice that inductions don't need to start at zero and that induction /// variables can be pointers. InductionList Inductions; /// Holds the widest induction type encountered. Type *WidestIndTy; /// Allowed outside users. This holds the reduction /// vars which can be accessed from outside the loop. SmallPtrSet AllowedExit; /// This set holds the variables which are known to be uniform after /// vectorization. SmallPtrSet Uniforms; /// Can we assume the absence of NaNs. bool HasFunNoNaNAttr; /// Vectorization requirements that will go through late-evaluation. LoopVectorizationRequirements *Requirements; /// Used to emit an analysis of any legality issues. const LoopVectorizeHints *Hints; ValueToValueMap Strides; SmallPtrSet StrideSet; /// While vectorizing these instructions we have to generate a /// call to the appropriate masked intrinsic SmallPtrSet MaskedOp; }; /// LoopVectorizationCostModel - estimates the expected speedups due to /// vectorization. /// In many cases vectorization is not profitable. This can happen because of /// a number of reasons. In this class we mainly attempt to predict the /// expected speedup/slowdowns due to the supported instruction set. We use the /// TargetTransformInfo to query the different backends for the cost of /// different operations. class LoopVectorizationCostModel { public: - LoopVectorizationCostModel(Loop *L, PredicatedScalarEvolution &PSE, - LoopInfo *LI, LoopVectorizationLegality *Legal, + LoopVectorizationCostModel(Loop *L, ScalarEvolution *SE, LoopInfo *LI, + LoopVectorizationLegality *Legal, const TargetTransformInfo &TTI, const TargetLibraryInfo *TLI, DemandedBits *DB, AssumptionCache *AC, const Function *F, - const LoopVectorizeHints *Hints) - : TheLoop(L), PSE(PSE), LI(LI), Legal(Legal), TTI(TTI), TLI(TLI), DB(DB), - AC(AC), TheFunction(F), Hints(Hints) {} + const LoopVectorizeHints *Hints, + SmallPtrSetImpl &ValuesToIgnore) + : TheLoop(L), SE(SE), LI(LI), Legal(Legal), TTI(TTI), TLI(TLI), DB(DB), + TheFunction(F), Hints(Hints), ValuesToIgnore(ValuesToIgnore) {} /// Information about vectorization costs struct VectorizationFactor { unsigned Width; // Vector width with best cost unsigned Cost; // Cost of the loop with that width }; /// \return The most profitable vectorization factor and the cost of that VF. /// This method checks every power of two up to VF. If UserVF is not ZERO /// then this vectorization factor will be selected if vectorization is /// possible. VectorizationFactor selectVectorizationFactor(bool OptForSize); /// \return The size (in bits) of the smallest and widest types in the code /// that needs to be vectorized. We ignore values that remain scalar such as /// 64 bit loop indices. std::pair getSmallestAndWidestTypes(); /// \return The desired interleave count. /// If interleave count has been specified by metadata it will be returned. /// Otherwise, the interleave count is computed and returned. VF and LoopCost /// are the selected vectorization factor and the cost of the selected VF. unsigned selectInterleaveCount(bool OptForSize, unsigned VF, unsigned LoopCost); /// \return The most profitable unroll factor. /// This method finds the best unroll-factor based on register pressure and /// other parameters. VF and LoopCost are the selected vectorization factor /// and the cost of the selected VF. unsigned computeInterleaveCount(bool OptForSize, unsigned VF, unsigned LoopCost); /// \brief A struct that represents some properties of the register usage /// of a loop. 
struct RegisterUsage { /// Holds the number of loop invariant values that are used in the loop. unsigned LoopInvariantRegs; /// Holds the maximum number of concurrent live intervals in the loop. unsigned MaxLocalUsers; /// Holds the number of instructions in the loop. unsigned NumInstructions; }; /// \return Returns information about the register usages of the loop for the /// given vectorization factors. SmallVector calculateRegisterUsage(const SmallVector &VFs); - /// Collect values we want to ignore in the cost model. - void collectValuesToIgnore(); - private: /// Returns the expected execution cost. The unit of the cost does /// not matter because we use the 'cost' units to compare different /// vector widths. The cost that is returned is *not* normalized by /// the factor width. unsigned expectedCost(unsigned VF); /// Returns the execution time cost of an instruction for a given vector /// width. Vector width of one means scalar. unsigned getInstructionCost(Instruction *I, unsigned VF); /// Returns whether the instruction is a load or store and will be a emitted /// as a vector operation. bool isConsecutiveLoadOrStore(Instruction *I); /// Report an analysis message to assist the user in diagnosing loops that are /// not vectorized. These are handled as LoopAccessReport rather than /// VectorizationReport because the << operator of VectorizationReport returns /// LoopAccessReport. void emitAnalysis(const LoopAccessReport &Message) const { emitAnalysisDiag(TheFunction, TheLoop, *Hints, Message); } public: /// Map of scalar integer values to the smallest bitwidth they can be legally /// represented as. The vector equivalents of these values should be truncated /// to this type. MapVector MinBWs; /// The loop that we evaluate. Loop *TheLoop; - /// Predicated scalar evolution analysis. - PredicatedScalarEvolution &PSE; + /// Scev analysis. + ScalarEvolution *SE; /// Loop Info analysis. LoopInfo *LI; /// Vectorization legality. LoopVectorizationLegality *Legal; /// Vector target information. const TargetTransformInfo &TTI; /// Target Library Info. const TargetLibraryInfo *TLI; - /// Demanded bits analysis. + /// Demanded bits analysis DemandedBits *DB; - /// Assumption cache. - AssumptionCache *AC; const Function *TheFunction; - /// Loop Vectorize Hint. + // Loop Vectorize Hint. const LoopVectorizeHints *Hints; - /// Values to ignore in the cost model. - SmallPtrSet ValuesToIgnore; - /// Values to ignore in the cost model when VF > 1. - SmallPtrSet VecValuesToIgnore; + // Values to ignore in the cost model. + const SmallPtrSetImpl &ValuesToIgnore; }; /// \brief This holds vectorization requirements that must be verified late in /// the process. The requirements are set by legalize and costmodel. Once /// vectorization has been determined to be possible and profitable the /// requirements can be verified by looking for metadata or compiler options. /// For example, some loops require FP commutativity which is only allowed if /// vectorization is explicitly specified or if the fast-math compiler option /// has been provided. /// Late evaluation of these requirements allows helpful diagnostics to be /// composed that tells the user what need to be done to vectorize the loop. For /// example, by specifying #pragma clang loop vectorize or -ffast-math. Late /// evaluation should be used only when diagnostics can generated that can be /// followed by a non-expert user. 
class LoopVectorizationRequirements { public: LoopVectorizationRequirements() : NumRuntimePointerChecks(0), UnsafeAlgebraInst(nullptr) {} void addUnsafeAlgebraInst(Instruction *I) { // First unsafe algebra instruction. if (!UnsafeAlgebraInst) UnsafeAlgebraInst = I; } void addRuntimePointerChecks(unsigned Num) { NumRuntimePointerChecks = Num; } bool doesNotMeet(Function *F, Loop *L, const LoopVectorizeHints &Hints) { const char *Name = Hints.vectorizeAnalysisPassName(); bool Failed = false; if (UnsafeAlgebraInst && !Hints.allowReordering()) { emitOptimizationRemarkAnalysisFPCommute( F->getContext(), Name, *F, UnsafeAlgebraInst->getDebugLoc(), VectorizationReport() << "cannot prove it is safe to reorder " "floating-point operations"); Failed = true; } // Test if runtime memcheck thresholds are exceeded. bool PragmaThresholdReached = NumRuntimePointerChecks > PragmaVectorizeMemoryCheckThreshold; bool ThresholdReached = NumRuntimePointerChecks > VectorizerParams::RuntimeMemoryCheckThreshold; if ((ThresholdReached && !Hints.allowReordering()) || PragmaThresholdReached) { emitOptimizationRemarkAnalysisAliasing( F->getContext(), Name, *F, L->getStartLoc(), VectorizationReport() << "cannot prove it is safe to reorder memory operations"); DEBUG(dbgs() << "LV: Too many memory checks needed.\n"); Failed = true; } return Failed; } private: unsigned NumRuntimePointerChecks; Instruction *UnsafeAlgebraInst; }; static void addInnerLoop(Loop &L, SmallVectorImpl &V) { if (L.empty()) return V.push_back(&L); for (Loop *InnerL : L) addInnerLoop(*InnerL, V); } /// The LoopVectorize Pass. struct LoopVectorize : public FunctionPass { /// Pass identification, replacement for typeid static char ID; explicit LoopVectorize(bool NoUnrolling = false, bool AlwaysVectorize = true) : FunctionPass(ID), DisableUnrolling(NoUnrolling), AlwaysVectorize(AlwaysVectorize) { initializeLoopVectorizePass(*PassRegistry::getPassRegistry()); } ScalarEvolution *SE; LoopInfo *LI; TargetTransformInfo *TTI; DominatorTree *DT; BlockFrequencyInfo *BFI; TargetLibraryInfo *TLI; DemandedBits *DB; AliasAnalysis *AA; AssumptionCache *AC; LoopAccessAnalysis *LAA; bool DisableUnrolling; bool AlwaysVectorize; BlockFrequency ColdEntryFreq; bool runOnFunction(Function &F) override { SE = &getAnalysis().getSE(); LI = &getAnalysis().getLoopInfo(); TTI = &getAnalysis().getTTI(F); DT = &getAnalysis().getDomTree(); BFI = &getAnalysis().getBFI(); auto *TLIP = getAnalysisIfAvailable(); TLI = TLIP ? &TLIP->getTLI() : nullptr; AA = &getAnalysis().getAAResults(); AC = &getAnalysis().getAssumptionCache(F); LAA = &getAnalysis(); DB = &getAnalysis(); // Compute some weights outside of the loop over the loops. Compute this // using a BranchProbability to re-use its scaling math. const BranchProbability ColdProb(1, 5); // 20% ColdEntryFreq = BlockFrequency(BFI->getEntryFreq()) * ColdProb; // Don't attempt if // 1. the target claims to have no vector registers, and // 2. interleaving won't help ILP. // // The second condition is necessary because, even if the target has no // vector registers, loop vectorization may still enable scalar // interleaving. if (!TTI->getNumberOfRegisters(true) && TTI->getMaxInterleaveFactor(1) < 2) return false; // Build up a worklist of inner-loops to vectorize. This is necessary as // the act of vectorizing or partially unrolling a loop creates new loops // and can invalidate iterators across the loops. 
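    // (addInnerLoop recurses through a loop nest and records a loop only when
    // it has no children, so the worklist ends up holding innermost loops
    // only.)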
SmallVector Worklist; for (Loop *L : *LI) addInnerLoop(*L, Worklist); LoopsAnalyzed += Worklist.size(); // Now walk the identified inner loops. bool Changed = false; while (!Worklist.empty()) Changed |= processLoop(Worklist.pop_back_val()); // Process each loop nest in the function. return Changed; } static void AddRuntimeUnrollDisableMetaData(Loop *L) { SmallVector MDs; // Reserve first location for self reference to the LoopID metadata node. MDs.push_back(nullptr); bool IsUnrollMetadata = false; MDNode *LoopID = L->getLoopID(); if (LoopID) { // First find existing loop unrolling disable metadata. for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i) { MDNode *MD = dyn_cast(LoopID->getOperand(i)); if (MD) { const MDString *S = dyn_cast(MD->getOperand(0)); IsUnrollMetadata = S && S->getString().startswith("llvm.loop.unroll.disable"); } MDs.push_back(LoopID->getOperand(i)); } } if (!IsUnrollMetadata) { // Add runtime unroll disable metadata. LLVMContext &Context = L->getHeader()->getContext(); SmallVector DisableOperands; DisableOperands.push_back( MDString::get(Context, "llvm.loop.unroll.runtime.disable")); MDNode *DisableNode = MDNode::get(Context, DisableOperands); MDs.push_back(DisableNode); MDNode *NewLoopID = MDNode::get(Context, MDs); // Set operand 0 to refer to the loop id itself. NewLoopID->replaceOperandWith(0, NewLoopID); L->setLoopID(NewLoopID); } } bool processLoop(Loop *L) { assert(L->empty() && "Only process inner loops."); #ifndef NDEBUG const std::string DebugLocStr = getDebugLocString(L); #endif /* NDEBUG */ DEBUG(dbgs() << "\nLV: Checking a loop in \"" << L->getHeader()->getParent()->getName() << "\" from " << DebugLocStr << "\n"); LoopVectorizeHints Hints(L, DisableUnrolling); DEBUG(dbgs() << "LV: Loop hints:" << " force=" << (Hints.getForce() == LoopVectorizeHints::FK_Disabled ? "disabled" : (Hints.getForce() == LoopVectorizeHints::FK_Enabled ? "enabled" : "?")) << " width=" << Hints.getWidth() << " unroll=" << Hints.getInterleave() << "\n"); // Function containing loop Function *F = L->getHeader()->getParent(); // Looking at the diagnostic output is the only way to determine if a loop // was vectorized (other than looking at the IR or machine code), so it // is important to generate an optimization remark for each loop. Most of // these messages are generated by emitOptimizationRemarkAnalysis. Remarks // generated by emitOptimizationRemark and emitOptimizationRemarkMissed are // less verbose reporting vectorized loops and unvectorized loops that may // benefit from vectorization, respectively. if (!Hints.allowVectorization(F, L, AlwaysVectorize)) { DEBUG(dbgs() << "LV: Loop hints prevent vectorization.\n"); return false; } // Check the loop for a trip count threshold: // do not vectorize loops with a tiny trip count. const unsigned TC = SE->getSmallConstantTripCount(L); if (TC > 0u && TC < TinyTripCountVectorThreshold) { DEBUG(dbgs() << "LV: Found a loop with a very small trip count. " << "This loop is not worth vectorizing."); if (Hints.getForce() == LoopVectorizeHints::FK_Enabled) DEBUG(dbgs() << " But vectorizing was explicitly forced.\n"); else { DEBUG(dbgs() << "\n"); emitAnalysisDiag(F, L, Hints, VectorizationReport() << "vectorization is not beneficial " "and is not explicitly forced"); return false; } } PredicatedScalarEvolution PSE(*SE); // Check if it is legal to vectorize the loop. 
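    // (PredicatedScalarEvolution may accumulate SCEV predicates while legality
    // analysis and code generation run; any predicates taken are materialized
    // later as runtime guards by emitSCEVChecks.)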
LoopVectorizationRequirements Requirements; LoopVectorizationLegality LVL(L, PSE, DT, TLI, AA, F, TTI, LAA, &Requirements, &Hints); if (!LVL.canVectorize()) { DEBUG(dbgs() << "LV: Not vectorizing: Cannot prove legality.\n"); emitMissedWarning(F, L, Hints); return false; } + // Collect values we want to ignore in the cost model. This includes + // type-promoting instructions we identified during reduction detection. + SmallPtrSet ValuesToIgnore; + CodeMetrics::collectEphemeralValues(L, AC, ValuesToIgnore); + for (auto &Reduction : *LVL.getReductionVars()) { + RecurrenceDescriptor &RedDes = Reduction.second; + SmallPtrSetImpl &Casts = RedDes.getCastInsts(); + ValuesToIgnore.insert(Casts.begin(), Casts.end()); + } + // Use the cost model. - LoopVectorizationCostModel CM(L, PSE, LI, &LVL, *TTI, TLI, DB, AC, F, - &Hints); - CM.collectValuesToIgnore(); + LoopVectorizationCostModel CM(L, PSE.getSE(), LI, &LVL, *TTI, TLI, DB, AC, + F, &Hints, ValuesToIgnore); // Check the function attributes to find out if this function should be // optimized for size. bool OptForSize = Hints.getForce() != LoopVectorizeHints::FK_Enabled && F->optForSize(); // Compute the weighted frequency of this loop being executed and see if it // is less than 20% of the function entry baseline frequency. Note that we // always have a canonical loop here because we think we *can* vectorize. // FIXME: This is hidden behind a flag due to pervasive problems with // exactly what block frequency models. if (LoopVectorizeWithBlockFrequency) { BlockFrequency LoopEntryFreq = BFI->getBlockFreq(L->getLoopPreheader()); if (Hints.getForce() != LoopVectorizeHints::FK_Enabled && LoopEntryFreq < ColdEntryFreq) OptForSize = true; } // Check the function attributes to see if implicit floats are allowed. // FIXME: This check doesn't seem possibly correct -- what if the loop is // an integer loop and the vector instructions selected are purely integer // vector instructions? if (F->hasFnAttribute(Attribute::NoImplicitFloat)) { DEBUG(dbgs() << "LV: Can't vectorize when the NoImplicitFloat" "attribute is used.\n"); emitAnalysisDiag( F, L, Hints, VectorizationReport() << "loop not vectorized due to NoImplicitFloat attribute"); emitMissedWarning(F, L, Hints); return false; } // Select the optimal vectorization factor. const LoopVectorizationCostModel::VectorizationFactor VF = CM.selectVectorizationFactor(OptForSize); // Select the interleave count. unsigned IC = CM.selectInterleaveCount(OptForSize, VF.Width, VF.Cost); // Get user interleave count. unsigned UserIC = Hints.getInterleave(); // Identify the diagnostic messages that should be produced. std::string VecDiagMsg, IntDiagMsg; bool VectorizeLoop = true, InterleaveLoop = true; if (Requirements.doesNotMeet(F, L, Hints)) { DEBUG(dbgs() << "LV: Not vectorizing: loop did not meet vectorization " "requirements.\n"); emitMissedWarning(F, L, Hints); return false; } if (VF.Width == 1) { DEBUG(dbgs() << "LV: Vectorization is possible but not beneficial.\n"); VecDiagMsg = "the cost-model indicates that vectorization is not beneficial"; VectorizeLoop = false; } if (IC == 1 && UserIC <= 1) { // Tell the user interleaving is not beneficial. DEBUG(dbgs() << "LV: Interleaving is not beneficial.\n"); IntDiagMsg = "the cost-model indicates that interleaving is not beneficial"; InterleaveLoop = false; if (UserIC == 1) IntDiagMsg += " and is explicitly disabled or interleave count is set to 1"; } else if (IC > 1 && UserIC == 1) { // Tell the user interleaving is beneficial, but it explicitly disabled. 
DEBUG(dbgs() << "LV: Interleaving is beneficial but is explicitly disabled."); IntDiagMsg = "the cost-model indicates that interleaving is beneficial " "but is explicitly disabled or interleave count is set to 1"; InterleaveLoop = false; } // Override IC if user provided an interleave count. IC = UserIC > 0 ? UserIC : IC; // Emit diagnostic messages, if any. const char *VAPassName = Hints.vectorizeAnalysisPassName(); if (!VectorizeLoop && !InterleaveLoop) { // Do not vectorize or interleaving the loop. emitOptimizationRemarkAnalysis(F->getContext(), VAPassName, *F, L->getStartLoc(), VecDiagMsg); emitOptimizationRemarkAnalysis(F->getContext(), LV_NAME, *F, L->getStartLoc(), IntDiagMsg); return false; } else if (!VectorizeLoop && InterleaveLoop) { DEBUG(dbgs() << "LV: Interleave Count is " << IC << '\n'); emitOptimizationRemarkAnalysis(F->getContext(), VAPassName, *F, L->getStartLoc(), VecDiagMsg); } else if (VectorizeLoop && !InterleaveLoop) { DEBUG(dbgs() << "LV: Found a vectorizable loop (" << VF.Width << ") in " << DebugLocStr << '\n'); emitOptimizationRemarkAnalysis(F->getContext(), LV_NAME, *F, L->getStartLoc(), IntDiagMsg); } else if (VectorizeLoop && InterleaveLoop) { DEBUG(dbgs() << "LV: Found a vectorizable loop (" << VF.Width << ") in " << DebugLocStr << '\n'); DEBUG(dbgs() << "LV: Interleave Count is " << IC << '\n'); } if (!VectorizeLoop) { assert(IC > 1 && "interleave count should not be 1 or 0"); // If we decided that it is not legal to vectorize the loop then // interleave it. InnerLoopUnroller Unroller(L, PSE, LI, DT, TLI, TTI, IC); Unroller.vectorize(&LVL, CM.MinBWs); emitOptimizationRemark(F->getContext(), LV_NAME, *F, L->getStartLoc(), Twine("interleaved loop (interleaved count: ") + Twine(IC) + ")"); } else { // If we decided that it is *legal* to vectorize the loop then do it. InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, VF.Width, IC); LB.vectorize(&LVL, CM.MinBWs); ++LoopsVectorized; // Add metadata to disable runtime unrolling scalar loop when there's no // runtime check about strides and memory. Because at this situation, // scalar loop is rarely used not worthy to be unrolled. if (!LB.IsSafetyChecksAdded()) AddRuntimeUnrollDisableMetaData(L); // Report the vectorization decision. emitOptimizationRemark(F->getContext(), LV_NAME, *F, L->getStartLoc(), Twine("vectorized loop (vectorization width: ") + Twine(VF.Width) + ", interleaved count: " + Twine(IC) + ")"); } // Mark the loop as already vectorized to avoid vectorizing again. Hints.setAlreadyVectorized(); DEBUG(verifyFunction(*L->getHeader()->getParent())); return true; } void getAnalysisUsage(AnalysisUsage &AU) const override { AU.addRequired(); AU.addRequiredID(LoopSimplifyID); AU.addRequiredID(LCSSAID); AU.addRequired(); AU.addRequired(); AU.addRequired(); AU.addRequired(); AU.addRequired(); AU.addRequired(); AU.addRequired(); AU.addRequired(); AU.addPreserved(); AU.addPreserved(); AU.addPreserved(); AU.addPreserved(); AU.addPreserved(); } }; } // end anonymous namespace //===----------------------------------------------------------------------===// // Implementation of LoopVectorizationLegality, InnerLoopVectorizer and // LoopVectorizationCostModel. //===----------------------------------------------------------------------===// Value *InnerLoopVectorizer::getBroadcastInstrs(Value *V) { // We need to place the broadcast of invariant variables outside the loop. 
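  // (A value counts as invariant here only if it was not created by the
  // vectorizer itself; instructions that already live in the new vector loop
  // body are broadcast at the current insert point instead of the preheader.)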
Instruction *Instr = dyn_cast(V); bool NewInstr = (Instr && std::find(LoopVectorBody.begin(), LoopVectorBody.end(), Instr->getParent()) != LoopVectorBody.end()); bool Invariant = OrigLoop->isLoopInvariant(V) && !NewInstr; // Place the code for broadcasting invariant variables in the new preheader. IRBuilder<>::InsertPointGuard Guard(Builder); if (Invariant) Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator()); // Broadcast the scalar into all locations in the vector. Value *Shuf = Builder.CreateVectorSplat(VF, V, "broadcast"); return Shuf; } Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx, Value *Step) { assert(Val->getType()->isVectorTy() && "Must be a vector"); assert(Val->getType()->getScalarType()->isIntegerTy() && "Elem must be an integer"); assert(Step->getType() == Val->getType()->getScalarType() && "Step has wrong type"); // Create the types. Type *ITy = Val->getType()->getScalarType(); VectorType *Ty = cast(Val->getType()); int VLen = Ty->getNumElements(); SmallVector Indices; // Create a vector of consecutive numbers from zero to VF. for (int i = 0; i < VLen; ++i) Indices.push_back(ConstantInt::get(ITy, StartIdx + i)); // Add the consecutive indices to the vector value. Constant *Cv = ConstantVector::get(Indices); assert(Cv->getType() == Val->getType() && "Invalid consecutive vec"); Step = Builder.CreateVectorSplat(VLen, Step); assert(Step->getType() == Val->getType() && "Invalid step vec"); // FIXME: The newly created binary instructions should contain nsw/nuw flags, // which can be found from the original scalar operations. Step = Builder.CreateMul(Cv, Step); return Builder.CreateAdd(Val, Step, "induction"); } int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) { assert(Ptr->getType()->isPointerTy() && "Unexpected non-ptr"); auto *SE = PSE.getSE(); // Make sure that the pointer does not point to structs. if (Ptr->getType()->getPointerElementType()->isAggregateType()) return 0; // If this value is a pointer induction variable we know it is consecutive. PHINode *Phi = dyn_cast_or_null(Ptr); if (Phi && Inductions.count(Phi)) { InductionDescriptor II = Inductions[Phi]; return II.getConsecutiveDirection(); } GetElementPtrInst *Gep = getGEPInstruction(Ptr); if (!Gep) return 0; unsigned NumOperands = Gep->getNumOperands(); Value *GpPtr = Gep->getPointerOperand(); // If this GEP value is a consecutive pointer induction variable and all of // the indices are constant then we know it is consecutive. We can Phi = dyn_cast(GpPtr); if (Phi && Inductions.count(Phi)) { // Make sure that the pointer does not point to structs. PointerType *GepPtrType = cast(GpPtr->getType()); if (GepPtrType->getElementType()->isAggregateType()) return 0; // Make sure that all of the index operands are loop invariant. for (unsigned i = 1; i < NumOperands; ++i) if (!SE->isLoopInvariant(PSE.getSCEV(Gep->getOperand(i)), TheLoop)) return 0; InductionDescriptor II = Inductions[Phi]; return II.getConsecutiveDirection(); } unsigned InductionOperand = getGEPInductionOperand(Gep); // Check that all of the gep indices are uniform except for our induction // operand. for (unsigned i = 0; i != NumOperands; ++i) if (i != InductionOperand && !SE->isLoopInvariant(PSE.getSCEV(Gep->getOperand(i)), TheLoop)) return 0; // We can emit wide load/stores only if the last non-zero index is the // induction variable. const SCEV *Last = nullptr; if (!Strides.count(Gep)) Last = PSE.getSCEV(Gep->getOperand(InductionOperand)); else { // Because of the multiplication by a stride we can have a s/zext cast. 
    // We are going to replace this stride by 1 so the cast is safe to ignore.
    //
    //  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
    //  %0 = trunc i64 %indvars.iv to i32
    //  %mul = mul i32 %0, %Stride1
    //  %idxprom = zext i32 %mul to i64  << Safe cast.
    //  %arrayidx = getelementptr inbounds i32* %B, i64 %idxprom
    //
    Last = replaceSymbolicStrideSCEV(PSE, Strides,
                                     Gep->getOperand(InductionOperand), Gep);
    if (const SCEVCastExpr *C = dyn_cast<SCEVCastExpr>(Last))
      Last = (C->getSCEVType() == scSignExtend ||
              C->getSCEVType() == scZeroExtend)
                 ? C->getOperand()
                 : Last;
  }
  if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Last)) {
    const SCEV *Step = AR->getStepRecurrence(*SE);

    // The memory is consecutive because the last index is consecutive
    // and all other indices are loop invariant.
    if (Step->isOne())
      return 1;
    if (Step->isAllOnesValue())
      return -1;
  }

  return 0;
}

bool LoopVectorizationLegality::isUniform(Value *V) {
  return LAI->isUniform(V);
}

InnerLoopVectorizer::VectorParts &
InnerLoopVectorizer::getVectorValue(Value *V) {
  assert(V != Induction && "The new induction variable should not be used.");
  assert(!V->getType()->isVectorTy() && "Can't widen a vector");

  // If we have a stride that is replaced by one, do it here.
  if (Legal->hasStride(V))
    V = ConstantInt::get(V->getType(), 1);

  // If we have this scalar in the map, return it.
  if (WidenMap.has(V))
    return WidenMap.get(V);

  // If this scalar is unknown, assume that it is a constant or that it is
  // loop invariant. Broadcast V and save the value for future uses.
  Value *B = getBroadcastInstrs(V);
  return WidenMap.splat(V, B);
}

Value *InnerLoopVectorizer::reverseVector(Value *Vec) {
  assert(Vec->getType()->isVectorTy() && "Invalid type");
  SmallVector<Constant *, 8> ShuffleMask;
  for (unsigned i = 0; i < VF; ++i)
    ShuffleMask.push_back(Builder.getInt32(VF - i - 1));

  return Builder.CreateShuffleVector(Vec, UndefValue::get(Vec->getType()),
                                     ConstantVector::get(ShuffleMask),
                                     "reverse");
}

// Get a mask to interleave \p NumVec vectors into a wide vector.
// I.e. <0, VF, VF*2, ..., VF*(NumVec-1), 1, VF+1, VF*2+1, ...>
// E.g. For 2 interleaved vectors, if VF is 4, the mask is:
//   <0, 4, 1, 5, 2, 6, 3, 7>
static Constant *getInterleavedMask(IRBuilder<> &Builder, unsigned VF,
                                    unsigned NumVec) {
  SmallVector<Constant *, 16> Mask;
  for (unsigned i = 0; i < VF; i++)
    for (unsigned j = 0; j < NumVec; j++)
      Mask.push_back(Builder.getInt32(j * VF + i));

  return ConstantVector::get(Mask);
}

// Get the strided mask starting from index \p Start.
// I.e. <Start, Start + Stride, ..., Start + Stride * (VF - 1)>
static Constant *getStridedMask(IRBuilder<> &Builder, unsigned Start,
                                unsigned Stride, unsigned VF) {
  SmallVector<Constant *, 16> Mask;
  for (unsigned i = 0; i < VF; i++)
    Mask.push_back(Builder.getInt32(Start + i * Stride));

  return ConstantVector::get(Mask);
}

// Get a mask of two parts: The first part consists of sequential integers
// starting from 0, the second part consists of UNDEFs.
// I.e. <0, 1, 2, ..., NumInt - 1, undef, ..., undef>
static Constant *getSequentialMask(IRBuilder<> &Builder, unsigned NumInt,
                                   unsigned NumUndef) {
  SmallVector<Constant *, 16> Mask;
  for (unsigned i = 0; i < NumInt; i++)
    Mask.push_back(Builder.getInt32(i));

  Constant *Undef = UndefValue::get(Builder.getInt32Ty());
  for (unsigned i = 0; i < NumUndef; i++)
    Mask.push_back(Undef);

  return ConstantVector::get(Mask);
}

// Concatenate two vectors with the same element type. The 2nd vector should
// not have more elements than the 1st vector. If the 2nd vector has fewer
// elements, extend it with UNDEFs.
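// For example (illustrative only): concatenating <4 x i32> %a with
// <2 x i32> %b first widens %b to <4 x i32> <b0, b1, undef, undef>, then
// emits a shuffle of %a and the widened %b with mask <0, 1, 2, 3, 4, 5>,
// yielding the <6 x i32> value <a0, a1, a2, a3, b0, b1>.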
static Value *ConcatenateTwoVectors(IRBuilder<> &Builder, Value *V1,
                                    Value *V2) {
  VectorType *VecTy1 = dyn_cast<VectorType>(V1->getType());
  VectorType *VecTy2 = dyn_cast<VectorType>(V2->getType());
  assert(VecTy1 && VecTy2 &&
         VecTy1->getScalarType() == VecTy2->getScalarType() &&
         "Expect two vectors with the same element type");

  unsigned NumElts1 = VecTy1->getNumElements();
  unsigned NumElts2 = VecTy2->getNumElements();
  assert(NumElts1 >= NumElts2 &&
         "Unexpected: the first vector has fewer elements");

  if (NumElts1 > NumElts2) {
    // Extend with UNDEFs.
    Constant *ExtMask =
        getSequentialMask(Builder, NumElts2, NumElts1 - NumElts2);
    V2 = Builder.CreateShuffleVector(V2, UndefValue::get(VecTy2), ExtMask);
  }

  Constant *Mask = getSequentialMask(Builder, NumElts1 + NumElts2, 0);
  return Builder.CreateShuffleVector(V1, V2, Mask);
}

// Concatenate vectors in the given list. All vectors have the same type.
static Value *ConcatenateVectors(IRBuilder<> &Builder,
                                 ArrayRef<Value *> InputList) {
  unsigned NumVec = InputList.size();
  assert(NumVec > 1 && "Should be at least two vectors");

  SmallVector<Value *, 8> ResList;
  ResList.append(InputList.begin(), InputList.end());
  do {
    SmallVector<Value *, 8> TmpList;
    for (unsigned i = 0; i < NumVec - 1; i += 2) {
      Value *V0 = ResList[i], *V1 = ResList[i + 1];
      assert((V0->getType() == V1->getType() || i == NumVec - 2) &&
             "Only the last vector may have a different type");

      TmpList.push_back(ConcatenateTwoVectors(Builder, V0, V1));
    }

    // Push the last vector if the total number of vectors is odd.
    if (NumVec % 2 != 0)
      TmpList.push_back(ResList[NumVec - 1]);

    ResList = TmpList;
    NumVec = ResList.size();
  } while (NumVec > 1);

  return ResList[0];
}

// Try to vectorize the interleave group that \p Instr belongs to.
//
// E.g. Translate following interleaved load group (factor = 3):
//   for (i = 0; i < N; i+=3) {
//     R = Pic[i];             // Member of index 0
//     G = Pic[i+1];           // Member of index 1
//     B = Pic[i+2];           // Member of index 2
//     ... // do something to R, G, B
//   }
// To:
//   %wide.vec = load <12 x i32>                      ; Read 4 tuples of R,G,B
//   %R.vec = shuffle %wide.vec, undef, <0, 3, 6, 9>  ; R elements
//   %G.vec = shuffle %wide.vec, undef, <1, 4, 7, 10> ; G elements
//   %B.vec = shuffle %wide.vec, undef, <2, 5, 8, 11> ; B elements
//
// Or translate following interleaved store group (factor = 3):
//   for (i = 0; i < N; i+=3) {
//     ... do something to R, G, B
//     Pic[i]   = R;           // Member of index 0
//     Pic[i+1] = G;           // Member of index 1
//     Pic[i+2] = B;           // Member of index 2
//   }
// To:
//   %R_G.vec = shuffle %R.vec, %G.vec, <0, 1, 2, ..., 7>
//   %B_U.vec = shuffle %B.vec, undef, <0, 1, 2, 3, u, u, u, u>
//   %interleaved.vec = shuffle %R_G.vec, %B_U.vec,
//       <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11>   ; Interleave R,G,B elements
//   store <12 x i32> %interleaved.vec            ; Write 4 tuples of R,G,B
void InnerLoopVectorizer::vectorizeInterleaveGroup(Instruction *Instr) {
  const InterleaveGroup *Group = Legal->getInterleavedAccessGroup(Instr);
  assert(Group && "Fail to get an interleaved access group.");

  // Skip if current instruction is not the insert position.
  if (Instr != Group->getInsertPos())
    return;

  LoadInst *LI = dyn_cast<LoadInst>(Instr);
  StoreInst *SI = dyn_cast<StoreInst>(Instr);
  Value *Ptr = LI ? LI->getPointerOperand() : SI->getPointerOperand();

  // Prepare for the vector type of the interleaved load/store.
  Type *ScalarTy = LI ? LI->getType() : SI->getValueOperand()->getType();
  unsigned InterleaveFactor = Group->getFactor();
  Type *VecTy = VectorType::get(ScalarTy, InterleaveFactor * VF);
  Type *PtrTy = VecTy->getPointerTo(Ptr->getType()->getPointerAddressSpace());

  // Prepare for the new pointers.
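  // E.g. (illustrative) for a factor-3 group of i32 members with VF = 4,
  // VecTy above is <12 x i32> and each per-part pointer computed below is a
  // bitcast of the member-0 address to <12 x i32>*.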
  setDebugLocFromInst(Builder, Ptr);
  VectorParts &PtrParts = getVectorValue(Ptr);
  SmallVector<Value *, 2> NewPtrs;
  unsigned Index = Group->getIndex(Instr);
  for (unsigned Part = 0; Part < UF; Part++) {
    // Extract the pointer for current instruction from the pointer vector. A
    // reverse access uses the pointer in the last lane.
    Value *NewPtr = Builder.CreateExtractElement(
        PtrParts[Part],
        Group->isReverse() ? Builder.getInt32(VF - 1) : Builder.getInt32(0));

    // Note that the current instruction can be a member of any index. We
    // need to adjust the address so that it points at the member of index 0.
    //
    // E.g.  a = A[i+1];     // Member of index 1 (Current instruction)
    //       b = A[i];       // Member of index 0
    // The current pointer points at A[i+1]; adjust it to A[i].
    //
    // E.g.  A[i+1] = a;     // Member of index 1
    //       A[i]   = b;     // Member of index 0
    //       A[i+2] = c;     // Member of index 2 (Current instruction)
    // The current pointer points at A[i+2]; adjust it to A[i].
    NewPtr = Builder.CreateGEP(NewPtr, Builder.getInt32(-Index));

    // Cast to the vector pointer type.
    NewPtrs.push_back(Builder.CreateBitCast(NewPtr, PtrTy));
  }

  setDebugLocFromInst(Builder, Instr);
  Value *UndefVec = UndefValue::get(VecTy);

  // Vectorize the interleaved load group.
  if (LI) {
    for (unsigned Part = 0; Part < UF; Part++) {
      Instruction *NewLoadInstr = Builder.CreateAlignedLoad(
          NewPtrs[Part], Group->getAlignment(), "wide.vec");

      for (unsigned i = 0; i < InterleaveFactor; i++) {
        Instruction *Member = Group->getMember(i);

        // Skip the gaps in the group.
        if (!Member)
          continue;

        Constant *StrideMask =
            getStridedMask(Builder, i, InterleaveFactor, VF);
        Value *StridedVec = Builder.CreateShuffleVector(
            NewLoadInstr, UndefVec, StrideMask, "strided.vec");

        // If this member has different type, cast the result type.
        if (Member->getType() != ScalarTy) {
          VectorType *OtherVTy = VectorType::get(Member->getType(), VF);
          StridedVec = Builder.CreateBitOrPointerCast(StridedVec, OtherVTy);
        }

        VectorParts &Entry = WidenMap.get(Member);
        Entry[Part] =
            Group->isReverse() ? reverseVector(StridedVec) : StridedVec;
      }

      propagateMetadata(NewLoadInstr, Instr);
    }
    return;
  }

  // The sub vector type for current instruction.
  VectorType *SubVT = VectorType::get(ScalarTy, VF);

  // Vectorize the interleaved store group.
  for (unsigned Part = 0; Part < UF; Part++) {
    // Collect the stored vector from each member.
    SmallVector<Value *, 4> StoredVecs;
    for (unsigned i = 0; i < InterleaveFactor; i++) {
      // An interleaved store group doesn't allow a gap, so each index has a
      // member.
      Instruction *Member = Group->getMember(i);
      assert(Member && "Fail to get a member from an interleaved store group");

      Value *StoredVec =
          getVectorValue(cast<StoreInst>(Member)->getValueOperand())[Part];
      if (Group->isReverse())
        StoredVec = reverseVector(StoredVec);

      // If this member has different type, cast it to a unified type.
      if (StoredVec->getType() != SubVT)
        StoredVec = Builder.CreateBitOrPointerCast(StoredVec, SubVT);

      StoredVecs.push_back(StoredVec);
    }

    // Concatenate all vectors into a wide vector.
    Value *WideVec = ConcatenateVectors(Builder, StoredVecs);

    // Interleave the elements in the wide vector.
    Constant *IMask = getInterleavedMask(Builder, VF, InterleaveFactor);
    Value *IVec = Builder.CreateShuffleVector(WideVec, UndefVec, IMask,
                                              "interleaved.vec");

    Instruction *NewStoreInstr =
        Builder.CreateAlignedStore(IVec, NewPtrs[Part], Group->getAlignment());
    propagateMetadata(NewStoreInstr, Instr);
  }
}

void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr) {
  // Attempt to issue a wide load.
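  // In outline (summarizing the checks below): interleaved accesses are
  // handled as a group; predicated stores, mismatched scalar/vector element
  // sizes, and non-consecutive or uniform pointers are scalarized; only
  // consecutive (stride +1 or -1) accesses become wide loads/stores.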
  LoadInst *LI = dyn_cast<LoadInst>(Instr);
  StoreInst *SI = dyn_cast<StoreInst>(Instr);
  assert((LI || SI) && "Invalid Load/Store instruction");

  // Try to vectorize the interleave group if this access is interleaved.
  if (Legal->isAccessInterleaved(Instr))
    return vectorizeInterleaveGroup(Instr);

  Type *ScalarDataTy = LI ? LI->getType() : SI->getValueOperand()->getType();
  Type *DataTy = VectorType::get(ScalarDataTy, VF);
  Value *Ptr = LI ? LI->getPointerOperand() : SI->getPointerOperand();
  unsigned Alignment = LI ? LI->getAlignment() : SI->getAlignment();
  // An alignment of 0 means target abi alignment. We need to use the scalar's
  // target abi alignment in such a case.
  const DataLayout &DL = Instr->getModule()->getDataLayout();
  if (!Alignment)
    Alignment = DL.getABITypeAlignment(ScalarDataTy);
  unsigned AddressSpace = Ptr->getType()->getPointerAddressSpace();
  unsigned ScalarAllocatedSize = DL.getTypeAllocSize(ScalarDataTy);
  unsigned VectorElementSize = DL.getTypeStoreSize(DataTy) / VF;

  if (SI && Legal->blockNeedsPredication(SI->getParent()) &&
      !Legal->isMaskRequired(SI))
    return scalarizeInstruction(Instr, true);

  if (ScalarAllocatedSize != VectorElementSize)
    return scalarizeInstruction(Instr);

  // If the pointer is loop invariant or if it is non-consecutive,
  // scalarize the load.
  int ConsecutiveStride = Legal->isConsecutivePtr(Ptr);
  bool Reverse = ConsecutiveStride < 0;
  bool UniformLoad = LI && Legal->isUniform(Ptr);
  if (!ConsecutiveStride || UniformLoad)
    return scalarizeInstruction(Instr);

  Constant *Zero = Builder.getInt32(0);
  VectorParts &Entry = WidenMap.get(Instr);

  // Handle consecutive loads/stores.
  GetElementPtrInst *Gep = getGEPInstruction(Ptr);
  if (Gep && Legal->isInductionVariable(Gep->getPointerOperand())) {
    setDebugLocFromInst(Builder, Gep);
    Value *PtrOperand = Gep->getPointerOperand();
    Value *FirstBasePtr = getVectorValue(PtrOperand)[0];
    FirstBasePtr = Builder.CreateExtractElement(FirstBasePtr, Zero);

    // Create the new GEP with the new induction variable.
    GetElementPtrInst *Gep2 = cast<GetElementPtrInst>(Gep->clone());
    Gep2->setOperand(0, FirstBasePtr);
    Gep2->setName("gep.indvar.base");
    Ptr = Builder.Insert(Gep2);
  } else if (Gep) {
    setDebugLocFromInst(Builder, Gep);
    assert(PSE.getSE()->isLoopInvariant(PSE.getSCEV(Gep->getPointerOperand()),
                                        OrigLoop) &&
           "Base ptr must be invariant");
    // The last index does not have to be the induction. It can be
    // consecutive and be a function of the index. For example A[I+1];
    unsigned NumOperands = Gep->getNumOperands();
    unsigned InductionOperand = getGEPInductionOperand(Gep);
    // Create the new GEP with the new induction variable.
    GetElementPtrInst *Gep2 = cast<GetElementPtrInst>(Gep->clone());

    for (unsigned i = 0; i < NumOperands; ++i) {
      Value *GepOperand = Gep->getOperand(i);
      Instruction *GepOperandInst = dyn_cast<Instruction>(GepOperand);

      // Update last index or loop invariant instruction anchored in loop.
      if (i == InductionOperand ||
          (GepOperandInst && OrigLoop->contains(GepOperandInst))) {
        assert((i == InductionOperand ||
                PSE.getSE()->isLoopInvariant(PSE.getSCEV(GepOperandInst),
                                             OrigLoop)) &&
               "Must be last index or loop invariant");

        VectorParts &GEPParts = getVectorValue(GepOperand);
        Value *Index = GEPParts[0];
        Index = Builder.CreateExtractElement(Index, Zero);
        Gep2->setOperand(i, Index);
        Gep2->setName("gep.indvar.idx");
      }
    }
    Ptr = Builder.Insert(Gep2);
  } else {
    // Use the induction element ptr.
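    // (E.g., illustratively, a pointer IV used directly as the address of
    // the access, with no GEP in between; lane 0 of its widened value below
    // serves as the scalar base pointer.)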
    assert(isa<PHINode>(Ptr) && "Invalid induction ptr");
    setDebugLocFromInst(Builder, Ptr);
    VectorParts &PtrVal = getVectorValue(Ptr);
    Ptr = Builder.CreateExtractElement(PtrVal[0], Zero);
  }

  VectorParts Mask = createBlockInMask(Instr->getParent());
  // Handle Stores:
  if (SI) {
    assert(!Legal->isUniform(SI->getPointerOperand()) &&
           "We do not allow storing to uniform addresses");
    setDebugLocFromInst(Builder, SI);
    // We don't want to update the value in the map as it might be used in
    // another expression. So don't use a reference type for "StoredVal".
    VectorParts StoredVal = getVectorValue(SI->getValueOperand());

    for (unsigned Part = 0; Part < UF; ++Part) {
      // Calculate the pointer for the specific unroll-part.
      Value *PartPtr =
          Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(Part * VF));

      if (Reverse) {
        // If we store to reverse consecutive memory locations, then we need
        // to reverse the order of elements in the stored value.
        StoredVal[Part] = reverseVector(StoredVal[Part]);
        // If the address is consecutive but reversed, then the
        // wide store needs to start at the last vector element.
        PartPtr =
            Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(-Part * VF));
        PartPtr =
            Builder.CreateGEP(nullptr, PartPtr, Builder.getInt32(1 - VF));
        Mask[Part] = reverseVector(Mask[Part]);
      }

      Value *VecPtr =
          Builder.CreateBitCast(PartPtr, DataTy->getPointerTo(AddressSpace));

      Instruction *NewSI;
      if (Legal->isMaskRequired(SI))
        NewSI = Builder.CreateMaskedStore(StoredVal[Part], VecPtr, Alignment,
                                          Mask[Part]);
      else
        NewSI =
            Builder.CreateAlignedStore(StoredVal[Part], VecPtr, Alignment);
      propagateMetadata(NewSI, SI);
    }
    return;
  }

  // Handle loads.
  assert(LI && "Must have a load instruction");
  setDebugLocFromInst(Builder, LI);
  for (unsigned Part = 0; Part < UF; ++Part) {
    // Calculate the pointer for the specific unroll-part.
    Value *PartPtr =
        Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(Part * VF));

    if (Reverse) {
      // If the address is consecutive but reversed, then the
      // wide load needs to start at the last vector element.
      PartPtr = Builder.CreateGEP(nullptr, Ptr, Builder.getInt32(-Part * VF));
      PartPtr = Builder.CreateGEP(nullptr, PartPtr, Builder.getInt32(1 - VF));
      Mask[Part] = reverseVector(Mask[Part]);
    }

    Instruction *NewLI;
    Value *VecPtr =
        Builder.CreateBitCast(PartPtr, DataTy->getPointerTo(AddressSpace));
    if (Legal->isMaskRequired(LI))
      NewLI = Builder.CreateMaskedLoad(VecPtr, Alignment, Mask[Part],
                                       UndefValue::get(DataTy),
                                       "wide.masked.load");
    else
      NewLI = Builder.CreateAlignedLoad(VecPtr, Alignment, "wide.load");
    propagateMetadata(NewLI, LI);
    Entry[Part] = Reverse ? reverseVector(NewLI) : NewLI;
  }
}

void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr,
                                               bool IfPredicateStore) {
  assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");
  // Holds vector parameters or scalars, in case of uniform vals.
  SmallVector<VectorParts, 4> Params;

  setDebugLocFromInst(Builder, Instr);

  // Find all of the vectorized parameters.
  for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {
    Value *SrcOp = Instr->getOperand(op);

    // If we are accessing the old induction variable, use the new one.
    if (SrcOp == OldInduction) {
      Params.push_back(getVectorValue(SrcOp));
      continue;
    }

    // Try using previously calculated values.
    Instruction *SrcInst = dyn_cast<Instruction>(SrcOp);

    // If the src is an instruction that appeared earlier in the basic block,
    // then it should already be vectorized.
    if (SrcInst && OrigLoop->contains(SrcInst)) {
      assert(WidenMap.has(SrcInst) && "Source operand is unavailable");
      // The parameter is a vector value from earlier.
Params.push_back(WidenMap.get(SrcInst)); } else { // The parameter is a scalar from outside the loop. Maybe even a constant. VectorParts Scalars; Scalars.append(UF, SrcOp); Params.push_back(Scalars); } } assert(Params.size() == Instr->getNumOperands() && "Invalid number of operands"); // Does this instruction return a value ? bool IsVoidRetTy = Instr->getType()->isVoidTy(); Value *UndefVec = IsVoidRetTy ? nullptr : UndefValue::get(VectorType::get(Instr->getType(), VF)); // Create a new entry in the WidenMap and initialize it to Undef or Null. VectorParts &VecResults = WidenMap.splat(Instr, UndefVec); VectorParts Cond; if (IfPredicateStore) { assert(Instr->getParent()->getSinglePredecessor() && "Only support single predecessor blocks"); Cond = createEdgeMask(Instr->getParent()->getSinglePredecessor(), Instr->getParent()); } // For each vector unroll 'part': for (unsigned Part = 0; Part < UF; ++Part) { // For each scalar that we create: for (unsigned Width = 0; Width < VF; ++Width) { // Start if-block. Value *Cmp = nullptr; if (IfPredicateStore) { Cmp = Builder.CreateExtractElement(Cond[Part], Builder.getInt32(Width)); Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cmp, ConstantInt::get(Cmp->getType(), 1)); } Instruction *Cloned = Instr->clone(); if (!IsVoidRetTy) Cloned->setName(Instr->getName() + ".cloned"); // Replace the operands of the cloned instructions with extracted scalars. for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) { Value *Op = Params[op][Part]; // Param is a vector. Need to extract the right lane. if (Op->getType()->isVectorTy()) Op = Builder.CreateExtractElement(Op, Builder.getInt32(Width)); Cloned->setOperand(op, Op); } // Place the cloned scalar in the new loop. Builder.Insert(Cloned); // If the original scalar returns a value we need to place it in a vector // so that future users will be able to use it. if (!IsVoidRetTy) VecResults[Part] = Builder.CreateInsertElement(VecResults[Part], Cloned, Builder.getInt32(Width)); // End if-block. if (IfPredicateStore) PredicatedStores.push_back(std::make_pair(cast(Cloned), Cmp)); } } } PHINode *InnerLoopVectorizer::createInductionVariable(Loop *L, Value *Start, Value *End, Value *Step, Instruction *DL) { BasicBlock *Header = L->getHeader(); BasicBlock *Latch = L->getLoopLatch(); // As we're just creating this loop, it's possible no latch exists // yet. If so, use the header as this will be a single block loop. if (!Latch) Latch = Header; IRBuilder<> Builder(&*Header->getFirstInsertionPt()); setDebugLocFromInst(Builder, getDebugLocFromInstOrOperands(OldInduction)); auto *Induction = Builder.CreatePHI(Start->getType(), 2, "index"); Builder.SetInsertPoint(Latch->getTerminator()); // Create i+1 and fill the PHINode. Value *Next = Builder.CreateAdd(Induction, Step, "index.next"); Induction->addIncoming(Start, L->getLoopPreheader()); Induction->addIncoming(Next, Latch); // Create the compare. Value *ICmp = Builder.CreateICmpEQ(Next, End); Builder.CreateCondBr(ICmp, L->getExitBlock(), Header); // Now we have two terminators. Remove the old one from the block. Latch->getTerminator()->eraseFromParent(); return Induction; } Value *InnerLoopVectorizer::getOrCreateTripCount(Loop *L) { if (TripCount) return TripCount; IRBuilder<> Builder(L->getLoopPreheader()->getTerminator()); // Find the loop boundaries. 
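  // E.g. (illustrative) for "for (i = 0; i != n; ++i)" the backedge-taken
  // count computed below is n - 1, and adding one yields the trip count n.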
ScalarEvolution *SE = PSE.getSE(); const SCEV *BackedgeTakenCount = SE->getBackedgeTakenCount(OrigLoop); assert(BackedgeTakenCount != SE->getCouldNotCompute() && "Invalid loop count"); Type *IdxTy = Legal->getWidestInductionType(); // The exit count might have the type of i64 while the phi is i32. This can // happen if we have an induction variable that is sign extended before the // compare. The only way that we get a backedge taken count is that the // induction variable was signed and as such will not overflow. In such a case // truncation is legal. if (BackedgeTakenCount->getType()->getPrimitiveSizeInBits() > IdxTy->getPrimitiveSizeInBits()) BackedgeTakenCount = SE->getTruncateOrNoop(BackedgeTakenCount, IdxTy); BackedgeTakenCount = SE->getNoopOrZeroExtend(BackedgeTakenCount, IdxTy); // Get the total trip count from the count by adding 1. const SCEV *ExitCount = SE->getAddExpr( BackedgeTakenCount, SE->getOne(BackedgeTakenCount->getType())); const DataLayout &DL = L->getHeader()->getModule()->getDataLayout(); // Expand the trip count and place the new instructions in the preheader. // Notice that the pre-header does not change, only the loop body. SCEVExpander Exp(*SE, DL, "induction"); // Count holds the overall loop count (N). TripCount = Exp.expandCodeFor(ExitCount, ExitCount->getType(), L->getLoopPreheader()->getTerminator()); if (TripCount->getType()->isPointerTy()) TripCount = CastInst::CreatePointerCast(TripCount, IdxTy, "exitcount.ptrcnt.to.int", L->getLoopPreheader()->getTerminator()); return TripCount; } Value *InnerLoopVectorizer::getOrCreateVectorTripCount(Loop *L) { if (VectorTripCount) return VectorTripCount; Value *TC = getOrCreateTripCount(L); IRBuilder<> Builder(L->getLoopPreheader()->getTerminator()); // Now we need to generate the expression for N - (N % VF), which is // the part that the vectorized body will execute. // The loop step is equal to the vectorization factor (num of SIMD elements) // times the unroll factor (num of SIMD instructions). Constant *Step = ConstantInt::get(TC->getType(), VF * UF); Value *R = Builder.CreateURem(TC, Step, "n.mod.vf"); VectorTripCount = Builder.CreateSub(TC, R, "n.vec"); return VectorTripCount; } void InnerLoopVectorizer::emitMinimumIterationCountCheck(Loop *L, BasicBlock *Bypass) { Value *Count = getOrCreateTripCount(L); BasicBlock *BB = L->getLoopPreheader(); IRBuilder<> Builder(BB->getTerminator()); // Generate code to check that the loop's trip count that we computed by // adding one to the backedge-taken count will not overflow. Value *CheckMinIters = Builder.CreateICmpULT(Count, ConstantInt::get(Count->getType(), VF * UF), "min.iters.check"); BasicBlock *NewBB = BB->splitBasicBlock(BB->getTerminator(), "min.iters.checked"); if (L->getParentLoop()) L->getParentLoop()->addBasicBlockToLoop(NewBB, *LI); ReplaceInstWithInst(BB->getTerminator(), BranchInst::Create(Bypass, NewBB, CheckMinIters)); LoopBypassBlocks.push_back(BB); } void InnerLoopVectorizer::emitVectorLoopEnteredCheck(Loop *L, BasicBlock *Bypass) { Value *TC = getOrCreateVectorTripCount(L); BasicBlock *BB = L->getLoopPreheader(); IRBuilder<> Builder(BB->getTerminator()); // Now, compare the new count to zero. If it is zero skip the vector loop and // jump to the scalar loop. Value *Cmp = Builder.CreateICmpEQ(TC, Constant::getNullValue(TC->getType()), "cmp.zero"); // Generate code to check that the loop's trip count that we computed by // adding one to the backedge-taken count will not overflow. 
BasicBlock *NewBB = BB->splitBasicBlock(BB->getTerminator(), "vector.ph"); if (L->getParentLoop()) L->getParentLoop()->addBasicBlockToLoop(NewBB, *LI); ReplaceInstWithInst(BB->getTerminator(), BranchInst::Create(Bypass, NewBB, Cmp)); LoopBypassBlocks.push_back(BB); } void InnerLoopVectorizer::emitSCEVChecks(Loop *L, BasicBlock *Bypass) { BasicBlock *BB = L->getLoopPreheader(); // Generate the code to check that the SCEV assumptions that we made. // We want the new basic block to start at the first instruction in a // sequence of instructions that form a check. SCEVExpander Exp(*PSE.getSE(), Bypass->getModule()->getDataLayout(), "scev.check"); Value *SCEVCheck = Exp.expandCodeForPredicate(&PSE.getUnionPredicate(), BB->getTerminator()); if (auto *C = dyn_cast(SCEVCheck)) if (C->isZero()) return; // Create a new block containing the stride check. BB->setName("vector.scevcheck"); auto *NewBB = BB->splitBasicBlock(BB->getTerminator(), "vector.ph"); if (L->getParentLoop()) L->getParentLoop()->addBasicBlockToLoop(NewBB, *LI); ReplaceInstWithInst(BB->getTerminator(), BranchInst::Create(Bypass, NewBB, SCEVCheck)); LoopBypassBlocks.push_back(BB); AddedSafetyChecks = true; } void InnerLoopVectorizer::emitMemRuntimeChecks(Loop *L, BasicBlock *Bypass) { BasicBlock *BB = L->getLoopPreheader(); // Generate the code that checks in runtime if arrays overlap. We put the // checks into a separate block to make the more common case of few elements // faster. Instruction *FirstCheckInst; Instruction *MemRuntimeCheck; std::tie(FirstCheckInst, MemRuntimeCheck) = Legal->getLAI()->addRuntimeChecks(BB->getTerminator()); if (!MemRuntimeCheck) return; // Create a new block containing the memory check. BB->setName("vector.memcheck"); auto *NewBB = BB->splitBasicBlock(BB->getTerminator(), "vector.ph"); if (L->getParentLoop()) L->getParentLoop()->addBasicBlockToLoop(NewBB, *LI); ReplaceInstWithInst(BB->getTerminator(), BranchInst::Create(Bypass, NewBB, MemRuntimeCheck)); LoopBypassBlocks.push_back(BB); AddedSafetyChecks = true; } void InnerLoopVectorizer::createEmptyLoop() { /* In this function we generate a new loop. The new loop will contain the vectorized instructions while the old loop will continue to run the scalar remainder. [ ] <-- loop iteration number check. / | / v | [ ] <-- vector loop bypass (may consist of multiple blocks). | / | | / v || [ ] <-- vector pre header. |/ | | v | [ ] \ | [ ]_| <-- vector loop. | | | v | -[ ] <--- middle-block. | / | | / v -|- >[ ] <--- new preheader. | | | v | [ ] \ | [ ]_| <-- old scalar loop to handle remainder. \ | \ v >[ ] <-- exit block. ... */ BasicBlock *OldBasicBlock = OrigLoop->getHeader(); BasicBlock *VectorPH = OrigLoop->getLoopPreheader(); BasicBlock *ExitBlock = OrigLoop->getExitBlock(); assert(VectorPH && "Invalid loop structure"); assert(ExitBlock && "Must have an exit block"); // Some loops have a single integer induction variable, while other loops // don't. One example is c++ iterators that often have multiple pointer // induction variables. In the code below we also support a case where we // don't have a single induction variable. // // We try to obtain an induction variable from the original loop as hard // as possible. However if we don't find one that: // - is an integer // - counts from zero, stepping by one // - is the size of the widest induction variable type // then we create a new one. OldInduction = Legal->getInduction(); Type *IdxTy = Legal->getWidestInductionType(); // Split the single block loop into the two loop structure described above. 
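  // The guard checks emitted below run in order: minimum iteration count,
  // zero vector trip count, SCEV assumptions, and runtime memory overlap.
  // Each can branch to the scalar preheader, and each split-off block is
  // recorded in LoopBypassBlocks.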
BasicBlock *VecBody = VectorPH->splitBasicBlock(VectorPH->getTerminator(), "vector.body"); BasicBlock *MiddleBlock = VecBody->splitBasicBlock(VecBody->getTerminator(), "middle.block"); BasicBlock *ScalarPH = MiddleBlock->splitBasicBlock(MiddleBlock->getTerminator(), "scalar.ph"); // Create and register the new vector loop. Loop* Lp = new Loop(); Loop *ParentLoop = OrigLoop->getParentLoop(); // Insert the new loop into the loop nest and register the new basic blocks // before calling any utilities such as SCEV that require valid LoopInfo. if (ParentLoop) { ParentLoop->addChildLoop(Lp); ParentLoop->addBasicBlockToLoop(ScalarPH, *LI); ParentLoop->addBasicBlockToLoop(MiddleBlock, *LI); } else { LI->addTopLevelLoop(Lp); } Lp->addBasicBlockToLoop(VecBody, *LI); // Find the loop boundaries. Value *Count = getOrCreateTripCount(Lp); Value *StartIdx = ConstantInt::get(IdxTy, 0); // We need to test whether the backedge-taken count is uint##_max. Adding one // to it will cause overflow and an incorrect loop trip count in the vector // body. In case of overflow we want to directly jump to the scalar remainder // loop. emitMinimumIterationCountCheck(Lp, ScalarPH); // Now, compare the new count to zero. If it is zero skip the vector loop and // jump to the scalar loop. emitVectorLoopEnteredCheck(Lp, ScalarPH); // Generate the code to check any assumptions that we've made for SCEV // expressions. emitSCEVChecks(Lp, ScalarPH); // Generate the code that checks in runtime if arrays overlap. We put the // checks into a separate block to make the more common case of few elements // faster. emitMemRuntimeChecks(Lp, ScalarPH); // Generate the induction variable. // The loop step is equal to the vectorization factor (num of SIMD elements) // times the unroll factor (num of SIMD instructions). Value *CountRoundDown = getOrCreateVectorTripCount(Lp); Constant *Step = ConstantInt::get(IdxTy, VF * UF); Induction = createInductionVariable(Lp, StartIdx, CountRoundDown, Step, getDebugLocFromInstOrOperands(OldInduction)); // We are going to resume the execution of the scalar loop. // Go over all of the induction variables that we found and fix the // PHIs that are left in the scalar version of the loop. // The starting values of PHI nodes depend on the counter of the last // iteration in the vectorized loop. // If we come from a bypass edge then we need to start from the original // start value. // This variable saves the new starting index for the scalar loop. It is used // to test if there are any tail iterations left once the vector loop has // completed. LoopVectorizationLegality::InductionList::iterator I, E; LoopVectorizationLegality::InductionList *List = Legal->getInductionVars(); for (I = List->begin(), E = List->end(); I != E; ++I) { PHINode *OrigPhi = I->first; InductionDescriptor II = I->second; // Create phi nodes to merge from the backedge-taken check block. PHINode *BCResumeVal = PHINode::Create(OrigPhi->getType(), 3, "bc.resume.val", ScalarPH->getTerminator()); Value *EndValue; if (OrigPhi == OldInduction) { // We know what the end value is. EndValue = CountRoundDown; } else { IRBuilder<> B(LoopBypassBlocks.back()->getTerminator()); Value *CRD = B.CreateSExtOrTrunc(CountRoundDown, II.getStepValue()->getType(), "cast.crd"); EndValue = II.transform(B, CRD); EndValue->setName("ind.end"); } // The new PHI merges the original incoming value, in case of a bypass, // or the value at the end of the vectorized loop. BCResumeVal->addIncoming(EndValue, MiddleBlock); // Fix the scalar body counter (PHI node). 
    unsigned BlockIdx = OrigPhi->getBasicBlockIndex(ScalarPH);

    // The old induction's phi node in the scalar body needs the truncated
    // value.
    for (unsigned I = 0, E = LoopBypassBlocks.size(); I != E; ++I)
      BCResumeVal->addIncoming(II.getStartValue(), LoopBypassBlocks[I]);
    OrigPhi->setIncomingValue(BlockIdx, BCResumeVal);
  }

  // Add a check in the middle block to see if we have completed
  // all of the iterations in the first vector loop.
  // If (N - N%VF) == N, then we *don't* need to run the remainder.
  Value *CmpN = CmpInst::Create(Instruction::ICmp, CmpInst::ICMP_EQ, Count,
                                CountRoundDown, "cmp.n",
                                MiddleBlock->getTerminator());
  ReplaceInstWithInst(MiddleBlock->getTerminator(),
                      BranchInst::Create(ExitBlock, ScalarPH, CmpN));

  // Get ready to start creating new instructions into the vectorized body.
  Builder.SetInsertPoint(&*VecBody->getFirstInsertionPt());

  // Save the state.
  LoopVectorPreHeader = Lp->getLoopPreheader();
  LoopScalarPreHeader = ScalarPH;
  LoopMiddleBlock = MiddleBlock;
  LoopExitBlock = ExitBlock;
  LoopVectorBody.push_back(VecBody);
  LoopScalarBody = OldBasicBlock;

  LoopVectorizeHints Hints(Lp, true);
  Hints.setAlreadyVectorized();
}

namespace {
struct CSEDenseMapInfo {
  static bool canHandle(Instruction *I) {
    return isa<InsertElementInst>(I) || isa<ExtractElementInst>(I) ||
           isa<ShuffleVectorInst>(I) || isa<GetElementPtrInst>(I);
  }
  static inline Instruction *getEmptyKey() {
    return DenseMapInfo<Instruction *>::getEmptyKey();
  }
  static inline Instruction *getTombstoneKey() {
    return DenseMapInfo<Instruction *>::getTombstoneKey();
  }
  static unsigned getHashValue(Instruction *I) {
    assert(canHandle(I) && "Unknown instruction!");
    return hash_combine(I->getOpcode(),
                        hash_combine_range(I->value_op_begin(),
                                           I->value_op_end()));
  }
  static bool isEqual(Instruction *LHS, Instruction *RHS) {
    if (LHS == getEmptyKey() || RHS == getEmptyKey() ||
        LHS == getTombstoneKey() || RHS == getTombstoneKey())
      return LHS == RHS;
    return LHS->isIdenticalTo(RHS);
  }
};
}

/// \brief Check whether this block is a predicated block.
/// Due to if predication of stores we might create a sequence of
/// "if(pred) a[i] = ...;" blocks. We start with one vectorized basic block.
/// For every conditional block we split this vectorized block. Therefore,
/// every second block will be a predicated one.
static bool isPredicatedBlock(unsigned BlockNum) {
  return BlockNum % 2;
}

/// \brief Perform CSE of induction variable instructions.
static void cse(SmallVector<BasicBlock *, 4> &BBs) {
  // Perform simple cse.
  SmallDenseMap<Instruction *, Instruction *, 4, CSEDenseMapInfo> CSEMap;
  for (unsigned i = 0, e = BBs.size(); i != e; ++i) {
    BasicBlock *BB = BBs[i];
    for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E;) {
      Instruction *In = &*I++;

      if (!CSEDenseMapInfo::canHandle(In))
        continue;

      // Check if we can replace this instruction with any of the
      // visited instructions.
      if (Instruction *V = CSEMap.lookup(In)) {
        In->replaceAllUsesWith(V);
        In->eraseFromParent();
        continue;
      }
      // Ignore instructions in conditional blocks. We create "if (pred) a[i]
      // = ...;" blocks for predicated stores. Every second block is a
      // predicated block.
      if (isPredicatedBlock(i))
        continue;

      CSEMap[In] = In;
    }
  }
}

/// \brief Adds a 'fast' flag to floating point operations.
static Value *addFastMathFlag(Value *V) {
  if (isa<FPMathOperator>(V)) {
    FastMathFlags Flags;
    Flags.setUnsafeAlgebra();
    cast<Instruction>(V)->setFastMathFlags(Flags);
  }
  return V;
}

/// Estimate the overhead of scalarizing a value. Insert and Extract are set if
/// the result needs to be inserted and/or extracted from vectors.
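/// E.g. (illustrative) for <4 x float> with both Insert and Extract set,
/// this adds up the target's costs of four insertelement and four
/// extractelement operations.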
static unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract,
                                         const TargetTransformInfo &TTI) {
  if (Ty->isVoidTy())
    return 0;

  assert(Ty->isVectorTy() && "Can only scalarize vectors");
  unsigned Cost = 0;

  for (int i = 0, e = Ty->getVectorNumElements(); i < e; ++i) {
    if (Insert)
      Cost += TTI.getVectorInstrCost(Instruction::InsertElement, Ty, i);
    if (Extract)
      Cost += TTI.getVectorInstrCost(Instruction::ExtractElement, Ty, i);
  }

  return Cost;
}

// Estimate cost of a call instruction CI if it were vectorized with factor VF.
// Return the cost of the instruction, including scalarization overhead if it's
// needed. The flag NeedToScalarize shows if the call needs to be scalarized -
// i.e. either vector version isn't available, or is too expensive.
static unsigned getVectorCallCost(CallInst *CI, unsigned VF,
                                  const TargetTransformInfo &TTI,
                                  const TargetLibraryInfo *TLI,
                                  bool &NeedToScalarize) {
  Function *F = CI->getCalledFunction();
  StringRef FnName = CI->getCalledFunction()->getName();
  Type *ScalarRetTy = CI->getType();
  SmallVector<Type *, 4> Tys, ScalarTys;
  for (auto &ArgOp : CI->arg_operands())
    ScalarTys.push_back(ArgOp->getType());

  // Estimate cost of scalarized vector call. The source operands are assumed
  // to be vectors, so we need to extract individual elements from there,
  // execute VF scalar calls, and then gather the result into the vector return
  // value.
  unsigned ScalarCallCost = TTI.getCallInstrCost(F, ScalarRetTy, ScalarTys);
  if (VF == 1)
    return ScalarCallCost;

  // Compute corresponding vector type for return value and arguments.
  Type *RetTy = ToVectorTy(ScalarRetTy, VF);
  for (unsigned i = 0, ie = ScalarTys.size(); i != ie; ++i)
    Tys.push_back(ToVectorTy(ScalarTys[i], VF));

  // Compute costs of unpacking argument values for the scalar calls and
  // packing the return values to a vector.
  unsigned ScalarizationCost =
      getScalarizationOverhead(RetTy, true, false, TTI);
  for (unsigned i = 0, ie = Tys.size(); i != ie; ++i)
    ScalarizationCost += getScalarizationOverhead(Tys[i], false, true, TTI);

  unsigned Cost = ScalarCallCost * VF + ScalarizationCost;

  // If we can't emit a vector call for this function, then the currently found
  // cost is the cost we need to return.
  NeedToScalarize = true;
  if (!TLI || !TLI->isFunctionVectorizable(FnName, VF) || CI->isNoBuiltin())
    return Cost;

  // If the corresponding vector cost is cheaper, return its cost.
  unsigned VectorCallCost = TTI.getCallInstrCost(nullptr, RetTy, Tys);
  if (VectorCallCost < Cost) {
    NeedToScalarize = false;
    return VectorCallCost;
  }
  return Cost;
}

// Estimate cost of an intrinsic call instruction CI if it were vectorized with
// factor VF. Return the cost of the instruction, including scalarization
// overhead if it's needed.
static unsigned getVectorIntrinsicCost(CallInst *CI, unsigned VF,
                                       const TargetTransformInfo &TTI,
                                       const TargetLibraryInfo *TLI) {
  Intrinsic::ID ID = getIntrinsicIDForCall(CI, TLI);
  assert(ID && "Expected intrinsic call!");

  Type *RetTy = ToVectorTy(CI->getType(), VF);
  SmallVector<Type *, 4> Tys;
  for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i)
    Tys.push_back(ToVectorTy(CI->getArgOperand(i)->getType(), VF));

  return TTI.getIntrinsicInstrCost(ID, RetTy, Tys);
}

static Type *smallestIntegerVectorType(Type *T1, Type *T2) {
  IntegerType *I1 = cast<IntegerType>(T1->getVectorElementType());
  IntegerType *I2 = cast<IntegerType>(T2->getVectorElementType());
  return I1->getBitWidth() < I2->getBitWidth() ? T1 : T2;
}
static Type *largestIntegerVectorType(Type *T1, Type *T2) {
  IntegerType *I1 = cast<IntegerType>(T1->getVectorElementType());
  IntegerType *I2 = cast<IntegerType>(T2->getVectorElementType());
  return I1->getBitWidth() > I2->getBitWidth() ? T1 : T2;
}

void InnerLoopVectorizer::truncateToMinimalBitwidths() {
  // For every instruction `I` in MinBWs, truncate the operands, create a
  // truncated version of `I` and reextend its result. InstCombine runs
  // later and will remove any ext/trunc pairs.
  //
  for (auto &KV : MinBWs) {
    VectorParts &Parts = WidenMap.get(KV.first);
    for (Value *&I : Parts) {
      if (I->use_empty())
        continue;
      Type *OriginalTy = I->getType();
      Type *ScalarTruncatedTy =
          IntegerType::get(OriginalTy->getContext(), KV.second);
      Type *TruncatedTy = VectorType::get(
          ScalarTruncatedTy, OriginalTy->getVectorNumElements());
      if (TruncatedTy == OriginalTy)
        continue;

      IRBuilder<> B(cast<Instruction>(I));
      auto ShrinkOperand = [&](Value *V) -> Value * {
        if (auto *ZI = dyn_cast<ZExtInst>(V))
          if (ZI->getSrcTy() == TruncatedTy)
            return ZI->getOperand(0);
        return B.CreateZExtOrTrunc(V, TruncatedTy);
      };

      // The actual instruction modification depends on the instruction type,
      // unfortunately.
      Value *NewI = nullptr;
      if (BinaryOperator *BO = dyn_cast<BinaryOperator>(I)) {
        NewI = B.CreateBinOp(BO->getOpcode(),
                             ShrinkOperand(BO->getOperand(0)),
                             ShrinkOperand(BO->getOperand(1)));
        cast<BinaryOperator>(NewI)->copyIRFlags(I);
      } else if (ICmpInst *CI = dyn_cast<ICmpInst>(I)) {
        NewI = B.CreateICmp(CI->getPredicate(),
                            ShrinkOperand(CI->getOperand(0)),
                            ShrinkOperand(CI->getOperand(1)));
      } else if (SelectInst *SI = dyn_cast<SelectInst>(I)) {
        NewI = B.CreateSelect(SI->getCondition(),
                              ShrinkOperand(SI->getTrueValue()),
                              ShrinkOperand(SI->getFalseValue()));
      } else if (CastInst *CI = dyn_cast<CastInst>(I)) {
        switch (CI->getOpcode()) {
        default:
          llvm_unreachable("Unhandled cast!");
        case Instruction::Trunc:
          NewI = ShrinkOperand(CI->getOperand(0));
          break;
        case Instruction::SExt:
          NewI = B.CreateSExtOrTrunc(
              CI->getOperand(0),
              smallestIntegerVectorType(OriginalTy, TruncatedTy));
          break;
        case Instruction::ZExt:
          NewI = B.CreateZExtOrTrunc(
              CI->getOperand(0),
              smallestIntegerVectorType(OriginalTy, TruncatedTy));
          break;
        }
      } else if (ShuffleVectorInst *SI = dyn_cast<ShuffleVectorInst>(I)) {
        auto Elements0 = SI->getOperand(0)->getType()->getVectorNumElements();
        auto *O0 = B.CreateZExtOrTrunc(
            SI->getOperand(0), VectorType::get(ScalarTruncatedTy, Elements0));
        auto Elements1 = SI->getOperand(1)->getType()->getVectorNumElements();
        auto *O1 = B.CreateZExtOrTrunc(
            SI->getOperand(1), VectorType::get(ScalarTruncatedTy, Elements1));

        NewI = B.CreateShuffleVector(O0, O1, SI->getMask());
      } else if (isa<LoadInst>(I)) {
        // Don't do anything with the operands, just extend the result.
        continue;
      } else {
        llvm_unreachable("Unhandled instruction type!");
      }

      // Lastly, extend the result.
      NewI->takeName(cast<Instruction>(I));
      Value *Res = B.CreateZExtOrTrunc(NewI, OriginalTy);
      I->replaceAllUsesWith(Res);
      cast<Instruction>(I)->eraseFromParent();
      I = Res;
    }
  }

  // We'll have created a bunch of ZExts that are now parentless. Clean up.
  for (auto &KV : MinBWs) {
    VectorParts &Parts = WidenMap.get(KV.first);
    for (Value *&I : Parts) {
      ZExtInst *Inst = dyn_cast<ZExtInst>(I);
      if (Inst && Inst->use_empty()) {
        Value *NewI = Inst->getOperand(0);
        Inst->eraseFromParent();
        I = NewI;
      }
    }
  }
}

void InnerLoopVectorizer::vectorizeLoop() {
  //===------------------------------------------------===//
  //
  // Notice: any optimization or new instruction that goes
  // into the code below should also be implemented in
  // the cost-model.
// //===------------------------------------------------===// Constant *Zero = Builder.getInt32(0); // In order to support reduction variables we need to be able to vectorize // Phi nodes. Phi nodes have cycles, so we need to vectorize them in two // stages. First, we create a new vector PHI node with no incoming edges. // We use this value when we vectorize all of the instructions that use the // PHI. Next, after all of the instructions in the block are complete we // add the new incoming edges to the PHI. At this point all of the // instructions in the basic block are vectorized, so we can use them to // construct the PHI. PhiVector RdxPHIsToFix; // Scan the loop in a topological order to ensure that defs are vectorized // before users. LoopBlocksDFS DFS(OrigLoop); DFS.perform(LI); // Vectorize all of the blocks in the original loop. for (LoopBlocksDFS::RPOIterator bb = DFS.beginRPO(), be = DFS.endRPO(); bb != be; ++bb) vectorizeBlockInLoop(*bb, &RdxPHIsToFix); // Insert truncates and extends for any truncated instructions as hints to // InstCombine. if (VF > 1) truncateToMinimalBitwidths(); // At this point every instruction in the original loop is widened to // a vector form. We are almost done. Now, we need to fix the PHI nodes // that we vectorized. The PHI nodes are currently empty because we did // not want to introduce cycles. Notice that the remaining PHI nodes // that we need to fix are reduction variables. // Create the 'reduced' values for each of the induction vars. // The reduced values are the vector values that we scalarize and combine // after the loop is finished. for (PhiVector::iterator it = RdxPHIsToFix.begin(), e = RdxPHIsToFix.end(); it != e; ++it) { PHINode *RdxPhi = *it; assert(RdxPhi && "Unable to recover vectorized PHI"); // Find the reduction variable descriptor. assert(Legal->isReductionVariable(RdxPhi) && "Unable to find the reduction variable"); RecurrenceDescriptor RdxDesc = (*Legal->getReductionVars())[RdxPhi]; RecurrenceDescriptor::RecurrenceKind RK = RdxDesc.getRecurrenceKind(); TrackingVH ReductionStartValue = RdxDesc.getRecurrenceStartValue(); Instruction *LoopExitInst = RdxDesc.getLoopExitInstr(); RecurrenceDescriptor::MinMaxRecurrenceKind MinMaxKind = RdxDesc.getMinMaxRecurrenceKind(); setDebugLocFromInst(Builder, ReductionStartValue); // We need to generate a reduction vector from the incoming scalar. // To do so, we need to generate the 'identity' vector and override // one of the elements with the incoming scalar reduction. We need // to do it in the vector-loop preheader. Builder.SetInsertPoint(LoopBypassBlocks[1]->getTerminator()); // This is the vector-clone of the value that leaves the loop. VectorParts &VectorExit = getVectorValue(LoopExitInst); Type *VecTy = VectorExit[0]->getType(); // Find the reduction identity variable. Zero for addition, or, xor, // one for multiplication, -1 for And. Value *Identity; Value *VectorStart; if (RK == RecurrenceDescriptor::RK_IntegerMinMax || RK == RecurrenceDescriptor::RK_FloatMinMax) { // MinMax reduction have the start value as their identify. if (VF == 1) { VectorStart = Identity = ReductionStartValue; } else { VectorStart = Identity = Builder.CreateVectorSplat(VF, ReductionStartValue, "minmax.ident"); } } else { // Handle other reduction kinds: Constant *Iden = RecurrenceDescriptor::getRecurrenceIdentity( RK, VecTy->getScalarType()); if (VF == 1) { Identity = Iden; // This vector is the Identity vector where the first element is the // incoming scalar reduction. 
        VectorStart = ReductionStartValue;
      } else {
        Identity = ConstantVector::getSplat(VF, Iden);

        // This vector is the Identity vector where the first element is the
        // incoming scalar reduction.
        VectorStart =
            Builder.CreateInsertElement(Identity, ReductionStartValue, Zero);
      }
    }

    // Fix the vector-loop phi.

    // Reductions do not have to start at zero. They can start with
    // any loop invariant values.
    VectorParts &VecRdxPhi = WidenMap.get(RdxPhi);
    BasicBlock *Latch = OrigLoop->getLoopLatch();
    Value *LoopVal = RdxPhi->getIncomingValueForBlock(Latch);
    VectorParts &Val = getVectorValue(LoopVal);
    for (unsigned part = 0; part < UF; ++part) {
      // Make sure to add the reduction start value only to the
      // first unroll part.
      Value *StartVal = (part == 0) ? VectorStart : Identity;
      cast<PHINode>(VecRdxPhi[part])->addIncoming(StartVal,
                                                  LoopVectorPreHeader);
      cast<PHINode>(VecRdxPhi[part])->addIncoming(Val[part],
                                                  LoopVectorBody.back());
    }

    // Before each round, move the insertion point right between
    // the PHIs and the values we are going to write.
    // This allows us to write both PHINodes and the extractelement
    // instructions.
    Builder.SetInsertPoint(&*LoopMiddleBlock->getFirstInsertionPt());

    VectorParts RdxParts = getVectorValue(LoopExitInst);
    setDebugLocFromInst(Builder, LoopExitInst);

    // If the vector reduction can be performed in a smaller type, we truncate
    // then extend the loop exit value to enable InstCombine to evaluate the
    // entire expression in the smaller type.
    if (VF > 1 && RdxPhi->getType() != RdxDesc.getRecurrenceType()) {
      Type *RdxVecTy = VectorType::get(RdxDesc.getRecurrenceType(), VF);
      Builder.SetInsertPoint(LoopVectorBody.back()->getTerminator());
      for (unsigned part = 0; part < UF; ++part) {
        Value *Trunc = Builder.CreateTrunc(RdxParts[part], RdxVecTy);
        Value *Extnd = RdxDesc.isSigned() ? Builder.CreateSExt(Trunc, VecTy)
                                          : Builder.CreateZExt(Trunc, VecTy);
        for (Value::user_iterator UI = RdxParts[part]->user_begin();
             UI != RdxParts[part]->user_end();)
          if (*UI != Trunc) {
            (*UI++)->replaceUsesOfWith(RdxParts[part], Extnd);
            RdxParts[part] = Extnd;
          } else {
            ++UI;
          }
      }
      Builder.SetInsertPoint(&*LoopMiddleBlock->getFirstInsertionPt());
      for (unsigned part = 0; part < UF; ++part)
        RdxParts[part] = Builder.CreateTrunc(RdxParts[part], RdxVecTy);
    }

    // Reduce all of the unrolled parts into a single vector.
    Value *ReducedPartRdx = RdxParts[0];
    unsigned Op = RecurrenceDescriptor::getRecurrenceBinOp(RK);
    setDebugLocFromInst(Builder, ReducedPartRdx);
    for (unsigned part = 1; part < UF; ++part) {
      if (Op != Instruction::ICmp && Op != Instruction::FCmp)
        // Floating point operations had to be 'fast' to enable the reduction.
        ReducedPartRdx = addFastMathFlag(
            Builder.CreateBinOp((Instruction::BinaryOps)Op, RdxParts[part],
                                ReducedPartRdx, "bin.rdx"));
      else
        ReducedPartRdx = RecurrenceDescriptor::createMinMaxOp(
            Builder, MinMaxKind, ReducedPartRdx, RdxParts[part]);
    }

    if (VF > 1) {
      // VF is a power of 2 so we can emit the reduction using log2(VF)
      // shuffles and vector ops, reducing the set of values being computed
      // by half each round.
      assert(isPowerOf2_32(VF) &&
             "Reduction emission only supported for pow2 vectors!");
      Value *TmpVec = ReducedPartRdx;
      SmallVector<Constant *, 32> ShuffleMask(VF, nullptr);
      for (unsigned i = VF; i != 1; i >>= 1) {
        // Move the upper half of the vector to the lower half.
        for (unsigned j = 0; j != i / 2; ++j)
          ShuffleMask[j] = Builder.getInt32(i / 2 + j);

        // Fill the rest of the mask with undef.
std::fill(&ShuffleMask[i/2], ShuffleMask.end(), UndefValue::get(Builder.getInt32Ty())); Value *Shuf = Builder.CreateShuffleVector(TmpVec, UndefValue::get(TmpVec->getType()), ConstantVector::get(ShuffleMask), "rdx.shuf"); if (Op != Instruction::ICmp && Op != Instruction::FCmp) // Floating point operations had to be 'fast' to enable the reduction. TmpVec = addFastMathFlag(Builder.CreateBinOp( (Instruction::BinaryOps)Op, TmpVec, Shuf, "bin.rdx")); else TmpVec = RecurrenceDescriptor::createMinMaxOp(Builder, MinMaxKind, TmpVec, Shuf); } // The result is in the first element of the vector. ReducedPartRdx = Builder.CreateExtractElement(TmpVec, Builder.getInt32(0)); // If the reduction can be performed in a smaller type, we need to extend // the reduction to the wider type before we branch to the original loop. if (RdxPhi->getType() != RdxDesc.getRecurrenceType()) ReducedPartRdx = RdxDesc.isSigned() ? Builder.CreateSExt(ReducedPartRdx, RdxPhi->getType()) : Builder.CreateZExt(ReducedPartRdx, RdxPhi->getType()); } // Create a phi node that merges control-flow from the backedge-taken check // block and the middle block. PHINode *BCBlockPhi = PHINode::Create(RdxPhi->getType(), 2, "bc.merge.rdx", LoopScalarPreHeader->getTerminator()); for (unsigned I = 0, E = LoopBypassBlocks.size(); I != E; ++I) BCBlockPhi->addIncoming(ReductionStartValue, LoopBypassBlocks[I]); BCBlockPhi->addIncoming(ReducedPartRdx, LoopMiddleBlock); // Now, we need to fix the users of the reduction variable // inside and outside of the scalar remainder loop. // We know that the loop is in LCSSA form. We need to update the // PHI nodes in the exit blocks. for (BasicBlock::iterator LEI = LoopExitBlock->begin(), LEE = LoopExitBlock->end(); LEI != LEE; ++LEI) { PHINode *LCSSAPhi = dyn_cast(LEI); if (!LCSSAPhi) break; // All PHINodes need to have a single entry edge, or two if // we already fixed them. assert(LCSSAPhi->getNumIncomingValues() < 3 && "Invalid LCSSA PHI"); // We found our reduction value exit-PHI. Update it with the // incoming bypass edge. if (LCSSAPhi->getIncomingValue(0) == LoopExitInst) { // Add an edge coming from the bypass. LCSSAPhi->addIncoming(ReducedPartRdx, LoopMiddleBlock); break; } }// end of the LCSSA phi scan. // Fix the scalar loop reduction variable with the incoming reduction sum // from the vector body and from the backedge value. int IncomingEdgeBlockIdx = (RdxPhi)->getBasicBlockIndex(OrigLoop->getLoopLatch()); assert(IncomingEdgeBlockIdx >= 0 && "Invalid block index"); // Pick the other block. int SelfEdgeBlockIdx = (IncomingEdgeBlockIdx ? 0 : 1); (RdxPhi)->setIncomingValue(SelfEdgeBlockIdx, BCBlockPhi); (RdxPhi)->setIncomingValue(IncomingEdgeBlockIdx, LoopExitInst); }// end of for each redux variable. fixLCSSAPHIs(); // Make sure DomTree is updated. updateAnalysis(); // Predicate any stores. for (auto KV : PredicatedStores) { BasicBlock::iterator I(KV.first); auto *BB = SplitBlock(I->getParent(), &*std::next(I), DT, LI); auto *T = SplitBlockAndInsertIfThen(KV.second, &*I, /*Unreachable=*/false, /*BranchWeights=*/nullptr, DT); I->moveBefore(T); I->getParent()->setName("pred.store.if"); BB->setName("pred.store.continue"); } DEBUG(DT->verifyDomTree()); // Remove redundant induction instructions. 
  cse(LoopVectorBody);
}

void InnerLoopVectorizer::fixLCSSAPHIs() {
  for (BasicBlock::iterator LEI = LoopExitBlock->begin(),
                            LEE = LoopExitBlock->end();
       LEI != LEE; ++LEI) {
    PHINode *LCSSAPhi = dyn_cast<PHINode>(LEI);
    if (!LCSSAPhi)
      break;
    if (LCSSAPhi->getNumIncomingValues() == 1)
      LCSSAPhi->addIncoming(UndefValue::get(LCSSAPhi->getType()),
                            LoopMiddleBlock);
  }
}

InnerLoopVectorizer::VectorParts
InnerLoopVectorizer::createEdgeMask(BasicBlock *Src, BasicBlock *Dst) {
  assert(std::find(pred_begin(Dst), pred_end(Dst), Src) != pred_end(Dst) &&
         "Invalid edge");

  // Look for cached value.
  std::pair<BasicBlock *, BasicBlock *> Edge(Src, Dst);
  EdgeMaskCache::iterator ECEntryIt = MaskCache.find(Edge);
  if (ECEntryIt != MaskCache.end())
    return ECEntryIt->second;

  VectorParts SrcMask = createBlockInMask(Src);

  // The terminator has to be a branch inst!
  BranchInst *BI = dyn_cast<BranchInst>(Src->getTerminator());
  assert(BI && "Unexpected terminator found");

  if (BI->isConditional()) {
    VectorParts EdgeMask = getVectorValue(BI->getCondition());

    if (BI->getSuccessor(0) != Dst)
      for (unsigned part = 0; part < UF; ++part)
        EdgeMask[part] = Builder.CreateNot(EdgeMask[part]);

    for (unsigned part = 0; part < UF; ++part)
      EdgeMask[part] = Builder.CreateAnd(EdgeMask[part], SrcMask[part]);

    MaskCache[Edge] = EdgeMask;
    return EdgeMask;
  }

  MaskCache[Edge] = SrcMask;
  return SrcMask;
}

InnerLoopVectorizer::VectorParts
InnerLoopVectorizer::createBlockInMask(BasicBlock *BB) {
  assert(OrigLoop->contains(BB) && "Block is not a part of a loop");

  // Loop incoming mask is all-one.
  if (OrigLoop->getHeader() == BB) {
    Value *C = ConstantInt::get(IntegerType::getInt1Ty(BB->getContext()), 1);
    return getVectorValue(C);
  }

  // This is the block mask. We start with zero and OR in the mask of each
  // incoming edge.
  Value *Zero = ConstantInt::get(IntegerType::getInt1Ty(BB->getContext()), 0);
  VectorParts BlockMask = getVectorValue(Zero);

  // For each pred:
  for (pred_iterator it = pred_begin(BB), e = pred_end(BB); it != e; ++it) {
    VectorParts EM = createEdgeMask(*it, BB);
    for (unsigned part = 0; part < UF; ++part)
      BlockMask[part] = Builder.CreateOr(BlockMask[part], EM[part]);
  }

  return BlockMask;
}

void InnerLoopVectorizer::widenPHIInstruction(
    Instruction *PN, InnerLoopVectorizer::VectorParts &Entry, unsigned UF,
    unsigned VF, PhiVector *PV) {
  PHINode *P = cast<PHINode>(PN);
  // Handle reduction variables:
  if (Legal->isReductionVariable(P)) {
    for (unsigned part = 0; part < UF; ++part) {
      // This is phase one of vectorizing PHIs.
      Type *VecTy =
          (VF == 1) ? PN->getType() : VectorType::get(PN->getType(), VF);
      Entry[part] = PHINode::Create(
          VecTy, 2, "vec.phi", &*LoopVectorBody.back()->getFirstInsertionPt());
    }
    PV->push_back(P);
    return;
  }

  setDebugLocFromInst(Builder, P);
  // Check for PHI nodes that are lowered to vector selects.
  if (P->getParent() != OrigLoop->getHeader()) {
    // We know that all PHIs in non-header blocks are converted into
    // selects, so we don't have to worry about the insertion order and we
    // can just use the builder.
    // At this point we generate the predication tree. There may be
    // duplications since this is a simple recursive scan, but future
    // optimizations will clean it up.
unsigned NumIncoming = P->getNumIncomingValues(); // Generate a sequence of selects of the form: // SELECT(Mask3, In3, // SELECT(Mask2, In2, // ( ...))) for (unsigned In = 0; In < NumIncoming; In++) { VectorParts Cond = createEdgeMask(P->getIncomingBlock(In), P->getParent()); VectorParts &In0 = getVectorValue(P->getIncomingValue(In)); for (unsigned part = 0; part < UF; ++part) { // We might have single edge PHIs (blocks) - use an identity // 'select' for the first PHI operand. if (In == 0) Entry[part] = Builder.CreateSelect(Cond[part], In0[part], In0[part]); else // Select between the current value and the previous incoming edge // based on the incoming mask. Entry[part] = Builder.CreateSelect(Cond[part], In0[part], Entry[part], "predphi"); } } return; } // This PHINode must be an induction variable. // Make sure that we know about it. assert(Legal->getInductionVars()->count(P) && "Not an induction variable"); InductionDescriptor II = Legal->getInductionVars()->lookup(P); // FIXME: The newly created binary instructions should contain nsw/nuw flags, // which can be found from the original scalar operations. switch (II.getKind()) { case InductionDescriptor::IK_NoInduction: llvm_unreachable("Unknown induction"); case InductionDescriptor::IK_IntInduction: { assert(P->getType() == II.getStartValue()->getType() && "Types must match"); // Handle other induction variables that are now based on the // canonical one. Value *V = Induction; if (P != OldInduction) { V = Builder.CreateSExtOrTrunc(Induction, P->getType()); V = II.transform(Builder, V); V->setName("offset.idx"); } Value *Broadcasted = getBroadcastInstrs(V); // After broadcasting the induction variable we need to make the vector // consecutive by adding 0, 1, 2, etc. for (unsigned part = 0; part < UF; ++part) Entry[part] = getStepVector(Broadcasted, VF * part, II.getStepValue()); return; } case InductionDescriptor::IK_PtrInduction: // Handle the pointer induction variable case. assert(P->getType()->isPointerTy() && "Unexpected type."); // This is the normalized GEP that starts counting at zero. Value *PtrInd = Induction; PtrInd = Builder.CreateSExtOrTrunc(PtrInd, II.getStepValue()->getType()); // This is the vector of results. Notice that we don't generate // vector geps because scalar geps result in better code. for (unsigned part = 0; part < UF; ++part) { if (VF == 1) { int EltIndex = part; Constant *Idx = ConstantInt::get(PtrInd->getType(), EltIndex); Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx); Value *SclrGep = II.transform(Builder, GlobalIdx); SclrGep->setName("next.gep"); Entry[part] = SclrGep; continue; } Value *VecVal = UndefValue::get(VectorType::get(P->getType(), VF)); for (unsigned int i = 0; i < VF; ++i) { int EltIndex = i + part * VF; Constant *Idx = ConstantInt::get(PtrInd->getType(), EltIndex); Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx); Value *SclrGep = II.transform(Builder, GlobalIdx); SclrGep->setName("next.gep"); VecVal = Builder.CreateInsertElement(VecVal, SclrGep, Builder.getInt32(i), "insert.gep"); } Entry[part] = VecVal; } return; } } void InnerLoopVectorizer::vectorizeBlockInLoop(BasicBlock *BB, PhiVector *PV) { // For each instruction in the old loop. for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) { VectorParts &Entry = WidenMap.get(&*it); switch (it->getOpcode()) { case Instruction::Br: // Nothing to do for PHIs and BR, since we already took care of the // loop control flow instructions. continue; case Instruction::PHI: { // Vectorize PHINodes. 
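      // (As a summary of widenPHIInstruction above: reduction PHIs get
      // placeholder vector PHIs that vectorizeLoop fixes up later, non-header
      // PHIs become select chains, and induction PHIs are widened by kind.)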
widenPHIInstruction(&*it, Entry, UF, VF, PV); continue; }// End of PHI. case Instruction::Add: case Instruction::FAdd: case Instruction::Sub: case Instruction::FSub: case Instruction::Mul: case Instruction::FMul: case Instruction::UDiv: case Instruction::SDiv: case Instruction::FDiv: case Instruction::URem: case Instruction::SRem: case Instruction::FRem: case Instruction::Shl: case Instruction::LShr: case Instruction::AShr: case Instruction::And: case Instruction::Or: case Instruction::Xor: { // Just widen binops. BinaryOperator *BinOp = dyn_cast(it); setDebugLocFromInst(Builder, BinOp); VectorParts &A = getVectorValue(it->getOperand(0)); VectorParts &B = getVectorValue(it->getOperand(1)); // Use this vector value for all users of the original instruction. for (unsigned Part = 0; Part < UF; ++Part) { Value *V = Builder.CreateBinOp(BinOp->getOpcode(), A[Part], B[Part]); if (BinaryOperator *VecOp = dyn_cast(V)) VecOp->copyIRFlags(BinOp); Entry[Part] = V; } propagateMetadata(Entry, &*it); break; } case Instruction::Select: { // Widen selects. // If the selector is loop invariant we can create a select // instruction with a scalar condition. Otherwise, use vector-select. auto *SE = PSE.getSE(); bool InvariantCond = SE->isLoopInvariant(PSE.getSCEV(it->getOperand(0)), OrigLoop); setDebugLocFromInst(Builder, &*it); // The condition can be loop invariant but still defined inside the // loop. This means that we can't just use the original 'cond' value. // We have to take the 'vectorized' value and pick the first lane. // Instcombine will make this a no-op. VectorParts &Cond = getVectorValue(it->getOperand(0)); VectorParts &Op0 = getVectorValue(it->getOperand(1)); VectorParts &Op1 = getVectorValue(it->getOperand(2)); Value *ScalarCond = (VF == 1) ? Cond[0] : Builder.CreateExtractElement(Cond[0], Builder.getInt32(0)); for (unsigned Part = 0; Part < UF; ++Part) { Entry[Part] = Builder.CreateSelect( InvariantCond ? ScalarCond : Cond[Part], Op0[Part], Op1[Part]); } propagateMetadata(Entry, &*it); break; } case Instruction::ICmp: case Instruction::FCmp: { // Widen compares. Generate vector compares. bool FCmp = (it->getOpcode() == Instruction::FCmp); CmpInst *Cmp = dyn_cast(it); setDebugLocFromInst(Builder, &*it); VectorParts &A = getVectorValue(it->getOperand(0)); VectorParts &B = getVectorValue(it->getOperand(1)); for (unsigned Part = 0; Part < UF; ++Part) { Value *C = nullptr; if (FCmp) { C = Builder.CreateFCmp(Cmp->getPredicate(), A[Part], B[Part]); cast(C)->copyFastMathFlags(&*it); } else { C = Builder.CreateICmp(Cmp->getPredicate(), A[Part], B[Part]); } Entry[Part] = C; } propagateMetadata(Entry, &*it); break; } case Instruction::Store: case Instruction::Load: vectorizeMemoryInstruction(&*it); break; case Instruction::ZExt: case Instruction::SExt: case Instruction::FPToUI: case Instruction::FPToSI: case Instruction::FPExt: case Instruction::PtrToInt: case Instruction::IntToPtr: case Instruction::SIToFP: case Instruction::UIToFP: case Instruction::Trunc: case Instruction::FPTrunc: case Instruction::BitCast: { CastInst *CI = dyn_cast(it); setDebugLocFromInst(Builder, &*it); /// Optimize the special case where the source is the induction /// variable. Notice that we can only optimize the 'trunc' case /// because: a. FP conversions lose precision, b. sext/zext may wrap, /// c. other casts depend on pointer size. 
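      // Illustrative sketch (not from the original source): given the
      // canonical induction %iv : i64 and a use "%t = trunc i64 %iv to i32",
      // the code below truncates the scalar induction once, broadcasts the
      // narrow value, and adds the step vector <0, 1, ..., VF-1> in i32,
      // rather than truncating every lane of a wide i64 vector.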
if (CI->getOperand(0) == OldInduction && it->getOpcode() == Instruction::Trunc) { Value *ScalarCast = Builder.CreateCast(CI->getOpcode(), Induction, CI->getType()); Value *Broadcasted = getBroadcastInstrs(ScalarCast); InductionDescriptor II = Legal->getInductionVars()->lookup(OldInduction); Constant *Step = ConstantInt::getSigned( CI->getType(), II.getStepValue()->getSExtValue()); for (unsigned Part = 0; Part < UF; ++Part) Entry[Part] = getStepVector(Broadcasted, VF * Part, Step); propagateMetadata(Entry, &*it); break; } /// Vectorize casts. Type *DestTy = (VF == 1) ? CI->getType() : VectorType::get(CI->getType(), VF); VectorParts &A = getVectorValue(it->getOperand(0)); for (unsigned Part = 0; Part < UF; ++Part) Entry[Part] = Builder.CreateCast(CI->getOpcode(), A[Part], DestTy); propagateMetadata(Entry, &*it); break; } case Instruction::Call: { // Ignore dbg intrinsics. if (isa(it)) break; setDebugLocFromInst(Builder, &*it); Module *M = BB->getParent()->getParent(); CallInst *CI = cast(it); StringRef FnName = CI->getCalledFunction()->getName(); Function *F = CI->getCalledFunction(); Type *RetTy = ToVectorTy(CI->getType(), VF); SmallVector Tys; for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) Tys.push_back(ToVectorTy(CI->getArgOperand(i)->getType(), VF)); Intrinsic::ID ID = getIntrinsicIDForCall(CI, TLI); if (ID && (ID == Intrinsic::assume || ID == Intrinsic::lifetime_end || ID == Intrinsic::lifetime_start)) { scalarizeInstruction(&*it); break; } // The flag shows whether we use Intrinsic or a usual Call for vectorized // version of the instruction. // Is it beneficial to perform intrinsic call compared to lib call? bool NeedToScalarize; unsigned CallCost = getVectorCallCost(CI, VF, *TTI, TLI, NeedToScalarize); bool UseVectorIntrinsic = ID && getVectorIntrinsicCost(CI, VF, *TTI, TLI) <= CallCost; if (!UseVectorIntrinsic && NeedToScalarize) { scalarizeInstruction(&*it); break; } for (unsigned Part = 0; Part < UF; ++Part) { SmallVector Args; for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) { Value *Arg = CI->getArgOperand(i); // Some intrinsics have a scalar argument - don't replace it with a // vector. if (!UseVectorIntrinsic || !hasVectorInstrinsicScalarOpd(ID, i)) { VectorParts &VectorArg = getVectorValue(CI->getArgOperand(i)); Arg = VectorArg[Part]; } Args.push_back(Arg); } Function *VectorF; if (UseVectorIntrinsic) { // Use vector version of the intrinsic. Type *TysForDecl[] = {CI->getType()}; if (VF > 1) TysForDecl[0] = VectorType::get(CI->getType()->getScalarType(), VF); VectorF = Intrinsic::getDeclaration(M, ID, TysForDecl); } else { // Use vector version of the library call. StringRef VFnName = TLI->getVectorizedFunction(FnName, VF); assert(!VFnName.empty() && "Vector function name is empty."); VectorF = M->getFunction(VFnName); if (!VectorF) { // Generate a declaration FunctionType *FTy = FunctionType::get(RetTy, Tys, false); VectorF = Function::Create(FTy, Function::ExternalLinkage, VFnName, M); VectorF->copyAttributesFrom(F); } } assert(VectorF && "Can't create vector function."); Entry[Part] = Builder.CreateCall(VectorF, Args); } propagateMetadata(Entry, &*it); break; } default: // All other instructions are unsupported. Scalarize them. scalarizeInstruction(&*it); break; }// end of switch. }// end of for_each instr. } void InnerLoopVectorizer::updateAnalysis() { // Forget the original basic block. PSE.getSE()->forgetLoop(OrigLoop); // Update the dominator tree information. 
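  // Illustrative sketch (not from the original source) of the layout assumed
  // by the updates below: the bypass blocks chain into the vector preheader,
  // which leads to the single-block vector body and then the middle block,
  // while the scalar preheader and the loop exit are both reached from the
  // bypass blocks, which is why LoopBypassBlocks[0] becomes their immediate
  // dominator.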
assert(DT->properlyDominates(LoopBypassBlocks.front(), LoopExitBlock) && "Entry does not dominate exit."); for (unsigned I = 1, E = LoopBypassBlocks.size(); I != E; ++I) DT->addNewBlock(LoopBypassBlocks[I], LoopBypassBlocks[I-1]); DT->addNewBlock(LoopVectorPreHeader, LoopBypassBlocks.back()); // We don't predicate stores by this point, so the vector body should be a // single loop. assert(LoopVectorBody.size() == 1 && "Expected single block loop!"); DT->addNewBlock(LoopVectorBody[0], LoopVectorPreHeader); DT->addNewBlock(LoopMiddleBlock, LoopVectorBody.back()); DT->addNewBlock(LoopScalarPreHeader, LoopBypassBlocks[0]); DT->changeImmediateDominator(LoopScalarBody, LoopScalarPreHeader); DT->changeImmediateDominator(LoopExitBlock, LoopBypassBlocks[0]); DEBUG(DT->verifyDomTree()); } /// \brief Check whether it is safe to if-convert this phi node. /// /// Phi nodes with constant expressions that can trap are not safe to if /// convert. static bool canIfConvertPHINodes(BasicBlock *BB) { for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) { PHINode *Phi = dyn_cast(I); if (!Phi) return true; for (unsigned p = 0, e = Phi->getNumIncomingValues(); p != e; ++p) if (Constant *C = dyn_cast(Phi->getIncomingValue(p))) if (C->canTrap()) return false; } return true; } bool LoopVectorizationLegality::canVectorizeWithIfConvert() { if (!EnableIfConversion) { emitAnalysis(VectorizationReport() << "if-conversion is disabled"); return false; } assert(TheLoop->getNumBlocks() > 1 && "Single block loops are vectorizable"); // A list of pointers that we can safely read and write to. SmallPtrSet SafePointes; // Collect safe addresses. for (Loop::block_iterator BI = TheLoop->block_begin(), BE = TheLoop->block_end(); BI != BE; ++BI) { BasicBlock *BB = *BI; if (blockNeedsPredication(BB)) continue; for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) { if (LoadInst *LI = dyn_cast(I)) SafePointes.insert(LI->getPointerOperand()); else if (StoreInst *SI = dyn_cast(I)) SafePointes.insert(SI->getPointerOperand()); } } // Collect the blocks that need predication. BasicBlock *Header = TheLoop->getHeader(); for (Loop::block_iterator BI = TheLoop->block_begin(), BE = TheLoop->block_end(); BI != BE; ++BI) { BasicBlock *BB = *BI; // We don't support switch statements inside loops. if (!isa(BB->getTerminator())) { emitAnalysis(VectorizationReport(BB->getTerminator()) << "loop contains a switch statement"); return false; } // We must be able to predicate all blocks that need to be predicated. if (blockNeedsPredication(BB)) { if (!blockCanBePredicated(BB, SafePointes)) { emitAnalysis(VectorizationReport(BB->getTerminator()) << "control flow cannot be substituted for a select"); return false; } } else if (BB != Header && !canIfConvertPHINodes(BB)) { emitAnalysis(VectorizationReport(BB->getTerminator()) << "control flow cannot be substituted for a select"); return false; } } // We can if-convert this loop. return true; } bool LoopVectorizationLegality::canVectorize() { // We must have a loop in canonical form. Loops with indirectbr in them cannot // be canonicalized. if (!TheLoop->getLoopPreheader()) { emitAnalysis( VectorizationReport() << "loop control flow is not understood by vectorizer"); return false; } // We can only vectorize innermost loops. if (!TheLoop->empty()) { emitAnalysis(VectorizationReport() << "loop is not the innermost loop"); return false; } // We must have a single backedge. 
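  // Illustrative note (not from the original source): loop-simplify normally
  // merges all backedges into a single latch, so a loop that still has
  // several backedges here (e.g. because an indirectbr prevented
  // canonicalization, as noted above) is rejected below.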
if (TheLoop->getNumBackEdges() != 1) { emitAnalysis( VectorizationReport() << "loop control flow is not understood by vectorizer"); return false; } // We must have a single exiting block. if (!TheLoop->getExitingBlock()) { emitAnalysis( VectorizationReport() << "loop control flow is not understood by vectorizer"); return false; } // We only handle bottom-tested loops, i.e. loop in which the condition is // checked at the end of each iteration. With that we can assume that all // instructions in the loop are executed the same number of times. if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) { emitAnalysis( VectorizationReport() << "loop control flow is not understood by vectorizer"); return false; } // We need to have a loop header. DEBUG(dbgs() << "LV: Found a loop: " << TheLoop->getHeader()->getName() << '\n'); // Check if we can if-convert non-single-bb loops. unsigned NumBlocks = TheLoop->getNumBlocks(); if (NumBlocks != 1 && !canVectorizeWithIfConvert()) { DEBUG(dbgs() << "LV: Can't if-convert the loop.\n"); return false; } // ScalarEvolution needs to be able to find the exit count. const SCEV *ExitCount = PSE.getSE()->getBackedgeTakenCount(TheLoop); if (ExitCount == PSE.getSE()->getCouldNotCompute()) { emitAnalysis(VectorizationReport() << "could not determine number of loop iterations"); DEBUG(dbgs() << "LV: SCEV could not compute the loop exit count.\n"); return false; } // Check if we can vectorize the instructions and CFG in this loop. if (!canVectorizeInstrs()) { DEBUG(dbgs() << "LV: Can't vectorize the instructions or CFG\n"); return false; } // Go over each instruction and look at memory deps. if (!canVectorizeMemory()) { DEBUG(dbgs() << "LV: Can't vectorize due to memory conflicts\n"); return false; } // Collect all of the variables that remain uniform after vectorization. collectLoopUniforms(); DEBUG(dbgs() << "LV: We can vectorize this loop" << (LAI->getRuntimePointerChecking()->Need ? " (with a runtime bound check)" : "") << "!\n"); bool UseInterleaved = TTI->enableInterleavedAccessVectorization(); // If an override option has been passed in for interleaved accesses, use it. if (EnableInterleavedMemAccesses.getNumOccurrences() > 0) UseInterleaved = EnableInterleavedMemAccesses; // Analyze interleaved memory accesses. if (UseInterleaved) InterleaveInfo.analyzeInterleaving(Strides); unsigned SCEVThreshold = VectorizeSCEVCheckThreshold; if (Hints->getForce() == LoopVectorizeHints::FK_Enabled) SCEVThreshold = PragmaVectorizeSCEVCheckThreshold; if (PSE.getUnionPredicate().getComplexity() > SCEVThreshold) { emitAnalysis(VectorizationReport() << "Too many SCEV assumptions need to be made and checked " << "at runtime"); DEBUG(dbgs() << "LV: Too many SCEV checks needed.\n"); return false; } // Okay! We can vectorize. At this point we don't have any other mem analysis // which may limit our maximum vectorization factor, so just return true with // no restrictions. return true; } static Type *convertPointerToIntegerType(const DataLayout &DL, Type *Ty) { if (Ty->isPointerTy()) return DL.getIntPtrType(Ty); // It is possible that char's or short's overflow when we ask for the loop's // trip count, work around this by changing the type size. 
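// Illustrative sketch (not from the original source): an i8 induction
// variable can represent trip counts only up to 255, so computing the trip
// count in the original narrow type could wrap; promoting such types to i32
// below sidesteps this.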
if (Ty->getScalarSizeInBits() < 32) return Type::getInt32Ty(Ty->getContext()); return Ty; } static Type* getWiderType(const DataLayout &DL, Type *Ty0, Type *Ty1) { Ty0 = convertPointerToIntegerType(DL, Ty0); Ty1 = convertPointerToIntegerType(DL, Ty1); if (Ty0->getScalarSizeInBits() > Ty1->getScalarSizeInBits()) return Ty0; return Ty1; } /// \brief Check that the instruction has outside loop users and is not an /// identified reduction variable. static bool hasOutsideLoopUser(const Loop *TheLoop, Instruction *Inst, SmallPtrSetImpl &Reductions) { // Reduction instructions are allowed to have exit users. All other // instructions must not have external users. if (!Reductions.count(Inst)) //Check that all of the users of the loop are inside the BB. for (User *U : Inst->users()) { Instruction *UI = cast(U); // This user may be a reduction exit value. if (!TheLoop->contains(UI)) { DEBUG(dbgs() << "LV: Found an outside user for : " << *UI << '\n'); return true; } } return false; } bool LoopVectorizationLegality::canVectorizeInstrs() { BasicBlock *Header = TheLoop->getHeader(); // Look for the attribute signaling the absence of NaNs. Function &F = *Header->getParent(); const DataLayout &DL = F.getParent()->getDataLayout(); if (F.hasFnAttribute("no-nans-fp-math")) HasFunNoNaNAttr = F.getFnAttribute("no-nans-fp-math").getValueAsString() == "true"; // For each block in the loop. for (Loop::block_iterator bb = TheLoop->block_begin(), be = TheLoop->block_end(); bb != be; ++bb) { // Scan the instructions in the block and look for hazards. for (BasicBlock::iterator it = (*bb)->begin(), e = (*bb)->end(); it != e; ++it) { if (PHINode *Phi = dyn_cast(it)) { Type *PhiTy = Phi->getType(); // Check that this PHI type is allowed. if (!PhiTy->isIntegerTy() && !PhiTy->isFloatingPointTy() && !PhiTy->isPointerTy()) { emitAnalysis(VectorizationReport(&*it) << "loop control flow is not understood by vectorizer"); DEBUG(dbgs() << "LV: Found an non-int non-pointer PHI.\n"); return false; } // If this PHINode is not in the header block, then we know that we // can convert it to select during if-conversion. No need to check if // the PHIs in this block are induction or reduction variables. if (*bb != Header) { // Check that this instruction has no outside users or is an // identified reduction value with an outside user. if (!hasOutsideLoopUser(TheLoop, &*it, AllowedExit)) continue; emitAnalysis(VectorizationReport(&*it) << "value could not be identified as " "an induction or reduction variable"); return false; } // We only allow if-converted PHIs with exactly two incoming values. if (Phi->getNumIncomingValues() != 2) { emitAnalysis(VectorizationReport(&*it) << "control flow not understood by vectorizer"); DEBUG(dbgs() << "LV: Found an invalid PHI.\n"); return false; } InductionDescriptor ID; if (InductionDescriptor::isInductionPHI(Phi, PSE.getSE(), ID)) { Inductions[Phi] = ID; // Get the widest type. if (!WidestIndTy) WidestIndTy = convertPointerToIntegerType(DL, PhiTy); else WidestIndTy = getWiderType(DL, PhiTy, WidestIndTy); // Int inductions are special because we only allow one IV. if (ID.getKind() == InductionDescriptor::IK_IntInduction && ID.getStepValue()->isOne() && isa(ID.getStartValue()) && cast(ID.getStartValue())->isNullValue()) { // Use the phi node with the widest type as induction. Use the last // one if there are multiple (no good reason for doing this other // than it is expedient). We've checked that it begins at zero and // steps by one, so this is a canonical induction variable. 
if (!Induction || PhiTy == WidestIndTy) Induction = Phi; } DEBUG(dbgs() << "LV: Found an induction variable.\n"); // Until we explicitly handle the case of an induction variable with // an outside loop user we have to give up vectorizing this loop. if (hasOutsideLoopUser(TheLoop, &*it, AllowedExit)) { emitAnalysis(VectorizationReport(&*it) << "use of induction value outside of the " "loop is not handled by vectorizer"); return false; } continue; } RecurrenceDescriptor RedDes; if (RecurrenceDescriptor::isReductionPHI(Phi, TheLoop, RedDes)) { if (RedDes.hasUnsafeAlgebra()) Requirements->addUnsafeAlgebraInst(RedDes.getUnsafeAlgebraInst()); AllowedExit.insert(RedDes.getLoopExitInstr()); Reductions[Phi] = RedDes; continue; } emitAnalysis(VectorizationReport(&*it) << "value that could not be identified as " "reduction is used outside the loop"); DEBUG(dbgs() << "LV: Found an unidentified PHI."<< *Phi <<"\n"); return false; }// end of PHI handling // We handle calls that: // * Are debug info intrinsics. // * Have a mapping to an IR intrinsic. // * Have a vector version available. CallInst *CI = dyn_cast(it); if (CI && !getIntrinsicIDForCall(CI, TLI) && !isa(CI) && !(CI->getCalledFunction() && TLI && TLI->isFunctionVectorizable(CI->getCalledFunction()->getName()))) { emitAnalysis(VectorizationReport(&*it) << "call instruction cannot be vectorized"); DEBUG(dbgs() << "LV: Found a non-intrinsic, non-libfunc callsite.\n"); return false; } // Intrinsics such as powi,cttz and ctlz are legal to vectorize if the // second argument is the same (i.e. loop invariant) if (CI && hasVectorInstrinsicScalarOpd(getIntrinsicIDForCall(CI, TLI), 1)) { auto *SE = PSE.getSE(); if (!SE->isLoopInvariant(PSE.getSCEV(CI->getOperand(1)), TheLoop)) { emitAnalysis(VectorizationReport(&*it) << "intrinsic instruction cannot be vectorized"); DEBUG(dbgs() << "LV: Found unvectorizable intrinsic " << *CI << "\n"); return false; } } // Check that the instruction return type is vectorizable. // Also, we can't vectorize extractelement instructions. if ((!VectorType::isValidElementType(it->getType()) && !it->getType()->isVoidTy()) || isa(it)) { emitAnalysis(VectorizationReport(&*it) << "instruction return type cannot be vectorized"); DEBUG(dbgs() << "LV: Found unvectorizable type.\n"); return false; } // Check that the stored type is vectorizable. if (StoreInst *ST = dyn_cast(it)) { Type *T = ST->getValueOperand()->getType(); if (!VectorType::isValidElementType(T)) { emitAnalysis(VectorizationReport(ST) << "store instruction cannot be vectorized"); return false; } if (EnableMemAccessVersioning) collectStridedAccess(ST); } if (EnableMemAccessVersioning) if (LoadInst *LI = dyn_cast(it)) collectStridedAccess(LI); // Reduction instructions are allowed to have exit users. // All other instructions must not have external users. if (hasOutsideLoopUser(TheLoop, &*it, AllowedExit)) { emitAnalysis(VectorizationReport(&*it) << "value cannot be used outside the loop"); return false; } } // next instr. } if (!Induction) { DEBUG(dbgs() << "LV: Did not find one integer induction var.\n"); if (Inductions.empty()) { emitAnalysis(VectorizationReport() << "loop induction variable could not be identified"); return false; } } // Now we know the widest induction type, check if our found induction // is the same size. If it's not, unset it here and InnerLoopVectorizer // will create another. 
if (Induction && WidestIndTy != Induction->getType()) Induction = nullptr; return true; } void LoopVectorizationLegality::collectStridedAccess(Value *MemAccess) { Value *Ptr = nullptr; if (LoadInst *LI = dyn_cast(MemAccess)) Ptr = LI->getPointerOperand(); else if (StoreInst *SI = dyn_cast(MemAccess)) Ptr = SI->getPointerOperand(); else return; Value *Stride = getStrideFromPointer(Ptr, PSE.getSE(), TheLoop); if (!Stride) return; DEBUG(dbgs() << "LV: Found a strided access that we can version"); DEBUG(dbgs() << " Ptr: " << *Ptr << " Stride: " << *Stride << "\n"); Strides[Ptr] = Stride; StrideSet.insert(Stride); } void LoopVectorizationLegality::collectLoopUniforms() { // We now know that the loop is vectorizable! // Collect variables that will remain uniform after vectorization. std::vector Worklist; BasicBlock *Latch = TheLoop->getLoopLatch(); // Start with the conditional branch and walk up the block. Worklist.push_back(Latch->getTerminator()->getOperand(0)); // Also add all consecutive pointer values; these values will be uniform // after vectorization (and subsequent cleanup) and, until revectorization is // supported, all dependencies must also be uniform. for (Loop::block_iterator B = TheLoop->block_begin(), BE = TheLoop->block_end(); B != BE; ++B) for (BasicBlock::iterator I = (*B)->begin(), IE = (*B)->end(); I != IE; ++I) if (I->getType()->isPointerTy() && isConsecutivePtr(&*I)) Worklist.insert(Worklist.end(), I->op_begin(), I->op_end()); while (!Worklist.empty()) { Instruction *I = dyn_cast(Worklist.back()); Worklist.pop_back(); // Look at instructions inside this loop. // Stop when reaching PHI nodes. // TODO: we need to follow values all over the loop, not only in this block. if (!I || !TheLoop->contains(I) || isa(I)) continue; // This is a known uniform. Uniforms.insert(I); // Insert all operands. Worklist.insert(Worklist.end(), I->op_begin(), I->op_end()); } } bool LoopVectorizationLegality::canVectorizeMemory() { LAI = &LAA->getInfo(TheLoop, Strides); auto &OptionalReport = LAI->getReport(); if (OptionalReport) emitAnalysis(VectorizationReport(*OptionalReport)); if (!LAI->canVectorizeMemory()) return false; if (LAI->hasStoreToLoopInvariantAddress()) { emitAnalysis( VectorizationReport() << "write to a loop invariant address could not be vectorized"); DEBUG(dbgs() << "LV: We don't allow storing to uniform addresses\n"); return false; } Requirements->addRuntimePointerChecks(LAI->getNumRuntimePointerChecks()); PSE.addPredicate(LAI->PSE.getUnionPredicate()); return true; } bool LoopVectorizationLegality::isInductionVariable(const Value *V) { Value *In0 = const_cast(V); PHINode *PN = dyn_cast_or_null(In0); if (!PN) return false; return Inductions.count(PN); } bool LoopVectorizationLegality::blockNeedsPredication(BasicBlock *BB) { return LoopAccessInfo::blockNeedsPredication(BB, TheLoop, DT); } bool LoopVectorizationLegality::blockCanBePredicated(BasicBlock *BB, SmallPtrSetImpl &SafePtrs) { for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) { // Check that we don't have a constant expression that can trap as operand. for (Instruction::op_iterator OI = it->op_begin(), OE = it->op_end(); OI != OE; ++OI) { if (Constant *C = dyn_cast(*OI)) if (C->canTrap()) return false; } // We might be able to hoist the load. 
if (it->mayReadFromMemory()) { LoadInst *LI = dyn_cast(it); if (!LI) return false; if (!SafePtrs.count(LI->getPointerOperand())) { if (isLegalMaskedLoad(LI->getType(), LI->getPointerOperand())) { MaskedOp.insert(LI); continue; } return false; } } // We don't predicate stores at the moment. if (it->mayWriteToMemory()) { StoreInst *SI = dyn_cast(it); // We only support predication of stores in basic blocks with one // predecessor. if (!SI) return false; bool isSafePtr = (SafePtrs.count(SI->getPointerOperand()) != 0); bool isSinglePredecessor = SI->getParent()->getSinglePredecessor(); if (++NumPredStores > NumberOfStoresToPredicate || !isSafePtr || !isSinglePredecessor) { // Build a masked store if it is legal for the target, otherwise // scalarize the block. bool isLegalMaskedOp = isLegalMaskedStore(SI->getValueOperand()->getType(), SI->getPointerOperand()); if (isLegalMaskedOp) { --NumPredStores; MaskedOp.insert(SI); continue; } return false; } } if (it->mayThrow()) return false; // The instructions below can trap. switch (it->getOpcode()) { default: continue; case Instruction::UDiv: case Instruction::SDiv: case Instruction::URem: case Instruction::SRem: return false; } } return true; } void InterleavedAccessInfo::collectConstStridedAccesses( MapVector &StrideAccesses, const ValueToValueMap &Strides) { // Holds load/store instructions in program order. SmallVector AccessList; for (auto *BB : TheLoop->getBlocks()) { bool IsPred = LoopAccessInfo::blockNeedsPredication(BB, TheLoop, DT); for (auto &I : *BB) { if (!isa(&I) && !isa(&I)) continue; // FIXME: Currently we can't handle mixed accesses and predicated accesses if (IsPred) return; AccessList.push_back(&I); } } if (AccessList.empty()) return; auto &DL = TheLoop->getHeader()->getModule()->getDataLayout(); for (auto I : AccessList) { LoadInst *LI = dyn_cast(I); StoreInst *SI = dyn_cast(I); Value *Ptr = LI ? LI->getPointerOperand() : SI->getPointerOperand(); int Stride = isStridedPtr(PSE, Ptr, TheLoop, Strides); // The factor of the corresponding interleave group. unsigned Factor = std::abs(Stride); // Ignore the access if the factor is too small or too large. if (Factor < 2 || Factor > MaxInterleaveGroupFactor) continue; const SCEV *Scev = replaceSymbolicStrideSCEV(PSE, Strides, Ptr); PointerType *PtrTy = dyn_cast(Ptr->getType()); unsigned Size = DL.getTypeAllocSize(PtrTy->getElementType()); // An alignment of 0 means target ABI alignment. unsigned Align = LI ? LI->getAlignment() : SI->getAlignment(); if (!Align) Align = DL.getABITypeAlignment(PtrTy->getElementType()); StrideAccesses[I] = StrideDescriptor(Stride, Scev, Size, Align); } } // Analyze interleaved accesses and collect them into interleave groups. // // Notice that the vectorization on interleaved groups will change instruction // orders and may break dependences. But the memory dependence check guarantees // that there is no overlap between two pointers of different strides, element // sizes or underlying bases. // // For pointers sharing the same stride, element size and underlying base, no // need to worry about Read-After-Write dependences and Write-After-Read // dependences. // // E.g. The RAW dependence: A[i] = a; // b = A[i]; // This won't exist as it is a store-load forwarding conflict, which has // already been checked and forbidden in the dependence check. // // E.g. The WAR dependence: a = A[i]; // (1) // A[i] = b; // (2) // The store group of (2) is always inserted at or below (2), and the load group // of (1) is always inserted at or above (1). 
// The dependence is safe.
void InterleavedAccessInfo::analyzeInterleaving(
    const ValueToValueMap &Strides) {
  DEBUG(dbgs() << "LV: Analyzing interleaved accesses...\n");

  // Holds all the stride accesses.
  MapVector<Instruction *, StrideDescriptor> StrideAccesses;
  collectConstStridedAccesses(StrideAccesses, Strides);

  if (StrideAccesses.empty())
    return;

  // Holds all interleaved store groups temporarily.
  SmallSetVector<InterleaveGroup *, 4> StoreGroups;
+  // Holds all interleaved load groups temporarily.
+  SmallSetVector<InterleaveGroup *, 4> LoadGroups;

  // Search the load-load/write-write pair B-A in bottom-up order and try to
  // insert B into the interleave group of A according to 3 rules:
  //   1. A and B have the same stride.
  //   2. A and B have the same memory object size.
  //   3. B belongs to the group according to the distance.
  //
  // The bottom-up order can avoid breaking the Write-After-Write dependences
  // between two pointers of the same base.
  // E.g.  A[i]   = a;   (1)
  //       A[i]   = b;   (2)
  //       A[i+1] = c;   (3)
  // We form the group (2)+(3) in front, so (1) has to form groups with accesses
  // above (1), which guarantees that (1) is always above (2).
  for (auto I = StrideAccesses.rbegin(), E = StrideAccesses.rend(); I != E;
       ++I) {
    Instruction *A = I->first;
    StrideDescriptor DesA = I->second;

    InterleaveGroup *Group = getInterleaveGroup(A);
    if (!Group) {
      DEBUG(dbgs() << "LV: Creating an interleave group with:" << *A << '\n');
      Group = createInterleaveGroup(A, DesA.Stride, DesA.Align);
    }

    if (A->mayWriteToMemory())
      StoreGroups.insert(Group);
+    else
+      LoadGroups.insert(Group);

    for (auto II = std::next(I); II != E; ++II) {
      Instruction *B = II->first;
      StrideDescriptor DesB = II->second;

      // Ignore if B is already in a group or B is a different memory operation.
      if (isInterleaved(B) || A->mayReadFromMemory() != B->mayReadFromMemory())
        continue;

      // Check rules 1 and 2.
      if (DesB.Stride != DesA.Stride || DesB.Size != DesA.Size)
        continue;

      // Calculate the distance and prepare for rule 3.
      const SCEVConstant *DistToA = dyn_cast<SCEVConstant>(
          PSE.getSE()->getMinusSCEV(DesB.Scev, DesA.Scev));
      if (!DistToA)
        continue;
      int DistanceToA = DistToA->getAPInt().getSExtValue();

      // Skip if the distance is not a multiple of the size, as B then cannot
      // be in the same group.
      if (DistanceToA % static_cast<int>(DesA.Size))
        continue;

      // The index of B is the index of A plus B's offset from A in elements.
      int IndexB =
          Group->getIndex(A) + DistanceToA / static_cast<int>(DesA.Size);

      // Try to insert B into the group.
      if (Group->insertMember(B, IndexB, DesB.Align)) {
        DEBUG(dbgs() << "LV: Inserted:" << *B << '\n'
                     << "    into the interleave group with" << *A << '\n');
        InterleaveGroupMap[B] = Group;

        // Set the first load in program order as the insert position.
        if (B->mayReadFromMemory())
          Group->setInsertPos(B);
      }
    } // Iteration on instruction B
  }   // Iteration on instruction A

  // Remove interleaved store groups with gaps.
  for (InterleaveGroup *Group : StoreGroups)
    if (Group->getNumMembers() != Group->getFactor())
      releaseGroup(Group);
+
+  // Remove interleaved load groups that don't have the first and last member.
+  // This guarantees that we won't do speculative out-of-bounds loads.
+  for (InterleaveGroup *Group : LoadGroups)
+    if (!Group->getMember(0) || !Group->getMember(Group->getFactor() - 1))
+      releaseGroup(Group);
}

LoopVectorizationCostModel::VectorizationFactor
LoopVectorizationCostModel::selectVectorizationFactor(bool OptForSize) {
  // Width 1 means no vectorization.
  VectorizationFactor Factor = { 1U, 0U };
  if (OptForSize && Legal->getRuntimePointerChecking()->Need) {
    emitAnalysis(VectorizationReport()
                 << "runtime pointer checks needed. "
                    "Enable vectorization of this "
                    "loop with '#pragma clang loop vectorize(enable)' when "
                    "compiling with -Os/-Oz");
    DEBUG(dbgs()
          << "LV: Aborting. Runtime ptr check is required with -Os/-Oz.\n");
    return Factor;
  }

  if (!EnableCondStoresVectorization && Legal->getNumPredStores()) {
    emitAnalysis(VectorizationReport()
                 << "store that is conditionally executed prevents "
                    "vectorization");
    DEBUG(dbgs() << "LV: No vectorization. There are conditional stores.\n");
    return Factor;
  }

  // Find the trip count.
-  unsigned TC = PSE.getSE()->getSmallConstantTripCount(TheLoop);
+  unsigned TC = SE->getSmallConstantTripCount(TheLoop);
  DEBUG(dbgs() << "LV: Found trip count: " << TC << '\n');

  MinBWs = computeMinimumValueSizes(TheLoop->getBlocks(), *DB, &TTI);
  unsigned SmallestType, WidestType;
  std::tie(SmallestType, WidestType) = getSmallestAndWidestTypes();
  unsigned WidestRegister = TTI.getRegisterBitWidth(true);
  unsigned MaxSafeDepDist = -1U;
  if (Legal->getMaxSafeDepDistBytes() != -1U)
    MaxSafeDepDist = Legal->getMaxSafeDepDistBytes() * 8;
  WidestRegister =
      ((WidestRegister < MaxSafeDepDist) ? WidestRegister : MaxSafeDepDist);
  unsigned MaxVectorSize = WidestRegister / WidestType;

  DEBUG(dbgs() << "LV: The Smallest and Widest types: " << SmallestType
               << " / " << WidestType << " bits.\n");
  DEBUG(dbgs() << "LV: The Widest register is: " << WidestRegister
               << " bits.\n");

  if (MaxVectorSize == 0) {
    DEBUG(dbgs() << "LV: The target has no vector registers.\n");
    MaxVectorSize = 1;
  }

  assert(MaxVectorSize <= 64 && "Did not expect to pack so many elements"
                                " into one vector!");

  unsigned VF = MaxVectorSize;

  if (MaximizeBandwidth && !OptForSize) {
    // Collect all viable vectorization factors.
    SmallVector<unsigned, 8> VFs;
    unsigned NewMaxVectorSize = WidestRegister / SmallestType;
    for (unsigned VS = MaxVectorSize; VS <= NewMaxVectorSize; VS *= 2)
      VFs.push_back(VS);

    // For each VF calculate its register usage.
    auto RUs = calculateRegisterUsage(VFs);

    // Select the largest VF which doesn't require more registers than existing
    // ones.
    unsigned TargetNumRegisters = TTI.getNumberOfRegisters(true);
    for (int i = RUs.size() - 1; i >= 0; --i) {
      if (RUs[i].MaxLocalUsers <= TargetNumRegisters) {
        VF = VFs[i];
        break;
      }
    }
  }

  // If we optimize the program for size, avoid creating the tail loop.
  if (OptForSize) {
    // If we are unable to calculate the trip count then don't try to
    // vectorize.
    if (TC < 2) {
      emitAnalysis(
          VectorizationReport()
          << "unable to calculate the loop count due to complex control flow");
      DEBUG(dbgs() << "LV: Aborting. A tail loop is required with -Os/-Oz.\n");
      return Factor;
    }

    // Find the maximum SIMD width that can fit within the trip count.
    VF = TC % MaxVectorSize;

    if (VF == 0)
      VF = MaxVectorSize;
    else {
      // If the trip count that we found modulo the vectorization factor is not
      // zero then we require a tail.
      emitAnalysis(VectorizationReport()
                   << "cannot optimize for size and vectorize at the "
                      "same time. Enable vectorization of this loop "
                      "with '#pragma clang loop vectorize(enable)' "
                      "when compiling with -Os/-Oz");
      DEBUG(dbgs() << "LV: Aborting. "
                      "A tail loop is required with -Os/-Oz.\n");
      return Factor;
    }
  }

  int UserVF = Hints->getWidth();
  if (UserVF != 0) {
    assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two");
    DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");

    Factor.Width = UserVF;
    return Factor;
  }

  float Cost = expectedCost(1);
#ifndef NDEBUG
  const float ScalarCost = Cost;
#endif /* NDEBUG */
  unsigned Width = 1;
  DEBUG(dbgs() << "LV: Scalar loop costs: " << (int)ScalarCost << ".\n");

  bool ForceVectorization = Hints->getForce() == LoopVectorizeHints::FK_Enabled;
  // Ignore scalar width, because the user explicitly wants vectorization.
  if (ForceVectorization && VF > 1) {
    Width = 2;
    Cost = expectedCost(Width) / (float)Width;
  }

  for (unsigned i = 2; i <= VF; i *= 2) {
    // Notice that the vector loop needs to be executed fewer times, so
    // we need to divide the cost of the vector loop by the width of
    // the vector elements.
    float VectorCost = expectedCost(i) / (float)i;
    DEBUG(dbgs() << "LV: Vector loop of width " << i
                 << " costs: " << (int)VectorCost << ".\n");
    if (VectorCost < Cost) {
      Cost = VectorCost;
      Width = i;
    }
  }

  DEBUG(if (ForceVectorization && Width > 1 && Cost >= ScalarCost) dbgs()
        << "LV: Vectorization seems to be not beneficial, "
        << "but was forced by a user.\n");
  DEBUG(dbgs() << "LV: Selecting VF: " << Width << ".\n");
  Factor.Width = Width;
  Factor.Cost = Width * Cost;
  return Factor;
}

std::pair<unsigned, unsigned>
LoopVectorizationCostModel::getSmallestAndWidestTypes() {
  unsigned MinWidth = -1U;
  unsigned MaxWidth = 8;
  const DataLayout &DL = TheFunction->getParent()->getDataLayout();

  // For each block.
  for (Loop::block_iterator bb = TheLoop->block_begin(),
       be = TheLoop->block_end(); bb != be; ++bb) {
    BasicBlock *BB = *bb;

    // For each instruction in the loop.
    for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {
      Type *T = it->getType();

      // Skip ignored values.
      if (ValuesToIgnore.count(&*it))
        continue;

      // Only examine Loads, Stores and PHINodes.
      if (!isa<LoadInst>(it) && !isa<StoreInst>(it) && !isa<PHINode>(it))
        continue;

      // Examine PHI nodes that are reduction variables. Update the type to
      // account for the recurrence type.
      if (PHINode *PN = dyn_cast<PHINode>(it)) {
        if (!Legal->isReductionVariable(PN))
          continue;
        RecurrenceDescriptor RdxDesc = (*Legal->getReductionVars())[PN];
        T = RdxDesc.getRecurrenceType();
      }

      // Examine the stored values.
      if (StoreInst *ST = dyn_cast<StoreInst>(it))
        T = ST->getValueOperand()->getType();

      // Ignore loaded pointer types and stored pointer types that are not
      // consecutive. However, we do want to take consecutive stores/loads of
      // pointer vectors into account.
      if (T->isPointerTy() && !isConsecutiveLoadOrStore(&*it))
        continue;

      MinWidth = std::min(MinWidth,
                          (unsigned)DL.getTypeSizeInBits(T->getScalarType()));
      MaxWidth = std::max(MaxWidth,
                          (unsigned)DL.getTypeSizeInBits(T->getScalarType()));
    }
  }
  return {MinWidth, MaxWidth};
}

unsigned LoopVectorizationCostModel::selectInterleaveCount(bool OptForSize,
                                                           unsigned VF,
                                                           unsigned LoopCost) {
  // -- The interleave heuristics --
  // We interleave the loop in order to expose ILP and reduce the loop overhead.
  // There are many micro-architectural considerations that we can't predict
  // at this level. For example, frontend pressure (on decode or fetch) due to
  // code size, or the number and capabilities of the execution ports.
  //
  // We use the following heuristics to select the interleave count:
  // 1. If the code has reductions, then we interleave to break the cross
  //    iteration dependency.
  // 2. If the loop is really small, then we interleave to reduce the loop
  //    overhead.
  // 3. We don't interleave if we think that we will spill registers to memory
  //    due to the increased register pressure.

  // When we optimize for size, we don't interleave.
  if (OptForSize)
    return 1;

  // We used the distance for the interleave count.
  if (Legal->getMaxSafeDepDistBytes() != -1U)
    return 1;

  // Do not interleave loops with a relatively small trip count.
-  unsigned TC = PSE.getSE()->getSmallConstantTripCount(TheLoop);
+  unsigned TC = SE->getSmallConstantTripCount(TheLoop);
  if (TC > 1 && TC < TinyTripCountInterleaveThreshold)
    return 1;

  unsigned TargetNumRegisters = TTI.getNumberOfRegisters(VF > 1);
  DEBUG(dbgs() << "LV: The target has " << TargetNumRegisters
               << " registers\n");

  if (VF == 1) {
    if (ForceTargetNumScalarRegs.getNumOccurrences() > 0)
      TargetNumRegisters = ForceTargetNumScalarRegs;
  } else {
    if (ForceTargetNumVectorRegs.getNumOccurrences() > 0)
      TargetNumRegisters = ForceTargetNumVectorRegs;
  }

  RegisterUsage R = calculateRegisterUsage({VF})[0];
  // We divide by these constants so assume that we have at least one
  // instruction that uses at least one register.
  R.MaxLocalUsers = std::max(R.MaxLocalUsers, 1U);
  R.NumInstructions = std::max(R.NumInstructions, 1U);

  // We calculate the interleave count using the following formula.
  // Subtract the number of loop invariants from the number of available
  // registers. These registers are used by all of the interleaved instances.
  // Next, divide the remaining registers by the number of registers that is
  // required by the loop, in order to estimate how many parallel instances
  // fit without causing spills. All of this is rounded down if necessary to be
  // a power of two. We want power of two interleave count to simplify any
  // addressing operations or alignment considerations.
  unsigned IC = PowerOf2Floor((TargetNumRegisters - R.LoopInvariantRegs) /
                              R.MaxLocalUsers);

  // Don't count the induction variable as interleaved.
  if (EnableIndVarRegisterHeur)
    IC = PowerOf2Floor((TargetNumRegisters - R.LoopInvariantRegs - 1) /
                       std::max(1U, (R.MaxLocalUsers - 1)));

  // Clamp the interleave ranges to reasonable counts.
  unsigned MaxInterleaveCount = TTI.getMaxInterleaveFactor(VF);

  // Check if the user has overridden the max.
  if (VF == 1) {
    if (ForceTargetMaxScalarInterleaveFactor.getNumOccurrences() > 0)
      MaxInterleaveCount = ForceTargetMaxScalarInterleaveFactor;
  } else {
    if (ForceTargetMaxVectorInterleaveFactor.getNumOccurrences() > 0)
      MaxInterleaveCount = ForceTargetMaxVectorInterleaveFactor;
  }

  // If we did not calculate the cost for VF (because the user selected the VF)
  // then we calculate the cost of VF here.
  if (LoopCost == 0)
    LoopCost = expectedCost(VF);

  // Clamp the calculated IC to be between 1 and the max interleave count
  // that the target allows.
  if (IC > MaxInterleaveCount)
    IC = MaxInterleaveCount;
  else if (IC < 1)
    IC = 1;

  // Interleave if we vectorized this loop and there is a reduction that could
  // benefit from interleaving.
  if (VF > 1 && Legal->getReductionVars()->size()) {
    DEBUG(dbgs() << "LV: Interleaving because of reductions.\n");
    return IC;
  }

  // Note that if we've already vectorized the loop we will have done the
  // runtime check and so interleaving won't require further checks.
  bool InterleavingRequiresRuntimePointerCheck =
      (VF == 1 && Legal->getRuntimePointerChecking()->Need);

  // We want to interleave small loops in order to reduce the loop overhead and
  // potentially expose ILP opportunities.
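  //
  // Illustrative sketch (not from the original source): assuming SmallLoopCost
  // is 20 (consistent with the "about 5%" note below), an estimated LoopCost
  // of 4 lets the clamp below allow an interleave count of up to
  // PowerOf2Floor(20 / 4) = 4, keeping the loop overhead a small fraction of
  // the loop's own cost.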
DEBUG(dbgs() << "LV: Loop cost is " << LoopCost << '\n'); if (!InterleavingRequiresRuntimePointerCheck && LoopCost < SmallLoopCost) { // We assume that the cost overhead is 1 and we use the cost model // to estimate the cost of the loop and interleave until the cost of the // loop overhead is about 5% of the cost of the loop. unsigned SmallIC = std::min(IC, (unsigned)PowerOf2Floor(SmallLoopCost / LoopCost)); // Interleave until store/load ports (estimated by max interleave count) are // saturated. unsigned NumStores = Legal->getNumStores(); unsigned NumLoads = Legal->getNumLoads(); unsigned StoresIC = IC / (NumStores ? NumStores : 1); unsigned LoadsIC = IC / (NumLoads ? NumLoads : 1); // If we have a scalar reduction (vector reductions are already dealt with // by this point), we can increase the critical path length if the loop // we're interleaving is inside another loop. Limit, by default to 2, so the // critical path only gets increased by one reduction operation. if (Legal->getReductionVars()->size() && TheLoop->getLoopDepth() > 1) { unsigned F = static_cast(MaxNestedScalarReductionIC); SmallIC = std::min(SmallIC, F); StoresIC = std::min(StoresIC, F); LoadsIC = std::min(LoadsIC, F); } if (EnableLoadStoreRuntimeInterleave && std::max(StoresIC, LoadsIC) > SmallIC) { DEBUG(dbgs() << "LV: Interleaving to saturate store or load ports.\n"); return std::max(StoresIC, LoadsIC); } DEBUG(dbgs() << "LV: Interleaving to reduce branch cost.\n"); return SmallIC; } // Interleave if this is a large loop (small loops are already dealt with by // this point) that could benefit from interleaving. bool HasReductions = (Legal->getReductionVars()->size() > 0); if (TTI.enableAggressiveInterleaving(HasReductions)) { DEBUG(dbgs() << "LV: Interleaving to expose ILP.\n"); return IC; } DEBUG(dbgs() << "LV: Not Interleaving.\n"); return 1; } SmallVector LoopVectorizationCostModel::calculateRegisterUsage( const SmallVector &VFs) { // This function calculates the register usage by measuring the highest number // of values that are alive at a single location. Obviously, this is a very // rough estimation. We scan the loop in a topological order in order and // assign a number to each instruction. We use RPO to ensure that defs are // met before their users. We assume that each instruction that has in-loop // users starts an interval. We record every time that an in-loop value is // used, so we have a list of the first and last occurrences of each // instruction. Next, we transpose this data structure into a multi map that // holds the list of intervals that *end* at a specific location. This multi // map allows us to perform a linear search. We scan the instructions linearly // and record each time that a new interval starts, by placing it in a set. // If we find this value in the multi-map then we remove it from the set. // The max register usage is the maximum size of the set. // We also search for instructions that are defined outside the loop, but are // used inside the loop. We need this number separately from the max-interval // usage number because when we unroll, loop-invariant values do not take // more register. LoopBlocksDFS DFS(TheLoop); DFS.perform(LI); RegisterUsage RU; RU.NumInstructions = 0; // Each 'key' in the map opens a new interval. The values // of the map are the index of the 'last seen' usage of the // instruction that is the key. typedef DenseMap IntervalMap; // Maps instruction to its index. DenseMap IdxToInstr; // Marks the end of each interval. 
IntervalMap EndPoint; // Saves the list of instruction indices that are used in the loop. SmallSet Ends; // Saves the list of values that are used in the loop but are // defined outside the loop, such as arguments and constants. SmallPtrSet LoopInvariants; unsigned Index = 0; for (LoopBlocksDFS::RPOIterator bb = DFS.beginRPO(), be = DFS.endRPO(); bb != be; ++bb) { RU.NumInstructions += (*bb)->size(); for (Instruction &I : **bb) { IdxToInstr[Index++] = &I; // Save the end location of each USE. for (unsigned i = 0; i < I.getNumOperands(); ++i) { Value *U = I.getOperand(i); Instruction *Instr = dyn_cast(U); // Ignore non-instruction values such as arguments, constants, etc. if (!Instr) continue; // If this instruction is outside the loop then record it and continue. if (!TheLoop->contains(Instr)) { LoopInvariants.insert(Instr); continue; } // Overwrite previous end points. EndPoint[Instr] = Index; Ends.insert(Instr); } } } // Saves the list of intervals that end with the index in 'key'. typedef SmallVector InstrList; DenseMap TransposeEnds; // Transpose the EndPoints to a list of values that end at each index. for (IntervalMap::iterator it = EndPoint.begin(), e = EndPoint.end(); it != e; ++it) TransposeEnds[it->second].push_back(it->first); SmallSet OpenIntervals; // Get the size of the widest register. unsigned MaxSafeDepDist = -1U; if (Legal->getMaxSafeDepDistBytes() != -1U) MaxSafeDepDist = Legal->getMaxSafeDepDistBytes() * 8; unsigned WidestRegister = std::min(TTI.getRegisterBitWidth(true), MaxSafeDepDist); const DataLayout &DL = TheFunction->getParent()->getDataLayout(); SmallVector RUs(VFs.size()); SmallVector MaxUsages(VFs.size(), 0); DEBUG(dbgs() << "LV(REG): Calculating max register usage:\n"); // A lambda that gets the register usage for the given type and VF. auto GetRegUsage = [&DL, WidestRegister](Type *Ty, unsigned VF) { unsigned TypeSize = DL.getTypeSizeInBits(Ty->getScalarType()); return std::max(1, VF * TypeSize / WidestRegister); }; for (unsigned int i = 0; i < Index; ++i) { Instruction *I = IdxToInstr[i]; // Ignore instructions that are never used within the loop. if (!Ends.count(I)) continue; + // Skip ignored values. + if (ValuesToIgnore.count(I)) + continue; + // Remove all of the instructions that end at this location. InstrList &List = TransposeEnds[i]; for (unsigned int j = 0, e = List.size(); j < e; ++j) OpenIntervals.erase(List[j]); - // Skip ignored values. - if (ValuesToIgnore.count(I)) - continue; - // For each VF find the maximum usage of registers. for (unsigned j = 0, e = VFs.size(); j < e; ++j) { if (VFs[j] == 1) { MaxUsages[j] = std::max(MaxUsages[j], OpenIntervals.size()); continue; } // Count the number of live intervals. unsigned RegUsage = 0; - for (auto Inst : OpenIntervals) { - // Skip ignored values for VF > 1. - if (VecValuesToIgnore.count(Inst)) - continue; + for (auto Inst : OpenIntervals) RegUsage += GetRegUsage(Inst->getType(), VFs[j]); - } MaxUsages[j] = std::max(MaxUsages[j], RegUsage); } DEBUG(dbgs() << "LV(REG): At #" << i << " Interval # " << OpenIntervals.size() << '\n'); // Add the current instruction to the list of open intervals. 
OpenIntervals.insert(I); } for (unsigned i = 0, e = VFs.size(); i < e; ++i) { unsigned Invariant = 0; if (VFs[i] == 1) Invariant = LoopInvariants.size(); else { for (auto Inst : LoopInvariants) Invariant += GetRegUsage(Inst->getType(), VFs[i]); } DEBUG(dbgs() << "LV(REG): VF = " << VFs[i] << '\n'); DEBUG(dbgs() << "LV(REG): Found max usage: " << MaxUsages[i] << '\n'); DEBUG(dbgs() << "LV(REG): Found invariant usage: " << Invariant << '\n'); DEBUG(dbgs() << "LV(REG): LoopSize: " << RU.NumInstructions << '\n'); RU.LoopInvariantRegs = Invariant; RU.MaxLocalUsers = MaxUsages[i]; RUs[i] = RU; } return RUs; } unsigned LoopVectorizationCostModel::expectedCost(unsigned VF) { unsigned Cost = 0; // For each block. for (Loop::block_iterator bb = TheLoop->block_begin(), be = TheLoop->block_end(); bb != be; ++bb) { unsigned BlockCost = 0; BasicBlock *BB = *bb; // For each instruction in the old loop. for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) { // Skip dbg intrinsics. if (isa(it)) continue; // Skip ignored values. if (ValuesToIgnore.count(&*it)) continue; unsigned C = getInstructionCost(&*it, VF); // Check if we should override the cost. if (ForceTargetInstructionCost.getNumOccurrences() > 0) C = ForceTargetInstructionCost; BlockCost += C; DEBUG(dbgs() << "LV: Found an estimated cost of " << C << " for VF " << VF << " For instruction: " << *it << '\n'); } // We assume that if-converted blocks have a 50% chance of being executed. // When the code is scalar then some of the blocks are avoided due to CF. // When the code is vectorized we execute all code paths. if (VF == 1 && Legal->blockNeedsPredication(*bb)) BlockCost /= 2; Cost += BlockCost; } return Cost; } /// \brief Check whether the address computation for a non-consecutive memory /// access looks like an unlikely candidate for being merged into the indexing /// mode. /// /// We look for a GEP which has one index that is an induction variable and all /// other indices are loop invariant. If the stride of this access is also /// within a small bound we decide that this address computation can likely be /// merged into the addressing mode. /// In all other cases, we identify the address computation as complex. static bool isLikelyComplexAddressComputation(Value *Ptr, LoopVectorizationLegality *Legal, ScalarEvolution *SE, const Loop *TheLoop) { GetElementPtrInst *Gep = dyn_cast(Ptr); if (!Gep) return true; // We are looking for a gep with all loop invariant indices except for one // which should be an induction variable. unsigned NumOperands = Gep->getNumOperands(); for (unsigned i = 1; i < NumOperands; ++i) { Value *Opd = Gep->getOperand(i); if (!SE->isLoopInvariant(SE->getSCEV(Opd), TheLoop) && !Legal->isInductionVariable(Opd)) return true; } // Now we know we have a GEP ptr, %inv, %ind, %inv. Make sure that the step // can likely be merged into the address computation. unsigned MaxMergeDistance = 64; const SCEVAddRecExpr *AddRec = dyn_cast(SE->getSCEV(Ptr)); if (!AddRec) return true; // Check the step is constant. const SCEV *Step = AddRec->getStepRecurrence(*SE); // Calculate the pointer stride and check if it is consecutive. const SCEVConstant *C = dyn_cast(Step); if (!C) return true; const APInt &APStepVal = C->getAPInt(); // Huge step value - give up. 
if (APStepVal.getBitWidth() > 64) return true; int64_t StepVal = APStepVal.getSExtValue(); return StepVal > MaxMergeDistance; } static bool isStrideMul(Instruction *I, LoopVectorizationLegality *Legal) { return Legal->hasStride(I->getOperand(0)) || Legal->hasStride(I->getOperand(1)); } unsigned LoopVectorizationCostModel::getInstructionCost(Instruction *I, unsigned VF) { // If we know that this instruction will remain uniform, check the cost of // the scalar version. if (Legal->isUniformAfterVectorization(I)) VF = 1; Type *RetTy = I->getType(); if (VF > 1 && MinBWs.count(I)) RetTy = IntegerType::get(RetTy->getContext(), MinBWs[I]); Type *VectorTy = ToVectorTy(RetTy, VF); - auto SE = PSE.getSE(); // TODO: We need to estimate the cost of intrinsic calls. switch (I->getOpcode()) { case Instruction::GetElementPtr: // We mark this instruction as zero-cost because the cost of GEPs in // vectorized code depends on whether the corresponding memory instruction // is scalarized or not. Therefore, we handle GEPs with the memory // instruction cost. return 0; case Instruction::Br: { return TTI.getCFInstrCost(I->getOpcode()); } case Instruction::PHI: //TODO: IF-converted IFs become selects. return 0; case Instruction::Add: case Instruction::FAdd: case Instruction::Sub: case Instruction::FSub: case Instruction::Mul: case Instruction::FMul: case Instruction::UDiv: case Instruction::SDiv: case Instruction::FDiv: case Instruction::URem: case Instruction::SRem: case Instruction::FRem: case Instruction::Shl: case Instruction::LShr: case Instruction::AShr: case Instruction::And: case Instruction::Or: case Instruction::Xor: { // Since we will replace the stride by 1 the multiplication should go away. if (I->getOpcode() == Instruction::Mul && isStrideMul(I, Legal)) return 0; // Certain instructions can be cheaper to vectorize if they have a constant // second vector operand. One example of this are shifts on x86. TargetTransformInfo::OperandValueKind Op1VK = TargetTransformInfo::OK_AnyValue; TargetTransformInfo::OperandValueKind Op2VK = TargetTransformInfo::OK_AnyValue; TargetTransformInfo::OperandValueProperties Op1VP = TargetTransformInfo::OP_None; TargetTransformInfo::OperandValueProperties Op2VP = TargetTransformInfo::OP_None; Value *Op2 = I->getOperand(1); // Check for a splat of a constant or for a non uniform vector of constants. 
if (isa(Op2)) { ConstantInt *CInt = cast(Op2); if (CInt && CInt->getValue().isPowerOf2()) Op2VP = TargetTransformInfo::OP_PowerOf2; Op2VK = TargetTransformInfo::OK_UniformConstantValue; } else if (isa(Op2) || isa(Op2)) { Op2VK = TargetTransformInfo::OK_NonUniformConstantValue; Constant *SplatValue = cast(Op2)->getSplatValue(); if (SplatValue) { ConstantInt *CInt = dyn_cast(SplatValue); if (CInt && CInt->getValue().isPowerOf2()) Op2VP = TargetTransformInfo::OP_PowerOf2; Op2VK = TargetTransformInfo::OK_UniformConstantValue; } } return TTI.getArithmeticInstrCost(I->getOpcode(), VectorTy, Op1VK, Op2VK, Op1VP, Op2VP); } case Instruction::Select: { SelectInst *SI = cast(I); const SCEV *CondSCEV = SE->getSCEV(SI->getCondition()); bool ScalarCond = (SE->isLoopInvariant(CondSCEV, TheLoop)); Type *CondTy = SI->getCondition()->getType(); if (!ScalarCond) CondTy = VectorType::get(CondTy, VF); return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, CondTy); } case Instruction::ICmp: case Instruction::FCmp: { Type *ValTy = I->getOperand(0)->getType(); Instruction *Op0AsInstruction = dyn_cast(I->getOperand(0)); auto It = MinBWs.find(Op0AsInstruction); if (VF > 1 && It != MinBWs.end()) ValTy = IntegerType::get(ValTy->getContext(), It->second); VectorTy = ToVectorTy(ValTy, VF); return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy); } case Instruction::Store: case Instruction::Load: { StoreInst *SI = dyn_cast(I); LoadInst *LI = dyn_cast(I); Type *ValTy = (SI ? SI->getValueOperand()->getType() : LI->getType()); VectorTy = ToVectorTy(ValTy, VF); unsigned Alignment = SI ? SI->getAlignment() : LI->getAlignment(); unsigned AS = SI ? SI->getPointerAddressSpace() : LI->getPointerAddressSpace(); Value *Ptr = SI ? SI->getPointerOperand() : LI->getPointerOperand(); // We add the cost of address computation here instead of with the gep // instruction because only here we know whether the operation is // scalarized. if (VF == 1) return TTI.getAddressComputationCost(VectorTy) + TTI.getMemoryOpCost(I->getOpcode(), VectorTy, Alignment, AS); // For an interleaved access, calculate the total cost of the whole // interleave group. if (Legal->isAccessInterleaved(I)) { auto Group = Legal->getInterleavedAccessGroup(I); assert(Group && "Fail to get an interleaved access group."); // Only calculate the cost once at the insert position. if (Group->getInsertPos() != I) return 0; unsigned InterleaveFactor = Group->getFactor(); Type *WideVecTy = VectorType::get(VectorTy->getVectorElementType(), VectorTy->getVectorNumElements() * InterleaveFactor); // Holds the indices of existing members in an interleaved load group. // An interleaved store group doesn't need this as it dones't allow gaps. SmallVector Indices; if (LI) { for (unsigned i = 0; i < InterleaveFactor; i++) if (Group->getMember(i)) Indices.push_back(i); } // Calculate the cost of the whole interleaved group. unsigned Cost = TTI.getInterleavedMemoryOpCost( I->getOpcode(), WideVecTy, Group->getFactor(), Indices, Group->getAlignment(), AS); if (Group->isReverse()) Cost += Group->getNumMembers() * TTI.getShuffleCost(TargetTransformInfo::SK_Reverse, VectorTy, 0); // FIXME: The interleaved load group with a huge gap could be even more // expensive than scalar operations. Then we could ignore such group and // use scalar operations instead. return Cost; } // Scalarized loads/stores. 
int ConsecutiveStride = Legal->isConsecutivePtr(Ptr); bool Reverse = ConsecutiveStride < 0; const DataLayout &DL = I->getModule()->getDataLayout(); unsigned ScalarAllocatedSize = DL.getTypeAllocSize(ValTy); unsigned VectorElementSize = DL.getTypeStoreSize(VectorTy) / VF; if (!ConsecutiveStride || ScalarAllocatedSize != VectorElementSize) { bool IsComplexComputation = isLikelyComplexAddressComputation(Ptr, Legal, SE, TheLoop); unsigned Cost = 0; // The cost of extracting from the value vector and pointer vector. Type *PtrTy = ToVectorTy(Ptr->getType(), VF); for (unsigned i = 0; i < VF; ++i) { // The cost of extracting the pointer operand. Cost += TTI.getVectorInstrCost(Instruction::ExtractElement, PtrTy, i); // In case of STORE, the cost of ExtractElement from the vector. // In case of LOAD, the cost of InsertElement into the returned // vector. Cost += TTI.getVectorInstrCost(SI ? Instruction::ExtractElement : Instruction::InsertElement, VectorTy, i); } // The cost of the scalar loads/stores. Cost += VF * TTI.getAddressComputationCost(PtrTy, IsComplexComputation); Cost += VF * TTI.getMemoryOpCost(I->getOpcode(), ValTy->getScalarType(), Alignment, AS); return Cost; } // Wide load/stores. unsigned Cost = TTI.getAddressComputationCost(VectorTy); if (Legal->isMaskRequired(I)) Cost += TTI.getMaskedMemoryOpCost(I->getOpcode(), VectorTy, Alignment, AS); else Cost += TTI.getMemoryOpCost(I->getOpcode(), VectorTy, Alignment, AS); if (Reverse) Cost += TTI.getShuffleCost(TargetTransformInfo::SK_Reverse, VectorTy, 0); return Cost; } case Instruction::ZExt: case Instruction::SExt: case Instruction::FPToUI: case Instruction::FPToSI: case Instruction::FPExt: case Instruction::PtrToInt: case Instruction::IntToPtr: case Instruction::SIToFP: case Instruction::UIToFP: case Instruction::Trunc: case Instruction::FPTrunc: case Instruction::BitCast: { // We optimize the truncation of induction variable. // The cost of these is the same as the scalar operation. if (I->getOpcode() == Instruction::Trunc && Legal->isInductionVariable(I->getOperand(0))) return TTI.getCastInstrCost(I->getOpcode(), I->getType(), I->getOperand(0)->getType()); Type *SrcScalarTy = I->getOperand(0)->getType(); Type *SrcVecTy = ToVectorTy(SrcScalarTy, VF); if (VF > 1 && MinBWs.count(I)) { // This cast is going to be shrunk. This may remove the cast or it might // turn it into slightly different cast. For example, if MinBW == 16, // "zext i8 %1 to i32" becomes "zext i8 %1 to i16". // // Calculate the modified src and dest types. Type *MinVecTy = VectorTy; if (I->getOpcode() == Instruction::Trunc) { SrcVecTy = smallestIntegerVectorType(SrcVecTy, MinVecTy); VectorTy = largestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy); } else if (I->getOpcode() == Instruction::ZExt || I->getOpcode() == Instruction::SExt) { SrcVecTy = largestIntegerVectorType(SrcVecTy, MinVecTy); VectorTy = smallestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy); } } return TTI.getCastInstrCost(I->getOpcode(), VectorTy, SrcVecTy); } case Instruction::Call: { bool NeedToScalarize; CallInst *CI = cast(I); unsigned CallCost = getVectorCallCost(CI, VF, TTI, TLI, NeedToScalarize); if (getIntrinsicIDForCall(CI, TLI)) return std::min(CallCost, getVectorIntrinsicCost(CI, VF, TTI, TLI)); return CallCost; } default: { // We are scalarizing the instruction. Return the cost of the scalar // instruction, plus the cost of insert and extract into vector // elements, times the vector width. 
    unsigned Cost = 0;

    if (!RetTy->isVoidTy() && VF != 1) {
      unsigned InsCost =
          TTI.getVectorInstrCost(Instruction::InsertElement, VectorTy);
      unsigned ExtCost =
          TTI.getVectorInstrCost(Instruction::ExtractElement, VectorTy);

      // The cost of inserting the results plus extracting each one of the
      // operands.
      Cost += VF * (InsCost + ExtCost * I->getNumOperands());
    }

    // The cost of executing VF copies of the scalar instruction. This opcode
    // is unknown. Assume that it is the same as 'mul'.
    Cost += VF * TTI.getArithmeticInstrCost(Instruction::Mul, VectorTy);
    return Cost;
  }
  } // end of switch.
}

char LoopVectorize::ID = 0;
static const char lv_name[] = "Loop Vectorization";

INITIALIZE_PASS_BEGIN(LoopVectorize, LV_NAME, lv_name, false, false)
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(BasicAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(BlockFrequencyInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(ScalarEvolutionWrapperPass)
INITIALIZE_PASS_DEPENDENCY(LCSSA)
INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(LoopSimplify)
INITIALIZE_PASS_DEPENDENCY(LoopAccessAnalysis)
INITIALIZE_PASS_DEPENDENCY(DemandedBits)
INITIALIZE_PASS_END(LoopVectorize, LV_NAME, lv_name, false, false)

namespace llvm {
Pass *createLoopVectorizePass(bool NoUnrolling, bool AlwaysVectorize) {
  return new LoopVectorize(NoUnrolling, AlwaysVectorize);
}
}

bool LoopVectorizationCostModel::isConsecutiveLoadOrStore(Instruction *Inst) {
  // Check for a store.
  if (StoreInst *ST = dyn_cast<StoreInst>(Inst))
    return Legal->isConsecutivePtr(ST->getPointerOperand()) != 0;

  // Check for a load.
  if (LoadInst *LI = dyn_cast<LoadInst>(Inst))
    return Legal->isConsecutivePtr(LI->getPointerOperand()) != 0;

  return false;
}

-void LoopVectorizationCostModel::collectValuesToIgnore() {
-  // Ignore ephemeral values.
-  CodeMetrics::collectEphemeralValues(TheLoop, AC, ValuesToIgnore);
-
-  // Ignore type-promoting instructions we identified during reduction
-  // detection.
-  for (auto &Reduction : *Legal->getReductionVars()) {
-    RecurrenceDescriptor &RedDes = Reduction.second;
-    SmallPtrSetImpl<Instruction *> &Casts = RedDes.getCastInsts();
-    VecValuesToIgnore.insert(Casts.begin(), Casts.end());
-  }
-
-  // Ignore induction phis that are only used in either GetElementPtr or ICmp
-  // instruction to exit loop. Induction variables usually have large types and
-  // can have big impact when estimating register usage.
-  // This is for when VF > 1.
-  for (auto &Induction : *Legal->getInductionVars()) {
-    auto *PN = Induction.first;
-    auto *UpdateV = PN->getIncomingValueForBlock(TheLoop->getLoopLatch());
-
-    // Check that the PHI is only used by the induction increment (UpdateV) or
-    // by GEPs. Then check that UpdateV is only used by a compare instruction
-    // or the loop header PHI.
-    // FIXME: Need precise def-use analysis to determine if this instruction
-    // variable will be vectorized.
-    if (std::all_of(PN->user_begin(), PN->user_end(),
-                    [&](const User *U) -> bool {
-                      return U == UpdateV || isa<GetElementPtrInst>(U);
-                    }) &&
-        std::all_of(UpdateV->user_begin(), UpdateV->user_end(),
-                    [&](const User *U) -> bool {
-                      return U == PN || isa<ICmpInst>(U);
-                    })) {
-      VecValuesToIgnore.insert(PN);
-      VecValuesToIgnore.insert(UpdateV);
-    }
-  }
-
-  // Ignore instructions that will not be vectorized.
-  // This is for when VF > 1.
-  for (auto bb = TheLoop->block_begin(), be = TheLoop->block_end(); bb != be;
-       ++bb) {
-    for (auto &Inst : **bb) {
-      switch (Inst.getOpcode()) {
-      case Instruction::GetElementPtr: {
-        // Ignore GEP if its last operand is an induction variable so that it
-        // is a consecutive load/store and won't be vectorized as a
-        // scatter/gather pattern.
-
-        GetElementPtrInst *Gep = cast<GetElementPtrInst>(&Inst);
-        unsigned NumOperands = Gep->getNumOperands();
-        unsigned InductionOperand = getGEPInductionOperand(Gep);
-        bool GepToIgnore = true;
-
-        // Check that all of the gep indices are uniform except for the
-        // induction operand.
-        for (unsigned i = 0; i != NumOperands; ++i) {
-          if (i != InductionOperand &&
-              !PSE.getSE()->isLoopInvariant(PSE.getSCEV(Gep->getOperand(i)),
-                                            TheLoop)) {
-            GepToIgnore = false;
-            break;
-          }
-        }
-
-        if (GepToIgnore)
-          VecValuesToIgnore.insert(&Inst);
-        break;
-      }
-      }
-    }
-  }
-}

void InnerLoopUnroller::scalarizeInstruction(Instruction *Instr,
                                             bool IfPredicateStore) {
  assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");
  // Holds vector parameters or scalars, in case of uniform vals.
  SmallVector<VectorParts, 4> Params;

  setDebugLocFromInst(Builder, Instr);

  // Find all of the vectorized parameters.
  for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {
    Value *SrcOp = Instr->getOperand(op);

    // If we are accessing the old induction variable, use the new one.
    if (SrcOp == OldInduction) {
      Params.push_back(getVectorValue(SrcOp));
      continue;
    }

    // Try using previously calculated values.
    Instruction *SrcInst = dyn_cast<Instruction>(SrcOp);

    // If the src is an instruction that appeared earlier in the basic block
    // then it should already be vectorized.
    if (SrcInst && OrigLoop->contains(SrcInst)) {
      assert(WidenMap.has(SrcInst) && "Source operand is unavailable");
      // The parameter is a vector value from earlier.
      Params.push_back(WidenMap.get(SrcInst));
    } else {
      // The parameter is a scalar from outside the loop. Maybe even a
      // constant.
      VectorParts Scalars;
      Scalars.append(UF, SrcOp);
      Params.push_back(Scalars);
    }
  }

  assert(Params.size() == Instr->getNumOperands() &&
         "Invalid number of operands");

  // Does this instruction return a value?
  bool IsVoidRetTy = Instr->getType()->isVoidTy();
  Value *UndefVec = IsVoidRetTy ? nullptr : UndefValue::get(Instr->getType());
  // Create a new entry in the WidenMap and initialize it to Undef or Null.
  VectorParts &VecResults = WidenMap.splat(Instr, UndefVec);

  VectorParts Cond;
  if (IfPredicateStore) {
    assert(Instr->getParent()->getSinglePredecessor() &&
           "Only support single predecessor blocks");
    Cond = createEdgeMask(Instr->getParent()->getSinglePredecessor(),
                          Instr->getParent());
  }

  // For each vector unroll 'part':
  for (unsigned Part = 0; Part < UF; ++Part) {
    // For each scalar that we create:

    // Start an "if (pred) a[i] = ..." block.
    Value *Cmp = nullptr;
    if (IfPredicateStore) {
      if (Cond[Part]->getType()->isVectorTy())
        Cond[Part] =
            Builder.CreateExtractElement(Cond[Part], Builder.getInt32(0));
      Cmp = Builder.CreateICmp(ICmpInst::ICMP_EQ, Cond[Part],
                               ConstantInt::get(Cond[Part]->getType(), 1));
    }

    Instruction *Cloned = Instr->clone();
    if (!IsVoidRetTy)
      Cloned->setName(Instr->getName() + ".cloned");
    // Replace the operands of the cloned instructions with extracted scalars.
    for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {
      Value *Op = Params[op][Part];
      Cloned->setOperand(op, Op);
    }

    // Place the cloned scalar in the new loop.
    Builder.Insert(Cloned);

    // If the original scalar returns a value we need to place it in a vector
    // so that future users will be able to use it.
    if (!IsVoidRetTy)
      VecResults[Part] = Cloned;

    // End if-block.
    if (IfPredicateStore)
      PredicatedStores.push_back(std::make_pair(cast<StoreInst>(Cloned), Cmp));
  }
}

void InnerLoopUnroller::vectorizeMemoryInstruction(Instruction *Instr) {
  StoreInst *SI = dyn_cast<StoreInst>(Instr);
  bool IfPredicateStore = (SI && Legal->blockNeedsPredication(SI->getParent()));

  return scalarizeInstruction(Instr, IfPredicateStore);
}

Value *InnerLoopUnroller::reverseVector(Value *Vec) { return Vec; }

Value *InnerLoopUnroller::getBroadcastInstrs(Value *V) { return V; }

Value *InnerLoopUnroller::getStepVector(Value *Val, int StartIdx, Value *Step) {
  // When unrolling and the VF is 1, we only need to add a simple scalar.
  Type *ITy = Val->getType();
  assert(!ITy->isVectorTy() && "Val must be a scalar");
  Constant *C = ConstantInt::get(ITy, StartIdx);
  return Builder.CreateAdd(Val, Builder.CreateMul(C, Step), "induction");
}
Index: vendor/llvm/dist/test/CodeGen/AArch64/fcopysign.ll
===================================================================
--- vendor/llvm/dist/test/CodeGen/AArch64/fcopysign.ll	(nonexistent)
+++ vendor/llvm/dist/test/CodeGen/AArch64/fcopysign.ll	(revision 295846)
@@ -0,0 +1,23 @@
+; RUN: llc -o - %s | FileCheck %s
+; Check that selection dag legalization of fcopysign works in cases with
+; different modes for the arguments.
+target triple = "aarch64--"
+
+declare fp128 @llvm.copysign.f128(fp128, fp128)
+
+@val = global double zeroinitializer, align 8
+
+; CHECK-LABEL: copysign0
+; CHECK: ldr [[REG:x[0-9]+]], [x8, :lo12:val]
+; CHECK: and [[ANDREG:x[0-9]+]], [[REG]], #0x8000000000000000
+; CHECK: lsr x[[LSRREGNUM:[0-9]+]], [[ANDREG]], #56
+; CHECK: bfxil w[[LSRREGNUM]], w{{[0-9]+}}, #0, #7
+; CHECK: strb w[[LSRREGNUM]],
+; CHECK: ldr q{{[0-9]+}},
+define fp128 @copysign0() {
+entry:
+  %v = load double, double* @val, align 8
+  %conv = fpext double %v to fp128
+  %call = tail call fp128 @llvm.copysign.f128(fp128 0xL00000000000000007FFF000000000000, fp128 %conv) #2
+  ret fp128 %call
+}
Index: vendor/llvm/dist/test/CodeGen/WinEH/wineh-noret-cleanup.ll
===================================================================
--- vendor/llvm/dist/test/CodeGen/WinEH/wineh-noret-cleanup.ll	(nonexistent)
+++ vendor/llvm/dist/test/CodeGen/WinEH/wineh-noret-cleanup.ll	(revision 295846)
@@ -0,0 +1,80 @@
+; RUN: sed -e s/.Cxx:// %s | llc -mtriple=x86_64-pc-windows-msvc | FileCheck %s --check-prefix=CXX
+; RUN: sed -e s/.Seh:// %s | llc -mtriple=x86_64-pc-windows-msvc | FileCheck %s --check-prefix=SEH
+
+declare i32 @__CxxFrameHandler3(...)
+declare i32 @__C_specific_handler(...)
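+; The ";Cxx:" and ";Seh:" markers below are stripped by the sed commands in
+; the RUN lines above, producing a C++-EH and an SEH variant of the same
+; function body for the two FileCheck prefixes.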
+declare void @dummy_filter()
+
+declare void @f(i32)
+
+;Cxx: define void @test() personality i32 (...)* @__CxxFrameHandler3 {
+;Seh: define void @test() personality i32 (...)* @__C_specific_handler {
+entry:
+  invoke void @f(i32 1)
+          to label %invoke.cont unwind label %catch.dispatch
+
+catch.dispatch:
+  %cs1 = catchswitch within none [label %catch.body] unwind label %catch.dispatch.2
+
+catch.body:
+;Cxx:   %catch = catchpad within %cs1 [i8* null, i32 u0x40, i8* null]
+;Seh:   %catch = catchpad within %cs1 [void ()* @dummy_filter]
+  invoke void @f(i32 2) [ "funclet"(token %catch) ]
+          to label %unreachable unwind label %terminate
+
+terminate:
+  %cleanup = cleanuppad within %catch []
+  call void @f(i32 3) [ "funclet"(token %cleanup) ]
+  unreachable
+
+unreachable:
+  unreachable
+
+invoke.cont:
+  ret void
+
+catch.dispatch.2:
+  %cs2 = catchswitch within none [label %catch.body.2] unwind to caller
+
+catch.body.2:
+;Cxx:   %catch2 = catchpad within %cs2 [i8* null, i32 u0x40, i8* null]
+;Seh:   %catch2 = catchpad within %cs2 [void ()* @dummy_filter]
+  unreachable
+}
+
+; CXX-LABEL: test:
+; CXX-LABEL: $ip2state$test:
+; CXX-NEXT:   .long .Lfunc_begin0@IMGREL
+; CXX-NEXT:   .long -1
+; CXX-NEXT:   .long .Ltmp0@IMGREL+1
+; CXX-NEXT:   .long 1
+; CXX-NEXT:   .long .Ltmp1@IMGREL+1
+; CXX-NEXT:   .long -1
+; CXX-NEXT:   .long "?catch$3@?0?test@4HA"@IMGREL
+; CXX-NEXT:   .long 2
+; CXX-NEXT:   .long .Ltmp2@IMGREL+1
+; CXX-NEXT:   .long 3
+; CXX-NEXT:   .long .Ltmp3@IMGREL+1
+; CXX-NEXT:   .long 2
+; CXX-NEXT:   .long "?catch$5@?0?test@4HA"@IMGREL
+; CXX-NEXT:   .long 4
+
+; SEH-LABEL: test:
+; SEH-LABEL: .Llsda_begin0:
+; SEH-NEXT:   .long .Ltmp0@IMGREL+1
+; SEH-NEXT:   .long .Ltmp1@IMGREL+1
+; SEH-NEXT:   .long dummy_filter@IMGREL
+; SEH-NEXT:   .long .LBB0_3@IMGREL
+; SEH-NEXT:   .long .Ltmp0@IMGREL+1
+; SEH-NEXT:   .long .Ltmp1@IMGREL+1
+; SEH-NEXT:   .long dummy_filter@IMGREL
+; SEH-NEXT:   .long .LBB0_5@IMGREL
+; SEH-NEXT:   .long .Ltmp2@IMGREL+1
+; SEH-NEXT:   .long .Ltmp3@IMGREL+1
+; SEH-NEXT:   .long "?dtor$2@?0?test@4HA"@IMGREL
+; SEH-NEXT:   .long 0
+; SEH-NEXT:   .long .Ltmp2@IMGREL+1
+; SEH-NEXT:   .long .Ltmp3@IMGREL+1
+; SEH-NEXT:   .long dummy_filter@IMGREL
+; SEH-NEXT:   .long .LBB0_5@IMGREL
+; SEH-NEXT: .Llsda_end0:
Index: vendor/llvm/dist/test/CodeGen/X86/pr26625.ll
===================================================================
--- vendor/llvm/dist/test/CodeGen/X86/pr26625.ll	(nonexistent)
+++ vendor/llvm/dist/test/CodeGen/X86/pr26625.ll	(revision 295846)
@@ -0,0 +1,20 @@
+; RUN: llc < %s -mcpu=i686 2>&1 | FileCheck %s
+; PR26625
+
+target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
+target triple = "i386"
+
+define float @x0(float %f) #0 {
+entry:
+  %call = tail call float @sqrtf(float %f) #1
+  ret float %call
+; CHECK-LABEL: x0:
+; CHECK: flds
+; CHECK-NEXT: fsqrt
+; CHECK-NOT: vsqrtss
+}
+
+declare float @sqrtf(float) #0
+
+attributes #0 = { nounwind optsize readnone }
+attributes #1 = { nounwind optsize readnone }
Index: vendor/llvm/dist/test/CodeGen/X86/regalloc-spill-at-ehpad.ll
===================================================================
--- vendor/llvm/dist/test/CodeGen/X86/regalloc-spill-at-ehpad.ll	(nonexistent)
+++ vendor/llvm/dist/test/CodeGen/X86/regalloc-spill-at-ehpad.ll	(revision 295846)
@@ -0,0 +1,75 @@
+; RUN: llc -regalloc=greedy -mtriple=x86_64-pc-windows-msvc < %s -o - | FileCheck %s
+
+; This test checks for proper handling of a condition where the greedy register
+; allocator encounters a very short interval that contains no uses but does
+; contain an EH pad unwind edge, which requires spilling. Previously the
+; register allocator marked an interval like this as unspillable, resulting in
+; a compilation failure.
+
+
+; The following checks that the value %p is reloaded within the catch handler.
+; CHECK-LABEL: "?catch$8@?0?test@4HA":
+; CHECK: .seh_endprologue
+; CHECK: movq -16(%rbp), %rax
+; CHECK: movb $0, (%rax)
+
+define i32* @test(i32* %a) personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) {
+entry:
+  %call = call i32 @f()
+  %p = bitcast i32* %a to i8*
+  br i1 undef, label %if.end, label %if.else
+
+if.else:                                          ; preds = %entry
+  br i1 undef, label %cond.false.i, label %if.else.else
+
+if.else.else:                                     ; preds = %if.else
+  br i1 undef, label %cond.true.i, label %cond.false.i
+
+cond.true.i:                                      ; preds = %if.else.else
+  br label %invoke.cont
+
+cond.false.i:                                     ; preds = %if.else.else, %if.else
+  %call.i = invoke i32 @f()
+          to label %invoke.cont unwind label %catch.dispatch
+
+catch.dispatch:                                   ; preds = %cond.false.i
+  %tmp0 = catchswitch within none [label %catch] unwind label %ehcleanup
+
+catch:                                            ; preds = %catch.dispatch
+  %tmp1 = catchpad within %tmp0 [i8* null, i32 64, i8* null]
+  %p.0 = getelementptr inbounds i8, i8* %p, i64 0
+  store i8 0, i8* %p.0, align 8
+  invoke void @_CxxThrowException(i8* null, %eh.ThrowInfo* null) [ "funclet"(token %tmp1) ]
+          to label %noexc unwind label %ehcleanup
+
+noexc:                                            ; preds = %catch
+  unreachable
+
+invoke.cont:                                      ; preds = %cond.false.i, %cond.true.i
+  %cond.i = phi i32 [ %call, %cond.true.i ], [ %call.i, %cond.false.i ]
+  %cmp = icmp eq i32 %cond.i, -1
+  %tmp3 = select i1 %cmp, i32 4, i32 0
+  br label %if.end
+
+if.end:                                           ; preds = %invoke.cont, %entry
+  %state.0 = phi i32 [ %tmp3, %invoke.cont ], [ 4, %entry ]
+  %p.1 = getelementptr inbounds i8, i8* %p, i64 0
+  invoke void @g(i8* %p.1, i32 %state.0)
+          to label %invoke.cont.1 unwind label %ehcleanup
+
+invoke.cont.1:                                    ; preds = %if.end
+  ret i32* %a
+
+ehcleanup:                                        ; preds = %if.end, %catch, %catch.dispatch
+  %tmp4 = cleanuppad within none []
+  cleanupret from %tmp4 unwind to caller
+}
+
+%eh.ThrowInfo = type { i32, i32, i32, i32 }
+
+declare i32 @__CxxFrameHandler3(...)
+
+declare void @_CxxThrowException(i8*, %eh.ThrowInfo*)
+
+declare i32 @f()
+declare void @g(i8*, i32)
Index: vendor/llvm/dist/test/CodeGen/X86/shrink-wrap-chkstk.ll
===================================================================
--- vendor/llvm/dist/test/CodeGen/X86/shrink-wrap-chkstk.ll	(revision 295845)
+++ vendor/llvm/dist/test/CodeGen/X86/shrink-wrap-chkstk.ll	(revision 295846)
@@ -1,37 +1,72 @@
 ; RUN: llc < %s -enable-shrink-wrap=true | FileCheck %s
 
 ; chkstk cannot come before the usual prologue, since it adjusts ESP.
+; If chkstk is used in the prologue, we also have to be careful about preserving
+; EAX if it is used.
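+; (On 32-bit Windows targets __chkstk takes the allocation size in EAX, which
+; is why a live EAX value has to be preserved around the call; the second test
+; below checks this.)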
target datalayout = "e-m:x-p:32:32-i64:64-f80:32-n8:16:32-a:0:32-S32" target triple = "i686-pc-windows-msvc18.0.0" %struct.S = type { [12 x i8] } define x86_thiscallcc void @call_inalloca(i1 %x) { entry: %argmem = alloca inalloca <{ %struct.S }>, align 4 %argidx1 = getelementptr inbounds <{ %struct.S }>, <{ %struct.S }>* %argmem, i32 0, i32 0, i32 0, i32 0 %argidx2 = getelementptr inbounds <{ %struct.S }>, <{ %struct.S }>* %argmem, i32 0, i32 0, i32 0, i32 1 store i8 42, i8* %argidx2, align 4 br i1 %x, label %bb1, label %bb2 bb1: store i8 42, i8* %argidx1, align 4 br label %bb2 bb2: call void @inalloca_params(<{ %struct.S }>* inalloca nonnull %argmem) ret void } ; CHECK-LABEL: _call_inalloca: # @call_inalloca ; CHECK: pushl %ebp ; CHECK: movl %esp, %ebp ; CHECK: movl $12, %eax ; CHECK: calll __chkstk ; CHECK: calll _inalloca_params ; CHECK: movl %ebp, %esp ; CHECK: popl %ebp ; CHECK: retl declare void @inalloca_params(<{ %struct.S }>* inalloca) + +declare i32 @doSomething(i32, i32*) + +; In this test case, we force usage of EAX before the prologue, and have to +; compensate before calling __chkstk. It would also be valid for us to avoid +; shrink wrapping in this case. + +define x86_fastcallcc i32 @use_eax_before_prologue(i32 inreg %a, i32 inreg %b) { + %tmp = alloca i32, i32 1024, align 4 + %tmp2 = icmp slt i32 %a, %b + br i1 %tmp2, label %true, label %false + +true: + store i32 %a, i32* %tmp, align 4 + %tmp4 = call i32 @doSomething(i32 0, i32* %tmp) + br label %false + +false: + %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ] + ret i32 %tmp.0 +} + +; CHECK-LABEL: @use_eax_before_prologue@8: # @use_eax_before_prologue +; CHECK: movl %ecx, %eax +; CHECK: cmpl %edx, %eax +; CHECK: jge LBB1_2 +; CHECK: pushl %eax +; CHECK: movl $4100, %eax +; CHECK: calll __chkstk +; CHECK: movl 4100(%esp), %eax +; CHECK: calll _doSomething +; CHECK: LBB1_2: +; CHECK: retl Index: vendor/llvm/dist/test/MC/Sparc/sparc-ctrl-instructions.s =================================================================== --- vendor/llvm/dist/test/MC/Sparc/sparc-ctrl-instructions.s (revision 295845) +++ vendor/llvm/dist/test/MC/Sparc/sparc-ctrl-instructions.s (revision 295846) @@ -1,306 +1,319 @@ ! RUN: llvm-mc %s -arch=sparc -show-encoding | FileCheck %s ! RUN: llvm-mc %s -arch=sparcv9 -show-encoding | FileCheck %s ! CHECK: call foo ! encoding: [0b01AAAAAA,A,A,A] ! CHECK: ! fixup A - offset: 0, value: foo, kind: fixup_sparc_call30 call foo ! CHECK: call %g1+%i2 ! encoding: [0x9f,0xc0,0x40,0x1a] call %g1 + %i2 ! CHECK: call %o1+8 ! encoding: [0x9f,0xc2,0x60,0x08] call %o1 + 8 ! CHECK: call %g1 ! encoding: [0x9f,0xc0,0x40,0x00] call %g1 ! CHECK: call %g1+%lo(sym) ! encoding: [0x9f,0xc0,0b011000AA,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: %lo(sym), kind: fixup_sparc_lo10 call %g1+%lo(sym) ! CHECK: jmp %g1+%i2 ! encoding: [0x81,0xc0,0x40,0x1a] jmp %g1 + %i2 ! CHECK: jmp %o1+8 ! encoding: [0x81,0xc2,0x60,0x08] jmp %o1 + 8 ! CHECK: jmp %g1 ! encoding: [0x81,0xc0,0x40,0x00] jmp %g1 ! CHECK: jmp %g1+%lo(sym) ! encoding: [0x81,0xc0,0b011000AA,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: %lo(sym), kind: fixup_sparc_lo10 jmp %g1+%lo(sym) ! CHECK: jmpl %g1+%i2, %g2 ! encoding: [0x85,0xc0,0x40,0x1a] jmpl %g1 + %i2, %g2 ! CHECK: jmpl %o1+8, %g2 ! encoding: [0x85,0xc2,0x60,0x08] jmpl %o1 + 8, %g2 ! CHECK: jmpl %g1, %g2 ! encoding: [0x85,0xc0,0x40,0x00] jmpl %g1, %g2 ! CHECK: jmpl %g1+%lo(sym), %g2 ! encoding: [0x85,0xc0,0b011000AA,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: %lo(sym), kind: fixup_sparc_lo10 jmpl %g1+%lo(sym), %g2 ! 
CHECK: ba .BB0 ! encoding: [0x10,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 ba .BB0 ! CHECK: bne .BB0 ! encoding: [0x12,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bne .BB0 ! CHECK: bne .BB0 ! encoding: [0x12,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bnz .BB0 ! CHECK: be .BB0 ! encoding: [0x02,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 be .BB0 ! CHECK: be .BB0 ! encoding: [0x02,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bz .BB0 ! CHECK: be .BB0 ! encoding: [0x02,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 beq .BB0 ! CHECK: bg .BB0 ! encoding: [0x14,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bg .BB0 ! CHECK: ble .BB0 ! encoding: [0x04,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 ble .BB0 ! CHECK: bge .BB0 ! encoding: [0x16,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bge .BB0 ! CHECK: bl .BB0 ! encoding: [0x06,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bl .BB0 ! CHECK: bgu .BB0 ! encoding: [0x18,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bgu .BB0 ! CHECK: bleu .BB0 ! encoding: [0x08,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bleu .BB0 ! CHECK: bcc .BB0 ! encoding: [0x1a,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bcc .BB0 ! CHECK: bcc .BB0 ! encoding: [0x1a,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bgeu .BB0 ! CHECK: bcs .BB0 ! encoding: [0x0a,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bcs .BB0 ! CHECK: bcs .BB0 ! encoding: [0x0a,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 blu .BB0 ! CHECK: bpos .BB0 ! encoding: [0x1c,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bpos .BB0 ! CHECK: bneg .BB0 ! encoding: [0x0c,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bneg .BB0 ! CHECK: bvc .BB0 ! encoding: [0x1e,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bvc .BB0 ! CHECK: bvs .BB0 ! encoding: [0x0e,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bvs .BB0 + ! CHECK: fba .BB0 ! encoding: [0x11,0b10AAAAAA,A,A] + ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 + fba .BB0 + + ! CHECK: fba .BB0 ! encoding: [0x11,0b10AAAAAA,A,A] + ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 + fb .BB0 + + ! CHECK: fbn .BB0 ! encoding: [0x01,0b10AAAAAA,A,A] + ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 + fbn .BB0 + ! CHECK: fbu .BB0 ! encoding: [0x0f,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbu .BB0 ! CHECK: fbg .BB0 ! encoding: [0x0d,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbg .BB0 + ! CHECK: fbug .BB0 ! encoding: [0x0b,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbug .BB0 ! CHECK: fbl .BB0 ! 
encoding: [0x09,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbl .BB0 ! CHECK: fbul .BB0 ! encoding: [0x07,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbul .BB0 ! CHECK: fblg .BB0 ! encoding: [0x05,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fblg .BB0 ! CHECK: fbne .BB0 ! encoding: [0x03,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbne .BB0 ! CHECK: fbne .BB0 ! encoding: [0x03,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbnz .BB0 ! CHECK: fbe .BB0 ! encoding: [0x13,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbe .BB0 ! CHECK: fbe .BB0 ! encoding: [0x13,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbz .BB0 ! CHECK: fbue .BB0 ! encoding: [0x15,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbue .BB0 ! CHECK: fbge .BB0 ! encoding: [0x17,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbge .BB0 ! CHECK: fbuge .BB0 ! encoding: [0x19,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbuge .BB0 ! CHECK: fble .BB0 ! encoding: [0x1b,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fble .BB0 ! CHECK: fbule .BB0 ! encoding: [0x1d,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbule .BB0 ! CHECK: fbo .BB0 ! encoding: [0x1f,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbo .BB0 ! CHECK: ba,a .BB0 ! encoding: [0x30,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 ba,a .BB0 ! CHECK: bne,a .BB0 ! encoding: [0x32,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bne,a .BB0 ! CHECK: be,a .BB0 ! encoding: [0x22,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 be,a .BB0 ! CHECK: bg,a .BB0 ! encoding: [0x34,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bg,a .BB0 ! CHECK: ble,a .BB0 ! encoding: [0x24,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 ble,a .BB0 ! CHECK: bge,a .BB0 ! encoding: [0x36,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bge,a .BB0 ! CHECK: bl,a .BB0 ! encoding: [0x26,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bl,a .BB0 ! CHECK: bgu,a .BB0 ! encoding: [0x38,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bgu,a .BB0 ! CHECK: bleu,a .BB0 ! encoding: [0x28,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bleu,a .BB0 ! CHECK: bcc,a .BB0 ! encoding: [0x3a,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bcc,a .BB0 ! CHECK: bcs,a .BB0 ! encoding: [0x2a,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bcs,a .BB0 ! CHECK: bpos,a .BB0 ! encoding: [0x3c,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bpos,a .BB0 ! CHECK: bneg,a .BB0 ! encoding: [0x2c,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bneg,a .BB0 ! 
CHECK: bvc,a .BB0 ! encoding: [0x3e,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bvc,a .BB0 ! CHECK: bvs,a .BB0 ! encoding: [0x2e,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 bvs,a .BB0 ! CHECK: fbu,a .BB0 ! encoding: [0x2f,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbu,a .BB0 ! CHECK: fbg,a .BB0 ! encoding: [0x2d,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbg,a .BB0 ! CHECK: fbug,a .BB0 ! encoding: [0x2b,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbug,a .BB0 ! CHECK: fbl,a .BB0 ! encoding: [0x29,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbl,a .BB0 ! CHECK: fbul,a .BB0 ! encoding: [0x27,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbul,a .BB0 ! CHECK: fblg,a .BB0 ! encoding: [0x25,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fblg,a .BB0 ! CHECK: fbne,a .BB0 ! encoding: [0x23,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbne,a .BB0 ! CHECK: fbe,a .BB0 ! encoding: [0x33,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbe,a .BB0 ! CHECK: fbue,a .BB0 ! encoding: [0x35,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbue,a .BB0 ! CHECK: fbge,a .BB0 ! encoding: [0x37,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbge,a .BB0 ! CHECK: fbuge,a .BB0 ! encoding: [0x39,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbuge,a .BB0 ! CHECK: fble,a .BB0 ! encoding: [0x3b,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fble,a .BB0 ! CHECK: fbule,a .BB0 ! encoding: [0x3d,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbule,a .BB0 ! CHECK: fbo,a .BB0 ! encoding: [0x3f,0b10AAAAAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br22 fbo,a .BB0 ! CHECK: rett %i7+8 ! encoding: [0x81,0xcf,0xe0,0x08] rett %i7 + 8 Index: vendor/llvm/dist/test/MC/Sparc/sparc64-ctrl-instructions.s =================================================================== --- vendor/llvm/dist/test/MC/Sparc/sparc64-ctrl-instructions.s (revision 295845) +++ vendor/llvm/dist/test/MC/Sparc/sparc64-ctrl-instructions.s (revision 295846) @@ -1,1226 +1,1238 @@ ! RUN: llvm-mc %s -triple=sparc64-unknown-linux-gnu -show-encoding | FileCheck %s ! CHECK: bne %xcc, .BB0 ! encoding: [0x12,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bne %xcc, .BB0 ! CHECK: be %xcc, .BB0 ! encoding: [0x02,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 be %xcc, .BB0 ! CHECK: bg %xcc, .BB0 ! encoding: [0x14,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bg %xcc, .BB0 ! CHECK: ble %xcc, .BB0 ! encoding: [0x04,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 ble %xcc, .BB0 ! CHECK: bge %xcc, .BB0 ! encoding: [0x16,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bge %xcc, .BB0 ! CHECK: bl %xcc, .BB0 ! encoding: [0x06,0b01101AAA,A,A] ! CHECK-NEXT: ! 
fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bl %xcc, .BB0 ! CHECK: bgu %xcc, .BB0 ! encoding: [0x18,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bgu %xcc, .BB0 ! CHECK: bleu %xcc, .BB0 ! encoding: [0x08,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bleu %xcc, .BB0 ! CHECK: bcc %xcc, .BB0 ! encoding: [0x1a,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcc %xcc, .BB0 ! CHECK: bcs %xcc, .BB0 ! encoding: [0x0a,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcs %xcc, .BB0 ! CHECK: bpos %xcc, .BB0 ! encoding: [0x1c,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bpos %xcc, .BB0 ! CHECK: bneg %xcc, .BB0 ! encoding: [0x0c,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bneg %xcc, .BB0 ! CHECK: bvc %xcc, .BB0 ! encoding: [0x1e,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvc %xcc, .BB0 ! CHECK: bvs %xcc, .BB0 ! encoding: [0x0e,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvs %xcc, .BB0 ! CHECK: movne %icc, %g1, %g2 ! encoding: [0x85,0x66,0x40,0x01] ! CHECK: move %icc, %g1, %g2 ! encoding: [0x85,0x64,0x40,0x01] ! CHECK: movg %icc, %g1, %g2 ! encoding: [0x85,0x66,0x80,0x01] ! CHECK: movle %icc, %g1, %g2 ! encoding: [0x85,0x64,0x80,0x01] ! CHECK: movge %icc, %g1, %g2 ! encoding: [0x85,0x66,0xc0,0x01] ! CHECK: movl %icc, %g1, %g2 ! encoding: [0x85,0x64,0xc0,0x01] ! CHECK: movgu %icc, %g1, %g2 ! encoding: [0x85,0x67,0x00,0x01] ! CHECK: movleu %icc, %g1, %g2 ! encoding: [0x85,0x65,0x00,0x01] ! CHECK: movcc %icc, %g1, %g2 ! encoding: [0x85,0x67,0x40,0x01] ! CHECK: movcs %icc, %g1, %g2 ! encoding: [0x85,0x65,0x40,0x01] ! CHECK: movpos %icc, %g1, %g2 ! encoding: [0x85,0x67,0x80,0x01] ! CHECK: movneg %icc, %g1, %g2 ! encoding: [0x85,0x65,0x80,0x01] ! CHECK: movvc %icc, %g1, %g2 ! encoding: [0x85,0x67,0xc0,0x01] ! CHECK: movvs %icc, %g1, %g2 ! encoding: [0x85,0x65,0xc0,0x01] movne %icc, %g1, %g2 move %icc, %g1, %g2 movg %icc, %g1, %g2 movle %icc, %g1, %g2 movge %icc, %g1, %g2 movl %icc, %g1, %g2 movgu %icc, %g1, %g2 movleu %icc, %g1, %g2 movcc %icc, %g1, %g2 movcs %icc, %g1, %g2 movpos %icc, %g1, %g2 movneg %icc, %g1, %g2 movvc %icc, %g1, %g2 movvs %icc, %g1, %g2 ! CHECK: movne %xcc, %g1, %g2 ! encoding: [0x85,0x66,0x50,0x01] ! CHECK: move %xcc, %g1, %g2 ! encoding: [0x85,0x64,0x50,0x01] ! CHECK: movg %xcc, %g1, %g2 ! encoding: [0x85,0x66,0x90,0x01] ! CHECK: movle %xcc, %g1, %g2 ! encoding: [0x85,0x64,0x90,0x01] ! CHECK: movge %xcc, %g1, %g2 ! encoding: [0x85,0x66,0xd0,0x01] ! CHECK: movl %xcc, %g1, %g2 ! encoding: [0x85,0x64,0xd0,0x01] ! CHECK: movgu %xcc, %g1, %g2 ! encoding: [0x85,0x67,0x10,0x01] ! CHECK: movleu %xcc, %g1, %g2 ! encoding: [0x85,0x65,0x10,0x01] ! CHECK: movcc %xcc, %g1, %g2 ! encoding: [0x85,0x67,0x50,0x01] ! CHECK: movcs %xcc, %g1, %g2 ! encoding: [0x85,0x65,0x50,0x01] ! CHECK: movpos %xcc, %g1, %g2 ! encoding: [0x85,0x67,0x90,0x01] ! CHECK: movneg %xcc, %g1, %g2 ! encoding: [0x85,0x65,0x90,0x01] ! CHECK: movvc %xcc, %g1, %g2 ! encoding: [0x85,0x67,0xd0,0x01] ! CHECK: movvs %xcc, %g1, %g2 ! 
encoding: [0x85,0x65,0xd0,0x01] movne %xcc, %g1, %g2 move %xcc, %g1, %g2 movg %xcc, %g1, %g2 movle %xcc, %g1, %g2 movge %xcc, %g1, %g2 movl %xcc, %g1, %g2 movgu %xcc, %g1, %g2 movleu %xcc, %g1, %g2 movcc %xcc, %g1, %g2 movcs %xcc, %g1, %g2 movpos %xcc, %g1, %g2 movneg %xcc, %g1, %g2 movvc %xcc, %g1, %g2 movvs %xcc, %g1, %g2 ! CHECK: movu %fcc0, %g1, %g2 ! encoding: [0x85,0x61,0xc0,0x01] ! CHECK: movg %fcc0, %g1, %g2 ! encoding: [0x85,0x61,0x80,0x01] ! CHECK: movug %fcc0, %g1, %g2 ! encoding: [0x85,0x61,0x40,0x01] ! CHECK: movl %fcc0, %g1, %g2 ! encoding: [0x85,0x61,0x00,0x01] ! CHECK: movul %fcc0, %g1, %g2 ! encoding: [0x85,0x60,0xc0,0x01] ! CHECK: movlg %fcc0, %g1, %g2 ! encoding: [0x85,0x60,0x80,0x01] ! CHECK: movne %fcc0, %g1, %g2 ! encoding: [0x85,0x60,0x40,0x01] ! CHECK: move %fcc0, %g1, %g2 ! encoding: [0x85,0x62,0x40,0x01] ! CHECK: movue %fcc0, %g1, %g2 ! encoding: [0x85,0x62,0x80,0x01] ! CHECK: movge %fcc0, %g1, %g2 ! encoding: [0x85,0x62,0xc0,0x01] ! CHECK: movuge %fcc0, %g1, %g2 ! encoding: [0x85,0x63,0x00,0x01] ! CHECK: movle %fcc0, %g1, %g2 ! encoding: [0x85,0x63,0x40,0x01] ! CHECK: movule %fcc0, %g1, %g2 ! encoding: [0x85,0x63,0x80,0x01] ! CHECK: movo %fcc0, %g1, %g2 ! encoding: [0x85,0x63,0xc0,0x01] movu %fcc0, %g1, %g2 movg %fcc0, %g1, %g2 movug %fcc0, %g1, %g2 movl %fcc0, %g1, %g2 movul %fcc0, %g1, %g2 movlg %fcc0, %g1, %g2 movne %fcc0, %g1, %g2 move %fcc0, %g1, %g2 movue %fcc0, %g1, %g2 movge %fcc0, %g1, %g2 movuge %fcc0, %g1, %g2 movle %fcc0, %g1, %g2 movule %fcc0, %g1, %g2 movo %fcc0, %g1, %g2 ! CHECK: fmovsne %icc, %f1, %f2 ! encoding: [0x85,0xaa,0x60,0x21] ! CHECK: fmovse %icc, %f1, %f2 ! encoding: [0x85,0xa8,0x60,0x21] ! CHECK: fmovsg %icc, %f1, %f2 ! encoding: [0x85,0xaa,0xa0,0x21] ! CHECK: fmovsle %icc, %f1, %f2 ! encoding: [0x85,0xa8,0xa0,0x21] ! CHECK: fmovsge %icc, %f1, %f2 ! encoding: [0x85,0xaa,0xe0,0x21] ! CHECK: fmovsl %icc, %f1, %f2 ! encoding: [0x85,0xa8,0xe0,0x21] ! CHECK: fmovsgu %icc, %f1, %f2 ! encoding: [0x85,0xab,0x20,0x21] ! CHECK: fmovsleu %icc, %f1, %f2 ! encoding: [0x85,0xa9,0x20,0x21] ! CHECK: fmovscc %icc, %f1, %f2 ! encoding: [0x85,0xab,0x60,0x21] ! CHECK: fmovscs %icc, %f1, %f2 ! encoding: [0x85,0xa9,0x60,0x21] ! CHECK: fmovspos %icc, %f1, %f2 ! encoding: [0x85,0xab,0xa0,0x21] ! CHECK: fmovsneg %icc, %f1, %f2 ! encoding: [0x85,0xa9,0xa0,0x21] ! CHECK: fmovsvc %icc, %f1, %f2 ! encoding: [0x85,0xab,0xe0,0x21] ! CHECK: fmovsvs %icc, %f1, %f2 ! encoding: [0x85,0xa9,0xe0,0x21] fmovsne %icc, %f1, %f2 fmovse %icc, %f1, %f2 fmovsg %icc, %f1, %f2 fmovsle %icc, %f1, %f2 fmovsge %icc, %f1, %f2 fmovsl %icc, %f1, %f2 fmovsgu %icc, %f1, %f2 fmovsleu %icc, %f1, %f2 fmovscc %icc, %f1, %f2 fmovscs %icc, %f1, %f2 fmovspos %icc, %f1, %f2 fmovsneg %icc, %f1, %f2 fmovsvc %icc, %f1, %f2 fmovsvs %icc, %f1, %f2 ! CHECK: fmovsne %xcc, %f1, %f2 ! encoding: [0x85,0xaa,0x70,0x21] ! CHECK: fmovse %xcc, %f1, %f2 ! encoding: [0x85,0xa8,0x70,0x21] ! CHECK: fmovsg %xcc, %f1, %f2 ! encoding: [0x85,0xaa,0xb0,0x21] ! CHECK: fmovsle %xcc, %f1, %f2 ! encoding: [0x85,0xa8,0xb0,0x21] ! CHECK: fmovsge %xcc, %f1, %f2 ! encoding: [0x85,0xaa,0xf0,0x21] ! CHECK: fmovsl %xcc, %f1, %f2 ! encoding: [0x85,0xa8,0xf0,0x21] ! CHECK: fmovsgu %xcc, %f1, %f2 ! encoding: [0x85,0xab,0x30,0x21] ! CHECK: fmovsleu %xcc, %f1, %f2 ! encoding: [0x85,0xa9,0x30,0x21] ! CHECK: fmovscc %xcc, %f1, %f2 ! encoding: [0x85,0xab,0x70,0x21] ! CHECK: fmovscs %xcc, %f1, %f2 ! encoding: [0x85,0xa9,0x70,0x21] ! CHECK: fmovspos %xcc, %f1, %f2 ! encoding: [0x85,0xab,0xb0,0x21] ! CHECK: fmovsneg %xcc, %f1, %f2 ! 
encoding: [0x85,0xa9,0xb0,0x21] ! CHECK: fmovsvc %xcc, %f1, %f2 ! encoding: [0x85,0xab,0xf0,0x21] ! CHECK: fmovsvs %xcc, %f1, %f2 ! encoding: [0x85,0xa9,0xf0,0x21] fmovsne %xcc, %f1, %f2 fmovse %xcc, %f1, %f2 fmovsg %xcc, %f1, %f2 fmovsle %xcc, %f1, %f2 fmovsge %xcc, %f1, %f2 fmovsl %xcc, %f1, %f2 fmovsgu %xcc, %f1, %f2 fmovsleu %xcc, %f1, %f2 fmovscc %xcc, %f1, %f2 fmovscs %xcc, %f1, %f2 fmovspos %xcc, %f1, %f2 fmovsneg %xcc, %f1, %f2 fmovsvc %xcc, %f1, %f2 fmovsvs %xcc, %f1, %f2 ! CHECK: fmovsu %fcc0, %f1, %f2 ! encoding: [0x85,0xa9,0xc0,0x21] ! CHECK: fmovsg %fcc0, %f1, %f2 ! encoding: [0x85,0xa9,0x80,0x21] ! CHECK: fmovsug %fcc0, %f1, %f2 ! encoding: [0x85,0xa9,0x40,0x21] ! CHECK: fmovsl %fcc0, %f1, %f2 ! encoding: [0x85,0xa9,0x00,0x21] ! CHECK: fmovsul %fcc0, %f1, %f2 ! encoding: [0x85,0xa8,0xc0,0x21] ! CHECK: fmovslg %fcc0, %f1, %f2 ! encoding: [0x85,0xa8,0x80,0x21] ! CHECK: fmovsne %fcc0, %f1, %f2 ! encoding: [0x85,0xa8,0x40,0x21] ! CHECK: fmovse %fcc0, %f1, %f2 ! encoding: [0x85,0xaa,0x40,0x21] ! CHECK: fmovsue %fcc0, %f1, %f2 ! encoding: [0x85,0xaa,0x80,0x21] ! CHECK: fmovsge %fcc0, %f1, %f2 ! encoding: [0x85,0xaa,0xc0,0x21] ! CHECK: fmovsuge %fcc0, %f1, %f2 ! encoding: [0x85,0xab,0x00,0x21] ! CHECK: fmovsle %fcc0, %f1, %f2 ! encoding: [0x85,0xab,0x40,0x21] ! CHECK: fmovsule %fcc0, %f1, %f2 ! encoding: [0x85,0xab,0x80,0x21] ! CHECK: fmovso %fcc0, %f1, %f2 ! encoding: [0x85,0xab,0xc0,0x21] fmovsu %fcc0, %f1, %f2 fmovsg %fcc0, %f1, %f2 fmovsug %fcc0, %f1, %f2 fmovsl %fcc0, %f1, %f2 fmovsul %fcc0, %f1, %f2 fmovslg %fcc0, %f1, %f2 fmovsne %fcc0, %f1, %f2 fmovse %fcc0, %f1, %f2 fmovsue %fcc0, %f1, %f2 fmovsge %fcc0, %f1, %f2 fmovsuge %fcc0, %f1, %f2 fmovsle %fcc0, %f1, %f2 fmovsule %fcc0, %f1, %f2 fmovso %fcc0, %f1, %f2 ! CHECK: bne,a %icc, .BB0 ! encoding: [0x32,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bne,a %icc, .BB0 ! CHECK: be,a %icc, .BB0 ! encoding: [0x22,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 be,a %icc, .BB0 ! CHECK: bg,a %icc, .BB0 ! encoding: [0x34,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bg,a %icc, .BB0 ! CHECK: ble,a %icc, .BB0 ! encoding: [0x24,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 ble,a %icc, .BB0 ! CHECK: bge,a %icc, .BB0 ! encoding: [0x36,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bge,a %icc, .BB0 ! CHECK: bl,a %icc, .BB0 ! encoding: [0x26,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bl,a %icc, .BB0 ! CHECK: bgu,a %icc, .BB0 ! encoding: [0x38,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bgu,a %icc, .BB0 ! CHECK: bleu,a %icc, .BB0 ! encoding: [0x28,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bleu,a %icc, .BB0 ! CHECK: bcc,a %icc, .BB0 ! encoding: [0x3a,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcc,a %icc, .BB0 ! CHECK: bcs,a %icc, .BB0 ! encoding: [0x2a,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcs,a %icc, .BB0 ! CHECK: bpos,a %icc, .BB0 ! encoding: [0x3c,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bpos,a %icc, .BB0 ! CHECK: bneg,a %icc, .BB0 ! encoding: [0x2c,0b01001AAA,A,A] ! CHECK-NEXT: ! 
fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bneg,a %icc, .BB0 ! CHECK: bvc,a %icc, .BB0 ! encoding: [0x3e,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvc,a %icc, .BB0 ! CHECK: bvs,a %icc, .BB0 ! encoding: [0x2e,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvs,a %icc, .BB0 ! CHECK: bne,pn %icc, .BB0 ! encoding: [0x12,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bne,pn %icc, .BB0 ! CHECK: be,pn %icc, .BB0 ! encoding: [0x02,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 be,pn %icc, .BB0 ! CHECK: bg,pn %icc, .BB0 ! encoding: [0x14,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bg,pn %icc, .BB0 ! CHECK: ble,pn %icc, .BB0 ! encoding: [0x04,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 ble,pn %icc, .BB0 ! CHECK: bge,pn %icc, .BB0 ! encoding: [0x16,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bge,pn %icc, .BB0 ! CHECK: bl,pn %icc, .BB0 ! encoding: [0x06,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bl,pn %icc, .BB0 ! CHECK: bgu,pn %icc, .BB0 ! encoding: [0x18,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bgu,pn %icc, .BB0 ! CHECK: bleu,pn %icc, .BB0 ! encoding: [0x08,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bleu,pn %icc, .BB0 ! CHECK: bcc,pn %icc, .BB0 ! encoding: [0x1a,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcc,pn %icc, .BB0 ! CHECK: bcs,pn %icc, .BB0 ! encoding: [0x0a,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcs,pn %icc, .BB0 ! CHECK: bpos,pn %icc, .BB0 ! encoding: [0x1c,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bpos,pn %icc, .BB0 ! CHECK: bneg,pn %icc, .BB0 ! encoding: [0x0c,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bneg,pn %icc, .BB0 ! CHECK: bvc,pn %icc, .BB0 ! encoding: [0x1e,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvc,pn %icc, .BB0 ! CHECK: bvs,pn %icc, .BB0 ! encoding: [0x0e,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvs,pn %icc, .BB0 ! CHECK: bne,a,pn %icc, .BB0 ! encoding: [0x32,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bne,a,pn %icc, .BB0 ! CHECK: be,a,pn %icc, .BB0 ! encoding: [0x22,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 be,a,pn %icc, .BB0 ! CHECK: bg,a,pn %icc, .BB0 ! encoding: [0x34,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bg,a,pn %icc, .BB0 ! CHECK: ble,a,pn %icc, .BB0 ! encoding: [0x24,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 ble,a,pn %icc, .BB0 ! CHECK: bge,a,pn %icc, .BB0 ! encoding: [0x36,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bge,a,pn %icc, .BB0 ! CHECK: bl,a,pn %icc, .BB0 ! encoding: [0x26,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bl,a,pn %icc, .BB0 ! CHECK: bgu,a,pn %icc, .BB0 ! encoding: [0x38,0b01000AAA,A,A] ! CHECK-NEXT: ! 
fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bgu,a,pn %icc, .BB0 ! CHECK: bleu,a,pn %icc, .BB0 ! encoding: [0x28,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bleu,a,pn %icc, .BB0 ! CHECK: bcc,a,pn %icc, .BB0 ! encoding: [0x3a,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcc,a,pn %icc, .BB0 ! CHECK: bcs,a,pn %icc, .BB0 ! encoding: [0x2a,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcs,a,pn %icc, .BB0 ! CHECK: bpos,a,pn %icc, .BB0 ! encoding: [0x3c,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bpos,a,pn %icc, .BB0 ! CHECK: bneg,a,pn %icc, .BB0 ! encoding: [0x2c,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bneg,a,pn %icc, .BB0 ! CHECK: bvc,a,pn %icc, .BB0 ! encoding: [0x3e,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvc,a,pn %icc, .BB0 ! CHECK: bvs,a,pn %icc, .BB0 ! encoding: [0x2e,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvs,a,pn %icc, .BB0 ! CHECK: bne %icc, .BB0 ! encoding: [0x12,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bne,pt %icc, .BB0 ! CHECK: be %icc, .BB0 ! encoding: [0x02,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 be,pt %icc, .BB0 ! CHECK: bg %icc, .BB0 ! encoding: [0x14,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bg,pt %icc, .BB0 ! CHECK: ble %icc, .BB0 ! encoding: [0x04,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 ble,pt %icc, .BB0 ! CHECK: bge %icc, .BB0 ! encoding: [0x16,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bge,pt %icc, .BB0 ! CHECK: bl %icc, .BB0 ! encoding: [0x06,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bl,pt %icc, .BB0 ! CHECK: bgu %icc, .BB0 ! encoding: [0x18,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bgu,pt %icc, .BB0 ! CHECK: bleu %icc, .BB0 ! encoding: [0x08,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bleu,pt %icc, .BB0 ! CHECK: bcc %icc, .BB0 ! encoding: [0x1a,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcc,pt %icc, .BB0 ! CHECK: bcs %icc, .BB0 ! encoding: [0x0a,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcs,pt %icc, .BB0 ! CHECK: bpos %icc, .BB0 ! encoding: [0x1c,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bpos,pt %icc, .BB0 ! CHECK: bneg %icc, .BB0 ! encoding: [0x0c,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bneg,pt %icc, .BB0 ! CHECK: bvc %icc, .BB0 ! encoding: [0x1e,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvc,pt %icc, .BB0 ! CHECK: bvs %icc, .BB0 ! encoding: [0x0e,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvs,pt %icc, .BB0 ! CHECK: bne,a %icc, .BB0 ! encoding: [0x32,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bne,a,pt %icc, .BB0 ! CHECK: be,a %icc, .BB0 ! encoding: [0x22,0b01001AAA,A,A] ! CHECK-NEXT: ! 
fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 be,a,pt %icc, .BB0 ! CHECK: bg,a %icc, .BB0 ! encoding: [0x34,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bg,a,pt %icc, .BB0 ! CHECK: ble,a %icc, .BB0 ! encoding: [0x24,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 ble,a,pt %icc, .BB0 ! CHECK: bge,a %icc, .BB0 ! encoding: [0x36,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bge,a,pt %icc, .BB0 ! CHECK: bl,a %icc, .BB0 ! encoding: [0x26,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bl,a,pt %icc, .BB0 ! CHECK: bgu,a %icc, .BB0 ! encoding: [0x38,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bgu,a,pt %icc, .BB0 ! CHECK: bleu,a %icc, .BB0 ! encoding: [0x28,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bleu,a,pt %icc, .BB0 ! CHECK: bcc,a %icc, .BB0 ! encoding: [0x3a,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcc,a,pt %icc, .BB0 ! CHECK: bcs,a %icc, .BB0 ! encoding: [0x2a,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcs,a,pt %icc, .BB0 ! CHECK: bpos,a %icc, .BB0 ! encoding: [0x3c,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bpos,a,pt %icc, .BB0 ! CHECK: bne,a %xcc, .BB0 ! encoding: [0x32,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bne,a %xcc, .BB0 ! CHECK: be,a %xcc, .BB0 ! encoding: [0x22,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 be,a %xcc, .BB0 ! CHECK: bg,a %xcc, .BB0 ! encoding: [0x34,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bg,a %xcc, .BB0 ! CHECK: ble,a %xcc, .BB0 ! encoding: [0x24,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 ble,a %xcc, .BB0 ! CHECK: bge,a %xcc, .BB0 ! encoding: [0x36,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bge,a %xcc, .BB0 ! CHECK: bl,a %xcc, .BB0 ! encoding: [0x26,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bl,a %xcc, .BB0 ! CHECK: bgu,a %xcc, .BB0 ! encoding: [0x38,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bgu,a %xcc, .BB0 ! CHECK: bleu,a %xcc, .BB0 ! encoding: [0x28,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bleu,a %xcc, .BB0 ! CHECK: bcc,a %xcc, .BB0 ! encoding: [0x3a,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcc,a %xcc, .BB0 ! CHECK: bcs,a %xcc, .BB0 ! encoding: [0x2a,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcs,a %xcc, .BB0 ! CHECK: bpos,a %xcc, .BB0 ! encoding: [0x3c,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bpos,a %xcc, .BB0 ! CHECK: bneg,a %xcc, .BB0 ! encoding: [0x2c,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bneg,a %xcc, .BB0 ! CHECK: bvc,a %xcc, .BB0 ! encoding: [0x3e,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvc,a %xcc, .BB0 ! CHECK: bvs,a %xcc, .BB0 ! encoding: [0x2e,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvs,a %xcc, .BB0 ! 
CHECK: bne,pn %xcc, .BB0 ! encoding: [0x12,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bne,pn %xcc, .BB0 ! CHECK: be,pn %xcc, .BB0 ! encoding: [0x02,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 be,pn %xcc, .BB0 ! CHECK: bg,pn %xcc, .BB0 ! encoding: [0x14,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bg,pn %xcc, .BB0 ! CHECK: ble,pn %xcc, .BB0 ! encoding: [0x04,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 ble,pn %xcc, .BB0 ! CHECK: bge,pn %xcc, .BB0 ! encoding: [0x16,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bge,pn %xcc, .BB0 ! CHECK: bl,pn %xcc, .BB0 ! encoding: [0x06,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bl,pn %xcc, .BB0 ! CHECK: bgu,pn %xcc, .BB0 ! encoding: [0x18,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bgu,pn %xcc, .BB0 ! CHECK: bleu,pn %xcc, .BB0 ! encoding: [0x08,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bleu,pn %xcc, .BB0 ! CHECK: bcc,pn %xcc, .BB0 ! encoding: [0x1a,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcc,pn %xcc, .BB0 ! CHECK: bcs,pn %xcc, .BB0 ! encoding: [0x0a,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcs,pn %xcc, .BB0 ! CHECK: bpos,pn %xcc, .BB0 ! encoding: [0x1c,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bpos,pn %xcc, .BB0 ! CHECK: bneg,pn %xcc, .BB0 ! encoding: [0x0c,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bneg,pn %xcc, .BB0 ! CHECK: bvc,pn %xcc, .BB0 ! encoding: [0x1e,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvc,pn %xcc, .BB0 ! CHECK: bvs,pn %xcc, .BB0 ! encoding: [0x0e,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvs,pn %xcc, .BB0 ! CHECK: bne,a,pn %xcc, .BB0 ! encoding: [0x32,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bne,a,pn %xcc, .BB0 ! CHECK: be,a,pn %xcc, .BB0 ! encoding: [0x22,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 be,a,pn %xcc, .BB0 ! CHECK: bg,a,pn %xcc, .BB0 ! encoding: [0x34,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bg,a,pn %xcc, .BB0 ! CHECK: ble,a,pn %xcc, .BB0 ! encoding: [0x24,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 ble,a,pn %xcc, .BB0 ! CHECK: bge,a,pn %xcc, .BB0 ! encoding: [0x36,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bge,a,pn %xcc, .BB0 ! CHECK: bl,a,pn %xcc, .BB0 ! encoding: [0x26,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bl,a,pn %xcc, .BB0 ! CHECK: bgu,a,pn %xcc, .BB0 ! encoding: [0x38,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bgu,a,pn %xcc, .BB0 ! CHECK: bleu,a,pn %xcc, .BB0 ! encoding: [0x28,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bleu,a,pn %xcc, .BB0 ! CHECK: bcc,a,pn %xcc, .BB0 ! encoding: [0x3a,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcc,a,pn %xcc, .BB0 ! 
CHECK: bcs,a,pn %xcc, .BB0 ! encoding: [0x2a,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcs,a,pn %xcc, .BB0 ! CHECK: bpos,a,pn %xcc, .BB0 ! encoding: [0x3c,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bpos,a,pn %xcc, .BB0 ! CHECK: bneg,a,pn %xcc, .BB0 ! encoding: [0x2c,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bneg,a,pn %xcc, .BB0 ! CHECK: bvc,a,pn %xcc, .BB0 ! encoding: [0x3e,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvc,a,pn %xcc, .BB0 ! CHECK: bvs,a,pn %xcc, .BB0 ! encoding: [0x2e,0b01100AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvs,a,pn %xcc, .BB0 ! CHECK: bne %xcc, .BB0 ! encoding: [0x12,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bne,pt %xcc, .BB0 ! CHECK: be %xcc, .BB0 ! encoding: [0x02,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 be,pt %xcc, .BB0 ! CHECK: bg %xcc, .BB0 ! encoding: [0x14,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bg,pt %xcc, .BB0 ! CHECK: ble %xcc, .BB0 ! encoding: [0x04,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 ble,pt %xcc, .BB0 ! CHECK: bge %xcc, .BB0 ! encoding: [0x16,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bge,pt %xcc, .BB0 ! CHECK: bl %xcc, .BB0 ! encoding: [0x06,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bl,pt %xcc, .BB0 ! CHECK: bgu %xcc, .BB0 ! encoding: [0x18,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bgu,pt %xcc, .BB0 ! CHECK: bleu %xcc, .BB0 ! encoding: [0x08,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bleu,pt %xcc, .BB0 ! CHECK: bcc %xcc, .BB0 ! encoding: [0x1a,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcc,pt %xcc, .BB0 ! CHECK: bcs %xcc, .BB0 ! encoding: [0x0a,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcs,pt %xcc, .BB0 ! CHECK: bpos %xcc, .BB0 ! encoding: [0x1c,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bpos,pt %xcc, .BB0 ! CHECK: bneg %xcc, .BB0 ! encoding: [0x0c,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bneg,pt %xcc, .BB0 ! CHECK: bvc %xcc, .BB0 ! encoding: [0x1e,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvc,pt %xcc, .BB0 ! CHECK: bvs %xcc, .BB0 ! encoding: [0x0e,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bvs,pt %xcc, .BB0 ! CHECK: bne,a %xcc, .BB0 ! encoding: [0x32,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bne,a,pt %xcc, .BB0 ! CHECK: be,a %xcc, .BB0 ! encoding: [0x22,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 be,a,pt %xcc, .BB0 ! CHECK: bg,a %xcc, .BB0 ! encoding: [0x34,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bg,a,pt %xcc, .BB0 ! CHECK: ble,a %xcc, .BB0 ! encoding: [0x24,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 ble,a,pt %xcc, .BB0 ! CHECK: bge,a %xcc, .BB0 ! encoding: [0x36,0b01101AAA,A,A] ! CHECK-NEXT: ! 
fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bge,a,pt %xcc, .BB0 ! CHECK: bl,a %xcc, .BB0 ! encoding: [0x26,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bl,a,pt %xcc, .BB0 ! CHECK: bgu,a %xcc, .BB0 ! encoding: [0x38,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bgu,a,pt %xcc, .BB0 ! CHECK: bleu,a %xcc, .BB0 ! encoding: [0x28,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bleu,a,pt %xcc, .BB0 ! CHECK: bcc,a %xcc, .BB0 ! encoding: [0x3a,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcc,a,pt %xcc, .BB0 ! CHECK: bcs,a %xcc, .BB0 ! encoding: [0x2a,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bcs,a,pt %xcc, .BB0 ! CHECK: bpos,a %xcc, .BB0 ! encoding: [0x3c,0b01101AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 bpos,a,pt %xcc, .BB0 + ! CHECK: fba %fcc0, .BB0 ! encoding: [0x11,0b01001AAA,A,A] + ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 + fba %fcc0, .BB0 + + ! CHECK: fba %fcc0, .BB0 ! encoding: [0x11,0b01001AAA,A,A] + ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 + fb %fcc0, .BB0 + + ! CHECK: fbn %fcc0, .BB0 ! encoding: [0x01,0b01001AAA,A,A] + ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 + fbn %fcc0, .BB0 + ! CHECK: fbu %fcc0, .BB0 ! encoding: [0x0f,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbu %fcc0, .BB0 ! CHECK: fbg %fcc0, .BB0 ! encoding: [0x0d,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbg %fcc0, .BB0 ! CHECK: fbug %fcc0, .BB0 ! encoding: [0x0b,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbug %fcc0, .BB0 ! CHECK: fbl %fcc0, .BB0 ! encoding: [0x09,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbl %fcc0, .BB0 ! CHECK: fbul %fcc0, .BB0 ! encoding: [0x07,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbul %fcc0, .BB0 ! CHECK: fblg %fcc0, .BB0 ! encoding: [0x05,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fblg %fcc0, .BB0 ! CHECK: fbne %fcc0, .BB0 ! encoding: [0x03,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbne %fcc0, .BB0 ! CHECK: fbe %fcc0, .BB0 ! encoding: [0x13,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbe %fcc0, .BB0 ! CHECK: fbue %fcc0, .BB0 ! encoding: [0x15,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbue %fcc0, .BB0 ! CHECK: fbge %fcc0, .BB0 ! encoding: [0x17,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbge %fcc0, .BB0 ! CHECK: fbuge %fcc0, .BB0 ! encoding: [0x19,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbuge %fcc0, .BB0 ! CHECK: fble %fcc0, .BB0 ! encoding: [0x1b,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fble %fcc0, .BB0 ! CHECK: fbule %fcc0, .BB0 ! encoding: [0x1d,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbule %fcc0, .BB0 ! CHECK: fbo %fcc0, .BB0 ! encoding: [0x1f,0b01001AAA,A,A] ! CHECK-NEXT: ! 
fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbo %fcc0, .BB0 ! CHECK: fbu %fcc0, .BB0 ! encoding: [0x0f,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbu,pt %fcc0, .BB0 ! CHECK: fbg %fcc0, .BB0 ! encoding: [0x0d,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbg,pt %fcc0, .BB0 ! CHECK: fbug %fcc0, .BB0 ! encoding: [0x0b,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbug,pt %fcc0, .BB0 ! CHECK: fbl %fcc0, .BB0 ! encoding: [0x09,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbl,pt %fcc0, .BB0 ! CHECK: fbul %fcc0, .BB0 ! encoding: [0x07,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbul,pt %fcc0, .BB0 ! CHECK: fblg %fcc0, .BB0 ! encoding: [0x05,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fblg,pt %fcc0, .BB0 ! CHECK: fbne %fcc0, .BB0 ! encoding: [0x03,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbne,pt %fcc0, .BB0 ! CHECK: fbe %fcc0, .BB0 ! encoding: [0x13,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbe,pt %fcc0, .BB0 ! CHECK: fbue %fcc0, .BB0 ! encoding: [0x15,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbue,pt %fcc0, .BB0 ! CHECK: fbge %fcc0, .BB0 ! encoding: [0x17,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbge,pt %fcc0, .BB0 ! CHECK: fbuge %fcc0, .BB0 ! encoding: [0x19,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbuge,pt %fcc0, .BB0 ! CHECK: fble %fcc0, .BB0 ! encoding: [0x1b,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fble,pt %fcc0, .BB0 ! CHECK: fbule %fcc0, .BB0 ! encoding: [0x1d,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbule,pt %fcc0, .BB0 ! CHECK: fbo %fcc0, .BB0 ! encoding: [0x1f,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbo,pt %fcc0, .BB0 ! CHECK: fbo,a %fcc0, .BB0 ! encoding: [0x3f,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbo,a %fcc0, .BB0 ! CHECK: fbu,a %fcc0, .BB0 ! encoding: [0x2f,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbu,a %fcc0, .BB0 ! CHECK: fbg,a %fcc0, .BB0 ! encoding: [0x2d,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbg,a %fcc0, .BB0 ! CHECK: fbug,a %fcc0, .BB0 ! encoding: [0x2b,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbug,a %fcc0, .BB0 ! CHECK: fbl,a %fcc0, .BB0 ! encoding: [0x29,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbl,a %fcc0, .BB0 ! CHECK: fbul,a %fcc0, .BB0 ! encoding: [0x27,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbul,a %fcc0, .BB0 ! CHECK: fblg,a %fcc0, .BB0 ! encoding: [0x25,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fblg,a %fcc0, .BB0 ! CHECK: fbne,a %fcc0, .BB0 ! encoding: [0x23,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbne,a %fcc0, .BB0 ! CHECK: fbe,a %fcc0, .BB0 ! encoding: [0x33,0b01001AAA,A,A] ! CHECK-NEXT: ! 
fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbe,a %fcc0, .BB0 ! CHECK: fbue,a %fcc0, .BB0 ! encoding: [0x35,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbue,a %fcc0, .BB0 ! CHECK: fbge,a %fcc0, .BB0 ! encoding: [0x37,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbge,a %fcc0, .BB0 ! CHECK: fbuge,a %fcc0, .BB0 ! encoding: [0x39,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbuge,a %fcc0, .BB0 ! CHECK: fble,a %fcc0, .BB0 ! encoding: [0x3b,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fble,a %fcc0, .BB0 ! CHECK: fbule,a %fcc0, .BB0 ! encoding: [0x3d,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbule,a %fcc0, .BB0 ! CHECK: fbo,a %fcc0, .BB0 ! encoding: [0x3f,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbo,a %fcc0, .BB0 ! CHECK: fbo,a %fcc0, .BB0 ! encoding: [0x3f,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbo,a %fcc0, .BB0 ! CHECK: fbu,a %fcc0, .BB0 ! encoding: [0x2f,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbu,a,pt %fcc0, .BB0 ! CHECK: fbg,a %fcc0, .BB0 ! encoding: [0x2d,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbg,a,pt %fcc0, .BB0 ! CHECK: fbug,a %fcc0, .BB0 ! encoding: [0x2b,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbug,a,pt %fcc0, .BB0 ! CHECK: fbl,a %fcc0, .BB0 ! encoding: [0x29,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbl,a,pt %fcc0, .BB0 ! CHECK: fbul,a %fcc0, .BB0 ! encoding: [0x27,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbul,a,pt %fcc0, .BB0 ! CHECK: fblg,a %fcc0, .BB0 ! encoding: [0x25,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fblg,a,pt %fcc0, .BB0 ! CHECK: fbne,a %fcc0, .BB0 ! encoding: [0x23,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbne,a,pt %fcc0, .BB0 ! CHECK: fbe,a %fcc0, .BB0 ! encoding: [0x33,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbe,a,pt %fcc0, .BB0 ! CHECK: fbue,a %fcc0, .BB0 ! encoding: [0x35,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbue,a,pt %fcc0, .BB0 ! CHECK: fbge,a %fcc0, .BB0 ! encoding: [0x37,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbge,a,pt %fcc0, .BB0 ! CHECK: fbuge,a %fcc0, .BB0 ! encoding: [0x39,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbuge,a,pt %fcc0, .BB0 ! CHECK: fble,a %fcc0, .BB0 ! encoding: [0x3b,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fble,a,pt %fcc0, .BB0 ! CHECK: fbule,a %fcc0, .BB0 ! encoding: [0x3d,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbule,a,pt %fcc0, .BB0 ! CHECK: fbo,a %fcc0, .BB0 ! encoding: [0x3f,0b01001AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbo,a,pt %fcc0, .BB0 ! CHECK: fbu,pn %fcc0, .BB0 ! encoding: [0x0f,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbu,pn %fcc0, .BB0 ! CHECK: fbg,pn %fcc0, .BB0 ! 
encoding: [0x0d,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbg,pn %fcc0, .BB0 ! CHECK: fbug,pn %fcc0, .BB0 ! encoding: [0x0b,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbug,pn %fcc0, .BB0 ! CHECK: fbl,pn %fcc0, .BB0 ! encoding: [0x09,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbl,pn %fcc0, .BB0 ! CHECK: fbul,pn %fcc0, .BB0 ! encoding: [0x07,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbul,pn %fcc0, .BB0 ! CHECK: fblg,pn %fcc0, .BB0 ! encoding: [0x05,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fblg,pn %fcc0, .BB0 ! CHECK: fbne,pn %fcc0, .BB0 ! encoding: [0x03,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbne,pn %fcc0, .BB0 ! CHECK: fbe,pn %fcc0, .BB0 ! encoding: [0x13,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbe,pn %fcc0, .BB0 ! CHECK: fbue,pn %fcc0, .BB0 ! encoding: [0x15,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbue,pn %fcc0, .BB0 ! CHECK: fbge,pn %fcc0, .BB0 ! encoding: [0x17,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbge,pn %fcc0, .BB0 ! CHECK: fbuge,pn %fcc0, .BB0 ! encoding: [0x19,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbuge,pn %fcc0, .BB0 ! CHECK: fble,pn %fcc0, .BB0 ! encoding: [0x1b,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fble,pn %fcc0, .BB0 ! CHECK: fbule,pn %fcc0, .BB0 ! encoding: [0x1d,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbule,pn %fcc0, .BB0 ! CHECK: fbo,pn %fcc0, .BB0 ! encoding: [0x1f,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbo,pn %fcc0, .BB0 ! CHECK: fbu,a,pn %fcc0, .BB0 ! encoding: [0x2f,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbu,a,pn %fcc0, .BB0 ! CHECK: fbg,a,pn %fcc0, .BB0 ! encoding: [0x2d,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbg,a,pn %fcc0, .BB0 ! CHECK: fbug,a,pn %fcc0, .BB0 ! encoding: [0x2b,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbug,a,pn %fcc0, .BB0 ! CHECK: fbl,a,pn %fcc0, .BB0 ! encoding: [0x29,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbl,a,pn %fcc0, .BB0 ! CHECK: fbul,a,pn %fcc0, .BB0 ! encoding: [0x27,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbul,a,pn %fcc0, .BB0 ! CHECK: fblg,a,pn %fcc0, .BB0 ! encoding: [0x25,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fblg,a,pn %fcc0, .BB0 ! CHECK: fbne,a,pn %fcc0, .BB0 ! encoding: [0x23,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbne,a,pn %fcc0, .BB0 ! CHECK: fbe,a,pn %fcc0, .BB0 ! encoding: [0x33,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbe,a,pn %fcc0, .BB0 ! CHECK: fbue,a,pn %fcc0, .BB0 ! encoding: [0x35,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbue,a,pn %fcc0, .BB0 ! CHECK: fbge,a,pn %fcc0, .BB0 ! encoding: [0x37,0b01000AAA,A,A] ! CHECK-NEXT: ! 
fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbge,a,pn %fcc0, .BB0 ! CHECK: fbuge,a,pn %fcc0, .BB0 ! encoding: [0x39,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbuge,a,pn %fcc0, .BB0 ! CHECK: fble,a,pn %fcc0, .BB0 ! encoding: [0x3b,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fble,a,pn %fcc0, .BB0 ! CHECK: fbule,a,pn %fcc0, .BB0 ! encoding: [0x3d,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbule,a,pn %fcc0, .BB0 ! CHECK: fbo,a,pn %fcc0, .BB0 ! encoding: [0x3f,0b01000AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbo,a,pn %fcc0, .BB0 ! CHECK: movu %fcc1, %g1, %g2 ! encoding: [0x85,0x61,0xc8,0x01] movu %fcc1, %g1, %g2 ! CHECK: fmovsg %fcc2, %f1, %f2 ! encoding: [0x85,0xa9,0x90,0x21] fmovsg %fcc2, %f1, %f2 ! CHECK: fbug %fcc3, .BB0 ! encoding: [0x0b,0b01111AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbug %fcc3, .BB0 ! CHECK: fbu %fcc3, .BB0 ! encoding: [0x0f,0b01111AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbu,pt %fcc3, .BB0 ! CHECK: fbl,a %fcc3, .BB0 ! encoding: [0x29,0b01111AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbl,a %fcc3, .BB0 ! CHECK: fbue,pn %fcc3, .BB0 ! encoding: [0x15,0b01110AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbue,pn %fcc3, .BB0 ! CHECK: fbne,a,pn %fcc3, .BB0 ! encoding: [0x23,0b01110AAA,A,A] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br19 fbne,a,pn %fcc3, .BB0 ! CHECK: brz %g1, .BB0 ! encoding: [0x02,0b11AA1000,0b01BBBBBB,B] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br16_2 ! CHECK-NEXT: ! fixup B - offset: 0, value: .BB0, kind: fixup_sparc_br16_14 ! CHECK: brlez %g1, .BB0 ! encoding: [0x04,0b11AA1000,0b01BBBBBB,B] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br16_2 ! CHECK-NEXT: ! fixup B - offset: 0, value: .BB0, kind: fixup_sparc_br16_14 ! CHECK: brlz %g1, .BB0 ! encoding: [0x06,0b11AA1000,0b01BBBBBB,B] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br16_2 ! CHECK-NEXT: ! fixup B - offset: 0, value: .BB0, kind: fixup_sparc_br16_14 ! CHECK: brnz %g1, .BB0 ! encoding: [0x0a,0b11AA1000,0b01BBBBBB,B] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br16_2 ! CHECK-NEXT: ! fixup B - offset: 0, value: .BB0, kind: fixup_sparc_br16_14 ! CHECK: brgz %g1, .BB0 ! encoding: [0x0c,0b11AA1000,0b01BBBBBB,B] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br16_2 ! CHECK-NEXT: ! fixup B - offset: 0, value: .BB0, kind: fixup_sparc_br16_14 ! CHECK: brgez %g1, .BB0 ! encoding: [0x0e,0b11AA1000,0b01BBBBBB,B] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br16_2 ! CHECK-NEXT: ! fixup B - offset: 0, value: .BB0, kind: fixup_sparc_br16_14 brz %g1, .BB0 brlez %g1, .BB0 brlz %g1, .BB0 brnz %g1, .BB0 brgz %g1, .BB0 brgez %g1, .BB0 ! CHECK: brz %g1, .BB0 ! encoding: [0x02,0b11AA1000,0b01BBBBBB,B] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br16_2 ! CHECK-NEXT: ! fixup B - offset: 0, value: .BB0, kind: fixup_sparc_br16_14 brz,pt %g1, .BB0 ! CHECK: brz,a %g1, .BB0 ! encoding: [0x22,0b11AA1000,0b01BBBBBB,B] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br16_2 ! CHECK-NEXT: ! fixup B - offset: 0, value: .BB0, kind: fixup_sparc_br16_14 brz,a %g1, .BB0 ! 
CHECK: brz,a %g1, .BB0 ! encoding: [0x22,0b11AA1000,0b01BBBBBB,B] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br16_2 ! CHECK-NEXT: ! fixup B - offset: 0, value: .BB0, kind: fixup_sparc_br16_14 brz,a,pt %g1, .BB0 ! CHECK: brz,pn %g1, .BB0 ! encoding: [0x02,0b11AA0000,0b01BBBBBB,B] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br16_2 ! CHECK-NEXT: ! fixup B - offset: 0, value: .BB0, kind: fixup_sparc_br16_14 brz,pn %g1, .BB0 ! CHECK: brz,a,pn %g1, .BB0 ! encoding: [0x22,0b11AA0000,0b01BBBBBB,B] ! CHECK-NEXT: ! fixup A - offset: 0, value: .BB0, kind: fixup_sparc_br16_2 ! CHECK-NEXT: ! fixup B - offset: 0, value: .BB0, kind: fixup_sparc_br16_14 brz,a,pn %g1, .BB0 ! CHECK: movrz %g1, %g2, %g3 ! encoding: [0x87,0x78,0x44,0x02] ! CHECK: movrlez %g1, %g2, %g3 ! encoding: [0x87,0x78,0x48,0x02] ! CHECK: movrlz %g1, %g2, %g3 ! encoding: [0x87,0x78,0x4c,0x02] ! CHECK: movrnz %g1, %g2, %g3 ! encoding: [0x87,0x78,0x54,0x02] ! CHECK: movrgz %g1, %g2, %g3 ! encoding: [0x87,0x78,0x58,0x02] ! CHECK: movrgez %g1, %g2, %g3 ! encoding: [0x87,0x78,0x5c,0x02] movrz %g1, %g2, %g3 movrlez %g1, %g2, %g3 movrlz %g1, %g2, %g3 movrnz %g1, %g2, %g3 movrgz %g1, %g2, %g3 movrgez %g1, %g2, %g3 ! CHECK: fmovrsz %g1, %f2, %f3 ! encoding: [0x87,0xa8,0x44,0xa2] ! CHECK: fmovrslez %g1, %f2, %f3 ! encoding: [0x87,0xa8,0x48,0xa2] ! CHECK: fmovrslz %g1, %f2, %f3 ! encoding: [0x87,0xa8,0x4c,0xa2] ! CHECK: fmovrsnz %g1, %f2, %f3 ! encoding: [0x87,0xa8,0x54,0xa2] ! CHECK: fmovrsgz %g1, %f2, %f3 ! encoding: [0x87,0xa8,0x58,0xa2] ! CHECK: fmovrsgez %g1, %f2, %f3 ! encoding: [0x87,0xa8,0x5c,0xa2] fmovrsz %g1, %f2, %f3 fmovrslez %g1, %f2, %f3 fmovrslz %g1, %f2, %f3 fmovrsnz %g1, %f2, %f3 fmovrsgz %g1, %f2, %f3 fmovrsgez %g1, %f2, %f3 ! CHECK: rett %i7+8 ! encoding: [0x81,0xcf,0xe0,0x08] return %i7 + 8 ! CHECK: ta %icc, %g0 + 5 ! encoding: [0x91,0xd0,0x20,0x05] ta 5 ! CHECK: te %xcc, %g0 + 3 ! 
encoding: [0x83,0xd0,0x30,0x03]
        te %xcc, 3
Index: vendor/llvm/dist/test/MC/X86/x86_nop.s
===================================================================
--- vendor/llvm/dist/test/MC/X86/x86_nop.s (revision 295845)
+++ vendor/llvm/dist/test/MC/X86/x86_nop.s (revision 295846)
@@ -1,31 +1,37 @@
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=generic %s | llvm-objdump -d - | FileCheck %s
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=i386 %s | llvm-objdump -d - | FileCheck %s
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=i486 %s | llvm-objdump -d - | FileCheck %s
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=i586 %s | llvm-objdump -d - | FileCheck %s
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=pentium %s | llvm-objdump -d - | FileCheck %s
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=pentium-mmx %s | llvm-objdump -d - | FileCheck %s
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=geode %s | llvm-objdump -d - | FileCheck %s
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=i686 %s | llvm-objdump -d - | FileCheck %s
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=k6 %s | llvm-objdump -d - | FileCheck %s
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=k6-2 %s | llvm-objdump -d - | FileCheck %s
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=k6-3 %s | llvm-objdump -d - | FileCheck %s
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=winchip-c6 %s | llvm-objdump -d - | FileCheck %s
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=winchip2 %s | llvm-objdump -d - | FileCheck %s
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=c3 %s | llvm-objdump -d - | FileCheck %s
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=c3-2 %s | llvm-objdump -d - | FileCheck %s
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=core2 %s | llvm-objdump -d - | FileCheck --check-prefix=NOPL %s
# RUN: llvm-mc -filetype=obj -triple=i686-pc-linux -mcpu=slm %s | llvm-objdump -d - | FileCheck --check-prefix=NOPL %s

inc %eax
.align 8
inc %eax

// CHECK: 0: 40 incl %eax
-// CHECK: 1: 8d b4 26 00 00 00 00 leal (%esi), %esi
+// CHECK: 1: 90 nop
+// CHECK: 2: 90 nop
+// CHECK: 3: 90 nop
+// CHECK: 4: 90 nop
+// CHECK: 5: 90 nop
+// CHECK: 6: 90 nop
+// CHECK: 7: 90 nop
// CHECK: 8: 40 incl %eax

// NOPL: 0: 40 incl %eax
// NOPL: 1: 0f 1f 80 00 00 00 00 nopl (%eax)
// NOPL: 8: 40 incl %eax
Index: vendor/llvm/dist/test/Transforms/InstCombine/fprintf-1.ll
===================================================================
--- vendor/llvm/dist/test/Transforms/InstCombine/fprintf-1.ll (revision 295845)
+++ vendor/llvm/dist/test/Transforms/InstCombine/fprintf-1.ll (revision 295846)
@@ -1,89 +1,98 @@
; Test that the fprintf library call simplifier works correctly.
;
; RUN: opt < %s -instcombine -S | FileCheck %s
; RUN: opt < %s -mtriple xcore-xmos-elf -instcombine -S | FileCheck %s -check-prefix=CHECK-IPRINTF

target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128"

%FILE = type { }

@hello_world = constant [13 x i8] c"hello world\0A\00"
@percent_c = constant [3 x i8] c"%c\00"
@percent_d = constant [3 x i8] c"%d\00"
@percent_f = constant [3 x i8] c"%f\00"
@percent_s = constant [3 x i8] c"%s\00"

declare i32 @fprintf(%FILE*, i8*, ...)

; Check fprintf(fp, "foo") -> fwrite("foo", 3, 1, fp).
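; At the C level, the no-conversion case below is roughly (illustrative):
;
;   fprintf(fp, "hello world\n");               /* format has no '%' */
;     => fwrite("hello world\n", 12, 1, fp);    /* 12 == strlen(), NUL not written */
;
; which is why the CHECK line in @test_simplify1 expects i32 12 and i32 1.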
define void @test_simplify1(%FILE* %fp) { ; CHECK-LABEL: @test_simplify1( %fmt = getelementptr [13 x i8], [13 x i8]* @hello_world, i32 0, i32 0 call i32 (%FILE*, i8*, ...) @fprintf(%FILE* %fp, i8* %fmt) ; CHECK-NEXT: call i32 @fwrite(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @hello_world, i32 0, i32 0), i32 12, i32 1, %FILE* %fp) ret void ; CHECK-NEXT: ret void } ; Check fprintf(fp, "%c", chr) -> fputc(chr, fp). define void @test_simplify2(%FILE* %fp) { ; CHECK-LABEL: @test_simplify2( %fmt = getelementptr [3 x i8], [3 x i8]* @percent_c, i32 0, i32 0 call i32 (%FILE*, i8*, ...) @fprintf(%FILE* %fp, i8* %fmt, i8 104) ; CHECK-NEXT: call i32 @fputc(i32 104, %FILE* %fp) ret void ; CHECK-NEXT: ret void } ; Check fprintf(fp, "%s", str) -> fputs(str, fp). ; NOTE: The fputs simplifier simplifies this further to fwrite. define void @test_simplify3(%FILE* %fp) { ; CHECK-LABEL: @test_simplify3( %fmt = getelementptr [3 x i8], [3 x i8]* @percent_s, i32 0, i32 0 %str = getelementptr [13 x i8], [13 x i8]* @hello_world, i32 0, i32 0 call i32 (%FILE*, i8*, ...) @fprintf(%FILE* %fp, i8* %fmt, i8* %str) ; CHECK-NEXT: call i32 @fwrite(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @hello_world, i32 0, i32 0), i32 12, i32 1, %FILE* %fp) ret void ; CHECK-NEXT: ret void } ; Check fprintf(fp, fmt, ...) -> fiprintf(fp, fmt, ...) if no floating point. define void @test_simplify4(%FILE* %fp) { ; CHECK-IPRINTF-LABEL: @test_simplify4( %fmt = getelementptr [3 x i8], [3 x i8]* @percent_d, i32 0, i32 0 call i32 (%FILE*, i8*, ...) @fprintf(%FILE* %fp, i8* %fmt, i32 187) ; CHECK-IPRINTF-NEXT: call i32 (%FILE*, i8*, ...) @fiprintf(%FILE* %fp, i8* getelementptr inbounds ([3 x i8], [3 x i8]* @percent_d, i32 0, i32 0), i32 187) ret void ; CHECK-IPRINTF-NEXT: ret void } +define void @test_simplify5(%FILE* %fp) { +; CHECK-LABEL: @test_simplify5( + %fmt = getelementptr [13 x i8], [13 x i8]* @hello_world, i32 0, i32 0 + call i32 (%FILE*, i8*, ...) @fprintf(%FILE* %fp, i8* %fmt) [ "deopt"() ] +; CHECK-NEXT: call i32 @fwrite(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @hello_world, i32 0, i32 0), i32 12, i32 1, %FILE* %fp) [ "deopt"() ] + ret void +; CHECK-NEXT: ret void +} + define void @test_no_simplify1(%FILE* %fp) { ; CHECK-IPRINTF-LABEL: @test_no_simplify1( %fmt = getelementptr [3 x i8], [3 x i8]* @percent_f, i32 0, i32 0 call i32 (%FILE*, i8*, ...) @fprintf(%FILE* %fp, i8* %fmt, double 1.87) ; CHECK-IPRINTF-NEXT: call i32 (%FILE*, i8*, ...) @fprintf(%FILE* %fp, i8* getelementptr inbounds ([3 x i8], [3 x i8]* @percent_f, i32 0, i32 0), double 1.870000e+00) ret void ; CHECK-IPRINTF-NEXT: ret void } define void @test_no_simplify2(%FILE* %fp, double %d) { ; CHECK-LABEL: @test_no_simplify2( %fmt = getelementptr [3 x i8], [3 x i8]* @percent_f, i32 0, i32 0 call i32 (%FILE*, i8*, ...) @fprintf(%FILE* %fp, i8* %fmt, double %d) ; CHECK-NEXT: call i32 (%FILE*, i8*, ...) @fprintf(%FILE* %fp, i8* getelementptr inbounds ([3 x i8], [3 x i8]* @percent_f, i32 0, i32 0), double %d) ret void ; CHECK-NEXT: ret void } define i32 @test_no_simplify3(%FILE* %fp) { ; CHECK-LABEL: @test_no_simplify3( %fmt = getelementptr [13 x i8], [13 x i8]* @hello_world, i32 0, i32 0 %1 = call i32 (%FILE*, i8*, ...) @fprintf(%FILE* %fp, i8* %fmt) ; CHECK-NEXT: call i32 (%FILE*, i8*, ...) 
@fprintf(%FILE* %fp, i8* getelementptr inbounds ([13 x i8], [13 x i8]* @hello_world, i32 0, i32 0)) ret i32 %1 ; CHECK-NEXT: ret i32 %1 } Index: vendor/llvm/dist/test/Transforms/LoopStrengthReduce/funclet.ll =================================================================== --- vendor/llvm/dist/test/Transforms/LoopStrengthReduce/funclet.ll (revision 295845) +++ vendor/llvm/dist/test/Transforms/LoopStrengthReduce/funclet.ll (revision 295846) @@ -1,216 +1,245 @@ ; RUN: opt < %s -loop-reduce -S | FileCheck %s target datalayout = "e-m:x-p:32:32-i64:64-f80:32-n8:16:32-a:0:32-S32" target triple = "i686-pc-windows-msvc" declare i32 @_except_handler3(...) declare i32 @__CxxFrameHandler3(...) declare void @external(i32*) declare void @reserve() define void @f() personality i32 (...)* @_except_handler3 { entry: br label %throw throw: ; preds = %throw, %entry %tmp96 = getelementptr inbounds i8, i8* undef, i32 1 invoke void @reserve() to label %throw unwind label %pad pad: ; preds = %throw %phi2 = phi i8* [ %tmp96, %throw ] %cs = catchswitch within none [label %unreachable] unwind label %blah2 unreachable: catchpad within %cs [] unreachable blah2: %cleanuppadi4.i.i.i = cleanuppad within none [] br label %loop_body loop_body: ; preds = %iter, %pad %tmp99 = phi i8* [ %tmp101, %iter ], [ %phi2, %blah2 ] %tmp100 = icmp eq i8* %tmp99, undef br i1 %tmp100, label %unwind_out, label %iter iter: ; preds = %loop_body %tmp101 = getelementptr inbounds i8, i8* %tmp99, i32 1 br i1 undef, label %unwind_out, label %loop_body unwind_out: ; preds = %iter, %loop_body cleanupret from %cleanuppadi4.i.i.i unwind to caller } ; CHECK-LABEL: define void @f( ; CHECK: cleanuppad within none [] ; CHECK-NEXT: ptrtoint i8* %phi2 to i32 define void @g() personality i32 (...)* @_except_handler3 { entry: br label %throw throw: ; preds = %throw, %entry %tmp96 = getelementptr inbounds i8, i8* undef, i32 1 invoke void @reserve() to label %throw unwind label %pad pad: %phi2 = phi i8* [ %tmp96, %throw ] %cs = catchswitch within none [label %unreachable, label %blah] unwind to caller unreachable: catchpad within %cs [] unreachable blah: %catchpad = catchpad within %cs [] br label %loop_body unwind_out: catchret from %catchpad to label %leave leave: ret void loop_body: ; preds = %iter, %pad %tmp99 = phi i8* [ %tmp101, %iter ], [ %phi2, %blah ] %tmp100 = icmp eq i8* %tmp99, undef br i1 %tmp100, label %unwind_out, label %iter iter: ; preds = %loop_body %tmp101 = getelementptr inbounds i8, i8* %tmp99, i32 1 br i1 undef, label %unwind_out, label %loop_body } ; CHECK-LABEL: define void @g( ; CHECK: blah: ; CHECK-NEXT: catchpad within %cs [] ; CHECK-NEXT: ptrtoint i8* %phi2 to i32 define void @h() personality i32 (...)* @_except_handler3 { entry: br label %throw throw: ; preds = %throw, %entry %tmp96 = getelementptr inbounds i8, i8* undef, i32 1 invoke void @reserve() to label %throw unwind label %pad pad: %cs = catchswitch within none [label %unreachable, label %blug] unwind to caller unreachable: catchpad within %cs [] unreachable blug: %phi2 = phi i8* [ %tmp96, %pad ] %catchpad = catchpad within %cs [] br label %loop_body unwind_out: catchret from %catchpad to label %leave leave: ret void loop_body: ; preds = %iter, %pad %tmp99 = phi i8* [ %tmp101, %iter ], [ %phi2, %blug ] %tmp100 = icmp eq i8* %tmp99, undef br i1 %tmp100, label %unwind_out, label %iter iter: ; preds = %loop_body %tmp101 = getelementptr inbounds i8, i8* %tmp99, i32 1 br i1 undef, label %unwind_out, label %loop_body } ; CHECK-LABEL: define void @h( ; CHECK: blug: ; CHECK: 
catchpad within %cs [] ; CHECK-NEXT: ptrtoint i8* %phi2 to i32 define void @i() personality i32 (...)* @_except_handler3 { entry: br label %throw throw: ; preds = %throw, %entry %tmp96 = getelementptr inbounds i8, i8* undef, i32 1 invoke void @reserve() to label %throw unwind label %catchpad catchpad: ; preds = %throw %phi2 = phi i8* [ %tmp96, %throw ] %cs = catchswitch within none [label %cp_body] unwind label %cleanuppad cp_body: catchpad within %cs [] br label %loop_head cleanuppad: cleanuppad within none [] br label %loop_head loop_head: br label %loop_body loop_body: ; preds = %iter, %catchpad %tmp99 = phi i8* [ %tmp101, %iter ], [ %phi2, %loop_head ] %tmp100 = icmp eq i8* %tmp99, undef br i1 %tmp100, label %unwind_out, label %iter iter: ; preds = %loop_body %tmp101 = getelementptr inbounds i8, i8* %tmp99, i32 1 br i1 undef, label %unwind_out, label %loop_body unwind_out: ; preds = %iter, %loop_body unreachable } ; CHECK-LABEL: define void @i( ; CHECK: ptrtoint i8* %phi2 to i32 define void @test1(i32* %b, i32* %c) personality i32 (...)* @__CxxFrameHandler3 { entry: br label %for.cond for.cond: ; preds = %for.inc, %entry %d.0 = phi i32* [ %b, %entry ], [ %incdec.ptr, %for.inc ] invoke void @external(i32* %d.0) to label %for.inc unwind label %catch.dispatch for.inc: ; preds = %for.cond %incdec.ptr = getelementptr inbounds i32, i32* %d.0, i32 1 br label %for.cond catch.dispatch: ; preds = %for.cond %cs = catchswitch within none [label %catch] unwind label %catch.dispatch.2 catch: ; preds = %catch.dispatch %0 = catchpad within %cs [i8* null, i32 64, i8* null] catchret from %0 to label %try.cont try.cont: ; preds = %catch invoke void @external(i32* %c) to label %try.cont.7 unwind label %catch.dispatch.2 catch.dispatch.2: ; preds = %try.cont, %catchendblock %e.0 = phi i32* [ %c, %try.cont ], [ %b, %catch.dispatch ] %cs2 = catchswitch within none [label %catch.4] unwind to caller catch.4: ; preds = %catch.dispatch.2 catchpad within %cs2 [i8* null, i32 64, i8* null] unreachable try.cont.7: ; preds = %try.cont ret void } ; CHECK-LABEL: define void @test1( ; CHECK: for.cond: ; CHECK: %d.0 = phi i32* [ %b, %entry ], [ %incdec.ptr, %for.inc ] ; CHECK: catch.dispatch.2: ; CHECK: %e.0 = phi i32* [ %c, %try.cont ], [ %b, %catch.dispatch ] + +define i32 @test2() personality i32 (...)* @_except_handler3 { +entry: + br label %for.body + +for.body: ; preds = %for.inc, %entry + %phi = phi i32 [ %inc, %for.inc ], [ 0, %entry ] + invoke void @reserve() + to label %for.inc unwind label %catch.dispatch + +catch.dispatch: ; preds = %for.body + %tmp18 = catchswitch within none [label %catch.handler] unwind to caller + +catch.handler: ; preds = %catch.dispatch + %phi.lcssa = phi i32 [ %phi, %catch.dispatch ] + %tmp19 = catchpad within %tmp18 [i8* null] + catchret from %tmp19 to label %done + +done: + ret i32 %phi.lcssa + +for.inc: ; preds = %for.body + %inc = add i32 %phi, 1 + br label %for.body +} + +; CHECK-LABEL: define i32 @test2( +; CHECK: %phi.lcssa = phi i32 [ %phi, %catch.dispatch ] +; CHECK-NEXT: catchpad within Index: vendor/llvm/dist/test/Transforms/LoopVectorize/PowerPC/stride-vectorization.ll =================================================================== --- vendor/llvm/dist/test/Transforms/LoopVectorize/PowerPC/stride-vectorization.ll (revision 295845) +++ vendor/llvm/dist/test/Transforms/LoopVectorize/PowerPC/stride-vectorization.ll (revision 295846) @@ -1,30 +1,36 @@ ; RUN: opt -S -basicaa -loop-vectorize < %s | FileCheck %s target datalayout = "E-m:e-i64:64-n32:64" target triple = 
"powerpc64-unknown-linux-gnu" ; Function Attrs: nounwind define void @foo(double* noalias nocapture %a, double* noalias nocapture readonly %b) #0 { entry: br label %for.body ; CHECK-LABEL: @foo ; CHECK: <2 x double> for.cond.cleanup: ; preds = %for.body ret void for.body: ; preds = %for.body, %entry %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %0 = shl nsw i64 %indvars.iv, 1 + %odd.idx = add nsw i64 %0, 1 + %arrayidx = getelementptr inbounds double, double* %b, i64 %0 + %arrayidx.odd = getelementptr inbounds double, double* %b, i64 %odd.idx + %1 = load double, double* %arrayidx, align 8 - %add = fadd double %1, 1.000000e+00 + %2 = load double, double* %arrayidx.odd, align 8 + + %add = fadd double %1, %2 %arrayidx2 = getelementptr inbounds double, double* %a, i64 %indvars.iv store double %add, double* %arrayidx2, align 8 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %exitcond = icmp eq i64 %indvars.iv.next, 1600 br i1 %exitcond, label %for.cond.cleanup, label %for.body } attributes #0 = { nounwind "target-cpu"="pwr8" } Index: vendor/llvm/dist/test/Transforms/LoopVectorize/X86/reg-usage.ll =================================================================== --- vendor/llvm/dist/test/Transforms/LoopVectorize/X86/reg-usage.ll (revision 295845) +++ vendor/llvm/dist/test/Transforms/LoopVectorize/X86/reg-usage.ll (nonexistent) @@ -1,71 +0,0 @@ -; RUN: opt < %s -debug-only=loop-vectorize -loop-vectorize -vectorizer-maximize-bandwidth -O2 -S 2>&1 | FileCheck %s -; REQUIRES: asserts - -target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" -target triple = "x86_64-unknown-linux-gnu" - -@a = global [1024 x i8] zeroinitializer, align 16 -@b = global [1024 x i8] zeroinitializer, align 16 - -define i32 @foo() { -; This function has a loop of SAD pattern. Here we check when VF = 16 the -; register usage doesn't exceed 16. 
-; -; CHECK-LABEL: foo -; CHECK: LV(REG): VF = 4 -; CHECK-NEXT: LV(REG): Found max usage: 4 -; CHECK: LV(REG): VF = 8 -; CHECK-NEXT: LV(REG): Found max usage: 7 -; CHECK: LV(REG): VF = 16 -; CHECK-NEXT: LV(REG): Found max usage: 13 - -entry: - br label %for.body - -for.cond.cleanup: - %add.lcssa = phi i32 [ %add, %for.body ] - ret i32 %add.lcssa - -for.body: - %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] - %s.015 = phi i32 [ 0, %entry ], [ %add, %for.body ] - %arrayidx = getelementptr inbounds [1024 x i8], [1024 x i8]* @a, i64 0, i64 %indvars.iv - %0 = load i8, i8* %arrayidx, align 1 - %conv = zext i8 %0 to i32 - %arrayidx2 = getelementptr inbounds [1024 x i8], [1024 x i8]* @b, i64 0, i64 %indvars.iv - %1 = load i8, i8* %arrayidx2, align 1 - %conv3 = zext i8 %1 to i32 - %sub = sub nsw i32 %conv, %conv3 - %ispos = icmp sgt i32 %sub, -1 - %neg = sub nsw i32 0, %sub - %2 = select i1 %ispos, i32 %sub, i32 %neg - %add = add nsw i32 %2, %s.015 - %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 - %exitcond = icmp eq i64 %indvars.iv.next, 1024 - br i1 %exitcond, label %for.cond.cleanup, label %for.body -} - -define i64 @bar(i64* nocapture %a) { -; CHECK-LABEL: bar -; CHECK: LV(REG): VF = 2 -; CHECK: LV(REG): Found max usage: 4 -; -entry: - br label %for.body - -for.cond.cleanup: - %add2.lcssa = phi i64 [ %add2, %for.body ] - ret i64 %add2.lcssa - -for.body: - %i.012 = phi i64 [ 0, %entry ], [ %inc, %for.body ] - %s.011 = phi i64 [ 0, %entry ], [ %add2, %for.body ] - %arrayidx = getelementptr inbounds i64, i64* %a, i64 %i.012 - %0 = load i64, i64* %arrayidx, align 8 - %add = add nsw i64 %0, %i.012 - store i64 %add, i64* %arrayidx, align 8 - %add2 = add nsw i64 %add, %s.011 - %inc = add nuw nsw i64 %i.012, 1 - %exitcond = icmp eq i64 %inc, 1024 - br i1 %exitcond, label %for.cond.cleanup, label %for.body -} Index: vendor/llvm/dist/test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll =================================================================== --- vendor/llvm/dist/test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll (revision 295845) +++ vendor/llvm/dist/test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll (revision 295846) @@ -1,46 +1,46 @@ ; RUN: opt -loop-vectorize -vectorizer-maximize-bandwidth -mcpu=corei7-avx -debug-only=loop-vectorize -S < %s 2>&1 | FileCheck %s ; REQUIRES: asserts target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" @a = global [1000 x i8] zeroinitializer, align 16 @b = global [1000 x i8] zeroinitializer, align 16 @c = global [1000 x i8] zeroinitializer, align 16 @u = global [1000 x i32] zeroinitializer, align 16 @v = global [1000 x i32] zeroinitializer, align 16 @w = global [1000 x i32] zeroinitializer, align 16 ; Tests that the vectorization factor is determined by the smallest instead of ; widest type in the loop for maximum bandwidth when ; -vectorizer-maximize-bandwidth is indicated. ; ; CHECK-label: foo -; CHECK: LV: Selecting VF: 32. +; CHECK: LV: Selecting VF: 16. 
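; In C terms, the loop below is roughly (illustrative):
;
;   for (int i = 0; i < 1000; i++) {
;     a[i] = b[i] + c[i];   /* i8 elements: the smallest type */
;     u[i] = v[i] + w[i];   /* i32 elements: the widest type */
;   }
;
; so maximizing bandwidth sizes the VF from the i8 accesses rather than the
; i32 ones.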
define void @foo() { entry: br label %for.body for.cond.cleanup: ret void for.body: %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %arrayidx = getelementptr inbounds [1000 x i8], [1000 x i8]* @b, i64 0, i64 %indvars.iv %0 = load i8, i8* %arrayidx, align 1 %arrayidx2 = getelementptr inbounds [1000 x i8], [1000 x i8]* @c, i64 0, i64 %indvars.iv %1 = load i8, i8* %arrayidx2, align 1 %add = add i8 %1, %0 %arrayidx6 = getelementptr inbounds [1000 x i8], [1000 x i8]* @a, i64 0, i64 %indvars.iv store i8 %add, i8* %arrayidx6, align 1 %arrayidx8 = getelementptr inbounds [1000 x i32], [1000 x i32]* @v, i64 0, i64 %indvars.iv %2 = load i32, i32* %arrayidx8, align 4 %arrayidx10 = getelementptr inbounds [1000 x i32], [1000 x i32]* @w, i64 0, i64 %indvars.iv %3 = load i32, i32* %arrayidx10, align 4 %add11 = add nsw i32 %3, %2 %arrayidx13 = getelementptr inbounds [1000 x i32], [1000 x i32]* @u, i64 0, i64 %indvars.iv store i32 %add11, i32* %arrayidx13, align 4 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %exitcond = icmp eq i64 %indvars.iv.next, 1000 br i1 %exitcond, label %for.cond.cleanup, label %for.body } Index: vendor/llvm/dist/test/Transforms/LoopVectorize/interleaved-accesses.ll =================================================================== --- vendor/llvm/dist/test/Transforms/LoopVectorize/interleaved-accesses.ll (revision 295845) +++ vendor/llvm/dist/test/Transforms/LoopVectorize/interleaved-accesses.ll (revision 295846) @@ -1,467 +1,465 @@ ; RUN: opt -S -loop-vectorize -instcombine -force-vector-width=4 -force-vector-interleave=1 -enable-interleaved-mem-accesses=true -runtime-memory-check-threshold=24 < %s | FileCheck %s target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128" ; Check vectorization on an interleaved load group of factor 2 and an interleaved ; store group of factor 2. 
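; For a factor-2 group at VF 4 this means one wide <8 x i32> load, e.g.
; %wide.vec = {A0,B0,A1,B1,A2,B2,A3,B3}, two shufflevectors that split out the
; even and odd lanes, and one re-interleaving shufflevector feeding a single
; wide <8 x i32> store, as the CHECK lines below verify.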
; int AB[1024]; ; int CD[1024]; ; void test_array_load2_store2(int C, int D) { ; for (int i = 0; i < 1024; i+=2) { ; int A = AB[i]; ; int B = AB[i+1]; ; CD[i] = A + C; ; CD[i+1] = B * D; ; } ; } ; CHECK-LABEL: @test_array_load2_store2( ; CHECK: %wide.vec = load <8 x i32>, <8 x i32>* %{{.*}}, align 4 ; CHECK: shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> ; CHECK: shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> ; CHECK: add nsw <4 x i32> ; CHECK: mul nsw <4 x i32> ; CHECK: %interleaved.vec = shufflevector <4 x i32> {{.*}}, <8 x i32> ; CHECK: store <8 x i32> %interleaved.vec, <8 x i32>* %{{.*}}, align 4 @AB = common global [1024 x i32] zeroinitializer, align 4 @CD = common global [1024 x i32] zeroinitializer, align 4 define void @test_array_load2_store2(i32 %C, i32 %D) { entry: br label %for.body for.body: ; preds = %for.body, %entry %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %arrayidx0 = getelementptr inbounds [1024 x i32], [1024 x i32]* @AB, i64 0, i64 %indvars.iv %tmp = load i32, i32* %arrayidx0, align 4 %tmp1 = or i64 %indvars.iv, 1 %arrayidx1 = getelementptr inbounds [1024 x i32], [1024 x i32]* @AB, i64 0, i64 %tmp1 %tmp2 = load i32, i32* %arrayidx1, align 4 %add = add nsw i32 %tmp, %C %mul = mul nsw i32 %tmp2, %D %arrayidx2 = getelementptr inbounds [1024 x i32], [1024 x i32]* @CD, i64 0, i64 %indvars.iv store i32 %add, i32* %arrayidx2, align 4 %arrayidx3 = getelementptr inbounds [1024 x i32], [1024 x i32]* @CD, i64 0, i64 %tmp1 store i32 %mul, i32* %arrayidx3, align 4 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 2 %cmp = icmp slt i64 %indvars.iv.next, 1024 br i1 %cmp, label %for.body, label %for.end for.end: ; preds = %for.body ret void } ; int A[3072]; ; struct ST S[1024]; ; void test_struct_st3() { ; int *ptr = A; ; for (int i = 0; i < 1024; i++) { ; int X1 = *ptr++; ; int X2 = *ptr++; ; int X3 = *ptr++; ; T[i].x = X1 + 1; ; T[i].y = X2 + 2; ; T[i].z = X3 + 3; ; } ; } ; CHECK-LABEL: @test_struct_array_load3_store3( ; CHECK: %wide.vec = load <12 x i32>, <12 x i32>* {{.*}}, align 4 ; CHECK: shufflevector <12 x i32> %wide.vec, <12 x i32> undef, <4 x i32> ; CHECK: shufflevector <12 x i32> %wide.vec, <12 x i32> undef, <4 x i32> ; CHECK: shufflevector <12 x i32> %wide.vec, <12 x i32> undef, <4 x i32> ; CHECK: add nsw <4 x i32> {{.*}}, ; CHECK: add nsw <4 x i32> {{.*}}, ; CHECK: add nsw <4 x i32> {{.*}}, ; CHECK: shufflevector <4 x i32> {{.*}}, <8 x i32> ; CHECK: shufflevector <4 x i32> {{.*}}, <4 x i32> undef, <8 x i32> ; CHECK: %interleaved.vec = shufflevector <8 x i32> {{.*}}, <12 x i32> ; CHECK: store <12 x i32> %interleaved.vec, <12 x i32>* {{.*}}, align 4 %struct.ST3 = type { i32, i32, i32 } @A = common global [3072 x i32] zeroinitializer, align 4 @S = common global [1024 x %struct.ST3] zeroinitializer, align 4 define void @test_struct_array_load3_store3() { entry: br label %for.body for.body: ; preds = %for.body, %entry %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %ptr.016 = phi i32* [ getelementptr inbounds ([3072 x i32], [3072 x i32]* @A, i64 0, i64 0), %entry ], [ %incdec.ptr2, %for.body ] %incdec.ptr = getelementptr inbounds i32, i32* %ptr.016, i64 1 %tmp = load i32, i32* %ptr.016, align 4 %incdec.ptr1 = getelementptr inbounds i32, i32* %ptr.016, i64 2 %tmp1 = load i32, i32* %incdec.ptr, align 4 %incdec.ptr2 = getelementptr inbounds i32, i32* %ptr.016, i64 3 %tmp2 = load i32, i32* %incdec.ptr1, align 4 %add = add nsw i32 %tmp, 1 %x = getelementptr inbounds [1024 x %struct.ST3], [1024 x %struct.ST3]* 
@S, i64 0, i64 %indvars.iv, i32 0 store i32 %add, i32* %x, align 4 %add3 = add nsw i32 %tmp1, 2 %y = getelementptr inbounds [1024 x %struct.ST3], [1024 x %struct.ST3]* @S, i64 0, i64 %indvars.iv, i32 1 store i32 %add3, i32* %y, align 4 %add6 = add nsw i32 %tmp2, 3 %z = getelementptr inbounds [1024 x %struct.ST3], [1024 x %struct.ST3]* @S, i64 0, i64 %indvars.iv, i32 2 store i32 %add6, i32* %z, align 4 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %exitcond = icmp eq i64 %indvars.iv.next, 1024 br i1 %exitcond, label %for.end, label %for.body for.end: ; preds = %for.body ret void } ; Check vectorization on an interleaved load group of factor 4. ; struct ST4{ ; int x; ; int y; ; int z; ; int w; ; }; ; int test_struct_load4(struct ST4 *S) { ; int r = 0; ; for (int i = 0; i < 1024; i++) { ; r += S[i].x; ; r -= S[i].y; ; r += S[i].z; ; r -= S[i].w; ; } ; return r; ; } ; CHECK-LABEL: @test_struct_load4( ; CHECK: %wide.vec = load <16 x i32>, <16 x i32>* {{.*}}, align 4 ; CHECK: shufflevector <16 x i32> %wide.vec, <16 x i32> undef, <4 x i32> ; CHECK: shufflevector <16 x i32> %wide.vec, <16 x i32> undef, <4 x i32> ; CHECK: shufflevector <16 x i32> %wide.vec, <16 x i32> undef, <4 x i32> ; CHECK: shufflevector <16 x i32> %wide.vec, <16 x i32> undef, <4 x i32> ; CHECK: add nsw <4 x i32> ; CHECK: sub <4 x i32> ; CHECK: add nsw <4 x i32> ; CHECK: sub <4 x i32> %struct.ST4 = type { i32, i32, i32, i32 } define i32 @test_struct_load4(%struct.ST4* nocapture readonly %S) { entry: br label %for.body for.body: ; preds = %for.body, %entry %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %r.022 = phi i32 [ 0, %entry ], [ %sub8, %for.body ] %x = getelementptr inbounds %struct.ST4, %struct.ST4* %S, i64 %indvars.iv, i32 0 %tmp = load i32, i32* %x, align 4 %add = add nsw i32 %tmp, %r.022 %y = getelementptr inbounds %struct.ST4, %struct.ST4* %S, i64 %indvars.iv, i32 1 %tmp1 = load i32, i32* %y, align 4 %sub = sub i32 %add, %tmp1 %z = getelementptr inbounds %struct.ST4, %struct.ST4* %S, i64 %indvars.iv, i32 2 %tmp2 = load i32, i32* %z, align 4 %add5 = add nsw i32 %sub, %tmp2 %w = getelementptr inbounds %struct.ST4, %struct.ST4* %S, i64 %indvars.iv, i32 3 %tmp3 = load i32, i32* %w, align 4 %sub8 = sub i32 %add5, %tmp3 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %exitcond = icmp eq i64 %indvars.iv.next, 1024 br i1 %exitcond, label %for.end, label %for.body for.end: ; preds = %for.body ret i32 %sub8 } ; Check vectorization on an interleaved store group of factor 4. 
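; Here the four per-member <4 x i32> value vectors are concatenated in pairs
; into <8 x i32> vectors and then shuffled into a single <16 x i32>
; %interleaved.vec with layout x0,y0,z0,w0,x1,... before one wide store, as the
; CHECK lines below expect.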
; void test_struct_store4(int *A, struct ST4 *B) { ; int *ptr = A; ; for (int i = 0; i < 1024; i++) { ; int X = *ptr++; ; B[i].x = X + 1; ; B[i].y = X * 2; ; B[i].z = X + 3; ; B[i].w = X + 4; ; } ; } ; CHECK-LABEL: @test_struct_store4( ; CHECK: %[[LD:.*]] = load <4 x i32>, <4 x i32>* ; CHECK: add nsw <4 x i32> %[[LD]], ; CHECK: shl nsw <4 x i32> %[[LD]], ; CHECK: add nsw <4 x i32> %[[LD]], ; CHECK: add nsw <4 x i32> %[[LD]], ; CHECK: shufflevector <4 x i32> {{.*}}, <8 x i32> ; CHECK: shufflevector <4 x i32> {{.*}}, <8 x i32> ; CHECK: %interleaved.vec = shufflevector <8 x i32> {{.*}}, <16 x i32> ; CHECK: store <16 x i32> %interleaved.vec, <16 x i32>* {{.*}}, align 4 define void @test_struct_store4(i32* noalias nocapture readonly %A, %struct.ST4* noalias nocapture %B) { entry: br label %for.body for.cond.cleanup: ; preds = %for.body ret void for.body: ; preds = %for.body, %entry %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %ptr.024 = phi i32* [ %A, %entry ], [ %incdec.ptr, %for.body ] %incdec.ptr = getelementptr inbounds i32, i32* %ptr.024, i64 1 %tmp = load i32, i32* %ptr.024, align 4 %add = add nsw i32 %tmp, 1 %x = getelementptr inbounds %struct.ST4, %struct.ST4* %B, i64 %indvars.iv, i32 0 store i32 %add, i32* %x, align 4 %mul = shl nsw i32 %tmp, 1 %y = getelementptr inbounds %struct.ST4, %struct.ST4* %B, i64 %indvars.iv, i32 1 store i32 %mul, i32* %y, align 4 %add3 = add nsw i32 %tmp, 3 %z = getelementptr inbounds %struct.ST4, %struct.ST4* %B, i64 %indvars.iv, i32 2 store i32 %add3, i32* %z, align 4 %add6 = add nsw i32 %tmp, 4 %w = getelementptr inbounds %struct.ST4, %struct.ST4* %B, i64 %indvars.iv, i32 3 store i32 %add6, i32* %w, align 4 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %exitcond = icmp eq i64 %indvars.iv.next, 1024 br i1 %exitcond, label %for.cond.cleanup, label %for.body } ; Check vectorization on a reverse interleaved load group of factor 2 and ; a reverse interleaved store group of factor 2. 
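; Because the induction variable counts down, each member vector is reversed
; with an extra shufflevector right after de-interleaving, and reversed again
; before re-interleaving for the store; that is what the additional
; shufflevector CHECK lines in this test match.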
;  struct ST2 {
;    int x;
;    int y;
;  };
;
;  void test_reversed_load2_store2(struct ST2 *A, struct ST2 *B) {
;    for (int i = 1023; i >= 0; i--) {
;      int a = A[i].x + i;  // interleaved load of index 0
;      int b = A[i].y - i;  // interleaved load of index 1
;      B[i].x = a;  // interleaved store of index 0
;      B[i].y = b;  // interleaved store of index 1
;    }
;  }

; CHECK-LABEL: @test_reversed_load2_store2(
; CHECK: %wide.vec = load <8 x i32>, <8 x i32>* {{.*}}, align 4
; CHECK: shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32>
; CHECK: shufflevector <4 x i32> {{.*}}, <4 x i32>
; CHECK: shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32>
; CHECK: shufflevector <4 x i32> {{.*}}, <4 x i32>
; CHECK: add nsw <4 x i32>
; CHECK: sub nsw <4 x i32>
; CHECK: shufflevector <4 x i32> {{.*}}, <4 x i32>
; CHECK: shufflevector <4 x i32> {{.*}}, <4 x i32>
; CHECK: %interleaved.vec = shufflevector <4 x i32> {{.*}}, <8 x i32>
; CHECK: store <8 x i32> %interleaved.vec, <8 x i32>* %{{.*}}, align 4

%struct.ST2 = type { i32, i32 }

define void @test_reversed_load2_store2(%struct.ST2* noalias nocapture readonly %A, %struct.ST2* noalias nocapture %B) {
entry:
  br label %for.body

for.cond.cleanup:                                 ; preds = %for.body
  ret void

for.body:                                         ; preds = %for.body, %entry
  %indvars.iv = phi i64 [ 1023, %entry ], [ %indvars.iv.next, %for.body ]
  %x = getelementptr inbounds %struct.ST2, %struct.ST2* %A, i64 %indvars.iv, i32 0
  %tmp = load i32, i32* %x, align 4
  %tmp1 = trunc i64 %indvars.iv to i32
  %add = add nsw i32 %tmp, %tmp1
  %y = getelementptr inbounds %struct.ST2, %struct.ST2* %A, i64 %indvars.iv, i32 1
  %tmp2 = load i32, i32* %y, align 4
  %sub = sub nsw i32 %tmp2, %tmp1
  %x5 = getelementptr inbounds %struct.ST2, %struct.ST2* %B, i64 %indvars.iv, i32 0
  store i32 %add, i32* %x5, align 4
  %y8 = getelementptr inbounds %struct.ST2, %struct.ST2* %B, i64 %indvars.iv, i32 1
  store i32 %sub, i32* %y8, align 4
  %indvars.iv.next = add nsw i64 %indvars.iv, -1
  %cmp = icmp sgt i64 %indvars.iv, 0
  br i1 %cmp, label %for.body, label %for.cond.cleanup
}

; Check vectorization on an interleaved load group of factor 2 with 1 gap
; (missing the load of odd elements).

;  void even_load(int *A, int *B) {
;    for (unsigned i = 0; i < 1024; i+=2)
;      B[i/2] = A[i] * 2;
;  }

; CHECK-LABEL: @even_load(
-; CHECK: %wide.vec = load <8 x i32>, <8 x i32>* %{{.*}}, align 4
-; CHECK: %strided.vec = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32>
-; CHECK-NOT: shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32>
-; CHECK: shl nsw <4 x i32> %strided.vec,
+; CHECK-NOT: %wide.vec = load <8 x i32>, <8 x i32>* %{{.*}}, align 4
+; CHECK-NOT: %strided.vec = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32>
define void @even_load(i32* noalias nocapture readonly %A, i32* noalias nocapture %B) {
entry:
  br label %for.body

for.cond.cleanup:                                 ; preds = %for.body
  ret void

for.body:                                         ; preds = %for.body, %entry
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
  %tmp = load i32, i32* %arrayidx, align 4
  %mul = shl nsw i32 %tmp, 1
  %tmp1 = lshr exact i64 %indvars.iv, 1
  %arrayidx2 = getelementptr inbounds i32, i32* %B, i64 %tmp1
  store i32 %mul, i32* %arrayidx2, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
  %cmp = icmp ult i64 %indvars.iv.next, 1024
  br i1 %cmp, label %for.body, label %for.cond.cleanup
}

; Check vectorization on interleaved access groups identified from mixed
; loads/stores.
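; Loads and stores that hit the same strided locations are collected into one
; load group and one store group, so the loop below should still produce a
; single wide load and a single wide %interleaved.vec store per vector
; iteration, as the CHECK lines verify.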
; void mixed_load2_store2(int *A, int *B) { ; for (unsigned i = 0; i < 1024; i+=2) { ; B[i] = A[i] * A[i+1]; ; B[i+1] = A[i] + A[i+1]; ; } ; } ; CHECK-LABEL: @mixed_load2_store2( ; CHECK: %wide.vec = load <8 x i32>, <8 x i32>* {{.*}}, align 4 ; CHECK: shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> ; CHECK: shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> ; CHECK: %interleaved.vec = shufflevector <4 x i32> %{{.*}}, <8 x i32> ; CHECK: store <8 x i32> %interleaved.vec define void @mixed_load2_store2(i32* noalias nocapture readonly %A, i32* noalias nocapture %B) { entry: br label %for.body for.cond.cleanup: ; preds = %for.body ret void for.body: ; preds = %for.body, %entry %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %arrayidx = getelementptr inbounds i32, i32* %A, i64 %indvars.iv %tmp = load i32, i32* %arrayidx, align 4 %tmp1 = or i64 %indvars.iv, 1 %arrayidx2 = getelementptr inbounds i32, i32* %A, i64 %tmp1 %tmp2 = load i32, i32* %arrayidx2, align 4 %mul = mul nsw i32 %tmp2, %tmp %arrayidx4 = getelementptr inbounds i32, i32* %B, i64 %indvars.iv store i32 %mul, i32* %arrayidx4, align 4 %tmp3 = load i32, i32* %arrayidx, align 4 %tmp4 = load i32, i32* %arrayidx2, align 4 %add10 = add nsw i32 %tmp4, %tmp3 %arrayidx13 = getelementptr inbounds i32, i32* %B, i64 %tmp1 store i32 %add10, i32* %arrayidx13, align 4 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 2 %cmp = icmp ult i64 %indvars.iv.next, 1024 br i1 %cmp, label %for.body, label %for.cond.cleanup } ; Check vectorization on interleaved access groups identified from mixed ; loads/stores. ; void mixed_load3_store3(int *A) { ; for (unsigned i = 0; i < 1024; i++) { ; *A++ += i; ; *A++ += i; ; *A++ += i; ; } ; } ; CHECK-LABEL: @mixed_load3_store3( ; CHECK: %wide.vec = load <12 x i32>, <12 x i32>* {{.*}}, align 4 ; CHECK: shufflevector <12 x i32> %wide.vec, <12 x i32> undef, <4 x i32> ; CHECK: shufflevector <12 x i32> %wide.vec, <12 x i32> undef, <4 x i32> ; CHECK: shufflevector <12 x i32> %wide.vec, <12 x i32> undef, <4 x i32> ; CHECK: %interleaved.vec = shufflevector <8 x i32> %{{.*}}, <12 x i32> ; CHECK: store <12 x i32> %interleaved.vec, <12 x i32>* %{{.*}}, align 4 define void @mixed_load3_store3(i32* nocapture %A) { entry: br label %for.body for.cond.cleanup: ; preds = %for.body ret void for.body: ; preds = %for.body, %entry %i.013 = phi i32 [ 0, %entry ], [ %inc, %for.body ] %A.addr.012 = phi i32* [ %A, %entry ], [ %incdec.ptr3, %for.body ] %incdec.ptr = getelementptr inbounds i32, i32* %A.addr.012, i64 1 %tmp = load i32, i32* %A.addr.012, align 4 %add = add i32 %tmp, %i.013 store i32 %add, i32* %A.addr.012, align 4 %incdec.ptr1 = getelementptr inbounds i32, i32* %A.addr.012, i64 2 %tmp1 = load i32, i32* %incdec.ptr, align 4 %add2 = add i32 %tmp1, %i.013 store i32 %add2, i32* %incdec.ptr, align 4 %incdec.ptr3 = getelementptr inbounds i32, i32* %A.addr.012, i64 3 %tmp2 = load i32, i32* %incdec.ptr1, align 4 %add4 = add i32 %tmp2, %i.013 store i32 %add4, i32* %incdec.ptr1, align 4 %inc = add nuw nsw i32 %i.013, 1 %exitcond = icmp eq i32 %inc, 1024 br i1 %exitcond, label %for.cond.cleanup, label %for.body } ; Check vectorization on interleaved access groups with members having different ; kinds of type. 
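; A group may mix member types of the same size: both members are loaded as one
; wide <8 x i32>, and the float member's lanes are then bitcast to <4 x float>
; before the fadd, exactly as the CHECK lines below spell out.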
; struct IntFloat { ; int a; ; float b; ; }; ; ; int SA; ; float SB; ; ; void int_float_struct(struct IntFloat *A) { ; int SumA; ; float SumB; ; for (unsigned i = 0; i < 1024; i++) { ; SumA += A[i].a; ; SumB += A[i].b; ; } ; SA = SumA; ; SB = SumB; ; } ; CHECK-LABEL: @int_float_struct( ; CHECK: %wide.vec = load <8 x i32>, <8 x i32>* %{{.*}}, align 4 ; CHECK: %[[V0:.*]] = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> ; CHECK: %[[V1:.*]] = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> ; CHECK: bitcast <4 x i32> %[[V1]] to <4 x float> ; CHECK: add nsw <4 x i32> ; CHECK: fadd fast <4 x float> %struct.IntFloat = type { i32, float } @SA = common global i32 0, align 4 @SB = common global float 0.000000e+00, align 4 define void @int_float_struct(%struct.IntFloat* nocapture readonly %A) #0 { entry: br label %for.body for.cond.cleanup: ; preds = %for.body store i32 %add, i32* @SA, align 4 store float %add3, float* @SB, align 4 ret void for.body: ; preds = %for.body, %entry %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %SumB.014 = phi float [ undef, %entry ], [ %add3, %for.body ] %SumA.013 = phi i32 [ undef, %entry ], [ %add, %for.body ] %a = getelementptr inbounds %struct.IntFloat, %struct.IntFloat* %A, i64 %indvars.iv, i32 0 %tmp = load i32, i32* %a, align 4 %add = add nsw i32 %tmp, %SumA.013 %b = getelementptr inbounds %struct.IntFloat, %struct.IntFloat* %A, i64 %indvars.iv, i32 1 %tmp1 = load float, float* %b, align 4 %add3 = fadd fast float %SumB.014, %tmp1 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %exitcond = icmp eq i64 %indvars.iv.next, 1024 br i1 %exitcond, label %for.cond.cleanup, label %for.body } attributes #0 = { "unsafe-fp-math"="true" } Index: vendor/llvm/dist/test/Transforms/PruneEH/pr26263.ll =================================================================== --- vendor/llvm/dist/test/Transforms/PruneEH/pr26263.ll (nonexistent) +++ vendor/llvm/dist/test/Transforms/PruneEH/pr26263.ll (revision 295846) @@ -0,0 +1,56 @@ +; RUN: opt -prune-eh -S < %s | FileCheck %s +target datalayout = "e-m:x-p:32:32-i64:64-f80:32-n8:16:32-a:0:32-S32" +target triple = "i386-pc-windows-msvc" + +declare void @neverthrows() nounwind + +define void @test1() personality i32 (...)* @__CxxFrameHandler3 { + invoke void @neverthrows() + to label %try.cont unwind label %cleanuppad + +try.cont: + ret void + +cleanuppad: + %cp = cleanuppad within none [] + br label %cleanupret + +cleanupret: + cleanupret from %cp unwind to caller +} + +; CHECK-LABEL: define void @test1( +; CHECK: call void @neverthrows() + +; CHECK: %[[cp:.*]] = cleanuppad within none [] +; CHECK-NEXT: unreachable + +; CHECK: cleanupret from %[[cp]] unwind to caller + +define void @test2() personality i32 (...)* @__CxxFrameHandler3 { + invoke void @neverthrows() + to label %try.cont unwind label %catchswitch + +try.cont: + ret void + +catchswitch: + %cs = catchswitch within none [label %catchpad] unwind to caller + +catchpad: + %cp = catchpad within %cs [] + unreachable + +ret: + ret void +} + +; CHECK-LABEL: define void @test2( +; CHECK: call void @neverthrows() + +; CHECK: %[[cs:.*]] = catchswitch within none [label + +; CHECK: catchpad within %[[cs]] [] +; CHECK-NEXT: unreachable + +declare i32 @__CxxFrameHandler3(...)