Index: vendor/llvm-project/release-11.x/lld/docs/ReleaseNotes.rst =================================================================== --- vendor/llvm-project/release-11.x/lld/docs/ReleaseNotes.rst (revision 366332) +++ vendor/llvm-project/release-11.x/lld/docs/ReleaseNotes.rst (revision 366333) @@ -1,187 +1,173 @@ ======================== lld 11.0.0 Release Notes ======================== .. contents:: :local: -.. warning:: - These are in-progress notes for the upcoming LLVM 11.0.0 release. - Release notes for previous releases can be found on - `the Download Page `_. - Introduction ============ This document contains the release notes for the lld linker, release 11.0.0. Here we describe the status of lld, including major improvements from the previous release. All lld releases may be downloaded from the `LLVM releases web site `_. Non-comprehensive list of changes in this release ================================================= ELF Improvements ---------------- * ``--lto-emit-asm`` is added to emit assembly output for debugging purposes. (`D77231 `_) * ``--lto-whole-program-visibility`` is added to specify that classes have hidden LTO visibility in LTO and ThinLTO links of source files compiled with ``-fwhole-program-vtables``. See `LTOVisibility `_ for details. (`D71913 `_) * ``--print-archive-stats=`` is added to print the number of members and the number of fetched members for each archive. The feature is similar to GNU gold's ``--print-symbol-counts=``. (`D78983 `_) * ``--shuffle-sections=`` is added to introduce randomization in the output to help reduce measurement bias and detect static initialization order fiasco. (`D74791 `_) (`D74887 `_) * ``--time-trace`` is added. It records a time trace file that can be viewed in chrome://tracing. The file can be specified with ``--time-trace-file``. Trace granularity can be specified with ``--time-trace-granularity``. (`D71060 `_) * ``--thinlto-single-module`` is added to compile a subset of modules in ThinLTO for debugging purposes. (`D80406 `_) * ``--unique`` is added to create separate output sections for orphan sections. (`D75536 `_) * ``--warn-backrefs`` has been improved to emulate GNU ld's archive semantics. If a link passes with warnings from ``--warn-backrefs``, it almost assuredly means that the link will fail with GNU ld, or the symbol will get different resolutions in GNU ld and LLD. ``--warn-backrefs-exclude=`` is added to exclude known issues. (`D77522 `_) (`D77630 `_) (`D77512 `_) * ``--no-relax`` is accepted but ignored. The Linux kernel's RISC-V port uses this option. (`D81359 `_) * ``--rosegment`` (default) is added to complement ``--no-rosegment``. GNU gold from 2.35 onwards support both options. * ``--threads=N`` is added. The default uses all threads. (`D76885 `_) * ``--wrap`` has better compatibility with GNU ld. * ``-z dead-reloc-in-nonalloc==`` is added to resolve an absolute relocation referencing a discarded symbol. (`D83264 `_) * Changed tombstone values to (``.debug_ranges``/``.debug_loc``) 1 and (other ``.debug_*``) 0. A tombstone value is the computed value of a relocation referencing a discarded symbol (``--gc-sections``, ICF or ``/DISCARD/``). (`D84825 `_) In the future many .debug_* may switch to 0xffffffff/0xffffffffffffffff as the tombstone value. * ``-z keep-text-section-prefix`` moves ``.text.unknown.*`` input sections to ``.text.unknown``. * ``-z rel`` and ``-z rela`` are added to select the REL/RELA format for dynamic relocations. 
The default is target specific and typically matches the form used in relocatable objects.
* ``-z start-stop-visibility={default,protected,internal,hidden}`` is added. GNU ld/gold from 2.35 onwards support this option. (`D55682 `_)
* When ``-r`` or ``--emit-relocs`` is specified, the GNU ld compatible ``--discard-all`` and ``--discard-locals`` semantics are implemented. (`D77807 `_)
* ``--emit-relocs --strip-debug`` can now be used together. (`D74375 `_)
* ``--gdb-index`` supports DWARF v5. (`D79061 `_) (`D85579 `_)
* ``-r`` allows SHT_X86_64_UNWIND to be merged into SHT_PROGBITS. This allows clang/GCC produced object files to be mixed together. (`D85785 `_)
* Better linker script support related to output section alignments and LMA regions. (`D74286 `_) (`D74297 `_) (`D75724 `_) (`D81986 `_)
* In an input section description, the filename can be specified in double quotes. ``archive:file`` syntax is added. (`D72517 `_) (`D75100 `_)
* Empty ``(.init|.preinit|.fini)_array`` sections specified in a linker script are allowed with RELRO. (`D76915 `_)
* ``INSERT AFTER`` and ``INSERT BEFORE`` now work for orphan sections. (`D74375 `_)
* ``INPUT_SECTION_FLAGS`` is supported in linker scripts. (`D72745 `_)
* ``DF_1_PIE`` is set for position-independent executables. (`D80872 `_)
* For a symbol assignment ``alias = aliasee;``, ``alias`` inherits ``aliasee``'s symbol type. (`D86263 `_)
* ``SHT_GNU_verneed`` sections in shared objects are parsed, and versioned undefined symbols in shared objects are respected. (`D80059 `_)
* SHF_LINK_ORDER and non-SHF_LINK_ORDER sections can be mixed as long as the SHF_LINK_ORDER components are contiguous. (`D77007 `_)
* An out-of-range relocation diagnostic now mentions the referenced symbol. (`D73518 `_)
* AArch64: ``R_AARCH64_PLT32`` is supported. (`D81184 `_)
* ARM: SBREL type relocations are supported. (`D74375 `_)
* ARM: ``R_ARM_ALU_PC_G0``, ``R_ARM_LDR_PC_G0``, ``R_ARM_THUMB_PC8`` and ``R_ARM_THUMB_PC12`` are supported. (`D75349 `_) (`D77200 `_)
* ARM: various improvements to .ARM.exidx: ``/DISCARD/`` support for a subset, out-of-range handling, support for non-monotonic section order. (`PR44824 `_)
* AVR: many relocation types are supported. (`D78741 `_)
* Hexagon: General Dynamic and some other relocation types are supported.
* PPC: Canonical PLT and range extension thunks with addends are supported. (`D73399 `_) (`D73424 `_) (`D75394 `_)
* PPC and PPC64: copy relocations are supported. (`D73255 `_)
* PPC64: ``_savegpr[01]_{14..31}`` and ``_restgpr[01]_{14..31}`` can be synthesized. (`D79977 `_)
* PPC64: ``R_PPC64_GOT_PCREL34`` and ``R_PPC64_REL24_NOTOC`` are supported. The r2 save stub is supported. (`D81948 `_) (`D82950 `_) (`D82816 `_)
* RISC-V: ``R_RISCV_IRELATIVE`` is supported. (`D74022 `_)
* RISC-V: ``R_RISCV_ALIGN`` is rejected with an error because GNU ld-style linker relaxation is not supported. (`D71820 `_)
* SPARCv9: more relocation types are supported. (`D77672 `_)

Breaking changes
----------------

* One-dash forms of some long options (``--thinlto-*``, ``--lto-*``, ``--shuffle-sections=``) are no longer supported. (`D79371 `_)
* ``--export-dynamic-symbol`` no longer implies ``-u``. The new behavior matches GNU ld from binutils 2.35 onwards. (`D80487 `_)
* ARM: the default max page size was increased from 4096 to 65536. This increases compatibility with systems where a non-standard page size was configured, and is also in line with GNU ld defaults. (`D77330 `_)
* ARM: for non-STT_FUNC symbols, Thumb interworking thunks are not added and BL/BLX are not substituted.
(`D73474 `_) (`D73542 `_) * AArch64: ``--force-bti`` is renamed to ``-z force-bti`. ``--pac-plt`` is renamed to ``-z pac-plt``. This change is compatibile with GNU ld. * A readonly ``PT_LOAD`` is created in the presence of a ``SECTIONS`` command. The new behavior is consistent with the longstanding behavior in the absence of a SECTIONS command. * Orphan section names like ``.rodata.foo`` and ``.text.foo`` are not grouped into ``.rodata`` and ``.text`` in the presence of a ``SECTIONS`` command. The new behavior matches GNU ld. (`D75225 `_) * ``--no-threads`` is removed. Use ``--threads=1`` instead. ``--threads`` (no-op) is removed. COFF Improvements ----------------- * Fixed exporting symbols whose names contain a period (``.``), which was a regression in lld 7. MinGW Improvements ------------------ * Implemented new options for disabling auto import and runtime pseudo relocations (``--disable-auto-import`` and ``--disable-runtime-pseudo-reloc``), the ``--no-seh`` flag and options for selecting file and section alignment (``--file-alignment`` and ``--section-alignment``). - -MachO Improvements ------------------- - -* Item 1. - -WebAssembly Improvements ------------------------- - Index: vendor/llvm-project/release-11.x/llvm/include/llvm/ADT/SmallPtrSet.h =================================================================== --- vendor/llvm-project/release-11.x/llvm/include/llvm/ADT/SmallPtrSet.h (revision 366332) +++ vendor/llvm-project/release-11.x/llvm/include/llvm/ADT/SmallPtrSet.h (revision 366333) @@ -1,507 +1,510 @@ //===- llvm/ADT/SmallPtrSet.h - 'Normally small' pointer set ----*- C++ -*-===// // // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // See https://llvm.org/LICENSE.txt for license information. // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // //===----------------------------------------------------------------------===// // // This file defines the SmallPtrSet class. See the doxygen comment for // SmallPtrSetImplBase for more details on the algorithm used. // //===----------------------------------------------------------------------===// #ifndef LLVM_ADT_SMALLPTRSET_H #define LLVM_ADT_SMALLPTRSET_H #include "llvm/ADT/EpochTracker.h" #include "llvm/Support/Compiler.h" #include "llvm/Support/ReverseIteration.h" #include "llvm/Support/type_traits.h" #include #include #include #include #include #include #include namespace llvm { /// SmallPtrSetImplBase - This is the common code shared among all the /// SmallPtrSet<>'s, which is almost everything. SmallPtrSet has two modes, one /// for small and one for large sets. /// /// Small sets use an array of pointers allocated in the SmallPtrSet object, /// which is treated as a simple array of pointers. When a pointer is added to /// the set, the array is scanned to see if the element already exists, if not /// the element is 'pushed back' onto the array. If we run out of space in the /// array, we grow into the 'large set' case. SmallSet should be used when the /// sets are often small. In this case, no memory allocation is used, and only /// light-weight and cache-efficient scanning is used. /// /// Large sets use a classic exponentially-probed hash table. Empty buckets are /// represented with an illegal pointer value (-1) to allow null pointers to be /// inserted. Tombstones are represented with another illegal pointer value /// (-2), to allow deletion. The hash table is resized when the table is 3/4 or /// more. When this happens, the table is doubled in size. 
/// class SmallPtrSetImplBase : public DebugEpochBase { friend class SmallPtrSetIteratorImpl; protected: /// SmallArray - Points to a fixed size set of buckets, used in 'small mode'. const void **SmallArray; /// CurArray - This is the current set of buckets. If equal to SmallArray, /// then the set is in 'small mode'. const void **CurArray; /// CurArraySize - The allocated size of CurArray, always a power of two. unsigned CurArraySize; /// Number of elements in CurArray that contain a value or are a tombstone. /// If small, all these elements are at the beginning of CurArray and the rest /// is uninitialized. unsigned NumNonEmpty; /// Number of tombstones in CurArray. unsigned NumTombstones; // Helpers to copy and move construct a SmallPtrSet. SmallPtrSetImplBase(const void **SmallStorage, const SmallPtrSetImplBase &that); SmallPtrSetImplBase(const void **SmallStorage, unsigned SmallSize, SmallPtrSetImplBase &&that); explicit SmallPtrSetImplBase(const void **SmallStorage, unsigned SmallSize) : SmallArray(SmallStorage), CurArray(SmallStorage), CurArraySize(SmallSize), NumNonEmpty(0), NumTombstones(0) { assert(SmallSize && (SmallSize & (SmallSize-1)) == 0 && "Initial size must be a power of two!"); } ~SmallPtrSetImplBase() { if (!isSmall()) free(CurArray); } public: using size_type = unsigned; SmallPtrSetImplBase &operator=(const SmallPtrSetImplBase &) = delete; LLVM_NODISCARD bool empty() const { return size() == 0; } size_type size() const { return NumNonEmpty - NumTombstones; } void clear() { incrementEpoch(); // If the capacity of the array is huge, and the # elements used is small, // shrink the array. if (!isSmall()) { if (size() * 4 < CurArraySize && CurArraySize > 32) return shrink_and_clear(); // Fill the array with empty markers. memset(CurArray, -1, CurArraySize * sizeof(void *)); } NumNonEmpty = 0; NumTombstones = 0; } protected: static void *getTombstoneMarker() { return reinterpret_cast(-2); } static void *getEmptyMarker() { // Note that -1 is chosen to make clear() efficiently implementable with // memset and because it's not a valid pointer value. return reinterpret_cast(-1); } const void **EndPointer() const { return isSmall() ? CurArray + NumNonEmpty : CurArray + CurArraySize; } /// insert_imp - This returns true if the pointer was new to the set, false if /// it was already in the set. This is hidden from the client so that the /// derived class can check that the right type of pointer is passed in. std::pair insert_imp(const void *Ptr) { if (isSmall()) { // Check to see if it is already in the set. const void **LastTombstone = nullptr; for (const void **APtr = SmallArray, **E = SmallArray + NumNonEmpty; APtr != E; ++APtr) { const void *Value = *APtr; if (Value == Ptr) return std::make_pair(APtr, false); if (Value == getTombstoneMarker()) LastTombstone = APtr; } // Did we find any tombstone marker? if (LastTombstone != nullptr) { *LastTombstone = Ptr; --NumTombstones; incrementEpoch(); return std::make_pair(LastTombstone, true); } // Nope, there isn't. If we stay small, just 'pushback' now. if (NumNonEmpty < CurArraySize) { SmallArray[NumNonEmpty++] = Ptr; incrementEpoch(); return std::make_pair(SmallArray + (NumNonEmpty - 1), true); } // Otherwise, hit the big set case, which will call grow. } return insert_imp_big(Ptr); } /// erase_imp - If the set contains the specified pointer, remove it and /// return true, otherwise return false. This is hidden from the client so /// that the derived class can check that the right type of pointer is passed /// in. 
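// --- Illustrative usage sketch, not part of this header. It demonstrates the
// insert/erase semantics documented above: insert() returns {iterator, bool}
// and deduplicates, while erase() reports whether the pointer was present.
// Standalone example; assumes the LLVM headers are on the include path.
#include "llvm/ADT/SmallPtrSet.h"
#include <cassert>

static void smallPtrSetInsertEraseSketch() {
  int A = 0, B = 0;
  llvm::SmallPtrSet<int *, 4> Set;   // stays in "small mode" until it outgrows 4 slots
  auto R1 = Set.insert(&A);
  assert(R1.second && "first insertion reports true");
  auto R2 = Set.insert(&A);
  assert(!R2.second && "duplicate insertion reports false");
  Set.insert(&B);
  assert(Set.size() == 2);
  assert(Set.erase(&A) && "erase returns true when the pointer was present");
  assert(!Set.erase(&A) && "and false when it was not");
}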
bool erase_imp(const void * Ptr) { const void *const *P = find_imp(Ptr); if (P == EndPointer()) return false; const void **Loc = const_cast(P); assert(*Loc == Ptr && "broken find!"); *Loc = getTombstoneMarker(); NumTombstones++; return true; } /// Returns the raw pointer needed to construct an iterator. If element not /// found, this will be EndPointer. Otherwise, it will be a pointer to the /// slot which stores Ptr; const void *const * find_imp(const void * Ptr) const { if (isSmall()) { // Linear search for the item. for (const void *const *APtr = SmallArray, *const *E = SmallArray + NumNonEmpty; APtr != E; ++APtr) if (*APtr == Ptr) return APtr; return EndPointer(); } // Big set case. auto *Bucket = FindBucketFor(Ptr); if (*Bucket == Ptr) return Bucket; return EndPointer(); } private: bool isSmall() const { return CurArray == SmallArray; } std::pair insert_imp_big(const void *Ptr); const void * const *FindBucketFor(const void *Ptr) const; void shrink_and_clear(); /// Grow - Allocate a larger backing store for the buckets and move it over. void Grow(unsigned NewSize); protected: /// swap - Swaps the elements of two sets. /// Note: This method assumes that both sets have the same small size. void swap(SmallPtrSetImplBase &RHS); void CopyFrom(const SmallPtrSetImplBase &RHS); void MoveFrom(unsigned SmallSize, SmallPtrSetImplBase &&RHS); private: /// Code shared by MoveFrom() and move constructor. void MoveHelper(unsigned SmallSize, SmallPtrSetImplBase &&RHS); /// Code shared by CopyFrom() and copy constructor. void CopyHelper(const SmallPtrSetImplBase &RHS); }; /// SmallPtrSetIteratorImpl - This is the common base class shared between all /// instances of SmallPtrSetIterator. class SmallPtrSetIteratorImpl { protected: const void *const *Bucket; const void *const *End; public: explicit SmallPtrSetIteratorImpl(const void *const *BP, const void*const *E) : Bucket(BP), End(E) { if (shouldReverseIterate()) { RetreatIfNotValid(); return; } AdvanceIfNotValid(); } bool operator==(const SmallPtrSetIteratorImpl &RHS) const { return Bucket == RHS.Bucket; } bool operator!=(const SmallPtrSetIteratorImpl &RHS) const { return Bucket != RHS.Bucket; } protected: /// AdvanceIfNotValid - If the current bucket isn't valid, advance to a bucket /// that is. This is guaranteed to stop because the end() bucket is marked /// valid. void AdvanceIfNotValid() { assert(Bucket <= End); while (Bucket != End && (*Bucket == SmallPtrSetImplBase::getEmptyMarker() || *Bucket == SmallPtrSetImplBase::getTombstoneMarker())) ++Bucket; } void RetreatIfNotValid() { assert(Bucket >= End); while (Bucket != End && (Bucket[-1] == SmallPtrSetImplBase::getEmptyMarker() || Bucket[-1] == SmallPtrSetImplBase::getTombstoneMarker())) { --Bucket; } } }; /// SmallPtrSetIterator - This implements a const_iterator for SmallPtrSet. template class SmallPtrSetIterator : public SmallPtrSetIteratorImpl, DebugEpochBase::HandleBase { using PtrTraits = PointerLikeTypeTraits; public: using value_type = PtrTy; using reference = PtrTy; using pointer = PtrTy; using difference_type = std::ptrdiff_t; using iterator_category = std::forward_iterator_tag; explicit SmallPtrSetIterator(const void *const *BP, const void *const *E, const DebugEpochBase &Epoch) : SmallPtrSetIteratorImpl(BP, E), DebugEpochBase::HandleBase(&Epoch) {} // Most methods are provided by the base class. 
const PtrTy operator*() const { assert(isHandleInSync() && "invalid iterator access!"); if (shouldReverseIterate()) { assert(Bucket > End); return PtrTraits::getFromVoidPointer(const_cast(Bucket[-1])); } assert(Bucket < End); return PtrTraits::getFromVoidPointer(const_cast(*Bucket)); } inline SmallPtrSetIterator& operator++() { // Preincrement assert(isHandleInSync() && "invalid iterator access!"); if (shouldReverseIterate()) { --Bucket; RetreatIfNotValid(); return *this; } ++Bucket; AdvanceIfNotValid(); return *this; } SmallPtrSetIterator operator++(int) { // Postincrement SmallPtrSetIterator tmp = *this; ++*this; return tmp; } }; /// RoundUpToPowerOfTwo - This is a helper template that rounds N up to the next /// power of two (which means N itself if N is already a power of two). template struct RoundUpToPowerOfTwo; /// RoundUpToPowerOfTwoH - If N is not a power of two, increase it. This is a /// helper template used to implement RoundUpToPowerOfTwo. template struct RoundUpToPowerOfTwoH { enum { Val = N }; }; template struct RoundUpToPowerOfTwoH { enum { // We could just use NextVal = N+1, but this converges faster. N|(N-1) sets // the right-most zero bits to one all at once, e.g. 0b0011000 -> 0b0011111. Val = RoundUpToPowerOfTwo<(N|(N-1)) + 1>::Val }; }; template struct RoundUpToPowerOfTwo { enum { Val = RoundUpToPowerOfTwoH::Val }; }; /// A templated base class for \c SmallPtrSet which provides the /// typesafe interface that is common across all small sizes. /// /// This is particularly useful for passing around between interface boundaries /// to avoid encoding a particular small size in the interface boundary. template class SmallPtrSetImpl : public SmallPtrSetImplBase { using ConstPtrType = typename add_const_past_pointer::type; using PtrTraits = PointerLikeTypeTraits; using ConstPtrTraits = PointerLikeTypeTraits; protected: // Forward constructors to the base. using SmallPtrSetImplBase::SmallPtrSetImplBase; public: using iterator = SmallPtrSetIterator; using const_iterator = SmallPtrSetIterator; using key_type = ConstPtrType; using value_type = PtrType; SmallPtrSetImpl(const SmallPtrSetImpl &) = delete; /// Inserts Ptr if and only if there is no element in the container equal to /// Ptr. The bool component of the returned pair is true if and only if the /// insertion takes place, and the iterator component of the pair points to /// the element equal to Ptr. std::pair insert(PtrType Ptr) { auto p = insert_imp(PtrTraits::getAsVoidPointer(Ptr)); return std::make_pair(makeIterator(p.first), p.second); } /// erase - If the set contains the specified pointer, remove it and return /// true, otherwise return false. bool erase(PtrType Ptr) { return erase_imp(PtrTraits::getAsVoidPointer(Ptr)); } /// count - Return 1 if the specified pointer is in the set, 0 otherwise. 
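// --- Illustrative sketch, not part of this header, of the rounding recurrence
// implemented by RoundUpToPowerOfTwo above: while N is not a power of two,
// replace it with (N | (N - 1)) + 1, which fills the bits below the highest
// set bit and then carries into the next power of two.
static constexpr unsigned roundUpToPowerOfTwoSketch(unsigned N) {
  return (N & (N - 1)) == 0 ? N : roundUpToPowerOfTwoSketch((N | (N - 1)) + 1);
}
static_assert(roundUpToPowerOfTwoSketch(1) == 1, "powers of two are unchanged");
static_assert(roundUpToPowerOfTwoSketch(5) == 8, "5 -> 6 -> 8");
static_assert(roundUpToPowerOfTwoSketch(0b0011000) == 0b0100000,
              "0b0011000 -> 0b0011111 + 1 -> 0b0100000");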
size_type count(ConstPtrType Ptr) const { return find_imp(ConstPtrTraits::getAsVoidPointer(Ptr)) != EndPointer(); } iterator find(ConstPtrType Ptr) const { return makeIterator(find_imp(ConstPtrTraits::getAsVoidPointer(Ptr))); } + bool contains(ConstPtrType Ptr) const { + return find_imp(ConstPtrTraits::getAsVoidPointer(Ptr)) != EndPointer(); + } template void insert(IterT I, IterT E) { for (; I != E; ++I) insert(*I); } void insert(std::initializer_list IL) { insert(IL.begin(), IL.end()); } iterator begin() const { if (shouldReverseIterate()) return makeIterator(EndPointer() - 1); return makeIterator(CurArray); } iterator end() const { return makeIterator(EndPointer()); } private: /// Create an iterator that dereferences to same place as the given pointer. iterator makeIterator(const void *const *P) const { if (shouldReverseIterate()) return iterator(P == EndPointer() ? CurArray : P + 1, CurArray, *this); return iterator(P, EndPointer(), *this); } }; /// Equality comparison for SmallPtrSet. /// /// Iterates over elements of LHS confirming that each value from LHS is also in /// RHS, and that no additional values are in RHS. template bool operator==(const SmallPtrSetImpl &LHS, const SmallPtrSetImpl &RHS) { if (LHS.size() != RHS.size()) return false; for (const auto *KV : LHS) if (!RHS.count(KV)) return false; return true; } /// Inequality comparison for SmallPtrSet. /// /// Equivalent to !(LHS == RHS). template bool operator!=(const SmallPtrSetImpl &LHS, const SmallPtrSetImpl &RHS) { return !(LHS == RHS); } /// SmallPtrSet - This class implements a set which is optimized for holding /// SmallSize or less elements. This internally rounds up SmallSize to the next /// power of two if it is not already a power of two. See the comments above /// SmallPtrSetImplBase for details of the algorithm. template class SmallPtrSet : public SmallPtrSetImpl { // In small mode SmallPtrSet uses linear search for the elements, so it is // not a good idea to choose this value too high. You may consider using a // DenseSet<> instead if you expect many elements in the set. static_assert(SmallSize <= 32, "SmallSize should be small"); using BaseT = SmallPtrSetImpl; // Make sure that SmallSize is a power of two, round up if not. enum { SmallSizePowTwo = RoundUpToPowerOfTwo::Val }; /// SmallStorage - Fixed size storage used in 'small mode'. const void *SmallStorage[SmallSizePowTwo]; public: SmallPtrSet() : BaseT(SmallStorage, SmallSizePowTwo) {} SmallPtrSet(const SmallPtrSet &that) : BaseT(SmallStorage, that) {} SmallPtrSet(SmallPtrSet &&that) : BaseT(SmallStorage, SmallSizePowTwo, std::move(that)) {} template SmallPtrSet(It I, It E) : BaseT(SmallStorage, SmallSizePowTwo) { this->insert(I, E); } SmallPtrSet(std::initializer_list IL) : BaseT(SmallStorage, SmallSizePowTwo) { this->insert(IL.begin(), IL.end()); } SmallPtrSet & operator=(const SmallPtrSet &RHS) { if (&RHS != this) this->CopyFrom(RHS); return *this; } SmallPtrSet & operator=(SmallPtrSet &&RHS) { if (&RHS != this) this->MoveFrom(SmallSizePowTwo, std::move(RHS)); return *this; } SmallPtrSet & operator=(std::initializer_list IL) { this->clear(); this->insert(IL.begin(), IL.end()); return *this; } /// swap - Swaps the elements of two sets. void swap(SmallPtrSet &RHS) { SmallPtrSetImplBase::swap(RHS); } }; } // end namespace llvm namespace std { /// Implement std::swap in terms of SmallPtrSet swap. 
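// --- Illustrative usage sketch, not part of this header, for the query API
// documented above, including the newly added contains(). Standalone example;
// assumes the LLVM headers are on the include path.
#include "llvm/ADT/SmallPtrSet.h"
#include <cassert>

static void smallPtrSetQuerySketch() {
  int A = 0, B = 0, C = 0;
  llvm::SmallPtrSet<int *, 4> S1{&A, &B};
  llvm::SmallPtrSet<int *, 4> S2{&B, &A};
  assert(S1.contains(&A) && !S1.contains(&C)); // same answer as count() != 0
  assert(S1.count(&B) == 1 && S1.count(&C) == 0);
  assert(S1.find(&C) == S1.end());             // find() returns end() on a miss
  assert(S1 == S2 && "equality compares elements, not insertion order");
  for (int *P : S1)                            // iteration order is unspecified
    assert(P == &A || P == &B);
}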
template inline void swap(llvm::SmallPtrSet &LHS, llvm::SmallPtrSet &RHS) { LHS.swap(RHS); } } // end namespace std #endif // LLVM_ADT_SMALLPTRSET_H Index: vendor/llvm-project/release-11.x/llvm/include/llvm-c/Core.h =================================================================== --- vendor/llvm-project/release-11.x/llvm/include/llvm-c/Core.h (revision 366332) +++ vendor/llvm-project/release-11.x/llvm/include/llvm-c/Core.h (revision 366333) @@ -1,4102 +1,4122 @@ /*===-- llvm-c/Core.h - Core Library C Interface ------------------*- C -*-===*\ |* *| |* Part of the LLVM Project, under the Apache License v2.0 with LLVM *| |* Exceptions. *| |* See https://llvm.org/LICENSE.txt for license information. *| |* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception *| |* *| |*===----------------------------------------------------------------------===*| |* *| |* This header declares the C interface to libLLVMCore.a, which implements *| |* the LLVM intermediate representation. *| |* *| \*===----------------------------------------------------------------------===*/ #ifndef LLVM_C_CORE_H #define LLVM_C_CORE_H #include "llvm-c/ErrorHandling.h" #include "llvm-c/ExternC.h" #include "llvm-c/Types.h" LLVM_C_EXTERN_C_BEGIN /** * @defgroup LLVMC LLVM-C: C interface to LLVM * * This module exposes parts of the LLVM library as a C API. * * @{ */ /** * @defgroup LLVMCTransforms Transforms */ /** * @defgroup LLVMCCore Core * * This modules provide an interface to libLLVMCore, which implements * the LLVM intermediate representation as well as other related types * and utilities. * * Many exotic languages can interoperate with C code but have a harder time * with C++ due to name mangling. So in addition to C, this interface enables * tools written in such languages. * * @{ */ /** * @defgroup LLVMCCoreTypes Types and Enumerations * * @{ */ /// External users depend on the following values being stable. It is not safe /// to reorder them. 
typedef enum { /* Terminator Instructions */ LLVMRet = 1, LLVMBr = 2, LLVMSwitch = 3, LLVMIndirectBr = 4, LLVMInvoke = 5, /* removed 6 due to API changes */ LLVMUnreachable = 7, LLVMCallBr = 67, /* Standard Unary Operators */ LLVMFNeg = 66, /* Standard Binary Operators */ LLVMAdd = 8, LLVMFAdd = 9, LLVMSub = 10, LLVMFSub = 11, LLVMMul = 12, LLVMFMul = 13, LLVMUDiv = 14, LLVMSDiv = 15, LLVMFDiv = 16, LLVMURem = 17, LLVMSRem = 18, LLVMFRem = 19, /* Logical Operators */ LLVMShl = 20, LLVMLShr = 21, LLVMAShr = 22, LLVMAnd = 23, LLVMOr = 24, LLVMXor = 25, /* Memory Operators */ LLVMAlloca = 26, LLVMLoad = 27, LLVMStore = 28, LLVMGetElementPtr = 29, /* Cast Operators */ LLVMTrunc = 30, LLVMZExt = 31, LLVMSExt = 32, LLVMFPToUI = 33, LLVMFPToSI = 34, LLVMUIToFP = 35, LLVMSIToFP = 36, LLVMFPTrunc = 37, LLVMFPExt = 38, LLVMPtrToInt = 39, LLVMIntToPtr = 40, LLVMBitCast = 41, LLVMAddrSpaceCast = 60, /* Other Operators */ LLVMICmp = 42, LLVMFCmp = 43, LLVMPHI = 44, LLVMCall = 45, LLVMSelect = 46, LLVMUserOp1 = 47, LLVMUserOp2 = 48, LLVMVAArg = 49, LLVMExtractElement = 50, LLVMInsertElement = 51, LLVMShuffleVector = 52, LLVMExtractValue = 53, LLVMInsertValue = 54, LLVMFreeze = 68, /* Atomic operators */ LLVMFence = 55, LLVMAtomicCmpXchg = 56, LLVMAtomicRMW = 57, /* Exception Handling Operators */ LLVMResume = 58, LLVMLandingPad = 59, LLVMCleanupRet = 61, LLVMCatchRet = 62, LLVMCatchPad = 63, LLVMCleanupPad = 64, LLVMCatchSwitch = 65 } LLVMOpcode; typedef enum { LLVMVoidTypeKind, /**< type with no size */ LLVMHalfTypeKind, /**< 16 bit floating point type */ LLVMFloatTypeKind, /**< 32 bit floating point type */ LLVMDoubleTypeKind, /**< 64 bit floating point type */ LLVMX86_FP80TypeKind, /**< 80 bit floating point type (X87) */ LLVMFP128TypeKind, /**< 128 bit floating point type (112-bit mantissa)*/ LLVMPPC_FP128TypeKind, /**< 128 bit floating point type (two 64-bits) */ LLVMLabelTypeKind, /**< Labels */ LLVMIntegerTypeKind, /**< Arbitrary bit width integers */ LLVMFunctionTypeKind, /**< Functions */ LLVMStructTypeKind, /**< Structures */ LLVMArrayTypeKind, /**< Arrays */ LLVMPointerTypeKind, /**< Pointers */ LLVMVectorTypeKind, /**< Fixed width SIMD vector type */ LLVMMetadataTypeKind, /**< Metadata */ LLVMX86_MMXTypeKind, /**< X86 MMX */ LLVMTokenTypeKind, /**< Tokens */ LLVMScalableVectorTypeKind, /**< Scalable SIMD vector type */ LLVMBFloatTypeKind /**< 16 bit brain floating point type */ } LLVMTypeKind; typedef enum { LLVMExternalLinkage, /**< Externally visible function */ LLVMAvailableExternallyLinkage, LLVMLinkOnceAnyLinkage, /**< Keep one copy of function when linking (inline)*/ LLVMLinkOnceODRLinkage, /**< Same, but only replaced by something equivalent. */ LLVMLinkOnceODRAutoHideLinkage, /**< Obsolete */ LLVMWeakAnyLinkage, /**< Keep one copy of function when linking (weak) */ LLVMWeakODRLinkage, /**< Same, but only replaced by something equivalent. */ LLVMAppendingLinkage, /**< Special purpose, only applies to global arrays */ LLVMInternalLinkage, /**< Rename collisions when linking (static functions) */ LLVMPrivateLinkage, /**< Like Internal, but omit from symbol table */ LLVMDLLImportLinkage, /**< Obsolete */ LLVMDLLExportLinkage, /**< Obsolete */ LLVMExternalWeakLinkage,/**< ExternalWeak linkage description */ LLVMGhostLinkage, /**< Obsolete */ LLVMCommonLinkage, /**< Tentative definitions */ LLVMLinkerPrivateLinkage, /**< Like Private, but linker removes. */ LLVMLinkerPrivateWeakLinkage /**< Like LinkerPrivate, but is weak. 
*/ } LLVMLinkage; typedef enum { LLVMDefaultVisibility, /**< The GV is visible */ LLVMHiddenVisibility, /**< The GV is hidden */ LLVMProtectedVisibility /**< The GV is protected */ } LLVMVisibility; typedef enum { LLVMNoUnnamedAddr, /**< Address of the GV is significant. */ LLVMLocalUnnamedAddr, /**< Address of the GV is locally insignificant. */ LLVMGlobalUnnamedAddr /**< Address of the GV is globally insignificant. */ } LLVMUnnamedAddr; typedef enum { LLVMDefaultStorageClass = 0, LLVMDLLImportStorageClass = 1, /**< Function to be imported from DLL. */ LLVMDLLExportStorageClass = 2 /**< Function to be accessible from DLL. */ } LLVMDLLStorageClass; typedef enum { LLVMCCallConv = 0, LLVMFastCallConv = 8, LLVMColdCallConv = 9, LLVMGHCCallConv = 10, LLVMHiPECallConv = 11, LLVMWebKitJSCallConv = 12, LLVMAnyRegCallConv = 13, LLVMPreserveMostCallConv = 14, LLVMPreserveAllCallConv = 15, LLVMSwiftCallConv = 16, LLVMCXXFASTTLSCallConv = 17, LLVMX86StdcallCallConv = 64, LLVMX86FastcallCallConv = 65, LLVMARMAPCSCallConv = 66, LLVMARMAAPCSCallConv = 67, LLVMARMAAPCSVFPCallConv = 68, LLVMMSP430INTRCallConv = 69, LLVMX86ThisCallCallConv = 70, LLVMPTXKernelCallConv = 71, LLVMPTXDeviceCallConv = 72, LLVMSPIRFUNCCallConv = 75, LLVMSPIRKERNELCallConv = 76, LLVMIntelOCLBICallConv = 77, LLVMX8664SysVCallConv = 78, LLVMWin64CallConv = 79, LLVMX86VectorCallCallConv = 80, LLVMHHVMCallConv = 81, LLVMHHVMCCallConv = 82, LLVMX86INTRCallConv = 83, LLVMAVRINTRCallConv = 84, LLVMAVRSIGNALCallConv = 85, LLVMAVRBUILTINCallConv = 86, LLVMAMDGPUVSCallConv = 87, LLVMAMDGPUGSCallConv = 88, LLVMAMDGPUPSCallConv = 89, LLVMAMDGPUCSCallConv = 90, LLVMAMDGPUKERNELCallConv = 91, LLVMX86RegCallCallConv = 92, LLVMAMDGPUHSCallConv = 93, LLVMMSP430BUILTINCallConv = 94, LLVMAMDGPULSCallConv = 95, LLVMAMDGPUESCallConv = 96 } LLVMCallConv; typedef enum { LLVMArgumentValueKind, LLVMBasicBlockValueKind, LLVMMemoryUseValueKind, LLVMMemoryDefValueKind, LLVMMemoryPhiValueKind, LLVMFunctionValueKind, LLVMGlobalAliasValueKind, LLVMGlobalIFuncValueKind, LLVMGlobalVariableValueKind, LLVMBlockAddressValueKind, LLVMConstantExprValueKind, LLVMConstantArrayValueKind, LLVMConstantStructValueKind, LLVMConstantVectorValueKind, LLVMUndefValueValueKind, LLVMConstantAggregateZeroValueKind, LLVMConstantDataArrayValueKind, LLVMConstantDataVectorValueKind, LLVMConstantIntValueKind, LLVMConstantFPValueKind, LLVMConstantPointerNullValueKind, LLVMConstantTokenNoneValueKind, LLVMMetadataAsValueValueKind, LLVMInlineAsmValueKind, LLVMInstructionValueKind, } LLVMValueKind; typedef enum { LLVMIntEQ = 32, /**< equal */ LLVMIntNE, /**< not equal */ LLVMIntUGT, /**< unsigned greater than */ LLVMIntUGE, /**< unsigned greater or equal */ LLVMIntULT, /**< unsigned less than */ LLVMIntULE, /**< unsigned less or equal */ LLVMIntSGT, /**< signed greater than */ LLVMIntSGE, /**< signed greater or equal */ LLVMIntSLT, /**< signed less than */ LLVMIntSLE /**< signed less or equal */ } LLVMIntPredicate; typedef enum { LLVMRealPredicateFalse, /**< Always false (always folded) */ LLVMRealOEQ, /**< True if ordered and equal */ LLVMRealOGT, /**< True if ordered and greater than */ LLVMRealOGE, /**< True if ordered and greater than or equal */ LLVMRealOLT, /**< True if ordered and less than */ LLVMRealOLE, /**< True if ordered and less than or equal */ LLVMRealONE, /**< True if ordered and operands are unequal */ LLVMRealORD, /**< True if ordered (no nans) */ LLVMRealUNO, /**< True if unordered: isnan(X) | isnan(Y) */ LLVMRealUEQ, /**< True if unordered or equal */ LLVMRealUGT, /**< 
True if unordered or greater than */ LLVMRealUGE, /**< True if unordered, greater than, or equal */ LLVMRealULT, /**< True if unordered or less than */ LLVMRealULE, /**< True if unordered, less than, or equal */ LLVMRealUNE, /**< True if unordered or not equal */ LLVMRealPredicateTrue /**< Always true (always folded) */ } LLVMRealPredicate; typedef enum { LLVMLandingPadCatch, /**< A catch clause */ LLVMLandingPadFilter /**< A filter clause */ } LLVMLandingPadClauseTy; typedef enum { LLVMNotThreadLocal = 0, LLVMGeneralDynamicTLSModel, LLVMLocalDynamicTLSModel, LLVMInitialExecTLSModel, LLVMLocalExecTLSModel } LLVMThreadLocalMode; typedef enum { LLVMAtomicOrderingNotAtomic = 0, /**< A load or store which is not atomic */ LLVMAtomicOrderingUnordered = 1, /**< Lowest level of atomicity, guarantees somewhat sane results, lock free. */ LLVMAtomicOrderingMonotonic = 2, /**< guarantees that if you take all the operations affecting a specific address, a consistent ordering exists */ LLVMAtomicOrderingAcquire = 4, /**< Acquire provides a barrier of the sort necessary to acquire a lock to access other memory with normal loads and stores. */ LLVMAtomicOrderingRelease = 5, /**< Release is similar to Acquire, but with a barrier of the sort necessary to release a lock. */ LLVMAtomicOrderingAcquireRelease = 6, /**< provides both an Acquire and a Release barrier (for fences and operations which both read and write memory). */ LLVMAtomicOrderingSequentiallyConsistent = 7 /**< provides Acquire semantics for loads and Release semantics for stores. Additionally, it guarantees that a total ordering exists between all SequentiallyConsistent operations. */ } LLVMAtomicOrdering; typedef enum { LLVMAtomicRMWBinOpXchg, /**< Set the new value and return the one old */ LLVMAtomicRMWBinOpAdd, /**< Add a value and return the old one */ LLVMAtomicRMWBinOpSub, /**< Subtract a value and return the old one */ LLVMAtomicRMWBinOpAnd, /**< And a value and return the old one */ LLVMAtomicRMWBinOpNand, /**< Not-And a value and return the old one */ LLVMAtomicRMWBinOpOr, /**< OR a value and return the old one */ LLVMAtomicRMWBinOpXor, /**< Xor a value and return the old one */ LLVMAtomicRMWBinOpMax, /**< Sets the value if it's greater than the original using a signed comparison and return the old one */ LLVMAtomicRMWBinOpMin, /**< Sets the value if it's Smaller than the original using a signed comparison and return the old one */ LLVMAtomicRMWBinOpUMax, /**< Sets the value if it's greater than the original using an unsigned comparison and return the old one */ LLVMAtomicRMWBinOpUMin, /**< Sets the value if it's greater than the original using an unsigned comparison and return the old one */ LLVMAtomicRMWBinOpFAdd, /**< Add a floating point value and return the old one */ LLVMAtomicRMWBinOpFSub /**< Subtract a floating point value and return the old one */ } LLVMAtomicRMWBinOp; typedef enum { LLVMDSError, LLVMDSWarning, LLVMDSRemark, LLVMDSNote } LLVMDiagnosticSeverity; typedef enum { LLVMInlineAsmDialectATT, LLVMInlineAsmDialectIntel } LLVMInlineAsmDialect; typedef enum { /** * Emits an error if two values disagree, otherwise the resulting value is * that of the operands. * * @see Module::ModFlagBehavior::Error */ LLVMModuleFlagBehaviorError, /** * Emits a warning if two values disagree. The result value will be the * operand for the flag from the first module being linked. 
* * @see Module::ModFlagBehavior::Warning */ LLVMModuleFlagBehaviorWarning, /** * Adds a requirement that another module flag be present and have a * specified value after linking is performed. The value must be a metadata * pair, where the first element of the pair is the ID of the module flag * to be restricted, and the second element of the pair is the value the * module flag should be restricted to. This behavior can be used to * restrict the allowable results (via triggering of an error) of linking * IDs with the **Override** behavior. * * @see Module::ModFlagBehavior::Require */ LLVMModuleFlagBehaviorRequire, /** * Uses the specified value, regardless of the behavior or value of the * other module. If both modules specify **Override**, but the values * differ, an error will be emitted. * * @see Module::ModFlagBehavior::Override */ LLVMModuleFlagBehaviorOverride, /** * Appends the two values, which are required to be metadata nodes. * * @see Module::ModFlagBehavior::Append */ LLVMModuleFlagBehaviorAppend, /** * Appends the two values, which are required to be metadata * nodes. However, duplicate entries in the second list are dropped * during the append operation. * * @see Module::ModFlagBehavior::AppendUnique */ LLVMModuleFlagBehaviorAppendUnique, } LLVMModuleFlagBehavior; /** * Attribute index are either LLVMAttributeReturnIndex, * LLVMAttributeFunctionIndex or a parameter number from 1 to N. */ enum { LLVMAttributeReturnIndex = 0U, // ISO C restricts enumerator values to range of 'int' // (4294967295 is too large) // LLVMAttributeFunctionIndex = ~0U, LLVMAttributeFunctionIndex = -1, }; typedef unsigned LLVMAttributeIndex; /** * @} */ void LLVMInitializeCore(LLVMPassRegistryRef R); /** Deallocate and destroy all ManagedStatic variables. @see llvm::llvm_shutdown @see ManagedStatic */ void LLVMShutdown(void); /*===-- Error handling ----------------------------------------------------===*/ char *LLVMCreateMessage(const char *Message); void LLVMDisposeMessage(char *Message); /** * @defgroup LLVMCCoreContext Contexts * * Contexts are execution states for the core LLVM IR system. * * Most types are tied to a context instance. Multiple contexts can * exist simultaneously. A single context is not thread safe. However, * different contexts can execute on different threads simultaneously. * * @{ */ typedef void (*LLVMDiagnosticHandler)(LLVMDiagnosticInfoRef, void *); typedef void (*LLVMYieldCallback)(LLVMContextRef, void *); /** * Create a new context. * * Every call to this function should be paired with a call to * LLVMContextDispose() or the context will leak memory. */ LLVMContextRef LLVMContextCreate(void); /** * Obtain the global context instance. */ LLVMContextRef LLVMGetGlobalContext(void); /** * Set the diagnostic handler for this context. */ void LLVMContextSetDiagnosticHandler(LLVMContextRef C, LLVMDiagnosticHandler Handler, void *DiagnosticContext); /** * Get the diagnostic handler of this context. */ LLVMDiagnosticHandler LLVMContextGetDiagnosticHandler(LLVMContextRef C); /** * Get the diagnostic context of this context. */ void *LLVMContextGetDiagnosticContext(LLVMContextRef C); /** * Set the yield callback function for this context. * * @see LLVMContext::setYieldCallback() */ void LLVMContextSetYieldCallback(LLVMContextRef C, LLVMYieldCallback Callback, void *OpaqueHandle); /** * Retrieve whether the given context is set to discard all value names. 
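/* --- Illustrative sketch, not part of this header. It shows the pairing rule
 * documented above: every LLVMContextCreate() needs a matching
 * LLVMContextDispose(), and a diagnostic handler receives the
 * LLVMDiagnosticInfoRef plus the opaque pointer registered with it. All
 * functions used here are declared in this header, some a little further
 * below. */
#include <stdio.h>
#include "llvm-c/Core.h"

static void handleDiagnostic(LLVMDiagnosticInfoRef DI, void *Unused) {
  char *Msg = LLVMGetDiagInfoDescription(DI);
  fprintf(stderr, "llvm diagnostic: %s\n", Msg);
  LLVMDisposeMessage(Msg);                 /* the description string must be freed */
  (void)Unused;
}

static void contextLifecycleSketch(void) {
  LLVMContextRef C = LLVMContextCreate();
  LLVMContextSetDiagnosticHandler(C, handleDiagnostic, NULL);
  LLVMContextSetDiscardValueNames(C, 1);   /* keep only GlobalValue names */
  LLVMContextDispose(C);                   /* pairs with LLVMContextCreate */
}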
* * @see LLVMContext::shouldDiscardValueNames() */ LLVMBool LLVMContextShouldDiscardValueNames(LLVMContextRef C); /** * Set whether the given context discards all value names. * * If true, only the names of GlobalValue objects will be available in the IR. * This can be used to save memory and runtime, especially in release mode. * * @see LLVMContext::setDiscardValueNames() */ void LLVMContextSetDiscardValueNames(LLVMContextRef C, LLVMBool Discard); /** * Destroy a context instance. * * This should be called for every call to LLVMContextCreate() or memory * will be leaked. */ void LLVMContextDispose(LLVMContextRef C); /** * Return a string representation of the DiagnosticInfo. Use * LLVMDisposeMessage to free the string. * * @see DiagnosticInfo::print() */ char *LLVMGetDiagInfoDescription(LLVMDiagnosticInfoRef DI); /** * Return an enum LLVMDiagnosticSeverity. * * @see DiagnosticInfo::getSeverity() */ LLVMDiagnosticSeverity LLVMGetDiagInfoSeverity(LLVMDiagnosticInfoRef DI); unsigned LLVMGetMDKindIDInContext(LLVMContextRef C, const char *Name, unsigned SLen); unsigned LLVMGetMDKindID(const char *Name, unsigned SLen); /** * Return an unique id given the name of a enum attribute, * or 0 if no attribute by that name exists. * * See http://llvm.org/docs/LangRef.html#parameter-attributes * and http://llvm.org/docs/LangRef.html#function-attributes * for the list of available attributes. * * NB: Attribute names and/or id are subject to change without * going through the C API deprecation cycle. */ unsigned LLVMGetEnumAttributeKindForName(const char *Name, size_t SLen); unsigned LLVMGetLastEnumAttributeKind(void); /** * Create an enum attribute. */ LLVMAttributeRef LLVMCreateEnumAttribute(LLVMContextRef C, unsigned KindID, uint64_t Val); /** * Get the unique id corresponding to the enum attribute * passed as argument. */ unsigned LLVMGetEnumAttributeKind(LLVMAttributeRef A); /** * Get the enum attribute's value. 0 is returned if none exists. */ uint64_t LLVMGetEnumAttributeValue(LLVMAttributeRef A); /** * Create a string attribute. */ LLVMAttributeRef LLVMCreateStringAttribute(LLVMContextRef C, const char *K, unsigned KLength, const char *V, unsigned VLength); /** * Get the string attribute's kind. */ const char *LLVMGetStringAttributeKind(LLVMAttributeRef A, unsigned *Length); /** * Get the string attribute's value. */ const char *LLVMGetStringAttributeValue(LLVMAttributeRef A, unsigned *Length); /** * Check for the different types of attributes. */ LLVMBool LLVMIsEnumAttribute(LLVMAttributeRef A); LLVMBool LLVMIsStringAttribute(LLVMAttributeRef A); /** * @} */ /** * @defgroup LLVMCCoreModule Modules * * Modules represent the top-level structure in an LLVM program. An LLVM * module is effectively a translation unit or a collection of * translation units merged together. * * @{ */ /** * Create a new, empty module in the global context. * * This is equivalent to calling LLVMModuleCreateWithNameInContext with * LLVMGetGlobalContext() as the context parameter. * * Every invocation should be paired with LLVMDisposeModule() or memory * will be leaked. */ LLVMModuleRef LLVMModuleCreateWithName(const char *ModuleID); /** * Create a new, empty module in a specific context. * * Every invocation should be paired with LLVMDisposeModule() or memory * will be leaked. */ LLVMModuleRef LLVMModuleCreateWithNameInContext(const char *ModuleID, LLVMContextRef C); /** * Return an exact copy of the specified module. */ LLVMModuleRef LLVMCloneModule(LLVMModuleRef M); /** * Destroy a module instance. 
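/* --- Illustrative sketch, not part of this header: looking up an enum
 * attribute kind by name and creating enum and string attributes, as
 * documented above. "noinline" and "target-features" are only example
 * attribute names. */
#include <assert.h>
#include <string.h>
#include "llvm-c/Core.h"

static void attributeSketch(LLVMContextRef C) {
  unsigned Kind = LLVMGetEnumAttributeKindForName("noinline", strlen("noinline"));
  if (Kind != 0) {                                 /* 0 means no such attribute */
    LLVMAttributeRef A = LLVMCreateEnumAttribute(C, Kind, 0);
    assert(LLVMIsEnumAttribute(A) && LLVMGetEnumAttributeKind(A) == Kind);
  }

  LLVMAttributeRef S =
      LLVMCreateStringAttribute(C, "target-features", 15, "+sse2", 5);
  unsigned Len = 0;
  const char *Val = LLVMGetStringAttributeValue(S, &Len);
  assert(LLVMIsStringAttribute(S) && Len == 5 && memcmp(Val, "+sse2", 5) == 0);
}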
* * This must be called for every created module or memory will be * leaked. */ void LLVMDisposeModule(LLVMModuleRef M); /** * Obtain the identifier of a module. * * @param M Module to obtain identifier of * @param Len Out parameter which holds the length of the returned string. * @return The identifier of M. * @see Module::getModuleIdentifier() */ const char *LLVMGetModuleIdentifier(LLVMModuleRef M, size_t *Len); /** * Set the identifier of a module to a string Ident with length Len. * * @param M The module to set identifier * @param Ident The string to set M's identifier to * @param Len Length of Ident * @see Module::setModuleIdentifier() */ void LLVMSetModuleIdentifier(LLVMModuleRef M, const char *Ident, size_t Len); /** * Obtain the module's original source file name. * * @param M Module to obtain the name of * @param Len Out parameter which holds the length of the returned string * @return The original source file name of M * @see Module::getSourceFileName() */ const char *LLVMGetSourceFileName(LLVMModuleRef M, size_t *Len); /** * Set the original source file name of a module to a string Name with length * Len. * * @param M The module to set the source file name of * @param Name The string to set M's source file name to * @param Len Length of Name * @see Module::setSourceFileName() */ void LLVMSetSourceFileName(LLVMModuleRef M, const char *Name, size_t Len); /** * Obtain the data layout for a module. * * @see Module::getDataLayoutStr() * * LLVMGetDataLayout is DEPRECATED, as the name is not only incorrect, * but match the name of another method on the module. Prefer the use * of LLVMGetDataLayoutStr, which is not ambiguous. */ const char *LLVMGetDataLayoutStr(LLVMModuleRef M); const char *LLVMGetDataLayout(LLVMModuleRef M); /** * Set the data layout for a module. * * @see Module::setDataLayout() */ void LLVMSetDataLayout(LLVMModuleRef M, const char *DataLayoutStr); /** * Obtain the target triple for a module. * * @see Module::getTargetTriple() */ const char *LLVMGetTarget(LLVMModuleRef M); /** * Set the target triple for a module. * * @see Module::setTargetTriple() */ void LLVMSetTarget(LLVMModuleRef M, const char *Triple); /** * Returns the module flags as an array of flag-key-value triples. The caller * is responsible for freeing this array by calling * \c LLVMDisposeModuleFlagsMetadata. * * @see Module::getModuleFlagsMetadata() */ LLVMModuleFlagEntry *LLVMCopyModuleFlagsMetadata(LLVMModuleRef M, size_t *Len); /** * Destroys module flags metadata entries. */ void LLVMDisposeModuleFlagsMetadata(LLVMModuleFlagEntry *Entries); /** * Returns the flag behavior for a module flag entry at a specific index. * * @see Module::ModuleFlagEntry::Behavior */ LLVMModuleFlagBehavior LLVMModuleFlagEntriesGetFlagBehavior(LLVMModuleFlagEntry *Entries, unsigned Index); /** * Returns the key for a module flag entry at a specific index. * * @see Module::ModuleFlagEntry::Key */ const char *LLVMModuleFlagEntriesGetKey(LLVMModuleFlagEntry *Entries, unsigned Index, size_t *Len); /** * Returns the metadata for a module flag entry at a specific index. * * @see Module::ModuleFlagEntry::Val */ LLVMMetadataRef LLVMModuleFlagEntriesGetMetadata(LLVMModuleFlagEntry *Entries, unsigned Index); /** * Add a module-level flag to the module-level flags metadata if it doesn't * already exist. * * @see Module::getModuleFlag() */ LLVMMetadataRef LLVMGetModuleFlag(LLVMModuleRef M, const char *Key, size_t KeyLen); /** * Add a module-level flag to the module-level flags metadata if it doesn't * already exist. 
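/* --- Illustrative sketch, not part of this header: the module lifecycle
 * documented above. The module name, source file name and target triple are
 * placeholders. */
#include <stdio.h>
#include "llvm-c/Core.h"

static void moduleLifecycleSketch(void) {
  LLVMContextRef C = LLVMContextCreate();
  LLVMModuleRef M = LLVMModuleCreateWithNameInContext("demo", C);
  LLVMSetSourceFileName(M, "demo.c", 6);
  LLVMSetTarget(M, "x86_64-unknown-linux-gnu");    /* example triple */
  size_t Len = 0;
  printf("module id: %s\n", LLVMGetModuleIdentifier(M, &Len));
  LLVMDisposeModule(M);       /* every created module must be disposed */
  LLVMContextDispose(C);
}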
* * @see Module::addModuleFlag() */ void LLVMAddModuleFlag(LLVMModuleRef M, LLVMModuleFlagBehavior Behavior, const char *Key, size_t KeyLen, LLVMMetadataRef Val); /** * Dump a representation of a module to stderr. * * @see Module::dump() */ void LLVMDumpModule(LLVMModuleRef M); /** * Print a representation of a module to a file. The ErrorMessage needs to be * disposed with LLVMDisposeMessage. Returns 0 on success, 1 otherwise. * * @see Module::print() */ LLVMBool LLVMPrintModuleToFile(LLVMModuleRef M, const char *Filename, char **ErrorMessage); /** * Return a string representation of the module. Use * LLVMDisposeMessage to free the string. * * @see Module::print() */ char *LLVMPrintModuleToString(LLVMModuleRef M); /** * Get inline assembly for a module. * * @see Module::getModuleInlineAsm() */ const char *LLVMGetModuleInlineAsm(LLVMModuleRef M, size_t *Len); /** * Set inline assembly for a module. * * @see Module::setModuleInlineAsm() */ void LLVMSetModuleInlineAsm2(LLVMModuleRef M, const char *Asm, size_t Len); /** * Append inline assembly to a module. * * @see Module::appendModuleInlineAsm() */ void LLVMAppendModuleInlineAsm(LLVMModuleRef M, const char *Asm, size_t Len); /** * Create the specified uniqued inline asm string. * * @see InlineAsm::get() */ LLVMValueRef LLVMGetInlineAsm(LLVMTypeRef Ty, char *AsmString, size_t AsmStringSize, char *Constraints, size_t ConstraintsSize, LLVMBool HasSideEffects, LLVMBool IsAlignStack, LLVMInlineAsmDialect Dialect); /** * Obtain the context to which this module is associated. * * @see Module::getContext() */ LLVMContextRef LLVMGetModuleContext(LLVMModuleRef M); /** * Obtain a Type from a module by its registered name. */ LLVMTypeRef LLVMGetTypeByName(LLVMModuleRef M, const char *Name); /** * Obtain an iterator to the first NamedMDNode in a Module. * * @see llvm::Module::named_metadata_begin() */ LLVMNamedMDNodeRef LLVMGetFirstNamedMetadata(LLVMModuleRef M); /** * Obtain an iterator to the last NamedMDNode in a Module. * * @see llvm::Module::named_metadata_end() */ LLVMNamedMDNodeRef LLVMGetLastNamedMetadata(LLVMModuleRef M); /** * Advance a NamedMDNode iterator to the next NamedMDNode. * * Returns NULL if the iterator was already at the end and there are no more * named metadata nodes. */ LLVMNamedMDNodeRef LLVMGetNextNamedMetadata(LLVMNamedMDNodeRef NamedMDNode); /** * Decrement a NamedMDNode iterator to the previous NamedMDNode. * * Returns NULL if the iterator was already at the beginning and there are * no previous named metadata nodes. */ LLVMNamedMDNodeRef LLVMGetPreviousNamedMetadata(LLVMNamedMDNodeRef NamedMDNode); /** * Retrieve a NamedMDNode with the given name, returning NULL if no such * node exists. * * @see llvm::Module::getNamedMetadata() */ LLVMNamedMDNodeRef LLVMGetNamedMetadata(LLVMModuleRef M, const char *Name, size_t NameLen); /** * Retrieve a NamedMDNode with the given name, creating a new node if no such * node exists. * * @see llvm::Module::getOrInsertNamedMetadata() */ LLVMNamedMDNodeRef LLVMGetOrInsertNamedMetadata(LLVMModuleRef M, const char *Name, size_t NameLen); /** * Retrieve the name of a NamedMDNode. * * @see llvm::NamedMDNode::getName() */ const char *LLVMGetNamedMetadataName(LLVMNamedMDNodeRef NamedMD, size_t *NameLen); /** * Obtain the number of operands for named metadata in a module. * * @see llvm::Module::getNamedMetadata() */ unsigned LLVMGetNamedMetadataNumOperands(LLVMModuleRef M, const char *Name); /** * Obtain the named metadata operands for a module. 
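/* --- Illustrative sketch, not part of this header: attaching a module flag
 * and printing the module, per the documentation above. LLVMValueAsMetadata
 * and LLVMConstInt are declared elsewhere in this header; the key/value pair
 * follows the common "Debug Info Version" flag and is only an example. */
#include <stdio.h>
#include "llvm-c/Core.h"

static void moduleFlagSketch(LLVMContextRef C, LLVMModuleRef M) {
  LLVMMetadataRef Val =
      LLVMValueAsMetadata(LLVMConstInt(LLVMInt32TypeInContext(C), 3, 0));
  LLVMAddModuleFlag(M, LLVMModuleFlagBehaviorWarning, "Debug Info Version", 18, Val);
  if (LLVMGetModuleFlag(M, "Debug Info Version", 18) != NULL) {
    char *IR = LLVMPrintModuleToString(M);
    printf("%s", IR);
    LLVMDisposeMessage(IR);   /* the printed string must be freed */
  }
}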
* * The passed LLVMValueRef pointer should refer to an array of * LLVMValueRef at least LLVMGetNamedMetadataNumOperands long. This * array will be populated with the LLVMValueRef instances. Each * instance corresponds to a llvm::MDNode. * * @see llvm::Module::getNamedMetadata() * @see llvm::MDNode::getOperand() */ void LLVMGetNamedMetadataOperands(LLVMModuleRef M, const char *Name, LLVMValueRef *Dest); /** * Add an operand to named metadata. * * @see llvm::Module::getNamedMetadata() * @see llvm::MDNode::addOperand() */ void LLVMAddNamedMetadataOperand(LLVMModuleRef M, const char *Name, LLVMValueRef Val); /** * Return the directory of the debug location for this value, which must be * an llvm::Instruction, llvm::GlobalVariable, or llvm::Function. * * @see llvm::Instruction::getDebugLoc() * @see llvm::GlobalVariable::getDebugInfo() * @see llvm::Function::getSubprogram() */ const char *LLVMGetDebugLocDirectory(LLVMValueRef Val, unsigned *Length); /** * Return the filename of the debug location for this value, which must be * an llvm::Instruction, llvm::GlobalVariable, or llvm::Function. * * @see llvm::Instruction::getDebugLoc() * @see llvm::GlobalVariable::getDebugInfo() * @see llvm::Function::getSubprogram() */ const char *LLVMGetDebugLocFilename(LLVMValueRef Val, unsigned *Length); /** * Return the line number of the debug location for this value, which must be * an llvm::Instruction, llvm::GlobalVariable, or llvm::Function. * * @see llvm::Instruction::getDebugLoc() * @see llvm::GlobalVariable::getDebugInfo() * @see llvm::Function::getSubprogram() */ unsigned LLVMGetDebugLocLine(LLVMValueRef Val); /** * Return the column number of the debug location for this value, which must be * an llvm::Instruction. * * @see llvm::Instruction::getDebugLoc() */ unsigned LLVMGetDebugLocColumn(LLVMValueRef Val); /** * Add a function to a module under a specified name. * * @see llvm::Function::Create() */ LLVMValueRef LLVMAddFunction(LLVMModuleRef M, const char *Name, LLVMTypeRef FunctionTy); /** * Obtain a Function value from a Module by its name. * * The returned value corresponds to a llvm::Function value. * * @see llvm::Module::getFunction() */ LLVMValueRef LLVMGetNamedFunction(LLVMModuleRef M, const char *Name); /** * Obtain an iterator to the first Function in a Module. * * @see llvm::Module::begin() */ LLVMValueRef LLVMGetFirstFunction(LLVMModuleRef M); /** * Obtain an iterator to the last Function in a Module. * * @see llvm::Module::end() */ LLVMValueRef LLVMGetLastFunction(LLVMModuleRef M); /** * Advance a Function iterator to the next Function. * * Returns NULL if the iterator was already at the end and there are no more * functions. */ LLVMValueRef LLVMGetNextFunction(LLVMValueRef Fn); /** * Decrement a Function iterator to the previous Function. * * Returns NULL if the iterator was already at the beginning and there are * no previous functions. */ LLVMValueRef LLVMGetPreviousFunction(LLVMValueRef Fn); /** Deprecated: Use LLVMSetModuleInlineAsm2 instead. */ void LLVMSetModuleInlineAsm(LLVMModuleRef M, const char *Asm); /** * @} */ /** * @defgroup LLVMCCoreType Types * * Types represent the type of a value. * * Types are associated with a context instance. The context internally * deduplicates types so there is only 1 instance of a specific type * alive at a time. In other words, a unique type is shared among all * consumers within a context. * * A Type in the C API corresponds to llvm::Type. 
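/* --- Illustrative sketch, not part of this header: adding functions and
 * walking the module's function list with the iterator pattern documented
 * above. LLVMFunctionType and LLVMVoidTypeInContext are declared in the Types
 * section further below. */
#include <assert.h>
#include <stddef.h>
#include "llvm-c/Core.h"

static void functionIterationSketch(LLVMContextRef C, LLVMModuleRef M) {
  LLVMTypeRef VoidFnTy = LLVMFunctionType(LLVMVoidTypeInContext(C), NULL, 0, 0);
  LLVMAddFunction(M, "f", VoidFnTy);
  LLVMAddFunction(M, "g", VoidFnTy);

  unsigned Count = 0;
  for (LLVMValueRef Fn = LLVMGetFirstFunction(M); Fn != NULL;
       Fn = LLVMGetNextFunction(Fn))
    ++Count;
  assert(Count == 2);
  assert(LLVMGetNamedFunction(M, "f") != NULL);
}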
* * Types have the following hierarchy: * * types: * integer type * real type * function type * sequence types: * array type * pointer type * vector type * void type * label type * opaque type * * @{ */ /** * Obtain the enumerated type of a Type instance. * * @see llvm::Type:getTypeID() */ LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty); /** * Whether the type has a known size. * * Things that don't have a size are abstract types, labels, and void.a * * @see llvm::Type::isSized() */ LLVMBool LLVMTypeIsSized(LLVMTypeRef Ty); /** * Obtain the context to which this type instance is associated. * * @see llvm::Type::getContext() */ LLVMContextRef LLVMGetTypeContext(LLVMTypeRef Ty); /** * Dump a representation of a type to stderr. * * @see llvm::Type::dump() */ void LLVMDumpType(LLVMTypeRef Val); /** * Return a string representation of the type. Use * LLVMDisposeMessage to free the string. * * @see llvm::Type::print() */ char *LLVMPrintTypeToString(LLVMTypeRef Val); /** * @defgroup LLVMCCoreTypeInt Integer Types * * Functions in this section operate on integer types. * * @{ */ /** * Obtain an integer type from a context with specified bit width. */ LLVMTypeRef LLVMInt1TypeInContext(LLVMContextRef C); LLVMTypeRef LLVMInt8TypeInContext(LLVMContextRef C); LLVMTypeRef LLVMInt16TypeInContext(LLVMContextRef C); LLVMTypeRef LLVMInt32TypeInContext(LLVMContextRef C); LLVMTypeRef LLVMInt64TypeInContext(LLVMContextRef C); LLVMTypeRef LLVMInt128TypeInContext(LLVMContextRef C); LLVMTypeRef LLVMIntTypeInContext(LLVMContextRef C, unsigned NumBits); /** * Obtain an integer type from the global context with a specified bit * width. */ LLVMTypeRef LLVMInt1Type(void); LLVMTypeRef LLVMInt8Type(void); LLVMTypeRef LLVMInt16Type(void); LLVMTypeRef LLVMInt32Type(void); LLVMTypeRef LLVMInt64Type(void); LLVMTypeRef LLVMInt128Type(void); LLVMTypeRef LLVMIntType(unsigned NumBits); unsigned LLVMGetIntTypeWidth(LLVMTypeRef IntegerTy); /** * @} */ /** * @defgroup LLVMCCoreTypeFloat Floating Point Types * * @{ */ /** * Obtain a 16-bit floating point type from a context. */ LLVMTypeRef LLVMHalfTypeInContext(LLVMContextRef C); /** * Obtain a 16-bit brain floating point type from a context. */ LLVMTypeRef LLVMBFloatTypeInContext(LLVMContextRef C); /** * Obtain a 32-bit floating point type from a context. */ LLVMTypeRef LLVMFloatTypeInContext(LLVMContextRef C); /** * Obtain a 64-bit floating point type from a context. */ LLVMTypeRef LLVMDoubleTypeInContext(LLVMContextRef C); /** * Obtain a 80-bit floating point type (X87) from a context. */ LLVMTypeRef LLVMX86FP80TypeInContext(LLVMContextRef C); /** * Obtain a 128-bit floating point type (112-bit mantissa) from a * context. */ LLVMTypeRef LLVMFP128TypeInContext(LLVMContextRef C); /** * Obtain a 128-bit floating point type (two 64-bits) from a context. */ LLVMTypeRef LLVMPPCFP128TypeInContext(LLVMContextRef C); /** * Obtain a floating point type from the global context. * * These map to the functions in this group of the same name. */ LLVMTypeRef LLVMHalfType(void); LLVMTypeRef LLVMBFloatType(void); LLVMTypeRef LLVMFloatType(void); LLVMTypeRef LLVMDoubleType(void); LLVMTypeRef LLVMX86FP80Type(void); LLVMTypeRef LLVMFP128Type(void); LLVMTypeRef LLVMPPCFP128Type(void); /** * @} */ /** * @defgroup LLVMCCoreTypeFunction Function Types * * @{ */ /** * Obtain a function type consisting of a specified signature. * * The function is defined as a tuple of a return Type, a list of * parameter types, and whether the function is variadic. 
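/* --- Illustrative sketch, not part of this header: creating integer and
 * floating point types in a context and querying them through the functions
 * documented above. */
#include <assert.h>
#include "llvm-c/Core.h"

static void scalarTypeSketch(LLVMContextRef C) {
  LLVMTypeRef I32 = LLVMInt32TypeInContext(C);
  LLVMTypeRef I7  = LLVMIntTypeInContext(C, 7);    /* arbitrary bit width */
  LLVMTypeRef F64 = LLVMDoubleTypeInContext(C);

  assert(LLVMGetTypeKind(I32) == LLVMIntegerTypeKind);
  assert(LLVMGetIntTypeWidth(I7) == 7);
  assert(LLVMGetTypeKind(F64) == LLVMDoubleTypeKind);
  assert(LLVMTypeIsSized(I32) && LLVMTypeIsSized(F64));
}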
*/ LLVMTypeRef LLVMFunctionType(LLVMTypeRef ReturnType, LLVMTypeRef *ParamTypes, unsigned ParamCount, LLVMBool IsVarArg); /** * Returns whether a function type is variadic. */ LLVMBool LLVMIsFunctionVarArg(LLVMTypeRef FunctionTy); /** * Obtain the Type this function Type returns. */ LLVMTypeRef LLVMGetReturnType(LLVMTypeRef FunctionTy); /** * Obtain the number of parameters this function accepts. */ unsigned LLVMCountParamTypes(LLVMTypeRef FunctionTy); /** * Obtain the types of a function's parameters. * * The Dest parameter should point to a pre-allocated array of * LLVMTypeRef at least LLVMCountParamTypes() large. On return, the * first LLVMCountParamTypes() entries in the array will be populated * with LLVMTypeRef instances. * * @param FunctionTy The function type to operate on. * @param Dest Memory address of an array to be filled with result. */ void LLVMGetParamTypes(LLVMTypeRef FunctionTy, LLVMTypeRef *Dest); /** * @} */ /** * @defgroup LLVMCCoreTypeStruct Structure Types * * These functions relate to LLVMTypeRef instances. * * @see llvm::StructType * * @{ */ /** * Create a new structure type in a context. * * A structure is specified by a list of inner elements/types and * whether these can be packed together. * * @see llvm::StructType::create() */ LLVMTypeRef LLVMStructTypeInContext(LLVMContextRef C, LLVMTypeRef *ElementTypes, unsigned ElementCount, LLVMBool Packed); /** * Create a new structure type in the global context. * * @see llvm::StructType::create() */ LLVMTypeRef LLVMStructType(LLVMTypeRef *ElementTypes, unsigned ElementCount, LLVMBool Packed); /** * Create an empty structure in a context having a specified name. * * @see llvm::StructType::create() */ LLVMTypeRef LLVMStructCreateNamed(LLVMContextRef C, const char *Name); /** * Obtain the name of a structure. * * @see llvm::StructType::getName() */ const char *LLVMGetStructName(LLVMTypeRef Ty); /** * Set the contents of a structure type. * * @see llvm::StructType::setBody() */ void LLVMStructSetBody(LLVMTypeRef StructTy, LLVMTypeRef *ElementTypes, unsigned ElementCount, LLVMBool Packed); /** * Get the number of elements defined inside the structure. * * @see llvm::StructType::getNumElements() */ unsigned LLVMCountStructElementTypes(LLVMTypeRef StructTy); /** * Get the elements within a structure. * * The function is passed the address of a pre-allocated array of * LLVMTypeRef at least LLVMCountStructElementTypes() long. After * invocation, this array will be populated with the structure's * elements. The objects in the destination array will have a lifetime * of the structure type itself, which is the lifetime of the context it * is contained in. */ void LLVMGetStructElementTypes(LLVMTypeRef StructTy, LLVMTypeRef *Dest); /** * Get the type of the element at a given index in the structure. * * @see llvm::StructType::getTypeAtIndex() */ LLVMTypeRef LLVMStructGetTypeAtIndex(LLVMTypeRef StructTy, unsigned i); /** * Determine whether a structure is packed. * * @see llvm::StructType::isPacked() */ LLVMBool LLVMIsPackedStruct(LLVMTypeRef StructTy); /** * Determine whether a structure is opaque. * * @see llvm::StructType::isOpaque() */ LLVMBool LLVMIsOpaqueStruct(LLVMTypeRef StructTy); /** * Determine whether a structure is literal. * * @see llvm::StructType::isLiteral() */ LLVMBool LLVMIsLiteralStruct(LLVMTypeRef StructTy); /** * @} */ /** * @defgroup LLVMCCoreTypeSequential Sequential Types * * Sequential types represent "arrays" of types. This is a super class * for array, vector, and pointer types.
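 *
 * For illustration, a minimal sketch using the global context:
 *
 * @code
 *   LLVMTypeRef I32 = LLVMInt32Type();
 *   LLVMTypeRef Arr = LLVMArrayType(I32, 16);  // [16 x i32]
 *   LLVMTypeRef Ptr = LLVMPointerType(I32, 0); // i32* in address space 0
 *   LLVMTypeRef Vec = LLVMVectorType(I32, 4);  // <4 x i32>
 * @endcode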
* * @{ */ /** * Obtain the type of elements within a sequential type. * * This works on array, vector, and pointer types. * * @see llvm::SequentialType::getElementType() */ LLVMTypeRef LLVMGetElementType(LLVMTypeRef Ty); /** * Return a type's subtypes. * * @see llvm::Type::subtypes() */ void LLVMGetSubtypes(LLVMTypeRef Tp, LLVMTypeRef *Arr); /** * Return the number of types in the derived type. * * @see llvm::Type::getNumContainedTypes() */ unsigned LLVMGetNumContainedTypes(LLVMTypeRef Tp); /** * Create a fixed size array type that refers to a specific type. * * The created type will exist in the context that its element type * exists in. * * @see llvm::ArrayType::get() */ LLVMTypeRef LLVMArrayType(LLVMTypeRef ElementType, unsigned ElementCount); /** * Obtain the length of an array type. * * This only works on types that represent arrays. * * @see llvm::ArrayType::getNumElements() */ unsigned LLVMGetArrayLength(LLVMTypeRef ArrayTy); /** * Create a pointer type that points to a defined type. * * The created type will exist in the context that its pointee type * exists in. * * @see llvm::PointerType::get() */ LLVMTypeRef LLVMPointerType(LLVMTypeRef ElementType, unsigned AddressSpace); /** * Obtain the address space of a pointer type. * * This only works on types that represent pointers. * * @see llvm::PointerType::getAddressSpace() */ unsigned LLVMGetPointerAddressSpace(LLVMTypeRef PointerTy); /** * Create a vector type that contains a defined type and has a specific * number of elements. * * The created type will exist in the context that its element type * exists in. * * @see llvm::VectorType::get() */ LLVMTypeRef LLVMVectorType(LLVMTypeRef ElementType, unsigned ElementCount); /** * Obtain the number of elements in a vector type. * * This only works on types that represent vectors. * * @see llvm::VectorType::getNumElements() */ unsigned LLVMGetVectorSize(LLVMTypeRef VectorTy); /** * @} */ /** * @defgroup LLVMCCoreTypeOther Other Types * * @{ */ /** * Create a void type in a context. */ LLVMTypeRef LLVMVoidTypeInContext(LLVMContextRef C); /** * Create a label type in a context. */ LLVMTypeRef LLVMLabelTypeInContext(LLVMContextRef C); /** * Create an X86 MMX type in a context. */ LLVMTypeRef LLVMX86MMXTypeInContext(LLVMContextRef C); /** * Create a token type in a context. */ LLVMTypeRef LLVMTokenTypeInContext(LLVMContextRef C); /** * Create a metadata type in a context. */ LLVMTypeRef LLVMMetadataTypeInContext(LLVMContextRef C); /** * These are similar to the above functions except they operate on the * global context. */ LLVMTypeRef LLVMVoidType(void); LLVMTypeRef LLVMLabelType(void); LLVMTypeRef LLVMX86MMXType(void); /** * @} */ /** * @} */ /** * @defgroup LLVMCCoreValues Values * * The bulk of LLVM's object model consists of values, which comprise a very * rich type hierarchy. * * LLVMValueRef essentially represents llvm::Value. There is a rich * hierarchy of classes within this type. Depending on the instance * obtained, not all APIs are available. * * Callers can determine the type of an LLVMValueRef by calling the * LLVMIsA* family of functions (e.g. LLVMIsAArgument()). These * functions are defined by a macro, so it isn't obvious which are * available by looking at the Doxygen source code. Instead, look at the * source definition of LLVM_FOR_EACH_VALUE_SUBCLASS and note the list * of value names given. These value names also correspond to classes in * the llvm::Value hierarchy.
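 *
 * For illustration, a minimal sketch of narrowing a value with the
 * LLVMIsA* casts (assumes an LLVMValueRef Val obtained elsewhere):
 *
 * @code
 *   if (LLVMIsAConstantInt(Val)) {
 *     // Only reached when Val wraps a llvm::ConstantInt.
 *     unsigned long long N = LLVMConstIntGetZExtValue(Val);
 *     (void)N;
 *   }
 * @endcode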
* * @{ */ #define LLVM_FOR_EACH_VALUE_SUBCLASS(macro) \ macro(Argument) \ macro(BasicBlock) \ macro(InlineAsm) \ macro(User) \ macro(Constant) \ macro(BlockAddress) \ macro(ConstantAggregateZero) \ macro(ConstantArray) \ macro(ConstantDataSequential) \ macro(ConstantDataArray) \ macro(ConstantDataVector) \ macro(ConstantExpr) \ macro(ConstantFP) \ macro(ConstantInt) \ macro(ConstantPointerNull) \ macro(ConstantStruct) \ macro(ConstantTokenNone) \ macro(ConstantVector) \ macro(GlobalValue) \ macro(GlobalAlias) \ macro(GlobalIFunc) \ macro(GlobalObject) \ macro(Function) \ macro(GlobalVariable) \ macro(UndefValue) \ macro(Instruction) \ macro(UnaryOperator) \ macro(BinaryOperator) \ macro(CallInst) \ macro(IntrinsicInst) \ macro(DbgInfoIntrinsic) \ macro(DbgVariableIntrinsic) \ macro(DbgDeclareInst) \ macro(DbgLabelInst) \ macro(MemIntrinsic) \ macro(MemCpyInst) \ macro(MemMoveInst) \ macro(MemSetInst) \ macro(CmpInst) \ macro(FCmpInst) \ macro(ICmpInst) \ macro(ExtractElementInst) \ macro(GetElementPtrInst) \ macro(InsertElementInst) \ macro(InsertValueInst) \ macro(LandingPadInst) \ macro(PHINode) \ macro(SelectInst) \ macro(ShuffleVectorInst) \ macro(StoreInst) \ macro(BranchInst) \ macro(IndirectBrInst) \ macro(InvokeInst) \ macro(ReturnInst) \ macro(SwitchInst) \ macro(UnreachableInst) \ macro(ResumeInst) \ macro(CleanupReturnInst) \ macro(CatchReturnInst) \ macro(CatchSwitchInst) \ macro(CallBrInst) \ macro(FuncletPadInst) \ macro(CatchPadInst) \ macro(CleanupPadInst) \ macro(UnaryInstruction) \ macro(AllocaInst) \ macro(CastInst) \ macro(AddrSpaceCastInst) \ macro(BitCastInst) \ macro(FPExtInst) \ macro(FPToSIInst) \ macro(FPToUIInst) \ macro(FPTruncInst) \ macro(IntToPtrInst) \ macro(PtrToIntInst) \ macro(SExtInst) \ macro(SIToFPInst) \ macro(TruncInst) \ macro(UIToFPInst) \ macro(ZExtInst) \ macro(ExtractValueInst) \ macro(LoadInst) \ macro(VAArgInst) \ macro(FreezeInst) \ macro(AtomicCmpXchgInst) \ macro(AtomicRMWInst) \ macro(FenceInst) /** * @defgroup LLVMCCoreValueGeneral General APIs * * Functions in this section work on all LLVMValueRef instances, * regardless of their sub-type. They correspond to functions available * on llvm::Value. * * @{ */ /** * Obtain the type of a value. * * @see llvm::Value::getType() */ LLVMTypeRef LLVMTypeOf(LLVMValueRef Val); /** * Obtain the enumerated type of a Value instance. * * @see llvm::Value::getValueID() */ LLVMValueKind LLVMGetValueKind(LLVMValueRef Val); /** * Obtain the string name of a value. * * @see llvm::Value::getName() */ const char *LLVMGetValueName2(LLVMValueRef Val, size_t *Length); /** * Set the string name of a value. * * @see llvm::Value::setName() */ void LLVMSetValueName2(LLVMValueRef Val, const char *Name, size_t NameLen); /** * Dump a representation of a value to stderr. * * @see llvm::Value::dump() */ void LLVMDumpValue(LLVMValueRef Val); /** * Return a string representation of the value. Use * LLVMDisposeMessage to free the string. * * @see llvm::Value::print() */ char *LLVMPrintValueToString(LLVMValueRef Val); /** * Replace all uses of a value with another one. * * @see llvm::Value::replaceAllUsesWith() */ void LLVMReplaceAllUsesWith(LLVMValueRef OldVal, LLVMValueRef NewVal); /** * Determine whether the specified value instance is constant. */ LLVMBool LLVMIsConstant(LLVMValueRef Val); /** * Determine whether a value instance is undefined. */ LLVMBool LLVMIsUndef(LLVMValueRef Val); /** * Convert value instances between types. * * Internally, an LLVMValueRef is "pinned" to a specific type. 
This * series of functions allows you to cast an instance to a specific * type. * * If the cast is not valid for the specified type, NULL is returned. * * @see llvm::dyn_cast_or_null<> */ #define LLVM_DECLARE_VALUE_CAST(name) \ LLVMValueRef LLVMIsA##name(LLVMValueRef Val); LLVM_FOR_EACH_VALUE_SUBCLASS(LLVM_DECLARE_VALUE_CAST) LLVMValueRef LLVMIsAMDNode(LLVMValueRef Val); LLVMValueRef LLVMIsAMDString(LLVMValueRef Val); /** Deprecated: Use LLVMGetValueName2 instead. */ const char *LLVMGetValueName(LLVMValueRef Val); /** Deprecated: Use LLVMSetValueName2 instead. */ void LLVMSetValueName(LLVMValueRef Val, const char *Name); /** * @} */ /** * @defgroup LLVMCCoreValueUses Usage * * This module defines functions that allow you to inspect the uses of a * LLVMValueRef. * * It is possible to obtain an LLVMUseRef for any LLVMValueRef instance. * Each LLVMUseRef (which corresponds to a llvm::Use instance) holds a * llvm::User and llvm::Value. * * @{ */ /** * Obtain the first use of a value. * * Uses are obtained in an iterator fashion. First, call this function * to obtain a reference to the first use. Then, call LLVMGetNextUse() * on that instance and all subsequently obtained instances until * LLVMGetNextUse() returns NULL. * * @see llvm::Value::use_begin() */ LLVMUseRef LLVMGetFirstUse(LLVMValueRef Val); /** * Obtain the next use of a value. * * This effectively advances the iterator. It returns NULL if you are on * the final use and no more are available. */ LLVMUseRef LLVMGetNextUse(LLVMUseRef U); /** * Obtain the user value for a user. * * The returned value corresponds to a llvm::User type. * * @see llvm::Use::getUser() */ LLVMValueRef LLVMGetUser(LLVMUseRef U); /** * Obtain the value this use corresponds to. * * @see llvm::Use::get(). */ LLVMValueRef LLVMGetUsedValue(LLVMUseRef U); /** * @} */ /** * @defgroup LLVMCCoreValueUser User value * * Functions in this group pertain to LLVMValueRef instances that descend * from llvm::User. This includes constants, instructions, and * operators. * * @{ */ /** * Obtain an operand at a specific index in a llvm::User value. * * @see llvm::User::getOperand() */ LLVMValueRef LLVMGetOperand(LLVMValueRef Val, unsigned Index); /** * Obtain the use of an operand at a specific index in a llvm::User value. * * @see llvm::User::getOperandUse() */ LLVMUseRef LLVMGetOperandUse(LLVMValueRef Val, unsigned Index); /** * Set an operand at a specific index in a llvm::User value. * * @see llvm::User::setOperand() */ void LLVMSetOperand(LLVMValueRef User, unsigned Index, LLVMValueRef Val); /** * Obtain the number of operands in a llvm::User value. * * @see llvm::User::getNumOperands() */ int LLVMGetNumOperands(LLVMValueRef Val); /** * @} */ /** * @defgroup LLVMCCoreValueConstant Constants * * This section contains APIs for interacting with LLVMValueRef that * correspond to llvm::Constant instances. * * These functions will work for any LLVMValueRef in the llvm::Constant * class hierarchy. * * @{ */ /** * Obtain a constant value referring to the null instance of a type. * * @see llvm::Constant::getNullValue() */ LLVMValueRef LLVMConstNull(LLVMTypeRef Ty); /* all zeroes */ /** * Obtain a constant value referring to the instance of a type * consisting of all ones. * * This is only valid for integer types. * * @see llvm::Constant::getAllOnesValue() */ LLVMValueRef LLVMConstAllOnes(LLVMTypeRef Ty); /** * Obtain a constant value referring to an undefined value of a type.
* * @see llvm::UndefValue::get() */ LLVMValueRef LLVMGetUndef(LLVMTypeRef Ty); /** * Determine whether a value instance is null. * * @see llvm::Constant::isNullValue() */ LLVMBool LLVMIsNull(LLVMValueRef Val); /** * Obtain a constant that is a constant pointer pointing to NULL for a * specified type. */ LLVMValueRef LLVMConstPointerNull(LLVMTypeRef Ty); /** * @defgroup LLVMCCoreValueConstantScalar Scalar constants * * Functions in this group model LLVMValueRef instances that correspond * to constants referring to scalar types. * * For integer types, the LLVMTypeRef parameter should correspond to a * llvm::IntegerType instance and the returned LLVMValueRef will * correspond to a llvm::ConstantInt. * * For floating point types, the LLVMTypeRef returned corresponds to a * llvm::ConstantFP. * * @{ */ /** * Obtain a constant value for an integer type. * * The returned value corresponds to a llvm::ConstantInt. * * @see llvm::ConstantInt::get() * * @param IntTy Integer type to obtain value of. * @param N The value the returned instance should refer to. * @param SignExtend Whether to sign extend the produced value. */ LLVMValueRef LLVMConstInt(LLVMTypeRef IntTy, unsigned long long N, LLVMBool SignExtend); /** * Obtain a constant value for an integer of arbitrary precision. * * @see llvm::ConstantInt::get() */ LLVMValueRef LLVMConstIntOfArbitraryPrecision(LLVMTypeRef IntTy, unsigned NumWords, const uint64_t Words[]); /** * Obtain a constant value for an integer parsed from a string. * * A similar API, LLVMConstIntOfStringAndSize is also available. If the * string's length is available, it is preferred to call that function * instead. * * @see llvm::ConstantInt::get() */ LLVMValueRef LLVMConstIntOfString(LLVMTypeRef IntTy, const char *Text, uint8_t Radix); /** * Obtain a constant value for an integer parsed from a string with * specified length. * * @see llvm::ConstantInt::get() */ LLVMValueRef LLVMConstIntOfStringAndSize(LLVMTypeRef IntTy, const char *Text, unsigned SLen, uint8_t Radix); /** * Obtain a constant value referring to a double floating point value. */ LLVMValueRef LLVMConstReal(LLVMTypeRef RealTy, double N); /** * Obtain a constant for a floating point value parsed from a string. * * A similar API, LLVMConstRealOfStringAndSize is also available. It * should be used if the input string's length is known. */ LLVMValueRef LLVMConstRealOfString(LLVMTypeRef RealTy, const char *Text); /** * Obtain a constant for a floating point value parsed from a string. */ LLVMValueRef LLVMConstRealOfStringAndSize(LLVMTypeRef RealTy, const char *Text, unsigned SLen); /** * Obtain the zero extended value for an integer constant value. * * @see llvm::ConstantInt::getZExtValue() */ unsigned long long LLVMConstIntGetZExtValue(LLVMValueRef ConstantVal); /** * Obtain the sign extended value for an integer constant value. * * @see llvm::ConstantInt::getSExtValue() */ long long LLVMConstIntGetSExtValue(LLVMValueRef ConstantVal); /** * Obtain the double value for a floating point constant value. * losesInfo indicates if some precision was lost in the conversion. * * @see llvm::ConstantFP::getDoubleValue */ double LLVMConstRealGetDouble(LLVMValueRef ConstantVal, LLVMBool *losesInfo); /** * @} */ /** * @defgroup LLVMCCoreValueConstantComposite Composite Constants * * Functions in this group operate on composite constants. * * @{ */ /** * Create a ConstantDataSequential and initialize it with a string.
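 *
 * For illustration, a minimal sketch (assumes an existing context C):
 *
 * @code
 *   // Two characters of content; DontNullTerminate == 0 appends a NUL.
 *   LLVMValueRef Str = LLVMConstStringInContext(C, "hi", 2, 0);
 * @endcode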
* * @see llvm::ConstantDataArray::getString() */ LLVMValueRef LLVMConstStringInContext(LLVMContextRef C, const char *Str, unsigned Length, LLVMBool DontNullTerminate); /** * Create a ConstantDataSequential with string content in the global context. * * This is the same as LLVMConstStringInContext except it operates on the * global context. * * @see LLVMConstStringInContext() * @see llvm::ConstantDataArray::getString() */ LLVMValueRef LLVMConstString(const char *Str, unsigned Length, LLVMBool DontNullTerminate); /** * Returns true if the specified constant is an array of i8. * * @see ConstantDataSequential::getAsString() */ LLVMBool LLVMIsConstantString(LLVMValueRef c); /** * Get the given constant data sequential as a string. * * @see ConstantDataSequential::getAsString() */ const char *LLVMGetAsString(LLVMValueRef c, size_t *Length); /** * Create an anonymous ConstantStruct with the specified values. * * @see llvm::ConstantStruct::getAnon() */ LLVMValueRef LLVMConstStructInContext(LLVMContextRef C, LLVMValueRef *ConstantVals, unsigned Count, LLVMBool Packed); /** * Create a ConstantStruct in the global Context. * * This is the same as LLVMConstStructInContext except it operates on the * global Context. * * @see LLVMConstStructInContext() */ LLVMValueRef LLVMConstStruct(LLVMValueRef *ConstantVals, unsigned Count, LLVMBool Packed); /** * Create a ConstantArray from values. * * @see llvm::ConstantArray::get() */ LLVMValueRef LLVMConstArray(LLVMTypeRef ElementTy, LLVMValueRef *ConstantVals, unsigned Length); /** * Create a non-anonymous ConstantStruct from values. * * @see llvm::ConstantStruct::get() */ LLVMValueRef LLVMConstNamedStruct(LLVMTypeRef StructTy, LLVMValueRef *ConstantVals, unsigned Count); /** * Get an element at specified index as a constant. * * @see ConstantDataSequential::getElementAsConstant() */ LLVMValueRef LLVMGetElementAsConstant(LLVMValueRef C, unsigned idx); /** * Create a ConstantVector from values. * * @see llvm::ConstantVector::get() */ LLVMValueRef LLVMConstVector(LLVMValueRef *ScalarConstantVals, unsigned Size); /** * @} */ /** * @defgroup LLVMCCoreValueConstantExpressions Constant Expressions * * Functions in this group correspond to APIs on llvm::ConstantExpr. * * @see llvm::ConstantExpr. 
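 *
 * For illustration, a minimal sketch of folding 2 + 3 at the constant
 * level using the global context:
 *
 * @code
 *   LLVMValueRef Two = LLVMConstInt(LLVMInt32Type(), 2, 0);
 *   LLVMValueRef Three = LLVMConstInt(LLVMInt32Type(), 3, 0);
 *   LLVMValueRef Five = LLVMConstAdd(Two, Three); // i32 5
 * @endcode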
* * @{ */ LLVMOpcode LLVMGetConstOpcode(LLVMValueRef ConstantVal); LLVMValueRef LLVMAlignOf(LLVMTypeRef Ty); LLVMValueRef LLVMSizeOf(LLVMTypeRef Ty); LLVMValueRef LLVMConstNeg(LLVMValueRef ConstantVal); LLVMValueRef LLVMConstNSWNeg(LLVMValueRef ConstantVal); LLVMValueRef LLVMConstNUWNeg(LLVMValueRef ConstantVal); LLVMValueRef LLVMConstFNeg(LLVMValueRef ConstantVal); LLVMValueRef LLVMConstNot(LLVMValueRef ConstantVal); LLVMValueRef LLVMConstAdd(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstNSWAdd(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstNUWAdd(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstFAdd(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstSub(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstNSWSub(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstNUWSub(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstFSub(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstMul(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstNSWMul(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstNUWMul(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstFMul(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstUDiv(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstExactUDiv(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstSDiv(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstExactSDiv(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstFDiv(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstURem(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstSRem(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstFRem(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstAnd(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstOr(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstXor(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstICmp(LLVMIntPredicate Predicate, LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstFCmp(LLVMRealPredicate Predicate, LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstShl(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstLShr(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstAShr(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant); LLVMValueRef LLVMConstGEP(LLVMValueRef ConstantVal, LLVMValueRef *ConstantIndices, unsigned NumIndices); LLVMValueRef LLVMConstGEP2(LLVMTypeRef Ty, LLVMValueRef ConstantVal, LLVMValueRef *ConstantIndices, unsigned NumIndices); LLVMValueRef LLVMConstInBoundsGEP(LLVMValueRef ConstantVal, LLVMValueRef *ConstantIndices, unsigned NumIndices); LLVMValueRef LLVMConstInBoundsGEP2(LLVMTypeRef Ty, LLVMValueRef ConstantVal, LLVMValueRef *ConstantIndices, unsigned NumIndices); LLVMValueRef LLVMConstTrunc(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstSExt(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstZExt(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstFPTrunc(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstFPExt(LLVMValueRef 
ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstUIToFP(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstSIToFP(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstFPToUI(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstFPToSI(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstPtrToInt(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstIntToPtr(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstBitCast(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstAddrSpaceCast(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstZExtOrBitCast(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstSExtOrBitCast(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstTruncOrBitCast(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstPointerCast(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstIntCast(LLVMValueRef ConstantVal, LLVMTypeRef ToType, LLVMBool isSigned); LLVMValueRef LLVMConstFPCast(LLVMValueRef ConstantVal, LLVMTypeRef ToType); LLVMValueRef LLVMConstSelect(LLVMValueRef ConstantCondition, LLVMValueRef ConstantIfTrue, LLVMValueRef ConstantIfFalse); LLVMValueRef LLVMConstExtractElement(LLVMValueRef VectorConstant, LLVMValueRef IndexConstant); LLVMValueRef LLVMConstInsertElement(LLVMValueRef VectorConstant, LLVMValueRef ElementValueConstant, LLVMValueRef IndexConstant); LLVMValueRef LLVMConstShuffleVector(LLVMValueRef VectorAConstant, LLVMValueRef VectorBConstant, LLVMValueRef MaskConstant); LLVMValueRef LLVMConstExtractValue(LLVMValueRef AggConstant, unsigned *IdxList, unsigned NumIdx); LLVMValueRef LLVMConstInsertValue(LLVMValueRef AggConstant, LLVMValueRef ElementValueConstant, unsigned *IdxList, unsigned NumIdx); LLVMValueRef LLVMBlockAddress(LLVMValueRef F, LLVMBasicBlockRef BB); /** Deprecated: Use LLVMGetInlineAsm instead. */ LLVMValueRef LLVMConstInlineAsm(LLVMTypeRef Ty, const char *AsmString, const char *Constraints, LLVMBool HasSideEffects, LLVMBool IsAlignStack); /** * @} */ /** * @defgroup LLVMCCoreValueConstantGlobals Global Values * * This group contains functions that operate on global values. Functions in * this group relate to functions in the llvm::GlobalValue class tree. * * @see llvm::GlobalValue * * @{ */ LLVMModuleRef LLVMGetGlobalParent(LLVMValueRef Global); LLVMBool LLVMIsDeclaration(LLVMValueRef Global); LLVMLinkage LLVMGetLinkage(LLVMValueRef Global); void LLVMSetLinkage(LLVMValueRef Global, LLVMLinkage Linkage); const char *LLVMGetSection(LLVMValueRef Global); void LLVMSetSection(LLVMValueRef Global, const char *Section); LLVMVisibility LLVMGetVisibility(LLVMValueRef Global); void LLVMSetVisibility(LLVMValueRef Global, LLVMVisibility Viz); LLVMDLLStorageClass LLVMGetDLLStorageClass(LLVMValueRef Global); void LLVMSetDLLStorageClass(LLVMValueRef Global, LLVMDLLStorageClass Class); LLVMUnnamedAddr LLVMGetUnnamedAddress(LLVMValueRef Global); void LLVMSetUnnamedAddress(LLVMValueRef Global, LLVMUnnamedAddr UnnamedAddr); /** * Returns the "value type" of a global value. This differs from the formal * type of a global value which is always a pointer type. * * @see llvm::GlobalValue::getValueType() */ LLVMTypeRef LLVMGlobalGetValueType(LLVMValueRef Global); /** Deprecated: Use LLVMGetUnnamedAddress instead. */ LLVMBool LLVMHasUnnamedAddr(LLVMValueRef Global); /** Deprecated: Use LLVMSetUnnamedAddress instead. 
*/ void LLVMSetUnnamedAddr(LLVMValueRef Global, LLVMBool HasUnnamedAddr); /** * @defgroup LLVMCCoreValueWithAlignment Values with alignment * * Functions in this group only apply to values with alignment, i.e. * global variables, load and store instructions. */ /** * Obtain the preferred alignment of the value. * @see llvm::AllocaInst::getAlignment() * @see llvm::LoadInst::getAlignment() * @see llvm::StoreInst::getAlignment() * @see llvm::GlobalValue::getAlignment() */ unsigned LLVMGetAlignment(LLVMValueRef V); /** * Set the preferred alignment of the value. * @see llvm::AllocaInst::setAlignment() * @see llvm::LoadInst::setAlignment() * @see llvm::StoreInst::setAlignment() * @see llvm::GlobalValue::setAlignment() */ void LLVMSetAlignment(LLVMValueRef V, unsigned Bytes); /** * Sets a metadata attachment, erasing the existing metadata attachment if * it already exists for the given kind. * * @see llvm::GlobalObject::setMetadata() */ void LLVMGlobalSetMetadata(LLVMValueRef Global, unsigned Kind, LLVMMetadataRef MD); /** * Erases a metadata attachment of the given kind if it exists. * * @see llvm::GlobalObject::eraseMetadata() */ void LLVMGlobalEraseMetadata(LLVMValueRef Global, unsigned Kind); /** * Removes all metadata attachments from this value. * * @see llvm::GlobalObject::clearMetadata() */ void LLVMGlobalClearMetadata(LLVMValueRef Global); /** * Retrieves an array of metadata entries representing the metadata attached to * this value. The caller is responsible for freeing this array by calling * \c LLVMDisposeValueMetadataEntries. * * @see llvm::GlobalObject::getAllMetadata() */ LLVMValueMetadataEntry *LLVMGlobalCopyAllMetadata(LLVMValueRef Value, size_t *NumEntries); /** * Destroys value metadata entries. */ void LLVMDisposeValueMetadataEntries(LLVMValueMetadataEntry *Entries); /** * Returns the kind of a value metadata entry at a specific index. */ unsigned LLVMValueMetadataEntriesGetKind(LLVMValueMetadataEntry *Entries, unsigned Index); /** * Returns the underlying metadata node of a value metadata entry at a * specific index. */ LLVMMetadataRef LLVMValueMetadataEntriesGetMetadata(LLVMValueMetadataEntry *Entries, unsigned Index); /** * @} */ /** * @defgroup LLVMCoreValueConstantGlobalVariable Global Variables * * This group contains functions that operate on global variable values. 
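 *
 * For illustration, a minimal sketch (assumes a module M created in the
 * global context):
 *
 * @code
 *   LLVMValueRef G = LLVMAddGlobal(M, LLVMInt32Type(), "answer");
 *   LLVMSetInitializer(G, LLVMConstInt(LLVMInt32Type(), 42, 0));
 *   LLVMSetGlobalConstant(G, 1); // mark the global as constant
 * @endcode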
* * @see llvm::GlobalVariable * * @{ */ LLVMValueRef LLVMAddGlobal(LLVMModuleRef M, LLVMTypeRef Ty, const char *Name); LLVMValueRef LLVMAddGlobalInAddressSpace(LLVMModuleRef M, LLVMTypeRef Ty, const char *Name, unsigned AddressSpace); LLVMValueRef LLVMGetNamedGlobal(LLVMModuleRef M, const char *Name); LLVMValueRef LLVMGetFirstGlobal(LLVMModuleRef M); LLVMValueRef LLVMGetLastGlobal(LLVMModuleRef M); LLVMValueRef LLVMGetNextGlobal(LLVMValueRef GlobalVar); LLVMValueRef LLVMGetPreviousGlobal(LLVMValueRef GlobalVar); void LLVMDeleteGlobal(LLVMValueRef GlobalVar); LLVMValueRef LLVMGetInitializer(LLVMValueRef GlobalVar); void LLVMSetInitializer(LLVMValueRef GlobalVar, LLVMValueRef ConstantVal); LLVMBool LLVMIsThreadLocal(LLVMValueRef GlobalVar); void LLVMSetThreadLocal(LLVMValueRef GlobalVar, LLVMBool IsThreadLocal); LLVMBool LLVMIsGlobalConstant(LLVMValueRef GlobalVar); void LLVMSetGlobalConstant(LLVMValueRef GlobalVar, LLVMBool IsConstant); LLVMThreadLocalMode LLVMGetThreadLocalMode(LLVMValueRef GlobalVar); void LLVMSetThreadLocalMode(LLVMValueRef GlobalVar, LLVMThreadLocalMode Mode); LLVMBool LLVMIsExternallyInitialized(LLVMValueRef GlobalVar); void LLVMSetExternallyInitialized(LLVMValueRef GlobalVar, LLVMBool IsExtInit); /** * @} */ /** * @defgroup LLVMCoreValueConstantGlobalAlias Global Aliases * * This group contains functions that operate on global alias values. * * @see llvm::GlobalAlias * * @{ */ LLVMValueRef LLVMAddAlias(LLVMModuleRef M, LLVMTypeRef Ty, LLVMValueRef Aliasee, const char *Name); /** * Obtain a GlobalAlias value from a Module by its name. * * The returned value corresponds to a llvm::GlobalAlias value. * * @see llvm::Module::getNamedAlias() */ LLVMValueRef LLVMGetNamedGlobalAlias(LLVMModuleRef M, const char *Name, size_t NameLen); /** * Obtain an iterator to the first GlobalAlias in a Module. * * @see llvm::Module::alias_begin() */ LLVMValueRef LLVMGetFirstGlobalAlias(LLVMModuleRef M); /** * Obtain an iterator to the last GlobalAlias in a Module. * * @see llvm::Module::alias_end() */ LLVMValueRef LLVMGetLastGlobalAlias(LLVMModuleRef M); /** * Advance a GlobalAlias iterator to the next GlobalAlias. * * Returns NULL if the iterator was already at the end and there are no more * global aliases. */ LLVMValueRef LLVMGetNextGlobalAlias(LLVMValueRef GA); /** * Decrement a GlobalAlias iterator to the previous GlobalAlias. * * Returns NULL if the iterator was already at the beginning and there are * no previous global aliases. */ LLVMValueRef LLVMGetPreviousGlobalAlias(LLVMValueRef GA); /** * Retrieve the target value of an alias. */ LLVMValueRef LLVMAliasGetAliasee(LLVMValueRef Alias); /** * Set the target value of an alias. */ void LLVMAliasSetAliasee(LLVMValueRef Alias, LLVMValueRef Aliasee); /** * @} */ /** * @defgroup LLVMCCoreValueFunction Function values * * Functions in this group operate on LLVMValueRef instances that * correspond to llvm::Function instances. * * @see llvm::Function * * @{ */ /** * Remove a function from its containing module and delete it. * * @see llvm::Function::eraseFromParent() */ void LLVMDeleteFunction(LLVMValueRef Fn); /** * Check whether the given function has a personality function. * * @see llvm::Function::hasPersonalityFn() */ LLVMBool LLVMHasPersonalityFn(LLVMValueRef Fn); /** * Obtain the personality function attached to the function. * * @see llvm::Function::getPersonalityFn() */ LLVMValueRef LLVMGetPersonalityFn(LLVMValueRef Fn); /** * Set the personality function attached to the function.
* * @see llvm::Function::setPersonalityFn() */ void LLVMSetPersonalityFn(LLVMValueRef Fn, LLVMValueRef PersonalityFn); /** * Obtain the intrinsic ID number which matches the given function name. * * @see llvm::Function::lookupIntrinsicID() */ unsigned LLVMLookupIntrinsicID(const char *Name, size_t NameLen); /** * Obtain the ID number from a function instance. * * @see llvm::Function::getIntrinsicID() */ unsigned LLVMGetIntrinsicID(LLVMValueRef Fn); /** * Create or insert the declaration of an intrinsic. For overloaded intrinsics, * parameter types must be provided to uniquely identify an overload. * * @see llvm::Intrinsic::getDeclaration() */ LLVMValueRef LLVMGetIntrinsicDeclaration(LLVMModuleRef Mod, unsigned ID, LLVMTypeRef *ParamTypes, size_t ParamCount); /** * Retrieves the type of an intrinsic. For overloaded intrinsics, parameter * types must be provided to uniquely identify an overload. * * @see llvm::Intrinsic::getType() */ LLVMTypeRef LLVMIntrinsicGetType(LLVMContextRef Ctx, unsigned ID, LLVMTypeRef *ParamTypes, size_t ParamCount); /** * Retrieves the name of an intrinsic. * * @see llvm::Intrinsic::getName() */ const char *LLVMIntrinsicGetName(unsigned ID, size_t *NameLength); /** * Copies the name of an overloaded intrinsic identified by a given list of * parameter types. * * Unlike LLVMIntrinsicGetName, the caller is responsible for freeing the * returned string. * * @see llvm::Intrinsic::getName() */ const char *LLVMIntrinsicCopyOverloadedName(unsigned ID, LLVMTypeRef *ParamTypes, size_t ParamCount, size_t *NameLength); /** * Determine if the intrinsic identified by the given ID is overloaded. * * @see llvm::Intrinsic::isOverloaded() */ LLVMBool LLVMIntrinsicIsOverloaded(unsigned ID); /** * Obtain the calling convention of a function. * * The returned value corresponds to the LLVMCallConv enumeration. * * @see llvm::Function::getCallingConv() */ unsigned LLVMGetFunctionCallConv(LLVMValueRef Fn); /** * Set the calling convention of a function. * * @see llvm::Function::setCallingConv() * * @param Fn Function to operate on * @param CC LLVMCallConv to set calling convention to */ void LLVMSetFunctionCallConv(LLVMValueRef Fn, unsigned CC); /** * Obtain the name of the garbage collector to use during code * generation. * * @see llvm::Function::getGC() */ const char *LLVMGetGC(LLVMValueRef Fn); /** * Define the garbage collector to use during code generation. * * @see llvm::Function::setGC() */ void LLVMSetGC(LLVMValueRef Fn, const char *Name); /** * Add an attribute to a function.
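 *
 * For illustration, a minimal sketch of attaching the nounwind attribute
 * to a whole function (assumes an existing context C and function F):
 *
 * @code
 *   unsigned Kind = LLVMGetEnumAttributeKindForName("nounwind", 8);
 *   LLVMAttributeRef A = LLVMCreateEnumAttribute(C, Kind, 0);
 *   LLVMAddAttributeAtIndex(F, LLVMAttributeFunctionIndex, A);
 * @endcode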
* * @see llvm::Function::addAttribute() */ void LLVMAddAttributeAtIndex(LLVMValueRef F, LLVMAttributeIndex Idx, LLVMAttributeRef A); unsigned LLVMGetAttributeCountAtIndex(LLVMValueRef F, LLVMAttributeIndex Idx); void LLVMGetAttributesAtIndex(LLVMValueRef F, LLVMAttributeIndex Idx, LLVMAttributeRef *Attrs); LLVMAttributeRef LLVMGetEnumAttributeAtIndex(LLVMValueRef F, LLVMAttributeIndex Idx, unsigned KindID); LLVMAttributeRef LLVMGetStringAttributeAtIndex(LLVMValueRef F, LLVMAttributeIndex Idx, const char *K, unsigned KLen); void LLVMRemoveEnumAttributeAtIndex(LLVMValueRef F, LLVMAttributeIndex Idx, unsigned KindID); void LLVMRemoveStringAttributeAtIndex(LLVMValueRef F, LLVMAttributeIndex Idx, const char *K, unsigned KLen); /** * Add a target-dependent attribute to a function. * @see llvm::AttrBuilder::addAttribute() */ void LLVMAddTargetDependentFunctionAttr(LLVMValueRef Fn, const char *A, const char *V); /** * @defgroup LLVMCCoreValueFunctionParameters Function Parameters * * Functions in this group relate to arguments/parameters on functions. * * Functions in this group expect LLVMValueRef instances that correspond * to llvm::Function instances. * * @{ */ /** * Obtain the number of parameters in a function. * * @see llvm::Function::arg_size() */ unsigned LLVMCountParams(LLVMValueRef Fn); /** * Obtain the parameters in a function. * * This takes a pointer to a pre-allocated array of LLVMValueRef that is * at least LLVMCountParams() long. This array will be filled with * LLVMValueRef instances which correspond to the parameters the * function receives. Each LLVMValueRef corresponds to a llvm::Argument * instance. * * @see llvm::Function::arg_begin() */ void LLVMGetParams(LLVMValueRef Fn, LLVMValueRef *Params); /** * Obtain the parameter at the specified index. * * Parameters are indexed from 0. * * @see llvm::Function::arg_begin() */ LLVMValueRef LLVMGetParam(LLVMValueRef Fn, unsigned Index); /** * Obtain the function to which this argument belongs. * * Unlike other functions in this group, this one takes an LLVMValueRef * that corresponds to a llvm::Argument. * * The returned LLVMValueRef is the llvm::Function to which this * argument belongs. */ LLVMValueRef LLVMGetParamParent(LLVMValueRef Inst); /** * Obtain the first parameter to a function. * * @see llvm::Function::arg_begin() */ LLVMValueRef LLVMGetFirstParam(LLVMValueRef Fn); /** * Obtain the last parameter to a function. * * @see llvm::Function::arg_end() */ LLVMValueRef LLVMGetLastParam(LLVMValueRef Fn); /** * Obtain the next parameter to a function. * * This takes an LLVMValueRef obtained from LLVMGetFirstParam() (which is * actually a wrapped iterator) and obtains the next parameter from the * underlying iterator. */ LLVMValueRef LLVMGetNextParam(LLVMValueRef Arg); /** * Obtain the previous parameter to a function. * * This is the opposite of LLVMGetNextParam(). */ LLVMValueRef LLVMGetPreviousParam(LLVMValueRef Arg); /** * Set the alignment for a function parameter. * * @see llvm::Argument::addAttr() * @see llvm::AttrBuilder::addAlignmentAttr() */ void LLVMSetParamAlignment(LLVMValueRef Arg, unsigned Align); /** * @} */ /** * @defgroup LLVMCCoreValueGlobalIFunc IFuncs * * Functions in this group relate to indirect functions. * * Functions in this group expect LLVMValueRef instances that correspond * to llvm::GlobalIFunc instances. * * @{ */ /** * Add a global indirect function to a module under a specified name.
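 *
 * For illustration, a minimal sketch (assumes a module M, the function
 * type FnTy the ifunc stands for, and a resolver function Resolver that
 * returns a pointer to the chosen implementation):
 *
 * @code
 *   LLVMValueRef IFunc =
 *       LLVMAddGlobalIFunc(M, "my_ifunc", 8, FnTy, 0, Resolver);
 * @endcode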
* * @see llvm::GlobalIFunc::create() */ LLVMValueRef LLVMAddGlobalIFunc(LLVMModuleRef M, const char *Name, size_t NameLen, LLVMTypeRef Ty, unsigned AddrSpace, LLVMValueRef Resolver); /** * Obtain a GlobalIFunc value from a Module by its name. * * The returned value corresponds to a llvm::GlobalIFunc value. * * @see llvm::Module::getNamedIFunc() */ LLVMValueRef LLVMGetNamedGlobalIFunc(LLVMModuleRef M, const char *Name, size_t NameLen); /** * Obtain an iterator to the first GlobalIFunc in a Module. * * @see llvm::Module::ifunc_begin() */ LLVMValueRef LLVMGetFirstGlobalIFunc(LLVMModuleRef M); /** * Obtain an iterator to the last GlobalIFunc in a Module. * * @see llvm::Module::ifunc_end() */ LLVMValueRef LLVMGetLastGlobalIFunc(LLVMModuleRef M); /** * Advance a GlobalIFunc iterator to the next GlobalIFunc. * * Returns NULL if the iterator was already at the end and there are no more * global ifuncs. */ LLVMValueRef LLVMGetNextGlobalIFunc(LLVMValueRef IFunc); /** * Decrement a GlobalIFunc iterator to the previous GlobalIFunc. * * Returns NULL if the iterator was already at the beginning and there are * no previous global ifuncs. */ LLVMValueRef LLVMGetPreviousGlobalIFunc(LLVMValueRef IFunc); /** * Retrieves the resolver function associated with this indirect function, or * NULL if it doesn't exist. * * @see llvm::GlobalIFunc::getResolver() */ LLVMValueRef LLVMGetGlobalIFuncResolver(LLVMValueRef IFunc); /** * Sets the resolver function associated with this indirect function. * * @see llvm::GlobalIFunc::setResolver() */ void LLVMSetGlobalIFuncResolver(LLVMValueRef IFunc, LLVMValueRef Resolver); /** * Remove a global indirect function from its parent module and delete it. * * @see llvm::GlobalIFunc::eraseFromParent() */ void LLVMEraseGlobalIFunc(LLVMValueRef IFunc); /** * Remove a global indirect function from its parent module. * * This unlinks the global indirect function from its containing module but * keeps it alive. * * @see llvm::GlobalIFunc::removeFromParent() */ void LLVMRemoveGlobalIFunc(LLVMValueRef IFunc); /** * @} */ /** * @} */ /** * @} */ /** * @} */ /** * @defgroup LLVMCCoreValueMetadata Metadata * * @{ */ /** * Create an MDString value from a given string value. * * The MDString value does not take ownership of the given string; it remains * the responsibility of the caller to free it. * * @see llvm::MDString::get() */ LLVMMetadataRef LLVMMDStringInContext2(LLVMContextRef C, const char *Str, size_t SLen); /** * Create an MDNode value with the given array of operands. * * @see llvm::MDNode::get() */ LLVMMetadataRef LLVMMDNodeInContext2(LLVMContextRef C, LLVMMetadataRef *MDs, size_t Count); /** * Obtain a Metadata as a Value. */ LLVMValueRef LLVMMetadataAsValue(LLVMContextRef C, LLVMMetadataRef MD); /** * Obtain a Value as a Metadata. */ LLVMMetadataRef LLVMValueAsMetadata(LLVMValueRef Val); /** * Obtain the underlying string from a MDString value. * * @param V Instance to obtain string from. * @param Length Memory address which will hold length of returned string. * @return String data in MDString. */ const char *LLVMGetMDString(LLVMValueRef V, unsigned *Length); /** * Obtain the number of operands from an MDNode value. * * @param V MDNode to get number of operands from. * @return Number of operands of the MDNode. */ unsigned LLVMGetMDNodeNumOperands(LLVMValueRef V); /** * Obtain the given MDNode's operands. * * The passed LLVMValueRef pointer should point to enough memory to hold all of * the operands of the given MDNode (see LLVMGetMDNodeNumOperands) as * LLVMValueRefs.
This memory will be populated with the LLVMValueRefs of the * MDNode's operands. * * @param V MDNode to get the operands from. * @param Dest Destination array for operands. */ void LLVMGetMDNodeOperands(LLVMValueRef V, LLVMValueRef *Dest); /** Deprecated: Use LLVMMDStringInContext2 instead. */ LLVMValueRef LLVMMDStringInContext(LLVMContextRef C, const char *Str, unsigned SLen); /** Deprecated: Use LLVMMDStringInContext2 instead. */ LLVMValueRef LLVMMDString(const char *Str, unsigned SLen); /** Deprecated: Use LLVMMDNodeInContext2 instead. */ LLVMValueRef LLVMMDNodeInContext(LLVMContextRef C, LLVMValueRef *Vals, unsigned Count); /** Deprecated: Use LLVMMDNodeInContext2 instead. */ LLVMValueRef LLVMMDNode(LLVMValueRef *Vals, unsigned Count); /** * @} */ /** * @defgroup LLVMCCoreValueBasicBlock Basic Block * * A basic block represents a single entry single exit section of code. * Basic blocks contain a list of instructions which form the body of * the block. * * Basic blocks belong to functions. They have the type of label. * * Basic blocks are themselves values. However, the C API models them as * LLVMBasicBlockRef. * * @see llvm::BasicBlock * * @{ */ /** * Convert a basic block instance to a value type. */ LLVMValueRef LLVMBasicBlockAsValue(LLVMBasicBlockRef BB); /** * Determine whether an LLVMValueRef is itself a basic block. */ LLVMBool LLVMValueIsBasicBlock(LLVMValueRef Val); /** * Convert an LLVMValueRef to an LLVMBasicBlockRef instance. */ LLVMBasicBlockRef LLVMValueAsBasicBlock(LLVMValueRef Val); /** * Obtain the string name of a basic block. */ const char *LLVMGetBasicBlockName(LLVMBasicBlockRef BB); /** * Obtain the function to which a basic block belongs. * * @see llvm::BasicBlock::getParent() */ LLVMValueRef LLVMGetBasicBlockParent(LLVMBasicBlockRef BB); /** * Obtain the terminator instruction for a basic block. * * If the basic block does not have a terminator (it is not well-formed * if it doesn't), then NULL is returned. * * The returned LLVMValueRef corresponds to an llvm::Instruction. * * @see llvm::BasicBlock::getTerminator() */ LLVMValueRef LLVMGetBasicBlockTerminator(LLVMBasicBlockRef BB); /** * Obtain the number of basic blocks in a function. * * @param Fn Function value to operate on. */ unsigned LLVMCountBasicBlocks(LLVMValueRef Fn); /** * Obtain all of the basic blocks in a function. * * This operates on a function value. The BasicBlocks parameter is a * pointer to a pre-allocated array of LLVMBasicBlockRef of at least * LLVMCountBasicBlocks() in length. This array is populated with * LLVMBasicBlockRef instances. */ void LLVMGetBasicBlocks(LLVMValueRef Fn, LLVMBasicBlockRef *BasicBlocks); /** * Obtain the first basic block in a function. * * The returned basic block can be used as an iterator. You will likely * eventually call into LLVMGetNextBasicBlock() with it. * * @see llvm::Function::begin() */ LLVMBasicBlockRef LLVMGetFirstBasicBlock(LLVMValueRef Fn); /** * Obtain the last basic block in a function. * * @see llvm::Function::end() */ LLVMBasicBlockRef LLVMGetLastBasicBlock(LLVMValueRef Fn); /** * Advance a basic block iterator. */ LLVMBasicBlockRef LLVMGetNextBasicBlock(LLVMBasicBlockRef BB); /** * Go backwards in a basic block iterator. */ LLVMBasicBlockRef LLVMGetPreviousBasicBlock(LLVMBasicBlockRef BB); /** * Obtain the basic block that corresponds to the entry point of a * function. 
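 *
 * For illustration, a minimal sketch of walking every block starting at
 * the entry block (assumes a function Fn with at least one basic block):
 *
 * @code
 *   LLVMBasicBlockRef BB = LLVMGetEntryBasicBlock(Fn);
 *   for (; BB != NULL; BB = LLVMGetNextBasicBlock(BB)) {
 *     // Inspect or transform BB here.
 *   }
 * @endcode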
* * @see llvm::Function::getEntryBlock() */ LLVMBasicBlockRef LLVMGetEntryBasicBlock(LLVMValueRef Fn); /** * Insert the given basic block after the insertion point of the given builder. * * The insertion point must be valid. * * @see llvm::Function::BasicBlockListType::insertAfter() */ void LLVMInsertExistingBasicBlockAfterInsertBlock(LLVMBuilderRef Builder, LLVMBasicBlockRef BB); /** * Append the given basic block to the basic block list of the given function. * * @see llvm::Function::BasicBlockListType::push_back() */ void LLVMAppendExistingBasicBlock(LLVMValueRef Fn, LLVMBasicBlockRef BB); /** * Create a new basic block without inserting it into a function. * * @see llvm::BasicBlock::Create() */ LLVMBasicBlockRef LLVMCreateBasicBlockInContext(LLVMContextRef C, const char *Name); /** * Append a basic block to the end of a function. * * @see llvm::BasicBlock::Create() */ LLVMBasicBlockRef LLVMAppendBasicBlockInContext(LLVMContextRef C, LLVMValueRef Fn, const char *Name); /** * Append a basic block to the end of a function using the global * context. * * @see llvm::BasicBlock::Create() */ LLVMBasicBlockRef LLVMAppendBasicBlock(LLVMValueRef Fn, const char *Name); /** * Insert a basic block in a function before another basic block. * * The function to add to is determined by the function of the * passed basic block. * * @see llvm::BasicBlock::Create() */ LLVMBasicBlockRef LLVMInsertBasicBlockInContext(LLVMContextRef C, LLVMBasicBlockRef BB, const char *Name); /** * Insert a basic block in a function using the global context. * * @see llvm::BasicBlock::Create() */ LLVMBasicBlockRef LLVMInsertBasicBlock(LLVMBasicBlockRef InsertBeforeBB, const char *Name); /** * Remove a basic block from a function and delete it. * * This deletes the basic block from its containing function and deletes * the basic block itself. * * @see llvm::BasicBlock::eraseFromParent() */ void LLVMDeleteBasicBlock(LLVMBasicBlockRef BB); /** * Remove a basic block from a function. * * This removes the basic block from its containing function but keeps * the basic block alive. * * @see llvm::BasicBlock::removeFromParent() */ void LLVMRemoveBasicBlockFromParent(LLVMBasicBlockRef BB); /** * Move a basic block to before another one. * * @see llvm::BasicBlock::moveBefore() */ void LLVMMoveBasicBlockBefore(LLVMBasicBlockRef BB, LLVMBasicBlockRef MovePos); /** * Move a basic block to after another one. * * @see llvm::BasicBlock::moveAfter() */ void LLVMMoveBasicBlockAfter(LLVMBasicBlockRef BB, LLVMBasicBlockRef MovePos); /** * Obtain the first instruction in a basic block. * * The returned LLVMValueRef corresponds to a llvm::Instruction * instance. */ LLVMValueRef LLVMGetFirstInstruction(LLVMBasicBlockRef BB); /** * Obtain the last instruction in a basic block. * * The returned LLVMValueRef corresponds to an llvm::Instruction. */ LLVMValueRef LLVMGetLastInstruction(LLVMBasicBlockRef BB); /** * @} */ /** * @defgroup LLVMCCoreValueInstruction Instructions * * Functions in this group relate to the inspection and manipulation of * individual instructions. * * In the C++ API, an instruction is modeled by llvm::Instruction. This * class has a large number of descendants. llvm::Instruction is a * llvm::Value and in the C API, instructions are modeled by * LLVMValueRef. * * This group also contains sub-groups which operate on specific * llvm::Instruction types, e.g. llvm::CallInst. * * @{ */ /** * Determine whether an instruction has any metadata attached.
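 *
 * For illustration, a minimal sketch (assumes a basic block BB):
 *
 * @code
 *   LLVMValueRef I = LLVMGetFirstInstruction(BB);
 *   for (; I != NULL; I = LLVMGetNextInstruction(I)) {
 *     if (LLVMHasMetadata(I)) {
 *       // This instruction carries at least one metadata attachment.
 *     }
 *   }
 * @endcode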
*/ int LLVMHasMetadata(LLVMValueRef Val); /** * Return metadata associated with an instruction value. */ LLVMValueRef LLVMGetMetadata(LLVMValueRef Val, unsigned KindID); /** * Set metadata associated with an instruction value. */ void LLVMSetMetadata(LLVMValueRef Val, unsigned KindID, LLVMValueRef Node); /** * Returns the metadata associated with an instruction value, but filters out * all the debug locations. * * @see llvm::Instruction::getAllMetadataOtherThanDebugLoc() */ LLVMValueMetadataEntry * LLVMInstructionGetAllMetadataOtherThanDebugLoc(LLVMValueRef Instr, size_t *NumEntries); /** * Obtain the basic block to which an instruction belongs. * * @see llvm::Instruction::getParent() */ LLVMBasicBlockRef LLVMGetInstructionParent(LLVMValueRef Inst); /** * Obtain the instruction that occurs after the one specified. * * The next instruction will be from the same basic block. * * If this is the last instruction in a basic block, NULL will be * returned. */ LLVMValueRef LLVMGetNextInstruction(LLVMValueRef Inst); /** * Obtain the instruction that occurred before this one. * * If the instruction is the first instruction in a basic block, NULL * will be returned. */ LLVMValueRef LLVMGetPreviousInstruction(LLVMValueRef Inst); /** * Remove an instruction. * * The instruction specified is removed from its containing basic * block but is kept alive. * * @see llvm::Instruction::removeFromParent() */ void LLVMInstructionRemoveFromParent(LLVMValueRef Inst); /** * Remove and delete an instruction. * * The instruction specified is removed from its containing basic * block and then deleted. * * @see llvm::Instruction::eraseFromParent() */ void LLVMInstructionEraseFromParent(LLVMValueRef Inst); /** * Obtain the opcode for an individual instruction. * * @see llvm::Instruction::getOpCode() */ LLVMOpcode LLVMGetInstructionOpcode(LLVMValueRef Inst); /** * Obtain the predicate of an instruction. * * This is only valid for instructions that correspond to llvm::ICmpInst * or llvm::ConstantExpr whose opcode is llvm::Instruction::ICmp. * * @see llvm::ICmpInst::getPredicate() */ LLVMIntPredicate LLVMGetICmpPredicate(LLVMValueRef Inst); /** * Obtain the float predicate of an instruction. * * This is only valid for instructions that correspond to llvm::FCmpInst * or llvm::ConstantExpr whose opcode is llvm::Instruction::FCmp. * * @see llvm::FCmpInst::getPredicate() */ LLVMRealPredicate LLVMGetFCmpPredicate(LLVMValueRef Inst); /** * Create a copy of 'this' instruction that is identical in all ways * except the following: * * The instruction has no parent * * The instruction has no name * * @see llvm::Instruction::clone() */ LLVMValueRef LLVMInstructionClone(LLVMValueRef Inst); /** * Determine whether an instruction is a terminator. This routine is named to * be compatible with historical functions that did this by querying the * underlying C++ type. * * @see llvm::Instruction::isTerminator() */ LLVMValueRef LLVMIsATerminatorInst(LLVMValueRef Inst); /** * @defgroup LLVMCCoreValueInstructionCall Call Sites and Invocations * * Functions in this group apply to instructions that refer to call * sites and invocations. These correspond to C++ types in the * llvm::CallInst class tree. * * @{ */ /** * Obtain the argument count for a call instruction. * * This expects an LLVMValueRef that corresponds to a llvm::CallInst, * llvm::InvokeInst, or llvm::FuncletPadInst.
* * @see llvm::CallInst::getNumArgOperands() * @see llvm::InvokeInst::getNumArgOperands() * @see llvm::FuncletPadInst::getNumArgOperands() */ unsigned LLVMGetNumArgOperands(LLVMValueRef Instr); /** * Set the calling convention for a call instruction. * * This expects an LLVMValueRef that corresponds to a llvm::CallInst or * llvm::InvokeInst. * * @see llvm::CallInst::setCallingConv() * @see llvm::InvokeInst::setCallingConv() */ void LLVMSetInstructionCallConv(LLVMValueRef Instr, unsigned CC); /** * Obtain the calling convention for a call instruction. * * This is the opposite of LLVMSetInstructionCallConv(). Reads its * usage. * * @see LLVMSetInstructionCallConv() */ unsigned LLVMGetInstructionCallConv(LLVMValueRef Instr); void LLVMSetInstrParamAlignment(LLVMValueRef Instr, unsigned index, unsigned Align); void LLVMAddCallSiteAttribute(LLVMValueRef C, LLVMAttributeIndex Idx, LLVMAttributeRef A); unsigned LLVMGetCallSiteAttributeCount(LLVMValueRef C, LLVMAttributeIndex Idx); void LLVMGetCallSiteAttributes(LLVMValueRef C, LLVMAttributeIndex Idx, LLVMAttributeRef *Attrs); LLVMAttributeRef LLVMGetCallSiteEnumAttribute(LLVMValueRef C, LLVMAttributeIndex Idx, unsigned KindID); LLVMAttributeRef LLVMGetCallSiteStringAttribute(LLVMValueRef C, LLVMAttributeIndex Idx, const char *K, unsigned KLen); void LLVMRemoveCallSiteEnumAttribute(LLVMValueRef C, LLVMAttributeIndex Idx, unsigned KindID); void LLVMRemoveCallSiteStringAttribute(LLVMValueRef C, LLVMAttributeIndex Idx, const char *K, unsigned KLen); /** * Obtain the function type called by this instruction. * * @see llvm::CallBase::getFunctionType() */ LLVMTypeRef LLVMGetCalledFunctionType(LLVMValueRef C); /** * Obtain the pointer to the function invoked by this instruction. * * This expects an LLVMValueRef that corresponds to a llvm::CallInst or * llvm::InvokeInst. * * @see llvm::CallInst::getCalledOperand() * @see llvm::InvokeInst::getCalledOperand() */ LLVMValueRef LLVMGetCalledValue(LLVMValueRef Instr); /** * Obtain whether a call instruction is a tail call. * * This only works on llvm::CallInst instructions. * * @see llvm::CallInst::isTailCall() */ LLVMBool LLVMIsTailCall(LLVMValueRef CallInst); /** * Set whether a call instruction is a tail call. * * This only works on llvm::CallInst instructions. * * @see llvm::CallInst::setTailCall() */ void LLVMSetTailCall(LLVMValueRef CallInst, LLVMBool IsTailCall); /** * Return the normal destination basic block. * * This only works on llvm::InvokeInst instructions. * * @see llvm::InvokeInst::getNormalDest() */ LLVMBasicBlockRef LLVMGetNormalDest(LLVMValueRef InvokeInst); /** * Return the unwind destination basic block. * * Works on llvm::InvokeInst, llvm::CleanupReturnInst, and * llvm::CatchSwitchInst instructions. * * @see llvm::InvokeInst::getUnwindDest() * @see llvm::CleanupReturnInst::getUnwindDest() * @see llvm::CatchSwitchInst::getUnwindDest() */ LLVMBasicBlockRef LLVMGetUnwindDest(LLVMValueRef InvokeInst); /** * Set the normal destination basic block. * * This only works on llvm::InvokeInst instructions. * * @see llvm::InvokeInst::setNormalDest() */ void LLVMSetNormalDest(LLVMValueRef InvokeInst, LLVMBasicBlockRef B); /** * Set the unwind destination basic block. * * Works on llvm::InvokeInst, llvm::CleanupReturnInst, and * llvm::CatchSwitchInst instructions. 
* * @see llvm::InvokeInst::setUnwindDest() * @see llvm::CleanupReturnInst::setUnwindDest() * @see llvm::CatchSwitchInst::setUnwindDest() */ void LLVMSetUnwindDest(LLVMValueRef InvokeInst, LLVMBasicBlockRef B); /** * @} */ /** * @defgroup LLVMCCoreValueInstructionTerminator Terminators * * Functions in this group only apply to instructions for which * LLVMIsATerminatorInst returns true. * * @{ */ /** * Return the number of successors that this terminator has. * * @see llvm::Instruction::getNumSuccessors */ unsigned LLVMGetNumSuccessors(LLVMValueRef Term); /** * Return the specified successor. * * @see llvm::Instruction::getSuccessor */ LLVMBasicBlockRef LLVMGetSuccessor(LLVMValueRef Term, unsigned i); /** * Update the specified successor to point at the provided block. * * @see llvm::Instruction::setSuccessor */ void LLVMSetSuccessor(LLVMValueRef Term, unsigned i, LLVMBasicBlockRef block); /** * Return if a branch is conditional. * * This only works on llvm::BranchInst instructions. * * @see llvm::BranchInst::isConditional */ LLVMBool LLVMIsConditional(LLVMValueRef Branch); /** * Return the condition of a branch instruction. * * This only works on llvm::BranchInst instructions. * * @see llvm::BranchInst::getCondition */ LLVMValueRef LLVMGetCondition(LLVMValueRef Branch); /** * Set the condition of a branch instruction. * * This only works on llvm::BranchInst instructions. * * @see llvm::BranchInst::setCondition */ void LLVMSetCondition(LLVMValueRef Branch, LLVMValueRef Cond); /** * Obtain the default destination basic block of a switch instruction. * * This only works on llvm::SwitchInst instructions. * * @see llvm::SwitchInst::getDefaultDest() */ LLVMBasicBlockRef LLVMGetSwitchDefaultDest(LLVMValueRef SwitchInstr); /** * @} */ /** * @defgroup LLVMCCoreValueInstructionAlloca Allocas * * Functions in this group only apply to instructions that map to * llvm::AllocaInst instances. * * @{ */ /** * Obtain the type that is being allocated by the alloca instruction. */ LLVMTypeRef LLVMGetAllocatedType(LLVMValueRef Alloca); /** * @} */ /** * @defgroup LLVMCCoreValueInstructionGetElementPointer GEPs * * Functions in this group only apply to instructions that map to * llvm::GetElementPtrInst instances. * * @{ */ /** * Check whether the given GEP instruction is inbounds. */ LLVMBool LLVMIsInBounds(LLVMValueRef GEP); /** * Set the given GEP instruction to be inbounds or not. */ void LLVMSetIsInBounds(LLVMValueRef GEP, LLVMBool InBounds); /** * @} */ /** * @defgroup LLVMCCoreValueInstructionPHINode PHI Nodes * * Functions in this group only apply to instructions that map to * llvm::PHINode instances. * * @{ */ /** * Add an incoming value to the end of a PHI list. */ void LLVMAddIncoming(LLVMValueRef PhiNode, LLVMValueRef *IncomingValues, LLVMBasicBlockRef *IncomingBlocks, unsigned Count); /** * Obtain the number of incoming basic blocks to a PHI node. */ unsigned LLVMCountIncoming(LLVMValueRef PhiNode); /** * Obtain an incoming value to a PHI node as an LLVMValueRef. */ LLVMValueRef LLVMGetIncomingValue(LLVMValueRef PhiNode, unsigned Index); /** * Obtain an incoming value to a PHI node as an LLVMBasicBlockRef. */ LLVMBasicBlockRef LLVMGetIncomingBlock(LLVMValueRef PhiNode, unsigned Index); /** * @} */ /** * @defgroup LLVMCCoreValueInstructionExtractValue ExtractValue * @defgroup LLVMCCoreValueInstructionInsertValue InsertValue * * Functions in this group only apply to instructions that map to * llvm::ExtractValue and llvm::InsertValue instances. * * @{ */ /** * Obtain the number of indices. 
* NB: This also works on GEP. */ unsigned LLVMGetNumIndices(LLVMValueRef Inst); /** * Obtain the indices as an array. */ const unsigned *LLVMGetIndices(LLVMValueRef Inst); /** * @} */ /** * @} */ /** * @} */ /** * @defgroup LLVMCCoreInstructionBuilder Instruction Builders * * An instruction builder represents a point within a basic block and is * the exclusive means of building instructions using the C interface. * * @{ */ LLVMBuilderRef LLVMCreateBuilderInContext(LLVMContextRef C); LLVMBuilderRef LLVMCreateBuilder(void); void LLVMPositionBuilder(LLVMBuilderRef Builder, LLVMBasicBlockRef Block, LLVMValueRef Instr); void LLVMPositionBuilderBefore(LLVMBuilderRef Builder, LLVMValueRef Instr); void LLVMPositionBuilderAtEnd(LLVMBuilderRef Builder, LLVMBasicBlockRef Block); LLVMBasicBlockRef LLVMGetInsertBlock(LLVMBuilderRef Builder); void LLVMClearInsertionPosition(LLVMBuilderRef Builder); void LLVMInsertIntoBuilder(LLVMBuilderRef Builder, LLVMValueRef Instr); void LLVMInsertIntoBuilderWithName(LLVMBuilderRef Builder, LLVMValueRef Instr, const char *Name); void LLVMDisposeBuilder(LLVMBuilderRef Builder); /* Metadata */ /** * Get location information used by debugging information. * * @see llvm::IRBuilder::getCurrentDebugLocation() */ LLVMMetadataRef LLVMGetCurrentDebugLocation2(LLVMBuilderRef Builder); /** * Set location information used by debugging information. * * To clear the location metadata of the given instruction, pass NULL to \p Loc. * * @see llvm::IRBuilder::SetCurrentDebugLocation() */ void LLVMSetCurrentDebugLocation2(LLVMBuilderRef Builder, LLVMMetadataRef Loc); /** * Attempts to set the debug location for the given instruction using the * current debug location for the given builder. If the builder has no current * debug location, this function is a no-op. * * @see llvm::IRBuilder::SetInstDebugLocation() */ void LLVMSetInstDebugLocation(LLVMBuilderRef Builder, LLVMValueRef Inst); /** * Get the dafult floating-point math metadata for a given builder. * * @see llvm::IRBuilder::getDefaultFPMathTag() */ LLVMMetadataRef LLVMBuilderGetDefaultFPMathTag(LLVMBuilderRef Builder); /** * Set the default floating-point math metadata for the given builder. * * To clear the metadata, pass NULL to \p FPMathTag. * * @see llvm::IRBuilder::setDefaultFPMathTag() */ void LLVMBuilderSetDefaultFPMathTag(LLVMBuilderRef Builder, LLVMMetadataRef FPMathTag); /** * Deprecated: Passing the NULL location will crash. * Use LLVMGetCurrentDebugLocation2 instead. */ void LLVMSetCurrentDebugLocation(LLVMBuilderRef Builder, LLVMValueRef L); /** * Deprecated: Returning the NULL location will crash. * Use LLVMGetCurrentDebugLocation2 instead. */ LLVMValueRef LLVMGetCurrentDebugLocation(LLVMBuilderRef Builder); /* Terminators */ LLVMValueRef LLVMBuildRetVoid(LLVMBuilderRef); LLVMValueRef LLVMBuildRet(LLVMBuilderRef, LLVMValueRef V); LLVMValueRef LLVMBuildAggregateRet(LLVMBuilderRef, LLVMValueRef *RetVals, unsigned N); LLVMValueRef LLVMBuildBr(LLVMBuilderRef, LLVMBasicBlockRef Dest); LLVMValueRef LLVMBuildCondBr(LLVMBuilderRef, LLVMValueRef If, LLVMBasicBlockRef Then, LLVMBasicBlockRef Else); LLVMValueRef LLVMBuildSwitch(LLVMBuilderRef, LLVMValueRef V, LLVMBasicBlockRef Else, unsigned NumCases); LLVMValueRef LLVMBuildIndirectBr(LLVMBuilderRef B, LLVMValueRef Addr, unsigned NumDests); // LLVMBuildInvoke is deprecated in favor of LLVMBuildInvoke2, in preparation // for opaque pointer types. 
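// A minimal sketch of the replacement API (illustrative; B, FnTy, Fn, Args,
// NumArgs, NormalBB and UnwindBB are assumed to be set up by the caller):
//   LLVMValueRef Inv = LLVMBuildInvoke2(B, FnTy, Fn, Args, NumArgs,
//                                       NormalBB, UnwindBB, "inv");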
LLVMValueRef LLVMBuildInvoke(LLVMBuilderRef, LLVMValueRef Fn, LLVMValueRef *Args, unsigned NumArgs, LLVMBasicBlockRef Then, LLVMBasicBlockRef Catch, const char *Name); LLVMValueRef LLVMBuildInvoke2(LLVMBuilderRef, LLVMTypeRef Ty, LLVMValueRef Fn, LLVMValueRef *Args, unsigned NumArgs, LLVMBasicBlockRef Then, LLVMBasicBlockRef Catch, const char *Name); LLVMValueRef LLVMBuildUnreachable(LLVMBuilderRef); /* Exception Handling */ LLVMValueRef LLVMBuildResume(LLVMBuilderRef B, LLVMValueRef Exn); LLVMValueRef LLVMBuildLandingPad(LLVMBuilderRef B, LLVMTypeRef Ty, LLVMValueRef PersFn, unsigned NumClauses, const char *Name); LLVMValueRef LLVMBuildCleanupRet(LLVMBuilderRef B, LLVMValueRef CatchPad, LLVMBasicBlockRef BB); LLVMValueRef LLVMBuildCatchRet(LLVMBuilderRef B, LLVMValueRef CatchPad, LLVMBasicBlockRef BB); LLVMValueRef LLVMBuildCatchPad(LLVMBuilderRef B, LLVMValueRef ParentPad, LLVMValueRef *Args, unsigned NumArgs, const char *Name); LLVMValueRef LLVMBuildCleanupPad(LLVMBuilderRef B, LLVMValueRef ParentPad, LLVMValueRef *Args, unsigned NumArgs, const char *Name); LLVMValueRef LLVMBuildCatchSwitch(LLVMBuilderRef B, LLVMValueRef ParentPad, LLVMBasicBlockRef UnwindBB, unsigned NumHandlers, const char *Name); /* Add a case to the switch instruction */ void LLVMAddCase(LLVMValueRef Switch, LLVMValueRef OnVal, LLVMBasicBlockRef Dest); /* Add a destination to the indirectbr instruction */ void LLVMAddDestination(LLVMValueRef IndirectBr, LLVMBasicBlockRef Dest); /* Get the number of clauses on the landingpad instruction */ unsigned LLVMGetNumClauses(LLVMValueRef LandingPad); -/* Get the value of the clause at idnex Idx on the landingpad instruction */ +/* Get the value of the clause at index Idx on the landingpad instruction */ LLVMValueRef LLVMGetClause(LLVMValueRef LandingPad, unsigned Idx); /* Add a catch or filter clause to the landingpad instruction */ void LLVMAddClause(LLVMValueRef LandingPad, LLVMValueRef ClauseVal); /* Get the 'cleanup' flag in the landingpad instruction */ LLVMBool LLVMIsCleanup(LLVMValueRef LandingPad); /* Set the 'cleanup' flag in the landingpad instruction */ void LLVMSetCleanup(LLVMValueRef LandingPad, LLVMBool Val); /* Add a destination to the catchswitch instruction */ void LLVMAddHandler(LLVMValueRef CatchSwitch, LLVMBasicBlockRef Dest); /* Get the number of handlers on the catchswitch instruction */ unsigned LLVMGetNumHandlers(LLVMValueRef CatchSwitch); /** * Obtain the basic blocks acting as handlers for a catchswitch instruction. * * The Handlers parameter should point to a pre-allocated array of * LLVMBasicBlockRefs at least LLVMGetNumHandlers() large. On return, the * first LLVMGetNumHandlers() entries in the array will be populated * with LLVMBasicBlockRef instances. * * @param CatchSwitch The catchswitch instruction to operate on. * @param Handlers Memory address of an array to be filled with basic blocks. */ void LLVMGetHandlers(LLVMValueRef CatchSwitch, LLVMBasicBlockRef *Handlers); /* Funclets */ /* Get the number of funcletpad arguments. */ LLVMValueRef LLVMGetArgOperand(LLVMValueRef Funclet, unsigned i); /* Set a funcletpad argument at the given index. */ void LLVMSetArgOperand(LLVMValueRef Funclet, unsigned i, LLVMValueRef value); /** * Get the parent catchswitch instruction of a catchpad instruction. * * This only works on llvm::CatchPadInst instructions. * * @see llvm::CatchPadInst::getCatchSwitch() */ LLVMValueRef LLVMGetParentCatchSwitch(LLVMValueRef CatchPad); /** * Set the parent catchswitch instruction of a catchpad instruction. 
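 *
 * Illustrative sketch (CatchPad and CatchSwitch are assumed to be a catchpad
 * and a catchswitch value obtained from the enclosing function):
 *
 *   LLVMSetParentCatchSwitch(CatchPad, CatchSwitch);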
* * This only works on llvm::CatchPadInst instructions. * * @see llvm::CatchPadInst::setCatchSwitch() */ void LLVMSetParentCatchSwitch(LLVMValueRef CatchPad, LLVMValueRef CatchSwitch); /* Arithmetic */ LLVMValueRef LLVMBuildAdd(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildNSWAdd(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildNUWAdd(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildFAdd(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildSub(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildNSWSub(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildNUWSub(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildFSub(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildMul(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildNSWMul(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildNUWMul(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildFMul(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildUDiv(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildExactUDiv(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildSDiv(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildExactSDiv(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildFDiv(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildURem(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildSRem(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildFRem(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildShl(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildLShr(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildAShr(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildAnd(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildOr(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildXor(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildBinOp(LLVMBuilderRef B, LLVMOpcode Op, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildNeg(LLVMBuilderRef, LLVMValueRef V, const char *Name); LLVMValueRef LLVMBuildNSWNeg(LLVMBuilderRef B, LLVMValueRef V, const char *Name); LLVMValueRef LLVMBuildNUWNeg(LLVMBuilderRef B, LLVMValueRef V, const char *Name); LLVMValueRef LLVMBuildFNeg(LLVMBuilderRef, LLVMValueRef V, const char *Name); LLVMValueRef LLVMBuildNot(LLVMBuilderRef, LLVMValueRef V, const char *Name); /* Memory */ LLVMValueRef LLVMBuildMalloc(LLVMBuilderRef, LLVMTypeRef Ty, const char *Name); LLVMValueRef LLVMBuildArrayMalloc(LLVMBuilderRef, LLVMTypeRef Ty, LLVMValueRef Val, const char *Name); /** * Creates and inserts a memset to the specified pointer and the * specified 
value. * * @see llvm::IRRBuilder::CreateMemSet() */ LLVMValueRef LLVMBuildMemSet(LLVMBuilderRef B, LLVMValueRef Ptr, LLVMValueRef Val, LLVMValueRef Len, unsigned Align); /** * Creates and inserts a memcpy between the specified pointers. * * @see llvm::IRRBuilder::CreateMemCpy() */ LLVMValueRef LLVMBuildMemCpy(LLVMBuilderRef B, LLVMValueRef Dst, unsigned DstAlign, LLVMValueRef Src, unsigned SrcAlign, LLVMValueRef Size); /** * Creates and inserts a memmove between the specified pointers. * * @see llvm::IRRBuilder::CreateMemMove() */ LLVMValueRef LLVMBuildMemMove(LLVMBuilderRef B, LLVMValueRef Dst, unsigned DstAlign, LLVMValueRef Src, unsigned SrcAlign, LLVMValueRef Size); LLVMValueRef LLVMBuildAlloca(LLVMBuilderRef, LLVMTypeRef Ty, const char *Name); LLVMValueRef LLVMBuildArrayAlloca(LLVMBuilderRef, LLVMTypeRef Ty, LLVMValueRef Val, const char *Name); LLVMValueRef LLVMBuildFree(LLVMBuilderRef, LLVMValueRef PointerVal); // LLVMBuildLoad is deprecated in favor of LLVMBuildLoad2, in preparation for // opaque pointer types. LLVMValueRef LLVMBuildLoad(LLVMBuilderRef, LLVMValueRef PointerVal, const char *Name); LLVMValueRef LLVMBuildLoad2(LLVMBuilderRef, LLVMTypeRef Ty, LLVMValueRef PointerVal, const char *Name); LLVMValueRef LLVMBuildStore(LLVMBuilderRef, LLVMValueRef Val, LLVMValueRef Ptr); // LLVMBuildGEP, LLVMBuildInBoundsGEP, and LLVMBuildStructGEP are deprecated in // favor of LLVMBuild*GEP2, in preparation for opaque pointer types. LLVMValueRef LLVMBuildGEP(LLVMBuilderRef B, LLVMValueRef Pointer, LLVMValueRef *Indices, unsigned NumIndices, const char *Name); LLVMValueRef LLVMBuildInBoundsGEP(LLVMBuilderRef B, LLVMValueRef Pointer, LLVMValueRef *Indices, unsigned NumIndices, const char *Name); LLVMValueRef LLVMBuildStructGEP(LLVMBuilderRef B, LLVMValueRef Pointer, unsigned Idx, const char *Name); LLVMValueRef LLVMBuildGEP2(LLVMBuilderRef B, LLVMTypeRef Ty, LLVMValueRef Pointer, LLVMValueRef *Indices, unsigned NumIndices, const char *Name); LLVMValueRef LLVMBuildInBoundsGEP2(LLVMBuilderRef B, LLVMTypeRef Ty, LLVMValueRef Pointer, LLVMValueRef *Indices, unsigned NumIndices, const char *Name); LLVMValueRef LLVMBuildStructGEP2(LLVMBuilderRef B, LLVMTypeRef Ty, LLVMValueRef Pointer, unsigned Idx, const char *Name); LLVMValueRef LLVMBuildGlobalString(LLVMBuilderRef B, const char *Str, const char *Name); LLVMValueRef LLVMBuildGlobalStringPtr(LLVMBuilderRef B, const char *Str, const char *Name); LLVMBool LLVMGetVolatile(LLVMValueRef MemoryAccessInst); void LLVMSetVolatile(LLVMValueRef MemoryAccessInst, LLVMBool IsVolatile); LLVMBool LLVMGetWeak(LLVMValueRef CmpXchgInst); void LLVMSetWeak(LLVMValueRef CmpXchgInst, LLVMBool IsWeak); LLVMAtomicOrdering LLVMGetOrdering(LLVMValueRef MemoryAccessInst); void LLVMSetOrdering(LLVMValueRef MemoryAccessInst, LLVMAtomicOrdering Ordering); LLVMAtomicRMWBinOp LLVMGetAtomicRMWBinOp(LLVMValueRef AtomicRMWInst); void LLVMSetAtomicRMWBinOp(LLVMValueRef AtomicRMWInst, LLVMAtomicRMWBinOp BinOp); /* Casts */ LLVMValueRef LLVMBuildTrunc(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildZExt(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildSExt(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildFPToUI(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildFPToSI(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildUIToFP(LLVMBuilderRef, LLVMValueRef Val, 
LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildSIToFP(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildFPTrunc(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildFPExt(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildPtrToInt(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildIntToPtr(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildBitCast(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildAddrSpaceCast(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildZExtOrBitCast(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildSExtOrBitCast(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildTruncOrBitCast(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildCast(LLVMBuilderRef B, LLVMOpcode Op, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildPointerCast(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); LLVMValueRef LLVMBuildIntCast2(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, LLVMBool IsSigned, const char *Name); LLVMValueRef LLVMBuildFPCast(LLVMBuilderRef, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name); /** Deprecated: This cast is always signed. Use LLVMBuildIntCast2 instead. */ LLVMValueRef LLVMBuildIntCast(LLVMBuilderRef, LLVMValueRef Val, /*Signed cast!*/ LLVMTypeRef DestTy, const char *Name); /* Comparisons */ LLVMValueRef LLVMBuildICmp(LLVMBuilderRef, LLVMIntPredicate Op, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildFCmp(LLVMBuilderRef, LLVMRealPredicate Op, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); /* Miscellaneous instructions */ LLVMValueRef LLVMBuildPhi(LLVMBuilderRef, LLVMTypeRef Ty, const char *Name); // LLVMBuildCall is deprecated in favor of LLVMBuildCall2, in preparation for // opaque pointer types. 
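// A minimal sketch of the replacement API (illustrative; B, FnTy, Fn, Args and
// NumArgs are assumed to be set up by the caller):
//   LLVMValueRef Call = LLVMBuildCall2(B, FnTy, Fn, Args, NumArgs, "call");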
LLVMValueRef LLVMBuildCall(LLVMBuilderRef, LLVMValueRef Fn, LLVMValueRef *Args, unsigned NumArgs, const char *Name); LLVMValueRef LLVMBuildCall2(LLVMBuilderRef, LLVMTypeRef, LLVMValueRef Fn, LLVMValueRef *Args, unsigned NumArgs, const char *Name); LLVMValueRef LLVMBuildSelect(LLVMBuilderRef, LLVMValueRef If, LLVMValueRef Then, LLVMValueRef Else, const char *Name); LLVMValueRef LLVMBuildVAArg(LLVMBuilderRef, LLVMValueRef List, LLVMTypeRef Ty, const char *Name); LLVMValueRef LLVMBuildExtractElement(LLVMBuilderRef, LLVMValueRef VecVal, LLVMValueRef Index, const char *Name); LLVMValueRef LLVMBuildInsertElement(LLVMBuilderRef, LLVMValueRef VecVal, LLVMValueRef EltVal, LLVMValueRef Index, const char *Name); LLVMValueRef LLVMBuildShuffleVector(LLVMBuilderRef, LLVMValueRef V1, LLVMValueRef V2, LLVMValueRef Mask, const char *Name); LLVMValueRef LLVMBuildExtractValue(LLVMBuilderRef, LLVMValueRef AggVal, unsigned Index, const char *Name); LLVMValueRef LLVMBuildInsertValue(LLVMBuilderRef, LLVMValueRef AggVal, LLVMValueRef EltVal, unsigned Index, const char *Name); LLVMValueRef LLVMBuildFreeze(LLVMBuilderRef, LLVMValueRef Val, const char *Name); LLVMValueRef LLVMBuildIsNull(LLVMBuilderRef, LLVMValueRef Val, const char *Name); LLVMValueRef LLVMBuildIsNotNull(LLVMBuilderRef, LLVMValueRef Val, const char *Name); LLVMValueRef LLVMBuildPtrDiff(LLVMBuilderRef, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name); LLVMValueRef LLVMBuildFence(LLVMBuilderRef B, LLVMAtomicOrdering ordering, LLVMBool singleThread, const char *Name); LLVMValueRef LLVMBuildAtomicRMW(LLVMBuilderRef B, LLVMAtomicRMWBinOp op, LLVMValueRef PTR, LLVMValueRef Val, LLVMAtomicOrdering ordering, LLVMBool singleThread); LLVMValueRef LLVMBuildAtomicCmpXchg(LLVMBuilderRef B, LLVMValueRef Ptr, LLVMValueRef Cmp, LLVMValueRef New, LLVMAtomicOrdering SuccessOrdering, LLVMAtomicOrdering FailureOrdering, LLVMBool SingleThread); + +/** + * Get the number of elements in the mask of a ShuffleVector instruction. + */ +unsigned LLVMGetNumMaskElements(LLVMValueRef ShuffleVectorInst); + +/** + * \returns a constant that specifies that the result of a \c ShuffleVectorInst + * is undefined. + */ +int LLVMGetUndefMaskElem(void); + +/** + * Get the mask value at position Elt in the mask of a ShuffleVector + * instruction. + * + * \Returns the result of \c LLVMGetUndefMaskElem() if the mask value is undef + * at that position. + */ +int LLVMGetMaskValue(LLVMValueRef ShuffleVectorInst, unsigned Elt); LLVMBool LLVMIsAtomicSingleThread(LLVMValueRef AtomicInst); void LLVMSetAtomicSingleThread(LLVMValueRef AtomicInst, LLVMBool SingleThread); LLVMAtomicOrdering LLVMGetCmpXchgSuccessOrdering(LLVMValueRef CmpXchgInst); void LLVMSetCmpXchgSuccessOrdering(LLVMValueRef CmpXchgInst, LLVMAtomicOrdering Ordering); LLVMAtomicOrdering LLVMGetCmpXchgFailureOrdering(LLVMValueRef CmpXchgInst); void LLVMSetCmpXchgFailureOrdering(LLVMValueRef CmpXchgInst, LLVMAtomicOrdering Ordering); /** * @} */ /** * @defgroup LLVMCCoreModuleProvider Module Providers * * @{ */ /** * Changes the type of M so it can be passed to FunctionPassManagers and the * JIT. They take ModuleProviders for historical reasons. */ LLVMModuleProviderRef LLVMCreateModuleProviderForExistingModule(LLVMModuleRef M); /** * Destroys the module M. 
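 *
 * Illustrative lifecycle sketch (M is assumed to be an existing LLVMModuleRef
 * that is no longer needed once the provider has been disposed):
 *
 *   LLVMModuleProviderRef MP = LLVMCreateModuleProviderForExistingModule(M);
 *   // ... hand MP to a legacy consumer ...
 *   LLVMDisposeModuleProvider(MP);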
*/ void LLVMDisposeModuleProvider(LLVMModuleProviderRef M); /** * @} */ /** * @defgroup LLVMCCoreMemoryBuffers Memory Buffers * * @{ */ LLVMBool LLVMCreateMemoryBufferWithContentsOfFile(const char *Path, LLVMMemoryBufferRef *OutMemBuf, char **OutMessage); LLVMBool LLVMCreateMemoryBufferWithSTDIN(LLVMMemoryBufferRef *OutMemBuf, char **OutMessage); LLVMMemoryBufferRef LLVMCreateMemoryBufferWithMemoryRange(const char *InputData, size_t InputDataLength, const char *BufferName, LLVMBool RequiresNullTerminator); LLVMMemoryBufferRef LLVMCreateMemoryBufferWithMemoryRangeCopy(const char *InputData, size_t InputDataLength, const char *BufferName); const char *LLVMGetBufferStart(LLVMMemoryBufferRef MemBuf); size_t LLVMGetBufferSize(LLVMMemoryBufferRef MemBuf); void LLVMDisposeMemoryBuffer(LLVMMemoryBufferRef MemBuf); /** * @} */ /** * @defgroup LLVMCCorePassRegistry Pass Registry * * @{ */ /** Return the global pass registry, for use with initialization functions. @see llvm::PassRegistry::getPassRegistry */ LLVMPassRegistryRef LLVMGetGlobalPassRegistry(void); /** * @} */ /** * @defgroup LLVMCCorePassManagers Pass Managers * * @{ */ /** Constructs a new whole-module pass pipeline. This type of pipeline is suitable for link-time optimization and whole-module transformations. @see llvm::PassManager::PassManager */ LLVMPassManagerRef LLVMCreatePassManager(void); /** Constructs a new function-by-function pass pipeline over the module provider. It does not take ownership of the module provider. This type of pipeline is suitable for code generation and JIT compilation tasks. @see llvm::FunctionPassManager::FunctionPassManager */ LLVMPassManagerRef LLVMCreateFunctionPassManagerForModule(LLVMModuleRef M); /** Deprecated: Use LLVMCreateFunctionPassManagerForModule instead. */ LLVMPassManagerRef LLVMCreateFunctionPassManager(LLVMModuleProviderRef MP); /** Initializes, executes on the provided module, and finalizes all of the passes scheduled in the pass manager. Returns 1 if any of the passes modified the module, 0 otherwise. @see llvm::PassManager::run(Module&) */ LLVMBool LLVMRunPassManager(LLVMPassManagerRef PM, LLVMModuleRef M); /** Initializes all of the function passes scheduled in the function pass manager. Returns 1 if any of the passes modified the module, 0 otherwise. @see llvm::FunctionPassManager::doInitialization */ LLVMBool LLVMInitializeFunctionPassManager(LLVMPassManagerRef FPM); /** Executes all of the function passes scheduled in the function pass manager on the provided function. Returns 1 if any of the passes modified the function, false otherwise. @see llvm::FunctionPassManager::run(Function&) */ LLVMBool LLVMRunFunctionPassManager(LLVMPassManagerRef FPM, LLVMValueRef F); /** Finalizes all of the function passes scheduled in the function pass manager. Returns 1 if any of the passes modified the module, 0 otherwise. @see llvm::FunctionPassManager::doFinalization */ LLVMBool LLVMFinalizeFunctionPassManager(LLVMPassManagerRef FPM); /** Frees the memory of a pass pipeline. For function pipelines, does not free the module provider. @see llvm::PassManagerBase::~PassManagerBase. */ void LLVMDisposePassManager(LLVMPassManagerRef PM); /** * @} */ /** * @defgroup LLVMCCoreThreading Threading * * Handle the structures needed to make LLVM safe for multithreading. * * @{ */ /** Deprecated: Multi-threading can only be enabled/disabled with the compile time define LLVM_ENABLE_THREADS. This function always returns LLVMIsMultithreaded(). 
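    In new code, query the compile-time configuration directly instead
    (illustrative sketch):

      if (!LLVMIsMultithreaded()) {
        // fall back to a single-threaded code path
      }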
*/ LLVMBool LLVMStartMultithreaded(void); /** Deprecated: Multi-threading can only be enabled/disabled with the compile time define LLVM_ENABLE_THREADS. */ void LLVMStopMultithreaded(void); /** Check whether LLVM is executing in thread-safe mode or not. @see llvm::llvm_is_multithreaded */ LLVMBool LLVMIsMultithreaded(void); /** * @} */ /** * @} */ /** * @} */ LLVM_C_EXTERN_C_END #endif /* LLVM_C_CORE_H */ Index: vendor/llvm-project/release-11.x/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp =================================================================== --- vendor/llvm-project/release-11.x/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (revision 366332) +++ vendor/llvm-project/release-11.x/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (revision 366333) @@ -1,3368 +1,3368 @@ //===- AsmPrinter.cpp - Common AsmPrinter code ----------------------------===// // // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // See https://llvm.org/LICENSE.txt for license information. // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // //===----------------------------------------------------------------------===// // // This file implements the AsmPrinter class. // //===----------------------------------------------------------------------===// #include "llvm/CodeGen/AsmPrinter.h" #include "CodeViewDebug.h" #include "DwarfDebug.h" #include "DwarfException.h" #include "WasmException.h" #include "WinCFGuard.h" #include "WinException.h" #include "llvm/ADT/APFloat.h" #include "llvm/ADT/APInt.h" #include "llvm/ADT/DenseMap.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/SmallString.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/Statistic.h" #include "llvm/ADT/StringRef.h" #include "llvm/ADT/Triple.h" #include "llvm/ADT/Twine.h" #include "llvm/Analysis/ConstantFolding.h" #include "llvm/Analysis/EHPersonalities.h" #include "llvm/Analysis/OptimizationRemarkEmitter.h" #include "llvm/BinaryFormat/COFF.h" #include "llvm/BinaryFormat/Dwarf.h" #include "llvm/BinaryFormat/ELF.h" #include "llvm/CodeGen/GCMetadata.h" #include "llvm/CodeGen/GCMetadataPrinter.h" #include "llvm/CodeGen/GCStrategy.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/MachineConstantPool.h" #include "llvm/CodeGen/MachineDominators.h" #include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineFunctionPass.h" #include "llvm/CodeGen/MachineInstr.h" #include "llvm/CodeGen/MachineInstrBundle.h" #include "llvm/CodeGen/MachineJumpTableInfo.h" #include "llvm/CodeGen/MachineLoopInfo.h" #include "llvm/CodeGen/MachineMemOperand.h" #include "llvm/CodeGen/MachineModuleInfo.h" #include "llvm/CodeGen/MachineModuleInfoImpls.h" #include "llvm/CodeGen/MachineOperand.h" #include "llvm/CodeGen/MachineOptimizationRemarkEmitter.h" #include "llvm/CodeGen/StackMaps.h" #include "llvm/CodeGen/TargetFrameLowering.h" #include "llvm/CodeGen/TargetInstrInfo.h" #include "llvm/CodeGen/TargetLowering.h" #include "llvm/CodeGen/TargetOpcodes.h" #include "llvm/CodeGen/TargetRegisterInfo.h" #include "llvm/IR/BasicBlock.h" #include "llvm/IR/Comdat.h" #include "llvm/IR/Constant.h" #include "llvm/IR/Constants.h" #include "llvm/IR/DataLayout.h" #include "llvm/IR/DebugInfoMetadata.h" #include "llvm/IR/DerivedTypes.h" #include "llvm/IR/Function.h" #include "llvm/IR/GlobalAlias.h" #include "llvm/IR/GlobalIFunc.h" #include "llvm/IR/GlobalIndirectSymbol.h" #include "llvm/IR/GlobalObject.h" #include "llvm/IR/GlobalValue.h" #include 
"llvm/IR/GlobalVariable.h" #include "llvm/IR/Instruction.h" #include "llvm/IR/Mangler.h" #include "llvm/IR/Metadata.h" #include "llvm/IR/Module.h" #include "llvm/IR/Operator.h" #include "llvm/IR/Type.h" #include "llvm/IR/Value.h" #include "llvm/MC/MCAsmInfo.h" #include "llvm/MC/MCContext.h" #include "llvm/MC/MCDirectives.h" #include "llvm/MC/MCDwarf.h" #include "llvm/MC/MCExpr.h" #include "llvm/MC/MCInst.h" #include "llvm/MC/MCSection.h" #include "llvm/MC/MCSectionCOFF.h" #include "llvm/MC/MCSectionELF.h" #include "llvm/MC/MCSectionMachO.h" #include "llvm/MC/MCSectionXCOFF.h" #include "llvm/MC/MCStreamer.h" #include "llvm/MC/MCSubtargetInfo.h" #include "llvm/MC/MCSymbol.h" #include "llvm/MC/MCSymbolELF.h" #include "llvm/MC/MCSymbolXCOFF.h" #include "llvm/MC/MCTargetOptions.h" #include "llvm/MC/MCValue.h" #include "llvm/MC/SectionKind.h" #include "llvm/Pass.h" #include "llvm/Remarks/Remark.h" #include "llvm/Remarks/RemarkFormat.h" #include "llvm/Remarks/RemarkStreamer.h" #include "llvm/Remarks/RemarkStringTable.h" #include "llvm/Support/Casting.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/Compiler.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/Format.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/Path.h" #include "llvm/Support/TargetRegistry.h" #include "llvm/Support/Timer.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Target/TargetLoweringObjectFile.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Target/TargetOptions.h" #include #include #include #include #include #include #include #include #include #include using namespace llvm; #define DEBUG_TYPE "asm-printer" static const char *const DWARFGroupName = "dwarf"; static const char *const DWARFGroupDescription = "DWARF Emission"; static const char *const DbgTimerName = "emit"; static const char *const DbgTimerDescription = "Debug Info Emission"; static const char *const EHTimerName = "write_exception"; static const char *const EHTimerDescription = "DWARF Exception Writer"; static const char *const CFGuardName = "Control Flow Guard"; static const char *const CFGuardDescription = "Control Flow Guard"; static const char *const CodeViewLineTablesGroupName = "linetables"; static const char *const CodeViewLineTablesGroupDescription = "CodeView Line Tables"; STATISTIC(EmittedInsts, "Number of machine instrs printed"); char AsmPrinter::ID = 0; using gcp_map_type = DenseMap>; static gcp_map_type &getGCMap(void *&P) { if (!P) P = new gcp_map_type(); return *(gcp_map_type*)P; } /// getGVAlignment - Return the alignment to use for the specified global /// value. This rounds up to the preferred alignment if possible and legal. Align AsmPrinter::getGVAlignment(const GlobalObject *GV, const DataLayout &DL, Align InAlign) { Align Alignment; if (const GlobalVariable *GVar = dyn_cast(GV)) Alignment = DL.getPreferredAlign(GVar); // If InAlign is specified, round it to it. if (InAlign > Alignment) Alignment = InAlign; // If the GV has a specified alignment, take it into account. const MaybeAlign GVAlign(GV->getAlignment()); if (!GVAlign) return Alignment; assert(GVAlign && "GVAlign must be set"); // If the GVAlign is larger than NumBits, or if we are required to obey // NumBits because the GV has an assigned section, obey it. 
if (*GVAlign > Alignment || GV->hasSection()) Alignment = *GVAlign; return Alignment; } AsmPrinter::AsmPrinter(TargetMachine &tm, std::unique_ptr Streamer) : MachineFunctionPass(ID), TM(tm), MAI(tm.getMCAsmInfo()), OutContext(Streamer->getContext()), OutStreamer(std::move(Streamer)) { VerboseAsm = OutStreamer->isVerboseAsm(); } AsmPrinter::~AsmPrinter() { assert(!DD && Handlers.empty() && "Debug/EH info didn't get finalized"); if (GCMetadataPrinters) { gcp_map_type &GCMap = getGCMap(GCMetadataPrinters); delete &GCMap; GCMetadataPrinters = nullptr; } } bool AsmPrinter::isPositionIndependent() const { return TM.isPositionIndependent(); } /// getFunctionNumber - Return a unique ID for the current function. unsigned AsmPrinter::getFunctionNumber() const { return MF->getFunctionNumber(); } const TargetLoweringObjectFile &AsmPrinter::getObjFileLowering() const { return *TM.getObjFileLowering(); } const DataLayout &AsmPrinter::getDataLayout() const { return MMI->getModule()->getDataLayout(); } // Do not use the cached DataLayout because some client use it without a Module // (dsymutil, llvm-dwarfdump). unsigned AsmPrinter::getPointerSize() const { return TM.getPointerSize(0); // FIXME: Default address space } const MCSubtargetInfo &AsmPrinter::getSubtargetInfo() const { assert(MF && "getSubtargetInfo requires a valid MachineFunction!"); return MF->getSubtarget(); } void AsmPrinter::EmitToStreamer(MCStreamer &S, const MCInst &Inst) { S.emitInstruction(Inst, getSubtargetInfo()); } void AsmPrinter::emitInitialRawDwarfLocDirective(const MachineFunction &MF) { assert(DD && "Dwarf debug file is not defined."); assert(OutStreamer->hasRawTextSupport() && "Expected assembly output mode."); (void)DD->emitInitialLocDirective(MF, /*CUID=*/0); } /// getCurrentSection() - Return the current section we are emitting to. const MCSection *AsmPrinter::getCurrentSection() const { return OutStreamer->getCurrentSectionOnly(); } void AsmPrinter::getAnalysisUsage(AnalysisUsage &AU) const { AU.setPreservesAll(); MachineFunctionPass::getAnalysisUsage(AU); AU.addRequired(); AU.addRequired(); } bool AsmPrinter::doInitialization(Module &M) { auto *MMIWP = getAnalysisIfAvailable(); MMI = MMIWP ? &MMIWP->getMMI() : nullptr; // Initialize TargetLoweringObjectFile. const_cast(getObjFileLowering()) .Initialize(OutContext, TM); const_cast(getObjFileLowering()) .getModuleMetadata(M); OutStreamer->InitSections(false); // Emit the version-min deployment target directive if needed. // // FIXME: If we end up with a collection of these sorts of Darwin-specific // or ELF-specific things, it may make sense to have a platform helper class // that will work with the target helper class. For now keep it here, as the // alternative is duplicated code in each of the target asm printers that // use the directive, where it would need the same conditionalization // anyway. const Triple &Target = TM.getTargetTriple(); OutStreamer->emitVersionForTarget(Target, M.getSDKVersion()); // Allow the target to emit any magic that it wants at the start of the file. emitStartOfAsmFile(M); // Very minimal debug info. It is ignored if we emit actual debug info. If we // don't, this at least helps the user find where a global came from. 
if (MAI->hasSingleParameterDotFile()) { // .file "foo.c" OutStreamer->emitFileDirective( llvm::sys::path::filename(M.getSourceFileName())); } GCModuleInfo *MI = getAnalysisIfAvailable(); assert(MI && "AsmPrinter didn't require GCModuleInfo?"); for (auto &I : *MI) if (GCMetadataPrinter *MP = GetOrCreateGCPrinter(*I)) MP->beginAssembly(M, *MI, *this); // Emit module-level inline asm if it exists. if (!M.getModuleInlineAsm().empty()) { // We're at the module level. Construct MCSubtarget from the default CPU // and target triple. std::unique_ptr STI(TM.getTarget().createMCSubtargetInfo( TM.getTargetTriple().str(), TM.getTargetCPU(), TM.getTargetFeatureString())); OutStreamer->AddComment("Start of file scope inline assembly"); OutStreamer->AddBlankLine(); emitInlineAsm(M.getModuleInlineAsm() + "\n", OutContext.getSubtargetCopy(*STI), TM.Options.MCOptions); OutStreamer->AddComment("End of file scope inline assembly"); OutStreamer->AddBlankLine(); } if (MAI->doesSupportDebugInformation()) { bool EmitCodeView = M.getCodeViewFlag(); if (EmitCodeView && TM.getTargetTriple().isOSWindows()) { Handlers.emplace_back(std::make_unique(this), DbgTimerName, DbgTimerDescription, CodeViewLineTablesGroupName, CodeViewLineTablesGroupDescription); } if (!EmitCodeView || M.getDwarfVersion()) { DD = new DwarfDebug(this, &M); DD->beginModule(); Handlers.emplace_back(std::unique_ptr(DD), DbgTimerName, DbgTimerDescription, DWARFGroupName, DWARFGroupDescription); } } switch (MAI->getExceptionHandlingType()) { case ExceptionHandling::SjLj: case ExceptionHandling::DwarfCFI: case ExceptionHandling::ARM: isCFIMoveForDebugging = true; if (MAI->getExceptionHandlingType() != ExceptionHandling::DwarfCFI) break; for (auto &F: M.getFunctionList()) { // If the module contains any function with unwind data, // .eh_frame has to be emitted. // Ignore functions that won't get emitted. if (!F.isDeclarationForLinker() && F.needsUnwindTableEntry()) { isCFIMoveForDebugging = false; break; } } break; default: isCFIMoveForDebugging = false; break; } EHStreamer *ES = nullptr; switch (MAI->getExceptionHandlingType()) { case ExceptionHandling::None: break; case ExceptionHandling::SjLj: case ExceptionHandling::DwarfCFI: ES = new DwarfCFIException(this); break; case ExceptionHandling::ARM: ES = new ARMException(this); break; case ExceptionHandling::WinEH: switch (MAI->getWinEHEncodingType()) { default: llvm_unreachable("unsupported unwinding information encoding"); case WinEH::EncodingType::Invalid: break; case WinEH::EncodingType::X86: case WinEH::EncodingType::Itanium: ES = new WinException(this); break; } break; case ExceptionHandling::Wasm: ES = new WasmException(this); break; } if (ES) Handlers.emplace_back(std::unique_ptr(ES), EHTimerName, EHTimerDescription, DWARFGroupName, DWARFGroupDescription); // Emit tables for any value of cfguard flag (i.e. cfguard=1 or cfguard=2). 
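  // Illustrative: a module built with Control Flow Guard enabled typically
  // carries a module flag roughly of the form !{i32 2, !"cfguard", i32 2};
  // the mere presence of the flag is what is tested below.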
if (mdconst::extract_or_null(M.getModuleFlag("cfguard"))) Handlers.emplace_back(std::make_unique(this), CFGuardName, CFGuardDescription, DWARFGroupName, DWARFGroupDescription); return false; } static bool canBeHidden(const GlobalValue *GV, const MCAsmInfo &MAI) { if (!MAI.hasWeakDefCanBeHiddenDirective()) return false; return GV->canBeOmittedFromSymbolTable(); } void AsmPrinter::emitLinkage(const GlobalValue *GV, MCSymbol *GVSym) const { GlobalValue::LinkageTypes Linkage = GV->getLinkage(); switch (Linkage) { case GlobalValue::CommonLinkage: case GlobalValue::LinkOnceAnyLinkage: case GlobalValue::LinkOnceODRLinkage: case GlobalValue::WeakAnyLinkage: case GlobalValue::WeakODRLinkage: if (MAI->hasWeakDefDirective()) { // .globl _foo OutStreamer->emitSymbolAttribute(GVSym, MCSA_Global); if (!canBeHidden(GV, *MAI)) // .weak_definition _foo OutStreamer->emitSymbolAttribute(GVSym, MCSA_WeakDefinition); else OutStreamer->emitSymbolAttribute(GVSym, MCSA_WeakDefAutoPrivate); } else if (MAI->avoidWeakIfComdat() && GV->hasComdat()) { // .globl _foo OutStreamer->emitSymbolAttribute(GVSym, MCSA_Global); //NOTE: linkonce is handled by the section the symbol was assigned to. } else { // .weak _foo OutStreamer->emitSymbolAttribute(GVSym, MCSA_Weak); } return; case GlobalValue::ExternalLinkage: OutStreamer->emitSymbolAttribute(GVSym, MCSA_Global); return; case GlobalValue::PrivateLinkage: case GlobalValue::InternalLinkage: return; case GlobalValue::ExternalWeakLinkage: case GlobalValue::AvailableExternallyLinkage: case GlobalValue::AppendingLinkage: llvm_unreachable("Should never emit this"); } llvm_unreachable("Unknown linkage type!"); } void AsmPrinter::getNameWithPrefix(SmallVectorImpl &Name, const GlobalValue *GV) const { TM.getNameWithPrefix(Name, GV, getObjFileLowering().getMangler()); } MCSymbol *AsmPrinter::getSymbol(const GlobalValue *GV) const { return TM.getSymbol(GV); } MCSymbol *AsmPrinter::getSymbolPreferLocal(const GlobalValue &GV) const { // On ELF, use .Lfoo$local if GV is a non-interposable GlobalObject with an // exact definion (intersection of GlobalValue::hasExactDefinition() and // !isInterposable()). These linkages include: external, appending, internal, // private. It may be profitable to use a local alias for external. The // assembler would otherwise be conservative and assume a global default // visibility symbol can be interposable, even if the code generator already // assumed it. if (TM.getTargetTriple().isOSBinFormatELF() && GV.canBenefitFromLocalAlias()) { const Module &M = *GV.getParent(); if (TM.getRelocationModel() != Reloc::Static && M.getPIELevel() == PIELevel::Default) if (GV.isDSOLocal() || (TM.getTargetTriple().isX86() && GV.getParent()->noSemanticInterposition())) return getSymbolWithGlobalValueBase(&GV, "$local"); } return TM.getSymbol(&GV); } /// EmitGlobalVariable - Emit the specified global variable to the .s file. void AsmPrinter::emitGlobalVariable(const GlobalVariable *GV) { bool IsEmuTLSVar = TM.useEmulatedTLS() && GV->isThreadLocal(); assert(!(IsEmuTLSVar && GV->hasCommonLinkage()) && "No emulated TLS variables in the common section"); // Never emit TLS variable xyz in emulated TLS model. // The initialization value is in __emutls_t.xyz instead of xyz. if (IsEmuTLSVar) return; if (GV->hasInitializer()) { // Check to see if this is a special global used by LLVM, if so, emit it. if (emitSpecialLLVMGlobal(GV)) return; // Skip the emission of global equivalents. The symbol can be emitted later // on by emitGlobalGOTEquivs in case it turns out to be needed. 
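  // (A global GOT equivalent is, roughly, an unnamed private constant whose
  // sole purpose is to hold the address of another symbol; its emission is
  // deferred until it is known to be needed.)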
if (GlobalGOTEquivs.count(getSymbol(GV))) return; if (isVerbose()) { // When printing the control variable __emutls_v.*, // we don't need to print the original TLS variable name. GV->printAsOperand(OutStreamer->GetCommentOS(), /*PrintType=*/false, GV->getParent()); OutStreamer->GetCommentOS() << '\n'; } } MCSymbol *GVSym = getSymbol(GV); MCSymbol *EmittedSym = GVSym; // getOrCreateEmuTLSControlSym only creates the symbol with name and default // attributes. // GV's or GVSym's attributes will be used for the EmittedSym. emitVisibility(EmittedSym, GV->getVisibility(), !GV->isDeclaration()); if (!GV->hasInitializer()) // External globals require no extra code. return; GVSym->redefineIfPossible(); if (GVSym->isDefined() || GVSym->isVariable()) report_fatal_error("symbol '" + Twine(GVSym->getName()) + "' is already defined"); if (MAI->hasDotTypeDotSizeDirective()) OutStreamer->emitSymbolAttribute(EmittedSym, MCSA_ELF_TypeObject); SectionKind GVKind = TargetLoweringObjectFile::getKindForGlobal(GV, TM); const DataLayout &DL = GV->getParent()->getDataLayout(); uint64_t Size = DL.getTypeAllocSize(GV->getValueType()); // If the alignment is specified, we *must* obey it. Overaligning a global // with a specified alignment is a prompt way to break globals emitted to // sections and expected to be contiguous (e.g. ObjC metadata). const Align Alignment = getGVAlignment(GV, DL); for (const HandlerInfo &HI : Handlers) { NamedRegionTimer T(HI.TimerName, HI.TimerDescription, HI.TimerGroupName, HI.TimerGroupDescription, TimePassesIsEnabled); HI.Handler->setSymbolSize(GVSym, Size); } // Handle common symbols if (GVKind.isCommon()) { if (Size == 0) Size = 1; // .comm Foo, 0 is undefined, avoid it. // .comm _foo, 42, 4 const bool SupportsAlignment = getObjFileLowering().getCommDirectiveSupportsAlignment(); OutStreamer->emitCommonSymbol(GVSym, Size, SupportsAlignment ? Alignment.value() : 0); return; } // Determine to which section this global should be emitted. MCSection *TheSection = getObjFileLowering().SectionForGlobal(GV, GVKind, TM); // If we have a bss global going to a section that supports the // zerofill directive, do so here. if (GVKind.isBSS() && MAI->hasMachoZeroFillDirective() && TheSection->isVirtualSection()) { if (Size == 0) Size = 1; // zerofill of 0 bytes is undefined. emitLinkage(GV, GVSym); // .zerofill __DATA, __bss, _foo, 400, 5 OutStreamer->emitZerofill(TheSection, GVSym, Size, Alignment.value()); return; } // If this is a BSS local symbol and we are emitting in the BSS // section use .lcomm/.comm directive. if (GVKind.isBSSLocal() && getObjFileLowering().getBSSSection() == TheSection) { if (Size == 0) Size = 1; // .comm Foo, 0 is undefined, avoid it. // Use .lcomm only if it supports user-specified alignment. // Otherwise, while it would still be correct to use .lcomm in some // cases (e.g. when Align == 1), the external assembler might enfore // some -unknown- default alignment behavior, which could cause // spurious differences between external and integrated assembler. // Prefer to simply fall back to .local / .comm in this case. if (MAI->getLCOMMDirectiveAlignmentType() != LCOMM::NoAlignment) { // .lcomm _foo, 42 OutStreamer->emitLocalCommonSymbol(GVSym, Size, Alignment.value()); return; } // .local _foo OutStreamer->emitSymbolAttribute(GVSym, MCSA_Local); // .comm _foo, 42, 4 const bool SupportsAlignment = getObjFileLowering().getCommDirectiveSupportsAlignment(); OutStreamer->emitCommonSymbol(GVSym, Size, SupportsAlignment ? 
Alignment.value() : 0); return; } // Handle thread local data for mach-o which requires us to output an // additional structure of data and mangle the original symbol so that we // can reference it later. // // TODO: This should become an "emit thread local global" method on TLOF. // All of this macho specific stuff should be sunk down into TLOFMachO and // stuff like "TLSExtraDataSection" should no longer be part of the parent // TLOF class. This will also make it more obvious that stuff like // MCStreamer::EmitTBSSSymbol is macho specific and only called from macho // specific code. if (GVKind.isThreadLocal() && MAI->hasMachoTBSSDirective()) { // Emit the .tbss symbol MCSymbol *MangSym = OutContext.getOrCreateSymbol(GVSym->getName() + Twine("$tlv$init")); if (GVKind.isThreadBSS()) { TheSection = getObjFileLowering().getTLSBSSSection(); OutStreamer->emitTBSSSymbol(TheSection, MangSym, Size, Alignment.value()); } else if (GVKind.isThreadData()) { OutStreamer->SwitchSection(TheSection); emitAlignment(Alignment, GV); OutStreamer->emitLabel(MangSym); emitGlobalConstant(GV->getParent()->getDataLayout(), GV->getInitializer()); } OutStreamer->AddBlankLine(); // Emit the variable struct for the runtime. MCSection *TLVSect = getObjFileLowering().getTLSExtraDataSection(); OutStreamer->SwitchSection(TLVSect); // Emit the linkage here. emitLinkage(GV, GVSym); OutStreamer->emitLabel(GVSym); // Three pointers in size: // - __tlv_bootstrap - used to make sure support exists // - spare pointer, used when mapped by the runtime // - pointer to mangled symbol above with initializer unsigned PtrSize = DL.getPointerTypeSize(GV->getType()); OutStreamer->emitSymbolValue(GetExternalSymbolSymbol("_tlv_bootstrap"), PtrSize); OutStreamer->emitIntValue(0, PtrSize); OutStreamer->emitSymbolValue(MangSym, PtrSize); OutStreamer->AddBlankLine(); return; } MCSymbol *EmittedInitSym = GVSym; OutStreamer->SwitchSection(TheSection); emitLinkage(GV, EmittedInitSym); emitAlignment(Alignment, GV); OutStreamer->emitLabel(EmittedInitSym); MCSymbol *LocalAlias = getSymbolPreferLocal(*GV); if (LocalAlias != EmittedInitSym) OutStreamer->emitLabel(LocalAlias); emitGlobalConstant(GV->getParent()->getDataLayout(), GV->getInitializer()); if (MAI->hasDotTypeDotSizeDirective()) // .size foo, 42 OutStreamer->emitELFSize(EmittedInitSym, MCConstantExpr::create(Size, OutContext)); OutStreamer->AddBlankLine(); } /// Emit the directive and value for debug thread local expression /// /// \p Value - The value to emit. /// \p Size - The size of the integer (in bytes) to emit. void AsmPrinter::emitDebugValue(const MCExpr *Value, unsigned Size) const { OutStreamer->emitValue(Value, Size); } void AsmPrinter::emitFunctionHeaderComment() {} /// EmitFunctionHeader - This method emits the header for the current /// function. void AsmPrinter::emitFunctionHeader() { const Function &F = MF->getFunction(); if (isVerbose()) OutStreamer->GetCommentOS() << "-- Begin function " << GlobalValue::dropLLVMManglingEscape(F.getName()) << '\n'; // Print out constants referenced by the function emitConstantPool(); // Print the 'header' of function. 
MF->setSection(getObjFileLowering().SectionForGlobal(&F, TM)); OutStreamer->SwitchSection(MF->getSection()); if (!MAI->hasVisibilityOnlyWithLinkage()) emitVisibility(CurrentFnSym, F.getVisibility()); if (MAI->needsFunctionDescriptors()) emitLinkage(&F, CurrentFnDescSym); emitLinkage(&F, CurrentFnSym); if (MAI->hasFunctionAlignment()) emitAlignment(MF->getAlignment(), &F); if (MAI->hasDotTypeDotSizeDirective()) OutStreamer->emitSymbolAttribute(CurrentFnSym, MCSA_ELF_TypeFunction); if (F.hasFnAttribute(Attribute::Cold)) OutStreamer->emitSymbolAttribute(CurrentFnSym, MCSA_Cold); if (isVerbose()) { F.printAsOperand(OutStreamer->GetCommentOS(), /*PrintType=*/false, F.getParent()); emitFunctionHeaderComment(); OutStreamer->GetCommentOS() << '\n'; } // Emit the prefix data. if (F.hasPrefixData()) { if (MAI->hasSubsectionsViaSymbols()) { // Preserving prefix data on platforms which use subsections-via-symbols // is a bit tricky. Here we introduce a symbol for the prefix data // and use the .alt_entry attribute to mark the function's real entry point // as an alternative entry point to the prefix-data symbol. MCSymbol *PrefixSym = OutContext.createLinkerPrivateTempSymbol(); OutStreamer->emitLabel(PrefixSym); emitGlobalConstant(F.getParent()->getDataLayout(), F.getPrefixData()); // Emit an .alt_entry directive for the actual function symbol. OutStreamer->emitSymbolAttribute(CurrentFnSym, MCSA_AltEntry); } else { emitGlobalConstant(F.getParent()->getDataLayout(), F.getPrefixData()); } } // Emit M NOPs for -fpatchable-function-entry=N,M where M>0. We arbitrarily // place prefix data before NOPs. unsigned PatchableFunctionPrefix = 0; unsigned PatchableFunctionEntry = 0; (void)F.getFnAttribute("patchable-function-prefix") .getValueAsString() .getAsInteger(10, PatchableFunctionPrefix); (void)F.getFnAttribute("patchable-function-entry") .getValueAsString() .getAsInteger(10, PatchableFunctionEntry); if (PatchableFunctionPrefix) { CurrentPatchableFunctionEntrySym = OutContext.createLinkerPrivateTempSymbol(); OutStreamer->emitLabel(CurrentPatchableFunctionEntrySym); emitNops(PatchableFunctionPrefix); } else if (PatchableFunctionEntry) { // May be reassigned when emitting the body, to reference the label after // the initial BTI (AArch64) or endbr32/endbr64 (x86). CurrentPatchableFunctionEntrySym = CurrentFnBegin; } // Emit the function descriptor. This is a virtual function to allow targets // to emit their specific function descriptor. Right now it is only used by // the AIX target. The PowerPC 64-bit V1 ELF target also uses function // descriptors and should be converted to use this hook as well. if (MAI->needsFunctionDescriptors()) emitFunctionDescriptor(); // Emit the CurrentFnSym. This is a virtual function to allow targets to do // their wild and crazy things as required. emitFunctionEntryLabel(); if (CurrentFnBegin) { if (MAI->useAssignmentForEHBegin()) { MCSymbol *CurPos = OutContext.createTempSymbol(); OutStreamer->emitLabel(CurPos); OutStreamer->emitAssignment(CurrentFnBegin, MCSymbolRefExpr::create(CurPos, OutContext)); } else { OutStreamer->emitLabel(CurrentFnBegin); } } // Emit pre-function debug and/or EH information. for (const HandlerInfo &HI : Handlers) { NamedRegionTimer T(HI.TimerName, HI.TimerDescription, HI.TimerGroupName, HI.TimerGroupDescription, TimePassesIsEnabled); HI.Handler->beginFunction(MF); } // Emit the prologue data. 
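  // Prologue data is attached in IR roughly as
  //   define void @f() prologue <ty> <const> { ... }
  // (illustrative syntax) and is placed here, after the entry label and before
  // the first real instruction of the body.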
if (F.hasPrologueData()) emitGlobalConstant(F.getParent()->getDataLayout(), F.getPrologueData()); } /// EmitFunctionEntryLabel - Emit the label that is the entrypoint for the /// function. This can be overridden by targets as required to do custom stuff. void AsmPrinter::emitFunctionEntryLabel() { CurrentFnSym->redefineIfPossible(); // The function label could have already been emitted if two symbols end up // conflicting due to asm renaming. Detect this and emit an error. if (CurrentFnSym->isVariable()) report_fatal_error("'" + Twine(CurrentFnSym->getName()) + "' is a protected alias"); if (CurrentFnSym->isDefined()) report_fatal_error("'" + Twine(CurrentFnSym->getName()) + "' label emitted multiple times to assembly file"); OutStreamer->emitLabel(CurrentFnSym); if (TM.getTargetTriple().isOSBinFormatELF()) { MCSymbol *Sym = getSymbolPreferLocal(MF->getFunction()); if (Sym != CurrentFnSym) OutStreamer->emitLabel(Sym); } } /// emitComments - Pretty-print comments for instructions. static void emitComments(const MachineInstr &MI, raw_ostream &CommentOS) { const MachineFunction *MF = MI.getMF(); const TargetInstrInfo *TII = MF->getSubtarget().getInstrInfo(); // Check for spills and reloads // We assume a single instruction only has a spill or reload, not // both. Optional Size; if ((Size = MI.getRestoreSize(TII))) { CommentOS << *Size << "-byte Reload\n"; } else if ((Size = MI.getFoldedRestoreSize(TII))) { if (*Size) CommentOS << *Size << "-byte Folded Reload\n"; } else if ((Size = MI.getSpillSize(TII))) { CommentOS << *Size << "-byte Spill\n"; } else if ((Size = MI.getFoldedSpillSize(TII))) { if (*Size) CommentOS << *Size << "-byte Folded Spill\n"; } // Check for spill-induced copies if (MI.getAsmPrinterFlag(MachineInstr::ReloadReuse)) CommentOS << " Reload Reuse\n"; } /// emitImplicitDef - This method emits the specified machine instruction /// that is an implicit def. void AsmPrinter::emitImplicitDef(const MachineInstr *MI) const { Register RegNo = MI->getOperand(0).getReg(); SmallString<128> Str; raw_svector_ostream OS(Str); OS << "implicit-def: " << printReg(RegNo, MF->getSubtarget().getRegisterInfo()); OutStreamer->AddComment(OS.str()); OutStreamer->AddBlankLine(); } static void emitKill(const MachineInstr *MI, AsmPrinter &AP) { std::string Str; raw_string_ostream OS(Str); OS << "kill:"; for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) { const MachineOperand &Op = MI->getOperand(i); assert(Op.isReg() && "KILL instruction must have only register operands"); OS << ' ' << (Op.isDef() ? "def " : "killed ") << printReg(Op.getReg(), AP.MF->getSubtarget().getRegisterInfo()); } AP.OutStreamer->AddComment(OS.str()); AP.OutStreamer->AddBlankLine(); } /// emitDebugValueComment - This method handles the target-independent form /// of DBG_VALUE, returning true if it was able to do so. A false return /// means the target will need to handle MI in EmitInstruction. static bool emitDebugValueComment(const MachineInstr *MI, AsmPrinter &AP) { // This code handles only the 4-operand target-independent form. if (MI->getNumOperands() != 4) return false; SmallString<128> Str; raw_svector_ostream OS(Str); OS << "DEBUG_VALUE: "; const DILocalVariable *V = MI->getDebugVariable(); if (auto *SP = dyn_cast(V->getScope())) { StringRef Name = SP->getName(); if (!Name.empty()) OS << Name << ":"; } OS << V->getName(); OS << " <- "; // The second operand is only an offset if it's an immediate. bool MemLoc = MI->isIndirectDebugValue(); int64_t Offset = MemLoc ? 
MI->getOperand(1).getImm() : 0; const DIExpression *Expr = MI->getDebugExpression(); if (Expr->getNumElements()) { OS << '['; bool NeedSep = false; for (auto Op : Expr->expr_ops()) { if (NeedSep) OS << ", "; else NeedSep = true; OS << dwarf::OperationEncodingString(Op.getOp()); for (unsigned I = 0; I < Op.getNumArgs(); ++I) OS << ' ' << Op.getArg(I); } OS << "] "; } // Register or immediate value. Register 0 means undef. if (MI->getDebugOperand(0).isFPImm()) { APFloat APF = APFloat(MI->getDebugOperand(0).getFPImm()->getValueAPF()); if (MI->getDebugOperand(0).getFPImm()->getType()->isFloatTy()) { OS << (double)APF.convertToFloat(); } else if (MI->getDebugOperand(0).getFPImm()->getType()->isDoubleTy()) { OS << APF.convertToDouble(); } else { // There is no good way to print long double. Convert a copy to // double. Ah well, it's only a comment. bool ignored; APF.convert(APFloat::IEEEdouble(), APFloat::rmNearestTiesToEven, &ignored); OS << "(long double) " << APF.convertToDouble(); } } else if (MI->getDebugOperand(0).isImm()) { OS << MI->getDebugOperand(0).getImm(); } else if (MI->getDebugOperand(0).isCImm()) { MI->getDebugOperand(0).getCImm()->getValue().print(OS, false /*isSigned*/); } else if (MI->getDebugOperand(0).isTargetIndex()) { auto Op = MI->getDebugOperand(0); OS << "!target-index(" << Op.getIndex() << "," << Op.getOffset() << ")"; return true; } else { Register Reg; if (MI->getDebugOperand(0).isReg()) { Reg = MI->getDebugOperand(0).getReg(); } else { assert(MI->getDebugOperand(0).isFI() && "Unknown operand type"); const TargetFrameLowering *TFI = AP.MF->getSubtarget().getFrameLowering(); Offset += TFI->getFrameIndexReference( *AP.MF, MI->getDebugOperand(0).getIndex(), Reg); MemLoc = true; } if (Reg == 0) { // Suppress offset, it is not meaningful here. OS << "undef"; // NOTE: Want this comment at start of line, don't emit with AddComment. AP.OutStreamer->emitRawComment(OS.str()); return true; } if (MemLoc) OS << '['; OS << printReg(Reg, AP.MF->getSubtarget().getRegisterInfo()); } if (MemLoc) OS << '+' << Offset << ']'; // NOTE: Want this comment at start of line, don't emit with AddComment. AP.OutStreamer->emitRawComment(OS.str()); return true; } /// This method handles the target-independent form of DBG_LABEL, returning /// true if it was able to do so. A false return means the target will need /// to handle MI in EmitInstruction. static bool emitDebugLabelComment(const MachineInstr *MI, AsmPrinter &AP) { if (MI->getNumOperands() != 1) return false; SmallString<128> Str; raw_svector_ostream OS(Str); OS << "DEBUG_LABEL: "; const DILabel *V = MI->getDebugLabel(); if (auto *SP = dyn_cast( V->getScope()->getNonLexicalBlockFileScope())) { StringRef Name = SP->getName(); if (!Name.empty()) OS << Name << ":"; } OS << V->getName(); // NOTE: Want this comment at start of line, don't emit with AddComment. 
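  // The raw comment is printed with the target's comment prefix, so the
  // assembly output looks roughly like "# DEBUG_LABEL: foo:lbl" on targets
  // whose comment string is '#'.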
AP.OutStreamer->emitRawComment(OS.str()); return true; } AsmPrinter::CFIMoveType AsmPrinter::needsCFIMoves() const { if (MAI->getExceptionHandlingType() == ExceptionHandling::DwarfCFI && MF->getFunction().needsUnwindTableEntry()) return CFI_M_EH; if (MMI->hasDebugInfo() || MF->getTarget().Options.ForceDwarfFrameSection) return CFI_M_Debug; return CFI_M_None; } bool AsmPrinter::needsSEHMoves() { return MAI->usesWindowsCFI() && MF->getFunction().needsUnwindTableEntry(); } void AsmPrinter::emitCFIInstruction(const MachineInstr &MI) { ExceptionHandling ExceptionHandlingType = MAI->getExceptionHandlingType(); if (ExceptionHandlingType != ExceptionHandling::DwarfCFI && ExceptionHandlingType != ExceptionHandling::ARM) return; if (needsCFIMoves() == CFI_M_None) return; // If there is no "real" instruction following this CFI instruction, skip // emitting it; it would be beyond the end of the function's FDE range. auto *MBB = MI.getParent(); auto I = std::next(MI.getIterator()); while (I != MBB->end() && I->isTransient()) ++I; if (I == MBB->instr_end() && MBB->getReverseIterator() == MBB->getParent()->rbegin()) return; const std::vector &Instrs = MF->getFrameInstructions(); unsigned CFIIndex = MI.getOperand(0).getCFIIndex(); const MCCFIInstruction &CFI = Instrs[CFIIndex]; emitCFIInstruction(CFI); } void AsmPrinter::emitFrameAlloc(const MachineInstr &MI) { // The operands are the MCSymbol and the frame offset of the allocation. MCSymbol *FrameAllocSym = MI.getOperand(0).getMCSymbol(); int FrameOffset = MI.getOperand(1).getImm(); // Emit a symbol assignment. OutStreamer->emitAssignment(FrameAllocSym, MCConstantExpr::create(FrameOffset, OutContext)); } void AsmPrinter::emitStackSizeSection(const MachineFunction &MF) { if (!MF.getTarget().Options.EmitStackSizeSection) return; MCSection *StackSizeSection = getObjFileLowering().getStackSizesSection(*getCurrentSection()); if (!StackSizeSection) return; const MachineFrameInfo &FrameInfo = MF.getFrameInfo(); // Don't emit functions with dynamic stack allocations. if (FrameInfo.hasVarSizedObjects()) return; OutStreamer->PushSection(); OutStreamer->SwitchSection(StackSizeSection); const MCSymbol *FunctionSymbol = getFunctionBegin(); uint64_t StackSize = FrameInfo.getStackSize(); OutStreamer->emitSymbolValue(FunctionSymbol, TM.getProgramPointerSize()); OutStreamer->emitULEB128IntValue(StackSize); OutStreamer->PopSection(); } static bool needFuncLabelsForEHOrDebugInfo(const MachineFunction &MF) { MachineModuleInfo &MMI = MF.getMMI(); if (!MF.getLandingPads().empty() || MF.hasEHFunclets() || MMI.hasDebugInfo()) return true; // We might emit an EH table that uses function begin and end labels even if // we don't have any landingpads. if (!MF.getFunction().hasPersonalityFn()) return false; return !isNoOpWithoutInvoke( classifyEHPersonality(MF.getFunction().getPersonalityFn())); } /// EmitFunctionBody - This method emits the body and trailer for a /// function. void AsmPrinter::emitFunctionBody() { emitFunctionHeader(); // Emit target-specific gunk before the function body. 
emitFunctionBodyStart(); bool ShouldPrintDebugScopes = MMI->hasDebugInfo(); if (isVerbose()) { // Get MachineDominatorTree or compute it on the fly if it's unavailable MDT = getAnalysisIfAvailable(); if (!MDT) { OwnedMDT = std::make_unique(); OwnedMDT->getBase().recalculate(*MF); MDT = OwnedMDT.get(); } // Get MachineLoopInfo or compute it on the fly if it's unavailable MLI = getAnalysisIfAvailable(); if (!MLI) { OwnedMLI = std::make_unique(); OwnedMLI->getBase().analyze(MDT->getBase()); MLI = OwnedMLI.get(); } } // Print out code for the function. bool HasAnyRealCode = false; int NumInstsInFunction = 0; for (auto &MBB : *MF) { // Print a label for the basic block. emitBasicBlockStart(MBB); for (auto &MI : MBB) { // Print the assembly for the instruction. if (!MI.isPosition() && !MI.isImplicitDef() && !MI.isKill() && !MI.isDebugInstr()) { HasAnyRealCode = true; ++NumInstsInFunction; } // If there is a pre-instruction symbol, emit a label for it here. if (MCSymbol *S = MI.getPreInstrSymbol()) OutStreamer->emitLabel(S); if (ShouldPrintDebugScopes) { for (const HandlerInfo &HI : Handlers) { NamedRegionTimer T(HI.TimerName, HI.TimerDescription, HI.TimerGroupName, HI.TimerGroupDescription, TimePassesIsEnabled); HI.Handler->beginInstruction(&MI); } } if (isVerbose()) emitComments(MI, OutStreamer->GetCommentOS()); switch (MI.getOpcode()) { case TargetOpcode::CFI_INSTRUCTION: emitCFIInstruction(MI); break; case TargetOpcode::LOCAL_ESCAPE: emitFrameAlloc(MI); break; case TargetOpcode::ANNOTATION_LABEL: case TargetOpcode::EH_LABEL: case TargetOpcode::GC_LABEL: OutStreamer->emitLabel(MI.getOperand(0).getMCSymbol()); break; case TargetOpcode::INLINEASM: case TargetOpcode::INLINEASM_BR: emitInlineAsm(&MI); break; case TargetOpcode::DBG_VALUE: if (isVerbose()) { if (!emitDebugValueComment(&MI, *this)) emitInstruction(&MI); } break; case TargetOpcode::DBG_LABEL: if (isVerbose()) { if (!emitDebugLabelComment(&MI, *this)) emitInstruction(&MI); } break; case TargetOpcode::IMPLICIT_DEF: if (isVerbose()) emitImplicitDef(&MI); break; case TargetOpcode::KILL: if (isVerbose()) emitKill(&MI, *this); break; default: emitInstruction(&MI); break; } // If there is a post-instruction symbol, emit a label for it here. if (MCSymbol *S = MI.getPostInstrSymbol()) OutStreamer->emitLabel(S); if (ShouldPrintDebugScopes) { for (const HandlerInfo &HI : Handlers) { NamedRegionTimer T(HI.TimerName, HI.TimerDescription, HI.TimerGroupName, HI.TimerGroupDescription, TimePassesIsEnabled); HI.Handler->endInstruction(); } } } // We need a temporary symbol for the end of this basic block, if either we // have BBLabels enabled and we want to emit size directive for the BBs, or // if this basic blocks marks the end of a section (except the section // containing the entry basic block as the end symbol for that section is // CurrentFnEnd). MCSymbol *CurrentBBEnd = nullptr; if ((MAI->hasDotTypeDotSizeDirective() && MF->hasBBLabels()) || (MBB.isEndSection() && !MBB.sameSection(&MF->front()))) { CurrentBBEnd = OutContext.createTempSymbol(); OutStreamer->emitLabel(CurrentBBEnd); } // Helper for emitting the size directive associated with a basic block // symbol. 
auto emitELFSizeDirective = [&](MCSymbol *SymForSize) { assert(CurrentBBEnd && "Basicblock end symbol not set!"); const MCExpr *SizeExp = MCBinaryExpr::createSub( MCSymbolRefExpr::create(CurrentBBEnd, OutContext), MCSymbolRefExpr::create(SymForSize, OutContext), OutContext); OutStreamer->emitELFSize(SymForSize, SizeExp); }; // Emit size directive for the size of each basic block, if BBLabels is // enabled. if (MAI->hasDotTypeDotSizeDirective() && MF->hasBBLabels()) emitELFSizeDirective(MBB.getSymbol()); // Emit size directive for the size of each basic block section once we // get to the end of that section. if (MBB.isEndSection()) { if (!MBB.sameSection(&MF->front())) { if (MAI->hasDotTypeDotSizeDirective()) emitELFSizeDirective(CurrentSectionBeginSym); MBBSectionRanges[MBB.getSectionIDNum()] = MBBSectionRange{CurrentSectionBeginSym, CurrentBBEnd}; } } emitBasicBlockEnd(MBB); } EmittedInsts += NumInstsInFunction; MachineOptimizationRemarkAnalysis R(DEBUG_TYPE, "InstructionCount", MF->getFunction().getSubprogram(), &MF->front()); R << ore::NV("NumInstructions", NumInstsInFunction) << " instructions in function"; ORE->emit(R); // If the function is empty and the object file uses .subsections_via_symbols, // then we need to emit *something* to the function body to prevent the // labels from collapsing together. Just emit a noop. // Similarly, don't emit empty functions on Windows either. It can lead to // duplicate entries (two functions with the same RVA) in the Guard CF Table // after linking, causing the kernel not to load the binary: // https://developercommunity.visualstudio.com/content/problem/45366/vc-linker-creates-invalid-dll-with-clang-cl.html // FIXME: Hide this behind some API in e.g. MCAsmInfo or MCTargetStreamer. const Triple &TT = TM.getTargetTriple(); if (!HasAnyRealCode && (MAI->hasSubsectionsViaSymbols() || (TT.isOSWindows() && TT.isOSBinFormatCOFF()))) { MCInst Noop; MF->getSubtarget().getInstrInfo()->getNoop(Noop); // Targets can opt-out of emitting the noop here by leaving the opcode // unspecified. if (Noop.getOpcode()) { OutStreamer->AddComment("avoids zero-length function"); emitNops(1); } } // Switch to the original section in case basic block sections was used. OutStreamer->SwitchSection(MF->getSection()); const Function &F = MF->getFunction(); for (const auto &BB : F) { if (!BB.hasAddressTaken()) continue; MCSymbol *Sym = GetBlockAddressSymbol(&BB); if (Sym->isDefined()) continue; OutStreamer->AddComment("Address of block that was removed by CodeGen"); OutStreamer->emitLabel(Sym); } // Emit target-specific gunk after the function body. emitFunctionBodyEnd(); if (needFuncLabelsForEHOrDebugInfo(*MF) || MAI->hasDotTypeDotSizeDirective()) { // Create a symbol for the end of function. CurrentFnEnd = createTempSymbol("func_end"); OutStreamer->emitLabel(CurrentFnEnd); } // If the target wants a .size directive for the size of the function, emit // it. if (MAI->hasDotTypeDotSizeDirective()) { // We can get the size as difference between the function label and the // temp label. 
const MCExpr *SizeExp = MCBinaryExpr::createSub( MCSymbolRefExpr::create(CurrentFnEnd, OutContext), MCSymbolRefExpr::create(CurrentFnSymForSize, OutContext), OutContext); OutStreamer->emitELFSize(CurrentFnSym, SizeExp); } for (const HandlerInfo &HI : Handlers) { NamedRegionTimer T(HI.TimerName, HI.TimerDescription, HI.TimerGroupName, HI.TimerGroupDescription, TimePassesIsEnabled); HI.Handler->markFunctionEnd(); } MBBSectionRanges[MF->front().getSectionIDNum()] = MBBSectionRange{CurrentFnBegin, CurrentFnEnd}; // Print out jump tables referenced by the function. emitJumpTableInfo(); // Emit post-function debug and/or EH information. for (const HandlerInfo &HI : Handlers) { NamedRegionTimer T(HI.TimerName, HI.TimerDescription, HI.TimerGroupName, HI.TimerGroupDescription, TimePassesIsEnabled); HI.Handler->endFunction(MF); } // Emit section containing stack size metadata. emitStackSizeSection(*MF); emitPatchableFunctionEntries(); if (isVerbose()) OutStreamer->GetCommentOS() << "-- End function\n"; OutStreamer->AddBlankLine(); } /// Compute the number of Global Variables that uses a Constant. static unsigned getNumGlobalVariableUses(const Constant *C) { if (!C) return 0; if (isa(C)) return 1; unsigned NumUses = 0; for (auto *CU : C->users()) NumUses += getNumGlobalVariableUses(dyn_cast(CU)); return NumUses; } /// Only consider global GOT equivalents if at least one user is a /// cstexpr inside an initializer of another global variables. Also, don't /// handle cstexpr inside instructions. During global variable emission, /// candidates are skipped and are emitted later in case at least one cstexpr /// isn't replaced by a PC relative GOT entry access. static bool isGOTEquivalentCandidate(const GlobalVariable *GV, unsigned &NumGOTEquivUsers) { // Global GOT equivalents are unnamed private globals with a constant // pointer initializer to another global symbol. They must point to a // GlobalVariable or Function, i.e., as GlobalValue. if (!GV->hasGlobalUnnamedAddr() || !GV->hasInitializer() || !GV->isConstant() || !GV->isDiscardableIfUnused() || !isa(GV->getOperand(0))) return false; // To be a got equivalent, at least one of its users need to be a constant // expression used by another global variable. for (auto *U : GV->users()) NumGOTEquivUsers += getNumGlobalVariableUses(dyn_cast(U)); return NumGOTEquivUsers > 0; } /// Unnamed constant global variables solely contaning a pointer to /// another globals variable is equivalent to a GOT table entry; it contains the /// the address of another symbol. Optimize it and replace accesses to these /// "GOT equivalents" by using the GOT entry for the final global instead. /// Compute GOT equivalent candidates among all global variables to avoid /// emitting them if possible later on, after it use is replaced by a GOT entry /// access. void AsmPrinter::computeGlobalGOTEquivs(Module &M) { if (!getObjFileLowering().supportIndirectSymViaGOTPCRel()) return; for (const auto &G : M.globals()) { unsigned NumGOTEquivUsers = 0; if (!isGOTEquivalentCandidate(&G, NumGOTEquivUsers)) continue; const MCSymbol *GOTEquivSym = getSymbol(&G); GlobalGOTEquivs[GOTEquivSym] = std::make_pair(&G, NumGOTEquivUsers); } } /// Constant expressions using GOT equivalent globals may not be eligible /// for PC relative GOT entry conversion, in such cases we need to emit such /// globals we previously omitted in EmitGlobalVariable. 
void AsmPrinter::emitGlobalGOTEquivs() { if (!getObjFileLowering().supportIndirectSymViaGOTPCRel()) return; SmallVector FailedCandidates; for (auto &I : GlobalGOTEquivs) { const GlobalVariable *GV = I.second.first; unsigned Cnt = I.second.second; if (Cnt) FailedCandidates.push_back(GV); } GlobalGOTEquivs.clear(); for (auto *GV : FailedCandidates) emitGlobalVariable(GV); } void AsmPrinter::emitGlobalIndirectSymbol(Module &M, const GlobalIndirectSymbol& GIS) { MCSymbol *Name = getSymbol(&GIS); if (GIS.hasExternalLinkage() || !MAI->getWeakRefDirective()) OutStreamer->emitSymbolAttribute(Name, MCSA_Global); else if (GIS.hasWeakLinkage() || GIS.hasLinkOnceLinkage()) OutStreamer->emitSymbolAttribute(Name, MCSA_WeakReference); else assert(GIS.hasLocalLinkage() && "Invalid alias or ifunc linkage"); bool IsFunction = GIS.getValueType()->isFunctionTy(); // Treat bitcasts of functions as functions also. This is important at least // on WebAssembly where object and function addresses can't alias each other. if (!IsFunction) if (auto *CE = dyn_cast(GIS.getIndirectSymbol())) if (CE->getOpcode() == Instruction::BitCast) IsFunction = CE->getOperand(0)->getType()->getPointerElementType()->isFunctionTy(); // Set the symbol type to function if the alias has a function type. // This affects codegen when the aliasee is not a function. if (IsFunction) OutStreamer->emitSymbolAttribute(Name, isa(GIS) ? MCSA_ELF_TypeIndFunction : MCSA_ELF_TypeFunction); emitVisibility(Name, GIS.getVisibility()); const MCExpr *Expr = lowerConstant(GIS.getIndirectSymbol()); if (isa(&GIS) && MAI->hasAltEntry() && isa(Expr)) OutStreamer->emitSymbolAttribute(Name, MCSA_AltEntry); // Emit the directives as assignments aka .set: OutStreamer->emitAssignment(Name, Expr); MCSymbol *LocalAlias = getSymbolPreferLocal(GIS); if (LocalAlias != Name) OutStreamer->emitAssignment(LocalAlias, Expr); if (auto *GA = dyn_cast(&GIS)) { // If the aliasee does not correspond to a symbol in the output, i.e. the // alias is not of an object or the aliased object is private, then set the // size of the alias symbol from the type of the alias. We don't do this in // other situations as the alias and aliasee having differing types but same // size may be intentional. const GlobalObject *BaseObject = GA->getBaseObject(); if (MAI->hasDotTypeDotSizeDirective() && GA->getValueType()->isSized() && (!BaseObject || BaseObject->hasPrivateLinkage())) { const DataLayout &DL = M.getDataLayout(); uint64_t Size = DL.getTypeAllocSize(GA->getValueType()); OutStreamer->emitELFSize(Name, MCConstantExpr::create(Size, OutContext)); } } } void AsmPrinter::emitRemarksSection(remarks::RemarkStreamer &RS) { if (!RS.needsSection()) return; remarks::RemarkSerializer &RemarkSerializer = RS.getSerializer(); Optional> Filename; if (Optional FilenameRef = RS.getFilename()) { Filename = *FilenameRef; sys::fs::make_absolute(*Filename); assert(!Filename->empty() && "The filename can't be empty."); } std::string Buf; raw_string_ostream OS(Buf); std::unique_ptr MetaSerializer = Filename ? RemarkSerializer.metaSerializer(OS, StringRef(*Filename)) : RemarkSerializer.metaSerializer(OS); MetaSerializer->emit(); // Switch to the remarks section. 
MCSection *RemarksSection = OutContext.getObjectFileInfo()->getRemarksSection(); OutStreamer->SwitchSection(RemarksSection); OutStreamer->emitBinaryData(OS.str()); } bool AsmPrinter::doFinalization(Module &M) { // Set the MachineFunction to nullptr so that we can catch attempted // accesses to MF specific features at the module level and so that // we can conditionalize accesses based on whether or not it is nullptr. MF = nullptr; // Gather all GOT equivalent globals in the module. We really need two // passes over the globals: one to compute and another to avoid its emission // in EmitGlobalVariable, otherwise we would not be able to handle cases // where the got equivalent shows up before its use. computeGlobalGOTEquivs(M); // Emit global variables. for (const auto &G : M.globals()) emitGlobalVariable(&G); // Emit remaining GOT equivalent globals. emitGlobalGOTEquivs(); const TargetLoweringObjectFile &TLOF = getObjFileLowering(); // Emit linkage(XCOFF) and visibility info for declarations for (const Function &F : M) { if (!F.isDeclarationForLinker()) continue; MCSymbol *Name = getSymbol(&F); // Function getSymbol gives us the function descriptor symbol for XCOFF. if (!TM.getTargetTriple().isOSBinFormatXCOFF()) { GlobalValue::VisibilityTypes V = F.getVisibility(); if (V == GlobalValue::DefaultVisibility) continue; emitVisibility(Name, V, false); continue; } if (F.isIntrinsic()) continue; // Handle the XCOFF case. // Variable `Name` is the function descriptor symbol (see above). Get the // function entry point symbol. MCSymbol *FnEntryPointSym = TLOF.getFunctionEntryPointSymbol(&F, TM); if (cast(FnEntryPointSym)->hasRepresentedCsectSet()) // Emit linkage for the function entry point. emitLinkage(&F, FnEntryPointSym); // Emit linkage for the function descriptor. emitLinkage(&F, Name); } // Emit the remarks section contents. // FIXME: Figure out when is the safest time to emit this section. It should // not come after debug info. if (remarks::RemarkStreamer *RS = M.getContext().getMainRemarkStreamer()) emitRemarksSection(*RS); TLOF.emitModuleMetadata(*OutStreamer, M); if (TM.getTargetTriple().isOSBinFormatELF()) { MachineModuleInfoELF &MMIELF = MMI->getObjFileInfo(); // Output stubs for external and common global variables. MachineModuleInfoELF::SymbolListTy Stubs = MMIELF.GetGVStubList(); if (!Stubs.empty()) { OutStreamer->SwitchSection(TLOF.getDataSection()); const DataLayout &DL = M.getDataLayout(); emitAlignment(Align(DL.getPointerSize())); for (const auto &Stub : Stubs) { OutStreamer->emitLabel(Stub.first); OutStreamer->emitSymbolValue(Stub.second.getPointer(), DL.getPointerSize()); } } } if (TM.getTargetTriple().isOSBinFormatCOFF()) { MachineModuleInfoCOFF &MMICOFF = MMI->getObjFileInfo(); // Output stubs for external and common global variables. 
MachineModuleInfoCOFF::SymbolListTy Stubs = MMICOFF.GetGVStubList(); if (!Stubs.empty()) { const DataLayout &DL = M.getDataLayout(); for (const auto &Stub : Stubs) { SmallString<256> SectionName = StringRef(".rdata$"); SectionName += Stub.first->getName(); OutStreamer->SwitchSection(OutContext.getCOFFSection( SectionName, COFF::IMAGE_SCN_CNT_INITIALIZED_DATA | COFF::IMAGE_SCN_MEM_READ | COFF::IMAGE_SCN_LNK_COMDAT, SectionKind::getReadOnly(), Stub.first->getName(), COFF::IMAGE_COMDAT_SELECT_ANY)); emitAlignment(Align(DL.getPointerSize())); OutStreamer->emitSymbolAttribute(Stub.first, MCSA_Global); OutStreamer->emitLabel(Stub.first); OutStreamer->emitSymbolValue(Stub.second.getPointer(), DL.getPointerSize()); } } } // Finalize debug and EH information. for (const HandlerInfo &HI : Handlers) { NamedRegionTimer T(HI.TimerName, HI.TimerDescription, HI.TimerGroupName, HI.TimerGroupDescription, TimePassesIsEnabled); HI.Handler->endModule(); } Handlers.clear(); DD = nullptr; // If the target wants to know about weak references, print them all. if (MAI->getWeakRefDirective()) { // FIXME: This is not lazy, it would be nice to only print weak references // to stuff that is actually used. Note that doing so would require targets // to notice uses in operands (due to constant exprs etc). This should // happen with the MC stuff eventually. // Print out module-level global objects here. for (const auto &GO : M.global_objects()) { if (!GO.hasExternalWeakLinkage()) continue; OutStreamer->emitSymbolAttribute(getSymbol(&GO), MCSA_WeakReference); } } // Print aliases in topological order, that is, for each alias a = b, // b must be printed before a. // This is because on some targets (e.g. PowerPC) linker expects aliases in // such an order to generate correct TOC information. SmallVector AliasStack; SmallPtrSet AliasVisited; for (const auto &Alias : M.aliases()) { for (const GlobalAlias *Cur = &Alias; Cur; Cur = dyn_cast(Cur->getAliasee())) { if (!AliasVisited.insert(Cur).second) break; AliasStack.push_back(Cur); } for (const GlobalAlias *AncestorAlias : llvm::reverse(AliasStack)) emitGlobalIndirectSymbol(M, *AncestorAlias); AliasStack.clear(); } for (const auto &IFunc : M.ifuncs()) emitGlobalIndirectSymbol(M, IFunc); GCModuleInfo *MI = getAnalysisIfAvailable(); assert(MI && "AsmPrinter didn't require GCModuleInfo?"); for (GCModuleInfo::iterator I = MI->end(), E = MI->begin(); I != E; ) if (GCMetadataPrinter *MP = GetOrCreateGCPrinter(**--I)) MP->finishAssembly(M, *MI, *this); // Emit llvm.ident metadata in an '.ident' directive. emitModuleIdents(M); // Emit bytes for llvm.commandline metadata. emitModuleCommandLines(M); // Emit __morestack address if needed for indirect calls. if (MMI->usesMorestackAddr()) { Align Alignment(1); MCSection *ReadOnlySection = getObjFileLowering().getSectionForConstant( getDataLayout(), SectionKind::getReadOnly(), /*C=*/nullptr, Alignment); OutStreamer->SwitchSection(ReadOnlySection); MCSymbol *AddrSymbol = OutContext.getOrCreateSymbol(StringRef("__morestack_addr")); OutStreamer->emitLabel(AddrSymbol); unsigned PtrSize = MAI->getCodePointerSize(); OutStreamer->emitSymbolValue(GetExternalSymbolSymbol("__morestack"), PtrSize); } // Emit .note.GNU-split-stack and .note.GNU-no-split-stack sections if // split-stack is used. 
if (TM.getTargetTriple().isOSBinFormatELF() && MMI->hasSplitStack()) { OutStreamer->SwitchSection( OutContext.getELFSection(".note.GNU-split-stack", ELF::SHT_PROGBITS, 0)); if (MMI->hasNosplitStack()) OutStreamer->SwitchSection( OutContext.getELFSection(".note.GNU-no-split-stack", ELF::SHT_PROGBITS, 0)); } // If we don't have any trampolines, then we don't require stack memory // to be executable. Some targets have a directive to declare this. Function *InitTrampolineIntrinsic = M.getFunction("llvm.init.trampoline"); if (!InitTrampolineIntrinsic || InitTrampolineIntrinsic->use_empty()) if (MCSection *S = MAI->getNonexecutableStackSection(OutContext)) OutStreamer->SwitchSection(S); if (TM.getTargetTriple().isOSBinFormatCOFF()) { // Emit /EXPORT: flags for each exported global as necessary. const auto &TLOF = getObjFileLowering(); std::string Flags; for (const GlobalValue &GV : M.global_values()) { raw_string_ostream OS(Flags); TLOF.emitLinkerFlagsForGlobal(OS, &GV); OS.flush(); if (!Flags.empty()) { OutStreamer->SwitchSection(TLOF.getDrectveSection()); OutStreamer->emitBytes(Flags); } Flags.clear(); } // Emit /INCLUDE: flags for each used global as necessary. if (const auto *LU = M.getNamedGlobal("llvm.used")) { assert(LU->hasInitializer() && "expected llvm.used to have an initializer"); assert(isa(LU->getValueType()) && "expected llvm.used to be an array type"); if (const auto *A = cast(LU->getInitializer())) { for (const Value *Op : A->operands()) { const auto *GV = cast(Op->stripPointerCasts()); // Global symbols with internal or private linkage are not visible to // the linker, and thus would cause an error when the linker tried to // preserve the symbol due to the `/include:` directive. if (GV->hasLocalLinkage()) continue; raw_string_ostream OS(Flags); TLOF.emitLinkerFlagsForUsed(OS, GV); OS.flush(); if (!Flags.empty()) { OutStreamer->SwitchSection(TLOF.getDrectveSection()); OutStreamer->emitBytes(Flags); } Flags.clear(); } } } } if (TM.Options.EmitAddrsig) { // Emit address-significance attributes for all globals. OutStreamer->emitAddrsig(); for (const GlobalValue &GV : M.global_values()) if (!GV.use_empty() && !GV.isThreadLocal() && !GV.hasDLLImportStorageClass() && !GV.getName().startswith("llvm.") && !GV.hasAtLeastLocalUnnamedAddr()) OutStreamer->emitAddrsigSym(getSymbol(&GV)); } // Emit symbol partition specifications (ELF only). if (TM.getTargetTriple().isOSBinFormatELF()) { unsigned UniqueID = 0; for (const GlobalValue &GV : M.global_values()) { if (!GV.hasPartition() || GV.isDeclarationForLinker() || GV.getVisibility() != GlobalValue::DefaultVisibility) continue; OutStreamer->SwitchSection( OutContext.getELFSection(".llvm_sympart", ELF::SHT_LLVM_SYMPART, 0, 0, "", ++UniqueID, nullptr)); OutStreamer->emitBytes(GV.getPartition()); OutStreamer->emitZeros(1); OutStreamer->emitValue( MCSymbolRefExpr::create(getSymbol(&GV), OutContext), MAI->getCodePointerSize()); } } // Allow the target to emit any magic that it wants at the end of the file, // after everything else has gone out. emitEndOfAsmFile(M); MMI = nullptr; OutStreamer->Finish(); OutStreamer->reset(); OwnedMLI.reset(); OwnedMDT.reset(); return false; } MCSymbol *AsmPrinter::getCurExceptionSym() { if (!CurExceptionSym) CurExceptionSym = createTempSymbol("exception"); return CurExceptionSym; } void AsmPrinter::SetupMachineFunction(MachineFunction &MF) { this->MF = &MF; const Function &F = MF.getFunction(); // Get the function symbol. 
if (!MAI->needsFunctionDescriptors()) { CurrentFnSym = getSymbol(&MF.getFunction()); } else { assert(TM.getTargetTriple().isOSAIX() && "Only AIX uses the function descriptor hooks."); // AIX is unique here in that the name of the symbol emitted for the // function body does not have the same name as the source function's // C-linkage name. assert(CurrentFnDescSym && "The function descriptor symbol needs to be" " initalized first."); // Get the function entry point symbol. CurrentFnSym = getObjFileLowering().getFunctionEntryPointSymbol(&F, TM); } CurrentFnSymForSize = CurrentFnSym; CurrentFnBegin = nullptr; CurrentSectionBeginSym = nullptr; MBBSectionRanges.clear(); CurExceptionSym = nullptr; bool NeedsLocalForSize = MAI->needsLocalForSize(); if (F.hasFnAttribute("patchable-function-entry") || F.hasFnAttribute("function-instrument") || F.hasFnAttribute("xray-instruction-threshold") || needFuncLabelsForEHOrDebugInfo(MF) || NeedsLocalForSize || MF.getTarget().Options.EmitStackSizeSection) { CurrentFnBegin = createTempSymbol("func_begin"); if (NeedsLocalForSize) CurrentFnSymForSize = CurrentFnBegin; } ORE = &getAnalysis().getORE(); } namespace { // Keep track the alignment, constpool entries per Section. struct SectionCPs { MCSection *S; Align Alignment; SmallVector CPEs; SectionCPs(MCSection *s, Align a) : S(s), Alignment(a) {} }; } // end anonymous namespace /// EmitConstantPool - Print to the current output stream assembly /// representations of the constants in the constant pool MCP. This is /// used to print out constants which have been "spilled to memory" by /// the code generator. void AsmPrinter::emitConstantPool() { const MachineConstantPool *MCP = MF->getConstantPool(); const std::vector &CP = MCP->getConstants(); if (CP.empty()) return; // Calculate sections for constant pool entries. We collect entries to go into // the same section together to reduce amount of section switch statements. SmallVector CPSections; for (unsigned i = 0, e = CP.size(); i != e; ++i) { const MachineConstantPoolEntry &CPE = CP[i]; Align Alignment = CPE.getAlign(); SectionKind Kind = CPE.getSectionKind(&getDataLayout()); const Constant *C = nullptr; if (!CPE.isMachineConstantPoolEntry()) C = CPE.Val.ConstVal; MCSection *S = getObjFileLowering().getSectionForConstant( getDataLayout(), Kind, C, Alignment); // The number of sections are small, just do a linear search from the // last section to the first. bool Found = false; unsigned SecIdx = CPSections.size(); while (SecIdx != 0) { if (CPSections[--SecIdx].S == S) { Found = true; break; } } if (!Found) { SecIdx = CPSections.size(); CPSections.push_back(SectionCPs(S, Alignment)); } if (Alignment > CPSections[SecIdx].Alignment) CPSections[SecIdx].Alignment = Alignment; CPSections[SecIdx].CPEs.push_back(i); } // Now print stuff into the calculated sections. const MCSection *CurSection = nullptr; unsigned Offset = 0; for (unsigned i = 0, e = CPSections.size(); i != e; ++i) { for (unsigned j = 0, ee = CPSections[i].CPEs.size(); j != ee; ++j) { unsigned CPI = CPSections[i].CPEs[j]; MCSymbol *Sym = GetCPISymbol(CPI); if (!Sym->isUndefined()) continue; if (CurSection != CPSections[i].S) { OutStreamer->SwitchSection(CPSections[i].S); emitAlignment(Align(CPSections[i].Alignment)); CurSection = CPSections[i].S; Offset = 0; } MachineConstantPoolEntry CPE = CP[CPI]; // Emit inter-object padding for alignment. 
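// --- Illustrative aside (editor's sketch, not part of AsmPrinter) ----------
// The constant-pool emitter below rounds the running offset up to the next
// entry's alignment with alignTo() and pads the gap with zero bytes. A
// minimal stand-alone version of that rounding, assuming power-of-two
// alignments (the helper name alignToSketch is hypothetical):
//
//   #include <cassert>
//   #include <cstdint>
//
//   uint64_t alignToSketch(uint64_t Offset, uint64_t Alignment) {
//     assert(Alignment && (Alignment & (Alignment - 1)) == 0 &&
//            "alignment must be a power of two");
//     return (Offset + Alignment - 1) & ~(Alignment - 1);
//   }
//
//   // e.g. alignToSketch(13, 8) == 16, so 3 zero bytes are emitted before
//   // the next constant-pool entry.
// ----------------------------------------------------------------------------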
unsigned NewOffset = alignTo(Offset, CPE.getAlign()); OutStreamer->emitZeros(NewOffset - Offset); Type *Ty = CPE.getType(); Offset = NewOffset + getDataLayout().getTypeAllocSize(Ty); OutStreamer->emitLabel(Sym); if (CPE.isMachineConstantPoolEntry()) emitMachineConstantPoolValue(CPE.Val.MachineCPVal); else emitGlobalConstant(getDataLayout(), CPE.Val.ConstVal); } } } // Print assembly representations of the jump tables used by the current // function. void AsmPrinter::emitJumpTableInfo() { const DataLayout &DL = MF->getDataLayout(); const MachineJumpTableInfo *MJTI = MF->getJumpTableInfo(); if (!MJTI) return; if (MJTI->getEntryKind() == MachineJumpTableInfo::EK_Inline) return; const std::vector &JT = MJTI->getJumpTables(); if (JT.empty()) return; // Pick the directive to use to print the jump table entries, and switch to // the appropriate section. const Function &F = MF->getFunction(); const TargetLoweringObjectFile &TLOF = getObjFileLowering(); bool JTInDiffSection = !TLOF.shouldPutJumpTableInFunctionSection( MJTI->getEntryKind() == MachineJumpTableInfo::EK_LabelDifference32, F); if (JTInDiffSection) { // Drop it in the readonly section. MCSection *ReadOnlySection = TLOF.getSectionForJumpTable(F, TM); OutStreamer->SwitchSection(ReadOnlySection); } emitAlignment(Align(MJTI->getEntryAlignment(DL))); // Jump tables in code sections are marked with a data_region directive // where that's supported. if (!JTInDiffSection) OutStreamer->emitDataRegion(MCDR_DataRegionJT32); for (unsigned JTI = 0, e = JT.size(); JTI != e; ++JTI) { const std::vector &JTBBs = JT[JTI].MBBs; // If this jump table was deleted, ignore it. if (JTBBs.empty()) continue; // For the EK_LabelDifference32 entry, if using .set avoids a relocation, /// emit a .set directive for each unique entry. if (MJTI->getEntryKind() == MachineJumpTableInfo::EK_LabelDifference32 && MAI->doesSetDirectiveSuppressReloc()) { SmallPtrSet EmittedSets; const TargetLowering *TLI = MF->getSubtarget().getTargetLowering(); const MCExpr *Base = TLI->getPICJumpTableRelocBaseExpr(MF,JTI,OutContext); for (unsigned ii = 0, ee = JTBBs.size(); ii != ee; ++ii) { const MachineBasicBlock *MBB = JTBBs[ii]; if (!EmittedSets.insert(MBB).second) continue; // .set LJTSet, LBB32-base const MCExpr *LHS = MCSymbolRefExpr::create(MBB->getSymbol(), OutContext); OutStreamer->emitAssignment(GetJTSetSymbol(JTI, MBB->getNumber()), MCBinaryExpr::createSub(LHS, Base, OutContext)); } } // On some targets (e.g. Darwin) we want to emit two consecutive labels // before each jump table. The first label is never referenced, but tells // the assembler and linker the extents of the jump table object. The // second label is actually referenced by the code. if (JTInDiffSection && DL.hasLinkerPrivateGlobalPrefix()) // FIXME: This doesn't have to have any specific name, just any randomly // named and numbered local label started with 'l' would work. Simplify // GetJTISymbol. OutStreamer->emitLabel(GetJTISymbol(JTI, true)); MCSymbol* JTISymbol = GetJTISymbol(JTI); OutStreamer->emitLabel(JTISymbol); for (unsigned ii = 0, ee = JTBBs.size(); ii != ee; ++ii) emitJumpTableEntry(MJTI, JTBBs[ii], JTI); } if (!JTInDiffSection) OutStreamer->emitDataRegion(MCDR_DataRegionEnd); } /// EmitJumpTableEntry - Emit a jump table entry for the specified MBB to the /// current stream. 
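// --- Illustrative aside (editor's sketch, not part of AsmPrinter) ----------
// EK_LabelDifference32 jump tables (handled above and in the switch below)
// store "LBB - jump-table base" rather than absolute addresses, so the table
// needs no absolute relocations and stays position independent. The same
// idea with plain data, assuming offsets are measured from the start of the
// blob (all names hypothetical):
//
//   #include <cstdint>
//
//   const char BlobSketch[] = "add\0sub\0mul";   // three entries
//   const int32_t DeltaSketch[] = {0, 4, 8};     // entry - blob start
//
//   const char *resolveSketch(unsigned Index) {
//     // Consumer work: base address + stored delta.
//     return BlobSketch + DeltaSketch[Index];
//   }
//
//   // resolveSketch(2) == BlobSketch + 8, i.e. "mul".
// ----------------------------------------------------------------------------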
void AsmPrinter::emitJumpTableEntry(const MachineJumpTableInfo *MJTI, const MachineBasicBlock *MBB, unsigned UID) const { assert(MBB && MBB->getNumber() >= 0 && "Invalid basic block"); const MCExpr *Value = nullptr; switch (MJTI->getEntryKind()) { case MachineJumpTableInfo::EK_Inline: llvm_unreachable("Cannot emit EK_Inline jump table entry"); case MachineJumpTableInfo::EK_Custom32: Value = MF->getSubtarget().getTargetLowering()->LowerCustomJumpTableEntry( MJTI, MBB, UID, OutContext); break; case MachineJumpTableInfo::EK_BlockAddress: // EK_BlockAddress - Each entry is a plain address of block, e.g.: // .word LBB123 Value = MCSymbolRefExpr::create(MBB->getSymbol(), OutContext); break; case MachineJumpTableInfo::EK_GPRel32BlockAddress: { // EK_GPRel32BlockAddress - Each entry is an address of block, encoded // with a relocation as gp-relative, e.g.: // .gprel32 LBB123 MCSymbol *MBBSym = MBB->getSymbol(); OutStreamer->emitGPRel32Value(MCSymbolRefExpr::create(MBBSym, OutContext)); return; } case MachineJumpTableInfo::EK_GPRel64BlockAddress: { // EK_GPRel64BlockAddress - Each entry is an address of block, encoded // with a relocation as gp-relative, e.g.: // .gpdword LBB123 MCSymbol *MBBSym = MBB->getSymbol(); OutStreamer->emitGPRel64Value(MCSymbolRefExpr::create(MBBSym, OutContext)); return; } case MachineJumpTableInfo::EK_LabelDifference32: { // Each entry is the address of the block minus the address of the jump // table. This is used for PIC jump tables where gprel32 is not supported. // e.g.: // .word LBB123 - LJTI1_2 // If the .set directive avoids relocations, this is emitted as: // .set L4_5_set_123, LBB123 - LJTI1_2 // .word L4_5_set_123 if (MAI->doesSetDirectiveSuppressReloc()) { Value = MCSymbolRefExpr::create(GetJTSetSymbol(UID, MBB->getNumber()), OutContext); break; } Value = MCSymbolRefExpr::create(MBB->getSymbol(), OutContext); const TargetLowering *TLI = MF->getSubtarget().getTargetLowering(); const MCExpr *Base = TLI->getPICJumpTableRelocBaseExpr(MF, UID, OutContext); Value = MCBinaryExpr::createSub(Value, Base, OutContext); break; } } assert(Value && "Unknown entry kind!"); unsigned EntrySize = MJTI->getEntrySize(getDataLayout()); OutStreamer->emitValue(Value, EntrySize); } /// EmitSpecialLLVMGlobal - Check to see if the specified global is a /// special global used by LLVM. If so, emit it and return true, otherwise /// do nothing and return false. bool AsmPrinter::emitSpecialLLVMGlobal(const GlobalVariable *GV) { if (GV->getName() == "llvm.used") { if (MAI->hasNoDeadStrip()) // No need to emit this at all. emitLLVMUsedList(cast(GV->getInitializer())); return true; } // Ignore debug and non-emitted data. This handles llvm.compiler.used. if (GV->getSection() == "llvm.metadata" || GV->hasAvailableExternallyLinkage()) return true; if (!GV->hasAppendingLinkage()) return false; assert(GV->hasInitializer() && "Not a special LLVM global!"); if (GV->getName() == "llvm.global_ctors") { emitXXStructorList(GV->getParent()->getDataLayout(), GV->getInitializer(), /* isCtor */ true); return true; } if (GV->getName() == "llvm.global_dtors") { emitXXStructorList(GV->getParent()->getDataLayout(), GV->getInitializer(), /* isCtor */ false); return true; } report_fatal_error("unknown special variable"); } /// EmitLLVMUsedList - For targets that define a MAI::UsedDirective, mark each /// global in the specified llvm.used list. void AsmPrinter::emitLLVMUsedList(const ConstantArray *InitList) { // Should be an array of 'i8*'. 
  for (unsigned i = 0, e = InitList->getNumOperands(); i != e; ++i) {
    const GlobalValue *GV =
        dyn_cast<GlobalValue>(InitList->getOperand(i)->stripPointerCasts());
    if (GV)
      OutStreamer->emitSymbolAttribute(getSymbol(GV), MCSA_NoDeadStrip);
  }
}

namespace {

struct Structor {
  int Priority = 0;
  Constant *Func = nullptr;
  GlobalValue *ComdatKey = nullptr;

  Structor() = default;
};

} // end anonymous namespace

/// EmitXXStructorList - Emit the ctor or dtor list taking into account the
/// init priority.
void AsmPrinter::emitXXStructorList(const DataLayout &DL, const Constant *List,
                                    bool isCtor) {
  // Should be an array of '{ i32, void ()*, i8* }' structs.  The first value
  // is the init priority.
  if (!isa<ConstantArray>(List))
    return;

  // Gather the structors in a form that's convenient for sorting by priority.
  SmallVector<Structor, 8> Structors;
  for (Value *O : cast<ConstantArray>(List)->operands()) {
    auto *CS = cast<ConstantStruct>(O);
    if (CS->getOperand(1)->isNullValue())
      break; // Found a null terminator, skip the rest.
    ConstantInt *Priority = dyn_cast<ConstantInt>(CS->getOperand(0));
    if (!Priority)
      continue; // Malformed.
    Structors.push_back(Structor());
    Structor &S = Structors.back();
    S.Priority = Priority->getLimitedValue(65535);
    S.Func = CS->getOperand(1);
    if (!CS->getOperand(2)->isNullValue())
      S.ComdatKey =
          dyn_cast<GlobalValue>(CS->getOperand(2)->stripPointerCasts());
  }

  // Emit the function pointers in the target-specific order
  llvm::stable_sort(Structors, [](const Structor &L, const Structor &R) {
    return L.Priority < R.Priority;
  });
  const Align Align = DL.getPointerPrefAlignment();
  for (Structor &S : Structors) {
    const TargetLoweringObjectFile &Obj = getObjFileLowering();
    const MCSymbol *KeySym = nullptr;
    if (GlobalValue *GV = S.ComdatKey) {
      if (GV->isDeclarationForLinker())
        // If the associated variable is not defined in this module
        // (it might be available_externally, or have been an
        // available_externally definition that was dropped by the
        // EliminateAvailableExternally pass), some other TU
        // will provide its dynamic initializer.
        continue;

      KeySym = getSymbol(GV);
    }

    MCSection *OutputSection = (isCtor ?
Obj.getStaticCtorSection(S.Priority, KeySym) : Obj.getStaticDtorSection(S.Priority, KeySym)); OutStreamer->SwitchSection(OutputSection); if (OutStreamer->getCurrentSection() != OutStreamer->getPreviousSection()) emitAlignment(Align); emitXXStructor(DL, S.Func); } } void AsmPrinter::emitModuleIdents(Module &M) { if (!MAI->hasIdentDirective()) return; if (const NamedMDNode *NMD = M.getNamedMetadata("llvm.ident")) { for (unsigned i = 0, e = NMD->getNumOperands(); i != e; ++i) { const MDNode *N = NMD->getOperand(i); assert(N->getNumOperands() == 1 && "llvm.ident metadata entry can have only one operand"); const MDString *S = cast(N->getOperand(0)); OutStreamer->emitIdent(S->getString()); } } } void AsmPrinter::emitModuleCommandLines(Module &M) { MCSection *CommandLine = getObjFileLowering().getSectionForCommandLines(); if (!CommandLine) return; const NamedMDNode *NMD = M.getNamedMetadata("llvm.commandline"); if (!NMD || !NMD->getNumOperands()) return; OutStreamer->PushSection(); OutStreamer->SwitchSection(CommandLine); OutStreamer->emitZeros(1); for (unsigned i = 0, e = NMD->getNumOperands(); i != e; ++i) { const MDNode *N = NMD->getOperand(i); assert(N->getNumOperands() == 1 && "llvm.commandline metadata entry can have only one operand"); const MDString *S = cast(N->getOperand(0)); OutStreamer->emitBytes(S->getString()); OutStreamer->emitZeros(1); } OutStreamer->PopSection(); } //===--------------------------------------------------------------------===// // Emission and print routines // /// Emit a byte directive and value. /// void AsmPrinter::emitInt8(int Value) const { OutStreamer->emitInt8(Value); } /// Emit a short directive and value. void AsmPrinter::emitInt16(int Value) const { OutStreamer->emitInt16(Value); } /// Emit a long directive and value. void AsmPrinter::emitInt32(int Value) const { OutStreamer->emitInt32(Value); } /// Emit a long long directive and value. void AsmPrinter::emitInt64(uint64_t Value) const { OutStreamer->emitInt64(Value); } /// Emit something like ".long Hi-Lo" where the size in bytes of the directive /// is specified by Size and Hi/Lo specify the labels. This implicitly uses /// .set if it avoids relocations. void AsmPrinter::emitLabelDifference(const MCSymbol *Hi, const MCSymbol *Lo, unsigned Size) const { OutStreamer->emitAbsoluteSymbolDiff(Hi, Lo, Size); } /// EmitLabelPlusOffset - Emit something like ".long Label+Offset" /// where the size in bytes of the directive is specified by Size and Label /// specifies the label. This implicitly uses .set if it is available. void AsmPrinter::emitLabelPlusOffset(const MCSymbol *Label, uint64_t Offset, unsigned Size, bool IsSectionRelative) const { if (MAI->needsDwarfSectionOffsetDirective() && IsSectionRelative) { OutStreamer->EmitCOFFSecRel32(Label, Offset); if (Size > 4) OutStreamer->emitZeros(Size - 4); return; } // Emit Label+Offset (or just Label if Offset is zero) const MCExpr *Expr = MCSymbolRefExpr::create(Label, OutContext); if (Offset) Expr = MCBinaryExpr::createAdd( Expr, MCConstantExpr::create(Offset, OutContext), OutContext); OutStreamer->emitValue(Expr, Size); } //===----------------------------------------------------------------------===// // EmitAlignment - Emit an alignment directive to the specified power of // two boundary. If a global value is specified, and if that global has // an explicit alignment requested, it will override the alignment request // if required for correctness. 
void AsmPrinter::emitAlignment(Align Alignment, const GlobalObject *GV) const { if (GV) Alignment = getGVAlignment(GV, GV->getParent()->getDataLayout(), Alignment); if (Alignment == Align(1)) return; // 1-byte aligned: no need to emit alignment. if (getCurrentSection()->getKind().isText()) OutStreamer->emitCodeAlignment(Alignment.value()); else OutStreamer->emitValueToAlignment(Alignment.value()); } //===----------------------------------------------------------------------===// // Constant emission. //===----------------------------------------------------------------------===// const MCExpr *AsmPrinter::lowerConstant(const Constant *CV) { MCContext &Ctx = OutContext; if (CV->isNullValue() || isa(CV)) return MCConstantExpr::create(0, Ctx); if (const ConstantInt *CI = dyn_cast(CV)) return MCConstantExpr::create(CI->getZExtValue(), Ctx); if (const GlobalValue *GV = dyn_cast(CV)) return MCSymbolRefExpr::create(getSymbol(GV), Ctx); if (const BlockAddress *BA = dyn_cast(CV)) return MCSymbolRefExpr::create(GetBlockAddressSymbol(BA), Ctx); const ConstantExpr *CE = dyn_cast(CV); if (!CE) { llvm_unreachable("Unknown constant value to lower!"); } switch (CE->getOpcode()) { default: { // If the code isn't optimized, there may be outstanding folding // opportunities. Attempt to fold the expression using DataLayout as a // last resort before giving up. Constant *C = ConstantFoldConstant(CE, getDataLayout()); if (C != CE) return lowerConstant(C); // Otherwise report the problem to the user. std::string S; raw_string_ostream OS(S); OS << "Unsupported expression in static initializer: "; CE->printAsOperand(OS, /*PrintType=*/false, !MF ? nullptr : MF->getFunction().getParent()); report_fatal_error(OS.str()); } case Instruction::GetElementPtr: { // Generate a symbolic expression for the byte address APInt OffsetAI(getDataLayout().getPointerTypeSizeInBits(CE->getType()), 0); cast(CE)->accumulateConstantOffset(getDataLayout(), OffsetAI); const MCExpr *Base = lowerConstant(CE->getOperand(0)); if (!OffsetAI) return Base; int64_t Offset = OffsetAI.getSExtValue(); return MCBinaryExpr::createAdd(Base, MCConstantExpr::create(Offset, Ctx), Ctx); } case Instruction::Trunc: // We emit the value and depend on the assembler to truncate the generated // expression properly. This is important for differences between // blockaddress labels. Since the two labels are in the same function, it // is reasonable to treat their delta as a 32-bit value. LLVM_FALLTHROUGH; case Instruction::BitCast: return lowerConstant(CE->getOperand(0)); case Instruction::IntToPtr: { const DataLayout &DL = getDataLayout(); // Handle casts to pointers by changing them into casts to the appropriate // integer type. This promotes constant folding and simplifies this code. Constant *Op = CE->getOperand(0); Op = ConstantExpr::getIntegerCast(Op, DL.getIntPtrType(CV->getType()), false/*ZExt*/); return lowerConstant(Op); } case Instruction::PtrToInt: { const DataLayout &DL = getDataLayout(); // Support only foldable casts to/from pointers that can be eliminated by // changing the pointer to the appropriately sized integer type. Constant *Op = CE->getOperand(0); Type *Ty = CE->getType(); const MCExpr *OpExpr = lowerConstant(Op); // We can emit the pointer value into this slot if the slot is an // integer slot equal to the size of the pointer. // // If the pointer is larger than the resultant integer, then // as with Trunc just depend on the assembler to truncate it. 
if (DL.getTypeAllocSize(Ty) <= DL.getTypeAllocSize(Op->getType())) return OpExpr; // Otherwise the pointer is smaller than the resultant integer, mask off // the high bits so we are sure to get a proper truncation if the input is // a constant expr. unsigned InBits = DL.getTypeAllocSizeInBits(Op->getType()); const MCExpr *MaskExpr = MCConstantExpr::create(~0ULL >> (64-InBits), Ctx); return MCBinaryExpr::createAnd(OpExpr, MaskExpr, Ctx); } case Instruction::Sub: { GlobalValue *LHSGV; APInt LHSOffset; if (IsConstantOffsetFromGlobal(CE->getOperand(0), LHSGV, LHSOffset, getDataLayout())) { GlobalValue *RHSGV; APInt RHSOffset; if (IsConstantOffsetFromGlobal(CE->getOperand(1), RHSGV, RHSOffset, getDataLayout())) { const MCExpr *RelocExpr = getObjFileLowering().lowerRelativeReference(LHSGV, RHSGV, TM); if (!RelocExpr) RelocExpr = MCBinaryExpr::createSub( MCSymbolRefExpr::create(getSymbol(LHSGV), Ctx), MCSymbolRefExpr::create(getSymbol(RHSGV), Ctx), Ctx); int64_t Addend = (LHSOffset - RHSOffset).getSExtValue(); if (Addend != 0) RelocExpr = MCBinaryExpr::createAdd( RelocExpr, MCConstantExpr::create(Addend, Ctx), Ctx); return RelocExpr; } } } // else fallthrough LLVM_FALLTHROUGH; // The MC library also has a right-shift operator, but it isn't consistently // signed or unsigned between different targets. case Instruction::Add: case Instruction::Mul: case Instruction::SDiv: case Instruction::SRem: case Instruction::Shl: case Instruction::And: case Instruction::Or: case Instruction::Xor: { const MCExpr *LHS = lowerConstant(CE->getOperand(0)); const MCExpr *RHS = lowerConstant(CE->getOperand(1)); switch (CE->getOpcode()) { default: llvm_unreachable("Unknown binary operator constant cast expr"); case Instruction::Add: return MCBinaryExpr::createAdd(LHS, RHS, Ctx); case Instruction::Sub: return MCBinaryExpr::createSub(LHS, RHS, Ctx); case Instruction::Mul: return MCBinaryExpr::createMul(LHS, RHS, Ctx); case Instruction::SDiv: return MCBinaryExpr::createDiv(LHS, RHS, Ctx); case Instruction::SRem: return MCBinaryExpr::createMod(LHS, RHS, Ctx); case Instruction::Shl: return MCBinaryExpr::createShl(LHS, RHS, Ctx); case Instruction::And: return MCBinaryExpr::createAnd(LHS, RHS, Ctx); case Instruction::Or: return MCBinaryExpr::createOr (LHS, RHS, Ctx); case Instruction::Xor: return MCBinaryExpr::createXor(LHS, RHS, Ctx); } } } } static void emitGlobalConstantImpl(const DataLayout &DL, const Constant *C, AsmPrinter &AP, const Constant *BaseCV = nullptr, uint64_t Offset = 0); static void emitGlobalConstantFP(const ConstantFP *CFP, AsmPrinter &AP); static void emitGlobalConstantFP(APFloat APF, Type *ET, AsmPrinter &AP); /// isRepeatedByteSequence - Determine whether the given value is /// composed of a repeated sequence of identical bytes and return the /// byte value. If it is not a repeated sequence, return -1. static int isRepeatedByteSequence(const ConstantDataSequential *V) { StringRef Data = V->getRawDataValues(); assert(!Data.empty() && "Empty aggregates should be CAZ node"); char C = Data[0]; for (unsigned i = 1, e = Data.size(); i != e; ++i) if (Data[i] != C) return -1; return static_cast(C); // Ensure 255 is not returned as -1. } /// isRepeatedByteSequence - Determine whether the given value is /// composed of a repeated sequence of identical bytes and return the /// byte value. If it is not a repeated sequence, return -1. 
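// --- Illustrative aside (editor's sketch, not part of AsmPrinter) ----------
// In the Instruction::PtrToInt case of lowerConstant() above, a pointer that
// is narrower than the destination integer is AND-ed with a mask of its own
// bit width so only the low bits survive the widening. The mask used there,
// ~0ULL >> (64 - InBits), as a stand-alone helper (name hypothetical,
// assuming 1 <= InBits <= 64):
//
//   #include <cstdint>
//
//   uint64_t lowBitMaskSketch(unsigned InBits) {
//     return ~0ULL >> (64 - InBits);
//   }
//
//   // e.g. lowBitMaskSketch(16) == 0xffff, lowBitMaskSketch(32) == 0xffffffff.
// ----------------------------------------------------------------------------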
static int isRepeatedByteSequence(const Value *V, const DataLayout &DL) { if (const ConstantInt *CI = dyn_cast(V)) { uint64_t Size = DL.getTypeAllocSizeInBits(V->getType()); assert(Size % 8 == 0); // Extend the element to take zero padding into account. APInt Value = CI->getValue().zextOrSelf(Size); if (!Value.isSplat(8)) return -1; return Value.zextOrTrunc(8).getZExtValue(); } if (const ConstantArray *CA = dyn_cast(V)) { // Make sure all array elements are sequences of the same repeated // byte. assert(CA->getNumOperands() != 0 && "Should be a CAZ"); Constant *Op0 = CA->getOperand(0); int Byte = isRepeatedByteSequence(Op0, DL); if (Byte == -1) return -1; // All array elements must be equal. for (unsigned i = 1, e = CA->getNumOperands(); i != e; ++i) if (CA->getOperand(i) != Op0) return -1; return Byte; } if (const ConstantDataSequential *CDS = dyn_cast(V)) return isRepeatedByteSequence(CDS); return -1; } static void emitGlobalConstantDataSequential(const DataLayout &DL, const ConstantDataSequential *CDS, AsmPrinter &AP) { // See if we can aggregate this into a .fill, if so, emit it as such. int Value = isRepeatedByteSequence(CDS, DL); if (Value != -1) { uint64_t Bytes = DL.getTypeAllocSize(CDS->getType()); // Don't emit a 1-byte object as a .fill. if (Bytes > 1) return AP.OutStreamer->emitFill(Bytes, Value); } // If this can be emitted with .ascii/.asciz, emit it as such. if (CDS->isString()) return AP.OutStreamer->emitBytes(CDS->getAsString()); // Otherwise, emit the values in successive locations. unsigned ElementByteSize = CDS->getElementByteSize(); if (isa(CDS->getElementType())) { for (unsigned i = 0, e = CDS->getNumElements(); i != e; ++i) { if (AP.isVerbose()) AP.OutStreamer->GetCommentOS() << format("0x%" PRIx64 "\n", CDS->getElementAsInteger(i)); AP.OutStreamer->emitIntValue(CDS->getElementAsInteger(i), ElementByteSize); } } else { Type *ET = CDS->getElementType(); for (unsigned I = 0, E = CDS->getNumElements(); I != E; ++I) emitGlobalConstantFP(CDS->getElementAsAPFloat(I), ET, AP); } unsigned Size = DL.getTypeAllocSize(CDS->getType()); unsigned EmittedSize = DL.getTypeAllocSize(CDS->getElementType()) * CDS->getNumElements(); assert(EmittedSize <= Size && "Size cannot be less than EmittedSize!"); if (unsigned Padding = Size - EmittedSize) AP.OutStreamer->emitZeros(Padding); } static void emitGlobalConstantArray(const DataLayout &DL, const ConstantArray *CA, AsmPrinter &AP, const Constant *BaseCV, uint64_t Offset) { // See if we can aggregate some values. Make sure it can be // represented as a series of bytes of the constant value. 
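// --- Illustrative aside (editor's sketch, not part of AsmPrinter) ----------
// The isRepeatedByteSequence() overloads above let the printer collapse a
// constant that is one repeated byte into a single ".fill" directive, as the
// array emission below does. The core check on a raw byte buffer, sketched
// stand-alone (name hypothetical):
//
//   #include <cstdint>
//   #include <vector>
//
//   // Returns the repeated byte (0..255) or -1 if the bytes differ.
//   int repeatedByteSketch(const std::vector<uint8_t> &Data) {
//     if (Data.empty())
//       return -1;
//     for (uint8_t B : Data)
//       if (B != Data[0])
//         return -1;
//     return Data[0];
//   }
// ----------------------------------------------------------------------------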
int Value = isRepeatedByteSequence(CA, DL); if (Value != -1) { uint64_t Bytes = DL.getTypeAllocSize(CA->getType()); AP.OutStreamer->emitFill(Bytes, Value); } else { for (unsigned i = 0, e = CA->getNumOperands(); i != e; ++i) { emitGlobalConstantImpl(DL, CA->getOperand(i), AP, BaseCV, Offset); Offset += DL.getTypeAllocSize(CA->getOperand(i)->getType()); } } } static void emitGlobalConstantVector(const DataLayout &DL, const ConstantVector *CV, AsmPrinter &AP) { for (unsigned i = 0, e = CV->getType()->getNumElements(); i != e; ++i) emitGlobalConstantImpl(DL, CV->getOperand(i), AP); unsigned Size = DL.getTypeAllocSize(CV->getType()); unsigned EmittedSize = DL.getTypeAllocSize(CV->getType()->getElementType()) * CV->getType()->getNumElements(); if (unsigned Padding = Size - EmittedSize) AP.OutStreamer->emitZeros(Padding); } static void emitGlobalConstantStruct(const DataLayout &DL, const ConstantStruct *CS, AsmPrinter &AP, const Constant *BaseCV, uint64_t Offset) { // Print the fields in successive locations. Pad to align if needed! unsigned Size = DL.getTypeAllocSize(CS->getType()); const StructLayout *Layout = DL.getStructLayout(CS->getType()); uint64_t SizeSoFar = 0; for (unsigned i = 0, e = CS->getNumOperands(); i != e; ++i) { const Constant *Field = CS->getOperand(i); // Print the actual field value. emitGlobalConstantImpl(DL, Field, AP, BaseCV, Offset + SizeSoFar); // Check if padding is needed and insert one or more 0s. uint64_t FieldSize = DL.getTypeAllocSize(Field->getType()); uint64_t PadSize = ((i == e-1 ? Size : Layout->getElementOffset(i+1)) - Layout->getElementOffset(i)) - FieldSize; SizeSoFar += FieldSize + PadSize; // Insert padding - this may include padding to increase the size of the // current field up to the ABI size (if the struct is not packed) as well // as padding to ensure that the next field starts at the right offset. AP.OutStreamer->emitZeros(PadSize); } assert(SizeSoFar == Layout->getSizeInBytes() && "Layout of constant struct may be incorrect!"); } static void emitGlobalConstantFP(APFloat APF, Type *ET, AsmPrinter &AP) { assert(ET && "Unknown float type"); APInt API = APF.bitcastToAPInt(); // First print a comment with what we think the original floating-point value // should have been. if (AP.isVerbose()) { SmallString<8> StrVal; APF.toString(StrVal); ET->print(AP.OutStreamer->GetCommentOS()); AP.OutStreamer->GetCommentOS() << ' ' << StrVal << '\n'; } // Now iterate through the APInt chunks, emitting them in endian-correct // order, possibly with a smaller chunk at beginning/end (e.g. for x87 80-bit // floats). unsigned NumBytes = API.getBitWidth() / 8; unsigned TrailingBytes = NumBytes % sizeof(uint64_t); const uint64_t *p = API.getRawData(); // PPC's long double has odd notions of endianness compared to how LLVM // handles it: p[0] goes first for *big* endian on PPC. if (AP.getDataLayout().isBigEndian() && !ET->isPPC_FP128Ty()) { int Chunk = API.getNumWords() - 1; if (TrailingBytes) AP.OutStreamer->emitIntValueInHexWithPadding(p[Chunk--], TrailingBytes); for (; Chunk >= 0; --Chunk) AP.OutStreamer->emitIntValueInHexWithPadding(p[Chunk], sizeof(uint64_t)); } else { unsigned Chunk; for (Chunk = 0; Chunk < NumBytes / sizeof(uint64_t); ++Chunk) AP.OutStreamer->emitIntValueInHexWithPadding(p[Chunk], sizeof(uint64_t)); if (TrailingBytes) AP.OutStreamer->emitIntValueInHexWithPadding(p[Chunk], TrailingBytes); } // Emit the tail padding for the long double. 
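// --- Illustrative aside (editor's sketch, not part of AsmPrinter) ----------
// emitGlobalConstantStruct() above derives the zero padding after each field
// from the struct layout: the distance to the next field's offset (or to the
// total struct size for the last field) minus the field's own size. The same
// arithmetic as a stand-alone helper (names hypothetical):
//
//   #include <cstdint>
//   #include <vector>
//
//   uint64_t padAfterFieldSketch(const std::vector<uint64_t> &Offsets,
//                                const std::vector<uint64_t> &Sizes,
//                                uint64_t StructSize, unsigned I) {
//     uint64_t Next = (I + 1 == Offsets.size()) ? StructSize : Offsets[I + 1];
//     return (Next - Offsets[I]) - Sizes[I];
//   }
//
//   // e.g. struct { char C; int I; }: Offsets {0, 4}, Sizes {1, 4},
//   // StructSize 8 -> 3 bytes of padding after C, none after I.
// ----------------------------------------------------------------------------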
const DataLayout &DL = AP.getDataLayout(); AP.OutStreamer->emitZeros(DL.getTypeAllocSize(ET) - DL.getTypeStoreSize(ET)); } static void emitGlobalConstantFP(const ConstantFP *CFP, AsmPrinter &AP) { emitGlobalConstantFP(CFP->getValueAPF(), CFP->getType(), AP); } static void emitGlobalConstantLargeInt(const ConstantInt *CI, AsmPrinter &AP) { const DataLayout &DL = AP.getDataLayout(); unsigned BitWidth = CI->getBitWidth(); // Copy the value as we may massage the layout for constants whose bit width // is not a multiple of 64-bits. APInt Realigned(CI->getValue()); uint64_t ExtraBits = 0; unsigned ExtraBitsSize = BitWidth & 63; if (ExtraBitsSize) { // The bit width of the data is not a multiple of 64-bits. // The extra bits are expected to be at the end of the chunk of the memory. // Little endian: // * Nothing to be done, just record the extra bits to emit. // Big endian: // * Record the extra bits to emit. // * Realign the raw data to emit the chunks of 64-bits. if (DL.isBigEndian()) { // Basically the structure of the raw data is a chunk of 64-bits cells: // 0 1 BitWidth / 64 // [chunk1][chunk2] ... [chunkN]. // The most significant chunk is chunkN and it should be emitted first. // However, due to the alignment issue chunkN contains useless bits. // Realign the chunks so that they contain only useful information: // ExtraBits 0 1 (BitWidth / 64) - 1 // chu[nk1 chu][nk2 chu] ... [nkN-1 chunkN] ExtraBitsSize = alignTo(ExtraBitsSize, 8); ExtraBits = Realigned.getRawData()[0] & (((uint64_t)-1) >> (64 - ExtraBitsSize)); Realigned.lshrInPlace(ExtraBitsSize); } else ExtraBits = Realigned.getRawData()[BitWidth / 64]; } // We don't expect assemblers to support integer data directives // for more than 64 bits, so we emit the data in at most 64-bit // quantities at a time. const uint64_t *RawData = Realigned.getRawData(); for (unsigned i = 0, e = BitWidth / 64; i != e; ++i) { uint64_t Val = DL.isBigEndian() ? RawData[e - i - 1] : RawData[i]; AP.OutStreamer->emitIntValue(Val, 8); } if (ExtraBitsSize) { // Emit the extra bits after the 64-bits chunks. // Emit a directive that fills the expected size. uint64_t Size = AP.getDataLayout().getTypeStoreSize(CI->getType()); Size -= (BitWidth / 64) * 8; assert(Size && Size * 8 >= ExtraBitsSize && (ExtraBits & (((uint64_t)-1) >> (64 - ExtraBitsSize))) == ExtraBits && "Directive too small for extra bits."); AP.OutStreamer->emitIntValue(ExtraBits, Size); } } /// Transform a not absolute MCExpr containing a reference to a GOT /// equivalent global, by a target specific GOT pc relative access to the /// final symbol. static void handleIndirectSymViaGOTPCRel(AsmPrinter &AP, const MCExpr **ME, const Constant *BaseCst, uint64_t Offset) { // The global @foo below illustrates a global that uses a got equivalent. // // @bar = global i32 42 // @gotequiv = private unnamed_addr constant i32* @bar // @foo = i32 trunc (i64 sub (i64 ptrtoint (i32** @gotequiv to i64), // i64 ptrtoint (i32* @foo to i64)) // to i32) // // The cstexpr in @foo is converted into the MCExpr `ME`, where we actually // check whether @foo is suitable to use a GOTPCREL. `ME` is usually in the // form: // // foo = cstexpr, where // cstexpr := - "." + // cstexpr := - ( - ) + // // After canonicalization by evaluateAsRelocatable `ME` turns into: // // cstexpr := - + gotpcrelcst, where // gotpcrelcst := + MCValue MV; if (!(*ME)->evaluateAsRelocatable(MV, nullptr, nullptr) || MV.isAbsolute()) return; const MCSymbolRefExpr *SymA = MV.getSymA(); if (!SymA) return; // Check that GOT equivalent symbol is cached. 
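// --- Illustrative aside (editor's sketch, not part of AsmPrinter) ----------
// emitGlobalConstantLargeInt() and emitGlobalConstantFP() above split a wide
// value into 64-bit words plus a short trailing chunk when the bit width is
// not a multiple of 64, and big-endian targets emit the most significant
// chunk first. A sketch of that chunk plan only (name hypothetical; the
// big-endian extra-bit realignment itself is omitted for brevity):
//
//   #include <vector>
//
//   std::vector<unsigned> chunkPlanSketch(unsigned BitWidth, bool BigEndian) {
//     unsigned FullWords = BitWidth / 64;
//     unsigned TrailBytes = (BitWidth % 64 + 7) / 8;
//     std::vector<unsigned> Bytes;
//     if (BigEndian && TrailBytes)
//       Bytes.push_back(TrailBytes);          // most significant chunk first
//     for (unsigned I = 0; I != FullWords; ++I)
//       Bytes.push_back(8);
//     if (!BigEndian && TrailBytes)
//       Bytes.push_back(TrailBytes);          // least significant order
//     return Bytes;
//   }
//
//   // e.g. an 80-bit x87 value on a little-endian target -> {8, 2}.
// ----------------------------------------------------------------------------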
const MCSymbol *GOTEquivSym = &SymA->getSymbol(); if (!AP.GlobalGOTEquivs.count(GOTEquivSym)) return; const GlobalValue *BaseGV = dyn_cast_or_null(BaseCst); if (!BaseGV) return; // Check for a valid base symbol const MCSymbol *BaseSym = AP.getSymbol(BaseGV); const MCSymbolRefExpr *SymB = MV.getSymB(); if (!SymB || BaseSym != &SymB->getSymbol()) return; // Make sure to match: // // gotpcrelcst := + // // If gotpcrelcst is positive it means that we can safely fold the pc rel // displacement into the GOTPCREL. We can also can have an extra offset // if the target knows how to encode it. int64_t GOTPCRelCst = Offset + MV.getConstant(); if (GOTPCRelCst < 0) return; if (!AP.getObjFileLowering().supportGOTPCRelWithOffset() && GOTPCRelCst != 0) return; // Emit the GOT PC relative to replace the got equivalent global, i.e.: // // bar: // .long 42 // gotequiv: // .quad bar // foo: // .long gotequiv - "." + // // is replaced by the target specific equivalent to: // // bar: // .long 42 // foo: // .long bar@GOTPCREL+ AsmPrinter::GOTEquivUsePair Result = AP.GlobalGOTEquivs[GOTEquivSym]; const GlobalVariable *GV = Result.first; int NumUses = (int)Result.second; const GlobalValue *FinalGV = dyn_cast(GV->getOperand(0)); const MCSymbol *FinalSym = AP.getSymbol(FinalGV); *ME = AP.getObjFileLowering().getIndirectSymViaGOTPCRel( FinalGV, FinalSym, MV, Offset, AP.MMI, *AP.OutStreamer); // Update GOT equivalent usage information --NumUses; if (NumUses >= 0) AP.GlobalGOTEquivs[GOTEquivSym] = std::make_pair(GV, NumUses); } static void emitGlobalConstantImpl(const DataLayout &DL, const Constant *CV, AsmPrinter &AP, const Constant *BaseCV, uint64_t Offset) { uint64_t Size = DL.getTypeAllocSize(CV->getType()); // Globals with sub-elements such as combinations of arrays and structs // are handled recursively by emitGlobalConstantImpl. Keep track of the // constant symbol base and the current position with BaseCV and Offset. if (!BaseCV && CV->hasOneUse()) BaseCV = dyn_cast(CV->user_back()); if (isa(CV) || isa(CV)) return AP.OutStreamer->emitZeros(Size); if (const ConstantInt *CI = dyn_cast(CV)) { const uint64_t StoreSize = DL.getTypeStoreSize(CV->getType()); - if (StoreSize < 8) { + if (StoreSize <= 8) { if (AP.isVerbose()) AP.OutStreamer->GetCommentOS() << format("0x%" PRIx64 "\n", CI->getZExtValue()); AP.OutStreamer->emitIntValue(CI->getZExtValue(), StoreSize); } else { emitGlobalConstantLargeInt(CI, AP); } // Emit tail padding if needed if (Size != StoreSize) AP.OutStreamer->emitZeros(Size - StoreSize); return; } if (const ConstantFP *CFP = dyn_cast(CV)) return emitGlobalConstantFP(CFP, AP); if (isa(CV)) { AP.OutStreamer->emitIntValue(0, Size); return; } if (const ConstantDataSequential *CDS = dyn_cast(CV)) return emitGlobalConstantDataSequential(DL, CDS, AP); if (const ConstantArray *CVA = dyn_cast(CV)) return emitGlobalConstantArray(DL, CVA, AP, BaseCV, Offset); if (const ConstantStruct *CVS = dyn_cast(CV)) return emitGlobalConstantStruct(DL, CVS, AP, BaseCV, Offset); if (const ConstantExpr *CE = dyn_cast(CV)) { // Look through bitcasts, which might not be able to be MCExpr'ized (e.g. of // vectors). if (CE->getOpcode() == Instruction::BitCast) return emitGlobalConstantImpl(DL, CE->getOperand(0), AP); if (Size > 8) { // If the constant expression's size is greater than 64-bits, then we have // to emit the value in chunks. Try to constant fold the value and emit it // that way. 
Constant *New = ConstantFoldConstant(CE, DL); if (New != CE) return emitGlobalConstantImpl(DL, New, AP); } } if (const ConstantVector *V = dyn_cast(CV)) return emitGlobalConstantVector(DL, V, AP); // Otherwise, it must be a ConstantExpr. Lower it to an MCExpr, then emit it // thread the streamer with EmitValue. const MCExpr *ME = AP.lowerConstant(CV); // Since lowerConstant already folded and got rid of all IR pointer and // integer casts, detect GOT equivalent accesses by looking into the MCExpr // directly. if (AP.getObjFileLowering().supportIndirectSymViaGOTPCRel()) handleIndirectSymViaGOTPCRel(AP, &ME, BaseCV, Offset); AP.OutStreamer->emitValue(ME, Size); } /// EmitGlobalConstant - Print a general LLVM constant to the .s file. void AsmPrinter::emitGlobalConstant(const DataLayout &DL, const Constant *CV) { uint64_t Size = DL.getTypeAllocSize(CV->getType()); if (Size) emitGlobalConstantImpl(DL, CV, *this); else if (MAI->hasSubsectionsViaSymbols()) { // If the global has zero size, emit a single byte so that two labels don't // look like they are at the same location. OutStreamer->emitIntValue(0, 1); } } void AsmPrinter::emitMachineConstantPoolValue(MachineConstantPoolValue *MCPV) { // Target doesn't support this yet! llvm_unreachable("Target does not support EmitMachineConstantPoolValue"); } void AsmPrinter::printOffset(int64_t Offset, raw_ostream &OS) const { if (Offset > 0) OS << '+' << Offset; else if (Offset < 0) OS << Offset; } void AsmPrinter::emitNops(unsigned N) { MCInst Nop; MF->getSubtarget().getInstrInfo()->getNoop(Nop); for (; N; --N) EmitToStreamer(*OutStreamer, Nop); } //===----------------------------------------------------------------------===// // Symbol Lowering Routines. //===----------------------------------------------------------------------===// MCSymbol *AsmPrinter::createTempSymbol(const Twine &Name) const { return OutContext.createTempSymbol(Name, true); } MCSymbol *AsmPrinter::GetBlockAddressSymbol(const BlockAddress *BA) const { return MMI->getAddrLabelSymbol(BA->getBasicBlock()); } MCSymbol *AsmPrinter::GetBlockAddressSymbol(const BasicBlock *BB) const { return MMI->getAddrLabelSymbol(BB); } /// GetCPISymbol - Return the symbol for the specified constant pool entry. MCSymbol *AsmPrinter::GetCPISymbol(unsigned CPID) const { if (getSubtargetInfo().getTargetTriple().isWindowsMSVCEnvironment()) { const MachineConstantPoolEntry &CPE = MF->getConstantPool()->getConstants()[CPID]; if (!CPE.isMachineConstantPoolEntry()) { const DataLayout &DL = MF->getDataLayout(); SectionKind Kind = CPE.getSectionKind(&DL); const Constant *C = CPE.Val.ConstVal; Align Alignment = CPE.Alignment; if (const MCSectionCOFF *S = dyn_cast( getObjFileLowering().getSectionForConstant(DL, Kind, C, Alignment))) { if (MCSymbol *Sym = S->getCOMDATSymbol()) { if (Sym->isUndefined()) OutStreamer->emitSymbolAttribute(Sym, MCSA_Global); return Sym; } } } } const DataLayout &DL = getDataLayout(); return OutContext.getOrCreateSymbol(Twine(DL.getPrivateGlobalPrefix()) + "CPI" + Twine(getFunctionNumber()) + "_" + Twine(CPID)); } /// GetJTISymbol - Return the symbol for the specified jump table entry. MCSymbol *AsmPrinter::GetJTISymbol(unsigned JTID, bool isLinkerPrivate) const { return MF->getJTISymbol(JTID, OutContext, isLinkerPrivate); } /// GetJTSetSymbol - Return the symbol for the specified jump table .set /// FIXME: privatize to AsmPrinter. 
MCSymbol *AsmPrinter::GetJTSetSymbol(unsigned UID, unsigned MBBID) const { const DataLayout &DL = getDataLayout(); return OutContext.getOrCreateSymbol(Twine(DL.getPrivateGlobalPrefix()) + Twine(getFunctionNumber()) + "_" + Twine(UID) + "_set_" + Twine(MBBID)); } MCSymbol *AsmPrinter::getSymbolWithGlobalValueBase(const GlobalValue *GV, StringRef Suffix) const { return getObjFileLowering().getSymbolWithGlobalValueBase(GV, Suffix, TM); } /// Return the MCSymbol for the specified ExternalSymbol. MCSymbol *AsmPrinter::GetExternalSymbolSymbol(StringRef Sym) const { SmallString<60> NameStr; Mangler::getNameWithPrefix(NameStr, Sym, getDataLayout()); return OutContext.getOrCreateSymbol(NameStr); } /// PrintParentLoopComment - Print comments about parent loops of this one. static void PrintParentLoopComment(raw_ostream &OS, const MachineLoop *Loop, unsigned FunctionNumber) { if (!Loop) return; PrintParentLoopComment(OS, Loop->getParentLoop(), FunctionNumber); OS.indent(Loop->getLoopDepth()*2) << "Parent Loop BB" << FunctionNumber << "_" << Loop->getHeader()->getNumber() << " Depth=" << Loop->getLoopDepth() << '\n'; } /// PrintChildLoopComment - Print comments about child loops within /// the loop for this basic block, with nesting. static void PrintChildLoopComment(raw_ostream &OS, const MachineLoop *Loop, unsigned FunctionNumber) { // Add child loop information for (const MachineLoop *CL : *Loop) { OS.indent(CL->getLoopDepth()*2) << "Child Loop BB" << FunctionNumber << "_" << CL->getHeader()->getNumber() << " Depth " << CL->getLoopDepth() << '\n'; PrintChildLoopComment(OS, CL, FunctionNumber); } } /// emitBasicBlockLoopComments - Pretty-print comments for basic blocks. static void emitBasicBlockLoopComments(const MachineBasicBlock &MBB, const MachineLoopInfo *LI, const AsmPrinter &AP) { // Add loop depth information const MachineLoop *Loop = LI->getLoopFor(&MBB); if (!Loop) return; MachineBasicBlock *Header = Loop->getHeader(); assert(Header && "No header for loop"); // If this block is not a loop header, just print out what is the loop header // and return. if (Header != &MBB) { AP.OutStreamer->AddComment(" in Loop: Header=BB" + Twine(AP.getFunctionNumber())+"_" + Twine(Loop->getHeader()->getNumber())+ " Depth="+Twine(Loop->getLoopDepth())); return; } // Otherwise, it is a loop header. Print out information about child and // parent loops. raw_ostream &OS = AP.OutStreamer->GetCommentOS(); PrintParentLoopComment(OS, Loop->getParentLoop(), AP.getFunctionNumber()); OS << "=>"; OS.indent(Loop->getLoopDepth()*2-2); OS << "This "; if (Loop->empty()) OS << "Inner "; OS << "Loop Header: Depth=" + Twine(Loop->getLoopDepth()) << '\n'; PrintChildLoopComment(OS, Loop, AP.getFunctionNumber()); } /// emitBasicBlockStart - This method prints the label for the specified /// MachineBasicBlock, an alignment (if present) and a comment describing /// it if appropriate. void AsmPrinter::emitBasicBlockStart(const MachineBasicBlock &MBB) { // End the previous funclet and start a new one. if (MBB.isEHFuncletEntry()) { for (const HandlerInfo &HI : Handlers) { HI.Handler->endFunclet(); HI.Handler->beginFunclet(MBB); } } // Emit an alignment directive for this block, if needed. const Align Alignment = MBB.getAlignment(); if (Alignment != Align(1)) emitAlignment(Alignment); // If the block has its address taken, emit any labels that were used to // reference the block. 
It is possible that there is more than one label // here, because multiple LLVM BB's may have been RAUW'd to this block after // the references were generated. if (MBB.hasAddressTaken()) { const BasicBlock *BB = MBB.getBasicBlock(); if (isVerbose()) OutStreamer->AddComment("Block address taken"); // MBBs can have their address taken as part of CodeGen without having // their corresponding BB's address taken in IR if (BB->hasAddressTaken()) for (MCSymbol *Sym : MMI->getAddrLabelSymbolToEmit(BB)) OutStreamer->emitLabel(Sym); } // Print some verbose block comments. if (isVerbose()) { if (const BasicBlock *BB = MBB.getBasicBlock()) { if (BB->hasName()) { BB->printAsOperand(OutStreamer->GetCommentOS(), /*PrintType=*/false, BB->getModule()); OutStreamer->GetCommentOS() << '\n'; } } assert(MLI != nullptr && "MachineLoopInfo should has been computed"); emitBasicBlockLoopComments(MBB, MLI, *this); } if (MBB.pred_empty() || (!MF->hasBBLabels() && isBlockOnlyReachableByFallthrough(&MBB) && !MBB.isEHFuncletEntry() && !MBB.hasLabelMustBeEmitted())) { if (isVerbose()) { // NOTE: Want this comment at start of line, don't emit with AddComment. OutStreamer->emitRawComment(" %bb." + Twine(MBB.getNumber()) + ":", false); } } else { if (isVerbose() && MBB.hasLabelMustBeEmitted()) { OutStreamer->AddComment("Label of block must be emitted"); } auto *BBSymbol = MBB.getSymbol(); // Switch to a new section if this basic block must begin a section. if (MBB.isBeginSection()) { OutStreamer->SwitchSection( getObjFileLowering().getSectionForMachineBasicBlock(MF->getFunction(), MBB, TM)); CurrentSectionBeginSym = BBSymbol; } OutStreamer->emitLabel(BBSymbol); // With BB sections, each basic block must handle CFI information on its own // if it begins a section. if (MBB.isBeginSection()) for (const HandlerInfo &HI : Handlers) HI.Handler->beginBasicBlock(MBB); } } void AsmPrinter::emitBasicBlockEnd(const MachineBasicBlock &MBB) { // Check if CFI information needs to be updated for this MBB with basic block // sections. if (MBB.isEndSection()) for (const HandlerInfo &HI : Handlers) HI.Handler->endBasicBlock(MBB); } void AsmPrinter::emitVisibility(MCSymbol *Sym, unsigned Visibility, bool IsDefinition) const { MCSymbolAttr Attr = MCSA_Invalid; switch (Visibility) { default: break; case GlobalValue::HiddenVisibility: if (IsDefinition) Attr = MAI->getHiddenVisibilityAttr(); else Attr = MAI->getHiddenDeclarationVisibilityAttr(); break; case GlobalValue::ProtectedVisibility: Attr = MAI->getProtectedVisibilityAttr(); break; } if (Attr != MCSA_Invalid) OutStreamer->emitSymbolAttribute(Sym, Attr); } /// isBlockOnlyReachableByFallthough - Return true if the basic block has /// exactly one predecessor and the control transfer mechanism between /// the predecessor and this block is a fall-through. bool AsmPrinter:: isBlockOnlyReachableByFallthrough(const MachineBasicBlock *MBB) const { // With BasicBlock Sections, beginning of the section is not a fallthrough. if (MBB->isBeginSection()) return false; // If this is a landing pad, it isn't a fall through. If it has no preds, // then nothing falls through to it. if (MBB->isEHPad() || MBB->pred_empty()) return false; // If there isn't exactly one predecessor, it can't be a fall through. if (MBB->pred_size() > 1) return false; // The predecessor has to be immediately before this block. MachineBasicBlock *Pred = *MBB->pred_begin(); if (!Pred->isLayoutSuccessor(MBB)) return false; // If the block is completely empty, then it definitely does fall through. 
if (Pred->empty()) return true; // Check the terminators in the previous blocks for (const auto &MI : Pred->terminators()) { // If it is not a simple branch, we are in a table somewhere. if (!MI.isBranch() || MI.isIndirectBranch()) return false; // If we are the operands of one of the branches, this is not a fall // through. Note that targets with delay slots will usually bundle // terminators with the delay slot instruction. for (ConstMIBundleOperands OP(MI); OP.isValid(); ++OP) { if (OP->isJTI()) return false; if (OP->isMBB() && OP->getMBB() == MBB) return false; } } return true; } GCMetadataPrinter *AsmPrinter::GetOrCreateGCPrinter(GCStrategy &S) { if (!S.usesMetadata()) return nullptr; gcp_map_type &GCMap = getGCMap(GCMetadataPrinters); gcp_map_type::iterator GCPI = GCMap.find(&S); if (GCPI != GCMap.end()) return GCPI->second.get(); auto Name = S.getName(); for (const GCMetadataPrinterRegistry::entry &GCMetaPrinter : GCMetadataPrinterRegistry::entries()) if (Name == GCMetaPrinter.getName()) { std::unique_ptr GMP = GCMetaPrinter.instantiate(); GMP->S = &S; auto IterBool = GCMap.insert(std::make_pair(&S, std::move(GMP))); return IterBool.first->second.get(); } report_fatal_error("no GCMetadataPrinter registered for GC: " + Twine(Name)); } void AsmPrinter::emitStackMaps(StackMaps &SM) { GCModuleInfo *MI = getAnalysisIfAvailable(); assert(MI && "AsmPrinter didn't require GCModuleInfo?"); bool NeedsDefault = false; if (MI->begin() == MI->end()) // No GC strategy, use the default format. NeedsDefault = true; else for (auto &I : *MI) { if (GCMetadataPrinter *MP = GetOrCreateGCPrinter(*I)) if (MP->emitStackMaps(SM, *this)) continue; // The strategy doesn't have printer or doesn't emit custom stack maps. // Use the default format. NeedsDefault = true; } if (NeedsDefault) SM.serializeToStackMapSection(); } /// Pin vtable to this file. AsmPrinterHandler::~AsmPrinterHandler() = default; void AsmPrinterHandler::markFunctionEnd() {} // In the binary's "xray_instr_map" section, an array of these function entries // describes each instrumentation point. When XRay patches your code, the index // into this table will be given to your handler as a patch point identifier. void AsmPrinter::XRayFunctionEntry::emit(int Bytes, MCStreamer *Out) const { auto Kind8 = static_cast(Kind); Out->emitBinaryData(StringRef(reinterpret_cast(&Kind8), 1)); Out->emitBinaryData( StringRef(reinterpret_cast(&AlwaysInstrument), 1)); Out->emitBinaryData(StringRef(reinterpret_cast(&Version), 1)); auto Padding = (4 * Bytes) - ((2 * Bytes) + 3); assert(Padding >= 0 && "Instrumentation map entry > 4 * Word Size"); Out->emitZeros(Padding); } void AsmPrinter::emitXRayTable() { if (Sleds.empty()) return; auto PrevSection = OutStreamer->getCurrentSectionOnly(); const Function &F = MF->getFunction(); MCSection *InstMap = nullptr; MCSection *FnSledIndex = nullptr; const Triple &TT = TM.getTargetTriple(); // Use PC-relative addresses on all targets except MIPS (MIPS64 cannot use // PC-relative addresses because R_MIPS_PC64 does not exist). 
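  // Illustrative sketch (assumes a 64-bit target using the PC-relative form,
  // i.e. every target here except MIPS): each xray_instr_map record occupies
  // 4 * word size bytes, so 32 bytes, laid out roughly as
  //
  //   [ 0.. 7]  sled address, relative to the field's own position
  //   [ 8..15]  function entry, relative to the field's own position
  //   [16]      sled kind    [17] always-instrument    [18] version
  //   [19..31]  zero padding written by XRayFunctionEntry::emit above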
bool PCRel = !TT.isMIPS(); if (TT.isOSBinFormatELF()) { auto LinkedToSym = cast(CurrentFnSym); auto Flags = ELF::SHF_ALLOC | ELF::SHF_LINK_ORDER; if (!PCRel) Flags |= ELF::SHF_WRITE; StringRef GroupName; if (F.hasComdat()) { Flags |= ELF::SHF_GROUP; GroupName = F.getComdat()->getName(); } InstMap = OutContext.getELFSection("xray_instr_map", ELF::SHT_PROGBITS, Flags, 0, GroupName, MCSection::NonUniqueID, LinkedToSym); if (!TM.Options.XRayOmitFunctionIndex) FnSledIndex = OutContext.getELFSection( "xray_fn_idx", ELF::SHT_PROGBITS, Flags | ELF::SHF_WRITE, 0, GroupName, MCSection::NonUniqueID, LinkedToSym); } else if (MF->getSubtarget().getTargetTriple().isOSBinFormatMachO()) { InstMap = OutContext.getMachOSection("__DATA", "xray_instr_map", 0, SectionKind::getReadOnlyWithRel()); if (!TM.Options.XRayOmitFunctionIndex) FnSledIndex = OutContext.getMachOSection( "__DATA", "xray_fn_idx", 0, SectionKind::getReadOnlyWithRel()); } else { llvm_unreachable("Unsupported target"); } auto WordSizeBytes = MAI->getCodePointerSize(); // Now we switch to the instrumentation map section. Because this is done // per-function, we are able to create an index entry that will represent the // range of sleds associated with a function. auto &Ctx = OutContext; MCSymbol *SledsStart = OutContext.createTempSymbol("xray_sleds_start", true); OutStreamer->SwitchSection(InstMap); OutStreamer->emitLabel(SledsStart); for (const auto &Sled : Sleds) { if (PCRel) { MCSymbol *Dot = Ctx.createTempSymbol(); OutStreamer->emitLabel(Dot); OutStreamer->emitValueImpl( MCBinaryExpr::createSub(MCSymbolRefExpr::create(Sled.Sled, Ctx), MCSymbolRefExpr::create(Dot, Ctx), Ctx), WordSizeBytes); OutStreamer->emitValueImpl( MCBinaryExpr::createSub( MCSymbolRefExpr::create(CurrentFnBegin, Ctx), MCBinaryExpr::createAdd( MCSymbolRefExpr::create(Dot, Ctx), MCConstantExpr::create(WordSizeBytes, Ctx), Ctx), Ctx), WordSizeBytes); } else { OutStreamer->emitSymbolValue(Sled.Sled, WordSizeBytes); OutStreamer->emitSymbolValue(CurrentFnSym, WordSizeBytes); } Sled.emit(WordSizeBytes, OutStreamer.get()); } MCSymbol *SledsEnd = OutContext.createTempSymbol("xray_sleds_end", true); OutStreamer->emitLabel(SledsEnd); // We then emit a single entry in the index per function. We use the symbols // that bound the instrumentation map as the range for a specific function. // Each entry here will be 2 * word size aligned, as we're writing down two // pointers. This should work for both 32-bit and 64-bit platforms. 
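  // Illustrative sketch (assumes the same 64-bit target as the record layout
  // sketched earlier): each xray_fn_idx record is then 16 bytes,
  // [SledsStart][SledsEnd], emitted at 16-byte alignment, and a consumer can
  // recover this function's sled count as (SledsEnd - SledsStart) divided by
  // the 32-byte instr_map record size.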
if (FnSledIndex) { OutStreamer->SwitchSection(FnSledIndex); OutStreamer->emitCodeAlignment(2 * WordSizeBytes); OutStreamer->emitSymbolValue(SledsStart, WordSizeBytes, false); OutStreamer->emitSymbolValue(SledsEnd, WordSizeBytes, false); OutStreamer->SwitchSection(PrevSection); } Sleds.clear(); } void AsmPrinter::recordSled(MCSymbol *Sled, const MachineInstr &MI, SledKind Kind, uint8_t Version) { const Function &F = MI.getMF()->getFunction(); auto Attr = F.getFnAttribute("function-instrument"); bool LogArgs = F.hasFnAttribute("xray-log-args"); bool AlwaysInstrument = Attr.isStringAttribute() && Attr.getValueAsString() == "xray-always"; if (Kind == SledKind::FUNCTION_ENTER && LogArgs) Kind = SledKind::LOG_ARGS_ENTER; Sleds.emplace_back(XRayFunctionEntry{Sled, CurrentFnSym, Kind, AlwaysInstrument, &F, Version}); } void AsmPrinter::emitPatchableFunctionEntries() { const Function &F = MF->getFunction(); unsigned PatchableFunctionPrefix = 0, PatchableFunctionEntry = 0; (void)F.getFnAttribute("patchable-function-prefix") .getValueAsString() .getAsInteger(10, PatchableFunctionPrefix); (void)F.getFnAttribute("patchable-function-entry") .getValueAsString() .getAsInteger(10, PatchableFunctionEntry); if (!PatchableFunctionPrefix && !PatchableFunctionEntry) return; const unsigned PointerSize = getPointerSize(); if (TM.getTargetTriple().isOSBinFormatELF()) { auto Flags = ELF::SHF_WRITE | ELF::SHF_ALLOC; const MCSymbolELF *LinkedToSym = nullptr; StringRef GroupName; // GNU as < 2.35 did not support section flag 'o'. Use SHF_LINK_ORDER only // if we are using the integrated assembler. if (MAI->useIntegratedAssembler()) { Flags |= ELF::SHF_LINK_ORDER; if (F.hasComdat()) { Flags |= ELF::SHF_GROUP; GroupName = F.getComdat()->getName(); } LinkedToSym = cast(CurrentFnSym); } OutStreamer->SwitchSection(OutContext.getELFSection( "__patchable_function_entries", ELF::SHT_PROGBITS, Flags, 0, GroupName, MCSection::NonUniqueID, LinkedToSym)); emitAlignment(Align(PointerSize)); OutStreamer->emitSymbolValue(CurrentPatchableFunctionEntrySym, PointerSize); } } uint16_t AsmPrinter::getDwarfVersion() const { return OutStreamer->getContext().getDwarfVersion(); } void AsmPrinter::setDwarfVersion(uint16_t Version) { OutStreamer->getContext().setDwarfVersion(Version); } Index: vendor/llvm-project/release-11.x/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp =================================================================== --- vendor/llvm-project/release-11.x/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp (revision 366332) +++ vendor/llvm-project/release-11.x/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp (revision 366333) @@ -1,507 +1,509 @@ //===-- lib/CodeGen/GlobalISel/CallLowering.cpp - Call lowering -----------===// // // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // See https://llvm.org/LICENSE.txt for license information. // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // //===----------------------------------------------------------------------===// /// /// \file /// This file implements some simple delegations needed for call lowering. 
/// //===----------------------------------------------------------------------===// #include "llvm/CodeGen/Analysis.h" #include "llvm/CodeGen/GlobalISel/CallLowering.h" #include "llvm/CodeGen/GlobalISel/Utils.h" #include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h" #include "llvm/CodeGen/MachineOperand.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/TargetLowering.h" #include "llvm/IR/DataLayout.h" #include "llvm/IR/Instructions.h" #include "llvm/IR/LLVMContext.h" #include "llvm/IR/Module.h" #include "llvm/Target/TargetMachine.h" #define DEBUG_TYPE "call-lowering" using namespace llvm; void CallLowering::anchor() {} bool CallLowering::lowerCall(MachineIRBuilder &MIRBuilder, const CallBase &CB, ArrayRef ResRegs, ArrayRef> ArgRegs, Register SwiftErrorVReg, std::function GetCalleeReg) const { CallLoweringInfo Info; const DataLayout &DL = MIRBuilder.getDataLayout(); // First step is to marshall all the function's parameters into the correct // physregs and memory locations. Gather the sequence of argument types that // we'll pass to the assigner function. unsigned i = 0; unsigned NumFixedArgs = CB.getFunctionType()->getNumParams(); for (auto &Arg : CB.args()) { ArgInfo OrigArg{ArgRegs[i], Arg->getType(), ISD::ArgFlagsTy{}, i < NumFixedArgs}; setArgFlags(OrigArg, i + AttributeList::FirstArgIndex, DL, CB); Info.OrigArgs.push_back(OrigArg); ++i; } // Try looking through a bitcast from one function type to another. // Commonly happens with calls to objc_msgSend(). const Value *CalleeV = CB.getCalledOperand()->stripPointerCasts(); if (const Function *F = dyn_cast(CalleeV)) Info.Callee = MachineOperand::CreateGA(F, 0); else Info.Callee = MachineOperand::CreateReg(GetCalleeReg(), false); Info.OrigRet = ArgInfo{ResRegs, CB.getType(), ISD::ArgFlagsTy{}}; if (!Info.OrigRet.Ty->isVoidTy()) setArgFlags(Info.OrigRet, AttributeList::ReturnIndex, DL, CB); MachineFunction &MF = MIRBuilder.getMF(); Info.KnownCallees = CB.getMetadata(LLVMContext::MD_callees); Info.CallConv = CB.getCallingConv(); Info.SwiftErrorVReg = SwiftErrorVReg; Info.IsMustTailCall = CB.isMustTailCall(); Info.IsTailCall = CB.isTailCall() && isInTailCallPosition(CB, MF.getTarget()) && (MF.getFunction() .getFnAttribute("disable-tail-calls") .getValueAsString() != "true"); Info.IsVarArg = CB.getFunctionType()->isVarArg(); return lowerCall(MIRBuilder, Info); } template void CallLowering::setArgFlags(CallLowering::ArgInfo &Arg, unsigned OpIdx, const DataLayout &DL, const FuncInfoTy &FuncInfo) const { auto &Flags = Arg.Flags[0]; const AttributeList &Attrs = FuncInfo.getAttributes(); if (Attrs.hasAttribute(OpIdx, Attribute::ZExt)) Flags.setZExt(); if (Attrs.hasAttribute(OpIdx, Attribute::SExt)) Flags.setSExt(); if (Attrs.hasAttribute(OpIdx, Attribute::InReg)) Flags.setInReg(); if (Attrs.hasAttribute(OpIdx, Attribute::StructRet)) Flags.setSRet(); if (Attrs.hasAttribute(OpIdx, Attribute::SwiftSelf)) Flags.setSwiftSelf(); if (Attrs.hasAttribute(OpIdx, Attribute::SwiftError)) Flags.setSwiftError(); if (Attrs.hasAttribute(OpIdx, Attribute::ByVal)) Flags.setByVal(); if (Attrs.hasAttribute(OpIdx, Attribute::Preallocated)) Flags.setPreallocated(); if (Attrs.hasAttribute(OpIdx, Attribute::InAlloca)) Flags.setInAlloca(); if (Flags.isByVal() || Flags.isInAlloca() || Flags.isPreallocated()) { Type *ElementTy = cast(Arg.Ty)->getElementType(); auto Ty = Attrs.getAttribute(OpIdx, Attribute::ByVal).getValueAsType(); Flags.setByValSize(DL.getTypeAllocSize(Ty ? Ty : ElementTy)); // For ByVal, alignment should be passed from FE. 
BE will guess if // this info is not there but there are cases it cannot get right. Align FrameAlign; if (auto ParamAlign = FuncInfo.getParamAlign(OpIdx - 2)) FrameAlign = *ParamAlign; else FrameAlign = Align(getTLI()->getByValTypeAlignment(ElementTy, DL)); Flags.setByValAlign(FrameAlign); } if (Attrs.hasAttribute(OpIdx, Attribute::Nest)) Flags.setNest(); Flags.setOrigAlign(DL.getABITypeAlign(Arg.Ty)); } template void CallLowering::setArgFlags(CallLowering::ArgInfo &Arg, unsigned OpIdx, const DataLayout &DL, const Function &FuncInfo) const; template void CallLowering::setArgFlags(CallLowering::ArgInfo &Arg, unsigned OpIdx, const DataLayout &DL, const CallBase &FuncInfo) const; Register CallLowering::packRegs(ArrayRef SrcRegs, Type *PackedTy, MachineIRBuilder &MIRBuilder) const { assert(SrcRegs.size() > 1 && "Nothing to pack"); const DataLayout &DL = MIRBuilder.getMF().getDataLayout(); MachineRegisterInfo *MRI = MIRBuilder.getMRI(); LLT PackedLLT = getLLTForType(*PackedTy, DL); SmallVector LLTs; SmallVector Offsets; computeValueLLTs(DL, *PackedTy, LLTs, &Offsets); assert(LLTs.size() == SrcRegs.size() && "Regs / types mismatch"); Register Dst = MRI->createGenericVirtualRegister(PackedLLT); MIRBuilder.buildUndef(Dst); for (unsigned i = 0; i < SrcRegs.size(); ++i) { Register NewDst = MRI->createGenericVirtualRegister(PackedLLT); MIRBuilder.buildInsert(NewDst, Dst, SrcRegs[i], Offsets[i]); Dst = NewDst; } return Dst; } void CallLowering::unpackRegs(ArrayRef DstRegs, Register SrcReg, Type *PackedTy, MachineIRBuilder &MIRBuilder) const { assert(DstRegs.size() > 1 && "Nothing to unpack"); const DataLayout &DL = MIRBuilder.getDataLayout(); SmallVector LLTs; SmallVector Offsets; computeValueLLTs(DL, *PackedTy, LLTs, &Offsets); assert(LLTs.size() == DstRegs.size() && "Regs / types mismatch"); for (unsigned i = 0; i < DstRegs.size(); ++i) MIRBuilder.buildExtract(DstRegs[i], SrcReg, Offsets[i]); } bool CallLowering::handleAssignments(MachineIRBuilder &MIRBuilder, SmallVectorImpl &Args, ValueHandler &Handler) const { MachineFunction &MF = MIRBuilder.getMF(); const Function &F = MF.getFunction(); SmallVector ArgLocs; CCState CCInfo(F.getCallingConv(), F.isVarArg(), MF, ArgLocs, F.getContext()); return handleAssignments(CCInfo, ArgLocs, MIRBuilder, Args, Handler); } bool CallLowering::handleAssignments(CCState &CCInfo, SmallVectorImpl &ArgLocs, MachineIRBuilder &MIRBuilder, SmallVectorImpl &Args, ValueHandler &Handler) const { MachineFunction &MF = MIRBuilder.getMF(); const Function &F = MF.getFunction(); const DataLayout &DL = F.getParent()->getDataLayout(); unsigned NumArgs = Args.size(); for (unsigned i = 0; i != NumArgs; ++i) { EVT CurVT = EVT::getEVT(Args[i].Ty); if (!CurVT.isSimple() || Handler.assignArg(i, CurVT.getSimpleVT(), CurVT.getSimpleVT(), CCValAssign::Full, Args[i], Args[i].Flags[0], CCInfo)) { MVT NewVT = TLI->getRegisterTypeForCallingConv( F.getContext(), F.getCallingConv(), EVT(CurVT)); // If we need to split the type over multiple regs, check it's a scenario // we currently support. unsigned NumParts = TLI->getNumRegistersForCallingConv( F.getContext(), F.getCallingConv(), CurVT); if (NumParts > 1) { // For now only handle exact splits. if (NewVT.getSizeInBits() * NumParts != CurVT.getSizeInBits()) return false; } // For incoming arguments (physregs to vregs), we could have values in // physregs (or memlocs) which we want to extract and copy to vregs. // During this, we might have to deal with the LLT being split across // multiple regs, so we have to record this information for later. 
// // If we have outgoing args, then we have the opposite case. We have a // vreg with an LLT which we want to assign to a physical location, and // we might have to record that the value has to be split later. if (Handler.isIncomingArgumentHandler()) { if (NumParts == 1) { // Try to use the register type if we couldn't assign the VT. if (Handler.assignArg(i, NewVT, NewVT, CCValAssign::Full, Args[i], Args[i].Flags[0], CCInfo)) return false; } else { // We're handling an incoming arg which is split over multiple regs. // E.g. passing an s128 on AArch64. ISD::ArgFlagsTy OrigFlags = Args[i].Flags[0]; Args[i].OrigRegs.push_back(Args[i].Regs[0]); Args[i].Regs.clear(); Args[i].Flags.clear(); LLT NewLLT = getLLTForMVT(NewVT); // For each split register, create and assign a vreg that will store // the incoming component of the larger value. These will later be // merged to form the final vreg. for (unsigned Part = 0; Part < NumParts; ++Part) { Register Reg = MIRBuilder.getMRI()->createGenericVirtualRegister(NewLLT); ISD::ArgFlagsTy Flags = OrigFlags; if (Part == 0) { Flags.setSplit(); } else { Flags.setOrigAlign(Align(1)); if (Part == NumParts - 1) Flags.setSplitEnd(); } Args[i].Regs.push_back(Reg); Args[i].Flags.push_back(Flags); if (Handler.assignArg(i + Part, NewVT, NewVT, CCValAssign::Full, Args[i], Args[i].Flags[Part], CCInfo)) { // Still couldn't assign this smaller part type for some reason. return false; } } } } else { // Handling an outgoing arg that might need to be split. if (NumParts < 2) return false; // Don't know how to deal with this type combination. // This type is passed via multiple registers in the calling convention. // We need to extract the individual parts. Register LargeReg = Args[i].Regs[0]; LLT SmallTy = LLT::scalar(NewVT.getSizeInBits()); auto Unmerge = MIRBuilder.buildUnmerge(SmallTy, LargeReg); assert(Unmerge->getNumOperands() == NumParts + 1); ISD::ArgFlagsTy OrigFlags = Args[i].Flags[0]; // We're going to replace the regs and flags with the split ones. Args[i].Regs.clear(); Args[i].Flags.clear(); for (unsigned PartIdx = 0; PartIdx < NumParts; ++PartIdx) { ISD::ArgFlagsTy Flags = OrigFlags; if (PartIdx == 0) { Flags.setSplit(); } else { Flags.setOrigAlign(Align(1)); if (PartIdx == NumParts - 1) Flags.setSplitEnd(); } Args[i].Regs.push_back(Unmerge.getReg(PartIdx)); Args[i].Flags.push_back(Flags); if (Handler.assignArg(i + PartIdx, NewVT, NewVT, CCValAssign::Full, Args[i], Args[i].Flags[PartIdx], CCInfo)) return false; } } } } for (unsigned i = 0, e = Args.size(), j = 0; i != e; ++i, ++j) { assert(j < ArgLocs.size() && "Skipped too many arg locs"); CCValAssign &VA = ArgLocs[j]; assert(VA.getValNo() == i && "Location doesn't correspond to current arg"); if (VA.needsCustom()) { unsigned NumArgRegs = Handler.assignCustomValue(Args[i], makeArrayRef(ArgLocs).slice(j)); if (!NumArgRegs) return false; j += NumArgRegs; continue; } // FIXME: Pack registers if we have more than one. Register ArgReg = Args[i].Regs[0]; EVT OrigVT = EVT::getEVT(Args[i].Ty); EVT VAVT = VA.getValVT(); const LLT OrigTy = getLLTForType(*Args[i].Ty, DL); if (VA.isRegLoc()) { if (Handler.isIncomingArgumentHandler() && VAVT != OrigVT) { if (VAVT.getSizeInBits() < OrigVT.getSizeInBits()) { // Expected to be multiple regs for a single incoming arg. 
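          // Illustrative sketch (assumes an incoming i128 argument on
          // AArch64, which is typically assigned to a 64-bit register pair):
          // the loop below copies each physreg into its own s64 vreg and the
          // G_MERGE_VALUES afterwards rebuilds the original s128 in
          // OrigRegs[0], conceptually:
          //
          //   %lo:_(s64) = COPY $x0
          //   %hi:_(s64) = COPY $x1
          //   %arg:_(s128) = G_MERGE_VALUES %lo:_(s64), %hi:_(s64)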
unsigned NumArgRegs = Args[i].Regs.size(); if (NumArgRegs < 2) return false; assert((j + (NumArgRegs - 1)) < ArgLocs.size() && "Too many regs for number of args"); for (unsigned Part = 0; Part < NumArgRegs; ++Part) { // There should be Regs.size() ArgLocs per argument. VA = ArgLocs[j + Part]; Handler.assignValueToReg(Args[i].Regs[Part], VA.getLocReg(), VA); } j += NumArgRegs - 1; // Merge the split registers into the expected larger result vreg // of the original call. MIRBuilder.buildMerge(Args[i].OrigRegs[0], Args[i].Regs); continue; } const LLT VATy(VAVT.getSimpleVT()); Register NewReg = MIRBuilder.getMRI()->createGenericVirtualRegister(VATy); Handler.assignValueToReg(NewReg, VA.getLocReg(), VA); // If it's a vector type, we either need to truncate the elements // or do an unmerge to get the lower block of elements. if (VATy.isVector() && VATy.getNumElements() > OrigVT.getVectorNumElements()) { // Just handle the case where the VA type is 2 * original type. if (VATy.getNumElements() != OrigVT.getVectorNumElements() * 2) { LLVM_DEBUG(dbgs() << "Incoming promoted vector arg has too many elts"); return false; } auto Unmerge = MIRBuilder.buildUnmerge({OrigTy, OrigTy}, {NewReg}); MIRBuilder.buildCopy(ArgReg, Unmerge.getReg(0)); } else { MIRBuilder.buildTrunc(ArgReg, {NewReg}).getReg(0); } } else if (!Handler.isIncomingArgumentHandler()) { assert((j + (Args[i].Regs.size() - 1)) < ArgLocs.size() && "Too many regs for number of args"); // This is an outgoing argument that might have been split. for (unsigned Part = 0; Part < Args[i].Regs.size(); ++Part) { // There should be Regs.size() ArgLocs per argument. VA = ArgLocs[j + Part]; Handler.assignValueToReg(Args[i].Regs[Part], VA.getLocReg(), VA); } j += Args[i].Regs.size() - 1; } else { Handler.assignValueToReg(ArgReg, VA.getLocReg(), VA); } } else if (VA.isMemLoc()) { // Don't currently support loading/storing a type that needs to be split // to the stack. Should be easy, just not implemented yet. if (Args[i].Regs.size() > 1) { LLVM_DEBUG( dbgs() << "Load/store a split arg to/from the stack not implemented yet"); return false; } - MVT VT = MVT::getVT(Args[i].Ty); - unsigned Size = VT == MVT::iPTR ? DL.getPointerSize() - : alignTo(VT.getSizeInBits(), 8) / 8; + + EVT LocVT = VA.getValVT(); + unsigned MemSize = LocVT == MVT::iPTR ? DL.getPointerSize() + : LocVT.getStoreSize(); + unsigned Offset = VA.getLocMemOffset(); MachinePointerInfo MPO; - Register StackAddr = Handler.getStackAddress(Size, Offset, MPO); - Handler.assignValueToAddress(Args[i], StackAddr, Size, MPO, VA); + Register StackAddr = Handler.getStackAddress(MemSize, Offset, MPO); + Handler.assignValueToAddress(Args[i], StackAddr, MemSize, MPO, VA); } else { // FIXME: Support byvals and other weirdness return false; } } return true; } bool CallLowering::analyzeArgInfo(CCState &CCState, SmallVectorImpl &Args, CCAssignFn &AssignFnFixed, CCAssignFn &AssignFnVarArg) const { for (unsigned i = 0, e = Args.size(); i < e; ++i) { MVT VT = MVT::getVT(Args[i].Ty); CCAssignFn &Fn = Args[i].IsFixed ? AssignFnFixed : AssignFnVarArg; if (Fn(i, VT, VT, CCValAssign::Full, Args[i].Flags[0], CCState)) { // Bail out on anything we can't handle. 
LLVM_DEBUG(dbgs() << "Cannot analyze " << EVT(VT).getEVTString() << " (arg number = " << i << "\n"); return false; } } return true; } bool CallLowering::resultsCompatible(CallLoweringInfo &Info, MachineFunction &MF, SmallVectorImpl &InArgs, CCAssignFn &CalleeAssignFnFixed, CCAssignFn &CalleeAssignFnVarArg, CCAssignFn &CallerAssignFnFixed, CCAssignFn &CallerAssignFnVarArg) const { const Function &F = MF.getFunction(); CallingConv::ID CalleeCC = Info.CallConv; CallingConv::ID CallerCC = F.getCallingConv(); if (CallerCC == CalleeCC) return true; SmallVector ArgLocs1; CCState CCInfo1(CalleeCC, false, MF, ArgLocs1, F.getContext()); if (!analyzeArgInfo(CCInfo1, InArgs, CalleeAssignFnFixed, CalleeAssignFnVarArg)) return false; SmallVector ArgLocs2; CCState CCInfo2(CallerCC, false, MF, ArgLocs2, F.getContext()); if (!analyzeArgInfo(CCInfo2, InArgs, CallerAssignFnFixed, CalleeAssignFnVarArg)) return false; // We need the argument locations to match up exactly. If there's more in // one than the other, then we are done. if (ArgLocs1.size() != ArgLocs2.size()) return false; // Make sure that each location is passed in exactly the same way. for (unsigned i = 0, e = ArgLocs1.size(); i < e; ++i) { const CCValAssign &Loc1 = ArgLocs1[i]; const CCValAssign &Loc2 = ArgLocs2[i]; // We need both of them to be the same. So if one is a register and one // isn't, we're done. if (Loc1.isRegLoc() != Loc2.isRegLoc()) return false; if (Loc1.isRegLoc()) { // If they don't have the same register location, we're done. if (Loc1.getLocReg() != Loc2.getLocReg()) return false; // They matched, so we can move to the next ArgLoc. continue; } // Loc1 wasn't a RegLoc, so they both must be MemLocs. Check if they match. if (Loc1.getLocMemOffset() != Loc2.getLocMemOffset()) return false; } return true; } Register CallLowering::ValueHandler::extendRegister(Register ValReg, CCValAssign &VA, unsigned MaxSizeBits) { LLT LocTy{VA.getLocVT()}; LLT ValTy = MRI.getType(ValReg); if (LocTy.getSizeInBits() == ValTy.getSizeInBits()) return ValReg; if (LocTy.isScalar() && MaxSizeBits && MaxSizeBits < LocTy.getSizeInBits()) { if (MaxSizeBits <= ValTy.getSizeInBits()) return ValReg; LocTy = LLT::scalar(MaxSizeBits); } switch (VA.getLocInfo()) { default: break; case CCValAssign::Full: case CCValAssign::BCvt: // FIXME: bitconverting between vector types may or may not be a // nop in big-endian situations. return ValReg; case CCValAssign::AExt: { auto MIB = MIRBuilder.buildAnyExt(LocTy, ValReg); return MIB.getReg(0); } case CCValAssign::SExt: { Register NewReg = MRI.createGenericVirtualRegister(LocTy); MIRBuilder.buildSExt(NewReg, ValReg); return NewReg; } case CCValAssign::ZExt: { Register NewReg = MRI.createGenericVirtualRegister(LocTy); MIRBuilder.buildZExt(NewReg, ValReg); return NewReg; } } llvm_unreachable("unable to extend register"); } void CallLowering::ValueHandler::anchor() {} Index: vendor/llvm-project/release-11.x/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp =================================================================== --- vendor/llvm-project/release-11.x/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp (revision 366332) +++ vendor/llvm-project/release-11.x/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp (revision 366333) @@ -1,5370 +1,5371 @@ //===-- llvm/CodeGen/GlobalISel/LegalizerHelper.cpp -----------------------===// // // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // See https://llvm.org/LICENSE.txt for license information. 
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // //===----------------------------------------------------------------------===// // /// \file This file implements the LegalizerHelper class to legalize /// individual instructions and the LegalizeMachineIR wrapper pass for the /// primary legalization. // //===----------------------------------------------------------------------===// #include "llvm/CodeGen/GlobalISel/LegalizerHelper.h" #include "llvm/CodeGen/GlobalISel/CallLowering.h" #include "llvm/CodeGen/GlobalISel/GISelChangeObserver.h" #include "llvm/CodeGen/GlobalISel/LegalizerInfo.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/TargetFrameLowering.h" #include "llvm/CodeGen/TargetInstrInfo.h" #include "llvm/CodeGen/TargetLowering.h" #include "llvm/CodeGen/TargetSubtargetInfo.h" #include "llvm/Support/Debug.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/raw_ostream.h" #define DEBUG_TYPE "legalizer" using namespace llvm; using namespace LegalizeActions; /// Try to break down \p OrigTy into \p NarrowTy sized pieces. /// /// Returns the number of \p NarrowTy elements needed to reconstruct \p OrigTy, /// with any leftover piece as type \p LeftoverTy /// /// Returns -1 in the first element of the pair if the breakdown is not /// satisfiable. static std::pair getNarrowTypeBreakDown(LLT OrigTy, LLT NarrowTy, LLT &LeftoverTy) { assert(!LeftoverTy.isValid() && "this is an out argument"); unsigned Size = OrigTy.getSizeInBits(); unsigned NarrowSize = NarrowTy.getSizeInBits(); unsigned NumParts = Size / NarrowSize; unsigned LeftoverSize = Size - NumParts * NarrowSize; assert(Size > NarrowSize); if (LeftoverSize == 0) return {NumParts, 0}; if (NarrowTy.isVector()) { unsigned EltSize = OrigTy.getScalarSizeInBits(); if (LeftoverSize % EltSize != 0) return {-1, -1}; LeftoverTy = LLT::scalarOrVector(LeftoverSize / EltSize, EltSize); } else { LeftoverTy = LLT::scalar(LeftoverSize); } int NumLeftover = LeftoverSize / LeftoverTy.getSizeInBits(); return std::make_pair(NumParts, NumLeftover); } static Type *getFloatTypeForLLT(LLVMContext &Ctx, LLT Ty) { if (!Ty.isScalar()) return nullptr; switch (Ty.getSizeInBits()) { case 16: return Type::getHalfTy(Ctx); case 32: return Type::getFloatTy(Ctx); case 64: return Type::getDoubleTy(Ctx); case 128: return Type::getFP128Ty(Ctx); default: return nullptr; } } LegalizerHelper::LegalizerHelper(MachineFunction &MF, GISelChangeObserver &Observer, MachineIRBuilder &Builder) : MIRBuilder(Builder), Observer(Observer), MRI(MF.getRegInfo()), LI(*MF.getSubtarget().getLegalizerInfo()) { MIRBuilder.setChangeObserver(Observer); } LegalizerHelper::LegalizerHelper(MachineFunction &MF, const LegalizerInfo &LI, GISelChangeObserver &Observer, MachineIRBuilder &B) : MIRBuilder(B), Observer(Observer), MRI(MF.getRegInfo()), LI(LI) { MIRBuilder.setChangeObserver(Observer); } LegalizerHelper::LegalizeResult LegalizerHelper::legalizeInstrStep(MachineInstr &MI) { LLVM_DEBUG(dbgs() << "Legalizing: " << MI); MIRBuilder.setInstrAndDebugLoc(MI); if (MI.getOpcode() == TargetOpcode::G_INTRINSIC || MI.getOpcode() == TargetOpcode::G_INTRINSIC_W_SIDE_EFFECTS) return LI.legalizeIntrinsic(*this, MI) ? Legalized : UnableToLegalize; auto Step = LI.getAction(MI, MRI); switch (Step.Action) { case Legal: LLVM_DEBUG(dbgs() << ".. Already legal\n"); return AlreadyLegal; case Libcall: LLVM_DEBUG(dbgs() << ".. Convert to libcall\n"); return libcall(MI); case NarrowScalar: LLVM_DEBUG(dbgs() << ".. 
Narrow scalar\n"); return narrowScalar(MI, Step.TypeIdx, Step.NewType); case WidenScalar: LLVM_DEBUG(dbgs() << ".. Widen scalar\n"); return widenScalar(MI, Step.TypeIdx, Step.NewType); case Bitcast: LLVM_DEBUG(dbgs() << ".. Bitcast type\n"); return bitcast(MI, Step.TypeIdx, Step.NewType); case Lower: LLVM_DEBUG(dbgs() << ".. Lower\n"); return lower(MI, Step.TypeIdx, Step.NewType); case FewerElements: LLVM_DEBUG(dbgs() << ".. Reduce number of elements\n"); return fewerElementsVector(MI, Step.TypeIdx, Step.NewType); case MoreElements: LLVM_DEBUG(dbgs() << ".. Increase number of elements\n"); return moreElementsVector(MI, Step.TypeIdx, Step.NewType); case Custom: LLVM_DEBUG(dbgs() << ".. Custom legalization\n"); return LI.legalizeCustom(*this, MI) ? Legalized : UnableToLegalize; default: LLVM_DEBUG(dbgs() << ".. Unable to legalize\n"); return UnableToLegalize; } } void LegalizerHelper::extractParts(Register Reg, LLT Ty, int NumParts, SmallVectorImpl &VRegs) { for (int i = 0; i < NumParts; ++i) VRegs.push_back(MRI.createGenericVirtualRegister(Ty)); MIRBuilder.buildUnmerge(VRegs, Reg); } bool LegalizerHelper::extractParts(Register Reg, LLT RegTy, LLT MainTy, LLT &LeftoverTy, SmallVectorImpl &VRegs, SmallVectorImpl &LeftoverRegs) { assert(!LeftoverTy.isValid() && "this is an out argument"); unsigned RegSize = RegTy.getSizeInBits(); unsigned MainSize = MainTy.getSizeInBits(); unsigned NumParts = RegSize / MainSize; unsigned LeftoverSize = RegSize - NumParts * MainSize; // Use an unmerge when possible. if (LeftoverSize == 0) { for (unsigned I = 0; I < NumParts; ++I) VRegs.push_back(MRI.createGenericVirtualRegister(MainTy)); MIRBuilder.buildUnmerge(VRegs, Reg); return true; } if (MainTy.isVector()) { unsigned EltSize = MainTy.getScalarSizeInBits(); if (LeftoverSize % EltSize != 0) return false; LeftoverTy = LLT::scalarOrVector(LeftoverSize / EltSize, EltSize); } else { LeftoverTy = LLT::scalar(LeftoverSize); } // For irregular sizes, extract the individual parts. for (unsigned I = 0; I != NumParts; ++I) { Register NewReg = MRI.createGenericVirtualRegister(MainTy); VRegs.push_back(NewReg); MIRBuilder.buildExtract(NewReg, Reg, MainSize * I); } for (unsigned Offset = MainSize * NumParts; Offset < RegSize; Offset += LeftoverSize) { Register NewReg = MRI.createGenericVirtualRegister(LeftoverTy); LeftoverRegs.push_back(NewReg); MIRBuilder.buildExtract(NewReg, Reg, Offset); } return true; } void LegalizerHelper::insertParts(Register DstReg, LLT ResultTy, LLT PartTy, ArrayRef PartRegs, LLT LeftoverTy, ArrayRef LeftoverRegs) { if (!LeftoverTy.isValid()) { assert(LeftoverRegs.empty()); if (!ResultTy.isVector()) { MIRBuilder.buildMerge(DstReg, PartRegs); return; } if (PartTy.isVector()) MIRBuilder.buildConcatVectors(DstReg, PartRegs); else MIRBuilder.buildBuildVector(DstReg, PartRegs); return; } unsigned PartSize = PartTy.getSizeInBits(); unsigned LeftoverPartSize = LeftoverTy.getSizeInBits(); Register CurResultReg = MRI.createGenericVirtualRegister(ResultTy); MIRBuilder.buildUndef(CurResultReg); unsigned Offset = 0; for (Register PartReg : PartRegs) { Register NewResultReg = MRI.createGenericVirtualRegister(ResultTy); MIRBuilder.buildInsert(NewResultReg, CurResultReg, PartReg, Offset); CurResultReg = NewResultReg; Offset += PartSize; } for (unsigned I = 0, E = LeftoverRegs.size(); I != E; ++I) { // Use the original output register for the final insert to avoid a copy. Register NewResultReg = (I + 1 == E) ? 
DstReg : MRI.createGenericVirtualRegister(ResultTy); MIRBuilder.buildInsert(NewResultReg, CurResultReg, LeftoverRegs[I], Offset); CurResultReg = NewResultReg; Offset += LeftoverPartSize; } } /// Return the result registers of G_UNMERGE_VALUES \p MI in \p Regs static void getUnmergeResults(SmallVectorImpl &Regs, const MachineInstr &MI) { assert(MI.getOpcode() == TargetOpcode::G_UNMERGE_VALUES); const int NumResults = MI.getNumOperands() - 1; Regs.resize(NumResults); for (int I = 0; I != NumResults; ++I) Regs[I] = MI.getOperand(I).getReg(); } LLT LegalizerHelper::extractGCDType(SmallVectorImpl &Parts, LLT DstTy, LLT NarrowTy, Register SrcReg) { LLT SrcTy = MRI.getType(SrcReg); LLT GCDTy = getGCDType(DstTy, getGCDType(SrcTy, NarrowTy)); if (SrcTy == GCDTy) { // If the source already evenly divides the result type, we don't need to do // anything. Parts.push_back(SrcReg); } else { // Need to split into common type sized pieces. auto Unmerge = MIRBuilder.buildUnmerge(GCDTy, SrcReg); getUnmergeResults(Parts, *Unmerge); } return GCDTy; } LLT LegalizerHelper::buildLCMMergePieces(LLT DstTy, LLT NarrowTy, LLT GCDTy, SmallVectorImpl &VRegs, unsigned PadStrategy) { LLT LCMTy = getLCMType(DstTy, NarrowTy); int NumParts = LCMTy.getSizeInBits() / NarrowTy.getSizeInBits(); int NumSubParts = NarrowTy.getSizeInBits() / GCDTy.getSizeInBits(); int NumOrigSrc = VRegs.size(); Register PadReg; // Get a value we can use to pad the source value if the sources won't evenly // cover the result type. if (NumOrigSrc < NumParts * NumSubParts) { if (PadStrategy == TargetOpcode::G_ZEXT) PadReg = MIRBuilder.buildConstant(GCDTy, 0).getReg(0); else if (PadStrategy == TargetOpcode::G_ANYEXT) PadReg = MIRBuilder.buildUndef(GCDTy).getReg(0); else { assert(PadStrategy == TargetOpcode::G_SEXT); // Shift the sign bit of the low register through the high register. auto ShiftAmt = MIRBuilder.buildConstant(LLT::scalar(64), GCDTy.getSizeInBits() - 1); PadReg = MIRBuilder.buildAShr(GCDTy, VRegs.back(), ShiftAmt).getReg(0); } } // Registers for the final merge to be produced. SmallVector Remerge(NumParts); // Registers needed for intermediate merges, which will be merged into a // source for Remerge. SmallVector SubMerge(NumSubParts); // Once we've fully read off the end of the original source bits, we can reuse // the same high bits for remaining padding elements. Register AllPadReg; // Build merges to the LCM type to cover the original result type. for (int I = 0; I != NumParts; ++I) { bool AllMergePartsArePadding = true; // Build the requested merges to the requested type. for (int J = 0; J != NumSubParts; ++J) { int Idx = I * NumSubParts + J; if (Idx >= NumOrigSrc) { SubMerge[J] = PadReg; continue; } SubMerge[J] = VRegs[Idx]; // There are meaningful bits here we can't reuse later. AllMergePartsArePadding = false; } // If we've filled up a complete piece with padding bits, we can directly // emit the natural sized constant if applicable, rather than a merge of // smaller constants. if (AllMergePartsArePadding && !AllPadReg) { if (PadStrategy == TargetOpcode::G_ANYEXT) AllPadReg = MIRBuilder.buildUndef(NarrowTy).getReg(0); else if (PadStrategy == TargetOpcode::G_ZEXT) AllPadReg = MIRBuilder.buildConstant(NarrowTy, 0).getReg(0); // If this is a sign extension, we can't materialize a trivial constant // with the right type and have to produce a merge. } if (AllPadReg) { // Avoid creating additional instructions if we're just adding additional // copies of padding bits. 
Remerge[I] = AllPadReg; continue; } if (NumSubParts == 1) Remerge[I] = SubMerge[0]; else Remerge[I] = MIRBuilder.buildMerge(NarrowTy, SubMerge).getReg(0); // In the sign extend padding case, re-use the first all-signbit merge. if (AllMergePartsArePadding && !AllPadReg) AllPadReg = Remerge[I]; } VRegs = std::move(Remerge); return LCMTy; } void LegalizerHelper::buildWidenedRemergeToDst(Register DstReg, LLT LCMTy, ArrayRef RemergeRegs) { LLT DstTy = MRI.getType(DstReg); // Create the merge to the widened source, and extract the relevant bits into // the result. if (DstTy == LCMTy) { MIRBuilder.buildMerge(DstReg, RemergeRegs); return; } auto Remerge = MIRBuilder.buildMerge(LCMTy, RemergeRegs); if (DstTy.isScalar() && LCMTy.isScalar()) { MIRBuilder.buildTrunc(DstReg, Remerge); return; } if (LCMTy.isVector()) { MIRBuilder.buildExtract(DstReg, Remerge, 0); return; } llvm_unreachable("unhandled case"); } static RTLIB::Libcall getRTLibDesc(unsigned Opcode, unsigned Size) { #define RTLIBCASE(LibcallPrefix) \ do { \ switch (Size) { \ case 32: \ return RTLIB::LibcallPrefix##32; \ case 64: \ return RTLIB::LibcallPrefix##64; \ case 128: \ return RTLIB::LibcallPrefix##128; \ default: \ llvm_unreachable("unexpected size"); \ } \ } while (0) assert((Size == 32 || Size == 64 || Size == 128) && "Unsupported size"); switch (Opcode) { case TargetOpcode::G_SDIV: RTLIBCASE(SDIV_I); case TargetOpcode::G_UDIV: RTLIBCASE(UDIV_I); case TargetOpcode::G_SREM: RTLIBCASE(SREM_I); case TargetOpcode::G_UREM: RTLIBCASE(UREM_I); case TargetOpcode::G_CTLZ_ZERO_UNDEF: RTLIBCASE(CTLZ_I); case TargetOpcode::G_FADD: RTLIBCASE(ADD_F); case TargetOpcode::G_FSUB: RTLIBCASE(SUB_F); case TargetOpcode::G_FMUL: RTLIBCASE(MUL_F); case TargetOpcode::G_FDIV: RTLIBCASE(DIV_F); case TargetOpcode::G_FEXP: RTLIBCASE(EXP_F); case TargetOpcode::G_FEXP2: RTLIBCASE(EXP2_F); case TargetOpcode::G_FREM: RTLIBCASE(REM_F); case TargetOpcode::G_FPOW: RTLIBCASE(POW_F); case TargetOpcode::G_FMA: RTLIBCASE(FMA_F); case TargetOpcode::G_FSIN: RTLIBCASE(SIN_F); case TargetOpcode::G_FCOS: RTLIBCASE(COS_F); case TargetOpcode::G_FLOG10: RTLIBCASE(LOG10_F); case TargetOpcode::G_FLOG: RTLIBCASE(LOG_F); case TargetOpcode::G_FLOG2: RTLIBCASE(LOG2_F); case TargetOpcode::G_FCEIL: RTLIBCASE(CEIL_F); case TargetOpcode::G_FFLOOR: RTLIBCASE(FLOOR_F); case TargetOpcode::G_FMINNUM: RTLIBCASE(FMIN_F); case TargetOpcode::G_FMAXNUM: RTLIBCASE(FMAX_F); case TargetOpcode::G_FSQRT: RTLIBCASE(SQRT_F); case TargetOpcode::G_FRINT: RTLIBCASE(RINT_F); case TargetOpcode::G_FNEARBYINT: RTLIBCASE(NEARBYINT_F); } llvm_unreachable("Unknown libcall function"); } /// True if an instruction is in tail position in its caller. Intended for /// legalizing libcalls as tail calls when possible. static bool isLibCallInTailPosition(MachineInstr &MI) { MachineBasicBlock &MBB = *MI.getParent(); const Function &F = MBB.getParent()->getFunction(); // Conservatively require the attributes of the call to match those of // the return. Ignore NoAlias and NonNull because they don't affect the // call sequence. AttributeList CallerAttrs = F.getAttributes(); if (AttrBuilder(CallerAttrs, AttributeList::ReturnIndex) .removeAttribute(Attribute::NoAlias) .removeAttribute(Attribute::NonNull) .hasAttributes()) return false; // It's not safe to eliminate the sign / zero extension of the return value. 
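  // Illustrative sketch (assumes a caller declared to return `signext i8`):
  // the value coming back from the libcall would still need an explicit sign
  // extension before the caller's own return, so folding the libcall into a
  // tail call is rejected by the check below.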
if (CallerAttrs.hasAttribute(AttributeList::ReturnIndex, Attribute::ZExt) || CallerAttrs.hasAttribute(AttributeList::ReturnIndex, Attribute::SExt)) return false; // Only tail call if the following instruction is a standard return. auto &TII = *MI.getMF()->getSubtarget().getInstrInfo(); auto Next = next_nodbg(MI.getIterator(), MBB.instr_end()); if (Next == MBB.instr_end() || TII.isTailCall(*Next) || !Next->isReturn()) return false; return true; } LegalizerHelper::LegalizeResult llvm::createLibcall(MachineIRBuilder &MIRBuilder, const char *Name, const CallLowering::ArgInfo &Result, ArrayRef Args, const CallingConv::ID CC) { auto &CLI = *MIRBuilder.getMF().getSubtarget().getCallLowering(); CallLowering::CallLoweringInfo Info; Info.CallConv = CC; Info.Callee = MachineOperand::CreateES(Name); Info.OrigRet = Result; std::copy(Args.begin(), Args.end(), std::back_inserter(Info.OrigArgs)); if (!CLI.lowerCall(MIRBuilder, Info)) return LegalizerHelper::UnableToLegalize; return LegalizerHelper::Legalized; } LegalizerHelper::LegalizeResult llvm::createLibcall(MachineIRBuilder &MIRBuilder, RTLIB::Libcall Libcall, const CallLowering::ArgInfo &Result, ArrayRef Args) { auto &TLI = *MIRBuilder.getMF().getSubtarget().getTargetLowering(); const char *Name = TLI.getLibcallName(Libcall); const CallingConv::ID CC = TLI.getLibcallCallingConv(Libcall); return createLibcall(MIRBuilder, Name, Result, Args, CC); } // Useful for libcalls where all operands have the same type. static LegalizerHelper::LegalizeResult simpleLibcall(MachineInstr &MI, MachineIRBuilder &MIRBuilder, unsigned Size, Type *OpType) { auto Libcall = getRTLibDesc(MI.getOpcode(), Size); SmallVector Args; for (unsigned i = 1; i < MI.getNumOperands(); i++) Args.push_back({MI.getOperand(i).getReg(), OpType}); return createLibcall(MIRBuilder, Libcall, {MI.getOperand(0).getReg(), OpType}, Args); } LegalizerHelper::LegalizeResult llvm::createMemLibcall(MachineIRBuilder &MIRBuilder, MachineRegisterInfo &MRI, MachineInstr &MI) { assert(MI.getOpcode() == TargetOpcode::G_INTRINSIC_W_SIDE_EFFECTS); auto &Ctx = MIRBuilder.getMF().getFunction().getContext(); SmallVector Args; // Add all the args, except for the last which is an imm denoting 'tail'. for (unsigned i = 1; i < MI.getNumOperands() - 1; i++) { Register Reg = MI.getOperand(i).getReg(); // Need derive an IR type for call lowering. 
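    // Illustrative sketch (assumes a G_INTRINSIC_W_SIDE_EFFECTS memcpy whose
    // operands are (p0, p0, s64)): the loop below derives (i8*, i8*, i64) for
    // the arguments, and the call is then lowered as memcpy(dst, src, len)
    // with a void result, using the libcall calling convention reported by
    // TargetLowering.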
LLT OpLLT = MRI.getType(Reg); Type *OpTy = nullptr; if (OpLLT.isPointer()) OpTy = Type::getInt8PtrTy(Ctx, OpLLT.getAddressSpace()); else OpTy = IntegerType::get(Ctx, OpLLT.getSizeInBits()); Args.push_back({Reg, OpTy}); } auto &CLI = *MIRBuilder.getMF().getSubtarget().getCallLowering(); auto &TLI = *MIRBuilder.getMF().getSubtarget().getTargetLowering(); Intrinsic::ID ID = MI.getOperand(0).getIntrinsicID(); RTLIB::Libcall RTLibcall; switch (ID) { case Intrinsic::memcpy: RTLibcall = RTLIB::MEMCPY; break; case Intrinsic::memset: RTLibcall = RTLIB::MEMSET; break; case Intrinsic::memmove: RTLibcall = RTLIB::MEMMOVE; break; default: return LegalizerHelper::UnableToLegalize; } const char *Name = TLI.getLibcallName(RTLibcall); MIRBuilder.setInstrAndDebugLoc(MI); CallLowering::CallLoweringInfo Info; Info.CallConv = TLI.getLibcallCallingConv(RTLibcall); Info.Callee = MachineOperand::CreateES(Name); Info.OrigRet = CallLowering::ArgInfo({0}, Type::getVoidTy(Ctx)); Info.IsTailCall = MI.getOperand(MI.getNumOperands() - 1).getImm() == 1 && isLibCallInTailPosition(MI); std::copy(Args.begin(), Args.end(), std::back_inserter(Info.OrigArgs)); if (!CLI.lowerCall(MIRBuilder, Info)) return LegalizerHelper::UnableToLegalize; if (Info.LoweredTailCall) { assert(Info.IsTailCall && "Lowered tail call when it wasn't a tail call?"); // We must have a return following the call (or debug insts) to get past // isLibCallInTailPosition. do { MachineInstr *Next = MI.getNextNode(); assert(Next && (Next->isReturn() || Next->isDebugInstr()) && "Expected instr following MI to be return or debug inst?"); // We lowered a tail call, so the call is now the return from the block. // Delete the old return. Next->eraseFromParent(); } while (MI.getNextNode()); } return LegalizerHelper::Legalized; } static RTLIB::Libcall getConvRTLibDesc(unsigned Opcode, Type *ToType, Type *FromType) { auto ToMVT = MVT::getVT(ToType); auto FromMVT = MVT::getVT(FromType); switch (Opcode) { case TargetOpcode::G_FPEXT: return RTLIB::getFPEXT(FromMVT, ToMVT); case TargetOpcode::G_FPTRUNC: return RTLIB::getFPROUND(FromMVT, ToMVT); case TargetOpcode::G_FPTOSI: return RTLIB::getFPTOSINT(FromMVT, ToMVT); case TargetOpcode::G_FPTOUI: return RTLIB::getFPTOUINT(FromMVT, ToMVT); case TargetOpcode::G_SITOFP: return RTLIB::getSINTTOFP(FromMVT, ToMVT); case TargetOpcode::G_UITOFP: return RTLIB::getUINTTOFP(FromMVT, ToMVT); } llvm_unreachable("Unsupported libcall function"); } static LegalizerHelper::LegalizeResult conversionLibcall(MachineInstr &MI, MachineIRBuilder &MIRBuilder, Type *ToType, Type *FromType) { RTLIB::Libcall Libcall = getConvRTLibDesc(MI.getOpcode(), ToType, FromType); return createLibcall(MIRBuilder, Libcall, {MI.getOperand(0).getReg(), ToType}, {{MI.getOperand(1).getReg(), FromType}}); } LegalizerHelper::LegalizeResult LegalizerHelper::libcall(MachineInstr &MI) { LLT LLTy = MRI.getType(MI.getOperand(0).getReg()); unsigned Size = LLTy.getSizeInBits(); auto &Ctx = MIRBuilder.getMF().getFunction().getContext(); switch (MI.getOpcode()) { default: return UnableToLegalize; case TargetOpcode::G_SDIV: case TargetOpcode::G_UDIV: case TargetOpcode::G_SREM: case TargetOpcode::G_UREM: case TargetOpcode::G_CTLZ_ZERO_UNDEF: { Type *HLTy = IntegerType::get(Ctx, Size); auto Status = simpleLibcall(MI, MIRBuilder, Size, HLTy); if (Status != Legalized) return Status; break; } case TargetOpcode::G_FADD: case TargetOpcode::G_FSUB: case TargetOpcode::G_FMUL: case TargetOpcode::G_FDIV: case TargetOpcode::G_FMA: case TargetOpcode::G_FPOW: case TargetOpcode::G_FREM: case 
TargetOpcode::G_FCOS: case TargetOpcode::G_FSIN: case TargetOpcode::G_FLOG10: case TargetOpcode::G_FLOG: case TargetOpcode::G_FLOG2: case TargetOpcode::G_FEXP: case TargetOpcode::G_FEXP2: case TargetOpcode::G_FCEIL: case TargetOpcode::G_FFLOOR: case TargetOpcode::G_FMINNUM: case TargetOpcode::G_FMAXNUM: case TargetOpcode::G_FSQRT: case TargetOpcode::G_FRINT: case TargetOpcode::G_FNEARBYINT: { Type *HLTy = getFloatTypeForLLT(Ctx, LLTy); if (!HLTy || (Size != 32 && Size != 64 && Size != 128)) { LLVM_DEBUG(dbgs() << "No libcall available for size " << Size << ".\n"); return UnableToLegalize; } auto Status = simpleLibcall(MI, MIRBuilder, Size, HLTy); if (Status != Legalized) return Status; break; } case TargetOpcode::G_FPEXT: case TargetOpcode::G_FPTRUNC: { Type *FromTy = getFloatTypeForLLT(Ctx, MRI.getType(MI.getOperand(1).getReg())); Type *ToTy = getFloatTypeForLLT(Ctx, MRI.getType(MI.getOperand(0).getReg())); if (!FromTy || !ToTy) return UnableToLegalize; LegalizeResult Status = conversionLibcall(MI, MIRBuilder, ToTy, FromTy ); if (Status != Legalized) return Status; break; } case TargetOpcode::G_FPTOSI: case TargetOpcode::G_FPTOUI: { // FIXME: Support other types unsigned FromSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits(); unsigned ToSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits(); if ((ToSize != 32 && ToSize != 64) || (FromSize != 32 && FromSize != 64)) return UnableToLegalize; LegalizeResult Status = conversionLibcall( MI, MIRBuilder, ToSize == 32 ? Type::getInt32Ty(Ctx) : Type::getInt64Ty(Ctx), FromSize == 64 ? Type::getDoubleTy(Ctx) : Type::getFloatTy(Ctx)); if (Status != Legalized) return Status; break; } case TargetOpcode::G_SITOFP: case TargetOpcode::G_UITOFP: { // FIXME: Support other types unsigned FromSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits(); unsigned ToSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits(); if ((FromSize != 32 && FromSize != 64) || (ToSize != 32 && ToSize != 64)) return UnableToLegalize; LegalizeResult Status = conversionLibcall( MI, MIRBuilder, ToSize == 64 ? Type::getDoubleTy(Ctx) : Type::getFloatTy(Ctx), FromSize == 32 ? Type::getInt32Ty(Ctx) : Type::getInt64Ty(Ctx)); if (Status != Legalized) return Status; break; } } MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::narrowScalar(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { uint64_t SizeOp0 = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits(); uint64_t NarrowSize = NarrowTy.getSizeInBits(); switch (MI.getOpcode()) { default: return UnableToLegalize; case TargetOpcode::G_IMPLICIT_DEF: { Register DstReg = MI.getOperand(0).getReg(); LLT DstTy = MRI.getType(DstReg); // If SizeOp0 is not an exact multiple of NarrowSize, emit // G_ANYEXT(G_IMPLICIT_DEF). Cast result to vector if needed. // FIXME: Although this would also be legal for the general case, it causes // a lot of regressions in the emitted code (superfluous COPYs, artifact // combines not being hit). This seems to be a problem related to the // artifact combiner. 
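    // Illustrative sketch (assumes narrowing a G_IMPLICIT_DEF of s96 with
    // NarrowTy = s64): that combination takes the branch below and produces,
    // conceptually:
    //
    //   %tmp:_(s64) = G_IMPLICIT_DEF
    //   %dst:_(s96) = G_ANYEXT %tmp:_(s64)
    //
    // leaving any further cleanup to the artifact combiner, as noted in the
    // FIXME above.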
    if (SizeOp0 % NarrowSize != 0) {
      LLT ImplicitTy = NarrowTy;
      if (DstTy.isVector())
        ImplicitTy = LLT::vector(DstTy.getNumElements(), ImplicitTy);

      Register ImplicitReg = MIRBuilder.buildUndef(ImplicitTy).getReg(0);
      MIRBuilder.buildAnyExt(DstReg, ImplicitReg);

      MI.eraseFromParent();
      return Legalized;
    }

    int NumParts = SizeOp0 / NarrowSize;

    SmallVector<Register, 2> DstRegs;
    for (int i = 0; i < NumParts; ++i)
      DstRegs.push_back(MIRBuilder.buildUndef(NarrowTy).getReg(0));

    if (DstTy.isVector())
      MIRBuilder.buildBuildVector(DstReg, DstRegs);
    else
      MIRBuilder.buildMerge(DstReg, DstRegs);
    MI.eraseFromParent();
    return Legalized;
  }
  case TargetOpcode::G_CONSTANT: {
    LLT Ty = MRI.getType(MI.getOperand(0).getReg());
    const APInt &Val = MI.getOperand(1).getCImm()->getValue();
    unsigned TotalSize = Ty.getSizeInBits();
    unsigned NarrowSize = NarrowTy.getSizeInBits();
    int NumParts = TotalSize / NarrowSize;

    SmallVector<Register, 4> PartRegs;
    for (int I = 0; I != NumParts; ++I) {
      unsigned Offset = I * NarrowSize;
      auto K = MIRBuilder.buildConstant(NarrowTy,
                                        Val.lshr(Offset).trunc(NarrowSize));
      PartRegs.push_back(K.getReg(0));
    }

    LLT LeftoverTy;
    unsigned LeftoverBits = TotalSize - NumParts * NarrowSize;
    SmallVector<Register, 1> LeftoverRegs;
    if (LeftoverBits != 0) {
      LeftoverTy = LLT::scalar(LeftoverBits);
      auto K = MIRBuilder.buildConstant(
          LeftoverTy,
          Val.lshr(NumParts * NarrowSize).trunc(LeftoverBits));
      LeftoverRegs.push_back(K.getReg(0));
    }

    insertParts(MI.getOperand(0).getReg(),
                Ty, NarrowTy, PartRegs, LeftoverTy, LeftoverRegs);

    MI.eraseFromParent();
    return Legalized;
  }
  case TargetOpcode::G_SEXT:
  case TargetOpcode::G_ZEXT:
  case TargetOpcode::G_ANYEXT:
    return narrowScalarExt(MI, TypeIdx, NarrowTy);
  case TargetOpcode::G_TRUNC: {
    if (TypeIdx != 1)
      return UnableToLegalize;

    uint64_t SizeOp1 = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
    if (NarrowTy.getSizeInBits() * 2 != SizeOp1) {
      LLVM_DEBUG(dbgs() << "Can't narrow trunc to type " << NarrowTy << "\n");
      return UnableToLegalize;
    }

    auto Unmerge = MIRBuilder.buildUnmerge(NarrowTy, MI.getOperand(1));
    MIRBuilder.buildCopy(MI.getOperand(0), Unmerge.getReg(0));
    MI.eraseFromParent();
    return Legalized;
  }

  case TargetOpcode::G_FREEZE:
    return reduceOperationWidth(MI, TypeIdx, NarrowTy);

  case TargetOpcode::G_ADD: {
    // FIXME: add support for when SizeOp0 isn't an exact multiple of
    // NarrowSize.
    if (SizeOp0 % NarrowSize != 0)
      return UnableToLegalize;
    // Expand in terms of carry-setting/consuming G_ADDE instructions.
    int NumParts = SizeOp0 / NarrowTy.getSizeInBits();
    SmallVector<Register, 2> Src1Regs, Src2Regs, DstRegs;
    extractParts(MI.getOperand(1).getReg(), NarrowTy, NumParts, Src1Regs);
    extractParts(MI.getOperand(2).getReg(), NarrowTy, NumParts, Src2Regs);

    Register CarryIn;
    for (int i = 0; i < NumParts; ++i) {
      Register DstReg = MRI.createGenericVirtualRegister(NarrowTy);
      Register CarryOut = MRI.createGenericVirtualRegister(LLT::scalar(1));

      if (i == 0)
        MIRBuilder.buildUAddo(DstReg, CarryOut, Src1Regs[i], Src2Regs[i]);
      else {
        MIRBuilder.buildUAdde(DstReg, CarryOut, Src1Regs[i],
                              Src2Regs[i], CarryIn);
      }

      DstRegs.push_back(DstReg);
      CarryIn = CarryOut;
    }
    Register DstReg = MI.getOperand(0).getReg();
    if (MRI.getType(DstReg).isVector())
      MIRBuilder.buildBuildVector(DstReg, DstRegs);
    else
      MIRBuilder.buildMerge(DstReg, DstRegs);
    MI.eraseFromParent();
    return Legalized;
  }
  case TargetOpcode::G_SUB: {
    // FIXME: add support for when SizeOp0 isn't an exact multiple of
    // NarrowSize.
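    // Sketch of the expansion below, assuming a hypothetical s64 G_SUB split
    // into two s32 parts:
    //   %lo:_(s32), %b1:_(s1) = G_USUBO %a_lo, %b_lo
    //   %hi:_(s32), %b2:_(s1) = G_USUBE %a_hi, %b_hi, %b1
    //   %dst:_(s64) = G_MERGE_VALUES %lo, %hi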
if (SizeOp0 % NarrowSize != 0) return UnableToLegalize; int NumParts = SizeOp0 / NarrowTy.getSizeInBits(); SmallVector Src1Regs, Src2Regs, DstRegs; extractParts(MI.getOperand(1).getReg(), NarrowTy, NumParts, Src1Regs); extractParts(MI.getOperand(2).getReg(), NarrowTy, NumParts, Src2Regs); Register DstReg = MRI.createGenericVirtualRegister(NarrowTy); Register BorrowOut = MRI.createGenericVirtualRegister(LLT::scalar(1)); MIRBuilder.buildInstr(TargetOpcode::G_USUBO, {DstReg, BorrowOut}, {Src1Regs[0], Src2Regs[0]}); DstRegs.push_back(DstReg); Register BorrowIn = BorrowOut; for (int i = 1; i < NumParts; ++i) { DstReg = MRI.createGenericVirtualRegister(NarrowTy); BorrowOut = MRI.createGenericVirtualRegister(LLT::scalar(1)); MIRBuilder.buildInstr(TargetOpcode::G_USUBE, {DstReg, BorrowOut}, {Src1Regs[i], Src2Regs[i], BorrowIn}); DstRegs.push_back(DstReg); BorrowIn = BorrowOut; } MIRBuilder.buildMerge(MI.getOperand(0), DstRegs); MI.eraseFromParent(); return Legalized; } case TargetOpcode::G_MUL: case TargetOpcode::G_UMULH: return narrowScalarMul(MI, NarrowTy); case TargetOpcode::G_EXTRACT: return narrowScalarExtract(MI, TypeIdx, NarrowTy); case TargetOpcode::G_INSERT: return narrowScalarInsert(MI, TypeIdx, NarrowTy); case TargetOpcode::G_LOAD: { const auto &MMO = **MI.memoperands_begin(); Register DstReg = MI.getOperand(0).getReg(); LLT DstTy = MRI.getType(DstReg); if (DstTy.isVector()) return UnableToLegalize; if (8 * MMO.getSize() != DstTy.getSizeInBits()) { Register TmpReg = MRI.createGenericVirtualRegister(NarrowTy); auto &MMO = **MI.memoperands_begin(); MIRBuilder.buildLoad(TmpReg, MI.getOperand(1), MMO); MIRBuilder.buildAnyExt(DstReg, TmpReg); MI.eraseFromParent(); return Legalized; } return reduceLoadStoreWidth(MI, TypeIdx, NarrowTy); } case TargetOpcode::G_ZEXTLOAD: case TargetOpcode::G_SEXTLOAD: { bool ZExt = MI.getOpcode() == TargetOpcode::G_ZEXTLOAD; Register DstReg = MI.getOperand(0).getReg(); Register PtrReg = MI.getOperand(1).getReg(); Register TmpReg = MRI.createGenericVirtualRegister(NarrowTy); auto &MMO = **MI.memoperands_begin(); if (MMO.getSizeInBits() == NarrowSize) { MIRBuilder.buildLoad(TmpReg, PtrReg, MMO); } else { MIRBuilder.buildLoadInstr(MI.getOpcode(), TmpReg, PtrReg, MMO); } if (ZExt) MIRBuilder.buildZExt(DstReg, TmpReg); else MIRBuilder.buildSExt(DstReg, TmpReg); MI.eraseFromParent(); return Legalized; } case TargetOpcode::G_STORE: { const auto &MMO = **MI.memoperands_begin(); Register SrcReg = MI.getOperand(0).getReg(); LLT SrcTy = MRI.getType(SrcReg); if (SrcTy.isVector()) return UnableToLegalize; int NumParts = SizeOp0 / NarrowSize; unsigned HandledSize = NumParts * NarrowTy.getSizeInBits(); unsigned LeftoverBits = SrcTy.getSizeInBits() - HandledSize; if (SrcTy.isVector() && LeftoverBits != 0) return UnableToLegalize; if (8 * MMO.getSize() != SrcTy.getSizeInBits()) { Register TmpReg = MRI.createGenericVirtualRegister(NarrowTy); auto &MMO = **MI.memoperands_begin(); MIRBuilder.buildTrunc(TmpReg, SrcReg); MIRBuilder.buildStore(TmpReg, MI.getOperand(1), MMO); MI.eraseFromParent(); return Legalized; } return reduceLoadStoreWidth(MI, 0, NarrowTy); } case TargetOpcode::G_SELECT: return narrowScalarSelect(MI, TypeIdx, NarrowTy); case TargetOpcode::G_AND: case TargetOpcode::G_OR: case TargetOpcode::G_XOR: { // Legalize bitwise operation: // A = BinOp B, C // into: // B1, ..., BN = G_UNMERGE_VALUES B // C1, ..., CN = G_UNMERGE_VALUES C // A1 = BinOp B1, C2 // ... 
    // AN = BinOp<Ty/N> BN, CN
    // A = G_MERGE_VALUES A1, ..., AN
    return narrowScalarBasic(MI, TypeIdx, NarrowTy);
  }
  case TargetOpcode::G_SHL:
  case TargetOpcode::G_LSHR:
  case TargetOpcode::G_ASHR:
    return narrowScalarShift(MI, TypeIdx, NarrowTy);
  case TargetOpcode::G_CTLZ:
  case TargetOpcode::G_CTLZ_ZERO_UNDEF:
  case TargetOpcode::G_CTTZ:
  case TargetOpcode::G_CTTZ_ZERO_UNDEF:
  case TargetOpcode::G_CTPOP:
    if (TypeIdx == 1)
      switch (MI.getOpcode()) {
      case TargetOpcode::G_CTLZ:
      case TargetOpcode::G_CTLZ_ZERO_UNDEF:
        return narrowScalarCTLZ(MI, TypeIdx, NarrowTy);
      case TargetOpcode::G_CTTZ:
      case TargetOpcode::G_CTTZ_ZERO_UNDEF:
        return narrowScalarCTTZ(MI, TypeIdx, NarrowTy);
      case TargetOpcode::G_CTPOP:
        return narrowScalarCTPOP(MI, TypeIdx, NarrowTy);
      default:
        return UnableToLegalize;
      }

    Observer.changingInstr(MI);
    narrowScalarDst(MI, NarrowTy, 0, TargetOpcode::G_ZEXT);
    Observer.changedInstr(MI);
    return Legalized;
  case TargetOpcode::G_INTTOPTR:
    if (TypeIdx != 1)
      return UnableToLegalize;

    Observer.changingInstr(MI);
    narrowScalarSrc(MI, NarrowTy, 1);
    Observer.changedInstr(MI);
    return Legalized;
  case TargetOpcode::G_PTRTOINT:
    if (TypeIdx != 0)
      return UnableToLegalize;

    Observer.changingInstr(MI);
    narrowScalarDst(MI, NarrowTy, 0, TargetOpcode::G_ZEXT);
    Observer.changedInstr(MI);
    return Legalized;
  case TargetOpcode::G_PHI: {
    unsigned NumParts = SizeOp0 / NarrowSize;
    SmallVector<Register, 2> DstRegs(NumParts);
    SmallVector<SmallVector<Register, 2>, 2> SrcRegs(MI.getNumOperands() / 2);
    Observer.changingInstr(MI);
    for (unsigned i = 1; i < MI.getNumOperands(); i += 2) {
      MachineBasicBlock &OpMBB = *MI.getOperand(i + 1).getMBB();
      MIRBuilder.setInsertPt(OpMBB, OpMBB.getFirstTerminator());
      extractParts(MI.getOperand(i).getReg(), NarrowTy, NumParts,
                   SrcRegs[i / 2]);
    }
    MachineBasicBlock &MBB = *MI.getParent();
    MIRBuilder.setInsertPt(MBB, MI);
    for (unsigned i = 0; i < NumParts; ++i) {
      DstRegs[i] = MRI.createGenericVirtualRegister(NarrowTy);
      MachineInstrBuilder MIB =
          MIRBuilder.buildInstr(TargetOpcode::G_PHI).addDef(DstRegs[i]);
      for (unsigned j = 1; j < MI.getNumOperands(); j += 2)
        MIB.addUse(SrcRegs[j / 2][i]).add(MI.getOperand(j + 1));
    }
    MIRBuilder.setInsertPt(MBB, MBB.getFirstNonPHI());
    MIRBuilder.buildMerge(MI.getOperand(0), DstRegs);
    Observer.changedInstr(MI);
    MI.eraseFromParent();
    return Legalized;
  }
  case TargetOpcode::G_EXTRACT_VECTOR_ELT:
  case TargetOpcode::G_INSERT_VECTOR_ELT: {
    if (TypeIdx != 2)
      return UnableToLegalize;

    int OpIdx = MI.getOpcode() == TargetOpcode::G_EXTRACT_VECTOR_ELT ? 2 : 3;
    Observer.changingInstr(MI);
    narrowScalarSrc(MI, NarrowTy, OpIdx);
    Observer.changedInstr(MI);
    return Legalized;
  }
  case TargetOpcode::G_ICMP: {
    uint64_t SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
    if (NarrowSize * 2 != SrcSize)
      return UnableToLegalize;

    Observer.changingInstr(MI);
    Register LHSL = MRI.createGenericVirtualRegister(NarrowTy);
    Register LHSH = MRI.createGenericVirtualRegister(NarrowTy);
    MIRBuilder.buildUnmerge({LHSL, LHSH}, MI.getOperand(2));

    Register RHSL = MRI.createGenericVirtualRegister(NarrowTy);
    Register RHSH = MRI.createGenericVirtualRegister(NarrowTy);
    MIRBuilder.buildUnmerge({RHSL, RHSH}, MI.getOperand(3));

    CmpInst::Predicate Pred =
        static_cast<CmpInst::Predicate>(MI.getOperand(1).getPredicate());
    LLT ResTy = MRI.getType(MI.getOperand(0).getReg());

    if (Pred == CmpInst::ICMP_EQ || Pred == CmpInst::ICMP_NE) {
      MachineInstrBuilder XorL = MIRBuilder.buildXor(NarrowTy, LHSL, RHSL);
      MachineInstrBuilder XorH = MIRBuilder.buildXor(NarrowTy, LHSH, RHSH);
      MachineInstrBuilder Or = MIRBuilder.buildOr(NarrowTy, XorL, XorH);
      MachineInstrBuilder Zero = MIRBuilder.buildConstant(NarrowTy, 0);
      MIRBuilder.buildICmp(Pred, MI.getOperand(0), Or, Zero);
    } else {
      MachineInstrBuilder CmpH = MIRBuilder.buildICmp(Pred, ResTy, LHSH, RHSH);
      MachineInstrBuilder CmpHEQ =
          MIRBuilder.buildICmp(CmpInst::Predicate::ICMP_EQ, ResTy, LHSH, RHSH);
      MachineInstrBuilder CmpLU = MIRBuilder.buildICmp(
          ICmpInst::getUnsignedPredicate(Pred), ResTy, LHSL, RHSL);
      MIRBuilder.buildSelect(MI.getOperand(0), CmpHEQ, CmpLU, CmpH);
    }
    Observer.changedInstr(MI);
    MI.eraseFromParent();
    return Legalized;
  }
  case TargetOpcode::G_SEXT_INREG: {
    if (TypeIdx != 0)
      return UnableToLegalize;

    int64_t SizeInBits = MI.getOperand(2).getImm();

    // So long as the new type has more bits than the bits we're extending we
    // don't need to break it apart.
    if (NarrowTy.getScalarSizeInBits() >= SizeInBits) {
      Observer.changingInstr(MI);
      // We don't lose any non-extension bits by truncating the src and
      // sign-extending the dst.
      MachineOperand &MO1 = MI.getOperand(1);
      auto TruncMIB = MIRBuilder.buildTrunc(NarrowTy, MO1);
      MO1.setReg(TruncMIB.getReg(0));

      MachineOperand &MO2 = MI.getOperand(0);
      Register DstExt = MRI.createGenericVirtualRegister(NarrowTy);
      MIRBuilder.setInsertPt(MIRBuilder.getMBB(), ++MIRBuilder.getInsertPt());
      MIRBuilder.buildSExt(MO2, DstExt);
      MO2.setReg(DstExt);
      Observer.changedInstr(MI);
      return Legalized;
    }

    // Break it apart. Components below the extension point are unmodified. The
    // component containing the extension point becomes a narrower SEXT_INREG.
    // Components above it are ashr'd from the component containing the
    // extension point.
    if (SizeOp0 % NarrowSize != 0)
      return UnableToLegalize;
    int NumParts = SizeOp0 / NarrowSize;

    // List the registers where the destination will be scattered.
    SmallVector<Register, 2> DstRegs;
    // List the registers where the source will be split.
    SmallVector<Register, 2> SrcRegs;

    // Create all the temporary registers.
    for (int i = 0; i < NumParts; ++i) {
      Register SrcReg = MRI.createGenericVirtualRegister(NarrowTy);

      SrcRegs.push_back(SrcReg);
    }

    // Explode the big arguments into smaller chunks.
    MIRBuilder.buildUnmerge(SrcRegs, MI.getOperand(1));

    Register AshrCstReg =
        MIRBuilder.buildConstant(NarrowTy, NarrowTy.getScalarSizeInBits() - 1)
            .getReg(0);
    Register FullExtensionReg = 0;
    Register PartialExtensionReg = 0;

    // Do the operation on each small part.
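    // Worked example (hypothetical widths, for illustration only): narrowing
    // G_SEXT_INREG %x, 40 on s96 with NarrowTy = s32 produces
    //   part 0: %x0 unchanged                (entirely below the extension)
    //   part 1: G_SEXT_INREG %x1, 8          (contains the extension point)
    //   part 2: G_ASHR part1, 31             (sign bits only)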
for (int i = 0; i < NumParts; ++i) { if ((i + 1) * NarrowTy.getScalarSizeInBits() < SizeInBits) DstRegs.push_back(SrcRegs[i]); else if (i * NarrowTy.getScalarSizeInBits() > SizeInBits) { assert(PartialExtensionReg && "Expected to visit partial extension before full"); if (FullExtensionReg) { DstRegs.push_back(FullExtensionReg); continue; } DstRegs.push_back( MIRBuilder.buildAShr(NarrowTy, PartialExtensionReg, AshrCstReg) .getReg(0)); FullExtensionReg = DstRegs.back(); } else { DstRegs.push_back( MIRBuilder .buildInstr( TargetOpcode::G_SEXT_INREG, {NarrowTy}, {SrcRegs[i], SizeInBits % NarrowTy.getScalarSizeInBits()}) .getReg(0)); PartialExtensionReg = DstRegs.back(); } } // Gather the destination registers into the final destination. Register DstReg = MI.getOperand(0).getReg(); MIRBuilder.buildMerge(DstReg, DstRegs); MI.eraseFromParent(); return Legalized; } case TargetOpcode::G_BSWAP: case TargetOpcode::G_BITREVERSE: { if (SizeOp0 % NarrowSize != 0) return UnableToLegalize; Observer.changingInstr(MI); SmallVector SrcRegs, DstRegs; unsigned NumParts = SizeOp0 / NarrowSize; extractParts(MI.getOperand(1).getReg(), NarrowTy, NumParts, SrcRegs); for (unsigned i = 0; i < NumParts; ++i) { auto DstPart = MIRBuilder.buildInstr(MI.getOpcode(), {NarrowTy}, {SrcRegs[NumParts - 1 - i]}); DstRegs.push_back(DstPart.getReg(0)); } MIRBuilder.buildMerge(MI.getOperand(0), DstRegs); Observer.changedInstr(MI); MI.eraseFromParent(); return Legalized; } case TargetOpcode::G_PTRMASK: { if (TypeIdx != 1) return UnableToLegalize; Observer.changingInstr(MI); narrowScalarSrc(MI, NarrowTy, 2); Observer.changedInstr(MI); return Legalized; } } } Register LegalizerHelper::coerceToScalar(Register Val) { LLT Ty = MRI.getType(Val); if (Ty.isScalar()) return Val; const DataLayout &DL = MIRBuilder.getDataLayout(); LLT NewTy = LLT::scalar(Ty.getSizeInBits()); if (Ty.isPointer()) { if (DL.isNonIntegralAddressSpace(Ty.getAddressSpace())) return Register(); return MIRBuilder.buildPtrToInt(NewTy, Val).getReg(0); } Register NewVal = Val; assert(Ty.isVector()); LLT EltTy = Ty.getElementType(); if (EltTy.isPointer()) NewVal = MIRBuilder.buildPtrToInt(NewTy, NewVal).getReg(0); return MIRBuilder.buildBitcast(NewTy, NewVal).getReg(0); } void LegalizerHelper::widenScalarSrc(MachineInstr &MI, LLT WideTy, unsigned OpIdx, unsigned ExtOpcode) { MachineOperand &MO = MI.getOperand(OpIdx); auto ExtB = MIRBuilder.buildInstr(ExtOpcode, {WideTy}, {MO}); MO.setReg(ExtB.getReg(0)); } void LegalizerHelper::narrowScalarSrc(MachineInstr &MI, LLT NarrowTy, unsigned OpIdx) { MachineOperand &MO = MI.getOperand(OpIdx); auto ExtB = MIRBuilder.buildTrunc(NarrowTy, MO); MO.setReg(ExtB.getReg(0)); } void LegalizerHelper::widenScalarDst(MachineInstr &MI, LLT WideTy, unsigned OpIdx, unsigned TruncOpcode) { MachineOperand &MO = MI.getOperand(OpIdx); Register DstExt = MRI.createGenericVirtualRegister(WideTy); MIRBuilder.setInsertPt(MIRBuilder.getMBB(), ++MIRBuilder.getInsertPt()); MIRBuilder.buildInstr(TruncOpcode, {MO}, {DstExt}); MO.setReg(DstExt); } void LegalizerHelper::narrowScalarDst(MachineInstr &MI, LLT NarrowTy, unsigned OpIdx, unsigned ExtOpcode) { MachineOperand &MO = MI.getOperand(OpIdx); Register DstTrunc = MRI.createGenericVirtualRegister(NarrowTy); MIRBuilder.setInsertPt(MIRBuilder.getMBB(), ++MIRBuilder.getInsertPt()); MIRBuilder.buildInstr(ExtOpcode, {MO}, {DstTrunc}); MO.setReg(DstTrunc); } void LegalizerHelper::moreElementsVectorDst(MachineInstr &MI, LLT WideTy, unsigned OpIdx) { MachineOperand &MO = MI.getOperand(OpIdx); Register DstExt = 
MRI.createGenericVirtualRegister(WideTy); MIRBuilder.setInsertPt(MIRBuilder.getMBB(), ++MIRBuilder.getInsertPt()); MIRBuilder.buildExtract(MO, DstExt, 0); MO.setReg(DstExt); } void LegalizerHelper::moreElementsVectorSrc(MachineInstr &MI, LLT MoreTy, unsigned OpIdx) { MachineOperand &MO = MI.getOperand(OpIdx); LLT OldTy = MRI.getType(MO.getReg()); unsigned OldElts = OldTy.getNumElements(); unsigned NewElts = MoreTy.getNumElements(); unsigned NumParts = NewElts / OldElts; // Use concat_vectors if the result is a multiple of the number of elements. if (NumParts * OldElts == NewElts) { SmallVector Parts; Parts.push_back(MO.getReg()); Register ImpDef = MIRBuilder.buildUndef(OldTy).getReg(0); for (unsigned I = 1; I != NumParts; ++I) Parts.push_back(ImpDef); auto Concat = MIRBuilder.buildConcatVectors(MoreTy, Parts); MO.setReg(Concat.getReg(0)); return; } Register MoreReg = MRI.createGenericVirtualRegister(MoreTy); Register ImpDef = MIRBuilder.buildUndef(MoreTy).getReg(0); MIRBuilder.buildInsert(MoreReg, ImpDef, MO.getReg(), 0); MO.setReg(MoreReg); } void LegalizerHelper::bitcastSrc(MachineInstr &MI, LLT CastTy, unsigned OpIdx) { MachineOperand &Op = MI.getOperand(OpIdx); Op.setReg(MIRBuilder.buildBitcast(CastTy, Op).getReg(0)); } void LegalizerHelper::bitcastDst(MachineInstr &MI, LLT CastTy, unsigned OpIdx) { MachineOperand &MO = MI.getOperand(OpIdx); Register CastDst = MRI.createGenericVirtualRegister(CastTy); MIRBuilder.setInsertPt(MIRBuilder.getMBB(), ++MIRBuilder.getInsertPt()); MIRBuilder.buildBitcast(MO, CastDst); MO.setReg(CastDst); } LegalizerHelper::LegalizeResult LegalizerHelper::widenScalarMergeValues(MachineInstr &MI, unsigned TypeIdx, LLT WideTy) { if (TypeIdx != 1) return UnableToLegalize; Register DstReg = MI.getOperand(0).getReg(); LLT DstTy = MRI.getType(DstReg); if (DstTy.isVector()) return UnableToLegalize; Register Src1 = MI.getOperand(1).getReg(); LLT SrcTy = MRI.getType(Src1); const int DstSize = DstTy.getSizeInBits(); const int SrcSize = SrcTy.getSizeInBits(); const int WideSize = WideTy.getSizeInBits(); const int NumMerge = (DstSize + WideSize - 1) / WideSize; unsigned NumOps = MI.getNumOperands(); unsigned NumSrc = MI.getNumOperands() - 1; unsigned PartSize = DstTy.getSizeInBits() / NumSrc; if (WideSize >= DstSize) { // Directly pack the bits in the target type. Register ResultReg = MIRBuilder.buildZExt(WideTy, Src1).getReg(0); for (unsigned I = 2; I != NumOps; ++I) { const unsigned Offset = (I - 1) * PartSize; Register SrcReg = MI.getOperand(I).getReg(); assert(MRI.getType(SrcReg) == LLT::scalar(PartSize)); auto ZextInput = MIRBuilder.buildZExt(WideTy, SrcReg); Register NextResult = I + 1 == NumOps && WideTy == DstTy ? DstReg : MRI.createGenericVirtualRegister(WideTy); auto ShiftAmt = MIRBuilder.buildConstant(WideTy, Offset); auto Shl = MIRBuilder.buildShl(WideTy, ZextInput, ShiftAmt); MIRBuilder.buildOr(NextResult, ResultReg, Shl); ResultReg = NextResult; } if (WideSize > DstSize) MIRBuilder.buildTrunc(DstReg, ResultReg); else if (DstTy.isPointer()) MIRBuilder.buildIntToPtr(DstReg, ResultReg); MI.eraseFromParent(); return Legalized; } // Unmerge the original values to the GCD type, and recombine to the next // multiple greater than the original type. 
  //
  // %3:_(s12) = G_MERGE_VALUES %0:_(s4), %1:_(s4), %2:_(s4) -> s6
  // %4:_(s2), %5:_(s2) = G_UNMERGE_VALUES %0
  // %6:_(s2), %7:_(s2) = G_UNMERGE_VALUES %1
  // %8:_(s2), %9:_(s2) = G_UNMERGE_VALUES %2
  // %10:_(s6) = G_MERGE_VALUES %4, %5, %6
  // %11:_(s6) = G_MERGE_VALUES %7, %8, %9
  // %12:_(s12) = G_MERGE_VALUES %10, %11
  //
  // Padding with undef if necessary:
  //
  // %2:_(s8) = G_MERGE_VALUES %0:_(s4), %1:_(s4) -> s6
  // %3:_(s2), %4:_(s2) = G_UNMERGE_VALUES %0
  // %5:_(s2), %6:_(s2) = G_UNMERGE_VALUES %1
  // %7:_(s2) = G_IMPLICIT_DEF
  // %8:_(s6) = G_MERGE_VALUES %3, %4, %5
  // %9:_(s6) = G_MERGE_VALUES %6, %7, %7
  // %10:_(s12) = G_MERGE_VALUES %8, %9

  const int GCD = greatestCommonDivisor(SrcSize, WideSize);
  LLT GCDTy = LLT::scalar(GCD);

  SmallVector<Register, 8> Parts;
  SmallVector<Register, 8> NewMergeRegs;
  SmallVector<Register, 8> Unmerges;
  LLT WideDstTy = LLT::scalar(NumMerge * WideSize);

  // Decompose the original operands if they don't evenly divide.
  for (int I = 1, E = MI.getNumOperands(); I != E; ++I) {
    Register SrcReg = MI.getOperand(I).getReg();
    if (GCD == SrcSize) {
      Unmerges.push_back(SrcReg);
    } else {
      auto Unmerge = MIRBuilder.buildUnmerge(GCDTy, SrcReg);
      for (int J = 0, JE = Unmerge->getNumOperands() - 1; J != JE; ++J)
        Unmerges.push_back(Unmerge.getReg(J));
    }
  }

  // Pad with undef to the next size that is a multiple of the requested size.
  if (static_cast<int>(Unmerges.size()) != NumMerge * WideSize) {
    Register UndefReg = MIRBuilder.buildUndef(GCDTy).getReg(0);
    for (int I = Unmerges.size(); I != NumMerge * WideSize; ++I)
      Unmerges.push_back(UndefReg);
  }

  const int PartsPerGCD = WideSize / GCD;

  // Build merges of each piece.
  ArrayRef<Register> Slicer(Unmerges);
  for (int I = 0; I != NumMerge; ++I, Slicer = Slicer.drop_front(PartsPerGCD)) {
    auto Merge = MIRBuilder.buildMerge(WideTy, Slicer.take_front(PartsPerGCD));
    NewMergeRegs.push_back(Merge.getReg(0));
  }

  // A truncate may be necessary if the requested type doesn't evenly divide
  // the original result type.
  if (DstTy.getSizeInBits() == WideDstTy.getSizeInBits()) {
    MIRBuilder.buildMerge(DstReg, NewMergeRegs);
  } else {
    auto FinalMerge = MIRBuilder.buildMerge(WideDstTy, NewMergeRegs);
    MIRBuilder.buildTrunc(DstReg, FinalMerge.getReg(0));
  }

  MI.eraseFromParent();
  return Legalized;
}

LegalizerHelper::LegalizeResult
LegalizerHelper::widenScalarUnmergeValues(MachineInstr &MI, unsigned TypeIdx,
                                          LLT WideTy) {
  if (TypeIdx != 0)
    return UnableToLegalize;

  int NumDst = MI.getNumOperands() - 1;
  Register SrcReg = MI.getOperand(NumDst).getReg();
  LLT SrcTy = MRI.getType(SrcReg);
  if (SrcTy.isVector())
    return UnableToLegalize;

  Register Dst0Reg = MI.getOperand(0).getReg();
  LLT DstTy = MRI.getType(Dst0Reg);
  if (!DstTy.isScalar())
    return UnableToLegalize;

  if (WideTy.getSizeInBits() >= SrcTy.getSizeInBits()) {
    if (SrcTy.isPointer()) {
      const DataLayout &DL = MIRBuilder.getDataLayout();
      if (DL.isNonIntegralAddressSpace(SrcTy.getAddressSpace())) {
        LLVM_DEBUG(
            dbgs() << "Not casting non-integral address space integer\n");
        return UnableToLegalize;
      }

      SrcTy = LLT::scalar(SrcTy.getSizeInBits());
      SrcReg = MIRBuilder.buildPtrToInt(SrcTy, SrcReg).getReg(0);
    }

    // Widen SrcTy to WideTy. This does not affect the result, but since the
    // user requested this size, it is probably better handled than SrcTy and
    // should reduce the total number of legalization artifacts
    if (WideTy.getSizeInBits() > SrcTy.getSizeInBits()) {
      SrcTy = WideTy;
      SrcReg = MIRBuilder.buildAnyExt(WideTy, SrcReg).getReg(0);
    }

    // There's no unmerge type to target.
Directly extract the bits from the // source type unsigned DstSize = DstTy.getSizeInBits(); MIRBuilder.buildTrunc(Dst0Reg, SrcReg); for (int I = 1; I != NumDst; ++I) { auto ShiftAmt = MIRBuilder.buildConstant(SrcTy, DstSize * I); auto Shr = MIRBuilder.buildLShr(SrcTy, SrcReg, ShiftAmt); MIRBuilder.buildTrunc(MI.getOperand(I), Shr); } MI.eraseFromParent(); return Legalized; } // Extend the source to a wider type. LLT LCMTy = getLCMType(SrcTy, WideTy); Register WideSrc = SrcReg; if (LCMTy.getSizeInBits() != SrcTy.getSizeInBits()) { // TODO: If this is an integral address space, cast to integer and anyext. if (SrcTy.isPointer()) { LLVM_DEBUG(dbgs() << "Widening pointer source types not implemented\n"); return UnableToLegalize; } WideSrc = MIRBuilder.buildAnyExt(LCMTy, WideSrc).getReg(0); } auto Unmerge = MIRBuilder.buildUnmerge(WideTy, WideSrc); // Create a sequence of unmerges to the original results. since we may have // widened the source, we will need to pad the results with dead defs to cover // the source register. // e.g. widen s16 to s32: // %1:_(s16), %2:_(s16), %3:_(s16) = G_UNMERGE_VALUES %0:_(s48) // // => // %4:_(s64) = G_ANYEXT %0:_(s48) // %5:_(s32), %6:_(s32) = G_UNMERGE_VALUES %4 ; Requested unmerge // %1:_(s16), %2:_(s16) = G_UNMERGE_VALUES %5 ; unpack to original regs // %3:_(s16), dead %7 = G_UNMERGE_VALUES %6 ; original reg + extra dead def const int NumUnmerge = Unmerge->getNumOperands() - 1; const int PartsPerUnmerge = WideTy.getSizeInBits() / DstTy.getSizeInBits(); for (int I = 0; I != NumUnmerge; ++I) { auto MIB = MIRBuilder.buildInstr(TargetOpcode::G_UNMERGE_VALUES); for (int J = 0; J != PartsPerUnmerge; ++J) { int Idx = I * PartsPerUnmerge + J; if (Idx < NumDst) MIB.addDef(MI.getOperand(Idx).getReg()); else { // Create dead def for excess components. MIB.addDef(MRI.createGenericVirtualRegister(DstTy)); } } MIB.addUse(Unmerge.getReg(I)); } MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::widenScalarExtract(MachineInstr &MI, unsigned TypeIdx, LLT WideTy) { Register DstReg = MI.getOperand(0).getReg(); Register SrcReg = MI.getOperand(1).getReg(); LLT SrcTy = MRI.getType(SrcReg); LLT DstTy = MRI.getType(DstReg); unsigned Offset = MI.getOperand(2).getImm(); if (TypeIdx == 0) { if (SrcTy.isVector() || DstTy.isVector()) return UnableToLegalize; SrcOp Src(SrcReg); if (SrcTy.isPointer()) { // Extracts from pointers can be handled only if they are really just // simple integers. const DataLayout &DL = MIRBuilder.getDataLayout(); if (DL.isNonIntegralAddressSpace(SrcTy.getAddressSpace())) return UnableToLegalize; LLT SrcAsIntTy = LLT::scalar(SrcTy.getSizeInBits()); Src = MIRBuilder.buildPtrToInt(SrcAsIntTy, Src); SrcTy = SrcAsIntTy; } if (DstTy.isPointer()) return UnableToLegalize; if (Offset == 0) { // Avoid a shift in the degenerate case. MIRBuilder.buildTrunc(DstReg, MIRBuilder.buildAnyExtOrTrunc(WideTy, Src)); MI.eraseFromParent(); return Legalized; } // Do a shift in the source type. 
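    // E.g. (hypothetical types): extracting s16 at offset 32 from an s64
    // source with WideTy = s64 becomes
    //   %shift:_(s64) = G_LSHR %src, 32
    //   %dst:_(s16) = G_TRUNC %shift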
LLT ShiftTy = SrcTy; if (WideTy.getSizeInBits() > SrcTy.getSizeInBits()) { Src = MIRBuilder.buildAnyExt(WideTy, Src); ShiftTy = WideTy; } else if (WideTy.getSizeInBits() > SrcTy.getSizeInBits()) return UnableToLegalize; auto LShr = MIRBuilder.buildLShr( ShiftTy, Src, MIRBuilder.buildConstant(ShiftTy, Offset)); MIRBuilder.buildTrunc(DstReg, LShr); MI.eraseFromParent(); return Legalized; } if (SrcTy.isScalar()) { Observer.changingInstr(MI); widenScalarSrc(MI, WideTy, 1, TargetOpcode::G_ANYEXT); Observer.changedInstr(MI); return Legalized; } if (!SrcTy.isVector()) return UnableToLegalize; if (DstTy != SrcTy.getElementType()) return UnableToLegalize; if (Offset % SrcTy.getScalarSizeInBits() != 0) return UnableToLegalize; Observer.changingInstr(MI); widenScalarSrc(MI, WideTy, 1, TargetOpcode::G_ANYEXT); MI.getOperand(2).setImm((WideTy.getSizeInBits() / SrcTy.getSizeInBits()) * Offset); widenScalarDst(MI, WideTy.getScalarType(), 0); Observer.changedInstr(MI); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::widenScalarInsert(MachineInstr &MI, unsigned TypeIdx, LLT WideTy) { if (TypeIdx != 0) return UnableToLegalize; Observer.changingInstr(MI); widenScalarSrc(MI, WideTy, 1, TargetOpcode::G_ANYEXT); widenScalarDst(MI, WideTy); Observer.changedInstr(MI); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::widenScalarAddSubSat(MachineInstr &MI, unsigned TypeIdx, LLT WideTy) { bool IsSigned = MI.getOpcode() == TargetOpcode::G_SADDSAT || MI.getOpcode() == TargetOpcode::G_SSUBSAT; // We can convert this to: // 1. Any extend iN to iM // 2. SHL by M-N // 3. [US][ADD|SUB]SAT // 4. L/ASHR by M-N // // It may be more efficient to lower this to a min and a max operation in // the higher precision arithmetic if the promoted operation isn't legal, // but this decision is up to the target's lowering request. Register DstReg = MI.getOperand(0).getReg(); unsigned NewBits = WideTy.getScalarSizeInBits(); unsigned SHLAmount = NewBits - MRI.getType(DstReg).getScalarSizeInBits(); auto LHS = MIRBuilder.buildAnyExt(WideTy, MI.getOperand(1)); auto RHS = MIRBuilder.buildAnyExt(WideTy, MI.getOperand(2)); auto ShiftK = MIRBuilder.buildConstant(WideTy, SHLAmount); auto ShiftL = MIRBuilder.buildShl(WideTy, LHS, ShiftK); auto ShiftR = MIRBuilder.buildShl(WideTy, RHS, ShiftK); auto WideInst = MIRBuilder.buildInstr(MI.getOpcode(), {WideTy}, {ShiftL, ShiftR}, MI.getFlags()); // Use a shift that will preserve the number of sign bits when the trunc is // folded away. auto Result = IsSigned ? MIRBuilder.buildAShr(WideTy, WideInst, ShiftK) : MIRBuilder.buildLShr(WideTy, WideInst, ShiftK); MIRBuilder.buildTrunc(DstReg, Result); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::widenScalar(MachineInstr &MI, unsigned TypeIdx, LLT WideTy) { switch (MI.getOpcode()) { default: return UnableToLegalize; case TargetOpcode::G_EXTRACT: return widenScalarExtract(MI, TypeIdx, WideTy); case TargetOpcode::G_INSERT: return widenScalarInsert(MI, TypeIdx, WideTy); case TargetOpcode::G_MERGE_VALUES: return widenScalarMergeValues(MI, TypeIdx, WideTy); case TargetOpcode::G_UNMERGE_VALUES: return widenScalarUnmergeValues(MI, TypeIdx, WideTy); case TargetOpcode::G_UADDO: case TargetOpcode::G_USUBO: { if (TypeIdx == 1) return UnableToLegalize; // TODO auto LHSZext = MIRBuilder.buildZExt(WideTy, MI.getOperand(2)); auto RHSZext = MIRBuilder.buildZExt(WideTy, MI.getOperand(3)); unsigned Opcode = MI.getOpcode() == TargetOpcode::G_UADDO ? 
TargetOpcode::G_ADD : TargetOpcode::G_SUB; // Do the arithmetic in the larger type. auto NewOp = MIRBuilder.buildInstr(Opcode, {WideTy}, {LHSZext, RHSZext}); LLT OrigTy = MRI.getType(MI.getOperand(0).getReg()); APInt Mask = APInt::getLowBitsSet(WideTy.getSizeInBits(), OrigTy.getSizeInBits()); auto AndOp = MIRBuilder.buildAnd( WideTy, NewOp, MIRBuilder.buildConstant(WideTy, Mask)); // There is no overflow if the AndOp is the same as NewOp. MIRBuilder.buildICmp(CmpInst::ICMP_NE, MI.getOperand(1), NewOp, AndOp); // Now trunc the NewOp to the original result. MIRBuilder.buildTrunc(MI.getOperand(0), NewOp); MI.eraseFromParent(); return Legalized; } case TargetOpcode::G_SADDSAT: case TargetOpcode::G_SSUBSAT: case TargetOpcode::G_UADDSAT: case TargetOpcode::G_USUBSAT: return widenScalarAddSubSat(MI, TypeIdx, WideTy); case TargetOpcode::G_CTTZ: case TargetOpcode::G_CTTZ_ZERO_UNDEF: case TargetOpcode::G_CTLZ: case TargetOpcode::G_CTLZ_ZERO_UNDEF: case TargetOpcode::G_CTPOP: { if (TypeIdx == 0) { Observer.changingInstr(MI); widenScalarDst(MI, WideTy, 0); Observer.changedInstr(MI); return Legalized; } Register SrcReg = MI.getOperand(1).getReg(); // First ZEXT the input. auto MIBSrc = MIRBuilder.buildZExt(WideTy, SrcReg); LLT CurTy = MRI.getType(SrcReg); if (MI.getOpcode() == TargetOpcode::G_CTTZ) { // The count is the same in the larger type except if the original // value was zero. This can be handled by setting the bit just off // the top of the original type. auto TopBit = APInt::getOneBitSet(WideTy.getSizeInBits(), CurTy.getSizeInBits()); MIBSrc = MIRBuilder.buildOr( WideTy, MIBSrc, MIRBuilder.buildConstant(WideTy, TopBit)); } // Perform the operation at the larger size. auto MIBNewOp = MIRBuilder.buildInstr(MI.getOpcode(), {WideTy}, {MIBSrc}); // This is already the correct result for CTPOP and CTTZs if (MI.getOpcode() == TargetOpcode::G_CTLZ || MI.getOpcode() == TargetOpcode::G_CTLZ_ZERO_UNDEF) { // The correct result is NewOp - (Difference in widety and current ty). 
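      // E.g. an s16 G_CTLZ widened to s32: the zero-extended input has 16
      // extra leading zeros, so subtract 32 - 16 = 16 from the wide count.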
unsigned SizeDiff = WideTy.getSizeInBits() - CurTy.getSizeInBits(); MIBNewOp = MIRBuilder.buildSub( WideTy, MIBNewOp, MIRBuilder.buildConstant(WideTy, SizeDiff)); } MIRBuilder.buildZExtOrTrunc(MI.getOperand(0), MIBNewOp); MI.eraseFromParent(); return Legalized; } case TargetOpcode::G_BSWAP: { Observer.changingInstr(MI); Register DstReg = MI.getOperand(0).getReg(); Register ShrReg = MRI.createGenericVirtualRegister(WideTy); Register DstExt = MRI.createGenericVirtualRegister(WideTy); Register ShiftAmtReg = MRI.createGenericVirtualRegister(WideTy); widenScalarSrc(MI, WideTy, 1, TargetOpcode::G_ANYEXT); MI.getOperand(0).setReg(DstExt); MIRBuilder.setInsertPt(MIRBuilder.getMBB(), ++MIRBuilder.getInsertPt()); LLT Ty = MRI.getType(DstReg); unsigned DiffBits = WideTy.getScalarSizeInBits() - Ty.getScalarSizeInBits(); MIRBuilder.buildConstant(ShiftAmtReg, DiffBits); MIRBuilder.buildLShr(ShrReg, DstExt, ShiftAmtReg); MIRBuilder.buildTrunc(DstReg, ShrReg); Observer.changedInstr(MI); return Legalized; } case TargetOpcode::G_BITREVERSE: { Observer.changingInstr(MI); Register DstReg = MI.getOperand(0).getReg(); LLT Ty = MRI.getType(DstReg); unsigned DiffBits = WideTy.getScalarSizeInBits() - Ty.getScalarSizeInBits(); Register DstExt = MRI.createGenericVirtualRegister(WideTy); widenScalarSrc(MI, WideTy, 1, TargetOpcode::G_ANYEXT); MI.getOperand(0).setReg(DstExt); MIRBuilder.setInsertPt(MIRBuilder.getMBB(), ++MIRBuilder.getInsertPt()); auto ShiftAmt = MIRBuilder.buildConstant(WideTy, DiffBits); auto Shift = MIRBuilder.buildLShr(WideTy, DstExt, ShiftAmt); MIRBuilder.buildTrunc(DstReg, Shift); Observer.changedInstr(MI); return Legalized; } case TargetOpcode::G_FREEZE: Observer.changingInstr(MI); widenScalarSrc(MI, WideTy, 1, TargetOpcode::G_ANYEXT); widenScalarDst(MI, WideTy); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_ADD: case TargetOpcode::G_AND: case TargetOpcode::G_MUL: case TargetOpcode::G_OR: case TargetOpcode::G_XOR: case TargetOpcode::G_SUB: // Perform operation at larger width (any extension is fines here, high bits // don't affect the result) and then truncate the result back to the // original type. Observer.changingInstr(MI); widenScalarSrc(MI, WideTy, 1, TargetOpcode::G_ANYEXT); widenScalarSrc(MI, WideTy, 2, TargetOpcode::G_ANYEXT); widenScalarDst(MI, WideTy); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_SHL: Observer.changingInstr(MI); if (TypeIdx == 0) { widenScalarSrc(MI, WideTy, 1, TargetOpcode::G_ANYEXT); widenScalarDst(MI, WideTy); } else { assert(TypeIdx == 1); // The "number of bits to shift" operand must preserve its value as an // unsigned integer: widenScalarSrc(MI, WideTy, 2, TargetOpcode::G_ZEXT); } Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_SDIV: case TargetOpcode::G_SREM: case TargetOpcode::G_SMIN: case TargetOpcode::G_SMAX: Observer.changingInstr(MI); widenScalarSrc(MI, WideTy, 1, TargetOpcode::G_SEXT); widenScalarSrc(MI, WideTy, 2, TargetOpcode::G_SEXT); widenScalarDst(MI, WideTy); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_ASHR: case TargetOpcode::G_LSHR: Observer.changingInstr(MI); if (TypeIdx == 0) { unsigned CvtOp = MI.getOpcode() == TargetOpcode::G_ASHR ? 
TargetOpcode::G_SEXT : TargetOpcode::G_ZEXT; widenScalarSrc(MI, WideTy, 1, CvtOp); widenScalarDst(MI, WideTy); } else { assert(TypeIdx == 1); // The "number of bits to shift" operand must preserve its value as an // unsigned integer: widenScalarSrc(MI, WideTy, 2, TargetOpcode::G_ZEXT); } Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_UDIV: case TargetOpcode::G_UREM: case TargetOpcode::G_UMIN: case TargetOpcode::G_UMAX: Observer.changingInstr(MI); widenScalarSrc(MI, WideTy, 1, TargetOpcode::G_ZEXT); widenScalarSrc(MI, WideTy, 2, TargetOpcode::G_ZEXT); widenScalarDst(MI, WideTy); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_SELECT: Observer.changingInstr(MI); if (TypeIdx == 0) { // Perform operation at larger width (any extension is fine here, high // bits don't affect the result) and then truncate the result back to the // original type. widenScalarSrc(MI, WideTy, 2, TargetOpcode::G_ANYEXT); widenScalarSrc(MI, WideTy, 3, TargetOpcode::G_ANYEXT); widenScalarDst(MI, WideTy); } else { bool IsVec = MRI.getType(MI.getOperand(1).getReg()).isVector(); // Explicit extension is required here since high bits affect the result. widenScalarSrc(MI, WideTy, 1, MIRBuilder.getBoolExtOp(IsVec, false)); } Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_FPTOSI: case TargetOpcode::G_FPTOUI: Observer.changingInstr(MI); if (TypeIdx == 0) widenScalarDst(MI, WideTy); else widenScalarSrc(MI, WideTy, 1, TargetOpcode::G_FPEXT); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_SITOFP: if (TypeIdx != 1) return UnableToLegalize; Observer.changingInstr(MI); widenScalarSrc(MI, WideTy, 1, TargetOpcode::G_SEXT); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_UITOFP: if (TypeIdx != 1) return UnableToLegalize; Observer.changingInstr(MI); widenScalarSrc(MI, WideTy, 1, TargetOpcode::G_ZEXT); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_LOAD: case TargetOpcode::G_SEXTLOAD: case TargetOpcode::G_ZEXTLOAD: Observer.changingInstr(MI); widenScalarDst(MI, WideTy); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_STORE: { if (TypeIdx != 0) return UnableToLegalize; LLT Ty = MRI.getType(MI.getOperand(0).getReg()); if (!isPowerOf2_32(Ty.getSizeInBits())) return UnableToLegalize; Observer.changingInstr(MI); unsigned ExtType = Ty.getScalarSizeInBits() == 1 ? TargetOpcode::G_ZEXT : TargetOpcode::G_ANYEXT; widenScalarSrc(MI, WideTy, 0, ExtType); Observer.changedInstr(MI); return Legalized; } case TargetOpcode::G_CONSTANT: { MachineOperand &SrcMO = MI.getOperand(1); LLVMContext &Ctx = MIRBuilder.getMF().getFunction().getContext(); unsigned ExtOpc = LI.getExtOpcodeForWideningConstant( MRI.getType(MI.getOperand(0).getReg())); assert((ExtOpc == TargetOpcode::G_ZEXT || ExtOpc == TargetOpcode::G_SEXT || ExtOpc == TargetOpcode::G_ANYEXT) && "Illegal Extend"); const APInt &SrcVal = SrcMO.getCImm()->getValue(); const APInt &Val = (ExtOpc == TargetOpcode::G_SEXT) ? 
SrcVal.sext(WideTy.getSizeInBits()) : SrcVal.zext(WideTy.getSizeInBits()); Observer.changingInstr(MI); SrcMO.setCImm(ConstantInt::get(Ctx, Val)); widenScalarDst(MI, WideTy); Observer.changedInstr(MI); return Legalized; } case TargetOpcode::G_FCONSTANT: { MachineOperand &SrcMO = MI.getOperand(1); LLVMContext &Ctx = MIRBuilder.getMF().getFunction().getContext(); APFloat Val = SrcMO.getFPImm()->getValueAPF(); bool LosesInfo; switch (WideTy.getSizeInBits()) { case 32: Val.convert(APFloat::IEEEsingle(), APFloat::rmNearestTiesToEven, &LosesInfo); break; case 64: Val.convert(APFloat::IEEEdouble(), APFloat::rmNearestTiesToEven, &LosesInfo); break; default: return UnableToLegalize; } assert(!LosesInfo && "extend should always be lossless"); Observer.changingInstr(MI); SrcMO.setFPImm(ConstantFP::get(Ctx, Val)); widenScalarDst(MI, WideTy, 0, TargetOpcode::G_FPTRUNC); Observer.changedInstr(MI); return Legalized; } case TargetOpcode::G_IMPLICIT_DEF: { Observer.changingInstr(MI); widenScalarDst(MI, WideTy); Observer.changedInstr(MI); return Legalized; } case TargetOpcode::G_BRCOND: Observer.changingInstr(MI); widenScalarSrc(MI, WideTy, 0, MIRBuilder.getBoolExtOp(false, false)); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_FCMP: Observer.changingInstr(MI); if (TypeIdx == 0) widenScalarDst(MI, WideTy); else { widenScalarSrc(MI, WideTy, 2, TargetOpcode::G_FPEXT); widenScalarSrc(MI, WideTy, 3, TargetOpcode::G_FPEXT); } Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_ICMP: Observer.changingInstr(MI); if (TypeIdx == 0) widenScalarDst(MI, WideTy); else { unsigned ExtOpcode = CmpInst::isSigned(static_cast( MI.getOperand(1).getPredicate())) ? TargetOpcode::G_SEXT : TargetOpcode::G_ZEXT; widenScalarSrc(MI, WideTy, 2, ExtOpcode); widenScalarSrc(MI, WideTy, 3, ExtOpcode); } Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_PTR_ADD: assert(TypeIdx == 1 && "unable to legalize pointer of G_PTR_ADD"); Observer.changingInstr(MI); widenScalarSrc(MI, WideTy, 2, TargetOpcode::G_SEXT); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_PHI: { assert(TypeIdx == 0 && "Expecting only Idx 0"); Observer.changingInstr(MI); for (unsigned I = 1; I < MI.getNumOperands(); I += 2) { MachineBasicBlock &OpMBB = *MI.getOperand(I + 1).getMBB(); MIRBuilder.setInsertPt(OpMBB, OpMBB.getFirstTerminator()); widenScalarSrc(MI, WideTy, I, TargetOpcode::G_ANYEXT); } MachineBasicBlock &MBB = *MI.getParent(); MIRBuilder.setInsertPt(MBB, --MBB.getFirstNonPHI()); widenScalarDst(MI, WideTy); Observer.changedInstr(MI); return Legalized; } case TargetOpcode::G_EXTRACT_VECTOR_ELT: { if (TypeIdx == 0) { Register VecReg = MI.getOperand(1).getReg(); LLT VecTy = MRI.getType(VecReg); Observer.changingInstr(MI); widenScalarSrc(MI, LLT::vector(VecTy.getNumElements(), WideTy.getSizeInBits()), 1, TargetOpcode::G_SEXT); widenScalarDst(MI, WideTy, 0); Observer.changedInstr(MI); return Legalized; } if (TypeIdx != 2) return UnableToLegalize; Observer.changingInstr(MI); // TODO: Probably should be zext widenScalarSrc(MI, WideTy, 2, TargetOpcode::G_SEXT); Observer.changedInstr(MI); return Legalized; } case TargetOpcode::G_INSERT_VECTOR_ELT: { if (TypeIdx == 1) { Observer.changingInstr(MI); Register VecReg = MI.getOperand(1).getReg(); LLT VecTy = MRI.getType(VecReg); LLT WideVecTy = LLT::vector(VecTy.getNumElements(), WideTy); widenScalarSrc(MI, WideVecTy, 1, TargetOpcode::G_ANYEXT); widenScalarSrc(MI, WideTy, 2, TargetOpcode::G_ANYEXT); widenScalarDst(MI, WideVecTy, 0); Observer.changedInstr(MI); 
return Legalized; } if (TypeIdx == 2) { Observer.changingInstr(MI); // TODO: Probably should be zext widenScalarSrc(MI, WideTy, 3, TargetOpcode::G_SEXT); Observer.changedInstr(MI); return Legalized; } return UnableToLegalize; } case TargetOpcode::G_FADD: case TargetOpcode::G_FMUL: case TargetOpcode::G_FSUB: case TargetOpcode::G_FMA: case TargetOpcode::G_FMAD: case TargetOpcode::G_FNEG: case TargetOpcode::G_FABS: case TargetOpcode::G_FCANONICALIZE: case TargetOpcode::G_FMINNUM: case TargetOpcode::G_FMAXNUM: case TargetOpcode::G_FMINNUM_IEEE: case TargetOpcode::G_FMAXNUM_IEEE: case TargetOpcode::G_FMINIMUM: case TargetOpcode::G_FMAXIMUM: case TargetOpcode::G_FDIV: case TargetOpcode::G_FREM: case TargetOpcode::G_FCEIL: case TargetOpcode::G_FFLOOR: case TargetOpcode::G_FCOS: case TargetOpcode::G_FSIN: case TargetOpcode::G_FLOG10: case TargetOpcode::G_FLOG: case TargetOpcode::G_FLOG2: case TargetOpcode::G_FRINT: case TargetOpcode::G_FNEARBYINT: case TargetOpcode::G_FSQRT: case TargetOpcode::G_FEXP: case TargetOpcode::G_FEXP2: case TargetOpcode::G_FPOW: case TargetOpcode::G_INTRINSIC_TRUNC: case TargetOpcode::G_INTRINSIC_ROUND: assert(TypeIdx == 0); Observer.changingInstr(MI); for (unsigned I = 1, E = MI.getNumOperands(); I != E; ++I) widenScalarSrc(MI, WideTy, I, TargetOpcode::G_FPEXT); widenScalarDst(MI, WideTy, 0, TargetOpcode::G_FPTRUNC); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_INTTOPTR: if (TypeIdx != 1) return UnableToLegalize; Observer.changingInstr(MI); widenScalarSrc(MI, WideTy, 1, TargetOpcode::G_ZEXT); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_PTRTOINT: if (TypeIdx != 0) return UnableToLegalize; Observer.changingInstr(MI); widenScalarDst(MI, WideTy, 0); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_BUILD_VECTOR: { Observer.changingInstr(MI); const LLT WideEltTy = TypeIdx == 1 ? WideTy : WideTy.getElementType(); for (int I = 1, E = MI.getNumOperands(); I != E; ++I) widenScalarSrc(MI, WideEltTy, I, TargetOpcode::G_ANYEXT); // Avoid changing the result vector type if the source element type was // requested. 
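    // Illustrative sketch (hypothetical types): if only the element type was
    // requested to widen, e.g. s8 -> s16 for
    //   %v:_(<4 x s8>) = G_BUILD_VECTOR %a:_(s8), ...
    // the operands are anyextended to s16 and the opcode is switched to
    // G_BUILD_VECTOR_TRUNC so the <4 x s8> result type is preserved.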
    if (TypeIdx == 1) {
      auto &TII = *MI.getMF()->getSubtarget().getInstrInfo();
      MI.setDesc(TII.get(TargetOpcode::G_BUILD_VECTOR_TRUNC));
    } else {
      widenScalarDst(MI, WideTy, 0);
    }

    Observer.changedInstr(MI);
    return Legalized;
  }
  case TargetOpcode::G_SEXT_INREG:
    if (TypeIdx != 0)
      return UnableToLegalize;

    Observer.changingInstr(MI);
    widenScalarSrc(MI, WideTy, 1, TargetOpcode::G_ANYEXT);
    widenScalarDst(MI, WideTy, 0, TargetOpcode::G_TRUNC);
    Observer.changedInstr(MI);
    return Legalized;
  case TargetOpcode::G_PTRMASK: {
    if (TypeIdx != 1)
      return UnableToLegalize;
    Observer.changingInstr(MI);
    widenScalarSrc(MI, WideTy, 2, TargetOpcode::G_ZEXT);
    Observer.changedInstr(MI);
    return Legalized;
  }
  }
}

static void getUnmergePieces(SmallVectorImpl<Register> &Pieces,
                             MachineIRBuilder &B, Register Src, LLT Ty) {
  auto Unmerge = B.buildUnmerge(Ty, Src);
  for (int I = 0, E = Unmerge->getNumOperands() - 1; I != E; ++I)
    Pieces.push_back(Unmerge.getReg(I));
}

LegalizerHelper::LegalizeResult
LegalizerHelper::lowerBitcast(MachineInstr &MI) {
  Register Dst = MI.getOperand(0).getReg();
  Register Src = MI.getOperand(1).getReg();
  LLT DstTy = MRI.getType(Dst);
  LLT SrcTy = MRI.getType(Src);

  if (SrcTy.isVector()) {
    LLT SrcEltTy = SrcTy.getElementType();
    SmallVector<Register, 8> SrcRegs;

    if (DstTy.isVector()) {
      int NumDstElt = DstTy.getNumElements();
      int NumSrcElt = SrcTy.getNumElements();

      LLT DstEltTy = DstTy.getElementType();
      LLT DstCastTy = DstEltTy; // Intermediate bitcast result type
      LLT SrcPartTy = SrcEltTy; // Original unmerge result type.

      // If there's an element size mismatch, insert intermediate casts to
      // match the result element type.
      if (NumSrcElt < NumDstElt) { // Source element type is larger.
        // %1:_(<4 x s8>) = G_BITCAST %0:_(<2 x s16>)
        //
        // =>
        //
        // %2:_(s16), %3:_(s16) = G_UNMERGE_VALUES %0
        // %3:_(<2 x s8>) = G_BITCAST %2
        // %4:_(<2 x s8>) = G_BITCAST %3
        // %1:_(<4 x s8>) = G_CONCAT_VECTORS %3, %4
        DstCastTy = LLT::vector(NumDstElt / NumSrcElt, DstEltTy);
        SrcPartTy = SrcEltTy;
      } else if (NumSrcElt > NumDstElt) { // Source element type is smaller.
// // %1:_(<2 x s16>) = G_BITCAST %0:_(<4 x s8>) // // => // // %2:_(<2 x s8>), %3:_(<2 x s8>) = G_UNMERGE_VALUES %0 // %3:_(s16) = G_BITCAST %2 // %4:_(s16) = G_BITCAST %3 // %1:_(<2 x s16>) = G_BUILD_VECTOR %3, %4 SrcPartTy = LLT::vector(NumSrcElt / NumDstElt, SrcEltTy); DstCastTy = DstEltTy; } getUnmergePieces(SrcRegs, MIRBuilder, Src, SrcPartTy); for (Register &SrcReg : SrcRegs) SrcReg = MIRBuilder.buildBitcast(DstCastTy, SrcReg).getReg(0); } else getUnmergePieces(SrcRegs, MIRBuilder, Src, SrcEltTy); MIRBuilder.buildMerge(Dst, SrcRegs); MI.eraseFromParent(); return Legalized; } if (DstTy.isVector()) { SmallVector SrcRegs; getUnmergePieces(SrcRegs, MIRBuilder, Src, DstTy.getElementType()); MIRBuilder.buildMerge(Dst, SrcRegs); MI.eraseFromParent(); return Legalized; } return UnableToLegalize; } LegalizerHelper::LegalizeResult LegalizerHelper::bitcast(MachineInstr &MI, unsigned TypeIdx, LLT CastTy) { switch (MI.getOpcode()) { case TargetOpcode::G_LOAD: { if (TypeIdx != 0) return UnableToLegalize; Observer.changingInstr(MI); bitcastDst(MI, CastTy, 0); Observer.changedInstr(MI); return Legalized; } case TargetOpcode::G_STORE: { if (TypeIdx != 0) return UnableToLegalize; Observer.changingInstr(MI); bitcastSrc(MI, CastTy, 0); Observer.changedInstr(MI); return Legalized; } case TargetOpcode::G_SELECT: { if (TypeIdx != 0) return UnableToLegalize; if (MRI.getType(MI.getOperand(1).getReg()).isVector()) { LLVM_DEBUG( dbgs() << "bitcast action not implemented for vector select\n"); return UnableToLegalize; } Observer.changingInstr(MI); bitcastSrc(MI, CastTy, 2); bitcastSrc(MI, CastTy, 3); bitcastDst(MI, CastTy, 0); Observer.changedInstr(MI); return Legalized; } case TargetOpcode::G_AND: case TargetOpcode::G_OR: case TargetOpcode::G_XOR: { Observer.changingInstr(MI); bitcastSrc(MI, CastTy, 1); bitcastSrc(MI, CastTy, 2); bitcastDst(MI, CastTy, 0); Observer.changedInstr(MI); return Legalized; } default: return UnableToLegalize; } } LegalizerHelper::LegalizeResult LegalizerHelper::lower(MachineInstr &MI, unsigned TypeIdx, LLT Ty) { using namespace TargetOpcode; switch(MI.getOpcode()) { default: return UnableToLegalize; case TargetOpcode::G_BITCAST: return lowerBitcast(MI); case TargetOpcode::G_SREM: case TargetOpcode::G_UREM: { auto Quot = MIRBuilder.buildInstr(MI.getOpcode() == G_SREM ? G_SDIV : G_UDIV, {Ty}, {MI.getOperand(1), MI.getOperand(2)}); auto Prod = MIRBuilder.buildMul(Ty, Quot, MI.getOperand(2)); MIRBuilder.buildSub(MI.getOperand(0), MI.getOperand(1), Prod); MI.eraseFromParent(); return Legalized; } case TargetOpcode::G_SADDO: case TargetOpcode::G_SSUBO: return lowerSADDO_SSUBO(MI); case TargetOpcode::G_SMULO: case TargetOpcode::G_UMULO: { // Generate G_UMULH/G_SMULH to check for overflow and a normal G_MUL for the // result. Register Res = MI.getOperand(0).getReg(); Register Overflow = MI.getOperand(1).getReg(); Register LHS = MI.getOperand(2).getReg(); Register RHS = MI.getOperand(3).getReg(); unsigned Opcode = MI.getOpcode() == TargetOpcode::G_SMULO ? TargetOpcode::G_SMULH : TargetOpcode::G_UMULH; Observer.changingInstr(MI); const auto &TII = MIRBuilder.getTII(); MI.setDesc(TII.get(TargetOpcode::G_MUL)); MI.RemoveOperand(1); Observer.changedInstr(MI); - MIRBuilder.setInsertPt(MIRBuilder.getMBB(), ++MIRBuilder.getInsertPt()); - auto HiPart = MIRBuilder.buildInstr(Opcode, {Ty}, {LHS, RHS}); auto Zero = MIRBuilder.buildConstant(Ty, 0); + + // Move insert point forward so we can use the Res register if needed. 
+ MIRBuilder.setInsertPt(MIRBuilder.getMBB(), ++MIRBuilder.getInsertPt()); // For *signed* multiply, overflow is detected by checking: // (hi != (lo >> bitwidth-1)) if (Opcode == TargetOpcode::G_SMULH) { auto ShiftAmt = MIRBuilder.buildConstant(Ty, Ty.getSizeInBits() - 1); auto Shifted = MIRBuilder.buildAShr(Ty, Res, ShiftAmt); MIRBuilder.buildICmp(CmpInst::ICMP_NE, Overflow, HiPart, Shifted); } else { MIRBuilder.buildICmp(CmpInst::ICMP_NE, Overflow, HiPart, Zero); } return Legalized; } case TargetOpcode::G_FNEG: { // TODO: Handle vector types once we are able to // represent them. if (Ty.isVector()) return UnableToLegalize; Register Res = MI.getOperand(0).getReg(); LLVMContext &Ctx = MIRBuilder.getMF().getFunction().getContext(); Type *ZeroTy = getFloatTypeForLLT(Ctx, Ty); if (!ZeroTy) return UnableToLegalize; ConstantFP &ZeroForNegation = *cast(ConstantFP::getZeroValueForNegation(ZeroTy)); auto Zero = MIRBuilder.buildFConstant(Ty, ZeroForNegation); Register SubByReg = MI.getOperand(1).getReg(); Register ZeroReg = Zero.getReg(0); MIRBuilder.buildFSub(Res, ZeroReg, SubByReg, MI.getFlags()); MI.eraseFromParent(); return Legalized; } case TargetOpcode::G_FSUB: { // Lower (G_FSUB LHS, RHS) to (G_FADD LHS, (G_FNEG RHS)). // First, check if G_FNEG is marked as Lower. If so, we may // end up with an infinite loop as G_FSUB is used to legalize G_FNEG. if (LI.getAction({G_FNEG, {Ty}}).Action == Lower) return UnableToLegalize; Register Res = MI.getOperand(0).getReg(); Register LHS = MI.getOperand(1).getReg(); Register RHS = MI.getOperand(2).getReg(); Register Neg = MRI.createGenericVirtualRegister(Ty); MIRBuilder.buildFNeg(Neg, RHS); MIRBuilder.buildFAdd(Res, LHS, Neg, MI.getFlags()); MI.eraseFromParent(); return Legalized; } case TargetOpcode::G_FMAD: return lowerFMad(MI); case TargetOpcode::G_FFLOOR: return lowerFFloor(MI); case TargetOpcode::G_INTRINSIC_ROUND: return lowerIntrinsicRound(MI); case TargetOpcode::G_ATOMIC_CMPXCHG_WITH_SUCCESS: { Register OldValRes = MI.getOperand(0).getReg(); Register SuccessRes = MI.getOperand(1).getReg(); Register Addr = MI.getOperand(2).getReg(); Register CmpVal = MI.getOperand(3).getReg(); Register NewVal = MI.getOperand(4).getReg(); MIRBuilder.buildAtomicCmpXchg(OldValRes, Addr, CmpVal, NewVal, **MI.memoperands_begin()); MIRBuilder.buildICmp(CmpInst::ICMP_EQ, SuccessRes, OldValRes, CmpVal); MI.eraseFromParent(); return Legalized; } case TargetOpcode::G_LOAD: case TargetOpcode::G_SEXTLOAD: case TargetOpcode::G_ZEXTLOAD: { // Lower to a memory-width G_LOAD and a G_SEXT/G_ZEXT/G_ANYEXT Register DstReg = MI.getOperand(0).getReg(); Register PtrReg = MI.getOperand(1).getReg(); LLT DstTy = MRI.getType(DstReg); auto &MMO = **MI.memoperands_begin(); if (DstTy.getSizeInBits() == MMO.getSizeInBits()) { if (MI.getOpcode() == TargetOpcode::G_LOAD) { // This load needs splitting into power of 2 sized loads. if (DstTy.isVector()) return UnableToLegalize; if (isPowerOf2_32(DstTy.getSizeInBits())) return UnableToLegalize; // Don't know what we're being asked to do. // Our strategy here is to generate anyextending loads for the smaller // types up to next power-2 result type, and then combine the two larger // result values together, before truncating back down to the non-pow-2 // type. // E.g. v1 = i24 load => // v2 = i32 zextload (2 byte) // v3 = i32 load (1 byte) // v4 = i32 shl v3, 16 // v5 = i32 or v4, v2 // v1 = i24 trunc v5 // By doing this we generate the correct truncate which should get // combined away as an artifact with a matching extend. 
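        // Continuing the i24 example above (illustrative): LargeSplitSize is
        // 16 and SmallSplitSize is 8, so LargeMMO covers bytes [0, 2) of the
        // original access and SmallMMO covers byte 2, reached via a +2
        // G_PTR_ADD.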
uint64_t LargeSplitSize = PowerOf2Floor(DstTy.getSizeInBits()); uint64_t SmallSplitSize = DstTy.getSizeInBits() - LargeSplitSize; MachineFunction &MF = MIRBuilder.getMF(); MachineMemOperand *LargeMMO = MF.getMachineMemOperand(&MMO, 0, LargeSplitSize / 8); MachineMemOperand *SmallMMO = MF.getMachineMemOperand( &MMO, LargeSplitSize / 8, SmallSplitSize / 8); LLT PtrTy = MRI.getType(PtrReg); unsigned AnyExtSize = NextPowerOf2(DstTy.getSizeInBits()); LLT AnyExtTy = LLT::scalar(AnyExtSize); Register LargeLdReg = MRI.createGenericVirtualRegister(AnyExtTy); Register SmallLdReg = MRI.createGenericVirtualRegister(AnyExtTy); auto LargeLoad = MIRBuilder.buildLoadInstr( TargetOpcode::G_ZEXTLOAD, LargeLdReg, PtrReg, *LargeMMO); auto OffsetCst = MIRBuilder.buildConstant( LLT::scalar(PtrTy.getSizeInBits()), LargeSplitSize / 8); Register PtrAddReg = MRI.createGenericVirtualRegister(PtrTy); auto SmallPtr = MIRBuilder.buildPtrAdd(PtrAddReg, PtrReg, OffsetCst.getReg(0)); auto SmallLoad = MIRBuilder.buildLoad(SmallLdReg, SmallPtr.getReg(0), *SmallMMO); auto ShiftAmt = MIRBuilder.buildConstant(AnyExtTy, LargeSplitSize); auto Shift = MIRBuilder.buildShl(AnyExtTy, SmallLoad, ShiftAmt); auto Or = MIRBuilder.buildOr(AnyExtTy, Shift, LargeLoad); MIRBuilder.buildTrunc(DstReg, {Or.getReg(0)}); MI.eraseFromParent(); return Legalized; } MIRBuilder.buildLoad(DstReg, PtrReg, MMO); MI.eraseFromParent(); return Legalized; } if (DstTy.isScalar()) { Register TmpReg = MRI.createGenericVirtualRegister(LLT::scalar(MMO.getSizeInBits())); MIRBuilder.buildLoad(TmpReg, PtrReg, MMO); switch (MI.getOpcode()) { default: llvm_unreachable("Unexpected opcode"); case TargetOpcode::G_LOAD: MIRBuilder.buildExtOrTrunc(TargetOpcode::G_ANYEXT, DstReg, TmpReg); break; case TargetOpcode::G_SEXTLOAD: MIRBuilder.buildSExt(DstReg, TmpReg); break; case TargetOpcode::G_ZEXTLOAD: MIRBuilder.buildZExt(DstReg, TmpReg); break; } MI.eraseFromParent(); return Legalized; } return UnableToLegalize; } case TargetOpcode::G_STORE: { // Lower a non-power of 2 store into multiple pow-2 stores. // E.g. split an i24 store into an i16 store + i8 store. // We do this by first extending the stored value to the next largest power // of 2 type, and then using truncating stores to store the components. // By doing this, likewise with G_LOAD, generate an extend that can be // artifact-combined away instead of leaving behind extracts. Register SrcReg = MI.getOperand(0).getReg(); Register PtrReg = MI.getOperand(1).getReg(); LLT SrcTy = MRI.getType(SrcReg); MachineMemOperand &MMO = **MI.memoperands_begin(); if (SrcTy.getSizeInBits() != MMO.getSizeInBits()) return UnableToLegalize; if (SrcTy.isVector()) return UnableToLegalize; if (isPowerOf2_32(SrcTy.getSizeInBits())) return UnableToLegalize; // Don't know what we're being asked to do. // Extend to the next pow-2. const LLT ExtendTy = LLT::scalar(NextPowerOf2(SrcTy.getSizeInBits())); auto ExtVal = MIRBuilder.buildAnyExt(ExtendTy, SrcReg); // Obtain the smaller value by shifting away the larger value. uint64_t LargeSplitSize = PowerOf2Floor(SrcTy.getSizeInBits()); uint64_t SmallSplitSize = SrcTy.getSizeInBits() - LargeSplitSize; auto ShiftAmt = MIRBuilder.buildConstant(ExtendTy, LargeSplitSize); auto SmallVal = MIRBuilder.buildLShr(ExtendTy, ExtVal, ShiftAmt); // Generate the PtrAdd and truncating stores. 
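    // Illustrative i24 store: the value is anyextended to s32, SmallVal is
    // ExtVal >> 16, and the code below emits a 2-byte truncating store at the
    // original pointer plus a 1-byte store at pointer + 2.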
LLT PtrTy = MRI.getType(PtrReg); auto OffsetCst = MIRBuilder.buildConstant( LLT::scalar(PtrTy.getSizeInBits()), LargeSplitSize / 8); Register PtrAddReg = MRI.createGenericVirtualRegister(PtrTy); auto SmallPtr = MIRBuilder.buildPtrAdd(PtrAddReg, PtrReg, OffsetCst.getReg(0)); MachineFunction &MF = MIRBuilder.getMF(); MachineMemOperand *LargeMMO = MF.getMachineMemOperand(&MMO, 0, LargeSplitSize / 8); MachineMemOperand *SmallMMO = MF.getMachineMemOperand(&MMO, LargeSplitSize / 8, SmallSplitSize / 8); MIRBuilder.buildStore(ExtVal.getReg(0), PtrReg, *LargeMMO); MIRBuilder.buildStore(SmallVal.getReg(0), SmallPtr.getReg(0), *SmallMMO); MI.eraseFromParent(); return Legalized; } case TargetOpcode::G_CTLZ_ZERO_UNDEF: case TargetOpcode::G_CTTZ_ZERO_UNDEF: case TargetOpcode::G_CTLZ: case TargetOpcode::G_CTTZ: case TargetOpcode::G_CTPOP: return lowerBitCount(MI, TypeIdx, Ty); case G_UADDO: { Register Res = MI.getOperand(0).getReg(); Register CarryOut = MI.getOperand(1).getReg(); Register LHS = MI.getOperand(2).getReg(); Register RHS = MI.getOperand(3).getReg(); MIRBuilder.buildAdd(Res, LHS, RHS); MIRBuilder.buildICmp(CmpInst::ICMP_ULT, CarryOut, Res, RHS); MI.eraseFromParent(); return Legalized; } case G_UADDE: { Register Res = MI.getOperand(0).getReg(); Register CarryOut = MI.getOperand(1).getReg(); Register LHS = MI.getOperand(2).getReg(); Register RHS = MI.getOperand(3).getReg(); Register CarryIn = MI.getOperand(4).getReg(); LLT Ty = MRI.getType(Res); auto TmpRes = MIRBuilder.buildAdd(Ty, LHS, RHS); auto ZExtCarryIn = MIRBuilder.buildZExt(Ty, CarryIn); MIRBuilder.buildAdd(Res, TmpRes, ZExtCarryIn); MIRBuilder.buildICmp(CmpInst::ICMP_ULT, CarryOut, Res, LHS); MI.eraseFromParent(); return Legalized; } case G_USUBO: { Register Res = MI.getOperand(0).getReg(); Register BorrowOut = MI.getOperand(1).getReg(); Register LHS = MI.getOperand(2).getReg(); Register RHS = MI.getOperand(3).getReg(); MIRBuilder.buildSub(Res, LHS, RHS); MIRBuilder.buildICmp(CmpInst::ICMP_ULT, BorrowOut, LHS, RHS); MI.eraseFromParent(); return Legalized; } case G_USUBE: { Register Res = MI.getOperand(0).getReg(); Register BorrowOut = MI.getOperand(1).getReg(); Register LHS = MI.getOperand(2).getReg(); Register RHS = MI.getOperand(3).getReg(); Register BorrowIn = MI.getOperand(4).getReg(); const LLT CondTy = MRI.getType(BorrowOut); const LLT Ty = MRI.getType(Res); auto TmpRes = MIRBuilder.buildSub(Ty, LHS, RHS); auto ZExtBorrowIn = MIRBuilder.buildZExt(Ty, BorrowIn); MIRBuilder.buildSub(Res, TmpRes, ZExtBorrowIn); auto LHS_EQ_RHS = MIRBuilder.buildICmp(CmpInst::ICMP_EQ, CondTy, LHS, RHS); auto LHS_ULT_RHS = MIRBuilder.buildICmp(CmpInst::ICMP_ULT, CondTy, LHS, RHS); MIRBuilder.buildSelect(BorrowOut, LHS_EQ_RHS, BorrowIn, LHS_ULT_RHS); MI.eraseFromParent(); return Legalized; } case G_UITOFP: return lowerUITOFP(MI, TypeIdx, Ty); case G_SITOFP: return lowerSITOFP(MI, TypeIdx, Ty); case G_FPTOUI: return lowerFPTOUI(MI, TypeIdx, Ty); case G_FPTOSI: return lowerFPTOSI(MI); case G_FPTRUNC: return lowerFPTRUNC(MI, TypeIdx, Ty); case G_SMIN: case G_SMAX: case G_UMIN: case G_UMAX: return lowerMinMax(MI, TypeIdx, Ty); case G_FCOPYSIGN: return lowerFCopySign(MI, TypeIdx, Ty); case G_FMINNUM: case G_FMAXNUM: return lowerFMinNumMaxNum(MI); case G_MERGE_VALUES: return lowerMergeValues(MI); case G_UNMERGE_VALUES: return lowerUnmergeValues(MI); case TargetOpcode::G_SEXT_INREG: { assert(MI.getOperand(2).isImm() && "Expected immediate"); int64_t SizeInBits = MI.getOperand(2).getImm(); Register DstReg = MI.getOperand(0).getReg(); Register SrcReg = 
MI.getOperand(1).getReg(); LLT DstTy = MRI.getType(DstReg); Register TmpRes = MRI.createGenericVirtualRegister(DstTy); auto MIBSz = MIRBuilder.buildConstant(DstTy, DstTy.getScalarSizeInBits() - SizeInBits); MIRBuilder.buildShl(TmpRes, SrcReg, MIBSz->getOperand(0)); MIRBuilder.buildAShr(DstReg, TmpRes, MIBSz->getOperand(0)); MI.eraseFromParent(); return Legalized; } case G_SHUFFLE_VECTOR: return lowerShuffleVector(MI); case G_DYN_STACKALLOC: return lowerDynStackAlloc(MI); case G_EXTRACT: return lowerExtract(MI); case G_INSERT: return lowerInsert(MI); case G_BSWAP: return lowerBswap(MI); case G_BITREVERSE: return lowerBitreverse(MI); case G_READ_REGISTER: case G_WRITE_REGISTER: return lowerReadWriteRegister(MI); } } LegalizerHelper::LegalizeResult LegalizerHelper::fewerElementsVectorImplicitDef( MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { SmallVector DstRegs; unsigned NarrowSize = NarrowTy.getSizeInBits(); Register DstReg = MI.getOperand(0).getReg(); unsigned Size = MRI.getType(DstReg).getSizeInBits(); int NumParts = Size / NarrowSize; // FIXME: Don't know how to handle the situation where the small vectors // aren't all the same size yet. if (Size % NarrowSize != 0) return UnableToLegalize; for (int i = 0; i < NumParts; ++i) { Register TmpReg = MRI.createGenericVirtualRegister(NarrowTy); MIRBuilder.buildUndef(TmpReg); DstRegs.push_back(TmpReg); } if (NarrowTy.isVector()) MIRBuilder.buildConcatVectors(DstReg, DstRegs); else MIRBuilder.buildBuildVector(DstReg, DstRegs); MI.eraseFromParent(); return Legalized; } // Handle splitting vector operations which need to have the same number of // elements in each type index, but each type index may have a different element // type. // // e.g. <4 x s64> = G_SHL <4 x s64>, <4 x s32> -> // <2 x s64> = G_SHL <2 x s64>, <2 x s32> // <2 x s64> = G_SHL <2 x s64>, <2 x s32> // // Also handles some irregular breakdown cases, e.g. // e.g. <3 x s64> = G_SHL <3 x s64>, <3 x s32> -> // <2 x s64> = G_SHL <2 x s64>, <2 x s32> // s64 = G_SHL s64, s32 LegalizerHelper::LegalizeResult LegalizerHelper::fewerElementsVectorMultiEltType( MachineInstr &MI, unsigned TypeIdx, LLT NarrowTyArg) { if (TypeIdx != 0) return UnableToLegalize; const LLT NarrowTy0 = NarrowTyArg; const unsigned NewNumElts = NarrowTy0.isVector() ? NarrowTy0.getNumElements() : 1; const Register DstReg = MI.getOperand(0).getReg(); LLT DstTy = MRI.getType(DstReg); LLT LeftoverTy0; // All of the operands need to have the same number of elements, so if we can // determine a type breakdown for the result type, we can for all of the // source types. int NumParts = getNarrowTypeBreakDown(DstTy, NarrowTy0, LeftoverTy0).first; if (NumParts < 0) return UnableToLegalize; SmallVector NewInsts; SmallVector DstRegs, LeftoverDstRegs; SmallVector PartRegs, LeftoverRegs; for (unsigned I = 1, E = MI.getNumOperands(); I != E; ++I) { Register SrcReg = MI.getOperand(I).getReg(); LLT SrcTyI = MRI.getType(SrcReg); LLT NarrowTyI = LLT::scalarOrVector(NewNumElts, SrcTyI.getScalarType()); LLT LeftoverTyI; // Split this operand into the requested typed registers, and any leftover // required to reproduce the original type. if (!extractParts(SrcReg, SrcTyI, NarrowTyI, LeftoverTyI, PartRegs, LeftoverRegs)) return UnableToLegalize; if (I == 1) { // For the first operand, create an instruction for each part and setup // the result. 
for (Register PartReg : PartRegs) { Register PartDstReg = MRI.createGenericVirtualRegister(NarrowTy0); NewInsts.push_back(MIRBuilder.buildInstrNoInsert(MI.getOpcode()) .addDef(PartDstReg) .addUse(PartReg)); DstRegs.push_back(PartDstReg); } for (Register LeftoverReg : LeftoverRegs) { Register PartDstReg = MRI.createGenericVirtualRegister(LeftoverTy0); NewInsts.push_back(MIRBuilder.buildInstrNoInsert(MI.getOpcode()) .addDef(PartDstReg) .addUse(LeftoverReg)); LeftoverDstRegs.push_back(PartDstReg); } } else { assert(NewInsts.size() == PartRegs.size() + LeftoverRegs.size()); // Add the newly created operand splits to the existing instructions. The // odd-sized pieces are ordered after the requested NarrowTyArg sized // pieces. unsigned InstCount = 0; for (unsigned J = 0, JE = PartRegs.size(); J != JE; ++J) NewInsts[InstCount++].addUse(PartRegs[J]); for (unsigned J = 0, JE = LeftoverRegs.size(); J != JE; ++J) NewInsts[InstCount++].addUse(LeftoverRegs[J]); } PartRegs.clear(); LeftoverRegs.clear(); } // Insert the newly built operations and rebuild the result register. for (auto &MIB : NewInsts) MIRBuilder.insertInstr(MIB); insertParts(DstReg, DstTy, NarrowTy0, DstRegs, LeftoverTy0, LeftoverDstRegs); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::fewerElementsVectorCasts(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { if (TypeIdx != 0) return UnableToLegalize; Register DstReg = MI.getOperand(0).getReg(); Register SrcReg = MI.getOperand(1).getReg(); LLT DstTy = MRI.getType(DstReg); LLT SrcTy = MRI.getType(SrcReg); LLT NarrowTy0 = NarrowTy; LLT NarrowTy1; unsigned NumParts; if (NarrowTy.isVector()) { // Uneven breakdown not handled. NumParts = DstTy.getNumElements() / NarrowTy.getNumElements(); if (NumParts * NarrowTy.getNumElements() != DstTy.getNumElements()) return UnableToLegalize; NarrowTy1 = LLT::vector(NumParts, SrcTy.getElementType().getSizeInBits()); } else { NumParts = DstTy.getNumElements(); NarrowTy1 = SrcTy.getElementType(); } SmallVector SrcRegs, DstRegs; extractParts(SrcReg, NarrowTy1, NumParts, SrcRegs); for (unsigned I = 0; I < NumParts; ++I) { Register DstReg = MRI.createGenericVirtualRegister(NarrowTy0); MachineInstr *NewInst = MIRBuilder.buildInstr(MI.getOpcode(), {DstReg}, {SrcRegs[I]}); NewInst->setFlags(MI.getFlags()); DstRegs.push_back(DstReg); } if (NarrowTy.isVector()) MIRBuilder.buildConcatVectors(DstReg, DstRegs); else MIRBuilder.buildBuildVector(DstReg, DstRegs); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::fewerElementsVectorCmp(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { Register DstReg = MI.getOperand(0).getReg(); Register Src0Reg = MI.getOperand(2).getReg(); LLT DstTy = MRI.getType(DstReg); LLT SrcTy = MRI.getType(Src0Reg); unsigned NumParts; LLT NarrowTy0, NarrowTy1; if (TypeIdx == 0) { unsigned NewElts = NarrowTy.isVector() ? NarrowTy.getNumElements() : 1; unsigned OldElts = DstTy.getNumElements(); NarrowTy0 = NarrowTy; NumParts = NarrowTy.isVector() ? (OldElts / NewElts) : DstTy.getNumElements(); NarrowTy1 = NarrowTy.isVector() ? LLT::vector(NarrowTy.getNumElements(), SrcTy.getScalarSizeInBits()) : SrcTy.getElementType(); } else { unsigned NewElts = NarrowTy.isVector() ? NarrowTy.getNumElements() : 1; unsigned OldElts = SrcTy.getNumElements(); NumParts = NarrowTy.isVector() ? 
(OldElts / NewElts) : NarrowTy.getNumElements(); NarrowTy0 = LLT::vector(NarrowTy.getNumElements(), DstTy.getScalarSizeInBits()); NarrowTy1 = NarrowTy; } // FIXME: Don't know how to handle the situation where the small vectors // aren't all the same size yet. if (NarrowTy1.isVector() && NarrowTy1.getNumElements() * NumParts != DstTy.getNumElements()) return UnableToLegalize; CmpInst::Predicate Pred = static_cast(MI.getOperand(1).getPredicate()); SmallVector Src1Regs, Src2Regs, DstRegs; extractParts(MI.getOperand(2).getReg(), NarrowTy1, NumParts, Src1Regs); extractParts(MI.getOperand(3).getReg(), NarrowTy1, NumParts, Src2Regs); for (unsigned I = 0; I < NumParts; ++I) { Register DstReg = MRI.createGenericVirtualRegister(NarrowTy0); DstRegs.push_back(DstReg); if (MI.getOpcode() == TargetOpcode::G_ICMP) MIRBuilder.buildICmp(Pred, DstReg, Src1Regs[I], Src2Regs[I]); else { MachineInstr *NewCmp = MIRBuilder.buildFCmp(Pred, DstReg, Src1Regs[I], Src2Regs[I]); NewCmp->setFlags(MI.getFlags()); } } if (NarrowTy1.isVector()) MIRBuilder.buildConcatVectors(DstReg, DstRegs); else MIRBuilder.buildBuildVector(DstReg, DstRegs); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::fewerElementsVectorSelect(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { Register DstReg = MI.getOperand(0).getReg(); Register CondReg = MI.getOperand(1).getReg(); unsigned NumParts = 0; LLT NarrowTy0, NarrowTy1; LLT DstTy = MRI.getType(DstReg); LLT CondTy = MRI.getType(CondReg); unsigned Size = DstTy.getSizeInBits(); assert(TypeIdx == 0 || CondTy.isVector()); if (TypeIdx == 0) { NarrowTy0 = NarrowTy; NarrowTy1 = CondTy; unsigned NarrowSize = NarrowTy0.getSizeInBits(); // FIXME: Don't know how to handle the situation where the small vectors // aren't all the same size yet. if (Size % NarrowSize != 0) return UnableToLegalize; NumParts = Size / NarrowSize; // Need to break down the condition type if (CondTy.isVector()) { if (CondTy.getNumElements() == NumParts) NarrowTy1 = CondTy.getElementType(); else NarrowTy1 = LLT::vector(CondTy.getNumElements() / NumParts, CondTy.getScalarSizeInBits()); } } else { NumParts = CondTy.getNumElements(); if (NarrowTy.isVector()) { // TODO: Handle uneven breakdown. if (NumParts * NarrowTy.getNumElements() != CondTy.getNumElements()) return UnableToLegalize; return UnableToLegalize; } else { NarrowTy0 = DstTy.getElementType(); NarrowTy1 = NarrowTy; } } SmallVector DstRegs, Src0Regs, Src1Regs, Src2Regs; if (CondTy.isVector()) extractParts(MI.getOperand(1).getReg(), NarrowTy1, NumParts, Src0Regs); extractParts(MI.getOperand(2).getReg(), NarrowTy0, NumParts, Src1Regs); extractParts(MI.getOperand(3).getReg(), NarrowTy0, NumParts, Src2Regs); for (unsigned i = 0; i < NumParts; ++i) { Register DstReg = MRI.createGenericVirtualRegister(NarrowTy0); MIRBuilder.buildSelect(DstReg, CondTy.isVector() ? Src0Regs[i] : CondReg, Src1Regs[i], Src2Regs[i]); DstRegs.push_back(DstReg); } if (NarrowTy0.isVector()) MIRBuilder.buildConcatVectors(DstReg, DstRegs); else MIRBuilder.buildBuildVector(DstReg, DstRegs); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::fewerElementsVectorPhi(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { const Register DstReg = MI.getOperand(0).getReg(); LLT PhiTy = MRI.getType(DstReg); LLT LeftoverTy; // All of the operands need to have the same number of elements, so if we can // determine a type breakdown for the result type, we can for all of the // source types. 
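  // Illustrative sketch (the concrete types are made up): narrowing
  //   %res:_(<3 x s32>) = G_PHI %a(<3 x s32>), %bb.1, %b(<3 x s32>), %bb.2
  // with NarrowTy = <2 x s32> builds a <2 x s32> phi plus an s32 leftover phi,
  // splits each incoming value before the terminator of its predecessor block,
  // and re-merges the new phi results into %res after the phis of this block.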
int NumParts, NumLeftover; std::tie(NumParts, NumLeftover) = getNarrowTypeBreakDown(PhiTy, NarrowTy, LeftoverTy); if (NumParts < 0) return UnableToLegalize; SmallVector DstRegs, LeftoverDstRegs; SmallVector NewInsts; const int TotalNumParts = NumParts + NumLeftover; // Insert the new phis in the result block first. for (int I = 0; I != TotalNumParts; ++I) { LLT Ty = I < NumParts ? NarrowTy : LeftoverTy; Register PartDstReg = MRI.createGenericVirtualRegister(Ty); NewInsts.push_back(MIRBuilder.buildInstr(TargetOpcode::G_PHI) .addDef(PartDstReg)); if (I < NumParts) DstRegs.push_back(PartDstReg); else LeftoverDstRegs.push_back(PartDstReg); } MachineBasicBlock *MBB = MI.getParent(); MIRBuilder.setInsertPt(*MBB, MBB->getFirstNonPHI()); insertParts(DstReg, PhiTy, NarrowTy, DstRegs, LeftoverTy, LeftoverDstRegs); SmallVector PartRegs, LeftoverRegs; // Insert code to extract the incoming values in each predecessor block. for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) { PartRegs.clear(); LeftoverRegs.clear(); Register SrcReg = MI.getOperand(I).getReg(); MachineBasicBlock &OpMBB = *MI.getOperand(I + 1).getMBB(); MIRBuilder.setInsertPt(OpMBB, OpMBB.getFirstTerminator()); LLT Unused; if (!extractParts(SrcReg, PhiTy, NarrowTy, Unused, PartRegs, LeftoverRegs)) return UnableToLegalize; // Add the newly created operand splits to the existing instructions. The // odd-sized pieces are ordered after the requested NarrowTyArg sized // pieces. for (int J = 0; J != TotalNumParts; ++J) { MachineInstrBuilder MIB = NewInsts[J]; MIB.addUse(J < NumParts ? PartRegs[J] : LeftoverRegs[J - NumParts]); MIB.addMBB(&OpMBB); } } MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::fewerElementsVectorUnmergeValues(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { if (TypeIdx != 1) return UnableToLegalize; const int NumDst = MI.getNumOperands() - 1; const Register SrcReg = MI.getOperand(NumDst).getReg(); LLT SrcTy = MRI.getType(SrcReg); LLT DstTy = MRI.getType(MI.getOperand(0).getReg()); // TODO: Create sequence of extracts. if (DstTy == NarrowTy) return UnableToLegalize; LLT GCDTy = getGCDType(SrcTy, NarrowTy); if (DstTy == GCDTy) { // This would just be a copy of the same unmerge. // TODO: Create extracts, pad with undef and create intermediate merges. return UnableToLegalize; } auto Unmerge = MIRBuilder.buildUnmerge(GCDTy, SrcReg); const int NumUnmerge = Unmerge->getNumOperands() - 1; const int PartsPerUnmerge = NumDst / NumUnmerge; for (int I = 0; I != NumUnmerge; ++I) { auto MIB = MIRBuilder.buildInstr(TargetOpcode::G_UNMERGE_VALUES); for (int J = 0; J != PartsPerUnmerge; ++J) MIB.addDef(MI.getOperand(I * PartsPerUnmerge + J).getReg()); MIB.addUse(Unmerge.getReg(I)); } MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::fewerElementsVectorBuildVector(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { assert(TypeIdx == 0 && "not a vector type index"); Register DstReg = MI.getOperand(0).getReg(); LLT DstTy = MRI.getType(DstReg); LLT SrcTy = DstTy.getElementType(); int DstNumElts = DstTy.getNumElements(); int NarrowNumElts = NarrowTy.getNumElements(); int NumConcat = (DstNumElts + NarrowNumElts - 1) / NarrowNumElts; LLT WidenedDstTy = LLT::vector(NarrowNumElts * NumConcat, SrcTy); SmallVector ConcatOps; SmallVector SubBuildVector; Register UndefReg; if (WidenedDstTy != DstTy) UndefReg = MIRBuilder.buildUndef(SrcTy).getReg(0); // Create a G_CONCAT_VECTORS of NarrowTy pieces, padding with undef as // necessary. 
// // %3:_(<3 x s16>) = G_BUILD_VECTOR %0, %1, %2 // -> <2 x s16> // // %4:_(s16) = G_IMPLICIT_DEF // %5:_(<2 x s16>) = G_BUILD_VECTOR %0, %1 // %6:_(<2 x s16>) = G_BUILD_VECTOR %2, %4 // %7:_(<4 x s16>) = G_CONCAT_VECTORS %5, %6 // %3:_(<3 x s16>) = G_EXTRACT %7, 0 for (int I = 0; I != NumConcat; ++I) { for (int J = 0; J != NarrowNumElts; ++J) { int SrcIdx = NarrowNumElts * I + J; if (SrcIdx < DstNumElts) { Register SrcReg = MI.getOperand(SrcIdx + 1).getReg(); SubBuildVector.push_back(SrcReg); } else SubBuildVector.push_back(UndefReg); } auto BuildVec = MIRBuilder.buildBuildVector(NarrowTy, SubBuildVector); ConcatOps.push_back(BuildVec.getReg(0)); SubBuildVector.clear(); } if (DstTy == WidenedDstTy) MIRBuilder.buildConcatVectors(DstReg, ConcatOps); else { auto Concat = MIRBuilder.buildConcatVectors(WidenedDstTy, ConcatOps); MIRBuilder.buildExtract(DstReg, Concat, 0); } MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::reduceLoadStoreWidth(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { // FIXME: Don't know how to handle secondary types yet. if (TypeIdx != 0) return UnableToLegalize; MachineMemOperand *MMO = *MI.memoperands_begin(); // This implementation doesn't work for atomics. Give up instead of doing // something invalid. if (MMO->getOrdering() != AtomicOrdering::NotAtomic || MMO->getFailureOrdering() != AtomicOrdering::NotAtomic) return UnableToLegalize; bool IsLoad = MI.getOpcode() == TargetOpcode::G_LOAD; Register ValReg = MI.getOperand(0).getReg(); Register AddrReg = MI.getOperand(1).getReg(); LLT ValTy = MRI.getType(ValReg); // FIXME: Do we need a distinct NarrowMemory legalize action? if (ValTy.getSizeInBits() != 8 * MMO->getSize()) { LLVM_DEBUG(dbgs() << "Can't narrow extload/truncstore\n"); return UnableToLegalize; } int NumParts = -1; int NumLeftover = -1; LLT LeftoverTy; SmallVector NarrowRegs, NarrowLeftoverRegs; if (IsLoad) { std::tie(NumParts, NumLeftover) = getNarrowTypeBreakDown(ValTy, NarrowTy, LeftoverTy); } else { if (extractParts(ValReg, ValTy, NarrowTy, LeftoverTy, NarrowRegs, NarrowLeftoverRegs)) { NumParts = NarrowRegs.size(); NumLeftover = NarrowLeftoverRegs.size(); } } if (NumParts == -1) return UnableToLegalize; const LLT OffsetTy = LLT::scalar(MRI.getType(AddrReg).getScalarSizeInBits()); unsigned TotalSize = ValTy.getSizeInBits(); // Split the load/store into PartTy sized pieces starting at Offset. If this // is a load, return the new registers in ValRegs. For a store, each elements // of ValRegs should be PartTy. Returns the next offset that needs to be // handled. auto splitTypePieces = [=](LLT PartTy, SmallVectorImpl &ValRegs, unsigned Offset) -> unsigned { MachineFunction &MF = MIRBuilder.getMF(); unsigned PartSize = PartTy.getSizeInBits(); for (unsigned Idx = 0, E = NumParts; Idx != E && Offset < TotalSize; Offset += PartSize, ++Idx) { unsigned ByteSize = PartSize / 8; unsigned ByteOffset = Offset / 8; Register NewAddrReg; MIRBuilder.materializePtrAdd(NewAddrReg, AddrReg, OffsetTy, ByteOffset); MachineMemOperand *NewMMO = MF.getMachineMemOperand(MMO, ByteOffset, ByteSize); if (IsLoad) { Register Dst = MRI.createGenericVirtualRegister(PartTy); ValRegs.push_back(Dst); MIRBuilder.buildLoad(Dst, NewAddrReg, *NewMMO); } else { MIRBuilder.buildStore(ValRegs[Idx], NewAddrReg, *NewMMO); } } return Offset; }; unsigned HandledOffset = splitTypePieces(NarrowTy, NarrowRegs, 0); // Handle the rest of the register if this isn't an even type breakdown. 
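  // For example (sizes chosen only for illustration): narrowing a non-atomic
  //   %val:_(s96) = G_LOAD %ptr(p0) :: (load 12)
  // with NarrowTy = s64 makes splitTypePieces emit one s64 load at byte offset
  // 0; the s32 leftover piece is then loaded at byte offset 8 below, and the
  // pieces are reassembled into %val by insertParts().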
if (LeftoverTy.isValid()) splitTypePieces(LeftoverTy, NarrowLeftoverRegs, HandledOffset); if (IsLoad) { insertParts(ValReg, ValTy, NarrowTy, NarrowRegs, LeftoverTy, NarrowLeftoverRegs); } MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::reduceOperationWidth(MachineInstr &MI, unsigned int TypeIdx, LLT NarrowTy) { assert(TypeIdx == 0 && "only one type index expected"); const unsigned Opc = MI.getOpcode(); const int NumOps = MI.getNumOperands() - 1; const Register DstReg = MI.getOperand(0).getReg(); const unsigned Flags = MI.getFlags(); const unsigned NarrowSize = NarrowTy.getSizeInBits(); const LLT NarrowScalarTy = LLT::scalar(NarrowSize); assert(NumOps <= 3 && "expected instruction with 1 result and 1-3 sources"); // First of all check whether we are narrowing (changing the element type) // or reducing the vector elements const LLT DstTy = MRI.getType(DstReg); const bool IsNarrow = NarrowTy.getScalarType() != DstTy.getScalarType(); SmallVector ExtractedRegs[3]; SmallVector Parts; unsigned NarrowElts = NarrowTy.isVector() ? NarrowTy.getNumElements() : 1; // Break down all the sources into NarrowTy pieces we can operate on. This may // involve creating merges to a wider type, padded with undef. for (int I = 0; I != NumOps; ++I) { Register SrcReg = MI.getOperand(I + 1).getReg(); LLT SrcTy = MRI.getType(SrcReg); // The type to narrow SrcReg to. For narrowing, this is a smaller scalar. // For fewerElements, this is a smaller vector with the same element type. LLT OpNarrowTy; if (IsNarrow) { OpNarrowTy = NarrowScalarTy; // In case of narrowing, we need to cast vectors to scalars for this to // work properly // FIXME: Can we do without the bitcast here if we're narrowing? if (SrcTy.isVector()) { SrcTy = LLT::scalar(SrcTy.getSizeInBits()); SrcReg = MIRBuilder.buildBitcast(SrcTy, SrcReg).getReg(0); } } else { OpNarrowTy = LLT::scalarOrVector(NarrowElts, SrcTy.getScalarType()); } LLT GCDTy = extractGCDType(ExtractedRegs[I], SrcTy, OpNarrowTy, SrcReg); // Build a sequence of NarrowTy pieces in ExtractedRegs for this operand. buildLCMMergePieces(SrcTy, OpNarrowTy, GCDTy, ExtractedRegs[I], TargetOpcode::G_ANYEXT); } SmallVector ResultRegs; // Input operands for each sub-instruction. SmallVector InputRegs(NumOps, Register()); int NumParts = ExtractedRegs[0].size(); const unsigned DstSize = DstTy.getSizeInBits(); const LLT DstScalarTy = LLT::scalar(DstSize); // Narrowing needs to use scalar types LLT DstLCMTy, NarrowDstTy; if (IsNarrow) { DstLCMTy = getLCMType(DstScalarTy, NarrowScalarTy); NarrowDstTy = NarrowScalarTy; } else { DstLCMTy = getLCMType(DstTy, NarrowTy); NarrowDstTy = NarrowTy; } // We widened the source registers to satisfy merge/unmerge size // constraints. We'll have some extra fully undef parts. const int NumRealParts = (DstSize + NarrowSize - 1) / NarrowSize; for (int I = 0; I != NumRealParts; ++I) { // Emit this instruction on each of the split pieces. for (int J = 0; J != NumOps; ++J) InputRegs[J] = ExtractedRegs[J][I]; auto Inst = MIRBuilder.buildInstr(Opc, {NarrowDstTy}, InputRegs, Flags); ResultRegs.push_back(Inst.getReg(0)); } // Fill out the widened result with undef instead of creating instructions // with undef inputs. int NumUndefParts = NumParts - NumRealParts; if (NumUndefParts != 0) ResultRegs.append(NumUndefParts, MIRBuilder.buildUndef(NarrowDstTy).getReg(0)); // Extract the possibly padded result. Use a scratch register if we need to do // a final bitcast, otherwise use the original result register. 
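  // Rough example (types invented for illustration): a <3 x s16> G_ADD split
  // with NarrowTy = <2 x s16> pads each source out to the LCM type <6 x s16>,
  // emits two real <2 x s16> adds covering the 48 meaningful bits, appends one
  // undef piece, and below remerges the pieces and extracts the low <3 x s16>
  // into the original destination register.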
Register MergeDstReg; if (IsNarrow && DstTy.isVector()) MergeDstReg = MRI.createGenericVirtualRegister(DstScalarTy); else MergeDstReg = DstReg; buildWidenedRemergeToDst(MergeDstReg, DstLCMTy, ResultRegs); // Recast to vector if we narrowed a vector if (IsNarrow && DstTy.isVector()) MIRBuilder.buildBitcast(DstReg, MergeDstReg); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::fewerElementsVectorSextInReg(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { Register DstReg = MI.getOperand(0).getReg(); Register SrcReg = MI.getOperand(1).getReg(); int64_t Imm = MI.getOperand(2).getImm(); LLT DstTy = MRI.getType(DstReg); SmallVector Parts; LLT GCDTy = extractGCDType(Parts, DstTy, NarrowTy, SrcReg); LLT LCMTy = buildLCMMergePieces(DstTy, NarrowTy, GCDTy, Parts); for (Register &R : Parts) R = MIRBuilder.buildSExtInReg(NarrowTy, R, Imm).getReg(0); buildWidenedRemergeToDst(DstReg, LCMTy, Parts); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::fewerElementsVector(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { using namespace TargetOpcode; switch (MI.getOpcode()) { case G_IMPLICIT_DEF: return fewerElementsVectorImplicitDef(MI, TypeIdx, NarrowTy); case G_TRUNC: case G_AND: case G_OR: case G_XOR: case G_ADD: case G_SUB: case G_MUL: case G_SMULH: case G_UMULH: case G_FADD: case G_FMUL: case G_FSUB: case G_FNEG: case G_FABS: case G_FCANONICALIZE: case G_FDIV: case G_FREM: case G_FMA: case G_FMAD: case G_FPOW: case G_FEXP: case G_FEXP2: case G_FLOG: case G_FLOG2: case G_FLOG10: case G_FNEARBYINT: case G_FCEIL: case G_FFLOOR: case G_FRINT: case G_INTRINSIC_ROUND: case G_INTRINSIC_TRUNC: case G_FCOS: case G_FSIN: case G_FSQRT: case G_BSWAP: case G_BITREVERSE: case G_SDIV: case G_UDIV: case G_SREM: case G_UREM: case G_SMIN: case G_SMAX: case G_UMIN: case G_UMAX: case G_FMINNUM: case G_FMAXNUM: case G_FMINNUM_IEEE: case G_FMAXNUM_IEEE: case G_FMINIMUM: case G_FMAXIMUM: case G_FSHL: case G_FSHR: case G_FREEZE: case G_SADDSAT: case G_SSUBSAT: case G_UADDSAT: case G_USUBSAT: return reduceOperationWidth(MI, TypeIdx, NarrowTy); case G_SHL: case G_LSHR: case G_ASHR: case G_CTLZ: case G_CTLZ_ZERO_UNDEF: case G_CTTZ: case G_CTTZ_ZERO_UNDEF: case G_CTPOP: case G_FCOPYSIGN: return fewerElementsVectorMultiEltType(MI, TypeIdx, NarrowTy); case G_ZEXT: case G_SEXT: case G_ANYEXT: case G_FPEXT: case G_FPTRUNC: case G_SITOFP: case G_UITOFP: case G_FPTOSI: case G_FPTOUI: case G_INTTOPTR: case G_PTRTOINT: case G_ADDRSPACE_CAST: return fewerElementsVectorCasts(MI, TypeIdx, NarrowTy); case G_ICMP: case G_FCMP: return fewerElementsVectorCmp(MI, TypeIdx, NarrowTy); case G_SELECT: return fewerElementsVectorSelect(MI, TypeIdx, NarrowTy); case G_PHI: return fewerElementsVectorPhi(MI, TypeIdx, NarrowTy); case G_UNMERGE_VALUES: return fewerElementsVectorUnmergeValues(MI, TypeIdx, NarrowTy); case G_BUILD_VECTOR: return fewerElementsVectorBuildVector(MI, TypeIdx, NarrowTy); case G_LOAD: case G_STORE: return reduceLoadStoreWidth(MI, TypeIdx, NarrowTy); case G_SEXT_INREG: return fewerElementsVectorSextInReg(MI, TypeIdx, NarrowTy); default: return UnableToLegalize; } } LegalizerHelper::LegalizeResult LegalizerHelper::narrowScalarShiftByConstant(MachineInstr &MI, const APInt &Amt, const LLT HalfTy, const LLT AmtTy) { Register InL = MRI.createGenericVirtualRegister(HalfTy); Register InH = MRI.createGenericVirtualRegister(HalfTy); MIRBuilder.buildUnmerge({InL, InH}, MI.getOperand(1)); if (Amt.isNullValue()) { MIRBuilder.buildMerge(MI.getOperand(0), {InL, 
InH}); MI.eraseFromParent(); return Legalized; } LLT NVT = HalfTy; unsigned NVTBits = HalfTy.getSizeInBits(); unsigned VTBits = 2 * NVTBits; SrcOp Lo(Register(0)), Hi(Register(0)); if (MI.getOpcode() == TargetOpcode::G_SHL) { if (Amt.ugt(VTBits)) { Lo = Hi = MIRBuilder.buildConstant(NVT, 0); } else if (Amt.ugt(NVTBits)) { Lo = MIRBuilder.buildConstant(NVT, 0); Hi = MIRBuilder.buildShl(NVT, InL, MIRBuilder.buildConstant(AmtTy, Amt - NVTBits)); } else if (Amt == NVTBits) { Lo = MIRBuilder.buildConstant(NVT, 0); Hi = InL; } else { Lo = MIRBuilder.buildShl(NVT, InL, MIRBuilder.buildConstant(AmtTy, Amt)); auto OrLHS = MIRBuilder.buildShl(NVT, InH, MIRBuilder.buildConstant(AmtTy, Amt)); auto OrRHS = MIRBuilder.buildLShr( NVT, InL, MIRBuilder.buildConstant(AmtTy, -Amt + NVTBits)); Hi = MIRBuilder.buildOr(NVT, OrLHS, OrRHS); } } else if (MI.getOpcode() == TargetOpcode::G_LSHR) { if (Amt.ugt(VTBits)) { Lo = Hi = MIRBuilder.buildConstant(NVT, 0); } else if (Amt.ugt(NVTBits)) { Lo = MIRBuilder.buildLShr(NVT, InH, MIRBuilder.buildConstant(AmtTy, Amt - NVTBits)); Hi = MIRBuilder.buildConstant(NVT, 0); } else if (Amt == NVTBits) { Lo = InH; Hi = MIRBuilder.buildConstant(NVT, 0); } else { auto ShiftAmtConst = MIRBuilder.buildConstant(AmtTy, Amt); auto OrLHS = MIRBuilder.buildLShr(NVT, InL, ShiftAmtConst); auto OrRHS = MIRBuilder.buildShl( NVT, InH, MIRBuilder.buildConstant(AmtTy, -Amt + NVTBits)); Lo = MIRBuilder.buildOr(NVT, OrLHS, OrRHS); Hi = MIRBuilder.buildLShr(NVT, InH, ShiftAmtConst); } } else { if (Amt.ugt(VTBits)) { Hi = Lo = MIRBuilder.buildAShr( NVT, InH, MIRBuilder.buildConstant(AmtTy, NVTBits - 1)); } else if (Amt.ugt(NVTBits)) { Lo = MIRBuilder.buildAShr(NVT, InH, MIRBuilder.buildConstant(AmtTy, Amt - NVTBits)); Hi = MIRBuilder.buildAShr(NVT, InH, MIRBuilder.buildConstant(AmtTy, NVTBits - 1)); } else if (Amt == NVTBits) { Lo = InH; Hi = MIRBuilder.buildAShr(NVT, InH, MIRBuilder.buildConstant(AmtTy, NVTBits - 1)); } else { auto ShiftAmtConst = MIRBuilder.buildConstant(AmtTy, Amt); auto OrLHS = MIRBuilder.buildLShr(NVT, InL, ShiftAmtConst); auto OrRHS = MIRBuilder.buildShl( NVT, InH, MIRBuilder.buildConstant(AmtTy, -Amt + NVTBits)); Lo = MIRBuilder.buildOr(NVT, OrLHS, OrRHS); Hi = MIRBuilder.buildAShr(NVT, InH, ShiftAmtConst); } } MIRBuilder.buildMerge(MI.getOperand(0), {Lo, Hi}); MI.eraseFromParent(); return Legalized; } // TODO: Optimize if constant shift amount. LegalizerHelper::LegalizeResult LegalizerHelper::narrowScalarShift(MachineInstr &MI, unsigned TypeIdx, LLT RequestedTy) { if (TypeIdx == 1) { Observer.changingInstr(MI); narrowScalarSrc(MI, RequestedTy, 2); Observer.changedInstr(MI); return Legalized; } Register DstReg = MI.getOperand(0).getReg(); LLT DstTy = MRI.getType(DstReg); if (DstTy.isVector()) return UnableToLegalize; Register Amt = MI.getOperand(2).getReg(); LLT ShiftAmtTy = MRI.getType(Amt); const unsigned DstEltSize = DstTy.getScalarSizeInBits(); if (DstEltSize % 2 != 0) return UnableToLegalize; // Ignore the input type. We can only go to exactly half the size of the // input. If that isn't small enough, the resulting pieces will be further // legalized. const unsigned NewBitSize = DstEltSize / 2; const LLT HalfTy = LLT::scalar(NewBitSize); const LLT CondTy = LLT::scalar(1); if (const MachineInstr *KShiftAmt = getOpcodeDef(TargetOpcode::G_CONSTANT, Amt, MRI)) { return narrowScalarShiftByConstant( MI, KShiftAmt->getOperand(1).getCImm()->getValue(), HalfTy, ShiftAmtTy); } // TODO: Expand with known bits. // Handle the fully general expansion by an unknown amount. 
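  // For instance, narrowing an s64 G_SHL to s32 halves computes, in C-like
  // pseudocode (illustration only, matching the selects built below):
  //   if (Amt == 0)      { Lo = InL;          Hi = InH; }
  //   else if (Amt < 32) { Lo = InL << Amt;   Hi = (InH << Amt) | (InL >> (32 - Amt)); }
  //   else               { Lo = 0;            Hi = InL << (Amt - 32); }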
  auto NewBits = MIRBuilder.buildConstant(ShiftAmtTy, NewBitSize);

  Register InL = MRI.createGenericVirtualRegister(HalfTy);
  Register InH = MRI.createGenericVirtualRegister(HalfTy);
  MIRBuilder.buildUnmerge({InL, InH}, MI.getOperand(1));

  auto AmtExcess = MIRBuilder.buildSub(ShiftAmtTy, Amt, NewBits);
  auto AmtLack = MIRBuilder.buildSub(ShiftAmtTy, NewBits, Amt);

  auto Zero = MIRBuilder.buildConstant(ShiftAmtTy, 0);
  auto IsShort = MIRBuilder.buildICmp(ICmpInst::ICMP_ULT, CondTy, Amt, NewBits);
  auto IsZero = MIRBuilder.buildICmp(ICmpInst::ICMP_EQ, CondTy, Amt, Zero);

  Register ResultRegs[2];
  switch (MI.getOpcode()) {
  case TargetOpcode::G_SHL: {
    // Short: ShAmt < NewBitSize
    auto LoS = MIRBuilder.buildShl(HalfTy, InL, Amt);

    auto LoOr = MIRBuilder.buildLShr(HalfTy, InL, AmtLack);
    auto HiOr = MIRBuilder.buildShl(HalfTy, InH, Amt);
    auto HiS = MIRBuilder.buildOr(HalfTy, LoOr, HiOr);

    // Long: ShAmt >= NewBitSize
    auto LoL = MIRBuilder.buildConstant(HalfTy, 0);         // Lo part is zero.
    auto HiL = MIRBuilder.buildShl(HalfTy, InL, AmtExcess); // Hi from Lo part.

    auto Lo = MIRBuilder.buildSelect(HalfTy, IsShort, LoS, LoL);
    auto Hi = MIRBuilder.buildSelect(
        HalfTy, IsZero, InH, MIRBuilder.buildSelect(HalfTy, IsShort, HiS, HiL));

    ResultRegs[0] = Lo.getReg(0);
    ResultRegs[1] = Hi.getReg(0);
    break;
  }
  case TargetOpcode::G_LSHR:
  case TargetOpcode::G_ASHR: {
    // Short: ShAmt < NewBitSize
    auto HiS = MIRBuilder.buildInstr(MI.getOpcode(), {HalfTy}, {InH, Amt});

    auto LoOr = MIRBuilder.buildLShr(HalfTy, InL, Amt);
    auto HiOr = MIRBuilder.buildShl(HalfTy, InH, AmtLack);
    auto LoS = MIRBuilder.buildOr(HalfTy, LoOr, HiOr);

    // Long: ShAmt >= NewBitSize
    MachineInstrBuilder HiL;
    if (MI.getOpcode() == TargetOpcode::G_LSHR) {
      HiL = MIRBuilder.buildConstant(HalfTy, 0);            // Hi part is zero.
    } else {
      auto ShiftAmt = MIRBuilder.buildConstant(ShiftAmtTy, NewBitSize - 1);
      HiL = MIRBuilder.buildAShr(HalfTy, InH, ShiftAmt);    // Sign of Hi part.
    }
    auto LoL = MIRBuilder.buildInstr(MI.getOpcode(), {HalfTy},
                                     {InH, AmtExcess});     // Lo from Hi part.
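    // Illustration for an s64 G_LSHR narrowed to s32 halves: for Amt < 32,
    // Lo = (InL >> Amt) | (InH << (32 - Amt)) and Hi = InH >> Amt; for
    // Amt >= 32, Lo = InH >> (Amt - 32) and Hi = 0 (for G_ASHR, Hi is the
    // sign-fill InH >> 31 instead); Amt == 0 is handled by the IsZero select.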
auto Lo = MIRBuilder.buildSelect( HalfTy, IsZero, InL, MIRBuilder.buildSelect(HalfTy, IsShort, LoS, LoL)); auto Hi = MIRBuilder.buildSelect(HalfTy, IsShort, HiS, HiL); ResultRegs[0] = Lo.getReg(0); ResultRegs[1] = Hi.getReg(0); break; } default: llvm_unreachable("not a shift"); } MIRBuilder.buildMerge(DstReg, ResultRegs); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::moreElementsVectorPhi(MachineInstr &MI, unsigned TypeIdx, LLT MoreTy) { assert(TypeIdx == 0 && "Expecting only Idx 0"); Observer.changingInstr(MI); for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) { MachineBasicBlock &OpMBB = *MI.getOperand(I + 1).getMBB(); MIRBuilder.setInsertPt(OpMBB, OpMBB.getFirstTerminator()); moreElementsVectorSrc(MI, MoreTy, I); } MachineBasicBlock &MBB = *MI.getParent(); MIRBuilder.setInsertPt(MBB, --MBB.getFirstNonPHI()); moreElementsVectorDst(MI, MoreTy, 0); Observer.changedInstr(MI); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::moreElementsVector(MachineInstr &MI, unsigned TypeIdx, LLT MoreTy) { unsigned Opc = MI.getOpcode(); switch (Opc) { case TargetOpcode::G_IMPLICIT_DEF: case TargetOpcode::G_LOAD: { if (TypeIdx != 0) return UnableToLegalize; Observer.changingInstr(MI); moreElementsVectorDst(MI, MoreTy, 0); Observer.changedInstr(MI); return Legalized; } case TargetOpcode::G_STORE: if (TypeIdx != 0) return UnableToLegalize; Observer.changingInstr(MI); moreElementsVectorSrc(MI, MoreTy, 0); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_AND: case TargetOpcode::G_OR: case TargetOpcode::G_XOR: case TargetOpcode::G_SMIN: case TargetOpcode::G_SMAX: case TargetOpcode::G_UMIN: case TargetOpcode::G_UMAX: case TargetOpcode::G_FMINNUM: case TargetOpcode::G_FMAXNUM: case TargetOpcode::G_FMINNUM_IEEE: case TargetOpcode::G_FMAXNUM_IEEE: case TargetOpcode::G_FMINIMUM: case TargetOpcode::G_FMAXIMUM: { Observer.changingInstr(MI); moreElementsVectorSrc(MI, MoreTy, 1); moreElementsVectorSrc(MI, MoreTy, 2); moreElementsVectorDst(MI, MoreTy, 0); Observer.changedInstr(MI); return Legalized; } case TargetOpcode::G_EXTRACT: if (TypeIdx != 1) return UnableToLegalize; Observer.changingInstr(MI); moreElementsVectorSrc(MI, MoreTy, 1); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_INSERT: case TargetOpcode::G_FREEZE: if (TypeIdx != 0) return UnableToLegalize; Observer.changingInstr(MI); moreElementsVectorSrc(MI, MoreTy, 1); moreElementsVectorDst(MI, MoreTy, 0); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_SELECT: if (TypeIdx != 0) return UnableToLegalize; if (MRI.getType(MI.getOperand(1).getReg()).isVector()) return UnableToLegalize; Observer.changingInstr(MI); moreElementsVectorSrc(MI, MoreTy, 2); moreElementsVectorSrc(MI, MoreTy, 3); moreElementsVectorDst(MI, MoreTy, 0); Observer.changedInstr(MI); return Legalized; case TargetOpcode::G_UNMERGE_VALUES: { if (TypeIdx != 1) return UnableToLegalize; LLT DstTy = MRI.getType(MI.getOperand(0).getReg()); int NumDst = MI.getNumOperands() - 1; moreElementsVectorSrc(MI, MoreTy, NumDst); auto MIB = MIRBuilder.buildInstr(TargetOpcode::G_UNMERGE_VALUES); for (int I = 0; I != NumDst; ++I) MIB.addDef(MI.getOperand(I).getReg()); int NewNumDst = MoreTy.getSizeInBits() / DstTy.getSizeInBits(); for (int I = NumDst; I != NewNumDst; ++I) MIB.addDef(MRI.createGenericVirtualRegister(DstTy)); MIB.addUse(MI.getOperand(NumDst).getReg()); MI.eraseFromParent(); return Legalized; } case TargetOpcode::G_PHI: return moreElementsVectorPhi(MI, TypeIdx, MoreTy); 
  default:
    return UnableToLegalize;
  }
}

void LegalizerHelper::multiplyRegisters(SmallVectorImpl<Register> &DstRegs,
                                        ArrayRef<Register> Src1Regs,
                                        ArrayRef<Register> Src2Regs,
                                        LLT NarrowTy) {
  MachineIRBuilder &B = MIRBuilder;
  unsigned SrcParts = Src1Regs.size();
  unsigned DstParts = DstRegs.size();

  unsigned DstIdx = 0; // Low bits of the result.
  Register FactorSum =
      B.buildMul(NarrowTy, Src1Regs[DstIdx], Src2Regs[DstIdx]).getReg(0);
  DstRegs[DstIdx] = FactorSum;

  unsigned CarrySumPrevDstIdx;
  SmallVector<Register, 4> Factors;
  for (DstIdx = 1; DstIdx < DstParts; DstIdx++) {
    // Collect low parts of muls for DstIdx.
    for (unsigned i = DstIdx + 1 < SrcParts ? 0 : DstIdx - SrcParts + 1;
         i <= std::min(DstIdx, SrcParts - 1); ++i) {
      MachineInstrBuilder Mul =
          B.buildMul(NarrowTy, Src1Regs[DstIdx - i], Src2Regs[i]);
      Factors.push_back(Mul.getReg(0));
    }
    // Collect high parts of muls from previous DstIdx.
    for (unsigned i = DstIdx < SrcParts ? 0 : DstIdx - SrcParts;
         i <= std::min(DstIdx - 1, SrcParts - 1); ++i) {
      MachineInstrBuilder Umulh =
          B.buildUMulH(NarrowTy, Src1Regs[DstIdx - 1 - i], Src2Regs[i]);
      Factors.push_back(Umulh.getReg(0));
    }
    // Add CarrySum from additions calculated for previous DstIdx.
    if (DstIdx != 1) {
      Factors.push_back(CarrySumPrevDstIdx);
    }

    Register CarrySum;
    // Add all factors and accumulate all carries into CarrySum.
    if (DstIdx != DstParts - 1) {
      MachineInstrBuilder Uaddo =
          B.buildUAddo(NarrowTy, LLT::scalar(1), Factors[0], Factors[1]);
      FactorSum = Uaddo.getReg(0);
      CarrySum = B.buildZExt(NarrowTy, Uaddo.getReg(1)).getReg(0);
      for (unsigned i = 2; i < Factors.size(); ++i) {
        MachineInstrBuilder Uaddo =
            B.buildUAddo(NarrowTy, LLT::scalar(1), FactorSum, Factors[i]);
        FactorSum = Uaddo.getReg(0);
        MachineInstrBuilder Carry = B.buildZExt(NarrowTy, Uaddo.getReg(1));
        CarrySum = B.buildAdd(NarrowTy, CarrySum, Carry).getReg(0);
      }
    } else {
      // Since value for the next index is not calculated, neither is CarrySum.
      FactorSum = B.buildAdd(NarrowTy, Factors[0], Factors[1]).getReg(0);
      for (unsigned i = 2; i < Factors.size(); ++i)
        FactorSum = B.buildAdd(NarrowTy, FactorSum, Factors[i]).getReg(0);
    }

    CarrySumPrevDstIdx = CarrySum;
    DstRegs[DstIdx] = FactorSum;
    Factors.clear();
  }
}

LegalizerHelper::LegalizeResult
LegalizerHelper::narrowScalarMul(MachineInstr &MI, LLT NarrowTy) {
  Register DstReg = MI.getOperand(0).getReg();
  Register Src1 = MI.getOperand(1).getReg();
  Register Src2 = MI.getOperand(2).getReg();

  LLT Ty = MRI.getType(DstReg);
  if (Ty.isVector())
    return UnableToLegalize;

  unsigned SrcSize = MRI.getType(Src1).getSizeInBits();
  unsigned DstSize = Ty.getSizeInBits();
  unsigned NarrowSize = NarrowTy.getSizeInBits();
  if (DstSize % NarrowSize != 0 || SrcSize % NarrowSize != 0)
    return UnableToLegalize;

  unsigned NumDstParts = DstSize / NarrowSize;
  unsigned NumSrcParts = SrcSize / NarrowSize;
  bool IsMulHigh = MI.getOpcode() == TargetOpcode::G_UMULH;
  unsigned DstTmpParts = NumDstParts * (IsMulHigh ? 2 : 1);

  SmallVector<Register, 2> Src1Parts, Src2Parts;
  SmallVector<Register, 2> DstTmpRegs(DstTmpParts);
  extractParts(Src1, NarrowTy, NumSrcParts, Src1Parts);
  extractParts(Src2, NarrowTy, NumSrcParts, Src2Parts);
  multiplyRegisters(DstTmpRegs, Src1Parts, Src2Parts, NarrowTy);

  // Take only high half of registers if this is high mul.
  ArrayRef<Register> DstRegs(IsMulHigh ?
&DstTmpRegs[DstTmpParts / 2] : &DstTmpRegs[0], NumDstParts); MIRBuilder.buildMerge(DstReg, DstRegs); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::narrowScalarExtract(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { if (TypeIdx != 1) return UnableToLegalize; uint64_t NarrowSize = NarrowTy.getSizeInBits(); int64_t SizeOp1 = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits(); // FIXME: add support for when SizeOp1 isn't an exact multiple of // NarrowSize. if (SizeOp1 % NarrowSize != 0) return UnableToLegalize; int NumParts = SizeOp1 / NarrowSize; SmallVector SrcRegs, DstRegs; SmallVector Indexes; extractParts(MI.getOperand(1).getReg(), NarrowTy, NumParts, SrcRegs); Register OpReg = MI.getOperand(0).getReg(); uint64_t OpStart = MI.getOperand(2).getImm(); uint64_t OpSize = MRI.getType(OpReg).getSizeInBits(); for (int i = 0; i < NumParts; ++i) { unsigned SrcStart = i * NarrowSize; if (SrcStart + NarrowSize <= OpStart || SrcStart >= OpStart + OpSize) { // No part of the extract uses this subregister, ignore it. continue; } else if (SrcStart == OpStart && NarrowTy == MRI.getType(OpReg)) { // The entire subregister is extracted, forward the value. DstRegs.push_back(SrcRegs[i]); continue; } // OpSegStart is where this destination segment would start in OpReg if it // extended infinitely in both directions. int64_t ExtractOffset; uint64_t SegSize; if (OpStart < SrcStart) { ExtractOffset = 0; SegSize = std::min(NarrowSize, OpStart + OpSize - SrcStart); } else { ExtractOffset = OpStart - SrcStart; SegSize = std::min(SrcStart + NarrowSize - OpStart, OpSize); } Register SegReg = SrcRegs[i]; if (ExtractOffset != 0 || SegSize != NarrowSize) { // A genuine extract is needed. SegReg = MRI.createGenericVirtualRegister(LLT::scalar(SegSize)); MIRBuilder.buildExtract(SegReg, SrcRegs[i], ExtractOffset); } DstRegs.push_back(SegReg); } Register DstReg = MI.getOperand(0).getReg(); if (MRI.getType(DstReg).isVector()) MIRBuilder.buildBuildVector(DstReg, DstRegs); else if (DstRegs.size() > 1) MIRBuilder.buildMerge(DstReg, DstRegs); else MIRBuilder.buildCopy(DstReg, DstRegs[0]); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::narrowScalarInsert(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { // FIXME: Don't know how to handle secondary types yet. if (TypeIdx != 0) return UnableToLegalize; uint64_t SizeOp0 = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits(); uint64_t NarrowSize = NarrowTy.getSizeInBits(); // FIXME: add support for when SizeOp0 isn't an exact multiple of // NarrowSize. if (SizeOp0 % NarrowSize != 0) return UnableToLegalize; int NumParts = SizeOp0 / NarrowSize; SmallVector SrcRegs, DstRegs; SmallVector Indexes; extractParts(MI.getOperand(1).getReg(), NarrowTy, NumParts, SrcRegs); Register OpReg = MI.getOperand(2).getReg(); uint64_t OpStart = MI.getOperand(3).getImm(); uint64_t OpSize = MRI.getType(OpReg).getSizeInBits(); for (int i = 0; i < NumParts; ++i) { unsigned DstStart = i * NarrowSize; if (DstStart + NarrowSize <= OpStart || DstStart >= OpStart + OpSize) { // No part of the insert affects this subregister, forward the original. DstRegs.push_back(SrcRegs[i]); continue; } else if (DstStart == OpStart && NarrowTy == MRI.getType(OpReg)) { // The entire subregister is defined by this insert, forward the new // value. DstRegs.push_back(OpReg); continue; } // OpSegStart is where this destination segment would start in OpReg if it // extended infinitely in both directions. 
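    // Worked example (numbers are illustrative only): narrowing
    //   %d:_(s64) = G_INSERT %src, %op(s16), 24
    // with NarrowTy = s32 straddles the two halves: part 0 receives bits
    // [0,8) of %op inserted at bit 24, and part 1 receives bits [8,16) of
    // %op inserted at bit 0, each via an s8 G_EXTRACT of %op computed below.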
int64_t ExtractOffset, InsertOffset; uint64_t SegSize; if (OpStart < DstStart) { InsertOffset = 0; ExtractOffset = DstStart - OpStart; SegSize = std::min(NarrowSize, OpStart + OpSize - DstStart); } else { InsertOffset = OpStart - DstStart; ExtractOffset = 0; SegSize = std::min(NarrowSize - InsertOffset, OpStart + OpSize - DstStart); } Register SegReg = OpReg; if (ExtractOffset != 0 || SegSize != OpSize) { // A genuine extract is needed. SegReg = MRI.createGenericVirtualRegister(LLT::scalar(SegSize)); MIRBuilder.buildExtract(SegReg, OpReg, ExtractOffset); } Register DstReg = MRI.createGenericVirtualRegister(NarrowTy); MIRBuilder.buildInsert(DstReg, SrcRegs[i], SegReg, InsertOffset); DstRegs.push_back(DstReg); } assert(DstRegs.size() == (unsigned)NumParts && "not all parts covered"); Register DstReg = MI.getOperand(0).getReg(); if(MRI.getType(DstReg).isVector()) MIRBuilder.buildBuildVector(DstReg, DstRegs); else MIRBuilder.buildMerge(DstReg, DstRegs); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::narrowScalarBasic(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { Register DstReg = MI.getOperand(0).getReg(); LLT DstTy = MRI.getType(DstReg); assert(MI.getNumOperands() == 3 && TypeIdx == 0); SmallVector DstRegs, DstLeftoverRegs; SmallVector Src0Regs, Src0LeftoverRegs; SmallVector Src1Regs, Src1LeftoverRegs; LLT LeftoverTy; if (!extractParts(MI.getOperand(1).getReg(), DstTy, NarrowTy, LeftoverTy, Src0Regs, Src0LeftoverRegs)) return UnableToLegalize; LLT Unused; if (!extractParts(MI.getOperand(2).getReg(), DstTy, NarrowTy, Unused, Src1Regs, Src1LeftoverRegs)) llvm_unreachable("inconsistent extractParts result"); for (unsigned I = 0, E = Src1Regs.size(); I != E; ++I) { auto Inst = MIRBuilder.buildInstr(MI.getOpcode(), {NarrowTy}, {Src0Regs[I], Src1Regs[I]}); DstRegs.push_back(Inst.getReg(0)); } for (unsigned I = 0, E = Src1LeftoverRegs.size(); I != E; ++I) { auto Inst = MIRBuilder.buildInstr( MI.getOpcode(), {LeftoverTy}, {Src0LeftoverRegs[I], Src1LeftoverRegs[I]}); DstLeftoverRegs.push_back(Inst.getReg(0)); } insertParts(DstReg, DstTy, NarrowTy, DstRegs, LeftoverTy, DstLeftoverRegs); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::narrowScalarExt(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { if (TypeIdx != 0) return UnableToLegalize; Register DstReg = MI.getOperand(0).getReg(); Register SrcReg = MI.getOperand(1).getReg(); LLT DstTy = MRI.getType(DstReg); if (DstTy.isVector()) return UnableToLegalize; SmallVector Parts; LLT GCDTy = extractGCDType(Parts, DstTy, NarrowTy, SrcReg); LLT LCMTy = buildLCMMergePieces(DstTy, NarrowTy, GCDTy, Parts, MI.getOpcode()); buildWidenedRemergeToDst(DstReg, LCMTy, Parts); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::narrowScalarSelect(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { if (TypeIdx != 0) return UnableToLegalize; Register CondReg = MI.getOperand(1).getReg(); LLT CondTy = MRI.getType(CondReg); if (CondTy.isVector()) // TODO: Handle vselect return UnableToLegalize; Register DstReg = MI.getOperand(0).getReg(); LLT DstTy = MRI.getType(DstReg); SmallVector DstRegs, DstLeftoverRegs; SmallVector Src1Regs, Src1LeftoverRegs; SmallVector Src2Regs, Src2LeftoverRegs; LLT LeftoverTy; if (!extractParts(MI.getOperand(2).getReg(), DstTy, NarrowTy, LeftoverTy, Src1Regs, Src1LeftoverRegs)) return UnableToLegalize; LLT Unused; if (!extractParts(MI.getOperand(3).getReg(), DstTy, NarrowTy, Unused, Src2Regs, Src2LeftoverRegs)) 
llvm_unreachable("inconsistent extractParts result"); for (unsigned I = 0, E = Src1Regs.size(); I != E; ++I) { auto Select = MIRBuilder.buildSelect(NarrowTy, CondReg, Src1Regs[I], Src2Regs[I]); DstRegs.push_back(Select.getReg(0)); } for (unsigned I = 0, E = Src1LeftoverRegs.size(); I != E; ++I) { auto Select = MIRBuilder.buildSelect( LeftoverTy, CondReg, Src1LeftoverRegs[I], Src2LeftoverRegs[I]); DstLeftoverRegs.push_back(Select.getReg(0)); } insertParts(DstReg, DstTy, NarrowTy, DstRegs, LeftoverTy, DstLeftoverRegs); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::narrowScalarCTLZ(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { if (TypeIdx != 1) return UnableToLegalize; Register DstReg = MI.getOperand(0).getReg(); Register SrcReg = MI.getOperand(1).getReg(); LLT DstTy = MRI.getType(DstReg); LLT SrcTy = MRI.getType(SrcReg); unsigned NarrowSize = NarrowTy.getSizeInBits(); if (SrcTy.isScalar() && SrcTy.getSizeInBits() == 2 * NarrowSize) { const bool IsUndef = MI.getOpcode() == TargetOpcode::G_CTLZ_ZERO_UNDEF; MachineIRBuilder &B = MIRBuilder; auto UnmergeSrc = B.buildUnmerge(NarrowTy, SrcReg); // ctlz(Hi:Lo) -> Hi == 0 ? (NarrowSize + ctlz(Lo)) : ctlz(Hi) auto C_0 = B.buildConstant(NarrowTy, 0); auto HiIsZero = B.buildICmp(CmpInst::ICMP_EQ, LLT::scalar(1), UnmergeSrc.getReg(1), C_0); auto LoCTLZ = IsUndef ? B.buildCTLZ_ZERO_UNDEF(DstTy, UnmergeSrc.getReg(0)) : B.buildCTLZ(DstTy, UnmergeSrc.getReg(0)); auto C_NarrowSize = B.buildConstant(DstTy, NarrowSize); auto HiIsZeroCTLZ = B.buildAdd(DstTy, LoCTLZ, C_NarrowSize); auto HiCTLZ = B.buildCTLZ_ZERO_UNDEF(DstTy, UnmergeSrc.getReg(1)); B.buildSelect(DstReg, HiIsZero, HiIsZeroCTLZ, HiCTLZ); MI.eraseFromParent(); return Legalized; } return UnableToLegalize; } LegalizerHelper::LegalizeResult LegalizerHelper::narrowScalarCTTZ(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { if (TypeIdx != 1) return UnableToLegalize; Register DstReg = MI.getOperand(0).getReg(); Register SrcReg = MI.getOperand(1).getReg(); LLT DstTy = MRI.getType(DstReg); LLT SrcTy = MRI.getType(SrcReg); unsigned NarrowSize = NarrowTy.getSizeInBits(); if (SrcTy.isScalar() && SrcTy.getSizeInBits() == 2 * NarrowSize) { const bool IsUndef = MI.getOpcode() == TargetOpcode::G_CTTZ_ZERO_UNDEF; MachineIRBuilder &B = MIRBuilder; auto UnmergeSrc = B.buildUnmerge(NarrowTy, SrcReg); // cttz(Hi:Lo) -> Lo == 0 ? (cttz(Hi) + NarrowSize) : cttz(Lo) auto C_0 = B.buildConstant(NarrowTy, 0); auto LoIsZero = B.buildICmp(CmpInst::ICMP_EQ, LLT::scalar(1), UnmergeSrc.getReg(0), C_0); auto HiCTTZ = IsUndef ? 
B.buildCTTZ_ZERO_UNDEF(DstTy, UnmergeSrc.getReg(1)) : B.buildCTTZ(DstTy, UnmergeSrc.getReg(1)); auto C_NarrowSize = B.buildConstant(DstTy, NarrowSize); auto LoIsZeroCTTZ = B.buildAdd(DstTy, HiCTTZ, C_NarrowSize); auto LoCTTZ = B.buildCTTZ_ZERO_UNDEF(DstTy, UnmergeSrc.getReg(0)); B.buildSelect(DstReg, LoIsZero, LoIsZeroCTTZ, LoCTTZ); MI.eraseFromParent(); return Legalized; } return UnableToLegalize; } LegalizerHelper::LegalizeResult LegalizerHelper::narrowScalarCTPOP(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy) { if (TypeIdx != 1) return UnableToLegalize; Register DstReg = MI.getOperand(0).getReg(); LLT DstTy = MRI.getType(DstReg); LLT SrcTy = MRI.getType(MI.getOperand(1).getReg()); unsigned NarrowSize = NarrowTy.getSizeInBits(); if (SrcTy.isScalar() && SrcTy.getSizeInBits() == 2 * NarrowSize) { auto UnmergeSrc = MIRBuilder.buildUnmerge(NarrowTy, MI.getOperand(1)); auto LoCTPOP = MIRBuilder.buildCTPOP(DstTy, UnmergeSrc.getReg(0)); auto HiCTPOP = MIRBuilder.buildCTPOP(DstTy, UnmergeSrc.getReg(1)); MIRBuilder.buildAdd(DstReg, HiCTPOP, LoCTPOP); MI.eraseFromParent(); return Legalized; } return UnableToLegalize; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerBitCount(MachineInstr &MI, unsigned TypeIdx, LLT Ty) { unsigned Opc = MI.getOpcode(); auto &TII = *MI.getMF()->getSubtarget().getInstrInfo(); auto isSupported = [this](const LegalityQuery &Q) { auto QAction = LI.getAction(Q).Action; return QAction == Legal || QAction == Libcall || QAction == Custom; }; switch (Opc) { default: return UnableToLegalize; case TargetOpcode::G_CTLZ_ZERO_UNDEF: { // This trivially expands to CTLZ. Observer.changingInstr(MI); MI.setDesc(TII.get(TargetOpcode::G_CTLZ)); Observer.changedInstr(MI); return Legalized; } case TargetOpcode::G_CTLZ: { Register DstReg = MI.getOperand(0).getReg(); Register SrcReg = MI.getOperand(1).getReg(); LLT DstTy = MRI.getType(DstReg); LLT SrcTy = MRI.getType(SrcReg); unsigned Len = SrcTy.getSizeInBits(); if (isSupported({TargetOpcode::G_CTLZ_ZERO_UNDEF, {DstTy, SrcTy}})) { // If CTLZ_ZERO_UNDEF is supported, emit that and a select for zero. auto CtlzZU = MIRBuilder.buildCTLZ_ZERO_UNDEF(DstTy, SrcReg); auto ZeroSrc = MIRBuilder.buildConstant(SrcTy, 0); auto ICmp = MIRBuilder.buildICmp( CmpInst::ICMP_EQ, SrcTy.changeElementSize(1), SrcReg, ZeroSrc); auto LenConst = MIRBuilder.buildConstant(DstTy, Len); MIRBuilder.buildSelect(DstReg, ICmp, LenConst, CtlzZU); MI.eraseFromParent(); return Legalized; } // for now, we do this: // NewLen = NextPowerOf2(Len); // x = x | (x >> 1); // x = x | (x >> 2); // ... // x = x | (x >>16); // x = x | (x >>32); // for 64-bit input // Upto NewLen/2 // return Len - popcount(x); // // Ref: "Hacker's Delight" by Henry Warren Register Op = SrcReg; unsigned NewLen = PowerOf2Ceil(Len); for (unsigned i = 0; (1U << i) <= (NewLen / 2); ++i) { auto MIBShiftAmt = MIRBuilder.buildConstant(SrcTy, 1ULL << i); auto MIBOp = MIRBuilder.buildOr( SrcTy, Op, MIRBuilder.buildLShr(SrcTy, Op, MIBShiftAmt)); Op = MIBOp.getReg(0); } auto MIBPop = MIRBuilder.buildCTPOP(DstTy, Op); MIRBuilder.buildSub(MI.getOperand(0), MIRBuilder.buildConstant(DstTy, Len), MIBPop); MI.eraseFromParent(); return Legalized; } case TargetOpcode::G_CTTZ_ZERO_UNDEF: { // This trivially expands to CTTZ. 
Observer.changingInstr(MI); MI.setDesc(TII.get(TargetOpcode::G_CTTZ)); Observer.changedInstr(MI); return Legalized; } case TargetOpcode::G_CTTZ: { Register DstReg = MI.getOperand(0).getReg(); Register SrcReg = MI.getOperand(1).getReg(); LLT DstTy = MRI.getType(DstReg); LLT SrcTy = MRI.getType(SrcReg); unsigned Len = SrcTy.getSizeInBits(); if (isSupported({TargetOpcode::G_CTTZ_ZERO_UNDEF, {DstTy, SrcTy}})) { // If CTTZ_ZERO_UNDEF is legal or custom, emit that and a select with // zero. auto CttzZU = MIRBuilder.buildCTTZ_ZERO_UNDEF(DstTy, SrcReg); auto Zero = MIRBuilder.buildConstant(SrcTy, 0); auto ICmp = MIRBuilder.buildICmp( CmpInst::ICMP_EQ, DstTy.changeElementSize(1), SrcReg, Zero); auto LenConst = MIRBuilder.buildConstant(DstTy, Len); MIRBuilder.buildSelect(DstReg, ICmp, LenConst, CttzZU); MI.eraseFromParent(); return Legalized; } // for now, we use: { return popcount(~x & (x - 1)); } // unless the target has ctlz but not ctpop, in which case we use: // { return 32 - nlz(~x & (x-1)); } // Ref: "Hacker's Delight" by Henry Warren auto MIBCstNeg1 = MIRBuilder.buildConstant(Ty, -1); auto MIBNot = MIRBuilder.buildXor(Ty, SrcReg, MIBCstNeg1); auto MIBTmp = MIRBuilder.buildAnd( Ty, MIBNot, MIRBuilder.buildAdd(Ty, SrcReg, MIBCstNeg1)); if (!isSupported({TargetOpcode::G_CTPOP, {Ty, Ty}}) && isSupported({TargetOpcode::G_CTLZ, {Ty, Ty}})) { auto MIBCstLen = MIRBuilder.buildConstant(Ty, Len); MIRBuilder.buildSub(MI.getOperand(0), MIBCstLen, MIRBuilder.buildCTLZ(Ty, MIBTmp)); MI.eraseFromParent(); return Legalized; } MI.setDesc(TII.get(TargetOpcode::G_CTPOP)); MI.getOperand(1).setReg(MIBTmp.getReg(0)); return Legalized; } case TargetOpcode::G_CTPOP: { unsigned Size = Ty.getSizeInBits(); MachineIRBuilder &B = MIRBuilder; // Count set bits in blocks of 2 bits. Default approach would be // B2Count = { val & 0x55555555 } + { (val >> 1) & 0x55555555 } // We use following formula instead: // B2Count = val - { (val >> 1) & 0x55555555 } // since it gives same result in blocks of 2 with one instruction less. auto C_1 = B.buildConstant(Ty, 1); auto B2Set1LoTo1Hi = B.buildLShr(Ty, MI.getOperand(1).getReg(), C_1); APInt B2Mask1HiTo0 = APInt::getSplat(Size, APInt(8, 0x55)); auto C_B2Mask1HiTo0 = B.buildConstant(Ty, B2Mask1HiTo0); auto B2Count1Hi = B.buildAnd(Ty, B2Set1LoTo1Hi, C_B2Mask1HiTo0); auto B2Count = B.buildSub(Ty, MI.getOperand(1).getReg(), B2Count1Hi); // In order to get count in blocks of 4 add values from adjacent block of 2. // B4Count = { B2Count & 0x33333333 } + { (B2Count >> 2) & 0x33333333 } auto C_2 = B.buildConstant(Ty, 2); auto B4Set2LoTo2Hi = B.buildLShr(Ty, B2Count, C_2); APInt B4Mask2HiTo0 = APInt::getSplat(Size, APInt(8, 0x33)); auto C_B4Mask2HiTo0 = B.buildConstant(Ty, B4Mask2HiTo0); auto B4HiB2Count = B.buildAnd(Ty, B4Set2LoTo2Hi, C_B4Mask2HiTo0); auto B4LoB2Count = B.buildAnd(Ty, B2Count, C_B4Mask2HiTo0); auto B4Count = B.buildAdd(Ty, B4HiB2Count, B4LoB2Count); // For count in blocks of 8 bits we don't have to mask high 4 bits before // addition since count value sits in range {0,...,8} and 4 bits are enough // to hold such binary values. After addition high 4 bits still hold count // of set bits in high 4 bit block, set them to zero and get 8 bit result. 
// B8Count = { B4Count + (B4Count >> 4) } & 0x0F0F0F0F auto C_4 = B.buildConstant(Ty, 4); auto B8HiB4Count = B.buildLShr(Ty, B4Count, C_4); auto B8CountDirty4Hi = B.buildAdd(Ty, B8HiB4Count, B4Count); APInt B8Mask4HiTo0 = APInt::getSplat(Size, APInt(8, 0x0F)); auto C_B8Mask4HiTo0 = B.buildConstant(Ty, B8Mask4HiTo0); auto B8Count = B.buildAnd(Ty, B8CountDirty4Hi, C_B8Mask4HiTo0); assert(Size<=128 && "Scalar size is too large for CTPOP lower algorithm"); // 8 bits can hold CTPOP result of 128 bit int or smaller. Mul with this // bitmask will set 8 msb in ResTmp to sum of all B8Counts in 8 bit blocks. auto MulMask = B.buildConstant(Ty, APInt::getSplat(Size, APInt(8, 0x01))); auto ResTmp = B.buildMul(Ty, B8Count, MulMask); // Shift count result from 8 high bits to low bits. auto C_SizeM8 = B.buildConstant(Ty, Size - 8); B.buildLShr(MI.getOperand(0).getReg(), ResTmp, C_SizeM8); MI.eraseFromParent(); return Legalized; } } } // Expand s32 = G_UITOFP s64 using bit operations to an IEEE float // representation. LegalizerHelper::LegalizeResult LegalizerHelper::lowerU64ToF32BitOps(MachineInstr &MI) { Register Dst = MI.getOperand(0).getReg(); Register Src = MI.getOperand(1).getReg(); const LLT S64 = LLT::scalar(64); const LLT S32 = LLT::scalar(32); const LLT S1 = LLT::scalar(1); assert(MRI.getType(Src) == S64 && MRI.getType(Dst) == S32); // unsigned cul2f(ulong u) { // uint lz = clz(u); // uint e = (u != 0) ? 127U + 63U - lz : 0; // u = (u << lz) & 0x7fffffffffffffffUL; // ulong t = u & 0xffffffffffUL; // uint v = (e << 23) | (uint)(u >> 40); // uint r = t > 0x8000000000UL ? 1U : (t == 0x8000000000UL ? v & 1U : 0U); // return as_float(v + r); // } auto Zero32 = MIRBuilder.buildConstant(S32, 0); auto Zero64 = MIRBuilder.buildConstant(S64, 0); auto LZ = MIRBuilder.buildCTLZ_ZERO_UNDEF(S32, Src); auto K = MIRBuilder.buildConstant(S32, 127U + 63U); auto Sub = MIRBuilder.buildSub(S32, K, LZ); auto NotZero = MIRBuilder.buildICmp(CmpInst::ICMP_NE, S1, Src, Zero64); auto E = MIRBuilder.buildSelect(S32, NotZero, Sub, Zero32); auto Mask0 = MIRBuilder.buildConstant(S64, (-1ULL) >> 1); auto ShlLZ = MIRBuilder.buildShl(S64, Src, LZ); auto U = MIRBuilder.buildAnd(S64, ShlLZ, Mask0); auto Mask1 = MIRBuilder.buildConstant(S64, 0xffffffffffULL); auto T = MIRBuilder.buildAnd(S64, U, Mask1); auto UShl = MIRBuilder.buildLShr(S64, U, MIRBuilder.buildConstant(S64, 40)); auto ShlE = MIRBuilder.buildShl(S32, E, MIRBuilder.buildConstant(S32, 23)); auto V = MIRBuilder.buildOr(S32, ShlE, MIRBuilder.buildTrunc(S32, UShl)); auto C = MIRBuilder.buildConstant(S64, 0x8000000000ULL); auto RCmp = MIRBuilder.buildICmp(CmpInst::ICMP_UGT, S1, T, C); auto TCmp = MIRBuilder.buildICmp(CmpInst::ICMP_EQ, S1, T, C); auto One = MIRBuilder.buildConstant(S32, 1); auto VTrunc1 = MIRBuilder.buildAnd(S32, V, One); auto Select0 = MIRBuilder.buildSelect(S32, TCmp, VTrunc1, Zero32); auto R = MIRBuilder.buildSelect(S32, RCmp, One, Select0); MIRBuilder.buildAdd(Dst, V, R); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerUITOFP(MachineInstr &MI, unsigned TypeIdx, LLT Ty) { Register Dst = MI.getOperand(0).getReg(); Register Src = MI.getOperand(1).getReg(); LLT DstTy = MRI.getType(Dst); LLT SrcTy = MRI.getType(Src); if (SrcTy == LLT::scalar(1)) { auto True = MIRBuilder.buildFConstant(DstTy, 1.0); auto False = MIRBuilder.buildFConstant(DstTy, 0.0); MIRBuilder.buildSelect(Dst, Src, True, False); MI.eraseFromParent(); return Legalized; } if (SrcTy != LLT::scalar(64)) return UnableToLegalize; if (DstTy == 
LLT::scalar(32)) { // TODO: SelectionDAG has several alternative expansions to port which may // be more reasonble depending on the available instructions. If a target // has sitofp, does not have CTLZ, or can efficiently use f64 as an // intermediate type, this is probably worse. return lowerU64ToF32BitOps(MI); } return UnableToLegalize; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerSITOFP(MachineInstr &MI, unsigned TypeIdx, LLT Ty) { Register Dst = MI.getOperand(0).getReg(); Register Src = MI.getOperand(1).getReg(); LLT DstTy = MRI.getType(Dst); LLT SrcTy = MRI.getType(Src); const LLT S64 = LLT::scalar(64); const LLT S32 = LLT::scalar(32); const LLT S1 = LLT::scalar(1); if (SrcTy == S1) { auto True = MIRBuilder.buildFConstant(DstTy, -1.0); auto False = MIRBuilder.buildFConstant(DstTy, 0.0); MIRBuilder.buildSelect(Dst, Src, True, False); MI.eraseFromParent(); return Legalized; } if (SrcTy != S64) return UnableToLegalize; if (DstTy == S32) { // signed cl2f(long l) { // long s = l >> 63; // float r = cul2f((l + s) ^ s); // return s ? -r : r; // } Register L = Src; auto SignBit = MIRBuilder.buildConstant(S64, 63); auto S = MIRBuilder.buildAShr(S64, L, SignBit); auto LPlusS = MIRBuilder.buildAdd(S64, L, S); auto Xor = MIRBuilder.buildXor(S64, LPlusS, S); auto R = MIRBuilder.buildUITOFP(S32, Xor); auto RNeg = MIRBuilder.buildFNeg(S32, R); auto SignNotZero = MIRBuilder.buildICmp(CmpInst::ICMP_NE, S1, S, MIRBuilder.buildConstant(S64, 0)); MIRBuilder.buildSelect(Dst, SignNotZero, RNeg, R); MI.eraseFromParent(); return Legalized; } return UnableToLegalize; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerFPTOUI(MachineInstr &MI, unsigned TypeIdx, LLT Ty) { Register Dst = MI.getOperand(0).getReg(); Register Src = MI.getOperand(1).getReg(); LLT DstTy = MRI.getType(Dst); LLT SrcTy = MRI.getType(Src); const LLT S64 = LLT::scalar(64); const LLT S32 = LLT::scalar(32); if (SrcTy != S64 && SrcTy != S32) return UnableToLegalize; if (DstTy != S32 && DstTy != S64) return UnableToLegalize; // FPTOSI gives same result as FPTOUI for positive signed integers. // FPTOUI needs to deal with fp values that convert to unsigned integers // greater or equal to 2^31 for float or 2^63 for double. For brevity 2^Exp. APInt TwoPExpInt = APInt::getSignMask(DstTy.getSizeInBits()); APFloat TwoPExpFP(SrcTy.getSizeInBits() == 32 ? APFloat::IEEEsingle() : APFloat::IEEEdouble(), APInt::getNullValue(SrcTy.getSizeInBits())); TwoPExpFP.convertFromAPInt(TwoPExpInt, false, APFloat::rmNearestTiesToEven); MachineInstrBuilder FPTOSI = MIRBuilder.buildFPTOSI(DstTy, Src); MachineInstrBuilder Threshold = MIRBuilder.buildFConstant(SrcTy, TwoPExpFP); // For fp Value greater or equal to Threshold(2^Exp), we use FPTOSI on // (Value - 2^Exp) and add 2^Exp by setting highest bit in result to 1. 
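  // Sketch for the f32 -> i32 case (values shown for illustration): Threshold
  // is 2^31; when Src < 2^31 the plain FPTOSI result is used, otherwise the
  // result is FPTOSI(Src - 2^31) ^ 0x80000000, i.e. the top bit re-adds 2^31.
  // The FCMP_ULT below also routes NaNs to the plain FPTOSI path.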
MachineInstrBuilder FSub = MIRBuilder.buildFSub(SrcTy, Src, Threshold); MachineInstrBuilder ResLowBits = MIRBuilder.buildFPTOSI(DstTy, FSub); MachineInstrBuilder ResHighBit = MIRBuilder.buildConstant(DstTy, TwoPExpInt); MachineInstrBuilder Res = MIRBuilder.buildXor(DstTy, ResLowBits, ResHighBit); const LLT S1 = LLT::scalar(1); MachineInstrBuilder FCMP = MIRBuilder.buildFCmp(CmpInst::FCMP_ULT, S1, Src, Threshold); MIRBuilder.buildSelect(Dst, FCMP, FPTOSI, Res); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerFPTOSI(MachineInstr &MI) { Register Dst = MI.getOperand(0).getReg(); Register Src = MI.getOperand(1).getReg(); LLT DstTy = MRI.getType(Dst); LLT SrcTy = MRI.getType(Src); const LLT S64 = LLT::scalar(64); const LLT S32 = LLT::scalar(32); // FIXME: Only f32 to i64 conversions are supported. if (SrcTy.getScalarType() != S32 || DstTy.getScalarType() != S64) return UnableToLegalize; // Expand f32 -> i64 conversion // This algorithm comes from compiler-rt's implementation of fixsfdi: // https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/builtins/fixsfdi.c unsigned SrcEltBits = SrcTy.getScalarSizeInBits(); auto ExponentMask = MIRBuilder.buildConstant(SrcTy, 0x7F800000); auto ExponentLoBit = MIRBuilder.buildConstant(SrcTy, 23); auto AndExpMask = MIRBuilder.buildAnd(SrcTy, Src, ExponentMask); auto ExponentBits = MIRBuilder.buildLShr(SrcTy, AndExpMask, ExponentLoBit); auto SignMask = MIRBuilder.buildConstant(SrcTy, APInt::getSignMask(SrcEltBits)); auto AndSignMask = MIRBuilder.buildAnd(SrcTy, Src, SignMask); auto SignLowBit = MIRBuilder.buildConstant(SrcTy, SrcEltBits - 1); auto Sign = MIRBuilder.buildAShr(SrcTy, AndSignMask, SignLowBit); Sign = MIRBuilder.buildSExt(DstTy, Sign); auto MantissaMask = MIRBuilder.buildConstant(SrcTy, 0x007FFFFF); auto AndMantissaMask = MIRBuilder.buildAnd(SrcTy, Src, MantissaMask); auto K = MIRBuilder.buildConstant(SrcTy, 0x00800000); auto R = MIRBuilder.buildOr(SrcTy, AndMantissaMask, K); R = MIRBuilder.buildZExt(DstTy, R); auto Bias = MIRBuilder.buildConstant(SrcTy, 127); auto Exponent = MIRBuilder.buildSub(SrcTy, ExponentBits, Bias); auto SubExponent = MIRBuilder.buildSub(SrcTy, Exponent, ExponentLoBit); auto ExponentSub = MIRBuilder.buildSub(SrcTy, ExponentLoBit, Exponent); auto Shl = MIRBuilder.buildShl(DstTy, R, SubExponent); auto Srl = MIRBuilder.buildLShr(DstTy, R, ExponentSub); const LLT S1 = LLT::scalar(1); auto CmpGt = MIRBuilder.buildICmp(CmpInst::ICMP_SGT, S1, Exponent, ExponentLoBit); R = MIRBuilder.buildSelect(DstTy, CmpGt, Shl, Srl); auto XorSign = MIRBuilder.buildXor(DstTy, R, Sign); auto Ret = MIRBuilder.buildSub(DstTy, XorSign, Sign); auto ZeroSrcTy = MIRBuilder.buildConstant(SrcTy, 0); auto ExponentLt0 = MIRBuilder.buildICmp(CmpInst::ICMP_SLT, S1, Exponent, ZeroSrcTy); auto ZeroDstTy = MIRBuilder.buildConstant(DstTy, 0); MIRBuilder.buildSelect(Dst, ExponentLt0, ZeroDstTy, Ret); MI.eraseFromParent(); return Legalized; } // f64 -> f16 conversion using round-to-nearest-even rounding mode. LegalizerHelper::LegalizeResult LegalizerHelper::lowerFPTRUNC_F64_TO_F16(MachineInstr &MI) { Register Dst = MI.getOperand(0).getReg(); Register Src = MI.getOperand(1).getReg(); if (MRI.getType(Src).isVector()) // TODO: Handle vectors directly. 
return UnableToLegalize; const unsigned ExpMask = 0x7ff; const unsigned ExpBiasf64 = 1023; const unsigned ExpBiasf16 = 15; const LLT S32 = LLT::scalar(32); const LLT S1 = LLT::scalar(1); auto Unmerge = MIRBuilder.buildUnmerge(S32, Src); Register U = Unmerge.getReg(0); Register UH = Unmerge.getReg(1); auto E = MIRBuilder.buildLShr(S32, UH, MIRBuilder.buildConstant(S32, 20)); E = MIRBuilder.buildAnd(S32, E, MIRBuilder.buildConstant(S32, ExpMask)); // Subtract the fp64 exponent bias (1023) to get the real exponent and // add the f16 bias (15) to get the biased exponent for the f16 format. E = MIRBuilder.buildAdd( S32, E, MIRBuilder.buildConstant(S32, -ExpBiasf64 + ExpBiasf16)); auto M = MIRBuilder.buildLShr(S32, UH, MIRBuilder.buildConstant(S32, 8)); M = MIRBuilder.buildAnd(S32, M, MIRBuilder.buildConstant(S32, 0xffe)); auto MaskedSig = MIRBuilder.buildAnd(S32, UH, MIRBuilder.buildConstant(S32, 0x1ff)); MaskedSig = MIRBuilder.buildOr(S32, MaskedSig, U); auto Zero = MIRBuilder.buildConstant(S32, 0); auto SigCmpNE0 = MIRBuilder.buildICmp(CmpInst::ICMP_NE, S1, MaskedSig, Zero); auto Lo40Set = MIRBuilder.buildZExt(S32, SigCmpNE0); M = MIRBuilder.buildOr(S32, M, Lo40Set); // (M != 0 ? 0x0200 : 0) | 0x7c00; auto Bits0x200 = MIRBuilder.buildConstant(S32, 0x0200); auto CmpM_NE0 = MIRBuilder.buildICmp(CmpInst::ICMP_NE, S1, M, Zero); auto SelectCC = MIRBuilder.buildSelect(S32, CmpM_NE0, Bits0x200, Zero); auto Bits0x7c00 = MIRBuilder.buildConstant(S32, 0x7c00); auto I = MIRBuilder.buildOr(S32, SelectCC, Bits0x7c00); // N = M | (E << 12); auto EShl12 = MIRBuilder.buildShl(S32, E, MIRBuilder.buildConstant(S32, 12)); auto N = MIRBuilder.buildOr(S32, M, EShl12); // B = clamp(1-E, 0, 13); auto One = MIRBuilder.buildConstant(S32, 1); auto OneSubExp = MIRBuilder.buildSub(S32, One, E); auto B = MIRBuilder.buildSMax(S32, OneSubExp, Zero); B = MIRBuilder.buildSMin(S32, B, MIRBuilder.buildConstant(S32, 13)); auto SigSetHigh = MIRBuilder.buildOr(S32, M, MIRBuilder.buildConstant(S32, 0x1000)); auto D = MIRBuilder.buildLShr(S32, SigSetHigh, B); auto D0 = MIRBuilder.buildShl(S32, D, B); auto D0_NE_SigSetHigh = MIRBuilder.buildICmp(CmpInst::ICMP_NE, S1, D0, SigSetHigh); auto D1 = MIRBuilder.buildZExt(S32, D0_NE_SigSetHigh); D = MIRBuilder.buildOr(S32, D, D1); auto CmpELtOne = MIRBuilder.buildICmp(CmpInst::ICMP_SLT, S1, E, One); auto V = MIRBuilder.buildSelect(S32, CmpELtOne, D, N); auto VLow3 = MIRBuilder.buildAnd(S32, V, MIRBuilder.buildConstant(S32, 7)); V = MIRBuilder.buildLShr(S32, V, MIRBuilder.buildConstant(S32, 2)); auto VLow3Eq3 = MIRBuilder.buildICmp(CmpInst::ICMP_EQ, S1, VLow3, MIRBuilder.buildConstant(S32, 3)); auto V0 = MIRBuilder.buildZExt(S32, VLow3Eq3); auto VLow3Gt5 = MIRBuilder.buildICmp(CmpInst::ICMP_SGT, S1, VLow3, MIRBuilder.buildConstant(S32, 5)); auto V1 = MIRBuilder.buildZExt(S32, VLow3Gt5); V1 = MIRBuilder.buildOr(S32, V0, V1); V = MIRBuilder.buildAdd(S32, V, V1); auto CmpEGt30 = MIRBuilder.buildICmp(CmpInst::ICMP_SGT, S1, E, MIRBuilder.buildConstant(S32, 30)); V = MIRBuilder.buildSelect(S32, CmpEGt30, MIRBuilder.buildConstant(S32, 0x7c00), V); auto CmpEGt1039 = MIRBuilder.buildICmp(CmpInst::ICMP_EQ, S1, E, MIRBuilder.buildConstant(S32, 1039)); V = MIRBuilder.buildSelect(S32, CmpEGt1039, I, V); // Extract the sign bit. 
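// Note (a sketch of the bit layout): the f64 sign is bit 63, i.e. bit 31 of
// the high word UH; shifting UH right by 16 and masking with 0x8000 leaves it
// at bit 15, the f16 sign position.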
auto Sign = MIRBuilder.buildLShr(S32, UH, MIRBuilder.buildConstant(S32, 16)); Sign = MIRBuilder.buildAnd(S32, Sign, MIRBuilder.buildConstant(S32, 0x8000)); // Insert the sign bit V = MIRBuilder.buildOr(S32, Sign, V); MIRBuilder.buildTrunc(Dst, V); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerFPTRUNC(MachineInstr &MI, unsigned TypeIdx, LLT Ty) { Register Dst = MI.getOperand(0).getReg(); Register Src = MI.getOperand(1).getReg(); LLT DstTy = MRI.getType(Dst); LLT SrcTy = MRI.getType(Src); const LLT S64 = LLT::scalar(64); const LLT S16 = LLT::scalar(16); if (DstTy.getScalarType() == S16 && SrcTy.getScalarType() == S64) return lowerFPTRUNC_F64_TO_F16(MI); return UnableToLegalize; } static CmpInst::Predicate minMaxToCompare(unsigned Opc) { switch (Opc) { case TargetOpcode::G_SMIN: return CmpInst::ICMP_SLT; case TargetOpcode::G_SMAX: return CmpInst::ICMP_SGT; case TargetOpcode::G_UMIN: return CmpInst::ICMP_ULT; case TargetOpcode::G_UMAX: return CmpInst::ICMP_UGT; default: llvm_unreachable("not in integer min/max"); } } LegalizerHelper::LegalizeResult LegalizerHelper::lowerMinMax(MachineInstr &MI, unsigned TypeIdx, LLT Ty) { Register Dst = MI.getOperand(0).getReg(); Register Src0 = MI.getOperand(1).getReg(); Register Src1 = MI.getOperand(2).getReg(); const CmpInst::Predicate Pred = minMaxToCompare(MI.getOpcode()); LLT CmpType = MRI.getType(Dst).changeElementSize(1); auto Cmp = MIRBuilder.buildICmp(Pred, CmpType, Src0, Src1); MIRBuilder.buildSelect(Dst, Cmp, Src0, Src1); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerFCopySign(MachineInstr &MI, unsigned TypeIdx, LLT Ty) { Register Dst = MI.getOperand(0).getReg(); Register Src0 = MI.getOperand(1).getReg(); Register Src1 = MI.getOperand(2).getReg(); const LLT Src0Ty = MRI.getType(Src0); const LLT Src1Ty = MRI.getType(Src1); const int Src0Size = Src0Ty.getScalarSizeInBits(); const int Src1Size = Src1Ty.getScalarSizeInBits(); auto SignBitMask = MIRBuilder.buildConstant( Src0Ty, APInt::getSignMask(Src0Size)); auto NotSignBitMask = MIRBuilder.buildConstant( Src0Ty, APInt::getLowBitsSet(Src0Size, Src0Size - 1)); auto And0 = MIRBuilder.buildAnd(Src0Ty, Src0, NotSignBitMask); MachineInstr *Or; if (Src0Ty == Src1Ty) { auto And1 = MIRBuilder.buildAnd(Src1Ty, Src1, SignBitMask); Or = MIRBuilder.buildOr(Dst, And0, And1); } else if (Src0Size > Src1Size) { auto ShiftAmt = MIRBuilder.buildConstant(Src0Ty, Src0Size - Src1Size); auto Zext = MIRBuilder.buildZExt(Src0Ty, Src1); auto Shift = MIRBuilder.buildShl(Src0Ty, Zext, ShiftAmt); auto And1 = MIRBuilder.buildAnd(Src0Ty, Shift, SignBitMask); Or = MIRBuilder.buildOr(Dst, And0, And1); } else { auto ShiftAmt = MIRBuilder.buildConstant(Src1Ty, Src1Size - Src0Size); auto Shift = MIRBuilder.buildLShr(Src1Ty, Src1, ShiftAmt); auto Trunc = MIRBuilder.buildTrunc(Src0Ty, Shift); auto And1 = MIRBuilder.buildAnd(Src0Ty, Trunc, SignBitMask); Or = MIRBuilder.buildOr(Dst, And0, And1); } // Be careful about setting nsz/nnan/ninf on every instruction, since the // constants are a nan and -0.0, but the final result should preserve // everything. if (unsigned Flags = MI.getFlags()) Or->setFlags(Flags); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerFMinNumMaxNum(MachineInstr &MI) { unsigned NewOp = MI.getOpcode() == TargetOpcode::G_FMINNUM ? 
TargetOpcode::G_FMINNUM_IEEE : TargetOpcode::G_FMAXNUM_IEEE; Register Dst = MI.getOperand(0).getReg(); Register Src0 = MI.getOperand(1).getReg(); Register Src1 = MI.getOperand(2).getReg(); LLT Ty = MRI.getType(Dst); if (!MI.getFlag(MachineInstr::FmNoNans)) { // Insert canonicalizes if it's possible we need to quiet to get correct // sNaN behavior. // Note this must be done here, and not as an optimization combine in the // absence of a dedicate quiet-snan instruction as we're using an // omni-purpose G_FCANONICALIZE. if (!isKnownNeverSNaN(Src0, MRI)) Src0 = MIRBuilder.buildFCanonicalize(Ty, Src0, MI.getFlags()).getReg(0); if (!isKnownNeverSNaN(Src1, MRI)) Src1 = MIRBuilder.buildFCanonicalize(Ty, Src1, MI.getFlags()).getReg(0); } // If there are no nans, it's safe to simply replace this with the non-IEEE // version. MIRBuilder.buildInstr(NewOp, {Dst}, {Src0, Src1}, MI.getFlags()); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerFMad(MachineInstr &MI) { // Expand G_FMAD a, b, c -> G_FADD (G_FMUL a, b), c Register DstReg = MI.getOperand(0).getReg(); LLT Ty = MRI.getType(DstReg); unsigned Flags = MI.getFlags(); auto Mul = MIRBuilder.buildFMul(Ty, MI.getOperand(1), MI.getOperand(2), Flags); MIRBuilder.buildFAdd(DstReg, Mul, MI.getOperand(3), Flags); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerIntrinsicRound(MachineInstr &MI) { Register DstReg = MI.getOperand(0).getReg(); Register X = MI.getOperand(1).getReg(); const unsigned Flags = MI.getFlags(); const LLT Ty = MRI.getType(DstReg); const LLT CondTy = Ty.changeElementSize(1); // round(x) => // t = trunc(x); // d = fabs(x - t); // o = copysign(1.0f, x); // return t + (d >= 0.5 ? o : 0.0); auto T = MIRBuilder.buildIntrinsicTrunc(Ty, X, Flags); auto Diff = MIRBuilder.buildFSub(Ty, X, T, Flags); auto AbsDiff = MIRBuilder.buildFAbs(Ty, Diff, Flags); auto Zero = MIRBuilder.buildFConstant(Ty, 0.0); auto One = MIRBuilder.buildFConstant(Ty, 1.0); auto Half = MIRBuilder.buildFConstant(Ty, 0.5); auto SignOne = MIRBuilder.buildFCopysign(Ty, One, X); auto Cmp = MIRBuilder.buildFCmp(CmpInst::FCMP_OGE, CondTy, AbsDiff, Half, Flags); auto Sel = MIRBuilder.buildSelect(Ty, Cmp, SignOne, Zero, Flags); MIRBuilder.buildFAdd(DstReg, T, Sel, Flags); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerFFloor(MachineInstr &MI) { Register DstReg = MI.getOperand(0).getReg(); Register SrcReg = MI.getOperand(1).getReg(); unsigned Flags = MI.getFlags(); LLT Ty = MRI.getType(DstReg); const LLT CondTy = Ty.changeElementSize(1); // result = trunc(src); // if (src < 0.0 && src != result) // result += -1.0. 
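// A worked example (sketch): for src = -2.5 the trunc is -2.0; src < 0.0 and
// src != trunc, so the i1 condition converts via G_SITOFP to -1.0 and the
// result is -2.0 + -1.0 = -3.0. For src = 2.5 the condition is false, the
// addend is 0.0, and the result stays 2.0.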
auto Trunc = MIRBuilder.buildIntrinsicTrunc(Ty, SrcReg, Flags); auto Zero = MIRBuilder.buildFConstant(Ty, 0.0); auto Lt0 = MIRBuilder.buildFCmp(CmpInst::FCMP_OLT, CondTy, SrcReg, Zero, Flags); auto NeTrunc = MIRBuilder.buildFCmp(CmpInst::FCMP_ONE, CondTy, SrcReg, Trunc, Flags); auto And = MIRBuilder.buildAnd(CondTy, Lt0, NeTrunc); auto AddVal = MIRBuilder.buildSITOFP(Ty, And); MIRBuilder.buildFAdd(DstReg, Trunc, AddVal, Flags); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerMergeValues(MachineInstr &MI) { const unsigned NumOps = MI.getNumOperands(); Register DstReg = MI.getOperand(0).getReg(); Register Src0Reg = MI.getOperand(1).getReg(); LLT DstTy = MRI.getType(DstReg); LLT SrcTy = MRI.getType(Src0Reg); unsigned PartSize = SrcTy.getSizeInBits(); LLT WideTy = LLT::scalar(DstTy.getSizeInBits()); Register ResultReg = MIRBuilder.buildZExt(WideTy, Src0Reg).getReg(0); for (unsigned I = 2; I != NumOps; ++I) { const unsigned Offset = (I - 1) * PartSize; Register SrcReg = MI.getOperand(I).getReg(); auto ZextInput = MIRBuilder.buildZExt(WideTy, SrcReg); Register NextResult = I + 1 == NumOps && WideTy == DstTy ? DstReg : MRI.createGenericVirtualRegister(WideTy); auto ShiftAmt = MIRBuilder.buildConstant(WideTy, Offset); auto Shl = MIRBuilder.buildShl(WideTy, ZextInput, ShiftAmt); MIRBuilder.buildOr(NextResult, ResultReg, Shl); ResultReg = NextResult; } if (DstTy.isPointer()) { if (MIRBuilder.getDataLayout().isNonIntegralAddressSpace( DstTy.getAddressSpace())) { LLVM_DEBUG(dbgs() << "Not casting nonintegral address space\n"); return UnableToLegalize; } MIRBuilder.buildIntToPtr(DstReg, ResultReg); } MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerUnmergeValues(MachineInstr &MI) { const unsigned NumDst = MI.getNumOperands() - 1; Register SrcReg = MI.getOperand(NumDst).getReg(); Register Dst0Reg = MI.getOperand(0).getReg(); LLT DstTy = MRI.getType(Dst0Reg); if (DstTy.isPointer()) return UnableToLegalize; // TODO SrcReg = coerceToScalar(SrcReg); if (!SrcReg) return UnableToLegalize; // Expand scalarizing unmerge as bitcast to integer and shift. LLT IntTy = MRI.getType(SrcReg); MIRBuilder.buildTrunc(Dst0Reg, SrcReg); const unsigned DstSize = DstTy.getSizeInBits(); unsigned Offset = DstSize; for (unsigned I = 1; I != NumDst; ++I, Offset += DstSize) { auto ShiftAmt = MIRBuilder.buildConstant(IntTy, Offset); auto Shift = MIRBuilder.buildLShr(IntTy, SrcReg, ShiftAmt); MIRBuilder.buildTrunc(MI.getOperand(I), Shift); } MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerShuffleVector(MachineInstr &MI) { Register DstReg = MI.getOperand(0).getReg(); Register Src0Reg = MI.getOperand(1).getReg(); Register Src1Reg = MI.getOperand(2).getReg(); LLT Src0Ty = MRI.getType(Src0Reg); LLT DstTy = MRI.getType(DstReg); LLT IdxTy = LLT::scalar(32); ArrayRef Mask = MI.getOperand(3).getShuffleMask(); if (DstTy.isScalar()) { if (Src0Ty.isVector()) return UnableToLegalize; // This is just a SELECT. assert(Mask.size() == 1 && "Expected a single mask element"); Register Val; if (Mask[0] < 0 || Mask[0] > 1) Val = MIRBuilder.buildUndef(DstTy).getReg(0); else Val = Mask[0] == 0 ? 
Src0Reg : Src1Reg; MIRBuilder.buildCopy(DstReg, Val); MI.eraseFromParent(); return Legalized; } Register Undef; SmallVector BuildVec; LLT EltTy = DstTy.getElementType(); for (int Idx : Mask) { if (Idx < 0) { if (!Undef.isValid()) Undef = MIRBuilder.buildUndef(EltTy).getReg(0); BuildVec.push_back(Undef); continue; } if (Src0Ty.isScalar()) { BuildVec.push_back(Idx == 0 ? Src0Reg : Src1Reg); } else { int NumElts = Src0Ty.getNumElements(); Register SrcVec = Idx < NumElts ? Src0Reg : Src1Reg; int ExtractIdx = Idx < NumElts ? Idx : Idx - NumElts; auto IdxK = MIRBuilder.buildConstant(IdxTy, ExtractIdx); auto Extract = MIRBuilder.buildExtractVectorElement(EltTy, SrcVec, IdxK); BuildVec.push_back(Extract.getReg(0)); } } MIRBuilder.buildBuildVector(DstReg, BuildVec); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerDynStackAlloc(MachineInstr &MI) { const auto &MF = *MI.getMF(); const auto &TFI = *MF.getSubtarget().getFrameLowering(); if (TFI.getStackGrowthDirection() == TargetFrameLowering::StackGrowsUp) return UnableToLegalize; Register Dst = MI.getOperand(0).getReg(); Register AllocSize = MI.getOperand(1).getReg(); Align Alignment = assumeAligned(MI.getOperand(2).getImm()); LLT PtrTy = MRI.getType(Dst); LLT IntPtrTy = LLT::scalar(PtrTy.getSizeInBits()); const auto &TLI = *MF.getSubtarget().getTargetLowering(); Register SPReg = TLI.getStackPointerRegisterToSaveRestore(); auto SPTmp = MIRBuilder.buildCopy(PtrTy, SPReg); SPTmp = MIRBuilder.buildCast(IntPtrTy, SPTmp); // Subtract the final alloc from the SP. We use G_PTRTOINT here so we don't // have to generate an extra instruction to negate the alloc and then use // G_PTR_ADD to add the negative offset. auto Alloc = MIRBuilder.buildSub(IntPtrTy, SPTmp, AllocSize); if (Alignment > Align(1)) { APInt AlignMask(IntPtrTy.getSizeInBits(), Alignment.value(), true); AlignMask.negate(); auto AlignCst = MIRBuilder.buildConstant(IntPtrTy, AlignMask); Alloc = MIRBuilder.buildAnd(IntPtrTy, Alloc, AlignCst); } SPTmp = MIRBuilder.buildCast(PtrTy, Alloc); MIRBuilder.buildCopy(SPReg, SPTmp); MIRBuilder.buildCopy(Dst, SPTmp); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerExtract(MachineInstr &MI) { Register Dst = MI.getOperand(0).getReg(); Register Src = MI.getOperand(1).getReg(); unsigned Offset = MI.getOperand(2).getImm(); LLT DstTy = MRI.getType(Dst); LLT SrcTy = MRI.getType(Src); if (DstTy.isScalar() && (SrcTy.isScalar() || (SrcTy.isVector() && DstTy == SrcTy.getElementType()))) { LLT SrcIntTy = SrcTy; if (!SrcTy.isScalar()) { SrcIntTy = LLT::scalar(SrcTy.getSizeInBits()); Src = MIRBuilder.buildBitcast(SrcIntTy, Src).getReg(0); } if (Offset == 0) MIRBuilder.buildTrunc(Dst, Src); else { auto ShiftAmt = MIRBuilder.buildConstant(SrcIntTy, Offset); auto Shr = MIRBuilder.buildLShr(SrcIntTy, Src, ShiftAmt); MIRBuilder.buildTrunc(Dst, Shr); } MI.eraseFromParent(); return Legalized; } return UnableToLegalize; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerInsert(MachineInstr &MI) { Register Dst = MI.getOperand(0).getReg(); Register Src = MI.getOperand(1).getReg(); Register InsertSrc = MI.getOperand(2).getReg(); uint64_t Offset = MI.getOperand(3).getImm(); LLT DstTy = MRI.getType(Src); LLT InsertTy = MRI.getType(InsertSrc); if (InsertTy.isVector() || (DstTy.isVector() && DstTy.getElementType() != InsertTy)) return UnableToLegalize; const DataLayout &DL = MIRBuilder.getDataLayout(); if ((DstTy.isPointer() && DL.isNonIntegralAddressSpace(DstTy.getAddressSpace())) || 
(InsertTy.isPointer() && DL.isNonIntegralAddressSpace(InsertTy.getAddressSpace()))) { LLVM_DEBUG(dbgs() << "Not casting non-integral address space integer\n"); return UnableToLegalize; } LLT IntDstTy = DstTy; if (!DstTy.isScalar()) { IntDstTy = LLT::scalar(DstTy.getSizeInBits()); Src = MIRBuilder.buildCast(IntDstTy, Src).getReg(0); } if (!InsertTy.isScalar()) { const LLT IntInsertTy = LLT::scalar(InsertTy.getSizeInBits()); InsertSrc = MIRBuilder.buildPtrToInt(IntInsertTy, InsertSrc).getReg(0); } Register ExtInsSrc = MIRBuilder.buildZExt(IntDstTy, InsertSrc).getReg(0); if (Offset != 0) { auto ShiftAmt = MIRBuilder.buildConstant(IntDstTy, Offset); ExtInsSrc = MIRBuilder.buildShl(IntDstTy, ExtInsSrc, ShiftAmt).getReg(0); } APInt MaskVal = APInt::getBitsSetWithWrap( DstTy.getSizeInBits(), Offset + InsertTy.getSizeInBits(), Offset); auto Mask = MIRBuilder.buildConstant(IntDstTy, MaskVal); auto MaskedSrc = MIRBuilder.buildAnd(IntDstTy, Src, Mask); auto Or = MIRBuilder.buildOr(IntDstTy, MaskedSrc, ExtInsSrc); MIRBuilder.buildCast(Dst, Or); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerSADDO_SSUBO(MachineInstr &MI) { Register Dst0 = MI.getOperand(0).getReg(); Register Dst1 = MI.getOperand(1).getReg(); Register LHS = MI.getOperand(2).getReg(); Register RHS = MI.getOperand(3).getReg(); const bool IsAdd = MI.getOpcode() == TargetOpcode::G_SADDO; LLT Ty = MRI.getType(Dst0); LLT BoolTy = MRI.getType(Dst1); if (IsAdd) MIRBuilder.buildAdd(Dst0, LHS, RHS); else MIRBuilder.buildSub(Dst0, LHS, RHS); // TODO: If SADDSAT/SSUBSAT is legal, compare results to detect overflow. auto Zero = MIRBuilder.buildConstant(Ty, 0); // For an addition, the result should be less than one of the operands (LHS) // if and only if the other operand (RHS) is negative, otherwise there will // be overflow. // For a subtraction, the result should be less than one of the operands // (LHS) if and only if the other operand (RHS) is (non-zero) positive, // otherwise there will be overflow. auto ResultLowerThanLHS = MIRBuilder.buildICmp(CmpInst::ICMP_SLT, BoolTy, Dst0, LHS); auto ConditionRHS = MIRBuilder.buildICmp( IsAdd ? CmpInst::ICMP_SLT : CmpInst::ICMP_SGT, BoolTy, RHS, Zero); MIRBuilder.buildXor(Dst1, ConditionRHS, ResultLowerThanLHS); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerBswap(MachineInstr &MI) { Register Dst = MI.getOperand(0).getReg(); Register Src = MI.getOperand(1).getReg(); const LLT Ty = MRI.getType(Src); unsigned SizeInBytes = (Ty.getScalarSizeInBits() + 7) / 8; unsigned BaseShiftAmt = (SizeInBytes - 1) * 8; // Swap most and least significant byte, set remaining bytes in Res to zero. auto ShiftAmt = MIRBuilder.buildConstant(Ty, BaseShiftAmt); auto LSByteShiftedLeft = MIRBuilder.buildShl(Ty, Src, ShiftAmt); auto MSByteShiftedRight = MIRBuilder.buildLShr(Ty, Src, ShiftAmt); auto Res = MIRBuilder.buildOr(Ty, MSByteShiftedRight, LSByteShiftedLeft); // Set i-th high/low byte in Res to i-th low/high byte from Src. for (unsigned i = 1; i < SizeInBytes / 2; ++i) { // AND with Mask leaves byte i unchanged and sets remaining bytes to 0. APInt APMask(SizeInBytes * 8, 0xFF << (i * 8)); auto Mask = MIRBuilder.buildConstant(Ty, APMask); auto ShiftAmt = MIRBuilder.buildConstant(Ty, BaseShiftAmt - 16 * i); // Low byte shifted left to place of high byte: (Src & Mask) << ShiftAmt. 
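// A worked example (sketch), assuming a 32-bit Src = 0x11223344: the initial
// swap above gives Res = 0x44000011; the single loop iteration (i = 1,
// Mask = 0xFF00, ShiftAmt = 8) then ORs in 0x00330000 from the low-middle
// byte and 0x00002200 from the high-middle byte, producing 0x44332211.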
auto LoByte = MIRBuilder.buildAnd(Ty, Src, Mask); auto LoShiftedLeft = MIRBuilder.buildShl(Ty, LoByte, ShiftAmt); Res = MIRBuilder.buildOr(Ty, Res, LoShiftedLeft); // High byte shifted right to place of low byte: (Src >> ShiftAmt) & Mask. auto SrcShiftedRight = MIRBuilder.buildLShr(Ty, Src, ShiftAmt); auto HiShiftedRight = MIRBuilder.buildAnd(Ty, SrcShiftedRight, Mask); Res = MIRBuilder.buildOr(Ty, Res, HiShiftedRight); } Res.getInstr()->getOperand(0).setReg(Dst); MI.eraseFromParent(); return Legalized; } //{ (Src & Mask) >> N } | { (Src << N) & Mask } static MachineInstrBuilder SwapN(unsigned N, DstOp Dst, MachineIRBuilder &B, MachineInstrBuilder Src, APInt Mask) { const LLT Ty = Dst.getLLTTy(*B.getMRI()); MachineInstrBuilder C_N = B.buildConstant(Ty, N); MachineInstrBuilder MaskLoNTo0 = B.buildConstant(Ty, Mask); auto LHS = B.buildLShr(Ty, B.buildAnd(Ty, Src, MaskLoNTo0), C_N); auto RHS = B.buildAnd(Ty, B.buildShl(Ty, Src, C_N), MaskLoNTo0); return B.buildOr(Dst, LHS, RHS); } LegalizerHelper::LegalizeResult LegalizerHelper::lowerBitreverse(MachineInstr &MI) { Register Dst = MI.getOperand(0).getReg(); Register Src = MI.getOperand(1).getReg(); const LLT Ty = MRI.getType(Src); unsigned Size = Ty.getSizeInBits(); MachineInstrBuilder BSWAP = MIRBuilder.buildInstr(TargetOpcode::G_BSWAP, {Ty}, {Src}); // swap high and low 4 bits in 8 bit blocks 7654|3210 -> 3210|7654 // [(val & 0xF0F0F0F0) >> 4] | [(val & 0x0F0F0F0F) << 4] // -> [(val & 0xF0F0F0F0) >> 4] | [(val << 4) & 0xF0F0F0F0] MachineInstrBuilder Swap4 = SwapN(4, Ty, MIRBuilder, BSWAP, APInt::getSplat(Size, APInt(8, 0xF0))); // swap high and low 2 bits in 4 bit blocks 32|10 76|54 -> 10|32 54|76 // [(val & 0xCCCCCCCC) >> 2] & [(val & 0x33333333) << 2] // -> [(val & 0xCCCCCCCC) >> 2] & [(val << 2) & 0xCCCCCCCC] MachineInstrBuilder Swap2 = SwapN(2, Ty, MIRBuilder, Swap4, APInt::getSplat(Size, APInt(8, 0xCC))); // swap high and low 1 bit in 2 bit blocks 1|0 3|2 5|4 7|6 -> 0|1 2|3 4|5 6|7 // [(val & 0xAAAAAAAA) >> 1] & [(val & 0x55555555) << 1] // -> [(val & 0xAAAAAAAA) >> 1] & [(val << 1) & 0xAAAAAAAA] SwapN(1, Dst, MIRBuilder, Swap2, APInt::getSplat(Size, APInt(8, 0xAA))); MI.eraseFromParent(); return Legalized; } LegalizerHelper::LegalizeResult LegalizerHelper::lowerReadWriteRegister(MachineInstr &MI) { MachineFunction &MF = MIRBuilder.getMF(); const TargetSubtargetInfo &STI = MF.getSubtarget(); const TargetLowering *TLI = STI.getTargetLowering(); bool IsRead = MI.getOpcode() == TargetOpcode::G_READ_REGISTER; int NameOpIdx = IsRead ? 1 : 0; int ValRegIndex = IsRead ? 0 : 1; Register ValReg = MI.getOperand(ValRegIndex).getReg(); const LLT Ty = MRI.getType(ValReg); const MDString *RegStr = cast( cast(MI.getOperand(NameOpIdx).getMetadata())->getOperand(0)); Register PhysReg = TLI->getRegisterByName(RegStr->getString().data(), Ty, MF); if (!PhysReg.isValid()) return UnableToLegalize; if (IsRead) MIRBuilder.buildCopy(ValReg, PhysReg); else MIRBuilder.buildCopy(PhysReg, ValReg); MI.eraseFromParent(); return Legalized; } Index: vendor/llvm-project/release-11.x/llvm/lib/CodeGen/PHIEliminationUtils.cpp =================================================================== --- vendor/llvm-project/release-11.x/llvm/lib/CodeGen/PHIEliminationUtils.cpp (revision 366332) +++ vendor/llvm-project/release-11.x/llvm/lib/CodeGen/PHIEliminationUtils.cpp (revision 366333) @@ -1,60 +1,64 @@ //===-- PHIEliminationUtils.cpp - Helper functions for PHI elimination ----===// // // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. 
// See https://llvm.org/LICENSE.txt for license information. // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // //===----------------------------------------------------------------------===// #include "PHIEliminationUtils.h" #include "llvm/ADT/SmallPtrSet.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineRegisterInfo.h" using namespace llvm; // findCopyInsertPoint - Find a safe place in MBB to insert a copy from SrcReg // when following the CFG edge to SuccMBB. This needs to be after any def of // SrcReg, but before any subsequent point where control flow might jump out of // the basic block. MachineBasicBlock::iterator llvm::findPHICopyInsertPoint(MachineBasicBlock* MBB, MachineBasicBlock* SuccMBB, unsigned SrcReg) { // Handle the trivial case trivially. if (MBB->empty()) return MBB->begin(); // Usually, we just want to insert the copy before the first terminator // instruction. However, for the edge going to a landing pad, we must insert // the copy before the call/invoke instruction. Similarly for an INLINEASM_BR - // going to an indirect target. - if (!SuccMBB->isEHPad() && !SuccMBB->isInlineAsmBrIndirectTarget()) + // going to an indirect target. This is similar to SplitKit.cpp's + // computeLastInsertPoint, and similarly assumes that there cannot be multiple + // instructions that are Calls with EHPad successors or INLINEASM_BR in a + // block. + bool EHPadSuccessor = SuccMBB->isEHPad(); + if (!EHPadSuccessor && !SuccMBB->isInlineAsmBrIndirectTarget()) return MBB->getFirstTerminator(); - // Discover any defs/uses in this basic block. - SmallPtrSet DefUsesInMBB; + // Discover any defs in this basic block. + SmallPtrSet DefsInMBB; MachineRegisterInfo& MRI = MBB->getParent()->getRegInfo(); - for (MachineInstr &RI : MRI.reg_instructions(SrcReg)) { + for (MachineInstr &RI : MRI.def_instructions(SrcReg)) if (RI.getParent() == MBB) - DefUsesInMBB.insert(&RI); - } + DefsInMBB.insert(&RI); - MachineBasicBlock::iterator InsertPoint; - if (DefUsesInMBB.empty()) { - // No defs. Insert the copy at the start of the basic block. - InsertPoint = MBB->begin(); - } else if (DefUsesInMBB.size() == 1) { - // Insert the copy immediately after the def/use. - InsertPoint = *DefUsesInMBB.begin(); - ++InsertPoint; - } else { - // Insert the copy immediately after the last def/use. - InsertPoint = MBB->end(); - while (!DefUsesInMBB.count(&*--InsertPoint)) {} - ++InsertPoint; + MachineBasicBlock::iterator InsertPoint = MBB->begin(); + // Insert the copy at the _latest_ point of: + // 1. Immediately AFTER the last def + // 2. Immediately BEFORE a call/inlineasm_br. + for (auto I = MBB->rbegin(), E = MBB->rend(); I != E; ++I) { + if (DefsInMBB.contains(&*I)) { + InsertPoint = std::next(I.getReverse()); + break; + } + if ((EHPadSuccessor && I->isCall()) || + I->getOpcode() == TargetOpcode::INLINEASM_BR) { + InsertPoint = I.getReverse(); + break; + } } // Make sure the copy goes after any phi nodes but before // any debug nodes. 
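// For instance (a sketch of the intent): when SuccMBB is a landing pad, the
// edge into it is taken from the potentially-throwing call in this block, so
// the copy must be placed before that call rather than at the first
// terminator, which is what the reverse walk above arranges.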
return MBB->SkipPHIsAndLabels(InsertPoint); } Index: vendor/llvm-project/release-11.x/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp =================================================================== --- vendor/llvm-project/release-11.x/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (revision 366332) +++ vendor/llvm-project/release-11.x/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (revision 366333) @@ -1,10701 +1,10675 @@ //===- SelectionDAGBuilder.cpp - Selection-DAG building -------------------===// // // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // See https://llvm.org/LICENSE.txt for license information. // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // //===----------------------------------------------------------------------===// // // This implements routines for translating from LLVM IR into SelectionDAG IR. // //===----------------------------------------------------------------------===// #include "SelectionDAGBuilder.h" #include "SDNodeDbgValue.h" #include "llvm/ADT/APFloat.h" #include "llvm/ADT/APInt.h" #include "llvm/ADT/ArrayRef.h" #include "llvm/ADT/BitVector.h" #include "llvm/ADT/DenseMap.h" #include "llvm/ADT/None.h" #include "llvm/ADT/Optional.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/SmallSet.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/StringRef.h" #include "llvm/ADT/Triple.h" #include "llvm/ADT/Twine.h" #include "llvm/Analysis/AliasAnalysis.h" #include "llvm/Analysis/BlockFrequencyInfo.h" #include "llvm/Analysis/BranchProbabilityInfo.h" #include "llvm/Analysis/ConstantFolding.h" #include "llvm/Analysis/EHPersonalities.h" #include "llvm/Analysis/Loads.h" #include "llvm/Analysis/MemoryLocation.h" #include "llvm/Analysis/ProfileSummaryInfo.h" #include "llvm/Analysis/TargetLibraryInfo.h" #include "llvm/Analysis/ValueTracking.h" #include "llvm/Analysis/VectorUtils.h" #include "llvm/CodeGen/Analysis.h" #include "llvm/CodeGen/FunctionLoweringInfo.h" #include "llvm/CodeGen/GCMetadata.h" #include "llvm/CodeGen/ISDOpcodes.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineInstr.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/CodeGen/MachineJumpTableInfo.h" #include "llvm/CodeGen/MachineMemOperand.h" #include "llvm/CodeGen/MachineModuleInfo.h" #include "llvm/CodeGen/MachineOperand.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/RuntimeLibcalls.h" #include "llvm/CodeGen/SelectionDAG.h" #include "llvm/CodeGen/SelectionDAGNodes.h" #include "llvm/CodeGen/SelectionDAGTargetInfo.h" #include "llvm/CodeGen/StackMaps.h" #include "llvm/CodeGen/SwiftErrorValueTracking.h" #include "llvm/CodeGen/TargetFrameLowering.h" #include "llvm/CodeGen/TargetInstrInfo.h" #include "llvm/CodeGen/TargetLowering.h" #include "llvm/CodeGen/TargetOpcodes.h" #include "llvm/CodeGen/TargetRegisterInfo.h" #include "llvm/CodeGen/TargetSubtargetInfo.h" #include "llvm/CodeGen/ValueTypes.h" #include "llvm/CodeGen/WinEHFuncInfo.h" #include "llvm/IR/Argument.h" #include "llvm/IR/Attributes.h" #include "llvm/IR/BasicBlock.h" #include "llvm/IR/CFG.h" #include "llvm/IR/CallingConv.h" #include "llvm/IR/Constant.h" #include "llvm/IR/ConstantRange.h" #include "llvm/IR/Constants.h" #include "llvm/IR/DataLayout.h" #include "llvm/IR/DebugInfoMetadata.h" #include "llvm/IR/DebugLoc.h" #include "llvm/IR/DerivedTypes.h" #include "llvm/IR/Function.h" #include 
"llvm/IR/GetElementPtrTypeIterator.h" #include "llvm/IR/InlineAsm.h" #include "llvm/IR/InstrTypes.h" #include "llvm/IR/Instruction.h" #include "llvm/IR/Instructions.h" #include "llvm/IR/IntrinsicInst.h" #include "llvm/IR/Intrinsics.h" #include "llvm/IR/IntrinsicsAArch64.h" #include "llvm/IR/IntrinsicsWebAssembly.h" #include "llvm/IR/LLVMContext.h" #include "llvm/IR/Metadata.h" #include "llvm/IR/Module.h" #include "llvm/IR/Operator.h" #include "llvm/IR/PatternMatch.h" #include "llvm/IR/Statepoint.h" #include "llvm/IR/Type.h" #include "llvm/IR/User.h" #include "llvm/IR/Value.h" #include "llvm/MC/MCContext.h" #include "llvm/MC/MCSymbol.h" #include "llvm/Support/AtomicOrdering.h" #include "llvm/Support/BranchProbability.h" #include "llvm/Support/Casting.h" #include "llvm/Support/CodeGen.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/Compiler.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/MachineValueType.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Target/TargetIntrinsicInfo.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Target/TargetOptions.h" #include "llvm/Transforms/Utils/Local.h" #include #include #include #include #include #include #include #include #include #include #include using namespace llvm; using namespace PatternMatch; using namespace SwitchCG; #define DEBUG_TYPE "isel" /// LimitFloatPrecision - Generate low-precision inline sequences for /// some float libcalls (6, 8 or 12 bits). static unsigned LimitFloatPrecision; static cl::opt InsertAssertAlign("insert-assert-align", cl::init(true), cl::desc("Insert the experimental `assertalign` node."), cl::ReallyHidden); static cl::opt LimitFPPrecision("limit-float-precision", cl::desc("Generate low-precision inline sequences " "for some float libcalls"), cl::location(LimitFloatPrecision), cl::Hidden, cl::init(0)); static cl::opt SwitchPeelThreshold( "switch-peel-threshold", cl::Hidden, cl::init(66), cl::desc("Set the case probability threshold for peeling the case from a " "switch statement. A value greater than 100 will void this " "optimization")); // Limit the width of DAG chains. This is important in general to prevent // DAG-based analysis from blowing up. For example, alias analysis and // load clustering may not complete in reasonable time. It is difficult to // recognize and avoid this situation within each individual analysis, and // future analyses are likely to have the same behavior. Limiting DAG width is // the safe approach and will be especially important with global DAGs. // // MaxParallelChains default is arbitrarily high to avoid affecting // optimization, but could be lowered to improve compile time. Any ld-ld-st-st // sequence over this should have been converted to llvm.memcpy by the // frontend. It is easy to induce this behavior with .ll code such as: // %buffer = alloca [4096 x i8] // %data = load [4096 x i8]* %argPtr // store [4096 x i8] %data, [4096 x i8]* %buffer static const unsigned MaxParallelChains = 64; -// Return the calling convention if the Value passed requires ABI mangling as it -// is a parameter to a function or a return value from a function which is not -// an intrinsic. 
-static Optional getABIRegCopyCC(const Value *V) { - if (auto *R = dyn_cast(V)) - return R->getParent()->getParent()->getCallingConv(); - - if (auto *CI = dyn_cast(V)) { - const bool IsInlineAsm = CI->isInlineAsm(); - const bool IsIndirectFunctionCall = - !IsInlineAsm && !CI->getCalledFunction(); - - // It is possible that the call instruction is an inline asm statement or an - // indirect function call in which case the return value of - // getCalledFunction() would be nullptr. - const bool IsInstrinsicCall = - !IsInlineAsm && !IsIndirectFunctionCall && - CI->getCalledFunction()->getIntrinsicID() != Intrinsic::not_intrinsic; - - if (!IsInlineAsm && !IsInstrinsicCall) - return CI->getCallingConv(); - } - - return None; -} - static SDValue getCopyFromPartsVector(SelectionDAG &DAG, const SDLoc &DL, const SDValue *Parts, unsigned NumParts, MVT PartVT, EVT ValueVT, const Value *V, Optional CC); /// getCopyFromParts - Create a value that contains the specified legal parts /// combined into the value they represent. If the parts combine to a type /// larger than ValueVT then AssertOp can be used to specify whether the extra /// bits are known to be zero (ISD::AssertZext) or sign extended from ValueVT /// (ISD::AssertSext). static SDValue getCopyFromParts(SelectionDAG &DAG, const SDLoc &DL, const SDValue *Parts, unsigned NumParts, MVT PartVT, EVT ValueVT, const Value *V, Optional CC = None, Optional AssertOp = None) { // Let the target assemble the parts if it wants to const TargetLowering &TLI = DAG.getTargetLoweringInfo(); if (SDValue Val = TLI.joinRegisterPartsIntoValue(DAG, DL, Parts, NumParts, PartVT, ValueVT, CC)) return Val; if (ValueVT.isVector()) return getCopyFromPartsVector(DAG, DL, Parts, NumParts, PartVT, ValueVT, V, CC); assert(NumParts > 0 && "No parts to assemble!"); SDValue Val = Parts[0]; if (NumParts > 1) { // Assemble the value from multiple parts. if (ValueVT.isInteger()) { unsigned PartBits = PartVT.getSizeInBits(); unsigned ValueBits = ValueVT.getSizeInBits(); // Assemble the power of 2 part. unsigned RoundParts = (NumParts & (NumParts - 1)) ? 1 << Log2_32(NumParts) : NumParts; unsigned RoundBits = PartBits * RoundParts; EVT RoundVT = RoundBits == ValueBits ? ValueVT : EVT::getIntegerVT(*DAG.getContext(), RoundBits); SDValue Lo, Hi; EVT HalfVT = EVT::getIntegerVT(*DAG.getContext(), RoundBits/2); if (RoundParts > 2) { Lo = getCopyFromParts(DAG, DL, Parts, RoundParts / 2, PartVT, HalfVT, V); Hi = getCopyFromParts(DAG, DL, Parts + RoundParts / 2, RoundParts / 2, PartVT, HalfVT, V); } else { Lo = DAG.getNode(ISD::BITCAST, DL, HalfVT, Parts[0]); Hi = DAG.getNode(ISD::BITCAST, DL, HalfVT, Parts[1]); } if (DAG.getDataLayout().isBigEndian()) std::swap(Lo, Hi); Val = DAG.getNode(ISD::BUILD_PAIR, DL, RoundVT, Lo, Hi); if (RoundParts < NumParts) { // Assemble the trailing non-power-of-2 part. unsigned OddParts = NumParts - RoundParts; EVT OddVT = EVT::getIntegerVT(*DAG.getContext(), OddParts * PartBits); Hi = getCopyFromParts(DAG, DL, Parts + RoundParts, OddParts, PartVT, OddVT, V, CC); // Combine the round and odd parts. 
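// A worked example (sketch), assuming a little-endian target: assembling an
// i96 value from three i32 parts reaches this point with RoundParts = 2; the
// low two parts were paired into an i64 (Val), the remaining odd part is Hi,
// and below it is any-extended to i96, shifted left by 64 and OR'd with the
// zero-extended low half.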
Lo = Val; if (DAG.getDataLayout().isBigEndian()) std::swap(Lo, Hi); EVT TotalVT = EVT::getIntegerVT(*DAG.getContext(), NumParts * PartBits); Hi = DAG.getNode(ISD::ANY_EXTEND, DL, TotalVT, Hi); Hi = DAG.getNode(ISD::SHL, DL, TotalVT, Hi, DAG.getConstant(Lo.getValueSizeInBits(), DL, TLI.getPointerTy(DAG.getDataLayout()))); Lo = DAG.getNode(ISD::ZERO_EXTEND, DL, TotalVT, Lo); Val = DAG.getNode(ISD::OR, DL, TotalVT, Lo, Hi); } } else if (PartVT.isFloatingPoint()) { // FP split into multiple FP parts (for ppcf128) assert(ValueVT == EVT(MVT::ppcf128) && PartVT == MVT::f64 && "Unexpected split"); SDValue Lo, Hi; Lo = DAG.getNode(ISD::BITCAST, DL, EVT(MVT::f64), Parts[0]); Hi = DAG.getNode(ISD::BITCAST, DL, EVT(MVT::f64), Parts[1]); if (TLI.hasBigEndianPartOrdering(ValueVT, DAG.getDataLayout())) std::swap(Lo, Hi); Val = DAG.getNode(ISD::BUILD_PAIR, DL, ValueVT, Lo, Hi); } else { // FP split into integer parts (soft fp) assert(ValueVT.isFloatingPoint() && PartVT.isInteger() && !PartVT.isVector() && "Unexpected split"); EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), ValueVT.getSizeInBits()); Val = getCopyFromParts(DAG, DL, Parts, NumParts, PartVT, IntVT, V, CC); } } // There is now one part, held in Val. Correct it to match ValueVT. // PartEVT is the type of the register class that holds the value. // ValueVT is the type of the inline asm operation. EVT PartEVT = Val.getValueType(); if (PartEVT == ValueVT) return Val; if (PartEVT.isInteger() && ValueVT.isFloatingPoint() && ValueVT.bitsLT(PartEVT)) { // For an FP value in an integer part, we need to truncate to the right // width first. PartEVT = EVT::getIntegerVT(*DAG.getContext(), ValueVT.getSizeInBits()); Val = DAG.getNode(ISD::TRUNCATE, DL, PartEVT, Val); } // Handle types that have the same size. if (PartEVT.getSizeInBits() == ValueVT.getSizeInBits()) return DAG.getNode(ISD::BITCAST, DL, ValueVT, Val); // Handle types with different sizes. if (PartEVT.isInteger() && ValueVT.isInteger()) { if (ValueVT.bitsLT(PartEVT)) { // For a truncate, see if we have any information to // indicate whether the truncated bits will always be // zero or sign-extension. if (AssertOp.hasValue()) Val = DAG.getNode(*AssertOp, DL, PartEVT, Val, DAG.getValueType(ValueVT)); return DAG.getNode(ISD::TRUNCATE, DL, ValueVT, Val); } return DAG.getNode(ISD::ANY_EXTEND, DL, ValueVT, Val); } if (PartEVT.isFloatingPoint() && ValueVT.isFloatingPoint()) { // FP_ROUND's are always exact here. if (ValueVT.bitsLT(Val.getValueType())) return DAG.getNode( ISD::FP_ROUND, DL, ValueVT, Val, DAG.getTargetConstant(1, DL, TLI.getPointerTy(DAG.getDataLayout()))); return DAG.getNode(ISD::FP_EXTEND, DL, ValueVT, Val); } // Handle MMX to a narrower integer type by bitcasting MMX to integer and // then truncating. if (PartEVT == MVT::x86mmx && ValueVT.isInteger() && ValueVT.bitsLT(PartEVT)) { Val = DAG.getNode(ISD::BITCAST, DL, MVT::i64, Val); return DAG.getNode(ISD::TRUNCATE, DL, ValueVT, Val); } report_fatal_error("Unknown mismatch in getCopyFromParts!"); } static void diagnosePossiblyInvalidConstraint(LLVMContext &Ctx, const Value *V, const Twine &ErrMsg) { const Instruction *I = dyn_cast_or_null(V); if (!V) return Ctx.emitError(ErrMsg); const char *AsmError = ", possible invalid constraint for vector type"; if (const CallInst *CI = dyn_cast(I)) if (CI->isInlineAsm()) return Ctx.emitError(I, ErrMsg + AsmError); return Ctx.emitError(I, ErrMsg); } /// getCopyFromPartsVector - Create a value that contains the specified legal /// parts combined into the value they represent. 
If the parts combine to a /// type larger than ValueVT then AssertOp can be used to specify whether the /// extra bits are known to be zero (ISD::AssertZext) or sign extended from /// ValueVT (ISD::AssertSext). static SDValue getCopyFromPartsVector(SelectionDAG &DAG, const SDLoc &DL, const SDValue *Parts, unsigned NumParts, MVT PartVT, EVT ValueVT, const Value *V, Optional CallConv) { assert(ValueVT.isVector() && "Not a vector value"); assert(NumParts > 0 && "No parts to assemble!"); const bool IsABIRegCopy = CallConv.hasValue(); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); SDValue Val = Parts[0]; // Handle a multi-element vector. if (NumParts > 1) { EVT IntermediateVT; MVT RegisterVT; unsigned NumIntermediates; unsigned NumRegs; if (IsABIRegCopy) { NumRegs = TLI.getVectorTypeBreakdownForCallingConv( *DAG.getContext(), CallConv.getValue(), ValueVT, IntermediateVT, NumIntermediates, RegisterVT); } else { NumRegs = TLI.getVectorTypeBreakdown(*DAG.getContext(), ValueVT, IntermediateVT, NumIntermediates, RegisterVT); } assert(NumRegs == NumParts && "Part count doesn't match vector breakdown!"); NumParts = NumRegs; // Silence a compiler warning. assert(RegisterVT == PartVT && "Part type doesn't match vector breakdown!"); assert(RegisterVT.getSizeInBits() == Parts[0].getSimpleValueType().getSizeInBits() && "Part type sizes don't match!"); // Assemble the parts into intermediate operands. SmallVector Ops(NumIntermediates); if (NumIntermediates == NumParts) { // If the register was not expanded, truncate or copy the value, // as appropriate. for (unsigned i = 0; i != NumParts; ++i) Ops[i] = getCopyFromParts(DAG, DL, &Parts[i], 1, PartVT, IntermediateVT, V, CallConv); } else if (NumParts > 0) { // If the intermediate type was expanded, build the intermediate // operands from the parts. assert(NumParts % NumIntermediates == 0 && "Must expand into a divisible number of parts!"); unsigned Factor = NumParts / NumIntermediates; for (unsigned i = 0; i != NumIntermediates; ++i) Ops[i] = getCopyFromParts(DAG, DL, &Parts[i * Factor], Factor, PartVT, IntermediateVT, V, CallConv); } // Build a vector with BUILD_VECTOR or CONCAT_VECTORS from the // intermediate operands. EVT BuiltVectorTy = IntermediateVT.isVector() ? EVT::getVectorVT( *DAG.getContext(), IntermediateVT.getScalarType(), IntermediateVT.getVectorElementCount() * NumParts) : EVT::getVectorVT(*DAG.getContext(), IntermediateVT.getScalarType(), NumIntermediates); Val = DAG.getNode(IntermediateVT.isVector() ? ISD::CONCAT_VECTORS : ISD::BUILD_VECTOR, DL, BuiltVectorTy, Ops); } // There is now one part, held in Val. Correct it to match ValueVT. EVT PartEVT = Val.getValueType(); if (PartEVT == ValueVT) return Val; if (PartEVT.isVector()) { // If the element type of the source/dest vectors are the same, but the // parts vector has more elements than the value vector, then we have a // vector widening case (e.g. <2 x float> -> <4 x float>). Extract the // elements we want. if (PartEVT.getVectorElementType() == ValueVT.getVectorElementType()) { assert((PartEVT.getVectorElementCount().Min > ValueVT.getVectorElementCount().Min) && (PartEVT.getVectorElementCount().Scalable == ValueVT.getVectorElementCount().Scalable) && "Cannot narrow, it would be a lossy transformation"); return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ValueVT, Val, DAG.getVectorIdxConstant(0, DL)); } // Vector/Vector bitcast. 
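// For example (a sketch): a v2i32 register part feeding a v4i16 value has
// matching total size but different element types, so a plain BITCAST is
// enough here.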
if (ValueVT.getSizeInBits() == PartEVT.getSizeInBits()) return DAG.getNode(ISD::BITCAST, DL, ValueVT, Val); assert(PartEVT.getVectorElementCount() == ValueVT.getVectorElementCount() && "Cannot handle this kind of promotion"); // Promoted vector extract return DAG.getAnyExtOrTrunc(Val, DL, ValueVT); } // Trivial bitcast if the types are the same size and the destination // vector type is legal. if (PartEVT.getSizeInBits() == ValueVT.getSizeInBits() && TLI.isTypeLegal(ValueVT)) return DAG.getNode(ISD::BITCAST, DL, ValueVT, Val); if (ValueVT.getVectorNumElements() != 1) { // Certain ABIs require that vectors are passed as integers. For vectors // are the same size, this is an obvious bitcast. if (ValueVT.getSizeInBits() == PartEVT.getSizeInBits()) { return DAG.getNode(ISD::BITCAST, DL, ValueVT, Val); } else if (ValueVT.getSizeInBits() < PartEVT.getSizeInBits()) { // Bitcast Val back the original type and extract the corresponding // vector we want. unsigned Elts = PartEVT.getSizeInBits() / ValueVT.getScalarSizeInBits(); EVT WiderVecType = EVT::getVectorVT(*DAG.getContext(), ValueVT.getVectorElementType(), Elts); Val = DAG.getBitcast(WiderVecType, Val); return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ValueVT, Val, DAG.getVectorIdxConstant(0, DL)); } diagnosePossiblyInvalidConstraint( *DAG.getContext(), V, "non-trivial scalar-to-vector conversion"); return DAG.getUNDEF(ValueVT); } // Handle cases such as i8 -> <1 x i1> EVT ValueSVT = ValueVT.getVectorElementType(); if (ValueVT.getVectorNumElements() == 1 && ValueSVT != PartEVT) { if (ValueSVT.getSizeInBits() == PartEVT.getSizeInBits()) Val = DAG.getNode(ISD::BITCAST, DL, ValueSVT, Val); else Val = ValueVT.isFloatingPoint() ? DAG.getFPExtendOrRound(Val, DL, ValueSVT) : DAG.getAnyExtOrTrunc(Val, DL, ValueSVT); } return DAG.getBuildVector(ValueVT, DL, Val); } static void getCopyToPartsVector(SelectionDAG &DAG, const SDLoc &dl, SDValue Val, SDValue *Parts, unsigned NumParts, MVT PartVT, const Value *V, Optional CallConv); /// getCopyToParts - Create a series of nodes that contain the specified value /// split into legal parts. If the parts contain more bits than Val, then, for /// integers, ExtendKind can be used to specify how to generate the extra bits. static void getCopyToParts(SelectionDAG &DAG, const SDLoc &DL, SDValue Val, SDValue *Parts, unsigned NumParts, MVT PartVT, const Value *V, Optional CallConv = None, ISD::NodeType ExtendKind = ISD::ANY_EXTEND) { // Let the target split the parts if it wants to const TargetLowering &TLI = DAG.getTargetLoweringInfo(); if (TLI.splitValueIntoRegisterParts(DAG, DL, Val, Parts, NumParts, PartVT, CallConv)) return; EVT ValueVT = Val.getValueType(); // Handle the vector case separately. if (ValueVT.isVector()) return getCopyToPartsVector(DAG, DL, Val, Parts, NumParts, PartVT, V, CallConv); unsigned PartBits = PartVT.getSizeInBits(); unsigned OrigNumParts = NumParts; assert(DAG.getTargetLoweringInfo().isTypeLegal(PartVT) && "Copying to an illegal type!"); if (NumParts == 0) return; assert(!ValueVT.isVector() && "Vector case handled elsewhere"); EVT PartEVT = PartVT; if (PartEVT == ValueVT) { assert(NumParts == 1 && "No-op copy with multiple parts!"); Parts[0] = Val; return; } if (NumParts * PartBits > ValueVT.getSizeInBits()) { // If the parts cover more bits than the value has, promote the value. 
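// For example (a sketch): an f16 value copied into a single f32 part takes
// the FP_EXTEND path below, while an f32 value copied into one i64 part is
// first bitcast to i32 and then widened with ExtendKind (x86mmx parts get a
// final bitcast back from the integer type).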
if (PartVT.isFloatingPoint() && ValueVT.isFloatingPoint()) { assert(NumParts == 1 && "Do not know what to promote to!"); Val = DAG.getNode(ISD::FP_EXTEND, DL, PartVT, Val); } else { if (ValueVT.isFloatingPoint()) { // FP values need to be bitcast, then extended if they are being put // into a larger container. ValueVT = EVT::getIntegerVT(*DAG.getContext(), ValueVT.getSizeInBits()); Val = DAG.getNode(ISD::BITCAST, DL, ValueVT, Val); } assert((PartVT.isInteger() || PartVT == MVT::x86mmx) && ValueVT.isInteger() && "Unknown mismatch!"); ValueVT = EVT::getIntegerVT(*DAG.getContext(), NumParts * PartBits); Val = DAG.getNode(ExtendKind, DL, ValueVT, Val); if (PartVT == MVT::x86mmx) Val = DAG.getNode(ISD::BITCAST, DL, PartVT, Val); } } else if (PartBits == ValueVT.getSizeInBits()) { // Different types of the same size. assert(NumParts == 1 && PartEVT != ValueVT); Val = DAG.getNode(ISD::BITCAST, DL, PartVT, Val); } else if (NumParts * PartBits < ValueVT.getSizeInBits()) { // If the parts cover less bits than value has, truncate the value. assert((PartVT.isInteger() || PartVT == MVT::x86mmx) && ValueVT.isInteger() && "Unknown mismatch!"); ValueVT = EVT::getIntegerVT(*DAG.getContext(), NumParts * PartBits); Val = DAG.getNode(ISD::TRUNCATE, DL, ValueVT, Val); if (PartVT == MVT::x86mmx) Val = DAG.getNode(ISD::BITCAST, DL, PartVT, Val); } // The value may have changed - recompute ValueVT. ValueVT = Val.getValueType(); assert(NumParts * PartBits == ValueVT.getSizeInBits() && "Failed to tile the value with PartVT!"); if (NumParts == 1) { if (PartEVT != ValueVT) { diagnosePossiblyInvalidConstraint(*DAG.getContext(), V, "scalar-to-vector conversion failed"); Val = DAG.getNode(ISD::BITCAST, DL, PartVT, Val); } Parts[0] = Val; return; } // Expand the value into multiple parts. if (NumParts & (NumParts - 1)) { // The number of parts is not a power of 2. Split off and copy the tail. assert(PartVT.isInteger() && ValueVT.isInteger() && "Do not know what to expand to!"); unsigned RoundParts = 1 << Log2_32(NumParts); unsigned RoundBits = RoundParts * PartBits; unsigned OddParts = NumParts - RoundParts; SDValue OddVal = DAG.getNode(ISD::SRL, DL, ValueVT, Val, DAG.getShiftAmountConstant(RoundBits, ValueVT, DL, /*LegalTypes*/false)); getCopyToParts(DAG, DL, OddVal, Parts + RoundParts, OddParts, PartVT, V, CallConv); if (DAG.getDataLayout().isBigEndian()) // The odd parts were reversed by getCopyToParts - unreverse them. std::reverse(Parts + RoundParts, Parts + NumParts); NumParts = RoundParts; ValueVT = EVT::getIntegerVT(*DAG.getContext(), NumParts * PartBits); Val = DAG.getNode(ISD::TRUNCATE, DL, ValueVT, Val); } // The number of parts is a power of 2. Repeatedly bisect the value using // EXTRACT_ELEMENT. 
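// A worked example (sketch): an i128 value copied into four i32 parts is
// bitcast to a 128-bit integer, split by EXTRACT_ELEMENT into two i64 halves,
// and each half is split again into two i32 quarters; big-endian targets then
// reverse the part order.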
Parts[0] = DAG.getNode(ISD::BITCAST, DL, EVT::getIntegerVT(*DAG.getContext(), ValueVT.getSizeInBits()), Val); for (unsigned StepSize = NumParts; StepSize > 1; StepSize /= 2) { for (unsigned i = 0; i < NumParts; i += StepSize) { unsigned ThisBits = StepSize * PartBits / 2; EVT ThisVT = EVT::getIntegerVT(*DAG.getContext(), ThisBits); SDValue &Part0 = Parts[i]; SDValue &Part1 = Parts[i+StepSize/2]; Part1 = DAG.getNode(ISD::EXTRACT_ELEMENT, DL, ThisVT, Part0, DAG.getIntPtrConstant(1, DL)); Part0 = DAG.getNode(ISD::EXTRACT_ELEMENT, DL, ThisVT, Part0, DAG.getIntPtrConstant(0, DL)); if (ThisBits == PartBits && ThisVT != PartVT) { Part0 = DAG.getNode(ISD::BITCAST, DL, PartVT, Part0); Part1 = DAG.getNode(ISD::BITCAST, DL, PartVT, Part1); } } } if (DAG.getDataLayout().isBigEndian()) std::reverse(Parts, Parts + OrigNumParts); } static SDValue widenVectorToPartType(SelectionDAG &DAG, SDValue Val, const SDLoc &DL, EVT PartVT) { if (!PartVT.isFixedLengthVector()) return SDValue(); EVT ValueVT = Val.getValueType(); unsigned PartNumElts = PartVT.getVectorNumElements(); unsigned ValueNumElts = ValueVT.getVectorNumElements(); if (PartNumElts > ValueNumElts && PartVT.getVectorElementType() == ValueVT.getVectorElementType()) { EVT ElementVT = PartVT.getVectorElementType(); // Vector widening case, e.g. <2 x float> -> <4 x float>. Shuffle in // undef elements. SmallVector Ops; DAG.ExtractVectorElements(Val, Ops); SDValue EltUndef = DAG.getUNDEF(ElementVT); for (unsigned i = ValueNumElts, e = PartNumElts; i != e; ++i) Ops.push_back(EltUndef); // FIXME: Use CONCAT for 2x -> 4x. return DAG.getBuildVector(PartVT, DL, Ops); } return SDValue(); } /// getCopyToPartsVector - Create a series of nodes that contain the specified /// value split into legal parts. static void getCopyToPartsVector(SelectionDAG &DAG, const SDLoc &DL, SDValue Val, SDValue *Parts, unsigned NumParts, MVT PartVT, const Value *V, Optional CallConv) { EVT ValueVT = Val.getValueType(); assert(ValueVT.isVector() && "Not a vector"); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); const bool IsABIRegCopy = CallConv.hasValue(); if (NumParts == 1) { EVT PartEVT = PartVT; if (PartEVT == ValueVT) { // Nothing to do. } else if (PartVT.getSizeInBits() == ValueVT.getSizeInBits()) { // Bitconvert vector->vector case. Val = DAG.getNode(ISD::BITCAST, DL, PartVT, Val); } else if (SDValue Widened = widenVectorToPartType(DAG, Val, DL, PartVT)) { Val = Widened; } else if (PartVT.isVector() && PartEVT.getVectorElementType().bitsGE( ValueVT.getVectorElementType()) && PartEVT.getVectorElementCount() == ValueVT.getVectorElementCount()) { // Promoted vector extract Val = DAG.getAnyExtOrTrunc(Val, DL, PartVT); } else { if (ValueVT.getVectorNumElements() == 1) { Val = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, PartVT, Val, DAG.getVectorIdxConstant(0, DL)); } else { assert(PartVT.getSizeInBits() > ValueVT.getSizeInBits() && "lossy conversion of vector to scalar type"); EVT IntermediateType = EVT::getIntegerVT(*DAG.getContext(), ValueVT.getSizeInBits()); Val = DAG.getBitcast(IntermediateType, Val); Val = DAG.getAnyExtOrTrunc(Val, DL, PartVT); } } assert(Val.getValueType() == PartVT && "Unexpected vector part value type"); Parts[0] = Val; return; } // Handle a multi-element vector. 
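// For example (a hedged sketch): a target might break a v4i64 value into
// NumIntermediates = 2 pieces of IntermediateVT v2i64; each piece is pulled
// out with EXTRACT_SUBVECTOR below and handed to getCopyToParts to form the
// per-register parts.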
EVT IntermediateVT; MVT RegisterVT; unsigned NumIntermediates; unsigned NumRegs; if (IsABIRegCopy) { NumRegs = TLI.getVectorTypeBreakdownForCallingConv( *DAG.getContext(), CallConv.getValue(), ValueVT, IntermediateVT, NumIntermediates, RegisterVT); } else { NumRegs = TLI.getVectorTypeBreakdown(*DAG.getContext(), ValueVT, IntermediateVT, NumIntermediates, RegisterVT); } assert(NumRegs == NumParts && "Part count doesn't match vector breakdown!"); NumParts = NumRegs; // Silence a compiler warning. assert(RegisterVT == PartVT && "Part type doesn't match vector breakdown!"); assert(IntermediateVT.isScalableVector() == ValueVT.isScalableVector() && "Mixing scalable and fixed vectors when copying in parts"); ElementCount DestEltCnt; if (IntermediateVT.isVector()) DestEltCnt = IntermediateVT.getVectorElementCount() * NumIntermediates; else DestEltCnt = ElementCount(NumIntermediates, false); EVT BuiltVectorTy = EVT::getVectorVT( *DAG.getContext(), IntermediateVT.getScalarType(), DestEltCnt); if (ValueVT != BuiltVectorTy) { if (SDValue Widened = widenVectorToPartType(DAG, Val, DL, BuiltVectorTy)) Val = Widened; Val = DAG.getNode(ISD::BITCAST, DL, BuiltVectorTy, Val); } // Split the vector into intermediate operands. SmallVector Ops(NumIntermediates); for (unsigned i = 0; i != NumIntermediates; ++i) { if (IntermediateVT.isVector()) { // This does something sensible for scalable vectors - see the // definition of EXTRACT_SUBVECTOR for further details. unsigned IntermediateNumElts = IntermediateVT.getVectorMinNumElements(); Ops[i] = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, IntermediateVT, Val, DAG.getVectorIdxConstant(i * IntermediateNumElts, DL)); } else { Ops[i] = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, IntermediateVT, Val, DAG.getVectorIdxConstant(i, DL)); } } // Split the intermediate operands into legal parts. if (NumParts == NumIntermediates) { // If the register was not expanded, promote or copy the value, // as appropriate. for (unsigned i = 0; i != NumParts; ++i) getCopyToParts(DAG, DL, Ops[i], &Parts[i], 1, PartVT, V, CallConv); } else if (NumParts > 0) { // If the intermediate type was expanded, split each the value into // legal parts. assert(NumIntermediates != 0 && "division by zero"); assert(NumParts % NumIntermediates == 0 && "Must expand into a divisible number of parts!"); unsigned Factor = NumParts / NumIntermediates; for (unsigned i = 0; i != NumIntermediates; ++i) getCopyToParts(DAG, DL, Ops[i], &Parts[i * Factor], Factor, PartVT, V, CallConv); } } RegsForValue::RegsForValue(const SmallVector ®s, MVT regvt, EVT valuevt, Optional CC) : ValueVTs(1, valuevt), RegVTs(1, regvt), Regs(regs), RegCount(1, regs.size()), CallConv(CC) {} RegsForValue::RegsForValue(LLVMContext &Context, const TargetLowering &TLI, const DataLayout &DL, unsigned Reg, Type *Ty, Optional CC) { ComputeValueVTs(TLI, DL, Ty, ValueVTs); CallConv = CC; for (EVT ValueVT : ValueVTs) { unsigned NumRegs = isABIMangled() ? TLI.getNumRegistersForCallingConv(Context, CC.getValue(), ValueVT) : TLI.getNumRegisters(Context, ValueVT); MVT RegisterVT = isABIMangled() ? TLI.getRegisterTypeForCallingConv(Context, CC.getValue(), ValueVT) : TLI.getRegisterType(Context, ValueVT); for (unsigned i = 0; i != NumRegs; ++i) Regs.push_back(Reg + i); RegVTs.push_back(RegisterVT); RegCount.push_back(NumRegs); Reg += NumRegs; } } SDValue RegsForValue::getCopyFromRegs(SelectionDAG &DAG, FunctionLoweringInfo &FuncInfo, const SDLoc &dl, SDValue &Chain, SDValue *Flag, const Value *V) const { // A Value with type {} or [0 x %t] needs no registers. 
if (ValueVTs.empty()) return SDValue(); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); // Assemble the legal parts into the final values. SmallVector Values(ValueVTs.size()); SmallVector Parts; for (unsigned Value = 0, Part = 0, e = ValueVTs.size(); Value != e; ++Value) { // Copy the legal parts from the registers. EVT ValueVT = ValueVTs[Value]; unsigned NumRegs = RegCount[Value]; MVT RegisterVT = isABIMangled() ? TLI.getRegisterTypeForCallingConv( *DAG.getContext(), CallConv.getValue(), RegVTs[Value]) : RegVTs[Value]; Parts.resize(NumRegs); for (unsigned i = 0; i != NumRegs; ++i) { SDValue P; if (!Flag) { P = DAG.getCopyFromReg(Chain, dl, Regs[Part+i], RegisterVT); } else { P = DAG.getCopyFromReg(Chain, dl, Regs[Part+i], RegisterVT, *Flag); *Flag = P.getValue(2); } Chain = P.getValue(1); Parts[i] = P; // If the source register was virtual and if we know something about it, // add an assert node. if (!Register::isVirtualRegister(Regs[Part + i]) || !RegisterVT.isInteger()) continue; const FunctionLoweringInfo::LiveOutInfo *LOI = FuncInfo.GetLiveOutRegInfo(Regs[Part+i]); if (!LOI) continue; unsigned RegSize = RegisterVT.getScalarSizeInBits(); unsigned NumSignBits = LOI->NumSignBits; unsigned NumZeroBits = LOI->Known.countMinLeadingZeros(); if (NumZeroBits == RegSize) { // The current value is a zero. // Explicitly express that as it would be easier for // optimizations to kick in. Parts[i] = DAG.getConstant(0, dl, RegisterVT); continue; } // FIXME: We capture more information than the dag can represent. For // now, just use the tightest assertzext/assertsext possible. bool isSExt; EVT FromVT(MVT::Other); if (NumZeroBits) { FromVT = EVT::getIntegerVT(*DAG.getContext(), RegSize - NumZeroBits); isSExt = false; } else if (NumSignBits > 1) { FromVT = EVT::getIntegerVT(*DAG.getContext(), RegSize - NumSignBits + 1); isSExt = true; } else { continue; } // Add an assertion node. assert(FromVT != MVT::Other); Parts[i] = DAG.getNode(isSExt ? ISD::AssertSext : ISD::AssertZext, dl, RegisterVT, P, DAG.getValueType(FromVT)); } Values[Value] = getCopyFromParts(DAG, dl, Parts.begin(), NumRegs, RegisterVT, ValueVT, V, CallConv); Part += NumRegs; Parts.clear(); } return DAG.getNode(ISD::MERGE_VALUES, dl, DAG.getVTList(ValueVTs), Values); } void RegsForValue::getCopyToRegs(SDValue Val, SelectionDAG &DAG, const SDLoc &dl, SDValue &Chain, SDValue *Flag, const Value *V, ISD::NodeType PreferredExtendType) const { const TargetLowering &TLI = DAG.getTargetLoweringInfo(); ISD::NodeType ExtendKind = PreferredExtendType; // Get the list of the values's legal parts. unsigned NumRegs = Regs.size(); SmallVector Parts(NumRegs); for (unsigned Value = 0, Part = 0, e = ValueVTs.size(); Value != e; ++Value) { unsigned NumParts = RegCount[Value]; MVT RegisterVT = isABIMangled() ? TLI.getRegisterTypeForCallingConv( *DAG.getContext(), CallConv.getValue(), RegVTs[Value]) : RegVTs[Value]; if (ExtendKind == ISD::ANY_EXTEND && TLI.isZExtFree(Val, RegisterVT)) ExtendKind = ISD::ZERO_EXTEND; getCopyToParts(DAG, dl, Val.getValue(Val.getResNo() + Value), &Parts[Part], NumParts, RegisterVT, V, CallConv, ExtendKind); Part += NumParts; } // Copy the parts into the registers. 
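// Illustrative example (hypothetical 32-bit target): a single i64 value has
// RegCount[0] == 2, so getCopyToParts above fills Parts[0..1] with the two
// i32 halves, and the loop below emits one CopyToReg per part, optionally
// glued together through *Flag.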
  SmallVector<SDValue, 8> Chains(NumRegs);
  for (unsigned i = 0; i != NumRegs; ++i) {
    SDValue Part;
    if (!Flag) {
      Part = DAG.getCopyToReg(Chain, dl, Regs[i], Parts[i]);
    } else {
      Part = DAG.getCopyToReg(Chain, dl, Regs[i], Parts[i], *Flag);
      *Flag = Part.getValue(1);
    }

    Chains[i] = Part.getValue(0);
  }

  if (NumRegs == 1 || Flag)
    // If NumRegs > 1 && Flag is used then the use of the last CopyToReg is
    // flagged to it. That is the CopyToReg nodes and the user are considered
    // a single scheduling unit. If we create a TokenFactor and return it as
    // chain, then the TokenFactor is both a predecessor (operand) of the
    // user as well as a successor (the TF operands are flagged to the user).
    // c1, f1 = CopyToReg
    // c2, f2 = CopyToReg
    // c3 = TokenFactor c1, c2
    // ...
    // = op c3, ..., f2
    Chain = Chains[NumRegs-1];
  else
    Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Chains);
}

void RegsForValue::AddInlineAsmOperands(unsigned Code, bool HasMatching,
                                        unsigned MatchingIdx, const SDLoc &dl,
                                        SelectionDAG &DAG,
                                        std::vector<SDValue> &Ops) const {
  const TargetLowering &TLI = DAG.getTargetLoweringInfo();

  unsigned Flag = InlineAsm::getFlagWord(Code, Regs.size());
  if (HasMatching)
    Flag = InlineAsm::getFlagWordForMatchingOp(Flag, MatchingIdx);
  else if (!Regs.empty() && Register::isVirtualRegister(Regs.front())) {
    // Put the register class of the virtual registers in the flag word. That
    // way, later passes can recompute register class constraints for inline
    // assembly as well as normal instructions.
    // Don't do this for tied operands that can use the regclass information
    // from the def.
    const MachineRegisterInfo &MRI = DAG.getMachineFunction().getRegInfo();
    const TargetRegisterClass *RC = MRI.getRegClass(Regs.front());
    Flag = InlineAsm::getFlagWordForRegClass(Flag, RC->getID());
  }

  SDValue Res = DAG.getTargetConstant(Flag, dl, MVT::i32);
  Ops.push_back(Res);

  if (Code == InlineAsm::Kind_Clobber) {
    // Clobbers should always have a 1:1 mapping with registers, and may
    // reference registers that have illegal (e.g. vector) types. Hence, we
    // shouldn't try to apply any sort of splitting logic to them.
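    // For example, a clobber written in the constraint string as "~{xmm0}"
    // (target specific, shown only for illustration) reaches this path: each
    // clobbered physical register becomes exactly one register operand below,
    // even when the register's natural type would be illegal for the target.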
assert(Regs.size() == RegVTs.size() && Regs.size() == ValueVTs.size() && "No 1:1 mapping from clobbers to regs?"); unsigned SP = TLI.getStackPointerRegisterToSaveRestore(); (void)SP; for (unsigned I = 0, E = ValueVTs.size(); I != E; ++I) { Ops.push_back(DAG.getRegister(Regs[I], RegVTs[I])); assert( (Regs[I] != SP || DAG.getMachineFunction().getFrameInfo().hasOpaqueSPAdjustment()) && "If we clobbered the stack pointer, MFI should know about it."); } return; } for (unsigned Value = 0, Reg = 0, e = ValueVTs.size(); Value != e; ++Value) { unsigned NumRegs = TLI.getNumRegisters(*DAG.getContext(), ValueVTs[Value]); MVT RegisterVT = RegVTs[Value]; for (unsigned i = 0; i != NumRegs; ++i) { assert(Reg < Regs.size() && "Mismatch in # registers expected"); unsigned TheReg = Regs[Reg++]; Ops.push_back(DAG.getRegister(TheReg, RegisterVT)); } } } SmallVector, 4> RegsForValue::getRegsAndSizes() const { SmallVector, 4> OutVec; unsigned I = 0; for (auto CountAndVT : zip_first(RegCount, RegVTs)) { unsigned RegCount = std::get<0>(CountAndVT); MVT RegisterVT = std::get<1>(CountAndVT); unsigned RegisterSize = RegisterVT.getSizeInBits(); for (unsigned E = I + RegCount; I != E; ++I) OutVec.push_back(std::make_pair(Regs[I], RegisterSize)); } return OutVec; } void SelectionDAGBuilder::init(GCFunctionInfo *gfi, AliasAnalysis *aa, const TargetLibraryInfo *li) { AA = aa; GFI = gfi; LibInfo = li; DL = &DAG.getDataLayout(); Context = DAG.getContext(); LPadToCallSiteMap.clear(); SL->init(DAG.getTargetLoweringInfo(), TM, DAG.getDataLayout()); } void SelectionDAGBuilder::clear() { NodeMap.clear(); UnusedArgNodeMap.clear(); PendingLoads.clear(); PendingExports.clear(); PendingConstrainedFP.clear(); PendingConstrainedFPStrict.clear(); CurInst = nullptr; HasTailCall = false; SDNodeOrder = LowestSDNodeOrder; StatepointLowering.clear(); } void SelectionDAGBuilder::clearDanglingDebugInfo() { DanglingDebugInfoMap.clear(); } // Update DAG root to include dependencies on Pending chains. SDValue SelectionDAGBuilder::updateRoot(SmallVectorImpl &Pending) { SDValue Root = DAG.getRoot(); if (Pending.empty()) return Root; // Add current root to PendingChains, unless we already indirectly // depend on it. if (Root.getOpcode() != ISD::EntryToken) { unsigned i = 0, e = Pending.size(); for (; i != e; ++i) { assert(Pending[i].getNode()->getNumOperands() > 1); if (Pending[i].getNode()->getOperand(0) == Root) break; // Don't add the root if we already indirectly depend on it. } if (i == e) Pending.push_back(Root); } if (Pending.size() == 1) Root = Pending[0]; else Root = DAG.getTokenFactor(getCurSDLoc(), Pending); DAG.setRoot(Root); Pending.clear(); return Root; } SDValue SelectionDAGBuilder::getMemoryRoot() { return updateRoot(PendingLoads); } SDValue SelectionDAGBuilder::getRoot() { // Chain up all pending constrained intrinsics together with all // pending loads, by simply appending them to PendingLoads and // then calling getMemoryRoot(). PendingLoads.reserve(PendingLoads.size() + PendingConstrainedFP.size() + PendingConstrainedFPStrict.size()); PendingLoads.append(PendingConstrainedFP.begin(), PendingConstrainedFP.end()); PendingLoads.append(PendingConstrainedFPStrict.begin(), PendingConstrainedFPStrict.end()); PendingConstrainedFP.clear(); PendingConstrainedFPStrict.clear(); return getMemoryRoot(); } SDValue SelectionDAGBuilder::getControlRoot() { // We need to emit pending fpexcept.strict constrained intrinsics, // so append them to the PendingExports list. 
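// In other words, nodes on PendingConstrainedFP only need to be ordered
// through the memory chain (see getRoot above), whereas fpexcept.strict nodes
// may trap and must also be ordered before control transfers, so they are
// additionally appended to PendingExports here.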
PendingExports.append(PendingConstrainedFPStrict.begin(), PendingConstrainedFPStrict.end()); PendingConstrainedFPStrict.clear(); return updateRoot(PendingExports); } void SelectionDAGBuilder::visit(const Instruction &I) { // Set up outgoing PHI node register values before emitting the terminator. if (I.isTerminator()) { HandlePHINodesInSuccessorBlocks(I.getParent()); } // Increase the SDNodeOrder if dealing with a non-debug instruction. if (!isa(I)) ++SDNodeOrder; CurInst = &I; visit(I.getOpcode(), I); if (auto *FPMO = dyn_cast(&I)) { // ConstrainedFPIntrinsics handle their own FMF. if (!isa(&I)) { // Propagate the fast-math-flags of this IR instruction to the DAG node that // maps to this instruction. // TODO: We could handle all flags (nsw, etc) here. // TODO: If an IR instruction maps to >1 node, only the final node will have // flags set. if (SDNode *Node = getNodeForIRValue(&I)) { SDNodeFlags IncomingFlags; IncomingFlags.copyFMF(*FPMO); if (!Node->getFlags().isDefined()) Node->setFlags(IncomingFlags); else Node->intersectFlagsWith(IncomingFlags); } } } if (!I.isTerminator() && !HasTailCall && !isa(I)) // statepoints handle their exports internally CopyToExportRegsIfNeeded(&I); CurInst = nullptr; } void SelectionDAGBuilder::visitPHI(const PHINode &) { llvm_unreachable("SelectionDAGBuilder shouldn't visit PHI nodes!"); } void SelectionDAGBuilder::visit(unsigned Opcode, const User &I) { // Note: this doesn't use InstVisitor, because it has to work with // ConstantExpr's in addition to instructions. switch (Opcode) { default: llvm_unreachable("Unknown instruction type encountered!"); // Build the switch statement using the Instruction.def file. #define HANDLE_INST(NUM, OPCODE, CLASS) \ case Instruction::OPCODE: visit##OPCODE((const CLASS&)I); break; #include "llvm/IR/Instruction.def" } } void SelectionDAGBuilder::dropDanglingDebugInfo(const DILocalVariable *Variable, const DIExpression *Expr) { auto isMatchingDbgValue = [&](DanglingDebugInfo &DDI) { const DbgValueInst *DI = DDI.getDI(); DIVariable *DanglingVariable = DI->getVariable(); DIExpression *DanglingExpr = DI->getExpression(); if (DanglingVariable == Variable && Expr->fragmentsOverlap(DanglingExpr)) { LLVM_DEBUG(dbgs() << "Dropping dangling debug info for " << *DI << "\n"); return true; } return false; }; for (auto &DDIMI : DanglingDebugInfoMap) { DanglingDebugInfoVector &DDIV = DDIMI.second; // If debug info is to be dropped, run it through final checks to see // whether it can be salvaged. for (auto &DDI : DDIV) if (isMatchingDbgValue(DDI)) salvageUnresolvedDbgValue(DDI); DDIV.erase(remove_if(DDIV, isMatchingDbgValue), DDIV.end()); } } // resolveDanglingDebugInfo - if we saw an earlier dbg_value referring to V, // generate the debug data structures now that we've seen its definition. 
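// Example of a dangling dbg.value: because dbg.value operands are wrapped in
// metadata, the intrinsic may name a value that is lowered only later, e.g.
//   call void @llvm.dbg.value(metadata i32 %x, ...)
//   %x = add i32 %a, %b
// When the add is finally visited, the entry recorded under %x in
// DanglingDebugInfoMap is resolved here against its new SDValue.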
void SelectionDAGBuilder::resolveDanglingDebugInfo(const Value *V, SDValue Val) { auto DanglingDbgInfoIt = DanglingDebugInfoMap.find(V); if (DanglingDbgInfoIt == DanglingDebugInfoMap.end()) return; DanglingDebugInfoVector &DDIV = DanglingDbgInfoIt->second; for (auto &DDI : DDIV) { const DbgValueInst *DI = DDI.getDI(); assert(DI && "Ill-formed DanglingDebugInfo"); DebugLoc dl = DDI.getdl(); unsigned ValSDNodeOrder = Val.getNode()->getIROrder(); unsigned DbgSDNodeOrder = DDI.getSDNodeOrder(); DILocalVariable *Variable = DI->getVariable(); DIExpression *Expr = DI->getExpression(); assert(Variable->isValidLocationForIntrinsic(dl) && "Expected inlined-at fields to agree"); SDDbgValue *SDV; if (Val.getNode()) { // FIXME: I doubt that it is correct to resolve a dangling DbgValue as a // FuncArgumentDbgValue (it would be hoisted to the function entry, and if // we couldn't resolve it directly when examining the DbgValue intrinsic // in the first place we should not be more successful here). Unless we // have some test case that prove this to be correct we should avoid // calling EmitFuncArgumentDbgValue here. if (!EmitFuncArgumentDbgValue(V, Variable, Expr, dl, false, Val)) { LLVM_DEBUG(dbgs() << "Resolve dangling debug info [order=" << DbgSDNodeOrder << "] for:\n " << *DI << "\n"); LLVM_DEBUG(dbgs() << " By mapping to:\n "; Val.dump()); // Increase the SDNodeOrder for the DbgValue here to make sure it is // inserted after the definition of Val when emitting the instructions // after ISel. An alternative could be to teach // ScheduleDAGSDNodes::EmitSchedule to delay the insertion properly. LLVM_DEBUG(if (ValSDNodeOrder > DbgSDNodeOrder) dbgs() << "changing SDNodeOrder from " << DbgSDNodeOrder << " to " << ValSDNodeOrder << "\n"); SDV = getDbgValue(Val, Variable, Expr, dl, std::max(DbgSDNodeOrder, ValSDNodeOrder)); DAG.AddDbgValue(SDV, Val.getNode(), false); } else LLVM_DEBUG(dbgs() << "Resolved dangling debug info for " << *DI << "in EmitFuncArgumentDbgValue\n"); } else { LLVM_DEBUG(dbgs() << "Dropping debug info for " << *DI << "\n"); auto Undef = UndefValue::get(DDI.getDI()->getVariableLocation()->getType()); auto SDV = DAG.getConstantDbgValue(Variable, Expr, Undef, dl, DbgSDNodeOrder); DAG.AddDbgValue(SDV, nullptr, false); } } DDIV.clear(); } void SelectionDAGBuilder::salvageUnresolvedDbgValue(DanglingDebugInfo &DDI) { Value *V = DDI.getDI()->getValue(); DILocalVariable *Var = DDI.getDI()->getVariable(); DIExpression *Expr = DDI.getDI()->getExpression(); DebugLoc DL = DDI.getdl(); DebugLoc InstDL = DDI.getDI()->getDebugLoc(); unsigned SDOrder = DDI.getSDNodeOrder(); // Currently we consider only dbg.value intrinsics -- we tell the salvager // that DW_OP_stack_value is desired. assert(isa(DDI.getDI())); bool StackValue = true; // Can this Value can be encoded without any further work? if (handleDebugValue(V, Var, Expr, DL, InstDL, SDOrder)) return; // Attempt to salvage back through as many instructions as possible. Bail if // a non-instruction is seen, such as a constant expression or global // variable. FIXME: Further work could recover those too. while (isa(V)) { Instruction &VAsInst = *cast(V); DIExpression *NewExpr = salvageDebugInfoImpl(VAsInst, Expr, StackValue); // If we cannot salvage any further, and haven't yet found a suitable debug // expression, bail out. if (!NewExpr) break; // New value and expr now represent this debuginfo. 
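// For instance (illustrative only), if V were a getelementptr adding a
// constant 4-byte offset and had no SDValue of its own, the salvaged Expr
// would conceptually gain a DW_OP_plus_uconst 4, and the loop retries with
// the GEP's pointer operand (operand 0) as the new V.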
V = VAsInst.getOperand(0); Expr = NewExpr; // Some kind of simplification occurred: check whether the operand of the // salvaged debug expression can be encoded in this DAG. if (handleDebugValue(V, Var, Expr, DL, InstDL, SDOrder)) { LLVM_DEBUG(dbgs() << "Salvaged debug location info for:\n " << DDI.getDI() << "\nBy stripping back to:\n " << V); return; } } // This was the final opportunity to salvage this debug information, and it // couldn't be done. Place an undef DBG_VALUE at this location to terminate // any earlier variable location. auto Undef = UndefValue::get(DDI.getDI()->getVariableLocation()->getType()); auto SDV = DAG.getConstantDbgValue(Var, Expr, Undef, DL, SDNodeOrder); DAG.AddDbgValue(SDV, nullptr, false); LLVM_DEBUG(dbgs() << "Dropping debug value info for:\n " << DDI.getDI() << "\n"); LLVM_DEBUG(dbgs() << " Last seen at:\n " << *DDI.getDI()->getOperand(0) << "\n"); } bool SelectionDAGBuilder::handleDebugValue(const Value *V, DILocalVariable *Var, DIExpression *Expr, DebugLoc dl, DebugLoc InstDL, unsigned Order) { const TargetLowering &TLI = DAG.getTargetLoweringInfo(); SDDbgValue *SDV; if (isa(V) || isa(V) || isa(V) || isa(V)) { SDV = DAG.getConstantDbgValue(Var, Expr, V, dl, SDNodeOrder); DAG.AddDbgValue(SDV, nullptr, false); return true; } // If the Value is a frame index, we can create a FrameIndex debug value // without relying on the DAG at all. if (const AllocaInst *AI = dyn_cast(V)) { auto SI = FuncInfo.StaticAllocaMap.find(AI); if (SI != FuncInfo.StaticAllocaMap.end()) { auto SDV = DAG.getFrameIndexDbgValue(Var, Expr, SI->second, /*IsIndirect*/ false, dl, SDNodeOrder); // Do not attach the SDNodeDbgValue to an SDNode: this variable location // is still available even if the SDNode gets optimized out. DAG.AddDbgValue(SDV, nullptr, false); return true; } } // Do not use getValue() in here; we don't want to generate code at // this point if it hasn't been done yet. SDValue N = NodeMap[V]; if (!N.getNode() && isa(V)) // Check unused arguments map. N = UnusedArgNodeMap[V]; if (N.getNode()) { if (EmitFuncArgumentDbgValue(V, Var, Expr, dl, false, N)) return true; SDV = getDbgValue(N, Var, Expr, dl, SDNodeOrder); DAG.AddDbgValue(SDV, N.getNode(), false); return true; } // Special rules apply for the first dbg.values of parameter variables in a // function. Identify them by the fact they reference Argument Values, that // they're parameters, and they are parameters of the current function. We // need to let them dangle until they get an SDNode. bool IsParamOfFunc = isa(V) && Var->isParameter() && !InstDL.getInlinedAt(); if (!IsParamOfFunc) { // The value is not used in this block yet (or it would have an SDNode). // We still want the value to appear for the user if possible -- if it has // an associated VReg, we can refer to that instead. auto VMI = FuncInfo.ValueMap.find(V); if (VMI != FuncInfo.ValueMap.end()) { unsigned Reg = VMI->second; // If this is a PHI node, it may be split up into several MI PHI nodes // (in FunctionLoweringInfo::set). RegsForValue RFV(V->getContext(), TLI, DAG.getDataLayout(), Reg, V->getType(), None); if (RFV.occupiesMultipleRegs()) { unsigned Offset = 0; unsigned BitsToDescribe = 0; if (auto VarSize = Var->getSizeInBits()) BitsToDescribe = *VarSize; if (auto Fragment = Expr->getFragmentInfo()) BitsToDescribe = Fragment->SizeInBits; for (auto RegAndSize : RFV.getRegsAndSizes()) { unsigned RegisterSize = RegAndSize.second; // Bail out if all bits are described already. 
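// Worked example: a 64-bit variable spread over two 32-bit registers gets two
// fragment dbg_values, (offset 0, 32 bits) and (offset 32, 32 bits); once
// Offset reaches BitsToDescribe, the check below stops emitting fragments.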
if (Offset >= BitsToDescribe) break; unsigned FragmentSize = (Offset + RegisterSize > BitsToDescribe) ? BitsToDescribe - Offset : RegisterSize; auto FragmentExpr = DIExpression::createFragmentExpression( Expr, Offset, FragmentSize); if (!FragmentExpr) continue; SDV = DAG.getVRegDbgValue(Var, *FragmentExpr, RegAndSize.first, false, dl, SDNodeOrder); DAG.AddDbgValue(SDV, nullptr, false); Offset += RegisterSize; } } else { SDV = DAG.getVRegDbgValue(Var, Expr, Reg, false, dl, SDNodeOrder); DAG.AddDbgValue(SDV, nullptr, false); } return true; } } return false; } void SelectionDAGBuilder::resolveOrClearDbgInfo() { // Try to fixup any remaining dangling debug info -- and drop it if we can't. for (auto &Pair : DanglingDebugInfoMap) for (auto &DDI : Pair.second) salvageUnresolvedDbgValue(DDI); clearDanglingDebugInfo(); } /// getCopyFromRegs - If there was virtual register allocated for the value V /// emit CopyFromReg of the specified type Ty. Return empty SDValue() otherwise. SDValue SelectionDAGBuilder::getCopyFromRegs(const Value *V, Type *Ty) { DenseMap::iterator It = FuncInfo.ValueMap.find(V); SDValue Result; if (It != FuncInfo.ValueMap.end()) { Register InReg = It->second; RegsForValue RFV(*DAG.getContext(), DAG.getTargetLoweringInfo(), DAG.getDataLayout(), InReg, Ty, None); // This is not an ABI copy. SDValue Chain = DAG.getEntryNode(); Result = RFV.getCopyFromRegs(DAG, FuncInfo, getCurSDLoc(), Chain, nullptr, V); resolveDanglingDebugInfo(V, Result); } return Result; } /// getValue - Return an SDValue for the given Value. SDValue SelectionDAGBuilder::getValue(const Value *V) { // If we already have an SDValue for this value, use it. It's important // to do this first, so that we don't create a CopyFromReg if we already // have a regular SDValue. SDValue &N = NodeMap[V]; if (N.getNode()) return N; // If there's a virtual register allocated and initialized for this // value, use it. if (SDValue copyFromReg = getCopyFromRegs(V, V->getType())) return copyFromReg; // Otherwise create a new SDValue and remember it. SDValue Val = getValueImpl(V); NodeMap[V] = Val; resolveDanglingDebugInfo(V, Val); return Val; } /// getNonRegisterValue - Return an SDValue for the given Value, but /// don't look in FuncInfo.ValueMap for a virtual register. SDValue SelectionDAGBuilder::getNonRegisterValue(const Value *V) { // If we already have an SDValue for this value, use it. SDValue &N = NodeMap[V]; if (N.getNode()) { if (isa(N) || isa(N)) { // Remove the debug location from the node as the node is about to be used // in a location which may differ from the original debug location. This // is relevant to Constant and ConstantFP nodes because they can appear // as constant expressions inside PHI nodes. N->setDebugLoc(DebugLoc()); } return N; } // Otherwise create a new SDValue and remember it. SDValue Val = getValueImpl(V); NodeMap[V] = Val; resolveDanglingDebugInfo(V, Val); return Val; } /// getValueImpl - Helper function for getValue and getNonRegisterValue. /// Create an SDValue for the given value. 
SDValue SelectionDAGBuilder::getValueImpl(const Value *V) { const TargetLowering &TLI = DAG.getTargetLoweringInfo(); if (const Constant *C = dyn_cast(V)) { EVT VT = TLI.getValueType(DAG.getDataLayout(), V->getType(), true); if (const ConstantInt *CI = dyn_cast(C)) return DAG.getConstant(*CI, getCurSDLoc(), VT); if (const GlobalValue *GV = dyn_cast(C)) return DAG.getGlobalAddress(GV, getCurSDLoc(), VT); if (isa(C)) { unsigned AS = V->getType()->getPointerAddressSpace(); return DAG.getConstant(0, getCurSDLoc(), TLI.getPointerTy(DAG.getDataLayout(), AS)); } if (match(C, m_VScale(DAG.getDataLayout()))) return DAG.getVScale(getCurSDLoc(), VT, APInt(VT.getSizeInBits(), 1)); if (const ConstantFP *CFP = dyn_cast(C)) return DAG.getConstantFP(*CFP, getCurSDLoc(), VT); if (isa(C) && !V->getType()->isAggregateType()) return DAG.getUNDEF(VT); if (const ConstantExpr *CE = dyn_cast(C)) { visit(CE->getOpcode(), *CE); SDValue N1 = NodeMap[V]; assert(N1.getNode() && "visit didn't populate the NodeMap!"); return N1; } if (isa(C) || isa(C)) { SmallVector Constants; for (User::const_op_iterator OI = C->op_begin(), OE = C->op_end(); OI != OE; ++OI) { SDNode *Val = getValue(*OI).getNode(); // If the operand is an empty aggregate, there are no values. if (!Val) continue; // Add each leaf value from the operand to the Constants list // to form a flattened list of all the values. for (unsigned i = 0, e = Val->getNumValues(); i != e; ++i) Constants.push_back(SDValue(Val, i)); } return DAG.getMergeValues(Constants, getCurSDLoc()); } if (const ConstantDataSequential *CDS = dyn_cast(C)) { SmallVector Ops; for (unsigned i = 0, e = CDS->getNumElements(); i != e; ++i) { SDNode *Val = getValue(CDS->getElementAsConstant(i)).getNode(); // Add each leaf value from the operand to the Constants list // to form a flattened list of all the values. for (unsigned i = 0, e = Val->getNumValues(); i != e; ++i) Ops.push_back(SDValue(Val, i)); } if (isa(CDS->getType())) return DAG.getMergeValues(Ops, getCurSDLoc()); return NodeMap[V] = DAG.getBuildVector(VT, getCurSDLoc(), Ops); } if (C->getType()->isStructTy() || C->getType()->isArrayTy()) { assert((isa(C) || isa(C)) && "Unknown struct or array constant!"); SmallVector ValueVTs; ComputeValueVTs(TLI, DAG.getDataLayout(), C->getType(), ValueVTs); unsigned NumElts = ValueVTs.size(); if (NumElts == 0) return SDValue(); // empty struct SmallVector Constants(NumElts); for (unsigned i = 0; i != NumElts; ++i) { EVT EltVT = ValueVTs[i]; if (isa(C)) Constants[i] = DAG.getUNDEF(EltVT); else if (EltVT.isFloatingPoint()) Constants[i] = DAG.getConstantFP(0, getCurSDLoc(), EltVT); else Constants[i] = DAG.getConstant(0, getCurSDLoc(), EltVT); } return DAG.getMergeValues(Constants, getCurSDLoc()); } if (const BlockAddress *BA = dyn_cast(C)) return DAG.getBlockAddress(BA, VT); VectorType *VecTy = cast(V->getType()); // Now that we know the number and type of the elements, get that number of // elements into the Ops array based on what kind of constant it is. 
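// For example, a ConstantVector such as <4 x i32> <i32 1, i32 2, i32 3, i32 4>
// becomes a BUILD_VECTOR of four constant operands below, while a
// zeroinitializer of a scalable type like <vscale x 4 x i32> is lowered to a
// SPLAT_VECTOR of the zero element instead.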
if (const ConstantVector *CV = dyn_cast(C)) { SmallVector Ops; unsigned NumElements = cast(VecTy)->getNumElements(); for (unsigned i = 0; i != NumElements; ++i) Ops.push_back(getValue(CV->getOperand(i))); return NodeMap[V] = DAG.getBuildVector(VT, getCurSDLoc(), Ops); } else if (isa(C)) { EVT EltVT = TLI.getValueType(DAG.getDataLayout(), VecTy->getElementType()); SDValue Op; if (EltVT.isFloatingPoint()) Op = DAG.getConstantFP(0, getCurSDLoc(), EltVT); else Op = DAG.getConstant(0, getCurSDLoc(), EltVT); if (isa(VecTy)) return NodeMap[V] = DAG.getSplatVector(VT, getCurSDLoc(), Op); else { SmallVector Ops; Ops.assign(cast(VecTy)->getNumElements(), Op); return NodeMap[V] = DAG.getBuildVector(VT, getCurSDLoc(), Ops); } } llvm_unreachable("Unknown vector constant"); } // If this is a static alloca, generate it as the frameindex instead of // computation. if (const AllocaInst *AI = dyn_cast(V)) { DenseMap::iterator SI = FuncInfo.StaticAllocaMap.find(AI); if (SI != FuncInfo.StaticAllocaMap.end()) return DAG.getFrameIndex(SI->second, TLI.getFrameIndexTy(DAG.getDataLayout())); } // If this is an instruction which fast-isel has deferred, select it now. if (const Instruction *Inst = dyn_cast(V)) { unsigned InReg = FuncInfo.InitializeRegForValue(Inst); RegsForValue RFV(*DAG.getContext(), TLI, DAG.getDataLayout(), InReg, - Inst->getType(), getABIRegCopyCC(V)); + Inst->getType(), None); SDValue Chain = DAG.getEntryNode(); return RFV.getCopyFromRegs(DAG, FuncInfo, getCurSDLoc(), Chain, nullptr, V); } if (const MetadataAsValue *MD = dyn_cast(V)) { return DAG.getMDNode(cast(MD->getMetadata())); } llvm_unreachable("Can't get register for value!"); } void SelectionDAGBuilder::visitCatchPad(const CatchPadInst &I) { auto Pers = classifyEHPersonality(FuncInfo.Fn->getPersonalityFn()); bool IsMSVCCXX = Pers == EHPersonality::MSVC_CXX; bool IsCoreCLR = Pers == EHPersonality::CoreCLR; bool IsSEH = isAsynchronousEHPersonality(Pers); MachineBasicBlock *CatchPadMBB = FuncInfo.MBB; if (!IsSEH) CatchPadMBB->setIsEHScopeEntry(); // In MSVC C++ and CoreCLR, catchblocks are funclets and need prologues. if (IsMSVCCXX || IsCoreCLR) CatchPadMBB->setIsEHFuncletEntry(); } void SelectionDAGBuilder::visitCatchRet(const CatchReturnInst &I) { // Update machine-CFG edge. MachineBasicBlock *TargetMBB = FuncInfo.MBBMap[I.getSuccessor()]; FuncInfo.MBB->addSuccessor(TargetMBB); auto Pers = classifyEHPersonality(FuncInfo.Fn->getPersonalityFn()); bool IsSEH = isAsynchronousEHPersonality(Pers); if (IsSEH) { // If this is not a fall-through branch or optimizations are switched off, // emit the branch. if (TargetMBB != NextBlock(FuncInfo.MBB) || TM.getOptLevel() == CodeGenOpt::None) DAG.setRoot(DAG.getNode(ISD::BR, getCurSDLoc(), MVT::Other, getControlRoot(), DAG.getBasicBlock(TargetMBB))); return; } // Figure out the funclet membership for the catchret's successor. // This will be used by the FuncletLayout pass to determine how to order the // BB's. // A 'catchret' returns to the outer scope's color. Value *ParentPad = I.getCatchSwitchParentPad(); const BasicBlock *SuccessorColor; if (isa(ParentPad)) SuccessorColor = &FuncInfo.Fn->getEntryBlock(); else SuccessorColor = cast(ParentPad)->getParent(); assert(SuccessorColor && "No parent funclet for catchret!"); MachineBasicBlock *SuccessorColorMBB = FuncInfo.MBBMap[SuccessorColor]; assert(SuccessorColorMBB && "No MBB for SuccessorColor!"); // Create the terminator node. 
SDValue Ret = DAG.getNode(ISD::CATCHRET, getCurSDLoc(), MVT::Other, getControlRoot(), DAG.getBasicBlock(TargetMBB), DAG.getBasicBlock(SuccessorColorMBB)); DAG.setRoot(Ret); } void SelectionDAGBuilder::visitCleanupPad(const CleanupPadInst &CPI) { // Don't emit any special code for the cleanuppad instruction. It just marks // the start of an EH scope/funclet. FuncInfo.MBB->setIsEHScopeEntry(); auto Pers = classifyEHPersonality(FuncInfo.Fn->getPersonalityFn()); if (Pers != EHPersonality::Wasm_CXX) { FuncInfo.MBB->setIsEHFuncletEntry(); FuncInfo.MBB->setIsCleanupFuncletEntry(); } } // For wasm, there's alwyas a single catch pad attached to a catchswitch, and // the control flow always stops at the single catch pad, as it does for a // cleanup pad. In case the exception caught is not of the types the catch pad // catches, it will be rethrown by a rethrow. static void findWasmUnwindDestinations( FunctionLoweringInfo &FuncInfo, const BasicBlock *EHPadBB, BranchProbability Prob, SmallVectorImpl> &UnwindDests) { while (EHPadBB) { const Instruction *Pad = EHPadBB->getFirstNonPHI(); if (isa(Pad)) { // Stop on cleanup pads. UnwindDests.emplace_back(FuncInfo.MBBMap[EHPadBB], Prob); UnwindDests.back().first->setIsEHScopeEntry(); break; } else if (auto *CatchSwitch = dyn_cast(Pad)) { // Add the catchpad handlers to the possible destinations. We don't // continue to the unwind destination of the catchswitch for wasm. for (const BasicBlock *CatchPadBB : CatchSwitch->handlers()) { UnwindDests.emplace_back(FuncInfo.MBBMap[CatchPadBB], Prob); UnwindDests.back().first->setIsEHScopeEntry(); } break; } else { continue; } } } /// When an invoke or a cleanupret unwinds to the next EH pad, there are /// many places it could ultimately go. In the IR, we have a single unwind /// destination, but in the machine CFG, we enumerate all the possible blocks. /// This function skips over imaginary basic blocks that hold catchswitch /// instructions, and finds all the "real" machine /// basic block destinations. As those destinations may not be successors of /// EHPadBB, here we also calculate the edge probability to those destinations. /// The passed-in Prob is the edge probability to EHPadBB. static void findUnwindDestinations( FunctionLoweringInfo &FuncInfo, const BasicBlock *EHPadBB, BranchProbability Prob, SmallVectorImpl> &UnwindDests) { EHPersonality Personality = classifyEHPersonality(FuncInfo.Fn->getPersonalityFn()); bool IsMSVCCXX = Personality == EHPersonality::MSVC_CXX; bool IsCoreCLR = Personality == EHPersonality::CoreCLR; bool IsWasmCXX = Personality == EHPersonality::Wasm_CXX; bool IsSEH = isAsynchronousEHPersonality(Personality); if (IsWasmCXX) { findWasmUnwindDestinations(FuncInfo, EHPadBB, Prob, UnwindDests); assert(UnwindDests.size() <= 1 && "There should be at most one unwind destination for wasm"); return; } while (EHPadBB) { const Instruction *Pad = EHPadBB->getFirstNonPHI(); BasicBlock *NewEHPadBB = nullptr; if (isa(Pad)) { // Stop on landingpads. They are not funclets. UnwindDests.emplace_back(FuncInfo.MBBMap[EHPadBB], Prob); break; } else if (isa(Pad)) { // Stop on cleanup pads. Cleanups are always funclet entries for all known // personalities. UnwindDests.emplace_back(FuncInfo.MBBMap[EHPadBB], Prob); UnwindDests.back().first->setIsEHScopeEntry(); UnwindDests.back().first->setIsEHFuncletEntry(); break; } else if (auto *CatchSwitch = dyn_cast(Pad)) { // Add the catchpad handlers to the possible destinations. 
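// For example, a catchswitch with two catchpad handlers records both handler
// MBBs as possible unwind destinations here; under the MSVC C++ or CoreCLR
// personalities they are additionally marked as funclet entries, and the walk
// then continues to the catchswitch's own unwind destination (if any) with
// the edge probability scaled accordingly.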
for (const BasicBlock *CatchPadBB : CatchSwitch->handlers()) { UnwindDests.emplace_back(FuncInfo.MBBMap[CatchPadBB], Prob); // For MSVC++ and the CLR, catchblocks are funclets and need prologues. if (IsMSVCCXX || IsCoreCLR) UnwindDests.back().first->setIsEHFuncletEntry(); if (!IsSEH) UnwindDests.back().first->setIsEHScopeEntry(); } NewEHPadBB = CatchSwitch->getUnwindDest(); } else { continue; } BranchProbabilityInfo *BPI = FuncInfo.BPI; if (BPI && NewEHPadBB) Prob *= BPI->getEdgeProbability(EHPadBB, NewEHPadBB); EHPadBB = NewEHPadBB; } } void SelectionDAGBuilder::visitCleanupRet(const CleanupReturnInst &I) { // Update successor info. SmallVector, 1> UnwindDests; auto UnwindDest = I.getUnwindDest(); BranchProbabilityInfo *BPI = FuncInfo.BPI; BranchProbability UnwindDestProb = (BPI && UnwindDest) ? BPI->getEdgeProbability(FuncInfo.MBB->getBasicBlock(), UnwindDest) : BranchProbability::getZero(); findUnwindDestinations(FuncInfo, UnwindDest, UnwindDestProb, UnwindDests); for (auto &UnwindDest : UnwindDests) { UnwindDest.first->setIsEHPad(); addSuccessorWithProb(FuncInfo.MBB, UnwindDest.first, UnwindDest.second); } FuncInfo.MBB->normalizeSuccProbs(); // Create the terminator node. SDValue Ret = DAG.getNode(ISD::CLEANUPRET, getCurSDLoc(), MVT::Other, getControlRoot()); DAG.setRoot(Ret); } void SelectionDAGBuilder::visitCatchSwitch(const CatchSwitchInst &CSI) { report_fatal_error("visitCatchSwitch not yet implemented!"); } void SelectionDAGBuilder::visitRet(const ReturnInst &I) { const TargetLowering &TLI = DAG.getTargetLoweringInfo(); auto &DL = DAG.getDataLayout(); SDValue Chain = getControlRoot(); SmallVector Outs; SmallVector OutVals; // Calls to @llvm.experimental.deoptimize don't generate a return value, so // lower // // %val = call @llvm.experimental.deoptimize() // ret %val // // differently. if (I.getParent()->getTerminatingDeoptimizeCall()) { LowerDeoptimizingReturn(); return; } if (!FuncInfo.CanLowerReturn) { unsigned DemoteReg = FuncInfo.DemoteRegister; const Function *F = I.getParent()->getParent(); // Emit a store of the return value through the virtual register. // Leave Outs empty so that LowerReturn won't try to load return // registers the usual way. SmallVector PtrValueVTs; ComputeValueVTs(TLI, DL, F->getReturnType()->getPointerTo( DAG.getDataLayout().getAllocaAddrSpace()), PtrValueVTs); SDValue RetPtr = DAG.getCopyFromReg(DAG.getEntryNode(), getCurSDLoc(), DemoteReg, PtrValueVTs[0]); SDValue RetOp = getValue(I.getOperand(0)); SmallVector ValueVTs, MemVTs; SmallVector Offsets; ComputeValueVTs(TLI, DL, I.getOperand(0)->getType(), ValueVTs, &MemVTs, &Offsets); unsigned NumValues = ValueVTs.size(); SmallVector Chains(NumValues); Align BaseAlign = DL.getPrefTypeAlign(I.getOperand(0)->getType()); for (unsigned i = 0; i != NumValues; ++i) { // An aggregate return value cannot wrap around the address space, so // offsets to its parts don't wrap either. SDValue Ptr = DAG.getObjectPtrOffset(getCurSDLoc(), RetPtr, Offsets[i]); SDValue Val = RetOp.getValue(RetOp.getResNo() + i); if (MemVTs[i] != ValueVTs[i]) Val = DAG.getPtrExtOrTrunc(Val, getCurSDLoc(), MemVTs[i]); Chains[i] = DAG.getStore( Chain, getCurSDLoc(), Val, // FIXME: better loc info would be nice. 
Ptr, MachinePointerInfo::getUnknownStack(DAG.getMachineFunction()), commonAlignment(BaseAlign, Offsets[i])); } Chain = DAG.getNode(ISD::TokenFactor, getCurSDLoc(), MVT::Other, Chains); } else if (I.getNumOperands() != 0) { SmallVector ValueVTs; ComputeValueVTs(TLI, DL, I.getOperand(0)->getType(), ValueVTs); unsigned NumValues = ValueVTs.size(); if (NumValues) { SDValue RetOp = getValue(I.getOperand(0)); const Function *F = I.getParent()->getParent(); bool NeedsRegBlock = TLI.functionArgumentNeedsConsecutiveRegisters( I.getOperand(0)->getType(), F->getCallingConv(), /*IsVarArg*/ false); ISD::NodeType ExtendKind = ISD::ANY_EXTEND; if (F->getAttributes().hasAttribute(AttributeList::ReturnIndex, Attribute::SExt)) ExtendKind = ISD::SIGN_EXTEND; else if (F->getAttributes().hasAttribute(AttributeList::ReturnIndex, Attribute::ZExt)) ExtendKind = ISD::ZERO_EXTEND; LLVMContext &Context = F->getContext(); bool RetInReg = F->getAttributes().hasAttribute( AttributeList::ReturnIndex, Attribute::InReg); for (unsigned j = 0; j != NumValues; ++j) { EVT VT = ValueVTs[j]; if (ExtendKind != ISD::ANY_EXTEND && VT.isInteger()) VT = TLI.getTypeForExtReturn(Context, VT, ExtendKind); CallingConv::ID CC = F->getCallingConv(); unsigned NumParts = TLI.getNumRegistersForCallingConv(Context, CC, VT); MVT PartVT = TLI.getRegisterTypeForCallingConv(Context, CC, VT); SmallVector Parts(NumParts); getCopyToParts(DAG, getCurSDLoc(), SDValue(RetOp.getNode(), RetOp.getResNo() + j), &Parts[0], NumParts, PartVT, &I, CC, ExtendKind); // 'inreg' on function refers to return value ISD::ArgFlagsTy Flags = ISD::ArgFlagsTy(); if (RetInReg) Flags.setInReg(); if (I.getOperand(0)->getType()->isPointerTy()) { Flags.setPointer(); Flags.setPointerAddrSpace( cast(I.getOperand(0)->getType())->getAddressSpace()); } if (NeedsRegBlock) { Flags.setInConsecutiveRegs(); if (j == NumValues - 1) Flags.setInConsecutiveRegsLast(); } // Propagate extension type if any if (ExtendKind == ISD::SIGN_EXTEND) Flags.setSExt(); else if (ExtendKind == ISD::ZERO_EXTEND) Flags.setZExt(); for (unsigned i = 0; i < NumParts; ++i) { Outs.push_back(ISD::OutputArg(Flags, Parts[i].getValueType(), VT, /*isfixed=*/true, 0, 0)); OutVals.push_back(Parts[i]); } } } } // Push in swifterror virtual register as the last element of Outs. This makes // sure swifterror virtual register will be returned in the swifterror // physical register. const Function *F = I.getParent()->getParent(); if (TLI.supportSwiftError() && F->getAttributes().hasAttrSomewhere(Attribute::SwiftError)) { assert(SwiftError.getFunctionArg() && "Need a swift error argument"); ISD::ArgFlagsTy Flags = ISD::ArgFlagsTy(); Flags.setSwiftError(); Outs.push_back(ISD::OutputArg(Flags, EVT(TLI.getPointerTy(DL)) /*vt*/, EVT(TLI.getPointerTy(DL)) /*argvt*/, true /*isfixed*/, 1 /*origidx*/, 0 /*partOffs*/)); // Create SDNode for the swifterror virtual register. OutVals.push_back( DAG.getRegister(SwiftError.getOrCreateVRegUseAt( &I, FuncInfo.MBB, SwiftError.getFunctionArg()), EVT(TLI.getPointerTy(DL)))); } bool isVarArg = DAG.getMachineFunction().getFunction().isVarArg(); CallingConv::ID CallConv = DAG.getMachineFunction().getFunction().getCallingConv(); Chain = DAG.getTargetLoweringInfo().LowerReturn( Chain, CallConv, isVarArg, Outs, OutVals, getCurSDLoc(), DAG); // Verify that the target's LowerReturn behaved as expected. assert(Chain.getNode() && Chain.getValueType() == MVT::Other && "LowerReturn didn't return a valid chain!"); // Update the DAG with the new chain value resulting from return lowering. 
DAG.setRoot(Chain); } /// CopyToExportRegsIfNeeded - If the given value has virtual registers /// created for it, emit nodes to copy the value into the virtual /// registers. void SelectionDAGBuilder::CopyToExportRegsIfNeeded(const Value *V) { // Skip empty types if (V->getType()->isEmptyTy()) return; DenseMap::iterator VMI = FuncInfo.ValueMap.find(V); if (VMI != FuncInfo.ValueMap.end()) { assert(!V->use_empty() && "Unused value assigned virtual registers!"); CopyValueToVirtualRegister(V, VMI->second); } } /// ExportFromCurrentBlock - If this condition isn't known to be exported from /// the current basic block, add it to ValueMap now so that we'll get a /// CopyTo/FromReg. void SelectionDAGBuilder::ExportFromCurrentBlock(const Value *V) { // No need to export constants. if (!isa(V) && !isa(V)) return; // Already exported? if (FuncInfo.isExportedInst(V)) return; unsigned Reg = FuncInfo.InitializeRegForValue(V); CopyValueToVirtualRegister(V, Reg); } bool SelectionDAGBuilder::isExportableFromCurrentBlock(const Value *V, const BasicBlock *FromBB) { // The operands of the setcc have to be in this block. We don't know // how to export them from some other block. if (const Instruction *VI = dyn_cast(V)) { // Can export from current BB. if (VI->getParent() == FromBB) return true; // Is already exported, noop. return FuncInfo.isExportedInst(V); } // If this is an argument, we can export it if the BB is the entry block or // if it is already exported. if (isa(V)) { if (FromBB == &FromBB->getParent()->getEntryBlock()) return true; // Otherwise, can only export this if it is already exported. return FuncInfo.isExportedInst(V); } // Otherwise, constants can always be exported. return true; } /// Return branch probability calculated by BranchProbabilityInfo for IR blocks. BranchProbability SelectionDAGBuilder::getEdgeProbability(const MachineBasicBlock *Src, const MachineBasicBlock *Dst) const { BranchProbabilityInfo *BPI = FuncInfo.BPI; const BasicBlock *SrcBB = Src->getBasicBlock(); const BasicBlock *DstBB = Dst->getBasicBlock(); if (!BPI) { // If BPI is not available, set the default probability as 1 / N, where N is // the number of successors. auto SuccSize = std::max(succ_size(SrcBB), 1); return BranchProbability(1, SuccSize); } return BPI->getEdgeProbability(SrcBB, DstBB); } void SelectionDAGBuilder::addSuccessorWithProb(MachineBasicBlock *Src, MachineBasicBlock *Dst, BranchProbability Prob) { if (!FuncInfo.BPI) Src->addSuccessorWithoutProb(Dst); else { if (Prob.isUnknown()) Prob = getEdgeProbability(Src, Dst); Src->addSuccessor(Dst, Prob); } } static bool InBlock(const Value *V, const BasicBlock *BB) { if (const Instruction *I = dyn_cast(V)) return I->getParent() == BB; return true; } /// EmitBranchForMergedCondition - Helper method for FindMergedConditions. /// This function emits a branch and is used at the leaves of an OR or an /// AND operator tree. void SelectionDAGBuilder::EmitBranchForMergedCondition(const Value *Cond, MachineBasicBlock *TBB, MachineBasicBlock *FBB, MachineBasicBlock *CurBB, MachineBasicBlock *SwitchBB, BranchProbability TProb, BranchProbability FProb, bool InvertCond) { const BasicBlock *BB = CurBB->getBasicBlock(); // If the leaf of the tree is a comparison, merge the condition into // the caseblock. if (const CmpInst *BOp = dyn_cast(Cond)) { // The operands of the cmp have to be in this block. We don't know // how to export them from some other block. If this is the first block // of the sequence, no exporting is needed. 
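// For example, if a compare from this block is folded into a branch that is
// emitted in a later block of the merged sequence, its operands must be
// copied to virtual registers (ExportFromCurrentBlock) so the later block can
// read them; when CurBB == SwitchBB the operands are used in place and no
// export is required, which is what the check below allows.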
if (CurBB == SwitchBB || (isExportableFromCurrentBlock(BOp->getOperand(0), BB) && isExportableFromCurrentBlock(BOp->getOperand(1), BB))) { ISD::CondCode Condition; if (const ICmpInst *IC = dyn_cast(Cond)) { ICmpInst::Predicate Pred = InvertCond ? IC->getInversePredicate() : IC->getPredicate(); Condition = getICmpCondCode(Pred); } else { const FCmpInst *FC = cast(Cond); FCmpInst::Predicate Pred = InvertCond ? FC->getInversePredicate() : FC->getPredicate(); Condition = getFCmpCondCode(Pred); if (TM.Options.NoNaNsFPMath) Condition = getFCmpCodeWithoutNaN(Condition); } CaseBlock CB(Condition, BOp->getOperand(0), BOp->getOperand(1), nullptr, TBB, FBB, CurBB, getCurSDLoc(), TProb, FProb); SL->SwitchCases.push_back(CB); return; } } // Create a CaseBlock record representing this branch. ISD::CondCode Opc = InvertCond ? ISD::SETNE : ISD::SETEQ; CaseBlock CB(Opc, Cond, ConstantInt::getTrue(*DAG.getContext()), nullptr, TBB, FBB, CurBB, getCurSDLoc(), TProb, FProb); SL->SwitchCases.push_back(CB); } void SelectionDAGBuilder::FindMergedConditions(const Value *Cond, MachineBasicBlock *TBB, MachineBasicBlock *FBB, MachineBasicBlock *CurBB, MachineBasicBlock *SwitchBB, Instruction::BinaryOps Opc, BranchProbability TProb, BranchProbability FProb, bool InvertCond) { // Skip over not part of the tree and remember to invert op and operands at // next level. Value *NotCond; if (match(Cond, m_OneUse(m_Not(m_Value(NotCond)))) && InBlock(NotCond, CurBB->getBasicBlock())) { FindMergedConditions(NotCond, TBB, FBB, CurBB, SwitchBB, Opc, TProb, FProb, !InvertCond); return; } const Instruction *BOp = dyn_cast(Cond); // Compute the effective opcode for Cond, taking into account whether it needs // to be inverted, e.g. // and (not (or A, B)), C // gets lowered as // and (and (not A, not B), C) unsigned BOpc = 0; if (BOp) { BOpc = BOp->getOpcode(); if (InvertCond) { if (BOpc == Instruction::And) BOpc = Instruction::Or; else if (BOpc == Instruction::Or) BOpc = Instruction::And; } } // If this node is not part of the or/and tree, emit it as a branch. if (!BOp || !(isa(BOp) || isa(BOp)) || BOpc != unsigned(Opc) || !BOp->hasOneUse() || BOp->getParent() != CurBB->getBasicBlock() || !InBlock(BOp->getOperand(0), CurBB->getBasicBlock()) || !InBlock(BOp->getOperand(1), CurBB->getBasicBlock())) { EmitBranchForMergedCondition(Cond, TBB, FBB, CurBB, SwitchBB, TProb, FProb, InvertCond); return; } // Create TmpBB after CurBB. MachineFunction::iterator BBI(CurBB); MachineFunction &MF = DAG.getMachineFunction(); MachineBasicBlock *TmpBB = MF.CreateMachineBasicBlock(CurBB->getBasicBlock()); CurBB->getParent()->insert(++BBI, TmpBB); if (Opc == Instruction::Or) { // Codegen X | Y as: // BB1: // jmp_if_X TBB // jmp TmpBB // TmpBB: // jmp_if_Y TBB // jmp FBB // // We have flexibility in setting Prob for BB1 and Prob for TmpBB. // The requirement is that // TrueProb for BB1 + (FalseProb for BB1 * TrueProb for TmpBB) // = TrueProb for original BB. // Assuming the original probabilities are A and B, one choice is to set // BB1's probabilities to A/2 and A/2+B, and set TmpBB's probabilities to // A/(1+B) and 2B/(1+B). This choice assumes that // TrueProb for BB1 == FalseProb for BB1 * TrueProb for TmpBB. // Another choice is to assume TrueProb for BB1 equals to TrueProb for // TmpBB, but the math is more complicated. auto NewTrueProb = TProb / 2; auto NewFalseProb = TProb / 2 + FProb; // Emit the LHS condition. 
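// Worked example for the probability split above: with TProb = FProb = 1/2,
// BB1 gets NewTrueProb = 1/4 and NewFalseProb = 3/4, and normalizing
// {TProb/2, FProb} = {1/4, 1/2} gives TmpBB probabilities {1/3, 2/3}, i.e.
// A/(1+B) and 2B/(1+B) for A = B = 1/2.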
FindMergedConditions(BOp->getOperand(0), TBB, TmpBB, CurBB, SwitchBB, Opc, NewTrueProb, NewFalseProb, InvertCond); // Normalize A/2 and B to get A/(1+B) and 2B/(1+B). SmallVector Probs{TProb / 2, FProb}; BranchProbability::normalizeProbabilities(Probs.begin(), Probs.end()); // Emit the RHS condition into TmpBB. FindMergedConditions(BOp->getOperand(1), TBB, FBB, TmpBB, SwitchBB, Opc, Probs[0], Probs[1], InvertCond); } else { assert(Opc == Instruction::And && "Unknown merge op!"); // Codegen X & Y as: // BB1: // jmp_if_X TmpBB // jmp FBB // TmpBB: // jmp_if_Y TBB // jmp FBB // // This requires creation of TmpBB after CurBB. // We have flexibility in setting Prob for BB1 and Prob for TmpBB. // The requirement is that // FalseProb for BB1 + (TrueProb for BB1 * FalseProb for TmpBB) // = FalseProb for original BB. // Assuming the original probabilities are A and B, one choice is to set // BB1's probabilities to A+B/2 and B/2, and set TmpBB's probabilities to // 2A/(1+A) and B/(1+A). This choice assumes that FalseProb for BB1 == // TrueProb for BB1 * FalseProb for TmpBB. auto NewTrueProb = TProb + FProb / 2; auto NewFalseProb = FProb / 2; // Emit the LHS condition. FindMergedConditions(BOp->getOperand(0), TmpBB, FBB, CurBB, SwitchBB, Opc, NewTrueProb, NewFalseProb, InvertCond); // Normalize A and B/2 to get 2A/(1+A) and B/(1+A). SmallVector Probs{TProb, FProb / 2}; BranchProbability::normalizeProbabilities(Probs.begin(), Probs.end()); // Emit the RHS condition into TmpBB. FindMergedConditions(BOp->getOperand(1), TBB, FBB, TmpBB, SwitchBB, Opc, Probs[0], Probs[1], InvertCond); } } /// If the set of cases should be emitted as a series of branches, return true. /// If we should emit this as a bunch of and/or'd together conditions, return /// false. bool SelectionDAGBuilder::ShouldEmitAsBranches(const std::vector &Cases) { if (Cases.size() != 2) return true; // If this is two comparisons of the same values or'd or and'd together, they // will get folded into a single comparison, so don't emit two blocks. if ((Cases[0].CmpLHS == Cases[1].CmpLHS && Cases[0].CmpRHS == Cases[1].CmpRHS) || (Cases[0].CmpRHS == Cases[1].CmpLHS && Cases[0].CmpLHS == Cases[1].CmpRHS)) { return false; } // Handle: (X != null) | (Y != null) --> (X|Y) != 0 // Handle: (X == null) & (Y == null) --> (X|Y) == 0 if (Cases[0].CmpRHS == Cases[1].CmpRHS && Cases[0].CC == Cases[1].CC && isa(Cases[0].CmpRHS) && cast(Cases[0].CmpRHS)->isNullValue()) { if (Cases[0].CC == ISD::SETEQ && Cases[0].TrueBB == Cases[1].ThisBB) return false; if (Cases[0].CC == ISD::SETNE && Cases[0].FalseBB == Cases[1].ThisBB) return false; } return true; } void SelectionDAGBuilder::visitBr(const BranchInst &I) { MachineBasicBlock *BrMBB = FuncInfo.MBB; // Update machine-CFG edges. MachineBasicBlock *Succ0MBB = FuncInfo.MBBMap[I.getSuccessor(0)]; if (I.isUnconditional()) { // Update machine-CFG edges. BrMBB->addSuccessor(Succ0MBB); // If this is not a fall-through branch or optimizations are switched off, // emit the branch. if (Succ0MBB != NextBlock(BrMBB) || TM.getOptLevel() == CodeGenOpt::None) DAG.setRoot(DAG.getNode(ISD::BR, getCurSDLoc(), MVT::Other, getControlRoot(), DAG.getBasicBlock(Succ0MBB))); return; } // If this condition is one of the special cases we handle, do special stuff // now. const Value *CondVal = I.getCondition(); MachineBasicBlock *Succ1MBB = FuncInfo.MBBMap[I.getSuccessor(1)]; // If this is a series of conditions that are or'd or and'd together, emit // this as a sequence of branches instead of setcc's with and/or operations. 
// As long as jumps are not expensive (exceptions for multi-use logic ops, // unpredictable branches, and vector extracts because those jumps are likely // expensive for any target), this should improve performance. // For example, instead of something like: // cmp A, B // C = seteq // cmp D, E // F = setle // or C, F // jnz foo // Emit: // cmp A, B // je foo // cmp D, E // jle foo if (const BinaryOperator *BOp = dyn_cast(CondVal)) { Instruction::BinaryOps Opcode = BOp->getOpcode(); Value *Vec, *BOp0 = BOp->getOperand(0), *BOp1 = BOp->getOperand(1); if (!DAG.getTargetLoweringInfo().isJumpExpensive() && BOp->hasOneUse() && !I.hasMetadata(LLVMContext::MD_unpredictable) && (Opcode == Instruction::And || Opcode == Instruction::Or) && !(match(BOp0, m_ExtractElt(m_Value(Vec), m_Value())) && match(BOp1, m_ExtractElt(m_Specific(Vec), m_Value())))) { FindMergedConditions(BOp, Succ0MBB, Succ1MBB, BrMBB, BrMBB, Opcode, getEdgeProbability(BrMBB, Succ0MBB), getEdgeProbability(BrMBB, Succ1MBB), /*InvertCond=*/false); // If the compares in later blocks need to use values not currently // exported from this block, export them now. This block should always // be the first entry. assert(SL->SwitchCases[0].ThisBB == BrMBB && "Unexpected lowering!"); // Allow some cases to be rejected. if (ShouldEmitAsBranches(SL->SwitchCases)) { for (unsigned i = 1, e = SL->SwitchCases.size(); i != e; ++i) { ExportFromCurrentBlock(SL->SwitchCases[i].CmpLHS); ExportFromCurrentBlock(SL->SwitchCases[i].CmpRHS); } // Emit the branch for this block. visitSwitchCase(SL->SwitchCases[0], BrMBB); SL->SwitchCases.erase(SL->SwitchCases.begin()); return; } // Okay, we decided not to do this, remove any inserted MBB's and clear // SwitchCases. for (unsigned i = 1, e = SL->SwitchCases.size(); i != e; ++i) FuncInfo.MF->erase(SL->SwitchCases[i].ThisBB); SL->SwitchCases.clear(); } } // Create a CaseBlock record representing this branch. CaseBlock CB(ISD::SETEQ, CondVal, ConstantInt::getTrue(*DAG.getContext()), nullptr, Succ0MBB, Succ1MBB, BrMBB, getCurSDLoc()); // Use visitSwitchCase to actually insert the fast branch sequence for this // cond branch. visitSwitchCase(CB, BrMBB); } /// visitSwitchCase - Emits the necessary code to represent a single node in /// the binary search tree resulting from lowering a switch instruction. void SelectionDAGBuilder::visitSwitchCase(CaseBlock &CB, MachineBasicBlock *SwitchBB) { SDValue Cond; SDValue CondLHS = getValue(CB.CmpLHS); SDLoc dl = CB.DL; if (CB.CC == ISD::SETTRUE) { // Branch or fall through to TrueBB. addSuccessorWithProb(SwitchBB, CB.TrueBB, CB.TrueProb); SwitchBB->normalizeSuccProbs(); if (CB.TrueBB != NextBlock(SwitchBB)) { DAG.setRoot(DAG.getNode(ISD::BR, dl, MVT::Other, getControlRoot(), DAG.getBasicBlock(CB.TrueBB))); } return; } auto &TLI = DAG.getTargetLoweringInfo(); EVT MemVT = TLI.getMemValueType(DAG.getDataLayout(), CB.CmpLHS->getType()); // Build the setcc now. if (!CB.CmpMHS) { // Fold "(X == true)" to X and "(X == false)" to !X to // handle common cases produced by branch lowering. if (CB.CmpRHS == ConstantInt::getTrue(*DAG.getContext()) && CB.CC == ISD::SETEQ) Cond = CondLHS; else if (CB.CmpRHS == ConstantInt::getFalse(*DAG.getContext()) && CB.CC == ISD::SETEQ) { SDValue True = DAG.getConstant(1, dl, CondLHS.getValueType()); Cond = DAG.getNode(ISD::XOR, dl, CondLHS.getValueType(), CondLHS, True); } else { SDValue CondRHS = getValue(CB.CmpRHS); // If a pointer's DAG type is larger than its memory type then the DAG // values are zero-extended. 
This breaks signed comparisons so truncate // back to the underlying type before doing the compare. if (CondLHS.getValueType() != MemVT) { CondLHS = DAG.getPtrExtOrTrunc(CondLHS, getCurSDLoc(), MemVT); CondRHS = DAG.getPtrExtOrTrunc(CondRHS, getCurSDLoc(), MemVT); } Cond = DAG.getSetCC(dl, MVT::i1, CondLHS, CondRHS, CB.CC); } } else { assert(CB.CC == ISD::SETLE && "Can handle only LE ranges now"); const APInt& Low = cast(CB.CmpLHS)->getValue(); const APInt& High = cast(CB.CmpRHS)->getValue(); SDValue CmpOp = getValue(CB.CmpMHS); EVT VT = CmpOp.getValueType(); if (cast(CB.CmpLHS)->isMinValue(true)) { Cond = DAG.getSetCC(dl, MVT::i1, CmpOp, DAG.getConstant(High, dl, VT), ISD::SETLE); } else { SDValue SUB = DAG.getNode(ISD::SUB, dl, VT, CmpOp, DAG.getConstant(Low, dl, VT)); Cond = DAG.getSetCC(dl, MVT::i1, SUB, DAG.getConstant(High-Low, dl, VT), ISD::SETULE); } } // Update successor info addSuccessorWithProb(SwitchBB, CB.TrueBB, CB.TrueProb); // TrueBB and FalseBB are always different unless the incoming IR is // degenerate. This only happens when running llc on weird IR. if (CB.TrueBB != CB.FalseBB) addSuccessorWithProb(SwitchBB, CB.FalseBB, CB.FalseProb); SwitchBB->normalizeSuccProbs(); // If the lhs block is the next block, invert the condition so that we can // fall through to the lhs instead of the rhs block. if (CB.TrueBB == NextBlock(SwitchBB)) { std::swap(CB.TrueBB, CB.FalseBB); SDValue True = DAG.getConstant(1, dl, Cond.getValueType()); Cond = DAG.getNode(ISD::XOR, dl, Cond.getValueType(), Cond, True); } SDValue BrCond = DAG.getNode(ISD::BRCOND, dl, MVT::Other, getControlRoot(), Cond, DAG.getBasicBlock(CB.TrueBB)); // Insert the false branch. Do this even if it's a fall through branch, // this makes it easier to do DAG optimizations which require inverting // the branch condition. BrCond = DAG.getNode(ISD::BR, dl, MVT::Other, BrCond, DAG.getBasicBlock(CB.FalseBB)); DAG.setRoot(BrCond); } /// visitJumpTable - Emit JumpTable node in the current MBB void SelectionDAGBuilder::visitJumpTable(SwitchCG::JumpTable &JT) { // Emit the code for the jump table assert(JT.Reg != -1U && "Should lower JT Header first!"); EVT PTy = DAG.getTargetLoweringInfo().getPointerTy(DAG.getDataLayout()); SDValue Index = DAG.getCopyFromReg(getControlRoot(), getCurSDLoc(), JT.Reg, PTy); SDValue Table = DAG.getJumpTable(JT.JTI, PTy); SDValue BrJumpTable = DAG.getNode(ISD::BR_JT, getCurSDLoc(), MVT::Other, Index.getValue(1), Table, Index); DAG.setRoot(BrJumpTable); } /// visitJumpTableHeader - This function emits necessary code to produce index /// in the JumpTable from switch case. void SelectionDAGBuilder::visitJumpTableHeader(SwitchCG::JumpTable &JT, JumpTableHeader &JTH, MachineBasicBlock *SwitchBB) { SDLoc dl = getCurSDLoc(); // Subtract the lowest switch case value from the value being switched on. SDValue SwitchOp = getValue(JTH.SValue); EVT VT = SwitchOp.getValueType(); SDValue Sub = DAG.getNode(ISD::SUB, dl, VT, SwitchOp, DAG.getConstant(JTH.First, dl, VT)); // The SDNode we just created, which holds the value being switched on minus // the smallest case value, needs to be copied to a virtual register so it // can be used as an index into the jump table in a subsequent basic block. // This value may be smaller or larger than the target's pointer type, and // therefore require extension or truncating. 
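// For example (hypothetical target with 64-bit pointers), switching on an i16
// value with JTH.First == 10 produces an i16 Sub = SwitchOp - 10, which is
// zero-extended to i64 below before being copied into the register used to
// index the jump table.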
const TargetLowering &TLI = DAG.getTargetLoweringInfo(); SwitchOp = DAG.getZExtOrTrunc(Sub, dl, TLI.getPointerTy(DAG.getDataLayout())); unsigned JumpTableReg = FuncInfo.CreateReg(TLI.getPointerTy(DAG.getDataLayout())); SDValue CopyTo = DAG.getCopyToReg(getControlRoot(), dl, JumpTableReg, SwitchOp); JT.Reg = JumpTableReg; if (!JTH.OmitRangeCheck) { // Emit the range check for the jump table, and branch to the default block // for the switch statement if the value being switched on exceeds the // largest case in the switch. SDValue CMP = DAG.getSetCC( dl, TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), Sub.getValueType()), Sub, DAG.getConstant(JTH.Last - JTH.First, dl, VT), ISD::SETUGT); SDValue BrCond = DAG.getNode(ISD::BRCOND, dl, MVT::Other, CopyTo, CMP, DAG.getBasicBlock(JT.Default)); // Avoid emitting unnecessary branches to the next block. if (JT.MBB != NextBlock(SwitchBB)) BrCond = DAG.getNode(ISD::BR, dl, MVT::Other, BrCond, DAG.getBasicBlock(JT.MBB)); DAG.setRoot(BrCond); } else { // Avoid emitting unnecessary branches to the next block. if (JT.MBB != NextBlock(SwitchBB)) DAG.setRoot(DAG.getNode(ISD::BR, dl, MVT::Other, CopyTo, DAG.getBasicBlock(JT.MBB))); else DAG.setRoot(CopyTo); } } /// Create a LOAD_STACK_GUARD node, and let it carry the target specific global /// variable if there exists one. static SDValue getLoadStackGuard(SelectionDAG &DAG, const SDLoc &DL, SDValue &Chain) { const TargetLowering &TLI = DAG.getTargetLoweringInfo(); EVT PtrTy = TLI.getPointerTy(DAG.getDataLayout()); EVT PtrMemTy = TLI.getPointerMemTy(DAG.getDataLayout()); MachineFunction &MF = DAG.getMachineFunction(); Value *Global = TLI.getSDagStackGuard(*MF.getFunction().getParent()); MachineSDNode *Node = DAG.getMachineNode(TargetOpcode::LOAD_STACK_GUARD, DL, PtrTy, Chain); if (Global) { MachinePointerInfo MPInfo(Global); auto Flags = MachineMemOperand::MOLoad | MachineMemOperand::MOInvariant | MachineMemOperand::MODereferenceable; MachineMemOperand *MemRef = MF.getMachineMemOperand( MPInfo, Flags, PtrTy.getSizeInBits() / 8, DAG.getEVTAlign(PtrTy)); DAG.setNodeMemRefs(Node, {MemRef}); } if (PtrTy != PtrMemTy) return DAG.getPtrExtOrTrunc(SDValue(Node, 0), DL, PtrMemTy); return SDValue(Node, 0); } /// Codegen a new tail for a stack protector check ParentMBB which has had its /// tail spliced into a stack protector check success bb. /// /// For a high level explanation of how this fits into the stack protector /// generation see the comment on the declaration of class /// StackProtectorDescriptor. void SelectionDAGBuilder::visitSPDescriptorParent(StackProtectorDescriptor &SPD, MachineBasicBlock *ParentBB) { // First create the loads to the guard/stack slot for the comparison. const TargetLowering &TLI = DAG.getTargetLoweringInfo(); EVT PtrTy = TLI.getPointerTy(DAG.getDataLayout()); EVT PtrMemTy = TLI.getPointerMemTy(DAG.getDataLayout()); MachineFrameInfo &MFI = ParentBB->getParent()->getFrameInfo(); int FI = MFI.getStackProtectorIndex(); SDValue Guard; SDLoc dl = getCurSDLoc(); SDValue StackSlotPtr = DAG.getFrameIndex(FI, PtrTy); const Module &M = *ParentBB->getParent()->getFunction().getParent(); unsigned Align = DL->getPrefTypeAlignment(Type::getInt8PtrTy(M.getContext())); // Generate code to load the content of the guard slot. 
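// In short: GuardVal below is the canary reloaded from the stack protector
// slot (frame index FI). If the target supplies a guard-check function it is
// called with that value; otherwise the canary is compared (SETNE) against
// the reference guard value and a mismatch branches to the failure block,
// which ends up calling __stack_chk_fail.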
SDValue GuardVal = DAG.getLoad( PtrMemTy, dl, DAG.getEntryNode(), StackSlotPtr, MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), FI), Align, MachineMemOperand::MOVolatile); if (TLI.useStackGuardXorFP()) GuardVal = TLI.emitStackGuardXorFP(DAG, GuardVal, dl); // Retrieve guard check function, nullptr if instrumentation is inlined. if (const Function *GuardCheckFn = TLI.getSSPStackGuardCheck(M)) { // The target provides a guard check function to validate the guard value. // Generate a call to that function with the content of the guard slot as // argument. FunctionType *FnTy = GuardCheckFn->getFunctionType(); assert(FnTy->getNumParams() == 1 && "Invalid function signature"); TargetLowering::ArgListTy Args; TargetLowering::ArgListEntry Entry; Entry.Node = GuardVal; Entry.Ty = FnTy->getParamType(0); if (GuardCheckFn->hasAttribute(1, Attribute::AttrKind::InReg)) Entry.IsInReg = true; Args.push_back(Entry); TargetLowering::CallLoweringInfo CLI(DAG); CLI.setDebugLoc(getCurSDLoc()) .setChain(DAG.getEntryNode()) .setCallee(GuardCheckFn->getCallingConv(), FnTy->getReturnType(), getValue(GuardCheckFn), std::move(Args)); std::pair Result = TLI.LowerCallTo(CLI); DAG.setRoot(Result.second); return; } // If useLoadStackGuardNode returns true, generate LOAD_STACK_GUARD. // Otherwise, emit a volatile load to retrieve the stack guard value. SDValue Chain = DAG.getEntryNode(); if (TLI.useLoadStackGuardNode()) { Guard = getLoadStackGuard(DAG, dl, Chain); } else { const Value *IRGuard = TLI.getSDagStackGuard(M); SDValue GuardPtr = getValue(IRGuard); Guard = DAG.getLoad(PtrMemTy, dl, Chain, GuardPtr, MachinePointerInfo(IRGuard, 0), Align, MachineMemOperand::MOVolatile); } // Perform the comparison via a getsetcc. SDValue Cmp = DAG.getSetCC(dl, TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), Guard.getValueType()), Guard, GuardVal, ISD::SETNE); // If the guard/stackslot do not equal, branch to failure MBB. SDValue BrCond = DAG.getNode(ISD::BRCOND, dl, MVT::Other, GuardVal.getOperand(0), Cmp, DAG.getBasicBlock(SPD.getFailureMBB())); // Otherwise branch to success MBB. SDValue Br = DAG.getNode(ISD::BR, dl, MVT::Other, BrCond, DAG.getBasicBlock(SPD.getSuccessMBB())); DAG.setRoot(Br); } /// Codegen the failure basic block for a stack protector check. /// /// A failure stack protector machine basic block consists simply of a call to /// __stack_chk_fail(). /// /// For a high level explanation of how this fits into the stack protector /// generation see the comment on the declaration of class /// StackProtectorDescriptor. void SelectionDAGBuilder::visitSPDescriptorFailure(StackProtectorDescriptor &SPD) { const TargetLowering &TLI = DAG.getTargetLoweringInfo(); TargetLowering::MakeLibCallOptions CallOptions; CallOptions.setDiscardResult(true); SDValue Chain = TLI.makeLibCall(DAG, RTLIB::STACKPROTECTOR_CHECK_FAIL, MVT::isVoid, None, CallOptions, getCurSDLoc()).second; // On PS4, the "return address" must still be within the calling function, // even if it's at the very end, so emit an explicit TRAP here. // Passing 'true' for doesNotReturn above won't generate the trap for us. if (TM.getTargetTriple().isPS4CPU()) Chain = DAG.getNode(ISD::TRAP, getCurSDLoc(), MVT::Other, Chain); // WebAssembly needs an unreachable instruction after a non-returning call, // because the function return type can be different from __stack_chk_fail's // return type (void). 
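  // Illustrative sketch (editorial, not from the original file): the rule
  // applied by the PS4 check above and the WebAssembly check below, with
  // hypothetical boolean inputs.
  auto NeedsTrapAfterFailSketch = [](bool IsPS4, bool IsWasm) {
    // __stack_chk_fail never returns, but PS4 wants the return address to
    // stay inside the calling function and WebAssembly needs an explicit
    // unreachable after a non-returning call, so both get a trailing TRAP.
    return IsPS4 || IsWasm;
  };
  (void)NeedsTrapAfterFailSketch;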
if (TM.getTargetTriple().isWasm()) Chain = DAG.getNode(ISD::TRAP, getCurSDLoc(), MVT::Other, Chain); DAG.setRoot(Chain); } /// visitBitTestHeader - This function emits necessary code to produce value /// suitable for "bit tests" void SelectionDAGBuilder::visitBitTestHeader(BitTestBlock &B, MachineBasicBlock *SwitchBB) { SDLoc dl = getCurSDLoc(); // Subtract the minimum value. SDValue SwitchOp = getValue(B.SValue); EVT VT = SwitchOp.getValueType(); SDValue RangeSub = DAG.getNode(ISD::SUB, dl, VT, SwitchOp, DAG.getConstant(B.First, dl, VT)); // Determine the type of the test operands. const TargetLowering &TLI = DAG.getTargetLoweringInfo(); bool UsePtrType = false; if (!TLI.isTypeLegal(VT)) { UsePtrType = true; } else { for (unsigned i = 0, e = B.Cases.size(); i != e; ++i) if (!isUIntN(VT.getSizeInBits(), B.Cases[i].Mask)) { // Switch table case range are encoded into series of masks. // Just use pointer type, it's guaranteed to fit. UsePtrType = true; break; } } SDValue Sub = RangeSub; if (UsePtrType) { VT = TLI.getPointerTy(DAG.getDataLayout()); Sub = DAG.getZExtOrTrunc(Sub, dl, VT); } B.RegVT = VT.getSimpleVT(); B.Reg = FuncInfo.CreateReg(B.RegVT); SDValue CopyTo = DAG.getCopyToReg(getControlRoot(), dl, B.Reg, Sub); MachineBasicBlock* MBB = B.Cases[0].ThisBB; if (!B.OmitRangeCheck) addSuccessorWithProb(SwitchBB, B.Default, B.DefaultProb); addSuccessorWithProb(SwitchBB, MBB, B.Prob); SwitchBB->normalizeSuccProbs(); SDValue Root = CopyTo; if (!B.OmitRangeCheck) { // Conditional branch to the default block. SDValue RangeCmp = DAG.getSetCC(dl, TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), RangeSub.getValueType()), RangeSub, DAG.getConstant(B.Range, dl, RangeSub.getValueType()), ISD::SETUGT); Root = DAG.getNode(ISD::BRCOND, dl, MVT::Other, Root, RangeCmp, DAG.getBasicBlock(B.Default)); } // Avoid emitting unnecessary branches to the next block. if (MBB != NextBlock(SwitchBB)) Root = DAG.getNode(ISD::BR, dl, MVT::Other, Root, DAG.getBasicBlock(MBB)); DAG.setRoot(Root); } /// visitBitTestCase - this function produces one "bit test" void SelectionDAGBuilder::visitBitTestCase(BitTestBlock &BB, MachineBasicBlock* NextMBB, BranchProbability BranchProbToNext, unsigned Reg, BitTestCase &B, MachineBasicBlock *SwitchBB) { SDLoc dl = getCurSDLoc(); MVT VT = BB.RegVT; SDValue ShiftOp = DAG.getCopyFromReg(getControlRoot(), dl, Reg, VT); SDValue Cmp; unsigned PopCount = countPopulation(B.Mask); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); if (PopCount == 1) { // Testing for a single bit; just compare the shift count with what it // would need to be to shift a 1 bit in that position. Cmp = DAG.getSetCC( dl, TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT), ShiftOp, DAG.getConstant(countTrailingZeros(B.Mask), dl, VT), ISD::SETEQ); } else if (PopCount == BB.Range) { // There is only one zero bit in the range, test for it directly. Cmp = DAG.getSetCC( dl, TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT), ShiftOp, DAG.getConstant(countTrailingOnes(B.Mask), dl, VT), ISD::SETNE); } else { // Make desired shift SDValue SwitchVal = DAG.getNode(ISD::SHL, dl, VT, DAG.getConstant(1, dl, VT), ShiftOp); // Emit bit tests and jumps SDValue AndOp = DAG.getNode(ISD::AND, dl, VT, SwitchVal, DAG.getConstant(B.Mask, dl, VT)); Cmp = DAG.getSetCC( dl, TLI.getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT), AndOp, DAG.getConstant(0, dl, VT), ISD::SETNE); } // The branch probability from SwitchBB to B.TargetBB is B.ExtraProb. 
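  // Illustrative sketch (editorial, not from the original file): the general
  // form of the bit test emitted above, ignoring the single-bit and
  // single-zero-bit fast paths that compare the shift amount directly.
  // Parameter names are hypothetical; BiasedVal is assumed < 64 here, which
  // the range check in the bit-test header guarantees for the real lowering.
  auto BitTestSketch = [](uint64_t BiasedVal, uint64_t CaseMask) {
    // BiasedVal is the switch value minus the cluster minimum; CaseMask has
    // one bit set for every case handled by this bit-test block.
    return ((uint64_t(1) << BiasedVal) & CaseMask) != 0;
  };
  (void)BitTestSketch;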
addSuccessorWithProb(SwitchBB, B.TargetBB, B.ExtraProb); // The branch probability from SwitchBB to NextMBB is BranchProbToNext. addSuccessorWithProb(SwitchBB, NextMBB, BranchProbToNext); // It is not guaranteed that the sum of B.ExtraProb and BranchProbToNext is // one as they are relative probabilities (and thus work more like weights), // and hence we need to normalize them to let the sum of them become one. SwitchBB->normalizeSuccProbs(); SDValue BrAnd = DAG.getNode(ISD::BRCOND, dl, MVT::Other, getControlRoot(), Cmp, DAG.getBasicBlock(B.TargetBB)); // Avoid emitting unnecessary branches to the next block. if (NextMBB != NextBlock(SwitchBB)) BrAnd = DAG.getNode(ISD::BR, dl, MVT::Other, BrAnd, DAG.getBasicBlock(NextMBB)); DAG.setRoot(BrAnd); } void SelectionDAGBuilder::visitInvoke(const InvokeInst &I) { MachineBasicBlock *InvokeMBB = FuncInfo.MBB; // Retrieve successors. Look through artificial IR level blocks like // catchswitch for successors. MachineBasicBlock *Return = FuncInfo.MBBMap[I.getSuccessor(0)]; const BasicBlock *EHPadBB = I.getSuccessor(1); // Deopt bundles are lowered in LowerCallSiteWithDeoptBundle, and we don't // have to do anything here to lower funclet bundles. assert(!I.hasOperandBundlesOtherThan({LLVMContext::OB_deopt, LLVMContext::OB_gc_transition, LLVMContext::OB_gc_live, LLVMContext::OB_funclet, LLVMContext::OB_cfguardtarget}) && "Cannot lower invokes with arbitrary operand bundles yet!"); const Value *Callee(I.getCalledOperand()); const Function *Fn = dyn_cast(Callee); if (isa(Callee)) visitInlineAsm(I); else if (Fn && Fn->isIntrinsic()) { switch (Fn->getIntrinsicID()) { default: llvm_unreachable("Cannot invoke this intrinsic"); case Intrinsic::donothing: // Ignore invokes to @llvm.donothing: jump directly to the next BB. break; case Intrinsic::experimental_patchpoint_void: case Intrinsic::experimental_patchpoint_i64: visitPatchpoint(I, EHPadBB); break; case Intrinsic::experimental_gc_statepoint: LowerStatepoint(cast(I), EHPadBB); break; case Intrinsic::wasm_rethrow_in_catch: { // This is usually done in visitTargetIntrinsic, but this intrinsic is // special because it can be invoked, so we manually lower it to a DAG // node here. SmallVector Ops; Ops.push_back(getRoot()); // inchain const TargetLowering &TLI = DAG.getTargetLoweringInfo(); Ops.push_back( DAG.getTargetConstant(Intrinsic::wasm_rethrow_in_catch, getCurSDLoc(), TLI.getPointerTy(DAG.getDataLayout()))); SDVTList VTs = DAG.getVTList(ArrayRef({MVT::Other})); // outchain DAG.setRoot(DAG.getNode(ISD::INTRINSIC_VOID, getCurSDLoc(), VTs, Ops)); break; } } } else if (I.countOperandBundlesOfType(LLVMContext::OB_deopt)) { // Currently we do not lower any intrinsic calls with deopt operand bundles. // Eventually we will support lowering the @llvm.experimental.deoptimize // intrinsic, and right now there are no plans to support other intrinsics // with deopt state. LowerCallSiteWithDeoptBundle(&I, getValue(Callee), EHPadBB); } else { LowerCallTo(I, getValue(Callee), false, EHPadBB); } // If the value of the invoke is used outside of its defining block, make it // available as a virtual register. // We already took care of the exported value for the statepoint instruction // during call to the LowerStatepoint. if (!isa(I)) { CopyToExportRegsIfNeeded(&I); } SmallVector, 1> UnwindDests; BranchProbabilityInfo *BPI = FuncInfo.BPI; BranchProbability EHPadBBProb = BPI ? 
BPI->getEdgeProbability(InvokeMBB->getBasicBlock(), EHPadBB) : BranchProbability::getZero(); findUnwindDestinations(FuncInfo, EHPadBB, EHPadBBProb, UnwindDests); // Update successor info. addSuccessorWithProb(InvokeMBB, Return); for (auto &UnwindDest : UnwindDests) { UnwindDest.first->setIsEHPad(); addSuccessorWithProb(InvokeMBB, UnwindDest.first, UnwindDest.second); } InvokeMBB->normalizeSuccProbs(); // Drop into normal successor. DAG.setRoot(DAG.getNode(ISD::BR, getCurSDLoc(), MVT::Other, getControlRoot(), DAG.getBasicBlock(Return))); } void SelectionDAGBuilder::visitCallBr(const CallBrInst &I) { MachineBasicBlock *CallBrMBB = FuncInfo.MBB; // Deopt bundles are lowered in LowerCallSiteWithDeoptBundle, and we don't // have to do anything here to lower funclet bundles. assert(!I.hasOperandBundlesOtherThan( {LLVMContext::OB_deopt, LLVMContext::OB_funclet}) && "Cannot lower callbrs with arbitrary operand bundles yet!"); assert(I.isInlineAsm() && "Only know how to handle inlineasm callbr"); visitInlineAsm(I); CopyToExportRegsIfNeeded(&I); // Retrieve successors. MachineBasicBlock *Return = FuncInfo.MBBMap[I.getDefaultDest()]; // Update successor info. addSuccessorWithProb(CallBrMBB, Return, BranchProbability::getOne()); for (unsigned i = 0, e = I.getNumIndirectDests(); i < e; ++i) { MachineBasicBlock *Target = FuncInfo.MBBMap[I.getIndirectDest(i)]; addSuccessorWithProb(CallBrMBB, Target, BranchProbability::getZero()); Target->setIsInlineAsmBrIndirectTarget(); } CallBrMBB->normalizeSuccProbs(); // Drop into default successor. DAG.setRoot(DAG.getNode(ISD::BR, getCurSDLoc(), MVT::Other, getControlRoot(), DAG.getBasicBlock(Return))); } void SelectionDAGBuilder::visitResume(const ResumeInst &RI) { llvm_unreachable("SelectionDAGBuilder shouldn't visit resume instructions!"); } void SelectionDAGBuilder::visitLandingPad(const LandingPadInst &LP) { assert(FuncInfo.MBB->isEHPad() && "Call to landingpad not in landing pad!"); // If there aren't registers to copy the values into (e.g., during SjLj // exceptions), then don't bother to create these DAG nodes. const TargetLowering &TLI = DAG.getTargetLoweringInfo(); const Constant *PersonalityFn = FuncInfo.Fn->getPersonalityFn(); if (TLI.getExceptionPointerRegister(PersonalityFn) == 0 && TLI.getExceptionSelectorRegister(PersonalityFn) == 0) return; // If landingpad's return type is token type, we don't create DAG nodes // for its exception pointer and selector value. The extraction of exception // pointer or selector value from token type landingpads is not currently // supported. if (LP.getType()->isTokenTy()) return; SmallVector ValueVTs; SDLoc dl = getCurSDLoc(); ComputeValueVTs(TLI, DAG.getDataLayout(), LP.getType(), ValueVTs); assert(ValueVTs.size() == 2 && "Only two-valued landingpads are supported"); // Get the two live-in registers as SDValues. The physregs have already been // copied into virtual registers. SDValue Ops[2]; if (FuncInfo.ExceptionPointerVirtReg) { Ops[0] = DAG.getZExtOrTrunc( DAG.getCopyFromReg(DAG.getEntryNode(), dl, FuncInfo.ExceptionPointerVirtReg, TLI.getPointerTy(DAG.getDataLayout())), dl, ValueVTs[0]); } else { Ops[0] = DAG.getConstant(0, dl, TLI.getPointerTy(DAG.getDataLayout())); } Ops[1] = DAG.getZExtOrTrunc( DAG.getCopyFromReg(DAG.getEntryNode(), dl, FuncInfo.ExceptionSelectorVirtReg, TLI.getPointerTy(DAG.getDataLayout())), dl, ValueVTs[1]); // Merge into one. 
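  // Illustrative sketch (editorial, not from the original file): the two
  // values merged just below mirror the usual IR shape of a landing pad,
  //   %lp = landingpad { i8*, i32 } cleanup
  // i.e. an exception pointer plus an integer type-index selector.
  auto LandingPadValuesSketch = [](void *ExceptionPtr, int Selector) {
    return std::make_pair(ExceptionPtr, Selector); // {pointer, selector}
  };
  (void)LandingPadValuesSketch;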
SDValue Res = DAG.getNode(ISD::MERGE_VALUES, dl, DAG.getVTList(ValueVTs), Ops); setValue(&LP, Res); } void SelectionDAGBuilder::UpdateSplitBlock(MachineBasicBlock *First, MachineBasicBlock *Last) { // Update JTCases. for (unsigned i = 0, e = SL->JTCases.size(); i != e; ++i) if (SL->JTCases[i].first.HeaderBB == First) SL->JTCases[i].first.HeaderBB = Last; // Update BitTestCases. for (unsigned i = 0, e = SL->BitTestCases.size(); i != e; ++i) if (SL->BitTestCases[i].Parent == First) SL->BitTestCases[i].Parent = Last; } void SelectionDAGBuilder::visitIndirectBr(const IndirectBrInst &I) { MachineBasicBlock *IndirectBrMBB = FuncInfo.MBB; // Update machine-CFG edges with unique successors. SmallSet Done; for (unsigned i = 0, e = I.getNumSuccessors(); i != e; ++i) { BasicBlock *BB = I.getSuccessor(i); bool Inserted = Done.insert(BB).second; if (!Inserted) continue; MachineBasicBlock *Succ = FuncInfo.MBBMap[BB]; addSuccessorWithProb(IndirectBrMBB, Succ); } IndirectBrMBB->normalizeSuccProbs(); DAG.setRoot(DAG.getNode(ISD::BRIND, getCurSDLoc(), MVT::Other, getControlRoot(), getValue(I.getAddress()))); } void SelectionDAGBuilder::visitUnreachable(const UnreachableInst &I) { if (!DAG.getTarget().Options.TrapUnreachable) return; // We may be able to ignore unreachable behind a noreturn call. if (DAG.getTarget().Options.NoTrapAfterNoreturn) { const BasicBlock &BB = *I.getParent(); if (&I != &BB.front()) { BasicBlock::const_iterator PredI = std::prev(BasicBlock::const_iterator(&I)); if (const CallInst *Call = dyn_cast(&*PredI)) { if (Call->doesNotReturn()) return; } } } DAG.setRoot(DAG.getNode(ISD::TRAP, getCurSDLoc(), MVT::Other, DAG.getRoot())); } void SelectionDAGBuilder::visitFSub(const User &I) { // -0.0 - X --> fneg Type *Ty = I.getType(); if (isa(I.getOperand(0)) && I.getOperand(0) == ConstantFP::getZeroValueForNegation(Ty)) { SDValue Op2 = getValue(I.getOperand(1)); setValue(&I, DAG.getNode(ISD::FNEG, getCurSDLoc(), Op2.getValueType(), Op2)); return; } visitBinary(I, ISD::FSUB); } void SelectionDAGBuilder::visitUnary(const User &I, unsigned Opcode) { SDNodeFlags Flags; SDValue Op = getValue(I.getOperand(0)); SDValue UnNodeValue = DAG.getNode(Opcode, getCurSDLoc(), Op.getValueType(), Op, Flags); setValue(&I, UnNodeValue); } void SelectionDAGBuilder::visitBinary(const User &I, unsigned Opcode) { SDNodeFlags Flags; if (auto *OFBinOp = dyn_cast(&I)) { Flags.setNoSignedWrap(OFBinOp->hasNoSignedWrap()); Flags.setNoUnsignedWrap(OFBinOp->hasNoUnsignedWrap()); } if (auto *ExactOp = dyn_cast(&I)) { Flags.setExact(ExactOp->isExact()); } SDValue Op1 = getValue(I.getOperand(0)); SDValue Op2 = getValue(I.getOperand(1)); SDValue BinNodeValue = DAG.getNode(Opcode, getCurSDLoc(), Op1.getValueType(), Op1, Op2, Flags); setValue(&I, BinNodeValue); } void SelectionDAGBuilder::visitShift(const User &I, unsigned Opcode) { SDValue Op1 = getValue(I.getOperand(0)); SDValue Op2 = getValue(I.getOperand(1)); EVT ShiftTy = DAG.getTargetLoweringInfo().getShiftAmountTy( Op1.getValueType(), DAG.getDataLayout()); // Coerce the shift amount to the right type if we can. if (!I.getType()->isVectorTy() && Op2.getValueType() != ShiftTy) { unsigned ShiftSize = ShiftTy.getSizeInBits(); unsigned Op2Size = Op2.getValueSizeInBits(); SDLoc DL = getCurSDLoc(); // If the operand is smaller than the shift count type, promote it. 
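    // Illustrative sketch (editorial, not from the original file): the
    // principle behind truncating the shift amount below.  Shifts by at least
    // the value width are undefined in IR, so a shift-amount type with
    // ceil(log2(ValueBits)) bits can already encode every defined amount.
    // Names are hypothetical; ShiftTyBits is assumed < 64 here.
    auto CanHoldAnyShiftAmount = [](unsigned ShiftTyBits, unsigned ValueBits) {
      return (uint64_t(1) << ShiftTyBits) >= uint64_t(ValueBits);
    };
    (void)CanHoldAnyShiftAmount;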
if (ShiftSize > Op2Size) Op2 = DAG.getNode(ISD::ZERO_EXTEND, DL, ShiftTy, Op2); // If the operand is larger than the shift count type but the shift // count type has enough bits to represent any shift value, truncate // it now. This is a common case and it exposes the truncate to // optimization early. else if (ShiftSize >= Log2_32_Ceil(Op2.getValueSizeInBits())) Op2 = DAG.getNode(ISD::TRUNCATE, DL, ShiftTy, Op2); // Otherwise we'll need to temporarily settle for some other convenient // type. Type legalization will make adjustments once the shiftee is split. else Op2 = DAG.getZExtOrTrunc(Op2, DL, MVT::i32); } bool nuw = false; bool nsw = false; bool exact = false; if (Opcode == ISD::SRL || Opcode == ISD::SRA || Opcode == ISD::SHL) { if (const OverflowingBinaryOperator *OFBinOp = dyn_cast(&I)) { nuw = OFBinOp->hasNoUnsignedWrap(); nsw = OFBinOp->hasNoSignedWrap(); } if (const PossiblyExactOperator *ExactOp = dyn_cast(&I)) exact = ExactOp->isExact(); } SDNodeFlags Flags; Flags.setExact(exact); Flags.setNoSignedWrap(nsw); Flags.setNoUnsignedWrap(nuw); SDValue Res = DAG.getNode(Opcode, getCurSDLoc(), Op1.getValueType(), Op1, Op2, Flags); setValue(&I, Res); } void SelectionDAGBuilder::visitSDiv(const User &I) { SDValue Op1 = getValue(I.getOperand(0)); SDValue Op2 = getValue(I.getOperand(1)); SDNodeFlags Flags; Flags.setExact(isa(&I) && cast(&I)->isExact()); setValue(&I, DAG.getNode(ISD::SDIV, getCurSDLoc(), Op1.getValueType(), Op1, Op2, Flags)); } void SelectionDAGBuilder::visitICmp(const User &I) { ICmpInst::Predicate predicate = ICmpInst::BAD_ICMP_PREDICATE; if (const ICmpInst *IC = dyn_cast(&I)) predicate = IC->getPredicate(); else if (const ConstantExpr *IC = dyn_cast(&I)) predicate = ICmpInst::Predicate(IC->getPredicate()); SDValue Op1 = getValue(I.getOperand(0)); SDValue Op2 = getValue(I.getOperand(1)); ISD::CondCode Opcode = getICmpCondCode(predicate); auto &TLI = DAG.getTargetLoweringInfo(); EVT MemVT = TLI.getMemValueType(DAG.getDataLayout(), I.getOperand(0)->getType()); // If a pointer's DAG type is larger than its memory type then the DAG values // are zero-extended. This breaks signed comparisons so truncate back to the // underlying type before doing the compare. if (Op1.getValueType() != MemVT) { Op1 = DAG.getPtrExtOrTrunc(Op1, getCurSDLoc(), MemVT); Op2 = DAG.getPtrExtOrTrunc(Op2, getCurSDLoc(), MemVT); } EVT DestVT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(), I.getType()); setValue(&I, DAG.getSetCC(getCurSDLoc(), DestVT, Op1, Op2, Opcode)); } void SelectionDAGBuilder::visitFCmp(const User &I) { FCmpInst::Predicate predicate = FCmpInst::BAD_FCMP_PREDICATE; if (const FCmpInst *FC = dyn_cast(&I)) predicate = FC->getPredicate(); else if (const ConstantExpr *FC = dyn_cast(&I)) predicate = FCmpInst::Predicate(FC->getPredicate()); SDValue Op1 = getValue(I.getOperand(0)); SDValue Op2 = getValue(I.getOperand(1)); ISD::CondCode Condition = getFCmpCondCode(predicate); auto *FPMO = dyn_cast(&I); if ((FPMO && FPMO->hasNoNaNs()) || TM.Options.NoNaNsFPMath) Condition = getFCmpCodeWithoutNaN(Condition); EVT DestVT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(), I.getType()); setValue(&I, DAG.getSetCC(getCurSDLoc(), DestVT, Op1, Op2, Condition)); } // Check if the condition of the select has one use or two users that are both // selects with the same condition. 
static bool hasOnlySelectUsers(const Value *Cond) { return llvm::all_of(Cond->users(), [](const Value *V) { return isa(V); }); } void SelectionDAGBuilder::visitSelect(const User &I) { SmallVector ValueVTs; ComputeValueVTs(DAG.getTargetLoweringInfo(), DAG.getDataLayout(), I.getType(), ValueVTs); unsigned NumValues = ValueVTs.size(); if (NumValues == 0) return; SmallVector Values(NumValues); SDValue Cond = getValue(I.getOperand(0)); SDValue LHSVal = getValue(I.getOperand(1)); SDValue RHSVal = getValue(I.getOperand(2)); SmallVector BaseOps(1, Cond); ISD::NodeType OpCode = Cond.getValueType().isVector() ? ISD::VSELECT : ISD::SELECT; bool IsUnaryAbs = false; // Min/max matching is only viable if all output VTs are the same. if (is_splat(ValueVTs)) { EVT VT = ValueVTs[0]; LLVMContext &Ctx = *DAG.getContext(); auto &TLI = DAG.getTargetLoweringInfo(); // We care about the legality of the operation after it has been type // legalized. while (TLI.getTypeAction(Ctx, VT) != TargetLoweringBase::TypeLegal) VT = TLI.getTypeToTransformTo(Ctx, VT); // If the vselect is legal, assume we want to leave this as a vector setcc + // vselect. Otherwise, if this is going to be scalarized, we want to see if // min/max is legal on the scalar type. bool UseScalarMinMax = VT.isVector() && !TLI.isOperationLegalOrCustom(ISD::VSELECT, VT); Value *LHS, *RHS; auto SPR = matchSelectPattern(const_cast(&I), LHS, RHS); ISD::NodeType Opc = ISD::DELETED_NODE; switch (SPR.Flavor) { case SPF_UMAX: Opc = ISD::UMAX; break; case SPF_UMIN: Opc = ISD::UMIN; break; case SPF_SMAX: Opc = ISD::SMAX; break; case SPF_SMIN: Opc = ISD::SMIN; break; case SPF_FMINNUM: switch (SPR.NaNBehavior) { case SPNB_NA: llvm_unreachable("No NaN behavior for FP op?"); case SPNB_RETURNS_NAN: Opc = ISD::FMINIMUM; break; case SPNB_RETURNS_OTHER: Opc = ISD::FMINNUM; break; case SPNB_RETURNS_ANY: { if (TLI.isOperationLegalOrCustom(ISD::FMINNUM, VT)) Opc = ISD::FMINNUM; else if (TLI.isOperationLegalOrCustom(ISD::FMINIMUM, VT)) Opc = ISD::FMINIMUM; else if (UseScalarMinMax) Opc = TLI.isOperationLegalOrCustom(ISD::FMINNUM, VT.getScalarType()) ? ISD::FMINNUM : ISD::FMINIMUM; break; } } break; case SPF_FMAXNUM: switch (SPR.NaNBehavior) { case SPNB_NA: llvm_unreachable("No NaN behavior for FP op?"); case SPNB_RETURNS_NAN: Opc = ISD::FMAXIMUM; break; case SPNB_RETURNS_OTHER: Opc = ISD::FMAXNUM; break; case SPNB_RETURNS_ANY: if (TLI.isOperationLegalOrCustom(ISD::FMAXNUM, VT)) Opc = ISD::FMAXNUM; else if (TLI.isOperationLegalOrCustom(ISD::FMAXIMUM, VT)) Opc = ISD::FMAXIMUM; else if (UseScalarMinMax) Opc = TLI.isOperationLegalOrCustom(ISD::FMAXNUM, VT.getScalarType()) ? ISD::FMAXNUM : ISD::FMAXIMUM; break; } break; case SPF_ABS: IsUnaryAbs = true; Opc = ISD::ABS; break; case SPF_NABS: // TODO: we need to produce sub(0, abs(X)). default: break; } if (!IsUnaryAbs && Opc != ISD::DELETED_NODE && (TLI.isOperationLegalOrCustom(Opc, VT) || (UseScalarMinMax && TLI.isOperationLegalOrCustom(Opc, VT.getScalarType()))) && // If the underlying comparison instruction is used by any other // instruction, the consumed instructions won't be destroyed, so it is // not profitable to convert to a min/max. 
hasOnlySelectUsers(cast(I).getCondition())) { OpCode = Opc; LHSVal = getValue(LHS); RHSVal = getValue(RHS); BaseOps.clear(); } if (IsUnaryAbs) { OpCode = Opc; LHSVal = getValue(LHS); BaseOps.clear(); } } if (IsUnaryAbs) { for (unsigned i = 0; i != NumValues; ++i) { Values[i] = DAG.getNode(OpCode, getCurSDLoc(), LHSVal.getNode()->getValueType(LHSVal.getResNo() + i), SDValue(LHSVal.getNode(), LHSVal.getResNo() + i)); } } else { for (unsigned i = 0; i != NumValues; ++i) { SmallVector Ops(BaseOps.begin(), BaseOps.end()); Ops.push_back(SDValue(LHSVal.getNode(), LHSVal.getResNo() + i)); Ops.push_back(SDValue(RHSVal.getNode(), RHSVal.getResNo() + i)); Values[i] = DAG.getNode( OpCode, getCurSDLoc(), LHSVal.getNode()->getValueType(LHSVal.getResNo() + i), Ops); } } setValue(&I, DAG.getNode(ISD::MERGE_VALUES, getCurSDLoc(), DAG.getVTList(ValueVTs), Values)); } void SelectionDAGBuilder::visitTrunc(const User &I) { // TruncInst cannot be a no-op cast because sizeof(src) > sizeof(dest). SDValue N = getValue(I.getOperand(0)); EVT DestVT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(), I.getType()); setValue(&I, DAG.getNode(ISD::TRUNCATE, getCurSDLoc(), DestVT, N)); } void SelectionDAGBuilder::visitZExt(const User &I) { // ZExt cannot be a no-op cast because sizeof(src) < sizeof(dest). // ZExt also can't be a cast to bool for same reason. So, nothing much to do SDValue N = getValue(I.getOperand(0)); EVT DestVT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(), I.getType()); setValue(&I, DAG.getNode(ISD::ZERO_EXTEND, getCurSDLoc(), DestVT, N)); } void SelectionDAGBuilder::visitSExt(const User &I) { // SExt cannot be a no-op cast because sizeof(src) < sizeof(dest). // SExt also can't be a cast to bool for same reason. So, nothing much to do SDValue N = getValue(I.getOperand(0)); EVT DestVT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(), I.getType()); setValue(&I, DAG.getNode(ISD::SIGN_EXTEND, getCurSDLoc(), DestVT, N)); } void SelectionDAGBuilder::visitFPTrunc(const User &I) { // FPTrunc is never a no-op cast, no need to check SDValue N = getValue(I.getOperand(0)); SDLoc dl = getCurSDLoc(); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); EVT DestVT = TLI.getValueType(DAG.getDataLayout(), I.getType()); setValue(&I, DAG.getNode(ISD::FP_ROUND, dl, DestVT, N, DAG.getTargetConstant( 0, dl, TLI.getPointerTy(DAG.getDataLayout())))); } void SelectionDAGBuilder::visitFPExt(const User &I) { // FPExt is never a no-op cast, no need to check SDValue N = getValue(I.getOperand(0)); EVT DestVT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(), I.getType()); setValue(&I, DAG.getNode(ISD::FP_EXTEND, getCurSDLoc(), DestVT, N)); } void SelectionDAGBuilder::visitFPToUI(const User &I) { // FPToUI is never a no-op cast, no need to check SDValue N = getValue(I.getOperand(0)); EVT DestVT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(), I.getType()); setValue(&I, DAG.getNode(ISD::FP_TO_UINT, getCurSDLoc(), DestVT, N)); } void SelectionDAGBuilder::visitFPToSI(const User &I) { // FPToSI is never a no-op cast, no need to check SDValue N = getValue(I.getOperand(0)); EVT DestVT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(), I.getType()); setValue(&I, DAG.getNode(ISD::FP_TO_SINT, getCurSDLoc(), DestVT, N)); } void SelectionDAGBuilder::visitUIToFP(const User &I) { // UIToFP is never a no-op cast, no need to check SDValue N = getValue(I.getOperand(0)); EVT DestVT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(), I.getType()); 
  setValue(&I, DAG.getNode(ISD::UINT_TO_FP, getCurSDLoc(), DestVT, N));
}

void SelectionDAGBuilder::visitSIToFP(const User &I) {
  // SIToFP is never a no-op cast, no need to check
  SDValue N = getValue(I.getOperand(0));
  EVT DestVT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(),
                                                        I.getType());
  setValue(&I, DAG.getNode(ISD::SINT_TO_FP, getCurSDLoc(), DestVT, N));
}

void SelectionDAGBuilder::visitPtrToInt(const User &I) {
  // What to do depends on the size of the integer and the size of the pointer.
  // We can either truncate, zero extend, or no-op, accordingly.
  SDValue N = getValue(I.getOperand(0));
  auto &TLI = DAG.getTargetLoweringInfo();
  EVT DestVT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(),
                                                        I.getType());
  EVT PtrMemVT =
      TLI.getMemValueType(DAG.getDataLayout(), I.getOperand(0)->getType());
  N = DAG.getPtrExtOrTrunc(N, getCurSDLoc(), PtrMemVT);
  N = DAG.getZExtOrTrunc(N, getCurSDLoc(), DestVT);
  setValue(&I, N);
}

void SelectionDAGBuilder::visitIntToPtr(const User &I) {
  // What to do depends on the size of the integer and the size of the pointer.
  // We can either truncate, zero extend, or no-op, accordingly.
  SDValue N = getValue(I.getOperand(0));
  auto &TLI = DAG.getTargetLoweringInfo();
  EVT DestVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
  EVT PtrMemVT = TLI.getMemValueType(DAG.getDataLayout(), I.getType());
  N = DAG.getZExtOrTrunc(N, getCurSDLoc(), PtrMemVT);
  N = DAG.getPtrExtOrTrunc(N, getCurSDLoc(), DestVT);
  setValue(&I, N);
}

void SelectionDAGBuilder::visitBitCast(const User &I) {
  SDValue N = getValue(I.getOperand(0));
  SDLoc dl = getCurSDLoc();
  EVT DestVT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(),
                                                        I.getType());

  // BitCast assures us that source and destination are the same size so this
  // is either a BITCAST or a no-op.
  if (DestVT != N.getValueType())
    setValue(&I, DAG.getNode(ISD::BITCAST, dl, DestVT, N)); // convert types.
  // Check if the original LLVM IR Operand was a ConstantInt, because getValue()
  // might fold any kind of constant expression to an integer constant and that
  // is not what we are looking for. Only recognize a bitcast of a genuine
  // constant integer as an opaque constant.
  else if (ConstantInt *C = dyn_cast<ConstantInt>(I.getOperand(0)))
    setValue(&I, DAG.getConstant(C->getValue(), dl, DestVT, /*isTarget=*/false,
                                 /*isOpaque*/true));
  else
    setValue(&I, N);            // noop cast.
} void SelectionDAGBuilder::visitAddrSpaceCast(const User &I) { const TargetLowering &TLI = DAG.getTargetLoweringInfo(); const Value *SV = I.getOperand(0); SDValue N = getValue(SV); EVT DestVT = TLI.getValueType(DAG.getDataLayout(), I.getType()); unsigned SrcAS = SV->getType()->getPointerAddressSpace(); unsigned DestAS = I.getType()->getPointerAddressSpace(); if (!TLI.isNoopAddrSpaceCast(SrcAS, DestAS)) N = DAG.getAddrSpaceCast(getCurSDLoc(), DestVT, N, SrcAS, DestAS); setValue(&I, N); } void SelectionDAGBuilder::visitInsertElement(const User &I) { const TargetLowering &TLI = DAG.getTargetLoweringInfo(); SDValue InVec = getValue(I.getOperand(0)); SDValue InVal = getValue(I.getOperand(1)); SDValue InIdx = DAG.getSExtOrTrunc(getValue(I.getOperand(2)), getCurSDLoc(), TLI.getVectorIdxTy(DAG.getDataLayout())); setValue(&I, DAG.getNode(ISD::INSERT_VECTOR_ELT, getCurSDLoc(), TLI.getValueType(DAG.getDataLayout(), I.getType()), InVec, InVal, InIdx)); } void SelectionDAGBuilder::visitExtractElement(const User &I) { const TargetLowering &TLI = DAG.getTargetLoweringInfo(); SDValue InVec = getValue(I.getOperand(0)); SDValue InIdx = DAG.getSExtOrTrunc(getValue(I.getOperand(1)), getCurSDLoc(), TLI.getVectorIdxTy(DAG.getDataLayout())); setValue(&I, DAG.getNode(ISD::EXTRACT_VECTOR_ELT, getCurSDLoc(), TLI.getValueType(DAG.getDataLayout(), I.getType()), InVec, InIdx)); } void SelectionDAGBuilder::visitShuffleVector(const User &I) { SDValue Src1 = getValue(I.getOperand(0)); SDValue Src2 = getValue(I.getOperand(1)); ArrayRef Mask; if (auto *SVI = dyn_cast(&I)) Mask = SVI->getShuffleMask(); else Mask = cast(I).getShuffleMask(); SDLoc DL = getCurSDLoc(); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType()); EVT SrcVT = Src1.getValueType(); if (all_of(Mask, [](int Elem) { return Elem == 0; }) && VT.isScalableVector()) { // Canonical splat form of first element of first input vector. SDValue FirstElt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, SrcVT.getScalarType(), Src1, DAG.getVectorIdxConstant(0, DL)); setValue(&I, DAG.getNode(ISD::SPLAT_VECTOR, DL, VT, FirstElt)); return; } // For now, we only handle splats for scalable vectors. // The DAGCombiner will perform a BUILD_VECTOR -> SPLAT_VECTOR transformation // for targets that support a SPLAT_VECTOR for non-scalable vector types. assert(!VT.isScalableVector() && "Unsupported scalable vector shuffle"); unsigned SrcNumElts = SrcVT.getVectorNumElements(); unsigned MaskNumElts = Mask.size(); if (SrcNumElts == MaskNumElts) { setValue(&I, DAG.getVectorShuffle(VT, DL, Src1, Src2, Mask)); return; } // Normalize the shuffle vector since mask and vector length don't match. if (SrcNumElts < MaskNumElts) { // Mask is longer than the source vectors. We can use concatenate vector to // make the mask and vectors lengths match. if (MaskNumElts % SrcNumElts == 0) { // Mask length is a multiple of the source vector length. // Check if the shuffle is some kind of concatenation of the input // vectors. unsigned NumConcat = MaskNumElts / SrcNumElts; bool IsConcat = true; SmallVector ConcatSrcs(NumConcat, -1); for (unsigned i = 0; i != MaskNumElts; ++i) { int Idx = Mask[i]; if (Idx < 0) continue; // Ensure the indices in each SrcVT sized piece are sequential and that // the same source is used for the whole piece. 
if ((Idx % SrcNumElts != (i % SrcNumElts)) || (ConcatSrcs[i / SrcNumElts] >= 0 && ConcatSrcs[i / SrcNumElts] != (int)(Idx / SrcNumElts))) { IsConcat = false; break; } // Remember which source this index came from. ConcatSrcs[i / SrcNumElts] = Idx / SrcNumElts; } // The shuffle is concatenating multiple vectors together. Just emit // a CONCAT_VECTORS operation. if (IsConcat) { SmallVector ConcatOps; for (auto Src : ConcatSrcs) { if (Src < 0) ConcatOps.push_back(DAG.getUNDEF(SrcVT)); else if (Src == 0) ConcatOps.push_back(Src1); else ConcatOps.push_back(Src2); } setValue(&I, DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, ConcatOps)); return; } } unsigned PaddedMaskNumElts = alignTo(MaskNumElts, SrcNumElts); unsigned NumConcat = PaddedMaskNumElts / SrcNumElts; EVT PaddedVT = EVT::getVectorVT(*DAG.getContext(), VT.getScalarType(), PaddedMaskNumElts); // Pad both vectors with undefs to make them the same length as the mask. SDValue UndefVal = DAG.getUNDEF(SrcVT); SmallVector MOps1(NumConcat, UndefVal); SmallVector MOps2(NumConcat, UndefVal); MOps1[0] = Src1; MOps2[0] = Src2; Src1 = DAG.getNode(ISD::CONCAT_VECTORS, DL, PaddedVT, MOps1); Src2 = DAG.getNode(ISD::CONCAT_VECTORS, DL, PaddedVT, MOps2); // Readjust mask for new input vector length. SmallVector MappedOps(PaddedMaskNumElts, -1); for (unsigned i = 0; i != MaskNumElts; ++i) { int Idx = Mask[i]; if (Idx >= (int)SrcNumElts) Idx -= SrcNumElts - PaddedMaskNumElts; MappedOps[i] = Idx; } SDValue Result = DAG.getVectorShuffle(PaddedVT, DL, Src1, Src2, MappedOps); // If the concatenated vector was padded, extract a subvector with the // correct number of elements. if (MaskNumElts != PaddedMaskNumElts) Result = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, Result, DAG.getVectorIdxConstant(0, DL)); setValue(&I, Result); return; } if (SrcNumElts > MaskNumElts) { // Analyze the access pattern of the vector to see if we can extract // two subvectors and do the shuffle. int StartIdx[2] = { -1, -1 }; // StartIdx to extract from bool CanExtract = true; for (int Idx : Mask) { unsigned Input = 0; if (Idx < 0) continue; if (Idx >= (int)SrcNumElts) { Input = 1; Idx -= SrcNumElts; } // If all the indices come from the same MaskNumElts sized portion of // the sources we can use extract. Also make sure the extract wouldn't // extract past the end of the source. int NewStartIdx = alignDown(Idx, MaskNumElts); if (NewStartIdx + MaskNumElts > SrcNumElts || (StartIdx[Input] >= 0 && StartIdx[Input] != NewStartIdx)) CanExtract = false; // Make sure we always update StartIdx as we use it to track if all // elements are undef. StartIdx[Input] = NewStartIdx; } if (StartIdx[0] < 0 && StartIdx[1] < 0) { setValue(&I, DAG.getUNDEF(VT)); // Vectors are not used. return; } if (CanExtract) { // Extract appropriate subvector and generate a vector shuffle for (unsigned Input = 0; Input < 2; ++Input) { SDValue &Src = Input == 0 ? Src1 : Src2; if (StartIdx[Input] < 0) Src = DAG.getUNDEF(VT); else { Src = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, Src, DAG.getVectorIdxConstant(StartIdx[Input], DL)); } } // Calculate new mask. SmallVector MappedOps(Mask.begin(), Mask.end()); for (int &Idx : MappedOps) { if (Idx >= (int)SrcNumElts) Idx -= SrcNumElts + StartIdx[1] - MaskNumElts; else if (Idx >= 0) Idx -= StartIdx[0]; } setValue(&I, DAG.getVectorShuffle(VT, DL, Src1, Src2, MappedOps)); return; } } // We can't use either concat vectors or extract subvectors so fall back to // replacing the shuffle with extract and build vector. // to insert and build vector. 
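  // Illustrative sketch (editorial, not from the original file): the
  // scalarized fallback emitted below, written over plain integer lanes.
  // A mask entry of -1 stands for an undefined lane; names are hypothetical.
  auto ShuffleFallbackSketch = [](ArrayRef<int> Src1Elts,
                                  ArrayRef<int> Src2Elts,
                                  ArrayRef<int> ShuffleMask) {
    SmallVector<int, 8> Out;
    for (int MaskIdx : ShuffleMask) {
      if (MaskIdx < 0)
        Out.push_back(0);                        // undef lane, any value works
      else if (MaskIdx < (int)Src1Elts.size())
        Out.push_back(Src1Elts[MaskIdx]);        // lane from the first source
      else
        Out.push_back(Src2Elts[MaskIdx - Src1Elts.size()]); // second source
    }
    return Out;
  };
  (void)ShuffleFallbackSketch;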
EVT EltVT = VT.getVectorElementType(); SmallVector Ops; for (int Idx : Mask) { SDValue Res; if (Idx < 0) { Res = DAG.getUNDEF(EltVT); } else { SDValue &Src = Idx < (int)SrcNumElts ? Src1 : Src2; if (Idx >= (int)SrcNumElts) Idx -= SrcNumElts; Res = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, EltVT, Src, DAG.getVectorIdxConstant(Idx, DL)); } Ops.push_back(Res); } setValue(&I, DAG.getBuildVector(VT, DL, Ops)); } void SelectionDAGBuilder::visitInsertValue(const User &I) { ArrayRef Indices; if (const InsertValueInst *IV = dyn_cast(&I)) Indices = IV->getIndices(); else Indices = cast(&I)->getIndices(); const Value *Op0 = I.getOperand(0); const Value *Op1 = I.getOperand(1); Type *AggTy = I.getType(); Type *ValTy = Op1->getType(); bool IntoUndef = isa(Op0); bool FromUndef = isa(Op1); unsigned LinearIndex = ComputeLinearIndex(AggTy, Indices); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); SmallVector AggValueVTs; ComputeValueVTs(TLI, DAG.getDataLayout(), AggTy, AggValueVTs); SmallVector ValValueVTs; ComputeValueVTs(TLI, DAG.getDataLayout(), ValTy, ValValueVTs); unsigned NumAggValues = AggValueVTs.size(); unsigned NumValValues = ValValueVTs.size(); SmallVector Values(NumAggValues); // Ignore an insertvalue that produces an empty object if (!NumAggValues) { setValue(&I, DAG.getUNDEF(MVT(MVT::Other))); return; } SDValue Agg = getValue(Op0); unsigned i = 0; // Copy the beginning value(s) from the original aggregate. for (; i != LinearIndex; ++i) Values[i] = IntoUndef ? DAG.getUNDEF(AggValueVTs[i]) : SDValue(Agg.getNode(), Agg.getResNo() + i); // Copy values from the inserted value(s). if (NumValValues) { SDValue Val = getValue(Op1); for (; i != LinearIndex + NumValValues; ++i) Values[i] = FromUndef ? DAG.getUNDEF(AggValueVTs[i]) : SDValue(Val.getNode(), Val.getResNo() + i - LinearIndex); } // Copy remaining value(s) from the original aggregate. for (; i != NumAggValues; ++i) Values[i] = IntoUndef ? DAG.getUNDEF(AggValueVTs[i]) : SDValue(Agg.getNode(), Agg.getResNo() + i); setValue(&I, DAG.getNode(ISD::MERGE_VALUES, getCurSDLoc(), DAG.getVTList(AggValueVTs), Values)); } void SelectionDAGBuilder::visitExtractValue(const User &I) { ArrayRef Indices; if (const ExtractValueInst *EV = dyn_cast(&I)) Indices = EV->getIndices(); else Indices = cast(&I)->getIndices(); const Value *Op0 = I.getOperand(0); Type *AggTy = Op0->getType(); Type *ValTy = I.getType(); bool OutOfUndef = isa(Op0); unsigned LinearIndex = ComputeLinearIndex(AggTy, Indices); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); SmallVector ValValueVTs; ComputeValueVTs(TLI, DAG.getDataLayout(), ValTy, ValValueVTs); unsigned NumValValues = ValValueVTs.size(); // Ignore a extractvalue that produces an empty object if (!NumValValues) { setValue(&I, DAG.getUNDEF(MVT(MVT::Other))); return; } SmallVector Values(NumValValues); SDValue Agg = getValue(Op0); // Copy out the selected value(s). for (unsigned i = LinearIndex; i != LinearIndex + NumValValues; ++i) Values[i - LinearIndex] = OutOfUndef ? DAG.getUNDEF(Agg.getNode()->getValueType(Agg.getResNo() + i)) : SDValue(Agg.getNode(), Agg.getResNo() + i); setValue(&I, DAG.getNode(ISD::MERGE_VALUES, getCurSDLoc(), DAG.getVTList(ValValueVTs), Values)); } void SelectionDAGBuilder::visitGetElementPtr(const User &I) { Value *Op0 = I.getOperand(0); // Note that the pointer operand may be a vector of pointers. Take the scalar // element which holds a pointer. 
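  // Illustrative sketch (editorial, not from the original file): the scalar
  // address arithmetic the loop below emits as DAG nodes.  Struct indices add
  // a constant field offset; array indices add Index * AllocSize, which the
  // code strength-reduces to a shift when AllocSize is a power of two.
  // Names are hypothetical.
  auto GEPAddressSketch = [](uint64_t Base, uint64_t FieldOffset, int64_t Index,
                             uint64_t ElemAllocSize) {
    return Base + FieldOffset + uint64_t(Index) * ElemAllocSize;
  };
  (void)GEPAddressSketch;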
unsigned AS = Op0->getType()->getScalarType()->getPointerAddressSpace(); SDValue N = getValue(Op0); SDLoc dl = getCurSDLoc(); auto &TLI = DAG.getTargetLoweringInfo(); MVT PtrTy = TLI.getPointerTy(DAG.getDataLayout(), AS); MVT PtrMemTy = TLI.getPointerMemTy(DAG.getDataLayout(), AS); // Normalize Vector GEP - all scalar operands should be converted to the // splat vector. bool IsVectorGEP = I.getType()->isVectorTy(); ElementCount VectorElementCount = IsVectorGEP ? cast(I.getType())->getElementCount() : ElementCount(0, false); if (IsVectorGEP && !N.getValueType().isVector()) { LLVMContext &Context = *DAG.getContext(); EVT VT = EVT::getVectorVT(Context, N.getValueType(), VectorElementCount); if (VectorElementCount.Scalable) N = DAG.getSplatVector(VT, dl, N); else N = DAG.getSplatBuildVector(VT, dl, N); } for (gep_type_iterator GTI = gep_type_begin(&I), E = gep_type_end(&I); GTI != E; ++GTI) { const Value *Idx = GTI.getOperand(); if (StructType *StTy = GTI.getStructTypeOrNull()) { unsigned Field = cast(Idx)->getUniqueInteger().getZExtValue(); if (Field) { // N = N + Offset uint64_t Offset = DL->getStructLayout(StTy)->getElementOffset(Field); // In an inbounds GEP with an offset that is nonnegative even when // interpreted as signed, assume there is no unsigned overflow. SDNodeFlags Flags; if (int64_t(Offset) >= 0 && cast(I).isInBounds()) Flags.setNoUnsignedWrap(true); N = DAG.getNode(ISD::ADD, dl, N.getValueType(), N, DAG.getConstant(Offset, dl, N.getValueType()), Flags); } } else { // IdxSize is the width of the arithmetic according to IR semantics. // In SelectionDAG, we may prefer to do arithmetic in a wider bitwidth // (and fix up the result later). unsigned IdxSize = DAG.getDataLayout().getIndexSizeInBits(AS); MVT IdxTy = MVT::getIntegerVT(IdxSize); TypeSize ElementSize = DL->getTypeAllocSize(GTI.getIndexedType()); // We intentionally mask away the high bits here; ElementSize may not // fit in IdxTy. APInt ElementMul(IdxSize, ElementSize.getKnownMinSize()); bool ElementScalable = ElementSize.isScalable(); // If this is a scalar constant or a splat vector of constants, // handle it quickly. const auto *C = dyn_cast(Idx); if (C && isa(C->getType())) C = C->getSplatValue(); const auto *CI = dyn_cast_or_null(C); if (CI && CI->isZero()) continue; if (CI && !ElementScalable) { APInt Offs = ElementMul * CI->getValue().sextOrTrunc(IdxSize); LLVMContext &Context = *DAG.getContext(); SDValue OffsVal; if (IsVectorGEP) OffsVal = DAG.getConstant( Offs, dl, EVT::getVectorVT(Context, IdxTy, VectorElementCount)); else OffsVal = DAG.getConstant(Offs, dl, IdxTy); // In an inbounds GEP with an offset that is nonnegative even when // interpreted as signed, assume there is no unsigned overflow. SDNodeFlags Flags; if (Offs.isNonNegative() && cast(I).isInBounds()) Flags.setNoUnsignedWrap(true); OffsVal = DAG.getSExtOrTrunc(OffsVal, dl, N.getValueType()); N = DAG.getNode(ISD::ADD, dl, N.getValueType(), N, OffsVal, Flags); continue; } // N = N + Idx * ElementMul; SDValue IdxN = getValue(Idx); if (!IdxN.getValueType().isVector() && IsVectorGEP) { EVT VT = EVT::getVectorVT(*Context, IdxN.getValueType(), VectorElementCount); if (VectorElementCount.Scalable) IdxN = DAG.getSplatVector(VT, dl, IdxN); else IdxN = DAG.getSplatBuildVector(VT, dl, IdxN); } // If the index is smaller or larger than intptr_t, truncate or extend // it. 
IdxN = DAG.getSExtOrTrunc(IdxN, dl, N.getValueType()); if (ElementScalable) { EVT VScaleTy = N.getValueType().getScalarType(); SDValue VScale = DAG.getNode( ISD::VSCALE, dl, VScaleTy, DAG.getConstant(ElementMul.getZExtValue(), dl, VScaleTy)); if (IsVectorGEP) VScale = DAG.getSplatVector(N.getValueType(), dl, VScale); IdxN = DAG.getNode(ISD::MUL, dl, N.getValueType(), IdxN, VScale); } else { // If this is a multiply by a power of two, turn it into a shl // immediately. This is a very common case. if (ElementMul != 1) { if (ElementMul.isPowerOf2()) { unsigned Amt = ElementMul.logBase2(); IdxN = DAG.getNode(ISD::SHL, dl, N.getValueType(), IdxN, DAG.getConstant(Amt, dl, IdxN.getValueType())); } else { SDValue Scale = DAG.getConstant(ElementMul.getZExtValue(), dl, IdxN.getValueType()); IdxN = DAG.getNode(ISD::MUL, dl, N.getValueType(), IdxN, Scale); } } } N = DAG.getNode(ISD::ADD, dl, N.getValueType(), N, IdxN); } } if (PtrMemTy != PtrTy && !cast(I).isInBounds()) N = DAG.getPtrExtendInReg(N, dl, PtrMemTy); setValue(&I, N); } void SelectionDAGBuilder::visitAlloca(const AllocaInst &I) { // If this is a fixed sized alloca in the entry block of the function, // allocate it statically on the stack. if (FuncInfo.StaticAllocaMap.count(&I)) return; // getValue will auto-populate this. SDLoc dl = getCurSDLoc(); Type *Ty = I.getAllocatedType(); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); auto &DL = DAG.getDataLayout(); uint64_t TySize = DL.getTypeAllocSize(Ty); MaybeAlign Alignment = std::max(DL.getPrefTypeAlign(Ty), I.getAlign()); SDValue AllocSize = getValue(I.getArraySize()); EVT IntPtr = TLI.getPointerTy(DAG.getDataLayout(), DL.getAllocaAddrSpace()); if (AllocSize.getValueType() != IntPtr) AllocSize = DAG.getZExtOrTrunc(AllocSize, dl, IntPtr); AllocSize = DAG.getNode(ISD::MUL, dl, IntPtr, AllocSize, DAG.getConstant(TySize, dl, IntPtr)); // Handle alignment. If the requested alignment is less than or equal to // the stack alignment, ignore it. If the size is greater than or equal to // the stack alignment, we note this in the DYNAMIC_STACKALLOC node. Align StackAlign = DAG.getSubtarget().getFrameLowering()->getStackAlign(); if (*Alignment <= StackAlign) Alignment = None; const uint64_t StackAlignMask = StackAlign.value() - 1U; // Round the size of the allocation up to the stack alignment size // by add SA-1 to the size. This doesn't overflow because we're computing // an address inside an alloca. SDNodeFlags Flags; Flags.setNoUnsignedWrap(true); AllocSize = DAG.getNode(ISD::ADD, dl, AllocSize.getValueType(), AllocSize, DAG.getConstant(StackAlignMask, dl, IntPtr), Flags); // Mask out the low bits for alignment purposes. AllocSize = DAG.getNode(ISD::AND, dl, AllocSize.getValueType(), AllocSize, DAG.getConstant(~StackAlignMask, dl, IntPtr)); SDValue Ops[] = { getRoot(), AllocSize, DAG.getConstant(Alignment ? Alignment->value() : 0, dl, IntPtr)}; SDVTList VTs = DAG.getVTList(AllocSize.getValueType(), MVT::Other); SDValue DSA = DAG.getNode(ISD::DYNAMIC_STACKALLOC, dl, VTs, Ops); setValue(&I, DSA); DAG.setRoot(DSA.getValue(1)); assert(FuncInfo.MF->getFrameInfo().hasVarSizedObjects()); } void SelectionDAGBuilder::visitLoad(const LoadInst &I) { if (I.isAtomic()) return visitAtomicLoad(I); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); const Value *SV = I.getOperand(0); if (TLI.supportSwiftError()) { // Swifterror values can come from either a function parameter with // swifterror attribute or an alloca with swifterror attribute. 
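    // Illustrative sketch (editorial, not from the original file): the routing
    // decision made by the two checks below, with hypothetical boolean inputs.
    auto UsesSwiftErrorPath = [](bool PtrIsSwiftErrorArg,
                                 bool PtrIsSwiftErrorAlloca) {
      // Loads through a swifterror parameter or swifterror alloca do not read
      // memory; they read the dedicated swifterror virtual register instead.
      return PtrIsSwiftErrorArg || PtrIsSwiftErrorAlloca;
    };
    (void)UsesSwiftErrorPath;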
if (const Argument *Arg = dyn_cast(SV)) { if (Arg->hasSwiftErrorAttr()) return visitLoadFromSwiftError(I); } if (const AllocaInst *Alloca = dyn_cast(SV)) { if (Alloca->isSwiftError()) return visitLoadFromSwiftError(I); } } SDValue Ptr = getValue(SV); Type *Ty = I.getType(); Align Alignment = I.getAlign(); AAMDNodes AAInfo; I.getAAMetadata(AAInfo); const MDNode *Ranges = I.getMetadata(LLVMContext::MD_range); SmallVector ValueVTs, MemVTs; SmallVector Offsets; ComputeValueVTs(TLI, DAG.getDataLayout(), Ty, ValueVTs, &MemVTs, &Offsets); unsigned NumValues = ValueVTs.size(); if (NumValues == 0) return; bool isVolatile = I.isVolatile(); SDValue Root; bool ConstantMemory = false; if (isVolatile) // Serialize volatile loads with other side effects. Root = getRoot(); else if (NumValues > MaxParallelChains) Root = getMemoryRoot(); else if (AA && AA->pointsToConstantMemory(MemoryLocation( SV, LocationSize::precise(DAG.getDataLayout().getTypeStoreSize(Ty)), AAInfo))) { // Do not serialize (non-volatile) loads of constant memory with anything. Root = DAG.getEntryNode(); ConstantMemory = true; } else { // Do not serialize non-volatile loads against each other. Root = DAG.getRoot(); } SDLoc dl = getCurSDLoc(); if (isVolatile) Root = TLI.prepareVolatileOrAtomicLoad(Root, dl, DAG); // An aggregate load cannot wrap around the address space, so offsets to its // parts don't wrap either. SDNodeFlags Flags; Flags.setNoUnsignedWrap(true); SmallVector Values(NumValues); SmallVector Chains(std::min(MaxParallelChains, NumValues)); EVT PtrVT = Ptr.getValueType(); MachineMemOperand::Flags MMOFlags = TLI.getLoadMemOperandFlags(I, DAG.getDataLayout()); unsigned ChainI = 0; for (unsigned i = 0; i != NumValues; ++i, ++ChainI) { // Serializing loads here may result in excessive register pressure, and // TokenFactor places arbitrary choke points on the scheduler. SD scheduling // could recover a bit by hoisting nodes upward in the chain by recognizing // they are side-effect free or do not alias. The optimizer should really // avoid this case by converting large object/array copies to llvm.memcpy // (MaxParallelChains should always remain as failsafe). 
if (ChainI == MaxParallelChains) { assert(PendingLoads.empty() && "PendingLoads must be serialized first"); SDValue Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, makeArrayRef(Chains.data(), ChainI)); Root = Chain; ChainI = 0; } SDValue A = DAG.getNode(ISD::ADD, dl, PtrVT, Ptr, DAG.getConstant(Offsets[i], dl, PtrVT), Flags); SDValue L = DAG.getLoad(MemVTs[i], dl, Root, A, MachinePointerInfo(SV, Offsets[i]), Alignment, MMOFlags, AAInfo, Ranges); Chains[ChainI] = L.getValue(1); if (MemVTs[i] != ValueVTs[i]) L = DAG.getZExtOrTrunc(L, dl, ValueVTs[i]); Values[i] = L; } if (!ConstantMemory) { SDValue Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, makeArrayRef(Chains.data(), ChainI)); if (isVolatile) DAG.setRoot(Chain); else PendingLoads.push_back(Chain); } setValue(&I, DAG.getNode(ISD::MERGE_VALUES, dl, DAG.getVTList(ValueVTs), Values)); } void SelectionDAGBuilder::visitStoreToSwiftError(const StoreInst &I) { assert(DAG.getTargetLoweringInfo().supportSwiftError() && "call visitStoreToSwiftError when backend supports swifterror"); SmallVector ValueVTs; SmallVector Offsets; const Value *SrcV = I.getOperand(0); ComputeValueVTs(DAG.getTargetLoweringInfo(), DAG.getDataLayout(), SrcV->getType(), ValueVTs, &Offsets); assert(ValueVTs.size() == 1 && Offsets[0] == 0 && "expect a single EVT for swifterror"); SDValue Src = getValue(SrcV); // Create a virtual register, then update the virtual register. Register VReg = SwiftError.getOrCreateVRegDefAt(&I, FuncInfo.MBB, I.getPointerOperand()); // Chain, DL, Reg, N or Chain, DL, Reg, N, Glue // Chain can be getRoot or getControlRoot. SDValue CopyNode = DAG.getCopyToReg(getRoot(), getCurSDLoc(), VReg, SDValue(Src.getNode(), Src.getResNo())); DAG.setRoot(CopyNode); } void SelectionDAGBuilder::visitLoadFromSwiftError(const LoadInst &I) { assert(DAG.getTargetLoweringInfo().supportSwiftError() && "call visitLoadFromSwiftError when backend supports swifterror"); assert(!I.isVolatile() && !I.hasMetadata(LLVMContext::MD_nontemporal) && !I.hasMetadata(LLVMContext::MD_invariant_load) && "Support volatile, non temporal, invariant for load_from_swift_error"); const Value *SV = I.getOperand(0); Type *Ty = I.getType(); AAMDNodes AAInfo; I.getAAMetadata(AAInfo); assert( (!AA || !AA->pointsToConstantMemory(MemoryLocation( SV, LocationSize::precise(DAG.getDataLayout().getTypeStoreSize(Ty)), AAInfo))) && "load_from_swift_error should not be constant memory"); SmallVector ValueVTs; SmallVector Offsets; ComputeValueVTs(DAG.getTargetLoweringInfo(), DAG.getDataLayout(), Ty, ValueVTs, &Offsets); assert(ValueVTs.size() == 1 && Offsets[0] == 0 && "expect a single EVT for swifterror"); // Chain, DL, Reg, VT, Glue or Chain, DL, Reg, VT SDValue L = DAG.getCopyFromReg( getRoot(), getCurSDLoc(), SwiftError.getOrCreateVRegUseAt(&I, FuncInfo.MBB, SV), ValueVTs[0]); setValue(&I, L); } void SelectionDAGBuilder::visitStore(const StoreInst &I) { if (I.isAtomic()) return visitAtomicStore(I); const Value *SrcV = I.getOperand(0); const Value *PtrV = I.getOperand(1); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); if (TLI.supportSwiftError()) { // Swifterror values can come from either a function parameter with // swifterror attribute or an alloca with swifterror attribute. 
if (const Argument *Arg = dyn_cast(PtrV)) { if (Arg->hasSwiftErrorAttr()) return visitStoreToSwiftError(I); } if (const AllocaInst *Alloca = dyn_cast(PtrV)) { if (Alloca->isSwiftError()) return visitStoreToSwiftError(I); } } SmallVector ValueVTs, MemVTs; SmallVector Offsets; ComputeValueVTs(DAG.getTargetLoweringInfo(), DAG.getDataLayout(), SrcV->getType(), ValueVTs, &MemVTs, &Offsets); unsigned NumValues = ValueVTs.size(); if (NumValues == 0) return; // Get the lowered operands. Note that we do this after // checking if NumResults is zero, because with zero results // the operands won't have values in the map. SDValue Src = getValue(SrcV); SDValue Ptr = getValue(PtrV); SDValue Root = I.isVolatile() ? getRoot() : getMemoryRoot(); SmallVector Chains(std::min(MaxParallelChains, NumValues)); SDLoc dl = getCurSDLoc(); Align Alignment = I.getAlign(); AAMDNodes AAInfo; I.getAAMetadata(AAInfo); auto MMOFlags = TLI.getStoreMemOperandFlags(I, DAG.getDataLayout()); // An aggregate load cannot wrap around the address space, so offsets to its // parts don't wrap either. SDNodeFlags Flags; Flags.setNoUnsignedWrap(true); unsigned ChainI = 0; for (unsigned i = 0; i != NumValues; ++i, ++ChainI) { // See visitLoad comments. if (ChainI == MaxParallelChains) { SDValue Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, makeArrayRef(Chains.data(), ChainI)); Root = Chain; ChainI = 0; } SDValue Add = DAG.getMemBasePlusOffset(Ptr, Offsets[i], dl, Flags); SDValue Val = SDValue(Src.getNode(), Src.getResNo() + i); if (MemVTs[i] != ValueVTs[i]) Val = DAG.getPtrExtOrTrunc(Val, dl, MemVTs[i]); SDValue St = DAG.getStore(Root, dl, Val, Add, MachinePointerInfo(PtrV, Offsets[i]), Alignment, MMOFlags, AAInfo); Chains[ChainI] = St; } SDValue StoreNode = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, makeArrayRef(Chains.data(), ChainI)); DAG.setRoot(StoreNode); } void SelectionDAGBuilder::visitMaskedStore(const CallInst &I, bool IsCompressing) { SDLoc sdl = getCurSDLoc(); auto getMaskedStoreOps = [&](Value *&Ptr, Value *&Mask, Value *&Src0, MaybeAlign &Alignment) { // llvm.masked.store.*(Src0, Ptr, alignment, Mask) Src0 = I.getArgOperand(0); Ptr = I.getArgOperand(1); Alignment = cast(I.getArgOperand(2))->getMaybeAlignValue(); Mask = I.getArgOperand(3); }; auto getCompressingStoreOps = [&](Value *&Ptr, Value *&Mask, Value *&Src0, MaybeAlign &Alignment) { // llvm.masked.compressstore.*(Src0, Ptr, Mask) Src0 = I.getArgOperand(0); Ptr = I.getArgOperand(1); Mask = I.getArgOperand(2); Alignment = None; }; Value *PtrOperand, *MaskOperand, *Src0Operand; MaybeAlign Alignment; if (IsCompressing) getCompressingStoreOps(PtrOperand, MaskOperand, Src0Operand, Alignment); else getMaskedStoreOps(PtrOperand, MaskOperand, Src0Operand, Alignment); SDValue Ptr = getValue(PtrOperand); SDValue Src0 = getValue(Src0Operand); SDValue Mask = getValue(MaskOperand); SDValue Offset = DAG.getUNDEF(Ptr.getValueType()); EVT VT = Src0.getValueType(); if (!Alignment) Alignment = DAG.getEVTAlign(VT); AAMDNodes AAInfo; I.getAAMetadata(AAInfo); MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand( MachinePointerInfo(PtrOperand), MachineMemOperand::MOStore, // TODO: Make MachineMemOperands aware of scalable // vectors. VT.getStoreSize().getKnownMinSize(), *Alignment, AAInfo); SDValue StoreNode = DAG.getMaskedStore(getMemoryRoot(), sdl, Src0, Ptr, Offset, Mask, VT, MMO, ISD::UNINDEXED, false /* Truncating */, IsCompressing); DAG.setRoot(StoreNode); setValue(&I, StoreNode); } // Get a uniform base for the Gather/Scatter intrinsic. 
// The first argument of the Gather/Scatter intrinsic is a vector of pointers. // We try to represent it as a base pointer + vector of indices. // Usually, the vector of pointers comes from a 'getelementptr' instruction. // The first operand of the GEP may be a single pointer or a vector of pointers // Example: // %gep.ptr = getelementptr i32, <8 x i32*> %vptr, <8 x i32> %ind // or // %gep.ptr = getelementptr i32, i32* %ptr, <8 x i32> %ind // %res = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %gep.ptr, .. // // When the first GEP operand is a single pointer - it is the uniform base we // are looking for. If first operand of the GEP is a splat vector - we // extract the splat value and use it as a uniform base. // In all other cases the function returns 'false'. static bool getUniformBase(const Value *Ptr, SDValue &Base, SDValue &Index, ISD::MemIndexType &IndexType, SDValue &Scale, SelectionDAGBuilder *SDB, const BasicBlock *CurBB) { SelectionDAG& DAG = SDB->DAG; const TargetLowering &TLI = DAG.getTargetLoweringInfo(); const DataLayout &DL = DAG.getDataLayout(); assert(Ptr->getType()->isVectorTy() && "Uexpected pointer type"); // Handle splat constant pointer. if (auto *C = dyn_cast(Ptr)) { C = C->getSplatValue(); if (!C) return false; Base = SDB->getValue(C); unsigned NumElts = cast(Ptr->getType())->getNumElements(); EVT VT = EVT::getVectorVT(*DAG.getContext(), TLI.getPointerTy(DL), NumElts); Index = DAG.getConstant(0, SDB->getCurSDLoc(), VT); IndexType = ISD::SIGNED_SCALED; Scale = DAG.getTargetConstant(1, SDB->getCurSDLoc(), TLI.getPointerTy(DL)); return true; } const GetElementPtrInst *GEP = dyn_cast(Ptr); if (!GEP || GEP->getParent() != CurBB) return false; if (GEP->getNumOperands() != 2) return false; const Value *BasePtr = GEP->getPointerOperand(); const Value *IndexVal = GEP->getOperand(GEP->getNumOperands() - 1); // Make sure the base is scalar and the index is a vector. if (BasePtr->getType()->isVectorTy() || !IndexVal->getType()->isVectorTy()) return false; Base = SDB->getValue(BasePtr); Index = SDB->getValue(IndexVal); IndexType = ISD::SIGNED_SCALED; Scale = DAG.getTargetConstant( DL.getTypeAllocSize(GEP->getResultElementType()), SDB->getCurSDLoc(), TLI.getPointerTy(DL)); return true; } void SelectionDAGBuilder::visitMaskedScatter(const CallInst &I) { SDLoc sdl = getCurSDLoc(); // llvm.masked.scatter.*(Src0, Ptrs, alignment, Mask) const Value *Ptr = I.getArgOperand(1); SDValue Src0 = getValue(I.getArgOperand(0)); SDValue Mask = getValue(I.getArgOperand(3)); EVT VT = Src0.getValueType(); Align Alignment = cast(I.getArgOperand(2)) ->getMaybeAlignValue() .getValueOr(DAG.getEVTAlign(VT)); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); AAMDNodes AAInfo; I.getAAMetadata(AAInfo); SDValue Base; SDValue Index; ISD::MemIndexType IndexType; SDValue Scale; bool UniformBase = getUniformBase(Ptr, Base, Index, IndexType, Scale, this, I.getParent()); unsigned AS = Ptr->getType()->getScalarType()->getPointerAddressSpace(); MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand( MachinePointerInfo(AS), MachineMemOperand::MOStore, // TODO: Make MachineMemOperands aware of scalable // vectors. 
      MemoryLocation::UnknownSize, Alignment, AAInfo);
  if (!UniformBase) {
    Base = DAG.getConstant(0, sdl, TLI.getPointerTy(DAG.getDataLayout()));
    Index = getValue(Ptr);
    IndexType = ISD::SIGNED_SCALED;
    Scale = DAG.getTargetConstant(1, sdl, TLI.getPointerTy(DAG.getDataLayout()));
  }
  SDValue Ops[] = { getMemoryRoot(), Src0, Mask, Base, Index, Scale };
  SDValue Scatter = DAG.getMaskedScatter(DAG.getVTList(MVT::Other), VT, sdl,
                                         Ops, MMO, IndexType);
  DAG.setRoot(Scatter);
  setValue(&I, Scatter);
}

void SelectionDAGBuilder::visitMaskedLoad(const CallInst &I, bool IsExpanding) {
  SDLoc sdl = getCurSDLoc();

  auto getMaskedLoadOps = [&](Value *&Ptr, Value *&Mask, Value *&Src0,
                              MaybeAlign &Alignment) {
    // @llvm.masked.load.*(Ptr, alignment, Mask, Src0)
    Ptr = I.getArgOperand(0);
    Alignment = cast<ConstantInt>(I.getArgOperand(1))->getMaybeAlignValue();
    Mask = I.getArgOperand(2);
    Src0 = I.getArgOperand(3);
  };
  auto getExpandingLoadOps = [&](Value *&Ptr, Value *&Mask, Value *&Src0,
                                 MaybeAlign &Alignment) {
    // @llvm.masked.expandload.*(Ptr, Mask, Src0)
    Ptr = I.getArgOperand(0);
    Alignment = None;
    Mask = I.getArgOperand(1);
    Src0 = I.getArgOperand(2);
  };

  Value *PtrOperand, *MaskOperand, *Src0Operand;
  MaybeAlign Alignment;
  if (IsExpanding)
    getExpandingLoadOps(PtrOperand, MaskOperand, Src0Operand, Alignment);
  else
    getMaskedLoadOps(PtrOperand, MaskOperand, Src0Operand, Alignment);

  SDValue Ptr = getValue(PtrOperand);
  SDValue Src0 = getValue(Src0Operand);
  SDValue Mask = getValue(MaskOperand);
  SDValue Offset = DAG.getUNDEF(Ptr.getValueType());

  EVT VT = Src0.getValueType();
  if (!Alignment)
    Alignment = DAG.getEVTAlign(VT);

  AAMDNodes AAInfo;
  I.getAAMetadata(AAInfo);
  const MDNode *Ranges = I.getMetadata(LLVMContext::MD_range);

  // Do not serialize masked loads of constant memory with anything.
  MemoryLocation ML;
  if (VT.isScalableVector())
    ML = MemoryLocation(PtrOperand);
  else
    ML = MemoryLocation(PtrOperand, LocationSize::precise(
                          DAG.getDataLayout().getTypeStoreSize(I.getType())),
                        AAInfo);
  bool AddToChain = !AA || !AA->pointsToConstantMemory(ML);

  SDValue InChain = AddToChain ? DAG.getRoot() : DAG.getEntryNode();

  MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand(
      MachinePointerInfo(PtrOperand), MachineMemOperand::MOLoad,
      // TODO: Make MachineMemOperands aware of scalable
      // vectors.
VT.getStoreSize().getKnownMinSize(), *Alignment, AAInfo, Ranges); SDValue Load = DAG.getMaskedLoad(VT, sdl, InChain, Ptr, Offset, Mask, Src0, VT, MMO, ISD::UNINDEXED, ISD::NON_EXTLOAD, IsExpanding); if (AddToChain) PendingLoads.push_back(Load.getValue(1)); setValue(&I, Load); } void SelectionDAGBuilder::visitMaskedGather(const CallInst &I) { SDLoc sdl = getCurSDLoc(); // @llvm.masked.gather.*(Ptrs, alignment, Mask, Src0) const Value *Ptr = I.getArgOperand(0); SDValue Src0 = getValue(I.getArgOperand(3)); SDValue Mask = getValue(I.getArgOperand(2)); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType()); Align Alignment = cast(I.getArgOperand(1)) ->getMaybeAlignValue() .getValueOr(DAG.getEVTAlign(VT)); AAMDNodes AAInfo; I.getAAMetadata(AAInfo); const MDNode *Ranges = I.getMetadata(LLVMContext::MD_range); SDValue Root = DAG.getRoot(); SDValue Base; SDValue Index; ISD::MemIndexType IndexType; SDValue Scale; bool UniformBase = getUniformBase(Ptr, Base, Index, IndexType, Scale, this, I.getParent()); unsigned AS = Ptr->getType()->getScalarType()->getPointerAddressSpace(); MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand( MachinePointerInfo(AS), MachineMemOperand::MOLoad, // TODO: Make MachineMemOperands aware of scalable // vectors. MemoryLocation::UnknownSize, Alignment, AAInfo, Ranges); if (!UniformBase) { Base = DAG.getConstant(0, sdl, TLI.getPointerTy(DAG.getDataLayout())); Index = getValue(Ptr); IndexType = ISD::SIGNED_SCALED; Scale = DAG.getTargetConstant(1, sdl, TLI.getPointerTy(DAG.getDataLayout())); } SDValue Ops[] = { Root, Src0, Mask, Base, Index, Scale }; SDValue Gather = DAG.getMaskedGather(DAG.getVTList(VT, MVT::Other), VT, sdl, Ops, MMO, IndexType); PendingLoads.push_back(Gather.getValue(1)); setValue(&I, Gather); } void SelectionDAGBuilder::visitAtomicCmpXchg(const AtomicCmpXchgInst &I) { SDLoc dl = getCurSDLoc(); AtomicOrdering SuccessOrdering = I.getSuccessOrdering(); AtomicOrdering FailureOrdering = I.getFailureOrdering(); SyncScope::ID SSID = I.getSyncScopeID(); SDValue InChain = getRoot(); MVT MemVT = getValue(I.getCompareOperand()).getSimpleValueType(); SDVTList VTs = DAG.getVTList(MemVT, MVT::i1, MVT::Other); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); auto Flags = TLI.getAtomicMemOperandFlags(I, DAG.getDataLayout()); MachineFunction &MF = DAG.getMachineFunction(); MachineMemOperand *MMO = MF.getMachineMemOperand( MachinePointerInfo(I.getPointerOperand()), Flags, MemVT.getStoreSize(), DAG.getEVTAlign(MemVT), AAMDNodes(), nullptr, SSID, SuccessOrdering, FailureOrdering); SDValue L = DAG.getAtomicCmpSwap(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, dl, MemVT, VTs, InChain, getValue(I.getPointerOperand()), getValue(I.getCompareOperand()), getValue(I.getNewValOperand()), MMO); SDValue OutChain = L.getValue(2); setValue(&I, L); DAG.setRoot(OutChain); } void SelectionDAGBuilder::visitAtomicRMW(const AtomicRMWInst &I) { SDLoc dl = getCurSDLoc(); ISD::NodeType NT; switch (I.getOperation()) { default: llvm_unreachable("Unknown atomicrmw operation"); case AtomicRMWInst::Xchg: NT = ISD::ATOMIC_SWAP; break; case AtomicRMWInst::Add: NT = ISD::ATOMIC_LOAD_ADD; break; case AtomicRMWInst::Sub: NT = ISD::ATOMIC_LOAD_SUB; break; case AtomicRMWInst::And: NT = ISD::ATOMIC_LOAD_AND; break; case AtomicRMWInst::Nand: NT = ISD::ATOMIC_LOAD_NAND; break; case AtomicRMWInst::Or: NT = ISD::ATOMIC_LOAD_OR; break; case AtomicRMWInst::Xor: NT = ISD::ATOMIC_LOAD_XOR; break; case AtomicRMWInst::Max: NT = 
ISD::ATOMIC_LOAD_MAX; break; case AtomicRMWInst::Min: NT = ISD::ATOMIC_LOAD_MIN; break; case AtomicRMWInst::UMax: NT = ISD::ATOMIC_LOAD_UMAX; break; case AtomicRMWInst::UMin: NT = ISD::ATOMIC_LOAD_UMIN; break; case AtomicRMWInst::FAdd: NT = ISD::ATOMIC_LOAD_FADD; break; case AtomicRMWInst::FSub: NT = ISD::ATOMIC_LOAD_FSUB; break; } AtomicOrdering Ordering = I.getOrdering(); SyncScope::ID SSID = I.getSyncScopeID(); SDValue InChain = getRoot(); auto MemVT = getValue(I.getValOperand()).getSimpleValueType(); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); auto Flags = TLI.getAtomicMemOperandFlags(I, DAG.getDataLayout()); MachineFunction &MF = DAG.getMachineFunction(); MachineMemOperand *MMO = MF.getMachineMemOperand( MachinePointerInfo(I.getPointerOperand()), Flags, MemVT.getStoreSize(), DAG.getEVTAlign(MemVT), AAMDNodes(), nullptr, SSID, Ordering); SDValue L = DAG.getAtomic(NT, dl, MemVT, InChain, getValue(I.getPointerOperand()), getValue(I.getValOperand()), MMO); SDValue OutChain = L.getValue(1); setValue(&I, L); DAG.setRoot(OutChain); } void SelectionDAGBuilder::visitFence(const FenceInst &I) { SDLoc dl = getCurSDLoc(); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); SDValue Ops[3]; Ops[0] = getRoot(); Ops[1] = DAG.getTargetConstant((unsigned)I.getOrdering(), dl, TLI.getFenceOperandTy(DAG.getDataLayout())); Ops[2] = DAG.getTargetConstant(I.getSyncScopeID(), dl, TLI.getFenceOperandTy(DAG.getDataLayout())); DAG.setRoot(DAG.getNode(ISD::ATOMIC_FENCE, dl, MVT::Other, Ops)); } void SelectionDAGBuilder::visitAtomicLoad(const LoadInst &I) { SDLoc dl = getCurSDLoc(); AtomicOrdering Order = I.getOrdering(); SyncScope::ID SSID = I.getSyncScopeID(); SDValue InChain = getRoot(); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType()); EVT MemVT = TLI.getMemValueType(DAG.getDataLayout(), I.getType()); if (!TLI.supportsUnalignedAtomics() && I.getAlignment() < MemVT.getSizeInBits() / 8) report_fatal_error("Cannot generate unaligned atomic load"); auto Flags = TLI.getLoadMemOperandFlags(I, DAG.getDataLayout()); MachineMemOperand *MMO = DAG.getMachineFunction().getMachineMemOperand( MachinePointerInfo(I.getPointerOperand()), Flags, MemVT.getStoreSize(), I.getAlign(), AAMDNodes(), nullptr, SSID, Order); InChain = TLI.prepareVolatileOrAtomicLoad(InChain, dl, DAG); SDValue Ptr = getValue(I.getPointerOperand()); if (TLI.lowerAtomicLoadAsLoadSDNode(I)) { // TODO: Once this is better exercised by tests, it should be merged with // the normal path for loads to prevent future divergence. 
SDValue L = DAG.getLoad(MemVT, dl, InChain, Ptr, MMO); if (MemVT != VT) L = DAG.getPtrExtOrTrunc(L, dl, VT); setValue(&I, L); SDValue OutChain = L.getValue(1); if (!I.isUnordered()) DAG.setRoot(OutChain); else PendingLoads.push_back(OutChain); return; } SDValue L = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, MemVT, MemVT, InChain, Ptr, MMO); SDValue OutChain = L.getValue(1); if (MemVT != VT) L = DAG.getPtrExtOrTrunc(L, dl, VT); setValue(&I, L); DAG.setRoot(OutChain); } void SelectionDAGBuilder::visitAtomicStore(const StoreInst &I) { SDLoc dl = getCurSDLoc(); AtomicOrdering Ordering = I.getOrdering(); SyncScope::ID SSID = I.getSyncScopeID(); SDValue InChain = getRoot(); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); EVT MemVT = TLI.getMemValueType(DAG.getDataLayout(), I.getValueOperand()->getType()); if (I.getAlignment() < MemVT.getSizeInBits() / 8) report_fatal_error("Cannot generate unaligned atomic store"); auto Flags = TLI.getStoreMemOperandFlags(I, DAG.getDataLayout()); MachineFunction &MF = DAG.getMachineFunction(); MachineMemOperand *MMO = MF.getMachineMemOperand( MachinePointerInfo(I.getPointerOperand()), Flags, MemVT.getStoreSize(), I.getAlign(), AAMDNodes(), nullptr, SSID, Ordering); SDValue Val = getValue(I.getValueOperand()); if (Val.getValueType() != MemVT) Val = DAG.getPtrExtOrTrunc(Val, dl, MemVT); SDValue Ptr = getValue(I.getPointerOperand()); if (TLI.lowerAtomicStoreAsStoreSDNode(I)) { // TODO: Once this is better exercised by tests, it should be merged with // the normal path for stores to prevent future divergence. SDValue S = DAG.getStore(InChain, dl, Val, Ptr, MMO); DAG.setRoot(S); return; } SDValue OutChain = DAG.getAtomic(ISD::ATOMIC_STORE, dl, MemVT, InChain, Ptr, Val, MMO); DAG.setRoot(OutChain); } /// visitTargetIntrinsic - Lower a call of a target intrinsic to an INTRINSIC /// node. void SelectionDAGBuilder::visitTargetIntrinsic(const CallInst &I, unsigned Intrinsic) { // Ignore the callsite's attributes. A specific call site may be marked with // readnone, but the lowering code will expect the chain based on the // definition. const Function *F = I.getCalledFunction(); bool HasChain = !F->doesNotAccessMemory(); bool OnlyLoad = HasChain && F->onlyReadsMemory(); // Build the operand list. SmallVector Ops; if (HasChain) { // If this intrinsic has side-effects, chainify it. if (OnlyLoad) { // We don't need to serialize loads against other loads. Ops.push_back(DAG.getRoot()); } else { Ops.push_back(getRoot()); } } // Info is set by getTgtMemInstrinsic TargetLowering::IntrinsicInfo Info; const TargetLowering &TLI = DAG.getTargetLoweringInfo(); bool IsTgtIntrinsic = TLI.getTgtMemIntrinsic(Info, I, DAG.getMachineFunction(), Intrinsic); // Add the intrinsic ID as an integer operand if it's not a target intrinsic. if (!IsTgtIntrinsic || Info.opc == ISD::INTRINSIC_VOID || Info.opc == ISD::INTRINSIC_W_CHAIN) Ops.push_back(DAG.getTargetConstant(Intrinsic, getCurSDLoc(), TLI.getPointerTy(DAG.getDataLayout()))); // Add all operands of the call to the operand list. for (unsigned i = 0, e = I.getNumArgOperands(); i != e; ++i) { const Value *Arg = I.getArgOperand(i); if (!I.paramHasAttr(i, Attribute::ImmArg)) { Ops.push_back(getValue(Arg)); continue; } // Use TargetConstant instead of a regular constant for immarg. 
EVT VT = TLI.getValueType(*DL, Arg->getType(), true); if (const ConstantInt *CI = dyn_cast(Arg)) { assert(CI->getBitWidth() <= 64 && "large intrinsic immediates not handled"); Ops.push_back(DAG.getTargetConstant(*CI, SDLoc(), VT)); } else { Ops.push_back( DAG.getTargetConstantFP(*cast(Arg), SDLoc(), VT)); } } SmallVector ValueVTs; ComputeValueVTs(TLI, DAG.getDataLayout(), I.getType(), ValueVTs); if (HasChain) ValueVTs.push_back(MVT::Other); SDVTList VTs = DAG.getVTList(ValueVTs); // Create the node. SDValue Result; if (IsTgtIntrinsic) { // This is target intrinsic that touches memory AAMDNodes AAInfo; I.getAAMetadata(AAInfo); Result = DAG.getMemIntrinsicNode(Info.opc, getCurSDLoc(), VTs, Ops, Info.memVT, MachinePointerInfo(Info.ptrVal, Info.offset), Info.align, Info.flags, Info.size, AAInfo); } else if (!HasChain) { Result = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, getCurSDLoc(), VTs, Ops); } else if (!I.getType()->isVoidTy()) { Result = DAG.getNode(ISD::INTRINSIC_W_CHAIN, getCurSDLoc(), VTs, Ops); } else { Result = DAG.getNode(ISD::INTRINSIC_VOID, getCurSDLoc(), VTs, Ops); } if (HasChain) { SDValue Chain = Result.getValue(Result.getNode()->getNumValues()-1); if (OnlyLoad) PendingLoads.push_back(Chain); else DAG.setRoot(Chain); } if (!I.getType()->isVoidTy()) { if (VectorType *PTy = dyn_cast(I.getType())) { EVT VT = TLI.getValueType(DAG.getDataLayout(), PTy); Result = DAG.getNode(ISD::BITCAST, getCurSDLoc(), VT, Result); } else Result = lowerRangeToAssertZExt(DAG, I, Result); MaybeAlign Alignment = I.getRetAlign(); if (!Alignment) Alignment = F->getAttributes().getRetAlignment(); // Insert `assertalign` node if there's an alignment. if (InsertAssertAlign && Alignment) { Result = DAG.getAssertAlign(getCurSDLoc(), Result, Alignment.valueOrOne()); } setValue(&I, Result); } } /// GetSignificand - Get the significand and build it into a floating-point /// number with exponent of 1: /// /// Op = (Op & 0x007fffff) | 0x3f800000; /// /// where Op is the hexadecimal representation of floating point value. static SDValue GetSignificand(SelectionDAG &DAG, SDValue Op, const SDLoc &dl) { SDValue t1 = DAG.getNode(ISD::AND, dl, MVT::i32, Op, DAG.getConstant(0x007fffff, dl, MVT::i32)); SDValue t2 = DAG.getNode(ISD::OR, dl, MVT::i32, t1, DAG.getConstant(0x3f800000, dl, MVT::i32)); return DAG.getNode(ISD::BITCAST, dl, MVT::f32, t2); } /// GetExponent - Get the exponent: /// /// (float)(int)(((Op & 0x7f800000) >> 23) - 127); /// /// where Op is the hexadecimal representation of floating point value. static SDValue GetExponent(SelectionDAG &DAG, SDValue Op, const TargetLowering &TLI, const SDLoc &dl) { SDValue t0 = DAG.getNode(ISD::AND, dl, MVT::i32, Op, DAG.getConstant(0x7f800000, dl, MVT::i32)); SDValue t1 = DAG.getNode( ISD::SRL, dl, MVT::i32, t0, DAG.getConstant(23, dl, TLI.getPointerTy(DAG.getDataLayout()))); SDValue t2 = DAG.getNode(ISD::SUB, dl, MVT::i32, t1, DAG.getConstant(127, dl, MVT::i32)); return DAG.getNode(ISD::SINT_TO_FP, dl, MVT::f32, t2); } /// getF32Constant - Get 32-bit floating point constant. static SDValue getF32Constant(SelectionDAG &DAG, unsigned Flt, const SDLoc &dl) { return DAG.getConstantFP(APFloat(APFloat::IEEEsingle(), APInt(32, Flt)), dl, MVT::f32); } static SDValue getLimitedPrecisionExp2(SDValue t0, const SDLoc &dl, SelectionDAG &DAG) { // TODO: What fast-math-flags should be set on the floating-point nodes? 
// IntegerPartOfX = ((int32_t)(t0); SDValue IntegerPartOfX = DAG.getNode(ISD::FP_TO_SINT, dl, MVT::i32, t0); // FractionalPartOfX = t0 - (float)IntegerPartOfX; SDValue t1 = DAG.getNode(ISD::SINT_TO_FP, dl, MVT::f32, IntegerPartOfX); SDValue X = DAG.getNode(ISD::FSUB, dl, MVT::f32, t0, t1); // IntegerPartOfX <<= 23; IntegerPartOfX = DAG.getNode( ISD::SHL, dl, MVT::i32, IntegerPartOfX, DAG.getConstant(23, dl, DAG.getTargetLoweringInfo().getPointerTy( DAG.getDataLayout()))); SDValue TwoToFractionalPartOfX; if (LimitFloatPrecision <= 6) { // For floating-point precision of 6: // // TwoToFractionalPartOfX = // 0.997535578f + // (0.735607626f + 0.252464424f * x) * x; // // error 0.0144103317, which is 6 bits SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0x3e814304, dl)); SDValue t3 = DAG.getNode(ISD::FADD, dl, MVT::f32, t2, getF32Constant(DAG, 0x3f3c50c8, dl)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); TwoToFractionalPartOfX = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x3f7f5e7e, dl)); } else if (LimitFloatPrecision <= 12) { // For floating-point precision of 12: // // TwoToFractionalPartOfX = // 0.999892986f + // (0.696457318f + // (0.224338339f + 0.792043434e-1f * x) * x) * x; // // error 0.000107046256, which is 13 to 14 bits SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0x3da235e3, dl)); SDValue t3 = DAG.getNode(ISD::FADD, dl, MVT::f32, t2, getF32Constant(DAG, 0x3e65b8f3, dl)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x3f324b07, dl)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); TwoToFractionalPartOfX = DAG.getNode(ISD::FADD, dl, MVT::f32, t6, getF32Constant(DAG, 0x3f7ff8fd, dl)); } else { // LimitFloatPrecision <= 18 // For floating-point precision of 18: // // TwoToFractionalPartOfX = // 0.999999982f + // (0.693148872f + // (0.240227044f + // (0.554906021e-1f + // (0.961591928e-2f + // (0.136028312e-2f + 0.157059148e-3f *x)*x)*x)*x)*x)*x; // error 2.47208000*10^(-7), which is better than 18 bits SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0x3924b03e, dl)); SDValue t3 = DAG.getNode(ISD::FADD, dl, MVT::f32, t2, getF32Constant(DAG, 0x3ab24b87, dl)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x3c1d8c17, dl)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); SDValue t7 = DAG.getNode(ISD::FADD, dl, MVT::f32, t6, getF32Constant(DAG, 0x3d634a1d, dl)); SDValue t8 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t7, X); SDValue t9 = DAG.getNode(ISD::FADD, dl, MVT::f32, t8, getF32Constant(DAG, 0x3e75fe14, dl)); SDValue t10 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t9, X); SDValue t11 = DAG.getNode(ISD::FADD, dl, MVT::f32, t10, getF32Constant(DAG, 0x3f317234, dl)); SDValue t12 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t11, X); TwoToFractionalPartOfX = DAG.getNode(ISD::FADD, dl, MVT::f32, t12, getF32Constant(DAG, 0x3f800000, dl)); } // Add the exponent into the result in integer domain. SDValue t13 = DAG.getNode(ISD::BITCAST, dl, MVT::i32, TwoToFractionalPartOfX); return DAG.getNode(ISD::BITCAST, dl, MVT::f32, DAG.getNode(ISD::ADD, dl, MVT::i32, t13, IntegerPartOfX)); } /// expandExp - Lower an exp intrinsic. Handles the special sequences for /// limited-precision mode. 
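// Illustrative sketch, not part of the upstream change: the limited-precision
// helpers above and the expandExp/expandLog/expandLog2/expandLog10 routines
// below all operate on the IEEE-754 single-precision bit pattern. The integer
// part of the input is shifted into the exponent field (<< 23) and added back
// in the integer domain, while the fractional part (or the significand, for
// the log family) is approximated by a small minimax polynomial. The helper
// names below (bitsToFloat, extractExponent, limitedPrecisionExp2, ...) are
// invented for this illustration and do not exist in LLVM; std::exp2 stands
// in for the polynomial so the sketch stays exact.
#include <cmath>
#include <cstdint>
#include <cstring>

// Equivalent of the ISD::BITCAST nodes between f32 and i32.
static inline float bitsToFloat(uint32_t Bits) {
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}
static inline uint32_t floatToBits(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

// GetExponent: (float)(((Bits & 0x7f800000) >> 23) - 127).
static inline float extractExponent(float Op) {
  uint32_t Bits = floatToBits(Op);
  return (float)((int)((Bits & 0x7f800000u) >> 23) - 127);
}

// GetSignificand: force the exponent field to 127 so the result lies in [1,2).
static inline float extractSignificand(float Op) {
  uint32_t Bits = floatToBits(Op);
  return bitsToFloat((Bits & 0x007fffffu) | 0x3f800000u);
}

// getLimitedPrecisionExp2: split t0 into integer and fractional parts,
// approximate 2^frac, then add the integer part directly into the exponent
// field of the result, e.g. limitedPrecisionExp2(3.5f) is roughly 11.3137f.
static inline float limitedPrecisionExp2(float T0) {
  int32_t IntegerPartOfX = (int32_t)T0;                  // ISD::FP_TO_SINT
  float FractionalPartOfX = T0 - (float)IntegerPartOfX;  // ISD::FSUB
  float TwoToFrac = std::exp2(FractionalPartOfX);        // the minimax polynomial
  return bitsToFloat(floatToBits(TwoToFrac) +
                     ((uint32_t)IntegerPartOfX << 23));  // ISD::SHL + ISD::ADD
}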
static SDValue expandExp(const SDLoc &dl, SDValue Op, SelectionDAG &DAG, const TargetLowering &TLI) { if (Op.getValueType() == MVT::f32 && LimitFloatPrecision > 0 && LimitFloatPrecision <= 18) { // Put the exponent in the right bit position for later addition to the // final result: // // t0 = Op * log2(e) // TODO: What fast-math-flags should be set here? SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, Op, DAG.getConstantFP(numbers::log2ef, dl, MVT::f32)); return getLimitedPrecisionExp2(t0, dl, DAG); } // No special expansion. return DAG.getNode(ISD::FEXP, dl, Op.getValueType(), Op); } /// expandLog - Lower a log intrinsic. Handles the special sequences for /// limited-precision mode. static SDValue expandLog(const SDLoc &dl, SDValue Op, SelectionDAG &DAG, const TargetLowering &TLI) { // TODO: What fast-math-flags should be set on the floating-point nodes? if (Op.getValueType() == MVT::f32 && LimitFloatPrecision > 0 && LimitFloatPrecision <= 18) { SDValue Op1 = DAG.getNode(ISD::BITCAST, dl, MVT::i32, Op); // Scale the exponent by log(2). SDValue Exp = GetExponent(DAG, Op1, TLI, dl); SDValue LogOfExponent = DAG.getNode(ISD::FMUL, dl, MVT::f32, Exp, DAG.getConstantFP(numbers::ln2f, dl, MVT::f32)); // Get the significand and build it into a floating-point number with // exponent of 1. SDValue X = GetSignificand(DAG, Op1, dl); SDValue LogOfMantissa; if (LimitFloatPrecision <= 6) { // For floating-point precision of 6: // // LogofMantissa = // -1.1609546f + // (1.4034025f - 0.23903021f * x) * x; // // error 0.0034276066, which is better than 8 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0xbe74c456, dl)); SDValue t1 = DAG.getNode(ISD::FADD, dl, MVT::f32, t0, getF32Constant(DAG, 0x3fb3a2b1, dl)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); LogOfMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t2, getF32Constant(DAG, 0x3f949a29, dl)); } else if (LimitFloatPrecision <= 12) { // For floating-point precision of 12: // // LogOfMantissa = // -1.7417939f + // (2.8212026f + // (-1.4699568f + // (0.44717955f - 0.56570851e-1f * x) * x) * x) * x; // // error 0.000061011436, which is 14 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0xbd67b6d6, dl)); SDValue t1 = DAG.getNode(ISD::FADD, dl, MVT::f32, t0, getF32Constant(DAG, 0x3ee4f4b8, dl)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); SDValue t3 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t2, getF32Constant(DAG, 0x3fbc278b, dl)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x40348e95, dl)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); LogOfMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t6, getF32Constant(DAG, 0x3fdef31a, dl)); } else { // LimitFloatPrecision <= 18 // For floating-point precision of 18: // // LogOfMantissa = // -2.1072184f + // (4.2372794f + // (-3.7029485f + // (2.2781945f + // (-0.87823314f + // (0.19073739f - 0.17809712e-1f * x) * x) * x) * x) * x)*x; // // error 0.0000023660568, which is better than 18 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0xbc91e5ac, dl)); SDValue t1 = DAG.getNode(ISD::FADD, dl, MVT::f32, t0, getF32Constant(DAG, 0x3e4350aa, dl)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); SDValue t3 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t2, getF32Constant(DAG, 0x3f60d3e3, dl)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, 
getF32Constant(DAG, 0x4011cdf0, dl)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); SDValue t7 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t6, getF32Constant(DAG, 0x406cfd1c, dl)); SDValue t8 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t7, X); SDValue t9 = DAG.getNode(ISD::FADD, dl, MVT::f32, t8, getF32Constant(DAG, 0x408797cb, dl)); SDValue t10 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t9, X); LogOfMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t10, getF32Constant(DAG, 0x4006dcab, dl)); } return DAG.getNode(ISD::FADD, dl, MVT::f32, LogOfExponent, LogOfMantissa); } // No special expansion. return DAG.getNode(ISD::FLOG, dl, Op.getValueType(), Op); } /// expandLog2 - Lower a log2 intrinsic. Handles the special sequences for /// limited-precision mode. static SDValue expandLog2(const SDLoc &dl, SDValue Op, SelectionDAG &DAG, const TargetLowering &TLI) { // TODO: What fast-math-flags should be set on the floating-point nodes? if (Op.getValueType() == MVT::f32 && LimitFloatPrecision > 0 && LimitFloatPrecision <= 18) { SDValue Op1 = DAG.getNode(ISD::BITCAST, dl, MVT::i32, Op); // Get the exponent. SDValue LogOfExponent = GetExponent(DAG, Op1, TLI, dl); // Get the significand and build it into a floating-point number with // exponent of 1. SDValue X = GetSignificand(DAG, Op1, dl); // Different possible minimax approximations of significand in // floating-point for various degrees of accuracy over [1,2]. SDValue Log2ofMantissa; if (LimitFloatPrecision <= 6) { // For floating-point precision of 6: // // Log2ofMantissa = -1.6749035f + (2.0246817f - .34484768f * x) * x; // // error 0.0049451742, which is more than 7 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0xbeb08fe0, dl)); SDValue t1 = DAG.getNode(ISD::FADD, dl, MVT::f32, t0, getF32Constant(DAG, 0x40019463, dl)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); Log2ofMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t2, getF32Constant(DAG, 0x3fd6633d, dl)); } else if (LimitFloatPrecision <= 12) { // For floating-point precision of 12: // // Log2ofMantissa = // -2.51285454f + // (4.07009056f + // (-2.12067489f + // (.645142248f - 0.816157886e-1f * x) * x) * x) * x; // // error 0.0000876136000, which is better than 13 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0xbda7262e, dl)); SDValue t1 = DAG.getNode(ISD::FADD, dl, MVT::f32, t0, getF32Constant(DAG, 0x3f25280b, dl)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); SDValue t3 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t2, getF32Constant(DAG, 0x4007b923, dl)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x40823e2f, dl)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); Log2ofMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t6, getF32Constant(DAG, 0x4020d29c, dl)); } else { // LimitFloatPrecision <= 18 // For floating-point precision of 18: // // Log2ofMantissa = // -3.0400495f + // (6.1129976f + // (-5.3420409f + // (3.2865683f + // (-1.2669343f + // (0.27515199f - // 0.25691327e-1f * x) * x) * x) * x) * x) * x; // // error 0.0000018516, which is better than 18 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0xbcd2769e, dl)); SDValue t1 = DAG.getNode(ISD::FADD, dl, MVT::f32, t0, getF32Constant(DAG, 0x3e8ce0b9, dl)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); SDValue t3 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t2, getF32Constant(DAG, 0x3fa22ae7, dl)); SDValue t4 = 
DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x40525723, dl)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); SDValue t7 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t6, getF32Constant(DAG, 0x40aaf200, dl)); SDValue t8 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t7, X); SDValue t9 = DAG.getNode(ISD::FADD, dl, MVT::f32, t8, getF32Constant(DAG, 0x40c39dad, dl)); SDValue t10 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t9, X); Log2ofMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t10, getF32Constant(DAG, 0x4042902c, dl)); } return DAG.getNode(ISD::FADD, dl, MVT::f32, LogOfExponent, Log2ofMantissa); } // No special expansion. return DAG.getNode(ISD::FLOG2, dl, Op.getValueType(), Op); } /// expandLog10 - Lower a log10 intrinsic. Handles the special sequences for /// limited-precision mode. static SDValue expandLog10(const SDLoc &dl, SDValue Op, SelectionDAG &DAG, const TargetLowering &TLI) { // TODO: What fast-math-flags should be set on the floating-point nodes? if (Op.getValueType() == MVT::f32 && LimitFloatPrecision > 0 && LimitFloatPrecision <= 18) { SDValue Op1 = DAG.getNode(ISD::BITCAST, dl, MVT::i32, Op); // Scale the exponent by log10(2) [0.30102999f]. SDValue Exp = GetExponent(DAG, Op1, TLI, dl); SDValue LogOfExponent = DAG.getNode(ISD::FMUL, dl, MVT::f32, Exp, getF32Constant(DAG, 0x3e9a209a, dl)); // Get the significand and build it into a floating-point number with // exponent of 1. SDValue X = GetSignificand(DAG, Op1, dl); SDValue Log10ofMantissa; if (LimitFloatPrecision <= 6) { // For floating-point precision of 6: // // Log10ofMantissa = // -0.50419619f + // (0.60948995f - 0.10380950f * x) * x; // // error 0.0014886165, which is 6 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0xbdd49a13, dl)); SDValue t1 = DAG.getNode(ISD::FADD, dl, MVT::f32, t0, getF32Constant(DAG, 0x3f1c0789, dl)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); Log10ofMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t2, getF32Constant(DAG, 0x3f011300, dl)); } else if (LimitFloatPrecision <= 12) { // For floating-point precision of 12: // // Log10ofMantissa = // -0.64831180f + // (0.91751397f + // (-0.31664806f + 0.47637168e-1f * x) * x) * x; // // error 0.00019228036, which is better than 12 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0x3d431f31, dl)); SDValue t1 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t0, getF32Constant(DAG, 0x3ea21fb2, dl)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); SDValue t3 = DAG.getNode(ISD::FADD, dl, MVT::f32, t2, getF32Constant(DAG, 0x3f6ae232, dl)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); Log10ofMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t4, getF32Constant(DAG, 0x3f25f7c3, dl)); } else { // LimitFloatPrecision <= 18 // For floating-point precision of 18: // // Log10ofMantissa = // -0.84299375f + // (1.5327582f + // (-1.0688956f + // (0.49102474f + // (-0.12539807f + 0.13508273e-1f * x) * x) * x) * x) * x; // // error 0.0000037995730, which is better than 18 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0x3c5d51ce, dl)); SDValue t1 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t0, getF32Constant(DAG, 0x3e00685a, dl)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); SDValue t3 = DAG.getNode(ISD::FADD, dl, MVT::f32, t2, getF32Constant(DAG, 0x3efb6798, dl)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t4, 
getF32Constant(DAG, 0x3f88d192, dl)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); SDValue t7 = DAG.getNode(ISD::FADD, dl, MVT::f32, t6, getF32Constant(DAG, 0x3fc4316c, dl)); SDValue t8 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t7, X); Log10ofMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t8, getF32Constant(DAG, 0x3f57ce70, dl)); } return DAG.getNode(ISD::FADD, dl, MVT::f32, LogOfExponent, Log10ofMantissa); } // No special expansion. return DAG.getNode(ISD::FLOG10, dl, Op.getValueType(), Op); } /// expandExp2 - Lower an exp2 intrinsic. Handles the special sequences for /// limited-precision mode. static SDValue expandExp2(const SDLoc &dl, SDValue Op, SelectionDAG &DAG, const TargetLowering &TLI) { if (Op.getValueType() == MVT::f32 && LimitFloatPrecision > 0 && LimitFloatPrecision <= 18) return getLimitedPrecisionExp2(Op, dl, DAG); // No special expansion. return DAG.getNode(ISD::FEXP2, dl, Op.getValueType(), Op); } /// visitPow - Lower a pow intrinsic. Handles the special sequences for /// limited-precision mode with x == 10.0f. static SDValue expandPow(const SDLoc &dl, SDValue LHS, SDValue RHS, SelectionDAG &DAG, const TargetLowering &TLI) { bool IsExp10 = false; if (LHS.getValueType() == MVT::f32 && RHS.getValueType() == MVT::f32 && LimitFloatPrecision > 0 && LimitFloatPrecision <= 18) { if (ConstantFPSDNode *LHSC = dyn_cast(LHS)) { APFloat Ten(10.0f); IsExp10 = LHSC->isExactlyValue(Ten); } } // TODO: What fast-math-flags should be set on the FMUL node? if (IsExp10) { // Put the exponent in the right bit position for later addition to the // final result: // // #define LOG2OF10 3.3219281f // t0 = Op * LOG2OF10; SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, RHS, getF32Constant(DAG, 0x40549a78, dl)); return getLimitedPrecisionExp2(t0, dl, DAG); } // No special expansion. return DAG.getNode(ISD::FPOW, dl, LHS.getValueType(), LHS, RHS); } /// ExpandPowI - Expand a llvm.powi intrinsic. static SDValue ExpandPowI(const SDLoc &DL, SDValue LHS, SDValue RHS, SelectionDAG &DAG) { // If RHS is a constant, we can expand this out to a multiplication tree, // otherwise we end up lowering to a call to __powidf2 (for example). When // optimizing for size, we only want to do this if the expansion would produce // a small number of multiplies, otherwise we do the full expansion. if (ConstantSDNode *RHSC = dyn_cast(RHS)) { // Get the exponent as a positive value. unsigned Val = RHSC->getSExtValue(); if ((int)Val < 0) Val = -Val; // powi(x, 0) -> 1.0 if (Val == 0) return DAG.getConstantFP(1.0, DL, LHS.getValueType()); bool OptForSize = DAG.shouldOptForSize(); if (!OptForSize || // If optimizing for size, don't insert too many multiplies. // This inserts up to 5 multiplies. countPopulation(Val) + Log2_32(Val) < 7) { // We use the simple binary decomposition method to generate the multiply // sequence. There are more optimal ways to do this (for example, // powi(x,15) generates one more multiply than it should), but this has // the benefit of being both really simple and much better than a libcall. SDValue Res; // Logically starts equal to 1.0 SDValue CurSquare = LHS; // TODO: Intrinsics should have fast-math-flags that propagate to these // nodes. while (Val) { if (Val & 1) { if (Res.getNode()) Res = DAG.getNode(ISD::FMUL, DL,Res.getValueType(), Res, CurSquare); else Res = CurSquare; // 1.0*CurSquare. } CurSquare = DAG.getNode(ISD::FMUL, DL, CurSquare.getValueType(), CurSquare, CurSquare); Val >>= 1; } // If the original was negative, invert the result, producing 1/(x*x*x). 
if (RHSC->getSExtValue() < 0) Res = DAG.getNode(ISD::FDIV, DL, LHS.getValueType(), DAG.getConstantFP(1.0, DL, LHS.getValueType()), Res); return Res; } } // Otherwise, expand to a libcall. return DAG.getNode(ISD::FPOWI, DL, LHS.getValueType(), LHS, RHS); } static SDValue expandDivFix(unsigned Opcode, const SDLoc &DL, SDValue LHS, SDValue RHS, SDValue Scale, SelectionDAG &DAG, const TargetLowering &TLI) { EVT VT = LHS.getValueType(); bool Signed = Opcode == ISD::SDIVFIX || Opcode == ISD::SDIVFIXSAT; bool Saturating = Opcode == ISD::SDIVFIXSAT || Opcode == ISD::UDIVFIXSAT; LLVMContext &Ctx = *DAG.getContext(); // If the type is legal but the operation isn't, this node might survive all // the way to operation legalization. If we end up there and we do not have // the ability to widen the type (if VT*2 is not legal), we cannot expand the // node. // Coax the legalizer into expanding the node during type legalization instead // by bumping the size by one bit. This will force it to Promote, enabling the // early expansion and avoiding the need to expand later. // We don't have to do this if Scale is 0; that can always be expanded, unless // it's a saturating signed operation. Those can experience true integer // division overflow, a case which we must avoid. // FIXME: We wouldn't have to do this (or any of the early // expansion/promotion) if it was possible to expand a libcall of an // illegal type during operation legalization. But it's not, so things // get a bit hacky. unsigned ScaleInt = cast(Scale)->getZExtValue(); if ((ScaleInt > 0 || (Saturating && Signed)) && (TLI.isTypeLegal(VT) || (VT.isVector() && TLI.isTypeLegal(VT.getVectorElementType())))) { TargetLowering::LegalizeAction Action = TLI.getFixedPointOperationAction( Opcode, VT, ScaleInt); if (Action != TargetLowering::Legal && Action != TargetLowering::Custom) { EVT PromVT; if (VT.isScalarInteger()) PromVT = EVT::getIntegerVT(Ctx, VT.getSizeInBits() + 1); else if (VT.isVector()) { PromVT = VT.getVectorElementType(); PromVT = EVT::getIntegerVT(Ctx, PromVT.getSizeInBits() + 1); PromVT = EVT::getVectorVT(Ctx, PromVT, VT.getVectorElementCount()); } else llvm_unreachable("Wrong VT for DIVFIX?"); if (Signed) { LHS = DAG.getSExtOrTrunc(LHS, DL, PromVT); RHS = DAG.getSExtOrTrunc(RHS, DL, PromVT); } else { LHS = DAG.getZExtOrTrunc(LHS, DL, PromVT); RHS = DAG.getZExtOrTrunc(RHS, DL, PromVT); } EVT ShiftTy = TLI.getShiftAmountTy(PromVT, DAG.getDataLayout()); // For saturating operations, we need to shift up the LHS to get the // proper saturation width, and then shift down again afterwards. if (Saturating) LHS = DAG.getNode(ISD::SHL, DL, PromVT, LHS, DAG.getConstant(1, DL, ShiftTy)); SDValue Res = DAG.getNode(Opcode, DL, PromVT, LHS, RHS, Scale); if (Saturating) Res = DAG.getNode(Signed ? ISD::SRA : ISD::SRL, DL, PromVT, Res, DAG.getConstant(1, DL, ShiftTy)); return DAG.getZExtOrTrunc(Res, DL, VT); } } return DAG.getNode(Opcode, DL, VT, LHS, RHS, Scale); } // getUnderlyingArgRegs - Find underlying registers used for a truncated, // bitcasted, or split argument. 
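// Illustrative sketch, not part of the upstream change: a scalar version of
// the binary decomposition that ExpandPowI above builds as FMUL nodes. For a
// constant exponent n the loop keeps the running square of x and multiplies
// it into the result for every set bit of |n|, then takes the reciprocal when
// n is negative. For example powi(x, 13), with 13 = 0b1101, becomes
// x^13 = x * x^4 * x^8, i.e. three squarings plus two combining multiplies.
// The function name is invented for this illustration.
static inline double powiBySquaring(double X, int N) {
  unsigned Val = N < 0 ? -(unsigned)N : (unsigned)N;
  double Res = 1.0;     // the DAG version starts with "no node" instead of 1.0
  double CurSquare = X;
  while (Val) {
    if (Val & 1)
      Res *= CurSquare; // fold in the current power-of-two factor
    CurSquare *= CurSquare;
    Val >>= 1;
  }
  return N < 0 ? 1.0 / Res : Res;
}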
Returns a list of static void getUnderlyingArgRegs(SmallVectorImpl> &Regs, const SDValue &N) { switch (N.getOpcode()) { case ISD::CopyFromReg: { SDValue Op = N.getOperand(1); Regs.emplace_back(cast(Op)->getReg(), Op.getValueType().getSizeInBits()); return; } case ISD::BITCAST: case ISD::AssertZext: case ISD::AssertSext: case ISD::TRUNCATE: getUnderlyingArgRegs(Regs, N.getOperand(0)); return; case ISD::BUILD_PAIR: case ISD::BUILD_VECTOR: case ISD::CONCAT_VECTORS: for (SDValue Op : N->op_values()) getUnderlyingArgRegs(Regs, Op); return; default: return; } } /// If the DbgValueInst is a dbg_value of a function argument, create the /// corresponding DBG_VALUE machine instruction for it now. At the end of /// instruction selection, they will be inserted to the entry BB. bool SelectionDAGBuilder::EmitFuncArgumentDbgValue( const Value *V, DILocalVariable *Variable, DIExpression *Expr, DILocation *DL, bool IsDbgDeclare, const SDValue &N) { const Argument *Arg = dyn_cast(V); if (!Arg) return false; if (!IsDbgDeclare) { // ArgDbgValues are hoisted to the beginning of the entry block. So we // should only emit as ArgDbgValue if the dbg.value intrinsic is found in // the entry block. bool IsInEntryBlock = FuncInfo.MBB == &FuncInfo.MF->front(); if (!IsInEntryBlock) return false; // ArgDbgValues are hoisted to the beginning of the entry block. So we // should only emit as ArgDbgValue if the dbg.value intrinsic describes a // variable that also is a param. // // Although, if we are at the top of the entry block already, we can still // emit using ArgDbgValue. This might catch some situations when the // dbg.value refers to an argument that isn't used in the entry block, so // any CopyToReg node would be optimized out and the only way to express // this DBG_VALUE is by using the physical reg (or FI) as done in this // method. ArgDbgValues are hoisted to the beginning of the entry block. So // we should only emit as ArgDbgValue if the Variable is an argument to the // current function, and the dbg.value intrinsic is found in the entry // block. bool VariableIsFunctionInputArg = Variable->isParameter() && !DL->getInlinedAt(); bool IsInPrologue = SDNodeOrder == LowestSDNodeOrder; if (!IsInPrologue && !VariableIsFunctionInputArg) return false; // Here we assume that a function argument on IR level only can be used to // describe one input parameter on source level. If we for example have // source code like this // // struct A { long x, y; }; // void foo(struct A a, long b) { // ... // b = a.x; // ... // } // // and IR like this // // define void @foo(i32 %a1, i32 %a2, i32 %b) { // entry: // call void @llvm.dbg.value(metadata i32 %a1, "a", DW_OP_LLVM_fragment // call void @llvm.dbg.value(metadata i32 %a2, "a", DW_OP_LLVM_fragment // call void @llvm.dbg.value(metadata i32 %b, "b", // ... // call void @llvm.dbg.value(metadata i32 %a1, "b" // ... // // then the last dbg.value is describing a parameter "b" using a value that // is an argument. But since we already has used %a1 to describe a parameter // we should not handle that last dbg.value here (that would result in an // incorrect hoisting of the DBG_VALUE to the function entry). // Notice that we allow one dbg.value per IR level argument, to accommodate // for the situation with fragments above. 
if (VariableIsFunctionInputArg) { unsigned ArgNo = Arg->getArgNo(); if (ArgNo >= FuncInfo.DescribedArgs.size()) FuncInfo.DescribedArgs.resize(ArgNo + 1, false); else if (!IsInPrologue && FuncInfo.DescribedArgs.test(ArgNo)) return false; FuncInfo.DescribedArgs.set(ArgNo); } } MachineFunction &MF = DAG.getMachineFunction(); const TargetInstrInfo *TII = DAG.getSubtarget().getInstrInfo(); bool IsIndirect = false; Optional Op; // Some arguments' frame index is recorded during argument lowering. int FI = FuncInfo.getArgumentFrameIndex(Arg); if (FI != std::numeric_limits::max()) Op = MachineOperand::CreateFI(FI); SmallVector, 8> ArgRegsAndSizes; if (!Op && N.getNode()) { getUnderlyingArgRegs(ArgRegsAndSizes, N); Register Reg; if (ArgRegsAndSizes.size() == 1) Reg = ArgRegsAndSizes.front().first; if (Reg && Reg.isVirtual()) { MachineRegisterInfo &RegInfo = MF.getRegInfo(); Register PR = RegInfo.getLiveInPhysReg(Reg); if (PR) Reg = PR; } if (Reg) { Op = MachineOperand::CreateReg(Reg, false); IsIndirect = IsDbgDeclare; } } if (!Op && N.getNode()) { // Check if frame index is available. SDValue LCandidate = peekThroughBitcasts(N); if (LoadSDNode *LNode = dyn_cast(LCandidate.getNode())) if (FrameIndexSDNode *FINode = dyn_cast(LNode->getBasePtr().getNode())) Op = MachineOperand::CreateFI(FINode->getIndex()); } if (!Op) { // Create a DBG_VALUE for each decomposed value in ArgRegs to cover Reg auto splitMultiRegDbgValue = [&](ArrayRef> SplitRegs) { unsigned Offset = 0; for (auto RegAndSize : SplitRegs) { // If the expression is already a fragment, the current register // offset+size might extend beyond the fragment. In this case, only // the register bits that are inside the fragment are relevant. int RegFragmentSizeInBits = RegAndSize.second; if (auto ExprFragmentInfo = Expr->getFragmentInfo()) { uint64_t ExprFragmentSizeInBits = ExprFragmentInfo->SizeInBits; // The register is entirely outside the expression fragment, // so is irrelevant for debug info. if (Offset >= ExprFragmentSizeInBits) break; // The register is partially outside the expression fragment, only // the low bits within the fragment are relevant for debug info. if (Offset + RegFragmentSizeInBits > ExprFragmentSizeInBits) { RegFragmentSizeInBits = ExprFragmentSizeInBits - Offset; } } auto FragmentExpr = DIExpression::createFragmentExpression( Expr, Offset, RegFragmentSizeInBits); Offset += RegAndSize.second; // If a valid fragment expression cannot be created, the variable's // correct value cannot be determined and so it is set as Undef. if (!FragmentExpr) { SDDbgValue *SDV = DAG.getConstantDbgValue( Variable, Expr, UndefValue::get(V->getType()), DL, SDNodeOrder); DAG.AddDbgValue(SDV, nullptr, false); continue; } assert(!IsDbgDeclare && "DbgDeclare operand is not in memory?"); FuncInfo.ArgDbgValues.push_back( BuildMI(MF, DL, TII->get(TargetOpcode::DBG_VALUE), IsDbgDeclare, RegAndSize.first, Variable, *FragmentExpr)); } }; // Check if ValueMap has reg number. 
DenseMap::const_iterator VMI = FuncInfo.ValueMap.find(V); if (VMI != FuncInfo.ValueMap.end()) { const auto &TLI = DAG.getTargetLoweringInfo(); RegsForValue RFV(V->getContext(), TLI, DAG.getDataLayout(), VMI->second, - V->getType(), getABIRegCopyCC(V)); + V->getType(), None); if (RFV.occupiesMultipleRegs()) { splitMultiRegDbgValue(RFV.getRegsAndSizes()); return true; } Op = MachineOperand::CreateReg(VMI->second, false); IsIndirect = IsDbgDeclare; } else if (ArgRegsAndSizes.size() > 1) { // This was split due to the calling convention, and no virtual register // mapping exists for the value. splitMultiRegDbgValue(ArgRegsAndSizes); return true; } } if (!Op) return false; assert(Variable->isValidLocationForIntrinsic(DL) && "Expected inlined-at fields to agree"); IsIndirect = (Op->isReg()) ? IsIndirect : true; FuncInfo.ArgDbgValues.push_back( BuildMI(MF, DL, TII->get(TargetOpcode::DBG_VALUE), IsIndirect, *Op, Variable, Expr)); return true; } /// Return the appropriate SDDbgValue based on N. SDDbgValue *SelectionDAGBuilder::getDbgValue(SDValue N, DILocalVariable *Variable, DIExpression *Expr, const DebugLoc &dl, unsigned DbgSDNodeOrder) { if (auto *FISDN = dyn_cast(N.getNode())) { // Construct a FrameIndexDbgValue for FrameIndexSDNodes so we can describe // stack slot locations. // // Consider "int x = 0; int *px = &x;". There are two kinds of interesting // debug values here after optimization: // // dbg.value(i32* %px, !"int *px", !DIExpression()), and // dbg.value(i32* %px, !"int x", !DIExpression(DW_OP_deref)) // // Both describe the direct values of their associated variables. return DAG.getFrameIndexDbgValue(Variable, Expr, FISDN->getIndex(), /*IsIndirect*/ false, dl, DbgSDNodeOrder); } return DAG.getDbgValue(Variable, Expr, N.getNode(), N.getResNo(), /*IsIndirect*/ false, dl, DbgSDNodeOrder); } static unsigned FixedPointIntrinsicToOpcode(unsigned Intrinsic) { switch (Intrinsic) { case Intrinsic::smul_fix: return ISD::SMULFIX; case Intrinsic::umul_fix: return ISD::UMULFIX; case Intrinsic::smul_fix_sat: return ISD::SMULFIXSAT; case Intrinsic::umul_fix_sat: return ISD::UMULFIXSAT; case Intrinsic::sdiv_fix: return ISD::SDIVFIX; case Intrinsic::udiv_fix: return ISD::UDIVFIX; case Intrinsic::sdiv_fix_sat: return ISD::SDIVFIXSAT; case Intrinsic::udiv_fix_sat: return ISD::UDIVFIXSAT; default: llvm_unreachable("Unhandled fixed point intrinsic"); } } void SelectionDAGBuilder::lowerCallToExternalSymbol(const CallInst &I, const char *FunctionName) { assert(FunctionName && "FunctionName must not be nullptr"); SDValue Callee = DAG.getExternalSymbol( FunctionName, DAG.getTargetLoweringInfo().getPointerTy(DAG.getDataLayout())); LowerCallTo(I, Callee, I.isTailCall()); } /// Given a @llvm.call.preallocated.setup, return the corresponding /// preallocated call. static const CallBase *FindPreallocatedCall(const Value *PreallocatedSetup) { assert(cast(PreallocatedSetup) ->getCalledFunction() ->getIntrinsicID() == Intrinsic::call_preallocated_setup && "expected call_preallocated_setup Value"); for (auto *U : PreallocatedSetup->users()) { auto *UseCall = cast(U); const Function *Fn = UseCall->getCalledFunction(); if (!Fn || Fn->getIntrinsicID() != Intrinsic::call_preallocated_arg) { return UseCall; } } llvm_unreachable("expected corresponding call to preallocated setup/arg"); } /// Lower the call to the specified intrinsic function. 
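// Illustrative sketch, not part of the upstream change: the fragment
// bookkeeping performed by the splitMultiRegDbgValue lambda above. Each
// register covering the value gets its own DW_OP_LLVM_fragment at an
// increasing bit offset; a register that runs past the size of an
// already-fragmented expression is clipped to its low bits, and registers
// starting past the end are dropped. The type and function names below are
// invented for this illustration.
#include <cstdint>
#include <utility>
#include <vector>

struct RegFragment {
  unsigned Reg;
  uint64_t OffsetInBits;
  uint64_t SizeInBits;
};

// RegsAndSizes lists the registers (with sizes in bits) that make up the
// value, in order. ExprFragmentSizeInBits is the size of the enclosing
// fragment, or UINT64_MAX when the expression is not already a fragment.
static std::vector<RegFragment>
splitFragments(const std::vector<std::pair<unsigned, unsigned>> &RegsAndSizes,
               uint64_t ExprFragmentSizeInBits) {
  std::vector<RegFragment> Out;
  uint64_t Offset = 0;
  for (const auto &RegAndSize : RegsAndSizes) {
    uint64_t RegSize = RegAndSize.second;
    if (Offset >= ExprFragmentSizeInBits)
      break;                                     // entirely past the fragment
    if (Offset + RegSize > ExprFragmentSizeInBits)
      RegSize = ExprFragmentSizeInBits - Offset; // keep only the low bits
    Out.push_back({RegAndSize.first, Offset, RegSize});
    Offset += RegAndSize.second;                 // advance by the full register
  }
  return Out;
}
// For example, a 96-bit fragment spread over three 64-bit registers yields
// (reg0, offset 0, 64 bits) and (reg1, offset 64, 32 bits); the third
// register is dropped.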
void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, unsigned Intrinsic) { const TargetLowering &TLI = DAG.getTargetLoweringInfo(); SDLoc sdl = getCurSDLoc(); DebugLoc dl = getCurDebugLoc(); SDValue Res; switch (Intrinsic) { default: // By default, turn this into a target intrinsic node. visitTargetIntrinsic(I, Intrinsic); return; case Intrinsic::vscale: { match(&I, m_VScale(DAG.getDataLayout())); EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType()); setValue(&I, DAG.getVScale(getCurSDLoc(), VT, APInt(VT.getSizeInBits(), 1))); return; } case Intrinsic::vastart: visitVAStart(I); return; case Intrinsic::vaend: visitVAEnd(I); return; case Intrinsic::vacopy: visitVACopy(I); return; case Intrinsic::returnaddress: setValue(&I, DAG.getNode(ISD::RETURNADDR, sdl, TLI.getPointerTy(DAG.getDataLayout()), getValue(I.getArgOperand(0)))); return; case Intrinsic::addressofreturnaddress: setValue(&I, DAG.getNode(ISD::ADDROFRETURNADDR, sdl, TLI.getPointerTy(DAG.getDataLayout()))); return; case Intrinsic::sponentry: setValue(&I, DAG.getNode(ISD::SPONENTRY, sdl, TLI.getFrameIndexTy(DAG.getDataLayout()))); return; case Intrinsic::frameaddress: setValue(&I, DAG.getNode(ISD::FRAMEADDR, sdl, TLI.getFrameIndexTy(DAG.getDataLayout()), getValue(I.getArgOperand(0)))); return; case Intrinsic::read_volatile_register: case Intrinsic::read_register: { Value *Reg = I.getArgOperand(0); SDValue Chain = getRoot(); SDValue RegName = DAG.getMDNode(cast(cast(Reg)->getMetadata())); EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType()); Res = DAG.getNode(ISD::READ_REGISTER, sdl, DAG.getVTList(VT, MVT::Other), Chain, RegName); setValue(&I, Res); DAG.setRoot(Res.getValue(1)); return; } case Intrinsic::write_register: { Value *Reg = I.getArgOperand(0); Value *RegValue = I.getArgOperand(1); SDValue Chain = getRoot(); SDValue RegName = DAG.getMDNode(cast(cast(Reg)->getMetadata())); DAG.setRoot(DAG.getNode(ISD::WRITE_REGISTER, sdl, MVT::Other, Chain, RegName, getValue(RegValue))); return; } case Intrinsic::memcpy: { const auto &MCI = cast(I); SDValue Op1 = getValue(I.getArgOperand(0)); SDValue Op2 = getValue(I.getArgOperand(1)); SDValue Op3 = getValue(I.getArgOperand(2)); // @llvm.memcpy defines 0 and 1 to both mean no alignment. Align DstAlign = MCI.getDestAlign().valueOrOne(); Align SrcAlign = MCI.getSourceAlign().valueOrOne(); Align Alignment = commonAlignment(DstAlign, SrcAlign); bool isVol = MCI.isVolatile(); bool isTC = I.isTailCall() && isInTailCallPosition(I, DAG.getTarget()); // FIXME: Support passing different dest/src alignments to the memcpy DAG // node. SDValue Root = isVol ? getRoot() : getMemoryRoot(); SDValue MC = DAG.getMemcpy(Root, sdl, Op1, Op2, Op3, Alignment, isVol, /* AlwaysInline */ false, isTC, MachinePointerInfo(I.getArgOperand(0)), MachinePointerInfo(I.getArgOperand(1))); updateDAGForMaybeTailCall(MC); return; } case Intrinsic::memcpy_inline: { const auto &MCI = cast(I); SDValue Dst = getValue(I.getArgOperand(0)); SDValue Src = getValue(I.getArgOperand(1)); SDValue Size = getValue(I.getArgOperand(2)); assert(isa(Size) && "memcpy_inline needs constant size"); // @llvm.memcpy.inline defines 0 and 1 to both mean no alignment. Align DstAlign = MCI.getDestAlign().valueOrOne(); Align SrcAlign = MCI.getSourceAlign().valueOrOne(); Align Alignment = commonAlignment(DstAlign, SrcAlign); bool isVol = MCI.isVolatile(); bool isTC = I.isTailCall() && isInTailCallPosition(I, DAG.getTarget()); // FIXME: Support passing different dest/src alignments to the memcpy DAG // node. 
SDValue MC = DAG.getMemcpy(getRoot(), sdl, Dst, Src, Size, Alignment, isVol, /* AlwaysInline */ true, isTC, MachinePointerInfo(I.getArgOperand(0)), MachinePointerInfo(I.getArgOperand(1))); updateDAGForMaybeTailCall(MC); return; } case Intrinsic::memset: { const auto &MSI = cast(I); SDValue Op1 = getValue(I.getArgOperand(0)); SDValue Op2 = getValue(I.getArgOperand(1)); SDValue Op3 = getValue(I.getArgOperand(2)); // @llvm.memset defines 0 and 1 to both mean no alignment. Align Alignment = MSI.getDestAlign().valueOrOne(); bool isVol = MSI.isVolatile(); bool isTC = I.isTailCall() && isInTailCallPosition(I, DAG.getTarget()); SDValue Root = isVol ? getRoot() : getMemoryRoot(); SDValue MS = DAG.getMemset(Root, sdl, Op1, Op2, Op3, Alignment, isVol, isTC, MachinePointerInfo(I.getArgOperand(0))); updateDAGForMaybeTailCall(MS); return; } case Intrinsic::memmove: { const auto &MMI = cast(I); SDValue Op1 = getValue(I.getArgOperand(0)); SDValue Op2 = getValue(I.getArgOperand(1)); SDValue Op3 = getValue(I.getArgOperand(2)); // @llvm.memmove defines 0 and 1 to both mean no alignment. Align DstAlign = MMI.getDestAlign().valueOrOne(); Align SrcAlign = MMI.getSourceAlign().valueOrOne(); Align Alignment = commonAlignment(DstAlign, SrcAlign); bool isVol = MMI.isVolatile(); bool isTC = I.isTailCall() && isInTailCallPosition(I, DAG.getTarget()); // FIXME: Support passing different dest/src alignments to the memmove DAG // node. SDValue Root = isVol ? getRoot() : getMemoryRoot(); SDValue MM = DAG.getMemmove(Root, sdl, Op1, Op2, Op3, Alignment, isVol, isTC, MachinePointerInfo(I.getArgOperand(0)), MachinePointerInfo(I.getArgOperand(1))); updateDAGForMaybeTailCall(MM); return; } case Intrinsic::memcpy_element_unordered_atomic: { const AtomicMemCpyInst &MI = cast(I); SDValue Dst = getValue(MI.getRawDest()); SDValue Src = getValue(MI.getRawSource()); SDValue Length = getValue(MI.getLength()); unsigned DstAlign = MI.getDestAlignment(); unsigned SrcAlign = MI.getSourceAlignment(); Type *LengthTy = MI.getLength()->getType(); unsigned ElemSz = MI.getElementSizeInBytes(); bool isTC = I.isTailCall() && isInTailCallPosition(I, DAG.getTarget()); SDValue MC = DAG.getAtomicMemcpy(getRoot(), sdl, Dst, DstAlign, Src, SrcAlign, Length, LengthTy, ElemSz, isTC, MachinePointerInfo(MI.getRawDest()), MachinePointerInfo(MI.getRawSource())); updateDAGForMaybeTailCall(MC); return; } case Intrinsic::memmove_element_unordered_atomic: { auto &MI = cast(I); SDValue Dst = getValue(MI.getRawDest()); SDValue Src = getValue(MI.getRawSource()); SDValue Length = getValue(MI.getLength()); unsigned DstAlign = MI.getDestAlignment(); unsigned SrcAlign = MI.getSourceAlignment(); Type *LengthTy = MI.getLength()->getType(); unsigned ElemSz = MI.getElementSizeInBytes(); bool isTC = I.isTailCall() && isInTailCallPosition(I, DAG.getTarget()); SDValue MC = DAG.getAtomicMemmove(getRoot(), sdl, Dst, DstAlign, Src, SrcAlign, Length, LengthTy, ElemSz, isTC, MachinePointerInfo(MI.getRawDest()), MachinePointerInfo(MI.getRawSource())); updateDAGForMaybeTailCall(MC); return; } case Intrinsic::memset_element_unordered_atomic: { auto &MI = cast(I); SDValue Dst = getValue(MI.getRawDest()); SDValue Val = getValue(MI.getValue()); SDValue Length = getValue(MI.getLength()); unsigned DstAlign = MI.getDestAlignment(); Type *LengthTy = MI.getLength()->getType(); unsigned ElemSz = MI.getElementSizeInBytes(); bool isTC = I.isTailCall() && isInTailCallPosition(I, DAG.getTarget()); SDValue MC = DAG.getAtomicMemset(getRoot(), sdl, Dst, DstAlign, Val, Length, LengthTy, ElemSz, isTC, 
MachinePointerInfo(MI.getRawDest())); updateDAGForMaybeTailCall(MC); return; } case Intrinsic::call_preallocated_setup: { const CallBase *PreallocatedCall = FindPreallocatedCall(&I); SDValue SrcValue = DAG.getSrcValue(PreallocatedCall); SDValue Res = DAG.getNode(ISD::PREALLOCATED_SETUP, sdl, MVT::Other, getRoot(), SrcValue); setValue(&I, Res); DAG.setRoot(Res); return; } case Intrinsic::call_preallocated_arg: { const CallBase *PreallocatedCall = FindPreallocatedCall(I.getOperand(0)); SDValue SrcValue = DAG.getSrcValue(PreallocatedCall); SDValue Ops[3]; Ops[0] = getRoot(); Ops[1] = SrcValue; Ops[2] = DAG.getTargetConstant(*cast(I.getArgOperand(1)), sdl, MVT::i32); // arg index SDValue Res = DAG.getNode( ISD::PREALLOCATED_ARG, sdl, DAG.getVTList(TLI.getPointerTy(DAG.getDataLayout()), MVT::Other), Ops); setValue(&I, Res); DAG.setRoot(Res.getValue(1)); return; } case Intrinsic::dbg_addr: case Intrinsic::dbg_declare: { const auto &DI = cast(I); DILocalVariable *Variable = DI.getVariable(); DIExpression *Expression = DI.getExpression(); dropDanglingDebugInfo(Variable, Expression); assert(Variable && "Missing variable"); LLVM_DEBUG(dbgs() << "SelectionDAG visiting debug intrinsic: " << DI << "\n"); // Check if address has undef value. const Value *Address = DI.getVariableLocation(); if (!Address || isa(Address) || (Address->use_empty() && !isa(Address))) { LLVM_DEBUG(dbgs() << "Dropping debug info for " << DI << " (bad/undef/unused-arg address)\n"); return; } bool isParameter = Variable->isParameter() || isa(Address); // Check if this variable can be described by a frame index, typically // either as a static alloca or a byval parameter. int FI = std::numeric_limits::max(); if (const auto *AI = dyn_cast(Address->stripInBoundsConstantOffsets())) { if (AI->isStaticAlloca()) { auto I = FuncInfo.StaticAllocaMap.find(AI); if (I != FuncInfo.StaticAllocaMap.end()) FI = I->second; } } else if (const auto *Arg = dyn_cast( Address->stripInBoundsConstantOffsets())) { FI = FuncInfo.getArgumentFrameIndex(Arg); } // llvm.dbg.addr is control dependent and always generates indirect // DBG_VALUE instructions. llvm.dbg.declare is handled as a frame index in // the MachineFunction variable table. if (FI != std::numeric_limits::max()) { if (Intrinsic == Intrinsic::dbg_addr) { SDDbgValue *SDV = DAG.getFrameIndexDbgValue( Variable, Expression, FI, /*IsIndirect*/ true, dl, SDNodeOrder); DAG.AddDbgValue(SDV, getRoot().getNode(), isParameter); } else { LLVM_DEBUG(dbgs() << "Skipping " << DI << " (variable info stashed in MF side table)\n"); } return; } SDValue &N = NodeMap[Address]; if (!N.getNode() && isa(Address)) // Check unused arguments map. N = UnusedArgNodeMap[Address]; SDDbgValue *SDV; if (N.getNode()) { if (const BitCastInst *BCI = dyn_cast(Address)) Address = BCI->getOperand(0); // Parameters are handled specially. auto FINode = dyn_cast(N.getNode()); if (isParameter && FINode) { // Byval parameter. We have a frame index at this point. SDV = DAG.getFrameIndexDbgValue(Variable, Expression, FINode->getIndex(), /*IsIndirect*/ true, dl, SDNodeOrder); } else if (isa(Address)) { // Address is an argument, so try to emit its dbg value using // virtual register info from the FuncInfo.ValueMap. 
EmitFuncArgumentDbgValue(Address, Variable, Expression, dl, true, N); return; } else { SDV = DAG.getDbgValue(Variable, Expression, N.getNode(), N.getResNo(), true, dl, SDNodeOrder); } DAG.AddDbgValue(SDV, N.getNode(), isParameter); } else { // If Address is an argument then try to emit its dbg value using // virtual register info from the FuncInfo.ValueMap. if (!EmitFuncArgumentDbgValue(Address, Variable, Expression, dl, true, N)) { LLVM_DEBUG(dbgs() << "Dropping debug info for " << DI << " (could not emit func-arg dbg_value)\n"); } } return; } case Intrinsic::dbg_label: { const DbgLabelInst &DI = cast(I); DILabel *Label = DI.getLabel(); assert(Label && "Missing label"); SDDbgLabel *SDV; SDV = DAG.getDbgLabel(Label, dl, SDNodeOrder); DAG.AddDbgLabel(SDV); return; } case Intrinsic::dbg_value: { const DbgValueInst &DI = cast(I); assert(DI.getVariable() && "Missing variable"); DILocalVariable *Variable = DI.getVariable(); DIExpression *Expression = DI.getExpression(); dropDanglingDebugInfo(Variable, Expression); const Value *V = DI.getValue(); if (!V) return; if (handleDebugValue(V, Variable, Expression, dl, DI.getDebugLoc(), SDNodeOrder)) return; // TODO: Dangling debug info will eventually either be resolved or produce // an Undef DBG_VALUE. However in the resolution case, a gap may appear // between the original dbg.value location and its resolved DBG_VALUE, which // we should ideally fill with an extra Undef DBG_VALUE. DanglingDebugInfoMap[V].emplace_back(&DI, dl, SDNodeOrder); return; } case Intrinsic::eh_typeid_for: { // Find the type id for the given typeinfo. GlobalValue *GV = ExtractTypeInfo(I.getArgOperand(0)); unsigned TypeID = DAG.getMachineFunction().getTypeIDFor(GV); Res = DAG.getConstant(TypeID, sdl, MVT::i32); setValue(&I, Res); return; } case Intrinsic::eh_return_i32: case Intrinsic::eh_return_i64: DAG.getMachineFunction().setCallsEHReturn(true); DAG.setRoot(DAG.getNode(ISD::EH_RETURN, sdl, MVT::Other, getControlRoot(), getValue(I.getArgOperand(0)), getValue(I.getArgOperand(1)))); return; case Intrinsic::eh_unwind_init: DAG.getMachineFunction().setCallsUnwindInit(true); return; case Intrinsic::eh_dwarf_cfa: setValue(&I, DAG.getNode(ISD::EH_DWARF_CFA, sdl, TLI.getPointerTy(DAG.getDataLayout()), getValue(I.getArgOperand(0)))); return; case Intrinsic::eh_sjlj_callsite: { MachineModuleInfo &MMI = DAG.getMachineFunction().getMMI(); ConstantInt *CI = dyn_cast(I.getArgOperand(0)); assert(CI && "Non-constant call site value in eh.sjlj.callsite!"); assert(MMI.getCurrentCallSite() == 0 && "Overlapping call sites!"); MMI.setCurrentCallSite(CI->getZExtValue()); return; } case Intrinsic::eh_sjlj_functioncontext: { // Get and store the index of the function context. 
MachineFrameInfo &MFI = DAG.getMachineFunction().getFrameInfo(); AllocaInst *FnCtx = cast(I.getArgOperand(0)->stripPointerCasts()); int FI = FuncInfo.StaticAllocaMap[FnCtx]; MFI.setFunctionContextIndex(FI); return; } case Intrinsic::eh_sjlj_setjmp: { SDValue Ops[2]; Ops[0] = getRoot(); Ops[1] = getValue(I.getArgOperand(0)); SDValue Op = DAG.getNode(ISD::EH_SJLJ_SETJMP, sdl, DAG.getVTList(MVT::i32, MVT::Other), Ops); setValue(&I, Op.getValue(0)); DAG.setRoot(Op.getValue(1)); return; } case Intrinsic::eh_sjlj_longjmp: DAG.setRoot(DAG.getNode(ISD::EH_SJLJ_LONGJMP, sdl, MVT::Other, getRoot(), getValue(I.getArgOperand(0)))); return; case Intrinsic::eh_sjlj_setup_dispatch: DAG.setRoot(DAG.getNode(ISD::EH_SJLJ_SETUP_DISPATCH, sdl, MVT::Other, getRoot())); return; case Intrinsic::masked_gather: visitMaskedGather(I); return; case Intrinsic::masked_load: visitMaskedLoad(I); return; case Intrinsic::masked_scatter: visitMaskedScatter(I); return; case Intrinsic::masked_store: visitMaskedStore(I); return; case Intrinsic::masked_expandload: visitMaskedLoad(I, true /* IsExpanding */); return; case Intrinsic::masked_compressstore: visitMaskedStore(I, true /* IsCompressing */); return; case Intrinsic::powi: setValue(&I, ExpandPowI(sdl, getValue(I.getArgOperand(0)), getValue(I.getArgOperand(1)), DAG)); return; case Intrinsic::log: setValue(&I, expandLog(sdl, getValue(I.getArgOperand(0)), DAG, TLI)); return; case Intrinsic::log2: setValue(&I, expandLog2(sdl, getValue(I.getArgOperand(0)), DAG, TLI)); return; case Intrinsic::log10: setValue(&I, expandLog10(sdl, getValue(I.getArgOperand(0)), DAG, TLI)); return; case Intrinsic::exp: setValue(&I, expandExp(sdl, getValue(I.getArgOperand(0)), DAG, TLI)); return; case Intrinsic::exp2: setValue(&I, expandExp2(sdl, getValue(I.getArgOperand(0)), DAG, TLI)); return; case Intrinsic::pow: setValue(&I, expandPow(sdl, getValue(I.getArgOperand(0)), getValue(I.getArgOperand(1)), DAG, TLI)); return; case Intrinsic::sqrt: case Intrinsic::fabs: case Intrinsic::sin: case Intrinsic::cos: case Intrinsic::floor: case Intrinsic::ceil: case Intrinsic::trunc: case Intrinsic::rint: case Intrinsic::nearbyint: case Intrinsic::round: case Intrinsic::roundeven: case Intrinsic::canonicalize: { unsigned Opcode; switch (Intrinsic) { default: llvm_unreachable("Impossible intrinsic"); // Can't reach here. case Intrinsic::sqrt: Opcode = ISD::FSQRT; break; case Intrinsic::fabs: Opcode = ISD::FABS; break; case Intrinsic::sin: Opcode = ISD::FSIN; break; case Intrinsic::cos: Opcode = ISD::FCOS; break; case Intrinsic::floor: Opcode = ISD::FFLOOR; break; case Intrinsic::ceil: Opcode = ISD::FCEIL; break; case Intrinsic::trunc: Opcode = ISD::FTRUNC; break; case Intrinsic::rint: Opcode = ISD::FRINT; break; case Intrinsic::nearbyint: Opcode = ISD::FNEARBYINT; break; case Intrinsic::round: Opcode = ISD::FROUND; break; case Intrinsic::roundeven: Opcode = ISD::FROUNDEVEN; break; case Intrinsic::canonicalize: Opcode = ISD::FCANONICALIZE; break; } setValue(&I, DAG.getNode(Opcode, sdl, getValue(I.getArgOperand(0)).getValueType(), getValue(I.getArgOperand(0)))); return; } case Intrinsic::lround: case Intrinsic::llround: case Intrinsic::lrint: case Intrinsic::llrint: { unsigned Opcode; switch (Intrinsic) { default: llvm_unreachable("Impossible intrinsic"); // Can't reach here. 
    case Intrinsic::lround:  Opcode = ISD::LROUND;  break;
    case Intrinsic::llround: Opcode = ISD::LLROUND; break;
    case Intrinsic::lrint:   Opcode = ISD::LRINT;   break;
    case Intrinsic::llrint:  Opcode = ISD::LLRINT;  break;
    }

    EVT RetVT = TLI.getValueType(DAG.getDataLayout(), I.getType());
    setValue(&I, DAG.getNode(Opcode, sdl, RetVT,
                             getValue(I.getArgOperand(0))));
    return;
  }
  case Intrinsic::minnum:
    setValue(&I, DAG.getNode(ISD::FMINNUM, sdl,
                             getValue(I.getArgOperand(0)).getValueType(),
                             getValue(I.getArgOperand(0)),
                             getValue(I.getArgOperand(1))));
    return;
  case Intrinsic::maxnum:
    setValue(&I, DAG.getNode(ISD::FMAXNUM, sdl,
                             getValue(I.getArgOperand(0)).getValueType(),
                             getValue(I.getArgOperand(0)),
                             getValue(I.getArgOperand(1))));
    return;
  case Intrinsic::minimum:
    setValue(&I, DAG.getNode(ISD::FMINIMUM, sdl,
                             getValue(I.getArgOperand(0)).getValueType(),
                             getValue(I.getArgOperand(0)),
                             getValue(I.getArgOperand(1))));
    return;
  case Intrinsic::maximum:
    setValue(&I, DAG.getNode(ISD::FMAXIMUM, sdl,
                             getValue(I.getArgOperand(0)).getValueType(),
                             getValue(I.getArgOperand(0)),
                             getValue(I.getArgOperand(1))));
    return;
  case Intrinsic::copysign:
    setValue(&I, DAG.getNode(ISD::FCOPYSIGN, sdl,
                             getValue(I.getArgOperand(0)).getValueType(),
                             getValue(I.getArgOperand(0)),
                             getValue(I.getArgOperand(1))));
    return;
  case Intrinsic::fma:
    setValue(&I, DAG.getNode(ISD::FMA, sdl,
                             getValue(I.getArgOperand(0)).getValueType(),
                             getValue(I.getArgOperand(0)),
                             getValue(I.getArgOperand(1)),
                             getValue(I.getArgOperand(2))));
    return;
#define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC)                         \
  case Intrinsic::INTRINSIC:
#include "llvm/IR/ConstrainedOps.def"
    visitConstrainedFPIntrinsic(cast<ConstrainedFPIntrinsic>(I));
    return;
  case Intrinsic::fmuladd: {
    EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());
    if (TM.Options.AllowFPOpFusion != FPOpFusion::Strict &&
        TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(), VT)) {
      setValue(&I, DAG.getNode(ISD::FMA, sdl,
                               getValue(I.getArgOperand(0)).getValueType(),
                               getValue(I.getArgOperand(0)),
                               getValue(I.getArgOperand(1)),
                               getValue(I.getArgOperand(2))));
    } else {
      // TODO: Intrinsic calls should have fast-math-flags.
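      // No fused multiply-add was formed above, so emit an explicit
      // multiply followed by an add instead.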
SDValue Mul = DAG.getNode(ISD::FMUL, sdl, getValue(I.getArgOperand(0)).getValueType(), getValue(I.getArgOperand(0)), getValue(I.getArgOperand(1))); SDValue Add = DAG.getNode(ISD::FADD, sdl, getValue(I.getArgOperand(0)).getValueType(), Mul, getValue(I.getArgOperand(2))); setValue(&I, Add); } return; } case Intrinsic::convert_to_fp16: setValue(&I, DAG.getNode(ISD::BITCAST, sdl, MVT::i16, DAG.getNode(ISD::FP_ROUND, sdl, MVT::f16, getValue(I.getArgOperand(0)), DAG.getTargetConstant(0, sdl, MVT::i32)))); return; case Intrinsic::convert_from_fp16: setValue(&I, DAG.getNode(ISD::FP_EXTEND, sdl, TLI.getValueType(DAG.getDataLayout(), I.getType()), DAG.getNode(ISD::BITCAST, sdl, MVT::f16, getValue(I.getArgOperand(0))))); return; case Intrinsic::pcmarker: { SDValue Tmp = getValue(I.getArgOperand(0)); DAG.setRoot(DAG.getNode(ISD::PCMARKER, sdl, MVT::Other, getRoot(), Tmp)); return; } case Intrinsic::readcyclecounter: { SDValue Op = getRoot(); Res = DAG.getNode(ISD::READCYCLECOUNTER, sdl, DAG.getVTList(MVT::i64, MVT::Other), Op); setValue(&I, Res); DAG.setRoot(Res.getValue(1)); return; } case Intrinsic::bitreverse: setValue(&I, DAG.getNode(ISD::BITREVERSE, sdl, getValue(I.getArgOperand(0)).getValueType(), getValue(I.getArgOperand(0)))); return; case Intrinsic::bswap: setValue(&I, DAG.getNode(ISD::BSWAP, sdl, getValue(I.getArgOperand(0)).getValueType(), getValue(I.getArgOperand(0)))); return; case Intrinsic::cttz: { SDValue Arg = getValue(I.getArgOperand(0)); ConstantInt *CI = cast(I.getArgOperand(1)); EVT Ty = Arg.getValueType(); setValue(&I, DAG.getNode(CI->isZero() ? ISD::CTTZ : ISD::CTTZ_ZERO_UNDEF, sdl, Ty, Arg)); return; } case Intrinsic::ctlz: { SDValue Arg = getValue(I.getArgOperand(0)); ConstantInt *CI = cast(I.getArgOperand(1)); EVT Ty = Arg.getValueType(); setValue(&I, DAG.getNode(CI->isZero() ? ISD::CTLZ : ISD::CTLZ_ZERO_UNDEF, sdl, Ty, Arg)); return; } case Intrinsic::ctpop: { SDValue Arg = getValue(I.getArgOperand(0)); EVT Ty = Arg.getValueType(); setValue(&I, DAG.getNode(ISD::CTPOP, sdl, Ty, Arg)); return; } case Intrinsic::fshl: case Intrinsic::fshr: { bool IsFSHL = Intrinsic == Intrinsic::fshl; SDValue X = getValue(I.getArgOperand(0)); SDValue Y = getValue(I.getArgOperand(1)); SDValue Z = getValue(I.getArgOperand(2)); EVT VT = X.getValueType(); SDValue BitWidthC = DAG.getConstant(VT.getScalarSizeInBits(), sdl, VT); SDValue Zero = DAG.getConstant(0, sdl, VT); SDValue ShAmt = DAG.getNode(ISD::UREM, sdl, VT, Z, BitWidthC); auto FunnelOpcode = IsFSHL ? ISD::FSHL : ISD::FSHR; if (TLI.isOperationLegalOrCustom(FunnelOpcode, VT)) { setValue(&I, DAG.getNode(FunnelOpcode, sdl, VT, X, Y, Z)); return; } // When X == Y, this is rotate. If the data type has a power-of-2 size, we // avoid the select that is necessary in the general case to filter out // the 0-shift possibility that leads to UB. if (X == Y && isPowerOf2_32(VT.getScalarSizeInBits())) { auto RotateOpcode = IsFSHL ? ISD::ROTL : ISD::ROTR; if (TLI.isOperationLegalOrCustom(RotateOpcode, VT)) { setValue(&I, DAG.getNode(RotateOpcode, sdl, VT, X, Z)); return; } // Some targets only rotate one way. Try the opposite direction. RotateOpcode = IsFSHL ? ISD::ROTR : ISD::ROTL; if (TLI.isOperationLegalOrCustom(RotateOpcode, VT)) { // Negate the shift amount because it is safe to ignore the high bits. 
SDValue NegShAmt = DAG.getNode(ISD::SUB, sdl, VT, Zero, Z); setValue(&I, DAG.getNode(RotateOpcode, sdl, VT, X, NegShAmt)); return; } // fshl (rotl): (X << (Z % BW)) | (X >> ((0 - Z) % BW)) // fshr (rotr): (X << ((0 - Z) % BW)) | (X >> (Z % BW)) SDValue NegZ = DAG.getNode(ISD::SUB, sdl, VT, Zero, Z); SDValue NShAmt = DAG.getNode(ISD::UREM, sdl, VT, NegZ, BitWidthC); SDValue ShX = DAG.getNode(ISD::SHL, sdl, VT, X, IsFSHL ? ShAmt : NShAmt); SDValue ShY = DAG.getNode(ISD::SRL, sdl, VT, X, IsFSHL ? NShAmt : ShAmt); setValue(&I, DAG.getNode(ISD::OR, sdl, VT, ShX, ShY)); return; } // fshl: (X << (Z % BW)) | (Y >> (BW - (Z % BW))) // fshr: (X << (BW - (Z % BW))) | (Y >> (Z % BW)) SDValue InvShAmt = DAG.getNode(ISD::SUB, sdl, VT, BitWidthC, ShAmt); SDValue ShX = DAG.getNode(ISD::SHL, sdl, VT, X, IsFSHL ? ShAmt : InvShAmt); SDValue ShY = DAG.getNode(ISD::SRL, sdl, VT, Y, IsFSHL ? InvShAmt : ShAmt); SDValue Or = DAG.getNode(ISD::OR, sdl, VT, ShX, ShY); // If (Z % BW == 0), then the opposite direction shift is shift-by-bitwidth, // and that is undefined. We must compare and select to avoid UB. EVT CCVT = MVT::i1; if (VT.isVector()) CCVT = EVT::getVectorVT(*Context, CCVT, VT.getVectorNumElements()); // For fshl, 0-shift returns the 1st arg (X). // For fshr, 0-shift returns the 2nd arg (Y). SDValue IsZeroShift = DAG.getSetCC(sdl, CCVT, ShAmt, Zero, ISD::SETEQ); setValue(&I, DAG.getSelect(sdl, VT, IsZeroShift, IsFSHL ? X : Y, Or)); return; } case Intrinsic::sadd_sat: { SDValue Op1 = getValue(I.getArgOperand(0)); SDValue Op2 = getValue(I.getArgOperand(1)); setValue(&I, DAG.getNode(ISD::SADDSAT, sdl, Op1.getValueType(), Op1, Op2)); return; } case Intrinsic::uadd_sat: { SDValue Op1 = getValue(I.getArgOperand(0)); SDValue Op2 = getValue(I.getArgOperand(1)); setValue(&I, DAG.getNode(ISD::UADDSAT, sdl, Op1.getValueType(), Op1, Op2)); return; } case Intrinsic::ssub_sat: { SDValue Op1 = getValue(I.getArgOperand(0)); SDValue Op2 = getValue(I.getArgOperand(1)); setValue(&I, DAG.getNode(ISD::SSUBSAT, sdl, Op1.getValueType(), Op1, Op2)); return; } case Intrinsic::usub_sat: { SDValue Op1 = getValue(I.getArgOperand(0)); SDValue Op2 = getValue(I.getArgOperand(1)); setValue(&I, DAG.getNode(ISD::USUBSAT, sdl, Op1.getValueType(), Op1, Op2)); return; } case Intrinsic::smul_fix: case Intrinsic::umul_fix: case Intrinsic::smul_fix_sat: case Intrinsic::umul_fix_sat: { SDValue Op1 = getValue(I.getArgOperand(0)); SDValue Op2 = getValue(I.getArgOperand(1)); SDValue Op3 = getValue(I.getArgOperand(2)); setValue(&I, DAG.getNode(FixedPointIntrinsicToOpcode(Intrinsic), sdl, Op1.getValueType(), Op1, Op2, Op3)); return; } case Intrinsic::sdiv_fix: case Intrinsic::udiv_fix: case Intrinsic::sdiv_fix_sat: case Intrinsic::udiv_fix_sat: { SDValue Op1 = getValue(I.getArgOperand(0)); SDValue Op2 = getValue(I.getArgOperand(1)); SDValue Op3 = getValue(I.getArgOperand(2)); setValue(&I, expandDivFix(FixedPointIntrinsicToOpcode(Intrinsic), sdl, Op1, Op2, Op3, DAG, TLI)); return; } case Intrinsic::stacksave: { SDValue Op = getRoot(); EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType()); Res = DAG.getNode(ISD::STACKSAVE, sdl, DAG.getVTList(VT, MVT::Other), Op); setValue(&I, Res); DAG.setRoot(Res.getValue(1)); return; } case Intrinsic::stackrestore: Res = getValue(I.getArgOperand(0)); DAG.setRoot(DAG.getNode(ISD::STACKRESTORE, sdl, MVT::Other, getRoot(), Res)); return; case Intrinsic::get_dynamic_area_offset: { SDValue Op = getRoot(); EVT PtrTy = TLI.getFrameIndexTy(DAG.getDataLayout()); EVT ResTy = TLI.getValueType(DAG.getDataLayout(), 
I.getType()); // Result type for @llvm.get.dynamic.area.offset should match PtrTy for // target. if (PtrTy.getSizeInBits() < ResTy.getSizeInBits()) report_fatal_error("Wrong result type for @llvm.get.dynamic.area.offset" " intrinsic!"); Res = DAG.getNode(ISD::GET_DYNAMIC_AREA_OFFSET, sdl, DAG.getVTList(ResTy), Op); DAG.setRoot(Op); setValue(&I, Res); return; } case Intrinsic::stackguard: { MachineFunction &MF = DAG.getMachineFunction(); const Module &M = *MF.getFunction().getParent(); SDValue Chain = getRoot(); if (TLI.useLoadStackGuardNode()) { Res = getLoadStackGuard(DAG, sdl, Chain); } else { EVT PtrTy = TLI.getValueType(DAG.getDataLayout(), I.getType()); const Value *Global = TLI.getSDagStackGuard(M); unsigned Align = DL->getPrefTypeAlignment(Global->getType()); Res = DAG.getLoad(PtrTy, sdl, Chain, getValue(Global), MachinePointerInfo(Global, 0), Align, MachineMemOperand::MOVolatile); } if (TLI.useStackGuardXorFP()) Res = TLI.emitStackGuardXorFP(DAG, Res, sdl); DAG.setRoot(Chain); setValue(&I, Res); return; } case Intrinsic::stackprotector: { // Emit code into the DAG to store the stack guard onto the stack. MachineFunction &MF = DAG.getMachineFunction(); MachineFrameInfo &MFI = MF.getFrameInfo(); SDValue Src, Chain = getRoot(); if (TLI.useLoadStackGuardNode()) Src = getLoadStackGuard(DAG, sdl, Chain); else Src = getValue(I.getArgOperand(0)); // The guard's value. AllocaInst *Slot = cast(I.getArgOperand(1)); int FI = FuncInfo.StaticAllocaMap[Slot]; MFI.setStackProtectorIndex(FI); EVT PtrTy = TLI.getFrameIndexTy(DAG.getDataLayout()); SDValue FIN = DAG.getFrameIndex(FI, PtrTy); // Store the stack protector onto the stack. Res = DAG.getStore(Chain, sdl, Src, FIN, MachinePointerInfo::getFixedStack( DAG.getMachineFunction(), FI), /* Alignment = */ 0, MachineMemOperand::MOVolatile); setValue(&I, Res); DAG.setRoot(Res); return; } case Intrinsic::objectsize: llvm_unreachable("llvm.objectsize.* should have been lowered already"); case Intrinsic::is_constant: llvm_unreachable("llvm.is.constant.* should have been lowered already"); case Intrinsic::annotation: case Intrinsic::ptr_annotation: case Intrinsic::launder_invariant_group: case Intrinsic::strip_invariant_group: // Drop the intrinsic, but forward the value setValue(&I, getValue(I.getOperand(0))); return; case Intrinsic::assume: case Intrinsic::var_annotation: case Intrinsic::sideeffect: // Discard annotate attributes, assumptions, and artificial side-effects. return; case Intrinsic::codeview_annotation: { // Emit a label associated with this metadata. 
MachineFunction &MF = DAG.getMachineFunction(); MCSymbol *Label = MF.getMMI().getContext().createTempSymbol("annotation", true); Metadata *MD = cast(I.getArgOperand(0))->getMetadata(); MF.addCodeViewAnnotation(Label, cast(MD)); Res = DAG.getLabelNode(ISD::ANNOTATION_LABEL, sdl, getRoot(), Label); DAG.setRoot(Res); return; } case Intrinsic::init_trampoline: { const Function *F = cast(I.getArgOperand(1)->stripPointerCasts()); SDValue Ops[6]; Ops[0] = getRoot(); Ops[1] = getValue(I.getArgOperand(0)); Ops[2] = getValue(I.getArgOperand(1)); Ops[3] = getValue(I.getArgOperand(2)); Ops[4] = DAG.getSrcValue(I.getArgOperand(0)); Ops[5] = DAG.getSrcValue(F); Res = DAG.getNode(ISD::INIT_TRAMPOLINE, sdl, MVT::Other, Ops); DAG.setRoot(Res); return; } case Intrinsic::adjust_trampoline: setValue(&I, DAG.getNode(ISD::ADJUST_TRAMPOLINE, sdl, TLI.getPointerTy(DAG.getDataLayout()), getValue(I.getArgOperand(0)))); return; case Intrinsic::gcroot: { assert(DAG.getMachineFunction().getFunction().hasGC() && "only valid in functions with gc specified, enforced by Verifier"); assert(GFI && "implied by previous"); const Value *Alloca = I.getArgOperand(0)->stripPointerCasts(); const Constant *TypeMap = cast(I.getArgOperand(1)); FrameIndexSDNode *FI = cast(getValue(Alloca).getNode()); GFI->addStackRoot(FI->getIndex(), TypeMap); return; } case Intrinsic::gcread: case Intrinsic::gcwrite: llvm_unreachable("GC failed to lower gcread/gcwrite intrinsics!"); case Intrinsic::flt_rounds: Res = DAG.getNode(ISD::FLT_ROUNDS_, sdl, {MVT::i32, MVT::Other}, getRoot()); setValue(&I, Res); DAG.setRoot(Res.getValue(1)); return; case Intrinsic::expect: // Just replace __builtin_expect(exp, c) with EXP. setValue(&I, getValue(I.getArgOperand(0))); return; case Intrinsic::debugtrap: case Intrinsic::trap: { StringRef TrapFuncName = I.getAttributes() .getAttribute(AttributeList::FunctionIndex, "trap-func-name") .getValueAsString(); if (TrapFuncName.empty()) { ISD::NodeType Op = (Intrinsic == Intrinsic::trap) ? ISD::TRAP : ISD::DEBUGTRAP; DAG.setRoot(DAG.getNode(Op, sdl,MVT::Other, getRoot())); return; } TargetLowering::ArgListTy Args; TargetLowering::CallLoweringInfo CLI(DAG); CLI.setDebugLoc(sdl).setChain(getRoot()).setLibCallee( CallingConv::C, I.getType(), DAG.getExternalSymbol(TrapFuncName.data(), TLI.getPointerTy(DAG.getDataLayout())), std::move(Args)); std::pair Result = TLI.LowerCallTo(CLI); DAG.setRoot(Result.second); return; } case Intrinsic::uadd_with_overflow: case Intrinsic::sadd_with_overflow: case Intrinsic::usub_with_overflow: case Intrinsic::ssub_with_overflow: case Intrinsic::umul_with_overflow: case Intrinsic::smul_with_overflow: { ISD::NodeType Op; switch (Intrinsic) { default: llvm_unreachable("Impossible intrinsic"); // Can't reach here. 
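    // Note: each llvm.*.with.overflow intrinsic maps to an ISD node whose
    // second result is the overflow flag (i1, or a vector of i1 for vector
    // operands); see the SDVTList built below.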
case Intrinsic::uadd_with_overflow: Op = ISD::UADDO; break; case Intrinsic::sadd_with_overflow: Op = ISD::SADDO; break; case Intrinsic::usub_with_overflow: Op = ISD::USUBO; break; case Intrinsic::ssub_with_overflow: Op = ISD::SSUBO; break; case Intrinsic::umul_with_overflow: Op = ISD::UMULO; break; case Intrinsic::smul_with_overflow: Op = ISD::SMULO; break; } SDValue Op1 = getValue(I.getArgOperand(0)); SDValue Op2 = getValue(I.getArgOperand(1)); EVT ResultVT = Op1.getValueType(); EVT OverflowVT = MVT::i1; if (ResultVT.isVector()) OverflowVT = EVT::getVectorVT( *Context, OverflowVT, ResultVT.getVectorNumElements()); SDVTList VTs = DAG.getVTList(ResultVT, OverflowVT); setValue(&I, DAG.getNode(Op, sdl, VTs, Op1, Op2)); return; } case Intrinsic::prefetch: { SDValue Ops[5]; unsigned rw = cast(I.getArgOperand(1))->getZExtValue(); auto Flags = rw == 0 ? MachineMemOperand::MOLoad :MachineMemOperand::MOStore; Ops[0] = DAG.getRoot(); Ops[1] = getValue(I.getArgOperand(0)); Ops[2] = getValue(I.getArgOperand(1)); Ops[3] = getValue(I.getArgOperand(2)); Ops[4] = getValue(I.getArgOperand(3)); SDValue Result = DAG.getMemIntrinsicNode( ISD::PREFETCH, sdl, DAG.getVTList(MVT::Other), Ops, EVT::getIntegerVT(*Context, 8), MachinePointerInfo(I.getArgOperand(0)), /* align */ None, Flags); // Chain the prefetch in parallell with any pending loads, to stay out of // the way of later optimizations. PendingLoads.push_back(Result); Result = getRoot(); DAG.setRoot(Result); return; } case Intrinsic::lifetime_start: case Intrinsic::lifetime_end: { bool IsStart = (Intrinsic == Intrinsic::lifetime_start); // Stack coloring is not enabled in O0, discard region information. if (TM.getOptLevel() == CodeGenOpt::None) return; const int64_t ObjectSize = cast(I.getArgOperand(0))->getSExtValue(); Value *const ObjectPtr = I.getArgOperand(1); SmallVector Allocas; GetUnderlyingObjects(ObjectPtr, Allocas, *DL); for (SmallVectorImpl::iterator Object = Allocas.begin(), E = Allocas.end(); Object != E; ++Object) { const AllocaInst *LifetimeObject = dyn_cast_or_null(*Object); // Could not find an Alloca. if (!LifetimeObject) continue; // First check that the Alloca is static, otherwise it won't have a // valid frame index. auto SI = FuncInfo.StaticAllocaMap.find(LifetimeObject); if (SI == FuncInfo.StaticAllocaMap.end()) return; const int FrameIndex = SI->second; int64_t Offset; if (GetPointerBaseWithConstantOffset( ObjectPtr, Offset, DAG.getDataLayout()) != LifetimeObject) Offset = -1; // Cannot determine offset from alloca to lifetime object. Res = DAG.getLifetimeNode(IsStart, sdl, getRoot(), FrameIndex, ObjectSize, Offset); DAG.setRoot(Res); } return; } case Intrinsic::invariant_start: // Discard region information. setValue(&I, DAG.getUNDEF(TLI.getPointerTy(DAG.getDataLayout()))); return; case Intrinsic::invariant_end: // Discard region information. return; case Intrinsic::clear_cache: /// FunctionName may be null. 
if (const char *FunctionName = TLI.getClearCacheBuiltinName()) lowerCallToExternalSymbol(I, FunctionName); return; case Intrinsic::donothing: // ignore return; case Intrinsic::experimental_stackmap: visitStackmap(I); return; case Intrinsic::experimental_patchpoint_void: case Intrinsic::experimental_patchpoint_i64: visitPatchpoint(I); return; case Intrinsic::experimental_gc_statepoint: LowerStatepoint(cast(I)); return; case Intrinsic::experimental_gc_result: visitGCResult(cast(I)); return; case Intrinsic::experimental_gc_relocate: visitGCRelocate(cast(I)); return; case Intrinsic::instrprof_increment: llvm_unreachable("instrprof failed to lower an increment"); case Intrinsic::instrprof_value_profile: llvm_unreachable("instrprof failed to lower a value profiling call"); case Intrinsic::localescape: { MachineFunction &MF = DAG.getMachineFunction(); const TargetInstrInfo *TII = DAG.getSubtarget().getInstrInfo(); // Directly emit some LOCAL_ESCAPE machine instrs. Label assignment emission // is the same on all targets. for (unsigned Idx = 0, E = I.getNumArgOperands(); Idx < E; ++Idx) { Value *Arg = I.getArgOperand(Idx)->stripPointerCasts(); if (isa(Arg)) continue; // Skip null pointers. They represent a hole in index space. AllocaInst *Slot = cast(Arg); assert(FuncInfo.StaticAllocaMap.count(Slot) && "can only escape static allocas"); int FI = FuncInfo.StaticAllocaMap[Slot]; MCSymbol *FrameAllocSym = MF.getMMI().getContext().getOrCreateFrameAllocSymbol( GlobalValue::dropLLVMManglingEscape(MF.getName()), Idx); BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, dl, TII->get(TargetOpcode::LOCAL_ESCAPE)) .addSym(FrameAllocSym) .addFrameIndex(FI); } return; } case Intrinsic::localrecover: { // i8* @llvm.localrecover(i8* %fn, i8* %fp, i32 %idx) MachineFunction &MF = DAG.getMachineFunction(); // Get the symbol that defines the frame offset. auto *Fn = cast(I.getArgOperand(0)->stripPointerCasts()); auto *Idx = cast(I.getArgOperand(2)); unsigned IdxVal = unsigned(Idx->getLimitedValue(std::numeric_limits::max())); MCSymbol *FrameAllocSym = MF.getMMI().getContext().getOrCreateFrameAllocSymbol( GlobalValue::dropLLVMManglingEscape(Fn->getName()), IdxVal); Value *FP = I.getArgOperand(1); SDValue FPVal = getValue(FP); EVT PtrVT = FPVal.getValueType(); // Create a MCSymbol for the label to avoid any target lowering // that would make this PC relative. SDValue OffsetSym = DAG.getMCSymbol(FrameAllocSym, PtrVT); SDValue OffsetVal = DAG.getNode(ISD::LOCAL_RECOVER, sdl, PtrVT, OffsetSym); // Add the offset to the FP. SDValue Add = DAG.getMemBasePlusOffset(FPVal, OffsetVal, sdl); setValue(&I, Add); return; } case Intrinsic::eh_exceptionpointer: case Intrinsic::eh_exceptioncode: { // Get the exception pointer vreg, copy from it, and resize it to fit. const auto *CPI = cast(I.getArgOperand(0)); MVT PtrVT = TLI.getPointerTy(DAG.getDataLayout()); const TargetRegisterClass *PtrRC = TLI.getRegClassFor(PtrVT); unsigned VReg = FuncInfo.getCatchPadExceptionPointerVReg(CPI, PtrRC); SDValue N = DAG.getCopyFromReg(DAG.getEntryNode(), getCurSDLoc(), VReg, PtrVT); if (Intrinsic == Intrinsic::eh_exceptioncode) N = DAG.getZExtOrTrunc(N, getCurSDLoc(), MVT::i32); setValue(&I, N); return; } case Intrinsic::xray_customevent: { // Here we want to make sure that the intrinsic behaves as if it has a // specific calling convention, and only for x86_64. // FIXME: Support other platforms later. 
const auto &Triple = DAG.getTarget().getTargetTriple(); if (Triple.getArch() != Triple::x86_64 || !Triple.isOSLinux()) return; SDLoc DL = getCurSDLoc(); SmallVector Ops; // We want to say that we always want the arguments in registers. SDValue LogEntryVal = getValue(I.getArgOperand(0)); SDValue StrSizeVal = getValue(I.getArgOperand(1)); SDVTList NodeTys = DAG.getVTList(MVT::Other, MVT::Glue); SDValue Chain = getRoot(); Ops.push_back(LogEntryVal); Ops.push_back(StrSizeVal); Ops.push_back(Chain); // We need to enforce the calling convention for the callsite, so that // argument ordering is enforced correctly, and that register allocation can // see that some registers may be assumed clobbered and have to preserve // them across calls to the intrinsic. MachineSDNode *MN = DAG.getMachineNode(TargetOpcode::PATCHABLE_EVENT_CALL, DL, NodeTys, Ops); SDValue patchableNode = SDValue(MN, 0); DAG.setRoot(patchableNode); setValue(&I, patchableNode); return; } case Intrinsic::xray_typedevent: { // Here we want to make sure that the intrinsic behaves as if it has a // specific calling convention, and only for x86_64. // FIXME: Support other platforms later. const auto &Triple = DAG.getTarget().getTargetTriple(); if (Triple.getArch() != Triple::x86_64 || !Triple.isOSLinux()) return; SDLoc DL = getCurSDLoc(); SmallVector Ops; // We want to say that we always want the arguments in registers. // It's unclear to me how manipulating the selection DAG here forces callers // to provide arguments in registers instead of on the stack. SDValue LogTypeId = getValue(I.getArgOperand(0)); SDValue LogEntryVal = getValue(I.getArgOperand(1)); SDValue StrSizeVal = getValue(I.getArgOperand(2)); SDVTList NodeTys = DAG.getVTList(MVT::Other, MVT::Glue); SDValue Chain = getRoot(); Ops.push_back(LogTypeId); Ops.push_back(LogEntryVal); Ops.push_back(StrSizeVal); Ops.push_back(Chain); // We need to enforce the calling convention for the callsite, so that // argument ordering is enforced correctly, and that register allocation can // see that some registers may be assumed clobbered and have to preserve // them across calls to the intrinsic. 
MachineSDNode *MN = DAG.getMachineNode( TargetOpcode::PATCHABLE_TYPED_EVENT_CALL, DL, NodeTys, Ops); SDValue patchableNode = SDValue(MN, 0); DAG.setRoot(patchableNode); setValue(&I, patchableNode); return; } case Intrinsic::experimental_deoptimize: LowerDeoptimizeCall(&I); return; case Intrinsic::experimental_vector_reduce_v2_fadd: case Intrinsic::experimental_vector_reduce_v2_fmul: case Intrinsic::experimental_vector_reduce_add: case Intrinsic::experimental_vector_reduce_mul: case Intrinsic::experimental_vector_reduce_and: case Intrinsic::experimental_vector_reduce_or: case Intrinsic::experimental_vector_reduce_xor: case Intrinsic::experimental_vector_reduce_smax: case Intrinsic::experimental_vector_reduce_smin: case Intrinsic::experimental_vector_reduce_umax: case Intrinsic::experimental_vector_reduce_umin: case Intrinsic::experimental_vector_reduce_fmax: case Intrinsic::experimental_vector_reduce_fmin: visitVectorReduce(I, Intrinsic); return; case Intrinsic::icall_branch_funnel: { SmallVector Ops; Ops.push_back(getValue(I.getArgOperand(0))); int64_t Offset; auto *Base = dyn_cast(GetPointerBaseWithConstantOffset( I.getArgOperand(1), Offset, DAG.getDataLayout())); if (!Base) report_fatal_error( "llvm.icall.branch.funnel operand must be a GlobalValue"); Ops.push_back(DAG.getTargetGlobalAddress(Base, getCurSDLoc(), MVT::i64, 0)); struct BranchFunnelTarget { int64_t Offset; SDValue Target; }; SmallVector Targets; for (unsigned Op = 1, N = I.getNumArgOperands(); Op != N; Op += 2) { auto *ElemBase = dyn_cast(GetPointerBaseWithConstantOffset( I.getArgOperand(Op), Offset, DAG.getDataLayout())); if (ElemBase != Base) report_fatal_error("all llvm.icall.branch.funnel operands must refer " "to the same GlobalValue"); SDValue Val = getValue(I.getArgOperand(Op + 1)); auto *GA = dyn_cast(Val); if (!GA) report_fatal_error( "llvm.icall.branch.funnel operand must be a GlobalValue"); Targets.push_back({Offset, DAG.getTargetGlobalAddress( GA->getGlobal(), getCurSDLoc(), Val.getValueType(), GA->getOffset())}); } llvm::sort(Targets, [](const BranchFunnelTarget &T1, const BranchFunnelTarget &T2) { return T1.Offset < T2.Offset; }); for (auto &T : Targets) { Ops.push_back(DAG.getTargetConstant(T.Offset, getCurSDLoc(), MVT::i32)); Ops.push_back(T.Target); } Ops.push_back(DAG.getRoot()); // Chain SDValue N(DAG.getMachineNode(TargetOpcode::ICALL_BRANCH_FUNNEL, getCurSDLoc(), MVT::Other, Ops), 0); DAG.setRoot(N); setValue(&I, N); HasTailCall = true; return; } case Intrinsic::wasm_landingpad_index: // Information this intrinsic contained has been transferred to // MachineFunction in SelectionDAGISel::PrepareEHLandingPad. We can safely // delete it now. 
return; case Intrinsic::aarch64_settag: case Intrinsic::aarch64_settag_zero: { const SelectionDAGTargetInfo &TSI = DAG.getSelectionDAGInfo(); bool ZeroMemory = Intrinsic == Intrinsic::aarch64_settag_zero; SDValue Val = TSI.EmitTargetCodeForSetTag( DAG, getCurSDLoc(), getRoot(), getValue(I.getArgOperand(0)), getValue(I.getArgOperand(1)), MachinePointerInfo(I.getArgOperand(0)), ZeroMemory); DAG.setRoot(Val); setValue(&I, Val); return; } case Intrinsic::ptrmask: { SDValue Ptr = getValue(I.getOperand(0)); SDValue Const = getValue(I.getOperand(1)); EVT PtrVT = Ptr.getValueType(); setValue(&I, DAG.getNode(ISD::AND, getCurSDLoc(), PtrVT, Ptr, DAG.getZExtOrTrunc(Const, getCurSDLoc(), PtrVT))); return; } case Intrinsic::get_active_lane_mask: { auto DL = getCurSDLoc(); SDValue Index = getValue(I.getOperand(0)); SDValue BTC = getValue(I.getOperand(1)); Type *ElementTy = I.getOperand(0)->getType(); EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType()); unsigned VecWidth = VT.getVectorNumElements(); SmallVector OpsBTC; SmallVector OpsIndex; SmallVector OpsStepConstants; for (unsigned i = 0; i < VecWidth; i++) { OpsBTC.push_back(BTC); OpsIndex.push_back(Index); OpsStepConstants.push_back(DAG.getConstant(i, DL, MVT::getVT(ElementTy))); } EVT CCVT = MVT::i1; CCVT = EVT::getVectorVT(I.getContext(), CCVT, VecWidth); auto VecTy = MVT::getVT(FixedVectorType::get(ElementTy, VecWidth)); SDValue VectorIndex = DAG.getBuildVector(VecTy, DL, OpsIndex); SDValue VectorStep = DAG.getBuildVector(VecTy, DL, OpsStepConstants); SDValue VectorInduction = DAG.getNode( ISD::UADDO, DL, DAG.getVTList(VecTy, CCVT), VectorIndex, VectorStep); SDValue VectorBTC = DAG.getBuildVector(VecTy, DL, OpsBTC); SDValue SetCC = DAG.getSetCC(DL, CCVT, VectorInduction.getValue(0), VectorBTC, ISD::CondCode::SETULE); setValue(&I, DAG.getNode(ISD::AND, DL, CCVT, DAG.getNOT(DL, VectorInduction.getValue(1), CCVT), SetCC)); return; } } } void SelectionDAGBuilder::visitConstrainedFPIntrinsic( const ConstrainedFPIntrinsic &FPI) { SDLoc sdl = getCurSDLoc(); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); SmallVector ValueVTs; ComputeValueVTs(TLI, DAG.getDataLayout(), FPI.getType(), ValueVTs); ValueVTs.push_back(MVT::Other); // Out chain // We do not need to serialize constrained FP intrinsics against // each other or against (nonvolatile) loads, so they can be // chained like loads. SDValue Chain = DAG.getRoot(); SmallVector Opers; Opers.push_back(Chain); if (FPI.isUnaryOp()) { Opers.push_back(getValue(FPI.getArgOperand(0))); } else if (FPI.isTernaryOp()) { Opers.push_back(getValue(FPI.getArgOperand(0))); Opers.push_back(getValue(FPI.getArgOperand(1))); Opers.push_back(getValue(FPI.getArgOperand(2))); } else { Opers.push_back(getValue(FPI.getArgOperand(0))); Opers.push_back(getValue(FPI.getArgOperand(1))); } auto pushOutChain = [this](SDValue Result, fp::ExceptionBehavior EB) { assert(Result.getNode()->getNumValues() == 2); // Push node to the appropriate list so that future instructions can be // chained up correctly. SDValue OutChain = Result.getValue(1); switch (EB) { case fp::ExceptionBehavior::ebIgnore: // The only reason why ebIgnore nodes still need to be chained is that // they might depend on the current rounding mode, and therefore must // not be moved across instruction that may change that mode. LLVM_FALLTHROUGH; case fp::ExceptionBehavior::ebMayTrap: // These must not be moved across calls or instructions that may change // floating-point exception masks. 
PendingConstrainedFP.push_back(OutChain); break; case fp::ExceptionBehavior::ebStrict: // These must not be moved across calls or instructions that may change // floating-point exception masks or read floating-point exception flags. // In addition, they cannot be optimized out even if unused. PendingConstrainedFPStrict.push_back(OutChain); break; } }; SDVTList VTs = DAG.getVTList(ValueVTs); fp::ExceptionBehavior EB = FPI.getExceptionBehavior().getValue(); SDNodeFlags Flags; if (EB == fp::ExceptionBehavior::ebIgnore) Flags.setNoFPExcept(true); if (auto *FPOp = dyn_cast(&FPI)) Flags.copyFMF(*FPOp); unsigned Opcode; switch (FPI.getIntrinsicID()) { default: llvm_unreachable("Impossible intrinsic"); // Can't reach here. #define DAG_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN) \ case Intrinsic::INTRINSIC: \ Opcode = ISD::STRICT_##DAGN; \ break; #include "llvm/IR/ConstrainedOps.def" case Intrinsic::experimental_constrained_fmuladd: { Opcode = ISD::STRICT_FMA; // Break fmuladd into fmul and fadd. if (TM.Options.AllowFPOpFusion == FPOpFusion::Strict || !TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(), ValueVTs[0])) { Opers.pop_back(); SDValue Mul = DAG.getNode(ISD::STRICT_FMUL, sdl, VTs, Opers, Flags); pushOutChain(Mul, EB); Opcode = ISD::STRICT_FADD; Opers.clear(); Opers.push_back(Mul.getValue(1)); Opers.push_back(Mul.getValue(0)); Opers.push_back(getValue(FPI.getArgOperand(2))); } break; } } // A few strict DAG nodes carry additional operands that are not // set up by the default code above. switch (Opcode) { default: break; case ISD::STRICT_FP_ROUND: Opers.push_back( DAG.getTargetConstant(0, sdl, TLI.getPointerTy(DAG.getDataLayout()))); break; case ISD::STRICT_FSETCC: case ISD::STRICT_FSETCCS: { auto *FPCmp = dyn_cast(&FPI); Opers.push_back(DAG.getCondCode(getFCmpCondCode(FPCmp->getPredicate()))); break; } } SDValue Result = DAG.getNode(Opcode, sdl, VTs, Opers, Flags); pushOutChain(Result, EB); SDValue FPResult = Result.getValue(0); setValue(&FPI, FPResult); } std::pair SelectionDAGBuilder::lowerInvokable(TargetLowering::CallLoweringInfo &CLI, const BasicBlock *EHPadBB) { MachineFunction &MF = DAG.getMachineFunction(); MachineModuleInfo &MMI = MF.getMMI(); MCSymbol *BeginLabel = nullptr; if (EHPadBB) { // Insert a label before the invoke call to mark the try range. This can be // used to detect deletion of the invoke via the MachineModuleInfo. BeginLabel = MMI.getContext().createTempSymbol(); // For SjLj, keep track of which landing pads go with which invokes // so as to maintain the ordering of pads in the LSDA. unsigned CallSiteIndex = MMI.getCurrentCallSite(); if (CallSiteIndex) { MF.setCallSiteBeginLabel(BeginLabel, CallSiteIndex); LPadToCallSiteMap[FuncInfo.MBBMap[EHPadBB]].push_back(CallSiteIndex); // Now that the call site is handled, stop tracking it. MMI.setCurrentCallSite(0); } // Both PendingLoads and PendingExports must be flushed here; // this call might not return. (void)getRoot(); DAG.setRoot(DAG.getEHLabel(getCurSDLoc(), getControlRoot(), BeginLabel)); CLI.setChain(getRoot()); } const TargetLowering &TLI = DAG.getTargetLoweringInfo(); std::pair Result = TLI.LowerCallTo(CLI); assert((CLI.IsTailCall || Result.second.getNode()) && "Non-null chain expected with non-tail call!"); assert((Result.second.getNode() || !Result.first.getNode()) && "Null value expected with tail call!"); if (!Result.second.getNode()) { // As a special case, a null chain means that a tail call has been emitted // and the DAG root is already updated. 
HasTailCall = true; // Since there's no actual continuation from this block, nothing can be // relying on us setting vregs for them. PendingExports.clear(); } else { DAG.setRoot(Result.second); } if (EHPadBB) { // Insert a label at the end of the invoke call to mark the try range. This // can be used to detect deletion of the invoke via the MachineModuleInfo. MCSymbol *EndLabel = MMI.getContext().createTempSymbol(); DAG.setRoot(DAG.getEHLabel(getCurSDLoc(), getRoot(), EndLabel)); // Inform MachineModuleInfo of range. auto Pers = classifyEHPersonality(FuncInfo.Fn->getPersonalityFn()); // There is a platform (e.g. wasm) that uses funclet style IR but does not // actually use outlined funclets and their LSDA info style. if (MF.hasEHFunclets() && isFuncletEHPersonality(Pers)) { assert(CLI.CB); WinEHFuncInfo *EHInfo = DAG.getMachineFunction().getWinEHFuncInfo(); EHInfo->addIPToStateRange(cast(CLI.CB), BeginLabel, EndLabel); } else if (!isScopedEHPersonality(Pers)) { MF.addInvoke(FuncInfo.MBBMap[EHPadBB], BeginLabel, EndLabel); } } return Result; } void SelectionDAGBuilder::LowerCallTo(const CallBase &CB, SDValue Callee, bool isTailCall, const BasicBlock *EHPadBB) { auto &DL = DAG.getDataLayout(); FunctionType *FTy = CB.getFunctionType(); Type *RetTy = CB.getType(); TargetLowering::ArgListTy Args; Args.reserve(CB.arg_size()); const Value *SwiftErrorVal = nullptr; const TargetLowering &TLI = DAG.getTargetLoweringInfo(); if (isTailCall) { // Avoid emitting tail calls in functions with the disable-tail-calls // attribute. auto *Caller = CB.getParent()->getParent(); if (Caller->getFnAttribute("disable-tail-calls").getValueAsString() == "true") isTailCall = false; // We can't tail call inside a function with a swifterror argument. Lowering // does not support this yet. It would have to move into the swifterror // register before the call. if (TLI.supportSwiftError() && Caller->getAttributes().hasAttrSomewhere(Attribute::SwiftError)) isTailCall = false; } for (auto I = CB.arg_begin(), E = CB.arg_end(); I != E; ++I) { TargetLowering::ArgListEntry Entry; const Value *V = *I; // Skip empty types if (V->getType()->isEmptyTy()) continue; SDValue ArgNode = getValue(V); Entry.Node = ArgNode; Entry.Ty = V->getType(); Entry.setAttributes(&CB, I - CB.arg_begin()); // Use swifterror virtual register as input to the call. if (Entry.IsSwiftError && TLI.supportSwiftError()) { SwiftErrorVal = V; // We find the virtual register for the actual swifterror argument. // Instead of using the Value, we use the virtual register instead. Entry.Node = DAG.getRegister(SwiftError.getOrCreateVRegUseAt(&CB, FuncInfo.MBB, V), EVT(TLI.getPointerTy(DL))); } Args.push_back(Entry); // If we have an explicit sret argument that is an Instruction, (i.e., it // might point to function-local memory), we can't meaningfully tail-call. if (Entry.IsSRet && isa(V)) isTailCall = false; } // If call site has a cfguardtarget operand bundle, create and add an // additional ArgListEntry. if (auto Bundle = CB.getOperandBundle(LLVMContext::OB_cfguardtarget)) { TargetLowering::ArgListEntry Entry; Value *V = Bundle->Inputs[0]; SDValue ArgNode = getValue(V); Entry.Node = ArgNode; Entry.Ty = V->getType(); Entry.IsCFGuardTarget = true; Args.push_back(Entry); } // Check if target-independent constraints permit a tail call here. // Target-dependent constraints are checked within TLI->LowerCallTo. if (isTailCall && !isInTailCallPosition(CB, DAG.getTarget())) isTailCall = false; // Disable tail calls if there is an swifterror argument. 
Targets have not // been updated to support tail calls. if (TLI.supportSwiftError() && SwiftErrorVal) isTailCall = false; TargetLowering::CallLoweringInfo CLI(DAG); CLI.setDebugLoc(getCurSDLoc()) .setChain(getRoot()) .setCallee(RetTy, FTy, Callee, std::move(Args), CB) .setTailCall(isTailCall) .setConvergent(CB.isConvergent()) .setIsPreallocated( CB.countOperandBundlesOfType(LLVMContext::OB_preallocated) != 0); std::pair Result = lowerInvokable(CLI, EHPadBB); if (Result.first.getNode()) { Result.first = lowerRangeToAssertZExt(DAG, CB, Result.first); setValue(&CB, Result.first); } // The last element of CLI.InVals has the SDValue for swifterror return. // Here we copy it to a virtual register and update SwiftErrorMap for // book-keeping. if (SwiftErrorVal && TLI.supportSwiftError()) { // Get the last element of InVals. SDValue Src = CLI.InVals.back(); Register VReg = SwiftError.getOrCreateVRegDefAt(&CB, FuncInfo.MBB, SwiftErrorVal); SDValue CopyNode = CLI.DAG.getCopyToReg(Result.second, CLI.DL, VReg, Src); DAG.setRoot(CopyNode); } } static SDValue getMemCmpLoad(const Value *PtrVal, MVT LoadVT, SelectionDAGBuilder &Builder) { // Check to see if this load can be trivially constant folded, e.g. if the // input is from a string literal. if (const Constant *LoadInput = dyn_cast(PtrVal)) { // Cast pointer to the type we really want to load. Type *LoadTy = Type::getIntNTy(PtrVal->getContext(), LoadVT.getScalarSizeInBits()); if (LoadVT.isVector()) LoadTy = FixedVectorType::get(LoadTy, LoadVT.getVectorNumElements()); LoadInput = ConstantExpr::getBitCast(const_cast(LoadInput), PointerType::getUnqual(LoadTy)); if (const Constant *LoadCst = ConstantFoldLoadFromConstPtr( const_cast(LoadInput), LoadTy, *Builder.DL)) return Builder.getValue(LoadCst); } // Otherwise, we have to emit the load. If the pointer is to unfoldable but // still constant memory, the input chain can be the entry node. SDValue Root; bool ConstantMemory = false; // Do not serialize (non-volatile) loads of constant memory with anything. if (Builder.AA && Builder.AA->pointsToConstantMemory(PtrVal)) { Root = Builder.DAG.getEntryNode(); ConstantMemory = true; } else { // Do not serialize non-volatile loads against each other. Root = Builder.DAG.getRoot(); } SDValue Ptr = Builder.getValue(PtrVal); SDValue LoadVal = Builder.DAG.getLoad(LoadVT, Builder.getCurSDLoc(), Root, Ptr, MachinePointerInfo(PtrVal), /* Alignment = */ 1); if (!ConstantMemory) Builder.PendingLoads.push_back(LoadVal.getValue(1)); return LoadVal; } /// Record the value for an instruction that produces an integer result, /// converting the type where necessary. void SelectionDAGBuilder::processIntegerCallValue(const Instruction &I, SDValue Value, bool IsSigned) { EVT VT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(), I.getType(), true); if (IsSigned) Value = DAG.getSExtOrTrunc(Value, getCurSDLoc(), VT); else Value = DAG.getZExtOrTrunc(Value, getCurSDLoc(), VT); setValue(&I, Value); } /// See if we can lower a memcmp call into an optimized form. If so, return /// true and lower it. Otherwise return false, and it will be lowered like a /// normal call. /// The caller already checked that \p I calls the appropriate LibFunc with a /// correct prototype. 
bool SelectionDAGBuilder::visitMemCmpCall(const CallInst &I) {
  const Value *LHS = I.getArgOperand(0), *RHS = I.getArgOperand(1);
  const Value *Size = I.getArgOperand(2);
  const ConstantInt *CSize = dyn_cast<ConstantInt>(Size);
  if (CSize && CSize->getZExtValue() == 0) {
    EVT CallVT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(),
                                                          I.getType(), true);
    setValue(&I, DAG.getConstant(0, getCurSDLoc(), CallVT));
    return true;
  }

  const SelectionDAGTargetInfo &TSI = DAG.getSelectionDAGInfo();
  std::pair<SDValue, SDValue> Res = TSI.EmitTargetCodeForMemcmp(
      DAG, getCurSDLoc(), DAG.getRoot(), getValue(LHS), getValue(RHS),
      getValue(Size), MachinePointerInfo(LHS), MachinePointerInfo(RHS));
  if (Res.first.getNode()) {
    processIntegerCallValue(I, Res.first, true);
    PendingLoads.push_back(Res.second);
    return true;
  }

  // memcmp(S1,S2,2) != 0 -> (*(short*)LHS != *(short*)RHS) != 0
  // memcmp(S1,S2,4) != 0 -> (*(int*)LHS != *(int*)RHS) != 0
  if (!CSize || !isOnlyUsedInZeroEqualityComparison(&I))
    return false;

  // If the target has a fast compare for the given size, it will return a
  // preferred load type for that size. Require that the load VT is legal and
  // that the target supports unaligned loads of that type. Otherwise, return
  // INVALID.
  auto hasFastLoadsAndCompare = [&](unsigned NumBits) {
    const TargetLowering &TLI = DAG.getTargetLoweringInfo();
    MVT LVT = TLI.hasFastEqualityCompare(NumBits);
    if (LVT != MVT::INVALID_SIMPLE_VALUE_TYPE) {
      // TODO: Handle 5 byte compare as 4-byte + 1 byte.
      // TODO: Handle 8 byte compare on x86-32 as two 32-bit loads.
      // TODO: Check alignment of src and dest ptrs.
      unsigned DstAS = LHS->getType()->getPointerAddressSpace();
      unsigned SrcAS = RHS->getType()->getPointerAddressSpace();
      if (!TLI.isTypeLegal(LVT) ||
          !TLI.allowsMisalignedMemoryAccesses(LVT, SrcAS) ||
          !TLI.allowsMisalignedMemoryAccesses(LVT, DstAS))
        LVT = MVT::INVALID_SIMPLE_VALUE_TYPE;
    }

    return LVT;
  };

  // This turns into unaligned loads. We only do this if the target natively
  // supports the MVT we'll be loading or if it is small enough (<= 4) that
  // we'll only produce a small number of byte loads.
  MVT LoadVT;
  unsigned NumBitsToCompare = CSize->getZExtValue() * 8;
  switch (NumBitsToCompare) {
  default:
    return false;
  case 16:
    LoadVT = MVT::i16;
    break;
  case 32:
    LoadVT = MVT::i32;
    break;
  case 64:
  case 128:
  case 256:
    LoadVT = hasFastLoadsAndCompare(NumBitsToCompare);
    break;
  }

  if (LoadVT == MVT::INVALID_SIMPLE_VALUE_TYPE)
    return false;

  SDValue LoadL = getMemCmpLoad(LHS, LoadVT, *this);
  SDValue LoadR = getMemCmpLoad(RHS, LoadVT, *this);

  // Bitcast to a wide integer type if the loads are vectors.
  if (LoadVT.isVector()) {
    EVT CmpVT = EVT::getIntegerVT(LHS->getContext(), LoadVT.getSizeInBits());
    LoadL = DAG.getBitcast(CmpVT, LoadL);
    LoadR = DAG.getBitcast(CmpVT, LoadR);
  }

  SDValue Cmp = DAG.getSetCC(getCurSDLoc(), MVT::i1, LoadL, LoadR, ISD::SETNE);
  processIntegerCallValue(I, Cmp, false);
  return true;
}

/// See if we can lower a memchr call into an optimized form. If so, return
/// true and lower it. Otherwise return false, and it will be lowered like a
/// normal call.
/// The caller already checked that \p I calls the appropriate LibFunc with a
/// correct prototype.
bool SelectionDAGBuilder::visitMemChrCall(const CallInst &I) {
  const Value *Src = I.getArgOperand(0);
  const Value *Char = I.getArgOperand(1);
  const Value *Length = I.getArgOperand(2);

  const SelectionDAGTargetInfo &TSI = DAG.getSelectionDAGInfo();
  std::pair<SDValue, SDValue> Res =
      TSI.EmitTargetCodeForMemchr(DAG, getCurSDLoc(), DAG.getRoot(),
                                  getValue(Src), getValue(Char),
                                  getValue(Length), MachinePointerInfo(Src));
  if (Res.first.getNode()) {
    setValue(&I, Res.first);
    PendingLoads.push_back(Res.second);
    return true;
  }

  return false;
}

/// See if we can lower a mempcpy call into an optimized form. If so, return
/// true and lower it. Otherwise return false, and it will be lowered like a
/// normal call.
/// The caller already checked that \p I calls the appropriate LibFunc with a
/// correct prototype.
bool SelectionDAGBuilder::visitMemPCpyCall(const CallInst &I) {
  SDValue Dst = getValue(I.getArgOperand(0));
  SDValue Src = getValue(I.getArgOperand(1));
  SDValue Size = getValue(I.getArgOperand(2));

  Align DstAlign = DAG.InferPtrAlign(Dst).valueOrOne();
  Align SrcAlign = DAG.InferPtrAlign(Src).valueOrOne();
  // DAG::getMemcpy needs Alignment to be defined.
  Align Alignment = std::min(DstAlign, SrcAlign);

  bool isVol = false;
  SDLoc sdl = getCurSDLoc();

  // In the mempcpy context we need to pass in a false value for isTailCall
  // because the return pointer needs to be adjusted by the size of
  // the copied memory.
  SDValue Root = isVol ? getRoot() : getMemoryRoot();
  SDValue MC = DAG.getMemcpy(Root, sdl, Dst, Src, Size, Alignment, isVol, false,
                             /*isTailCall=*/false,
                             MachinePointerInfo(I.getArgOperand(0)),
                             MachinePointerInfo(I.getArgOperand(1)));
  assert(MC.getNode() != nullptr &&
         "** memcpy should not be lowered as TailCall in mempcpy context **");
  DAG.setRoot(MC);

  // Check if Size needs to be truncated or extended.
  Size = DAG.getSExtOrTrunc(Size, sdl, Dst.getValueType());

  // Adjust return pointer to point just past the last dst byte.
  SDValue DstPlusSize = DAG.getNode(ISD::ADD, sdl, Dst.getValueType(),
                                    Dst, Size);
  setValue(&I, DstPlusSize);
  return true;
}

/// See if we can lower a strcpy call into an optimized form. If so, return
/// true and lower it, otherwise return false and it will be lowered like a
/// normal call.
/// The caller already checked that \p I calls the appropriate LibFunc with a
/// correct prototype.
bool SelectionDAGBuilder::visitStrCpyCall(const CallInst &I, bool isStpcpy) {
  const Value *Arg0 = I.getArgOperand(0), *Arg1 = I.getArgOperand(1);

  const SelectionDAGTargetInfo &TSI = DAG.getSelectionDAGInfo();
  std::pair<SDValue, SDValue> Res =
      TSI.EmitTargetCodeForStrcpy(DAG, getCurSDLoc(), getRoot(),
                                  getValue(Arg0), getValue(Arg1),
                                  MachinePointerInfo(Arg0),
                                  MachinePointerInfo(Arg1), isStpcpy);
  if (Res.first.getNode()) {
    setValue(&I, Res.first);
    DAG.setRoot(Res.second);
    return true;
  }

  return false;
}

/// See if we can lower a strcmp call into an optimized form. If so, return
/// true and lower it, otherwise return false and it will be lowered like a
/// normal call.
/// The caller already checked that \p I calls the appropriate LibFunc with a
/// correct prototype.
bool SelectionDAGBuilder::visitStrCmpCall(const CallInst &I) {
  const Value *Arg0 = I.getArgOperand(0), *Arg1 = I.getArgOperand(1);

  const SelectionDAGTargetInfo &TSI = DAG.getSelectionDAGInfo();
  std::pair<SDValue, SDValue> Res =
      TSI.EmitTargetCodeForStrcmp(DAG, getCurSDLoc(), DAG.getRoot(),
                                  getValue(Arg0), getValue(Arg1),
                                  MachinePointerInfo(Arg0),
                                  MachinePointerInfo(Arg1));
  if (Res.first.getNode()) {
    processIntegerCallValue(I, Res.first, true);
    PendingLoads.push_back(Res.second);
    return true;
  }

  return false;
}

/// See if we can lower a strlen call into an optimized form. If so, return
/// true and lower it, otherwise return false and it will be lowered like a
/// normal call.
/// The caller already checked that \p I calls the appropriate LibFunc with a
/// correct prototype.
bool SelectionDAGBuilder::visitStrLenCall(const CallInst &I) {
  const Value *Arg0 = I.getArgOperand(0);

  const SelectionDAGTargetInfo &TSI = DAG.getSelectionDAGInfo();
  std::pair<SDValue, SDValue> Res =
      TSI.EmitTargetCodeForStrlen(DAG, getCurSDLoc(), DAG.getRoot(),
                                  getValue(Arg0), MachinePointerInfo(Arg0));
  if (Res.first.getNode()) {
    processIntegerCallValue(I, Res.first, false);
    PendingLoads.push_back(Res.second);
    return true;
  }

  return false;
}

/// See if we can lower a strnlen call into an optimized form. If so, return
/// true and lower it, otherwise return false and it will be lowered like a
/// normal call.
/// The caller already checked that \p I calls the appropriate LibFunc with a
/// correct prototype.
bool SelectionDAGBuilder::visitStrNLenCall(const CallInst &I) {
  const Value *Arg0 = I.getArgOperand(0), *Arg1 = I.getArgOperand(1);

  const SelectionDAGTargetInfo &TSI = DAG.getSelectionDAGInfo();
  std::pair<SDValue, SDValue> Res =
      TSI.EmitTargetCodeForStrnlen(DAG, getCurSDLoc(), DAG.getRoot(),
                                   getValue(Arg0), getValue(Arg1),
                                   MachinePointerInfo(Arg0));
  if (Res.first.getNode()) {
    processIntegerCallValue(I, Res.first, false);
    PendingLoads.push_back(Res.second);
    return true;
  }

  return false;
}

/// See if we can lower a unary floating-point operation into an SDNode with
/// the specified Opcode. If so, return true and lower it, otherwise return
/// false and it will be lowered like a normal call.
/// The caller already checked that \p I calls the appropriate LibFunc with a
/// correct prototype.
bool SelectionDAGBuilder::visitUnaryFloatCall(const CallInst &I,
                                              unsigned Opcode) {
  // We already checked this call's prototype; verify it doesn't modify errno.
  if (!I.onlyReadsMemory())
    return false;

  SDValue Tmp = getValue(I.getArgOperand(0));
  setValue(&I, DAG.getNode(Opcode, getCurSDLoc(), Tmp.getValueType(), Tmp));
  return true;
}

/// See if we can lower a binary floating-point operation into an SDNode with
/// the specified Opcode. If so, return true and lower it. Otherwise return
/// false, and it will be lowered like a normal call.
/// The caller already checked that \p I calls the appropriate LibFunc with a
/// correct prototype.
bool SelectionDAGBuilder::visitBinaryFloatCall(const CallInst &I,
                                               unsigned Opcode) {
  // We already checked this call's prototype; verify it doesn't modify errno.
  if (!I.onlyReadsMemory())
    return false;

  SDValue Tmp0 = getValue(I.getArgOperand(0));
  SDValue Tmp1 = getValue(I.getArgOperand(1));
  EVT VT = Tmp0.getValueType();
  setValue(&I, DAG.getNode(Opcode, getCurSDLoc(), VT, Tmp0, Tmp1));
  return true;
}

void SelectionDAGBuilder::visitCall(const CallInst &I) {
  // Handle inline assembly differently.
if (I.isInlineAsm()) { visitInlineAsm(I); return; } if (Function *F = I.getCalledFunction()) { if (F->isDeclaration()) { // Is this an LLVM intrinsic or a target-specific intrinsic? unsigned IID = F->getIntrinsicID(); if (!IID) if (const TargetIntrinsicInfo *II = TM.getIntrinsicInfo()) IID = II->getIntrinsicID(F); if (IID) { visitIntrinsicCall(I, IID); return; } } // Check for well-known libc/libm calls. If the function is internal, it // can't be a library call. Don't do the check if marked as nobuiltin for // some reason or the call site requires strict floating point semantics. LibFunc Func; if (!I.isNoBuiltin() && !I.isStrictFP() && !F->hasLocalLinkage() && F->hasName() && LibInfo->getLibFunc(*F, Func) && LibInfo->hasOptimizedCodeGen(Func)) { switch (Func) { default: break; case LibFunc_copysign: case LibFunc_copysignf: case LibFunc_copysignl: // We already checked this call's prototype; verify it doesn't modify // errno. if (I.onlyReadsMemory()) { SDValue LHS = getValue(I.getArgOperand(0)); SDValue RHS = getValue(I.getArgOperand(1)); setValue(&I, DAG.getNode(ISD::FCOPYSIGN, getCurSDLoc(), LHS.getValueType(), LHS, RHS)); return; } break; case LibFunc_fabs: case LibFunc_fabsf: case LibFunc_fabsl: if (visitUnaryFloatCall(I, ISD::FABS)) return; break; case LibFunc_fmin: case LibFunc_fminf: case LibFunc_fminl: if (visitBinaryFloatCall(I, ISD::FMINNUM)) return; break; case LibFunc_fmax: case LibFunc_fmaxf: case LibFunc_fmaxl: if (visitBinaryFloatCall(I, ISD::FMAXNUM)) return; break; case LibFunc_sin: case LibFunc_sinf: case LibFunc_sinl: if (visitUnaryFloatCall(I, ISD::FSIN)) return; break; case LibFunc_cos: case LibFunc_cosf: case LibFunc_cosl: if (visitUnaryFloatCall(I, ISD::FCOS)) return; break; case LibFunc_sqrt: case LibFunc_sqrtf: case LibFunc_sqrtl: case LibFunc_sqrt_finite: case LibFunc_sqrtf_finite: case LibFunc_sqrtl_finite: if (visitUnaryFloatCall(I, ISD::FSQRT)) return; break; case LibFunc_floor: case LibFunc_floorf: case LibFunc_floorl: if (visitUnaryFloatCall(I, ISD::FFLOOR)) return; break; case LibFunc_nearbyint: case LibFunc_nearbyintf: case LibFunc_nearbyintl: if (visitUnaryFloatCall(I, ISD::FNEARBYINT)) return; break; case LibFunc_ceil: case LibFunc_ceilf: case LibFunc_ceill: if (visitUnaryFloatCall(I, ISD::FCEIL)) return; break; case LibFunc_rint: case LibFunc_rintf: case LibFunc_rintl: if (visitUnaryFloatCall(I, ISD::FRINT)) return; break; case LibFunc_round: case LibFunc_roundf: case LibFunc_roundl: if (visitUnaryFloatCall(I, ISD::FROUND)) return; break; case LibFunc_trunc: case LibFunc_truncf: case LibFunc_truncl: if (visitUnaryFloatCall(I, ISD::FTRUNC)) return; break; case LibFunc_log2: case LibFunc_log2f: case LibFunc_log2l: if (visitUnaryFloatCall(I, ISD::FLOG2)) return; break; case LibFunc_exp2: case LibFunc_exp2f: case LibFunc_exp2l: if (visitUnaryFloatCall(I, ISD::FEXP2)) return; break; case LibFunc_memcmp: if (visitMemCmpCall(I)) return; break; case LibFunc_mempcpy: if (visitMemPCpyCall(I)) return; break; case LibFunc_memchr: if (visitMemChrCall(I)) return; break; case LibFunc_strcpy: if (visitStrCpyCall(I, false)) return; break; case LibFunc_stpcpy: if (visitStrCpyCall(I, true)) return; break; case LibFunc_strcmp: if (visitStrCmpCall(I)) return; break; case LibFunc_strlen: if (visitStrLenCall(I)) return; break; case LibFunc_strnlen: if (visitStrNLenCall(I)) return; break; } } } // Deopt bundles are lowered in LowerCallSiteWithDeoptBundle, and we don't // have to do anything here to lower funclet bundles. // CFGuardTarget bundles are lowered in LowerCallTo. 
  assert(!I.hasOperandBundlesOtherThan(
             {LLVMContext::OB_deopt, LLVMContext::OB_funclet,
              LLVMContext::OB_cfguardtarget, LLVMContext::OB_preallocated}) &&
         "Cannot lower calls with arbitrary operand bundles!");

  SDValue Callee = getValue(I.getCalledOperand());

  if (I.countOperandBundlesOfType(LLVMContext::OB_deopt))
    LowerCallSiteWithDeoptBundle(&I, Callee, nullptr);
  else
    // Check if we can potentially perform a tail call. More detailed checking
    // is done within LowerCallTo, after more information about the call is
    // known.
    LowerCallTo(I, Callee, I.isTailCall());
}

namespace {

/// AsmOperandInfo - This contains information for each constraint that we are
/// lowering.
class SDISelAsmOperandInfo : public TargetLowering::AsmOperandInfo {
public:
  /// CallOperand - If this is the result output operand or a clobber
  /// this is null, otherwise it is the incoming operand to the CallInst.
  /// This gets modified as the asm is processed.
  SDValue CallOperand;

  /// AssignedRegs - If this is a register or register class operand, this
  /// contains the set of register corresponding to the operand.
  RegsForValue AssignedRegs;

  explicit SDISelAsmOperandInfo(const TargetLowering::AsmOperandInfo &info)
      : TargetLowering::AsmOperandInfo(info), CallOperand(nullptr, 0) {}

  /// Whether or not this operand accesses memory
  bool hasMemory(const TargetLowering &TLI) const {
    // Indirect operand accesses access memory.
    if (isIndirect)
      return true;

    for (const auto &Code : Codes)
      if (TLI.getConstraintType(Code) == TargetLowering::C_Memory)
        return true;

    return false;
  }

  /// getCallOperandValEVT - Return the EVT of the Value* that this operand
  /// corresponds to.  If there is no Value* for this operand, it returns
  /// MVT::Other.
  EVT getCallOperandValEVT(LLVMContext &Context, const TargetLowering &TLI,
                           const DataLayout &DL) const {
    if (!CallOperandVal) return MVT::Other;

    if (isa<BasicBlock>(CallOperandVal))
      return TLI.getProgramPointerTy(DL);

    llvm::Type *OpTy = CallOperandVal->getType();

    // FIXME: code duplicated from TargetLowering::ParseConstraints().
    // If this is an indirect operand, the operand is a pointer to the
    // accessed type.
    if (isIndirect) {
      PointerType *PtrTy = dyn_cast<PointerType>(OpTy);
      if (!PtrTy)
        report_fatal_error("Indirect operand for inline asm not a pointer!");
      OpTy = PtrTy->getElementType();
    }

    // Look for vector wrapped in a struct. e.g. { <16 x i8> }.
    if (StructType *STy = dyn_cast<StructType>(OpTy))
      if (STy->getNumElements() == 1)
        OpTy = STy->getElementType(0);

    // If OpTy is not a single value, it may be a struct/union that we
    // can tile with integers.
    if (!OpTy->isSingleValueType() && OpTy->isSized()) {
      unsigned BitSize = DL.getTypeSizeInBits(OpTy);
      switch (BitSize) {
      default: break;
      case 1:
      case 8:
      case 16:
      case 32:
      case 64:
      case 128:
        OpTy = IntegerType::get(Context, BitSize);
        break;
      }
    }

    return TLI.getValueType(DL, OpTy, true);
  }
};

} // end anonymous namespace

/// Make sure that the output operand \p OpInfo and its corresponding input
/// operand \p MatchingOpInfo have compatible constraint types (otherwise error
/// out).
static void patchMatchingInput(const SDISelAsmOperandInfo &OpInfo,
                               SDISelAsmOperandInfo &MatchingOpInfo,
                               SelectionDAG &DAG) {
  if (OpInfo.ConstraintVT == MatchingOpInfo.ConstraintVT)
    return;

  const TargetRegisterInfo *TRI = DAG.getSubtarget().getRegisterInfo();
  const auto &TLI = DAG.getTargetLoweringInfo();

  std::pair<unsigned, const TargetRegisterClass *> MatchRC =
      TLI.getRegForInlineAsmConstraint(TRI, OpInfo.ConstraintCode,
                                       OpInfo.ConstraintVT);
  std::pair<unsigned, const TargetRegisterClass *> InputRC =
      TLI.getRegForInlineAsmConstraint(TRI, MatchingOpInfo.ConstraintCode,
                                       MatchingOpInfo.ConstraintVT);

  if ((OpInfo.ConstraintVT.isInteger() !=
       MatchingOpInfo.ConstraintVT.isInteger()) ||
      (MatchRC.second != InputRC.second)) {
    // FIXME: error out in a more elegant fashion
    report_fatal_error("Unsupported asm: input constraint"
                       " with a matching output constraint of"
                       " incompatible type!");
  }

  MatchingOpInfo.ConstraintVT = OpInfo.ConstraintVT;
}

/// Get a direct memory input to behave well as an indirect operand.
/// This may introduce stores, hence the need for a \p Chain.
/// \return The (possibly updated) chain.
static SDValue getAddressForMemoryInput(SDValue Chain, const SDLoc &Location,
                                        SDISelAsmOperandInfo &OpInfo,
                                        SelectionDAG &DAG) {
  const TargetLowering &TLI = DAG.getTargetLoweringInfo();

  // If we don't have an indirect input, put it in the constpool if we can,
  // otherwise spill it to a stack slot.
  // TODO: This isn't quite right. We need to handle these according to
  // the addressing mode that the constraint wants. Also, this may take
  // an additional register for the computation and we don't want that
  // either.

  // If the operand is a float, integer, or vector constant, spill to a
  // constant pool entry to get its address.
  const Value *OpVal = OpInfo.CallOperandVal;
  if (isa<ConstantFP>(OpVal) || isa<ConstantInt>(OpVal) ||
      isa<ConstantVector>(OpVal) || isa<ConstantDataVector>(OpVal)) {
    OpInfo.CallOperand = DAG.getConstantPool(
        cast<Constant>(OpVal), TLI.getPointerTy(DAG.getDataLayout()));
    return Chain;
  }

  // Otherwise, create a stack slot and emit a store to it before the asm.
  Type *Ty = OpVal->getType();
  auto &DL = DAG.getDataLayout();
  uint64_t TySize = DL.getTypeAllocSize(Ty);
  MachineFunction &MF = DAG.getMachineFunction();
  int SSFI = MF.getFrameInfo().CreateStackObject(
      TySize, DL.getPrefTypeAlign(Ty), false);
  SDValue StackSlot = DAG.getFrameIndex(SSFI, TLI.getFrameIndexTy(DL));
  Chain = DAG.getTruncStore(Chain, Location, OpInfo.CallOperand, StackSlot,
                            MachinePointerInfo::getFixedStack(MF, SSFI),
                            TLI.getMemValueType(DL, Ty));
  OpInfo.CallOperand = StackSlot;

  return Chain;
}

/// GetRegistersForValue - Assign registers (virtual or physical) for the
/// specified operand.  We prefer to assign virtual registers, to allow the
/// register allocator to handle the assignment process.  However, if the asm
/// uses features that we can't model on machineinstrs, we have SDISel do the
/// allocation.  This produces generally horrible, but correct, code.
///
///   OpInfo describes the operand
///   RefOpInfo describes the matching operand if any, the operand otherwise
static void GetRegistersForValue(SelectionDAG &DAG, const SDLoc &DL,
                                 SDISelAsmOperandInfo &OpInfo,
                                 SDISelAsmOperandInfo &RefOpInfo) {
  LLVMContext &Context = *DAG.getContext();
  const TargetLowering &TLI = DAG.getTargetLoweringInfo();

  MachineFunction &MF = DAG.getMachineFunction();
  SmallVector<unsigned, 4> Regs;
  const TargetRegisterInfo &TRI = *MF.getSubtarget().getRegisterInfo();

  // No work to do for memory operations.
  if (OpInfo.ConstraintType == TargetLowering::C_Memory)
    return;

  // If this is a constraint for a single physreg, or a constraint for a
  // register class, find it.
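  // For example, a constraint of the form "{ax}" names one specific physical
  // register, while "r" only names a register class; in both cases
  // getRegForInlineAsmConstraint() below returns the (possibly zero) register
  // together with the register class to draw further registers from.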
unsigned AssignedReg; const TargetRegisterClass *RC; std::tie(AssignedReg, RC) = TLI.getRegForInlineAsmConstraint( &TRI, RefOpInfo.ConstraintCode, RefOpInfo.ConstraintVT); // RC is unset only on failure. Return immediately. if (!RC) return; // Get the actual register value type. This is important, because the user // may have asked for (e.g.) the AX register in i32 type. We need to // remember that AX is actually i16 to get the right extension. const MVT RegVT = *TRI.legalclasstypes_begin(*RC); if (OpInfo.ConstraintVT != MVT::Other) { // If this is an FP operand in an integer register (or visa versa), or more // generally if the operand value disagrees with the register class we plan // to stick it in, fix the operand type. // // If this is an input value, the bitcast to the new type is done now. // Bitcast for output value is done at the end of visitInlineAsm(). if ((OpInfo.Type == InlineAsm::isOutput || OpInfo.Type == InlineAsm::isInput) && !TRI.isTypeLegalForClass(*RC, OpInfo.ConstraintVT)) { // Try to convert to the first EVT that the reg class contains. If the // types are identical size, use a bitcast to convert (e.g. two differing // vector types). Note: output bitcast is done at the end of // visitInlineAsm(). if (RegVT.getSizeInBits() == OpInfo.ConstraintVT.getSizeInBits()) { // Exclude indirect inputs while they are unsupported because the code // to perform the load is missing and thus OpInfo.CallOperand still // refers to the input address rather than the pointed-to value. if (OpInfo.Type == InlineAsm::isInput && !OpInfo.isIndirect) OpInfo.CallOperand = DAG.getNode(ISD::BITCAST, DL, RegVT, OpInfo.CallOperand); OpInfo.ConstraintVT = RegVT; // If the operand is an FP value and we want it in integer registers, // use the corresponding integer type. This turns an f64 value into // i64, which can be passed with two i32 values on a 32-bit machine. } else if (RegVT.isInteger() && OpInfo.ConstraintVT.isFloatingPoint()) { MVT VT = MVT::getIntegerVT(OpInfo.ConstraintVT.getSizeInBits()); if (OpInfo.Type == InlineAsm::isInput) OpInfo.CallOperand = DAG.getNode(ISD::BITCAST, DL, VT, OpInfo.CallOperand); OpInfo.ConstraintVT = VT; } } } // No need to allocate a matching input constraint since the constraint it's // matching to has already been allocated. if (OpInfo.isMatchingInputConstraint()) return; EVT ValueVT = OpInfo.ConstraintVT; if (OpInfo.ConstraintVT == MVT::Other) ValueVT = RegVT; // Initialize NumRegs. unsigned NumRegs = 1; if (OpInfo.ConstraintVT != MVT::Other) NumRegs = TLI.getNumRegisters(Context, OpInfo.ConstraintVT); // If this is a constraint for a specific physical register, like {r17}, // assign it now. // If this associated to a specific register, initialize iterator to correct // place. If virtual, make sure we have enough registers // Initialize iterator if necessary TargetRegisterClass::iterator I = RC->begin(); MachineRegisterInfo &RegInfo = MF.getRegInfo(); // Do not check for single registers. if (AssignedReg) { for (; *I != AssignedReg; ++I) assert(I != RC->end() && "AssignedReg should be member of RC"); } for (; NumRegs; --NumRegs, ++I) { assert(I != RC->end() && "Ran out of registers to allocate!"); Register R = AssignedReg ? Register(*I) : RegInfo.createVirtualRegister(RC); Regs.push_back(R); } OpInfo.AssignedRegs = RegsForValue(Regs, RegVT, ValueVT); } static unsigned findMatchingInlineAsmOperand(unsigned OperandNo, const std::vector &AsmNodeOperands) { // Scan until we find the definition we already emitted of this operand. 
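  // For reference, the INLINEASM node operand list that visitInlineAsm builds
  // is laid out as:
  //   chain, asm-string, !srcloc metadata, extra-flags constant,
  //   flag-word, operand(s), flag-word, operand(s), ...
  // Each flag word encodes the operand kind (regdef, regdef-earlyclobber,
  // reguse, imm, mem or clobber) plus how many operands follow it, which is
  // what this scan uses to hop from group to group.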
unsigned CurOp = InlineAsm::Op_FirstOperand; for (; OperandNo; --OperandNo) { // Advance to the next operand. unsigned OpFlag = cast(AsmNodeOperands[CurOp])->getZExtValue(); assert((InlineAsm::isRegDefKind(OpFlag) || InlineAsm::isRegDefEarlyClobberKind(OpFlag) || InlineAsm::isMemKind(OpFlag)) && "Skipped past definitions?"); CurOp += InlineAsm::getNumOperandRegisters(OpFlag) + 1; } return CurOp; } namespace { class ExtraFlags { unsigned Flags = 0; public: explicit ExtraFlags(const CallBase &Call) { const InlineAsm *IA = cast(Call.getCalledOperand()); if (IA->hasSideEffects()) Flags |= InlineAsm::Extra_HasSideEffects; if (IA->isAlignStack()) Flags |= InlineAsm::Extra_IsAlignStack; if (Call.isConvergent()) Flags |= InlineAsm::Extra_IsConvergent; Flags |= IA->getDialect() * InlineAsm::Extra_AsmDialect; } void update(const TargetLowering::AsmOperandInfo &OpInfo) { // Ideally, we would only check against memory constraints. However, the // meaning of an Other constraint can be target-specific and we can't easily // reason about it. Therefore, be conservative and set MayLoad/MayStore // for Other constraints as well. if (OpInfo.ConstraintType == TargetLowering::C_Memory || OpInfo.ConstraintType == TargetLowering::C_Other) { if (OpInfo.Type == InlineAsm::isInput) Flags |= InlineAsm::Extra_MayLoad; else if (OpInfo.Type == InlineAsm::isOutput) Flags |= InlineAsm::Extra_MayStore; else if (OpInfo.Type == InlineAsm::isClobber) Flags |= (InlineAsm::Extra_MayLoad | InlineAsm::Extra_MayStore); } } unsigned get() const { return Flags; } }; } // end anonymous namespace /// visitInlineAsm - Handle a call to an InlineAsm object. void SelectionDAGBuilder::visitInlineAsm(const CallBase &Call) { const InlineAsm *IA = cast(Call.getCalledOperand()); /// ConstraintOperands - Information about all of the constraints. SmallVector ConstraintOperands; const TargetLowering &TLI = DAG.getTargetLoweringInfo(); TargetLowering::AsmOperandInfoVector TargetConstraints = TLI.ParseConstraints( DAG.getDataLayout(), DAG.getSubtarget().getRegisterInfo(), Call); // First Pass: Calculate HasSideEffects and ExtraFlags (AlignStack, // AsmDialect, MayLoad, MayStore). bool HasSideEffect = IA->hasSideEffects(); ExtraFlags ExtraInfo(Call); unsigned ArgNo = 0; // ArgNo - The argument of the CallInst. unsigned ResNo = 0; // ResNo - The result number of the next output. unsigned NumMatchingOps = 0; for (auto &T : TargetConstraints) { ConstraintOperands.push_back(SDISelAsmOperandInfo(T)); SDISelAsmOperandInfo &OpInfo = ConstraintOperands.back(); // Compute the value type for each operand. if (OpInfo.Type == InlineAsm::isInput || (OpInfo.Type == InlineAsm::isOutput && OpInfo.isIndirect)) { OpInfo.CallOperandVal = Call.getArgOperand(ArgNo++); // Process the call argument. BasicBlocks are labels, currently appearing // only in asm's. 
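  // A hypothetical example of the callbr case recognized below:
  //   callbr void asm "jmp ${1:l}", "r,X"(i32 %x, i8* blockaddress(@f, %err))
  //       to label %cont [label %err]
  // In this IR form the indirect destinations also appear as trailing
  // blockaddress arguments, and those are lowered to TargetBlockAddress
  // operands here.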
if (isa(Call) && ArgNo - 1 >= (cast(&Call)->getNumArgOperands() - cast(&Call)->getNumIndirectDests() - NumMatchingOps) && (NumMatchingOps == 0 || ArgNo - 1 < (cast(&Call)->getNumArgOperands() - NumMatchingOps))) { const auto *BA = cast(OpInfo.CallOperandVal); EVT VT = TLI.getValueType(DAG.getDataLayout(), BA->getType(), true); OpInfo.CallOperand = DAG.getTargetBlockAddress(BA, VT); } else if (const auto *BB = dyn_cast(OpInfo.CallOperandVal)) { OpInfo.CallOperand = DAG.getBasicBlock(FuncInfo.MBBMap[BB]); } else { OpInfo.CallOperand = getValue(OpInfo.CallOperandVal); } OpInfo.ConstraintVT = OpInfo .getCallOperandValEVT(*DAG.getContext(), TLI, DAG.getDataLayout()) .getSimpleVT(); } else if (OpInfo.Type == InlineAsm::isOutput && !OpInfo.isIndirect) { // The return value of the call is this value. As such, there is no // corresponding argument. assert(!Call.getType()->isVoidTy() && "Bad inline asm!"); if (StructType *STy = dyn_cast(Call.getType())) { OpInfo.ConstraintVT = TLI.getSimpleValueType( DAG.getDataLayout(), STy->getElementType(ResNo)); } else { assert(ResNo == 0 && "Asm only has one result!"); OpInfo.ConstraintVT = TLI.getSimpleValueType(DAG.getDataLayout(), Call.getType()); } ++ResNo; } else { OpInfo.ConstraintVT = MVT::Other; } if (OpInfo.hasMatchingInput()) ++NumMatchingOps; if (!HasSideEffect) HasSideEffect = OpInfo.hasMemory(TLI); // Determine if this InlineAsm MayLoad or MayStore based on the constraints. // FIXME: Could we compute this on OpInfo rather than T? // Compute the constraint code and ConstraintType to use. TLI.ComputeConstraintToUse(T, SDValue()); if (T.ConstraintType == TargetLowering::C_Immediate && OpInfo.CallOperand && !isa(OpInfo.CallOperand)) // We've delayed emitting a diagnostic like the "n" constraint because // inlining could cause an integer showing up. return emitInlineAsmError(Call, "constraint '" + Twine(T.ConstraintCode) + "' expects an integer constant " "expression"); ExtraInfo.update(T); } // We won't need to flush pending loads if this asm doesn't touch // memory and is nonvolatile. SDValue Flag, Chain = (HasSideEffect) ? getRoot() : DAG.getRoot(); bool IsCallBr = isa(Call); if (IsCallBr) { // If this is a callbr we need to flush pending exports since inlineasm_br // is a terminator. We need to do this before nodes are glued to // the inlineasm_br node. Chain = getControlRoot(); } // Second pass over the constraints: compute which constraint option to use. for (SDISelAsmOperandInfo &OpInfo : ConstraintOperands) { // If this is an output operand with a matching input operand, look up the // matching input. If their types mismatch, e.g. one is an integer, the // other is floating point, or their sizes are different, flag it as an // error. if (OpInfo.hasMatchingInput()) { SDISelAsmOperandInfo &Input = ConstraintOperands[OpInfo.MatchingInput]; patchMatchingInput(OpInfo, Input, DAG); } // Compute the constraint code and ConstraintType to use. TLI.ComputeConstraintToUse(OpInfo, OpInfo.CallOperand, &DAG); if (OpInfo.ConstraintType == TargetLowering::C_Memory && OpInfo.Type == InlineAsm::isClobber) continue; // If this is a memory input, and if the operand is not indirect, do what we // need to provide an address for the memory input. if (OpInfo.ConstraintType == TargetLowering::C_Memory && !OpInfo.isIndirect) { assert((OpInfo.isMultipleAlternative || (OpInfo.Type == InlineAsm::isInput)) && "Can only indirectify direct input operands!"); // Memory operands really want the address of the value. 
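  // If a memory-constrained input is still a plain value rather than an
  // address (isIndirect is false), getAddressForMemoryInput() gives it one:
  // constants are placed in the constant pool, anything else is spilled to a
  // fresh stack slot whose address becomes the operand.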
Chain = getAddressForMemoryInput(Chain, getCurSDLoc(), OpInfo, DAG); // There is no longer a Value* corresponding to this operand. OpInfo.CallOperandVal = nullptr; // It is now an indirect operand. OpInfo.isIndirect = true; } } // AsmNodeOperands - The operands for the ISD::INLINEASM node. std::vector AsmNodeOperands; AsmNodeOperands.push_back(SDValue()); // reserve space for input chain AsmNodeOperands.push_back(DAG.getTargetExternalSymbol( IA->getAsmString().c_str(), TLI.getProgramPointerTy(DAG.getDataLayout()))); // If we have a !srcloc metadata node associated with it, we want to attach // this to the ultimately generated inline asm machineinstr. To do this, we // pass in the third operand as this (potentially null) inline asm MDNode. const MDNode *SrcLoc = Call.getMetadata("srcloc"); AsmNodeOperands.push_back(DAG.getMDNode(SrcLoc)); // Remember the HasSideEffect, AlignStack, AsmDialect, MayLoad and MayStore // bits as operand 3. AsmNodeOperands.push_back(DAG.getTargetConstant( ExtraInfo.get(), getCurSDLoc(), TLI.getPointerTy(DAG.getDataLayout()))); // Third pass: Loop over operands to prepare DAG-level operands.. As part of // this, assign virtual and physical registers for inputs and otput. for (SDISelAsmOperandInfo &OpInfo : ConstraintOperands) { // Assign Registers. SDISelAsmOperandInfo &RefOpInfo = OpInfo.isMatchingInputConstraint() ? ConstraintOperands[OpInfo.getMatchedOperand()] : OpInfo; GetRegistersForValue(DAG, getCurSDLoc(), OpInfo, RefOpInfo); auto DetectWriteToReservedRegister = [&]() { const MachineFunction &MF = DAG.getMachineFunction(); const TargetRegisterInfo &TRI = *MF.getSubtarget().getRegisterInfo(); for (unsigned Reg : OpInfo.AssignedRegs.Regs) { if (Register::isPhysicalRegister(Reg) && TRI.isInlineAsmReadOnlyReg(MF, Reg)) { const char *RegName = TRI.getName(Reg); emitInlineAsmError(Call, "write to reserved register '" + Twine(RegName) + "'"); return true; } } return false; }; switch (OpInfo.Type) { case InlineAsm::isOutput: if (OpInfo.ConstraintType == TargetLowering::C_Memory) { unsigned ConstraintID = TLI.getInlineAsmMemConstraint(OpInfo.ConstraintCode); assert(ConstraintID != InlineAsm::Constraint_Unknown && "Failed to convert memory constraint code to constraint id."); // Add information to the INLINEASM node to know about this output. unsigned OpFlags = InlineAsm::getFlagWord(InlineAsm::Kind_Mem, 1); OpFlags = InlineAsm::getFlagWordForMem(OpFlags, ConstraintID); AsmNodeOperands.push_back(DAG.getTargetConstant(OpFlags, getCurSDLoc(), MVT::i32)); AsmNodeOperands.push_back(OpInfo.CallOperand); } else { // Otherwise, this outputs to a register (directly for C_Register / // C_RegisterClass, and a target-defined fashion for // C_Immediate/C_Other). Find a register that we can use. if (OpInfo.AssignedRegs.Regs.empty()) { emitInlineAsmError( Call, "couldn't allocate output register for constraint '" + Twine(OpInfo.ConstraintCode) + "'"); return; } if (DetectWriteToReservedRegister()) return; // Add information to the INLINEASM node to know that this register is // set. OpInfo.AssignedRegs.AddInlineAsmOperands( OpInfo.isEarlyClobber ? InlineAsm::Kind_RegDefEarlyClobber : InlineAsm::Kind_RegDef, false, 0, getCurSDLoc(), DAG, AsmNodeOperands); } break; case InlineAsm::isInput: { SDValue InOperandVal = OpInfo.CallOperand; if (OpInfo.isMatchingInputConstraint()) { // If this is required to match an output register we have already set, // just use its register. 
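  // For example, a constraint string such as "=r,0" ties input operand 1 to
  // output operand 0: the input must live in exactly the registers already
  // chosen for that output, which is what the code below arranges.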
auto CurOp = findMatchingInlineAsmOperand(OpInfo.getMatchedOperand(), AsmNodeOperands); unsigned OpFlag = cast(AsmNodeOperands[CurOp])->getZExtValue(); if (InlineAsm::isRegDefKind(OpFlag) || InlineAsm::isRegDefEarlyClobberKind(OpFlag)) { // Add (OpFlag&0xffff)>>3 registers to MatchedRegs. if (OpInfo.isIndirect) { // This happens on gcc/testsuite/gcc.dg/pr8788-1.c emitInlineAsmError(Call, "inline asm not supported yet: " "don't know how to handle tied " "indirect register inputs"); return; } MVT RegVT = AsmNodeOperands[CurOp+1].getSimpleValueType(); SmallVector Regs; if (const TargetRegisterClass *RC = TLI.getRegClassFor(RegVT)) { unsigned NumRegs = InlineAsm::getNumOperandRegisters(OpFlag); MachineRegisterInfo &RegInfo = DAG.getMachineFunction().getRegInfo(); for (unsigned i = 0; i != NumRegs; ++i) Regs.push_back(RegInfo.createVirtualRegister(RC)); } else { emitInlineAsmError(Call, "inline asm error: This value type register " "class is not natively supported!"); return; } RegsForValue MatchedRegs(Regs, RegVT, InOperandVal.getValueType()); SDLoc dl = getCurSDLoc(); // Use the produced MatchedRegs object to MatchedRegs.getCopyToRegs(InOperandVal, DAG, dl, Chain, &Flag, &Call); MatchedRegs.AddInlineAsmOperands(InlineAsm::Kind_RegUse, true, OpInfo.getMatchedOperand(), dl, DAG, AsmNodeOperands); break; } assert(InlineAsm::isMemKind(OpFlag) && "Unknown matching constraint!"); assert(InlineAsm::getNumOperandRegisters(OpFlag) == 1 && "Unexpected number of operands"); // Add information to the INLINEASM node to know about this input. // See InlineAsm.h isUseOperandTiedToDef. OpFlag = InlineAsm::convertMemFlagWordToMatchingFlagWord(OpFlag); OpFlag = InlineAsm::getFlagWordForMatchingOp(OpFlag, OpInfo.getMatchedOperand()); AsmNodeOperands.push_back(DAG.getTargetConstant( OpFlag, getCurSDLoc(), TLI.getPointerTy(DAG.getDataLayout()))); AsmNodeOperands.push_back(AsmNodeOperands[CurOp+1]); break; } // Treat indirect 'X' constraint as memory. if (OpInfo.ConstraintType == TargetLowering::C_Other && OpInfo.isIndirect) OpInfo.ConstraintType = TargetLowering::C_Memory; if (OpInfo.ConstraintType == TargetLowering::C_Immediate || OpInfo.ConstraintType == TargetLowering::C_Other) { std::vector Ops; TLI.LowerAsmOperandForConstraint(InOperandVal, OpInfo.ConstraintCode, Ops, DAG); if (Ops.empty()) { if (OpInfo.ConstraintType == TargetLowering::C_Immediate) if (isa(InOperandVal)) { emitInlineAsmError(Call, "value out of range for constraint '" + Twine(OpInfo.ConstraintCode) + "'"); return; } emitInlineAsmError(Call, "invalid operand for inline asm constraint '" + Twine(OpInfo.ConstraintCode) + "'"); return; } // Add information to the INLINEASM node to know about this input. unsigned ResOpType = InlineAsm::getFlagWord(InlineAsm::Kind_Imm, Ops.size()); AsmNodeOperands.push_back(DAG.getTargetConstant( ResOpType, getCurSDLoc(), TLI.getPointerTy(DAG.getDataLayout()))); AsmNodeOperands.insert(AsmNodeOperands.end(), Ops.begin(), Ops.end()); break; } if (OpInfo.ConstraintType == TargetLowering::C_Memory) { assert(OpInfo.isIndirect && "Operand must be indirect to be a mem!"); assert(InOperandVal.getValueType() == TLI.getPointerTy(DAG.getDataLayout()) && "Memory operands expect pointer values"); unsigned ConstraintID = TLI.getInlineAsmMemConstraint(OpInfo.ConstraintCode); assert(ConstraintID != InlineAsm::Constraint_Unknown && "Failed to convert memory constraint code to constraint id."); // Add information to the INLINEASM node to know about this input. 
unsigned ResOpType = InlineAsm::getFlagWord(InlineAsm::Kind_Mem, 1); ResOpType = InlineAsm::getFlagWordForMem(ResOpType, ConstraintID); AsmNodeOperands.push_back(DAG.getTargetConstant(ResOpType, getCurSDLoc(), MVT::i32)); AsmNodeOperands.push_back(InOperandVal); break; } assert((OpInfo.ConstraintType == TargetLowering::C_RegisterClass || OpInfo.ConstraintType == TargetLowering::C_Register) && "Unknown constraint type!"); // TODO: Support this. if (OpInfo.isIndirect) { emitInlineAsmError( Call, "Don't know how to handle indirect register inputs yet " "for constraint '" + Twine(OpInfo.ConstraintCode) + "'"); return; } // Copy the input into the appropriate registers. if (OpInfo.AssignedRegs.Regs.empty()) { emitInlineAsmError(Call, "couldn't allocate input reg for constraint '" + Twine(OpInfo.ConstraintCode) + "'"); return; } if (DetectWriteToReservedRegister()) return; SDLoc dl = getCurSDLoc(); OpInfo.AssignedRegs.getCopyToRegs(InOperandVal, DAG, dl, Chain, &Flag, &Call); OpInfo.AssignedRegs.AddInlineAsmOperands(InlineAsm::Kind_RegUse, false, 0, dl, DAG, AsmNodeOperands); break; } case InlineAsm::isClobber: // Add the clobbered value to the operand list, so that the register // allocator is aware that the physreg got clobbered. if (!OpInfo.AssignedRegs.Regs.empty()) OpInfo.AssignedRegs.AddInlineAsmOperands(InlineAsm::Kind_Clobber, false, 0, getCurSDLoc(), DAG, AsmNodeOperands); break; } } // Finish up input operands. Set the input chain and add the flag last. AsmNodeOperands[InlineAsm::Op_InputChain] = Chain; if (Flag.getNode()) AsmNodeOperands.push_back(Flag); unsigned ISDOpc = IsCallBr ? ISD::INLINEASM_BR : ISD::INLINEASM; Chain = DAG.getNode(ISDOpc, getCurSDLoc(), DAG.getVTList(MVT::Other, MVT::Glue), AsmNodeOperands); Flag = Chain.getValue(1); // Do additional work to generate outputs. SmallVector ResultVTs; SmallVector ResultValues; SmallVector OutChains; llvm::Type *CallResultType = Call.getType(); ArrayRef ResultTypes; if (StructType *StructResult = dyn_cast(CallResultType)) ResultTypes = StructResult->elements(); else if (!CallResultType->isVoidTy()) ResultTypes = makeArrayRef(CallResultType); auto CurResultType = ResultTypes.begin(); auto handleRegAssign = [&](SDValue V) { assert(CurResultType != ResultTypes.end() && "Unexpected value"); assert((*CurResultType)->isSized() && "Unexpected unsized type"); EVT ResultVT = TLI.getValueType(DAG.getDataLayout(), *CurResultType); ++CurResultType; // If the type of the inline asm call site return value is different but has // same size as the type of the asm output bitcast it. One example of this // is for vectors with different width / number of elements. This can // happen for register classes that can contain multiple different value // types. The preg or vreg allocated may not have the same VT as was // expected. // // This can also happen for a return value that disagrees with the register // class it is put in, eg. a double in a general-purpose register on a // 32-bit machine. if (ResultVT != V.getValueType() && ResultVT.getSizeInBits() == V.getValueSizeInBits()) V = DAG.getNode(ISD::BITCAST, getCurSDLoc(), ResultVT, V); else if (ResultVT != V.getValueType() && ResultVT.isInteger() && V.getValueType().isInteger()) { // If a result value was tied to an input value, the computed result // may have a wider width than the expected result. Extract the // relevant portion. 
V = DAG.getNode(ISD::TRUNCATE, getCurSDLoc(), ResultVT, V); } assert(ResultVT == V.getValueType() && "Asm result value mismatch!"); ResultVTs.push_back(ResultVT); ResultValues.push_back(V); }; // Deal with output operands. for (SDISelAsmOperandInfo &OpInfo : ConstraintOperands) { if (OpInfo.Type == InlineAsm::isOutput) { SDValue Val; // Skip trivial output operands. if (OpInfo.AssignedRegs.Regs.empty()) continue; switch (OpInfo.ConstraintType) { case TargetLowering::C_Register: case TargetLowering::C_RegisterClass: Val = OpInfo.AssignedRegs.getCopyFromRegs(DAG, FuncInfo, getCurSDLoc(), Chain, &Flag, &Call); break; case TargetLowering::C_Immediate: case TargetLowering::C_Other: Val = TLI.LowerAsmOutputForConstraint(Chain, Flag, getCurSDLoc(), OpInfo, DAG); break; case TargetLowering::C_Memory: break; // Already handled. case TargetLowering::C_Unknown: assert(false && "Unexpected unknown constraint"); } // Indirect output manifest as stores. Record output chains. if (OpInfo.isIndirect) { const Value *Ptr = OpInfo.CallOperandVal; assert(Ptr && "Expected value CallOperandVal for indirect asm operand"); SDValue Store = DAG.getStore(Chain, getCurSDLoc(), Val, getValue(Ptr), MachinePointerInfo(Ptr)); OutChains.push_back(Store); } else { // generate CopyFromRegs to associated registers. assert(!Call.getType()->isVoidTy() && "Bad inline asm!"); if (Val.getOpcode() == ISD::MERGE_VALUES) { for (const SDValue &V : Val->op_values()) handleRegAssign(V); } else handleRegAssign(Val); } } } // Set results. if (!ResultValues.empty()) { assert(CurResultType == ResultTypes.end() && "Mismatch in number of ResultTypes"); assert(ResultValues.size() == ResultTypes.size() && "Mismatch in number of output operands in asm result"); SDValue V = DAG.getNode(ISD::MERGE_VALUES, getCurSDLoc(), DAG.getVTList(ResultVTs), ResultValues); setValue(&Call, V); } // Collect store chains. if (!OutChains.empty()) Chain = DAG.getNode(ISD::TokenFactor, getCurSDLoc(), MVT::Other, OutChains); // Only Update Root if inline assembly has a memory effect. 
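  // (For a register-only asm with no side effects, leaving the root untouched
  // avoids ordering the asm against unrelated memory operations in the DAG.)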
if (ResultValues.empty() || HasSideEffect || !OutChains.empty() || IsCallBr) DAG.setRoot(Chain); } void SelectionDAGBuilder::emitInlineAsmError(const CallBase &Call, const Twine &Message) { LLVMContext &Ctx = *DAG.getContext(); Ctx.emitError(&Call, Message); // Make sure we leave the DAG in a valid state const TargetLowering &TLI = DAG.getTargetLoweringInfo(); SmallVector ValueVTs; ComputeValueVTs(TLI, DAG.getDataLayout(), Call.getType(), ValueVTs); if (ValueVTs.empty()) return; SmallVector Ops; for (unsigned i = 0, e = ValueVTs.size(); i != e; ++i) Ops.push_back(DAG.getUNDEF(ValueVTs[i])); setValue(&Call, DAG.getMergeValues(Ops, getCurSDLoc())); } void SelectionDAGBuilder::visitVAStart(const CallInst &I) { DAG.setRoot(DAG.getNode(ISD::VASTART, getCurSDLoc(), MVT::Other, getRoot(), getValue(I.getArgOperand(0)), DAG.getSrcValue(I.getArgOperand(0)))); } void SelectionDAGBuilder::visitVAArg(const VAArgInst &I) { const TargetLowering &TLI = DAG.getTargetLoweringInfo(); const DataLayout &DL = DAG.getDataLayout(); SDValue V = DAG.getVAArg( TLI.getMemValueType(DAG.getDataLayout(), I.getType()), getCurSDLoc(), getRoot(), getValue(I.getOperand(0)), DAG.getSrcValue(I.getOperand(0)), DL.getABITypeAlign(I.getType()).value()); DAG.setRoot(V.getValue(1)); if (I.getType()->isPointerTy()) V = DAG.getPtrExtOrTrunc( V, getCurSDLoc(), TLI.getValueType(DAG.getDataLayout(), I.getType())); setValue(&I, V); } void SelectionDAGBuilder::visitVAEnd(const CallInst &I) { DAG.setRoot(DAG.getNode(ISD::VAEND, getCurSDLoc(), MVT::Other, getRoot(), getValue(I.getArgOperand(0)), DAG.getSrcValue(I.getArgOperand(0)))); } void SelectionDAGBuilder::visitVACopy(const CallInst &I) { DAG.setRoot(DAG.getNode(ISD::VACOPY, getCurSDLoc(), MVT::Other, getRoot(), getValue(I.getArgOperand(0)), getValue(I.getArgOperand(1)), DAG.getSrcValue(I.getArgOperand(0)), DAG.getSrcValue(I.getArgOperand(1)))); } SDValue SelectionDAGBuilder::lowerRangeToAssertZExt(SelectionDAG &DAG, const Instruction &I, SDValue Op) { const MDNode *Range = I.getMetadata(LLVMContext::MD_range); if (!Range) return Op; ConstantRange CR = getConstantRangeFromMetadata(*Range); if (CR.isFullSet() || CR.isEmptySet() || CR.isUpperWrapped()) return Op; APInt Lo = CR.getUnsignedMin(); if (!Lo.isMinValue()) return Op; APInt Hi = CR.getUnsignedMax(); unsigned Bits = std::max(Hi.getActiveBits(), static_cast(IntegerType::MIN_INT_BITS)); EVT SmallVT = EVT::getIntegerVT(*DAG.getContext(), Bits); SDLoc SL = getCurSDLoc(); SDValue ZExt = DAG.getNode(ISD::AssertZext, SL, Op.getValueType(), Op, DAG.getValueType(SmallVT)); unsigned NumVals = Op.getNode()->getNumValues(); if (NumVals == 1) return ZExt; SmallVector Ops; Ops.push_back(ZExt); for (unsigned I = 1; I != NumVals; ++I) Ops.push_back(Op.getValue(I)); return DAG.getMergeValues(Ops, SL); } /// Populate a CallLowerinInfo (into \p CLI) based on the properties of /// the call being lowered. /// /// This is a helper for lowering intrinsics that follow a target calling /// convention or require stack pointer adjustment. Only a subset of the /// intrinsic's operands need to participate in the calling convention. void SelectionDAGBuilder::populateCallLoweringInfo( TargetLowering::CallLoweringInfo &CLI, const CallBase *Call, unsigned ArgIdx, unsigned NumArgs, SDValue Callee, Type *ReturnTy, bool IsPatchPoint) { TargetLowering::ArgListTy Args; Args.reserve(NumArgs); // Populate the argument list. // Attributes for args start at offset 1, after the return attribute. 
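  // For example, patchpoint lowering calls this with ArgIdx/NumArgs selecting
  // only the real call arguments of the intrinsic (skipping the id, byte
  // count, target and argument-count operands), so that just that slice is
  // run through the calling convention.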
for (unsigned ArgI = ArgIdx, ArgE = ArgIdx + NumArgs; ArgI != ArgE; ++ArgI) { const Value *V = Call->getOperand(ArgI); assert(!V->getType()->isEmptyTy() && "Empty type passed to intrinsic."); TargetLowering::ArgListEntry Entry; Entry.Node = getValue(V); Entry.Ty = V->getType(); Entry.setAttributes(Call, ArgI); Args.push_back(Entry); } CLI.setDebugLoc(getCurSDLoc()) .setChain(getRoot()) .setCallee(Call->getCallingConv(), ReturnTy, Callee, std::move(Args)) .setDiscardResult(Call->use_empty()) .setIsPatchPoint(IsPatchPoint) .setIsPreallocated( Call->countOperandBundlesOfType(LLVMContext::OB_preallocated) != 0); } /// Add a stack map intrinsic call's live variable operands to a stackmap /// or patchpoint target node's operand list. /// /// Constants are converted to TargetConstants purely as an optimization to /// avoid constant materialization and register allocation. /// /// FrameIndex operands are converted to TargetFrameIndex so that ISEL does not /// generate addess computation nodes, and so FinalizeISel can convert the /// TargetFrameIndex into a DirectMemRefOp StackMap location. This avoids /// address materialization and register allocation, but may also be required /// for correctness. If a StackMap (or PatchPoint) intrinsic directly uses an /// alloca in the entry block, then the runtime may assume that the alloca's /// StackMap location can be read immediately after compilation and that the /// location is valid at any point during execution (this is similar to the /// assumption made by the llvm.gcroot intrinsic). If the alloca's location were /// only available in a register, then the runtime would need to trap when /// execution reaches the StackMap in order to read the alloca's location. static void addStackMapLiveVars(const CallBase &Call, unsigned StartIdx, const SDLoc &DL, SmallVectorImpl &Ops, SelectionDAGBuilder &Builder) { for (unsigned i = StartIdx, e = Call.arg_size(); i != e; ++i) { SDValue OpVal = Builder.getValue(Call.getArgOperand(i)); if (ConstantSDNode *C = dyn_cast(OpVal)) { Ops.push_back( Builder.DAG.getTargetConstant(StackMaps::ConstantOp, DL, MVT::i64)); Ops.push_back( Builder.DAG.getTargetConstant(C->getSExtValue(), DL, MVT::i64)); } else if (FrameIndexSDNode *FI = dyn_cast(OpVal)) { const TargetLowering &TLI = Builder.DAG.getTargetLoweringInfo(); Ops.push_back(Builder.DAG.getTargetFrameIndex( FI->getIndex(), TLI.getFrameIndexTy(Builder.DAG.getDataLayout()))); } else Ops.push_back(OpVal); } } /// Lower llvm.experimental.stackmap directly to its target opcode. void SelectionDAGBuilder::visitStackmap(const CallInst &CI) { // void @llvm.experimental.stackmap(i32 , i32 , // [live variables...]) assert(CI.getType()->isVoidTy() && "Stackmap cannot return a value."); SDValue Chain, InFlag, Callee, NullPtr; SmallVector Ops; SDLoc DL = getCurSDLoc(); Callee = getValue(CI.getCalledOperand()); NullPtr = DAG.getIntPtrConstant(0, DL, true); // The stackmap intrinsic only records the live variables (the arguments // passed to it) and emits NOPS (if requested). Unlike the patchpoint // intrinsic, this won't be lowered to a function call. This means we don't // have to worry about calling conventions and target specific lowering code. // Instead we perform the call lowering right here. // // chain, flag = CALLSEQ_START(chain, 0, 0) // chain, flag = STACKMAP(id, nbytes, ..., chain, flag) // chain, flag = CALLSEQ_END(chain, 0, 0, flag) // Chain = DAG.getCALLSEQ_START(getRoot(), 0, 0, DL); InFlag = Chain.getValue(1); // Add the and constants. 
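  // (That is, the stackmap ID and the shadow-byte count from the first two
  // intrinsic operands, emitted as i64/i32 target constants so that no
  // registers need to be allocated for them.)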
SDValue IDVal = getValue(CI.getOperand(PatchPointOpers::IDPos)); Ops.push_back(DAG.getTargetConstant( cast(IDVal)->getZExtValue(), DL, MVT::i64)); SDValue NBytesVal = getValue(CI.getOperand(PatchPointOpers::NBytesPos)); Ops.push_back(DAG.getTargetConstant( cast(NBytesVal)->getZExtValue(), DL, MVT::i32)); // Push live variables for the stack map. addStackMapLiveVars(CI, 2, DL, Ops, *this); // We are not pushing any register mask info here on the operands list, // because the stackmap doesn't clobber anything. // Push the chain and the glue flag. Ops.push_back(Chain); Ops.push_back(InFlag); // Create the STACKMAP node. SDVTList NodeTys = DAG.getVTList(MVT::Other, MVT::Glue); SDNode *SM = DAG.getMachineNode(TargetOpcode::STACKMAP, DL, NodeTys, Ops); Chain = SDValue(SM, 0); InFlag = Chain.getValue(1); Chain = DAG.getCALLSEQ_END(Chain, NullPtr, NullPtr, InFlag, DL); // Stackmaps don't generate values, so nothing goes into the NodeMap. // Set the root to the target-lowered call chain. DAG.setRoot(Chain); // Inform the Frame Information that we have a stackmap in this function. FuncInfo.MF->getFrameInfo().setHasStackMap(); } /// Lower llvm.experimental.patchpoint directly to its target opcode. void SelectionDAGBuilder::visitPatchpoint(const CallBase &CB, const BasicBlock *EHPadBB) { // void|i64 @llvm.experimental.patchpoint.void|i64(i64 , // i32 , // i8* , // i32 , // [Args...], // [live variables...]) CallingConv::ID CC = CB.getCallingConv(); bool IsAnyRegCC = CC == CallingConv::AnyReg; bool HasDef = !CB.getType()->isVoidTy(); SDLoc dl = getCurSDLoc(); SDValue Callee = getValue(CB.getArgOperand(PatchPointOpers::TargetPos)); // Handle immediate and symbolic callees. if (auto* ConstCallee = dyn_cast(Callee)) Callee = DAG.getIntPtrConstant(ConstCallee->getZExtValue(), dl, /*isTarget=*/true); else if (auto* SymbolicCallee = dyn_cast(Callee)) Callee = DAG.getTargetGlobalAddress(SymbolicCallee->getGlobal(), SDLoc(SymbolicCallee), SymbolicCallee->getValueType(0)); // Get the real number of arguments participating in the call SDValue NArgVal = getValue(CB.getArgOperand(PatchPointOpers::NArgPos)); unsigned NumArgs = cast(NArgVal)->getZExtValue(); // Skip the four meta args: , , , // Intrinsics include all meta-operands up to but not including CC. unsigned NumMetaOpers = PatchPointOpers::CCPos; assert(CB.arg_size() >= NumMetaOpers + NumArgs && "Not enough arguments provided to the patchpoint intrinsic"); // For AnyRegCC the arguments are lowered later on manually. unsigned NumCallArgs = IsAnyRegCC ? 0 : NumArgs; Type *ReturnTy = IsAnyRegCC ? Type::getVoidTy(*DAG.getContext()) : CB.getType(); TargetLowering::CallLoweringInfo CLI(DAG); populateCallLoweringInfo(CLI, &CB, NumMetaOpers, NumCallArgs, Callee, ReturnTy, true); std::pair Result = lowerInvokable(CLI, EHPadBB); SDNode *CallEnd = Result.second.getNode(); if (HasDef && (CallEnd->getOpcode() == ISD::CopyFromReg)) CallEnd = CallEnd->getOperand(0).getNode(); /// Get a call instruction from the call sequence chain. /// Tail calls are not allowed. assert(CallEnd->getOpcode() == ISD::CALLSEQ_END && "Expected a callseq node."); SDNode *Call = CallEnd->getOperand(0).getNode(); bool HasGlue = Call->getGluedNode(); // Replace the target specific call node with the patchable intrinsic. SmallVector Ops; // Add the and constants. 
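  // (As with stackmaps: the patchpoint ID and the number of bytes reserved
  // for patching, again emitted as i64/i32 target constants.)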
SDValue IDVal = getValue(CB.getArgOperand(PatchPointOpers::IDPos)); Ops.push_back(DAG.getTargetConstant( cast(IDVal)->getZExtValue(), dl, MVT::i64)); SDValue NBytesVal = getValue(CB.getArgOperand(PatchPointOpers::NBytesPos)); Ops.push_back(DAG.getTargetConstant( cast(NBytesVal)->getZExtValue(), dl, MVT::i32)); // Add the callee. Ops.push_back(Callee); // Adjust to account for any arguments that have been passed on the // stack instead. // Call Node: Chain, Target, {Args}, RegMask, [Glue] unsigned NumCallRegArgs = Call->getNumOperands() - (HasGlue ? 4 : 3); NumCallRegArgs = IsAnyRegCC ? NumArgs : NumCallRegArgs; Ops.push_back(DAG.getTargetConstant(NumCallRegArgs, dl, MVT::i32)); // Add the calling convention Ops.push_back(DAG.getTargetConstant((unsigned)CC, dl, MVT::i32)); // Add the arguments we omitted previously. The register allocator should // place these in any free register. if (IsAnyRegCC) for (unsigned i = NumMetaOpers, e = NumMetaOpers + NumArgs; i != e; ++i) Ops.push_back(getValue(CB.getArgOperand(i))); // Push the arguments from the call instruction up to the register mask. SDNode::op_iterator e = HasGlue ? Call->op_end()-2 : Call->op_end()-1; Ops.append(Call->op_begin() + 2, e); // Push live variables for the stack map. addStackMapLiveVars(CB, NumMetaOpers + NumArgs, dl, Ops, *this); // Push the register mask info. if (HasGlue) Ops.push_back(*(Call->op_end()-2)); else Ops.push_back(*(Call->op_end()-1)); // Push the chain (this is originally the first operand of the call, but // becomes now the last or second to last operand). Ops.push_back(*(Call->op_begin())); // Push the glue flag (last operand). if (HasGlue) Ops.push_back(*(Call->op_end()-1)); SDVTList NodeTys; if (IsAnyRegCC && HasDef) { // Create the return types based on the intrinsic definition const TargetLowering &TLI = DAG.getTargetLoweringInfo(); SmallVector ValueVTs; ComputeValueVTs(TLI, DAG.getDataLayout(), CB.getType(), ValueVTs); assert(ValueVTs.size() == 1 && "Expected only one return value type."); // There is always a chain and a glue type at the end ValueVTs.push_back(MVT::Other); ValueVTs.push_back(MVT::Glue); NodeTys = DAG.getVTList(ValueVTs); } else NodeTys = DAG.getVTList(MVT::Other, MVT::Glue); // Replace the target specific call node with a PATCHPOINT node. MachineSDNode *MN = DAG.getMachineNode(TargetOpcode::PATCHPOINT, dl, NodeTys, Ops); // Update the NodeMap. if (HasDef) { if (IsAnyRegCC) setValue(&CB, SDValue(MN, 0)); else setValue(&CB, Result.first); } // Fixup the consumers of the intrinsic. The chain and glue may be used in the // call sequence. Furthermore the location of the chain and glue can change // when the AnyReg calling convention is used and the intrinsic returns a // value. if (IsAnyRegCC && HasDef) { SDValue From[] = {SDValue(Call, 0), SDValue(Call, 1)}; SDValue To[] = {SDValue(MN, 1), SDValue(MN, 2)}; DAG.ReplaceAllUsesOfValuesWith(From, To, 2); } else DAG.ReplaceAllUsesWith(Call, MN); DAG.DeleteNode(Call); // Inform the Frame Information that we have a patchpoint in this function. 
FuncInfo.MF->getFrameInfo().setHasPatchPoint(); } void SelectionDAGBuilder::visitVectorReduce(const CallInst &I, unsigned Intrinsic) { const TargetLowering &TLI = DAG.getTargetLoweringInfo(); SDValue Op1 = getValue(I.getArgOperand(0)); SDValue Op2; if (I.getNumArgOperands() > 1) Op2 = getValue(I.getArgOperand(1)); SDLoc dl = getCurSDLoc(); EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType()); SDValue Res; FastMathFlags FMF; if (isa(I)) FMF = I.getFastMathFlags(); switch (Intrinsic) { case Intrinsic::experimental_vector_reduce_v2_fadd: if (FMF.allowReassoc()) Res = DAG.getNode(ISD::FADD, dl, VT, Op1, DAG.getNode(ISD::VECREDUCE_FADD, dl, VT, Op2)); else Res = DAG.getNode(ISD::VECREDUCE_STRICT_FADD, dl, VT, Op1, Op2); break; case Intrinsic::experimental_vector_reduce_v2_fmul: if (FMF.allowReassoc()) Res = DAG.getNode(ISD::FMUL, dl, VT, Op1, DAG.getNode(ISD::VECREDUCE_FMUL, dl, VT, Op2)); else Res = DAG.getNode(ISD::VECREDUCE_STRICT_FMUL, dl, VT, Op1, Op2); break; case Intrinsic::experimental_vector_reduce_add: Res = DAG.getNode(ISD::VECREDUCE_ADD, dl, VT, Op1); break; case Intrinsic::experimental_vector_reduce_mul: Res = DAG.getNode(ISD::VECREDUCE_MUL, dl, VT, Op1); break; case Intrinsic::experimental_vector_reduce_and: Res = DAG.getNode(ISD::VECREDUCE_AND, dl, VT, Op1); break; case Intrinsic::experimental_vector_reduce_or: Res = DAG.getNode(ISD::VECREDUCE_OR, dl, VT, Op1); break; case Intrinsic::experimental_vector_reduce_xor: Res = DAG.getNode(ISD::VECREDUCE_XOR, dl, VT, Op1); break; case Intrinsic::experimental_vector_reduce_smax: Res = DAG.getNode(ISD::VECREDUCE_SMAX, dl, VT, Op1); break; case Intrinsic::experimental_vector_reduce_smin: Res = DAG.getNode(ISD::VECREDUCE_SMIN, dl, VT, Op1); break; case Intrinsic::experimental_vector_reduce_umax: Res = DAG.getNode(ISD::VECREDUCE_UMAX, dl, VT, Op1); break; case Intrinsic::experimental_vector_reduce_umin: Res = DAG.getNode(ISD::VECREDUCE_UMIN, dl, VT, Op1); break; case Intrinsic::experimental_vector_reduce_fmax: Res = DAG.getNode(ISD::VECREDUCE_FMAX, dl, VT, Op1); break; case Intrinsic::experimental_vector_reduce_fmin: Res = DAG.getNode(ISD::VECREDUCE_FMIN, dl, VT, Op1); break; default: llvm_unreachable("Unhandled vector reduce intrinsic"); } setValue(&I, Res); } /// Returns an AttributeList representing the attributes applied to the return /// value of the given call. static AttributeList getReturnAttrs(TargetLowering::CallLoweringInfo &CLI) { SmallVector Attrs; if (CLI.RetSExt) Attrs.push_back(Attribute::SExt); if (CLI.RetZExt) Attrs.push_back(Attribute::ZExt); if (CLI.IsInReg) Attrs.push_back(Attribute::InReg); return AttributeList::get(CLI.RetTy->getContext(), AttributeList::ReturnIndex, Attrs); } /// TargetLowering::LowerCallTo - This is the default LowerCallTo /// implementation, which just calls LowerCall. /// FIXME: When all targets are /// migrated to using LowerCall, this hook should be integrated into SDISel. std::pair TargetLowering::LowerCallTo(TargetLowering::CallLoweringInfo &CLI) const { // Handle the incoming return values from the call. CLI.Ins.clear(); Type *OrigRetTy = CLI.RetTy; SmallVector RetTys; SmallVector Offsets; auto &DL = CLI.DAG.getDataLayout(); ComputeValueVTs(*this, DL, CLI.RetTy, RetTys, &Offsets); if (CLI.IsPostTypeLegalization) { // If we are lowering a libcall after legalization, split the return type. 
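  // For example (illustrative), an i128 value returned from a runtime libcall
  // on a 32-bit target is re-expressed here as four i32 register values with
  // byte offsets 0, 4, 8 and 12, since type legalization has already run.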
SmallVector OldRetTys; SmallVector OldOffsets; RetTys.swap(OldRetTys); Offsets.swap(OldOffsets); for (size_t i = 0, e = OldRetTys.size(); i != e; ++i) { EVT RetVT = OldRetTys[i]; uint64_t Offset = OldOffsets[i]; MVT RegisterVT = getRegisterType(CLI.RetTy->getContext(), RetVT); unsigned NumRegs = getNumRegisters(CLI.RetTy->getContext(), RetVT); unsigned RegisterVTByteSZ = RegisterVT.getSizeInBits() / 8; RetTys.append(NumRegs, RegisterVT); for (unsigned j = 0; j != NumRegs; ++j) Offsets.push_back(Offset + j * RegisterVTByteSZ); } } SmallVector Outs; GetReturnInfo(CLI.CallConv, CLI.RetTy, getReturnAttrs(CLI), Outs, *this, DL); bool CanLowerReturn = this->CanLowerReturn(CLI.CallConv, CLI.DAG.getMachineFunction(), CLI.IsVarArg, Outs, CLI.RetTy->getContext()); SDValue DemoteStackSlot; int DemoteStackIdx = -100; if (!CanLowerReturn) { // FIXME: equivalent assert? // assert(!CS.hasInAllocaArgument() && // "sret demotion is incompatible with inalloca"); uint64_t TySize = DL.getTypeAllocSize(CLI.RetTy); Align Alignment = DL.getPrefTypeAlign(CLI.RetTy); MachineFunction &MF = CLI.DAG.getMachineFunction(); DemoteStackIdx = MF.getFrameInfo().CreateStackObject(TySize, Alignment, false); Type *StackSlotPtrType = PointerType::get(CLI.RetTy, DL.getAllocaAddrSpace()); DemoteStackSlot = CLI.DAG.getFrameIndex(DemoteStackIdx, getFrameIndexTy(DL)); ArgListEntry Entry; Entry.Node = DemoteStackSlot; Entry.Ty = StackSlotPtrType; Entry.IsSExt = false; Entry.IsZExt = false; Entry.IsInReg = false; Entry.IsSRet = true; Entry.IsNest = false; Entry.IsByVal = false; Entry.IsReturned = false; Entry.IsSwiftSelf = false; Entry.IsSwiftError = false; Entry.IsCFGuardTarget = false; Entry.Alignment = Alignment; CLI.getArgs().insert(CLI.getArgs().begin(), Entry); CLI.NumFixedArgs += 1; CLI.RetTy = Type::getVoidTy(CLI.RetTy->getContext()); // sret demotion isn't compatible with tail-calls, since the sret argument // points into the callers stack frame. CLI.IsTailCall = false; } else { bool NeedsRegBlock = functionArgumentNeedsConsecutiveRegisters( CLI.RetTy, CLI.CallConv, CLI.IsVarArg); for (unsigned I = 0, E = RetTys.size(); I != E; ++I) { ISD::ArgFlagsTy Flags; if (NeedsRegBlock) { Flags.setInConsecutiveRegs(); if (I == RetTys.size() - 1) Flags.setInConsecutiveRegsLast(); } EVT VT = RetTys[I]; MVT RegisterVT = getRegisterTypeForCallingConv(CLI.RetTy->getContext(), CLI.CallConv, VT); unsigned NumRegs = getNumRegistersForCallingConv(CLI.RetTy->getContext(), CLI.CallConv, VT); for (unsigned i = 0; i != NumRegs; ++i) { ISD::InputArg MyFlags; MyFlags.Flags = Flags; MyFlags.VT = RegisterVT; MyFlags.ArgVT = VT; MyFlags.Used = CLI.IsReturnValueUsed; if (CLI.RetTy->isPointerTy()) { MyFlags.Flags.setPointer(); MyFlags.Flags.setPointerAddrSpace( cast(CLI.RetTy)->getAddressSpace()); } if (CLI.RetSExt) MyFlags.Flags.setSExt(); if (CLI.RetZExt) MyFlags.Flags.setZExt(); if (CLI.IsInReg) MyFlags.Flags.setInReg(); CLI.Ins.push_back(MyFlags); } } } // We push in swifterror return as the last element of CLI.Ins. ArgListTy &Args = CLI.getArgs(); if (supportSwiftError()) { for (unsigned i = 0, e = Args.size(); i != e; ++i) { if (Args[i].IsSwiftError) { ISD::InputArg MyFlags; MyFlags.VT = getPointerTy(DL); MyFlags.ArgVT = EVT(getPointerTy(DL)); MyFlags.Flags.setSwiftError(); CLI.Ins.push_back(MyFlags); } } } // Handle all of the outgoing arguments. 
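  // Each IR argument may first split into several value types (e.g. the
  // members of an aggregate), and each value type into several legal register
  // parts; the nested loops below perform that two-level expansion and attach
  // the ABI flags (zext/sext, byval, inreg, ...) to every part.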
CLI.Outs.clear(); CLI.OutVals.clear(); for (unsigned i = 0, e = Args.size(); i != e; ++i) { SmallVector ValueVTs; ComputeValueVTs(*this, DL, Args[i].Ty, ValueVTs); // FIXME: Split arguments if CLI.IsPostTypeLegalization Type *FinalType = Args[i].Ty; if (Args[i].IsByVal) FinalType = cast(Args[i].Ty)->getElementType(); bool NeedsRegBlock = functionArgumentNeedsConsecutiveRegisters( FinalType, CLI.CallConv, CLI.IsVarArg); for (unsigned Value = 0, NumValues = ValueVTs.size(); Value != NumValues; ++Value) { EVT VT = ValueVTs[Value]; Type *ArgTy = VT.getTypeForEVT(CLI.RetTy->getContext()); SDValue Op = SDValue(Args[i].Node.getNode(), Args[i].Node.getResNo() + Value); ISD::ArgFlagsTy Flags; // Certain targets (such as MIPS), may have a different ABI alignment // for a type depending on the context. Give the target a chance to // specify the alignment it wants. const Align OriginalAlignment(getABIAlignmentForCallingConv(ArgTy, DL)); if (Args[i].Ty->isPointerTy()) { Flags.setPointer(); Flags.setPointerAddrSpace( cast(Args[i].Ty)->getAddressSpace()); } if (Args[i].IsZExt) Flags.setZExt(); if (Args[i].IsSExt) Flags.setSExt(); if (Args[i].IsInReg) { // If we are using vectorcall calling convention, a structure that is // passed InReg - is surely an HVA if (CLI.CallConv == CallingConv::X86_VectorCall && isa(FinalType)) { // The first value of a structure is marked if (0 == Value) Flags.setHvaStart(); Flags.setHva(); } // Set InReg Flag Flags.setInReg(); } if (Args[i].IsSRet) Flags.setSRet(); if (Args[i].IsSwiftSelf) Flags.setSwiftSelf(); if (Args[i].IsSwiftError) Flags.setSwiftError(); if (Args[i].IsCFGuardTarget) Flags.setCFGuardTarget(); if (Args[i].IsByVal) Flags.setByVal(); if (Args[i].IsPreallocated) { Flags.setPreallocated(); // Set the byval flag for CCAssignFn callbacks that don't know about // preallocated. This way we can know how many bytes we should've // allocated and how many bytes a callee cleanup function will pop. If // we port preallocated to more targets, we'll have to add custom // preallocated handling in the various CC lowering callbacks. Flags.setByVal(); } if (Args[i].IsInAlloca) { Flags.setInAlloca(); // Set the byval flag for CCAssignFn callbacks that don't know about // inalloca. This way we can know how many bytes we should've allocated // and how many bytes a callee cleanup function will pop. If we port // inalloca to more targets, we'll have to add custom inalloca handling // in the various CC lowering callbacks. Flags.setByVal(); } if (Args[i].IsByVal || Args[i].IsInAlloca || Args[i].IsPreallocated) { PointerType *Ty = cast(Args[i].Ty); Type *ElementTy = Ty->getElementType(); unsigned FrameSize = DL.getTypeAllocSize( Args[i].ByValType ? Args[i].ByValType : ElementTy); Flags.setByValSize(FrameSize); // info is not there but there are cases it cannot get right. 
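        // (The explicit call-site alignment, when present, wins; otherwise
        // fall back to the target's byval alignment for the pointee type.)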
Align FrameAlign; if (auto MA = Args[i].Alignment) FrameAlign = *MA; else FrameAlign = Align(getByValTypeAlignment(ElementTy, DL)); Flags.setByValAlign(FrameAlign); } if (Args[i].IsNest) Flags.setNest(); if (NeedsRegBlock) Flags.setInConsecutiveRegs(); Flags.setOrigAlign(OriginalAlignment); MVT PartVT = getRegisterTypeForCallingConv(CLI.RetTy->getContext(), CLI.CallConv, VT); unsigned NumParts = getNumRegistersForCallingConv(CLI.RetTy->getContext(), CLI.CallConv, VT); SmallVector Parts(NumParts); ISD::NodeType ExtendKind = ISD::ANY_EXTEND; if (Args[i].IsSExt) ExtendKind = ISD::SIGN_EXTEND; else if (Args[i].IsZExt) ExtendKind = ISD::ZERO_EXTEND; // Conservatively only handle 'returned' on non-vectors that can be lowered, // for now. if (Args[i].IsReturned && !Op.getValueType().isVector() && CanLowerReturn) { assert((CLI.RetTy == Args[i].Ty || (CLI.RetTy->isPointerTy() && Args[i].Ty->isPointerTy() && CLI.RetTy->getPointerAddressSpace() == Args[i].Ty->getPointerAddressSpace())) && RetTys.size() == NumValues && "unexpected use of 'returned'"); // Before passing 'returned' to the target lowering code, ensure that // either the register MVT and the actual EVT are the same size or that // the return value and argument are extended in the same way; in these // cases it's safe to pass the argument register value unchanged as the // return register value (although it's at the target's option whether // to do so) // TODO: allow code generation to take advantage of partially preserved // registers rather than clobbering the entire register when the // parameter extension method is not compatible with the return // extension method if ((NumParts * PartVT.getSizeInBits() == VT.getSizeInBits()) || (ExtendKind != ISD::ANY_EXTEND && CLI.RetSExt == Args[i].IsSExt && CLI.RetZExt == Args[i].IsZExt)) Flags.setReturned(); } getCopyToParts(CLI.DAG, CLI.DL, Op, &Parts[0], NumParts, PartVT, CLI.CB, CLI.CallConv, ExtendKind); for (unsigned j = 0; j != NumParts; ++j) { // if it isn't first piece, alignment must be 1 // For scalable vectors the scalable part is currently handled // by individual targets, so we just use the known minimum size here. ISD::OutputArg MyFlags(Flags, Parts[j].getValueType(), VT, i < CLI.NumFixedArgs, i, j*Parts[j].getValueType().getStoreSize().getKnownMinSize()); if (NumParts > 1 && j == 0) MyFlags.Flags.setSplit(); else if (j != 0) { MyFlags.Flags.setOrigAlign(Align(1)); if (j == NumParts - 1) MyFlags.Flags.setSplitEnd(); } CLI.Outs.push_back(MyFlags); CLI.OutVals.push_back(Parts[j]); } if (NeedsRegBlock && Value == NumValues - 1) CLI.Outs[CLI.Outs.size() - 1].Flags.setInConsecutiveRegsLast(); } } SmallVector InVals; CLI.Chain = LowerCall(CLI, InVals); // Update CLI.InVals to use outside of this function. CLI.InVals = InVals; // Verify that the target's LowerCall behaved as expected. assert(CLI.Chain.getNode() && CLI.Chain.getValueType() == MVT::Other && "LowerCall didn't return a valid chain!"); assert((!CLI.IsTailCall || InVals.empty()) && "LowerCall emitted a return value for a tail call!"); assert((CLI.IsTailCall || InVals.size() == CLI.Ins.size()) && "LowerCall didn't emit the correct number of values!"); // For a tail call, the return value is merely live-out and there aren't // any nodes in the DAG representing it. Return a special value to // indicate that a tail call has been emitted and no more Instructions // should be processed in the current block. 
if (CLI.IsTailCall) { CLI.DAG.setRoot(CLI.Chain); return std::make_pair(SDValue(), SDValue()); } #ifndef NDEBUG for (unsigned i = 0, e = CLI.Ins.size(); i != e; ++i) { assert(InVals[i].getNode() && "LowerCall emitted a null value!"); assert(EVT(CLI.Ins[i].VT) == InVals[i].getValueType() && "LowerCall emitted a value with the wrong type!"); } #endif SmallVector ReturnValues; if (!CanLowerReturn) { // The instruction result is the result of loading from the // hidden sret parameter. SmallVector PVTs; Type *PtrRetTy = OrigRetTy->getPointerTo(DL.getAllocaAddrSpace()); ComputeValueVTs(*this, DL, PtrRetTy, PVTs); assert(PVTs.size() == 1 && "Pointers should fit in one register"); EVT PtrVT = PVTs[0]; unsigned NumValues = RetTys.size(); ReturnValues.resize(NumValues); SmallVector Chains(NumValues); // An aggregate return value cannot wrap around the address space, so // offsets to its parts don't wrap either. SDNodeFlags Flags; Flags.setNoUnsignedWrap(true); MachineFunction &MF = CLI.DAG.getMachineFunction(); Align HiddenSRetAlign = MF.getFrameInfo().getObjectAlign(DemoteStackIdx); for (unsigned i = 0; i < NumValues; ++i) { SDValue Add = CLI.DAG.getNode(ISD::ADD, CLI.DL, PtrVT, DemoteStackSlot, CLI.DAG.getConstant(Offsets[i], CLI.DL, PtrVT), Flags); SDValue L = CLI.DAG.getLoad( RetTys[i], CLI.DL, CLI.Chain, Add, MachinePointerInfo::getFixedStack(CLI.DAG.getMachineFunction(), DemoteStackIdx, Offsets[i]), HiddenSRetAlign); ReturnValues[i] = L; Chains[i] = L.getValue(1); } CLI.Chain = CLI.DAG.getNode(ISD::TokenFactor, CLI.DL, MVT::Other, Chains); } else { // Collect the legal value parts into potentially illegal values // that correspond to the original function's return values. Optional AssertOp; if (CLI.RetSExt) AssertOp = ISD::AssertSext; else if (CLI.RetZExt) AssertOp = ISD::AssertZext; unsigned CurReg = 0; for (unsigned I = 0, E = RetTys.size(); I != E; ++I) { EVT VT = RetTys[I]; MVT RegisterVT = getRegisterTypeForCallingConv(CLI.RetTy->getContext(), CLI.CallConv, VT); unsigned NumRegs = getNumRegistersForCallingConv(CLI.RetTy->getContext(), CLI.CallConv, VT); ReturnValues.push_back(getCopyFromParts(CLI.DAG, CLI.DL, &InVals[CurReg], NumRegs, RegisterVT, VT, nullptr, CLI.CallConv, AssertOp)); CurReg += NumRegs; } // For a function returning void, there is no return value. We can't create // such a node, so we just return a null return value in that case. In // that case, nothing will actually look at the value. if (ReturnValues.empty()) return std::make_pair(SDValue(), CLI.Chain); } SDValue Res = CLI.DAG.getNode(ISD::MERGE_VALUES, CLI.DL, CLI.DAG.getVTList(RetTys), ReturnValues); return std::make_pair(Res, CLI.Chain); } void TargetLowering::LowerOperationWrapper(SDNode *N, SmallVectorImpl &Results, SelectionDAG &DAG) const { if (SDValue Res = LowerOperation(SDValue(N, 0), DAG)) Results.push_back(Res); } SDValue TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const { llvm_unreachable("LowerOperation not implemented for this target!"); } void SelectionDAGBuilder::CopyValueToVirtualRegister(const Value *V, unsigned Reg) { SDValue Op = getNonRegisterValue(V); assert((Op.getOpcode() != ISD::CopyFromReg || cast(Op.getOperand(1))->getReg() != Reg) && "Copy from a reg to the same reg!"); assert(!Register::isPhysicalRegister(Reg) && "Is a physreg"); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); // If this is an InlineAsm we have to match the registers required, not the // notional registers required by the type. 
RegsForValue RFV(V->getContext(), TLI, DAG.getDataLayout(), Reg, V->getType(), None); // This is not an ABI copy. SDValue Chain = DAG.getEntryNode(); ISD::NodeType ExtendType = (FuncInfo.PreferredExtendType.find(V) == FuncInfo.PreferredExtendType.end()) ? ISD::ANY_EXTEND : FuncInfo.PreferredExtendType[V]; RFV.getCopyToRegs(Op, DAG, getCurSDLoc(), Chain, nullptr, V, ExtendType); PendingExports.push_back(Chain); } #include "llvm/CodeGen/SelectionDAGISel.h" /// isOnlyUsedInEntryBlock - If the specified argument is only used in the /// entry block, return true. This includes arguments used by switches, since /// the switch may expand into multiple basic blocks. static bool isOnlyUsedInEntryBlock(const Argument *A, bool FastISel) { // With FastISel active, we may be splitting blocks, so force creation // of virtual registers for all non-dead arguments. if (FastISel) return A->use_empty(); const BasicBlock &Entry = A->getParent()->front(); for (const User *U : A->users()) if (cast(U)->getParent() != &Entry || isa(U)) return false; // Use not in entry block. return true; } using ArgCopyElisionMapTy = DenseMap>; /// Scan the entry block of the function in FuncInfo for arguments that look /// like copies into a local alloca. Record any copied arguments in /// ArgCopyElisionCandidates. static void findArgumentCopyElisionCandidates(const DataLayout &DL, FunctionLoweringInfo *FuncInfo, ArgCopyElisionMapTy &ArgCopyElisionCandidates) { // Record the state of every static alloca used in the entry block. Argument // allocas are all used in the entry block, so we need approximately as many // entries as we have arguments. enum StaticAllocaInfo { Unknown, Clobbered, Elidable }; SmallDenseMap StaticAllocas; unsigned NumArgs = FuncInfo->Fn->arg_size(); StaticAllocas.reserve(NumArgs * 2); auto GetInfoIfStaticAlloca = [&](const Value *V) -> StaticAllocaInfo * { if (!V) return nullptr; V = V->stripPointerCasts(); const auto *AI = dyn_cast(V); if (!AI || !AI->isStaticAlloca() || !FuncInfo->StaticAllocaMap.count(AI)) return nullptr; auto Iter = StaticAllocas.insert({AI, Unknown}); return &Iter.first->second; }; // Look for stores of arguments to static allocas. Look through bitcasts and // GEPs to handle type coercions, as long as the alloca is fully initialized // by the store. Any non-store use of an alloca escapes it and any subsequent // unanalyzed store might write it. // FIXME: Handle structs initialized with multiple stores. for (const Instruction &I : FuncInfo->Fn->getEntryBlock()) { // Look for stores, and handle non-store uses conservatively. const auto *SI = dyn_cast(&I); if (!SI) { // We will look through cast uses, so ignore them completely. if (I.isCast()) continue; // Ignore debug info intrinsics, they don't escape or store to allocas. if (isa(I)) continue; // This is an unknown instruction. Assume it escapes or writes to all // static alloca operands. for (const Use &U : I.operands()) { if (StaticAllocaInfo *Info = GetInfoIfStaticAlloca(U)) *Info = StaticAllocaInfo::Clobbered; } continue; } // If the stored value is a static alloca, mark it as escaped. if (StaticAllocaInfo *Info = GetInfoIfStaticAlloca(SI->getValueOperand())) *Info = StaticAllocaInfo::Clobbered; // Check if the destination is a static alloca. const Value *Dst = SI->getPointerOperand()->stripPointerCasts(); StaticAllocaInfo *Info = GetInfoIfStaticAlloca(Dst); if (!Info) continue; const AllocaInst *AI = cast(Dst); // Skip allocas that have been initialized or clobbered. 
    if (*Info != StaticAllocaInfo::Unknown)
      continue;

    // Check if the stored value is an argument, and that this store fully
    // initializes the alloca. Don't elide copies from the same argument twice.
    const Value *Val = SI->getValueOperand()->stripPointerCasts();
    const auto *Arg = dyn_cast<Argument>(Val);
    if (!Arg || Arg->hasPassPointeeByValueAttr() ||
        Arg->getType()->isEmptyTy() ||
        DL.getTypeStoreSize(Arg->getType()) !=
            DL.getTypeAllocSize(AI->getAllocatedType()) ||
        ArgCopyElisionCandidates.count(Arg)) {
      *Info = StaticAllocaInfo::Clobbered;
      continue;
    }

    LLVM_DEBUG(dbgs() << "Found argument copy elision candidate: " << *AI
                      << '\n');

    // Mark this alloca and store for argument copy elision.
    *Info = StaticAllocaInfo::Elidable;
    ArgCopyElisionCandidates.insert({Arg, {AI, SI}});

    // Stop scanning if we've seen all arguments. This will happen early in -O0
    // builds, which is useful, because -O0 builds have large entry blocks and
    // many allocas.
    if (ArgCopyElisionCandidates.size() == NumArgs)
      break;
  }
}

/// Try to elide argument copies from memory into a local alloca. Succeeds if
/// ArgVal is a load from a suitable fixed stack object.
static void tryToElideArgumentCopy(
    FunctionLoweringInfo &FuncInfo, SmallVectorImpl<SDValue> &Chains,
    DenseMap<int, int> &ArgCopyElisionFrameIndexMap,
    SmallPtrSetImpl<const Instruction *> &ElidedArgCopyInstrs,
    ArgCopyElisionMapTy &ArgCopyElisionCandidates, const Argument &Arg,
    SDValue ArgVal, bool &ArgHasUses) {
  // Check if this is a load from a fixed stack object.
  auto *LNode = dyn_cast<LoadSDNode>(ArgVal);
  if (!LNode)
    return;
  auto *FINode = dyn_cast<FrameIndexSDNode>(LNode->getBasePtr().getNode());
  if (!FINode)
    return;

  // Check that the fixed stack object is the right size and alignment.
  // Look at the alignment that the user wrote on the alloca instead of looking
  // at the stack object.
  auto ArgCopyIter = ArgCopyElisionCandidates.find(&Arg);
  assert(ArgCopyIter != ArgCopyElisionCandidates.end());
  const AllocaInst *AI = ArgCopyIter->second.first;
  int FixedIndex = FINode->getIndex();
  int &AllocaIndex = FuncInfo.StaticAllocaMap[AI];
  int OldIndex = AllocaIndex;
  MachineFrameInfo &MFI = FuncInfo.MF->getFrameInfo();
  if (MFI.getObjectSize(FixedIndex) != MFI.getObjectSize(OldIndex)) {
    LLVM_DEBUG(
        dbgs() << "  argument copy elision failed due to bad fixed stack "
                  "object size\n");
    return;
  }
  Align RequiredAlignment = AI->getAlign();
  if (MFI.getObjectAlign(FixedIndex) < RequiredAlignment) {
    LLVM_DEBUG(dbgs() << "  argument copy elision failed: alignment of alloca "
                         "greater than stack argument alignment ("
                      << DebugStr(RequiredAlignment) << " vs "
                      << DebugStr(MFI.getObjectAlign(FixedIndex)) << ")\n");
    return;
  }

  // Perform the elision. Delete the old stack object and replace its only use
  // in the variable info map. Mark the stack object as mutable.
  LLVM_DEBUG({
    dbgs() << "Eliding argument copy from " << Arg << " to " << *AI << '\n'
           << "  Replacing frame index " << OldIndex << " with " << FixedIndex
           << '\n';
  });
  MFI.RemoveStackObject(OldIndex);
  MFI.setIsImmutableObjectIndex(FixedIndex, false);
  AllocaIndex = FixedIndex;
  ArgCopyElisionFrameIndexMap.insert({OldIndex, FixedIndex});
  Chains.push_back(ArgVal.getValue(1));

  // Avoid emitting code for the store implementing the copy.
  const StoreInst *SI = ArgCopyIter->second.second;
  ElidedArgCopyInstrs.insert(SI);

  // Check for uses of the argument again so that we can avoid exporting ArgVal
  // if it isn't used by anything other than the store.
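  // Illustrative sketch (not part of the original source): the entry-block
  // pattern this elision targets looks roughly like
  //
  //   define void @f(i64 %x) {
  //   entry:
  //     %x.addr = alloca i64
  //     store i64 %x, i64* %x.addr   ; the copying store elided above
  //     ...
  //
  // When %x is itself passed in a fixed stack slot of matching size and
  // alignment, the alloca is redirected to that slot and the store is dropped.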
for (const Value *U : Arg.users()) { if (U != SI) { ArgHasUses = true; break; } } } void SelectionDAGISel::LowerArguments(const Function &F) { SelectionDAG &DAG = SDB->DAG; SDLoc dl = SDB->getCurSDLoc(); const DataLayout &DL = DAG.getDataLayout(); SmallVector Ins; // In Naked functions we aren't going to save any registers. if (F.hasFnAttribute(Attribute::Naked)) return; if (!FuncInfo->CanLowerReturn) { // Put in an sret pointer parameter before all the other parameters. SmallVector ValueVTs; ComputeValueVTs(*TLI, DAG.getDataLayout(), F.getReturnType()->getPointerTo( DAG.getDataLayout().getAllocaAddrSpace()), ValueVTs); // NOTE: Assuming that a pointer will never break down to more than one VT // or one register. ISD::ArgFlagsTy Flags; Flags.setSRet(); MVT RegisterVT = TLI->getRegisterType(*DAG.getContext(), ValueVTs[0]); ISD::InputArg RetArg(Flags, RegisterVT, ValueVTs[0], true, ISD::InputArg::NoArgIndex, 0); Ins.push_back(RetArg); } // Look for stores of arguments to static allocas. Mark such arguments with a // flag to ask the target to give us the memory location of that argument if // available. ArgCopyElisionMapTy ArgCopyElisionCandidates; findArgumentCopyElisionCandidates(DL, FuncInfo.get(), ArgCopyElisionCandidates); // Set up the incoming argument description vector. for (const Argument &Arg : F.args()) { unsigned ArgNo = Arg.getArgNo(); SmallVector ValueVTs; ComputeValueVTs(*TLI, DAG.getDataLayout(), Arg.getType(), ValueVTs); bool isArgValueUsed = !Arg.use_empty(); unsigned PartBase = 0; Type *FinalType = Arg.getType(); if (Arg.hasAttribute(Attribute::ByVal)) FinalType = Arg.getParamByValType(); bool NeedsRegBlock = TLI->functionArgumentNeedsConsecutiveRegisters( FinalType, F.getCallingConv(), F.isVarArg()); for (unsigned Value = 0, NumValues = ValueVTs.size(); Value != NumValues; ++Value) { EVT VT = ValueVTs[Value]; Type *ArgTy = VT.getTypeForEVT(*DAG.getContext()); ISD::ArgFlagsTy Flags; // Certain targets (such as MIPS), may have a different ABI alignment // for a type depending on the context. Give the target a chance to // specify the alignment it wants. const Align OriginalAlignment( TLI->getABIAlignmentForCallingConv(ArgTy, DL)); if (Arg.getType()->isPointerTy()) { Flags.setPointer(); Flags.setPointerAddrSpace( cast(Arg.getType())->getAddressSpace()); } if (Arg.hasAttribute(Attribute::ZExt)) Flags.setZExt(); if (Arg.hasAttribute(Attribute::SExt)) Flags.setSExt(); if (Arg.hasAttribute(Attribute::InReg)) { // If we are using vectorcall calling convention, a structure that is // passed InReg - is surely an HVA if (F.getCallingConv() == CallingConv::X86_VectorCall && isa(Arg.getType())) { // The first value of a structure is marked if (0 == Value) Flags.setHvaStart(); Flags.setHva(); } // Set InReg Flag Flags.setInReg(); } if (Arg.hasAttribute(Attribute::StructRet)) Flags.setSRet(); if (Arg.hasAttribute(Attribute::SwiftSelf)) Flags.setSwiftSelf(); if (Arg.hasAttribute(Attribute::SwiftError)) Flags.setSwiftError(); if (Arg.hasAttribute(Attribute::ByVal)) Flags.setByVal(); if (Arg.hasAttribute(Attribute::InAlloca)) { Flags.setInAlloca(); // Set the byval flag for CCAssignFn callbacks that don't know about // inalloca. This way we can know how many bytes we should've allocated // and how many bytes a callee cleanup function will pop. If we port // inalloca to more targets, we'll have to add custom inalloca handling // in the various CC lowering callbacks. 
Flags.setByVal(); } if (Arg.hasAttribute(Attribute::Preallocated)) { Flags.setPreallocated(); // Set the byval flag for CCAssignFn callbacks that don't know about // preallocated. This way we can know how many bytes we should've // allocated and how many bytes a callee cleanup function will pop. If // we port preallocated to more targets, we'll have to add custom // preallocated handling in the various CC lowering callbacks. Flags.setByVal(); } if (F.getCallingConv() == CallingConv::X86_INTR) { // IA Interrupt passes frame (1st parameter) by value in the stack. if (ArgNo == 0) Flags.setByVal(); } if (Flags.isByVal() || Flags.isInAlloca() || Flags.isPreallocated()) { Type *ElementTy = Arg.getParamByValType(); // For ByVal, size and alignment should be passed from FE. BE will // guess if this info is not there but there are cases it cannot get // right. unsigned FrameSize = DL.getTypeAllocSize(Arg.getParamByValType()); Flags.setByValSize(FrameSize); unsigned FrameAlign; if (Arg.getParamAlignment()) FrameAlign = Arg.getParamAlignment(); else FrameAlign = TLI->getByValTypeAlignment(ElementTy, DL); Flags.setByValAlign(Align(FrameAlign)); } if (Arg.hasAttribute(Attribute::Nest)) Flags.setNest(); if (NeedsRegBlock) Flags.setInConsecutiveRegs(); Flags.setOrigAlign(OriginalAlignment); if (ArgCopyElisionCandidates.count(&Arg)) Flags.setCopyElisionCandidate(); if (Arg.hasAttribute(Attribute::Returned)) Flags.setReturned(); MVT RegisterVT = TLI->getRegisterTypeForCallingConv( *CurDAG->getContext(), F.getCallingConv(), VT); unsigned NumRegs = TLI->getNumRegistersForCallingConv( *CurDAG->getContext(), F.getCallingConv(), VT); for (unsigned i = 0; i != NumRegs; ++i) { // For scalable vectors, use the minimum size; individual targets // are responsible for handling scalable vector arguments and // return values. ISD::InputArg MyFlags(Flags, RegisterVT, VT, isArgValueUsed, ArgNo, PartBase+i*RegisterVT.getStoreSize().getKnownMinSize()); if (NumRegs > 1 && i == 0) MyFlags.Flags.setSplit(); // if it isn't first piece, alignment must be 1 else if (i > 0) { MyFlags.Flags.setOrigAlign(Align(1)); if (i == NumRegs - 1) MyFlags.Flags.setSplitEnd(); } Ins.push_back(MyFlags); } if (NeedsRegBlock && Value == NumValues - 1) Ins[Ins.size() - 1].Flags.setInConsecutiveRegsLast(); PartBase += VT.getStoreSize().getKnownMinSize(); } } // Call the target to set up the argument values. SmallVector InVals; SDValue NewRoot = TLI->LowerFormalArguments( DAG.getRoot(), F.getCallingConv(), F.isVarArg(), Ins, dl, DAG, InVals); // Verify that the target's LowerFormalArguments behaved as expected. assert(NewRoot.getNode() && NewRoot.getValueType() == MVT::Other && "LowerFormalArguments didn't return a valid chain!"); assert(InVals.size() == Ins.size() && "LowerFormalArguments didn't emit the correct number of values!"); LLVM_DEBUG({ for (unsigned i = 0, e = Ins.size(); i != e; ++i) { assert(InVals[i].getNode() && "LowerFormalArguments emitted a null value!"); assert(EVT(Ins[i].VT) == InVals[i].getValueType() && "LowerFormalArguments emitted a value with the wrong type!"); } }); // Update the DAG with the new chain value resulting from argument lowering. DAG.setRoot(NewRoot); // Set up the argument values. unsigned i = 0; if (!FuncInfo->CanLowerReturn) { // Create a virtual register for the sret pointer, and put in a copy // from the sret argument into it. 
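    // Illustrative note (not part of the original source): CanLowerReturn is
    // false when the return value cannot be passed back in registers, e.g. a
    // function returning a large aggregate such as [8 x i64] on a typical
    // target. It is then lowered as if it took a hidden "sret" pointer as its
    // first parameter, which is the extra incoming argument handled here.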
SmallVector ValueVTs; ComputeValueVTs(*TLI, DAG.getDataLayout(), F.getReturnType()->getPointerTo( DAG.getDataLayout().getAllocaAddrSpace()), ValueVTs); MVT VT = ValueVTs[0].getSimpleVT(); MVT RegVT = TLI->getRegisterType(*CurDAG->getContext(), VT); Optional AssertOp = None; SDValue ArgValue = getCopyFromParts(DAG, dl, &InVals[0], 1, RegVT, VT, nullptr, F.getCallingConv(), AssertOp); MachineFunction& MF = SDB->DAG.getMachineFunction(); MachineRegisterInfo& RegInfo = MF.getRegInfo(); Register SRetReg = RegInfo.createVirtualRegister(TLI->getRegClassFor(RegVT)); FuncInfo->DemoteRegister = SRetReg; NewRoot = SDB->DAG.getCopyToReg(NewRoot, SDB->getCurSDLoc(), SRetReg, ArgValue); DAG.setRoot(NewRoot); // i indexes lowered arguments. Bump it past the hidden sret argument. ++i; } SmallVector Chains; DenseMap ArgCopyElisionFrameIndexMap; for (const Argument &Arg : F.args()) { SmallVector ArgValues; SmallVector ValueVTs; ComputeValueVTs(*TLI, DAG.getDataLayout(), Arg.getType(), ValueVTs); unsigned NumValues = ValueVTs.size(); if (NumValues == 0) continue; bool ArgHasUses = !Arg.use_empty(); // Elide the copying store if the target loaded this argument from a // suitable fixed stack object. if (Ins[i].Flags.isCopyElisionCandidate()) { tryToElideArgumentCopy(*FuncInfo, Chains, ArgCopyElisionFrameIndexMap, ElidedArgCopyInstrs, ArgCopyElisionCandidates, Arg, InVals[i], ArgHasUses); } // If this argument is unused then remember its value. It is used to generate // debugging information. bool isSwiftErrorArg = TLI->supportSwiftError() && Arg.hasAttribute(Attribute::SwiftError); if (!ArgHasUses && !isSwiftErrorArg) { SDB->setUnusedArgValue(&Arg, InVals[i]); // Also remember any frame index for use in FastISel. if (FrameIndexSDNode *FI = dyn_cast(InVals[i].getNode())) FuncInfo->setArgumentFrameIndex(&Arg, FI->getIndex()); } for (unsigned Val = 0; Val != NumValues; ++Val) { EVT VT = ValueVTs[Val]; MVT PartVT = TLI->getRegisterTypeForCallingConv(*CurDAG->getContext(), F.getCallingConv(), VT); unsigned NumParts = TLI->getNumRegistersForCallingConv( *CurDAG->getContext(), F.getCallingConv(), VT); // Even an apparent 'unused' swifterror argument needs to be returned. So // we do generate a copy for it that can be used on return from the // function. if (ArgHasUses || isSwiftErrorArg) { Optional AssertOp; if (Arg.hasAttribute(Attribute::SExt)) AssertOp = ISD::AssertSext; else if (Arg.hasAttribute(Attribute::ZExt)) AssertOp = ISD::AssertZext; ArgValues.push_back(getCopyFromParts(DAG, dl, &InVals[i], NumParts, PartVT, VT, nullptr, F.getCallingConv(), AssertOp)); } i += NumParts; } // We don't need to do anything else for unused arguments. if (ArgValues.empty()) continue; // Note down frame index. if (FrameIndexSDNode *FI = dyn_cast(ArgValues[0].getNode())) FuncInfo->setArgumentFrameIndex(&Arg, FI->getIndex()); SDValue Res = DAG.getMergeValues(makeArrayRef(ArgValues.data(), NumValues), SDB->getCurSDLoc()); SDB->setValue(&Arg, Res); if (!TM.Options.EnableFastISel && Res.getOpcode() == ISD::BUILD_PAIR) { // We want to associate the argument with the frame index, among // involved operands, that correspond to the lowest address. The // getCopyFromParts function, called earlier, is swapping the order of // the operands to BUILD_PAIR depending on endianness. The result of // that swapping is that the least significant bits of the argument will // be in the first operand of the BUILD_PAIR node, and the most // significant bits will be in the second operand. unsigned LowAddressOp = DAG.getDataLayout().isBigEndian() ? 
1 : 0; if (LoadSDNode *LNode = dyn_cast(Res.getOperand(LowAddressOp).getNode())) if (FrameIndexSDNode *FI = dyn_cast(LNode->getBasePtr().getNode())) FuncInfo->setArgumentFrameIndex(&Arg, FI->getIndex()); } // Analyses past this point are naive and don't expect an assertion. if (Res.getOpcode() == ISD::AssertZext) Res = Res.getOperand(0); // Update the SwiftErrorVRegDefMap. if (Res.getOpcode() == ISD::CopyFromReg && isSwiftErrorArg) { unsigned Reg = cast(Res.getOperand(1))->getReg(); if (Register::isVirtualRegister(Reg)) SwiftError->setCurrentVReg(FuncInfo->MBB, SwiftError->getFunctionArg(), Reg); } // If this argument is live outside of the entry block, insert a copy from // wherever we got it to the vreg that other BB's will reference it as. if (Res.getOpcode() == ISD::CopyFromReg) { // If we can, though, try to skip creating an unnecessary vreg. // FIXME: This isn't very clean... it would be nice to make this more // general. unsigned Reg = cast(Res.getOperand(1))->getReg(); if (Register::isVirtualRegister(Reg)) { FuncInfo->ValueMap[&Arg] = Reg; continue; } } if (!isOnlyUsedInEntryBlock(&Arg, TM.Options.EnableFastISel)) { FuncInfo->InitializeRegForValue(&Arg); SDB->CopyToExportRegsIfNeeded(&Arg); } } if (!Chains.empty()) { Chains.push_back(NewRoot); NewRoot = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Chains); } DAG.setRoot(NewRoot); assert(i == InVals.size() && "Argument register count mismatch!"); // If any argument copy elisions occurred and we have debug info, update the // stale frame indices used in the dbg.declare variable info table. MachineFunction::VariableDbgInfoMapTy &DbgDeclareInfo = MF->getVariableDbgInfo(); if (!DbgDeclareInfo.empty() && !ArgCopyElisionFrameIndexMap.empty()) { for (MachineFunction::VariableDbgInfo &VI : DbgDeclareInfo) { auto I = ArgCopyElisionFrameIndexMap.find(VI.Slot); if (I != ArgCopyElisionFrameIndexMap.end()) VI.Slot = I->second; } } // Finally, if the target has anything special to do, allow it to do so. emitFunctionEntryCode(); } /// Handle PHI nodes in successor blocks. Emit code into the SelectionDAG to /// ensure constants are generated when needed. Remember the virtual registers /// that need to be added to the Machine PHI nodes as input. We cannot just /// directly add them, because expansion might result in multiple MBB's for one /// BB. As such, the start of the BB might correspond to a different MBB than /// the end. void SelectionDAGBuilder::HandlePHINodesInSuccessorBlocks(const BasicBlock *LLVMBB) { const Instruction *TI = LLVMBB->getTerminator(); SmallPtrSet SuccsHandled; // Check PHI nodes in successors that expect a value to be available from this // block. for (unsigned succ = 0, e = TI->getNumSuccessors(); succ != e; ++succ) { const BasicBlock *SuccBB = TI->getSuccessor(succ); if (!isa(SuccBB->begin())) continue; MachineBasicBlock *SuccMBB = FuncInfo.MBBMap[SuccBB]; // If this terminator has multiple identical successors (common for // switches), only handle each succ once. if (!SuccsHandled.insert(SuccMBB).second) continue; MachineBasicBlock::iterator MBBI = SuccMBB->begin(); // At this point we know that there is a 1-1 correspondence between LLVM PHI // nodes and Machine PHI nodes, but the incoming operands have not been // emitted yet. for (const PHINode &PN : SuccBB->phis()) { // Ignore dead phi's. 
if (PN.use_empty()) continue; // Skip empty types if (PN.getType()->isEmptyTy()) continue; unsigned Reg; const Value *PHIOp = PN.getIncomingValueForBlock(LLVMBB); if (const Constant *C = dyn_cast(PHIOp)) { unsigned &RegOut = ConstantsOut[C]; if (RegOut == 0) { RegOut = FuncInfo.CreateRegs(C); CopyValueToVirtualRegister(C, RegOut); } Reg = RegOut; } else { DenseMap::iterator I = FuncInfo.ValueMap.find(PHIOp); if (I != FuncInfo.ValueMap.end()) Reg = I->second; else { assert(isa(PHIOp) && FuncInfo.StaticAllocaMap.count(cast(PHIOp)) && "Didn't codegen value into a register!??"); Reg = FuncInfo.CreateRegs(PHIOp); CopyValueToVirtualRegister(PHIOp, Reg); } } // Remember that this register needs to added to the machine PHI node as // the input for this MBB. SmallVector ValueVTs; const TargetLowering &TLI = DAG.getTargetLoweringInfo(); ComputeValueVTs(TLI, DAG.getDataLayout(), PN.getType(), ValueVTs); for (unsigned vti = 0, vte = ValueVTs.size(); vti != vte; ++vti) { EVT VT = ValueVTs[vti]; unsigned NumRegisters = TLI.getNumRegisters(*DAG.getContext(), VT); for (unsigned i = 0, e = NumRegisters; i != e; ++i) FuncInfo.PHINodesToUpdate.push_back( std::make_pair(&*MBBI++, Reg + i)); Reg += NumRegisters; } } } ConstantsOut.clear(); } /// Add a successor MBB to ParentMBB< creating a new MachineBB for BB if SuccMBB /// is 0. MachineBasicBlock * SelectionDAGBuilder::StackProtectorDescriptor:: AddSuccessorMBB(const BasicBlock *BB, MachineBasicBlock *ParentMBB, bool IsLikely, MachineBasicBlock *SuccMBB) { // If SuccBB has not been created yet, create it. if (!SuccMBB) { MachineFunction *MF = ParentMBB->getParent(); MachineFunction::iterator BBI(ParentMBB); SuccMBB = MF->CreateMachineBasicBlock(BB); MF->insert(++BBI, SuccMBB); } // Add it as a successor of ParentMBB. ParentMBB->addSuccessor( SuccMBB, BranchProbabilityInfo::getBranchProbStackProtector(IsLikely)); return SuccMBB; } MachineBasicBlock *SelectionDAGBuilder::NextBlock(MachineBasicBlock *MBB) { MachineFunction::iterator I(MBB); if (++I == FuncInfo.MF->end()) return nullptr; return &*I; } /// During lowering new call nodes can be created (such as memset, etc.). /// Those will become new roots of the current DAG, but complications arise /// when they are tail calls. In such cases, the call lowering will update /// the root, but the builder still needs to know that a tail call has been /// lowered in order to avoid generating an additional return. void SelectionDAGBuilder::updateDAGForMaybeTailCall(SDValue MaybeTC) { // If the node is null, we do have a tail call. if (MaybeTC.getNode() != nullptr) DAG.setRoot(MaybeTC); else HasTailCall = true; } void SelectionDAGBuilder::lowerWorkItem(SwitchWorkListItem W, Value *Cond, MachineBasicBlock *SwitchMBB, MachineBasicBlock *DefaultMBB) { MachineFunction *CurMF = FuncInfo.MF; MachineBasicBlock *NextMBB = nullptr; MachineFunction::iterator BBI(W.MBB); if (++BBI != FuncInfo.MF->end()) NextMBB = &*BBI; unsigned Size = W.LastCluster - W.FirstCluster + 1; BranchProbabilityInfo *BPI = FuncInfo.BPI; if (Size == 2 && W.MBB == SwitchMBB) { // If any two of the cases has the same destination, and if one value // is the same as the other, but has one bit unset that the other has set, // use bit manipulation to do two compares at once. For example: // "if (X == 6 || X == 4)" -> "if ((X|2) == 6)" // TODO: This could be extended to merge any 2 cases in switches with 3 // cases. // TODO: Handle cases where W.CaseBB != SwitchBB. 
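    // Worked example (illustrative, not part of the original source):
    //   case 4 and case 6 share a destination.
    //   SmallValue = 4 (0b100), BigValue = 6 (0b110)
    //   CommonBit  = 4 ^ 6 = 0b010, which is a power of two,
    //   and (X | 0b010) == 6 holds exactly for X == 4 or X == 6,
    // so the two equality tests collapse into the single OR + SETEQ built below.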
CaseCluster &Small = *W.FirstCluster; CaseCluster &Big = *W.LastCluster; if (Small.Low == Small.High && Big.Low == Big.High && Small.MBB == Big.MBB) { const APInt &SmallValue = Small.Low->getValue(); const APInt &BigValue = Big.Low->getValue(); // Check that there is only one bit different. APInt CommonBit = BigValue ^ SmallValue; if (CommonBit.isPowerOf2()) { SDValue CondLHS = getValue(Cond); EVT VT = CondLHS.getValueType(); SDLoc DL = getCurSDLoc(); SDValue Or = DAG.getNode(ISD::OR, DL, VT, CondLHS, DAG.getConstant(CommonBit, DL, VT)); SDValue Cond = DAG.getSetCC( DL, MVT::i1, Or, DAG.getConstant(BigValue | SmallValue, DL, VT), ISD::SETEQ); // Update successor info. // Both Small and Big will jump to Small.BB, so we sum up the // probabilities. addSuccessorWithProb(SwitchMBB, Small.MBB, Small.Prob + Big.Prob); if (BPI) addSuccessorWithProb( SwitchMBB, DefaultMBB, // The default destination is the first successor in IR. BPI->getEdgeProbability(SwitchMBB->getBasicBlock(), (unsigned)0)); else addSuccessorWithProb(SwitchMBB, DefaultMBB); // Insert the true branch. SDValue BrCond = DAG.getNode(ISD::BRCOND, DL, MVT::Other, getControlRoot(), Cond, DAG.getBasicBlock(Small.MBB)); // Insert the false branch. BrCond = DAG.getNode(ISD::BR, DL, MVT::Other, BrCond, DAG.getBasicBlock(DefaultMBB)); DAG.setRoot(BrCond); return; } } } if (TM.getOptLevel() != CodeGenOpt::None) { // Here, we order cases by probability so the most likely case will be // checked first. However, two clusters can have the same probability in // which case their relative ordering is non-deterministic. So we use Low // as a tie-breaker as clusters are guaranteed to never overlap. llvm::sort(W.FirstCluster, W.LastCluster + 1, [](const CaseCluster &a, const CaseCluster &b) { return a.Prob != b.Prob ? a.Prob > b.Prob : a.Low->getValue().slt(b.Low->getValue()); }); // Rearrange the case blocks so that the last one falls through if possible // without changing the order of probabilities. for (CaseClusterIt I = W.LastCluster; I > W.FirstCluster; ) { --I; if (I->Prob > W.LastCluster->Prob) break; if (I->Kind == CC_Range && I->MBB == NextMBB) { std::swap(*I, *W.LastCluster); break; } } } // Compute total probability. BranchProbability DefaultProb = W.DefaultProb; BranchProbability UnhandledProbs = DefaultProb; for (CaseClusterIt I = W.FirstCluster; I <= W.LastCluster; ++I) UnhandledProbs += I->Prob; MachineBasicBlock *CurMBB = W.MBB; for (CaseClusterIt I = W.FirstCluster, E = W.LastCluster; I <= E; ++I) { bool FallthroughUnreachable = false; MachineBasicBlock *Fallthrough; if (I == W.LastCluster) { // For the last cluster, fall through to the default destination. Fallthrough = DefaultMBB; FallthroughUnreachable = isa( DefaultMBB->getBasicBlock()->getFirstNonPHIOrDbg()); } else { Fallthrough = CurMF->CreateMachineBasicBlock(CurMBB->getBasicBlock()); CurMF->insert(BBI, Fallthrough); // Put Cond in a virtual register to make it available from the new blocks. ExportFromCurrentBlock(Cond); } UnhandledProbs -= I->Prob; switch (I->Kind) { case CC_JumpTable: { // FIXME: Optimize away range check based on pivot comparisons. JumpTableHeader *JTH = &SL->JTCases[I->JTCasesIndex].first; SwitchCG::JumpTable *JT = &SL->JTCases[I->JTCasesIndex].second; // The jump block hasn't been inserted yet; insert it here. 
MachineBasicBlock *JumpMBB = JT->MBB; CurMF->insert(BBI, JumpMBB); auto JumpProb = I->Prob; auto FallthroughProb = UnhandledProbs; // If the default statement is a target of the jump table, we evenly // distribute the default probability to successors of CurMBB. Also // update the probability on the edge from JumpMBB to Fallthrough. for (MachineBasicBlock::succ_iterator SI = JumpMBB->succ_begin(), SE = JumpMBB->succ_end(); SI != SE; ++SI) { if (*SI == DefaultMBB) { JumpProb += DefaultProb / 2; FallthroughProb -= DefaultProb / 2; JumpMBB->setSuccProbability(SI, DefaultProb / 2); JumpMBB->normalizeSuccProbs(); break; } } if (FallthroughUnreachable) { // Skip the range check if the fallthrough block is unreachable. JTH->OmitRangeCheck = true; } if (!JTH->OmitRangeCheck) addSuccessorWithProb(CurMBB, Fallthrough, FallthroughProb); addSuccessorWithProb(CurMBB, JumpMBB, JumpProb); CurMBB->normalizeSuccProbs(); // The jump table header will be inserted in our current block, do the // range check, and fall through to our fallthrough block. JTH->HeaderBB = CurMBB; JT->Default = Fallthrough; // FIXME: Move Default to JumpTableHeader. // If we're in the right place, emit the jump table header right now. if (CurMBB == SwitchMBB) { visitJumpTableHeader(*JT, *JTH, SwitchMBB); JTH->Emitted = true; } break; } case CC_BitTests: { // FIXME: Optimize away range check based on pivot comparisons. BitTestBlock *BTB = &SL->BitTestCases[I->BTCasesIndex]; // The bit test blocks haven't been inserted yet; insert them here. for (BitTestCase &BTC : BTB->Cases) CurMF->insert(BBI, BTC.ThisBB); // Fill in fields of the BitTestBlock. BTB->Parent = CurMBB; BTB->Default = Fallthrough; BTB->DefaultProb = UnhandledProbs; // If the cases in bit test don't form a contiguous range, we evenly // distribute the probability on the edge to Fallthrough to two // successors of CurMBB. if (!BTB->ContiguousRange) { BTB->Prob += DefaultProb / 2; BTB->DefaultProb -= DefaultProb / 2; } if (FallthroughUnreachable) { // Skip the range check if the fallthrough block is unreachable. BTB->OmitRangeCheck = true; } // If we're in the right place, emit the bit test header right now. if (CurMBB == SwitchMBB) { visitBitTestHeader(*BTB, SwitchMBB); BTB->Emitted = true; } break; } case CC_Range: { const Value *RHS, *LHS, *MHS; ISD::CondCode CC; if (I->Low == I->High) { // Check Cond == I->Low. CC = ISD::SETEQ; LHS = Cond; RHS=I->Low; MHS = nullptr; } else { // Check I->Low <= Cond <= I->High. CC = ISD::SETLE; LHS = I->Low; MHS = Cond; RHS = I->High; } // If Fallthrough is unreachable, fold away the comparison. if (FallthroughUnreachable) CC = ISD::SETTRUE; // The false probability is the sum of all unhandled cases. CaseBlock CB(CC, LHS, RHS, MHS, I->MBB, Fallthrough, CurMBB, getCurSDLoc(), I->Prob, UnhandledProbs); if (CurMBB == SwitchMBB) visitSwitchCase(CB, SwitchMBB); else SL->SwitchCases.push_back(CB); break; } } CurMBB = Fallthrough; } } unsigned SelectionDAGBuilder::caseClusterRank(const CaseCluster &CC, CaseClusterIt First, CaseClusterIt Last) { return std::count_if(First, Last + 1, [&](const CaseCluster &X) { if (X.Prob != CC.Prob) return X.Prob > CC.Prob; // Ties are broken by comparing the case value. 
    return X.Low->getValue().slt(CC.Low->getValue());
  });
}

void SelectionDAGBuilder::splitWorkItem(SwitchWorkList &WorkList,
                                        const SwitchWorkListItem &W,
                                        Value *Cond,
                                        MachineBasicBlock *SwitchMBB) {
  assert(W.FirstCluster->Low->getValue().slt(W.LastCluster->Low->getValue()) &&
         "Clusters not sorted?");
  assert(W.LastCluster - W.FirstCluster + 1 >= 2 && "Too small to split!");

  // Balance the tree based on branch probabilities to create a near-optimal
  // (in terms of search time given key frequency) binary search tree. See
  // e.g. Kurt Mehlhorn "Nearly Optimal Binary Search Trees" (1975).
  CaseClusterIt LastLeft = W.FirstCluster;
  CaseClusterIt FirstRight = W.LastCluster;
  auto LeftProb = LastLeft->Prob + W.DefaultProb / 2;
  auto RightProb = FirstRight->Prob + W.DefaultProb / 2;

  // Move LastLeft and FirstRight towards each other from opposite directions
  // to find a partitioning of the clusters which balances the probability on
  // both sides. If LeftProb and RightProb are equal, alternate which side is
  // taken to ensure 0-probability nodes are distributed evenly.
  unsigned I = 0;
  while (LastLeft + 1 < FirstRight) {
    if (LeftProb < RightProb || (LeftProb == RightProb && (I & 1)))
      LeftProb += (++LastLeft)->Prob;
    else
      RightProb += (--FirstRight)->Prob;
    I++;
  }

  while (true) {
    // Our binary search tree differs from a typical BST in that ours can have
    // up to three values in each leaf. The pivot selection above doesn't take
    // that into account, which means the tree might require more nodes and be
    // less efficient. We compensate for this here.

    unsigned NumLeft = LastLeft - W.FirstCluster + 1;
    unsigned NumRight = W.LastCluster - FirstRight + 1;

    if (std::min(NumLeft, NumRight) < 3 && std::max(NumLeft, NumRight) > 3) {
      // If one side has less than 3 clusters, and the other has more than 3,
      // consider taking a cluster from the other side.

      if (NumLeft < NumRight) {
        // Consider moving the first cluster on the right to the left side.
        CaseCluster &CC = *FirstRight;
        unsigned RightSideRank = caseClusterRank(CC, FirstRight, W.LastCluster);
        unsigned LeftSideRank = caseClusterRank(CC, W.FirstCluster, LastLeft);
        if (LeftSideRank <= RightSideRank) {
          // Moving the cluster to the left does not demote it.
          ++LastLeft;
          ++FirstRight;
          continue;
        }
      } else {
        assert(NumRight < NumLeft);
        // Consider moving the last element on the left to the right side.
        CaseCluster &CC = *LastLeft;
        unsigned LeftSideRank = caseClusterRank(CC, W.FirstCluster, LastLeft);
        unsigned RightSideRank = caseClusterRank(CC, FirstRight, W.LastCluster);
        if (RightSideRank <= LeftSideRank) {
          // Moving the cluster to the right does not demote it.
          --LastLeft;
          --FirstRight;
          continue;
        }
      }
    }
    break;
  }

  assert(LastLeft + 1 == FirstRight);
  assert(LastLeft >= W.FirstCluster);
  assert(FirstRight <= W.LastCluster);

  // Use the first element on the right as pivot since we will make less-than
  // comparisons against it.
  CaseClusterIt PivotCluster = FirstRight;
  assert(PivotCluster > W.FirstCluster);
  assert(PivotCluster <= W.LastCluster);

  CaseClusterIt FirstLeft = W.FirstCluster;
  CaseClusterIt LastRight = W.LastCluster;

  const ConstantInt *Pivot = PivotCluster->Low;

  // New blocks will be inserted immediately after the current one.
  MachineFunction::iterator BBI(W.MBB);
  ++BBI;

  // We will branch to the LHS if Value < Pivot. If LHS is a single cluster,
  // we can branch to its destination directly if it's squeezed exactly in
  // between the known lower bound and Pivot - 1.
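  // Worked example (illustrative, not part of the original source): with four
  // clusters of probabilities {0.1, 0.2, 0.3, 0.4} and a negligible default
  // probability, the walk above ends with {0.1, 0.2, 0.3} on the left
  // (LeftProb = 0.6) and {0.4} alone on the right (RightProb = 0.4), so the
  // hottest cluster sits directly under the root comparison against the pivot.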
  MachineBasicBlock *LeftMBB;
  if (FirstLeft == LastLeft && FirstLeft->Kind == CC_Range &&
      FirstLeft->Low == W.GE &&
      (FirstLeft->High->getValue() + 1LL) == Pivot->getValue()) {
    LeftMBB = FirstLeft->MBB;
  } else {
    LeftMBB = FuncInfo.MF->CreateMachineBasicBlock(W.MBB->getBasicBlock());
    FuncInfo.MF->insert(BBI, LeftMBB);
    WorkList.push_back(
        {LeftMBB, FirstLeft, LastLeft, W.GE, Pivot, W.DefaultProb / 2});
    // Put Cond in a virtual register to make it available from the new blocks.
    ExportFromCurrentBlock(Cond);
  }

  // Similarly, we will branch to the RHS if Value >= Pivot. If RHS is a
  // single cluster, RHS.Low == Pivot, and we can branch to its destination
  // directly if RHS.High equals the current upper bound.
  MachineBasicBlock *RightMBB;
  if (FirstRight == LastRight && FirstRight->Kind == CC_Range && W.LT &&
      (FirstRight->High->getValue() + 1ULL) == W.LT->getValue()) {
    RightMBB = FirstRight->MBB;
  } else {
    RightMBB = FuncInfo.MF->CreateMachineBasicBlock(W.MBB->getBasicBlock());
    FuncInfo.MF->insert(BBI, RightMBB);
    WorkList.push_back(
        {RightMBB, FirstRight, LastRight, Pivot, W.LT, W.DefaultProb / 2});
    // Put Cond in a virtual register to make it available from the new blocks.
    ExportFromCurrentBlock(Cond);
  }

  // Create the CaseBlock record that will be used to lower the branch.
  CaseBlock CB(ISD::SETLT, Cond, Pivot, nullptr, LeftMBB, RightMBB, W.MBB,
               getCurSDLoc(), LeftProb, RightProb);

  if (W.MBB == SwitchMBB)
    visitSwitchCase(CB, SwitchMBB);
  else
    SL->SwitchCases.push_back(CB);
}

// Scale CaseProb after peeling a case with the probability of PeeledCaseProb
// from the switch statement.
static BranchProbability scaleCaseProbality(BranchProbability CaseProb,
                                            BranchProbability PeeledCaseProb) {
  if (PeeledCaseProb == BranchProbability::getOne())
    return BranchProbability::getZero();
  BranchProbability SwitchProb = PeeledCaseProb.getCompl();

  uint32_t Numerator = CaseProb.getNumerator();
  uint32_t Denominator = SwitchProb.scale(CaseProb.getDenominator());
  return BranchProbability(Numerator, std::max(Numerator, Denominator));
}

// Try to peel the top probability case if it exceeds the threshold.
// Return current MachineBasicBlock for the switch statement if the peeling
// does not occur.
// If the peeling is performed, return the newly created MachineBasicBlock
// for the peeled switch statement. Also update Clusters to remove the peeled
// case. PeeledCaseProb is the BranchProbability for the peeled case.
MachineBasicBlock *SelectionDAGBuilder::peelDominantCaseCluster(
    const SwitchInst &SI, CaseClusterVector &Clusters,
    BranchProbability &PeeledCaseProb) {
  MachineBasicBlock *SwitchMBB = FuncInfo.MBB;
  // Don't perform if there is only one cluster or optimizing for size.
  if (SwitchPeelThreshold > 100 || !FuncInfo.BPI || Clusters.size() < 2 ||
      TM.getOptLevel() == CodeGenOpt::None ||
      SwitchMBB->getParent()->getFunction().hasMinSize())
    return SwitchMBB;

  BranchProbability TopCaseProb = BranchProbability(SwitchPeelThreshold, 100);
  unsigned PeeledCaseIndex = 0;
  bool SwitchPeeled = false;
  for (unsigned Index = 0; Index < Clusters.size(); ++Index) {
    CaseCluster &CC = Clusters[Index];
    if (CC.Prob < TopCaseProb)
      continue;
    TopCaseProb = CC.Prob;
    PeeledCaseIndex = Index;
    SwitchPeeled = true;
  }
  if (!SwitchPeeled)
    return SwitchMBB;

  LLVM_DEBUG(dbgs() << "Peeled one top case in switch stmt, prob: "
                    << TopCaseProb << "\n");

  // Record the MBB for the peeled switch statement.
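  // Worked example (illustrative, not part of the original source): if the
  // peeled case had probability 3/4, the remaining switch is only reached with
  // probability 1/4, so a leftover case that originally had probability 1/8 is
  // rescaled below (via scaleCaseProbality) to 1/2 relative to the new
  // peeled-off switch block.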
  MachineFunction::iterator BBI(SwitchMBB);
  ++BBI;
  MachineBasicBlock *PeeledSwitchMBB =
      FuncInfo.MF->CreateMachineBasicBlock(SwitchMBB->getBasicBlock());
  FuncInfo.MF->insert(BBI, PeeledSwitchMBB);

  ExportFromCurrentBlock(SI.getCondition());
  auto PeeledCaseIt = Clusters.begin() + PeeledCaseIndex;
  SwitchWorkListItem W = {SwitchMBB, PeeledCaseIt, PeeledCaseIt,
                          nullptr, nullptr, TopCaseProb.getCompl()};
  lowerWorkItem(W, SI.getCondition(), SwitchMBB, PeeledSwitchMBB);

  Clusters.erase(PeeledCaseIt);
  for (CaseCluster &CC : Clusters) {
    LLVM_DEBUG(
        dbgs() << "Scale the probability for one cluster, before scaling: "
               << CC.Prob << "\n");
    CC.Prob = scaleCaseProbality(CC.Prob, TopCaseProb);
    LLVM_DEBUG(dbgs() << "After scaling: " << CC.Prob << "\n");
  }
  PeeledCaseProb = TopCaseProb;
  return PeeledSwitchMBB;
}

void SelectionDAGBuilder::visitSwitch(const SwitchInst &SI) {
  // Extract cases from the switch.
  BranchProbabilityInfo *BPI = FuncInfo.BPI;
  CaseClusterVector Clusters;
  Clusters.reserve(SI.getNumCases());
  for (auto I : SI.cases()) {
    MachineBasicBlock *Succ = FuncInfo.MBBMap[I.getCaseSuccessor()];
    const ConstantInt *CaseVal = I.getCaseValue();
    BranchProbability Prob =
        BPI ? BPI->getEdgeProbability(SI.getParent(), I.getSuccessorIndex())
            : BranchProbability(1, SI.getNumCases() + 1);
    Clusters.push_back(CaseCluster::range(CaseVal, CaseVal, Succ, Prob));
  }

  MachineBasicBlock *DefaultMBB = FuncInfo.MBBMap[SI.getDefaultDest()];

  // Cluster adjacent cases with the same destination. We do this at all
  // optimization levels because it's cheap to do and will make codegen faster
  // if there are many clusters.
  sortAndRangeify(Clusters);

  // The branch probability of the peeled case.
  BranchProbability PeeledCaseProb = BranchProbability::getZero();
  MachineBasicBlock *PeeledSwitchMBB =
      peelDominantCaseCluster(SI, Clusters, PeeledCaseProb);

  // If there is only the default destination, jump there directly.
  MachineBasicBlock *SwitchMBB = FuncInfo.MBB;
  if (Clusters.empty()) {
    assert(PeeledSwitchMBB == SwitchMBB);
    SwitchMBB->addSuccessor(DefaultMBB);
    if (DefaultMBB != NextBlock(SwitchMBB)) {
      DAG.setRoot(DAG.getNode(ISD::BR, getCurSDLoc(), MVT::Other,
                              getControlRoot(), DAG.getBasicBlock(DefaultMBB)));
    }
    return;
  }

  SL->findJumpTables(Clusters, &SI, DefaultMBB, DAG.getPSI(), DAG.getBFI());
  SL->findBitTestClusters(Clusters, &SI);

  LLVM_DEBUG({
    dbgs() << "Case clusters: ";
    for (const CaseCluster &C : Clusters) {
      if (C.Kind == CC_JumpTable)
        dbgs() << "JT:";
      if (C.Kind == CC_BitTests)
        dbgs() << "BT:";

      C.Low->getValue().print(dbgs(), true);
      if (C.Low != C.High) {
        dbgs() << '-';
        C.High->getValue().print(dbgs(), true);
      }
      dbgs() << ' ';
    }
    dbgs() << '\n';
  });

  assert(!Clusters.empty());
  SwitchWorkList WorkList;
  CaseClusterIt First = Clusters.begin();
  CaseClusterIt Last = Clusters.end() - 1;
  auto DefaultProb = getEdgeProbability(PeeledSwitchMBB, DefaultMBB);
  // Scale the branch probability for DefaultMBB if the peel occurs and
  // DefaultMBB is not replaced.
  if (PeeledCaseProb != BranchProbability::getZero() &&
      DefaultMBB == FuncInfo.MBBMap[SI.getDefaultDest()])
    DefaultProb = scaleCaseProbality(DefaultProb, PeeledCaseProb);
  WorkList.push_back(
      {PeeledSwitchMBB, First, Last, nullptr, nullptr, DefaultProb});

  while (!WorkList.empty()) {
    SwitchWorkListItem W = WorkList.back();
    WorkList.pop_back();
    unsigned NumClusters = W.LastCluster - W.FirstCluster + 1;

    if (NumClusters > 3 && TM.getOptLevel() != CodeGenOpt::None &&
        !DefaultMBB->getParent()->getFunction().hasMinSize()) {
      // For optimized builds, lower large range as a balanced binary tree.
splitWorkItem(WorkList, W, SI.getCondition(), SwitchMBB); continue; } lowerWorkItem(W, SI.getCondition(), SwitchMBB, DefaultMBB); } } void SelectionDAGBuilder::visitFreeze(const FreezeInst &I) { SmallVector ValueVTs; ComputeValueVTs(DAG.getTargetLoweringInfo(), DAG.getDataLayout(), I.getType(), ValueVTs); unsigned NumValues = ValueVTs.size(); if (NumValues == 0) return; SmallVector Values(NumValues); SDValue Op = getValue(I.getOperand(0)); for (unsigned i = 0; i != NumValues; ++i) Values[i] = DAG.getNode(ISD::FREEZE, getCurSDLoc(), ValueVTs[i], SDValue(Op.getNode(), Op.getResNo() + i)); setValue(&I, DAG.getNode(ISD::MERGE_VALUES, getCurSDLoc(), DAG.getVTList(ValueVTs), Values)); } Index: vendor/llvm-project/release-11.x/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp =================================================================== --- vendor/llvm-project/release-11.x/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (revision 366332) +++ vendor/llvm-project/release-11.x/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (revision 366333) @@ -1,7902 +1,7910 @@ //===-- TargetLowering.cpp - Implement the TargetLowering class -----------===// // // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // See https://llvm.org/LICENSE.txt for license information. // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // //===----------------------------------------------------------------------===// // // This implements the TargetLowering class. // //===----------------------------------------------------------------------===// #include "llvm/CodeGen/TargetLowering.h" #include "llvm/ADT/STLExtras.h" #include "llvm/CodeGen/CallingConvLower.h" #include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineJumpTableInfo.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/SelectionDAG.h" #include "llvm/CodeGen/TargetRegisterInfo.h" #include "llvm/CodeGen/TargetSubtargetInfo.h" #include "llvm/IR/DataLayout.h" #include "llvm/IR/DerivedTypes.h" #include "llvm/IR/GlobalVariable.h" #include "llvm/IR/LLVMContext.h" #include "llvm/MC/MCAsmInfo.h" #include "llvm/MC/MCExpr.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/KnownBits.h" #include "llvm/Support/MathExtras.h" #include "llvm/Target/TargetLoweringObjectFile.h" #include "llvm/Target/TargetMachine.h" #include using namespace llvm; /// NOTE: The TargetMachine owns TLOF. TargetLowering::TargetLowering(const TargetMachine &tm) : TargetLoweringBase(tm) {} const char *TargetLowering::getTargetNodeName(unsigned Opcode) const { return nullptr; } bool TargetLowering::isPositionIndependent() const { return getTargetMachine().isPositionIndependent(); } /// Check whether a given call node is in tail position within its function. If /// so, it sets Chain to the input chain of the tail call. bool TargetLowering::isInTailCallPosition(SelectionDAG &DAG, SDNode *Node, SDValue &Chain) const { const Function &F = DAG.getMachineFunction().getFunction(); // First, check if tail calls have been disabled in this function. if (F.getFnAttribute("disable-tail-calls").getValueAsString() == "true") return false; // Conservatively require the attributes of the call to match those of // the return. Ignore NoAlias and NonNull because they don't affect the // call sequence. 
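  // Illustrative example (not part of the original source): if the caller's
  // return is declared "zeroext i8", the zero-extension it promises to perform
  // on the value returned by the tail-callee cannot be dropped, so the checks
  // below conservatively report that this is not a tail call position.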
AttributeList CallerAttrs = F.getAttributes(); if (AttrBuilder(CallerAttrs, AttributeList::ReturnIndex) .removeAttribute(Attribute::NoAlias) .removeAttribute(Attribute::NonNull) .hasAttributes()) return false; // It's not safe to eliminate the sign / zero extension of the return value. if (CallerAttrs.hasAttribute(AttributeList::ReturnIndex, Attribute::ZExt) || CallerAttrs.hasAttribute(AttributeList::ReturnIndex, Attribute::SExt)) return false; // Check if the only use is a function return node. return isUsedByReturnOnly(Node, Chain); } bool TargetLowering::parametersInCSRMatch(const MachineRegisterInfo &MRI, const uint32_t *CallerPreservedMask, const SmallVectorImpl &ArgLocs, const SmallVectorImpl &OutVals) const { for (unsigned I = 0, E = ArgLocs.size(); I != E; ++I) { const CCValAssign &ArgLoc = ArgLocs[I]; if (!ArgLoc.isRegLoc()) continue; MCRegister Reg = ArgLoc.getLocReg(); // Only look at callee saved registers. if (MachineOperand::clobbersPhysReg(CallerPreservedMask, Reg)) continue; // Check that we pass the value used for the caller. // (We look for a CopyFromReg reading a virtual register that is used // for the function live-in value of register Reg) SDValue Value = OutVals[I]; if (Value->getOpcode() != ISD::CopyFromReg) return false; MCRegister ArgReg = cast(Value->getOperand(1))->getReg(); if (MRI.getLiveInPhysReg(ArgReg) != Reg) return false; } return true; } /// Set CallLoweringInfo attribute flags based on a call instruction /// and called function attributes. void TargetLoweringBase::ArgListEntry::setAttributes(const CallBase *Call, unsigned ArgIdx) { IsSExt = Call->paramHasAttr(ArgIdx, Attribute::SExt); IsZExt = Call->paramHasAttr(ArgIdx, Attribute::ZExt); IsInReg = Call->paramHasAttr(ArgIdx, Attribute::InReg); IsSRet = Call->paramHasAttr(ArgIdx, Attribute::StructRet); IsNest = Call->paramHasAttr(ArgIdx, Attribute::Nest); IsByVal = Call->paramHasAttr(ArgIdx, Attribute::ByVal); IsPreallocated = Call->paramHasAttr(ArgIdx, Attribute::Preallocated); IsInAlloca = Call->paramHasAttr(ArgIdx, Attribute::InAlloca); IsReturned = Call->paramHasAttr(ArgIdx, Attribute::Returned); IsSwiftSelf = Call->paramHasAttr(ArgIdx, Attribute::SwiftSelf); IsSwiftError = Call->paramHasAttr(ArgIdx, Attribute::SwiftError); Alignment = Call->getParamAlign(ArgIdx); ByValType = nullptr; if (IsByVal) ByValType = Call->getParamByValType(ArgIdx); PreallocatedType = nullptr; if (IsPreallocated) PreallocatedType = Call->getParamPreallocatedType(ArgIdx); } /// Generate a libcall taking the given operands as arguments and returning a /// result of type RetVT. 
std::pair TargetLowering::makeLibCall(SelectionDAG &DAG, RTLIB::Libcall LC, EVT RetVT, ArrayRef Ops, MakeLibCallOptions CallOptions, const SDLoc &dl, SDValue InChain) const { if (!InChain) InChain = DAG.getEntryNode(); TargetLowering::ArgListTy Args; Args.reserve(Ops.size()); TargetLowering::ArgListEntry Entry; for (unsigned i = 0; i < Ops.size(); ++i) { SDValue NewOp = Ops[i]; Entry.Node = NewOp; Entry.Ty = Entry.Node.getValueType().getTypeForEVT(*DAG.getContext()); Entry.IsSExt = shouldSignExtendTypeInLibCall(NewOp.getValueType(), CallOptions.IsSExt); Entry.IsZExt = !Entry.IsSExt; if (CallOptions.IsSoften && !shouldExtendTypeInLibCall(CallOptions.OpsVTBeforeSoften[i])) { Entry.IsSExt = Entry.IsZExt = false; } Args.push_back(Entry); } if (LC == RTLIB::UNKNOWN_LIBCALL) report_fatal_error("Unsupported library call operation!"); SDValue Callee = DAG.getExternalSymbol(getLibcallName(LC), getPointerTy(DAG.getDataLayout())); Type *RetTy = RetVT.getTypeForEVT(*DAG.getContext()); TargetLowering::CallLoweringInfo CLI(DAG); bool signExtend = shouldSignExtendTypeInLibCall(RetVT, CallOptions.IsSExt); bool zeroExtend = !signExtend; if (CallOptions.IsSoften && !shouldExtendTypeInLibCall(CallOptions.RetVTBeforeSoften)) { signExtend = zeroExtend = false; } CLI.setDebugLoc(dl) .setChain(InChain) .setLibCallee(getLibcallCallingConv(LC), RetTy, Callee, std::move(Args)) .setNoReturn(CallOptions.DoesNotReturn) .setDiscardResult(!CallOptions.IsReturnValueUsed) .setIsPostTypeLegalization(CallOptions.IsPostTypeLegalization) .setSExtResult(signExtend) .setZExtResult(zeroExtend); return LowerCallTo(CLI); } bool TargetLowering::findOptimalMemOpLowering( std::vector &MemOps, unsigned Limit, const MemOp &Op, unsigned DstAS, unsigned SrcAS, const AttributeList &FuncAttributes) const { if (Op.isMemcpyWithFixedDstAlign() && Op.getSrcAlign() < Op.getDstAlign()) return false; EVT VT = getOptimalMemOpType(Op, FuncAttributes); if (VT == MVT::Other) { // Use the largest integer type whose alignment constraints are satisfied. // We only need to check DstAlign here as SrcAlign is always greater or // equal to DstAlign (or zero). VT = MVT::i64; if (Op.isFixedDstAlign()) while ( Op.getDstAlign() < (VT.getSizeInBits() / 8) && !allowsMisalignedMemoryAccesses(VT, DstAS, Op.getDstAlign().value())) VT = (MVT::SimpleValueType)(VT.getSimpleVT().SimpleTy - 1); assert(VT.isInteger()); // Find the largest legal integer type. MVT LVT = MVT::i64; while (!isTypeLegal(LVT)) LVT = (MVT::SimpleValueType)(LVT.SimpleTy - 1); assert(LVT.isInteger()); // If the type we've chosen is larger than the largest legal integer type // then use that instead. if (VT.bitsGT(LVT)) VT = LVT; } unsigned NumMemOps = 0; uint64_t Size = Op.size(); while (Size) { unsigned VTSize = VT.getSizeInBits() / 8; while (VTSize > Size) { // For now, only use non-vector load / store's for the left-over pieces. EVT NewVT = VT; unsigned NewVTSize; bool Found = false; if (VT.isVector() || VT.isFloatingPoint()) { NewVT = (VT.getSizeInBits() > 64) ? MVT::i64 : MVT::i32; if (isOperationLegalOrCustom(ISD::STORE, NewVT) && isSafeMemOpType(NewVT.getSimpleVT())) Found = true; else if (NewVT == MVT::i64 && isOperationLegalOrCustom(ISD::STORE, MVT::f64) && isSafeMemOpType(MVT::f64)) { // i64 is usually not legal on 32-bit targets, but f64 may be. 
NewVT = MVT::f64; Found = true; } } if (!Found) { do { NewVT = (MVT::SimpleValueType)(NewVT.getSimpleVT().SimpleTy - 1); if (NewVT == MVT::i8) break; } while (!isSafeMemOpType(NewVT.getSimpleVT())); } NewVTSize = NewVT.getSizeInBits() / 8; // If the new VT cannot cover all of the remaining bits, then consider // issuing a (or a pair of) unaligned and overlapping load / store. bool Fast; if (NumMemOps && Op.allowOverlap() && NewVTSize < Size && allowsMisalignedMemoryAccesses( VT, DstAS, Op.isFixedDstAlign() ? Op.getDstAlign().value() : 0, MachineMemOperand::MONone, &Fast) && Fast) VTSize = Size; else { VT = NewVT; VTSize = NewVTSize; } } if (++NumMemOps > Limit) return false; MemOps.push_back(VT); Size -= VTSize; } return true; } /// Soften the operands of a comparison. This code is shared among BR_CC, /// SELECT_CC, and SETCC handlers. void TargetLowering::softenSetCCOperands(SelectionDAG &DAG, EVT VT, SDValue &NewLHS, SDValue &NewRHS, ISD::CondCode &CCCode, const SDLoc &dl, const SDValue OldLHS, const SDValue OldRHS) const { SDValue Chain; return softenSetCCOperands(DAG, VT, NewLHS, NewRHS, CCCode, dl, OldLHS, OldRHS, Chain); } void TargetLowering::softenSetCCOperands(SelectionDAG &DAG, EVT VT, SDValue &NewLHS, SDValue &NewRHS, ISD::CondCode &CCCode, const SDLoc &dl, const SDValue OldLHS, const SDValue OldRHS, SDValue &Chain, bool IsSignaling) const { // FIXME: Currently we cannot really respect all IEEE predicates due to libgcc // not supporting it. We can update this code when libgcc provides such // functions. assert((VT == MVT::f32 || VT == MVT::f64 || VT == MVT::f128 || VT == MVT::ppcf128) && "Unsupported setcc type!"); // Expand into one or more soft-fp libcall(s). RTLIB::Libcall LC1 = RTLIB::UNKNOWN_LIBCALL, LC2 = RTLIB::UNKNOWN_LIBCALL; bool ShouldInvertCC = false; switch (CCCode) { case ISD::SETEQ: case ISD::SETOEQ: LC1 = (VT == MVT::f32) ? RTLIB::OEQ_F32 : (VT == MVT::f64) ? RTLIB::OEQ_F64 : (VT == MVT::f128) ? RTLIB::OEQ_F128 : RTLIB::OEQ_PPCF128; break; case ISD::SETNE: case ISD::SETUNE: LC1 = (VT == MVT::f32) ? RTLIB::UNE_F32 : (VT == MVT::f64) ? RTLIB::UNE_F64 : (VT == MVT::f128) ? RTLIB::UNE_F128 : RTLIB::UNE_PPCF128; break; case ISD::SETGE: case ISD::SETOGE: LC1 = (VT == MVT::f32) ? RTLIB::OGE_F32 : (VT == MVT::f64) ? RTLIB::OGE_F64 : (VT == MVT::f128) ? RTLIB::OGE_F128 : RTLIB::OGE_PPCF128; break; case ISD::SETLT: case ISD::SETOLT: LC1 = (VT == MVT::f32) ? RTLIB::OLT_F32 : (VT == MVT::f64) ? RTLIB::OLT_F64 : (VT == MVT::f128) ? RTLIB::OLT_F128 : RTLIB::OLT_PPCF128; break; case ISD::SETLE: case ISD::SETOLE: LC1 = (VT == MVT::f32) ? RTLIB::OLE_F32 : (VT == MVT::f64) ? RTLIB::OLE_F64 : (VT == MVT::f128) ? RTLIB::OLE_F128 : RTLIB::OLE_PPCF128; break; case ISD::SETGT: case ISD::SETOGT: LC1 = (VT == MVT::f32) ? RTLIB::OGT_F32 : (VT == MVT::f64) ? RTLIB::OGT_F64 : (VT == MVT::f128) ? RTLIB::OGT_F128 : RTLIB::OGT_PPCF128; break; case ISD::SETO: ShouldInvertCC = true; LLVM_FALLTHROUGH; case ISD::SETUO: LC1 = (VT == MVT::f32) ? RTLIB::UO_F32 : (VT == MVT::f64) ? RTLIB::UO_F64 : (VT == MVT::f128) ? RTLIB::UO_F128 : RTLIB::UO_PPCF128; break; case ISD::SETONE: // SETONE = O && UNE ShouldInvertCC = true; LLVM_FALLTHROUGH; case ISD::SETUEQ: LC1 = (VT == MVT::f32) ? RTLIB::UO_F32 : (VT == MVT::f64) ? RTLIB::UO_F64 : (VT == MVT::f128) ? RTLIB::UO_F128 : RTLIB::UO_PPCF128; LC2 = (VT == MVT::f32) ? RTLIB::OEQ_F32 : (VT == MVT::f64) ? RTLIB::OEQ_F64 : (VT == MVT::f128) ? 
RTLIB::OEQ_F128 : RTLIB::OEQ_PPCF128; break; default: // Invert CC for unordered comparisons ShouldInvertCC = true; switch (CCCode) { case ISD::SETULT: LC1 = (VT == MVT::f32) ? RTLIB::OGE_F32 : (VT == MVT::f64) ? RTLIB::OGE_F64 : (VT == MVT::f128) ? RTLIB::OGE_F128 : RTLIB::OGE_PPCF128; break; case ISD::SETULE: LC1 = (VT == MVT::f32) ? RTLIB::OGT_F32 : (VT == MVT::f64) ? RTLIB::OGT_F64 : (VT == MVT::f128) ? RTLIB::OGT_F128 : RTLIB::OGT_PPCF128; break; case ISD::SETUGT: LC1 = (VT == MVT::f32) ? RTLIB::OLE_F32 : (VT == MVT::f64) ? RTLIB::OLE_F64 : (VT == MVT::f128) ? RTLIB::OLE_F128 : RTLIB::OLE_PPCF128; break; case ISD::SETUGE: LC1 = (VT == MVT::f32) ? RTLIB::OLT_F32 : (VT == MVT::f64) ? RTLIB::OLT_F64 : (VT == MVT::f128) ? RTLIB::OLT_F128 : RTLIB::OLT_PPCF128; break; default: llvm_unreachable("Do not know how to soften this setcc!"); } } // Use the target specific return value for comparions lib calls. EVT RetVT = getCmpLibcallReturnType(); SDValue Ops[2] = {NewLHS, NewRHS}; TargetLowering::MakeLibCallOptions CallOptions; EVT OpsVT[2] = { OldLHS.getValueType(), OldRHS.getValueType() }; CallOptions.setTypeListBeforeSoften(OpsVT, RetVT, true); auto Call = makeLibCall(DAG, LC1, RetVT, Ops, CallOptions, dl, Chain); NewLHS = Call.first; NewRHS = DAG.getConstant(0, dl, RetVT); CCCode = getCmpLibcallCC(LC1); if (ShouldInvertCC) { assert(RetVT.isInteger()); CCCode = getSetCCInverse(CCCode, RetVT); } if (LC2 == RTLIB::UNKNOWN_LIBCALL) { // Update Chain. Chain = Call.second; } else { EVT SetCCVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), RetVT); SDValue Tmp = DAG.getSetCC(dl, SetCCVT, NewLHS, NewRHS, CCCode); auto Call2 = makeLibCall(DAG, LC2, RetVT, Ops, CallOptions, dl, Chain); CCCode = getCmpLibcallCC(LC2); if (ShouldInvertCC) CCCode = getSetCCInverse(CCCode, RetVT); NewLHS = DAG.getSetCC(dl, SetCCVT, Call2.first, NewRHS, CCCode); if (Chain) Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Call.second, Call2.second); NewLHS = DAG.getNode(ShouldInvertCC ? ISD::AND : ISD::OR, dl, Tmp.getValueType(), Tmp, NewLHS); NewRHS = SDValue(); } } /// Return the entry encoding for a jump table in the current function. The /// returned value is a member of the MachineJumpTableInfo::JTEntryKind enum. unsigned TargetLowering::getJumpTableEncoding() const { // In non-pic modes, just use the address of a block. if (!isPositionIndependent()) return MachineJumpTableInfo::EK_BlockAddress; // In PIC mode, if the target supports a GPRel32 directive, use it. if (getTargetMachine().getMCAsmInfo()->getGPRel32Directive() != nullptr) return MachineJumpTableInfo::EK_GPRel32BlockAddress; // Otherwise, use a label difference. return MachineJumpTableInfo::EK_LabelDifference32; } SDValue TargetLowering::getPICJumpTableRelocBase(SDValue Table, SelectionDAG &DAG) const { // If our PIC model is GP relative, use the global offset table as the base. unsigned JTEncoding = getJumpTableEncoding(); if ((JTEncoding == MachineJumpTableInfo::EK_GPRel64BlockAddress) || (JTEncoding == MachineJumpTableInfo::EK_GPRel32BlockAddress)) return DAG.getGLOBAL_OFFSET_TABLE(getPointerTy(DAG.getDataLayout())); return Table; } /// This returns the relocation base for the given PIC jumptable, the same as /// getPICJumpTableRelocBase, but as an MCExpr. const MCExpr * TargetLowering::getPICJumpTableRelocBaseExpr(const MachineFunction *MF, unsigned JTI,MCContext &Ctx) const{ // The normal PIC reloc base is the label at the start of the jump table. 
return MCSymbolRefExpr::create(MF->getJTISymbol(JTI, Ctx), Ctx); } bool TargetLowering::isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const { const TargetMachine &TM = getTargetMachine(); const GlobalValue *GV = GA->getGlobal(); // If the address is not even local to this DSO we will have to load it from // a got and then add the offset. if (!TM.shouldAssumeDSOLocal(*GV->getParent(), GV)) return false; // If the code is position independent we will have to add a base register. if (isPositionIndependent()) return false; // Otherwise we can do it. return true; } //===----------------------------------------------------------------------===// // Optimization Methods //===----------------------------------------------------------------------===// /// If the specified instruction has a constant integer operand and there are /// bits set in that constant that are not demanded, then clear those bits and /// return true. bool TargetLowering::ShrinkDemandedConstant(SDValue Op, const APInt &DemandedBits, const APInt &DemandedElts, TargetLoweringOpt &TLO) const { SDLoc DL(Op); unsigned Opcode = Op.getOpcode(); // Do target-specific constant optimization. if (targetShrinkDemandedConstant(Op, DemandedBits, DemandedElts, TLO)) return TLO.New.getNode(); // FIXME: ISD::SELECT, ISD::SELECT_CC switch (Opcode) { default: break; case ISD::XOR: case ISD::AND: case ISD::OR: { auto *Op1C = dyn_cast(Op.getOperand(1)); if (!Op1C) return false; // If this is a 'not' op, don't touch it because that's a canonical form. const APInt &C = Op1C->getAPIntValue(); if (Opcode == ISD::XOR && DemandedBits.isSubsetOf(C)) return false; if (!C.isSubsetOf(DemandedBits)) { EVT VT = Op.getValueType(); SDValue NewC = TLO.DAG.getConstant(DemandedBits & C, DL, VT); SDValue NewOp = TLO.DAG.getNode(Opcode, DL, VT, Op.getOperand(0), NewC); return TLO.CombineTo(Op, NewOp); } break; } } return false; } bool TargetLowering::ShrinkDemandedConstant(SDValue Op, const APInt &DemandedBits, TargetLoweringOpt &TLO) const { EVT VT = Op.getValueType(); APInt DemandedElts = VT.isVector() ? APInt::getAllOnesValue(VT.getVectorNumElements()) : APInt(1, 1); return ShrinkDemandedConstant(Op, DemandedBits, DemandedElts, TLO); } /// Convert x+y to (VT)((SmallVT)x+(SmallVT)y) if the casts are free. /// This uses isZExtFree and ZERO_EXTEND for the widening cast, but it could be /// generalized for targets with other types of implicit widening casts. bool TargetLowering::ShrinkDemandedOp(SDValue Op, unsigned BitWidth, const APInt &Demanded, TargetLoweringOpt &TLO) const { assert(Op.getNumOperands() == 2 && "ShrinkDemandedOp only supports binary operators!"); assert(Op.getNode()->getNumValues() == 1 && "ShrinkDemandedOp only supports nodes with one result!"); SelectionDAG &DAG = TLO.DAG; SDLoc dl(Op); // Early return, as this function cannot handle vector types. if (Op.getValueType().isVector()) return false; // Don't do this if the node has another user, which may require the // full value. if (!Op.getNode()->hasOneUse()) return false; // Search for the smallest integer type with free casts to and from // Op's type. For expedience, just check power-of-2 integer types. 
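  // Illustrative example (not part of the original source): for an i64 add of
  // which only the low 16 bits are demanded, the loop below tries i16 and then
  // i32; on a target where the i64->i32 truncate and the i32->i64 extension
  // are free, the add is rebuilt as (any_extend (add (trunc x), (trunc y))).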
const TargetLowering &TLI = DAG.getTargetLoweringInfo(); unsigned DemandedSize = Demanded.getActiveBits(); unsigned SmallVTBits = DemandedSize; if (!isPowerOf2_32(SmallVTBits)) SmallVTBits = NextPowerOf2(SmallVTBits); for (; SmallVTBits < BitWidth; SmallVTBits = NextPowerOf2(SmallVTBits)) { EVT SmallVT = EVT::getIntegerVT(*DAG.getContext(), SmallVTBits); if (TLI.isTruncateFree(Op.getValueType(), SmallVT) && TLI.isZExtFree(SmallVT, Op.getValueType())) { // We found a type with free casts. SDValue X = DAG.getNode( Op.getOpcode(), dl, SmallVT, DAG.getNode(ISD::TRUNCATE, dl, SmallVT, Op.getOperand(0)), DAG.getNode(ISD::TRUNCATE, dl, SmallVT, Op.getOperand(1))); assert(DemandedSize <= SmallVTBits && "Narrowed below demanded bits?"); SDValue Z = DAG.getNode(ISD::ANY_EXTEND, dl, Op.getValueType(), X); return TLO.CombineTo(Op, Z); } } return false; } bool TargetLowering::SimplifyDemandedBits(SDValue Op, const APInt &DemandedBits, DAGCombinerInfo &DCI) const { SelectionDAG &DAG = DCI.DAG; TargetLoweringOpt TLO(DAG, !DCI.isBeforeLegalize(), !DCI.isBeforeLegalizeOps()); KnownBits Known; bool Simplified = SimplifyDemandedBits(Op, DemandedBits, Known, TLO); if (Simplified) { DCI.AddToWorklist(Op.getNode()); DCI.CommitTargetLoweringOpt(TLO); } return Simplified; } bool TargetLowering::SimplifyDemandedBits(SDValue Op, const APInt &DemandedBits, KnownBits &Known, TargetLoweringOpt &TLO, unsigned Depth, bool AssumeSingleUse) const { EVT VT = Op.getValueType(); // TODO: We can probably do more work on calculating the known bits and // simplifying the operations for scalable vectors, but for now we just // bail out. if (VT.isScalableVector()) { // Pretend we don't know anything for now. Known = KnownBits(DemandedBits.getBitWidth()); return false; } APInt DemandedElts = VT.isVector() ? APInt::getAllOnesValue(VT.getVectorNumElements()) : APInt(1, 1); return SimplifyDemandedBits(Op, DemandedBits, DemandedElts, Known, TLO, Depth, AssumeSingleUse); } // TODO: Can we merge SelectionDAG::GetDemandedBits into this? // TODO: Under what circumstances can we create nodes? Constant folding? SDValue TargetLowering::SimplifyMultipleUseDemandedBits( SDValue Op, const APInt &DemandedBits, const APInt &DemandedElts, SelectionDAG &DAG, unsigned Depth) const { // Limit search depth. if (Depth >= SelectionDAG::MaxRecursionDepth) return SDValue(); // Ignore UNDEFs. if (Op.isUndef()) return SDValue(); // Not demanding any bits/elts from Op. if (DemandedBits == 0 || DemandedElts == 0) return DAG.getUNDEF(Op.getValueType()); unsigned NumElts = DemandedElts.getBitWidth(); unsigned BitWidth = DemandedBits.getBitWidth(); KnownBits LHSKnown, RHSKnown; switch (Op.getOpcode()) { case ISD::BITCAST: { SDValue Src = peekThroughBitcasts(Op.getOperand(0)); EVT SrcVT = Src.getValueType(); EVT DstVT = Op.getValueType(); if (SrcVT == DstVT) return Src; unsigned NumSrcEltBits = SrcVT.getScalarSizeInBits(); unsigned NumDstEltBits = DstVT.getScalarSizeInBits(); if (NumSrcEltBits == NumDstEltBits) if (SDValue V = SimplifyMultipleUseDemandedBits( Src, DemandedBits, DemandedElts, DAG, Depth + 1)) return DAG.getBitcast(DstVT, V); // TODO - bigendian once we have test coverage. 
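    // Illustrative example (not part of the original source): for
    // (i64 (bitcast (v2i32 x))) on a little-endian target where only the low
    // 32 bits are demanded, the recursion below asks for all bits of source
    // element 0 and nothing from element 1.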
if (SrcVT.isVector() && (NumDstEltBits % NumSrcEltBits) == 0 && DAG.getDataLayout().isLittleEndian()) { unsigned Scale = NumDstEltBits / NumSrcEltBits; unsigned NumSrcElts = SrcVT.getVectorNumElements(); APInt DemandedSrcBits = APInt::getNullValue(NumSrcEltBits); APInt DemandedSrcElts = APInt::getNullValue(NumSrcElts); for (unsigned i = 0; i != Scale; ++i) { unsigned Offset = i * NumSrcEltBits; APInt Sub = DemandedBits.extractBits(NumSrcEltBits, Offset); if (!Sub.isNullValue()) { DemandedSrcBits |= Sub; for (unsigned j = 0; j != NumElts; ++j) if (DemandedElts[j]) DemandedSrcElts.setBit((j * Scale) + i); } } if (SDValue V = SimplifyMultipleUseDemandedBits( Src, DemandedSrcBits, DemandedSrcElts, DAG, Depth + 1)) return DAG.getBitcast(DstVT, V); } // TODO - bigendian once we have test coverage. if ((NumSrcEltBits % NumDstEltBits) == 0 && DAG.getDataLayout().isLittleEndian()) { unsigned Scale = NumSrcEltBits / NumDstEltBits; unsigned NumSrcElts = SrcVT.isVector() ? SrcVT.getVectorNumElements() : 1; APInt DemandedSrcBits = APInt::getNullValue(NumSrcEltBits); APInt DemandedSrcElts = APInt::getNullValue(NumSrcElts); for (unsigned i = 0; i != NumElts; ++i) if (DemandedElts[i]) { unsigned Offset = (i % Scale) * NumDstEltBits; DemandedSrcBits.insertBits(DemandedBits, Offset); DemandedSrcElts.setBit(i / Scale); } if (SDValue V = SimplifyMultipleUseDemandedBits( Src, DemandedSrcBits, DemandedSrcElts, DAG, Depth + 1)) return DAG.getBitcast(DstVT, V); } break; } case ISD::AND: { LHSKnown = DAG.computeKnownBits(Op.getOperand(0), DemandedElts, Depth + 1); RHSKnown = DAG.computeKnownBits(Op.getOperand(1), DemandedElts, Depth + 1); // If all of the demanded bits are known 1 on one side, return the other. // These bits cannot contribute to the result of the 'and' in this // context. if (DemandedBits.isSubsetOf(LHSKnown.Zero | RHSKnown.One)) return Op.getOperand(0); if (DemandedBits.isSubsetOf(RHSKnown.Zero | LHSKnown.One)) return Op.getOperand(1); break; } case ISD::OR: { LHSKnown = DAG.computeKnownBits(Op.getOperand(0), DemandedElts, Depth + 1); RHSKnown = DAG.computeKnownBits(Op.getOperand(1), DemandedElts, Depth + 1); // If all of the demanded bits are known zero on one side, return the // other. These bits cannot contribute to the result of the 'or' in this // context. if (DemandedBits.isSubsetOf(LHSKnown.One | RHSKnown.Zero)) return Op.getOperand(0); if (DemandedBits.isSubsetOf(RHSKnown.One | LHSKnown.Zero)) return Op.getOperand(1); break; } case ISD::XOR: { LHSKnown = DAG.computeKnownBits(Op.getOperand(0), DemandedElts, Depth + 1); RHSKnown = DAG.computeKnownBits(Op.getOperand(1), DemandedElts, Depth + 1); // If all of the demanded bits are known zero on one side, return the // other. if (DemandedBits.isSubsetOf(RHSKnown.Zero)) return Op.getOperand(0); if (DemandedBits.isSubsetOf(LHSKnown.Zero)) return Op.getOperand(1); break; } case ISD::SHL: { // If we are only demanding sign bits then we can use the shift source // directly. 
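    // Shifting left by ShAmt discards ShAmt of the source's sign-bit copies;
    // if the remaining NumSignBits - ShAmt copies still cover every demanded
    // (upper) bit, the unshifted source computes the same demanded bits.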
    if (const APInt *MaxSA =
            DAG.getValidMaximumShiftAmountConstant(Op, DemandedElts)) {
      SDValue Op0 = Op.getOperand(0);
      unsigned ShAmt = MaxSA->getZExtValue();
      unsigned NumSignBits =
          DAG.ComputeNumSignBits(Op0, DemandedElts, Depth + 1);
      unsigned UpperDemandedBits = BitWidth - DemandedBits.countTrailingZeros();
      if (NumSignBits > ShAmt && (NumSignBits - ShAmt) >= (UpperDemandedBits))
        return Op0;
    }
    break;
  }
  case ISD::SETCC: {
    SDValue Op0 = Op.getOperand(0);
    SDValue Op1 = Op.getOperand(1);
    ISD::CondCode CC = cast<CondCodeSDNode>(Op.getOperand(2))->get();
    // If (1) we only need the sign-bit, (2) the setcc operands are the same
    // width as the setcc result, and (3) the result of a setcc conforms to 0 or
    // -1, we may be able to bypass the setcc.
    if (DemandedBits.isSignMask() &&
        Op0.getScalarValueSizeInBits() == BitWidth &&
        getBooleanContents(Op0.getValueType()) ==
            BooleanContent::ZeroOrNegativeOneBooleanContent) {
      // If we're testing X < 0, then this compare isn't needed - just use X!
      // FIXME: We're limiting to integer types here, but this should also work
      // if we don't care about FP signed-zero. The use of SETLT with FP means
      // that we don't care about NaNs.
      if (CC == ISD::SETLT && Op1.getValueType().isInteger() &&
          (isNullConstant(Op1) || ISD::isBuildVectorAllZeros(Op1.getNode())))
        return Op0;
    }
    break;
  }
  case ISD::SIGN_EXTEND_INREG: {
    // If none of the extended bits are demanded, eliminate the sextinreg.
    SDValue Op0 = Op.getOperand(0);
    EVT ExVT = cast<VTSDNode>(Op.getOperand(1))->getVT();
    unsigned ExBits = ExVT.getScalarSizeInBits();
    if (DemandedBits.getActiveBits() <= ExBits)
      return Op0;
    // If the input is already sign extended, just drop the extension.
    unsigned NumSignBits = DAG.ComputeNumSignBits(Op0, DemandedElts, Depth + 1);
    if (NumSignBits >= (BitWidth - ExBits + 1))
      return Op0;
    break;
  }
  case ISD::ANY_EXTEND_VECTOR_INREG:
  case ISD::SIGN_EXTEND_VECTOR_INREG:
  case ISD::ZERO_EXTEND_VECTOR_INREG: {
    // If we only want the lowest element and none of extended bits, then we can
    // return the bitcasted source vector.
    SDValue Src = Op.getOperand(0);
    EVT SrcVT = Src.getValueType();
    EVT DstVT = Op.getValueType();
    if (DemandedElts == 1 && DstVT.getSizeInBits() == SrcVT.getSizeInBits() &&
        DAG.getDataLayout().isLittleEndian() &&
        DemandedBits.getActiveBits() <= SrcVT.getScalarSizeInBits()) {
      return DAG.getBitcast(DstVT, Src);
    }
    break;
  }
  case ISD::INSERT_VECTOR_ELT: {
    // If we don't demand the inserted element, return the base vector.
    SDValue Vec = Op.getOperand(0);
    auto *CIdx = dyn_cast<ConstantSDNode>(Op.getOperand(2));
    EVT VecVT = Vec.getValueType();
    if (CIdx && CIdx->getAPIntValue().ult(VecVT.getVectorNumElements()) &&
        !DemandedElts[CIdx->getZExtValue()])
      return Vec;
    break;
  }
  case ISD::INSERT_SUBVECTOR: {
    // If we don't demand the inserted subvector, return the base vector.
    SDValue Vec = Op.getOperand(0);
    SDValue Sub = Op.getOperand(1);
    uint64_t Idx = Op.getConstantOperandVal(2);
    unsigned NumSubElts = Sub.getValueType().getVectorNumElements();
    if (DemandedElts.extractBits(NumSubElts, Idx) == 0)
      return Vec;
    break;
  }
  case ISD::VECTOR_SHUFFLE: {
    ArrayRef<int> ShuffleMask = cast<ShuffleVectorSDNode>(Op)->getMask();

    // If all the demanded elts are from one operand and are inline,
    // then we can use the operand directly.
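    // "Inline" means every demanded lane i reads either lane i of the LHS
    // (M == i) or lane i of the RHS (M == i + NumElts), i.e. the shuffle acts
    // as the identity on the demanded lanes.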
bool AllUndef = true, IdentityLHS = true, IdentityRHS = true; for (unsigned i = 0; i != NumElts; ++i) { int M = ShuffleMask[i]; if (M < 0 || !DemandedElts[i]) continue; AllUndef = false; IdentityLHS &= (M == (int)i); IdentityRHS &= ((M - NumElts) == i); } if (AllUndef) return DAG.getUNDEF(Op.getValueType()); if (IdentityLHS) return Op.getOperand(0); if (IdentityRHS) return Op.getOperand(1); break; } default: if (Op.getOpcode() >= ISD::BUILTIN_OP_END) if (SDValue V = SimplifyMultipleUseDemandedBitsForTargetNode( Op, DemandedBits, DemandedElts, DAG, Depth)) return V; break; } return SDValue(); } SDValue TargetLowering::SimplifyMultipleUseDemandedBits( SDValue Op, const APInt &DemandedBits, SelectionDAG &DAG, unsigned Depth) const { EVT VT = Op.getValueType(); APInt DemandedElts = VT.isVector() ? APInt::getAllOnesValue(VT.getVectorNumElements()) : APInt(1, 1); return SimplifyMultipleUseDemandedBits(Op, DemandedBits, DemandedElts, DAG, Depth); } SDValue TargetLowering::SimplifyMultipleUseDemandedVectorElts( SDValue Op, const APInt &DemandedElts, SelectionDAG &DAG, unsigned Depth) const { APInt DemandedBits = APInt::getAllOnesValue(Op.getScalarValueSizeInBits()); return SimplifyMultipleUseDemandedBits(Op, DemandedBits, DemandedElts, DAG, Depth); } /// Look at Op. At this point, we know that only the OriginalDemandedBits of the /// result of Op are ever used downstream. If we can use this information to /// simplify Op, create a new simplified DAG node and return true, returning the /// original and new nodes in Old and New. Otherwise, analyze the expression and /// return a mask of Known bits for the expression (used to simplify the /// caller). The Known bits may only be accurate for those bits in the /// OriginalDemandedBits and OriginalDemandedElts. bool TargetLowering::SimplifyDemandedBits( SDValue Op, const APInt &OriginalDemandedBits, const APInt &OriginalDemandedElts, KnownBits &Known, TargetLoweringOpt &TLO, unsigned Depth, bool AssumeSingleUse) const { unsigned BitWidth = OriginalDemandedBits.getBitWidth(); assert(Op.getScalarValueSizeInBits() == BitWidth && "Mask size mismatches value type size!"); // Don't know anything. Known = KnownBits(BitWidth); // TODO: We can probably do more work on calculating the known bits and // simplifying the operations for scalable vectors, but for now we just // bail out. if (Op.getValueType().isScalableVector()) return false; unsigned NumElts = OriginalDemandedElts.getBitWidth(); assert((!Op.getValueType().isVector() || NumElts == Op.getValueType().getVectorNumElements()) && "Unexpected vector size"); APInt DemandedBits = OriginalDemandedBits; APInt DemandedElts = OriginalDemandedElts; SDLoc dl(Op); auto &DL = TLO.DAG.getDataLayout(); // Undef operand. if (Op.isUndef()) return false; if (Op.getOpcode() == ISD::Constant) { // We know all of the bits for a constant! Known.One = cast(Op)->getAPIntValue(); Known.Zero = ~Known.One; return false; } // Other users may use these bits. EVT VT = Op.getValueType(); if (!Op.getNode()->hasOneUse() && !AssumeSingleUse) { if (Depth != 0) { // If not at the root, Just compute the Known bits to // simplify things downstream. Known = TLO.DAG.computeKnownBits(Op, DemandedElts, Depth); return false; } // If this is the root being simplified, allow it to have multiple uses, // just set the DemandedBits/Elts to all bits. DemandedBits = APInt::getAllOnesValue(BitWidth); DemandedElts = APInt::getAllOnesValue(NumElts); } else if (OriginalDemandedBits == 0 || OriginalDemandedElts == 0) { // Not demanding any bits/elts from Op. 
    return TLO.CombineTo(Op, TLO.DAG.getUNDEF(VT));
  } else if (Depth >= SelectionDAG::MaxRecursionDepth) {
    // Limit search depth.
    return false;
  }

  KnownBits Known2;
  switch (Op.getOpcode()) {
  case ISD::TargetConstant:
    llvm_unreachable("Can't simplify this node");
  case ISD::SCALAR_TO_VECTOR: {
    if (!DemandedElts[0])
      return TLO.CombineTo(Op, TLO.DAG.getUNDEF(VT));

    KnownBits SrcKnown;
    SDValue Src = Op.getOperand(0);
    unsigned SrcBitWidth = Src.getScalarValueSizeInBits();
    APInt SrcDemandedBits = DemandedBits.zextOrSelf(SrcBitWidth);
    if (SimplifyDemandedBits(Src, SrcDemandedBits, SrcKnown, TLO, Depth + 1))
      return true;

    // Upper elements are undef, so only get the knownbits if we just demand
    // the bottom element.
    if (DemandedElts == 1)
      Known = SrcKnown.anyextOrTrunc(BitWidth);
    break;
  }
  case ISD::BUILD_VECTOR:
    // Collect the known bits that are shared by every demanded element.
    // TODO: Call SimplifyDemandedBits for non-constant demanded elements.
    Known = TLO.DAG.computeKnownBits(Op, DemandedElts, Depth);
    return false; // Don't fall through, will infinitely loop.
  case ISD::LOAD: {
    LoadSDNode *LD = cast<LoadSDNode>(Op);
    if (getTargetConstantFromLoad(LD)) {
      Known = TLO.DAG.computeKnownBits(Op, DemandedElts, Depth);
      return false; // Don't fall through, will infinitely loop.
    } else if (ISD::isZEXTLoad(Op.getNode()) && Op.getResNo() == 0) {
      // If this is a ZEXTLoad and we are looking at the loaded value.
      EVT MemVT = LD->getMemoryVT();
      unsigned MemBits = MemVT.getScalarSizeInBits();
      Known.Zero.setBitsFrom(MemBits);
      return false; // Don't fall through, will infinitely loop.
    }
    break;
  }
  case ISD::INSERT_VECTOR_ELT: {
    SDValue Vec = Op.getOperand(0);
    SDValue Scl = Op.getOperand(1);
    auto *CIdx = dyn_cast<ConstantSDNode>(Op.getOperand(2));
    EVT VecVT = Vec.getValueType();

    // If index isn't constant, assume we need all vector elements AND the
    // inserted element.
    APInt DemandedVecElts(DemandedElts);
    if (CIdx && CIdx->getAPIntValue().ult(VecVT.getVectorNumElements())) {
      unsigned Idx = CIdx->getZExtValue();
      DemandedVecElts.clearBit(Idx);

      // Inserted element is not required.
      if (!DemandedElts[Idx])
        return TLO.CombineTo(Op, Vec);
    }

    KnownBits KnownScl;
    unsigned NumSclBits = Scl.getScalarValueSizeInBits();
    APInt DemandedSclBits = DemandedBits.zextOrTrunc(NumSclBits);
    if (SimplifyDemandedBits(Scl, DemandedSclBits, KnownScl, TLO, Depth + 1))
      return true;

    Known = KnownScl.anyextOrTrunc(BitWidth);

    KnownBits KnownVec;
    if (SimplifyDemandedBits(Vec, DemandedBits, DemandedVecElts, KnownVec, TLO,
                             Depth + 1))
      return true;

    if (!!DemandedVecElts) {
      Known.One &= KnownVec.One;
      Known.Zero &= KnownVec.Zero;
    }

    return false;
  }
  case ISD::INSERT_SUBVECTOR: {
    // Demand any elements from the subvector and the remainder from the src its
    // inserted into.
SDValue Src = Op.getOperand(0); SDValue Sub = Op.getOperand(1); uint64_t Idx = Op.getConstantOperandVal(2); unsigned NumSubElts = Sub.getValueType().getVectorNumElements(); APInt DemandedSubElts = DemandedElts.extractBits(NumSubElts, Idx); APInt DemandedSrcElts = DemandedElts; DemandedSrcElts.insertBits(APInt::getNullValue(NumSubElts), Idx); KnownBits KnownSub, KnownSrc; if (SimplifyDemandedBits(Sub, DemandedBits, DemandedSubElts, KnownSub, TLO, Depth + 1)) return true; if (SimplifyDemandedBits(Src, DemandedBits, DemandedSrcElts, KnownSrc, TLO, Depth + 1)) return true; Known.Zero.setAllBits(); Known.One.setAllBits(); if (!!DemandedSubElts) { Known.One &= KnownSub.One; Known.Zero &= KnownSub.Zero; } if (!!DemandedSrcElts) { Known.One &= KnownSrc.One; Known.Zero &= KnownSrc.Zero; } // Attempt to avoid multi-use src if we don't need anything from it. if (!DemandedBits.isAllOnesValue() || !DemandedSubElts.isAllOnesValue() || !DemandedSrcElts.isAllOnesValue()) { SDValue NewSub = SimplifyMultipleUseDemandedBits( Sub, DemandedBits, DemandedSubElts, TLO.DAG, Depth + 1); SDValue NewSrc = SimplifyMultipleUseDemandedBits( Src, DemandedBits, DemandedSrcElts, TLO.DAG, Depth + 1); if (NewSub || NewSrc) { NewSub = NewSub ? NewSub : Sub; NewSrc = NewSrc ? NewSrc : Src; SDValue NewOp = TLO.DAG.getNode(Op.getOpcode(), dl, VT, NewSrc, NewSub, Op.getOperand(2)); return TLO.CombineTo(Op, NewOp); } } break; } case ISD::EXTRACT_SUBVECTOR: { // Offset the demanded elts by the subvector index. SDValue Src = Op.getOperand(0); if (Src.getValueType().isScalableVector()) break; uint64_t Idx = Op.getConstantOperandVal(1); unsigned NumSrcElts = Src.getValueType().getVectorNumElements(); APInt DemandedSrcElts = DemandedElts.zextOrSelf(NumSrcElts).shl(Idx); if (SimplifyDemandedBits(Src, DemandedBits, DemandedSrcElts, Known, TLO, Depth + 1)) return true; // Attempt to avoid multi-use src if we don't need anything from it. if (!DemandedBits.isAllOnesValue() || !DemandedSrcElts.isAllOnesValue()) { SDValue DemandedSrc = SimplifyMultipleUseDemandedBits( Src, DemandedBits, DemandedSrcElts, TLO.DAG, Depth + 1); if (DemandedSrc) { SDValue NewOp = TLO.DAG.getNode(Op.getOpcode(), dl, VT, DemandedSrc, Op.getOperand(1)); return TLO.CombineTo(Op, NewOp); } } break; } case ISD::CONCAT_VECTORS: { Known.Zero.setAllBits(); Known.One.setAllBits(); EVT SubVT = Op.getOperand(0).getValueType(); unsigned NumSubVecs = Op.getNumOperands(); unsigned NumSubElts = SubVT.getVectorNumElements(); for (unsigned i = 0; i != NumSubVecs; ++i) { APInt DemandedSubElts = DemandedElts.extractBits(NumSubElts, i * NumSubElts); if (SimplifyDemandedBits(Op.getOperand(i), DemandedBits, DemandedSubElts, Known2, TLO, Depth + 1)) return true; // Known bits are shared by every demanded subvector element. if (!!DemandedSubElts) { Known.One &= Known2.One; Known.Zero &= Known2.Zero; } } break; } case ISD::VECTOR_SHUFFLE: { ArrayRef ShuffleMask = cast(Op)->getMask(); // Collect demanded elements from shuffle operands.. APInt DemandedLHS(NumElts, 0); APInt DemandedRHS(NumElts, 0); for (unsigned i = 0; i != NumElts; ++i) { if (!DemandedElts[i]) continue; int M = ShuffleMask[i]; if (M < 0) { // For UNDEF elements, we don't know anything about the common state of // the shuffle result. 
DemandedLHS.clearAllBits(); DemandedRHS.clearAllBits(); break; } assert(0 <= M && M < (int)(2 * NumElts) && "Shuffle index out of range"); if (M < (int)NumElts) DemandedLHS.setBit(M); else DemandedRHS.setBit(M - NumElts); } if (!!DemandedLHS || !!DemandedRHS) { SDValue Op0 = Op.getOperand(0); SDValue Op1 = Op.getOperand(1); Known.Zero.setAllBits(); Known.One.setAllBits(); if (!!DemandedLHS) { if (SimplifyDemandedBits(Op0, DemandedBits, DemandedLHS, Known2, TLO, Depth + 1)) return true; Known.One &= Known2.One; Known.Zero &= Known2.Zero; } if (!!DemandedRHS) { if (SimplifyDemandedBits(Op1, DemandedBits, DemandedRHS, Known2, TLO, Depth + 1)) return true; Known.One &= Known2.One; Known.Zero &= Known2.Zero; } // Attempt to avoid multi-use ops if we don't need anything from them. SDValue DemandedOp0 = SimplifyMultipleUseDemandedBits( Op0, DemandedBits, DemandedLHS, TLO.DAG, Depth + 1); SDValue DemandedOp1 = SimplifyMultipleUseDemandedBits( Op1, DemandedBits, DemandedRHS, TLO.DAG, Depth + 1); if (DemandedOp0 || DemandedOp1) { Op0 = DemandedOp0 ? DemandedOp0 : Op0; Op1 = DemandedOp1 ? DemandedOp1 : Op1; SDValue NewOp = TLO.DAG.getVectorShuffle(VT, dl, Op0, Op1, ShuffleMask); return TLO.CombineTo(Op, NewOp); } } break; } case ISD::AND: { SDValue Op0 = Op.getOperand(0); SDValue Op1 = Op.getOperand(1); // If the RHS is a constant, check to see if the LHS would be zero without // using the bits from the RHS. Below, we use knowledge about the RHS to // simplify the LHS, here we're using information from the LHS to simplify // the RHS. if (ConstantSDNode *RHSC = isConstOrConstSplat(Op1)) { // Do not increment Depth here; that can cause an infinite loop. KnownBits LHSKnown = TLO.DAG.computeKnownBits(Op0, DemandedElts, Depth); // If the LHS already has zeros where RHSC does, this 'and' is dead. if ((LHSKnown.Zero & DemandedBits) == (~RHSC->getAPIntValue() & DemandedBits)) return TLO.CombineTo(Op, Op0); // If any of the set bits in the RHS are known zero on the LHS, shrink // the constant. if (ShrinkDemandedConstant(Op, ~LHSKnown.Zero & DemandedBits, DemandedElts, TLO)) return true; // Bitwise-not (xor X, -1) is a special case: we don't usually shrink its // constant, but if this 'and' is only clearing bits that were just set by // the xor, then this 'and' can be eliminated by shrinking the mask of // the xor. For example, for a 32-bit X: // and (xor (srl X, 31), -1), 1 --> xor (srl X, 31), 1 if (isBitwiseNot(Op0) && Op0.hasOneUse() && LHSKnown.One == ~RHSC->getAPIntValue()) { SDValue Xor = TLO.DAG.getNode(ISD::XOR, dl, VT, Op0.getOperand(0), Op1); return TLO.CombineTo(Op, Xor); } } if (SimplifyDemandedBits(Op1, DemandedBits, DemandedElts, Known, TLO, Depth + 1)) return true; assert(!Known.hasConflict() && "Bits known to be one AND zero?"); if (SimplifyDemandedBits(Op0, ~Known.Zero & DemandedBits, DemandedElts, Known2, TLO, Depth + 1)) return true; assert(!Known2.hasConflict() && "Bits known to be one AND zero?"); // Attempt to avoid multi-use ops if we don't need anything from them. if (!DemandedBits.isAllOnesValue() || !DemandedElts.isAllOnesValue()) { SDValue DemandedOp0 = SimplifyMultipleUseDemandedBits( Op0, DemandedBits, DemandedElts, TLO.DAG, Depth + 1); SDValue DemandedOp1 = SimplifyMultipleUseDemandedBits( Op1, DemandedBits, DemandedElts, TLO.DAG, Depth + 1); if (DemandedOp0 || DemandedOp1) { Op0 = DemandedOp0 ? DemandedOp0 : Op0; Op1 = DemandedOp1 ? 
DemandedOp1 : Op1; SDValue NewOp = TLO.DAG.getNode(Op.getOpcode(), dl, VT, Op0, Op1); return TLO.CombineTo(Op, NewOp); } } // If all of the demanded bits are known one on one side, return the other. // These bits cannot contribute to the result of the 'and'. if (DemandedBits.isSubsetOf(Known2.Zero | Known.One)) return TLO.CombineTo(Op, Op0); if (DemandedBits.isSubsetOf(Known.Zero | Known2.One)) return TLO.CombineTo(Op, Op1); // If all of the demanded bits in the inputs are known zeros, return zero. if (DemandedBits.isSubsetOf(Known.Zero | Known2.Zero)) return TLO.CombineTo(Op, TLO.DAG.getConstant(0, dl, VT)); // If the RHS is a constant, see if we can simplify it. if (ShrinkDemandedConstant(Op, ~Known2.Zero & DemandedBits, DemandedElts, TLO)) return true; // If the operation can be done in a smaller type, do so. if (ShrinkDemandedOp(Op, BitWidth, DemandedBits, TLO)) return true; Known &= Known2; break; } case ISD::OR: { SDValue Op0 = Op.getOperand(0); SDValue Op1 = Op.getOperand(1); if (SimplifyDemandedBits(Op1, DemandedBits, DemandedElts, Known, TLO, Depth + 1)) return true; assert(!Known.hasConflict() && "Bits known to be one AND zero?"); if (SimplifyDemandedBits(Op0, ~Known.One & DemandedBits, DemandedElts, Known2, TLO, Depth + 1)) return true; assert(!Known2.hasConflict() && "Bits known to be one AND zero?"); // Attempt to avoid multi-use ops if we don't need anything from them. if (!DemandedBits.isAllOnesValue() || !DemandedElts.isAllOnesValue()) { SDValue DemandedOp0 = SimplifyMultipleUseDemandedBits( Op0, DemandedBits, DemandedElts, TLO.DAG, Depth + 1); SDValue DemandedOp1 = SimplifyMultipleUseDemandedBits( Op1, DemandedBits, DemandedElts, TLO.DAG, Depth + 1); if (DemandedOp0 || DemandedOp1) { Op0 = DemandedOp0 ? DemandedOp0 : Op0; Op1 = DemandedOp1 ? DemandedOp1 : Op1; SDValue NewOp = TLO.DAG.getNode(Op.getOpcode(), dl, VT, Op0, Op1); return TLO.CombineTo(Op, NewOp); } } // If all of the demanded bits are known zero on one side, return the other. // These bits cannot contribute to the result of the 'or'. if (DemandedBits.isSubsetOf(Known2.One | Known.Zero)) return TLO.CombineTo(Op, Op0); if (DemandedBits.isSubsetOf(Known.One | Known2.Zero)) return TLO.CombineTo(Op, Op1); // If the RHS is a constant, see if we can simplify it. if (ShrinkDemandedConstant(Op, DemandedBits, DemandedElts, TLO)) return true; // If the operation can be done in a smaller type, do so. if (ShrinkDemandedOp(Op, BitWidth, DemandedBits, TLO)) return true; Known |= Known2; break; } case ISD::XOR: { SDValue Op0 = Op.getOperand(0); SDValue Op1 = Op.getOperand(1); if (SimplifyDemandedBits(Op1, DemandedBits, DemandedElts, Known, TLO, Depth + 1)) return true; assert(!Known.hasConflict() && "Bits known to be one AND zero?"); if (SimplifyDemandedBits(Op0, DemandedBits, DemandedElts, Known2, TLO, Depth + 1)) return true; assert(!Known2.hasConflict() && "Bits known to be one AND zero?"); // Attempt to avoid multi-use ops if we don't need anything from them. if (!DemandedBits.isAllOnesValue() || !DemandedElts.isAllOnesValue()) { SDValue DemandedOp0 = SimplifyMultipleUseDemandedBits( Op0, DemandedBits, DemandedElts, TLO.DAG, Depth + 1); SDValue DemandedOp1 = SimplifyMultipleUseDemandedBits( Op1, DemandedBits, DemandedElts, TLO.DAG, Depth + 1); if (DemandedOp0 || DemandedOp1) { Op0 = DemandedOp0 ? DemandedOp0 : Op0; Op1 = DemandedOp1 ? 
DemandedOp1 : Op1; SDValue NewOp = TLO.DAG.getNode(Op.getOpcode(), dl, VT, Op0, Op1); return TLO.CombineTo(Op, NewOp); } } // If all of the demanded bits are known zero on one side, return the other. // These bits cannot contribute to the result of the 'xor'. if (DemandedBits.isSubsetOf(Known.Zero)) return TLO.CombineTo(Op, Op0); if (DemandedBits.isSubsetOf(Known2.Zero)) return TLO.CombineTo(Op, Op1); // If the operation can be done in a smaller type, do so. if (ShrinkDemandedOp(Op, BitWidth, DemandedBits, TLO)) return true; // If all of the unknown bits are known to be zero on one side or the other // (but not both) turn this into an *inclusive* or. // e.g. (A & C1)^(B & C2) -> (A & C1)|(B & C2) iff C1&C2 == 0 if (DemandedBits.isSubsetOf(Known.Zero | Known2.Zero)) return TLO.CombineTo(Op, TLO.DAG.getNode(ISD::OR, dl, VT, Op0, Op1)); ConstantSDNode* C = isConstOrConstSplat(Op1, DemandedElts); if (C) { // If one side is a constant, and all of the known set bits on the other // side are also set in the constant, turn this into an AND, as we know // the bits will be cleared. // e.g. (X | C1) ^ C2 --> (X | C1) & ~C2 iff (C1&C2) == C2 // NB: it is okay if more bits are known than are requested if (C->getAPIntValue() == Known2.One) { SDValue ANDC = TLO.DAG.getConstant(~C->getAPIntValue() & DemandedBits, dl, VT); return TLO.CombineTo(Op, TLO.DAG.getNode(ISD::AND, dl, VT, Op0, ANDC)); } // If the RHS is a constant, see if we can change it. Don't alter a -1 // constant because that's a 'not' op, and that is better for combining // and codegen. if (!C->isAllOnesValue() && DemandedBits.isSubsetOf(C->getAPIntValue())) { // We're flipping all demanded bits. Flip the undemanded bits too. SDValue New = TLO.DAG.getNOT(dl, Op0, VT); return TLO.CombineTo(Op, New); } } // If we can't turn this into a 'not', try to shrink the constant. if (!C || !C->isAllOnesValue()) if (ShrinkDemandedConstant(Op, DemandedBits, DemandedElts, TLO)) return true; Known ^= Known2; break; } case ISD::SELECT: if (SimplifyDemandedBits(Op.getOperand(2), DemandedBits, Known, TLO, Depth + 1)) return true; if (SimplifyDemandedBits(Op.getOperand(1), DemandedBits, Known2, TLO, Depth + 1)) return true; assert(!Known.hasConflict() && "Bits known to be one AND zero?"); assert(!Known2.hasConflict() && "Bits known to be one AND zero?"); // If the operands are constants, see if we can simplify them. if (ShrinkDemandedConstant(Op, DemandedBits, DemandedElts, TLO)) return true; // Only known if known in both the LHS and RHS. Known.One &= Known2.One; Known.Zero &= Known2.Zero; break; case ISD::SELECT_CC: if (SimplifyDemandedBits(Op.getOperand(3), DemandedBits, Known, TLO, Depth + 1)) return true; if (SimplifyDemandedBits(Op.getOperand(2), DemandedBits, Known2, TLO, Depth + 1)) return true; assert(!Known.hasConflict() && "Bits known to be one AND zero?"); assert(!Known2.hasConflict() && "Bits known to be one AND zero?"); // If the operands are constants, see if we can simplify them. if (ShrinkDemandedConstant(Op, DemandedBits, DemandedElts, TLO)) return true; // Only known if known in both the LHS and RHS. Known.One &= Known2.One; Known.Zero &= Known2.Zero; break; case ISD::SETCC: { SDValue Op0 = Op.getOperand(0); SDValue Op1 = Op.getOperand(1); ISD::CondCode CC = cast(Op.getOperand(2))->get(); // If (1) we only need the sign-bit, (2) the setcc operands are the same // width as the setcc result, and (3) the result of a setcc conforms to 0 or // -1, we may be able to bypass the setcc. 
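    // e.g. with ZeroOrNegativeOne boolean contents, (setlt X, 0) produces
    // 0 or -1 whose sign bit equals the sign bit of X, so X itself can stand
    // in for the setcc when only the sign bit is demanded.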
if (DemandedBits.isSignMask() && Op0.getScalarValueSizeInBits() == BitWidth && getBooleanContents(Op0.getValueType()) == BooleanContent::ZeroOrNegativeOneBooleanContent) { // If we're testing X < 0, then this compare isn't needed - just use X! // FIXME: We're limiting to integer types here, but this should also work // if we don't care about FP signed-zero. The use of SETLT with FP means // that we don't care about NaNs. if (CC == ISD::SETLT && Op1.getValueType().isInteger() && (isNullConstant(Op1) || ISD::isBuildVectorAllZeros(Op1.getNode()))) return TLO.CombineTo(Op, Op0); // TODO: Should we check for other forms of sign-bit comparisons? // Examples: X <= -1, X >= 0 } if (getBooleanContents(Op0.getValueType()) == TargetLowering::ZeroOrOneBooleanContent && BitWidth > 1) Known.Zero.setBitsFrom(1); break; } case ISD::SHL: { SDValue Op0 = Op.getOperand(0); SDValue Op1 = Op.getOperand(1); EVT ShiftVT = Op1.getValueType(); if (const APInt *SA = TLO.DAG.getValidShiftAmountConstant(Op, DemandedElts)) { unsigned ShAmt = SA->getZExtValue(); if (ShAmt == 0) return TLO.CombineTo(Op, Op0); // If this is ((X >>u C1) << ShAmt), see if we can simplify this into a // single shift. We can do this if the bottom bits (which are shifted // out) are never demanded. // TODO - support non-uniform vector amounts. if (Op0.getOpcode() == ISD::SRL) { if (!DemandedBits.intersects(APInt::getLowBitsSet(BitWidth, ShAmt))) { if (const APInt *SA2 = TLO.DAG.getValidShiftAmountConstant(Op0, DemandedElts)) { unsigned C1 = SA2->getZExtValue(); unsigned Opc = ISD::SHL; int Diff = ShAmt - C1; if (Diff < 0) { Diff = -Diff; Opc = ISD::SRL; } SDValue NewSA = TLO.DAG.getConstant(Diff, dl, ShiftVT); return TLO.CombineTo( Op, TLO.DAG.getNode(Opc, dl, VT, Op0.getOperand(0), NewSA)); } } } // Convert (shl (anyext x, c)) to (anyext (shl x, c)) if the high bits // are not demanded. This will likely allow the anyext to be folded away. // TODO - support non-uniform vector amounts. if (Op0.getOpcode() == ISD::ANY_EXTEND) { SDValue InnerOp = Op0.getOperand(0); EVT InnerVT = InnerOp.getValueType(); unsigned InnerBits = InnerVT.getScalarSizeInBits(); if (ShAmt < InnerBits && DemandedBits.getActiveBits() <= InnerBits && isTypeDesirableForOp(ISD::SHL, InnerVT)) { EVT ShTy = getShiftAmountTy(InnerVT, DL); if (!APInt(BitWidth, ShAmt).isIntN(ShTy.getSizeInBits())) ShTy = InnerVT; SDValue NarrowShl = TLO.DAG.getNode(ISD::SHL, dl, InnerVT, InnerOp, TLO.DAG.getConstant(ShAmt, dl, ShTy)); return TLO.CombineTo( Op, TLO.DAG.getNode(ISD::ANY_EXTEND, dl, VT, NarrowShl)); } // Repeat the SHL optimization above in cases where an extension // intervenes: (shl (anyext (shr x, c1)), c2) to // (shl (anyext x), c2-c1). This requires that the bottom c1 bits // aren't demanded (as above) and that the shifted upper c1 bits of // x aren't demanded. // TODO - support non-uniform vector amounts. 
if (Op0.hasOneUse() && InnerOp.getOpcode() == ISD::SRL && InnerOp.hasOneUse()) { if (const APInt *SA2 = TLO.DAG.getValidShiftAmountConstant(InnerOp, DemandedElts)) { unsigned InnerShAmt = SA2->getZExtValue(); if (InnerShAmt < ShAmt && InnerShAmt < InnerBits && DemandedBits.getActiveBits() <= (InnerBits - InnerShAmt + ShAmt) && DemandedBits.countTrailingZeros() >= ShAmt) { SDValue NewSA = TLO.DAG.getConstant(ShAmt - InnerShAmt, dl, ShiftVT); SDValue NewExt = TLO.DAG.getNode(ISD::ANY_EXTEND, dl, VT, InnerOp.getOperand(0)); return TLO.CombineTo( Op, TLO.DAG.getNode(ISD::SHL, dl, VT, NewExt, NewSA)); } } } } APInt InDemandedMask = DemandedBits.lshr(ShAmt); if (SimplifyDemandedBits(Op0, InDemandedMask, DemandedElts, Known, TLO, Depth + 1)) return true; assert(!Known.hasConflict() && "Bits known to be one AND zero?"); Known.Zero <<= ShAmt; Known.One <<= ShAmt; // low bits known zero. Known.Zero.setLowBits(ShAmt); // Try shrinking the operation as long as the shift amount will still be // in range. if ((ShAmt < DemandedBits.getActiveBits()) && ShrinkDemandedOp(Op, BitWidth, DemandedBits, TLO)) return true; } // If we are only demanding sign bits then we can use the shift source // directly. if (const APInt *MaxSA = TLO.DAG.getValidMaximumShiftAmountConstant(Op, DemandedElts)) { unsigned ShAmt = MaxSA->getZExtValue(); unsigned NumSignBits = TLO.DAG.ComputeNumSignBits(Op0, DemandedElts, Depth + 1); unsigned UpperDemandedBits = BitWidth - DemandedBits.countTrailingZeros(); if (NumSignBits > ShAmt && (NumSignBits - ShAmt) >= (UpperDemandedBits)) return TLO.CombineTo(Op, Op0); } break; } case ISD::SRL: { SDValue Op0 = Op.getOperand(0); SDValue Op1 = Op.getOperand(1); EVT ShiftVT = Op1.getValueType(); if (const APInt *SA = TLO.DAG.getValidShiftAmountConstant(Op, DemandedElts)) { unsigned ShAmt = SA->getZExtValue(); if (ShAmt == 0) return TLO.CombineTo(Op, Op0); // If this is ((X << C1) >>u ShAmt), see if we can simplify this into a // single shift. We can do this if the top bits (which are shifted out) // are never demanded. // TODO - support non-uniform vector amounts. if (Op0.getOpcode() == ISD::SHL) { if (!DemandedBits.intersects(APInt::getHighBitsSet(BitWidth, ShAmt))) { if (const APInt *SA2 = TLO.DAG.getValidShiftAmountConstant(Op0, DemandedElts)) { unsigned C1 = SA2->getZExtValue(); unsigned Opc = ISD::SRL; int Diff = ShAmt - C1; if (Diff < 0) { Diff = -Diff; Opc = ISD::SHL; } SDValue NewSA = TLO.DAG.getConstant(Diff, dl, ShiftVT); return TLO.CombineTo( Op, TLO.DAG.getNode(Opc, dl, VT, Op0.getOperand(0), NewSA)); } } } APInt InDemandedMask = (DemandedBits << ShAmt); // If the shift is exact, then it does demand the low bits (and knows that // they are zero). if (Op->getFlags().hasExact()) InDemandedMask.setLowBits(ShAmt); // Compute the new bits that are at the top now. if (SimplifyDemandedBits(Op0, InDemandedMask, DemandedElts, Known, TLO, Depth + 1)) return true; assert(!Known.hasConflict() && "Bits known to be one AND zero?"); Known.Zero.lshrInPlace(ShAmt); Known.One.lshrInPlace(ShAmt); // High bits known zero. Known.Zero.setHighBits(ShAmt); } break; } case ISD::SRA: { SDValue Op0 = Op.getOperand(0); SDValue Op1 = Op.getOperand(1); EVT ShiftVT = Op1.getValueType(); // If we only want bits that already match the signbit then we don't need // to shift. 
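    // That is, if every demanded bit of Op0 is already a copy of its sign bit,
    // an arithmetic shift right cannot change any demanded bit.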
unsigned NumHiDemandedBits = BitWidth - DemandedBits.countTrailingZeros(); if (TLO.DAG.ComputeNumSignBits(Op0, DemandedElts, Depth + 1) >= NumHiDemandedBits) return TLO.CombineTo(Op, Op0); // If this is an arithmetic shift right and only the low-bit is set, we can // always convert this into a logical shr, even if the shift amount is // variable. The low bit of the shift cannot be an input sign bit unless // the shift amount is >= the size of the datatype, which is undefined. if (DemandedBits.isOneValue()) return TLO.CombineTo(Op, TLO.DAG.getNode(ISD::SRL, dl, VT, Op0, Op1)); if (const APInt *SA = TLO.DAG.getValidShiftAmountConstant(Op, DemandedElts)) { unsigned ShAmt = SA->getZExtValue(); if (ShAmt == 0) return TLO.CombineTo(Op, Op0); APInt InDemandedMask = (DemandedBits << ShAmt); // If the shift is exact, then it does demand the low bits (and knows that // they are zero). if (Op->getFlags().hasExact()) InDemandedMask.setLowBits(ShAmt); // If any of the demanded bits are produced by the sign extension, we also // demand the input sign bit. if (DemandedBits.countLeadingZeros() < ShAmt) InDemandedMask.setSignBit(); if (SimplifyDemandedBits(Op0, InDemandedMask, DemandedElts, Known, TLO, Depth + 1)) return true; assert(!Known.hasConflict() && "Bits known to be one AND zero?"); Known.Zero.lshrInPlace(ShAmt); Known.One.lshrInPlace(ShAmt); // If the input sign bit is known to be zero, or if none of the top bits // are demanded, turn this into an unsigned shift right. if (Known.Zero[BitWidth - ShAmt - 1] || DemandedBits.countLeadingZeros() >= ShAmt) { SDNodeFlags Flags; Flags.setExact(Op->getFlags().hasExact()); return TLO.CombineTo( Op, TLO.DAG.getNode(ISD::SRL, dl, VT, Op0, Op1, Flags)); } int Log2 = DemandedBits.exactLogBase2(); if (Log2 >= 0) { // The bit must come from the sign. SDValue NewSA = TLO.DAG.getConstant(BitWidth - 1 - Log2, dl, ShiftVT); return TLO.CombineTo(Op, TLO.DAG.getNode(ISD::SRL, dl, VT, Op0, NewSA)); } if (Known.One[BitWidth - ShAmt - 1]) // New bits are known one. Known.One.setHighBits(ShAmt); // Attempt to avoid multi-use ops if we don't need anything from them. if (!InDemandedMask.isAllOnesValue() || !DemandedElts.isAllOnesValue()) { SDValue DemandedOp0 = SimplifyMultipleUseDemandedBits( Op0, InDemandedMask, DemandedElts, TLO.DAG, Depth + 1); if (DemandedOp0) { SDValue NewOp = TLO.DAG.getNode(ISD::SRA, dl, VT, DemandedOp0, Op1); return TLO.CombineTo(Op, NewOp); } } } break; } case ISD::FSHL: case ISD::FSHR: { SDValue Op0 = Op.getOperand(0); SDValue Op1 = Op.getOperand(1); SDValue Op2 = Op.getOperand(2); bool IsFSHL = (Op.getOpcode() == ISD::FSHL); if (ConstantSDNode *SA = isConstOrConstSplat(Op2, DemandedElts)) { unsigned Amt = SA->getAPIntValue().urem(BitWidth); // For fshl, 0-shift returns the 1st arg. // For fshr, 0-shift returns the 2nd arg. if (Amt == 0) { if (SimplifyDemandedBits(IsFSHL ? Op0 : Op1, DemandedBits, DemandedElts, Known, TLO, Depth + 1)) return true; break; } // fshl: (Op0 << Amt) | (Op1 >> (BW - Amt)) // fshr: (Op0 << (BW - Amt)) | (Op1 >> Amt) APInt Demanded0 = DemandedBits.lshr(IsFSHL ? Amt : (BitWidth - Amt)); APInt Demanded1 = DemandedBits << (IsFSHL ? (BitWidth - Amt) : Amt); if (SimplifyDemandedBits(Op0, Demanded0, DemandedElts, Known2, TLO, Depth + 1)) return true; if (SimplifyDemandedBits(Op1, Demanded1, DemandedElts, Known, TLO, Depth + 1)) return true; Known2.One <<= (IsFSHL ? Amt : (BitWidth - Amt)); Known2.Zero <<= (IsFSHL ? Amt : (BitWidth - Amt)); Known.One.lshrInPlace(IsFSHL ? (BitWidth - Amt) : Amt); Known.Zero.lshrInPlace(IsFSHL ? 
(BitWidth - Amt) : Amt); Known.One |= Known2.One; Known.Zero |= Known2.Zero; } // For pow-2 bitwidths we only demand the bottom modulo amt bits. if (isPowerOf2_32(BitWidth)) { APInt DemandedAmtBits(Op2.getScalarValueSizeInBits(), BitWidth - 1); if (SimplifyDemandedBits(Op2, DemandedAmtBits, DemandedElts, Known2, TLO, Depth + 1)) return true; } break; } case ISD::ROTL: case ISD::ROTR: { SDValue Op0 = Op.getOperand(0); SDValue Op1 = Op.getOperand(1); // If we're rotating an 0/-1 value, then it stays an 0/-1 value. if (BitWidth == TLO.DAG.ComputeNumSignBits(Op0, DemandedElts, Depth + 1)) return TLO.CombineTo(Op, Op0); // For pow-2 bitwidths we only demand the bottom modulo amt bits. if (isPowerOf2_32(BitWidth)) { APInt DemandedAmtBits(Op1.getScalarValueSizeInBits(), BitWidth - 1); if (SimplifyDemandedBits(Op1, DemandedAmtBits, DemandedElts, Known2, TLO, Depth + 1)) return true; } break; } case ISD::BITREVERSE: { SDValue Src = Op.getOperand(0); APInt DemandedSrcBits = DemandedBits.reverseBits(); if (SimplifyDemandedBits(Src, DemandedSrcBits, DemandedElts, Known2, TLO, Depth + 1)) return true; Known.One = Known2.One.reverseBits(); Known.Zero = Known2.Zero.reverseBits(); break; } case ISD::BSWAP: { SDValue Src = Op.getOperand(0); APInt DemandedSrcBits = DemandedBits.byteSwap(); if (SimplifyDemandedBits(Src, DemandedSrcBits, DemandedElts, Known2, TLO, Depth + 1)) return true; Known.One = Known2.One.byteSwap(); Known.Zero = Known2.Zero.byteSwap(); break; } case ISD::SIGN_EXTEND_INREG: { SDValue Op0 = Op.getOperand(0); EVT ExVT = cast(Op.getOperand(1))->getVT(); unsigned ExVTBits = ExVT.getScalarSizeInBits(); // If we only care about the highest bit, don't bother shifting right. if (DemandedBits.isSignMask()) { unsigned NumSignBits = TLO.DAG.ComputeNumSignBits(Op0, DemandedElts, Depth + 1); bool AlreadySignExtended = NumSignBits >= BitWidth - ExVTBits + 1; // However if the input is already sign extended we expect the sign // extension to be dropped altogether later and do not simplify. if (!AlreadySignExtended) { // Compute the correct shift amount type, which must be getShiftAmountTy // for scalar types after legalization. EVT ShiftAmtTy = VT; if (TLO.LegalTypes() && !ShiftAmtTy.isVector()) ShiftAmtTy = getShiftAmountTy(ShiftAmtTy, DL); SDValue ShiftAmt = TLO.DAG.getConstant(BitWidth - ExVTBits, dl, ShiftAmtTy); return TLO.CombineTo(Op, TLO.DAG.getNode(ISD::SHL, dl, VT, Op0, ShiftAmt)); } } // If none of the extended bits are demanded, eliminate the sextinreg. if (DemandedBits.getActiveBits() <= ExVTBits) return TLO.CombineTo(Op, Op0); APInt InputDemandedBits = DemandedBits.getLoBits(ExVTBits); // Since the sign extended bits are demanded, we know that the sign // bit is demanded. InputDemandedBits.setBit(ExVTBits - 1); if (SimplifyDemandedBits(Op0, InputDemandedBits, Known, TLO, Depth + 1)) return true; assert(!Known.hasConflict() && "Bits known to be one AND zero?"); // If the sign bit of the input is known set or clear, then we know the // top bits of the result. // If the input sign bit is known zero, convert this into a zero extension. 
if (Known.Zero[ExVTBits - 1]) return TLO.CombineTo(Op, TLO.DAG.getZeroExtendInReg(Op0, dl, ExVT)); APInt Mask = APInt::getLowBitsSet(BitWidth, ExVTBits); if (Known.One[ExVTBits - 1]) { // Input sign bit known set Known.One.setBitsFrom(ExVTBits); Known.Zero &= Mask; } else { // Input sign bit unknown Known.Zero &= Mask; Known.One &= Mask; } break; } case ISD::BUILD_PAIR: { EVT HalfVT = Op.getOperand(0).getValueType(); unsigned HalfBitWidth = HalfVT.getScalarSizeInBits(); APInt MaskLo = DemandedBits.getLoBits(HalfBitWidth).trunc(HalfBitWidth); APInt MaskHi = DemandedBits.getHiBits(HalfBitWidth).trunc(HalfBitWidth); KnownBits KnownLo, KnownHi; if (SimplifyDemandedBits(Op.getOperand(0), MaskLo, KnownLo, TLO, Depth + 1)) return true; if (SimplifyDemandedBits(Op.getOperand(1), MaskHi, KnownHi, TLO, Depth + 1)) return true; Known.Zero = KnownLo.Zero.zext(BitWidth) | KnownHi.Zero.zext(BitWidth).shl(HalfBitWidth); Known.One = KnownLo.One.zext(BitWidth) | KnownHi.One.zext(BitWidth).shl(HalfBitWidth); break; } case ISD::ZERO_EXTEND: case ISD::ZERO_EXTEND_VECTOR_INREG: { SDValue Src = Op.getOperand(0); EVT SrcVT = Src.getValueType(); unsigned InBits = SrcVT.getScalarSizeInBits(); unsigned InElts = SrcVT.isVector() ? SrcVT.getVectorNumElements() : 1; bool IsVecInReg = Op.getOpcode() == ISD::ZERO_EXTEND_VECTOR_INREG; // If none of the top bits are demanded, convert this into an any_extend. if (DemandedBits.getActiveBits() <= InBits) { // If we only need the non-extended bits of the bottom element // then we can just bitcast to the result. if (IsVecInReg && DemandedElts == 1 && VT.getSizeInBits() == SrcVT.getSizeInBits() && TLO.DAG.getDataLayout().isLittleEndian()) return TLO.CombineTo(Op, TLO.DAG.getBitcast(VT, Src)); unsigned Opc = IsVecInReg ? ISD::ANY_EXTEND_VECTOR_INREG : ISD::ANY_EXTEND; if (!TLO.LegalOperations() || isOperationLegal(Opc, VT)) return TLO.CombineTo(Op, TLO.DAG.getNode(Opc, dl, VT, Src)); } APInt InDemandedBits = DemandedBits.trunc(InBits); APInt InDemandedElts = DemandedElts.zextOrSelf(InElts); if (SimplifyDemandedBits(Src, InDemandedBits, InDemandedElts, Known, TLO, Depth + 1)) return true; assert(!Known.hasConflict() && "Bits known to be one AND zero?"); assert(Known.getBitWidth() == InBits && "Src width has changed?"); Known = Known.zext(BitWidth); break; } case ISD::SIGN_EXTEND: case ISD::SIGN_EXTEND_VECTOR_INREG: { SDValue Src = Op.getOperand(0); EVT SrcVT = Src.getValueType(); unsigned InBits = SrcVT.getScalarSizeInBits(); unsigned InElts = SrcVT.isVector() ? SrcVT.getVectorNumElements() : 1; bool IsVecInReg = Op.getOpcode() == ISD::SIGN_EXTEND_VECTOR_INREG; // If none of the top bits are demanded, convert this into an any_extend. if (DemandedBits.getActiveBits() <= InBits) { // If we only need the non-extended bits of the bottom element // then we can just bitcast to the result. if (IsVecInReg && DemandedElts == 1 && VT.getSizeInBits() == SrcVT.getSizeInBits() && TLO.DAG.getDataLayout().isLittleEndian()) return TLO.CombineTo(Op, TLO.DAG.getBitcast(VT, Src)); unsigned Opc = IsVecInReg ? ISD::ANY_EXTEND_VECTOR_INREG : ISD::ANY_EXTEND; if (!TLO.LegalOperations() || isOperationLegal(Opc, VT)) return TLO.CombineTo(Op, TLO.DAG.getNode(Opc, dl, VT, Src)); } APInt InDemandedBits = DemandedBits.trunc(InBits); APInt InDemandedElts = DemandedElts.zextOrSelf(InElts); // Since some of the sign extended bits are demanded, we know that the sign // bit is demanded. 
InDemandedBits.setBit(InBits - 1); if (SimplifyDemandedBits(Src, InDemandedBits, InDemandedElts, Known, TLO, Depth + 1)) return true; assert(!Known.hasConflict() && "Bits known to be one AND zero?"); assert(Known.getBitWidth() == InBits && "Src width has changed?"); // If the sign bit is known one, the top bits match. Known = Known.sext(BitWidth); // If the sign bit is known zero, convert this to a zero extend. if (Known.isNonNegative()) { unsigned Opc = IsVecInReg ? ISD::ZERO_EXTEND_VECTOR_INREG : ISD::ZERO_EXTEND; if (!TLO.LegalOperations() || isOperationLegal(Opc, VT)) return TLO.CombineTo(Op, TLO.DAG.getNode(Opc, dl, VT, Src)); } break; } case ISD::ANY_EXTEND: case ISD::ANY_EXTEND_VECTOR_INREG: { SDValue Src = Op.getOperand(0); EVT SrcVT = Src.getValueType(); unsigned InBits = SrcVT.getScalarSizeInBits(); unsigned InElts = SrcVT.isVector() ? SrcVT.getVectorNumElements() : 1; bool IsVecInReg = Op.getOpcode() == ISD::ANY_EXTEND_VECTOR_INREG; // If we only need the bottom element then we can just bitcast. // TODO: Handle ANY_EXTEND? if (IsVecInReg && DemandedElts == 1 && VT.getSizeInBits() == SrcVT.getSizeInBits() && TLO.DAG.getDataLayout().isLittleEndian()) return TLO.CombineTo(Op, TLO.DAG.getBitcast(VT, Src)); APInt InDemandedBits = DemandedBits.trunc(InBits); APInt InDemandedElts = DemandedElts.zextOrSelf(InElts); if (SimplifyDemandedBits(Src, InDemandedBits, InDemandedElts, Known, TLO, Depth + 1)) return true; assert(!Known.hasConflict() && "Bits known to be one AND zero?"); assert(Known.getBitWidth() == InBits && "Src width has changed?"); Known = Known.anyext(BitWidth); // Attempt to avoid multi-use ops if we don't need anything from them. if (SDValue NewSrc = SimplifyMultipleUseDemandedBits( Src, InDemandedBits, InDemandedElts, TLO.DAG, Depth + 1)) return TLO.CombineTo(Op, TLO.DAG.getNode(Op.getOpcode(), dl, VT, NewSrc)); break; } case ISD::TRUNCATE: { SDValue Src = Op.getOperand(0); // Simplify the input, using demanded bit information, and compute the known // zero/one bits live out. unsigned OperandBitWidth = Src.getScalarValueSizeInBits(); APInt TruncMask = DemandedBits.zext(OperandBitWidth); if (SimplifyDemandedBits(Src, TruncMask, Known, TLO, Depth + 1)) return true; Known = Known.trunc(BitWidth); // Attempt to avoid multi-use ops if we don't need anything from them. if (SDValue NewSrc = SimplifyMultipleUseDemandedBits( Src, TruncMask, DemandedElts, TLO.DAG, Depth + 1)) return TLO.CombineTo(Op, TLO.DAG.getNode(ISD::TRUNCATE, dl, VT, NewSrc)); // If the input is only used by this truncate, see if we can shrink it based // on the known demanded bits. if (Src.getNode()->hasOneUse()) { switch (Src.getOpcode()) { default: break; case ISD::SRL: // Shrink SRL by a constant if none of the high bits shifted in are // demanded. if (TLO.LegalTypes() && !isTypeDesirableForOp(ISD::SRL, VT)) // Do not turn (vt1 truncate (vt2 srl)) into (vt1 srl) if vt1 is // undesirable. break; SDValue ShAmt = Src.getOperand(1); auto *ShAmtC = dyn_cast(ShAmt); if (!ShAmtC || ShAmtC->getAPIntValue().uge(BitWidth)) break; uint64_t ShVal = ShAmtC->getZExtValue(); APInt HighBits = APInt::getHighBitsSet(OperandBitWidth, OperandBitWidth - BitWidth); HighBits.lshrInPlace(ShVal); HighBits = HighBits.trunc(BitWidth); if (!(HighBits & DemandedBits)) { // None of the shifted in bits are needed. Add a truncate of the // shift input, then shift it. 
if (TLO.LegalTypes()) ShAmt = TLO.DAG.getConstant(ShVal, dl, getShiftAmountTy(VT, DL)); SDValue NewTrunc = TLO.DAG.getNode(ISD::TRUNCATE, dl, VT, Src.getOperand(0)); return TLO.CombineTo( Op, TLO.DAG.getNode(ISD::SRL, dl, VT, NewTrunc, ShAmt)); } break; } } assert(!Known.hasConflict() && "Bits known to be one AND zero?"); break; } case ISD::AssertZext: { // AssertZext demands all of the high bits, plus any of the low bits // demanded by its users. EVT ZVT = cast(Op.getOperand(1))->getVT(); APInt InMask = APInt::getLowBitsSet(BitWidth, ZVT.getSizeInBits()); if (SimplifyDemandedBits(Op.getOperand(0), ~InMask | DemandedBits, Known, TLO, Depth + 1)) return true; assert(!Known.hasConflict() && "Bits known to be one AND zero?"); Known.Zero |= ~InMask; break; } case ISD::EXTRACT_VECTOR_ELT: { SDValue Src = Op.getOperand(0); SDValue Idx = Op.getOperand(1); unsigned NumSrcElts = Src.getValueType().getVectorNumElements(); unsigned EltBitWidth = Src.getScalarValueSizeInBits(); // Demand the bits from every vector element without a constant index. APInt DemandedSrcElts = APInt::getAllOnesValue(NumSrcElts); if (auto *CIdx = dyn_cast(Idx)) if (CIdx->getAPIntValue().ult(NumSrcElts)) DemandedSrcElts = APInt::getOneBitSet(NumSrcElts, CIdx->getZExtValue()); // If BitWidth > EltBitWidth the value is anyext:ed. So we do not know // anything about the extended bits. APInt DemandedSrcBits = DemandedBits; if (BitWidth > EltBitWidth) DemandedSrcBits = DemandedSrcBits.trunc(EltBitWidth); if (SimplifyDemandedBits(Src, DemandedSrcBits, DemandedSrcElts, Known2, TLO, Depth + 1)) return true; // Attempt to avoid multi-use ops if we don't need anything from them. if (!DemandedSrcBits.isAllOnesValue() || !DemandedSrcElts.isAllOnesValue()) { if (SDValue DemandedSrc = SimplifyMultipleUseDemandedBits( Src, DemandedSrcBits, DemandedSrcElts, TLO.DAG, Depth + 1)) { SDValue NewOp = TLO.DAG.getNode(Op.getOpcode(), dl, VT, DemandedSrc, Idx); return TLO.CombineTo(Op, NewOp); } } Known = Known2; if (BitWidth > EltBitWidth) Known = Known.anyext(BitWidth); break; } case ISD::BITCAST: { SDValue Src = Op.getOperand(0); EVT SrcVT = Src.getValueType(); unsigned NumSrcEltBits = SrcVT.getScalarSizeInBits(); // If this is an FP->Int bitcast and if the sign bit is the only // thing demanded, turn this into a FGETSIGN. if (!TLO.LegalOperations() && !VT.isVector() && !SrcVT.isVector() && DemandedBits == APInt::getSignMask(Op.getValueSizeInBits()) && SrcVT.isFloatingPoint()) { bool OpVTLegal = isOperationLegalOrCustom(ISD::FGETSIGN, VT); bool i32Legal = isOperationLegalOrCustom(ISD::FGETSIGN, MVT::i32); if ((OpVTLegal || i32Legal) && VT.isSimple() && SrcVT != MVT::f16 && SrcVT != MVT::f128) { // Cannot eliminate/lower SHL for f128 yet. EVT Ty = OpVTLegal ? VT : MVT::i32; // Make a FGETSIGN + SHL to move the sign bit into the appropriate // place. We expect the SHL to be eliminated by other optimizations. SDValue Sign = TLO.DAG.getNode(ISD::FGETSIGN, dl, Ty, Src); unsigned OpVTSizeInBits = Op.getValueSizeInBits(); if (!OpVTLegal && OpVTSizeInBits > 32) Sign = TLO.DAG.getNode(ISD::ZERO_EXTEND, dl, VT, Sign); unsigned ShVal = Op.getValueSizeInBits() - 1; SDValue ShAmt = TLO.DAG.getConstant(ShVal, dl, VT); return TLO.CombineTo(Op, TLO.DAG.getNode(ISD::SHL, dl, VT, Sign, ShAmt)); } } // Bitcast from a vector using SimplifyDemanded Bits/VectorElts. // Demand the elt/bit if any of the original elts/bits are demanded. // TODO - bigendian once we have test coverage. 
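    // The two blocks below mirror the ISD::BITCAST handling in
    // SimplifyMultipleUseDemandedBits: the demanded bits/elements are remapped
    // onto the source vector's element granularity before recursing.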
if (SrcVT.isVector() && (BitWidth % NumSrcEltBits) == 0 && TLO.DAG.getDataLayout().isLittleEndian()) { unsigned Scale = BitWidth / NumSrcEltBits; unsigned NumSrcElts = SrcVT.getVectorNumElements(); APInt DemandedSrcBits = APInt::getNullValue(NumSrcEltBits); APInt DemandedSrcElts = APInt::getNullValue(NumSrcElts); for (unsigned i = 0; i != Scale; ++i) { unsigned Offset = i * NumSrcEltBits; APInt Sub = DemandedBits.extractBits(NumSrcEltBits, Offset); if (!Sub.isNullValue()) { DemandedSrcBits |= Sub; for (unsigned j = 0; j != NumElts; ++j) if (DemandedElts[j]) DemandedSrcElts.setBit((j * Scale) + i); } } APInt KnownSrcUndef, KnownSrcZero; if (SimplifyDemandedVectorElts(Src, DemandedSrcElts, KnownSrcUndef, KnownSrcZero, TLO, Depth + 1)) return true; KnownBits KnownSrcBits; if (SimplifyDemandedBits(Src, DemandedSrcBits, DemandedSrcElts, KnownSrcBits, TLO, Depth + 1)) return true; } else if ((NumSrcEltBits % BitWidth) == 0 && TLO.DAG.getDataLayout().isLittleEndian()) { unsigned Scale = NumSrcEltBits / BitWidth; unsigned NumSrcElts = SrcVT.isVector() ? SrcVT.getVectorNumElements() : 1; APInt DemandedSrcBits = APInt::getNullValue(NumSrcEltBits); APInt DemandedSrcElts = APInt::getNullValue(NumSrcElts); for (unsigned i = 0; i != NumElts; ++i) if (DemandedElts[i]) { unsigned Offset = (i % Scale) * BitWidth; DemandedSrcBits.insertBits(DemandedBits, Offset); DemandedSrcElts.setBit(i / Scale); } if (SrcVT.isVector()) { APInt KnownSrcUndef, KnownSrcZero; if (SimplifyDemandedVectorElts(Src, DemandedSrcElts, KnownSrcUndef, KnownSrcZero, TLO, Depth + 1)) return true; } KnownBits KnownSrcBits; if (SimplifyDemandedBits(Src, DemandedSrcBits, DemandedSrcElts, KnownSrcBits, TLO, Depth + 1)) return true; } // If this is a bitcast, let computeKnownBits handle it. Only do this on a // recursive call where Known may be useful to the caller. if (Depth > 0) { Known = TLO.DAG.computeKnownBits(Op, DemandedElts, Depth); return false; } break; } case ISD::ADD: case ISD::MUL: case ISD::SUB: { // Add, Sub, and Mul don't demand any bits in positions beyond that // of the highest bit demanded of them. SDValue Op0 = Op.getOperand(0), Op1 = Op.getOperand(1); SDNodeFlags Flags = Op.getNode()->getFlags(); unsigned DemandedBitsLZ = DemandedBits.countLeadingZeros(); APInt LoMask = APInt::getLowBitsSet(BitWidth, BitWidth - DemandedBitsLZ); if (SimplifyDemandedBits(Op0, LoMask, DemandedElts, Known2, TLO, Depth + 1) || SimplifyDemandedBits(Op1, LoMask, DemandedElts, Known2, TLO, Depth + 1) || // See if the operation should be performed at a smaller bit width. ShrinkDemandedOp(Op, BitWidth, DemandedBits, TLO)) { if (Flags.hasNoSignedWrap() || Flags.hasNoUnsignedWrap()) { // Disable the nsw and nuw flags. We can no longer guarantee that we // won't wrap after simplification. Flags.setNoSignedWrap(false); Flags.setNoUnsignedWrap(false); SDValue NewOp = TLO.DAG.getNode(Op.getOpcode(), dl, VT, Op0, Op1, Flags); return TLO.CombineTo(Op, NewOp); } return true; } // Attempt to avoid multi-use ops if we don't need anything from them. if (!LoMask.isAllOnesValue() || !DemandedElts.isAllOnesValue()) { SDValue DemandedOp0 = SimplifyMultipleUseDemandedBits( Op0, LoMask, DemandedElts, TLO.DAG, Depth + 1); SDValue DemandedOp1 = SimplifyMultipleUseDemandedBits( Op1, LoMask, DemandedElts, TLO.DAG, Depth + 1); if (DemandedOp0 || DemandedOp1) { Flags.setNoSignedWrap(false); Flags.setNoUnsignedWrap(false); Op0 = DemandedOp0 ? DemandedOp0 : Op0; Op1 = DemandedOp1 ? 
DemandedOp1 : Op1; SDValue NewOp = TLO.DAG.getNode(Op.getOpcode(), dl, VT, Op0, Op1, Flags); return TLO.CombineTo(Op, NewOp); } } // If we have a constant operand, we may be able to turn it into -1 if we // do not demand the high bits. This can make the constant smaller to // encode, allow more general folding, or match specialized instruction // patterns (eg, 'blsr' on x86). Don't bother changing 1 to -1 because that // is probably not useful (and could be detrimental). ConstantSDNode *C = isConstOrConstSplat(Op1); APInt HighMask = APInt::getHighBitsSet(BitWidth, DemandedBitsLZ); if (C && !C->isAllOnesValue() && !C->isOne() && (C->getAPIntValue() | HighMask).isAllOnesValue()) { SDValue Neg1 = TLO.DAG.getAllOnesConstant(dl, VT); // Disable the nsw and nuw flags. We can no longer guarantee that we // won't wrap after simplification. Flags.setNoSignedWrap(false); Flags.setNoUnsignedWrap(false); SDValue NewOp = TLO.DAG.getNode(Op.getOpcode(), dl, VT, Op0, Neg1, Flags); return TLO.CombineTo(Op, NewOp); } LLVM_FALLTHROUGH; } default: if (Op.getOpcode() >= ISD::BUILTIN_OP_END) { if (SimplifyDemandedBitsForTargetNode(Op, DemandedBits, DemandedElts, Known, TLO, Depth)) return true; break; } // Just use computeKnownBits to compute output bits. Known = TLO.DAG.computeKnownBits(Op, DemandedElts, Depth); break; } // If we know the value of all of the demanded bits, return this as a // constant. if (DemandedBits.isSubsetOf(Known.Zero | Known.One)) { // Avoid folding to a constant if any OpaqueConstant is involved. const SDNode *N = Op.getNode(); for (SDNodeIterator I = SDNodeIterator::begin(N), E = SDNodeIterator::end(N); I != E; ++I) { SDNode *Op = *I; if (ConstantSDNode *C = dyn_cast(Op)) if (C->isOpaque()) return false; } // TODO: Handle float bits as well. if (VT.isInteger()) return TLO.CombineTo(Op, TLO.DAG.getConstant(Known.One, dl, VT)); } return false; } bool TargetLowering::SimplifyDemandedVectorElts(SDValue Op, const APInt &DemandedElts, APInt &KnownUndef, APInt &KnownZero, DAGCombinerInfo &DCI) const { SelectionDAG &DAG = DCI.DAG; TargetLoweringOpt TLO(DAG, !DCI.isBeforeLegalize(), !DCI.isBeforeLegalizeOps()); bool Simplified = SimplifyDemandedVectorElts(Op, DemandedElts, KnownUndef, KnownZero, TLO); if (Simplified) { DCI.AddToWorklist(Op.getNode()); DCI.CommitTargetLoweringOpt(TLO); } return Simplified; } /// Given a vector binary operation and known undefined elements for each input /// operand, compute whether each element of the output is undefined. static APInt getKnownUndefForVectorBinop(SDValue BO, SelectionDAG &DAG, const APInt &UndefOp0, const APInt &UndefOp1) { EVT VT = BO.getValueType(); assert(DAG.getTargetLoweringInfo().isBinOp(BO.getOpcode()) && VT.isVector() && "Vector binop only"); EVT EltVT = VT.getVectorElementType(); unsigned NumElts = VT.getVectorNumElements(); assert(UndefOp0.getBitWidth() == NumElts && UndefOp1.getBitWidth() == NumElts && "Bad type for undef analysis"); auto getUndefOrConstantElt = [&](SDValue V, unsigned Index, const APInt &UndefVals) { if (UndefVals[Index]) return DAG.getUNDEF(EltVT); if (auto *BV = dyn_cast(V)) { // Try hard to make sure that the getNode() call is not creating temporary // nodes. Ignore opaque integers because they do not constant fold. 
      SDValue Elt = BV->getOperand(Index);
      auto *C = dyn_cast<ConstantSDNode>(Elt);
      if (isa<ConstantFPSDNode>(Elt) || Elt.isUndef() || (C && !C->isOpaque()))
        return Elt;
    }

    return SDValue();
  };

  APInt KnownUndef = APInt::getNullValue(NumElts);
  for (unsigned i = 0; i != NumElts; ++i) {
    // If both inputs for this element are either constant or undef and match
    // the element type, compute the constant/undef result for this element of
    // the vector.
    // TODO: Ideally we would use FoldConstantArithmetic() here, but that does
    // not handle FP constants. The code within getNode() should be refactored
    // to avoid the danger of creating a bogus temporary node here.
    SDValue C0 = getUndefOrConstantElt(BO.getOperand(0), i, UndefOp0);
    SDValue C1 = getUndefOrConstantElt(BO.getOperand(1), i, UndefOp1);
    if (C0 && C1 && C0.getValueType() == EltVT && C1.getValueType() == EltVT)
      if (DAG.getNode(BO.getOpcode(), SDLoc(BO), EltVT, C0, C1).isUndef())
        KnownUndef.setBit(i);
  }
  return KnownUndef;
}

bool TargetLowering::SimplifyDemandedVectorElts(
    SDValue Op, const APInt &OriginalDemandedElts, APInt &KnownUndef,
    APInt &KnownZero, TargetLoweringOpt &TLO, unsigned Depth,
    bool AssumeSingleUse) const {
  EVT VT = Op.getValueType();
  unsigned Opcode = Op.getOpcode();
  APInt DemandedElts = OriginalDemandedElts;
  unsigned NumElts = DemandedElts.getBitWidth();
  assert(VT.isVector() && "Expected vector op");

  KnownUndef = KnownZero = APInt::getNullValue(NumElts);

  // TODO: For now we assume we know nothing about scalable vectors.
  if (VT.isScalableVector())
    return false;

  assert(VT.getVectorNumElements() == NumElts &&
         "Mask size mismatches value type element count!");

  // Undef operand.
  if (Op.isUndef()) {
    KnownUndef.setAllBits();
    return false;
  }

  // If Op has other users, assume that all elements are needed.
  if (!Op.getNode()->hasOneUse() && !AssumeSingleUse)
    DemandedElts.setAllBits();

  // Not demanding any elements from Op.
  if (DemandedElts == 0) {
    KnownUndef.setAllBits();
    return TLO.CombineTo(Op, TLO.DAG.getUNDEF(VT));
  }

  // Limit search depth.
  if (Depth >= SelectionDAG::MaxRecursionDepth)
    return false;

  SDLoc DL(Op);
  unsigned EltSizeInBits = VT.getScalarSizeInBits();

  // Helper for demanding the specified elements and all the bits of both binary
  // operands.
  auto SimplifyDemandedVectorEltsBinOp = [&](SDValue Op0, SDValue Op1) {
    SDValue NewOp0 = SimplifyMultipleUseDemandedVectorElts(Op0, DemandedElts,
                                                           TLO.DAG, Depth + 1);
    SDValue NewOp1 = SimplifyMultipleUseDemandedVectorElts(Op1, DemandedElts,
                                                           TLO.DAG, Depth + 1);
    if (NewOp0 || NewOp1) {
      SDValue NewOp = TLO.DAG.getNode(
          Opcode, SDLoc(Op), VT, NewOp0 ? NewOp0 : Op0, NewOp1 ? NewOp1 : Op1);
      return TLO.CombineTo(Op, NewOp);
    }
    return false;
  };

  switch (Opcode) {
  case ISD::SCALAR_TO_VECTOR: {
    if (!DemandedElts[0]) {
      KnownUndef.setAllBits();
      return TLO.CombineTo(Op, TLO.DAG.getUNDEF(VT));
    }
    KnownUndef.setHighBits(NumElts - 1);
    break;
  }
  case ISD::BITCAST: {
    SDValue Src = Op.getOperand(0);
    EVT SrcVT = Src.getValueType();

    // We only handle vectors here.
    // TODO - investigate calling SimplifyDemandedBits/ComputeKnownBits?
    if (!SrcVT.isVector())
      break;

    // Fast handling of 'identity' bitcasts.
    unsigned NumSrcElts = SrcVT.getVectorNumElements();
    if (NumSrcElts == NumElts)
      return SimplifyDemandedVectorElts(Src, DemandedElts, KnownUndef,
                                        KnownZero, TLO, Depth + 1);

    APInt SrcZero, SrcUndef;
    APInt SrcDemandedElts = APInt::getNullValue(NumSrcElts);

    // Bitcast from 'large element' src vector to 'small element' vector, we
    // must demand a source element if any DemandedElt maps to it.
if ((NumElts % NumSrcElts) == 0) { unsigned Scale = NumElts / NumSrcElts; for (unsigned i = 0; i != NumElts; ++i) if (DemandedElts[i]) SrcDemandedElts.setBit(i / Scale); if (SimplifyDemandedVectorElts(Src, SrcDemandedElts, SrcUndef, SrcZero, TLO, Depth + 1)) return true; // Try calling SimplifyDemandedBits, converting demanded elts to the bits // of the large element. // TODO - bigendian once we have test coverage. if (TLO.DAG.getDataLayout().isLittleEndian()) { unsigned SrcEltSizeInBits = SrcVT.getScalarSizeInBits(); APInt SrcDemandedBits = APInt::getNullValue(SrcEltSizeInBits); for (unsigned i = 0; i != NumElts; ++i) if (DemandedElts[i]) { unsigned Ofs = (i % Scale) * EltSizeInBits; SrcDemandedBits.setBits(Ofs, Ofs + EltSizeInBits); } KnownBits Known; if (SimplifyDemandedBits(Src, SrcDemandedBits, SrcDemandedElts, Known, TLO, Depth + 1)) return true; } // If the src element is zero/undef then all the output elements will be - // only demanded elements are guaranteed to be correct. for (unsigned i = 0; i != NumSrcElts; ++i) { if (SrcDemandedElts[i]) { if (SrcZero[i]) KnownZero.setBits(i * Scale, (i + 1) * Scale); if (SrcUndef[i]) KnownUndef.setBits(i * Scale, (i + 1) * Scale); } } } // Bitcast from 'small element' src vector to 'large element' vector, we // demand all smaller source elements covered by the larger demanded element // of this vector. if ((NumSrcElts % NumElts) == 0) { unsigned Scale = NumSrcElts / NumElts; for (unsigned i = 0; i != NumElts; ++i) if (DemandedElts[i]) SrcDemandedElts.setBits(i * Scale, (i + 1) * Scale); if (SimplifyDemandedVectorElts(Src, SrcDemandedElts, SrcUndef, SrcZero, TLO, Depth + 1)) return true; // If all the src elements covering an output element are zero/undef, then // the output element will be as well, assuming it was demanded. for (unsigned i = 0; i != NumElts; ++i) { if (DemandedElts[i]) { if (SrcZero.extractBits(Scale, i * Scale).isAllOnesValue()) KnownZero.setBit(i); if (SrcUndef.extractBits(Scale, i * Scale).isAllOnesValue()) KnownUndef.setBit(i); } } } break; } case ISD::BUILD_VECTOR: { // Check all elements and simplify any unused elements with UNDEF. if (!DemandedElts.isAllOnesValue()) { // Don't simplify BROADCASTS. 
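      // A build_vector whose operands are all identical behaves like a
      // broadcast and is left untouched; otherwise any operand that is
      // neither demanded nor already undef is replaced with UNDEF below.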
if (llvm::any_of(Op->op_values(), [&](SDValue Elt) { return Op.getOperand(0) != Elt; })) { SmallVector Ops(Op->op_begin(), Op->op_end()); bool Updated = false; for (unsigned i = 0; i != NumElts; ++i) { if (!DemandedElts[i] && !Ops[i].isUndef()) { Ops[i] = TLO.DAG.getUNDEF(Ops[0].getValueType()); KnownUndef.setBit(i); Updated = true; } } if (Updated) return TLO.CombineTo(Op, TLO.DAG.getBuildVector(VT, DL, Ops)); } } for (unsigned i = 0; i != NumElts; ++i) { SDValue SrcOp = Op.getOperand(i); if (SrcOp.isUndef()) { KnownUndef.setBit(i); } else if (EltSizeInBits == SrcOp.getScalarValueSizeInBits() && (isNullConstant(SrcOp) || isNullFPConstant(SrcOp))) { KnownZero.setBit(i); } } break; } case ISD::CONCAT_VECTORS: { EVT SubVT = Op.getOperand(0).getValueType(); unsigned NumSubVecs = Op.getNumOperands(); unsigned NumSubElts = SubVT.getVectorNumElements(); for (unsigned i = 0; i != NumSubVecs; ++i) { SDValue SubOp = Op.getOperand(i); APInt SubElts = DemandedElts.extractBits(NumSubElts, i * NumSubElts); APInt SubUndef, SubZero; if (SimplifyDemandedVectorElts(SubOp, SubElts, SubUndef, SubZero, TLO, Depth + 1)) return true; KnownUndef.insertBits(SubUndef, i * NumSubElts); KnownZero.insertBits(SubZero, i * NumSubElts); } break; } case ISD::INSERT_SUBVECTOR: { // Demand any elements from the subvector and the remainder from the src its // inserted into. SDValue Src = Op.getOperand(0); SDValue Sub = Op.getOperand(1); uint64_t Idx = Op.getConstantOperandVal(2); unsigned NumSubElts = Sub.getValueType().getVectorNumElements(); APInt DemandedSubElts = DemandedElts.extractBits(NumSubElts, Idx); APInt DemandedSrcElts = DemandedElts; DemandedSrcElts.insertBits(APInt::getNullValue(NumSubElts), Idx); APInt SubUndef, SubZero; if (SimplifyDemandedVectorElts(Sub, DemandedSubElts, SubUndef, SubZero, TLO, Depth + 1)) return true; // If none of the src operand elements are demanded, replace it with undef. if (!DemandedSrcElts && !Src.isUndef()) return TLO.CombineTo(Op, TLO.DAG.getNode(ISD::INSERT_SUBVECTOR, DL, VT, TLO.DAG.getUNDEF(VT), Sub, Op.getOperand(2))); if (SimplifyDemandedVectorElts(Src, DemandedSrcElts, KnownUndef, KnownZero, TLO, Depth + 1)) return true; KnownUndef.insertBits(SubUndef, Idx); KnownZero.insertBits(SubZero, Idx); // Attempt to avoid multi-use ops if we don't need anything from them. if (!DemandedSrcElts.isAllOnesValue() || !DemandedSubElts.isAllOnesValue()) { SDValue NewSrc = SimplifyMultipleUseDemandedVectorElts( Src, DemandedSrcElts, TLO.DAG, Depth + 1); SDValue NewSub = SimplifyMultipleUseDemandedVectorElts( Sub, DemandedSubElts, TLO.DAG, Depth + 1); if (NewSrc || NewSub) { NewSrc = NewSrc ? NewSrc : Src; NewSub = NewSub ? NewSub : Sub; SDValue NewOp = TLO.DAG.getNode(Op.getOpcode(), SDLoc(Op), VT, NewSrc, NewSub, Op.getOperand(2)); return TLO.CombineTo(Op, NewOp); } } break; } case ISD::EXTRACT_SUBVECTOR: { // Offset the demanded elts by the subvector index. SDValue Src = Op.getOperand(0); if (Src.getValueType().isScalableVector()) break; uint64_t Idx = Op.getConstantOperandVal(1); unsigned NumSrcElts = Src.getValueType().getVectorNumElements(); APInt DemandedSrcElts = DemandedElts.zextOrSelf(NumSrcElts).shl(Idx); APInt SrcUndef, SrcZero; if (SimplifyDemandedVectorElts(Src, DemandedSrcElts, SrcUndef, SrcZero, TLO, Depth + 1)) return true; KnownUndef = SrcUndef.extractBits(NumElts, Idx); KnownZero = SrcZero.extractBits(NumElts, Idx); // Attempt to avoid multi-use ops if we don't need anything from them. 
if (!DemandedElts.isAllOnesValue()) { SDValue NewSrc = SimplifyMultipleUseDemandedVectorElts( Src, DemandedSrcElts, TLO.DAG, Depth + 1); if (NewSrc) { SDValue NewOp = TLO.DAG.getNode(Op.getOpcode(), SDLoc(Op), VT, NewSrc, Op.getOperand(1)); return TLO.CombineTo(Op, NewOp); } } break; } case ISD::INSERT_VECTOR_ELT: { SDValue Vec = Op.getOperand(0); SDValue Scl = Op.getOperand(1); auto *CIdx = dyn_cast(Op.getOperand(2)); // For a legal, constant insertion index, if we don't need this insertion // then strip it, else remove it from the demanded elts. if (CIdx && CIdx->getAPIntValue().ult(NumElts)) { unsigned Idx = CIdx->getZExtValue(); if (!DemandedElts[Idx]) return TLO.CombineTo(Op, Vec); APInt DemandedVecElts(DemandedElts); DemandedVecElts.clearBit(Idx); if (SimplifyDemandedVectorElts(Vec, DemandedVecElts, KnownUndef, KnownZero, TLO, Depth + 1)) return true; KnownUndef.clearBit(Idx); if (Scl.isUndef()) KnownUndef.setBit(Idx); KnownZero.clearBit(Idx); if (isNullConstant(Scl) || isNullFPConstant(Scl)) KnownZero.setBit(Idx); break; } APInt VecUndef, VecZero; if (SimplifyDemandedVectorElts(Vec, DemandedElts, VecUndef, VecZero, TLO, Depth + 1)) return true; // Without knowing the insertion index we can't set KnownUndef/KnownZero. break; } case ISD::VSELECT: { // Try to transform the select condition based on the current demanded // elements. // TODO: If a condition element is undef, we can choose from one arm of the // select (and if one arm is undef, then we can propagate that to the // result). // TODO - add support for constant vselect masks (see IR version of this). APInt UnusedUndef, UnusedZero; if (SimplifyDemandedVectorElts(Op.getOperand(0), DemandedElts, UnusedUndef, UnusedZero, TLO, Depth + 1)) return true; // See if we can simplify either vselect operand. APInt DemandedLHS(DemandedElts); APInt DemandedRHS(DemandedElts); APInt UndefLHS, ZeroLHS; APInt UndefRHS, ZeroRHS; if (SimplifyDemandedVectorElts(Op.getOperand(1), DemandedLHS, UndefLHS, ZeroLHS, TLO, Depth + 1)) return true; if (SimplifyDemandedVectorElts(Op.getOperand(2), DemandedRHS, UndefRHS, ZeroRHS, TLO, Depth + 1)) return true; KnownUndef = UndefLHS & UndefRHS; KnownZero = ZeroLHS & ZeroRHS; break; } case ISD::VECTOR_SHUFFLE: { ArrayRef ShuffleMask = cast(Op)->getMask(); // Collect demanded elements from shuffle operands.. APInt DemandedLHS(NumElts, 0); APInt DemandedRHS(NumElts, 0); for (unsigned i = 0; i != NumElts; ++i) { int M = ShuffleMask[i]; if (M < 0 || !DemandedElts[i]) continue; assert(0 <= M && M < (int)(2 * NumElts) && "Shuffle index out of range"); if (M < (int)NumElts) DemandedLHS.setBit(M); else DemandedRHS.setBit(M - NumElts); } // See if we can simplify either shuffle operand. APInt UndefLHS, ZeroLHS; APInt UndefRHS, ZeroRHS; if (SimplifyDemandedVectorElts(Op.getOperand(0), DemandedLHS, UndefLHS, ZeroLHS, TLO, Depth + 1)) return true; if (SimplifyDemandedVectorElts(Op.getOperand(1), DemandedRHS, UndefRHS, ZeroRHS, TLO, Depth + 1)) return true; // Simplify mask using undef elements from LHS/RHS. 
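    // Lanes that are not demanded, or whose source element is known undef,
    // can be rewritten to the -1 sentinel; IdentityLHS/IdentityRHS track
    // whether the simplified mask collapses to a plain copy of one operand.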
bool Updated = false; bool IdentityLHS = true, IdentityRHS = true; SmallVector NewMask(ShuffleMask.begin(), ShuffleMask.end()); for (unsigned i = 0; i != NumElts; ++i) { int &M = NewMask[i]; if (M < 0) continue; if (!DemandedElts[i] || (M < (int)NumElts && UndefLHS[M]) || (M >= (int)NumElts && UndefRHS[M - NumElts])) { Updated = true; M = -1; } IdentityLHS &= (M < 0) || (M == (int)i); IdentityRHS &= (M < 0) || ((M - NumElts) == i); } // Update legal shuffle masks based on demanded elements if it won't reduce // to Identity which can cause premature removal of the shuffle mask. if (Updated && !IdentityLHS && !IdentityRHS && !TLO.LegalOps) { SDValue LegalShuffle = buildLegalVectorShuffle(VT, DL, Op.getOperand(0), Op.getOperand(1), NewMask, TLO.DAG); if (LegalShuffle) return TLO.CombineTo(Op, LegalShuffle); } // Propagate undef/zero elements from LHS/RHS. for (unsigned i = 0; i != NumElts; ++i) { int M = ShuffleMask[i]; if (M < 0) { KnownUndef.setBit(i); } else if (M < (int)NumElts) { if (UndefLHS[M]) KnownUndef.setBit(i); if (ZeroLHS[M]) KnownZero.setBit(i); } else { if (UndefRHS[M - NumElts]) KnownUndef.setBit(i); if (ZeroRHS[M - NumElts]) KnownZero.setBit(i); } } break; } case ISD::ANY_EXTEND_VECTOR_INREG: case ISD::SIGN_EXTEND_VECTOR_INREG: case ISD::ZERO_EXTEND_VECTOR_INREG: { APInt SrcUndef, SrcZero; SDValue Src = Op.getOperand(0); unsigned NumSrcElts = Src.getValueType().getVectorNumElements(); APInt DemandedSrcElts = DemandedElts.zextOrSelf(NumSrcElts); if (SimplifyDemandedVectorElts(Src, DemandedSrcElts, SrcUndef, SrcZero, TLO, Depth + 1)) return true; KnownZero = SrcZero.zextOrTrunc(NumElts); KnownUndef = SrcUndef.zextOrTrunc(NumElts); if (Op.getOpcode() == ISD::ANY_EXTEND_VECTOR_INREG && Op.getValueSizeInBits() == Src.getValueSizeInBits() && DemandedSrcElts == 1 && TLO.DAG.getDataLayout().isLittleEndian()) { // aext - if we just need the bottom element then we can bitcast. return TLO.CombineTo(Op, TLO.DAG.getBitcast(VT, Src)); } if (Op.getOpcode() == ISD::ZERO_EXTEND_VECTOR_INREG) { // zext(undef) upper bits are guaranteed to be zero. if (DemandedElts.isSubsetOf(KnownUndef)) return TLO.CombineTo(Op, TLO.DAG.getConstant(0, SDLoc(Op), VT)); KnownUndef.clearAllBits(); } break; } // TODO: There are more binop opcodes that could be handled here - MIN, // MAX, saturated math, etc. case ISD::OR: case ISD::XOR: case ISD::ADD: case ISD::SUB: case ISD::FADD: case ISD::FSUB: case ISD::FMUL: case ISD::FDIV: case ISD::FREM: { SDValue Op0 = Op.getOperand(0); SDValue Op1 = Op.getOperand(1); APInt UndefRHS, ZeroRHS; if (SimplifyDemandedVectorElts(Op1, DemandedElts, UndefRHS, ZeroRHS, TLO, Depth + 1)) return true; APInt UndefLHS, ZeroLHS; if (SimplifyDemandedVectorElts(Op0, DemandedElts, UndefLHS, ZeroLHS, TLO, Depth + 1)) return true; KnownZero = ZeroLHS & ZeroRHS; KnownUndef = getKnownUndefForVectorBinop(Op, TLO.DAG, UndefLHS, UndefRHS); // Attempt to avoid multi-use ops if we don't need anything from them. // TODO - use KnownUndef to relax the demandedelts? 
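    // SimplifyDemandedVectorEltsBinOp (the lambda defined above) recreates
    // the binop from simplified single-use views of its operands when either
    // operand can be replaced.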
if (!DemandedElts.isAllOnesValue()) if (SimplifyDemandedVectorEltsBinOp(Op0, Op1)) return true; break; } case ISD::SHL: case ISD::SRL: case ISD::SRA: case ISD::ROTL: case ISD::ROTR: { SDValue Op0 = Op.getOperand(0); SDValue Op1 = Op.getOperand(1); APInt UndefRHS, ZeroRHS; if (SimplifyDemandedVectorElts(Op1, DemandedElts, UndefRHS, ZeroRHS, TLO, Depth + 1)) return true; APInt UndefLHS, ZeroLHS; if (SimplifyDemandedVectorElts(Op0, DemandedElts, UndefLHS, ZeroLHS, TLO, Depth + 1)) return true; KnownZero = ZeroLHS; KnownUndef = UndefLHS & UndefRHS; // TODO: use getKnownUndefForVectorBinop? // Attempt to avoid multi-use ops if we don't need anything from them. // TODO - use KnownUndef to relax the demandedelts? if (!DemandedElts.isAllOnesValue()) if (SimplifyDemandedVectorEltsBinOp(Op0, Op1)) return true; break; } case ISD::MUL: case ISD::AND: { SDValue Op0 = Op.getOperand(0); SDValue Op1 = Op.getOperand(1); APInt SrcUndef, SrcZero; if (SimplifyDemandedVectorElts(Op1, DemandedElts, SrcUndef, SrcZero, TLO, Depth + 1)) return true; if (SimplifyDemandedVectorElts(Op0, DemandedElts, KnownUndef, KnownZero, TLO, Depth + 1)) return true; // If either side has a zero element, then the result element is zero, even // if the other is an UNDEF. // TODO: Extend getKnownUndefForVectorBinop to also deal with known zeros // and then handle 'and' nodes with the rest of the binop opcodes. KnownZero |= SrcZero; KnownUndef &= SrcUndef; KnownUndef &= ~KnownZero; // Attempt to avoid multi-use ops if we don't need anything from them. // TODO - use KnownUndef to relax the demandedelts? if (!DemandedElts.isAllOnesValue()) if (SimplifyDemandedVectorEltsBinOp(Op0, Op1)) return true; break; } case ISD::TRUNCATE: case ISD::SIGN_EXTEND: case ISD::ZERO_EXTEND: if (SimplifyDemandedVectorElts(Op.getOperand(0), DemandedElts, KnownUndef, KnownZero, TLO, Depth + 1)) return true; if (Op.getOpcode() == ISD::ZERO_EXTEND) { // zext(undef) upper bits are guaranteed to be zero. if (DemandedElts.isSubsetOf(KnownUndef)) return TLO.CombineTo(Op, TLO.DAG.getConstant(0, SDLoc(Op), VT)); KnownUndef.clearAllBits(); } break; default: { if (Op.getOpcode() >= ISD::BUILTIN_OP_END) { if (SimplifyDemandedVectorEltsForTargetNode(Op, DemandedElts, KnownUndef, KnownZero, TLO, Depth)) return true; } else { KnownBits Known; APInt DemandedBits = APInt::getAllOnesValue(EltSizeInBits); if (SimplifyDemandedBits(Op, DemandedBits, OriginalDemandedElts, Known, TLO, Depth, AssumeSingleUse)) return true; } break; } } assert((KnownUndef & KnownZero) == 0 && "Elements flagged as undef AND zero"); // Constant fold all undef cases. // TODO: Handle zero cases as well. if (DemandedElts.isSubsetOf(KnownUndef)) return TLO.CombineTo(Op, TLO.DAG.getUNDEF(VT)); return false; } /// Determine which of the bits specified in Mask are known to be either zero or /// one and return them in the Known. 
void TargetLowering::computeKnownBitsForTargetNode(const SDValue Op,
                                                   KnownBits &Known,
                                                   const APInt &DemandedElts,
                                                   const SelectionDAG &DAG,
                                                   unsigned Depth) const {
  assert((Op.getOpcode() >= ISD::BUILTIN_OP_END ||
          Op.getOpcode() == ISD::INTRINSIC_WO_CHAIN ||
          Op.getOpcode() == ISD::INTRINSIC_W_CHAIN ||
          Op.getOpcode() == ISD::INTRINSIC_VOID) &&
         "Should use MaskedValueIsZero if you don't know whether Op"
         " is a target node!");
  Known.resetAll();
}

void TargetLowering::computeKnownBitsForTargetInstr(
    GISelKnownBits &Analysis, Register R, KnownBits &Known,
    const APInt &DemandedElts, const MachineRegisterInfo &MRI,
    unsigned Depth) const {
  Known.resetAll();
}

void TargetLowering::computeKnownBitsForFrameIndex(
    const int FrameIdx, KnownBits &Known, const MachineFunction &MF) const {
  // The low bits are known zero if the pointer is aligned.
  Known.Zero.setLowBits(Log2(MF.getFrameInfo().getObjectAlign(FrameIdx)));
}

Align TargetLowering::computeKnownAlignForTargetInstr(
    GISelKnownBits &Analysis, Register R, const MachineRegisterInfo &MRI,
    unsigned Depth) const {
  return Align(1);
}

/// This method can be implemented by targets that want to expose additional
/// information about sign bits to the DAG Combiner.
unsigned TargetLowering::ComputeNumSignBitsForTargetNode(SDValue Op,
                                                         const APInt &,
                                                         const SelectionDAG &,
                                                         unsigned Depth) const {
  assert((Op.getOpcode() >= ISD::BUILTIN_OP_END ||
          Op.getOpcode() == ISD::INTRINSIC_WO_CHAIN ||
          Op.getOpcode() == ISD::INTRINSIC_W_CHAIN ||
          Op.getOpcode() == ISD::INTRINSIC_VOID) &&
         "Should use ComputeNumSignBits if you don't know whether Op"
         " is a target node!");
  return 1;
}

unsigned TargetLowering::computeNumSignBitsForTargetInstr(
    GISelKnownBits &Analysis, Register R, const APInt &DemandedElts,
    const MachineRegisterInfo &MRI, unsigned Depth) const {
  return 1;
}

bool TargetLowering::SimplifyDemandedVectorEltsForTargetNode(
    SDValue Op, const APInt &DemandedElts, APInt &KnownUndef, APInt &KnownZero,
    TargetLoweringOpt &TLO, unsigned Depth) const {
  assert((Op.getOpcode() >= ISD::BUILTIN_OP_END ||
          Op.getOpcode() == ISD::INTRINSIC_WO_CHAIN ||
          Op.getOpcode() == ISD::INTRINSIC_W_CHAIN ||
          Op.getOpcode() == ISD::INTRINSIC_VOID) &&
         "Should use SimplifyDemandedVectorElts if you don't know whether Op"
         " is a target node!");
  return false;
}

bool TargetLowering::SimplifyDemandedBitsForTargetNode(
    SDValue Op, const APInt &DemandedBits, const APInt &DemandedElts,
    KnownBits &Known, TargetLoweringOpt &TLO, unsigned Depth) const {
  assert((Op.getOpcode() >= ISD::BUILTIN_OP_END ||
          Op.getOpcode() == ISD::INTRINSIC_WO_CHAIN ||
          Op.getOpcode() == ISD::INTRINSIC_W_CHAIN ||
          Op.getOpcode() == ISD::INTRINSIC_VOID) &&
         "Should use SimplifyDemandedBits if you don't know whether Op"
         " is a target node!");
  computeKnownBitsForTargetNode(Op, Known, DemandedElts, TLO.DAG, Depth);
  return false;
}

SDValue TargetLowering::SimplifyMultipleUseDemandedBitsForTargetNode(
    SDValue Op, const APInt &DemandedBits, const APInt &DemandedElts,
    SelectionDAG &DAG, unsigned Depth) const {
  assert(
      (Op.getOpcode() >= ISD::BUILTIN_OP_END ||
       Op.getOpcode() == ISD::INTRINSIC_WO_CHAIN ||
       Op.getOpcode() == ISD::INTRINSIC_W_CHAIN ||
       Op.getOpcode() == ISD::INTRINSIC_VOID) &&
      "Should use SimplifyMultipleUseDemandedBits if you don't know whether Op"
      " is a target node!");
  return SDValue();
}

SDValue TargetLowering::buildLegalVectorShuffle(EVT VT, const SDLoc &DL,
                                                SDValue N0, SDValue N1,
                                                MutableArrayRef<int> Mask,
                                                SelectionDAG &DAG) const {
  bool LegalMask = isShuffleMaskLegal(Mask, VT);
  if (!LegalMask) {
    std::swap(N0, N1);
    ShuffleVectorSDNode::commuteMask(Mask);
    LegalMask = isShuffleMaskLegal(Mask, VT);
  }

  if (!LegalMask)
    return SDValue();

  return DAG.getVectorShuffle(VT, DL, N0, N1, Mask);
}

const Constant *TargetLowering::getTargetConstantFromLoad(LoadSDNode*) const {
  return nullptr;
}

bool TargetLowering::isKnownNeverNaNForTargetNode(SDValue Op,
                                                  const SelectionDAG &DAG,
                                                  bool SNaN,
                                                  unsigned Depth) const {
  assert((Op.getOpcode() >= ISD::BUILTIN_OP_END ||
          Op.getOpcode() == ISD::INTRINSIC_WO_CHAIN ||
          Op.getOpcode() == ISD::INTRINSIC_W_CHAIN ||
          Op.getOpcode() == ISD::INTRINSIC_VOID) &&
         "Should use isKnownNeverNaN if you don't know whether Op"
         " is a target node!");
  return false;
}

// FIXME: Ideally, this would use ISD::isConstantSplatVector(), but that must
// work with truncating build vectors and vectors with elements of less than
// 8 bits.
bool TargetLowering::isConstTrueVal(const SDNode *N) const {
  if (!N)
    return false;

  APInt CVal;
  if (auto *CN = dyn_cast<ConstantSDNode>(N)) {
    CVal = CN->getAPIntValue();
  } else if (auto *BV = dyn_cast<BuildVectorSDNode>(N)) {
    auto *CN = BV->getConstantSplatNode();
    if (!CN)
      return false;

    // If this is a truncating build vector, truncate the splat value.
    // Otherwise, we may fail to match the expected values below.
    unsigned BVEltWidth = BV->getValueType(0).getScalarSizeInBits();
    CVal = CN->getAPIntValue();
    if (BVEltWidth < CVal.getBitWidth())
      CVal = CVal.trunc(BVEltWidth);
  } else {
    return false;
  }

  switch (getBooleanContents(N->getValueType(0))) {
  case UndefinedBooleanContent:
    return CVal[0];
  case ZeroOrOneBooleanContent:
    return CVal.isOneValue();
  case ZeroOrNegativeOneBooleanContent:
    return CVal.isAllOnesValue();
  }

  llvm_unreachable("Invalid boolean contents");
}

bool TargetLowering::isConstFalseVal(const SDNode *N) const {
  if (!N)
    return false;

  const ConstantSDNode *CN = dyn_cast<ConstantSDNode>(N);
  if (!CN) {
    const BuildVectorSDNode *BV = dyn_cast<BuildVectorSDNode>(N);
    if (!BV)
      return false;

    // Only interested in constant splats, we don't care about undef
    // elements in identifying boolean constants and getConstantSplatNode
    // returns NULL if all ops are undef;
    CN = BV->getConstantSplatNode();
    if (!CN)
      return false;
  }

  if (getBooleanContents(N->getValueType(0)) == UndefinedBooleanContent)
    return !CN->getAPIntValue()[0];

  return CN->isNullValue();
}

bool TargetLowering::isExtendedTrueVal(const ConstantSDNode *N, EVT VT,
                                       bool SExt) const {
  if (VT == MVT::i1)
    return N->isOne();

  TargetLowering::BooleanContent Cnt = getBooleanContents(VT);
  switch (Cnt) {
  case TargetLowering::ZeroOrOneBooleanContent:
    // An extended value of 1 is always true, unless its original type is i1,
    // in which case it will be sign extended to -1.
    return (N->isOne() && !SExt) || (SExt && (N->getValueType(0) != MVT::i1));
  case TargetLowering::UndefinedBooleanContent:
  case TargetLowering::ZeroOrNegativeOneBooleanContent:
    return N->isAllOnesValue() && SExt;
  }
  llvm_unreachable("Unexpected enumeration.");
}

/// This helper function of SimplifySetCC tries to optimize the comparison when
/// either operand of the SetCC node is a bitwise-and instruction.
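/// For example, when Y is known to be a power of two, (X & Y) ==/!= Y becomes
/// (X & Y) !=/== 0; on targets with an and-not operation it instead becomes
/// (~X & Y) ==/!= 0.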
SDValue TargetLowering::foldSetCCWithAnd(EVT VT, SDValue N0, SDValue N1, ISD::CondCode Cond, const SDLoc &DL, DAGCombinerInfo &DCI) const { // Match these patterns in any of their permutations: // (X & Y) == Y // (X & Y) != Y if (N1.getOpcode() == ISD::AND && N0.getOpcode() != ISD::AND) std::swap(N0, N1); EVT OpVT = N0.getValueType(); if (N0.getOpcode() != ISD::AND || !OpVT.isInteger() || (Cond != ISD::SETEQ && Cond != ISD::SETNE)) return SDValue(); SDValue X, Y; if (N0.getOperand(0) == N1) { X = N0.getOperand(1); Y = N0.getOperand(0); } else if (N0.getOperand(1) == N1) { X = N0.getOperand(0); Y = N0.getOperand(1); } else { return SDValue(); } SelectionDAG &DAG = DCI.DAG; SDValue Zero = DAG.getConstant(0, DL, OpVT); if (DAG.isKnownToBeAPowerOfTwo(Y)) { // Simplify X & Y == Y to X & Y != 0 if Y has exactly one bit set. // Note that where Y is variable and is known to have at most one bit set // (for example, if it is Z & 1) we cannot do this; the expressions are not // equivalent when Y == 0. assert(OpVT.isInteger()); Cond = ISD::getSetCCInverse(Cond, OpVT); if (DCI.isBeforeLegalizeOps() || isCondCodeLegal(Cond, N0.getSimpleValueType())) return DAG.getSetCC(DL, VT, N0, Zero, Cond); } else if (N0.hasOneUse() && hasAndNotCompare(Y)) { // If the target supports an 'and-not' or 'and-complement' logic operation, // try to use that to make a comparison operation more efficient. // But don't do this transform if the mask is a single bit because there are // more efficient ways to deal with that case (for example, 'bt' on x86 or // 'rlwinm' on PPC). // Bail out if the compare operand that we want to turn into a zero is // already a zero (otherwise, infinite loop). auto *YConst = dyn_cast(Y); if (YConst && YConst->isNullValue()) return SDValue(); // Transform this into: ~X & Y == 0. SDValue NotX = DAG.getNOT(SDLoc(X), X, OpVT); SDValue NewAnd = DAG.getNode(ISD::AND, SDLoc(N0), OpVT, NotX, Y); return DAG.getSetCC(DL, VT, NewAnd, Zero, Cond); } return SDValue(); } /// There are multiple IR patterns that could be checking whether certain /// truncation of a signed number would be lossy or not. The pattern which is /// best at IR level, may not lower optimally. Thus, we want to unfold it. /// We are looking for the following pattern: (KeptBits is a constant) /// (add %x, (1 << (KeptBits-1))) srccond (1 << KeptBits) /// KeptBits won't be bitwidth(x), that will be constant-folded to true/false. /// KeptBits also can't be 1, that would have been folded to %x dstcond 0 /// We will unfold it into the natural trunc+sext pattern: /// ((%x << C) a>> C) dstcond %x /// Where C = bitwidth(x) - KeptBits and C u< bitwidth(x) SDValue TargetLowering::optimizeSetCCOfSignedTruncationCheck( EVT SCCVT, SDValue N0, SDValue N1, ISD::CondCode Cond, DAGCombinerInfo &DCI, const SDLoc &DL) const { // We must be comparing with a constant. ConstantSDNode *C1; if (!(C1 = dyn_cast(N1))) return SDValue(); // N0 should be: add %x, (1 << (KeptBits-1)) if (N0->getOpcode() != ISD::ADD) return SDValue(); // And we must be 'add'ing a constant. ConstantSDNode *C01; if (!(C01 = dyn_cast(N0->getOperand(1)))) return SDValue(); SDValue X = N0->getOperand(0); EVT XVT = X.getValueType(); // Validate constants ... APInt I1 = C1->getAPIntValue(); ISD::CondCode NewCond; if (Cond == ISD::CondCode::SETULT) { NewCond = ISD::CondCode::SETEQ; } else if (Cond == ISD::CondCode::SETULE) { NewCond = ISD::CondCode::SETEQ; // But need to 'canonicalize' the constant. 
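    // x u<= C is equivalent to x u< (C + 1), so bump the constant and fall
    // into the same handling as SETULT.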
I1 += 1; } else if (Cond == ISD::CondCode::SETUGT) { NewCond = ISD::CondCode::SETNE; // But need to 'canonicalize' the constant. I1 += 1; } else if (Cond == ISD::CondCode::SETUGE) { NewCond = ISD::CondCode::SETNE; } else return SDValue(); APInt I01 = C01->getAPIntValue(); auto checkConstants = [&I1, &I01]() -> bool { // Both of them must be power-of-two, and the constant from setcc is bigger. return I1.ugt(I01) && I1.isPowerOf2() && I01.isPowerOf2(); }; if (checkConstants()) { // Great, e.g. got icmp ult i16 (add i16 %x, 128), 256 } else { // What if we invert constants? (and the target predicate) I1.negate(); I01.negate(); assert(XVT.isInteger()); NewCond = getSetCCInverse(NewCond, XVT); if (!checkConstants()) return SDValue(); // Great, e.g. got icmp uge i16 (add i16 %x, -128), -256 } // They are power-of-two, so which bit is set? const unsigned KeptBits = I1.logBase2(); const unsigned KeptBitsMinusOne = I01.logBase2(); // Magic! if (KeptBits != (KeptBitsMinusOne + 1)) return SDValue(); assert(KeptBits > 0 && KeptBits < XVT.getSizeInBits() && "unreachable"); // We don't want to do this in every single case. SelectionDAG &DAG = DCI.DAG; if (!DAG.getTargetLoweringInfo().shouldTransformSignedTruncationCheck( XVT, KeptBits)) return SDValue(); const unsigned MaskedBits = XVT.getSizeInBits() - KeptBits; assert(MaskedBits > 0 && MaskedBits < XVT.getSizeInBits() && "unreachable"); // Unfold into: ((%x << C) a>> C) cond %x // Where 'cond' will be either 'eq' or 'ne'. SDValue ShiftAmt = DAG.getConstant(MaskedBits, DL, XVT); SDValue T0 = DAG.getNode(ISD::SHL, DL, XVT, X, ShiftAmt); SDValue T1 = DAG.getNode(ISD::SRA, DL, XVT, T0, ShiftAmt); SDValue T2 = DAG.getSetCC(DL, SCCVT, T1, X, NewCond); return T2; } // (X & (C l>>/<< Y)) ==/!= 0 --> ((X <> Y) & C) ==/!= 0 SDValue TargetLowering::optimizeSetCCByHoistingAndByConstFromLogicalShift( EVT SCCVT, SDValue N0, SDValue N1C, ISD::CondCode Cond, DAGCombinerInfo &DCI, const SDLoc &DL) const { assert(isConstOrConstSplat(N1C) && isConstOrConstSplat(N1C)->getAPIntValue().isNullValue() && "Should be a comparison with 0."); assert((Cond == ISD::SETEQ || Cond == ISD::SETNE) && "Valid only for [in]equality comparisons."); unsigned NewShiftOpcode; SDValue X, C, Y; SelectionDAG &DAG = DCI.DAG; const TargetLowering &TLI = DAG.getTargetLoweringInfo(); // Look for '(C l>>/<< Y)'. auto Match = [&NewShiftOpcode, &X, &C, &Y, &TLI, &DAG](SDValue V) { // The shift should be one-use. if (!V.hasOneUse()) return false; unsigned OldShiftOpcode = V.getOpcode(); switch (OldShiftOpcode) { case ISD::SHL: NewShiftOpcode = ISD::SRL; break; case ISD::SRL: NewShiftOpcode = ISD::SHL; break; default: return false; // must be a logical shift. } // We should be shifting a constant. // FIXME: best to use isConstantOrConstantVector(). C = V.getOperand(0); ConstantSDNode *CC = isConstOrConstSplat(C, /*AllowUndefs=*/true, /*AllowTruncation=*/true); if (!CC) return false; Y = V.getOperand(1); ConstantSDNode *XC = isConstOrConstSplat(X, /*AllowUndefs=*/true, /*AllowTruncation=*/true); return TLI.shouldProduceAndByConstByHoistingConstFromShiftsLHSOfAnd( X, XC, CC, Y, OldShiftOpcode, NewShiftOpcode, DAG); }; // LHS of comparison should be an one-use 'and'. if (N0.getOpcode() != ISD::AND || !N0.hasOneUse()) return SDValue(); X = N0.getOperand(0); SDValue Mask = N0.getOperand(1); // 'and' is commutative! 
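  // Try the mask operand first; if it is not of the form (C shift Y), swap
  // and try the other operand of the AND.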
if (!Match(Mask)) { std::swap(X, Mask); if (!Match(Mask)) return SDValue(); } EVT VT = X.getValueType(); // Produce: // ((X 'OppositeShiftOpcode' Y) & C) Cond 0 SDValue T0 = DAG.getNode(NewShiftOpcode, DL, VT, X, Y); SDValue T1 = DAG.getNode(ISD::AND, DL, VT, T0, C); SDValue T2 = DAG.getSetCC(DL, SCCVT, T1, N1C, Cond); return T2; } /// Try to fold an equality comparison with a {add/sub/xor} binary operation as /// the 1st operand (N0). Callers are expected to swap the N0/N1 parameters to /// handle the commuted versions of these patterns. SDValue TargetLowering::foldSetCCWithBinOp(EVT VT, SDValue N0, SDValue N1, ISD::CondCode Cond, const SDLoc &DL, DAGCombinerInfo &DCI) const { unsigned BOpcode = N0.getOpcode(); assert((BOpcode == ISD::ADD || BOpcode == ISD::SUB || BOpcode == ISD::XOR) && "Unexpected binop"); assert((Cond == ISD::SETEQ || Cond == ISD::SETNE) && "Unexpected condcode"); // (X + Y) == X --> Y == 0 // (X - Y) == X --> Y == 0 // (X ^ Y) == X --> Y == 0 SelectionDAG &DAG = DCI.DAG; EVT OpVT = N0.getValueType(); SDValue X = N0.getOperand(0); SDValue Y = N0.getOperand(1); if (X == N1) return DAG.getSetCC(DL, VT, Y, DAG.getConstant(0, DL, OpVT), Cond); if (Y != N1) return SDValue(); // (X + Y) == Y --> X == 0 // (X ^ Y) == Y --> X == 0 if (BOpcode == ISD::ADD || BOpcode == ISD::XOR) return DAG.getSetCC(DL, VT, X, DAG.getConstant(0, DL, OpVT), Cond); // The shift would not be valid if the operands are boolean (i1). if (!N0.hasOneUse() || OpVT.getScalarSizeInBits() == 1) return SDValue(); // (X - Y) == Y --> X == Y << 1 EVT ShiftVT = getShiftAmountTy(OpVT, DAG.getDataLayout(), !DCI.isBeforeLegalize()); SDValue One = DAG.getConstant(1, DL, ShiftVT); SDValue YShl1 = DAG.getNode(ISD::SHL, DL, N1.getValueType(), Y, One); if (!DCI.isCalledByLegalizer()) DCI.AddToWorklist(YShl1.getNode()); return DAG.getSetCC(DL, VT, X, YShl1, Cond); } /// Try to simplify a setcc built with the specified operands and cc. If it is /// unable to simplify it, return a null SDValue. SDValue TargetLowering::SimplifySetCC(EVT VT, SDValue N0, SDValue N1, ISD::CondCode Cond, bool foldBooleans, DAGCombinerInfo &DCI, const SDLoc &dl) const { SelectionDAG &DAG = DCI.DAG; const DataLayout &Layout = DAG.getDataLayout(); EVT OpVT = N0.getValueType(); // Constant fold or commute setcc. if (SDValue Fold = DAG.FoldSetCC(VT, N0, N1, Cond, dl)) return Fold; // Ensure that the constant occurs on the RHS and fold constant comparisons. // TODO: Handle non-splat vector constants. All undef causes trouble. ISD::CondCode SwappedCC = ISD::getSetCCSwappedOperands(Cond); if (isConstOrConstSplat(N0) && (DCI.isBeforeLegalizeOps() || isCondCodeLegal(SwappedCC, N0.getSimpleValueType()))) return DAG.getSetCC(dl, VT, N1, N0, SwappedCC); // If we have a subtract with the same 2 non-constant operands as this setcc // -- but in reverse order -- then try to commute the operands of this setcc // to match. A matching pair of setcc (cmp) and sub may be combined into 1 // instruction on some targets. if (!isConstOrConstSplat(N0) && !isConstOrConstSplat(N1) && (DCI.isBeforeLegalizeOps() || isCondCodeLegal(SwappedCC, N0.getSimpleValueType())) && DAG.getNodeIfExists(ISD::SUB, DAG.getVTList(OpVT), { N1, N0 } ) && !DAG.getNodeIfExists(ISD::SUB, DAG.getVTList(OpVT), { N0, N1 } )) return DAG.getSetCC(dl, VT, N1, N0, SwappedCC); if (auto *N1C = dyn_cast(N1.getNode())) { const APInt &C1 = N1C->getAPIntValue(); // If the LHS is '(srl (ctlz x), 5)', the RHS is 0/1, and this is an // equality comparison, then we're just comparing whether X itself is // zero. 
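    // The shift amount must equal log2 of the value's bit width, so the SRL
    // isolates the single bit that CTLZ sets only when its input is zero.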
if (N0.getOpcode() == ISD::SRL && (C1.isNullValue() || C1.isOneValue()) && N0.getOperand(0).getOpcode() == ISD::CTLZ && N0.getOperand(1).getOpcode() == ISD::Constant) { const APInt &ShAmt = N0.getConstantOperandAPInt(1); if ((Cond == ISD::SETEQ || Cond == ISD::SETNE) && ShAmt == Log2_32(N0.getValueSizeInBits())) { if ((C1 == 0) == (Cond == ISD::SETEQ)) { // (srl (ctlz x), 5) == 0 -> X != 0 // (srl (ctlz x), 5) != 1 -> X != 0 Cond = ISD::SETNE; } else { // (srl (ctlz x), 5) != 0 -> X == 0 // (srl (ctlz x), 5) == 1 -> X == 0 Cond = ISD::SETEQ; } SDValue Zero = DAG.getConstant(0, dl, N0.getValueType()); return DAG.getSetCC(dl, VT, N0.getOperand(0).getOperand(0), Zero, Cond); } } SDValue CTPOP = N0; // Look through truncs that don't change the value of a ctpop. if (N0.hasOneUse() && N0.getOpcode() == ISD::TRUNCATE) CTPOP = N0.getOperand(0); if (CTPOP.hasOneUse() && CTPOP.getOpcode() == ISD::CTPOP && (N0 == CTPOP || N0.getValueSizeInBits() > Log2_32_Ceil(CTPOP.getValueSizeInBits()))) { EVT CTVT = CTPOP.getValueType(); SDValue CTOp = CTPOP.getOperand(0); // (ctpop x) u< 2 -> (x & x-1) == 0 // (ctpop x) u> 1 -> (x & x-1) != 0 if ((Cond == ISD::SETULT && C1 == 2) || (Cond == ISD::SETUGT && C1 == 1)){ SDValue NegOne = DAG.getAllOnesConstant(dl, CTVT); SDValue Add = DAG.getNode(ISD::ADD, dl, CTVT, CTOp, NegOne); SDValue And = DAG.getNode(ISD::AND, dl, CTVT, CTOp, Add); ISD::CondCode CC = Cond == ISD::SETULT ? ISD::SETEQ : ISD::SETNE; return DAG.getSetCC(dl, VT, And, DAG.getConstant(0, dl, CTVT), CC); } // If ctpop is not supported, expand a power-of-2 comparison based on it. if (C1 == 1 && !isOperationLegalOrCustom(ISD::CTPOP, CTVT) && (Cond == ISD::SETEQ || Cond == ISD::SETNE)) { // (ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0) // (ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0) SDValue Zero = DAG.getConstant(0, dl, CTVT); SDValue NegOne = DAG.getAllOnesConstant(dl, CTVT); assert(CTVT.isInteger()); ISD::CondCode InvCond = ISD::getSetCCInverse(Cond, CTVT); SDValue Add = DAG.getNode(ISD::ADD, dl, CTVT, CTOp, NegOne); SDValue And = DAG.getNode(ISD::AND, dl, CTVT, CTOp, Add); SDValue LHS = DAG.getSetCC(dl, VT, CTOp, Zero, InvCond); SDValue RHS = DAG.getSetCC(dl, VT, And, Zero, Cond); unsigned LogicOpcode = Cond == ISD::SETEQ ? ISD::AND : ISD::OR; return DAG.getNode(LogicOpcode, dl, VT, LHS, RHS); } } // (zext x) == C --> x == (trunc C) // (sext x) == C --> x == (trunc C) if ((Cond == ISD::SETEQ || Cond == ISD::SETNE) && DCI.isBeforeLegalize() && N0->hasOneUse()) { unsigned MinBits = N0.getValueSizeInBits(); SDValue PreExt; bool Signed = false; if (N0->getOpcode() == ISD::ZERO_EXTEND) { // ZExt MinBits = N0->getOperand(0).getValueSizeInBits(); PreExt = N0->getOperand(0); } else if (N0->getOpcode() == ISD::AND) { // DAGCombine turns costly ZExts into ANDs if (auto *C = dyn_cast(N0->getOperand(1))) if ((C->getAPIntValue()+1).isPowerOf2()) { MinBits = C->getAPIntValue().countTrailingOnes(); PreExt = N0->getOperand(0); } } else if (N0->getOpcode() == ISD::SIGN_EXTEND) { // SExt MinBits = N0->getOperand(0).getValueSizeInBits(); PreExt = N0->getOperand(0); Signed = true; } else if (auto *LN0 = dyn_cast(N0)) { // ZEXTLOAD / SEXTLOAD if (LN0->getExtensionType() == ISD::ZEXTLOAD) { MinBits = LN0->getMemoryVT().getSizeInBits(); PreExt = N0; } else if (LN0->getExtensionType() == ISD::SEXTLOAD) { Signed = true; MinBits = LN0->getMemoryVT().getSizeInBits(); PreExt = N0; } } // Figure out how many bits we need to preserve this constant. unsigned ReqdBits = Signed ? 
C1.getBitWidth() - C1.getNumSignBits() + 1 : C1.getActiveBits(); // Make sure we're not losing bits from the constant. if (MinBits > 0 && MinBits < C1.getBitWidth() && MinBits >= ReqdBits) { EVT MinVT = EVT::getIntegerVT(*DAG.getContext(), MinBits); if (isTypeDesirableForOp(ISD::SETCC, MinVT)) { // Will get folded away. SDValue Trunc = DAG.getNode(ISD::TRUNCATE, dl, MinVT, PreExt); if (MinBits == 1 && C1 == 1) // Invert the condition. return DAG.getSetCC(dl, VT, Trunc, DAG.getConstant(0, dl, MVT::i1), Cond == ISD::SETEQ ? ISD::SETNE : ISD::SETEQ); SDValue C = DAG.getConstant(C1.trunc(MinBits), dl, MinVT); return DAG.getSetCC(dl, VT, Trunc, C, Cond); } // If truncating the setcc operands is not desirable, we can still // simplify the expression in some cases: // setcc ([sz]ext (setcc x, y, cc)), 0, setne) -> setcc (x, y, cc) // setcc ([sz]ext (setcc x, y, cc)), 0, seteq) -> setcc (x, y, inv(cc)) // setcc (zext (setcc x, y, cc)), 1, setne) -> setcc (x, y, inv(cc)) // setcc (zext (setcc x, y, cc)), 1, seteq) -> setcc (x, y, cc) // setcc (sext (setcc x, y, cc)), -1, setne) -> setcc (x, y, inv(cc)) // setcc (sext (setcc x, y, cc)), -1, seteq) -> setcc (x, y, cc) SDValue TopSetCC = N0->getOperand(0); unsigned N0Opc = N0->getOpcode(); bool SExt = (N0Opc == ISD::SIGN_EXTEND); if (TopSetCC.getValueType() == MVT::i1 && VT == MVT::i1 && TopSetCC.getOpcode() == ISD::SETCC && (N0Opc == ISD::ZERO_EXTEND || N0Opc == ISD::SIGN_EXTEND) && (isConstFalseVal(N1C) || isExtendedTrueVal(N1C, N0->getValueType(0), SExt))) { bool Inverse = (N1C->isNullValue() && Cond == ISD::SETEQ) || (!N1C->isNullValue() && Cond == ISD::SETNE); if (!Inverse) return TopSetCC; ISD::CondCode InvCond = ISD::getSetCCInverse( cast(TopSetCC.getOperand(2))->get(), TopSetCC.getOperand(0).getValueType()); return DAG.getSetCC(dl, VT, TopSetCC.getOperand(0), TopSetCC.getOperand(1), InvCond); } } } // If the LHS is '(and load, const)', the RHS is 0, the test is for // equality or unsigned, and all 1 bits of the const are in the same // partial word, see if we can shorten the load. if (DCI.isBeforeLegalize() && !ISD::isSignedIntSetCC(Cond) && N0.getOpcode() == ISD::AND && C1 == 0 && N0.getNode()->hasOneUse() && isa(N0.getOperand(0)) && N0.getOperand(0).getNode()->hasOneUse() && isa(N0.getOperand(1))) { LoadSDNode *Lod = cast(N0.getOperand(0)); APInt bestMask; unsigned bestWidth = 0, bestOffset = 0; if (Lod->isSimple() && Lod->isUnindexed()) { unsigned origWidth = N0.getValueSizeInBits(); unsigned maskWidth = origWidth; // We can narrow (e.g.) 16-bit extending loads on 32-bit target to // 8 bits, but have to be careful... if (Lod->getExtensionType() != ISD::NON_EXTLOAD) origWidth = Lod->getMemoryVT().getSizeInBits(); const APInt &Mask = N0.getConstantOperandAPInt(1); for (unsigned width = origWidth / 2; width>=8; width /= 2) { APInt newMask = APInt::getLowBitsSet(maskWidth, width); for (unsigned offset=0; offsetgetBasePtr(); if (bestOffset != 0) Ptr = DAG.getMemBasePlusOffset(Ptr, bestOffset, dl); unsigned NewAlign = MinAlign(Lod->getAlignment(), bestOffset); SDValue NewLoad = DAG.getLoad( newVT, dl, Lod->getChain(), Ptr, Lod->getPointerInfo().getWithOffset(bestOffset), NewAlign); return DAG.getSetCC(dl, VT, DAG.getNode(ISD::AND, dl, newVT, NewLoad, DAG.getConstant(bestMask.trunc(bestWidth), dl, newVT)), DAG.getConstant(0LL, dl, newVT), Cond); } } } // If the LHS is a ZERO_EXTEND, perform the comparison on the input. 
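    // If the constant has bits set above the original (pre-extension) width,
    // the result is already known for most predicates; otherwise the compare
    // can be narrowed to the pre-extension type with a truncated constant.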
if (N0.getOpcode() == ISD::ZERO_EXTEND) { unsigned InSize = N0.getOperand(0).getValueSizeInBits(); // If the comparison constant has bits in the upper part, the // zero-extended value could never match. if (C1.intersects(APInt::getHighBitsSet(C1.getBitWidth(), C1.getBitWidth() - InSize))) { switch (Cond) { case ISD::SETUGT: case ISD::SETUGE: case ISD::SETEQ: return DAG.getConstant(0, dl, VT); case ISD::SETULT: case ISD::SETULE: case ISD::SETNE: return DAG.getConstant(1, dl, VT); case ISD::SETGT: case ISD::SETGE: // True if the sign bit of C1 is set. return DAG.getConstant(C1.isNegative(), dl, VT); case ISD::SETLT: case ISD::SETLE: // True if the sign bit of C1 isn't set. return DAG.getConstant(C1.isNonNegative(), dl, VT); default: break; } } // Otherwise, we can perform the comparison with the low bits. switch (Cond) { case ISD::SETEQ: case ISD::SETNE: case ISD::SETUGT: case ISD::SETUGE: case ISD::SETULT: case ISD::SETULE: { EVT newVT = N0.getOperand(0).getValueType(); if (DCI.isBeforeLegalizeOps() || (isOperationLegal(ISD::SETCC, newVT) && isCondCodeLegal(Cond, newVT.getSimpleVT()))) { EVT NewSetCCVT = getSetCCResultType(Layout, *DAG.getContext(), newVT); SDValue NewConst = DAG.getConstant(C1.trunc(InSize), dl, newVT); SDValue NewSetCC = DAG.getSetCC(dl, NewSetCCVT, N0.getOperand(0), NewConst, Cond); return DAG.getBoolExtOrTrunc(NewSetCC, dl, VT, N0.getValueType()); } break; } default: break; // todo, be more careful with signed comparisons } } else if (N0.getOpcode() == ISD::SIGN_EXTEND_INREG && (Cond == ISD::SETEQ || Cond == ISD::SETNE)) { EVT ExtSrcTy = cast(N0.getOperand(1))->getVT(); unsigned ExtSrcTyBits = ExtSrcTy.getSizeInBits(); EVT ExtDstTy = N0.getValueType(); unsigned ExtDstTyBits = ExtDstTy.getSizeInBits(); // If the constant doesn't fit into the number of bits for the source of // the sign extension, it is impossible for both sides to be equal. if (C1.getMinSignedBits() > ExtSrcTyBits) return DAG.getConstant(Cond == ISD::SETNE, dl, VT); SDValue ZextOp; EVT Op0Ty = N0.getOperand(0).getValueType(); if (Op0Ty == ExtSrcTy) { ZextOp = N0.getOperand(0); } else { APInt Imm = APInt::getLowBitsSet(ExtDstTyBits, ExtSrcTyBits); ZextOp = DAG.getNode(ISD::AND, dl, Op0Ty, N0.getOperand(0), DAG.getConstant(Imm, dl, Op0Ty)); } if (!DCI.isCalledByLegalizer()) DCI.AddToWorklist(ZextOp.getNode()); // Otherwise, make this a use of a zext. return DAG.getSetCC(dl, VT, ZextOp, DAG.getConstant(C1 & APInt::getLowBitsSet( ExtDstTyBits, ExtSrcTyBits), dl, ExtDstTy), Cond); } else if ((N1C->isNullValue() || N1C->isOne()) && (Cond == ISD::SETEQ || Cond == ISD::SETNE)) { // SETCC (SETCC), [0|1], [EQ|NE] -> SETCC if (N0.getOpcode() == ISD::SETCC && isTypeLegal(VT) && VT.bitsLE(N0.getValueType()) && (N0.getValueType() == MVT::i1 || getBooleanContents(N0.getOperand(0).getValueType()) == ZeroOrOneBooleanContent)) { bool TrueWhenTrue = (Cond == ISD::SETEQ) ^ (!N1C->isOne()); if (TrueWhenTrue) return DAG.getNode(ISD::TRUNCATE, dl, VT, N0); // Invert the condition. 
ISD::CondCode CC = cast(N0.getOperand(2))->get(); CC = ISD::getSetCCInverse(CC, N0.getOperand(0).getValueType()); if (DCI.isBeforeLegalizeOps() || isCondCodeLegal(CC, N0.getOperand(0).getSimpleValueType())) return DAG.getSetCC(dl, VT, N0.getOperand(0), N0.getOperand(1), CC); } if ((N0.getOpcode() == ISD::XOR || (N0.getOpcode() == ISD::AND && N0.getOperand(0).getOpcode() == ISD::XOR && N0.getOperand(1) == N0.getOperand(0).getOperand(1))) && isa(N0.getOperand(1)) && cast(N0.getOperand(1))->isOne()) { // If this is (X^1) == 0/1, swap the RHS and eliminate the xor. We // can only do this if the top bits are known zero. unsigned BitWidth = N0.getValueSizeInBits(); if (DAG.MaskedValueIsZero(N0, APInt::getHighBitsSet(BitWidth, BitWidth-1))) { // Okay, get the un-inverted input value. SDValue Val; if (N0.getOpcode() == ISD::XOR) { Val = N0.getOperand(0); } else { assert(N0.getOpcode() == ISD::AND && N0.getOperand(0).getOpcode() == ISD::XOR); // ((X^1)&1)^1 -> X & 1 Val = DAG.getNode(ISD::AND, dl, N0.getValueType(), N0.getOperand(0).getOperand(0), N0.getOperand(1)); } return DAG.getSetCC(dl, VT, Val, N1, Cond == ISD::SETEQ ? ISD::SETNE : ISD::SETEQ); } } else if (N1C->isOne()) { SDValue Op0 = N0; if (Op0.getOpcode() == ISD::TRUNCATE) Op0 = Op0.getOperand(0); if ((Op0.getOpcode() == ISD::XOR) && Op0.getOperand(0).getOpcode() == ISD::SETCC && Op0.getOperand(1).getOpcode() == ISD::SETCC) { SDValue XorLHS = Op0.getOperand(0); SDValue XorRHS = Op0.getOperand(1); // Ensure that the input setccs return an i1 type or 0/1 value. if (Op0.getValueType() == MVT::i1 || (getBooleanContents(XorLHS.getOperand(0).getValueType()) == ZeroOrOneBooleanContent && getBooleanContents(XorRHS.getOperand(0).getValueType()) == ZeroOrOneBooleanContent)) { // (xor (setcc), (setcc)) == / != 1 -> (setcc) != / == (setcc) Cond = (Cond == ISD::SETEQ) ? ISD::SETNE : ISD::SETEQ; return DAG.getSetCC(dl, VT, XorLHS, XorRHS, Cond); } } if (Op0.getOpcode() == ISD::AND && isa(Op0.getOperand(1)) && cast(Op0.getOperand(1))->isOne()) { // If this is (X&1) == / != 1, normalize it to (X&1) != / == 0. if (Op0.getValueType().bitsGT(VT)) Op0 = DAG.getNode(ISD::AND, dl, VT, DAG.getNode(ISD::TRUNCATE, dl, VT, Op0.getOperand(0)), DAG.getConstant(1, dl, VT)); else if (Op0.getValueType().bitsLT(VT)) Op0 = DAG.getNode(ISD::AND, dl, VT, DAG.getNode(ISD::ANY_EXTEND, dl, VT, Op0.getOperand(0)), DAG.getConstant(1, dl, VT)); return DAG.getSetCC(dl, VT, Op0, DAG.getConstant(0, dl, Op0.getValueType()), Cond == ISD::SETEQ ? ISD::SETNE : ISD::SETEQ); } if (Op0.getOpcode() == ISD::AssertZext && cast(Op0.getOperand(1))->getVT() == MVT::i1) return DAG.getSetCC(dl, VT, Op0, DAG.getConstant(0, dl, Op0.getValueType()), Cond == ISD::SETEQ ? ISD::SETNE : ISD::SETEQ); } } // Given: // icmp eq/ne (urem %x, %y), 0 // Iff %x has 0 or 1 bits set, and %y has at least 2 bits set, omit 'urem': // icmp eq/ne %x, 0 if (N0.getOpcode() == ISD::UREM && N1C->isNullValue() && (Cond == ISD::SETEQ || Cond == ISD::SETNE)) { KnownBits XKnown = DAG.computeKnownBits(N0.getOperand(0)); KnownBits YKnown = DAG.computeKnownBits(N0.getOperand(1)); if (XKnown.countMaxPopulation() == 1 && YKnown.countMinPopulation() >= 2) return DAG.getSetCC(dl, VT, N0.getOperand(0), N1, Cond); } if (SDValue V = optimizeSetCCOfSignedTruncationCheck(VT, N0, N1, Cond, DCI, dl)) return V; } // These simplifications apply to splat vectors as well. // TODO: Handle more splat vector cases. 
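  // isConstOrConstSplat returns the ConstantSDNode for a scalar constant or
  // the splat value of a constant vector, so the folds below apply to splat
  // vectors as well.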
if (auto *N1C = isConstOrConstSplat(N1)) { const APInt &C1 = N1C->getAPIntValue(); APInt MinVal, MaxVal; unsigned OperandBitSize = N1C->getValueType(0).getScalarSizeInBits(); if (ISD::isSignedIntSetCC(Cond)) { MinVal = APInt::getSignedMinValue(OperandBitSize); MaxVal = APInt::getSignedMaxValue(OperandBitSize); } else { MinVal = APInt::getMinValue(OperandBitSize); MaxVal = APInt::getMaxValue(OperandBitSize); } // Canonicalize GE/LE comparisons to use GT/LT comparisons. if (Cond == ISD::SETGE || Cond == ISD::SETUGE) { // X >= MIN --> true if (C1 == MinVal) return DAG.getBoolConstant(true, dl, VT, OpVT); if (!VT.isVector()) { // TODO: Support this for vectors. // X >= C0 --> X > (C0 - 1) APInt C = C1 - 1; ISD::CondCode NewCC = (Cond == ISD::SETGE) ? ISD::SETGT : ISD::SETUGT; if ((DCI.isBeforeLegalizeOps() || isCondCodeLegal(NewCC, VT.getSimpleVT())) && (!N1C->isOpaque() || (C.getBitWidth() <= 64 && isLegalICmpImmediate(C.getSExtValue())))) { return DAG.getSetCC(dl, VT, N0, DAG.getConstant(C, dl, N1.getValueType()), NewCC); } } } if (Cond == ISD::SETLE || Cond == ISD::SETULE) { // X <= MAX --> true if (C1 == MaxVal) return DAG.getBoolConstant(true, dl, VT, OpVT); // X <= C0 --> X < (C0 + 1) if (!VT.isVector()) { // TODO: Support this for vectors. APInt C = C1 + 1; ISD::CondCode NewCC = (Cond == ISD::SETLE) ? ISD::SETLT : ISD::SETULT; if ((DCI.isBeforeLegalizeOps() || isCondCodeLegal(NewCC, VT.getSimpleVT())) && (!N1C->isOpaque() || (C.getBitWidth() <= 64 && isLegalICmpImmediate(C.getSExtValue())))) { return DAG.getSetCC(dl, VT, N0, DAG.getConstant(C, dl, N1.getValueType()), NewCC); } } } if (Cond == ISD::SETLT || Cond == ISD::SETULT) { if (C1 == MinVal) return DAG.getBoolConstant(false, dl, VT, OpVT); // X < MIN --> false // TODO: Support this for vectors after legalize ops. if (!VT.isVector() || DCI.isBeforeLegalizeOps()) { // Canonicalize setlt X, Max --> setne X, Max if (C1 == MaxVal) return DAG.getSetCC(dl, VT, N0, N1, ISD::SETNE); // If we have setult X, 1, turn it into seteq X, 0 if (C1 == MinVal+1) return DAG.getSetCC(dl, VT, N0, DAG.getConstant(MinVal, dl, N0.getValueType()), ISD::SETEQ); } } if (Cond == ISD::SETGT || Cond == ISD::SETUGT) { if (C1 == MaxVal) return DAG.getBoolConstant(false, dl, VT, OpVT); // X > MAX --> false // TODO: Support this for vectors after legalize ops. if (!VT.isVector() || DCI.isBeforeLegalizeOps()) { // Canonicalize setgt X, Min --> setne X, Min if (C1 == MinVal) return DAG.getSetCC(dl, VT, N0, N1, ISD::SETNE); // If we have setugt X, Max-1, turn it into seteq X, Max if (C1 == MaxVal-1) return DAG.getSetCC(dl, VT, N0, DAG.getConstant(MaxVal, dl, N0.getValueType()), ISD::SETEQ); } } if (Cond == ISD::SETEQ || Cond == ISD::SETNE) { // (X & (C l>>/<< Y)) ==/!= 0 --> ((X <> Y) & C) ==/!= 0 if (C1.isNullValue()) if (SDValue CC = optimizeSetCCByHoistingAndByConstFromLogicalShift( VT, N0, N1, Cond, DCI, dl)) return CC; } // If we have "setcc X, C0", check to see if we can shrink the immediate // by changing cc. // TODO: Support this for vectors after legalize ops. 
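    // An unsigned compare against signed-max/signed-min is really a sign-bit
    // test, which can be expressed with the smaller immediates 0 and -1.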
if (!VT.isVector() || DCI.isBeforeLegalizeOps()) { // SETUGT X, SINTMAX -> SETLT X, 0 if (Cond == ISD::SETUGT && C1 == APInt::getSignedMaxValue(OperandBitSize)) return DAG.getSetCC(dl, VT, N0, DAG.getConstant(0, dl, N1.getValueType()), ISD::SETLT); // SETULT X, SINTMIN -> SETGT X, -1 if (Cond == ISD::SETULT && C1 == APInt::getSignedMinValue(OperandBitSize)) { SDValue ConstMinusOne = DAG.getConstant(APInt::getAllOnesValue(OperandBitSize), dl, N1.getValueType()); return DAG.getSetCC(dl, VT, N0, ConstMinusOne, ISD::SETGT); } } } // Back to non-vector simplifications. // TODO: Can we do these for vector splats? if (auto *N1C = dyn_cast(N1.getNode())) { const TargetLowering &TLI = DAG.getTargetLoweringInfo(); const APInt &C1 = N1C->getAPIntValue(); EVT ShValTy = N0.getValueType(); // Fold bit comparisons when we can. if ((Cond == ISD::SETEQ || Cond == ISD::SETNE) && (VT == ShValTy || (isTypeLegal(VT) && VT.bitsLE(ShValTy))) && N0.getOpcode() == ISD::AND) { if (auto *AndRHS = dyn_cast(N0.getOperand(1))) { EVT ShiftTy = getShiftAmountTy(ShValTy, Layout, !DCI.isBeforeLegalize()); if (Cond == ISD::SETNE && C1 == 0) {// (X & 8) != 0 --> (X & 8) >> 3 // Perform the xform if the AND RHS is a single bit. unsigned ShCt = AndRHS->getAPIntValue().logBase2(); if (AndRHS->getAPIntValue().isPowerOf2() && !TLI.shouldAvoidTransformToShift(ShValTy, ShCt)) { return DAG.getNode(ISD::TRUNCATE, dl, VT, DAG.getNode(ISD::SRL, dl, ShValTy, N0, DAG.getConstant(ShCt, dl, ShiftTy))); } } else if (Cond == ISD::SETEQ && C1 == AndRHS->getAPIntValue()) { // (X & 8) == 8 --> (X & 8) >> 3 // Perform the xform if C1 is a single bit. unsigned ShCt = C1.logBase2(); if (C1.isPowerOf2() && !TLI.shouldAvoidTransformToShift(ShValTy, ShCt)) { return DAG.getNode(ISD::TRUNCATE, dl, VT, DAG.getNode(ISD::SRL, dl, ShValTy, N0, DAG.getConstant(ShCt, dl, ShiftTy))); } } } } if (C1.getMinSignedBits() <= 64 && !isLegalICmpImmediate(C1.getSExtValue())) { EVT ShiftTy = getShiftAmountTy(ShValTy, Layout, !DCI.isBeforeLegalize()); // (X & -256) == 256 -> (X >> 8) == 1 if ((Cond == ISD::SETEQ || Cond == ISD::SETNE) && N0.getOpcode() == ISD::AND && N0.hasOneUse()) { if (auto *AndRHS = dyn_cast(N0.getOperand(1))) { const APInt &AndRHSC = AndRHS->getAPIntValue(); if ((-AndRHSC).isPowerOf2() && (AndRHSC & C1) == C1) { unsigned ShiftBits = AndRHSC.countTrailingZeros(); if (!TLI.shouldAvoidTransformToShift(ShValTy, ShiftBits)) { SDValue Shift = DAG.getNode(ISD::SRL, dl, ShValTy, N0.getOperand(0), DAG.getConstant(ShiftBits, dl, ShiftTy)); SDValue CmpRHS = DAG.getConstant(C1.lshr(ShiftBits), dl, ShValTy); return DAG.getSetCC(dl, VT, Shift, CmpRHS, Cond); } } } } else if (Cond == ISD::SETULT || Cond == ISD::SETUGE || Cond == ISD::SETULE || Cond == ISD::SETUGT) { bool AdjOne = (Cond == ISD::SETULE || Cond == ISD::SETUGT); // X < 0x100000000 -> (X >> 32) < 1 // X >= 0x100000000 -> (X >> 32) >= 1 // X <= 0x0ffffffff -> (X >> 32) < 1 // X > 0x0ffffffff -> (X >> 32) >= 1 unsigned ShiftBits; APInt NewC = C1; ISD::CondCode NewCond = Cond; if (AdjOne) { ShiftBits = C1.countTrailingOnes(); NewC = NewC + 1; NewCond = (Cond == ISD::SETULE) ? 
ISD::SETULT : ISD::SETUGE; } else { ShiftBits = C1.countTrailingZeros(); } NewC.lshrInPlace(ShiftBits); if (ShiftBits && NewC.getMinSignedBits() <= 64 && isLegalICmpImmediate(NewC.getSExtValue()) && !TLI.shouldAvoidTransformToShift(ShValTy, ShiftBits)) { SDValue Shift = DAG.getNode(ISD::SRL, dl, ShValTy, N0, DAG.getConstant(ShiftBits, dl, ShiftTy)); SDValue CmpRHS = DAG.getConstant(NewC, dl, ShValTy); return DAG.getSetCC(dl, VT, Shift, CmpRHS, NewCond); } } } } if (!isa(N0) && isa(N1)) { auto *CFP = cast(N1); assert(!CFP->getValueAPF().isNaN() && "Unexpected NaN value"); // Otherwise, we know the RHS is not a NaN. Simplify the node to drop the // constant if knowing that the operand is non-nan is enough. We prefer to // have SETO(x,x) instead of SETO(x, 0.0) because this avoids having to // materialize 0.0. if (Cond == ISD::SETO || Cond == ISD::SETUO) return DAG.getSetCC(dl, VT, N0, N0, Cond); // setcc (fneg x), C -> setcc swap(pred) x, -C if (N0.getOpcode() == ISD::FNEG) { ISD::CondCode SwapCond = ISD::getSetCCSwappedOperands(Cond); if (DCI.isBeforeLegalizeOps() || isCondCodeLegal(SwapCond, N0.getSimpleValueType())) { SDValue NegN1 = DAG.getNode(ISD::FNEG, dl, N0.getValueType(), N1); return DAG.getSetCC(dl, VT, N0.getOperand(0), NegN1, SwapCond); } } // If the condition is not legal, see if we can find an equivalent one // which is legal. if (!isCondCodeLegal(Cond, N0.getSimpleValueType())) { // If the comparison was an awkward floating-point == or != and one of // the comparison operands is infinity or negative infinity, convert the // condition to a less-awkward <= or >=. if (CFP->getValueAPF().isInfinity()) { bool IsNegInf = CFP->getValueAPF().isNegative(); ISD::CondCode NewCond = ISD::SETCC_INVALID; switch (Cond) { case ISD::SETOEQ: NewCond = IsNegInf ? ISD::SETOLE : ISD::SETOGE; break; case ISD::SETUEQ: NewCond = IsNegInf ? ISD::SETULE : ISD::SETUGE; break; case ISD::SETUNE: NewCond = IsNegInf ? ISD::SETUGT : ISD::SETULT; break; case ISD::SETONE: NewCond = IsNegInf ? ISD::SETOGT : ISD::SETOLT; break; default: break; } if (NewCond != ISD::SETCC_INVALID && isCondCodeLegal(NewCond, N0.getSimpleValueType())) return DAG.getSetCC(dl, VT, N0, N1, NewCond); } } } if (N0 == N1) { // The sext(setcc()) => setcc() optimization relies on the appropriate // constant being emitted. assert(!N0.getValueType().isInteger() && "Integer types should be handled by FoldSetCC"); bool EqTrue = ISD::isTrueWhenEqual(Cond); unsigned UOF = ISD::getUnorderedFlavor(Cond); if (UOF == 2) // FP operators that are undefined on NaNs. return DAG.getBoolConstant(EqTrue, dl, VT, OpVT); if (UOF == unsigned(EqTrue)) return DAG.getBoolConstant(EqTrue, dl, VT, OpVT); // Otherwise, we can't fold it. However, we can simplify it to SETUO/SETO // if it is not already. ISD::CondCode NewCond = UOF == 0 ? 
ISD::SETO : ISD::SETUO; if (NewCond != Cond && (DCI.isBeforeLegalizeOps() || isCondCodeLegal(NewCond, N0.getSimpleValueType()))) return DAG.getSetCC(dl, VT, N0, N1, NewCond); } if ((Cond == ISD::SETEQ || Cond == ISD::SETNE) && N0.getValueType().isInteger()) { if (N0.getOpcode() == ISD::ADD || N0.getOpcode() == ISD::SUB || N0.getOpcode() == ISD::XOR) { // Simplify (X+Y) == (X+Z) --> Y == Z if (N0.getOpcode() == N1.getOpcode()) { if (N0.getOperand(0) == N1.getOperand(0)) return DAG.getSetCC(dl, VT, N0.getOperand(1), N1.getOperand(1), Cond); if (N0.getOperand(1) == N1.getOperand(1)) return DAG.getSetCC(dl, VT, N0.getOperand(0), N1.getOperand(0), Cond); if (isCommutativeBinOp(N0.getOpcode())) { // If X op Y == Y op X, try other combinations. if (N0.getOperand(0) == N1.getOperand(1)) return DAG.getSetCC(dl, VT, N0.getOperand(1), N1.getOperand(0), Cond); if (N0.getOperand(1) == N1.getOperand(0)) return DAG.getSetCC(dl, VT, N0.getOperand(0), N1.getOperand(1), Cond); } } // If RHS is a legal immediate value for a compare instruction, we need // to be careful about increasing register pressure needlessly. bool LegalRHSImm = false; if (auto *RHSC = dyn_cast(N1)) { if (auto *LHSR = dyn_cast(N0.getOperand(1))) { // Turn (X+C1) == C2 --> X == C2-C1 if (N0.getOpcode() == ISD::ADD && N0.getNode()->hasOneUse()) { return DAG.getSetCC(dl, VT, N0.getOperand(0), DAG.getConstant(RHSC->getAPIntValue()- LHSR->getAPIntValue(), dl, N0.getValueType()), Cond); } // Turn (X^C1) == C2 into X == C1^C2 iff X&~C1 = 0. if (N0.getOpcode() == ISD::XOR) // If we know that all of the inverted bits are zero, don't bother // performing the inversion. if (DAG.MaskedValueIsZero(N0.getOperand(0), ~LHSR->getAPIntValue())) return DAG.getSetCC(dl, VT, N0.getOperand(0), DAG.getConstant(LHSR->getAPIntValue() ^ RHSC->getAPIntValue(), dl, N0.getValueType()), Cond); } // Turn (C1-X) == C2 --> X == C1-C2 if (auto *SUBC = dyn_cast(N0.getOperand(0))) { if (N0.getOpcode() == ISD::SUB && N0.getNode()->hasOneUse()) { return DAG.getSetCC(dl, VT, N0.getOperand(1), DAG.getConstant(SUBC->getAPIntValue() - RHSC->getAPIntValue(), dl, N0.getValueType()), Cond); } } // Could RHSC fold directly into a compare? if (RHSC->getValueType(0).getSizeInBits() <= 64) LegalRHSImm = isLegalICmpImmediate(RHSC->getSExtValue()); } // (X+Y) == X --> Y == 0 and similar folds. // Don't do this if X is an immediate that can fold into a cmp // instruction and X+Y has other uses. It could be an induction variable // chain, and the transform would increase register pressure. if (!LegalRHSImm || N0.hasOneUse()) if (SDValue V = foldSetCCWithBinOp(VT, N0, N1, Cond, dl, DCI)) return V; } if (N1.getOpcode() == ISD::ADD || N1.getOpcode() == ISD::SUB || N1.getOpcode() == ISD::XOR) if (SDValue V = foldSetCCWithBinOp(VT, N1, N0, Cond, dl, DCI)) return V; if (SDValue V = foldSetCCWithAnd(VT, N0, N1, Cond, dl, DCI)) return V; } // Fold remainder of division by a constant. if ((N0.getOpcode() == ISD::UREM || N0.getOpcode() == ISD::SREM) && N0.hasOneUse() && (Cond == ISD::SETEQ || Cond == ISD::SETNE)) { AttributeList Attr = DAG.getMachineFunction().getFunction().getAttributes(); // When division is cheap or optimizing for minimum size, // fall through to DIVREM creation by skipping this fold. 
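    // buildUREMEqFold/buildSREMEqFold lower (x % C) ==/!= 0 to a
    // multiply-by-constant plus compare, so they only pay off when a real
    // division would otherwise be emitted and we are not optimizing for size.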
if (!isIntDivCheap(VT, Attr) && !Attr.hasFnAttribute(Attribute::MinSize)) { if (N0.getOpcode() == ISD::UREM) { if (SDValue Folded = buildUREMEqFold(VT, N0, N1, Cond, DCI, dl)) return Folded; } else if (N0.getOpcode() == ISD::SREM) { if (SDValue Folded = buildSREMEqFold(VT, N0, N1, Cond, DCI, dl)) return Folded; } } } // Fold away ALL boolean setcc's. if (N0.getValueType().getScalarType() == MVT::i1 && foldBooleans) { SDValue Temp; switch (Cond) { default: llvm_unreachable("Unknown integer setcc!"); case ISD::SETEQ: // X == Y -> ~(X^Y) Temp = DAG.getNode(ISD::XOR, dl, OpVT, N0, N1); N0 = DAG.getNOT(dl, Temp, OpVT); if (!DCI.isCalledByLegalizer()) DCI.AddToWorklist(Temp.getNode()); break; case ISD::SETNE: // X != Y --> (X^Y) N0 = DAG.getNode(ISD::XOR, dl, OpVT, N0, N1); break; case ISD::SETGT: // X >s Y --> X == 0 & Y == 1 --> ~X & Y case ISD::SETULT: // X X == 0 & Y == 1 --> ~X & Y Temp = DAG.getNOT(dl, N0, OpVT); N0 = DAG.getNode(ISD::AND, dl, OpVT, N1, Temp); if (!DCI.isCalledByLegalizer()) DCI.AddToWorklist(Temp.getNode()); break; case ISD::SETLT: // X X == 1 & Y == 0 --> ~Y & X case ISD::SETUGT: // X >u Y --> X == 1 & Y == 0 --> ~Y & X Temp = DAG.getNOT(dl, N1, OpVT); N0 = DAG.getNode(ISD::AND, dl, OpVT, N0, Temp); if (!DCI.isCalledByLegalizer()) DCI.AddToWorklist(Temp.getNode()); break; case ISD::SETULE: // X <=u Y --> X == 0 | Y == 1 --> ~X | Y case ISD::SETGE: // X >=s Y --> X == 0 | Y == 1 --> ~X | Y Temp = DAG.getNOT(dl, N0, OpVT); N0 = DAG.getNode(ISD::OR, dl, OpVT, N1, Temp); if (!DCI.isCalledByLegalizer()) DCI.AddToWorklist(Temp.getNode()); break; case ISD::SETUGE: // X >=u Y --> X == 1 | Y == 0 --> ~Y | X case ISD::SETLE: // X <=s Y --> X == 1 | Y == 0 --> ~Y | X Temp = DAG.getNOT(dl, N1, OpVT); N0 = DAG.getNode(ISD::OR, dl, OpVT, N0, Temp); break; } if (VT.getScalarType() != MVT::i1) { if (!DCI.isCalledByLegalizer()) DCI.AddToWorklist(N0.getNode()); // FIXME: If running after legalize, we probably can't do this. ISD::NodeType ExtendCode = getExtendForContent(getBooleanContents(OpVT)); N0 = DAG.getNode(ExtendCode, dl, VT, N0); } return N0; } // Could not fold it. return SDValue(); } /// Returns true (and the GlobalValue and the offset) if the node is a /// GlobalAddress + offset. bool TargetLowering::isGAPlusOffset(SDNode *WN, const GlobalValue *&GA, int64_t &Offset) const { SDNode *N = unwrapAddress(SDValue(WN, 0)).getNode(); if (auto *GASD = dyn_cast(N)) { GA = GASD->getGlobal(); Offset += GASD->getOffset(); return true; } if (N->getOpcode() == ISD::ADD) { SDValue N1 = N->getOperand(0); SDValue N2 = N->getOperand(1); if (isGAPlusOffset(N1.getNode(), GA, Offset)) { if (auto *V = dyn_cast(N2)) { Offset += V->getSExtValue(); return true; } } else if (isGAPlusOffset(N2.getNode(), GA, Offset)) { if (auto *V = dyn_cast(N1)) { Offset += V->getSExtValue(); return true; } } } return false; } SDValue TargetLowering::PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const { // Default implementation: no optimization. 
return SDValue(); } //===----------------------------------------------------------------------===// // Inline Assembler Implementation Methods //===----------------------------------------------------------------------===// TargetLowering::ConstraintType TargetLowering::getConstraintType(StringRef Constraint) const { unsigned S = Constraint.size(); if (S == 1) { switch (Constraint[0]) { default: break; case 'r': return C_RegisterClass; case 'm': // memory case 'o': // offsetable case 'V': // not offsetable return C_Memory; case 'n': // Simple Integer case 'E': // Floating Point Constant case 'F': // Floating Point Constant return C_Immediate; case 'i': // Simple Integer or Relocatable Constant case 's': // Relocatable Constant case 'p': // Address. case 'X': // Allow ANY value. case 'I': // Target registers. case 'J': case 'K': case 'L': case 'M': case 'N': case 'O': case 'P': case '<': case '>': return C_Other; } } if (S > 1 && Constraint[0] == '{' && Constraint[S - 1] == '}') { if (S == 8 && Constraint.substr(1, 6) == "memory") // "{memory}" return C_Memory; return C_Register; } return C_Unknown; } /// Try to replace an X constraint, which matches anything, with another that /// has more specific requirements based on the type of the corresponding /// operand. const char *TargetLowering::LowerXConstraint(EVT ConstraintVT) const { if (ConstraintVT.isInteger()) return "r"; if (ConstraintVT.isFloatingPoint()) return "f"; // works for many targets return nullptr; } SDValue TargetLowering::LowerAsmOutputForConstraint( SDValue &Chain, SDValue &Flag, SDLoc DL, const AsmOperandInfo &OpInfo, SelectionDAG &DAG) const { return SDValue(); } /// Lower the specified operand into the Ops vector. /// If it is invalid, don't add anything to Ops. void TargetLowering::LowerAsmOperandForConstraint(SDValue Op, std::string &Constraint, std::vector &Ops, SelectionDAG &DAG) const { if (Constraint.length() > 1) return; char ConstraintLetter = Constraint[0]; switch (ConstraintLetter) { default: break; case 'X': // Allows any operand; labels (basic block) use this. if (Op.getOpcode() == ISD::BasicBlock || Op.getOpcode() == ISD::TargetBlockAddress) { Ops.push_back(Op); return; } LLVM_FALLTHROUGH; case 'i': // Simple Integer or Relocatable Constant case 'n': // Simple Integer case 's': { // Relocatable Constant GlobalAddressSDNode *GA; ConstantSDNode *C; BlockAddressSDNode *BA; uint64_t Offset = 0; // Match (GA) or (C) or (GA+C) or (GA-C) or ((GA+C)+C) or (((GA+C)+C)+C), // etc., since getelementpointer is variadic. We can't use // SelectionDAG::FoldSymbolOffset because it expects the GA to be accessible // while in this case the GA may be furthest from the root node which is // likely an ISD::ADD. while (1) { if ((GA = dyn_cast(Op)) && ConstraintLetter != 'n') { Ops.push_back(DAG.getTargetGlobalAddress(GA->getGlobal(), SDLoc(Op), GA->getValueType(0), Offset + GA->getOffset())); return; } else if ((C = dyn_cast(Op)) && ConstraintLetter != 's') { // gcc prints these as sign extended. Sign extend value to 64 bits // now; without this it would get ZExt'd later in // ScheduleDAGSDNodes::EmitNode, which is very generic. bool IsBool = C->getConstantIntValue()->getBitWidth() == 1; BooleanContent BCont = getBooleanContents(MVT::i64); ISD::NodeType ExtOpc = IsBool ? getExtendForContent(BCont) : ISD::SIGN_EXTEND; int64_t ExtVal = ExtOpc == ISD::ZERO_EXTEND ? 
C->getZExtValue() : C->getSExtValue(); Ops.push_back(DAG.getTargetConstant(Offset + ExtVal, SDLoc(C), MVT::i64)); return; } else if ((BA = dyn_cast(Op)) && ConstraintLetter != 'n') { Ops.push_back(DAG.getTargetBlockAddress( BA->getBlockAddress(), BA->getValueType(0), Offset + BA->getOffset(), BA->getTargetFlags())); return; } else { const unsigned OpCode = Op.getOpcode(); if (OpCode == ISD::ADD || OpCode == ISD::SUB) { if ((C = dyn_cast(Op.getOperand(0)))) Op = Op.getOperand(1); // Subtraction is not commutative. else if (OpCode == ISD::ADD && (C = dyn_cast(Op.getOperand(1)))) Op = Op.getOperand(0); else return; Offset += (OpCode == ISD::ADD ? 1 : -1) * C->getSExtValue(); continue; } } return; } break; } } } std::pair TargetLowering::getRegForInlineAsmConstraint(const TargetRegisterInfo *RI, StringRef Constraint, MVT VT) const { if (Constraint.empty() || Constraint[0] != '{') return std::make_pair(0u, static_cast(nullptr)); assert(*(Constraint.end() - 1) == '}' && "Not a brace enclosed constraint?"); // Remove the braces from around the name. StringRef RegName(Constraint.data() + 1, Constraint.size() - 2); std::pair R = std::make_pair(0u, static_cast(nullptr)); // Figure out which register class contains this reg. for (const TargetRegisterClass *RC : RI->regclasses()) { // If none of the value types for this register class are valid, we // can't use it. For example, 64-bit reg classes on 32-bit targets. if (!isLegalRC(*RI, *RC)) continue; for (TargetRegisterClass::iterator I = RC->begin(), E = RC->end(); I != E; ++I) { if (RegName.equals_lower(RI->getRegAsmName(*I))) { std::pair S = std::make_pair(*I, RC); // If this register class has the requested value type, return it, // otherwise keep searching and return the first class found // if no other is found which explicitly has the requested type. if (RI->isTypeLegalForClass(*RC, VT)) return S; if (!R.second) R = S; } } } return R; } //===----------------------------------------------------------------------===// // Constraint Selection. /// Return true of this is an input operand that is a matching constraint like /// "4". bool TargetLowering::AsmOperandInfo::isMatchingInputConstraint() const { assert(!ConstraintCode.empty() && "No known constraint!"); return isdigit(static_cast(ConstraintCode[0])); } /// If this is an input matching constraint, this method returns the output /// operand it matches. unsigned TargetLowering::AsmOperandInfo::getMatchedOperand() const { assert(!ConstraintCode.empty() && "No known constraint!"); return atoi(ConstraintCode.c_str()); } /// Split up the constraint string from the inline assembly value into the /// specific constraints and their prefixes, and also tie in the associated /// operand values. /// If this returns an empty vector, and if the constraint string itself /// isn't empty, there was an error parsing. TargetLowering::AsmOperandInfoVector TargetLowering::ParseConstraints(const DataLayout &DL, const TargetRegisterInfo *TRI, const CallBase &Call) const { /// Information about all of the constraints. AsmOperandInfoVector ConstraintOperands; const InlineAsm *IA = cast(Call.getCalledOperand()); unsigned maCount = 0; // Largest number of multiple alternative constraints. // Do a prepass over the constraints, canonicalizing them, and building up the // ConstraintOperands list. unsigned ArgNo = 0; // ArgNo - The argument of the CallInst. unsigned ResNo = 0; // ResNo - The result number of the next output. 
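  // Illustrative example: for a call such as
  //   %r = call i32 asm "add $0, $1, $2", "=r,r,r"(i32 %a, i32 %b)
  // the loop below builds three AsmOperandInfos: one isOutput entry whose
  // ConstraintVT becomes i32 (taken from the call's result type, since it is
  // not indirect), and two isInput entries bound to %a and %b through ArgNo.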
for (InlineAsm::ConstraintInfo &CI : IA->ParseConstraints()) { ConstraintOperands.emplace_back(std::move(CI)); AsmOperandInfo &OpInfo = ConstraintOperands.back(); // Update multiple alternative constraint count. if (OpInfo.multipleAlternatives.size() > maCount) maCount = OpInfo.multipleAlternatives.size(); OpInfo.ConstraintVT = MVT::Other; // Compute the value type for each operand. switch (OpInfo.Type) { case InlineAsm::isOutput: // Indirect outputs just consume an argument. if (OpInfo.isIndirect) { OpInfo.CallOperandVal = Call.getArgOperand(ArgNo++); break; } // The return value of the call is this value. As such, there is no // corresponding argument. assert(!Call.getType()->isVoidTy() && "Bad inline asm!"); if (StructType *STy = dyn_cast(Call.getType())) { OpInfo.ConstraintVT = getSimpleValueType(DL, STy->getElementType(ResNo)); } else { assert(ResNo == 0 && "Asm only has one result!"); OpInfo.ConstraintVT = getSimpleValueType(DL, Call.getType()); } ++ResNo; break; case InlineAsm::isInput: OpInfo.CallOperandVal = Call.getArgOperand(ArgNo++); break; case InlineAsm::isClobber: // Nothing to do. break; } if (OpInfo.CallOperandVal) { llvm::Type *OpTy = OpInfo.CallOperandVal->getType(); if (OpInfo.isIndirect) { llvm::PointerType *PtrTy = dyn_cast(OpTy); if (!PtrTy) report_fatal_error("Indirect operand for inline asm not a pointer!"); OpTy = PtrTy->getElementType(); } // Look for vector wrapped in a struct. e.g. { <16 x i8> }. if (StructType *STy = dyn_cast(OpTy)) if (STy->getNumElements() == 1) OpTy = STy->getElementType(0); // If OpTy is not a single value, it may be a struct/union that we // can tile with integers. if (!OpTy->isSingleValueType() && OpTy->isSized()) { unsigned BitSize = DL.getTypeSizeInBits(OpTy); switch (BitSize) { default: break; case 1: case 8: case 16: case 32: case 64: case 128: OpInfo.ConstraintVT = MVT::getVT(IntegerType::get(OpTy->getContext(), BitSize), true); break; } } else if (PointerType *PT = dyn_cast(OpTy)) { unsigned PtrSize = DL.getPointerSizeInBits(PT->getAddressSpace()); OpInfo.ConstraintVT = MVT::getIntegerVT(PtrSize); } else { OpInfo.ConstraintVT = MVT::getVT(OpTy, true); } } } // If we have multiple alternative constraints, select the best alternative. if (!ConstraintOperands.empty()) { if (maCount) { unsigned bestMAIndex = 0; int bestWeight = -1; // weight: -1 = invalid match, and 0 = so-so match to 5 = good match. int weight = -1; unsigned maIndex; // Compute the sums of the weights for each alternative, keeping track // of the best (highest weight) one so far. for (maIndex = 0; maIndex < maCount; ++maIndex) { int weightSum = 0; for (unsigned cIndex = 0, eIndex = ConstraintOperands.size(); cIndex != eIndex; ++cIndex) { AsmOperandInfo &OpInfo = ConstraintOperands[cIndex]; if (OpInfo.Type == InlineAsm::isClobber) continue; // If this is an output operand with a matching input operand, // look up the matching input. If their types mismatch, e.g. one // is an integer, the other is floating point, or their sizes are // different, flag it as an maCantMatch. if (OpInfo.hasMatchingInput()) { AsmOperandInfo &Input = ConstraintOperands[OpInfo.MatchingInput]; if (OpInfo.ConstraintVT != Input.ConstraintVT) { if ((OpInfo.ConstraintVT.isInteger() != Input.ConstraintVT.isInteger()) || (OpInfo.ConstraintVT.getSizeInBits() != Input.ConstraintVT.getSizeInBits())) { weightSum = -1; // Can't match. break; } } } weight = getMultipleConstraintMatchWeight(OpInfo, maIndex); if (weight == -1) { weightSum = -1; break; } weightSum += weight; } // Update best. 
if (weightSum > bestWeight) { bestWeight = weightSum; bestMAIndex = maIndex; } } // Now select chosen alternative in each constraint. for (unsigned cIndex = 0, eIndex = ConstraintOperands.size(); cIndex != eIndex; ++cIndex) { AsmOperandInfo &cInfo = ConstraintOperands[cIndex]; if (cInfo.Type == InlineAsm::isClobber) continue; cInfo.selectAlternative(bestMAIndex); } } } // Check and hook up tied operands, choose constraint code to use. for (unsigned cIndex = 0, eIndex = ConstraintOperands.size(); cIndex != eIndex; ++cIndex) { AsmOperandInfo &OpInfo = ConstraintOperands[cIndex]; // If this is an output operand with a matching input operand, look up the // matching input. If their types mismatch, e.g. one is an integer, the // other is floating point, or their sizes are different, flag it as an // error. if (OpInfo.hasMatchingInput()) { AsmOperandInfo &Input = ConstraintOperands[OpInfo.MatchingInput]; if (OpInfo.ConstraintVT != Input.ConstraintVT) { std::pair MatchRC = getRegForInlineAsmConstraint(TRI, OpInfo.ConstraintCode, OpInfo.ConstraintVT); std::pair InputRC = getRegForInlineAsmConstraint(TRI, Input.ConstraintCode, Input.ConstraintVT); if ((OpInfo.ConstraintVT.isInteger() != Input.ConstraintVT.isInteger()) || (MatchRC.second != InputRC.second)) { report_fatal_error("Unsupported asm: input constraint" " with a matching output constraint of" " incompatible type!"); } } } } return ConstraintOperands; } /// Return an integer indicating how general CT is. static unsigned getConstraintGenerality(TargetLowering::ConstraintType CT) { switch (CT) { case TargetLowering::C_Immediate: case TargetLowering::C_Other: case TargetLowering::C_Unknown: return 0; case TargetLowering::C_Register: return 1; case TargetLowering::C_RegisterClass: return 2; case TargetLowering::C_Memory: return 3; } llvm_unreachable("Invalid constraint type"); } /// Examine constraint type and operand type and determine a weight value. /// This object must already have been set up with the operand type /// and the current alternative constraint selected. TargetLowering::ConstraintWeight TargetLowering::getMultipleConstraintMatchWeight( AsmOperandInfo &info, int maIndex) const { InlineAsm::ConstraintCodeVector *rCodes; if (maIndex >= (int)info.multipleAlternatives.size()) rCodes = &info.Codes; else rCodes = &info.multipleAlternatives[maIndex].Codes; ConstraintWeight BestWeight = CW_Invalid; // Loop over the options, keeping track of the most general one. for (unsigned i = 0, e = rCodes->size(); i != e; ++i) { ConstraintWeight weight = getSingleConstraintMatchWeight(info, (*rCodes)[i].c_str()); if (weight > BestWeight) BestWeight = weight; } return BestWeight; } /// Examine constraint type and operand type and determine a weight value. /// This object must already have been set up with the operand type /// and the current alternative constraint selected. TargetLowering::ConstraintWeight TargetLowering::getSingleConstraintMatchWeight( AsmOperandInfo &info, const char *constraint) const { ConstraintWeight weight = CW_Invalid; Value *CallOperandVal = info.CallOperandVal; // If we don't have a value, we can't do a match, // but allow it at the lowest weight. if (!CallOperandVal) return CW_Default; // Look at the constraint type. switch (*constraint) { case 'i': // immediate integer. case 'n': // immediate integer with a known value. if (isa(CallOperandVal)) weight = CW_Constant; break; case 's': // non-explicit intregal immediate. if (isa(CallOperandVal)) weight = CW_Constant; break; case 'E': // immediate float if host format. 
case 'F': // immediate float. if (isa(CallOperandVal)) weight = CW_Constant; break; case '<': // memory operand with autodecrement. case '>': // memory operand with autoincrement. case 'm': // memory operand. case 'o': // offsettable memory operand case 'V': // non-offsettable memory operand weight = CW_Memory; break; case 'r': // general register. case 'g': // general register, memory operand or immediate integer. // note: Clang converts "g" to "imr". if (CallOperandVal->getType()->isIntegerTy()) weight = CW_Register; break; case 'X': // any operand. default: weight = CW_Default; break; } return weight; } /// If there are multiple different constraints that we could pick for this /// operand (e.g. "imr") try to pick the 'best' one. /// This is somewhat tricky: constraints fall into four classes: /// Other -> immediates and magic values /// Register -> one specific register /// RegisterClass -> a group of regs /// Memory -> memory /// Ideally, we would pick the most specific constraint possible: if we have /// something that fits into a register, we would pick it. The problem here /// is that if we have something that could either be in a register or in /// memory that use of the register could cause selection of *other* /// operands to fail: they might only succeed if we pick memory. Because of /// this the heuristic we use is: /// /// 1) If there is an 'other' constraint, and if the operand is valid for /// that constraint, use it. This makes us take advantage of 'i' /// constraints when available. /// 2) Otherwise, pick the most general constraint present. This prefers /// 'm' over 'r', for example. /// static void ChooseConstraint(TargetLowering::AsmOperandInfo &OpInfo, const TargetLowering &TLI, SDValue Op, SelectionDAG *DAG) { assert(OpInfo.Codes.size() > 1 && "Doesn't have multiple constraint options"); unsigned BestIdx = 0; TargetLowering::ConstraintType BestType = TargetLowering::C_Unknown; int BestGenerality = -1; // Loop over the options, keeping track of the most general one. for (unsigned i = 0, e = OpInfo.Codes.size(); i != e; ++i) { TargetLowering::ConstraintType CType = TLI.getConstraintType(OpInfo.Codes[i]); // Indirect 'other' or 'immediate' constraints are not allowed. if (OpInfo.isIndirect && !(CType == TargetLowering::C_Memory || CType == TargetLowering::C_Register || CType == TargetLowering::C_RegisterClass)) continue; // If this is an 'other' or 'immediate' constraint, see if the operand is // valid for it. For example, on X86 we might have an 'rI' constraint. If // the operand is an integer in the range [0..31] we want to use I (saving a // load of a register), otherwise we must use 'r'. if ((CType == TargetLowering::C_Other || CType == TargetLowering::C_Immediate) && Op.getNode()) { assert(OpInfo.Codes[i].size() == 1 && "Unhandled multi-letter 'other' constraint"); std::vector ResultOps; TLI.LowerAsmOperandForConstraint(Op, OpInfo.Codes[i], ResultOps, *DAG); if (!ResultOps.empty()) { BestType = CType; BestIdx = i; break; } } // Things with matching constraints can only be registers, per gcc // documentation. This mainly affects "g" constraints. if (CType == TargetLowering::C_Memory && OpInfo.hasMatchingInput()) continue; // This constraint letter is more general than the previous one, use it. 
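  // For reference, getConstraintGenerality() ranks Other/Immediate (0) <
  // Register (1) < RegisterClass (2) < Memory (3). So for "imr", a constant
  // operand is already taken by the 'i' check above, while any other operand
  // falls through to this ranking and ends up with 'm'.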
int Generality = getConstraintGenerality(CType); if (Generality > BestGenerality) { BestType = CType; BestIdx = i; BestGenerality = Generality; } } OpInfo.ConstraintCode = OpInfo.Codes[BestIdx]; OpInfo.ConstraintType = BestType; } /// Determines the constraint code and constraint type to use for the specific /// AsmOperandInfo, setting OpInfo.ConstraintCode and OpInfo.ConstraintType. void TargetLowering::ComputeConstraintToUse(AsmOperandInfo &OpInfo, SDValue Op, SelectionDAG *DAG) const { assert(!OpInfo.Codes.empty() && "Must have at least one constraint"); // Single-letter constraints ('r') are very common. if (OpInfo.Codes.size() == 1) { OpInfo.ConstraintCode = OpInfo.Codes[0]; OpInfo.ConstraintType = getConstraintType(OpInfo.ConstraintCode); } else { ChooseConstraint(OpInfo, *this, Op, DAG); } // 'X' matches anything. if (OpInfo.ConstraintCode == "X" && OpInfo.CallOperandVal) { // Labels and constants are handled elsewhere ('X' is the only thing // that matches labels). For Functions, the type here is the type of // the result, which is not what we want to look at; leave them alone. Value *v = OpInfo.CallOperandVal; if (isa(v) || isa(v) || isa(v)) { OpInfo.CallOperandVal = v; return; } if (Op.getNode() && Op.getOpcode() == ISD::TargetBlockAddress) return; // Otherwise, try to resolve it to something we know about by looking at // the actual operand type. if (const char *Repl = LowerXConstraint(OpInfo.ConstraintVT)) { OpInfo.ConstraintCode = Repl; OpInfo.ConstraintType = getConstraintType(OpInfo.ConstraintCode); } } } /// Given an exact SDIV by a constant, create a multiplication /// with the multiplicative inverse of the constant. static SDValue BuildExactSDIV(const TargetLowering &TLI, SDNode *N, const SDLoc &dl, SelectionDAG &DAG, SmallVectorImpl &Created) { SDValue Op0 = N->getOperand(0); SDValue Op1 = N->getOperand(1); EVT VT = N->getValueType(0); EVT SVT = VT.getScalarType(); EVT ShVT = TLI.getShiftAmountTy(VT, DAG.getDataLayout()); EVT ShSVT = ShVT.getScalarType(); bool UseSRA = false; SmallVector Shifts, Factors; auto BuildSDIVPattern = [&](ConstantSDNode *C) { if (C->isNullValue()) return false; APInt Divisor = C->getAPIntValue(); unsigned Shift = Divisor.countTrailingZeros(); if (Shift) { Divisor.ashrInPlace(Shift); UseSRA = true; } // Calculate the multiplicative inverse, using Newton's method. APInt t; APInt Factor = Divisor; while ((t = Divisor * Factor) != 1) Factor *= APInt(Divisor.getBitWidth(), 2) - t; Shifts.push_back(DAG.getConstant(Shift, dl, ShSVT)); Factors.push_back(DAG.getConstant(Factor, dl, SVT)); return true; }; // Collect all magic values from the build vector. if (!ISD::matchUnaryPredicate(Op1, BuildSDIVPattern)) return SDValue(); SDValue Shift, Factor; if (VT.isVector()) { Shift = DAG.getBuildVector(ShVT, dl, Shifts); Factor = DAG.getBuildVector(VT, dl, Factors); } else { Shift = Shifts[0]; Factor = Factors[0]; } SDValue Res = Op0; // Shift the value upfront if it is even, so the LSB is one. if (UseSRA) { // TODO: For UDIV use SRL instead of SRA. 
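  // Worked example (illustrative): for an exact sdiv by 6 in i32, the lambda
  // above records Shift = 1 and Factor = inv(3, 2^32) = 0xAAAAAAAB, so the
  // result is mul(sra_exact(Op0, 1), 0xAAAAAAAB); e.g. for Op0 = 18:
  // sra_exact(18, 1) = 9 and 9 * 0xAAAAAAAB (mod 2^32) = 3 = 18 / 6.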
SDNodeFlags Flags; Flags.setExact(true); Res = DAG.getNode(ISD::SRA, dl, VT, Res, Shift, Flags); Created.push_back(Res.getNode()); } return DAG.getNode(ISD::MUL, dl, VT, Res, Factor); } SDValue TargetLowering::BuildSDIVPow2(SDNode *N, const APInt &Divisor, SelectionDAG &DAG, SmallVectorImpl &Created) const { AttributeList Attr = DAG.getMachineFunction().getFunction().getAttributes(); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); if (TLI.isIntDivCheap(N->getValueType(0), Attr)) return SDValue(N, 0); // Lower SDIV as SDIV return SDValue(); } /// Given an ISD::SDIV node expressing a divide by constant, /// return a DAG expression to select that will generate the same value by /// multiplying by a magic number. /// Ref: "Hacker's Delight" or "The PowerPC Compiler Writer's Guide". SDValue TargetLowering::BuildSDIV(SDNode *N, SelectionDAG &DAG, bool IsAfterLegalization, SmallVectorImpl &Created) const { SDLoc dl(N); EVT VT = N->getValueType(0); EVT SVT = VT.getScalarType(); EVT ShVT = getShiftAmountTy(VT, DAG.getDataLayout()); EVT ShSVT = ShVT.getScalarType(); unsigned EltBits = VT.getScalarSizeInBits(); // Check to see if we can do this. // FIXME: We should be more aggressive here. if (!isTypeLegal(VT)) return SDValue(); // If the sdiv has an 'exact' bit we can use a simpler lowering. if (N->getFlags().hasExact()) return BuildExactSDIV(*this, N, dl, DAG, Created); SmallVector MagicFactors, Factors, Shifts, ShiftMasks; auto BuildSDIVPattern = [&](ConstantSDNode *C) { if (C->isNullValue()) return false; const APInt &Divisor = C->getAPIntValue(); APInt::ms magics = Divisor.magic(); int NumeratorFactor = 0; int ShiftMask = -1; if (Divisor.isOneValue() || Divisor.isAllOnesValue()) { // If d is +1/-1, we just multiply the numerator by +1/-1. NumeratorFactor = Divisor.getSExtValue(); magics.m = 0; magics.s = 0; ShiftMask = 0; } else if (Divisor.isStrictlyPositive() && magics.m.isNegative()) { // If d > 0 and m < 0, add the numerator. NumeratorFactor = 1; } else if (Divisor.isNegative() && magics.m.isStrictlyPositive()) { // If d < 0 and m > 0, subtract the numerator. NumeratorFactor = -1; } MagicFactors.push_back(DAG.getConstant(magics.m, dl, SVT)); Factors.push_back(DAG.getConstant(NumeratorFactor, dl, SVT)); Shifts.push_back(DAG.getConstant(magics.s, dl, ShSVT)); ShiftMasks.push_back(DAG.getConstant(ShiftMask, dl, SVT)); return true; }; SDValue N0 = N->getOperand(0); SDValue N1 = N->getOperand(1); // Collect the shifts / magic values from each element. if (!ISD::matchUnaryPredicate(N1, BuildSDIVPattern)) return SDValue(); SDValue MagicFactor, Factor, Shift, ShiftMask; if (VT.isVector()) { MagicFactor = DAG.getBuildVector(VT, dl, MagicFactors); Factor = DAG.getBuildVector(VT, dl, Factors); Shift = DAG.getBuildVector(ShVT, dl, Shifts); ShiftMask = DAG.getBuildVector(VT, dl, ShiftMasks); } else { MagicFactor = MagicFactors[0]; Factor = Factors[0]; Shift = Shifts[0]; ShiftMask = ShiftMasks[0]; } // Multiply the numerator (operand 0) by the magic value. // FIXME: We should support doing a MUL in a wider type. SDValue Q; if (IsAfterLegalization ? isOperationLegal(ISD::MULHS, VT) : isOperationLegalOrCustom(ISD::MULHS, VT)) Q = DAG.getNode(ISD::MULHS, dl, VT, N0, MagicFactor); else if (IsAfterLegalization ? isOperationLegal(ISD::SMUL_LOHI, VT) : isOperationLegalOrCustom(ISD::SMUL_LOHI, VT)) { SDValue LoHi = DAG.getNode(ISD::SMUL_LOHI, dl, DAG.getVTList(VT, VT), N0, MagicFactor); Q = SDValue(LoHi.getNode(), 1); } else return SDValue(); // No mulhs or equivalent. 
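  // Worked example (illustrative): for a signed i32 divide by 7 the magic
  // value is 0x92492493 with a shift of 2; the divisor is positive and the
  // magic value is negative, so the numerator is added once (Factor = 1) and
  // the code below amounts to
  //   q = sra(mulhs(x, 0x92492493) + x, 2);  q += srl(q, 31);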
Created.push_back(Q.getNode()); // (Optionally) Add/subtract the numerator using Factor. Factor = DAG.getNode(ISD::MUL, dl, VT, N0, Factor); Created.push_back(Factor.getNode()); Q = DAG.getNode(ISD::ADD, dl, VT, Q, Factor); Created.push_back(Q.getNode()); // Shift right algebraic by shift value. Q = DAG.getNode(ISD::SRA, dl, VT, Q, Shift); Created.push_back(Q.getNode()); // Extract the sign bit, mask it and add it to the quotient. SDValue SignShift = DAG.getConstant(EltBits - 1, dl, ShVT); SDValue T = DAG.getNode(ISD::SRL, dl, VT, Q, SignShift); Created.push_back(T.getNode()); T = DAG.getNode(ISD::AND, dl, VT, T, ShiftMask); Created.push_back(T.getNode()); return DAG.getNode(ISD::ADD, dl, VT, Q, T); } /// Given an ISD::UDIV node expressing a divide by constant, /// return a DAG expression to select that will generate the same value by /// multiplying by a magic number. /// Ref: "Hacker's Delight" or "The PowerPC Compiler Writer's Guide". SDValue TargetLowering::BuildUDIV(SDNode *N, SelectionDAG &DAG, bool IsAfterLegalization, SmallVectorImpl &Created) const { SDLoc dl(N); EVT VT = N->getValueType(0); EVT SVT = VT.getScalarType(); EVT ShVT = getShiftAmountTy(VT, DAG.getDataLayout()); EVT ShSVT = ShVT.getScalarType(); unsigned EltBits = VT.getScalarSizeInBits(); // Check to see if we can do this. // FIXME: We should be more aggressive here. if (!isTypeLegal(VT)) return SDValue(); bool UseNPQ = false; SmallVector PreShifts, PostShifts, MagicFactors, NPQFactors; auto BuildUDIVPattern = [&](ConstantSDNode *C) { if (C->isNullValue()) return false; // FIXME: We should use a narrower constant when the upper // bits are known to be zero. APInt Divisor = C->getAPIntValue(); APInt::mu magics = Divisor.magicu(); unsigned PreShift = 0, PostShift = 0; // If the divisor is even, we can avoid using the expensive fixup by // shifting the divided value upfront. if (magics.a != 0 && !Divisor[0]) { PreShift = Divisor.countTrailingZeros(); // Get magic number for the shifted divisor. magics = Divisor.lshr(PreShift).magicu(PreShift); assert(magics.a == 0 && "Should use cheap fixup now"); } APInt Magic = magics.m; unsigned SelNPQ; if (magics.a == 0 || Divisor.isOneValue()) { assert(magics.s < Divisor.getBitWidth() && "We shouldn't generate an undefined shift!"); PostShift = magics.s; SelNPQ = false; } else { PostShift = magics.s - 1; SelNPQ = true; } PreShifts.push_back(DAG.getConstant(PreShift, dl, ShSVT)); MagicFactors.push_back(DAG.getConstant(Magic, dl, SVT)); NPQFactors.push_back( DAG.getConstant(SelNPQ ? APInt::getOneBitSet(EltBits, EltBits - 1) : APInt::getNullValue(EltBits), dl, SVT)); PostShifts.push_back(DAG.getConstant(PostShift, dl, ShSVT)); UseNPQ |= SelNPQ; return true; }; SDValue N0 = N->getOperand(0); SDValue N1 = N->getOperand(1); // Collect the shifts/magic values from each element. if (!ISD::matchUnaryPredicate(N1, BuildUDIVPattern)) return SDValue(); SDValue PreShift, PostShift, MagicFactor, NPQFactor; if (VT.isVector()) { PreShift = DAG.getBuildVector(ShVT, dl, PreShifts); MagicFactor = DAG.getBuildVector(VT, dl, MagicFactors); NPQFactor = DAG.getBuildVector(VT, dl, NPQFactors); PostShift = DAG.getBuildVector(ShVT, dl, PostShifts); } else { PreShift = PreShifts[0]; MagicFactor = MagicFactors[0]; PostShift = PostShifts[0]; } SDValue Q = N0; Q = DAG.getNode(ISD::SRL, dl, VT, Q, PreShift); Created.push_back(Q.getNode()); // FIXME: We should support doing a MUL in a wider type. auto GetMULHU = [&](SDValue X, SDValue Y) { if (IsAfterLegalization ? 
isOperationLegal(ISD::MULHU, VT) : isOperationLegalOrCustom(ISD::MULHU, VT)) return DAG.getNode(ISD::MULHU, dl, VT, X, Y); if (IsAfterLegalization ? isOperationLegal(ISD::UMUL_LOHI, VT) : isOperationLegalOrCustom(ISD::UMUL_LOHI, VT)) { SDValue LoHi = DAG.getNode(ISD::UMUL_LOHI, dl, DAG.getVTList(VT, VT), X, Y); return SDValue(LoHi.getNode(), 1); } return SDValue(); // No mulhu or equivalent }; // Multiply the numerator (operand 0) by the magic value. Q = GetMULHU(Q, MagicFactor); if (!Q) return SDValue(); Created.push_back(Q.getNode()); if (UseNPQ) { SDValue NPQ = DAG.getNode(ISD::SUB, dl, VT, N0, Q); Created.push_back(NPQ.getNode()); // For vectors we might have a mix of non-NPQ/NPQ paths, so use // MULHU to act as a SRL-by-1 for NPQ, else multiply by zero. if (VT.isVector()) NPQ = GetMULHU(NPQ, NPQFactor); else NPQ = DAG.getNode(ISD::SRL, dl, VT, NPQ, DAG.getConstant(1, dl, ShVT)); Created.push_back(NPQ.getNode()); Q = DAG.getNode(ISD::ADD, dl, VT, NPQ, Q); Created.push_back(Q.getNode()); } Q = DAG.getNode(ISD::SRL, dl, VT, Q, PostShift); Created.push_back(Q.getNode()); SDValue One = DAG.getConstant(1, dl, VT); SDValue IsOne = DAG.getSetCC(dl, VT, N1, One, ISD::SETEQ); return DAG.getSelect(dl, VT, IsOne, N0, Q); } /// If all values in Values that *don't* match the predicate are same 'splat' /// value, then replace all values with that splat value. /// Else, if AlternativeReplacement was provided, then replace all values that /// do match predicate with AlternativeReplacement value. static void turnVectorIntoSplatVector(MutableArrayRef Values, std::function Predicate, SDValue AlternativeReplacement = SDValue()) { SDValue Replacement; // Is there a value for which the Predicate does *NOT* match? What is it? auto SplatValue = llvm::find_if_not(Values, Predicate); if (SplatValue != Values.end()) { // Does Values consist only of SplatValue's and values matching Predicate? if (llvm::all_of(Values, [Predicate, SplatValue](SDValue Value) { return Value == *SplatValue || Predicate(Value); })) // Then we shall replace values matching predicate with SplatValue. Replacement = *SplatValue; } if (!Replacement) { // Oops, we did not find the "baseline" splat value. if (!AlternativeReplacement) return; // Nothing to do. // Let's replace with provided value then. Replacement = AlternativeReplacement; } std::replace_if(Values.begin(), Values.end(), Predicate, Replacement); } /// Given an ISD::UREM used only by an ISD::SETEQ or ISD::SETNE /// where the divisor is constant and the comparison target is zero, /// return a DAG expression that will generate the same comparison result /// using only multiplications, additions and shifts/rotations. /// Ref: "Hacker's Delight" 10-17. SDValue TargetLowering::buildUREMEqFold(EVT SETCCVT, SDValue REMNode, SDValue CompTargetNode, ISD::CondCode Cond, DAGCombinerInfo &DCI, const SDLoc &DL) const { SmallVector Built; if (SDValue Folded = prepareUREMEqFold(SETCCVT, REMNode, CompTargetNode, Cond, DCI, DL, Built)) { for (SDNode *N : Built) DCI.AddToWorklist(N); return Folded; } return SDValue(); } SDValue TargetLowering::prepareUREMEqFold(EVT SETCCVT, SDValue REMNode, SDValue CompTargetNode, ISD::CondCode Cond, DAGCombinerInfo &DCI, const SDLoc &DL, SmallVectorImpl &Created) const { // fold (seteq/ne (urem N, D), 0) -> (setule/ugt (rotr (mul N, P), K), Q) // - D must be constant, with D = D0 * 2^K where D0 is odd // - P is the multiplicative inverse of D0 modulo 2^W // - Q = floor(((2^W) - 1) / D) // where W is the width of the common type of N and D. 
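  // Worked example (illustrative): for i32 and D = 6 = 3 * 2^1 this computes
  // P = inv(3, 2^32) = 0xAAAAAAAB, K = 1 and Q = floor((2^32 - 1) / 6) =
  // 0x2AAAAAAA, so (x u% 6) == 0 becomes
  //   setule(rotr(mul(x, 0xAAAAAAAB), 1), 0x2AAAAAAA)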
assert((Cond == ISD::SETEQ || Cond == ISD::SETNE) && "Only applicable for (in)equality comparisons."); SelectionDAG &DAG = DCI.DAG; EVT VT = REMNode.getValueType(); EVT SVT = VT.getScalarType(); EVT ShVT = getShiftAmountTy(VT, DAG.getDataLayout()); EVT ShSVT = ShVT.getScalarType(); // If MUL is unavailable, we cannot proceed in any case. if (!isOperationLegalOrCustom(ISD::MUL, VT)) return SDValue(); bool ComparingWithAllZeros = true; bool AllComparisonsWithNonZerosAreTautological = true; bool HadTautologicalLanes = false; bool AllLanesAreTautological = true; bool HadEvenDivisor = false; bool AllDivisorsArePowerOfTwo = true; bool HadTautologicalInvertedLanes = false; SmallVector PAmts, KAmts, QAmts, IAmts; auto BuildUREMPattern = [&](ConstantSDNode *CDiv, ConstantSDNode *CCmp) { // Division by 0 is UB. Leave it to be constant-folded elsewhere. if (CDiv->isNullValue()) return false; const APInt &D = CDiv->getAPIntValue(); const APInt &Cmp = CCmp->getAPIntValue(); ComparingWithAllZeros &= Cmp.isNullValue(); // x u% C1` is *always* less than C1. So given `x u% C1 == C2`, // if C2 is not less than C1, the comparison is always false. // But we will only be able to produce the comparison that will give the // opposive tautological answer. So this lane would need to be fixed up. bool TautologicalInvertedLane = D.ule(Cmp); HadTautologicalInvertedLanes |= TautologicalInvertedLane; // If all lanes are tautological (either all divisors are ones, or divisor // is not greater than the constant we are comparing with), // we will prefer to avoid the fold. bool TautologicalLane = D.isOneValue() || TautologicalInvertedLane; HadTautologicalLanes |= TautologicalLane; AllLanesAreTautological &= TautologicalLane; // If we are comparing with non-zero, we need'll need to subtract said // comparison value from the LHS. But there is no point in doing that if // every lane where we are comparing with non-zero is tautological.. if (!Cmp.isNullValue()) AllComparisonsWithNonZerosAreTautological &= TautologicalLane; // Decompose D into D0 * 2^K unsigned K = D.countTrailingZeros(); assert((!D.isOneValue() || (K == 0)) && "For divisor '1' we won't rotate."); APInt D0 = D.lshr(K); // D is even if it has trailing zeros. HadEvenDivisor |= (K != 0); // D is a power-of-two if D0 is one. // If all divisors are power-of-two, we will prefer to avoid the fold. AllDivisorsArePowerOfTwo &= D0.isOneValue(); // P = inv(D0, 2^W) // 2^W requires W + 1 bits, so we have to extend and then truncate. unsigned W = D.getBitWidth(); APInt P = D0.zext(W + 1) .multiplicativeInverse(APInt::getSignedMinValue(W + 1)) .trunc(W); assert(!P.isNullValue() && "No multiplicative inverse!"); // unreachable assert((D0 * P).isOneValue() && "Multiplicative inverse sanity check."); // Q = floor((2^W - 1) u/ D) // R = ((2^W - 1) u% D) APInt Q, R; APInt::udivrem(APInt::getAllOnesValue(W), D, Q, R); // If we are comparing with zero, then that comparison constant is okay, // else it may need to be one less than that. if (Cmp.ugt(R)) Q -= 1; assert(APInt::getAllOnesValue(ShSVT.getSizeInBits()).ugt(K) && "We are expecting that K is always less than all-ones for ShSVT"); // If the lane is tautological the result can be constant-folded. if (TautologicalLane) { // Set P and K amount to a bogus values so we can try to splat them. P = 0; K = -1; // And ensure that comparison constant is tautological, // it will always compare true/false. 
Q = -1; } PAmts.push_back(DAG.getConstant(P, DL, SVT)); KAmts.push_back( DAG.getConstant(APInt(ShSVT.getSizeInBits(), K), DL, ShSVT)); QAmts.push_back(DAG.getConstant(Q, DL, SVT)); return true; }; SDValue N = REMNode.getOperand(0); SDValue D = REMNode.getOperand(1); // Collect the values from each element. if (!ISD::matchBinaryPredicate(D, CompTargetNode, BuildUREMPattern)) return SDValue(); // If all lanes are tautological, the result can be constant-folded. if (AllLanesAreTautological) return SDValue(); // If this is a urem by a powers-of-two, avoid the fold since it can be // best implemented as a bit test. if (AllDivisorsArePowerOfTwo) return SDValue(); SDValue PVal, KVal, QVal; if (VT.isVector()) { if (HadTautologicalLanes) { // Try to turn PAmts into a splat, since we don't care about the values // that are currently '0'. If we can't, just keep '0'`s. turnVectorIntoSplatVector(PAmts, isNullConstant); // Try to turn KAmts into a splat, since we don't care about the values // that are currently '-1'. If we can't, change them to '0'`s. turnVectorIntoSplatVector(KAmts, isAllOnesConstant, DAG.getConstant(0, DL, ShSVT)); } PVal = DAG.getBuildVector(VT, DL, PAmts); KVal = DAG.getBuildVector(ShVT, DL, KAmts); QVal = DAG.getBuildVector(VT, DL, QAmts); } else { PVal = PAmts[0]; KVal = KAmts[0]; QVal = QAmts[0]; } if (!ComparingWithAllZeros && !AllComparisonsWithNonZerosAreTautological) { if (!isOperationLegalOrCustom(ISD::SUB, VT)) return SDValue(); // FIXME: Could/should use `ISD::ADD`? assert(CompTargetNode.getValueType() == N.getValueType() && "Expecting that the types on LHS and RHS of comparisons match."); N = DAG.getNode(ISD::SUB, DL, VT, N, CompTargetNode); } // (mul N, P) SDValue Op0 = DAG.getNode(ISD::MUL, DL, VT, N, PVal); Created.push_back(Op0.getNode()); // Rotate right only if any divisor was even. We avoid rotates for all-odd // divisors as a performance improvement, since rotating by 0 is a no-op. if (HadEvenDivisor) { // We need ROTR to do this. if (!isOperationLegalOrCustom(ISD::ROTR, VT)) return SDValue(); SDNodeFlags Flags; Flags.setExact(true); // UREM: (rotr (mul N, P), K) Op0 = DAG.getNode(ISD::ROTR, DL, VT, Op0, KVal, Flags); Created.push_back(Op0.getNode()); } // UREM: (setule/setugt (rotr (mul N, P), K), Q) SDValue NewCC = DAG.getSetCC(DL, SETCCVT, Op0, QVal, ((Cond == ISD::SETEQ) ? ISD::SETULE : ISD::SETUGT)); if (!HadTautologicalInvertedLanes) return NewCC; // If any lanes previously compared always-false, the NewCC will give // always-true result for them, so we need to fixup those lanes. // Or the other way around for inequality predicate. assert(VT.isVector() && "Can/should only get here for vectors."); Created.push_back(NewCC.getNode()); // x u% C1` is *always* less than C1. So given `x u% C1 == C2`, // if C2 is not less than C1, the comparison is always false. // But we have produced the comparison that will give the // opposive tautological answer. So these lanes would need to be fixed up. SDValue TautologicalInvertedChannels = DAG.getSetCC(DL, SETCCVT, D, CompTargetNode, ISD::SETULE); Created.push_back(TautologicalInvertedChannels.getNode()); if (isOperationLegalOrCustom(ISD::VSELECT, SETCCVT)) { // If we have a vector select, let's replace the comparison results in the // affected lanes with the correct tautological result. SDValue Replacement = DAG.getBoolConstant(Cond == ISD::SETEQ ? 
false : true, DL, SETCCVT, SETCCVT); return DAG.getNode(ISD::VSELECT, DL, SETCCVT, TautologicalInvertedChannels, Replacement, NewCC); } // Else, we can just invert the comparison result in the appropriate lanes. if (isOperationLegalOrCustom(ISD::XOR, SETCCVT)) return DAG.getNode(ISD::XOR, DL, SETCCVT, NewCC, TautologicalInvertedChannels); return SDValue(); // Don't know how to lower. } /// Given an ISD::SREM used only by an ISD::SETEQ or ISD::SETNE /// where the divisor is constant and the comparison target is zero, /// return a DAG expression that will generate the same comparison result /// using only multiplications, additions and shifts/rotations. /// Ref: "Hacker's Delight" 10-17. SDValue TargetLowering::buildSREMEqFold(EVT SETCCVT, SDValue REMNode, SDValue CompTargetNode, ISD::CondCode Cond, DAGCombinerInfo &DCI, const SDLoc &DL) const { SmallVector Built; if (SDValue Folded = prepareSREMEqFold(SETCCVT, REMNode, CompTargetNode, Cond, DCI, DL, Built)) { assert(Built.size() <= 7 && "Max size prediction failed."); for (SDNode *N : Built) DCI.AddToWorklist(N); return Folded; } return SDValue(); } SDValue TargetLowering::prepareSREMEqFold(EVT SETCCVT, SDValue REMNode, SDValue CompTargetNode, ISD::CondCode Cond, DAGCombinerInfo &DCI, const SDLoc &DL, SmallVectorImpl &Created) const { // Fold: // (seteq/ne (srem N, D), 0) // To: // (setule/ugt (rotr (add (mul N, P), A), K), Q) // // - D must be constant, with D = D0 * 2^K where D0 is odd // - P is the multiplicative inverse of D0 modulo 2^W // - A = bitwiseand(floor((2^(W - 1) - 1) / D0), (-(2^k))) // - Q = floor((2 * A) / (2^K)) // where W is the width of the common type of N and D. assert((Cond == ISD::SETEQ || Cond == ISD::SETNE) && "Only applicable for (in)equality comparisons."); SelectionDAG &DAG = DCI.DAG; EVT VT = REMNode.getValueType(); EVT SVT = VT.getScalarType(); EVT ShVT = getShiftAmountTy(VT, DAG.getDataLayout()); EVT ShSVT = ShVT.getScalarType(); // If MUL is unavailable, we cannot proceed in any case. if (!isOperationLegalOrCustom(ISD::MUL, VT)) return SDValue(); // TODO: Could support comparing with non-zero too. ConstantSDNode *CompTarget = isConstOrConstSplat(CompTargetNode); if (!CompTarget || !CompTarget->isNullValue()) return SDValue(); bool HadIntMinDivisor = false; bool HadOneDivisor = false; bool AllDivisorsAreOnes = true; bool HadEvenDivisor = false; bool NeedToApplyOffset = false; bool AllDivisorsArePowerOfTwo = true; SmallVector PAmts, AAmts, KAmts, QAmts; auto BuildSREMPattern = [&](ConstantSDNode *C) { // Division by 0 is UB. Leave it to be constant-folded elsewhere. if (C->isNullValue()) return false; // FIXME: we don't fold `rem %X, -C` to `rem %X, C` in DAGCombine. // WARNING: this fold is only valid for positive divisors! APInt D = C->getAPIntValue(); if (D.isNegative()) D.negate(); // `rem %X, -C` is equivalent to `rem %X, C` HadIntMinDivisor |= D.isMinSignedValue(); // If all divisors are ones, we will prefer to avoid the fold. HadOneDivisor |= D.isOneValue(); AllDivisorsAreOnes &= D.isOneValue(); // Decompose D into D0 * 2^K unsigned K = D.countTrailingZeros(); assert((!D.isOneValue() || (K == 0)) && "For divisor '1' we won't rotate."); APInt D0 = D.lshr(K); if (!D.isMinSignedValue()) { // D is even if it has trailing zeros; unless it's INT_MIN, in which case // we don't care about this lane in this fold, we'll special-handle it. HadEvenDivisor |= (K != 0); } // D is a power-of-two if D0 is one. This includes INT_MIN. // If all divisors are power-of-two, we will prefer to avoid the fold. 
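  // Worked example (illustrative) of the whole fold, for i32 and D = 3 (so
  // K = 0, D0 = 3): the values computed below are P = inv(3, 2^32) =
  // 0xAAAAAAAB, A = floor((2^31 - 1) / 3) = 0x2AAAAAAA and Q = 2 * A =
  // 0x55555554, so (x s% 3) == 0 becomes
  //   setule(add(mul(x, 0xAAAAAAAB), 0x2AAAAAAA), 0x55555554)
  // with no rotate, since K = 0.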
AllDivisorsArePowerOfTwo &= D0.isOneValue(); // P = inv(D0, 2^W) // 2^W requires W + 1 bits, so we have to extend and then truncate. unsigned W = D.getBitWidth(); APInt P = D0.zext(W + 1) .multiplicativeInverse(APInt::getSignedMinValue(W + 1)) .trunc(W); assert(!P.isNullValue() && "No multiplicative inverse!"); // unreachable assert((D0 * P).isOneValue() && "Multiplicative inverse sanity check."); // A = floor((2^(W - 1) - 1) / D0) & -2^K APInt A = APInt::getSignedMaxValue(W).udiv(D0); A.clearLowBits(K); if (!D.isMinSignedValue()) { // If divisor INT_MIN, then we don't care about this lane in this fold, // we'll special-handle it. NeedToApplyOffset |= A != 0; } // Q = floor((2 * A) / (2^K)) APInt Q = (2 * A).udiv(APInt::getOneBitSet(W, K)); assert(APInt::getAllOnesValue(SVT.getSizeInBits()).ugt(A) && "We are expecting that A is always less than all-ones for SVT"); assert(APInt::getAllOnesValue(ShSVT.getSizeInBits()).ugt(K) && "We are expecting that K is always less than all-ones for ShSVT"); // If the divisor is 1 the result can be constant-folded. Likewise, we // don't care about INT_MIN lanes, those can be set to undef if appropriate. if (D.isOneValue()) { // Set P, A and K to a bogus values so we can try to splat them. P = 0; A = -1; K = -1; // x ?% 1 == 0 <--> true <--> x u<= -1 Q = -1; } PAmts.push_back(DAG.getConstant(P, DL, SVT)); AAmts.push_back(DAG.getConstant(A, DL, SVT)); KAmts.push_back( DAG.getConstant(APInt(ShSVT.getSizeInBits(), K), DL, ShSVT)); QAmts.push_back(DAG.getConstant(Q, DL, SVT)); return true; }; SDValue N = REMNode.getOperand(0); SDValue D = REMNode.getOperand(1); // Collect the values from each element. if (!ISD::matchUnaryPredicate(D, BuildSREMPattern)) return SDValue(); // If this is a srem by a one, avoid the fold since it can be constant-folded. if (AllDivisorsAreOnes) return SDValue(); // If this is a srem by a powers-of-two (including INT_MIN), avoid the fold // since it can be best implemented as a bit test. if (AllDivisorsArePowerOfTwo) return SDValue(); SDValue PVal, AVal, KVal, QVal; if (VT.isVector()) { if (HadOneDivisor) { // Try to turn PAmts into a splat, since we don't care about the values // that are currently '0'. If we can't, just keep '0'`s. turnVectorIntoSplatVector(PAmts, isNullConstant); // Try to turn AAmts into a splat, since we don't care about the // values that are currently '-1'. If we can't, change them to '0'`s. turnVectorIntoSplatVector(AAmts, isAllOnesConstant, DAG.getConstant(0, DL, SVT)); // Try to turn KAmts into a splat, since we don't care about the values // that are currently '-1'. If we can't, change them to '0'`s. turnVectorIntoSplatVector(KAmts, isAllOnesConstant, DAG.getConstant(0, DL, ShSVT)); } PVal = DAG.getBuildVector(VT, DL, PAmts); AVal = DAG.getBuildVector(VT, DL, AAmts); KVal = DAG.getBuildVector(ShVT, DL, KAmts); QVal = DAG.getBuildVector(VT, DL, QAmts); } else { PVal = PAmts[0]; AVal = AAmts[0]; KVal = KAmts[0]; QVal = QAmts[0]; } // (mul N, P) SDValue Op0 = DAG.getNode(ISD::MUL, DL, VT, N, PVal); Created.push_back(Op0.getNode()); if (NeedToApplyOffset) { // We need ADD to do this. if (!isOperationLegalOrCustom(ISD::ADD, VT)) return SDValue(); // (add (mul N, P), A) Op0 = DAG.getNode(ISD::ADD, DL, VT, Op0, AVal); Created.push_back(Op0.getNode()); } // Rotate right only if any divisor was even. We avoid rotates for all-odd // divisors as a performance improvement, since rotating by 0 is a no-op. if (HadEvenDivisor) { // We need ROTR to do this. 
if (!isOperationLegalOrCustom(ISD::ROTR, VT)) return SDValue(); SDNodeFlags Flags; Flags.setExact(true); // SREM: (rotr (add (mul N, P), A), K) Op0 = DAG.getNode(ISD::ROTR, DL, VT, Op0, KVal, Flags); Created.push_back(Op0.getNode()); } // SREM: (setule/setugt (rotr (add (mul N, P), A), K), Q) SDValue Fold = DAG.getSetCC(DL, SETCCVT, Op0, QVal, ((Cond == ISD::SETEQ) ? ISD::SETULE : ISD::SETUGT)); // If we didn't have lanes with INT_MIN divisor, then we're done. if (!HadIntMinDivisor) return Fold; // That fold is only valid for positive divisors. Which effectively means, // it is invalid for INT_MIN divisors. So if we have such a lane, // we must fix-up results for said lanes. assert(VT.isVector() && "Can/should only get here for vectors."); if (!isOperationLegalOrCustom(ISD::SETEQ, VT) || !isOperationLegalOrCustom(ISD::AND, VT) || !isOperationLegalOrCustom(Cond, VT) || !isOperationLegalOrCustom(ISD::VSELECT, VT)) return SDValue(); Created.push_back(Fold.getNode()); SDValue IntMin = DAG.getConstant( APInt::getSignedMinValue(SVT.getScalarSizeInBits()), DL, VT); SDValue IntMax = DAG.getConstant( APInt::getSignedMaxValue(SVT.getScalarSizeInBits()), DL, VT); SDValue Zero = DAG.getConstant(APInt::getNullValue(SVT.getScalarSizeInBits()), DL, VT); // Which lanes had INT_MIN divisors? Divisor is constant, so const-folded. SDValue DivisorIsIntMin = DAG.getSetCC(DL, SETCCVT, D, IntMin, ISD::SETEQ); Created.push_back(DivisorIsIntMin.getNode()); // (N s% INT_MIN) ==/!= 0 <--> (N & INT_MAX) ==/!= 0 SDValue Masked = DAG.getNode(ISD::AND, DL, VT, N, IntMax); Created.push_back(Masked.getNode()); SDValue MaskedIsZero = DAG.getSetCC(DL, SETCCVT, Masked, Zero, Cond); Created.push_back(MaskedIsZero.getNode()); // To produce final result we need to blend 2 vectors: 'SetCC' and // 'MaskedIsZero'. If the divisor for channel was *NOT* INT_MIN, we pick // from 'Fold', else pick from 'MaskedIsZero'. Since 'DivisorIsIntMin' is // constant-folded, select can get lowered to a shuffle with constant mask. SDValue Blended = DAG.getNode(ISD::VSELECT, DL, VT, DivisorIsIntMin, MaskedIsZero, Fold); return Blended; } bool TargetLowering:: verifyReturnAddressArgumentIsConstant(SDValue Op, SelectionDAG &DAG) const { if (!isa(Op.getOperand(0))) { DAG.getContext()->emitError("argument to '__builtin_return_address' must " "be a constant integer"); return true; } return false; } SDValue TargetLowering::getNegatedExpression(SDValue Op, SelectionDAG &DAG, bool LegalOps, bool OptForSize, NegatibleCost &Cost, unsigned Depth) const { // fneg is removable even if it has multiple uses. if (Op.getOpcode() == ISD::FNEG) { Cost = NegatibleCost::Cheaper; return Op.getOperand(0); } // Don't recurse exponentially. if (Depth > SelectionDAG::MaxRecursionDepth) return SDValue(); // Pre-increment recursion depth for use in recursive calls. ++Depth; const SDNodeFlags Flags = Op->getFlags(); const TargetOptions &Options = DAG.getTarget().Options; EVT VT = Op.getValueType(); unsigned Opcode = Op.getOpcode(); // Don't allow anything with multiple uses unless we know it is free. if (!Op.hasOneUse() && Opcode != ISD::ConstantFP) { bool IsFreeExtend = Opcode == ISD::FP_EXTEND && isFPExtFree(VT, Op.getOperand(0).getValueType()); if (!IsFreeExtend) return SDValue(); } auto RemoveDeadNode = [&](SDValue N) { if (N && N.getNode()->use_empty()) DAG.RemoveDeadNode(N.getNode()); }; SDLoc DL(Op); switch (Opcode) { case ISD::ConstantFP: { // Don't invert constant FP values after legalization unless the target says // the negated constant is legal. 
bool IsOpLegal = isOperationLegal(ISD::ConstantFP, VT) || isFPImmLegal(neg(cast(Op)->getValueAPF()), VT, OptForSize); if (LegalOps && !IsOpLegal) break; APFloat V = cast(Op)->getValueAPF(); V.changeSign(); SDValue CFP = DAG.getConstantFP(V, DL, VT); // If we already have the use of the negated floating constant, it is free // to negate it even it has multiple uses. - if (!Op.hasOneUse() && CFP.use_empty()) + if (!Op.hasOneUse() && CFP.use_empty()) { + RemoveDeadNode(CFP); break; + } Cost = NegatibleCost::Neutral; return CFP; } case ISD::BUILD_VECTOR: { // Only permit BUILD_VECTOR of constants. if (llvm::any_of(Op->op_values(), [&](SDValue N) { return !N.isUndef() && !isa(N); })) break; bool IsOpLegal = (isOperationLegal(ISD::ConstantFP, VT) && isOperationLegal(ISD::BUILD_VECTOR, VT)) || llvm::all_of(Op->op_values(), [&](SDValue N) { return N.isUndef() || isFPImmLegal(neg(cast(N)->getValueAPF()), VT, OptForSize); }); if (LegalOps && !IsOpLegal) break; SmallVector Ops; for (SDValue C : Op->op_values()) { if (C.isUndef()) { Ops.push_back(C); continue; } APFloat V = cast(C)->getValueAPF(); V.changeSign(); Ops.push_back(DAG.getConstantFP(V, DL, C.getValueType())); } Cost = NegatibleCost::Neutral; return DAG.getBuildVector(VT, DL, Ops); } case ISD::FADD: { if (!Options.NoSignedZerosFPMath && !Flags.hasNoSignedZeros()) break; // After operation legalization, it might not be legal to create new FSUBs. if (LegalOps && !isOperationLegalOrCustom(ISD::FSUB, VT)) break; SDValue X = Op.getOperand(0), Y = Op.getOperand(1); // fold (fneg (fadd X, Y)) -> (fsub (fneg X), Y) NegatibleCost CostX = NegatibleCost::Expensive; SDValue NegX = getNegatedExpression(X, DAG, LegalOps, OptForSize, CostX, Depth); // fold (fneg (fadd X, Y)) -> (fsub (fneg Y), X) NegatibleCost CostY = NegatibleCost::Expensive; SDValue NegY = getNegatedExpression(Y, DAG, LegalOps, OptForSize, CostY, Depth); // Negate the X if its cost is less or equal than Y. if (NegX && (CostX <= CostY)) { Cost = CostX; SDValue N = DAG.getNode(ISD::FSUB, DL, VT, NegX, Y, Flags); - RemoveDeadNode(NegY); + if (NegY != N) + RemoveDeadNode(NegY); return N; } // Negate the Y if it is not expensive. if (NegY) { Cost = CostY; SDValue N = DAG.getNode(ISD::FSUB, DL, VT, NegY, X, Flags); - RemoveDeadNode(NegX); + if (NegX != N) + RemoveDeadNode(NegX); return N; } break; } case ISD::FSUB: { // We can't turn -(A-B) into B-A when we honor signed zeros. if (!Options.NoSignedZerosFPMath && !Flags.hasNoSignedZeros()) break; SDValue X = Op.getOperand(0), Y = Op.getOperand(1); // fold (fneg (fsub 0, Y)) -> Y if (ConstantFPSDNode *C = isConstOrConstSplatFP(X, /*AllowUndefs*/ true)) if (C->isZero()) { Cost = NegatibleCost::Cheaper; return Y; } // fold (fneg (fsub X, Y)) -> (fsub Y, X) Cost = NegatibleCost::Neutral; return DAG.getNode(ISD::FSUB, DL, VT, Y, X, Flags); } case ISD::FMUL: case ISD::FDIV: { SDValue X = Op.getOperand(0), Y = Op.getOperand(1); // fold (fneg (fmul X, Y)) -> (fmul (fneg X), Y) NegatibleCost CostX = NegatibleCost::Expensive; SDValue NegX = getNegatedExpression(X, DAG, LegalOps, OptForSize, CostX, Depth); // fold (fneg (fmul X, Y)) -> (fmul X, (fneg Y)) NegatibleCost CostY = NegatibleCost::Expensive; SDValue NegY = getNegatedExpression(Y, DAG, LegalOps, OptForSize, CostY, Depth); // Negate the X if its cost is less or equal than Y. 
if (NegX && (CostX <= CostY)) { Cost = CostX; SDValue N = DAG.getNode(Opcode, DL, VT, NegX, Y, Flags); - RemoveDeadNode(NegY); + if (NegY != N) + RemoveDeadNode(NegY); return N; } // Ignore X * 2.0 because that is expected to be canonicalized to X + X. if (auto *C = isConstOrConstSplatFP(Op.getOperand(1))) if (C->isExactlyValue(2.0) && Op.getOpcode() == ISD::FMUL) break; // Negate the Y if it is not expensive. if (NegY) { Cost = CostY; SDValue N = DAG.getNode(Opcode, DL, VT, X, NegY, Flags); - RemoveDeadNode(NegX); + if (NegX != N) + RemoveDeadNode(NegX); return N; } break; } case ISD::FMA: case ISD::FMAD: { if (!Options.NoSignedZerosFPMath && !Flags.hasNoSignedZeros()) break; SDValue X = Op.getOperand(0), Y = Op.getOperand(1), Z = Op.getOperand(2); NegatibleCost CostZ = NegatibleCost::Expensive; SDValue NegZ = getNegatedExpression(Z, DAG, LegalOps, OptForSize, CostZ, Depth); // Give up if fail to negate the Z. if (!NegZ) break; // fold (fneg (fma X, Y, Z)) -> (fma (fneg X), Y, (fneg Z)) NegatibleCost CostX = NegatibleCost::Expensive; SDValue NegX = getNegatedExpression(X, DAG, LegalOps, OptForSize, CostX, Depth); // fold (fneg (fma X, Y, Z)) -> (fma X, (fneg Y), (fneg Z)) NegatibleCost CostY = NegatibleCost::Expensive; SDValue NegY = getNegatedExpression(Y, DAG, LegalOps, OptForSize, CostY, Depth); // Negate the X if its cost is less or equal than Y. if (NegX && (CostX <= CostY)) { Cost = std::min(CostX, CostZ); SDValue N = DAG.getNode(Opcode, DL, VT, NegX, Y, NegZ, Flags); - RemoveDeadNode(NegY); + if (NegY != N) + RemoveDeadNode(NegY); return N; } // Negate the Y if it is not expensive. if (NegY) { Cost = std::min(CostY, CostZ); SDValue N = DAG.getNode(Opcode, DL, VT, X, NegY, NegZ, Flags); - RemoveDeadNode(NegX); + if (NegX != N) + RemoveDeadNode(NegX); return N; } break; } case ISD::FP_EXTEND: case ISD::FSIN: if (SDValue NegV = getNegatedExpression(Op.getOperand(0), DAG, LegalOps, OptForSize, Cost, Depth)) return DAG.getNode(Opcode, DL, VT, NegV); break; case ISD::FP_ROUND: if (SDValue NegV = getNegatedExpression(Op.getOperand(0), DAG, LegalOps, OptForSize, Cost, Depth)) return DAG.getNode(ISD::FP_ROUND, DL, VT, NegV, Op.getOperand(1)); break; } return SDValue(); } //===----------------------------------------------------------------------===// // Legalization Utilities //===----------------------------------------------------------------------===// bool TargetLowering::expandMUL_LOHI(unsigned Opcode, EVT VT, SDLoc dl, SDValue LHS, SDValue RHS, SmallVectorImpl &Result, EVT HiLoVT, SelectionDAG &DAG, MulExpansionKind Kind, SDValue LL, SDValue LH, SDValue RL, SDValue RH) const { assert(Opcode == ISD::MUL || Opcode == ISD::UMUL_LOHI || Opcode == ISD::SMUL_LOHI); bool HasMULHS = (Kind == MulExpansionKind::Always) || isOperationLegalOrCustom(ISD::MULHS, HiLoVT); bool HasMULHU = (Kind == MulExpansionKind::Always) || isOperationLegalOrCustom(ISD::MULHU, HiLoVT); bool HasSMUL_LOHI = (Kind == MulExpansionKind::Always) || isOperationLegalOrCustom(ISD::SMUL_LOHI, HiLoVT); bool HasUMUL_LOHI = (Kind == MulExpansionKind::Always) || isOperationLegalOrCustom(ISD::UMUL_LOHI, HiLoVT); if (!HasMULHU && !HasMULHS && !HasUMUL_LOHI && !HasSMUL_LOHI) return false; unsigned OuterBitSize = VT.getScalarSizeInBits(); unsigned InnerBitSize = HiLoVT.getScalarSizeInBits(); unsigned LHSSB = DAG.ComputeNumSignBits(LHS); unsigned RHSSB = DAG.ComputeNumSignBits(RHS); // LL, LH, RL, and RH must be either all NULL or all set to a value. 
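  // With LL/LH and RL/RH denoting the low/high halves of LHS and RHS in
  // HiLoVT (n = InnerBitSize bits each), the expansion below follows the
  // schoolbook identity
  //   LHS * RHS = LL*RL + (LL*RH + LH*RL) * 2^n + LH*RH * 2^2n,
  // where the last term only contributes to the upper half produced for the
  // *MUL_LOHI opcodes.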
assert((LL.getNode() && LH.getNode() && RL.getNode() && RH.getNode()) || (!LL.getNode() && !LH.getNode() && !RL.getNode() && !RH.getNode())); SDVTList VTs = DAG.getVTList(HiLoVT, HiLoVT); auto MakeMUL_LOHI = [&](SDValue L, SDValue R, SDValue &Lo, SDValue &Hi, bool Signed) -> bool { if ((Signed && HasSMUL_LOHI) || (!Signed && HasUMUL_LOHI)) { Lo = DAG.getNode(Signed ? ISD::SMUL_LOHI : ISD::UMUL_LOHI, dl, VTs, L, R); Hi = SDValue(Lo.getNode(), 1); return true; } if ((Signed && HasMULHS) || (!Signed && HasMULHU)) { Lo = DAG.getNode(ISD::MUL, dl, HiLoVT, L, R); Hi = DAG.getNode(Signed ? ISD::MULHS : ISD::MULHU, dl, HiLoVT, L, R); return true; } return false; }; SDValue Lo, Hi; if (!LL.getNode() && !RL.getNode() && isOperationLegalOrCustom(ISD::TRUNCATE, HiLoVT)) { LL = DAG.getNode(ISD::TRUNCATE, dl, HiLoVT, LHS); RL = DAG.getNode(ISD::TRUNCATE, dl, HiLoVT, RHS); } if (!LL.getNode()) return false; APInt HighMask = APInt::getHighBitsSet(OuterBitSize, InnerBitSize); if (DAG.MaskedValueIsZero(LHS, HighMask) && DAG.MaskedValueIsZero(RHS, HighMask)) { // The inputs are both zero-extended. if (MakeMUL_LOHI(LL, RL, Lo, Hi, false)) { Result.push_back(Lo); Result.push_back(Hi); if (Opcode != ISD::MUL) { SDValue Zero = DAG.getConstant(0, dl, HiLoVT); Result.push_back(Zero); Result.push_back(Zero); } return true; } } if (!VT.isVector() && Opcode == ISD::MUL && LHSSB > InnerBitSize && RHSSB > InnerBitSize) { // The input values are both sign-extended. // TODO non-MUL case? if (MakeMUL_LOHI(LL, RL, Lo, Hi, true)) { Result.push_back(Lo); Result.push_back(Hi); return true; } } unsigned ShiftAmount = OuterBitSize - InnerBitSize; EVT ShiftAmountTy = getShiftAmountTy(VT, DAG.getDataLayout()); if (APInt::getMaxValue(ShiftAmountTy.getSizeInBits()).ult(ShiftAmount)) { // FIXME getShiftAmountTy does not always return a sensible result when VT // is an illegal type, and so the type may be too small to fit the shift // amount. Override it with i32. The shift will have to be legalized. ShiftAmountTy = MVT::i32; } SDValue Shift = DAG.getConstant(ShiftAmount, dl, ShiftAmountTy); if (!LH.getNode() && !RH.getNode() && isOperationLegalOrCustom(ISD::SRL, VT) && isOperationLegalOrCustom(ISD::TRUNCATE, HiLoVT)) { LH = DAG.getNode(ISD::SRL, dl, VT, LHS, Shift); LH = DAG.getNode(ISD::TRUNCATE, dl, HiLoVT, LH); RH = DAG.getNode(ISD::SRL, dl, VT, RHS, Shift); RH = DAG.getNode(ISD::TRUNCATE, dl, HiLoVT, RH); } if (!LH.getNode()) return false; if (!MakeMUL_LOHI(LL, RL, Lo, Hi, false)) return false; Result.push_back(Lo); if (Opcode == ISD::MUL) { RH = DAG.getNode(ISD::MUL, dl, HiLoVT, LL, RH); LH = DAG.getNode(ISD::MUL, dl, HiLoVT, LH, RL); Hi = DAG.getNode(ISD::ADD, dl, HiLoVT, Hi, RH); Hi = DAG.getNode(ISD::ADD, dl, HiLoVT, Hi, LH); Result.push_back(Hi); return true; } // Compute the full width result. auto Merge = [&](SDValue Lo, SDValue Hi) -> SDValue { Lo = DAG.getNode(ISD::ZERO_EXTEND, dl, VT, Lo); Hi = DAG.getNode(ISD::ZERO_EXTEND, dl, VT, Hi); Hi = DAG.getNode(ISD::SHL, dl, VT, Hi, Shift); return DAG.getNode(ISD::OR, dl, VT, Lo, Hi); }; SDValue Next = DAG.getNode(ISD::ZERO_EXTEND, dl, VT, Hi); if (!MakeMUL_LOHI(LL, RH, Lo, Hi, false)) return false; // This is effectively the add part of a multiply-add of half-sized operands, // so it cannot overflow. 
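  // Rough bound on why this add cannot overflow, assuming n = InnerBitSize:
  // the high half of LL*RL is at most 2^n - 2, and LL*RH is at most
  // (2^n - 1)^2 = 2^(2n) - 2^(n+1) + 1, so their sum is at most
  // 2^(2n) - 2^n - 1, which still fits in the 2n-bit type VT.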
Next = DAG.getNode(ISD::ADD, dl, VT, Next, Merge(Lo, Hi)); if (!MakeMUL_LOHI(LH, RL, Lo, Hi, false)) return false; SDValue Zero = DAG.getConstant(0, dl, HiLoVT); EVT BoolType = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT); bool UseGlue = (isOperationLegalOrCustom(ISD::ADDC, VT) && isOperationLegalOrCustom(ISD::ADDE, VT)); if (UseGlue) Next = DAG.getNode(ISD::ADDC, dl, DAG.getVTList(VT, MVT::Glue), Next, Merge(Lo, Hi)); else Next = DAG.getNode(ISD::ADDCARRY, dl, DAG.getVTList(VT, BoolType), Next, Merge(Lo, Hi), DAG.getConstant(0, dl, BoolType)); SDValue Carry = Next.getValue(1); Result.push_back(DAG.getNode(ISD::TRUNCATE, dl, HiLoVT, Next)); Next = DAG.getNode(ISD::SRL, dl, VT, Next, Shift); if (!MakeMUL_LOHI(LH, RH, Lo, Hi, Opcode == ISD::SMUL_LOHI)) return false; if (UseGlue) Hi = DAG.getNode(ISD::ADDE, dl, DAG.getVTList(HiLoVT, MVT::Glue), Hi, Zero, Carry); else Hi = DAG.getNode(ISD::ADDCARRY, dl, DAG.getVTList(HiLoVT, BoolType), Hi, Zero, Carry); Next = DAG.getNode(ISD::ADD, dl, VT, Next, Merge(Lo, Hi)); if (Opcode == ISD::SMUL_LOHI) { SDValue NextSub = DAG.getNode(ISD::SUB, dl, VT, Next, DAG.getNode(ISD::ZERO_EXTEND, dl, VT, RL)); Next = DAG.getSelectCC(dl, LH, Zero, NextSub, Next, ISD::SETLT); NextSub = DAG.getNode(ISD::SUB, dl, VT, Next, DAG.getNode(ISD::ZERO_EXTEND, dl, VT, LL)); Next = DAG.getSelectCC(dl, RH, Zero, NextSub, Next, ISD::SETLT); } Result.push_back(DAG.getNode(ISD::TRUNCATE, dl, HiLoVT, Next)); Next = DAG.getNode(ISD::SRL, dl, VT, Next, Shift); Result.push_back(DAG.getNode(ISD::TRUNCATE, dl, HiLoVT, Next)); return true; } bool TargetLowering::expandMUL(SDNode *N, SDValue &Lo, SDValue &Hi, EVT HiLoVT, SelectionDAG &DAG, MulExpansionKind Kind, SDValue LL, SDValue LH, SDValue RL, SDValue RH) const { SmallVector Result; bool Ok = expandMUL_LOHI(N->getOpcode(), N->getValueType(0), N, N->getOperand(0), N->getOperand(1), Result, HiLoVT, DAG, Kind, LL, LH, RL, RH); if (Ok) { assert(Result.size() == 2); Lo = Result[0]; Hi = Result[1]; } return Ok; } // Check that (every element of) Z is undef or not an exact multiple of BW. static bool isNonZeroModBitWidth(SDValue Z, unsigned BW) { return ISD::matchUnaryPredicate( Z, [=](ConstantSDNode *C) { return !C || C->getAPIntValue().urem(BW) != 0; }, true); } bool TargetLowering::expandFunnelShift(SDNode *Node, SDValue &Result, SelectionDAG &DAG) const { EVT VT = Node->getValueType(0); if (VT.isVector() && (!isOperationLegalOrCustom(ISD::SHL, VT) || !isOperationLegalOrCustom(ISD::SRL, VT) || !isOperationLegalOrCustom(ISD::SUB, VT) || !isOperationLegalOrCustomOrPromote(ISD::OR, VT))) return false; SDValue X = Node->getOperand(0); SDValue Y = Node->getOperand(1); SDValue Z = Node->getOperand(2); unsigned BW = VT.getScalarSizeInBits(); bool IsFSHL = Node->getOpcode() == ISD::FSHL; SDLoc DL(SDValue(Node, 0)); EVT ShVT = Z.getValueType(); SDValue ShX, ShY; SDValue ShAmt, InvShAmt; if (isNonZeroModBitWidth(Z, BW)) { // fshl: X << C | Y >> (BW - C) // fshr: X << (BW - C) | Y >> C // where C = Z % BW is not zero SDValue BitWidthC = DAG.getConstant(BW, DL, ShVT); ShAmt = DAG.getNode(ISD::UREM, DL, ShVT, Z, BitWidthC); InvShAmt = DAG.getNode(ISD::SUB, DL, ShVT, BitWidthC, ShAmt); ShX = DAG.getNode(ISD::SHL, DL, VT, X, IsFSHL ? ShAmt : InvShAmt); ShY = DAG.getNode(ISD::SRL, DL, VT, Y, IsFSHL ? 
InvShAmt : ShAmt); } else { // fshl: X << (Z % BW) | Y >> 1 >> (BW - 1 - (Z % BW)) // fshr: X << 1 << (BW - 1 - (Z % BW)) | Y >> (Z % BW) SDValue Mask = DAG.getConstant(BW - 1, DL, ShVT); if (isPowerOf2_32(BW)) { // Z % BW -> Z & (BW - 1) ShAmt = DAG.getNode(ISD::AND, DL, ShVT, Z, Mask); // (BW - 1) - (Z % BW) -> ~Z & (BW - 1) InvShAmt = DAG.getNode(ISD::AND, DL, ShVT, DAG.getNOT(DL, Z, ShVT), Mask); } else { SDValue BitWidthC = DAG.getConstant(BW, DL, ShVT); ShAmt = DAG.getNode(ISD::UREM, DL, ShVT, Z, BitWidthC); InvShAmt = DAG.getNode(ISD::SUB, DL, ShVT, Mask, ShAmt); } SDValue One = DAG.getConstant(1, DL, ShVT); if (IsFSHL) { ShX = DAG.getNode(ISD::SHL, DL, VT, X, ShAmt); SDValue ShY1 = DAG.getNode(ISD::SRL, DL, VT, Y, One); ShY = DAG.getNode(ISD::SRL, DL, VT, ShY1, InvShAmt); } else { SDValue ShX1 = DAG.getNode(ISD::SHL, DL, VT, X, One); ShX = DAG.getNode(ISD::SHL, DL, VT, ShX1, InvShAmt); ShY = DAG.getNode(ISD::SRL, DL, VT, Y, ShAmt); } } Result = DAG.getNode(ISD::OR, DL, VT, ShX, ShY); return true; } // TODO: Merge with expandFunnelShift. bool TargetLowering::expandROT(SDNode *Node, SDValue &Result, SelectionDAG &DAG) const { EVT VT = Node->getValueType(0); unsigned EltSizeInBits = VT.getScalarSizeInBits(); bool IsLeft = Node->getOpcode() == ISD::ROTL; SDValue Op0 = Node->getOperand(0); SDValue Op1 = Node->getOperand(1); SDLoc DL(SDValue(Node, 0)); EVT ShVT = Op1.getValueType(); SDValue Zero = DAG.getConstant(0, DL, ShVT); assert(isPowerOf2_32(EltSizeInBits) && EltSizeInBits > 1 && "Expecting the type bitwidth to be a power of 2"); // If a rotate in the other direction is supported, use it. unsigned RevRot = IsLeft ? ISD::ROTR : ISD::ROTL; if (isOperationLegalOrCustom(RevRot, VT)) { SDValue Sub = DAG.getNode(ISD::SUB, DL, ShVT, Zero, Op1); Result = DAG.getNode(RevRot, DL, VT, Op0, Sub); return true; } if (VT.isVector() && (!isOperationLegalOrCustom(ISD::SHL, VT) || !isOperationLegalOrCustom(ISD::SRL, VT) || !isOperationLegalOrCustom(ISD::SUB, VT) || !isOperationLegalOrCustomOrPromote(ISD::OR, VT) || !isOperationLegalOrCustomOrPromote(ISD::AND, VT))) return false; // Otherwise, // (rotl x, c) -> (or (shl x, (and c, w-1)), (srl x, (and -c, w-1))) // (rotr x, c) -> (or (srl x, (and c, w-1)), (shl x, (and -c, w-1))) // unsigned ShOpc = IsLeft ? ISD::SHL : ISD::SRL; unsigned HsOpc = IsLeft ? ISD::SRL : ISD::SHL; SDValue BitWidthMinusOneC = DAG.getConstant(EltSizeInBits - 1, DL, ShVT); SDValue NegOp1 = DAG.getNode(ISD::SUB, DL, ShVT, Zero, Op1); SDValue And0 = DAG.getNode(ISD::AND, DL, ShVT, Op1, BitWidthMinusOneC); SDValue And1 = DAG.getNode(ISD::AND, DL, ShVT, NegOp1, BitWidthMinusOneC); Result = DAG.getNode(ISD::OR, DL, VT, DAG.getNode(ShOpc, DL, VT, Op0, And0), DAG.getNode(HsOpc, DL, VT, Op0, And1)); return true; } bool TargetLowering::expandFP_TO_SINT(SDNode *Node, SDValue &Result, SelectionDAG &DAG) const { unsigned OpNo = Node->isStrictFPOpcode() ? 1 : 0; SDValue Src = Node->getOperand(OpNo); EVT SrcVT = Src.getValueType(); EVT DstVT = Node->getValueType(0); SDLoc dl(SDValue(Node, 0)); // FIXME: Only f32 to i64 conversions are supported. if (SrcVT != MVT::f32 || DstVT != MVT::i64) return false; if (Node->isStrictFPOpcode()) // When a NaN is converted to an integer a trap is allowed. We can't // use this expansion here because it would eliminate that trap. Other // traps are also allowed and cannot be eliminated. See // IEEE 754-2008 sec 5.8. 
return false; // Expand f32 -> i64 conversion // This algorithm comes from compiler-rt's implementation of fixsfdi: // https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/builtins/fixsfdi.c unsigned SrcEltBits = SrcVT.getScalarSizeInBits(); EVT IntVT = SrcVT.changeTypeToInteger(); EVT IntShVT = getShiftAmountTy(IntVT, DAG.getDataLayout()); SDValue ExponentMask = DAG.getConstant(0x7F800000, dl, IntVT); SDValue ExponentLoBit = DAG.getConstant(23, dl, IntVT); SDValue Bias = DAG.getConstant(127, dl, IntVT); SDValue SignMask = DAG.getConstant(APInt::getSignMask(SrcEltBits), dl, IntVT); SDValue SignLowBit = DAG.getConstant(SrcEltBits - 1, dl, IntVT); SDValue MantissaMask = DAG.getConstant(0x007FFFFF, dl, IntVT); SDValue Bits = DAG.getNode(ISD::BITCAST, dl, IntVT, Src); SDValue ExponentBits = DAG.getNode( ISD::SRL, dl, IntVT, DAG.getNode(ISD::AND, dl, IntVT, Bits, ExponentMask), DAG.getZExtOrTrunc(ExponentLoBit, dl, IntShVT)); SDValue Exponent = DAG.getNode(ISD::SUB, dl, IntVT, ExponentBits, Bias); SDValue Sign = DAG.getNode(ISD::SRA, dl, IntVT, DAG.getNode(ISD::AND, dl, IntVT, Bits, SignMask), DAG.getZExtOrTrunc(SignLowBit, dl, IntShVT)); Sign = DAG.getSExtOrTrunc(Sign, dl, DstVT); SDValue R = DAG.getNode(ISD::OR, dl, IntVT, DAG.getNode(ISD::AND, dl, IntVT, Bits, MantissaMask), DAG.getConstant(0x00800000, dl, IntVT)); R = DAG.getZExtOrTrunc(R, dl, DstVT); R = DAG.getSelectCC( dl, Exponent, ExponentLoBit, DAG.getNode(ISD::SHL, dl, DstVT, R, DAG.getZExtOrTrunc( DAG.getNode(ISD::SUB, dl, IntVT, Exponent, ExponentLoBit), dl, IntShVT)), DAG.getNode(ISD::SRL, dl, DstVT, R, DAG.getZExtOrTrunc( DAG.getNode(ISD::SUB, dl, IntVT, ExponentLoBit, Exponent), dl, IntShVT)), ISD::SETGT); SDValue Ret = DAG.getNode(ISD::SUB, dl, DstVT, DAG.getNode(ISD::XOR, dl, DstVT, R, Sign), Sign); Result = DAG.getSelectCC(dl, Exponent, DAG.getConstant(0, dl, IntVT), DAG.getConstant(0, dl, DstVT), Ret, ISD::SETLT); return true; } bool TargetLowering::expandFP_TO_UINT(SDNode *Node, SDValue &Result, SDValue &Chain, SelectionDAG &DAG) const { SDLoc dl(SDValue(Node, 0)); unsigned OpNo = Node->isStrictFPOpcode() ? 1 : 0; SDValue Src = Node->getOperand(OpNo); EVT SrcVT = Src.getValueType(); EVT DstVT = Node->getValueType(0); EVT SetCCVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), SrcVT); EVT DstSetCCVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), DstVT); // Only expand vector types if we have the appropriate vector bit operations. unsigned SIntOpcode = Node->isStrictFPOpcode() ? ISD::STRICT_FP_TO_SINT : ISD::FP_TO_SINT; if (DstVT.isVector() && (!isOperationLegalOrCustom(SIntOpcode, DstVT) || !isOperationLegalOrCustomOrPromote(ISD::XOR, SrcVT))) return false; // If the maximum float value is smaller then the signed integer range, // the destination signmask can't be represented by the float, so we can // just use FP_TO_SINT directly. 
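  // Illustrative example of this shortcut (not exhaustive): for an f16 source
  // and an i32 destination, the largest finite f16 (65504) is far below 2^31,
  // so converting the i32 sign mask to f16 overflows and every finite input
  // already fits in the signed range; FP_TO_SINT alone suffices. For an f32
  // source and an i64 destination, 2^63 is representable in f32, so the
  // offsetting path further below is used instead.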
const fltSemantics &APFSem = DAG.EVTToAPFloatSemantics(SrcVT); APFloat APF(APFSem, APInt::getNullValue(SrcVT.getScalarSizeInBits())); APInt SignMask = APInt::getSignMask(DstVT.getScalarSizeInBits()); if (APFloat::opOverflow & APF.convertFromAPInt(SignMask, false, APFloat::rmNearestTiesToEven)) { if (Node->isStrictFPOpcode()) { Result = DAG.getNode(ISD::STRICT_FP_TO_SINT, dl, { DstVT, MVT::Other }, { Node->getOperand(0), Src }); Chain = Result.getValue(1); } else Result = DAG.getNode(ISD::FP_TO_SINT, dl, DstVT, Src); return true; } SDValue Cst = DAG.getConstantFP(APF, dl, SrcVT); SDValue Sel; if (Node->isStrictFPOpcode()) { Sel = DAG.getSetCC(dl, SetCCVT, Src, Cst, ISD::SETLT, Node->getOperand(0), /*IsSignaling*/ true); Chain = Sel.getValue(1); } else { Sel = DAG.getSetCC(dl, SetCCVT, Src, Cst, ISD::SETLT); } bool Strict = Node->isStrictFPOpcode() || shouldUseStrictFP_TO_INT(SrcVT, DstVT, /*IsSigned*/ false); if (Strict) { // Expand based on maximum range of FP_TO_SINT, if the value exceeds the // signmask then offset (the result of which should be fully representable). // Sel = Src < 0x8000000000000000 // FltOfs = select Sel, 0, 0x8000000000000000 // IntOfs = select Sel, 0, 0x8000000000000000 // Result = fp_to_sint(Src - FltOfs) ^ IntOfs // TODO: Should any fast-math-flags be set for the FSUB? SDValue FltOfs = DAG.getSelect(dl, SrcVT, Sel, DAG.getConstantFP(0.0, dl, SrcVT), Cst); Sel = DAG.getBoolExtOrTrunc(Sel, dl, DstSetCCVT, DstVT); SDValue IntOfs = DAG.getSelect(dl, DstVT, Sel, DAG.getConstant(0, dl, DstVT), DAG.getConstant(SignMask, dl, DstVT)); SDValue SInt; if (Node->isStrictFPOpcode()) { SDValue Val = DAG.getNode(ISD::STRICT_FSUB, dl, { SrcVT, MVT::Other }, { Chain, Src, FltOfs }); SInt = DAG.getNode(ISD::STRICT_FP_TO_SINT, dl, { DstVT, MVT::Other }, { Val.getValue(1), Val }); Chain = SInt.getValue(1); } else { SDValue Val = DAG.getNode(ISD::FSUB, dl, SrcVT, Src, FltOfs); SInt = DAG.getNode(ISD::FP_TO_SINT, dl, DstVT, Val); } Result = DAG.getNode(ISD::XOR, dl, DstVT, SInt, IntOfs); } else { // Expand based on maximum range of FP_TO_SINT: // True = fp_to_sint(Src) // False = 0x8000000000000000 + fp_to_sint(Src - 0x8000000000000000) // Result = select (Src < 0x8000000000000000), True, False SDValue True = DAG.getNode(ISD::FP_TO_SINT, dl, DstVT, Src); // TODO: Should any fast-math-flags be set for the FSUB? SDValue False = DAG.getNode(ISD::FP_TO_SINT, dl, DstVT, DAG.getNode(ISD::FSUB, dl, SrcVT, Src, Cst)); False = DAG.getNode(ISD::XOR, dl, DstVT, False, DAG.getConstant(SignMask, dl, DstVT)); Sel = DAG.getBoolExtOrTrunc(Sel, dl, DstSetCCVT, DstVT); Result = DAG.getSelect(dl, DstVT, Sel, True, False); } return true; } bool TargetLowering::expandUINT_TO_FP(SDNode *Node, SDValue &Result, SDValue &Chain, SelectionDAG &DAG) const { unsigned OpNo = Node->isStrictFPOpcode() ? 1 : 0; SDValue Src = Node->getOperand(OpNo); EVT SrcVT = Src.getValueType(); EVT DstVT = Node->getValueType(0); if (SrcVT.getScalarType() != MVT::i64 || DstVT.getScalarType() != MVT::f64) return false; // Only expand vector types if we have the appropriate vector bit operations. 
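  // Sketch of the magic constants used below, assuming IEEE binary64: the bit
  // pattern 0x4330000000000000 is the double 2^52, 0x4530000000000000 is
  // 2^84, and 0x4530000000100000 is 2^84 + 2^52. OR'ing the low 32 bits of
  // the input into the mantissa of 2^52 yields exactly 2^52 + Lo, and OR'ing
  // the high 32 bits into the mantissa of 2^84 yields exactly 2^84 + Hi*2^32,
  // so (HiFlt - (2^84 + 2^52)) + LoFlt == Hi*2^32 + Lo with a single rounding.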
if (SrcVT.isVector() && (!isOperationLegalOrCustom(ISD::SRL, SrcVT) || !isOperationLegalOrCustom(ISD::FADD, DstVT) || !isOperationLegalOrCustom(ISD::FSUB, DstVT) || !isOperationLegalOrCustomOrPromote(ISD::OR, SrcVT) || !isOperationLegalOrCustomOrPromote(ISD::AND, SrcVT))) return false; SDLoc dl(SDValue(Node, 0)); EVT ShiftVT = getShiftAmountTy(SrcVT, DAG.getDataLayout()); // Implementation of unsigned i64 to f64 following the algorithm in // __floatundidf in compiler_rt. This implementation has the advantage // of performing rounding correctly, both in the default rounding mode // and in all alternate rounding modes. SDValue TwoP52 = DAG.getConstant(UINT64_C(0x4330000000000000), dl, SrcVT); SDValue TwoP84PlusTwoP52 = DAG.getConstantFP( BitsToDouble(UINT64_C(0x4530000000100000)), dl, DstVT); SDValue TwoP84 = DAG.getConstant(UINT64_C(0x4530000000000000), dl, SrcVT); SDValue LoMask = DAG.getConstant(UINT64_C(0x00000000FFFFFFFF), dl, SrcVT); SDValue HiShift = DAG.getConstant(32, dl, ShiftVT); SDValue Lo = DAG.getNode(ISD::AND, dl, SrcVT, Src, LoMask); SDValue Hi = DAG.getNode(ISD::SRL, dl, SrcVT, Src, HiShift); SDValue LoOr = DAG.getNode(ISD::OR, dl, SrcVT, Lo, TwoP52); SDValue HiOr = DAG.getNode(ISD::OR, dl, SrcVT, Hi, TwoP84); SDValue LoFlt = DAG.getBitcast(DstVT, LoOr); SDValue HiFlt = DAG.getBitcast(DstVT, HiOr); if (Node->isStrictFPOpcode()) { SDValue HiSub = DAG.getNode(ISD::STRICT_FSUB, dl, {DstVT, MVT::Other}, {Node->getOperand(0), HiFlt, TwoP84PlusTwoP52}); Result = DAG.getNode(ISD::STRICT_FADD, dl, {DstVT, MVT::Other}, {HiSub.getValue(1), LoFlt, HiSub}); Chain = Result.getValue(1); } else { SDValue HiSub = DAG.getNode(ISD::FSUB, dl, DstVT, HiFlt, TwoP84PlusTwoP52); Result = DAG.getNode(ISD::FADD, dl, DstVT, LoFlt, HiSub); } return true; } SDValue TargetLowering::expandFMINNUM_FMAXNUM(SDNode *Node, SelectionDAG &DAG) const { SDLoc dl(Node); unsigned NewOp = Node->getOpcode() == ISD::FMINNUM ? ISD::FMINNUM_IEEE : ISD::FMAXNUM_IEEE; EVT VT = Node->getValueType(0); if (isOperationLegalOrCustom(NewOp, VT)) { SDValue Quiet0 = Node->getOperand(0); SDValue Quiet1 = Node->getOperand(1); if (!Node->getFlags().hasNoNaNs()) { // Insert canonicalizes if it's possible we need to quiet to get correct // sNaN behavior. if (!DAG.isKnownNeverSNaN(Quiet0)) { Quiet0 = DAG.getNode(ISD::FCANONICALIZE, dl, VT, Quiet0, Node->getFlags()); } if (!DAG.isKnownNeverSNaN(Quiet1)) { Quiet1 = DAG.getNode(ISD::FCANONICALIZE, dl, VT, Quiet1, Node->getFlags()); } } return DAG.getNode(NewOp, dl, VT, Quiet0, Quiet1, Node->getFlags()); } // If the target has FMINIMUM/FMAXIMUM but not FMINNUM/FMAXNUM use that // instead if there are no NaNs. if (Node->getFlags().hasNoNaNs()) { unsigned IEEE2018Op = Node->getOpcode() == ISD::FMINNUM ? ISD::FMINIMUM : ISD::FMAXIMUM; if (isOperationLegalOrCustom(IEEE2018Op, VT)) { return DAG.getNode(IEEE2018Op, dl, VT, Node->getOperand(0), Node->getOperand(1), Node->getFlags()); } } // If none of the above worked, but there are no NaNs, then expand to // a compare/select sequence. This is required for correctness since // InstCombine might have canonicalized a fcmp+select sequence to a // FMINNUM/FMAXNUM node. If we were to fall through to the default // expansion to libcall, we might introduce a link-time dependency // on libm into a file that originally did not have one. if (Node->getFlags().hasNoNaNs()) { ISD::CondCode Pred = Node->getOpcode() == ISD::FMINNUM ? 
ISD::SETLT : ISD::SETGT; SDValue Op1 = Node->getOperand(0); SDValue Op2 = Node->getOperand(1); SDValue SelCC = DAG.getSelectCC(dl, Op1, Op2, Op1, Op2, Pred); // Copy FMF flags, but always set the no-signed-zeros flag // as this is implied by the FMINNUM/FMAXNUM semantics. SDNodeFlags Flags = Node->getFlags(); Flags.setNoSignedZeros(true); SelCC->setFlags(Flags); return SelCC; } return SDValue(); } bool TargetLowering::expandCTPOP(SDNode *Node, SDValue &Result, SelectionDAG &DAG) const { SDLoc dl(Node); EVT VT = Node->getValueType(0); EVT ShVT = getShiftAmountTy(VT, DAG.getDataLayout()); SDValue Op = Node->getOperand(0); unsigned Len = VT.getScalarSizeInBits(); assert(VT.isInteger() && "CTPOP not implemented for this type."); // TODO: Add support for irregular type lengths. if (!(Len <= 128 && Len % 8 == 0)) return false; // Only expand vector types if we have the appropriate vector bit operations. if (VT.isVector() && (!isOperationLegalOrCustom(ISD::ADD, VT) || !isOperationLegalOrCustom(ISD::SUB, VT) || !isOperationLegalOrCustom(ISD::SRL, VT) || (Len != 8 && !isOperationLegalOrCustom(ISD::MUL, VT)) || !isOperationLegalOrCustomOrPromote(ISD::AND, VT))) return false; // This is the "best" algorithm from // http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel SDValue Mask55 = DAG.getConstant(APInt::getSplat(Len, APInt(8, 0x55)), dl, VT); SDValue Mask33 = DAG.getConstant(APInt::getSplat(Len, APInt(8, 0x33)), dl, VT); SDValue Mask0F = DAG.getConstant(APInt::getSplat(Len, APInt(8, 0x0F)), dl, VT); SDValue Mask01 = DAG.getConstant(APInt::getSplat(Len, APInt(8, 0x01)), dl, VT); // v = v - ((v >> 1) & 0x55555555...) Op = DAG.getNode(ISD::SUB, dl, VT, Op, DAG.getNode(ISD::AND, dl, VT, DAG.getNode(ISD::SRL, dl, VT, Op, DAG.getConstant(1, dl, ShVT)), Mask55)); // v = (v & 0x33333333...) + ((v >> 2) & 0x33333333...) Op = DAG.getNode(ISD::ADD, dl, VT, DAG.getNode(ISD::AND, dl, VT, Op, Mask33), DAG.getNode(ISD::AND, dl, VT, DAG.getNode(ISD::SRL, dl, VT, Op, DAG.getConstant(2, dl, ShVT)), Mask33)); // v = (v + (v >> 4)) & 0x0F0F0F0F... Op = DAG.getNode(ISD::AND, dl, VT, DAG.getNode(ISD::ADD, dl, VT, Op, DAG.getNode(ISD::SRL, dl, VT, Op, DAG.getConstant(4, dl, ShVT))), Mask0F); // v = (v * 0x01010101...) >> (Len - 8) if (Len > 8) Op = DAG.getNode(ISD::SRL, dl, VT, DAG.getNode(ISD::MUL, dl, VT, Op, Mask01), DAG.getConstant(Len - 8, dl, ShVT)); Result = Op; return true; } bool TargetLowering::expandCTLZ(SDNode *Node, SDValue &Result, SelectionDAG &DAG) const { SDLoc dl(Node); EVT VT = Node->getValueType(0); EVT ShVT = getShiftAmountTy(VT, DAG.getDataLayout()); SDValue Op = Node->getOperand(0); unsigned NumBitsPerElt = VT.getScalarSizeInBits(); // If the non-ZERO_UNDEF version is supported we can use that instead. if (Node->getOpcode() == ISD::CTLZ_ZERO_UNDEF && isOperationLegalOrCustom(ISD::CTLZ, VT)) { Result = DAG.getNode(ISD::CTLZ, dl, VT, Op); return true; } // If the ZERO_UNDEF version is supported use that and handle the zero case. if (isOperationLegalOrCustom(ISD::CTLZ_ZERO_UNDEF, VT)) { EVT SetCCVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT); SDValue CTLZ = DAG.getNode(ISD::CTLZ_ZERO_UNDEF, dl, VT, Op); SDValue Zero = DAG.getConstant(0, dl, VT); SDValue SrcIsZero = DAG.getSetCC(dl, SetCCVT, Op, Zero, ISD::SETEQ); Result = DAG.getNode(ISD::SELECT, dl, VT, SrcIsZero, DAG.getConstant(NumBitsPerElt, dl, VT), CTLZ); return true; } // Only expand vector types if we have the appropriate vector bit operations. 
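  // Worked example of the expansion below, assuming an 8-bit element and
  // x = 0b00010110: the shift/or sequence smears the leading one to give
  // 0b00011111, the complement is 0b11100000, and its popcount is 3, which
  // is indeed ctlz(x).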
if (VT.isVector() && (!isPowerOf2_32(NumBitsPerElt) || !isOperationLegalOrCustom(ISD::CTPOP, VT) || !isOperationLegalOrCustom(ISD::SRL, VT) || !isOperationLegalOrCustomOrPromote(ISD::OR, VT))) return false; // for now, we do this: // x = x | (x >> 1); // x = x | (x >> 2); // ... // x = x | (x >>16); // x = x | (x >>32); // for 64-bit input // return popcount(~x); // // Ref: "Hacker's Delight" by Henry Warren for (unsigned i = 0; (1U << i) <= (NumBitsPerElt / 2); ++i) { SDValue Tmp = DAG.getConstant(1ULL << i, dl, ShVT); Op = DAG.getNode(ISD::OR, dl, VT, Op, DAG.getNode(ISD::SRL, dl, VT, Op, Tmp)); } Op = DAG.getNOT(dl, Op, VT); Result = DAG.getNode(ISD::CTPOP, dl, VT, Op); return true; } bool TargetLowering::expandCTTZ(SDNode *Node, SDValue &Result, SelectionDAG &DAG) const { SDLoc dl(Node); EVT VT = Node->getValueType(0); SDValue Op = Node->getOperand(0); unsigned NumBitsPerElt = VT.getScalarSizeInBits(); // If the non-ZERO_UNDEF version is supported we can use that instead. if (Node->getOpcode() == ISD::CTTZ_ZERO_UNDEF && isOperationLegalOrCustom(ISD::CTTZ, VT)) { Result = DAG.getNode(ISD::CTTZ, dl, VT, Op); return true; } // If the ZERO_UNDEF version is supported use that and handle the zero case. if (isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT)) { EVT SetCCVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT); SDValue CTTZ = DAG.getNode(ISD::CTTZ_ZERO_UNDEF, dl, VT, Op); SDValue Zero = DAG.getConstant(0, dl, VT); SDValue SrcIsZero = DAG.getSetCC(dl, SetCCVT, Op, Zero, ISD::SETEQ); Result = DAG.getNode(ISD::SELECT, dl, VT, SrcIsZero, DAG.getConstant(NumBitsPerElt, dl, VT), CTTZ); return true; } // Only expand vector types if we have the appropriate vector bit operations. if (VT.isVector() && (!isPowerOf2_32(NumBitsPerElt) || (!isOperationLegalOrCustom(ISD::CTPOP, VT) && !isOperationLegalOrCustom(ISD::CTLZ, VT)) || !isOperationLegalOrCustom(ISD::SUB, VT) || !isOperationLegalOrCustomOrPromote(ISD::AND, VT) || !isOperationLegalOrCustomOrPromote(ISD::XOR, VT))) return false; // for now, we use: { return popcount(~x & (x - 1)); } // unless the target has ctlz but not ctpop, in which case we use: // { return 32 - nlz(~x & (x-1)); } // Ref: "Hacker's Delight" by Henry Warren SDValue Tmp = DAG.getNode( ISD::AND, dl, VT, DAG.getNOT(dl, Op, VT), DAG.getNode(ISD::SUB, dl, VT, Op, DAG.getConstant(1, dl, VT))); // If ISD::CTLZ is legal and CTPOP isn't, then do that instead. if (isOperationLegal(ISD::CTLZ, VT) && !isOperationLegal(ISD::CTPOP, VT)) { Result = DAG.getNode(ISD::SUB, dl, VT, DAG.getConstant(NumBitsPerElt, dl, VT), DAG.getNode(ISD::CTLZ, dl, VT, Tmp)); return true; } Result = DAG.getNode(ISD::CTPOP, dl, VT, Tmp); return true; } bool TargetLowering::expandABS(SDNode *N, SDValue &Result, SelectionDAG &DAG) const { SDLoc dl(N); EVT VT = N->getValueType(0); EVT ShVT = getShiftAmountTy(VT, DAG.getDataLayout()); SDValue Op = N->getOperand(0); // Only expand vector types if we have the appropriate vector operations. 
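  // Illustrative example of the expansion below, assuming a 32-bit type:
  // Shift = x >> 31 (arithmetic) is 0 for non-negative x and -1 otherwise,
  // so (x + Shift) ^ Shift leaves non-negative x unchanged and maps negative
  // x to ~(x - 1) == -x. For x = -5: Shift = -1, x + Shift = -6, and
  // -6 ^ -1 == 5.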
if (VT.isVector() && (!isOperationLegalOrCustom(ISD::SRA, VT) || !isOperationLegalOrCustom(ISD::ADD, VT) || !isOperationLegalOrCustomOrPromote(ISD::XOR, VT))) return false; SDValue Shift = DAG.getNode(ISD::SRA, dl, VT, Op, DAG.getConstant(VT.getScalarSizeInBits() - 1, dl, ShVT)); SDValue Add = DAG.getNode(ISD::ADD, dl, VT, Op, Shift); Result = DAG.getNode(ISD::XOR, dl, VT, Add, Shift); return true; } std::pair TargetLowering::scalarizeVectorLoad(LoadSDNode *LD, SelectionDAG &DAG) const { SDLoc SL(LD); SDValue Chain = LD->getChain(); SDValue BasePTR = LD->getBasePtr(); EVT SrcVT = LD->getMemoryVT(); EVT DstVT = LD->getValueType(0); ISD::LoadExtType ExtType = LD->getExtensionType(); unsigned NumElem = SrcVT.getVectorNumElements(); EVT SrcEltVT = SrcVT.getScalarType(); EVT DstEltVT = DstVT.getScalarType(); // A vector must always be stored in memory as-is, i.e. without any padding // between the elements, since various code depend on it, e.g. in the // handling of a bitcast of a vector type to int, which may be done with a // vector store followed by an integer load. A vector that does not have // elements that are byte-sized must therefore be stored as an integer // built out of the extracted vector elements. if (!SrcEltVT.isByteSized()) { unsigned NumLoadBits = SrcVT.getStoreSizeInBits(); EVT LoadVT = EVT::getIntegerVT(*DAG.getContext(), NumLoadBits); unsigned NumSrcBits = SrcVT.getSizeInBits(); EVT SrcIntVT = EVT::getIntegerVT(*DAG.getContext(), NumSrcBits); unsigned SrcEltBits = SrcEltVT.getSizeInBits(); SDValue SrcEltBitMask = DAG.getConstant( APInt::getLowBitsSet(NumLoadBits, SrcEltBits), SL, LoadVT); // Load the whole vector and avoid masking off the top bits as it makes // the codegen worse. SDValue Load = DAG.getExtLoad(ISD::EXTLOAD, SL, LoadVT, Chain, BasePTR, LD->getPointerInfo(), SrcIntVT, LD->getAlignment(), LD->getMemOperand()->getFlags(), LD->getAAInfo()); SmallVector Vals; for (unsigned Idx = 0; Idx < NumElem; ++Idx) { unsigned ShiftIntoIdx = (DAG.getDataLayout().isBigEndian() ? 
(NumElem - 1) - Idx : Idx); SDValue ShiftAmount = DAG.getShiftAmountConstant(ShiftIntoIdx * SrcEltVT.getSizeInBits(), LoadVT, SL, /*LegalTypes=*/false); SDValue ShiftedElt = DAG.getNode(ISD::SRL, SL, LoadVT, Load, ShiftAmount); SDValue Elt = DAG.getNode(ISD::AND, SL, LoadVT, ShiftedElt, SrcEltBitMask); SDValue Scalar = DAG.getNode(ISD::TRUNCATE, SL, SrcEltVT, Elt); if (ExtType != ISD::NON_EXTLOAD) { unsigned ExtendOp = ISD::getExtForLoadExtType(false, ExtType); Scalar = DAG.getNode(ExtendOp, SL, DstEltVT, Scalar); } Vals.push_back(Scalar); } SDValue Value = DAG.getBuildVector(DstVT, SL, Vals); return std::make_pair(Value, Load.getValue(1)); } unsigned Stride = SrcEltVT.getSizeInBits() / 8; assert(SrcEltVT.isByteSized()); SmallVector Vals; SmallVector LoadChains; for (unsigned Idx = 0; Idx < NumElem; ++Idx) { SDValue ScalarLoad = DAG.getExtLoad(ExtType, SL, DstEltVT, Chain, BasePTR, LD->getPointerInfo().getWithOffset(Idx * Stride), SrcEltVT, MinAlign(LD->getAlignment(), Idx * Stride), LD->getMemOperand()->getFlags(), LD->getAAInfo()); BasePTR = DAG.getObjectPtrOffset(SL, BasePTR, Stride); Vals.push_back(ScalarLoad.getValue(0)); LoadChains.push_back(ScalarLoad.getValue(1)); } SDValue NewChain = DAG.getNode(ISD::TokenFactor, SL, MVT::Other, LoadChains); SDValue Value = DAG.getBuildVector(DstVT, SL, Vals); return std::make_pair(Value, NewChain); } SDValue TargetLowering::scalarizeVectorStore(StoreSDNode *ST, SelectionDAG &DAG) const { SDLoc SL(ST); SDValue Chain = ST->getChain(); SDValue BasePtr = ST->getBasePtr(); SDValue Value = ST->getValue(); EVT StVT = ST->getMemoryVT(); // The type of the data we want to save EVT RegVT = Value.getValueType(); EVT RegSclVT = RegVT.getScalarType(); // The type of data as saved in memory. EVT MemSclVT = StVT.getScalarType(); unsigned NumElem = StVT.getVectorNumElements(); // A vector must always be stored in memory as-is, i.e. without any padding // between the elements, since various code depend on it, e.g. in the // handling of a bitcast of a vector type to int, which may be done with a // vector store followed by an integer load. A vector that does not have // elements that are byte-sized must therefore be stored as an integer // built out of the extracted vector elements. if (!MemSclVT.isByteSized()) { unsigned NumBits = StVT.getSizeInBits(); EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), NumBits); SDValue CurrVal = DAG.getConstant(0, SL, IntVT); for (unsigned Idx = 0; Idx < NumElem; ++Idx) { SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, RegSclVT, Value, DAG.getVectorIdxConstant(Idx, SL)); SDValue Trunc = DAG.getNode(ISD::TRUNCATE, SL, MemSclVT, Elt); SDValue ExtElt = DAG.getNode(ISD::ZERO_EXTEND, SL, IntVT, Trunc); unsigned ShiftIntoIdx = (DAG.getDataLayout().isBigEndian() ? (NumElem - 1) - Idx : Idx); SDValue ShiftAmount = DAG.getConstant(ShiftIntoIdx * MemSclVT.getSizeInBits(), SL, IntVT); SDValue ShiftedElt = DAG.getNode(ISD::SHL, SL, IntVT, ExtElt, ShiftAmount); CurrVal = DAG.getNode(ISD::OR, SL, IntVT, CurrVal, ShiftedElt); } return DAG.getStore(Chain, SL, CurrVal, BasePtr, ST->getPointerInfo(), ST->getAlignment(), ST->getMemOperand()->getFlags(), ST->getAAInfo()); } // Store Stride in bytes unsigned Stride = MemSclVT.getSizeInBits() / 8; assert(Stride && "Zero stride!"); // Extract each of the elements from the original vector and save them into // memory individually. 
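  // Illustrative example for the non-byte-sized path earlier in this
  // function, assuming a <4 x i1> store: NumBits is 4, each element is
  // truncated to i1, zero-extended into an i4 accumulator and shifted into
  // bit position Idx (or NumElem-1-Idx on big-endian targets), and a single
  // i4 store of the OR'd accumulator is emitted instead of four sub-byte
  // stores.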
SmallVector Stores; for (unsigned Idx = 0; Idx < NumElem; ++Idx) { SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, RegSclVT, Value, DAG.getVectorIdxConstant(Idx, SL)); SDValue Ptr = DAG.getObjectPtrOffset(SL, BasePtr, Idx * Stride); // This scalar TruncStore may be illegal, but we legalize it later. SDValue Store = DAG.getTruncStore( Chain, SL, Elt, Ptr, ST->getPointerInfo().getWithOffset(Idx * Stride), MemSclVT, MinAlign(ST->getAlignment(), Idx * Stride), ST->getMemOperand()->getFlags(), ST->getAAInfo()); Stores.push_back(Store); } return DAG.getNode(ISD::TokenFactor, SL, MVT::Other, Stores); } std::pair TargetLowering::expandUnalignedLoad(LoadSDNode *LD, SelectionDAG &DAG) const { assert(LD->getAddressingMode() == ISD::UNINDEXED && "unaligned indexed loads not implemented!"); SDValue Chain = LD->getChain(); SDValue Ptr = LD->getBasePtr(); EVT VT = LD->getValueType(0); EVT LoadedVT = LD->getMemoryVT(); SDLoc dl(LD); auto &MF = DAG.getMachineFunction(); if (VT.isFloatingPoint() || VT.isVector()) { EVT intVT = EVT::getIntegerVT(*DAG.getContext(), LoadedVT.getSizeInBits()); if (isTypeLegal(intVT) && isTypeLegal(LoadedVT)) { if (!isOperationLegalOrCustom(ISD::LOAD, intVT) && LoadedVT.isVector()) { // Scalarize the load and let the individual components be handled. return scalarizeVectorLoad(LD, DAG); } // Expand to a (misaligned) integer load of the same size, // then bitconvert to floating point or vector. SDValue newLoad = DAG.getLoad(intVT, dl, Chain, Ptr, LD->getMemOperand()); SDValue Result = DAG.getNode(ISD::BITCAST, dl, LoadedVT, newLoad); if (LoadedVT != VT) Result = DAG.getNode(VT.isFloatingPoint() ? ISD::FP_EXTEND : ISD::ANY_EXTEND, dl, VT, Result); return std::make_pair(Result, newLoad.getValue(1)); } // Copy the value to a (aligned) stack slot using (unaligned) integer // loads and stores, then do a (aligned) load from the stack slot. MVT RegVT = getRegisterType(*DAG.getContext(), intVT); unsigned LoadedBytes = LoadedVT.getStoreSize(); unsigned RegBytes = RegVT.getSizeInBits() / 8; unsigned NumRegs = (LoadedBytes + RegBytes - 1) / RegBytes; // Make sure the stack slot is also aligned for the register type. SDValue StackBase = DAG.CreateStackTemporary(LoadedVT, RegVT); auto FrameIndex = cast(StackBase.getNode())->getIndex(); SmallVector Stores; SDValue StackPtr = StackBase; unsigned Offset = 0; EVT PtrVT = Ptr.getValueType(); EVT StackPtrVT = StackPtr.getValueType(); SDValue PtrIncrement = DAG.getConstant(RegBytes, dl, PtrVT); SDValue StackPtrIncrement = DAG.getConstant(RegBytes, dl, StackPtrVT); // Do all but one copies using the full register width. for (unsigned i = 1; i < NumRegs; i++) { // Load one integer register's worth from the original location. SDValue Load = DAG.getLoad( RegVT, dl, Chain, Ptr, LD->getPointerInfo().getWithOffset(Offset), MinAlign(LD->getAlignment(), Offset), LD->getMemOperand()->getFlags(), LD->getAAInfo()); // Follow the load with a store to the stack slot. Remember the store. Stores.push_back(DAG.getStore( Load.getValue(1), dl, Load, StackPtr, MachinePointerInfo::getFixedStack(MF, FrameIndex, Offset))); // Increment the pointers. Offset += RegBytes; Ptr = DAG.getObjectPtrOffset(dl, Ptr, PtrIncrement); StackPtr = DAG.getObjectPtrOffset(dl, StackPtr, StackPtrIncrement); } // The last copy may be partial. Do an extending load. 
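  // Illustrative example, assuming a 10-byte LoadedVT and 4-byte registers:
  // NumRegs is 3, so the loop above copies bytes 0-3 and 4-7 with full
  // register-width loads/stores, and the tail handled below copies the
  // remaining 2 bytes with an extending load paired with a truncating store.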
EVT MemVT = EVT::getIntegerVT(*DAG.getContext(), 8 * (LoadedBytes - Offset)); SDValue Load = DAG.getExtLoad(ISD::EXTLOAD, dl, RegVT, Chain, Ptr, LD->getPointerInfo().getWithOffset(Offset), MemVT, MinAlign(LD->getAlignment(), Offset), LD->getMemOperand()->getFlags(), LD->getAAInfo()); // Follow the load with a store to the stack slot. Remember the store. // On big-endian machines this requires a truncating store to ensure // that the bits end up in the right place. Stores.push_back(DAG.getTruncStore( Load.getValue(1), dl, Load, StackPtr, MachinePointerInfo::getFixedStack(MF, FrameIndex, Offset), MemVT)); // The order of the stores doesn't matter - say it with a TokenFactor. SDValue TF = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Stores); // Finally, perform the original load only redirected to the stack slot. Load = DAG.getExtLoad(LD->getExtensionType(), dl, VT, TF, StackBase, MachinePointerInfo::getFixedStack(MF, FrameIndex, 0), LoadedVT); // Callers expect a MERGE_VALUES node. return std::make_pair(Load, TF); } assert(LoadedVT.isInteger() && !LoadedVT.isVector() && "Unaligned load of unsupported type."); // Compute the new VT that is half the size of the old one. This is an // integer MVT. unsigned NumBits = LoadedVT.getSizeInBits(); EVT NewLoadedVT; NewLoadedVT = EVT::getIntegerVT(*DAG.getContext(), NumBits/2); NumBits >>= 1; unsigned Alignment = LD->getAlignment(); unsigned IncrementSize = NumBits / 8; ISD::LoadExtType HiExtType = LD->getExtensionType(); // If the original load is NON_EXTLOAD, the hi part load must be ZEXTLOAD. if (HiExtType == ISD::NON_EXTLOAD) HiExtType = ISD::ZEXTLOAD; // Load the value in two parts SDValue Lo, Hi; if (DAG.getDataLayout().isLittleEndian()) { Lo = DAG.getExtLoad(ISD::ZEXTLOAD, dl, VT, Chain, Ptr, LD->getPointerInfo(), NewLoadedVT, Alignment, LD->getMemOperand()->getFlags(), LD->getAAInfo()); Ptr = DAG.getObjectPtrOffset(dl, Ptr, IncrementSize); Hi = DAG.getExtLoad(HiExtType, dl, VT, Chain, Ptr, LD->getPointerInfo().getWithOffset(IncrementSize), NewLoadedVT, MinAlign(Alignment, IncrementSize), LD->getMemOperand()->getFlags(), LD->getAAInfo()); } else { Hi = DAG.getExtLoad(HiExtType, dl, VT, Chain, Ptr, LD->getPointerInfo(), NewLoadedVT, Alignment, LD->getMemOperand()->getFlags(), LD->getAAInfo()); Ptr = DAG.getObjectPtrOffset(dl, Ptr, IncrementSize); Lo = DAG.getExtLoad(ISD::ZEXTLOAD, dl, VT, Chain, Ptr, LD->getPointerInfo().getWithOffset(IncrementSize), NewLoadedVT, MinAlign(Alignment, IncrementSize), LD->getMemOperand()->getFlags(), LD->getAAInfo()); } // aggregate the two parts SDValue ShiftAmount = DAG.getConstant(NumBits, dl, getShiftAmountTy(Hi.getValueType(), DAG.getDataLayout())); SDValue Result = DAG.getNode(ISD::SHL, dl, VT, Hi, ShiftAmount); Result = DAG.getNode(ISD::OR, dl, VT, Result, Lo); SDValue TF = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1), Hi.getValue(1)); return std::make_pair(Result, TF); } SDValue TargetLowering::expandUnalignedStore(StoreSDNode *ST, SelectionDAG &DAG) const { assert(ST->getAddressingMode() == ISD::UNINDEXED && "unaligned indexed stores not implemented!"); SDValue Chain = ST->getChain(); SDValue Ptr = ST->getBasePtr(); SDValue Val = ST->getValue(); EVT VT = Val.getValueType(); int Alignment = ST->getAlignment(); auto &MF = DAG.getMachineFunction(); EVT StoreMemVT = ST->getMemoryVT(); SDLoc dl(ST); if (StoreMemVT.isFloatingPoint() || StoreMemVT.isVector()) { EVT intVT = EVT::getIntegerVT(*DAG.getContext(), VT.getSizeInBits()); if (isTypeLegal(intVT)) { if (!isOperationLegalOrCustom(ISD::STORE, 
intVT) && StoreMemVT.isVector()) { // Scalarize the store and let the individual components be handled. SDValue Result = scalarizeVectorStore(ST, DAG); return Result; } // Expand to a bitconvert of the value to the integer type of the // same size, then a (misaligned) int store. // FIXME: Does not handle truncating floating point stores! SDValue Result = DAG.getNode(ISD::BITCAST, dl, intVT, Val); Result = DAG.getStore(Chain, dl, Result, Ptr, ST->getPointerInfo(), Alignment, ST->getMemOperand()->getFlags()); return Result; } // Do a (aligned) store to a stack slot, then copy from the stack slot // to the final destination using (unaligned) integer loads and stores. MVT RegVT = getRegisterType( *DAG.getContext(), EVT::getIntegerVT(*DAG.getContext(), StoreMemVT.getSizeInBits())); EVT PtrVT = Ptr.getValueType(); unsigned StoredBytes = StoreMemVT.getStoreSize(); unsigned RegBytes = RegVT.getSizeInBits() / 8; unsigned NumRegs = (StoredBytes + RegBytes - 1) / RegBytes; // Make sure the stack slot is also aligned for the register type. SDValue StackPtr = DAG.CreateStackTemporary(StoreMemVT, RegVT); auto FrameIndex = cast(StackPtr.getNode())->getIndex(); // Perform the original store, only redirected to the stack slot. SDValue Store = DAG.getTruncStore( Chain, dl, Val, StackPtr, MachinePointerInfo::getFixedStack(MF, FrameIndex, 0), StoreMemVT); EVT StackPtrVT = StackPtr.getValueType(); SDValue PtrIncrement = DAG.getConstant(RegBytes, dl, PtrVT); SDValue StackPtrIncrement = DAG.getConstant(RegBytes, dl, StackPtrVT); SmallVector Stores; unsigned Offset = 0; // Do all but one copies using the full register width. for (unsigned i = 1; i < NumRegs; i++) { // Load one integer register's worth from the stack slot. SDValue Load = DAG.getLoad( RegVT, dl, Store, StackPtr, MachinePointerInfo::getFixedStack(MF, FrameIndex, Offset)); // Store it to the final location. Remember the store. Stores.push_back(DAG.getStore(Load.getValue(1), dl, Load, Ptr, ST->getPointerInfo().getWithOffset(Offset), MinAlign(ST->getAlignment(), Offset), ST->getMemOperand()->getFlags())); // Increment the pointers. Offset += RegBytes; StackPtr = DAG.getObjectPtrOffset(dl, StackPtr, StackPtrIncrement); Ptr = DAG.getObjectPtrOffset(dl, Ptr, PtrIncrement); } // The last store may be partial. Do a truncating store. On big-endian // machines this requires an extending load from the stack slot to ensure // that the bits are in the right place. EVT LoadMemVT = EVT::getIntegerVT(*DAG.getContext(), 8 * (StoredBytes - Offset)); // Load from the stack slot. SDValue Load = DAG.getExtLoad( ISD::EXTLOAD, dl, RegVT, Store, StackPtr, MachinePointerInfo::getFixedStack(MF, FrameIndex, Offset), LoadMemVT); Stores.push_back( DAG.getTruncStore(Load.getValue(1), dl, Load, Ptr, ST->getPointerInfo().getWithOffset(Offset), LoadMemVT, MinAlign(ST->getAlignment(), Offset), ST->getMemOperand()->getFlags(), ST->getAAInfo())); // The order of the stores doesn't matter - say it with a TokenFactor. SDValue Result = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Stores); return Result; } assert(StoreMemVT.isInteger() && !StoreMemVT.isVector() && "Unaligned store of unknown type."); // Get the half-size VT EVT NewStoredVT = StoreMemVT.getHalfSizedIntegerVT(*DAG.getContext()); int NumBits = NewStoredVT.getSizeInBits(); int IncrementSize = NumBits / 8; // Divide the stored value in two parts. 
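  // Illustrative example, assuming an unaligned i32 store: the value is split
  // as Lo = Val and Hi = Val >> 16, and two truncating i16 stores are
  // emitted. Little-endian targets store Lo at Ptr and Hi at Ptr + 2;
  // big-endian targets store Hi first.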
SDValue ShiftAmount = DAG.getConstant( NumBits, dl, getShiftAmountTy(Val.getValueType(), DAG.getDataLayout())); SDValue Lo = Val; SDValue Hi = DAG.getNode(ISD::SRL, dl, VT, Val, ShiftAmount); // Store the two parts SDValue Store1, Store2; Store1 = DAG.getTruncStore(Chain, dl, DAG.getDataLayout().isLittleEndian() ? Lo : Hi, Ptr, ST->getPointerInfo(), NewStoredVT, Alignment, ST->getMemOperand()->getFlags()); Ptr = DAG.getObjectPtrOffset(dl, Ptr, IncrementSize); Alignment = MinAlign(Alignment, IncrementSize); Store2 = DAG.getTruncStore( Chain, dl, DAG.getDataLayout().isLittleEndian() ? Hi : Lo, Ptr, ST->getPointerInfo().getWithOffset(IncrementSize), NewStoredVT, Alignment, ST->getMemOperand()->getFlags(), ST->getAAInfo()); SDValue Result = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Store1, Store2); return Result; } SDValue TargetLowering::IncrementMemoryAddress(SDValue Addr, SDValue Mask, const SDLoc &DL, EVT DataVT, SelectionDAG &DAG, bool IsCompressedMemory) const { SDValue Increment; EVT AddrVT = Addr.getValueType(); EVT MaskVT = Mask.getValueType(); assert(DataVT.getVectorNumElements() == MaskVT.getVectorNumElements() && "Incompatible types of Data and Mask"); if (IsCompressedMemory) { // Incrementing the pointer according to number of '1's in the mask. EVT MaskIntVT = EVT::getIntegerVT(*DAG.getContext(), MaskVT.getSizeInBits()); SDValue MaskInIntReg = DAG.getBitcast(MaskIntVT, Mask); if (MaskIntVT.getSizeInBits() < 32) { MaskInIntReg = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, MaskInIntReg); MaskIntVT = MVT::i32; } // Count '1's with POPCNT. Increment = DAG.getNode(ISD::CTPOP, DL, MaskIntVT, MaskInIntReg); Increment = DAG.getZExtOrTrunc(Increment, DL, AddrVT); // Scale is an element size in bytes. SDValue Scale = DAG.getConstant(DataVT.getScalarSizeInBits() / 8, DL, AddrVT); Increment = DAG.getNode(ISD::MUL, DL, AddrVT, Increment, Scale); } else Increment = DAG.getConstant(DataVT.getStoreSize(), DL, AddrVT); return DAG.getNode(ISD::ADD, DL, AddrVT, Addr, Increment); } static SDValue clampDynamicVectorIndex(SelectionDAG &DAG, SDValue Idx, EVT VecVT, const SDLoc &dl) { if (isa(Idx)) return Idx; EVT IdxVT = Idx.getValueType(); unsigned NElts = VecVT.getVectorNumElements(); if (isPowerOf2_32(NElts)) { APInt Imm = APInt::getLowBitsSet(IdxVT.getSizeInBits(), Log2_32(NElts)); return DAG.getNode(ISD::AND, dl, IdxVT, Idx, DAG.getConstant(Imm, dl, IdxVT)); } return DAG.getNode(ISD::UMIN, dl, IdxVT, Idx, DAG.getConstant(NElts - 1, dl, IdxVT)); } SDValue TargetLowering::getVectorElementPointer(SelectionDAG &DAG, SDValue VecPtr, EVT VecVT, SDValue Index) const { SDLoc dl(Index); // Make sure the index type is big enough to compute in. Index = DAG.getZExtOrTrunc(Index, dl, VecPtr.getValueType()); EVT EltVT = VecVT.getVectorElementType(); // Calculate the element offset and add it to the pointer. unsigned EltSize = EltVT.getSizeInBits() / 8; // FIXME: should be ABI size. 
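  // Illustrative note on the index clamp applied below (see
  // clampDynamicVectorIndex above), assuming a 4-element vector: a variable
  // index is reduced with Idx & 3, so an out-of-range value still selects
  // some lane instead of addressing past the vector; for a 3-element vector
  // the clamp is umin(Idx, 2).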
assert(EltSize * 8 == EltVT.getSizeInBits() && "Converting bits to bytes lost precision"); Index = clampDynamicVectorIndex(DAG, Index, VecVT, dl); EVT IdxVT = Index.getValueType(); Index = DAG.getNode(ISD::MUL, dl, IdxVT, Index, DAG.getConstant(EltSize, dl, IdxVT)); return DAG.getMemBasePlusOffset(VecPtr, Index, dl); } //===----------------------------------------------------------------------===// // Implementation of Emulated TLS Model //===----------------------------------------------------------------------===// SDValue TargetLowering::LowerToTLSEmulatedModel(const GlobalAddressSDNode *GA, SelectionDAG &DAG) const { // Access to address of TLS varialbe xyz is lowered to a function call: // __emutls_get_address( address of global variable named "__emutls_v.xyz" ) EVT PtrVT = getPointerTy(DAG.getDataLayout()); PointerType *VoidPtrType = Type::getInt8PtrTy(*DAG.getContext()); SDLoc dl(GA); ArgListTy Args; ArgListEntry Entry; std::string NameString = ("__emutls_v." + GA->getGlobal()->getName()).str(); Module *VariableModule = const_cast(GA->getGlobal()->getParent()); StringRef EmuTlsVarName(NameString); GlobalVariable *EmuTlsVar = VariableModule->getNamedGlobal(EmuTlsVarName); assert(EmuTlsVar && "Cannot find EmuTlsVar "); Entry.Node = DAG.getGlobalAddress(EmuTlsVar, dl, PtrVT); Entry.Ty = VoidPtrType; Args.push_back(Entry); SDValue EmuTlsGetAddr = DAG.getExternalSymbol("__emutls_get_address", PtrVT); TargetLowering::CallLoweringInfo CLI(DAG); CLI.setDebugLoc(dl).setChain(DAG.getEntryNode()); CLI.setLibCallee(CallingConv::C, VoidPtrType, EmuTlsGetAddr, std::move(Args)); std::pair CallResult = LowerCallTo(CLI); // TLSADDR will be codegen'ed as call. Inform MFI that function has calls. // At last for X86 targets, maybe good for other targets too? MachineFrameInfo &MFI = DAG.getMachineFunction().getFrameInfo(); MFI.setAdjustsStack(true); // Is this only for X86 target? 
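  // Rough sketch of what the emulated-TLS lowering above produces,
  // conceptually: an access to a TLS variable xyz becomes a call to
  // __emutls_get_address(&__emutls_v.xyz), and the returned pointer is then
  // used for the actual load or store. Because that call only appears at
  // lowering time, the frame info must be updated here by hand.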
MFI.setHasCalls(true); assert((GA->getOffset() == 0) && "Emulated TLS must have zero offset in GlobalAddressSDNode"); return CallResult.first; } SDValue TargetLowering::lowerCmpEqZeroToCtlzSrl(SDValue Op, SelectionDAG &DAG) const { assert((Op->getOpcode() == ISD::SETCC) && "Input has to be a SETCC node."); if (!isCtlzFast()) return SDValue(); ISD::CondCode CC = cast(Op.getOperand(2))->get(); SDLoc dl(Op); if (ConstantSDNode *C = dyn_cast(Op.getOperand(1))) { if (C->isNullValue() && CC == ISD::SETEQ) { EVT VT = Op.getOperand(0).getValueType(); SDValue Zext = Op.getOperand(0); if (VT.bitsLT(MVT::i32)) { VT = MVT::i32; Zext = DAG.getNode(ISD::ZERO_EXTEND, dl, VT, Op.getOperand(0)); } unsigned Log2b = Log2_32(VT.getSizeInBits()); SDValue Clz = DAG.getNode(ISD::CTLZ, dl, VT, Zext); SDValue Scc = DAG.getNode(ISD::SRL, dl, VT, Clz, DAG.getConstant(Log2b, dl, MVT::i32)); return DAG.getNode(ISD::TRUNCATE, dl, MVT::i32, Scc); } } return SDValue(); } SDValue TargetLowering::expandAddSubSat(SDNode *Node, SelectionDAG &DAG) const { unsigned Opcode = Node->getOpcode(); SDValue LHS = Node->getOperand(0); SDValue RHS = Node->getOperand(1); EVT VT = LHS.getValueType(); SDLoc dl(Node); assert(VT == RHS.getValueType() && "Expected operands to be the same type"); assert(VT.isInteger() && "Expected operands to be integers"); // usub.sat(a, b) -> umax(a, b) - b if (Opcode == ISD::USUBSAT && isOperationLegalOrCustom(ISD::UMAX, VT)) { SDValue Max = DAG.getNode(ISD::UMAX, dl, VT, LHS, RHS); return DAG.getNode(ISD::SUB, dl, VT, Max, RHS); } if (Opcode == ISD::UADDSAT && isOperationLegalOrCustom(ISD::UMIN, VT)) { SDValue InvRHS = DAG.getNOT(dl, RHS, VT); SDValue Min = DAG.getNode(ISD::UMIN, dl, VT, LHS, InvRHS); return DAG.getNode(ISD::ADD, dl, VT, Min, RHS); } unsigned OverflowOp; switch (Opcode) { case ISD::SADDSAT: OverflowOp = ISD::SADDO; break; case ISD::UADDSAT: OverflowOp = ISD::UADDO; break; case ISD::SSUBSAT: OverflowOp = ISD::SSUBO; break; case ISD::USUBSAT: OverflowOp = ISD::USUBO; break; default: llvm_unreachable("Expected method to receive signed or unsigned saturation " "addition or subtraction node."); } unsigned BitWidth = LHS.getScalarValueSizeInBits(); EVT BoolVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT); SDValue Result = DAG.getNode(OverflowOp, dl, DAG.getVTList(VT, BoolVT), LHS, RHS); SDValue SumDiff = Result.getValue(0); SDValue Overflow = Result.getValue(1); SDValue Zero = DAG.getConstant(0, dl, VT); SDValue AllOnes = DAG.getAllOnesConstant(dl, VT); if (Opcode == ISD::UADDSAT) { if (getBooleanContents(VT) == ZeroOrNegativeOneBooleanContent) { // (LHS + RHS) | OverflowMask SDValue OverflowMask = DAG.getSExtOrTrunc(Overflow, dl, VT); return DAG.getNode(ISD::OR, dl, VT, SumDiff, OverflowMask); } // Overflow ? 0xffff.... : (LHS + RHS) return DAG.getSelect(dl, VT, Overflow, AllOnes, SumDiff); } else if (Opcode == ISD::USUBSAT) { if (getBooleanContents(VT) == ZeroOrNegativeOneBooleanContent) { // (LHS - RHS) & ~OverflowMask SDValue OverflowMask = DAG.getSExtOrTrunc(Overflow, dl, VT); SDValue Not = DAG.getNOT(dl, OverflowMask, VT); return DAG.getNode(ISD::AND, dl, VT, SumDiff, Not); } // Overflow ? 
0 : (LHS - RHS) return DAG.getSelect(dl, VT, Overflow, Zero, SumDiff); } else { // SatMax -> Overflow && SumDiff < 0 // SatMin -> Overflow && SumDiff >= 0 APInt MinVal = APInt::getSignedMinValue(BitWidth); APInt MaxVal = APInt::getSignedMaxValue(BitWidth); SDValue SatMin = DAG.getConstant(MinVal, dl, VT); SDValue SatMax = DAG.getConstant(MaxVal, dl, VT); SDValue SumNeg = DAG.getSetCC(dl, BoolVT, SumDiff, Zero, ISD::SETLT); Result = DAG.getSelect(dl, VT, SumNeg, SatMax, SatMin); return DAG.getSelect(dl, VT, Overflow, Result, SumDiff); } } SDValue TargetLowering::expandFixedPointMul(SDNode *Node, SelectionDAG &DAG) const { assert((Node->getOpcode() == ISD::SMULFIX || Node->getOpcode() == ISD::UMULFIX || Node->getOpcode() == ISD::SMULFIXSAT || Node->getOpcode() == ISD::UMULFIXSAT) && "Expected a fixed point multiplication opcode"); SDLoc dl(Node); SDValue LHS = Node->getOperand(0); SDValue RHS = Node->getOperand(1); EVT VT = LHS.getValueType(); unsigned Scale = Node->getConstantOperandVal(2); bool Saturating = (Node->getOpcode() == ISD::SMULFIXSAT || Node->getOpcode() == ISD::UMULFIXSAT); bool Signed = (Node->getOpcode() == ISD::SMULFIX || Node->getOpcode() == ISD::SMULFIXSAT); EVT BoolVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT); unsigned VTSize = VT.getScalarSizeInBits(); if (!Scale) { // [us]mul.fix(a, b, 0) -> mul(a, b) if (!Saturating) { if (isOperationLegalOrCustom(ISD::MUL, VT)) return DAG.getNode(ISD::MUL, dl, VT, LHS, RHS); } else if (Signed && isOperationLegalOrCustom(ISD::SMULO, VT)) { SDValue Result = DAG.getNode(ISD::SMULO, dl, DAG.getVTList(VT, BoolVT), LHS, RHS); SDValue Product = Result.getValue(0); SDValue Overflow = Result.getValue(1); SDValue Zero = DAG.getConstant(0, dl, VT); APInt MinVal = APInt::getSignedMinValue(VTSize); APInt MaxVal = APInt::getSignedMaxValue(VTSize); SDValue SatMin = DAG.getConstant(MinVal, dl, VT); SDValue SatMax = DAG.getConstant(MaxVal, dl, VT); SDValue ProdNeg = DAG.getSetCC(dl, BoolVT, Product, Zero, ISD::SETLT); Result = DAG.getSelect(dl, VT, ProdNeg, SatMax, SatMin); return DAG.getSelect(dl, VT, Overflow, Result, Product); } else if (!Signed && isOperationLegalOrCustom(ISD::UMULO, VT)) { SDValue Result = DAG.getNode(ISD::UMULO, dl, DAG.getVTList(VT, BoolVT), LHS, RHS); SDValue Product = Result.getValue(0); SDValue Overflow = Result.getValue(1); APInt MaxVal = APInt::getMaxValue(VTSize); SDValue SatMax = DAG.getConstant(MaxVal, dl, VT); return DAG.getSelect(dl, VT, Overflow, SatMax, Product); } } assert(((Signed && Scale < VTSize) || (!Signed && Scale <= VTSize)) && "Expected scale to be less than the number of bits if signed or at " "most the number of bits if unsigned."); assert(LHS.getValueType() == RHS.getValueType() && "Expected both operands to be the same type"); // Get the upper and lower bits of the result. SDValue Lo, Hi; unsigned LoHiOp = Signed ? ISD::SMUL_LOHI : ISD::UMUL_LOHI; unsigned HiOp = Signed ? ISD::MULHS : ISD::MULHU; if (isOperationLegalOrCustom(LoHiOp, VT)) { SDValue Result = DAG.getNode(LoHiOp, dl, DAG.getVTList(VT, VT), LHS, RHS); Lo = Result.getValue(0); Hi = Result.getValue(1); } else if (isOperationLegalOrCustom(HiOp, VT)) { Lo = DAG.getNode(ISD::MUL, dl, VT, LHS, RHS); Hi = DAG.getNode(HiOp, dl, VT, LHS, RHS); } else if (VT.isVector()) { return SDValue(); } else { report_fatal_error("Unable to expand fixed point multiplication."); } if (Scale == VTSize) // Result is just the top half since we'd be shifting by the width of the // operand. 
Overflow impossible so this works for both UMULFIX and // UMULFIXSAT. return Hi; // The result will need to be shifted right by the scale since both operands // are scaled. The result is given to us in 2 halves, so we only want part of // both in the result. EVT ShiftTy = getShiftAmountTy(VT, DAG.getDataLayout()); SDValue Result = DAG.getNode(ISD::FSHR, dl, VT, Hi, Lo, DAG.getConstant(Scale, dl, ShiftTy)); if (!Saturating) return Result; if (!Signed) { // Unsigned overflow happened if the upper (VTSize - Scale) bits (of the // widened multiplication) aren't all zeroes. // Saturate to max if ((Hi >> Scale) != 0), // which is the same as if (Hi > ((1 << Scale) - 1)) APInt MaxVal = APInt::getMaxValue(VTSize); SDValue LowMask = DAG.getConstant(APInt::getLowBitsSet(VTSize, Scale), dl, VT); Result = DAG.getSelectCC(dl, Hi, LowMask, DAG.getConstant(MaxVal, dl, VT), Result, ISD::SETUGT); return Result; } // Signed overflow happened if the upper (VTSize - Scale + 1) bits (of the // widened multiplication) aren't all ones or all zeroes. SDValue SatMin = DAG.getConstant(APInt::getSignedMinValue(VTSize), dl, VT); SDValue SatMax = DAG.getConstant(APInt::getSignedMaxValue(VTSize), dl, VT); if (Scale == 0) { SDValue Sign = DAG.getNode(ISD::SRA, dl, VT, Lo, DAG.getConstant(VTSize - 1, dl, ShiftTy)); SDValue Overflow = DAG.getSetCC(dl, BoolVT, Hi, Sign, ISD::SETNE); // Saturated to SatMin if wide product is negative, and SatMax if wide // product is positive ... SDValue Zero = DAG.getConstant(0, dl, VT); SDValue ResultIfOverflow = DAG.getSelectCC(dl, Hi, Zero, SatMin, SatMax, ISD::SETLT); // ... but only if we overflowed. return DAG.getSelect(dl, VT, Overflow, ResultIfOverflow, Result); } // We handled Scale==0 above so all the bits to examine is in Hi. // Saturate to max if ((Hi >> (Scale - 1)) > 0), // which is the same as if (Hi > (1 << (Scale - 1)) - 1) SDValue LowMask = DAG.getConstant(APInt::getLowBitsSet(VTSize, Scale - 1), dl, VT); Result = DAG.getSelectCC(dl, Hi, LowMask, SatMax, Result, ISD::SETGT); // Saturate to min if (Hi >> (Scale - 1)) < -1), // which is the same as if (HI < (-1 << (Scale - 1)) SDValue HighMask = DAG.getConstant(APInt::getHighBitsSet(VTSize, VTSize - Scale + 1), dl, VT); Result = DAG.getSelectCC(dl, Hi, HighMask, SatMin, Result, ISD::SETLT); return Result; } SDValue TargetLowering::expandFixedPointDiv(unsigned Opcode, const SDLoc &dl, SDValue LHS, SDValue RHS, unsigned Scale, SelectionDAG &DAG) const { assert((Opcode == ISD::SDIVFIX || Opcode == ISD::SDIVFIXSAT || Opcode == ISD::UDIVFIX || Opcode == ISD::UDIVFIXSAT) && "Expected a fixed point division opcode"); EVT VT = LHS.getValueType(); bool Signed = Opcode == ISD::SDIVFIX || Opcode == ISD::SDIVFIXSAT; bool Saturating = Opcode == ISD::SDIVFIXSAT || Opcode == ISD::UDIVFIXSAT; EVT BoolVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT); // If there is enough room in the type to upscale the LHS or downscale the // RHS before the division, we can perform it in this type without having to // resize. For signed operations, the LHS headroom is the number of // redundant sign bits, and for unsigned ones it is the number of zeroes. // The headroom for the RHS is the number of trailing zeroes. unsigned LHSLead = Signed ? DAG.ComputeNumSignBits(LHS) - 1 : DAG.computeKnownBits(LHS).countMinLeadingZeros(); unsigned RHSTrail = DAG.computeKnownBits(RHS).countMinTrailingZeros(); // For signed saturating operations, we need to be able to detect true integer // division overflow; that is, when you have MIN / -EPS. 
However, this // is undefined behavior and if we emit divisions that could take such // values it may cause undesired behavior (arithmetic exceptions on x86, for // example). // Avoid this by requiring an extra bit so that we never get this case. // FIXME: This is a bit unfortunate as it means that for an 8-bit 7-scale // signed saturating division, we need to emit a whopping 32-bit division. if (LHSLead + RHSTrail < Scale + (unsigned)(Saturating && Signed)) return SDValue(); unsigned LHSShift = std::min(LHSLead, Scale); unsigned RHSShift = Scale - LHSShift; // At this point, we know that if we shift the LHS up by LHSShift and the // RHS down by RHSShift, we can emit a regular division with a final scaling // factor of Scale. EVT ShiftTy = getShiftAmountTy(VT, DAG.getDataLayout()); if (LHSShift) LHS = DAG.getNode(ISD::SHL, dl, VT, LHS, DAG.getConstant(LHSShift, dl, ShiftTy)); if (RHSShift) RHS = DAG.getNode(Signed ? ISD::SRA : ISD::SRL, dl, VT, RHS, DAG.getConstant(RHSShift, dl, ShiftTy)); SDValue Quot; if (Signed) { // For signed operations, if the resulting quotient is negative and the // remainder is nonzero, subtract 1 from the quotient to round towards // negative infinity. SDValue Rem; // FIXME: Ideally we would always produce an SDIVREM here, but if the // type isn't legal, SDIVREM cannot be expanded. There is no reason why // we couldn't just form a libcall, but the type legalizer doesn't do it. if (isTypeLegal(VT) && isOperationLegalOrCustom(ISD::SDIVREM, VT)) { Quot = DAG.getNode(ISD::SDIVREM, dl, DAG.getVTList(VT, VT), LHS, RHS); Rem = Quot.getValue(1); Quot = Quot.getValue(0); } else { Quot = DAG.getNode(ISD::SDIV, dl, VT, LHS, RHS); Rem = DAG.getNode(ISD::SREM, dl, VT, LHS, RHS); } SDValue Zero = DAG.getConstant(0, dl, VT); SDValue RemNonZero = DAG.getSetCC(dl, BoolVT, Rem, Zero, ISD::SETNE); SDValue LHSNeg = DAG.getSetCC(dl, BoolVT, LHS, Zero, ISD::SETLT); SDValue RHSNeg = DAG.getSetCC(dl, BoolVT, RHS, Zero, ISD::SETLT); SDValue QuotNeg = DAG.getNode(ISD::XOR, dl, BoolVT, LHSNeg, RHSNeg); SDValue Sub1 = DAG.getNode(ISD::SUB, dl, VT, Quot, DAG.getConstant(1, dl, VT)); Quot = DAG.getSelect(dl, VT, DAG.getNode(ISD::AND, dl, BoolVT, RemNonZero, QuotNeg), Sub1, Quot); } else Quot = DAG.getNode(ISD::UDIV, dl, VT, LHS, RHS); return Quot; } void TargetLowering::expandUADDSUBO( SDNode *Node, SDValue &Result, SDValue &Overflow, SelectionDAG &DAG) const { SDLoc dl(Node); SDValue LHS = Node->getOperand(0); SDValue RHS = Node->getOperand(1); bool IsAdd = Node->getOpcode() == ISD::UADDO; // If ADD/SUBCARRY is legal, use that instead. unsigned OpcCarry = IsAdd ? ISD::ADDCARRY : ISD::SUBCARRY; if (isOperationLegalOrCustom(OpcCarry, Node->getValueType(0))) { SDValue CarryIn = DAG.getConstant(0, dl, Node->getValueType(1)); SDValue NodeCarry = DAG.getNode(OpcCarry, dl, Node->getVTList(), { LHS, RHS, CarryIn }); Result = SDValue(NodeCarry.getNode(), 0); Overflow = SDValue(NodeCarry.getNode(), 1); return; } Result = DAG.getNode(IsAdd ? ISD::ADD : ISD::SUB, dl, LHS.getValueType(), LHS, RHS); EVT ResultType = Node->getValueType(1); EVT SetCCType = getSetCCResultType( DAG.getDataLayout(), *DAG.getContext(), Node->getValueType(0)); ISD::CondCode CC = IsAdd ? 
ISD::SETULT : ISD::SETUGT; SDValue SetCC = DAG.getSetCC(dl, SetCCType, Result, LHS, CC); Overflow = DAG.getBoolExtOrTrunc(SetCC, dl, ResultType, ResultType); } void TargetLowering::expandSADDSUBO( SDNode *Node, SDValue &Result, SDValue &Overflow, SelectionDAG &DAG) const { SDLoc dl(Node); SDValue LHS = Node->getOperand(0); SDValue RHS = Node->getOperand(1); bool IsAdd = Node->getOpcode() == ISD::SADDO; Result = DAG.getNode(IsAdd ? ISD::ADD : ISD::SUB, dl, LHS.getValueType(), LHS, RHS); EVT ResultType = Node->getValueType(1); EVT OType = getSetCCResultType( DAG.getDataLayout(), *DAG.getContext(), Node->getValueType(0)); // If SADDSAT/SSUBSAT is legal, compare results to detect overflow. unsigned OpcSat = IsAdd ? ISD::SADDSAT : ISD::SSUBSAT; if (isOperationLegalOrCustom(OpcSat, LHS.getValueType())) { SDValue Sat = DAG.getNode(OpcSat, dl, LHS.getValueType(), LHS, RHS); SDValue SetCC = DAG.getSetCC(dl, OType, Result, Sat, ISD::SETNE); Overflow = DAG.getBoolExtOrTrunc(SetCC, dl, ResultType, ResultType); return; } SDValue Zero = DAG.getConstant(0, dl, LHS.getValueType()); // For an addition, the result should be less than one of the operands (LHS) // if and only if the other operand (RHS) is negative, otherwise there will // be overflow. // For a subtraction, the result should be less than one of the operands // (LHS) if and only if the other operand (RHS) is (non-zero) positive, // otherwise there will be overflow. SDValue ResultLowerThanLHS = DAG.getSetCC(dl, OType, Result, LHS, ISD::SETLT); SDValue ConditionRHS = DAG.getSetCC(dl, OType, RHS, Zero, IsAdd ? ISD::SETLT : ISD::SETGT); Overflow = DAG.getBoolExtOrTrunc( DAG.getNode(ISD::XOR, dl, OType, ConditionRHS, ResultLowerThanLHS), dl, ResultType, ResultType); } bool TargetLowering::expandMULO(SDNode *Node, SDValue &Result, SDValue &Overflow, SelectionDAG &DAG) const { SDLoc dl(Node); EVT VT = Node->getValueType(0); EVT SetCCVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT); SDValue LHS = Node->getOperand(0); SDValue RHS = Node->getOperand(1); bool isSigned = Node->getOpcode() == ISD::SMULO; // For power-of-two multiplications we can use a simpler shift expansion. if (ConstantSDNode *RHSC = isConstOrConstSplat(RHS)) { const APInt &C = RHSC->getAPIntValue(); // mulo(X, 1 << S) -> { X << S, (X << S) >> S != X } if (C.isPowerOf2()) { // smulo(x, signed_min) is same as umulo(x, signed_min). bool UseArithShift = isSigned && !C.isMinSignedValue(); EVT ShiftAmtTy = getShiftAmountTy(VT, DAG.getDataLayout()); SDValue ShiftAmt = DAG.getConstant(C.logBase2(), dl, ShiftAmtTy); Result = DAG.getNode(ISD::SHL, dl, VT, LHS, ShiftAmt); Overflow = DAG.getSetCC(dl, SetCCVT, DAG.getNode(UseArithShift ? 
ISD::SRA : ISD::SRL, dl, VT, Result, ShiftAmt), LHS, ISD::SETNE); return true; } } EVT WideVT = EVT::getIntegerVT(*DAG.getContext(), VT.getScalarSizeInBits() * 2); if (VT.isVector()) WideVT = EVT::getVectorVT(*DAG.getContext(), WideVT, VT.getVectorNumElements()); SDValue BottomHalf; SDValue TopHalf; static const unsigned Ops[2][3] = { { ISD::MULHU, ISD::UMUL_LOHI, ISD::ZERO_EXTEND }, { ISD::MULHS, ISD::SMUL_LOHI, ISD::SIGN_EXTEND }}; if (isOperationLegalOrCustom(Ops[isSigned][0], VT)) { BottomHalf = DAG.getNode(ISD::MUL, dl, VT, LHS, RHS); TopHalf = DAG.getNode(Ops[isSigned][0], dl, VT, LHS, RHS); } else if (isOperationLegalOrCustom(Ops[isSigned][1], VT)) { BottomHalf = DAG.getNode(Ops[isSigned][1], dl, DAG.getVTList(VT, VT), LHS, RHS); TopHalf = BottomHalf.getValue(1); } else if (isTypeLegal(WideVT)) { LHS = DAG.getNode(Ops[isSigned][2], dl, WideVT, LHS); RHS = DAG.getNode(Ops[isSigned][2], dl, WideVT, RHS); SDValue Mul = DAG.getNode(ISD::MUL, dl, WideVT, LHS, RHS); BottomHalf = DAG.getNode(ISD::TRUNCATE, dl, VT, Mul); SDValue ShiftAmt = DAG.getConstant(VT.getScalarSizeInBits(), dl, getShiftAmountTy(WideVT, DAG.getDataLayout())); TopHalf = DAG.getNode(ISD::TRUNCATE, dl, VT, DAG.getNode(ISD::SRL, dl, WideVT, Mul, ShiftAmt)); } else { if (VT.isVector()) return false; // We can fall back to a libcall with an illegal type for the MUL if we // have a libcall big enough. // Also, we can fall back to a division in some cases, but that's a big // performance hit in the general case. RTLIB::Libcall LC = RTLIB::UNKNOWN_LIBCALL; if (WideVT == MVT::i16) LC = RTLIB::MUL_I16; else if (WideVT == MVT::i32) LC = RTLIB::MUL_I32; else if (WideVT == MVT::i64) LC = RTLIB::MUL_I64; else if (WideVT == MVT::i128) LC = RTLIB::MUL_I128; assert(LC != RTLIB::UNKNOWN_LIBCALL && "Cannot expand this operation!"); SDValue HiLHS; SDValue HiRHS; if (isSigned) { // The high part is obtained by SRA'ing all but one of the bits of low // part. unsigned LoSize = VT.getSizeInBits(); HiLHS = DAG.getNode(ISD::SRA, dl, VT, LHS, DAG.getConstant(LoSize - 1, dl, getPointerTy(DAG.getDataLayout()))); HiRHS = DAG.getNode(ISD::SRA, dl, VT, RHS, DAG.getConstant(LoSize - 1, dl, getPointerTy(DAG.getDataLayout()))); } else { HiLHS = DAG.getConstant(0, dl, VT); HiRHS = DAG.getConstant(0, dl, VT); } // Here we're passing the 2 arguments explicitly as 4 arguments that are // pre-lowered to the correct types. This all depends upon WideVT not // being a legal type for the architecture and thus has to be split to // two arguments. SDValue Ret; TargetLowering::MakeLibCallOptions CallOptions; CallOptions.setSExt(isSigned); CallOptions.setIsPostTypeLegalization(true); if (shouldSplitFunctionArgumentsAsLittleEndian(DAG.getDataLayout())) { // Halves of WideVT are packed into registers in different order // depending on platform endianness. This is usually handled by // the C calling convention, but we can't defer to it in // the legalizer. SDValue Args[] = { LHS, HiLHS, RHS, HiRHS }; Ret = makeLibCall(DAG, LC, WideVT, Args, CallOptions, dl).first; } else { SDValue Args[] = { HiLHS, LHS, HiRHS, RHS }; Ret = makeLibCall(DAG, LC, WideVT, Args, CallOptions, dl).first; } assert(Ret.getOpcode() == ISD::MERGE_VALUES && "Ret value is a collection of constituent nodes holding result."); if (DAG.getDataLayout().isLittleEndian()) { // Same as above. 
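      // A minimal scalar sketch of the overflow test built once the two
      // halves are in hand, assuming a hypothetical 32-bit VT (the DAG nodes
      // below compute the same predicates on BottomHalf/TopHalf):
      //
      //   bool SMulOverflows(int32_t L, int32_t R) {
      //     int64_t Wide = (int64_t)L * (int64_t)R;
      //     int32_t Bottom = (int32_t)Wide;          // low half == Result
      //     int32_t Top    = (int32_t)(Wide >> 32);  // high half
      //     return Top != (Bottom >> 31);            // high half is not the
      //   }                                          // sign-extension of low
      //   bool UMulOverflows(uint32_t L, uint32_t R) {
      //     uint64_t Wide = (uint64_t)L * R;
      //     return (uint32_t)(Wide >> 32) != 0;      // any high bit set
      //   }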
BottomHalf = Ret.getOperand(0); TopHalf = Ret.getOperand(1); } else { BottomHalf = Ret.getOperand(1); TopHalf = Ret.getOperand(0); } } Result = BottomHalf; if (isSigned) { SDValue ShiftAmt = DAG.getConstant( VT.getScalarSizeInBits() - 1, dl, getShiftAmountTy(BottomHalf.getValueType(), DAG.getDataLayout())); SDValue Sign = DAG.getNode(ISD::SRA, dl, VT, BottomHalf, ShiftAmt); Overflow = DAG.getSetCC(dl, SetCCVT, TopHalf, Sign, ISD::SETNE); } else { Overflow = DAG.getSetCC(dl, SetCCVT, TopHalf, DAG.getConstant(0, dl, VT), ISD::SETNE); } // Truncate the result if SetCC returns a larger type than needed. EVT RType = Node->getValueType(1); if (RType.getSizeInBits() < Overflow.getValueSizeInBits()) Overflow = DAG.getNode(ISD::TRUNCATE, dl, RType, Overflow); assert(RType.getSizeInBits() == Overflow.getValueSizeInBits() && "Unexpected result type for S/UMULO legalization"); return true; } SDValue TargetLowering::expandVecReduce(SDNode *Node, SelectionDAG &DAG) const { SDLoc dl(Node); bool NoNaN = Node->getFlags().hasNoNaNs(); unsigned BaseOpcode = 0; switch (Node->getOpcode()) { default: llvm_unreachable("Expected VECREDUCE opcode"); case ISD::VECREDUCE_FADD: BaseOpcode = ISD::FADD; break; case ISD::VECREDUCE_FMUL: BaseOpcode = ISD::FMUL; break; case ISD::VECREDUCE_ADD: BaseOpcode = ISD::ADD; break; case ISD::VECREDUCE_MUL: BaseOpcode = ISD::MUL; break; case ISD::VECREDUCE_AND: BaseOpcode = ISD::AND; break; case ISD::VECREDUCE_OR: BaseOpcode = ISD::OR; break; case ISD::VECREDUCE_XOR: BaseOpcode = ISD::XOR; break; case ISD::VECREDUCE_SMAX: BaseOpcode = ISD::SMAX; break; case ISD::VECREDUCE_SMIN: BaseOpcode = ISD::SMIN; break; case ISD::VECREDUCE_UMAX: BaseOpcode = ISD::UMAX; break; case ISD::VECREDUCE_UMIN: BaseOpcode = ISD::UMIN; break; case ISD::VECREDUCE_FMAX: BaseOpcode = NoNaN ? ISD::FMAXNUM : ISD::FMAXIMUM; break; case ISD::VECREDUCE_FMIN: BaseOpcode = NoNaN ? ISD::FMINNUM : ISD::FMINIMUM; break; } SDValue Op = Node->getOperand(0); EVT VT = Op.getValueType(); // Try to use a shuffle reduction for power of two vectors. if (VT.isPow2VectorType()) { while (VT.getVectorNumElements() > 1) { EVT HalfVT = VT.getHalfNumVectorElementsVT(*DAG.getContext()); if (!isOperationLegalOrCustom(BaseOpcode, HalfVT)) break; SDValue Lo, Hi; std::tie(Lo, Hi) = DAG.SplitVector(Op, dl); Op = DAG.getNode(BaseOpcode, dl, HalfVT, Lo, Hi); VT = HalfVT; } } EVT EltVT = VT.getVectorElementType(); unsigned NumElts = VT.getVectorNumElements(); SmallVector Ops; DAG.ExtractVectorElements(Op, Ops, 0, NumElts); SDValue Res = Ops[0]; for (unsigned i = 1; i < NumElts; i++) Res = DAG.getNode(BaseOpcode, dl, EltVT, Res, Ops[i], Node->getFlags()); // Result type may be wider than element type. if (EltVT != Node->getValueType(0)) Res = DAG.getNode(ISD::ANY_EXTEND, dl, Node->getValueType(0), Res); return Res; } bool TargetLowering::expandREM(SDNode *Node, SDValue &Result, SelectionDAG &DAG) const { EVT VT = Node->getValueType(0); SDLoc dl(Node); bool isSigned = Node->getOpcode() == ISD::SREM; unsigned DivOpc = isSigned ? ISD::SDIV : ISD::UDIV; unsigned DivRemOpc = isSigned ? 
ISD::SDIVREM : ISD::UDIVREM; SDValue Dividend = Node->getOperand(0); SDValue Divisor = Node->getOperand(1); if (isOperationLegalOrCustom(DivRemOpc, VT)) { SDVTList VTs = DAG.getVTList(VT, VT); Result = DAG.getNode(DivRemOpc, dl, VTs, Dividend, Divisor).getValue(1); return true; } else if (isOperationLegalOrCustom(DivOpc, VT)) { // X % Y -> X-X/Y*Y SDValue Divide = DAG.getNode(DivOpc, dl, VT, Dividend, Divisor); SDValue Mul = DAG.getNode(ISD::MUL, dl, VT, Divide, Divisor); Result = DAG.getNode(ISD::SUB, dl, VT, Dividend, Mul); return true; } return false; } Index: vendor/llvm-project/release-11.x/llvm/lib/IR/Core.cpp =================================================================== --- vendor/llvm-project/release-11.x/llvm/lib/IR/Core.cpp (revision 366332) +++ vendor/llvm-project/release-11.x/llvm/lib/IR/Core.cpp (revision 366333) @@ -1,4124 +1,4137 @@ //===-- Core.cpp ----------------------------------------------------------===// // // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // See https://llvm.org/LICENSE.txt for license information. // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // //===----------------------------------------------------------------------===// // // This file implements the common infrastructure (including the C bindings) // for libLLVMCore.a, which implements the LLVM intermediate representation. // //===----------------------------------------------------------------------===// #include "llvm-c/Core.h" #include "llvm/ADT/StringSwitch.h" #include "llvm/IR/Attributes.h" #include "llvm/IR/Constants.h" #include "llvm/IR/DebugInfoMetadata.h" #include "llvm/IR/DerivedTypes.h" #include "llvm/IR/DiagnosticInfo.h" #include "llvm/IR/DiagnosticPrinter.h" #include "llvm/IR/GlobalAlias.h" #include "llvm/IR/GlobalVariable.h" #include "llvm/IR/IRBuilder.h" #include "llvm/IR/InlineAsm.h" #include "llvm/IR/IntrinsicInst.h" #include "llvm/IR/LLVMContext.h" #include "llvm/IR/LegacyPassManager.h" #include "llvm/IR/Module.h" #include "llvm/InitializePasses.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/FileSystem.h" #include "llvm/Support/ManagedStatic.h" #include "llvm/Support/MemoryBuffer.h" #include "llvm/Support/Threading.h" #include "llvm/Support/raw_ostream.h" #include #include #include #include using namespace llvm; #define DEBUG_TYPE "ir" void llvm::initializeCore(PassRegistry &Registry) { initializeDominatorTreeWrapperPassPass(Registry); initializePrintModulePassWrapperPass(Registry); initializePrintFunctionPassWrapperPass(Registry); initializeSafepointIRVerifierPass(Registry); initializeVerifierLegacyPassPass(Registry); } void LLVMInitializeCore(LLVMPassRegistryRef R) { initializeCore(*unwrap(R)); } void LLVMShutdown() { llvm_shutdown(); } /*===-- Error handling ----------------------------------------------------===*/ char *LLVMCreateMessage(const char *Message) { return strdup(Message); } void LLVMDisposeMessage(char *Message) { free(Message); } /*===-- Operations on contexts --------------------------------------------===*/ static ManagedStatic GlobalContext; LLVMContextRef LLVMContextCreate() { return wrap(new LLVMContext()); } LLVMContextRef LLVMGetGlobalContext() { return wrap(&*GlobalContext); } void LLVMContextSetDiagnosticHandler(LLVMContextRef C, LLVMDiagnosticHandler Handler, void *DiagnosticContext) { unwrap(C)->setDiagnosticHandlerCallBack( LLVM_EXTENSION reinterpret_cast( Handler), DiagnosticContext); } LLVMDiagnosticHandler 
LLVMContextGetDiagnosticHandler(LLVMContextRef C) { return LLVM_EXTENSION reinterpret_cast( unwrap(C)->getDiagnosticHandlerCallBack()); } void *LLVMContextGetDiagnosticContext(LLVMContextRef C) { return unwrap(C)->getDiagnosticContext(); } void LLVMContextSetYieldCallback(LLVMContextRef C, LLVMYieldCallback Callback, void *OpaqueHandle) { auto YieldCallback = LLVM_EXTENSION reinterpret_cast(Callback); unwrap(C)->setYieldCallback(YieldCallback, OpaqueHandle); } LLVMBool LLVMContextShouldDiscardValueNames(LLVMContextRef C) { return unwrap(C)->shouldDiscardValueNames(); } void LLVMContextSetDiscardValueNames(LLVMContextRef C, LLVMBool Discard) { unwrap(C)->setDiscardValueNames(Discard); } void LLVMContextDispose(LLVMContextRef C) { delete unwrap(C); } unsigned LLVMGetMDKindIDInContext(LLVMContextRef C, const char *Name, unsigned SLen) { return unwrap(C)->getMDKindID(StringRef(Name, SLen)); } unsigned LLVMGetMDKindID(const char *Name, unsigned SLen) { return LLVMGetMDKindIDInContext(LLVMGetGlobalContext(), Name, SLen); } unsigned LLVMGetEnumAttributeKindForName(const char *Name, size_t SLen) { return Attribute::getAttrKindFromName(StringRef(Name, SLen)); } unsigned LLVMGetLastEnumAttributeKind(void) { return Attribute::AttrKind::EndAttrKinds; } LLVMAttributeRef LLVMCreateEnumAttribute(LLVMContextRef C, unsigned KindID, uint64_t Val) { auto &Ctx = *unwrap(C); auto AttrKind = (Attribute::AttrKind)KindID; if (AttrKind == Attribute::AttrKind::ByVal) { // After r362128, byval attributes need to have a type attribute. Provide a // NULL one until a proper API is added for this. return wrap(Attribute::getWithByValType(Ctx, NULL)); } return wrap(Attribute::get(Ctx, AttrKind, Val)); } unsigned LLVMGetEnumAttributeKind(LLVMAttributeRef A) { return unwrap(A).getKindAsEnum(); } uint64_t LLVMGetEnumAttributeValue(LLVMAttributeRef A) { auto Attr = unwrap(A); if (Attr.isEnumAttribute()) return 0; return Attr.getValueAsInt(); } LLVMAttributeRef LLVMCreateStringAttribute(LLVMContextRef C, const char *K, unsigned KLength, const char *V, unsigned VLength) { return wrap(Attribute::get(*unwrap(C), StringRef(K, KLength), StringRef(V, VLength))); } const char *LLVMGetStringAttributeKind(LLVMAttributeRef A, unsigned *Length) { auto S = unwrap(A).getKindAsString(); *Length = S.size(); return S.data(); } const char *LLVMGetStringAttributeValue(LLVMAttributeRef A, unsigned *Length) { auto S = unwrap(A).getValueAsString(); *Length = S.size(); return S.data(); } LLVMBool LLVMIsEnumAttribute(LLVMAttributeRef A) { auto Attr = unwrap(A); return Attr.isEnumAttribute() || Attr.isIntAttribute(); } LLVMBool LLVMIsStringAttribute(LLVMAttributeRef A) { return unwrap(A).isStringAttribute(); } char *LLVMGetDiagInfoDescription(LLVMDiagnosticInfoRef DI) { std::string MsgStorage; raw_string_ostream Stream(MsgStorage); DiagnosticPrinterRawOStream DP(Stream); unwrap(DI)->print(DP); Stream.flush(); return LLVMCreateMessage(MsgStorage.c_str()); } LLVMDiagnosticSeverity LLVMGetDiagInfoSeverity(LLVMDiagnosticInfoRef DI) { LLVMDiagnosticSeverity severity; switch(unwrap(DI)->getSeverity()) { default: severity = LLVMDSError; break; case DS_Warning: severity = LLVMDSWarning; break; case DS_Remark: severity = LLVMDSRemark; break; case DS_Note: severity = LLVMDSNote; break; } return severity; } /*===-- Operations on modules ---------------------------------------------===*/ LLVMModuleRef LLVMModuleCreateWithName(const char *ModuleID) { return wrap(new Module(ModuleID, *GlobalContext)); } LLVMModuleRef LLVMModuleCreateWithNameInContext(const char 
*ModuleID, LLVMContextRef C) { return wrap(new Module(ModuleID, *unwrap(C))); } void LLVMDisposeModule(LLVMModuleRef M) { delete unwrap(M); } const char *LLVMGetModuleIdentifier(LLVMModuleRef M, size_t *Len) { auto &Str = unwrap(M)->getModuleIdentifier(); *Len = Str.length(); return Str.c_str(); } void LLVMSetModuleIdentifier(LLVMModuleRef M, const char *Ident, size_t Len) { unwrap(M)->setModuleIdentifier(StringRef(Ident, Len)); } const char *LLVMGetSourceFileName(LLVMModuleRef M, size_t *Len) { auto &Str = unwrap(M)->getSourceFileName(); *Len = Str.length(); return Str.c_str(); } void LLVMSetSourceFileName(LLVMModuleRef M, const char *Name, size_t Len) { unwrap(M)->setSourceFileName(StringRef(Name, Len)); } /*--.. Data layout .........................................................--*/ const char *LLVMGetDataLayoutStr(LLVMModuleRef M) { return unwrap(M)->getDataLayoutStr().c_str(); } const char *LLVMGetDataLayout(LLVMModuleRef M) { return LLVMGetDataLayoutStr(M); } void LLVMSetDataLayout(LLVMModuleRef M, const char *DataLayoutStr) { unwrap(M)->setDataLayout(DataLayoutStr); } /*--.. Target triple .......................................................--*/ const char * LLVMGetTarget(LLVMModuleRef M) { return unwrap(M)->getTargetTriple().c_str(); } void LLVMSetTarget(LLVMModuleRef M, const char *Triple) { unwrap(M)->setTargetTriple(Triple); } /*--.. Module flags ........................................................--*/ struct LLVMOpaqueModuleFlagEntry { LLVMModuleFlagBehavior Behavior; const char *Key; size_t KeyLen; LLVMMetadataRef Metadata; }; static Module::ModFlagBehavior map_to_llvmModFlagBehavior(LLVMModuleFlagBehavior Behavior) { switch (Behavior) { case LLVMModuleFlagBehaviorError: return Module::ModFlagBehavior::Error; case LLVMModuleFlagBehaviorWarning: return Module::ModFlagBehavior::Warning; case LLVMModuleFlagBehaviorRequire: return Module::ModFlagBehavior::Require; case LLVMModuleFlagBehaviorOverride: return Module::ModFlagBehavior::Override; case LLVMModuleFlagBehaviorAppend: return Module::ModFlagBehavior::Append; case LLVMModuleFlagBehaviorAppendUnique: return Module::ModFlagBehavior::AppendUnique; } llvm_unreachable("Unknown LLVMModuleFlagBehavior"); } static LLVMModuleFlagBehavior map_from_llvmModFlagBehavior(Module::ModFlagBehavior Behavior) { switch (Behavior) { case Module::ModFlagBehavior::Error: return LLVMModuleFlagBehaviorError; case Module::ModFlagBehavior::Warning: return LLVMModuleFlagBehaviorWarning; case Module::ModFlagBehavior::Require: return LLVMModuleFlagBehaviorRequire; case Module::ModFlagBehavior::Override: return LLVMModuleFlagBehaviorOverride; case Module::ModFlagBehavior::Append: return LLVMModuleFlagBehaviorAppend; case Module::ModFlagBehavior::AppendUnique: return LLVMModuleFlagBehaviorAppendUnique; default: llvm_unreachable("Unhandled Flag Behavior"); } } LLVMModuleFlagEntry *LLVMCopyModuleFlagsMetadata(LLVMModuleRef M, size_t *Len) { SmallVector MFEs; unwrap(M)->getModuleFlagsMetadata(MFEs); LLVMOpaqueModuleFlagEntry *Result = static_cast( safe_malloc(MFEs.size() * sizeof(LLVMOpaqueModuleFlagEntry))); for (unsigned i = 0; i < MFEs.size(); ++i) { const auto &ModuleFlag = MFEs[i]; Result[i].Behavior = map_from_llvmModFlagBehavior(ModuleFlag.Behavior); Result[i].Key = ModuleFlag.Key->getString().data(); Result[i].KeyLen = ModuleFlag.Key->getString().size(); Result[i].Metadata = wrap(ModuleFlag.Val); } *Len = MFEs.size(); return Result; } void LLVMDisposeModuleFlagsMetadata(LLVMModuleFlagEntry *Entries) { free(Entries); } LLVMModuleFlagBehavior 
LLVMModuleFlagEntriesGetFlagBehavior(LLVMModuleFlagEntry *Entries, unsigned Index) { LLVMOpaqueModuleFlagEntry MFE = static_cast(Entries[Index]); return MFE.Behavior; } const char *LLVMModuleFlagEntriesGetKey(LLVMModuleFlagEntry *Entries, unsigned Index, size_t *Len) { LLVMOpaqueModuleFlagEntry MFE = static_cast(Entries[Index]); *Len = MFE.KeyLen; return MFE.Key; } LLVMMetadataRef LLVMModuleFlagEntriesGetMetadata(LLVMModuleFlagEntry *Entries, unsigned Index) { LLVMOpaqueModuleFlagEntry MFE = static_cast(Entries[Index]); return MFE.Metadata; } LLVMMetadataRef LLVMGetModuleFlag(LLVMModuleRef M, const char *Key, size_t KeyLen) { return wrap(unwrap(M)->getModuleFlag({Key, KeyLen})); } void LLVMAddModuleFlag(LLVMModuleRef M, LLVMModuleFlagBehavior Behavior, const char *Key, size_t KeyLen, LLVMMetadataRef Val) { unwrap(M)->addModuleFlag(map_to_llvmModFlagBehavior(Behavior), {Key, KeyLen}, unwrap(Val)); } /*--.. Printing modules ....................................................--*/ void LLVMDumpModule(LLVMModuleRef M) { unwrap(M)->print(errs(), nullptr, /*ShouldPreserveUseListOrder=*/false, /*IsForDebug=*/true); } LLVMBool LLVMPrintModuleToFile(LLVMModuleRef M, const char *Filename, char **ErrorMessage) { std::error_code EC; raw_fd_ostream dest(Filename, EC, sys::fs::OF_Text); if (EC) { *ErrorMessage = strdup(EC.message().c_str()); return true; } unwrap(M)->print(dest, nullptr); dest.close(); if (dest.has_error()) { std::string E = "Error printing to file: " + dest.error().message(); *ErrorMessage = strdup(E.c_str()); return true; } return false; } char *LLVMPrintModuleToString(LLVMModuleRef M) { std::string buf; raw_string_ostream os(buf); unwrap(M)->print(os, nullptr); os.flush(); return strdup(buf.c_str()); } /*--.. Operations on inline assembler ......................................--*/ void LLVMSetModuleInlineAsm2(LLVMModuleRef M, const char *Asm, size_t Len) { unwrap(M)->setModuleInlineAsm(StringRef(Asm, Len)); } void LLVMSetModuleInlineAsm(LLVMModuleRef M, const char *Asm) { unwrap(M)->setModuleInlineAsm(StringRef(Asm)); } void LLVMAppendModuleInlineAsm(LLVMModuleRef M, const char *Asm, size_t Len) { unwrap(M)->appendModuleInlineAsm(StringRef(Asm, Len)); } const char *LLVMGetModuleInlineAsm(LLVMModuleRef M, size_t *Len) { auto &Str = unwrap(M)->getModuleInlineAsm(); *Len = Str.length(); return Str.c_str(); } LLVMValueRef LLVMGetInlineAsm(LLVMTypeRef Ty, char *AsmString, size_t AsmStringSize, char *Constraints, size_t ConstraintsSize, LLVMBool HasSideEffects, LLVMBool IsAlignStack, LLVMInlineAsmDialect Dialect) { InlineAsm::AsmDialect AD; switch (Dialect) { case LLVMInlineAsmDialectATT: AD = InlineAsm::AD_ATT; break; case LLVMInlineAsmDialectIntel: AD = InlineAsm::AD_Intel; break; } return wrap(InlineAsm::get(unwrap(Ty), StringRef(AsmString, AsmStringSize), StringRef(Constraints, ConstraintsSize), HasSideEffects, IsAlignStack, AD)); } /*--.. Operations on module contexts ......................................--*/ LLVMContextRef LLVMGetModuleContext(LLVMModuleRef M) { return wrap(&unwrap(M)->getContext()); } /*===-- Operations on types -----------------------------------------------===*/ /*--.. 
Operations on all types (mostly) ....................................--*/ LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty) { switch (unwrap(Ty)->getTypeID()) { case Type::VoidTyID: return LLVMVoidTypeKind; case Type::HalfTyID: return LLVMHalfTypeKind; case Type::BFloatTyID: return LLVMBFloatTypeKind; case Type::FloatTyID: return LLVMFloatTypeKind; case Type::DoubleTyID: return LLVMDoubleTypeKind; case Type::X86_FP80TyID: return LLVMX86_FP80TypeKind; case Type::FP128TyID: return LLVMFP128TypeKind; case Type::PPC_FP128TyID: return LLVMPPC_FP128TypeKind; case Type::LabelTyID: return LLVMLabelTypeKind; case Type::MetadataTyID: return LLVMMetadataTypeKind; case Type::IntegerTyID: return LLVMIntegerTypeKind; case Type::FunctionTyID: return LLVMFunctionTypeKind; case Type::StructTyID: return LLVMStructTypeKind; case Type::ArrayTyID: return LLVMArrayTypeKind; case Type::PointerTyID: return LLVMPointerTypeKind; case Type::FixedVectorTyID: return LLVMVectorTypeKind; case Type::X86_MMXTyID: return LLVMX86_MMXTypeKind; case Type::TokenTyID: return LLVMTokenTypeKind; case Type::ScalableVectorTyID: return LLVMScalableVectorTypeKind; } llvm_unreachable("Unhandled TypeID."); } LLVMBool LLVMTypeIsSized(LLVMTypeRef Ty) { return unwrap(Ty)->isSized(); } LLVMContextRef LLVMGetTypeContext(LLVMTypeRef Ty) { return wrap(&unwrap(Ty)->getContext()); } void LLVMDumpType(LLVMTypeRef Ty) { return unwrap(Ty)->print(errs(), /*IsForDebug=*/true); } char *LLVMPrintTypeToString(LLVMTypeRef Ty) { std::string buf; raw_string_ostream os(buf); if (unwrap(Ty)) unwrap(Ty)->print(os); else os << "Printing Type"; os.flush(); return strdup(buf.c_str()); } /*--.. Operations on integer types .........................................--*/ LLVMTypeRef LLVMInt1TypeInContext(LLVMContextRef C) { return (LLVMTypeRef) Type::getInt1Ty(*unwrap(C)); } LLVMTypeRef LLVMInt8TypeInContext(LLVMContextRef C) { return (LLVMTypeRef) Type::getInt8Ty(*unwrap(C)); } LLVMTypeRef LLVMInt16TypeInContext(LLVMContextRef C) { return (LLVMTypeRef) Type::getInt16Ty(*unwrap(C)); } LLVMTypeRef LLVMInt32TypeInContext(LLVMContextRef C) { return (LLVMTypeRef) Type::getInt32Ty(*unwrap(C)); } LLVMTypeRef LLVMInt64TypeInContext(LLVMContextRef C) { return (LLVMTypeRef) Type::getInt64Ty(*unwrap(C)); } LLVMTypeRef LLVMInt128TypeInContext(LLVMContextRef C) { return (LLVMTypeRef) Type::getInt128Ty(*unwrap(C)); } LLVMTypeRef LLVMIntTypeInContext(LLVMContextRef C, unsigned NumBits) { return wrap(IntegerType::get(*unwrap(C), NumBits)); } LLVMTypeRef LLVMInt1Type(void) { return LLVMInt1TypeInContext(LLVMGetGlobalContext()); } LLVMTypeRef LLVMInt8Type(void) { return LLVMInt8TypeInContext(LLVMGetGlobalContext()); } LLVMTypeRef LLVMInt16Type(void) { return LLVMInt16TypeInContext(LLVMGetGlobalContext()); } LLVMTypeRef LLVMInt32Type(void) { return LLVMInt32TypeInContext(LLVMGetGlobalContext()); } LLVMTypeRef LLVMInt64Type(void) { return LLVMInt64TypeInContext(LLVMGetGlobalContext()); } LLVMTypeRef LLVMInt128Type(void) { return LLVMInt128TypeInContext(LLVMGetGlobalContext()); } LLVMTypeRef LLVMIntType(unsigned NumBits) { return LLVMIntTypeInContext(LLVMGetGlobalContext(), NumBits); } unsigned LLVMGetIntTypeWidth(LLVMTypeRef IntegerTy) { return unwrap(IntegerTy)->getBitWidth(); } /*--.. 
Operations on real types ............................................--*/ LLVMTypeRef LLVMHalfTypeInContext(LLVMContextRef C) { return (LLVMTypeRef) Type::getHalfTy(*unwrap(C)); } LLVMTypeRef LLVMBFloatTypeInContext(LLVMContextRef C) { return (LLVMTypeRef) Type::getBFloatTy(*unwrap(C)); } LLVMTypeRef LLVMFloatTypeInContext(LLVMContextRef C) { return (LLVMTypeRef) Type::getFloatTy(*unwrap(C)); } LLVMTypeRef LLVMDoubleTypeInContext(LLVMContextRef C) { return (LLVMTypeRef) Type::getDoubleTy(*unwrap(C)); } LLVMTypeRef LLVMX86FP80TypeInContext(LLVMContextRef C) { return (LLVMTypeRef) Type::getX86_FP80Ty(*unwrap(C)); } LLVMTypeRef LLVMFP128TypeInContext(LLVMContextRef C) { return (LLVMTypeRef) Type::getFP128Ty(*unwrap(C)); } LLVMTypeRef LLVMPPCFP128TypeInContext(LLVMContextRef C) { return (LLVMTypeRef) Type::getPPC_FP128Ty(*unwrap(C)); } LLVMTypeRef LLVMX86MMXTypeInContext(LLVMContextRef C) { return (LLVMTypeRef) Type::getX86_MMXTy(*unwrap(C)); } LLVMTypeRef LLVMHalfType(void) { return LLVMHalfTypeInContext(LLVMGetGlobalContext()); } LLVMTypeRef LLVMBFloatType(void) { return LLVMBFloatTypeInContext(LLVMGetGlobalContext()); } LLVMTypeRef LLVMFloatType(void) { return LLVMFloatTypeInContext(LLVMGetGlobalContext()); } LLVMTypeRef LLVMDoubleType(void) { return LLVMDoubleTypeInContext(LLVMGetGlobalContext()); } LLVMTypeRef LLVMX86FP80Type(void) { return LLVMX86FP80TypeInContext(LLVMGetGlobalContext()); } LLVMTypeRef LLVMFP128Type(void) { return LLVMFP128TypeInContext(LLVMGetGlobalContext()); } LLVMTypeRef LLVMPPCFP128Type(void) { return LLVMPPCFP128TypeInContext(LLVMGetGlobalContext()); } LLVMTypeRef LLVMX86MMXType(void) { return LLVMX86MMXTypeInContext(LLVMGetGlobalContext()); } /*--.. Operations on function types ........................................--*/ LLVMTypeRef LLVMFunctionType(LLVMTypeRef ReturnType, LLVMTypeRef *ParamTypes, unsigned ParamCount, LLVMBool IsVarArg) { ArrayRef Tys(unwrap(ParamTypes), ParamCount); return wrap(FunctionType::get(unwrap(ReturnType), Tys, IsVarArg != 0)); } LLVMBool LLVMIsFunctionVarArg(LLVMTypeRef FunctionTy) { return unwrap(FunctionTy)->isVarArg(); } LLVMTypeRef LLVMGetReturnType(LLVMTypeRef FunctionTy) { return wrap(unwrap(FunctionTy)->getReturnType()); } unsigned LLVMCountParamTypes(LLVMTypeRef FunctionTy) { return unwrap(FunctionTy)->getNumParams(); } void LLVMGetParamTypes(LLVMTypeRef FunctionTy, LLVMTypeRef *Dest) { FunctionType *Ty = unwrap(FunctionTy); for (FunctionType::param_iterator I = Ty->param_begin(), E = Ty->param_end(); I != E; ++I) *Dest++ = wrap(*I); } /*--.. 
Operations on struct types ..........................................--*/ LLVMTypeRef LLVMStructTypeInContext(LLVMContextRef C, LLVMTypeRef *ElementTypes, unsigned ElementCount, LLVMBool Packed) { ArrayRef Tys(unwrap(ElementTypes), ElementCount); return wrap(StructType::get(*unwrap(C), Tys, Packed != 0)); } LLVMTypeRef LLVMStructType(LLVMTypeRef *ElementTypes, unsigned ElementCount, LLVMBool Packed) { return LLVMStructTypeInContext(LLVMGetGlobalContext(), ElementTypes, ElementCount, Packed); } LLVMTypeRef LLVMStructCreateNamed(LLVMContextRef C, const char *Name) { return wrap(StructType::create(*unwrap(C), Name)); } const char *LLVMGetStructName(LLVMTypeRef Ty) { StructType *Type = unwrap(Ty); if (!Type->hasName()) return nullptr; return Type->getName().data(); } void LLVMStructSetBody(LLVMTypeRef StructTy, LLVMTypeRef *ElementTypes, unsigned ElementCount, LLVMBool Packed) { ArrayRef Tys(unwrap(ElementTypes), ElementCount); unwrap(StructTy)->setBody(Tys, Packed != 0); } unsigned LLVMCountStructElementTypes(LLVMTypeRef StructTy) { return unwrap(StructTy)->getNumElements(); } void LLVMGetStructElementTypes(LLVMTypeRef StructTy, LLVMTypeRef *Dest) { StructType *Ty = unwrap(StructTy); for (StructType::element_iterator I = Ty->element_begin(), E = Ty->element_end(); I != E; ++I) *Dest++ = wrap(*I); } LLVMTypeRef LLVMStructGetTypeAtIndex(LLVMTypeRef StructTy, unsigned i) { StructType *Ty = unwrap(StructTy); return wrap(Ty->getTypeAtIndex(i)); } LLVMBool LLVMIsPackedStruct(LLVMTypeRef StructTy) { return unwrap(StructTy)->isPacked(); } LLVMBool LLVMIsOpaqueStruct(LLVMTypeRef StructTy) { return unwrap(StructTy)->isOpaque(); } LLVMBool LLVMIsLiteralStruct(LLVMTypeRef StructTy) { return unwrap(StructTy)->isLiteral(); } LLVMTypeRef LLVMGetTypeByName(LLVMModuleRef M, const char *Name) { return wrap(unwrap(M)->getTypeByName(Name)); } /*--.. Operations on array, pointer, and vector types (sequence types) .....--*/ void LLVMGetSubtypes(LLVMTypeRef Tp, LLVMTypeRef *Arr) { int i = 0; for (auto *T : unwrap(Tp)->subtypes()) { Arr[i] = wrap(T); i++; } } LLVMTypeRef LLVMArrayType(LLVMTypeRef ElementType, unsigned ElementCount) { return wrap(ArrayType::get(unwrap(ElementType), ElementCount)); } LLVMTypeRef LLVMPointerType(LLVMTypeRef ElementType, unsigned AddressSpace) { return wrap(PointerType::get(unwrap(ElementType), AddressSpace)); } LLVMTypeRef LLVMVectorType(LLVMTypeRef ElementType, unsigned ElementCount) { return wrap(FixedVectorType::get(unwrap(ElementType), ElementCount)); } LLVMTypeRef LLVMGetElementType(LLVMTypeRef WrappedTy) { auto *Ty = unwrap(WrappedTy); if (auto *PTy = dyn_cast(Ty)) return wrap(PTy->getElementType()); if (auto *ATy = dyn_cast(Ty)) return wrap(ATy->getElementType()); return wrap(cast(Ty)->getElementType()); } unsigned LLVMGetNumContainedTypes(LLVMTypeRef Tp) { return unwrap(Tp)->getNumContainedTypes(); } unsigned LLVMGetArrayLength(LLVMTypeRef ArrayTy) { return unwrap(ArrayTy)->getNumElements(); } unsigned LLVMGetPointerAddressSpace(LLVMTypeRef PointerTy) { return unwrap(PointerTy)->getAddressSpace(); } unsigned LLVMGetVectorSize(LLVMTypeRef VectorTy) { return unwrap(VectorTy)->getNumElements(); } /*--.. 
Operations on other types ...........................................--*/ LLVMTypeRef LLVMVoidTypeInContext(LLVMContextRef C) { return wrap(Type::getVoidTy(*unwrap(C))); } LLVMTypeRef LLVMLabelTypeInContext(LLVMContextRef C) { return wrap(Type::getLabelTy(*unwrap(C))); } LLVMTypeRef LLVMTokenTypeInContext(LLVMContextRef C) { return wrap(Type::getTokenTy(*unwrap(C))); } LLVMTypeRef LLVMMetadataTypeInContext(LLVMContextRef C) { return wrap(Type::getMetadataTy(*unwrap(C))); } LLVMTypeRef LLVMVoidType(void) { return LLVMVoidTypeInContext(LLVMGetGlobalContext()); } LLVMTypeRef LLVMLabelType(void) { return LLVMLabelTypeInContext(LLVMGetGlobalContext()); } /*===-- Operations on values ----------------------------------------------===*/ /*--.. Operations on all values ............................................--*/ LLVMTypeRef LLVMTypeOf(LLVMValueRef Val) { return wrap(unwrap(Val)->getType()); } LLVMValueKind LLVMGetValueKind(LLVMValueRef Val) { switch(unwrap(Val)->getValueID()) { #define HANDLE_VALUE(Name) \ case Value::Name##Val: \ return LLVM##Name##ValueKind; #include "llvm/IR/Value.def" default: return LLVMInstructionValueKind; } } const char *LLVMGetValueName2(LLVMValueRef Val, size_t *Length) { auto *V = unwrap(Val); *Length = V->getName().size(); return V->getName().data(); } void LLVMSetValueName2(LLVMValueRef Val, const char *Name, size_t NameLen) { unwrap(Val)->setName(StringRef(Name, NameLen)); } const char *LLVMGetValueName(LLVMValueRef Val) { return unwrap(Val)->getName().data(); } void LLVMSetValueName(LLVMValueRef Val, const char *Name) { unwrap(Val)->setName(Name); } void LLVMDumpValue(LLVMValueRef Val) { unwrap(Val)->print(errs(), /*IsForDebug=*/true); } char* LLVMPrintValueToString(LLVMValueRef Val) { std::string buf; raw_string_ostream os(buf); if (unwrap(Val)) unwrap(Val)->print(os); else os << "Printing Value"; os.flush(); return strdup(buf.c_str()); } void LLVMReplaceAllUsesWith(LLVMValueRef OldVal, LLVMValueRef NewVal) { unwrap(OldVal)->replaceAllUsesWith(unwrap(NewVal)); } int LLVMHasMetadata(LLVMValueRef Inst) { return unwrap(Inst)->hasMetadata(); } LLVMValueRef LLVMGetMetadata(LLVMValueRef Inst, unsigned KindID) { auto *I = unwrap(Inst); assert(I && "Expected instruction"); if (auto *MD = I->getMetadata(KindID)) return wrap(MetadataAsValue::get(I->getContext(), MD)); return nullptr; } // MetadataAsValue uses a canonical format which strips the actual MDNode for // MDNode with just a single constant value, storing just a ConstantAsMetadata // This undoes this canonicalization, reconstructing the MDNode. static MDNode *extractMDNode(MetadataAsValue *MAV) { Metadata *MD = MAV->getMetadata(); assert((isa(MD) || isa(MD)) && "Expected a metadata node or a canonicalized constant"); if (MDNode *N = dyn_cast(MD)) return N; return MDNode::get(MAV->getContext(), MD); } void LLVMSetMetadata(LLVMValueRef Inst, unsigned KindID, LLVMValueRef Val) { MDNode *N = Val ? 
extractMDNode(unwrap(Val)) : nullptr; unwrap(Inst)->setMetadata(KindID, N); } struct LLVMOpaqueValueMetadataEntry { unsigned Kind; LLVMMetadataRef Metadata; }; using MetadataEntries = SmallVectorImpl>; static LLVMValueMetadataEntry * llvm_getMetadata(size_t *NumEntries, llvm::function_ref AccessMD) { SmallVector, 8> MVEs; AccessMD(MVEs); LLVMOpaqueValueMetadataEntry *Result = static_cast( safe_malloc(MVEs.size() * sizeof(LLVMOpaqueValueMetadataEntry))); for (unsigned i = 0; i < MVEs.size(); ++i) { const auto &ModuleFlag = MVEs[i]; Result[i].Kind = ModuleFlag.first; Result[i].Metadata = wrap(ModuleFlag.second); } *NumEntries = MVEs.size(); return Result; } LLVMValueMetadataEntry * LLVMInstructionGetAllMetadataOtherThanDebugLoc(LLVMValueRef Value, size_t *NumEntries) { return llvm_getMetadata(NumEntries, [&Value](MetadataEntries &Entries) { unwrap(Value)->getAllMetadata(Entries); }); } /*--.. Conversion functions ................................................--*/ #define LLVM_DEFINE_VALUE_CAST(name) \ LLVMValueRef LLVMIsA##name(LLVMValueRef Val) { \ return wrap(static_cast(dyn_cast_or_null(unwrap(Val)))); \ } LLVM_FOR_EACH_VALUE_SUBCLASS(LLVM_DEFINE_VALUE_CAST) LLVMValueRef LLVMIsAMDNode(LLVMValueRef Val) { if (auto *MD = dyn_cast_or_null(unwrap(Val))) if (isa(MD->getMetadata()) || isa(MD->getMetadata())) return Val; return nullptr; } LLVMValueRef LLVMIsAMDString(LLVMValueRef Val) { if (auto *MD = dyn_cast_or_null(unwrap(Val))) if (isa(MD->getMetadata())) return Val; return nullptr; } /*--.. Operations on Uses ..................................................--*/ LLVMUseRef LLVMGetFirstUse(LLVMValueRef Val) { Value *V = unwrap(Val); Value::use_iterator I = V->use_begin(); if (I == V->use_end()) return nullptr; return wrap(&*I); } LLVMUseRef LLVMGetNextUse(LLVMUseRef U) { Use *Next = unwrap(U)->getNext(); if (Next) return wrap(Next); return nullptr; } LLVMValueRef LLVMGetUser(LLVMUseRef U) { return wrap(unwrap(U)->getUser()); } LLVMValueRef LLVMGetUsedValue(LLVMUseRef U) { return wrap(unwrap(U)->get()); } /*--.. Operations on Users .................................................--*/ static LLVMValueRef getMDNodeOperandImpl(LLVMContext &Context, const MDNode *N, unsigned Index) { Metadata *Op = N->getOperand(Index); if (!Op) return nullptr; if (auto *C = dyn_cast(Op)) return wrap(C->getValue()); return wrap(MetadataAsValue::get(Context, Op)); } LLVMValueRef LLVMGetOperand(LLVMValueRef Val, unsigned Index) { Value *V = unwrap(Val); if (auto *MD = dyn_cast(V)) { if (auto *L = dyn_cast(MD->getMetadata())) { assert(Index == 0 && "Function-local metadata can only have one operand"); return wrap(L->getValue()); } return getMDNodeOperandImpl(V->getContext(), cast(MD->getMetadata()), Index); } return wrap(cast(V)->getOperand(Index)); } LLVMUseRef LLVMGetOperandUse(LLVMValueRef Val, unsigned Index) { Value *V = unwrap(Val); return wrap(&cast(V)->getOperandUse(Index)); } void LLVMSetOperand(LLVMValueRef Val, unsigned Index, LLVMValueRef Op) { unwrap(Val)->setOperand(Index, unwrap(Op)); } int LLVMGetNumOperands(LLVMValueRef Val) { Value *V = unwrap(Val); if (isa(V)) return LLVMGetMDNodeNumOperands(Val); return cast(V)->getNumOperands(); } /*--.. 
Operations on constants of any type .................................--*/ LLVMValueRef LLVMConstNull(LLVMTypeRef Ty) { return wrap(Constant::getNullValue(unwrap(Ty))); } LLVMValueRef LLVMConstAllOnes(LLVMTypeRef Ty) { return wrap(Constant::getAllOnesValue(unwrap(Ty))); } LLVMValueRef LLVMGetUndef(LLVMTypeRef Ty) { return wrap(UndefValue::get(unwrap(Ty))); } LLVMBool LLVMIsConstant(LLVMValueRef Ty) { return isa(unwrap(Ty)); } LLVMBool LLVMIsNull(LLVMValueRef Val) { if (Constant *C = dyn_cast(unwrap(Val))) return C->isNullValue(); return false; } LLVMBool LLVMIsUndef(LLVMValueRef Val) { return isa(unwrap(Val)); } LLVMValueRef LLVMConstPointerNull(LLVMTypeRef Ty) { return wrap(ConstantPointerNull::get(unwrap(Ty))); } /*--.. Operations on metadata nodes ........................................--*/ LLVMMetadataRef LLVMMDStringInContext2(LLVMContextRef C, const char *Str, size_t SLen) { return wrap(MDString::get(*unwrap(C), StringRef(Str, SLen))); } LLVMMetadataRef LLVMMDNodeInContext2(LLVMContextRef C, LLVMMetadataRef *MDs, size_t Count) { return wrap(MDNode::get(*unwrap(C), ArrayRef(unwrap(MDs), Count))); } LLVMValueRef LLVMMDStringInContext(LLVMContextRef C, const char *Str, unsigned SLen) { LLVMContext &Context = *unwrap(C); return wrap(MetadataAsValue::get( Context, MDString::get(Context, StringRef(Str, SLen)))); } LLVMValueRef LLVMMDString(const char *Str, unsigned SLen) { return LLVMMDStringInContext(LLVMGetGlobalContext(), Str, SLen); } LLVMValueRef LLVMMDNodeInContext(LLVMContextRef C, LLVMValueRef *Vals, unsigned Count) { LLVMContext &Context = *unwrap(C); SmallVector MDs; for (auto *OV : makeArrayRef(Vals, Count)) { Value *V = unwrap(OV); Metadata *MD; if (!V) MD = nullptr; else if (auto *C = dyn_cast(V)) MD = ConstantAsMetadata::get(C); else if (auto *MDV = dyn_cast(V)) { MD = MDV->getMetadata(); assert(!isa(MD) && "Unexpected function-local metadata " "outside of direct argument to call"); } else { // This is function-local metadata. Pretend to make an MDNode. 
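      // (Sketch of what this path produces, assuming Ctx is the enclosing
      // LLVMContextRef and Inst is an LLVMValueRef for some instruction
      // result %x:
      //    LLVMValueRef Ops[] = { Inst };
      //    LLVMValueRef MD   = LLVMMDNodeInContext(Ctx, Ops, 1);
      //  MD wraps LocalAsMetadata for %x directly; no MDNode is created for
      //  the single function-local operand.)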
assert(Count == 1 && "Expected only one operand to function-local metadata"); return wrap(MetadataAsValue::get(Context, LocalAsMetadata::get(V))); } MDs.push_back(MD); } return wrap(MetadataAsValue::get(Context, MDNode::get(Context, MDs))); } LLVMValueRef LLVMMDNode(LLVMValueRef *Vals, unsigned Count) { return LLVMMDNodeInContext(LLVMGetGlobalContext(), Vals, Count); } LLVMValueRef LLVMMetadataAsValue(LLVMContextRef C, LLVMMetadataRef MD) { return wrap(MetadataAsValue::get(*unwrap(C), unwrap(MD))); } LLVMMetadataRef LLVMValueAsMetadata(LLVMValueRef Val) { auto *V = unwrap(Val); if (auto *C = dyn_cast(V)) return wrap(ConstantAsMetadata::get(C)); if (auto *MAV = dyn_cast(V)) return wrap(MAV->getMetadata()); return wrap(ValueAsMetadata::get(V)); } const char *LLVMGetMDString(LLVMValueRef V, unsigned *Length) { if (const auto *MD = dyn_cast(unwrap(V))) if (const MDString *S = dyn_cast(MD->getMetadata())) { *Length = S->getString().size(); return S->getString().data(); } *Length = 0; return nullptr; } unsigned LLVMGetMDNodeNumOperands(LLVMValueRef V) { auto *MD = cast(unwrap(V)); if (isa(MD->getMetadata())) return 1; return cast(MD->getMetadata())->getNumOperands(); } LLVMNamedMDNodeRef LLVMGetFirstNamedMetadata(LLVMModuleRef M) { Module *Mod = unwrap(M); Module::named_metadata_iterator I = Mod->named_metadata_begin(); if (I == Mod->named_metadata_end()) return nullptr; return wrap(&*I); } LLVMNamedMDNodeRef LLVMGetLastNamedMetadata(LLVMModuleRef M) { Module *Mod = unwrap(M); Module::named_metadata_iterator I = Mod->named_metadata_end(); if (I == Mod->named_metadata_begin()) return nullptr; return wrap(&*--I); } LLVMNamedMDNodeRef LLVMGetNextNamedMetadata(LLVMNamedMDNodeRef NMD) { NamedMDNode *NamedNode = unwrap(NMD); Module::named_metadata_iterator I(NamedNode); if (++I == NamedNode->getParent()->named_metadata_end()) return nullptr; return wrap(&*I); } LLVMNamedMDNodeRef LLVMGetPreviousNamedMetadata(LLVMNamedMDNodeRef NMD) { NamedMDNode *NamedNode = unwrap(NMD); Module::named_metadata_iterator I(NamedNode); if (I == NamedNode->getParent()->named_metadata_begin()) return nullptr; return wrap(&*--I); } LLVMNamedMDNodeRef LLVMGetNamedMetadata(LLVMModuleRef M, const char *Name, size_t NameLen) { return wrap(unwrap(M)->getNamedMetadata(StringRef(Name, NameLen))); } LLVMNamedMDNodeRef LLVMGetOrInsertNamedMetadata(LLVMModuleRef M, const char *Name, size_t NameLen) { return wrap(unwrap(M)->getOrInsertNamedMetadata({Name, NameLen})); } const char *LLVMGetNamedMetadataName(LLVMNamedMDNodeRef NMD, size_t *NameLen) { NamedMDNode *NamedNode = unwrap(NMD); *NameLen = NamedNode->getName().size(); return NamedNode->getName().data(); } void LLVMGetMDNodeOperands(LLVMValueRef V, LLVMValueRef *Dest) { auto *MD = cast(unwrap(V)); if (auto *MDV = dyn_cast(MD->getMetadata())) { *Dest = wrap(MDV->getValue()); return; } const auto *N = cast(MD->getMetadata()); const unsigned numOperands = N->getNumOperands(); LLVMContext &Context = unwrap(V)->getContext(); for (unsigned i = 0; i < numOperands; i++) Dest[i] = getMDNodeOperandImpl(Context, N, i); } unsigned LLVMGetNamedMetadataNumOperands(LLVMModuleRef M, const char *Name) { if (NamedMDNode *N = unwrap(M)->getNamedMetadata(Name)) { return N->getNumOperands(); } return 0; } void LLVMGetNamedMetadataOperands(LLVMModuleRef M, const char *Name, LLVMValueRef *Dest) { NamedMDNode *N = unwrap(M)->getNamedMetadata(Name); if (!N) return; LLVMContext &Context = unwrap(M)->getContext(); for (unsigned i=0;igetNumOperands();i++) Dest[i] = wrap(MetadataAsValue::get(Context, 
N->getOperand(i))); } void LLVMAddNamedMetadataOperand(LLVMModuleRef M, const char *Name, LLVMValueRef Val) { NamedMDNode *N = unwrap(M)->getOrInsertNamedMetadata(Name); if (!N) return; if (!Val) return; N->addOperand(extractMDNode(unwrap(Val))); } const char *LLVMGetDebugLocDirectory(LLVMValueRef Val, unsigned *Length) { if (!Length) return nullptr; StringRef S; if (const auto *I = dyn_cast(unwrap(Val))) { if (const auto &DL = I->getDebugLoc()) { S = DL->getDirectory(); } } else if (const auto *GV = dyn_cast(unwrap(Val))) { SmallVector GVEs; GV->getDebugInfo(GVEs); if (GVEs.size()) if (const DIGlobalVariable *DGV = GVEs[0]->getVariable()) S = DGV->getDirectory(); } else if (const auto *F = dyn_cast(unwrap(Val))) { if (const DISubprogram *DSP = F->getSubprogram()) S = DSP->getDirectory(); } else { assert(0 && "Expected Instruction, GlobalVariable or Function"); return nullptr; } *Length = S.size(); return S.data(); } const char *LLVMGetDebugLocFilename(LLVMValueRef Val, unsigned *Length) { if (!Length) return nullptr; StringRef S; if (const auto *I = dyn_cast(unwrap(Val))) { if (const auto &DL = I->getDebugLoc()) { S = DL->getFilename(); } } else if (const auto *GV = dyn_cast(unwrap(Val))) { SmallVector GVEs; GV->getDebugInfo(GVEs); if (GVEs.size()) if (const DIGlobalVariable *DGV = GVEs[0]->getVariable()) S = DGV->getFilename(); } else if (const auto *F = dyn_cast(unwrap(Val))) { if (const DISubprogram *DSP = F->getSubprogram()) S = DSP->getFilename(); } else { assert(0 && "Expected Instruction, GlobalVariable or Function"); return nullptr; } *Length = S.size(); return S.data(); } unsigned LLVMGetDebugLocLine(LLVMValueRef Val) { unsigned L = 0; if (const auto *I = dyn_cast(unwrap(Val))) { if (const auto &DL = I->getDebugLoc()) { L = DL->getLine(); } } else if (const auto *GV = dyn_cast(unwrap(Val))) { SmallVector GVEs; GV->getDebugInfo(GVEs); if (GVEs.size()) if (const DIGlobalVariable *DGV = GVEs[0]->getVariable()) L = DGV->getLine(); } else if (const auto *F = dyn_cast(unwrap(Val))) { if (const DISubprogram *DSP = F->getSubprogram()) L = DSP->getLine(); } else { assert(0 && "Expected Instruction, GlobalVariable or Function"); return -1; } return L; } unsigned LLVMGetDebugLocColumn(LLVMValueRef Val) { unsigned C = 0; if (const auto *I = dyn_cast(unwrap(Val))) if (const auto &DL = I->getDebugLoc()) C = DL->getColumn(); return C; } /*--.. 
Operations on scalar constants ......................................--*/ LLVMValueRef LLVMConstInt(LLVMTypeRef IntTy, unsigned long long N, LLVMBool SignExtend) { return wrap(ConstantInt::get(unwrap(IntTy), N, SignExtend != 0)); } LLVMValueRef LLVMConstIntOfArbitraryPrecision(LLVMTypeRef IntTy, unsigned NumWords, const uint64_t Words[]) { IntegerType *Ty = unwrap(IntTy); return wrap(ConstantInt::get(Ty->getContext(), APInt(Ty->getBitWidth(), makeArrayRef(Words, NumWords)))); } LLVMValueRef LLVMConstIntOfString(LLVMTypeRef IntTy, const char Str[], uint8_t Radix) { return wrap(ConstantInt::get(unwrap(IntTy), StringRef(Str), Radix)); } LLVMValueRef LLVMConstIntOfStringAndSize(LLVMTypeRef IntTy, const char Str[], unsigned SLen, uint8_t Radix) { return wrap(ConstantInt::get(unwrap(IntTy), StringRef(Str, SLen), Radix)); } LLVMValueRef LLVMConstReal(LLVMTypeRef RealTy, double N) { return wrap(ConstantFP::get(unwrap(RealTy), N)); } LLVMValueRef LLVMConstRealOfString(LLVMTypeRef RealTy, const char *Text) { return wrap(ConstantFP::get(unwrap(RealTy), StringRef(Text))); } LLVMValueRef LLVMConstRealOfStringAndSize(LLVMTypeRef RealTy, const char Str[], unsigned SLen) { return wrap(ConstantFP::get(unwrap(RealTy), StringRef(Str, SLen))); } unsigned long long LLVMConstIntGetZExtValue(LLVMValueRef ConstantVal) { return unwrap(ConstantVal)->getZExtValue(); } long long LLVMConstIntGetSExtValue(LLVMValueRef ConstantVal) { return unwrap(ConstantVal)->getSExtValue(); } double LLVMConstRealGetDouble(LLVMValueRef ConstantVal, LLVMBool *LosesInfo) { ConstantFP *cFP = unwrap(ConstantVal) ; Type *Ty = cFP->getType(); if (Ty->isFloatTy()) { *LosesInfo = false; return cFP->getValueAPF().convertToFloat(); } if (Ty->isDoubleTy()) { *LosesInfo = false; return cFP->getValueAPF().convertToDouble(); } bool APFLosesInfo; APFloat APF = cFP->getValueAPF(); APF.convert(APFloat::IEEEdouble(), APFloat::rmNearestTiesToEven, &APFLosesInfo); *LosesInfo = APFLosesInfo; return APF.convertToDouble(); } /*--.. Operations on composite constants ...................................--*/ LLVMValueRef LLVMConstStringInContext(LLVMContextRef C, const char *Str, unsigned Length, LLVMBool DontNullTerminate) { /* Inverted the sense of AddNull because ', 0)' is a better mnemonic for null termination than ', 1)'. 
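     For example, LLVMConstStringInContext(C, "hi", 2, 0) yields the
     three-byte constant c"hi\00" (a trailing NUL is appended), while passing
     1 for DontNullTerminate yields the two-byte c"hi".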
*/ return wrap(ConstantDataArray::getString(*unwrap(C), StringRef(Str, Length), DontNullTerminate == 0)); } LLVMValueRef LLVMConstString(const char *Str, unsigned Length, LLVMBool DontNullTerminate) { return LLVMConstStringInContext(LLVMGetGlobalContext(), Str, Length, DontNullTerminate); } LLVMValueRef LLVMGetElementAsConstant(LLVMValueRef C, unsigned idx) { return wrap(unwrap(C)->getElementAsConstant(idx)); } LLVMBool LLVMIsConstantString(LLVMValueRef C) { return unwrap(C)->isString(); } const char *LLVMGetAsString(LLVMValueRef C, size_t *Length) { StringRef Str = unwrap(C)->getAsString(); *Length = Str.size(); return Str.data(); } LLVMValueRef LLVMConstArray(LLVMTypeRef ElementTy, LLVMValueRef *ConstantVals, unsigned Length) { ArrayRef V(unwrap(ConstantVals, Length), Length); return wrap(ConstantArray::get(ArrayType::get(unwrap(ElementTy), Length), V)); } LLVMValueRef LLVMConstStructInContext(LLVMContextRef C, LLVMValueRef *ConstantVals, unsigned Count, LLVMBool Packed) { Constant **Elements = unwrap(ConstantVals, Count); return wrap(ConstantStruct::getAnon(*unwrap(C), makeArrayRef(Elements, Count), Packed != 0)); } LLVMValueRef LLVMConstStruct(LLVMValueRef *ConstantVals, unsigned Count, LLVMBool Packed) { return LLVMConstStructInContext(LLVMGetGlobalContext(), ConstantVals, Count, Packed); } LLVMValueRef LLVMConstNamedStruct(LLVMTypeRef StructTy, LLVMValueRef *ConstantVals, unsigned Count) { Constant **Elements = unwrap(ConstantVals, Count); StructType *Ty = cast(unwrap(StructTy)); return wrap(ConstantStruct::get(Ty, makeArrayRef(Elements, Count))); } LLVMValueRef LLVMConstVector(LLVMValueRef *ScalarConstantVals, unsigned Size) { return wrap(ConstantVector::get(makeArrayRef( unwrap(ScalarConstantVals, Size), Size))); } /*-- Opcode mapping */ static LLVMOpcode map_to_llvmopcode(int opcode) { switch (opcode) { default: llvm_unreachable("Unhandled Opcode."); #define HANDLE_INST(num, opc, clas) case num: return LLVM##opc; #include "llvm/IR/Instruction.def" #undef HANDLE_INST } } static int map_from_llvmopcode(LLVMOpcode code) { switch (code) { #define HANDLE_INST(num, opc, clas) case LLVM##opc: return num; #include "llvm/IR/Instruction.def" #undef HANDLE_INST } llvm_unreachable("Unhandled Opcode."); } /*--.. 
Constant expressions ................................................--*/ LLVMOpcode LLVMGetConstOpcode(LLVMValueRef ConstantVal) { return map_to_llvmopcode(unwrap(ConstantVal)->getOpcode()); } LLVMValueRef LLVMAlignOf(LLVMTypeRef Ty) { return wrap(ConstantExpr::getAlignOf(unwrap(Ty))); } LLVMValueRef LLVMSizeOf(LLVMTypeRef Ty) { return wrap(ConstantExpr::getSizeOf(unwrap(Ty))); } LLVMValueRef LLVMConstNeg(LLVMValueRef ConstantVal) { return wrap(ConstantExpr::getNeg(unwrap(ConstantVal))); } LLVMValueRef LLVMConstNSWNeg(LLVMValueRef ConstantVal) { return wrap(ConstantExpr::getNSWNeg(unwrap(ConstantVal))); } LLVMValueRef LLVMConstNUWNeg(LLVMValueRef ConstantVal) { return wrap(ConstantExpr::getNUWNeg(unwrap(ConstantVal))); } LLVMValueRef LLVMConstFNeg(LLVMValueRef ConstantVal) { return wrap(ConstantExpr::getFNeg(unwrap(ConstantVal))); } LLVMValueRef LLVMConstNot(LLVMValueRef ConstantVal) { return wrap(ConstantExpr::getNot(unwrap(ConstantVal))); } LLVMValueRef LLVMConstAdd(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getAdd(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstNSWAdd(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getNSWAdd(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstNUWAdd(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getNUWAdd(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstFAdd(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getFAdd(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstSub(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getSub(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstNSWSub(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getNSWSub(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstNUWSub(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getNUWSub(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstFSub(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getFSub(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstMul(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getMul(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstNSWMul(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getNSWMul(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstNUWMul(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getNUWMul(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstFMul(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getFMul(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstUDiv(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getUDiv(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstExactUDiv(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getExactUDiv(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstSDiv(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getSDiv(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstExactSDiv(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getExactSDiv(unwrap(LHSConstant), 
unwrap(RHSConstant))); } LLVMValueRef LLVMConstFDiv(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getFDiv(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstURem(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getURem(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstSRem(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getSRem(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstFRem(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getFRem(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstAnd(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getAnd(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstOr(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getOr(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstXor(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getXor(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstICmp(LLVMIntPredicate Predicate, LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getICmp(Predicate, unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstFCmp(LLVMRealPredicate Predicate, LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getFCmp(Predicate, unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstShl(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getShl(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstLShr(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getLShr(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstAShr(LLVMValueRef LHSConstant, LLVMValueRef RHSConstant) { return wrap(ConstantExpr::getAShr(unwrap(LHSConstant), unwrap(RHSConstant))); } LLVMValueRef LLVMConstGEP(LLVMValueRef ConstantVal, LLVMValueRef *ConstantIndices, unsigned NumIndices) { ArrayRef IdxList(unwrap(ConstantIndices, NumIndices), NumIndices); Constant *Val = unwrap(ConstantVal); Type *Ty = cast(Val->getType()->getScalarType())->getElementType(); return wrap(ConstantExpr::getGetElementPtr(Ty, Val, IdxList)); } LLVMValueRef LLVMConstInBoundsGEP(LLVMValueRef ConstantVal, LLVMValueRef *ConstantIndices, unsigned NumIndices) { ArrayRef IdxList(unwrap(ConstantIndices, NumIndices), NumIndices); Constant *Val = unwrap(ConstantVal); Type *Ty = cast(Val->getType()->getScalarType())->getElementType(); return wrap(ConstantExpr::getInBoundsGetElementPtr(Ty, Val, IdxList)); } LLVMValueRef LLVMConstTrunc(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getTrunc(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstSExt(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getSExt(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstZExt(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getZExt(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstFPTrunc(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getFPTrunc(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstFPExt(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getFPExtend(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstUIToFP(LLVMValueRef ConstantVal, 
LLVMTypeRef ToType) { return wrap(ConstantExpr::getUIToFP(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstSIToFP(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getSIToFP(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstFPToUI(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getFPToUI(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstFPToSI(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getFPToSI(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstPtrToInt(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getPtrToInt(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstIntToPtr(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getIntToPtr(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstBitCast(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getBitCast(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstAddrSpaceCast(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getAddrSpaceCast(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstZExtOrBitCast(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getZExtOrBitCast(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstSExtOrBitCast(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getSExtOrBitCast(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstTruncOrBitCast(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getTruncOrBitCast(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstPointerCast(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getPointerCast(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstIntCast(LLVMValueRef ConstantVal, LLVMTypeRef ToType, LLVMBool isSigned) { return wrap(ConstantExpr::getIntegerCast(unwrap(ConstantVal), unwrap(ToType), isSigned)); } LLVMValueRef LLVMConstFPCast(LLVMValueRef ConstantVal, LLVMTypeRef ToType) { return wrap(ConstantExpr::getFPCast(unwrap(ConstantVal), unwrap(ToType))); } LLVMValueRef LLVMConstSelect(LLVMValueRef ConstantCondition, LLVMValueRef ConstantIfTrue, LLVMValueRef ConstantIfFalse) { return wrap(ConstantExpr::getSelect(unwrap(ConstantCondition), unwrap(ConstantIfTrue), unwrap(ConstantIfFalse))); } LLVMValueRef LLVMConstExtractElement(LLVMValueRef VectorConstant, LLVMValueRef IndexConstant) { return wrap(ConstantExpr::getExtractElement(unwrap(VectorConstant), unwrap(IndexConstant))); } LLVMValueRef LLVMConstInsertElement(LLVMValueRef VectorConstant, LLVMValueRef ElementValueConstant, LLVMValueRef IndexConstant) { return wrap(ConstantExpr::getInsertElement(unwrap(VectorConstant), unwrap(ElementValueConstant), unwrap(IndexConstant))); } LLVMValueRef LLVMConstShuffleVector(LLVMValueRef VectorAConstant, LLVMValueRef VectorBConstant, LLVMValueRef MaskConstant) { SmallVector IntMask; ShuffleVectorInst::getShuffleMask(unwrap(MaskConstant), IntMask); return wrap(ConstantExpr::getShuffleVector(unwrap(VectorAConstant), unwrap(VectorBConstant), IntMask)); } LLVMValueRef LLVMConstExtractValue(LLVMValueRef AggConstant, unsigned *IdxList, unsigned NumIdx) { return wrap(ConstantExpr::getExtractValue(unwrap(AggConstant), makeArrayRef(IdxList, NumIdx))); } LLVMValueRef LLVMConstInsertValue(LLVMValueRef AggConstant, LLVMValueRef ElementValueConstant, unsigned *IdxList, unsigned NumIdx) { return 
wrap(ConstantExpr::getInsertValue(unwrap(AggConstant), unwrap(ElementValueConstant), makeArrayRef(IdxList, NumIdx))); } LLVMValueRef LLVMConstInlineAsm(LLVMTypeRef Ty, const char *AsmString, const char *Constraints, LLVMBool HasSideEffects, LLVMBool IsAlignStack) { return wrap(InlineAsm::get(dyn_cast(unwrap(Ty)), AsmString, Constraints, HasSideEffects, IsAlignStack)); } LLVMValueRef LLVMBlockAddress(LLVMValueRef F, LLVMBasicBlockRef BB) { return wrap(BlockAddress::get(unwrap(F), unwrap(BB))); } /*--.. Operations on global variables, functions, and aliases (globals) ....--*/ LLVMModuleRef LLVMGetGlobalParent(LLVMValueRef Global) { return wrap(unwrap(Global)->getParent()); } LLVMBool LLVMIsDeclaration(LLVMValueRef Global) { return unwrap(Global)->isDeclaration(); } LLVMLinkage LLVMGetLinkage(LLVMValueRef Global) { switch (unwrap(Global)->getLinkage()) { case GlobalValue::ExternalLinkage: return LLVMExternalLinkage; case GlobalValue::AvailableExternallyLinkage: return LLVMAvailableExternallyLinkage; case GlobalValue::LinkOnceAnyLinkage: return LLVMLinkOnceAnyLinkage; case GlobalValue::LinkOnceODRLinkage: return LLVMLinkOnceODRLinkage; case GlobalValue::WeakAnyLinkage: return LLVMWeakAnyLinkage; case GlobalValue::WeakODRLinkage: return LLVMWeakODRLinkage; case GlobalValue::AppendingLinkage: return LLVMAppendingLinkage; case GlobalValue::InternalLinkage: return LLVMInternalLinkage; case GlobalValue::PrivateLinkage: return LLVMPrivateLinkage; case GlobalValue::ExternalWeakLinkage: return LLVMExternalWeakLinkage; case GlobalValue::CommonLinkage: return LLVMCommonLinkage; } llvm_unreachable("Invalid GlobalValue linkage!"); } void LLVMSetLinkage(LLVMValueRef Global, LLVMLinkage Linkage) { GlobalValue *GV = unwrap(Global); switch (Linkage) { case LLVMExternalLinkage: GV->setLinkage(GlobalValue::ExternalLinkage); break; case LLVMAvailableExternallyLinkage: GV->setLinkage(GlobalValue::AvailableExternallyLinkage); break; case LLVMLinkOnceAnyLinkage: GV->setLinkage(GlobalValue::LinkOnceAnyLinkage); break; case LLVMLinkOnceODRLinkage: GV->setLinkage(GlobalValue::LinkOnceODRLinkage); break; case LLVMLinkOnceODRAutoHideLinkage: LLVM_DEBUG( errs() << "LLVMSetLinkage(): LLVMLinkOnceODRAutoHideLinkage is no " "longer supported."); break; case LLVMWeakAnyLinkage: GV->setLinkage(GlobalValue::WeakAnyLinkage); break; case LLVMWeakODRLinkage: GV->setLinkage(GlobalValue::WeakODRLinkage); break; case LLVMAppendingLinkage: GV->setLinkage(GlobalValue::AppendingLinkage); break; case LLVMInternalLinkage: GV->setLinkage(GlobalValue::InternalLinkage); break; case LLVMPrivateLinkage: GV->setLinkage(GlobalValue::PrivateLinkage); break; case LLVMLinkerPrivateLinkage: GV->setLinkage(GlobalValue::PrivateLinkage); break; case LLVMLinkerPrivateWeakLinkage: GV->setLinkage(GlobalValue::PrivateLinkage); break; case LLVMDLLImportLinkage: LLVM_DEBUG( errs() << "LLVMSetLinkage(): LLVMDLLImportLinkage is no longer supported."); break; case LLVMDLLExportLinkage: LLVM_DEBUG( errs() << "LLVMSetLinkage(): LLVMDLLExportLinkage is no longer supported."); break; case LLVMExternalWeakLinkage: GV->setLinkage(GlobalValue::ExternalWeakLinkage); break; case LLVMGhostLinkage: LLVM_DEBUG( errs() << "LLVMSetLinkage(): LLVMGhostLinkage is no longer supported."); break; case LLVMCommonLinkage: GV->setLinkage(GlobalValue::CommonLinkage); break; } } const char *LLVMGetSection(LLVMValueRef Global) { // Using .data() is safe because of how GlobalObject::setSection is // implemented. 
  return unwrap<GlobalValue>(Global)->getSection().data();
}

void LLVMSetSection(LLVMValueRef Global, const char *Section) {
  unwrap<GlobalObject>(Global)->setSection(Section);
}

LLVMVisibility LLVMGetVisibility(LLVMValueRef Global) {
  return static_cast<LLVMVisibility>(
    unwrap<GlobalValue>(Global)->getVisibility());
}

void LLVMSetVisibility(LLVMValueRef Global, LLVMVisibility Viz) {
  unwrap<GlobalValue>(Global)
    ->setVisibility(static_cast<GlobalValue::VisibilityTypes>(Viz));
}

LLVMDLLStorageClass LLVMGetDLLStorageClass(LLVMValueRef Global) {
  return static_cast<LLVMDLLStorageClass>(
      unwrap<GlobalValue>(Global)->getDLLStorageClass());
}

void LLVMSetDLLStorageClass(LLVMValueRef Global, LLVMDLLStorageClass Class) {
  unwrap<GlobalValue>(Global)->setDLLStorageClass(
      static_cast<GlobalValue::DLLStorageClassTypes>(Class));
}

LLVMUnnamedAddr LLVMGetUnnamedAddress(LLVMValueRef Global) {
  switch (unwrap<GlobalValue>(Global)->getUnnamedAddr()) {
  case GlobalVariable::UnnamedAddr::None:
    return LLVMNoUnnamedAddr;
  case GlobalVariable::UnnamedAddr::Local:
    return LLVMLocalUnnamedAddr;
  case GlobalVariable::UnnamedAddr::Global:
    return LLVMGlobalUnnamedAddr;
  }
  llvm_unreachable("Unknown UnnamedAddr kind!");
}

void LLVMSetUnnamedAddress(LLVMValueRef Global, LLVMUnnamedAddr UnnamedAddr) {
  GlobalValue *GV = unwrap<GlobalValue>(Global);

  switch (UnnamedAddr) {
  case LLVMNoUnnamedAddr:
    return GV->setUnnamedAddr(GlobalVariable::UnnamedAddr::None);
  case LLVMLocalUnnamedAddr:
    return GV->setUnnamedAddr(GlobalVariable::UnnamedAddr::Local);
  case LLVMGlobalUnnamedAddr:
    return GV->setUnnamedAddr(GlobalVariable::UnnamedAddr::Global);
  }
}

LLVMBool LLVMHasUnnamedAddr(LLVMValueRef Global) {
  return unwrap<GlobalValue>(Global)->hasGlobalUnnamedAddr();
}

void LLVMSetUnnamedAddr(LLVMValueRef Global, LLVMBool HasUnnamedAddr) {
  unwrap<GlobalValue>(Global)->setUnnamedAddr(
      HasUnnamedAddr ? GlobalValue::UnnamedAddr::Global
                     : GlobalValue::UnnamedAddr::None);
}

LLVMTypeRef LLVMGlobalGetValueType(LLVMValueRef Global) {
  return wrap(unwrap<GlobalValue>(Global)->getValueType());
}

/*--..
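Illustrative usage sketch (not part of this file): driving the linkage, visibility,
section and unnamed_addr setters above from the C API. The module, context and the
"counter" symbol name are made-up examples, not anything defined here.

  #include <llvm-c/Core.h>

  static LLVMValueRef addHiddenCounter(LLVMModuleRef M, LLVMContextRef Ctx) {
    LLVMTypeRef I32 = LLVMInt32TypeInContext(Ctx);
    LLVMValueRef G = LLVMAddGlobal(M, I32, "counter");
    LLVMSetInitializer(G, LLVMConstInt(I32, 0, 0));
    LLVMSetLinkage(G, LLVMExternalLinkage);        // GlobalValue::ExternalLinkage
    LLVMSetVisibility(G, LLVMHiddenVisibility);    // forwarded via the static_cast above
    LLVMSetSection(G, ".counters");
    LLVMSetUnnamedAddress(G, LLVMLocalUnnamedAddr);
    return G;
  }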
Operations on global variables, load and store instructions .........--*/

unsigned LLVMGetAlignment(LLVMValueRef V) {
  Value *P = unwrap(V);
  if (GlobalObject *GV = dyn_cast<GlobalObject>(P))
    return GV->getAlignment();
  if (AllocaInst *AI = dyn_cast<AllocaInst>(P))
    return AI->getAlignment();
  if (LoadInst *LI = dyn_cast<LoadInst>(P))
    return LI->getAlignment();
  if (StoreInst *SI = dyn_cast<StoreInst>(P))
    return SI->getAlignment();
  llvm_unreachable(
      "only GlobalObject, AllocaInst, LoadInst and StoreInst have alignment");
}

void LLVMSetAlignment(LLVMValueRef V, unsigned Bytes) {
  Value *P = unwrap(V);
  if (GlobalObject *GV = dyn_cast<GlobalObject>(P))
    GV->setAlignment(MaybeAlign(Bytes));
  else if (AllocaInst *AI = dyn_cast<AllocaInst>(P))
    AI->setAlignment(Align(Bytes));
  else if (LoadInst *LI = dyn_cast<LoadInst>(P))
    LI->setAlignment(Align(Bytes));
  else if (StoreInst *SI = dyn_cast<StoreInst>(P))
    SI->setAlignment(Align(Bytes));
  else
    llvm_unreachable(
        "only GlobalValue, AllocaInst, LoadInst and StoreInst have alignment");
}

LLVMValueMetadataEntry *LLVMGlobalCopyAllMetadata(LLVMValueRef Value,
                                                  size_t *NumEntries) {
  return llvm_getMetadata(NumEntries, [&Value](MetadataEntries &Entries) {
    if (Instruction *Instr = dyn_cast<Instruction>(unwrap(Value))) {
      Instr->getAllMetadata(Entries);
    } else {
      unwrap<GlobalObject>(Value)->getAllMetadata(Entries);
    }
  });
}

unsigned LLVMValueMetadataEntriesGetKind(LLVMValueMetadataEntry *Entries,
                                         unsigned Index) {
  LLVMOpaqueValueMetadataEntry MVE =
      static_cast<LLVMOpaqueValueMetadataEntry>(Entries[Index]);
  return MVE.Kind;
}

LLVMMetadataRef
LLVMValueMetadataEntriesGetMetadata(LLVMValueMetadataEntry *Entries,
                                    unsigned Index) {
  LLVMOpaqueValueMetadataEntry MVE =
      static_cast<LLVMOpaqueValueMetadataEntry>(Entries[Index]);
  return MVE.Metadata;
}

void LLVMDisposeValueMetadataEntries(LLVMValueMetadataEntry *Entries) {
  free(Entries);
}

void LLVMGlobalSetMetadata(LLVMValueRef Global, unsigned Kind,
                           LLVMMetadataRef MD) {
  unwrap<GlobalObject>(Global)->setMetadata(Kind, unwrap<MDNode>(MD));
}

void LLVMGlobalEraseMetadata(LLVMValueRef Global, unsigned Kind) {
  unwrap<GlobalObject>(Global)->eraseMetadata(Kind);
}

void LLVMGlobalClearMetadata(LLVMValueRef Global) {
  unwrap<GlobalObject>(Global)->clearMetadata();
}

/*--..
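Illustrative usage sketch (not part of this file): reading back the alignment and the
attached metadata of a global through the accessors above. G is assumed to be a valid
handle to a GlobalObject obtained elsewhere; the alignment value 16 is arbitrary.

  #include <llvm-c/Core.h>
  #include <stdio.h>

  static void dumpGlobalInfo(LLVMValueRef G) {
    LLVMSetAlignment(G, 16);                   // GlobalObject::setAlignment(MaybeAlign(16))
    printf("align %u\n", LLVMGetAlignment(G));

    size_t N = 0;
    LLVMValueMetadataEntry *MDs = LLVMGlobalCopyAllMetadata(G, &N);
    for (size_t I = 0; I < N; ++I)
      printf("md kind %u\n", LLVMValueMetadataEntriesGetKind(MDs, (unsigned)I));
    LLVMDisposeValueMetadataEntries(MDs);      // the copied array is owned by the caller
  }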
Operations on global variables ......................................--*/ LLVMValueRef LLVMAddGlobal(LLVMModuleRef M, LLVMTypeRef Ty, const char *Name) { return wrap(new GlobalVariable(*unwrap(M), unwrap(Ty), false, GlobalValue::ExternalLinkage, nullptr, Name)); } LLVMValueRef LLVMAddGlobalInAddressSpace(LLVMModuleRef M, LLVMTypeRef Ty, const char *Name, unsigned AddressSpace) { return wrap(new GlobalVariable(*unwrap(M), unwrap(Ty), false, GlobalValue::ExternalLinkage, nullptr, Name, nullptr, GlobalVariable::NotThreadLocal, AddressSpace)); } LLVMValueRef LLVMGetNamedGlobal(LLVMModuleRef M, const char *Name) { return wrap(unwrap(M)->getNamedGlobal(Name)); } LLVMValueRef LLVMGetFirstGlobal(LLVMModuleRef M) { Module *Mod = unwrap(M); Module::global_iterator I = Mod->global_begin(); if (I == Mod->global_end()) return nullptr; return wrap(&*I); } LLVMValueRef LLVMGetLastGlobal(LLVMModuleRef M) { Module *Mod = unwrap(M); Module::global_iterator I = Mod->global_end(); if (I == Mod->global_begin()) return nullptr; return wrap(&*--I); } LLVMValueRef LLVMGetNextGlobal(LLVMValueRef GlobalVar) { GlobalVariable *GV = unwrap(GlobalVar); Module::global_iterator I(GV); if (++I == GV->getParent()->global_end()) return nullptr; return wrap(&*I); } LLVMValueRef LLVMGetPreviousGlobal(LLVMValueRef GlobalVar) { GlobalVariable *GV = unwrap(GlobalVar); Module::global_iterator I(GV); if (I == GV->getParent()->global_begin()) return nullptr; return wrap(&*--I); } void LLVMDeleteGlobal(LLVMValueRef GlobalVar) { unwrap(GlobalVar)->eraseFromParent(); } LLVMValueRef LLVMGetInitializer(LLVMValueRef GlobalVar) { GlobalVariable* GV = unwrap(GlobalVar); if ( !GV->hasInitializer() ) return nullptr; return wrap(GV->getInitializer()); } void LLVMSetInitializer(LLVMValueRef GlobalVar, LLVMValueRef ConstantVal) { unwrap(GlobalVar) ->setInitializer(unwrap(ConstantVal)); } LLVMBool LLVMIsThreadLocal(LLVMValueRef GlobalVar) { return unwrap(GlobalVar)->isThreadLocal(); } void LLVMSetThreadLocal(LLVMValueRef GlobalVar, LLVMBool IsThreadLocal) { unwrap(GlobalVar)->setThreadLocal(IsThreadLocal != 0); } LLVMBool LLVMIsGlobalConstant(LLVMValueRef GlobalVar) { return unwrap(GlobalVar)->isConstant(); } void LLVMSetGlobalConstant(LLVMValueRef GlobalVar, LLVMBool IsConstant) { unwrap(GlobalVar)->setConstant(IsConstant != 0); } LLVMThreadLocalMode LLVMGetThreadLocalMode(LLVMValueRef GlobalVar) { switch (unwrap(GlobalVar)->getThreadLocalMode()) { case GlobalVariable::NotThreadLocal: return LLVMNotThreadLocal; case GlobalVariable::GeneralDynamicTLSModel: return LLVMGeneralDynamicTLSModel; case GlobalVariable::LocalDynamicTLSModel: return LLVMLocalDynamicTLSModel; case GlobalVariable::InitialExecTLSModel: return LLVMInitialExecTLSModel; case GlobalVariable::LocalExecTLSModel: return LLVMLocalExecTLSModel; } llvm_unreachable("Invalid GlobalVariable thread local mode"); } void LLVMSetThreadLocalMode(LLVMValueRef GlobalVar, LLVMThreadLocalMode Mode) { GlobalVariable *GV = unwrap(GlobalVar); switch (Mode) { case LLVMNotThreadLocal: GV->setThreadLocalMode(GlobalVariable::NotThreadLocal); break; case LLVMGeneralDynamicTLSModel: GV->setThreadLocalMode(GlobalVariable::GeneralDynamicTLSModel); break; case LLVMLocalDynamicTLSModel: GV->setThreadLocalMode(GlobalVariable::LocalDynamicTLSModel); break; case LLVMInitialExecTLSModel: GV->setThreadLocalMode(GlobalVariable::InitialExecTLSModel); break; case LLVMLocalExecTLSModel: GV->setThreadLocalMode(GlobalVariable::LocalExecTLSModel); break; } } LLVMBool LLVMIsExternallyInitialized(LLVMValueRef GlobalVar) { 
return unwrap(GlobalVar)->isExternallyInitialized(); } void LLVMSetExternallyInitialized(LLVMValueRef GlobalVar, LLVMBool IsExtInit) { unwrap(GlobalVar)->setExternallyInitialized(IsExtInit); } /*--.. Operations on aliases ......................................--*/ LLVMValueRef LLVMAddAlias(LLVMModuleRef M, LLVMTypeRef Ty, LLVMValueRef Aliasee, const char *Name) { auto *PTy = cast(unwrap(Ty)); return wrap(GlobalAlias::create(PTy->getElementType(), PTy->getAddressSpace(), GlobalValue::ExternalLinkage, Name, unwrap(Aliasee), unwrap(M))); } LLVMValueRef LLVMGetNamedGlobalAlias(LLVMModuleRef M, const char *Name, size_t NameLen) { return wrap(unwrap(M)->getNamedAlias(Name)); } LLVMValueRef LLVMGetFirstGlobalAlias(LLVMModuleRef M) { Module *Mod = unwrap(M); Module::alias_iterator I = Mod->alias_begin(); if (I == Mod->alias_end()) return nullptr; return wrap(&*I); } LLVMValueRef LLVMGetLastGlobalAlias(LLVMModuleRef M) { Module *Mod = unwrap(M); Module::alias_iterator I = Mod->alias_end(); if (I == Mod->alias_begin()) return nullptr; return wrap(&*--I); } LLVMValueRef LLVMGetNextGlobalAlias(LLVMValueRef GA) { GlobalAlias *Alias = unwrap(GA); Module::alias_iterator I(Alias); if (++I == Alias->getParent()->alias_end()) return nullptr; return wrap(&*I); } LLVMValueRef LLVMGetPreviousGlobalAlias(LLVMValueRef GA) { GlobalAlias *Alias = unwrap(GA); Module::alias_iterator I(Alias); if (I == Alias->getParent()->alias_begin()) return nullptr; return wrap(&*--I); } LLVMValueRef LLVMAliasGetAliasee(LLVMValueRef Alias) { return wrap(unwrap(Alias)->getAliasee()); } void LLVMAliasSetAliasee(LLVMValueRef Alias, LLVMValueRef Aliasee) { unwrap(Alias)->setAliasee(unwrap(Aliasee)); } /*--.. Operations on functions .............................................--*/ LLVMValueRef LLVMAddFunction(LLVMModuleRef M, const char *Name, LLVMTypeRef FunctionTy) { return wrap(Function::Create(unwrap(FunctionTy), GlobalValue::ExternalLinkage, Name, unwrap(M))); } LLVMValueRef LLVMGetNamedFunction(LLVMModuleRef M, const char *Name) { return wrap(unwrap(M)->getFunction(Name)); } LLVMValueRef LLVMGetFirstFunction(LLVMModuleRef M) { Module *Mod = unwrap(M); Module::iterator I = Mod->begin(); if (I == Mod->end()) return nullptr; return wrap(&*I); } LLVMValueRef LLVMGetLastFunction(LLVMModuleRef M) { Module *Mod = unwrap(M); Module::iterator I = Mod->end(); if (I == Mod->begin()) return nullptr; return wrap(&*--I); } LLVMValueRef LLVMGetNextFunction(LLVMValueRef Fn) { Function *Func = unwrap(Fn); Module::iterator I(Func); if (++I == Func->getParent()->end()) return nullptr; return wrap(&*I); } LLVMValueRef LLVMGetPreviousFunction(LLVMValueRef Fn) { Function *Func = unwrap(Fn); Module::iterator I(Func); if (I == Func->getParent()->begin()) return nullptr; return wrap(&*--I); } void LLVMDeleteFunction(LLVMValueRef Fn) { unwrap(Fn)->eraseFromParent(); } LLVMBool LLVMHasPersonalityFn(LLVMValueRef Fn) { return unwrap(Fn)->hasPersonalityFn(); } LLVMValueRef LLVMGetPersonalityFn(LLVMValueRef Fn) { return wrap(unwrap(Fn)->getPersonalityFn()); } void LLVMSetPersonalityFn(LLVMValueRef Fn, LLVMValueRef PersonalityFn) { unwrap(Fn)->setPersonalityFn(unwrap(PersonalityFn)); } unsigned LLVMGetIntrinsicID(LLVMValueRef Fn) { if (Function *F = dyn_cast(unwrap(Fn))) return F->getIntrinsicID(); return 0; } static Intrinsic::ID llvm_map_to_intrinsic_id(unsigned ID) { assert(ID < llvm::Intrinsic::num_intrinsics && "Intrinsic ID out of range"); return llvm::Intrinsic::ID(ID); } LLVMValueRef LLVMGetIntrinsicDeclaration(LLVMModuleRef Mod, unsigned ID, 
LLVMTypeRef *ParamTypes, size_t ParamCount) { ArrayRef Tys(unwrap(ParamTypes), ParamCount); auto IID = llvm_map_to_intrinsic_id(ID); return wrap(llvm::Intrinsic::getDeclaration(unwrap(Mod), IID, Tys)); } const char *LLVMIntrinsicGetName(unsigned ID, size_t *NameLength) { auto IID = llvm_map_to_intrinsic_id(ID); auto Str = llvm::Intrinsic::getName(IID); *NameLength = Str.size(); return Str.data(); } LLVMTypeRef LLVMIntrinsicGetType(LLVMContextRef Ctx, unsigned ID, LLVMTypeRef *ParamTypes, size_t ParamCount) { auto IID = llvm_map_to_intrinsic_id(ID); ArrayRef Tys(unwrap(ParamTypes), ParamCount); return wrap(llvm::Intrinsic::getType(*unwrap(Ctx), IID, Tys)); } const char *LLVMIntrinsicCopyOverloadedName(unsigned ID, LLVMTypeRef *ParamTypes, size_t ParamCount, size_t *NameLength) { auto IID = llvm_map_to_intrinsic_id(ID); ArrayRef Tys(unwrap(ParamTypes), ParamCount); auto Str = llvm::Intrinsic::getName(IID, Tys); *NameLength = Str.length(); return strdup(Str.c_str()); } unsigned LLVMLookupIntrinsicID(const char *Name, size_t NameLen) { return Function::lookupIntrinsicID({Name, NameLen}); } LLVMBool LLVMIntrinsicIsOverloaded(unsigned ID) { auto IID = llvm_map_to_intrinsic_id(ID); return llvm::Intrinsic::isOverloaded(IID); } unsigned LLVMGetFunctionCallConv(LLVMValueRef Fn) { return unwrap(Fn)->getCallingConv(); } void LLVMSetFunctionCallConv(LLVMValueRef Fn, unsigned CC) { return unwrap(Fn)->setCallingConv( static_cast(CC)); } const char *LLVMGetGC(LLVMValueRef Fn) { Function *F = unwrap(Fn); return F->hasGC()? F->getGC().c_str() : nullptr; } void LLVMSetGC(LLVMValueRef Fn, const char *GC) { Function *F = unwrap(Fn); if (GC) F->setGC(GC); else F->clearGC(); } void LLVMAddAttributeAtIndex(LLVMValueRef F, LLVMAttributeIndex Idx, LLVMAttributeRef A) { unwrap(F)->addAttribute(Idx, unwrap(A)); } unsigned LLVMGetAttributeCountAtIndex(LLVMValueRef F, LLVMAttributeIndex Idx) { auto AS = unwrap(F)->getAttributes().getAttributes(Idx); return AS.getNumAttributes(); } void LLVMGetAttributesAtIndex(LLVMValueRef F, LLVMAttributeIndex Idx, LLVMAttributeRef *Attrs) { auto AS = unwrap(F)->getAttributes().getAttributes(Idx); for (auto A : AS) *Attrs++ = wrap(A); } LLVMAttributeRef LLVMGetEnumAttributeAtIndex(LLVMValueRef F, LLVMAttributeIndex Idx, unsigned KindID) { return wrap(unwrap(F)->getAttribute(Idx, (Attribute::AttrKind)KindID)); } LLVMAttributeRef LLVMGetStringAttributeAtIndex(LLVMValueRef F, LLVMAttributeIndex Idx, const char *K, unsigned KLen) { return wrap(unwrap(F)->getAttribute(Idx, StringRef(K, KLen))); } void LLVMRemoveEnumAttributeAtIndex(LLVMValueRef F, LLVMAttributeIndex Idx, unsigned KindID) { unwrap(F)->removeAttribute(Idx, (Attribute::AttrKind)KindID); } void LLVMRemoveStringAttributeAtIndex(LLVMValueRef F, LLVMAttributeIndex Idx, const char *K, unsigned KLen) { unwrap(F)->removeAttribute(Idx, StringRef(K, KLen)); } void LLVMAddTargetDependentFunctionAttr(LLVMValueRef Fn, const char *A, const char *V) { Function *Func = unwrap(Fn); Attribute Attr = Attribute::get(Func->getContext(), A, V); Func->addAttribute(AttributeList::FunctionIndex, Attr); } /*--.. 
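Illustrative usage sketch (not part of this file): the intrinsic lookup and attribute
helpers above as seen from the C API. The choice of llvm.memcpy and the "nounwind"
attribute is only an example; the overload types pin down which memcpy variant is
declared.

  #include <llvm-c/Core.h>
  #include <string.h>

  static void declareMemcpyAndMarkNoUnwind(LLVMModuleRef M, LLVMContextRef Ctx,
                                           LLVMValueRef Fn) {
    unsigned ID = LLVMLookupIntrinsicID("llvm.memcpy", strlen("llvm.memcpy"));
    LLVMTypeRef I8Ptr = LLVMPointerType(LLVMInt8TypeInContext(Ctx), 0);
    LLVMTypeRef Overloads[] = {I8Ptr, I8Ptr, LLVMInt64TypeInContext(Ctx)};
    LLVMGetIntrinsicDeclaration(M, ID, Overloads, 3);

    unsigned Kind = LLVMGetEnumAttributeKindForName("nounwind", strlen("nounwind"));
    LLVMAddAttributeAtIndex(Fn, LLVMAttributeFunctionIndex,
                            LLVMCreateEnumAttribute(Ctx, Kind, 0));
  }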
Operations on parameters ............................................--*/ unsigned LLVMCountParams(LLVMValueRef FnRef) { // This function is strictly redundant to // LLVMCountParamTypes(LLVMGetElementType(LLVMTypeOf(FnRef))) return unwrap(FnRef)->arg_size(); } void LLVMGetParams(LLVMValueRef FnRef, LLVMValueRef *ParamRefs) { Function *Fn = unwrap(FnRef); for (Function::arg_iterator I = Fn->arg_begin(), E = Fn->arg_end(); I != E; I++) *ParamRefs++ = wrap(&*I); } LLVMValueRef LLVMGetParam(LLVMValueRef FnRef, unsigned index) { Function *Fn = unwrap(FnRef); return wrap(&Fn->arg_begin()[index]); } LLVMValueRef LLVMGetParamParent(LLVMValueRef V) { return wrap(unwrap(V)->getParent()); } LLVMValueRef LLVMGetFirstParam(LLVMValueRef Fn) { Function *Func = unwrap(Fn); Function::arg_iterator I = Func->arg_begin(); if (I == Func->arg_end()) return nullptr; return wrap(&*I); } LLVMValueRef LLVMGetLastParam(LLVMValueRef Fn) { Function *Func = unwrap(Fn); Function::arg_iterator I = Func->arg_end(); if (I == Func->arg_begin()) return nullptr; return wrap(&*--I); } LLVMValueRef LLVMGetNextParam(LLVMValueRef Arg) { Argument *A = unwrap(Arg); Function *Fn = A->getParent(); if (A->getArgNo() + 1 >= Fn->arg_size()) return nullptr; return wrap(&Fn->arg_begin()[A->getArgNo() + 1]); } LLVMValueRef LLVMGetPreviousParam(LLVMValueRef Arg) { Argument *A = unwrap(Arg); if (A->getArgNo() == 0) return nullptr; return wrap(&A->getParent()->arg_begin()[A->getArgNo() - 1]); } void LLVMSetParamAlignment(LLVMValueRef Arg, unsigned align) { Argument *A = unwrap(Arg); A->addAttr(Attribute::getWithAlignment(A->getContext(), Align(align))); } /*--.. Operations on ifuncs ................................................--*/ LLVMValueRef LLVMAddGlobalIFunc(LLVMModuleRef M, const char *Name, size_t NameLen, LLVMTypeRef Ty, unsigned AddrSpace, LLVMValueRef Resolver) { return wrap(GlobalIFunc::create(unwrap(Ty), AddrSpace, GlobalValue::ExternalLinkage, StringRef(Name, NameLen), unwrap(Resolver), unwrap(M))); } LLVMValueRef LLVMGetNamedGlobalIFunc(LLVMModuleRef M, const char *Name, size_t NameLen) { return wrap(unwrap(M)->getNamedIFunc(StringRef(Name, NameLen))); } LLVMValueRef LLVMGetFirstGlobalIFunc(LLVMModuleRef M) { Module *Mod = unwrap(M); Module::ifunc_iterator I = Mod->ifunc_begin(); if (I == Mod->ifunc_end()) return nullptr; return wrap(&*I); } LLVMValueRef LLVMGetLastGlobalIFunc(LLVMModuleRef M) { Module *Mod = unwrap(M); Module::ifunc_iterator I = Mod->ifunc_end(); if (I == Mod->ifunc_begin()) return nullptr; return wrap(&*--I); } LLVMValueRef LLVMGetNextGlobalIFunc(LLVMValueRef IFunc) { GlobalIFunc *GIF = unwrap(IFunc); Module::ifunc_iterator I(GIF); if (++I == GIF->getParent()->ifunc_end()) return nullptr; return wrap(&*I); } LLVMValueRef LLVMGetPreviousGlobalIFunc(LLVMValueRef IFunc) { GlobalIFunc *GIF = unwrap(IFunc); Module::ifunc_iterator I(GIF); if (I == GIF->getParent()->ifunc_begin()) return nullptr; return wrap(&*--I); } LLVMValueRef LLVMGetGlobalIFuncResolver(LLVMValueRef IFunc) { return wrap(unwrap(IFunc)->getResolver()); } void LLVMSetGlobalIFuncResolver(LLVMValueRef IFunc, LLVMValueRef Resolver) { unwrap(IFunc)->setResolver(unwrap(Resolver)); } void LLVMEraseGlobalIFunc(LLVMValueRef IFunc) { unwrap(IFunc)->eraseFromParent(); } void LLVMRemoveGlobalIFunc(LLVMValueRef IFunc) { unwrap(IFunc)->removeFromParent(); } /*--.. 
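Illustrative usage sketch (not part of this file): walking a function's parameters with
the iterators above and raising the alignment of pointer arguments; the value 8 is
arbitrary.

  #include <llvm-c/Core.h>

  static void alignPointerParams(LLVMValueRef Fn) {
    for (LLVMValueRef P = LLVMGetFirstParam(Fn); P; P = LLVMGetNextParam(P))
      if (LLVMGetTypeKind(LLVMTypeOf(P)) == LLVMPointerTypeKind)
        LLVMSetParamAlignment(P, 8);   // adds an align attribute to the argument
  }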
Operations on basic blocks ..........................................--*/ LLVMValueRef LLVMBasicBlockAsValue(LLVMBasicBlockRef BB) { return wrap(static_cast(unwrap(BB))); } LLVMBool LLVMValueIsBasicBlock(LLVMValueRef Val) { return isa(unwrap(Val)); } LLVMBasicBlockRef LLVMValueAsBasicBlock(LLVMValueRef Val) { return wrap(unwrap(Val)); } const char *LLVMGetBasicBlockName(LLVMBasicBlockRef BB) { return unwrap(BB)->getName().data(); } LLVMValueRef LLVMGetBasicBlockParent(LLVMBasicBlockRef BB) { return wrap(unwrap(BB)->getParent()); } LLVMValueRef LLVMGetBasicBlockTerminator(LLVMBasicBlockRef BB) { return wrap(unwrap(BB)->getTerminator()); } unsigned LLVMCountBasicBlocks(LLVMValueRef FnRef) { return unwrap(FnRef)->size(); } void LLVMGetBasicBlocks(LLVMValueRef FnRef, LLVMBasicBlockRef *BasicBlocksRefs){ Function *Fn = unwrap(FnRef); for (BasicBlock &BB : *Fn) *BasicBlocksRefs++ = wrap(&BB); } LLVMBasicBlockRef LLVMGetEntryBasicBlock(LLVMValueRef Fn) { return wrap(&unwrap(Fn)->getEntryBlock()); } LLVMBasicBlockRef LLVMGetFirstBasicBlock(LLVMValueRef Fn) { Function *Func = unwrap(Fn); Function::iterator I = Func->begin(); if (I == Func->end()) return nullptr; return wrap(&*I); } LLVMBasicBlockRef LLVMGetLastBasicBlock(LLVMValueRef Fn) { Function *Func = unwrap(Fn); Function::iterator I = Func->end(); if (I == Func->begin()) return nullptr; return wrap(&*--I); } LLVMBasicBlockRef LLVMGetNextBasicBlock(LLVMBasicBlockRef BB) { BasicBlock *Block = unwrap(BB); Function::iterator I(Block); if (++I == Block->getParent()->end()) return nullptr; return wrap(&*I); } LLVMBasicBlockRef LLVMGetPreviousBasicBlock(LLVMBasicBlockRef BB) { BasicBlock *Block = unwrap(BB); Function::iterator I(Block); if (I == Block->getParent()->begin()) return nullptr; return wrap(&*--I); } LLVMBasicBlockRef LLVMCreateBasicBlockInContext(LLVMContextRef C, const char *Name) { return wrap(llvm::BasicBlock::Create(*unwrap(C), Name)); } void LLVMInsertExistingBasicBlockAfterInsertBlock(LLVMBuilderRef Builder, LLVMBasicBlockRef BB) { BasicBlock *ToInsert = unwrap(BB); BasicBlock *CurBB = unwrap(Builder)->GetInsertBlock(); assert(CurBB && "current insertion point is invalid!"); CurBB->getParent()->getBasicBlockList().insertAfter(CurBB->getIterator(), ToInsert); } void LLVMAppendExistingBasicBlock(LLVMValueRef Fn, LLVMBasicBlockRef BB) { unwrap(Fn)->getBasicBlockList().push_back(unwrap(BB)); } LLVMBasicBlockRef LLVMAppendBasicBlockInContext(LLVMContextRef C, LLVMValueRef FnRef, const char *Name) { return wrap(BasicBlock::Create(*unwrap(C), Name, unwrap(FnRef))); } LLVMBasicBlockRef LLVMAppendBasicBlock(LLVMValueRef FnRef, const char *Name) { return LLVMAppendBasicBlockInContext(LLVMGetGlobalContext(), FnRef, Name); } LLVMBasicBlockRef LLVMInsertBasicBlockInContext(LLVMContextRef C, LLVMBasicBlockRef BBRef, const char *Name) { BasicBlock *BB = unwrap(BBRef); return wrap(BasicBlock::Create(*unwrap(C), Name, BB->getParent(), BB)); } LLVMBasicBlockRef LLVMInsertBasicBlock(LLVMBasicBlockRef BBRef, const char *Name) { return LLVMInsertBasicBlockInContext(LLVMGetGlobalContext(), BBRef, Name); } void LLVMDeleteBasicBlock(LLVMBasicBlockRef BBRef) { unwrap(BBRef)->eraseFromParent(); } void LLVMRemoveBasicBlockFromParent(LLVMBasicBlockRef BBRef) { unwrap(BBRef)->removeFromParent(); } void LLVMMoveBasicBlockBefore(LLVMBasicBlockRef BB, LLVMBasicBlockRef MovePos) { unwrap(BB)->moveBefore(unwrap(MovePos)); } void LLVMMoveBasicBlockAfter(LLVMBasicBlockRef BB, LLVMBasicBlockRef MovePos) { unwrap(BB)->moveAfter(unwrap(MovePos)); } /*--.. 
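Illustrative usage sketch (not part of this file): creating a detached block and
appending it to a function afterwards, mirroring the create/append split above.

  #include <llvm-c/Core.h>

  static LLVMBasicBlockRef addExitBlock(LLVMContextRef Ctx, LLVMValueRef Fn) {
    LLVMBasicBlockRef BB = LLVMCreateBasicBlockInContext(Ctx, "exit");
    LLVMAppendExistingBasicBlock(Fn, BB);   // moves the detached block into Fn
    return BB;
  }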
Operations on instructions ..........................................--*/ LLVMBasicBlockRef LLVMGetInstructionParent(LLVMValueRef Inst) { return wrap(unwrap(Inst)->getParent()); } LLVMValueRef LLVMGetFirstInstruction(LLVMBasicBlockRef BB) { BasicBlock *Block = unwrap(BB); BasicBlock::iterator I = Block->begin(); if (I == Block->end()) return nullptr; return wrap(&*I); } LLVMValueRef LLVMGetLastInstruction(LLVMBasicBlockRef BB) { BasicBlock *Block = unwrap(BB); BasicBlock::iterator I = Block->end(); if (I == Block->begin()) return nullptr; return wrap(&*--I); } LLVMValueRef LLVMGetNextInstruction(LLVMValueRef Inst) { Instruction *Instr = unwrap(Inst); BasicBlock::iterator I(Instr); if (++I == Instr->getParent()->end()) return nullptr; return wrap(&*I); } LLVMValueRef LLVMGetPreviousInstruction(LLVMValueRef Inst) { Instruction *Instr = unwrap(Inst); BasicBlock::iterator I(Instr); if (I == Instr->getParent()->begin()) return nullptr; return wrap(&*--I); } void LLVMInstructionRemoveFromParent(LLVMValueRef Inst) { unwrap(Inst)->removeFromParent(); } void LLVMInstructionEraseFromParent(LLVMValueRef Inst) { unwrap(Inst)->eraseFromParent(); } LLVMIntPredicate LLVMGetICmpPredicate(LLVMValueRef Inst) { if (ICmpInst *I = dyn_cast(unwrap(Inst))) return (LLVMIntPredicate)I->getPredicate(); if (ConstantExpr *CE = dyn_cast(unwrap(Inst))) if (CE->getOpcode() == Instruction::ICmp) return (LLVMIntPredicate)CE->getPredicate(); return (LLVMIntPredicate)0; } LLVMRealPredicate LLVMGetFCmpPredicate(LLVMValueRef Inst) { if (FCmpInst *I = dyn_cast(unwrap(Inst))) return (LLVMRealPredicate)I->getPredicate(); if (ConstantExpr *CE = dyn_cast(unwrap(Inst))) if (CE->getOpcode() == Instruction::FCmp) return (LLVMRealPredicate)CE->getPredicate(); return (LLVMRealPredicate)0; } LLVMOpcode LLVMGetInstructionOpcode(LLVMValueRef Inst) { if (Instruction *C = dyn_cast(unwrap(Inst))) return map_to_llvmopcode(C->getOpcode()); return (LLVMOpcode)0; } LLVMValueRef LLVMInstructionClone(LLVMValueRef Inst) { if (Instruction *C = dyn_cast(unwrap(Inst))) return wrap(C->clone()); return nullptr; } LLVMValueRef LLVMIsATerminatorInst(LLVMValueRef Inst) { Instruction *I = dyn_cast(unwrap(Inst)); return (I && I->isTerminator()) ? wrap(I) : nullptr; } unsigned LLVMGetNumArgOperands(LLVMValueRef Instr) { if (FuncletPadInst *FPI = dyn_cast(unwrap(Instr))) { return FPI->getNumArgOperands(); } return unwrap(Instr)->getNumArgOperands(); } /*--.. 
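Illustrative usage sketch (not part of this file): visiting every instruction of a block
with the iteration helpers above.

  #include <llvm-c/Core.h>
  #include <stdio.h>

  static void printOpcodes(LLVMBasicBlockRef BB) {
    for (LLVMValueRef I = LLVMGetFirstInstruction(BB); I;
         I = LLVMGetNextInstruction(I))
      printf("opcode %d\n", (int)LLVMGetInstructionOpcode(I));
  }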
Call and invoke instructions ........................................--*/ unsigned LLVMGetInstructionCallConv(LLVMValueRef Instr) { return unwrap(Instr)->getCallingConv(); } void LLVMSetInstructionCallConv(LLVMValueRef Instr, unsigned CC) { return unwrap(Instr)->setCallingConv( static_cast(CC)); } void LLVMSetInstrParamAlignment(LLVMValueRef Instr, unsigned index, unsigned align) { auto *Call = unwrap(Instr); Attribute AlignAttr = Attribute::getWithAlignment(Call->getContext(), Align(align)); Call->addAttribute(index, AlignAttr); } void LLVMAddCallSiteAttribute(LLVMValueRef C, LLVMAttributeIndex Idx, LLVMAttributeRef A) { unwrap(C)->addAttribute(Idx, unwrap(A)); } unsigned LLVMGetCallSiteAttributeCount(LLVMValueRef C, LLVMAttributeIndex Idx) { auto *Call = unwrap(C); auto AS = Call->getAttributes().getAttributes(Idx); return AS.getNumAttributes(); } void LLVMGetCallSiteAttributes(LLVMValueRef C, LLVMAttributeIndex Idx, LLVMAttributeRef *Attrs) { auto *Call = unwrap(C); auto AS = Call->getAttributes().getAttributes(Idx); for (auto A : AS) *Attrs++ = wrap(A); } LLVMAttributeRef LLVMGetCallSiteEnumAttribute(LLVMValueRef C, LLVMAttributeIndex Idx, unsigned KindID) { return wrap( unwrap(C)->getAttribute(Idx, (Attribute::AttrKind)KindID)); } LLVMAttributeRef LLVMGetCallSiteStringAttribute(LLVMValueRef C, LLVMAttributeIndex Idx, const char *K, unsigned KLen) { return wrap(unwrap(C)->getAttribute(Idx, StringRef(K, KLen))); } void LLVMRemoveCallSiteEnumAttribute(LLVMValueRef C, LLVMAttributeIndex Idx, unsigned KindID) { unwrap(C)->removeAttribute(Idx, (Attribute::AttrKind)KindID); } void LLVMRemoveCallSiteStringAttribute(LLVMValueRef C, LLVMAttributeIndex Idx, const char *K, unsigned KLen) { unwrap(C)->removeAttribute(Idx, StringRef(K, KLen)); } LLVMValueRef LLVMGetCalledValue(LLVMValueRef Instr) { return wrap(unwrap(Instr)->getCalledOperand()); } LLVMTypeRef LLVMGetCalledFunctionType(LLVMValueRef Instr) { return wrap(unwrap(Instr)->getFunctionType()); } /*--.. Operations on call instructions (only) ..............................--*/ LLVMBool LLVMIsTailCall(LLVMValueRef Call) { return unwrap(Call)->isTailCall(); } void LLVMSetTailCall(LLVMValueRef Call, LLVMBool isTailCall) { unwrap(Call)->setTailCall(isTailCall); } /*--.. Operations on invoke instructions (only) ............................--*/ LLVMBasicBlockRef LLVMGetNormalDest(LLVMValueRef Invoke) { return wrap(unwrap(Invoke)->getNormalDest()); } LLVMBasicBlockRef LLVMGetUnwindDest(LLVMValueRef Invoke) { if (CleanupReturnInst *CRI = dyn_cast(unwrap(Invoke))) { return wrap(CRI->getUnwindDest()); } else if (CatchSwitchInst *CSI = dyn_cast(unwrap(Invoke))) { return wrap(CSI->getUnwindDest()); } return wrap(unwrap(Invoke)->getUnwindDest()); } void LLVMSetNormalDest(LLVMValueRef Invoke, LLVMBasicBlockRef B) { unwrap(Invoke)->setNormalDest(unwrap(B)); } void LLVMSetUnwindDest(LLVMValueRef Invoke, LLVMBasicBlockRef B) { if (CleanupReturnInst *CRI = dyn_cast(unwrap(Invoke))) { return CRI->setUnwindDest(unwrap(B)); } else if (CatchSwitchInst *CSI = dyn_cast(unwrap(Invoke))) { return CSI->setUnwindDest(unwrap(B)); } unwrap(Invoke)->setUnwindDest(unwrap(B)); } /*--.. 
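Illustrative usage sketch (not part of this file): adjusting a call site through the
wrappers above. Call is assumed to be a handle to a CallInst.

  #include <llvm-c/Core.h>

  static void makeFastTailCall(LLVMValueRef Call) {
    LLVMSetTailCall(Call, 1);                            // CallInst::setTailCall(true)
    LLVMSetInstructionCallConv(Call, LLVMFastCallConv);
  }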
Operations on terminators ...........................................--*/ unsigned LLVMGetNumSuccessors(LLVMValueRef Term) { return unwrap(Term)->getNumSuccessors(); } LLVMBasicBlockRef LLVMGetSuccessor(LLVMValueRef Term, unsigned i) { return wrap(unwrap(Term)->getSuccessor(i)); } void LLVMSetSuccessor(LLVMValueRef Term, unsigned i, LLVMBasicBlockRef block) { return unwrap(Term)->setSuccessor(i, unwrap(block)); } /*--.. Operations on branch instructions (only) ............................--*/ LLVMBool LLVMIsConditional(LLVMValueRef Branch) { return unwrap(Branch)->isConditional(); } LLVMValueRef LLVMGetCondition(LLVMValueRef Branch) { return wrap(unwrap(Branch)->getCondition()); } void LLVMSetCondition(LLVMValueRef Branch, LLVMValueRef Cond) { return unwrap(Branch)->setCondition(unwrap(Cond)); } /*--.. Operations on switch instructions (only) ............................--*/ LLVMBasicBlockRef LLVMGetSwitchDefaultDest(LLVMValueRef Switch) { return wrap(unwrap(Switch)->getDefaultDest()); } /*--.. Operations on alloca instructions (only) ............................--*/ LLVMTypeRef LLVMGetAllocatedType(LLVMValueRef Alloca) { return wrap(unwrap(Alloca)->getAllocatedType()); } /*--.. Operations on gep instructions (only) ...............................--*/ LLVMBool LLVMIsInBounds(LLVMValueRef GEP) { return unwrap(GEP)->isInBounds(); } void LLVMSetIsInBounds(LLVMValueRef GEP, LLVMBool InBounds) { return unwrap(GEP)->setIsInBounds(InBounds); } /*--.. Operations on phi nodes .............................................--*/ void LLVMAddIncoming(LLVMValueRef PhiNode, LLVMValueRef *IncomingValues, LLVMBasicBlockRef *IncomingBlocks, unsigned Count) { PHINode *PhiVal = unwrap(PhiNode); for (unsigned I = 0; I != Count; ++I) PhiVal->addIncoming(unwrap(IncomingValues[I]), unwrap(IncomingBlocks[I])); } unsigned LLVMCountIncoming(LLVMValueRef PhiNode) { return unwrap(PhiNode)->getNumIncomingValues(); } LLVMValueRef LLVMGetIncomingValue(LLVMValueRef PhiNode, unsigned Index) { return wrap(unwrap(PhiNode)->getIncomingValue(Index)); } LLVMBasicBlockRef LLVMGetIncomingBlock(LLVMValueRef PhiNode, unsigned Index) { return wrap(unwrap(PhiNode)->getIncomingBlock(Index)); } /*--.. Operations on extractvalue and insertvalue nodes ....................--*/ unsigned LLVMGetNumIndices(LLVMValueRef Inst) { auto *I = unwrap(Inst); if (auto *GEP = dyn_cast(I)) return GEP->getNumIndices(); if (auto *EV = dyn_cast(I)) return EV->getNumIndices(); if (auto *IV = dyn_cast(I)) return IV->getNumIndices(); if (auto *CE = dyn_cast(I)) return CE->getIndices().size(); llvm_unreachable( "LLVMGetNumIndices applies only to extractvalue and insertvalue!"); } const unsigned *LLVMGetIndices(LLVMValueRef Inst) { auto *I = unwrap(Inst); if (auto *EV = dyn_cast(I)) return EV->getIndices().data(); if (auto *IV = dyn_cast(I)) return IV->getIndices().data(); if (auto *CE = dyn_cast(I)) return CE->getIndices().data(); llvm_unreachable( "LLVMGetIndices applies only to extractvalue and insertvalue!"); } /*===-- Instruction builders ----------------------------------------------===*/ LLVMBuilderRef LLVMCreateBuilderInContext(LLVMContextRef C) { return wrap(new IRBuilder<>(*unwrap(C))); } LLVMBuilderRef LLVMCreateBuilder(void) { return LLVMCreateBuilderInContext(LLVMGetGlobalContext()); } void LLVMPositionBuilder(LLVMBuilderRef Builder, LLVMBasicBlockRef Block, LLVMValueRef Instr) { BasicBlock *BB = unwrap(Block); auto I = Instr ? 
unwrap(Instr)->getIterator() : BB->end(); unwrap(Builder)->SetInsertPoint(BB, I); } void LLVMPositionBuilderBefore(LLVMBuilderRef Builder, LLVMValueRef Instr) { Instruction *I = unwrap(Instr); unwrap(Builder)->SetInsertPoint(I->getParent(), I->getIterator()); } void LLVMPositionBuilderAtEnd(LLVMBuilderRef Builder, LLVMBasicBlockRef Block) { BasicBlock *BB = unwrap(Block); unwrap(Builder)->SetInsertPoint(BB); } LLVMBasicBlockRef LLVMGetInsertBlock(LLVMBuilderRef Builder) { return wrap(unwrap(Builder)->GetInsertBlock()); } void LLVMClearInsertionPosition(LLVMBuilderRef Builder) { unwrap(Builder)->ClearInsertionPoint(); } void LLVMInsertIntoBuilder(LLVMBuilderRef Builder, LLVMValueRef Instr) { unwrap(Builder)->Insert(unwrap(Instr)); } void LLVMInsertIntoBuilderWithName(LLVMBuilderRef Builder, LLVMValueRef Instr, const char *Name) { unwrap(Builder)->Insert(unwrap(Instr), Name); } void LLVMDisposeBuilder(LLVMBuilderRef Builder) { delete unwrap(Builder); } /*--.. Metadata builders ...................................................--*/ LLVMMetadataRef LLVMGetCurrentDebugLocation2(LLVMBuilderRef Builder) { return wrap(unwrap(Builder)->getCurrentDebugLocation().getAsMDNode()); } void LLVMSetCurrentDebugLocation2(LLVMBuilderRef Builder, LLVMMetadataRef Loc) { if (Loc) unwrap(Builder)->SetCurrentDebugLocation(DebugLoc(unwrap(Loc))); else unwrap(Builder)->SetCurrentDebugLocation(DebugLoc()); } void LLVMSetCurrentDebugLocation(LLVMBuilderRef Builder, LLVMValueRef L) { MDNode *Loc = L ? cast(unwrap(L)->getMetadata()) : nullptr; unwrap(Builder)->SetCurrentDebugLocation(DebugLoc(Loc)); } LLVMValueRef LLVMGetCurrentDebugLocation(LLVMBuilderRef Builder) { LLVMContext &Context = unwrap(Builder)->getContext(); return wrap(MetadataAsValue::get( Context, unwrap(Builder)->getCurrentDebugLocation().getAsMDNode())); } void LLVMSetInstDebugLocation(LLVMBuilderRef Builder, LLVMValueRef Inst) { unwrap(Builder)->SetInstDebugLocation(unwrap(Inst)); } void LLVMBuilderSetDefaultFPMathTag(LLVMBuilderRef Builder, LLVMMetadataRef FPMathTag) { unwrap(Builder)->setDefaultFPMathTag(FPMathTag ? unwrap(FPMathTag) : nullptr); } LLVMMetadataRef LLVMBuilderGetDefaultFPMathTag(LLVMBuilderRef Builder) { return wrap(unwrap(Builder)->getDefaultFPMathTag()); } /*--.. 
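Illustrative usage sketch (not part of this file): the usual create/position/dispose
builder lifecycle that the functions above implement.

  #include <llvm-c/Core.h>

  static void emitEmptyBody(LLVMContextRef Ctx, LLVMValueRef Fn) {
    LLVMBasicBlockRef Entry = LLVMAppendBasicBlockInContext(Ctx, Fn, "entry");
    LLVMBuilderRef B = LLVMCreateBuilderInContext(Ctx);
    LLVMPositionBuilderAtEnd(B, Entry);   // IRBuilder::SetInsertPoint(BB)
    LLVMBuildRetVoid(B);
    LLVMDisposeBuilder(B);                // deletes the underlying IRBuilder
  }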
Instruction builders ................................................--*/ LLVMValueRef LLVMBuildRetVoid(LLVMBuilderRef B) { return wrap(unwrap(B)->CreateRetVoid()); } LLVMValueRef LLVMBuildRet(LLVMBuilderRef B, LLVMValueRef V) { return wrap(unwrap(B)->CreateRet(unwrap(V))); } LLVMValueRef LLVMBuildAggregateRet(LLVMBuilderRef B, LLVMValueRef *RetVals, unsigned N) { return wrap(unwrap(B)->CreateAggregateRet(unwrap(RetVals), N)); } LLVMValueRef LLVMBuildBr(LLVMBuilderRef B, LLVMBasicBlockRef Dest) { return wrap(unwrap(B)->CreateBr(unwrap(Dest))); } LLVMValueRef LLVMBuildCondBr(LLVMBuilderRef B, LLVMValueRef If, LLVMBasicBlockRef Then, LLVMBasicBlockRef Else) { return wrap(unwrap(B)->CreateCondBr(unwrap(If), unwrap(Then), unwrap(Else))); } LLVMValueRef LLVMBuildSwitch(LLVMBuilderRef B, LLVMValueRef V, LLVMBasicBlockRef Else, unsigned NumCases) { return wrap(unwrap(B)->CreateSwitch(unwrap(V), unwrap(Else), NumCases)); } LLVMValueRef LLVMBuildIndirectBr(LLVMBuilderRef B, LLVMValueRef Addr, unsigned NumDests) { return wrap(unwrap(B)->CreateIndirectBr(unwrap(Addr), NumDests)); } LLVMValueRef LLVMBuildInvoke(LLVMBuilderRef B, LLVMValueRef Fn, LLVMValueRef *Args, unsigned NumArgs, LLVMBasicBlockRef Then, LLVMBasicBlockRef Catch, const char *Name) { Value *V = unwrap(Fn); FunctionType *FnT = cast(cast(V->getType())->getElementType()); return wrap( unwrap(B)->CreateInvoke(FnT, unwrap(Fn), unwrap(Then), unwrap(Catch), makeArrayRef(unwrap(Args), NumArgs), Name)); } LLVMValueRef LLVMBuildInvoke2(LLVMBuilderRef B, LLVMTypeRef Ty, LLVMValueRef Fn, LLVMValueRef *Args, unsigned NumArgs, LLVMBasicBlockRef Then, LLVMBasicBlockRef Catch, const char *Name) { return wrap(unwrap(B)->CreateInvoke( unwrap(Ty), unwrap(Fn), unwrap(Then), unwrap(Catch), makeArrayRef(unwrap(Args), NumArgs), Name)); } LLVMValueRef LLVMBuildLandingPad(LLVMBuilderRef B, LLVMTypeRef Ty, LLVMValueRef PersFn, unsigned NumClauses, const char *Name) { // The personality used to live on the landingpad instruction, but now it // lives on the parent function. For compatibility, take the provided // personality and put it on the parent function. 
if (PersFn) unwrap(B)->GetInsertBlock()->getParent()->setPersonalityFn( cast(unwrap(PersFn))); return wrap(unwrap(B)->CreateLandingPad(unwrap(Ty), NumClauses, Name)); } LLVMValueRef LLVMBuildCatchPad(LLVMBuilderRef B, LLVMValueRef ParentPad, LLVMValueRef *Args, unsigned NumArgs, const char *Name) { return wrap(unwrap(B)->CreateCatchPad(unwrap(ParentPad), makeArrayRef(unwrap(Args), NumArgs), Name)); } LLVMValueRef LLVMBuildCleanupPad(LLVMBuilderRef B, LLVMValueRef ParentPad, LLVMValueRef *Args, unsigned NumArgs, const char *Name) { if (ParentPad == nullptr) { Type *Ty = Type::getTokenTy(unwrap(B)->getContext()); ParentPad = wrap(Constant::getNullValue(Ty)); } return wrap(unwrap(B)->CreateCleanupPad(unwrap(ParentPad), makeArrayRef(unwrap(Args), NumArgs), Name)); } LLVMValueRef LLVMBuildResume(LLVMBuilderRef B, LLVMValueRef Exn) { return wrap(unwrap(B)->CreateResume(unwrap(Exn))); } LLVMValueRef LLVMBuildCatchSwitch(LLVMBuilderRef B, LLVMValueRef ParentPad, LLVMBasicBlockRef UnwindBB, unsigned NumHandlers, const char *Name) { if (ParentPad == nullptr) { Type *Ty = Type::getTokenTy(unwrap(B)->getContext()); ParentPad = wrap(Constant::getNullValue(Ty)); } return wrap(unwrap(B)->CreateCatchSwitch(unwrap(ParentPad), unwrap(UnwindBB), NumHandlers, Name)); } LLVMValueRef LLVMBuildCatchRet(LLVMBuilderRef B, LLVMValueRef CatchPad, LLVMBasicBlockRef BB) { return wrap(unwrap(B)->CreateCatchRet(unwrap(CatchPad), unwrap(BB))); } LLVMValueRef LLVMBuildCleanupRet(LLVMBuilderRef B, LLVMValueRef CatchPad, LLVMBasicBlockRef BB) { return wrap(unwrap(B)->CreateCleanupRet(unwrap(CatchPad), unwrap(BB))); } LLVMValueRef LLVMBuildUnreachable(LLVMBuilderRef B) { return wrap(unwrap(B)->CreateUnreachable()); } void LLVMAddCase(LLVMValueRef Switch, LLVMValueRef OnVal, LLVMBasicBlockRef Dest) { unwrap(Switch)->addCase(unwrap(OnVal), unwrap(Dest)); } void LLVMAddDestination(LLVMValueRef IndirectBr, LLVMBasicBlockRef Dest) { unwrap(IndirectBr)->addDestination(unwrap(Dest)); } unsigned LLVMGetNumClauses(LLVMValueRef LandingPad) { return unwrap(LandingPad)->getNumClauses(); } LLVMValueRef LLVMGetClause(LLVMValueRef LandingPad, unsigned Idx) { return wrap(unwrap(LandingPad)->getClause(Idx)); } void LLVMAddClause(LLVMValueRef LandingPad, LLVMValueRef ClauseVal) { unwrap(LandingPad)-> addClause(cast(unwrap(ClauseVal))); } LLVMBool LLVMIsCleanup(LLVMValueRef LandingPad) { return unwrap(LandingPad)->isCleanup(); } void LLVMSetCleanup(LLVMValueRef LandingPad, LLVMBool Val) { unwrap(LandingPad)->setCleanup(Val); } void LLVMAddHandler(LLVMValueRef CatchSwitch, LLVMBasicBlockRef Dest) { unwrap(CatchSwitch)->addHandler(unwrap(Dest)); } unsigned LLVMGetNumHandlers(LLVMValueRef CatchSwitch) { return unwrap(CatchSwitch)->getNumHandlers(); } void LLVMGetHandlers(LLVMValueRef CatchSwitch, LLVMBasicBlockRef *Handlers) { CatchSwitchInst *CSI = unwrap(CatchSwitch); for (CatchSwitchInst::handler_iterator I = CSI->handler_begin(), E = CSI->handler_end(); I != E; ++I) *Handlers++ = wrap(*I); } LLVMValueRef LLVMGetParentCatchSwitch(LLVMValueRef CatchPad) { return wrap(unwrap(CatchPad)->getCatchSwitch()); } void LLVMSetParentCatchSwitch(LLVMValueRef CatchPad, LLVMValueRef CatchSwitch) { unwrap(CatchPad) ->setCatchSwitch(unwrap(CatchSwitch)); } /*--.. 
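Illustrative usage sketch (not part of this file): populating a switch terminator with
LLVMAddCase after LLVMBuildSwitch has reserved room for two cases; all handles are
assumed to come from the caller.

  #include <llvm-c/Core.h>

  static void buildTwoWaySwitch(LLVMBuilderRef B, LLVMValueRef Cond, LLVMTypeRef I32,
                                LLVMBasicBlockRef Default, LLVMBasicBlockRef BB0,
                                LLVMBasicBlockRef BB1) {
    LLVMValueRef Sw = LLVMBuildSwitch(B, Cond, Default, 2);
    LLVMAddCase(Sw, LLVMConstInt(I32, 0, 0), BB0);
    LLVMAddCase(Sw, LLVMConstInt(I32, 1, 0), BB1);
  }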
Funclets ...........................................................--*/ LLVMValueRef LLVMGetArgOperand(LLVMValueRef Funclet, unsigned i) { return wrap(unwrap(Funclet)->getArgOperand(i)); } void LLVMSetArgOperand(LLVMValueRef Funclet, unsigned i, LLVMValueRef value) { unwrap(Funclet)->setArgOperand(i, unwrap(value)); } /*--.. Arithmetic ..........................................................--*/ LLVMValueRef LLVMBuildAdd(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateAdd(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildNSWAdd(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateNSWAdd(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildNUWAdd(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateNUWAdd(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildFAdd(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateFAdd(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildSub(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateSub(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildNSWSub(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateNSWSub(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildNUWSub(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateNUWSub(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildFSub(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateFSub(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildMul(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateMul(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildNSWMul(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateNSWMul(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildNUWMul(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateNUWMul(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildFMul(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateFMul(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildUDiv(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateUDiv(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildExactUDiv(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateExactUDiv(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildSDiv(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateSDiv(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildExactSDiv(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateExactSDiv(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildFDiv(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateFDiv(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildURem(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateURem(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef 
LLVMBuildSRem(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateSRem(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildFRem(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateFRem(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildShl(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateShl(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildLShr(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateLShr(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildAShr(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateAShr(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildAnd(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateAnd(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildOr(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateOr(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildXor(LLVMBuilderRef B, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateXor(unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildBinOp(LLVMBuilderRef B, LLVMOpcode Op, LLVMValueRef LHS, LLVMValueRef RHS, const char *Name) { return wrap(unwrap(B)->CreateBinOp(Instruction::BinaryOps(map_from_llvmopcode(Op)), unwrap(LHS), unwrap(RHS), Name)); } LLVMValueRef LLVMBuildNeg(LLVMBuilderRef B, LLVMValueRef V, const char *Name) { return wrap(unwrap(B)->CreateNeg(unwrap(V), Name)); } LLVMValueRef LLVMBuildNSWNeg(LLVMBuilderRef B, LLVMValueRef V, const char *Name) { return wrap(unwrap(B)->CreateNSWNeg(unwrap(V), Name)); } LLVMValueRef LLVMBuildNUWNeg(LLVMBuilderRef B, LLVMValueRef V, const char *Name) { return wrap(unwrap(B)->CreateNUWNeg(unwrap(V), Name)); } LLVMValueRef LLVMBuildFNeg(LLVMBuilderRef B, LLVMValueRef V, const char *Name) { return wrap(unwrap(B)->CreateFNeg(unwrap(V), Name)); } LLVMValueRef LLVMBuildNot(LLVMBuilderRef B, LLVMValueRef V, const char *Name) { return wrap(unwrap(B)->CreateNot(unwrap(V), Name)); } /*--.. 
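Illustrative usage sketch (not part of this file): chaining the arithmetic wrappers
above to emit an i32 (a + b) * c function; the "muladd" name is made up.

  #include <llvm-c/Core.h>

  static LLVMValueRef buildMulAdd(LLVMContextRef Ctx, LLVMModuleRef M) {
    LLVMTypeRef I32 = LLVMInt32TypeInContext(Ctx);
    LLVMTypeRef Params[] = {I32, I32, I32};
    LLVMValueRef Fn =
        LLVMAddFunction(M, "muladd", LLVMFunctionType(I32, Params, 3, 0));
    LLVMBuilderRef B = LLVMCreateBuilderInContext(Ctx);
    LLVMPositionBuilderAtEnd(B, LLVMAppendBasicBlockInContext(Ctx, Fn, "entry"));
    LLVMValueRef Sum = LLVMBuildNSWAdd(B, LLVMGetParam(Fn, 0), LLVMGetParam(Fn, 1), "sum");
    LLVMValueRef Prod = LLVMBuildNSWMul(B, Sum, LLVMGetParam(Fn, 2), "prod");
    LLVMBuildRet(B, Prod);
    LLVMDisposeBuilder(B);
    return Fn;
  }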
Memory ..............................................................--*/ LLVMValueRef LLVMBuildMalloc(LLVMBuilderRef B, LLVMTypeRef Ty, const char *Name) { Type* ITy = Type::getInt32Ty(unwrap(B)->GetInsertBlock()->getContext()); Constant* AllocSize = ConstantExpr::getSizeOf(unwrap(Ty)); AllocSize = ConstantExpr::getTruncOrBitCast(AllocSize, ITy); Instruction* Malloc = CallInst::CreateMalloc(unwrap(B)->GetInsertBlock(), ITy, unwrap(Ty), AllocSize, nullptr, nullptr, ""); return wrap(unwrap(B)->Insert(Malloc, Twine(Name))); } LLVMValueRef LLVMBuildArrayMalloc(LLVMBuilderRef B, LLVMTypeRef Ty, LLVMValueRef Val, const char *Name) { Type* ITy = Type::getInt32Ty(unwrap(B)->GetInsertBlock()->getContext()); Constant* AllocSize = ConstantExpr::getSizeOf(unwrap(Ty)); AllocSize = ConstantExpr::getTruncOrBitCast(AllocSize, ITy); Instruction* Malloc = CallInst::CreateMalloc(unwrap(B)->GetInsertBlock(), ITy, unwrap(Ty), AllocSize, unwrap(Val), nullptr, ""); return wrap(unwrap(B)->Insert(Malloc, Twine(Name))); } LLVMValueRef LLVMBuildMemSet(LLVMBuilderRef B, LLVMValueRef Ptr, LLVMValueRef Val, LLVMValueRef Len, unsigned Align) { return wrap(unwrap(B)->CreateMemSet(unwrap(Ptr), unwrap(Val), unwrap(Len), MaybeAlign(Align))); } LLVMValueRef LLVMBuildMemCpy(LLVMBuilderRef B, LLVMValueRef Dst, unsigned DstAlign, LLVMValueRef Src, unsigned SrcAlign, LLVMValueRef Size) { return wrap(unwrap(B)->CreateMemCpy(unwrap(Dst), MaybeAlign(DstAlign), unwrap(Src), MaybeAlign(SrcAlign), unwrap(Size))); } LLVMValueRef LLVMBuildMemMove(LLVMBuilderRef B, LLVMValueRef Dst, unsigned DstAlign, LLVMValueRef Src, unsigned SrcAlign, LLVMValueRef Size) { return wrap(unwrap(B)->CreateMemMove(unwrap(Dst), MaybeAlign(DstAlign), unwrap(Src), MaybeAlign(SrcAlign), unwrap(Size))); } LLVMValueRef LLVMBuildAlloca(LLVMBuilderRef B, LLVMTypeRef Ty, const char *Name) { return wrap(unwrap(B)->CreateAlloca(unwrap(Ty), nullptr, Name)); } LLVMValueRef LLVMBuildArrayAlloca(LLVMBuilderRef B, LLVMTypeRef Ty, LLVMValueRef Val, const char *Name) { return wrap(unwrap(B)->CreateAlloca(unwrap(Ty), unwrap(Val), Name)); } LLVMValueRef LLVMBuildFree(LLVMBuilderRef B, LLVMValueRef PointerVal) { return wrap(unwrap(B)->Insert( CallInst::CreateFree(unwrap(PointerVal), unwrap(B)->GetInsertBlock()))); } LLVMValueRef LLVMBuildLoad(LLVMBuilderRef B, LLVMValueRef PointerVal, const char *Name) { Value *V = unwrap(PointerVal); PointerType *Ty = cast(V->getType()); return wrap(unwrap(B)->CreateLoad(Ty->getElementType(), V, Name)); } LLVMValueRef LLVMBuildLoad2(LLVMBuilderRef B, LLVMTypeRef Ty, LLVMValueRef PointerVal, const char *Name) { return wrap(unwrap(B)->CreateLoad(unwrap(Ty), unwrap(PointerVal), Name)); } LLVMValueRef LLVMBuildStore(LLVMBuilderRef B, LLVMValueRef Val, LLVMValueRef PointerVal) { return wrap(unwrap(B)->CreateStore(unwrap(Val), unwrap(PointerVal))); } static AtomicOrdering mapFromLLVMOrdering(LLVMAtomicOrdering Ordering) { switch (Ordering) { case LLVMAtomicOrderingNotAtomic: return AtomicOrdering::NotAtomic; case LLVMAtomicOrderingUnordered: return AtomicOrdering::Unordered; case LLVMAtomicOrderingMonotonic: return AtomicOrdering::Monotonic; case LLVMAtomicOrderingAcquire: return AtomicOrdering::Acquire; case LLVMAtomicOrderingRelease: return AtomicOrdering::Release; case LLVMAtomicOrderingAcquireRelease: return AtomicOrdering::AcquireRelease; case LLVMAtomicOrderingSequentiallyConsistent: return AtomicOrdering::SequentiallyConsistent; } llvm_unreachable("Invalid LLVMAtomicOrdering value!"); } static LLVMAtomicOrdering 
mapToLLVMOrdering(AtomicOrdering Ordering) {
  switch (Ordering) {
    case AtomicOrdering::NotAtomic: return LLVMAtomicOrderingNotAtomic;
    case AtomicOrdering::Unordered: return LLVMAtomicOrderingUnordered;
    case AtomicOrdering::Monotonic: return LLVMAtomicOrderingMonotonic;
    case AtomicOrdering::Acquire: return LLVMAtomicOrderingAcquire;
    case AtomicOrdering::Release: return LLVMAtomicOrderingRelease;
    case AtomicOrdering::AcquireRelease:
      return LLVMAtomicOrderingAcquireRelease;
    case AtomicOrdering::SequentiallyConsistent:
      return LLVMAtomicOrderingSequentiallyConsistent;
  }

  llvm_unreachable("Invalid AtomicOrdering value!");
}

static AtomicRMWInst::BinOp mapFromLLVMRMWBinOp(LLVMAtomicRMWBinOp BinOp) {
  switch (BinOp) {
    case LLVMAtomicRMWBinOpXchg: return AtomicRMWInst::Xchg;
    case LLVMAtomicRMWBinOpAdd: return AtomicRMWInst::Add;
    case LLVMAtomicRMWBinOpSub: return AtomicRMWInst::Sub;
    case LLVMAtomicRMWBinOpAnd: return AtomicRMWInst::And;
    case LLVMAtomicRMWBinOpNand: return AtomicRMWInst::Nand;
    case LLVMAtomicRMWBinOpOr: return AtomicRMWInst::Or;
    case LLVMAtomicRMWBinOpXor: return AtomicRMWInst::Xor;
    case LLVMAtomicRMWBinOpMax: return AtomicRMWInst::Max;
    case LLVMAtomicRMWBinOpMin: return AtomicRMWInst::Min;
    case LLVMAtomicRMWBinOpUMax: return AtomicRMWInst::UMax;
    case LLVMAtomicRMWBinOpUMin: return AtomicRMWInst::UMin;
    case LLVMAtomicRMWBinOpFAdd: return AtomicRMWInst::FAdd;
    case LLVMAtomicRMWBinOpFSub: return AtomicRMWInst::FSub;
  }

  llvm_unreachable("Invalid LLVMAtomicRMWBinOp value!");
}

static LLVMAtomicRMWBinOp mapToLLVMRMWBinOp(AtomicRMWInst::BinOp BinOp) {
  switch (BinOp) {
    case AtomicRMWInst::Xchg: return LLVMAtomicRMWBinOpXchg;
    case AtomicRMWInst::Add: return LLVMAtomicRMWBinOpAdd;
    case AtomicRMWInst::Sub: return LLVMAtomicRMWBinOpSub;
    case AtomicRMWInst::And: return LLVMAtomicRMWBinOpAnd;
    case AtomicRMWInst::Nand: return LLVMAtomicRMWBinOpNand;
    case AtomicRMWInst::Or: return LLVMAtomicRMWBinOpOr;
    case AtomicRMWInst::Xor: return LLVMAtomicRMWBinOpXor;
    case AtomicRMWInst::Max: return LLVMAtomicRMWBinOpMax;
    case AtomicRMWInst::Min: return LLVMAtomicRMWBinOpMin;
    case AtomicRMWInst::UMax: return LLVMAtomicRMWBinOpUMax;
    case AtomicRMWInst::UMin: return LLVMAtomicRMWBinOpUMin;
    case AtomicRMWInst::FAdd: return LLVMAtomicRMWBinOpFAdd;
    case AtomicRMWInst::FSub: return LLVMAtomicRMWBinOpFSub;
    default: break;
  }

  llvm_unreachable("Invalid AtomicRMWBinOp value!");
}

// TODO: Should this and other atomic instructions support building with
// "syncscope"?
LLVMValueRef LLVMBuildFence(LLVMBuilderRef B, LLVMAtomicOrdering Ordering,
                            LLVMBool isSingleThread, const char *Name) {
  return wrap(
      unwrap(B)->CreateFence(mapFromLLVMOrdering(Ordering),
                             isSingleThread ?
SyncScope::SingleThread : SyncScope::System, Name)); } LLVMValueRef LLVMBuildGEP(LLVMBuilderRef B, LLVMValueRef Pointer, LLVMValueRef *Indices, unsigned NumIndices, const char *Name) { ArrayRef IdxList(unwrap(Indices), NumIndices); Value *Val = unwrap(Pointer); Type *Ty = cast(Val->getType()->getScalarType())->getElementType(); return wrap(unwrap(B)->CreateGEP(Ty, Val, IdxList, Name)); } LLVMValueRef LLVMBuildGEP2(LLVMBuilderRef B, LLVMTypeRef Ty, LLVMValueRef Pointer, LLVMValueRef *Indices, unsigned NumIndices, const char *Name) { ArrayRef IdxList(unwrap(Indices), NumIndices); return wrap(unwrap(B)->CreateGEP(unwrap(Ty), unwrap(Pointer), IdxList, Name)); } LLVMValueRef LLVMBuildInBoundsGEP(LLVMBuilderRef B, LLVMValueRef Pointer, LLVMValueRef *Indices, unsigned NumIndices, const char *Name) { ArrayRef IdxList(unwrap(Indices), NumIndices); Value *Val = unwrap(Pointer); Type *Ty = cast(Val->getType()->getScalarType())->getElementType(); return wrap(unwrap(B)->CreateInBoundsGEP(Ty, Val, IdxList, Name)); } LLVMValueRef LLVMBuildInBoundsGEP2(LLVMBuilderRef B, LLVMTypeRef Ty, LLVMValueRef Pointer, LLVMValueRef *Indices, unsigned NumIndices, const char *Name) { ArrayRef IdxList(unwrap(Indices), NumIndices); return wrap( unwrap(B)->CreateInBoundsGEP(unwrap(Ty), unwrap(Pointer), IdxList, Name)); } LLVMValueRef LLVMBuildStructGEP(LLVMBuilderRef B, LLVMValueRef Pointer, unsigned Idx, const char *Name) { Value *Val = unwrap(Pointer); Type *Ty = cast(Val->getType()->getScalarType())->getElementType(); return wrap(unwrap(B)->CreateStructGEP(Ty, Val, Idx, Name)); } LLVMValueRef LLVMBuildStructGEP2(LLVMBuilderRef B, LLVMTypeRef Ty, LLVMValueRef Pointer, unsigned Idx, const char *Name) { return wrap( unwrap(B)->CreateStructGEP(unwrap(Ty), unwrap(Pointer), Idx, Name)); } LLVMValueRef LLVMBuildGlobalString(LLVMBuilderRef B, const char *Str, const char *Name) { return wrap(unwrap(B)->CreateGlobalString(Str, Name)); } LLVMValueRef LLVMBuildGlobalStringPtr(LLVMBuilderRef B, const char *Str, const char *Name) { return wrap(unwrap(B)->CreateGlobalStringPtr(Str, Name)); } LLVMBool LLVMGetVolatile(LLVMValueRef MemAccessInst) { Value *P = unwrap(MemAccessInst); if (LoadInst *LI = dyn_cast(P)) return LI->isVolatile(); if (StoreInst *SI = dyn_cast(P)) return SI->isVolatile(); if (AtomicRMWInst *AI = dyn_cast(P)) return AI->isVolatile(); return cast(P)->isVolatile(); } void LLVMSetVolatile(LLVMValueRef MemAccessInst, LLVMBool isVolatile) { Value *P = unwrap(MemAccessInst); if (LoadInst *LI = dyn_cast(P)) return LI->setVolatile(isVolatile); if (StoreInst *SI = dyn_cast(P)) return SI->setVolatile(isVolatile); if (AtomicRMWInst *AI = dyn_cast(P)) return AI->setVolatile(isVolatile); return cast(P)->setVolatile(isVolatile); } LLVMBool LLVMGetWeak(LLVMValueRef CmpXchgInst) { return unwrap(CmpXchgInst)->isWeak(); } void LLVMSetWeak(LLVMValueRef CmpXchgInst, LLVMBool isWeak) { return unwrap(CmpXchgInst)->setWeak(isWeak); } LLVMAtomicOrdering LLVMGetOrdering(LLVMValueRef MemAccessInst) { Value *P = unwrap(MemAccessInst); AtomicOrdering O; if (LoadInst *LI = dyn_cast(P)) O = LI->getOrdering(); else if (StoreInst *SI = dyn_cast(P)) O = SI->getOrdering(); else O = cast(P)->getOrdering(); return mapToLLVMOrdering(O); } void LLVMSetOrdering(LLVMValueRef MemAccessInst, LLVMAtomicOrdering Ordering) { Value *P = unwrap(MemAccessInst); AtomicOrdering O = mapFromLLVMOrdering(Ordering); if (LoadInst *LI = dyn_cast(P)) return LI->setOrdering(O); return cast(P)->setOrdering(O); } LLVMAtomicRMWBinOp LLVMGetAtomicRMWBinOp(LLVMValueRef Inst) { 
return mapToLLVMRMWBinOp(unwrap(Inst)->getOperation()); } void LLVMSetAtomicRMWBinOp(LLVMValueRef Inst, LLVMAtomicRMWBinOp BinOp) { unwrap(Inst)->setOperation(mapFromLLVMRMWBinOp(BinOp)); } /*--.. Casts ...............................................................--*/ LLVMValueRef LLVMBuildTrunc(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreateTrunc(unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildZExt(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreateZExt(unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildSExt(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreateSExt(unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildFPToUI(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreateFPToUI(unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildFPToSI(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreateFPToSI(unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildUIToFP(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreateUIToFP(unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildSIToFP(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreateSIToFP(unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildFPTrunc(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreateFPTrunc(unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildFPExt(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreateFPExt(unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildPtrToInt(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreatePtrToInt(unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildIntToPtr(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreateIntToPtr(unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildBitCast(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreateBitCast(unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildAddrSpaceCast(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreateAddrSpaceCast(unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildZExtOrBitCast(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreateZExtOrBitCast(unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildSExtOrBitCast(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreateSExtOrBitCast(unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildTruncOrBitCast(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreateTruncOrBitCast(unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildCast(LLVMBuilderRef B, LLVMOpcode Op, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) { return wrap(unwrap(B)->CreateCast(Instruction::CastOps(map_from_llvmopcode(Op)), unwrap(Val), unwrap(DestTy), Name)); } LLVMValueRef LLVMBuildPointerCast(LLVMBuilderRef B, LLVMValueRef Val, LLVMTypeRef DestTy, const char *Name) 
{
  return wrap(unwrap(B)->CreatePointerCast(unwrap(Val), unwrap(DestTy), Name));
}

LLVMValueRef LLVMBuildIntCast2(LLVMBuilderRef B, LLVMValueRef Val,
                               LLVMTypeRef DestTy, LLVMBool IsSigned,
                               const char *Name) {
  return wrap(
      unwrap(B)->CreateIntCast(unwrap(Val), unwrap(DestTy), IsSigned, Name));
}

LLVMValueRef LLVMBuildIntCast(LLVMBuilderRef B, LLVMValueRef Val,
                              LLVMTypeRef DestTy, const char *Name) {
  return wrap(unwrap(B)->CreateIntCast(unwrap(Val), unwrap(DestTy),
                                       /*isSigned*/true, Name));
}

LLVMValueRef LLVMBuildFPCast(LLVMBuilderRef B, LLVMValueRef Val,
                             LLVMTypeRef DestTy, const char *Name) {
  return wrap(unwrap(B)->CreateFPCast(unwrap(Val), unwrap(DestTy), Name));
}

/*--.. Comparisons .........................................................--*/

LLVMValueRef LLVMBuildICmp(LLVMBuilderRef B, LLVMIntPredicate Op,
                           LLVMValueRef LHS, LLVMValueRef RHS,
                           const char *Name) {
  return wrap(unwrap(B)->CreateICmp(static_cast<ICmpInst::Predicate>(Op),
                                    unwrap(LHS), unwrap(RHS), Name));
}

LLVMValueRef LLVMBuildFCmp(LLVMBuilderRef B, LLVMRealPredicate Op,
                           LLVMValueRef LHS, LLVMValueRef RHS,
                           const char *Name) {
  return wrap(unwrap(B)->CreateFCmp(static_cast<FCmpInst::Predicate>(Op),
                                    unwrap(LHS), unwrap(RHS), Name));
}

/*--.. Miscellaneous instructions ..........................................--*/

LLVMValueRef LLVMBuildPhi(LLVMBuilderRef B, LLVMTypeRef Ty, const char *Name) {
  return wrap(unwrap(B)->CreatePHI(unwrap(Ty), 0, Name));
}

LLVMValueRef LLVMBuildCall(LLVMBuilderRef B, LLVMValueRef Fn,
                           LLVMValueRef *Args, unsigned NumArgs,
                           const char *Name) {
  Value *V = unwrap(Fn);
  FunctionType *FnT =
      cast<FunctionType>(cast<PointerType>(V->getType())->getElementType());

  return wrap(unwrap(B)->CreateCall(FnT, unwrap(Fn),
                                    makeArrayRef(unwrap(Args), NumArgs), Name));
}

LLVMValueRef LLVMBuildCall2(LLVMBuilderRef B, LLVMTypeRef Ty, LLVMValueRef Fn,
                            LLVMValueRef *Args, unsigned NumArgs,
                            const char *Name) {
  FunctionType *FTy = unwrap<FunctionType>(Ty);
  return wrap(unwrap(B)->CreateCall(FTy, unwrap(Fn),
                                    makeArrayRef(unwrap(Args), NumArgs), Name));
}

LLVMValueRef LLVMBuildSelect(LLVMBuilderRef B, LLVMValueRef If,
                             LLVMValueRef Then, LLVMValueRef Else,
                             const char *Name) {
  return wrap(unwrap(B)->CreateSelect(unwrap(If), unwrap(Then), unwrap(Else),
                                      Name));
}

LLVMValueRef LLVMBuildVAArg(LLVMBuilderRef B, LLVMValueRef List,
                            LLVMTypeRef Ty, const char *Name) {
  return wrap(unwrap(B)->CreateVAArg(unwrap(List), unwrap(Ty), Name));
}

LLVMValueRef LLVMBuildExtractElement(LLVMBuilderRef B, LLVMValueRef VecVal,
                                     LLVMValueRef Index, const char *Name) {
  return wrap(unwrap(B)->CreateExtractElement(unwrap(VecVal), unwrap(Index),
                                              Name));
}

LLVMValueRef LLVMBuildInsertElement(LLVMBuilderRef B, LLVMValueRef VecVal,
                                    LLVMValueRef EltVal, LLVMValueRef Index,
                                    const char *Name) {
  return wrap(unwrap(B)->CreateInsertElement(unwrap(VecVal), unwrap(EltVal),
                                             unwrap(Index), Name));
}

LLVMValueRef LLVMBuildShuffleVector(LLVMBuilderRef B, LLVMValueRef V1,
                                    LLVMValueRef V2, LLVMValueRef Mask,
                                    const char *Name) {
  return wrap(unwrap(B)->CreateShuffleVector(unwrap(V1), unwrap(V2),
                                             unwrap(Mask), Name));
}

LLVMValueRef LLVMBuildExtractValue(LLVMBuilderRef B, LLVMValueRef AggVal,
                                   unsigned Index, const char *Name) {
  return wrap(unwrap(B)->CreateExtractValue(unwrap(AggVal), Index, Name));
}

LLVMValueRef LLVMBuildInsertValue(LLVMBuilderRef B, LLVMValueRef AggVal,
                                  LLVMValueRef EltVal, unsigned Index,
                                  const char *Name) {
  return wrap(unwrap(B)->CreateInsertValue(unwrap(AggVal), unwrap(EltVal),
                                           Index, Name));
}

LLVMValueRef LLVMBuildFreeze(LLVMBuilderRef B, LLVMValueRef Val,
                             const char *Name) {
  return wrap(unwrap(B)->CreateFreeze(unwrap(Val), Name));
}
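// A minimal usage sketch for the explicitly-typed Build*2 variants defined
// above (the older LLVMBuildGEP/LLVMBuildCall forms derive the element or
// function type from the pointer operand instead). The builder `B`, the i32
// global `G`, and the function `F` of type `i32 (i32)` are assumed to exist
// already and are illustrative only; the global context is assumed:
//
//   LLVMTypeRef I32 = LLVMInt32Type();
//   LLVMValueRef Zero = LLVMConstInt(I32, 0, /*SignExtend=*/0);
//   LLVMValueRef Addr = LLVMBuildGEP2(B, I32, G, &Zero, 1, "addr");
//   LLVMValueRef Arg = LLVMBuildLoad2(B, I32, Addr, "val");
//   LLVMTypeRef FnTy = LLVMFunctionType(I32, &I32, 1, /*IsVarArg=*/0);
//   LLVMValueRef Call = LLVMBuildCall2(B, FnTy, F, &Arg, 1, "call");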
LLVMValueRef LLVMBuildIsNull(LLVMBuilderRef B, LLVMValueRef Val,
                             const char *Name) {
  return wrap(unwrap(B)->CreateIsNull(unwrap(Val), Name));
}

LLVMValueRef LLVMBuildIsNotNull(LLVMBuilderRef B, LLVMValueRef Val,
                                const char *Name) {
  return wrap(unwrap(B)->CreateIsNotNull(unwrap(Val), Name));
}

LLVMValueRef LLVMBuildPtrDiff(LLVMBuilderRef B, LLVMValueRef LHS,
                              LLVMValueRef RHS, const char *Name) {
  return wrap(unwrap(B)->CreatePtrDiff(unwrap(LHS), unwrap(RHS), Name));
}

LLVMValueRef LLVMBuildAtomicRMW(LLVMBuilderRef B, LLVMAtomicRMWBinOp op,
                                LLVMValueRef PTR, LLVMValueRef Val,
                                LLVMAtomicOrdering ordering,
                                LLVMBool singleThread) {
  AtomicRMWInst::BinOp intop = mapFromLLVMRMWBinOp(op);
  return wrap(unwrap(B)->CreateAtomicRMW(intop, unwrap(PTR), unwrap(Val),
                                         mapFromLLVMOrdering(ordering),
                                         singleThread ? SyncScope::SingleThread
                                                      : SyncScope::System));
}

LLVMValueRef LLVMBuildAtomicCmpXchg(LLVMBuilderRef B, LLVMValueRef Ptr,
                                    LLVMValueRef Cmp, LLVMValueRef New,
                                    LLVMAtomicOrdering SuccessOrdering,
                                    LLVMAtomicOrdering FailureOrdering,
                                    LLVMBool singleThread) {

  return wrap(unwrap(B)->CreateAtomicCmpXchg(
      unwrap(Ptr), unwrap(Cmp), unwrap(New),
      mapFromLLVMOrdering(SuccessOrdering),
      mapFromLLVMOrdering(FailureOrdering),
      singleThread ? SyncScope::SingleThread : SyncScope::System));
}

+unsigned LLVMGetNumMaskElements(LLVMValueRef SVInst) {
+  Value *P = unwrap(SVInst);
+  ShuffleVectorInst *I = cast<ShuffleVectorInst>(P);
+  return I->getShuffleMask().size();
+}
+
+int LLVMGetMaskValue(LLVMValueRef SVInst, unsigned Elt) {
+  Value *P = unwrap(SVInst);
+  ShuffleVectorInst *I = cast<ShuffleVectorInst>(P);
+  return I->getMaskValue(Elt);
+}
+
+int LLVMGetUndefMaskElem(void) { return UndefMaskElem; }

LLVMBool LLVMIsAtomicSingleThread(LLVMValueRef AtomicInst) {
  Value *P = unwrap(AtomicInst);

  if (AtomicRMWInst *I = dyn_cast<AtomicRMWInst>(P))
    return I->getSyncScopeID() == SyncScope::SingleThread;
  return cast<AtomicCmpXchgInst>(P)->getSyncScopeID() ==
             SyncScope::SingleThread;
}

void LLVMSetAtomicSingleThread(LLVMValueRef AtomicInst, LLVMBool NewValue) {
  Value *P = unwrap(AtomicInst);
  SyncScope::ID SSID = NewValue ?
SyncScope::SingleThread : SyncScope::System; if (AtomicRMWInst *I = dyn_cast(P)) return I->setSyncScopeID(SSID); return cast(P)->setSyncScopeID(SSID); } LLVMAtomicOrdering LLVMGetCmpXchgSuccessOrdering(LLVMValueRef CmpXchgInst) { Value *P = unwrap(CmpXchgInst); return mapToLLVMOrdering(cast(P)->getSuccessOrdering()); } void LLVMSetCmpXchgSuccessOrdering(LLVMValueRef CmpXchgInst, LLVMAtomicOrdering Ordering) { Value *P = unwrap(CmpXchgInst); AtomicOrdering O = mapFromLLVMOrdering(Ordering); return cast(P)->setSuccessOrdering(O); } LLVMAtomicOrdering LLVMGetCmpXchgFailureOrdering(LLVMValueRef CmpXchgInst) { Value *P = unwrap(CmpXchgInst); return mapToLLVMOrdering(cast(P)->getFailureOrdering()); } void LLVMSetCmpXchgFailureOrdering(LLVMValueRef CmpXchgInst, LLVMAtomicOrdering Ordering) { Value *P = unwrap(CmpXchgInst); AtomicOrdering O = mapFromLLVMOrdering(Ordering); return cast(P)->setFailureOrdering(O); } /*===-- Module providers --------------------------------------------------===*/ LLVMModuleProviderRef LLVMCreateModuleProviderForExistingModule(LLVMModuleRef M) { return reinterpret_cast(M); } void LLVMDisposeModuleProvider(LLVMModuleProviderRef MP) { delete unwrap(MP); } /*===-- Memory buffers ----------------------------------------------------===*/ LLVMBool LLVMCreateMemoryBufferWithContentsOfFile( const char *Path, LLVMMemoryBufferRef *OutMemBuf, char **OutMessage) { ErrorOr> MBOrErr = MemoryBuffer::getFile(Path); if (std::error_code EC = MBOrErr.getError()) { *OutMessage = strdup(EC.message().c_str()); return 1; } *OutMemBuf = wrap(MBOrErr.get().release()); return 0; } LLVMBool LLVMCreateMemoryBufferWithSTDIN(LLVMMemoryBufferRef *OutMemBuf, char **OutMessage) { ErrorOr> MBOrErr = MemoryBuffer::getSTDIN(); if (std::error_code EC = MBOrErr.getError()) { *OutMessage = strdup(EC.message().c_str()); return 1; } *OutMemBuf = wrap(MBOrErr.get().release()); return 0; } LLVMMemoryBufferRef LLVMCreateMemoryBufferWithMemoryRange( const char *InputData, size_t InputDataLength, const char *BufferName, LLVMBool RequiresNullTerminator) { return wrap(MemoryBuffer::getMemBuffer(StringRef(InputData, InputDataLength), StringRef(BufferName), RequiresNullTerminator).release()); } LLVMMemoryBufferRef LLVMCreateMemoryBufferWithMemoryRangeCopy( const char *InputData, size_t InputDataLength, const char *BufferName) { return wrap( MemoryBuffer::getMemBufferCopy(StringRef(InputData, InputDataLength), StringRef(BufferName)).release()); } const char *LLVMGetBufferStart(LLVMMemoryBufferRef MemBuf) { return unwrap(MemBuf)->getBufferStart(); } size_t LLVMGetBufferSize(LLVMMemoryBufferRef MemBuf) { return unwrap(MemBuf)->getBufferSize(); } void LLVMDisposeMemoryBuffer(LLVMMemoryBufferRef MemBuf) { delete unwrap(MemBuf); } /*===-- Pass Registry -----------------------------------------------------===*/ LLVMPassRegistryRef LLVMGetGlobalPassRegistry(void) { return wrap(PassRegistry::getPassRegistry()); } /*===-- Pass Manager ------------------------------------------------------===*/ LLVMPassManagerRef LLVMCreatePassManager() { return wrap(new legacy::PassManager()); } LLVMPassManagerRef LLVMCreateFunctionPassManagerForModule(LLVMModuleRef M) { return wrap(new legacy::FunctionPassManager(unwrap(M))); } LLVMPassManagerRef LLVMCreateFunctionPassManager(LLVMModuleProviderRef P) { return LLVMCreateFunctionPassManagerForModule( reinterpret_cast(P)); } LLVMBool LLVMRunPassManager(LLVMPassManagerRef PM, LLVMModuleRef M) { return unwrap(PM)->run(*unwrap(M)); } LLVMBool LLVMInitializeFunctionPassManager(LLVMPassManagerRef FPM) 
{ return unwrap(FPM)->doInitialization(); } LLVMBool LLVMRunFunctionPassManager(LLVMPassManagerRef FPM, LLVMValueRef F) { return unwrap(FPM)->run(*unwrap(F)); } LLVMBool LLVMFinalizeFunctionPassManager(LLVMPassManagerRef FPM) { return unwrap(FPM)->doFinalization(); } void LLVMDisposePassManager(LLVMPassManagerRef PM) { delete unwrap(PM); } /*===-- Threading ------------------------------------------------------===*/ LLVMBool LLVMStartMultithreaded() { return LLVMIsMultithreaded(); } void LLVMStopMultithreaded() { } LLVMBool LLVMIsMultithreaded() { return llvm_is_multithreaded(); } Index: vendor/llvm-project/release-11.x/llvm/lib/IR/Globals.cpp =================================================================== --- vendor/llvm-project/release-11.x/llvm/lib/IR/Globals.cpp (revision 366332) +++ vendor/llvm-project/release-11.x/llvm/lib/IR/Globals.cpp (revision 366333) @@ -1,548 +1,549 @@ //===-- Globals.cpp - Implement the GlobalValue & GlobalVariable class ----===// // // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // See https://llvm.org/LICENSE.txt for license information. // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // //===----------------------------------------------------------------------===// // // This file implements the GlobalValue & GlobalVariable classes for the IR // library. // //===----------------------------------------------------------------------===// #include "LLVMContextImpl.h" #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/Triple.h" #include "llvm/IR/ConstantRange.h" #include "llvm/IR/Constants.h" #include "llvm/IR/DerivedTypes.h" #include "llvm/IR/GlobalAlias.h" #include "llvm/IR/GlobalValue.h" #include "llvm/IR/GlobalVariable.h" #include "llvm/IR/Module.h" #include "llvm/IR/Operator.h" #include "llvm/Support/Error.h" #include "llvm/Support/ErrorHandling.h" using namespace llvm; //===----------------------------------------------------------------------===// // GlobalValue Class //===----------------------------------------------------------------------===// // GlobalValue should be a Constant, plus a type, a module, some flags, and an // intrinsic ID. Add an assert to prevent people from accidentally growing // GlobalValue while adding flags. static_assert(sizeof(GlobalValue) == sizeof(Constant) + 2 * sizeof(void *) + 2 * sizeof(unsigned), "unexpected GlobalValue size growth"); // GlobalObject adds a comdat. static_assert(sizeof(GlobalObject) == sizeof(GlobalValue) + sizeof(void *), "unexpected GlobalObject size growth"); bool GlobalValue::isMaterializable() const { if (const Function *F = dyn_cast(this)) return F->isMaterializable(); return false; } Error GlobalValue::materialize() { return getParent()->materialize(this); } /// Override destroyConstantImpl to make sure it doesn't get called on /// GlobalValue's because they shouldn't be treated like other constants. void GlobalValue::destroyConstantImpl() { llvm_unreachable("You can't GV->destroyConstantImpl()!"); } Value *GlobalValue::handleOperandChangeImpl(Value *From, Value *To) { llvm_unreachable("Unsupported class for handleOperandChange()!"); } /// copyAttributesFrom - copy all additional attributes (those not needed to /// create a GlobalValue) from the GlobalValue Src to this one. 
void GlobalValue::copyAttributesFrom(const GlobalValue *Src) { setVisibility(Src->getVisibility()); setUnnamedAddr(Src->getUnnamedAddr()); setThreadLocalMode(Src->getThreadLocalMode()); setDLLStorageClass(Src->getDLLStorageClass()); setDSOLocal(Src->isDSOLocal()); setPartition(Src->getPartition()); } void GlobalValue::removeFromParent() { switch (getValueID()) { #define HANDLE_GLOBAL_VALUE(NAME) \ case Value::NAME##Val: \ return static_cast(this)->removeFromParent(); #include "llvm/IR/Value.def" default: break; } llvm_unreachable("not a global"); } void GlobalValue::eraseFromParent() { switch (getValueID()) { #define HANDLE_GLOBAL_VALUE(NAME) \ case Value::NAME##Val: \ return static_cast(this)->eraseFromParent(); #include "llvm/IR/Value.def" default: break; } llvm_unreachable("not a global"); } bool GlobalValue::isInterposable() const { if (isInterposableLinkage(getLinkage())) return true; return getParent() && getParent()->getSemanticInterposition() && !isDSOLocal(); } bool GlobalValue::canBenefitFromLocalAlias() const { // See AsmPrinter::getSymbolPreferLocal(). - return GlobalObject::isExternalLinkage(getLinkage()) && !isDeclaration() && + return hasDefaultVisibility() && + GlobalObject::isExternalLinkage(getLinkage()) && !isDeclaration() && !isa(this) && !hasComdat(); } unsigned GlobalValue::getAddressSpace() const { PointerType *PtrTy = getType(); return PtrTy->getAddressSpace(); } void GlobalObject::setAlignment(MaybeAlign Align) { assert((!Align || *Align <= MaximumAlignment) && "Alignment is greater than MaximumAlignment!"); unsigned AlignmentData = encode(Align); unsigned OldData = getGlobalValueSubClassData(); setGlobalValueSubClassData((OldData & ~AlignmentMask) | AlignmentData); assert(MaybeAlign(getAlignment()) == Align && "Alignment representation error!"); } void GlobalObject::copyAttributesFrom(const GlobalObject *Src) { GlobalValue::copyAttributesFrom(Src); setAlignment(MaybeAlign(Src->getAlignment())); setSection(Src->getSection()); } std::string GlobalValue::getGlobalIdentifier(StringRef Name, GlobalValue::LinkageTypes Linkage, StringRef FileName) { // Value names may be prefixed with a binary '1' to indicate // that the backend should not modify the symbols due to any platform // naming convention. Do not include that '1' in the PGO profile name. if (Name[0] == '\1') Name = Name.substr(1); std::string NewName = std::string(Name); if (llvm::GlobalValue::isLocalLinkage(Linkage)) { // For local symbols, prepend the main file name to distinguish them. // Do not include the full path in the file name since there's no guarantee // that it will stay the same, e.g., if the files are checked out from // version control in different locations. if (FileName.empty()) NewName = NewName.insert(0, ":"); else NewName = NewName.insert(0, FileName.str() + ":"); } return NewName; } std::string GlobalValue::getGlobalIdentifier() const { return getGlobalIdentifier(getName(), getLinkage(), getParent()->getSourceFileName()); } StringRef GlobalValue::getSection() const { if (auto *GA = dyn_cast(this)) { // In general we cannot compute this at the IR level, but we try. if (const GlobalObject *GO = GA->getBaseObject()) return GO->getSection(); return ""; } return cast(this)->getSection(); } const Comdat *GlobalValue::getComdat() const { if (auto *GA = dyn_cast(this)) { // In general we cannot compute this at the IR level, but we try. 
if (const GlobalObject *GO = GA->getBaseObject()) return const_cast(GO)->getComdat(); return nullptr; } // ifunc and its resolver are separate things so don't use resolver comdat. if (isa(this)) return nullptr; return cast(this)->getComdat(); } StringRef GlobalValue::getPartition() const { if (!hasPartition()) return ""; return getContext().pImpl->GlobalValuePartitions[this]; } void GlobalValue::setPartition(StringRef S) { // Do nothing if we're clearing the partition and it is already empty. if (!hasPartition() && S.empty()) return; // Get or create a stable partition name string and put it in the table in the // context. if (!S.empty()) S = getContext().pImpl->Saver.save(S); getContext().pImpl->GlobalValuePartitions[this] = S; // Update the HasPartition field. Setting the partition to the empty string // means this global no longer has a partition. HasPartition = !S.empty(); } StringRef GlobalObject::getSectionImpl() const { assert(hasSection()); return getContext().pImpl->GlobalObjectSections[this]; } void GlobalObject::setSection(StringRef S) { // Do nothing if we're clearing the section and it is already empty. if (!hasSection() && S.empty()) return; // Get or create a stable section name string and put it in the table in the // context. if (!S.empty()) S = getContext().pImpl->Saver.save(S); getContext().pImpl->GlobalObjectSections[this] = S; // Update the HasSectionHashEntryBit. Setting the section to the empty string // means this global no longer has a section. setGlobalObjectFlag(HasSectionHashEntryBit, !S.empty()); } bool GlobalValue::isDeclaration() const { // Globals are definitions if they have an initializer. if (const GlobalVariable *GV = dyn_cast(this)) return GV->getNumOperands() == 0; // Functions are definitions if they have a body. if (const Function *F = dyn_cast(this)) return F->empty() && !F->isMaterializable(); // Aliases and ifuncs are always definitions. assert(isa(this)); return false; } bool GlobalObject::canIncreaseAlignment() const { // Firstly, can only increase the alignment of a global if it // is a strong definition. if (!isStrongDefinitionForLinker()) return false; // It also has to either not have a section defined, or, not have // alignment specified. (If it is assigned a section, the global // could be densely packed with other objects in the section, and // increasing the alignment could cause padding issues.) if (hasSection() && getAlignment() > 0) return false; // On ELF platforms, we're further restricted in that we can't // increase the alignment of any variable which might be emitted // into a shared library, and which is exported. If the main // executable accesses a variable found in a shared-lib, the main // exe actually allocates memory for and exports the symbol ITSELF, // overriding the symbol found in the library. That is, at link // time, the observed alignment of the variable is copied into the // executable binary. (A COPY relocation is also generated, to copy // the initial data from the shadowed variable in the shared-lib // into the location in the main binary, before running code.) // // And thus, even though you might think you are defining the // global, and allocating the memory for the global in your object // file, and thus should be able to set the alignment arbitrarily, // that's not actually true. Doing so can cause an ABI breakage; an // executable might have already been built with the previous // alignment of the variable, and then assuming an increased // alignment will be incorrect. 
// Conservatively assume ELF if there's no parent pointer. bool isELF = (!Parent || Triple(Parent->getTargetTriple()).isOSBinFormatELF()); if (isELF && !isDSOLocal()) return false; return true; } const GlobalObject *GlobalValue::getBaseObject() const { if (auto *GO = dyn_cast(this)) return GO; if (auto *GA = dyn_cast(this)) return GA->getBaseObject(); return nullptr; } bool GlobalValue::isAbsoluteSymbolRef() const { auto *GO = dyn_cast(this); if (!GO) return false; return GO->getMetadata(LLVMContext::MD_absolute_symbol); } Optional GlobalValue::getAbsoluteSymbolRange() const { auto *GO = dyn_cast(this); if (!GO) return None; MDNode *MD = GO->getMetadata(LLVMContext::MD_absolute_symbol); if (!MD) return None; return getConstantRangeFromMetadata(*MD); } bool GlobalValue::canBeOmittedFromSymbolTable() const { if (!hasLinkOnceODRLinkage()) return false; // We assume that anyone who sets global unnamed_addr on a non-constant // knows what they're doing. if (hasGlobalUnnamedAddr()) return true; // If it is a non constant variable, it needs to be uniqued across shared // objects. if (auto *Var = dyn_cast(this)) if (!Var->isConstant()) return false; return hasAtLeastLocalUnnamedAddr(); } //===----------------------------------------------------------------------===// // GlobalVariable Implementation //===----------------------------------------------------------------------===// GlobalVariable::GlobalVariable(Type *Ty, bool constant, LinkageTypes Link, Constant *InitVal, const Twine &Name, ThreadLocalMode TLMode, unsigned AddressSpace, bool isExternallyInitialized) : GlobalObject(Ty, Value::GlobalVariableVal, OperandTraits::op_begin(this), InitVal != nullptr, Link, Name, AddressSpace), isConstantGlobal(constant), isExternallyInitializedConstant(isExternallyInitialized) { assert(!Ty->isFunctionTy() && PointerType::isValidElementType(Ty) && "invalid type for global variable"); setThreadLocalMode(TLMode); if (InitVal) { assert(InitVal->getType() == Ty && "Initializer should be the same type as the GlobalVariable!"); Op<0>() = InitVal; } } GlobalVariable::GlobalVariable(Module &M, Type *Ty, bool constant, LinkageTypes Link, Constant *InitVal, const Twine &Name, GlobalVariable *Before, ThreadLocalMode TLMode, unsigned AddressSpace, bool isExternallyInitialized) : GlobalObject(Ty, Value::GlobalVariableVal, OperandTraits::op_begin(this), InitVal != nullptr, Link, Name, AddressSpace), isConstantGlobal(constant), isExternallyInitializedConstant(isExternallyInitialized) { assert(!Ty->isFunctionTy() && PointerType::isValidElementType(Ty) && "invalid type for global variable"); setThreadLocalMode(TLMode); if (InitVal) { assert(InitVal->getType() == Ty && "Initializer should be the same type as the GlobalVariable!"); Op<0>() = InitVal; } if (Before) Before->getParent()->getGlobalList().insert(Before->getIterator(), this); else M.getGlobalList().push_back(this); } void GlobalVariable::removeFromParent() { getParent()->getGlobalList().remove(getIterator()); } void GlobalVariable::eraseFromParent() { getParent()->getGlobalList().erase(getIterator()); } void GlobalVariable::setInitializer(Constant *InitVal) { if (!InitVal) { if (hasInitializer()) { // Note, the num operands is used to compute the offset of the operand, so // the order here matters. Clearing the operand then clearing the num // operands ensures we have the correct offset to the operand. 
Op<0>().set(nullptr); setGlobalVariableNumOperands(0); } } else { assert(InitVal->getType() == getValueType() && "Initializer type must match GlobalVariable type"); // Note, the num operands is used to compute the offset of the operand, so // the order here matters. We need to set num operands to 1 first so that // we get the correct offset to the first operand when we set it. if (!hasInitializer()) setGlobalVariableNumOperands(1); Op<0>().set(InitVal); } } /// Copy all additional attributes (those not needed to create a GlobalVariable) /// from the GlobalVariable Src to this one. void GlobalVariable::copyAttributesFrom(const GlobalVariable *Src) { GlobalObject::copyAttributesFrom(Src); setExternallyInitialized(Src->isExternallyInitialized()); setAttributes(Src->getAttributes()); } void GlobalVariable::dropAllReferences() { User::dropAllReferences(); clearMetadata(); } //===----------------------------------------------------------------------===// // GlobalIndirectSymbol Implementation //===----------------------------------------------------------------------===// GlobalIndirectSymbol::GlobalIndirectSymbol(Type *Ty, ValueTy VTy, unsigned AddressSpace, LinkageTypes Linkage, const Twine &Name, Constant *Symbol) : GlobalValue(Ty, VTy, &Op<0>(), 1, Linkage, Name, AddressSpace) { Op<0>() = Symbol; } static const GlobalObject * findBaseObject(const Constant *C, DenseSet &Aliases) { if (auto *GO = dyn_cast(C)) return GO; if (auto *GA = dyn_cast(C)) if (Aliases.insert(GA).second) return findBaseObject(GA->getOperand(0), Aliases); if (auto *CE = dyn_cast(C)) { switch (CE->getOpcode()) { case Instruction::Add: { auto *LHS = findBaseObject(CE->getOperand(0), Aliases); auto *RHS = findBaseObject(CE->getOperand(1), Aliases); if (LHS && RHS) return nullptr; return LHS ? 
LHS : RHS; } case Instruction::Sub: { if (findBaseObject(CE->getOperand(1), Aliases)) return nullptr; return findBaseObject(CE->getOperand(0), Aliases); } case Instruction::IntToPtr: case Instruction::PtrToInt: case Instruction::BitCast: case Instruction::GetElementPtr: return findBaseObject(CE->getOperand(0), Aliases); default: break; } } return nullptr; } const GlobalObject *GlobalIndirectSymbol::getBaseObject() const { DenseSet Aliases; return findBaseObject(getOperand(0), Aliases); } //===----------------------------------------------------------------------===// // GlobalAlias Implementation //===----------------------------------------------------------------------===// GlobalAlias::GlobalAlias(Type *Ty, unsigned AddressSpace, LinkageTypes Link, const Twine &Name, Constant *Aliasee, Module *ParentModule) : GlobalIndirectSymbol(Ty, Value::GlobalAliasVal, AddressSpace, Link, Name, Aliasee) { if (ParentModule) ParentModule->getAliasList().push_back(this); } GlobalAlias *GlobalAlias::create(Type *Ty, unsigned AddressSpace, LinkageTypes Link, const Twine &Name, Constant *Aliasee, Module *ParentModule) { return new GlobalAlias(Ty, AddressSpace, Link, Name, Aliasee, ParentModule); } GlobalAlias *GlobalAlias::create(Type *Ty, unsigned AddressSpace, LinkageTypes Linkage, const Twine &Name, Module *Parent) { return create(Ty, AddressSpace, Linkage, Name, nullptr, Parent); } GlobalAlias *GlobalAlias::create(Type *Ty, unsigned AddressSpace, LinkageTypes Linkage, const Twine &Name, GlobalValue *Aliasee) { return create(Ty, AddressSpace, Linkage, Name, Aliasee, Aliasee->getParent()); } GlobalAlias *GlobalAlias::create(LinkageTypes Link, const Twine &Name, GlobalValue *Aliasee) { PointerType *PTy = Aliasee->getType(); return create(PTy->getElementType(), PTy->getAddressSpace(), Link, Name, Aliasee); } GlobalAlias *GlobalAlias::create(const Twine &Name, GlobalValue *Aliasee) { return create(Aliasee->getLinkage(), Name, Aliasee); } void GlobalAlias::removeFromParent() { getParent()->getAliasList().remove(getIterator()); } void GlobalAlias::eraseFromParent() { getParent()->getAliasList().erase(getIterator()); } void GlobalAlias::setAliasee(Constant *Aliasee) { assert((!Aliasee || Aliasee->getType() == getType()) && "Alias and aliasee types should match!"); setIndirectSymbol(Aliasee); } //===----------------------------------------------------------------------===// // GlobalIFunc Implementation //===----------------------------------------------------------------------===// GlobalIFunc::GlobalIFunc(Type *Ty, unsigned AddressSpace, LinkageTypes Link, const Twine &Name, Constant *Resolver, Module *ParentModule) : GlobalIndirectSymbol(Ty, Value::GlobalIFuncVal, AddressSpace, Link, Name, Resolver) { if (ParentModule) ParentModule->getIFuncList().push_back(this); } GlobalIFunc *GlobalIFunc::create(Type *Ty, unsigned AddressSpace, LinkageTypes Link, const Twine &Name, Constant *Resolver, Module *ParentModule) { return new GlobalIFunc(Ty, AddressSpace, Link, Name, Resolver, ParentModule); } void GlobalIFunc::removeFromParent() { getParent()->getIFuncList().remove(getIterator()); } void GlobalIFunc::eraseFromParent() { getParent()->getIFuncList().erase(getIterator()); } Index: vendor/llvm-project/release-11.x/llvm/lib/Support/APFloat.cpp =================================================================== --- vendor/llvm-project/release-11.x/llvm/lib/Support/APFloat.cpp (revision 366332) +++ vendor/llvm-project/release-11.x/llvm/lib/Support/APFloat.cpp (revision 366333) @@ -1,4874 +1,4889 @@ //===-- 
APFloat.cpp - Implement APFloat class -----------------------------===// // // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // See https://llvm.org/LICENSE.txt for license information. // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // //===----------------------------------------------------------------------===// // // This file implements a class to represent arbitrary precision floating // point values and provide a variety of arithmetic operations on them. // //===----------------------------------------------------------------------===// #include "llvm/ADT/APFloat.h" #include "llvm/ADT/APSInt.h" #include "llvm/ADT/ArrayRef.h" #include "llvm/ADT/FoldingSet.h" #include "llvm/ADT/Hashing.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringRef.h" #include "llvm/Config/llvm-config.h" #include "llvm/Support/Debug.h" #include "llvm/Support/Error.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/raw_ostream.h" #include #include #define APFLOAT_DISPATCH_ON_SEMANTICS(METHOD_CALL) \ do { \ if (usesLayout(getSemantics())) \ return U.IEEE.METHOD_CALL; \ if (usesLayout(getSemantics())) \ return U.Double.METHOD_CALL; \ llvm_unreachable("Unexpected semantics"); \ } while (false) using namespace llvm; /// A macro used to combine two fcCategory enums into one key which can be used /// in a switch statement to classify how the interaction of two APFloat's /// categories affects an operation. /// /// TODO: If clang source code is ever allowed to use constexpr in its own /// codebase, change this into a static inline function. #define PackCategoriesIntoKey(_lhs, _rhs) ((_lhs) * 4 + (_rhs)) /* Assumed in hexadecimal significand parsing, and conversion to hexadecimal strings. */ static_assert(APFloatBase::integerPartWidth % 4 == 0, "Part width must be divisible by 4!"); namespace llvm { /* Represents floating point arithmetic semantics. */ struct fltSemantics { /* The largest E such that 2^E is representable; this matches the definition of IEEE 754. */ APFloatBase::ExponentType maxExponent; /* The smallest E such that 2^E is a normalized number; this matches the definition of IEEE 754. */ APFloatBase::ExponentType minExponent; /* Number of bits in the significand. This includes the integer bit. */ unsigned int precision; /* Number of bits actually used in the semantics. */ unsigned int sizeInBits; }; static const fltSemantics semIEEEhalf = {15, -14, 11, 16}; static const fltSemantics semBFloat = {127, -126, 8, 16}; static const fltSemantics semIEEEsingle = {127, -126, 24, 32}; static const fltSemantics semIEEEdouble = {1023, -1022, 53, 64}; static const fltSemantics semIEEEquad = {16383, -16382, 113, 128}; static const fltSemantics semX87DoubleExtended = {16383, -16382, 64, 80}; static const fltSemantics semBogus = {0, 0, 0, 0}; /* The IBM double-double semantics. Such a number consists of a pair of IEEE 64-bit doubles (Hi, Lo), where |Hi| > |Lo|, and if normal, (double)(Hi + Lo) == Hi. The numeric value it's modeling is Hi + Lo. Therefore it has two 53-bit mantissa parts that aren't necessarily adjacent to each other, and two 11-bit exponents. Note: we need to make the value different from semBogus as otherwise an unsafe optimization may collapse both values to a single address, and we heavily rely on them having distinct addresses. 
*/ static const fltSemantics semPPCDoubleDouble = {-1, 0, 0, 0}; /* These are legacy semantics for the fallback, inaccrurate implementation of IBM double-double, if the accurate semPPCDoubleDouble doesn't handle the operation. It's equivalent to having an IEEE number with consecutive 106 bits of mantissa and 11 bits of exponent. It's not equivalent to IBM double-double. For example, a legit IBM double-double, 1 + epsilon: 1 + epsilon = 1 + (1 >> 1076) is not representable by a consecutive 106 bits of mantissa. Currently, these semantics are used in the following way: semPPCDoubleDouble -> (IEEEdouble, IEEEdouble) -> (64-bit APInt, 64-bit APInt) -> (128-bit APInt) -> semPPCDoubleDoubleLegacy -> IEEE operations We use bitcastToAPInt() to get the bit representation (in APInt) of the underlying IEEEdouble, then use the APInt constructor to construct the legacy IEEE float. TODO: Implement all operations in semPPCDoubleDouble, and delete these semantics. */ static const fltSemantics semPPCDoubleDoubleLegacy = {1023, -1022 + 53, 53 + 53, 128}; const llvm::fltSemantics &APFloatBase::EnumToSemantics(Semantics S) { switch (S) { case S_IEEEhalf: return IEEEhalf(); case S_BFloat: return BFloat(); case S_IEEEsingle: return IEEEsingle(); case S_IEEEdouble: return IEEEdouble(); case S_x87DoubleExtended: return x87DoubleExtended(); case S_IEEEquad: return IEEEquad(); case S_PPCDoubleDouble: return PPCDoubleDouble(); } llvm_unreachable("Unrecognised floating semantics"); } APFloatBase::Semantics APFloatBase::SemanticsToEnum(const llvm::fltSemantics &Sem) { if (&Sem == &llvm::APFloat::IEEEhalf()) return S_IEEEhalf; else if (&Sem == &llvm::APFloat::BFloat()) return S_BFloat; else if (&Sem == &llvm::APFloat::IEEEsingle()) return S_IEEEsingle; else if (&Sem == &llvm::APFloat::IEEEdouble()) return S_IEEEdouble; else if (&Sem == &llvm::APFloat::x87DoubleExtended()) return S_x87DoubleExtended; else if (&Sem == &llvm::APFloat::IEEEquad()) return S_IEEEquad; else if (&Sem == &llvm::APFloat::PPCDoubleDouble()) return S_PPCDoubleDouble; else llvm_unreachable("Unknown floating semantics"); } const fltSemantics &APFloatBase::IEEEhalf() { return semIEEEhalf; } const fltSemantics &APFloatBase::BFloat() { return semBFloat; } const fltSemantics &APFloatBase::IEEEsingle() { return semIEEEsingle; } const fltSemantics &APFloatBase::IEEEdouble() { return semIEEEdouble; } const fltSemantics &APFloatBase::IEEEquad() { return semIEEEquad; } const fltSemantics &APFloatBase::x87DoubleExtended() { return semX87DoubleExtended; } const fltSemantics &APFloatBase::Bogus() { return semBogus; } const fltSemantics &APFloatBase::PPCDoubleDouble() { return semPPCDoubleDouble; } constexpr RoundingMode APFloatBase::rmNearestTiesToEven; constexpr RoundingMode APFloatBase::rmTowardPositive; constexpr RoundingMode APFloatBase::rmTowardNegative; constexpr RoundingMode APFloatBase::rmTowardZero; constexpr RoundingMode APFloatBase::rmNearestTiesToAway; /* A tight upper bound on number of parts required to hold the value pow(5, power) is power * 815 / (351 * integerPartWidth) + 1 However, whilst the result may require only this many parts, because we are multiplying two values to get it, the multiplication may require an extra part with the excess part being zero (consider the trivial case of 1 * 1, tcFullMultiply requires two parts to hold the single-part result). So we add an extra one to guarantee enough space whilst multiplying. 
*/ const unsigned int maxExponent = 16383; const unsigned int maxPrecision = 113; const unsigned int maxPowerOfFiveExponent = maxExponent + maxPrecision - 1; const unsigned int maxPowerOfFiveParts = 2 + ((maxPowerOfFiveExponent * 815) / (351 * APFloatBase::integerPartWidth)); unsigned int APFloatBase::semanticsPrecision(const fltSemantics &semantics) { return semantics.precision; } APFloatBase::ExponentType APFloatBase::semanticsMaxExponent(const fltSemantics &semantics) { return semantics.maxExponent; } APFloatBase::ExponentType APFloatBase::semanticsMinExponent(const fltSemantics &semantics) { return semantics.minExponent; } unsigned int APFloatBase::semanticsSizeInBits(const fltSemantics &semantics) { return semantics.sizeInBits; } unsigned APFloatBase::getSizeInBits(const fltSemantics &Sem) { return Sem.sizeInBits; } /* A bunch of private, handy routines. */ static inline Error createError(const Twine &Err) { return make_error(Err, inconvertibleErrorCode()); } static inline unsigned int partCountForBits(unsigned int bits) { return ((bits) + APFloatBase::integerPartWidth - 1) / APFloatBase::integerPartWidth; } /* Returns 0U-9U. Return values >= 10U are not digits. */ static inline unsigned int decDigitValue(unsigned int c) { return c - '0'; } /* Return the value of a decimal exponent of the form [+-]ddddddd. If the exponent overflows, returns a large exponent with the appropriate sign. */ static Expected readExponent(StringRef::iterator begin, StringRef::iterator end) { bool isNegative; unsigned int absExponent; const unsigned int overlargeExponent = 24000; /* FIXME. */ StringRef::iterator p = begin; // Treat no exponent as 0 to match binutils if (p == end || ((*p == '-' || *p == '+') && (p + 1) == end)) { return 0; } isNegative = (*p == '-'); if (*p == '-' || *p == '+') { p++; if (p == end) return createError("Exponent has no digits"); } absExponent = decDigitValue(*p++); if (absExponent >= 10U) return createError("Invalid character in exponent"); for (; p != end; ++p) { unsigned int value; value = decDigitValue(*p); if (value >= 10U) return createError("Invalid character in exponent"); absExponent = absExponent * 10U + value; if (absExponent >= overlargeExponent) { absExponent = overlargeExponent; break; } } if (isNegative) return -(int) absExponent; else return (int) absExponent; } /* This is ugly and needs cleaning up, but I don't immediately see how whilst remaining safe. */ static Expected totalExponent(StringRef::iterator p, StringRef::iterator end, int exponentAdjustment) { int unsignedExponent; bool negative, overflow; int exponent = 0; if (p == end) return createError("Exponent has no digits"); negative = *p == '-'; if (*p == '-' || *p == '+') { p++; if (p == end) return createError("Exponent has no digits"); } unsignedExponent = 0; overflow = false; for (; p != end; ++p) { unsigned int value; value = decDigitValue(*p); if (value >= 10U) return createError("Invalid character in exponent"); unsignedExponent = unsignedExponent * 10 + value; if (unsignedExponent > 32767) { overflow = true; break; } } if (exponentAdjustment > 32767 || exponentAdjustment < -32768) overflow = true; if (!overflow) { exponent = unsignedExponent; if (negative) exponent = -exponent; exponent += exponentAdjustment; if (exponent > 32767 || exponent < -32768) overflow = true; } if (overflow) exponent = negative ? 
-32768: 32767; return exponent; } static Expected skipLeadingZeroesAndAnyDot(StringRef::iterator begin, StringRef::iterator end, StringRef::iterator *dot) { StringRef::iterator p = begin; *dot = end; while (p != end && *p == '0') p++; if (p != end && *p == '.') { *dot = p++; if (end - begin == 1) return createError("Significand has no digits"); while (p != end && *p == '0') p++; } return p; } /* Given a normal decimal floating point number of the form dddd.dddd[eE][+-]ddd where the decimal point and exponent are optional, fill out the structure D. Exponent is appropriate if the significand is treated as an integer, and normalizedExponent if the significand is taken to have the decimal point after a single leading non-zero digit. If the value is zero, V->firstSigDigit points to a non-digit, and the return exponent is zero. */ struct decimalInfo { const char *firstSigDigit; const char *lastSigDigit; int exponent; int normalizedExponent; }; static Error interpretDecimal(StringRef::iterator begin, StringRef::iterator end, decimalInfo *D) { StringRef::iterator dot = end; auto PtrOrErr = skipLeadingZeroesAndAnyDot(begin, end, &dot); if (!PtrOrErr) return PtrOrErr.takeError(); StringRef::iterator p = *PtrOrErr; D->firstSigDigit = p; D->exponent = 0; D->normalizedExponent = 0; for (; p != end; ++p) { if (*p == '.') { if (dot != end) return createError("String contains multiple dots"); dot = p++; if (p == end) break; } if (decDigitValue(*p) >= 10U) break; } if (p != end) { if (*p != 'e' && *p != 'E') return createError("Invalid character in significand"); if (p == begin) return createError("Significand has no digits"); if (dot != end && p - begin == 1) return createError("Significand has no digits"); /* p points to the first non-digit in the string */ auto ExpOrErr = readExponent(p + 1, end); if (!ExpOrErr) return ExpOrErr.takeError(); D->exponent = *ExpOrErr; /* Implied decimal point? */ if (dot == end) dot = p; } /* If number is all zeroes accept any exponent. */ if (p != D->firstSigDigit) { /* Drop insignificant trailing zeroes. */ if (p != begin) { do do p--; while (p != begin && *p == '0'); while (p != begin && *p == '.'); } /* Adjust the exponents for any decimal point. */ D->exponent += static_cast((dot - p) - (dot > p)); D->normalizedExponent = (D->exponent + static_cast((p - D->firstSigDigit) - (dot > D->firstSigDigit && dot < p))); } D->lastSigDigit = p; return Error::success(); } /* Return the trailing fraction of a hexadecimal number. DIGITVALUE is the first hex digit of the fraction, P points to the next digit. */ static Expected trailingHexadecimalFraction(StringRef::iterator p, StringRef::iterator end, unsigned int digitValue) { unsigned int hexDigit; /* If the first trailing digit isn't 0 or 8 we can work out the fraction immediately. */ if (digitValue > 8) return lfMoreThanHalf; else if (digitValue < 8 && digitValue > 0) return lfLessThanHalf; // Otherwise we need to find the first non-zero digit. while (p != end && (*p == '0' || *p == '.')) p++; if (p == end) return createError("Invalid trailing hexadecimal fraction!"); hexDigit = hexDigitValue(*p); /* If we ran off the end it is exactly zero or one-half, otherwise a little more. */ if (hexDigit == -1U) return digitValue == 0 ? lfExactlyZero: lfExactlyHalf; else return digitValue == 0 ? lfLessThanHalf: lfMoreThanHalf; } /* Return the fraction lost were a bignum truncated losing the least significant BITS bits. 
*/ static lostFraction lostFractionThroughTruncation(const APFloatBase::integerPart *parts, unsigned int partCount, unsigned int bits) { unsigned int lsb; lsb = APInt::tcLSB(parts, partCount); /* Note this is guaranteed true if bits == 0, or LSB == -1U. */ if (bits <= lsb) return lfExactlyZero; if (bits == lsb + 1) return lfExactlyHalf; if (bits <= partCount * APFloatBase::integerPartWidth && APInt::tcExtractBit(parts, bits - 1)) return lfMoreThanHalf; return lfLessThanHalf; } /* Shift DST right BITS bits noting lost fraction. */ static lostFraction shiftRight(APFloatBase::integerPart *dst, unsigned int parts, unsigned int bits) { lostFraction lost_fraction; lost_fraction = lostFractionThroughTruncation(dst, parts, bits); APInt::tcShiftRight(dst, parts, bits); return lost_fraction; } /* Combine the effect of two lost fractions. */ static lostFraction combineLostFractions(lostFraction moreSignificant, lostFraction lessSignificant) { if (lessSignificant != lfExactlyZero) { if (moreSignificant == lfExactlyZero) moreSignificant = lfLessThanHalf; else if (moreSignificant == lfExactlyHalf) moreSignificant = lfMoreThanHalf; } return moreSignificant; } /* The error from the true value, in half-ulps, on multiplying two floating point numbers, which differ from the value they approximate by at most HUE1 and HUE2 half-ulps, is strictly less than the returned value. See "How to Read Floating Point Numbers Accurately" by William D Clinger. */ static unsigned int HUerrBound(bool inexactMultiply, unsigned int HUerr1, unsigned int HUerr2) { assert(HUerr1 < 2 || HUerr2 < 2 || (HUerr1 + HUerr2 < 8)); if (HUerr1 + HUerr2 == 0) return inexactMultiply * 2; /* <= inexactMultiply half-ulps. */ else return inexactMultiply + 2 * (HUerr1 + HUerr2); } /* The number of ulps from the boundary (zero, or half if ISNEAREST) when the least significant BITS are truncated. BITS cannot be zero. */ static APFloatBase::integerPart ulpsFromBoundary(const APFloatBase::integerPart *parts, unsigned int bits, bool isNearest) { unsigned int count, partBits; APFloatBase::integerPart part, boundary; assert(bits != 0); bits--; count = bits / APFloatBase::integerPartWidth; partBits = bits % APFloatBase::integerPartWidth + 1; part = parts[count] & (~(APFloatBase::integerPart) 0 >> (APFloatBase::integerPartWidth - partBits)); if (isNearest) boundary = (APFloatBase::integerPart) 1 << (partBits - 1); else boundary = 0; if (count == 0) { if (part - boundary <= boundary - part) return part - boundary; else return boundary - part; } if (part == boundary) { while (--count) if (parts[count]) return ~(APFloatBase::integerPart) 0; /* A lot. */ return parts[0]; } else if (part == boundary - 1) { while (--count) if (~parts[count]) return ~(APFloatBase::integerPart) 0; /* A lot. */ return -parts[0]; } return ~(APFloatBase::integerPart) 0; /* A lot. */ } /* Place pow(5, power) in DST, and return the number of parts used. DST must be at least one part larger than size of the answer. 
*/ static unsigned int powerOf5(APFloatBase::integerPart *dst, unsigned int power) { static const APFloatBase::integerPart firstEightPowers[] = { 1, 5, 25, 125, 625, 3125, 15625, 78125 }; APFloatBase::integerPart pow5s[maxPowerOfFiveParts * 2 + 5]; pow5s[0] = 78125 * 5; unsigned int partsCount[16] = { 1 }; APFloatBase::integerPart scratch[maxPowerOfFiveParts], *p1, *p2, *pow5; unsigned int result; assert(power <= maxExponent); p1 = dst; p2 = scratch; *p1 = firstEightPowers[power & 7]; power >>= 3; result = 1; pow5 = pow5s; for (unsigned int n = 0; power; power >>= 1, n++) { unsigned int pc; pc = partsCount[n]; /* Calculate pow(5,pow(2,n+3)) if we haven't yet. */ if (pc == 0) { pc = partsCount[n - 1]; APInt::tcFullMultiply(pow5, pow5 - pc, pow5 - pc, pc, pc); pc *= 2; if (pow5[pc - 1] == 0) pc--; partsCount[n] = pc; } if (power & 1) { APFloatBase::integerPart *tmp; APInt::tcFullMultiply(p2, p1, pow5, result, pc); result += pc; if (p2[result - 1] == 0) result--; /* Now result is in p1 with partsCount parts and p2 is scratch space. */ tmp = p1; p1 = p2; p2 = tmp; } pow5 += pc; } if (p1 != dst) APInt::tcAssign(dst, p1, result); return result; } /* Zero at the end to avoid modular arithmetic when adding one; used when rounding up during hexadecimal output. */ static const char hexDigitsLower[] = "0123456789abcdef0"; static const char hexDigitsUpper[] = "0123456789ABCDEF0"; static const char infinityL[] = "infinity"; static const char infinityU[] = "INFINITY"; static const char NaNL[] = "nan"; static const char NaNU[] = "NAN"; /* Write out an integerPart in hexadecimal, starting with the most significant nibble. Write out exactly COUNT hexdigits, return COUNT. */ static unsigned int partAsHex (char *dst, APFloatBase::integerPart part, unsigned int count, const char *hexDigitChars) { unsigned int result = count; assert(count != 0 && count <= APFloatBase::integerPartWidth / 4); part >>= (APFloatBase::integerPartWidth - 4 * count); while (count--) { dst[count] = hexDigitChars[part & 0xf]; part >>= 4; } return result; } /* Write out an unsigned decimal integer. */ static char * writeUnsignedDecimal (char *dst, unsigned int n) { char buff[40], *p; p = buff; do *p++ = '0' + n % 10; while (n /= 10); do *dst++ = *--p; while (p != buff); return dst; } /* Write out a signed decimal integer. */ static char * writeSignedDecimal (char *dst, int value) { if (value < 0) { *dst++ = '-'; dst = writeUnsignedDecimal(dst, -(unsigned) value); } else dst = writeUnsignedDecimal(dst, value); return dst; } namespace detail { /* Constructors. */ void IEEEFloat::initialize(const fltSemantics *ourSemantics) { unsigned int count; semantics = ourSemantics; count = partCount(); if (count > 1) significand.parts = new integerPart[count]; } void IEEEFloat::freeSignificand() { if (needsCleanup()) delete [] significand.parts; } void IEEEFloat::assign(const IEEEFloat &rhs) { assert(semantics == rhs.semantics); sign = rhs.sign; category = rhs.category; exponent = rhs.exponent; if (isFiniteNonZero() || category == fcNaN) copySignificand(rhs); } void IEEEFloat::copySignificand(const IEEEFloat &rhs) { assert(isFiniteNonZero() || category == fcNaN); assert(rhs.partCount() >= partCount()); APInt::tcAssign(significandParts(), rhs.significandParts(), partCount()); } /* Make this number a NaN, with an arbitrary but deterministic value for the significand. If double or longer, this is a signalling NaN, which may not be ideal. If float, this is QNaN(0). 
*/ void IEEEFloat::makeNaN(bool SNaN, bool Negative, const APInt *fill) { category = fcNaN; sign = Negative; integerPart *significand = significandParts(); unsigned numParts = partCount(); // Set the significand bits to the fill. if (!fill || fill->getNumWords() < numParts) APInt::tcSet(significand, 0, numParts); if (fill) { APInt::tcAssign(significand, fill->getRawData(), std::min(fill->getNumWords(), numParts)); // Zero out the excess bits of the significand. unsigned bitsToPreserve = semantics->precision - 1; unsigned part = bitsToPreserve / 64; bitsToPreserve %= 64; significand[part] &= ((1ULL << bitsToPreserve) - 1); for (part++; part != numParts; ++part) significand[part] = 0; } unsigned QNaNBit = semantics->precision - 2; if (SNaN) { // We always have to clear the QNaN bit to make it an SNaN. APInt::tcClearBit(significand, QNaNBit); // If there are no bits set in the payload, we have to set // *something* to make it a NaN instead of an infinity; // conventionally, this is the next bit down from the QNaN bit. if (APInt::tcIsZero(significand, numParts)) APInt::tcSetBit(significand, QNaNBit - 1); } else { // We always have to set the QNaN bit to make it a QNaN. APInt::tcSetBit(significand, QNaNBit); } // For x87 extended precision, we want to make a NaN, not a // pseudo-NaN. Maybe we should expose the ability to make // pseudo-NaNs? if (semantics == &semX87DoubleExtended) APInt::tcSetBit(significand, QNaNBit + 1); } IEEEFloat &IEEEFloat::operator=(const IEEEFloat &rhs) { if (this != &rhs) { if (semantics != rhs.semantics) { freeSignificand(); initialize(rhs.semantics); } assign(rhs); } return *this; } IEEEFloat &IEEEFloat::operator=(IEEEFloat &&rhs) { freeSignificand(); semantics = rhs.semantics; significand = rhs.significand; exponent = rhs.exponent; category = rhs.category; sign = rhs.sign; rhs.semantics = &semBogus; return *this; } bool IEEEFloat::isDenormal() const { return isFiniteNonZero() && (exponent == semantics->minExponent) && (APInt::tcExtractBit(significandParts(), semantics->precision - 1) == 0); } bool IEEEFloat::isSmallest() const { // The smallest number by magnitude in our format will be the smallest // denormal, i.e. the floating point number with exponent being minimum // exponent and significand bitwise equal to 1 (i.e. with MSB equal to 0). return isFiniteNonZero() && exponent == semantics->minExponent && significandMSB() == 0; } bool IEEEFloat::isSignificandAllOnes() const { // Test if the significand excluding the integral bit is all ones. This allows // us to test for binade boundaries. const integerPart *Parts = significandParts(); const unsigned PartCount = partCount(); for (unsigned i = 0; i < PartCount - 1; i++) if (~Parts[i]) return false; // Set the unused high bits to all ones when we compare. const unsigned NumHighBits = PartCount*integerPartWidth - semantics->precision + 1; assert(NumHighBits <= integerPartWidth && "Can not have more high bits to " "fill than integerPartWidth"); const integerPart HighBitFill = ~integerPart(0) << (integerPartWidth - NumHighBits); if (~(Parts[PartCount - 1] | HighBitFill)) return false; return true; } bool IEEEFloat::isSignificandAllZeros() const { // Test if the significand excluding the integral bit is all zeros. This // allows us to test for binade boundaries. 
const integerPart *Parts = significandParts(); const unsigned PartCount = partCount(); for (unsigned i = 0; i < PartCount - 1; i++) if (Parts[i]) return false; const unsigned NumHighBits = PartCount*integerPartWidth - semantics->precision + 1; assert(NumHighBits <= integerPartWidth && "Can not have more high bits to " "clear than integerPartWidth"); const integerPart HighBitMask = ~integerPart(0) >> NumHighBits; if (Parts[PartCount - 1] & HighBitMask) return false; return true; } bool IEEEFloat::isLargest() const { // The largest number by magnitude in our format will be the floating point // number with maximum exponent and with significand that is all ones. return isFiniteNonZero() && exponent == semantics->maxExponent && isSignificandAllOnes(); } bool IEEEFloat::isInteger() const { // This could be made more efficient; I'm going for obviously correct. if (!isFinite()) return false; IEEEFloat truncated = *this; truncated.roundToIntegral(rmTowardZero); return compare(truncated) == cmpEqual; } bool IEEEFloat::bitwiseIsEqual(const IEEEFloat &rhs) const { if (this == &rhs) return true; if (semantics != rhs.semantics || category != rhs.category || sign != rhs.sign) return false; if (category==fcZero || category==fcInfinity) return true; if (isFiniteNonZero() && exponent != rhs.exponent) return false; return std::equal(significandParts(), significandParts() + partCount(), rhs.significandParts()); } IEEEFloat::IEEEFloat(const fltSemantics &ourSemantics, integerPart value) { initialize(&ourSemantics); sign = 0; category = fcNormal; zeroSignificand(); exponent = ourSemantics.precision - 1; significandParts()[0] = value; normalize(rmNearestTiesToEven, lfExactlyZero); } IEEEFloat::IEEEFloat(const fltSemantics &ourSemantics) { initialize(&ourSemantics); category = fcZero; sign = false; } // Delegate to the previous constructor, because later copy constructor may // actually inspects category, which can't be garbage. IEEEFloat::IEEEFloat(const fltSemantics &ourSemantics, uninitializedTag tag) : IEEEFloat(ourSemantics) {} IEEEFloat::IEEEFloat(const IEEEFloat &rhs) { initialize(rhs.semantics); assign(rhs); } IEEEFloat::IEEEFloat(IEEEFloat &&rhs) : semantics(&semBogus) { *this = std::move(rhs); } IEEEFloat::~IEEEFloat() { freeSignificand(); } unsigned int IEEEFloat::partCount() const { return partCountForBits(semantics->precision + 1); } const IEEEFloat::integerPart *IEEEFloat::significandParts() const { return const_cast(this)->significandParts(); } IEEEFloat::integerPart *IEEEFloat::significandParts() { if (partCount() > 1) return significand.parts; else return &significand.part; } void IEEEFloat::zeroSignificand() { APInt::tcSet(significandParts(), 0, partCount()); } /* Increment an fcNormal floating point number's significand. */ void IEEEFloat::incrementSignificand() { integerPart carry; carry = APInt::tcIncrement(significandParts(), partCount()); /* Our callers should never cause us to overflow. */ assert(carry == 0); (void)carry; } /* Add the significand of the RHS. Returns the carry flag. */ IEEEFloat::integerPart IEEEFloat::addSignificand(const IEEEFloat &rhs) { integerPart *parts; parts = significandParts(); assert(semantics == rhs.semantics); assert(exponent == rhs.exponent); return APInt::tcAdd(parts, rhs.significandParts(), 0, partCount()); } /* Subtract the significand of the RHS with a borrow flag. Returns the borrow flag. 
*/ IEEEFloat::integerPart IEEEFloat::subtractSignificand(const IEEEFloat &rhs, integerPart borrow) { integerPart *parts; parts = significandParts(); assert(semantics == rhs.semantics); assert(exponent == rhs.exponent); return APInt::tcSubtract(parts, rhs.significandParts(), borrow, partCount()); } /* Multiply the significand of the RHS. If ADDEND is non-NULL, add it on to the full-precision result of the multiplication. Returns the lost fraction. */ lostFraction IEEEFloat::multiplySignificand(const IEEEFloat &rhs, IEEEFloat addend) { unsigned int omsb; // One, not zero, based MSB. unsigned int partsCount, newPartsCount, precision; integerPart *lhsSignificand; integerPart scratch[4]; integerPart *fullSignificand; lostFraction lost_fraction; bool ignored; assert(semantics == rhs.semantics); precision = semantics->precision; // Allocate space for twice as many bits as the original significand, plus one // extra bit for the addition to overflow into. newPartsCount = partCountForBits(precision * 2 + 1); if (newPartsCount > 4) fullSignificand = new integerPart[newPartsCount]; else fullSignificand = scratch; lhsSignificand = significandParts(); partsCount = partCount(); APInt::tcFullMultiply(fullSignificand, lhsSignificand, rhs.significandParts(), partsCount, partsCount); lost_fraction = lfExactlyZero; omsb = APInt::tcMSB(fullSignificand, newPartsCount) + 1; exponent += rhs.exponent; // Assume the operands involved in the multiplication are single-precision // FP, and the two multiplicants are: // *this = a23 . a22 ... a0 * 2^e1 // rhs = b23 . b22 ... b0 * 2^e2 // the result of multiplication is: // *this = c48 c47 c46 . c45 ... c0 * 2^(e1+e2) // Note that there are three significant bits at the left-hand side of the // radix point: two for the multiplication, and an overflow bit for the // addition (that will always be zero at this point). Move the radix point // toward left by two bits, and adjust exponent accordingly. exponent += 2; if (addend.isNonZero()) { // The intermediate result of the multiplication has "2 * precision" // signicant bit; adjust the addend to be consistent with mul result. // Significand savedSignificand = significand; const fltSemantics *savedSemantics = semantics; fltSemantics extendedSemantics; opStatus status; unsigned int extendedPrecision; // Normalize our MSB to one below the top bit to allow for overflow. extendedPrecision = 2 * precision + 1; if (omsb != extendedPrecision - 1) { assert(extendedPrecision > omsb); APInt::tcShiftLeft(fullSignificand, newPartsCount, (extendedPrecision - 1) - omsb); exponent -= (extendedPrecision - 1) - omsb; } /* Create new semantics. */ extendedSemantics = *semantics; extendedSemantics.precision = extendedPrecision; if (newPartsCount == 1) significand.part = fullSignificand[0]; else significand.parts = fullSignificand; semantics = &extendedSemantics; // Make a copy so we can convert it to the extended semantics. // Note that we cannot convert the addend directly, as the extendedSemantics // is a local variable (which we take a reference to). IEEEFloat extendedAddend(addend); status = extendedAddend.convert(extendedSemantics, rmTowardZero, &ignored); assert(status == opOK); (void)status; // Shift the significand of the addend right by one bit. This guarantees // that the high bit of the significand is zero (same as fullSignificand), // so the addition will overflow (if it does overflow at all) into the top bit. 
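//
// ---- Editorial aside: illustrative sketch, not part of APFloat.cpp ----
// The extended-precision dance above is what gives fusedMultiplyAdd() its
// single-rounding behaviour: the full 2*precision-bit product is combined
// with the addend before any rounding happens. The standalone example below
// (kept out of the build with #if 0) uses the public llvm::APFloat API to
// show a case where this differs from a separate multiply and add in IEEE
// single precision; the expected values follow from the usual double-rounding
// argument and should be double-checked before being relied upon.
#if 0
#include "llvm/ADT/APFloat.h"
#include <cassert>
using namespace llvm;

int main() {
  const float a = 1.0f + 1.0f / 4096.0f;    // 1 + 2^-12, exactly representable
  const float c = -(1.0f + 1.0f / 2048.0f); // -(1 + 2^-11), exactly representable

  // Fused: a*a + c is computed exactly and rounded once, leaving 2^-24.
  APFloat Fused(a);
  Fused.fusedMultiplyAdd(APFloat(a), APFloat(c), APFloat::rmNearestTiesToEven);

  // Separate: a*a rounds to 1 + 2^-11 (ties-to-even), so adding c gives 0.
  APFloat Separate(a);
  Separate.multiply(APFloat(a), APFloat::rmNearestTiesToEven);
  Separate.add(APFloat(c), APFloat::rmNearestTiesToEven);

  assert(Fused.convertToFloat() == 1.0f / 16777216.0f); // 2^-24
  assert(Separate.isZero());
  return 0;
}
#endif
// ------------------------------------------------------------------------
//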
    lost_fraction = extendedAddend.shiftSignificandRight(1);
    assert(lost_fraction == lfExactlyZero &&
           "Lost precision while shifting addend for fused-multiply-add.");

    lost_fraction = addOrSubtractSignificand(extendedAddend, false);

    /* Restore our state. */
    if (newPartsCount == 1)
      fullSignificand[0] = significand.part;
    significand = savedSignificand;
    semantics = savedSemantics;

    omsb = APInt::tcMSB(fullSignificand, newPartsCount) + 1;
  }

  // Convert the result having "2 * precision" significant bits back to the one
  // having "precision" significant bits. First, move the radix point from
  // position "2*precision - 1" to "precision - 1". The exponent needs to be
  // adjusted by "2*precision - 1" - "precision - 1" = "precision".
  exponent -= precision + 1;

  // In case the MSB resides at the left-hand side of the radix point, shift the
  // mantissa right by some amount to make sure the MSB resides right before
  // the radix point (i.e. "MSB . rest-significant-bits").
  //
  // Note that the result is not normalized when "omsb < precision". So, the
  // caller needs to call IEEEFloat::normalize() if normalized value is
  // expected.
  if (omsb > precision) {
    unsigned int bits, significantParts;
    lostFraction lf;

    bits = omsb - precision;
    significantParts = partCountForBits(omsb);
    lf = shiftRight(fullSignificand, significantParts, bits);
    lost_fraction = combineLostFractions(lf, lost_fraction);
    exponent += bits;
  }

  APInt::tcAssign(lhsSignificand, fullSignificand, partsCount);

  if (newPartsCount > 4)
    delete [] fullSignificand;

  return lost_fraction;
}

lostFraction IEEEFloat::multiplySignificand(const IEEEFloat &rhs) {
  return multiplySignificand(rhs, IEEEFloat(*semantics));
}

/* Divide the significand of *this by the significand of the RHS; the
   quotient replaces the significand of *this.  Returns the lost
   fraction.  */
lostFraction IEEEFloat::divideSignificand(const IEEEFloat &rhs) {
  unsigned int bit, i, partsCount;
  const integerPart *rhsSignificand;
  integerPart *lhsSignificand, *dividend, *divisor;
  integerPart scratch[4];
  lostFraction lost_fraction;

  assert(semantics == rhs.semantics);

  lhsSignificand = significandParts();
  rhsSignificand = rhs.significandParts();
  partsCount = partCount();

  if (partsCount > 2)
    dividend = new integerPart[partsCount * 2];
  else
    dividend = scratch;

  divisor = dividend + partsCount;

  /* Copy the dividend and divisor as they will be modified in-place.  */
  for (i = 0; i < partsCount; i++) {
    dividend[i] = lhsSignificand[i];
    divisor[i] = rhsSignificand[i];
    lhsSignificand[i] = 0;
  }

  exponent -= rhs.exponent;

  unsigned int precision = semantics->precision;

  /* Normalize the divisor.  */
  bit = precision - APInt::tcMSB(divisor, partsCount) - 1;
  if (bit) {
    exponent += bit;
    APInt::tcShiftLeft(divisor, partsCount, bit);
  }

  /* Normalize the dividend.  */
  bit = precision - APInt::tcMSB(dividend, partsCount) - 1;
  if (bit) {
    exponent -= bit;
    APInt::tcShiftLeft(dividend, partsCount, bit);
  }

  /* Ensure the dividend >= divisor initially for the loop below.
     Incidentally, this means that the division loop below is
     guaranteed to set the integer bit to one.  */
  if (APInt::tcCompare(dividend, divisor, partsCount) < 0) {
    exponent--;
    APInt::tcShiftLeft(dividend, partsCount, 1);
    assert(APInt::tcCompare(dividend, divisor, partsCount) >= 0);
  }

  /* Long division.  */
  for (bit = precision; bit; bit -= 1) {
    if (APInt::tcCompare(dividend, divisor, partsCount) >= 0) {
      APInt::tcSubtract(dividend, divisor, 0, partsCount);
      APInt::tcSetBit(lhsSignificand, bit - 1);
    }

    APInt::tcShiftLeft(dividend, partsCount, 1);
  }

  /* Figure out the lost fraction.
*/ int cmp = APInt::tcCompare(dividend, divisor, partsCount); if (cmp > 0) lost_fraction = lfMoreThanHalf; else if (cmp == 0) lost_fraction = lfExactlyHalf; else if (APInt::tcIsZero(dividend, partsCount)) lost_fraction = lfExactlyZero; else lost_fraction = lfLessThanHalf; if (partsCount > 2) delete [] dividend; return lost_fraction; } unsigned int IEEEFloat::significandMSB() const { return APInt::tcMSB(significandParts(), partCount()); } unsigned int IEEEFloat::significandLSB() const { return APInt::tcLSB(significandParts(), partCount()); } /* Note that a zero result is NOT normalized to fcZero. */ lostFraction IEEEFloat::shiftSignificandRight(unsigned int bits) { /* Our exponent should not overflow. */ assert((ExponentType) (exponent + bits) >= exponent); exponent += bits; return shiftRight(significandParts(), partCount(), bits); } /* Shift the significand left BITS bits, subtract BITS from its exponent. */ void IEEEFloat::shiftSignificandLeft(unsigned int bits) { assert(bits < semantics->precision); if (bits) { unsigned int partsCount = partCount(); APInt::tcShiftLeft(significandParts(), partsCount, bits); exponent -= bits; assert(!APInt::tcIsZero(significandParts(), partsCount)); } } IEEEFloat::cmpResult IEEEFloat::compareAbsoluteValue(const IEEEFloat &rhs) const { int compare; assert(semantics == rhs.semantics); assert(isFiniteNonZero()); assert(rhs.isFiniteNonZero()); compare = exponent - rhs.exponent; /* If exponents are equal, do an unsigned bignum comparison of the significands. */ if (compare == 0) compare = APInt::tcCompare(significandParts(), rhs.significandParts(), partCount()); if (compare > 0) return cmpGreaterThan; else if (compare < 0) return cmpLessThan; else return cmpEqual; } /* Handle overflow. Sign is preserved. We either become infinity or the largest finite number. */ IEEEFloat::opStatus IEEEFloat::handleOverflow(roundingMode rounding_mode) { /* Infinity? */ if (rounding_mode == rmNearestTiesToEven || rounding_mode == rmNearestTiesToAway || (rounding_mode == rmTowardPositive && !sign) || (rounding_mode == rmTowardNegative && sign)) { category = fcInfinity; return (opStatus) (opOverflow | opInexact); } /* Otherwise we become the largest finite number. */ category = fcNormal; exponent = semantics->maxExponent; APInt::tcSetLeastSignificantBits(significandParts(), partCount(), semantics->precision); return opInexact; } /* Returns TRUE if, when truncating the current number, with BIT the new LSB, with the given lost fraction and rounding mode, the result would need to be rounded away from zero (i.e., by increasing the signficand). This routine must work for fcZero of both signs, and fcNormal numbers. */ bool IEEEFloat::roundAwayFromZero(roundingMode rounding_mode, lostFraction lost_fraction, unsigned int bit) const { /* NaNs and infinities should not have lost fractions. */ assert(isFiniteNonZero() || category == fcZero); /* Current callers never pass this so we don't handle it. */ assert(lost_fraction != lfExactlyZero); switch (rounding_mode) { case rmNearestTiesToAway: return lost_fraction == lfExactlyHalf || lost_fraction == lfMoreThanHalf; case rmNearestTiesToEven: if (lost_fraction == lfMoreThanHalf) return true; /* Our zeroes don't have a significand to test. 
*/ if (lost_fraction == lfExactlyHalf && category != fcZero) return APInt::tcExtractBit(significandParts(), bit); return false; case rmTowardZero: return false; case rmTowardPositive: return !sign; case rmTowardNegative: return sign; default: break; } llvm_unreachable("Invalid rounding mode found"); } IEEEFloat::opStatus IEEEFloat::normalize(roundingMode rounding_mode, lostFraction lost_fraction) { unsigned int omsb; /* One, not zero, based MSB. */ int exponentChange; if (!isFiniteNonZero()) return opOK; /* Before rounding normalize the exponent of fcNormal numbers. */ omsb = significandMSB() + 1; if (omsb) { /* OMSB is numbered from 1. We want to place it in the integer bit numbered PRECISION if possible, with a compensating change in the exponent. */ exponentChange = omsb - semantics->precision; /* If the resulting exponent is too high, overflow according to the rounding mode. */ if (exponent + exponentChange > semantics->maxExponent) return handleOverflow(rounding_mode); /* Subnormal numbers have exponent minExponent, and their MSB is forced based on that. */ if (exponent + exponentChange < semantics->minExponent) exponentChange = semantics->minExponent - exponent; /* Shifting left is easy as we don't lose precision. */ if (exponentChange < 0) { assert(lost_fraction == lfExactlyZero); shiftSignificandLeft(-exponentChange); return opOK; } if (exponentChange > 0) { lostFraction lf; /* Shift right and capture any new lost fraction. */ lf = shiftSignificandRight(exponentChange); lost_fraction = combineLostFractions(lf, lost_fraction); /* Keep OMSB up-to-date. */ if (omsb > (unsigned) exponentChange) omsb -= exponentChange; else omsb = 0; } } /* Now round the number according to rounding_mode given the lost fraction. */ /* As specified in IEEE 754, since we do not trap we do not report underflow for exact results. */ if (lost_fraction == lfExactlyZero) { /* Canonicalize zeroes. */ if (omsb == 0) category = fcZero; return opOK; } /* Increment the significand if we're rounding away from zero. */ if (roundAwayFromZero(rounding_mode, lost_fraction, 0)) { if (omsb == 0) exponent = semantics->minExponent; incrementSignificand(); omsb = significandMSB() + 1; /* Did the significand increment overflow? */ if (omsb == (unsigned) semantics->precision + 1) { /* Renormalize by incrementing the exponent and shifting our significand right one. However if we already have the maximum exponent we overflow to infinity. */ if (exponent == semantics->maxExponent) { category = fcInfinity; return (opStatus) (opOverflow | opInexact); } shiftSignificandRight(1); return opInexact; } } /* The normal case - we were and are not denormal, and any significand increment above didn't overflow. */ if (omsb == semantics->precision) return opInexact; /* We have a non-zero denormal. */ assert(omsb < semantics->precision); /* Canonicalize zeroes. */ if (omsb == 0) category = fcZero; /* The fcZero case is a denormal that underflowed to zero. 
*/ return (opStatus) (opUnderflow | opInexact); } IEEEFloat::opStatus IEEEFloat::addOrSubtractSpecials(const IEEEFloat &rhs, bool subtract) { switch (PackCategoriesIntoKey(category, rhs.category)) { default: llvm_unreachable(nullptr); case PackCategoriesIntoKey(fcZero, fcNaN): case PackCategoriesIntoKey(fcNormal, fcNaN): case PackCategoriesIntoKey(fcInfinity, fcNaN): assign(rhs); LLVM_FALLTHROUGH; case PackCategoriesIntoKey(fcNaN, fcZero): case PackCategoriesIntoKey(fcNaN, fcNormal): case PackCategoriesIntoKey(fcNaN, fcInfinity): case PackCategoriesIntoKey(fcNaN, fcNaN): if (isSignaling()) { makeQuiet(); return opInvalidOp; } return rhs.isSignaling() ? opInvalidOp : opOK; case PackCategoriesIntoKey(fcNormal, fcZero): case PackCategoriesIntoKey(fcInfinity, fcNormal): case PackCategoriesIntoKey(fcInfinity, fcZero): return opOK; case PackCategoriesIntoKey(fcNormal, fcInfinity): case PackCategoriesIntoKey(fcZero, fcInfinity): category = fcInfinity; sign = rhs.sign ^ subtract; return opOK; case PackCategoriesIntoKey(fcZero, fcNormal): assign(rhs); sign = rhs.sign ^ subtract; return opOK; case PackCategoriesIntoKey(fcZero, fcZero): /* Sign depends on rounding mode; handled by caller. */ return opOK; case PackCategoriesIntoKey(fcInfinity, fcInfinity): /* Differently signed infinities can only be validly subtracted. */ if (((sign ^ rhs.sign)!=0) != subtract) { makeNaN(); return opInvalidOp; } return opOK; case PackCategoriesIntoKey(fcNormal, fcNormal): return opDivByZero; } } /* Add or subtract two normal numbers. */ lostFraction IEEEFloat::addOrSubtractSignificand(const IEEEFloat &rhs, bool subtract) { integerPart carry; lostFraction lost_fraction; int bits; /* Determine if the operation on the absolute values is effectively an addition or subtraction. */ subtract ^= static_cast(sign ^ rhs.sign); /* Are we bigger exponent-wise than the RHS? */ bits = exponent - rhs.exponent; /* Subtraction is more subtle than one might naively expect. */ if (subtract) { IEEEFloat temp_rhs(rhs); if (bits == 0) lost_fraction = lfExactlyZero; else if (bits > 0) { lost_fraction = temp_rhs.shiftSignificandRight(bits - 1); shiftSignificandLeft(1); } else { lost_fraction = shiftSignificandRight(-bits - 1); temp_rhs.shiftSignificandLeft(1); } // Should we reverse the subtraction. if (compareAbsoluteValue(temp_rhs) == cmpLessThan) { carry = temp_rhs.subtractSignificand (*this, lost_fraction != lfExactlyZero); copySignificand(temp_rhs); sign = !sign; } else { carry = subtractSignificand (temp_rhs, lost_fraction != lfExactlyZero); } /* Invert the lost fraction - it was on the RHS and subtracted. */ if (lost_fraction == lfLessThanHalf) lost_fraction = lfMoreThanHalf; else if (lost_fraction == lfMoreThanHalf) lost_fraction = lfLessThanHalf; /* The code above is intended to ensure that no borrow is necessary. */ assert(!carry); (void)carry; } else { if (bits > 0) { IEEEFloat temp_rhs(rhs); lost_fraction = temp_rhs.shiftSignificandRight(bits); carry = addSignificand(temp_rhs); } else { lost_fraction = shiftSignificandRight(-bits); carry = addSignificand(rhs); } /* We have a guard bit; generating a carry cannot happen. 
*/ assert(!carry); (void)carry; } return lost_fraction; } IEEEFloat::opStatus IEEEFloat::multiplySpecials(const IEEEFloat &rhs) { switch (PackCategoriesIntoKey(category, rhs.category)) { default: llvm_unreachable(nullptr); case PackCategoriesIntoKey(fcZero, fcNaN): case PackCategoriesIntoKey(fcNormal, fcNaN): case PackCategoriesIntoKey(fcInfinity, fcNaN): assign(rhs); sign = false; LLVM_FALLTHROUGH; case PackCategoriesIntoKey(fcNaN, fcZero): case PackCategoriesIntoKey(fcNaN, fcNormal): case PackCategoriesIntoKey(fcNaN, fcInfinity): case PackCategoriesIntoKey(fcNaN, fcNaN): sign ^= rhs.sign; // restore the original sign if (isSignaling()) { makeQuiet(); return opInvalidOp; } return rhs.isSignaling() ? opInvalidOp : opOK; case PackCategoriesIntoKey(fcNormal, fcInfinity): case PackCategoriesIntoKey(fcInfinity, fcNormal): case PackCategoriesIntoKey(fcInfinity, fcInfinity): category = fcInfinity; return opOK; case PackCategoriesIntoKey(fcZero, fcNormal): case PackCategoriesIntoKey(fcNormal, fcZero): case PackCategoriesIntoKey(fcZero, fcZero): category = fcZero; return opOK; case PackCategoriesIntoKey(fcZero, fcInfinity): case PackCategoriesIntoKey(fcInfinity, fcZero): makeNaN(); return opInvalidOp; case PackCategoriesIntoKey(fcNormal, fcNormal): return opOK; } } IEEEFloat::opStatus IEEEFloat::divideSpecials(const IEEEFloat &rhs) { switch (PackCategoriesIntoKey(category, rhs.category)) { default: llvm_unreachable(nullptr); case PackCategoriesIntoKey(fcZero, fcNaN): case PackCategoriesIntoKey(fcNormal, fcNaN): case PackCategoriesIntoKey(fcInfinity, fcNaN): assign(rhs); sign = false; LLVM_FALLTHROUGH; case PackCategoriesIntoKey(fcNaN, fcZero): case PackCategoriesIntoKey(fcNaN, fcNormal): case PackCategoriesIntoKey(fcNaN, fcInfinity): case PackCategoriesIntoKey(fcNaN, fcNaN): sign ^= rhs.sign; // restore the original sign if (isSignaling()) { makeQuiet(); return opInvalidOp; } return rhs.isSignaling() ? opInvalidOp : opOK; case PackCategoriesIntoKey(fcInfinity, fcZero): case PackCategoriesIntoKey(fcInfinity, fcNormal): case PackCategoriesIntoKey(fcZero, fcInfinity): case PackCategoriesIntoKey(fcZero, fcNormal): return opOK; case PackCategoriesIntoKey(fcNormal, fcInfinity): category = fcZero; return opOK; case PackCategoriesIntoKey(fcNormal, fcZero): category = fcInfinity; return opDivByZero; case PackCategoriesIntoKey(fcInfinity, fcInfinity): case PackCategoriesIntoKey(fcZero, fcZero): makeNaN(); return opInvalidOp; case PackCategoriesIntoKey(fcNormal, fcNormal): return opOK; } } IEEEFloat::opStatus IEEEFloat::modSpecials(const IEEEFloat &rhs) { switch (PackCategoriesIntoKey(category, rhs.category)) { default: llvm_unreachable(nullptr); case PackCategoriesIntoKey(fcZero, fcNaN): case PackCategoriesIntoKey(fcNormal, fcNaN): case PackCategoriesIntoKey(fcInfinity, fcNaN): assign(rhs); LLVM_FALLTHROUGH; case PackCategoriesIntoKey(fcNaN, fcZero): case PackCategoriesIntoKey(fcNaN, fcNormal): case PackCategoriesIntoKey(fcNaN, fcInfinity): case PackCategoriesIntoKey(fcNaN, fcNaN): if (isSignaling()) { makeQuiet(); return opInvalidOp; } return rhs.isSignaling() ? 
opInvalidOp : opOK; case PackCategoriesIntoKey(fcZero, fcInfinity): case PackCategoriesIntoKey(fcZero, fcNormal): case PackCategoriesIntoKey(fcNormal, fcInfinity): return opOK; case PackCategoriesIntoKey(fcNormal, fcZero): case PackCategoriesIntoKey(fcInfinity, fcZero): case PackCategoriesIntoKey(fcInfinity, fcNormal): case PackCategoriesIntoKey(fcInfinity, fcInfinity): case PackCategoriesIntoKey(fcZero, fcZero): makeNaN(); return opInvalidOp; case PackCategoriesIntoKey(fcNormal, fcNormal): return opOK; } } IEEEFloat::opStatus IEEEFloat::remainderSpecials(const IEEEFloat &rhs) { switch (PackCategoriesIntoKey(category, rhs.category)) { default: llvm_unreachable(nullptr); case PackCategoriesIntoKey(fcZero, fcNaN): case PackCategoriesIntoKey(fcNormal, fcNaN): case PackCategoriesIntoKey(fcInfinity, fcNaN): assign(rhs); LLVM_FALLTHROUGH; case PackCategoriesIntoKey(fcNaN, fcZero): case PackCategoriesIntoKey(fcNaN, fcNormal): case PackCategoriesIntoKey(fcNaN, fcInfinity): case PackCategoriesIntoKey(fcNaN, fcNaN): if (isSignaling()) { makeQuiet(); return opInvalidOp; } return rhs.isSignaling() ? opInvalidOp : opOK; case PackCategoriesIntoKey(fcZero, fcInfinity): case PackCategoriesIntoKey(fcZero, fcNormal): case PackCategoriesIntoKey(fcNormal, fcInfinity): return opOK; case PackCategoriesIntoKey(fcNormal, fcZero): case PackCategoriesIntoKey(fcInfinity, fcZero): case PackCategoriesIntoKey(fcInfinity, fcNormal): case PackCategoriesIntoKey(fcInfinity, fcInfinity): case PackCategoriesIntoKey(fcZero, fcZero): makeNaN(); return opInvalidOp; case PackCategoriesIntoKey(fcNormal, fcNormal): return opDivByZero; // fake status, indicating this is not a special case } } /* Change sign. */ void IEEEFloat::changeSign() { /* Look mummy, this one's easy. */ sign = !sign; } /* Normalized addition or subtraction. */ IEEEFloat::opStatus IEEEFloat::addOrSubtract(const IEEEFloat &rhs, roundingMode rounding_mode, bool subtract) { opStatus fs; fs = addOrSubtractSpecials(rhs, subtract); /* This return code means it was not a simple case. */ if (fs == opDivByZero) { lostFraction lost_fraction; lost_fraction = addOrSubtractSignificand(rhs, subtract); fs = normalize(rounding_mode, lost_fraction); /* Can only be zero if we lost no fraction. */ assert(category != fcZero || lost_fraction == lfExactlyZero); } /* If two numbers add (exactly) to zero, IEEE 754 decrees it is a positive zero unless rounding to minus infinity, except that adding two like-signed zeroes gives that zero. */ if (category == fcZero) { if (rhs.category != fcZero || (sign == rhs.sign) == subtract) sign = (rounding_mode == rmTowardNegative); } return fs; } /* Normalized addition. */ IEEEFloat::opStatus IEEEFloat::add(const IEEEFloat &rhs, roundingMode rounding_mode) { return addOrSubtract(rhs, rounding_mode, false); } /* Normalized subtraction. */ IEEEFloat::opStatus IEEEFloat::subtract(const IEEEFloat &rhs, roundingMode rounding_mode) { return addOrSubtract(rhs, rounding_mode, true); } /* Normalized multiply. */ IEEEFloat::opStatus IEEEFloat::multiply(const IEEEFloat &rhs, roundingMode rounding_mode) { opStatus fs; sign ^= rhs.sign; fs = multiplySpecials(rhs); if (isFiniteNonZero()) { lostFraction lost_fraction = multiplySignificand(rhs); fs = normalize(rounding_mode, lost_fraction); if (lost_fraction != lfExactlyZero) fs = (opStatus) (fs | opInexact); } return fs; } /* Normalized divide. 
 */
IEEEFloat::opStatus IEEEFloat::divide(const IEEEFloat &rhs,
                                      roundingMode rounding_mode) {
  opStatus fs;

  sign ^= rhs.sign;
  fs = divideSpecials(rhs);

  if (isFiniteNonZero()) {
    lostFraction lost_fraction = divideSignificand(rhs);
    fs = normalize(rounding_mode, lost_fraction);
    if (lost_fraction != lfExactlyZero)
      fs = (opStatus) (fs | opInexact);
  }

  return fs;
}

/* Normalized remainder.  */
IEEEFloat::opStatus IEEEFloat::remainder(const IEEEFloat &rhs) {
  opStatus fs;
  unsigned int origSign = sign;

  // First handle the special cases.
  fs = remainderSpecials(rhs);
  if (fs != opDivByZero)
    return fs;

  fs = opOK;

  // Make sure the current value is less than twice the denom. If the addition
  // does not succeed (i.e. it overflows), the finite value we currently
  // possess must already be less than twice the denom (as we are using the
  // same semantics).
  IEEEFloat P2 = rhs;
  if (P2.add(rhs, rmNearestTiesToEven) == opOK) {
    fs = mod(P2);
    assert(fs == opOK);
  }

  // Let's work with absolute numbers.
  IEEEFloat P = rhs;
  P.sign = false;
  sign = false;

  //
  // To calculate the remainder we use the following scheme.
  //
  // The remainder is defined as follows:
  //
  // remainder = numer - rquot * denom = x - r * p
  //
  // Where r is the result of: x/p, rounded toward the nearest integral value
  // (with halfway cases rounded toward the even number).
  //
  // Currently, (after x mod 2p):
  // r is the number of 2p's present inside x, which is inherently, an even
  // number of p's.
  //
  // We may split the remaining calculation into 4 options:
  // - if x < 0.5p then we round to the nearest number which is 0, and are done.
  // - if x == 0.5p then we round to the nearest even number which is 0, and we
  //   are done as well.
  // - if 0.5p < x < p then we round to the nearest number which is 1, and we
  //   have to subtract 1p at least once.
  // - if x >= p then we must subtract p at least once, as x must be a
  //   remainder.
  //
  // By now, either we are done, or we have added 1 to r, which in turn makes r
  // an odd number.
  //
  // We can now split the remaining calculation into the following 3 options:
  // - if x < 0.5p then we round to the nearest number which is 0, and are done.
  // - if x == 0.5p then we round to the nearest even number. As r is odd, we
  //   must round up to the next even number, so we must subtract p once more.
  // - if x > 0.5p (and inherently x < p) then we must round r up to the next
  //   integral, and subtract p once more.
  //

  // Extend the semantics to prevent an overflow/underflow or inexact result.
  bool losesInfo;
  fltSemantics extendedSemantics = *semantics;
  extendedSemantics.maxExponent++;
  extendedSemantics.minExponent--;
  extendedSemantics.precision += 2;

  IEEEFloat VEx = *this;
  fs = VEx.convert(extendedSemantics, rmNearestTiesToEven, &losesInfo);
  assert(fs == opOK && !losesInfo);
  IEEEFloat PEx = P;
  fs = PEx.convert(extendedSemantics, rmNearestTiesToEven, &losesInfo);
  assert(fs == opOK && !losesInfo);

  // It is simpler to work with 2x instead of 0.5p, and we do not need to lose
  // any fraction.
  fs = VEx.add(VEx, rmNearestTiesToEven);
  assert(fs == opOK);

  if (VEx.compare(PEx) == cmpGreaterThan) {
    fs = subtract(P, rmNearestTiesToEven);
    assert(fs == opOK);

    // Make VEx = this.add(this), but because we have different semantics, we
    // do not want to `convert` again, so we just subtract PEx twice (which
    // equals the desired value).
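//
// ---- Editorial aside: illustrative sketch, not part of APFloat.cpp ----
// A quick reminder of how the two related operations behave, using the public
// llvm::APFloat API (standalone example, kept out of the build with #if 0):
// remainder() implements the IEEE-754 remainder, which rounds the quotient to
// the nearest integer and can therefore be negative, while mod() implements
// the C fmod()-style operation with the quotient truncated toward zero. The
// expected results below are what I anticipate for 5 and 3.
#if 0
#include "llvm/ADT/APFloat.h"
#include <cassert>
using namespace llvm;

int main() {
  APFloat X(5.0), Y(3.0);

  APFloat R = X;
  R.remainder(Y);          // 5 - round(5/3)*3 = 5 - 2*3 = -1
  assert(R.convertToDouble() == -1.0);

  APFloat M = X;
  M.mod(Y);                // fmod(5, 3) = 2
  assert(M.convertToDouble() == 2.0);
  return 0;
}
#endif
// ------------------------------------------------------------------------
//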
fs = VEx.subtract(PEx, rmNearestTiesToEven); assert(fs == opOK); fs = VEx.subtract(PEx, rmNearestTiesToEven); assert(fs == opOK); cmpResult result = VEx.compare(PEx); if (result == cmpGreaterThan || result == cmpEqual) { fs = subtract(P, rmNearestTiesToEven); assert(fs == opOK); } } if (isZero()) sign = origSign; // IEEE754 requires this else sign ^= origSign; return fs; } /* Normalized llvm frem (C fmod). */ IEEEFloat::opStatus IEEEFloat::mod(const IEEEFloat &rhs) { opStatus fs; fs = modSpecials(rhs); unsigned int origSign = sign; while (isFiniteNonZero() && rhs.isFiniteNonZero() && compareAbsoluteValue(rhs) != cmpLessThan) { IEEEFloat V = scalbn(rhs, ilogb(*this) - ilogb(rhs), rmNearestTiesToEven); if (compareAbsoluteValue(V) == cmpLessThan) V = scalbn(V, -1, rmNearestTiesToEven); V.sign = sign; fs = subtract(V, rmNearestTiesToEven); assert(fs==opOK); } if (isZero()) sign = origSign; // fmod requires this return fs; } /* Normalized fused-multiply-add. */ IEEEFloat::opStatus IEEEFloat::fusedMultiplyAdd(const IEEEFloat &multiplicand, const IEEEFloat &addend, roundingMode rounding_mode) { opStatus fs; /* Post-multiplication sign, before addition. */ sign ^= multiplicand.sign; /* If and only if all arguments are normal do we need to do an extended-precision calculation. */ if (isFiniteNonZero() && multiplicand.isFiniteNonZero() && addend.isFinite()) { lostFraction lost_fraction; lost_fraction = multiplySignificand(multiplicand, addend); fs = normalize(rounding_mode, lost_fraction); if (lost_fraction != lfExactlyZero) fs = (opStatus) (fs | opInexact); /* If two numbers add (exactly) to zero, IEEE 754 decrees it is a positive zero unless rounding to minus infinity, except that adding two like-signed zeroes gives that zero. */ if (category == fcZero && !(fs & opUnderflow) && sign != addend.sign) sign = (rounding_mode == rmTowardNegative); } else { fs = multiplySpecials(multiplicand); /* FS can only be opOK or opInvalidOp. There is no more work to do in the latter case. The IEEE-754R standard says it is implementation-defined in this case whether, if ADDEND is a quiet NaN, we raise invalid op; this implementation does so. If we need to do the addition we can do so with normal precision. */ if (fs == opOK) fs = addOrSubtract(addend, rounding_mode, false); } return fs; } /* Rounding-mode correct round to integral value. */ IEEEFloat::opStatus IEEEFloat::roundToIntegral(roundingMode rounding_mode) { opStatus fs; if (isInfinity()) // [IEEE Std 754-2008 6.1]: // The behavior of infinity in floating-point arithmetic is derived from the // limiting cases of real arithmetic with operands of arbitrarily // large magnitude, when such a limit exists. // ... // Operations on infinite operands are usually exact and therefore signal no // exceptions ... return opOK; if (isNaN()) { if (isSignaling()) { // [IEEE Std 754-2008 6.2]: // Under default exception handling, any operation signaling an invalid // operation exception and for which a floating-point result is to be // delivered shall deliver a quiet NaN. makeQuiet(); // [IEEE Std 754-2008 6.2]: // Signaling NaNs shall be reserved operands that, under default exception // handling, signal the invalid operation exception(see 7.2) for every // general-computational and signaling-computational operation except for // the conversions described in 5.12. 
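//
// ---- Editorial aside: illustrative sketch, not part of APFloat.cpp ----
// The rounding-mode dependence of roundToIntegral(), and the 2^(p-1)
// magic-constant trick it uses further down, can be seen from the outside
// with the public llvm::APFloat API. Standalone example, kept out of the
// build with #if 0; the expected results are my reading of the semantics.
#if 0
#include "llvm/ADT/APFloat.h"
#include <cassert>
using namespace llvm;

static double roundedToIntegral(double V, APFloat::roundingMode RM) {
  APFloat F(V);
  F.roundToIntegral(RM);
  return F.convertToDouble();
}

int main() {
  assert(roundedToIntegral(2.5, APFloat::rmNearestTiesToEven) == 2.0);
  assert(roundedToIntegral(2.5, APFloat::rmNearestTiesToAway) == 3.0);
  assert(roundedToIntegral(2.5, APFloat::rmTowardPositive) == 3.0);
  assert(roundedToIntegral(2.5, APFloat::rmTowardZero) == 2.0);

  // The same idea in plain IEEE double arithmetic: adding and subtracting
  // 2^52 rounds the fractional part away under the current rounding mode
  // (round-to-nearest-even here), and the subtraction is exact by Sterbenz.
  const double Magic = 4503599627370496.0; // 2^52
  assert((2.5 + Magic) - Magic == 2.0);
  return 0;
}
#endif
// ------------------------------------------------------------------------
//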
return opInvalidOp; } else { // [IEEE Std 754-2008 6.2]: // For an operation with quiet NaN inputs, other than maximum and minimum // operations, if a floating-point result is to be delivered the result // shall be a quiet NaN which should be one of the input NaNs. // ... // Every general-computational and quiet-computational operation involving // one or more input NaNs, none of them signaling, shall signal no // exception, except fusedMultiplyAdd might signal the invalid operation // exception(see 7.2). return opOK; } } if (isZero()) { // [IEEE Std 754-2008 6.3]: // ... the sign of the result of conversions, the quantize operation, the // roundToIntegral operations, and the roundToIntegralExact(see 5.3.1) is // the sign of the first or only operand. return opOK; } // If the exponent is large enough, we know that this value is already // integral, and the arithmetic below would potentially cause it to saturate // to +/-Inf. Bail out early instead. if (exponent+1 >= (int)semanticsPrecision(*semantics)) return opOK; // The algorithm here is quite simple: we add 2^(p-1), where p is the // precision of our format, and then subtract it back off again. The choice // of rounding modes for the addition/subtraction determines the rounding mode // for our integral rounding as well. // NOTE: When the input value is negative, we do subtraction followed by // addition instead. APInt IntegerConstant(NextPowerOf2(semanticsPrecision(*semantics)), 1); IntegerConstant <<= semanticsPrecision(*semantics)-1; IEEEFloat MagicConstant(*semantics); fs = MagicConstant.convertFromAPInt(IntegerConstant, false, rmNearestTiesToEven); assert(fs == opOK); MagicConstant.sign = sign; // Preserve the input sign so that we can handle the case of zero result // correctly. bool inputSign = isNegative(); fs = add(MagicConstant, rounding_mode); // Current value and 'MagicConstant' are both integers, so the result of the // subtraction is always exact according to Sterbenz' lemma. subtract(MagicConstant, rounding_mode); // Restore the input sign. if (inputSign != isNegative()) changeSign(); return fs; } /* Comparison requires normalized numbers. */ IEEEFloat::cmpResult IEEEFloat::compare(const IEEEFloat &rhs) const { cmpResult result; assert(semantics == rhs.semantics); switch (PackCategoriesIntoKey(category, rhs.category)) { default: llvm_unreachable(nullptr); case PackCategoriesIntoKey(fcNaN, fcZero): case PackCategoriesIntoKey(fcNaN, fcNormal): case PackCategoriesIntoKey(fcNaN, fcInfinity): case PackCategoriesIntoKey(fcNaN, fcNaN): case PackCategoriesIntoKey(fcZero, fcNaN): case PackCategoriesIntoKey(fcNormal, fcNaN): case PackCategoriesIntoKey(fcInfinity, fcNaN): return cmpUnordered; case PackCategoriesIntoKey(fcInfinity, fcNormal): case PackCategoriesIntoKey(fcInfinity, fcZero): case PackCategoriesIntoKey(fcNormal, fcZero): if (sign) return cmpLessThan; else return cmpGreaterThan; case PackCategoriesIntoKey(fcNormal, fcInfinity): case PackCategoriesIntoKey(fcZero, fcInfinity): case PackCategoriesIntoKey(fcZero, fcNormal): if (rhs.sign) return cmpGreaterThan; else return cmpLessThan; case PackCategoriesIntoKey(fcInfinity, fcInfinity): if (sign == rhs.sign) return cmpEqual; else if (sign) return cmpLessThan; else return cmpGreaterThan; case PackCategoriesIntoKey(fcZero, fcZero): return cmpEqual; case PackCategoriesIntoKey(fcNormal, fcNormal): break; } /* Two normal numbers. Do they have the same sign? 
*/ if (sign != rhs.sign) { if (sign) result = cmpLessThan; else result = cmpGreaterThan; } else { /* Compare absolute values; invert result if negative. */ result = compareAbsoluteValue(rhs); if (sign) { if (result == cmpLessThan) result = cmpGreaterThan; else if (result == cmpGreaterThan) result = cmpLessThan; } } return result; } /// IEEEFloat::convert - convert a value of one floating point type to another. /// The return value corresponds to the IEEE754 exceptions. *losesInfo /// records whether the transformation lost information, i.e. whether /// converting the result back to the original type will produce the /// original value (this is almost the same as return value==fsOK, but there /// are edge cases where this is not so). IEEEFloat::opStatus IEEEFloat::convert(const fltSemantics &toSemantics, roundingMode rounding_mode, bool *losesInfo) { lostFraction lostFraction; unsigned int newPartCount, oldPartCount; opStatus fs; int shift; const fltSemantics &fromSemantics = *semantics; lostFraction = lfExactlyZero; newPartCount = partCountForBits(toSemantics.precision + 1); oldPartCount = partCount(); shift = toSemantics.precision - fromSemantics.precision; bool X86SpecialNan = false; if (&fromSemantics == &semX87DoubleExtended && &toSemantics != &semX87DoubleExtended && category == fcNaN && (!(*significandParts() & 0x8000000000000000ULL) || !(*significandParts() & 0x4000000000000000ULL))) { // x86 has some unusual NaNs which cannot be represented in any other // format; note them here. X86SpecialNan = true; } // If this is a truncation of a denormal number, and the target semantics // has larger exponent range than the source semantics (this can happen // when truncating from PowerPC double-double to double format), the // right shift could lose result mantissa bits. Adjust exponent instead // of performing excessive shift. if (shift < 0 && isFiniteNonZero()) { int exponentChange = significandMSB() + 1 - fromSemantics.precision; if (exponent + exponentChange < toSemantics.minExponent) exponentChange = toSemantics.minExponent - exponent; if (exponentChange < shift) exponentChange = shift; if (exponentChange < 0) { shift -= exponentChange; exponent += exponentChange; } } // If this is a truncation, perform the shift before we narrow the storage. if (shift < 0 && (isFiniteNonZero() || category==fcNaN)) lostFraction = shiftRight(significandParts(), oldPartCount, -shift); // Fix the storage so it can hold to new value. if (newPartCount > oldPartCount) { // The new type requires more storage; make it available. integerPart *newParts; newParts = new integerPart[newPartCount]; APInt::tcSet(newParts, 0, newPartCount); if (isFiniteNonZero() || category==fcNaN) APInt::tcAssign(newParts, significandParts(), oldPartCount); freeSignificand(); significand.parts = newParts; } else if (newPartCount == 1 && oldPartCount != 1) { // Switch to built-in storage for a single part. integerPart newPart = 0; if (isFiniteNonZero() || category==fcNaN) newPart = significandParts()[0]; freeSignificand(); significand.part = newPart; } // Now that we have the right storage, switch the semantics. semantics = &toSemantics; // If this is an extension, perform the shift now that the storage is // available. 
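//
// ---- Editorial aside: illustrative sketch, not part of APFloat.cpp ----
// The change added just below exists for exactly the situation sketched here:
// truncating a signaling NaN whose payload lives entirely in the bits that
// are shifted out. Standalone example (kept out of the build with #if 0)
// using the public llvm::APFloat API; the stated expectations are my reading
// of the intended behaviour with this fix applied.
#if 0
#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/APInt.h"
#include <cassert>
using namespace llvm;

int main() {
  // Signaling NaN in IEEE double whose payload occupies only bit 0; the
  // double->single truncation shifts out the low 29 significand bits, so the
  // whole payload is lost.
  APInt Payload(64, 1);
  APFloat F = APFloat::getSNaN(APFloat::IEEEdouble(), /*Negative=*/false,
                               &Payload);

  bool LosesInfo = false;
  F.convert(APFloat::IEEEsingle(), APFloat::rmNearestTiesToEven, &LosesInfo);
  assert(LosesInfo);

  // With the payload-preservation change the significand cannot end up all
  // zero, so the bit pattern is still a NaN rather than +infinity.
  APInt Bits = F.bitcastToAPInt();
  assert((Bits.getZExtValue() & 0x007fffff) != 0);
  assert(F.isNaN());
  return 0;
}
#endif
// ------------------------------------------------------------------------
//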
if (shift > 0 && (isFiniteNonZero() || category==fcNaN)) APInt::tcShiftLeft(significandParts(), newPartCount, shift); if (isFiniteNonZero()) { fs = normalize(rounding_mode, lostFraction); *losesInfo = (fs != opOK); } else if (category == fcNaN) { *losesInfo = lostFraction != lfExactlyZero || X86SpecialNan; // For x87 extended precision, we want to make a NaN, not a special NaN if // the input wasn't special either. if (!X86SpecialNan && semantics == &semX87DoubleExtended) APInt::tcSetBit(significandParts(), semantics->precision - 1); + // If we are truncating NaN, it is possible that we shifted out all of the + // set bits in a signalling NaN payload. But NaN must remain NaN, so some + // bit in the significand must be set (otherwise it is Inf). + // This can only happen with sNaN. Set the 1st bit after the quiet bit, + // so that we still have an sNaN. + // FIXME: Set quiet and return opInvalidOp (on convert of any sNaN). + // But this requires fixing LLVM to parse 32-bit hex FP or ignoring + // conversions while parsing IR. + if (APInt::tcIsZero(significandParts(), newPartCount)) { + assert(shift < 0 && "Should not lose NaN payload on extend"); + assert(semantics->precision >= 3 && "Unexpectedly narrow significand"); + assert(*losesInfo && "Missing payload should have set lost info"); + APInt::tcSetBit(significandParts(), semantics->precision - 3); + } + // gcc forces the Quiet bit on, which means (float)(double)(float_sNan) // does not give you back the same bits. This is dubious, and we // don't currently do it. You're really supposed to get // an invalid operation signal at runtime, but nobody does that. fs = opOK; } else { *losesInfo = false; fs = opOK; } return fs; } /* Convert a floating point number to an integer according to the rounding mode. If the rounded integer value is out of range this returns an invalid operation exception and the contents of the destination parts are unspecified. If the rounded value is in range but the floating point number is not the exact integer, the C standard doesn't require an inexact exception to be raised. IEEE 854 does require it so we do that. Note that for conversions to integer type the C standard requires round-to-zero to always be used. */ IEEEFloat::opStatus IEEEFloat::convertToSignExtendedInteger( MutableArrayRef parts, unsigned int width, bool isSigned, roundingMode rounding_mode, bool *isExact) const { lostFraction lost_fraction; const integerPart *src; unsigned int dstPartsCount, truncatedBits; *isExact = false; /* Handle the three special cases first. */ if (category == fcInfinity || category == fcNaN) return opInvalidOp; dstPartsCount = partCountForBits(width); assert(dstPartsCount <= parts.size() && "Integer too big"); if (category == fcZero) { APInt::tcSet(parts.data(), 0, dstPartsCount); // Negative zero can't be represented as an int. *isExact = !sign; return opOK; } src = significandParts(); /* Step 1: place our absolute value, with any fraction truncated, in the destination. */ if (exponent < 0) { /* Our absolute value is less than one; truncate everything. */ APInt::tcSet(parts.data(), 0, dstPartsCount); /* For exponent -1 the integer bit represents .5, look at that. For smaller exponents leftmost truncated bit is 0. */ truncatedBits = semantics->precision -1U - exponent; } else { /* We want the most significant (exponent + 1) bits; the rest are truncated. */ unsigned int bits = exponent + 1U; /* Hopelessly large in magnitude? 
*/ if (bits > width) return opInvalidOp; if (bits < semantics->precision) { /* We truncate (semantics->precision - bits) bits. */ truncatedBits = semantics->precision - bits; APInt::tcExtract(parts.data(), dstPartsCount, src, bits, truncatedBits); } else { /* We want at least as many bits as are available. */ APInt::tcExtract(parts.data(), dstPartsCount, src, semantics->precision, 0); APInt::tcShiftLeft(parts.data(), dstPartsCount, bits - semantics->precision); truncatedBits = 0; } } /* Step 2: work out any lost fraction, and increment the absolute value if we would round away from zero. */ if (truncatedBits) { lost_fraction = lostFractionThroughTruncation(src, partCount(), truncatedBits); if (lost_fraction != lfExactlyZero && roundAwayFromZero(rounding_mode, lost_fraction, truncatedBits)) { if (APInt::tcIncrement(parts.data(), dstPartsCount)) return opInvalidOp; /* Overflow. */ } } else { lost_fraction = lfExactlyZero; } /* Step 3: check if we fit in the destination. */ unsigned int omsb = APInt::tcMSB(parts.data(), dstPartsCount) + 1; if (sign) { if (!isSigned) { /* Negative numbers cannot be represented as unsigned. */ if (omsb != 0) return opInvalidOp; } else { /* It takes omsb bits to represent the unsigned integer value. We lose a bit for the sign, but care is needed as the maximally negative integer is a special case. */ if (omsb == width && APInt::tcLSB(parts.data(), dstPartsCount) + 1 != omsb) return opInvalidOp; /* This case can happen because of rounding. */ if (omsb > width) return opInvalidOp; } APInt::tcNegate (parts.data(), dstPartsCount); } else { if (omsb >= width + !isSigned) return opInvalidOp; } if (lost_fraction == lfExactlyZero) { *isExact = true; return opOK; } else return opInexact; } /* Same as convertToSignExtendedInteger, except we provide deterministic values in case of an invalid operation exception, namely zero for NaNs and the minimal or maximal value respectively for underflow or overflow. The *isExact output tells whether the result is exact, in the sense that converting it back to the original floating point type produces the original value. This is almost equivalent to result==opOK, except for negative zeroes. */ IEEEFloat::opStatus IEEEFloat::convertToInteger(MutableArrayRef parts, unsigned int width, bool isSigned, roundingMode rounding_mode, bool *isExact) const { opStatus fs; fs = convertToSignExtendedInteger(parts, width, isSigned, rounding_mode, isExact); if (fs == opInvalidOp) { unsigned int bits, dstPartsCount; dstPartsCount = partCountForBits(width); assert(dstPartsCount <= parts.size() && "Integer too big"); if (category == fcNaN) bits = 0; else if (sign) bits = isSigned; else bits = width - isSigned; APInt::tcSetLeastSignificantBits(parts.data(), dstPartsCount, bits); if (sign && isSigned) APInt::tcShiftLeft(parts.data(), dstPartsCount, width - 1); } return fs; } /* Convert an unsigned integer SRC to a floating point number, rounding according to ROUNDING_MODE. The sign of the floating point number is not modified. */ IEEEFloat::opStatus IEEEFloat::convertFromUnsignedParts( const integerPart *src, unsigned int srcCount, roundingMode rounding_mode) { unsigned int omsb, precision, dstCount; integerPart *dst; lostFraction lost_fraction; category = fcNormal; omsb = APInt::tcMSB(src, srcCount) + 1; dst = significandParts(); dstCount = partCount(); precision = semantics->precision; /* We want the most significant PRECISION bits of SRC. There may not be that many; extract what we can. 
*/ if (precision <= omsb) { exponent = omsb - 1; lost_fraction = lostFractionThroughTruncation(src, srcCount, omsb - precision); APInt::tcExtract(dst, dstCount, src, precision, omsb - precision); } else { exponent = precision - 1; lost_fraction = lfExactlyZero; APInt::tcExtract(dst, dstCount, src, omsb, 0); } return normalize(rounding_mode, lost_fraction); } IEEEFloat::opStatus IEEEFloat::convertFromAPInt(const APInt &Val, bool isSigned, roundingMode rounding_mode) { unsigned int partCount = Val.getNumWords(); APInt api = Val; sign = false; if (isSigned && api.isNegative()) { sign = true; api = -api; } return convertFromUnsignedParts(api.getRawData(), partCount, rounding_mode); } /* Convert a two's complement integer SRC to a floating point number, rounding according to ROUNDING_MODE. ISSIGNED is true if the integer is signed, in which case it must be sign-extended. */ IEEEFloat::opStatus IEEEFloat::convertFromSignExtendedInteger(const integerPart *src, unsigned int srcCount, bool isSigned, roundingMode rounding_mode) { opStatus status; if (isSigned && APInt::tcExtractBit(src, srcCount * integerPartWidth - 1)) { integerPart *copy; /* If we're signed and negative negate a copy. */ sign = true; copy = new integerPart[srcCount]; APInt::tcAssign(copy, src, srcCount); APInt::tcNegate(copy, srcCount); status = convertFromUnsignedParts(copy, srcCount, rounding_mode); delete [] copy; } else { sign = false; status = convertFromUnsignedParts(src, srcCount, rounding_mode); } return status; } /* FIXME: should this just take a const APInt reference? */ IEEEFloat::opStatus IEEEFloat::convertFromZeroExtendedInteger(const integerPart *parts, unsigned int width, bool isSigned, roundingMode rounding_mode) { unsigned int partCount = partCountForBits(width); APInt api = APInt(width, makeArrayRef(parts, partCount)); sign = false; if (isSigned && APInt::tcExtractBit(parts, width - 1)) { sign = true; api = -api; } return convertFromUnsignedParts(api.getRawData(), partCount, rounding_mode); } Expected IEEEFloat::convertFromHexadecimalString(StringRef s, roundingMode rounding_mode) { lostFraction lost_fraction = lfExactlyZero; category = fcNormal; zeroSignificand(); exponent = 0; integerPart *significand = significandParts(); unsigned partsCount = partCount(); unsigned bitPos = partsCount * integerPartWidth; bool computedTrailingFraction = false; // Skip leading zeroes and any (hexa)decimal point. StringRef::iterator begin = s.begin(); StringRef::iterator end = s.end(); StringRef::iterator dot; auto PtrOrErr = skipLeadingZeroesAndAnyDot(begin, end, &dot); if (!PtrOrErr) return PtrOrErr.takeError(); StringRef::iterator p = *PtrOrErr; StringRef::iterator firstSignificantDigit = p; while (p != end) { integerPart hex_value; if (*p == '.') { if (dot != end) return createError("String contains multiple dots"); dot = p++; continue; } hex_value = hexDigitValue(*p); if (hex_value == -1U) break; p++; // Store the number while we have space. if (bitPos) { bitPos -= 4; hex_value <<= bitPos % integerPartWidth; significand[bitPos / integerPartWidth] |= hex_value; } else if (!computedTrailingFraction) { auto FractOrErr = trailingHexadecimalFraction(p, end, hex_value); if (!FractOrErr) return FractOrErr.takeError(); lost_fraction = *FractOrErr; computedTrailingFraction = true; } } /* Hex floats require an exponent but not a hexadecimal point. 
*/ if (p == end) return createError("Hex strings require an exponent"); if (*p != 'p' && *p != 'P') return createError("Invalid character in significand"); if (p == begin) return createError("Significand has no digits"); if (dot != end && p - begin == 1) return createError("Significand has no digits"); /* Ignore the exponent if we are zero. */ if (p != firstSignificantDigit) { int expAdjustment; /* Implicit hexadecimal point? */ if (dot == end) dot = p; /* Calculate the exponent adjustment implicit in the number of significant digits. */ expAdjustment = static_cast(dot - firstSignificantDigit); if (expAdjustment < 0) expAdjustment++; expAdjustment = expAdjustment * 4 - 1; /* Adjust for writing the significand starting at the most significant nibble. */ expAdjustment += semantics->precision; expAdjustment -= partsCount * integerPartWidth; /* Adjust for the given exponent. */ auto ExpOrErr = totalExponent(p + 1, end, expAdjustment); if (!ExpOrErr) return ExpOrErr.takeError(); exponent = *ExpOrErr; } return normalize(rounding_mode, lost_fraction); } IEEEFloat::opStatus IEEEFloat::roundSignificandWithExponent(const integerPart *decSigParts, unsigned sigPartCount, int exp, roundingMode rounding_mode) { unsigned int parts, pow5PartCount; fltSemantics calcSemantics = { 32767, -32767, 0, 0 }; integerPart pow5Parts[maxPowerOfFiveParts]; bool isNearest; isNearest = (rounding_mode == rmNearestTiesToEven || rounding_mode == rmNearestTiesToAway); parts = partCountForBits(semantics->precision + 11); /* Calculate pow(5, abs(exp)). */ pow5PartCount = powerOf5(pow5Parts, exp >= 0 ? exp: -exp); for (;; parts *= 2) { opStatus sigStatus, powStatus; unsigned int excessPrecision, truncatedBits; calcSemantics.precision = parts * integerPartWidth - 1; excessPrecision = calcSemantics.precision - semantics->precision; truncatedBits = excessPrecision; IEEEFloat decSig(calcSemantics, uninitialized); decSig.makeZero(sign); IEEEFloat pow5(calcSemantics); sigStatus = decSig.convertFromUnsignedParts(decSigParts, sigPartCount, rmNearestTiesToEven); powStatus = pow5.convertFromUnsignedParts(pow5Parts, pow5PartCount, rmNearestTiesToEven); /* Add exp, as 10^n = 5^n * 2^n. */ decSig.exponent += exp; lostFraction calcLostFraction; integerPart HUerr, HUdistance; unsigned int powHUerr; if (exp >= 0) { /* multiplySignificand leaves the precision-th bit set to 1. */ calcLostFraction = decSig.multiplySignificand(pow5); powHUerr = powStatus != opOK; } else { calcLostFraction = decSig.divideSignificand(pow5); /* Denormal numbers have less precision. */ if (decSig.exponent < semantics->minExponent) { excessPrecision += (semantics->minExponent - decSig.exponent); truncatedBits = excessPrecision; if (excessPrecision > calcSemantics.precision) excessPrecision = calcSemantics.precision; } /* Extra half-ulp lost in reciprocal of exponent. */ powHUerr = (powStatus == opOK && calcLostFraction == lfExactlyZero) ? 0:2; } /* Both multiplySignificand and divideSignificand return the result with the integer bit set. */ assert(APInt::tcExtractBit (decSig.significandParts(), calcSemantics.precision - 1) == 1); HUerr = HUerrBound(calcLostFraction != lfExactlyZero, sigStatus != opOK, powHUerr); HUdistance = 2 * ulpsFromBoundary(decSig.significandParts(), excessPrecision, isNearest); /* Are we guaranteed to round correctly if we truncate? */ if (HUdistance >= HUerr) { APInt::tcExtract(significandParts(), partCount(), decSig.significandParts(), calcSemantics.precision - excessPrecision, excessPrecision); /* Take the exponent of decSig. 
If we tcExtract-ed less bits above we must adjust our exponent to compensate for the implicit right shift. */ exponent = (decSig.exponent + semantics->precision - (calcSemantics.precision - excessPrecision)); calcLostFraction = lostFractionThroughTruncation(decSig.significandParts(), decSig.partCount(), truncatedBits); return normalize(rounding_mode, calcLostFraction); } } } Expected IEEEFloat::convertFromDecimalString(StringRef str, roundingMode rounding_mode) { decimalInfo D; opStatus fs; /* Scan the text. */ StringRef::iterator p = str.begin(); if (Error Err = interpretDecimal(p, str.end(), &D)) return std::move(Err); /* Handle the quick cases. First the case of no significant digits, i.e. zero, and then exponents that are obviously too large or too small. Writing L for log 10 / log 2, a number d.ddddd*10^exp definitely overflows if (exp - 1) * L >= maxExponent and definitely underflows to zero where (exp + 1) * L <= minExponent - precision With integer arithmetic the tightest bounds for L are 93/28 < L < 196/59 [ numerator <= 256 ] 42039/12655 < L < 28738/8651 [ numerator <= 65536 ] */ // Test if we have a zero number allowing for strings with no null terminators // and zero decimals with non-zero exponents. // // We computed firstSigDigit by ignoring all zeros and dots. Thus if // D->firstSigDigit equals str.end(), every digit must be a zero and there can // be at most one dot. On the other hand, if we have a zero with a non-zero // exponent, then we know that D.firstSigDigit will be non-numeric. if (D.firstSigDigit == str.end() || decDigitValue(*D.firstSigDigit) >= 10U) { category = fcZero; fs = opOK; /* Check whether the normalized exponent is high enough to overflow max during the log-rebasing in the max-exponent check below. */ } else if (D.normalizedExponent - 1 > INT_MAX / 42039) { fs = handleOverflow(rounding_mode); /* If it wasn't, then it also wasn't high enough to overflow max during the log-rebasing in the min-exponent check. Check that it won't overflow min in either check, then perform the min-exponent check. */ } else if (D.normalizedExponent - 1 < INT_MIN / 42039 || (D.normalizedExponent + 1) * 28738 <= 8651 * (semantics->minExponent - (int) semantics->precision)) { /* Underflow to zero and round. */ category = fcNormal; zeroSignificand(); fs = normalize(rounding_mode, lfLessThanHalf); /* We can finally safely perform the max-exponent check. */ } else if ((D.normalizedExponent - 1) * 42039 >= 12655 * semantics->maxExponent) { /* Overflow and round. */ fs = handleOverflow(rounding_mode); } else { integerPart *decSignificand; unsigned int partCount; /* A tight upper bound on number of bits required to hold an N-digit decimal integer is N * 196 / 59. Allocate enough space to hold the full significand, and an extra part required by tcMultiplyPart. */ partCount = static_cast(D.lastSigDigit - D.firstSigDigit) + 1; partCount = partCountForBits(1 + 196 * partCount / 59); decSignificand = new integerPart[partCount + 1]; partCount = 0; /* Convert to binary efficiently - we do almost all multiplication in an integerPart. When this would overflow do we do a single bignum multiplication, and then revert again to multiplication in an integerPart. 
*/ do { integerPart decValue, val, multiplier; val = 0; multiplier = 1; do { if (*p == '.') { p++; if (p == str.end()) { break; } } decValue = decDigitValue(*p++); if (decValue >= 10U) { delete[] decSignificand; return createError("Invalid character in significand"); } multiplier *= 10; val = val * 10 + decValue; /* The maximum number that can be multiplied by ten with any digit added without overflowing an integerPart. */ } while (p <= D.lastSigDigit && multiplier <= (~ (integerPart) 0 - 9) / 10); /* Multiply out the current part. */ APInt::tcMultiplyPart(decSignificand, decSignificand, multiplier, val, partCount, partCount + 1, false); /* If we used another part (likely but not guaranteed), increase the count. */ if (decSignificand[partCount]) partCount++; } while (p <= D.lastSigDigit); category = fcNormal; fs = roundSignificandWithExponent(decSignificand, partCount, D.exponent, rounding_mode); delete [] decSignificand; } return fs; } bool IEEEFloat::convertFromStringSpecials(StringRef str) { const size_t MIN_NAME_SIZE = 3; if (str.size() < MIN_NAME_SIZE) return false; if (str.equals("inf") || str.equals("INFINITY") || str.equals("+Inf")) { makeInf(false); return true; } bool IsNegative = str.front() == '-'; if (IsNegative) { str = str.drop_front(); if (str.size() < MIN_NAME_SIZE) return false; if (str.equals("inf") || str.equals("INFINITY") || str.equals("Inf")) { makeInf(true); return true; } } // If we have a 's' (or 'S') prefix, then this is a Signaling NaN. bool IsSignaling = str.front() == 's' || str.front() == 'S'; if (IsSignaling) { str = str.drop_front(); if (str.size() < MIN_NAME_SIZE) return false; } if (str.startswith("nan") || str.startswith("NaN")) { str = str.drop_front(3); // A NaN without payload. if (str.empty()) { makeNaN(IsSignaling, IsNegative); return true; } // Allow the payload to be inside parentheses. if (str.front() == '(') { // Parentheses should be balanced (and not empty). if (str.size() <= 2 || str.back() != ')') return false; str = str.slice(1, str.size() - 1); } // Determine the payload number's radix. unsigned Radix = 10; if (str[0] == '0') { if (str.size() > 1 && tolower(str[1]) == 'x') { str = str.drop_front(2); Radix = 16; } else Radix = 8; } // Parse the payload and make the NaN. APInt Payload; if (!str.getAsInteger(Radix, Payload)) { makeNaN(IsSignaling, IsNegative, &Payload); return true; } } return false; } Expected IEEEFloat::convertFromString(StringRef str, roundingMode rounding_mode) { if (str.empty()) return createError("Invalid string length"); // Handle special cases. if (convertFromStringSpecials(str)) return opOK; /* Handle a leading minus sign. */ StringRef::iterator p = str.begin(); size_t slen = str.size(); sign = *p == '-' ? 1 : 0; if (*p == '-' || *p == '+') { p++; slen--; if (!slen) return createError("String has no digits"); } if (slen >= 2 && p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) { if (slen == 2) return createError("Invalid string"); return convertFromHexadecimalString(StringRef(p + 2, slen - 2), rounding_mode); } return convertFromDecimalString(StringRef(p, slen), rounding_mode); } /* Write out a hexadecimal representation of the floating point value to DST, which must be of sufficient size, in the C99 form [-]0xh.hhhhp[+-]d. Return the number of characters written, excluding the terminating NUL. If UPPERCASE, the output is in upper case, otherwise in lower case. HEXDIGITS digits appear altogether, rounding the value if necessary. If HEXDIGITS is 0, the minimal precision to display the number precisely is used instead. 
If nothing would appear after the decimal point it is suppressed. The decimal exponent is always printed and has at least one digit. Zero values display an exponent of zero. Infinities and NaNs appear as "infinity" or "nan" respectively. The above rules are as specified by C99. There is ambiguity about what the leading hexadecimal digit should be. This implementation uses whatever is necessary so that the exponent is displayed as stored. This implies the exponent will fall within the IEEE format range, and the leading hexadecimal digit will be 0 (for denormals), 1 (normal numbers) or 2 (normal numbers rounded-away-from-zero with any other digits zero). */ unsigned int IEEEFloat::convertToHexString(char *dst, unsigned int hexDigits, bool upperCase, roundingMode rounding_mode) const { char *p; p = dst; if (sign) *dst++ = '-'; switch (category) { case fcInfinity: memcpy (dst, upperCase ? infinityU: infinityL, sizeof infinityU - 1); dst += sizeof infinityL - 1; break; case fcNaN: memcpy (dst, upperCase ? NaNU: NaNL, sizeof NaNU - 1); dst += sizeof NaNU - 1; break; case fcZero: *dst++ = '0'; *dst++ = upperCase ? 'X': 'x'; *dst++ = '0'; if (hexDigits > 1) { *dst++ = '.'; memset (dst, '0', hexDigits - 1); dst += hexDigits - 1; } *dst++ = upperCase ? 'P': 'p'; *dst++ = '0'; break; case fcNormal: dst = convertNormalToHexString (dst, hexDigits, upperCase, rounding_mode); break; } *dst = 0; return static_cast(dst - p); } /* Does the hard work of outputting the correctly rounded hexadecimal form of a normal floating point number with the specified number of hexadecimal digits. If HEXDIGITS is zero the minimum number of digits necessary to print the value precisely is output. */ char *IEEEFloat::convertNormalToHexString(char *dst, unsigned int hexDigits, bool upperCase, roundingMode rounding_mode) const { unsigned int count, valueBits, shift, partsCount, outputDigits; const char *hexDigitChars; const integerPart *significand; char *p; bool roundUp; *dst++ = '0'; *dst++ = upperCase ? 'X': 'x'; roundUp = false; hexDigitChars = upperCase ? hexDigitsUpper: hexDigitsLower; significand = significandParts(); partsCount = partCount(); /* +3 because the first digit only uses the single integer bit, so we have 3 virtual zero most-significant-bits. */ valueBits = semantics->precision + 3; shift = integerPartWidth - valueBits % integerPartWidth; /* The natural number of digits required ignoring trailing insignificant zeroes. */ outputDigits = (valueBits - significandLSB () + 3) / 4; /* hexDigits of zero means use the required number for the precision. Otherwise, see if we are truncating. If we are, find out if we need to round away from zero. */ if (hexDigits) { if (hexDigits < outputDigits) { /* We are dropping non-zero bits, so need to check how to round. "bits" is the number of dropped bits. */ unsigned int bits; lostFraction fraction; bits = valueBits - hexDigits * 4; fraction = lostFractionThroughTruncation (significand, partsCount, bits); roundUp = roundAwayFromZero(rounding_mode, fraction, bits); } outputDigits = hexDigits; } /* Write the digits consecutively, and start writing in the location of the hexadecimal point. We move the most significant digit left and add the hexadecimal point later. */ p = ++dst; count = (valueBits + integerPartWidth - 1) / integerPartWidth; while (outputDigits && count) { integerPart part; /* Put the most significant integerPartWidth bits in "part". */ if (--count == partsCount) part = 0; /* An imaginary higher zero part. 
*/ else part = significand[count] << shift; if (count && shift) part |= significand[count - 1] >> (integerPartWidth - shift); /* Convert as much of "part" to hexdigits as we can. */ unsigned int curDigits = integerPartWidth / 4; if (curDigits > outputDigits) curDigits = outputDigits; dst += partAsHex (dst, part, curDigits, hexDigitChars); outputDigits -= curDigits; } if (roundUp) { char *q = dst; /* Note that hexDigitChars has a trailing '0'. */ do { q--; *q = hexDigitChars[hexDigitValue (*q) + 1]; } while (*q == '0'); assert(q >= p); } else { /* Add trailing zeroes. */ memset (dst, '0', outputDigits); dst += outputDigits; } /* Move the most significant digit to before the point, and if there is something after the decimal point add it. This must come after rounding above. */ p[-1] = p[0]; if (dst -1 == p) dst--; else p[0] = '.'; /* Finally output the exponent. */ *dst++ = upperCase ? 'P': 'p'; return writeSignedDecimal (dst, exponent); } hash_code hash_value(const IEEEFloat &Arg) { if (!Arg.isFiniteNonZero()) return hash_combine((uint8_t)Arg.category, // NaN has no sign, fix it at zero. Arg.isNaN() ? (uint8_t)0 : (uint8_t)Arg.sign, Arg.semantics->precision); // Normal floats need their exponent and significand hashed. return hash_combine((uint8_t)Arg.category, (uint8_t)Arg.sign, Arg.semantics->precision, Arg.exponent, hash_combine_range( Arg.significandParts(), Arg.significandParts() + Arg.partCount())); } // Conversion from APFloat to/from host float/double. It may eventually be // possible to eliminate these and have everybody deal with APFloats, but that // will take a while. This approach will not easily extend to long double. // Current implementation requires integerPartWidth==64, which is correct at // the moment but could be made more general. // Denormals have exponent minExponent in APFloat, but minExponent-1 in // the actual IEEE respresentations. We compensate for that here. APInt IEEEFloat::convertF80LongDoubleAPFloatToAPInt() const { assert(semantics == (const llvm::fltSemantics*)&semX87DoubleExtended); assert(partCount()==2); uint64_t myexponent, mysignificand; if (isFiniteNonZero()) { myexponent = exponent+16383; //bias mysignificand = significandParts()[0]; if (myexponent==1 && !(mysignificand & 0x8000000000000000ULL)) myexponent = 0; // denormal } else if (category==fcZero) { myexponent = 0; mysignificand = 0; } else if (category==fcInfinity) { myexponent = 0x7fff; mysignificand = 0x8000000000000000ULL; } else { assert(category == fcNaN && "Unknown category"); myexponent = 0x7fff; mysignificand = significandParts()[0]; } uint64_t words[2]; words[0] = mysignificand; words[1] = ((uint64_t)(sign & 1) << 15) | (myexponent & 0x7fffLL); return APInt(80, words); } APInt IEEEFloat::convertPPCDoubleDoubleAPFloatToAPInt() const { assert(semantics == (const llvm::fltSemantics *)&semPPCDoubleDoubleLegacy); assert(partCount()==2); uint64_t words[2]; opStatus fs; bool losesInfo; // Convert number to double. To avoid spurious underflows, we re- // normalize against the "double" minExponent first, and only *then* // truncate the mantissa. The result of that second conversion // may be inexact, but should never underflow. // Declare fltSemantics before APFloat that uses it (and // saves pointer to it) to ensure correct destruction order. 
fltSemantics extendedSemantics = *semantics; extendedSemantics.minExponent = semIEEEdouble.minExponent; IEEEFloat extended(*this); fs = extended.convert(extendedSemantics, rmNearestTiesToEven, &losesInfo); assert(fs == opOK && !losesInfo); (void)fs; IEEEFloat u(extended); fs = u.convert(semIEEEdouble, rmNearestTiesToEven, &losesInfo); assert(fs == opOK || fs == opInexact); (void)fs; words[0] = *u.convertDoubleAPFloatToAPInt().getRawData(); // If conversion was exact or resulted in a special case, we're done; // just set the second double to zero. Otherwise, re-convert back to // the extended format and compute the difference. This now should // convert exactly to double. if (u.isFiniteNonZero() && losesInfo) { fs = u.convert(extendedSemantics, rmNearestTiesToEven, &losesInfo); assert(fs == opOK && !losesInfo); (void)fs; IEEEFloat v(extended); v.subtract(u, rmNearestTiesToEven); fs = v.convert(semIEEEdouble, rmNearestTiesToEven, &losesInfo); assert(fs == opOK && !losesInfo); (void)fs; words[1] = *v.convertDoubleAPFloatToAPInt().getRawData(); } else { words[1] = 0; } return APInt(128, words); } APInt IEEEFloat::convertQuadrupleAPFloatToAPInt() const { assert(semantics == (const llvm::fltSemantics*)&semIEEEquad); assert(partCount()==2); uint64_t myexponent, mysignificand, mysignificand2; if (isFiniteNonZero()) { myexponent = exponent+16383; //bias mysignificand = significandParts()[0]; mysignificand2 = significandParts()[1]; if (myexponent==1 && !(mysignificand2 & 0x1000000000000LL)) myexponent = 0; // denormal } else if (category==fcZero) { myexponent = 0; mysignificand = mysignificand2 = 0; } else if (category==fcInfinity) { myexponent = 0x7fff; mysignificand = mysignificand2 = 0; } else { assert(category == fcNaN && "Unknown category!"); myexponent = 0x7fff; mysignificand = significandParts()[0]; mysignificand2 = significandParts()[1]; } uint64_t words[2]; words[0] = mysignificand; words[1] = ((uint64_t)(sign & 1) << 63) | ((myexponent & 0x7fff) << 48) | (mysignificand2 & 0xffffffffffffLL); return APInt(128, words); } APInt IEEEFloat::convertDoubleAPFloatToAPInt() const { assert(semantics == (const llvm::fltSemantics*)&semIEEEdouble); assert(partCount()==1); uint64_t myexponent, mysignificand; if (isFiniteNonZero()) { myexponent = exponent+1023; //bias mysignificand = *significandParts(); if (myexponent==1 && !(mysignificand & 0x10000000000000LL)) myexponent = 0; // denormal } else if (category==fcZero) { myexponent = 0; mysignificand = 0; } else if (category==fcInfinity) { myexponent = 0x7ff; mysignificand = 0; } else { assert(category == fcNaN && "Unknown category!"); myexponent = 0x7ff; mysignificand = *significandParts(); } return APInt(64, ((((uint64_t)(sign & 1) << 63) | ((myexponent & 0x7ff) << 52) | (mysignificand & 0xfffffffffffffLL)))); } APInt IEEEFloat::convertFloatAPFloatToAPInt() const { assert(semantics == (const llvm::fltSemantics*)&semIEEEsingle); assert(partCount()==1); uint32_t myexponent, mysignificand; if (isFiniteNonZero()) { myexponent = exponent+127; //bias mysignificand = (uint32_t)*significandParts(); if (myexponent == 1 && !(mysignificand & 0x800000)) myexponent = 0; // denormal } else if (category==fcZero) { myexponent = 0; mysignificand = 0; } else if (category==fcInfinity) { myexponent = 0xff; mysignificand = 0; } else { assert(category == fcNaN && "Unknown category!"); myexponent = 0xff; mysignificand = (uint32_t)*significandParts(); } return APInt(32, (((sign&1) << 31) | ((myexponent&0xff) << 23) | (mysignificand & 0x7fffff))); } APInt 
IEEEFloat::convertBFloatAPFloatToAPInt() const { assert(semantics == (const llvm::fltSemantics *)&semBFloat); assert(partCount() == 1); uint32_t myexponent, mysignificand; if (isFiniteNonZero()) { myexponent = exponent + 127; // bias mysignificand = (uint32_t)*significandParts(); if (myexponent == 1 && !(mysignificand & 0x80)) myexponent = 0; // denormal } else if (category == fcZero) { myexponent = 0; mysignificand = 0; } else if (category == fcInfinity) { myexponent = 0xff; mysignificand = 0; } else { assert(category == fcNaN && "Unknown category!"); myexponent = 0xff; mysignificand = (uint32_t)*significandParts(); } return APInt(16, (((sign & 1) << 15) | ((myexponent & 0xff) << 7) | (mysignificand & 0x7f))); } APInt IEEEFloat::convertHalfAPFloatToAPInt() const { assert(semantics == (const llvm::fltSemantics*)&semIEEEhalf); assert(partCount()==1); uint32_t myexponent, mysignificand; if (isFiniteNonZero()) { myexponent = exponent+15; //bias mysignificand = (uint32_t)*significandParts(); if (myexponent == 1 && !(mysignificand & 0x400)) myexponent = 0; // denormal } else if (category==fcZero) { myexponent = 0; mysignificand = 0; } else if (category==fcInfinity) { myexponent = 0x1f; mysignificand = 0; } else { assert(category == fcNaN && "Unknown category!"); myexponent = 0x1f; mysignificand = (uint32_t)*significandParts(); } return APInt(16, (((sign&1) << 15) | ((myexponent&0x1f) << 10) | (mysignificand & 0x3ff))); } // This function creates an APInt that is just a bit map of the floating // point constant as it would appear in memory. It is not a conversion, // and treating the result as a normal integer is unlikely to be useful. APInt IEEEFloat::bitcastToAPInt() const { if (semantics == (const llvm::fltSemantics*)&semIEEEhalf) return convertHalfAPFloatToAPInt(); if (semantics == (const llvm::fltSemantics *)&semBFloat) return convertBFloatAPFloatToAPInt(); if (semantics == (const llvm::fltSemantics*)&semIEEEsingle) return convertFloatAPFloatToAPInt(); if (semantics == (const llvm::fltSemantics*)&semIEEEdouble) return convertDoubleAPFloatToAPInt(); if (semantics == (const llvm::fltSemantics*)&semIEEEquad) return convertQuadrupleAPFloatToAPInt(); if (semantics == (const llvm::fltSemantics *)&semPPCDoubleDoubleLegacy) return convertPPCDoubleDoubleAPFloatToAPInt(); assert(semantics == (const llvm::fltSemantics*)&semX87DoubleExtended && "unknown format!"); return convertF80LongDoubleAPFloatToAPInt(); } float IEEEFloat::convertToFloat() const { assert(semantics == (const llvm::fltSemantics*)&semIEEEsingle && "Float semantics are not IEEEsingle"); APInt api = bitcastToAPInt(); return api.bitsToFloat(); } double IEEEFloat::convertToDouble() const { assert(semantics == (const llvm::fltSemantics*)&semIEEEdouble && "Float semantics are not IEEEdouble"); APInt api = bitcastToAPInt(); return api.bitsToDouble(); } /// Integer bit is explicit in this format. Intel hardware (387 and later) /// does not support these bit patterns: /// exponent = all 1's, integer bit 0, significand 0 ("pseudoinfinity") /// exponent = all 1's, integer bit 0, significand nonzero ("pseudoNaN") /// exponent!=0 nor all 1's, integer bit 0 ("unnormal") /// exponent = 0, integer bit 1 ("pseudodenormal") /// At the moment, the first three are treated as NaNs, the last one as Normal. 
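// Illustrative sketch of the layout handled below (example values only):
// the 80-bit value is split across two 64-bit words; raw[1] holds the sign
// and the 15-bit biased exponent in its low 16 bits, while raw[0] holds the
// explicit integer bit and the 63 fraction bits. For example, 1.0 is stored
// with a biased exponent of 0x3FFF and a significand of 0x8000000000000000.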
void IEEEFloat::initFromF80LongDoubleAPInt(const APInt &api) { assert(api.getBitWidth()==80); uint64_t i1 = api.getRawData()[0]; uint64_t i2 = api.getRawData()[1]; uint64_t myexponent = (i2 & 0x7fff); uint64_t mysignificand = i1; uint8_t myintegerbit = mysignificand >> 63; initialize(&semX87DoubleExtended); assert(partCount()==2); sign = static_cast(i2>>15); if (myexponent == 0 && mysignificand == 0) { // exponent, significand meaningless category = fcZero; } else if (myexponent==0x7fff && mysignificand==0x8000000000000000ULL) { // exponent, significand meaningless category = fcInfinity; } else if ((myexponent == 0x7fff && mysignificand != 0x8000000000000000ULL) || (myexponent != 0x7fff && myexponent != 0 && myintegerbit == 0)) { // exponent meaningless category = fcNaN; significandParts()[0] = mysignificand; significandParts()[1] = 0; } else { category = fcNormal; exponent = myexponent - 16383; significandParts()[0] = mysignificand; significandParts()[1] = 0; if (myexponent==0) // denormal exponent = -16382; } } void IEEEFloat::initFromPPCDoubleDoubleAPInt(const APInt &api) { assert(api.getBitWidth()==128); uint64_t i1 = api.getRawData()[0]; uint64_t i2 = api.getRawData()[1]; opStatus fs; bool losesInfo; // Get the first double and convert to our format. initFromDoubleAPInt(APInt(64, i1)); fs = convert(semPPCDoubleDoubleLegacy, rmNearestTiesToEven, &losesInfo); assert(fs == opOK && !losesInfo); (void)fs; // Unless we have a special case, add in second double. if (isFiniteNonZero()) { IEEEFloat v(semIEEEdouble, APInt(64, i2)); fs = v.convert(semPPCDoubleDoubleLegacy, rmNearestTiesToEven, &losesInfo); assert(fs == opOK && !losesInfo); (void)fs; add(v, rmNearestTiesToEven); } } void IEEEFloat::initFromQuadrupleAPInt(const APInt &api) { assert(api.getBitWidth()==128); uint64_t i1 = api.getRawData()[0]; uint64_t i2 = api.getRawData()[1]; uint64_t myexponent = (i2 >> 48) & 0x7fff; uint64_t mysignificand = i1; uint64_t mysignificand2 = i2 & 0xffffffffffffLL; initialize(&semIEEEquad); assert(partCount()==2); sign = static_cast(i2>>63); if (myexponent==0 && (mysignificand==0 && mysignificand2==0)) { // exponent, significand meaningless category = fcZero; } else if (myexponent==0x7fff && (mysignificand==0 && mysignificand2==0)) { // exponent, significand meaningless category = fcInfinity; } else if (myexponent==0x7fff && (mysignificand!=0 || mysignificand2 !=0)) { // exponent meaningless category = fcNaN; significandParts()[0] = mysignificand; significandParts()[1] = mysignificand2; } else { category = fcNormal; exponent = myexponent - 16383; significandParts()[0] = mysignificand; significandParts()[1] = mysignificand2; if (myexponent==0) // denormal exponent = -16382; else significandParts()[1] |= 0x1000000000000LL; // integer bit } } void IEEEFloat::initFromDoubleAPInt(const APInt &api) { assert(api.getBitWidth()==64); uint64_t i = *api.getRawData(); uint64_t myexponent = (i >> 52) & 0x7ff; uint64_t mysignificand = i & 0xfffffffffffffLL; initialize(&semIEEEdouble); assert(partCount()==1); sign = static_cast(i>>63); if (myexponent==0 && mysignificand==0) { // exponent, significand meaningless category = fcZero; } else if (myexponent==0x7ff && mysignificand==0) { // exponent, significand meaningless category = fcInfinity; } else if (myexponent==0x7ff && mysignificand!=0) { // exponent meaningless category = fcNaN; *significandParts() = mysignificand; } else { category = fcNormal; exponent = myexponent - 1023; *significandParts() = mysignificand; if (myexponent==0) // denormal exponent = -1022; else 
*significandParts() |= 0x10000000000000LL; // integer bit } } void IEEEFloat::initFromFloatAPInt(const APInt &api) { assert(api.getBitWidth()==32); uint32_t i = (uint32_t)*api.getRawData(); uint32_t myexponent = (i >> 23) & 0xff; uint32_t mysignificand = i & 0x7fffff; initialize(&semIEEEsingle); assert(partCount()==1); sign = i >> 31; if (myexponent==0 && mysignificand==0) { // exponent, significand meaningless category = fcZero; } else if (myexponent==0xff && mysignificand==0) { // exponent, significand meaningless category = fcInfinity; } else if (myexponent==0xff && mysignificand!=0) { // sign, exponent, significand meaningless category = fcNaN; *significandParts() = mysignificand; } else { category = fcNormal; exponent = myexponent - 127; //bias *significandParts() = mysignificand; if (myexponent==0) // denormal exponent = -126; else *significandParts() |= 0x800000; // integer bit } } void IEEEFloat::initFromBFloatAPInt(const APInt &api) { assert(api.getBitWidth() == 16); uint32_t i = (uint32_t)*api.getRawData(); uint32_t myexponent = (i >> 7) & 0xff; uint32_t mysignificand = i & 0x7f; initialize(&semBFloat); assert(partCount() == 1); sign = i >> 15; if (myexponent == 0 && mysignificand == 0) { // exponent, significand meaningless category = fcZero; } else if (myexponent == 0xff && mysignificand == 0) { // exponent, significand meaningless category = fcInfinity; } else if (myexponent == 0xff && mysignificand != 0) { // sign, exponent, significand meaningless category = fcNaN; *significandParts() = mysignificand; } else { category = fcNormal; exponent = myexponent - 127; // bias *significandParts() = mysignificand; if (myexponent == 0) // denormal exponent = -126; else *significandParts() |= 0x80; // integer bit } } void IEEEFloat::initFromHalfAPInt(const APInt &api) { assert(api.getBitWidth()==16); uint32_t i = (uint32_t)*api.getRawData(); uint32_t myexponent = (i >> 10) & 0x1f; uint32_t mysignificand = i & 0x3ff; initialize(&semIEEEhalf); assert(partCount()==1); sign = i >> 15; if (myexponent==0 && mysignificand==0) { // exponent, significand meaningless category = fcZero; } else if (myexponent==0x1f && mysignificand==0) { // exponent, significand meaningless category = fcInfinity; } else if (myexponent==0x1f && mysignificand!=0) { // sign, exponent, significand meaningless category = fcNaN; *significandParts() = mysignificand; } else { category = fcNormal; exponent = myexponent - 15; //bias *significandParts() = mysignificand; if (myexponent==0) // denormal exponent = -14; else *significandParts() |= 0x400; // integer bit } } /// Treat api as containing the bits of a floating point number. Currently /// we infer the floating point type from the size of the APInt. The /// isIEEE argument distinguishes between PPC128 and IEEE128 (not meaningful /// when the size is anything else). void IEEEFloat::initFromAPInt(const fltSemantics *Sem, const APInt &api) { if (Sem == &semIEEEhalf) return initFromHalfAPInt(api); if (Sem == &semBFloat) return initFromBFloatAPInt(api); if (Sem == &semIEEEsingle) return initFromFloatAPInt(api); if (Sem == &semIEEEdouble) return initFromDoubleAPInt(api); if (Sem == &semX87DoubleExtended) return initFromF80LongDoubleAPInt(api); if (Sem == &semIEEEquad) return initFromQuadrupleAPInt(api); if (Sem == &semPPCDoubleDoubleLegacy) return initFromPPCDoubleDoubleAPInt(api); llvm_unreachable(nullptr); } /// Make this number the largest magnitude normal number in the given /// semantics. 
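// Concrete example (illustrative): for IEEEdouble this yields
// exponent == maxExponent (1023) with an all-ones 53-bit significand,
// i.e. the bit pattern 0x7FEFFFFFFFFFFFFF, roughly 1.7976931348623157e+308
// (negated when Negative is set).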
void IEEEFloat::makeLargest(bool Negative) { // We want (in interchange format): // sign = {Negative} // exponent = 1..10 // significand = 1..1 category = fcNormal; sign = Negative; exponent = semantics->maxExponent; // Use memset to set all but the highest integerPart to all ones. integerPart *significand = significandParts(); unsigned PartCount = partCount(); memset(significand, 0xFF, sizeof(integerPart)*(PartCount - 1)); // Set the high integerPart especially setting all unused top bits for // internal consistency. const unsigned NumUnusedHighBits = PartCount*integerPartWidth - semantics->precision; significand[PartCount - 1] = (NumUnusedHighBits < integerPartWidth) ? (~integerPart(0) >> NumUnusedHighBits) : 0; } /// Make this number the smallest magnitude denormal number in the given /// semantics. void IEEEFloat::makeSmallest(bool Negative) { // We want (in interchange format): // sign = {Negative} // exponent = 0..0 // significand = 0..01 category = fcNormal; sign = Negative; exponent = semantics->minExponent; APInt::tcSet(significandParts(), 1, partCount()); } void IEEEFloat::makeSmallestNormalized(bool Negative) { // We want (in interchange format): // sign = {Negative} // exponent = 0..0 // significand = 10..0 category = fcNormal; zeroSignificand(); sign = Negative; exponent = semantics->minExponent; significandParts()[partCountForBits(semantics->precision) - 1] |= (((integerPart)1) << ((semantics->precision - 1) % integerPartWidth)); } IEEEFloat::IEEEFloat(const fltSemantics &Sem, const APInt &API) { initFromAPInt(&Sem, API); } IEEEFloat::IEEEFloat(float f) { initFromAPInt(&semIEEEsingle, APInt::floatToBits(f)); } IEEEFloat::IEEEFloat(double d) { initFromAPInt(&semIEEEdouble, APInt::doubleToBits(d)); } namespace { void append(SmallVectorImpl &Buffer, StringRef Str) { Buffer.append(Str.begin(), Str.end()); } /// Removes data from the given significand until it is no more /// precise than is required for the desired precision. void AdjustToPrecision(APInt &significand, int &exp, unsigned FormatPrecision) { unsigned bits = significand.getActiveBits(); // 196/59 is a very slight overestimate of lg_2(10). unsigned bitsRequired = (FormatPrecision * 196 + 58) / 59; if (bits <= bitsRequired) return; unsigned tensRemovable = (bits - bitsRequired) * 59 / 196; if (!tensRemovable) return; exp += tensRemovable; APInt divisor(significand.getBitWidth(), 1); APInt powten(significand.getBitWidth(), 10); while (true) { if (tensRemovable & 1) divisor *= powten; tensRemovable >>= 1; if (!tensRemovable) break; powten *= powten; } significand = significand.udiv(divisor); // Truncate the significand down to its active bit count. significand = significand.trunc(significand.getActiveBits()); } void AdjustToPrecision(SmallVectorImpl &buffer, int &exp, unsigned FormatPrecision) { unsigned N = buffer.size(); if (N <= FormatPrecision) return; // The most significant figures are the last ones in the buffer. unsigned FirstSignificant = N - FormatPrecision; // Round. // FIXME: this probably shouldn't use 'round half up'. // Rounding down is just a truncation, except we also want to drop // trailing zeros from the new result. if (buffer[FirstSignificant - 1] < '5') { while (FirstSignificant < N && buffer[FirstSignificant] == '0') FirstSignificant++; exp += FirstSignificant; buffer.erase(&buffer[0], &buffer[FirstSignificant]); return; } // Rounding up requires a decimal add-with-carry. If we continue // the carry, the newly-introduced zeros will just be truncated. 
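// Worked example (illustrative): the buffer stores digits least significant
// first, so {'5','9','9','4'} represents 4995. With FormatPrecision == 2,
// FirstSignificant starts at 2 and buffer[1] == '9' triggers the round-up
// below: the carry ripples through the '9' at index 2, the '4' becomes '5',
// and after the erase only {'5'} remains with exp increased by 3, i.e. 5e3.
// The zeros the carry would have produced are simply dropped.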
for (unsigned I = FirstSignificant; I != N; ++I) { if (buffer[I] == '9') { FirstSignificant++; } else { buffer[I]++; break; } } // If we carried through, we have exactly one digit of precision. if (FirstSignificant == N) { exp += FirstSignificant; buffer.clear(); buffer.push_back('1'); return; } exp += FirstSignificant; buffer.erase(&buffer[0], &buffer[FirstSignificant]); } } void IEEEFloat::toString(SmallVectorImpl &Str, unsigned FormatPrecision, unsigned FormatMaxPadding, bool TruncateZero) const { switch (category) { case fcInfinity: if (isNegative()) return append(Str, "-Inf"); else return append(Str, "+Inf"); case fcNaN: return append(Str, "NaN"); case fcZero: if (isNegative()) Str.push_back('-'); if (!FormatMaxPadding) { if (TruncateZero) append(Str, "0.0E+0"); else { append(Str, "0.0"); if (FormatPrecision > 1) Str.append(FormatPrecision - 1, '0'); append(Str, "e+00"); } } else Str.push_back('0'); return; case fcNormal: break; } if (isNegative()) Str.push_back('-'); // Decompose the number into an APInt and an exponent. int exp = exponent - ((int) semantics->precision - 1); APInt significand(semantics->precision, makeArrayRef(significandParts(), partCountForBits(semantics->precision))); // Set FormatPrecision if zero. We want to do this before we // truncate trailing zeros, as those are part of the precision. if (!FormatPrecision) { // We use enough digits so the number can be round-tripped back to an // APFloat. The formula comes from "How to Print Floating-Point Numbers // Accurately" by Steele and White. // FIXME: Using a formula based purely on the precision is conservative; // we can print fewer digits depending on the actual value being printed. // FormatPrecision = 2 + floor(significandBits / lg_2(10)) FormatPrecision = 2 + semantics->precision * 59 / 196; } // Ignore trailing binary zeros. int trailingZeros = significand.countTrailingZeros(); exp += trailingZeros; significand.lshrInPlace(trailingZeros); // Change the exponent from 2^e to 10^e. if (exp == 0) { // Nothing to do. } else if (exp > 0) { // Just shift left. significand = significand.zext(semantics->precision + exp); significand <<= exp; exp = 0; } else { /* exp < 0 */ int texp = -exp; // We transform this using the identity: // (N)(2^-e) == (N)(5^e)(10^-e) // This means we have to multiply N (the significand) by 5^e. // To avoid overflow, we have to operate on numbers large // enough to store N * 5^e: // log2(N * 5^e) == log2(N) + e * log2(5) // <= semantics->precision + e * 137 / 59 // (log_2(5) ~ 2.321928 < 2.322034 ~ 137/59) unsigned precision = semantics->precision + (137 * texp + 136) / 59; // Multiply significand by 5^e. // N * 5^0101 == N * 5^(1*1) * 5^(0*2) * 5^(1*4) * 5^(0*8) significand = significand.zext(precision); APInt five_to_the_i(precision, 5); while (true) { if (texp & 1) significand *= five_to_the_i; texp >>= 1; if (!texp) break; five_to_the_i *= five_to_the_i; } } AdjustToPrecision(significand, exp, FormatPrecision); SmallVector buffer; // Fill the buffer. unsigned precision = significand.getBitWidth(); APInt ten(precision, 10); APInt digit(precision, 0); bool inTrail = true; while (significand != 0) { // digit <- significand % 10 // significand <- significand / 10 APInt::udivrem(significand, ten, significand, digit); unsigned d = digit.getZExtValue(); // Drop trailing zeros. if (inTrail && !d) exp++; else { buffer.push_back((char) ('0' + d)); inTrail = false; } } assert(!buffer.empty() && "no characters in buffer!"); // Drop down to FormatPrecision. 
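// At this point the buffer holds the decimal digits least significant first.
// Small example (illustrative): a significand of 2500 sheds its two trailing
// zeros into the exponent (exp += 2) and leaves buffer == {'5','2'}, i.e.
// 25e2. Note that for IEEEdouble the default FormatPrecision computed above
// is 2 + 53 * 59 / 196 == 17 digits.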
// TODO: don't do more precise calculations above than are required. AdjustToPrecision(buffer, exp, FormatPrecision); unsigned NDigits = buffer.size(); // Check whether we should use scientific notation. bool FormatScientific; if (!FormatMaxPadding) FormatScientific = true; else { if (exp >= 0) { // 765e3 --> 765000 // ^^^ // But we shouldn't make the number look more precise than it is. FormatScientific = ((unsigned) exp > FormatMaxPadding || NDigits + (unsigned) exp > FormatPrecision); } else { // Power of the most significant digit. int MSD = exp + (int) (NDigits - 1); if (MSD >= 0) { // 765e-2 == 7.65 FormatScientific = false; } else { // 765e-5 == 0.00765 // ^ ^^ FormatScientific = ((unsigned) -MSD) > FormatMaxPadding; } } } // Scientific formatting is pretty straightforward. if (FormatScientific) { exp += (NDigits - 1); Str.push_back(buffer[NDigits-1]); Str.push_back('.'); if (NDigits == 1 && TruncateZero) Str.push_back('0'); else for (unsigned I = 1; I != NDigits; ++I) Str.push_back(buffer[NDigits-1-I]); // Fill with zeros up to FormatPrecision. if (!TruncateZero && FormatPrecision > NDigits - 1) Str.append(FormatPrecision - NDigits + 1, '0'); // For !TruncateZero we use lower 'e'. Str.push_back(TruncateZero ? 'E' : 'e'); Str.push_back(exp >= 0 ? '+' : '-'); if (exp < 0) exp = -exp; SmallVector expbuf; do { expbuf.push_back((char) ('0' + (exp % 10))); exp /= 10; } while (exp); // Exponent always at least two digits if we do not truncate zeros. if (!TruncateZero && expbuf.size() < 2) expbuf.push_back('0'); for (unsigned I = 0, E = expbuf.size(); I != E; ++I) Str.push_back(expbuf[E-1-I]); return; } // Non-scientific, positive exponents. if (exp >= 0) { for (unsigned I = 0; I != NDigits; ++I) Str.push_back(buffer[NDigits-1-I]); for (unsigned I = 0; I != (unsigned) exp; ++I) Str.push_back('0'); return; } // Non-scientific, negative exponents. // The number of digits to the left of the decimal point. int NWholeDigits = exp + (int) NDigits; unsigned I = 0; if (NWholeDigits > 0) { for (; I != (unsigned) NWholeDigits; ++I) Str.push_back(buffer[NDigits-I-1]); Str.push_back('.'); } else { unsigned NZeros = 1 + (unsigned) -NWholeDigits; Str.push_back('0'); Str.push_back('.'); for (unsigned Z = 1; Z != NZeros; ++Z) Str.push_back('0'); } for (; I != NDigits; ++I) Str.push_back(buffer[NDigits-I-1]); } bool IEEEFloat::getExactInverse(APFloat *inv) const { // Special floats and denormals have no exact inverse. if (!isFiniteNonZero()) return false; // Check that the number is a power of two by making sure that only the // integer bit is set in the significand. if (significandLSB() != semantics->precision - 1) return false; // Get the inverse. IEEEFloat reciprocal(*semantics, 1ULL); if (reciprocal.divide(*this, rmNearestTiesToEven) != opOK) return false; // Avoid multiplication with a denormal, it is not safe on all platforms and // may be slower than a normal division. if (reciprocal.isDenormal()) return false; assert(reciprocal.isFiniteNonZero() && reciprocal.significandLSB() == reciprocal.semantics->precision - 1); if (inv) *inv = APFloat(reciprocal, *semantics); return true; } bool IEEEFloat::isSignaling() const { if (!isNaN()) return false; // IEEE-754R 2008 6.2.1: A signaling NaN bit string should be encoded with the // first bit of the trailing significand being 0. return !APInt::tcExtractBit(significandParts(), semantics->precision - 2); } /// IEEE-754R 2008 5.3.1: nextUp/nextDown. 
/// /// *NOTE* since nextDown(x) = -nextUp(-x), we only implement nextUp with /// appropriate sign switching before/after the computation. IEEEFloat::opStatus IEEEFloat::next(bool nextDown) { // If we are performing nextDown, swap sign so we have -x. if (nextDown) changeSign(); // Compute nextUp(x) opStatus result = opOK; // Handle each float category separately. switch (category) { case fcInfinity: // nextUp(+inf) = +inf if (!isNegative()) break; // nextUp(-inf) = -getLargest() makeLargest(true); break; case fcNaN: // IEEE-754R 2008 6.2 Par 2: nextUp(sNaN) = qNaN. Set Invalid flag. // IEEE-754R 2008 6.2: nextUp(qNaN) = qNaN. Must be identity so we do not // change the payload. if (isSignaling()) { result = opInvalidOp; // For consistency, propagate the sign of the sNaN to the qNaN. makeNaN(false, isNegative(), nullptr); } break; case fcZero: // nextUp(pm 0) = +getSmallest() makeSmallest(false); break; case fcNormal: // nextUp(-getSmallest()) = -0 if (isSmallest() && isNegative()) { APInt::tcSet(significandParts(), 0, partCount()); category = fcZero; exponent = 0; break; } // nextUp(getLargest()) == INFINITY if (isLargest() && !isNegative()) { APInt::tcSet(significandParts(), 0, partCount()); category = fcInfinity; exponent = semantics->maxExponent + 1; break; } // nextUp(normal) == normal + inc. if (isNegative()) { // If we are negative, we need to decrement the significand. // We only cross a binade boundary that requires adjusting the exponent // if: // 1. exponent != semantics->minExponent. This implies we are not in the // smallest binade or are dealing with denormals. // 2. Our significand excluding the integral bit is all zeros. bool WillCrossBinadeBoundary = exponent != semantics->minExponent && isSignificandAllZeros(); // Decrement the significand. // // We always do this since: // 1. If we are dealing with a non-binade decrement, by definition we // just decrement the significand. // 2. If we are dealing with a normal -> normal binade decrement, since // we have an explicit integral bit the fact that all bits but the // integral bit are zero implies that subtracting one will yield a // significand with 0 integral bit and 1 in all other spots. Thus we // must just adjust the exponent and set the integral bit to 1. // 3. If we are dealing with a normal -> denormal binade decrement, // since we set the integral bit to 0 when we represent denormals, we // just decrement the significand. integerPart *Parts = significandParts(); APInt::tcDecrement(Parts, partCount()); if (WillCrossBinadeBoundary) { // Our result is a normal number. Do the following: // 1. Set the integral bit to 1. // 2. Decrement the exponent. APInt::tcSetBit(Parts, semantics->precision - 1); exponent--; } } else { // If we are positive, we need to increment the significand. // We only cross a binade boundary that requires adjusting the exponent if // the input is not a denormal and all of said input's significand bits // are set. If all of said conditions are true: clear the significand, set // the integral bit to 1, and increment the exponent. If we have a // denormal always increment since moving denormals and the numbers in the // smallest normal binade have the same exponent in our representation. 
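// Concrete examples (illustrative): for IEEEsingle, the value just below
// 16.0 (0x1.fffffep3) has an all-ones significand and is not denormal, so
// the significand collapses to the bare integer bit, the exponent is bumped,
// and the result is exactly 16.0. For 1.0 no binade boundary is crossed and
// the significand is simply incremented, giving 1.0 + 2^-23.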
      bool WillCrossBinadeBoundary = !isDenormal() && isSignificandAllOnes();

      if (WillCrossBinadeBoundary) {
        integerPart *Parts = significandParts();
        APInt::tcSet(Parts, 0, partCount());
        APInt::tcSetBit(Parts, semantics->precision - 1);
        assert(exponent != semantics->maxExponent &&
               "We can not increment an exponent beyond the maxExponent allowed"
               " by the given floating point semantics.");
        exponent++;
      } else {
        incrementSignificand();
      }
    }
    break;
  }

  // If we are performing nextDown, swap sign so we have -nextUp(-x)
  if (nextDown)
    changeSign();

  return result;
}

void IEEEFloat::makeInf(bool Negative) {
  category = fcInfinity;
  sign = Negative;
  exponent = semantics->maxExponent + 1;
  APInt::tcSet(significandParts(), 0, partCount());
}

void IEEEFloat::makeZero(bool Negative) {
  category = fcZero;
  sign = Negative;
  exponent = semantics->minExponent-1;
  APInt::tcSet(significandParts(), 0, partCount());
}

void IEEEFloat::makeQuiet() {
  assert(isNaN());
  APInt::tcSetBit(significandParts(), semantics->precision - 2);
}

int ilogb(const IEEEFloat &Arg) {
  if (Arg.isNaN())
    return IEEEFloat::IEK_NaN;
  if (Arg.isZero())
    return IEEEFloat::IEK_Zero;
  if (Arg.isInfinity())
    return IEEEFloat::IEK_Inf;
  if (!Arg.isDenormal())
    return Arg.exponent;

  IEEEFloat Normalized(Arg);
  int SignificandBits = Arg.getSemantics().precision - 1;

  Normalized.exponent += SignificandBits;
  Normalized.normalize(IEEEFloat::rmNearestTiesToEven, lfExactlyZero);
  return Normalized.exponent - SignificandBits;
}

IEEEFloat scalbn(IEEEFloat X, int Exp, IEEEFloat::roundingMode RoundingMode) {
  auto MaxExp = X.getSemantics().maxExponent;
  auto MinExp = X.getSemantics().minExponent;

  // If Exp is wildly out-of-scale, simply adding it to X.exponent will
  // overflow; clamp it to a safe range before adding, but ensure that the range
  // is large enough that the clamp does not change the result. The range we
  // need to support is the difference between the largest possible exponent and
  // the normalized exponent of half the smallest denormal.

  int SignificandBits = X.getSemantics().precision - 1;
  int MaxIncrement = MaxExp - (MinExp - SignificandBits) + 1;

  // Clamp to one past the range ends to let normalize handle overflow.
  X.exponent += std::min(std::max(Exp, -MaxIncrement - 1), MaxIncrement);
  X.normalize(RoundingMode, lfExactlyZero);
  if (X.isNaN())
    X.makeQuiet();
  return X;
}

IEEEFloat frexp(const IEEEFloat &Val, int &Exp, IEEEFloat::roundingMode RM) {
  Exp = ilogb(Val);

  // Quiet signalling nans.
  if (Exp == IEEEFloat::IEK_NaN) {
    IEEEFloat Quiet(Val);
    Quiet.makeQuiet();
    return Quiet;
  }

  if (Exp == IEEEFloat::IEK_Inf)
    return Val;

  // 1 is added because frexp is defined to return a normalized fraction in
  // +/-[0.5, 1.0), rather than the usual +/-[1.0, 2.0).
  Exp = Exp == IEEEFloat::IEK_Zero ?
0 : Exp + 1; return scalbn(Val, -Exp, RM); } DoubleAPFloat::DoubleAPFloat(const fltSemantics &S) : Semantics(&S), Floats(new APFloat[2]{APFloat(semIEEEdouble), APFloat(semIEEEdouble)}) { assert(Semantics == &semPPCDoubleDouble); } DoubleAPFloat::DoubleAPFloat(const fltSemantics &S, uninitializedTag) : Semantics(&S), Floats(new APFloat[2]{APFloat(semIEEEdouble, uninitialized), APFloat(semIEEEdouble, uninitialized)}) { assert(Semantics == &semPPCDoubleDouble); } DoubleAPFloat::DoubleAPFloat(const fltSemantics &S, integerPart I) : Semantics(&S), Floats(new APFloat[2]{APFloat(semIEEEdouble, I), APFloat(semIEEEdouble)}) { assert(Semantics == &semPPCDoubleDouble); } DoubleAPFloat::DoubleAPFloat(const fltSemantics &S, const APInt &I) : Semantics(&S), Floats(new APFloat[2]{ APFloat(semIEEEdouble, APInt(64, I.getRawData()[0])), APFloat(semIEEEdouble, APInt(64, I.getRawData()[1]))}) { assert(Semantics == &semPPCDoubleDouble); } DoubleAPFloat::DoubleAPFloat(const fltSemantics &S, APFloat &&First, APFloat &&Second) : Semantics(&S), Floats(new APFloat[2]{std::move(First), std::move(Second)}) { assert(Semantics == &semPPCDoubleDouble); assert(&Floats[0].getSemantics() == &semIEEEdouble); assert(&Floats[1].getSemantics() == &semIEEEdouble); } DoubleAPFloat::DoubleAPFloat(const DoubleAPFloat &RHS) : Semantics(RHS.Semantics), Floats(RHS.Floats ? new APFloat[2]{APFloat(RHS.Floats[0]), APFloat(RHS.Floats[1])} : nullptr) { assert(Semantics == &semPPCDoubleDouble); } DoubleAPFloat::DoubleAPFloat(DoubleAPFloat &&RHS) : Semantics(RHS.Semantics), Floats(std::move(RHS.Floats)) { RHS.Semantics = &semBogus; assert(Semantics == &semPPCDoubleDouble); } DoubleAPFloat &DoubleAPFloat::operator=(const DoubleAPFloat &RHS) { if (Semantics == RHS.Semantics && RHS.Floats) { Floats[0] = RHS.Floats[0]; Floats[1] = RHS.Floats[1]; } else if (this != &RHS) { this->~DoubleAPFloat(); new (this) DoubleAPFloat(RHS); } return *this; } // Implement addition, subtraction, multiplication and division based on: // "Software for Doubled-Precision Floating-Point Computations", // by Seppo Linnainmaa, ACM TOMS vol 7 no 3, September 1981, pages 272-283. APFloat::opStatus DoubleAPFloat::addImpl(const APFloat &a, const APFloat &aa, const APFloat &c, const APFloat &cc, roundingMode RM) { int Status = opOK; APFloat z = a; Status |= z.add(c, RM); if (!z.isFinite()) { if (!z.isInfinity()) { Floats[0] = std::move(z); Floats[1].makeZero(/* Neg = */ false); return (opStatus)Status; } Status = opOK; auto AComparedToC = a.compareAbsoluteValue(c); z = cc; Status |= z.add(aa, RM); if (AComparedToC == APFloat::cmpGreaterThan) { // z = cc + aa + c + a; Status |= z.add(c, RM); Status |= z.add(a, RM); } else { // z = cc + aa + a + c; Status |= z.add(a, RM); Status |= z.add(c, RM); } if (!z.isFinite()) { Floats[0] = std::move(z); Floats[1].makeZero(/* Neg = */ false); return (opStatus)Status; } Floats[0] = z; APFloat zz = aa; Status |= zz.add(cc, RM); if (AComparedToC == APFloat::cmpGreaterThan) { // Floats[1] = a - z + c + zz; Floats[1] = a; Status |= Floats[1].subtract(z, RM); Status |= Floats[1].add(c, RM); Status |= Floats[1].add(zz, RM); } else { // Floats[1] = c - z + a + zz; Floats[1] = c; Status |= Floats[1].subtract(z, RM); Status |= Floats[1].add(a, RM); Status |= Floats[1].add(zz, RM); } } else { // q = a - z; APFloat q = a; Status |= q.subtract(z, RM); // zz = q + c + (a - (q + z)) + aa + cc; // Compute a - (q + z) as -((q + z) - a) to avoid temporary copies. 
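// Worked example (illustrative): adding (a, aa) = (1.0, 0.0) and
// (c, cc) = (2^-60, 0.0). z = a + c rounds to 1.0, q = a - z = 0, and
// zz = q + c + (a - (q + z)) + aa + cc = 2^-60. The new high part
// z + zz still rounds to 1.0, and the low part recovers the dropped
// bits: Floats[1] = z - Floats[0] + zz = 2^-60.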
auto zz = q; Status |= zz.add(c, RM); Status |= q.add(z, RM); Status |= q.subtract(a, RM); q.changeSign(); Status |= zz.add(q, RM); Status |= zz.add(aa, RM); Status |= zz.add(cc, RM); if (zz.isZero() && !zz.isNegative()) { Floats[0] = std::move(z); Floats[1].makeZero(/* Neg = */ false); return opOK; } Floats[0] = z; Status |= Floats[0].add(zz, RM); if (!Floats[0].isFinite()) { Floats[1].makeZero(/* Neg = */ false); return (opStatus)Status; } Floats[1] = std::move(z); Status |= Floats[1].subtract(Floats[0], RM); Status |= Floats[1].add(zz, RM); } return (opStatus)Status; } APFloat::opStatus DoubleAPFloat::addWithSpecial(const DoubleAPFloat &LHS, const DoubleAPFloat &RHS, DoubleAPFloat &Out, roundingMode RM) { if (LHS.getCategory() == fcNaN) { Out = LHS; return opOK; } if (RHS.getCategory() == fcNaN) { Out = RHS; return opOK; } if (LHS.getCategory() == fcZero) { Out = RHS; return opOK; } if (RHS.getCategory() == fcZero) { Out = LHS; return opOK; } if (LHS.getCategory() == fcInfinity && RHS.getCategory() == fcInfinity && LHS.isNegative() != RHS.isNegative()) { Out.makeNaN(false, Out.isNegative(), nullptr); return opInvalidOp; } if (LHS.getCategory() == fcInfinity) { Out = LHS; return opOK; } if (RHS.getCategory() == fcInfinity) { Out = RHS; return opOK; } assert(LHS.getCategory() == fcNormal && RHS.getCategory() == fcNormal); APFloat A(LHS.Floats[0]), AA(LHS.Floats[1]), C(RHS.Floats[0]), CC(RHS.Floats[1]); assert(&A.getSemantics() == &semIEEEdouble); assert(&AA.getSemantics() == &semIEEEdouble); assert(&C.getSemantics() == &semIEEEdouble); assert(&CC.getSemantics() == &semIEEEdouble); assert(&Out.Floats[0].getSemantics() == &semIEEEdouble); assert(&Out.Floats[1].getSemantics() == &semIEEEdouble); return Out.addImpl(A, AA, C, CC, RM); } APFloat::opStatus DoubleAPFloat::add(const DoubleAPFloat &RHS, roundingMode RM) { return addWithSpecial(*this, RHS, *this, RM); } APFloat::opStatus DoubleAPFloat::subtract(const DoubleAPFloat &RHS, roundingMode RM) { changeSign(); auto Ret = add(RHS, RM); changeSign(); return Ret; } APFloat::opStatus DoubleAPFloat::multiply(const DoubleAPFloat &RHS, APFloat::roundingMode RM) { const auto &LHS = *this; auto &Out = *this; /* Interesting observation: For special categories, finding the lowest common ancestor of the following layered graph gives the correct return category: NaN / \ Zero Inf \ / Normal e.g. NaN * NaN = NaN Zero * Inf = NaN Normal * Zero = Zero Normal * Inf = Inf */ if (LHS.getCategory() == fcNaN) { Out = LHS; return opOK; } if (RHS.getCategory() == fcNaN) { Out = RHS; return opOK; } if ((LHS.getCategory() == fcZero && RHS.getCategory() == fcInfinity) || (LHS.getCategory() == fcInfinity && RHS.getCategory() == fcZero)) { Out.makeNaN(false, false, nullptr); return opOK; } if (LHS.getCategory() == fcZero || LHS.getCategory() == fcInfinity) { Out = LHS; return opOK; } if (RHS.getCategory() == fcZero || RHS.getCategory() == fcInfinity) { Out = RHS; return opOK; } assert(LHS.getCategory() == fcNormal && RHS.getCategory() == fcNormal && "Special cases not handled exhaustively"); int Status = opOK; APFloat A = Floats[0], B = Floats[1], C = RHS.Floats[0], D = RHS.Floats[1]; // t = a * c APFloat T = A; Status |= T.multiply(C, RM); if (!T.isFiniteNonZero()) { Floats[0] = T; Floats[1].makeZero(/* Neg = */ false); return (opStatus)Status; } // tau = fmsub(a, c, t), that is -fmadd(-a, c, t). 
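// Illustrative example: with a = c = 1 + 2^-30 (as doubles) the exact
// product is 1 + 2^-29 + 2^-60. t = a * c rounds to 1 + 2^-29, and the
// fused multiply-add below recovers the rounding error exactly:
// tau = a * c - t = 2^-60, so the pair (t, tau) represents the product
// without loss.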
APFloat Tau = A; T.changeSign(); Status |= Tau.fusedMultiplyAdd(C, T, RM); T.changeSign(); { // v = a * d APFloat V = A; Status |= V.multiply(D, RM); // w = b * c APFloat W = B; Status |= W.multiply(C, RM); Status |= V.add(W, RM); // tau += v + w Status |= Tau.add(V, RM); } // u = t + tau APFloat U = T; Status |= U.add(Tau, RM); Floats[0] = U; if (!U.isFinite()) { Floats[1].makeZero(/* Neg = */ false); } else { // Floats[1] = (t - u) + tau Status |= T.subtract(U, RM); Status |= T.add(Tau, RM); Floats[1] = T; } return (opStatus)Status; } APFloat::opStatus DoubleAPFloat::divide(const DoubleAPFloat &RHS, APFloat::roundingMode RM) { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); APFloat Tmp(semPPCDoubleDoubleLegacy, bitcastToAPInt()); auto Ret = Tmp.divide(APFloat(semPPCDoubleDoubleLegacy, RHS.bitcastToAPInt()), RM); *this = DoubleAPFloat(semPPCDoubleDouble, Tmp.bitcastToAPInt()); return Ret; } APFloat::opStatus DoubleAPFloat::remainder(const DoubleAPFloat &RHS) { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); APFloat Tmp(semPPCDoubleDoubleLegacy, bitcastToAPInt()); auto Ret = Tmp.remainder(APFloat(semPPCDoubleDoubleLegacy, RHS.bitcastToAPInt())); *this = DoubleAPFloat(semPPCDoubleDouble, Tmp.bitcastToAPInt()); return Ret; } APFloat::opStatus DoubleAPFloat::mod(const DoubleAPFloat &RHS) { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); APFloat Tmp(semPPCDoubleDoubleLegacy, bitcastToAPInt()); auto Ret = Tmp.mod(APFloat(semPPCDoubleDoubleLegacy, RHS.bitcastToAPInt())); *this = DoubleAPFloat(semPPCDoubleDouble, Tmp.bitcastToAPInt()); return Ret; } APFloat::opStatus DoubleAPFloat::fusedMultiplyAdd(const DoubleAPFloat &Multiplicand, const DoubleAPFloat &Addend, APFloat::roundingMode RM) { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); APFloat Tmp(semPPCDoubleDoubleLegacy, bitcastToAPInt()); auto Ret = Tmp.fusedMultiplyAdd( APFloat(semPPCDoubleDoubleLegacy, Multiplicand.bitcastToAPInt()), APFloat(semPPCDoubleDoubleLegacy, Addend.bitcastToAPInt()), RM); *this = DoubleAPFloat(semPPCDoubleDouble, Tmp.bitcastToAPInt()); return Ret; } APFloat::opStatus DoubleAPFloat::roundToIntegral(APFloat::roundingMode RM) { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); APFloat Tmp(semPPCDoubleDoubleLegacy, bitcastToAPInt()); auto Ret = Tmp.roundToIntegral(RM); *this = DoubleAPFloat(semPPCDoubleDouble, Tmp.bitcastToAPInt()); return Ret; } void DoubleAPFloat::changeSign() { Floats[0].changeSign(); Floats[1].changeSign(); } APFloat::cmpResult DoubleAPFloat::compareAbsoluteValue(const DoubleAPFloat &RHS) const { auto Result = Floats[0].compareAbsoluteValue(RHS.Floats[0]); if (Result != cmpEqual) return Result; Result = Floats[1].compareAbsoluteValue(RHS.Floats[1]); if (Result == cmpLessThan || Result == cmpGreaterThan) { auto Against = Floats[0].isNegative() ^ Floats[1].isNegative(); auto RHSAgainst = RHS.Floats[0].isNegative() ^ RHS.Floats[1].isNegative(); if (Against && !RHSAgainst) return cmpLessThan; if (!Against && RHSAgainst) return cmpGreaterThan; if (!Against && !RHSAgainst) return Result; if (Against && RHSAgainst) return (cmpResult)(cmpLessThan + cmpGreaterThan - Result); } return Result; } APFloat::fltCategory DoubleAPFloat::getCategory() const { return Floats[0].getCategory(); } bool DoubleAPFloat::isNegative() const { return Floats[0].isNegative(); } void DoubleAPFloat::makeInf(bool Neg) { Floats[0].makeInf(Neg); Floats[1].makeZero(/* Neg = */ false); } void DoubleAPFloat::makeZero(bool Neg) { 
Floats[0].makeZero(Neg); Floats[1].makeZero(/* Neg = */ false); } void DoubleAPFloat::makeLargest(bool Neg) { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); Floats[0] = APFloat(semIEEEdouble, APInt(64, 0x7fefffffffffffffull)); Floats[1] = APFloat(semIEEEdouble, APInt(64, 0x7c8ffffffffffffeull)); if (Neg) changeSign(); } void DoubleAPFloat::makeSmallest(bool Neg) { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); Floats[0].makeSmallest(Neg); Floats[1].makeZero(/* Neg = */ false); } void DoubleAPFloat::makeSmallestNormalized(bool Neg) { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); Floats[0] = APFloat(semIEEEdouble, APInt(64, 0x0360000000000000ull)); if (Neg) Floats[0].changeSign(); Floats[1].makeZero(/* Neg = */ false); } void DoubleAPFloat::makeNaN(bool SNaN, bool Neg, const APInt *fill) { Floats[0].makeNaN(SNaN, Neg, fill); Floats[1].makeZero(/* Neg = */ false); } APFloat::cmpResult DoubleAPFloat::compare(const DoubleAPFloat &RHS) const { auto Result = Floats[0].compare(RHS.Floats[0]); // |Float[0]| > |Float[1]| if (Result == APFloat::cmpEqual) return Floats[1].compare(RHS.Floats[1]); return Result; } bool DoubleAPFloat::bitwiseIsEqual(const DoubleAPFloat &RHS) const { return Floats[0].bitwiseIsEqual(RHS.Floats[0]) && Floats[1].bitwiseIsEqual(RHS.Floats[1]); } hash_code hash_value(const DoubleAPFloat &Arg) { if (Arg.Floats) return hash_combine(hash_value(Arg.Floats[0]), hash_value(Arg.Floats[1])); return hash_combine(Arg.Semantics); } APInt DoubleAPFloat::bitcastToAPInt() const { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); uint64_t Data[] = { Floats[0].bitcastToAPInt().getRawData()[0], Floats[1].bitcastToAPInt().getRawData()[0], }; return APInt(128, 2, Data); } Expected DoubleAPFloat::convertFromString(StringRef S, roundingMode RM) { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); APFloat Tmp(semPPCDoubleDoubleLegacy); auto Ret = Tmp.convertFromString(S, RM); *this = DoubleAPFloat(semPPCDoubleDouble, Tmp.bitcastToAPInt()); return Ret; } APFloat::opStatus DoubleAPFloat::next(bool nextDown) { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); APFloat Tmp(semPPCDoubleDoubleLegacy, bitcastToAPInt()); auto Ret = Tmp.next(nextDown); *this = DoubleAPFloat(semPPCDoubleDouble, Tmp.bitcastToAPInt()); return Ret; } APFloat::opStatus DoubleAPFloat::convertToInteger(MutableArrayRef Input, unsigned int Width, bool IsSigned, roundingMode RM, bool *IsExact) const { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); return APFloat(semPPCDoubleDoubleLegacy, bitcastToAPInt()) .convertToInteger(Input, Width, IsSigned, RM, IsExact); } APFloat::opStatus DoubleAPFloat::convertFromAPInt(const APInt &Input, bool IsSigned, roundingMode RM) { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); APFloat Tmp(semPPCDoubleDoubleLegacy); auto Ret = Tmp.convertFromAPInt(Input, IsSigned, RM); *this = DoubleAPFloat(semPPCDoubleDouble, Tmp.bitcastToAPInt()); return Ret; } APFloat::opStatus DoubleAPFloat::convertFromSignExtendedInteger(const integerPart *Input, unsigned int InputSize, bool IsSigned, roundingMode RM) { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); APFloat Tmp(semPPCDoubleDoubleLegacy); auto Ret = Tmp.convertFromSignExtendedInteger(Input, InputSize, IsSigned, RM); *this = DoubleAPFloat(semPPCDoubleDouble, Tmp.bitcastToAPInt()); return Ret; } APFloat::opStatus DoubleAPFloat::convertFromZeroExtendedInteger(const integerPart *Input, 
unsigned int InputSize, bool IsSigned, roundingMode RM) { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); APFloat Tmp(semPPCDoubleDoubleLegacy); auto Ret = Tmp.convertFromZeroExtendedInteger(Input, InputSize, IsSigned, RM); *this = DoubleAPFloat(semPPCDoubleDouble, Tmp.bitcastToAPInt()); return Ret; } unsigned int DoubleAPFloat::convertToHexString(char *DST, unsigned int HexDigits, bool UpperCase, roundingMode RM) const { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); return APFloat(semPPCDoubleDoubleLegacy, bitcastToAPInt()) .convertToHexString(DST, HexDigits, UpperCase, RM); } bool DoubleAPFloat::isDenormal() const { return getCategory() == fcNormal && (Floats[0].isDenormal() || Floats[1].isDenormal() || // (double)(Hi + Lo) == Hi defines a normal number. Floats[0] != Floats[0] + Floats[1]); } bool DoubleAPFloat::isSmallest() const { if (getCategory() != fcNormal) return false; DoubleAPFloat Tmp(*this); Tmp.makeSmallest(this->isNegative()); return Tmp.compare(*this) == cmpEqual; } bool DoubleAPFloat::isLargest() const { if (getCategory() != fcNormal) return false; DoubleAPFloat Tmp(*this); Tmp.makeLargest(this->isNegative()); return Tmp.compare(*this) == cmpEqual; } bool DoubleAPFloat::isInteger() const { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); return Floats[0].isInteger() && Floats[1].isInteger(); } void DoubleAPFloat::toString(SmallVectorImpl &Str, unsigned FormatPrecision, unsigned FormatMaxPadding, bool TruncateZero) const { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); APFloat(semPPCDoubleDoubleLegacy, bitcastToAPInt()) .toString(Str, FormatPrecision, FormatMaxPadding, TruncateZero); } bool DoubleAPFloat::getExactInverse(APFloat *inv) const { assert(Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); APFloat Tmp(semPPCDoubleDoubleLegacy, bitcastToAPInt()); if (!inv) return Tmp.getExactInverse(nullptr); APFloat Inv(semPPCDoubleDoubleLegacy); auto Ret = Tmp.getExactInverse(&Inv); *inv = APFloat(semPPCDoubleDouble, Inv.bitcastToAPInt()); return Ret; } DoubleAPFloat scalbn(DoubleAPFloat Arg, int Exp, APFloat::roundingMode RM) { assert(Arg.Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); return DoubleAPFloat(semPPCDoubleDouble, scalbn(Arg.Floats[0], Exp, RM), scalbn(Arg.Floats[1], Exp, RM)); } DoubleAPFloat frexp(const DoubleAPFloat &Arg, int &Exp, APFloat::roundingMode RM) { assert(Arg.Semantics == &semPPCDoubleDouble && "Unexpected Semantics"); APFloat First = frexp(Arg.Floats[0], Exp, RM); APFloat Second = Arg.Floats[1]; if (Arg.getCategory() == APFloat::fcNormal) Second = scalbn(Second, -Exp, RM); return DoubleAPFloat(semPPCDoubleDouble, std::move(First), std::move(Second)); } } // End detail namespace APFloat::Storage::Storage(IEEEFloat F, const fltSemantics &Semantics) { if (usesLayout(Semantics)) { new (&IEEE) IEEEFloat(std::move(F)); return; } if (usesLayout(Semantics)) { const fltSemantics& S = F.getSemantics(); new (&Double) DoubleAPFloat(Semantics, APFloat(std::move(F), S), APFloat(semIEEEdouble)); return; } llvm_unreachable("Unexpected semantics"); } Expected APFloat::convertFromString(StringRef Str, roundingMode RM) { APFLOAT_DISPATCH_ON_SEMANTICS(convertFromString(Str, RM)); } hash_code hash_value(const APFloat &Arg) { if (APFloat::usesLayout(Arg.getSemantics())) return hash_value(Arg.U.IEEE); if (APFloat::usesLayout(Arg.getSemantics())) return hash_value(Arg.U.Double); llvm_unreachable("Unexpected semantics"); } APFloat::APFloat(const fltSemantics &Semantics, 
StringRef S) : APFloat(Semantics) { auto StatusOrErr = convertFromString(S, rmNearestTiesToEven); assert(StatusOrErr && "Invalid floating point representation"); consumeError(StatusOrErr.takeError()); } APFloat::opStatus APFloat::convert(const fltSemantics &ToSemantics, roundingMode RM, bool *losesInfo) { if (&getSemantics() == &ToSemantics) { *losesInfo = false; return opOK; } if (usesLayout(getSemantics()) && usesLayout(ToSemantics)) return U.IEEE.convert(ToSemantics, RM, losesInfo); if (usesLayout(getSemantics()) && usesLayout(ToSemantics)) { assert(&ToSemantics == &semPPCDoubleDouble); auto Ret = U.IEEE.convert(semPPCDoubleDoubleLegacy, RM, losesInfo); *this = APFloat(ToSemantics, U.IEEE.bitcastToAPInt()); return Ret; } if (usesLayout(getSemantics()) && usesLayout(ToSemantics)) { auto Ret = getIEEE().convert(ToSemantics, RM, losesInfo); *this = APFloat(std::move(getIEEE()), ToSemantics); return Ret; } llvm_unreachable("Unexpected semantics"); } APFloat APFloat::getAllOnesValue(const fltSemantics &Semantics, unsigned BitWidth) { return APFloat(Semantics, APInt::getAllOnesValue(BitWidth)); } void APFloat::print(raw_ostream &OS) const { SmallVector Buffer; toString(Buffer); OS << Buffer << "\n"; } #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) LLVM_DUMP_METHOD void APFloat::dump() const { print(dbgs()); } #endif void APFloat::Profile(FoldingSetNodeID &NID) const { NID.Add(bitcastToAPInt()); } /* Same as convertToInteger(integerPart*, ...), except the result is returned in an APSInt, whose initial bit-width and signed-ness are used to determine the precision of the conversion. */ APFloat::opStatus APFloat::convertToInteger(APSInt &result, roundingMode rounding_mode, bool *isExact) const { unsigned bitWidth = result.getBitWidth(); SmallVector parts(result.getNumWords()); opStatus status = convertToInteger(parts, bitWidth, result.isSigned(), rounding_mode, isExact); // Keeps the original signed-ness. result = APInt(bitWidth, parts); return status; } } // End llvm namespace #undef APFLOAT_DISPATCH_ON_SEMANTICS Index: vendor/llvm-project/release-11.x/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp =================================================================== --- vendor/llvm-project/release-11.x/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp (revision 366332) +++ vendor/llvm-project/release-11.x/llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp (revision 366333) @@ -1,1049 +1,1054 @@ //===--- AArch64CallLowering.cpp - Call lowering --------------------------===// // // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // See https://llvm.org/LICENSE.txt for license information. // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // //===----------------------------------------------------------------------===// /// /// \file /// This file implements the lowering of LLVM calls to machine code calls for /// GlobalISel. 
/// //===----------------------------------------------------------------------===// #include "AArch64CallLowering.h" #include "AArch64ISelLowering.h" #include "AArch64MachineFunctionInfo.h" #include "AArch64Subtarget.h" #include "llvm/ADT/ArrayRef.h" #include "llvm/ADT/SmallVector.h" #include "llvm/CodeGen/Analysis.h" #include "llvm/CodeGen/CallingConvLower.h" #include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h" #include "llvm/CodeGen/GlobalISel/Utils.h" #include "llvm/CodeGen/LowLevelType.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/CodeGen/MachineMemOperand.h" #include "llvm/CodeGen/MachineOperand.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/TargetRegisterInfo.h" #include "llvm/CodeGen/TargetSubtargetInfo.h" #include "llvm/CodeGen/ValueTypes.h" #include "llvm/IR/Argument.h" #include "llvm/IR/Attributes.h" #include "llvm/IR/Function.h" #include "llvm/IR/Type.h" #include "llvm/IR/Value.h" #include "llvm/Support/MachineValueType.h" #include #include #include #include #define DEBUG_TYPE "aarch64-call-lowering" using namespace llvm; AArch64CallLowering::AArch64CallLowering(const AArch64TargetLowering &TLI) : CallLowering(&TLI) {} namespace { struct IncomingArgHandler : public CallLowering::ValueHandler { IncomingArgHandler(MachineIRBuilder &MIRBuilder, MachineRegisterInfo &MRI, CCAssignFn *AssignFn) : ValueHandler(MIRBuilder, MRI, AssignFn), StackUsed(0) {} Register getStackAddress(uint64_t Size, int64_t Offset, MachinePointerInfo &MPO) override { auto &MFI = MIRBuilder.getMF().getFrameInfo(); int FI = MFI.CreateFixedObject(Size, Offset, true); MPO = MachinePointerInfo::getFixedStack(MIRBuilder.getMF(), FI); auto AddrReg = MIRBuilder.buildFrameIndex(LLT::pointer(0, 64), FI); StackUsed = std::max(StackUsed, Size + Offset); return AddrReg.getReg(0); } void assignValueToReg(Register ValVReg, Register PhysReg, CCValAssign &VA) override { markPhysRegUsed(PhysReg); switch (VA.getLocInfo()) { default: MIRBuilder.buildCopy(ValVReg, PhysReg); break; case CCValAssign::LocInfo::SExt: case CCValAssign::LocInfo::ZExt: case CCValAssign::LocInfo::AExt: { auto Copy = MIRBuilder.buildCopy(LLT{VA.getLocVT()}, PhysReg); MIRBuilder.buildTrunc(ValVReg, Copy); break; } } } - void assignValueToAddress(Register ValVReg, Register Addr, uint64_t Size, + void assignValueToAddress(Register ValVReg, Register Addr, uint64_t MemSize, MachinePointerInfo &MPO, CCValAssign &VA) override { MachineFunction &MF = MIRBuilder.getMF(); + + // The reported memory location may be wider than the value. + const LLT RegTy = MRI.getType(ValVReg); + MemSize = std::min(static_cast(RegTy.getSizeInBytes()), MemSize); + auto MMO = MF.getMachineMemOperand( - MPO, MachineMemOperand::MOLoad | MachineMemOperand::MOInvariant, Size, + MPO, MachineMemOperand::MOLoad | MachineMemOperand::MOInvariant, MemSize, inferAlignFromPtrInfo(MF, MPO)); MIRBuilder.buildLoad(ValVReg, Addr, *MMO); } /// How the physical register gets marked varies between formal /// parameters (it's a basic-block live-in), and a call instruction /// (it's an implicit-def of the BL). 
virtual void markPhysRegUsed(unsigned PhysReg) = 0; bool isIncomingArgumentHandler() const override { return true; } uint64_t StackUsed; }; struct FormalArgHandler : public IncomingArgHandler { FormalArgHandler(MachineIRBuilder &MIRBuilder, MachineRegisterInfo &MRI, CCAssignFn *AssignFn) : IncomingArgHandler(MIRBuilder, MRI, AssignFn) {} void markPhysRegUsed(unsigned PhysReg) override { MIRBuilder.getMRI()->addLiveIn(PhysReg); MIRBuilder.getMBB().addLiveIn(PhysReg); } }; struct CallReturnHandler : public IncomingArgHandler { CallReturnHandler(MachineIRBuilder &MIRBuilder, MachineRegisterInfo &MRI, MachineInstrBuilder MIB, CCAssignFn *AssignFn) : IncomingArgHandler(MIRBuilder, MRI, AssignFn), MIB(MIB) {} void markPhysRegUsed(unsigned PhysReg) override { MIB.addDef(PhysReg, RegState::Implicit); } MachineInstrBuilder MIB; }; struct OutgoingArgHandler : public CallLowering::ValueHandler { OutgoingArgHandler(MachineIRBuilder &MIRBuilder, MachineRegisterInfo &MRI, MachineInstrBuilder MIB, CCAssignFn *AssignFn, CCAssignFn *AssignFnVarArg, bool IsTailCall = false, int FPDiff = 0) : ValueHandler(MIRBuilder, MRI, AssignFn), MIB(MIB), AssignFnVarArg(AssignFnVarArg), IsTailCall(IsTailCall), FPDiff(FPDiff), StackSize(0), SPReg(0) {} bool isIncomingArgumentHandler() const override { return false; } Register getStackAddress(uint64_t Size, int64_t Offset, MachinePointerInfo &MPO) override { MachineFunction &MF = MIRBuilder.getMF(); LLT p0 = LLT::pointer(0, 64); LLT s64 = LLT::scalar(64); if (IsTailCall) { Offset += FPDiff; int FI = MF.getFrameInfo().CreateFixedObject(Size, Offset, true); auto FIReg = MIRBuilder.buildFrameIndex(p0, FI); MPO = MachinePointerInfo::getFixedStack(MF, FI); return FIReg.getReg(0); } if (!SPReg) SPReg = MIRBuilder.buildCopy(p0, Register(AArch64::SP)).getReg(0); auto OffsetReg = MIRBuilder.buildConstant(s64, Offset); auto AddrReg = MIRBuilder.buildPtrAdd(p0, SPReg, OffsetReg); MPO = MachinePointerInfo::getStack(MF, Offset); return AddrReg.getReg(0); } void assignValueToReg(Register ValVReg, Register PhysReg, CCValAssign &VA) override { MIB.addUse(PhysReg, RegState::Implicit); Register ExtReg = extendRegister(ValVReg, VA); MIRBuilder.buildCopy(PhysReg, ExtReg); } void assignValueToAddress(Register ValVReg, Register Addr, uint64_t Size, MachinePointerInfo &MPO, CCValAssign &VA) override { MachineFunction &MF = MIRBuilder.getMF(); auto MMO = MF.getMachineMemOperand(MPO, MachineMemOperand::MOStore, Size, inferAlignFromPtrInfo(MF, MPO)); MIRBuilder.buildStore(ValVReg, Addr, *MMO); } void assignValueToAddress(const CallLowering::ArgInfo &Arg, Register Addr, uint64_t Size, MachinePointerInfo &MPO, CCValAssign &VA) override { unsigned MaxSize = Size * 8; // For varargs, we always want to extend them to 8 bytes, in which case // we disable setting a max. if (!Arg.IsFixed) MaxSize = 0; Register ValVReg = VA.getLocInfo() != CCValAssign::LocInfo::FPExt ? extendRegister(Arg.Regs[0], VA, MaxSize) : Arg.Regs[0]; // If we extended we might need to adjust the MMO's Size. 
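// --- illustrative sketch (not part of the patch) ---------------------------
// The outgoing handler above caps the extension of fixed arguments at the
// stack slot's width (Size * 8 bits) but removes the cap for varargs, which
// are always widened to a full 8-byte slot; afterwards the store size is
// bumped to the (possibly wider) extended register. A toy model of those two
// decisions with invented names (chooseMaxExtendBits, storeSizeInBytes):
#include <algorithm>
#include <cstdint>

static unsigned chooseMaxExtendBits(bool IsFixedArg, uint64_t SlotSizeInBytes) {
  // 0 is used as "no limit" for variadic arguments.
  return IsFixedArg ? unsigned(SlotSizeInBytes * 8) : 0u;
}

static uint64_t storeSizeInBytes(uint64_t SlotSizeInBytes,
                                 uint64_t ExtendedRegSizeInBytes) {
  // If the register was widened past the slot size, store the wider value.
  return std::max(SlotSizeInBytes, ExtendedRegSizeInBytes);
}

// Example: a fixed i16 argument in a 2-byte slot extends to at most 16 bits;
// a variadic i16 extends to 64 bits and its store becomes 8 bytes.
// ----------------------------------------------------------------------------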
const LLT RegTy = MRI.getType(ValVReg); if (RegTy.getSizeInBytes() > Size) Size = RegTy.getSizeInBytes(); assignValueToAddress(ValVReg, Addr, Size, MPO, VA); } bool assignArg(unsigned ValNo, MVT ValVT, MVT LocVT, CCValAssign::LocInfo LocInfo, const CallLowering::ArgInfo &Info, ISD::ArgFlagsTy Flags, CCState &State) override { bool Res; if (Info.IsFixed) Res = AssignFn(ValNo, ValVT, LocVT, LocInfo, Flags, State); else Res = AssignFnVarArg(ValNo, ValVT, LocVT, LocInfo, Flags, State); StackSize = State.getNextStackOffset(); return Res; } MachineInstrBuilder MIB; CCAssignFn *AssignFnVarArg; bool IsTailCall; /// For tail calls, the byte offset of the call's argument area from the /// callee's. Unused elsewhere. int FPDiff; uint64_t StackSize; // Cache the SP register vreg if we need it more than once in this call site. Register SPReg; }; } // namespace static bool doesCalleeRestoreStack(CallingConv::ID CallConv, bool TailCallOpt) { return CallConv == CallingConv::Fast && TailCallOpt; } void AArch64CallLowering::splitToValueTypes( const ArgInfo &OrigArg, SmallVectorImpl &SplitArgs, const DataLayout &DL, MachineRegisterInfo &MRI, CallingConv::ID CallConv) const { const AArch64TargetLowering &TLI = *getTLI(); LLVMContext &Ctx = OrigArg.Ty->getContext(); SmallVector SplitVTs; SmallVector Offsets; ComputeValueVTs(TLI, DL, OrigArg.Ty, SplitVTs, &Offsets, 0); if (SplitVTs.size() == 0) return; if (SplitVTs.size() == 1) { // No splitting to do, but we want to replace the original type (e.g. [1 x // double] -> double). SplitArgs.emplace_back(OrigArg.Regs[0], SplitVTs[0].getTypeForEVT(Ctx), OrigArg.Flags[0], OrigArg.IsFixed); return; } // Create one ArgInfo for each virtual register in the original ArgInfo. assert(OrigArg.Regs.size() == SplitVTs.size() && "Regs / types mismatch"); bool NeedsRegBlock = TLI.functionArgumentNeedsConsecutiveRegisters( OrigArg.Ty, CallConv, false); for (unsigned i = 0, e = SplitVTs.size(); i < e; ++i) { Type *SplitTy = SplitVTs[i].getTypeForEVT(Ctx); SplitArgs.emplace_back(OrigArg.Regs[i], SplitTy, OrigArg.Flags[0], OrigArg.IsFixed); if (NeedsRegBlock) SplitArgs.back().Flags[0].setInConsecutiveRegs(); } SplitArgs.back().Flags[0].setInConsecutiveRegsLast(); } bool AArch64CallLowering::lowerReturn(MachineIRBuilder &MIRBuilder, const Value *Val, ArrayRef VRegs, Register SwiftErrorVReg) const { auto MIB = MIRBuilder.buildInstrNoInsert(AArch64::RET_ReallyLR); assert(((Val && !VRegs.empty()) || (!Val && VRegs.empty())) && "Return value without a vreg"); bool Success = true; if (!VRegs.empty()) { MachineFunction &MF = MIRBuilder.getMF(); const Function &F = MF.getFunction(); MachineRegisterInfo &MRI = MF.getRegInfo(); const AArch64TargetLowering &TLI = *getTLI(); CCAssignFn *AssignFn = TLI.CCAssignFnForReturn(F.getCallingConv()); auto &DL = F.getParent()->getDataLayout(); LLVMContext &Ctx = Val->getType()->getContext(); SmallVector SplitEVTs; ComputeValueVTs(TLI, DL, Val->getType(), SplitEVTs); assert(VRegs.size() == SplitEVTs.size() && "For each split Type there should be exactly one VReg."); SmallVector SplitArgs; CallingConv::ID CC = F.getCallingConv(); for (unsigned i = 0; i < SplitEVTs.size(); ++i) { if (TLI.getNumRegistersForCallingConv(Ctx, CC, SplitEVTs[i]) > 1) { LLVM_DEBUG(dbgs() << "Can't handle extended arg types which need split"); return false; } Register CurVReg = VRegs[i]; ArgInfo CurArgInfo = ArgInfo{CurVReg, SplitEVTs[i].getTypeForEVT(Ctx)}; setArgFlags(CurArgInfo, AttributeList::ReturnIndex, DL, F); // i1 is a special case because SDAG i1 true is naturally zero 
extended // when widened using ANYEXT. We need to do it explicitly here. if (MRI.getType(CurVReg).getSizeInBits() == 1) { CurVReg = MIRBuilder.buildZExt(LLT::scalar(8), CurVReg).getReg(0); } else { // Some types will need extending as specified by the CC. MVT NewVT = TLI.getRegisterTypeForCallingConv(Ctx, CC, SplitEVTs[i]); if (EVT(NewVT) != SplitEVTs[i]) { unsigned ExtendOp = TargetOpcode::G_ANYEXT; if (F.getAttributes().hasAttribute(AttributeList::ReturnIndex, Attribute::SExt)) ExtendOp = TargetOpcode::G_SEXT; else if (F.getAttributes().hasAttribute(AttributeList::ReturnIndex, Attribute::ZExt)) ExtendOp = TargetOpcode::G_ZEXT; LLT NewLLT(NewVT); LLT OldLLT(MVT::getVT(CurArgInfo.Ty)); CurArgInfo.Ty = EVT(NewVT).getTypeForEVT(Ctx); // Instead of an extend, we might have a vector type which needs // padding with more elements, e.g. <2 x half> -> <4 x half>. if (NewVT.isVector()) { if (OldLLT.isVector()) { if (NewLLT.getNumElements() > OldLLT.getNumElements()) { // We don't handle VA types which are not exactly twice the // size, but can easily be done in future. if (NewLLT.getNumElements() != OldLLT.getNumElements() * 2) { LLVM_DEBUG(dbgs() << "Outgoing vector ret has too many elts"); return false; } auto Undef = MIRBuilder.buildUndef({OldLLT}); CurVReg = MIRBuilder.buildMerge({NewLLT}, {CurVReg, Undef}).getReg(0); } else { // Just do a vector extend. CurVReg = MIRBuilder.buildInstr(ExtendOp, {NewLLT}, {CurVReg}) .getReg(0); } } else if (NewLLT.getNumElements() == 2) { // We need to pad a <1 x S> type to <2 x S>. Since we don't have // <1 x S> vector types in GISel we use a build_vector instead // of a vector merge/concat. auto Undef = MIRBuilder.buildUndef({OldLLT}); CurVReg = MIRBuilder .buildBuildVector({NewLLT}, {CurVReg, Undef.getReg(0)}) .getReg(0); } else { LLVM_DEBUG(dbgs() << "Could not handle ret ty"); return false; } } else { // A scalar extend. CurVReg = MIRBuilder.buildInstr(ExtendOp, {NewLLT}, {CurVReg}).getReg(0); } } } if (CurVReg != CurArgInfo.Regs[0]) { CurArgInfo.Regs[0] = CurVReg; // Reset the arg flags after modifying CurVReg. setArgFlags(CurArgInfo, AttributeList::ReturnIndex, DL, F); } splitToValueTypes(CurArgInfo, SplitArgs, DL, MRI, CC); } OutgoingArgHandler Handler(MIRBuilder, MRI, MIB, AssignFn, AssignFn); Success = handleAssignments(MIRBuilder, SplitArgs, Handler); } if (SwiftErrorVReg) { MIB.addUse(AArch64::X21, RegState::Implicit); MIRBuilder.buildCopy(AArch64::X21, SwiftErrorVReg); } MIRBuilder.insertInstr(MIB); return Success; } /// Helper function to compute forwarded registers for musttail calls. Computes /// the forwarded registers, sets MBB liveness, and emits COPY instructions that /// can be used to save + restore registers later. static void handleMustTailForwardedRegisters(MachineIRBuilder &MIRBuilder, CCAssignFn *AssignFn) { MachineBasicBlock &MBB = MIRBuilder.getMBB(); MachineFunction &MF = MIRBuilder.getMF(); MachineFrameInfo &MFI = MF.getFrameInfo(); if (!MFI.hasMustTailInVarArgFunc()) return; AArch64FunctionInfo *FuncInfo = MF.getInfo(); const Function &F = MF.getFunction(); assert(F.isVarArg() && "Expected F to be vararg?"); // Compute the set of forwarded registers. The rest are scratch. SmallVector ArgLocs; CCState CCInfo(F.getCallingConv(), /*IsVarArg=*/true, MF, ArgLocs, F.getContext()); SmallVector RegParmTypes; RegParmTypes.push_back(MVT::i64); RegParmTypes.push_back(MVT::f128); // Later on, we can use this vector to restore the registers if necessary. 
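// --- illustrative sketch (not part of the patch) ---------------------------
// The return lowering above picks one of a few widening strategies before the
// value reaches physical registers: i1 is zero-extended to s8, a vector with
// too few elements is padded with undef, and scalars get G_SEXT/G_ZEXT/
// G_ANYEXT depending on the return attribute. A toy classifier mirroring that
// decision order; the enum and function names are invented for illustration.
// Bit widths are total sizes; element counts are 1 for scalars.
enum class RetWidening {
  ZExtI1ToS8,      // i1 -> s8 (matches SDAG's zero-extended i1 true)
  PadVectorUndef,  // e.g. <2 x half> -> <4 x half> via merge with undef
  VectorExtend,    // same element count, wider elements
  ScalarExtend,    // G_SEXT / G_ZEXT / G_ANYEXT per sext/zext attribute
  None
};

static RetWidening classifyRetWidening(unsigned OldBits, unsigned NewBits,
                                       bool IsVector, unsigned OldElts,
                                       unsigned NewElts) {
  if (OldBits == 1)
    return RetWidening::ZExtI1ToS8;
  if (OldBits == NewBits && OldElts == NewElts)
    return RetWidening::None;
  if (IsVector)
    return NewElts > OldElts ? RetWidening::PadVectorUndef
                             : RetWidening::VectorExtend;
  return RetWidening::ScalarExtend;
}
// ----------------------------------------------------------------------------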
SmallVectorImpl &Forwards = FuncInfo->getForwardedMustTailRegParms(); CCInfo.analyzeMustTailForwardedRegisters(Forwards, RegParmTypes, AssignFn); // Conservatively forward X8, since it might be used for an aggregate // return. if (!CCInfo.isAllocated(AArch64::X8)) { unsigned X8VReg = MF.addLiveIn(AArch64::X8, &AArch64::GPR64RegClass); Forwards.push_back(ForwardedRegister(X8VReg, AArch64::X8, MVT::i64)); } // Add the forwards to the MachineBasicBlock and MachineFunction. for (const auto &F : Forwards) { MBB.addLiveIn(F.PReg); MIRBuilder.buildCopy(Register(F.VReg), Register(F.PReg)); } } bool AArch64CallLowering::fallBackToDAGISel(const Function &F) const { if (isa(F.getReturnType())) return true; return llvm::any_of(F.args(), [](const Argument &A) { return isa(A.getType()); }); } bool AArch64CallLowering::lowerFormalArguments( MachineIRBuilder &MIRBuilder, const Function &F, ArrayRef> VRegs) const { MachineFunction &MF = MIRBuilder.getMF(); MachineBasicBlock &MBB = MIRBuilder.getMBB(); MachineRegisterInfo &MRI = MF.getRegInfo(); auto &DL = F.getParent()->getDataLayout(); SmallVector SplitArgs; unsigned i = 0; for (auto &Arg : F.args()) { if (DL.getTypeStoreSize(Arg.getType()).isZero()) continue; ArgInfo OrigArg{VRegs[i], Arg.getType()}; setArgFlags(OrigArg, i + AttributeList::FirstArgIndex, DL, F); splitToValueTypes(OrigArg, SplitArgs, DL, MRI, F.getCallingConv()); ++i; } if (!MBB.empty()) MIRBuilder.setInstr(*MBB.begin()); const AArch64TargetLowering &TLI = *getTLI(); CCAssignFn *AssignFn = TLI.CCAssignFnForCall(F.getCallingConv(), /*IsVarArg=*/false); FormalArgHandler Handler(MIRBuilder, MRI, AssignFn); if (!handleAssignments(MIRBuilder, SplitArgs, Handler)) return false; AArch64FunctionInfo *FuncInfo = MF.getInfo(); uint64_t StackOffset = Handler.StackUsed; if (F.isVarArg()) { auto &Subtarget = MF.getSubtarget(); if (!Subtarget.isTargetDarwin()) { // FIXME: we need to reimplement saveVarArgsRegisters from // AArch64ISelLowering. return false; } // We currently pass all varargs at 8-byte alignment, or 4 in ILP32. StackOffset = alignTo(Handler.StackUsed, Subtarget.isTargetILP32() ? 4 : 8); auto &MFI = MIRBuilder.getMF().getFrameInfo(); FuncInfo->setVarArgsStackIndex(MFI.CreateFixedObject(4, StackOffset, true)); } if (doesCalleeRestoreStack(F.getCallingConv(), MF.getTarget().Options.GuaranteedTailCallOpt)) { // We have a non-standard ABI, so why not make full use of the stack that // we're going to pop? It must be aligned to 16 B in any case. StackOffset = alignTo(StackOffset, 16); // If we're expected to restore the stack (e.g. fastcc), then we'll be // adding a multiple of 16. FuncInfo->setArgumentStackToRestore(StackOffset); // Our own callers will guarantee that the space is free by giving an // aligned value to CALLSEQ_START. } // When we tail call, we need to check if the callee's arguments // will fit on the caller's stack. So, whenever we lower formal arguments, // we should keep track of this information, since we might lower a tail call // in this function later. FuncInfo->setBytesInStackArgArea(StackOffset); auto &Subtarget = MF.getSubtarget(); if (Subtarget.hasCustomCallingConv()) Subtarget.getRegisterInfo()->UpdateCustomCalleeSavedRegs(MF); handleMustTailForwardedRegisters(MIRBuilder, AssignFn); // Move back to the end of the basic block. MIRBuilder.setMBB(MBB); return true; } /// Return true if the calling convention is one that we can guarantee TCO for. 
static bool canGuaranteeTCO(CallingConv::ID CC) { return CC == CallingConv::Fast; } /// Return true if we might ever do TCO for calls with this calling convention. static bool mayTailCallThisCC(CallingConv::ID CC) { switch (CC) { case CallingConv::C: case CallingConv::PreserveMost: case CallingConv::Swift: return true; default: return canGuaranteeTCO(CC); } } /// Returns a pair containing the fixed CCAssignFn and the vararg CCAssignFn for /// CC. static std::pair getAssignFnsForCC(CallingConv::ID CC, const AArch64TargetLowering &TLI) { return {TLI.CCAssignFnForCall(CC, false), TLI.CCAssignFnForCall(CC, true)}; } bool AArch64CallLowering::doCallerAndCalleePassArgsTheSameWay( CallLoweringInfo &Info, MachineFunction &MF, SmallVectorImpl &InArgs) const { const Function &CallerF = MF.getFunction(); CallingConv::ID CalleeCC = Info.CallConv; CallingConv::ID CallerCC = CallerF.getCallingConv(); // If the calling conventions match, then everything must be the same. if (CalleeCC == CallerCC) return true; // Check if the caller and callee will handle arguments in the same way. const AArch64TargetLowering &TLI = *getTLI(); CCAssignFn *CalleeAssignFnFixed; CCAssignFn *CalleeAssignFnVarArg; std::tie(CalleeAssignFnFixed, CalleeAssignFnVarArg) = getAssignFnsForCC(CalleeCC, TLI); CCAssignFn *CallerAssignFnFixed; CCAssignFn *CallerAssignFnVarArg; std::tie(CallerAssignFnFixed, CallerAssignFnVarArg) = getAssignFnsForCC(CallerCC, TLI); if (!resultsCompatible(Info, MF, InArgs, *CalleeAssignFnFixed, *CalleeAssignFnVarArg, *CallerAssignFnFixed, *CallerAssignFnVarArg)) return false; // Make sure that the caller and callee preserve all of the same registers. auto TRI = MF.getSubtarget().getRegisterInfo(); const uint32_t *CallerPreserved = TRI->getCallPreservedMask(MF, CallerCC); const uint32_t *CalleePreserved = TRI->getCallPreservedMask(MF, CalleeCC); if (MF.getSubtarget().hasCustomCallingConv()) { TRI->UpdateCustomCallPreservedMask(MF, &CallerPreserved); TRI->UpdateCustomCallPreservedMask(MF, &CalleePreserved); } return TRI->regmaskSubsetEqual(CallerPreserved, CalleePreserved); } bool AArch64CallLowering::areCalleeOutgoingArgsTailCallable( CallLoweringInfo &Info, MachineFunction &MF, SmallVectorImpl &OutArgs) const { // If there are no outgoing arguments, then we are done. if (OutArgs.empty()) return true; const Function &CallerF = MF.getFunction(); CallingConv::ID CalleeCC = Info.CallConv; CallingConv::ID CallerCC = CallerF.getCallingConv(); const AArch64TargetLowering &TLI = *getTLI(); CCAssignFn *AssignFnFixed; CCAssignFn *AssignFnVarArg; std::tie(AssignFnFixed, AssignFnVarArg) = getAssignFnsForCC(CalleeCC, TLI); // We have outgoing arguments. Make sure that we can tail call with them. SmallVector OutLocs; CCState OutInfo(CalleeCC, false, MF, OutLocs, CallerF.getContext()); if (!analyzeArgInfo(OutInfo, OutArgs, *AssignFnFixed, *AssignFnVarArg)) { LLVM_DEBUG(dbgs() << "... Could not analyze call operands.\n"); return false; } // Make sure that they can fit on the caller's stack. const AArch64FunctionInfo *FuncInfo = MF.getInfo(); if (OutInfo.getNextStackOffset() > FuncInfo->getBytesInStackArgArea()) { LLVM_DEBUG(dbgs() << "... Cannot fit call operands on caller's stack.\n"); return false; } // Verify that the parameters in callee-saved registers match. // TODO: Port this over to CallLowering as general code once swiftself is // supported. 
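// --- illustrative sketch (not part of the patch) ---------------------------
// areCalleeOutgoingArgsTailCallable() above refuses a tail call when the
// callee's stack-passed arguments need more bytes than the caller already
// reserved for its own incoming arguments (BytesInStackArgArea). A one-line
// model of that test, with invented parameter names:
#include <cstdint>

static bool outgoingArgsFitInCallerArea(uint64_t CalleeNextStackOffset,
                                        uint64_t CallerBytesInStackArgArea) {
  return CalleeNextStackOffset <= CallerBytesInStackArgArea;
}

// Example: a caller that received 16 bytes of stack arguments can tail call a
// callee needing 8 bytes of stack arguments, but not one needing 32.
// ----------------------------------------------------------------------------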
auto TRI = MF.getSubtarget().getRegisterInfo(); const uint32_t *CallerPreservedMask = TRI->getCallPreservedMask(MF, CallerCC); MachineRegisterInfo &MRI = MF.getRegInfo(); for (unsigned i = 0; i < OutLocs.size(); ++i) { auto &ArgLoc = OutLocs[i]; // If it's not a register, it's fine. if (!ArgLoc.isRegLoc()) { if (Info.IsVarArg) { // Be conservative and disallow variadic memory operands to match SDAG's // behaviour. // FIXME: If the caller's calling convention is C, then we can // potentially use its argument area. However, for cases like fastcc, // we can't do anything. LLVM_DEBUG( dbgs() << "... Cannot tail call vararg function with stack arguments\n"); return false; } continue; } Register Reg = ArgLoc.getLocReg(); // Only look at callee-saved registers. if (MachineOperand::clobbersPhysReg(CallerPreservedMask, Reg)) continue; LLVM_DEBUG( dbgs() << "... Call has an argument passed in a callee-saved register.\n"); // Check if it was copied from. ArgInfo &OutInfo = OutArgs[i]; if (OutInfo.Regs.size() > 1) { LLVM_DEBUG( dbgs() << "... Cannot handle arguments in multiple registers.\n"); return false; } // Check if we copy the register, walking through copies from virtual // registers. Note that getDefIgnoringCopies does not ignore copies from // physical registers. MachineInstr *RegDef = getDefIgnoringCopies(OutInfo.Regs[0], MRI); if (!RegDef || RegDef->getOpcode() != TargetOpcode::COPY) { LLVM_DEBUG( dbgs() << "... Parameter was not copied into a VReg, cannot tail call.\n"); return false; } // Got a copy. Verify that it's the same as the register we want. Register CopyRHS = RegDef->getOperand(1).getReg(); if (CopyRHS != Reg) { LLVM_DEBUG(dbgs() << "... Callee-saved register was not copied into " "VReg, cannot tail call.\n"); return false; } } return true; } bool AArch64CallLowering::isEligibleForTailCallOptimization( MachineIRBuilder &MIRBuilder, CallLoweringInfo &Info, SmallVectorImpl &InArgs, SmallVectorImpl &OutArgs) const { // Must pass all target-independent checks in order to tail call optimize. if (!Info.IsTailCall) return false; CallingConv::ID CalleeCC = Info.CallConv; MachineFunction &MF = MIRBuilder.getMF(); const Function &CallerF = MF.getFunction(); LLVM_DEBUG(dbgs() << "Attempting to lower call as tail call\n"); if (Info.SwiftErrorVReg) { // TODO: We should handle this. // Note that this is also handled by the check for no outgoing arguments. // Proactively disabling this though, because the swifterror handling in // lowerCall inserts a COPY *after* the location of the call. LLVM_DEBUG(dbgs() << "... Cannot handle tail calls with swifterror yet.\n"); return false; } if (!mayTailCallThisCC(CalleeCC)) { LLVM_DEBUG(dbgs() << "... Calling convention cannot be tail called.\n"); return false; } // Byval parameters hand the function a pointer directly into the stack area // we want to reuse during a tail call. Working around this *is* possible (see // X86). // // FIXME: In AArch64ISelLowering, this isn't worked around. Can/should we try // it? // // On Windows, "inreg" attributes signify non-aggregate indirect returns. // In this case, it is necessary to save/restore X0 in the callee. Tail // call opt interferes with this. So we disable tail call opt when the // caller has an argument with "inreg" attribute. // // FIXME: Check whether the callee also has an "inreg" argument. // // When the caller has a swifterror argument, we don't want to tail call // because would have to move into the swifterror register before the // tail call. 
if (any_of(CallerF.args(), [](const Argument &A) { return A.hasByValAttr() || A.hasInRegAttr() || A.hasSwiftErrorAttr(); })) { LLVM_DEBUG(dbgs() << "... Cannot tail call from callers with byval, " "inreg, or swifterror arguments\n"); return false; } // Externally-defined functions with weak linkage should not be // tail-called on AArch64 when the OS does not support dynamic // pre-emption of symbols, as the AAELF spec requires normal calls // to undefined weak functions to be replaced with a NOP or jump to the // next instruction. The behaviour of branch instructions in this // situation (as used for tail calls) is implementation-defined, so we // cannot rely on the linker replacing the tail call with a return. if (Info.Callee.isGlobal()) { const GlobalValue *GV = Info.Callee.getGlobal(); const Triple &TT = MF.getTarget().getTargetTriple(); if (GV->hasExternalWeakLinkage() && (!TT.isOSWindows() || TT.isOSBinFormatELF() || TT.isOSBinFormatMachO())) { LLVM_DEBUG(dbgs() << "... Cannot tail call externally-defined function " "with weak linkage for this OS.\n"); return false; } } // If we have -tailcallopt, then we're done. if (MF.getTarget().Options.GuaranteedTailCallOpt) return canGuaranteeTCO(CalleeCC) && CalleeCC == CallerF.getCallingConv(); // We don't have -tailcallopt, so we're allowed to change the ABI (sibcall). // Try to find cases where we can do that. // I want anyone implementing a new calling convention to think long and hard // about this assert. assert((!Info.IsVarArg || CalleeCC == CallingConv::C) && "Unexpected variadic calling convention"); // Verify that the incoming and outgoing arguments from the callee are // safe to tail call. if (!doCallerAndCalleePassArgsTheSameWay(Info, MF, InArgs)) { LLVM_DEBUG( dbgs() << "... Caller and callee have incompatible calling conventions.\n"); return false; } if (!areCalleeOutgoingArgsTailCallable(Info, MF, OutArgs)) return false; LLVM_DEBUG( dbgs() << "... Call is eligible for tail call optimization.\n"); return true; } static unsigned getCallOpcode(const MachineFunction &CallerF, bool IsIndirect, bool IsTailCall) { if (!IsTailCall) return IsIndirect ? getBLRCallOpcode(CallerF) : (unsigned)AArch64::BL; if (!IsIndirect) return AArch64::TCRETURNdi; // When BTI is enabled, we need to use TCRETURNriBTI to make sure that we use // x16 or x17. if (CallerF.getFunction().hasFnAttribute("branch-target-enforcement")) return AArch64::TCRETURNriBTI; return AArch64::TCRETURNri; } bool AArch64CallLowering::lowerTailCall( MachineIRBuilder &MIRBuilder, CallLoweringInfo &Info, SmallVectorImpl &OutArgs) const { MachineFunction &MF = MIRBuilder.getMF(); const Function &F = MF.getFunction(); MachineRegisterInfo &MRI = MF.getRegInfo(); const AArch64TargetLowering &TLI = *getTLI(); AArch64FunctionInfo *FuncInfo = MF.getInfo(); // True when we're tail calling, but without -tailcallopt. bool IsSibCall = !MF.getTarget().Options.GuaranteedTailCallOpt; // TODO: Right now, regbankselect doesn't know how to handle the rtcGPR64 // register class. Until we can do that, we should fall back here. if (F.hasFnAttribute("branch-target-enforcement")) { LLVM_DEBUG( dbgs() << "Cannot lower indirect tail calls with BTI enabled yet.\n"); return false; } // Find out which ABI gets to decide where things go. 
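// --- illustrative sketch (not part of the patch) ---------------------------
// getCallOpcode() above selects BL/BLR for ordinary calls and the TCRETURN*
// pseudos for tail calls, using TCRETURNriBTI for indirect tail calls under
// branch-target enforcement so the target lands in x16/x17. A toy restatement
// of that decision table; the string names are illustrative only.
#include <string>

static std::string pickCallOpcodeName(bool IsIndirect, bool IsTailCall,
                                      bool HasBTIEnforcement) {
  if (!IsTailCall)
    return IsIndirect ? "BLR (or subtarget BLR variant)" : "BL";
  if (!IsIndirect)
    return "TCRETURNdi";
  return HasBTIEnforcement ? "TCRETURNriBTI" : "TCRETURNri";
}
// ----------------------------------------------------------------------------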
CallingConv::ID CalleeCC = Info.CallConv; CCAssignFn *AssignFnFixed; CCAssignFn *AssignFnVarArg; std::tie(AssignFnFixed, AssignFnVarArg) = getAssignFnsForCC(CalleeCC, TLI); MachineInstrBuilder CallSeqStart; if (!IsSibCall) CallSeqStart = MIRBuilder.buildInstr(AArch64::ADJCALLSTACKDOWN); unsigned Opc = getCallOpcode(MF, Info.Callee.isReg(), true); auto MIB = MIRBuilder.buildInstrNoInsert(Opc); MIB.add(Info.Callee); // Byte offset for the tail call. When we are sibcalling, this will always // be 0. MIB.addImm(0); // Tell the call which registers are clobbered. auto TRI = MF.getSubtarget().getRegisterInfo(); const uint32_t *Mask = TRI->getCallPreservedMask(MF, CalleeCC); if (MF.getSubtarget().hasCustomCallingConv()) TRI->UpdateCustomCallPreservedMask(MF, &Mask); MIB.addRegMask(Mask); if (TRI->isAnyArgRegReserved(MF)) TRI->emitReservedArgRegCallError(MF); // FPDiff is the byte offset of the call's argument area from the callee's. // Stores to callee stack arguments will be placed in FixedStackSlots offset // by this amount for a tail call. In a sibling call it must be 0 because the // caller will deallocate the entire stack and the callee still expects its // arguments to begin at SP+0. int FPDiff = 0; // This will be 0 for sibcalls, potentially nonzero for tail calls produced // by -tailcallopt. For sibcalls, the memory operands for the call are // already available in the caller's incoming argument space. unsigned NumBytes = 0; if (!IsSibCall) { // We aren't sibcalling, so we need to compute FPDiff. We need to do this // before handling assignments, because FPDiff must be known for memory // arguments. unsigned NumReusableBytes = FuncInfo->getBytesInStackArgArea(); SmallVector OutLocs; CCState OutInfo(CalleeCC, false, MF, OutLocs, F.getContext()); analyzeArgInfo(OutInfo, OutArgs, *AssignFnFixed, *AssignFnVarArg); // The callee will pop the argument stack as a tail call. Thus, we must // keep it 16-byte aligned. NumBytes = alignTo(OutInfo.getNextStackOffset(), 16); // FPDiff will be negative if this tail call requires more space than we // would automatically have in our incoming argument space. Positive if we // actually shrink the stack. FPDiff = NumReusableBytes - NumBytes; // The stack pointer must be 16-byte aligned at all times it's used for a // memory operation, which in practice means at *all* times and in // particular across call boundaries. Therefore our own arguments started at // a 16-byte aligned SP and the delta applied for the tail call should // satisfy the same constraint. assert(FPDiff % 16 == 0 && "unaligned stack on tail call"); } const auto &Forwards = FuncInfo->getForwardedMustTailRegParms(); // Do the actual argument marshalling. OutgoingArgHandler Handler(MIRBuilder, MRI, MIB, AssignFnFixed, AssignFnVarArg, true, FPDiff); if (!handleAssignments(MIRBuilder, OutArgs, Handler)) return false; if (Info.IsVarArg && Info.IsMustTailCall) { // Now we know what's being passed to the function. Add uses to the call for // the forwarded registers that we *aren't* passing as parameters. This will // preserve the copies we build earlier. for (const auto &F : Forwards) { Register ForwardedReg = F.PReg; // If the register is already passed, or aliases a register which is // already being passed, then skip it. if (any_of(MIB->uses(), [&ForwardedReg, &TRI](const MachineOperand &Use) { if (!Use.isReg()) return false; return TRI->regsOverlap(Use.getReg(), ForwardedReg); })) continue; // We aren't passing it already, so we should add it to the call. 
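// --- illustrative sketch (not part of the patch) ---------------------------
// The non-sibcall path above computes NumBytes as the callee's argument area
// rounded up to 16 bytes, and FPDiff as the caller's reusable incoming bytes
// minus NumBytes. A worked version of that arithmetic with a local alignTo16
// helper (names invented here):
#include <cassert>
#include <cstdint>

static uint64_t alignTo16(uint64_t Bytes) {
  return (Bytes + 15) & ~uint64_t(15);
}

static int64_t computeFPDiff(uint64_t NumReusableBytes,
                             uint64_t NextStackOffset) {
  uint64_t NumBytes = alignTo16(NextStackOffset);
  int64_t FPDiff = int64_t(NumReusableBytes) - int64_t(NumBytes);
  // SP stays 16-byte aligned across the tail call, so FPDiff must be too.
  assert(FPDiff % 16 == 0 && "unaligned stack on tail call");
  return FPDiff;
}

// Example: caller has 32 reusable bytes, callee needs 20 -> NumBytes = 32 and
// FPDiff = 0; if the callee needed 40 bytes, NumBytes = 48 and FPDiff = -16.
// ----------------------------------------------------------------------------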
MIRBuilder.buildCopy(ForwardedReg, Register(F.VReg)); MIB.addReg(ForwardedReg, RegState::Implicit); } } // If we have -tailcallopt, we need to adjust the stack. We'll do the call // sequence start and end here. if (!IsSibCall) { MIB->getOperand(1).setImm(FPDiff); CallSeqStart.addImm(NumBytes).addImm(0); // End the call sequence *before* emitting the call. Normally, we would // tidy the frame up after the call. However, here, we've laid out the // parameters so that when SP is reset, they will be in the correct // location. MIRBuilder.buildInstr(AArch64::ADJCALLSTACKUP).addImm(NumBytes).addImm(0); } // Now we can add the actual call instruction to the correct basic block. MIRBuilder.insertInstr(MIB); // If Callee is a reg, since it is used by a target specific instruction, // it must have a register class matching the constraint of that instruction. if (Info.Callee.isReg()) MIB->getOperand(0).setReg(constrainOperandRegClass( MF, *TRI, MRI, *MF.getSubtarget().getInstrInfo(), *MF.getSubtarget().getRegBankInfo(), *MIB, MIB->getDesc(), Info.Callee, 0)); MF.getFrameInfo().setHasTailCall(); Info.LoweredTailCall = true; return true; } bool AArch64CallLowering::lowerCall(MachineIRBuilder &MIRBuilder, CallLoweringInfo &Info) const { MachineFunction &MF = MIRBuilder.getMF(); const Function &F = MF.getFunction(); MachineRegisterInfo &MRI = MF.getRegInfo(); auto &DL = F.getParent()->getDataLayout(); const AArch64TargetLowering &TLI = *getTLI(); SmallVector OutArgs; for (auto &OrigArg : Info.OrigArgs) { splitToValueTypes(OrigArg, OutArgs, DL, MRI, Info.CallConv); // AAPCS requires that we zero-extend i1 to 8 bits by the caller. if (OrigArg.Ty->isIntegerTy(1)) OutArgs.back().Flags[0].setZExt(); } SmallVector InArgs; if (!Info.OrigRet.Ty->isVoidTy()) splitToValueTypes(Info.OrigRet, InArgs, DL, MRI, F.getCallingConv()); // If we can lower as a tail call, do that instead. bool CanTailCallOpt = isEligibleForTailCallOptimization(MIRBuilder, Info, InArgs, OutArgs); // We must emit a tail call if we have musttail. if (Info.IsMustTailCall && !CanTailCallOpt) { // There are types of incoming/outgoing arguments we can't handle yet, so // it doesn't make sense to actually die here like in ISelLowering. Instead, // fall back to SelectionDAG and let it try to handle this. LLVM_DEBUG(dbgs() << "Failed to lower musttail call as tail call\n"); return false; } if (CanTailCallOpt) return lowerTailCall(MIRBuilder, Info, OutArgs); // Find out which ABI gets to decide where things go. CCAssignFn *AssignFnFixed; CCAssignFn *AssignFnVarArg; std::tie(AssignFnFixed, AssignFnVarArg) = getAssignFnsForCC(Info.CallConv, TLI); MachineInstrBuilder CallSeqStart; CallSeqStart = MIRBuilder.buildInstr(AArch64::ADJCALLSTACKDOWN); // Create a temporarily-floating call instruction so we can add the implicit // uses of arg registers. unsigned Opc = getCallOpcode(MF, Info.Callee.isReg(), false); auto MIB = MIRBuilder.buildInstrNoInsert(Opc); MIB.add(Info.Callee); // Tell the call which registers are clobbered. auto TRI = MF.getSubtarget().getRegisterInfo(); const uint32_t *Mask = TRI->getCallPreservedMask(MF, Info.CallConv); if (MF.getSubtarget().hasCustomCallingConv()) TRI->UpdateCustomCallPreservedMask(MF, &Mask); MIB.addRegMask(Mask); if (TRI->isAnyArgRegReserved(MF)) TRI->emitReservedArgRegCallError(MF); // Do the actual argument marshalling. 
OutgoingArgHandler Handler(MIRBuilder, MRI, MIB, AssignFnFixed, AssignFnVarArg, false); if (!handleAssignments(MIRBuilder, OutArgs, Handler)) return false; // Now we can add the actual call instruction to the correct basic block. MIRBuilder.insertInstr(MIB); // If Callee is a reg, since it is used by a target specific // instruction, it must have a register class matching the // constraint of that instruction. if (Info.Callee.isReg()) MIB->getOperand(0).setReg(constrainOperandRegClass( MF, *TRI, MRI, *MF.getSubtarget().getInstrInfo(), *MF.getSubtarget().getRegBankInfo(), *MIB, MIB->getDesc(), Info.Callee, 0)); // Finally we can copy the returned value back into its virtual-register. In // symmetry with the arguments, the physical register must be an // implicit-define of the call instruction. if (!Info.OrigRet.Ty->isVoidTy()) { CCAssignFn *RetAssignFn = TLI.CCAssignFnForReturn(Info.CallConv); CallReturnHandler Handler(MIRBuilder, MRI, MIB, RetAssignFn); if (!handleAssignments(MIRBuilder, InArgs, Handler)) return false; } if (Info.SwiftErrorVReg) { MIB.addDef(AArch64::X21, RegState::Implicit); MIRBuilder.buildCopy(Info.SwiftErrorVReg, Register(AArch64::X21)); } uint64_t CalleePopBytes = doesCalleeRestoreStack(Info.CallConv, MF.getTarget().Options.GuaranteedTailCallOpt) ? alignTo(Handler.StackSize, 16) : 0; CallSeqStart.addImm(Handler.StackSize).addImm(0); MIRBuilder.buildInstr(AArch64::ADJCALLSTACKUP) .addImm(Handler.StackSize) .addImm(CalleePopBytes); return true; } Index: vendor/llvm-project/release-11.x/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp =================================================================== --- vendor/llvm-project/release-11.x/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp (revision 366332) +++ vendor/llvm-project/release-11.x/llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp (revision 366333) @@ -1,820 +1,824 @@ //===-- llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp - Call lowering -----===// // // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // See https://llvm.org/LICENSE.txt for license information. // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // //===----------------------------------------------------------------------===// /// /// \file /// This file implements the lowering of LLVM calls to machine code calls for /// GlobalISel. 
/// //===----------------------------------------------------------------------===// #include "AMDGPUCallLowering.h" #include "AMDGPU.h" #include "AMDGPUISelLowering.h" #include "AMDGPUSubtarget.h" #include "AMDGPUTargetMachine.h" #include "SIISelLowering.h" #include "SIMachineFunctionInfo.h" #include "SIRegisterInfo.h" #include "MCTargetDesc/AMDGPUMCTargetDesc.h" #include "llvm/CodeGen/Analysis.h" #include "llvm/CodeGen/CallingConvLower.h" #include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/Support/LowLevelTypeImpl.h" using namespace llvm; namespace { struct OutgoingValueHandler : public CallLowering::ValueHandler { OutgoingValueHandler(MachineIRBuilder &B, MachineRegisterInfo &MRI, MachineInstrBuilder MIB, CCAssignFn *AssignFn) : ValueHandler(B, MRI, AssignFn), MIB(MIB) {} MachineInstrBuilder MIB; bool isIncomingArgumentHandler() const override { return false; } Register getStackAddress(uint64_t Size, int64_t Offset, MachinePointerInfo &MPO) override { llvm_unreachable("not implemented"); } void assignValueToAddress(Register ValVReg, Register Addr, uint64_t Size, MachinePointerInfo &MPO, CCValAssign &VA) override { llvm_unreachable("not implemented"); } void assignValueToReg(Register ValVReg, Register PhysReg, CCValAssign &VA) override { Register ExtReg; if (VA.getLocVT().getSizeInBits() < 32) { // 16-bit types are reported as legal for 32-bit registers. We need to // extend and do a 32-bit copy to avoid the verifier complaining about it. ExtReg = MIRBuilder.buildAnyExt(LLT::scalar(32), ValVReg).getReg(0); } else ExtReg = extendRegister(ValVReg, VA); // If this is a scalar return, insert a readfirstlane just in case the value // ends up in a VGPR. // FIXME: Assert this is a shader return. const SIRegisterInfo *TRI = static_cast(MRI.getTargetRegisterInfo()); if (TRI->isSGPRReg(MRI, PhysReg)) { auto ToSGPR = MIRBuilder.buildIntrinsic(Intrinsic::amdgcn_readfirstlane, {MRI.getType(ExtReg)}, false) .addReg(ExtReg); ExtReg = ToSGPR.getReg(0); } MIRBuilder.buildCopy(PhysReg, ExtReg); MIB.addUse(PhysReg, RegState::Implicit); } bool assignArg(unsigned ValNo, MVT ValVT, MVT LocVT, CCValAssign::LocInfo LocInfo, const CallLowering::ArgInfo &Info, ISD::ArgFlagsTy Flags, CCState &State) override { return AssignFn(ValNo, ValVT, LocVT, LocInfo, Flags, State); } }; struct IncomingArgHandler : public CallLowering::ValueHandler { uint64_t StackUsed = 0; IncomingArgHandler(MachineIRBuilder &B, MachineRegisterInfo &MRI, CCAssignFn *AssignFn) : ValueHandler(B, MRI, AssignFn) {} Register getStackAddress(uint64_t Size, int64_t Offset, MachinePointerInfo &MPO) override { auto &MFI = MIRBuilder.getMF().getFrameInfo(); int FI = MFI.CreateFixedObject(Size, Offset, true); MPO = MachinePointerInfo::getFixedStack(MIRBuilder.getMF(), FI); auto AddrReg = MIRBuilder.buildFrameIndex( LLT::pointer(AMDGPUAS::PRIVATE_ADDRESS, 32), FI); StackUsed = std::max(StackUsed, Size + Offset); return AddrReg.getReg(0); } void assignValueToReg(Register ValVReg, Register PhysReg, CCValAssign &VA) override { markPhysRegUsed(PhysReg); if (VA.getLocVT().getSizeInBits() < 32) { // 16-bit types are reported as legal for 32-bit registers. We need to do // a 32-bit copy, and truncate to avoid the verifier complaining about it. 
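// --- illustrative sketch (not part of the patch) ---------------------------
// Both AMDGPU value handlers above treat anything narrower than 32 bits as a
// 32-bit register: outgoing values are any-extended to s32 before the copy to
// the physical register, and incoming values are copied as s32 and truncated
// back. A toy helper expressing that width rule (name invented here):
#include <cstdint>

static unsigned copyWidthInBits(unsigned ValueBits) {
  // The physical registers used for argument passing are at least 32 bits
  // wide, so sub-32-bit values travel as 32-bit copies.
  return ValueBits < 32 ? 32u : ValueBits;
}

// Example: an i16 shader argument is copied in as s32 and truncated to s16 at
// the use; an i16 return value is any-extended to s32 before the copy out.
// ----------------------------------------------------------------------------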
auto Copy = MIRBuilder.buildCopy(LLT::scalar(32), PhysReg); MIRBuilder.buildTrunc(ValVReg, Copy); return; } switch (VA.getLocInfo()) { case CCValAssign::LocInfo::SExt: case CCValAssign::LocInfo::ZExt: case CCValAssign::LocInfo::AExt: { auto Copy = MIRBuilder.buildCopy(LLT{VA.getLocVT()}, PhysReg); MIRBuilder.buildTrunc(ValVReg, Copy); break; } default: MIRBuilder.buildCopy(ValVReg, PhysReg); break; } } - void assignValueToAddress(Register ValVReg, Register Addr, uint64_t Size, + void assignValueToAddress(Register ValVReg, Register Addr, uint64_t MemSize, MachinePointerInfo &MPO, CCValAssign &VA) override { MachineFunction &MF = MIRBuilder.getMF(); + // The reported memory location may be wider than the value. + const LLT RegTy = MRI.getType(ValVReg); + MemSize = std::min(static_cast(RegTy.getSizeInBytes()), MemSize); + // FIXME: Get alignment auto MMO = MF.getMachineMemOperand( - MPO, MachineMemOperand::MOLoad | MachineMemOperand::MOInvariant, Size, + MPO, MachineMemOperand::MOLoad | MachineMemOperand::MOInvariant, MemSize, inferAlignFromPtrInfo(MF, MPO)); MIRBuilder.buildLoad(ValVReg, Addr, *MMO); } /// How the physical register gets marked varies between formal /// parameters (it's a basic-block live-in), and a call instruction /// (it's an implicit-def of the BL). virtual void markPhysRegUsed(unsigned PhysReg) = 0; // FIXME: What is the point of this being a callback? bool isIncomingArgumentHandler() const override { return true; } }; struct FormalArgHandler : public IncomingArgHandler { FormalArgHandler(MachineIRBuilder &B, MachineRegisterInfo &MRI, CCAssignFn *AssignFn) : IncomingArgHandler(B, MRI, AssignFn) {} void markPhysRegUsed(unsigned PhysReg) override { MIRBuilder.getMBB().addLiveIn(PhysReg); } }; } AMDGPUCallLowering::AMDGPUCallLowering(const AMDGPUTargetLowering &TLI) : CallLowering(&TLI) { } // FIXME: Compatability shim static ISD::NodeType extOpcodeToISDExtOpcode(unsigned MIOpc) { switch (MIOpc) { case TargetOpcode::G_SEXT: return ISD::SIGN_EXTEND; case TargetOpcode::G_ZEXT: return ISD::ZERO_EXTEND; case TargetOpcode::G_ANYEXT: return ISD::ANY_EXTEND; default: llvm_unreachable("not an extend opcode"); } } void AMDGPUCallLowering::splitToValueTypes( MachineIRBuilder &B, const ArgInfo &OrigArg, unsigned OrigArgIdx, SmallVectorImpl &SplitArgs, const DataLayout &DL, CallingConv::ID CallConv, SplitArgTy PerformArgSplit) const { const SITargetLowering &TLI = *getTLI(); LLVMContext &Ctx = OrigArg.Ty->getContext(); if (OrigArg.Ty->isVoidTy()) return; SmallVector SplitVTs; ComputeValueVTs(TLI, DL, OrigArg.Ty, SplitVTs); assert(OrigArg.Regs.size() == SplitVTs.size()); int SplitIdx = 0; for (EVT VT : SplitVTs) { Register Reg = OrigArg.Regs[SplitIdx]; Type *Ty = VT.getTypeForEVT(Ctx); LLT LLTy = getLLTForType(*Ty, DL); if (OrigArgIdx == AttributeList::ReturnIndex && VT.isScalarInteger()) { unsigned ExtendOp = TargetOpcode::G_ANYEXT; if (OrigArg.Flags[0].isSExt()) { assert(OrigArg.Regs.size() == 1 && "expect only simple return values"); ExtendOp = TargetOpcode::G_SEXT; } else if (OrigArg.Flags[0].isZExt()) { assert(OrigArg.Regs.size() == 1 && "expect only simple return values"); ExtendOp = TargetOpcode::G_ZEXT; } EVT ExtVT = TLI.getTypeForExtReturn(Ctx, VT, extOpcodeToISDExtOpcode(ExtendOp)); if (ExtVT != VT) { VT = ExtVT; Ty = ExtVT.getTypeForEVT(Ctx); LLTy = getLLTForType(*Ty, DL); Reg = B.buildInstr(ExtendOp, {LLTy}, {Reg}).getReg(0); } } unsigned NumParts = TLI.getNumRegistersForCallingConv(Ctx, CallConv, VT); MVT RegVT = TLI.getRegisterTypeForCallingConv(Ctx, CallConv, VT); if 
(NumParts == 1) { // No splitting to do, but we want to replace the original type (e.g. [1 x // double] -> double). SplitArgs.emplace_back(Reg, Ty, OrigArg.Flags, OrigArg.IsFixed); ++SplitIdx; continue; } SmallVector SplitRegs; Type *PartTy = EVT(RegVT).getTypeForEVT(Ctx); LLT PartLLT = getLLTForType(*PartTy, DL); MachineRegisterInfo &MRI = *B.getMRI(); // FIXME: Should we be reporting all of the part registers for a single // argument, and let handleAssignments take care of the repacking? for (unsigned i = 0; i < NumParts; ++i) { Register PartReg = MRI.createGenericVirtualRegister(PartLLT); SplitRegs.push_back(PartReg); SplitArgs.emplace_back(ArrayRef(PartReg), PartTy, OrigArg.Flags); } PerformArgSplit(SplitRegs, Reg, LLTy, PartLLT, SplitIdx); ++SplitIdx; } } // Get the appropriate type to make \p OrigTy \p Factor times bigger. static LLT getMultipleType(LLT OrigTy, int Factor) { if (OrigTy.isVector()) { return LLT::vector(OrigTy.getNumElements() * Factor, OrigTy.getElementType()); } return LLT::scalar(OrigTy.getSizeInBits() * Factor); } // TODO: Move to generic code static void unpackRegsToOrigType(MachineIRBuilder &B, ArrayRef DstRegs, Register SrcReg, const CallLowering::ArgInfo &Info, LLT SrcTy, LLT PartTy) { assert(DstRegs.size() > 1 && "Nothing to unpack"); const unsigned SrcSize = SrcTy.getSizeInBits(); const unsigned PartSize = PartTy.getSizeInBits(); if (SrcTy.isVector() && !PartTy.isVector() && PartSize > SrcTy.getElementType().getSizeInBits()) { // Vector was scalarized, and the elements extended. auto UnmergeToEltTy = B.buildUnmerge(SrcTy.getElementType(), SrcReg); for (int i = 0, e = DstRegs.size(); i != e; ++i) B.buildAnyExt(DstRegs[i], UnmergeToEltTy.getReg(i)); return; } if (SrcSize % PartSize == 0) { B.buildUnmerge(DstRegs, SrcReg); return; } const int NumRoundedParts = (SrcSize + PartSize - 1) / PartSize; LLT BigTy = getMultipleType(PartTy, NumRoundedParts); auto ImpDef = B.buildUndef(BigTy); auto Big = B.buildInsert(BigTy, ImpDef.getReg(0), SrcReg, 0).getReg(0); int64_t Offset = 0; for (unsigned i = 0, e = DstRegs.size(); i != e; ++i, Offset += PartSize) B.buildExtract(DstRegs[i], Big, Offset); } /// Lower the return value for the already existing \p Ret. This assumes that /// \p B's insertion point is correct. 
bool AMDGPUCallLowering::lowerReturnVal(MachineIRBuilder &B, const Value *Val, ArrayRef VRegs, MachineInstrBuilder &Ret) const { if (!Val) return true; auto &MF = B.getMF(); const auto &F = MF.getFunction(); const DataLayout &DL = MF.getDataLayout(); MachineRegisterInfo *MRI = B.getMRI(); CallingConv::ID CC = F.getCallingConv(); const SITargetLowering &TLI = *getTLI(); ArgInfo OrigRetInfo(VRegs, Val->getType()); setArgFlags(OrigRetInfo, AttributeList::ReturnIndex, DL, F); SmallVector SplitRetInfos; splitToValueTypes( B, OrigRetInfo, AttributeList::ReturnIndex, SplitRetInfos, DL, CC, [&](ArrayRef Regs, Register SrcReg, LLT LLTy, LLT PartLLT, int VTSplitIdx) { unpackRegsToOrigType(B, Regs, SrcReg, SplitRetInfos[VTSplitIdx], LLTy, PartLLT); }); CCAssignFn *AssignFn = TLI.CCAssignFnForReturn(CC, F.isVarArg()); OutgoingValueHandler RetHandler(B, *MRI, Ret, AssignFn); return handleAssignments(B, SplitRetInfos, RetHandler); } bool AMDGPUCallLowering::lowerReturn(MachineIRBuilder &B, const Value *Val, ArrayRef VRegs) const { MachineFunction &MF = B.getMF(); MachineRegisterInfo &MRI = MF.getRegInfo(); SIMachineFunctionInfo *MFI = MF.getInfo(); MFI->setIfReturnsVoid(!Val); assert(!Val == VRegs.empty() && "Return value without a vreg"); CallingConv::ID CC = B.getMF().getFunction().getCallingConv(); const bool IsShader = AMDGPU::isShader(CC); const bool IsWaveEnd = (IsShader && MFI->returnsVoid()) || AMDGPU::isKernel(CC); if (IsWaveEnd) { B.buildInstr(AMDGPU::S_ENDPGM) .addImm(0); return true; } auto const &ST = MF.getSubtarget(); unsigned ReturnOpc = IsShader ? AMDGPU::SI_RETURN_TO_EPILOG : AMDGPU::S_SETPC_B64_return; auto Ret = B.buildInstrNoInsert(ReturnOpc); Register ReturnAddrVReg; if (ReturnOpc == AMDGPU::S_SETPC_B64_return) { ReturnAddrVReg = MRI.createVirtualRegister(&AMDGPU::CCR_SGPR_64RegClass); Ret.addUse(ReturnAddrVReg); } if (!lowerReturnVal(B, Val, VRegs, Ret)) return false; if (ReturnOpc == AMDGPU::S_SETPC_B64_return) { const SIRegisterInfo *TRI = ST.getRegisterInfo(); Register LiveInReturn = MF.addLiveIn(TRI->getReturnAddressReg(MF), &AMDGPU::SGPR_64RegClass); B.buildCopy(ReturnAddrVReg, LiveInReturn); } // TODO: Handle CalleeSavedRegsViaCopy. 
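// --- illustrative sketch (not part of the patch) ---------------------------
// lowerReturn() above ends the function with S_ENDPGM for kernels and for
// shaders returning void, SI_RETURN_TO_EPILOG for shaders with results, and
// S_SETPC_B64_return (a branch through the saved return address) for ordinary
// callable functions. A toy restatement of that choice; names illustrative.
#include <string>

static std::string pickReturnOpcodeName(bool IsShader, bool IsKernel,
                                        bool ReturnsVoid) {
  if (IsKernel || (IsShader && ReturnsVoid))
    return "S_ENDPGM";
  return IsShader ? "SI_RETURN_TO_EPILOG" : "S_SETPC_B64_return";
}
// ----------------------------------------------------------------------------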
B.insertInstr(Ret); return true; } Register AMDGPUCallLowering::lowerParameterPtr(MachineIRBuilder &B, Type *ParamTy, uint64_t Offset) const { MachineFunction &MF = B.getMF(); const SIMachineFunctionInfo *MFI = MF.getInfo(); MachineRegisterInfo &MRI = MF.getRegInfo(); const Function &F = MF.getFunction(); const DataLayout &DL = F.getParent()->getDataLayout(); PointerType *PtrTy = PointerType::get(ParamTy, AMDGPUAS::CONSTANT_ADDRESS); LLT PtrType = getLLTForType(*PtrTy, DL); Register KernArgSegmentPtr = MFI->getPreloadedReg(AMDGPUFunctionArgInfo::KERNARG_SEGMENT_PTR); Register KernArgSegmentVReg = MRI.getLiveInVirtReg(KernArgSegmentPtr); auto OffsetReg = B.buildConstant(LLT::scalar(64), Offset); return B.buildPtrAdd(PtrType, KernArgSegmentVReg, OffsetReg).getReg(0); } void AMDGPUCallLowering::lowerParameter(MachineIRBuilder &B, Type *ParamTy, uint64_t Offset, Align Alignment, Register DstReg) const { MachineFunction &MF = B.getMF(); const Function &F = MF.getFunction(); const DataLayout &DL = F.getParent()->getDataLayout(); MachinePointerInfo PtrInfo(AMDGPUAS::CONSTANT_ADDRESS); unsigned TypeSize = DL.getTypeStoreSize(ParamTy); Register PtrReg = lowerParameterPtr(B, ParamTy, Offset); MachineMemOperand *MMO = MF.getMachineMemOperand( PtrInfo, MachineMemOperand::MOLoad | MachineMemOperand::MODereferenceable | MachineMemOperand::MOInvariant, TypeSize, Alignment); B.buildLoad(DstReg, PtrReg, *MMO); } // Allocate special inputs passed in user SGPRs. static void allocateHSAUserSGPRs(CCState &CCInfo, MachineIRBuilder &B, MachineFunction &MF, const SIRegisterInfo &TRI, SIMachineFunctionInfo &Info) { // FIXME: How should these inputs interact with inreg / custom SGPR inputs? if (Info.hasPrivateSegmentBuffer()) { Register PrivateSegmentBufferReg = Info.addPrivateSegmentBuffer(TRI); MF.addLiveIn(PrivateSegmentBufferReg, &AMDGPU::SGPR_128RegClass); CCInfo.AllocateReg(PrivateSegmentBufferReg); } if (Info.hasDispatchPtr()) { Register DispatchPtrReg = Info.addDispatchPtr(TRI); MF.addLiveIn(DispatchPtrReg, &AMDGPU::SGPR_64RegClass); CCInfo.AllocateReg(DispatchPtrReg); } if (Info.hasQueuePtr()) { Register QueuePtrReg = Info.addQueuePtr(TRI); MF.addLiveIn(QueuePtrReg, &AMDGPU::SGPR_64RegClass); CCInfo.AllocateReg(QueuePtrReg); } if (Info.hasKernargSegmentPtr()) { MachineRegisterInfo &MRI = MF.getRegInfo(); Register InputPtrReg = Info.addKernargSegmentPtr(TRI); const LLT P4 = LLT::pointer(AMDGPUAS::CONSTANT_ADDRESS, 64); Register VReg = MRI.createGenericVirtualRegister(P4); MRI.addLiveIn(InputPtrReg, VReg); B.getMBB().addLiveIn(InputPtrReg); B.buildCopy(VReg, InputPtrReg); CCInfo.AllocateReg(InputPtrReg); } if (Info.hasDispatchID()) { Register DispatchIDReg = Info.addDispatchID(TRI); MF.addLiveIn(DispatchIDReg, &AMDGPU::SGPR_64RegClass); CCInfo.AllocateReg(DispatchIDReg); } if (Info.hasFlatScratchInit()) { Register FlatScratchInitReg = Info.addFlatScratchInit(TRI); MF.addLiveIn(FlatScratchInitReg, &AMDGPU::SGPR_64RegClass); CCInfo.AllocateReg(FlatScratchInitReg); } // TODO: Add GridWorkGroupCount user SGPRs when used. For now with HSA we read // these from the dispatch pointer. 
} bool AMDGPUCallLowering::lowerFormalArgumentsKernel( MachineIRBuilder &B, const Function &F, ArrayRef> VRegs) const { MachineFunction &MF = B.getMF(); const GCNSubtarget *Subtarget = &MF.getSubtarget(); MachineRegisterInfo &MRI = MF.getRegInfo(); SIMachineFunctionInfo *Info = MF.getInfo(); const SIRegisterInfo *TRI = Subtarget->getRegisterInfo(); const SITargetLowering &TLI = *getTLI(); const DataLayout &DL = F.getParent()->getDataLayout(); SmallVector ArgLocs; CCState CCInfo(F.getCallingConv(), F.isVarArg(), MF, ArgLocs, F.getContext()); allocateHSAUserSGPRs(CCInfo, B, MF, *TRI, *Info); unsigned i = 0; const Align KernArgBaseAlign(16); const unsigned BaseOffset = Subtarget->getExplicitKernelArgOffset(F); uint64_t ExplicitArgOffset = 0; // TODO: Align down to dword alignment and extract bits for extending loads. for (auto &Arg : F.args()) { Type *ArgTy = Arg.getType(); unsigned AllocSize = DL.getTypeAllocSize(ArgTy); if (AllocSize == 0) continue; Align ABIAlign = DL.getABITypeAlign(ArgTy); uint64_t ArgOffset = alignTo(ExplicitArgOffset, ABIAlign) + BaseOffset; ExplicitArgOffset = alignTo(ExplicitArgOffset, ABIAlign) + AllocSize; if (Arg.use_empty()) { ++i; continue; } ArrayRef OrigArgRegs = VRegs[i]; Register ArgReg = OrigArgRegs.size() == 1 ? OrigArgRegs[0] : MRI.createGenericVirtualRegister(getLLTForType(*ArgTy, DL)); Align Alignment = commonAlignment(KernArgBaseAlign, ArgOffset); lowerParameter(B, ArgTy, ArgOffset, Alignment, ArgReg); if (OrigArgRegs.size() > 1) unpackRegs(OrigArgRegs, ArgReg, ArgTy, B); ++i; } TLI.allocateSpecialEntryInputVGPRs(CCInfo, MF, *TRI, *Info); TLI.allocateSystemSGPRs(CCInfo, MF, *Info, F.getCallingConv(), false); return true; } /// Pack values \p SrcRegs to cover the vector type result \p DstRegs. static MachineInstrBuilder mergeVectorRegsToResultRegs( MachineIRBuilder &B, ArrayRef DstRegs, ArrayRef SrcRegs) { MachineRegisterInfo &MRI = *B.getMRI(); LLT LLTy = MRI.getType(DstRegs[0]); LLT PartLLT = MRI.getType(SrcRegs[0]); // Deal with v3s16 split into v2s16 LLT LCMTy = getLCMType(LLTy, PartLLT); if (LCMTy == LLTy) { // Common case where no padding is needed. assert(DstRegs.size() == 1); return B.buildConcatVectors(DstRegs[0], SrcRegs); } const int NumWide = LCMTy.getSizeInBits() / PartLLT.getSizeInBits(); Register Undef = B.buildUndef(PartLLT).getReg(0); // Build vector of undefs. SmallVector WidenedSrcs(NumWide, Undef); // Replace the first sources with the real registers. std::copy(SrcRegs.begin(), SrcRegs.end(), WidenedSrcs.begin()); auto Widened = B.buildConcatVectors(LCMTy, WidenedSrcs); int NumDst = LCMTy.getSizeInBits() / LLTy.getSizeInBits(); SmallVector PadDstRegs(NumDst); std::copy(DstRegs.begin(), DstRegs.end(), PadDstRegs.begin()); // Create the excess dead defs for the unmerge. 
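// --- illustrative sketch (not part of the patch) ---------------------------
// lowerFormalArgumentsKernel() above walks the explicit kernel arguments,
// aligning each to its ABI alignment before adding the subtarget's base
// offset. A toy model of the running-offset computation; the struct and
// function names are invented, and the worked example assumes BaseOffset = 0.
#include <cstdint>

static uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

struct KernArgOffsets {
  uint64_t ArgOffset;          // where this argument is loaded from
  uint64_t NextExplicitOffset; // running offset for the next argument
};

static KernArgOffsets nextKernArg(uint64_t ExplicitArgOffset,
                                  uint64_t BaseOffset, uint64_t ABIAlign,
                                  uint64_t AllocSize) {
  uint64_t Aligned = alignUp(ExplicitArgOffset, ABIAlign);
  return {Aligned + BaseOffset, Aligned + AllocSize};
}

// Example: an i32 (size 4, align 4) followed by an i64 (size 8, align 8)
// yields argument offsets 0 and 8; the running offset ends at 16.
// ----------------------------------------------------------------------------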
for (int I = DstRegs.size(); I != NumDst; ++I) PadDstRegs[I] = MRI.createGenericVirtualRegister(LLTy); return B.buildUnmerge(PadDstRegs, Widened); } // TODO: Move this to generic code static void packSplitRegsToOrigType(MachineIRBuilder &B, ArrayRef OrigRegs, ArrayRef Regs, LLT LLTy, LLT PartLLT) { MachineRegisterInfo &MRI = *B.getMRI(); if (!LLTy.isVector() && !PartLLT.isVector()) { assert(OrigRegs.size() == 1); LLT OrigTy = MRI.getType(OrigRegs[0]); unsigned SrcSize = PartLLT.getSizeInBits() * Regs.size(); if (SrcSize == OrigTy.getSizeInBits()) B.buildMerge(OrigRegs[0], Regs); else { auto Widened = B.buildMerge(LLT::scalar(SrcSize), Regs); B.buildTrunc(OrigRegs[0], Widened); } return; } if (LLTy.isVector() && PartLLT.isVector()) { assert(OrigRegs.size() == 1); assert(LLTy.getElementType() == PartLLT.getElementType()); mergeVectorRegsToResultRegs(B, OrigRegs, Regs); return; } assert(LLTy.isVector() && !PartLLT.isVector()); LLT DstEltTy = LLTy.getElementType(); // Pointer information was discarded. We'll need to coerce some register types // to avoid violating type constraints. LLT RealDstEltTy = MRI.getType(OrigRegs[0]).getElementType(); assert(DstEltTy.getSizeInBits() == RealDstEltTy.getSizeInBits()); if (DstEltTy == PartLLT) { // Vector was trivially scalarized. if (RealDstEltTy.isPointer()) { for (Register Reg : Regs) MRI.setType(Reg, RealDstEltTy); } B.buildBuildVector(OrigRegs[0], Regs); } else if (DstEltTy.getSizeInBits() > PartLLT.getSizeInBits()) { // Deal with vector with 64-bit elements decomposed to 32-bit // registers. Need to create intermediate 64-bit elements. SmallVector EltMerges; int PartsPerElt = DstEltTy.getSizeInBits() / PartLLT.getSizeInBits(); assert(DstEltTy.getSizeInBits() % PartLLT.getSizeInBits() == 0); for (int I = 0, NumElts = LLTy.getNumElements(); I != NumElts; ++I) { auto Merge = B.buildMerge(RealDstEltTy, Regs.take_front(PartsPerElt)); // Fix the type in case this is really a vector of pointers. MRI.setType(Merge.getReg(0), RealDstEltTy); EltMerges.push_back(Merge.getReg(0)); Regs = Regs.drop_front(PartsPerElt); } B.buildBuildVector(OrigRegs[0], EltMerges); } else { // Vector was split, and elements promoted to a wider type. LLT BVType = LLT::vector(LLTy.getNumElements(), PartLLT); auto BV = B.buildBuildVector(BVType, Regs); B.buildTrunc(OrigRegs[0], BV); } } bool AMDGPUCallLowering::lowerFormalArguments( MachineIRBuilder &B, const Function &F, ArrayRef> VRegs) const { CallingConv::ID CC = F.getCallingConv(); // The infrastructure for normal calling convention lowering is essentially // useless for kernels. We want to avoid any kind of legalization or argument // splitting. 
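// --- illustrative sketch (not part of the patch) ---------------------------
// packSplitRegsToOrigType() above rebuilds the original value from its
// calling-convention parts: scalars are merged (and truncated if the merge is
// wider than the original), 64-bit vector elements are rebuilt from 32-bit
// parts element by element, and promoted elements are built into a wider
// vector and truncated. A toy computation of the merged width and whether a
// trailing truncate is needed in the scalar case; names are invented.
#include <cstdint>

struct ScalarRepack {
  uint64_t MergedBits; // width of the intermediate merge result
  bool NeedsTrunc;     // true if the merge is wider than the original scalar
};

static ScalarRepack planScalarRepack(uint64_t PartBits, uint64_t NumParts,
                                     uint64_t OrigBits) {
  uint64_t MergedBits = PartBits * NumParts;
  return {MergedBits, MergedBits != OrigBits};
}

// Example: an s96 value passed as three s32 parts merges straight to s96; an
// s48 value passed as two s32 parts merges to s64 and is truncated to s48.
// ----------------------------------------------------------------------------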
if (CC == CallingConv::AMDGPU_KERNEL) return lowerFormalArgumentsKernel(B, F, VRegs); const bool IsShader = AMDGPU::isShader(CC); const bool IsEntryFunc = AMDGPU::isEntryFunctionCC(CC); MachineFunction &MF = B.getMF(); MachineBasicBlock &MBB = B.getMBB(); MachineRegisterInfo &MRI = MF.getRegInfo(); SIMachineFunctionInfo *Info = MF.getInfo(); const GCNSubtarget &Subtarget = MF.getSubtarget(); const SIRegisterInfo *TRI = Subtarget.getRegisterInfo(); const DataLayout &DL = F.getParent()->getDataLayout(); SmallVector ArgLocs; CCState CCInfo(CC, F.isVarArg(), MF, ArgLocs, F.getContext()); if (!IsEntryFunc) { Register ReturnAddrReg = TRI->getReturnAddressReg(MF); Register LiveInReturn = MF.addLiveIn(ReturnAddrReg, &AMDGPU::SGPR_64RegClass); MBB.addLiveIn(ReturnAddrReg); B.buildCopy(LiveInReturn, ReturnAddrReg); } if (Info->hasImplicitBufferPtr()) { Register ImplicitBufferPtrReg = Info->addImplicitBufferPtr(*TRI); MF.addLiveIn(ImplicitBufferPtrReg, &AMDGPU::SGPR_64RegClass); CCInfo.AllocateReg(ImplicitBufferPtrReg); } SmallVector SplitArgs; unsigned Idx = 0; unsigned PSInputNum = 0; for (auto &Arg : F.args()) { if (DL.getTypeStoreSize(Arg.getType()) == 0) continue; const bool InReg = Arg.hasAttribute(Attribute::InReg); // SGPR arguments to functions not implemented. if (!IsShader && InReg) return false; if (Arg.hasAttribute(Attribute::SwiftSelf) || Arg.hasAttribute(Attribute::SwiftError) || Arg.hasAttribute(Attribute::Nest)) return false; if (CC == CallingConv::AMDGPU_PS && !InReg && PSInputNum <= 15) { const bool ArgUsed = !Arg.use_empty(); bool SkipArg = !ArgUsed && !Info->isPSInputAllocated(PSInputNum); if (!SkipArg) { Info->markPSInputAllocated(PSInputNum); if (ArgUsed) Info->markPSInputEnabled(PSInputNum); } ++PSInputNum; if (SkipArg) { for (int I = 0, E = VRegs[Idx].size(); I != E; ++I) B.buildUndef(VRegs[Idx][I]); ++Idx; continue; } } ArgInfo OrigArg(VRegs[Idx], Arg.getType()); const unsigned OrigArgIdx = Idx + AttributeList::FirstArgIndex; setArgFlags(OrigArg, OrigArgIdx, DL, F); splitToValueTypes( B, OrigArg, OrigArgIdx, SplitArgs, DL, CC, // FIXME: We should probably be passing multiple registers to // handleAssignments to do this [&](ArrayRef Regs, Register DstReg, LLT LLTy, LLT PartLLT, int VTSplitIdx) { assert(DstReg == VRegs[Idx][VTSplitIdx]); packSplitRegsToOrigType(B, VRegs[Idx][VTSplitIdx], Regs, LLTy, PartLLT); }); ++Idx; } // At least one interpolation mode must be enabled or else the GPU will // hang. // // Check PSInputAddr instead of PSInputEnable. The idea is that if the user // set PSInputAddr, the user wants to enable some bits after the compilation // based on run-time states. Since we can't know what the final PSInputEna // will look like, so we shouldn't do anything here and the user should take // responsibility for the correct programming. // // Otherwise, the following restrictions apply: // - At least one of PERSP_* (0xF) or LINEAR_* (0x70) must be enabled. // - If POS_W_FLOAT (11) is enabled, at least one of PERSP_* must be // enabled too. if (CC == CallingConv::AMDGPU_PS) { if ((Info->getPSInputAddr() & 0x7F) == 0 || ((Info->getPSInputAddr() & 0xF) == 0 && Info->isPSInputAllocated(11))) { CCInfo.AllocateReg(AMDGPU::VGPR0); CCInfo.AllocateReg(AMDGPU::VGPR1); Info->markPSInputAllocated(0); Info->markPSInputEnabled(0); } if (Subtarget.isAmdPalOS()) { // For isAmdPalOS, the user does not enable some bits after compilation // based on run-time states; the register values being generated here are // the final ones set in hardware. 
Therefore we need to apply the // workaround to PSInputAddr and PSInputEnable together. (The case where // a bit is set in PSInputAddr but not PSInputEnable is where the frontend // set up an input arg for a particular interpolation mode, but nothing // uses that input arg. Really we should have an earlier pass that removes // such an arg.) unsigned PsInputBits = Info->getPSInputAddr() & Info->getPSInputEnable(); if ((PsInputBits & 0x7F) == 0 || ((PsInputBits & 0xF) == 0 && (PsInputBits >> 11 & 1))) Info->markPSInputEnabled( countTrailingZeros(Info->getPSInputAddr(), ZB_Undefined)); } } const SITargetLowering &TLI = *getTLI(); CCAssignFn *AssignFn = TLI.CCAssignFnForCall(CC, F.isVarArg()); if (!MBB.empty()) B.setInstr(*MBB.begin()); if (!IsEntryFunc) { // For the fixed ABI, pass workitem IDs in the last argument register. if (AMDGPUTargetMachine::EnableFixedFunctionABI) TLI.allocateSpecialInputVGPRsFixed(CCInfo, MF, *TRI, *Info); } FormalArgHandler Handler(B, MRI, AssignFn); if (!handleAssignments(CCInfo, ArgLocs, B, SplitArgs, Handler)) return false; if (!IsEntryFunc && !AMDGPUTargetMachine::EnableFixedFunctionABI) { // Special inputs come after user arguments. TLI.allocateSpecialInputVGPRs(CCInfo, MF, *TRI, *Info); } // Start adding system SGPRs. if (IsEntryFunc) { TLI.allocateSystemSGPRs(CCInfo, MF, *Info, CC, IsShader); } else { CCInfo.AllocateReg(Info->getScratchRSrcReg()); TLI.allocateSpecialInputSGPRs(CCInfo, MF, *TRI, *Info); } // Move back to the end of the basic block. B.setMBB(MBB); return true; }
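// --- illustrative sketch (not part of the patch) ---------------------------
// The AMDGPU_PS handling above guarantees that at least one interpolation
// mode stays enabled: if none of PERSP_* (bits 0-3) or LINEAR_* (bits 4-6) is
// set, or POS_W_FLOAT (bit 11) is set without any PERSP_* bit, VGPR0/VGPR1
// are allocated and input 0 is force-enabled. A toy predicate for that
// "needs the workaround" test; the function name is invented.
#include <cstdint>

static bool psInputsNeedWorkaround(uint32_t PsInputBits) {
  const bool NoPerspOrLinear = (PsInputBits & 0x7F) == 0;
  const bool PosWWithoutPersp =
      (PsInputBits & 0xF) == 0 && ((PsInputBits >> 11) & 1);
  return NoPerspOrLinear || PosWWithoutPersp;
}

// Example: PsInputBits == 0x800 (only POS_W_FLOAT) needs the workaround;
// PsInputBits == 0x001 (PERSP_SAMPLE) does not.
// ----------------------------------------------------------------------------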