Index: vendor/llvm/dist-release_70/docs/ReleaseNotes.rst =================================================================== --- vendor/llvm/dist-release_70/docs/ReleaseNotes.rst (revision 338574) +++ vendor/llvm/dist-release_70/docs/ReleaseNotes.rst (revision 338575) @@ -1,282 +1,326 @@ ======================== LLVM 7.0.0 Release Notes ======================== .. contents:: :local: -.. warning:: - These are in-progress notes for the upcoming LLVM 7 release. - Release notes for previous releases can be found on - `the Download Page `_. - Introduction ============ This document contains the release notes for the LLVM Compiler Infrastructure, release 7.0.0. Here we describe the status of LLVM, including major improvements from the previous release, improvements in various subprojects of LLVM, and some of the current users of the code. All LLVM releases may be downloaded -from the `LLVM releases web site `_. +from the `LLVM releases web site `_. For more information about LLVM, including information about the latest -release, please check out the `main LLVM web site `_. If you +release, please check out the `main LLVM web site `_. If you have questions or comments, the `LLVM Developer's Mailing List -`_ is a good place to send +`_ is a good place to send them. -Note that if you are reading this file from a Subversion checkout or the main -LLVM web page, this document applies to the *next* release, not the current -one. To see the release notes for a specific release, please see the `releases -page `_. - Non-comprehensive list of changes in this release ================================================= -.. NOTE - For small 1-3 sentence descriptions, just add an entry at the end of - this list. If your description won't fit comfortably in one bullet - point (e.g. maybe you would like to give an example of the - functionality, or simply have a lot to talk about), see the `NOTE` below - for adding a new subsection. * The Windows installer no longer includes a Visual Studio integration. Instead, a new - `LLVM Compiler Toolchain Visual Studio extension ` - is available on the Visual Studio Marketplace. The new integration includes - support for Visual Studio 2017. + `LLVM Compiler Toolchain Visual Studio extension `_ + is available on the Visual Studio Marketplace. The new integration + supports Visual Studio 2017. * Libraries have been renamed from 7.0 to 7. This change also impacts downstream libraries like lldb. -* The LoopInstSimplify pass (-loop-instsimplify) has been removed. +* The LoopInstSimplify pass (``-loop-instsimplify``) has been removed. * Symbols starting with ``?`` are no longer mangled by LLVM when using the Windows ``x`` or ``w`` IR mangling schemes. * A new tool named :doc:`llvm-exegesis ` has been added. :program:`llvm-exegesis` automatically measures instruction scheduling properties (latency/uops) and provides a principled way to edit scheduling models. * A new tool named :doc:`llvm-mca ` has been added. :program:`llvm-mca` is a static performance analysis tool that uses information available in LLVM to statically predict the performance of machine code for a specific CPU. -* The optimization flag to merge constants (-fmerge-all-constants) is no longer - applied by default. - * Optimization of floating-point casts is improved. This may cause surprising - results for code that is relying on the undefined behavior of overflowing + results for code that is relying on the undefined behavior of overflowing casts. 
The optimization can be disabled by specifying a function attribute: - "strict-float-cast-overflow"="false". This attribute may be created by the + ``"strict-float-cast-overflow"="false"``. This attribute may be created by the clang option ``-fno-strict-float-cast-overflow``. - Code sanitizers can be used to detect affected patterns. The option for - detecting this problem alone is "-fsanitize=float-cast-overflow": + Code sanitizers can be used to detect affected patterns. The clang option for + detecting this problem alone is ``-fsanitize=float-cast-overflow``: .. code-block:: c int main() { float x = 4294967296.0f; x = (float)((int)x); printf("junk in the ftrunc: %f\n", x); return 0; } .. code-block:: bash - clang -O1 ftrunc.c -fsanitize=float-cast-overflow ; ./a.out + clang -O1 ftrunc.c -fsanitize=float-cast-overflow ; ./a.out ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of representable values of type 'int' junk in the ftrunc: 0.000000 * ``LLVM_ON_WIN32`` is no longer set by ``llvm/Config/config.h`` and ``llvm/Config/llvm-config.h``. If you used this macro, use the compiler-set ``_WIN32`` instead which is set exactly when ``LLVM_ON_WIN32`` used to be set. * The ``DEBUG`` macro has been renamed to ``LLVM_DEBUG``, the interface remains the same. If you used this macro you need to migrate to the new one. You should also clang-format your code to make it easier to integrate future changes locally. This can be done with the following bash commands: .. code-block:: bash git grep -l 'DEBUG' | xargs perl -pi -e 's/\bDEBUG\s?\(/LLVM_DEBUG(/g' git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM -* Early support for UBsan, X-Ray instrumentation and libFuzzer (x86 and x86_64) for OpenBSD. Support for MSan - (x86_64), X-Ray instrumentation and libFuzzer (x86 and x86_64) for FreeBSD. +* Early support for UBsan, X-Ray instrumentation and libFuzzer (x86 and x86_64) + for OpenBSD. Support for MSan (x86_64), X-Ray instrumentation and libFuzzer + (x86 and x86_64) for FreeBSD. * ``SmallVector`` shrank from ``sizeof(void*) * 4 + sizeof(T)`` to ``sizeof(void*) + sizeof(unsigned) * 2``, smaller than ``std::vector`` on 64-bit platforms. The maximum capacity is now restricted to ``UINT32_MAX``. Since SmallVector doesn't have the exception-safety pessimizations some - implementations saddle std::vector with and is better at using ``realloc``, - it's now a better choice even on the heap (although when TinyPtrVector works, - it's even smaller). + implementations saddle ``std::vector`` with and is better at using ``realloc``, + it's now a better choice even on the heap (although when ``TinyPtrVector`` works, + that's even smaller). * Preliminary/experimental support for DWARF v5 debugging information, - including the new .debug_names accelerator table. DWARF emitted at ``-O0`` + including the new ``.debug_names`` accelerator table. DWARF emitted at ``-O0`` should be fully DWARF v5 compliant. Type units and split DWARF are known not to be compliant, and higher optimization levels will still emit some information in v4 format. * Added support for the ``.rva`` assembler directive for COFF targets. * The :program:`llvm-rc` tool (Windows Resource Compiler) has been improved a bit. There are still known missing features, but it is generally usable in many cases. (The tool still doesn't preprocess input files automatically, but it can now handle leftover C declarations in preprocessor output, if given output from a preprocessor run externally.) 
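As a quick illustration of the ``DEBUG`` to ``LLVM_DEBUG`` rename described above, here is a
  minimal sketch of what migrated debug logging looks like (the pass name ``my-pass`` and the
  message are hypothetical; the macro and ``dbgs()`` come from ``llvm/Support/Debug.h``):

  .. code-block:: c++

     #include "llvm/Support/Debug.h"
     #include "llvm/Support/raw_ostream.h"

     #define DEBUG_TYPE "my-pass"  // hypothetical name, matched by -debug-only=my-pass

     void visitBlock() {
       // Before LLVM 7 this was spelled: DEBUG(dbgs() << "visiting block\n");
       LLVM_DEBUG(llvm::dbgs() << "visiting block\n");
     }

  As with the old macro, the statement is compiled out of ``NDEBUG`` builds and only prints when
  ``-debug`` (or ``-debug-only=my-pass``) is given.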
-* CodeView debug info can now be emitted MinGW configurations, if requested. +* CodeView debug info can now be emitted for MinGW configurations, if requested. -* Note.. +* The :program:`opt` tool now supports the ``-load-pass-plugin`` option for + loading pass plugins for the new PassManager. -.. NOTE - If you would like to document a larger change, then you can add a - subsection about it right here. You can copy the following boilerplate - and un-indent it (the indentation causes it to be inside this comment). +* Support for profiling JITed code with perf. - Special New Feature - ------------------- - Makes programs 10x faster by doing Special New Thing. - Changes to the LLVM IR ---------------------- -* The signatures for the builtins @llvm.memcpy, @llvm.memmove, and @llvm.memset - have changed. Alignment is no longer an argument, and are instead conveyed as - parameter attributes. +* The signatures for the builtins ``@llvm.memcpy``, ``@llvm.memmove``, and + ``@llvm.memset`` have changed. Alignment is no longer an argument, and are + instead conveyed as parameter attributes. -* invariant.group.barrier has been renamed to launder.invariant.group. +* ``invariant.group.barrier`` has been renamed to ``launder.invariant.group``. -* invariant.group metadata can now refer only empty metadata nodes. +* ``invariant.group`` metadata can now refer only to empty metadata nodes. Changes to the AArch64 Target ----------------------------- * The ``.inst`` assembler directive is now usable on both COFF and Mach-O targets, in addition to ELF. -* Support for most remaining COFF relocations have been added. +* Support for most remaining COFF relocations has been added. * Support for TLS on Windows has been added. +* Assembler and disassembler support for the ARM Scalable Vector Extension has + been added. + Changes to the ARM Target ------------------------- * The ``.inst`` assembler directive is now usable on both COFF and Mach-O targets, in addition to ELF. For Thumb, it can now also automatically deduce the instruction size, without having to specify it with e.g. ``.inst.w`` as before. Changes to the Hexagon Target ----------------------------- * Hexagon now supports auto-vectorization for HVX. It is disabled by default and can be turned on with ``-fvectorize``. For auto-vectorization to take effect, code generation for HVX needs to be enabled with ``-mhvx``. The complete set of options should include ``-fvectorize``, ``-mhvx``, and ``-mhvx-length={64b|128b}``. * The support for Hexagon ISA V4 is deprecated and will be removed in the next release. Changes to the MIPS Target -------------------------- - During this release ... +During this release the MIPS target has: +* Added support for Virtualization, Global INValidate ASE, + and CRC ASE instructions. +* Introduced definitions of ``[d]rem``, ``[d]remu``, + and microMIPSR6 ``ll/sc`` instructions. + +* Shrink-wrapping is now supported and enabled by default (except for ``-O0``). + +* Extended size reduction pass by the LWP and SWP instructions. + +* Gained initial support of GlobalISel instruction selection framework. + +* Updated the P5600 scheduler model not to use instruction itineraries. + +* Added disassembly support for comparison and fused (negative) multiply + ``add/sub`` instructions. + +* Improved the selection of multiple instructions. + +* Load/store ``lb``, ``sb``, ``ld``, ``sd``, ``lld``, ... instructions + now support 32/64-bit offsets. + +* Added support for ``y``, ``M``, and ``L`` inline assembler operand codes. 
+ +* Extended list of relocations supported by the ``.reloc`` directive + +* Fixed using a wrong register class for creating an emergency + spill slot for mips3 / n64 ABI. + +* MIPS relocation types were generated for microMIPS code. + +* Corrected definitions of multiple instructions (``lwp``, ``swp``, ``ctc2``, + ``cfc2``, ``sync``, ``synci``, ``cvt.d.w``, ...). + +* Fixed atomic operations at ``-O0`` level. + +* Fixed local dynamic TLS with Sym64 + Changes to the PowerPC Target ----------------------------- - During this release ... +During this release the PowerPC target has: +* Replaced the list scheduler for post register allocation with the machine scheduler. + +* Added support for ``coldcc`` calling convention. + +* Added support for ``symbol@high`` and ``symbol@higha`` symbol modifiers. + +* Added support for quad-precision floating point type (``__float128``) under the llvm option ``-enable-ppc-quad-precision``. + +* Added dump function to ``LatencyPriorityQueue``. + +* Completed the Power9 scheduler model. + +* Optimized TLS code generation. + +* Improved MachineLICM for hoisting constant stores. + +* Improved code generation to reduce register use by using more register + immediate instructions. + +* Improved code generation to better exploit rotate-and-mask instructions. + +* Fixed the bug in dynamic loader for JIT which crashed NNVM. + +* Numerous bug fixes and code cleanups. + Changes to the SystemZ Target ----------------------------- During this release the SystemZ target has: * Added support for vector registers in inline asm statements. * Added support for stackmaps, patchpoints, and the anyregcc calling convention. * Changed the default function alignment to 16 bytes. * Improved codegen for condition code handling. * Improved instruction scheduling and microarchitecture tuning for z13/z14. * Fixed support for generating GCOV coverage data. * Fixed some codegen bugs. Changes to the X86 Target ------------------------- * The calling convention for the ``f80`` data type on MinGW targets has been fixed. Normally, the calling convention for this type is handled within clang, but if an intrinsic is used, which LLVM expands into a libcall, the proper calling convention needs to be supported in LLVM as well. (Note, on Windows, this data type is only used for long doubles in MinGW environments - in MSVC environments, long doubles are the same size as normal doubles.) -Changes to the AMDGPU Target ------------------------------ - - During this release ... - -Changes to the AVR Target ------------------------------ - - During this release ... - Changes to the OCaml bindings ----------------------------- -* Remove ``add_bb_vectorize``. +* Removed ``add_bb_vectorize``. Changes to the C API -------------------- -* Remove ``LLVMAddBBVectorizePass``. The implementation was removed and the C +* Removed ``LLVMAddBBVectorizePass``. The implementation was removed and the C interface was made a deprecated no-op in LLVM 5. Use ``LLVMAddSLPVectorizePass`` instead to get the supported SLP vectorizer. +* Expanded the OrcJIT APIs so they can register event listeners like debuggers + and profilers. + Changes to the DAG infrastructure --------------------------------- -* ADDC/ADDE/SUBC/SUBE are now deprecated and will default to expand. Backends - that wish to continue to use these opcodes should explicitely request so +* ``ADDC``/``ADDE``/``SUBC``/``SUBE`` are now deprecated and will default to expand. 
Backends + that wish to continue to use these opcodes should explicitely request to do so using ``setOperationAction`` in their ``TargetLowering``. New backends - should use UADDO/ADDCARRY/USUBO/SUBCARRY instead of the deprecated opcodes. + should use ``UADDO``/``ADDCARRY``/``USUBO``/``SUBCARRY`` instead of the deprecated opcodes. -* The SETCCE opcode has now been removed in favor of SETCCCARRY. +* The ``SETCCE`` opcode has now been removed in favor of ``SETCCCARRY``. -* TableGen now supports multi-alternative pattern fragments via the PatFrags - class. PatFrag is now derived from PatFrags, which may require minor - changes to backends that directly access PatFrag members. +* TableGen now supports multi-alternative pattern fragments via the ``PatFrags`` + class. ``PatFrag`` is now derived from ``PatFrags``, which may require minor + changes to backends that directly access ``PatFrag`` members. + External Open Source Projects Using LLVM 7 ========================================== -* A project... +Zig Programming Language +------------------------ +`Zig `_ is an open-source programming language designed +for robustness, optimality, and clarity. Zig is an alternative to C, providing +high level features such as generics, compile time function execution, partial +evaluation, and LLVM-based coroutines, while exposing low level LLVM IR +features such as aliases and intrinsics. Zig uses Clang to provide automatic +import of .h symbols - even inline functions and macros. Zig uses LLD combined +with lazily building compiler-rt to provide out-of-the-box cross-compiling for +all supported targets. + Additional Information ====================== A wide variety of additional information is available on the `LLVM web page -`_, in particular in the `documentation -`_ section. The web page also contains versions of the +`_, in particular in the `documentation +`_ section. The web page also contains versions of the API documentation which is up-to-date with the Subversion version of the source code. You can access versions of these documents specific to this release by going into the ``llvm/docs/`` directory in the LLVM tree. If you have any questions or comments about LLVM, please feel free to contact -us via the `mailing lists `_. +us via the `mailing lists `_. Index: vendor/llvm/dist-release_70/docs/index.rst =================================================================== --- vendor/llvm/dist-release_70/docs/index.rst (revision 338574) +++ vendor/llvm/dist-release_70/docs/index.rst (revision 338575) @@ -1,569 +1,564 @@ Overview ======== -.. warning:: - - If you are using a released version of LLVM, see `the download page - `_ to find your documentation. - The LLVM compiler infrastructure supports a wide range of projects, from industrial strength compilers to specialized JIT applications to small research projects. Similarly, documentation is broken down into several high-level groupings targeted at different audiences: LLVM Design & Overview ====================== Several introductory papers and presentations. .. toctree:: :hidden: LangRef :doc:`LangRef` Defines the LLVM intermediate representation. `Introduction to the LLVM Compiler`__ Presentation providing a users introduction to LLVM. .. __: http://llvm.org/pubs/2008-10-04-ACAT-LLVM-Intro.html `Intro to LLVM`__ Book chapter providing a compiler hacker's introduction to LLVM. .. __: http://www.aosabook.org/en/llvm.html `LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation`__ Design overview. .. 
__: http://llvm.org/pubs/2004-01-30-CGO-LLVM.html `LLVM: An Infrastructure for Multi-Stage Optimization`__ More details (quite old now). .. __: http://llvm.org/pubs/2002-12-LattnerMSThesis.html `Publications mentioning LLVM `_ .. User Guides =========== For those new to the LLVM system. NOTE: If you are a user who is only interested in using LLVM-based compilers, you should look into `Clang `_ or `DragonEgg `_ instead. The documentation here is intended for users who have a need to work with the intermediate LLVM representation. .. toctree:: :hidden: CMake CMakePrimer AdvancedBuilds HowToBuildOnARM HowToCrossCompileBuiltinsOnArm HowToCrossCompileLLVM CommandGuide/index GettingStarted GettingStartedVS FAQ Lexicon HowToAddABuilder yaml2obj HowToSubmitABug SphinxQuickstartTemplate Phabricator TestingGuide tutorial/index ReleaseNotes Passes YamlIO GetElementPtr Frontend/PerformanceTips MCJITDesignAndImplementation CodeOfConduct CompileCudaWithLLVM ReportingGuide Benchmarking Docker :doc:`GettingStarted` Discusses how to get up and running quickly with the LLVM infrastructure. Everything from unpacking and compilation of the distribution to execution of some tools. :doc:`CMake` An addendum to the main Getting Started guide for those using the `CMake build system `_. :doc:`HowToBuildOnARM` Notes on building and testing LLVM/Clang on ARM. :doc:`HowToCrossCompileBuiltinsOnArm` Notes on cross-building and testing the compiler-rt builtins for Arm. :doc:`HowToCrossCompileLLVM` Notes on cross-building and testing LLVM/Clang. :doc:`GettingStartedVS` An addendum to the main Getting Started guide for those using Visual Studio on Windows. :doc:`tutorial/index` Tutorials about using LLVM. Includes a tutorial about making a custom language with LLVM. :doc:`LLVM Command Guide ` A reference manual for the LLVM command line utilities ("man" pages for LLVM tools). :doc:`Passes` A list of optimizations and analyses implemented in LLVM. :doc:`FAQ` A list of common questions and problems and their solutions. :doc:`Release notes for the current release ` This describes new features, known bugs, and other limitations. :doc:`HowToSubmitABug` Instructions for properly submitting information about any bugs you run into in the LLVM system. :doc:`SphinxQuickstartTemplate` A template + tutorial for writing new Sphinx documentation. It is meant to be read in source form. :doc:`LLVM Testing Infrastructure Guide ` A reference manual for using the LLVM testing infrastructure. `How to build the C, C++, ObjC, and ObjC++ front end`__ Instructions for building the clang front-end from source. .. __: http://clang.llvm.org/get_started.html :doc:`Lexicon` Definition of acronyms, terms and concepts used in LLVM. :doc:`HowToAddABuilder` Instructions for adding new builder to LLVM buildbot master. :doc:`YamlIO` A reference guide for using LLVM's YAML I/O library. :doc:`GetElementPtr` Answers to some very frequent questions about LLVM's most frequently misunderstood instruction. :doc:`Frontend/PerformanceTips` A collection of tips for frontend authors on how to generate IR which LLVM is able to effectively optimize. :doc:`Docker` A reference for using Dockerfiles provided with LLVM. Programming Documentation ========================= For developers of applications which use LLVM as a library. .. 
toctree:: :hidden: Atomics CodingStandards CommandLine CompilerWriterInfo ExtendingLLVM HowToSetUpLLVMStyleRTTI ProgrammersManual Extensions LibFuzzer FuzzingLLVM ScudoHardenedAllocator OptBisect :doc:`LLVM Language Reference Manual ` Defines the LLVM intermediate representation and the assembly form of the different nodes. :doc:`Atomics` Information about LLVM's concurrency model. :doc:`ProgrammersManual` Introduction to the general layout of the LLVM sourcebase, important classes and APIs, and some tips & tricks. :doc:`Extensions` LLVM-specific extensions to tools and formats LLVM seeks compatibility with. :doc:`CommandLine` Provides information on using the command line parsing library. :doc:`CodingStandards` Details the LLVM coding standards and provides useful information on writing efficient C++ code. :doc:`HowToSetUpLLVMStyleRTTI` How to make ``isa<>``, ``dyn_cast<>``, etc. available for clients of your class hierarchy. :doc:`ExtendingLLVM` Look here to see how to add instructions and intrinsics to LLVM. `Doxygen generated documentation `_ (`classes `_) `Documentation for Go bindings `_ `ViewVC Repository Browser `_ .. :doc:`CompilerWriterInfo` A list of helpful links for compiler writers. :doc:`LibFuzzer` A library for writing in-process guided fuzzers. :doc:`FuzzingLLVM` Information on writing and using Fuzzers to find bugs in LLVM. :doc:`ScudoHardenedAllocator` A library that implements a security-hardened `malloc()`. :doc:`OptBisect` A command line option for debugging optimization-induced failures. .. _index-subsystem-docs: Subsystem Documentation ======================= For API clients and LLVM developers. .. toctree:: :hidden: AliasAnalysis MemorySSA BitCodeFormat BlockFrequencyTerminology BranchWeightMetadata Bugpoint CodeGenerator ExceptionHandling LinkTimeOptimization SegmentedStacks TableGenFundamentals TableGen/index DebuggingJITedCode GoldPlugin MarkedUpDisassembly SystemLibrary SourceLevelDebugging Vectorizers WritingAnLLVMBackend GarbageCollection WritingAnLLVMPass HowToUseAttributes NVPTXUsage AMDGPUUsage StackMaps InAlloca BigEndianNEON CoverageMappingFormat Statepoints MergeFunctions TypeMetadata FaultMaps MIRLangRef Coroutines GlobalISel XRay XRayExample XRayFDRFormat PDB/index CFIVerify :doc:`WritingAnLLVMPass` Information on how to write LLVM transformations and analyses. :doc:`WritingAnLLVMBackend` Information on how to write LLVM backends for machine targets. :doc:`CodeGenerator` The design and implementation of the LLVM code generator. Useful if you are working on retargetting LLVM to a new architecture, designing a new codegen pass, or enhancing existing components. :doc:`Machine IR (MIR) Format Reference Manual ` A reference manual for the MIR serialization format, which is used to test LLVM's code generation passes. :doc:`TableGen ` Describes the TableGen tool, which is used heavily by the LLVM code generator. :doc:`AliasAnalysis` Information on how to write a new alias analysis implementation or how to use existing analyses. :doc:`MemorySSA` Information about the MemorySSA utility in LLVM, as well as how to use it. :doc:`GarbageCollection` The interfaces source-language compilers should use for compiling GC'd programs. :doc:`Source Level Debugging with LLVM ` This document describes the design and philosophy behind the LLVM source-level debugger. :doc:`Vectorizers` This document describes the current status of vectorization in LLVM. :doc:`ExceptionHandling` This document describes the design and implementation of exception handling in LLVM. 
:doc:`Bugpoint` Automatic bug finder and test-case reducer description and usage information. :doc:`BitCodeFormat` This describes the file format and encoding used for LLVM "bc" files. :doc:`System Library ` This document describes the LLVM System Library (``lib/System``) and how to keep LLVM source code portable :doc:`LinkTimeOptimization` This document describes the interface between LLVM intermodular optimizer and the linker and its design :doc:`GoldPlugin` How to build your programs with link-time optimization on Linux. :doc:`DebuggingJITedCode` How to debug JITed code with GDB. :doc:`MCJITDesignAndImplementation` Describes the inner workings of MCJIT execution engine. :doc:`BranchWeightMetadata` Provides information about Branch Prediction Information. :doc:`BlockFrequencyTerminology` Provides information about terminology used in the ``BlockFrequencyInfo`` analysis pass. :doc:`SegmentedStacks` This document describes segmented stacks and how they are used in LLVM. :doc:`MarkedUpDisassembly` This document describes the optional rich disassembly output syntax. :doc:`HowToUseAttributes` Answers some questions about the new Attributes infrastructure. :doc:`NVPTXUsage` This document describes using the NVPTX backend to compile GPU kernels. :doc:`AMDGPUUsage` This document describes using the AMDGPU backend to compile GPU kernels. :doc:`StackMaps` LLVM support for mapping instruction addresses to the location of values and allowing code to be patched. :doc:`BigEndianNEON` LLVM's support for generating NEON instructions on big endian ARM targets is somewhat nonintuitive. This document explains the implementation and rationale. :doc:`CoverageMappingFormat` This describes the format and encoding used for LLVM’s code coverage mapping. :doc:`Statepoints` This describes a set of experimental extensions for garbage collection support. :doc:`MergeFunctions` Describes functions merging optimization. :doc:`InAlloca` Description of the ``inalloca`` argument attribute. :doc:`FaultMaps` LLVM support for folding control flow into faulting machine instructions. :doc:`CompileCudaWithLLVM` LLVM support for CUDA. :doc:`Coroutines` LLVM support for coroutines. :doc:`GlobalISel` This describes the prototype instruction selection replacement, GlobalISel. :doc:`XRay` High-level documentation of how to use XRay in LLVM. :doc:`XRayExample` An example of how to debug an application with XRay. :doc:`The Microsoft PDB File Format ` A detailed description of the Microsoft PDB (Program Database) file format. :doc:`CFIVerify` A description of the verification tool for Control Flow Integrity. Development Process Documentation ================================= Information about LLVM's development process. .. toctree:: :hidden: Contributing DeveloperPolicy Projects LLVMBuild HowToReleaseLLVM Packaging ReleaseProcess Phabricator :doc:`Contributing` An overview on how to contribute to LLVM. :doc:`DeveloperPolicy` The LLVM project's policy towards developers and their contributions. :doc:`Projects` How-to guide and templates for new projects that *use* the LLVM infrastructure. The templates (directory organization, Makefiles, and test tree) allow the project code to be located outside (or inside) the ``llvm/`` tree, while using LLVM header files and libraries. :doc:`LLVMBuild` Describes the LLVMBuild organization and files used by LLVM to specify component descriptions. :doc:`HowToReleaseLLVM` This is a guide to preparing LLVM releases. Most developers can ignore it. 
:doc:`ReleaseProcess` This is a guide to validate a new release, during the release process. Most developers can ignore it. :doc:`Packaging` Advice on packaging LLVM into a distribution. :doc:`Phabricator` Describes how to use the Phabricator code review tool hosted on http://reviews.llvm.org/ and its command line interface, Arcanist. Community ========= LLVM has a thriving community of friendly and helpful developers. The two primary communication mechanisms in the LLVM community are mailing lists and IRC. Mailing Lists ------------- If you can't find what you need in these docs, try consulting the mailing lists. `Developer's List (llvm-dev)`__ This list is for people who want to be included in technical discussions of LLVM. People post to this list when they have questions about writing code for or using the LLVM tools. It is relatively low volume. .. __: http://lists.llvm.org/mailman/listinfo/llvm-dev `Commits Archive (llvm-commits)`__ This list contains all commit messages that are made when LLVM developers commit code changes to the repository. It also serves as a forum for patch review (i.e. send patches here). It is useful for those who want to stay on the bleeding edge of LLVM development. This list is very high volume. .. __: http://lists.llvm.org/pipermail/llvm-commits/ `Bugs & Patches Archive (llvm-bugs)`__ This list gets emailed every time a bug is opened and closed. It is higher volume than the LLVM-dev list. .. __: http://lists.llvm.org/pipermail/llvm-bugs/ `Test Results Archive (llvm-testresults)`__ A message is automatically sent to this list by every active nightly tester when it completes. As such, this list gets email several times each day, making it a high volume list. .. __: http://lists.llvm.org/pipermail/llvm-testresults/ `LLVM Announcements List (llvm-announce)`__ This is a low volume list that provides important announcements regarding LLVM. It gets email about once a month. .. __: http://lists.llvm.org/mailman/listinfo/llvm-announce IRC --- Users and developers of the LLVM project (including subprojects such as Clang) can be found in #llvm on `irc.oftc.net `_. This channel has several bots. * Buildbot reporters * llvmbb - Bot for the main LLVM buildbot master. http://lab.llvm.org:8011/console * bb-chapuni - An individually run buildbot master. http://bb.pgr.jp/console * smooshlab - Apple's internal buildbot master. * robot - Bugzilla linker. %bug * clang-bot - A `geordi `_ instance running near-trunk clang instead of gcc. Community wide proposals ------------------------ Proposals for massive changes in how the community behaves and how the work flow can be better. .. toctree:: :hidden: CodeOfConduct Proposals/GitHubMove Proposals/VectorizationPlan :doc:`CodeOfConduct` Proposal to adopt a code of conduct on the LLVM social spaces (lists, events, IRC, etc). :doc:`Proposals/GitHubMove` Proposal to move from SVN/Git to GitHub. :doc:`Proposals/VectorizationPlan` Proposal to model the process and upgrade the infrastructure of LLVM's Loop Vectorizer. 
Indices and tables ================== * :ref:`genindex` * :ref:`search` Index: vendor/llvm/dist-release_70/lib/MC/MCParser/AsmParser.cpp =================================================================== --- vendor/llvm/dist-release_70/lib/MC/MCParser/AsmParser.cpp (revision 338574) +++ vendor/llvm/dist-release_70/lib/MC/MCParser/AsmParser.cpp (revision 338575) @@ -1,5899 +1,5899 @@ //===- AsmParser.cpp - Parser for Assembly Files --------------------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This class implements the parser for assembly files. // //===----------------------------------------------------------------------===// #include "llvm/ADT/APFloat.h" #include "llvm/ADT/APInt.h" #include "llvm/ADT/ArrayRef.h" #include "llvm/ADT/None.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallString.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/StringMap.h" #include "llvm/ADT/StringRef.h" #include "llvm/ADT/Twine.h" #include "llvm/BinaryFormat/Dwarf.h" #include "llvm/MC/MCAsmInfo.h" #include "llvm/MC/MCCodeView.h" #include "llvm/MC/MCContext.h" #include "llvm/MC/MCDirectives.h" #include "llvm/MC/MCDwarf.h" #include "llvm/MC/MCExpr.h" #include "llvm/MC/MCInstPrinter.h" #include "llvm/MC/MCInstrDesc.h" #include "llvm/MC/MCInstrInfo.h" #include "llvm/MC/MCObjectFileInfo.h" #include "llvm/MC/MCParser/AsmCond.h" #include "llvm/MC/MCParser/AsmLexer.h" #include "llvm/MC/MCParser/MCAsmLexer.h" #include "llvm/MC/MCParser/MCAsmParser.h" #include "llvm/MC/MCParser/MCAsmParserExtension.h" #include "llvm/MC/MCParser/MCAsmParserUtils.h" #include "llvm/MC/MCParser/MCParsedAsmOperand.h" #include "llvm/MC/MCParser/MCTargetAsmParser.h" #include "llvm/MC/MCRegisterInfo.h" #include "llvm/MC/MCSection.h" #include "llvm/MC/MCStreamer.h" #include "llvm/MC/MCSymbol.h" #include "llvm/MC/MCTargetOptions.h" #include "llvm/MC/MCValue.h" #include "llvm/Support/Casting.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/MD5.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/MemoryBuffer.h" #include "llvm/Support/SMLoc.h" #include "llvm/Support/SourceMgr.h" #include "llvm/Support/raw_ostream.h" #include #include #include #include #include #include #include #include #include #include #include #include #include using namespace llvm; MCAsmParserSemaCallback::~MCAsmParserSemaCallback() = default; static cl::opt AsmMacroMaxNestingDepth( "asm-macro-max-nesting-depth", cl::init(20), cl::Hidden, cl::desc("The maximum nesting depth allowed for assembly macros.")); namespace { /// Helper types for tracking macro definitions. typedef std::vector MCAsmMacroArgument; typedef std::vector MCAsmMacroArguments; /// Helper class for storing information about an active macro /// instantiation. struct MacroInstantiation { /// The location of the instantiation. SMLoc InstantiationLoc; /// The buffer where parsing should resume upon instantiation completion. int ExitBuffer; /// The location where parsing should resume upon instantiation completion. SMLoc ExitLoc; /// The depth of TheCondStack at the start of the instantiation. size_t CondStackDepth; public: MacroInstantiation(SMLoc IL, int EB, SMLoc EL, size_t CondStackDepth); }; struct ParseStatementInfo { /// The parsed operands from the last parsed statement. 
SmallVector<std::unique_ptr<MCParsedAsmOperand>, 8> ParsedOperands; /// The opcode from the last parsed instruction. unsigned Opcode = ~0U; /// Was there an error parsing the inline assembly? bool ParseError = false; SmallVectorImpl<AsmRewrite> *AsmRewrites = nullptr; ParseStatementInfo() = delete; ParseStatementInfo(SmallVectorImpl<AsmRewrite> *rewrites) : AsmRewrites(rewrites) {} }; /// The concrete assembly parser instance. class AsmParser : public MCAsmParser { private: AsmLexer Lexer; MCContext &Ctx; MCStreamer &Out; const MCAsmInfo &MAI; SourceMgr &SrcMgr; SourceMgr::DiagHandlerTy SavedDiagHandler; void *SavedDiagContext; std::unique_ptr<MCAsmParserExtension> PlatformParser; /// This is the current buffer index we're lexing from as managed by the /// SourceMgr object. unsigned CurBuffer; AsmCond TheCondState; std::vector<AsmCond> TheCondStack; /// maps directive names to handler methods in parser /// extensions. Extensions register themselves in this map by calling /// addDirectiveHandler. StringMap<ExtensionDirectiveHandler> ExtensionDirectiveMap; /// Stack of active macro instantiations. std::vector<MacroInstantiation *> ActiveMacros; /// List of bodies of anonymous macros. std::deque<MCAsmMacro> MacroLikeBodies; /// Boolean tracking whether macro substitution is enabled. unsigned MacrosEnabledFlag : 1; /// Keeps track of how many .macro's have been instantiated. unsigned NumOfMacroInstantiations; /// The values from the last parsed cpp hash file line comment if any. struct CppHashInfoTy { StringRef Filename; int64_t LineNumber = 0; SMLoc Loc; unsigned Buf = 0; }; CppHashInfoTy CppHashInfo; /// List of forward directional labels for diagnosis at the end. SmallVector<std::tuple<SMLoc, CppHashInfoTy, MCSymbol *>, 4> DirLabels; /// AssemblerDialect. ~0U means unset value and use value provided by MAI. unsigned AssemblerDialect = ~0U; /// is Darwin compatibility enabled? bool IsDarwin = false; /// Are we parsing ms-style inline assembly? bool ParsingInlineAsm = false; /// Did we already inform the user about inconsistent MD5 usage? 
bool ReportedInconsistentMD5 = false; public: AsmParser(SourceMgr &SM, MCContext &Ctx, MCStreamer &Out, const MCAsmInfo &MAI, unsigned CB); AsmParser(const AsmParser &) = delete; AsmParser &operator=(const AsmParser &) = delete; ~AsmParser() override; bool Run(bool NoInitialTextSection, bool NoFinalize = false) override; void addDirectiveHandler(StringRef Directive, ExtensionDirectiveHandler Handler) override { ExtensionDirectiveMap[Directive] = Handler; } void addAliasForDirective(StringRef Directive, StringRef Alias) override { DirectiveKindMap[Directive] = DirectiveKindMap[Alias]; } /// @name MCAsmParser Interface /// { SourceMgr &getSourceManager() override { return SrcMgr; } MCAsmLexer &getLexer() override { return Lexer; } MCContext &getContext() override { return Ctx; } MCStreamer &getStreamer() override { return Out; } CodeViewContext &getCVContext() { return Ctx.getCVContext(); } unsigned getAssemblerDialect() override { if (AssemblerDialect == ~0U) return MAI.getAssemblerDialect(); else return AssemblerDialect; } void setAssemblerDialect(unsigned i) override { AssemblerDialect = i; } void Note(SMLoc L, const Twine &Msg, SMRange Range = None) override; bool Warning(SMLoc L, const Twine &Msg, SMRange Range = None) override; bool printError(SMLoc L, const Twine &Msg, SMRange Range = None) override; const AsmToken &Lex() override; void setParsingInlineAsm(bool V) override { ParsingInlineAsm = V; Lexer.setParsingMSInlineAsm(V); } bool isParsingInlineAsm() override { return ParsingInlineAsm; } bool parseMSInlineAsm(void *AsmLoc, std::string &AsmString, unsigned &NumOutputs, unsigned &NumInputs, SmallVectorImpl<std::pair<void *, bool>> &OpDecls, SmallVectorImpl<std::string> &Constraints, SmallVectorImpl<std::string> &Clobbers, const MCInstrInfo *MII, const MCInstPrinter *IP, MCAsmParserSemaCallback &SI) override; bool parseExpression(const MCExpr *&Res); bool parseExpression(const MCExpr *&Res, SMLoc &EndLoc) override; bool parsePrimaryExpr(const MCExpr *&Res, SMLoc &EndLoc) override; bool parseParenExpression(const MCExpr *&Res, SMLoc &EndLoc) override; bool parseParenExprOfDepth(unsigned ParenDepth, const MCExpr *&Res, SMLoc &EndLoc) override; bool parseAbsoluteExpression(int64_t &Res) override; /// Parse a floating point expression using the float \p Semantics /// and set \p Res to the value. bool parseRealValue(const fltSemantics &Semantics, APInt &Res); /// Parse an identifier or string (as a quoted identifier) /// and set \p Res to the identifier contents. bool parseIdentifier(StringRef &Res) override; void eatToEndOfStatement() override; bool checkForValidSection() override; /// } private: bool isAltmacroString(SMLoc &StrLoc, SMLoc &EndLoc); void altMacroString(StringRef AltMacroStr, std::string &Res); bool parseStatement(ParseStatementInfo &Info, MCAsmParserSemaCallback *SI); bool parseCurlyBlockScope(SmallVectorImpl<AsmRewrite>& AsmStrRewrites); bool parseCppHashLineFilenameComment(SMLoc L); void checkForBadMacro(SMLoc DirectiveLoc, StringRef Name, StringRef Body, ArrayRef<MCAsmMacroParameter> Parameters); bool expandMacro(raw_svector_ostream &OS, StringRef Body, ArrayRef<MCAsmMacroParameter> Parameters, ArrayRef<MCAsmMacroArgument> A, bool EnableAtPseudoVariable, SMLoc L); /// Are macros enabled in the parser? bool areMacrosEnabled() {return MacrosEnabledFlag;} /// Control a flag in the parser that enables or disables macros. void setMacrosEnabled(bool Flag) {MacrosEnabledFlag = Flag;} /// Are we inside a macro instantiation? bool isInsideMacroInstantiation() {return !ActiveMacros.empty();} /// Handle entry to macro instantiation. /// /// \param M The macro. 
/// \param NameLoc Instantiation location. bool handleMacroEntry(const MCAsmMacro *M, SMLoc NameLoc); /// Handle exit from macro instantiation. void handleMacroExit(); /// Extract AsmTokens for a macro argument. bool parseMacroArgument(MCAsmMacroArgument &MA, bool Vararg); /// Parse all macro arguments for a given macro. bool parseMacroArguments(const MCAsmMacro *M, MCAsmMacroArguments &A); void printMacroInstantiations(); void printMessage(SMLoc Loc, SourceMgr::DiagKind Kind, const Twine &Msg, SMRange Range = None) const { ArrayRef Ranges(Range); SrcMgr.PrintMessage(Loc, Kind, Msg, Ranges); } static void DiagHandler(const SMDiagnostic &Diag, void *Context); /// Should we emit DWARF describing this assembler source? (Returns false if /// the source has .file directives, which means we don't want to generate /// info describing the assembler source itself.) bool enabledGenDwarfForAssembly(); /// Enter the specified file. This returns true on failure. bool enterIncludeFile(const std::string &Filename); /// Process the specified file for the .incbin directive. /// This returns true on failure. bool processIncbinFile(const std::string &Filename, int64_t Skip = 0, const MCExpr *Count = nullptr, SMLoc Loc = SMLoc()); /// Reset the current lexer position to that given by \p Loc. The /// current token is not set; clients should ensure Lex() is called /// subsequently. /// /// \param InBuffer If not 0, should be the known buffer id that contains the /// location. void jumpToLoc(SMLoc Loc, unsigned InBuffer = 0); /// Parse up to the end of statement and a return the contents from the /// current token until the end of the statement; the current token on exit /// will be either the EndOfStatement or EOF. StringRef parseStringToEndOfStatement() override; /// Parse until the end of a statement or a comma is encountered, /// return the contents from the current token up to the end or comma. StringRef parseStringToComma(); bool parseAssignment(StringRef Name, bool allow_redef, bool NoDeadStrip = false); unsigned getBinOpPrecedence(AsmToken::TokenKind K, MCBinaryExpr::Opcode &Kind); bool parseBinOpRHS(unsigned Precedence, const MCExpr *&Res, SMLoc &EndLoc); bool parseParenExpr(const MCExpr *&Res, SMLoc &EndLoc); bool parseBracketExpr(const MCExpr *&Res, SMLoc &EndLoc); bool parseRegisterOrRegisterNumber(int64_t &Register, SMLoc DirectiveLoc); bool parseCVFunctionId(int64_t &FunctionId, StringRef DirectiveName); bool parseCVFileId(int64_t &FileId, StringRef DirectiveName); // Generic (target and platform independent) directive parsing. 
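  // DirectiveKind enumerates the directives that the generic AsmParser handles
  // itself: data emission (.byte/.long/.quad/...), conditional assembly
  // (.if/.else/.endif), macros (.macro/.endm), DWARF line info (.file/.loc),
  // CodeView (.cv_*) and CFI (.cfi_*) directives. Target- and object-format-
  // specific directives are instead dispatched through ExtensionDirectiveMap,
  // which parser extensions populate via addDirectiveHandler().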
enum DirectiveKind { DK_NO_DIRECTIVE, // Placeholder DK_SET, DK_EQU, DK_EQUIV, DK_ASCII, DK_ASCIZ, DK_STRING, DK_BYTE, DK_SHORT, DK_RELOC, DK_VALUE, DK_2BYTE, DK_LONG, DK_INT, DK_4BYTE, DK_QUAD, DK_8BYTE, DK_OCTA, DK_DC, DK_DC_A, DK_DC_B, DK_DC_D, DK_DC_L, DK_DC_S, DK_DC_W, DK_DC_X, DK_DCB, DK_DCB_B, DK_DCB_D, DK_DCB_L, DK_DCB_S, DK_DCB_W, DK_DCB_X, DK_DS, DK_DS_B, DK_DS_D, DK_DS_L, DK_DS_P, DK_DS_S, DK_DS_W, DK_DS_X, DK_SINGLE, DK_FLOAT, DK_DOUBLE, DK_ALIGN, DK_ALIGN32, DK_BALIGN, DK_BALIGNW, DK_BALIGNL, DK_P2ALIGN, DK_P2ALIGNW, DK_P2ALIGNL, DK_ORG, DK_FILL, DK_ENDR, DK_BUNDLE_ALIGN_MODE, DK_BUNDLE_LOCK, DK_BUNDLE_UNLOCK, DK_ZERO, DK_EXTERN, DK_GLOBL, DK_GLOBAL, DK_LAZY_REFERENCE, DK_NO_DEAD_STRIP, DK_SYMBOL_RESOLVER, DK_PRIVATE_EXTERN, DK_REFERENCE, DK_WEAK_DEFINITION, DK_WEAK_REFERENCE, DK_WEAK_DEF_CAN_BE_HIDDEN, DK_COMM, DK_COMMON, DK_LCOMM, DK_ABORT, DK_INCLUDE, DK_INCBIN, DK_CODE16, DK_CODE16GCC, DK_REPT, DK_IRP, DK_IRPC, DK_IF, DK_IFEQ, DK_IFGE, DK_IFGT, DK_IFLE, DK_IFLT, DK_IFNE, DK_IFB, DK_IFNB, DK_IFC, DK_IFEQS, DK_IFNC, DK_IFNES, DK_IFDEF, DK_IFNDEF, DK_IFNOTDEF, DK_ELSEIF, DK_ELSE, DK_ENDIF, DK_SPACE, DK_SKIP, DK_FILE, DK_LINE, DK_LOC, DK_STABS, DK_CV_FILE, DK_CV_FUNC_ID, DK_CV_INLINE_SITE_ID, DK_CV_LOC, DK_CV_LINETABLE, DK_CV_INLINE_LINETABLE, DK_CV_DEF_RANGE, DK_CV_STRINGTABLE, DK_CV_FILECHECKSUMS, DK_CV_FILECHECKSUM_OFFSET, DK_CV_FPO_DATA, DK_CFI_SECTIONS, DK_CFI_STARTPROC, DK_CFI_ENDPROC, DK_CFI_DEF_CFA, DK_CFI_DEF_CFA_OFFSET, DK_CFI_ADJUST_CFA_OFFSET, DK_CFI_DEF_CFA_REGISTER, DK_CFI_OFFSET, DK_CFI_REL_OFFSET, DK_CFI_PERSONALITY, DK_CFI_LSDA, DK_CFI_REMEMBER_STATE, DK_CFI_RESTORE_STATE, DK_CFI_SAME_VALUE, DK_CFI_RESTORE, DK_CFI_ESCAPE, DK_CFI_RETURN_COLUMN, DK_CFI_SIGNAL_FRAME, DK_CFI_UNDEFINED, DK_CFI_REGISTER, DK_CFI_WINDOW_SAVE, DK_MACROS_ON, DK_MACROS_OFF, DK_ALTMACRO, DK_NOALTMACRO, DK_MACRO, DK_EXITM, DK_ENDM, DK_ENDMACRO, DK_PURGEM, DK_SLEB128, DK_ULEB128, DK_ERR, DK_ERROR, DK_WARNING, DK_PRINT, DK_ADDRSIG, DK_ADDRSIG_SYM, DK_END }; /// Maps directive name --> DirectiveKind enum, for /// directives parsed by this class. StringMap DirectiveKindMap; // ".ascii", ".asciz", ".string" bool parseDirectiveAscii(StringRef IDVal, bool ZeroTerminated); bool parseDirectiveReloc(SMLoc DirectiveLoc); // ".reloc" bool parseDirectiveValue(StringRef IDVal, unsigned Size); // ".byte", ".long", ... bool parseDirectiveOctaValue(StringRef IDVal); // ".octa", ... bool parseDirectiveRealValue(StringRef IDVal, const fltSemantics &); // ".single", ... 
bool parseDirectiveFill(); // ".fill" bool parseDirectiveZero(); // ".zero" // ".set", ".equ", ".equiv" bool parseDirectiveSet(StringRef IDVal, bool allow_redef); bool parseDirectiveOrg(); // ".org" // ".align{,32}", ".p2align{,w,l}" bool parseDirectiveAlign(bool IsPow2, unsigned ValueSize); // ".file", ".line", ".loc", ".stabs" bool parseDirectiveFile(SMLoc DirectiveLoc); bool parseDirectiveLine(); bool parseDirectiveLoc(); bool parseDirectiveStabs(); // ".cv_file", ".cv_func_id", ".cv_inline_site_id", ".cv_loc", ".cv_linetable", // ".cv_inline_linetable", ".cv_def_range" bool parseDirectiveCVFile(); bool parseDirectiveCVFuncId(); bool parseDirectiveCVInlineSiteId(); bool parseDirectiveCVLoc(); bool parseDirectiveCVLinetable(); bool parseDirectiveCVInlineLinetable(); bool parseDirectiveCVDefRange(); bool parseDirectiveCVStringTable(); bool parseDirectiveCVFileChecksums(); bool parseDirectiveCVFileChecksumOffset(); bool parseDirectiveCVFPOData(); // .cfi directives bool parseDirectiveCFIRegister(SMLoc DirectiveLoc); bool parseDirectiveCFIWindowSave(); bool parseDirectiveCFISections(); bool parseDirectiveCFIStartProc(); bool parseDirectiveCFIEndProc(); bool parseDirectiveCFIDefCfaOffset(); bool parseDirectiveCFIDefCfa(SMLoc DirectiveLoc); bool parseDirectiveCFIAdjustCfaOffset(); bool parseDirectiveCFIDefCfaRegister(SMLoc DirectiveLoc); bool parseDirectiveCFIOffset(SMLoc DirectiveLoc); bool parseDirectiveCFIRelOffset(SMLoc DirectiveLoc); bool parseDirectiveCFIPersonalityOrLsda(bool IsPersonality); bool parseDirectiveCFIRememberState(); bool parseDirectiveCFIRestoreState(); bool parseDirectiveCFISameValue(SMLoc DirectiveLoc); bool parseDirectiveCFIRestore(SMLoc DirectiveLoc); bool parseDirectiveCFIEscape(); bool parseDirectiveCFIReturnColumn(SMLoc DirectiveLoc); bool parseDirectiveCFISignalFrame(); bool parseDirectiveCFIUndefined(SMLoc DirectiveLoc); // macro directives bool parseDirectivePurgeMacro(SMLoc DirectiveLoc); bool parseDirectiveExitMacro(StringRef Directive); bool parseDirectiveEndMacro(StringRef Directive); bool parseDirectiveMacro(SMLoc DirectiveLoc); bool parseDirectiveMacrosOnOff(StringRef Directive); // alternate macro mode directives bool parseDirectiveAltmacro(StringRef Directive); // ".bundle_align_mode" bool parseDirectiveBundleAlignMode(); // ".bundle_lock" bool parseDirectiveBundleLock(); // ".bundle_unlock" bool parseDirectiveBundleUnlock(); // ".space", ".skip" bool parseDirectiveSpace(StringRef IDVal); // ".dcb" bool parseDirectiveDCB(StringRef IDVal, unsigned Size); bool parseDirectiveRealDCB(StringRef IDVal, const fltSemantics &); // ".ds" bool parseDirectiveDS(StringRef IDVal, unsigned Size); // .sleb128 (Signed=true) and .uleb128 (Signed=false) bool parseDirectiveLEB128(bool Signed); /// Parse a directive like ".globl" which /// accepts a single symbol (which should be a label or an external). bool parseDirectiveSymbolAttribute(MCSymbolAttr Attr); bool parseDirectiveComm(bool IsLocal); // ".comm" and ".lcomm" bool parseDirectiveAbort(); // ".abort" bool parseDirectiveInclude(); // ".include" bool parseDirectiveIncbin(); // ".incbin" // ".if", ".ifeq", ".ifge", ".ifgt" , ".ifle", ".iflt" or ".ifne" bool parseDirectiveIf(SMLoc DirectiveLoc, DirectiveKind DirKind); // ".ifb" or ".ifnb", depending on ExpectBlank. bool parseDirectiveIfb(SMLoc DirectiveLoc, bool ExpectBlank); // ".ifc" or ".ifnc", depending on ExpectEqual. bool parseDirectiveIfc(SMLoc DirectiveLoc, bool ExpectEqual); // ".ifeqs" or ".ifnes", depending on ExpectEqual. 
bool parseDirectiveIfeqs(SMLoc DirectiveLoc, bool ExpectEqual); // ".ifdef" or ".ifndef", depending on expect_defined bool parseDirectiveIfdef(SMLoc DirectiveLoc, bool expect_defined); bool parseDirectiveElseIf(SMLoc DirectiveLoc); // ".elseif" bool parseDirectiveElse(SMLoc DirectiveLoc); // ".else" bool parseDirectiveEndIf(SMLoc DirectiveLoc); // .endif bool parseEscapedString(std::string &Data) override; const MCExpr *applyModifierToExpr(const MCExpr *E, MCSymbolRefExpr::VariantKind Variant); // Macro-like directives MCAsmMacro *parseMacroLikeBody(SMLoc DirectiveLoc); void instantiateMacroLikeBody(MCAsmMacro *M, SMLoc DirectiveLoc, raw_svector_ostream &OS); bool parseDirectiveRept(SMLoc DirectiveLoc, StringRef Directive); bool parseDirectiveIrp(SMLoc DirectiveLoc); // ".irp" bool parseDirectiveIrpc(SMLoc DirectiveLoc); // ".irpc" bool parseDirectiveEndr(SMLoc DirectiveLoc); // ".endr" // "_emit" or "__emit" bool parseDirectiveMSEmit(SMLoc DirectiveLoc, ParseStatementInfo &Info, size_t Len); // "align" bool parseDirectiveMSAlign(SMLoc DirectiveLoc, ParseStatementInfo &Info); // "end" bool parseDirectiveEnd(SMLoc DirectiveLoc); // ".err" or ".error" bool parseDirectiveError(SMLoc DirectiveLoc, bool WithMessage); // ".warning" bool parseDirectiveWarning(SMLoc DirectiveLoc); // .print bool parseDirectivePrint(SMLoc DirectiveLoc); // Directives to support address-significance tables. bool parseDirectiveAddrsig(); bool parseDirectiveAddrsigSym(); void initializeDirectiveKindMap(); }; } // end anonymous namespace namespace llvm { extern MCAsmParserExtension *createDarwinAsmParser(); extern MCAsmParserExtension *createELFAsmParser(); extern MCAsmParserExtension *createCOFFAsmParser(); } // end namespace llvm enum { DEFAULT_ADDRSPACE = 0 }; AsmParser::AsmParser(SourceMgr &SM, MCContext &Ctx, MCStreamer &Out, const MCAsmInfo &MAI, unsigned CB = 0) : Lexer(MAI), Ctx(Ctx), Out(Out), MAI(MAI), SrcMgr(SM), CurBuffer(CB ? CB : SM.getMainFileID()), MacrosEnabledFlag(true) { HadError = false; // Save the old handler. SavedDiagHandler = SrcMgr.getDiagHandler(); SavedDiagContext = SrcMgr.getDiagContext(); // Set our own handler which calls the saved handler. SrcMgr.setDiagHandler(DiagHandler, this); Lexer.setBuffer(SrcMgr.getMemoryBuffer(CurBuffer)->getBuffer()); // Initialize the platform / file format parser. switch (Ctx.getObjectFileInfo()->getObjectFileType()) { case MCObjectFileInfo::IsCOFF: PlatformParser.reset(createCOFFAsmParser()); break; case MCObjectFileInfo::IsMachO: PlatformParser.reset(createDarwinAsmParser()); IsDarwin = true; break; case MCObjectFileInfo::IsELF: PlatformParser.reset(createELFAsmParser()); break; case MCObjectFileInfo::IsWasm: // TODO: WASM will need its own MCAsmParserExtension implementation, but // for now we can re-use the ELF one, since the directives can be the // same for now. PlatformParser.reset(createELFAsmParser()); break; } PlatformParser->Initialize(*this); initializeDirectiveKindMap(); NumOfMacroInstantiations = 0; } AsmParser::~AsmParser() { assert((HadError || ActiveMacros.empty()) && "Unexpected active macro instantiation!"); // Restore the saved diagnostics handler and context for use during // finalization. SrcMgr.setDiagHandler(SavedDiagHandler, SavedDiagContext); } void AsmParser::printMacroInstantiations() { // Print the active macro instantiation stack. 
for (std::vector::const_reverse_iterator it = ActiveMacros.rbegin(), ie = ActiveMacros.rend(); it != ie; ++it) printMessage((*it)->InstantiationLoc, SourceMgr::DK_Note, "while in macro instantiation"); } void AsmParser::Note(SMLoc L, const Twine &Msg, SMRange Range) { printPendingErrors(); printMessage(L, SourceMgr::DK_Note, Msg, Range); printMacroInstantiations(); } bool AsmParser::Warning(SMLoc L, const Twine &Msg, SMRange Range) { if(getTargetParser().getTargetOptions().MCNoWarn) return false; if (getTargetParser().getTargetOptions().MCFatalWarnings) return Error(L, Msg, Range); printMessage(L, SourceMgr::DK_Warning, Msg, Range); printMacroInstantiations(); return false; } bool AsmParser::printError(SMLoc L, const Twine &Msg, SMRange Range) { HadError = true; printMessage(L, SourceMgr::DK_Error, Msg, Range); printMacroInstantiations(); return true; } bool AsmParser::enterIncludeFile(const std::string &Filename) { std::string IncludedFile; unsigned NewBuf = SrcMgr.AddIncludeFile(Filename, Lexer.getLoc(), IncludedFile); if (!NewBuf) return true; CurBuffer = NewBuf; Lexer.setBuffer(SrcMgr.getMemoryBuffer(CurBuffer)->getBuffer()); return false; } /// Process the specified .incbin file by searching for it in the include paths /// then just emitting the byte contents of the file to the streamer. This /// returns true on failure. bool AsmParser::processIncbinFile(const std::string &Filename, int64_t Skip, const MCExpr *Count, SMLoc Loc) { std::string IncludedFile; unsigned NewBuf = SrcMgr.AddIncludeFile(Filename, Lexer.getLoc(), IncludedFile); if (!NewBuf) return true; // Pick up the bytes from the file and emit them. StringRef Bytes = SrcMgr.getMemoryBuffer(NewBuf)->getBuffer(); Bytes = Bytes.drop_front(Skip); if (Count) { int64_t Res; if (!Count->evaluateAsAbsolute(Res, getStreamer().getAssemblerPtr())) return Error(Loc, "expected absolute expression"); if (Res < 0) return Warning(Loc, "negative count has no effect"); Bytes = Bytes.take_front(Res); } getStreamer().EmitBytes(Bytes); return false; } void AsmParser::jumpToLoc(SMLoc Loc, unsigned InBuffer) { CurBuffer = InBuffer ? InBuffer : SrcMgr.FindBufferContainingLoc(Loc); Lexer.setBuffer(SrcMgr.getMemoryBuffer(CurBuffer)->getBuffer(), Loc.getPointer()); } const AsmToken &AsmParser::Lex() { if (Lexer.getTok().is(AsmToken::Error)) Error(Lexer.getErrLoc(), Lexer.getErr()); // if it's a end of statement with a comment in it if (getTok().is(AsmToken::EndOfStatement)) { // if this is a line comment output it. if (!getTok().getString().empty() && getTok().getString().front() != '\n' && getTok().getString().front() != '\r' && MAI.preserveAsmComments()) Out.addExplicitComment(Twine(getTok().getString())); } const AsmToken *tok = &Lexer.Lex(); // Parse comments here to be deferred until end of next statement. while (tok->is(AsmToken::Comment)) { if (MAI.preserveAsmComments()) Out.addExplicitComment(Twine(tok->getString())); tok = &Lexer.Lex(); } if (tok->is(AsmToken::Eof)) { // If this is the end of an included file, pop the parent file off the // include stack. SMLoc ParentIncludeLoc = SrcMgr.getParentIncludeLoc(CurBuffer); if (ParentIncludeLoc != SMLoc()) { jumpToLoc(ParentIncludeLoc); return Lex(); } } return *tok; } bool AsmParser::enabledGenDwarfForAssembly() { // Check whether the user specified -g. 
if (!getContext().getGenDwarfForAssembly()) return false; // If we haven't encountered any .file directives (which would imply that // the assembler source was produced with debug info already) then emit one // describing the assembler source file itself. if (getContext().getGenDwarfFileNumber() == 0) getContext().setGenDwarfFileNumber(getStreamer().EmitDwarfFileDirective( 0, StringRef(), getContext().getMainFileName())); return true; } bool AsmParser::Run(bool NoInitialTextSection, bool NoFinalize) { // Create the initial section, if requested. if (!NoInitialTextSection) Out.InitSections(false); // Prime the lexer. Lex(); HadError = false; AsmCond StartingCondState = TheCondState; SmallVector AsmStrRewrites; // If we are generating dwarf for assembly source files save the initial text // section. (Don't use enabledGenDwarfForAssembly() here, as we aren't // emitting any actual debug info yet and haven't had a chance to parse any // embedded .file directives.) if (getContext().getGenDwarfForAssembly()) { MCSection *Sec = getStreamer().getCurrentSectionOnly(); if (!Sec->getBeginSymbol()) { MCSymbol *SectionStartSym = getContext().createTempSymbol(); getStreamer().EmitLabel(SectionStartSym); Sec->setBeginSymbol(SectionStartSym); } bool InsertResult = getContext().addGenDwarfSection(Sec); assert(InsertResult && ".text section should not have debug info yet"); (void)InsertResult; } // While we have input, parse each statement. while (Lexer.isNot(AsmToken::Eof)) { ParseStatementInfo Info(&AsmStrRewrites); if (!parseStatement(Info, nullptr)) continue; // If we have a Lexer Error we are on an Error Token. Load in Lexer Error // for printing ErrMsg via Lex() only if no (presumably better) parser error // exists. if (!hasPendingError() && Lexer.getTok().is(AsmToken::Error)) { Lex(); } // parseStatement returned true so may need to emit an error. printPendingErrors(); // Skipping to the next line if needed. if (!getLexer().isAtStartOfStatement()) eatToEndOfStatement(); } // All errors should have been emitted. assert(!hasPendingError() && "unexpected error from parseStatement"); getTargetParser().flushPendingInstructions(getStreamer()); if (TheCondState.TheCond != StartingCondState.TheCond || TheCondState.Ignore != StartingCondState.Ignore) printError(getTok().getLoc(), "unmatched .ifs or .elses"); // Check to see there are no empty DwarfFile slots. const auto &LineTables = getContext().getMCDwarfLineTables(); if (!LineTables.empty()) { unsigned Index = 0; for (const auto &File : LineTables.begin()->second.getMCDwarfFiles()) { if (File.Name.empty() && Index != 0) printError(getTok().getLoc(), "unassigned file number: " + Twine(Index) + " for .file directives"); ++Index; } } // Check to see that all assembler local symbols were actually defined. // Targets that don't do subsections via symbols may not want this, though, // so conservatively exclude them. Only do this if we're finalizing, though, // as otherwise we won't necessarilly have seen everything yet. if (!NoFinalize) { if (MAI.hasSubsectionsViaSymbols()) { for (const auto &TableEntry : getContext().getSymbols()) { MCSymbol *Sym = TableEntry.getValue(); // Variable symbols may not be marked as defined, so check those // explicitly. If we know it's a variable, we have a definition for // the purposes of this check. if (Sym->isTemporary() && !Sym->isVariable() && !Sym->isDefined()) // FIXME: We would really like to refer back to where the symbol was // first referenced for a source location. We need to add something // to track that. 
Currently, we just point to the end of the file. printError(getTok().getLoc(), "assembler local symbol '" + Sym->getName() + "' not defined"); } } // Temporary symbols like the ones for directional jumps don't go in the // symbol table. They also need to be diagnosed in all (final) cases. for (std::tuple &LocSym : DirLabels) { if (std::get<2>(LocSym)->isUndefined()) { // Reset the state of any "# line file" directives we've seen to the // context as it was at the diagnostic site. CppHashInfo = std::get<1>(LocSym); printError(std::get<0>(LocSym), "directional label undefined"); } } } // Finalize the output stream if there are no errors and if the client wants // us to. if (!HadError && !NoFinalize) Out.Finish(); return HadError || getContext().hadError(); } bool AsmParser::checkForValidSection() { if (!ParsingInlineAsm && !getStreamer().getCurrentSectionOnly()) { Out.InitSections(false); return Error(getTok().getLoc(), "expected section directive before assembly directive"); } return false; } /// Throw away the rest of the line for testing purposes. void AsmParser::eatToEndOfStatement() { while (Lexer.isNot(AsmToken::EndOfStatement) && Lexer.isNot(AsmToken::Eof)) Lexer.Lex(); // Eat EOL. if (Lexer.is(AsmToken::EndOfStatement)) Lexer.Lex(); } StringRef AsmParser::parseStringToEndOfStatement() { const char *Start = getTok().getLoc().getPointer(); while (Lexer.isNot(AsmToken::EndOfStatement) && Lexer.isNot(AsmToken::Eof)) Lexer.Lex(); const char *End = getTok().getLoc().getPointer(); return StringRef(Start, End - Start); } StringRef AsmParser::parseStringToComma() { const char *Start = getTok().getLoc().getPointer(); while (Lexer.isNot(AsmToken::EndOfStatement) && Lexer.isNot(AsmToken::Comma) && Lexer.isNot(AsmToken::Eof)) Lexer.Lex(); const char *End = getTok().getLoc().getPointer(); return StringRef(Start, End - Start); } /// Parse a paren expression and return it. /// NOTE: This assumes the leading '(' has already been consumed. /// /// parenexpr ::= expr) /// bool AsmParser::parseParenExpr(const MCExpr *&Res, SMLoc &EndLoc) { if (parseExpression(Res)) return true; if (Lexer.isNot(AsmToken::RParen)) return TokError("expected ')' in parentheses expression"); EndLoc = Lexer.getTok().getEndLoc(); Lex(); return false; } /// Parse a bracket expression and return it. /// NOTE: This assumes the leading '[' has already been consumed. /// /// bracketexpr ::= expr] /// bool AsmParser::parseBracketExpr(const MCExpr *&Res, SMLoc &EndLoc) { if (parseExpression(Res)) return true; EndLoc = getTok().getEndLoc(); if (parseToken(AsmToken::RBrac, "expected ']' in brackets expression")) return true; return false; } /// Parse a primary expression and return it. /// primaryexpr ::= (parenexpr /// primaryexpr ::= symbol /// primaryexpr ::= number /// primaryexpr ::= '.' /// primaryexpr ::= ~,+,- primaryexpr bool AsmParser::parsePrimaryExpr(const MCExpr *&Res, SMLoc &EndLoc) { SMLoc FirstTokenLoc = getLexer().getLoc(); AsmToken::TokenKind FirstTokenKind = Lexer.getKind(); switch (FirstTokenKind) { default: return TokError("unknown token in expression"); // If we have an error assume that we've already handled it. case AsmToken::Error: return true; case AsmToken::Exclaim: Lex(); // Eat the operator. 
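      // For example, given '!0' the '!' has just been consumed; the recursive
      // call below parses '0' and wraps the result in a logical-not
      // MCUnaryExpr.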
if (parsePrimaryExpr(Res, EndLoc)) return true; Res = MCUnaryExpr::createLNot(Res, getContext(), FirstTokenLoc); return false; case AsmToken::Dollar: case AsmToken::At: case AsmToken::String: case AsmToken::Identifier: { StringRef Identifier; if (parseIdentifier(Identifier)) { // We may have failed but $ may be a valid token. if (getTok().is(AsmToken::Dollar)) { if (Lexer.getMAI().getDollarIsPC()) { Lex(); // This is a '$' reference, which references the current PC. Emit a // temporary label to the streamer and refer to it. MCSymbol *Sym = Ctx.createTempSymbol(); Out.EmitLabel(Sym); Res = MCSymbolRefExpr::create(Sym, MCSymbolRefExpr::VK_None, getContext()); EndLoc = FirstTokenLoc; return false; } return Error(FirstTokenLoc, "invalid token in expression"); } } // Parse symbol variant std::pair Split; if (!MAI.useParensForSymbolVariant()) { if (FirstTokenKind == AsmToken::String) { if (Lexer.is(AsmToken::At)) { Lex(); // eat @ SMLoc AtLoc = getLexer().getLoc(); StringRef VName; if (parseIdentifier(VName)) return Error(AtLoc, "expected symbol variant after '@'"); Split = std::make_pair(Identifier, VName); } } else { Split = Identifier.split('@'); } } else if (Lexer.is(AsmToken::LParen)) { Lex(); // eat '('. StringRef VName; parseIdentifier(VName); // eat ')'. if (parseToken(AsmToken::RParen, "unexpected token in variant, expected ')'")) return true; Split = std::make_pair(Identifier, VName); } EndLoc = SMLoc::getFromPointer(Identifier.end()); // This is a symbol reference. StringRef SymbolName = Identifier; if (SymbolName.empty()) return true; MCSymbolRefExpr::VariantKind Variant = MCSymbolRefExpr::VK_None; // Lookup the symbol variant if used. if (!Split.second.empty()) { Variant = MCSymbolRefExpr::getVariantKindForName(Split.second); if (Variant != MCSymbolRefExpr::VK_Invalid) { SymbolName = Split.first; } else if (MAI.doesAllowAtInName() && !MAI.useParensForSymbolVariant()) { Variant = MCSymbolRefExpr::VK_None; } else { return Error(SMLoc::getFromPointer(Split.second.begin()), "invalid variant '" + Split.second + "'"); } } MCSymbol *Sym = getContext().getOrCreateSymbol(SymbolName); // If this is an absolute variable reference, substitute it now to preserve // semantics in the face of reassignment. if (Sym->isVariable()) { auto V = Sym->getVariableValue(/*SetUsed*/ false); bool DoInline = isa(V); if (auto TV = dyn_cast(V)) DoInline = TV->inlineAssignedExpr(); if (DoInline) { if (Variant) return Error(EndLoc, "unexpected modifier on variable reference"); Res = Sym->getVariableValue(/*SetUsed*/ false); return false; } } // Otherwise create a symbol ref. Res = MCSymbolRefExpr::create(Sym, Variant, getContext(), FirstTokenLoc); return false; } case AsmToken::BigNum: return TokError("literal value out of range for directive"); case AsmToken::Integer: { SMLoc Loc = getTok().getLoc(); int64_t IntVal = getTok().getIntVal(); Res = MCConstantExpr::create(IntVal, getContext()); EndLoc = Lexer.getTok().getEndLoc(); Lex(); // Eat token. // Look for 'b' or 'f' following an Integer as a directional label if (Lexer.getKind() == AsmToken::Identifier) { StringRef IDVal = getTok().getString(); // Lookup the symbol variant if used. 
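    // Directional label references also come through here: for '1b' or '1f'
    // the lexer produces the integer '1' followed by the identifier 'b'/'f',
    // where 'b' refers back to the most recent '1:' definition and 'f' refers
    // forward to the next one.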
std::pair Split = IDVal.split('@'); MCSymbolRefExpr::VariantKind Variant = MCSymbolRefExpr::VK_None; if (Split.first.size() != IDVal.size()) { Variant = MCSymbolRefExpr::getVariantKindForName(Split.second); if (Variant == MCSymbolRefExpr::VK_Invalid) return TokError("invalid variant '" + Split.second + "'"); IDVal = Split.first; } if (IDVal == "f" || IDVal == "b") { MCSymbol *Sym = Ctx.getDirectionalLocalSymbol(IntVal, IDVal == "b"); Res = MCSymbolRefExpr::create(Sym, Variant, getContext()); if (IDVal == "b" && Sym->isUndefined()) return Error(Loc, "directional label undefined"); DirLabels.push_back(std::make_tuple(Loc, CppHashInfo, Sym)); EndLoc = Lexer.getTok().getEndLoc(); Lex(); // Eat identifier. } } return false; } case AsmToken::Real: { APFloat RealVal(APFloat::IEEEdouble(), getTok().getString()); uint64_t IntVal = RealVal.bitcastToAPInt().getZExtValue(); Res = MCConstantExpr::create(IntVal, getContext()); EndLoc = Lexer.getTok().getEndLoc(); Lex(); // Eat token. return false; } case AsmToken::Dot: { // This is a '.' reference, which references the current PC. Emit a // temporary label to the streamer and refer to it. MCSymbol *Sym = Ctx.createTempSymbol(); Out.EmitLabel(Sym); Res = MCSymbolRefExpr::create(Sym, MCSymbolRefExpr::VK_None, getContext()); EndLoc = Lexer.getTok().getEndLoc(); Lex(); // Eat identifier. return false; } case AsmToken::LParen: Lex(); // Eat the '('. return parseParenExpr(Res, EndLoc); case AsmToken::LBrac: if (!PlatformParser->HasBracketExpressions()) return TokError("brackets expression not supported on this target"); Lex(); // Eat the '['. return parseBracketExpr(Res, EndLoc); case AsmToken::Minus: Lex(); // Eat the operator. if (parsePrimaryExpr(Res, EndLoc)) return true; Res = MCUnaryExpr::createMinus(Res, getContext(), FirstTokenLoc); return false; case AsmToken::Plus: Lex(); // Eat the operator. if (parsePrimaryExpr(Res, EndLoc)) return true; Res = MCUnaryExpr::createPlus(Res, getContext(), FirstTokenLoc); return false; case AsmToken::Tilde: Lex(); // Eat the operator. if (parsePrimaryExpr(Res, EndLoc)) return true; Res = MCUnaryExpr::createNot(Res, getContext(), FirstTokenLoc); return false; // MIPS unary expression operators. The lexer won't generate these tokens if // MCAsmInfo::HasMipsExpressions is false for the target. case AsmToken::PercentCall16: case AsmToken::PercentCall_Hi: case AsmToken::PercentCall_Lo: case AsmToken::PercentDtprel_Hi: case AsmToken::PercentDtprel_Lo: case AsmToken::PercentGot: case AsmToken::PercentGot_Disp: case AsmToken::PercentGot_Hi: case AsmToken::PercentGot_Lo: case AsmToken::PercentGot_Ofst: case AsmToken::PercentGot_Page: case AsmToken::PercentGottprel: case AsmToken::PercentGp_Rel: case AsmToken::PercentHi: case AsmToken::PercentHigher: case AsmToken::PercentHighest: case AsmToken::PercentLo: case AsmToken::PercentNeg: case AsmToken::PercentPcrel_Hi: case AsmToken::PercentPcrel_Lo: case AsmToken::PercentTlsgd: case AsmToken::PercentTlsldm: case AsmToken::PercentTprel_Hi: case AsmToken::PercentTprel_Lo: Lex(); // Eat the operator. if (Lexer.isNot(AsmToken::LParen)) return TokError("expected '(' after operator"); Lex(); // Eat the operator. if (parseExpression(Res, EndLoc)) return true; if (Lexer.isNot(AsmToken::RParen)) return TokError("expected ')'"); Lex(); // Eat the operator. 
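    // For example, '%hi(sym)': the lexer produced the PercentHi token, the
    // parentheses and inner expression were consumed above, and the target
    // parser now wraps the result in a target-specific unary expression.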
Res = getTargetParser().createTargetUnaryExpr(Res, FirstTokenKind, Ctx); return !Res; } } bool AsmParser::parseExpression(const MCExpr *&Res) { SMLoc EndLoc; return parseExpression(Res, EndLoc); } const MCExpr * AsmParser::applyModifierToExpr(const MCExpr *E, MCSymbolRefExpr::VariantKind Variant) { // Ask the target implementation about this expression first. const MCExpr *NewE = getTargetParser().applyModifierToExpr(E, Variant, Ctx); if (NewE) return NewE; // Recurse over the given expression, rebuilding it to apply the given variant // if there is exactly one symbol. switch (E->getKind()) { case MCExpr::Target: case MCExpr::Constant: return nullptr; case MCExpr::SymbolRef: { const MCSymbolRefExpr *SRE = cast(E); if (SRE->getKind() != MCSymbolRefExpr::VK_None) { TokError("invalid variant on expression '" + getTok().getIdentifier() + "' (already modified)"); return E; } return MCSymbolRefExpr::create(&SRE->getSymbol(), Variant, getContext()); } case MCExpr::Unary: { const MCUnaryExpr *UE = cast(E); const MCExpr *Sub = applyModifierToExpr(UE->getSubExpr(), Variant); if (!Sub) return nullptr; return MCUnaryExpr::create(UE->getOpcode(), Sub, getContext()); } case MCExpr::Binary: { const MCBinaryExpr *BE = cast(E); const MCExpr *LHS = applyModifierToExpr(BE->getLHS(), Variant); const MCExpr *RHS = applyModifierToExpr(BE->getRHS(), Variant); if (!LHS && !RHS) return nullptr; if (!LHS) LHS = BE->getLHS(); if (!RHS) RHS = BE->getRHS(); return MCBinaryExpr::create(BE->getOpcode(), LHS, RHS, getContext()); } } llvm_unreachable("Invalid expression kind!"); } /// This function checks if the next token is type or arithmetic. /// string that begin with character '<' must end with character '>'. /// otherwise it is arithmetics. /// If the function returns a 'true' value, /// the End argument will be filled with the last location pointed to the '>' /// character. /// There is a gap between the AltMacro's documentation and the single quote implementation. /// GCC does not fully support this feature and so we will not support it. /// TODO: Adding single quote as a string. bool AsmParser::isAltmacroString(SMLoc &StrLoc, SMLoc &EndLoc) { assert((StrLoc.getPointer() != NULL) && "Argument to the function cannot be a NULL value"); const char *CharPtr = StrLoc.getPointer(); while ((*CharPtr != '>') && (*CharPtr != '\n') && (*CharPtr != '\r') && (*CharPtr != '\0')) { if (*CharPtr == '!') CharPtr++; CharPtr++; } if (*CharPtr == '>') { EndLoc = StrLoc.getFromPointer(CharPtr + 1); return true; } return false; } /// creating a string without the escape characters '!'. void AsmParser::altMacroString(StringRef AltMacroStr,std::string &Res) { for (size_t Pos = 0; Pos < AltMacroStr.size(); Pos++) { if (AltMacroStr[Pos] == '!') Pos++; Res += AltMacroStr[Pos]; } } /// Parse an expression and return it. /// /// expr ::= expr &&,|| expr -> lowest. /// expr ::= expr |,^,&,! expr /// expr ::= expr ==,!=,<>,<,<=,>,>= expr /// expr ::= expr <<,>> expr /// expr ::= expr +,- expr /// expr ::= expr *,/,% expr -> highest. /// expr ::= primaryexpr /// bool AsmParser::parseExpression(const MCExpr *&Res, SMLoc &EndLoc) { // Parse the expression. Res = nullptr; if (getTargetParser().parsePrimaryExpr(Res, EndLoc) || parseBinOpRHS(1, Res, EndLoc)) return true; // As a special case, we support 'a op b @ modifier' by rewriting the // expression to include the modifier. This is inefficient, but in general we // expect users to use 'a@modifier op b'. 
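  // For example, 'foo + 4 @plt' is accepted here: the modifier is pushed down
  // into the already-parsed sum so that it attaches to the symbol reference,
  // as if 'foo@plt + 4' had been written.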
if (Lexer.getKind() == AsmToken::At) { Lex(); if (Lexer.isNot(AsmToken::Identifier)) return TokError("unexpected symbol modifier following '@'"); MCSymbolRefExpr::VariantKind Variant = MCSymbolRefExpr::getVariantKindForName(getTok().getIdentifier()); if (Variant == MCSymbolRefExpr::VK_Invalid) return TokError("invalid variant '" + getTok().getIdentifier() + "'"); const MCExpr *ModifiedRes = applyModifierToExpr(Res, Variant); if (!ModifiedRes) { return TokError("invalid modifier '" + getTok().getIdentifier() + "' (no symbols present)"); } Res = ModifiedRes; Lex(); } // Try to constant fold it up front, if possible. Do not exploit // assembler here. int64_t Value; if (Res->evaluateAsAbsolute(Value)) Res = MCConstantExpr::create(Value, getContext()); return false; } bool AsmParser::parseParenExpression(const MCExpr *&Res, SMLoc &EndLoc) { Res = nullptr; return parseParenExpr(Res, EndLoc) || parseBinOpRHS(1, Res, EndLoc); } bool AsmParser::parseParenExprOfDepth(unsigned ParenDepth, const MCExpr *&Res, SMLoc &EndLoc) { if (parseParenExpr(Res, EndLoc)) return true; for (; ParenDepth > 0; --ParenDepth) { if (parseBinOpRHS(1, Res, EndLoc)) return true; // We don't Lex() the last RParen. // This is the same behavior as parseParenExpression(). if (ParenDepth - 1 > 0) { EndLoc = getTok().getEndLoc(); if (parseToken(AsmToken::RParen, "expected ')' in parentheses expression")) return true; } } return false; } bool AsmParser::parseAbsoluteExpression(int64_t &Res) { const MCExpr *Expr; SMLoc StartLoc = Lexer.getLoc(); if (parseExpression(Expr)) return true; if (!Expr->evaluateAsAbsolute(Res, getStreamer().getAssemblerPtr())) return Error(StartLoc, "expected absolute expression"); return false; } static unsigned getDarwinBinOpPrecedence(AsmToken::TokenKind K, MCBinaryExpr::Opcode &Kind, bool ShouldUseLogicalShr) { switch (K) { default: return 0; // not a binop. // Lowest Precedence: &&, || case AsmToken::AmpAmp: Kind = MCBinaryExpr::LAnd; return 1; case AsmToken::PipePipe: Kind = MCBinaryExpr::LOr; return 1; // Low Precedence: |, &, ^ // // FIXME: gas seems to support '!' as an infix operator? case AsmToken::Pipe: Kind = MCBinaryExpr::Or; return 2; case AsmToken::Caret: Kind = MCBinaryExpr::Xor; return 2; case AsmToken::Amp: Kind = MCBinaryExpr::And; return 2; // Low Intermediate Precedence: ==, !=, <>, <, <=, >, >= case AsmToken::EqualEqual: Kind = MCBinaryExpr::EQ; return 3; case AsmToken::ExclaimEqual: case AsmToken::LessGreater: Kind = MCBinaryExpr::NE; return 3; case AsmToken::Less: Kind = MCBinaryExpr::LT; return 3; case AsmToken::LessEqual: Kind = MCBinaryExpr::LTE; return 3; case AsmToken::Greater: Kind = MCBinaryExpr::GT; return 3; case AsmToken::GreaterEqual: Kind = MCBinaryExpr::GTE; return 3; // Intermediate Precedence: <<, >> case AsmToken::LessLess: Kind = MCBinaryExpr::Shl; return 4; case AsmToken::GreaterGreater: Kind = ShouldUseLogicalShr ? MCBinaryExpr::LShr : MCBinaryExpr::AShr; return 4; // High Intermediate Precedence: +, - case AsmToken::Plus: Kind = MCBinaryExpr::Add; return 5; case AsmToken::Minus: Kind = MCBinaryExpr::Sub; return 5; // Highest Precedence: *, /, % case AsmToken::Star: Kind = MCBinaryExpr::Mul; return 6; case AsmToken::Slash: Kind = MCBinaryExpr::Div; return 6; case AsmToken::Percent: Kind = MCBinaryExpr::Mod; return 6; } } static unsigned getGNUBinOpPrecedence(AsmToken::TokenKind K, MCBinaryExpr::Opcode &Kind, bool ShouldUseLogicalShr) { switch (K) { default: return 0; // not a binop. 
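  // Note that these precedences intentionally differ from the Darwin table
  // above; e.g. '1 + 2 & 4' parses as '1 + (2 & 4)' here because '&' binds
  // more tightly than '+', but as '(1 + 2) & 4' under the Darwin rules.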
// Lowest Precedence: &&, || case AsmToken::AmpAmp: Kind = MCBinaryExpr::LAnd; return 2; case AsmToken::PipePipe: Kind = MCBinaryExpr::LOr; return 1; // Low Precedence: ==, !=, <>, <, <=, >, >= case AsmToken::EqualEqual: Kind = MCBinaryExpr::EQ; return 3; case AsmToken::ExclaimEqual: case AsmToken::LessGreater: Kind = MCBinaryExpr::NE; return 3; case AsmToken::Less: Kind = MCBinaryExpr::LT; return 3; case AsmToken::LessEqual: Kind = MCBinaryExpr::LTE; return 3; case AsmToken::Greater: Kind = MCBinaryExpr::GT; return 3; case AsmToken::GreaterEqual: Kind = MCBinaryExpr::GTE; return 3; // Low Intermediate Precedence: +, - case AsmToken::Plus: Kind = MCBinaryExpr::Add; return 4; case AsmToken::Minus: Kind = MCBinaryExpr::Sub; return 4; // High Intermediate Precedence: |, &, ^ // // FIXME: gas seems to support '!' as an infix operator? case AsmToken::Pipe: Kind = MCBinaryExpr::Or; return 5; case AsmToken::Caret: Kind = MCBinaryExpr::Xor; return 5; case AsmToken::Amp: Kind = MCBinaryExpr::And; return 5; // Highest Precedence: *, /, %, <<, >> case AsmToken::Star: Kind = MCBinaryExpr::Mul; return 6; case AsmToken::Slash: Kind = MCBinaryExpr::Div; return 6; case AsmToken::Percent: Kind = MCBinaryExpr::Mod; return 6; case AsmToken::LessLess: Kind = MCBinaryExpr::Shl; return 6; case AsmToken::GreaterGreater: Kind = ShouldUseLogicalShr ? MCBinaryExpr::LShr : MCBinaryExpr::AShr; return 6; } } unsigned AsmParser::getBinOpPrecedence(AsmToken::TokenKind K, MCBinaryExpr::Opcode &Kind) { bool ShouldUseLogicalShr = MAI.shouldUseLogicalShr(); return IsDarwin ? getDarwinBinOpPrecedence(K, Kind, ShouldUseLogicalShr) : getGNUBinOpPrecedence(K, Kind, ShouldUseLogicalShr); } /// Parse all binary operators with precedence >= 'Precedence'. /// Res contains the LHS of the expression on input. bool AsmParser::parseBinOpRHS(unsigned Precedence, const MCExpr *&Res, SMLoc &EndLoc) { SMLoc StartLoc = Lexer.getLoc(); while (true) { MCBinaryExpr::Opcode Kind = MCBinaryExpr::Add; unsigned TokPrec = getBinOpPrecedence(Lexer.getKind(), Kind); // If the next token is lower precedence than we are allowed to eat, return // successfully with what we ate already. if (TokPrec < Precedence) return false; Lex(); // Eat the next primary expression. const MCExpr *RHS; if (getTargetParser().parsePrimaryExpr(RHS, EndLoc)) return true; // If BinOp binds less tightly with RHS than the operator after RHS, let // the pending operator take RHS as its LHS. MCBinaryExpr::Opcode Dummy; unsigned NextTokPrec = getBinOpPrecedence(Lexer.getKind(), Dummy); if (TokPrec < NextTokPrec && parseBinOpRHS(TokPrec + 1, RHS, EndLoc)) return true; // Merge LHS and RHS according to operator. Res = MCBinaryExpr::create(Kind, Res, RHS, getContext(), StartLoc); } } /// ParseStatement: /// ::= EndOfStatement /// ::= Label* Directive ...Operands... EndOfStatement /// ::= Label* Identifier OperandList* EndOfStatement bool AsmParser::parseStatement(ParseStatementInfo &Info, MCAsmParserSemaCallback *SI) { assert(!hasPendingError() && "parseStatement started with pending error"); // Eat initial spaces and comments while (Lexer.is(AsmToken::Space)) Lex(); if (Lexer.is(AsmToken::EndOfStatement)) { // if this is a line comment we can drop it safely if (getTok().getString().empty() || getTok().getString().front() == '\r' || getTok().getString().front() == '\n') Out.AddBlankLine(); Lex(); return false; } // Statements always start with an identifier. 
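  // For example, 'foo: .long 1' is handled in two steps: this call sees the
  // identifier 'foo' followed by ':' and emits a label, and the next call to
  // parseStatement parses the '.long' directive.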
AsmToken ID = getTok(); SMLoc IDLoc = ID.getLoc(); StringRef IDVal; int64_t LocalLabelVal = -1; if (Lexer.is(AsmToken::HashDirective)) return parseCppHashLineFilenameComment(IDLoc); // Allow an integer followed by a ':' as a directional local label. if (Lexer.is(AsmToken::Integer)) { LocalLabelVal = getTok().getIntVal(); if (LocalLabelVal < 0) { if (!TheCondState.Ignore) { Lex(); // always eat a token return Error(IDLoc, "unexpected token at start of statement"); } IDVal = ""; } else { IDVal = getTok().getString(); Lex(); // Consume the integer token to be used as an identifier token. if (Lexer.getKind() != AsmToken::Colon) { if (!TheCondState.Ignore) { Lex(); // always eat a token return Error(IDLoc, "unexpected token at start of statement"); } } } } else if (Lexer.is(AsmToken::Dot)) { // Treat '.' as a valid identifier in this context. Lex(); IDVal = "."; } else if (Lexer.is(AsmToken::LCurly)) { // Treat '{' as a valid identifier in this context. Lex(); IDVal = "{"; } else if (Lexer.is(AsmToken::RCurly)) { // Treat '}' as a valid identifier in this context. Lex(); IDVal = "}"; } else if (Lexer.is(AsmToken::Star) && getTargetParser().starIsStartOfStatement()) { // Accept '*' as a valid start of statement. Lex(); IDVal = "*"; } else if (parseIdentifier(IDVal)) { if (!TheCondState.Ignore) { Lex(); // always eat a token return Error(IDLoc, "unexpected token at start of statement"); } IDVal = ""; } // Handle conditional assembly here before checking for skipping. We // have to do this so that .endif isn't skipped in a ".if 0" block for // example. StringMap::const_iterator DirKindIt = DirectiveKindMap.find(IDVal); DirectiveKind DirKind = (DirKindIt == DirectiveKindMap.end()) ? DK_NO_DIRECTIVE : DirKindIt->getValue(); switch (DirKind) { default: break; case DK_IF: case DK_IFEQ: case DK_IFGE: case DK_IFGT: case DK_IFLE: case DK_IFLT: case DK_IFNE: return parseDirectiveIf(IDLoc, DirKind); case DK_IFB: return parseDirectiveIfb(IDLoc, true); case DK_IFNB: return parseDirectiveIfb(IDLoc, false); case DK_IFC: return parseDirectiveIfc(IDLoc, true); case DK_IFEQS: return parseDirectiveIfeqs(IDLoc, true); case DK_IFNC: return parseDirectiveIfc(IDLoc, false); case DK_IFNES: return parseDirectiveIfeqs(IDLoc, false); case DK_IFDEF: return parseDirectiveIfdef(IDLoc, true); case DK_IFNDEF: case DK_IFNOTDEF: return parseDirectiveIfdef(IDLoc, false); case DK_ELSEIF: return parseDirectiveElseIf(IDLoc); case DK_ELSE: return parseDirectiveElse(IDLoc); case DK_ENDIF: return parseDirectiveEndIf(IDLoc); } // Ignore the statement if in the middle of inactive conditional // (e.g. ".if 0"). if (TheCondState.Ignore) { eatToEndOfStatement(); return false; } // FIXME: Recurse on local labels? // See what kind of statement we have. switch (Lexer.getKind()) { case AsmToken::Colon: { if (!getTargetParser().isLabel(ID)) break; if (checkForValidSection()) return true; // identifier ':' -> Label. Lex(); // Diagnose attempt to use '.' as a label. if (IDVal == ".") return Error(IDLoc, "invalid use of pseudo-symbol '.' as a label"); // Diagnose attempt to use a variable as a label. // // FIXME: Diagnostics. Note the location of the definition as a label. // FIXME: This doesn't diagnose assignment to a symbol which has been // implicitly marked as external. 
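      // A plain 'foo:' reaches this point with LocalLabelVal == -1 and binds
      // the named symbol, while '2:' creates directional local label number 2
      // that later '2b'/'2f' references resolve to.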
MCSymbol *Sym; if (LocalLabelVal == -1) { if (ParsingInlineAsm && SI) { StringRef RewrittenLabel = SI->LookupInlineAsmLabel(IDVal, getSourceManager(), IDLoc, true); assert(!RewrittenLabel.empty() && "We should have an internal name here."); Info.AsmRewrites->emplace_back(AOK_Label, IDLoc, IDVal.size(), RewrittenLabel); IDVal = RewrittenLabel; } Sym = getContext().getOrCreateSymbol(IDVal); } else Sym = Ctx.createDirectionalLocalSymbol(LocalLabelVal); // End of Labels should be treated as end of line for lexing // purposes but that information is not available to the Lexer who // does not understand Labels. This may cause us to see a Hash // here instead of a preprocessor line comment. if (getTok().is(AsmToken::Hash)) { StringRef CommentStr = parseStringToEndOfStatement(); Lexer.Lex(); Lexer.UnLex(AsmToken(AsmToken::EndOfStatement, CommentStr)); } // Consume any end of statement token, if present, to avoid spurious // AddBlankLine calls(). if (getTok().is(AsmToken::EndOfStatement)) { Lex(); } // Emit the label. if (!getTargetParser().isParsingInlineAsm()) Out.EmitLabel(Sym, IDLoc); // If we are generating dwarf for assembly source files then gather the // info to make a dwarf label entry for this label if needed. if (enabledGenDwarfForAssembly()) MCGenDwarfLabelEntry::Make(Sym, &getStreamer(), getSourceManager(), IDLoc); getTargetParser().onLabelParsed(Sym); return false; } case AsmToken::Equal: if (!getTargetParser().equalIsAsmAssignment()) break; // identifier '=' ... -> assignment statement Lex(); return parseAssignment(IDVal, true); default: // Normal instruction or directive. break; } // If macros are enabled, check to see if this is a macro instantiation. if (areMacrosEnabled()) if (const MCAsmMacro *M = getContext().lookupMacro(IDVal)) { return handleMacroEntry(M, IDLoc); } // Otherwise, we have a normal instruction or directive. // Directives start with "." if (IDVal[0] == '.' && IDVal != ".") { // There are several entities interested in parsing directives: // // 1. The target-specific assembly parser. Some directives are target // specific or may potentially behave differently on certain targets. // 2. Asm parser extensions. For example, platform-specific parsers // (like the ELF parser) register themselves as extensions. // 3. The generic directive parser implemented by this class. These are // all the directives that behave in a target and platform independent // manner, or at least have a default behavior that's shared between // all targets and platforms. getTargetParser().flushPendingInstructions(getStreamer()); SMLoc StartTokLoc = getTok().getLoc(); bool TPDirectiveReturn = getTargetParser().ParseDirective(ID); if (hasPendingError()) return true; // Currently the return value should be true if we are // uninterested but as this is at odds with the standard parsing // convention (return true = error) we have instances of a parsed // directive that fails returning true as an error. Catch these // cases as best as possible errors here. if (TPDirectiveReturn && StartTokLoc != getTok().getLoc()) return true; // Return if we did some parsing or believe we succeeded. if (!TPDirectiveReturn || StartTokLoc != getTok().getLoc()) return false; // Next, check the extension directive map to see if any extension has // registered itself to parse this directive. 
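    // For example, the ELF/COFF/MachO parser extensions typically register
    // directives such as '.section' here, while generic directives like
    // '.byte' fall through to the DirectiveKind switch below.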
std::pair Handler = ExtensionDirectiveMap.lookup(IDVal); if (Handler.first) return (*Handler.second)(Handler.first, IDVal, IDLoc); // Finally, if no one else is interested in this directive, it must be // generic and familiar to this class. switch (DirKind) { default: break; case DK_SET: case DK_EQU: return parseDirectiveSet(IDVal, true); case DK_EQUIV: return parseDirectiveSet(IDVal, false); case DK_ASCII: return parseDirectiveAscii(IDVal, false); case DK_ASCIZ: case DK_STRING: return parseDirectiveAscii(IDVal, true); case DK_BYTE: case DK_DC_B: return parseDirectiveValue(IDVal, 1); case DK_DC: case DK_DC_W: case DK_SHORT: case DK_VALUE: case DK_2BYTE: return parseDirectiveValue(IDVal, 2); case DK_LONG: case DK_INT: case DK_4BYTE: case DK_DC_L: return parseDirectiveValue(IDVal, 4); case DK_QUAD: case DK_8BYTE: return parseDirectiveValue(IDVal, 8); case DK_DC_A: return parseDirectiveValue( IDVal, getContext().getAsmInfo()->getCodePointerSize()); case DK_OCTA: return parseDirectiveOctaValue(IDVal); case DK_SINGLE: case DK_FLOAT: case DK_DC_S: return parseDirectiveRealValue(IDVal, APFloat::IEEEsingle()); case DK_DOUBLE: case DK_DC_D: return parseDirectiveRealValue(IDVal, APFloat::IEEEdouble()); case DK_ALIGN: { bool IsPow2 = !getContext().getAsmInfo()->getAlignmentIsInBytes(); return parseDirectiveAlign(IsPow2, /*ExprSize=*/1); } case DK_ALIGN32: { bool IsPow2 = !getContext().getAsmInfo()->getAlignmentIsInBytes(); return parseDirectiveAlign(IsPow2, /*ExprSize=*/4); } case DK_BALIGN: return parseDirectiveAlign(/*IsPow2=*/false, /*ExprSize=*/1); case DK_BALIGNW: return parseDirectiveAlign(/*IsPow2=*/false, /*ExprSize=*/2); case DK_BALIGNL: return parseDirectiveAlign(/*IsPow2=*/false, /*ExprSize=*/4); case DK_P2ALIGN: return parseDirectiveAlign(/*IsPow2=*/true, /*ExprSize=*/1); case DK_P2ALIGNW: return parseDirectiveAlign(/*IsPow2=*/true, /*ExprSize=*/2); case DK_P2ALIGNL: return parseDirectiveAlign(/*IsPow2=*/true, /*ExprSize=*/4); case DK_ORG: return parseDirectiveOrg(); case DK_FILL: return parseDirectiveFill(); case DK_ZERO: return parseDirectiveZero(); case DK_EXTERN: eatToEndOfStatement(); // .extern is the default, ignore it. 
return false; case DK_GLOBL: case DK_GLOBAL: return parseDirectiveSymbolAttribute(MCSA_Global); case DK_LAZY_REFERENCE: return parseDirectiveSymbolAttribute(MCSA_LazyReference); case DK_NO_DEAD_STRIP: return parseDirectiveSymbolAttribute(MCSA_NoDeadStrip); case DK_SYMBOL_RESOLVER: return parseDirectiveSymbolAttribute(MCSA_SymbolResolver); case DK_PRIVATE_EXTERN: return parseDirectiveSymbolAttribute(MCSA_PrivateExtern); case DK_REFERENCE: return parseDirectiveSymbolAttribute(MCSA_Reference); case DK_WEAK_DEFINITION: return parseDirectiveSymbolAttribute(MCSA_WeakDefinition); case DK_WEAK_REFERENCE: return parseDirectiveSymbolAttribute(MCSA_WeakReference); case DK_WEAK_DEF_CAN_BE_HIDDEN: return parseDirectiveSymbolAttribute(MCSA_WeakDefAutoPrivate); case DK_COMM: case DK_COMMON: return parseDirectiveComm(/*IsLocal=*/false); case DK_LCOMM: return parseDirectiveComm(/*IsLocal=*/true); case DK_ABORT: return parseDirectiveAbort(); case DK_INCLUDE: return parseDirectiveInclude(); case DK_INCBIN: return parseDirectiveIncbin(); case DK_CODE16: case DK_CODE16GCC: return TokError(Twine(IDVal) + " not currently supported for this target"); case DK_REPT: return parseDirectiveRept(IDLoc, IDVal); case DK_IRP: return parseDirectiveIrp(IDLoc); case DK_IRPC: return parseDirectiveIrpc(IDLoc); case DK_ENDR: return parseDirectiveEndr(IDLoc); case DK_BUNDLE_ALIGN_MODE: return parseDirectiveBundleAlignMode(); case DK_BUNDLE_LOCK: return parseDirectiveBundleLock(); case DK_BUNDLE_UNLOCK: return parseDirectiveBundleUnlock(); case DK_SLEB128: return parseDirectiveLEB128(true); case DK_ULEB128: return parseDirectiveLEB128(false); case DK_SPACE: case DK_SKIP: return parseDirectiveSpace(IDVal); case DK_FILE: return parseDirectiveFile(IDLoc); case DK_LINE: return parseDirectiveLine(); case DK_LOC: return parseDirectiveLoc(); case DK_STABS: return parseDirectiveStabs(); case DK_CV_FILE: return parseDirectiveCVFile(); case DK_CV_FUNC_ID: return parseDirectiveCVFuncId(); case DK_CV_INLINE_SITE_ID: return parseDirectiveCVInlineSiteId(); case DK_CV_LOC: return parseDirectiveCVLoc(); case DK_CV_LINETABLE: return parseDirectiveCVLinetable(); case DK_CV_INLINE_LINETABLE: return parseDirectiveCVInlineLinetable(); case DK_CV_DEF_RANGE: return parseDirectiveCVDefRange(); case DK_CV_STRINGTABLE: return parseDirectiveCVStringTable(); case DK_CV_FILECHECKSUMS: return parseDirectiveCVFileChecksums(); case DK_CV_FILECHECKSUM_OFFSET: return parseDirectiveCVFileChecksumOffset(); case DK_CV_FPO_DATA: return parseDirectiveCVFPOData(); case DK_CFI_SECTIONS: return parseDirectiveCFISections(); case DK_CFI_STARTPROC: return parseDirectiveCFIStartProc(); case DK_CFI_ENDPROC: return parseDirectiveCFIEndProc(); case DK_CFI_DEF_CFA: return parseDirectiveCFIDefCfa(IDLoc); case DK_CFI_DEF_CFA_OFFSET: return parseDirectiveCFIDefCfaOffset(); case DK_CFI_ADJUST_CFA_OFFSET: return parseDirectiveCFIAdjustCfaOffset(); case DK_CFI_DEF_CFA_REGISTER: return parseDirectiveCFIDefCfaRegister(IDLoc); case DK_CFI_OFFSET: return parseDirectiveCFIOffset(IDLoc); case DK_CFI_REL_OFFSET: return parseDirectiveCFIRelOffset(IDLoc); case DK_CFI_PERSONALITY: return parseDirectiveCFIPersonalityOrLsda(true); case DK_CFI_LSDA: return parseDirectiveCFIPersonalityOrLsda(false); case DK_CFI_REMEMBER_STATE: return parseDirectiveCFIRememberState(); case DK_CFI_RESTORE_STATE: return parseDirectiveCFIRestoreState(); case DK_CFI_SAME_VALUE: return parseDirectiveCFISameValue(IDLoc); case DK_CFI_RESTORE: return parseDirectiveCFIRestore(IDLoc); case DK_CFI_ESCAPE: return 
parseDirectiveCFIEscape(); case DK_CFI_RETURN_COLUMN: return parseDirectiveCFIReturnColumn(IDLoc); case DK_CFI_SIGNAL_FRAME: return parseDirectiveCFISignalFrame(); case DK_CFI_UNDEFINED: return parseDirectiveCFIUndefined(IDLoc); case DK_CFI_REGISTER: return parseDirectiveCFIRegister(IDLoc); case DK_CFI_WINDOW_SAVE: return parseDirectiveCFIWindowSave(); case DK_MACROS_ON: case DK_MACROS_OFF: return parseDirectiveMacrosOnOff(IDVal); case DK_MACRO: return parseDirectiveMacro(IDLoc); case DK_ALTMACRO: case DK_NOALTMACRO: return parseDirectiveAltmacro(IDVal); case DK_EXITM: return parseDirectiveExitMacro(IDVal); case DK_ENDM: case DK_ENDMACRO: return parseDirectiveEndMacro(IDVal); case DK_PURGEM: return parseDirectivePurgeMacro(IDLoc); case DK_END: return parseDirectiveEnd(IDLoc); case DK_ERR: return parseDirectiveError(IDLoc, false); case DK_ERROR: return parseDirectiveError(IDLoc, true); case DK_WARNING: return parseDirectiveWarning(IDLoc); case DK_RELOC: return parseDirectiveReloc(IDLoc); case DK_DCB: case DK_DCB_W: return parseDirectiveDCB(IDVal, 2); case DK_DCB_B: return parseDirectiveDCB(IDVal, 1); case DK_DCB_D: return parseDirectiveRealDCB(IDVal, APFloat::IEEEdouble()); case DK_DCB_L: return parseDirectiveDCB(IDVal, 4); case DK_DCB_S: return parseDirectiveRealDCB(IDVal, APFloat::IEEEsingle()); case DK_DC_X: case DK_DCB_X: return TokError(Twine(IDVal) + " not currently supported for this target"); case DK_DS: case DK_DS_W: return parseDirectiveDS(IDVal, 2); case DK_DS_B: return parseDirectiveDS(IDVal, 1); case DK_DS_D: return parseDirectiveDS(IDVal, 8); case DK_DS_L: case DK_DS_S: return parseDirectiveDS(IDVal, 4); case DK_DS_P: case DK_DS_X: return parseDirectiveDS(IDVal, 12); case DK_PRINT: return parseDirectivePrint(IDLoc); case DK_ADDRSIG: return parseDirectiveAddrsig(); case DK_ADDRSIG_SYM: return parseDirectiveAddrsigSym(); } return Error(IDLoc, "unknown directive"); } // __asm _emit or __asm __emit if (ParsingInlineAsm && (IDVal == "_emit" || IDVal == "__emit" || IDVal == "_EMIT" || IDVal == "__EMIT")) return parseDirectiveMSEmit(IDLoc, Info, IDVal.size()); // __asm align if (ParsingInlineAsm && (IDVal == "align" || IDVal == "ALIGN")) return parseDirectiveMSAlign(IDLoc, Info); if (ParsingInlineAsm && (IDVal == "even" || IDVal == "EVEN")) Info.AsmRewrites->emplace_back(AOK_EVEN, IDLoc, 4); if (checkForValidSection()) return true; // Canonicalize the opcode to lower case. std::string OpcodeStr = IDVal.lower(); ParseInstructionInfo IInfo(Info.AsmRewrites); bool ParseHadError = getTargetParser().ParseInstruction(IInfo, OpcodeStr, ID, Info.ParsedOperands); Info.ParseError = ParseHadError; // Dump the parsed representation, if requested. if (getShowParsedOperands()) { SmallString<256> Str; raw_svector_ostream OS(Str); OS << "parsed instruction: ["; for (unsigned i = 0; i != Info.ParsedOperands.size(); ++i) { if (i != 0) OS << ", "; Info.ParsedOperands[i]->print(OS); } OS << "]"; printMessage(IDLoc, SourceMgr::DK_Note, OS.str()); } // Fail even if ParseInstruction erroneously returns false. if (hasPendingError() || ParseHadError) return true; // If we are generating dwarf for the current section then generate a .loc // directive for the instruction. 
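  // For example, when assembling a hand-written .s file with -g, an
  // instruction on line 7 of the input gets a line-table entry for line 7,
  // column 0 (adjusted below if a cpp '# <line> "file"' marker was seen
  // earlier).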
if (!ParseHadError && enabledGenDwarfForAssembly() && getContext().getGenDwarfSectionSyms().count( getStreamer().getCurrentSectionOnly())) { unsigned Line; if (ActiveMacros.empty()) Line = SrcMgr.FindLineNumber(IDLoc, CurBuffer); else Line = SrcMgr.FindLineNumber(ActiveMacros.front()->InstantiationLoc, ActiveMacros.front()->ExitBuffer); // If we previously parsed a cpp hash file line comment then make sure the // current Dwarf File is for the CppHashFilename if not then emit the // Dwarf File table for it and adjust the line number for the .loc. if (!CppHashInfo.Filename.empty()) { unsigned FileNumber = getStreamer().EmitDwarfFileDirective( 0, StringRef(), CppHashInfo.Filename); getContext().setGenDwarfFileNumber(FileNumber); unsigned CppHashLocLineNo = SrcMgr.FindLineNumber(CppHashInfo.Loc, CppHashInfo.Buf); Line = CppHashInfo.LineNumber - 1 + (Line - CppHashLocLineNo); } getStreamer().EmitDwarfLocDirective( getContext().getGenDwarfFileNumber(), Line, 0, DWARF2_LINE_DEFAULT_IS_STMT ? DWARF2_FLAG_IS_STMT : 0, 0, 0, StringRef()); } // If parsing succeeded, match the instruction. if (!ParseHadError) { uint64_t ErrorInfo; if (getTargetParser().MatchAndEmitInstruction( IDLoc, Info.Opcode, Info.ParsedOperands, Out, ErrorInfo, getTargetParser().isParsingInlineAsm())) return true; } return false; } // Parse and erase curly braces marking block start/end bool AsmParser::parseCurlyBlockScope(SmallVectorImpl &AsmStrRewrites) { // Identify curly brace marking block start/end if (Lexer.isNot(AsmToken::LCurly) && Lexer.isNot(AsmToken::RCurly)) return false; SMLoc StartLoc = Lexer.getLoc(); Lex(); // Eat the brace if (Lexer.is(AsmToken::EndOfStatement)) Lex(); // Eat EndOfStatement following the brace // Erase the block start/end brace from the output asm string AsmStrRewrites.emplace_back(AOK_Skip, StartLoc, Lexer.getLoc().getPointer() - StartLoc.getPointer()); return true; } /// parseCppHashLineFilenameComment as this: /// ::= # number "filename" bool AsmParser::parseCppHashLineFilenameComment(SMLoc L) { Lex(); // Eat the hash token. // Lexer only ever emits HashDirective if it fully formed if it's // done the checking already so this is an internal error. assert(getTok().is(AsmToken::Integer) && "Lexing Cpp line comment: Expected Integer"); int64_t LineNumber = getTok().getIntVal(); Lex(); assert(getTok().is(AsmToken::String) && "Lexing Cpp line comment: Expected String"); StringRef Filename = getTok().getString(); Lex(); // Get rid of the enclosing quotes. Filename = Filename.substr(1, Filename.size() - 2); // Save the SMLoc, Filename and LineNumber for later use by diagnostics. CppHashInfo.Loc = L; CppHashInfo.Filename = Filename; CppHashInfo.LineNumber = LineNumber; CppHashInfo.Buf = CurBuffer; return false; } /// will use the last parsed cpp hash line filename comment /// for the Filename and LineNo if any in the diagnostic. void AsmParser::DiagHandler(const SMDiagnostic &Diag, void *Context) { const AsmParser *Parser = static_cast(Context); raw_ostream &OS = errs(); const SourceMgr &DiagSrcMgr = *Diag.getSourceMgr(); SMLoc DiagLoc = Diag.getLoc(); unsigned DiagBuf = DiagSrcMgr.FindBufferContainingLoc(DiagLoc); unsigned CppHashBuf = Parser->SrcMgr.FindBufferContainingLoc(Parser->CppHashInfo.Loc); // Like SourceMgr::printMessage() we need to print the include stack if any // before printing the message. 
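  // For example, if the input came from the C preprocessor and contains the
  // marker '# 100 "foo.c"', a diagnostic three lines after that marker is
  // reported below as foo.c:102 rather than against the temporary .s buffer.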
unsigned DiagCurBuffer = DiagSrcMgr.FindBufferContainingLoc(DiagLoc); if (!Parser->SavedDiagHandler && DiagCurBuffer && DiagCurBuffer != DiagSrcMgr.getMainFileID()) { SMLoc ParentIncludeLoc = DiagSrcMgr.getParentIncludeLoc(DiagCurBuffer); DiagSrcMgr.PrintIncludeStack(ParentIncludeLoc, OS); } // If we have not parsed a cpp hash line filename comment or the source // manager changed or buffer changed (like in a nested include) then just // print the normal diagnostic using its Filename and LineNo. if (!Parser->CppHashInfo.LineNumber || &DiagSrcMgr != &Parser->SrcMgr || DiagBuf != CppHashBuf) { if (Parser->SavedDiagHandler) Parser->SavedDiagHandler(Diag, Parser->SavedDiagContext); else Diag.print(nullptr, OS); return; } // Use the CppHashFilename and calculate a line number based on the // CppHashInfo.Loc and CppHashInfo.LineNumber relative to this Diag's SMLoc // for the diagnostic. const std::string &Filename = Parser->CppHashInfo.Filename; int DiagLocLineNo = DiagSrcMgr.FindLineNumber(DiagLoc, DiagBuf); int CppHashLocLineNo = Parser->SrcMgr.FindLineNumber(Parser->CppHashInfo.Loc, CppHashBuf); int LineNo = Parser->CppHashInfo.LineNumber - 1 + (DiagLocLineNo - CppHashLocLineNo); SMDiagnostic NewDiag(*Diag.getSourceMgr(), Diag.getLoc(), Filename, LineNo, Diag.getColumnNo(), Diag.getKind(), Diag.getMessage(), Diag.getLineContents(), Diag.getRanges()); if (Parser->SavedDiagHandler) Parser->SavedDiagHandler(NewDiag, Parser->SavedDiagContext); else NewDiag.print(nullptr, OS); } // FIXME: This is mostly duplicated from the function in AsmLexer.cpp. The // difference being that that function accepts '@' as part of identifiers and // we can't do that. AsmLexer.cpp should probably be changed to handle // '@' as a special case when needed. static bool isIdentifierChar(char c) { return isalnum(static_cast(c)) || c == '_' || c == '$' || c == '.'; } bool AsmParser::expandMacro(raw_svector_ostream &OS, StringRef Body, ArrayRef Parameters, ArrayRef A, bool EnableAtPseudoVariable, SMLoc L) { unsigned NParameters = Parameters.size(); bool HasVararg = NParameters ? Parameters.back().Vararg : false; if ((!IsDarwin || NParameters != 0) && NParameters != A.size()) return Error(L, "Wrong number of arguments"); // A macro without parameters is handled differently on Darwin: // gas accepts no arguments and does no substitutions while (!Body.empty()) { // Scan for the next substitution. std::size_t End = Body.size(), Pos = 0; for (; Pos != End; ++Pos) { // Check for a substitution or escape. if (IsDarwin && !NParameters) { // This macro has no parameters, look for $0, $1, etc. if (Body[Pos] != '$' || Pos + 1 == End) continue; char Next = Body[Pos + 1]; if (Next == '$' || Next == 'n' || isdigit(static_cast(Next))) break; } else { // This macro has parameters, look for \foo, \bar, etc. if (Body[Pos] == '\\' && Pos + 1 != End) break; } } // Add the prefix. OS << Body.slice(0, Pos); // Check if we reached the end. if (Pos == End) break; if (IsDarwin && !NParameters) { switch (Body[Pos + 1]) { // $$ => $ case '$': OS << '$'; break; // $n => number of arguments case 'n': OS << A.size(); break; // $[0-9] => argument default: { // Missing arguments are ignored. unsigned Index = Body[Pos + 1] - '0'; if (Index >= A.size()) break; // Otherwise substitute with the token values, with spaces eliminated. for (const AsmToken &Token : A[Index]) OS << Token.getString(); break; } } Pos += 2; } else { unsigned I = Pos + 1; // Check for the \@ pseudo-variable. 
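      // For example, inside a macro body '\@' expands to the number of macro
      // instantiations so far (so 'tmp\@:' gives a unique label per
      // expansion), while '\foo' is replaced by the argument bound to the
      // parameter named 'foo'.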
if (EnableAtPseudoVariable && Body[I] == '@' && I + 1 != End) ++I; else while (isIdentifierChar(Body[I]) && I + 1 != End) ++I; const char *Begin = Body.data() + Pos + 1; StringRef Argument(Begin, I - (Pos + 1)); unsigned Index = 0; if (Argument == "@") { OS << NumOfMacroInstantiations; Pos += 2; } else { for (; Index < NParameters; ++Index) if (Parameters[Index].Name == Argument) break; if (Index == NParameters) { if (Body[Pos + 1] == '(' && Body[Pos + 2] == ')') Pos += 3; else { OS << '\\' << Argument; Pos = I; } } else { bool VarargParameter = HasVararg && Index == (NParameters - 1); for (const AsmToken &Token : A[Index]) // For altmacro mode, you can write '%expr'. // The prefix '%' evaluates the expression 'expr' // and uses the result as a string (e.g. replace %(1+2) with the string "3"). // Here, we identify the integer token which is the result of the // absolute expression evaluation and replace it with its string representation. if ((Lexer.IsaAltMacroMode()) && (*(Token.getString().begin()) == '%') && Token.is(AsmToken::Integer)) // Emit an integer value to the buffer. OS << Token.getIntVal(); // Only Token that was validated as a string and begins with '<' // is considered altMacroString!!! else if ((Lexer.IsaAltMacroMode()) && (*(Token.getString().begin()) == '<') && Token.is(AsmToken::String)) { std::string Res; altMacroString(Token.getStringContents(), Res); OS << Res; } // We expect no quotes around the string's contents when // parsing for varargs. else if (Token.isNot(AsmToken::String) || VarargParameter) OS << Token.getString(); else OS << Token.getStringContents(); Pos += 1 + Argument.size(); } } } // Update the scan point. Body = Body.substr(Pos); } return false; } MacroInstantiation::MacroInstantiation(SMLoc IL, int EB, SMLoc EL, size_t CondStackDepth) : InstantiationLoc(IL), ExitBuffer(EB), ExitLoc(EL), CondStackDepth(CondStackDepth) {} static bool isOperator(AsmToken::TokenKind kind) { switch (kind) { default: return false; case AsmToken::Plus: case AsmToken::Minus: case AsmToken::Tilde: case AsmToken::Slash: case AsmToken::Star: case AsmToken::Dot: case AsmToken::Equal: case AsmToken::EqualEqual: case AsmToken::Pipe: case AsmToken::PipePipe: case AsmToken::Caret: case AsmToken::Amp: case AsmToken::AmpAmp: case AsmToken::Exclaim: case AsmToken::ExclaimEqual: case AsmToken::Less: case AsmToken::LessEqual: case AsmToken::LessLess: case AsmToken::LessGreater: case AsmToken::Greater: case AsmToken::GreaterEqual: case AsmToken::GreaterGreater: return true; } } namespace { class AsmLexerSkipSpaceRAII { public: AsmLexerSkipSpaceRAII(AsmLexer &Lexer, bool SkipSpace) : Lexer(Lexer) { Lexer.setSkipSpace(SkipSpace); } ~AsmLexerSkipSpaceRAII() { Lexer.setSkipSpace(true); } private: AsmLexer &Lexer; }; } // end anonymous namespace bool AsmParser::parseMacroArgument(MCAsmMacroArgument &MA, bool Vararg) { if (Vararg) { if (Lexer.isNot(AsmToken::EndOfStatement)) { StringRef Str = parseStringToEndOfStatement(); MA.emplace_back(AsmToken::String, Str); } return false; } unsigned ParenLevel = 0; // Darwin doesn't use spaces to delmit arguments. AsmLexerSkipSpaceRAII ScopedSkipSpace(Lexer, IsDarwin); bool SpaceEaten; while (true) { SpaceEaten = false; if (Lexer.is(AsmToken::Eof) || Lexer.is(AsmToken::Equal)) return TokError("unexpected token in macro instantiation"); if (ParenLevel == 0) { if (Lexer.is(AsmToken::Comma)) break; if (Lexer.is(AsmToken::Space)) { SpaceEaten = true; Lexer.Lex(); // Eat spaces } // Spaces can delimit parameters, but could also be part an expression. 
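      // For example, in 'm 1 + 2 7' the '+' seen after a space keeps
      // '1 + 2' together as one argument, while the plain space before '7'
      // terminates it, so '7' becomes the next argument.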
// If the token after a space is an operator, add the token and the next // one into this argument if (!IsDarwin) { if (isOperator(Lexer.getKind())) { MA.push_back(getTok()); Lexer.Lex(); // Whitespace after an operator can be ignored. if (Lexer.is(AsmToken::Space)) Lexer.Lex(); continue; } } if (SpaceEaten) break; } // handleMacroEntry relies on not advancing the lexer here // to be able to fill in the remaining default parameter values if (Lexer.is(AsmToken::EndOfStatement)) break; // Adjust the current parentheses level. if (Lexer.is(AsmToken::LParen)) ++ParenLevel; else if (Lexer.is(AsmToken::RParen) && ParenLevel) --ParenLevel; // Append the token to the current argument list. MA.push_back(getTok()); Lexer.Lex(); } if (ParenLevel != 0) return TokError("unbalanced parentheses in macro argument"); return false; } // Parse the macro instantiation arguments. bool AsmParser::parseMacroArguments(const MCAsmMacro *M, MCAsmMacroArguments &A) { const unsigned NParameters = M ? M->Parameters.size() : 0; bool NamedParametersFound = false; SmallVector FALocs; A.resize(NParameters); FALocs.resize(NParameters); // Parse two kinds of macro invocations: // - macros defined without any parameters accept an arbitrary number of them // - macros defined with parameters accept at most that many of them bool HasVararg = NParameters ? M->Parameters.back().Vararg : false; for (unsigned Parameter = 0; !NParameters || Parameter < NParameters; ++Parameter) { SMLoc IDLoc = Lexer.getLoc(); MCAsmMacroParameter FA; if (Lexer.is(AsmToken::Identifier) && Lexer.peekTok().is(AsmToken::Equal)) { if (parseIdentifier(FA.Name)) return Error(IDLoc, "invalid argument identifier for formal argument"); if (Lexer.isNot(AsmToken::Equal)) return TokError("expected '=' after formal parameter identifier"); Lex(); NamedParametersFound = true; } bool Vararg = HasVararg && Parameter == (NParameters - 1); if (NamedParametersFound && FA.Name.empty()) return Error(IDLoc, "cannot mix positional and keyword arguments"); SMLoc StrLoc = Lexer.getLoc(); SMLoc EndLoc; if (Lexer.IsaAltMacroMode() && Lexer.is(AsmToken::Percent)) { const MCExpr *AbsoluteExp; int64_t Value; /// Eat '%' Lex(); if (parseExpression(AbsoluteExp, EndLoc)) return false; if (!AbsoluteExp->evaluateAsAbsolute(Value, getStreamer().getAssemblerPtr())) return Error(StrLoc, "expected absolute expression"); const char *StrChar = StrLoc.getPointer(); const char *EndChar = EndLoc.getPointer(); AsmToken newToken(AsmToken::Integer, StringRef(StrChar , EndChar - StrChar), Value); FA.Value.push_back(newToken); } else if (Lexer.IsaAltMacroMode() && Lexer.is(AsmToken::Less) && isAltmacroString(StrLoc, EndLoc)) { const char *StrChar = StrLoc.getPointer(); const char *EndChar = EndLoc.getPointer(); jumpToLoc(EndLoc, CurBuffer); /// Eat from '<' to '>' Lex(); AsmToken newToken(AsmToken::String, StringRef(StrChar, EndChar - StrChar)); FA.Value.push_back(newToken); } else if(parseMacroArgument(FA.Value, Vararg)) return true; unsigned PI = Parameter; if (!FA.Name.empty()) { unsigned FAI = 0; for (FAI = 0; FAI < NParameters; ++FAI) if (M->Parameters[FAI].Name == FA.Name) break; if (FAI >= NParameters) { assert(M && "expected macro to be defined"); return Error(IDLoc, "parameter named '" + FA.Name + "' does not exist for macro '" + M->Name + "'"); } PI = FAI; } if (!FA.Value.empty()) { if (A.size() <= PI) A.resize(PI + 1); A[PI] = FA.Value; if (FALocs.size() <= PI) FALocs.resize(PI + 1); FALocs[PI] = Lexer.getLoc(); } // At the end of the statement, fill in remaining arguments that have // 
default values. If there aren't any, then the next argument is // required but missing if (Lexer.is(AsmToken::EndOfStatement)) { bool Failure = false; for (unsigned FAI = 0; FAI < NParameters; ++FAI) { if (A[FAI].empty()) { if (M->Parameters[FAI].Required) { Error(FALocs[FAI].isValid() ? FALocs[FAI] : Lexer.getLoc(), "missing value for required parameter " "'" + M->Parameters[FAI].Name + "' in macro '" + M->Name + "'"); Failure = true; } if (!M->Parameters[FAI].Value.empty()) A[FAI] = M->Parameters[FAI].Value; } } return Failure; } if (Lexer.is(AsmToken::Comma)) Lex(); } return TokError("too many positional arguments"); } bool AsmParser::handleMacroEntry(const MCAsmMacro *M, SMLoc NameLoc) { // Arbitrarily limit macro nesting depth (default matches 'as'). We can // eliminate this, although we should protect against infinite loops. unsigned MaxNestingDepth = AsmMacroMaxNestingDepth; if (ActiveMacros.size() == MaxNestingDepth) { std::ostringstream MaxNestingDepthError; MaxNestingDepthError << "macros cannot be nested more than " << MaxNestingDepth << " levels deep." << " Use -asm-macro-max-nesting-depth to increase " "this limit."; return TokError(MaxNestingDepthError.str()); } MCAsmMacroArguments A; if (parseMacroArguments(M, A)) return true; // Macro instantiation is lexical, unfortunately. We construct a new buffer // to hold the macro body with substitutions. SmallString<256> Buf; StringRef Body = M->Body; raw_svector_ostream OS(Buf); if (expandMacro(OS, Body, M->Parameters, A, true, getTok().getLoc())) return true; // We include the .endmacro in the buffer as our cue to exit the macro // instantiation. OS << ".endmacro\n"; std::unique_ptr Instantiation = MemoryBuffer::getMemBufferCopy(OS.str(), ""); // Create the macro instantiation object and add to the current macro // instantiation stack. MacroInstantiation *MI = new MacroInstantiation( NameLoc, CurBuffer, getTok().getLoc(), TheCondStack.size()); ActiveMacros.push_back(MI); ++NumOfMacroInstantiations; // Jump to the macro instantiation and prime the lexer. CurBuffer = SrcMgr.AddNewSourceBuffer(std::move(Instantiation), SMLoc()); Lexer.setBuffer(SrcMgr.getMemoryBuffer(CurBuffer)->getBuffer()); Lex(); return false; } void AsmParser::handleMacroExit() { // Jump to the EndOfStatement we should return to, and consume it. jumpToLoc(ActiveMacros.back()->ExitLoc, ActiveMacros.back()->ExitBuffer); Lex(); // Pop the instantiation entry. delete ActiveMacros.back(); ActiveMacros.pop_back(); } bool AsmParser::parseAssignment(StringRef Name, bool allow_redef, bool NoDeadStrip) { MCSymbol *Sym; const MCExpr *Value; if (MCParserUtils::parseAssignmentExpression(Name, allow_redef, *this, Sym, Value)) return true; if (!Sym) { // In the case where we parse an expression starting with a '.', we will // not generate an error, nor will we create a symbol. In this case we // should just return out. return false; } // Do the assignment. Out.EmitAssignment(Sym, Value); if (NoDeadStrip) Out.EmitSymbolAttribute(Sym, MCSA_NoDeadStrip); return false; } /// parseIdentifier: /// ::= identifier /// ::= string bool AsmParser::parseIdentifier(StringRef &Res) { // The assembler has relaxed rules for accepting identifiers, in particular we // allow things like '.globl $foo' and '.def @feat.00', which would normally be // separate tokens. At this level, we have already lexed so we cannot (currently) // handle this as a context dependent token, instead we detect adjacent tokens // and return the combined identifier. 
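  // For example, '.globl $foo' sees a '$' token immediately followed by the
  // identifier 'foo' and folds them into the single identifier '$foo' below;
  // with an intervening space ('$ foo') the prefix is not merged.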
if (Lexer.is(AsmToken::Dollar) || Lexer.is(AsmToken::At)) { SMLoc PrefixLoc = getLexer().getLoc(); // Consume the prefix character, and check for a following identifier. AsmToken Buf[1]; Lexer.peekTokens(Buf, false); if (Buf[0].isNot(AsmToken::Identifier)) return true; // We have a '$' or '@' followed by an identifier, make sure they are adjacent. if (PrefixLoc.getPointer() + 1 != Buf[0].getLoc().getPointer()) return true; // eat $ or @ Lexer.Lex(); // Lexer's Lex guarantees consecutive token. // Construct the joined identifier and consume the token. Res = StringRef(PrefixLoc.getPointer(), getTok().getIdentifier().size() + 1); Lex(); // Parser Lex to maintain invariants. return false; } if (Lexer.isNot(AsmToken::Identifier) && Lexer.isNot(AsmToken::String)) return true; Res = getTok().getIdentifier(); Lex(); // Consume the identifier token. return false; } /// parseDirectiveSet: /// ::= .equ identifier ',' expression /// ::= .equiv identifier ',' expression /// ::= .set identifier ',' expression bool AsmParser::parseDirectiveSet(StringRef IDVal, bool allow_redef) { StringRef Name; if (check(parseIdentifier(Name), "expected identifier") || parseToken(AsmToken::Comma) || parseAssignment(Name, allow_redef, true)) return addErrorSuffix(" in '" + Twine(IDVal) + "' directive"); return false; } bool AsmParser::parseEscapedString(std::string &Data) { if (check(getTok().isNot(AsmToken::String), "expected string")) return true; Data = ""; StringRef Str = getTok().getStringContents(); for (unsigned i = 0, e = Str.size(); i != e; ++i) { if (Str[i] != '\\') { Data += Str[i]; continue; } // Recognize escaped characters. Note that this escape semantics currently // loosely follows Darwin 'as'. Notably, it doesn't support hex escapes. ++i; if (i == e) return TokError("unexpected backslash at end of string"); // Recognize octal sequences. if ((unsigned)(Str[i] - '0') <= 7) { // Consume up to three octal characters. unsigned Value = Str[i] - '0'; if (i + 1 != e && ((unsigned)(Str[i + 1] - '0')) <= 7) { ++i; Value = Value * 8 + (Str[i] - '0'); if (i + 1 != e && ((unsigned)(Str[i + 1] - '0')) <= 7) { ++i; Value = Value * 8 + (Str[i] - '0'); } } if (Value > 255) return TokError("invalid octal escape sequence (out of range)"); Data += (unsigned char)Value; continue; } // Otherwise recognize individual escapes. switch (Str[i]) { default: // Just reject invalid escape sequences for now. return TokError("invalid escape sequence (unrecognized character)"); case 'b': Data += '\b'; break; case 'f': Data += '\f'; break; case 'n': Data += '\n'; break; case 'r': Data += '\r'; break; case 't': Data += '\t'; break; case '"': Data += '"'; break; case '\\': Data += '\\'; break; } } Lex(); return false; } /// parseDirectiveAscii: /// ::= ( .ascii | .asciz | .string ) [ "string" ( , "string" )* ] bool AsmParser::parseDirectiveAscii(StringRef IDVal, bool ZeroTerminated) { auto parseOp = [&]() -> bool { std::string Data; if (checkForValidSection() || parseEscapedString(Data)) return true; getStreamer().EmitBytes(Data); if (ZeroTerminated) getStreamer().EmitBytes(StringRef("\0", 1)); return false; }; if (parseMany(parseOp)) return addErrorSuffix(" in '" + Twine(IDVal) + "' directive"); return false; } /// parseDirectiveReloc /// ::= .reloc expression , identifier [ , expression ] bool AsmParser::parseDirectiveReloc(SMLoc DirectiveLoc) { const MCExpr *Offset; const MCExpr *Expr = nullptr; SMLoc OffsetLoc = Lexer.getTok().getLoc(); int64_t OffsetValue; // We can only deal with constant expressions at the moment. 
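  // For example, '.reloc 0, <relocation-name>, foo + 4' requires the leading
  // offset to fold to a non-negative constant and the trailing expression to
  // be relocatable; whether the relocation name is recognized is decided by
  // the target when the directive is emitted below.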
if (parseExpression(Offset)) return true; if (check(!Offset->evaluateAsAbsolute(OffsetValue, getStreamer().getAssemblerPtr()), OffsetLoc, "expression is not a constant value") || check(OffsetValue < 0, OffsetLoc, "expression is negative") || parseToken(AsmToken::Comma, "expected comma") || check(getTok().isNot(AsmToken::Identifier), "expected relocation name")) return true; SMLoc NameLoc = Lexer.getTok().getLoc(); StringRef Name = Lexer.getTok().getIdentifier(); Lex(); if (Lexer.is(AsmToken::Comma)) { Lex(); SMLoc ExprLoc = Lexer.getLoc(); if (parseExpression(Expr)) return true; MCValue Value; if (!Expr->evaluateAsRelocatable(Value, nullptr, nullptr)) return Error(ExprLoc, "expression must be relocatable"); } if (parseToken(AsmToken::EndOfStatement, "unexpected token in .reloc directive")) return true; const MCTargetAsmParser &MCT = getTargetParser(); const MCSubtargetInfo &STI = MCT.getSTI(); if (getStreamer().EmitRelocDirective(*Offset, Name, Expr, DirectiveLoc, STI)) return Error(NameLoc, "unknown relocation name"); return false; } /// parseDirectiveValue /// ::= (.byte | .short | ... ) [ expression (, expression)* ] bool AsmParser::parseDirectiveValue(StringRef IDVal, unsigned Size) { auto parseOp = [&]() -> bool { const MCExpr *Value; SMLoc ExprLoc = getLexer().getLoc(); if (checkForValidSection() || parseExpression(Value)) return true; // Special case constant expressions to match code generator. if (const MCConstantExpr *MCE = dyn_cast(Value)) { assert(Size <= 8 && "Invalid size"); uint64_t IntValue = MCE->getValue(); if (!isUIntN(8 * Size, IntValue) && !isIntN(8 * Size, IntValue)) return Error(ExprLoc, "out of range literal value"); getStreamer().EmitIntValue(IntValue, Size); } else getStreamer().EmitValue(Value, Size, ExprLoc); return false; }; if (parseMany(parseOp)) return addErrorSuffix(" in '" + Twine(IDVal) + "' directive"); return false; } static bool parseHexOcta(AsmParser &Asm, uint64_t &hi, uint64_t &lo) { if (Asm.getTok().isNot(AsmToken::Integer) && Asm.getTok().isNot(AsmToken::BigNum)) return Asm.TokError("unknown token in expression"); SMLoc ExprLoc = Asm.getTok().getLoc(); APInt IntValue = Asm.getTok().getAPIntVal(); Asm.Lex(); if (!IntValue.isIntN(128)) return Asm.Error(ExprLoc, "out of range literal value"); if (!IntValue.isIntN(64)) { hi = IntValue.getHiBits(IntValue.getBitWidth() - 64).getZExtValue(); lo = IntValue.getLoBits(64).getZExtValue(); } else { hi = 0; lo = IntValue.getZExtValue(); } return false; } /// ParseDirectiveOctaValue /// ::= .octa [ hexconstant (, hexconstant)* ] bool AsmParser::parseDirectiveOctaValue(StringRef IDVal) { auto parseOp = [&]() -> bool { if (checkForValidSection()) return true; uint64_t hi, lo; if (parseHexOcta(*this, hi, lo)) return true; if (MAI.isLittleEndian()) { getStreamer().EmitIntValue(lo, 8); getStreamer().EmitIntValue(hi, 8); } else { getStreamer().EmitIntValue(hi, 8); getStreamer().EmitIntValue(lo, 8); } return false; }; if (parseMany(parseOp)) return addErrorSuffix(" in '" + Twine(IDVal) + "' directive"); return false; } bool AsmParser::parseRealValue(const fltSemantics &Semantics, APInt &Res) { // We don't truly support arithmetic on floating point expressions, so we // have to manually parse unary prefixes. 
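  // For example, '.double -1.5' and '.single +inf' are handled by peeling the
  // leading sign here; the keywords 'inf', 'infinity' and 'nan' are accepted
  // below, but general arithmetic such as '1.5 + 2.0' is not supported.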
bool IsNeg = false; if (getLexer().is(AsmToken::Minus)) { Lexer.Lex(); IsNeg = true; } else if (getLexer().is(AsmToken::Plus)) Lexer.Lex(); if (Lexer.is(AsmToken::Error)) return TokError(Lexer.getErr()); if (Lexer.isNot(AsmToken::Integer) && Lexer.isNot(AsmToken::Real) && Lexer.isNot(AsmToken::Identifier)) return TokError("unexpected token in directive"); // Convert to an APFloat. APFloat Value(Semantics); StringRef IDVal = getTok().getString(); if (getLexer().is(AsmToken::Identifier)) { if (!IDVal.compare_lower("infinity") || !IDVal.compare_lower("inf")) Value = APFloat::getInf(Semantics); else if (!IDVal.compare_lower("nan")) Value = APFloat::getNaN(Semantics, false, ~0); else return TokError("invalid floating point literal"); } else if (Value.convertFromString(IDVal, APFloat::rmNearestTiesToEven) == APFloat::opInvalidOp) return TokError("invalid floating point literal"); if (IsNeg) Value.changeSign(); // Consume the numeric token. Lex(); Res = Value.bitcastToAPInt(); return false; } /// parseDirectiveRealValue /// ::= (.single | .double) [ expression (, expression)* ] bool AsmParser::parseDirectiveRealValue(StringRef IDVal, const fltSemantics &Semantics) { auto parseOp = [&]() -> bool { APInt AsInt; if (checkForValidSection() || parseRealValue(Semantics, AsInt)) return true; getStreamer().EmitIntValue(AsInt.getLimitedValue(), AsInt.getBitWidth() / 8); return false; }; if (parseMany(parseOp)) return addErrorSuffix(" in '" + Twine(IDVal) + "' directive"); return false; } /// parseDirectiveZero /// ::= .zero expression bool AsmParser::parseDirectiveZero() { SMLoc NumBytesLoc = Lexer.getLoc(); const MCExpr *NumBytes; if (checkForValidSection() || parseExpression(NumBytes)) return true; int64_t Val = 0; if (getLexer().is(AsmToken::Comma)) { Lex(); if (parseAbsoluteExpression(Val)) return true; } if (parseToken(AsmToken::EndOfStatement, "unexpected token in '.zero' directive")) return true; getStreamer().emitFill(*NumBytes, Val, NumBytesLoc); return false; } /// parseDirectiveFill /// ::= .fill expression [ , expression [ , expression ] ] bool AsmParser::parseDirectiveFill() { SMLoc NumValuesLoc = Lexer.getLoc(); const MCExpr *NumValues; if (checkForValidSection() || parseExpression(NumValues)) return true; int64_t FillSize = 1; int64_t FillExpr = 0; SMLoc SizeLoc, ExprLoc; if (parseOptionalToken(AsmToken::Comma)) { SizeLoc = getTok().getLoc(); if (parseAbsoluteExpression(FillSize)) return true; if (parseOptionalToken(AsmToken::Comma)) { ExprLoc = getTok().getLoc(); if (parseAbsoluteExpression(FillExpr)) return true; } } if (parseToken(AsmToken::EndOfStatement, "unexpected token in '.fill' directive")) return true; if (FillSize < 0) { Warning(SizeLoc, "'.fill' directive with negative size has no effect"); return false; } if (FillSize > 8) { Warning(SizeLoc, "'.fill' directive with size greater than 8 has been truncated to 8"); FillSize = 8; } if (!isUInt<32>(FillExpr) && FillSize > 4) Warning(ExprLoc, "'.fill' directive pattern has been truncated to 32-bits"); getStreamer().emitFill(*NumValues, FillSize, FillExpr, NumValuesLoc); return false; } /// parseDirectiveOrg /// ::= .org expression [ , expression ] bool AsmParser::parseDirectiveOrg() { const MCExpr *Offset; SMLoc OffsetLoc = Lexer.getLoc(); if (checkForValidSection() || parseExpression(Offset)) return true; // Parse optional fill expression. 
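  // Illustrative example: '.org 0x100, 0x90' advances the location counter to
  // offset 0x100 and fills the gap with 0x90; without the second operand the
  // fill value defaults to 0.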
int64_t FillExpr = 0; if (parseOptionalToken(AsmToken::Comma)) if (parseAbsoluteExpression(FillExpr)) return addErrorSuffix(" in '.org' directive"); if (parseToken(AsmToken::EndOfStatement)) return addErrorSuffix(" in '.org' directive"); getStreamer().emitValueToOffset(Offset, FillExpr, OffsetLoc); return false; } /// parseDirectiveAlign /// ::= {.align, ...} expression [ , expression [ , expression ]] bool AsmParser::parseDirectiveAlign(bool IsPow2, unsigned ValueSize) { SMLoc AlignmentLoc = getLexer().getLoc(); int64_t Alignment; SMLoc MaxBytesLoc; bool HasFillExpr = false; int64_t FillExpr = 0; int64_t MaxBytesToFill = 0; auto parseAlign = [&]() -> bool { if (parseAbsoluteExpression(Alignment)) return true; if (parseOptionalToken(AsmToken::Comma)) { // The fill expression can be omitted while specifying a maximum number of // alignment bytes, e.g: // .align 3,,4 if (getTok().isNot(AsmToken::Comma)) { HasFillExpr = true; if (parseAbsoluteExpression(FillExpr)) return true; } if (parseOptionalToken(AsmToken::Comma)) if (parseTokenLoc(MaxBytesLoc) || parseAbsoluteExpression(MaxBytesToFill)) return true; } return parseToken(AsmToken::EndOfStatement); }; if (checkForValidSection()) return addErrorSuffix(" in directive"); // Ignore empty '.p2align' directives for GNU-as compatibility if (IsPow2 && (ValueSize == 1) && getTok().is(AsmToken::EndOfStatement)) { Warning(AlignmentLoc, "p2align directive with no operand(s) is ignored"); return parseToken(AsmToken::EndOfStatement); } if (parseAlign()) return addErrorSuffix(" in directive"); // Always emit an alignment here even if we thrown an error. bool ReturnVal = false; // Compute alignment in bytes. if (IsPow2) { // FIXME: Diagnose overflow. if (Alignment >= 32) { ReturnVal |= Error(AlignmentLoc, "invalid alignment value"); Alignment = 31; } Alignment = 1ULL << Alignment; } else { // Reject alignments that aren't either a power of two or zero, // for gas compatibility. Alignment of zero is silently rounded // up to one. if (Alignment == 0) Alignment = 1; if (!isPowerOf2_64(Alignment)) ReturnVal |= Error(AlignmentLoc, "alignment must be a power of 2"); } // Diagnose non-sensical max bytes to align. if (MaxBytesLoc.isValid()) { if (MaxBytesToFill < 1) { ReturnVal |= Error(MaxBytesLoc, "alignment directive can never be satisfied in this " "many bytes, ignoring maximum bytes expression"); MaxBytesToFill = 0; } if (MaxBytesToFill >= Alignment) { Warning(MaxBytesLoc, "maximum bytes expression exceeds alignment and " "has no effect"); MaxBytesToFill = 0; } } // Check whether we should use optimal code alignment for this .align // directive. const MCSection *Section = getStreamer().getCurrentSectionOnly(); assert(Section && "must have section to emit alignment"); bool UseCodeAlign = Section->UseCodeAlign(); if ((!HasFillExpr || Lexer.getMAI().getTextAlignFillValue() == FillExpr) && ValueSize == 1 && UseCodeAlign) { getStreamer().EmitCodeAlignment(Alignment, MaxBytesToFill); } else { // FIXME: Target specific behavior about how the "extra" bytes are filled. getStreamer().EmitValueToAlignment(Alignment, FillExpr, ValueSize, MaxBytesToFill); } return ReturnVal; } /// parseDirectiveFile /// ::= .file filename /// ::= .file number [directory] filename [md5 checksum] [source source-text] bool AsmParser::parseDirectiveFile(SMLoc DirectiveLoc) { // FIXME: I'm not sure what this is. 
int64_t FileNumber = -1; if (getLexer().is(AsmToken::Integer)) { FileNumber = getTok().getIntVal(); Lex(); if (FileNumber < 0) return TokError("negative file number"); } std::string Path; // Usually the directory and filename together, otherwise just the directory. // Allow the strings to have escaped octal character sequence. if (check(getTok().isNot(AsmToken::String), "unexpected token in '.file' directive") || parseEscapedString(Path)) return true; StringRef Directory; StringRef Filename; std::string FilenameData; if (getLexer().is(AsmToken::String)) { if (check(FileNumber == -1, "explicit path specified, but no file number") || parseEscapedString(FilenameData)) return true; Filename = FilenameData; Directory = Path; } else { Filename = Path; } uint64_t MD5Hi, MD5Lo; bool HasMD5 = false; Optional Source; bool HasSource = false; std::string SourceString; while (!parseOptionalToken(AsmToken::EndOfStatement)) { StringRef Keyword; if (check(getTok().isNot(AsmToken::Identifier), "unexpected token in '.file' directive") || parseIdentifier(Keyword)) return true; if (Keyword == "md5") { HasMD5 = true; if (check(FileNumber == -1, "MD5 checksum specified, but no file number") || parseHexOcta(*this, MD5Hi, MD5Lo)) return true; } else if (Keyword == "source") { HasSource = true; if (check(FileNumber == -1, "source specified, but no file number") || check(getTok().isNot(AsmToken::String), "unexpected token in '.file' directive") || parseEscapedString(SourceString)) return true; } else { return TokError("unexpected token in '.file' directive"); } } - // In case there is a -g option as well as debug info from directive .file, - // we turn off the -g option, directly use the existing debug info instead. - // Also reset any implicit ".file 0" for the assembler source. - if (Ctx.getGenDwarfForAssembly()) { - Ctx.getMCDwarfLineTable(0).resetRootFile(); - Ctx.setGenDwarfForAssembly(false); - } - if (FileNumber == -1) getStreamer().EmitFileDirective(Filename); else { + // In case there is a -g option as well as debug info from directive .file, + // we turn off the -g option, directly use the existing debug info instead. + // Also reset any implicit ".file 0" for the assembler source. + if (Ctx.getGenDwarfForAssembly()) { + Ctx.getMCDwarfLineTable(0).resetRootFile(); + Ctx.setGenDwarfForAssembly(false); + } + MD5::MD5Result *CKMem = nullptr; if (HasMD5) { CKMem = (MD5::MD5Result *)Ctx.allocate(sizeof(MD5::MD5Result), 1); for (unsigned i = 0; i != 8; ++i) { CKMem->Bytes[i] = uint8_t(MD5Hi >> ((7 - i) * 8)); CKMem->Bytes[i + 8] = uint8_t(MD5Lo >> ((7 - i) * 8)); } } if (HasSource) { char *SourceBuf = static_cast(Ctx.allocate(SourceString.size())); memcpy(SourceBuf, SourceString.data(), SourceString.size()); Source = StringRef(SourceBuf, SourceString.size()); } if (FileNumber == 0) { if (Ctx.getDwarfVersion() < 5) return Warning(DirectiveLoc, "file 0 not supported prior to DWARF-5"); getStreamer().emitDwarfFile0Directive(Directory, Filename, CKMem, Source); } else { Expected FileNumOrErr = getStreamer().tryEmitDwarfFileDirective( FileNumber, Directory, Filename, CKMem, Source); if (!FileNumOrErr) return Error(DirectiveLoc, toString(FileNumOrErr.takeError())); FileNumber = FileNumOrErr.get(); } // Alert the user if there are some .file directives with MD5 and some not. // But only do that once. 
if (!ReportedInconsistentMD5 && !Ctx.isDwarfMD5UsageConsistent(0)) { ReportedInconsistentMD5 = true; return Warning(DirectiveLoc, "inconsistent use of MD5 checksums"); } } return false; } /// parseDirectiveLine /// ::= .line [number] bool AsmParser::parseDirectiveLine() { int64_t LineNumber; if (getLexer().is(AsmToken::Integer)) { if (parseIntToken(LineNumber, "unexpected token in '.line' directive")) return true; (void)LineNumber; // FIXME: Do something with the .line. } if (parseToken(AsmToken::EndOfStatement, "unexpected token in '.line' directive")) return true; return false; } /// parseDirectiveLoc /// ::= .loc FileNumber [LineNumber] [ColumnPos] [basic_block] [prologue_end] /// [epilogue_begin] [is_stmt VALUE] [isa VALUE] /// The first number is a file number, must have been previously assigned with /// a .file directive, the second number is the line number and optionally the /// third number is a column position (zero if not specified). The remaining /// optional items are .loc sub-directives. bool AsmParser::parseDirectiveLoc() { int64_t FileNumber = 0, LineNumber = 0; SMLoc Loc = getTok().getLoc(); if (parseIntToken(FileNumber, "unexpected token in '.loc' directive") || check(FileNumber < 1 && Ctx.getDwarfVersion() < 5, Loc, "file number less than one in '.loc' directive") || check(!getContext().isValidDwarfFileNumber(FileNumber), Loc, "unassigned file number in '.loc' directive")) return true; // optional if (getLexer().is(AsmToken::Integer)) { LineNumber = getTok().getIntVal(); if (LineNumber < 0) return TokError("line number less than zero in '.loc' directive"); Lex(); } int64_t ColumnPos = 0; if (getLexer().is(AsmToken::Integer)) { ColumnPos = getTok().getIntVal(); if (ColumnPos < 0) return TokError("column position less than zero in '.loc' directive"); Lex(); } unsigned Flags = DWARF2_LINE_DEFAULT_IS_STMT ? DWARF2_FLAG_IS_STMT : 0; unsigned Isa = 0; int64_t Discriminator = 0; auto parseLocOp = [&]() -> bool { StringRef Name; SMLoc Loc = getTok().getLoc(); if (parseIdentifier(Name)) return TokError("unexpected token in '.loc' directive"); if (Name == "basic_block") Flags |= DWARF2_FLAG_BASIC_BLOCK; else if (Name == "prologue_end") Flags |= DWARF2_FLAG_PROLOGUE_END; else if (Name == "epilogue_begin") Flags |= DWARF2_FLAG_EPILOGUE_BEGIN; else if (Name == "is_stmt") { Loc = getTok().getLoc(); const MCExpr *Value; if (parseExpression(Value)) return true; // The expression must be the constant 0 or 1. if (const MCConstantExpr *MCE = dyn_cast(Value)) { int Value = MCE->getValue(); if (Value == 0) Flags &= ~DWARF2_FLAG_IS_STMT; else if (Value == 1) Flags |= DWARF2_FLAG_IS_STMT; else return Error(Loc, "is_stmt value not 0 or 1"); } else { return Error(Loc, "is_stmt value not the constant value of 0 or 1"); } } else if (Name == "isa") { Loc = getTok().getLoc(); const MCExpr *Value; if (parseExpression(Value)) return true; // The expression must be a constant greater or equal to 0. 
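      // e.g. '.loc 1 10 2 isa 1' (illustrative); like is_stmt, the isa operand
      // must fold to a constant, and it is checked for a negative value just
      // below.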
if (const MCConstantExpr *MCE = dyn_cast(Value)) { int Value = MCE->getValue(); if (Value < 0) return Error(Loc, "isa number less than zero"); Isa = Value; } else { return Error(Loc, "isa number not a constant value"); } } else if (Name == "discriminator") { if (parseAbsoluteExpression(Discriminator)) return true; } else { return Error(Loc, "unknown sub-directive in '.loc' directive"); } return false; }; if (parseMany(parseLocOp, false /*hasComma*/)) return true; getStreamer().EmitDwarfLocDirective(FileNumber, LineNumber, ColumnPos, Flags, Isa, Discriminator, StringRef()); return false; } /// parseDirectiveStabs /// ::= .stabs string, number, number, number bool AsmParser::parseDirectiveStabs() { return TokError("unsupported directive '.stabs'"); } /// parseDirectiveCVFile /// ::= .cv_file number filename [checksum] [checksumkind] bool AsmParser::parseDirectiveCVFile() { SMLoc FileNumberLoc = getTok().getLoc(); int64_t FileNumber; std::string Filename; std::string Checksum; int64_t ChecksumKind = 0; if (parseIntToken(FileNumber, "expected file number in '.cv_file' directive") || check(FileNumber < 1, FileNumberLoc, "file number less than one") || check(getTok().isNot(AsmToken::String), "unexpected token in '.cv_file' directive") || parseEscapedString(Filename)) return true; if (!parseOptionalToken(AsmToken::EndOfStatement)) { if (check(getTok().isNot(AsmToken::String), "unexpected token in '.cv_file' directive") || parseEscapedString(Checksum) || parseIntToken(ChecksumKind, "expected checksum kind in '.cv_file' directive") || parseToken(AsmToken::EndOfStatement, "unexpected token in '.cv_file' directive")) return true; } Checksum = fromHex(Checksum); void *CKMem = Ctx.allocate(Checksum.size(), 1); memcpy(CKMem, Checksum.data(), Checksum.size()); ArrayRef ChecksumAsBytes(reinterpret_cast(CKMem), Checksum.size()); if (!getStreamer().EmitCVFileDirective(FileNumber, Filename, ChecksumAsBytes, static_cast(ChecksumKind))) return Error(FileNumberLoc, "file number already allocated"); return false; } bool AsmParser::parseCVFunctionId(int64_t &FunctionId, StringRef DirectiveName) { SMLoc Loc; return parseTokenLoc(Loc) || parseIntToken(FunctionId, "expected function id in '" + DirectiveName + "' directive") || check(FunctionId < 0 || FunctionId >= UINT_MAX, Loc, "expected function id within range [0, UINT_MAX)"); } bool AsmParser::parseCVFileId(int64_t &FileNumber, StringRef DirectiveName) { SMLoc Loc; return parseTokenLoc(Loc) || parseIntToken(FileNumber, "expected integer in '" + DirectiveName + "' directive") || check(FileNumber < 1, Loc, "file number less than one in '" + DirectiveName + "' directive") || check(!getCVContext().isValidFileNumber(FileNumber), Loc, "unassigned file number in '" + DirectiveName + "' directive"); } /// parseDirectiveCVFuncId /// ::= .cv_func_id FunctionId /// /// Introduces a function ID that can be used with .cv_loc. bool AsmParser::parseDirectiveCVFuncId() { SMLoc FunctionIdLoc = getTok().getLoc(); int64_t FunctionId; if (parseCVFunctionId(FunctionId, ".cv_func_id") || parseToken(AsmToken::EndOfStatement, "unexpected token in '.cv_func_id' directive")) return true; if (!getStreamer().EmitCVFuncIdDirective(FunctionId)) return Error(FunctionIdLoc, "function id already allocated"); return false; } /// parseDirectiveCVInlineSiteId /// ::= .cv_inline_site_id FunctionId /// "within" IAFunc /// "inlined_at" IAFile IALine [IACol] /// /// Introduces a function ID that can be used with .cv_loc. 
Includes "inlined /// at" source location information for use in the line table of the caller, /// whether the caller is a real function or another inlined call site. bool AsmParser::parseDirectiveCVInlineSiteId() { SMLoc FunctionIdLoc = getTok().getLoc(); int64_t FunctionId; int64_t IAFunc; int64_t IAFile; int64_t IALine; int64_t IACol = 0; // FunctionId if (parseCVFunctionId(FunctionId, ".cv_inline_site_id")) return true; // "within" if (check((getLexer().isNot(AsmToken::Identifier) || getTok().getIdentifier() != "within"), "expected 'within' identifier in '.cv_inline_site_id' directive")) return true; Lex(); // IAFunc if (parseCVFunctionId(IAFunc, ".cv_inline_site_id")) return true; // "inlined_at" if (check((getLexer().isNot(AsmToken::Identifier) || getTok().getIdentifier() != "inlined_at"), "expected 'inlined_at' identifier in '.cv_inline_site_id' " "directive") ) return true; Lex(); // IAFile IALine if (parseCVFileId(IAFile, ".cv_inline_site_id") || parseIntToken(IALine, "expected line number after 'inlined_at'")) return true; // [IACol] if (getLexer().is(AsmToken::Integer)) { IACol = getTok().getIntVal(); Lex(); } if (parseToken(AsmToken::EndOfStatement, "unexpected token in '.cv_inline_site_id' directive")) return true; if (!getStreamer().EmitCVInlineSiteIdDirective(FunctionId, IAFunc, IAFile, IALine, IACol, FunctionIdLoc)) return Error(FunctionIdLoc, "function id already allocated"); return false; } /// parseDirectiveCVLoc /// ::= .cv_loc FunctionId FileNumber [LineNumber] [ColumnPos] [prologue_end] /// [is_stmt VALUE] /// The first number is a file number, must have been previously assigned with /// a .file directive, the second number is the line number and optionally the /// third number is a column position (zero if not specified). The remaining /// optional items are .loc sub-directives. bool AsmParser::parseDirectiveCVLoc() { SMLoc DirectiveLoc = getTok().getLoc(); int64_t FunctionId, FileNumber; if (parseCVFunctionId(FunctionId, ".cv_loc") || parseCVFileId(FileNumber, ".cv_loc")) return true; int64_t LineNumber = 0; if (getLexer().is(AsmToken::Integer)) { LineNumber = getTok().getIntVal(); if (LineNumber < 0) return TokError("line number less than zero in '.cv_loc' directive"); Lex(); } int64_t ColumnPos = 0; if (getLexer().is(AsmToken::Integer)) { ColumnPos = getTok().getIntVal(); if (ColumnPos < 0) return TokError("column position less than zero in '.cv_loc' directive"); Lex(); } bool PrologueEnd = false; uint64_t IsStmt = 0; auto parseOp = [&]() -> bool { StringRef Name; SMLoc Loc = getTok().getLoc(); if (parseIdentifier(Name)) return TokError("unexpected token in '.cv_loc' directive"); if (Name == "prologue_end") PrologueEnd = true; else if (Name == "is_stmt") { Loc = getTok().getLoc(); const MCExpr *Value; if (parseExpression(Value)) return true; // The expression must be the constant 0 or 1. 
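      // e.g. '.cv_loc 1 1 5 0 is_stmt 1' (illustrative): function id 1, file 1,
      // line 5, column 0, marked as a statement boundary.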
IsStmt = ~0ULL; if (const auto *MCE = dyn_cast(Value)) IsStmt = MCE->getValue(); if (IsStmt > 1) return Error(Loc, "is_stmt value not 0 or 1"); } else { return Error(Loc, "unknown sub-directive in '.cv_loc' directive"); } return false; }; if (parseMany(parseOp, false /*hasComma*/)) return true; getStreamer().EmitCVLocDirective(FunctionId, FileNumber, LineNumber, ColumnPos, PrologueEnd, IsStmt, StringRef(), DirectiveLoc); return false; } /// parseDirectiveCVLinetable /// ::= .cv_linetable FunctionId, FnStart, FnEnd bool AsmParser::parseDirectiveCVLinetable() { int64_t FunctionId; StringRef FnStartName, FnEndName; SMLoc Loc = getTok().getLoc(); if (parseCVFunctionId(FunctionId, ".cv_linetable") || parseToken(AsmToken::Comma, "unexpected token in '.cv_linetable' directive") || parseTokenLoc(Loc) || check(parseIdentifier(FnStartName), Loc, "expected identifier in directive") || parseToken(AsmToken::Comma, "unexpected token in '.cv_linetable' directive") || parseTokenLoc(Loc) || check(parseIdentifier(FnEndName), Loc, "expected identifier in directive")) return true; MCSymbol *FnStartSym = getContext().getOrCreateSymbol(FnStartName); MCSymbol *FnEndSym = getContext().getOrCreateSymbol(FnEndName); getStreamer().EmitCVLinetableDirective(FunctionId, FnStartSym, FnEndSym); return false; } /// parseDirectiveCVInlineLinetable /// ::= .cv_inline_linetable PrimaryFunctionId FileId LineNum FnStart FnEnd bool AsmParser::parseDirectiveCVInlineLinetable() { int64_t PrimaryFunctionId, SourceFileId, SourceLineNum; StringRef FnStartName, FnEndName; SMLoc Loc = getTok().getLoc(); if (parseCVFunctionId(PrimaryFunctionId, ".cv_inline_linetable") || parseTokenLoc(Loc) || parseIntToken( SourceFileId, "expected SourceField in '.cv_inline_linetable' directive") || check(SourceFileId <= 0, Loc, "File id less than zero in '.cv_inline_linetable' directive") || parseTokenLoc(Loc) || parseIntToken( SourceLineNum, "expected SourceLineNum in '.cv_inline_linetable' directive") || check(SourceLineNum < 0, Loc, "Line number less than zero in '.cv_inline_linetable' directive") || parseTokenLoc(Loc) || check(parseIdentifier(FnStartName), Loc, "expected identifier in directive") || parseTokenLoc(Loc) || check(parseIdentifier(FnEndName), Loc, "expected identifier in directive")) return true; if (parseToken(AsmToken::EndOfStatement, "Expected End of Statement")) return true; MCSymbol *FnStartSym = getContext().getOrCreateSymbol(FnStartName); MCSymbol *FnEndSym = getContext().getOrCreateSymbol(FnEndName); getStreamer().EmitCVInlineLinetableDirective(PrimaryFunctionId, SourceFileId, SourceLineNum, FnStartSym, FnEndSym); return false; } /// parseDirectiveCVDefRange /// ::= .cv_def_range RangeStart RangeEnd (GapStart GapEnd)*, bytes* bool AsmParser::parseDirectiveCVDefRange() { SMLoc Loc; std::vector> Ranges; while (getLexer().is(AsmToken::Identifier)) { Loc = getLexer().getLoc(); StringRef GapStartName; if (parseIdentifier(GapStartName)) return Error(Loc, "expected identifier in directive"); MCSymbol *GapStartSym = getContext().getOrCreateSymbol(GapStartName); Loc = getLexer().getLoc(); StringRef GapEndName; if (parseIdentifier(GapEndName)) return Error(Loc, "expected identifier in directive"); MCSymbol *GapEndSym = getContext().getOrCreateSymbol(GapEndName); Ranges.push_back({GapStartSym, GapEndSym}); } std::string FixedSizePortion; if (parseToken(AsmToken::Comma, "unexpected token in directive") || parseEscapedString(FixedSizePortion)) return true; getStreamer().EmitCVDefRangeDirective(Ranges, FixedSizePortion); return false; } /// 
parseDirectiveCVStringTable /// ::= .cv_stringtable bool AsmParser::parseDirectiveCVStringTable() { getStreamer().EmitCVStringTableDirective(); return false; } /// parseDirectiveCVFileChecksums /// ::= .cv_filechecksums bool AsmParser::parseDirectiveCVFileChecksums() { getStreamer().EmitCVFileChecksumsDirective(); return false; } /// parseDirectiveCVFileChecksumOffset /// ::= .cv_filechecksumoffset fileno bool AsmParser::parseDirectiveCVFileChecksumOffset() { int64_t FileNo; if (parseIntToken(FileNo, "expected identifier in directive")) return true; if (parseToken(AsmToken::EndOfStatement, "Expected End of Statement")) return true; getStreamer().EmitCVFileChecksumOffsetDirective(FileNo); return false; } /// parseDirectiveCVFPOData /// ::= .cv_fpo_data procsym bool AsmParser::parseDirectiveCVFPOData() { SMLoc DirLoc = getLexer().getLoc(); StringRef ProcName; if (parseIdentifier(ProcName)) return TokError("expected symbol name"); if (parseEOL("unexpected tokens")) return addErrorSuffix(" in '.cv_fpo_data' directive"); MCSymbol *ProcSym = getContext().getOrCreateSymbol(ProcName); getStreamer().EmitCVFPOData(ProcSym, DirLoc); return false; } /// parseDirectiveCFISections /// ::= .cfi_sections section [, section] bool AsmParser::parseDirectiveCFISections() { StringRef Name; bool EH = false; bool Debug = false; if (parseIdentifier(Name)) return TokError("Expected an identifier"); if (Name == ".eh_frame") EH = true; else if (Name == ".debug_frame") Debug = true; if (getLexer().is(AsmToken::Comma)) { Lex(); if (parseIdentifier(Name)) return TokError("Expected an identifier"); if (Name == ".eh_frame") EH = true; else if (Name == ".debug_frame") Debug = true; } getStreamer().EmitCFISections(EH, Debug); return false; } /// parseDirectiveCFIStartProc /// ::= .cfi_startproc [simple] bool AsmParser::parseDirectiveCFIStartProc() { StringRef Simple; if (!parseOptionalToken(AsmToken::EndOfStatement)) { if (check(parseIdentifier(Simple) || Simple != "simple", "unexpected token") || parseToken(AsmToken::EndOfStatement)) return addErrorSuffix(" in '.cfi_startproc' directive"); } getStreamer().EmitCFIStartProc(!Simple.empty()); return false; } /// parseDirectiveCFIEndProc /// ::= .cfi_endproc bool AsmParser::parseDirectiveCFIEndProc() { getStreamer().EmitCFIEndProc(); return false; } /// parse register name or number. 
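/// For example (illustrative), '.cfi_offset %rbp, -16' and '.cfi_offset 6, -16'
/// are both accepted; the named form is resolved through the target's register
/// parser and then mapped to its DWARF register number.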
bool AsmParser::parseRegisterOrRegisterNumber(int64_t &Register, SMLoc DirectiveLoc) { unsigned RegNo; if (getLexer().isNot(AsmToken::Integer)) { if (getTargetParser().ParseRegister(RegNo, DirectiveLoc, DirectiveLoc)) return true; Register = getContext().getRegisterInfo()->getDwarfRegNum(RegNo, true); } else return parseAbsoluteExpression(Register); return false; } /// parseDirectiveCFIDefCfa /// ::= .cfi_def_cfa register, offset bool AsmParser::parseDirectiveCFIDefCfa(SMLoc DirectiveLoc) { int64_t Register = 0, Offset = 0; if (parseRegisterOrRegisterNumber(Register, DirectiveLoc) || parseToken(AsmToken::Comma, "unexpected token in directive") || parseAbsoluteExpression(Offset)) return true; getStreamer().EmitCFIDefCfa(Register, Offset); return false; } /// parseDirectiveCFIDefCfaOffset /// ::= .cfi_def_cfa_offset offset bool AsmParser::parseDirectiveCFIDefCfaOffset() { int64_t Offset = 0; if (parseAbsoluteExpression(Offset)) return true; getStreamer().EmitCFIDefCfaOffset(Offset); return false; } /// parseDirectiveCFIRegister /// ::= .cfi_register register, register bool AsmParser::parseDirectiveCFIRegister(SMLoc DirectiveLoc) { int64_t Register1 = 0, Register2 = 0; if (parseRegisterOrRegisterNumber(Register1, DirectiveLoc) || parseToken(AsmToken::Comma, "unexpected token in directive") || parseRegisterOrRegisterNumber(Register2, DirectiveLoc)) return true; getStreamer().EmitCFIRegister(Register1, Register2); return false; } /// parseDirectiveCFIWindowSave /// ::= .cfi_window_save bool AsmParser::parseDirectiveCFIWindowSave() { getStreamer().EmitCFIWindowSave(); return false; } /// parseDirectiveCFIAdjustCfaOffset /// ::= .cfi_adjust_cfa_offset adjustment bool AsmParser::parseDirectiveCFIAdjustCfaOffset() { int64_t Adjustment = 0; if (parseAbsoluteExpression(Adjustment)) return true; getStreamer().EmitCFIAdjustCfaOffset(Adjustment); return false; } /// parseDirectiveCFIDefCfaRegister /// ::= .cfi_def_cfa_register register bool AsmParser::parseDirectiveCFIDefCfaRegister(SMLoc DirectiveLoc) { int64_t Register = 0; if (parseRegisterOrRegisterNumber(Register, DirectiveLoc)) return true; getStreamer().EmitCFIDefCfaRegister(Register); return false; } /// parseDirectiveCFIOffset /// ::= .cfi_offset register, offset bool AsmParser::parseDirectiveCFIOffset(SMLoc DirectiveLoc) { int64_t Register = 0; int64_t Offset = 0; if (parseRegisterOrRegisterNumber(Register, DirectiveLoc) || parseToken(AsmToken::Comma, "unexpected token in directive") || parseAbsoluteExpression(Offset)) return true; getStreamer().EmitCFIOffset(Register, Offset); return false; } /// parseDirectiveCFIRelOffset /// ::= .cfi_rel_offset register, offset bool AsmParser::parseDirectiveCFIRelOffset(SMLoc DirectiveLoc) { int64_t Register = 0, Offset = 0; if (parseRegisterOrRegisterNumber(Register, DirectiveLoc) || parseToken(AsmToken::Comma, "unexpected token in directive") || parseAbsoluteExpression(Offset)) return true; getStreamer().EmitCFIRelOffset(Register, Offset); return false; } static bool isValidEncoding(int64_t Encoding) { if (Encoding & ~0xff) return false; if (Encoding == dwarf::DW_EH_PE_omit) return true; const unsigned Format = Encoding & 0xf; if (Format != dwarf::DW_EH_PE_absptr && Format != dwarf::DW_EH_PE_udata2 && Format != dwarf::DW_EH_PE_udata4 && Format != dwarf::DW_EH_PE_udata8 && Format != dwarf::DW_EH_PE_sdata2 && Format != dwarf::DW_EH_PE_sdata4 && Format != dwarf::DW_EH_PE_sdata8 && Format != dwarf::DW_EH_PE_signed) return false; const unsigned Application = Encoding & 0x70; if (Application != 
             dwarf::DW_EH_PE_absptr && Application != dwarf::DW_EH_PE_pcrel)
    return false;

  return true;
}

/// parseDirectiveCFIPersonalityOrLsda
/// IsPersonality true for cfi_personality, false for cfi_lsda
/// ::= .cfi_personality encoding, [symbol_name]
/// ::= .cfi_lsda encoding, [symbol_name]
bool AsmParser::parseDirectiveCFIPersonalityOrLsda(bool IsPersonality) {
  int64_t Encoding = 0;
  if (parseAbsoluteExpression(Encoding))
    return true;
  if (Encoding == dwarf::DW_EH_PE_omit)
    return false;

  StringRef Name;
  if (check(!isValidEncoding(Encoding), "unsupported encoding.") ||
      parseToken(AsmToken::Comma, "unexpected token in directive") ||
      check(parseIdentifier(Name), "expected identifier in directive"))
    return true;

  MCSymbol *Sym = getContext().getOrCreateSymbol(Name);

  if (IsPersonality)
    getStreamer().EmitCFIPersonality(Sym, Encoding);
  else
    getStreamer().EmitCFILsda(Sym, Encoding);
  return false;
}

/// parseDirectiveCFIRememberState
/// ::= .cfi_remember_state
bool AsmParser::parseDirectiveCFIRememberState() {
  getStreamer().EmitCFIRememberState();
  return false;
}

/// parseDirectiveCFIRestoreState
/// ::= .cfi_restore_state
bool AsmParser::parseDirectiveCFIRestoreState() {
  getStreamer().EmitCFIRestoreState();
  return false;
}

/// parseDirectiveCFISameValue
/// ::= .cfi_same_value register
bool AsmParser::parseDirectiveCFISameValue(SMLoc DirectiveLoc) {
  int64_t Register = 0;

  if (parseRegisterOrRegisterNumber(Register, DirectiveLoc))
    return true;

  getStreamer().EmitCFISameValue(Register);
  return false;
}

/// parseDirectiveCFIRestore
/// ::= .cfi_restore register
bool AsmParser::parseDirectiveCFIRestore(SMLoc DirectiveLoc) {
  int64_t Register = 0;
  if (parseRegisterOrRegisterNumber(Register, DirectiveLoc))
    return true;

  getStreamer().EmitCFIRestore(Register);
  return false;
}

/// parseDirectiveCFIEscape
/// ::= .cfi_escape expression[,...]
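/// For example (illustrative), '.cfi_escape 0x2e, 0x10' emits the two raw
/// bytes 0x2e and 0x10 into the current frame description entry.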
bool AsmParser::parseDirectiveCFIEscape() { std::string Values; int64_t CurrValue; if (parseAbsoluteExpression(CurrValue)) return true; Values.push_back((uint8_t)CurrValue); while (getLexer().is(AsmToken::Comma)) { Lex(); if (parseAbsoluteExpression(CurrValue)) return true; Values.push_back((uint8_t)CurrValue); } getStreamer().EmitCFIEscape(Values); return false; } /// parseDirectiveCFIReturnColumn /// ::= .cfi_return_column register bool AsmParser::parseDirectiveCFIReturnColumn(SMLoc DirectiveLoc) { int64_t Register = 0; if (parseRegisterOrRegisterNumber(Register, DirectiveLoc)) return true; getStreamer().EmitCFIReturnColumn(Register); return false; } /// parseDirectiveCFISignalFrame /// ::= .cfi_signal_frame bool AsmParser::parseDirectiveCFISignalFrame() { if (parseToken(AsmToken::EndOfStatement, "unexpected token in '.cfi_signal_frame'")) return true; getStreamer().EmitCFISignalFrame(); return false; } /// parseDirectiveCFIUndefined /// ::= .cfi_undefined register bool AsmParser::parseDirectiveCFIUndefined(SMLoc DirectiveLoc) { int64_t Register = 0; if (parseRegisterOrRegisterNumber(Register, DirectiveLoc)) return true; getStreamer().EmitCFIUndefined(Register); return false; } /// parseDirectiveAltmacro /// ::= .altmacro /// ::= .noaltmacro bool AsmParser::parseDirectiveAltmacro(StringRef Directive) { if (getLexer().isNot(AsmToken::EndOfStatement)) return TokError("unexpected token in '" + Directive + "' directive"); if (Directive == ".altmacro") getLexer().SetAltMacroMode(true); else getLexer().SetAltMacroMode(false); return false; } /// parseDirectiveMacrosOnOff /// ::= .macros_on /// ::= .macros_off bool AsmParser::parseDirectiveMacrosOnOff(StringRef Directive) { if (parseToken(AsmToken::EndOfStatement, "unexpected token in '" + Directive + "' directive")) return true; setMacrosEnabled(Directive == ".macros_on"); return false; } /// parseDirectiveMacro /// ::= .macro name[,] [parameters] bool AsmParser::parseDirectiveMacro(SMLoc DirectiveLoc) { StringRef Name; if (parseIdentifier(Name)) return TokError("expected identifier in '.macro' directive"); if (getLexer().is(AsmToken::Comma)) Lex(); MCAsmMacroParameters Parameters; while (getLexer().isNot(AsmToken::EndOfStatement)) { if (!Parameters.empty() && Parameters.back().Vararg) return Error(Lexer.getLoc(), "Vararg parameter '" + Parameters.back().Name + "' should be last one in the list of parameters."); MCAsmMacroParameter Parameter; if (parseIdentifier(Parameter.Name)) return TokError("expected identifier in '.macro' directive"); // Emit an error if two (or more) named parameters share the same name for (const MCAsmMacroParameter& CurrParam : Parameters) if (CurrParam.Name.equals(Parameter.Name)) return TokError("macro '" + Name + "' has multiple parameters" " named '" + Parameter.Name + "'"); if (Lexer.is(AsmToken::Colon)) { Lex(); // consume ':' SMLoc QualLoc; StringRef Qualifier; QualLoc = Lexer.getLoc(); if (parseIdentifier(Qualifier)) return Error(QualLoc, "missing parameter qualifier for " "'" + Parameter.Name + "' in macro '" + Name + "'"); if (Qualifier == "req") Parameter.Required = true; else if (Qualifier == "vararg") Parameter.Vararg = true; else return Error(QualLoc, Qualifier + " is not a valid parameter qualifier " "for '" + Parameter.Name + "' in macro '" + Name + "'"); } if (getLexer().is(AsmToken::Equal)) { Lex(); SMLoc ParamLoc; ParamLoc = Lexer.getLoc(); if (parseMacroArgument(Parameter.Value, /*Vararg=*/false )) return true; if (Parameter.Required) Warning(ParamLoc, "pointless default value for required parameter 
" "'" + Parameter.Name + "' in macro '" + Name + "'"); } Parameters.push_back(std::move(Parameter)); if (getLexer().is(AsmToken::Comma)) Lex(); } // Eat just the end of statement. Lexer.Lex(); // Consuming deferred text, so use Lexer.Lex to ignore Lexing Errors AsmToken EndToken, StartToken = getTok(); unsigned MacroDepth = 0; // Lex the macro definition. while (true) { // Ignore Lexing errors in macros. while (Lexer.is(AsmToken::Error)) { Lexer.Lex(); } // Check whether we have reached the end of the file. if (getLexer().is(AsmToken::Eof)) return Error(DirectiveLoc, "no matching '.endmacro' in definition"); // Otherwise, check whether we have reach the .endmacro. if (getLexer().is(AsmToken::Identifier)) { if (getTok().getIdentifier() == ".endm" || getTok().getIdentifier() == ".endmacro") { if (MacroDepth == 0) { // Outermost macro. EndToken = getTok(); Lexer.Lex(); if (getLexer().isNot(AsmToken::EndOfStatement)) return TokError("unexpected token in '" + EndToken.getIdentifier() + "' directive"); break; } else { // Otherwise we just found the end of an inner macro. --MacroDepth; } } else if (getTok().getIdentifier() == ".macro") { // We allow nested macros. Those aren't instantiated until the outermost // macro is expanded so just ignore them for now. ++MacroDepth; } } // Otherwise, scan til the end of the statement. eatToEndOfStatement(); } if (getContext().lookupMacro(Name)) { return Error(DirectiveLoc, "macro '" + Name + "' is already defined"); } const char *BodyStart = StartToken.getLoc().getPointer(); const char *BodyEnd = EndToken.getLoc().getPointer(); StringRef Body = StringRef(BodyStart, BodyEnd - BodyStart); checkForBadMacro(DirectiveLoc, Name, Body, Parameters); MCAsmMacro Macro(Name, Body, std::move(Parameters)); DEBUG_WITH_TYPE("asm-macros", dbgs() << "Defining new macro:\n"; Macro.dump()); getContext().defineMacro(Name, std::move(Macro)); return false; } /// checkForBadMacro /// /// With the support added for named parameters there may be code out there that /// is transitioning from positional parameters. In versions of gas that did /// not support named parameters they would be ignored on the macro definition. /// But to support both styles of parameters this is not possible so if a macro /// definition has named parameters but does not use them and has what appears /// to be positional parameters, strings like $1, $2, ... and $n, then issue a /// warning that the positional parameter found in body which have no effect. /// Hoping the developer will either remove the named parameters from the macro /// definition so the positional parameters get used if that was what was /// intended or change the macro to use the named parameters. It is possible /// this warning will trigger when the none of the named parameters are used /// and the strings like $1 are infact to simply to be passed trough unchanged. void AsmParser::checkForBadMacro(SMLoc DirectiveLoc, StringRef Name, StringRef Body, ArrayRef Parameters) { // If this macro is not defined with named parameters the warning we are // checking for here doesn't apply. unsigned NParameters = Parameters.size(); if (NParameters == 0) return; bool NamedParametersFound = false; bool PositionalParametersFound = false; // Look at the body of the macro for use of both the named parameters and what // are likely to be positional parameters. This is what expandMacro() is // doing when it finds the parameters in the body. while (!Body.empty()) { // Scan for the next possible parameter. 
std::size_t End = Body.size(), Pos = 0; for (; Pos != End; ++Pos) { // Check for a substitution or escape. // This macro is defined with parameters, look for \foo, \bar, etc. if (Body[Pos] == '\\' && Pos + 1 != End) break; // This macro should have parameters, but look for $0, $1, ..., $n too. if (Body[Pos] != '$' || Pos + 1 == End) continue; char Next = Body[Pos + 1]; if (Next == '$' || Next == 'n' || isdigit(static_cast(Next))) break; } // Check if we reached the end. if (Pos == End) break; if (Body[Pos] == '$') { switch (Body[Pos + 1]) { // $$ => $ case '$': break; // $n => number of arguments case 'n': PositionalParametersFound = true; break; // $[0-9] => argument default: { PositionalParametersFound = true; break; } } Pos += 2; } else { unsigned I = Pos + 1; while (isIdentifierChar(Body[I]) && I + 1 != End) ++I; const char *Begin = Body.data() + Pos + 1; StringRef Argument(Begin, I - (Pos + 1)); unsigned Index = 0; for (; Index < NParameters; ++Index) if (Parameters[Index].Name == Argument) break; if (Index == NParameters) { if (Body[Pos + 1] == '(' && Body[Pos + 2] == ')') Pos += 3; else { Pos = I; } } else { NamedParametersFound = true; Pos += 1 + Argument.size(); } } // Update the scan point. Body = Body.substr(Pos); } if (!NamedParametersFound && PositionalParametersFound) Warning(DirectiveLoc, "macro defined with named parameters which are not " "used in macro body, possible positional parameter " "found in body which will have no effect"); } /// parseDirectiveExitMacro /// ::= .exitm bool AsmParser::parseDirectiveExitMacro(StringRef Directive) { if (parseToken(AsmToken::EndOfStatement, "unexpected token in '" + Directive + "' directive")) return true; if (!isInsideMacroInstantiation()) return TokError("unexpected '" + Directive + "' in file, " "no current macro definition"); // Exit all conditionals that are active in the current macro. while (TheCondStack.size() != ActiveMacros.back()->CondStackDepth) { TheCondState = TheCondStack.back(); TheCondStack.pop_back(); } handleMacroExit(); return false; } /// parseDirectiveEndMacro /// ::= .endm /// ::= .endmacro bool AsmParser::parseDirectiveEndMacro(StringRef Directive) { if (getLexer().isNot(AsmToken::EndOfStatement)) return TokError("unexpected token in '" + Directive + "' directive"); // If we are inside a macro instantiation, terminate the current // instantiation. if (isInsideMacroInstantiation()) { handleMacroExit(); return false; } // Otherwise, this .endmacro is a stray entry in the file; well formed // .endmacro directives are handled during the macro definition parsing. return TokError("unexpected '" + Directive + "' in file, " "no current macro definition"); } /// parseDirectivePurgeMacro /// ::= .purgem bool AsmParser::parseDirectivePurgeMacro(SMLoc DirectiveLoc) { StringRef Name; SMLoc Loc; if (parseTokenLoc(Loc) || check(parseIdentifier(Name), Loc, "expected identifier in '.purgem' directive") || parseToken(AsmToken::EndOfStatement, "unexpected token in '.purgem' directive")) return true; if (!getContext().lookupMacro(Name)) return Error(DirectiveLoc, "macro '" + Name + "' is not defined"); getContext().undefineMacro(Name); DEBUG_WITH_TYPE("asm-macros", dbgs() << "Un-defining macro: " << Name << "\n"); return false; } /// parseDirectiveBundleAlignMode /// ::= {.bundle_align_mode} expression bool AsmParser::parseDirectiveBundleAlignMode() { // Expect a single argument: an expression that evaluates to a constant // in the inclusive range 0-30. 
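  // Illustrative example: '.bundle_align_mode 4' requests 2**4 = 16-byte
  // instruction bundles.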
  SMLoc ExprLoc = getLexer().getLoc();
  int64_t AlignSizePow2;
  if (checkForValidSection() || parseAbsoluteExpression(AlignSizePow2) ||
      parseToken(AsmToken::EndOfStatement, "unexpected token after expression "
                                           "in '.bundle_align_mode' "
                                           "directive") ||
      check(AlignSizePow2 < 0 || AlignSizePow2 > 30, ExprLoc,
            "invalid bundle alignment size (expected between 0 and 30)"))
    return true;

  // Because of AlignSizePow2's verified range we can safely truncate it to
  // unsigned.
  getStreamer().EmitBundleAlignMode(static_cast<unsigned>(AlignSizePow2));
  return false;
}

/// parseDirectiveBundleLock
/// ::= {.bundle_lock} [align_to_end]
bool AsmParser::parseDirectiveBundleLock() {
  if (checkForValidSection())
    return true;
  bool AlignToEnd = false;

  StringRef Option;
  SMLoc Loc = getTok().getLoc();
  const char *kInvalidOptionError =
      "invalid option for '.bundle_lock' directive";

  if (!parseOptionalToken(AsmToken::EndOfStatement)) {
    if (check(parseIdentifier(Option), Loc, kInvalidOptionError) ||
        check(Option != "align_to_end", Loc, kInvalidOptionError) ||
        parseToken(AsmToken::EndOfStatement,
                   "unexpected token after '.bundle_lock' directive option"))
      return true;
    AlignToEnd = true;
  }

  getStreamer().EmitBundleLock(AlignToEnd);
  return false;
}

/// parseDirectiveBundleUnlock
/// ::= {.bundle_unlock}
bool AsmParser::parseDirectiveBundleUnlock() {
  if (checkForValidSection() ||
      parseToken(AsmToken::EndOfStatement,
                 "unexpected token in '.bundle_unlock' directive"))
    return true;

  getStreamer().EmitBundleUnlock();
  return false;
}

/// parseDirectiveSpace
/// ::= (.skip | .space) expression [ , expression ]
bool AsmParser::parseDirectiveSpace(StringRef IDVal) {
  SMLoc NumBytesLoc = Lexer.getLoc();
  const MCExpr *NumBytes;
  if (checkForValidSection() || parseExpression(NumBytes))
    return true;

  int64_t FillExpr = 0;
  if (parseOptionalToken(AsmToken::Comma))
    if (parseAbsoluteExpression(FillExpr))
      return addErrorSuffix(" in '" + Twine(IDVal) + "' directive");
  if (parseToken(AsmToken::EndOfStatement))
    return addErrorSuffix(" in '" + Twine(IDVal) + "' directive");

  // FIXME: Sometimes the fill expr is 'nop' if it isn't supplied, instead of 0.
  getStreamer().emitFill(*NumBytes, FillExpr, NumBytesLoc);

  return false;
}

/// parseDirectiveDCB
/// ::= .dcb.{b, l, w} expression, expression
bool AsmParser::parseDirectiveDCB(StringRef IDVal, unsigned Size) {
  SMLoc NumValuesLoc = Lexer.getLoc();
  int64_t NumValues;
  if (checkForValidSection() || parseAbsoluteExpression(NumValues))
    return true;

  if (NumValues < 0) {
    Warning(NumValuesLoc, "'" + Twine(IDVal) +
                              "' directive with negative repeat count has no effect");
    return false;
  }

  if (parseToken(AsmToken::Comma,
                 "unexpected token in '" + Twine(IDVal) + "' directive"))
    return true;

  const MCExpr *Value;
  SMLoc ExprLoc = getLexer().getLoc();
  if (parseExpression(Value))
    return true;

  // Special case constant expressions to match code generator.
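  // e.g. '.dcb.w 3, 0x1234' (illustrative) emits the 16-bit value 0x1234 three
  // times; the constant is range-checked against the directive's element size.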
if (const MCConstantExpr *MCE = dyn_cast(Value)) { assert(Size <= 8 && "Invalid size"); uint64_t IntValue = MCE->getValue(); if (!isUIntN(8 * Size, IntValue) && !isIntN(8 * Size, IntValue)) return Error(ExprLoc, "literal value out of range for directive"); for (uint64_t i = 0, e = NumValues; i != e; ++i) getStreamer().EmitIntValue(IntValue, Size); } else { for (uint64_t i = 0, e = NumValues; i != e; ++i) getStreamer().EmitValue(Value, Size, ExprLoc); } if (parseToken(AsmToken::EndOfStatement, "unexpected token in '" + Twine(IDVal) + "' directive")) return true; return false; } /// parseDirectiveRealDCB /// ::= .dcb.{d, s} expression, expression bool AsmParser::parseDirectiveRealDCB(StringRef IDVal, const fltSemantics &Semantics) { SMLoc NumValuesLoc = Lexer.getLoc(); int64_t NumValues; if (checkForValidSection() || parseAbsoluteExpression(NumValues)) return true; if (NumValues < 0) { Warning(NumValuesLoc, "'" + Twine(IDVal) + "' directive with negative repeat count has no effect"); return false; } if (parseToken(AsmToken::Comma, "unexpected token in '" + Twine(IDVal) + "' directive")) return true; APInt AsInt; if (parseRealValue(Semantics, AsInt)) return true; if (parseToken(AsmToken::EndOfStatement, "unexpected token in '" + Twine(IDVal) + "' directive")) return true; for (uint64_t i = 0, e = NumValues; i != e; ++i) getStreamer().EmitIntValue(AsInt.getLimitedValue(), AsInt.getBitWidth() / 8); return false; } /// parseDirectiveDS /// ::= .ds.{b, d, l, p, s, w, x} expression bool AsmParser::parseDirectiveDS(StringRef IDVal, unsigned Size) { SMLoc NumValuesLoc = Lexer.getLoc(); int64_t NumValues; if (checkForValidSection() || parseAbsoluteExpression(NumValues)) return true; if (NumValues < 0) { Warning(NumValuesLoc, "'" + Twine(IDVal) + "' directive with negative repeat count has no effect"); return false; } if (parseToken(AsmToken::EndOfStatement, "unexpected token in '" + Twine(IDVal) + "' directive")) return true; for (uint64_t i = 0, e = NumValues; i != e; ++i) getStreamer().emitFill(Size, 0); return false; } /// parseDirectiveLEB128 /// ::= (.sleb128 | .uleb128) [ expression (, expression)* ] bool AsmParser::parseDirectiveLEB128(bool Signed) { if (checkForValidSection()) return true; auto parseOp = [&]() -> bool { const MCExpr *Value; if (parseExpression(Value)) return true; if (Signed) getStreamer().EmitSLEB128Value(Value); else getStreamer().EmitULEB128Value(Value); return false; }; if (parseMany(parseOp)) return addErrorSuffix(" in directive"); return false; } /// parseDirectiveSymbolAttribute /// ::= { ".globl", ".weak", ... } [ identifier ( , identifier )* ] bool AsmParser::parseDirectiveSymbolAttribute(MCSymbolAttr Attr) { auto parseOp = [&]() -> bool { StringRef Name; SMLoc Loc = getTok().getLoc(); if (parseIdentifier(Name)) return Error(Loc, "expected identifier"); MCSymbol *Sym = getContext().getOrCreateSymbol(Name); // Assembler local symbols don't make any sense here. Complain loudly. 
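    // e.g. '.globl .Ltmp0' is rejected here (illustrative), since names with
    // the target's private-label prefix are assembler-temporary symbols.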
if (Sym->isTemporary()) return Error(Loc, "non-local symbol required"); if (!getStreamer().EmitSymbolAttribute(Sym, Attr)) return Error(Loc, "unable to emit symbol attribute"); return false; }; if (parseMany(parseOp)) return addErrorSuffix(" in directive"); return false; } /// parseDirectiveComm /// ::= ( .comm | .lcomm ) identifier , size_expression [ , align_expression ] bool AsmParser::parseDirectiveComm(bool IsLocal) { if (checkForValidSection()) return true; SMLoc IDLoc = getLexer().getLoc(); StringRef Name; if (parseIdentifier(Name)) return TokError("expected identifier in directive"); // Handle the identifier as the key symbol. MCSymbol *Sym = getContext().getOrCreateSymbol(Name); if (getLexer().isNot(AsmToken::Comma)) return TokError("unexpected token in directive"); Lex(); int64_t Size; SMLoc SizeLoc = getLexer().getLoc(); if (parseAbsoluteExpression(Size)) return true; int64_t Pow2Alignment = 0; SMLoc Pow2AlignmentLoc; if (getLexer().is(AsmToken::Comma)) { Lex(); Pow2AlignmentLoc = getLexer().getLoc(); if (parseAbsoluteExpression(Pow2Alignment)) return true; LCOMM::LCOMMType LCOMM = Lexer.getMAI().getLCOMMDirectiveAlignmentType(); if (IsLocal && LCOMM == LCOMM::NoAlignment) return Error(Pow2AlignmentLoc, "alignment not supported on this target"); // If this target takes alignments in bytes (not log) validate and convert. if ((!IsLocal && Lexer.getMAI().getCOMMDirectiveAlignmentIsInBytes()) || (IsLocal && LCOMM == LCOMM::ByteAlignment)) { if (!isPowerOf2_64(Pow2Alignment)) return Error(Pow2AlignmentLoc, "alignment must be a power of 2"); Pow2Alignment = Log2_64(Pow2Alignment); } } if (parseToken(AsmToken::EndOfStatement, "unexpected token in '.comm' or '.lcomm' directive")) return true; // NOTE: a size of zero for a .comm should create a undefined symbol // but a size of .lcomm creates a bss symbol of size zero. if (Size < 0) return Error(SizeLoc, "invalid '.comm' or '.lcomm' directive size, can't " "be less than zero"); // NOTE: The alignment in the directive is a power of 2 value, the assembler // may internally end up wanting an alignment in bytes. // FIXME: Diagnose overflow. if (Pow2Alignment < 0) return Error(Pow2AlignmentLoc, "invalid '.comm' or '.lcomm' directive " "alignment, can't be less than zero"); Sym->redefineIfPossible(); if (!Sym->isUndefined()) return Error(IDLoc, "invalid symbol redefinition"); // Create the Symbol as a common or local common with Size and Pow2Alignment if (IsLocal) { getStreamer().EmitLocalCommonSymbol(Sym, Size, 1 << Pow2Alignment); return false; } getStreamer().EmitCommonSymbol(Sym, Size, 1 << Pow2Alignment); return false; } /// parseDirectiveAbort /// ::= .abort [... message ...] bool AsmParser::parseDirectiveAbort() { // FIXME: Use loc from directive. SMLoc Loc = getLexer().getLoc(); StringRef Str = parseStringToEndOfStatement(); if (parseToken(AsmToken::EndOfStatement, "unexpected token in '.abort' directive")) return true; if (Str.empty()) return Error(Loc, ".abort detected. Assembly stopping."); else return Error(Loc, ".abort '" + Str + "' detected. Assembly stopping."); // FIXME: Actually abort assembly here. return false; } /// parseDirectiveInclude /// ::= .include "filename" bool AsmParser::parseDirectiveInclude() { // Allow the strings to have escaped octal character sequence. 
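  // Illustrative example: '.include "sub\144ir/foo.s"' opens "subdir/foo.s",
  // since \144 is the octal escape for 'd'.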
std::string Filename; SMLoc IncludeLoc = getTok().getLoc(); if (check(getTok().isNot(AsmToken::String), "expected string in '.include' directive") || parseEscapedString(Filename) || check(getTok().isNot(AsmToken::EndOfStatement), "unexpected token in '.include' directive") || // Attempt to switch the lexer to the included file before consuming the // end of statement to avoid losing it when we switch. check(enterIncludeFile(Filename), IncludeLoc, "Could not find include file '" + Filename + "'")) return true; return false; } /// parseDirectiveIncbin /// ::= .incbin "filename" [ , skip [ , count ] ] bool AsmParser::parseDirectiveIncbin() { // Allow the strings to have escaped octal character sequence. std::string Filename; SMLoc IncbinLoc = getTok().getLoc(); if (check(getTok().isNot(AsmToken::String), "expected string in '.incbin' directive") || parseEscapedString(Filename)) return true; int64_t Skip = 0; const MCExpr *Count = nullptr; SMLoc SkipLoc, CountLoc; if (parseOptionalToken(AsmToken::Comma)) { // The skip expression can be omitted while specifying the count, e.g: // .incbin "filename",,4 if (getTok().isNot(AsmToken::Comma)) { if (parseTokenLoc(SkipLoc) || parseAbsoluteExpression(Skip)) return true; } if (parseOptionalToken(AsmToken::Comma)) { CountLoc = getTok().getLoc(); if (parseExpression(Count)) return true; } } if (parseToken(AsmToken::EndOfStatement, "unexpected token in '.incbin' directive")) return true; if (check(Skip < 0, SkipLoc, "skip is negative")) return true; // Attempt to process the included file. if (processIncbinFile(Filename, Skip, Count, CountLoc)) return Error(IncbinLoc, "Could not find incbin file '" + Filename + "'"); return false; } /// parseDirectiveIf /// ::= .if{,eq,ge,gt,le,lt,ne} expression bool AsmParser::parseDirectiveIf(SMLoc DirectiveLoc, DirectiveKind DirKind) { TheCondStack.push_back(TheCondState); TheCondState.TheCond = AsmCond::IfCond; if (TheCondState.Ignore) { eatToEndOfStatement(); } else { int64_t ExprValue; if (parseAbsoluteExpression(ExprValue) || parseToken(AsmToken::EndOfStatement, "unexpected token in '.if' directive")) return true; switch (DirKind) { default: llvm_unreachable("unsupported directive"); case DK_IF: case DK_IFNE: break; case DK_IFEQ: ExprValue = ExprValue == 0; break; case DK_IFGE: ExprValue = ExprValue >= 0; break; case DK_IFGT: ExprValue = ExprValue > 0; break; case DK_IFLE: ExprValue = ExprValue <= 0; break; case DK_IFLT: ExprValue = ExprValue < 0; break; } TheCondState.CondMet = ExprValue; TheCondState.Ignore = !TheCondState.CondMet; } return false; } /// parseDirectiveIfb /// ::= .ifb string bool AsmParser::parseDirectiveIfb(SMLoc DirectiveLoc, bool ExpectBlank) { TheCondStack.push_back(TheCondState); TheCondState.TheCond = AsmCond::IfCond; if (TheCondState.Ignore) { eatToEndOfStatement(); } else { StringRef Str = parseStringToEndOfStatement(); if (parseToken(AsmToken::EndOfStatement, "unexpected token in '.ifb' directive")) return true; TheCondState.CondMet = ExpectBlank == Str.empty(); TheCondState.Ignore = !TheCondState.CondMet; } return false; } /// parseDirectiveIfc /// ::= .ifc string1, string2 /// ::= .ifnc string1, string2 bool AsmParser::parseDirectiveIfc(SMLoc DirectiveLoc, bool ExpectEqual) { TheCondStack.push_back(TheCondState); TheCondState.TheCond = AsmCond::IfCond; if (TheCondState.Ignore) { eatToEndOfStatement(); } else { StringRef Str1 = parseStringToComma(); if (parseToken(AsmToken::Comma, "unexpected token in '.ifc' directive")) return true; StringRef Str2 = parseStringToEndOfStatement(); if 
(parseToken(AsmToken::EndOfStatement, "unexpected token in '.ifc' directive")) return true; TheCondState.CondMet = ExpectEqual == (Str1.trim() == Str2.trim()); TheCondState.Ignore = !TheCondState.CondMet; } return false; } /// parseDirectiveIfeqs /// ::= .ifeqs string1, string2 bool AsmParser::parseDirectiveIfeqs(SMLoc DirectiveLoc, bool ExpectEqual) { if (Lexer.isNot(AsmToken::String)) { if (ExpectEqual) return TokError("expected string parameter for '.ifeqs' directive"); return TokError("expected string parameter for '.ifnes' directive"); } StringRef String1 = getTok().getStringContents(); Lex(); if (Lexer.isNot(AsmToken::Comma)) { if (ExpectEqual) return TokError( "expected comma after first string for '.ifeqs' directive"); return TokError("expected comma after first string for '.ifnes' directive"); } Lex(); if (Lexer.isNot(AsmToken::String)) { if (ExpectEqual) return TokError("expected string parameter for '.ifeqs' directive"); return TokError("expected string parameter for '.ifnes' directive"); } StringRef String2 = getTok().getStringContents(); Lex(); TheCondStack.push_back(TheCondState); TheCondState.TheCond = AsmCond::IfCond; TheCondState.CondMet = ExpectEqual == (String1 == String2); TheCondState.Ignore = !TheCondState.CondMet; return false; } /// parseDirectiveIfdef /// ::= .ifdef symbol bool AsmParser::parseDirectiveIfdef(SMLoc DirectiveLoc, bool expect_defined) { StringRef Name; TheCondStack.push_back(TheCondState); TheCondState.TheCond = AsmCond::IfCond; if (TheCondState.Ignore) { eatToEndOfStatement(); } else { if (check(parseIdentifier(Name), "expected identifier after '.ifdef'") || parseToken(AsmToken::EndOfStatement, "unexpected token in '.ifdef'")) return true; MCSymbol *Sym = getContext().lookupSymbol(Name); if (expect_defined) TheCondState.CondMet = (Sym && !Sym->isUndefined()); else TheCondState.CondMet = (!Sym || Sym->isUndefined()); TheCondState.Ignore = !TheCondState.CondMet; } return false; } /// parseDirectiveElseIf /// ::= .elseif expression bool AsmParser::parseDirectiveElseIf(SMLoc DirectiveLoc) { if (TheCondState.TheCond != AsmCond::IfCond && TheCondState.TheCond != AsmCond::ElseIfCond) return Error(DirectiveLoc, "Encountered a .elseif that doesn't follow an" " .if or an .elseif"); TheCondState.TheCond = AsmCond::ElseIfCond; bool LastIgnoreState = false; if (!TheCondStack.empty()) LastIgnoreState = TheCondStack.back().Ignore; if (LastIgnoreState || TheCondState.CondMet) { TheCondState.Ignore = true; eatToEndOfStatement(); } else { int64_t ExprValue; if (parseAbsoluteExpression(ExprValue)) return true; if (parseToken(AsmToken::EndOfStatement, "unexpected token in '.elseif' directive")) return true; TheCondState.CondMet = ExprValue; TheCondState.Ignore = !TheCondState.CondMet; } return false; } /// parseDirectiveElse /// ::= .else bool AsmParser::parseDirectiveElse(SMLoc DirectiveLoc) { if (parseToken(AsmToken::EndOfStatement, "unexpected token in '.else' directive")) return true; if (TheCondState.TheCond != AsmCond::IfCond && TheCondState.TheCond != AsmCond::ElseIfCond) return Error(DirectiveLoc, "Encountered a .else that doesn't follow " " an .if or an .elseif"); TheCondState.TheCond = AsmCond::ElseCond; bool LastIgnoreState = false; if (!TheCondStack.empty()) LastIgnoreState = TheCondStack.back().Ignore; if (LastIgnoreState || TheCondState.CondMet) TheCondState.Ignore = true; else TheCondState.Ignore = false; return false; } /// parseDirectiveEnd /// ::= .end bool AsmParser::parseDirectiveEnd(SMLoc DirectiveLoc) { if (parseToken(AsmToken::EndOfStatement, 
"unexpected token in '.end' directive")) return true; while (Lexer.isNot(AsmToken::Eof)) Lexer.Lex(); return false; } /// parseDirectiveError /// ::= .err /// ::= .error [string] bool AsmParser::parseDirectiveError(SMLoc L, bool WithMessage) { if (!TheCondStack.empty()) { if (TheCondStack.back().Ignore) { eatToEndOfStatement(); return false; } } if (!WithMessage) return Error(L, ".err encountered"); StringRef Message = ".error directive invoked in source file"; if (Lexer.isNot(AsmToken::EndOfStatement)) { if (Lexer.isNot(AsmToken::String)) return TokError(".error argument must be a string"); Message = getTok().getStringContents(); Lex(); } return Error(L, Message); } /// parseDirectiveWarning /// ::= .warning [string] bool AsmParser::parseDirectiveWarning(SMLoc L) { if (!TheCondStack.empty()) { if (TheCondStack.back().Ignore) { eatToEndOfStatement(); return false; } } StringRef Message = ".warning directive invoked in source file"; if (!parseOptionalToken(AsmToken::EndOfStatement)) { if (Lexer.isNot(AsmToken::String)) return TokError(".warning argument must be a string"); Message = getTok().getStringContents(); Lex(); if (parseToken(AsmToken::EndOfStatement, "expected end of statement in '.warning' directive")) return true; } return Warning(L, Message); } /// parseDirectiveEndIf /// ::= .endif bool AsmParser::parseDirectiveEndIf(SMLoc DirectiveLoc) { if (parseToken(AsmToken::EndOfStatement, "unexpected token in '.endif' directive")) return true; if ((TheCondState.TheCond == AsmCond::NoCond) || TheCondStack.empty()) return Error(DirectiveLoc, "Encountered a .endif that doesn't follow " "an .if or .else"); if (!TheCondStack.empty()) { TheCondState = TheCondStack.back(); TheCondStack.pop_back(); } return false; } void AsmParser::initializeDirectiveKindMap() { DirectiveKindMap[".set"] = DK_SET; DirectiveKindMap[".equ"] = DK_EQU; DirectiveKindMap[".equiv"] = DK_EQUIV; DirectiveKindMap[".ascii"] = DK_ASCII; DirectiveKindMap[".asciz"] = DK_ASCIZ; DirectiveKindMap[".string"] = DK_STRING; DirectiveKindMap[".byte"] = DK_BYTE; DirectiveKindMap[".short"] = DK_SHORT; DirectiveKindMap[".value"] = DK_VALUE; DirectiveKindMap[".2byte"] = DK_2BYTE; DirectiveKindMap[".long"] = DK_LONG; DirectiveKindMap[".int"] = DK_INT; DirectiveKindMap[".4byte"] = DK_4BYTE; DirectiveKindMap[".quad"] = DK_QUAD; DirectiveKindMap[".8byte"] = DK_8BYTE; DirectiveKindMap[".octa"] = DK_OCTA; DirectiveKindMap[".single"] = DK_SINGLE; DirectiveKindMap[".float"] = DK_FLOAT; DirectiveKindMap[".double"] = DK_DOUBLE; DirectiveKindMap[".align"] = DK_ALIGN; DirectiveKindMap[".align32"] = DK_ALIGN32; DirectiveKindMap[".balign"] = DK_BALIGN; DirectiveKindMap[".balignw"] = DK_BALIGNW; DirectiveKindMap[".balignl"] = DK_BALIGNL; DirectiveKindMap[".p2align"] = DK_P2ALIGN; DirectiveKindMap[".p2alignw"] = DK_P2ALIGNW; DirectiveKindMap[".p2alignl"] = DK_P2ALIGNL; DirectiveKindMap[".org"] = DK_ORG; DirectiveKindMap[".fill"] = DK_FILL; DirectiveKindMap[".zero"] = DK_ZERO; DirectiveKindMap[".extern"] = DK_EXTERN; DirectiveKindMap[".globl"] = DK_GLOBL; DirectiveKindMap[".global"] = DK_GLOBAL; DirectiveKindMap[".lazy_reference"] = DK_LAZY_REFERENCE; DirectiveKindMap[".no_dead_strip"] = DK_NO_DEAD_STRIP; DirectiveKindMap[".symbol_resolver"] = DK_SYMBOL_RESOLVER; DirectiveKindMap[".private_extern"] = DK_PRIVATE_EXTERN; DirectiveKindMap[".reference"] = DK_REFERENCE; DirectiveKindMap[".weak_definition"] = DK_WEAK_DEFINITION; DirectiveKindMap[".weak_reference"] = DK_WEAK_REFERENCE; DirectiveKindMap[".weak_def_can_be_hidden"] = DK_WEAK_DEF_CAN_BE_HIDDEN; 
DirectiveKindMap[".comm"] = DK_COMM; DirectiveKindMap[".common"] = DK_COMMON; DirectiveKindMap[".lcomm"] = DK_LCOMM; DirectiveKindMap[".abort"] = DK_ABORT; DirectiveKindMap[".include"] = DK_INCLUDE; DirectiveKindMap[".incbin"] = DK_INCBIN; DirectiveKindMap[".code16"] = DK_CODE16; DirectiveKindMap[".code16gcc"] = DK_CODE16GCC; DirectiveKindMap[".rept"] = DK_REPT; DirectiveKindMap[".rep"] = DK_REPT; DirectiveKindMap[".irp"] = DK_IRP; DirectiveKindMap[".irpc"] = DK_IRPC; DirectiveKindMap[".endr"] = DK_ENDR; DirectiveKindMap[".bundle_align_mode"] = DK_BUNDLE_ALIGN_MODE; DirectiveKindMap[".bundle_lock"] = DK_BUNDLE_LOCK; DirectiveKindMap[".bundle_unlock"] = DK_BUNDLE_UNLOCK; DirectiveKindMap[".if"] = DK_IF; DirectiveKindMap[".ifeq"] = DK_IFEQ; DirectiveKindMap[".ifge"] = DK_IFGE; DirectiveKindMap[".ifgt"] = DK_IFGT; DirectiveKindMap[".ifle"] = DK_IFLE; DirectiveKindMap[".iflt"] = DK_IFLT; DirectiveKindMap[".ifne"] = DK_IFNE; DirectiveKindMap[".ifb"] = DK_IFB; DirectiveKindMap[".ifnb"] = DK_IFNB; DirectiveKindMap[".ifc"] = DK_IFC; DirectiveKindMap[".ifeqs"] = DK_IFEQS; DirectiveKindMap[".ifnc"] = DK_IFNC; DirectiveKindMap[".ifnes"] = DK_IFNES; DirectiveKindMap[".ifdef"] = DK_IFDEF; DirectiveKindMap[".ifndef"] = DK_IFNDEF; DirectiveKindMap[".ifnotdef"] = DK_IFNOTDEF; DirectiveKindMap[".elseif"] = DK_ELSEIF; DirectiveKindMap[".else"] = DK_ELSE; DirectiveKindMap[".end"] = DK_END; DirectiveKindMap[".endif"] = DK_ENDIF; DirectiveKindMap[".skip"] = DK_SKIP; DirectiveKindMap[".space"] = DK_SPACE; DirectiveKindMap[".file"] = DK_FILE; DirectiveKindMap[".line"] = DK_LINE; DirectiveKindMap[".loc"] = DK_LOC; DirectiveKindMap[".stabs"] = DK_STABS; DirectiveKindMap[".cv_file"] = DK_CV_FILE; DirectiveKindMap[".cv_func_id"] = DK_CV_FUNC_ID; DirectiveKindMap[".cv_loc"] = DK_CV_LOC; DirectiveKindMap[".cv_linetable"] = DK_CV_LINETABLE; DirectiveKindMap[".cv_inline_linetable"] = DK_CV_INLINE_LINETABLE; DirectiveKindMap[".cv_inline_site_id"] = DK_CV_INLINE_SITE_ID; DirectiveKindMap[".cv_def_range"] = DK_CV_DEF_RANGE; DirectiveKindMap[".cv_stringtable"] = DK_CV_STRINGTABLE; DirectiveKindMap[".cv_filechecksums"] = DK_CV_FILECHECKSUMS; DirectiveKindMap[".cv_filechecksumoffset"] = DK_CV_FILECHECKSUM_OFFSET; DirectiveKindMap[".cv_fpo_data"] = DK_CV_FPO_DATA; DirectiveKindMap[".sleb128"] = DK_SLEB128; DirectiveKindMap[".uleb128"] = DK_ULEB128; DirectiveKindMap[".cfi_sections"] = DK_CFI_SECTIONS; DirectiveKindMap[".cfi_startproc"] = DK_CFI_STARTPROC; DirectiveKindMap[".cfi_endproc"] = DK_CFI_ENDPROC; DirectiveKindMap[".cfi_def_cfa"] = DK_CFI_DEF_CFA; DirectiveKindMap[".cfi_def_cfa_offset"] = DK_CFI_DEF_CFA_OFFSET; DirectiveKindMap[".cfi_adjust_cfa_offset"] = DK_CFI_ADJUST_CFA_OFFSET; DirectiveKindMap[".cfi_def_cfa_register"] = DK_CFI_DEF_CFA_REGISTER; DirectiveKindMap[".cfi_offset"] = DK_CFI_OFFSET; DirectiveKindMap[".cfi_rel_offset"] = DK_CFI_REL_OFFSET; DirectiveKindMap[".cfi_personality"] = DK_CFI_PERSONALITY; DirectiveKindMap[".cfi_lsda"] = DK_CFI_LSDA; DirectiveKindMap[".cfi_remember_state"] = DK_CFI_REMEMBER_STATE; DirectiveKindMap[".cfi_restore_state"] = DK_CFI_RESTORE_STATE; DirectiveKindMap[".cfi_same_value"] = DK_CFI_SAME_VALUE; DirectiveKindMap[".cfi_restore"] = DK_CFI_RESTORE; DirectiveKindMap[".cfi_escape"] = DK_CFI_ESCAPE; DirectiveKindMap[".cfi_return_column"] = DK_CFI_RETURN_COLUMN; DirectiveKindMap[".cfi_signal_frame"] = DK_CFI_SIGNAL_FRAME; DirectiveKindMap[".cfi_undefined"] = DK_CFI_UNDEFINED; DirectiveKindMap[".cfi_register"] = DK_CFI_REGISTER; DirectiveKindMap[".cfi_window_save"] = DK_CFI_WINDOW_SAVE; 
DirectiveKindMap[".macros_on"] = DK_MACROS_ON; DirectiveKindMap[".macros_off"] = DK_MACROS_OFF; DirectiveKindMap[".macro"] = DK_MACRO; DirectiveKindMap[".exitm"] = DK_EXITM; DirectiveKindMap[".endm"] = DK_ENDM; DirectiveKindMap[".endmacro"] = DK_ENDMACRO; DirectiveKindMap[".purgem"] = DK_PURGEM; DirectiveKindMap[".err"] = DK_ERR; DirectiveKindMap[".error"] = DK_ERROR; DirectiveKindMap[".warning"] = DK_WARNING; DirectiveKindMap[".altmacro"] = DK_ALTMACRO; DirectiveKindMap[".noaltmacro"] = DK_NOALTMACRO; DirectiveKindMap[".reloc"] = DK_RELOC; DirectiveKindMap[".dc"] = DK_DC; DirectiveKindMap[".dc.a"] = DK_DC_A; DirectiveKindMap[".dc.b"] = DK_DC_B; DirectiveKindMap[".dc.d"] = DK_DC_D; DirectiveKindMap[".dc.l"] = DK_DC_L; DirectiveKindMap[".dc.s"] = DK_DC_S; DirectiveKindMap[".dc.w"] = DK_DC_W; DirectiveKindMap[".dc.x"] = DK_DC_X; DirectiveKindMap[".dcb"] = DK_DCB; DirectiveKindMap[".dcb.b"] = DK_DCB_B; DirectiveKindMap[".dcb.d"] = DK_DCB_D; DirectiveKindMap[".dcb.l"] = DK_DCB_L; DirectiveKindMap[".dcb.s"] = DK_DCB_S; DirectiveKindMap[".dcb.w"] = DK_DCB_W; DirectiveKindMap[".dcb.x"] = DK_DCB_X; DirectiveKindMap[".ds"] = DK_DS; DirectiveKindMap[".ds.b"] = DK_DS_B; DirectiveKindMap[".ds.d"] = DK_DS_D; DirectiveKindMap[".ds.l"] = DK_DS_L; DirectiveKindMap[".ds.p"] = DK_DS_P; DirectiveKindMap[".ds.s"] = DK_DS_S; DirectiveKindMap[".ds.w"] = DK_DS_W; DirectiveKindMap[".ds.x"] = DK_DS_X; DirectiveKindMap[".print"] = DK_PRINT; DirectiveKindMap[".addrsig"] = DK_ADDRSIG; DirectiveKindMap[".addrsig_sym"] = DK_ADDRSIG_SYM; } MCAsmMacro *AsmParser::parseMacroLikeBody(SMLoc DirectiveLoc) { AsmToken EndToken, StartToken = getTok(); unsigned NestLevel = 0; while (true) { // Check whether we have reached the end of the file. if (getLexer().is(AsmToken::Eof)) { printError(DirectiveLoc, "no matching '.endr' in definition"); return nullptr; } if (Lexer.is(AsmToken::Identifier) && (getTok().getIdentifier() == ".rep" || getTok().getIdentifier() == ".rept" || getTok().getIdentifier() == ".irp" || getTok().getIdentifier() == ".irpc")) { ++NestLevel; } // Otherwise, check whether we have reached the .endr. if (Lexer.is(AsmToken::Identifier) && getTok().getIdentifier() == ".endr") { if (NestLevel == 0) { EndToken = getTok(); Lex(); if (Lexer.isNot(AsmToken::EndOfStatement)) { printError(getTok().getLoc(), "unexpected token in '.endr' directive"); return nullptr; } break; } --NestLevel; } // Otherwise, scan till the end of the statement. eatToEndOfStatement(); } const char *BodyStart = StartToken.getLoc().getPointer(); const char *BodyEnd = EndToken.getLoc().getPointer(); StringRef Body = StringRef(BodyStart, BodyEnd - BodyStart); // We Are Anonymous. MacroLikeBodies.emplace_back(StringRef(), Body, MCAsmMacroParameters()); return &MacroLikeBodies.back(); } void AsmParser::instantiateMacroLikeBody(MCAsmMacro *M, SMLoc DirectiveLoc, raw_svector_ostream &OS) { OS << ".endr\n"; std::unique_ptr Instantiation = MemoryBuffer::getMemBufferCopy(OS.str(), ""); // Create the macro instantiation object and add to the current macro // instantiation stack. MacroInstantiation *MI = new MacroInstantiation( DirectiveLoc, CurBuffer, getTok().getLoc(), TheCondStack.size()); ActiveMacros.push_back(MI); // Jump to the macro instantiation and prime the lexer. 
CurBuffer = SrcMgr.AddNewSourceBuffer(std::move(Instantiation), SMLoc()); Lexer.setBuffer(SrcMgr.getMemoryBuffer(CurBuffer)->getBuffer()); Lex(); } /// parseDirectiveRept /// ::= .rep | .rept count bool AsmParser::parseDirectiveRept(SMLoc DirectiveLoc, StringRef Dir) { const MCExpr *CountExpr; SMLoc CountLoc = getTok().getLoc(); if (parseExpression(CountExpr)) return true; int64_t Count; if (!CountExpr->evaluateAsAbsolute(Count, getStreamer().getAssemblerPtr())) { return Error(CountLoc, "unexpected token in '" + Dir + "' directive"); } if (check(Count < 0, CountLoc, "Count is negative") || parseToken(AsmToken::EndOfStatement, "unexpected token in '" + Dir + "' directive")) return true; // Lex the rept definition. MCAsmMacro *M = parseMacroLikeBody(DirectiveLoc); if (!M) return true; // Macro instantiation is lexical, unfortunately. We construct a new buffer // to hold the macro body with substitutions. SmallString<256> Buf; raw_svector_ostream OS(Buf); while (Count--) { // Note that the AtPseudoVariable is disabled for instantiations of .rep(t). if (expandMacro(OS, M->Body, None, None, false, getTok().getLoc())) return true; } instantiateMacroLikeBody(M, DirectiveLoc, OS); return false; } /// parseDirectiveIrp /// ::= .irp symbol,values bool AsmParser::parseDirectiveIrp(SMLoc DirectiveLoc) { MCAsmMacroParameter Parameter; MCAsmMacroArguments A; if (check(parseIdentifier(Parameter.Name), "expected identifier in '.irp' directive") || parseToken(AsmToken::Comma, "expected comma in '.irp' directive") || parseMacroArguments(nullptr, A) || parseToken(AsmToken::EndOfStatement, "expected End of Statement")) return true; // Lex the irp definition. MCAsmMacro *M = parseMacroLikeBody(DirectiveLoc); if (!M) return true; // Macro instantiation is lexical, unfortunately. We construct a new buffer // to hold the macro body with substitutions. SmallString<256> Buf; raw_svector_ostream OS(Buf); for (const MCAsmMacroArgument &Arg : A) { // Note that the AtPseudoVariable is enabled for instantiations of .irp. // This is undocumented, but GAS seems to support it. if (expandMacro(OS, M->Body, Parameter, Arg, true, getTok().getLoc())) return true; } instantiateMacroLikeBody(M, DirectiveLoc, OS); return false; } /// parseDirectiveIrpc /// ::= .irpc symbol,values bool AsmParser::parseDirectiveIrpc(SMLoc DirectiveLoc) { MCAsmMacroParameter Parameter; MCAsmMacroArguments A; if (check(parseIdentifier(Parameter.Name), "expected identifier in '.irpc' directive") || parseToken(AsmToken::Comma, "expected comma in '.irpc' directive") || parseMacroArguments(nullptr, A)) return true; if (A.size() != 1 || A.front().size() != 1) return TokError("unexpected token in '.irpc' directive"); // Eat the end of statement. if (parseToken(AsmToken::EndOfStatement, "expected end of statement")) return true; // Lex the irpc definition. MCAsmMacro *M = parseMacroLikeBody(DirectiveLoc); if (!M) return true; // Macro instantiation is lexical, unfortunately. We construct a new buffer // to hold the macro body with substitutions. SmallString<256> Buf; raw_svector_ostream OS(Buf); StringRef Values = A.front().front().getString(); for (std::size_t I = 0, End = Values.size(); I != End; ++I) { MCAsmMacroArgument Arg; Arg.emplace_back(AsmToken::Identifier, Values.slice(I, I + 1)); // Note that the AtPseudoVariable is enabled for instantiations of .irpc. // This is undocumented, but GAS seems to support it. 
if (expandMacro(OS, M->Body, Parameter, Arg, true, getTok().getLoc())) return true; } instantiateMacroLikeBody(M, DirectiveLoc, OS); return false; } bool AsmParser::parseDirectiveEndr(SMLoc DirectiveLoc) { if (ActiveMacros.empty()) return TokError("unmatched '.endr' directive"); // The only .repl that should get here are the ones created by // instantiateMacroLikeBody. assert(getLexer().is(AsmToken::EndOfStatement)); handleMacroExit(); return false; } bool AsmParser::parseDirectiveMSEmit(SMLoc IDLoc, ParseStatementInfo &Info, size_t Len) { const MCExpr *Value; SMLoc ExprLoc = getLexer().getLoc(); if (parseExpression(Value)) return true; const MCConstantExpr *MCE = dyn_cast(Value); if (!MCE) return Error(ExprLoc, "unexpected expression in _emit"); uint64_t IntValue = MCE->getValue(); if (!isUInt<8>(IntValue) && !isInt<8>(IntValue)) return Error(ExprLoc, "literal value out of range for directive"); Info.AsmRewrites->emplace_back(AOK_Emit, IDLoc, Len); return false; } bool AsmParser::parseDirectiveMSAlign(SMLoc IDLoc, ParseStatementInfo &Info) { const MCExpr *Value; SMLoc ExprLoc = getLexer().getLoc(); if (parseExpression(Value)) return true; const MCConstantExpr *MCE = dyn_cast(Value); if (!MCE) return Error(ExprLoc, "unexpected expression in align"); uint64_t IntValue = MCE->getValue(); if (!isPowerOf2_64(IntValue)) return Error(ExprLoc, "literal value not a power of two greater then zero"); Info.AsmRewrites->emplace_back(AOK_Align, IDLoc, 5, Log2_64(IntValue)); return false; } bool AsmParser::parseDirectivePrint(SMLoc DirectiveLoc) { const AsmToken StrTok = getTok(); Lex(); if (StrTok.isNot(AsmToken::String) || StrTok.getString().front() != '"') return Error(DirectiveLoc, "expected double quoted string after .print"); if (parseToken(AsmToken::EndOfStatement, "expected end of statement")) return true; llvm::outs() << StrTok.getStringContents() << '\n'; return false; } bool AsmParser::parseDirectiveAddrsig() { getStreamer().EmitAddrsig(); return false; } bool AsmParser::parseDirectiveAddrsigSym() { StringRef Name; if (check(parseIdentifier(Name), "expected identifier in '.addrsig_sym' directive")) return true; MCSymbol *Sym = getContext().getOrCreateSymbol(Name); getStreamer().EmitAddrsigSym(Sym); return false; } // We are comparing pointers, but the pointers are relative to a single string. // Thus, this should always be deterministic. static int rewritesSort(const AsmRewrite *AsmRewriteA, const AsmRewrite *AsmRewriteB) { if (AsmRewriteA->Loc.getPointer() < AsmRewriteB->Loc.getPointer()) return -1; if (AsmRewriteB->Loc.getPointer() < AsmRewriteA->Loc.getPointer()) return 1; // It's possible to have a SizeDirective, Imm/ImmPrefix and an Input/Output // rewrite to the same location. Make sure the SizeDirective rewrite is // performed first, then the Imm/ImmPrefix and finally the Input/Output. This // ensures the sort algorithm is stable. 
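
// Standalone sketch (assumed types, not the AsmRewrite machinery) of the
// ordering that rewritesSort() establishes for MS inline asm: rewrites are
// applied in source order, and when several rewrites target the same
// location a fixed precedence (size directive, then immediate, then operand)
// breaks the tie so the result is deterministic. The precedence values below
// are illustrative.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

namespace {
enum RewriteKind { RK_SizeDirective, RK_Imm, RK_Operand };

struct Rewrite {
  std::size_t Offset; // position in the asm string
  RewriteKind Kind;
  const char *Text;
};

// Lower value == applied earlier at the same offset.
int precedence(RewriteKind K) {
  switch (K) {
  case RK_SizeDirective: return 0;
  case RK_Imm:           return 1;
  case RK_Operand:       return 2;
  }
  return 3;
}
} // namespace

int main() {
  std::vector<Rewrite> Rewrites = {
      {8, RK_Operand, "$0"},
      {8, RK_SizeDirective, "dword ptr "},
      {2, RK_Imm, "$$4"},
  };

  std::sort(Rewrites.begin(), Rewrites.end(),
            [](const Rewrite &A, const Rewrite &B) {
              if (A.Offset != B.Offset)
                return A.Offset < B.Offset;
              return precedence(A.Kind) < precedence(B.Kind);
            });

  // Prints "$$4", then "dword ptr ", then "$0".
  for (const Rewrite &R : Rewrites)
    std::cout << R.Text << "\n";
  return 0;
}
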
if (AsmRewritePrecedence[AsmRewriteA->Kind] > AsmRewritePrecedence[AsmRewriteB->Kind]) return -1; if (AsmRewritePrecedence[AsmRewriteA->Kind] < AsmRewritePrecedence[AsmRewriteB->Kind]) return 1; llvm_unreachable("Unstable rewrite sort."); } bool AsmParser::parseMSInlineAsm( void *AsmLoc, std::string &AsmString, unsigned &NumOutputs, unsigned &NumInputs, SmallVectorImpl> &OpDecls, SmallVectorImpl &Constraints, SmallVectorImpl &Clobbers, const MCInstrInfo *MII, const MCInstPrinter *IP, MCAsmParserSemaCallback &SI) { SmallVector InputDecls; SmallVector OutputDecls; SmallVector InputDeclsAddressOf; SmallVector OutputDeclsAddressOf; SmallVector InputConstraints; SmallVector OutputConstraints; SmallVector ClobberRegs; SmallVector AsmStrRewrites; // Prime the lexer. Lex(); // While we have input, parse each statement. unsigned InputIdx = 0; unsigned OutputIdx = 0; while (getLexer().isNot(AsmToken::Eof)) { // Parse curly braces marking block start/end if (parseCurlyBlockScope(AsmStrRewrites)) continue; ParseStatementInfo Info(&AsmStrRewrites); bool StatementErr = parseStatement(Info, &SI); if (StatementErr || Info.ParseError) { // Emit pending errors if any exist. printPendingErrors(); return true; } // No pending error should exist here. assert(!hasPendingError() && "unexpected error from parseStatement"); if (Info.Opcode == ~0U) continue; const MCInstrDesc &Desc = MII->get(Info.Opcode); // Build the list of clobbers, outputs and inputs. for (unsigned i = 1, e = Info.ParsedOperands.size(); i != e; ++i) { MCParsedAsmOperand &Operand = *Info.ParsedOperands[i]; // Immediate. if (Operand.isImm()) continue; // Register operand. if (Operand.isReg() && !Operand.needAddressOf() && !getTargetParser().OmitRegisterFromClobberLists(Operand.getReg())) { unsigned NumDefs = Desc.getNumDefs(); // Clobber. if (NumDefs && Operand.getMCOperandNum() < NumDefs) ClobberRegs.push_back(Operand.getReg()); continue; } // Expr/Input or Output. StringRef SymName = Operand.getSymName(); if (SymName.empty()) continue; void *OpDecl = Operand.getOpDecl(); if (!OpDecl) continue; bool isOutput = (i == 1) && Desc.mayStore(); SMLoc Start = SMLoc::getFromPointer(SymName.data()); if (isOutput) { ++InputIdx; OutputDecls.push_back(OpDecl); OutputDeclsAddressOf.push_back(Operand.needAddressOf()); OutputConstraints.push_back(("=" + Operand.getConstraint()).str()); AsmStrRewrites.emplace_back(AOK_Output, Start, SymName.size()); } else { InputDecls.push_back(OpDecl); InputDeclsAddressOf.push_back(Operand.needAddressOf()); InputConstraints.push_back(Operand.getConstraint().str()); AsmStrRewrites.emplace_back(AOK_Input, Start, SymName.size()); } } // Consider implicit defs to be clobbers. Think of cpuid and push. ArrayRef ImpDefs(Desc.getImplicitDefs(), Desc.getNumImplicitDefs()); ClobberRegs.insert(ClobberRegs.end(), ImpDefs.begin(), ImpDefs.end()); } // Set the number of Outputs and Inputs. NumOutputs = OutputDecls.size(); NumInputs = InputDecls.size(); // Set the unique clobbers. array_pod_sort(ClobberRegs.begin(), ClobberRegs.end()); ClobberRegs.erase(std::unique(ClobberRegs.begin(), ClobberRegs.end()), ClobberRegs.end()); Clobbers.assign(ClobberRegs.size(), std::string()); for (unsigned I = 0, E = ClobberRegs.size(); I != E; ++I) { raw_string_ostream OS(Clobbers[I]); IP->printRegName(OS, ClobberRegs[I]); } // Merge the various outputs and inputs. Output are expected first. 
if (NumOutputs || NumInputs) { unsigned NumExprs = NumOutputs + NumInputs; OpDecls.resize(NumExprs); Constraints.resize(NumExprs); for (unsigned i = 0; i < NumOutputs; ++i) { OpDecls[i] = std::make_pair(OutputDecls[i], OutputDeclsAddressOf[i]); Constraints[i] = OutputConstraints[i]; } for (unsigned i = 0, j = NumOutputs; i < NumInputs; ++i, ++j) { OpDecls[j] = std::make_pair(InputDecls[i], InputDeclsAddressOf[i]); Constraints[j] = InputConstraints[i]; } } // Build the IR assembly string. std::string AsmStringIR; raw_string_ostream OS(AsmStringIR); StringRef ASMString = SrcMgr.getMemoryBuffer(SrcMgr.getMainFileID())->getBuffer(); const char *AsmStart = ASMString.begin(); const char *AsmEnd = ASMString.end(); array_pod_sort(AsmStrRewrites.begin(), AsmStrRewrites.end(), rewritesSort); for (const AsmRewrite &AR : AsmStrRewrites) { AsmRewriteKind Kind = AR.Kind; const char *Loc = AR.Loc.getPointer(); assert(Loc >= AsmStart && "Expected Loc to be at or after Start!"); // Emit everything up to the immediate/expression. if (unsigned Len = Loc - AsmStart) OS << StringRef(AsmStart, Len); // Skip the original expression. if (Kind == AOK_Skip) { AsmStart = Loc + AR.Len; continue; } unsigned AdditionalSkip = 0; // Rewrite expressions in $N notation. switch (Kind) { default: break; case AOK_IntelExpr: assert(AR.IntelExp.isValid() && "cannot write invalid intel expression"); if (AR.IntelExp.NeedBracs) OS << "["; if (AR.IntelExp.hasBaseReg()) OS << AR.IntelExp.BaseReg; if (AR.IntelExp.hasIndexReg()) OS << (AR.IntelExp.hasBaseReg() ? " + " : "") << AR.IntelExp.IndexReg; if (AR.IntelExp.Scale > 1) OS << " * $$" << AR.IntelExp.Scale; if (AR.IntelExp.Imm || !AR.IntelExp.hasRegs()) OS << (AR.IntelExp.hasRegs() ? " + $$" : "$$") << AR.IntelExp.Imm; if (AR.IntelExp.NeedBracs) OS << "]"; break; case AOK_Label: OS << Ctx.getAsmInfo()->getPrivateLabelPrefix() << AR.Label; break; case AOK_Input: OS << '$' << InputIdx++; break; case AOK_Output: OS << '$' << OutputIdx++; break; case AOK_SizeDirective: switch (AR.Val) { default: break; case 8: OS << "byte ptr "; break; case 16: OS << "word ptr "; break; case 32: OS << "dword ptr "; break; case 64: OS << "qword ptr "; break; case 80: OS << "xword ptr "; break; case 128: OS << "xmmword ptr "; break; case 256: OS << "ymmword ptr "; break; } break; case AOK_Emit: OS << ".byte"; break; case AOK_Align: { // MS alignment directives are measured in bytes. If the native assembler // measures alignment in bytes, we can pass it straight through. OS << ".align"; if (getContext().getAsmInfo()->getAlignmentIsInBytes()) break; // Alignment is in log2 form, so print that instead and skip the original // immediate. unsigned Val = AR.Val; OS << ' ' << Val; assert(Val < 10 && "Expected alignment less then 2^10."); AdditionalSkip = (Val < 4) ? 2 : Val < 7 ? 3 : 4; break; } case AOK_EVEN: OS << ".even"; break; case AOK_EndOfStatement: OS << "\n\t"; break; } // Skip the original expression. AsmStart = Loc + AR.Len + AdditionalSkip; } // Emit the remainder of the asm string. if (AsmStart != AsmEnd) OS << StringRef(AsmStart, AsmEnd - AsmStart); AsmString = OS.str(); return false; } namespace llvm { namespace MCParserUtils { /// Returns whether the given symbol is used anywhere in the given expression, /// or subexpressions. 
static bool isSymbolUsedInExpression(const MCSymbol *Sym, const MCExpr *Value) { switch (Value->getKind()) { case MCExpr::Binary: { const MCBinaryExpr *BE = static_cast(Value); return isSymbolUsedInExpression(Sym, BE->getLHS()) || isSymbolUsedInExpression(Sym, BE->getRHS()); } case MCExpr::Target: case MCExpr::Constant: return false; case MCExpr::SymbolRef: { const MCSymbol &S = static_cast(Value)->getSymbol(); if (S.isVariable()) return isSymbolUsedInExpression(Sym, S.getVariableValue()); return &S == Sym; } case MCExpr::Unary: return isSymbolUsedInExpression( Sym, static_cast(Value)->getSubExpr()); } llvm_unreachable("Unknown expr kind!"); } bool parseAssignmentExpression(StringRef Name, bool allow_redef, MCAsmParser &Parser, MCSymbol *&Sym, const MCExpr *&Value) { // FIXME: Use better location, we should use proper tokens. SMLoc EqualLoc = Parser.getTok().getLoc(); if (Parser.parseExpression(Value)) return Parser.TokError("missing expression"); // Note: we don't count b as used in "a = b". This is to allow // a = b // b = c if (Parser.parseToken(AsmToken::EndOfStatement)) return true; // Validate that the LHS is allowed to be a variable (either it has not been // used as a symbol, or it is an absolute symbol). Sym = Parser.getContext().lookupSymbol(Name); if (Sym) { // Diagnose assignment to a label. // // FIXME: Diagnostics. Note the location of the definition as a label. // FIXME: Diagnose assignment to protected identifier (e.g., register name). if (isSymbolUsedInExpression(Sym, Value)) return Parser.Error(EqualLoc, "Recursive use of '" + Name + "'"); else if (Sym->isUndefined(/*SetUsed*/ false) && !Sym->isUsed() && !Sym->isVariable()) ; // Allow redefinitions of undefined symbols only used in directives. else if (Sym->isVariable() && !Sym->isUsed() && allow_redef) ; // Allow redefinitions of variables that haven't yet been used. else if (!Sym->isUndefined() && (!Sym->isVariable() || !allow_redef)) return Parser.Error(EqualLoc, "redefinition of '" + Name + "'"); else if (!Sym->isVariable()) return Parser.Error(EqualLoc, "invalid assignment to '" + Name + "'"); else if (!isa(Sym->getVariableValue())) return Parser.Error(EqualLoc, "invalid reassignment of non-absolute variable '" + Name + "'"); } else if (Name == ".") { Parser.getStreamer().emitValueToOffset(Value, 0, EqualLoc); return false; } else Sym = Parser.getContext().getOrCreateSymbol(Name); Sym->setRedefinable(allow_redef); return false; } } // end namespace MCParserUtils } // end namespace llvm /// Create an MCAsmParser instance. MCAsmParser *llvm::createMCAsmParser(SourceMgr &SM, MCContext &C, MCStreamer &Out, const MCAsmInfo &MAI, unsigned CB) { return new AsmParser(SM, C, Out, MAI, CB); } Index: vendor/llvm/dist-release_70/lib/Support/Unix/Path.inc =================================================================== --- vendor/llvm/dist-release_70/lib/Support/Unix/Path.inc (revision 338574) +++ vendor/llvm/dist-release_70/lib/Support/Unix/Path.inc (revision 338575) @@ -1,1020 +1,1022 @@ //===- llvm/Support/Unix/Path.inc - Unix Path Implementation ----*- C++ -*-===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file implements the Unix specific implementation of the Path API. 
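
// A compact sketch (hypothetical Expr/Symbol types, not MCExpr/MCSymbol) of
// the recursion used by isSymbolUsedInExpression() above when validating an
// assignment: binary nodes are walked, symbol references compare against the
// symbol being assigned, and variable symbols are followed into their current
// value so that indirect cycles such as "b = a" followed by "a = b + 1" are
// also rejected.
#include <iostream>
#include <string>

namespace {
struct Symbol;

struct Expr {
  enum Kind { Constant, SymbolRef, Binary } K;
  const Symbol *Sym;       // for SymbolRef
  const Expr *LHS;         // for Binary
  const Expr *RHS;         // for Binary
};

struct Symbol {
  std::string Name;
  const Expr *VariableValue; // non-null once the symbol is a variable
};

bool isSymbolUsed(const Symbol *S, const Expr *Value) {
  switch (Value->K) {
  case Expr::Constant:
    return false;
  case Expr::Binary:
    return isSymbolUsed(S, Value->LHS) || isSymbolUsed(S, Value->RHS);
  case Expr::SymbolRef:
    if (Value->Sym->VariableValue) // follow variables into their values
      return isSymbolUsed(S, Value->Sym->VariableValue);
    return Value->Sym == S;
  }
  return false;
}
} // namespace

int main() {
  Symbol A{"a", nullptr}, B{"b", nullptr};

  Expr RefA{Expr::SymbolRef, &A, nullptr, nullptr};
  Expr One{Expr::Constant, nullptr, nullptr, nullptr};

  // b = a
  B.VariableValue = &RefA;

  // Candidate assignment: a = b + 1, which indirectly uses 'a'.
  Expr RefB{Expr::SymbolRef, &B, nullptr, nullptr};
  Expr Sum{Expr::Binary, nullptr, &RefB, &One};

  std::cout << (isSymbolUsed(&A, &Sum) ? "recursive use of 'a'\n"
                                       : "assignment is fine\n");
  return 0;
}
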
// //===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===// //=== WARNING: Implementation here must contain only generic UNIX code that //=== is guaranteed to work on *all* UNIX variants. //===----------------------------------------------------------------------===// #include "Unix.h" #include #include #if HAVE_SYS_STAT_H #include #endif #if HAVE_FCNTL_H #include #endif #ifdef HAVE_UNISTD_H #include #endif #ifdef HAVE_SYS_MMAN_H #include #endif #include #include #ifdef __APPLE__ #include #include #endif // Both stdio.h and cstdio are included via different paths and // stdcxx's cstdio doesn't include stdio.h, so it doesn't #undef the macros // either. #undef ferror #undef feof // For GNU Hurd #if defined(__GNU__) && !defined(PATH_MAX) # define PATH_MAX 4096 #endif #include #if !defined(__APPLE__) && !defined(__OpenBSD__) && !defined(__FreeBSD__) && \ !defined(__linux__) #include #define STATVFS statvfs #define FSTATVFS fstatvfs #define STATVFS_F_FRSIZE(vfs) vfs.f_frsize #else #if defined(__OpenBSD__) || defined(__FreeBSD__) #include #include #elif defined(__linux__) #if defined(HAVE_LINUX_MAGIC_H) #include #else #if defined(HAVE_LINUX_NFS_FS_H) #include #endif #if defined(HAVE_LINUX_SMB_H) #include #endif #endif #include #else #include #endif #define STATVFS statfs #define FSTATVFS fstatfs #define STATVFS_F_FRSIZE(vfs) static_cast(vfs.f_bsize) #endif #if defined(__NetBSD__) #define STATVFS_F_FLAG(vfs) (vfs).f_flag #else #define STATVFS_F_FLAG(vfs) (vfs).f_flags #endif using namespace llvm; namespace llvm { namespace sys { namespace fs { const file_t kInvalidFile = -1; #if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \ defined(__minix) || defined(__FreeBSD_kernel__) || defined(__linux__) || \ defined(__CYGWIN__) || defined(__DragonFly__) || defined(_AIX) static int test_dir(char ret[PATH_MAX], const char *dir, const char *bin) { struct stat sb; char fullpath[PATH_MAX]; snprintf(fullpath, PATH_MAX, "%s/%s", dir, bin); if (!realpath(fullpath, ret)) return 1; if (stat(fullpath, &sb) != 0) return 1; return 0; } static char * getprogpath(char ret[PATH_MAX], const char *bin) { char *pv, *s, *t; /* First approach: absolute path. */ if (bin[0] == '/') { if (test_dir(ret, "/", bin) == 0) return ret; return nullptr; } /* Second approach: relative path. */ if (strchr(bin, '/')) { char cwd[PATH_MAX]; if (!getcwd(cwd, PATH_MAX)) return nullptr; if (test_dir(ret, cwd, bin) == 0) return ret; return nullptr; } /* Third approach: $PATH */ if ((pv = getenv("PATH")) == nullptr) return nullptr; s = pv = strdup(pv); if (!pv) return nullptr; while ((t = strsep(&s, ":")) != nullptr) { if (test_dir(ret, t, bin) == 0) { free(pv); return ret; } } free(pv); return nullptr; } #endif // __FreeBSD__ || __NetBSD__ || __FreeBSD_kernel__ /// GetMainExecutable - Return the path to the main executable, given the /// value of argv[0] from program startup. std::string getMainExecutable(const char *argv0, void *MainAddr) { #if defined(__APPLE__) // On OS X the executable path is saved to the stack by dyld. Reading it // from there is much faster than calling dladdr, especially for large // binaries with symbols. 
char exe_path[MAXPATHLEN]; uint32_t size = sizeof(exe_path); if (_NSGetExecutablePath(exe_path, &size) == 0) { char link_path[MAXPATHLEN]; if (realpath(exe_path, link_path)) return link_path; } #elif defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \ defined(__minix) || defined(__DragonFly__) || \ defined(__FreeBSD_kernel__) || defined(_AIX) char exe_path[PATH_MAX]; if (getprogpath(exe_path, argv0) != NULL) return exe_path; #elif defined(__linux__) || defined(__CYGWIN__) char exe_path[MAXPATHLEN]; StringRef aPath("/proc/self/exe"); if (sys::fs::exists(aPath)) { // /proc is not always mounted under Linux (chroot for example). ssize_t len = readlink(aPath.str().c_str(), exe_path, sizeof(exe_path)); if (len >= 0) return std::string(exe_path, len); } else { // Fall back to the classical detection. if (getprogpath(exe_path, argv0)) return exe_path; } #elif defined(HAVE_DLFCN_H) && defined(HAVE_DLADDR) // Use dladdr to get executable path if available. Dl_info DLInfo; int err = dladdr(MainAddr, &DLInfo); if (err == 0) return ""; // If the filename is a symlink, we need to resolve and return the location of // the actual executable. char link_path[MAXPATHLEN]; if (realpath(DLInfo.dli_fname, link_path)) return link_path; #else #error GetMainExecutable is not implemented on this host yet. #endif return ""; } TimePoint<> basic_file_status::getLastAccessedTime() const { return toTimePoint(fs_st_atime); } TimePoint<> basic_file_status::getLastModificationTime() const { return toTimePoint(fs_st_mtime); } UniqueID file_status::getUniqueID() const { return UniqueID(fs_st_dev, fs_st_ino); } uint32_t file_status::getLinkCount() const { return fs_st_nlinks; } ErrorOr disk_space(const Twine &Path) { struct STATVFS Vfs; if (::STATVFS(Path.str().c_str(), &Vfs)) return std::error_code(errno, std::generic_category()); auto FrSize = STATVFS_F_FRSIZE(Vfs); space_info SpaceInfo; SpaceInfo.capacity = static_cast(Vfs.f_blocks) * FrSize; SpaceInfo.free = static_cast(Vfs.f_bfree) * FrSize; SpaceInfo.available = static_cast(Vfs.f_bavail) * FrSize; return SpaceInfo; } std::error_code current_path(SmallVectorImpl &result) { result.clear(); const char *pwd = ::getenv("PWD"); llvm::sys::fs::file_status PWDStatus, DotStatus; if (pwd && llvm::sys::path::is_absolute(pwd) && !llvm::sys::fs::status(pwd, PWDStatus) && !llvm::sys::fs::status(".", DotStatus) && PWDStatus.getUniqueID() == DotStatus.getUniqueID()) { result.append(pwd, pwd + strlen(pwd)); return std::error_code(); } #ifdef MAXPATHLEN result.reserve(MAXPATHLEN); #else // For GNU Hurd result.reserve(1024); #endif while (true) { if (::getcwd(result.data(), result.capacity()) == nullptr) { // See if there was a real error. if (errno != ENOMEM) return std::error_code(errno, std::generic_category()); // Otherwise there just wasn't enough space. 
result.reserve(result.capacity() * 2); } else break; } result.set_size(strlen(result.data())); return std::error_code(); } std::error_code set_current_path(const Twine &path) { SmallString<128> path_storage; StringRef p = path.toNullTerminatedStringRef(path_storage); if (::chdir(p.begin()) == -1) return std::error_code(errno, std::generic_category()); return std::error_code(); } std::error_code create_directory(const Twine &path, bool IgnoreExisting, perms Perms) { SmallString<128> path_storage; StringRef p = path.toNullTerminatedStringRef(path_storage); if (::mkdir(p.begin(), Perms) == -1) { if (errno != EEXIST || !IgnoreExisting) return std::error_code(errno, std::generic_category()); } return std::error_code(); } // Note that we are using symbolic link because hard links are not supported by // all filesystems (SMB doesn't). std::error_code create_link(const Twine &to, const Twine &from) { // Get arguments. SmallString<128> from_storage; SmallString<128> to_storage; StringRef f = from.toNullTerminatedStringRef(from_storage); StringRef t = to.toNullTerminatedStringRef(to_storage); if (::symlink(t.begin(), f.begin()) == -1) return std::error_code(errno, std::generic_category()); return std::error_code(); } std::error_code create_hard_link(const Twine &to, const Twine &from) { // Get arguments. SmallString<128> from_storage; SmallString<128> to_storage; StringRef f = from.toNullTerminatedStringRef(from_storage); StringRef t = to.toNullTerminatedStringRef(to_storage); if (::link(t.begin(), f.begin()) == -1) return std::error_code(errno, std::generic_category()); return std::error_code(); } std::error_code remove(const Twine &path, bool IgnoreNonExisting) { SmallString<128> path_storage; StringRef p = path.toNullTerminatedStringRef(path_storage); struct stat buf; if (lstat(p.begin(), &buf) != 0) { if (errno != ENOENT || !IgnoreNonExisting) return std::error_code(errno, std::generic_category()); return std::error_code(); } // Note: this check catches strange situations. In all cases, LLVM should // only be involved in the creation and deletion of regular files. This // check ensures that what we're trying to erase is a regular file. It // effectively prevents LLVM from erasing things like /dev/null, any block // special file, or other things that aren't "regular" files. if (!S_ISREG(buf.st_mode) && !S_ISDIR(buf.st_mode) && !S_ISLNK(buf.st_mode)) return make_error_code(errc::operation_not_permitted); if (::remove(p.begin()) == -1) { if (errno != ENOENT || !IgnoreNonExisting) return std::error_code(errno, std::generic_category()); } return std::error_code(); } static bool is_local_impl(struct STATVFS &Vfs) { #if defined(__linux__) #ifndef NFS_SUPER_MAGIC #define NFS_SUPER_MAGIC 0x6969 #endif #ifndef SMB_SUPER_MAGIC #define SMB_SUPER_MAGIC 0x517B #endif #ifndef CIFS_MAGIC_NUMBER #define CIFS_MAGIC_NUMBER 0xFF534D42 #endif switch ((uint32_t)Vfs.f_type) { case NFS_SUPER_MAGIC: case SMB_SUPER_MAGIC: case CIFS_MAGIC_NUMBER: return false; default: return true; } #elif defined(__CYGWIN__) // Cygwin doesn't expose this information; would need to use Win32 API. return false; #elif defined(__Fuchsia__) // Fuchsia doesn't yet support remote filesystem mounts. return true; #elif defined(__HAIKU__) // Haiku doesn't expose this information. return false; #elif defined(__sun) // statvfs::f_basetype contains a null-terminated FSType name of the mounted target StringRef fstype(Vfs.f_basetype); // NFS is the only non-local fstype?? 
return !fstype.equals("nfs"); #else return !!(STATVFS_F_FLAG(Vfs) & MNT_LOCAL); #endif } std::error_code is_local(const Twine &Path, bool &Result) { struct STATVFS Vfs; if (::STATVFS(Path.str().c_str(), &Vfs)) return std::error_code(errno, std::generic_category()); Result = is_local_impl(Vfs); return std::error_code(); } std::error_code is_local(int FD, bool &Result) { struct STATVFS Vfs; if (::FSTATVFS(FD, &Vfs)) return std::error_code(errno, std::generic_category()); Result = is_local_impl(Vfs); return std::error_code(); } std::error_code rename(const Twine &from, const Twine &to) { // Get arguments. SmallString<128> from_storage; SmallString<128> to_storage; StringRef f = from.toNullTerminatedStringRef(from_storage); StringRef t = to.toNullTerminatedStringRef(to_storage); if (::rename(f.begin(), t.begin()) == -1) return std::error_code(errno, std::generic_category()); return std::error_code(); } std::error_code resize_file(int FD, uint64_t Size) { #if defined(HAVE_POSIX_FALLOCATE) // If we have posix_fallocate use it. Unlike ftruncate it always allocates // space, so we get an error if the disk is full. if (int Err = ::posix_fallocate(FD, 0, Size)) { if (Err != EINVAL && Err != EOPNOTSUPP) return std::error_code(Err, std::generic_category()); } #endif // Use ftruncate as a fallback. It may or may not allocate space. At least on // OS X with HFS+ it does. if (::ftruncate(FD, Size) == -1) return std::error_code(errno, std::generic_category()); return std::error_code(); } static int convertAccessMode(AccessMode Mode) { switch (Mode) { case AccessMode::Exist: return F_OK; case AccessMode::Write: return W_OK; case AccessMode::Execute: return R_OK | X_OK; // scripts also need R_OK. } llvm_unreachable("invalid enum"); } std::error_code access(const Twine &Path, AccessMode Mode) { SmallString<128> PathStorage; StringRef P = Path.toNullTerminatedStringRef(PathStorage); if (::access(P.begin(), convertAccessMode(Mode)) == -1) return std::error_code(errno, std::generic_category()); if (Mode == AccessMode::Execute) { // Don't say that directories are executable. struct stat buf; if (0 != stat(P.begin(), &buf)) return errc::permission_denied; if (!S_ISREG(buf.st_mode)) return errc::permission_denied; } return std::error_code(); } bool can_execute(const Twine &Path) { return !access(Path, AccessMode::Execute); } bool equivalent(file_status A, file_status B) { assert(status_known(A) && status_known(B)); return A.fs_st_dev == B.fs_st_dev && A.fs_st_ino == B.fs_st_ino; } std::error_code equivalent(const Twine &A, const Twine &B, bool &result) { file_status fsA, fsB; if (std::error_code ec = status(A, fsA)) return ec; if (std::error_code ec = status(B, fsB)) return ec; result = equivalent(fsA, fsB); return std::error_code(); } static void expandTildeExpr(SmallVectorImpl &Path) { StringRef PathStr(Path.begin(), Path.size()); if (PathStr.empty() || !PathStr.startswith("~")) return; PathStr = PathStr.drop_front(); StringRef Expr = PathStr.take_until([](char c) { return path::is_separator(c); }); StringRef Remainder = PathStr.substr(Expr.size() + 1); SmallString<128> Storage; if (Expr.empty()) { // This is just ~/..., resolve it to the current user's home dir. if (!path::home_directory(Storage)) { // For some reason we couldn't get the home directory. Just exit. return; } // Overwrite the first character and insert the rest. 
Path[0] = Storage[0]; Path.insert(Path.begin() + 1, Storage.begin() + 1, Storage.end()); return; } // This is a string of the form ~username/, look up this user's entry in the // password database. struct passwd *Entry = nullptr; std::string User = Expr.str(); Entry = ::getpwnam(User.c_str()); if (!Entry) { // Unable to look up the entry, just return back the original path. return; } Storage = Remainder; Path.clear(); Path.append(Entry->pw_dir, Entry->pw_dir + strlen(Entry->pw_dir)); llvm::sys::path::append(Path, Storage); } static std::error_code fillStatus(int StatRet, const struct stat &Status, file_status &Result) { if (StatRet != 0) { std::error_code ec(errno, std::generic_category()); if (ec == errc::no_such_file_or_directory) Result = file_status(file_type::file_not_found); else Result = file_status(file_type::status_error); return ec; } file_type Type = file_type::type_unknown; if (S_ISDIR(Status.st_mode)) Type = file_type::directory_file; else if (S_ISREG(Status.st_mode)) Type = file_type::regular_file; else if (S_ISBLK(Status.st_mode)) Type = file_type::block_file; else if (S_ISCHR(Status.st_mode)) Type = file_type::character_file; else if (S_ISFIFO(Status.st_mode)) Type = file_type::fifo_file; else if (S_ISSOCK(Status.st_mode)) Type = file_type::socket_file; else if (S_ISLNK(Status.st_mode)) Type = file_type::symlink_file; perms Perms = static_cast(Status.st_mode) & all_perms; Result = file_status(Type, Perms, Status.st_dev, Status.st_nlink, Status.st_ino, Status.st_atime, Status.st_mtime, Status.st_uid, Status.st_gid, Status.st_size); return std::error_code(); } std::error_code status(const Twine &Path, file_status &Result, bool Follow) { SmallString<128> PathStorage; StringRef P = Path.toNullTerminatedStringRef(PathStorage); struct stat Status; int StatRet = (Follow ? ::stat : ::lstat)(P.begin(), &Status); return fillStatus(StatRet, Status, Result); } std::error_code status(int FD, file_status &Result) { struct stat Status; int StatRet = ::fstat(FD, &Status); return fillStatus(StatRet, Status, Result); } std::error_code setPermissions(const Twine &Path, perms Permissions) { SmallString<128> PathStorage; StringRef P = Path.toNullTerminatedStringRef(PathStorage); if (::chmod(P.begin(), Permissions)) return std::error_code(errno, std::generic_category()); return std::error_code(); } std::error_code setLastModificationAndAccessTime(int FD, TimePoint<> Time) { #if defined(HAVE_FUTIMENS) timespec Times[2]; Times[0] = Times[1] = sys::toTimeSpec(Time); if (::futimens(FD, Times)) return std::error_code(errno, std::generic_category()); return std::error_code(); #elif defined(HAVE_FUTIMES) timeval Times[2]; Times[0] = Times[1] = sys::toTimeVal( std::chrono::time_point_cast(Time)); if (::futimes(FD, Times)) return std::error_code(errno, std::generic_category()); return std::error_code(); #else #warning Missing futimes() and futimens() return make_error_code(errc::function_not_supported); #endif } std::error_code mapped_file_region::init(int FD, uint64_t Offset, mapmode Mode) { assert(Size != 0); int flags = (Mode == readwrite) ? MAP_SHARED : MAP_PRIVATE; int prot = (Mode == readonly) ? PROT_READ : (PROT_READ | PROT_WRITE); #if defined(__APPLE__) //---------------------------------------------------------------------- // Newer versions of MacOSX have a flag that will allow us to read from // binaries whose code signature is invalid without crashing by using // the MAP_RESILIENT_CODESIGN flag. 
Also if a file from removable media // is mapped we can avoid crashing and return zeroes to any pages we try // to read if the media becomes unavailable by using the // MAP_RESILIENT_MEDIA flag. These flags are only usable when mapping // with PROT_READ, so take care not to specify them otherwise. //---------------------------------------------------------------------- if (Mode == readonly) { #if defined(MAP_RESILIENT_CODESIGN) flags |= MAP_RESILIENT_CODESIGN; #endif #if defined(MAP_RESILIENT_MEDIA) flags |= MAP_RESILIENT_MEDIA; #endif } #endif // #if defined (__APPLE__) Mapping = ::mmap(nullptr, Size, prot, flags, FD, Offset); if (Mapping == MAP_FAILED) return std::error_code(errno, std::generic_category()); return std::error_code(); } mapped_file_region::mapped_file_region(int fd, mapmode mode, size_t length, uint64_t offset, std::error_code &ec) : Size(length), Mapping(), Mode(mode) { (void)Mode; ec = init(fd, offset, mode); if (ec) Mapping = nullptr; } mapped_file_region::~mapped_file_region() { if (Mapping) ::munmap(Mapping, Size); } size_t mapped_file_region::size() const { assert(Mapping && "Mapping failed but used anyway!"); return Size; } char *mapped_file_region::data() const { assert(Mapping && "Mapping failed but used anyway!"); return reinterpret_cast(Mapping); } const char *mapped_file_region::const_data() const { assert(Mapping && "Mapping failed but used anyway!"); return reinterpret_cast(Mapping); } int mapped_file_region::alignment() { return Process::getPageSize(); } std::error_code detail::directory_iterator_construct(detail::DirIterState &it, StringRef path, bool follow_symlinks) { SmallString<128> path_null(path); DIR *directory = ::opendir(path_null.c_str()); if (!directory) return std::error_code(errno, std::generic_category()); it.IterationHandle = reinterpret_cast(directory); // Add something for replace_filename to replace. path::append(path_null, "."); it.CurrentEntry = directory_entry(path_null.str(), follow_symlinks); return directory_iterator_increment(it); } std::error_code detail::directory_iterator_destruct(detail::DirIterState &it) { if (it.IterationHandle) ::closedir(reinterpret_cast(it.IterationHandle)); it.IterationHandle = 0; it.CurrentEntry = directory_entry(); return std::error_code(); } std::error_code detail::directory_iterator_increment(detail::DirIterState &it) { errno = 0; dirent *cur_dir = ::readdir(reinterpret_cast(it.IterationHandle)); if (cur_dir == nullptr && errno != 0) { return std::error_code(errno, std::generic_category()); } else if (cur_dir != nullptr) { StringRef name(cur_dir->d_name); if ((name.size() == 1 && name[0] == '.') || (name.size() == 2 && name[0] == '.' 
               && name[1] == '.'))
      return directory_iterator_increment(it);
    it.CurrentEntry.replace_filename(name);
  } else
    return directory_iterator_destruct(it);

  return std::error_code();
}

ErrorOr<basic_file_status> directory_entry::status() const {
  file_status s;
  if (auto EC = fs::status(Path, s, FollowSymlinks))
    return EC;
  return s;
}

#if !defined(F_GETPATH)
static bool hasProcSelfFD() {
  // If we have a /proc filesystem mounted, we can quickly establish the
  // real name of the file with readlink
  static const bool Result = (::access("/proc/self/fd", R_OK) == 0);
  return Result;
}
#endif

static int nativeOpenFlags(CreationDisposition Disp, OpenFlags Flags,
                           FileAccess Access) {
  int Result = 0;
  if (Access == FA_Read)
    Result |= O_RDONLY;
  else if (Access == FA_Write)
    Result |= O_WRONLY;
  else if (Access == (FA_Read | FA_Write))
    Result |= O_RDWR;

  // This is for compatibility with old code that assumed F_Append implied
  // would open an existing file. See Windows/Path.inc for a longer comment.
  if (Flags & F_Append)
    Disp = CD_OpenAlways;

  if (Disp == CD_CreateNew) {
    Result |= O_CREAT; // Create if it doesn't exist.
    Result |= O_EXCL;  // Fail if it does.
  } else if (Disp == CD_CreateAlways) {
    Result |= O_CREAT; // Create if it doesn't exist.
    Result |= O_TRUNC; // Truncate if it does.
  } else if (Disp == CD_OpenAlways) {
    Result |= O_CREAT; // Create if it doesn't exist.
  } else if (Disp == CD_OpenExisting) {
    // Nothing special, just don't add O_CREAT and we get these semantics.
  }

  if (Flags & F_Append)
    Result |= O_APPEND;

#ifdef O_CLOEXEC
  if (!(Flags & OF_ChildInherit))
    Result |= O_CLOEXEC;
#endif

  return Result;
}

std::error_code openFile(const Twine &Name, int &ResultFD,
                         CreationDisposition Disp, FileAccess Access,
                         OpenFlags Flags, unsigned Mode) {
  int OpenFlags = nativeOpenFlags(Disp, Flags, Access);

  SmallString<128> Storage;
  StringRef P = Name.toNullTerminatedStringRef(Storage);
-  if ((ResultFD = sys::RetryAfterSignal(-1, ::open, P.begin(), OpenFlags, Mode)) <
-      0)
+  // Call ::open in a lambda to avoid overload resolution in RetryAfterSignal
+  // when open is overloaded, such as in Bionic.
+  auto Open = [&]() { return ::open(P.begin(), OpenFlags, Mode); };
+  if ((ResultFD = sys::RetryAfterSignal(-1, Open)) < 0)
    return std::error_code(errno, std::generic_category());
#ifndef O_CLOEXEC
  if (!(Flags & OF_ChildInherit)) {
    int r = fcntl(ResultFD, F_SETFD, FD_CLOEXEC);
    (void)r;
    assert(r == 0 && "fcntl(F_SETFD, FD_CLOEXEC) failed");
  }
#endif
  return std::error_code();
}

Expected<file_t> openNativeFile(const Twine &Name, CreationDisposition Disp,
                                FileAccess Access, OpenFlags Flags,
                                unsigned Mode) {
  int FD;
  std::error_code EC = openFile(Name, FD, Disp, Access, Flags, Mode);
  if (EC)
    return errorCodeToError(EC);
  return FD;
}

std::error_code openFileForRead(const Twine &Name, int &ResultFD,
                                OpenFlags Flags,
                                SmallVectorImpl<char> *RealPath) {
  std::error_code EC =
      openFile(Name, ResultFD, CD_OpenExisting, FA_Read, Flags, 0666);
  if (EC)
    return EC;

  // Attempt to get the real name of the file, if the user asked
  if(!RealPath)
    return std::error_code();
  RealPath->clear();
#if defined(F_GETPATH)
  // When F_GETPATH is availble, it is the quickest way to get
  // the real path name.
char Buffer[MAXPATHLEN]; if (::fcntl(ResultFD, F_GETPATH, Buffer) != -1) RealPath->append(Buffer, Buffer + strlen(Buffer)); #else char Buffer[PATH_MAX]; if (hasProcSelfFD()) { char ProcPath[64]; snprintf(ProcPath, sizeof(ProcPath), "/proc/self/fd/%d", ResultFD); ssize_t CharCount = ::readlink(ProcPath, Buffer, sizeof(Buffer)); if (CharCount > 0) RealPath->append(Buffer, Buffer + CharCount); } else { SmallString<128> Storage; StringRef P = Name.toNullTerminatedStringRef(Storage); // Use ::realpath to get the real path name if (::realpath(P.begin(), Buffer) != nullptr) RealPath->append(Buffer, Buffer + strlen(Buffer)); } #endif return std::error_code(); } Expected openNativeFileForRead(const Twine &Name, OpenFlags Flags, SmallVectorImpl *RealPath) { file_t ResultFD; std::error_code EC = openFileForRead(Name, ResultFD, Flags, RealPath); if (EC) return errorCodeToError(EC); return ResultFD; } void closeFile(file_t &F) { ::close(F); F = kInvalidFile; } template static std::error_code remove_directories_impl(const T &Entry, bool IgnoreErrors) { std::error_code EC; directory_iterator Begin(Entry, EC, false); directory_iterator End; while (Begin != End) { auto &Item = *Begin; ErrorOr st = Item.status(); if (!st && !IgnoreErrors) return st.getError(); if (is_directory(*st)) { EC = remove_directories_impl(Item, IgnoreErrors); if (EC && !IgnoreErrors) return EC; } EC = fs::remove(Item.path(), true); if (EC && !IgnoreErrors) return EC; Begin.increment(EC); if (EC && !IgnoreErrors) return EC; } return std::error_code(); } std::error_code remove_directories(const Twine &path, bool IgnoreErrors) { auto EC = remove_directories_impl(path, IgnoreErrors); if (EC && !IgnoreErrors) return EC; EC = fs::remove(path, true); if (EC && !IgnoreErrors) return EC; return std::error_code(); } std::error_code real_path(const Twine &path, SmallVectorImpl &dest, bool expand_tilde) { dest.clear(); if (path.isTriviallyEmpty()) return std::error_code(); if (expand_tilde) { SmallString<128> Storage; path.toVector(Storage); expandTildeExpr(Storage); return real_path(Storage, dest, false); } SmallString<128> Storage; StringRef P = path.toNullTerminatedStringRef(Storage); char Buffer[PATH_MAX]; if (::realpath(P.begin(), Buffer) == nullptr) return std::error_code(errno, std::generic_category()); dest.append(Buffer, Buffer + strlen(Buffer)); return std::error_code(); } } // end namespace fs namespace path { bool home_directory(SmallVectorImpl &result) { char *RequestedDir = getenv("HOME"); if (!RequestedDir) { struct passwd *pw = getpwuid(getuid()); if (pw && pw->pw_dir) RequestedDir = pw->pw_dir; } if (!RequestedDir) return false; result.clear(); result.append(RequestedDir, RequestedDir + strlen(RequestedDir)); return true; } static bool getDarwinConfDir(bool TempDir, SmallVectorImpl &Result) { #if defined(_CS_DARWIN_USER_TEMP_DIR) && defined(_CS_DARWIN_USER_CACHE_DIR) // On Darwin, use DARWIN_USER_TEMP_DIR or DARWIN_USER_CACHE_DIR. // macros defined in on darwin >= 9 int ConfName = TempDir ? 
_CS_DARWIN_USER_TEMP_DIR : _CS_DARWIN_USER_CACHE_DIR; size_t ConfLen = confstr(ConfName, nullptr, 0); if (ConfLen > 0) { do { Result.resize(ConfLen); ConfLen = confstr(ConfName, Result.data(), Result.size()); } while (ConfLen > 0 && ConfLen != Result.size()); if (ConfLen > 0) { assert(Result.back() == 0); Result.pop_back(); return true; } Result.clear(); } #endif return false; } static bool getUserCacheDir(SmallVectorImpl &Result) { // First try using XDG_CACHE_HOME env variable, // as specified in XDG Base Directory Specification at // http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html if (const char *XdgCacheDir = std::getenv("XDG_CACHE_HOME")) { Result.clear(); Result.append(XdgCacheDir, XdgCacheDir + strlen(XdgCacheDir)); return true; } // Try Darwin configuration query if (getDarwinConfDir(false, Result)) return true; // Use "$HOME/.cache" if $HOME is available if (home_directory(Result)) { append(Result, ".cache"); return true; } return false; } static const char *getEnvTempDir() { // Check whether the temporary directory is specified by an environment // variable. const char *EnvironmentVariables[] = {"TMPDIR", "TMP", "TEMP", "TEMPDIR"}; for (const char *Env : EnvironmentVariables) { if (const char *Dir = std::getenv(Env)) return Dir; } return nullptr; } static const char *getDefaultTempDir(bool ErasedOnReboot) { #ifdef P_tmpdir if ((bool)P_tmpdir) return P_tmpdir; #endif if (ErasedOnReboot) return "/tmp"; return "/var/tmp"; } void system_temp_directory(bool ErasedOnReboot, SmallVectorImpl &Result) { Result.clear(); if (ErasedOnReboot) { // There is no env variable for the cache directory. if (const char *RequestedDir = getEnvTempDir()) { Result.append(RequestedDir, RequestedDir + strlen(RequestedDir)); return; } } if (getDarwinConfDir(ErasedOnReboot, Result)) return; const char *RequestedDir = getDefaultTempDir(ErasedOnReboot); Result.append(RequestedDir, RequestedDir + strlen(RequestedDir)); } } // end namespace path } // end namespace sys } // end namespace llvm Index: vendor/llvm/dist-release_70/lib/Support/Unix/Process.inc =================================================================== --- vendor/llvm/dist-release_70/lib/Support/Unix/Process.inc (revision 338574) +++ vendor/llvm/dist-release_70/lib/Support/Unix/Process.inc (revision 338575) @@ -1,457 +1,460 @@ //===- Unix/Process.cpp - Unix Process Implementation --------- -*- C++ -*-===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file provides the generic Unix implementation of the Process class. // //===----------------------------------------------------------------------===// #include "Unix.h" #include "llvm/ADT/Hashing.h" #include "llvm/ADT/StringRef.h" #include "llvm/Config/config.h" #include "llvm/Support/ManagedStatic.h" #include "llvm/Support/Mutex.h" #include "llvm/Support/MutexGuard.h" #if HAVE_FCNTL_H #include #endif #ifdef HAVE_SYS_TIME_H #include #endif #ifdef HAVE_SYS_RESOURCE_H #include #endif #ifdef HAVE_SYS_STAT_H #include #endif #if HAVE_SIGNAL_H #include #endif // DragonFlyBSD, and OpenBSD have deprecated for // instead. Unix.h includes this for us already. 
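
// Minimal sketch (plain C++/POSIX, not the llvm::sys::path API) of the lookup
// order implemented by getEnvTempDir()/getDefaultTempDir() above: the usual
// environment variables are honoured first, and only then does the code fall
// back to the conventional system locations, which differ depending on
// whether the directory is expected to be erased on reboot. The Darwin
// confstr() query and P_tmpdir handling are omitted here for brevity.
#include <cstdlib>
#include <iostream>

static const char *envTempDir() {
  for (const char *Var : {"TMPDIR", "TMP", "TEMP", "TEMPDIR"})
    if (const char *Dir = std::getenv(Var))
      return Dir;
  return nullptr;
}

static const char *systemTempDirectory(bool ErasedOnReboot) {
  if (ErasedOnReboot)
    if (const char *Dir = envTempDir())
      return Dir;
  return ErasedOnReboot ? "/tmp" : "/var/tmp";
}

int main() {
  std::cout << "scratch: " << systemTempDirectory(true) << "\n";
  std::cout << "cache:   " << systemTempDirectory(false) << "\n";
  return 0;
}
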
#if defined(HAVE_MALLOC_H) && !defined(__DragonFly__) && \ !defined(__OpenBSD__) #include #endif #if defined(HAVE_MALLCTL) #include #endif #ifdef HAVE_MALLOC_MALLOC_H #include #endif #ifdef HAVE_SYS_IOCTL_H # include #endif #ifdef HAVE_TERMIOS_H # include #endif //===----------------------------------------------------------------------===// //=== WARNING: Implementation here must contain only generic UNIX code that //=== is guaranteed to work on *all* UNIX variants. //===----------------------------------------------------------------------===// using namespace llvm; using namespace sys; static std::pair getRUsageTimes() { #if defined(HAVE_GETRUSAGE) struct rusage RU; ::getrusage(RUSAGE_SELF, &RU); return { toDuration(RU.ru_utime), toDuration(RU.ru_stime) }; #else #warning Cannot get usage times on this platform return { std::chrono::microseconds::zero(), std::chrono::microseconds::zero() }; #endif } // On Cygwin, getpagesize() returns 64k(AllocationGranularity) and // offset in mmap(3) should be aligned to the AllocationGranularity. unsigned Process::getPageSize() { #if defined(HAVE_GETPAGESIZE) static const int page_size = ::getpagesize(); #elif defined(HAVE_SYSCONF) static long page_size = ::sysconf(_SC_PAGE_SIZE); #else #error Cannot get the page size on this machine #endif return static_cast(page_size); } size_t Process::GetMallocUsage() { #if defined(HAVE_MALLINFO) struct mallinfo mi; mi = ::mallinfo(); return mi.uordblks; #elif defined(HAVE_MALLOC_ZONE_STATISTICS) && defined(HAVE_MALLOC_MALLOC_H) malloc_statistics_t Stats; malloc_zone_statistics(malloc_default_zone(), &Stats); return Stats.size_in_use; // darwin #elif defined(HAVE_MALLCTL) size_t alloc, sz; sz = sizeof(size_t); if (mallctl("stats.allocated", &alloc, &sz, NULL, 0) == 0) return alloc; return 0; #elif defined(HAVE_SBRK) // Note this is only an approximation and more closely resembles // the value returned by mallinfo in the arena field. static char *StartOfMemory = reinterpret_cast(::sbrk(0)); char *EndOfMemory = (char*)sbrk(0); if (EndOfMemory != ((char*)-1) && StartOfMemory != ((char*)-1)) return EndOfMemory - StartOfMemory; return 0; #else #warning Cannot get malloc info on this platform return 0; #endif } void Process::GetTimeUsage(TimePoint<> &elapsed, std::chrono::nanoseconds &user_time, std::chrono::nanoseconds &sys_time) { elapsed = std::chrono::system_clock::now(); std::tie(user_time, sys_time) = getRUsageTimes(); } #if defined(HAVE_MACH_MACH_H) && !defined(__GNU__) #include #endif // Some LLVM programs such as bugpoint produce core files as a normal part of // their operation. To prevent the disk from filling up, this function // does what's necessary to prevent their generation. void Process::PreventCoreFiles() { #if HAVE_SETRLIMIT struct rlimit rlim; rlim.rlim_cur = rlim.rlim_max = 0; setrlimit(RLIMIT_CORE, &rlim); #endif #if defined(HAVE_MACH_MACH_H) && !defined(__GNU__) // Disable crash reporting on Mac OS X 10.0-10.4 // get information about the original set of exception ports for the task mach_msg_type_number_t Count = 0; exception_mask_t OriginalMasks[EXC_TYPES_COUNT]; exception_port_t OriginalPorts[EXC_TYPES_COUNT]; exception_behavior_t OriginalBehaviors[EXC_TYPES_COUNT]; thread_state_flavor_t OriginalFlavors[EXC_TYPES_COUNT]; kern_return_t err = task_get_exception_ports(mach_task_self(), EXC_MASK_ALL, OriginalMasks, &Count, OriginalPorts, OriginalBehaviors, OriginalFlavors); if (err == KERN_SUCCESS) { // replace each with MACH_PORT_NULL. 
for (unsigned i = 0; i != Count; ++i) task_set_exception_ports(mach_task_self(), OriginalMasks[i], MACH_PORT_NULL, OriginalBehaviors[i], OriginalFlavors[i]); } // Disable crash reporting on Mac OS X 10.5 signal(SIGABRT, _exit); signal(SIGILL, _exit); signal(SIGFPE, _exit); signal(SIGSEGV, _exit); signal(SIGBUS, _exit); #endif coreFilesPrevented = true; } Optional Process::GetEnv(StringRef Name) { std::string NameStr = Name.str(); const char *Val = ::getenv(NameStr.c_str()); if (!Val) return None; return std::string(Val); } namespace { class FDCloser { public: FDCloser(int &FD) : FD(FD), KeepOpen(false) {} void keepOpen() { KeepOpen = true; } ~FDCloser() { if (!KeepOpen && FD >= 0) ::close(FD); } private: FDCloser(const FDCloser &) = delete; void operator=(const FDCloser &) = delete; int &FD; bool KeepOpen; }; } std::error_code Process::FixupStandardFileDescriptors() { int NullFD = -1; FDCloser FDC(NullFD); const int StandardFDs[] = {STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO}; for (int StandardFD : StandardFDs) { struct stat st; errno = 0; if (RetryAfterSignal(-1, ::fstat, StandardFD, &st) < 0) { assert(errno && "expected errno to be set if fstat failed!"); // fstat should return EBADF if the file descriptor is closed. if (errno != EBADF) return std::error_code(errno, std::generic_category()); } // if fstat succeeds, move on to the next FD. if (!errno) continue; assert(errno == EBADF && "expected errno to have EBADF at this point!"); if (NullFD < 0) { - if ((NullFD = RetryAfterSignal(-1, ::open, "/dev/null", O_RDWR)) < 0) + // Call ::open in a lambda to avoid overload resolution in + // RetryAfterSignal when open is overloaded, such as in Bionic. + auto Open = [&]() { return ::open("/dev/null", O_RDWR); }; + if ((NullFD = RetryAfterSignal(-1, Open)) < 0) return std::error_code(errno, std::generic_category()); } if (NullFD == StandardFD) FDC.keepOpen(); else if (dup2(NullFD, StandardFD) < 0) return std::error_code(errno, std::generic_category()); } return std::error_code(); } std::error_code Process::SafelyCloseFileDescriptor(int FD) { // Create a signal set filled with *all* signals. sigset_t FullSet; if (sigfillset(&FullSet) < 0) return std::error_code(errno, std::generic_category()); // Atomically swap our current signal mask with a full mask. sigset_t SavedSet; #if LLVM_ENABLE_THREADS if (int EC = pthread_sigmask(SIG_SETMASK, &FullSet, &SavedSet)) return std::error_code(EC, std::generic_category()); #else if (sigprocmask(SIG_SETMASK, &FullSet, &SavedSet) < 0) return std::error_code(errno, std::generic_category()); #endif // Attempt to close the file descriptor. // We need to save the error, if one occurs, because our subsequent call to // pthread_sigmask might tamper with errno. int ErrnoFromClose = 0; if (::close(FD) < 0) ErrnoFromClose = errno; // Restore the signal mask back to what we saved earlier. int EC = 0; #if LLVM_ENABLE_THREADS EC = pthread_sigmask(SIG_SETMASK, &SavedSet, nullptr); #else if (sigprocmask(SIG_SETMASK, &SavedSet, nullptr) < 0) EC = errno; #endif // The error code from close takes precedence over the one from // pthread_sigmask. 
if (ErrnoFromClose) return std::error_code(ErrnoFromClose, std::generic_category()); return std::error_code(EC, std::generic_category()); } bool Process::StandardInIsUserInput() { return FileDescriptorIsDisplayed(STDIN_FILENO); } bool Process::StandardOutIsDisplayed() { return FileDescriptorIsDisplayed(STDOUT_FILENO); } bool Process::StandardErrIsDisplayed() { return FileDescriptorIsDisplayed(STDERR_FILENO); } bool Process::FileDescriptorIsDisplayed(int fd) { #if HAVE_ISATTY return isatty(fd); #else // If we don't have isatty, just return false. return false; #endif } static unsigned getColumns(int FileID) { // If COLUMNS is defined in the environment, wrap to that many columns. if (const char *ColumnsStr = std::getenv("COLUMNS")) { int Columns = std::atoi(ColumnsStr); if (Columns > 0) return Columns; } unsigned Columns = 0; #if defined(HAVE_SYS_IOCTL_H) && defined(HAVE_TERMIOS_H) // Try to determine the width of the terminal. struct winsize ws; if (ioctl(FileID, TIOCGWINSZ, &ws) == 0) Columns = ws.ws_col; #endif return Columns; } unsigned Process::StandardOutColumns() { if (!StandardOutIsDisplayed()) return 0; return getColumns(1); } unsigned Process::StandardErrColumns() { if (!StandardErrIsDisplayed()) return 0; return getColumns(2); } #ifdef HAVE_TERMINFO // We manually declare these extern functions because finding the correct // headers from various terminfo, curses, or other sources is harder than // writing their specs down. extern "C" int setupterm(char *term, int filedes, int *errret); extern "C" struct term *set_curterm(struct term *termp); extern "C" int del_curterm(struct term *termp); extern "C" int tigetnum(char *capname); #endif #ifdef HAVE_TERMINFO static ManagedStatic TermColorMutex; #endif static bool terminalHasColors(int fd) { #ifdef HAVE_TERMINFO // First, acquire a global lock because these C routines are thread hostile. MutexGuard G(*TermColorMutex); int errret = 0; if (setupterm(nullptr, fd, &errret) != 0) // Regardless of why, if we can't get terminfo, we shouldn't try to print // colors. return false; // Test whether the terminal as set up supports color output. How to do this // isn't entirely obvious. We can use the curses routine 'has_colors' but it // would be nice to avoid a dependency on curses proper when we can make do // with a minimal terminfo parsing library. Also, we don't really care whether // the terminal supports the curses-specific color changing routines, merely // if it will interpret ANSI color escape codes in a reasonable way. Thus, the // strategy here is just to query the baseline colors capability and if it // supports colors at all to assume it will translate the escape codes into // whatever range of colors it does support. We can add more detailed tests // here if users report them as necessary. // // The 'tigetnum' routine returns -2 or -1 on errors, and might return 0 if // the terminfo says that no colors are supported. bool HasColors = tigetnum(const_cast("colors")) > 0; // Now extract the structure allocated by setupterm and free its memory // through a really silly dance. struct term *termp = set_curterm(nullptr); (void)del_curterm(termp); // Drop any errors here. // Return true if we found a color capabilities for the current terminal. if (HasColors) return true; #else // When the terminfo database is not available, check if the current terminal // is one of terminals that are known to support ANSI color escape codes. 
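  // The switch below accepts a few exact names ("ansi", "cygwin", "linux"),
  // well-known prefixes such as "xterm", "screen", "vt100" and "rxvt", and
  // any $TERM value ending in "color"; everything else is treated as
  // colorless.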
if (const char *TermStr = std::getenv("TERM")) { return StringSwitch(TermStr) .Case("ansi", true) .Case("cygwin", true) .Case("linux", true) .StartsWith("screen", true) .StartsWith("xterm", true) .StartsWith("vt100", true) .StartsWith("rxvt", true) .EndsWith("color", true) .Default(false); } #endif // Otherwise, be conservative. return false; } bool Process::FileDescriptorHasColors(int fd) { // A file descriptor has colors if it is displayed and the terminal has // colors. return FileDescriptorIsDisplayed(fd) && terminalHasColors(fd); } bool Process::StandardOutHasColors() { return FileDescriptorHasColors(STDOUT_FILENO); } bool Process::StandardErrHasColors() { return FileDescriptorHasColors(STDERR_FILENO); } void Process::UseANSIEscapeCodes(bool /*enable*/) { // No effect. } bool Process::ColorNeedsFlush() { // No, we use ANSI escape sequences. return false; } const char *Process::OutputColor(char code, bool bold, bool bg) { return colorcodes[bg?1:0][bold?1:0][code&7]; } const char *Process::OutputBold(bool bg) { return "\033[1m"; } const char *Process::OutputReverse() { return "\033[7m"; } const char *Process::ResetColor() { return "\033[0m"; } #if !HAVE_DECL_ARC4RANDOM static unsigned GetRandomNumberSeed() { // Attempt to get the initial seed from /dev/urandom, if possible. int urandomFD = open("/dev/urandom", O_RDONLY); if (urandomFD != -1) { unsigned seed; // Don't use a buffered read to avoid reading more data // from /dev/urandom than we need. int count = read(urandomFD, (void *)&seed, sizeof(seed)); close(urandomFD); // Return the seed if the read was successful. if (count == sizeof(seed)) return seed; } // Otherwise, swizzle the current time and the process ID to form a reasonable // seed. const auto Now = std::chrono::high_resolution_clock::now(); return hash_combine(Now.time_since_epoch().count(), ::getpid()); } #endif unsigned llvm::sys::Process::GetRandomNumber() { #if HAVE_DECL_ARC4RANDOM return arc4random(); #else static int x = (static_cast(::srand(GetRandomNumberSeed())), 0); (void)x; return ::rand(); #endif } Index: vendor/llvm/dist-release_70/lib/Target/AMDGPU/AMDGPU.h =================================================================== --- vendor/llvm/dist-release_70/lib/Target/AMDGPU/AMDGPU.h (revision 338574) +++ vendor/llvm/dist-release_70/lib/Target/AMDGPU/AMDGPU.h (revision 338575) @@ -1,281 +1,281 @@ //===-- AMDGPU.h - MachineFunction passes hw codegen --------------*- C++ -*-=// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. 
// /// \file //===----------------------------------------------------------------------===// #ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPU_H #define LLVM_LIB_TARGET_AMDGPU_AMDGPU_H #include "llvm/Target/TargetMachine.h" namespace llvm { class AMDGPUTargetMachine; class FunctionPass; class GCNTargetMachine; class ModulePass; class Pass; class Target; class TargetMachine; class TargetOptions; class PassRegistry; class Module; // R600 Passes FunctionPass *createR600VectorRegMerger(); FunctionPass *createR600ExpandSpecialInstrsPass(); FunctionPass *createR600EmitClauseMarkers(); FunctionPass *createR600ClauseMergePass(); FunctionPass *createR600Packetizer(); FunctionPass *createR600ControlFlowFinalizer(); FunctionPass *createAMDGPUCFGStructurizerPass(); FunctionPass *createR600ISelDag(TargetMachine *TM, CodeGenOpt::Level OptLevel); // SI Passes FunctionPass *createSIAnnotateControlFlowPass(); FunctionPass *createSIFoldOperandsPass(); FunctionPass *createSIPeepholeSDWAPass(); FunctionPass *createSILowerI1CopiesPass(); FunctionPass *createSIShrinkInstructionsPass(); FunctionPass *createSILoadStoreOptimizerPass(); FunctionPass *createSIWholeQuadModePass(); FunctionPass *createSIFixControlFlowLiveIntervalsPass(); FunctionPass *createSIOptimizeExecMaskingPreRAPass(); FunctionPass *createSIFixSGPRCopiesPass(); FunctionPass *createSIMemoryLegalizerPass(); FunctionPass *createSIDebuggerInsertNopsPass(); FunctionPass *createSIInsertWaitcntsPass(); FunctionPass *createSIFixWWMLivenessPass(); FunctionPass *createSIFormMemoryClausesPass(); FunctionPass *createAMDGPUSimplifyLibCallsPass(const TargetOptions &); FunctionPass *createAMDGPUUseNativeCallsPass(); FunctionPass *createAMDGPUCodeGenPreparePass(); FunctionPass *createAMDGPUMachineCFGStructurizerPass(); FunctionPass *createAMDGPURewriteOutArgumentsPass(); void initializeAMDGPUDAGToDAGISelPass(PassRegistry&); void initializeAMDGPUMachineCFGStructurizerPass(PassRegistry&); extern char &AMDGPUMachineCFGStructurizerID; void initializeAMDGPUAlwaysInlinePass(PassRegistry&); Pass *createAMDGPUAnnotateKernelFeaturesPass(); void initializeAMDGPUAnnotateKernelFeaturesPass(PassRegistry &); extern char &AMDGPUAnnotateKernelFeaturesID; ModulePass *createAMDGPULowerIntrinsicsPass(); void initializeAMDGPULowerIntrinsicsPass(PassRegistry &); extern char &AMDGPULowerIntrinsicsID; FunctionPass *createAMDGPULowerKernelArgumentsPass(); void initializeAMDGPULowerKernelArgumentsPass(PassRegistry &); extern char &AMDGPULowerKernelArgumentsID; ModulePass *createAMDGPULowerKernelAttributesPass(); void initializeAMDGPULowerKernelAttributesPass(PassRegistry &); extern char &AMDGPULowerKernelAttributesID; void initializeAMDGPURewriteOutArgumentsPass(PassRegistry &); extern char &AMDGPURewriteOutArgumentsID; void initializeR600ClauseMergePassPass(PassRegistry &); extern char &R600ClauseMergePassID; void initializeR600ControlFlowFinalizerPass(PassRegistry &); extern char &R600ControlFlowFinalizerID; void initializeR600ExpandSpecialInstrsPassPass(PassRegistry &); extern char &R600ExpandSpecialInstrsPassID; void initializeR600VectorRegMergerPass(PassRegistry &); extern char &R600VectorRegMergerID; void initializeR600PacketizerPass(PassRegistry &); extern char &R600PacketizerID; void initializeSIFoldOperandsPass(PassRegistry &); extern char &SIFoldOperandsID; void initializeSIPeepholeSDWAPass(PassRegistry &); extern char &SIPeepholeSDWAID; void initializeSIShrinkInstructionsPass(PassRegistry&); extern char &SIShrinkInstructionsID; void initializeSIFixSGPRCopiesPass(PassRegistry &); extern 
char &SIFixSGPRCopiesID; void initializeSIFixVGPRCopiesPass(PassRegistry &); extern char &SIFixVGPRCopiesID; void initializeSILowerI1CopiesPass(PassRegistry &); extern char &SILowerI1CopiesID; void initializeSILoadStoreOptimizerPass(PassRegistry &); extern char &SILoadStoreOptimizerID; void initializeSIWholeQuadModePass(PassRegistry &); extern char &SIWholeQuadModeID; void initializeSILowerControlFlowPass(PassRegistry &); extern char &SILowerControlFlowID; void initializeSIInsertSkipsPass(PassRegistry &); extern char &SIInsertSkipsPassID; void initializeSIOptimizeExecMaskingPass(PassRegistry &); extern char &SIOptimizeExecMaskingID; void initializeSIFixWWMLivenessPass(PassRegistry &); extern char &SIFixWWMLivenessID; void initializeAMDGPUSimplifyLibCallsPass(PassRegistry &); extern char &AMDGPUSimplifyLibCallsID; void initializeAMDGPUUseNativeCallsPass(PassRegistry &); extern char &AMDGPUUseNativeCallsID; void initializeAMDGPUPerfHintAnalysisPass(PassRegistry &); extern char &AMDGPUPerfHintAnalysisID; // Passes common to R600 and SI FunctionPass *createAMDGPUPromoteAlloca(); void initializeAMDGPUPromoteAllocaPass(PassRegistry&); extern char &AMDGPUPromoteAllocaID; Pass *createAMDGPUStructurizeCFGPass(); FunctionPass *createAMDGPUISelDag( TargetMachine *TM = nullptr, CodeGenOpt::Level OptLevel = CodeGenOpt::Default); ModulePass *createAMDGPUAlwaysInlinePass(bool GlobalOpt = true); ModulePass *createR600OpenCLImageTypeLoweringPass(); FunctionPass *createAMDGPUAnnotateUniformValues(); ModulePass* createAMDGPUUnifyMetadataPass(); void initializeAMDGPUUnifyMetadataPass(PassRegistry&); extern char &AMDGPUUnifyMetadataID; void initializeSIOptimizeExecMaskingPreRAPass(PassRegistry&); extern char &SIOptimizeExecMaskingPreRAID; void initializeAMDGPUAnnotateUniformValuesPass(PassRegistry&); extern char &AMDGPUAnnotateUniformValuesPassID; void initializeAMDGPUCodeGenPreparePass(PassRegistry&); extern char &AMDGPUCodeGenPrepareID; void initializeSIAnnotateControlFlowPass(PassRegistry&); extern char &SIAnnotateControlFlowPassID; void initializeSIMemoryLegalizerPass(PassRegistry&); extern char &SIMemoryLegalizerID; void initializeSIDebuggerInsertNopsPass(PassRegistry&); extern char &SIDebuggerInsertNopsID; void initializeSIInsertWaitcntsPass(PassRegistry&); extern char &SIInsertWaitcntsID; void initializeSIFormMemoryClausesPass(PassRegistry&); extern char &SIFormMemoryClausesID; void initializeAMDGPUUnifyDivergentExitNodesPass(PassRegistry&); extern char &AMDGPUUnifyDivergentExitNodesID; ImmutablePass *createAMDGPUAAWrapperPass(); void initializeAMDGPUAAWrapperPassPass(PassRegistry&); void initializeAMDGPUArgumentUsageInfoPass(PassRegistry &); Pass *createAMDGPUFunctionInliningPass(); void initializeAMDGPUInlinerPass(PassRegistry&); ModulePass *createAMDGPUOpenCLEnqueuedBlockLoweringPass(); void initializeAMDGPUOpenCLEnqueuedBlockLoweringPass(PassRegistry &); extern char &AMDGPUOpenCLEnqueuedBlockLoweringID; Target &getTheAMDGPUTarget(); Target &getTheGCNTarget(); namespace AMDGPU { enum TargetIndex { TI_CONSTDATA_START, TI_SCRATCH_RSRC_DWORD0, TI_SCRATCH_RSRC_DWORD1, TI_SCRATCH_RSRC_DWORD2, TI_SCRATCH_RSRC_DWORD3 }; } } // End namespace llvm /// OpenCL uses address spaces to differentiate between /// various memory regions on the hardware. On the CPU /// all of the address spaces point to the same memory, /// however on the GPU, each address space points to /// a separate piece of memory that is unique from other /// memory locations. 
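/// The concrete numbering used for these address spaces is not fixed here;
/// it is selected per target triple by AMDGPU::getAMDGPUAS() below, and
/// consumers such as AMDGPUAliasAnalysis are written to handle either
/// layout.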
struct AMDGPUAS { // The following address space values depend on the triple environment. unsigned PRIVATE_ADDRESS; ///< Address space for private memory. unsigned FLAT_ADDRESS; ///< Address space for flat memory. unsigned REGION_ADDRESS; ///< Address space for region memory. enum : unsigned { // The maximum value for flat, generic, local, private, constant and region. - MAX_COMMON_ADDRESS = 5, + MAX_AMDGPU_ADDRESS = 6, GLOBAL_ADDRESS = 1, ///< Address space for global memory (RAT0, VTX0). CONSTANT_ADDRESS = 4, ///< Address space for constant memory (VTX2) LOCAL_ADDRESS = 3, ///< Address space for local memory. CONSTANT_ADDRESS_32BIT = 6, ///< Address space for 32-bit constant memory /// Address space for direct addressible parameter memory (CONST0) PARAM_D_ADDRESS = 6, /// Address space for indirect addressible parameter memory (VTX1) PARAM_I_ADDRESS = 7, // Do not re-order the CONSTANT_BUFFER_* enums. Several places depend on // this order to be able to dynamically index a constant buffer, for // example: // // ConstantBufferAS = CONSTANT_BUFFER_0 + CBIdx CONSTANT_BUFFER_0 = 8, CONSTANT_BUFFER_1 = 9, CONSTANT_BUFFER_2 = 10, CONSTANT_BUFFER_3 = 11, CONSTANT_BUFFER_4 = 12, CONSTANT_BUFFER_5 = 13, CONSTANT_BUFFER_6 = 14, CONSTANT_BUFFER_7 = 15, CONSTANT_BUFFER_8 = 16, CONSTANT_BUFFER_9 = 17, CONSTANT_BUFFER_10 = 18, CONSTANT_BUFFER_11 = 19, CONSTANT_BUFFER_12 = 20, CONSTANT_BUFFER_13 = 21, CONSTANT_BUFFER_14 = 22, CONSTANT_BUFFER_15 = 23, // Some places use this if the address space can't be determined. UNKNOWN_ADDRESS_SPACE = ~0u, }; }; namespace llvm { namespace AMDGPU { AMDGPUAS getAMDGPUAS(const Module &M); AMDGPUAS getAMDGPUAS(const TargetMachine &TM); AMDGPUAS getAMDGPUAS(Triple T); } // namespace AMDGPU } // namespace llvm #endif Index: vendor/llvm/dist-release_70/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp =================================================================== --- vendor/llvm/dist-release_70/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp (revision 338574) +++ vendor/llvm/dist-release_70/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp (revision 338575) @@ -1,160 +1,164 @@ //===- AMDGPUAliasAnalysis ------------------------------------------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// /// \file /// This is the AMGPU address space based alias analysis pass. //===----------------------------------------------------------------------===// #include "AMDGPUAliasAnalysis.h" #include "AMDGPU.h" #include "llvm/ADT/Triple.h" #include "llvm/Analysis/AliasAnalysis.h" #include "llvm/Analysis/MemoryLocation.h" #include "llvm/Analysis/ValueTracking.h" #include "llvm/IR/Argument.h" #include "llvm/IR/Attributes.h" #include "llvm/IR/CallingConv.h" #include "llvm/IR/Function.h" #include "llvm/IR/GlobalVariable.h" #include "llvm/IR/Type.h" #include "llvm/IR/Value.h" #include "llvm/Pass.h" #include "llvm/Support/Casting.h" #include "llvm/Support/ErrorHandling.h" #include using namespace llvm; #define DEBUG_TYPE "amdgpu-aa" // Register this pass... 
char AMDGPUAAWrapperPass::ID = 0; INITIALIZE_PASS(AMDGPUAAWrapperPass, "amdgpu-aa", "AMDGPU Address space based Alias Analysis", false, true) ImmutablePass *llvm::createAMDGPUAAWrapperPass() { return new AMDGPUAAWrapperPass(); } void AMDGPUAAWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const { AU.setPreservesAll(); } // Must match the table in getAliasResult. AMDGPUAAResult::ASAliasRulesTy::ASAliasRulesTy(AMDGPUAS AS_, Triple::ArchType Arch_) : Arch(Arch_), AS(AS_) { // These arrarys are indexed by address space value - // enum elements 0 ... to 5 - static const AliasResult ASAliasRulesPrivIsZero[6][6] = { - /* Private Global Constant Group Flat Region*/ - /* Private */ {MayAlias, NoAlias , NoAlias , NoAlias , MayAlias, NoAlias}, - /* Global */ {NoAlias , MayAlias, NoAlias , NoAlias , MayAlias, NoAlias}, - /* Constant */ {NoAlias , NoAlias , MayAlias, NoAlias , MayAlias, NoAlias}, - /* Group */ {NoAlias , NoAlias , NoAlias , MayAlias, MayAlias, NoAlias}, - /* Flat */ {MayAlias, MayAlias, MayAlias, MayAlias, MayAlias, MayAlias}, - /* Region */ {NoAlias , NoAlias , NoAlias , NoAlias , MayAlias, MayAlias} + // enum elements 0 ... to 6 + static const AliasResult ASAliasRulesPrivIsZero[7][7] = { + /* Private Global Constant Group Flat Region Constant 32-bit */ + /* Private */ {MayAlias, NoAlias , NoAlias , NoAlias , MayAlias, NoAlias , NoAlias}, + /* Global */ {NoAlias , MayAlias, MayAlias, NoAlias , MayAlias, NoAlias , MayAlias}, + /* Constant */ {NoAlias , MayAlias, MayAlias, NoAlias , MayAlias, NoAlias , MayAlias}, + /* Group */ {NoAlias , NoAlias , NoAlias , MayAlias, MayAlias, NoAlias , NoAlias}, + /* Flat */ {MayAlias, MayAlias, MayAlias, MayAlias, MayAlias, MayAlias, MayAlias}, + /* Region */ {NoAlias , NoAlias , NoAlias , NoAlias , MayAlias, MayAlias, NoAlias}, + /* Constant 32-bit */ {NoAlias , MayAlias, MayAlias, NoAlias , MayAlias, NoAlias , MayAlias} }; - static const AliasResult ASAliasRulesGenIsZero[6][6] = { - /* Flat Global Region Group Constant Private */ - /* Flat */ {MayAlias, MayAlias, MayAlias, MayAlias, MayAlias, MayAlias}, - /* Global */ {MayAlias, MayAlias, NoAlias , NoAlias , NoAlias , NoAlias}, - /* Constant */ {MayAlias, NoAlias , MayAlias, NoAlias , NoAlias, NoAlias}, - /* Group */ {MayAlias, NoAlias , NoAlias , MayAlias, NoAlias , NoAlias}, - /* Region */ {MayAlias, NoAlias , NoAlias , NoAlias, MayAlias, NoAlias}, - /* Private */ {MayAlias, NoAlias , NoAlias , NoAlias , NoAlias , MayAlias} + static const AliasResult ASAliasRulesGenIsZero[7][7] = { + /* Flat Global Region Group Constant Private Constant 32-bit */ + /* Flat */ {MayAlias, MayAlias, MayAlias, MayAlias, MayAlias, MayAlias, MayAlias}, + /* Global */ {MayAlias, MayAlias, NoAlias , NoAlias , MayAlias, NoAlias , MayAlias}, + /* Region */ {MayAlias, NoAlias , NoAlias , NoAlias, MayAlias, NoAlias , MayAlias}, + /* Group */ {MayAlias, NoAlias , NoAlias , MayAlias, NoAlias , NoAlias , NoAlias}, + /* Constant */ {MayAlias, MayAlias, MayAlias, NoAlias , NoAlias, NoAlias , MayAlias}, + /* Private */ {MayAlias, NoAlias , NoAlias , NoAlias , NoAlias , MayAlias, NoAlias}, + /* Constant 32-bit */ {MayAlias, MayAlias, MayAlias, NoAlias , MayAlias, NoAlias , NoAlias} }; - assert(AS.MAX_COMMON_ADDRESS <= 5); + static_assert(AMDGPUAS::MAX_AMDGPU_ADDRESS <= 6, "Addr space out of range"); if (AS.FLAT_ADDRESS == 0) { - assert(AS.GLOBAL_ADDRESS == 1 && - AS.REGION_ADDRESS == 2 && - AS.LOCAL_ADDRESS == 3 && - AS.CONSTANT_ADDRESS == 4 && - AS.PRIVATE_ADDRESS == 5); + assert(AS.GLOBAL_ADDRESS == 1 && + AS.REGION_ADDRESS == 2 
&& + AS.LOCAL_ADDRESS == 3 && + AS.CONSTANT_ADDRESS == 4 && + AS.PRIVATE_ADDRESS == 5 && + AS.CONSTANT_ADDRESS_32BIT == 6); ASAliasRules = &ASAliasRulesGenIsZero; } else { - assert(AS.PRIVATE_ADDRESS == 0 && - AS.GLOBAL_ADDRESS == 1 && - AS.CONSTANT_ADDRESS == 2 && - AS.LOCAL_ADDRESS == 3 && - AS.FLAT_ADDRESS == 4 && - AS.REGION_ADDRESS == 5); + assert(AS.PRIVATE_ADDRESS == 0 && + AS.GLOBAL_ADDRESS == 1 && + AS.CONSTANT_ADDRESS == 2 && + AS.LOCAL_ADDRESS == 3 && + AS.FLAT_ADDRESS == 4 && + AS.REGION_ADDRESS == 5 && + AS.CONSTANT_ADDRESS_32BIT == 6); ASAliasRules = &ASAliasRulesPrivIsZero; } } AliasResult AMDGPUAAResult::ASAliasRulesTy::getAliasResult(unsigned AS1, unsigned AS2) const { - if (AS1 > AS.MAX_COMMON_ADDRESS || AS2 > AS.MAX_COMMON_ADDRESS) { + if (AS1 > AS.MAX_AMDGPU_ADDRESS || AS2 > AS.MAX_AMDGPU_ADDRESS) { if (Arch == Triple::amdgcn) report_fatal_error("Pointer address space out of range"); return AS1 == AS2 ? MayAlias : NoAlias; } return (*ASAliasRules)[AS1][AS2]; } AliasResult AMDGPUAAResult::alias(const MemoryLocation &LocA, const MemoryLocation &LocB) { unsigned asA = LocA.Ptr->getType()->getPointerAddressSpace(); unsigned asB = LocB.Ptr->getType()->getPointerAddressSpace(); AliasResult Result = ASAliasRules.getAliasResult(asA, asB); if (Result == NoAlias) return Result; // Forward the query to the next alias analysis. return AAResultBase::alias(LocA, LocB); } bool AMDGPUAAResult::pointsToConstantMemory(const MemoryLocation &Loc, bool OrLocal) { const Value *Base = GetUnderlyingObject(Loc.Ptr, DL); if (Base->getType()->getPointerAddressSpace() == AS.CONSTANT_ADDRESS || Base->getType()->getPointerAddressSpace() == AS.CONSTANT_ADDRESS_32BIT) { return true; } if (const GlobalVariable *GV = dyn_cast(Base)) { if (GV->isConstant()) return true; } else if (const Argument *Arg = dyn_cast(Base)) { const Function *F = Arg->getParent(); // Only assume constant memory for arguments on kernels. switch (F->getCallingConv()) { default: return AAResultBase::pointsToConstantMemory(Loc, OrLocal); case CallingConv::AMDGPU_LS: case CallingConv::AMDGPU_HS: case CallingConv::AMDGPU_ES: case CallingConv::AMDGPU_GS: case CallingConv::AMDGPU_VS: case CallingConv::AMDGPU_PS: case CallingConv::AMDGPU_CS: case CallingConv::AMDGPU_KERNEL: case CallingConv::SPIR_KERNEL: break; } unsigned ArgNo = Arg->getArgNo(); /* On an argument, ReadOnly attribute indicates that the function does not write through this pointer argument, even though it may write to the memory that the pointer points to. On an argument, ReadNone attribute indicates that the function does not dereference that pointer argument, even though it may read or write the memory that the pointer points to if accessed through other pointers. */ if (F->hasParamAttribute(ArgNo, Attribute::NoAlias) && (F->hasParamAttribute(ArgNo, Attribute::ReadNone) || F->hasParamAttribute(ArgNo, Attribute::ReadOnly))) { return true; } } return AAResultBase::pointsToConstantMemory(Loc, OrLocal); } Index: vendor/llvm/dist-release_70/lib/Target/AMDGPU/AMDGPUAliasAnalysis.h =================================================================== --- vendor/llvm/dist-release_70/lib/Target/AMDGPU/AMDGPUAliasAnalysis.h (revision 338574) +++ vendor/llvm/dist-release_70/lib/Target/AMDGPU/AMDGPUAliasAnalysis.h (revision 338575) @@ -1,115 +1,115 @@ //===- AMDGPUAliasAnalysis --------------------------------------*- C++ -*-===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. 
See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// /// \file /// This is the AMGPU address space based alias analysis pass. //===----------------------------------------------------------------------===// #ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUALIASANALYSIS_H #define LLVM_LIB_TARGET_AMDGPU_AMDGPUALIASANALYSIS_H #include "AMDGPU.h" #include "llvm/ADT/Triple.h" #include "llvm/Analysis/AliasAnalysis.h" #include "llvm/IR/Function.h" #include "llvm/IR/Module.h" #include "llvm/Pass.h" #include #include namespace llvm { class DataLayout; class MDNode; class MemoryLocation; /// A simple AA result that uses TBAA metadata to answer queries. class AMDGPUAAResult : public AAResultBase { friend AAResultBase; const DataLayout &DL; AMDGPUAS AS; public: explicit AMDGPUAAResult(const DataLayout &DL, Triple T) : AAResultBase(), DL(DL), AS(AMDGPU::getAMDGPUAS(T)), ASAliasRules(AS, T.getArch()) {} AMDGPUAAResult(AMDGPUAAResult &&Arg) : AAResultBase(std::move(Arg)), DL(Arg.DL), AS(Arg.AS), ASAliasRules(Arg.ASAliasRules){} /// Handle invalidation events from the new pass manager. /// /// By definition, this result is stateless and so remains valid. bool invalidate(Function &, const PreservedAnalyses &) { return false; } AliasResult alias(const MemoryLocation &LocA, const MemoryLocation &LocB); bool pointsToConstantMemory(const MemoryLocation &Loc, bool OrLocal); private: bool Aliases(const MDNode *A, const MDNode *B) const; bool PathAliases(const MDNode *A, const MDNode *B) const; class ASAliasRulesTy { public: ASAliasRulesTy(AMDGPUAS AS_, Triple::ArchType Arch_); AliasResult getAliasResult(unsigned AS1, unsigned AS2) const; private: Triple::ArchType Arch; AMDGPUAS AS; - const AliasResult (*ASAliasRules)[6][6]; + const AliasResult (*ASAliasRules)[7][7]; } ASAliasRules; }; /// Analysis pass providing a never-invalidated alias analysis result. class AMDGPUAA : public AnalysisInfoMixin { friend AnalysisInfoMixin; static char PassID; public: using Result = AMDGPUAAResult; AMDGPUAAResult run(Function &F, AnalysisManager &AM) { return AMDGPUAAResult(F.getParent()->getDataLayout(), Triple(F.getParent()->getTargetTriple())); } }; /// Legacy wrapper pass to provide the AMDGPUAAResult object. class AMDGPUAAWrapperPass : public ImmutablePass { std::unique_ptr Result; public: static char ID; AMDGPUAAWrapperPass() : ImmutablePass(ID) { initializeAMDGPUAAWrapperPassPass(*PassRegistry::getPassRegistry()); } AMDGPUAAResult &getResult() { return *Result; } const AMDGPUAAResult &getResult() const { return *Result; } bool doInitialization(Module &M) override { Result.reset(new AMDGPUAAResult(M.getDataLayout(), Triple(M.getTargetTriple()))); return false; } bool doFinalization(Module &M) override { Result.reset(); return false; } void getAnalysisUsage(AnalysisUsage &AU) const override; }; } // end namespace llvm #endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUALIASANALYSIS_H Index: vendor/llvm/dist-release_70/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp =================================================================== --- vendor/llvm/dist-release_70/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp (revision 338574) +++ vendor/llvm/dist-release_70/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp (revision 338575) @@ -1,2265 +1,2269 @@ //===-- AMDGPUISelDAGToDAG.cpp - A dag to dag inst selector for AMDGPU ----===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. 
// //==-----------------------------------------------------------------------===// // /// \file /// Defines an instruction selector for the AMDGPU target. // //===----------------------------------------------------------------------===// #include "AMDGPU.h" #include "AMDGPUArgumentUsageInfo.h" #include "AMDGPUISelLowering.h" // For AMDGPUISD #include "AMDGPUInstrInfo.h" #include "AMDGPUPerfHintAnalysis.h" #include "AMDGPURegisterInfo.h" #include "AMDGPUSubtarget.h" #include "AMDGPUTargetMachine.h" #include "SIDefines.h" #include "SIISelLowering.h" #include "SIInstrInfo.h" #include "SIMachineFunctionInfo.h" #include "SIRegisterInfo.h" #include "MCTargetDesc/AMDGPUMCTargetDesc.h" #include "llvm/ADT/APInt.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/StringRef.h" #include "llvm/Analysis/DivergenceAnalysis.h" #include "llvm/Analysis/ValueTracking.h" #include "llvm/CodeGen/FunctionLoweringInfo.h" #include "llvm/CodeGen/ISDOpcodes.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/SelectionDAG.h" #include "llvm/CodeGen/SelectionDAGISel.h" #include "llvm/CodeGen/SelectionDAGNodes.h" #include "llvm/CodeGen/ValueTypes.h" #include "llvm/IR/BasicBlock.h" #include "llvm/IR/Instruction.h" #include "llvm/MC/MCInstrDesc.h" #include "llvm/Support/Casting.h" #include "llvm/Support/CodeGen.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/MachineValueType.h" #include "llvm/Support/MathExtras.h" #include #include #include #include using namespace llvm; namespace llvm { class R600InstrInfo; } // end namespace llvm //===----------------------------------------------------------------------===// // Instruction Selector Implementation //===----------------------------------------------------------------------===// namespace { /// AMDGPU specific code to select AMDGPU machine instructions for /// SelectionDAG operations. class AMDGPUDAGToDAGISel : public SelectionDAGISel { // Subtarget - Keep a pointer to the AMDGPU Subtarget around so that we can // make the right decision when generating code for different targets. 
const GCNSubtarget *Subtarget; AMDGPUAS AMDGPUASI; bool EnableLateStructurizeCFG; public: explicit AMDGPUDAGToDAGISel(TargetMachine *TM = nullptr, CodeGenOpt::Level OptLevel = CodeGenOpt::Default) : SelectionDAGISel(*TM, OptLevel) { AMDGPUASI = AMDGPU::getAMDGPUAS(*TM); EnableLateStructurizeCFG = AMDGPUTargetMachine::EnableLateStructurizeCFG; } ~AMDGPUDAGToDAGISel() override = default; void getAnalysisUsage(AnalysisUsage &AU) const override { AU.addRequired(); AU.addRequired(); AU.addRequired(); SelectionDAGISel::getAnalysisUsage(AU); } bool runOnMachineFunction(MachineFunction &MF) override; void Select(SDNode *N) override; StringRef getPassName() const override; void PostprocessISelDAG() override; protected: void SelectBuildVector(SDNode *N, unsigned RegClassID); private: std::pair foldFrameIndex(SDValue N) const; bool isNoNanSrc(SDValue N) const; bool isInlineImmediate(const SDNode *N) const; bool isUniformBr(const SDNode *N) const; SDNode *glueCopyToM0(SDNode *N) const; const TargetRegisterClass *getOperandRegClass(SDNode *N, unsigned OpNo) const; virtual bool SelectADDRVTX_READ(SDValue Addr, SDValue &Base, SDValue &Offset); virtual bool SelectADDRIndirect(SDValue Addr, SDValue &Base, SDValue &Offset); bool isDSOffsetLegal(const SDValue &Base, unsigned Offset, unsigned OffsetBits) const; bool SelectDS1Addr1Offset(SDValue Ptr, SDValue &Base, SDValue &Offset) const; bool SelectDS64Bit4ByteAligned(SDValue Ptr, SDValue &Base, SDValue &Offset0, SDValue &Offset1) const; bool SelectMUBUF(SDValue Addr, SDValue &SRsrc, SDValue &VAddr, SDValue &SOffset, SDValue &Offset, SDValue &Offen, SDValue &Idxen, SDValue &Addr64, SDValue &GLC, SDValue &SLC, SDValue &TFE) const; bool SelectMUBUFAddr64(SDValue Addr, SDValue &SRsrc, SDValue &VAddr, SDValue &SOffset, SDValue &Offset, SDValue &GLC, SDValue &SLC, SDValue &TFE) const; bool SelectMUBUFAddr64(SDValue Addr, SDValue &SRsrc, SDValue &VAddr, SDValue &SOffset, SDValue &Offset, SDValue &SLC) const; bool SelectMUBUFScratchOffen(SDNode *Parent, SDValue Addr, SDValue &RSrc, SDValue &VAddr, SDValue &SOffset, SDValue &ImmOffset) const; bool SelectMUBUFScratchOffset(SDNode *Parent, SDValue Addr, SDValue &SRsrc, SDValue &Soffset, SDValue &Offset) const; bool SelectMUBUFOffset(SDValue Addr, SDValue &SRsrc, SDValue &SOffset, SDValue &Offset, SDValue &GLC, SDValue &SLC, SDValue &TFE) const; bool SelectMUBUFOffset(SDValue Addr, SDValue &SRsrc, SDValue &Soffset, SDValue &Offset, SDValue &SLC) const; bool SelectMUBUFOffset(SDValue Addr, SDValue &SRsrc, SDValue &Soffset, SDValue &Offset) const; bool SelectMUBUFConstant(SDValue Constant, SDValue &SOffset, SDValue &ImmOffset) const; bool SelectMUBUFIntrinsicOffset(SDValue Offset, SDValue &SOffset, SDValue &ImmOffset) const; bool SelectMUBUFIntrinsicVOffset(SDValue Offset, SDValue &SOffset, SDValue &ImmOffset, SDValue &VOffset) const; bool SelectFlatAtomic(SDValue Addr, SDValue &VAddr, SDValue &Offset, SDValue &SLC) const; bool SelectFlatAtomicSigned(SDValue Addr, SDValue &VAddr, SDValue &Offset, SDValue &SLC) const; template bool SelectFlatOffset(SDValue Addr, SDValue &VAddr, SDValue &Offset, SDValue &SLC) const; bool SelectSMRDOffset(SDValue ByteOffsetNode, SDValue &Offset, bool &Imm) const; SDValue Expand32BitAddress(SDValue Addr) const; bool SelectSMRD(SDValue Addr, SDValue &SBase, SDValue &Offset, bool &Imm) const; bool SelectSMRDImm(SDValue Addr, SDValue &SBase, SDValue &Offset) const; bool SelectSMRDImm32(SDValue Addr, SDValue &SBase, SDValue &Offset) const; bool SelectSMRDSgpr(SDValue Addr, SDValue &SBase, SDValue 
&Offset) const; bool SelectSMRDBufferImm(SDValue Addr, SDValue &Offset) const; bool SelectSMRDBufferImm32(SDValue Addr, SDValue &Offset) const; bool SelectMOVRELOffset(SDValue Index, SDValue &Base, SDValue &Offset) const; bool SelectVOP3Mods_NNaN(SDValue In, SDValue &Src, SDValue &SrcMods) const; bool SelectVOP3ModsImpl(SDValue In, SDValue &Src, unsigned &SrcMods) const; bool SelectVOP3Mods(SDValue In, SDValue &Src, SDValue &SrcMods) const; bool SelectVOP3NoMods(SDValue In, SDValue &Src) const; bool SelectVOP3Mods0(SDValue In, SDValue &Src, SDValue &SrcMods, SDValue &Clamp, SDValue &Omod) const; bool SelectVOP3NoMods0(SDValue In, SDValue &Src, SDValue &SrcMods, SDValue &Clamp, SDValue &Omod) const; bool SelectVOP3Mods0Clamp0OMod(SDValue In, SDValue &Src, SDValue &SrcMods, SDValue &Clamp, SDValue &Omod) const; bool SelectVOP3OMods(SDValue In, SDValue &Src, SDValue &Clamp, SDValue &Omod) const; bool SelectVOP3PMods(SDValue In, SDValue &Src, SDValue &SrcMods) const; bool SelectVOP3PMods0(SDValue In, SDValue &Src, SDValue &SrcMods, SDValue &Clamp) const; bool SelectVOP3OpSel(SDValue In, SDValue &Src, SDValue &SrcMods) const; bool SelectVOP3OpSel0(SDValue In, SDValue &Src, SDValue &SrcMods, SDValue &Clamp) const; bool SelectVOP3OpSelMods(SDValue In, SDValue &Src, SDValue &SrcMods) const; bool SelectVOP3OpSelMods0(SDValue In, SDValue &Src, SDValue &SrcMods, SDValue &Clamp) const; bool SelectVOP3PMadMixModsImpl(SDValue In, SDValue &Src, unsigned &Mods) const; bool SelectVOP3PMadMixMods(SDValue In, SDValue &Src, SDValue &SrcMods) const; bool SelectHi16Elt(SDValue In, SDValue &Src) const; void SelectADD_SUB_I64(SDNode *N); void SelectUADDO_USUBO(SDNode *N); void SelectDIV_SCALE(SDNode *N); void SelectMAD_64_32(SDNode *N); void SelectFMA_W_CHAIN(SDNode *N); void SelectFMUL_W_CHAIN(SDNode *N); SDNode *getS_BFE(unsigned Opcode, const SDLoc &DL, SDValue Val, uint32_t Offset, uint32_t Width); void SelectS_BFEFromShifts(SDNode *N); void SelectS_BFE(SDNode *N); bool isCBranchSCC(const SDNode *N) const; void SelectBRCOND(SDNode *N); void SelectFMAD_FMA(SDNode *N); void SelectATOMIC_CMP_SWAP(SDNode *N); protected: // Include the pieces autogenerated from the target description. #include "AMDGPUGenDAGISel.inc" }; class R600DAGToDAGISel : public AMDGPUDAGToDAGISel { const R600Subtarget *Subtarget; AMDGPUAS AMDGPUASI; bool isConstantLoad(const MemSDNode *N, int cbID) const; bool SelectGlobalValueConstantOffset(SDValue Addr, SDValue& IntPtr); bool SelectGlobalValueVariableOffset(SDValue Addr, SDValue &BaseReg, SDValue& Offset); public: explicit R600DAGToDAGISel(TargetMachine *TM, CodeGenOpt::Level OptLevel) : AMDGPUDAGToDAGISel(TM, OptLevel) { AMDGPUASI = AMDGPU::getAMDGPUAS(*TM); } void Select(SDNode *N) override; bool SelectADDRIndirect(SDValue Addr, SDValue &Base, SDValue &Offset) override; bool SelectADDRVTX_READ(SDValue Addr, SDValue &Base, SDValue &Offset) override; bool runOnMachineFunction(MachineFunction &MF) override; protected: // Include the pieces autogenerated from the target description. 
#include "R600GenDAGISel.inc" }; } // end anonymous namespace INITIALIZE_PASS_BEGIN(AMDGPUDAGToDAGISel, "isel", "AMDGPU DAG->DAG Pattern Instruction Selection", false, false) INITIALIZE_PASS_DEPENDENCY(AMDGPUArgumentUsageInfo) INITIALIZE_PASS_DEPENDENCY(AMDGPUPerfHintAnalysis) INITIALIZE_PASS_DEPENDENCY(DivergenceAnalysis) INITIALIZE_PASS_END(AMDGPUDAGToDAGISel, "isel", "AMDGPU DAG->DAG Pattern Instruction Selection", false, false) /// This pass converts a legalized DAG into a AMDGPU-specific // DAG, ready for instruction scheduling. FunctionPass *llvm::createAMDGPUISelDag(TargetMachine *TM, CodeGenOpt::Level OptLevel) { return new AMDGPUDAGToDAGISel(TM, OptLevel); } /// This pass converts a legalized DAG into a R600-specific // DAG, ready for instruction scheduling. FunctionPass *llvm::createR600ISelDag(TargetMachine *TM, CodeGenOpt::Level OptLevel) { return new R600DAGToDAGISel(TM, OptLevel); } bool AMDGPUDAGToDAGISel::runOnMachineFunction(MachineFunction &MF) { Subtarget = &MF.getSubtarget(); return SelectionDAGISel::runOnMachineFunction(MF); } bool AMDGPUDAGToDAGISel::isNoNanSrc(SDValue N) const { if (TM.Options.NoNaNsFPMath) return true; // TODO: Move into isKnownNeverNaN if (N->getFlags().isDefined()) return N->getFlags().hasNoNaNs(); return CurDAG->isKnownNeverNaN(N); } bool AMDGPUDAGToDAGISel::isInlineImmediate(const SDNode *N) const { const SIInstrInfo *TII = Subtarget->getInstrInfo(); if (const ConstantSDNode *C = dyn_cast(N)) return TII->isInlineConstant(C->getAPIntValue()); if (const ConstantFPSDNode *C = dyn_cast(N)) return TII->isInlineConstant(C->getValueAPF().bitcastToAPInt()); return false; } /// Determine the register class for \p OpNo /// \returns The register class of the virtual register that will be used for /// the given operand number \OpNo or NULL if the register class cannot be /// determined. 
const TargetRegisterClass *AMDGPUDAGToDAGISel::getOperandRegClass(SDNode *N, unsigned OpNo) const { if (!N->isMachineOpcode()) { if (N->getOpcode() == ISD::CopyToReg) { unsigned Reg = cast(N->getOperand(1))->getReg(); if (TargetRegisterInfo::isVirtualRegister(Reg)) { MachineRegisterInfo &MRI = CurDAG->getMachineFunction().getRegInfo(); return MRI.getRegClass(Reg); } const SIRegisterInfo *TRI = static_cast(Subtarget)->getRegisterInfo(); return TRI->getPhysRegClass(Reg); } return nullptr; } switch (N->getMachineOpcode()) { default: { const MCInstrDesc &Desc = Subtarget->getInstrInfo()->get(N->getMachineOpcode()); unsigned OpIdx = Desc.getNumDefs() + OpNo; if (OpIdx >= Desc.getNumOperands()) return nullptr; int RegClass = Desc.OpInfo[OpIdx].RegClass; if (RegClass == -1) return nullptr; return Subtarget->getRegisterInfo()->getRegClass(RegClass); } case AMDGPU::REG_SEQUENCE: { unsigned RCID = cast(N->getOperand(0))->getZExtValue(); const TargetRegisterClass *SuperRC = Subtarget->getRegisterInfo()->getRegClass(RCID); SDValue SubRegOp = N->getOperand(OpNo + 1); unsigned SubRegIdx = cast(SubRegOp)->getZExtValue(); return Subtarget->getRegisterInfo()->getSubClassWithSubReg(SuperRC, SubRegIdx); } } } SDNode *AMDGPUDAGToDAGISel::glueCopyToM0(SDNode *N) const { if (cast(N)->getAddressSpace() != AMDGPUASI.LOCAL_ADDRESS || !Subtarget->ldsRequiresM0Init()) return N; const SITargetLowering& Lowering = *static_cast(getTargetLowering()); // Write max value to m0 before each load operation SDValue M0 = Lowering.copyToM0(*CurDAG, CurDAG->getEntryNode(), SDLoc(N), CurDAG->getTargetConstant(-1, SDLoc(N), MVT::i32)); SDValue Glue = M0.getValue(1); SmallVector Ops; for (unsigned i = 0, e = N->getNumOperands(); i != e; ++i) { Ops.push_back(N->getOperand(i)); } Ops.push_back(Glue); return CurDAG->MorphNodeTo(N, N->getOpcode(), N->getVTList(), Ops); } static unsigned selectSGPRVectorRegClassID(unsigned NumVectorElts) { switch (NumVectorElts) { case 1: return AMDGPU::SReg_32_XM0RegClassID; case 2: return AMDGPU::SReg_64RegClassID; case 4: return AMDGPU::SReg_128RegClassID; case 8: return AMDGPU::SReg_256RegClassID; case 16: return AMDGPU::SReg_512RegClassID; } llvm_unreachable("invalid vector size"); } static bool getConstantValue(SDValue N, uint32_t &Out) { if (const ConstantSDNode *C = dyn_cast(N)) { Out = C->getAPIntValue().getZExtValue(); return true; } if (const ConstantFPSDNode *C = dyn_cast(N)) { Out = C->getValueAPF().bitcastToAPInt().getZExtValue(); return true; } return false; } void AMDGPUDAGToDAGISel::SelectBuildVector(SDNode *N, unsigned RegClassID) { EVT VT = N->getValueType(0); unsigned NumVectorElts = VT.getVectorNumElements(); EVT EltVT = VT.getVectorElementType(); SDLoc DL(N); SDValue RegClass = CurDAG->getTargetConstant(RegClassID, DL, MVT::i32); if (NumVectorElts == 1) { CurDAG->SelectNodeTo(N, AMDGPU::COPY_TO_REGCLASS, EltVT, N->getOperand(0), RegClass); return; } assert(NumVectorElts <= 16 && "Vectors with more than 16 elements not " "supported yet"); // 16 = Max Num Vector Elements // 2 = 2 REG_SEQUENCE operands per element (value, subreg index) // 1 = Vector Register Class SmallVector RegSeqArgs(NumVectorElts * 2 + 1); RegSeqArgs[0] = CurDAG->getTargetConstant(RegClassID, DL, MVT::i32); bool IsRegSeq = true; unsigned NOps = N->getNumOperands(); for (unsigned i = 0; i < NOps; i++) { // XXX: Why is this here? 
if (isa(N->getOperand(i))) { IsRegSeq = false; break; } unsigned Sub = AMDGPURegisterInfo::getSubRegFromChannel(i); RegSeqArgs[1 + (2 * i)] = N->getOperand(i); RegSeqArgs[1 + (2 * i) + 1] = CurDAG->getTargetConstant(Sub, DL, MVT::i32); } if (NOps != NumVectorElts) { // Fill in the missing undef elements if this was a scalar_to_vector. assert(N->getOpcode() == ISD::SCALAR_TO_VECTOR && NOps < NumVectorElts); MachineSDNode *ImpDef = CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, EltVT); for (unsigned i = NOps; i < NumVectorElts; ++i) { unsigned Sub = AMDGPURegisterInfo::getSubRegFromChannel(i); RegSeqArgs[1 + (2 * i)] = SDValue(ImpDef, 0); RegSeqArgs[1 + (2 * i) + 1] = CurDAG->getTargetConstant(Sub, DL, MVT::i32); } } if (!IsRegSeq) SelectCode(N); CurDAG->SelectNodeTo(N, AMDGPU::REG_SEQUENCE, N->getVTList(), RegSeqArgs); } void AMDGPUDAGToDAGISel::Select(SDNode *N) { unsigned int Opc = N->getOpcode(); if (N->isMachineOpcode()) { N->setNodeId(-1); return; // Already selected. } if (isa(N) || (Opc == AMDGPUISD::ATOMIC_INC || Opc == AMDGPUISD::ATOMIC_DEC || Opc == AMDGPUISD::ATOMIC_LOAD_FADD || Opc == AMDGPUISD::ATOMIC_LOAD_FMIN || Opc == AMDGPUISD::ATOMIC_LOAD_FMAX)) N = glueCopyToM0(N); switch (Opc) { default: break; // We are selecting i64 ADD here instead of custom lower it during // DAG legalization, so we can fold some i64 ADDs used for address // calculation into the LOAD and STORE instructions. case ISD::ADDC: case ISD::ADDE: case ISD::SUBC: case ISD::SUBE: { if (N->getValueType(0) != MVT::i64) break; SelectADD_SUB_I64(N); return; } case ISD::UADDO: case ISD::USUBO: { SelectUADDO_USUBO(N); return; } case AMDGPUISD::FMUL_W_CHAIN: { SelectFMUL_W_CHAIN(N); return; } case AMDGPUISD::FMA_W_CHAIN: { SelectFMA_W_CHAIN(N); return; } case ISD::SCALAR_TO_VECTOR: case ISD::BUILD_VECTOR: { EVT VT = N->getValueType(0); unsigned NumVectorElts = VT.getVectorNumElements(); if (VT.getScalarSizeInBits() == 16) { if (Opc == ISD::BUILD_VECTOR && NumVectorElts == 2) { uint32_t LHSVal, RHSVal; if (getConstantValue(N->getOperand(0), LHSVal) && getConstantValue(N->getOperand(1), RHSVal)) { uint32_t K = LHSVal | (RHSVal << 16); CurDAG->SelectNodeTo(N, AMDGPU::S_MOV_B32, VT, CurDAG->getTargetConstant(K, SDLoc(N), MVT::i32)); return; } } break; } assert(VT.getVectorElementType().bitsEq(MVT::i32)); unsigned RegClassID = selectSGPRVectorRegClassID(NumVectorElts); SelectBuildVector(N, RegClassID); return; } case ISD::BUILD_PAIR: { SDValue RC, SubReg0, SubReg1; SDLoc DL(N); if (N->getValueType(0) == MVT::i128) { RC = CurDAG->getTargetConstant(AMDGPU::SReg_128RegClassID, DL, MVT::i32); SubReg0 = CurDAG->getTargetConstant(AMDGPU::sub0_sub1, DL, MVT::i32); SubReg1 = CurDAG->getTargetConstant(AMDGPU::sub2_sub3, DL, MVT::i32); } else if (N->getValueType(0) == MVT::i64) { RC = CurDAG->getTargetConstant(AMDGPU::SReg_64RegClassID, DL, MVT::i32); SubReg0 = CurDAG->getTargetConstant(AMDGPU::sub0, DL, MVT::i32); SubReg1 = CurDAG->getTargetConstant(AMDGPU::sub1, DL, MVT::i32); } else { llvm_unreachable("Unhandled value type for BUILD_PAIR"); } const SDValue Ops[] = { RC, N->getOperand(0), SubReg0, N->getOperand(1), SubReg1 }; ReplaceNode(N, CurDAG->getMachineNode(TargetOpcode::REG_SEQUENCE, DL, N->getValueType(0), Ops)); return; } case ISD::Constant: case ISD::ConstantFP: { if (N->getValueType(0).getSizeInBits() != 64 || isInlineImmediate(N)) break; uint64_t Imm; if (ConstantFPSDNode *FP = dyn_cast(N)) Imm = FP->getValueAPF().bitcastToAPInt().getZExtValue(); else { ConstantSDNode *C = cast(N); Imm = C->getZExtValue(); } 
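    // Materialize the 64-bit immediate as two 32-bit S_MOV_B32 halves and
    // recombine them with a REG_SEQUENCE (sub0 = low half, sub1 = high half).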
SDLoc DL(N); SDNode *Lo = CurDAG->getMachineNode(AMDGPU::S_MOV_B32, DL, MVT::i32, CurDAG->getConstant(Imm & 0xFFFFFFFF, DL, MVT::i32)); SDNode *Hi = CurDAG->getMachineNode(AMDGPU::S_MOV_B32, DL, MVT::i32, CurDAG->getConstant(Imm >> 32, DL, MVT::i32)); const SDValue Ops[] = { CurDAG->getTargetConstant(AMDGPU::SReg_64RegClassID, DL, MVT::i32), SDValue(Lo, 0), CurDAG->getTargetConstant(AMDGPU::sub0, DL, MVT::i32), SDValue(Hi, 0), CurDAG->getTargetConstant(AMDGPU::sub1, DL, MVT::i32) }; ReplaceNode(N, CurDAG->getMachineNode(TargetOpcode::REG_SEQUENCE, DL, N->getValueType(0), Ops)); return; } case ISD::LOAD: case ISD::STORE: case ISD::ATOMIC_LOAD: case ISD::ATOMIC_STORE: { N = glueCopyToM0(N); break; } case AMDGPUISD::BFE_I32: case AMDGPUISD::BFE_U32: { // There is a scalar version available, but unlike the vector version which // has a separate operand for the offset and width, the scalar version packs // the width and offset into a single operand. Try to move to the scalar // version if the offsets are constant, so that we can try to keep extended // loads of kernel arguments in SGPRs. // TODO: Technically we could try to pattern match scalar bitshifts of // dynamic values, but it's probably not useful. ConstantSDNode *Offset = dyn_cast(N->getOperand(1)); if (!Offset) break; ConstantSDNode *Width = dyn_cast(N->getOperand(2)); if (!Width) break; bool Signed = Opc == AMDGPUISD::BFE_I32; uint32_t OffsetVal = Offset->getZExtValue(); uint32_t WidthVal = Width->getZExtValue(); ReplaceNode(N, getS_BFE(Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32, SDLoc(N), N->getOperand(0), OffsetVal, WidthVal)); return; } case AMDGPUISD::DIV_SCALE: { SelectDIV_SCALE(N); return; } case AMDGPUISD::MAD_I64_I32: case AMDGPUISD::MAD_U64_U32: { SelectMAD_64_32(N); return; } case ISD::CopyToReg: { const SITargetLowering& Lowering = *static_cast(getTargetLowering()); N = Lowering.legalizeTargetIndependentNode(N, *CurDAG); break; } case ISD::AND: case ISD::SRL: case ISD::SRA: case ISD::SIGN_EXTEND_INREG: if (N->getValueType(0) != MVT::i32) break; SelectS_BFE(N); return; case ISD::BRCOND: SelectBRCOND(N); return; case ISD::FMAD: case ISD::FMA: SelectFMAD_FMA(N); return; case AMDGPUISD::ATOMIC_CMP_SWAP: SelectATOMIC_CMP_SWAP(N); return; } SelectCode(N); } bool AMDGPUDAGToDAGISel::isUniformBr(const SDNode *N) const { const BasicBlock *BB = FuncInfo->MBB->getBasicBlock(); const Instruction *Term = BB->getTerminator(); return Term->getMetadata("amdgpu.uniform") || Term->getMetadata("structurizecfg.uniform"); } StringRef AMDGPUDAGToDAGISel::getPassName() const { return "AMDGPU DAG->DAG Pattern Instruction Selection"; } //===----------------------------------------------------------------------===// // Complex Patterns //===----------------------------------------------------------------------===// bool AMDGPUDAGToDAGISel::SelectADDRVTX_READ(SDValue Addr, SDValue &Base, SDValue &Offset) { return false; } bool AMDGPUDAGToDAGISel::SelectADDRIndirect(SDValue Addr, SDValue &Base, SDValue &Offset) { ConstantSDNode *C; SDLoc DL(Addr); if ((C = dyn_cast(Addr))) { Base = CurDAG->getRegister(R600::INDIRECT_BASE_ADDR, MVT::i32); Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32); } else if ((Addr.getOpcode() == AMDGPUISD::DWORDADDR) && (C = dyn_cast(Addr.getOperand(0)))) { Base = CurDAG->getRegister(R600::INDIRECT_BASE_ADDR, MVT::i32); Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32); } else if ((Addr.getOpcode() == ISD::ADD || Addr.getOpcode() == ISD::OR) && (C = dyn_cast(Addr.getOperand(1)))) { Base = 
Addr.getOperand(0); Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32); } else { Base = Addr; Offset = CurDAG->getTargetConstant(0, DL, MVT::i32); } return true; } // FIXME: Should only handle addcarry/subcarry void AMDGPUDAGToDAGISel::SelectADD_SUB_I64(SDNode *N) { SDLoc DL(N); SDValue LHS = N->getOperand(0); SDValue RHS = N->getOperand(1); unsigned Opcode = N->getOpcode(); bool ConsumeCarry = (Opcode == ISD::ADDE || Opcode == ISD::SUBE); bool ProduceCarry = ConsumeCarry || Opcode == ISD::ADDC || Opcode == ISD::SUBC; bool IsAdd = Opcode == ISD::ADD || Opcode == ISD::ADDC || Opcode == ISD::ADDE; SDValue Sub0 = CurDAG->getTargetConstant(AMDGPU::sub0, DL, MVT::i32); SDValue Sub1 = CurDAG->getTargetConstant(AMDGPU::sub1, DL, MVT::i32); SDNode *Lo0 = CurDAG->getMachineNode(TargetOpcode::EXTRACT_SUBREG, DL, MVT::i32, LHS, Sub0); SDNode *Hi0 = CurDAG->getMachineNode(TargetOpcode::EXTRACT_SUBREG, DL, MVT::i32, LHS, Sub1); SDNode *Lo1 = CurDAG->getMachineNode(TargetOpcode::EXTRACT_SUBREG, DL, MVT::i32, RHS, Sub0); SDNode *Hi1 = CurDAG->getMachineNode(TargetOpcode::EXTRACT_SUBREG, DL, MVT::i32, RHS, Sub1); SDVTList VTList = CurDAG->getVTList(MVT::i32, MVT::Glue); unsigned Opc = IsAdd ? AMDGPU::S_ADD_U32 : AMDGPU::S_SUB_U32; unsigned CarryOpc = IsAdd ? AMDGPU::S_ADDC_U32 : AMDGPU::S_SUBB_U32; SDNode *AddLo; if (!ConsumeCarry) { SDValue Args[] = { SDValue(Lo0, 0), SDValue(Lo1, 0) }; AddLo = CurDAG->getMachineNode(Opc, DL, VTList, Args); } else { SDValue Args[] = { SDValue(Lo0, 0), SDValue(Lo1, 0), N->getOperand(2) }; AddLo = CurDAG->getMachineNode(CarryOpc, DL, VTList, Args); } SDValue AddHiArgs[] = { SDValue(Hi0, 0), SDValue(Hi1, 0), SDValue(AddLo, 1) }; SDNode *AddHi = CurDAG->getMachineNode(CarryOpc, DL, VTList, AddHiArgs); SDValue RegSequenceArgs[] = { CurDAG->getTargetConstant(AMDGPU::SReg_64RegClassID, DL, MVT::i32), SDValue(AddLo,0), Sub0, SDValue(AddHi,0), Sub1, }; SDNode *RegSequence = CurDAG->getMachineNode(AMDGPU::REG_SEQUENCE, DL, MVT::i64, RegSequenceArgs); if (ProduceCarry) { // Replace the carry-use ReplaceUses(SDValue(N, 1), SDValue(AddHi, 1)); } // Replace the remaining uses. ReplaceNode(N, RegSequence); } void AMDGPUDAGToDAGISel::SelectUADDO_USUBO(SDNode *N) { // The name of the opcodes are misleading. v_add_i32/v_sub_i32 have unsigned // carry out despite the _i32 name. These were renamed in VI to _U32. // FIXME: We should probably rename the opcodes here. unsigned Opc = N->getOpcode() == ISD::UADDO ? 
AMDGPU::V_ADD_I32_e64 : AMDGPU::V_SUB_I32_e64; CurDAG->SelectNodeTo(N, Opc, N->getVTList(), { N->getOperand(0), N->getOperand(1) }); } void AMDGPUDAGToDAGISel::SelectFMA_W_CHAIN(SDNode *N) { SDLoc SL(N); // src0_modifiers, src0, src1_modifiers, src1, src2_modifiers, src2, clamp, omod SDValue Ops[10]; SelectVOP3Mods0(N->getOperand(1), Ops[1], Ops[0], Ops[6], Ops[7]); SelectVOP3Mods(N->getOperand(2), Ops[3], Ops[2]); SelectVOP3Mods(N->getOperand(3), Ops[5], Ops[4]); Ops[8] = N->getOperand(0); Ops[9] = N->getOperand(4); CurDAG->SelectNodeTo(N, AMDGPU::V_FMA_F32, N->getVTList(), Ops); } void AMDGPUDAGToDAGISel::SelectFMUL_W_CHAIN(SDNode *N) { SDLoc SL(N); // src0_modifiers, src0, src1_modifiers, src1, clamp, omod SDValue Ops[8]; SelectVOP3Mods0(N->getOperand(1), Ops[1], Ops[0], Ops[4], Ops[5]); SelectVOP3Mods(N->getOperand(2), Ops[3], Ops[2]); Ops[6] = N->getOperand(0); Ops[7] = N->getOperand(3); CurDAG->SelectNodeTo(N, AMDGPU::V_MUL_F32_e64, N->getVTList(), Ops); } // We need to handle this here because tablegen doesn't support matching // instructions with multiple outputs. void AMDGPUDAGToDAGISel::SelectDIV_SCALE(SDNode *N) { SDLoc SL(N); EVT VT = N->getValueType(0); assert(VT == MVT::f32 || VT == MVT::f64); unsigned Opc = (VT == MVT::f64) ? AMDGPU::V_DIV_SCALE_F64 : AMDGPU::V_DIV_SCALE_F32; SDValue Ops[] = { N->getOperand(0), N->getOperand(1), N->getOperand(2) }; CurDAG->SelectNodeTo(N, Opc, N->getVTList(), Ops); } // We need to handle this here because tablegen doesn't support matching // instructions with multiple outputs. void AMDGPUDAGToDAGISel::SelectMAD_64_32(SDNode *N) { SDLoc SL(N); bool Signed = N->getOpcode() == AMDGPUISD::MAD_I64_I32; unsigned Opc = Signed ? AMDGPU::V_MAD_I64_I32 : AMDGPU::V_MAD_U64_U32; SDValue Clamp = CurDAG->getTargetConstant(0, SL, MVT::i1); SDValue Ops[] = { N->getOperand(0), N->getOperand(1), N->getOperand(2), Clamp }; CurDAG->SelectNodeTo(N, Opc, N->getVTList(), Ops); } bool AMDGPUDAGToDAGISel::isDSOffsetLegal(const SDValue &Base, unsigned Offset, unsigned OffsetBits) const { if ((OffsetBits == 16 && !isUInt<16>(Offset)) || (OffsetBits == 8 && !isUInt<8>(Offset))) return false; if (Subtarget->getGeneration() >= AMDGPUSubtarget::SEA_ISLANDS || Subtarget->unsafeDSOffsetFoldingEnabled()) return true; // On Southern Islands instruction with a negative base value and an offset // don't seem to work. return CurDAG->SignBitIsZero(Base); } bool AMDGPUDAGToDAGISel::SelectDS1Addr1Offset(SDValue Addr, SDValue &Base, SDValue &Offset) const { SDLoc DL(Addr); if (CurDAG->isBaseWithConstantOffset(Addr)) { SDValue N0 = Addr.getOperand(0); SDValue N1 = Addr.getOperand(1); ConstantSDNode *C1 = cast(N1); if (isDSOffsetLegal(N0, C1->getSExtValue(), 16)) { // (add n0, c0) Base = N0; Offset = CurDAG->getTargetConstant(C1->getZExtValue(), DL, MVT::i16); return true; } } else if (Addr.getOpcode() == ISD::SUB) { // sub C, x -> add (sub 0, x), C if (const ConstantSDNode *C = dyn_cast(Addr.getOperand(0))) { int64_t ByteOffset = C->getSExtValue(); if (isUInt<16>(ByteOffset)) { SDValue Zero = CurDAG->getTargetConstant(0, DL, MVT::i32); // XXX - This is kind of hacky. Create a dummy sub node so we can check // the known bits in isDSOffsetLegal. We need to emit the selected node // here, so this is thrown away. SDValue Sub = CurDAG->getNode(ISD::SUB, DL, MVT::i32, Zero, Addr.getOperand(1)); if (isDSOffsetLegal(Sub, ByteOffset, 16)) { // FIXME: Select to VOP3 version for with-carry. unsigned SubOp = Subtarget->hasAddNoCarry() ? 
AMDGPU::V_SUB_U32_e64 : AMDGPU::V_SUB_I32_e32; MachineSDNode *MachineSub = CurDAG->getMachineNode(SubOp, DL, MVT::i32, Zero, Addr.getOperand(1)); Base = SDValue(MachineSub, 0); Offset = CurDAG->getTargetConstant(ByteOffset, DL, MVT::i16); return true; } } } } else if (const ConstantSDNode *CAddr = dyn_cast(Addr)) { // If we have a constant address, prefer to put the constant into the // offset. This can save moves to load the constant address since multiple // operations can share the zero base address register, and enables merging // into read2 / write2 instructions. SDLoc DL(Addr); if (isUInt<16>(CAddr->getZExtValue())) { SDValue Zero = CurDAG->getTargetConstant(0, DL, MVT::i32); MachineSDNode *MovZero = CurDAG->getMachineNode(AMDGPU::V_MOV_B32_e32, DL, MVT::i32, Zero); Base = SDValue(MovZero, 0); Offset = CurDAG->getTargetConstant(CAddr->getZExtValue(), DL, MVT::i16); return true; } } // default case Base = Addr; Offset = CurDAG->getTargetConstant(0, SDLoc(Addr), MVT::i16); return true; } // TODO: If offset is too big, put low 16-bit into offset. bool AMDGPUDAGToDAGISel::SelectDS64Bit4ByteAligned(SDValue Addr, SDValue &Base, SDValue &Offset0, SDValue &Offset1) const { SDLoc DL(Addr); if (CurDAG->isBaseWithConstantOffset(Addr)) { SDValue N0 = Addr.getOperand(0); SDValue N1 = Addr.getOperand(1); ConstantSDNode *C1 = cast(N1); unsigned DWordOffset0 = C1->getZExtValue() / 4; unsigned DWordOffset1 = DWordOffset0 + 1; // (add n0, c0) if (isDSOffsetLegal(N0, DWordOffset1, 8)) { Base = N0; Offset0 = CurDAG->getTargetConstant(DWordOffset0, DL, MVT::i8); Offset1 = CurDAG->getTargetConstant(DWordOffset1, DL, MVT::i8); return true; } } else if (Addr.getOpcode() == ISD::SUB) { // sub C, x -> add (sub 0, x), C if (const ConstantSDNode *C = dyn_cast(Addr.getOperand(0))) { unsigned DWordOffset0 = C->getZExtValue() / 4; unsigned DWordOffset1 = DWordOffset0 + 1; if (isUInt<8>(DWordOffset0)) { SDLoc DL(Addr); SDValue Zero = CurDAG->getTargetConstant(0, DL, MVT::i32); // XXX - This is kind of hacky. Create a dummy sub node so we can check // the known bits in isDSOffsetLegal. We need to emit the selected node // here, so this is thrown away. SDValue Sub = CurDAG->getNode(ISD::SUB, DL, MVT::i32, Zero, Addr.getOperand(1)); if (isDSOffsetLegal(Sub, DWordOffset1, 8)) { unsigned SubOp = Subtarget->hasAddNoCarry() ? AMDGPU::V_SUB_U32_e64 : AMDGPU::V_SUB_I32_e32; MachineSDNode *MachineSub = CurDAG->getMachineNode(SubOp, DL, MVT::i32, Zero, Addr.getOperand(1)); Base = SDValue(MachineSub, 0); Offset0 = CurDAG->getTargetConstant(DWordOffset0, DL, MVT::i8); Offset1 = CurDAG->getTargetConstant(DWordOffset1, DL, MVT::i8); return true; } } } } else if (const ConstantSDNode *CAddr = dyn_cast(Addr)) { unsigned DWordOffset0 = CAddr->getZExtValue() / 4; unsigned DWordOffset1 = DWordOffset0 + 1; assert(4 * DWordOffset0 == CAddr->getZExtValue()); if (isUInt<8>(DWordOffset0) && isUInt<8>(DWordOffset1)) { SDValue Zero = CurDAG->getTargetConstant(0, DL, MVT::i32); MachineSDNode *MovZero = CurDAG->getMachineNode(AMDGPU::V_MOV_B32_e32, DL, MVT::i32, Zero); Base = SDValue(MovZero, 0); Offset0 = CurDAG->getTargetConstant(DWordOffset0, DL, MVT::i8); Offset1 = CurDAG->getTargetConstant(DWordOffset1, DL, MVT::i8); return true; } } // default case // FIXME: This is broken on SI where we still need to check if the base // pointer is positive here. 
Base = Addr; Offset0 = CurDAG->getTargetConstant(0, DL, MVT::i8); Offset1 = CurDAG->getTargetConstant(1, DL, MVT::i8); return true; } bool AMDGPUDAGToDAGISel::SelectMUBUF(SDValue Addr, SDValue &Ptr, SDValue &VAddr, SDValue &SOffset, SDValue &Offset, SDValue &Offen, SDValue &Idxen, SDValue &Addr64, SDValue &GLC, SDValue &SLC, SDValue &TFE) const { // Subtarget prefers to use flat instruction if (Subtarget->useFlatForGlobal()) return false; SDLoc DL(Addr); if (!GLC.getNode()) GLC = CurDAG->getTargetConstant(0, DL, MVT::i1); if (!SLC.getNode()) SLC = CurDAG->getTargetConstant(0, DL, MVT::i1); TFE = CurDAG->getTargetConstant(0, DL, MVT::i1); Idxen = CurDAG->getTargetConstant(0, DL, MVT::i1); Offen = CurDAG->getTargetConstant(0, DL, MVT::i1); Addr64 = CurDAG->getTargetConstant(0, DL, MVT::i1); SOffset = CurDAG->getTargetConstant(0, DL, MVT::i32); if (CurDAG->isBaseWithConstantOffset(Addr)) { SDValue N0 = Addr.getOperand(0); SDValue N1 = Addr.getOperand(1); ConstantSDNode *C1 = cast(N1); if (N0.getOpcode() == ISD::ADD) { // (add (add N2, N3), C1) -> addr64 SDValue N2 = N0.getOperand(0); SDValue N3 = N0.getOperand(1); Addr64 = CurDAG->getTargetConstant(1, DL, MVT::i1); Ptr = N2; VAddr = N3; } else { // (add N0, C1) -> offset VAddr = CurDAG->getTargetConstant(0, DL, MVT::i32); Ptr = N0; } if (SIInstrInfo::isLegalMUBUFImmOffset(C1->getZExtValue())) { Offset = CurDAG->getTargetConstant(C1->getZExtValue(), DL, MVT::i16); return true; } if (isUInt<32>(C1->getZExtValue())) { // Illegal offset, store it in soffset. Offset = CurDAG->getTargetConstant(0, DL, MVT::i16); SOffset = SDValue(CurDAG->getMachineNode(AMDGPU::S_MOV_B32, DL, MVT::i32, CurDAG->getTargetConstant(C1->getZExtValue(), DL, MVT::i32)), 0); return true; } } if (Addr.getOpcode() == ISD::ADD) { // (add N0, N1) -> addr64 SDValue N0 = Addr.getOperand(0); SDValue N1 = Addr.getOperand(1); Addr64 = CurDAG->getTargetConstant(1, DL, MVT::i1); Ptr = N0; VAddr = N1; Offset = CurDAG->getTargetConstant(0, DL, MVT::i16); return true; } // default case -> offset VAddr = CurDAG->getTargetConstant(0, DL, MVT::i32); Ptr = Addr; Offset = CurDAG->getTargetConstant(0, DL, MVT::i16); return true; } bool AMDGPUDAGToDAGISel::SelectMUBUFAddr64(SDValue Addr, SDValue &SRsrc, SDValue &VAddr, SDValue &SOffset, SDValue &Offset, SDValue &GLC, SDValue &SLC, SDValue &TFE) const { SDValue Ptr, Offen, Idxen, Addr64; // addr64 bit was removed for volcanic islands. 
if (Subtarget->getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS) return false; if (!SelectMUBUF(Addr, Ptr, VAddr, SOffset, Offset, Offen, Idxen, Addr64, GLC, SLC, TFE)) return false; ConstantSDNode *C = cast(Addr64); if (C->getSExtValue()) { SDLoc DL(Addr); const SITargetLowering& Lowering = *static_cast(getTargetLowering()); SRsrc = SDValue(Lowering.wrapAddr64Rsrc(*CurDAG, DL, Ptr), 0); return true; } return false; } bool AMDGPUDAGToDAGISel::SelectMUBUFAddr64(SDValue Addr, SDValue &SRsrc, SDValue &VAddr, SDValue &SOffset, SDValue &Offset, SDValue &SLC) const { SLC = CurDAG->getTargetConstant(0, SDLoc(Addr), MVT::i1); SDValue GLC, TFE; return SelectMUBUFAddr64(Addr, SRsrc, VAddr, SOffset, Offset, GLC, SLC, TFE); } static bool isStackPtrRelative(const MachinePointerInfo &PtrInfo) { auto PSV = PtrInfo.V.dyn_cast(); return PSV && PSV->isStack(); } std::pair AMDGPUDAGToDAGISel::foldFrameIndex(SDValue N) const { const MachineFunction &MF = CurDAG->getMachineFunction(); const SIMachineFunctionInfo *Info = MF.getInfo(); if (auto FI = dyn_cast(N)) { SDValue TFI = CurDAG->getTargetFrameIndex(FI->getIndex(), FI->getValueType(0)); // If we can resolve this to a frame index access, this is relative to the // frame pointer SGPR. return std::make_pair(TFI, CurDAG->getRegister(Info->getFrameOffsetReg(), MVT::i32)); } // If we don't know this private access is a local stack object, it needs to // be relative to the entry point's scratch wave offset register. return std::make_pair(N, CurDAG->getRegister(Info->getScratchWaveOffsetReg(), MVT::i32)); } bool AMDGPUDAGToDAGISel::SelectMUBUFScratchOffen(SDNode *Parent, SDValue Addr, SDValue &Rsrc, SDValue &VAddr, SDValue &SOffset, SDValue &ImmOffset) const { SDLoc DL(Addr); MachineFunction &MF = CurDAG->getMachineFunction(); const SIMachineFunctionInfo *Info = MF.getInfo(); Rsrc = CurDAG->getRegister(Info->getScratchRSrcReg(), MVT::v4i32); if (ConstantSDNode *CAddr = dyn_cast(Addr)) { unsigned Imm = CAddr->getZExtValue(); SDValue HighBits = CurDAG->getTargetConstant(Imm & ~4095, DL, MVT::i32); MachineSDNode *MovHighBits = CurDAG->getMachineNode(AMDGPU::V_MOV_B32_e32, DL, MVT::i32, HighBits); VAddr = SDValue(MovHighBits, 0); // In a call sequence, stores to the argument stack area are relative to the // stack pointer. const MachinePointerInfo &PtrInfo = cast(Parent)->getPointerInfo(); unsigned SOffsetReg = isStackPtrRelative(PtrInfo) ? Info->getStackPtrOffsetReg() : Info->getScratchWaveOffsetReg(); SOffset = CurDAG->getRegister(SOffsetReg, MVT::i32); ImmOffset = CurDAG->getTargetConstant(Imm & 4095, DL, MVT::i16); return true; } if (CurDAG->isBaseWithConstantOffset(Addr)) { // (add n0, c1) SDValue N0 = Addr.getOperand(0); SDValue N1 = Addr.getOperand(1); // Offsets in vaddr must be positive if range checking is enabled. // // The total computation of vaddr + soffset + offset must not overflow. If // vaddr is negative, even if offset is 0 the sgpr offset add will end up // overflowing. // // Prior to gfx9, MUBUF instructions with the vaddr offset enabled would // always perform a range check. If a negative vaddr base index was used, // this would fail the range check. The overall address computation would // compute a valid address, but this doesn't happen due to the range // check. For out-of-bounds MUBUF loads, a 0 is returned. // // Therefore it should be safe to fold any VGPR offset on gfx9 into the // MUBUF vaddr, but not on older subtargets which can only do this if the // sign bit is known 0. 
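// A minimal standalone sketch of the failure mode described above (buffer
// size and values are made up for illustration): pre-gfx9 the MUBUF range
// check sees vaddr as unsigned, so a negative base index is rejected and the
// load returns 0, even though the wrapped 32-bit sum would have been a valid
// in-bounds offset.
#include <cstdint>

static bool passesPreGfx9RangeCheck(int32_t BaseIndex, uint32_t ImmOffset,
                                    uint32_t BufferSize) {
  uint32_t VAddr = static_cast<uint32_t>(BaseIndex); // -16 becomes 0xFFFFFFF0
  (void)ImmOffset; // the hardware rejects the out-of-range vaddr component
  return VAddr < BufferSize;
}

// e.g. passesPreGfx9RangeCheck(-16, 32, 1024) is false and the load yields 0,
// although uint32_t(-16) + 32 == 16 would have been inside the buffer.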
ConstantSDNode *C1 = cast(N1); if (SIInstrInfo::isLegalMUBUFImmOffset(C1->getZExtValue()) && (!Subtarget->privateMemoryResourceIsRangeChecked() || CurDAG->SignBitIsZero(N0))) { std::tie(VAddr, SOffset) = foldFrameIndex(N0); ImmOffset = CurDAG->getTargetConstant(C1->getZExtValue(), DL, MVT::i16); return true; } } // (node) std::tie(VAddr, SOffset) = foldFrameIndex(Addr); ImmOffset = CurDAG->getTargetConstant(0, DL, MVT::i16); return true; } bool AMDGPUDAGToDAGISel::SelectMUBUFScratchOffset(SDNode *Parent, SDValue Addr, SDValue &SRsrc, SDValue &SOffset, SDValue &Offset) const { ConstantSDNode *CAddr = dyn_cast(Addr); if (!CAddr || !SIInstrInfo::isLegalMUBUFImmOffset(CAddr->getZExtValue())) return false; SDLoc DL(Addr); MachineFunction &MF = CurDAG->getMachineFunction(); const SIMachineFunctionInfo *Info = MF.getInfo(); SRsrc = CurDAG->getRegister(Info->getScratchRSrcReg(), MVT::v4i32); const MachinePointerInfo &PtrInfo = cast(Parent)->getPointerInfo(); unsigned SOffsetReg = isStackPtrRelative(PtrInfo) ? Info->getStackPtrOffsetReg() : Info->getScratchWaveOffsetReg(); // FIXME: Get from MachinePointerInfo? We should only be using the frame // offset if we know this is in a call sequence. SOffset = CurDAG->getRegister(SOffsetReg, MVT::i32); Offset = CurDAG->getTargetConstant(CAddr->getZExtValue(), DL, MVT::i16); return true; } bool AMDGPUDAGToDAGISel::SelectMUBUFOffset(SDValue Addr, SDValue &SRsrc, SDValue &SOffset, SDValue &Offset, SDValue &GLC, SDValue &SLC, SDValue &TFE) const { SDValue Ptr, VAddr, Offen, Idxen, Addr64; const SIInstrInfo *TII = static_cast(Subtarget->getInstrInfo()); if (!SelectMUBUF(Addr, Ptr, VAddr, SOffset, Offset, Offen, Idxen, Addr64, GLC, SLC, TFE)) return false; if (!cast(Offen)->getSExtValue() && !cast(Idxen)->getSExtValue() && !cast(Addr64)->getSExtValue()) { uint64_t Rsrc = TII->getDefaultRsrcDataFormat() | APInt::getAllOnesValue(32).getZExtValue(); // Size SDLoc DL(Addr); const SITargetLowering& Lowering = *static_cast(getTargetLowering()); SRsrc = SDValue(Lowering.buildRSRC(*CurDAG, DL, Ptr, 0, Rsrc), 0); return true; } return false; } bool AMDGPUDAGToDAGISel::SelectMUBUFOffset(SDValue Addr, SDValue &SRsrc, SDValue &Soffset, SDValue &Offset ) const { SDValue GLC, SLC, TFE; return SelectMUBUFOffset(Addr, SRsrc, Soffset, Offset, GLC, SLC, TFE); } bool AMDGPUDAGToDAGISel::SelectMUBUFOffset(SDValue Addr, SDValue &SRsrc, SDValue &Soffset, SDValue &Offset, SDValue &SLC) const { SDValue GLC, TFE; return SelectMUBUFOffset(Addr, SRsrc, Soffset, Offset, GLC, SLC, TFE); } bool AMDGPUDAGToDAGISel::SelectMUBUFConstant(SDValue Constant, SDValue &SOffset, SDValue &ImmOffset) const { SDLoc DL(Constant); const uint32_t Align = 4; const uint32_t MaxImm = alignDown(4095, Align); uint32_t Imm = cast(Constant)->getZExtValue(); uint32_t Overflow = 0; if (Imm > MaxImm) { if (Imm <= MaxImm + 64) { // Use an SOffset inline constant for 4..64 Overflow = Imm - MaxImm; Imm = MaxImm; } else { // Try to keep the same value in SOffset for adjacent loads, so that // the corresponding register contents can be re-used. // // Load values with all low-bits (except for alignment bits) set into // SOffset, so that a larger range of values can be covered using // s_movk_i32. // // Atomic operations fail to work correctly when individual address // components are unaligned, even if their sum is aligned. 
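// A minimal standalone sketch of the splitting performed just below, assuming
// only the constants visible in SelectMUBUFConstant (Align = 4 and
// MaxImm = alignDown(4095, 4) = 4092); returns {ImmOffset, SOffset}.
#include <cstdint>
#include <utility>

static std::pair<uint32_t, uint32_t> splitMUBUFConstant(uint32_t Imm) {
  const uint32_t Align = 4;
  const uint32_t MaxImm = 4095u & ~(Align - 1); // 4092

  if (Imm <= MaxImm)
    return {Imm, 0};               // fits entirely in the immediate field
  if (Imm <= MaxImm + 64)
    return {MaxImm, Imm - MaxImm}; // small overflow: SOffset inline constant

  // Keep all low bits (except the alignment bits) in the immediate so that
  // adjacent loads can share a single s_movk_i32 value in SOffset.
  uint32_t High = (Imm + Align) & ~4095u;
  uint32_t Low = (Imm + Align) & 4095u;
  return {Low, High - Align};
}

// e.g. splitMUBUFConstant(8200) == {12, 8188}, and 8188 + 12 == 8200; the
// selector additionally rejects a non-zero SOffset on SI/CI because of the
// hardware bug noted further down.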
uint32_t High = (Imm + Align) & ~4095; uint32_t Low = (Imm + Align) & 4095; Imm = Low; Overflow = High - Align; } } // There is a hardware bug in SI and CI which prevents address clamping in // MUBUF instructions from working correctly with SOffsets. The immediate // offset is unaffected. if (Overflow > 0 && Subtarget->getGeneration() <= AMDGPUSubtarget::SEA_ISLANDS) return false; ImmOffset = CurDAG->getTargetConstant(Imm, DL, MVT::i16); if (Overflow <= 64) SOffset = CurDAG->getTargetConstant(Overflow, DL, MVT::i32); else SOffset = SDValue(CurDAG->getMachineNode(AMDGPU::S_MOV_B32, DL, MVT::i32, CurDAG->getTargetConstant(Overflow, DL, MVT::i32)), 0); return true; } bool AMDGPUDAGToDAGISel::SelectMUBUFIntrinsicOffset(SDValue Offset, SDValue &SOffset, SDValue &ImmOffset) const { SDLoc DL(Offset); if (!isa(Offset)) return false; return SelectMUBUFConstant(Offset, SOffset, ImmOffset); } bool AMDGPUDAGToDAGISel::SelectMUBUFIntrinsicVOffset(SDValue Offset, SDValue &SOffset, SDValue &ImmOffset, SDValue &VOffset) const { SDLoc DL(Offset); // Don't generate an unnecessary voffset for constant offsets. if (isa(Offset)) { SDValue Tmp1, Tmp2; // When necessary, use a voffset in <= CI anyway to work around a hardware // bug. if (Subtarget->getGeneration() > AMDGPUSubtarget::SEA_ISLANDS || SelectMUBUFConstant(Offset, Tmp1, Tmp2)) return false; } if (CurDAG->isBaseWithConstantOffset(Offset)) { SDValue N0 = Offset.getOperand(0); SDValue N1 = Offset.getOperand(1); if (cast(N1)->getSExtValue() >= 0 && SelectMUBUFConstant(N1, SOffset, ImmOffset)) { VOffset = N0; return true; } } SOffset = CurDAG->getTargetConstant(0, DL, MVT::i32); ImmOffset = CurDAG->getTargetConstant(0, DL, MVT::i16); VOffset = Offset; return true; } template bool AMDGPUDAGToDAGISel::SelectFlatOffset(SDValue Addr, SDValue &VAddr, SDValue &Offset, SDValue &SLC) const { int64_t OffsetVal = 0; if (Subtarget->hasFlatInstOffsets() && CurDAG->isBaseWithConstantOffset(Addr)) { SDValue N0 = Addr.getOperand(0); SDValue N1 = Addr.getOperand(1); int64_t COffsetVal = cast(N1)->getSExtValue(); if ((IsSigned && isInt<13>(COffsetVal)) || (!IsSigned && isUInt<12>(COffsetVal))) { Addr = N0; OffsetVal = COffsetVal; } } VAddr = Addr; Offset = CurDAG->getTargetConstant(OffsetVal, SDLoc(), MVT::i16); SLC = CurDAG->getTargetConstant(0, SDLoc(), MVT::i1); return true; } bool AMDGPUDAGToDAGISel::SelectFlatAtomic(SDValue Addr, SDValue &VAddr, SDValue &Offset, SDValue &SLC) const { return SelectFlatOffset(Addr, VAddr, Offset, SLC); } bool AMDGPUDAGToDAGISel::SelectFlatAtomicSigned(SDValue Addr, SDValue &VAddr, SDValue &Offset, SDValue &SLC) const { return SelectFlatOffset(Addr, VAddr, Offset, SLC); } bool AMDGPUDAGToDAGISel::SelectSMRDOffset(SDValue ByteOffsetNode, SDValue &Offset, bool &Imm) const { // FIXME: Handle non-constant offsets. ConstantSDNode *C = dyn_cast(ByteOffsetNode); if (!C) return false; SDLoc SL(ByteOffsetNode); GCNSubtarget::Generation Gen = Subtarget->getGeneration(); int64_t ByteOffset = C->getSExtValue(); int64_t EncodedOffset = AMDGPU::getSMRDEncodedOffset(*Subtarget, ByteOffset); if (AMDGPU::isLegalSMRDImmOffset(*Subtarget, ByteOffset)) { Offset = CurDAG->getTargetConstant(EncodedOffset, SL, MVT::i32); Imm = true; return true; } if (!isUInt<32>(EncodedOffset) || !isUInt<32>(ByteOffset)) return false; if (Gen == AMDGPUSubtarget::SEA_ISLANDS && isUInt<32>(EncodedOffset)) { // 32-bit Immediates are supported on Sea Islands. 
  Offset = CurDAG->getTargetConstant(EncodedOffset, SL, MVT::i32);
  } else {
    SDValue C32Bit = CurDAG->getTargetConstant(ByteOffset, SL, MVT::i32);
    Offset = SDValue(CurDAG->getMachineNode(AMDGPU::S_MOV_B32, SL, MVT::i32,
                                            C32Bit), 0);
  }
  Imm = false;
  return true;
}

SDValue AMDGPUDAGToDAGISel::Expand32BitAddress(SDValue Addr) const {
  if (Addr.getValueType() != MVT::i32)
    return Addr;

  // Zero-extend a 32-bit address.
  SDLoc SL(Addr);

  const MachineFunction &MF = CurDAG->getMachineFunction();
  const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
  unsigned AddrHiVal = Info->get32BitAddressHighBits();
  SDValue AddrHi = CurDAG->getTargetConstant(AddrHiVal, SL, MVT::i32);

  const SDValue Ops[] = {
    CurDAG->getTargetConstant(AMDGPU::SReg_64_XEXECRegClassID, SL, MVT::i32),
    Addr,
    CurDAG->getTargetConstant(AMDGPU::sub0, SL, MVT::i32),
    SDValue(CurDAG->getMachineNode(AMDGPU::S_MOV_B32, SL, MVT::i32, AddrHi),
            0),
    CurDAG->getTargetConstant(AMDGPU::sub1, SL, MVT::i32),
  };

  return SDValue(CurDAG->getMachineNode(AMDGPU::REG_SEQUENCE, SL, MVT::i64,
                                        Ops), 0);
}

bool AMDGPUDAGToDAGISel::SelectSMRD(SDValue Addr, SDValue &SBase,
                                    SDValue &Offset, bool &Imm) const {
  SDLoc SL(Addr);

-  if (CurDAG->isBaseWithConstantOffset(Addr)) {
+  // A 32-bit (address + offset) should not cause unsigned 32-bit integer
+  // wraparound, because s_load instructions perform the addition in 64 bits.
+  if ((Addr.getValueType() != MVT::i32 ||
+       Addr->getFlags().hasNoUnsignedWrap()) &&
+      CurDAG->isBaseWithConstantOffset(Addr)) {
    SDValue N0 = Addr.getOperand(0);
    SDValue N1 = Addr.getOperand(1);

    if (SelectSMRDOffset(N1, Offset, Imm)) {
      SBase = Expand32BitAddress(N0);
      return true;
    }
  }
  SBase = Expand32BitAddress(Addr);
  Offset = CurDAG->getTargetConstant(0, SL, MVT::i32);
  Imm = true;
  return true;
}

bool AMDGPUDAGToDAGISel::SelectSMRDImm(SDValue Addr, SDValue &SBase,
                                       SDValue &Offset) const {
  bool Imm;
  return SelectSMRD(Addr, SBase, Offset, Imm) && Imm;
}

bool AMDGPUDAGToDAGISel::SelectSMRDImm32(SDValue Addr, SDValue &SBase,
                                         SDValue &Offset) const {
  if (Subtarget->getGeneration() != AMDGPUSubtarget::SEA_ISLANDS)
    return false;

  bool Imm;
  if (!SelectSMRD(Addr, SBase, Offset, Imm))
    return false;

  return !Imm && isa<ConstantSDNode>(Offset);
}

bool AMDGPUDAGToDAGISel::SelectSMRDSgpr(SDValue Addr, SDValue &SBase,
                                        SDValue &Offset) const {
  bool Imm;
  return SelectSMRD(Addr, SBase, Offset, Imm) && !Imm &&
         !isa<ConstantSDNode>(Offset);
}

bool AMDGPUDAGToDAGISel::SelectSMRDBufferImm(SDValue Addr,
                                             SDValue &Offset) const {
  bool Imm;
  return SelectSMRDOffset(Addr, Offset, Imm) && Imm;
}

bool AMDGPUDAGToDAGISel::SelectSMRDBufferImm32(SDValue Addr,
                                               SDValue &Offset) const {
  if (Subtarget->getGeneration() != AMDGPUSubtarget::SEA_ISLANDS)
    return false;

  bool Imm;
  if (!SelectSMRDOffset(Addr, Offset, Imm))
    return false;

  return !Imm && isa<ConstantSDNode>(Offset);
}

bool AMDGPUDAGToDAGISel::SelectMOVRELOffset(SDValue Index,
                                            SDValue &Base,
                                            SDValue &Offset) const {
  SDLoc DL(Index);

  if (CurDAG->isBaseWithConstantOffset(Index)) {
    SDValue N0 = Index.getOperand(0);
    SDValue N1 = Index.getOperand(1);
    ConstantSDNode *C1 = cast<ConstantSDNode>(N1);

    // (add n0, c0)
    Base = N0;
    Offset = CurDAG->getTargetConstant(C1->getZExtValue(), DL, MVT::i32);
    return true;
  }

  if (isa<ConstantSDNode>(Index))
    return false;

  Base = Index;
  Offset = CurDAG->getTargetConstant(0, DL, MVT::i32);
  return true;
}

SDNode *AMDGPUDAGToDAGISel::getS_BFE(unsigned Opcode, const SDLoc &DL,
                                     SDValue Val, uint32_t Offset,
                                     uint32_t Width) {
  // Transformation function, pack the offset and width of a BFE into
  // the format expected by the S_BFE_I32 / S_BFE_U32.
In the second // source, bits [5:0] contain the offset and bits [22:16] the width. uint32_t PackedVal = Offset | (Width << 16); SDValue PackedConst = CurDAG->getTargetConstant(PackedVal, DL, MVT::i32); return CurDAG->getMachineNode(Opcode, DL, MVT::i32, Val, PackedConst); } void AMDGPUDAGToDAGISel::SelectS_BFEFromShifts(SDNode *N) { // "(a << b) srl c)" ---> "BFE_U32 a, (c-b), (32-c) // "(a << b) sra c)" ---> "BFE_I32 a, (c-b), (32-c) // Predicate: 0 < b <= c < 32 const SDValue &Shl = N->getOperand(0); ConstantSDNode *B = dyn_cast(Shl->getOperand(1)); ConstantSDNode *C = dyn_cast(N->getOperand(1)); if (B && C) { uint32_t BVal = B->getZExtValue(); uint32_t CVal = C->getZExtValue(); if (0 < BVal && BVal <= CVal && CVal < 32) { bool Signed = N->getOpcode() == ISD::SRA; unsigned Opcode = Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32; ReplaceNode(N, getS_BFE(Opcode, SDLoc(N), Shl.getOperand(0), CVal - BVal, 32 - CVal)); return; } } SelectCode(N); } void AMDGPUDAGToDAGISel::SelectS_BFE(SDNode *N) { switch (N->getOpcode()) { case ISD::AND: if (N->getOperand(0).getOpcode() == ISD::SRL) { // "(a srl b) & mask" ---> "BFE_U32 a, b, popcount(mask)" // Predicate: isMask(mask) const SDValue &Srl = N->getOperand(0); ConstantSDNode *Shift = dyn_cast(Srl.getOperand(1)); ConstantSDNode *Mask = dyn_cast(N->getOperand(1)); if (Shift && Mask) { uint32_t ShiftVal = Shift->getZExtValue(); uint32_t MaskVal = Mask->getZExtValue(); if (isMask_32(MaskVal)) { uint32_t WidthVal = countPopulation(MaskVal); ReplaceNode(N, getS_BFE(AMDGPU::S_BFE_U32, SDLoc(N), Srl.getOperand(0), ShiftVal, WidthVal)); return; } } } break; case ISD::SRL: if (N->getOperand(0).getOpcode() == ISD::AND) { // "(a & mask) srl b)" ---> "BFE_U32 a, b, popcount(mask >> b)" // Predicate: isMask(mask >> b) const SDValue &And = N->getOperand(0); ConstantSDNode *Shift = dyn_cast(N->getOperand(1)); ConstantSDNode *Mask = dyn_cast(And->getOperand(1)); if (Shift && Mask) { uint32_t ShiftVal = Shift->getZExtValue(); uint32_t MaskVal = Mask->getZExtValue() >> ShiftVal; if (isMask_32(MaskVal)) { uint32_t WidthVal = countPopulation(MaskVal); ReplaceNode(N, getS_BFE(AMDGPU::S_BFE_U32, SDLoc(N), And.getOperand(0), ShiftVal, WidthVal)); return; } } } else if (N->getOperand(0).getOpcode() == ISD::SHL) { SelectS_BFEFromShifts(N); return; } break; case ISD::SRA: if (N->getOperand(0).getOpcode() == ISD::SHL) { SelectS_BFEFromShifts(N); return; } break; case ISD::SIGN_EXTEND_INREG: { // sext_inreg (srl x, 16), i8 -> bfe_i32 x, 16, 8 SDValue Src = N->getOperand(0); if (Src.getOpcode() != ISD::SRL) break; const ConstantSDNode *Amt = dyn_cast(Src.getOperand(1)); if (!Amt) break; unsigned Width = cast(N->getOperand(1))->getVT().getSizeInBits(); ReplaceNode(N, getS_BFE(AMDGPU::S_BFE_I32, SDLoc(N), Src.getOperand(0), Amt->getZExtValue(), Width)); return; } } SelectCode(N); } bool AMDGPUDAGToDAGISel::isCBranchSCC(const SDNode *N) const { assert(N->getOpcode() == ISD::BRCOND); if (!N->hasOneUse()) return false; SDValue Cond = N->getOperand(1); if (Cond.getOpcode() == ISD::CopyToReg) Cond = Cond.getOperand(2); if (Cond.getOpcode() != ISD::SETCC || !Cond.hasOneUse()) return false; MVT VT = Cond.getOperand(0).getSimpleValueType(); if (VT == MVT::i32) return true; if (VT == MVT::i64) { auto ST = static_cast(Subtarget); ISD::CondCode CC = cast(Cond.getOperand(2))->get(); return (CC == ISD::SETEQ || CC == ISD::SETNE) && ST->hasScalarCompareEq64(); } return false; } void AMDGPUDAGToDAGISel::SelectBRCOND(SDNode *N) { SDValue Cond = N->getOperand(1); if (Cond.isUndef()) { 
CurDAG->SelectNodeTo(N, AMDGPU::SI_BR_UNDEF, MVT::Other, N->getOperand(2), N->getOperand(0)); return; } bool UseSCCBr = isCBranchSCC(N) && isUniformBr(N); unsigned BrOp = UseSCCBr ? AMDGPU::S_CBRANCH_SCC1 : AMDGPU::S_CBRANCH_VCCNZ; unsigned CondReg = UseSCCBr ? AMDGPU::SCC : AMDGPU::VCC; SDLoc SL(N); if (!UseSCCBr) { // This is the case that we are selecting to S_CBRANCH_VCCNZ. We have not // analyzed what generates the vcc value, so we do not know whether vcc // bits for disabled lanes are 0. Thus we need to mask out bits for // disabled lanes. // // For the case that we select S_CBRANCH_SCC1 and it gets // changed to S_CBRANCH_VCCNZ in SIFixSGPRCopies, SIFixSGPRCopies calls // SIInstrInfo::moveToVALU which inserts the S_AND). // // We could add an analysis of what generates the vcc value here and omit // the S_AND when is unnecessary. But it would be better to add a separate // pass after SIFixSGPRCopies to do the unnecessary S_AND removal, so it // catches both cases. Cond = SDValue(CurDAG->getMachineNode(AMDGPU::S_AND_B64, SL, MVT::i1, CurDAG->getRegister(AMDGPU::EXEC, MVT::i1), Cond), 0); } SDValue VCC = CurDAG->getCopyToReg(N->getOperand(0), SL, CondReg, Cond); CurDAG->SelectNodeTo(N, BrOp, MVT::Other, N->getOperand(2), // Basic Block VCC.getValue(0)); } void AMDGPUDAGToDAGISel::SelectFMAD_FMA(SDNode *N) { MVT VT = N->getSimpleValueType(0); bool IsFMA = N->getOpcode() == ISD::FMA; if (VT != MVT::f32 || (!Subtarget->hasMadMixInsts() && !Subtarget->hasFmaMixInsts()) || ((IsFMA && Subtarget->hasMadMixInsts()) || (!IsFMA && Subtarget->hasFmaMixInsts()))) { SelectCode(N); return; } SDValue Src0 = N->getOperand(0); SDValue Src1 = N->getOperand(1); SDValue Src2 = N->getOperand(2); unsigned Src0Mods, Src1Mods, Src2Mods; // Avoid using v_mad_mix_f32/v_fma_mix_f32 unless there is actually an operand // using the conversion from f16. bool Sel0 = SelectVOP3PMadMixModsImpl(Src0, Src0, Src0Mods); bool Sel1 = SelectVOP3PMadMixModsImpl(Src1, Src1, Src1Mods); bool Sel2 = SelectVOP3PMadMixModsImpl(Src2, Src2, Src2Mods); assert((IsFMA || !Subtarget->hasFP32Denormals()) && "fmad selected with denormals enabled"); // TODO: We can select this with f32 denormals enabled if all the sources are // converted from f16 (in which case fmad isn't legal). if (Sel0 || Sel1 || Sel2) { // For dummy operands. SDValue Zero = CurDAG->getTargetConstant(0, SDLoc(), MVT::i32); SDValue Ops[] = { CurDAG->getTargetConstant(Src0Mods, SDLoc(), MVT::i32), Src0, CurDAG->getTargetConstant(Src1Mods, SDLoc(), MVT::i32), Src1, CurDAG->getTargetConstant(Src2Mods, SDLoc(), MVT::i32), Src2, CurDAG->getTargetConstant(0, SDLoc(), MVT::i1), Zero, Zero }; CurDAG->SelectNodeTo(N, IsFMA ? AMDGPU::V_FMA_MIX_F32 : AMDGPU::V_MAD_MIX_F32, MVT::f32, Ops); } else { SelectCode(N); } } // This is here because there isn't a way to use the generated sub0_sub1 as the // subreg index to EXTRACT_SUBREG in tablegen. void AMDGPUDAGToDAGISel::SelectATOMIC_CMP_SWAP(SDNode *N) { MemSDNode *Mem = cast(N); unsigned AS = Mem->getAddressSpace(); if (AS == AMDGPUASI.FLAT_ADDRESS) { SelectCode(N); return; } MVT VT = N->getSimpleValueType(0); bool Is32 = (VT == MVT::i32); SDLoc SL(N); MachineSDNode *CmpSwap = nullptr; if (Subtarget->hasAddr64()) { SDValue SRsrc, VAddr, SOffset, Offset, SLC; if (SelectMUBUFAddr64(Mem->getBasePtr(), SRsrc, VAddr, SOffset, Offset, SLC)) { unsigned Opcode = Is32 ? AMDGPU::BUFFER_ATOMIC_CMPSWAP_ADDR64_RTN : AMDGPU::BUFFER_ATOMIC_CMPSWAP_X2_ADDR64_RTN; SDValue CmpVal = Mem->getOperand(2); // XXX - Do we care about glue operands? 
SDValue Ops[] = { CmpVal, VAddr, SRsrc, SOffset, Offset, SLC, Mem->getChain() }; CmpSwap = CurDAG->getMachineNode(Opcode, SL, Mem->getVTList(), Ops); } } if (!CmpSwap) { SDValue SRsrc, SOffset, Offset, SLC; if (SelectMUBUFOffset(Mem->getBasePtr(), SRsrc, SOffset, Offset, SLC)) { unsigned Opcode = Is32 ? AMDGPU::BUFFER_ATOMIC_CMPSWAP_OFFSET_RTN : AMDGPU::BUFFER_ATOMIC_CMPSWAP_X2_OFFSET_RTN; SDValue CmpVal = Mem->getOperand(2); SDValue Ops[] = { CmpVal, SRsrc, SOffset, Offset, SLC, Mem->getChain() }; CmpSwap = CurDAG->getMachineNode(Opcode, SL, Mem->getVTList(), Ops); } } if (!CmpSwap) { SelectCode(N); return; } MachineSDNode::mmo_iterator MMOs = MF->allocateMemRefsArray(1); *MMOs = Mem->getMemOperand(); CmpSwap->setMemRefs(MMOs, MMOs + 1); unsigned SubReg = Is32 ? AMDGPU::sub0 : AMDGPU::sub0_sub1; SDValue Extract = CurDAG->getTargetExtractSubreg(SubReg, SL, VT, SDValue(CmpSwap, 0)); ReplaceUses(SDValue(N, 0), Extract); ReplaceUses(SDValue(N, 1), SDValue(CmpSwap, 1)); CurDAG->RemoveDeadNode(N); } bool AMDGPUDAGToDAGISel::SelectVOP3ModsImpl(SDValue In, SDValue &Src, unsigned &Mods) const { Mods = 0; Src = In; if (Src.getOpcode() == ISD::FNEG) { Mods |= SISrcMods::NEG; Src = Src.getOperand(0); } if (Src.getOpcode() == ISD::FABS) { Mods |= SISrcMods::ABS; Src = Src.getOperand(0); } return true; } bool AMDGPUDAGToDAGISel::SelectVOP3Mods(SDValue In, SDValue &Src, SDValue &SrcMods) const { unsigned Mods; if (SelectVOP3ModsImpl(In, Src, Mods)) { SrcMods = CurDAG->getTargetConstant(Mods, SDLoc(In), MVT::i32); return true; } return false; } bool AMDGPUDAGToDAGISel::SelectVOP3Mods_NNaN(SDValue In, SDValue &Src, SDValue &SrcMods) const { SelectVOP3Mods(In, Src, SrcMods); return isNoNanSrc(Src); } bool AMDGPUDAGToDAGISel::SelectVOP3NoMods(SDValue In, SDValue &Src) const { if (In.getOpcode() == ISD::FABS || In.getOpcode() == ISD::FNEG) return false; Src = In; return true; } bool AMDGPUDAGToDAGISel::SelectVOP3Mods0(SDValue In, SDValue &Src, SDValue &SrcMods, SDValue &Clamp, SDValue &Omod) const { SDLoc DL(In); Clamp = CurDAG->getTargetConstant(0, DL, MVT::i1); Omod = CurDAG->getTargetConstant(0, DL, MVT::i1); return SelectVOP3Mods(In, Src, SrcMods); } bool AMDGPUDAGToDAGISel::SelectVOP3Mods0Clamp0OMod(SDValue In, SDValue &Src, SDValue &SrcMods, SDValue &Clamp, SDValue &Omod) const { Clamp = Omod = CurDAG->getTargetConstant(0, SDLoc(In), MVT::i32); return SelectVOP3Mods(In, Src, SrcMods); } bool AMDGPUDAGToDAGISel::SelectVOP3OMods(SDValue In, SDValue &Src, SDValue &Clamp, SDValue &Omod) const { Src = In; SDLoc DL(In); Clamp = CurDAG->getTargetConstant(0, DL, MVT::i1); Omod = CurDAG->getTargetConstant(0, DL, MVT::i1); return true; } static SDValue stripBitcast(SDValue Val) { return Val.getOpcode() == ISD::BITCAST ? Val.getOperand(0) : Val; } // Figure out if this is really an extract of the high 16-bits of a dword. static bool isExtractHiElt(SDValue In, SDValue &Out) { In = stripBitcast(In); if (In.getOpcode() != ISD::TRUNCATE) return false; SDValue Srl = In.getOperand(0); if (Srl.getOpcode() == ISD::SRL) { if (ConstantSDNode *ShiftAmt = dyn_cast(Srl.getOperand(1))) { if (ShiftAmt->getZExtValue() == 16) { Out = stripBitcast(Srl.getOperand(0)); return true; } } } return false; } // Look through operations that obscure just looking at the low 16-bits of the // same register. 
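// A minimal standalone sketch of the pattern isExtractHiElt matches above and
// stripExtractLoElt looks through below: the "high element" of a 32-bit
// register is just a shift right by 16 followed by a truncate, and the "low
// element" is a plain truncate.
#include <cstdint>

static uint16_t extractHiElt(uint32_t Reg) { return uint16_t(Reg >> 16); }
static uint16_t extractLoElt(uint32_t Reg) { return uint16_t(Reg); }

// e.g. for Reg == 0xDEADBEEF, extractHiElt(Reg) == 0xDEAD and
// extractLoElt(Reg) == 0xBEEF.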
static SDValue stripExtractLoElt(SDValue In) { if (In.getOpcode() == ISD::TRUNCATE) { SDValue Src = In.getOperand(0); if (Src.getValueType().getSizeInBits() == 32) return stripBitcast(Src); } return In; } bool AMDGPUDAGToDAGISel::SelectVOP3PMods(SDValue In, SDValue &Src, SDValue &SrcMods) const { unsigned Mods = 0; Src = In; if (Src.getOpcode() == ISD::FNEG) { Mods ^= (SISrcMods::NEG | SISrcMods::NEG_HI); Src = Src.getOperand(0); } if (Src.getOpcode() == ISD::BUILD_VECTOR) { unsigned VecMods = Mods; SDValue Lo = stripBitcast(Src.getOperand(0)); SDValue Hi = stripBitcast(Src.getOperand(1)); if (Lo.getOpcode() == ISD::FNEG) { Lo = stripBitcast(Lo.getOperand(0)); Mods ^= SISrcMods::NEG; } if (Hi.getOpcode() == ISD::FNEG) { Hi = stripBitcast(Hi.getOperand(0)); Mods ^= SISrcMods::NEG_HI; } if (isExtractHiElt(Lo, Lo)) Mods |= SISrcMods::OP_SEL_0; if (isExtractHiElt(Hi, Hi)) Mods |= SISrcMods::OP_SEL_1; Lo = stripExtractLoElt(Lo); Hi = stripExtractLoElt(Hi); if (Lo == Hi && !isInlineImmediate(Lo.getNode())) { // Really a scalar input. Just select from the low half of the register to // avoid packing. Src = Lo; SrcMods = CurDAG->getTargetConstant(Mods, SDLoc(In), MVT::i32); return true; } Mods = VecMods; } // Packed instructions do not have abs modifiers. Mods |= SISrcMods::OP_SEL_1; SrcMods = CurDAG->getTargetConstant(Mods, SDLoc(In), MVT::i32); return true; } bool AMDGPUDAGToDAGISel::SelectVOP3PMods0(SDValue In, SDValue &Src, SDValue &SrcMods, SDValue &Clamp) const { SDLoc SL(In); // FIXME: Handle clamp and op_sel Clamp = CurDAG->getTargetConstant(0, SL, MVT::i32); return SelectVOP3PMods(In, Src, SrcMods); } bool AMDGPUDAGToDAGISel::SelectVOP3OpSel(SDValue In, SDValue &Src, SDValue &SrcMods) const { Src = In; // FIXME: Handle op_sel SrcMods = CurDAG->getTargetConstant(0, SDLoc(In), MVT::i32); return true; } bool AMDGPUDAGToDAGISel::SelectVOP3OpSel0(SDValue In, SDValue &Src, SDValue &SrcMods, SDValue &Clamp) const { SDLoc SL(In); // FIXME: Handle clamp Clamp = CurDAG->getTargetConstant(0, SL, MVT::i32); return SelectVOP3OpSel(In, Src, SrcMods); } bool AMDGPUDAGToDAGISel::SelectVOP3OpSelMods(SDValue In, SDValue &Src, SDValue &SrcMods) const { // FIXME: Handle op_sel return SelectVOP3Mods(In, Src, SrcMods); } bool AMDGPUDAGToDAGISel::SelectVOP3OpSelMods0(SDValue In, SDValue &Src, SDValue &SrcMods, SDValue &Clamp) const { SDLoc SL(In); // FIXME: Handle clamp Clamp = CurDAG->getTargetConstant(0, SL, MVT::i32); return SelectVOP3OpSelMods(In, Src, SrcMods); } // The return value is not whether the match is possible (which it always is), // but whether or not it a conversion is really used. bool AMDGPUDAGToDAGISel::SelectVOP3PMadMixModsImpl(SDValue In, SDValue &Src, unsigned &Mods) const { Mods = 0; SelectVOP3ModsImpl(In, Src, Mods); if (Src.getOpcode() == ISD::FP_EXTEND) { Src = Src.getOperand(0); assert(Src.getValueType() == MVT::f16); Src = stripBitcast(Src); // Be careful about folding modifiers if we already have an abs. fneg is // applied last, so we don't want to apply an earlier fneg. if ((Mods & SISrcMods::ABS) == 0) { unsigned ModsTmp; SelectVOP3ModsImpl(Src, Src, ModsTmp); if ((ModsTmp & SISrcMods::NEG) != 0) Mods ^= SISrcMods::NEG; if ((ModsTmp & SISrcMods::ABS) != 0) Mods |= SISrcMods::ABS; } // op_sel/op_sel_hi decide the source type and source. // If the source's op_sel_hi is set, it indicates to do a conversion from fp16. // If the sources's op_sel is set, it picks the high half of the source // register. 
Mods |= SISrcMods::OP_SEL_1; if (isExtractHiElt(Src, Src)) { Mods |= SISrcMods::OP_SEL_0; // TODO: Should we try to look for neg/abs here? } return true; } return false; } bool AMDGPUDAGToDAGISel::SelectVOP3PMadMixMods(SDValue In, SDValue &Src, SDValue &SrcMods) const { unsigned Mods = 0; SelectVOP3PMadMixModsImpl(In, Src, Mods); SrcMods = CurDAG->getTargetConstant(Mods, SDLoc(In), MVT::i32); return true; } // TODO: Can we identify things like v_mad_mixhi_f16? bool AMDGPUDAGToDAGISel::SelectHi16Elt(SDValue In, SDValue &Src) const { if (In.isUndef()) { Src = In; return true; } if (ConstantSDNode *C = dyn_cast(In)) { SDLoc SL(In); SDValue K = CurDAG->getTargetConstant(C->getZExtValue() << 16, SL, MVT::i32); MachineSDNode *MovK = CurDAG->getMachineNode(AMDGPU::V_MOV_B32_e32, SL, MVT::i32, K); Src = SDValue(MovK, 0); return true; } if (ConstantFPSDNode *C = dyn_cast(In)) { SDLoc SL(In); SDValue K = CurDAG->getTargetConstant( C->getValueAPF().bitcastToAPInt().getZExtValue() << 16, SL, MVT::i32); MachineSDNode *MovK = CurDAG->getMachineNode(AMDGPU::V_MOV_B32_e32, SL, MVT::i32, K); Src = SDValue(MovK, 0); return true; } return isExtractHiElt(In, Src); } void AMDGPUDAGToDAGISel::PostprocessISelDAG() { const AMDGPUTargetLowering& Lowering = *static_cast(getTargetLowering()); bool IsModified = false; do { IsModified = false; // Go over all selected nodes and try to fold them a bit more SelectionDAG::allnodes_iterator Position = CurDAG->allnodes_begin(); while (Position != CurDAG->allnodes_end()) { SDNode *Node = &*Position++; MachineSDNode *MachineNode = dyn_cast(Node); if (!MachineNode) continue; SDNode *ResNode = Lowering.PostISelFolding(MachineNode, *CurDAG); if (ResNode != Node) { if (ResNode) ReplaceUses(Node, ResNode); IsModified = true; } } CurDAG->RemoveDeadNodes(); } while (IsModified); } bool R600DAGToDAGISel::runOnMachineFunction(MachineFunction &MF) { Subtarget = &MF.getSubtarget(); return SelectionDAGISel::runOnMachineFunction(MF); } bool R600DAGToDAGISel::isConstantLoad(const MemSDNode *N, int CbId) const { if (!N->readMem()) return false; if (CbId == -1) return N->getAddressSpace() == AMDGPUASI.CONSTANT_ADDRESS || N->getAddressSpace() == AMDGPUASI.CONSTANT_ADDRESS_32BIT; return N->getAddressSpace() == AMDGPUASI.CONSTANT_BUFFER_0 + CbId; } bool R600DAGToDAGISel::SelectGlobalValueConstantOffset(SDValue Addr, SDValue& IntPtr) { if (ConstantSDNode *Cst = dyn_cast(Addr)) { IntPtr = CurDAG->getIntPtrConstant(Cst->getZExtValue() / 4, SDLoc(Addr), true); return true; } return false; } bool R600DAGToDAGISel::SelectGlobalValueVariableOffset(SDValue Addr, SDValue& BaseReg, SDValue &Offset) { if (!isa(Addr)) { BaseReg = Addr; Offset = CurDAG->getIntPtrConstant(0, SDLoc(Addr), true); return true; } return false; } void R600DAGToDAGISel::Select(SDNode *N) { unsigned int Opc = N->getOpcode(); if (N->isMachineOpcode()) { N->setNodeId(-1); return; // Already selected. } switch (Opc) { default: break; case AMDGPUISD::BUILD_VERTICAL_VECTOR: case ISD::SCALAR_TO_VECTOR: case ISD::BUILD_VECTOR: { EVT VT = N->getValueType(0); unsigned NumVectorElts = VT.getVectorNumElements(); unsigned RegClassID; // BUILD_VECTOR was lowered into an IMPLICIT_DEF + 4 INSERT_SUBREG // that adds a 128 bits reg copy when going through TwoAddressInstructions // pass. We want to avoid 128 bits copies as much as possible because they // can't be bundled by our scheduler. 
    switch(NumVectorElts) {
    case 2:
      RegClassID = R600::R600_Reg64RegClassID;
      break;
    case 4:
      if (Opc == AMDGPUISD::BUILD_VERTICAL_VECTOR)
        RegClassID = R600::R600_Reg128VerticalRegClassID;
      else
        RegClassID = R600::R600_Reg128RegClassID;
      break;
    default: llvm_unreachable("Do not know how to lower this BUILD_VECTOR");
    }
    SelectBuildVector(N, RegClassID);
    return;
  }
  }

  SelectCode(N);
}

bool R600DAGToDAGISel::SelectADDRIndirect(SDValue Addr, SDValue &Base,
                                          SDValue &Offset) {
  ConstantSDNode *C;
  SDLoc DL(Addr);

  if ((C = dyn_cast<ConstantSDNode>(Addr))) {
    Base = CurDAG->getRegister(R600::INDIRECT_BASE_ADDR, MVT::i32);
    Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);
  } else if ((Addr.getOpcode() == AMDGPUISD::DWORDADDR) &&
             (C = dyn_cast<ConstantSDNode>(Addr.getOperand(0)))) {
    Base = CurDAG->getRegister(R600::INDIRECT_BASE_ADDR, MVT::i32);
    Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);
  } else if ((Addr.getOpcode() == ISD::ADD || Addr.getOpcode() == ISD::OR) &&
             (C = dyn_cast<ConstantSDNode>(Addr.getOperand(1)))) {
    Base = Addr.getOperand(0);
    Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);
  } else {
    Base = Addr;
    Offset = CurDAG->getTargetConstant(0, DL, MVT::i32);
  }

  return true;
}

bool R600DAGToDAGISel::SelectADDRVTX_READ(SDValue Addr, SDValue &Base,
                                          SDValue &Offset) {
  ConstantSDNode *IMMOffset;

  if (Addr.getOpcode() == ISD::ADD
      && (IMMOffset = dyn_cast<ConstantSDNode>(Addr.getOperand(1)))
      && isInt<16>(IMMOffset->getZExtValue())) {

    Base = Addr.getOperand(0);
    Offset = CurDAG->getTargetConstant(IMMOffset->getZExtValue(), SDLoc(Addr),
                                       MVT::i32);
    return true;
  // If the pointer address is constant, we can move it to the offset field.
  } else if ((IMMOffset = dyn_cast<ConstantSDNode>(Addr))
             && isInt<16>(IMMOffset->getZExtValue())) {
    Base = CurDAG->getCopyFromReg(CurDAG->getEntryNode(),
                                  SDLoc(CurDAG->getEntryNode()),
                                  R600::ZERO, MVT::i32);
    Offset = CurDAG->getTargetConstant(IMMOffset->getZExtValue(), SDLoc(Addr),
                                       MVT::i32);
    return true;
  }

  // Default case, no offset
  Base = Addr;
  Offset = CurDAG->getTargetConstant(0, SDLoc(Addr), MVT::i32);
  return true;
}

Index: vendor/llvm/dist-release_70/lib/Target/ARM/ARMFrameLowering.cpp
===================================================================
--- vendor/llvm/dist-release_70/lib/Target/ARM/ARMFrameLowering.cpp	(revision 338574)
+++ vendor/llvm/dist-release_70/lib/Target/ARM/ARMFrameLowering.cpp	(revision 338575)
@@ -1,2500 +1,2501 @@
//===- ARMFrameLowering.cpp - ARM Frame Information -----------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains the ARM implementation of TargetFrameLowering class.
// //===----------------------------------------------------------------------===// #include "ARMFrameLowering.h" #include "ARMBaseInstrInfo.h" #include "ARMBaseRegisterInfo.h" #include "ARMConstantPoolValue.h" #include "ARMMachineFunctionInfo.h" #include "ARMSubtarget.h" #include "MCTargetDesc/ARMAddressingModes.h" #include "MCTargetDesc/ARMBaseInfo.h" #include "Utils/ARMBaseInfo.h" #include "llvm/ADT/BitVector.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/SmallVector.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/MachineConstantPool.h" #include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineInstr.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/CodeGen/MachineModuleInfo.h" #include "llvm/CodeGen/MachineOperand.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/RegisterScavenging.h" #include "llvm/CodeGen/TargetInstrInfo.h" #include "llvm/CodeGen/TargetOpcodes.h" #include "llvm/CodeGen/TargetRegisterInfo.h" #include "llvm/CodeGen/TargetSubtargetInfo.h" #include "llvm/IR/Attributes.h" #include "llvm/IR/CallingConv.h" #include "llvm/IR/DebugLoc.h" #include "llvm/IR/Function.h" #include "llvm/MC/MCContext.h" #include "llvm/MC/MCDwarf.h" #include "llvm/MC/MCInstrDesc.h" #include "llvm/MC/MCRegisterInfo.h" #include "llvm/Support/CodeGen.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/Compiler.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Target/TargetOptions.h" #include #include #include #include #include #include #include #define DEBUG_TYPE "arm-frame-lowering" using namespace llvm; static cl::opt SpillAlignedNEONRegs("align-neon-spills", cl::Hidden, cl::init(true), cl::desc("Align ARM NEON spills in prolog and epilog")); static MachineBasicBlock::iterator skipAlignedDPRCS2Spills(MachineBasicBlock::iterator MI, unsigned NumAlignedDPRCS2Regs); ARMFrameLowering::ARMFrameLowering(const ARMSubtarget &sti) : TargetFrameLowering(StackGrowsDown, sti.getStackAlignment(), 0, 4), STI(sti) {} bool ARMFrameLowering::noFramePointerElim(const MachineFunction &MF) const { // iOS always has a FP for backtracking, force other targets to keep their FP // when doing FastISel. The emitted code is currently superior, and in cases // like test-suite's lencod FastISel isn't quite correct when FP is eliminated. return TargetFrameLowering::noFramePointerElim(MF) || MF.getSubtarget().useFastISel(); } /// Returns true if the target can safely skip saving callee-saved registers /// for noreturn nounwind functions. bool ARMFrameLowering::enableCalleeSaveSkip(const MachineFunction &MF) const { assert(MF.getFunction().hasFnAttribute(Attribute::NoReturn) && MF.getFunction().hasFnAttribute(Attribute::NoUnwind) && !MF.getFunction().hasFnAttribute(Attribute::UWTable)); // Frame pointer and link register are not treated as normal CSR, thus we // can always skip CSR saves for nonreturning functions. return true; } /// hasFP - Return true if the specified function should have a dedicated frame /// pointer register. This is true if the function has variable sized allocas /// or if frame pointer elimination is disabled. 
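// A minimal standalone restatement of the policy that ARMFrameLowering::hasFP
// implements just below; the struct and field names are illustrative, not
// LLVM API.
struct FrameFacts {
  bool FPElimDisabled;     // frame-pointer elimination disabled by options
  bool NeedsStackRealign;  // over-aligned stack objects require realignment
  bool HasVarSizedObjects; // variable-sized allocas / VLAs
  bool FrameAddressTaken;  // __builtin_frame_address was used
};

static bool needsFramePointer(const FrameFacts &F) {
  return F.FPElimDisabled || F.NeedsStackRealign || F.HasVarSizedObjects ||
         F.FrameAddressTaken;
}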
bool ARMFrameLowering::hasFP(const MachineFunction &MF) const { const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo(); const MachineFrameInfo &MFI = MF.getFrameInfo(); // ABI-required frame pointer. if (MF.getTarget().Options.DisableFramePointerElim(MF)) return true; // Frame pointer required for use within this function. return (RegInfo->needsStackRealignment(MF) || MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken()); } /// hasReservedCallFrame - Under normal circumstances, when a frame pointer is /// not required, we reserve argument space for call sites in the function /// immediately on entry to the current function. This eliminates the need for /// add/sub sp brackets around call sites. Returns true if the call frame is /// included as part of the stack frame. bool ARMFrameLowering::hasReservedCallFrame(const MachineFunction &MF) const { const MachineFrameInfo &MFI = MF.getFrameInfo(); unsigned CFSize = MFI.getMaxCallFrameSize(); // It's not always a good idea to include the call frame as part of the // stack frame. ARM (especially Thumb) has small immediate offset to // address the stack frame. So a large call frame can cause poor codegen // and may even makes it impossible to scavenge a register. if (CFSize >= ((1 << 12) - 1) / 2) // Half of imm12 return false; return !MFI.hasVarSizedObjects(); } /// canSimplifyCallFramePseudos - If there is a reserved call frame, the /// call frame pseudos can be simplified. Unlike most targets, having a FP /// is not sufficient here since we still may reference some objects via SP /// even when FP is available in Thumb2 mode. bool ARMFrameLowering::canSimplifyCallFramePseudos(const MachineFunction &MF) const { return hasReservedCallFrame(MF) || MF.getFrameInfo().hasVarSizedObjects(); } static bool isCSRestore(MachineInstr &MI, const ARMBaseInstrInfo &TII, const MCPhysReg *CSRegs) { // Integer spill area is handled with "pop". if (isPopOpcode(MI.getOpcode())) { // The first two operands are predicates. The last two are // imp-def and imp-use of SP. Check everything in between. 
for (int i = 5, e = MI.getNumOperands(); i != e; ++i) if (!isCalleeSavedRegister(MI.getOperand(i).getReg(), CSRegs)) return false; return true; } if ((MI.getOpcode() == ARM::LDR_POST_IMM || MI.getOpcode() == ARM::LDR_POST_REG || MI.getOpcode() == ARM::t2LDR_POST) && isCalleeSavedRegister(MI.getOperand(0).getReg(), CSRegs) && MI.getOperand(1).getReg() == ARM::SP) return true; return false; } static void emitRegPlusImmediate( bool isARM, MachineBasicBlock &MBB, MachineBasicBlock::iterator &MBBI, const DebugLoc &dl, const ARMBaseInstrInfo &TII, unsigned DestReg, unsigned SrcReg, int NumBytes, unsigned MIFlags = MachineInstr::NoFlags, ARMCC::CondCodes Pred = ARMCC::AL, unsigned PredReg = 0) { if (isARM) emitARMRegPlusImmediate(MBB, MBBI, dl, DestReg, SrcReg, NumBytes, Pred, PredReg, TII, MIFlags); else emitT2RegPlusImmediate(MBB, MBBI, dl, DestReg, SrcReg, NumBytes, Pred, PredReg, TII, MIFlags); } static void emitSPUpdate(bool isARM, MachineBasicBlock &MBB, MachineBasicBlock::iterator &MBBI, const DebugLoc &dl, const ARMBaseInstrInfo &TII, int NumBytes, unsigned MIFlags = MachineInstr::NoFlags, ARMCC::CondCodes Pred = ARMCC::AL, unsigned PredReg = 0) { emitRegPlusImmediate(isARM, MBB, MBBI, dl, TII, ARM::SP, ARM::SP, NumBytes, MIFlags, Pred, PredReg); } static int sizeOfSPAdjustment(const MachineInstr &MI) { int RegSize; switch (MI.getOpcode()) { case ARM::VSTMDDB_UPD: RegSize = 8; break; case ARM::STMDB_UPD: case ARM::t2STMDB_UPD: RegSize = 4; break; case ARM::t2STR_PRE: case ARM::STR_PRE_IMM: return 4; default: llvm_unreachable("Unknown push or pop like instruction"); } int count = 0; // ARM and Thumb2 push/pop insts have explicit "sp, sp" operands (+ // pred) so the list starts at 4. for (int i = MI.getNumOperands() - 1; i >= 4; --i) count += RegSize; return count; } static bool WindowsRequiresStackProbe(const MachineFunction &MF, size_t StackSizeInBytes) { const MachineFrameInfo &MFI = MF.getFrameInfo(); const Function &F = MF.getFunction(); unsigned StackProbeSize = (MFI.getStackProtectorIndex() > 0) ? 4080 : 4096; if (F.hasFnAttribute("stack-probe-size")) F.getFnAttribute("stack-probe-size") .getValueAsString() .getAsInteger(0, StackProbeSize); return (StackSizeInBytes >= StackProbeSize) && !F.hasFnAttribute("no-stack-arg-probe"); } namespace { struct StackAdjustingInsts { struct InstInfo { MachineBasicBlock::iterator I; unsigned SPAdjust; bool BeforeFPSet; }; SmallVector Insts; void addInst(MachineBasicBlock::iterator I, unsigned SPAdjust, bool BeforeFPSet = false) { InstInfo Info = {I, SPAdjust, BeforeFPSet}; Insts.push_back(Info); } void addExtraBytes(const MachineBasicBlock::iterator I, unsigned ExtraBytes) { auto Info = llvm::find_if(Insts, [&](InstInfo &Info) { return Info.I == I; }); assert(Info != Insts.end() && "invalid sp adjusting instruction"); Info->SPAdjust += ExtraBytes; } void emitDefCFAOffsets(MachineBasicBlock &MBB, const DebugLoc &dl, const ARMBaseInstrInfo &TII, bool HasFP) { MachineFunction &MF = *MBB.getParent(); unsigned CFAOffset = 0; for (auto &Info : Insts) { if (HasFP && !Info.BeforeFPSet) return; CFAOffset -= Info.SPAdjust; unsigned CFIIndex = MF.addFrameInst( MCCFIInstruction::createDefCfaOffset(nullptr, CFAOffset)); BuildMI(MBB, std::next(Info.I), dl, TII.get(TargetOpcode::CFI_INSTRUCTION)) .addCFIIndex(CFIIndex) .setMIFlags(MachineInstr::FrameSetup); } } }; } // end anonymous namespace /// Emit an instruction sequence that will align the address in /// register Reg by zero-ing out the lower bits. 
For versions of the /// architecture that support Neon, this must be done in a single /// instruction, since skipAlignedDPRCS2Spills assumes it is done in a /// single instruction. That function only gets called when optimizing /// spilling of D registers on a core with the Neon instruction set /// present. static void emitAligningInstructions(MachineFunction &MF, ARMFunctionInfo *AFI, const TargetInstrInfo &TII, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, const unsigned Reg, const unsigned Alignment, const bool MustBeSingleInstruction) { const ARMSubtarget &AST = static_cast(MF.getSubtarget()); const bool CanUseBFC = AST.hasV6T2Ops() || AST.hasV7Ops(); const unsigned AlignMask = Alignment - 1; const unsigned NrBitsToZero = countTrailingZeros(Alignment); assert(!AFI->isThumb1OnlyFunction() && "Thumb1 not supported"); if (!AFI->isThumbFunction()) { // if the BFC instruction is available, use that to zero the lower // bits: // bfc Reg, #0, log2(Alignment) // otherwise use BIC, if the mask to zero the required number of bits // can be encoded in the bic immediate field // bic Reg, Reg, Alignment-1 // otherwise, emit // lsr Reg, Reg, log2(Alignment) // lsl Reg, Reg, log2(Alignment) if (CanUseBFC) { BuildMI(MBB, MBBI, DL, TII.get(ARM::BFC), Reg) .addReg(Reg, RegState::Kill) .addImm(~AlignMask) .add(predOps(ARMCC::AL)); } else if (AlignMask <= 255) { BuildMI(MBB, MBBI, DL, TII.get(ARM::BICri), Reg) .addReg(Reg, RegState::Kill) .addImm(AlignMask) .add(predOps(ARMCC::AL)) .add(condCodeOp()); } else { assert(!MustBeSingleInstruction && "Shouldn't call emitAligningInstructions demanding a single " "instruction to be emitted for large stack alignment for a target " "without BFC."); BuildMI(MBB, MBBI, DL, TII.get(ARM::MOVsi), Reg) .addReg(Reg, RegState::Kill) .addImm(ARM_AM::getSORegOpc(ARM_AM::lsr, NrBitsToZero)) .add(predOps(ARMCC::AL)) .add(condCodeOp()); BuildMI(MBB, MBBI, DL, TII.get(ARM::MOVsi), Reg) .addReg(Reg, RegState::Kill) .addImm(ARM_AM::getSORegOpc(ARM_AM::lsl, NrBitsToZero)) .add(predOps(ARMCC::AL)) .add(condCodeOp()); } } else { // Since this is only reached for Thumb-2 targets, the BFC instruction // should always be available. assert(CanUseBFC); BuildMI(MBB, MBBI, DL, TII.get(ARM::t2BFC), Reg) .addReg(Reg, RegState::Kill) .addImm(~AlignMask) .add(predOps(ARMCC::AL)); } } /// We need the offset of the frame pointer relative to other MachineFrameInfo /// offsets which are encoded relative to SP at function begin. /// See also emitPrologue() for how the FP is set up. /// Unfortunately we cannot determine this value in determineCalleeSaves() yet /// as assignCalleeSavedSpillSlots() hasn't run at this point. Instead we use /// this to produce a conservative estimate that we check in an assert() later. static int getMaxFPOffset(const Function &F, const ARMFunctionInfo &AFI) { // This is a conservative estimation: Assume the frame pointer being r7 and // pc("r15") up to r8 getting spilled before (= 8 registers). 
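// A minimal standalone sketch of the conservative bound returned just below,
// for a hypothetical function with an 8-byte argument-register save area:
// -8 - 8 * 4 == -40, i.e. the frame pointer is assumed to end up no more than
// 40 bytes below the incoming stack pointer.
static int maxFPOffsetEstimate(int ArgRegsSaveSize) {
  return -ArgRegsSaveSize - (8 * 4); // eight 4-byte GPR spill slots, per above
}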
return -AFI.getArgRegsSaveSize() - (8 * 4); } void ARMFrameLowering::emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const { MachineBasicBlock::iterator MBBI = MBB.begin(); MachineFrameInfo &MFI = MF.getFrameInfo(); ARMFunctionInfo *AFI = MF.getInfo(); MachineModuleInfo &MMI = MF.getMMI(); MCContext &Context = MMI.getContext(); const TargetMachine &TM = MF.getTarget(); const MCRegisterInfo *MRI = Context.getRegisterInfo(); const ARMBaseRegisterInfo *RegInfo = STI.getRegisterInfo(); const ARMBaseInstrInfo &TII = *STI.getInstrInfo(); assert(!AFI->isThumb1OnlyFunction() && "This emitPrologue does not support Thumb1!"); bool isARM = !AFI->isThumbFunction(); unsigned Align = STI.getFrameLowering()->getStackAlignment(); unsigned ArgRegsSaveSize = AFI->getArgRegsSaveSize(); unsigned NumBytes = MFI.getStackSize(); const std::vector &CSI = MFI.getCalleeSavedInfo(); // Debug location must be unknown since the first debug location is used // to determine the end of the prologue. DebugLoc dl; unsigned FramePtr = RegInfo->getFrameRegister(MF); // Determine the sizes of each callee-save spill areas and record which frame // belongs to which callee-save spill areas. unsigned GPRCS1Size = 0, GPRCS2Size = 0, DPRCSSize = 0; int FramePtrSpillFI = 0; int D8SpillFI = 0; // All calls are tail calls in GHC calling conv, and functions have no // prologue/epilogue. if (MF.getFunction().getCallingConv() == CallingConv::GHC) return; StackAdjustingInsts DefCFAOffsetCandidates; bool HasFP = hasFP(MF); // Allocate the vararg register save area. if (ArgRegsSaveSize) { emitSPUpdate(isARM, MBB, MBBI, dl, TII, -ArgRegsSaveSize, MachineInstr::FrameSetup); DefCFAOffsetCandidates.addInst(std::prev(MBBI), ArgRegsSaveSize, true); } if (!AFI->hasStackFrame() && (!STI.isTargetWindows() || !WindowsRequiresStackProbe(MF, NumBytes))) { if (NumBytes - ArgRegsSaveSize != 0) { emitSPUpdate(isARM, MBB, MBBI, dl, TII, -(NumBytes - ArgRegsSaveSize), MachineInstr::FrameSetup); DefCFAOffsetCandidates.addInst(std::prev(MBBI), NumBytes - ArgRegsSaveSize, true); } DefCFAOffsetCandidates.emitDefCFAOffsets(MBB, dl, TII, HasFP); return; } // Determine spill area sizes. for (unsigned i = 0, e = CSI.size(); i != e; ++i) { unsigned Reg = CSI[i].getReg(); int FI = CSI[i].getFrameIdx(); switch (Reg) { case ARM::R8: case ARM::R9: case ARM::R10: case ARM::R11: case ARM::R12: if (STI.splitFramePushPop(MF)) { GPRCS2Size += 4; break; } LLVM_FALLTHROUGH; case ARM::R0: case ARM::R1: case ARM::R2: case ARM::R3: case ARM::R4: case ARM::R5: case ARM::R6: case ARM::R7: case ARM::LR: if (Reg == FramePtr) FramePtrSpillFI = FI; GPRCS1Size += 4; break; default: // This is a DPR. Exclude the aligned DPRCS2 spills. if (Reg == ARM::D8) D8SpillFI = FI; if (Reg < ARM::D8 || Reg >= ARM::D8 + AFI->getNumAlignedDPRCS2Regs()) DPRCSSize += 8; } } // Move past area 1. MachineBasicBlock::iterator LastPush = MBB.end(), GPRCS1Push, GPRCS2Push; if (GPRCS1Size > 0) { GPRCS1Push = LastPush = MBBI++; DefCFAOffsetCandidates.addInst(LastPush, GPRCS1Size, true); } // Determine starting offsets of spill areas. unsigned GPRCS1Offset = NumBytes - ArgRegsSaveSize - GPRCS1Size; unsigned GPRCS2Offset = GPRCS1Offset - GPRCS2Size; unsigned DPRAlign = DPRCSSize ? 
std::min(8U, Align) : 4U; unsigned DPRGapSize = (GPRCS1Size + GPRCS2Size + ArgRegsSaveSize) % DPRAlign; unsigned DPRCSOffset = GPRCS2Offset - DPRGapSize - DPRCSSize; int FramePtrOffsetInPush = 0; if (HasFP) { int FPOffset = MFI.getObjectOffset(FramePtrSpillFI); assert(getMaxFPOffset(MF.getFunction(), *AFI) <= FPOffset && "Max FP estimation is wrong"); FramePtrOffsetInPush = FPOffset + ArgRegsSaveSize; AFI->setFramePtrSpillOffset(MFI.getObjectOffset(FramePtrSpillFI) + NumBytes); } AFI->setGPRCalleeSavedArea1Offset(GPRCS1Offset); AFI->setGPRCalleeSavedArea2Offset(GPRCS2Offset); AFI->setDPRCalleeSavedAreaOffset(DPRCSOffset); // Move past area 2. if (GPRCS2Size > 0) { GPRCS2Push = LastPush = MBBI++; DefCFAOffsetCandidates.addInst(LastPush, GPRCS2Size); } // Prolog/epilog inserter assumes we correctly align DPRs on the stack, so our // .cfi_offset operations will reflect that. if (DPRGapSize) { assert(DPRGapSize == 4 && "unexpected alignment requirements for DPRs"); if (LastPush != MBB.end() && tryFoldSPUpdateIntoPushPop(STI, MF, &*LastPush, DPRGapSize)) DefCFAOffsetCandidates.addExtraBytes(LastPush, DPRGapSize); else { emitSPUpdate(isARM, MBB, MBBI, dl, TII, -DPRGapSize, MachineInstr::FrameSetup); DefCFAOffsetCandidates.addInst(std::prev(MBBI), DPRGapSize); } } // Move past area 3. if (DPRCSSize > 0) { // Since vpush register list cannot have gaps, there may be multiple vpush // instructions in the prologue. while (MBBI != MBB.end() && MBBI->getOpcode() == ARM::VSTMDDB_UPD) { DefCFAOffsetCandidates.addInst(MBBI, sizeOfSPAdjustment(*MBBI)); LastPush = MBBI++; } } // Move past the aligned DPRCS2 area. if (AFI->getNumAlignedDPRCS2Regs() > 0) { MBBI = skipAlignedDPRCS2Spills(MBBI, AFI->getNumAlignedDPRCS2Regs()); // The code inserted by emitAlignedDPRCS2Spills realigns the stack, and // leaves the stack pointer pointing to the DPRCS2 area. // // Adjust NumBytes to represent the stack slots below the DPRCS2 area. NumBytes += MFI.getObjectOffset(D8SpillFI); } else NumBytes = DPRCSOffset; if (STI.isTargetWindows() && WindowsRequiresStackProbe(MF, NumBytes)) { uint32_t NumWords = NumBytes >> 2; if (NumWords < 65536) BuildMI(MBB, MBBI, dl, TII.get(ARM::t2MOVi16), ARM::R4) .addImm(NumWords) .setMIFlags(MachineInstr::FrameSetup) .add(predOps(ARMCC::AL)); else BuildMI(MBB, MBBI, dl, TII.get(ARM::t2MOVi32imm), ARM::R4) .addImm(NumWords) .setMIFlags(MachineInstr::FrameSetup); switch (TM.getCodeModel()) { case CodeModel::Small: case CodeModel::Medium: case CodeModel::Kernel: BuildMI(MBB, MBBI, dl, TII.get(ARM::tBL)) .add(predOps(ARMCC::AL)) .addExternalSymbol("__chkstk") .addReg(ARM::R4, RegState::Implicit) .setMIFlags(MachineInstr::FrameSetup); break; case CodeModel::Large: BuildMI(MBB, MBBI, dl, TII.get(ARM::t2MOVi32imm), ARM::R12) .addExternalSymbol("__chkstk") .setMIFlags(MachineInstr::FrameSetup); BuildMI(MBB, MBBI, dl, TII.get(ARM::tBLXr)) .add(predOps(ARMCC::AL)) .addReg(ARM::R12, RegState::Kill) .addReg(ARM::R4, RegState::Implicit) .setMIFlags(MachineInstr::FrameSetup); break; } BuildMI(MBB, MBBI, dl, TII.get(ARM::t2SUBrr), ARM::SP) .addReg(ARM::SP, RegState::Kill) .addReg(ARM::R4, RegState::Kill) .setMIFlags(MachineInstr::FrameSetup) .add(predOps(ARMCC::AL)) .add(condCodeOp()); NumBytes = 0; } if (NumBytes) { // Adjust SP after all the callee-save spills. 
if (AFI->getNumAlignedDPRCS2Regs() == 0 && tryFoldSPUpdateIntoPushPop(STI, MF, &*LastPush, NumBytes)) DefCFAOffsetCandidates.addExtraBytes(LastPush, NumBytes); else { emitSPUpdate(isARM, MBB, MBBI, dl, TII, -NumBytes, MachineInstr::FrameSetup); DefCFAOffsetCandidates.addInst(std::prev(MBBI), NumBytes); } if (HasFP && isARM) // Restore from fp only in ARM mode: e.g. sub sp, r7, #24 // Note it's not safe to do this in Thumb2 mode because it would have // taken two instructions: // mov sp, r7 // sub sp, #24 // If an interrupt is taken between the two instructions, then sp is in // an inconsistent state (pointing to the middle of callee-saved area). // The interrupt handler can end up clobbering the registers. AFI->setShouldRestoreSPFromFP(true); } // Set FP to point to the stack slot that contains the previous FP. // For iOS, FP is R7, which has now been stored in spill area 1. // Otherwise, if this is not iOS, all the callee-saved registers go // into spill area 1, including the FP in R11. In either case, it // is in area one and the adjustment needs to take place just after // that push. if (HasFP) { MachineBasicBlock::iterator AfterPush = std::next(GPRCS1Push); unsigned PushSize = sizeOfSPAdjustment(*GPRCS1Push); emitRegPlusImmediate(!AFI->isThumbFunction(), MBB, AfterPush, dl, TII, FramePtr, ARM::SP, PushSize + FramePtrOffsetInPush, MachineInstr::FrameSetup); if (FramePtrOffsetInPush + PushSize != 0) { unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createDefCfa( nullptr, MRI->getDwarfRegNum(FramePtr, true), -(ArgRegsSaveSize - FramePtrOffsetInPush))); BuildMI(MBB, AfterPush, dl, TII.get(TargetOpcode::CFI_INSTRUCTION)) .addCFIIndex(CFIIndex) .setMIFlags(MachineInstr::FrameSetup); } else { unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createDefCfaRegister( nullptr, MRI->getDwarfRegNum(FramePtr, true))); BuildMI(MBB, AfterPush, dl, TII.get(TargetOpcode::CFI_INSTRUCTION)) .addCFIIndex(CFIIndex) .setMIFlags(MachineInstr::FrameSetup); } } // Now that the prologue's actual instructions are finalised, we can insert // the necessary DWARF cf instructions to describe the situation. 
Start by // recording where each register ended up: if (GPRCS1Size > 0) { MachineBasicBlock::iterator Pos = std::next(GPRCS1Push); int CFIIndex; for (const auto &Entry : CSI) { unsigned Reg = Entry.getReg(); int FI = Entry.getFrameIdx(); switch (Reg) { case ARM::R8: case ARM::R9: case ARM::R10: case ARM::R11: case ARM::R12: if (STI.splitFramePushPop(MF)) break; LLVM_FALLTHROUGH; case ARM::R0: case ARM::R1: case ARM::R2: case ARM::R3: case ARM::R4: case ARM::R5: case ARM::R6: case ARM::R7: case ARM::LR: CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset( nullptr, MRI->getDwarfRegNum(Reg, true), MFI.getObjectOffset(FI))); BuildMI(MBB, Pos, dl, TII.get(TargetOpcode::CFI_INSTRUCTION)) .addCFIIndex(CFIIndex) .setMIFlags(MachineInstr::FrameSetup); break; } } } if (GPRCS2Size > 0) { MachineBasicBlock::iterator Pos = std::next(GPRCS2Push); for (const auto &Entry : CSI) { unsigned Reg = Entry.getReg(); int FI = Entry.getFrameIdx(); switch (Reg) { case ARM::R8: case ARM::R9: case ARM::R10: case ARM::R11: case ARM::R12: if (STI.splitFramePushPop(MF)) { unsigned DwarfReg = MRI->getDwarfRegNum(Reg, true); unsigned Offset = MFI.getObjectOffset(FI); unsigned CFIIndex = MF.addFrameInst( MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset)); BuildMI(MBB, Pos, dl, TII.get(TargetOpcode::CFI_INSTRUCTION)) .addCFIIndex(CFIIndex) .setMIFlags(MachineInstr::FrameSetup); } break; } } } if (DPRCSSize > 0) { // Since vpush register list cannot have gaps, there may be multiple vpush // instructions in the prologue. MachineBasicBlock::iterator Pos = std::next(LastPush); for (const auto &Entry : CSI) { unsigned Reg = Entry.getReg(); int FI = Entry.getFrameIdx(); if ((Reg >= ARM::D0 && Reg <= ARM::D31) && (Reg < ARM::D8 || Reg >= ARM::D8 + AFI->getNumAlignedDPRCS2Regs())) { unsigned DwarfReg = MRI->getDwarfRegNum(Reg, true); unsigned Offset = MFI.getObjectOffset(FI); unsigned CFIIndex = MF.addFrameInst( MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset)); BuildMI(MBB, Pos, dl, TII.get(TargetOpcode::CFI_INSTRUCTION)) .addCFIIndex(CFIIndex) .setMIFlags(MachineInstr::FrameSetup); } } } // Now we can emit descriptions of where the canonical frame address was // throughout the process. If we have a frame pointer, it takes over the job // half-way through, so only the first few .cfi_def_cfa_offset instructions // actually get emitted. DefCFAOffsetCandidates.emitDefCFAOffsets(MBB, dl, TII, HasFP); if (STI.isTargetELF() && hasFP(MF)) MFI.setOffsetAdjustment(MFI.getOffsetAdjustment() - AFI->getFramePtrSpillOffset()); AFI->setGPRCalleeSavedArea1Size(GPRCS1Size); AFI->setGPRCalleeSavedArea2Size(GPRCS2Size); AFI->setDPRCalleeSavedGapSize(DPRGapSize); AFI->setDPRCalleeSavedAreaSize(DPRCSSize); // If we need dynamic stack realignment, do it here. Be paranoid and make // sure if we also have VLAs, we have a base pointer for frame access. // If aligned NEON registers were spilled, the stack has already been // realigned. if (!AFI->getNumAlignedDPRCS2Regs() && RegInfo->needsStackRealignment(MF)) { unsigned MaxAlign = MFI.getMaxAlignment(); assert(!AFI->isThumb1OnlyFunction()); if (!AFI->isThumbFunction()) { emitAligningInstructions(MF, AFI, TII, MBB, MBBI, dl, ARM::SP, MaxAlign, false); } else { // We cannot use sp as source/dest register here, thus we're using r4 to // perform the calculations. We're emitting the following sequence: // mov r4, sp // -- use emitAligningInstructions to produce best sequence to zero // -- out lower bits in r4 // mov sp, r4 // FIXME: It will be better just to find spare register here. 
BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::R4) .addReg(ARM::SP, RegState::Kill) .add(predOps(ARMCC::AL)); emitAligningInstructions(MF, AFI, TII, MBB, MBBI, dl, ARM::R4, MaxAlign, false); BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::SP) .addReg(ARM::R4, RegState::Kill) .add(predOps(ARMCC::AL)); } AFI->setShouldRestoreSPFromFP(true); } // If we need a base pointer, set it up here. It's whatever the value // of the stack pointer is at this point. Any variable size objects // will be allocated after this, so we can still use the base pointer // to reference locals. // FIXME: Clarify FrameSetup flags here. if (RegInfo->hasBasePointer(MF)) { if (isARM) BuildMI(MBB, MBBI, dl, TII.get(ARM::MOVr), RegInfo->getBaseRegister()) .addReg(ARM::SP) .add(predOps(ARMCC::AL)) .add(condCodeOp()); else BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), RegInfo->getBaseRegister()) .addReg(ARM::SP) .add(predOps(ARMCC::AL)); } // If the frame has variable sized objects then the epilogue must restore // the sp from fp. We can assume there's an FP here since hasFP already // checks for hasVarSizedObjects. if (MFI.hasVarSizedObjects()) AFI->setShouldRestoreSPFromFP(true); } void ARMFrameLowering::emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const { MachineFrameInfo &MFI = MF.getFrameInfo(); ARMFunctionInfo *AFI = MF.getInfo(); const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo(); const ARMBaseInstrInfo &TII = *static_cast(MF.getSubtarget().getInstrInfo()); assert(!AFI->isThumb1OnlyFunction() && "This emitEpilogue does not support Thumb1!"); bool isARM = !AFI->isThumbFunction(); unsigned ArgRegsSaveSize = AFI->getArgRegsSaveSize(); int NumBytes = (int)MFI.getStackSize(); unsigned FramePtr = RegInfo->getFrameRegister(MF); // All calls are tail calls in GHC calling conv, and functions have no // prologue/epilogue. if (MF.getFunction().getCallingConv() == CallingConv::GHC) return; // First put ourselves on the first (from top) terminator instructions. MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator(); DebugLoc dl = MBBI != MBB.end() ? MBBI->getDebugLoc() : DebugLoc(); if (!AFI->hasStackFrame()) { if (NumBytes - ArgRegsSaveSize != 0) emitSPUpdate(isARM, MBB, MBBI, dl, TII, NumBytes - ArgRegsSaveSize); } else { // Unwind MBBI to point to first LDR / VLDRD. const MCPhysReg *CSRegs = RegInfo->getCalleeSavedRegs(&MF); if (MBBI != MBB.begin()) { do { --MBBI; } while (MBBI != MBB.begin() && isCSRestore(*MBBI, TII, CSRegs)); if (!isCSRestore(*MBBI, TII, CSRegs)) ++MBBI; } // Move SP to start of FP callee save spill area. NumBytes -= (ArgRegsSaveSize + AFI->getGPRCalleeSavedArea1Size() + AFI->getGPRCalleeSavedArea2Size() + AFI->getDPRCalleeSavedGapSize() + AFI->getDPRCalleeSavedAreaSize()); // Reset SP based on frame pointer only if the stack frame extends beyond // frame pointer stack slot or target is ELF and the function has FP. if (AFI->shouldRestoreSPFromFP()) { NumBytes = AFI->getFramePtrSpillOffset() - NumBytes; if (NumBytes) { if (isARM) emitARMRegPlusImmediate(MBB, MBBI, dl, ARM::SP, FramePtr, -NumBytes, ARMCC::AL, 0, TII); else { // It's not possible to restore SP from FP in a single instruction. // For iOS, this looks like: // mov sp, r7 // sub sp, #24 // This is bad, if an interrupt is taken after the mov, sp is in an // inconsistent state. // Use the first callee-saved register as a scratch register. 
assert(!MFI.getPristineRegs(MF).test(ARM::R4) && "No scratch register to restore SP from FP!"); emitT2RegPlusImmediate(MBB, MBBI, dl, ARM::R4, FramePtr, -NumBytes, ARMCC::AL, 0, TII); BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::SP) .addReg(ARM::R4) .add(predOps(ARMCC::AL)); } } else { // Thumb2 or ARM. if (isARM) BuildMI(MBB, MBBI, dl, TII.get(ARM::MOVr), ARM::SP) .addReg(FramePtr) .add(predOps(ARMCC::AL)) .add(condCodeOp()); else BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::SP) .addReg(FramePtr) .add(predOps(ARMCC::AL)); } } else if (NumBytes && !tryFoldSPUpdateIntoPushPop(STI, MF, &*MBBI, NumBytes)) emitSPUpdate(isARM, MBB, MBBI, dl, TII, NumBytes); // Increment past our save areas. if (MBBI != MBB.end() && AFI->getDPRCalleeSavedAreaSize()) { MBBI++; // Since vpop register list cannot have gaps, there may be multiple vpop // instructions in the epilogue. while (MBBI != MBB.end() && MBBI->getOpcode() == ARM::VLDMDIA_UPD) MBBI++; } if (AFI->getDPRCalleeSavedGapSize()) { assert(AFI->getDPRCalleeSavedGapSize() == 4 && "unexpected DPR alignment gap"); emitSPUpdate(isARM, MBB, MBBI, dl, TII, AFI->getDPRCalleeSavedGapSize()); } if (AFI->getGPRCalleeSavedArea2Size()) MBBI++; if (AFI->getGPRCalleeSavedArea1Size()) MBBI++; } if (ArgRegsSaveSize) emitSPUpdate(isARM, MBB, MBBI, dl, TII, ArgRegsSaveSize); } /// getFrameIndexReference - Provide a base+offset reference to an FI slot for /// debug info. It's the same as what we use for resolving the code-gen /// references for now. FIXME: This can go wrong when references are /// SP-relative and simple call frames aren't used. int ARMFrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI, unsigned &FrameReg) const { return ResolveFrameIndexReference(MF, FI, FrameReg, 0); } int ARMFrameLowering::ResolveFrameIndexReference(const MachineFunction &MF, int FI, unsigned &FrameReg, int SPAdj) const { const MachineFrameInfo &MFI = MF.getFrameInfo(); const ARMBaseRegisterInfo *RegInfo = static_cast( MF.getSubtarget().getRegisterInfo()); const ARMFunctionInfo *AFI = MF.getInfo(); int Offset = MFI.getObjectOffset(FI) + MFI.getStackSize(); int FPOffset = Offset - AFI->getFramePtrSpillOffset(); bool isFixed = MFI.isFixedObjectIndex(FI); FrameReg = ARM::SP; Offset += SPAdj; // SP can move around if there are allocas. We may also lose track of SP // when emergency spilling inside a non-reserved call frame setup. bool hasMovingSP = !hasReservedCallFrame(MF); // When dynamically realigning the stack, use the frame pointer for // parameters, and the stack/base pointer for locals. if (RegInfo->needsStackRealignment(MF)) { assert(hasFP(MF) && "dynamic stack realignment without a FP!"); if (isFixed) { FrameReg = RegInfo->getFrameRegister(MF); Offset = FPOffset; } else if (hasMovingSP) { assert(RegInfo->hasBasePointer(MF) && "VLAs and dynamic stack alignment, but missing base pointer!"); FrameReg = RegInfo->getBaseRegister(); } return Offset; } // If there is a frame pointer, use it when we can. if (hasFP(MF) && AFI->hasStackFrame()) { // Use frame pointer to reference fixed objects. Use it for locals if // there are VLAs (and thus the SP isn't reliable as a base). if (isFixed || (hasMovingSP && !RegInfo->hasBasePointer(MF))) { FrameReg = RegInfo->getFrameRegister(MF); return FPOffset; } else if (hasMovingSP) { assert(RegInfo->hasBasePointer(MF) && "missing base pointer!"); if (AFI->isThumb2Function()) { // Try to use the frame pointer if we can, else use the base pointer // since it's available. 
This is handy for the emergency spill slot, in // particular. if (FPOffset >= -255 && FPOffset < 0) { FrameReg = RegInfo->getFrameRegister(MF); return FPOffset; } } } else if (AFI->isThumbFunction()) { // Prefer SP to base pointer, if the offset is suitably aligned and in // range as the effective range of the immediate offset is bigger when // basing off SP. // Use add , sp, # // ldr , [sp, #] if (Offset >= 0 && (Offset & 3) == 0 && Offset <= 1020) return Offset; // In Thumb2 mode, the negative offset is very limited. Try to avoid // out of range references. ldr ,[, #-] if (AFI->isThumb2Function() && FPOffset >= -255 && FPOffset < 0) { FrameReg = RegInfo->getFrameRegister(MF); return FPOffset; } } else if (Offset > (FPOffset < 0 ? -FPOffset : FPOffset)) { // Otherwise, use SP or FP, whichever is closer to the stack slot. FrameReg = RegInfo->getFrameRegister(MF); return FPOffset; } } // Use the base pointer if we have one. if (RegInfo->hasBasePointer(MF)) FrameReg = RegInfo->getBaseRegister(); return Offset; } void ARMFrameLowering::emitPushInst(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, const std::vector &CSI, unsigned StmOpc, unsigned StrOpc, bool NoGap, bool(*Func)(unsigned, bool), unsigned NumAlignedDPRCS2Regs, unsigned MIFlags) const { MachineFunction &MF = *MBB.getParent(); const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo(); const TargetRegisterInfo &TRI = *STI.getRegisterInfo(); DebugLoc DL; using RegAndKill = std::pair; SmallVector Regs; unsigned i = CSI.size(); while (i != 0) { unsigned LastReg = 0; for (; i != 0; --i) { unsigned Reg = CSI[i-1].getReg(); if (!(Func)(Reg, STI.splitFramePushPop(MF))) continue; // D-registers in the aligned area DPRCS2 are NOT spilled here. if (Reg >= ARM::D8 && Reg < ARM::D8 + NumAlignedDPRCS2Regs) continue; const MachineRegisterInfo &MRI = MF.getRegInfo(); bool isLiveIn = MRI.isLiveIn(Reg); if (!isLiveIn && !MRI.isReserved(Reg)) MBB.addLiveIn(Reg); // If NoGap is true, push consecutive registers and then leave the rest // for other instructions. e.g. // vpush {d8, d10, d11} -> vpush {d8}, vpush {d10, d11} if (NoGap && LastReg && LastReg != Reg-1) break; LastReg = Reg; // Do not set a kill flag on values that are also marked as live-in. This // happens with the @llvm-returnaddress intrinsic and with arguments // passed in callee saved registers. // Omitting the kill flags is conservatively correct even if the live-in // is not used after all. Regs.push_back(std::make_pair(Reg, /*isKill=*/!isLiveIn)); } if (Regs.empty()) continue; llvm::sort(Regs.begin(), Regs.end(), [&](const RegAndKill &LHS, const RegAndKill &RHS) { return TRI.getEncodingValue(LHS.first) < TRI.getEncodingValue(RHS.first); }); if (Regs.size() > 1 || StrOpc== 0) { MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StmOpc), ARM::SP) .addReg(ARM::SP) .setMIFlags(MIFlags) .add(predOps(ARMCC::AL)); for (unsigned i = 0, e = Regs.size(); i < e; ++i) MIB.addReg(Regs[i].first, getKillRegState(Regs[i].second)); } else if (Regs.size() == 1) { BuildMI(MBB, MI, DL, TII.get(StrOpc), ARM::SP) .addReg(Regs[0].first, getKillRegState(Regs[0].second)) .addReg(ARM::SP) .setMIFlags(MIFlags) .addImm(-4) .add(predOps(ARMCC::AL)); } Regs.clear(); // Put any subsequent vpush instructions before this one: they will refer to // higher register numbers so need to be pushed first in order to preserve // monotonicity. 
if (MI != MBB.begin()) --MI; } } void ARMFrameLowering::emitPopInst(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, std::vector &CSI, unsigned LdmOpc, unsigned LdrOpc, bool isVarArg, bool NoGap, bool(*Func)(unsigned, bool), unsigned NumAlignedDPRCS2Regs) const { MachineFunction &MF = *MBB.getParent(); const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo(); const TargetRegisterInfo &TRI = *STI.getRegisterInfo(); ARMFunctionInfo *AFI = MF.getInfo(); DebugLoc DL; bool isTailCall = false; bool isInterrupt = false; bool isTrap = false; if (MBB.end() != MI) { DL = MI->getDebugLoc(); unsigned RetOpcode = MI->getOpcode(); isTailCall = (RetOpcode == ARM::TCRETURNdi || RetOpcode == ARM::TCRETURNri); isInterrupt = RetOpcode == ARM::SUBS_PC_LR || RetOpcode == ARM::t2SUBS_PC_LR; isTrap = RetOpcode == ARM::TRAP || RetOpcode == ARM::TRAPNaCl || RetOpcode == ARM::tTRAP; } SmallVector Regs; unsigned i = CSI.size(); while (i != 0) { unsigned LastReg = 0; bool DeleteRet = false; for (; i != 0; --i) { CalleeSavedInfo &Info = CSI[i-1]; unsigned Reg = Info.getReg(); if (!(Func)(Reg, STI.splitFramePushPop(MF))) continue; // The aligned reloads from area DPRCS2 are not inserted here. if (Reg >= ARM::D8 && Reg < ARM::D8 + NumAlignedDPRCS2Regs) continue; if (Reg == ARM::LR && !isTailCall && !isVarArg && !isInterrupt && !isTrap && STI.hasV5TOps()) { if (MBB.succ_empty()) { Reg = ARM::PC; // Fold the return instruction into the LDM. DeleteRet = true; LdmOpc = AFI->isThumbFunction() ? ARM::t2LDMIA_RET : ARM::LDMIA_RET; // We 'restore' LR into PC so it is not live out of the return block: // Clear Restored bit. Info.setRestored(false); } else LdmOpc = AFI->isThumbFunction() ? ARM::t2LDMIA_UPD : ARM::LDMIA_UPD; } // If NoGap is true, pop consecutive registers and then leave the rest // for other instructions. e.g. // vpop {d8, d10, d11} -> vpop {d8}, vpop {d10, d11} if (NoGap && LastReg && LastReg != Reg-1) break; LastReg = Reg; Regs.push_back(Reg); } if (Regs.empty()) continue; llvm::sort(Regs.begin(), Regs.end(), [&](unsigned LHS, unsigned RHS) { return TRI.getEncodingValue(LHS) < TRI.getEncodingValue(RHS); }); if (Regs.size() > 1 || LdrOpc == 0) { MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(LdmOpc), ARM::SP) .addReg(ARM::SP) .add(predOps(ARMCC::AL)); for (unsigned i = 0, e = Regs.size(); i < e; ++i) MIB.addReg(Regs[i], getDefRegState(true)); if (DeleteRet) { if (MI != MBB.end()) { MIB.copyImplicitOps(*MI); MI->eraseFromParent(); } } MI = MIB; } else if (Regs.size() == 1) { // If we adjusted the reg to PC from LR above, switch it back here. We // only do that for LDM. if (Regs[0] == ARM::PC) Regs[0] = ARM::LR; MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(LdrOpc), Regs[0]) .addReg(ARM::SP, RegState::Define) .addReg(ARM::SP); // ARM mode needs an extra reg0 here due to addrmode2. Will go away once // that refactoring is complete (eventually). if (LdrOpc == ARM::LDR_POST_REG || LdrOpc == ARM::LDR_POST_IMM) { MIB.addReg(0); MIB.addImm(ARM_AM::getAM2Opc(ARM_AM::add, 4, ARM_AM::no_shift)); } else MIB.addImm(4); MIB.add(predOps(ARMCC::AL)); } Regs.clear(); // Put any subsequent vpop instructions after this one: they will refer to // higher register numbers so need to be popped afterwards. if (MI != MBB.end()) ++MI; } } /// Emit aligned spill instructions for NumAlignedDPRCS2Regs D-registers /// starting from d8. Also insert stack realignment code and leave the stack /// pointer pointing to the d8 spill slot. 
static void emitAlignedDPRCS2Spills(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, unsigned NumAlignedDPRCS2Regs, const std::vector &CSI, const TargetRegisterInfo *TRI) { MachineFunction &MF = *MBB.getParent(); ARMFunctionInfo *AFI = MF.getInfo(); DebugLoc DL = MI != MBB.end() ? MI->getDebugLoc() : DebugLoc(); const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo(); MachineFrameInfo &MFI = MF.getFrameInfo(); // Mark the D-register spill slots as properly aligned. Since MFI computes // stack slot layout backwards, this can actually mean that the d-reg stack // slot offsets can be wrong. The offset for d8 will always be correct. for (unsigned i = 0, e = CSI.size(); i != e; ++i) { unsigned DNum = CSI[i].getReg() - ARM::D8; if (DNum > NumAlignedDPRCS2Regs - 1) continue; int FI = CSI[i].getFrameIdx(); // The even-numbered registers will be 16-byte aligned, the odd-numbered // registers will be 8-byte aligned. MFI.setObjectAlignment(FI, DNum % 2 ? 8 : 16); // The stack slot for D8 needs to be maximally aligned because this is // actually the point where we align the stack pointer. MachineFrameInfo // computes all offsets relative to the incoming stack pointer which is a // bit weird when realigning the stack. Any extra padding for this // over-alignment is not realized because the code inserted below adjusts // the stack pointer by numregs * 8 before aligning the stack pointer. if (DNum == 0) MFI.setObjectAlignment(FI, MFI.getMaxAlignment()); } // Move the stack pointer to the d8 spill slot, and align it at the same // time. Leave the stack slot address in the scratch register r4. // // sub r4, sp, #numregs * 8 // bic r4, r4, #align - 1 // mov sp, r4 // bool isThumb = AFI->isThumbFunction(); assert(!AFI->isThumb1OnlyFunction() && "Can't realign stack for thumb1"); AFI->setShouldRestoreSPFromFP(true); // sub r4, sp, #numregs * 8 // The immediate is <= 64, so it doesn't need any special encoding. unsigned Opc = isThumb ? ARM::t2SUBri : ARM::SUBri; BuildMI(MBB, MI, DL, TII.get(Opc), ARM::R4) .addReg(ARM::SP) .addImm(8 * NumAlignedDPRCS2Regs) .add(predOps(ARMCC::AL)) .add(condCodeOp()); unsigned MaxAlign = MF.getFrameInfo().getMaxAlignment(); // We must set parameter MustBeSingleInstruction to true, since // skipAlignedDPRCS2Spills expects exactly 3 instructions to perform // stack alignment. Luckily, this can always be done since all ARM // architecture versions that support Neon also support the BFC // instruction. emitAligningInstructions(MF, AFI, TII, MBB, MI, DL, ARM::R4, MaxAlign, true); // mov sp, r4 // The stack pointer must be adjusted before spilling anything, otherwise // the stack slots could be clobbered by an interrupt handler. // Leave r4 live, it is used below. Opc = isThumb ? ARM::tMOVr : ARM::MOVr; MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(Opc), ARM::SP) .addReg(ARM::R4) .add(predOps(ARMCC::AL)); if (!isThumb) MIB.add(condCodeOp()); // Now spill NumAlignedDPRCS2Regs registers starting from d8. // r4 holds the stack slot address. unsigned NextReg = ARM::D8; // 16-byte aligned vst1.64 with 4 d-regs and address writeback. // The writeback is only needed when emitting two vst1.64 instructions. 
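  // Illustrative sketch only (assumed, not emitted verbatim by this code):
  // with NumAlignedDPRCS2Regs == 7 and r4 already holding the aligned slot
  // address, the three cases below combine into roughly
  //
  //   vst1.64 {d8, d9, d10, d11}, [r4:128]!   @ 4 regs, with writeback
  //   vst1.64 {d12, d13}, [r4:128]            @ 2 regs, no writeback
  //   vstr    d14, [r4, #16]                  @ odd last register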
if (NumAlignedDPRCS2Regs >= 6) { unsigned SupReg = TRI->getMatchingSuperReg(NextReg, ARM::dsub_0, &ARM::QQPRRegClass); MBB.addLiveIn(SupReg); BuildMI(MBB, MI, DL, TII.get(ARM::VST1d64Qwb_fixed), ARM::R4) .addReg(ARM::R4, RegState::Kill) .addImm(16) .addReg(NextReg) .addReg(SupReg, RegState::ImplicitKill) .add(predOps(ARMCC::AL)); NextReg += 4; NumAlignedDPRCS2Regs -= 4; } // We won't modify r4 beyond this point. It currently points to the next // register to be spilled. unsigned R4BaseReg = NextReg; // 16-byte aligned vst1.64 with 4 d-regs, no writeback. if (NumAlignedDPRCS2Regs >= 4) { unsigned SupReg = TRI->getMatchingSuperReg(NextReg, ARM::dsub_0, &ARM::QQPRRegClass); MBB.addLiveIn(SupReg); BuildMI(MBB, MI, DL, TII.get(ARM::VST1d64Q)) .addReg(ARM::R4) .addImm(16) .addReg(NextReg) .addReg(SupReg, RegState::ImplicitKill) .add(predOps(ARMCC::AL)); NextReg += 4; NumAlignedDPRCS2Regs -= 4; } // 16-byte aligned vst1.64 with 2 d-regs. if (NumAlignedDPRCS2Regs >= 2) { unsigned SupReg = TRI->getMatchingSuperReg(NextReg, ARM::dsub_0, &ARM::QPRRegClass); MBB.addLiveIn(SupReg); BuildMI(MBB, MI, DL, TII.get(ARM::VST1q64)) .addReg(ARM::R4) .addImm(16) .addReg(SupReg) .add(predOps(ARMCC::AL)); NextReg += 2; NumAlignedDPRCS2Regs -= 2; } // Finally, use a vanilla vstr.64 for the odd last register. if (NumAlignedDPRCS2Regs) { MBB.addLiveIn(NextReg); // vstr.64 uses addrmode5 which has an offset scale of 4. BuildMI(MBB, MI, DL, TII.get(ARM::VSTRD)) .addReg(NextReg) .addReg(ARM::R4) .addImm((NextReg - R4BaseReg) * 2) .add(predOps(ARMCC::AL)); } // The last spill instruction inserted should kill the scratch register r4. std::prev(MI)->addRegisterKilled(ARM::R4, TRI); } /// Skip past the code inserted by emitAlignedDPRCS2Spills, and return an /// iterator to the following instruction. static MachineBasicBlock::iterator skipAlignedDPRCS2Spills(MachineBasicBlock::iterator MI, unsigned NumAlignedDPRCS2Regs) { // sub r4, sp, #numregs * 8 // bic r4, r4, #align - 1 // mov sp, r4 ++MI; ++MI; ++MI; assert(MI->mayStore() && "Expecting spill instruction"); // These switches all fall through. switch(NumAlignedDPRCS2Regs) { case 7: ++MI; assert(MI->mayStore() && "Expecting spill instruction"); LLVM_FALLTHROUGH; default: ++MI; assert(MI->mayStore() && "Expecting spill instruction"); LLVM_FALLTHROUGH; case 1: case 2: case 4: assert(MI->killsRegister(ARM::R4) && "Missed kill flag"); ++MI; } return MI; } /// Emit aligned reload instructions for NumAlignedDPRCS2Regs D-registers /// starting from d8. These instructions are assumed to execute while the /// stack is still aligned, unlike the code inserted by emitPopInst. static void emitAlignedDPRCS2Restores(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, unsigned NumAlignedDPRCS2Regs, const std::vector &CSI, const TargetRegisterInfo *TRI) { MachineFunction &MF = *MBB.getParent(); ARMFunctionInfo *AFI = MF.getInfo(); DebugLoc DL = MI != MBB.end() ? MI->getDebugLoc() : DebugLoc(); const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo(); // Find the frame index assigned to d8. int D8SpillFI = 0; for (unsigned i = 0, e = CSI.size(); i != e; ++i) if (CSI[i].getReg() == ARM::D8) { D8SpillFI = CSI[i].getFrameIdx(); break; } // Materialize the address of the d8 spill slot into the scratch register r4. // This can be fairly complicated if the stack frame is large, so just use // the normal frame index elimination mechanism to do it. This code runs as // the initial part of the epilog where the stack and base pointers haven't // been changed yet. 
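  // Illustrative sketch only (assumed, mirroring the spill case): for 7
  // aligned d-regs the reloads below come out as roughly
  //
  //   add     r4, sp, #<d8 slot offset>       @ via frame-index elimination
  //   vld1.64 {d8, d9, d10, d11}, [r4:128]!
  //   vld1.64 {d12, d13}, [r4:128]
  //   vldr    d14, [r4, #16]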
bool isThumb = AFI->isThumbFunction(); assert(!AFI->isThumb1OnlyFunction() && "Can't realign stack for thumb1"); unsigned Opc = isThumb ? ARM::t2ADDri : ARM::ADDri; BuildMI(MBB, MI, DL, TII.get(Opc), ARM::R4) .addFrameIndex(D8SpillFI) .addImm(0) .add(predOps(ARMCC::AL)) .add(condCodeOp()); // Now restore NumAlignedDPRCS2Regs registers starting from d8. unsigned NextReg = ARM::D8; // 16-byte aligned vld1.64 with 4 d-regs and writeback. if (NumAlignedDPRCS2Regs >= 6) { unsigned SupReg = TRI->getMatchingSuperReg(NextReg, ARM::dsub_0, &ARM::QQPRRegClass); BuildMI(MBB, MI, DL, TII.get(ARM::VLD1d64Qwb_fixed), NextReg) .addReg(ARM::R4, RegState::Define) .addReg(ARM::R4, RegState::Kill) .addImm(16) .addReg(SupReg, RegState::ImplicitDefine) .add(predOps(ARMCC::AL)); NextReg += 4; NumAlignedDPRCS2Regs -= 4; } // We won't modify r4 beyond this point. It currently points to the next // register to be spilled. unsigned R4BaseReg = NextReg; // 16-byte aligned vld1.64 with 4 d-regs, no writeback. if (NumAlignedDPRCS2Regs >= 4) { unsigned SupReg = TRI->getMatchingSuperReg(NextReg, ARM::dsub_0, &ARM::QQPRRegClass); BuildMI(MBB, MI, DL, TII.get(ARM::VLD1d64Q), NextReg) .addReg(ARM::R4) .addImm(16) .addReg(SupReg, RegState::ImplicitDefine) .add(predOps(ARMCC::AL)); NextReg += 4; NumAlignedDPRCS2Regs -= 4; } // 16-byte aligned vld1.64 with 2 d-regs. if (NumAlignedDPRCS2Regs >= 2) { unsigned SupReg = TRI->getMatchingSuperReg(NextReg, ARM::dsub_0, &ARM::QPRRegClass); BuildMI(MBB, MI, DL, TII.get(ARM::VLD1q64), SupReg) .addReg(ARM::R4) .addImm(16) .add(predOps(ARMCC::AL)); NextReg += 2; NumAlignedDPRCS2Regs -= 2; } // Finally, use a vanilla vldr.64 for the remaining odd register. if (NumAlignedDPRCS2Regs) BuildMI(MBB, MI, DL, TII.get(ARM::VLDRD), NextReg) .addReg(ARM::R4) .addImm(2 * (NextReg - R4BaseReg)) .add(predOps(ARMCC::AL)); // Last store kills r4. std::prev(MI)->addRegisterKilled(ARM::R4, TRI); } bool ARMFrameLowering::spillCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, const std::vector &CSI, const TargetRegisterInfo *TRI) const { if (CSI.empty()) return false; MachineFunction &MF = *MBB.getParent(); ARMFunctionInfo *AFI = MF.getInfo(); unsigned PushOpc = AFI->isThumbFunction() ? ARM::t2STMDB_UPD : ARM::STMDB_UPD; unsigned PushOneOpc = AFI->isThumbFunction() ? ARM::t2STR_PRE : ARM::STR_PRE_IMM; unsigned FltOpc = ARM::VSTMDDB_UPD; unsigned NumAlignedDPRCS2Regs = AFI->getNumAlignedDPRCS2Regs(); emitPushInst(MBB, MI, CSI, PushOpc, PushOneOpc, false, &isARMArea1Register, 0, MachineInstr::FrameSetup); emitPushInst(MBB, MI, CSI, PushOpc, PushOneOpc, false, &isARMArea2Register, 0, MachineInstr::FrameSetup); emitPushInst(MBB, MI, CSI, FltOpc, 0, true, &isARMArea3Register, NumAlignedDPRCS2Regs, MachineInstr::FrameSetup); // The code above does not insert spill code for the aligned DPRCS2 registers. // The stack realignment code will be inserted between the push instructions // and these spills. if (NumAlignedDPRCS2Regs) emitAlignedDPRCS2Spills(MBB, MI, NumAlignedDPRCS2Regs, CSI, TRI); return true; } bool ARMFrameLowering::restoreCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, std::vector &CSI, const TargetRegisterInfo *TRI) const { if (CSI.empty()) return false; MachineFunction &MF = *MBB.getParent(); ARMFunctionInfo *AFI = MF.getInfo(); bool isVarArg = AFI->getArgRegsSaveSize() > 0; unsigned NumAlignedDPRCS2Regs = AFI->getNumAlignedDPRCS2Regs(); // The emitPopInst calls below do not insert reloads for the aligned DPRCS2 // registers. 
Do that here instead. if (NumAlignedDPRCS2Regs) emitAlignedDPRCS2Restores(MBB, MI, NumAlignedDPRCS2Regs, CSI, TRI); unsigned PopOpc = AFI->isThumbFunction() ? ARM::t2LDMIA_UPD : ARM::LDMIA_UPD; unsigned LdrOpc = AFI->isThumbFunction() ? ARM::t2LDR_POST :ARM::LDR_POST_IMM; unsigned FltOpc = ARM::VLDMDIA_UPD; emitPopInst(MBB, MI, CSI, FltOpc, 0, isVarArg, true, &isARMArea3Register, NumAlignedDPRCS2Regs); emitPopInst(MBB, MI, CSI, PopOpc, LdrOpc, isVarArg, false, &isARMArea2Register, 0); emitPopInst(MBB, MI, CSI, PopOpc, LdrOpc, isVarArg, false, &isARMArea1Register, 0); return true; } // FIXME: Make generic? static unsigned GetFunctionSizeInBytes(const MachineFunction &MF, const ARMBaseInstrInfo &TII) { unsigned FnSize = 0; for (auto &MBB : MF) { for (auto &MI : MBB) FnSize += TII.getInstSizeInBytes(MI); } return FnSize; } /// estimateRSStackSizeLimit - Look at each instruction that references stack /// frames and return the stack size limit beyond which some of these /// instructions will require a scratch register during their expansion later. // FIXME: Move to TII? static unsigned estimateRSStackSizeLimit(MachineFunction &MF, const TargetFrameLowering *TFI) { const ARMFunctionInfo *AFI = MF.getInfo(); unsigned Limit = (1 << 12) - 1; for (auto &MBB : MF) { for (auto &MI : MBB) { for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) { if (!MI.getOperand(i).isFI()) continue; // When using ADDri to get the address of a stack object, 255 is the // largest offset guaranteed to fit in the immediate offset. if (MI.getOpcode() == ARM::ADDri) { Limit = std::min(Limit, (1U << 8) - 1); break; } // Otherwise check the addressing mode. switch (MI.getDesc().TSFlags & ARMII::AddrModeMask) { case ARMII::AddrMode3: case ARMII::AddrModeT2_i8: Limit = std::min(Limit, (1U << 8) - 1); break; case ARMII::AddrMode5: case ARMII::AddrModeT2_i8s4: + case ARMII::AddrModeT2_ldrex: Limit = std::min(Limit, ((1U << 8) - 1) * 4); break; case ARMII::AddrModeT2_i12: // i12 supports only positive offset so these will be converted to // i8 opcodes. See llvm::rewriteT2FrameIndex. if (TFI->hasFP(MF) && AFI->hasStackFrame()) Limit = std::min(Limit, (1U << 8) - 1); break; case ARMII::AddrMode4: case ARMII::AddrMode6: // Addressing modes 4 & 6 (load/store) instructions can't encode an // immediate offset for stack references. return 0; default: break; } break; // At most one FI per instruction } } } return Limit; } // In functions that realign the stack, it can be an advantage to spill the // callee-saved vector registers after realigning the stack. The vst1 and vld1 // instructions take alignment hints that can improve performance. static void checkNumAlignedDPRCS2Regs(MachineFunction &MF, BitVector &SavedRegs) { MF.getInfo()->setNumAlignedDPRCS2Regs(0); if (!SpillAlignedNEONRegs) return; // Naked functions don't spill callee-saved registers. if (MF.getFunction().hasFnAttribute(Attribute::Naked)) return; // We are planning to use NEON instructions vst1 / vld1. if (!static_cast(MF.getSubtarget()).hasNEON()) return; // Don't bother if the default stack alignment is sufficiently high. if (MF.getSubtarget().getFrameLowering()->getStackAlignment() >= 8) return; // Aligned spills require stack realignment. if (!static_cast( MF.getSubtarget().getRegisterInfo())->canRealignStack(MF)) return; // We always spill contiguous d-registers starting from d8. Count how many // needs spilling. The register allocator will almost always use the // callee-saved registers in order, but it can happen that there are holes in // the range. 
Registers above the hole will be spilled to the standard DPRCS // area. unsigned NumSpills = 0; for (; NumSpills < 8; ++NumSpills) if (!SavedRegs.test(ARM::D8 + NumSpills)) break; // Don't do this for just one d-register. It's not worth it. if (NumSpills < 2) return; // Spill the first NumSpills D-registers after realigning the stack. MF.getInfo()->setNumAlignedDPRCS2Regs(NumSpills); // A scratch register is required for the vst1 / vld1 instructions. SavedRegs.set(ARM::R4); } void ARMFrameLowering::determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS) const { TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS); // This tells PEI to spill the FP as if it is any other callee-save register // to take advantage the eliminateFrameIndex machinery. This also ensures it // is spilled in the order specified by getCalleeSavedRegs() to make it easier // to combine multiple loads / stores. bool CanEliminateFrame = true; bool CS1Spilled = false; bool LRSpilled = false; unsigned NumGPRSpills = 0; unsigned NumFPRSpills = 0; SmallVector UnspilledCS1GPRs; SmallVector UnspilledCS2GPRs; const ARMBaseRegisterInfo *RegInfo = static_cast( MF.getSubtarget().getRegisterInfo()); const ARMBaseInstrInfo &TII = *static_cast(MF.getSubtarget().getInstrInfo()); ARMFunctionInfo *AFI = MF.getInfo(); MachineFrameInfo &MFI = MF.getFrameInfo(); MachineRegisterInfo &MRI = MF.getRegInfo(); const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo(); (void)TRI; // Silence unused warning in non-assert builds. unsigned FramePtr = RegInfo->getFrameRegister(MF); // Spill R4 if Thumb2 function requires stack realignment - it will be used as // scratch register. Also spill R4 if Thumb2 function has varsized objects, // since it's not always possible to restore sp from fp in a single // instruction. // FIXME: It will be better just to find spare register here. if (AFI->isThumb2Function() && (MFI.hasVarSizedObjects() || RegInfo->needsStackRealignment(MF))) SavedRegs.set(ARM::R4); // If a stack probe will be emitted, spill R4 and LR, since they are // clobbered by the stack probe call. // This estimate should be a safe, conservative estimate. The actual // stack probe is enabled based on the size of the local objects; // this estimate also includes the varargs store size. if (STI.isTargetWindows() && WindowsRequiresStackProbe(MF, MFI.estimateStackSize(MF))) { SavedRegs.set(ARM::R4); SavedRegs.set(ARM::LR); } if (AFI->isThumb1OnlyFunction()) { // Spill LR if Thumb1 function uses variable length argument lists. if (AFI->getArgRegsSaveSize() > 0) SavedRegs.set(ARM::LR); // Spill R4 if Thumb1 epilogue has to restore SP from FP or the function // requires stack alignment. We don't know for sure what the stack size // will be, but for this, an estimate is good enough. If there anything // changes it, it'll be a spill, which implies we've used all the registers // and so R4 is already used, so not marking it here will be OK. // FIXME: It will be better just to find spare register here. if (MFI.hasVarSizedObjects() || RegInfo->needsStackRealignment(MF) || MFI.estimateStackSize(MF) > 508) SavedRegs.set(ARM::R4); } // See if we can spill vector registers to aligned stack. checkNumAlignedDPRCS2Regs(MF, SavedRegs); // Spill the BasePtr if it's used. if (RegInfo->hasBasePointer(MF)) SavedRegs.set(RegInfo->getBaseRegister()); // Don't spill FP if the frame can be eliminated. This is determined // by scanning the callee-save registers to see if any is modified. 
const MCPhysReg *CSRegs = RegInfo->getCalleeSavedRegs(&MF); for (unsigned i = 0; CSRegs[i]; ++i) { unsigned Reg = CSRegs[i]; bool Spilled = false; if (SavedRegs.test(Reg)) { Spilled = true; CanEliminateFrame = false; } if (!ARM::GPRRegClass.contains(Reg)) { if (Spilled) { if (ARM::SPRRegClass.contains(Reg)) NumFPRSpills++; else if (ARM::DPRRegClass.contains(Reg)) NumFPRSpills += 2; else if (ARM::QPRRegClass.contains(Reg)) NumFPRSpills += 4; } continue; } if (Spilled) { NumGPRSpills++; if (!STI.splitFramePushPop(MF)) { if (Reg == ARM::LR) LRSpilled = true; CS1Spilled = true; continue; } // Keep track if LR and any of R4, R5, R6, and R7 is spilled. switch (Reg) { case ARM::LR: LRSpilled = true; LLVM_FALLTHROUGH; case ARM::R0: case ARM::R1: case ARM::R2: case ARM::R3: case ARM::R4: case ARM::R5: case ARM::R6: case ARM::R7: CS1Spilled = true; break; default: break; } } else { if (!STI.splitFramePushPop(MF)) { UnspilledCS1GPRs.push_back(Reg); continue; } switch (Reg) { case ARM::R0: case ARM::R1: case ARM::R2: case ARM::R3: case ARM::R4: case ARM::R5: case ARM::R6: case ARM::R7: case ARM::LR: UnspilledCS1GPRs.push_back(Reg); break; default: UnspilledCS2GPRs.push_back(Reg); break; } } } bool ForceLRSpill = false; if (!LRSpilled && AFI->isThumb1OnlyFunction()) { unsigned FnSize = GetFunctionSizeInBytes(MF, TII); // Force LR to be spilled if the Thumb function size is > 2048. This enables // use of BL to implement far jump. If it turns out that it's not needed // then the branch fix up path will undo it. if (FnSize >= (1 << 11)) { CanEliminateFrame = false; ForceLRSpill = true; } } // If any of the stack slot references may be out of range of an immediate // offset, make sure a register (or a spill slot) is available for the // register scavenger. Note that if we're indexing off the frame pointer, the // effective stack size is 4 bytes larger since the FP points to the stack // slot of the previous FP. Also, if we have variable sized objects in the // function, stack slot references will often be negative, and some of // our instructions are positive-offset only, so conservatively consider // that case to want a spill slot (or register) as well. Similarly, if // the function adjusts the stack pointer during execution and the // adjustments aren't already part of our stack size estimate, our offset // calculations may be off, so be conservative. // FIXME: We could add logic to be more precise about negative offsets // and which instructions will need a scratch register for them. Is it // worth the effort and added fragility? unsigned EstimatedStackSize = MFI.estimateStackSize(MF) + 4 * (NumGPRSpills + NumFPRSpills); // Determine biggest (positive) SP offset in MachineFrameInfo. int MaxFixedOffset = 0; for (int I = MFI.getObjectIndexBegin(); I < 0; ++I) { int MaxObjectOffset = MFI.getObjectOffset(I) + MFI.getObjectSize(I); MaxFixedOffset = std::max(MaxFixedOffset, MaxObjectOffset); } bool HasFP = hasFP(MF); if (HasFP) { if (AFI->hasStackFrame()) EstimatedStackSize += 4; } else { // If FP is not used, SP will be used to access arguments, so count the // size of arguments into the estimation. EstimatedStackSize += MaxFixedOffset; } EstimatedStackSize += 16; // For possible paddings. 
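  // Worked example (illustrative numbers, not from the source): with an
  // estimated 200 bytes of locals, 6 GPR spills and 2 DPR spills (counted as 4
  // above), and a frame pointer, this yields 200 + 4 * (6 + 4) + 4 + 16 = 260
  // bytes, which is then compared against the offset limit computed below.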
unsigned EstimatedRSStackSizeLimit = estimateRSStackSizeLimit(MF, this); int MaxFPOffset = getMaxFPOffset(MF.getFunction(), *AFI); bool BigFrameOffsets = EstimatedStackSize >= EstimatedRSStackSizeLimit || MFI.hasVarSizedObjects() || (MFI.adjustsStack() && !canSimplifyCallFramePseudos(MF)) || // For large argument stacks fp relative addressed may overflow. (HasFP && (MaxFixedOffset - MaxFPOffset) >= (int)EstimatedRSStackSizeLimit); if (BigFrameOffsets || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF)) { AFI->setHasStackFrame(true); if (HasFP) { SavedRegs.set(FramePtr); // If the frame pointer is required by the ABI, also spill LR so that we // emit a complete frame record. if (MF.getTarget().Options.DisableFramePointerElim(MF) && !LRSpilled) { SavedRegs.set(ARM::LR); LRSpilled = true; NumGPRSpills++; auto LRPos = llvm::find(UnspilledCS1GPRs, ARM::LR); if (LRPos != UnspilledCS1GPRs.end()) UnspilledCS1GPRs.erase(LRPos); } auto FPPos = llvm::find(UnspilledCS1GPRs, FramePtr); if (FPPos != UnspilledCS1GPRs.end()) UnspilledCS1GPRs.erase(FPPos); NumGPRSpills++; if (FramePtr == ARM::R7) CS1Spilled = true; } // This is true when we inserted a spill for an unused register that can now // be used for register scavenging. bool ExtraCSSpill = false; if (AFI->isThumb1OnlyFunction()) { // For Thumb1-only targets, we need some low registers when we save and // restore the high registers (which aren't allocatable, but could be // used by inline assembly) because the push/pop instructions can not // access high registers. If necessary, we might need to push more low // registers to ensure that there is at least one free that can be used // for the saving & restoring, and preferably we should ensure that as // many as are needed are available so that fewer push/pop instructions // are required. // Low registers which are not currently pushed, but could be (r4-r7). SmallVector AvailableRegs; // Unused argument registers (r0-r3) can be clobbered in the prologue for // free. int EntryRegDeficit = 0; for (unsigned Reg : {ARM::R0, ARM::R1, ARM::R2, ARM::R3}) { if (!MF.getRegInfo().isLiveIn(Reg)) { --EntryRegDeficit; LLVM_DEBUG(dbgs() << printReg(Reg, TRI) << " is unused argument register, EntryRegDeficit = " << EntryRegDeficit << "\n"); } } // Unused return registers can be clobbered in the epilogue for free. int ExitRegDeficit = AFI->getReturnRegsCount() - 4; LLVM_DEBUG(dbgs() << AFI->getReturnRegsCount() << " return regs used, ExitRegDeficit = " << ExitRegDeficit << "\n"); int RegDeficit = std::max(EntryRegDeficit, ExitRegDeficit); LLVM_DEBUG(dbgs() << "RegDeficit = " << RegDeficit << "\n"); // r4-r6 can be used in the prologue if they are pushed by the first push // instruction. for (unsigned Reg : {ARM::R4, ARM::R5, ARM::R6}) { if (SavedRegs.test(Reg)) { --RegDeficit; LLVM_DEBUG(dbgs() << printReg(Reg, TRI) << " is saved low register, RegDeficit = " << RegDeficit << "\n"); } else { AvailableRegs.push_back(Reg); LLVM_DEBUG( dbgs() << printReg(Reg, TRI) << " is non-saved low register, adding to AvailableRegs\n"); } } // r7 can be used if it is not being used as the frame pointer. if (!HasFP) { if (SavedRegs.test(ARM::R7)) { --RegDeficit; LLVM_DEBUG(dbgs() << "%r7 is saved low register, RegDeficit = " << RegDeficit << "\n"); } else { AvailableRegs.push_back(ARM::R7); LLVM_DEBUG( dbgs() << "%r7 is non-saved low register, adding to AvailableRegs\n"); } } // Each of r8-r11 needs to be copied to a low register, then pushed. 
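      // Illustrative only (this copy/push code is emitted by Thumb1 frame
      // lowering, not here): saving r8 and r9 typically ends up as
      //   mov  r5, r8
      //   mov  r6, r9
      //   push {r4, r5, r6, r7, lr}
      // so each saved high register worsens the low-register deficit tracked
      // below.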
for (unsigned Reg : {ARM::R8, ARM::R9, ARM::R10, ARM::R11}) { if (SavedRegs.test(Reg)) { ++RegDeficit; LLVM_DEBUG(dbgs() << printReg(Reg, TRI) << " is saved high register, RegDeficit = " << RegDeficit << "\n"); } } // LR can only be used by PUSH, not POP, and can't be used at all if the // llvm.returnaddress intrinsic is used. This is only worth doing if we // are more limited at function entry than exit. if ((EntryRegDeficit > ExitRegDeficit) && !(MF.getRegInfo().isLiveIn(ARM::LR) && MF.getFrameInfo().isReturnAddressTaken())) { if (SavedRegs.test(ARM::LR)) { --RegDeficit; LLVM_DEBUG(dbgs() << "%lr is saved register, RegDeficit = " << RegDeficit << "\n"); } else { AvailableRegs.push_back(ARM::LR); LLVM_DEBUG(dbgs() << "%lr is not saved, adding to AvailableRegs\n"); } } // If there are more high registers that need pushing than low registers // available, push some more low registers so that we can use fewer push // instructions. This might not reduce RegDeficit all the way to zero, // because we can only guarantee that r4-r6 are available, but r8-r11 may // need saving. LLVM_DEBUG(dbgs() << "Final RegDeficit = " << RegDeficit << "\n"); for (; RegDeficit > 0 && !AvailableRegs.empty(); --RegDeficit) { unsigned Reg = AvailableRegs.pop_back_val(); LLVM_DEBUG(dbgs() << "Spilling " << printReg(Reg, TRI) << " to make up reg deficit\n"); SavedRegs.set(Reg); NumGPRSpills++; CS1Spilled = true; assert(!MRI.isReserved(Reg) && "Should not be reserved"); if (!MRI.isPhysRegUsed(Reg)) ExtraCSSpill = true; UnspilledCS1GPRs.erase(llvm::find(UnspilledCS1GPRs, Reg)); if (Reg == ARM::LR) LRSpilled = true; } LLVM_DEBUG(dbgs() << "After adding spills, RegDeficit = " << RegDeficit << "\n"); } // If LR is not spilled, but at least one of R4, R5, R6, and R7 is spilled. // Spill LR as well so we can fold BX_RET to the registers restore (LDM). if (!LRSpilled && CS1Spilled) { SavedRegs.set(ARM::LR); NumGPRSpills++; SmallVectorImpl::iterator LRPos; LRPos = llvm::find(UnspilledCS1GPRs, (unsigned)ARM::LR); if (LRPos != UnspilledCS1GPRs.end()) UnspilledCS1GPRs.erase(LRPos); ForceLRSpill = false; if (!MRI.isReserved(ARM::LR) && !MRI.isPhysRegUsed(ARM::LR)) ExtraCSSpill = true; } // If stack and double are 8-byte aligned and we are spilling an odd number // of GPRs, spill one extra callee save GPR so we won't have to pad between // the integer and double callee save areas. LLVM_DEBUG(dbgs() << "NumGPRSpills = " << NumGPRSpills << "\n"); unsigned TargetAlign = getStackAlignment(); if (TargetAlign >= 8 && (NumGPRSpills & 1)) { if (CS1Spilled && !UnspilledCS1GPRs.empty()) { for (unsigned i = 0, e = UnspilledCS1GPRs.size(); i != e; ++i) { unsigned Reg = UnspilledCS1GPRs[i]; // Don't spill high register if the function is thumb. In the case of // Windows on ARM, accept R11 (frame pointer) if (!AFI->isThumbFunction() || (STI.isTargetWindows() && Reg == ARM::R11) || isARMLowRegister(Reg) || Reg == ARM::LR) { SavedRegs.set(Reg); LLVM_DEBUG(dbgs() << "Spilling " << printReg(Reg, TRI) << " to make up alignment\n"); if (!MRI.isReserved(Reg) && !MRI.isPhysRegUsed(Reg)) ExtraCSSpill = true; break; } } } else if (!UnspilledCS2GPRs.empty() && !AFI->isThumb1OnlyFunction()) { unsigned Reg = UnspilledCS2GPRs.front(); SavedRegs.set(Reg); LLVM_DEBUG(dbgs() << "Spilling " << printReg(Reg, TRI) << " to make up alignment\n"); if (!MRI.isReserved(Reg) && !MRI.isPhysRegUsed(Reg)) ExtraCSSpill = true; } } // Estimate if we might need to scavenge a register at some point in order // to materialize a stack offset. 
If so, either spill one additional // callee-saved register or reserve a special spill slot to facilitate // register scavenging. Thumb1 needs a spill slot for stack pointer // adjustments also, even when the frame itself is small. if (BigFrameOffsets && !ExtraCSSpill) { // If any non-reserved CS register isn't spilled, just spill one or two // extra. That should take care of it! unsigned NumExtras = TargetAlign / 4; SmallVector Extras; while (NumExtras && !UnspilledCS1GPRs.empty()) { unsigned Reg = UnspilledCS1GPRs.back(); UnspilledCS1GPRs.pop_back(); if (!MRI.isReserved(Reg) && (!AFI->isThumb1OnlyFunction() || isARMLowRegister(Reg) || Reg == ARM::LR)) { Extras.push_back(Reg); NumExtras--; } } // For non-Thumb1 functions, also check for hi-reg CS registers if (!AFI->isThumb1OnlyFunction()) { while (NumExtras && !UnspilledCS2GPRs.empty()) { unsigned Reg = UnspilledCS2GPRs.back(); UnspilledCS2GPRs.pop_back(); if (!MRI.isReserved(Reg)) { Extras.push_back(Reg); NumExtras--; } } } if (NumExtras == 0) { for (unsigned Reg : Extras) { SavedRegs.set(Reg); if (!MRI.isPhysRegUsed(Reg)) ExtraCSSpill = true; } } if (!ExtraCSSpill && !AFI->isThumb1OnlyFunction()) { // note: Thumb1 functions spill to R12, not the stack. Reserve a slot // closest to SP or frame pointer. assert(RS && "Register scavenging not provided"); const TargetRegisterClass &RC = ARM::GPRRegClass; unsigned Size = TRI->getSpillSize(RC); unsigned Align = TRI->getSpillAlignment(RC); RS->addScavengingFrameIndex(MFI.CreateStackObject(Size, Align, false)); } } } if (ForceLRSpill) { SavedRegs.set(ARM::LR); AFI->setLRIsSpilledForFarJump(true); } } MachineBasicBlock::iterator ARMFrameLowering::eliminateCallFramePseudoInstr( MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const { const ARMBaseInstrInfo &TII = *static_cast(MF.getSubtarget().getInstrInfo()); if (!hasReservedCallFrame(MF)) { // If we have alloca, convert as follows: // ADJCALLSTACKDOWN -> sub, sp, sp, amount // ADJCALLSTACKUP -> add, sp, sp, amount MachineInstr &Old = *I; DebugLoc dl = Old.getDebugLoc(); unsigned Amount = TII.getFrameSize(Old); if (Amount != 0) { // We need to keep the stack aligned properly. To do this, we round the // amount of space needed for the outgoing arguments up to the next // alignment boundary. Amount = alignSPAdjust(Amount); ARMFunctionInfo *AFI = MF.getInfo(); assert(!AFI->isThumb1OnlyFunction() && "This eliminateCallFramePseudoInstr does not support Thumb1!"); bool isARM = !AFI->isThumbFunction(); // Replace the pseudo instruction with a new instruction... unsigned Opc = Old.getOpcode(); int PIdx = Old.findFirstPredOperandIdx(); ARMCC::CondCodes Pred = (PIdx == -1) ? ARMCC::AL : (ARMCC::CondCodes)Old.getOperand(PIdx).getImm(); unsigned PredReg = TII.getFramePred(Old); if (Opc == ARM::ADJCALLSTACKDOWN || Opc == ARM::tADJCALLSTACKDOWN) { emitSPUpdate(isARM, MBB, I, dl, TII, -Amount, MachineInstr::NoFlags, Pred, PredReg); } else { assert(Opc == ARM::ADJCALLSTACKUP || Opc == ARM::tADJCALLSTACKUP); emitSPUpdate(isARM, MBB, I, dl, TII, Amount, MachineInstr::NoFlags, Pred, PredReg); } } } return MBB.erase(I); } /// Get the minimum constant for ARM that is greater than or equal to the /// argument. In ARM, constants can have any value that can be produced by /// rotating an 8-bit value to the right by an even number of bits within a /// 32-bit word. 
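/// For example (illustrative): alignToARMConstant(513) returns 516, since
/// 513 (0x201) cannot be encoded as a rotated 8-bit immediate, while
/// 516 (0x204 == 0x81 rotated right by 30) can.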
static uint32_t alignToARMConstant(uint32_t Value) {
  unsigned Shifted = 0;

  if (Value == 0)
    return 0;

  while (!(Value & 0xC0000000)) {
    Value = Value << 2;
    Shifted += 2;
  }

  bool Carry = (Value & 0x00FFFFFF);
  Value = ((Value & 0xFF000000) >> 24) + Carry;

  if (Value & 0x0000100)
    Value = Value & 0x000001FC;

  if (Shifted > 24)
    Value = Value >> (Shifted - 24);
  else
    Value = Value << (24 - Shifted);

  return Value;
}

// The stack limit in the TCB is set to this many bytes above the actual
// stack limit.
static const uint64_t kSplitStackAvailable = 256;

// Adjust the function prologue to enable split stacks. This currently only
// supports android and linux.
//
// The ABI of the segmented stack prologue is a little arbitrarily chosen, but
// must be well defined in order to allow for consistent implementations of the
// __morestack helper function. The ABI is also not a normal ABI in that it
// doesn't follow the normal calling conventions because this allows the
// prologue of each function to be optimized further.
//
// Currently, the ABI looks like (when calling __morestack)
//
//   * r4 holds the minimum stack size requested for this function call
//   * r5 holds the stack size of the arguments to the function
//   * the beginning of the function is 3 instructions after the call to
//     __morestack
//
// Implementations of __morestack should use r4 to allocate a new stack, r5 to
// place the arguments on to the new stack, and the 3-instruction knowledge to
// jump directly to the body of the function when working on the new stack.
//
// An old (and possibly no longer compatible) implementation of __morestack for
// ARM can be found at [1].
//
// [1] - https://github.com/mozilla/rust/blob/86efd9/src/rt/arch/arm/morestack.S
void ARMFrameLowering::adjustForSegmentedStacks(
    MachineFunction &MF, MachineBasicBlock &PrologueMBB) const {
  unsigned Opcode;
  unsigned CFIIndex;
  const ARMSubtarget *ST = &MF.getSubtarget<ARMSubtarget>();
  bool Thumb = ST->isThumb();

  // Sadly, this currently doesn't support varargs or platforms other than
  // android/linux. Note that thumb1/thumb2 are supported for android/linux.
  if (MF.getFunction().isVarArg())
    report_fatal_error("Segmented stacks do not support vararg functions.");
  if (!ST->isTargetAndroid() && !ST->isTargetLinux())
    report_fatal_error("Segmented stacks not supported on this platform.");

  MachineFrameInfo &MFI = MF.getFrameInfo();
  MachineModuleInfo &MMI = MF.getMMI();
  MCContext &Context = MMI.getContext();
  const MCRegisterInfo *MRI = Context.getRegisterInfo();
  const ARMBaseInstrInfo &TII =
      *static_cast<const ARMBaseInstrInfo *>(MF.getSubtarget().getInstrInfo());
  ARMFunctionInfo *ARMFI = MF.getInfo<ARMFunctionInfo>();
  DebugLoc DL;

  uint64_t StackSize = MFI.getStackSize();

  // Do not generate a prologue for leaf functions with a stack of size zero.
  // For non-leaf functions we have to allow for the possibility that the
  // call is to a non-split function, as in PR37807.
  if (StackSize == 0 && !MFI.hasTailCall())
    return;

  // Use R4 and R5 as scratch registers.
  // We save R4 and R5 before use and restore them before leaving the function.
  unsigned ScratchReg0 = ARM::R4;
  unsigned ScratchReg1 = ARM::R5;
  uint64_t AlignedStackSize;

  MachineBasicBlock *PrevStackMBB = MF.CreateMachineBasicBlock();
  MachineBasicBlock *PostStackMBB = MF.CreateMachineBasicBlock();
  MachineBasicBlock *AllocMBB = MF.CreateMachineBasicBlock();
  MachineBasicBlock *GetMBB = MF.CreateMachineBasicBlock();
  MachineBasicBlock *McrMBB = MF.CreateMachineBasicBlock();

  // Grab everything that reaches PrologueMBB to update their liveness as well.
  SmallPtrSet<MachineBasicBlock *, 8> BeforePrologueRegion;
  SmallVector<MachineBasicBlock *, 2> WalkList;
  WalkList.push_back(&PrologueMBB);

  do {
    MachineBasicBlock *CurMBB = WalkList.pop_back_val();
    for (MachineBasicBlock *PredBB : CurMBB->predecessors()) {
      if (BeforePrologueRegion.insert(PredBB).second)
        WalkList.push_back(PredBB);
    }
  } while (!WalkList.empty());

  // The order in that list is important.
  // The blocks will all be inserted before PrologueMBB using that order.
  // Therefore the block that should appear first in the CFG should appear
  // first in the list.
  MachineBasicBlock *AddedBlocks[] = {PrevStackMBB, McrMBB, GetMBB, AllocMBB,
                                      PostStackMBB};

  for (MachineBasicBlock *B : AddedBlocks)
    BeforePrologueRegion.insert(B);

  for (const auto &LI : PrologueMBB.liveins()) {
    for (MachineBasicBlock *PredBB : BeforePrologueRegion)
      PredBB->addLiveIn(LI);
  }

  // Remove the newly added blocks from the list, since we know
  // we do not have to do the following updates for them.
  for (MachineBasicBlock *B : AddedBlocks) {
    BeforePrologueRegion.erase(B);
    MF.insert(PrologueMBB.getIterator(), B);
  }

  for (MachineBasicBlock *MBB : BeforePrologueRegion) {
    // Make sure the LiveIns are still sorted and unique.
    MBB->sortUniqueLiveIns();
    // Replace the edges to PrologueMBB by edges to the sequences
    // we are about to add.
    MBB->ReplaceUsesOfBlockWith(&PrologueMBB, AddedBlocks[0]);
  }

  // The required stack size, aligned so that it is encodable as an ARM
  // immediate.
  AlignedStackSize = alignToARMConstant(StackSize);

  // When the frame size is less than 256 we just compare the stack
  // boundary directly to the value of the stack pointer, per gcc.
  bool CompareStackPointer = AlignedStackSize < kSplitStackAvailable;

  // We will use two of the callee-save registers as scratch registers so we
  // need to save those registers onto the stack.
  // We will use SR0 to hold the stack limit and SR1 to hold the stack size
  // requested and the arguments for __morestack().
// SR0: Scratch Register #0 // SR1: Scratch Register #1 // push {SR0, SR1} if (Thumb) { BuildMI(PrevStackMBB, DL, TII.get(ARM::tPUSH)) .add(predOps(ARMCC::AL)) .addReg(ScratchReg0) .addReg(ScratchReg1); } else { BuildMI(PrevStackMBB, DL, TII.get(ARM::STMDB_UPD)) .addReg(ARM::SP, RegState::Define) .addReg(ARM::SP) .add(predOps(ARMCC::AL)) .addReg(ScratchReg0) .addReg(ScratchReg1); } // Emit the relevant DWARF information about the change in stack pointer as // well as where to find both r4 and r5 (the callee-save registers) CFIIndex = MF.addFrameInst(MCCFIInstruction::createDefCfaOffset(nullptr, -8)); BuildMI(PrevStackMBB, DL, TII.get(TargetOpcode::CFI_INSTRUCTION)) .addCFIIndex(CFIIndex); CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset( nullptr, MRI->getDwarfRegNum(ScratchReg1, true), -4)); BuildMI(PrevStackMBB, DL, TII.get(TargetOpcode::CFI_INSTRUCTION)) .addCFIIndex(CFIIndex); CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset( nullptr, MRI->getDwarfRegNum(ScratchReg0, true), -8)); BuildMI(PrevStackMBB, DL, TII.get(TargetOpcode::CFI_INSTRUCTION)) .addCFIIndex(CFIIndex); // mov SR1, sp if (Thumb) { BuildMI(McrMBB, DL, TII.get(ARM::tMOVr), ScratchReg1) .addReg(ARM::SP) .add(predOps(ARMCC::AL)); } else if (CompareStackPointer) { BuildMI(McrMBB, DL, TII.get(ARM::MOVr), ScratchReg1) .addReg(ARM::SP) .add(predOps(ARMCC::AL)) .add(condCodeOp()); } // sub SR1, sp, #StackSize if (!CompareStackPointer && Thumb) { BuildMI(McrMBB, DL, TII.get(ARM::tSUBi8), ScratchReg1) .add(condCodeOp()) .addReg(ScratchReg1) .addImm(AlignedStackSize) .add(predOps(ARMCC::AL)); } else if (!CompareStackPointer) { BuildMI(McrMBB, DL, TII.get(ARM::SUBri), ScratchReg1) .addReg(ARM::SP) .addImm(AlignedStackSize) .add(predOps(ARMCC::AL)) .add(condCodeOp()); } if (Thumb && ST->isThumb1Only()) { unsigned PCLabelId = ARMFI->createPICLabelUId(); ARMConstantPoolValue *NewCPV = ARMConstantPoolSymbol::Create( MF.getFunction().getContext(), "__STACK_LIMIT", PCLabelId, 0); MachineConstantPool *MCP = MF.getConstantPool(); unsigned CPI = MCP->getConstantPoolIndex(NewCPV, 4); // ldr SR0, [pc, offset(STACK_LIMIT)] BuildMI(GetMBB, DL, TII.get(ARM::tLDRpci), ScratchReg0) .addConstantPoolIndex(CPI) .add(predOps(ARMCC::AL)); // ldr SR0, [SR0] BuildMI(GetMBB, DL, TII.get(ARM::tLDRi), ScratchReg0) .addReg(ScratchReg0) .addImm(0) .add(predOps(ARMCC::AL)); } else { // Get TLS base address from the coprocessor // mrc p15, #0, SR0, c13, c0, #3 BuildMI(McrMBB, DL, TII.get(ARM::MRC), ScratchReg0) .addImm(15) .addImm(0) .addImm(13) .addImm(0) .addImm(3) .add(predOps(ARMCC::AL)); // Use the last tls slot on android and a private field of the TCP on linux. assert(ST->isTargetAndroid() || ST->isTargetLinux()); unsigned TlsOffset = ST->isTargetAndroid() ? 63 : 1; // Get the stack limit from the right offset // ldr SR0, [sr0, #4 * TlsOffset] BuildMI(GetMBB, DL, TII.get(ARM::LDRi12), ScratchReg0) .addReg(ScratchReg0) .addImm(4 * TlsOffset) .add(predOps(ARMCC::AL)); } // Compare stack limit with stack size requested. // cmp SR0, SR1 Opcode = Thumb ? ARM::tCMPr : ARM::CMPrr; BuildMI(GetMBB, DL, TII.get(Opcode)) .addReg(ScratchReg0) .addReg(ScratchReg1) .add(predOps(ARMCC::AL)); // This jump is taken if StackLimit < SP - stack required. Opcode = Thumb ? ARM::tBcc : ARM::Bcc; BuildMI(GetMBB, DL, TII.get(Opcode)).addMBB(PostStackMBB) .addImm(ARMCC::LO) .addReg(ARM::CPSR); // Calling __morestack(StackSize, Size of stack arguments). 
  // __morestack knows that the stack size requested is in SR0(r4)
  // and amount size of stack arguments is in SR1(r5).

  // Pass first argument for the __morestack by Scratch Register #0.
  //   The amount size of stack required
  if (Thumb) {
    BuildMI(AllocMBB, DL, TII.get(ARM::tMOVi8), ScratchReg0)
        .add(condCodeOp())
        .addImm(AlignedStackSize)
        .add(predOps(ARMCC::AL));
  } else {
    BuildMI(AllocMBB, DL, TII.get(ARM::MOVi), ScratchReg0)
        .addImm(AlignedStackSize)
        .add(predOps(ARMCC::AL))
        .add(condCodeOp());
  }

  // Pass second argument for the __morestack by Scratch Register #1.
  //   The amount size of stack consumed to save function arguments.
  if (Thumb) {
    BuildMI(AllocMBB, DL, TII.get(ARM::tMOVi8), ScratchReg1)
        .add(condCodeOp())
        .addImm(alignToARMConstant(ARMFI->getArgumentStackSize()))
        .add(predOps(ARMCC::AL));
  } else {
    BuildMI(AllocMBB, DL, TII.get(ARM::MOVi), ScratchReg1)
        .addImm(alignToARMConstant(ARMFI->getArgumentStackSize()))
        .add(predOps(ARMCC::AL))
        .add(condCodeOp());
  }

  // push {lr} - Save return address of this function.
  if (Thumb) {
    BuildMI(AllocMBB, DL, TII.get(ARM::tPUSH))
        .add(predOps(ARMCC::AL))
        .addReg(ARM::LR);
  } else {
    BuildMI(AllocMBB, DL, TII.get(ARM::STMDB_UPD))
        .addReg(ARM::SP, RegState::Define)
        .addReg(ARM::SP)
        .add(predOps(ARMCC::AL))
        .addReg(ARM::LR);
  }

  // Emit the DWARF info about the change in stack as well as where to find the
  // previous link register
  CFIIndex = MF.addFrameInst(MCCFIInstruction::createDefCfaOffset(nullptr, -12));
  BuildMI(AllocMBB, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
      .addCFIIndex(CFIIndex);
  CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset(
      nullptr, MRI->getDwarfRegNum(ARM::LR, true), -12));
  BuildMI(AllocMBB, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
      .addCFIIndex(CFIIndex);

  // Call __morestack().
  if (Thumb) {
    BuildMI(AllocMBB, DL, TII.get(ARM::tBL))
        .add(predOps(ARMCC::AL))
        .addExternalSymbol("__morestack");
  } else {
    BuildMI(AllocMBB, DL, TII.get(ARM::BL))
        .addExternalSymbol("__morestack");
  }

  // pop {lr} - Restore return address of this original function.
  if (Thumb) {
    if (ST->isThumb1Only()) {
      BuildMI(AllocMBB, DL, TII.get(ARM::tPOP))
          .add(predOps(ARMCC::AL))
          .addReg(ScratchReg0);
      BuildMI(AllocMBB, DL, TII.get(ARM::tMOVr), ARM::LR)
          .addReg(ScratchReg0)
          .add(predOps(ARMCC::AL));
    } else {
      BuildMI(AllocMBB, DL, TII.get(ARM::t2LDR_POST))
          .addReg(ARM::LR, RegState::Define)
          .addReg(ARM::SP, RegState::Define)
          .addReg(ARM::SP)
          .addImm(4)
          .add(predOps(ARMCC::AL));
    }
  } else {
    BuildMI(AllocMBB, DL, TII.get(ARM::LDMIA_UPD))
        .addReg(ARM::SP, RegState::Define)
        .addReg(ARM::SP)
        .add(predOps(ARMCC::AL))
        .addReg(ARM::LR);
  }

  // Restore SR0 and SR1 in case of __morestack() was called.
  // __morestack() will skip PostStackMBB block so we need to restore
  // scratch registers from here.
  // pop {SR0, SR1}
  if (Thumb) {
    BuildMI(AllocMBB, DL, TII.get(ARM::tPOP))
        .add(predOps(ARMCC::AL))
        .addReg(ScratchReg0)
        .addReg(ScratchReg1);
  } else {
    BuildMI(AllocMBB, DL, TII.get(ARM::LDMIA_UPD))
        .addReg(ARM::SP, RegState::Define)
        .addReg(ARM::SP)
        .add(predOps(ARMCC::AL))
        .addReg(ScratchReg0)
        .addReg(ScratchReg1);
  }

  // Update the CFA offset now that we've popped
  CFIIndex = MF.addFrameInst(MCCFIInstruction::createDefCfaOffset(nullptr, 0));
  BuildMI(AllocMBB, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
      .addCFIIndex(CFIIndex);

  // Return from this function.
  BuildMI(AllocMBB, DL, TII.get(ST->getReturnOpcode())).add(predOps(ARMCC::AL));

  // Restore SR0 and SR1 in case of __morestack() was not called.
  // pop {SR0, SR1}
  if (Thumb) {
    BuildMI(PostStackMBB, DL, TII.get(ARM::tPOP))
        .add(predOps(ARMCC::AL))
        .addReg(ScratchReg0)
        .addReg(ScratchReg1);
  } else {
    BuildMI(PostStackMBB, DL, TII.get(ARM::LDMIA_UPD))
        .addReg(ARM::SP, RegState::Define)
        .addReg(ARM::SP)
        .add(predOps(ARMCC::AL))
        .addReg(ScratchReg0)
        .addReg(ScratchReg1);
  }

  // Update the CFA offset now that we've popped
  CFIIndex = MF.addFrameInst(MCCFIInstruction::createDefCfaOffset(nullptr, 0));
  BuildMI(PostStackMBB, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
      .addCFIIndex(CFIIndex);

  // Tell debuggers that r4 and r5 are now the same as they were in the
  // previous function, that they're the "Same Value".
  CFIIndex = MF.addFrameInst(MCCFIInstruction::createSameValue(
      nullptr, MRI->getDwarfRegNum(ScratchReg0, true)));
  BuildMI(PostStackMBB, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
      .addCFIIndex(CFIIndex);
  CFIIndex = MF.addFrameInst(MCCFIInstruction::createSameValue(
      nullptr, MRI->getDwarfRegNum(ScratchReg1, true)));
  BuildMI(PostStackMBB, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
      .addCFIIndex(CFIIndex);

  // Organizing MBB lists
  PostStackMBB->addSuccessor(&PrologueMBB);

  AllocMBB->addSuccessor(PostStackMBB);

  GetMBB->addSuccessor(PostStackMBB);
  GetMBB->addSuccessor(AllocMBB);

  McrMBB->addSuccessor(GetMBB);

  PrevStackMBB->addSuccessor(McrMBB);

#ifdef EXPENSIVE_CHECKS
  MF.verify();
#endif
}
Index: vendor/llvm/dist-release_70/lib/Target/ARM/ARMInstrFormats.td
===================================================================
--- vendor/llvm/dist-release_70/lib/Target/ARM/ARMInstrFormats.td (revision 338574)
+++ vendor/llvm/dist-release_70/lib/Target/ARM/ARMInstrFormats.td (revision 338575)
@@ -1,2619 +1,2620 @@
//===-- ARMInstrFormats.td - ARM Instruction Formats -------*- tablegen -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//
//
// ARM Instruction Format Definitions.
//

// Format specifies the encoding used by the instruction. This is part of the
// ad-hoc solution used to emit machine instruction encodings by our machine
// code emitter.
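//
// A minimal illustrative sketch (an assumption, not the upstream API): the
// Format and addressing-mode values defined below are packed into each
// instruction's TSFlags word by the InstTemplate class later in this file,
// which assigns TSFlags{4-0} = AM.Value and TSFlags{12-7} = Form. The real
// masks and accessors live on the C++ side (ARMBaseInfo.h) and may be named
// differently; a hypothetical decoder for those two fields could look like:
//
//   // C++ sketch: recover fields packed by InstTemplate's TSFlags layout.
//   static unsigned getInstrFormat(uint64_t TSFlags) {
//     return (TSFlags >> 7) & 0x3f;  // TSFlags{12-7}: 6-bit Format value
//   }
//   static unsigned getAddrMode(uint64_t TSFlags) {
//     return TSFlags & 0x1f;         // TSFlags{4-0}: 5-bit AddrMode value
//   }
//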
class Format val> { bits<6> Value = val; } def Pseudo : Format<0>; def MulFrm : Format<1>; def BrFrm : Format<2>; def BrMiscFrm : Format<3>; def DPFrm : Format<4>; def DPSoRegRegFrm : Format<5>; def LdFrm : Format<6>; def StFrm : Format<7>; def LdMiscFrm : Format<8>; def StMiscFrm : Format<9>; def LdStMulFrm : Format<10>; def LdStExFrm : Format<11>; def ArithMiscFrm : Format<12>; def SatFrm : Format<13>; def ExtFrm : Format<14>; def VFPUnaryFrm : Format<15>; def VFPBinaryFrm : Format<16>; def VFPConv1Frm : Format<17>; def VFPConv2Frm : Format<18>; def VFPConv3Frm : Format<19>; def VFPConv4Frm : Format<20>; def VFPConv5Frm : Format<21>; def VFPLdStFrm : Format<22>; def VFPLdStMulFrm : Format<23>; def VFPMiscFrm : Format<24>; def ThumbFrm : Format<25>; def MiscFrm : Format<26>; def NGetLnFrm : Format<27>; def NSetLnFrm : Format<28>; def NDupFrm : Format<29>; def NLdStFrm : Format<30>; def N1RegModImmFrm: Format<31>; def N2RegFrm : Format<32>; def NVCVTFrm : Format<33>; def NVDupLnFrm : Format<34>; def N2RegVShLFrm : Format<35>; def N2RegVShRFrm : Format<36>; def N3RegFrm : Format<37>; def N3RegVShFrm : Format<38>; def NVExtFrm : Format<39>; def NVMulSLFrm : Format<40>; def NVTBLFrm : Format<41>; def DPSoRegImmFrm : Format<42>; def N3RegCplxFrm : Format<43>; // Misc flags. // The instruction has an Rn register operand. // UnaryDP - Indicates this is a unary data processing instruction, i.e. // it doesn't have a Rn operand. class UnaryDP { bit isUnaryDataProc = 1; } // Xform16Bit - Indicates this Thumb2 instruction may be transformed into // a 16-bit Thumb instruction if certain conditions are met. class Xform16Bit { bit canXformTo16Bit = 1; } //===----------------------------------------------------------------------===// // ARM Instruction flags. These need to match ARMBaseInstrInfo.h. // // FIXME: Once the JIT is MC-ized, these can go away. // Addressing mode. class AddrMode val> { bits<5> Value = val; } def AddrModeNone : AddrMode<0>; def AddrMode1 : AddrMode<1>; def AddrMode2 : AddrMode<2>; def AddrMode3 : AddrMode<3>; def AddrMode4 : AddrMode<4>; def AddrMode5 : AddrMode<5>; def AddrMode6 : AddrMode<6>; def AddrModeT1_1 : AddrMode<7>; def AddrModeT1_2 : AddrMode<8>; def AddrModeT1_4 : AddrMode<9>; def AddrModeT1_s : AddrMode<10>; def AddrModeT2_i12 : AddrMode<11>; def AddrModeT2_i8 : AddrMode<12>; def AddrModeT2_so : AddrMode<13>; def AddrModeT2_pc : AddrMode<14>; def AddrModeT2_i8s4 : AddrMode<15>; def AddrMode_i12 : AddrMode<16>; def AddrMode5FP16 : AddrMode<17>; +def AddrModeT2_ldrex : AddrMode<18>; // Load / store index mode. class IndexMode val> { bits<2> Value = val; } def IndexModeNone : IndexMode<0>; def IndexModePre : IndexMode<1>; def IndexModePost : IndexMode<2>; def IndexModeUpd : IndexMode<3>; // Instruction execution domain. class Domain val> { bits<3> Value = val; } def GenericDomain : Domain<0>; def VFPDomain : Domain<1>; // Instructions in VFP domain only def NeonDomain : Domain<2>; // Instructions in Neon domain only def VFPNeonDomain : Domain<3>; // Instructions in both VFP & Neon domains def VFPNeonA8Domain : Domain<5>; // Instructions in VFP & Neon under A8 //===----------------------------------------------------------------------===// // ARM special operands. // // ARM imod and iflag operands, used only by the CPS instruction. 
def imod_op : Operand { let PrintMethod = "printCPSIMod"; } def ProcIFlagsOperand : AsmOperandClass { let Name = "ProcIFlags"; let ParserMethod = "parseProcIFlagsOperand"; } def iflags_op : Operand { let PrintMethod = "printCPSIFlag"; let ParserMatchClass = ProcIFlagsOperand; } // ARM Predicate operand. Default to 14 = always (AL). Second part is CC // register whose default is 0 (no register). def CondCodeOperand : AsmOperandClass { let Name = "CondCode"; } def pred : PredicateOperand { let PrintMethod = "printPredicateOperand"; let ParserMatchClass = CondCodeOperand; let DecoderMethod = "DecodePredicateOperand"; } // Selectable predicate operand for CMOV instructions. We can't use a normal // predicate because the default values interfere with instruction selection. In // all other respects it is identical though: pseudo-instruction expansion // relies on the MachineOperands being compatible. def cmovpred : Operand, PredicateOp, ComplexPattern { let MIOperandInfo = (ops i32imm, i32imm); let PrintMethod = "printPredicateOperand"; } // Conditional code result for instructions whose 's' bit is set, e.g. subs. def CCOutOperand : AsmOperandClass { let Name = "CCOut"; } def cc_out : OptionalDefOperand { let EncoderMethod = "getCCOutOpValue"; let PrintMethod = "printSBitModifierOperand"; let ParserMatchClass = CCOutOperand; let DecoderMethod = "DecodeCCOutOperand"; } // Same as cc_out except it defaults to setting CPSR. def s_cc_out : OptionalDefOperand { let EncoderMethod = "getCCOutOpValue"; let PrintMethod = "printSBitModifierOperand"; let ParserMatchClass = CCOutOperand; let DecoderMethod = "DecodeCCOutOperand"; } // ARM special operands for disassembly only. // def SetEndAsmOperand : ImmAsmOperand<0,1> { let Name = "SetEndImm"; let ParserMethod = "parseSetEndImm"; } def setend_op : Operand { let PrintMethod = "printSetendOperand"; let ParserMatchClass = SetEndAsmOperand; } def MSRMaskOperand : AsmOperandClass { let Name = "MSRMask"; let ParserMethod = "parseMSRMaskOperand"; } def msr_mask : Operand { let PrintMethod = "printMSRMaskOperand"; let DecoderMethod = "DecodeMSRMask"; let ParserMatchClass = MSRMaskOperand; } def BankedRegOperand : AsmOperandClass { let Name = "BankedReg"; let ParserMethod = "parseBankedRegOperand"; } def banked_reg : Operand { let PrintMethod = "printBankedRegOperand"; let DecoderMethod = "DecodeBankedReg"; let ParserMatchClass = BankedRegOperand; } // Shift Right Immediate - A shift right immediate is encoded differently from // other shift immediates. 
The imm6 field is encoded like so: // // Offset Encoding // 8 imm6<5:3> = '001', 8 - is encoded in imm6<2:0> // 16 imm6<5:4> = '01', 16 - is encoded in imm6<3:0> // 32 imm6<5> = '1', 32 - is encoded in imm6<4:0> // 64 64 - is encoded in imm6<5:0> def shr_imm8_asm_operand : ImmAsmOperand<1,8> { let Name = "ShrImm8"; } def shr_imm8 : Operand, ImmLeaf 0 && Imm <= 8; }]> { let EncoderMethod = "getShiftRight8Imm"; let DecoderMethod = "DecodeShiftRight8Imm"; let ParserMatchClass = shr_imm8_asm_operand; } def shr_imm16_asm_operand : ImmAsmOperand<1,16> { let Name = "ShrImm16"; } def shr_imm16 : Operand, ImmLeaf 0 && Imm <= 16; }]> { let EncoderMethod = "getShiftRight16Imm"; let DecoderMethod = "DecodeShiftRight16Imm"; let ParserMatchClass = shr_imm16_asm_operand; } def shr_imm32_asm_operand : ImmAsmOperand<1,32> { let Name = "ShrImm32"; } def shr_imm32 : Operand, ImmLeaf 0 && Imm <= 32; }]> { let EncoderMethod = "getShiftRight32Imm"; let DecoderMethod = "DecodeShiftRight32Imm"; let ParserMatchClass = shr_imm32_asm_operand; } def shr_imm64_asm_operand : ImmAsmOperand<1,64> { let Name = "ShrImm64"; } def shr_imm64 : Operand, ImmLeaf 0 && Imm <= 64; }]> { let EncoderMethod = "getShiftRight64Imm"; let DecoderMethod = "DecodeShiftRight64Imm"; let ParserMatchClass = shr_imm64_asm_operand; } // ARM Assembler operand for ldr Rd, =expression which generates an offset // to a constant pool entry or a MOV depending on the value of expression def const_pool_asm_operand : AsmOperandClass { let Name = "ConstPoolAsmImm"; } def const_pool_asm_imm : Operand { let ParserMatchClass = const_pool_asm_operand; } //===----------------------------------------------------------------------===// // ARM Assembler alias templates. // // Note: When EmitPriority == 1, the alias will be used for printing class ARMInstAlias : InstAlias, Requires<[IsARM]>; class ARMInstSubst : InstAlias, Requires<[IsARM,UseNegativeImmediates]>; class tInstAlias : InstAlias, Requires<[IsThumb]>; class tInstSubst : InstAlias, Requires<[IsThumb,UseNegativeImmediates]>; class t2InstAlias : InstAlias, Requires<[IsThumb2]>; class t2InstSubst : InstAlias, Requires<[IsThumb2,UseNegativeImmediates]>; class VFP2InstAlias : InstAlias, Requires<[HasVFP2]>; class VFP2DPInstAlias : InstAlias, Requires<[HasVFP2,HasDPVFP]>; class VFP3InstAlias : InstAlias, Requires<[HasVFP3]>; class NEONInstAlias : InstAlias, Requires<[HasNEON]>; class VFP2MnemonicAlias : MnemonicAlias, Requires<[HasVFP2]>; class NEONMnemonicAlias : MnemonicAlias, Requires<[HasNEON]>; //===----------------------------------------------------------------------===// // ARM Instruction templates. // class InstTemplate : Instruction { let Namespace = "ARM"; AddrMode AM = am; int Size = sz; IndexMode IM = im; bits<2> IndexModeBits = IM.Value; Format F = f; bits<6> Form = F.Value; Domain D = d; bit isUnaryDataProc = 0; bit canXformTo16Bit = 0; // The instruction is a 16-bit flag setting Thumb instruction. Used // by the parser to determine whether to require the 'S' suffix on the // mnemonic (when not in an IT block) or preclude it (when in an IT block). bit thumbArithFlagSetting = 0; // If this is a pseudo instruction, mark it isCodeGenOnly. let isCodeGenOnly = !eq(!cast(f), "Pseudo"); // The layout of TSFlags should be kept in sync with ARMBaseInfo.h. 
let TSFlags{4-0} = AM.Value; let TSFlags{6-5} = IndexModeBits; let TSFlags{12-7} = Form; let TSFlags{13} = isUnaryDataProc; let TSFlags{14} = canXformTo16Bit; let TSFlags{17-15} = D.Value; let TSFlags{18} = thumbArithFlagSetting; let Constraints = cstr; let Itinerary = itin; } class Encoding { field bits<32> Inst; // Mask of bits that cause an encoding to be UNPREDICTABLE. // If a bit is set, then if the corresponding bit in the // target encoding differs from its value in the "Inst" field, // the instruction is UNPREDICTABLE (SoftFail in abstract parlance). field bits<32> Unpredictable = 0; // SoftFail is the generic name for this field, but we alias it so // as to make it more obvious what it means in ARM-land. field bits<32> SoftFail = Unpredictable; } class InstARM : InstTemplate, Encoding { let DecoderNamespace = "ARM"; } // This Encoding-less class is used by Thumb1 to specify the encoding bits later // on by adding flavors to specific instructions. class InstThumb : InstTemplate { let DecoderNamespace = "Thumb"; } // Pseudo-instructions for alternate assembly syntax (never used by codegen). // These are aliases that require C++ handling to convert to the target // instruction, while InstAliases can be handled directly by tblgen. class AsmPseudoInst : InstTemplate { let OutOperandList = oops; let InOperandList = iops; let Pattern = []; let isCodeGenOnly = 0; // So we get asm matcher for it. let AsmString = asm; let isPseudo = 1; } class ARMAsmPseudo : AsmPseudoInst, Requires<[IsARM]>; class tAsmPseudo : AsmPseudoInst, Requires<[IsThumb]>; class t2AsmPseudo : AsmPseudoInst, Requires<[IsThumb2]>; class VFP2AsmPseudo : AsmPseudoInst, Requires<[HasVFP2]>; class NEONAsmPseudo : AsmPseudoInst, Requires<[HasNEON]>; // Pseudo instructions for the code generator. class PseudoInst pattern> : InstTemplate { let OutOperandList = oops; let InOperandList = iops; let Pattern = pattern; let isCodeGenOnly = 1; let isPseudo = 1; } // PseudoInst that's ARM-mode only. class ARMPseudoInst pattern> : PseudoInst { let Size = sz; list Predicates = [IsARM]; } // PseudoInst that's Thumb-mode only. class tPseudoInst pattern> : PseudoInst { let Size = sz; list Predicates = [IsThumb]; } // PseudoInst that's in ARMv8-M baseline (Somewhere between Thumb and Thumb2) class t2basePseudoInst pattern> : PseudoInst { let Size = sz; list Predicates = [IsThumb,HasV8MBaseline]; } // PseudoInst that's Thumb2-mode only. class t2PseudoInst pattern> : PseudoInst { let Size = sz; list Predicates = [IsThumb2]; } class ARMPseudoExpand pattern, dag Result> : ARMPseudoInst, PseudoInstExpansion; class tPseudoExpand pattern, dag Result> : tPseudoInst, PseudoInstExpansion; class t2PseudoExpand pattern, dag Result> : t2PseudoInst, PseudoInstExpansion; // Almost all ARM instructions are predicable. class I pattern> : InstARM { bits<4> p; let Inst{31-28} = p; let OutOperandList = oops; let InOperandList = !con(iops, (ins pred:$p)); let AsmString = !strconcat(opc, "${p}", asm); let Pattern = pattern; list Predicates = [IsARM]; } // A few are not predicable class InoP pattern> : InstARM { let OutOperandList = oops; let InOperandList = iops; let AsmString = !strconcat(opc, asm); let Pattern = pattern; let isPredicable = 0; list Predicates = [IsARM]; } // Same as I except it can optionally modify CPSR. Note it's modeled as an input // operand since by default it's a zero register. It will become an implicit def // once it's "flipped". 
class sI pattern> : InstARM { bits<4> p; // Predicate operand bits<1> s; // condition-code set flag ('1' if the insn should set the flags) let Inst{31-28} = p; let Inst{20} = s; let OutOperandList = oops; let InOperandList = !con(iops, (ins pred:$p, cc_out:$s)); let AsmString = !strconcat(opc, "${s}${p}", asm); let Pattern = pattern; list Predicates = [IsARM]; } // Special cases class XI pattern> : InstARM { let OutOperandList = oops; let InOperandList = iops; let AsmString = asm; let Pattern = pattern; list Predicates = [IsARM]; } class AI pattern> : I; class AsI pattern> : sI; class AXI pattern> : XI; class AXIM pattern> : XI; class AInoP pattern> : InoP; // Ctrl flow instructions class ABI opcod, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : I { let Inst{27-24} = opcod; } class ABXI opcod, dag oops, dag iops, InstrItinClass itin, string asm, list pattern> : XI { let Inst{27-24} = opcod; } // BR_JT instructions class JTI pattern> : XI; class AIldr_ex_or_acq opcod, bits<2> opcod2, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : I { bits<4> Rt; bits<4> addr; let Inst{27-23} = 0b00011; let Inst{22-21} = opcod; let Inst{20} = 1; let Inst{19-16} = addr; let Inst{15-12} = Rt; let Inst{11-10} = 0b11; let Inst{9-8} = opcod2; let Inst{7-0} = 0b10011111; } class AIstr_ex_or_rel opcod, bits<2> opcod2, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : I { bits<4> Rt; bits<4> addr; let Inst{27-23} = 0b00011; let Inst{22-21} = opcod; let Inst{20} = 0; let Inst{19-16} = addr; let Inst{11-10} = 0b11; let Inst{9-8} = opcod2; let Inst{7-4} = 0b1001; let Inst{3-0} = Rt; } // Atomic load/store instructions class AIldrex opcod, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : AIldr_ex_or_acq; class AIstrex opcod, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : AIstr_ex_or_rel { bits<4> Rd; let Inst{15-12} = Rd; } // Exclusive load/store instructions class AIldaex opcod, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : AIldr_ex_or_acq, Requires<[IsARM, HasAcquireRelease, HasV7Clrex]>; class AIstlex opcod, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : AIstr_ex_or_rel, Requires<[IsARM, HasAcquireRelease, HasV7Clrex]> { bits<4> Rd; let Inst{15-12} = Rd; } class AIswp pattern> : AI { bits<4> Rt; bits<4> Rt2; bits<4> addr; let Inst{27-23} = 0b00010; let Inst{22} = b; let Inst{21-20} = 0b00; let Inst{19-16} = addr; let Inst{15-12} = Rt; let Inst{11-4} = 0b00001001; let Inst{3-0} = Rt2; let Unpredictable{11-8} = 0b1111; let DecoderMethod = "DecodeSwap"; } // Acquire/Release load/store instructions class AIldracq opcod, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : AIldr_ex_or_acq, Requires<[IsARM, HasAcquireRelease]>; class AIstrrel opcod, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : AIstr_ex_or_rel, Requires<[IsARM, HasAcquireRelease]> { let Inst{15-12} = 0b1111; } // addrmode1 instructions class AI1 opcod, dag oops, dag iops, Format f, InstrItinClass itin, string opc, string asm, list pattern> : I { let Inst{24-21} = opcod; let Inst{27-26} = 0b00; } class AsI1 opcod, dag oops, dag iops, Format f, InstrItinClass itin, string opc, string asm, list pattern> : sI { let Inst{24-21} = opcod; let Inst{27-26} = 0b00; } class AXI1 opcod, dag oops, dag iops, Format f, InstrItinClass itin, string asm, list pattern> : XI { let 
Inst{24-21} = opcod; let Inst{27-26} = 0b00; } // loads // LDR/LDRB/STR/STRB/... class AI2ldst op, bit isLd, bit isByte, dag oops, dag iops, AddrMode am, Format f, InstrItinClass itin, string opc, string asm, list pattern> : I { let Inst{27-25} = op; let Inst{24} = 1; // 24 == P // 23 == U let Inst{22} = isByte; let Inst{21} = 0; // 21 == W let Inst{20} = isLd; } // Indexed load/stores class AI2ldstidx pattern> : I { bits<4> Rt; let Inst{27-26} = 0b01; let Inst{24} = isPre; // P bit let Inst{22} = isByte; // B bit let Inst{21} = isPre; // W bit let Inst{20} = isLd; // L bit let Inst{15-12} = Rt; } class AI2stridx_reg pattern> : AI2ldstidx<0, isByte, isPre, oops, iops, im, f, itin, opc, asm, cstr, pattern> { // AM2 store w/ two operands: (GPR, am2offset) // {12} isAdd // {11-0} imm12/Rm bits<14> offset; bits<4> Rn; let Inst{25} = 1; let Inst{23} = offset{12}; let Inst{19-16} = Rn; let Inst{11-5} = offset{11-5}; let Inst{4} = 0; let Inst{3-0} = offset{3-0}; } class AI2stridx_imm pattern> : AI2ldstidx<0, isByte, isPre, oops, iops, im, f, itin, opc, asm, cstr, pattern> { // AM2 store w/ two operands: (GPR, am2offset) // {12} isAdd // {11-0} imm12/Rm bits<14> offset; bits<4> Rn; let Inst{25} = 0; let Inst{23} = offset{12}; let Inst{19-16} = Rn; let Inst{11-0} = offset{11-0}; } // FIXME: Merge with the above class when addrmode2 gets used for STR, STRB // but for now use this class for STRT and STRBT. class AI2stridxT pattern> : AI2ldstidx<0, isByte, isPre, oops, iops, im, f, itin, opc, asm, cstr, pattern> { // AM2 store w/ two operands: (GPR, am2offset) // {17-14} Rn // {13} 1 == Rm, 0 == imm12 // {12} isAdd // {11-0} imm12/Rm bits<18> addr; let Inst{25} = addr{13}; let Inst{23} = addr{12}; let Inst{19-16} = addr{17-14}; let Inst{11-0} = addr{11-0}; } // addrmode3 instructions class AI3ld op, bit op20, dag oops, dag iops, Format f, InstrItinClass itin, string opc, string asm, list pattern> : I { bits<14> addr; bits<4> Rt; let Inst{27-25} = 0b000; let Inst{24} = 1; // P bit let Inst{23} = addr{8}; // U bit let Inst{22} = addr{13}; // 1 == imm8, 0 == Rm let Inst{21} = 0; // W bit let Inst{20} = op20; // L bit let Inst{19-16} = addr{12-9}; // Rn let Inst{15-12} = Rt; // Rt let Inst{11-8} = addr{7-4}; // imm7_4/zero let Inst{7-4} = op; let Inst{3-0} = addr{3-0}; // imm3_0/Rm let DecoderMethod = "DecodeAddrMode3Instruction"; } class AI3ldstidx op, bit op20, bit isPre, dag oops, dag iops, IndexMode im, Format f, InstrItinClass itin, string opc, string asm, string cstr, list pattern> : I { bits<4> Rt; let Inst{27-25} = 0b000; let Inst{24} = isPre; // P bit let Inst{21} = isPre; // W bit let Inst{20} = op20; // L bit let Inst{15-12} = Rt; // Rt let Inst{7-4} = op; } // FIXME: Merge with the above class when addrmode2 gets used for LDR, LDRB // but for now use this class for LDRSBT, LDRHT, LDSHT. 
class AI3ldstidxT op, bit isLoad, dag oops, dag iops, IndexMode im, Format f, InstrItinClass itin, string opc, string asm, string cstr, list pattern> : I { // {13} 1 == imm8, 0 == Rm // {12-9} Rn // {8} isAdd // {7-4} imm7_4/zero // {3-0} imm3_0/Rm bits<4> addr; bits<4> Rt; let Inst{27-25} = 0b000; let Inst{24} = 0; // P bit let Inst{21} = 1; let Inst{20} = isLoad; // L bit let Inst{19-16} = addr; // Rn let Inst{15-12} = Rt; // Rt let Inst{7-4} = op; } // stores class AI3str op, dag oops, dag iops, Format f, InstrItinClass itin, string opc, string asm, list pattern> : I { bits<14> addr; bits<4> Rt; let Inst{27-25} = 0b000; let Inst{24} = 1; // P bit let Inst{23} = addr{8}; // U bit let Inst{22} = addr{13}; // 1 == imm8, 0 == Rm let Inst{21} = 0; // W bit let Inst{20} = 0; // L bit let Inst{19-16} = addr{12-9}; // Rn let Inst{15-12} = Rt; // Rt let Inst{11-8} = addr{7-4}; // imm7_4/zero let Inst{7-4} = op; let Inst{3-0} = addr{3-0}; // imm3_0/Rm let DecoderMethod = "DecodeAddrMode3Instruction"; } // addrmode4 instructions class AXI4 pattern> : XI { bits<4> p; bits<16> regs; bits<4> Rn; let Inst{31-28} = p; let Inst{27-25} = 0b100; let Inst{22} = 0; // S bit let Inst{19-16} = Rn; let Inst{15-0} = regs; } // Unsigned multiply, multiply-accumulate instructions. class AMul1I opcod, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : I { let Inst{7-4} = 0b1001; let Inst{20} = 0; // S bit let Inst{27-21} = opcod; } class AsMul1I opcod, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : sI { let Inst{7-4} = 0b1001; let Inst{27-21} = opcod; } // Most significant word multiply class AMul2I opcod, bits<4> opc7_4, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : I { bits<4> Rd; bits<4> Rn; bits<4> Rm; let Inst{7-4} = opc7_4; let Inst{20} = 1; let Inst{27-21} = opcod; let Inst{19-16} = Rd; let Inst{11-8} = Rm; let Inst{3-0} = Rn; } // MSW multiple w/ Ra operand class AMul2Ia opcod, bits<4> opc7_4, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : AMul2I { bits<4> Ra; let Inst{15-12} = Ra; } // SMUL / SMULW / SMLA / SMLAW class AMulxyIbase opcod, bits<2> bit6_5, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : I { bits<4> Rn; bits<4> Rm; let Inst{4} = 0; let Inst{7} = 1; let Inst{20} = 0; let Inst{27-21} = opcod; let Inst{6-5} = bit6_5; let Inst{11-8} = Rm; let Inst{3-0} = Rn; } class AMulxyI opcod, bits<2> bit6_5, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : AMulxyIbase { bits<4> Rd; let Inst{19-16} = Rd; } // AMulxyI with Ra operand class AMulxyIa opcod, bits<2> bit6_5, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : AMulxyI { bits<4> Ra; let Inst{15-12} = Ra; } // SMLAL* class AMulxyI64 opcod, bits<2> bit6_5, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : AMulxyIbase { bits<4> RdLo; bits<4> RdHi; let Inst{19-16} = RdHi; let Inst{15-12} = RdLo; } // Extend instructions. class AExtI opcod, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : I { // All AExtI instructions have Rd and Rm register operands. bits<4> Rd; bits<4> Rm; let Inst{15-12} = Rd; let Inst{3-0} = Rm; let Inst{7-4} = 0b0111; let Inst{9-8} = 0b00; let Inst{27-20} = opcod; let Unpredictable{9-8} = 0b11; } // Misc Arithmetic instructions. 
class AMiscA1I opcod, bits<4> opc7_4, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : I { bits<4> Rd; bits<4> Rm; let Inst{27-20} = opcod; let Inst{19-16} = 0b1111; let Inst{15-12} = Rd; let Inst{11-8} = 0b1111; let Inst{7-4} = opc7_4; let Inst{3-0} = Rm; } // Division instructions. class ADivA1I opcod, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : I { bits<4> Rd; bits<4> Rn; bits<4> Rm; let Inst{27-23} = 0b01110; let Inst{22-20} = opcod; let Inst{19-16} = Rd; let Inst{15-12} = 0b1111; let Inst{11-8} = Rm; let Inst{7-4} = 0b0001; let Inst{3-0} = Rn; } // PKH instructions def PKHLSLAsmOperand : ImmAsmOperand<0,31> { let Name = "PKHLSLImm"; let ParserMethod = "parsePKHLSLImm"; } def pkh_lsl_amt: Operand, ImmLeaf= 0 && Imm < 32; }]>{ let PrintMethod = "printPKHLSLShiftImm"; let ParserMatchClass = PKHLSLAsmOperand; } def PKHASRAsmOperand : AsmOperandClass { let Name = "PKHASRImm"; let ParserMethod = "parsePKHASRImm"; } def pkh_asr_amt: Operand, ImmLeaf 0 && Imm <= 32; }]>{ let PrintMethod = "printPKHASRShiftImm"; let ParserMatchClass = PKHASRAsmOperand; } class APKHI opcod, bit tb, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : I { bits<4> Rd; bits<4> Rn; bits<4> Rm; bits<5> sh; let Inst{27-20} = opcod; let Inst{19-16} = Rn; let Inst{15-12} = Rd; let Inst{11-7} = sh; let Inst{6} = tb; let Inst{5-4} = 0b01; let Inst{3-0} = Rm; } //===----------------------------------------------------------------------===// // ARMPat - Same as Pat<>, but requires that the compiler be in ARM mode. class ARMPat : Pat { list Predicates = [IsARM]; } class ARMV5TPat : Pat { list Predicates = [IsARM, HasV5T]; } class ARMV5TEPat : Pat { list Predicates = [IsARM, HasV5TE]; } // ARMV5MOPat - Same as ARMV5TEPat with UseMulOps. class ARMV5MOPat : Pat { list Predicates = [IsARM, HasV5TE, UseMulOps]; } class ARMV6Pat : Pat { list Predicates = [IsARM, HasV6]; } class VFPPat : Pat { list Predicates = [HasVFP2]; } class VFPNoNEONPat : Pat { list Predicates = [HasVFP2, DontUseNEONForFP]; } class Thumb2DSPPat : Pat { list Predicates = [IsThumb2, HasDSP]; } class Thumb2DSPMulPat : Pat { list Predicates = [IsThumb2, UseMulOps, HasDSP]; } class FP16Pat : Pat { list Predicates = [HasFP16]; } class FullFP16Pat : Pat { list Predicates = [HasFullFP16]; } //===----------------------------------------------------------------------===// // Thumb Instruction Format Definitions. // class ThumbI pattern> : InstThumb { let OutOperandList = oops; let InOperandList = iops; let AsmString = asm; let Pattern = pattern; list Predicates = [IsThumb]; } // TI - Thumb instruction. class TI pattern> : ThumbI; // Two-address instructions class TIt pattern> : ThumbI; // tBL, tBX 32-bit instructions class TIx2 opcod1, bits<2> opcod2, bit opcod3, dag oops, dag iops, InstrItinClass itin, string asm, list pattern> : ThumbI, Encoding { let Inst{31-27} = opcod1; let Inst{15-14} = opcod2; let Inst{12} = opcod3; } // BR_JT instructions class TJTI pattern> : ThumbI; // Thumb1 only class Thumb1I pattern> : InstThumb { let OutOperandList = oops; let InOperandList = iops; let AsmString = asm; let Pattern = pattern; list Predicates = [IsThumb, IsThumb1Only]; } class T1I pattern> : Thumb1I; class T1Ix2 pattern> : Thumb1I; // Two-address instructions class T1It pattern> : Thumb1I; // Thumb1 instruction that can either be predicated or set CPSR. 
class Thumb1sI pattern> : InstThumb { let OutOperandList = !con(oops, (outs s_cc_out:$s)); let InOperandList = !con(iops, (ins pred:$p)); let AsmString = !strconcat(opc, "${s}${p}", asm); let Pattern = pattern; let thumbArithFlagSetting = 1; list Predicates = [IsThumb, IsThumb1Only]; let DecoderNamespace = "ThumbSBit"; } class T1sI pattern> : Thumb1sI; // Two-address instructions class T1sIt pattern> : Thumb1sI; // Thumb1 instruction that can be predicated. class Thumb1pI pattern> : InstThumb { let OutOperandList = oops; let InOperandList = !con(iops, (ins pred:$p)); let AsmString = !strconcat(opc, "${p}", asm); let Pattern = pattern; list Predicates = [IsThumb, IsThumb1Only]; } class T1pI pattern> : Thumb1pI; // Two-address instructions class T1pIt pattern> : Thumb1pI; class T1pIs pattern> : Thumb1pI; class Encoding16 : Encoding { let Inst{31-16} = 0x0000; } // A6.2 16-bit Thumb instruction encoding class T1Encoding opcode> : Encoding16 { let Inst{15-10} = opcode; } // A6.2.1 Shift (immediate), add, subtract, move, and compare encoding. class T1General opcode> : Encoding16 { let Inst{15-14} = 0b00; let Inst{13-9} = opcode; } // A6.2.2 Data-processing encoding. class T1DataProcessing opcode> : Encoding16 { let Inst{15-10} = 0b010000; let Inst{9-6} = opcode; } // A6.2.3 Special data instructions and branch and exchange encoding. class T1Special opcode> : Encoding16 { let Inst{15-10} = 0b010001; let Inst{9-6} = opcode; } // A6.2.4 Load/store single data item encoding. class T1LoadStore opA, bits<3> opB> : Encoding16 { let Inst{15-12} = opA; let Inst{11-9} = opB; } class T1LdStSP opB> : T1LoadStore<0b1001, opB>; // SP relative class T1BranchCond opcode> : Encoding16 { let Inst{15-12} = opcode; } // Helper classes to encode Thumb1 loads and stores. For immediates, the // following bits are used for "opA" (see A6.2.4): // // 0b0110 => Immediate, 4 bytes // 0b1000 => Immediate, 2 bytes // 0b0111 => Immediate, 1 byte class T1pILdStEncode opcode, dag oops, dag iops, AddrMode am, InstrItinClass itin, string opc, string asm, list pattern> : Thumb1pI, T1LoadStore<0b0101, opcode> { bits<3> Rt; bits<8> addr; let Inst{8-6} = addr{5-3}; // Rm let Inst{5-3} = addr{2-0}; // Rn let Inst{2-0} = Rt; } class T1pILdStEncodeImm opA, bit opB, dag oops, dag iops, AddrMode am, InstrItinClass itin, string opc, string asm, list pattern> : Thumb1pI, T1LoadStore { bits<3> Rt; bits<8> addr; let Inst{10-6} = addr{7-3}; // imm5 let Inst{5-3} = addr{2-0}; // Rn let Inst{2-0} = Rt; } // A6.2.5 Miscellaneous 16-bit instructions encoding. class T1Misc opcode> : Encoding16 { let Inst{15-12} = 0b1011; let Inst{11-5} = opcode; } // Thumb2I - Thumb2 instruction. Almost all Thumb2 instructions are predicable. class Thumb2I pattern> : InstARM { let OutOperandList = oops; let InOperandList = !con(iops, (ins pred:$p)); let AsmString = !strconcat(opc, "${p}", asm); let Pattern = pattern; list Predicates = [IsThumb2]; let DecoderNamespace = "Thumb2"; } // Same as Thumb2I except it can optionally modify CPSR. Note it's modeled as an // input operand since by default it's a zero register. It will become an // implicit def once it's "flipped". // // FIXME: This uses unified syntax so {s} comes before {p}. We should make it // more consistent. 
class Thumb2sI pattern> : InstARM { bits<1> s; // condition-code set flag ('1' if the insn should set the flags) let Inst{20} = s; let OutOperandList = oops; let InOperandList = !con(iops, (ins pred:$p, cc_out:$s)); let AsmString = !strconcat(opc, "${s}${p}", asm); let Pattern = pattern; list Predicates = [IsThumb2]; let DecoderNamespace = "Thumb2"; } // Special cases class Thumb2XI pattern> : InstARM { let OutOperandList = oops; let InOperandList = iops; let AsmString = asm; let Pattern = pattern; list Predicates = [IsThumb2]; let DecoderNamespace = "Thumb2"; } class ThumbXI pattern> : InstARM { let OutOperandList = oops; let InOperandList = iops; let AsmString = asm; let Pattern = pattern; list Predicates = [IsThumb, IsThumb1Only]; let DecoderNamespace = "Thumb"; } class T2I pattern> : Thumb2I; class T2Ii12 pattern> : Thumb2I; class T2Ii8 pattern> : Thumb2I; class T2Iso pattern> : Thumb2I; class T2Ipc pattern> : Thumb2I; class T2Ii8s4 pattern> : Thumb2I { bits<4> Rt; bits<4> Rt2; bits<13> addr; let Inst{31-25} = 0b1110100; let Inst{24} = P; let Inst{23} = addr{8}; let Inst{22} = 1; let Inst{21} = W; let Inst{20} = isLoad; let Inst{19-16} = addr{12-9}; let Inst{15-12} = Rt{3-0}; let Inst{11-8} = Rt2{3-0}; let Inst{7-0} = addr{7-0}; } class T2Ii8s4post pattern> : Thumb2I { bits<4> Rt; bits<4> Rt2; bits<4> addr; bits<9> imm; let Inst{31-25} = 0b1110100; let Inst{24} = P; let Inst{23} = imm{8}; let Inst{22} = 1; let Inst{21} = W; let Inst{20} = isLoad; let Inst{19-16} = addr; let Inst{15-12} = Rt{3-0}; let Inst{11-8} = Rt2{3-0}; let Inst{7-0} = imm{7-0}; } class T2sI pattern> : Thumb2sI; class T2XI pattern> : Thumb2XI; class T2JTI pattern> : Thumb2XI; // Move to/from coprocessor instructions class T2Cop opc, dag oops, dag iops, string opcstr, string asm, list pattern> : T2I , Requires<[IsThumb2]> { let Inst{31-28} = opc; } // Two-address instructions class T2XIt pattern> : Thumb2XI; // T2Ipreldst - Thumb2 pre-indexed load / store instructions. class T2Ipreldst opcod, bit load, bit pre, dag oops, dag iops, AddrMode am, IndexMode im, InstrItinClass itin, string opc, string asm, string cstr, list pattern> : InstARM { let OutOperandList = oops; let InOperandList = !con(iops, (ins pred:$p)); let AsmString = !strconcat(opc, "${p}", asm); let Pattern = pattern; list Predicates = [IsThumb2]; let DecoderNamespace = "Thumb2"; bits<4> Rt; bits<13> addr; let Inst{31-27} = 0b11111; let Inst{26-25} = 0b00; let Inst{24} = signed; let Inst{23} = 0; let Inst{22-21} = opcod; let Inst{20} = load; let Inst{19-16} = addr{12-9}; let Inst{15-12} = Rt{3-0}; let Inst{11} = 1; // (P, W) = (1, 1) Pre-indexed or (0, 1) Post-indexed let Inst{10} = pre; // The P bit. let Inst{9} = addr{8}; // Sign bit let Inst{8} = 1; // The W bit. let Inst{7-0} = addr{7-0}; let DecoderMethod = "DecodeT2LdStPre"; } // T2Ipostldst - Thumb2 post-indexed load / store instructions. 
class T2Ipostldst opcod, bit load, bit pre, dag oops, dag iops, AddrMode am, IndexMode im, InstrItinClass itin, string opc, string asm, string cstr, list pattern> : InstARM { let OutOperandList = oops; let InOperandList = !con(iops, (ins pred:$p)); let AsmString = !strconcat(opc, "${p}", asm); let Pattern = pattern; list Predicates = [IsThumb2]; let DecoderNamespace = "Thumb2"; bits<4> Rt; bits<4> Rn; bits<9> offset; let Inst{31-27} = 0b11111; let Inst{26-25} = 0b00; let Inst{24} = signed; let Inst{23} = 0; let Inst{22-21} = opcod; let Inst{20} = load; let Inst{19-16} = Rn; let Inst{15-12} = Rt{3-0}; let Inst{11} = 1; // (P, W) = (1, 1) Pre-indexed or (0, 1) Post-indexed let Inst{10} = pre; // The P bit. let Inst{9} = offset{8}; // Sign bit let Inst{8} = 1; // The W bit. let Inst{7-0} = offset{7-0}; let DecoderMethod = "DecodeT2LdStPre"; } // T1Pat - Same as Pat<>, but requires that the compiler be in Thumb1 mode. class T1Pat : Pat { list Predicates = [IsThumb, IsThumb1Only]; } // T2v6Pat - Same as Pat<>, but requires V6T2 Thumb2 mode. class T2v6Pat : Pat { list Predicates = [IsThumb2, HasV6T2]; } // T2Pat - Same as Pat<>, but requires that the compiler be in Thumb2 mode. class T2Pat : Pat { list Predicates = [IsThumb2]; } //===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===// // ARM VFP Instruction templates. // // Almost all VFP instructions are predicable. class VFPI pattern> : InstARM { bits<4> p; let Inst{31-28} = p; let OutOperandList = oops; let InOperandList = !con(iops, (ins pred:$p)); let AsmString = !strconcat(opc, "${p}", asm); let Pattern = pattern; let PostEncoderMethod = "VFPThumb2PostEncoder"; let DecoderNamespace = "VFP"; list Predicates = [HasVFP2]; } // Special cases class VFPXI pattern> : InstARM { bits<4> p; let Inst{31-28} = p; let OutOperandList = oops; let InOperandList = iops; let AsmString = asm; let Pattern = pattern; let PostEncoderMethod = "VFPThumb2PostEncoder"; let DecoderNamespace = "VFP"; list Predicates = [HasVFP2]; } class VFPAI pattern> : VFPI { let PostEncoderMethod = "VFPThumb2PostEncoder"; } // ARM VFP addrmode5 loads and stores class ADI5 opcod1, bits<2> opcod2, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : VFPI { // Instruction operands. bits<5> Dd; bits<13> addr; // Encode instruction operands. let Inst{23} = addr{8}; // U (add = (U == '1')) let Inst{22} = Dd{4}; let Inst{19-16} = addr{12-9}; // Rn let Inst{15-12} = Dd{3-0}; let Inst{7-0} = addr{7-0}; // imm8 let Inst{27-24} = opcod1; let Inst{21-20} = opcod2; let Inst{11-9} = 0b101; let Inst{8} = 1; // Double precision // Loads & stores operate on both NEON and VFP pipelines. let D = VFPNeonDomain; } class ASI5 opcod1, bits<2> opcod2, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : VFPI { // Instruction operands. bits<5> Sd; bits<13> addr; // Encode instruction operands. let Inst{23} = addr{8}; // U (add = (U == '1')) let Inst{22} = Sd{0}; let Inst{19-16} = addr{12-9}; // Rn let Inst{15-12} = Sd{4-1}; let Inst{7-0} = addr{7-0}; // imm8 let Inst{27-24} = opcod1; let Inst{21-20} = opcod2; let Inst{11-9} = 0b101; let Inst{8} = 0; // Single precision // Loads & stores operate on both NEON and VFP pipelines. let D = VFPNeonDomain; } class AHI5 opcod1, bits<2> opcod2, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : VFPI { list Predicates = [HasFullFP16]; // Instruction operands. 
bits<5> Sd; bits<13> addr; // Encode instruction operands. let Inst{23} = addr{8}; // U (add = (U == '1')) let Inst{22} = Sd{0}; let Inst{19-16} = addr{12-9}; // Rn let Inst{15-12} = Sd{4-1}; let Inst{7-0} = addr{7-0}; // imm8 let Inst{27-24} = opcod1; let Inst{21-20} = opcod2; let Inst{11-8} = 0b1001; // Half precision // Loads & stores operate on both NEON and VFP pipelines. let D = VFPNeonDomain; } // VFP Load / store multiple pseudo instructions. class PseudoVFPLdStM pattern> : InstARM { let OutOperandList = oops; let InOperandList = !con(iops, (ins pred:$p)); let Pattern = pattern; list Predicates = [HasVFP2]; } // Load / store multiple // Unknown precision class AXXI4 pattern> : VFPXI { // Instruction operands. bits<4> Rn; bits<13> regs; // Encode instruction operands. let Inst{19-16} = Rn; let Inst{22} = 0; let Inst{15-12} = regs{11-8}; let Inst{7-1} = regs{7-1}; let Inst{27-25} = 0b110; let Inst{11-8} = 0b1011; let Inst{0} = 1; } // Double precision class AXDI4 pattern> : VFPXI { // Instruction operands. bits<4> Rn; bits<13> regs; // Encode instruction operands. let Inst{19-16} = Rn; let Inst{22} = regs{12}; let Inst{15-12} = regs{11-8}; let Inst{7-1} = regs{7-1}; let Inst{27-25} = 0b110; let Inst{11-9} = 0b101; let Inst{8} = 1; // Double precision let Inst{0} = 0; } // Single Precision class AXSI4 pattern> : VFPXI { // Instruction operands. bits<4> Rn; bits<13> regs; // Encode instruction operands. let Inst{19-16} = Rn; let Inst{22} = regs{8}; let Inst{15-12} = regs{12-9}; let Inst{7-0} = regs{7-0}; let Inst{27-25} = 0b110; let Inst{11-9} = 0b101; let Inst{8} = 0; // Single precision } // Double precision, unary class ADuI opcod1, bits<2> opcod2, bits<4> opcod3, bits<2> opcod4, bit opcod5, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : VFPAI { // Instruction operands. bits<5> Dd; bits<5> Dm; // Encode instruction operands. let Inst{3-0} = Dm{3-0}; let Inst{5} = Dm{4}; let Inst{15-12} = Dd{3-0}; let Inst{22} = Dd{4}; let Inst{27-23} = opcod1; let Inst{21-20} = opcod2; let Inst{19-16} = opcod3; let Inst{11-9} = 0b101; let Inst{8} = 1; // Double precision let Inst{7-6} = opcod4; let Inst{4} = opcod5; let Predicates = [HasVFP2, HasDPVFP]; } // Double precision, unary, not-predicated class ADuInp opcod1, bits<2> opcod2, bits<4> opcod3, bits<2> opcod4, bit opcod5, dag oops, dag iops, InstrItinClass itin, string asm, list pattern> : VFPXI { // Instruction operands. bits<5> Dd; bits<5> Dm; let Inst{31-28} = 0b1111; // Encode instruction operands. let Inst{3-0} = Dm{3-0}; let Inst{5} = Dm{4}; let Inst{15-12} = Dd{3-0}; let Inst{22} = Dd{4}; let Inst{27-23} = opcod1; let Inst{21-20} = opcod2; let Inst{19-16} = opcod3; let Inst{11-9} = 0b101; let Inst{8} = 1; // Double precision let Inst{7-6} = opcod4; let Inst{4} = opcod5; } // Double precision, binary class ADbI opcod1, bits<2> opcod2, bit op6, bit op4, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : VFPAI { // Instruction operands. bits<5> Dd; bits<5> Dn; bits<5> Dm; // Encode instruction operands. 
let Inst{3-0} = Dm{3-0}; let Inst{5} = Dm{4}; let Inst{19-16} = Dn{3-0}; let Inst{7} = Dn{4}; let Inst{15-12} = Dd{3-0}; let Inst{22} = Dd{4}; let Inst{27-23} = opcod1; let Inst{21-20} = opcod2; let Inst{11-9} = 0b101; let Inst{8} = 1; // Double precision let Inst{6} = op6; let Inst{4} = op4; let Predicates = [HasVFP2, HasDPVFP]; } // FP, binary, not predicated class ADbInp opcod1, bits<2> opcod2, bit opcod3, dag oops, dag iops, InstrItinClass itin, string asm, list pattern> : VFPXI { // Instruction operands. bits<5> Dd; bits<5> Dn; bits<5> Dm; let Inst{31-28} = 0b1111; // Encode instruction operands. let Inst{3-0} = Dm{3-0}; let Inst{5} = Dm{4}; let Inst{19-16} = Dn{3-0}; let Inst{7} = Dn{4}; let Inst{15-12} = Dd{3-0}; let Inst{22} = Dd{4}; let Inst{27-23} = opcod1; let Inst{21-20} = opcod2; let Inst{11-9} = 0b101; let Inst{8} = 1; // double precision let Inst{6} = opcod3; let Inst{4} = 0; let Predicates = [HasVFP2, HasDPVFP]; } // Single precision, unary, predicated class ASuI opcod1, bits<2> opcod2, bits<4> opcod3, bits<2> opcod4, bit opcod5, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : VFPAI { // Instruction operands. bits<5> Sd; bits<5> Sm; // Encode instruction operands. let Inst{3-0} = Sm{4-1}; let Inst{5} = Sm{0}; let Inst{15-12} = Sd{4-1}; let Inst{22} = Sd{0}; let Inst{27-23} = opcod1; let Inst{21-20} = opcod2; let Inst{19-16} = opcod3; let Inst{11-9} = 0b101; let Inst{8} = 0; // Single precision let Inst{7-6} = opcod4; let Inst{4} = opcod5; } // Single precision, unary, non-predicated class ASuInp opcod1, bits<2> opcod2, bits<4> opcod3, bits<2> opcod4, bit opcod5, dag oops, dag iops, InstrItinClass itin, string asm, list pattern> : VFPXI { // Instruction operands. bits<5> Sd; bits<5> Sm; let Inst{31-28} = 0b1111; // Encode instruction operands. let Inst{3-0} = Sm{4-1}; let Inst{5} = Sm{0}; let Inst{15-12} = Sd{4-1}; let Inst{22} = Sd{0}; let Inst{27-23} = opcod1; let Inst{21-20} = opcod2; let Inst{19-16} = opcod3; let Inst{11-9} = 0b101; let Inst{8} = 0; // Single precision let Inst{7-6} = opcod4; let Inst{4} = opcod5; } // Single precision unary, if no NEON. Same as ASuI except not available if // NEON is enabled. class ASuIn opcod1, bits<2> opcod2, bits<4> opcod3, bits<2> opcod4, bit opcod5, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : ASuI { list Predicates = [HasVFP2,DontUseNEONForFP]; } // Single precision, binary class ASbI opcod1, bits<2> opcod2, bit op6, bit op4, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : VFPAI { // Instruction operands. bits<5> Sd; bits<5> Sn; bits<5> Sm; // Encode instruction operands. let Inst{3-0} = Sm{4-1}; let Inst{5} = Sm{0}; let Inst{19-16} = Sn{4-1}; let Inst{7} = Sn{0}; let Inst{15-12} = Sd{4-1}; let Inst{22} = Sd{0}; let Inst{27-23} = opcod1; let Inst{21-20} = opcod2; let Inst{11-9} = 0b101; let Inst{8} = 0; // Single precision let Inst{6} = op6; let Inst{4} = op4; } // Single precision, binary, not predicated class ASbInp opcod1, bits<2> opcod2, bit opcod3, dag oops, dag iops, InstrItinClass itin, string asm, list pattern> : VFPXI { // Instruction operands. bits<5> Sd; bits<5> Sn; bits<5> Sm; let Inst{31-28} = 0b1111; // Encode instruction operands. 
let Inst{3-0} = Sm{4-1}; let Inst{5} = Sm{0}; let Inst{19-16} = Sn{4-1}; let Inst{7} = Sn{0}; let Inst{15-12} = Sd{4-1}; let Inst{22} = Sd{0}; let Inst{27-23} = opcod1; let Inst{21-20} = opcod2; let Inst{11-9} = 0b101; let Inst{8} = 0; // Single precision let Inst{6} = opcod3; let Inst{4} = 0; } // Single precision binary, if no NEON. Same as ASbI except not available if // NEON is enabled. class ASbIn opcod1, bits<2> opcod2, bit op6, bit op4, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : ASbI { list Predicates = [HasVFP2,DontUseNEONForFP]; // Instruction operands. bits<5> Sd; bits<5> Sn; bits<5> Sm; // Encode instruction operands. let Inst{3-0} = Sm{4-1}; let Inst{5} = Sm{0}; let Inst{19-16} = Sn{4-1}; let Inst{7} = Sn{0}; let Inst{15-12} = Sd{4-1}; let Inst{22} = Sd{0}; } // Half precision, unary, predicated class AHuI opcod1, bits<2> opcod2, bits<4> opcod3, bits<2> opcod4, bit opcod5, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : VFPAI { list Predicates = [HasFullFP16]; // Instruction operands. bits<5> Sd; bits<5> Sm; // Encode instruction operands. let Inst{3-0} = Sm{4-1}; let Inst{5} = Sm{0}; let Inst{15-12} = Sd{4-1}; let Inst{22} = Sd{0}; let Inst{27-23} = opcod1; let Inst{21-20} = opcod2; let Inst{19-16} = opcod3; let Inst{11-8} = 0b1001; // Half precision let Inst{7-6} = opcod4; let Inst{4} = opcod5; } // Half precision, unary, non-predicated class AHuInp opcod1, bits<2> opcod2, bits<4> opcod3, bits<2> opcod4, bit opcod5, dag oops, dag iops, InstrItinClass itin, string asm, list pattern> : VFPXI { list Predicates = [HasFullFP16]; // Instruction operands. bits<5> Sd; bits<5> Sm; let Inst{31-28} = 0b1111; // Encode instruction operands. let Inst{3-0} = Sm{4-1}; let Inst{5} = Sm{0}; let Inst{15-12} = Sd{4-1}; let Inst{22} = Sd{0}; let Inst{27-23} = opcod1; let Inst{21-20} = opcod2; let Inst{19-16} = opcod3; let Inst{11-8} = 0b1001; // Half precision let Inst{7-6} = opcod4; let Inst{4} = opcod5; } // Half precision, binary class AHbI opcod1, bits<2> opcod2, bit op6, bit op4, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : VFPAI { list Predicates = [HasFullFP16]; // Instruction operands. bits<5> Sd; bits<5> Sn; bits<5> Sm; // Encode instruction operands. let Inst{3-0} = Sm{4-1}; let Inst{5} = Sm{0}; let Inst{19-16} = Sn{4-1}; let Inst{7} = Sn{0}; let Inst{15-12} = Sd{4-1}; let Inst{22} = Sd{0}; let Inst{27-23} = opcod1; let Inst{21-20} = opcod2; let Inst{11-8} = 0b1001; // Half precision let Inst{6} = op6; let Inst{4} = op4; } // Half precision, binary, not predicated class AHbInp opcod1, bits<2> opcod2, bit opcod3, dag oops, dag iops, InstrItinClass itin, string asm, list pattern> : VFPXI { list Predicates = [HasFullFP16]; // Instruction operands. bits<5> Sd; bits<5> Sn; bits<5> Sm; let Inst{31-28} = 0b1111; // Encode instruction operands. 
let Inst{3-0} = Sm{4-1}; let Inst{5} = Sm{0}; let Inst{19-16} = Sn{4-1}; let Inst{7} = Sn{0}; let Inst{15-12} = Sd{4-1}; let Inst{22} = Sd{0}; let Inst{27-23} = opcod1; let Inst{21-20} = opcod2; let Inst{11-8} = 0b1001; // Half precision let Inst{6} = opcod3; let Inst{4} = 0; } // VFP conversion instructions class AVConv1I opcod1, bits<2> opcod2, bits<4> opcod3, bits<4> opcod4, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : VFPAI { let Inst{27-23} = opcod1; let Inst{21-20} = opcod2; let Inst{19-16} = opcod3; let Inst{11-8} = opcod4; let Inst{6} = 1; let Inst{4} = 0; } // VFP conversion between floating-point and fixed-point class AVConv1XI op1, bits<2> op2, bits<4> op3, bits<4> op4, bit op5, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : AVConv1I { bits<5> fbits; // size (fixed-point number): sx == 0 ? 16 : 32 let Inst{7} = op5; // sx let Inst{5} = fbits{0}; let Inst{3-0} = fbits{4-1}; } // VFP conversion instructions, if no NEON class AVConv1In opcod1, bits<2> opcod2, bits<4> opcod3, bits<4> opcod4, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : AVConv1I { list Predicates = [HasVFP2,DontUseNEONForFP]; } class AVConvXI opcod1, bits<4> opcod2, dag oops, dag iops, Format f, InstrItinClass itin, string opc, string asm, list pattern> : VFPAI { let Inst{27-20} = opcod1; let Inst{11-8} = opcod2; let Inst{4} = 1; } class AVConv2I opcod1, bits<4> opcod2, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : AVConvXI; class AVConv3I opcod1, bits<4> opcod2, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : AVConvXI; class AVConv4I opcod1, bits<4> opcod2, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : AVConvXI; class AVConv5I opcod1, bits<4> opcod2, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : AVConvXI; //===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===// // ARM NEON Instruction templates. // class NeonI pattern> : InstARM { let OutOperandList = oops; let InOperandList = !con(iops, (ins pred:$p)); let AsmString = !strconcat(opc, "${p}", ".", dt, "\t", asm); let Pattern = pattern; list Predicates = [HasNEON]; let DecoderNamespace = "NEON"; } // Same as NeonI except it does not have a "data type" specifier. 
class NeonXI pattern> : InstARM { let OutOperandList = oops; let InOperandList = !con(iops, (ins pred:$p)); let AsmString = !strconcat(opc, "${p}", "\t", asm); let Pattern = pattern; list Predicates = [HasNEON]; let DecoderNamespace = "NEON"; } // Same as NeonI except it is not predicated class NeonInp pattern> : InstARM { let OutOperandList = oops; let InOperandList = iops; let AsmString = !strconcat(opc, ".", dt, "\t", asm); let Pattern = pattern; list Predicates = [HasNEON]; let DecoderNamespace = "NEON"; let Inst{31-28} = 0b1111; } class NLdSt op21_20, bits<4> op11_8, bits<4> op7_4, dag oops, dag iops, InstrItinClass itin, string opc, string dt, string asm, string cstr, list pattern> : NeonI { let Inst{31-24} = 0b11110100; let Inst{23} = op23; let Inst{21-20} = op21_20; let Inst{11-8} = op11_8; let Inst{7-4} = op7_4; let PostEncoderMethod = "NEONThumb2LoadStorePostEncoder"; let DecoderNamespace = "NEONLoadStore"; bits<5> Vd; bits<6> Rn; bits<4> Rm; let Inst{22} = Vd{4}; let Inst{15-12} = Vd{3-0}; let Inst{19-16} = Rn{3-0}; let Inst{3-0} = Rm{3-0}; } class NLdStLn op21_20, bits<4> op11_8, bits<4> op7_4, dag oops, dag iops, InstrItinClass itin, string opc, string dt, string asm, string cstr, list pattern> : NLdSt { bits<3> lane; } class PseudoNLdSt : InstARM { let OutOperandList = oops; let InOperandList = !con(iops, (ins pred:$p)); list Predicates = [HasNEON]; } class PseudoNeonI pattern> : InstARM { let OutOperandList = oops; let InOperandList = !con(iops, (ins pred:$p)); let Pattern = pattern; list Predicates = [HasNEON]; } class NDataI pattern> : NeonI { let Inst{31-25} = 0b1111001; let PostEncoderMethod = "NEONThumb2DataIPostEncoder"; let DecoderNamespace = "NEONData"; } class NDataXI pattern> : NeonXI { let Inst{31-25} = 0b1111001; let PostEncoderMethod = "NEONThumb2DataIPostEncoder"; let DecoderNamespace = "NEONData"; } // NEON "one register and a modified immediate" format. class N1ModImm op21_19, bits<4> op11_8, bit op7, bit op6, bit op5, bit op4, dag oops, dag iops, InstrItinClass itin, string opc, string dt, string asm, string cstr, list pattern> : NDataI { let Inst{23} = op23; let Inst{21-19} = op21_19; let Inst{11-8} = op11_8; let Inst{7} = op7; let Inst{6} = op6; let Inst{5} = op5; let Inst{4} = op4; // Instruction operands. bits<5> Vd; bits<13> SIMM; let Inst{15-12} = Vd{3-0}; let Inst{22} = Vd{4}; let Inst{24} = SIMM{7}; let Inst{18-16} = SIMM{6-4}; let Inst{3-0} = SIMM{3-0}; let DecoderMethod = "DecodeNEONModImmInstruction"; } // NEON 2 vector register format. class N2V op24_23, bits<2> op21_20, bits<2> op19_18, bits<2> op17_16, bits<5> op11_7, bit op6, bit op4, dag oops, dag iops, InstrItinClass itin, string opc, string dt, string asm, string cstr, list pattern> : NDataI { let Inst{24-23} = op24_23; let Inst{21-20} = op21_20; let Inst{19-18} = op19_18; let Inst{17-16} = op17_16; let Inst{11-7} = op11_7; let Inst{6} = op6; let Inst{4} = op4; // Instruction operands. bits<5> Vd; bits<5> Vm; let Inst{15-12} = Vd{3-0}; let Inst{22} = Vd{4}; let Inst{3-0} = Vm{3-0}; let Inst{5} = Vm{4}; } // Same as N2V but not predicated. 
class N2Vnp op19_18, bits<2> op17_16, bits<3> op10_8, bit op7, bit op6, dag oops, dag iops, InstrItinClass itin, string OpcodeStr, string Dt, list pattern> : NeonInp { bits<5> Vd; bits<5> Vm; // Encode instruction operands let Inst{22} = Vd{4}; let Inst{15-12} = Vd{3-0}; let Inst{5} = Vm{4}; let Inst{3-0} = Vm{3-0}; // Encode constant bits let Inst{27-23} = 0b00111; let Inst{21-20} = 0b11; let Inst{19-18} = op19_18; let Inst{17-16} = op17_16; let Inst{11} = 0; let Inst{10-8} = op10_8; let Inst{7} = op7; let Inst{6} = op6; let Inst{4} = 0; let DecoderNamespace = "NEON"; } // Same as N2V except it doesn't have a datatype suffix. class N2VX op24_23, bits<2> op21_20, bits<2> op19_18, bits<2> op17_16, bits<5> op11_7, bit op6, bit op4, dag oops, dag iops, InstrItinClass itin, string opc, string asm, string cstr, list pattern> : NDataXI { let Inst{24-23} = op24_23; let Inst{21-20} = op21_20; let Inst{19-18} = op19_18; let Inst{17-16} = op17_16; let Inst{11-7} = op11_7; let Inst{6} = op6; let Inst{4} = op4; // Instruction operands. bits<5> Vd; bits<5> Vm; let Inst{15-12} = Vd{3-0}; let Inst{22} = Vd{4}; let Inst{3-0} = Vm{3-0}; let Inst{5} = Vm{4}; } // NEON 2 vector register with immediate. class N2VImm op11_8, bit op7, bit op6, bit op4, dag oops, dag iops, Format f, InstrItinClass itin, string opc, string dt, string asm, string cstr, list pattern> : NDataI { let Inst{24} = op24; let Inst{23} = op23; let Inst{11-8} = op11_8; let Inst{7} = op7; let Inst{6} = op6; let Inst{4} = op4; // Instruction operands. bits<5> Vd; bits<5> Vm; bits<6> SIMM; let Inst{15-12} = Vd{3-0}; let Inst{22} = Vd{4}; let Inst{3-0} = Vm{3-0}; let Inst{5} = Vm{4}; let Inst{21-16} = SIMM{5-0}; } // NEON 3 vector register format. class N3VCommon op21_20, bits<4> op11_8, bit op6, bit op4, dag oops, dag iops, Format f, InstrItinClass itin, string opc, string dt, string asm, string cstr, list pattern> : NDataI { let Inst{24} = op24; let Inst{23} = op23; let Inst{21-20} = op21_20; let Inst{11-8} = op11_8; let Inst{6} = op6; let Inst{4} = op4; } class N3V op21_20, bits<4> op11_8, bit op6, bit op4, dag oops, dag iops, Format f, InstrItinClass itin, string opc, string dt, string asm, string cstr, list pattern> : N3VCommon { // Instruction operands. bits<5> Vd; bits<5> Vn; bits<5> Vm; let Inst{15-12} = Vd{3-0}; let Inst{22} = Vd{4}; let Inst{19-16} = Vn{3-0}; let Inst{7} = Vn{4}; let Inst{3-0} = Vm{3-0}; let Inst{5} = Vm{4}; } class N3Vnp op27_23, bits<2> op21_20, bits<4> op11_8, bit op6, bit op4, dag oops, dag iops,Format f, InstrItinClass itin, string OpcodeStr, string Dt, list pattern> : NeonInp { bits<5> Vd; bits<5> Vn; bits<5> Vm; // Encode instruction operands let Inst{22} = Vd{4}; let Inst{15-12} = Vd{3-0}; let Inst{19-16} = Vn{3-0}; let Inst{7} = Vn{4}; let Inst{5} = Vm{4}; let Inst{3-0} = Vm{3-0}; // Encode constant bits let Inst{27-23} = op27_23; let Inst{21-20} = op21_20; let Inst{11-8} = op11_8; let Inst{6} = op6; let Inst{4} = op4; } class N3VLane32 op21_20, bits<4> op11_8, bit op6, bit op4, dag oops, dag iops, Format f, InstrItinClass itin, string opc, string dt, string asm, string cstr, list pattern> : N3VCommon { // Instruction operands. 
bits<5> Vd; bits<5> Vn; bits<5> Vm; bit lane; let Inst{15-12} = Vd{3-0}; let Inst{22} = Vd{4}; let Inst{19-16} = Vn{3-0}; let Inst{7} = Vn{4}; let Inst{3-0} = Vm{3-0}; let Inst{5} = lane; } class N3VLane16 op21_20, bits<4> op11_8, bit op6, bit op4, dag oops, dag iops, Format f, InstrItinClass itin, string opc, string dt, string asm, string cstr, list pattern> : N3VCommon { // Instruction operands. bits<5> Vd; bits<5> Vn; bits<5> Vm; bits<2> lane; let Inst{15-12} = Vd{3-0}; let Inst{22} = Vd{4}; let Inst{19-16} = Vn{3-0}; let Inst{7} = Vn{4}; let Inst{2-0} = Vm{2-0}; let Inst{5} = lane{1}; let Inst{3} = lane{0}; } // Same as N3V except it doesn't have a data type suffix. class N3VX op21_20, bits<4> op11_8, bit op6, bit op4, dag oops, dag iops, Format f, InstrItinClass itin, string opc, string asm, string cstr, list pattern> : NDataXI { let Inst{24} = op24; let Inst{23} = op23; let Inst{21-20} = op21_20; let Inst{11-8} = op11_8; let Inst{6} = op6; let Inst{4} = op4; // Instruction operands. bits<5> Vd; bits<5> Vn; bits<5> Vm; let Inst{15-12} = Vd{3-0}; let Inst{22} = Vd{4}; let Inst{19-16} = Vn{3-0}; let Inst{7} = Vn{4}; let Inst{3-0} = Vm{3-0}; let Inst{5} = Vm{4}; } // NEON VMOVs between scalar and core registers. class NVLaneOp opcod1, bits<4> opcod2, bits<2> opcod3, dag oops, dag iops, Format f, InstrItinClass itin, string opc, string dt, string asm, list pattern> : InstARM { let Inst{27-20} = opcod1; let Inst{11-8} = opcod2; let Inst{6-5} = opcod3; let Inst{4} = 1; // A8.6.303, A8.6.328, A8.6.329 let Inst{3-0} = 0b0000; let OutOperandList = oops; let InOperandList = !con(iops, (ins pred:$p)); let AsmString = !strconcat(opc, "${p}", ".", dt, "\t", asm); let Pattern = pattern; list Predicates = [HasNEON]; let PostEncoderMethod = "NEONThumb2DupPostEncoder"; let DecoderNamespace = "NEONDup"; bits<5> V; bits<4> R; bits<4> p; bits<4> lane; let Inst{31-28} = p{3-0}; let Inst{7} = V{4}; let Inst{19-16} = V{3-0}; let Inst{15-12} = R{3-0}; } class NVGetLane opcod1, bits<4> opcod2, bits<2> opcod3, dag oops, dag iops, InstrItinClass itin, string opc, string dt, string asm, list pattern> : NVLaneOp; class NVSetLane opcod1, bits<4> opcod2, bits<2> opcod3, dag oops, dag iops, InstrItinClass itin, string opc, string dt, string asm, list pattern> : NVLaneOp; class NVDup opcod1, bits<4> opcod2, bits<2> opcod3, dag oops, dag iops, InstrItinClass itin, string opc, string dt, string asm, list pattern> : NVLaneOp; // Vector Duplicate Lane (from scalar to all elements) class NVDupLane op19_16, bit op6, dag oops, dag iops, InstrItinClass itin, string opc, string dt, string asm, list pattern> : NDataI { let Inst{24-23} = 0b11; let Inst{21-20} = 0b11; let Inst{19-16} = op19_16; let Inst{11-7} = 0b11000; let Inst{6} = op6; let Inst{4} = 0; bits<5> Vd; bits<5> Vm; let Inst{22} = Vd{4}; let Inst{15-12} = Vd{3-0}; let Inst{5} = Vm{4}; let Inst{3-0} = Vm{3-0}; } // NEONFPPat - Same as Pat<>, but requires that the compiler be using NEON // for single-precision FP. class NEONFPPat : Pat { list Predicates = [HasNEON,UseNEONForFP]; } // VFP/NEON Instruction aliases for type suffices. 
// Note: When EmitPriority == 1, the alias will be used for printing class VFPDataTypeInstAlias : InstAlias, Requires<[HasVFP2]>; // Note: When EmitPriority == 1, the alias will be used for printing multiclass VFPDTAnyInstAlias { def : VFPDataTypeInstAlias; def : VFPDataTypeInstAlias; def : VFPDataTypeInstAlias; def : VFPDataTypeInstAlias; } // Note: When EmitPriority == 1, the alias will be used for printing multiclass NEONDTAnyInstAlias { let Predicates = [HasNEON] in { def : VFPDataTypeInstAlias; def : VFPDataTypeInstAlias; def : VFPDataTypeInstAlias; def : VFPDataTypeInstAlias; } } // The same alias classes using AsmPseudo instead, for the more complex // stuff in NEON that InstAlias can't quite handle. // Note that we can't use anonymous defm references here like we can // above, as we care about the ultimate instruction enum names generated, unlike // for instalias defs. class NEONDataTypeAsmPseudoInst : AsmPseudoInst, Requires<[HasNEON]>; // Extension of NEON 3-vector data processing instructions in coprocessor 8 // encoding space, introduced in ARMv8.3-A. class N3VCP8 op24_23, bits<2> op21_20, bit op6, bit op4, dag oops, dag iops, InstrItinClass itin, string opc, string dt, string asm, string cstr, list pattern> : NeonInp { bits<5> Vd; bits<5> Vn; bits<5> Vm; let DecoderNamespace = "VFPV8"; // These have the same encodings in ARM and Thumb2 let PostEncoderMethod = ""; let Inst{31-25} = 0b1111110; let Inst{24-23} = op24_23; let Inst{22} = Vd{4}; let Inst{21-20} = op21_20; let Inst{19-16} = Vn{3-0}; let Inst{15-12} = Vd{3-0}; let Inst{11-8} = 0b1000; let Inst{7} = Vn{4}; let Inst{6} = op6; let Inst{5} = Vm{4}; let Inst{4} = op4; let Inst{3-0} = Vm{3-0}; } // Extension of NEON 2-vector-and-scalar data processing instructions in // coprocessor 8 encoding space, introduced in ARMv8.3-A. class N3VLaneCP8 op21_20, bit op6, bit op4, dag oops, dag iops, InstrItinClass itin, string opc, string dt, string asm, string cstr, list pattern> : NeonInp { bits<5> Vd; bits<5> Vn; bits<5> Vm; let DecoderNamespace = "VFPV8"; // These have the same encodings in ARM and Thumb2 let PostEncoderMethod = ""; let Inst{31-24} = 0b11111110; let Inst{23} = op23; let Inst{22} = Vd{4}; let Inst{21-20} = op21_20; let Inst{19-16} = Vn{3-0}; let Inst{15-12} = Vd{3-0}; let Inst{11-8} = 0b1000; let Inst{7} = Vn{4}; let Inst{6} = op6; // Bit 5 set by sub-classes let Inst{4} = op4; let Inst{3-0} = Vm{3-0}; } // Operand types for complex instructions class ComplexRotationOperand : AsmOperandClass { let PredicateMethod = "isComplexRotation<" # Angle # ", " # Remainder # ">"; let DiagnosticString = "complex rotation must be " # Diag; let Name = "ComplexRotation" # Type; } def complexrotateop : Operand { let ParserMatchClass = ComplexRotationOperand<90, 0, "Even", "0, 90, 180 or 270">; let PrintMethod = "printComplexRotationOp<90, 0>"; } def complexrotateopodd : Operand { let ParserMatchClass = ComplexRotationOperand<180, 90, "Odd", "90 or 270">; let PrintMethod = "printComplexRotationOp<180, 90>"; } // Data type suffix token aliases. Implements Table A7-3 in the ARM ARM. 
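// (Aside on the alias table that follows.) The TokenAlias records below let
// the assembler accept any spelling from ARM ARM Table A7-3 and fold it onto
// a canonical width-only suffix (.s8/.u8 -> .i8 -> .8, .d -> .f64 -> .64,
// and so on). A standalone C++ sketch of reading the table as a chain of
// aliases; illustration only, not how the TableGen'd assembler matcher is
// implemented, and the function name is made up.

#include <cassert>
#include <map>
#include <string>

std::string canonicalSuffix(std::string tok) {
  static const std::map<std::string, std::string> alias = {
      {".s8", ".i8"},   {".u8", ".i8"},   {".s16", ".i16"}, {".u16", ".i16"},
      {".s32", ".i32"}, {".u32", ".i32"}, {".s64", ".i64"}, {".u64", ".i64"},
      {".i8", ".8"},    {".i16", ".16"},  {".i32", ".32"},  {".i64", ".64"},
      {".p8", ".8"},    {".p16", ".16"},  {".f32", ".32"},  {".f64", ".64"},
      {".f", ".f32"},   {".d", ".f64"}};
  // Follow the alias pairs until a suffix with no further alias is reached.
  for (auto it = alias.find(tok); it != alias.end(); it = alias.find(tok))
    tok = it->second;
  return tok;
}

int main() {
  assert(canonicalSuffix(".u16") == ".16");
  assert(canonicalSuffix(".d") == ".64");
  assert(canonicalSuffix(".8") == ".8"); // already canonical
  return 0;
}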
def : TokenAlias<".s8", ".i8">; def : TokenAlias<".u8", ".i8">; def : TokenAlias<".s16", ".i16">; def : TokenAlias<".u16", ".i16">; def : TokenAlias<".s32", ".i32">; def : TokenAlias<".u32", ".i32">; def : TokenAlias<".s64", ".i64">; def : TokenAlias<".u64", ".i64">; def : TokenAlias<".i8", ".8">; def : TokenAlias<".i16", ".16">; def : TokenAlias<".i32", ".32">; def : TokenAlias<".i64", ".64">; def : TokenAlias<".p8", ".8">; def : TokenAlias<".p16", ".16">; def : TokenAlias<".f32", ".32">; def : TokenAlias<".f64", ".64">; def : TokenAlias<".f", ".f32">; def : TokenAlias<".d", ".f64">; Index: vendor/llvm/dist-release_70/lib/Target/ARM/ARMInstrThumb2.td =================================================================== --- vendor/llvm/dist-release_70/lib/Target/ARM/ARMInstrThumb2.td (revision 338574) +++ vendor/llvm/dist-release_70/lib/Target/ARM/ARMInstrThumb2.td (revision 338575) @@ -1,4867 +1,4867 @@ //===-- ARMInstrThumb2.td - Thumb2 support for ARM ---------*- tablegen -*-===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file describes the Thumb2 instruction set. // //===----------------------------------------------------------------------===// // IT block predicate field def it_pred_asmoperand : AsmOperandClass { let Name = "ITCondCode"; let ParserMethod = "parseITCondCode"; } def it_pred : Operand { let PrintMethod = "printMandatoryPredicateOperand"; let ParserMatchClass = it_pred_asmoperand; } // IT block condition mask def it_mask_asmoperand : AsmOperandClass { let Name = "ITMask"; } def it_mask : Operand { let PrintMethod = "printThumbITMask"; let ParserMatchClass = it_mask_asmoperand; } // t2_shift_imm: An integer that encodes a shift amount and the type of shift // (asr or lsl). The 6-bit immediate encodes as: // {5} 0 ==> lsl // 1 asr // {4-0} imm5 shift amount. // asr #32 not allowed def t2_shift_imm : Operand { let PrintMethod = "printShiftImmOperand"; let ParserMatchClass = ShifterImmAsmOperand; let DecoderMethod = "DecodeT2ShifterImmOperand"; } // Shifted operands. No register controlled shifts for Thumb2. // Note: We do not support rrx shifted operands yet. def t2_so_reg : Operand, // reg imm ComplexPattern { let EncoderMethod = "getT2SORegOpValue"; let PrintMethod = "printT2SOOperand"; let DecoderMethod = "DecodeSORegImmOperand"; let ParserMatchClass = ShiftedImmAsmOperand; let MIOperandInfo = (ops rGPR, i32imm); } // t2_so_imm_not_XFORM - Return the complement of a t2_so_imm value def t2_so_imm_not_XFORM : SDNodeXFormgetTargetConstant(~((uint32_t)N->getZExtValue()), SDLoc(N), MVT::i32); }]>; // t2_so_imm_neg_XFORM - Return the negation of a t2_so_imm value def t2_so_imm_neg_XFORM : SDNodeXFormgetTargetConstant(-((int)N->getZExtValue()), SDLoc(N), MVT::i32); }]>; // so_imm_notSext_XFORM - Return a so_imm value packed into the format // described for so_imm_notSext def below, with sign extension from 16 // bits. def t2_so_imm_notSext16_XFORM : SDNodeXFormgetAPIntValue(); unsigned N16bitSignExt = apIntN.trunc(16).sext(32).getZExtValue(); return CurDAG->getTargetConstant(~N16bitSignExt, SDLoc(N), MVT::i32); }]>; // t2_so_imm - Match a 32-bit immediate operand, which is an // 8-bit immediate rotated by an arbitrary number of bits, or an 8-bit // immediate splatted into multiple bytes of the word. 
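// (Aside on the t2_so_imm operand defined next.) In LLVM the authoritative
// test for these constants is ARM_AM::getT2SOImmVal, which the predicates
// below call. As a standalone illustration -- not that code -- here is a C++
// sketch of the value set the comment above describes: the four byte-splat
// forms, plus an 8-bit pattern with its top bit set rotated right by 8..31
// bits. The function name is made up.

#include <cassert>
#include <cstdint>

bool isT2ModifiedImm(uint32_t V) {
  uint32_t b = V & 0xFF;
  if (V == b) return true;                         // 0x000000XY
  if (V == ((b << 16) | b)) return true;           // 0x00XY00XY
  if (V == ((b << 24) | (b << 8))) return true;    // 0xXY00XY00
  if (V == b * 0x01010101u) return true;           // 0xXYXYXYXY
  for (unsigned r = 8; r < 32; ++r) {
    uint32_t pattern = (V << r) | (V >> (32 - r)); // rotate left undoes ROR r
    if (pattern >= 0x80 && pattern <= 0xFF)        // 8-bit value, bit 7 set
      return true;
  }
  return false;
}

int main() {
  assert(isT2ModifiedImm(0x00AB00AB));  // splat form
  assert(isT2ModifiedImm(0xFF000000));  // 0xFF rotated right by 8
  assert(!isT2ModifiedImm(0x000001FE)); // needs nine significant bits
  return 0;
}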
def t2_so_imm_asmoperand : AsmOperandClass { let Name = "T2SOImm"; let RenderMethod = "addImmOperands"; } def t2_so_imm : Operand, ImmLeaf { let ParserMatchClass = t2_so_imm_asmoperand; let EncoderMethod = "getT2SOImmOpValue"; let DecoderMethod = "DecodeT2SOImm"; } // t2_so_imm_not - Match an immediate that is a complement // of a t2_so_imm. // Note: this pattern doesn't require an encoder method and such, as it's // only used on aliases (Pat<> and InstAlias<>). The actual encoding // is handled by the destination instructions, which use t2_so_imm. def t2_so_imm_not_asmoperand : AsmOperandClass { let Name = "T2SOImmNot"; } def t2_so_imm_not : Operand, PatLeaf<(imm), [{ return ARM_AM::getT2SOImmVal(~((uint32_t)N->getZExtValue())) != -1; }], t2_so_imm_not_XFORM> { let ParserMatchClass = t2_so_imm_not_asmoperand; } // t2_so_imm_notSext - match an immediate that is a complement of a t2_so_imm // if the upper 16 bits are zero. def t2_so_imm_notSext : Operand, PatLeaf<(imm), [{ APInt apIntN = N->getAPIntValue(); if (!apIntN.isIntN(16)) return false; unsigned N16bitSignExt = apIntN.trunc(16).sext(32).getZExtValue(); return ARM_AM::getT2SOImmVal(~N16bitSignExt) != -1; }], t2_so_imm_notSext16_XFORM> { let ParserMatchClass = t2_so_imm_not_asmoperand; } // t2_so_imm_neg - Match an immediate that is a negation of a t2_so_imm. def t2_so_imm_neg_asmoperand : AsmOperandClass { let Name = "T2SOImmNeg"; } def t2_so_imm_neg : Operand, ImmLeaf { let ParserMatchClass = t2_so_imm_neg_asmoperand; } /// imm0_4095 predicate - True if the 32-bit immediate is in the range [0,4095]. def imm0_4095_asmoperand: ImmAsmOperand<0,4095> { let Name = "Imm0_4095"; } def imm0_4095 : Operand, ImmLeaf= 0 && Imm < 4096; }]> { let ParserMatchClass = imm0_4095_asmoperand; } def imm0_4095_neg_asmoperand: AsmOperandClass { let Name = "Imm0_4095Neg"; } def imm0_4095_neg : Operand, PatLeaf<(i32 imm), [{ return (uint32_t)(-N->getZExtValue()) < 4096; }], imm_neg_XFORM> { let ParserMatchClass = imm0_4095_neg_asmoperand; } def imm1_255_neg : PatLeaf<(i32 imm), [{ uint32_t Val = -N->getZExtValue(); return (Val > 0 && Val < 255); }], imm_neg_XFORM>; def imm0_255_not : PatLeaf<(i32 imm), [{ return (uint32_t)(~N->getZExtValue()) < 255; }], imm_not_XFORM>; def lo5AllOne : PatLeaf<(i32 imm), [{ // Returns true if all low 5-bits are 1. return (((uint32_t)N->getZExtValue()) & 0x1FUL) == 0x1FUL; }]>; // Define Thumb2 specific addressing modes. // t2addrmode_imm12 := reg + imm12 def t2addrmode_imm12_asmoperand : AsmOperandClass {let Name="MemUImm12Offset";} def t2addrmode_imm12 : MemOperand, ComplexPattern { let PrintMethod = "printAddrModeImm12Operand"; let EncoderMethod = "getAddrModeImm12OpValue"; let DecoderMethod = "DecodeT2AddrModeImm12"; let ParserMatchClass = t2addrmode_imm12_asmoperand; let MIOperandInfo = (ops GPR:$base, i32imm:$offsimm); } // t2ldrlabel := imm12 def t2ldrlabel : Operand { let EncoderMethod = "getAddrModeImm12OpValue"; let PrintMethod = "printThumbLdrLabelOperand"; } def t2ldr_pcrel_imm12_asmoperand : AsmOperandClass {let Name = "MemPCRelImm12";} def t2ldr_pcrel_imm12 : Operand { let ParserMatchClass = t2ldr_pcrel_imm12_asmoperand; // used for assembler pseudo instruction and maps to t2ldrlabel, so // doesn't need encoder or print methods of its own. } // ADR instruction labels. 
def t2adrlabel : Operand { let EncoderMethod = "getT2AdrLabelOpValue"; let PrintMethod = "printAdrLabelOperand<0>"; } // t2addrmode_posimm8 := reg + imm8 def MemPosImm8OffsetAsmOperand : AsmOperandClass {let Name="MemPosImm8Offset";} def t2addrmode_posimm8 : MemOperand { let PrintMethod = "printT2AddrModeImm8Operand"; let EncoderMethod = "getT2AddrModeImm8OpValue"; let DecoderMethod = "DecodeT2AddrModeImm8"; let ParserMatchClass = MemPosImm8OffsetAsmOperand; let MIOperandInfo = (ops GPR:$base, i32imm:$offsimm); } // t2addrmode_negimm8 := reg - imm8 def MemNegImm8OffsetAsmOperand : AsmOperandClass {let Name="MemNegImm8Offset";} def t2addrmode_negimm8 : MemOperand, ComplexPattern { let PrintMethod = "printT2AddrModeImm8Operand"; let EncoderMethod = "getT2AddrModeImm8OpValue"; let DecoderMethod = "DecodeT2AddrModeImm8"; let ParserMatchClass = MemNegImm8OffsetAsmOperand; let MIOperandInfo = (ops GPR:$base, i32imm:$offsimm); } // t2addrmode_imm8 := reg +/- imm8 def MemImm8OffsetAsmOperand : AsmOperandClass { let Name = "MemImm8Offset"; } class T2AddrMode_Imm8 : MemOperand, ComplexPattern { let EncoderMethod = "getT2AddrModeImm8OpValue"; let DecoderMethod = "DecodeT2AddrModeImm8"; let ParserMatchClass = MemImm8OffsetAsmOperand; let MIOperandInfo = (ops GPR:$base, i32imm:$offsimm); } def t2addrmode_imm8 : T2AddrMode_Imm8 { let PrintMethod = "printT2AddrModeImm8Operand"; } def t2addrmode_imm8_pre : T2AddrMode_Imm8 { let PrintMethod = "printT2AddrModeImm8Operand"; } def t2am_imm8_offset : MemOperand, ComplexPattern { let PrintMethod = "printT2AddrModeImm8OffsetOperand"; let EncoderMethod = "getT2AddrModeImm8OffsetOpValue"; let DecoderMethod = "DecodeT2Imm8"; } // t2addrmode_imm8s4 := reg +/- (imm8 << 2) def MemImm8s4OffsetAsmOperand : AsmOperandClass {let Name = "MemImm8s4Offset";} class T2AddrMode_Imm8s4 : MemOperand { let EncoderMethod = "getT2AddrModeImm8s4OpValue"; let DecoderMethod = "DecodeT2AddrModeImm8s4"; let ParserMatchClass = MemImm8s4OffsetAsmOperand; let MIOperandInfo = (ops GPR:$base, i32imm:$offsimm); } def t2addrmode_imm8s4 : T2AddrMode_Imm8s4 { let PrintMethod = "printT2AddrModeImm8s4Operand"; } def t2addrmode_imm8s4_pre : T2AddrMode_Imm8s4 { let PrintMethod = "printT2AddrModeImm8s4Operand"; } def t2am_imm8s4_offset_asmoperand : AsmOperandClass { let Name = "Imm8s4"; } def t2am_imm8s4_offset : MemOperand { let PrintMethod = "printT2AddrModeImm8s4OffsetOperand"; let EncoderMethod = "getT2Imm8s4OpValue"; let DecoderMethod = "DecodeT2Imm8S4"; } // t2addrmode_imm0_1020s4 := reg + (imm8 << 2) def MemImm0_1020s4OffsetAsmOperand : AsmOperandClass { let Name = "MemImm0_1020s4Offset"; } def t2addrmode_imm0_1020s4 : MemOperand, ComplexPattern { let PrintMethod = "printT2AddrModeImm0_1020s4Operand"; let EncoderMethod = "getT2AddrModeImm0_1020s4OpValue"; let DecoderMethod = "DecodeT2AddrModeImm0_1020s4"; let ParserMatchClass = MemImm0_1020s4OffsetAsmOperand; let MIOperandInfo = (ops GPRnopc:$base, i32imm:$offsimm); } // t2addrmode_so_reg := reg + (reg << imm2) def t2addrmode_so_reg_asmoperand : AsmOperandClass {let Name="T2MemRegOffset";} def t2addrmode_so_reg : MemOperand, ComplexPattern { let PrintMethod = "printT2AddrModeSoRegOperand"; let EncoderMethod = "getT2AddrModeSORegOpValue"; let DecoderMethod = "DecodeT2AddrModeSOReg"; let ParserMatchClass = t2addrmode_so_reg_asmoperand; let MIOperandInfo = (ops GPRnopc:$base, rGPR:$offsreg, i32imm:$offsimm); } // Addresses for the TBB/TBH instructions. 
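// (Aside, before the table-branch operands below.) The scaled addressing
// forms defined above (t2addrmode_imm8s4, t2am_imm8s4_offset and
// t2addrmode_imm0_1020s4) store the offset as an 8-bit quantity times four,
// so only word-multiple offsets with magnitude up to 1020 are representable.
// A standalone C++ sketch of that encoding; illustration only, with made-up
// names, not code from this file.

#include <cassert>
#include <cstdint>
#include <optional>

struct Imm8s4 {
  bool add;     // U bit: add or subtract the offset
  uint8_t imm8; // offset / 4
};

std::optional<Imm8s4> encodeImm8s4(int64_t byteOffset) {
  int64_t mag = byteOffset < 0 ? -byteOffset : byteOffset;
  if (mag > 1020 || (mag & 3) != 0)
    return std::nullopt; // out of range, or not a multiple of four
  return Imm8s4{byteOffset >= 0, static_cast<uint8_t>(mag / 4)};
}

int main() {
  assert(encodeImm8s4(256) && encodeImm8s4(256)->imm8 == 64);
  assert(!encodeImm8s4(2));     // not word-aligned
  assert(!encodeImm8s4(-1024)); // magnitude too large
  return 0;
}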
def addrmode_tbb_asmoperand : AsmOperandClass { let Name = "MemTBB"; } def addrmode_tbb : MemOperand { let PrintMethod = "printAddrModeTBB"; let ParserMatchClass = addrmode_tbb_asmoperand; let MIOperandInfo = (ops GPR:$Rn, rGPR:$Rm); } def addrmode_tbh_asmoperand : AsmOperandClass { let Name = "MemTBH"; } def addrmode_tbh : MemOperand { let PrintMethod = "printAddrModeTBH"; let ParserMatchClass = addrmode_tbh_asmoperand; let MIOperandInfo = (ops GPR:$Rn, rGPR:$Rm); } //===----------------------------------------------------------------------===// // Multiclass helpers... // class T2OneRegImm pattern> : T2I { bits<4> Rd; bits<12> imm; let Inst{11-8} = Rd; let Inst{26} = imm{11}; let Inst{14-12} = imm{10-8}; let Inst{7-0} = imm{7-0}; } class T2sOneRegImm pattern> : T2sI { bits<4> Rd; bits<4> Rn; bits<12> imm; let Inst{11-8} = Rd; let Inst{26} = imm{11}; let Inst{14-12} = imm{10-8}; let Inst{7-0} = imm{7-0}; } class T2OneRegCmpImm pattern> : T2I { bits<4> Rn; bits<12> imm; let Inst{19-16} = Rn; let Inst{26} = imm{11}; let Inst{14-12} = imm{10-8}; let Inst{7-0} = imm{7-0}; } class T2OneRegShiftedReg pattern> : T2I { bits<4> Rd; bits<12> ShiftedRm; let Inst{11-8} = Rd; let Inst{3-0} = ShiftedRm{3-0}; let Inst{5-4} = ShiftedRm{6-5}; let Inst{14-12} = ShiftedRm{11-9}; let Inst{7-6} = ShiftedRm{8-7}; } class T2sOneRegShiftedReg pattern> : T2sI { bits<4> Rd; bits<12> ShiftedRm; let Inst{11-8} = Rd; let Inst{3-0} = ShiftedRm{3-0}; let Inst{5-4} = ShiftedRm{6-5}; let Inst{14-12} = ShiftedRm{11-9}; let Inst{7-6} = ShiftedRm{8-7}; } class T2OneRegCmpShiftedReg pattern> : T2I { bits<4> Rn; bits<12> ShiftedRm; let Inst{19-16} = Rn; let Inst{3-0} = ShiftedRm{3-0}; let Inst{5-4} = ShiftedRm{6-5}; let Inst{14-12} = ShiftedRm{11-9}; let Inst{7-6} = ShiftedRm{8-7}; } class T2TwoReg pattern> : T2I { bits<4> Rd; bits<4> Rm; let Inst{11-8} = Rd; let Inst{3-0} = Rm; } class T2sTwoReg pattern> : T2sI { bits<4> Rd; bits<4> Rm; let Inst{11-8} = Rd; let Inst{3-0} = Rm; } class T2TwoRegCmp pattern> : T2I { bits<4> Rn; bits<4> Rm; let Inst{19-16} = Rn; let Inst{3-0} = Rm; } class T2TwoRegImm pattern> : T2I { bits<4> Rd; bits<4> Rn; bits<12> imm; let Inst{11-8} = Rd; let Inst{19-16} = Rn; let Inst{26} = imm{11}; let Inst{14-12} = imm{10-8}; let Inst{7-0} = imm{7-0}; } class T2sTwoRegImm pattern> : T2sI { bits<4> Rd; bits<4> Rn; bits<12> imm; let Inst{11-8} = Rd; let Inst{19-16} = Rn; let Inst{26} = imm{11}; let Inst{14-12} = imm{10-8}; let Inst{7-0} = imm{7-0}; } class T2TwoRegShiftImm pattern> : T2I { bits<4> Rd; bits<4> Rm; bits<5> imm; let Inst{11-8} = Rd; let Inst{3-0} = Rm; let Inst{14-12} = imm{4-2}; let Inst{7-6} = imm{1-0}; } class T2sTwoRegShiftImm pattern> : T2sI { bits<4> Rd; bits<4> Rm; bits<5> imm; let Inst{11-8} = Rd; let Inst{3-0} = Rm; let Inst{14-12} = imm{4-2}; let Inst{7-6} = imm{1-0}; } class T2ThreeReg pattern> : T2I { bits<4> Rd; bits<4> Rn; bits<4> Rm; let Inst{11-8} = Rd; let Inst{19-16} = Rn; let Inst{3-0} = Rm; } class T2ThreeRegNoP pattern> : T2XI { bits<4> Rd; bits<4> Rn; bits<4> Rm; let Inst{11-8} = Rd; let Inst{19-16} = Rn; let Inst{3-0} = Rm; } class T2sThreeReg pattern> : T2sI { bits<4> Rd; bits<4> Rn; bits<4> Rm; let Inst{11-8} = Rd; let Inst{19-16} = Rn; let Inst{3-0} = Rm; } class T2TwoRegShiftedReg pattern> : T2I { bits<4> Rd; bits<4> Rn; bits<12> ShiftedRm; let Inst{11-8} = Rd; let Inst{19-16} = Rn; let Inst{3-0} = ShiftedRm{3-0}; let Inst{5-4} = ShiftedRm{6-5}; let Inst{14-12} = ShiftedRm{11-9}; let Inst{7-6} = ShiftedRm{8-7}; } class T2sTwoRegShiftedReg pattern> : T2sI { bits<4> 
Rd; bits<4> Rn; bits<12> ShiftedRm; let Inst{11-8} = Rd; let Inst{19-16} = Rn; let Inst{3-0} = ShiftedRm{3-0}; let Inst{5-4} = ShiftedRm{6-5}; let Inst{14-12} = ShiftedRm{11-9}; let Inst{7-6} = ShiftedRm{8-7}; } class T2FourReg pattern> : T2I { bits<4> Rd; bits<4> Rn; bits<4> Rm; bits<4> Ra; let Inst{19-16} = Rn; let Inst{15-12} = Ra; let Inst{11-8} = Rd; let Inst{3-0} = Rm; } class T2MulLong opc22_20, bits<4> opc7_4, string opc, list pattern> : T2I<(outs rGPR:$RdLo, rGPR:$RdHi), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMUL64, opc, "\t$RdLo, $RdHi, $Rn, $Rm", pattern>, Sched<[WriteMUL64Lo, WriteMUL64Hi, ReadMUL, ReadMUL]> { bits<4> RdLo; bits<4> RdHi; bits<4> Rn; bits<4> Rm; let Inst{31-23} = 0b111110111; let Inst{22-20} = opc22_20; let Inst{19-16} = Rn; let Inst{15-12} = RdLo; let Inst{11-8} = RdHi; let Inst{7-4} = opc7_4; let Inst{3-0} = Rm; } class T2MlaLong opc22_20, bits<4> opc7_4, string opc> : T2I<(outs rGPR:$RdLo, rGPR:$RdHi), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$RLo, rGPR:$RHi), IIC_iMAC64, opc, "\t$RdLo, $RdHi, $Rn, $Rm", []>, RegConstraint<"$RLo = $RdLo, $RHi = $RdHi">, Sched<[WriteMAC64Lo, WriteMAC64Hi, ReadMUL, ReadMUL, ReadMAC, ReadMAC]> { bits<4> RdLo; bits<4> RdHi; bits<4> Rn; bits<4> Rm; let Inst{31-23} = 0b111110111; let Inst{22-20} = opc22_20; let Inst{19-16} = Rn; let Inst{15-12} = RdLo; let Inst{11-8} = RdHi; let Inst{7-4} = opc7_4; let Inst{3-0} = Rm; } /// T2I_bin_irs - Defines a set of (op reg, {so_imm|r|so_reg}) patterns for a /// binary operation that produces a value. These are predicable and can be /// changed to modify CPSR. multiclass T2I_bin_irs opcod, string opc, InstrItinClass iii, InstrItinClass iir, InstrItinClass iis, SDPatternOperator opnode, bit Commutable = 0, string wide = ""> { // shifted imm def ri : T2sTwoRegImm< (outs rGPR:$Rd), (ins rGPR:$Rn, t2_so_imm:$imm), iii, opc, "\t$Rd, $Rn, $imm", [(set rGPR:$Rd, (opnode rGPR:$Rn, t2_so_imm:$imm))]>, Sched<[WriteALU, ReadALU]> { let Inst{31-27} = 0b11110; let Inst{25} = 0; let Inst{24-21} = opcod; let Inst{15} = 0; } // register def rr : T2sThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), iir, opc, !strconcat(wide, "\t$Rd, $Rn, $Rm"), [(set rGPR:$Rd, (opnode rGPR:$Rn, rGPR:$Rm))]>, Sched<[WriteALU, ReadALU, ReadALU]> { let isCommutable = Commutable; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; let Inst{14-12} = 0b000; // imm3 let Inst{7-6} = 0b00; // imm2 let Inst{5-4} = 0b00; // type } // shifted register def rs : T2sTwoRegShiftedReg< (outs rGPR:$Rd), (ins rGPR:$Rn, t2_so_reg:$ShiftedRm), iis, opc, !strconcat(wide, "\t$Rd, $Rn, $ShiftedRm"), [(set rGPR:$Rd, (opnode rGPR:$Rn, t2_so_reg:$ShiftedRm))]>, Sched<[WriteALUsi, ReadALU]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; } // Assembly aliases for optional destination operand when it's the same // as the source operand. def : t2InstAlias(NAME#"ri") rGPR:$Rdn, rGPR:$Rdn, t2_so_imm:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias(NAME#"rr") rGPR:$Rdn, rGPR:$Rdn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias(NAME#"rs") rGPR:$Rdn, rGPR:$Rdn, t2_so_reg:$shift, pred:$p, cc_out:$s)>; } /// T2I_bin_w_irs - Same as T2I_bin_irs except these operations need // the ".w" suffix to indicate that they are wide. multiclass T2I_bin_w_irs opcod, string opc, InstrItinClass iii, InstrItinClass iir, InstrItinClass iis, SDPatternOperator opnode, bit Commutable = 0> : T2I_bin_irs { // Assembler aliases w/ the ".w" suffix. 
def : t2InstAlias(NAME#"ri") rGPR:$Rd, rGPR:$Rn, t2_so_imm:$imm, pred:$p, cc_out:$s)>; // Assembler aliases w/o the ".w" suffix. def : t2InstAlias(NAME#"rr") rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias(NAME#"rs") rGPR:$Rd, rGPR:$Rn, t2_so_reg:$shift, pred:$p, cc_out:$s)>; // and with the optional destination operand, too. def : t2InstAlias(NAME#"ri") rGPR:$Rdn, rGPR:$Rdn, t2_so_imm:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias(NAME#"rr") rGPR:$Rdn, rGPR:$Rdn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias(NAME#"rs") rGPR:$Rdn, rGPR:$Rdn, t2_so_reg:$shift, pred:$p, cc_out:$s)>; } /// T2I_rbin_is - Same as T2I_bin_irs except the order of operands are /// reversed. The 'rr' form is only defined for the disassembler; for codegen /// it is equivalent to the T2I_bin_irs counterpart. multiclass T2I_rbin_irs opcod, string opc, SDNode opnode> { // shifted imm def ri : T2sTwoRegImm< (outs rGPR:$Rd), (ins rGPR:$Rn, t2_so_imm:$imm), IIC_iALUi, opc, ".w\t$Rd, $Rn, $imm", [(set rGPR:$Rd, (opnode t2_so_imm:$imm, rGPR:$Rn))]>, Sched<[WriteALU, ReadALU]> { let Inst{31-27} = 0b11110; let Inst{25} = 0; let Inst{24-21} = opcod; let Inst{15} = 0; } // register def rr : T2sThreeReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iALUr, opc, "\t$Rd, $Rn, $Rm", [/* For disassembly only; pattern left blank */]>, Sched<[WriteALU, ReadALU, ReadALU]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; let Inst{14-12} = 0b000; // imm3 let Inst{7-6} = 0b00; // imm2 let Inst{5-4} = 0b00; // type } // shifted register def rs : T2sTwoRegShiftedReg< (outs rGPR:$Rd), (ins rGPR:$Rn, t2_so_reg:$ShiftedRm), IIC_iALUsir, opc, "\t$Rd, $Rn, $ShiftedRm", [(set rGPR:$Rd, (opnode t2_so_reg:$ShiftedRm, rGPR:$Rn))]>, Sched<[WriteALUsi, ReadALU]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; } } /// T2I_bin_s_irs - Similar to T2I_bin_irs except it sets the 's' bit so the /// instruction modifies the CPSR register. /// /// These opcodes will be converted to the real non-S opcodes by /// AdjustInstrPostInstrSelection after giving then an optional CPSR operand. let hasPostISelHook = 1, Defs = [CPSR] in { multiclass T2I_bin_s_irs { // shifted imm def ri : t2PseudoInst<(outs rGPR:$Rd), (ins GPRnopc:$Rn, t2_so_imm:$imm, pred:$p), 4, iii, [(set rGPR:$Rd, CPSR, (opnode GPRnopc:$Rn, t2_so_imm:$imm))]>, Sched<[WriteALU, ReadALU]>; // register def rr : t2PseudoInst<(outs rGPR:$Rd), (ins GPRnopc:$Rn, rGPR:$Rm, pred:$p), 4, iir, [(set rGPR:$Rd, CPSR, (opnode GPRnopc:$Rn, rGPR:$Rm))]>, Sched<[WriteALU, ReadALU, ReadALU]> { let isCommutable = Commutable; } // shifted register def rs : t2PseudoInst<(outs rGPR:$Rd), (ins GPRnopc:$Rn, t2_so_reg:$ShiftedRm, pred:$p), 4, iis, [(set rGPR:$Rd, CPSR, (opnode GPRnopc:$Rn, t2_so_reg:$ShiftedRm))]>, Sched<[WriteALUsi, ReadALUsr]>; } } /// T2I_rbin_s_is - Same as T2I_bin_s_irs, except selection DAG /// operands are reversed. 
let hasPostISelHook = 1, Defs = [CPSR] in { multiclass T2I_rbin_s_is { // shifted imm def ri : t2PseudoInst<(outs rGPR:$Rd), (ins rGPR:$Rn, t2_so_imm:$imm, pred:$p), 4, IIC_iALUi, [(set rGPR:$Rd, CPSR, (opnode t2_so_imm:$imm, rGPR:$Rn))]>, Sched<[WriteALU, ReadALU]>; // shifted register def rs : t2PseudoInst<(outs rGPR:$Rd), (ins rGPR:$Rn, t2_so_reg:$ShiftedRm, pred:$p), 4, IIC_iALUsi, [(set rGPR:$Rd, CPSR, (opnode t2_so_reg:$ShiftedRm, rGPR:$Rn))]>, Sched<[WriteALUsi, ReadALU]>; } } /// T2I_bin_ii12rs - Defines a set of (op reg, {so_imm|imm0_4095|r|so_reg}) /// patterns for a binary operation that produces a value. multiclass T2I_bin_ii12rs op23_21, string opc, SDNode opnode, bit Commutable = 0> { // shifted imm // The register-immediate version is re-materializable. This is useful // in particular for taking the address of a local. let isReMaterializable = 1 in { def ri : T2sTwoRegImm< (outs GPRnopc:$Rd), (ins GPRnopc:$Rn, t2_so_imm:$imm), IIC_iALUi, opc, ".w\t$Rd, $Rn, $imm", [(set GPRnopc:$Rd, (opnode GPRnopc:$Rn, t2_so_imm:$imm))]>, Sched<[WriteALU, ReadALU]> { let Inst{31-27} = 0b11110; let Inst{25} = 0; let Inst{24} = 1; let Inst{23-21} = op23_21; let Inst{15} = 0; } } // 12-bit imm def ri12 : T2I< (outs GPRnopc:$Rd), (ins GPR:$Rn, imm0_4095:$imm), IIC_iALUi, !strconcat(opc, "w"), "\t$Rd, $Rn, $imm", [(set GPRnopc:$Rd, (opnode GPR:$Rn, imm0_4095:$imm))]>, Sched<[WriteALU, ReadALU]> { bits<4> Rd; bits<4> Rn; bits<12> imm; let Inst{31-27} = 0b11110; let Inst{26} = imm{11}; let Inst{25-24} = 0b10; let Inst{23-21} = op23_21; let Inst{20} = 0; // The S bit. let Inst{19-16} = Rn; let Inst{15} = 0; let Inst{14-12} = imm{10-8}; let Inst{11-8} = Rd; let Inst{7-0} = imm{7-0}; } // register def rr : T2sThreeReg<(outs GPRnopc:$Rd), (ins GPRnopc:$Rn, rGPR:$Rm), IIC_iALUr, opc, ".w\t$Rd, $Rn, $Rm", [(set GPRnopc:$Rd, (opnode GPRnopc:$Rn, rGPR:$Rm))]>, Sched<[WriteALU, ReadALU, ReadALU]> { let isCommutable = Commutable; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24} = 1; let Inst{23-21} = op23_21; let Inst{14-12} = 0b000; // imm3 let Inst{7-6} = 0b00; // imm2 let Inst{5-4} = 0b00; // type } // shifted register def rs : T2sTwoRegShiftedReg< (outs GPRnopc:$Rd), (ins GPRnopc:$Rn, t2_so_reg:$ShiftedRm), IIC_iALUsi, opc, ".w\t$Rd, $Rn, $ShiftedRm", [(set GPRnopc:$Rd, (opnode GPRnopc:$Rn, t2_so_reg:$ShiftedRm))]>, Sched<[WriteALUsi, ReadALU]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24} = 1; let Inst{23-21} = op23_21; } } /// T2I_adde_sube_irs - Defines a set of (op reg, {so_imm|r|so_reg}) patterns /// for a binary operation that produces a value and use the carry /// bit. It's not predicable. 
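// (Aside on the multiclass defined next.) The adde/sube forms model
// add/subtract-with-carry: they read the incoming CPSR carry and produce a
// new one, and ARM subtract-with-carry is computed as Rn + ~Op2 + C. A
// standalone C++ sketch of those reference semantics; illustration only,
// not the selection code, and all names here are made up.

#include <cassert>
#include <cstdint>

struct ValueAndCarry { uint32_t value; bool carry; };

ValueAndCarry addWithCarry(uint32_t a, uint32_t b, bool carryIn) {
  uint64_t wide = static_cast<uint64_t>(a) + b + (carryIn ? 1 : 0);
  return {static_cast<uint32_t>(wide), wide > 0xFFFFFFFFull};
}

ValueAndCarry adc(uint32_t rn, uint32_t op2, bool c) {
  return addWithCarry(rn, op2, c);
}

ValueAndCarry sbc(uint32_t rn, uint32_t op2, bool c) {
  return addWithCarry(rn, ~op2, c); // borrow is the inverted carry
}

int main() {
  assert(adc(1, 2, true).value == 4);
  assert(sbc(5, 3, true).value == 2);  // no borrow pending
  assert(sbc(5, 3, false).value == 1); // a pending borrow subtracts one more
  return 0;
}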
let Defs = [CPSR], Uses = [CPSR] in { multiclass T2I_adde_sube_irs opcod, string opc, SDNode opnode, bit Commutable = 0> { // shifted imm def ri : T2sTwoRegImm<(outs rGPR:$Rd), (ins rGPR:$Rn, t2_so_imm:$imm), IIC_iALUi, opc, "\t$Rd, $Rn, $imm", [(set rGPR:$Rd, CPSR, (opnode rGPR:$Rn, t2_so_imm:$imm, CPSR))]>, Requires<[IsThumb2]>, Sched<[WriteALU, ReadALU]> { let Inst{31-27} = 0b11110; let Inst{25} = 0; let Inst{24-21} = opcod; let Inst{15} = 0; } // register def rr : T2sThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iALUr, opc, ".w\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, CPSR, (opnode rGPR:$Rn, rGPR:$Rm, CPSR))]>, Requires<[IsThumb2]>, Sched<[WriteALU, ReadALU, ReadALU]> { let isCommutable = Commutable; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; let Inst{14-12} = 0b000; // imm3 let Inst{7-6} = 0b00; // imm2 let Inst{5-4} = 0b00; // type } // shifted register def rs : T2sTwoRegShiftedReg< (outs rGPR:$Rd), (ins rGPR:$Rn, t2_so_reg:$ShiftedRm), IIC_iALUsi, opc, ".w\t$Rd, $Rn, $ShiftedRm", [(set rGPR:$Rd, CPSR, (opnode rGPR:$Rn, t2_so_reg:$ShiftedRm, CPSR))]>, Requires<[IsThumb2]>, Sched<[WriteALUsi, ReadALU]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; } } } /// T2I_sh_ir - Defines a set of (op reg, {so_imm|r}) patterns for a shift / // rotate operation that produces a value. multiclass T2I_sh_ir opcod, string opc, Operand ty, SDNode opnode> { // 5-bit imm def ri : T2sTwoRegShiftImm< (outs rGPR:$Rd), (ins rGPR:$Rm, ty:$imm), IIC_iMOVsi, opc, ".w\t$Rd, $Rm, $imm", [(set rGPR:$Rd, (opnode rGPR:$Rm, (i32 ty:$imm)))]>, Sched<[WriteALU]> { let Inst{31-27} = 0b11101; let Inst{26-21} = 0b010010; let Inst{19-16} = 0b1111; // Rn let Inst{5-4} = opcod; } // register def rr : T2sThreeReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMOVsr, opc, ".w\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (opnode rGPR:$Rn, rGPR:$Rm))]>, Sched<[WriteALU]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0100; let Inst{22-21} = opcod; let Inst{15-12} = 0b1111; let Inst{7-4} = 0b0000; } // Optional destination register def : t2InstAlias(NAME#"ri") rGPR:$Rdn, rGPR:$Rdn, ty:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias(NAME#"rr") rGPR:$Rdn, rGPR:$Rdn, rGPR:$Rm, pred:$p, cc_out:$s)>; // Assembler aliases w/o the ".w" suffix. def : t2InstAlias(NAME#"ri") rGPR:$Rd, rGPR:$Rn, ty:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias(NAME#"rr") rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, pred:$p, cc_out:$s)>; // and with the optional destination operand, too. def : t2InstAlias(NAME#"ri") rGPR:$Rdn, rGPR:$Rdn, ty:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias(NAME#"rr") rGPR:$Rdn, rGPR:$Rdn, rGPR:$Rm, pred:$p, cc_out:$s)>; } /// T2I_cmp_irs - Defines a set of (op r, {so_imm|r|so_reg}) cmp / test /// patterns. Similar to T2I_bin_irs except the instruction does not produce /// a explicit result, only implicitly set CPSR. multiclass T2I_cmp_irs opcod, string opc, InstrItinClass iii, InstrItinClass iir, InstrItinClass iis, SDPatternOperator opnode> { let isCompare = 1, Defs = [CPSR] in { // shifted imm def ri : T2OneRegCmpImm< (outs), (ins GPRnopc:$Rn, t2_so_imm:$imm), iii, opc, ".w\t$Rn, $imm", [(opnode GPRnopc:$Rn, t2_so_imm:$imm)]>, Sched<[WriteCMP]> { let Inst{31-27} = 0b11110; let Inst{25} = 0; let Inst{24-21} = opcod; let Inst{20} = 1; // The S bit. 
let Inst{15} = 0; let Inst{11-8} = 0b1111; // Rd } // register def rr : T2TwoRegCmp< (outs), (ins GPRnopc:$Rn, rGPR:$Rm), iir, opc, ".w\t$Rn, $Rm", [(opnode GPRnopc:$Rn, rGPR:$Rm)]>, Sched<[WriteCMP]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; let Inst{20} = 1; // The S bit. let Inst{14-12} = 0b000; // imm3 let Inst{11-8} = 0b1111; // Rd let Inst{7-6} = 0b00; // imm2 let Inst{5-4} = 0b00; // type } // shifted register def rs : T2OneRegCmpShiftedReg< (outs), (ins GPRnopc:$Rn, t2_so_reg:$ShiftedRm), iis, opc, ".w\t$Rn, $ShiftedRm", [(opnode GPRnopc:$Rn, t2_so_reg:$ShiftedRm)]>, Sched<[WriteCMPsi]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; let Inst{20} = 1; // The S bit. let Inst{11-8} = 0b1111; // Rd } } // Assembler aliases w/o the ".w" suffix. // No alias here for 'rr' version as not all instantiations of this // multiclass want one (CMP in particular, does not). def : t2InstAlias(NAME#"ri") GPRnopc:$Rn, t2_so_imm:$imm, pred:$p)>; def : t2InstAlias(NAME#"rs") GPRnopc:$Rn, t2_so_reg:$shift, pred:$p)>; } /// T2I_ld - Defines a set of (op r, {imm12|imm8|so_reg}) load patterns. multiclass T2I_ld opcod, string opc, InstrItinClass iii, InstrItinClass iis, RegisterClass target, PatFrag opnode> { def i12 : T2Ii12<(outs target:$Rt), (ins t2addrmode_imm12:$addr), iii, opc, ".w\t$Rt, $addr", [(set target:$Rt, (opnode t2addrmode_imm12:$addr))]>, Sched<[WriteLd]> { bits<4> Rt; bits<17> addr; let Inst{31-25} = 0b1111100; let Inst{24} = signed; let Inst{23} = 1; let Inst{22-21} = opcod; let Inst{20} = 1; // load let Inst{19-16} = addr{16-13}; // Rn let Inst{15-12} = Rt; let Inst{11-0} = addr{11-0}; // imm let DecoderMethod = "DecodeT2LoadImm12"; } def i8 : T2Ii8 <(outs target:$Rt), (ins t2addrmode_negimm8:$addr), iii, opc, "\t$Rt, $addr", [(set target:$Rt, (opnode t2addrmode_negimm8:$addr))]>, Sched<[WriteLd]> { bits<4> Rt; bits<13> addr; let Inst{31-27} = 0b11111; let Inst{26-25} = 0b00; let Inst{24} = signed; let Inst{23} = 0; let Inst{22-21} = opcod; let Inst{20} = 1; // load let Inst{19-16} = addr{12-9}; // Rn let Inst{15-12} = Rt; let Inst{11} = 1; // Offset: index==TRUE, wback==FALSE let Inst{10} = 1; // The P bit. let Inst{9} = addr{8}; // U let Inst{8} = 0; // The W bit. let Inst{7-0} = addr{7-0}; // imm let DecoderMethod = "DecodeT2LoadImm8"; } def s : T2Iso <(outs target:$Rt), (ins t2addrmode_so_reg:$addr), iis, opc, ".w\t$Rt, $addr", [(set target:$Rt, (opnode t2addrmode_so_reg:$addr))]>, Sched<[WriteLd]> { let Inst{31-27} = 0b11111; let Inst{26-25} = 0b00; let Inst{24} = signed; let Inst{23} = 0; let Inst{22-21} = opcod; let Inst{20} = 1; // load let Inst{11-6} = 0b000000; bits<4> Rt; let Inst{15-12} = Rt; bits<10> addr; let Inst{19-16} = addr{9-6}; // Rn let Inst{3-0} = addr{5-2}; // Rm let Inst{5-4} = addr{1-0}; // imm let DecoderMethod = "DecodeT2LoadShift"; } // pci variant is very similar to i12, but supports negative offsets // from the PC. 
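// (Aside on the pci form defined next.) The literal (pc-relative) variant
// keeps the sign of the offset in the U bit (Inst{23} = addr{12}) and its
// magnitude in imm12, so literals up to 4095 bytes before or after the base
// are reachable. A standalone C++ sketch of that split; illustration only,
// with made-up names.

#include <cassert>
#include <cstdint>
#include <optional>

struct PCRelImm {
  bool add;       // U bit
  uint16_t imm12; // magnitude, 0..4095
};

std::optional<PCRelImm> encodePCRelImm(int32_t offset) {
  int64_t mag = offset < 0 ? -static_cast<int64_t>(offset) : offset;
  if (mag > 4095)
    return std::nullopt;
  return PCRelImm{offset >= 0, static_cast<uint16_t>(mag)};
}

int main() {
  assert(encodePCRelImm(-8) && !encodePCRelImm(-8)->add);
  assert(encodePCRelImm(-8)->imm12 == 8);
  assert(!encodePCRelImm(5000)); // out of the 12-bit range
  return 0;
}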
def pci : T2Ipc <(outs target:$Rt), (ins t2ldrlabel:$addr), iii, opc, ".w\t$Rt, $addr", [(set target:$Rt, (opnode (ARMWrapper tconstpool:$addr)))]>, Sched<[WriteLd]> { let isReMaterializable = 1; let Inst{31-27} = 0b11111; let Inst{26-25} = 0b00; let Inst{24} = signed; let Inst{22-21} = opcod; let Inst{20} = 1; // load let Inst{19-16} = 0b1111; // Rn bits<4> Rt; let Inst{15-12} = Rt{3-0}; bits<13> addr; let Inst{23} = addr{12}; // add = (U == '1') let Inst{11-0} = addr{11-0}; let DecoderMethod = "DecodeT2LoadLabel"; } } /// T2I_st - Defines a set of (op r, {imm12|imm8|so_reg}) store patterns. multiclass T2I_st opcod, string opc, InstrItinClass iii, InstrItinClass iis, RegisterClass target, PatFrag opnode> { def i12 : T2Ii12<(outs), (ins target:$Rt, t2addrmode_imm12:$addr), iii, opc, ".w\t$Rt, $addr", [(opnode target:$Rt, t2addrmode_imm12:$addr)]>, Sched<[WriteST]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0001; let Inst{22-21} = opcod; let Inst{20} = 0; // !load bits<4> Rt; let Inst{15-12} = Rt; bits<17> addr; let addr{12} = 1; // add = TRUE let Inst{19-16} = addr{16-13}; // Rn let Inst{23} = addr{12}; // U let Inst{11-0} = addr{11-0}; // imm } def i8 : T2Ii8 <(outs), (ins target:$Rt, t2addrmode_negimm8:$addr), iii, opc, "\t$Rt, $addr", [(opnode target:$Rt, t2addrmode_negimm8:$addr)]>, Sched<[WriteST]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0000; let Inst{22-21} = opcod; let Inst{20} = 0; // !load let Inst{11} = 1; // Offset: index==TRUE, wback==FALSE let Inst{10} = 1; // The P bit. let Inst{8} = 0; // The W bit. bits<4> Rt; let Inst{15-12} = Rt; bits<13> addr; let Inst{19-16} = addr{12-9}; // Rn let Inst{9} = addr{8}; // U let Inst{7-0} = addr{7-0}; // imm } def s : T2Iso <(outs), (ins target:$Rt, t2addrmode_so_reg:$addr), iis, opc, ".w\t$Rt, $addr", [(opnode target:$Rt, t2addrmode_so_reg:$addr)]>, Sched<[WriteST]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0000; let Inst{22-21} = opcod; let Inst{20} = 0; // !load let Inst{11-6} = 0b000000; bits<4> Rt; let Inst{15-12} = Rt; bits<10> addr; let Inst{19-16} = addr{9-6}; // Rn let Inst{3-0} = addr{5-2}; // Rm let Inst{5-4} = addr{1-0}; // imm } } /// T2I_ext_rrot - A unary operation with two forms: one whose operand is a /// register and one whose operand is a register rotated by 8/16/24. class T2I_ext_rrot_base opcod, dag iops, dag oops, string opc, string oprs, list pattern> : T2TwoReg { bits<2> rot; let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0100; let Inst{22-20} = opcod; let Inst{19-16} = 0b1111; // Rn let Inst{15-12} = 0b1111; let Inst{7} = 1; let Inst{5-4} = rot; // rotate } class T2I_ext_rrot opcod, string opc> : T2I_ext_rrot_base, Requires<[IsThumb2]>, Sched<[WriteALU, ReadALU]>; // UXTB16, SXTB16 - Requires HasDSP, does not need the .w qualifier. class T2I_ext_rrot_xtb16 opcod, string opc> : T2I_ext_rrot_base, Requires<[HasDSP, IsThumb2]>, Sched<[WriteALU, ReadALU]>; /// T2I_exta_rrot - A binary operation with two forms: one whose operand is a /// register and one whose operand is a register rotated by 8/16/24. 
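// (Aside on the rrot classes here.) The optional rotation of the source
// register is restricted to 0, 8, 16 or 24 bits and is encoded divided by
// eight into the two-bit rot field (Inst{5-4}); the patterns further down
// pair "rotr ... 24" with rotation operand 3 accordingly. A standalone C++
// sketch of that restriction and encoding; illustration only, with a made-up
// function name.

#include <cassert>
#include <optional>

std::optional<unsigned> encodeRot(unsigned rotateBits) {
  if (rotateBits % 8 != 0 || rotateBits > 24)
    return std::nullopt; // only ror #0, #8, #16 and #24 are encodable
  return rotateBits / 8; // the value placed in Inst{5-4}
}

int main() {
  assert(encodeRot(0) == 0u);
  assert(encodeRot(24) == 3u);
  assert(!encodeRot(12)); // not a multiple of eight
  return 0;
}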
class T2I_exta_rrot opcod, string opc> : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rot_imm:$rot), IIC_iEXTAsr, opc, "\t$Rd, $Rn, $Rm$rot", []>, Requires<[HasDSP, IsThumb2]>, Sched<[WriteALU, ReadALU]> { bits<2> rot; let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0100; let Inst{22-20} = opcod; let Inst{15-12} = 0b1111; let Inst{7} = 1; let Inst{5-4} = rot; } //===----------------------------------------------------------------------===// // Instructions //===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===// // Miscellaneous Instructions. // class T2PCOneRegImm pattern> : T2XI { bits<4> Rd; bits<12> label; let Inst{11-8} = Rd; let Inst{26} = label{11}; let Inst{14-12} = label{10-8}; let Inst{7-0} = label{7-0}; } // LEApcrel - Load a pc-relative address into a register without offending the // assembler. def t2ADR : T2PCOneRegImm<(outs rGPR:$Rd), (ins t2adrlabel:$addr, pred:$p), IIC_iALUi, "adr{$p}.w\t$Rd, $addr", []>, Sched<[WriteALU, ReadALU]> { let Inst{31-27} = 0b11110; let Inst{25-24} = 0b10; // Inst{23:21} = '11' (add = FALSE) or '00' (add = TRUE) let Inst{22} = 0; let Inst{20} = 0; let Inst{19-16} = 0b1111; // Rn let Inst{15} = 0; bits<4> Rd; bits<13> addr; let Inst{11-8} = Rd; let Inst{23} = addr{12}; let Inst{21} = addr{12}; let Inst{26} = addr{11}; let Inst{14-12} = addr{10-8}; let Inst{7-0} = addr{7-0}; let DecoderMethod = "DecodeT2Adr"; } let hasSideEffects = 0, isReMaterializable = 1 in def t2LEApcrel : t2PseudoInst<(outs rGPR:$Rd), (ins i32imm:$label, pred:$p), 4, IIC_iALUi, []>, Sched<[WriteALU, ReadALU]>; let hasSideEffects = 1 in def t2LEApcrelJT : t2PseudoInst<(outs rGPR:$Rd), (ins i32imm:$label, pred:$p), 4, IIC_iALUi, []>, Sched<[WriteALU, ReadALU]>; //===----------------------------------------------------------------------===// // Load / store Instructions. // // Load let canFoldAsLoad = 1, isReMaterializable = 1 in defm t2LDR : T2I_ld<0, 0b10, "ldr", IIC_iLoad_i, IIC_iLoad_si, GPR, load>; // Loads with zero extension defm t2LDRH : T2I_ld<0, 0b01, "ldrh", IIC_iLoad_bh_i, IIC_iLoad_bh_si, GPRnopc, zextloadi16>; defm t2LDRB : T2I_ld<0, 0b00, "ldrb", IIC_iLoad_bh_i, IIC_iLoad_bh_si, GPRnopc, zextloadi8>; // Loads with sign extension defm t2LDRSH : T2I_ld<1, 0b01, "ldrsh", IIC_iLoad_bh_i, IIC_iLoad_bh_si, GPRnopc, sextloadi16>; defm t2LDRSB : T2I_ld<1, 0b00, "ldrsb", IIC_iLoad_bh_i, IIC_iLoad_bh_si, GPRnopc, sextloadi8>; let mayLoad = 1, hasSideEffects = 0, hasExtraDefRegAllocReq = 1 in { // Load doubleword def t2LDRDi8 : T2Ii8s4<1, 0, 1, (outs rGPR:$Rt, rGPR:$Rt2), (ins t2addrmode_imm8s4:$addr), IIC_iLoad_d_i, "ldrd", "\t$Rt, $Rt2, $addr", "", []>, Sched<[WriteLd]>; } // mayLoad = 1, hasSideEffects = 0, hasExtraDefRegAllocReq = 1 // zextload i1 -> zextload i8 def : T2Pat<(zextloadi1 t2addrmode_imm12:$addr), (t2LDRBi12 t2addrmode_imm12:$addr)>; def : T2Pat<(zextloadi1 t2addrmode_negimm8:$addr), (t2LDRBi8 t2addrmode_negimm8:$addr)>; def : T2Pat<(zextloadi1 t2addrmode_so_reg:$addr), (t2LDRBs t2addrmode_so_reg:$addr)>; def : T2Pat<(zextloadi1 (ARMWrapper tconstpool:$addr)), (t2LDRBpci tconstpool:$addr)>; // extload -> zextload // FIXME: Reduce the number of patterns by legalizing extload to zextload // earlier? 
def : T2Pat<(extloadi1 t2addrmode_imm12:$addr), (t2LDRBi12 t2addrmode_imm12:$addr)>; def : T2Pat<(extloadi1 t2addrmode_negimm8:$addr), (t2LDRBi8 t2addrmode_negimm8:$addr)>; def : T2Pat<(extloadi1 t2addrmode_so_reg:$addr), (t2LDRBs t2addrmode_so_reg:$addr)>; def : T2Pat<(extloadi1 (ARMWrapper tconstpool:$addr)), (t2LDRBpci tconstpool:$addr)>; def : T2Pat<(extloadi8 t2addrmode_imm12:$addr), (t2LDRBi12 t2addrmode_imm12:$addr)>; def : T2Pat<(extloadi8 t2addrmode_negimm8:$addr), (t2LDRBi8 t2addrmode_negimm8:$addr)>; def : T2Pat<(extloadi8 t2addrmode_so_reg:$addr), (t2LDRBs t2addrmode_so_reg:$addr)>; def : T2Pat<(extloadi8 (ARMWrapper tconstpool:$addr)), (t2LDRBpci tconstpool:$addr)>; def : T2Pat<(extloadi16 t2addrmode_imm12:$addr), (t2LDRHi12 t2addrmode_imm12:$addr)>; def : T2Pat<(extloadi16 t2addrmode_negimm8:$addr), (t2LDRHi8 t2addrmode_negimm8:$addr)>; def : T2Pat<(extloadi16 t2addrmode_so_reg:$addr), (t2LDRHs t2addrmode_so_reg:$addr)>; def : T2Pat<(extloadi16 (ARMWrapper tconstpool:$addr)), (t2LDRHpci tconstpool:$addr)>; // FIXME: The destination register of the loads and stores can't be PC, but // can be SP. We need another regclass (similar to rGPR) to represent // that. Not a pressing issue since these are selected manually, // not via pattern. // Indexed loads let mayLoad = 1, hasSideEffects = 0 in { def t2LDR_PRE : T2Ipreldst<0, 0b10, 1, 1, (outs GPR:$Rt, GPR:$Rn_wb), (ins t2addrmode_imm8_pre:$addr), AddrModeT2_i8, IndexModePre, IIC_iLoad_iu, "ldr", "\t$Rt, $addr!", "$addr.base = $Rn_wb", []>, Sched<[WriteLd]>; def t2LDR_POST : T2Ipostldst<0, 0b10, 1, 0, (outs GPR:$Rt, GPR:$Rn_wb), (ins addr_offset_none:$Rn, t2am_imm8_offset:$offset), AddrModeT2_i8, IndexModePost, IIC_iLoad_iu, "ldr", "\t$Rt, $Rn$offset", "$Rn = $Rn_wb", []>, Sched<[WriteLd]>; def t2LDRB_PRE : T2Ipreldst<0, 0b00, 1, 1, (outs GPR:$Rt, GPR:$Rn_wb), (ins t2addrmode_imm8_pre:$addr), AddrModeT2_i8, IndexModePre, IIC_iLoad_bh_iu, "ldrb", "\t$Rt, $addr!", "$addr.base = $Rn_wb", []>, Sched<[WriteLd]>; def t2LDRB_POST : T2Ipostldst<0, 0b00, 1, 0, (outs GPR:$Rt, GPR:$Rn_wb), (ins addr_offset_none:$Rn, t2am_imm8_offset:$offset), AddrModeT2_i8, IndexModePost, IIC_iLoad_bh_iu, "ldrb", "\t$Rt, $Rn$offset", "$Rn = $Rn_wb", []>; def t2LDRH_PRE : T2Ipreldst<0, 0b01, 1, 1, (outs GPR:$Rt, GPR:$Rn_wb), (ins t2addrmode_imm8_pre:$addr), AddrModeT2_i8, IndexModePre, IIC_iLoad_bh_iu, "ldrh", "\t$Rt, $addr!", "$addr.base = $Rn_wb", []>, Sched<[WriteLd]>; def t2LDRH_POST : T2Ipostldst<0, 0b01, 1, 0, (outs GPR:$Rt, GPR:$Rn_wb), (ins addr_offset_none:$Rn, t2am_imm8_offset:$offset), AddrModeT2_i8, IndexModePost, IIC_iLoad_bh_iu, "ldrh", "\t$Rt, $Rn$offset", "$Rn = $Rn_wb", []>, Sched<[WriteLd]>; def t2LDRSB_PRE : T2Ipreldst<1, 0b00, 1, 1, (outs GPR:$Rt, GPR:$Rn_wb), (ins t2addrmode_imm8_pre:$addr), AddrModeT2_i8, IndexModePre, IIC_iLoad_bh_iu, "ldrsb", "\t$Rt, $addr!", "$addr.base = $Rn_wb", []>, Sched<[WriteLd]>; def t2LDRSB_POST : T2Ipostldst<1, 0b00, 1, 0, (outs GPR:$Rt, GPR:$Rn_wb), (ins addr_offset_none:$Rn, t2am_imm8_offset:$offset), AddrModeT2_i8, IndexModePost, IIC_iLoad_bh_iu, "ldrsb", "\t$Rt, $Rn$offset", "$Rn = $Rn_wb", []>, Sched<[WriteLd]>; def t2LDRSH_PRE : T2Ipreldst<1, 0b01, 1, 1, (outs GPR:$Rt, GPR:$Rn_wb), (ins t2addrmode_imm8_pre:$addr), AddrModeT2_i8, IndexModePre, IIC_iLoad_bh_iu, "ldrsh", "\t$Rt, $addr!", "$addr.base = $Rn_wb", []>, Sched<[WriteLd]>; def t2LDRSH_POST : T2Ipostldst<1, 0b01, 1, 0, (outs GPR:$Rt, GPR:$Rn_wb), (ins addr_offset_none:$Rn, t2am_imm8_offset:$offset), AddrModeT2_i8, IndexModePost, IIC_iLoad_bh_iu, 
"ldrsh", "\t$Rt, $Rn$offset", "$Rn = $Rn_wb", []>, Sched<[WriteLd]>; } // mayLoad = 1, hasSideEffects = 0 // LDRT, LDRBT, LDRHT, LDRSBT, LDRSHT all have offset mode (PUW=0b110). // Ref: A8.6.57 LDR (immediate, Thumb) Encoding T4 class T2IldT type, string opc, InstrItinClass ii> : T2Ii8<(outs rGPR:$Rt), (ins t2addrmode_posimm8:$addr), ii, opc, "\t$Rt, $addr", []>, Sched<[WriteLd]> { bits<4> Rt; bits<13> addr; let Inst{31-27} = 0b11111; let Inst{26-25} = 0b00; let Inst{24} = signed; let Inst{23} = 0; let Inst{22-21} = type; let Inst{20} = 1; // load let Inst{19-16} = addr{12-9}; let Inst{15-12} = Rt; let Inst{11} = 1; let Inst{10-8} = 0b110; // PUW. let Inst{7-0} = addr{7-0}; let DecoderMethod = "DecodeT2LoadT"; } def t2LDRT : T2IldT<0, 0b10, "ldrt", IIC_iLoad_i>; def t2LDRBT : T2IldT<0, 0b00, "ldrbt", IIC_iLoad_bh_i>; def t2LDRHT : T2IldT<0, 0b01, "ldrht", IIC_iLoad_bh_i>; def t2LDRSBT : T2IldT<1, 0b00, "ldrsbt", IIC_iLoad_bh_i>; def t2LDRSHT : T2IldT<1, 0b01, "ldrsht", IIC_iLoad_bh_i>; class T2Ildacq bits23_20, bits<2> bit54, dag oops, dag iops, string opc, string asm, list pattern> : Thumb2I, Requires<[IsThumb, HasAcquireRelease]> { bits<4> Rt; bits<4> addr; let Inst{31-27} = 0b11101; let Inst{26-24} = 0b000; let Inst{23-20} = bits23_20; let Inst{11-6} = 0b111110; let Inst{5-4} = bit54; let Inst{3-0} = 0b1111; // Encode instruction operands let Inst{19-16} = addr; let Inst{15-12} = Rt; } def t2LDA : T2Ildacq<0b1101, 0b10, (outs rGPR:$Rt), (ins addr_offset_none:$addr), "lda", "\t$Rt, $addr", []>, Sched<[WriteLd]>; def t2LDAB : T2Ildacq<0b1101, 0b00, (outs rGPR:$Rt), (ins addr_offset_none:$addr), "ldab", "\t$Rt, $addr", []>, Sched<[WriteLd]>; def t2LDAH : T2Ildacq<0b1101, 0b01, (outs rGPR:$Rt), (ins addr_offset_none:$addr), "ldah", "\t$Rt, $addr", []>, Sched<[WriteLd]>; // Store defm t2STR :T2I_st<0b10,"str", IIC_iStore_i, IIC_iStore_si, GPR, store>; defm t2STRB:T2I_st<0b00,"strb", IIC_iStore_bh_i, IIC_iStore_bh_si, rGPR, truncstorei8>; defm t2STRH:T2I_st<0b01,"strh", IIC_iStore_bh_i, IIC_iStore_bh_si, rGPR, truncstorei16>; // Store doubleword let mayStore = 1, hasSideEffects = 0, hasExtraSrcRegAllocReq = 1 in def t2STRDi8 : T2Ii8s4<1, 0, 0, (outs), (ins rGPR:$Rt, rGPR:$Rt2, t2addrmode_imm8s4:$addr), IIC_iStore_d_r, "strd", "\t$Rt, $Rt2, $addr", "", []>, Sched<[WriteST]>; // Indexed stores let mayStore = 1, hasSideEffects = 0 in { def t2STR_PRE : T2Ipreldst<0, 0b10, 0, 1, (outs GPRnopc:$Rn_wb), (ins GPRnopc:$Rt, t2addrmode_imm8_pre:$addr), AddrModeT2_i8, IndexModePre, IIC_iStore_iu, "str", "\t$Rt, $addr!", "$addr.base = $Rn_wb,@earlyclobber $Rn_wb", []>, Sched<[WriteST]>; def t2STRH_PRE : T2Ipreldst<0, 0b01, 0, 1, (outs GPRnopc:$Rn_wb), (ins rGPR:$Rt, t2addrmode_imm8_pre:$addr), AddrModeT2_i8, IndexModePre, IIC_iStore_iu, "strh", "\t$Rt, $addr!", "$addr.base = $Rn_wb,@earlyclobber $Rn_wb", []>, Sched<[WriteST]>; def t2STRB_PRE : T2Ipreldst<0, 0b00, 0, 1, (outs GPRnopc:$Rn_wb), (ins rGPR:$Rt, t2addrmode_imm8_pre:$addr), AddrModeT2_i8, IndexModePre, IIC_iStore_bh_iu, "strb", "\t$Rt, $addr!", "$addr.base = $Rn_wb,@earlyclobber $Rn_wb", []>, Sched<[WriteST]>; } // mayStore = 1, hasSideEffects = 0 def t2STR_POST : T2Ipostldst<0, 0b10, 0, 0, (outs GPRnopc:$Rn_wb), (ins GPRnopc:$Rt, addr_offset_none:$Rn, t2am_imm8_offset:$offset), AddrModeT2_i8, IndexModePost, IIC_iStore_iu, "str", "\t$Rt, $Rn$offset", "$Rn = $Rn_wb,@earlyclobber $Rn_wb", [(set GPRnopc:$Rn_wb, (post_store GPRnopc:$Rt, addr_offset_none:$Rn, t2am_imm8_offset:$offset))]>, Sched<[WriteST]>; def t2STRH_POST : T2Ipostldst<0, 0b01, 0, 0, 
(outs GPRnopc:$Rn_wb), (ins rGPR:$Rt, addr_offset_none:$Rn, t2am_imm8_offset:$offset), AddrModeT2_i8, IndexModePost, IIC_iStore_bh_iu, "strh", "\t$Rt, $Rn$offset", "$Rn = $Rn_wb,@earlyclobber $Rn_wb", [(set GPRnopc:$Rn_wb, (post_truncsti16 rGPR:$Rt, addr_offset_none:$Rn, t2am_imm8_offset:$offset))]>, Sched<[WriteST]>; def t2STRB_POST : T2Ipostldst<0, 0b00, 0, 0, (outs GPRnopc:$Rn_wb), (ins rGPR:$Rt, addr_offset_none:$Rn, t2am_imm8_offset:$offset), AddrModeT2_i8, IndexModePost, IIC_iStore_bh_iu, "strb", "\t$Rt, $Rn$offset", "$Rn = $Rn_wb,@earlyclobber $Rn_wb", [(set GPRnopc:$Rn_wb, (post_truncsti8 rGPR:$Rt, addr_offset_none:$Rn, t2am_imm8_offset:$offset))]>, Sched<[WriteST]>; // Pseudo-instructions for pattern matching the pre-indexed stores. We can't // put the patterns on the instruction definitions directly as ISel wants // the address base and offset to be separate operands, not a single // complex operand like we represent the instructions themselves. The // pseudos map between the two. let usesCustomInserter = 1, Constraints = "$Rn = $Rn_wb,@earlyclobber $Rn_wb" in { def t2STR_preidx: t2PseudoInst<(outs GPRnopc:$Rn_wb), (ins rGPR:$Rt, GPRnopc:$Rn, t2am_imm8_offset:$offset, pred:$p), 4, IIC_iStore_ru, [(set GPRnopc:$Rn_wb, (pre_store rGPR:$Rt, GPRnopc:$Rn, t2am_imm8_offset:$offset))]>, Sched<[WriteST]>; def t2STRB_preidx: t2PseudoInst<(outs GPRnopc:$Rn_wb), (ins rGPR:$Rt, GPRnopc:$Rn, t2am_imm8_offset:$offset, pred:$p), 4, IIC_iStore_ru, [(set GPRnopc:$Rn_wb, (pre_truncsti8 rGPR:$Rt, GPRnopc:$Rn, t2am_imm8_offset:$offset))]>, Sched<[WriteST]>; def t2STRH_preidx: t2PseudoInst<(outs GPRnopc:$Rn_wb), (ins rGPR:$Rt, GPRnopc:$Rn, t2am_imm8_offset:$offset, pred:$p), 4, IIC_iStore_ru, [(set GPRnopc:$Rn_wb, (pre_truncsti16 rGPR:$Rt, GPRnopc:$Rn, t2am_imm8_offset:$offset))]>, Sched<[WriteST]>; } // STRT, STRBT, STRHT all have offset mode (PUW=0b110) and are for disassembly // only. 
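// (Aside on the unprivileged store forms defined next.) In the Thumb-2 imm8
// load/store encodings, P (index), U (add) and W (writeback) sit in
// Inst{10-8}; hard-wiring PUW = 0b110 as below means indexed, add, no
// writeback -- a plain positive immediate offset. A standalone C++ decoder
// sketch for those three bits; illustration only, with made-up names.

#include <cassert>
#include <cstdint>

struct AddrFlags {
  bool index;     // P
  bool add;       // U
  bool writeback; // W
};

AddrFlags decodePUW(uint32_t inst) {
  return {((inst >> 10) & 1) != 0, ((inst >> 9) & 1) != 0,
          ((inst >> 8) & 1) != 0};
}

int main() {
  AddrFlags f = decodePUW(0x6u << 8); // PUW = 0b110
  assert(f.index && f.add && !f.writeback);
  return 0;
}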
// Ref: A8.6.193 STR (immediate, Thumb) Encoding T4 class T2IstT type, string opc, InstrItinClass ii> : T2Ii8<(outs rGPR:$Rt), (ins t2addrmode_imm8:$addr), ii, opc, "\t$Rt, $addr", []>, Sched<[WriteST]> { let Inst{31-27} = 0b11111; let Inst{26-25} = 0b00; let Inst{24} = 0; // not signed let Inst{23} = 0; let Inst{22-21} = type; let Inst{20} = 0; // store let Inst{11} = 1; let Inst{10-8} = 0b110; // PUW bits<4> Rt; bits<13> addr; let Inst{15-12} = Rt; let Inst{19-16} = addr{12-9}; let Inst{7-0} = addr{7-0}; } def t2STRT : T2IstT<0b10, "strt", IIC_iStore_i>; def t2STRBT : T2IstT<0b00, "strbt", IIC_iStore_bh_i>; def t2STRHT : T2IstT<0b01, "strht", IIC_iStore_bh_i>; // ldrd / strd pre / post variants let mayLoad = 1 in def t2LDRD_PRE : T2Ii8s4<1, 1, 1, (outs rGPR:$Rt, rGPR:$Rt2, GPR:$wb), (ins t2addrmode_imm8s4_pre:$addr), IIC_iLoad_d_ru, "ldrd", "\t$Rt, $Rt2, $addr!", "$addr.base = $wb", []>, Sched<[WriteLd]> { let DecoderMethod = "DecodeT2LDRDPreInstruction"; } let mayLoad = 1 in def t2LDRD_POST : T2Ii8s4post<0, 1, 1, (outs rGPR:$Rt, rGPR:$Rt2, GPR:$wb), (ins addr_offset_none:$addr, t2am_imm8s4_offset:$imm), IIC_iLoad_d_ru, "ldrd", "\t$Rt, $Rt2, $addr$imm", "$addr.base = $wb", []>, Sched<[WriteLd]>; let mayStore = 1 in def t2STRD_PRE : T2Ii8s4<1, 1, 0, (outs GPR:$wb), (ins rGPR:$Rt, rGPR:$Rt2, t2addrmode_imm8s4_pre:$addr), IIC_iStore_d_ru, "strd", "\t$Rt, $Rt2, $addr!", "$addr.base = $wb", []>, Sched<[WriteST]> { let DecoderMethod = "DecodeT2STRDPreInstruction"; } let mayStore = 1 in def t2STRD_POST : T2Ii8s4post<0, 1, 0, (outs GPR:$wb), (ins rGPR:$Rt, rGPR:$Rt2, addr_offset_none:$addr, t2am_imm8s4_offset:$imm), IIC_iStore_d_ru, "strd", "\t$Rt, $Rt2, $addr$imm", "$addr.base = $wb", []>, Sched<[WriteST]>; class T2Istrrel bit54, dag oops, dag iops, string opc, string asm, list pattern> : Thumb2I, Requires<[IsThumb, HasAcquireRelease]>, Sched<[WriteST]> { bits<4> Rt; bits<4> addr; let Inst{31-27} = 0b11101; let Inst{26-20} = 0b0001100; let Inst{11-6} = 0b111110; let Inst{5-4} = bit54; let Inst{3-0} = 0b1111; // Encode instruction operands let Inst{19-16} = addr; let Inst{15-12} = Rt; } def t2STL : T2Istrrel<0b10, (outs), (ins rGPR:$Rt, addr_offset_none:$addr), "stl", "\t$Rt, $addr", []>; def t2STLB : T2Istrrel<0b00, (outs), (ins rGPR:$Rt, addr_offset_none:$addr), "stlb", "\t$Rt, $addr", []>; def t2STLH : T2Istrrel<0b01, (outs), (ins rGPR:$Rt, addr_offset_none:$addr), "stlh", "\t$Rt, $addr", []>; // T2Ipl (Preload Data/Instruction) signals the memory system of possible future // data/instruction access. // instr_write is inverted for Thumb mode: (prefetch 3) -> (preload 0), // (prefetch 1) -> (preload 2), (prefetch 2) -> (preload 1). 
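// (Aside on the remapping described in the comment above.) The listed pairs
// 3 -> 0, 1 -> 2 and 2 -> 1 (and, by the same rule, 0 -> 3) are simply a
// bitwise inversion of the two-bit instr/write selector. A standalone C++
// sketch of that observation; illustration only, not the actual selection
// code, and the function name is made up.

#include <cassert>

unsigned invertPreloadBits(unsigned prefetchSel) {
  return (~prefetchSel) & 0x3u; // flip both bits of the 2-bit selector
}

int main() {
  assert(invertPreloadBits(3) == 0);
  assert(invertPreloadBits(1) == 2);
  assert(invertPreloadBits(2) == 1);
  return 0;
}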
multiclass T2Ipl write, bits<1> instr, string opc> { def i12 : T2Ii12<(outs), (ins t2addrmode_imm12:$addr), IIC_Preload, opc, "\t$addr", [(ARMPreload t2addrmode_imm12:$addr, (i32 write), (i32 instr))]>, Sched<[WritePreLd]> { let Inst{31-25} = 0b1111100; let Inst{24} = instr; let Inst{23} = 1; let Inst{22} = 0; let Inst{21} = write; let Inst{20} = 1; let Inst{15-12} = 0b1111; bits<17> addr; let Inst{19-16} = addr{16-13}; // Rn let Inst{11-0} = addr{11-0}; // imm12 let DecoderMethod = "DecodeT2LoadImm12"; } def i8 : T2Ii8<(outs), (ins t2addrmode_negimm8:$addr), IIC_Preload, opc, "\t$addr", [(ARMPreload t2addrmode_negimm8:$addr, (i32 write), (i32 instr))]>, Sched<[WritePreLd]> { let Inst{31-25} = 0b1111100; let Inst{24} = instr; let Inst{23} = 0; // U = 0 let Inst{22} = 0; let Inst{21} = write; let Inst{20} = 1; let Inst{15-12} = 0b1111; let Inst{11-8} = 0b1100; bits<13> addr; let Inst{19-16} = addr{12-9}; // Rn let Inst{7-0} = addr{7-0}; // imm8 let DecoderMethod = "DecodeT2LoadImm8"; } def s : T2Iso<(outs), (ins t2addrmode_so_reg:$addr), IIC_Preload, opc, "\t$addr", [(ARMPreload t2addrmode_so_reg:$addr, (i32 write), (i32 instr))]>, Sched<[WritePreLd]> { let Inst{31-25} = 0b1111100; let Inst{24} = instr; let Inst{23} = 0; // add = TRUE for T1 let Inst{22} = 0; let Inst{21} = write; let Inst{20} = 1; let Inst{15-12} = 0b1111; let Inst{11-6} = 0b000000; bits<10> addr; let Inst{19-16} = addr{9-6}; // Rn let Inst{3-0} = addr{5-2}; // Rm let Inst{5-4} = addr{1-0}; // imm2 let DecoderMethod = "DecodeT2LoadShift"; } } defm t2PLD : T2Ipl<0, 0, "pld">, Requires<[IsThumb2]>; defm t2PLDW : T2Ipl<1, 0, "pldw">, Requires<[IsThumb2,HasV7,HasMP]>; defm t2PLI : T2Ipl<0, 1, "pli">, Requires<[IsThumb2,HasV7]>; // pci variant is very similar to i12, but supports negative offsets // from the PC. Only PLD and PLI have pci variants (not PLDW) class T2Iplpci inst, string opc> : T2Iso<(outs), (ins t2ldrlabel:$addr), IIC_Preload, opc, "\t$addr", [(ARMPreload (ARMWrapper tconstpool:$addr), (i32 0), (i32 inst))]>, Sched<[WritePreLd]> { let Inst{31-25} = 0b1111100; let Inst{24} = inst; let Inst{22-20} = 0b001; let Inst{19-16} = 0b1111; let Inst{15-12} = 0b1111; bits<13> addr; let Inst{23} = addr{12}; // add = (U == '1') let Inst{11-0} = addr{11-0}; // imm12 let DecoderMethod = "DecodeT2LoadLabel"; } def t2PLDpci : T2Iplpci<0, "pld">, Requires<[IsThumb2]>; def t2PLIpci : T2Iplpci<1, "pli">, Requires<[IsThumb2,HasV7]>; //===----------------------------------------------------------------------===// // Load / store multiple Instructions. 
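// (Aside on the load/store-multiple encodings below.) The register list is
// carried as a 16-bit mask in Inst{15-0}, one bit per core register r0-r15,
// and the store variants force bit 15 (PC) and bit 13 (SP) to zero, as the
// 'let Inst{15} = 0' / 'let Inst{13} = 0' lines show. A standalone C++
// sketch of building and restricting such a mask; illustration only, with
// made-up names.

#include <cassert>
#include <cstdint>
#include <initializer_list>

uint16_t regMask(std::initializer_list<unsigned> regs) {
  uint16_t mask = 0;
  for (unsigned r : regs) {
    assert(r < 16 && "ARM core registers are r0-r15");
    mask = static_cast<uint16_t>(mask | (1u << r));
  }
  return mask;
}

// Mirror of the store-multiple restriction: PC and SP cannot be stored.
uint16_t stmEncodableMask(uint16_t mask) {
  return static_cast<uint16_t>(mask & ~((1u << 15) | (1u << 13)));
}

int main() {
  uint16_t m = regMask({4, 5, 6, 14}); // {r4-r6, lr}
  assert(m == 0x4070);
  assert(stmEncodableMask(static_cast<uint16_t>(m | (1u << 15))) == m);
  return 0;
}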
// multiclass thumb2_ld_mult { def IA : T2XI<(outs), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), itin, !strconcat(asm, "${p}.w\t$Rn, $regs"), []> { bits<4> Rn; bits<16> regs; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b00; let Inst{24-23} = 0b01; // Increment After let Inst{22} = 0; let Inst{21} = 0; // No writeback let Inst{20} = L_bit; let Inst{19-16} = Rn; let Inst{15-0} = regs; } def IA_UPD : T2XIt<(outs GPR:$wb), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), itin_upd, !strconcat(asm, "${p}.w\t$Rn!, $regs"), "$Rn = $wb", []> { bits<4> Rn; bits<16> regs; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b00; let Inst{24-23} = 0b01; // Increment After let Inst{22} = 0; let Inst{21} = 1; // Writeback let Inst{20} = L_bit; let Inst{19-16} = Rn; let Inst{15-0} = regs; } def DB : T2XI<(outs), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), itin, !strconcat(asm, "db${p}\t$Rn, $regs"), []> { bits<4> Rn; bits<16> regs; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b00; let Inst{24-23} = 0b10; // Decrement Before let Inst{22} = 0; let Inst{21} = 0; // No writeback let Inst{20} = L_bit; let Inst{19-16} = Rn; let Inst{15-0} = regs; } def DB_UPD : T2XIt<(outs GPR:$wb), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), itin_upd, !strconcat(asm, "db${p}\t$Rn!, $regs"), "$Rn = $wb", []> { bits<4> Rn; bits<16> regs; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b00; let Inst{24-23} = 0b10; // Decrement Before let Inst{22} = 0; let Inst{21} = 1; // Writeback let Inst{20} = L_bit; let Inst{19-16} = Rn; let Inst{15-0} = regs; } } let hasSideEffects = 0 in { let mayLoad = 1, hasExtraDefRegAllocReq = 1 in defm t2LDM : thumb2_ld_mult<"ldm", IIC_iLoad_m, IIC_iLoad_mu, 1>; multiclass thumb2_st_mult { def IA : T2XI<(outs), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), itin, !strconcat(asm, "${p}.w\t$Rn, $regs"), []> { bits<4> Rn; bits<16> regs; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b00; let Inst{24-23} = 0b01; // Increment After let Inst{22} = 0; let Inst{21} = 0; // No writeback let Inst{20} = L_bit; let Inst{19-16} = Rn; let Inst{15} = 0; let Inst{14} = regs{14}; let Inst{13} = 0; let Inst{12-0} = regs{12-0}; } def IA_UPD : T2XIt<(outs GPR:$wb), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), itin_upd, !strconcat(asm, "${p}.w\t$Rn!, $regs"), "$Rn = $wb", []> { bits<4> Rn; bits<16> regs; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b00; let Inst{24-23} = 0b01; // Increment After let Inst{22} = 0; let Inst{21} = 1; // Writeback let Inst{20} = L_bit; let Inst{19-16} = Rn; let Inst{15} = 0; let Inst{14} = regs{14}; let Inst{13} = 0; let Inst{12-0} = regs{12-0}; } def DB : T2XI<(outs), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), itin, !strconcat(asm, "db${p}\t$Rn, $regs"), []> { bits<4> Rn; bits<16> regs; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b00; let Inst{24-23} = 0b10; // Decrement Before let Inst{22} = 0; let Inst{21} = 0; // No writeback let Inst{20} = L_bit; let Inst{19-16} = Rn; let Inst{15} = 0; let Inst{14} = regs{14}; let Inst{13} = 0; let Inst{12-0} = regs{12-0}; } def DB_UPD : T2XIt<(outs GPR:$wb), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), itin_upd, !strconcat(asm, "db${p}\t$Rn!, $regs"), "$Rn = $wb", []> { bits<4> Rn; bits<16> regs; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b00; let Inst{24-23} = 0b10; // Decrement Before let Inst{22} = 0; let Inst{21} = 1; // Writeback let Inst{20} = L_bit; let Inst{19-16} = Rn; let Inst{15} = 0; let Inst{14} = regs{14}; let Inst{13} = 0; let Inst{12-0} = regs{12-0}; } } let mayStore = 1, 
hasExtraSrcRegAllocReq = 1 in defm t2STM : thumb2_st_mult<"stm", IIC_iStore_m, IIC_iStore_mu, 0>; } // hasSideEffects //===----------------------------------------------------------------------===// // Move Instructions. // let hasSideEffects = 0 in def t2MOVr : T2sTwoReg<(outs GPRnopc:$Rd), (ins GPRnopc:$Rm), IIC_iMOVr, "mov", ".w\t$Rd, $Rm", []>, Sched<[WriteALU]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = 0b0010; let Inst{19-16} = 0b1111; // Rn let Inst{14-12} = 0b000; let Inst{7-4} = 0b0000; } def : t2InstAlias<"mov${p}.w $Rd, $Rm", (t2MOVr GPRnopc:$Rd, GPRnopc:$Rm, pred:$p, zero_reg)>; def : t2InstAlias<"movs${p}.w $Rd, $Rm", (t2MOVr GPRnopc:$Rd, GPRnopc:$Rm, pred:$p, CPSR)>; def : t2InstAlias<"movs${p} $Rd, $Rm", (t2MOVr GPRnopc:$Rd, GPRnopc:$Rm, pred:$p, CPSR)>; // AddedComplexity to ensure isel tries t2MOVi before t2MOVi16. let isReMaterializable = 1, isAsCheapAsAMove = 1, isMoveImm = 1, AddedComplexity = 1 in def t2MOVi : T2sOneRegImm<(outs rGPR:$Rd), (ins t2_so_imm:$imm), IIC_iMOVi, "mov", ".w\t$Rd, $imm", [(set rGPR:$Rd, t2_so_imm:$imm)]>, Sched<[WriteALU]> { let Inst{31-27} = 0b11110; let Inst{25} = 0; let Inst{24-21} = 0b0010; let Inst{19-16} = 0b1111; // Rn let Inst{15} = 0; } // cc_out is handled as part of the explicit mnemonic in the parser for 'mov'. // Use aliases to get that to play nice here. def : t2InstAlias<"movs${p}.w $Rd, $imm", (t2MOVi rGPR:$Rd, t2_so_imm:$imm, pred:$p, CPSR)>; def : t2InstAlias<"movs${p} $Rd, $imm", (t2MOVi rGPR:$Rd, t2_so_imm:$imm, pred:$p, CPSR)>; def : t2InstAlias<"mov${p}.w $Rd, $imm", (t2MOVi rGPR:$Rd, t2_so_imm:$imm, pred:$p, zero_reg)>; def : t2InstAlias<"mov${p} $Rd, $imm", (t2MOVi rGPR:$Rd, t2_so_imm:$imm, pred:$p, zero_reg)>; let isReMaterializable = 1, isAsCheapAsAMove = 1, isMoveImm = 1 in def t2MOVi16 : T2I<(outs rGPR:$Rd), (ins imm0_65535_expr:$imm), IIC_iMOVi, "movw", "\t$Rd, $imm", [(set rGPR:$Rd, imm0_65535:$imm)]>, Sched<[WriteALU]>, Requires<[IsThumb, HasV8MBaseline]> { let Inst{31-27} = 0b11110; let Inst{25} = 1; let Inst{24-21} = 0b0010; let Inst{20} = 0; // The S bit. let Inst{15} = 0; bits<4> Rd; bits<16> imm; let Inst{11-8} = Rd; let Inst{19-16} = imm{15-12}; let Inst{26} = imm{11}; let Inst{14-12} = imm{10-8}; let Inst{7-0} = imm{7-0}; let DecoderMethod = "DecodeT2MOVTWInstruction"; } def : InstAlias<"mov${p} $Rd, $imm", (t2MOVi16 rGPR:$Rd, imm256_65535_expr:$imm, pred:$p), 0>, Requires<[IsThumb, HasV8MBaseline]>, Sched<[WriteALU]>; def t2MOVi16_ga_pcrel : PseudoInst<(outs rGPR:$Rd), (ins i32imm:$addr, pclabel:$id), IIC_iMOVi, []>, Sched<[WriteALU]>; let Constraints = "$src = $Rd" in { def t2MOVTi16 : T2I<(outs rGPR:$Rd), (ins rGPR:$src, imm0_65535_expr:$imm), IIC_iMOVi, "movt", "\t$Rd, $imm", [(set rGPR:$Rd, (or (and rGPR:$src, 0xffff), lo16AllZero:$imm))]>, Sched<[WriteALU]>, Requires<[IsThumb, HasV8MBaseline]> { let Inst{31-27} = 0b11110; let Inst{25} = 1; let Inst{24-21} = 0b0110; let Inst{20} = 0; // The S bit. let Inst{15} = 0; bits<4> Rd; bits<16> imm; let Inst{11-8} = Rd; let Inst{19-16} = imm{15-12}; let Inst{26} = imm{11}; let Inst{14-12} = imm{10-8}; let Inst{7-0} = imm{7-0}; let DecoderMethod = "DecodeT2MOVTWInstruction"; } def t2MOVTi16_ga_pcrel : PseudoInst<(outs rGPR:$Rd), (ins rGPR:$src, i32imm:$addr, pclabel:$id), IIC_iMOVi, []>, Sched<[WriteALU]>, Requires<[IsThumb, HasV8MBaseline]>; } // Constraints def : T2Pat<(or rGPR:$src, 0xffff0000), (t2MOVTi16 rGPR:$src, 0xffff)>; //===----------------------------------------------------------------------===// // Extend Instructions. 
// // Sign extenders def t2SXTB : T2I_ext_rrot<0b100, "sxtb">; def t2SXTH : T2I_ext_rrot<0b000, "sxth">; def t2SXTB16 : T2I_ext_rrot_xtb16<0b010, "sxtb16">; def t2SXTAB : T2I_exta_rrot<0b100, "sxtab">; def t2SXTAH : T2I_exta_rrot<0b000, "sxtah">; def t2SXTAB16 : T2I_exta_rrot<0b010, "sxtab16">; def : T2Pat<(sext_inreg (rotr rGPR:$Rn, rot_imm:$rot), i8), (t2SXTB rGPR:$Rn, rot_imm:$rot)>; def : T2Pat<(sext_inreg (rotr rGPR:$Rn, rot_imm:$rot), i16), (t2SXTH rGPR:$Rn, rot_imm:$rot)>; def : Thumb2DSPPat<(add rGPR:$Rn, (sext_inreg (rotr rGPR:$Rm, rot_imm:$rot), i8)), (t2SXTAB rGPR:$Rn, rGPR:$Rm, rot_imm:$rot)>; def : Thumb2DSPPat<(add rGPR:$Rn, (sext_inreg (rotr rGPR:$Rm, rot_imm:$rot), i16)), (t2SXTAH rGPR:$Rn, rGPR:$Rm, rot_imm:$rot)>; def : Thumb2DSPPat<(int_arm_sxtb16 rGPR:$Rn), (t2SXTB16 rGPR:$Rn, 0)>; def : Thumb2DSPPat<(int_arm_sxtab16 rGPR:$Rn, rGPR:$Rm), (t2SXTAB16 rGPR:$Rn, rGPR:$Rm, 0)>; // A simple right-shift can also be used in most cases (the exception is the // SXTH operations with a rotate of 24: there the non-contiguous bits are // relevant). def : Thumb2DSPPat<(add rGPR:$Rn, (sext_inreg (srl rGPR:$Rm, rot_imm:$rot), i8)), (t2SXTAB rGPR:$Rn, rGPR:$Rm, rot_imm:$rot)>; def : Thumb2DSPPat<(add rGPR:$Rn, (sext_inreg (srl rGPR:$Rm, imm8_or_16:$rot), i16)), (t2SXTAH rGPR:$Rn, rGPR:$Rm, rot_imm:$rot)>; def : Thumb2DSPPat<(add rGPR:$Rn, (sext_inreg (rotr rGPR:$Rm, (i32 24)), i16)), (t2SXTAH rGPR:$Rn, rGPR:$Rm, (i32 3))>; def : Thumb2DSPPat<(add rGPR:$Rn, (sext_inreg (or (srl rGPR:$Rm, (i32 24)), (shl rGPR:$Rm, (i32 8))), i16)), (t2SXTAH rGPR:$Rn, rGPR:$Rm, (i32 3))>; // Zero extenders let AddedComplexity = 16 in { def t2UXTB : T2I_ext_rrot<0b101, "uxtb">; def t2UXTH : T2I_ext_rrot<0b001, "uxth">; def t2UXTB16 : T2I_ext_rrot_xtb16<0b011, "uxtb16">; def : Thumb2DSPPat<(and (rotr rGPR:$Rm, rot_imm:$rot), 0x000000FF), (t2UXTB rGPR:$Rm, rot_imm:$rot)>; def : Thumb2DSPPat<(and (rotr rGPR:$Rm, rot_imm:$rot), 0x0000FFFF), (t2UXTH rGPR:$Rm, rot_imm:$rot)>; def : Thumb2DSPPat<(and (rotr rGPR:$Rm, rot_imm:$rot), 0x00FF00FF), (t2UXTB16 rGPR:$Rm, rot_imm:$rot)>; def : Thumb2DSPPat<(int_arm_uxtb16 rGPR:$Rm), (t2UXTB16 rGPR:$Rm, 0)>; // FIXME: This pattern incorrectly assumes the shl operator is a rotate. // The transformation should probably be done as a combiner action // instead so we can include a check for masking back in the upper // eight bits of the source into the lower eight bits of the result. //def : T2Pat<(and (shl rGPR:$Src, (i32 8)), 0xFF00FF), // (t2UXTB16 rGPR:$Src, 3)>, // Requires<[HasDSP, IsThumb2]>; def : T2Pat<(and (srl rGPR:$Src, (i32 8)), 0xFF00FF), (t2UXTB16 rGPR:$Src, 1)>, Requires<[HasDSP, IsThumb2]>; def t2UXTAB : T2I_exta_rrot<0b101, "uxtab">; def t2UXTAH : T2I_exta_rrot<0b001, "uxtah">; def t2UXTAB16 : T2I_exta_rrot<0b011, "uxtab16">; def : Thumb2DSPPat<(add rGPR:$Rn, (and (rotr rGPR:$Rm, rot_imm:$rot), 0x00FF)), (t2UXTAB rGPR:$Rn, rGPR:$Rm, rot_imm:$rot)>; def : Thumb2DSPPat<(add rGPR:$Rn, (and (rotr rGPR:$Rm, rot_imm:$rot), 0xFFFF)), (t2UXTAH rGPR:$Rn, rGPR:$Rm, rot_imm:$rot)>; def : Thumb2DSPPat<(add rGPR:$Rn, (and (srl rGPR:$Rm, rot_imm:$rot), 0xFF)), (t2UXTAB rGPR:$Rn, rGPR:$Rm, rot_imm:$rot)>; def : Thumb2DSPPat<(add rGPR:$Rn, (and (srl rGPR:$Rm, imm8_or_16:$rot), 0xFFFF)), (t2UXTAH rGPR:$Rn, rGPR:$Rm, rot_imm:$rot)>; def : Thumb2DSPPat<(int_arm_uxtab16 rGPR:$Rn, rGPR:$Rm), (t2UXTAB16 rGPR:$Rn, rGPR:$Rm, 0)>; } //===----------------------------------------------------------------------===// // Arithmetic Instructions. 
// let isAdd = 1 in defm t2ADD : T2I_bin_ii12rs<0b000, "add", add, 1>; defm t2SUB : T2I_bin_ii12rs<0b101, "sub", sub>; // ADD and SUB with 's' bit set. No 12-bit immediate (T4) variants. // // Currently, t2ADDS/t2SUBS are pseudo opcodes that exist only in the // selection DAG. They are "lowered" to real t2ADD/t2SUB opcodes by // AdjustInstrPostInstrSelection where we determine whether or not to // set the "s" bit based on CPSR liveness. // // FIXME: Eliminate t2ADDS/t2SUBS pseudo opcodes after adding tablegen // support for an optional CPSR definition that corresponds to the DAG // node's second value. We can then eliminate the implicit def of CPSR. defm t2ADDS : T2I_bin_s_irs ; defm t2SUBS : T2I_bin_s_irs ; let hasPostISelHook = 1 in { defm t2ADC : T2I_adde_sube_irs<0b1010, "adc", ARMadde, 1>; defm t2SBC : T2I_adde_sube_irs<0b1011, "sbc", ARMsube>; } def : t2InstSubst<"adc${s}${p} $rd, $rn, $imm", (t2SBCri rGPR:$rd, rGPR:$rn, t2_so_imm_not:$imm, pred:$p, s_cc_out:$s)>; def : t2InstSubst<"sbc${s}${p} $rd, $rn, $imm", (t2ADCri rGPR:$rd, rGPR:$rn, t2_so_imm_not:$imm, pred:$p, s_cc_out:$s)>; def : t2InstSubst<"add${s}${p}.w $rd, $rn, $imm", (t2SUBri GPRnopc:$rd, GPRnopc:$rn, t2_so_imm_neg:$imm, pred:$p, s_cc_out:$s)>; def : t2InstSubst<"addw${p} $rd, $rn, $imm", (t2SUBri12 GPRnopc:$rd, GPR:$rn, t2_so_imm_neg:$imm, pred:$p)>; def : t2InstSubst<"sub${s}${p}.w $rd, $rn, $imm", (t2ADDri GPRnopc:$rd, GPRnopc:$rn, t2_so_imm_neg:$imm, pred:$p, s_cc_out:$s)>; def : t2InstSubst<"subw${p} $rd, $rn, $imm", (t2ADDri12 GPRnopc:$rd, GPR:$rn, t2_so_imm_neg:$imm, pred:$p)>; def : t2InstSubst<"subw${p} $Rd, $Rn, $imm", (t2ADDri12 GPRnopc:$Rd, GPR:$Rn, imm0_4095_neg:$imm, pred:$p)>; def : t2InstSubst<"sub${s}${p} $rd, $rn, $imm", (t2ADDri GPRnopc:$rd, GPRnopc:$rn, t2_so_imm_neg:$imm, pred:$p, s_cc_out:$s)>; def : t2InstSubst<"sub${p} $rd, $rn, $imm", (t2ADDri12 GPRnopc:$rd, GPR:$rn, t2_so_imm_neg:$imm, pred:$p)>; // RSB defm t2RSB : T2I_rbin_irs <0b1110, "rsb", sub>; // FIXME: Eliminate them if we can write def : Pat patterns which defines // CPSR and the implicit def of CPSR is not needed. defm t2RSBS : T2I_rbin_s_is ; // (sub X, imm) gets canonicalized to (add X, -imm). Match this form. // The assume-no-carry-in form uses the negation of the input since add/sub // assume opposite meanings of the carry flag (i.e., carry == !borrow). // See the definition of AddWithCarry() in the ARM ARM A2.2.1 for the gory // details. // The AddedComplexity preferences the first variant over the others since // it can be shrunk to a 16-bit wide encoding, while the others cannot. let AddedComplexity = 1 in def : T2Pat<(add GPR:$src, imm1_255_neg:$imm), (t2SUBri GPR:$src, imm1_255_neg:$imm)>; def : T2Pat<(add GPR:$src, t2_so_imm_neg:$imm), (t2SUBri GPR:$src, t2_so_imm_neg:$imm)>; def : T2Pat<(add GPR:$src, imm0_4095_neg:$imm), (t2SUBri12 GPR:$src, imm0_4095_neg:$imm)>; def : T2Pat<(add GPR:$src, imm0_65535_neg:$imm), (t2SUBrr GPR:$src, (t2MOVi16 (imm_neg_XFORM imm:$imm)))>; let AddedComplexity = 1 in def : T2Pat<(ARMaddc rGPR:$src, imm1_255_neg:$imm), (t2SUBSri rGPR:$src, imm1_255_neg:$imm)>; def : T2Pat<(ARMaddc rGPR:$src, t2_so_imm_neg:$imm), (t2SUBSri rGPR:$src, t2_so_imm_neg:$imm)>; def : T2Pat<(ARMaddc rGPR:$src, imm0_65535_neg:$imm), (t2SUBSrr rGPR:$src, (t2MOVi16 (imm_neg_XFORM imm:$imm)))>; // The with-carry-in form matches bitwise not instead of the negation. // Effectively, the inverse interpretation of the carry flag already accounts // for part of the negation. 
let AddedComplexity = 1 in def : T2Pat<(ARMadde rGPR:$src, imm0_255_not:$imm, CPSR), (t2SBCri rGPR:$src, imm0_255_not:$imm)>; def : T2Pat<(ARMadde rGPR:$src, t2_so_imm_not:$imm, CPSR), (t2SBCri rGPR:$src, t2_so_imm_not:$imm)>; def : T2Pat<(ARMadde rGPR:$src, imm0_65535_neg:$imm, CPSR), (t2SBCrr rGPR:$src, (t2MOVi16 (imm_not_XFORM imm:$imm)))>; def t2SEL : T2ThreeReg<(outs GPR:$Rd), (ins GPR:$Rn, GPR:$Rm), NoItinerary, "sel", "\t$Rd, $Rn, $Rm", [(set GPR:$Rd, (int_arm_sel GPR:$Rn, GPR:$Rm))]>, Requires<[IsThumb2, HasDSP]> { let Inst{31-27} = 0b11111; let Inst{26-24} = 0b010; let Inst{23} = 0b1; let Inst{22-20} = 0b010; let Inst{15-12} = 0b1111; let Inst{7} = 0b1; let Inst{6-4} = 0b000; } // A6.3.13, A6.3.14, A6.3.15 Parallel addition and subtraction (signed/unsigned) // And Miscellaneous operations -- for disassembly only class T2I_pam op22_20, bits<4> op7_4, string opc, list pat, dag iops, string asm> : T2I<(outs rGPR:$Rd), iops, NoItinerary, opc, asm, pat>, Requires<[IsThumb2, HasDSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0101; let Inst{22-20} = op22_20; let Inst{15-12} = 0b1111; let Inst{7-4} = op7_4; bits<4> Rd; bits<4> Rn; bits<4> Rm; let Inst{11-8} = Rd; let Inst{19-16} = Rn; let Inst{3-0} = Rm; } class T2I_pam_intrinsics op22_20, bits<4> op7_4, string opc, Intrinsic intrinsic> : T2I_pam; class T2I_pam_intrinsics_rev op22_20, bits<4> op7_4, string opc> : T2I_pam; // Saturating add/subtract def t2QADD16 : T2I_pam_intrinsics<0b001, 0b0001, "qadd16", int_arm_qadd16>; def t2QADD8 : T2I_pam_intrinsics<0b000, 0b0001, "qadd8", int_arm_qadd8>; def t2QASX : T2I_pam_intrinsics<0b010, 0b0001, "qasx", int_arm_qasx>; def t2UQSUB8 : T2I_pam_intrinsics<0b100, 0b0101, "uqsub8", int_arm_uqsub8>; def t2QSAX : T2I_pam_intrinsics<0b110, 0b0001, "qsax", int_arm_qsax>; def t2QSUB16 : T2I_pam_intrinsics<0b101, 0b0001, "qsub16", int_arm_qsub16>; def t2QSUB8 : T2I_pam_intrinsics<0b100, 0b0001, "qsub8", int_arm_qsub8>; def t2UQADD16 : T2I_pam_intrinsics<0b001, 0b0101, "uqadd16", int_arm_uqadd16>; def t2UQADD8 : T2I_pam_intrinsics<0b000, 0b0101, "uqadd8", int_arm_uqadd8>; def t2UQASX : T2I_pam_intrinsics<0b010, 0b0101, "uqasx", int_arm_uqasx>; def t2UQSAX : T2I_pam_intrinsics<0b110, 0b0101, "uqsax", int_arm_uqsax>; def t2UQSUB16 : T2I_pam_intrinsics<0b101, 0b0101, "uqsub16", int_arm_uqsub16>; def t2QADD : T2I_pam_intrinsics_rev<0b000, 0b1000, "qadd">; def t2QSUB : T2I_pam_intrinsics_rev<0b000, 0b1010, "qsub">; def t2QDADD : T2I_pam_intrinsics_rev<0b000, 0b1001, "qdadd">; def t2QDSUB : T2I_pam_intrinsics_rev<0b000, 0b1011, "qdsub">; def : Thumb2DSPPat<(int_arm_qadd rGPR:$Rm, rGPR:$Rn), (t2QADD rGPR:$Rm, rGPR:$Rn)>; def : Thumb2DSPPat<(int_arm_qsub rGPR:$Rm, rGPR:$Rn), (t2QSUB rGPR:$Rm, rGPR:$Rn)>; def : Thumb2DSPPat<(int_arm_qadd(int_arm_qadd rGPR:$Rm, rGPR:$Rm), rGPR:$Rn), (t2QDADD rGPR:$Rm, rGPR:$Rn)>; def : Thumb2DSPPat<(int_arm_qsub rGPR:$Rm, (int_arm_qadd rGPR:$Rn, rGPR:$Rn)), (t2QDSUB rGPR:$Rm, rGPR:$Rn)>; // Signed/Unsigned add/subtract def t2SASX : T2I_pam_intrinsics<0b010, 0b0000, "sasx", int_arm_sasx>; def t2SADD16 : T2I_pam_intrinsics<0b001, 0b0000, "sadd16", int_arm_sadd16>; def t2SADD8 : T2I_pam_intrinsics<0b000, 0b0000, "sadd8", int_arm_sadd8>; def t2SSAX : T2I_pam_intrinsics<0b110, 0b0000, "ssax", int_arm_ssax>; def t2SSUB16 : T2I_pam_intrinsics<0b101, 0b0000, "ssub16", int_arm_ssub16>; def t2SSUB8 : T2I_pam_intrinsics<0b100, 0b0000, "ssub8", int_arm_ssub8>; def t2UASX : T2I_pam_intrinsics<0b010, 0b0100, "uasx", int_arm_uasx>; def t2UADD16 : T2I_pam_intrinsics<0b001, 0b0100, "uadd16", 
int_arm_uadd16>; def t2UADD8 : T2I_pam_intrinsics<0b000, 0b0100, "uadd8", int_arm_uadd8>; def t2USAX : T2I_pam_intrinsics<0b110, 0b0100, "usax", int_arm_usax>; def t2USUB16 : T2I_pam_intrinsics<0b101, 0b0100, "usub16", int_arm_usub16>; def t2USUB8 : T2I_pam_intrinsics<0b100, 0b0100, "usub8", int_arm_usub8>; // Signed/Unsigned halving add/subtract def t2SHASX : T2I_pam_intrinsics<0b010, 0b0010, "shasx", int_arm_shasx>; def t2SHADD16 : T2I_pam_intrinsics<0b001, 0b0010, "shadd16", int_arm_shadd16>; def t2SHADD8 : T2I_pam_intrinsics<0b000, 0b0010, "shadd8", int_arm_shadd8>; def t2SHSAX : T2I_pam_intrinsics<0b110, 0b0010, "shsax", int_arm_shsax>; def t2SHSUB16 : T2I_pam_intrinsics<0b101, 0b0010, "shsub16", int_arm_shsub16>; def t2SHSUB8 : T2I_pam_intrinsics<0b100, 0b0010, "shsub8", int_arm_shsub8>; def t2UHASX : T2I_pam_intrinsics<0b010, 0b0110, "uhasx", int_arm_uhasx>; def t2UHADD16 : T2I_pam_intrinsics<0b001, 0b0110, "uhadd16", int_arm_uhadd16>; def t2UHADD8 : T2I_pam_intrinsics<0b000, 0b0110, "uhadd8", int_arm_uhadd8>; def t2UHSAX : T2I_pam_intrinsics<0b110, 0b0110, "uhsax", int_arm_uhsax>; def t2UHSUB16 : T2I_pam_intrinsics<0b101, 0b0110, "uhsub16", int_arm_uhsub16>; def t2UHSUB8 : T2I_pam_intrinsics<0b100, 0b0110, "uhsub8", int_arm_uhsub8>; // Helper class for disassembly only // A6.3.16 & A6.3.17 // T2Imac - Thumb2 multiply [accumulate, and absolute difference] instructions. class T2ThreeReg_mac op22_20, bits<4> op7_4, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : T2ThreeReg { let Inst{31-27} = 0b11111; let Inst{26-24} = 0b011; let Inst{23} = long; let Inst{22-20} = op22_20; let Inst{7-4} = op7_4; } class T2FourReg_mac op22_20, bits<4> op7_4, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : T2FourReg { let Inst{31-27} = 0b11111; let Inst{26-24} = 0b011; let Inst{23} = long; let Inst{22-20} = op22_20; let Inst{7-4} = op7_4; } // Unsigned Sum of Absolute Differences [and Accumulate]. def t2USAD8 : T2ThreeReg_mac<0, 0b111, 0b0000, (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), NoItinerary, "usad8", "\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (int_arm_usad8 rGPR:$Rn, rGPR:$Rm))]>, Requires<[IsThumb2, HasDSP]> { let Inst{15-12} = 0b1111; } def t2USADA8 : T2FourReg_mac<0, 0b111, 0b0000, (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), NoItinerary, "usada8", "\t$Rd, $Rn, $Rm, $Ra", [(set rGPR:$Rd, (int_arm_usada8 rGPR:$Rn, rGPR:$Rm, rGPR:$Ra))]>, Requires<[IsThumb2, HasDSP]>; // Signed/Unsigned saturate. 
let hasSideEffects = 1 in class T2SatI : T2I<(outs rGPR:$Rd), iops, NoItinerary, opc, asm, []> { bits<4> Rd; bits<4> Rn; bits<5> sat_imm; bits<6> sh; let Inst{31-24} = 0b11110011; let Inst{21} = sh{5}; let Inst{20} = 0; let Inst{19-16} = Rn; let Inst{15} = 0; let Inst{14-12} = sh{4-2}; let Inst{11-8} = Rd; let Inst{7-6} = sh{1-0}; let Inst{5} = 0; let Inst{4-0} = sat_imm; } def t2SSAT: T2SatI<(ins imm1_32:$sat_imm, rGPR:$Rn, t2_shift_imm:$sh), "ssat", "\t$Rd, $sat_imm, $Rn$sh">, Requires<[IsThumb2]> { let Inst{23-22} = 0b00; let Inst{5} = 0; } def t2SSAT16: T2SatI<(ins imm1_16:$sat_imm, rGPR:$Rn), "ssat16", "\t$Rd, $sat_imm, $Rn">, Requires<[IsThumb2, HasDSP]> { let Inst{23-22} = 0b00; let sh = 0b100000; let Inst{4} = 0; } def t2USAT: T2SatI<(ins imm0_31:$sat_imm, rGPR:$Rn, t2_shift_imm:$sh), "usat", "\t$Rd, $sat_imm, $Rn$sh">, Requires<[IsThumb2]> { let Inst{23-22} = 0b10; } def t2USAT16: T2SatI<(ins imm0_15:$sat_imm, rGPR:$Rn), "usat16", "\t$Rd, $sat_imm, $Rn">, Requires<[IsThumb2, HasDSP]> { let Inst{23-22} = 0b10; let sh = 0b100000; let Inst{4} = 0; } def : T2Pat<(ARMssatnoshift GPRnopc:$Rn, imm0_31:$imm), (t2SSAT imm0_31:$imm, GPRnopc:$Rn, 0)>; def : T2Pat<(ARMusatnoshift GPRnopc:$Rn, imm0_31:$imm), (t2USAT imm0_31:$imm, GPRnopc:$Rn, 0)>; def : T2Pat<(int_arm_ssat GPR:$a, imm1_32:$pos), (t2SSAT imm1_32:$pos, GPR:$a, 0)>; def : T2Pat<(int_arm_usat GPR:$a, imm0_31:$pos), (t2USAT imm0_31:$pos, GPR:$a, 0)>; def : T2Pat<(int_arm_ssat16 GPR:$a, imm1_16:$pos), (t2SSAT16 imm1_16:$pos, GPR:$a)>; def : T2Pat<(int_arm_usat16 GPR:$a, imm0_15:$pos), (t2USAT16 imm0_15:$pos, GPR:$a)>; //===----------------------------------------------------------------------===// // Shift and rotate Instructions. // defm t2LSL : T2I_sh_ir<0b00, "lsl", imm1_31, shl>; defm t2LSR : T2I_sh_ir<0b01, "lsr", imm_sr, srl>; defm t2ASR : T2I_sh_ir<0b10, "asr", imm_sr, sra>; defm t2ROR : T2I_sh_ir<0b11, "ror", imm0_31, rotr>; // LSL #0 is actually MOV, and has slightly different permitted registers to // LSL with non-zero shift def : t2InstAlias<"lsl${s}${p} $Rd, $Rm, #0", (t2MOVr GPRnopc:$Rd, GPRnopc:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias<"lsl${s}${p}.w $Rd, $Rm, #0", (t2MOVr GPRnopc:$Rd, GPRnopc:$Rm, pred:$p, cc_out:$s)>; // (rotr x, (and y, 0x...1f)) ==> (ROR x, y) def : T2Pat<(rotr rGPR:$lhs, (and rGPR:$rhs, lo5AllOne)), (t2RORrr rGPR:$lhs, rGPR:$rhs)>; let Uses = [CPSR] in { def t2RRX : T2sTwoReg<(outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iMOVsi, "rrx", "\t$Rd, $Rm", [(set rGPR:$Rd, (ARMrrx rGPR:$Rm))]>, Sched<[WriteALU]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = 0b0010; let Inst{19-16} = 0b1111; // Rn let Inst{14-12} = 0b000; let Inst{7-4} = 0b0011; } } let isCodeGenOnly = 1, Defs = [CPSR] in { def t2MOVsrl_flag : T2TwoRegShiftImm< (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iMOVsi, "lsrs", ".w\t$Rd, $Rm, #1", [(set rGPR:$Rd, (ARMsrl_flag rGPR:$Rm))]>, Sched<[WriteALU]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = 0b0010; let Inst{20} = 1; // The S bit. let Inst{19-16} = 0b1111; // Rn let Inst{5-4} = 0b01; // Shift type. // Shift amount = Inst{14-12:7-6} = 1. let Inst{14-12} = 0b000; let Inst{7-6} = 0b01; } def t2MOVsra_flag : T2TwoRegShiftImm< (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iMOVsi, "asrs", ".w\t$Rd, $Rm, #1", [(set rGPR:$Rd, (ARMsra_flag rGPR:$Rm))]>, Sched<[WriteALU]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = 0b0010; let Inst{20} = 1; // The S bit. let Inst{19-16} = 0b1111; // Rn let Inst{5-4} = 0b10; // Shift type. 
// Shift amount = Inst{14-12:7-6} = 1. let Inst{14-12} = 0b000; let Inst{7-6} = 0b01; } } //===----------------------------------------------------------------------===// // Bitwise Instructions. // defm t2AND : T2I_bin_w_irs<0b0000, "and", IIC_iBITi, IIC_iBITr, IIC_iBITsi, and, 1>; defm t2ORR : T2I_bin_w_irs<0b0010, "orr", IIC_iBITi, IIC_iBITr, IIC_iBITsi, or, 1>; defm t2EOR : T2I_bin_w_irs<0b0100, "eor", IIC_iBITi, IIC_iBITr, IIC_iBITsi, xor, 1>; defm t2BIC : T2I_bin_w_irs<0b0001, "bic", IIC_iBITi, IIC_iBITr, IIC_iBITsi, BinOpFrag<(and node:$LHS, (not node:$RHS))>>; class T2BitFI pattern> : T2I { bits<4> Rd; bits<5> msb; bits<5> lsb; let Inst{11-8} = Rd; let Inst{4-0} = msb{4-0}; let Inst{14-12} = lsb{4-2}; let Inst{7-6} = lsb{1-0}; } class T2TwoRegBitFI pattern> : T2BitFI { bits<4> Rn; let Inst{19-16} = Rn; } let Constraints = "$src = $Rd" in def t2BFC : T2BitFI<(outs rGPR:$Rd), (ins rGPR:$src, bf_inv_mask_imm:$imm), IIC_iUNAsi, "bfc", "\t$Rd, $imm", [(set rGPR:$Rd, (and rGPR:$src, bf_inv_mask_imm:$imm))]> { let Inst{31-27} = 0b11110; let Inst{26} = 0; // should be 0. let Inst{25} = 1; let Inst{24-20} = 0b10110; let Inst{19-16} = 0b1111; // Rn let Inst{15} = 0; let Inst{5} = 0; // should be 0. bits<10> imm; let msb{4-0} = imm{9-5}; let lsb{4-0} = imm{4-0}; } def t2SBFX: T2TwoRegBitFI< (outs rGPR:$Rd), (ins rGPR:$Rn, imm0_31:$lsb, imm1_32:$msb), IIC_iUNAsi, "sbfx", "\t$Rd, $Rn, $lsb, $msb", []> { let Inst{31-27} = 0b11110; let Inst{25} = 1; let Inst{24-20} = 0b10100; let Inst{15} = 0; } def t2UBFX: T2TwoRegBitFI< (outs rGPR:$Rd), (ins rGPR:$Rn, imm0_31:$lsb, imm1_32:$msb), IIC_iUNAsi, "ubfx", "\t$Rd, $Rn, $lsb, $msb", []> { let Inst{31-27} = 0b11110; let Inst{25} = 1; let Inst{24-20} = 0b11100; let Inst{15} = 0; } // A8.8.247 UDF - Undefined (Encoding T2) def t2UDF : T2XI<(outs), (ins imm0_65535:$imm16), IIC_Br, "udf.w\t$imm16", [(int_arm_undefined imm0_65535:$imm16)]> { bits<16> imm16; let Inst{31-29} = 0b111; let Inst{28-27} = 0b10; let Inst{26-20} = 0b1111111; let Inst{19-16} = imm16{15-12}; let Inst{15} = 0b1; let Inst{14-12} = 0b010; let Inst{11-0} = imm16{11-0}; } // A8.6.18 BFI - Bitfield insert (Encoding T1) let Constraints = "$src = $Rd" in { def t2BFI : T2TwoRegBitFI<(outs rGPR:$Rd), (ins rGPR:$src, rGPR:$Rn, bf_inv_mask_imm:$imm), IIC_iBITi, "bfi", "\t$Rd, $Rn, $imm", [(set rGPR:$Rd, (ARMbfi rGPR:$src, rGPR:$Rn, bf_inv_mask_imm:$imm))]> { let Inst{31-27} = 0b11110; let Inst{26} = 0; // should be 0. let Inst{25} = 1; let Inst{24-20} = 0b10110; let Inst{15} = 0; let Inst{5} = 0; // should be 0. bits<10> imm; let msb{4-0} = imm{9-5}; let lsb{4-0} = imm{4-0}; } } defm t2ORN : T2I_bin_irs<0b0011, "orn", IIC_iBITi, IIC_iBITr, IIC_iBITsi, BinOpFrag<(or node:$LHS, (not node:$RHS))>, 0, "">; /// T2I_un_irs - Defines a set of (op reg, {so_imm|r|so_reg}) patterns for a /// unary operation that produces a value. These are predicable and can be /// changed to modify CPSR. 
multiclass T2I_un_irs<bits<4> opcod, string opc,
                      InstrItinClass iii, InstrItinClass iir,
                      InstrItinClass iis, PatFrag opnode,
                      bit Cheap = 0, bit ReMat = 0, bit MoveImm = 0> {
   // shifted imm
   def i : T2sOneRegImm<(outs rGPR:$Rd), (ins t2_so_imm:$imm), iii,
                opc, "\t$Rd, $imm",
                [(set rGPR:$Rd, (opnode t2_so_imm:$imm))]>, Sched<[WriteALU]> {
     let isAsCheapAsAMove = Cheap;
     let isReMaterializable = ReMat;
     let isMoveImm = MoveImm;
     let Inst{31-27} = 0b11110;
     let Inst{25} = 0;
     let Inst{24-21} = opcod;
     let Inst{19-16} = 0b1111; // Rn
     let Inst{15} = 0;
   }
   // register
   def r : T2sTwoReg<(outs rGPR:$Rd), (ins rGPR:$Rm), iir,
                opc, ".w\t$Rd, $Rm",
                [(set rGPR:$Rd, (opnode rGPR:$Rm))]>, Sched<[WriteALU]> {
     let Inst{31-27} = 0b11101;
     let Inst{26-25} = 0b01;
     let Inst{24-21} = opcod;
     let Inst{19-16} = 0b1111; // Rn
     let Inst{14-12} = 0b000; // imm3
     let Inst{7-6} = 0b00; // imm2
     let Inst{5-4} = 0b00; // type
   }
   // shifted register
   def s : T2sOneRegShiftedReg<(outs rGPR:$Rd), (ins t2_so_reg:$ShiftedRm), iis,
                opc, ".w\t$Rd, $ShiftedRm",
                [(set rGPR:$Rd, (opnode t2_so_reg:$ShiftedRm))]>,
                Sched<[WriteALU]> {
     let Inst{31-27} = 0b11101;
     let Inst{26-25} = 0b01;
     let Inst{24-21} = opcod;
     let Inst{19-16} = 0b1111; // Rn
   }
}

// Prefer this over t2EORri ra, rb, -1, because mvn has a 16-bit version.
let AddedComplexity = 1 in
defm t2MVN  : T2I_un_irs <0b0011, "mvn", IIC_iMVNi, IIC_iMVNr, IIC_iMVNsi,
                          not, 1, 1, 1>;

let AddedComplexity = 1 in
def : T2Pat<(and rGPR:$src, t2_so_imm_not:$imm),
            (t2BICri rGPR:$src, t2_so_imm_not:$imm)>;

// top16Zero - answer true if the upper 16 bits of $src are 0, false otherwise
def top16Zero: PatLeaf<(i32 rGPR:$src), [{
  return CurDAG->MaskedValueIsZero(SDValue(N,0), APInt::getHighBitsSet(32, 16));
  }]>;

// so_imm_notSext is needed instead of so_imm_not, as the value of imm
// will match the extended, not the original bitWidth for $src.
def : T2Pat<(and top16Zero:$src, t2_so_imm_notSext:$imm),
            (t2BICri rGPR:$src, t2_so_imm_notSext:$imm)>;

// FIXME: Disable this pattern on Darwin to work around an assembler bug.
def : T2Pat<(or rGPR:$src, t2_so_imm_not:$imm),
            (t2ORNri rGPR:$src, t2_so_imm_not:$imm)>,
            Requires<[IsThumb2]>;

def : T2Pat<(t2_so_imm_not:$src),
            (t2MVNi t2_so_imm_not:$src)>;

// There are shorter Thumb encodings for ADD than ORR, so to increase
// Thumb2SizeReduction's chances later on we select a t2ADD for an or where
// possible.
def : T2Pat<(or AddLikeOrOp:$Rn, t2_so_imm:$imm),
            (t2ADDri $Rn, t2_so_imm:$imm)>;

def : T2Pat<(or AddLikeOrOp:$Rn, imm0_4095:$Rm),
            (t2ADDri12 $Rn, imm0_4095:$Rm)>;

def : T2Pat<(or AddLikeOrOp:$Rn, non_imm32:$Rm),
            (t2ADDrr $Rn, $Rm)>;

//===----------------------------------------------------------------------===//
//  Multiply Instructions.
// let isCommutable = 1 in def t2MUL: T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMUL32, "mul", "\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (mul rGPR:$Rn, rGPR:$Rm))]>, Sched<[WriteMUL32, ReadMUL, ReadMUL]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b000; let Inst{15-12} = 0b1111; // Ra = 0b1111 (no accumulate) let Inst{7-4} = 0b0000; // Multiply } class T2FourRegMLA op7_4, string opc, list pattern> : T2FourReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC32, opc, "\t$Rd, $Rn, $Rm, $Ra", pattern>, Requires<[IsThumb2, UseMulOps]>, Sched<[WriteMAC32, ReadMUL, ReadMUL, ReadMAC]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b000; let Inst{7-4} = op7_4; } def t2MLA : T2FourRegMLA<0b0000, "mla", [(set rGPR:$Rd, (add (mul rGPR:$Rn, rGPR:$Rm), rGPR:$Ra))]>; def t2MLS: T2FourRegMLA<0b0001, "mls", [(set rGPR:$Rd, (sub rGPR:$Ra, (mul rGPR:$Rn, rGPR:$Rm)))]>; // Extra precision multiplies with low / high results let hasSideEffects = 0 in { let isCommutable = 1 in { def t2SMULL : T2MulLong<0b000, 0b0000, "smull", [(set rGPR:$RdLo, rGPR:$RdHi, (smullohi rGPR:$Rn, rGPR:$Rm))]>; def t2UMULL : T2MulLong<0b010, 0b0000, "umull", [(set rGPR:$RdLo, rGPR:$RdHi, (umullohi rGPR:$Rn, rGPR:$Rm))]>; } // isCommutable // Multiply + accumulate def t2SMLAL : T2MlaLong<0b100, 0b0000, "smlal">; def t2UMLAL : T2MlaLong<0b110, 0b0000, "umlal">; def t2UMAAL : T2MlaLong<0b110, 0b0110, "umaal">, Requires<[IsThumb2, HasDSP]>; } // hasSideEffects // Rounding variants of the below included for disassembly only // Most significant word multiply class T2SMMUL op7_4, string opc, list pattern> : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMUL32, opc, "\t$Rd, $Rn, $Rm", pattern>, Requires<[IsThumb2, HasDSP]>, Sched<[WriteMUL32, ReadMUL, ReadMUL]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b101; let Inst{15-12} = 0b1111; // Ra = 0b1111 (no accumulate) let Inst{7-4} = op7_4; } def t2SMMUL : T2SMMUL<0b0000, "smmul", [(set rGPR:$Rd, (mulhs rGPR:$Rn, rGPR:$Rm))]>; def t2SMMULR : T2SMMUL<0b0001, "smmulr", [(set rGPR:$Rd, (ARMsmmlar rGPR:$Rn, rGPR:$Rm, (i32 0)))]>; class T2FourRegSMMLA op22_20, bits<4> op7_4, string opc, list pattern> : T2FourReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC32, opc, "\t$Rd, $Rn, $Rm, $Ra", pattern>, Requires<[IsThumb2, HasDSP, UseMulOps]>, Sched<[WriteMAC32, ReadMUL, ReadMUL, ReadMAC]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = op22_20; let Inst{7-4} = op7_4; } def t2SMMLA : T2FourRegSMMLA<0b101, 0b0000, "smmla", [(set rGPR:$Rd, (add (mulhs rGPR:$Rm, rGPR:$Rn), rGPR:$Ra))]>; def t2SMMLAR: T2FourRegSMMLA<0b101, 0b0001, "smmlar", [(set rGPR:$Rd, (ARMsmmlar rGPR:$Rn, rGPR:$Rm, rGPR:$Ra))]>; def t2SMMLS: T2FourRegSMMLA<0b110, 0b0000, "smmls", []>; def t2SMMLSR: T2FourRegSMMLA<0b110, 0b0001, "smmlsr", [(set rGPR:$Rd, (ARMsmmlsr rGPR:$Rn, rGPR:$Rm, rGPR:$Ra))]>; class T2ThreeRegSMUL op22_20, bits<2> op5_4, string opc, list pattern> : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMUL16, opc, "\t$Rd, $Rn, $Rm", pattern>, Requires<[IsThumb2, HasDSP]>, Sched<[WriteMUL16, ReadMUL, ReadMUL]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = op22_20; let Inst{15-12} = 0b1111; // Ra = 0b1111 (no accumulate) let Inst{7-6} = 0b00; let Inst{5-4} = op5_4; } def t2SMULBB : T2ThreeRegSMUL<0b001, 0b00, "smulbb", [(set rGPR:$Rd, (mul (sext_inreg rGPR:$Rn, i16), (sext_inreg rGPR:$Rm, i16)))]>; def t2SMULBT : 
T2ThreeRegSMUL<0b001, 0b01, "smulbt", [(set rGPR:$Rd, (mul (sext_inreg rGPR:$Rn, i16), (sra rGPR:$Rm, (i32 16))))]>; def t2SMULTB : T2ThreeRegSMUL<0b001, 0b10, "smultb", [(set rGPR:$Rd, (mul (sra rGPR:$Rn, (i32 16)), (sext_inreg rGPR:$Rm, i16)))]>; def t2SMULTT : T2ThreeRegSMUL<0b001, 0b11, "smultt", [(set rGPR:$Rd, (mul (sra rGPR:$Rn, (i32 16)), (sra rGPR:$Rm, (i32 16))))]>; def t2SMULWB : T2ThreeRegSMUL<0b011, 0b00, "smulwb", [(set rGPR:$Rd, (ARMsmulwb rGPR:$Rn, rGPR:$Rm))]>; def t2SMULWT : T2ThreeRegSMUL<0b011, 0b01, "smulwt", [(set rGPR:$Rd, (ARMsmulwt rGPR:$Rn, rGPR:$Rm))]>; def : Thumb2DSPPat<(mul sext_16_node:$Rm, sext_16_node:$Rn), (t2SMULBB rGPR:$Rm, rGPR:$Rn)>; def : Thumb2DSPPat<(mul sext_16_node:$Rn, (sra rGPR:$Rm, (i32 16))), (t2SMULBT rGPR:$Rn, rGPR:$Rm)>; def : Thumb2DSPPat<(mul (sra rGPR:$Rn, (i32 16)), sext_16_node:$Rm), (t2SMULTB rGPR:$Rn, rGPR:$Rm)>; def : Thumb2DSPPat<(int_arm_smulbb rGPR:$Rn, rGPR:$Rm), (t2SMULBB rGPR:$Rn, rGPR:$Rm)>; def : Thumb2DSPPat<(int_arm_smulbt rGPR:$Rn, rGPR:$Rm), (t2SMULBT rGPR:$Rn, rGPR:$Rm)>; def : Thumb2DSPPat<(int_arm_smultb rGPR:$Rn, rGPR:$Rm), (t2SMULTB rGPR:$Rn, rGPR:$Rm)>; def : Thumb2DSPPat<(int_arm_smultt rGPR:$Rn, rGPR:$Rm), (t2SMULTT rGPR:$Rn, rGPR:$Rm)>; def : Thumb2DSPPat<(int_arm_smulwb rGPR:$Rn, rGPR:$Rm), (t2SMULWB rGPR:$Rn, rGPR:$Rm)>; def : Thumb2DSPPat<(int_arm_smulwt rGPR:$Rn, rGPR:$Rm), (t2SMULWT rGPR:$Rn, rGPR:$Rm)>; class T2FourRegSMLA op22_20, bits<2> op5_4, string opc, list pattern> : T2FourReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMUL16, opc, "\t$Rd, $Rn, $Rm, $Ra", pattern>, Requires<[IsThumb2, HasDSP, UseMulOps]>, Sched<[WriteMAC16, ReadMUL, ReadMUL, ReadMAC]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = op22_20; let Inst{7-6} = 0b00; let Inst{5-4} = op5_4; } def t2SMLABB : T2FourRegSMLA<0b001, 0b00, "smlabb", [(set rGPR:$Rd, (add rGPR:$Ra, (mul (sext_inreg rGPR:$Rn, i16), (sext_inreg rGPR:$Rm, i16))))]>; def t2SMLABT : T2FourRegSMLA<0b001, 0b01, "smlabt", [(set rGPR:$Rd, (add rGPR:$Ra, (mul (sext_inreg rGPR:$Rn, i16), (sra rGPR:$Rm, (i32 16)))))]>; def t2SMLATB : T2FourRegSMLA<0b001, 0b10, "smlatb", [(set rGPR:$Rd, (add rGPR:$Ra, (mul (sra rGPR:$Rn, (i32 16)), (sext_inreg rGPR:$Rm, i16))))]>; def t2SMLATT : T2FourRegSMLA<0b001, 0b11, "smlatt", [(set rGPR:$Rd, (add rGPR:$Ra, (mul (sra rGPR:$Rn, (i32 16)), (sra rGPR:$Rm, (i32 16)))))]>; def t2SMLAWB : T2FourRegSMLA<0b011, 0b00, "smlawb", [(set rGPR:$Rd, (add rGPR:$Ra, (ARMsmulwb rGPR:$Rn, rGPR:$Rm)))]>; def t2SMLAWT : T2FourRegSMLA<0b011, 0b01, "smlawt", [(set rGPR:$Rd, (add rGPR:$Ra, (ARMsmulwt rGPR:$Rn, rGPR:$Rm)))]>; def : Thumb2DSPMulPat<(add rGPR:$Ra, (mul sext_16_node:$Rn, sext_16_node:$Rm)), (t2SMLABB rGPR:$Rn, rGPR:$Rm, rGPR:$Ra)>; def : Thumb2DSPMulPat<(add rGPR:$Ra, (mul sext_16_node:$Rn, (sra rGPR:$Rm, (i32 16)))), (t2SMLABT rGPR:$Rn, rGPR:$Rm, rGPR:$Ra)>; def : Thumb2DSPMulPat<(add rGPR:$Ra, (mul (sra rGPR:$Rn, (i32 16)), sext_16_node:$Rm)), (t2SMLATB rGPR:$Rn, rGPR:$Rm, rGPR:$Ra)>; def : Thumb2DSPPat<(int_arm_smlabb GPR:$a, GPR:$b, GPR:$acc), (t2SMLABB GPR:$a, GPR:$b, GPR:$acc)>; def : Thumb2DSPPat<(int_arm_smlabt GPR:$a, GPR:$b, GPR:$acc), (t2SMLABT GPR:$a, GPR:$b, GPR:$acc)>; def : Thumb2DSPPat<(int_arm_smlatb GPR:$a, GPR:$b, GPR:$acc), (t2SMLATB GPR:$a, GPR:$b, GPR:$acc)>; def : Thumb2DSPPat<(int_arm_smlatt GPR:$a, GPR:$b, GPR:$acc), (t2SMLATT GPR:$a, GPR:$b, GPR:$acc)>; def : Thumb2DSPPat<(int_arm_smlawb GPR:$a, GPR:$b, GPR:$acc), (t2SMLAWB GPR:$a, GPR:$b, GPR:$acc)>; def : 
Thumb2DSPPat<(int_arm_smlawt GPR:$a, GPR:$b, GPR:$acc), (t2SMLAWT GPR:$a, GPR:$b, GPR:$acc)>; // Halfword multiple accumulate long: SMLAL def t2SMLALBB : T2MlaLong<0b100, 0b1000, "smlalbb">, Requires<[IsThumb2, HasDSP]>; def t2SMLALBT : T2MlaLong<0b100, 0b1001, "smlalbt">, Requires<[IsThumb2, HasDSP]>; def t2SMLALTB : T2MlaLong<0b100, 0b1010, "smlaltb">, Requires<[IsThumb2, HasDSP]>; def t2SMLALTT : T2MlaLong<0b100, 0b1011, "smlaltt">, Requires<[IsThumb2, HasDSP]>; def : Thumb2DSPPat<(ARMsmlalbb GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi), (t2SMLALBB $Rn, $Rm, $RLo, $RHi)>; def : Thumb2DSPPat<(ARMsmlalbt GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi), (t2SMLALBT $Rn, $Rm, $RLo, $RHi)>; def : Thumb2DSPPat<(ARMsmlaltb GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi), (t2SMLALTB $Rn, $Rm, $RLo, $RHi)>; def : Thumb2DSPPat<(ARMsmlaltt GPR:$Rn, GPR:$Rm, GPR:$RLo, GPR:$RHi), (t2SMLALTT $Rn, $Rm, $RLo, $RHi)>; class T2DualHalfMul op22_20, bits<4> op7_4, string opc, Intrinsic intrinsic> : T2ThreeReg_mac<0, op22_20, op7_4, (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMAC32, opc, "\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (intrinsic rGPR:$Rn, rGPR:$Rm))]>, Requires<[IsThumb2, HasDSP]>, Sched<[WriteMAC32, ReadMUL, ReadMUL, ReadMAC]> { let Inst{15-12} = 0b1111; } // Dual halfword multiple: SMUAD, SMUSD, SMLAD, SMLSD, SMLALD, SMLSLD def t2SMUAD: T2DualHalfMul<0b010, 0b0000, "smuad", int_arm_smuad>; def t2SMUADX: T2DualHalfMul<0b010, 0b0001, "smuadx", int_arm_smuadx>; def t2SMUSD: T2DualHalfMul<0b100, 0b0000, "smusd", int_arm_smusd>; def t2SMUSDX: T2DualHalfMul<0b100, 0b0001, "smusdx", int_arm_smusdx>; class T2DualHalfMulAdd op22_20, bits<4> op7_4, string opc, Intrinsic intrinsic> : T2FourReg_mac<0, op22_20, op7_4, (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC32, opc, "\t$Rd, $Rn, $Rm, $Ra", [(set rGPR:$Rd, (intrinsic rGPR:$Rn, rGPR:$Rm, rGPR:$Ra))]>, Requires<[IsThumb2, HasDSP]>; def t2SMLAD : T2DualHalfMulAdd<0b010, 0b0000, "smlad", int_arm_smlad>; def t2SMLADX : T2DualHalfMulAdd<0b010, 0b0001, "smladx", int_arm_smladx>; def t2SMLSD : T2DualHalfMulAdd<0b100, 0b0000, "smlsd", int_arm_smlsd>; def t2SMLSDX : T2DualHalfMulAdd<0b100, 0b0001, "smlsdx", int_arm_smlsdx>; class T2DualHalfMulAddLong op22_20, bits<4> op7_4, string opc> : T2FourReg_mac<1, op22_20, op7_4, (outs rGPR:$Ra, rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$RLo, rGPR:$RHi), IIC_iMAC64, opc, "\t$Ra, $Rd, $Rn, $Rm", []>, RegConstraint<"$Ra = $RLo, $Rd = $RHi">, Requires<[IsThumb2, HasDSP]>, Sched<[WriteMAC64Lo, WriteMAC64Hi, ReadMUL, ReadMUL, ReadMAC, ReadMAC]>; def t2SMLALD : T2DualHalfMulAddLong<0b100, 0b1100, "smlald">; def t2SMLALDX : T2DualHalfMulAddLong<0b100, 0b1101, "smlaldx">; def t2SMLSLD : T2DualHalfMulAddLong<0b101, 0b1100, "smlsld">; def t2SMLSLDX : T2DualHalfMulAddLong<0b101, 0b1101, "smlsldx">; def : Thumb2DSPPat<(ARMSmlald rGPR:$Rn, rGPR:$Rm, rGPR:$RLo, rGPR:$RHi), (t2SMLALD rGPR:$Rn, rGPR:$Rm, rGPR:$RLo, rGPR:$RHi)>; def : Thumb2DSPPat<(ARMSmlaldx rGPR:$Rn, rGPR:$Rm, rGPR:$RLo, rGPR:$RHi), (t2SMLALDX rGPR:$Rn, rGPR:$Rm, rGPR:$RLo, rGPR:$RHi)>; def : Thumb2DSPPat<(ARMSmlsld rGPR:$Rn, rGPR:$Rm, rGPR:$RLo, rGPR:$RHi), (t2SMLSLD rGPR:$Rn, rGPR:$Rm, rGPR:$RLo, rGPR:$RHi)>; def : Thumb2DSPPat<(ARMSmlsldx rGPR:$Rn, rGPR:$Rm, rGPR:$RLo, rGPR:$RHi), (t2SMLSLDX rGPR:$Rn, rGPR:$Rm, rGPR:$RLo, rGPR:$RHi)>; //===----------------------------------------------------------------------===// // Division Instructions. 
// Signed and unsigned division on v7-M // def t2SDIV : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iDIV, "sdiv", "\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (sdiv rGPR:$Rn, rGPR:$Rm))]>, Requires<[HasDivideInThumb, IsThumb, HasV8MBaseline]>, Sched<[WriteDIV]> { let Inst{31-27} = 0b11111; let Inst{26-21} = 0b011100; let Inst{20} = 0b1; let Inst{15-12} = 0b1111; let Inst{7-4} = 0b1111; } def t2UDIV : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iDIV, "udiv", "\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (udiv rGPR:$Rn, rGPR:$Rm))]>, Requires<[HasDivideInThumb, IsThumb, HasV8MBaseline]>, Sched<[WriteDIV]> { let Inst{31-27} = 0b11111; let Inst{26-21} = 0b011101; let Inst{20} = 0b1; let Inst{15-12} = 0b1111; let Inst{7-4} = 0b1111; } //===----------------------------------------------------------------------===// // Misc. Arithmetic Instructions. // class T2I_misc op1, bits<2> op2, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : T2ThreeReg { let Inst{31-27} = 0b11111; let Inst{26-22} = 0b01010; let Inst{21-20} = op1; let Inst{15-12} = 0b1111; let Inst{7-6} = 0b10; let Inst{5-4} = op2; let Rn{3-0} = Rm; } def t2CLZ : T2I_misc<0b11, 0b00, (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iUNAr, "clz", "\t$Rd, $Rm", [(set rGPR:$Rd, (ctlz rGPR:$Rm))]>, Sched<[WriteALU]>; def t2RBIT : T2I_misc<0b01, 0b10, (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iUNAr, "rbit", "\t$Rd, $Rm", [(set rGPR:$Rd, (bitreverse rGPR:$Rm))]>, Sched<[WriteALU]>; def t2REV : T2I_misc<0b01, 0b00, (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iUNAr, "rev", ".w\t$Rd, $Rm", [(set rGPR:$Rd, (bswap rGPR:$Rm))]>, Sched<[WriteALU]>; def t2REV16 : T2I_misc<0b01, 0b01, (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iUNAr, "rev16", ".w\t$Rd, $Rm", [(set rGPR:$Rd, (rotr (bswap rGPR:$Rm), (i32 16)))]>, Sched<[WriteALU]>; def t2REVSH : T2I_misc<0b01, 0b11, (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iUNAr, "revsh", ".w\t$Rd, $Rm", [(set rGPR:$Rd, (sra (bswap rGPR:$Rm), (i32 16)))]>, Sched<[WriteALU]>; def : T2Pat<(or (sra (shl rGPR:$Rm, (i32 24)), (i32 16)), (and (srl rGPR:$Rm, (i32 8)), 0xFF)), (t2REVSH rGPR:$Rm)>; def t2PKHBT : T2ThreeReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, pkh_lsl_amt:$sh), IIC_iBITsi, "pkhbt", "\t$Rd, $Rn, $Rm$sh", [(set rGPR:$Rd, (or (and rGPR:$Rn, 0xFFFF), (and (shl rGPR:$Rm, pkh_lsl_amt:$sh), 0xFFFF0000)))]>, Requires<[HasDSP, IsThumb2]>, Sched<[WriteALUsi, ReadALU]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-20} = 0b01100; let Inst{5} = 0; // BT form let Inst{4} = 0; bits<5> sh; let Inst{14-12} = sh{4-2}; let Inst{7-6} = sh{1-0}; } // Alternate cases for PKHBT where identities eliminate some nodes. def : T2Pat<(or (and rGPR:$src1, 0xFFFF), (and rGPR:$src2, 0xFFFF0000)), (t2PKHBT rGPR:$src1, rGPR:$src2, 0)>, Requires<[HasDSP, IsThumb2]>; def : T2Pat<(or (and rGPR:$src1, 0xFFFF), (shl rGPR:$src2, imm16_31:$sh)), (t2PKHBT rGPR:$src1, rGPR:$src2, imm16_31:$sh)>, Requires<[HasDSP, IsThumb2]>; // Note: Shifts of 1-15 bits will be transformed to srl instead of sra and // will match the pattern below. 
def t2PKHTB : T2ThreeReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, pkh_asr_amt:$sh), IIC_iBITsi, "pkhtb", "\t$Rd, $Rn, $Rm$sh", [(set rGPR:$Rd, (or (and rGPR:$Rn, 0xFFFF0000), (and (sra rGPR:$Rm, pkh_asr_amt:$sh), 0xFFFF)))]>, Requires<[HasDSP, IsThumb2]>, Sched<[WriteALUsi, ReadALU]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-20} = 0b01100; let Inst{5} = 1; // TB form let Inst{4} = 0; bits<5> sh; let Inst{14-12} = sh{4-2}; let Inst{7-6} = sh{1-0}; } // Alternate cases for PKHTB where identities eliminate some nodes. Note that // a shift amount of 0 is *not legal* here, it is PKHBT instead. // We also can not replace a srl (17..31) by an arithmetic shift we would use in // pkhtb src1, src2, asr (17..31). def : T2Pat<(or (and rGPR:$src1, 0xFFFF0000), (srl rGPR:$src2, imm16:$sh)), (t2PKHTB rGPR:$src1, rGPR:$src2, imm16:$sh)>, Requires<[HasDSP, IsThumb2]>; def : T2Pat<(or (and rGPR:$src1, 0xFFFF0000), (sra rGPR:$src2, imm16_31:$sh)), (t2PKHTB rGPR:$src1, rGPR:$src2, imm16_31:$sh)>, Requires<[HasDSP, IsThumb2]>; def : T2Pat<(or (and rGPR:$src1, 0xFFFF0000), (and (srl rGPR:$src2, imm1_15:$sh), 0xFFFF)), (t2PKHTB rGPR:$src1, rGPR:$src2, imm1_15:$sh)>, Requires<[HasDSP, IsThumb2]>; //===----------------------------------------------------------------------===// // CRC32 Instructions // // Polynomials: // + CRC32{B,H,W} 0x04C11DB7 // + CRC32C{B,H,W} 0x1EDC6F41 // class T2I_crc32 sz, string suffix, SDPatternOperator builtin> : T2ThreeRegNoP<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), NoItinerary, !strconcat("crc32", suffix, "\t$Rd, $Rn, $Rm"), [(set rGPR:$Rd, (builtin rGPR:$Rn, rGPR:$Rm))]>, Requires<[IsThumb2, HasV8, HasCRC]> { let Inst{31-27} = 0b11111; let Inst{26-21} = 0b010110; let Inst{20} = C; let Inst{15-12} = 0b1111; let Inst{7-6} = 0b10; let Inst{5-4} = sz; } def t2CRC32B : T2I_crc32<0, 0b00, "b", int_arm_crc32b>; def t2CRC32CB : T2I_crc32<1, 0b00, "cb", int_arm_crc32cb>; def t2CRC32H : T2I_crc32<0, 0b01, "h", int_arm_crc32h>; def t2CRC32CH : T2I_crc32<1, 0b01, "ch", int_arm_crc32ch>; def t2CRC32W : T2I_crc32<0, 0b10, "w", int_arm_crc32w>; def t2CRC32CW : T2I_crc32<1, 0b10, "cw", int_arm_crc32cw>; //===----------------------------------------------------------------------===// // Comparison Instructions... // defm t2CMP : T2I_cmp_irs<0b1101, "cmp", IIC_iCMPi, IIC_iCMPr, IIC_iCMPsi, ARMcmp>; def : T2Pat<(ARMcmpZ GPRnopc:$lhs, t2_so_imm:$imm), (t2CMPri GPRnopc:$lhs, t2_so_imm:$imm)>; def : T2Pat<(ARMcmpZ GPRnopc:$lhs, rGPR:$rhs), (t2CMPrr GPRnopc:$lhs, rGPR:$rhs)>; def : T2Pat<(ARMcmpZ GPRnopc:$lhs, t2_so_reg:$rhs), (t2CMPrs GPRnopc:$lhs, t2_so_reg:$rhs)>; let isCompare = 1, Defs = [CPSR] in { // shifted imm def t2CMNri : T2OneRegCmpImm< (outs), (ins GPRnopc:$Rn, t2_so_imm:$imm), IIC_iCMPi, "cmn", ".w\t$Rn, $imm", [(ARMcmn GPRnopc:$Rn, (ineg t2_so_imm:$imm))]>, Sched<[WriteCMP, ReadALU]> { let Inst{31-27} = 0b11110; let Inst{25} = 0; let Inst{24-21} = 0b1000; let Inst{20} = 1; // The S bit. let Inst{15} = 0; let Inst{11-8} = 0b1111; // Rd } // register def t2CMNzrr : T2TwoRegCmp< (outs), (ins GPRnopc:$Rn, rGPR:$Rm), IIC_iCMPr, "cmn", ".w\t$Rn, $Rm", [(BinOpFrag<(ARMcmpZ node:$LHS,(ineg node:$RHS))> GPRnopc:$Rn, rGPR:$Rm)]>, Sched<[WriteCMP, ReadALU, ReadALU]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = 0b1000; let Inst{20} = 1; // The S bit. 
let Inst{14-12} = 0b000; // imm3 let Inst{11-8} = 0b1111; // Rd let Inst{7-6} = 0b00; // imm2 let Inst{5-4} = 0b00; // type } // shifted register def t2CMNzrs : T2OneRegCmpShiftedReg< (outs), (ins GPRnopc:$Rn, t2_so_reg:$ShiftedRm), IIC_iCMPsi, "cmn", ".w\t$Rn, $ShiftedRm", [(BinOpFrag<(ARMcmpZ node:$LHS,(ineg node:$RHS))> GPRnopc:$Rn, t2_so_reg:$ShiftedRm)]>, Sched<[WriteCMPsi, ReadALU, ReadALU]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = 0b1000; let Inst{20} = 1; // The S bit. let Inst{11-8} = 0b1111; // Rd } } // Assembler aliases w/o the ".w" suffix. // No alias here for 'rr' version as not all instantiations of this multiclass // want one (CMP in particular, does not). def : t2InstAlias<"cmn${p} $Rn, $imm", (t2CMNri GPRnopc:$Rn, t2_so_imm:$imm, pred:$p)>; def : t2InstAlias<"cmn${p} $Rn, $shift", (t2CMNzrs GPRnopc:$Rn, t2_so_reg:$shift, pred:$p)>; def : T2Pat<(ARMcmp GPR:$src, t2_so_imm_neg:$imm), (t2CMNri GPR:$src, t2_so_imm_neg:$imm)>; def : T2Pat<(ARMcmpZ GPRnopc:$src, t2_so_imm_neg:$imm), (t2CMNri GPRnopc:$src, t2_so_imm_neg:$imm)>; defm t2TST : T2I_cmp_irs<0b0000, "tst", IIC_iTSTi, IIC_iTSTr, IIC_iTSTsi, BinOpFrag<(ARMcmpZ (and_su node:$LHS, node:$RHS), 0)>>; defm t2TEQ : T2I_cmp_irs<0b0100, "teq", IIC_iTSTi, IIC_iTSTr, IIC_iTSTsi, BinOpFrag<(ARMcmpZ (xor_su node:$LHS, node:$RHS), 0)>>; // Conditional moves let hasSideEffects = 0 in { let isCommutable = 1, isSelect = 1 in def t2MOVCCr : t2PseudoInst<(outs rGPR:$Rd), (ins rGPR:$false, rGPR:$Rm, cmovpred:$p), 4, IIC_iCMOVr, [(set rGPR:$Rd, (ARMcmov rGPR:$false, rGPR:$Rm, cmovpred:$p))]>, RegConstraint<"$false = $Rd">, Sched<[WriteALU]>; let isMoveImm = 1 in def t2MOVCCi : t2PseudoInst<(outs rGPR:$Rd), (ins rGPR:$false, t2_so_imm:$imm, cmovpred:$p), 4, IIC_iCMOVi, [(set rGPR:$Rd, (ARMcmov rGPR:$false,t2_so_imm:$imm, cmovpred:$p))]>, RegConstraint<"$false = $Rd">, Sched<[WriteALU]>; let isCodeGenOnly = 1 in { let isMoveImm = 1 in def t2MOVCCi16 : t2PseudoInst<(outs rGPR:$Rd), (ins rGPR:$false, imm0_65535_expr:$imm, cmovpred:$p), 4, IIC_iCMOVi, [(set rGPR:$Rd, (ARMcmov rGPR:$false, imm0_65535:$imm, cmovpred:$p))]>, RegConstraint<"$false = $Rd">, Sched<[WriteALU]>; let isMoveImm = 1 in def t2MVNCCi : t2PseudoInst<(outs rGPR:$Rd), (ins rGPR:$false, t2_so_imm:$imm, cmovpred:$p), 4, IIC_iCMOVi, [(set rGPR:$Rd, (ARMcmov rGPR:$false, t2_so_imm_not:$imm, cmovpred:$p))]>, RegConstraint<"$false = $Rd">, Sched<[WriteALU]>; class MOVCCShPseudo : t2PseudoInst<(outs rGPR:$Rd), (ins rGPR:$false, rGPR:$Rm, i32imm:$imm, cmovpred:$p), 4, IIC_iCMOVsi, [(set rGPR:$Rd, (ARMcmov rGPR:$false, (opnode rGPR:$Rm, (i32 ty:$imm)), cmovpred:$p))]>, RegConstraint<"$false = $Rd">, Sched<[WriteALU]>; def t2MOVCClsl : MOVCCShPseudo; def t2MOVCClsr : MOVCCShPseudo; def t2MOVCCasr : MOVCCShPseudo; def t2MOVCCror : MOVCCShPseudo; let isMoveImm = 1 in def t2MOVCCi32imm : t2PseudoInst<(outs rGPR:$dst), (ins rGPR:$false, i32imm:$src, cmovpred:$p), 8, IIC_iCMOVix2, [(set rGPR:$dst, (ARMcmov rGPR:$false, imm:$src, cmovpred:$p))]>, RegConstraint<"$false = $dst">; } // isCodeGenOnly = 1 } // hasSideEffects //===----------------------------------------------------------------------===// // Atomic operations intrinsics // // memory barriers protect the atomic sequences let hasSideEffects = 1 in { def t2DMB : T2I<(outs), (ins memb_opt:$opt), NoItinerary, "dmb", "\t$opt", [(int_arm_dmb (i32 imm0_15:$opt))]>, Requires<[IsThumb, HasDB]> { bits<4> opt; let Inst{31-4} = 0xf3bf8f5; let Inst{3-0} = opt; } def t2DSB : T2I<(outs), (ins memb_opt:$opt), 
NoItinerary, "dsb", "\t$opt", [(int_arm_dsb (i32 imm0_15:$opt))]>, Requires<[IsThumb, HasDB]> { bits<4> opt; let Inst{31-4} = 0xf3bf8f4; let Inst{3-0} = opt; } def t2ISB : T2I<(outs), (ins instsyncb_opt:$opt), NoItinerary, "isb", "\t$opt", [(int_arm_isb (i32 imm0_15:$opt))]>, Requires<[IsThumb, HasDB]> { bits<4> opt; let Inst{31-4} = 0xf3bf8f6; let Inst{3-0} = opt; } let hasNoSchedulingInfo = 1 in def t2TSB : T2I<(outs), (ins tsb_opt:$opt), NoItinerary, "tsb", "\t$opt", []>, Requires<[IsThumb, HasV8_4a]> { let Inst{31-0} = 0xf3af8012; } } class T2I_ldrex opcod, dag oops, dag iops, AddrMode am, int sz, InstrItinClass itin, string opc, string asm, string cstr, list pattern, bits<4> rt2 = 0b1111> : Thumb2I { let Inst{31-27} = 0b11101; let Inst{26-20} = 0b0001101; let Inst{11-8} = rt2; let Inst{7-4} = opcod; let Inst{3-0} = 0b1111; bits<4> addr; bits<4> Rt; let Inst{19-16} = addr; let Inst{15-12} = Rt; } class T2I_strex opcod, dag oops, dag iops, AddrMode am, int sz, InstrItinClass itin, string opc, string asm, string cstr, list pattern, bits<4> rt2 = 0b1111> : Thumb2I { let Inst{31-27} = 0b11101; let Inst{26-20} = 0b0001100; let Inst{11-8} = rt2; let Inst{7-4} = opcod; bits<4> Rd; bits<4> addr; bits<4> Rt; let Inst{3-0} = Rd; let Inst{19-16} = addr; let Inst{15-12} = Rt; } let mayLoad = 1 in { def t2LDREXB : T2I_ldrex<0b0100, (outs rGPR:$Rt), (ins addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "ldrexb", "\t$Rt, $addr", "", [(set rGPR:$Rt, (ldrex_1 addr_offset_none:$addr))]>, Requires<[IsThumb, HasV8MBaseline]>; def t2LDREXH : T2I_ldrex<0b0101, (outs rGPR:$Rt), (ins addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "ldrexh", "\t$Rt, $addr", "", [(set rGPR:$Rt, (ldrex_2 addr_offset_none:$addr))]>, Requires<[IsThumb, HasV8MBaseline]>; def t2LDREX : Thumb2I<(outs rGPR:$Rt), (ins t2addrmode_imm0_1020s4:$addr), - AddrModeNone, 4, NoItinerary, + AddrModeT2_ldrex, 4, NoItinerary, "ldrex", "\t$Rt, $addr", "", [(set rGPR:$Rt, (ldrex_4 t2addrmode_imm0_1020s4:$addr))]>, Requires<[IsThumb, HasV8MBaseline]> { bits<4> Rt; bits<12> addr; let Inst{31-27} = 0b11101; let Inst{26-20} = 0b0000101; let Inst{19-16} = addr{11-8}; let Inst{15-12} = Rt; let Inst{11-8} = 0b1111; let Inst{7-0} = addr{7-0}; } let hasExtraDefRegAllocReq = 1 in def t2LDREXD : T2I_ldrex<0b0111, (outs rGPR:$Rt, rGPR:$Rt2), (ins addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "ldrexd", "\t$Rt, $Rt2, $addr", "", [], {?, ?, ?, ?}>, Requires<[IsThumb2, IsNotMClass]> { bits<4> Rt2; let Inst{11-8} = Rt2; } def t2LDAEXB : T2I_ldrex<0b1100, (outs rGPR:$Rt), (ins addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "ldaexb", "\t$Rt, $addr", "", [(set rGPR:$Rt, (ldaex_1 addr_offset_none:$addr))]>, Requires<[IsThumb, HasAcquireRelease, HasV7Clrex]>; def t2LDAEXH : T2I_ldrex<0b1101, (outs rGPR:$Rt), (ins addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "ldaexh", "\t$Rt, $addr", "", [(set rGPR:$Rt, (ldaex_2 addr_offset_none:$addr))]>, Requires<[IsThumb, HasAcquireRelease, HasV7Clrex]>; def t2LDAEX : Thumb2I<(outs rGPR:$Rt), (ins addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "ldaex", "\t$Rt, $addr", "", [(set rGPR:$Rt, (ldaex_4 addr_offset_none:$addr))]>, Requires<[IsThumb, HasAcquireRelease, HasV7Clrex]> { bits<4> Rt; bits<4> addr; let Inst{31-27} = 0b11101; let Inst{26-20} = 0b0001101; let Inst{19-16} = addr; let Inst{15-12} = Rt; let Inst{11-8} = 0b1111; let Inst{7-0} = 0b11101111; } let hasExtraDefRegAllocReq = 1 in def t2LDAEXD : T2I_ldrex<0b1111, (outs rGPR:$Rt, rGPR:$Rt2), (ins addr_offset_none:$addr), 
AddrModeNone, 4, NoItinerary, "ldaexd", "\t$Rt, $Rt2, $addr", "", [], {?, ?, ?, ?}>, Requires<[IsThumb, HasAcquireRelease, HasV7Clrex, IsNotMClass]> { bits<4> Rt2; let Inst{11-8} = Rt2; let Inst{7} = 1; } } let mayStore = 1, Constraints = "@earlyclobber $Rd" in { def t2STREXB : T2I_strex<0b0100, (outs rGPR:$Rd), (ins rGPR:$Rt, addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "strexb", "\t$Rd, $Rt, $addr", "", [(set rGPR:$Rd, (strex_1 rGPR:$Rt, addr_offset_none:$addr))]>, Requires<[IsThumb, HasV8MBaseline]>; def t2STREXH : T2I_strex<0b0101, (outs rGPR:$Rd), (ins rGPR:$Rt, addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "strexh", "\t$Rd, $Rt, $addr", "", [(set rGPR:$Rd, (strex_2 rGPR:$Rt, addr_offset_none:$addr))]>, Requires<[IsThumb, HasV8MBaseline]>; def t2STREX : Thumb2I<(outs rGPR:$Rd), (ins rGPR:$Rt, t2addrmode_imm0_1020s4:$addr), - AddrModeNone, 4, NoItinerary, + AddrModeT2_ldrex, 4, NoItinerary, "strex", "\t$Rd, $Rt, $addr", "", [(set rGPR:$Rd, (strex_4 rGPR:$Rt, t2addrmode_imm0_1020s4:$addr))]>, Requires<[IsThumb, HasV8MBaseline]> { bits<4> Rd; bits<4> Rt; bits<12> addr; let Inst{31-27} = 0b11101; let Inst{26-20} = 0b0000100; let Inst{19-16} = addr{11-8}; let Inst{15-12} = Rt; let Inst{11-8} = Rd; let Inst{7-0} = addr{7-0}; } let hasExtraSrcRegAllocReq = 1 in def t2STREXD : T2I_strex<0b0111, (outs rGPR:$Rd), (ins rGPR:$Rt, rGPR:$Rt2, addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "strexd", "\t$Rd, $Rt, $Rt2, $addr", "", [], {?, ?, ?, ?}>, Requires<[IsThumb2, IsNotMClass]> { bits<4> Rt2; let Inst{11-8} = Rt2; } def t2STLEXB : T2I_strex<0b1100, (outs rGPR:$Rd), (ins rGPR:$Rt, addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "stlexb", "\t$Rd, $Rt, $addr", "", [(set rGPR:$Rd, (stlex_1 rGPR:$Rt, addr_offset_none:$addr))]>, Requires<[IsThumb, HasAcquireRelease, HasV7Clrex]>; def t2STLEXH : T2I_strex<0b1101, (outs rGPR:$Rd), (ins rGPR:$Rt, addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "stlexh", "\t$Rd, $Rt, $addr", "", [(set rGPR:$Rd, (stlex_2 rGPR:$Rt, addr_offset_none:$addr))]>, Requires<[IsThumb, HasAcquireRelease, HasV7Clrex]>; def t2STLEX : Thumb2I<(outs rGPR:$Rd), (ins rGPR:$Rt, addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "stlex", "\t$Rd, $Rt, $addr", "", [(set rGPR:$Rd, (stlex_4 rGPR:$Rt, addr_offset_none:$addr))]>, Requires<[IsThumb, HasAcquireRelease, HasV7Clrex]> { bits<4> Rd; bits<4> Rt; bits<4> addr; let Inst{31-27} = 0b11101; let Inst{26-20} = 0b0001100; let Inst{19-16} = addr; let Inst{15-12} = Rt; let Inst{11-4} = 0b11111110; let Inst{3-0} = Rd; } let hasExtraSrcRegAllocReq = 1 in def t2STLEXD : T2I_strex<0b1111, (outs rGPR:$Rd), (ins rGPR:$Rt, rGPR:$Rt2, addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "stlexd", "\t$Rd, $Rt, $Rt2, $addr", "", [], {?, ?, ?, ?}>, Requires<[IsThumb, HasAcquireRelease, HasV7Clrex, IsNotMClass]> { bits<4> Rt2; let Inst{11-8} = Rt2; } } def t2CLREX : T2I<(outs), (ins), NoItinerary, "clrex", "", [(int_arm_clrex)]>, Requires<[IsThumb, HasV7Clrex]> { let Inst{31-16} = 0xf3bf; let Inst{15-14} = 0b10; let Inst{13} = 0; let Inst{12} = 0; let Inst{11-8} = 0b1111; let Inst{7-4} = 0b0010; let Inst{3-0} = 0b1111; } def : T2Pat<(and (ldrex_1 addr_offset_none:$addr), 0xff), (t2LDREXB addr_offset_none:$addr)>, Requires<[IsThumb, HasV8MBaseline]>; def : T2Pat<(and (ldrex_2 addr_offset_none:$addr), 0xffff), (t2LDREXH addr_offset_none:$addr)>, Requires<[IsThumb, HasV8MBaseline]>; def : T2Pat<(strex_1 (and GPR:$Rt, 0xff), addr_offset_none:$addr), (t2STREXB GPR:$Rt, addr_offset_none:$addr)>, Requires<[IsThumb, 
                          HasV8MBaseline]>;
def : T2Pat<(strex_2 (and GPR:$Rt, 0xffff), addr_offset_none:$addr),
            (t2STREXH GPR:$Rt, addr_offset_none:$addr)>,
      Requires<[IsThumb, HasV8MBaseline]>;

def : T2Pat<(and (ldaex_1 addr_offset_none:$addr), 0xff),
            (t2LDAEXB addr_offset_none:$addr)>,
      Requires<[IsThumb, HasAcquireRelease, HasV7Clrex]>;
def : T2Pat<(and (ldaex_2 addr_offset_none:$addr), 0xffff),
            (t2LDAEXH addr_offset_none:$addr)>,
      Requires<[IsThumb, HasAcquireRelease, HasV7Clrex]>;
def : T2Pat<(stlex_1 (and GPR:$Rt, 0xff), addr_offset_none:$addr),
            (t2STLEXB GPR:$Rt, addr_offset_none:$addr)>,
      Requires<[IsThumb, HasAcquireRelease, HasV7Clrex]>;
def : T2Pat<(stlex_2 (and GPR:$Rt, 0xffff), addr_offset_none:$addr),
            (t2STLEXH GPR:$Rt, addr_offset_none:$addr)>,
      Requires<[IsThumb, HasAcquireRelease, HasV7Clrex]>;

//===----------------------------------------------------------------------===//
// SJLJ Exception handling intrinsics
//   eh_sjlj_setjmp() is an instruction sequence to store the return
//   address and save #0 in R0 for the non-longjmp case.
//   Since by its nature we may be coming from some other function to get
//   here, and we're using the stack frame for the containing function to
//   save/restore registers, we can't keep anything live in regs across
//   the eh_sjlj_setjmp(), else it will almost certainly have been tromped upon
//   when we get here from a longjmp(). We force everything out of registers
//   except for our own input by listing the relevant registers in Defs. By
//   doing so, we also cause the prologue/epilogue code to actively preserve
//   all of the callee-saved registers, which is exactly what we want.
//   $val is a scratch register for our use.
let Defs =
  [ R0,  R1,  R2,  R3,  R4,  R5,  R6,  R7,  R8,  R9,  R10, R11, R12, LR, CPSR,
    Q0, Q1, Q2, Q3, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15],
  hasSideEffects = 1, isBarrier = 1, isCodeGenOnly = 1,
  usesCustomInserter = 1 in {
  def t2Int_eh_sjlj_setjmp : Thumb2XI<(outs), (ins tGPR:$src, tGPR:$val),
                               AddrModeNone, 0, NoItinerary, "", "",
                        [(set R0, (ARMeh_sjlj_setjmp tGPR:$src, tGPR:$val))]>,
                             Requires<[IsThumb2, HasVFP2]>;
}

let Defs =
  [ R0,  R1,  R2,  R3,  R4,  R5,  R6,  R7,  R8,  R9,  R10, R11, R12, LR, CPSR ],
  hasSideEffects = 1, isBarrier = 1, isCodeGenOnly = 1,
  usesCustomInserter = 1 in {
  def t2Int_eh_sjlj_setjmp_nofp : Thumb2XI<(outs), (ins tGPR:$src, tGPR:$val),
                               AddrModeNone, 0, NoItinerary, "", "",
                        [(set R0, (ARMeh_sjlj_setjmp tGPR:$src, tGPR:$val))]>,
                             Requires<[IsThumb2, NoVFP]>;
}

//===----------------------------------------------------------------------===//
// Control-Flow Instructions
//

// FIXME: remove when we have a way of marking a MI with these properties.
// FIXME: Should pc be an implicit operand like PICADD, etc?
let isReturn = 1, isTerminator = 1, isBarrier = 1, mayLoad = 1, hasExtraDefRegAllocReq = 1, isCodeGenOnly = 1 in def t2LDMIA_RET: t2PseudoExpand<(outs GPR:$wb), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), 4, IIC_iLoad_mBr, [], (t2LDMIA_UPD GPR:$wb, GPR:$Rn, pred:$p, reglist:$regs)>, RegConstraint<"$Rn = $wb">; let isBranch = 1, isTerminator = 1, isBarrier = 1 in { let isPredicable = 1 in def t2B : T2I<(outs), (ins thumb_br_target:$target), IIC_Br, "b", ".w\t$target", [(br bb:$target)]>, Sched<[WriteBr]>, Requires<[IsThumb, HasV8MBaseline]> { let Inst{31-27} = 0b11110; let Inst{15-14} = 0b10; let Inst{12} = 1; bits<24> target; let Inst{26} = target{23}; let Inst{13} = target{22}; let Inst{11} = target{21}; let Inst{25-16} = target{20-11}; let Inst{10-0} = target{10-0}; let DecoderMethod = "DecodeT2BInstruction"; let AsmMatchConverter = "cvtThumbBranches"; } let Size = 4, isNotDuplicable = 1, isBranch = 1, isTerminator = 1, isBarrier = 1, isIndirectBranch = 1 in { // available in both v8-M.Baseline and Thumb2 targets def t2BR_JT : t2basePseudoInst<(outs), (ins GPR:$target, GPR:$index, i32imm:$jt), 0, IIC_Br, [(ARMbr2jt GPR:$target, GPR:$index, tjumptable:$jt)]>, Sched<[WriteBr]>; // FIXME: Add a case that can be predicated. def t2TBB_JT : t2PseudoInst<(outs), (ins GPR:$base, GPR:$index, i32imm:$jt, i32imm:$pclbl), 0, IIC_Br, []>, Sched<[WriteBr]>; def t2TBH_JT : t2PseudoInst<(outs), (ins GPR:$base, GPR:$index, i32imm:$jt, i32imm:$pclbl), 0, IIC_Br, []>, Sched<[WriteBr]>; def t2TBB : T2I<(outs), (ins addrmode_tbb:$addr), IIC_Br, "tbb", "\t$addr", []>, Sched<[WriteBrTbl]> { bits<4> Rn; bits<4> Rm; let Inst{31-20} = 0b111010001101; let Inst{19-16} = Rn; let Inst{15-5} = 0b11110000000; let Inst{4} = 0; // B form let Inst{3-0} = Rm; let DecoderMethod = "DecodeThumbTableBranch"; } def t2TBH : T2I<(outs), (ins addrmode_tbh:$addr), IIC_Br, "tbh", "\t$addr", []>, Sched<[WriteBrTbl]> { bits<4> Rn; bits<4> Rm; let Inst{31-20} = 0b111010001101; let Inst{19-16} = Rn; let Inst{15-5} = 0b11110000000; let Inst{4} = 1; // H form let Inst{3-0} = Rm; let DecoderMethod = "DecodeThumbTableBranch"; } } // isNotDuplicable, isIndirectBranch } // isBranch, isTerminator, isBarrier // FIXME: should be able to write a pattern for ARMBrcond, but can't use // a two-value operand where a dag node expects ", "two operands. :( let isBranch = 1, isTerminator = 1 in def t2Bcc : T2I<(outs), (ins brtarget:$target), IIC_Br, "b", ".w\t$target", [/*(ARMbrcond bb:$target, imm:$cc)*/]>, Sched<[WriteBr]> { let Inst{31-27} = 0b11110; let Inst{15-14} = 0b10; let Inst{12} = 0; bits<4> p; let Inst{25-22} = p; bits<21> target; let Inst{26} = target{20}; let Inst{11} = target{19}; let Inst{13} = target{18}; let Inst{21-16} = target{17-12}; let Inst{10-0} = target{11-1}; let DecoderMethod = "DecodeThumb2BCCInstruction"; let AsmMatchConverter = "cvtThumbBranches"; } // Tail calls. The MachO version of thumb tail calls uses a t2 branch, so // it goes here. let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1 in { // IOS version. let Uses = [SP] in def tTAILJMPd: tPseudoExpand<(outs), (ins thumb_br_target:$dst, pred:$p), 4, IIC_Br, [], (t2B thumb_br_target:$dst, pred:$p)>, Requires<[IsThumb2, IsMachO]>, Sched<[WriteBr]>; } // IT block let Defs = [ITSTATE] in def t2IT : Thumb2XI<(outs), (ins it_pred:$cc, it_mask:$mask), AddrModeNone, 2, IIC_iALUx, "it$mask\t$cc", "", []>, ComplexDeprecationPredicate<"IT"> { // 16-bit instruction. 
let Inst{31-16} = 0x0000; let Inst{15-8} = 0b10111111; bits<4> cc; bits<4> mask; let Inst{7-4} = cc; let Inst{3-0} = mask; let DecoderMethod = "DecodeIT"; } // Branch and Exchange Jazelle -- for disassembly only // Rm = Inst{19-16} let isBranch = 1, isTerminator = 1, isBarrier = 1, isIndirectBranch = 1 in def t2BXJ : T2I<(outs), (ins GPRnopc:$func), NoItinerary, "bxj", "\t$func", []>, Sched<[WriteBr]>, Requires<[IsThumb2, IsNotMClass]> { bits<4> func; let Inst{31-27} = 0b11110; let Inst{26} = 0; let Inst{25-20} = 0b111100; let Inst{19-16} = func; let Inst{15-0} = 0b1000111100000000; } // Compare and branch on zero / non-zero let isBranch = 1, isTerminator = 1 in { def tCBZ : T1I<(outs), (ins tGPR:$Rn, thumb_cb_target:$target), IIC_Br, "cbz\t$Rn, $target", []>, T1Misc<{0,0,?,1,?,?,?}>, Requires<[IsThumb, HasV8MBaseline]>, Sched<[WriteBr]> { // A8.6.27 bits<6> target; bits<3> Rn; let Inst{9} = target{5}; let Inst{7-3} = target{4-0}; let Inst{2-0} = Rn; } def tCBNZ : T1I<(outs), (ins tGPR:$Rn, thumb_cb_target:$target), IIC_Br, "cbnz\t$Rn, $target", []>, T1Misc<{1,0,?,1,?,?,?}>, Requires<[IsThumb, HasV8MBaseline]>, Sched<[WriteBr]> { // A8.6.27 bits<6> target; bits<3> Rn; let Inst{9} = target{5}; let Inst{7-3} = target{4-0}; let Inst{2-0} = Rn; } } // Change Processor State is a system instruction. // FIXME: Since the asm parser has currently no clean way to handle optional // operands, create 3 versions of the same instruction. Once there's a clean // framework to represent optional operands, change this behavior. class t2CPS : T2XI<(outs), iops, NoItinerary, !strconcat("cps", asm_op), []>, Requires<[IsThumb2, IsNotMClass]> { bits<2> imod; bits<3> iflags; bits<5> mode; bit M; let Inst{31-11} = 0b111100111010111110000; let Inst{10-9} = imod; let Inst{8} = M; let Inst{7-5} = iflags; let Inst{4-0} = mode; let DecoderMethod = "DecodeT2CPSInstruction"; } let M = 1 in def t2CPS3p : t2CPS<(ins imod_op:$imod, iflags_op:$iflags, i32imm:$mode), "$imod\t$iflags, $mode">; let mode = 0, M = 0 in def t2CPS2p : t2CPS<(ins imod_op:$imod, iflags_op:$iflags), "$imod.w\t$iflags">; let imod = 0, iflags = 0, M = 1 in def t2CPS1p : t2CPS<(ins imm0_31:$mode), "\t$mode">; def : t2InstAlias<"cps$imod.w $iflags, $mode", (t2CPS3p imod_op:$imod, iflags_op:$iflags, i32imm:$mode), 0>; def : t2InstAlias<"cps.w $mode", (t2CPS1p imm0_31:$mode), 0>; // A6.3.4 Branches and miscellaneous control // Table A6-14 Change Processor State, and hint instructions def t2HINT : T2I<(outs), (ins imm0_239:$imm), NoItinerary, "hint", ".w\t$imm", [(int_arm_hint imm0_239:$imm)]> { bits<8> imm; let Inst{31-3} = 0b11110011101011111000000000000; let Inst{7-0} = imm; } def : t2InstAlias<"hint$p $imm", (t2HINT imm0_239:$imm, pred:$p), 0>; def : t2InstAlias<"nop$p.w", (t2HINT 0, pred:$p), 1>; def : t2InstAlias<"yield$p.w", (t2HINT 1, pred:$p), 1>; def : t2InstAlias<"wfe$p.w", (t2HINT 2, pred:$p), 1>; def : t2InstAlias<"wfi$p.w", (t2HINT 3, pred:$p), 1>; def : t2InstAlias<"sev$p.w", (t2HINT 4, pred:$p), 1>; def : t2InstAlias<"sevl$p.w", (t2HINT 5, pred:$p), 1> { let Predicates = [IsThumb2, HasV8]; } def : t2InstAlias<"esb$p.w", (t2HINT 16, pred:$p), 1> { let Predicates = [IsThumb2, HasRAS]; } def : t2InstAlias<"esb$p", (t2HINT 16, pred:$p), 0> { let Predicates = [IsThumb2, HasRAS]; } def : t2InstAlias<"csdb$p.w", (t2HINT 20, pred:$p), 0>; def : t2InstAlias<"csdb$p", (t2HINT 20, pred:$p), 1>; def t2DBG : T2I<(outs), (ins imm0_15:$opt), NoItinerary, "dbg", "\t$opt", [(int_arm_dbg imm0_15:$opt)]> { bits<4> opt; let Inst{31-20} = 0b111100111010; let 
Inst{19-16} = 0b1111; let Inst{15-8} = 0b10000000; let Inst{7-4} = 0b1111; let Inst{3-0} = opt; } // Secure Monitor Call is a system instruction. // Option = Inst{19-16} let isCall = 1, Uses = [SP] in def t2SMC : T2I<(outs), (ins imm0_15:$opt), NoItinerary, "smc", "\t$opt", []>, Requires<[IsThumb2, HasTrustZone]> { let Inst{31-27} = 0b11110; let Inst{26-20} = 0b1111111; let Inst{15-12} = 0b1000; bits<4> opt; let Inst{19-16} = opt; } class T2DCPS opt, string opc> : T2I<(outs), (ins), NoItinerary, opc, "", []>, Requires<[IsThumb2, HasV8]> { let Inst{31-27} = 0b11110; let Inst{26-20} = 0b1111000; let Inst{19-16} = 0b1111; let Inst{15-12} = 0b1000; let Inst{11-2} = 0b0000000000; let Inst{1-0} = opt; } def t2DCPS1 : T2DCPS<0b01, "dcps1">; def t2DCPS2 : T2DCPS<0b10, "dcps2">; def t2DCPS3 : T2DCPS<0b11, "dcps3">; class T2SRS Op, bit W, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : T2I, Requires<[IsThumb2,IsNotMClass]> { bits<5> mode; let Inst{31-25} = 0b1110100; let Inst{24-23} = Op; let Inst{22} = 0; let Inst{21} = W; let Inst{20-16} = 0b01101; let Inst{15-5} = 0b11000000000; let Inst{4-0} = mode{4-0}; } // Store Return State is a system instruction. def t2SRSDB_UPD : T2SRS<0b00, 1, (outs), (ins imm0_31:$mode), NoItinerary, "srsdb", "\tsp!, $mode", []>; def t2SRSDB : T2SRS<0b00, 0, (outs), (ins imm0_31:$mode), NoItinerary, "srsdb","\tsp, $mode", []>; def t2SRSIA_UPD : T2SRS<0b11, 1, (outs), (ins imm0_31:$mode), NoItinerary, "srsia","\tsp!, $mode", []>; def t2SRSIA : T2SRS<0b11, 0, (outs), (ins imm0_31:$mode), NoItinerary, "srsia","\tsp, $mode", []>; def : t2InstAlias<"srsdb${p} $mode", (t2SRSDB imm0_31:$mode, pred:$p)>; def : t2InstAlias<"srsdb${p} $mode!", (t2SRSDB_UPD imm0_31:$mode, pred:$p)>; def : t2InstAlias<"srsia${p} $mode", (t2SRSIA imm0_31:$mode, pred:$p)>; def : t2InstAlias<"srsia${p} $mode!", (t2SRSIA_UPD imm0_31:$mode, pred:$p)>; // Return From Exception is a system instruction. let isReturn = 1, isBarrier = 1, isTerminator = 1, Defs = [PC] in class T2RFE op31_20, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : T2I, Requires<[IsThumb2,IsNotMClass]> { let Inst{31-20} = op31_20{11-0}; bits<4> Rn; let Inst{19-16} = Rn; let Inst{15-0} = 0xc000; } def t2RFEDBW : T2RFE<0b111010000011, (outs), (ins GPR:$Rn), NoItinerary, "rfedb", "\t$Rn!", [/* For disassembly only; pattern left blank */]>; def t2RFEDB : T2RFE<0b111010000001, (outs), (ins GPR:$Rn), NoItinerary, "rfedb", "\t$Rn", [/* For disassembly only; pattern left blank */]>; def t2RFEIAW : T2RFE<0b111010011011, (outs), (ins GPR:$Rn), NoItinerary, "rfeia", "\t$Rn!", [/* For disassembly only; pattern left blank */]>; def t2RFEIA : T2RFE<0b111010011001, (outs), (ins GPR:$Rn), NoItinerary, "rfeia", "\t$Rn", [/* For disassembly only; pattern left blank */]>; // B9.3.19 SUBS PC, LR, #imm (Thumb2) system instruction. // Exception return instruction is "subs pc, lr, #imm". let isReturn = 1, isBarrier = 1, isTerminator = 1, Defs = [PC] in def t2SUBS_PC_LR : T2I <(outs), (ins imm0_255:$imm), NoItinerary, "subs", "\tpc, lr, $imm", [(ARMintretflag imm0_255:$imm)]>, Requires<[IsThumb2,IsNotMClass]> { let Inst{31-8} = 0b111100111101111010001111; bits<8> imm; let Inst{7-0} = imm; } // Hypervisor Call is a system instruction. 
let isCall = 1 in {
def t2HVC : T2XI <(outs), (ins imm0_65535:$imm16), IIC_Br, "hvc.w\t$imm16", []>,
            Requires<[IsThumb2, HasVirtualization]>, Sched<[WriteBr]> {
  bits<16> imm16;
  let Inst{31-20} = 0b111101111110;
  let Inst{19-16} = imm16{15-12};
  let Inst{15-12} = 0b1000;
  let Inst{11-0} = imm16{11-0};
}
}

// Alias for HVC without the ".w" optional width specifier
def : t2InstAlias<"hvc\t$imm16", (t2HVC imm0_65535:$imm16)>;

// ERET - Return from exception in Hypervisor mode.
// B9.3.3, B9.3.20: ERET is an alias for "SUBS PC, LR, #0" in an implementation that
// includes virtualization extensions.
def t2ERET : InstAlias<"eret${p}", (t2SUBS_PC_LR 0, pred:$p), 1>,
             Requires<[IsThumb2, HasVirtualization]>;

//===----------------------------------------------------------------------===//
// Non-Instruction Patterns
//

// 32-bit immediate using movw + movt.
// This is a single pseudo instruction to make it re-materializable.
// FIXME: Remove this when we can do generalized remat.
let isReMaterializable = 1, isMoveImm = 1 in
def t2MOVi32imm : PseudoInst<(outs rGPR:$dst), (ins i32imm:$src), IIC_iMOVix2,
                             [(set rGPR:$dst, (i32 imm:$src))]>,
                  Requires<[IsThumb, UseMovt]>;

// Pseudo instruction that combines movw + movt + add pc (if pic).
// It also makes it possible to rematerialize the instructions.
// FIXME: Remove this when we can do generalized remat and when machine licm
// can properly handle the instructions.
let isReMaterializable = 1 in {
def t2MOV_ga_pcrel : PseudoInst<(outs rGPR:$dst), (ins i32imm:$addr),
                                IIC_iMOVix2addpc,
                                [(set rGPR:$dst, (ARMWrapperPIC tglobaladdr:$addr))]>,
                     Requires<[IsThumb, HasV8MBaseline, UseMovtInPic]>;
}

def : T2Pat<(ARMWrapperPIC tglobaltlsaddr :$dst),
            (t2MOV_ga_pcrel tglobaltlsaddr:$dst)>,
      Requires<[IsThumb2, UseMovtInPic]>;
def : T2Pat<(ARMWrapper tglobaltlsaddr:$dst),
            (t2MOVi32imm tglobaltlsaddr:$dst)>,
      Requires<[IsThumb2, UseMovt]>;

// ConstantPool, GlobalAddress, and JumpTable
def : T2Pat<(ARMWrapper tconstpool :$dst), (t2LEApcrel tconstpool :$dst)>;
def : T2Pat<(ARMWrapper texternalsym :$dst), (t2MOVi32imm texternalsym :$dst)>,
      Requires<[IsThumb, HasV8MBaseline, UseMovt]>;
def : T2Pat<(ARMWrapper tglobaladdr :$dst), (t2MOVi32imm tglobaladdr :$dst)>,
      Requires<[IsThumb, HasV8MBaseline, UseMovt]>;
def : T2Pat<(ARMWrapperJT tjumptable:$dst), (t2LEApcrelJT tjumptable:$dst)>;

// Pseudo instruction that combines ldr from constpool and add pc. This should
// be expanded into two instructions late to allow if-conversion and
// scheduling.
let canFoldAsLoad = 1, isReMaterializable = 1 in def t2LDRpci_pic : PseudoInst<(outs rGPR:$dst), (ins i32imm:$addr, pclabel:$cp), IIC_iLoadiALU, [(set rGPR:$dst, (ARMpic_add (load (ARMWrapper tconstpool:$addr)), imm:$cp))]>, Requires<[IsThumb2]>; // Pseudo isntruction that combines movs + predicated rsbmi // to implement integer ABS let usesCustomInserter = 1, Defs = [CPSR] in { def t2ABS : PseudoInst<(outs rGPR:$dst), (ins rGPR:$src), NoItinerary, []>, Requires<[IsThumb2]>; } //===----------------------------------------------------------------------===// // Coprocessor load/store -- for disassembly only // class T2CI op31_28, dag oops, dag iops, string opc, string asm, list pattern> : T2I { let Inst{31-28} = op31_28; let Inst{27-25} = 0b110; } multiclass t2LdStCop op31_28, bit load, bit Dbit, string asm, list pattern> { def _OFFSET : T2CI { bits<13> addr; bits<4> cop; bits<4> CRd; let Inst{24} = 1; // P = 1 let Inst{23} = addr{8}; let Inst{22} = Dbit; let Inst{21} = 0; // W = 0 let Inst{20} = load; let Inst{19-16} = addr{12-9}; let Inst{15-12} = CRd; let Inst{11-8} = cop; let Inst{7-0} = addr{7-0}; let DecoderMethod = "DecodeCopMemInstruction"; } def _PRE : T2CI { bits<13> addr; bits<4> cop; bits<4> CRd; let Inst{24} = 1; // P = 1 let Inst{23} = addr{8}; let Inst{22} = Dbit; let Inst{21} = 1; // W = 1 let Inst{20} = load; let Inst{19-16} = addr{12-9}; let Inst{15-12} = CRd; let Inst{11-8} = cop; let Inst{7-0} = addr{7-0}; let DecoderMethod = "DecodeCopMemInstruction"; } def _POST: T2CI { bits<9> offset; bits<4> addr; bits<4> cop; bits<4> CRd; let Inst{24} = 0; // P = 0 let Inst{23} = offset{8}; let Inst{22} = Dbit; let Inst{21} = 1; // W = 1 let Inst{20} = load; let Inst{19-16} = addr; let Inst{15-12} = CRd; let Inst{11-8} = cop; let Inst{7-0} = offset{7-0}; let DecoderMethod = "DecodeCopMemInstruction"; } def _OPTION : T2CI { bits<8> option; bits<4> addr; bits<4> cop; bits<4> CRd; let Inst{24} = 0; // P = 0 let Inst{23} = 1; // U = 1 let Inst{22} = Dbit; let Inst{21} = 0; // W = 0 let Inst{20} = load; let Inst{19-16} = addr; let Inst{15-12} = CRd; let Inst{11-8} = cop; let Inst{7-0} = option; let DecoderMethod = "DecodeCopMemInstruction"; } } let DecoderNamespace = "Thumb2CoProc" in { defm t2LDC : t2LdStCop<0b1110, 1, 0, "ldc", [(int_arm_ldc imm:$cop, imm:$CRd, addrmode5:$addr)]>; defm t2LDCL : t2LdStCop<0b1110, 1, 1, "ldcl", [(int_arm_ldcl imm:$cop, imm:$CRd, addrmode5:$addr)]>; defm t2LDC2 : t2LdStCop<0b1111, 1, 0, "ldc2", [(int_arm_ldc2 imm:$cop, imm:$CRd, addrmode5:$addr)]>, Requires<[PreV8,IsThumb2]>; defm t2LDC2L : t2LdStCop<0b1111, 1, 1, "ldc2l", [(int_arm_ldc2l imm:$cop, imm:$CRd, addrmode5:$addr)]>, Requires<[PreV8,IsThumb2]>; defm t2STC : t2LdStCop<0b1110, 0, 0, "stc", [(int_arm_stc imm:$cop, imm:$CRd, addrmode5:$addr)]>; defm t2STCL : t2LdStCop<0b1110, 0, 1, "stcl", [(int_arm_stcl imm:$cop, imm:$CRd, addrmode5:$addr)]>; defm t2STC2 : t2LdStCop<0b1111, 0, 0, "stc2", [(int_arm_stc2 imm:$cop, imm:$CRd, addrmode5:$addr)]>, Requires<[PreV8,IsThumb2]>; defm t2STC2L : t2LdStCop<0b1111, 0, 1, "stc2l", [(int_arm_stc2l imm:$cop, imm:$CRd, addrmode5:$addr)]>, Requires<[PreV8,IsThumb2]>; } //===----------------------------------------------------------------------===// // Move between special register and ARM core register -- for disassembly only // // Move to ARM core register from Special Register // A/R class MRS. // // A/R class can only move from CPSR or SPSR. 
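// For example, on an A/R-profile core the assembler accepts
//   mrs r3, apsr    @ t2MRS_AR ("mrs r3, cpsr" is handled by the alias below)
//   mrs r3, spsr    @ t2MRSsys_AR
// whereas M-profile cores name the register through the SYSm field of
// t2MRS_M, e.g. "mrs r3, primask" (primask is just one illustrative SYSm
// value; any msr_mask operand is handled the same way).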
def t2MRS_AR : T2I<(outs GPR:$Rd), (ins), NoItinerary, "mrs", "\t$Rd, apsr", []>, Requires<[IsThumb2,IsNotMClass]> { bits<4> Rd; let Inst{31-12} = 0b11110011111011111000; let Inst{11-8} = Rd; let Inst{7-0} = 0b00000000; } def : t2InstAlias<"mrs${p} $Rd, cpsr", (t2MRS_AR GPR:$Rd, pred:$p)>; def t2MRSsys_AR: T2I<(outs GPR:$Rd), (ins), NoItinerary, "mrs", "\t$Rd, spsr", []>, Requires<[IsThumb2,IsNotMClass]> { bits<4> Rd; let Inst{31-12} = 0b11110011111111111000; let Inst{11-8} = Rd; let Inst{7-0} = 0b00000000; } def t2MRSbanked : T2I<(outs rGPR:$Rd), (ins banked_reg:$banked), NoItinerary, "mrs", "\t$Rd, $banked", []>, Requires<[IsThumb, HasVirtualization]> { bits<6> banked; bits<4> Rd; let Inst{31-21} = 0b11110011111; let Inst{20} = banked{5}; // R bit let Inst{19-16} = banked{3-0}; let Inst{15-12} = 0b1000; let Inst{11-8} = Rd; let Inst{7-5} = 0b001; let Inst{4} = banked{4}; let Inst{3-0} = 0b0000; } // M class MRS. // // This MRS has a mask field in bits 7-0 and can take more values than // the A/R class (a full msr_mask). def t2MRS_M : T2I<(outs rGPR:$Rd), (ins msr_mask:$SYSm), NoItinerary, "mrs", "\t$Rd, $SYSm", []>, Requires<[IsThumb,IsMClass]> { bits<4> Rd; bits<8> SYSm; let Inst{31-12} = 0b11110011111011111000; let Inst{11-8} = Rd; let Inst{7-0} = SYSm; let Unpredictable{20-16} = 0b11111; let Unpredictable{13} = 0b1; } // Move from ARM core register to Special Register // // A/R class MSR. // // No need to have both system and application versions, the encodings are the // same and the assembly parser has no way to distinguish between them. The mask // operand contains the special register (R Bit) in bit 4 and bits 3-0 contains // the mask with the fields to be accessed in the special register. let Defs = [CPSR] in def t2MSR_AR : T2I<(outs), (ins msr_mask:$mask, rGPR:$Rn), NoItinerary, "msr", "\t$mask, $Rn", []>, Requires<[IsThumb2,IsNotMClass]> { bits<5> mask; bits<4> Rn; let Inst{31-21} = 0b11110011100; let Inst{20} = mask{4}; // R Bit let Inst{19-16} = Rn; let Inst{15-12} = 0b1000; let Inst{11-8} = mask{3-0}; let Inst{7-0} = 0; } // However, the MSR (banked register) system instruction (ARMv7VE) *does* have a // separate encoding (distinguished by bit 5. def t2MSRbanked : T2I<(outs), (ins banked_reg:$banked, rGPR:$Rn), NoItinerary, "msr", "\t$banked, $Rn", []>, Requires<[IsThumb, HasVirtualization]> { bits<6> banked; bits<4> Rn; let Inst{31-21} = 0b11110011100; let Inst{20} = banked{5}; // R bit let Inst{19-16} = Rn; let Inst{15-12} = 0b1000; let Inst{11-8} = banked{3-0}; let Inst{7-5} = 0b001; let Inst{4} = banked{4}; let Inst{3-0} = 0b0000; } // M class MSR. 
// // Move from ARM core register to Special Register let Defs = [CPSR] in def t2MSR_M : T2I<(outs), (ins msr_mask:$SYSm, rGPR:$Rn), NoItinerary, "msr", "\t$SYSm, $Rn", []>, Requires<[IsThumb,IsMClass]> { bits<12> SYSm; bits<4> Rn; let Inst{31-21} = 0b11110011100; let Inst{20} = 0b0; let Inst{19-16} = Rn; let Inst{15-12} = 0b1000; let Inst{11-10} = SYSm{11-10}; let Inst{9-8} = 0b00; let Inst{7-0} = SYSm{7-0}; let Unpredictable{20} = 0b1; let Unpredictable{13} = 0b1; let Unpredictable{9-8} = 0b11; } //===----------------------------------------------------------------------===// // Move between coprocessor and ARM core register // class t2MovRCopro Op, string opc, bit direction, dag oops, dag iops, list pattern> : T2Cop { let Inst{27-24} = 0b1110; let Inst{20} = direction; let Inst{4} = 1; bits<4> Rt; bits<4> cop; bits<3> opc1; bits<3> opc2; bits<4> CRm; bits<4> CRn; let Inst{15-12} = Rt; let Inst{11-8} = cop; let Inst{23-21} = opc1; let Inst{7-5} = opc2; let Inst{3-0} = CRm; let Inst{19-16} = CRn; let DecoderNamespace = "Thumb2CoProc"; } class t2MovRRCopro Op, string opc, bit direction, dag oops, dag iops, list pattern = []> : T2Cop { let Inst{27-24} = 0b1100; let Inst{23-21} = 0b010; let Inst{20} = direction; bits<4> Rt; bits<4> Rt2; bits<4> cop; bits<4> opc1; bits<4> CRm; let Inst{15-12} = Rt; let Inst{19-16} = Rt2; let Inst{11-8} = cop; let Inst{7-4} = opc1; let Inst{3-0} = CRm; let DecoderNamespace = "Thumb2CoProc"; } /* from ARM core register to coprocessor */ def t2MCR : t2MovRCopro<0b1110, "mcr", 0, (outs), (ins p_imm:$cop, imm0_7:$opc1, GPR:$Rt, c_imm:$CRn, c_imm:$CRm, imm0_7:$opc2), [(int_arm_mcr imm:$cop, imm:$opc1, GPR:$Rt, imm:$CRn, imm:$CRm, imm:$opc2)]>, ComplexDeprecationPredicate<"MCR">; def : t2InstAlias<"mcr${p} $cop, $opc1, $Rt, $CRn, $CRm", (t2MCR p_imm:$cop, imm0_7:$opc1, GPR:$Rt, c_imm:$CRn, c_imm:$CRm, 0, pred:$p)>; def t2MCR2 : t2MovRCopro<0b1111, "mcr2", 0, (outs), (ins p_imm:$cop, imm0_7:$opc1, GPR:$Rt, c_imm:$CRn, c_imm:$CRm, imm0_7:$opc2), [(int_arm_mcr2 imm:$cop, imm:$opc1, GPR:$Rt, imm:$CRn, imm:$CRm, imm:$opc2)]> { let Predicates = [IsThumb2, PreV8]; } def : t2InstAlias<"mcr2${p} $cop, $opc1, $Rt, $CRn, $CRm", (t2MCR2 p_imm:$cop, imm0_7:$opc1, GPR:$Rt, c_imm:$CRn, c_imm:$CRm, 0, pred:$p)>; /* from coprocessor to ARM core register */ def t2MRC : t2MovRCopro<0b1110, "mrc", 1, (outs GPRwithAPSR:$Rt), (ins p_imm:$cop, imm0_7:$opc1, c_imm:$CRn, c_imm:$CRm, imm0_7:$opc2), []>; def : t2InstAlias<"mrc${p} $cop, $opc1, $Rt, $CRn, $CRm", (t2MRC GPRwithAPSR:$Rt, p_imm:$cop, imm0_7:$opc1, c_imm:$CRn, c_imm:$CRm, 0, pred:$p)>; def t2MRC2 : t2MovRCopro<0b1111, "mrc2", 1, (outs GPRwithAPSR:$Rt), (ins p_imm:$cop, imm0_7:$opc1, c_imm:$CRn, c_imm:$CRm, imm0_7:$opc2), []> { let Predicates = [IsThumb2, PreV8]; } def : t2InstAlias<"mrc2${p} $cop, $opc1, $Rt, $CRn, $CRm", (t2MRC2 GPRwithAPSR:$Rt, p_imm:$cop, imm0_7:$opc1, c_imm:$CRn, c_imm:$CRm, 0, pred:$p)>; def : T2v6Pat<(int_arm_mrc imm:$cop, imm:$opc1, imm:$CRn, imm:$CRm, imm:$opc2), (t2MRC imm:$cop, imm:$opc1, imm:$CRn, imm:$CRm, imm:$opc2)>; def : T2v6Pat<(int_arm_mrc2 imm:$cop, imm:$opc1, imm:$CRn, imm:$CRm, imm:$opc2), (t2MRC2 imm:$cop, imm:$opc1, imm:$CRn, imm:$CRm, imm:$opc2)>; /* from ARM core register to coprocessor */ def t2MCRR : t2MovRRCopro<0b1110, "mcrr", 0, (outs), (ins p_imm:$cop, imm0_15:$opc1, GPR:$Rt, GPR:$Rt2, c_imm:$CRm), [(int_arm_mcrr imm:$cop, imm:$opc1, GPR:$Rt, GPR:$Rt2, imm:$CRm)]>; def t2MCRR2 : t2MovRRCopro<0b1111, "mcrr2", 0, (outs), (ins p_imm:$cop, imm0_15:$opc1, GPR:$Rt, GPR:$Rt2, c_imm:$CRm), 
[(int_arm_mcrr2 imm:$cop, imm:$opc1, GPR:$Rt, GPR:$Rt2, imm:$CRm)]> { let Predicates = [IsThumb2, PreV8]; } /* from coprocessor to ARM core register */ def t2MRRC : t2MovRRCopro<0b1110, "mrrc", 1, (outs GPR:$Rt, GPR:$Rt2), (ins p_imm:$cop, imm0_15:$opc1, c_imm:$CRm)>; def t2MRRC2 : t2MovRRCopro<0b1111, "mrrc2", 1, (outs GPR:$Rt, GPR:$Rt2), (ins p_imm:$cop, imm0_15:$opc1, c_imm:$CRm)> { let Predicates = [IsThumb2, PreV8]; } //===----------------------------------------------------------------------===// // Other Coprocessor Instructions. // def t2CDP : T2Cop<0b1110, (outs), (ins p_imm:$cop, imm0_15:$opc1, c_imm:$CRd, c_imm:$CRn, c_imm:$CRm, imm0_7:$opc2), "cdp", "\t$cop, $opc1, $CRd, $CRn, $CRm, $opc2", [(int_arm_cdp imm:$cop, imm:$opc1, imm:$CRd, imm:$CRn, imm:$CRm, imm:$opc2)]> { let Inst{27-24} = 0b1110; bits<4> opc1; bits<4> CRn; bits<4> CRd; bits<4> cop; bits<3> opc2; bits<4> CRm; let Inst{3-0} = CRm; let Inst{4} = 0; let Inst{7-5} = opc2; let Inst{11-8} = cop; let Inst{15-12} = CRd; let Inst{19-16} = CRn; let Inst{23-20} = opc1; let Predicates = [IsThumb2, PreV8]; let DecoderNamespace = "Thumb2CoProc"; } def t2CDP2 : T2Cop<0b1111, (outs), (ins p_imm:$cop, imm0_15:$opc1, c_imm:$CRd, c_imm:$CRn, c_imm:$CRm, imm0_7:$opc2), "cdp2", "\t$cop, $opc1, $CRd, $CRn, $CRm, $opc2", [(int_arm_cdp2 imm:$cop, imm:$opc1, imm:$CRd, imm:$CRn, imm:$CRm, imm:$opc2)]> { let Inst{27-24} = 0b1110; bits<4> opc1; bits<4> CRn; bits<4> CRd; bits<4> cop; bits<3> opc2; bits<4> CRm; let Inst{3-0} = CRm; let Inst{4} = 0; let Inst{7-5} = opc2; let Inst{11-8} = cop; let Inst{15-12} = CRd; let Inst{19-16} = CRn; let Inst{23-20} = opc1; let Predicates = [IsThumb2, PreV8]; let DecoderNamespace = "Thumb2CoProc"; } //===----------------------------------------------------------------------===// // ARMv8.1 Privilege Access Never extension // // SETPAN #imm1 def t2SETPAN : T1I<(outs), (ins imm0_1:$imm), NoItinerary, "setpan\t$imm", []>, T1Misc<0b0110000>, Requires<[IsThumb2, HasV8, HasV8_1a]> { bits<1> imm; let Inst{4} = 0b1; let Inst{3} = imm; let Inst{2-0} = 0b000; let Unpredictable{4} = 0b1; let Unpredictable{2-0} = 0b111; } //===----------------------------------------------------------------------===// // ARMv8-M Security Extensions instructions // let hasSideEffects = 1 in def t2SG : T2I<(outs), (ins), NoItinerary, "sg", "", []>, Requires<[Has8MSecExt]> { let Inst = 0xe97fe97f; } class T2TT at, string asm, list pattern> : T2I<(outs rGPR:$Rt), (ins GPRnopc:$Rn), NoItinerary, asm, "\t$Rt, $Rn", pattern> { bits<4> Rn; bits<4> Rt; let Inst{31-20} = 0b111010000100; let Inst{19-16} = Rn; let Inst{15-12} = 0b1111; let Inst{11-8} = Rt; let Inst{7-6} = at; let Inst{5-0} = 0b000000; let Unpredictable{5-0} = 0b111111; } def t2TT : T2TT<0b00, "tt", []>, Requires<[IsThumb,Has8MSecExt]>; def t2TTT : T2TT<0b01, "ttt", []>, Requires<[IsThumb,Has8MSecExt]>; def t2TTA : T2TT<0b10, "tta", []>, Requires<[IsThumb,Has8MSecExt]>; def t2TTAT : T2TT<0b11, "ttat", []>, Requires<[IsThumb,Has8MSecExt]>; //===----------------------------------------------------------------------===// // Non-Instruction Patterns // // SXT/UXT with no rotate let AddedComplexity = 16 in { def : T2Pat<(and rGPR:$Rm, 0x000000FF), (t2UXTB rGPR:$Rm, 0)>, Requires<[IsThumb2]>; def : T2Pat<(and rGPR:$Rm, 0x0000FFFF), (t2UXTH rGPR:$Rm, 0)>, Requires<[IsThumb2]>; def : T2Pat<(and rGPR:$Rm, 0x00FF00FF), (t2UXTB16 rGPR:$Rm, 0)>, Requires<[HasDSP, IsThumb2]>; def : T2Pat<(add rGPR:$Rn, (and rGPR:$Rm, 0x00FF)), (t2UXTAB rGPR:$Rn, rGPR:$Rm, 0)>, Requires<[HasDSP, IsThumb2]>; def 
: T2Pat<(add rGPR:$Rn, (and rGPR:$Rm, 0xFFFF)), (t2UXTAH rGPR:$Rn, rGPR:$Rm, 0)>, Requires<[HasDSP, IsThumb2]>; } def : T2Pat<(sext_inreg rGPR:$Src, i8), (t2SXTB rGPR:$Src, 0)>, Requires<[IsThumb2]>; def : T2Pat<(sext_inreg rGPR:$Src, i16), (t2SXTH rGPR:$Src, 0)>, Requires<[IsThumb2]>; def : T2Pat<(add rGPR:$Rn, (sext_inreg rGPR:$Rm, i8)), (t2SXTAB rGPR:$Rn, rGPR:$Rm, 0)>, Requires<[HasDSP, IsThumb2]>; def : T2Pat<(add rGPR:$Rn, (sext_inreg rGPR:$Rm, i16)), (t2SXTAH rGPR:$Rn, rGPR:$Rm, 0)>, Requires<[HasDSP, IsThumb2]>; // Atomic load/store patterns def : T2Pat<(atomic_load_8 t2addrmode_imm12:$addr), (t2LDRBi12 t2addrmode_imm12:$addr)>; def : T2Pat<(atomic_load_8 t2addrmode_negimm8:$addr), (t2LDRBi8 t2addrmode_negimm8:$addr)>; def : T2Pat<(atomic_load_8 t2addrmode_so_reg:$addr), (t2LDRBs t2addrmode_so_reg:$addr)>; def : T2Pat<(atomic_load_16 t2addrmode_imm12:$addr), (t2LDRHi12 t2addrmode_imm12:$addr)>; def : T2Pat<(atomic_load_16 t2addrmode_negimm8:$addr), (t2LDRHi8 t2addrmode_negimm8:$addr)>; def : T2Pat<(atomic_load_16 t2addrmode_so_reg:$addr), (t2LDRHs t2addrmode_so_reg:$addr)>; def : T2Pat<(atomic_load_32 t2addrmode_imm12:$addr), (t2LDRi12 t2addrmode_imm12:$addr)>; def : T2Pat<(atomic_load_32 t2addrmode_negimm8:$addr), (t2LDRi8 t2addrmode_negimm8:$addr)>; def : T2Pat<(atomic_load_32 t2addrmode_so_reg:$addr), (t2LDRs t2addrmode_so_reg:$addr)>; def : T2Pat<(atomic_store_8 t2addrmode_imm12:$addr, GPR:$val), (t2STRBi12 GPR:$val, t2addrmode_imm12:$addr)>; def : T2Pat<(atomic_store_8 t2addrmode_negimm8:$addr, GPR:$val), (t2STRBi8 GPR:$val, t2addrmode_negimm8:$addr)>; def : T2Pat<(atomic_store_8 t2addrmode_so_reg:$addr, GPR:$val), (t2STRBs GPR:$val, t2addrmode_so_reg:$addr)>; def : T2Pat<(atomic_store_16 t2addrmode_imm12:$addr, GPR:$val), (t2STRHi12 GPR:$val, t2addrmode_imm12:$addr)>; def : T2Pat<(atomic_store_16 t2addrmode_negimm8:$addr, GPR:$val), (t2STRHi8 GPR:$val, t2addrmode_negimm8:$addr)>; def : T2Pat<(atomic_store_16 t2addrmode_so_reg:$addr, GPR:$val), (t2STRHs GPR:$val, t2addrmode_so_reg:$addr)>; def : T2Pat<(atomic_store_32 t2addrmode_imm12:$addr, GPR:$val), (t2STRi12 GPR:$val, t2addrmode_imm12:$addr)>; def : T2Pat<(atomic_store_32 t2addrmode_negimm8:$addr, GPR:$val), (t2STRi8 GPR:$val, t2addrmode_negimm8:$addr)>; def : T2Pat<(atomic_store_32 t2addrmode_so_reg:$addr, GPR:$val), (t2STRs GPR:$val, t2addrmode_so_reg:$addr)>; let AddedComplexity = 8 in { def : T2Pat<(atomic_load_acquire_8 addr_offset_none:$addr), (t2LDAB addr_offset_none:$addr)>; def : T2Pat<(atomic_load_acquire_16 addr_offset_none:$addr), (t2LDAH addr_offset_none:$addr)>; def : T2Pat<(atomic_load_acquire_32 addr_offset_none:$addr), (t2LDA addr_offset_none:$addr)>; def : T2Pat<(atomic_store_release_8 addr_offset_none:$addr, GPR:$val), (t2STLB GPR:$val, addr_offset_none:$addr)>; def : T2Pat<(atomic_store_release_16 addr_offset_none:$addr, GPR:$val), (t2STLH GPR:$val, addr_offset_none:$addr)>; def : T2Pat<(atomic_store_release_32 addr_offset_none:$addr, GPR:$val), (t2STL GPR:$val, addr_offset_none:$addr)>; } //===----------------------------------------------------------------------===// // Assembler aliases // // Aliases for ADC without the ".w" optional width specifier. def : t2InstAlias<"adc${s}${p} $Rd, $Rn, $Rm", (t2ADCrr rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias<"adc${s}${p} $Rd, $Rn, $ShiftedRm", (t2ADCrs rGPR:$Rd, rGPR:$Rn, t2_so_reg:$ShiftedRm, pred:$p, cc_out:$s)>; // Aliases for SBC without the ".w" optional width specifier. 
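// With these aliases the width suffix is optional in the source; e.g.
//   sbcs r0, r1, r2
// matches t2SBCrr exactly as "sbcs.w r0, r1, r2" would (when a 16-bit
// encoding applies, the usual narrow-encoding preference still takes it).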
def : t2InstAlias<"sbc${s}${p} $Rd, $Rn, $Rm", (t2SBCrr rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias<"sbc${s}${p} $Rd, $Rn, $ShiftedRm", (t2SBCrs rGPR:$Rd, rGPR:$Rn, t2_so_reg:$ShiftedRm, pred:$p, cc_out:$s)>; // Aliases for ADD without the ".w" optional width specifier. def : t2InstAlias<"add${s}${p} $Rd, $Rn, $imm", (t2ADDri GPRnopc:$Rd, GPRnopc:$Rn, t2_so_imm:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias<"add${p} $Rd, $Rn, $imm", (t2ADDri12 GPRnopc:$Rd, GPR:$Rn, imm0_4095:$imm, pred:$p)>; def : t2InstAlias<"add${s}${p} $Rd, $Rn, $Rm", (t2ADDrr GPRnopc:$Rd, GPRnopc:$Rn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias<"add${s}${p} $Rd, $Rn, $ShiftedRm", (t2ADDrs GPRnopc:$Rd, GPRnopc:$Rn, t2_so_reg:$ShiftedRm, pred:$p, cc_out:$s)>; // ... and with the destination and source register combined. def : t2InstAlias<"add${s}${p} $Rdn, $imm", (t2ADDri GPRnopc:$Rdn, GPRnopc:$Rdn, t2_so_imm:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias<"add${p} $Rdn, $imm", (t2ADDri12 GPRnopc:$Rdn, GPRnopc:$Rdn, imm0_4095:$imm, pred:$p)>; def : t2InstAlias<"add${s}${p} $Rdn, $Rm", (t2ADDrr GPRnopc:$Rdn, GPRnopc:$Rdn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias<"add${s}${p} $Rdn, $ShiftedRm", (t2ADDrs GPRnopc:$Rdn, GPRnopc:$Rdn, t2_so_reg:$ShiftedRm, pred:$p, cc_out:$s)>; // add w/ negative immediates is just a sub. def : t2InstSubst<"add${s}${p} $Rd, $Rn, $imm", (t2SUBri GPRnopc:$Rd, GPRnopc:$Rn, t2_so_imm_neg:$imm, pred:$p, cc_out:$s)>; def : t2InstSubst<"add${p} $Rd, $Rn, $imm", (t2SUBri12 GPRnopc:$Rd, GPR:$Rn, imm0_4095_neg:$imm, pred:$p)>; def : t2InstSubst<"add${s}${p} $Rdn, $imm", (t2SUBri GPRnopc:$Rdn, GPRnopc:$Rdn, t2_so_imm_neg:$imm, pred:$p, cc_out:$s)>; def : t2InstSubst<"add${p} $Rdn, $imm", (t2SUBri12 GPRnopc:$Rdn, GPRnopc:$Rdn, imm0_4095_neg:$imm, pred:$p)>; def : t2InstSubst<"add${s}${p}.w $Rd, $Rn, $imm", (t2SUBri GPRnopc:$Rd, GPRnopc:$Rn, t2_so_imm_neg:$imm, pred:$p, cc_out:$s)>; def : t2InstSubst<"addw${p} $Rd, $Rn, $imm", (t2SUBri12 GPRnopc:$Rd, GPR:$Rn, imm0_4095_neg:$imm, pred:$p)>; def : t2InstSubst<"add${s}${p}.w $Rdn, $imm", (t2SUBri GPRnopc:$Rdn, GPRnopc:$Rdn, t2_so_imm_neg:$imm, pred:$p, cc_out:$s)>; def : t2InstSubst<"addw${p} $Rdn, $imm", (t2SUBri12 GPRnopc:$Rdn, GPRnopc:$Rdn, imm0_4095_neg:$imm, pred:$p)>; // Aliases for SUB without the ".w" optional width specifier. def : t2InstAlias<"sub${s}${p} $Rd, $Rn, $imm", (t2SUBri GPRnopc:$Rd, GPRnopc:$Rn, t2_so_imm:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias<"sub${p} $Rd, $Rn, $imm", (t2SUBri12 GPRnopc:$Rd, GPR:$Rn, imm0_4095:$imm, pred:$p)>; def : t2InstAlias<"sub${s}${p} $Rd, $Rn, $Rm", (t2SUBrr GPRnopc:$Rd, GPRnopc:$Rn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias<"sub${s}${p} $Rd, $Rn, $ShiftedRm", (t2SUBrs GPRnopc:$Rd, GPRnopc:$Rn, t2_so_reg:$ShiftedRm, pred:$p, cc_out:$s)>; // ... and with the destination and source register combined. def : t2InstAlias<"sub${s}${p} $Rdn, $imm", (t2SUBri GPRnopc:$Rdn, GPRnopc:$Rdn, t2_so_imm:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias<"sub${p} $Rdn, $imm", (t2SUBri12 GPRnopc:$Rdn, GPRnopc:$Rdn, imm0_4095:$imm, pred:$p)>; def : t2InstAlias<"sub${s}${p}.w $Rdn, $Rm", (t2SUBrr GPRnopc:$Rdn, GPRnopc:$Rdn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias<"sub${s}${p} $Rdn, $Rm", (t2SUBrr GPRnopc:$Rdn, GPRnopc:$Rdn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias<"sub${s}${p} $Rdn, $ShiftedRm", (t2SUBrs GPRnopc:$Rdn, GPRnopc:$Rdn, t2_so_reg:$ShiftedRm, pred:$p, cc_out:$s)>; // Alias for compares without the ".w" optional width specifier. 
def : t2InstAlias<"cmn${p} $Rn, $Rm", (t2CMNzrr GPRnopc:$Rn, rGPR:$Rm, pred:$p)>; def : t2InstAlias<"teq${p} $Rn, $Rm", (t2TEQrr GPRnopc:$Rn, rGPR:$Rm, pred:$p)>; def : t2InstAlias<"tst${p} $Rn, $Rm", (t2TSTrr GPRnopc:$Rn, rGPR:$Rm, pred:$p)>; // Memory barriers def : InstAlias<"dmb${p}", (t2DMB 0xf, pred:$p), 0>, Requires<[HasDB]>; def : InstAlias<"dsb${p}", (t2DSB 0xf, pred:$p), 0>, Requires<[HasDB]>; def : InstAlias<"isb${p}", (t2ISB 0xf, pred:$p), 0>, Requires<[HasDB]>; // Armv8-R 'Data Full Barrier' def : InstAlias<"dfb${p}", (t2DSB 0xc, pred:$p), 1>, Requires<[HasDFB]>; // Alias for LDR, LDRB, LDRH, LDRSB, and LDRSH without the ".w" optional // width specifier. def : t2InstAlias<"ldr${p} $Rt, $addr", (t2LDRi12 GPR:$Rt, t2addrmode_imm12:$addr, pred:$p)>; def : t2InstAlias<"ldrb${p} $Rt, $addr", (t2LDRBi12 rGPR:$Rt, t2addrmode_imm12:$addr, pred:$p)>; def : t2InstAlias<"ldrh${p} $Rt, $addr", (t2LDRHi12 rGPR:$Rt, t2addrmode_imm12:$addr, pred:$p)>; def : t2InstAlias<"ldrsb${p} $Rt, $addr", (t2LDRSBi12 rGPR:$Rt, t2addrmode_imm12:$addr, pred:$p)>; def : t2InstAlias<"ldrsh${p} $Rt, $addr", (t2LDRSHi12 rGPR:$Rt, t2addrmode_imm12:$addr, pred:$p)>; def : t2InstAlias<"ldr${p} $Rt, $addr", (t2LDRs GPR:$Rt, t2addrmode_so_reg:$addr, pred:$p)>; def : t2InstAlias<"ldrb${p} $Rt, $addr", (t2LDRBs rGPR:$Rt, t2addrmode_so_reg:$addr, pred:$p)>; def : t2InstAlias<"ldrh${p} $Rt, $addr", (t2LDRHs rGPR:$Rt, t2addrmode_so_reg:$addr, pred:$p)>; def : t2InstAlias<"ldrsb${p} $Rt, $addr", (t2LDRSBs rGPR:$Rt, t2addrmode_so_reg:$addr, pred:$p)>; def : t2InstAlias<"ldrsh${p} $Rt, $addr", (t2LDRSHs rGPR:$Rt, t2addrmode_so_reg:$addr, pred:$p)>; def : t2InstAlias<"ldr${p} $Rt, $addr", (t2LDRpci GPR:$Rt, t2ldrlabel:$addr, pred:$p)>; def : t2InstAlias<"ldrb${p} $Rt, $addr", (t2LDRBpci rGPR:$Rt, t2ldrlabel:$addr, pred:$p)>; def : t2InstAlias<"ldrh${p} $Rt, $addr", (t2LDRHpci rGPR:$Rt, t2ldrlabel:$addr, pred:$p)>; def : t2InstAlias<"ldrsb${p} $Rt, $addr", (t2LDRSBpci rGPR:$Rt, t2ldrlabel:$addr, pred:$p)>; def : t2InstAlias<"ldrsh${p} $Rt, $addr", (t2LDRSHpci rGPR:$Rt, t2ldrlabel:$addr, pred:$p)>; // Alias for MVN with(out) the ".w" optional width specifier. def : t2InstAlias<"mvn${s}${p}.w $Rd, $imm", (t2MVNi rGPR:$Rd, t2_so_imm:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias<"mvn${s}${p} $Rd, $Rm", (t2MVNr rGPR:$Rd, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias<"mvn${s}${p} $Rd, $ShiftedRm", (t2MVNs rGPR:$Rd, t2_so_reg:$ShiftedRm, pred:$p, cc_out:$s)>; // PKHBT/PKHTB with default shift amount. PKHTB is equivalent to PKHBT with the // input operands swapped when the shift amount is zero (i.e., unspecified). 
def : InstAlias<"pkhbt${p} $Rd, $Rn, $Rm", (t2PKHBT rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, 0, pred:$p), 0>, Requires<[HasDSP, IsThumb2]>; def : InstAlias<"pkhtb${p} $Rd, $Rn, $Rm", (t2PKHBT rGPR:$Rd, rGPR:$Rm, rGPR:$Rn, 0, pred:$p), 0>, Requires<[HasDSP, IsThumb2]>; // PUSH/POP aliases for STM/LDM def : t2InstAlias<"push${p}.w $regs", (t2STMDB_UPD SP, pred:$p, reglist:$regs)>; def : t2InstAlias<"push${p} $regs", (t2STMDB_UPD SP, pred:$p, reglist:$regs)>; def : t2InstAlias<"pop${p}.w $regs", (t2LDMIA_UPD SP, pred:$p, reglist:$regs)>; def : t2InstAlias<"pop${p} $regs", (t2LDMIA_UPD SP, pred:$p, reglist:$regs)>; // STMIA/STMIA_UPD aliases w/o the optional .w suffix def : t2InstAlias<"stm${p} $Rn, $regs", (t2STMIA GPR:$Rn, pred:$p, reglist:$regs)>; def : t2InstAlias<"stm${p} $Rn!, $regs", (t2STMIA_UPD GPR:$Rn, pred:$p, reglist:$regs)>; // LDMIA/LDMIA_UPD aliases w/o the optional .w suffix def : t2InstAlias<"ldm${p} $Rn, $regs", (t2LDMIA GPR:$Rn, pred:$p, reglist:$regs)>; def : t2InstAlias<"ldm${p} $Rn!, $regs", (t2LDMIA_UPD GPR:$Rn, pred:$p, reglist:$regs)>; // STMDB/STMDB_UPD aliases w/ the optional .w suffix def : t2InstAlias<"stmdb${p}.w $Rn, $regs", (t2STMDB GPR:$Rn, pred:$p, reglist:$regs)>; def : t2InstAlias<"stmdb${p}.w $Rn!, $regs", (t2STMDB_UPD GPR:$Rn, pred:$p, reglist:$regs)>; // LDMDB/LDMDB_UPD aliases w/ the optional .w suffix def : t2InstAlias<"ldmdb${p}.w $Rn, $regs", (t2LDMDB GPR:$Rn, pred:$p, reglist:$regs)>; def : t2InstAlias<"ldmdb${p}.w $Rn!, $regs", (t2LDMDB_UPD GPR:$Rn, pred:$p, reglist:$regs)>; // Alias for REV/REV16/REVSH without the ".w" optional width specifier. def : t2InstAlias<"rev${p} $Rd, $Rm", (t2REV rGPR:$Rd, rGPR:$Rm, pred:$p)>; def : t2InstAlias<"rev16${p} $Rd, $Rm", (t2REV16 rGPR:$Rd, rGPR:$Rm, pred:$p)>; def : t2InstAlias<"revsh${p} $Rd, $Rm", (t2REVSH rGPR:$Rd, rGPR:$Rm, pred:$p)>; // Alias for RSB without the ".w" optional width specifier, and with optional // implied destination register. def : t2InstAlias<"rsb${s}${p} $Rd, $Rn, $imm", (t2RSBri rGPR:$Rd, rGPR:$Rn, t2_so_imm:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias<"rsb${s}${p} $Rdn, $imm", (t2RSBri rGPR:$Rdn, rGPR:$Rdn, t2_so_imm:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias<"rsb${s}${p} $Rdn, $Rm", (t2RSBrr rGPR:$Rdn, rGPR:$Rdn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias<"rsb${s}${p} $Rdn, $ShiftedRm", (t2RSBrs rGPR:$Rdn, rGPR:$Rdn, t2_so_reg:$ShiftedRm, pred:$p, cc_out:$s)>; // SSAT/USAT optional shift operand. def : t2InstAlias<"ssat${p} $Rd, $sat_imm, $Rn", (t2SSAT rGPR:$Rd, imm1_32:$sat_imm, rGPR:$Rn, 0, pred:$p)>; def : t2InstAlias<"usat${p} $Rd, $sat_imm, $Rn", (t2USAT rGPR:$Rd, imm0_31:$sat_imm, rGPR:$Rn, 0, pred:$p)>; // STM w/o the .w suffix. def : t2InstAlias<"stm${p} $Rn, $regs", (t2STMIA GPR:$Rn, pred:$p, reglist:$regs)>; // Alias for STR, STRB, and STRH without the ".w" optional // width specifier. def : t2InstAlias<"str${p} $Rt, $addr", (t2STRi12 GPR:$Rt, t2addrmode_imm12:$addr, pred:$p)>; def : t2InstAlias<"strb${p} $Rt, $addr", (t2STRBi12 rGPR:$Rt, t2addrmode_imm12:$addr, pred:$p)>; def : t2InstAlias<"strh${p} $Rt, $addr", (t2STRHi12 rGPR:$Rt, t2addrmode_imm12:$addr, pred:$p)>; def : t2InstAlias<"str${p} $Rt, $addr", (t2STRs GPR:$Rt, t2addrmode_so_reg:$addr, pred:$p)>; def : t2InstAlias<"strb${p} $Rt, $addr", (t2STRBs rGPR:$Rt, t2addrmode_so_reg:$addr, pred:$p)>; def : t2InstAlias<"strh${p} $Rt, $addr", (t2STRHs rGPR:$Rt, t2addrmode_so_reg:$addr, pred:$p)>; // Extend instruction optional rotate operand. 
def : InstAlias<"sxtab${p} $Rd, $Rn, $Rm", (t2SXTAB rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, 0, pred:$p), 0>, Requires<[HasDSP, IsThumb2]>; def : InstAlias<"sxtah${p} $Rd, $Rn, $Rm", (t2SXTAH rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, 0, pred:$p), 0>, Requires<[HasDSP, IsThumb2]>; def : InstAlias<"sxtab16${p} $Rd, $Rn, $Rm", (t2SXTAB16 rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, 0, pred:$p), 0>, Requires<[HasDSP, IsThumb2]>; def : InstAlias<"sxtb16${p} $Rd, $Rm", (t2SXTB16 rGPR:$Rd, rGPR:$Rm, 0, pred:$p), 0>, Requires<[HasDSP, IsThumb2]>; def : t2InstAlias<"sxtb${p} $Rd, $Rm", (t2SXTB rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"sxth${p} $Rd, $Rm", (t2SXTH rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"sxtb${p}.w $Rd, $Rm", (t2SXTB rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"sxth${p}.w $Rd, $Rm", (t2SXTH rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; def : InstAlias<"uxtab${p} $Rd, $Rn, $Rm", (t2UXTAB rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, 0, pred:$p), 0>, Requires<[HasDSP, IsThumb2]>; def : InstAlias<"uxtah${p} $Rd, $Rn, $Rm", (t2UXTAH rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, 0, pred:$p), 0>, Requires<[HasDSP, IsThumb2]>; def : InstAlias<"uxtab16${p} $Rd, $Rn, $Rm", (t2UXTAB16 rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, 0, pred:$p), 0>, Requires<[HasDSP, IsThumb2]>; def : InstAlias<"uxtb16${p} $Rd, $Rm", (t2UXTB16 rGPR:$Rd, rGPR:$Rm, 0, pred:$p), 0>, Requires<[HasDSP, IsThumb2]>; def : t2InstAlias<"uxtb${p} $Rd, $Rm", (t2UXTB rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"uxth${p} $Rd, $Rm", (t2UXTH rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"uxtb${p}.w $Rd, $Rm", (t2UXTB rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"uxth${p}.w $Rd, $Rm", (t2UXTH rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; // Extend instruction w/o the ".w" optional width specifier. def : t2InstAlias<"uxtb${p} $Rd, $Rm$rot", (t2UXTB rGPR:$Rd, rGPR:$Rm, rot_imm:$rot, pred:$p)>; def : InstAlias<"uxtb16${p} $Rd, $Rm$rot", (t2UXTB16 rGPR:$Rd, rGPR:$Rm, rot_imm:$rot, pred:$p), 0>, Requires<[HasDSP, IsThumb2]>; def : t2InstAlias<"uxth${p} $Rd, $Rm$rot", (t2UXTH rGPR:$Rd, rGPR:$Rm, rot_imm:$rot, pred:$p)>; def : t2InstAlias<"sxtb${p} $Rd, $Rm$rot", (t2SXTB rGPR:$Rd, rGPR:$Rm, rot_imm:$rot, pred:$p)>; def : InstAlias<"sxtb16${p} $Rd, $Rm$rot", (t2SXTB16 rGPR:$Rd, rGPR:$Rm, rot_imm:$rot, pred:$p), 0>, Requires<[HasDSP, IsThumb2]>; def : t2InstAlias<"sxth${p} $Rd, $Rm$rot", (t2SXTH rGPR:$Rd, rGPR:$Rm, rot_imm:$rot, pred:$p)>; // "mov Rd, t2_so_imm_not" can be handled via "mvn" in assembly, just like // for isel. 
def : t2InstSubst<"mov${p} $Rd, $imm", (t2MVNi rGPR:$Rd, t2_so_imm_not:$imm, pred:$p, zero_reg)>; def : t2InstSubst<"mvn${s}${p} $Rd, $imm", (t2MOVi rGPR:$Rd, t2_so_imm_not:$imm, pred:$p, s_cc_out:$s)>; // Same for AND <--> BIC def : t2InstSubst<"bic${s}${p} $Rd, $Rn, $imm", (t2ANDri rGPR:$Rd, rGPR:$Rn, t2_so_imm_not:$imm, pred:$p, cc_out:$s)>; def : t2InstSubst<"bic${s}${p} $Rdn, $imm", (t2ANDri rGPR:$Rdn, rGPR:$Rdn, t2_so_imm_not:$imm, pred:$p, cc_out:$s)>; def : t2InstSubst<"bic${s}${p}.w $Rd, $Rn, $imm", (t2ANDri rGPR:$Rd, rGPR:$Rn, t2_so_imm_not:$imm, pred:$p, cc_out:$s)>; def : t2InstSubst<"bic${s}${p}.w $Rdn, $imm", (t2ANDri rGPR:$Rdn, rGPR:$Rdn, t2_so_imm_not:$imm, pred:$p, cc_out:$s)>; def : t2InstSubst<"and${s}${p} $Rd, $Rn, $imm", (t2BICri rGPR:$Rd, rGPR:$Rn, t2_so_imm_not:$imm, pred:$p, cc_out:$s)>; def : t2InstSubst<"and${s}${p} $Rdn, $imm", (t2BICri rGPR:$Rdn, rGPR:$Rdn, t2_so_imm_not:$imm, pred:$p, cc_out:$s)>; def : t2InstSubst<"and${s}${p}.w $Rd, $Rn, $imm", (t2BICri rGPR:$Rd, rGPR:$Rn, t2_so_imm_not:$imm, pred:$p, cc_out:$s)>; def : t2InstSubst<"and${s}${p}.w $Rdn, $imm", (t2BICri rGPR:$Rdn, rGPR:$Rdn, t2_so_imm_not:$imm, pred:$p, cc_out:$s)>; // And ORR <--> ORN def : t2InstSubst<"orn${s}${p} $Rd, $Rn, $imm", (t2ORRri rGPR:$Rd, rGPR:$Rn, t2_so_imm_not:$imm, pred:$p, cc_out:$s)>; def : t2InstSubst<"orn${s}${p} $Rdn, $imm", (t2ORRri rGPR:$Rdn, rGPR:$Rdn, t2_so_imm_not:$imm, pred:$p, cc_out:$s)>; def : t2InstSubst<"orr${s}${p} $Rd, $Rn, $imm", (t2ORNri rGPR:$Rd, rGPR:$Rn, t2_so_imm_not:$imm, pred:$p, cc_out:$s)>; def : t2InstSubst<"orr${s}${p} $Rdn, $imm", (t2ORNri rGPR:$Rdn, rGPR:$Rdn, t2_so_imm_not:$imm, pred:$p, cc_out:$s)>; // Likewise, "add Rd, t2_so_imm_neg" -> sub def : t2InstSubst<"add${s}${p} $Rd, $Rn, $imm", (t2SUBri GPRnopc:$Rd, GPRnopc:$Rn, t2_so_imm_neg:$imm, pred:$p, cc_out:$s)>; def : t2InstSubst<"add${s}${p} $Rd, $imm", (t2SUBri GPRnopc:$Rd, GPRnopc:$Rd, t2_so_imm_neg:$imm, pred:$p, cc_out:$s)>; // Same for CMP <--> CMN via t2_so_imm_neg def : t2InstSubst<"cmp${p} $Rd, $imm", (t2CMNri rGPR:$Rd, t2_so_imm_neg:$imm, pred:$p)>; def : t2InstSubst<"cmn${p} $Rd, $imm", (t2CMPri rGPR:$Rd, t2_so_imm_neg:$imm, pred:$p)>; // Wide 'mul' encoding can be specified with only two operands. def : t2InstAlias<"mul${p} $Rn, $Rm", (t2MUL rGPR:$Rn, rGPR:$Rm, rGPR:$Rn, pred:$p)>; // "neg" is and alias for "rsb rd, rn, #0" def : t2InstAlias<"neg${s}${p} $Rd, $Rm", (t2RSBri rGPR:$Rd, rGPR:$Rm, 0, pred:$p, cc_out:$s)>; // MOV so_reg assembler pseudos. InstAlias isn't expressive enough for // these, unfortunately. // FIXME: LSL #0 in the shift should allow SP to be used as either the // source or destination (but not both). 
def t2MOVsi: t2AsmPseudo<"mov${p} $Rd, $shift", (ins rGPR:$Rd, t2_so_reg:$shift, pred:$p)>; def t2MOVSsi: t2AsmPseudo<"movs${p} $Rd, $shift", (ins rGPR:$Rd, t2_so_reg:$shift, pred:$p)>; def t2MOVsr: t2AsmPseudo<"mov${p} $Rd, $shift", (ins rGPR:$Rd, so_reg_reg:$shift, pred:$p)>; def t2MOVSsr: t2AsmPseudo<"movs${p} $Rd, $shift", (ins rGPR:$Rd, so_reg_reg:$shift, pred:$p)>; // Aliases for the above with the .w qualifier def : t2InstAlias<"mov${p}.w $Rd, $shift", (t2MOVsi rGPR:$Rd, t2_so_reg:$shift, pred:$p)>; def : t2InstAlias<"movs${p}.w $Rd, $shift", (t2MOVSsi rGPR:$Rd, t2_so_reg:$shift, pred:$p)>; def : t2InstAlias<"mov${p}.w $Rd, $shift", (t2MOVsr rGPR:$Rd, so_reg_reg:$shift, pred:$p)>; def : t2InstAlias<"movs${p}.w $Rd, $shift", (t2MOVSsr rGPR:$Rd, so_reg_reg:$shift, pred:$p)>; // ADR w/o the .w suffix def : t2InstAlias<"adr${p} $Rd, $addr", (t2ADR rGPR:$Rd, t2adrlabel:$addr, pred:$p)>; // LDR(literal) w/ alternate [pc, #imm] syntax. def t2LDRpcrel : t2AsmPseudo<"ldr${p} $Rt, $addr", (ins GPR:$Rt, t2ldr_pcrel_imm12:$addr, pred:$p)>; def t2LDRBpcrel : t2AsmPseudo<"ldrb${p} $Rt, $addr", (ins GPRnopc:$Rt, t2ldr_pcrel_imm12:$addr, pred:$p)>; def t2LDRHpcrel : t2AsmPseudo<"ldrh${p} $Rt, $addr", (ins GPRnopc:$Rt, t2ldr_pcrel_imm12:$addr, pred:$p)>; def t2LDRSBpcrel : t2AsmPseudo<"ldrsb${p} $Rt, $addr", (ins GPRnopc:$Rt, t2ldr_pcrel_imm12:$addr, pred:$p)>; def t2LDRSHpcrel : t2AsmPseudo<"ldrsh${p} $Rt, $addr", (ins GPRnopc:$Rt, t2ldr_pcrel_imm12:$addr, pred:$p)>; // Version w/ the .w suffix. def : t2InstAlias<"ldr${p}.w $Rt, $addr", (t2LDRpcrel GPR:$Rt, t2ldr_pcrel_imm12:$addr, pred:$p), 0>; def : t2InstAlias<"ldrb${p}.w $Rt, $addr", (t2LDRBpcrel GPRnopc:$Rt, t2ldr_pcrel_imm12:$addr, pred:$p)>; def : t2InstAlias<"ldrh${p}.w $Rt, $addr", (t2LDRHpcrel GPRnopc:$Rt, t2ldr_pcrel_imm12:$addr, pred:$p)>; def : t2InstAlias<"ldrsb${p}.w $Rt, $addr", (t2LDRSBpcrel GPRnopc:$Rt, t2ldr_pcrel_imm12:$addr, pred:$p)>; def : t2InstAlias<"ldrsh${p}.w $Rt, $addr", (t2LDRSHpcrel GPRnopc:$Rt, t2ldr_pcrel_imm12:$addr, pred:$p)>; def : t2InstAlias<"add${p} $Rd, pc, $imm", (t2ADR rGPR:$Rd, imm0_4095:$imm, pred:$p)>; // Pseudo instruction ldr Rt, =immediate def t2LDRConstPool : t2AsmPseudo<"ldr${p} $Rt, $immediate", (ins GPR:$Rt, const_pool_asm_imm:$immediate, pred:$p)>; // Version w/ the .w suffix. def : t2InstAlias<"ldr${p}.w $Rt, $immediate", (t2LDRConstPool GPRnopc:$Rt, const_pool_asm_imm:$immediate, pred:$p)>; // PLD/PLDW/PLI with alternate literal form. def : t2InstAlias<"pld${p} $addr", (t2PLDpci t2ldr_pcrel_imm12:$addr, pred:$p)>; def : InstAlias<"pli${p} $addr", (t2PLIpci t2ldr_pcrel_imm12:$addr, pred:$p), 0>, Requires<[IsThumb2,HasV7]>; Index: vendor/llvm/dist-release_70/lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h =================================================================== --- vendor/llvm/dist-release_70/lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h (revision 338574) +++ vendor/llvm/dist-release_70/lib/Target/ARM/MCTargetDesc/ARMBaseInfo.h (revision 338575) @@ -1,427 +1,429 @@ //===-- ARMBaseInfo.h - Top level definitions for ARM -------- --*- C++ -*-===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file contains small standalone helper functions and enum definitions for // the ARM target useful for the compiler back-end and the MC libraries. 
// As such, it deliberately does not include references to LLVM core // code gen types, passes, etc.. // //===----------------------------------------------------------------------===// #ifndef LLVM_LIB_TARGET_ARM_MCTARGETDESC_ARMBASEINFO_H #define LLVM_LIB_TARGET_ARM_MCTARGETDESC_ARMBASEINFO_H #include "ARMMCTargetDesc.h" #include "llvm/Support/ErrorHandling.h" #include "Utils/ARMBaseInfo.h" namespace llvm { namespace ARM_PROC { enum IMod { IE = 2, ID = 3 }; enum IFlags { F = 1, I = 2, A = 4 }; inline static const char *IFlagsToString(unsigned val) { switch (val) { default: llvm_unreachable("Unknown iflags operand"); case F: return "f"; case I: return "i"; case A: return "a"; } } inline static const char *IModToString(unsigned val) { switch (val) { default: llvm_unreachable("Unknown imod operand"); case IE: return "ie"; case ID: return "id"; } } } namespace ARM_MB { // The Memory Barrier Option constants map directly to the 4-bit encoding of // the option field for memory barrier operations. enum MemBOpt { RESERVED_0 = 0, OSHLD = 1, OSHST = 2, OSH = 3, RESERVED_4 = 4, NSHLD = 5, NSHST = 6, NSH = 7, RESERVED_8 = 8, ISHLD = 9, ISHST = 10, ISH = 11, RESERVED_12 = 12, LD = 13, ST = 14, SY = 15 }; inline static const char *MemBOptToString(unsigned val, bool HasV8) { switch (val) { default: llvm_unreachable("Unknown memory operation"); case SY: return "sy"; case ST: return "st"; case LD: return HasV8 ? "ld" : "#0xd"; case RESERVED_12: return "#0xc"; case ISH: return "ish"; case ISHST: return "ishst"; case ISHLD: return HasV8 ? "ishld" : "#0x9"; case RESERVED_8: return "#0x8"; case NSH: return "nsh"; case NSHST: return "nshst"; case NSHLD: return HasV8 ? "nshld" : "#0x5"; case RESERVED_4: return "#0x4"; case OSH: return "osh"; case OSHST: return "oshst"; case OSHLD: return HasV8 ? "oshld" : "#0x1"; case RESERVED_0: return "#0x0"; } } } // namespace ARM_MB namespace ARM_TSB { enum TraceSyncBOpt { CSYNC = 0 }; inline static const char *TraceSyncBOptToString(unsigned val) { switch (val) { default: llvm_unreachable("Unknown trace synchronization barrier operation"); case CSYNC: return "csync"; } } } // namespace ARM_TSB namespace ARM_ISB { enum InstSyncBOpt { RESERVED_0 = 0, RESERVED_1 = 1, RESERVED_2 = 2, RESERVED_3 = 3, RESERVED_4 = 4, RESERVED_5 = 5, RESERVED_6 = 6, RESERVED_7 = 7, RESERVED_8 = 8, RESERVED_9 = 9, RESERVED_10 = 10, RESERVED_11 = 11, RESERVED_12 = 12, RESERVED_13 = 13, RESERVED_14 = 14, SY = 15 }; inline static const char *InstSyncBOptToString(unsigned val) { switch (val) { default: llvm_unreachable("Unknown memory operation"); case RESERVED_0: return "#0x0"; case RESERVED_1: return "#0x1"; case RESERVED_2: return "#0x2"; case RESERVED_3: return "#0x3"; case RESERVED_4: return "#0x4"; case RESERVED_5: return "#0x5"; case RESERVED_6: return "#0x6"; case RESERVED_7: return "#0x7"; case RESERVED_8: return "#0x8"; case RESERVED_9: return "#0x9"; case RESERVED_10: return "#0xa"; case RESERVED_11: return "#0xb"; case RESERVED_12: return "#0xc"; case RESERVED_13: return "#0xd"; case RESERVED_14: return "#0xe"; case SY: return "sy"; } } } // namespace ARM_ISB /// isARMLowRegister - Returns true if the register is a low register (r0-r7). /// static inline bool isARMLowRegister(unsigned Reg) { using namespace ARM; switch (Reg) { case R0: case R1: case R2: case R3: case R4: case R5: case R6: case R7: return true; default: return false; } } /// ARMII - This namespace holds all of the target specific flags that /// instruction info tracks. 
/// namespace ARMII { /// ARM Index Modes enum IndexMode { IndexModeNone = 0, IndexModePre = 1, IndexModePost = 2, IndexModeUpd = 3 }; /// ARM Addressing Modes enum AddrMode { AddrModeNone = 0, AddrMode1 = 1, AddrMode2 = 2, AddrMode3 = 3, AddrMode4 = 4, AddrMode5 = 5, AddrMode6 = 6, AddrModeT1_1 = 7, AddrModeT1_2 = 8, AddrModeT1_4 = 9, AddrModeT1_s = 10, // i8 * 4 for pc and sp relative data AddrModeT2_i12 = 11, AddrModeT2_i8 = 12, AddrModeT2_so = 13, AddrModeT2_pc = 14, // +/- i12 for pc relative data AddrModeT2_i8s4 = 15, // i8 * 4 AddrMode_i12 = 16, - AddrMode5FP16 = 17 // i8 * 2 + AddrMode5FP16 = 17, // i8 * 2 + AddrModeT2_ldrex = 18, // i8 * 4, with unscaled offset in MCInst }; inline static const char *AddrModeToString(AddrMode addrmode) { switch (addrmode) { case AddrModeNone: return "AddrModeNone"; case AddrMode1: return "AddrMode1"; case AddrMode2: return "AddrMode2"; case AddrMode3: return "AddrMode3"; case AddrMode4: return "AddrMode4"; case AddrMode5: return "AddrMode5"; case AddrMode5FP16: return "AddrMode5FP16"; case AddrMode6: return "AddrMode6"; case AddrModeT1_1: return "AddrModeT1_1"; case AddrModeT1_2: return "AddrModeT1_2"; case AddrModeT1_4: return "AddrModeT1_4"; case AddrModeT1_s: return "AddrModeT1_s"; case AddrModeT2_i12: return "AddrModeT2_i12"; case AddrModeT2_i8: return "AddrModeT2_i8"; case AddrModeT2_so: return "AddrModeT2_so"; case AddrModeT2_pc: return "AddrModeT2_pc"; case AddrModeT2_i8s4: return "AddrModeT2_i8s4"; case AddrMode_i12: return "AddrMode_i12"; + case AddrModeT2_ldrex:return "AddrModeT2_ldrex"; } } /// Target Operand Flag enum. enum TOF { //===------------------------------------------------------------------===// // ARM Specific MachineOperand flags. MO_NO_FLAG = 0, /// MO_LO16 - On a symbol operand, this represents a relocation containing /// lower 16 bit of the address. Used only via movw instruction. MO_LO16 = 0x1, /// MO_HI16 - On a symbol operand, this represents a relocation containing /// higher 16 bit of the address. Used only via movt instruction. MO_HI16 = 0x2, /// MO_OPTION_MASK - Most flags are mutually exclusive; this mask selects /// just that part of the flag set. MO_OPTION_MASK = 0x3, /// MO_GOT - On a symbol operand, this represents a GOT relative relocation. MO_GOT = 0x8, /// MO_SBREL - On a symbol operand, this represents a static base relative /// relocation. Used in movw and movt instructions. MO_SBREL = 0x10, /// MO_DLLIMPORT - On a symbol operand, this represents that the reference /// to the symbol is for an import stub. This is used for DLL import /// storage class indication on Windows. MO_DLLIMPORT = 0x20, /// MO_SECREL - On a symbol operand this indicates that the immediate is /// the offset from beginning of section. /// /// This is the TLS offset for the COFF/Windows TLS mechanism. MO_SECREL = 0x40, /// MO_NONLAZY - This is an independent flag, on a symbol operand "FOO" it /// represents a symbol which, if indirect, will get special Darwin mangling /// as a non-lazy-ptr indirect symbol (i.e. "L_FOO$non_lazy_ptr"). Can be /// combined with MO_LO16, MO_HI16 or MO_NO_FLAG (in a constant-pool, for /// example). MO_NONLAZY = 0x80, // It's undefined behaviour if an enum overflows the range between its // smallest and largest values, but since these are |ed together, it can // happen. Put a sentinel in (values of this enum are stored as "unsigned // char"). MO_UNUSED_MAXIMUM = 0xff }; enum { //===------------------------------------------------------------------===// // Instruction Flags. 
//===------------------------------------------------------------------===// // This four-bit field describes the addressing mode used. AddrModeMask = 0x1f, // The AddrMode enums are declared in ARMBaseInfo.h // IndexMode - Unindex, pre-indexed, or post-indexed are valid for load // and store ops only. Generic "updating" flag is used for ld/st multiple. // The index mode enums are declared in ARMBaseInfo.h IndexModeShift = 5, IndexModeMask = 3 << IndexModeShift, //===------------------------------------------------------------------===// // Instruction encoding formats. // FormShift = 7, FormMask = 0x3f << FormShift, // Pseudo instructions Pseudo = 0 << FormShift, // Multiply instructions MulFrm = 1 << FormShift, // Branch instructions BrFrm = 2 << FormShift, BrMiscFrm = 3 << FormShift, // Data Processing instructions DPFrm = 4 << FormShift, DPSoRegFrm = 5 << FormShift, // Load and Store LdFrm = 6 << FormShift, StFrm = 7 << FormShift, LdMiscFrm = 8 << FormShift, StMiscFrm = 9 << FormShift, LdStMulFrm = 10 << FormShift, LdStExFrm = 11 << FormShift, // Miscellaneous arithmetic instructions ArithMiscFrm = 12 << FormShift, SatFrm = 13 << FormShift, // Extend instructions ExtFrm = 14 << FormShift, // VFP formats VFPUnaryFrm = 15 << FormShift, VFPBinaryFrm = 16 << FormShift, VFPConv1Frm = 17 << FormShift, VFPConv2Frm = 18 << FormShift, VFPConv3Frm = 19 << FormShift, VFPConv4Frm = 20 << FormShift, VFPConv5Frm = 21 << FormShift, VFPLdStFrm = 22 << FormShift, VFPLdStMulFrm = 23 << FormShift, VFPMiscFrm = 24 << FormShift, // Thumb format ThumbFrm = 25 << FormShift, // Miscelleaneous format MiscFrm = 26 << FormShift, // NEON formats NGetLnFrm = 27 << FormShift, NSetLnFrm = 28 << FormShift, NDupFrm = 29 << FormShift, NLdStFrm = 30 << FormShift, N1RegModImmFrm= 31 << FormShift, N2RegFrm = 32 << FormShift, NVCVTFrm = 33 << FormShift, NVDupLnFrm = 34 << FormShift, N2RegVShLFrm = 35 << FormShift, N2RegVShRFrm = 36 << FormShift, N3RegFrm = 37 << FormShift, N3RegVShFrm = 38 << FormShift, NVExtFrm = 39 << FormShift, NVMulSLFrm = 40 << FormShift, NVTBLFrm = 41 << FormShift, N3RegCplxFrm = 43 << FormShift, //===------------------------------------------------------------------===// // Misc flags. // UnaryDP - Indicates this is a unary data processing instruction, i.e. // it doesn't have a Rn operand. UnaryDP = 1 << 13, // Xform16Bit - Indicates this Thumb2 instruction may be transformed into // a 16-bit Thumb instruction if certain conditions are met. Xform16Bit = 1 << 14, // ThumbArithFlagSetting - The instruction is a 16-bit flag setting Thumb // instruction. Used by the parser to determine whether to require the 'S' // suffix on the mnemonic (when not in an IT block) or preclude it (when // in an IT block). ThumbArithFlagSetting = 1 << 18, //===------------------------------------------------------------------===// // Code domain. DomainShift = 15, DomainMask = 7 << DomainShift, DomainGeneral = 0 << DomainShift, DomainVFP = 1 << DomainShift, DomainNEON = 2 << DomainShift, DomainNEONA8 = 4 << DomainShift, //===------------------------------------------------------------------===// // Field shifts - such shifts are used to set field while generating // machine instructions. // // FIXME: This list will need adjusting/fixing as the MC code emitter // takes shape and the ARMCodeEmitter.cpp bits go away. 
ShiftTypeShift = 4, M_BitShift = 5, ShiftImmShift = 5, ShiftShift = 7, N_BitShift = 7, ImmHiShift = 8, SoRotImmShift = 8, RegRsShift = 8, ExtRotImmShift = 10, RegRdLoShift = 12, RegRdShift = 12, RegRdHiShift = 16, RegRnShift = 16, S_BitShift = 20, W_BitShift = 21, AM3_I_BitShift = 22, D_BitShift = 22, U_BitShift = 23, P_BitShift = 24, I_BitShift = 25, CondShift = 28 }; } // end namespace ARMII } // end namespace llvm; #endif Index: vendor/llvm/dist-release_70/lib/Target/ARM/Thumb2InstrInfo.cpp =================================================================== --- vendor/llvm/dist-release_70/lib/Target/ARM/Thumb2InstrInfo.cpp (revision 338574) +++ vendor/llvm/dist-release_70/lib/Target/ARM/Thumb2InstrInfo.cpp (revision 338575) @@ -1,679 +1,684 @@ //===- Thumb2InstrInfo.cpp - Thumb-2 Instruction Information --------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file contains the Thumb-2 implementation of the TargetInstrInfo class. // //===----------------------------------------------------------------------===// #include "Thumb2InstrInfo.h" #include "ARMMachineFunctionInfo.h" #include "MCTargetDesc/ARMAddressingModes.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineInstr.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/CodeGen/MachineMemOperand.h" #include "llvm/CodeGen/MachineOperand.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/TargetRegisterInfo.h" #include "llvm/IR/DebugLoc.h" #include "llvm/MC/MCInst.h" #include "llvm/MC/MCInstrDesc.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/MathExtras.h" #include "llvm/Target/TargetMachine.h" #include using namespace llvm; static cl::opt OldT2IfCvt("old-thumb2-ifcvt", cl::Hidden, cl::desc("Use old-style Thumb2 if-conversion heuristics"), cl::init(false)); Thumb2InstrInfo::Thumb2InstrInfo(const ARMSubtarget &STI) : ARMBaseInstrInfo(STI) {} /// Return the noop instruction to use for a noop. void Thumb2InstrInfo::getNoop(MCInst &NopInst) const { NopInst.setOpcode(ARM::tHINT); NopInst.addOperand(MCOperand::createImm(0)); NopInst.addOperand(MCOperand::createImm(ARMCC::AL)); NopInst.addOperand(MCOperand::createReg(0)); } unsigned Thumb2InstrInfo::getUnindexedOpcode(unsigned Opc) const { // FIXME return 0; } void Thumb2InstrInfo::ReplaceTailWithBranchTo(MachineBasicBlock::iterator Tail, MachineBasicBlock *NewDest) const { MachineBasicBlock *MBB = Tail->getParent(); ARMFunctionInfo *AFI = MBB->getParent()->getInfo(); if (!AFI->hasITBlocks() || Tail->isBranch()) { TargetInstrInfo::ReplaceTailWithBranchTo(Tail, NewDest); return; } // If the first instruction of Tail is predicated, we may have to update // the IT instruction. unsigned PredReg = 0; ARMCC::CondCodes CC = getInstrPredicate(*Tail, PredReg); MachineBasicBlock::iterator MBBI = Tail; if (CC != ARMCC::AL) // Expecting at least the t2IT instruction before it. --MBBI; // Actually replace the tail. TargetInstrInfo::ReplaceTailWithBranchTo(Tail, NewDest); // Fix up IT. if (CC != ARMCC::AL) { MachineBasicBlock::iterator E = MBB->begin(); unsigned Count = 4; // At most 4 instructions in an IT block. 
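// The loop below walks backwards from the replaced tail looking for the t2IT
// that covers it. In the Thumb IT encoding the position of the lowest set bit
// of the 4-bit mask determines the block length, so shortening the block
// amounts to installing a new terminating '1'. Worked example (Count value
// chosen purely for illustration): with Count == 2,
//   MaskOn  = 1 << 2        == 0b0100
//   MaskOff = ~(MaskOn - 1) == ...11111100
// and (Mask & MaskOff) | MaskOn keeps bit 3 of the old mask, forces bit 2 to
// 1 as the new trailing bit, and clears bits 1-0, cutting the IT block short
// at that point.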
while (Count && MBBI != E) { if (MBBI->isDebugInstr()) { --MBBI; continue; } if (MBBI->getOpcode() == ARM::t2IT) { unsigned Mask = MBBI->getOperand(1).getImm(); if (Count == 4) MBBI->eraseFromParent(); else { unsigned MaskOn = 1 << Count; unsigned MaskOff = ~(MaskOn - 1); MBBI->getOperand(1).setImm((Mask & MaskOff) | MaskOn); } return; } --MBBI; --Count; } // Ctrl flow can reach here if branch folding is run before IT block // formation pass. } } bool Thumb2InstrInfo::isLegalToSplitMBBAt(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const { while (MBBI->isDebugInstr()) { ++MBBI; if (MBBI == MBB.end()) return false; } unsigned PredReg = 0; return getITInstrPredicate(*MBBI, PredReg) == ARMCC::AL; } void Thumb2InstrInfo::copyPhysReg(MachineBasicBlock &MBB, MachineBasicBlock::iterator I, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, bool KillSrc) const { // Handle SPR, DPR, and QPR copies. if (!ARM::GPRRegClass.contains(DestReg, SrcReg)) return ARMBaseInstrInfo::copyPhysReg(MBB, I, DL, DestReg, SrcReg, KillSrc); BuildMI(MBB, I, DL, get(ARM::tMOVr), DestReg) .addReg(SrcReg, getKillRegState(KillSrc)) .add(predOps(ARMCC::AL)); } void Thumb2InstrInfo:: storeRegToStackSlot(MachineBasicBlock &MBB, MachineBasicBlock::iterator I, unsigned SrcReg, bool isKill, int FI, const TargetRegisterClass *RC, const TargetRegisterInfo *TRI) const { DebugLoc DL; if (I != MBB.end()) DL = I->getDebugLoc(); MachineFunction &MF = *MBB.getParent(); MachineFrameInfo &MFI = MF.getFrameInfo(); MachineMemOperand *MMO = MF.getMachineMemOperand( MachinePointerInfo::getFixedStack(MF, FI), MachineMemOperand::MOStore, MFI.getObjectSize(FI), MFI.getObjectAlignment(FI)); if (RC == &ARM::GPRRegClass || RC == &ARM::tGPRRegClass || RC == &ARM::tcGPRRegClass || RC == &ARM::rGPRRegClass || RC == &ARM::GPRnopcRegClass) { BuildMI(MBB, I, DL, get(ARM::t2STRi12)) .addReg(SrcReg, getKillRegState(isKill)) .addFrameIndex(FI) .addImm(0) .addMemOperand(MMO) .add(predOps(ARMCC::AL)); return; } if (ARM::GPRPairRegClass.hasSubClassEq(RC)) { // Thumb2 STRD expects its dest-registers to be in rGPR. Not a problem for // gsub_0, but needs an extra constraint for gsub_1 (which could be sp // otherwise). 
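    // NOTE: constraining the (virtual) source register below to
    // GPRPair_with_gsub_1_in_rGPR keeps the register allocator from assigning
    // SP to the high half of the pair, which t2STRD cannot encode.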
if (TargetRegisterInfo::isVirtualRegister(SrcReg)) { MachineRegisterInfo *MRI = &MF.getRegInfo(); MRI->constrainRegClass(SrcReg, &ARM::GPRPair_with_gsub_1_in_rGPRRegClass); } MachineInstrBuilder MIB = BuildMI(MBB, I, DL, get(ARM::t2STRDi8)); AddDReg(MIB, SrcReg, ARM::gsub_0, getKillRegState(isKill), TRI); AddDReg(MIB, SrcReg, ARM::gsub_1, 0, TRI); MIB.addFrameIndex(FI).addImm(0).addMemOperand(MMO).add(predOps(ARMCC::AL)); return; } ARMBaseInstrInfo::storeRegToStackSlot(MBB, I, SrcReg, isKill, FI, RC, TRI); } void Thumb2InstrInfo:: loadRegFromStackSlot(MachineBasicBlock &MBB, MachineBasicBlock::iterator I, unsigned DestReg, int FI, const TargetRegisterClass *RC, const TargetRegisterInfo *TRI) const { MachineFunction &MF = *MBB.getParent(); MachineFrameInfo &MFI = MF.getFrameInfo(); MachineMemOperand *MMO = MF.getMachineMemOperand( MachinePointerInfo::getFixedStack(MF, FI), MachineMemOperand::MOLoad, MFI.getObjectSize(FI), MFI.getObjectAlignment(FI)); DebugLoc DL; if (I != MBB.end()) DL = I->getDebugLoc(); if (RC == &ARM::GPRRegClass || RC == &ARM::tGPRRegClass || RC == &ARM::tcGPRRegClass || RC == &ARM::rGPRRegClass || RC == &ARM::GPRnopcRegClass) { BuildMI(MBB, I, DL, get(ARM::t2LDRi12), DestReg) .addFrameIndex(FI) .addImm(0) .addMemOperand(MMO) .add(predOps(ARMCC::AL)); return; } if (ARM::GPRPairRegClass.hasSubClassEq(RC)) { // Thumb2 LDRD expects its dest-registers to be in rGPR. Not a problem for // gsub_0, but needs an extra constraint for gsub_1 (which could be sp // otherwise). if (TargetRegisterInfo::isVirtualRegister(DestReg)) { MachineRegisterInfo *MRI = &MF.getRegInfo(); MRI->constrainRegClass(DestReg, &ARM::GPRPair_with_gsub_1_in_rGPRRegClass); } MachineInstrBuilder MIB = BuildMI(MBB, I, DL, get(ARM::t2LDRDi8)); AddDReg(MIB, DestReg, ARM::gsub_0, RegState::DefineNoRead, TRI); AddDReg(MIB, DestReg, ARM::gsub_1, RegState::DefineNoRead, TRI); MIB.addFrameIndex(FI).addImm(0).addMemOperand(MMO).add(predOps(ARMCC::AL)); if (TargetRegisterInfo::isPhysicalRegister(DestReg)) MIB.addReg(DestReg, RegState::ImplicitDefine); return; } ARMBaseInstrInfo::loadRegFromStackSlot(MBB, I, DestReg, FI, RC, TRI); } void Thumb2InstrInfo::expandLoadStackGuard( MachineBasicBlock::iterator MI) const { MachineFunction &MF = *MI->getParent()->getParent(); if (MF.getTarget().isPositionIndependent()) expandLoadStackGuardBase(MI, ARM::t2MOV_ga_pcrel, ARM::t2LDRi12); else expandLoadStackGuardBase(MI, ARM::t2MOVi32imm, ARM::t2LDRi12); } void llvm::emitT2RegPlusImmediate(MachineBasicBlock &MBB, MachineBasicBlock::iterator &MBBI, const DebugLoc &dl, unsigned DestReg, unsigned BaseReg, int NumBytes, ARMCC::CondCodes Pred, unsigned PredReg, const ARMBaseInstrInfo &TII, unsigned MIFlags) { if (NumBytes == 0 && DestReg != BaseReg) { BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), DestReg) .addReg(BaseReg, RegState::Kill) .addImm((unsigned)Pred).addReg(PredReg).setMIFlags(MIFlags); return; } bool isSub = NumBytes < 0; if (isSub) NumBytes = -NumBytes; // If profitable, use a movw or movt to materialize the offset. // FIXME: Use the scavenger to grab a scratch register. if (DestReg != ARM::SP && DestReg != BaseReg && NumBytes >= 4096 && ARM_AM::getT2SOImmVal(NumBytes) == -1) { bool Fits = false; if (NumBytes < 65536) { // Use a movw to materialize the 16-bit constant. BuildMI(MBB, MBBI, dl, TII.get(ARM::t2MOVi16), DestReg) .addImm(NumBytes) .addImm((unsigned)Pred).addReg(PredReg).setMIFlags(MIFlags); Fits = true; } else if ((NumBytes & 0xffff) == 0) { // Use a movt to materialize the 32-bit constant. 
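      // NOTE: this movt branch and the movw branch above only fire when the
      // offset is below 64 KiB or an exact multiple of 64 KiB; the
      // materialized value is then combined with BaseReg by the t2ADDrr or
      // t2SUBrr emitted just below.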
BuildMI(MBB, MBBI, dl, TII.get(ARM::t2MOVTi16), DestReg) .addReg(DestReg) .addImm(NumBytes >> 16) .addImm((unsigned)Pred).addReg(PredReg).setMIFlags(MIFlags); Fits = true; } if (Fits) { if (isSub) { BuildMI(MBB, MBBI, dl, TII.get(ARM::t2SUBrr), DestReg) .addReg(BaseReg) .addReg(DestReg, RegState::Kill) .add(predOps(Pred, PredReg)) .add(condCodeOp()) .setMIFlags(MIFlags); } else { // Here we know that DestReg is not SP but we do not // know anything about BaseReg. t2ADDrr is an invalid // instruction is SP is used as the second argument, but // is fine if SP is the first argument. To be sure we // do not generate invalid encoding, put BaseReg first. BuildMI(MBB, MBBI, dl, TII.get(ARM::t2ADDrr), DestReg) .addReg(BaseReg) .addReg(DestReg, RegState::Kill) .add(predOps(Pred, PredReg)) .add(condCodeOp()) .setMIFlags(MIFlags); } return; } } while (NumBytes) { unsigned ThisVal = NumBytes; unsigned Opc = 0; if (DestReg == ARM::SP && BaseReg != ARM::SP) { // mov sp, rn. Note t2MOVr cannot be used. BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), DestReg) .addReg(BaseReg) .setMIFlags(MIFlags) .add(predOps(ARMCC::AL)); BaseReg = ARM::SP; continue; } bool HasCCOut = true; if (BaseReg == ARM::SP) { // sub sp, sp, #imm7 if (DestReg == ARM::SP && (ThisVal < ((1 << 7)-1) * 4)) { assert((ThisVal & 3) == 0 && "Stack update is not multiple of 4?"); Opc = isSub ? ARM::tSUBspi : ARM::tADDspi; BuildMI(MBB, MBBI, dl, TII.get(Opc), DestReg) .addReg(BaseReg) .addImm(ThisVal / 4) .setMIFlags(MIFlags) .add(predOps(ARMCC::AL)); NumBytes = 0; continue; } // sub rd, sp, so_imm Opc = isSub ? ARM::t2SUBri : ARM::t2ADDri; if (ARM_AM::getT2SOImmVal(NumBytes) != -1) { NumBytes = 0; } else { // FIXME: Move this to ARMAddressingModes.h? unsigned RotAmt = countLeadingZeros(ThisVal); ThisVal = ThisVal & ARM_AM::rotr32(0xff000000U, RotAmt); NumBytes &= ~ThisVal; assert(ARM_AM::getT2SOImmVal(ThisVal) != -1 && "Bit extraction didn't work?"); } } else { assert(DestReg != ARM::SP && BaseReg != ARM::SP); Opc = isSub ? ARM::t2SUBri : ARM::t2ADDri; if (ARM_AM::getT2SOImmVal(NumBytes) != -1) { NumBytes = 0; } else if (ThisVal < 4096) { Opc = isSub ? ARM::t2SUBri12 : ARM::t2ADDri12; HasCCOut = false; NumBytes = 0; } else { // FIXME: Move this to ARMAddressingModes.h? unsigned RotAmt = countLeadingZeros(ThisVal); ThisVal = ThisVal & ARM_AM::rotr32(0xff000000U, RotAmt); NumBytes &= ~ThisVal; assert(ARM_AM::getT2SOImmVal(ThisVal) != -1 && "Bit extraction didn't work?"); } } // Build the new ADD / SUB. 
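      // NOTE: a rough worked example of this split-immediate loop, with
      // illustrative numbers: for NumBytes == 0x1234 the first pass peels off
      // the top eight significant bits (0x1220), a legal Thumb-2 modified
      // immediate, and a second pass adds the remaining 0x14, so two t2ADDri
      // instructions are emitted, chained through DestReg.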
MachineInstrBuilder MIB = BuildMI(MBB, MBBI, dl, TII.get(Opc), DestReg) .addReg(BaseReg, RegState::Kill) .addImm(ThisVal) .add(predOps(ARMCC::AL)) .setMIFlags(MIFlags); if (HasCCOut) MIB.add(condCodeOp()); BaseReg = DestReg; } } static unsigned negativeOffsetOpcode(unsigned opcode) { switch (opcode) { case ARM::t2LDRi12: return ARM::t2LDRi8; case ARM::t2LDRHi12: return ARM::t2LDRHi8; case ARM::t2LDRBi12: return ARM::t2LDRBi8; case ARM::t2LDRSHi12: return ARM::t2LDRSHi8; case ARM::t2LDRSBi12: return ARM::t2LDRSBi8; case ARM::t2STRi12: return ARM::t2STRi8; case ARM::t2STRBi12: return ARM::t2STRBi8; case ARM::t2STRHi12: return ARM::t2STRHi8; case ARM::t2PLDi12: return ARM::t2PLDi8; case ARM::t2LDRi8: case ARM::t2LDRHi8: case ARM::t2LDRBi8: case ARM::t2LDRSHi8: case ARM::t2LDRSBi8: case ARM::t2STRi8: case ARM::t2STRBi8: case ARM::t2STRHi8: case ARM::t2PLDi8: return opcode; default: break; } return 0; } static unsigned positiveOffsetOpcode(unsigned opcode) { switch (opcode) { case ARM::t2LDRi8: return ARM::t2LDRi12; case ARM::t2LDRHi8: return ARM::t2LDRHi12; case ARM::t2LDRBi8: return ARM::t2LDRBi12; case ARM::t2LDRSHi8: return ARM::t2LDRSHi12; case ARM::t2LDRSBi8: return ARM::t2LDRSBi12; case ARM::t2STRi8: return ARM::t2STRi12; case ARM::t2STRBi8: return ARM::t2STRBi12; case ARM::t2STRHi8: return ARM::t2STRHi12; case ARM::t2PLDi8: return ARM::t2PLDi12; case ARM::t2LDRi12: case ARM::t2LDRHi12: case ARM::t2LDRBi12: case ARM::t2LDRSHi12: case ARM::t2LDRSBi12: case ARM::t2STRi12: case ARM::t2STRBi12: case ARM::t2STRHi12: case ARM::t2PLDi12: return opcode; default: break; } return 0; } static unsigned immediateOffsetOpcode(unsigned opcode) { switch (opcode) { case ARM::t2LDRs: return ARM::t2LDRi12; case ARM::t2LDRHs: return ARM::t2LDRHi12; case ARM::t2LDRBs: return ARM::t2LDRBi12; case ARM::t2LDRSHs: return ARM::t2LDRSHi12; case ARM::t2LDRSBs: return ARM::t2LDRSBi12; case ARM::t2STRs: return ARM::t2STRi12; case ARM::t2STRBs: return ARM::t2STRBi12; case ARM::t2STRHs: return ARM::t2STRHi12; case ARM::t2PLDs: return ARM::t2PLDi12; case ARM::t2LDRi12: case ARM::t2LDRHi12: case ARM::t2LDRBi12: case ARM::t2LDRSHi12: case ARM::t2LDRSBi12: case ARM::t2STRi12: case ARM::t2STRBi12: case ARM::t2STRHi12: case ARM::t2PLDi12: case ARM::t2LDRi8: case ARM::t2LDRHi8: case ARM::t2LDRBi8: case ARM::t2LDRSHi8: case ARM::t2LDRSBi8: case ARM::t2STRi8: case ARM::t2STRBi8: case ARM::t2STRHi8: case ARM::t2PLDi8: return opcode; default: break; } return 0; } bool llvm::rewriteT2FrameIndex(MachineInstr &MI, unsigned FrameRegIdx, unsigned FrameReg, int &Offset, const ARMBaseInstrInfo &TII) { unsigned Opcode = MI.getOpcode(); const MCInstrDesc &Desc = MI.getDesc(); unsigned AddrMode = (Desc.TSFlags & ARMII::AddrModeMask); bool isSub = false; // Memory operands in inline assembly always use AddrModeT2_i12. if (Opcode == ARM::INLINEASM) AddrMode = ARMII::AddrModeT2_i12; // FIXME. mode for thumb2? if (Opcode == ARM::t2ADDri || Opcode == ARM::t2ADDri12) { Offset += MI.getOperand(FrameRegIdx+1).getImm(); unsigned PredReg; if (Offset == 0 && getInstrPredicate(MI, PredReg) == ARMCC::AL && !MI.definesRegister(ARM::CPSR)) { // Turn it into a move. MI.setDesc(TII.get(ARM::tMOVr)); MI.getOperand(FrameRegIdx).ChangeToRegister(FrameReg, false); // Remove offset and remaining explicit predicate operands. 
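      // NOTE: the add has just been re-described as tMOVr, so the leftover
      // immediate and predicate operands are stripped below and a fresh AL
      // predicate is appended, leaving a plain register move.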
do MI.RemoveOperand(FrameRegIdx+1); while (MI.getNumOperands() > FrameRegIdx+1); MachineInstrBuilder MIB(*MI.getParent()->getParent(), &MI); MIB.add(predOps(ARMCC::AL)); return true; } bool HasCCOut = Opcode != ARM::t2ADDri12; if (Offset < 0) { Offset = -Offset; isSub = true; MI.setDesc(TII.get(ARM::t2SUBri)); } else { MI.setDesc(TII.get(ARM::t2ADDri)); } // Common case: small offset, fits into instruction. if (ARM_AM::getT2SOImmVal(Offset) != -1) { MI.getOperand(FrameRegIdx).ChangeToRegister(FrameReg, false); MI.getOperand(FrameRegIdx+1).ChangeToImmediate(Offset); // Add cc_out operand if the original instruction did not have one. if (!HasCCOut) MI.addOperand(MachineOperand::CreateReg(0, false)); Offset = 0; return true; } // Another common case: imm12. if (Offset < 4096 && (!HasCCOut || MI.getOperand(MI.getNumOperands()-1).getReg() == 0)) { unsigned NewOpc = isSub ? ARM::t2SUBri12 : ARM::t2ADDri12; MI.setDesc(TII.get(NewOpc)); MI.getOperand(FrameRegIdx).ChangeToRegister(FrameReg, false); MI.getOperand(FrameRegIdx+1).ChangeToImmediate(Offset); // Remove the cc_out operand. if (HasCCOut) MI.RemoveOperand(MI.getNumOperands()-1); Offset = 0; return true; } // Otherwise, extract 8 adjacent bits from the immediate into this // t2ADDri/t2SUBri. unsigned RotAmt = countLeadingZeros(Offset); unsigned ThisImmVal = Offset & ARM_AM::rotr32(0xff000000U, RotAmt); // We will handle these bits from offset, clear them. Offset &= ~ThisImmVal; assert(ARM_AM::getT2SOImmVal(ThisImmVal) != -1 && "Bit extraction didn't work?"); MI.getOperand(FrameRegIdx+1).ChangeToImmediate(ThisImmVal); // Add cc_out operand if the original instruction did not have one. if (!HasCCOut) MI.addOperand(MachineOperand::CreateReg(0, false)); } else { // AddrMode4 and AddrMode6 cannot handle any offset. if (AddrMode == ARMII::AddrMode4 || AddrMode == ARMII::AddrMode6) return false; // AddrModeT2_so cannot handle any offset. If there is no offset // register then we change to an immediate version. unsigned NewOpc = Opcode; if (AddrMode == ARMII::AddrModeT2_so) { unsigned OffsetReg = MI.getOperand(FrameRegIdx+1).getReg(); if (OffsetReg != 0) { MI.getOperand(FrameRegIdx).ChangeToRegister(FrameReg, false); return Offset == 0; } MI.RemoveOperand(FrameRegIdx+1); MI.getOperand(FrameRegIdx+1).ChangeToImmediate(0); NewOpc = immediateOffsetOpcode(Opcode); AddrMode = ARMII::AddrModeT2_i12; } unsigned NumBits = 0; unsigned Scale = 1; if (AddrMode == ARMII::AddrModeT2_i8 || AddrMode == ARMII::AddrModeT2_i12) { // i8 supports only negative, and i12 supports only positive, so // based on Offset sign convert Opcode to the appropriate // instruction Offset += MI.getOperand(FrameRegIdx+1).getImm(); if (Offset < 0) { NewOpc = negativeOffsetOpcode(Opcode); NumBits = 8; isSub = true; Offset = -Offset; } else { NewOpc = positiveOffsetOpcode(Opcode); NumBits = 12; } } else if (AddrMode == ARMII::AddrMode5) { // VFP address mode. const MachineOperand &OffOp = MI.getOperand(FrameRegIdx+1); int InstrOffs = ARM_AM::getAM5Offset(OffOp.getImm()); if (ARM_AM::getAM5Op(OffOp.getImm()) == ARM_AM::sub) InstrOffs *= -1; NumBits = 8; Scale = 4; Offset += InstrOffs * 4; assert((Offset & (Scale-1)) == 0 && "Can't encode this offset!"); if (Offset < 0) { Offset = -Offset; isSub = true; } } else if (AddrMode == ARMII::AddrMode5FP16) { // VFP address mode. 
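      // NOTE: the FP16 variant handled here uses an 8-bit immediate scaled by
      // 2 rather than 4.  The AddrModeT2_ldrex case added a few lines below
      // in this revision treats LDREX/STREX-style frame operands the same
      // way, with an 8-bit immediate scaled by 4.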
const MachineOperand &OffOp = MI.getOperand(FrameRegIdx+1); int InstrOffs = ARM_AM::getAM5FP16Offset(OffOp.getImm()); if (ARM_AM::getAM5FP16Op(OffOp.getImm()) == ARM_AM::sub) InstrOffs *= -1; NumBits = 8; Scale = 2; Offset += InstrOffs * 2; assert((Offset & (Scale-1)) == 0 && "Can't encode this offset!"); if (Offset < 0) { Offset = -Offset; isSub = true; } } else if (AddrMode == ARMII::AddrModeT2_i8s4) { Offset += MI.getOperand(FrameRegIdx + 1).getImm() * 4; NumBits = 10; // 8 bits scaled by 4 // MCInst operand expects already scaled value. Scale = 1; assert((Offset & 3) == 0 && "Can't encode this offset!"); + } else if (AddrMode == ARMII::AddrModeT2_ldrex) { + Offset += MI.getOperand(FrameRegIdx + 1).getImm() * 4; + NumBits = 8; // 8 bits scaled by 4 + Scale = 4; + assert((Offset & 3) == 0 && "Can't encode this offset!"); } else { llvm_unreachable("Unsupported addressing mode!"); } if (NewOpc != Opcode) MI.setDesc(TII.get(NewOpc)); MachineOperand &ImmOp = MI.getOperand(FrameRegIdx+1); // Attempt to fold address computation // Common case: small offset, fits into instruction. int ImmedOffset = Offset / Scale; unsigned Mask = (1 << NumBits) - 1; if ((unsigned)Offset <= Mask * Scale) { // Replace the FrameIndex with fp/sp MI.getOperand(FrameRegIdx).ChangeToRegister(FrameReg, false); if (isSub) { if (AddrMode == ARMII::AddrMode5) // FIXME: Not consistent. ImmedOffset |= 1 << NumBits; else ImmedOffset = -ImmedOffset; } ImmOp.ChangeToImmediate(ImmedOffset); Offset = 0; return true; } // Otherwise, offset doesn't fit. Pull in what we can to simplify ImmedOffset = ImmedOffset & Mask; if (isSub) { if (AddrMode == ARMII::AddrMode5) // FIXME: Not consistent. ImmedOffset |= 1 << NumBits; else { ImmedOffset = -ImmedOffset; if (ImmedOffset == 0) // Change the opcode back if the encoded offset is zero. MI.setDesc(TII.get(positiveOffsetOpcode(NewOpc))); } } ImmOp.ChangeToImmediate(ImmedOffset); Offset &= ~(Mask*Scale); } Offset = (isSub) ? -Offset : Offset; return Offset == 0; } ARMCC::CondCodes llvm::getITInstrPredicate(const MachineInstr &MI, unsigned &PredReg) { unsigned Opc = MI.getOpcode(); if (Opc == ARM::tBcc || Opc == ARM::t2Bcc) return ARMCC::AL; return getInstrPredicate(MI, PredReg); } Index: vendor/llvm/dist-release_70/lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp =================================================================== --- vendor/llvm/dist-release_70/lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp (revision 338574) +++ vendor/llvm/dist-release_70/lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp (revision 338575) @@ -1,113 +1,120 @@ //===-- BPFAsmBackend.cpp - BPF Assembler Backend -------------------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. 
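// NOTE: the change to this file in this revision is in applyFixup() below:
// a non-zero section-relative fixup value is no longer an assertion failure
// but a diagnostic reported through MCContext, which is why MCAssembler.h
// and MCContext.h are newly included.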
// //===----------------------------------------------------------------------===// #include "MCTargetDesc/BPFMCTargetDesc.h" #include "llvm/ADT/StringRef.h" #include "llvm/MC/MCAsmBackend.h" +#include "llvm/MC/MCAssembler.h" +#include "llvm/MC/MCContext.h" #include "llvm/MC/MCFixup.h" #include "llvm/MC/MCObjectWriter.h" #include "llvm/Support/EndianStream.h" #include #include using namespace llvm; namespace { class BPFAsmBackend : public MCAsmBackend { public: BPFAsmBackend(support::endianness Endian) : MCAsmBackend(Endian) {} ~BPFAsmBackend() override = default; void applyFixup(const MCAssembler &Asm, const MCFixup &Fixup, const MCValue &Target, MutableArrayRef Data, uint64_t Value, bool IsResolved, const MCSubtargetInfo *STI) const override; std::unique_ptr createObjectTargetWriter() const override; // No instruction requires relaxation bool fixupNeedsRelaxation(const MCFixup &Fixup, uint64_t Value, const MCRelaxableFragment *DF, const MCAsmLayout &Layout) const override { return false; } unsigned getNumFixupKinds() const override { return 1; } bool mayNeedRelaxation(const MCInst &Inst, const MCSubtargetInfo &STI) const override { return false; } void relaxInstruction(const MCInst &Inst, const MCSubtargetInfo &STI, MCInst &Res) const override {} bool writeNopData(raw_ostream &OS, uint64_t Count) const override; }; } // end anonymous namespace bool BPFAsmBackend::writeNopData(raw_ostream &OS, uint64_t Count) const { if ((Count % 8) != 0) return false; for (uint64_t i = 0; i < Count; i += 8) support::endian::write(OS, 0x15000000, Endian); return true; } void BPFAsmBackend::applyFixup(const MCAssembler &Asm, const MCFixup &Fixup, const MCValue &Target, MutableArrayRef Data, uint64_t Value, bool IsResolved, const MCSubtargetInfo *STI) const { if (Fixup.getKind() == FK_SecRel_4 || Fixup.getKind() == FK_SecRel_8) { - assert(Value == 0); + if (Value) { + MCContext &Ctx = Asm.getContext(); + Ctx.reportError(Fixup.getLoc(), + "Unsupported relocation: try to compile with -O2 or above, " + "or check your static variable usage"); + } } else if (Fixup.getKind() == FK_Data_4) { support::endian::write(&Data[Fixup.getOffset()], Value, Endian); } else if (Fixup.getKind() == FK_Data_8) { support::endian::write(&Data[Fixup.getOffset()], Value, Endian); } else if (Fixup.getKind() == FK_PCRel_4) { Value = (uint32_t)((Value - 8) / 8); if (Endian == support::little) { Data[Fixup.getOffset() + 1] = 0x10; support::endian::write32le(&Data[Fixup.getOffset() + 4], Value); } else { Data[Fixup.getOffset() + 1] = 0x1; support::endian::write32be(&Data[Fixup.getOffset() + 4], Value); } } else { assert(Fixup.getKind() == FK_PCRel_2); Value = (uint16_t)((Value - 8) / 8); support::endian::write(&Data[Fixup.getOffset() + 2], Value, Endian); } } std::unique_ptr BPFAsmBackend::createObjectTargetWriter() const { return createBPFELFObjectWriter(0); } MCAsmBackend *llvm::createBPFAsmBackend(const Target &T, const MCSubtargetInfo &STI, const MCRegisterInfo &MRI, const MCTargetOptions &) { return new BPFAsmBackend(support::little); } MCAsmBackend *llvm::createBPFbeAsmBackend(const Target &T, const MCSubtargetInfo &STI, const MCRegisterInfo &MRI, const MCTargetOptions &) { return new BPFAsmBackend(support::big); } Index: vendor/llvm/dist-release_70/lib/Target/X86/AsmParser/X86AsmParser.cpp =================================================================== --- vendor/llvm/dist-release_70/lib/Target/X86/AsmParser/X86AsmParser.cpp (revision 338574) +++ vendor/llvm/dist-release_70/lib/Target/X86/AsmParser/X86AsmParser.cpp (revision 
338575) @@ -1,3457 +1,3457 @@ //===-- X86AsmParser.cpp - Parse X86 assembly to MCInst instructions ------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// #include "InstPrinter/X86IntelInstPrinter.h" #include "MCTargetDesc/X86BaseInfo.h" #include "MCTargetDesc/X86MCExpr.h" #include "MCTargetDesc/X86TargetStreamer.h" #include "X86AsmInstrumentation.h" #include "X86AsmParserCommon.h" #include "X86Operand.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SmallString.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/StringSwitch.h" #include "llvm/ADT/Twine.h" #include "llvm/MC/MCContext.h" #include "llvm/MC/MCExpr.h" #include "llvm/MC/MCInst.h" #include "llvm/MC/MCInstrInfo.h" #include "llvm/MC/MCParser/MCAsmLexer.h" #include "llvm/MC/MCParser/MCAsmParser.h" #include "llvm/MC/MCParser/MCParsedAsmOperand.h" #include "llvm/MC/MCParser/MCTargetAsmParser.h" #include "llvm/MC/MCRegisterInfo.h" #include "llvm/MC/MCSection.h" #include "llvm/MC/MCStreamer.h" #include "llvm/MC/MCSubtargetInfo.h" #include "llvm/MC/MCSymbol.h" #include "llvm/Support/SourceMgr.h" #include "llvm/Support/TargetRegistry.h" #include "llvm/Support/raw_ostream.h" #include #include using namespace llvm; static bool checkScale(unsigned Scale, StringRef &ErrMsg) { if (Scale != 1 && Scale != 2 && Scale != 4 && Scale != 8) { ErrMsg = "scale factor in address must be 1, 2, 4 or 8"; return true; } return false; } namespace { static const char OpPrecedence[] = { 0, // IC_OR 1, // IC_XOR 2, // IC_AND 3, // IC_LSHIFT 3, // IC_RSHIFT 4, // IC_PLUS 4, // IC_MINUS 5, // IC_MULTIPLY 5, // IC_DIVIDE 5, // IC_MOD 6, // IC_NOT 7, // IC_NEG 8, // IC_RPAREN 9, // IC_LPAREN 0, // IC_IMM 0 // IC_REGISTER }; class X86AsmParser : public MCTargetAsmParser { ParseInstructionInfo *InstInfo; std::unique_ptr Instrumentation; bool Code16GCC; private: SMLoc consumeToken() { MCAsmParser &Parser = getParser(); SMLoc Result = Parser.getTok().getLoc(); Parser.Lex(); return Result; } X86TargetStreamer &getTargetStreamer() { assert(getParser().getStreamer().getTargetStreamer() && "do not have a target streamer"); MCTargetStreamer &TS = *getParser().getStreamer().getTargetStreamer(); return static_cast(TS); } unsigned MatchInstruction(const OperandVector &Operands, MCInst &Inst, uint64_t &ErrorInfo, bool matchingInlineAsm, unsigned VariantID = 0) { // In Code16GCC mode, match as 32-bit. 
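    // NOTE: Code16GCC corresponds to the ".code16gcc" directive (16-bit code
    // written with 32-bit operand and address sizes), so instructions are
    // matched in 32-bit mode here and the mode is switched back afterwards.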
if (Code16GCC) SwitchMode(X86::Mode32Bit); unsigned rv = MatchInstructionImpl(Operands, Inst, ErrorInfo, matchingInlineAsm, VariantID); if (Code16GCC) SwitchMode(X86::Mode16Bit); return rv; } enum InfixCalculatorTok { IC_OR = 0, IC_XOR, IC_AND, IC_LSHIFT, IC_RSHIFT, IC_PLUS, IC_MINUS, IC_MULTIPLY, IC_DIVIDE, IC_MOD, IC_NOT, IC_NEG, IC_RPAREN, IC_LPAREN, IC_IMM, IC_REGISTER }; enum IntelOperatorKind { IOK_INVALID = 0, IOK_LENGTH, IOK_SIZE, IOK_TYPE, IOK_OFFSET }; class InfixCalculator { typedef std::pair< InfixCalculatorTok, int64_t > ICToken; SmallVector InfixOperatorStack; SmallVector PostfixStack; bool isUnaryOperator(const InfixCalculatorTok Op) { return Op == IC_NEG || Op == IC_NOT; } public: int64_t popOperand() { assert (!PostfixStack.empty() && "Poped an empty stack!"); ICToken Op = PostfixStack.pop_back_val(); if (!(Op.first == IC_IMM || Op.first == IC_REGISTER)) return -1; // The invalid Scale value will be caught later by checkScale return Op.second; } void pushOperand(InfixCalculatorTok Op, int64_t Val = 0) { assert ((Op == IC_IMM || Op == IC_REGISTER) && "Unexpected operand!"); PostfixStack.push_back(std::make_pair(Op, Val)); } void popOperator() { InfixOperatorStack.pop_back(); } void pushOperator(InfixCalculatorTok Op) { // Push the new operator if the stack is empty. if (InfixOperatorStack.empty()) { InfixOperatorStack.push_back(Op); return; } // Push the new operator if it has a higher precedence than the operator // on the top of the stack or the operator on the top of the stack is a // left parentheses. unsigned Idx = InfixOperatorStack.size() - 1; InfixCalculatorTok StackOp = InfixOperatorStack[Idx]; if (OpPrecedence[Op] > OpPrecedence[StackOp] || StackOp == IC_LPAREN) { InfixOperatorStack.push_back(Op); return; } // The operator on the top of the stack has higher precedence than the // new operator. unsigned ParenCount = 0; while (1) { // Nothing to process. if (InfixOperatorStack.empty()) break; Idx = InfixOperatorStack.size() - 1; StackOp = InfixOperatorStack[Idx]; if (!(OpPrecedence[StackOp] >= OpPrecedence[Op] || ParenCount)) break; // If we have an even parentheses count and we see a left parentheses, // then stop processing. if (!ParenCount && StackOp == IC_LPAREN) break; if (StackOp == IC_RPAREN) { ++ParenCount; InfixOperatorStack.pop_back(); } else if (StackOp == IC_LPAREN) { --ParenCount; InfixOperatorStack.pop_back(); } else { InfixOperatorStack.pop_back(); PostfixStack.push_back(std::make_pair(StackOp, 0)); } } // Push the new operator. InfixOperatorStack.push_back(Op); } int64_t execute() { // Push any remaining operators onto the postfix stack. 
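    // NOTE: this is the evaluation half of a small shunting-yard calculator;
    // for example, the immediate expression "2*8+4" arrives on PostfixStack
    // as 2 8 * 4 + and the loop below reduces it to 20.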
while (!InfixOperatorStack.empty()) { InfixCalculatorTok StackOp = InfixOperatorStack.pop_back_val(); if (StackOp != IC_LPAREN && StackOp != IC_RPAREN) PostfixStack.push_back(std::make_pair(StackOp, 0)); } if (PostfixStack.empty()) return 0; SmallVector OperandStack; for (unsigned i = 0, e = PostfixStack.size(); i != e; ++i) { ICToken Op = PostfixStack[i]; if (Op.first == IC_IMM || Op.first == IC_REGISTER) { OperandStack.push_back(Op); } else if (isUnaryOperator(Op.first)) { assert (OperandStack.size() > 0 && "Too few operands."); ICToken Operand = OperandStack.pop_back_val(); assert (Operand.first == IC_IMM && "Unary operation with a register!"); switch (Op.first) { default: report_fatal_error("Unexpected operator!"); break; case IC_NEG: OperandStack.push_back(std::make_pair(IC_IMM, -Operand.second)); break; case IC_NOT: OperandStack.push_back(std::make_pair(IC_IMM, ~Operand.second)); break; } } else { assert (OperandStack.size() > 1 && "Too few operands."); int64_t Val; ICToken Op2 = OperandStack.pop_back_val(); ICToken Op1 = OperandStack.pop_back_val(); switch (Op.first) { default: report_fatal_error("Unexpected operator!"); break; case IC_PLUS: Val = Op1.second + Op2.second; OperandStack.push_back(std::make_pair(IC_IMM, Val)); break; case IC_MINUS: Val = Op1.second - Op2.second; OperandStack.push_back(std::make_pair(IC_IMM, Val)); break; case IC_MULTIPLY: assert (Op1.first == IC_IMM && Op2.first == IC_IMM && "Multiply operation with an immediate and a register!"); Val = Op1.second * Op2.second; OperandStack.push_back(std::make_pair(IC_IMM, Val)); break; case IC_DIVIDE: assert (Op1.first == IC_IMM && Op2.first == IC_IMM && "Divide operation with an immediate and a register!"); assert (Op2.second != 0 && "Division by zero!"); Val = Op1.second / Op2.second; OperandStack.push_back(std::make_pair(IC_IMM, Val)); break; case IC_MOD: assert (Op1.first == IC_IMM && Op2.first == IC_IMM && "Modulo operation with an immediate and a register!"); Val = Op1.second % Op2.second; OperandStack.push_back(std::make_pair(IC_IMM, Val)); break; case IC_OR: assert (Op1.first == IC_IMM && Op2.first == IC_IMM && "Or operation with an immediate and a register!"); Val = Op1.second | Op2.second; OperandStack.push_back(std::make_pair(IC_IMM, Val)); break; case IC_XOR: assert(Op1.first == IC_IMM && Op2.first == IC_IMM && "Xor operation with an immediate and a register!"); Val = Op1.second ^ Op2.second; OperandStack.push_back(std::make_pair(IC_IMM, Val)); break; case IC_AND: assert (Op1.first == IC_IMM && Op2.first == IC_IMM && "And operation with an immediate and a register!"); Val = Op1.second & Op2.second; OperandStack.push_back(std::make_pair(IC_IMM, Val)); break; case IC_LSHIFT: assert (Op1.first == IC_IMM && Op2.first == IC_IMM && "Left shift operation with an immediate and a register!"); Val = Op1.second << Op2.second; OperandStack.push_back(std::make_pair(IC_IMM, Val)); break; case IC_RSHIFT: assert (Op1.first == IC_IMM && Op2.first == IC_IMM && "Right shift operation with an immediate and a register!"); Val = Op1.second >> Op2.second; OperandStack.push_back(std::make_pair(IC_IMM, Val)); break; } } } assert (OperandStack.size() == 1 && "Expected a single result."); return OperandStack.pop_back_val().second; } }; enum IntelExprState { IES_INIT, IES_OR, IES_XOR, IES_AND, IES_LSHIFT, IES_RSHIFT, IES_PLUS, IES_MINUS, IES_NOT, IES_MULTIPLY, IES_DIVIDE, IES_MOD, IES_LBRAC, IES_RBRAC, IES_LPAREN, IES_RPAREN, IES_REGISTER, IES_INTEGER, IES_IDENTIFIER, IES_ERROR }; class IntelExprStateMachine { IntelExprState State, 
PrevState; unsigned BaseReg, IndexReg, TmpReg, Scale; int64_t Imm; const MCExpr *Sym; StringRef SymName; InfixCalculator IC; InlineAsmIdentifierInfo Info; short BracCount; bool MemExpr; public: IntelExprStateMachine() : State(IES_INIT), PrevState(IES_ERROR), BaseReg(0), IndexReg(0), TmpReg(0), Scale(0), Imm(0), Sym(nullptr), BracCount(0), MemExpr(false) {} void addImm(int64_t imm) { Imm += imm; } short getBracCount() { return BracCount; } bool isMemExpr() { return MemExpr; } unsigned getBaseReg() { return BaseReg; } unsigned getIndexReg() { return IndexReg; } unsigned getScale() { return Scale; } const MCExpr *getSym() { return Sym; } StringRef getSymName() { return SymName; } int64_t getImm() { return Imm + IC.execute(); } bool isValidEndState() { return State == IES_RBRAC || State == IES_INTEGER; } bool hadError() { return State == IES_ERROR; } InlineAsmIdentifierInfo &getIdentifierInfo() { return Info; } void onOr() { IntelExprState CurrState = State; switch (State) { default: State = IES_ERROR; break; case IES_INTEGER: case IES_RPAREN: case IES_REGISTER: State = IES_OR; IC.pushOperator(IC_OR); break; } PrevState = CurrState; } void onXor() { IntelExprState CurrState = State; switch (State) { default: State = IES_ERROR; break; case IES_INTEGER: case IES_RPAREN: case IES_REGISTER: State = IES_XOR; IC.pushOperator(IC_XOR); break; } PrevState = CurrState; } void onAnd() { IntelExprState CurrState = State; switch (State) { default: State = IES_ERROR; break; case IES_INTEGER: case IES_RPAREN: case IES_REGISTER: State = IES_AND; IC.pushOperator(IC_AND); break; } PrevState = CurrState; } void onLShift() { IntelExprState CurrState = State; switch (State) { default: State = IES_ERROR; break; case IES_INTEGER: case IES_RPAREN: case IES_REGISTER: State = IES_LSHIFT; IC.pushOperator(IC_LSHIFT); break; } PrevState = CurrState; } void onRShift() { IntelExprState CurrState = State; switch (State) { default: State = IES_ERROR; break; case IES_INTEGER: case IES_RPAREN: case IES_REGISTER: State = IES_RSHIFT; IC.pushOperator(IC_RSHIFT); break; } PrevState = CurrState; } bool onPlus(StringRef &ErrMsg) { IntelExprState CurrState = State; switch (State) { default: State = IES_ERROR; break; case IES_INTEGER: case IES_RPAREN: case IES_REGISTER: State = IES_PLUS; IC.pushOperator(IC_PLUS); if (CurrState == IES_REGISTER && PrevState != IES_MULTIPLY) { // If we already have a BaseReg, then assume this is the IndexReg with // no explicit scale. 
if (!BaseReg) { BaseReg = TmpReg; } else { if (IndexReg) { ErrMsg = "BaseReg/IndexReg already set!"; return true; } IndexReg = TmpReg; Scale = 0; } } break; } PrevState = CurrState; return false; } bool onMinus(StringRef &ErrMsg) { IntelExprState CurrState = State; switch (State) { default: State = IES_ERROR; break; case IES_OR: case IES_XOR: case IES_AND: case IES_LSHIFT: case IES_RSHIFT: case IES_PLUS: case IES_NOT: case IES_MULTIPLY: case IES_DIVIDE: case IES_MOD: case IES_LPAREN: case IES_RPAREN: case IES_LBRAC: case IES_RBRAC: case IES_INTEGER: case IES_REGISTER: case IES_INIT: State = IES_MINUS; // push minus operator if it is not a negate operator if (CurrState == IES_REGISTER || CurrState == IES_RPAREN || CurrState == IES_INTEGER || CurrState == IES_RBRAC) IC.pushOperator(IC_MINUS); else if (PrevState == IES_REGISTER && CurrState == IES_MULTIPLY) { // We have negate operator for Scale: it's illegal ErrMsg = "Scale can't be negative"; return true; } else IC.pushOperator(IC_NEG); if (CurrState == IES_REGISTER && PrevState != IES_MULTIPLY) { // If we already have a BaseReg, then assume this is the IndexReg with // no explicit scale. if (!BaseReg) { BaseReg = TmpReg; } else { if (IndexReg) { ErrMsg = "BaseReg/IndexReg already set!"; return true; } IndexReg = TmpReg; Scale = 0; } } break; } PrevState = CurrState; return false; } void onNot() { IntelExprState CurrState = State; switch (State) { default: State = IES_ERROR; break; case IES_OR: case IES_XOR: case IES_AND: case IES_LSHIFT: case IES_RSHIFT: case IES_PLUS: case IES_MINUS: case IES_NOT: case IES_MULTIPLY: case IES_DIVIDE: case IES_MOD: case IES_LPAREN: case IES_LBRAC: case IES_INIT: State = IES_NOT; IC.pushOperator(IC_NOT); break; } PrevState = CurrState; } bool onRegister(unsigned Reg, StringRef &ErrMsg) { IntelExprState CurrState = State; switch (State) { default: State = IES_ERROR; break; case IES_PLUS: case IES_LPAREN: case IES_LBRAC: State = IES_REGISTER; TmpReg = Reg; IC.pushOperand(IC_REGISTER); break; case IES_MULTIPLY: // Index Register - Scale * Register if (PrevState == IES_INTEGER) { if (IndexReg) { ErrMsg = "BaseReg/IndexReg already set!"; return true; } State = IES_REGISTER; IndexReg = Reg; // Get the scale and replace the 'Scale * Register' with '0'. 
Scale = IC.popOperand(); if (checkScale(Scale, ErrMsg)) return true; IC.pushOperand(IC_IMM); IC.popOperator(); } else { State = IES_ERROR; } break; } PrevState = CurrState; return false; } bool onIdentifierExpr(const MCExpr *SymRef, StringRef SymRefName, const InlineAsmIdentifierInfo &IDInfo, bool ParsingInlineAsm, StringRef &ErrMsg) { // InlineAsm: Treat an enum value as an integer if (ParsingInlineAsm) if (IDInfo.isKind(InlineAsmIdentifierInfo::IK_EnumVal)) return onInteger(IDInfo.Enum.EnumVal, ErrMsg); // Treat a symbolic constant like an integer if (auto *CE = dyn_cast(SymRef)) return onInteger(CE->getValue(), ErrMsg); PrevState = State; bool HasSymbol = Sym != nullptr; switch (State) { default: State = IES_ERROR; break; case IES_PLUS: case IES_MINUS: case IES_NOT: case IES_INIT: case IES_LBRAC: MemExpr = true; State = IES_INTEGER; Sym = SymRef; SymName = SymRefName; IC.pushOperand(IC_IMM); if (ParsingInlineAsm) Info = IDInfo; break; } if (HasSymbol) ErrMsg = "cannot use more than one symbol in memory operand"; return HasSymbol; } bool onInteger(int64_t TmpInt, StringRef &ErrMsg) { IntelExprState CurrState = State; switch (State) { default: State = IES_ERROR; break; case IES_PLUS: case IES_MINUS: case IES_NOT: case IES_OR: case IES_XOR: case IES_AND: case IES_LSHIFT: case IES_RSHIFT: case IES_DIVIDE: case IES_MOD: case IES_MULTIPLY: case IES_LPAREN: case IES_INIT: case IES_LBRAC: State = IES_INTEGER; if (PrevState == IES_REGISTER && CurrState == IES_MULTIPLY) { // Index Register - Register * Scale if (IndexReg) { ErrMsg = "BaseReg/IndexReg already set!"; return true; } IndexReg = TmpReg; Scale = TmpInt; if (checkScale(Scale, ErrMsg)) return true; // Get the scale and replace the 'Register * Scale' with '0'. IC.popOperator(); } else { IC.pushOperand(IC_IMM, TmpInt); } break; } PrevState = CurrState; return false; } void onStar() { PrevState = State; switch (State) { default: State = IES_ERROR; break; case IES_INTEGER: case IES_REGISTER: case IES_RPAREN: State = IES_MULTIPLY; IC.pushOperator(IC_MULTIPLY); break; } } void onDivide() { PrevState = State; switch (State) { default: State = IES_ERROR; break; case IES_INTEGER: case IES_RPAREN: State = IES_DIVIDE; IC.pushOperator(IC_DIVIDE); break; } } void onMod() { PrevState = State; switch (State) { default: State = IES_ERROR; break; case IES_INTEGER: case IES_RPAREN: State = IES_MOD; IC.pushOperator(IC_MOD); break; } } bool onLBrac() { if (BracCount) return true; PrevState = State; switch (State) { default: State = IES_ERROR; break; case IES_RBRAC: case IES_INTEGER: case IES_RPAREN: State = IES_PLUS; IC.pushOperator(IC_PLUS); break; case IES_INIT: assert(!BracCount && "BracCount should be zero on parsing's start"); State = IES_LBRAC; break; } MemExpr = true; BracCount++; return false; } bool onRBrac() { IntelExprState CurrState = State; switch (State) { default: State = IES_ERROR; break; case IES_INTEGER: case IES_REGISTER: case IES_RPAREN: if (BracCount-- != 1) return true; State = IES_RBRAC; if (CurrState == IES_REGISTER && PrevState != IES_MULTIPLY) { // If we already have a BaseReg, then assume this is the IndexReg with // no explicit scale. 
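      // NOTE: e.g. for "[ebx+esi]" ebx has already been recorded as the base
      // when the bracket closes, so esi becomes the index register here with
      // Scale left at 0 (no explicit scale was written).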
if (!BaseReg) { BaseReg = TmpReg; } else { assert (!IndexReg && "BaseReg/IndexReg already set!"); IndexReg = TmpReg; Scale = 0; } } break; } PrevState = CurrState; return false; } void onLParen() { IntelExprState CurrState = State; switch (State) { default: State = IES_ERROR; break; case IES_PLUS: case IES_MINUS: case IES_NOT: case IES_OR: case IES_XOR: case IES_AND: case IES_LSHIFT: case IES_RSHIFT: case IES_MULTIPLY: case IES_DIVIDE: case IES_MOD: case IES_LPAREN: case IES_INIT: case IES_LBRAC: State = IES_LPAREN; IC.pushOperator(IC_LPAREN); break; } PrevState = CurrState; } void onRParen() { PrevState = State; switch (State) { default: State = IES_ERROR; break; case IES_INTEGER: case IES_REGISTER: case IES_RPAREN: State = IES_RPAREN; IC.pushOperator(IC_RPAREN); break; } } }; bool Error(SMLoc L, const Twine &Msg, SMRange Range = None, bool MatchingInlineAsm = false) { MCAsmParser &Parser = getParser(); if (MatchingInlineAsm) { if (!getLexer().isAtStartOfStatement()) Parser.eatToEndOfStatement(); return false; } return Parser.Error(L, Msg, Range); } std::nullptr_t ErrorOperand(SMLoc Loc, StringRef Msg) { Error(Loc, Msg); return nullptr; } std::unique_ptr DefaultMemSIOperand(SMLoc Loc); std::unique_ptr DefaultMemDIOperand(SMLoc Loc); bool IsSIReg(unsigned Reg); unsigned GetSIDIForRegClass(unsigned RegClassID, unsigned Reg, bool IsSIReg); void AddDefaultSrcDestOperands(OperandVector &Operands, std::unique_ptr &&Src, std::unique_ptr &&Dst); bool VerifyAndAdjustOperands(OperandVector &OrigOperands, OperandVector &FinalOperands); std::unique_ptr ParseOperand(); std::unique_ptr ParseATTOperand(); std::unique_ptr ParseIntelOperand(); std::unique_ptr ParseIntelOffsetOfOperator(); bool ParseIntelDotOperator(IntelExprStateMachine &SM, SMLoc &End); unsigned IdentifyIntelInlineAsmOperator(StringRef Name); unsigned ParseIntelInlineAsmOperator(unsigned OpKind); std::unique_ptr ParseRoundingModeOp(SMLoc Start); bool ParseIntelNamedOperator(StringRef Name, IntelExprStateMachine &SM); void RewriteIntelExpression(IntelExprStateMachine &SM, SMLoc Start, SMLoc End); bool ParseIntelExpression(IntelExprStateMachine &SM, SMLoc &End); bool ParseIntelInlineAsmIdentifier(const MCExpr *&Val, StringRef &Identifier, InlineAsmIdentifierInfo &Info, bool IsUnevaluatedOperand, SMLoc &End); std::unique_ptr ParseMemOperand(unsigned SegReg, SMLoc MemStart); bool ParseIntelMemoryOperandSize(unsigned &Size); std::unique_ptr CreateMemForInlineAsm(unsigned SegReg, const MCExpr *Disp, unsigned BaseReg, unsigned IndexReg, unsigned Scale, SMLoc Start, SMLoc End, unsigned Size, StringRef Identifier, const InlineAsmIdentifierInfo &Info); bool parseDirectiveEven(SMLoc L); bool ParseDirectiveCode(StringRef IDVal, SMLoc L); /// CodeView FPO data directives. bool parseDirectiveFPOProc(SMLoc L); bool parseDirectiveFPOSetFrame(SMLoc L); bool parseDirectiveFPOPushReg(SMLoc L); bool parseDirectiveFPOStackAlloc(SMLoc L); bool parseDirectiveFPOEndPrologue(SMLoc L); bool parseDirectiveFPOEndProc(SMLoc L); bool parseDirectiveFPOData(SMLoc L); bool validateInstruction(MCInst &Inst, const OperandVector &Ops); bool processInstruction(MCInst &Inst, const OperandVector &Ops); /// Wrapper around MCStreamer::EmitInstruction(). Possibly adds /// instrumentation around Inst. 
void EmitInstruction(MCInst &Inst, OperandVector &Operands, MCStreamer &Out); bool MatchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode, OperandVector &Operands, MCStreamer &Out, uint64_t &ErrorInfo, bool MatchingInlineAsm) override; void MatchFPUWaitAlias(SMLoc IDLoc, X86Operand &Op, OperandVector &Operands, MCStreamer &Out, bool MatchingInlineAsm); bool ErrorMissingFeature(SMLoc IDLoc, uint64_t ErrorInfo, bool MatchingInlineAsm); bool MatchAndEmitATTInstruction(SMLoc IDLoc, unsigned &Opcode, OperandVector &Operands, MCStreamer &Out, uint64_t &ErrorInfo, bool MatchingInlineAsm); bool MatchAndEmitIntelInstruction(SMLoc IDLoc, unsigned &Opcode, OperandVector &Operands, MCStreamer &Out, uint64_t &ErrorInfo, bool MatchingInlineAsm); bool OmitRegisterFromClobberLists(unsigned RegNo) override; /// Parses AVX512 specific operand primitives: masked registers ({%k}, {z}) /// and memory broadcasting ({1to}) primitives, updating Operands vector if required. /// return false if no parsing errors occurred, true otherwise. bool HandleAVX512Operand(OperandVector &Operands, const MCParsedAsmOperand &Op); bool ParseZ(std::unique_ptr &Z, const SMLoc &StartLoc); bool is64BitMode() const { // FIXME: Can tablegen auto-generate this? return getSTI().getFeatureBits()[X86::Mode64Bit]; } bool is32BitMode() const { // FIXME: Can tablegen auto-generate this? return getSTI().getFeatureBits()[X86::Mode32Bit]; } bool is16BitMode() const { // FIXME: Can tablegen auto-generate this? return getSTI().getFeatureBits()[X86::Mode16Bit]; } void SwitchMode(unsigned mode) { MCSubtargetInfo &STI = copySTI(); FeatureBitset AllModes({X86::Mode64Bit, X86::Mode32Bit, X86::Mode16Bit}); FeatureBitset OldMode = STI.getFeatureBits() & AllModes; uint64_t FB = ComputeAvailableFeatures( STI.ToggleFeature(OldMode.flip(mode))); setAvailableFeatures(FB); assert(FeatureBitset({mode}) == (STI.getFeatureBits() & AllModes)); } unsigned getPointerWidth() { if (is16BitMode()) return 16; if (is32BitMode()) return 32; if (is64BitMode()) return 64; llvm_unreachable("invalid mode"); } bool isParsingIntelSyntax() { return getParser().getAssemblerDialect(); } /// @name Auto-generated Matcher Functions /// { #define GET_ASSEMBLER_HEADER #include "X86GenAsmMatcher.inc" /// } public: X86AsmParser(const MCSubtargetInfo &sti, MCAsmParser &Parser, const MCInstrInfo &mii, const MCTargetOptions &Options) : MCTargetAsmParser(Options, sti, mii), InstInfo(nullptr), Code16GCC(false) { Parser.addAliasForDirective(".word", ".2byte"); // Initialize the set of available features. setAvailableFeatures(ComputeAvailableFeatures(getSTI().getFeatureBits())); Instrumentation.reset( CreateX86AsmInstrumentation(Options, Parser.getContext(), STI)); } bool ParseRegister(unsigned &RegNo, SMLoc &StartLoc, SMLoc &EndLoc) override; void SetFrameRegister(unsigned RegNo) override; bool parsePrimaryExpr(const MCExpr *&Res, SMLoc &EndLoc) override; bool ParseInstruction(ParseInstructionInfo &Info, StringRef Name, SMLoc NameLoc, OperandVector &Operands) override; bool ParseDirective(AsmToken DirectiveID) override; }; } // end anonymous namespace /// @name Auto-generated Match Functions /// { static unsigned MatchRegisterName(StringRef Name); /// } static bool CheckBaseRegAndIndexRegAndScale(unsigned BaseReg, unsigned IndexReg, unsigned Scale, bool Is64BitMode, StringRef &ErrMsg) { // If we have both a base register and an index register make sure they are // both 64-bit or 32-bit registers. // To support VSIB, IndexReg can be 128-bit or 256-bit registers. 
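  // NOTE: VSIB forms such as "vgatherdpd ymm0, [rax + xmm1*8], ymm2" are why
  // a vector index register is accepted here, while the base must still be a
  // general-purpose register (or RIP/EIP).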
if (BaseReg != 0 && !(BaseReg == X86::RIP || BaseReg == X86::EIP || X86MCRegisterClasses[X86::GR16RegClassID].contains(BaseReg) || X86MCRegisterClasses[X86::GR32RegClassID].contains(BaseReg) || X86MCRegisterClasses[X86::GR64RegClassID].contains(BaseReg))) { ErrMsg = "invalid base+index expression"; return true; } if (IndexReg != 0 && !(IndexReg == X86::EIZ || IndexReg == X86::RIZ || X86MCRegisterClasses[X86::GR16RegClassID].contains(IndexReg) || X86MCRegisterClasses[X86::GR32RegClassID].contains(IndexReg) || X86MCRegisterClasses[X86::GR64RegClassID].contains(IndexReg) || X86MCRegisterClasses[X86::VR128XRegClassID].contains(IndexReg) || X86MCRegisterClasses[X86::VR256XRegClassID].contains(IndexReg) || X86MCRegisterClasses[X86::VR512RegClassID].contains(IndexReg))) { ErrMsg = "invalid base+index expression"; return true; } if (((BaseReg == X86::RIP || BaseReg == X86::EIP) && IndexReg != 0) || IndexReg == X86::EIP || IndexReg == X86::RIP || IndexReg == X86::ESP || IndexReg == X86::RSP) { ErrMsg = "invalid base+index expression"; return true; } // Check for use of invalid 16-bit registers. Only BX/BP/SI/DI are allowed, // and then only in non-64-bit modes. if (X86MCRegisterClasses[X86::GR16RegClassID].contains(BaseReg) && (Is64BitMode || (BaseReg != X86::BX && BaseReg != X86::BP && BaseReg != X86::SI && BaseReg != X86::DI)) && BaseReg != X86::DX) { ErrMsg = "invalid 16-bit base register"; return true; } if (BaseReg == 0 && X86MCRegisterClasses[X86::GR16RegClassID].contains(IndexReg)) { ErrMsg = "16-bit memory operand may not include only index register"; return true; } if (BaseReg != 0 && IndexReg != 0) { if (X86MCRegisterClasses[X86::GR64RegClassID].contains(BaseReg) && (X86MCRegisterClasses[X86::GR16RegClassID].contains(IndexReg) || X86MCRegisterClasses[X86::GR32RegClassID].contains(IndexReg) || IndexReg == X86::EIZ)) { ErrMsg = "base register is 64-bit, but index register is not"; return true; } if (X86MCRegisterClasses[X86::GR32RegClassID].contains(BaseReg) && (X86MCRegisterClasses[X86::GR16RegClassID].contains(IndexReg) || X86MCRegisterClasses[X86::GR64RegClassID].contains(IndexReg) || IndexReg == X86::RIZ)) { ErrMsg = "base register is 32-bit, but index register is not"; return true; } if (X86MCRegisterClasses[X86::GR16RegClassID].contains(BaseReg)) { if (X86MCRegisterClasses[X86::GR32RegClassID].contains(IndexReg) || X86MCRegisterClasses[X86::GR64RegClassID].contains(IndexReg)) { ErrMsg = "base register is 16-bit, but index register is not"; return true; } if ((BaseReg != X86::BX && BaseReg != X86::BP) || (IndexReg != X86::SI && IndexReg != X86::DI)) { ErrMsg = "invalid 16-bit base/index register combination"; return true; } } } // RIP/EIP-relative addressing is only supported in 64-bit mode. if (!Is64BitMode && BaseReg != 0 && (BaseReg == X86::RIP || BaseReg == X86::EIP)) { - ErrMsg = "RIP-relative addressing requires 64-bit mode"; + ErrMsg = "IP-relative addressing requires 64-bit mode"; return true; } return checkScale(Scale, ErrMsg); } bool X86AsmParser::ParseRegister(unsigned &RegNo, SMLoc &StartLoc, SMLoc &EndLoc) { MCAsmParser &Parser = getParser(); RegNo = 0; const AsmToken &PercentTok = Parser.getTok(); StartLoc = PercentTok.getLoc(); // If we encounter a %, ignore it. This code handles registers with and // without the prefix, unprefixed registers can occur in cfi directives. if (!isParsingIntelSyntax() && PercentTok.is(AsmToken::Percent)) Parser.Lex(); // Eat percent token. 
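  // NOTE: in this revision the 64-bit-only register check below no longer
  // lists EIP; IP-relative memory operands outside 64-bit mode are still
  // rejected by CheckBaseRegAndIndexRegAndScale, whose message now reads
  // "IP-relative addressing requires 64-bit mode".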
const AsmToken &Tok = Parser.getTok(); EndLoc = Tok.getEndLoc(); if (Tok.isNot(AsmToken::Identifier)) { if (isParsingIntelSyntax()) return true; return Error(StartLoc, "invalid register name", SMRange(StartLoc, EndLoc)); } RegNo = MatchRegisterName(Tok.getString()); // If the match failed, try the register name as lowercase. if (RegNo == 0) RegNo = MatchRegisterName(Tok.getString().lower()); // The "flags" register cannot be referenced directly. // Treat it as an identifier instead. if (isParsingInlineAsm() && isParsingIntelSyntax() && RegNo == X86::EFLAGS) RegNo = 0; if (!is64BitMode()) { // FIXME: This should be done using Requires and // Requires so "eiz" usage in 64-bit instructions can be also // checked. // FIXME: Check AH, CH, DH, BH cannot be used in an instruction requiring a // REX prefix. - if (RegNo == X86::RIZ || RegNo == X86::RIP || RegNo == X86::EIP || + if (RegNo == X86::RIZ || RegNo == X86::RIP || X86MCRegisterClasses[X86::GR64RegClassID].contains(RegNo) || X86II::isX86_64NonExtLowByteReg(RegNo) || X86II::isX86_64ExtendedReg(RegNo)) return Error(StartLoc, "register %" + Tok.getString() + " is only available in 64-bit mode", SMRange(StartLoc, EndLoc)); } // Parse "%st" as "%st(0)" and "%st(1)", which is multiple tokens. if (RegNo == 0 && (Tok.getString() == "st" || Tok.getString() == "ST")) { RegNo = X86::ST0; Parser.Lex(); // Eat 'st' // Check to see if we have '(4)' after %st. if (getLexer().isNot(AsmToken::LParen)) return false; // Lex the paren. getParser().Lex(); const AsmToken &IntTok = Parser.getTok(); if (IntTok.isNot(AsmToken::Integer)) return Error(IntTok.getLoc(), "expected stack index"); switch (IntTok.getIntVal()) { case 0: RegNo = X86::ST0; break; case 1: RegNo = X86::ST1; break; case 2: RegNo = X86::ST2; break; case 3: RegNo = X86::ST3; break; case 4: RegNo = X86::ST4; break; case 5: RegNo = X86::ST5; break; case 6: RegNo = X86::ST6; break; case 7: RegNo = X86::ST7; break; default: return Error(IntTok.getLoc(), "invalid stack index"); } if (getParser().Lex().isNot(AsmToken::RParen)) return Error(Parser.getTok().getLoc(), "expected ')'"); EndLoc = Parser.getTok().getEndLoc(); Parser.Lex(); // Eat ')' return false; } EndLoc = Parser.getTok().getEndLoc(); // If this is "db[0-15]", match it as an alias // for dr[0-15]. if (RegNo == 0 && Tok.getString().startswith("db")) { if (Tok.getString().size() == 3) { switch (Tok.getString()[2]) { case '0': RegNo = X86::DR0; break; case '1': RegNo = X86::DR1; break; case '2': RegNo = X86::DR2; break; case '3': RegNo = X86::DR3; break; case '4': RegNo = X86::DR4; break; case '5': RegNo = X86::DR5; break; case '6': RegNo = X86::DR6; break; case '7': RegNo = X86::DR7; break; case '8': RegNo = X86::DR8; break; case '9': RegNo = X86::DR9; break; } } else if (Tok.getString().size() == 4 && Tok.getString()[2] == '1') { switch (Tok.getString()[3]) { case '0': RegNo = X86::DR10; break; case '1': RegNo = X86::DR11; break; case '2': RegNo = X86::DR12; break; case '3': RegNo = X86::DR13; break; case '4': RegNo = X86::DR14; break; case '5': RegNo = X86::DR15; break; } } if (RegNo != 0) { EndLoc = Parser.getTok().getEndLoc(); Parser.Lex(); // Eat it. return false; } } if (RegNo == 0) { if (isParsingIntelSyntax()) return true; return Error(StartLoc, "invalid register name", SMRange(StartLoc, EndLoc)); } Parser.Lex(); // Eat identifier token. 
return false; } void X86AsmParser::SetFrameRegister(unsigned RegNo) { Instrumentation->SetInitialFrameRegister(RegNo); } std::unique_ptr X86AsmParser::DefaultMemSIOperand(SMLoc Loc) { bool Parse32 = is32BitMode() || Code16GCC; unsigned Basereg = is64BitMode() ? X86::RSI : (Parse32 ? X86::ESI : X86::SI); const MCExpr *Disp = MCConstantExpr::create(0, getContext()); return X86Operand::CreateMem(getPointerWidth(), /*SegReg=*/0, Disp, /*BaseReg=*/Basereg, /*IndexReg=*/0, /*Scale=*/1, Loc, Loc, 0); } std::unique_ptr X86AsmParser::DefaultMemDIOperand(SMLoc Loc) { bool Parse32 = is32BitMode() || Code16GCC; unsigned Basereg = is64BitMode() ? X86::RDI : (Parse32 ? X86::EDI : X86::DI); const MCExpr *Disp = MCConstantExpr::create(0, getContext()); return X86Operand::CreateMem(getPointerWidth(), /*SegReg=*/0, Disp, /*BaseReg=*/Basereg, /*IndexReg=*/0, /*Scale=*/1, Loc, Loc, 0); } bool X86AsmParser::IsSIReg(unsigned Reg) { switch (Reg) { default: llvm_unreachable("Only (R|E)SI and (R|E)DI are expected!"); case X86::RSI: case X86::ESI: case X86::SI: return true; case X86::RDI: case X86::EDI: case X86::DI: return false; } } unsigned X86AsmParser::GetSIDIForRegClass(unsigned RegClassID, unsigned Reg, bool IsSIReg) { switch (RegClassID) { default: llvm_unreachable("Unexpected register class"); case X86::GR64RegClassID: return IsSIReg ? X86::RSI : X86::RDI; case X86::GR32RegClassID: return IsSIReg ? X86::ESI : X86::EDI; case X86::GR16RegClassID: return IsSIReg ? X86::SI : X86::DI; } } void X86AsmParser::AddDefaultSrcDestOperands( OperandVector& Operands, std::unique_ptr &&Src, std::unique_ptr &&Dst) { if (isParsingIntelSyntax()) { Operands.push_back(std::move(Dst)); Operands.push_back(std::move(Src)); } else { Operands.push_back(std::move(Src)); Operands.push_back(std::move(Dst)); } } bool X86AsmParser::VerifyAndAdjustOperands(OperandVector &OrigOperands, OperandVector &FinalOperands) { if (OrigOperands.size() > 1) { // Check if sizes match, OrigOperands also contains the instruction name assert(OrigOperands.size() == FinalOperands.size() + 1 && "Operand size mismatch"); SmallVector, 2> Warnings; // Verify types match int RegClassID = -1; for (unsigned int i = 0; i < FinalOperands.size(); ++i) { X86Operand &OrigOp = static_cast(*OrigOperands[i + 1]); X86Operand &FinalOp = static_cast(*FinalOperands[i]); if (FinalOp.isReg() && (!OrigOp.isReg() || FinalOp.getReg() != OrigOp.getReg())) // Return false and let a normal complaint about bogus operands happen return false; if (FinalOp.isMem()) { if (!OrigOp.isMem()) // Return false and let a normal complaint about bogus operands happen return false; unsigned OrigReg = OrigOp.Mem.BaseReg; unsigned FinalReg = FinalOp.Mem.BaseReg; // If we've already encounterd a register class, make sure all register // bases are of the same register class if (RegClassID != -1 && !X86MCRegisterClasses[RegClassID].contains(OrigReg)) { return Error(OrigOp.getStartLoc(), "mismatching source and destination index registers"); } if (X86MCRegisterClasses[X86::GR64RegClassID].contains(OrigReg)) RegClassID = X86::GR64RegClassID; else if (X86MCRegisterClasses[X86::GR32RegClassID].contains(OrigReg)) RegClassID = X86::GR32RegClassID; else if (X86MCRegisterClasses[X86::GR16RegClassID].contains(OrigReg)) RegClassID = X86::GR16RegClassID; else // Unexpected register class type // Return false and let a normal complaint about bogus operands happen return false; bool IsSI = IsSIReg(FinalReg); FinalReg = GetSIDIForRegClass(RegClassID, FinalReg, IsSI); if (FinalReg != OrigReg) { std::string RegName = 
IsSI ? "ES:(R|E)SI" : "ES:(R|E)DI"; Warnings.push_back(std::make_pair( OrigOp.getStartLoc(), "memory operand is only for determining the size, " + RegName + " will be used for the location")); } FinalOp.Mem.Size = OrigOp.Mem.Size; FinalOp.Mem.SegReg = OrigOp.Mem.SegReg; FinalOp.Mem.BaseReg = FinalReg; } } // Produce warnings only if all the operands passed the adjustment - prevent // legal cases like "movsd (%rax), %xmm0" mistakenly produce warnings for (auto &WarningMsg : Warnings) { Warning(WarningMsg.first, WarningMsg.second); } // Remove old operands for (unsigned int i = 0; i < FinalOperands.size(); ++i) OrigOperands.pop_back(); } // OrigOperands.append(FinalOperands.begin(), FinalOperands.end()); for (unsigned int i = 0; i < FinalOperands.size(); ++i) OrigOperands.push_back(std::move(FinalOperands[i])); return false; } std::unique_ptr X86AsmParser::ParseOperand() { if (isParsingIntelSyntax()) return ParseIntelOperand(); return ParseATTOperand(); } std::unique_ptr X86AsmParser::CreateMemForInlineAsm( unsigned SegReg, const MCExpr *Disp, unsigned BaseReg, unsigned IndexReg, unsigned Scale, SMLoc Start, SMLoc End, unsigned Size, StringRef Identifier, const InlineAsmIdentifierInfo &Info) { // If we found a decl other than a VarDecl, then assume it is a FuncDecl or // some other label reference. if (Info.isKind(InlineAsmIdentifierInfo::IK_Label)) { // Insert an explicit size if the user didn't have one. if (!Size) { Size = getPointerWidth(); InstInfo->AsmRewrites->emplace_back(AOK_SizeDirective, Start, /*Len=*/0, Size); } // Create an absolute memory reference in order to match against // instructions taking a PC relative operand. return X86Operand::CreateMem(getPointerWidth(), Disp, Start, End, Size, Identifier, Info.Label.Decl); } // We either have a direct symbol reference, or an offset from a symbol. The // parser always puts the symbol on the LHS, so look there for size // calculation purposes. unsigned FrontendSize = 0; void *Decl = nullptr; bool IsGlobalLV = false; if (Info.isKind(InlineAsmIdentifierInfo::IK_Var)) { // Size is in terms of bits in this context. FrontendSize = Info.Var.Type * 8; Decl = Info.Var.Decl; IsGlobalLV = Info.Var.IsGlobalLV; } // It is widely common for MS InlineAsm to use a global variable and one/two // registers in a mmory expression, and though unaccessible via rip/eip. if (IsGlobalLV && (BaseReg || IndexReg)) { return X86Operand::CreateMem(getPointerWidth(), Disp, Start, End); // Otherwise, we set the base register to a non-zero value // if we don't know the actual value at this time. This is necessary to // get the matching correct in some cases. } else { BaseReg = BaseReg ? 
BaseReg : 1; return X86Operand::CreateMem(getPointerWidth(), SegReg, Disp, BaseReg, IndexReg, Scale, Start, End, Size, Identifier, Decl, FrontendSize); } } // Some binary bitwise operators have a named synonymous // Query a candidate string for being such a named operator // and if so - invoke the appropriate handler bool X86AsmParser::ParseIntelNamedOperator(StringRef Name, IntelExprStateMachine &SM) { // A named operator should be either lower or upper case, but not a mix if (Name.compare(Name.lower()) && Name.compare(Name.upper())) return false; if (Name.equals_lower("not")) SM.onNot(); else if (Name.equals_lower("or")) SM.onOr(); else if (Name.equals_lower("shl")) SM.onLShift(); else if (Name.equals_lower("shr")) SM.onRShift(); else if (Name.equals_lower("xor")) SM.onXor(); else if (Name.equals_lower("and")) SM.onAnd(); else if (Name.equals_lower("mod")) SM.onMod(); else return false; return true; } bool X86AsmParser::ParseIntelExpression(IntelExprStateMachine &SM, SMLoc &End) { MCAsmParser &Parser = getParser(); const AsmToken &Tok = Parser.getTok(); StringRef ErrMsg; AsmToken::TokenKind PrevTK = AsmToken::Error; bool Done = false; while (!Done) { bool UpdateLocLex = true; AsmToken::TokenKind TK = getLexer().getKind(); switch (TK) { default: if ((Done = SM.isValidEndState())) break; return Error(Tok.getLoc(), "unknown token in expression"); case AsmToken::EndOfStatement: Done = true; break; case AsmToken::Real: // DotOperator: [ebx].0 UpdateLocLex = false; if (ParseIntelDotOperator(SM, End)) return true; break; case AsmToken::At: case AsmToken::String: case AsmToken::Identifier: { SMLoc IdentLoc = Tok.getLoc(); StringRef Identifier = Tok.getString(); UpdateLocLex = false; // Register unsigned Reg; if (Tok.is(AsmToken::Identifier) && !ParseRegister(Reg, IdentLoc, End)) { if (SM.onRegister(Reg, ErrMsg)) return Error(Tok.getLoc(), ErrMsg); break; } // Operator synonymous ("not", "or" etc.) if ((UpdateLocLex = ParseIntelNamedOperator(Identifier, SM))) break; // Symbol reference, when parsing assembly content InlineAsmIdentifierInfo Info; const MCExpr *Val; if (!isParsingInlineAsm()) { if (getParser().parsePrimaryExpr(Val, End)) { return Error(Tok.getLoc(), "Unexpected identifier!"); } else if (SM.onIdentifierExpr(Val, Identifier, Info, false, ErrMsg)) { return Error(IdentLoc, ErrMsg); } else break; } // MS InlineAsm operators (TYPE/LENGTH/SIZE) if (unsigned OpKind = IdentifyIntelInlineAsmOperator(Identifier)) { if (OpKind == IOK_OFFSET) return Error(IdentLoc, "Dealing OFFSET operator as part of" "a compound immediate expression is yet to be supported"); if (int64_t Val = ParseIntelInlineAsmOperator(OpKind)) { if (SM.onInteger(Val, ErrMsg)) return Error(IdentLoc, ErrMsg); } else return true; break; } // MS Dot Operator expression if (Identifier.count('.') && PrevTK == AsmToken::RBrac) { if (ParseIntelDotOperator(SM, End)) return true; break; } // MS InlineAsm identifier // Call parseIdentifier() to combine @ with the identifier behind it. 
if (TK == AsmToken::At && Parser.parseIdentifier(Identifier)) return Error(IdentLoc, "expected identifier"); if (ParseIntelInlineAsmIdentifier(Val, Identifier, Info, false, End)) return true; else if (SM.onIdentifierExpr(Val, Identifier, Info, true, ErrMsg)) return Error(IdentLoc, ErrMsg); break; } case AsmToken::Integer: { // Look for 'b' or 'f' following an Integer as a directional label SMLoc Loc = getTok().getLoc(); int64_t IntVal = getTok().getIntVal(); End = consumeToken(); UpdateLocLex = false; if (getLexer().getKind() == AsmToken::Identifier) { StringRef IDVal = getTok().getString(); if (IDVal == "f" || IDVal == "b") { MCSymbol *Sym = getContext().getDirectionalLocalSymbol(IntVal, IDVal == "b"); MCSymbolRefExpr::VariantKind Variant = MCSymbolRefExpr::VK_None; const MCExpr *Val = MCSymbolRefExpr::create(Sym, Variant, getContext()); if (IDVal == "b" && Sym->isUndefined()) return Error(Loc, "invalid reference to undefined symbol"); StringRef Identifier = Sym->getName(); InlineAsmIdentifierInfo Info; if (SM.onIdentifierExpr(Val, Identifier, Info, isParsingInlineAsm(), ErrMsg)) return Error(Loc, ErrMsg); End = consumeToken(); } else { if (SM.onInteger(IntVal, ErrMsg)) return Error(Loc, ErrMsg); } } else { if (SM.onInteger(IntVal, ErrMsg)) return Error(Loc, ErrMsg); } break; } case AsmToken::Plus: if (SM.onPlus(ErrMsg)) return Error(getTok().getLoc(), ErrMsg); break; case AsmToken::Minus: if (SM.onMinus(ErrMsg)) return Error(getTok().getLoc(), ErrMsg); break; case AsmToken::Tilde: SM.onNot(); break; case AsmToken::Star: SM.onStar(); break; case AsmToken::Slash: SM.onDivide(); break; case AsmToken::Percent: SM.onMod(); break; case AsmToken::Pipe: SM.onOr(); break; case AsmToken::Caret: SM.onXor(); break; case AsmToken::Amp: SM.onAnd(); break; case AsmToken::LessLess: SM.onLShift(); break; case AsmToken::GreaterGreater: SM.onRShift(); break; case AsmToken::LBrac: if (SM.onLBrac()) return Error(Tok.getLoc(), "unexpected bracket encountered"); break; case AsmToken::RBrac: if (SM.onRBrac()) return Error(Tok.getLoc(), "unexpected bracket encountered"); break; case AsmToken::LParen: SM.onLParen(); break; case AsmToken::RParen: SM.onRParen(); break; } if (SM.hadError()) return Error(Tok.getLoc(), "unknown token in expression"); if (!Done && UpdateLocLex) End = consumeToken(); PrevTK = TK; } return false; } void X86AsmParser::RewriteIntelExpression(IntelExprStateMachine &SM, SMLoc Start, SMLoc End) { SMLoc Loc = Start; unsigned ExprLen = End.getPointer() - Start.getPointer(); // Skip everything before a symbol displacement (if we have one) if (SM.getSym()) { StringRef SymName = SM.getSymName(); if (unsigned Len = SymName.data() - Start.getPointer()) InstInfo->AsmRewrites->emplace_back(AOK_Skip, Start, Len); Loc = SMLoc::getFromPointer(SymName.data() + SymName.size()); ExprLen = End.getPointer() - (SymName.data() + SymName.size()); // If we have only a symbol than there's no need for complex rewrite, // simply skip everything after it if (!(SM.getBaseReg() || SM.getIndexReg() || SM.getImm())) { if (ExprLen) InstInfo->AsmRewrites->emplace_back(AOK_Skip, Loc, ExprLen); return; } } // Build an Intel Expression rewrite StringRef BaseRegStr; StringRef IndexRegStr; if (SM.getBaseReg()) BaseRegStr = X86IntelInstPrinter::getRegisterName(SM.getBaseReg()); if (SM.getIndexReg()) IndexRegStr = X86IntelInstPrinter::getRegisterName(SM.getIndexReg()); // Emit it IntelExpr Expr(BaseRegStr, IndexRegStr, SM.getScale(), SM.getImm(), SM.isMemExpr()); InstInfo->AsmRewrites->emplace_back(Loc, ExprLen, Expr); } // 
Inline assembly may use variable names with namespace alias qualifiers. bool X86AsmParser::ParseIntelInlineAsmIdentifier(const MCExpr *&Val, StringRef &Identifier, InlineAsmIdentifierInfo &Info, bool IsUnevaluatedOperand, SMLoc &End) { MCAsmParser &Parser = getParser(); assert(isParsingInlineAsm() && "Expected to be parsing inline assembly."); Val = nullptr; StringRef LineBuf(Identifier.data()); SemaCallback->LookupInlineAsmIdentifier(LineBuf, Info, IsUnevaluatedOperand); const AsmToken &Tok = Parser.getTok(); SMLoc Loc = Tok.getLoc(); // Advance the token stream until the end of the current token is // after the end of what the frontend claimed. const char *EndPtr = Tok.getLoc().getPointer() + LineBuf.size(); do { End = Tok.getEndLoc(); getLexer().Lex(); } while (End.getPointer() < EndPtr); Identifier = LineBuf; // The frontend should end parsing on an assembler token boundary, unless it // failed parsing. assert((End.getPointer() == EndPtr || Info.isKind(InlineAsmIdentifierInfo::IK_Invalid)) && "frontend claimed part of a token?"); // If the identifier lookup was unsuccessful, assume that we are dealing with // a label. if (Info.isKind(InlineAsmIdentifierInfo::IK_Invalid)) { StringRef InternalName = SemaCallback->LookupInlineAsmLabel(Identifier, getSourceManager(), Loc, false); assert(InternalName.size() && "We should have an internal name here."); // Push a rewrite for replacing the identifier name with the internal name. InstInfo->AsmRewrites->emplace_back(AOK_Label, Loc, Identifier.size(), InternalName); } else if (Info.isKind(InlineAsmIdentifierInfo::IK_EnumVal)) return false; // Create the symbol reference. MCSymbol *Sym = getContext().getOrCreateSymbol(Identifier); MCSymbolRefExpr::VariantKind Variant = MCSymbolRefExpr::VK_None; Val = MCSymbolRefExpr::create(Sym, Variant, getParser().getContext()); return false; } //ParseRoundingModeOp - Parse AVX-512 rounding mode operand std::unique_ptr X86AsmParser::ParseRoundingModeOp(SMLoc Start) { MCAsmParser &Parser = getParser(); const AsmToken &Tok = Parser.getTok(); // Eat "{" and mark the current place. const SMLoc consumedToken = consumeToken(); if (Tok.getIdentifier().startswith("r")){ int rndMode = StringSwitch(Tok.getIdentifier()) .Case("rn", X86::STATIC_ROUNDING::TO_NEAREST_INT) .Case("rd", X86::STATIC_ROUNDING::TO_NEG_INF) .Case("ru", X86::STATIC_ROUNDING::TO_POS_INF) .Case("rz", X86::STATIC_ROUNDING::TO_ZERO) .Default(-1); if (-1 == rndMode) return ErrorOperand(Tok.getLoc(), "Invalid rounding mode."); Parser.Lex(); // Eat "r*" of r*-sae if (!getLexer().is(AsmToken::Minus)) return ErrorOperand(Tok.getLoc(), "Expected - at this point"); Parser.Lex(); // Eat "-" Parser.Lex(); // Eat the sae if (!getLexer().is(AsmToken::RCurly)) return ErrorOperand(Tok.getLoc(), "Expected } at this point"); SMLoc End = Tok.getEndLoc(); Parser.Lex(); // Eat "}" const MCExpr *RndModeOp = MCConstantExpr::create(rndMode, Parser.getContext()); return X86Operand::CreateImm(RndModeOp, Start, End); } if(Tok.getIdentifier().equals("sae")){ Parser.Lex(); // Eat the sae if (!getLexer().is(AsmToken::RCurly)) return ErrorOperand(Tok.getLoc(), "Expected } at this point"); Parser.Lex(); // Eat "}" return X86Operand::CreateToken("{sae}", consumedToken); } return ErrorOperand(Tok.getLoc(), "unknown token in expression"); } /// Parse the '.' operator. bool X86AsmParser::ParseIntelDotOperator(IntelExprStateMachine &SM, SMLoc &End) { const AsmToken &Tok = getTok(); unsigned Offset; // Drop the optional '.'. 
  StringRef DotDispStr = Tok.getString();
  if (DotDispStr.startswith("."))
    DotDispStr = DotDispStr.drop_front(1);

  // .Imm gets lexed as a real.
  if (Tok.is(AsmToken::Real)) {
    APInt DotDisp;
    DotDispStr.getAsInteger(10, DotDisp);
    Offset = DotDisp.getZExtValue();
  } else if (isParsingInlineAsm() && Tok.is(AsmToken::Identifier)) {
    std::pair<StringRef, StringRef> BaseMember = DotDispStr.split('.');
    if (SemaCallback->LookupInlineAsmField(BaseMember.first, BaseMember.second,
                                           Offset))
      return Error(Tok.getLoc(), "Unable to lookup field reference!");
  } else
    return Error(Tok.getLoc(), "Unexpected token type!");

  // Eat the DotExpression and update End
  End = SMLoc::getFromPointer(DotDispStr.data());
  const char *DotExprEndLoc = DotDispStr.data() + DotDispStr.size();
  while (Tok.getLoc().getPointer() < DotExprEndLoc)
    Lex();
  SM.addImm(Offset);
  return false;
}

/// Parse the 'offset' operator. This operator is used to specify the
/// location rather than the content of a variable.
std::unique_ptr<X86Operand> X86AsmParser::ParseIntelOffsetOfOperator() {
  MCAsmParser &Parser = getParser();
  const AsmToken &Tok = Parser.getTok();
  SMLoc OffsetOfLoc = Tok.getLoc();
  Parser.Lex(); // Eat offset.

  const MCExpr *Val;
  InlineAsmIdentifierInfo Info;
  SMLoc Start = Tok.getLoc(), End;
  StringRef Identifier = Tok.getString();
  if (ParseIntelInlineAsmIdentifier(Val, Identifier, Info,
                                    /*Unevaluated=*/false, End))
    return nullptr;

  void *Decl = nullptr;
  // FIXME: MS evaluates "offset <constant>" to the underlying integral
  if (Info.isKind(InlineAsmIdentifierInfo::IK_EnumVal))
    return ErrorOperand(Start, "offset operator cannot yet handle constants");
  else if (Info.isKind(InlineAsmIdentifierInfo::IK_Var))
    Decl = Info.Var.Decl;
  // Don't emit the offset operator.
  InstInfo->AsmRewrites->emplace_back(AOK_Skip, OffsetOfLoc, 7);

  // The offset operator will have an 'r' constraint, thus we need to create
  // register operand to ensure proper matching. Just pick a GPR based on
  // the size of a pointer.
  bool Parse32 = is32BitMode() || Code16GCC;
  unsigned RegNo = is64BitMode() ? X86::RBX : (Parse32 ? X86::EBX : X86::BX);

  return X86Operand::CreateReg(RegNo, Start, End, /*GetAddress=*/true,
                               OffsetOfLoc, Identifier, Decl);
}

// Query a candidate string for being an Intel assembly operator
// Report back its kind, or IOK_INVALID if it does not evaluate to a known one
unsigned X86AsmParser::IdentifyIntelInlineAsmOperator(StringRef Name) {
  return StringSwitch<unsigned>(Name)
    .Cases("TYPE","type",IOK_TYPE)
    .Cases("SIZE","size",IOK_SIZE)
    .Cases("LENGTH","length",IOK_LENGTH)
    .Cases("OFFSET","offset",IOK_OFFSET)
    .Default(IOK_INVALID);
}

/// Parse the 'LENGTH', 'TYPE' and 'SIZE' operators. The LENGTH operator
/// returns the number of elements in an array. It returns the value 1 for
/// non-array variables. The SIZE operator returns the size of a C or C++
/// variable. A variable's size is the product of its LENGTH and TYPE. The
/// TYPE operator returns the size of a C or C++ type or variable. If the
/// variable is an array, TYPE returns the size of a single element.
unsigned X86AsmParser::ParseIntelInlineAsmOperator(unsigned OpKind) {
  MCAsmParser &Parser = getParser();
  const AsmToken &Tok = Parser.getTok();
  Parser.Lex(); // Eat operator.
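  // Illustrative example (not from the original source): given "int arr[8];"
  // in the enclosing C/C++ code, MS inline asm such as "mov eax, LENGTH arr"
  // yields 8, "TYPE arr" yields sizeof(int), and "SIZE arr" yields their
  // product, matching the doc comment above.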
const MCExpr *Val = nullptr; InlineAsmIdentifierInfo Info; SMLoc Start = Tok.getLoc(), End; StringRef Identifier = Tok.getString(); if (ParseIntelInlineAsmIdentifier(Val, Identifier, Info, /*Unevaluated=*/true, End)) return 0; if (!Info.isKind(InlineAsmIdentifierInfo::IK_Var)) { Error(Start, "unable to lookup expression"); return 0; } unsigned CVal = 0; switch(OpKind) { default: llvm_unreachable("Unexpected operand kind!"); case IOK_LENGTH: CVal = Info.Var.Length; break; case IOK_SIZE: CVal = Info.Var.Size; break; case IOK_TYPE: CVal = Info.Var.Type; break; } return CVal; } bool X86AsmParser::ParseIntelMemoryOperandSize(unsigned &Size) { Size = StringSwitch(getTok().getString()) .Cases("BYTE", "byte", 8) .Cases("WORD", "word", 16) .Cases("DWORD", "dword", 32) .Cases("FLOAT", "float", 32) .Cases("LONG", "long", 32) .Cases("FWORD", "fword", 48) .Cases("DOUBLE", "double", 64) .Cases("QWORD", "qword", 64) .Cases("MMWORD","mmword", 64) .Cases("XWORD", "xword", 80) .Cases("TBYTE", "tbyte", 80) .Cases("XMMWORD", "xmmword", 128) .Cases("YMMWORD", "ymmword", 256) .Cases("ZMMWORD", "zmmword", 512) .Default(0); if (Size) { const AsmToken &Tok = Lex(); // Eat operand size (e.g., byte, word). if (!(Tok.getString().equals("PTR") || Tok.getString().equals("ptr"))) return Error(Tok.getLoc(), "Expected 'PTR' or 'ptr' token!"); Lex(); // Eat ptr. } return false; } std::unique_ptr X86AsmParser::ParseIntelOperand() { MCAsmParser &Parser = getParser(); const AsmToken &Tok = Parser.getTok(); SMLoc Start, End; // FIXME: Offset operator // Should be handled as part of immediate expression, as other operators // Currently, only supported as a stand-alone operand if (isParsingInlineAsm()) if (IdentifyIntelInlineAsmOperator(Tok.getString()) == IOK_OFFSET) return ParseIntelOffsetOfOperator(); // Parse optional Size directive. unsigned Size; if (ParseIntelMemoryOperandSize(Size)) return nullptr; bool PtrInOperand = bool(Size); Start = Tok.getLoc(); // Rounding mode operand. if (getLexer().is(AsmToken::LCurly)) return ParseRoundingModeOp(Start); // Register operand. unsigned RegNo = 0; if (Tok.is(AsmToken::Identifier) && !ParseRegister(RegNo, Start, End)) { if (RegNo == X86::RIP) return ErrorOperand(Start, "rip can only be used as a base register"); // A Register followed by ':' is considered a segment override if (Tok.isNot(AsmToken::Colon)) return !PtrInOperand ? X86Operand::CreateReg(RegNo, Start, End) : ErrorOperand(Start, "expected memory operand after 'ptr', " "found register operand instead"); // An alleged segment override. 
check if we have a valid segment register if (!X86MCRegisterClasses[X86::SEGMENT_REGRegClassID].contains(RegNo)) return ErrorOperand(Start, "invalid segment register"); // Eat ':' and update Start location Start = Lex().getLoc(); } // Immediates and Memory IntelExprStateMachine SM; if (ParseIntelExpression(SM, End)) return nullptr; if (isParsingInlineAsm()) RewriteIntelExpression(SM, Start, Tok.getLoc()); int64_t Imm = SM.getImm(); const MCExpr *Disp = SM.getSym(); const MCExpr *ImmDisp = MCConstantExpr::create(Imm, getContext()); if (Disp && Imm) Disp = MCBinaryExpr::createAdd(Disp, ImmDisp, getContext()); if (!Disp) Disp = ImmDisp; // RegNo != 0 specifies a valid segment register, // and we are parsing a segment override if (!SM.isMemExpr() && !RegNo) return X86Operand::CreateImm(Disp, Start, End); StringRef ErrMsg; unsigned BaseReg = SM.getBaseReg(); unsigned IndexReg = SM.getIndexReg(); unsigned Scale = SM.getScale(); if (Scale == 0 && BaseReg != X86::ESP && BaseReg != X86::RSP && (IndexReg == X86::ESP || IndexReg == X86::RSP)) std::swap(BaseReg, IndexReg); // If BaseReg is a vector register and IndexReg is not, swap them unless // Scale was specified in which case it would be an error. if (Scale == 0 && !(X86MCRegisterClasses[X86::VR128XRegClassID].contains(IndexReg) || X86MCRegisterClasses[X86::VR256XRegClassID].contains(IndexReg) || X86MCRegisterClasses[X86::VR512RegClassID].contains(IndexReg)) && (X86MCRegisterClasses[X86::VR128XRegClassID].contains(BaseReg) || X86MCRegisterClasses[X86::VR256XRegClassID].contains(BaseReg) || X86MCRegisterClasses[X86::VR512RegClassID].contains(BaseReg))) std::swap(BaseReg, IndexReg); if (Scale != 0 && X86MCRegisterClasses[X86::GR16RegClassID].contains(IndexReg)) return ErrorOperand(Start, "16-bit addresses cannot have a scale"); // If there was no explicit scale specified, change it to 1. if (Scale == 0) Scale = 1; // If this is a 16-bit addressing mode with the base and index in the wrong // order, swap them so CheckBaseRegAndIndexRegAndScale doesn't fail. It is // shared with att syntax where order matters. if ((BaseReg == X86::SI || BaseReg == X86::DI) && (IndexReg == X86::BX || IndexReg == X86::BP)) std::swap(BaseReg, IndexReg); if ((BaseReg || IndexReg) && CheckBaseRegAndIndexRegAndScale(BaseReg, IndexReg, Scale, is64BitMode(), ErrMsg)) return ErrorOperand(Start, ErrMsg); if (isParsingInlineAsm()) return CreateMemForInlineAsm(RegNo, Disp, BaseReg, IndexReg, Scale, Start, End, Size, SM.getSymName(), SM.getIdentifierInfo()); if (!(BaseReg || IndexReg || RegNo)) return X86Operand::CreateMem(getPointerWidth(), Disp, Start, End, Size); return X86Operand::CreateMem(getPointerWidth(), RegNo, Disp, BaseReg, IndexReg, Scale, Start, End, Size); } std::unique_ptr X86AsmParser::ParseATTOperand() { MCAsmParser &Parser = getParser(); switch (getLexer().getKind()) { default: // Parse a memory operand with no segment register. return ParseMemOperand(0, Parser.getTok().getLoc()); case AsmToken::Percent: { // Read the register. unsigned RegNo; SMLoc Start, End; if (ParseRegister(RegNo, Start, End)) return nullptr; if (RegNo == X86::EIZ || RegNo == X86::RIZ) { Error(Start, "%eiz and %riz can only be used as index registers", SMRange(Start, End)); return nullptr; } if (RegNo == X86::RIP) { Error(Start, "%rip can only be used as a base register", SMRange(Start, End)); return nullptr; } // If this is a segment register followed by a ':', then this is the start // of a memory reference, otherwise this is a normal register reference. 
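    // Illustrative (not from the original source): a bare "%ebx" is returned
    // below as a plain register operand, while "%gs:4(%ebx)" falls through to
    // ParseMemOperand as a %gs-relative memory reference.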
if (getLexer().isNot(AsmToken::Colon)) return X86Operand::CreateReg(RegNo, Start, End); if (!X86MCRegisterClasses[X86::SEGMENT_REGRegClassID].contains(RegNo)) return ErrorOperand(Start, "invalid segment register"); getParser().Lex(); // Eat the colon. return ParseMemOperand(RegNo, Start); } case AsmToken::Dollar: { // $42 -> immediate. SMLoc Start = Parser.getTok().getLoc(), End; Parser.Lex(); const MCExpr *Val; if (getParser().parseExpression(Val, End)) return nullptr; return X86Operand::CreateImm(Val, Start, End); } case AsmToken::LCurly:{ SMLoc Start = Parser.getTok().getLoc(); return ParseRoundingModeOp(Start); } } } // true on failure, false otherwise // If no {z} mark was found - Parser doesn't advance bool X86AsmParser::ParseZ(std::unique_ptr &Z, const SMLoc &StartLoc) { MCAsmParser &Parser = getParser(); // Assuming we are just pass the '{' mark, quering the next token // Searched for {z}, but none was found. Return false, as no parsing error was // encountered if (!(getLexer().is(AsmToken::Identifier) && (getLexer().getTok().getIdentifier() == "z"))) return false; Parser.Lex(); // Eat z // Query and eat the '}' mark if (!getLexer().is(AsmToken::RCurly)) return Error(getLexer().getLoc(), "Expected } at this point"); Parser.Lex(); // Eat '}' // Assign Z with the {z} mark opernad Z = X86Operand::CreateToken("{z}", StartLoc); return false; } // true on failure, false otherwise bool X86AsmParser::HandleAVX512Operand(OperandVector &Operands, const MCParsedAsmOperand &Op) { MCAsmParser &Parser = getParser(); if (getLexer().is(AsmToken::LCurly)) { // Eat "{" and mark the current place. const SMLoc consumedToken = consumeToken(); // Distinguish {1to} from {%k}. if(getLexer().is(AsmToken::Integer)) { // Parse memory broadcasting ({1to}). if (getLexer().getTok().getIntVal() != 1) return TokError("Expected 1to at this point"); Parser.Lex(); // Eat "1" of 1to8 if (!getLexer().is(AsmToken::Identifier) || !getLexer().getTok().getIdentifier().startswith("to")) return TokError("Expected 1to at this point"); // Recognize only reasonable suffixes. const char *BroadcastPrimitive = StringSwitch(getLexer().getTok().getIdentifier()) .Case("to2", "{1to2}") .Case("to4", "{1to4}") .Case("to8", "{1to8}") .Case("to16", "{1to16}") .Default(nullptr); if (!BroadcastPrimitive) return TokError("Invalid memory broadcast primitive."); Parser.Lex(); // Eat "toN" of 1toN if (!getLexer().is(AsmToken::RCurly)) return TokError("Expected } at this point"); Parser.Lex(); // Eat "}" Operands.push_back(X86Operand::CreateToken(BroadcastPrimitive, consumedToken)); // No AVX512 specific primitives can pass // after memory broadcasting, so return. return false; } else { // Parse either {k}{z}, {z}{k}, {k} or {z} // last one have no meaning, but GCC accepts it // Currently, we're just pass a '{' mark std::unique_ptr Z; if (ParseZ(Z, consumedToken)) return true; // Reaching here means that parsing of the allegadly '{z}' mark yielded // no errors. // Query for the need of further parsing for a {%k} mark if (!Z || getLexer().is(AsmToken::LCurly)) { SMLoc StartLoc = Z ? 
consumeToken() : consumedToken; // Parse an op-mask register mark ({%k}), which is now to be // expected unsigned RegNo; SMLoc RegLoc; if (!ParseRegister(RegNo, RegLoc, StartLoc) && X86MCRegisterClasses[X86::VK1RegClassID].contains(RegNo)) { if (RegNo == X86::K0) return Error(RegLoc, "Register k0 can't be used as write mask"); if (!getLexer().is(AsmToken::RCurly)) return Error(getLexer().getLoc(), "Expected } at this point"); Operands.push_back(X86Operand::CreateToken("{", StartLoc)); Operands.push_back( X86Operand::CreateReg(RegNo, StartLoc, StartLoc)); Operands.push_back(X86Operand::CreateToken("}", consumeToken())); } else return Error(getLexer().getLoc(), "Expected an op-mask register at this point"); // {%k} mark is found, inquire for {z} if (getLexer().is(AsmToken::LCurly) && !Z) { // Have we've found a parsing error, or found no (expected) {z} mark // - report an error if (ParseZ(Z, consumeToken()) || !Z) return Error(getLexer().getLoc(), "Expected a {z} mark at this point"); } // '{z}' on its own is meaningless, hence should be ignored. // on the contrary - have it been accompanied by a K register, // allow it. if (Z) Operands.push_back(std::move(Z)); } } } return false; } /// ParseMemOperand: segment: disp(basereg, indexreg, scale). The '%ds:' prefix /// has already been parsed if present. std::unique_ptr X86AsmParser::ParseMemOperand(unsigned SegReg, SMLoc MemStart) { MCAsmParser &Parser = getParser(); // We have to disambiguate a parenthesized expression "(4+5)" from the start // of a memory operand with a missing displacement "(%ebx)" or "(,%eax)". The // only way to do this without lookahead is to eat the '(' and see what is // after it. const MCExpr *Disp = MCConstantExpr::create(0, getParser().getContext()); if (getLexer().isNot(AsmToken::LParen)) { SMLoc ExprEnd; if (getParser().parseExpression(Disp, ExprEnd)) return nullptr; // Disp may be a variable, handle register values. if (auto *RE = dyn_cast(Disp)) return X86Operand::CreateReg(RE->getRegNo(), MemStart, ExprEnd); // After parsing the base expression we could either have a parenthesized // memory address or not. If not, return now. If so, eat the (. if (getLexer().isNot(AsmToken::LParen)) { // Unless we have a segment register, treat this as an immediate. if (SegReg == 0) return X86Operand::CreateMem(getPointerWidth(), Disp, MemStart, ExprEnd); return X86Operand::CreateMem(getPointerWidth(), SegReg, Disp, 0, 0, 1, MemStart, ExprEnd); } // Eat the '('. Parser.Lex(); } else { // Okay, we have a '('. We don't know if this is an expression or not, but // so we have to eat the ( to see beyond it. SMLoc LParenLoc = Parser.getTok().getLoc(); Parser.Lex(); // Eat the '('. if (getLexer().is(AsmToken::Percent) || getLexer().is(AsmToken::Comma)) { // Nothing to do here, fall into the code below with the '(' part of the // memory operand consumed. } else { SMLoc ExprEnd; getLexer().UnLex(AsmToken(AsmToken::LParen, "(")); // It must be either an parenthesized expression, or an expression that // begins from a parenthesized expression, parse it now. Example: (1+2) or // (1+2)+3 if (getParser().parseExpression(Disp, ExprEnd)) return nullptr; // After parsing the base expression we could either have a parenthesized // memory address or not. If not, return now. If so, eat the (. if (getLexer().isNot(AsmToken::LParen)) { // Unless we have a segment register, treat this as an immediate. 
if (SegReg == 0) return X86Operand::CreateMem(getPointerWidth(), Disp, LParenLoc, ExprEnd); return X86Operand::CreateMem(getPointerWidth(), SegReg, Disp, 0, 0, 1, MemStart, ExprEnd); } // Eat the '('. Parser.Lex(); } } // If we reached here, then we just ate the ( of the memory operand. Process // the rest of the memory operand. unsigned BaseReg = 0, IndexReg = 0, Scale = 1; SMLoc IndexLoc, BaseLoc; if (getLexer().is(AsmToken::Percent)) { SMLoc StartLoc, EndLoc; BaseLoc = Parser.getTok().getLoc(); if (ParseRegister(BaseReg, StartLoc, EndLoc)) return nullptr; if (BaseReg == X86::EIZ || BaseReg == X86::RIZ) { Error(StartLoc, "eiz and riz can only be used as index registers", SMRange(StartLoc, EndLoc)); return nullptr; } } if (getLexer().is(AsmToken::Comma)) { Parser.Lex(); // Eat the comma. IndexLoc = Parser.getTok().getLoc(); // Following the comma we should have either an index register, or a scale // value. We don't support the later form, but we want to parse it // correctly. // // Not that even though it would be completely consistent to support syntax // like "1(%eax,,1)", the assembler doesn't. Use "eiz" or "riz" for this. if (getLexer().is(AsmToken::Percent)) { SMLoc L; if (ParseRegister(IndexReg, L, L)) return nullptr; if (BaseReg == X86::RIP) { Error(IndexLoc, "%rip as base register can not have an index register"); return nullptr; } if (IndexReg == X86::RIP) { Error(IndexLoc, "%rip is not allowed as an index register"); return nullptr; } if (getLexer().isNot(AsmToken::RParen)) { // Parse the scale amount: // ::= ',' [scale-expression] if (parseToken(AsmToken::Comma, "expected comma in scale expression")) return nullptr; if (getLexer().isNot(AsmToken::RParen)) { SMLoc Loc = Parser.getTok().getLoc(); int64_t ScaleVal; if (getParser().parseAbsoluteExpression(ScaleVal)){ Error(Loc, "expected scale expression"); return nullptr; } // Validate the scale amount. if (X86MCRegisterClasses[X86::GR16RegClassID].contains(BaseReg) && ScaleVal != 1) { Error(Loc, "scale factor in 16-bit address must be 1"); return nullptr; } if (ScaleVal != 1 && ScaleVal != 2 && ScaleVal != 4 && ScaleVal != 8) { Error(Loc, "scale factor in address must be 1, 2, 4 or 8"); return nullptr; } Scale = (unsigned)ScaleVal; } } } else if (getLexer().isNot(AsmToken::RParen)) { // A scale amount without an index is ignored. // index. SMLoc Loc = Parser.getTok().getLoc(); int64_t Value; if (getParser().parseAbsoluteExpression(Value)) return nullptr; if (Value != 1) Warning(Loc, "scale factor without index register is ignored"); Scale = 1; } } // Ok, we've eaten the memory operand, verify we have a ')' and eat it too. SMLoc MemEnd = Parser.getTok().getEndLoc(); if (parseToken(AsmToken::RParen, "unexpected token in memory operand")) return nullptr; // This is a terrible hack to handle "out[s]?[bwl]? %al, (%dx)" -> // "outb %al, %dx". Out doesn't take a memory form, but this is a widely // documented form in various unofficial manuals, so a lot of code uses it. 
  if (BaseReg == X86::DX && IndexReg == 0 && Scale == 1 && SegReg == 0 &&
      isa<MCConstantExpr>(Disp) && cast<MCConstantExpr>(Disp)->getValue() == 0)
    return X86Operand::CreateDXReg(BaseLoc, BaseLoc);

  StringRef ErrMsg;
  if (CheckBaseRegAndIndexRegAndScale(BaseReg, IndexReg, Scale, is64BitMode(),
                                      ErrMsg)) {
    Error(BaseLoc, ErrMsg);
    return nullptr;
  }

  if (SegReg || BaseReg || IndexReg)
    return X86Operand::CreateMem(getPointerWidth(), SegReg, Disp, BaseReg,
                                 IndexReg, Scale, MemStart, MemEnd);
  return X86Operand::CreateMem(getPointerWidth(), Disp, MemStart, MemEnd);
}

// Parse either a standard primary expression or a register.
bool X86AsmParser::parsePrimaryExpr(const MCExpr *&Res, SMLoc &EndLoc) {
  MCAsmParser &Parser = getParser();
  if (Parser.parsePrimaryExpr(Res, EndLoc)) {
    SMLoc StartLoc = Parser.getTok().getLoc();
    // Normal Expression parse fails, check if it could be a register.
    unsigned RegNo;
    bool TryRegParse =
        getTok().is(AsmToken::Percent) ||
        (isParsingIntelSyntax() && getTok().is(AsmToken::Identifier));
    if (!TryRegParse || ParseRegister(RegNo, StartLoc, EndLoc))
      return true;
    // Clear previous parse error and return correct expression.
    Parser.clearPendingErrors();
    Res = X86MCExpr::create(RegNo, Parser.getContext());
    return false;
  }
  return false;
}

bool X86AsmParser::ParseInstruction(ParseInstructionInfo &Info, StringRef Name,
                                    SMLoc NameLoc, OperandVector &Operands) {
  MCAsmParser &Parser = getParser();
  InstInfo = &Info;
  StringRef PatchedName = Name;

  if ((Name.equals("jmp") || Name.equals("jc") || Name.equals("jz")) &&
      isParsingIntelSyntax() && isParsingInlineAsm()) {
    StringRef NextTok = Parser.getTok().getString();
    if (NextTok == "short") {
      SMLoc NameEndLoc =
          NameLoc.getFromPointer(NameLoc.getPointer() + Name.size());
      // Eat the short keyword
      Parser.Lex();
      // MS ignores the short keyword, it determines the jmp type based
      // on the distance of the label
      InstInfo->AsmRewrites->emplace_back(AOK_Skip, NameEndLoc,
                                          NextTok.size() + 1);
    }
  }

  // FIXME: Hack to recognize setneb as setne.
  if (PatchedName.startswith("set") && PatchedName.endswith("b") &&
      PatchedName != "setb" && PatchedName != "setnb")
    PatchedName = PatchedName.substr(0, Name.size()-1);

  // FIXME: Hack to recognize cmp{ss,sd,ps,pd}.
  if ((PatchedName.startswith("cmp") || PatchedName.startswith("vcmp")) &&
      (PatchedName.endswith("ss") || PatchedName.endswith("sd") ||
       PatchedName.endswith("ps") || PatchedName.endswith("pd"))) {
    bool IsVCMP = PatchedName[0] == 'v';
    unsigned CCIdx = IsVCMP ?
4 : 3; unsigned ComparisonCode = StringSwitch( PatchedName.slice(CCIdx, PatchedName.size() - 2)) .Case("eq", 0x00) .Case("eq_oq", 0x00) .Case("lt", 0x01) .Case("lt_os", 0x01) .Case("le", 0x02) .Case("le_os", 0x02) .Case("unord", 0x03) .Case("unord_q", 0x03) .Case("neq", 0x04) .Case("neq_uq", 0x04) .Case("nlt", 0x05) .Case("nlt_us", 0x05) .Case("nle", 0x06) .Case("nle_us", 0x06) .Case("ord", 0x07) .Case("ord_q", 0x07) /* AVX only from here */ .Case("eq_uq", 0x08) .Case("nge", 0x09) .Case("nge_us", 0x09) .Case("ngt", 0x0A) .Case("ngt_us", 0x0A) .Case("false", 0x0B) .Case("false_oq", 0x0B) .Case("neq_oq", 0x0C) .Case("ge", 0x0D) .Case("ge_os", 0x0D) .Case("gt", 0x0E) .Case("gt_os", 0x0E) .Case("true", 0x0F) .Case("true_uq", 0x0F) .Case("eq_os", 0x10) .Case("lt_oq", 0x11) .Case("le_oq", 0x12) .Case("unord_s", 0x13) .Case("neq_us", 0x14) .Case("nlt_uq", 0x15) .Case("nle_uq", 0x16) .Case("ord_s", 0x17) .Case("eq_us", 0x18) .Case("nge_uq", 0x19) .Case("ngt_uq", 0x1A) .Case("false_os", 0x1B) .Case("neq_os", 0x1C) .Case("ge_oq", 0x1D) .Case("gt_oq", 0x1E) .Case("true_us", 0x1F) .Default(~0U); if (ComparisonCode != ~0U && (IsVCMP || ComparisonCode < 8)) { Operands.push_back(X86Operand::CreateToken(PatchedName.slice(0, CCIdx), NameLoc)); const MCExpr *ImmOp = MCConstantExpr::create(ComparisonCode, getParser().getContext()); Operands.push_back(X86Operand::CreateImm(ImmOp, NameLoc, NameLoc)); PatchedName = PatchedName.substr(PatchedName.size() - 2); } } // FIXME: Hack to recognize vpcmp{ub,uw,ud,uq,b,w,d,q}. if (PatchedName.startswith("vpcmp") && (PatchedName.endswith("b") || PatchedName.endswith("w") || PatchedName.endswith("d") || PatchedName.endswith("q"))) { unsigned CCIdx = PatchedName.drop_back().back() == 'u' ? 2 : 1; unsigned ComparisonCode = StringSwitch( PatchedName.slice(5, PatchedName.size() - CCIdx)) .Case("eq", 0x0) // Only allowed on unsigned. Checked below. .Case("lt", 0x1) .Case("le", 0x2) //.Case("false", 0x3) // Not a documented alias. .Case("neq", 0x4) .Case("nlt", 0x5) .Case("nle", 0x6) //.Case("true", 0x7) // Not a documented alias. .Default(~0U); if (ComparisonCode != ~0U && (ComparisonCode != 0 || CCIdx == 2)) { Operands.push_back(X86Operand::CreateToken("vpcmp", NameLoc)); const MCExpr *ImmOp = MCConstantExpr::create(ComparisonCode, getParser().getContext()); Operands.push_back(X86Operand::CreateImm(ImmOp, NameLoc, NameLoc)); PatchedName = PatchedName.substr(PatchedName.size() - CCIdx); } } // FIXME: Hack to recognize vpcom{ub,uw,ud,uq,b,w,d,q}. if (PatchedName.startswith("vpcom") && (PatchedName.endswith("b") || PatchedName.endswith("w") || PatchedName.endswith("d") || PatchedName.endswith("q"))) { unsigned CCIdx = PatchedName.drop_back().back() == 'u' ? 2 : 1; unsigned ComparisonCode = StringSwitch( PatchedName.slice(5, PatchedName.size() - CCIdx)) .Case("lt", 0x0) .Case("le", 0x1) .Case("gt", 0x2) .Case("ge", 0x3) .Case("eq", 0x4) .Case("neq", 0x5) .Case("false", 0x6) .Case("true", 0x7) .Default(~0U); if (ComparisonCode != ~0U) { Operands.push_back(X86Operand::CreateToken("vpcom", NameLoc)); const MCExpr *ImmOp = MCConstantExpr::create(ComparisonCode, getParser().getContext()); Operands.push_back(X86Operand::CreateImm(ImmOp, NameLoc, NameLoc)); PatchedName = PatchedName.substr(PatchedName.size() - CCIdx); } } // Determine whether this is an instruction prefix. // FIXME: // Enhance prefixes integrity robustness. 
for example, following forms // are currently tolerated: // repz repnz ; GAS errors for the use of two similar prefixes // lock addq %rax, %rbx ; Destination operand must be of memory type // xacquire ; xacquire must be accompanied by 'lock' bool isPrefix = StringSwitch(Name) .Cases("rex64", "data32", "data16", true) .Cases("xacquire", "xrelease", true) .Cases("acquire", "release", isParsingIntelSyntax()) .Default(false); auto isLockRepeatNtPrefix = [](StringRef N) { return StringSwitch(N) .Cases("lock", "rep", "repe", "repz", "repne", "repnz", "notrack", true) .Default(false); }; bool CurlyAsEndOfStatement = false; unsigned Flags = X86::IP_NO_PREFIX; while (isLockRepeatNtPrefix(Name.lower())) { unsigned Prefix = StringSwitch(Name) .Cases("lock", "lock", X86::IP_HAS_LOCK) .Cases("rep", "repe", "repz", X86::IP_HAS_REPEAT) .Cases("repne", "repnz", X86::IP_HAS_REPEAT_NE) .Cases("notrack", "notrack", X86::IP_HAS_NOTRACK) .Default(X86::IP_NO_PREFIX); // Invalid prefix (impossible) Flags |= Prefix; if (getLexer().is(AsmToken::EndOfStatement)) { // We don't have real instr with the given prefix // let's use the prefix as the instr. // TODO: there could be several prefixes one after another Flags = X86::IP_NO_PREFIX; break; } Name = Parser.getTok().getString(); Parser.Lex(); // eat the prefix // Hack: we could have something like "rep # some comment" or // "lock; cmpxchg16b $1" or "lock\0A\09incl" or "lock/incl" while (Name.startswith(";") || Name.startswith("\n") || Name.startswith("#") || Name.startswith("\t") || Name.startswith("/")) { Name = Parser.getTok().getString(); Parser.Lex(); // go to next prefix or instr } } if (Flags) PatchedName = Name; // Hacks to handle 'data16' and 'data32' if (PatchedName == "data16" && is16BitMode()) { return Error(NameLoc, "redundant data16 prefix"); } if (PatchedName == "data32") { if (is32BitMode()) return Error(NameLoc, "redundant data32 prefix"); if (is64BitMode()) return Error(NameLoc, "'data32' is not supported in 64-bit mode"); // Hack to 'data16' for the table lookup. PatchedName = "data16"; } Operands.push_back(X86Operand::CreateToken(PatchedName, NameLoc)); // This does the actual operand parsing. Don't parse any more if we have a // prefix juxtaposed with an operation like "lock incl 4(%rax)", because we // just want to parse the "lock" as the first instruction and the "incl" as // the next one. if (getLexer().isNot(AsmToken::EndOfStatement) && !isPrefix) { // Parse '*' modifier. if (getLexer().is(AsmToken::Star)) Operands.push_back(X86Operand::CreateToken("*", consumeToken())); // Read the operands. 
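    // Illustrative (not from the original source): for AT&T input such as
    // "vaddps %zmm2, %zmm1, %zmm0 {%k1}{z}", each comma-separated operand is
    // parsed in turn and HandleAVX512Operand picks up the trailing {%k1}{z}
    // write-mask marks.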
while(1) { if (std::unique_ptr Op = ParseOperand()) { Operands.push_back(std::move(Op)); if (HandleAVX512Operand(Operands, *Operands.back())) return true; } else { return true; } // check for comma and eat it if (getLexer().is(AsmToken::Comma)) Parser.Lex(); else break; } // In MS inline asm curly braces mark the beginning/end of a block, // therefore they should be interepreted as end of statement CurlyAsEndOfStatement = isParsingIntelSyntax() && isParsingInlineAsm() && (getLexer().is(AsmToken::LCurly) || getLexer().is(AsmToken::RCurly)); if (getLexer().isNot(AsmToken::EndOfStatement) && !CurlyAsEndOfStatement) return TokError("unexpected token in argument list"); } // Consume the EndOfStatement or the prefix separator Slash if (getLexer().is(AsmToken::EndOfStatement) || (isPrefix && getLexer().is(AsmToken::Slash))) Parser.Lex(); else if (CurlyAsEndOfStatement) // Add an actual EndOfStatement before the curly brace Info.AsmRewrites->emplace_back(AOK_EndOfStatement, getLexer().getTok().getLoc(), 0); // This is for gas compatibility and cannot be done in td. // Adding "p" for some floating point with no argument. // For example: fsub --> fsubp bool IsFp = Name == "fsub" || Name == "fdiv" || Name == "fsubr" || Name == "fdivr"; if (IsFp && Operands.size() == 1) { const char *Repl = StringSwitch(Name) .Case("fsub", "fsubp") .Case("fdiv", "fdivp") .Case("fsubr", "fsubrp") .Case("fdivr", "fdivrp"); static_cast(*Operands[0]).setTokenValue(Repl); } // Moving a 32 or 16 bit value into a segment register has the same // behavior. Modify such instructions to always take shorter form. if ((Name == "mov" || Name == "movw" || Name == "movl") && (Operands.size() == 3)) { X86Operand &Op1 = (X86Operand &)*Operands[1]; X86Operand &Op2 = (X86Operand &)*Operands[2]; SMLoc Loc = Op1.getEndLoc(); if (Op1.isReg() && Op2.isReg() && X86MCRegisterClasses[X86::SEGMENT_REGRegClassID].contains( Op2.getReg()) && (X86MCRegisterClasses[X86::GR16RegClassID].contains(Op1.getReg()) || X86MCRegisterClasses[X86::GR32RegClassID].contains(Op1.getReg()))) { // Change instruction name to match new instruction. if (Name != "mov" && Name[3] == (is16BitMode() ? 'l' : 'w')) { Name = is16BitMode() ? "movw" : "movl"; Operands[0] = X86Operand::CreateToken(Name, NameLoc); } // Select the correct equivalent 16-/32-bit source register. unsigned Reg = getX86SubSuperRegisterOrZero(Op1.getReg(), is16BitMode() ? 16 : 32); Operands[1] = X86Operand::CreateReg(Reg, Loc, Loc); } } // This is a terrible hack to handle "out[s]?[bwl]? %al, (%dx)" -> // "outb %al, %dx". Out doesn't take a memory form, but this is a widely // documented form in various unofficial manuals, so a lot of code uses it. if ((Name == "outb" || Name == "outsb" || Name == "outw" || Name == "outsw" || Name == "outl" || Name == "outsl" || Name == "out" || Name == "outs") && Operands.size() == 3) { X86Operand &Op = (X86Operand &)*Operands.back(); if (Op.isDXReg()) Operands.back() = X86Operand::CreateReg(X86::DX, Op.getStartLoc(), Op.getEndLoc()); } // Same hack for "in[s]?[bwl]? (%dx), %al" -> "inb %dx, %al". 
if ((Name == "inb" || Name == "insb" || Name == "inw" || Name == "insw" || Name == "inl" || Name == "insl" || Name == "in" || Name == "ins") && Operands.size() == 3) { X86Operand &Op = (X86Operand &)*Operands[1]; if (Op.isDXReg()) Operands[1] = X86Operand::CreateReg(X86::DX, Op.getStartLoc(), Op.getEndLoc()); } SmallVector, 2> TmpOperands; bool HadVerifyError = false; // Append default arguments to "ins[bwld]" if (Name.startswith("ins") && (Operands.size() == 1 || Operands.size() == 3) && (Name == "insb" || Name == "insw" || Name == "insl" || Name == "insd" || Name == "ins")) { AddDefaultSrcDestOperands(TmpOperands, X86Operand::CreateReg(X86::DX, NameLoc, NameLoc), DefaultMemDIOperand(NameLoc)); HadVerifyError = VerifyAndAdjustOperands(Operands, TmpOperands); } // Append default arguments to "outs[bwld]" if (Name.startswith("outs") && (Operands.size() == 1 || Operands.size() == 3) && (Name == "outsb" || Name == "outsw" || Name == "outsl" || Name == "outsd" || Name == "outs")) { AddDefaultSrcDestOperands(TmpOperands, DefaultMemSIOperand(NameLoc), X86Operand::CreateReg(X86::DX, NameLoc, NameLoc)); HadVerifyError = VerifyAndAdjustOperands(Operands, TmpOperands); } // Transform "lods[bwlq]" into "lods[bwlq] ($SIREG)" for appropriate // values of $SIREG according to the mode. It would be nice if this // could be achieved with InstAlias in the tables. if (Name.startswith("lods") && (Operands.size() == 1 || Operands.size() == 2) && (Name == "lods" || Name == "lodsb" || Name == "lodsw" || Name == "lodsl" || Name == "lodsd" || Name == "lodsq")) { TmpOperands.push_back(DefaultMemSIOperand(NameLoc)); HadVerifyError = VerifyAndAdjustOperands(Operands, TmpOperands); } // Transform "stos[bwlq]" into "stos[bwlq] ($DIREG)" for appropriate // values of $DIREG according to the mode. It would be nice if this // could be achieved with InstAlias in the tables. if (Name.startswith("stos") && (Operands.size() == 1 || Operands.size() == 2) && (Name == "stos" || Name == "stosb" || Name == "stosw" || Name == "stosl" || Name == "stosd" || Name == "stosq")) { TmpOperands.push_back(DefaultMemDIOperand(NameLoc)); HadVerifyError = VerifyAndAdjustOperands(Operands, TmpOperands); } // Transform "scas[bwlq]" into "scas[bwlq] ($DIREG)" for appropriate // values of $DIREG according to the mode. It would be nice if this // could be achieved with InstAlias in the tables. if (Name.startswith("scas") && (Operands.size() == 1 || Operands.size() == 2) && (Name == "scas" || Name == "scasb" || Name == "scasw" || Name == "scasl" || Name == "scasd" || Name == "scasq")) { TmpOperands.push_back(DefaultMemDIOperand(NameLoc)); HadVerifyError = VerifyAndAdjustOperands(Operands, TmpOperands); } // Add default SI and DI operands to "cmps[bwlq]". if (Name.startswith("cmps") && (Operands.size() == 1 || Operands.size() == 3) && (Name == "cmps" || Name == "cmpsb" || Name == "cmpsw" || Name == "cmpsl" || Name == "cmpsd" || Name == "cmpsq")) { AddDefaultSrcDestOperands(TmpOperands, DefaultMemDIOperand(NameLoc), DefaultMemSIOperand(NameLoc)); HadVerifyError = VerifyAndAdjustOperands(Operands, TmpOperands); } // Add default SI and DI operands to "movs[bwlq]". 
if (((Name.startswith("movs") && (Name == "movs" || Name == "movsb" || Name == "movsw" || Name == "movsl" || Name == "movsd" || Name == "movsq")) || (Name.startswith("smov") && (Name == "smov" || Name == "smovb" || Name == "smovw" || Name == "smovl" || Name == "smovd" || Name == "smovq"))) && (Operands.size() == 1 || Operands.size() == 3)) { if (Name == "movsd" && Operands.size() == 1 && !isParsingIntelSyntax()) Operands.back() = X86Operand::CreateToken("movsl", NameLoc); AddDefaultSrcDestOperands(TmpOperands, DefaultMemSIOperand(NameLoc), DefaultMemDIOperand(NameLoc)); HadVerifyError = VerifyAndAdjustOperands(Operands, TmpOperands); } // Check if we encountered an error for one the string insturctions if (HadVerifyError) { return HadVerifyError; } // FIXME: Hack to handle recognize s{hr,ar,hl} $1, . Canonicalize to // "shift ". if ((Name.startswith("shr") || Name.startswith("sar") || Name.startswith("shl") || Name.startswith("sal") || Name.startswith("rcl") || Name.startswith("rcr") || Name.startswith("rol") || Name.startswith("ror")) && Operands.size() == 3) { if (isParsingIntelSyntax()) { // Intel syntax X86Operand &Op1 = static_cast(*Operands[2]); if (Op1.isImm() && isa(Op1.getImm()) && cast(Op1.getImm())->getValue() == 1) Operands.pop_back(); } else { X86Operand &Op1 = static_cast(*Operands[1]); if (Op1.isImm() && isa(Op1.getImm()) && cast(Op1.getImm())->getValue() == 1) Operands.erase(Operands.begin() + 1); } } // Transforms "int $3" into "int3" as a size optimization. We can't write an // instalias with an immediate operand yet. if (Name == "int" && Operands.size() == 2) { X86Operand &Op1 = static_cast(*Operands[1]); if (Op1.isImm()) if (auto *CE = dyn_cast(Op1.getImm())) if (CE->getValue() == 3) { Operands.erase(Operands.begin() + 1); static_cast(*Operands[0]).setTokenValue("int3"); } } // Transforms "xlat mem8" into "xlatb" if ((Name == "xlat" || Name == "xlatb") && Operands.size() == 2) { X86Operand &Op1 = static_cast(*Operands[1]); if (Op1.isMem8()) { Warning(Op1.getStartLoc(), "memory operand is only for determining the " "size, (R|E)BX will be used for the location"); Operands.pop_back(); static_cast(*Operands[0]).setTokenValue("xlatb"); } } if (Flags) Operands.push_back(X86Operand::CreatePrefix(Flags, NameLoc, NameLoc)); return false; } bool X86AsmParser::processInstruction(MCInst &Inst, const OperandVector &Ops) { return false; } bool X86AsmParser::validateInstruction(MCInst &Inst, const OperandVector &Ops) { const MCRegisterInfo *MRI = getContext().getRegisterInfo(); switch (Inst.getOpcode()) { case X86::VGATHERDPDYrm: case X86::VGATHERDPDrm: case X86::VGATHERDPSYrm: case X86::VGATHERDPSrm: case X86::VGATHERQPDYrm: case X86::VGATHERQPDrm: case X86::VGATHERQPSYrm: case X86::VGATHERQPSrm: case X86::VPGATHERDDYrm: case X86::VPGATHERDDrm: case X86::VPGATHERDQYrm: case X86::VPGATHERDQrm: case X86::VPGATHERQDYrm: case X86::VPGATHERQDrm: case X86::VPGATHERQQYrm: case X86::VPGATHERQQrm: { unsigned Dest = MRI->getEncodingValue(Inst.getOperand(0).getReg()); unsigned Mask = MRI->getEncodingValue(Inst.getOperand(1).getReg()); unsigned Index = MRI->getEncodingValue(Inst.getOperand(3 + X86::AddrIndexReg).getReg()); if (Dest == Mask || Dest == Index || Mask == Index) return Warning(Ops[0]->getStartLoc(), "mask, index, and destination " "registers should be distinct"); break; } case X86::VGATHERDPDZ128rm: case X86::VGATHERDPDZ256rm: case X86::VGATHERDPDZrm: case X86::VGATHERDPSZ128rm: case X86::VGATHERDPSZ256rm: case X86::VGATHERDPSZrm: case X86::VGATHERQPDZ128rm: case 
X86::VGATHERQPDZ256rm: case X86::VGATHERQPDZrm: case X86::VGATHERQPSZ128rm: case X86::VGATHERQPSZ256rm: case X86::VGATHERQPSZrm: case X86::VPGATHERDDZ128rm: case X86::VPGATHERDDZ256rm: case X86::VPGATHERDDZrm: case X86::VPGATHERDQZ128rm: case X86::VPGATHERDQZ256rm: case X86::VPGATHERDQZrm: case X86::VPGATHERQDZ128rm: case X86::VPGATHERQDZ256rm: case X86::VPGATHERQDZrm: case X86::VPGATHERQQZ128rm: case X86::VPGATHERQQZ256rm: case X86::VPGATHERQQZrm: { unsigned Dest = MRI->getEncodingValue(Inst.getOperand(0).getReg()); unsigned Index = MRI->getEncodingValue(Inst.getOperand(4 + X86::AddrIndexReg).getReg()); if (Dest == Index) return Warning(Ops[0]->getStartLoc(), "index and destination registers " "should be distinct"); break; } case X86::V4FMADDPSrm: case X86::V4FMADDPSrmk: case X86::V4FMADDPSrmkz: case X86::V4FMADDSSrm: case X86::V4FMADDSSrmk: case X86::V4FMADDSSrmkz: case X86::V4FNMADDPSrm: case X86::V4FNMADDPSrmk: case X86::V4FNMADDPSrmkz: case X86::V4FNMADDSSrm: case X86::V4FNMADDSSrmk: case X86::V4FNMADDSSrmkz: case X86::VP4DPWSSDSrm: case X86::VP4DPWSSDSrmk: case X86::VP4DPWSSDSrmkz: case X86::VP4DPWSSDrm: case X86::VP4DPWSSDrmk: case X86::VP4DPWSSDrmkz: { unsigned Src2 = Inst.getOperand(Inst.getNumOperands() - X86::AddrNumOperands - 1).getReg(); unsigned Src2Enc = MRI->getEncodingValue(Src2); if (Src2Enc % 4 != 0) { StringRef RegName = X86IntelInstPrinter::getRegisterName(Src2); unsigned GroupStart = (Src2Enc / 4) * 4; unsigned GroupEnd = GroupStart + 3; return Warning(Ops[0]->getStartLoc(), "source register '" + RegName + "' implicitly denotes '" + RegName.take_front(3) + Twine(GroupStart) + "' to '" + RegName.take_front(3) + Twine(GroupEnd) + "' source group"); } break; } } return false; } static const char *getSubtargetFeatureName(uint64_t Val); void X86AsmParser::EmitInstruction(MCInst &Inst, OperandVector &Operands, MCStreamer &Out) { Instrumentation->InstrumentAndEmitInstruction( Inst, Operands, getContext(), MII, Out, getParser().shouldPrintSchedInfo()); } bool X86AsmParser::MatchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode, OperandVector &Operands, MCStreamer &Out, uint64_t &ErrorInfo, bool MatchingInlineAsm) { if (isParsingIntelSyntax()) return MatchAndEmitIntelInstruction(IDLoc, Opcode, Operands, Out, ErrorInfo, MatchingInlineAsm); return MatchAndEmitATTInstruction(IDLoc, Opcode, Operands, Out, ErrorInfo, MatchingInlineAsm); } void X86AsmParser::MatchFPUWaitAlias(SMLoc IDLoc, X86Operand &Op, OperandVector &Operands, MCStreamer &Out, bool MatchingInlineAsm) { // FIXME: This should be replaced with a real .td file alias mechanism. // Also, MatchInstructionImpl should actually *do* the EmitInstruction // call. 
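  // Illustrative (not from the original source): "finit" is handled by
  // emitting an X86::WAIT instruction and then rewriting the mnemonic token
  // to "fninit" below; the other aliases in the table behave the same way.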
const char *Repl = StringSwitch(Op.getToken()) .Case("finit", "fninit") .Case("fsave", "fnsave") .Case("fstcw", "fnstcw") .Case("fstcww", "fnstcw") .Case("fstenv", "fnstenv") .Case("fstsw", "fnstsw") .Case("fstsww", "fnstsw") .Case("fclex", "fnclex") .Default(nullptr); if (Repl) { MCInst Inst; Inst.setOpcode(X86::WAIT); Inst.setLoc(IDLoc); if (!MatchingInlineAsm) EmitInstruction(Inst, Operands, Out); Operands[0] = X86Operand::CreateToken(Repl, IDLoc); } } bool X86AsmParser::ErrorMissingFeature(SMLoc IDLoc, uint64_t ErrorInfo, bool MatchingInlineAsm) { assert(ErrorInfo && "Unknown missing feature!"); SmallString<126> Msg; raw_svector_ostream OS(Msg); OS << "instruction requires:"; uint64_t Mask = 1; for (unsigned i = 0; i < (sizeof(ErrorInfo)*8-1); ++i) { if (ErrorInfo & Mask) OS << ' ' << getSubtargetFeatureName(ErrorInfo & Mask); Mask <<= 1; } return Error(IDLoc, OS.str(), SMRange(), MatchingInlineAsm); } static unsigned getPrefixes(OperandVector &Operands) { unsigned Result = 0; X86Operand &Prefix = static_cast(*Operands.back()); if (Prefix.isPrefix()) { Result = Prefix.getPrefix(); Operands.pop_back(); } return Result; } bool X86AsmParser::MatchAndEmitATTInstruction(SMLoc IDLoc, unsigned &Opcode, OperandVector &Operands, MCStreamer &Out, uint64_t &ErrorInfo, bool MatchingInlineAsm) { assert(!Operands.empty() && "Unexpect empty operand list!"); X86Operand &Op = static_cast(*Operands[0]); assert(Op.isToken() && "Leading operand should always be a mnemonic!"); SMRange EmptyRange = None; // First, handle aliases that expand to multiple instructions. MatchFPUWaitAlias(IDLoc, Op, Operands, Out, MatchingInlineAsm); bool WasOriginallyInvalidOperand = false; unsigned Prefixes = getPrefixes(Operands); MCInst Inst; if (Prefixes) Inst.setFlags(Prefixes); // First, try a direct match. switch (MatchInstruction(Operands, Inst, ErrorInfo, MatchingInlineAsm, isParsingIntelSyntax())) { default: llvm_unreachable("Unexpected match result!"); case Match_Success: if (!MatchingInlineAsm && validateInstruction(Inst, Operands)) return true; // Some instructions need post-processing to, for example, tweak which // encoding is selected. Loop on it while changes happen so the // individual transformations can chain off each other. if (!MatchingInlineAsm) while (processInstruction(Inst, Operands)) ; Inst.setLoc(IDLoc); if (!MatchingInlineAsm) EmitInstruction(Inst, Operands, Out); Opcode = Inst.getOpcode(); return false; case Match_MissingFeature: return ErrorMissingFeature(IDLoc, ErrorInfo, MatchingInlineAsm); case Match_InvalidOperand: WasOriginallyInvalidOperand = true; break; case Match_MnemonicFail: break; } // FIXME: Ideally, we would only attempt suffix matches for things which are // valid prefixes, and we could just infer the right unambiguous // type. However, that requires substantially more matcher support than the // following hack. // Change the operand to point to a temporary token. StringRef Base = Op.getToken(); SmallString<16> Tmp; Tmp += Base; Tmp += ' '; Op.setTokenValue(Tmp); // If this instruction starts with an 'f', then it is a floating point stack // instruction. These come in up to three forms for 32-bit, 64-bit, and // 80-bit floating point, which use the suffixes s,l,t respectively. // // Otherwise, we assume that this may be an integer instruction, which comes // in 8/16/32/64-bit forms using the b,w,l,q suffixes respectively. const char *Suffixes = Base[0] != 'f' ? "bwlq" : "slt\0"; // Check for the various suffix matches. 
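  // Illustrative (not from the original source): for an integer mnemonic such
  // as "mov" the candidates tried below are "movb"/"movw"/"movl"/"movq", while
  // for an FP stack mnemonic such as "fadd" the suffixes tried are 's', 'l',
  // 't' and a trailing NUL.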
uint64_t ErrorInfoIgnore; uint64_t ErrorInfoMissingFeature = 0; // Init suppresses compiler warnings. unsigned Match[4]; for (unsigned I = 0, E = array_lengthof(Match); I != E; ++I) { Tmp.back() = Suffixes[I]; Match[I] = MatchInstruction(Operands, Inst, ErrorInfoIgnore, MatchingInlineAsm, isParsingIntelSyntax()); // If this returned as a missing feature failure, remember that. if (Match[I] == Match_MissingFeature) ErrorInfoMissingFeature = ErrorInfoIgnore; } // Restore the old token. Op.setTokenValue(Base); // If exactly one matched, then we treat that as a successful match (and the // instruction will already have been filled in correctly, since the failing // matches won't have modified it). unsigned NumSuccessfulMatches = std::count(std::begin(Match), std::end(Match), Match_Success); if (NumSuccessfulMatches == 1) { Inst.setLoc(IDLoc); if (!MatchingInlineAsm) EmitInstruction(Inst, Operands, Out); Opcode = Inst.getOpcode(); return false; } // Otherwise, the match failed, try to produce a decent error message. // If we had multiple suffix matches, then identify this as an ambiguous // match. if (NumSuccessfulMatches > 1) { char MatchChars[4]; unsigned NumMatches = 0; for (unsigned I = 0, E = array_lengthof(Match); I != E; ++I) if (Match[I] == Match_Success) MatchChars[NumMatches++] = Suffixes[I]; SmallString<126> Msg; raw_svector_ostream OS(Msg); OS << "ambiguous instructions require an explicit suffix (could be "; for (unsigned i = 0; i != NumMatches; ++i) { if (i != 0) OS << ", "; if (i + 1 == NumMatches) OS << "or "; OS << "'" << Base << MatchChars[i] << "'"; } OS << ")"; Error(IDLoc, OS.str(), EmptyRange, MatchingInlineAsm); return true; } // Okay, we know that none of the variants matched successfully. // If all of the instructions reported an invalid mnemonic, then the original // mnemonic was invalid. if (std::count(std::begin(Match), std::end(Match), Match_MnemonicFail) == 4) { if (!WasOriginallyInvalidOperand) { return Error(IDLoc, "invalid instruction mnemonic '" + Base + "'", Op.getLocRange(), MatchingInlineAsm); } // Recover location info for the operand if we know which was the problem. if (ErrorInfo != ~0ULL) { if (ErrorInfo >= Operands.size()) return Error(IDLoc, "too few operands for instruction", EmptyRange, MatchingInlineAsm); X86Operand &Operand = (X86Operand &)*Operands[ErrorInfo]; if (Operand.getStartLoc().isValid()) { SMRange OperandRange = Operand.getLocRange(); return Error(Operand.getStartLoc(), "invalid operand for instruction", OperandRange, MatchingInlineAsm); } } return Error(IDLoc, "invalid operand for instruction", EmptyRange, MatchingInlineAsm); } // If one instruction matched with a missing feature, report this as a // missing feature. if (std::count(std::begin(Match), std::end(Match), Match_MissingFeature) == 1) { ErrorInfo = ErrorInfoMissingFeature; return ErrorMissingFeature(IDLoc, ErrorInfoMissingFeature, MatchingInlineAsm); } // If one instruction matched with an invalid operand, report this as an // operand failure. if (std::count(std::begin(Match), std::end(Match), Match_InvalidOperand) == 1) { return Error(IDLoc, "invalid operand for instruction", EmptyRange, MatchingInlineAsm); } // If all of these were an outright failure, report it in a useless way. 
Error(IDLoc, "unknown use of instruction mnemonic without a size suffix", EmptyRange, MatchingInlineAsm); return true; } bool X86AsmParser::MatchAndEmitIntelInstruction(SMLoc IDLoc, unsigned &Opcode, OperandVector &Operands, MCStreamer &Out, uint64_t &ErrorInfo, bool MatchingInlineAsm) { assert(!Operands.empty() && "Unexpect empty operand list!"); X86Operand &Op = static_cast(*Operands[0]); assert(Op.isToken() && "Leading operand should always be a mnemonic!"); StringRef Mnemonic = Op.getToken(); SMRange EmptyRange = None; StringRef Base = Op.getToken(); unsigned Prefixes = getPrefixes(Operands); // First, handle aliases that expand to multiple instructions. MatchFPUWaitAlias(IDLoc, Op, Operands, Out, MatchingInlineAsm); MCInst Inst; if (Prefixes) Inst.setFlags(Prefixes); // Find one unsized memory operand, if present. X86Operand *UnsizedMemOp = nullptr; for (const auto &Op : Operands) { X86Operand *X86Op = static_cast(Op.get()); if (X86Op->isMemUnsized()) { UnsizedMemOp = X86Op; // Have we found an unqualified memory operand, // break. IA allows only one memory operand. break; } } // Allow some instructions to have implicitly pointer-sized operands. This is // compatible with gas. if (UnsizedMemOp) { static const char *const PtrSizedInstrs[] = {"call", "jmp", "push"}; for (const char *Instr : PtrSizedInstrs) { if (Mnemonic == Instr) { UnsizedMemOp->Mem.Size = getPointerWidth(); break; } } } SmallVector Match; uint64_t ErrorInfoMissingFeature = 0; // If unsized push has immediate operand we should default the default pointer // size for the size. if (Mnemonic == "push" && Operands.size() == 2) { auto *X86Op = static_cast(Operands[1].get()); if (X86Op->isImm()) { // If it's not a constant fall through and let remainder take care of it. const auto *CE = dyn_cast(X86Op->getImm()); unsigned Size = getPointerWidth(); if (CE && (isIntN(Size, CE->getValue()) || isUIntN(Size, CE->getValue()))) { SmallString<16> Tmp; Tmp += Base; Tmp += (is64BitMode()) ? "q" : (is32BitMode()) ? "l" : (is16BitMode()) ? "w" : " "; Op.setTokenValue(Tmp); // Do match in ATT mode to allow explicit suffix usage. Match.push_back(MatchInstruction(Operands, Inst, ErrorInfo, MatchingInlineAsm, false /*isParsingIntelSyntax()*/)); Op.setTokenValue(Base); } } } // If an unsized memory operand is present, try to match with each memory // operand size. In Intel assembly, the size is not part of the instruction // mnemonic. if (UnsizedMemOp && UnsizedMemOp->isMemUnsized()) { static const unsigned MopSizes[] = {8, 16, 32, 64, 80, 128, 256, 512}; for (unsigned Size : MopSizes) { UnsizedMemOp->Mem.Size = Size; uint64_t ErrorInfoIgnore; unsigned LastOpcode = Inst.getOpcode(); unsigned M = MatchInstruction(Operands, Inst, ErrorInfoIgnore, MatchingInlineAsm, isParsingIntelSyntax()); if (Match.empty() || LastOpcode != Inst.getOpcode()) Match.push_back(M); // If this returned as a missing feature failure, remember that. if (Match.back() == Match_MissingFeature) ErrorInfoMissingFeature = ErrorInfoIgnore; } // Restore the size of the unsized memory operand if we modified it. UnsizedMemOp->Mem.Size = 0; } // If we haven't matched anything yet, this is not a basic integer or FPU // operation. There shouldn't be any ambiguity in our mnemonic table, so try // matching with the unsized operand. if (Match.empty()) { Match.push_back(MatchInstruction( Operands, Inst, ErrorInfo, MatchingInlineAsm, isParsingIntelSyntax())); // If this returned as a missing feature failure, remember that. 
if (Match.back() == Match_MissingFeature) ErrorInfoMissingFeature = ErrorInfo; } // Restore the size of the unsized memory operand if we modified it. if (UnsizedMemOp) UnsizedMemOp->Mem.Size = 0; // If it's a bad mnemonic, all results will be the same. if (Match.back() == Match_MnemonicFail) { return Error(IDLoc, "invalid instruction mnemonic '" + Mnemonic + "'", Op.getLocRange(), MatchingInlineAsm); } unsigned NumSuccessfulMatches = std::count(std::begin(Match), std::end(Match), Match_Success); // If matching was ambiguous and we had size information from the frontend, // try again with that. This handles cases like "movxz eax, m8/m16". if (UnsizedMemOp && NumSuccessfulMatches > 1 && UnsizedMemOp->getMemFrontendSize()) { UnsizedMemOp->Mem.Size = UnsizedMemOp->getMemFrontendSize(); unsigned M = MatchInstruction( Operands, Inst, ErrorInfo, MatchingInlineAsm, isParsingIntelSyntax()); if (M == Match_Success) NumSuccessfulMatches = 1; // Add a rewrite that encodes the size information we used from the // frontend. InstInfo->AsmRewrites->emplace_back( AOK_SizeDirective, UnsizedMemOp->getStartLoc(), /*Len=*/0, UnsizedMemOp->getMemFrontendSize()); } // If exactly one matched, then we treat that as a successful match (and the // instruction will already have been filled in correctly, since the failing // matches won't have modified it). if (NumSuccessfulMatches == 1) { if (!MatchingInlineAsm && validateInstruction(Inst, Operands)) return true; // Some instructions need post-processing to, for example, tweak which // encoding is selected. Loop on it while changes happen so the individual // transformations can chain off each other. if (!MatchingInlineAsm) while (processInstruction(Inst, Operands)) ; Inst.setLoc(IDLoc); if (!MatchingInlineAsm) EmitInstruction(Inst, Operands, Out); Opcode = Inst.getOpcode(); return false; } else if (NumSuccessfulMatches > 1) { assert(UnsizedMemOp && "multiple matches only possible with unsized memory operands"); return Error(UnsizedMemOp->getStartLoc(), "ambiguous operand size for instruction '" + Mnemonic + "\'", UnsizedMemOp->getLocRange()); } // If one instruction matched with a missing feature, report this as a // missing feature. if (std::count(std::begin(Match), std::end(Match), Match_MissingFeature) == 1) { ErrorInfo = ErrorInfoMissingFeature; return ErrorMissingFeature(IDLoc, ErrorInfoMissingFeature, MatchingInlineAsm); } // If one instruction matched with an invalid operand, report this as an // operand failure. if (std::count(std::begin(Match), std::end(Match), Match_InvalidOperand) == 1) { return Error(IDLoc, "invalid operand for instruction", EmptyRange, MatchingInlineAsm); } // If all of these were an outright failure, report it in a useless way. 
return Error(IDLoc, "unknown instruction mnemonic", EmptyRange, MatchingInlineAsm); } bool X86AsmParser::OmitRegisterFromClobberLists(unsigned RegNo) { return X86MCRegisterClasses[X86::SEGMENT_REGRegClassID].contains(RegNo); } bool X86AsmParser::ParseDirective(AsmToken DirectiveID) { MCAsmParser &Parser = getParser(); StringRef IDVal = DirectiveID.getIdentifier(); if (IDVal.startswith(".code")) return ParseDirectiveCode(IDVal, DirectiveID.getLoc()); else if (IDVal.startswith(".att_syntax")) { getParser().setParsingInlineAsm(false); if (getLexer().isNot(AsmToken::EndOfStatement)) { if (Parser.getTok().getString() == "prefix") Parser.Lex(); else if (Parser.getTok().getString() == "noprefix") return Error(DirectiveID.getLoc(), "'.att_syntax noprefix' is not " "supported: registers must have a " "'%' prefix in .att_syntax"); } getParser().setAssemblerDialect(0); return false; } else if (IDVal.startswith(".intel_syntax")) { getParser().setAssemblerDialect(1); getParser().setParsingInlineAsm(true); if (getLexer().isNot(AsmToken::EndOfStatement)) { if (Parser.getTok().getString() == "noprefix") Parser.Lex(); else if (Parser.getTok().getString() == "prefix") return Error(DirectiveID.getLoc(), "'.intel_syntax prefix' is not " "supported: registers must not have " "a '%' prefix in .intel_syntax"); } return false; } else if (IDVal == ".even") return parseDirectiveEven(DirectiveID.getLoc()); else if (IDVal == ".cv_fpo_proc") return parseDirectiveFPOProc(DirectiveID.getLoc()); else if (IDVal == ".cv_fpo_setframe") return parseDirectiveFPOSetFrame(DirectiveID.getLoc()); else if (IDVal == ".cv_fpo_pushreg") return parseDirectiveFPOPushReg(DirectiveID.getLoc()); else if (IDVal == ".cv_fpo_stackalloc") return parseDirectiveFPOStackAlloc(DirectiveID.getLoc()); else if (IDVal == ".cv_fpo_endprologue") return parseDirectiveFPOEndPrologue(DirectiveID.getLoc()); else if (IDVal == ".cv_fpo_endproc") return parseDirectiveFPOEndProc(DirectiveID.getLoc()); return true; } /// parseDirectiveEven /// ::= .even bool X86AsmParser::parseDirectiveEven(SMLoc L) { if (parseToken(AsmToken::EndOfStatement, "unexpected token in directive")) return false; const MCSection *Section = getStreamer().getCurrentSectionOnly(); if (!Section) { getStreamer().InitSections(false); Section = getStreamer().getCurrentSectionOnly(); } if (Section->UseCodeAlign()) getStreamer().EmitCodeAlignment(2, 0); else getStreamer().EmitValueToAlignment(2, 0, 1, 0); return false; } /// ParseDirectiveCode /// ::= .code16 | .code32 | .code64 bool X86AsmParser::ParseDirectiveCode(StringRef IDVal, SMLoc L) { MCAsmParser &Parser = getParser(); Code16GCC = false; if (IDVal == ".code16") { Parser.Lex(); if (!is16BitMode()) { SwitchMode(X86::Mode16Bit); getParser().getStreamer().EmitAssemblerFlag(MCAF_Code16); } } else if (IDVal == ".code16gcc") { // .code16gcc parses as if in 32-bit mode, but emits code in 16-bit mode. 
Parser.Lex(); Code16GCC = true; if (!is16BitMode()) { SwitchMode(X86::Mode16Bit); getParser().getStreamer().EmitAssemblerFlag(MCAF_Code16); } } else if (IDVal == ".code32") { Parser.Lex(); if (!is32BitMode()) { SwitchMode(X86::Mode32Bit); getParser().getStreamer().EmitAssemblerFlag(MCAF_Code32); } } else if (IDVal == ".code64") { Parser.Lex(); if (!is64BitMode()) { SwitchMode(X86::Mode64Bit); getParser().getStreamer().EmitAssemblerFlag(MCAF_Code64); } } else { Error(L, "unknown directive " + IDVal); return false; } return false; } // .cv_fpo_proc foo bool X86AsmParser::parseDirectiveFPOProc(SMLoc L) { MCAsmParser &Parser = getParser(); StringRef ProcName; int64_t ParamsSize; if (Parser.parseIdentifier(ProcName)) return Parser.TokError("expected symbol name"); if (Parser.parseIntToken(ParamsSize, "expected parameter byte count")) return true; if (!isUIntN(32, ParamsSize)) return Parser.TokError("parameters size out of range"); if (Parser.parseEOL("unexpected tokens")) return addErrorSuffix(" in '.cv_fpo_proc' directive"); MCSymbol *ProcSym = getContext().getOrCreateSymbol(ProcName); return getTargetStreamer().emitFPOProc(ProcSym, ParamsSize, L); } // .cv_fpo_setframe ebp bool X86AsmParser::parseDirectiveFPOSetFrame(SMLoc L) { MCAsmParser &Parser = getParser(); unsigned Reg; SMLoc DummyLoc; if (ParseRegister(Reg, DummyLoc, DummyLoc) || Parser.parseEOL("unexpected tokens")) return addErrorSuffix(" in '.cv_fpo_setframe' directive"); return getTargetStreamer().emitFPOSetFrame(Reg, L); } // .cv_fpo_pushreg ebx bool X86AsmParser::parseDirectiveFPOPushReg(SMLoc L) { MCAsmParser &Parser = getParser(); unsigned Reg; SMLoc DummyLoc; if (ParseRegister(Reg, DummyLoc, DummyLoc) || Parser.parseEOL("unexpected tokens")) return addErrorSuffix(" in '.cv_fpo_pushreg' directive"); return getTargetStreamer().emitFPOPushReg(Reg, L); } // .cv_fpo_stackalloc 20 bool X86AsmParser::parseDirectiveFPOStackAlloc(SMLoc L) { MCAsmParser &Parser = getParser(); int64_t Offset; if (Parser.parseIntToken(Offset, "expected offset") || Parser.parseEOL("unexpected tokens")) return addErrorSuffix(" in '.cv_fpo_stackalloc' directive"); return getTargetStreamer().emitFPOStackAlloc(Offset, L); } // .cv_fpo_endprologue bool X86AsmParser::parseDirectiveFPOEndPrologue(SMLoc L) { MCAsmParser &Parser = getParser(); if (Parser.parseEOL("unexpected tokens")) return addErrorSuffix(" in '.cv_fpo_endprologue' directive"); return getTargetStreamer().emitFPOEndPrologue(L); } // .cv_fpo_endproc bool X86AsmParser::parseDirectiveFPOEndProc(SMLoc L) { MCAsmParser &Parser = getParser(); if (Parser.parseEOL("unexpected tokens")) return addErrorSuffix(" in '.cv_fpo_endproc' directive"); return getTargetStreamer().emitFPOEndProc(L); } // Force static initialization. 
extern "C" void LLVMInitializeX86AsmParser() {
  RegisterMCAsmParser<X86AsmParser> X(getTheX86_32Target());
  RegisterMCAsmParser<X86AsmParser> Y(getTheX86_64Target());
}

#define GET_REGISTER_MATCHER
#define GET_MATCHER_IMPLEMENTATION
#define GET_SUBTARGET_FEATURE_NAME
#include "X86GenAsmMatcher.inc"
Index: vendor/llvm/dist-release_70/lib/Transforms/Scalar/LoopSink.cpp
===================================================================
--- vendor/llvm/dist-release_70/lib/Transforms/Scalar/LoopSink.cpp (revision 338574)
+++ vendor/llvm/dist-release_70/lib/Transforms/Scalar/LoopSink.cpp (revision 338575)
@@ -1,375 +1,383 @@
//===-- LoopSink.cpp - Loop Sink Pass -------------------------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This pass does the inverse transformation of what LICM does.
// It traverses all of the instructions in the loop's preheader and sinks
// them to loop-body blocks whose frequency is lower than the preheader's.
// This pass is a reverse-transformation of LICM. It differs from the Sink
// pass in the following ways:
//
// * It only handles sinking of instructions from the loop's preheader to the
//   loop's body
// * It uses the alias set tracker to get more accurate alias info
// * It uses block frequency info to find the optimal sinking locations
//
// Overall algorithm:
//
// For I in Preheader:
//   InsertBBs = BBs that use I
//   For BB in sorted(LoopBBs):
//     DomBBs = BBs in InsertBBs that are dominated by BB
//     if freq(DomBBs) > freq(BB)
//       InsertBBs = UseBBs - DomBBs + BB
// For BB in InsertBBs:
//   Insert I at BB's beginning
//
//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Scalar/LoopSink.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/AliasSetTracker.h"
#include "llvm/Analysis/BasicAliasAnalysis.h"
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/Loads.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopPass.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"
#include "llvm/Transforms/Utils/Local.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Scalar/LoopPassManager.h"
#include "llvm/Transforms/Utils/LoopUtils.h"
using namespace llvm;

#define DEBUG_TYPE "loopsink"

STATISTIC(NumLoopSunk, "Number of instructions sunk into loop");
STATISTIC(NumLoopSunkCloned, "Number of cloned instructions sunk into loop");

static cl::opt<unsigned> SinkFrequencyPercentThreshold(
    "sink-freq-percent-threshold", cl::Hidden, cl::init(90),
    cl::desc("Do not sink instructions that require cloning unless they "
             "execute less than this percent of the time."));

static cl::opt<unsigned> MaxNumberOfUseBBsForSinking(
    "max-uses-for-sinking", cl::Hidden, cl::init(30),
    cl::desc("Do not sink instructions that have too many uses."));

/// Return the adjusted total frequency of \p BBs.
///
/// * If there is only one BB, sinking the instruction will not introduce any
///   code size increase. Thus there is no need to adjust the frequency.
/// * If there is more than one BB, sinking would lead to a code size increase.
///   In this case, we add some "tax" to the total frequency to make it harder
///   to sink. E.g.
///     Freq(Preheader) = 100
///     Freq(BBs) = sum(50, 49) = 99
///   Even if Freq(BBs) < Freq(Preheader), we will not sink from Preheader to
///   BBs as the difference is too small to justify the code size increase.
///   To model this, the adjusted Freq(BBs) will be:
///     AdjustedFreq(BBs) = 99 / SinkFrequencyPercentThreshold%
static BlockFrequency adjustedSumFreq(SmallPtrSetImpl<BasicBlock *> &BBs,
                                      BlockFrequencyInfo &BFI) {
  BlockFrequency T = 0;
  for (BasicBlock *B : BBs)
    T += BFI.getBlockFreq(B);
  if (BBs.size() > 1)
    T /= BranchProbability(SinkFrequencyPercentThreshold, 100);
  return T;
}

/// Return a set of basic blocks into which to insert the sunk instructions.
///
/// The returned set of basic blocks (BBsToSinkInto) should satisfy:
///
/// * Inside the loop \p L
/// * For each UseBB in \p UseBBs, there is at least one BB in BBsToSinkInto
///   that dominates the UseBB
/// * Has minimum total frequency that is no greater than the preheader
///   frequency
///
/// The purpose of the function is to find the optimal sinking points to
/// minimize execution cost, which is defined as "sum of frequency of
/// BBsToSinkInto".
/// As a result, the returned BBsToSinkInto needs to have minimum total
/// frequency.
/// Additionally, if the total frequency of BBsToSinkInto exceeds the preheader
/// frequency, the optimal solution is not to sink (return the empty set).
///
/// \p ColdLoopBBs is used to help find the optimal sinking locations.
/// It stores a list of BBs that is:
///
/// * Inside the loop \p L
/// * Has a frequency no larger than the loop's preheader
/// * Sorted by BB frequency
///
/// The complexity of the function is O(UseBBs.size() * ColdLoopBBs.size()).
/// To avoid expensive computation, we cap the maximum UseBBs.size() in its
/// caller.
static SmallPtrSet<BasicBlock *, 2>
findBBsToSinkInto(const Loop &L, const SmallPtrSetImpl<BasicBlock *> &UseBBs,
                  const SmallVectorImpl<BasicBlock *> &ColdLoopBBs,
                  DominatorTree &DT, BlockFrequencyInfo &BFI) {
  SmallPtrSet<BasicBlock *, 2> BBsToSinkInto;
  if (UseBBs.size() == 0)
    return BBsToSinkInto;

  BBsToSinkInto.insert(UseBBs.begin(), UseBBs.end());
  SmallPtrSet<BasicBlock *, 2> BBsDominatedByColdestBB;

  // For every iteration:
  // * Pick the ColdestBB from ColdLoopBBs
  // * Find the set BBsDominatedByColdestBB that satisfies:
  //   - BBsDominatedByColdestBB is a subset of BBsToSinkInto
  //   - Every BB in BBsDominatedByColdestBB is dominated by ColdestBB
  // * If Freq(ColdestBB) < Freq(BBsDominatedByColdestBB), remove
  //   BBsDominatedByColdestBB from BBsToSinkInto, add ColdestBB to
  //   BBsToSinkInto
  for (BasicBlock *ColdestBB : ColdLoopBBs) {
    BBsDominatedByColdestBB.clear();
    for (BasicBlock *SinkedBB : BBsToSinkInto)
      if (DT.dominates(ColdestBB, SinkedBB))
        BBsDominatedByColdestBB.insert(SinkedBB);
    if (BBsDominatedByColdestBB.size() == 0)
      continue;
    if (adjustedSumFreq(BBsDominatedByColdestBB, BFI) >
        BFI.getBlockFreq(ColdestBB)) {
      for (BasicBlock *DominatedBB : BBsDominatedByColdestBB) {
        BBsToSinkInto.erase(DominatedBB);
      }
      BBsToSinkInto.insert(ColdestBB);
    }
  }

+  // Can't sink into blocks that have no valid insertion point.
+  for (BasicBlock *BB : BBsToSinkInto) {
+    if (BB->getFirstInsertionPt() == BB->end()) {
+      BBsToSinkInto.clear();
+      break;
+    }
+  }
+
  // If the total frequency of BBsToSinkInto is larger than preheader frequency,
  // do not sink.
  if (adjustedSumFreq(BBsToSinkInto, BFI) >
      BFI.getBlockFreq(L.getLoopPreheader()))
    BBsToSinkInto.clear();
  return BBsToSinkInto;
}
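// Editor's note: the worked example below is an illustration added for this
// write-up, not part of the upstream source. All block names and frequencies
// are made up; it assumes the default -sink-freq-percent-threshold of 90.
//
//   Preheader freq = 100. The instruction I is used in B1 (freq 40) and
//   B2 (freq 30), and both are dominated by a colder block B0 (freq 60).
//
//   * Initially BBsToSinkInto = {B1, B2}.
//   * When the loop above reaches ColdestBB = B0, it dominates {B1, B2},
//     whose adjusted sum is (40 + 30) / 0.9 ~= 78 > 60, so the set becomes
//     {B0}: one copy of I in B0 is modeled as cheaper than clones in B1
//     and B2.
//   * Final check: adjustedSumFreq({B0}) = 60 <= 100 (preheader frequency),
//     so sinking is considered profitable and {B0} is returned.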
// Sinks \p I from the loop \p L's preheader to its uses. Returns true if
// sinking is successful.
// \p LoopBlockNumber is used to sort the insertion blocks to ensure
// determinism.
static bool sinkInstruction(Loop &L, Instruction &I,
                            const SmallVectorImpl<BasicBlock *> &ColdLoopBBs,
                            const SmallDenseMap<BasicBlock *, int, 16> &LoopBlockNumber,
                            LoopInfo &LI, DominatorTree &DT,
                            BlockFrequencyInfo &BFI) {
  // Compute the set of blocks in loop L which contain a use of I.
  SmallPtrSet<BasicBlock *, 2> BBs;
  for (auto &U : I.uses()) {
    Instruction *UI = cast<Instruction>(U.getUser());
    // We cannot sink I to PHI-uses.
    if (dyn_cast<PHINode>(UI))
      return false;
    // We cannot sink I if it has uses outside of the loop.
    if (!L.contains(LI.getLoopFor(UI->getParent())))
      return false;
    BBs.insert(UI->getParent());
  }

  // findBBsToSinkInto is O(BBs.size() * ColdLoopBBs.size()). We cap the max
  // BBs.size() to avoid expensive computation.
  // FIXME: Handle code size growth for min_size and opt_size.
  if (BBs.size() > MaxNumberOfUseBBsForSinking)
    return false;

  // Find the set of BBs into which we should insert a copy of I.
  SmallPtrSet<BasicBlock *, 2> BBsToSinkInto =
      findBBsToSinkInto(L, BBs, ColdLoopBBs, DT, BFI);
  if (BBsToSinkInto.empty())
    return false;

  // Copy the final BBs into a vector and sort them using the total ordering
  // of the loop block numbers as iterating the set doesn't give a useful
  // order. No need to stable sort as the block numbers are a total ordering.
  SmallVector<BasicBlock *, 2> SortedBBsToSinkInto;
  SortedBBsToSinkInto.insert(SortedBBsToSinkInto.begin(), BBsToSinkInto.begin(),
                             BBsToSinkInto.end());
  llvm::sort(SortedBBsToSinkInto.begin(), SortedBBsToSinkInto.end(),
             [&](BasicBlock *A, BasicBlock *B) {
               return LoopBlockNumber.find(A)->second <
                      LoopBlockNumber.find(B)->second;
             });

  BasicBlock *MoveBB = *SortedBBsToSinkInto.begin();
  // FIXME: Optimize the efficiency for cloned value replacement. The current
  // implementation is O(SortedBBsToSinkInto.size() * I.num_uses()).
  for (BasicBlock *N : makeArrayRef(SortedBBsToSinkInto).drop_front(1)) {
    assert(LoopBlockNumber.find(N)->second >
               LoopBlockNumber.find(MoveBB)->second &&
           "BBs not sorted!");
    // Clone I and replace its uses.
    Instruction *IC = I.clone();
    IC->setName(I.getName());
    IC->insertBefore(&*N->getFirstInsertionPt());
    // Replace uses of I with IC in N.
    for (Value::use_iterator UI = I.use_begin(), UE = I.use_end(); UI != UE;) {
      Use &U = *UI++;
      auto *I = cast<Instruction>(U.getUser());
      if (I->getParent() == N)
        U.set(IC);
    }
    // Replace uses of I with IC in blocks dominated by N.
    replaceDominatedUsesWith(&I, IC, DT, N);
    LLVM_DEBUG(dbgs() << "Sinking a clone of " << I << " To: " << N->getName()
                      << '\n');
    NumLoopSunkCloned++;
  }
  LLVM_DEBUG(dbgs() << "Sinking " << I << " To: " << MoveBB->getName() << '\n');
  NumLoopSunk++;
  I.moveBefore(&*MoveBB->getFirstInsertionPt());

  return true;
}

/// Sinks instructions from the loop's preheader to the loop body if the
/// sum frequency of the inserted copies is smaller than the preheader's
/// frequency.
static bool sinkLoopInvariantInstructions(Loop &L, AAResults &AA, LoopInfo &LI,
                                          DominatorTree &DT,
                                          BlockFrequencyInfo &BFI,
                                          ScalarEvolution *SE) {
  BasicBlock *Preheader = L.getLoopPreheader();
  if (!Preheader)
    return false;

  // Enable LoopSink only when a runtime profile is available.
  // With a static profile, the sinking decision may be sub-optimal.
  if (!Preheader->getParent()->hasProfileData())
    return false;

  const BlockFrequency PreheaderFreq = BFI.getBlockFreq(Preheader);
  // If there are no basic blocks with lower frequency than the preheader then
  // we can avoid the detailed analysis as we will never find profitable
  // sinking opportunities.
  if (all_of(L.blocks(), [&](const BasicBlock *BB) {
        return BFI.getBlockFreq(BB) > PreheaderFreq;
      }))
    return false;

  bool Changed = false;
  AliasSetTracker CurAST(AA);

  // Compute the alias set.
  for (BasicBlock *BB : L.blocks())
    CurAST.add(*BB);

  // Sort the loop's basic blocks by frequency.
  SmallVector<BasicBlock *, 10> ColdLoopBBs;
  SmallDenseMap<BasicBlock *, int, 16> LoopBlockNumber;
  int i = 0;
  for (BasicBlock *B : L.blocks())
    if (BFI.getBlockFreq(B) < BFI.getBlockFreq(L.getLoopPreheader())) {
      ColdLoopBBs.push_back(B);
      LoopBlockNumber[B] = ++i;
    }
  std::stable_sort(ColdLoopBBs.begin(), ColdLoopBBs.end(),
                   [&](BasicBlock *A, BasicBlock *B) {
                     return BFI.getBlockFreq(A) < BFI.getBlockFreq(B);
                   });

  // Traverse the preheader's instructions in reverse order, because if A
  // depends on B (A appears after B), A needs to be sunk first before B can
  // be sunk.
  for (auto II = Preheader->rbegin(), E = Preheader->rend(); II != E;) {
    Instruction *I = &*II++;
    // No need to check that the instruction's operands are loop invariant.
    assert(L.hasLoopInvariantOperands(I) &&
           "Insts in a loop's preheader should have loop invariant operands!");
    if (!canSinkOrHoistInst(*I, &AA, &DT, &L, &CurAST, nullptr))
      continue;
    if (sinkInstruction(L, *I, ColdLoopBBs, LoopBlockNumber, LI, DT, BFI))
      Changed = true;
  }

  if (Changed && SE)
    SE->forgetLoopDispositions(&L);
  return Changed;
}

PreservedAnalyses LoopSinkPass::run(Function &F, FunctionAnalysisManager &FAM) {
  LoopInfo &LI = FAM.getResult<LoopAnalysis>(F);
  // Nothing to do if there are no loops.
  if (LI.empty())
    return PreservedAnalyses::all();

  AAResults &AA = FAM.getResult<AAManager>(F);
  DominatorTree &DT = FAM.getResult<DominatorTreeAnalysis>(F);
  BlockFrequencyInfo &BFI = FAM.getResult<BlockFrequencyAnalysis>(F);

  // We want to do a postorder walk over the loops. Since loops are a tree this
  // is equivalent to a reversed preorder walk and preorder is easy to compute
  // without recursion. Since we reverse the preorder, we will visit siblings
  // in reverse program order. This isn't expected to matter at all but is more
  // consistent with sinking algorithms which generally work bottom-up.
  SmallVector<Loop *, 4> PreorderLoops = LI.getLoopsInPreorder();

  bool Changed = false;
  do {
    Loop &L = *PreorderLoops.pop_back_val();

    // Note that we don't pass SCEV here because it is only used to invalidate
    // loops in SCEV and we don't preserve (or request) SCEV at all making that
    // unnecessary.
    Changed |= sinkLoopInvariantInstructions(L, AA, LI, DT, BFI,
                                             /*ScalarEvolution*/ nullptr);
  } while (!PreorderLoops.empty());

  if (!Changed)
    return PreservedAnalyses::all();

  PreservedAnalyses PA;
  PA.preserveSet<CFGAnalyses>();
  return PA;
}
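// Editor's note: a hypothetical usage sketch, not part of the upstream source.
// Because sinkLoopInvariantInstructions() bails out unless the function has
// profile data, the pass is normally exercised on a profile-annotated module,
// roughly like this (flag spellings may vary by LLVM version):
//
//   opt -passes=loop-sink annotated.bc -o sunk.bc    (new pass manager)
//   opt -loop-sink annotated.bc -o sunk.bc           (legacy pass below)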
namespace {
struct LegacyLoopSinkPass : public LoopPass {
  static char ID;
  LegacyLoopSinkPass() : LoopPass(ID) {
    initializeLegacyLoopSinkPassPass(*PassRegistry::getPassRegistry());
  }

  bool runOnLoop(Loop *L, LPPassManager &LPM) override {
    if (skipLoop(L))
      return false;

    auto *SE = getAnalysisIfAvailable<ScalarEvolutionWrapperPass>();
    return sinkLoopInvariantInstructions(
        *L, getAnalysis<AAResultsWrapperPass>().getAAResults(),
        getAnalysis<LoopInfoWrapperPass>().getLoopInfo(),
        getAnalysis<DominatorTreeWrapperPass>().getDomTree(),
        getAnalysis<BlockFrequencyInfoWrapperPass>().getBFI(),
        SE ? &SE->getSE() : nullptr);
  }

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.setPreservesCFG();
    AU.addRequired<BlockFrequencyInfoWrapperPass>();
    getLoopAnalysisUsage(AU);
  }
};
}

char LegacyLoopSinkPass::ID = 0;
INITIALIZE_PASS_BEGIN(LegacyLoopSinkPass, "loop-sink", "Loop Sink", false,
                      false)
INITIALIZE_PASS_DEPENDENCY(LoopPass)
INITIALIZE_PASS_DEPENDENCY(BlockFrequencyInfoWrapperPass)
INITIALIZE_PASS_END(LegacyLoopSinkPass, "loop-sink", "Loop Sink", false, false)

Pass *llvm::createLoopSinkPass() { return new LegacyLoopSinkPass(); }
Index: vendor/llvm/dist-release_70/lib/Transforms/Scalar/SROA.cpp
===================================================================
--- vendor/llvm/dist-release_70/lib/Transforms/Scalar/SROA.cpp (revision 338574)
+++ vendor/llvm/dist-release_70/lib/Transforms/Scalar/SROA.cpp (revision 338575)
@@ -1,4520 +1,4562 @@
//===- SROA.cpp - Scalar Replacement Of Aggregates ------------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
/// \file
/// This transformation implements the well known scalar replacement of
/// aggregates transformation. It tries to identify promotable elements of an
/// aggregate alloca, and promote them to registers. It will also try to
/// convert uses of an element (or set of elements) of an alloca into a vector
/// or bitfield-style integer scalar if appropriate.
///
/// It works to do this with minimal slicing of the alloca so that regions
/// which are merely transferred in and out of external memory remain unchanged
/// and are not decomposed to scalar code.
///
/// Because this also performs alloca promotion, it can be thought of as also
/// serving the purpose of SSA formation. The algorithm iterates on the
/// function until all opportunities for promotion have been realized.
/// //===----------------------------------------------------------------------===// #include "llvm/Transforms/Scalar/SROA.h" #include "llvm/ADT/APInt.h" #include "llvm/ADT/ArrayRef.h" #include "llvm/ADT/DenseMap.h" #include "llvm/ADT/PointerIntPair.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SetVector.h" #include "llvm/ADT/SmallBitVector.h" #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/Statistic.h" #include "llvm/ADT/StringRef.h" #include "llvm/ADT/Twine.h" #include "llvm/ADT/iterator.h" #include "llvm/ADT/iterator_range.h" #include "llvm/Analysis/AssumptionCache.h" #include "llvm/Analysis/GlobalsModRef.h" #include "llvm/Analysis/Loads.h" #include "llvm/Analysis/PtrUseVisitor.h" #include "llvm/Transforms/Utils/Local.h" #include "llvm/Config/llvm-config.h" #include "llvm/IR/BasicBlock.h" #include "llvm/IR/Constant.h" #include "llvm/IR/ConstantFolder.h" #include "llvm/IR/Constants.h" #include "llvm/IR/DIBuilder.h" #include "llvm/IR/DataLayout.h" #include "llvm/IR/DebugInfoMetadata.h" #include "llvm/IR/DerivedTypes.h" #include "llvm/IR/Dominators.h" #include "llvm/IR/Function.h" #include "llvm/IR/GetElementPtrTypeIterator.h" #include "llvm/IR/GlobalAlias.h" #include "llvm/IR/IRBuilder.h" #include "llvm/IR/InstVisitor.h" #include "llvm/IR/InstrTypes.h" #include "llvm/IR/Instruction.h" #include "llvm/IR/Instructions.h" #include "llvm/IR/IntrinsicInst.h" #include "llvm/IR/Intrinsics.h" #include "llvm/IR/LLVMContext.h" #include "llvm/IR/Metadata.h" #include "llvm/IR/Module.h" #include "llvm/IR/Operator.h" #include "llvm/IR/PassManager.h" #include "llvm/IR/Type.h" #include "llvm/IR/Use.h" #include "llvm/IR/User.h" #include "llvm/IR/Value.h" #include "llvm/Pass.h" #include "llvm/Support/Casting.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/Compiler.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Transforms/Scalar.h" #include "llvm/Transforms/Utils/PromoteMemToReg.h" #include #include #include #include #include #include #include #include #include #include #include #ifndef NDEBUG // We only use this for a debug check. #include #endif using namespace llvm; using namespace llvm::sroa; #define DEBUG_TYPE "sroa" STATISTIC(NumAllocasAnalyzed, "Number of allocas analyzed for replacement"); STATISTIC(NumAllocaPartitions, "Number of alloca partitions formed"); STATISTIC(MaxPartitionsPerAlloca, "Maximum number of partitions per alloca"); STATISTIC(NumAllocaPartitionUses, "Number of alloca partition uses rewritten"); STATISTIC(MaxUsesPerAllocaPartition, "Maximum number of uses of a partition"); STATISTIC(NumNewAllocas, "Number of new, smaller allocas introduced"); STATISTIC(NumPromoted, "Number of allocas promoted to SSA values"); STATISTIC(NumLoadsSpeculated, "Number of loads speculated to allow promotion"); STATISTIC(NumDeleted, "Number of instructions deleted"); STATISTIC(NumVectorized, "Number of vectorized aggregates"); /// Hidden option to enable randomly shuffling the slices to help uncover /// instability in their order. static cl::opt SROARandomShuffleSlices("sroa-random-shuffle-slices", cl::init(false), cl::Hidden); /// Hidden option to experiment with completely strict handling of inbounds /// GEPs. static cl::opt SROAStrictInbounds("sroa-strict-inbounds", cl::init(false), cl::Hidden); namespace { /// A custom IRBuilder inserter which prefixes all names, but only in /// Assert builds. 
class IRBuilderPrefixedInserter : public IRBuilderDefaultInserter { std::string Prefix; const Twine getNameWithPrefix(const Twine &Name) const { return Name.isTriviallyEmpty() ? Name : Prefix + Name; } public: void SetNamePrefix(const Twine &P) { Prefix = P.str(); } protected: void InsertHelper(Instruction *I, const Twine &Name, BasicBlock *BB, BasicBlock::iterator InsertPt) const { IRBuilderDefaultInserter::InsertHelper(I, getNameWithPrefix(Name), BB, InsertPt); } }; /// Provide a type for IRBuilder that drops names in release builds. using IRBuilderTy = IRBuilder; /// A used slice of an alloca. /// /// This structure represents a slice of an alloca used by some instruction. It /// stores both the begin and end offsets of this use, a pointer to the use /// itself, and a flag indicating whether we can classify the use as splittable /// or not when forming partitions of the alloca. class Slice { /// The beginning offset of the range. uint64_t BeginOffset = 0; /// The ending offset, not included in the range. uint64_t EndOffset = 0; /// Storage for both the use of this slice and whether it can be /// split. PointerIntPair UseAndIsSplittable; public: Slice() = default; Slice(uint64_t BeginOffset, uint64_t EndOffset, Use *U, bool IsSplittable) : BeginOffset(BeginOffset), EndOffset(EndOffset), UseAndIsSplittable(U, IsSplittable) {} uint64_t beginOffset() const { return BeginOffset; } uint64_t endOffset() const { return EndOffset; } bool isSplittable() const { return UseAndIsSplittable.getInt(); } void makeUnsplittable() { UseAndIsSplittable.setInt(false); } Use *getUse() const { return UseAndIsSplittable.getPointer(); } bool isDead() const { return getUse() == nullptr; } void kill() { UseAndIsSplittable.setPointer(nullptr); } /// Support for ordering ranges. /// /// This provides an ordering over ranges such that start offsets are /// always increasing, and within equal start offsets, the end offsets are /// decreasing. Thus the spanning range comes first in a cluster with the /// same start position. bool operator<(const Slice &RHS) const { if (beginOffset() < RHS.beginOffset()) return true; if (beginOffset() > RHS.beginOffset()) return false; if (isSplittable() != RHS.isSplittable()) return !isSplittable(); if (endOffset() > RHS.endOffset()) return true; return false; } /// Support comparison with a single offset to allow binary searches. friend LLVM_ATTRIBUTE_UNUSED bool operator<(const Slice &LHS, uint64_t RHSOffset) { return LHS.beginOffset() < RHSOffset; } friend LLVM_ATTRIBUTE_UNUSED bool operator<(uint64_t LHSOffset, const Slice &RHS) { return LHSOffset < RHS.beginOffset(); } bool operator==(const Slice &RHS) const { return isSplittable() == RHS.isSplittable() && beginOffset() == RHS.beginOffset() && endOffset() == RHS.endOffset(); } bool operator!=(const Slice &RHS) const { return !operator==(RHS); } }; } // end anonymous namespace namespace llvm { template struct isPodLike; template <> struct isPodLike { static const bool value = true; }; } // end namespace llvm /// Representation of the alloca slices. /// /// This class represents the slices of an alloca which are formed by its /// various uses. If a pointer escapes, we can't fully build a representation /// for the slices used and we reflect that in this structure. The uses are /// stored, sorted by increasing beginning offset and with unsplittable slices /// starting at a particular offset before splittable slices. class llvm::sroa::AllocaSlices { public: /// Construct the slices of a particular alloca. 
AllocaSlices(const DataLayout &DL, AllocaInst &AI); /// Test whether a pointer to the allocation escapes our analysis. /// /// If this is true, the slices are never fully built and should be /// ignored. bool isEscaped() const { return PointerEscapingInstr; } /// Support for iterating over the slices. /// @{ using iterator = SmallVectorImpl::iterator; using range = iterator_range; iterator begin() { return Slices.begin(); } iterator end() { return Slices.end(); } using const_iterator = SmallVectorImpl::const_iterator; using const_range = iterator_range; const_iterator begin() const { return Slices.begin(); } const_iterator end() const { return Slices.end(); } /// @} /// Erase a range of slices. void erase(iterator Start, iterator Stop) { Slices.erase(Start, Stop); } /// Insert new slices for this alloca. /// /// This moves the slices into the alloca's slices collection, and re-sorts /// everything so that the usual ordering properties of the alloca's slices /// hold. void insert(ArrayRef NewSlices) { int OldSize = Slices.size(); Slices.append(NewSlices.begin(), NewSlices.end()); auto SliceI = Slices.begin() + OldSize; llvm::sort(SliceI, Slices.end()); std::inplace_merge(Slices.begin(), SliceI, Slices.end()); } // Forward declare the iterator and range accessor for walking the // partitions. class partition_iterator; iterator_range partitions(); /// Access the dead users for this alloca. ArrayRef getDeadUsers() const { return DeadUsers; } /// Access the dead operands referring to this alloca. /// /// These are operands which have cannot actually be used to refer to the /// alloca as they are outside its range and the user doesn't correct for /// that. These mostly consist of PHI node inputs and the like which we just /// need to replace with undef. ArrayRef getDeadOperands() const { return DeadOperands; } #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) void print(raw_ostream &OS, const_iterator I, StringRef Indent = " ") const; void printSlice(raw_ostream &OS, const_iterator I, StringRef Indent = " ") const; void printUse(raw_ostream &OS, const_iterator I, StringRef Indent = " ") const; void print(raw_ostream &OS) const; void dump(const_iterator I) const; void dump() const; #endif private: template class BuilderBase; class SliceBuilder; friend class AllocaSlices::SliceBuilder; #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) /// Handle to alloca instruction to simplify method interfaces. AllocaInst &AI; #endif /// The instruction responsible for this alloca not having a known set /// of slices. /// /// When an instruction (potentially) escapes the pointer to the alloca, we /// store a pointer to that here and abort trying to form slices of the /// alloca. This will be null if the alloca slices are analyzed successfully. Instruction *PointerEscapingInstr; /// The slices of the alloca. /// /// We store a vector of the slices formed by uses of the alloca here. This /// vector is sorted by increasing begin offset, and then the unsplittable /// slices before the splittable ones. See the Slice inner class for more /// details. SmallVector Slices; /// Instructions which will become dead if we rewrite the alloca. /// /// Note that these are not separated by slice. This is because we expect an /// alloca to be completely rewritten or not rewritten at all. If rewritten, /// all these instructions can simply be removed and replaced with undef as /// they come from outside of the allocated space. SmallVector DeadUsers; /// Operands which will become dead if we rewrite the alloca. 
/// /// These are operands that in their particular use can be replaced with /// undef when we rewrite the alloca. These show up in out-of-bounds inputs /// to PHI nodes and the like. They aren't entirely dead (there might be /// a GEP back into the bounds using it elsewhere) and nor is the PHI, but we /// want to swap this particular input for undef to simplify the use lists of /// the alloca. SmallVector DeadOperands; }; /// A partition of the slices. /// /// An ephemeral representation for a range of slices which can be viewed as /// a partition of the alloca. This range represents a span of the alloca's /// memory which cannot be split, and provides access to all of the slices /// overlapping some part of the partition. /// /// Objects of this type are produced by traversing the alloca's slices, but /// are only ephemeral and not persistent. class llvm::sroa::Partition { private: friend class AllocaSlices; friend class AllocaSlices::partition_iterator; using iterator = AllocaSlices::iterator; /// The beginning and ending offsets of the alloca for this /// partition. uint64_t BeginOffset, EndOffset; /// The start and end iterators of this partition. iterator SI, SJ; /// A collection of split slice tails overlapping the partition. SmallVector SplitTails; /// Raw constructor builds an empty partition starting and ending at /// the given iterator. Partition(iterator SI) : SI(SI), SJ(SI) {} public: /// The start offset of this partition. /// /// All of the contained slices start at or after this offset. uint64_t beginOffset() const { return BeginOffset; } /// The end offset of this partition. /// /// All of the contained slices end at or before this offset. uint64_t endOffset() const { return EndOffset; } /// The size of the partition. /// /// Note that this can never be zero. uint64_t size() const { assert(BeginOffset < EndOffset && "Partitions must span some bytes!"); return EndOffset - BeginOffset; } /// Test whether this partition contains no slices, and merely spans /// a region occupied by split slices. bool empty() const { return SI == SJ; } /// \name Iterate slices that start within the partition. /// These may be splittable or unsplittable. They have a begin offset >= the /// partition begin offset. /// @{ // FIXME: We should probably define a "concat_iterator" helper and use that // to stitch together pointee_iterators over the split tails and the // contiguous iterators of the partition. That would give a much nicer // interface here. We could then additionally expose filtered iterators for // split, unsplit, and unsplittable splices based on the usage patterns. iterator begin() const { return SI; } iterator end() const { return SJ; } /// @} /// Get the sequence of split slice tails. /// /// These tails are of slices which start before this partition but are /// split and overlap into the partition. We accumulate these while forming /// partitions. ArrayRef splitSliceTails() const { return SplitTails; } }; /// An iterator over partitions of the alloca's slices. /// /// This iterator implements the core algorithm for partitioning the alloca's /// slices. It is a forward iterator as we don't support backtracking for /// efficiency reasons, and re-use a single storage area to maintain the /// current set of split slices. /// /// It is templated on the slice iterator type to use so that it can operate /// with either const or non-const slice iterators. 
class AllocaSlices::partition_iterator : public iterator_facade_base { friend class AllocaSlices; /// Most of the state for walking the partitions is held in a class /// with a nice interface for examining them. Partition P; /// We need to keep the end of the slices to know when to stop. AllocaSlices::iterator SE; /// We also need to keep track of the maximum split end offset seen. /// FIXME: Do we really? uint64_t MaxSplitSliceEndOffset = 0; /// Sets the partition to be empty at given iterator, and sets the /// end iterator. partition_iterator(AllocaSlices::iterator SI, AllocaSlices::iterator SE) : P(SI), SE(SE) { // If not already at the end, advance our state to form the initial // partition. if (SI != SE) advance(); } /// Advance the iterator to the next partition. /// /// Requires that the iterator not be at the end of the slices. void advance() { assert((P.SI != SE || !P.SplitTails.empty()) && "Cannot advance past the end of the slices!"); // Clear out any split uses which have ended. if (!P.SplitTails.empty()) { if (P.EndOffset >= MaxSplitSliceEndOffset) { // If we've finished all splits, this is easy. P.SplitTails.clear(); MaxSplitSliceEndOffset = 0; } else { // Remove the uses which have ended in the prior partition. This // cannot change the max split slice end because we just checked that // the prior partition ended prior to that max. P.SplitTails.erase(llvm::remove_if(P.SplitTails, [&](Slice *S) { return S->endOffset() <= P.EndOffset; }), P.SplitTails.end()); assert(llvm::any_of(P.SplitTails, [&](Slice *S) { return S->endOffset() == MaxSplitSliceEndOffset; }) && "Could not find the current max split slice offset!"); assert(llvm::all_of(P.SplitTails, [&](Slice *S) { return S->endOffset() <= MaxSplitSliceEndOffset; }) && "Max split slice end offset is not actually the max!"); } } // If P.SI is already at the end, then we've cleared the split tail and // now have an end iterator. if (P.SI == SE) { assert(P.SplitTails.empty() && "Failed to clear the split slices!"); return; } // If we had a non-empty partition previously, set up the state for // subsequent partitions. if (P.SI != P.SJ) { // Accumulate all the splittable slices which started in the old // partition into the split list. for (Slice &S : P) if (S.isSplittable() && S.endOffset() > P.EndOffset) { P.SplitTails.push_back(&S); MaxSplitSliceEndOffset = std::max(S.endOffset(), MaxSplitSliceEndOffset); } // Start from the end of the previous partition. P.SI = P.SJ; // If P.SI is now at the end, we at most have a tail of split slices. if (P.SI == SE) { P.BeginOffset = P.EndOffset; P.EndOffset = MaxSplitSliceEndOffset; return; } // If the we have split slices and the next slice is after a gap and is // not splittable immediately form an empty partition for the split // slices up until the next slice begins. if (!P.SplitTails.empty() && P.SI->beginOffset() != P.EndOffset && !P.SI->isSplittable()) { P.BeginOffset = P.EndOffset; P.EndOffset = P.SI->beginOffset(); return; } } // OK, we need to consume new slices. Set the end offset based on the // current slice, and step SJ past it. The beginning offset of the // partition is the beginning offset of the next slice unless we have // pre-existing split slices that are continuing, in which case we begin // at the prior end offset. P.BeginOffset = P.SplitTails.empty() ? P.SI->beginOffset() : P.EndOffset; P.EndOffset = P.SI->endOffset(); ++P.SJ; // There are two strategies to form a partition based on whether the // partition starts with an unsplittable slice or a splittable slice. 
if (!P.SI->isSplittable()) { // When we're forming an unsplittable region, it must always start at // the first slice and will extend through its end. assert(P.BeginOffset == P.SI->beginOffset()); // Form a partition including all of the overlapping slices with this // unsplittable slice. while (P.SJ != SE && P.SJ->beginOffset() < P.EndOffset) { if (!P.SJ->isSplittable()) P.EndOffset = std::max(P.EndOffset, P.SJ->endOffset()); ++P.SJ; } // We have a partition across a set of overlapping unsplittable // partitions. return; } // If we're starting with a splittable slice, then we need to form // a synthetic partition spanning it and any other overlapping splittable // splices. assert(P.SI->isSplittable() && "Forming a splittable partition!"); // Collect all of the overlapping splittable slices. while (P.SJ != SE && P.SJ->beginOffset() < P.EndOffset && P.SJ->isSplittable()) { P.EndOffset = std::max(P.EndOffset, P.SJ->endOffset()); ++P.SJ; } // Back upiP.EndOffset if we ended the span early when encountering an // unsplittable slice. This synthesizes the early end offset of // a partition spanning only splittable slices. if (P.SJ != SE && P.SJ->beginOffset() < P.EndOffset) { assert(!P.SJ->isSplittable()); P.EndOffset = P.SJ->beginOffset(); } } public: bool operator==(const partition_iterator &RHS) const { assert(SE == RHS.SE && "End iterators don't match between compared partition iterators!"); // The observed positions of partitions is marked by the P.SI iterator and // the emptiness of the split slices. The latter is only relevant when // P.SI == SE, as the end iterator will additionally have an empty split // slices list, but the prior may have the same P.SI and a tail of split // slices. if (P.SI == RHS.P.SI && P.SplitTails.empty() == RHS.P.SplitTails.empty()) { assert(P.SJ == RHS.P.SJ && "Same set of slices formed two different sized partitions!"); assert(P.SplitTails.size() == RHS.P.SplitTails.size() && "Same slice position with differently sized non-empty split " "slice tails!"); return true; } return false; } partition_iterator &operator++() { advance(); return *this; } Partition &operator*() { return P; } }; /// A forward range over the partitions of the alloca's slices. /// /// This accesses an iterator range over the partitions of the alloca's /// slices. It computes these partitions on the fly based on the overlapping /// offsets of the slices and the ability to split them. It will visit "empty" /// partitions to cover regions of the alloca only accessed via split /// slices. iterator_range AllocaSlices::partitions() { return make_range(partition_iterator(begin(), end()), partition_iterator(end(), end())); } static Value *foldSelectInst(SelectInst &SI) { // If the condition being selected on is a constant or the same value is // being selected between, fold the select. Yes this does (rarely) happen // early on. if (ConstantInt *CI = dyn_cast(SI.getCondition())) return SI.getOperand(1 + CI->isZero()); if (SI.getOperand(1) == SI.getOperand(2)) return SI.getOperand(1); return nullptr; } /// A helper that folds a PHI node or a select. static Value *foldPHINodeOrSelectInst(Instruction &I) { if (PHINode *PN = dyn_cast(&I)) { // If PN merges together the same value, return that value. return PN->hasConstantValue(); } return foldSelectInst(cast(I)); } /// Builder for the alloca slices. /// /// This class builds a set of alloca slices by recursively visiting the uses /// of an alloca and making a slice for each load and store at each offset. 
class AllocaSlices::SliceBuilder : public PtrUseVisitor { friend class PtrUseVisitor; friend class InstVisitor; using Base = PtrUseVisitor; const uint64_t AllocSize; AllocaSlices &AS; SmallDenseMap MemTransferSliceMap; SmallDenseMap PHIOrSelectSizes; /// Set to de-duplicate dead instructions found in the use walk. SmallPtrSet VisitedDeadInsts; public: SliceBuilder(const DataLayout &DL, AllocaInst &AI, AllocaSlices &AS) : PtrUseVisitor(DL), AllocSize(DL.getTypeAllocSize(AI.getAllocatedType())), AS(AS) {} private: void markAsDead(Instruction &I) { if (VisitedDeadInsts.insert(&I).second) AS.DeadUsers.push_back(&I); } void insertUse(Instruction &I, const APInt &Offset, uint64_t Size, bool IsSplittable = false) { // Completely skip uses which have a zero size or start either before or // past the end of the allocation. if (Size == 0 || Offset.uge(AllocSize)) { LLVM_DEBUG(dbgs() << "WARNING: Ignoring " << Size << " byte use @" << Offset << " which has zero size or starts outside of the " << AllocSize << " byte alloca:\n" << " alloca: " << AS.AI << "\n" << " use: " << I << "\n"); return markAsDead(I); } uint64_t BeginOffset = Offset.getZExtValue(); uint64_t EndOffset = BeginOffset + Size; // Clamp the end offset to the end of the allocation. Note that this is // formulated to handle even the case where "BeginOffset + Size" overflows. // This may appear superficially to be something we could ignore entirely, // but that is not so! There may be widened loads or PHI-node uses where // some instructions are dead but not others. We can't completely ignore // them, and so have to record at least the information here. assert(AllocSize >= BeginOffset); // Established above. if (Size > AllocSize - BeginOffset) { LLVM_DEBUG(dbgs() << "WARNING: Clamping a " << Size << " byte use @" << Offset << " to remain within the " << AllocSize << " byte alloca:\n" << " alloca: " << AS.AI << "\n" << " use: " << I << "\n"); EndOffset = AllocSize; } AS.Slices.push_back(Slice(BeginOffset, EndOffset, U, IsSplittable)); } void visitBitCastInst(BitCastInst &BC) { if (BC.use_empty()) return markAsDead(BC); return Base::visitBitCastInst(BC); } void visitGetElementPtrInst(GetElementPtrInst &GEPI) { if (GEPI.use_empty()) return markAsDead(GEPI); if (SROAStrictInbounds && GEPI.isInBounds()) { // FIXME: This is a manually un-factored variant of the basic code inside // of GEPs with checking of the inbounds invariant specified in the // langref in a very strict sense. If we ever want to enable // SROAStrictInbounds, this code should be factored cleanly into // PtrUseVisitor, but it is easier to experiment with SROAStrictInbounds // by writing out the code here where we have the underlying allocation // size readily available. APInt GEPOffset = Offset; const DataLayout &DL = GEPI.getModule()->getDataLayout(); for (gep_type_iterator GTI = gep_type_begin(GEPI), GTE = gep_type_end(GEPI); GTI != GTE; ++GTI) { ConstantInt *OpC = dyn_cast(GTI.getOperand()); if (!OpC) break; // Handle a struct index, which adds its field offset to the pointer. if (StructType *STy = GTI.getStructTypeOrNull()) { unsigned ElementIdx = OpC->getZExtValue(); const StructLayout *SL = DL.getStructLayout(STy); GEPOffset += APInt(Offset.getBitWidth(), SL->getElementOffset(ElementIdx)); } else { // For array or vector indices, scale the index by the size of the // type. 
APInt Index = OpC->getValue().sextOrTrunc(Offset.getBitWidth()); GEPOffset += Index * APInt(Offset.getBitWidth(), DL.getTypeAllocSize(GTI.getIndexedType())); } // If this index has computed an intermediate pointer which is not // inbounds, then the result of the GEP is a poison value and we can // delete it and all uses. if (GEPOffset.ugt(AllocSize)) return markAsDead(GEPI); } } return Base::visitGetElementPtrInst(GEPI); } void handleLoadOrStore(Type *Ty, Instruction &I, const APInt &Offset, uint64_t Size, bool IsVolatile) { // We allow splitting of non-volatile loads and stores where the type is an // integer type. These may be used to implement 'memcpy' or other "transfer // of bits" patterns. bool IsSplittable = Ty->isIntegerTy() && !IsVolatile; insertUse(I, Offset, Size, IsSplittable); } void visitLoadInst(LoadInst &LI) { assert((!LI.isSimple() || LI.getType()->isSingleValueType()) && "All simple FCA loads should have been pre-split"); if (!IsOffsetKnown) return PI.setAborted(&LI); const DataLayout &DL = LI.getModule()->getDataLayout(); uint64_t Size = DL.getTypeStoreSize(LI.getType()); return handleLoadOrStore(LI.getType(), LI, Offset, Size, LI.isVolatile()); } void visitStoreInst(StoreInst &SI) { Value *ValOp = SI.getValueOperand(); if (ValOp == *U) return PI.setEscapedAndAborted(&SI); if (!IsOffsetKnown) return PI.setAborted(&SI); const DataLayout &DL = SI.getModule()->getDataLayout(); uint64_t Size = DL.getTypeStoreSize(ValOp->getType()); // If this memory access can be shown to *statically* extend outside the // bounds of the allocation, it's behavior is undefined, so simply // ignore it. Note that this is more strict than the generic clamping // behavior of insertUse. We also try to handle cases which might run the // risk of overflow. // FIXME: We should instead consider the pointer to have escaped if this // function is being instrumented for addressing bugs or race conditions. if (Size > AllocSize || Offset.ugt(AllocSize - Size)) { LLVM_DEBUG(dbgs() << "WARNING: Ignoring " << Size << " byte store @" << Offset << " which extends past the end of the " << AllocSize << " byte alloca:\n" << " alloca: " << AS.AI << "\n" << " use: " << SI << "\n"); return markAsDead(SI); } assert((!SI.isSimple() || ValOp->getType()->isSingleValueType()) && "All simple FCA stores should have been pre-split"); handleLoadOrStore(ValOp->getType(), SI, Offset, Size, SI.isVolatile()); } void visitMemSetInst(MemSetInst &II) { assert(II.getRawDest() == *U && "Pointer use is not the destination?"); ConstantInt *Length = dyn_cast(II.getLength()); if ((Length && Length->getValue() == 0) || (IsOffsetKnown && Offset.uge(AllocSize))) // Zero-length mem transfer intrinsics can be ignored entirely. return markAsDead(II); if (!IsOffsetKnown) return PI.setAborted(&II); insertUse(II, Offset, Length ? Length->getLimitedValue() : AllocSize - Offset.getLimitedValue(), (bool)Length); } void visitMemTransferInst(MemTransferInst &II) { ConstantInt *Length = dyn_cast(II.getLength()); if (Length && Length->getValue() == 0) // Zero-length mem transfer intrinsics can be ignored entirely. return markAsDead(II); // Because we can visit these intrinsics twice, also check to see if the // first time marked this instruction as dead. If so, skip it. if (VisitedDeadInsts.count(&II)) return; if (!IsOffsetKnown) return PI.setAborted(&II); // This side of the transfer is completely out-of-bounds, and so we can // nuke the entire transfer. However, we also need to nuke the other side // if already added to our partitions. 
// FIXME: Yet another place we really should bypass this when // instrumenting for ASan. if (Offset.uge(AllocSize)) { SmallDenseMap::iterator MTPI = MemTransferSliceMap.find(&II); if (MTPI != MemTransferSliceMap.end()) AS.Slices[MTPI->second].kill(); return markAsDead(II); } uint64_t RawOffset = Offset.getLimitedValue(); uint64_t Size = Length ? Length->getLimitedValue() : AllocSize - RawOffset; // Check for the special case where the same exact value is used for both // source and dest. if (*U == II.getRawDest() && *U == II.getRawSource()) { // For non-volatile transfers this is a no-op. if (!II.isVolatile()) return markAsDead(II); return insertUse(II, Offset, Size, /*IsSplittable=*/false); } // If we have seen both source and destination for a mem transfer, then // they both point to the same alloca. bool Inserted; SmallDenseMap::iterator MTPI; std::tie(MTPI, Inserted) = MemTransferSliceMap.insert(std::make_pair(&II, AS.Slices.size())); unsigned PrevIdx = MTPI->second; if (!Inserted) { Slice &PrevP = AS.Slices[PrevIdx]; // Check if the begin offsets match and this is a non-volatile transfer. // In that case, we can completely elide the transfer. if (!II.isVolatile() && PrevP.beginOffset() == RawOffset) { PrevP.kill(); return markAsDead(II); } // Otherwise we have an offset transfer within the same alloca. We can't // split those. PrevP.makeUnsplittable(); } // Insert the use now that we've fixed up the splittable nature. insertUse(II, Offset, Size, /*IsSplittable=*/Inserted && Length); // Check that we ended up with a valid index in the map. assert(AS.Slices[PrevIdx].getUse()->getUser() == &II && "Map index doesn't point back to a slice with this user."); } // Disable SRoA for any intrinsics except for lifetime invariants. // FIXME: What about debug intrinsics? This matches old behavior, but // doesn't make sense. void visitIntrinsicInst(IntrinsicInst &II) { if (!IsOffsetKnown) return PI.setAborted(&II); if (II.getIntrinsicID() == Intrinsic::lifetime_start || II.getIntrinsicID() == Intrinsic::lifetime_end) { ConstantInt *Length = cast(II.getArgOperand(0)); uint64_t Size = std::min(AllocSize - Offset.getLimitedValue(), Length->getLimitedValue()); insertUse(II, Offset, Size, true); return; } Base::visitIntrinsicInst(II); } Instruction *hasUnsafePHIOrSelectUse(Instruction *Root, uint64_t &Size) { // We consider any PHI or select that results in a direct load or store of // the same offset to be a viable use for slicing purposes. These uses // are considered unsplittable and the size is the maximum loaded or stored // size. SmallPtrSet Visited; SmallVector, 4> Uses; Visited.insert(Root); Uses.push_back(std::make_pair(cast(*U), Root)); const DataLayout &DL = Root->getModule()->getDataLayout(); // If there are no loads or stores, the access is dead. We mark that as // a size zero access. 
Size = 0; do { Instruction *I, *UsedI; std::tie(UsedI, I) = Uses.pop_back_val(); if (LoadInst *LI = dyn_cast(I)) { Size = std::max(Size, DL.getTypeStoreSize(LI->getType())); continue; } if (StoreInst *SI = dyn_cast(I)) { Value *Op = SI->getOperand(0); if (Op == UsedI) return SI; Size = std::max(Size, DL.getTypeStoreSize(Op->getType())); continue; } if (GetElementPtrInst *GEP = dyn_cast(I)) { if (!GEP->hasAllZeroIndices()) return GEP; } else if (!isa(I) && !isa(I) && !isa(I)) { return I; } for (User *U : I->users()) if (Visited.insert(cast(U)).second) Uses.push_back(std::make_pair(I, cast(U))); } while (!Uses.empty()); return nullptr; } void visitPHINodeOrSelectInst(Instruction &I) { assert(isa(I) || isa(I)); if (I.use_empty()) return markAsDead(I); // TODO: We could use SimplifyInstruction here to fold PHINodes and // SelectInsts. However, doing so requires to change the current // dead-operand-tracking mechanism. For instance, suppose neither loading // from %U nor %other traps. Then "load (select undef, %U, %other)" does not // trap either. However, if we simply replace %U with undef using the // current dead-operand-tracking mechanism, "load (select undef, undef, // %other)" may trap because the select may return the first operand // "undef". if (Value *Result = foldPHINodeOrSelectInst(I)) { if (Result == *U) // If the result of the constant fold will be the pointer, recurse // through the PHI/select as if we had RAUW'ed it. enqueueUsers(I); else // Otherwise the operand to the PHI/select is dead, and we can replace // it with undef. AS.DeadOperands.push_back(U); return; } if (!IsOffsetKnown) return PI.setAborted(&I); // See if we already have computed info on this node. uint64_t &Size = PHIOrSelectSizes[&I]; if (!Size) { // This is a new PHI/Select, check for an unsafe use of it. if (Instruction *UnsafeI = hasUnsafePHIOrSelectUse(&I, Size)) return PI.setAborted(UnsafeI); } // For PHI and select operands outside the alloca, we can't nuke the entire // phi or select -- the other side might still be relevant, so we special // case them here and use a separate structure to track the operands // themselves which should be replaced with undef. // FIXME: This should instead be escaped in the event we're instrumenting // for address sanitization. if (Offset.uge(AllocSize)) { AS.DeadOperands.push_back(U); return; } insertUse(I, Offset, Size); } void visitPHINode(PHINode &PN) { visitPHINodeOrSelectInst(PN); } void visitSelectInst(SelectInst &SI) { visitPHINodeOrSelectInst(SI); } /// Disable SROA entirely if there are unhandled users of the alloca. void visitInstruction(Instruction &I) { PI.setAborted(&I); } }; AllocaSlices::AllocaSlices(const DataLayout &DL, AllocaInst &AI) : #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) AI(AI), #endif PointerEscapingInstr(nullptr) { SliceBuilder PB(DL, AI, *this); SliceBuilder::PtrInfo PtrI = PB.visitPtr(AI); if (PtrI.isEscaped() || PtrI.isAborted()) { // FIXME: We should sink the escape vs. abort info into the caller nicely, // possibly by just storing the PtrInfo in the AllocaSlices. PointerEscapingInstr = PtrI.getEscapingInst() ? 
PtrI.getEscapingInst() : PtrI.getAbortingInst(); assert(PointerEscapingInstr && "Did not track a bad instruction"); return; } Slices.erase( llvm::remove_if(Slices, [](const Slice &S) { return S.isDead(); }), Slices.end()); #ifndef NDEBUG if (SROARandomShuffleSlices) { std::mt19937 MT(static_cast( std::chrono::system_clock::now().time_since_epoch().count())); std::shuffle(Slices.begin(), Slices.end(), MT); } #endif // Sort the uses. This arranges for the offsets to be in ascending order, // and the sizes to be in descending order. llvm::sort(Slices.begin(), Slices.end()); } #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) void AllocaSlices::print(raw_ostream &OS, const_iterator I, StringRef Indent) const { printSlice(OS, I, Indent); OS << "\n"; printUse(OS, I, Indent); } void AllocaSlices::printSlice(raw_ostream &OS, const_iterator I, StringRef Indent) const { OS << Indent << "[" << I->beginOffset() << "," << I->endOffset() << ")" << " slice #" << (I - begin()) << (I->isSplittable() ? " (splittable)" : ""); } void AllocaSlices::printUse(raw_ostream &OS, const_iterator I, StringRef Indent) const { OS << Indent << " used by: " << *I->getUse()->getUser() << "\n"; } void AllocaSlices::print(raw_ostream &OS) const { if (PointerEscapingInstr) { OS << "Can't analyze slices for alloca: " << AI << "\n" << " A pointer to this alloca escaped by:\n" << " " << *PointerEscapingInstr << "\n"; return; } OS << "Slices of alloca: " << AI << "\n"; for (const_iterator I = begin(), E = end(); I != E; ++I) print(OS, I); } LLVM_DUMP_METHOD void AllocaSlices::dump(const_iterator I) const { print(dbgs(), I); } LLVM_DUMP_METHOD void AllocaSlices::dump() const { print(dbgs()); } #endif // !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) /// Walk the range of a partitioning looking for a common type to cover this /// sequence of slices. static Type *findCommonType(AllocaSlices::const_iterator B, AllocaSlices::const_iterator E, uint64_t EndOffset) { Type *Ty = nullptr; bool TyIsCommon = true; IntegerType *ITy = nullptr; // Note that we need to look at *every* alloca slice's Use to ensure we // always get consistent results regardless of the order of slices. for (AllocaSlices::const_iterator I = B; I != E; ++I) { Use *U = I->getUse(); if (isa(*U->getUser())) continue; if (I->beginOffset() != B->beginOffset() || I->endOffset() != EndOffset) continue; Type *UserTy = nullptr; if (LoadInst *LI = dyn_cast(U->getUser())) { UserTy = LI->getType(); } else if (StoreInst *SI = dyn_cast(U->getUser())) { UserTy = SI->getValueOperand()->getType(); } if (IntegerType *UserITy = dyn_cast_or_null(UserTy)) { // If the type is larger than the partition, skip it. We only encounter // this for split integer operations where we want to use the type of the // entity causing the split. Also skip if the type is not a byte width // multiple. if (UserITy->getBitWidth() % 8 != 0 || UserITy->getBitWidth() / 8 > (EndOffset - B->beginOffset())) continue; // Track the largest bitwidth integer type used in this way in case there // is no common type. if (!ITy || ITy->getBitWidth() < UserITy->getBitWidth()) ITy = UserITy; } // To avoid depending on the order of slices, Ty and TyIsCommon must not // depend on types skipped above. if (!UserTy || (Ty && Ty != UserTy)) TyIsCommon = false; // Give up on anything but an iN type. else Ty = UserTy; } return TyIsCommon ? 
Ty : ITy; } /// PHI instructions that use an alloca and are subsequently loaded can be /// rewritten to load both input pointers in the pred blocks and then PHI the /// results, allowing the load of the alloca to be promoted. /// From this: /// %P2 = phi [i32* %Alloca, i32* %Other] /// %V = load i32* %P2 /// to: /// %V1 = load i32* %Alloca -> will be mem2reg'd /// ... /// %V2 = load i32* %Other /// ... /// %V = phi [i32 %V1, i32 %V2] /// /// We can do this to a select if its only uses are loads and if the operands /// to the select can be loaded unconditionally. /// /// FIXME: This should be hoisted into a generic utility, likely in /// Transforms/Util/Local.h static bool isSafePHIToSpeculate(PHINode &PN) { // For now, we can only do this promotion if the load is in the same block // as the PHI, and if there are no stores between the phi and load. // TODO: Allow recursive phi users. // TODO: Allow stores. BasicBlock *BB = PN.getParent(); unsigned MaxAlign = 0; bool HaveLoad = false; for (User *U : PN.users()) { LoadInst *LI = dyn_cast(U); if (!LI || !LI->isSimple()) return false; // For now we only allow loads in the same block as the PHI. This is // a common case that happens when instcombine merges two loads through // a PHI. if (LI->getParent() != BB) return false; // Ensure that there are no instructions between the PHI and the load that // could store. for (BasicBlock::iterator BBI(PN); &*BBI != LI; ++BBI) if (BBI->mayWriteToMemory()) return false; MaxAlign = std::max(MaxAlign, LI->getAlignment()); HaveLoad = true; } if (!HaveLoad) return false; const DataLayout &DL = PN.getModule()->getDataLayout(); // We can only transform this if it is safe to push the loads into the // predecessor blocks. The only thing to watch out for is that we can't put // a possibly trapping load in the predecessor if it is a critical edge. for (unsigned Idx = 0, Num = PN.getNumIncomingValues(); Idx != Num; ++Idx) { TerminatorInst *TI = PN.getIncomingBlock(Idx)->getTerminator(); Value *InVal = PN.getIncomingValue(Idx); // If the value is produced by the terminator of the predecessor (an // invoke) or it has side-effects, there is no valid place to put a load // in the predecessor. if (TI == InVal || TI->mayHaveSideEffects()) return false; // If the predecessor has a single successor, then the edge isn't // critical. if (TI->getNumSuccessors() == 1) continue; // If this pointer is always safe to load, or if we can prove that there // is already a load in the block, then we can move the load to the pred // block. if (isSafeToLoadUnconditionally(InVal, MaxAlign, DL, TI)) continue; return false; } return true; } static void speculatePHINodeLoads(PHINode &PN) { LLVM_DEBUG(dbgs() << " original: " << PN << "\n"); Type *LoadTy = cast(PN.getType())->getElementType(); IRBuilderTy PHIBuilder(&PN); PHINode *NewPN = PHIBuilder.CreatePHI(LoadTy, PN.getNumIncomingValues(), PN.getName() + ".sroa.speculated"); // Get the AA tags and alignment to use from one of the loads. It doesn't // matter which one we get and if any differ. LoadInst *SomeLoad = cast(PN.user_back()); AAMDNodes AATags; SomeLoad->getAAMetadata(AATags); unsigned Align = SomeLoad->getAlignment(); // Rewrite all loads of the PN to use the new PHI. while (!PN.use_empty()) { LoadInst *LI = cast(PN.user_back()); LI->replaceAllUsesWith(NewPN); LI->eraseFromParent(); } // Inject loads into all of the pred blocks. 
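  // Hypothetical example of the duplicate-entry case handled below: a switch
  // with two cases targeting the same block can produce
  //   %p = phi i32* [ %a, %bb ], [ %a, %bb ], [ %other, %bb2 ]
  // and both %bb entries must reuse one speculated load, which is why the
  // injected loads are cached per predecessor block.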
DenseMap InjectedLoads; for (unsigned Idx = 0, Num = PN.getNumIncomingValues(); Idx != Num; ++Idx) { BasicBlock *Pred = PN.getIncomingBlock(Idx); Value *InVal = PN.getIncomingValue(Idx); // A PHI node is allowed to have multiple (duplicated) entries for the same // basic block, as long as the value is the same. So if we already injected // a load in the predecessor, then we should reuse the same load for all // duplicated entries. if (Value* V = InjectedLoads.lookup(Pred)) { NewPN->addIncoming(V, Pred); continue; } TerminatorInst *TI = Pred->getTerminator(); IRBuilderTy PredBuilder(TI); LoadInst *Load = PredBuilder.CreateLoad( InVal, (PN.getName() + ".sroa.speculate.load." + Pred->getName())); ++NumLoadsSpeculated; Load->setAlignment(Align); if (AATags) Load->setAAMetadata(AATags); NewPN->addIncoming(Load, Pred); InjectedLoads[Pred] = Load; } LLVM_DEBUG(dbgs() << " speculated to: " << *NewPN << "\n"); PN.eraseFromParent(); } /// Select instructions that use an alloca and are subsequently loaded can be /// rewritten to load both input pointers and then select between the result, /// allowing the load of the alloca to be promoted. /// From this: /// %P2 = select i1 %cond, i32* %Alloca, i32* %Other /// %V = load i32* %P2 /// to: /// %V1 = load i32* %Alloca -> will be mem2reg'd /// %V2 = load i32* %Other /// %V = select i1 %cond, i32 %V1, i32 %V2 /// /// We can do this to a select if its only uses are loads and if the operand /// to the select can be loaded unconditionally. static bool isSafeSelectToSpeculate(SelectInst &SI) { Value *TValue = SI.getTrueValue(); Value *FValue = SI.getFalseValue(); const DataLayout &DL = SI.getModule()->getDataLayout(); for (User *U : SI.users()) { LoadInst *LI = dyn_cast(U); if (!LI || !LI->isSimple()) return false; // Both operands to the select need to be dereferenceable, either // absolutely (e.g. allocas) or at this point because we can see other // accesses to it. if (!isSafeToLoadUnconditionally(TValue, LI->getAlignment(), DL, LI)) return false; if (!isSafeToLoadUnconditionally(FValue, LI->getAlignment(), DL, LI)) return false; } return true; } static void speculateSelectInstLoads(SelectInst &SI) { LLVM_DEBUG(dbgs() << " original: " << SI << "\n"); IRBuilderTy IRB(&SI); Value *TV = SI.getTrueValue(); Value *FV = SI.getFalseValue(); // Replace the loads of the select with a select of two loads. while (!SI.use_empty()) { LoadInst *LI = cast(SI.user_back()); assert(LI->isSimple() && "We only speculate simple loads"); IRB.SetInsertPoint(LI); LoadInst *TL = IRB.CreateLoad(TV, LI->getName() + ".sroa.speculate.load.true"); LoadInst *FL = IRB.CreateLoad(FV, LI->getName() + ".sroa.speculate.load.false"); NumLoadsSpeculated += 2; // Transfer alignment and AA info if present. TL->setAlignment(LI->getAlignment()); FL->setAlignment(LI->getAlignment()); AAMDNodes Tags; LI->getAAMetadata(Tags); if (Tags) { TL->setAAMetadata(Tags); FL->setAAMetadata(Tags); } Value *V = IRB.CreateSelect(SI.getCondition(), TL, FL, LI->getName() + ".sroa.speculated"); LLVM_DEBUG(dbgs() << " speculated to: " << *V << "\n"); LI->replaceAllUsesWith(V); LI->eraseFromParent(); } SI.eraseFromParent(); } /// Build a GEP out of a base pointer and indices. /// /// This will return the BasePtr if that is valid, or build a new GEP /// instruction using the IRBuilder if GEP-ing is needed. 
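/// For example (illustrative), an empty index list or the single index list
/// {i64 0} adds nothing to the pointer, so BasePtr is returned directly
/// rather than emitting a trivial "getelementptr ... i64 0".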
static Value *buildGEP(IRBuilderTy &IRB, Value *BasePtr, SmallVectorImpl &Indices, Twine NamePrefix) { if (Indices.empty()) return BasePtr; // A single zero index is a no-op, so check for this and avoid building a GEP // in that case. if (Indices.size() == 1 && cast(Indices.back())->isZero()) return BasePtr; return IRB.CreateInBoundsGEP(nullptr, BasePtr, Indices, NamePrefix + "sroa_idx"); } /// Get a natural GEP off of the BasePtr walking through Ty toward /// TargetTy without changing the offset of the pointer. /// /// This routine assumes we've already established a properly offset GEP with /// Indices, and arrived at the Ty type. The goal is to continue to GEP with /// zero-indices down through type layers until we find one the same as /// TargetTy. If we can't find one with the same type, we at least try to use /// one with the same size. If none of that works, we just produce the GEP as /// indicated by Indices to have the correct offset. static Value *getNaturalGEPWithType(IRBuilderTy &IRB, const DataLayout &DL, Value *BasePtr, Type *Ty, Type *TargetTy, SmallVectorImpl &Indices, Twine NamePrefix) { if (Ty == TargetTy) return buildGEP(IRB, BasePtr, Indices, NamePrefix); // Pointer size to use for the indices. unsigned PtrSize = DL.getPointerTypeSizeInBits(BasePtr->getType()); // See if we can descend into a struct and locate a field with the correct // type. unsigned NumLayers = 0; Type *ElementTy = Ty; do { if (ElementTy->isPointerTy()) break; if (ArrayType *ArrayTy = dyn_cast(ElementTy)) { ElementTy = ArrayTy->getElementType(); Indices.push_back(IRB.getIntN(PtrSize, 0)); } else if (VectorType *VectorTy = dyn_cast(ElementTy)) { ElementTy = VectorTy->getElementType(); Indices.push_back(IRB.getInt32(0)); } else if (StructType *STy = dyn_cast(ElementTy)) { if (STy->element_begin() == STy->element_end()) break; // Nothing left to descend into. ElementTy = *STy->element_begin(); Indices.push_back(IRB.getInt32(0)); } else { break; } ++NumLayers; } while (ElementTy != TargetTy); if (ElementTy != TargetTy) Indices.erase(Indices.end() - NumLayers, Indices.end()); return buildGEP(IRB, BasePtr, Indices, NamePrefix); } /// Recursively compute indices for a natural GEP. /// /// This is the recursive step for getNaturalGEPWithOffset that walks down the /// element types adding appropriate indices for the GEP. static Value *getNaturalGEPRecursively(IRBuilderTy &IRB, const DataLayout &DL, Value *Ptr, Type *Ty, APInt &Offset, Type *TargetTy, SmallVectorImpl &Indices, Twine NamePrefix) { if (Offset == 0) return getNaturalGEPWithType(IRB, DL, Ptr, Ty, TargetTy, Indices, NamePrefix); // We can't recurse through pointer types. if (Ty->isPointerTy()) return nullptr; // We try to analyze GEPs over vectors here, but note that these GEPs are // extremely poorly defined currently. The long-term goal is to remove GEPing // over a vector from the IR completely. if (VectorType *VecTy = dyn_cast(Ty)) { unsigned ElementSizeInBits = DL.getTypeSizeInBits(VecTy->getScalarType()); if (ElementSizeInBits % 8 != 0) { // GEPs over non-multiple of 8 size vector elements are invalid. 
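    // E.g. (illustrative) <8 x i1> has 1-bit elements, so no byte offset can
    // be expressed as a whole number of elements; give up on a natural GEP.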
return nullptr; } APInt ElementSize(Offset.getBitWidth(), ElementSizeInBits / 8); APInt NumSkippedElements = Offset.sdiv(ElementSize); if (NumSkippedElements.ugt(VecTy->getNumElements())) return nullptr; Offset -= NumSkippedElements * ElementSize; Indices.push_back(IRB.getInt(NumSkippedElements)); return getNaturalGEPRecursively(IRB, DL, Ptr, VecTy->getElementType(), Offset, TargetTy, Indices, NamePrefix); } if (ArrayType *ArrTy = dyn_cast(Ty)) { Type *ElementTy = ArrTy->getElementType(); APInt ElementSize(Offset.getBitWidth(), DL.getTypeAllocSize(ElementTy)); APInt NumSkippedElements = Offset.sdiv(ElementSize); if (NumSkippedElements.ugt(ArrTy->getNumElements())) return nullptr; Offset -= NumSkippedElements * ElementSize; Indices.push_back(IRB.getInt(NumSkippedElements)); return getNaturalGEPRecursively(IRB, DL, Ptr, ElementTy, Offset, TargetTy, Indices, NamePrefix); } StructType *STy = dyn_cast(Ty); if (!STy) return nullptr; const StructLayout *SL = DL.getStructLayout(STy); uint64_t StructOffset = Offset.getZExtValue(); if (StructOffset >= SL->getSizeInBytes()) return nullptr; unsigned Index = SL->getElementContainingOffset(StructOffset); Offset -= APInt(Offset.getBitWidth(), SL->getElementOffset(Index)); Type *ElementTy = STy->getElementType(Index); if (Offset.uge(DL.getTypeAllocSize(ElementTy))) return nullptr; // The offset points into alignment padding. Indices.push_back(IRB.getInt32(Index)); return getNaturalGEPRecursively(IRB, DL, Ptr, ElementTy, Offset, TargetTy, Indices, NamePrefix); } /// Get a natural GEP from a base pointer to a particular offset and /// resulting in a particular type. /// /// The goal is to produce a "natural" looking GEP that works with the existing /// composite types to arrive at the appropriate offset and element type for /// a pointer. TargetTy is the element type the returned GEP should point-to if /// possible. We recurse by decreasing Offset, adding the appropriate index to /// Indices, and setting Ty to the result subtype. /// /// If no natural GEP can be constructed, this function returns null. static Value *getNaturalGEPWithOffset(IRBuilderTy &IRB, const DataLayout &DL, Value *Ptr, APInt Offset, Type *TargetTy, SmallVectorImpl &Indices, Twine NamePrefix) { PointerType *Ty = cast(Ptr->getType()); // Don't consider any GEPs through an i8* as natural unless the TargetTy is // an i8. if (Ty == IRB.getInt8PtrTy(Ty->getAddressSpace()) && TargetTy->isIntegerTy(8)) return nullptr; Type *ElementTy = Ty->getElementType(); if (!ElementTy->isSized()) return nullptr; // We can't GEP through an unsized element. APInt ElementSize(Offset.getBitWidth(), DL.getTypeAllocSize(ElementTy)); if (ElementSize == 0) return nullptr; // Zero-length arrays can't help us build a natural GEP. APInt NumSkippedElements = Offset.sdiv(ElementSize); Offset -= NumSkippedElements * ElementSize; Indices.push_back(IRB.getInt(NumSkippedElements)); return getNaturalGEPRecursively(IRB, DL, Ptr, ElementTy, Offset, TargetTy, Indices, NamePrefix); } /// Compute an adjusted pointer from Ptr by Offset bytes where the /// resulting pointer has PointerTy. /// /// This tries very hard to compute a "natural" GEP which arrives at the offset /// and produces the pointer type desired. Where it cannot, it will try to use /// the natural GEP to arrive at the offset and bitcast to the type. Where that /// fails, it will try to use an existing i8* and GEP to the byte offset and /// bitcast to the type. 
/// /// The strategy for finding the more natural GEPs is to peel off layers of the /// pointer, walking back through bit casts and GEPs, searching for a base /// pointer from which we can compute a natural GEP with the desired /// properties. The algorithm tries to fold as many constant indices into /// a single GEP as possible, thus making each GEP more independent of the /// surrounding code. static Value *getAdjustedPtr(IRBuilderTy &IRB, const DataLayout &DL, Value *Ptr, APInt Offset, Type *PointerTy, Twine NamePrefix) { // Even though we don't look through PHI nodes, we could be called on an // instruction in an unreachable block, which may be on a cycle. SmallPtrSet Visited; Visited.insert(Ptr); SmallVector Indices; // We may end up computing an offset pointer that has the wrong type. If we // never are able to compute one directly that has the correct type, we'll // fall back to it, so keep it and the base it was computed from around here. Value *OffsetPtr = nullptr; Value *OffsetBasePtr; // Remember any i8 pointer we come across to re-use if we need to do a raw // byte offset. Value *Int8Ptr = nullptr; APInt Int8PtrOffset(Offset.getBitWidth(), 0); Type *TargetTy = PointerTy->getPointerElementType(); do { // First fold any existing GEPs into the offset. while (GEPOperator *GEP = dyn_cast(Ptr)) { APInt GEPOffset(Offset.getBitWidth(), 0); if (!GEP->accumulateConstantOffset(DL, GEPOffset)) break; Offset += GEPOffset; Ptr = GEP->getPointerOperand(); if (!Visited.insert(Ptr).second) break; } // See if we can perform a natural GEP here. Indices.clear(); if (Value *P = getNaturalGEPWithOffset(IRB, DL, Ptr, Offset, TargetTy, Indices, NamePrefix)) { // If we have a new natural pointer at the offset, clear out any old // offset pointer we computed. Unless it is the base pointer or // a non-instruction, we built a GEP we don't need. Zap it. if (OffsetPtr && OffsetPtr != OffsetBasePtr) if (Instruction *I = dyn_cast(OffsetPtr)) { assert(I->use_empty() && "Built a GEP with uses some how!"); I->eraseFromParent(); } OffsetPtr = P; OffsetBasePtr = Ptr; // If we also found a pointer of the right type, we're done. if (P->getType() == PointerTy) return P; } // Stash this pointer if we've found an i8*. if (Ptr->getType()->isIntegerTy(8)) { Int8Ptr = Ptr; Int8PtrOffset = Offset; } // Peel off a layer of the pointer and update the offset appropriately. if (Operator::getOpcode(Ptr) == Instruction::BitCast) { Ptr = cast(Ptr)->getOperand(0); } else if (GlobalAlias *GA = dyn_cast(Ptr)) { if (GA->isInterposable()) break; Ptr = GA->getAliasee(); } else { break; } assert(Ptr->getType()->isPointerTy() && "Unexpected operand type!"); } while (Visited.insert(Ptr).second); if (!OffsetPtr) { if (!Int8Ptr) { Int8Ptr = IRB.CreateBitCast( Ptr, IRB.getInt8PtrTy(PointerTy->getPointerAddressSpace()), NamePrefix + "sroa_raw_cast"); Int8PtrOffset = Offset; } OffsetPtr = Int8PtrOffset == 0 ? Int8Ptr : IRB.CreateInBoundsGEP(IRB.getInt8Ty(), Int8Ptr, IRB.getInt(Int8PtrOffset), NamePrefix + "sroa_raw_idx"); } Ptr = OffsetPtr; // On the off chance we were targeting i8*, guard the bitcast here. if (Ptr->getType() != PointerTy) Ptr = IRB.CreateBitCast(Ptr, PointerTy, NamePrefix + "sroa_cast"); return Ptr; } /// Compute the adjusted alignment for a load or store from an offset. 
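/// For example (illustrative), a load known to be 8-byte aligned that is
/// rewritten to start 4 bytes into its original location can only be assumed
/// to be 4-byte aligned: MinAlign(8, 4) == 4.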
static unsigned getAdjustedAlignment(Instruction *I, uint64_t Offset, const DataLayout &DL) { unsigned Alignment; Type *Ty; if (auto *LI = dyn_cast(I)) { Alignment = LI->getAlignment(); Ty = LI->getType(); } else if (auto *SI = dyn_cast(I)) { Alignment = SI->getAlignment(); Ty = SI->getValueOperand()->getType(); } else { llvm_unreachable("Only loads and stores are allowed!"); } if (!Alignment) Alignment = DL.getABITypeAlignment(Ty); return MinAlign(Alignment, Offset); } /// Test whether we can convert a value from the old to the new type. /// /// This predicate should be used to guard calls to convertValue in order to /// ensure that we only try to convert viable values. The strategy is that we /// will peel off single element struct and array wrappings to get to an /// underlying value, and convert that value. static bool canConvertValue(const DataLayout &DL, Type *OldTy, Type *NewTy) { if (OldTy == NewTy) return true; // For integer types, we can't handle any bit-width differences. This would // break both vector conversions with extension and introduce endianness // issues when in conjunction with loads and stores. if (isa(OldTy) && isa(NewTy)) { assert(cast(OldTy)->getBitWidth() != cast(NewTy)->getBitWidth() && "We can't have the same bitwidth for different int types"); return false; } if (DL.getTypeSizeInBits(NewTy) != DL.getTypeSizeInBits(OldTy)) return false; if (!NewTy->isSingleValueType() || !OldTy->isSingleValueType()) return false; // We can convert pointers to integers and vice-versa. Same for vectors // of pointers and integers. OldTy = OldTy->getScalarType(); NewTy = NewTy->getScalarType(); if (NewTy->isPointerTy() || OldTy->isPointerTy()) { if (NewTy->isPointerTy() && OldTy->isPointerTy()) { return cast(NewTy)->getPointerAddressSpace() == cast(OldTy)->getPointerAddressSpace(); } // We can convert integers to integral pointers, but not to non-integral // pointers. if (OldTy->isIntegerTy()) return !DL.isNonIntegralPointerType(NewTy); // We can convert integral pointers to integers, but non-integral pointers // need to remain pointers. if (!DL.isNonIntegralPointerType(OldTy)) return NewTy->isIntegerTy(); return false; } return true; } /// Generic routine to convert an SSA value to a value of a different /// type. /// /// This will try various different casting techniques, such as bitcasts, /// inttoptr, and ptrtoint casts. Use the \c canConvertValue predicate to test /// two types for viability with this routine. static Value *convertValue(const DataLayout &DL, IRBuilderTy &IRB, Value *V, Type *NewTy) { Type *OldTy = V->getType(); assert(canConvertValue(DL, OldTy, NewTy) && "Value not convertable to type"); if (OldTy == NewTy) return V; assert(!(isa(OldTy) && isa(NewTy)) && "Integer types must be the exact same to convert."); // See if we need inttoptr for this type pair. A cast involving both scalars // and vectors requires and additional bitcast. if (OldTy->isIntOrIntVectorTy() && NewTy->isPtrOrPtrVectorTy()) { // Expand <2 x i32> to i8* --> <2 x i32> to i64 to i8* if (OldTy->isVectorTy() && !NewTy->isVectorTy()) return IRB.CreateIntToPtr(IRB.CreateBitCast(V, DL.getIntPtrType(NewTy)), NewTy); // Expand i128 to <2 x i8*> --> i128 to <2 x i64> to <2 x i8*> if (!OldTy->isVectorTy() && NewTy->isVectorTy()) return IRB.CreateIntToPtr(IRB.CreateBitCast(V, DL.getIntPtrType(NewTy)), NewTy); return IRB.CreateIntToPtr(V, NewTy); } // See if we need ptrtoint for this type pair. A cast involving both scalars // and vectors requires and additional bitcast. 
if (OldTy->isPtrOrPtrVectorTy() && NewTy->isIntOrIntVectorTy()) { // Expand <2 x i8*> to i128 --> <2 x i8*> to <2 x i64> to i128 if (OldTy->isVectorTy() && !NewTy->isVectorTy()) return IRB.CreateBitCast(IRB.CreatePtrToInt(V, DL.getIntPtrType(OldTy)), NewTy); // Expand i8* to <2 x i32> --> i8* to i64 to <2 x i32> if (!OldTy->isVectorTy() && NewTy->isVectorTy()) return IRB.CreateBitCast(IRB.CreatePtrToInt(V, DL.getIntPtrType(OldTy)), NewTy); return IRB.CreatePtrToInt(V, NewTy); } return IRB.CreateBitCast(V, NewTy); } /// Test whether the given slice use can be promoted to a vector. /// /// This function is called to test each entry in a partition which is slated /// for a single slice. static bool isVectorPromotionViableForSlice(Partition &P, const Slice &S, VectorType *Ty, uint64_t ElementSize, const DataLayout &DL) { // First validate the slice offsets. uint64_t BeginOffset = std::max(S.beginOffset(), P.beginOffset()) - P.beginOffset(); uint64_t BeginIndex = BeginOffset / ElementSize; if (BeginIndex * ElementSize != BeginOffset || BeginIndex >= Ty->getNumElements()) return false; uint64_t EndOffset = std::min(S.endOffset(), P.endOffset()) - P.beginOffset(); uint64_t EndIndex = EndOffset / ElementSize; if (EndIndex * ElementSize != EndOffset || EndIndex > Ty->getNumElements()) return false; assert(EndIndex > BeginIndex && "Empty vector!"); uint64_t NumElements = EndIndex - BeginIndex; Type *SliceTy = (NumElements == 1) ? Ty->getElementType() : VectorType::get(Ty->getElementType(), NumElements); Type *SplitIntTy = Type::getIntNTy(Ty->getContext(), NumElements * ElementSize * 8); Use *U = S.getUse(); if (MemIntrinsic *MI = dyn_cast(U->getUser())) { if (MI->isVolatile()) return false; if (!S.isSplittable()) return false; // Skip any unsplittable intrinsics. } else if (IntrinsicInst *II = dyn_cast(U->getUser())) { if (II->getIntrinsicID() != Intrinsic::lifetime_start && II->getIntrinsicID() != Intrinsic::lifetime_end) return false; } else if (U->get()->getType()->getPointerElementType()->isStructTy()) { // Disable vector promotion when there are loads or stores of an FCA. return false; } else if (LoadInst *LI = dyn_cast(U->getUser())) { if (LI->isVolatile()) return false; Type *LTy = LI->getType(); if (P.beginOffset() > S.beginOffset() || P.endOffset() < S.endOffset()) { assert(LTy->isIntegerTy()); LTy = SplitIntTy; } if (!canConvertValue(DL, SliceTy, LTy)) return false; } else if (StoreInst *SI = dyn_cast(U->getUser())) { if (SI->isVolatile()) return false; Type *STy = SI->getValueOperand()->getType(); if (P.beginOffset() > S.beginOffset() || P.endOffset() < S.endOffset()) { assert(STy->isIntegerTy()); STy = SplitIntTy; } if (!canConvertValue(DL, STy, SliceTy)) return false; } else { return false; } return true; } /// Test whether the given alloca partitioning and range of slices can be /// promoted to a vector. /// /// This is a quick test to check whether we can rewrite a particular alloca /// partition (and its newly formed alloca) into a vector alloca with only /// whole-vector loads and stores such that it could be promoted to a vector /// SSA value. We only can ensure this for a limited set of operations, and we /// don't want to do the rewrites unless we are confident that the result will /// be promotable, so we have an early test here. static VectorType *isVectorPromotionViable(Partition &P, const DataLayout &DL) { // Collect the candidate types for vector-based promotion. Also track whether // we have different element types. 
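  // Illustrative example (hypothetical IR): if the partition is covered by
  // both a "load <4 x float>" and a "load <4 x i32>" of the whole alloca, the
  // candidates disagree on the element type, so the non-integer candidate is
  // dropped below and <4 x i32> is the type tried for promotion.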
SmallVector CandidateTys; Type *CommonEltTy = nullptr; bool HaveCommonEltTy = true; auto CheckCandidateType = [&](Type *Ty) { if (auto *VTy = dyn_cast(Ty)) { CandidateTys.push_back(VTy); if (!CommonEltTy) CommonEltTy = VTy->getElementType(); else if (CommonEltTy != VTy->getElementType()) HaveCommonEltTy = false; } }; // Consider any loads or stores that are the exact size of the slice. for (const Slice &S : P) if (S.beginOffset() == P.beginOffset() && S.endOffset() == P.endOffset()) { if (auto *LI = dyn_cast(S.getUse()->getUser())) CheckCandidateType(LI->getType()); else if (auto *SI = dyn_cast(S.getUse()->getUser())) CheckCandidateType(SI->getValueOperand()->getType()); } // If we didn't find a vector type, nothing to do here. if (CandidateTys.empty()) return nullptr; // Remove non-integer vector types if we had multiple common element types. // FIXME: It'd be nice to replace them with integer vector types, but we can't // do that until all the backends are known to produce good code for all // integer vector types. if (!HaveCommonEltTy) { CandidateTys.erase( llvm::remove_if(CandidateTys, [](VectorType *VTy) { return !VTy->getElementType()->isIntegerTy(); }), CandidateTys.end()); // If there were no integer vector types, give up. if (CandidateTys.empty()) return nullptr; // Rank the remaining candidate vector types. This is easy because we know // they're all integer vectors. We sort by ascending number of elements. auto RankVectorTypes = [&DL](VectorType *RHSTy, VectorType *LHSTy) { (void)DL; assert(DL.getTypeSizeInBits(RHSTy) == DL.getTypeSizeInBits(LHSTy) && "Cannot have vector types of different sizes!"); assert(RHSTy->getElementType()->isIntegerTy() && "All non-integer types eliminated!"); assert(LHSTy->getElementType()->isIntegerTy() && "All non-integer types eliminated!"); return RHSTy->getNumElements() < LHSTy->getNumElements(); }; llvm::sort(CandidateTys.begin(), CandidateTys.end(), RankVectorTypes); CandidateTys.erase( std::unique(CandidateTys.begin(), CandidateTys.end(), RankVectorTypes), CandidateTys.end()); } else { // The only way to have the same element type in every vector type is to // have the same vector type. Check that and remove all but one. #ifndef NDEBUG for (VectorType *VTy : CandidateTys) { assert(VTy->getElementType() == CommonEltTy && "Unaccounted for element type!"); assert(VTy == CandidateTys[0] && "Different vector types with the same element type!"); } #endif CandidateTys.resize(1); } // Try each vector type, and return the one which works. auto CheckVectorTypeForPromotion = [&](VectorType *VTy) { uint64_t ElementSize = DL.getTypeSizeInBits(VTy->getElementType()); // While the definition of LLVM vectors is bitpacked, we don't support sizes // that aren't byte sized. if (ElementSize % 8) return false; assert((DL.getTypeSizeInBits(VTy) % 8) == 0 && "vector size not a multiple of element size?"); ElementSize /= 8; for (const Slice &S : P) if (!isVectorPromotionViableForSlice(P, S, VTy, ElementSize, DL)) return false; for (const Slice *S : P.splitSliceTails()) if (!isVectorPromotionViableForSlice(P, *S, VTy, ElementSize, DL)) return false; return true; }; for (VectorType *VTy : CandidateTys) if (CheckVectorTypeForPromotion(VTy)) return VTy; return nullptr; } /// Test whether a slice of an alloca is valid for integer widening. /// /// This implements the necessary checking for the \c isIntegerWideningViable /// test below on a single slice of the alloca. 
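/// For example (illustrative), an i64 alloca that is loaded as a whole i64
/// but also stored to through an i32 pointer at byte offset 4 stays a single
/// i64; the narrow store becomes a load of the wide value, an insertion of
/// the i32 bits via shift/mask, and a store of the widened result.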
static bool isIntegerWideningViableForSlice(const Slice &S, uint64_t AllocBeginOffset, Type *AllocaTy, const DataLayout &DL, bool &WholeAllocaOp) { uint64_t Size = DL.getTypeStoreSize(AllocaTy); uint64_t RelBegin = S.beginOffset() - AllocBeginOffset; uint64_t RelEnd = S.endOffset() - AllocBeginOffset; // We can't reasonably handle cases where the load or store extends past // the end of the alloca's type and into its padding. if (RelEnd > Size) return false; Use *U = S.getUse(); if (LoadInst *LI = dyn_cast(U->getUser())) { if (LI->isVolatile()) return false; // We can't handle loads that extend past the allocated memory. if (DL.getTypeStoreSize(LI->getType()) > Size) return false; // So far, AllocaSliceRewriter does not support widening split slice tails // in rewriteIntegerLoad. if (S.beginOffset() < AllocBeginOffset) return false; // Note that we don't count vector loads or stores as whole-alloca // operations which enable integer widening because we would prefer to use // vector widening instead. if (!isa(LI->getType()) && RelBegin == 0 && RelEnd == Size) WholeAllocaOp = true; if (IntegerType *ITy = dyn_cast(LI->getType())) { if (ITy->getBitWidth() < DL.getTypeStoreSizeInBits(ITy)) return false; } else if (RelBegin != 0 || RelEnd != Size || !canConvertValue(DL, AllocaTy, LI->getType())) { // Non-integer loads need to be convertible from the alloca type so that // they are promotable. return false; } } else if (StoreInst *SI = dyn_cast(U->getUser())) { Type *ValueTy = SI->getValueOperand()->getType(); if (SI->isVolatile()) return false; // We can't handle stores that extend past the allocated memory. if (DL.getTypeStoreSize(ValueTy) > Size) return false; // So far, AllocaSliceRewriter does not support widening split slice tails // in rewriteIntegerStore. if (S.beginOffset() < AllocBeginOffset) return false; // Note that we don't count vector loads or stores as whole-alloca // operations which enable integer widening because we would prefer to use // vector widening instead. if (!isa(ValueTy) && RelBegin == 0 && RelEnd == Size) WholeAllocaOp = true; if (IntegerType *ITy = dyn_cast(ValueTy)) { if (ITy->getBitWidth() < DL.getTypeStoreSizeInBits(ITy)) return false; } else if (RelBegin != 0 || RelEnd != Size || !canConvertValue(DL, ValueTy, AllocaTy)) { // Non-integer stores need to be convertible to the alloca type so that // they are promotable. return false; } } else if (MemIntrinsic *MI = dyn_cast(U->getUser())) { if (MI->isVolatile() || !isa(MI->getLength())) return false; if (!S.isSplittable()) return false; // Skip any unsplittable intrinsics. } else if (IntrinsicInst *II = dyn_cast(U->getUser())) { if (II->getIntrinsicID() != Intrinsic::lifetime_start && II->getIntrinsicID() != Intrinsic::lifetime_end) return false; } else { return false; } return true; } /// Test whether the given alloca partition's integer operations can be /// widened to promotable ones. /// /// This is a quick test to check whether we can rewrite the integer loads and /// stores to a particular alloca into wider loads and stores and be able to /// promote the resulting alloca. static bool isIntegerWideningViable(Partition &P, Type *AllocaTy, const DataLayout &DL) { uint64_t SizeInBits = DL.getTypeSizeInBits(AllocaTy); // Don't create integer types larger than the maximum bitwidth. if (SizeInBits > IntegerType::MAX_INT_BITS) return false; // Don't try to handle allocas with bit-padding. 
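  // E.g. (illustrative) i1 has a size of 1 bit but a store size of 8 bits;
  // widening would have to model the 7 bits of padding, so bail out here.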
if (SizeInBits != DL.getTypeStoreSizeInBits(AllocaTy)) return false; // We need to ensure that an integer type with the appropriate bitwidth can // be converted to the alloca type, whatever that is. We don't want to force // the alloca itself to have an integer type if there is a more suitable one. Type *IntTy = Type::getIntNTy(AllocaTy->getContext(), SizeInBits); if (!canConvertValue(DL, AllocaTy, IntTy) || !canConvertValue(DL, IntTy, AllocaTy)) return false; // While examining uses, we ensure that the alloca has a covering load or // store. We don't want to widen the integer operations only to fail to // promote due to some other unsplittable entry (which we may make splittable // later). However, if there are only splittable uses, go ahead and assume // that we cover the alloca. // FIXME: We shouldn't consider split slices that happen to start in the // partition here... bool WholeAllocaOp = P.begin() != P.end() ? false : DL.isLegalInteger(SizeInBits); for (const Slice &S : P) if (!isIntegerWideningViableForSlice(S, P.beginOffset(), AllocaTy, DL, WholeAllocaOp)) return false; for (const Slice *S : P.splitSliceTails()) if (!isIntegerWideningViableForSlice(*S, P.beginOffset(), AllocaTy, DL, WholeAllocaOp)) return false; return WholeAllocaOp; } static Value *extractInteger(const DataLayout &DL, IRBuilderTy &IRB, Value *V, IntegerType *Ty, uint64_t Offset, const Twine &Name) { LLVM_DEBUG(dbgs() << " start: " << *V << "\n"); IntegerType *IntTy = cast(V->getType()); assert(DL.getTypeStoreSize(Ty) + Offset <= DL.getTypeStoreSize(IntTy) && "Element extends past full value"); uint64_t ShAmt = 8 * Offset; if (DL.isBigEndian()) ShAmt = 8 * (DL.getTypeStoreSize(IntTy) - DL.getTypeStoreSize(Ty) - Offset); if (ShAmt) { V = IRB.CreateLShr(V, ShAmt, Name + ".shift"); LLVM_DEBUG(dbgs() << " shifted: " << *V << "\n"); } assert(Ty->getBitWidth() <= IntTy->getBitWidth() && "Cannot extract to a larger integer!"); if (Ty != IntTy) { V = IRB.CreateTrunc(V, Ty, Name + ".trunc"); LLVM_DEBUG(dbgs() << " trunced: " << *V << "\n"); } return V; } static Value *insertInteger(const DataLayout &DL, IRBuilderTy &IRB, Value *Old, Value *V, uint64_t Offset, const Twine &Name) { IntegerType *IntTy = cast(Old->getType()); IntegerType *Ty = cast(V->getType()); assert(Ty->getBitWidth() <= IntTy->getBitWidth() && "Cannot insert a larger integer!"); LLVM_DEBUG(dbgs() << " start: " << *V << "\n"); if (Ty != IntTy) { V = IRB.CreateZExt(V, IntTy, Name + ".ext"); LLVM_DEBUG(dbgs() << " extended: " << *V << "\n"); } assert(DL.getTypeStoreSize(Ty) + Offset <= DL.getTypeStoreSize(IntTy) && "Element store outside of alloca store"); uint64_t ShAmt = 8 * Offset; if (DL.isBigEndian()) ShAmt = 8 * (DL.getTypeStoreSize(IntTy) - DL.getTypeStoreSize(Ty) - Offset); if (ShAmt) { V = IRB.CreateShl(V, ShAmt, Name + ".shift"); LLVM_DEBUG(dbgs() << " shifted: " << *V << "\n"); } if (ShAmt || Ty->getBitWidth() < IntTy->getBitWidth()) { APInt Mask = ~Ty->getMask().zext(IntTy->getBitWidth()).shl(ShAmt); Old = IRB.CreateAnd(Old, Mask, Name + ".mask"); LLVM_DEBUG(dbgs() << " masked: " << *Old << "\n"); V = IRB.CreateOr(Old, V, Name + ".insert"); LLVM_DEBUG(dbgs() << " inserted: " << *V << "\n"); } return V; } static Value *extractVector(IRBuilderTy &IRB, Value *V, unsigned BeginIndex, unsigned EndIndex, const Twine &Name) { VectorType *VecTy = cast(V->getType()); unsigned NumElements = EndIndex - BeginIndex; assert(NumElements <= VecTy->getNumElements() && "Too many elements!"); if (NumElements == VecTy->getNumElements()) return V; if (NumElements == 
1) { V = IRB.CreateExtractElement(V, IRB.getInt32(BeginIndex), Name + ".extract"); LLVM_DEBUG(dbgs() << " extract: " << *V << "\n"); return V; } SmallVector Mask; Mask.reserve(NumElements); for (unsigned i = BeginIndex; i != EndIndex; ++i) Mask.push_back(IRB.getInt32(i)); V = IRB.CreateShuffleVector(V, UndefValue::get(V->getType()), ConstantVector::get(Mask), Name + ".extract"); LLVM_DEBUG(dbgs() << " shuffle: " << *V << "\n"); return V; } static Value *insertVector(IRBuilderTy &IRB, Value *Old, Value *V, unsigned BeginIndex, const Twine &Name) { VectorType *VecTy = cast(Old->getType()); assert(VecTy && "Can only insert a vector into a vector"); VectorType *Ty = dyn_cast(V->getType()); if (!Ty) { // Single element to insert. V = IRB.CreateInsertElement(Old, V, IRB.getInt32(BeginIndex), Name + ".insert"); LLVM_DEBUG(dbgs() << " insert: " << *V << "\n"); return V; } assert(Ty->getNumElements() <= VecTy->getNumElements() && "Too many elements!"); if (Ty->getNumElements() == VecTy->getNumElements()) { assert(V->getType() == VecTy && "Vector type mismatch"); return V; } unsigned EndIndex = BeginIndex + Ty->getNumElements(); // When inserting a smaller vector into the larger to store, we first // use a shuffle vector to widen it with undef elements, and then // a second shuffle vector to select between the loaded vector and the // incoming vector. SmallVector Mask; Mask.reserve(VecTy->getNumElements()); for (unsigned i = 0; i != VecTy->getNumElements(); ++i) if (i >= BeginIndex && i < EndIndex) Mask.push_back(IRB.getInt32(i - BeginIndex)); else Mask.push_back(UndefValue::get(IRB.getInt32Ty())); V = IRB.CreateShuffleVector(V, UndefValue::get(V->getType()), ConstantVector::get(Mask), Name + ".expand"); LLVM_DEBUG(dbgs() << " shuffle: " << *V << "\n"); Mask.clear(); for (unsigned i = 0; i != VecTy->getNumElements(); ++i) Mask.push_back(IRB.getInt1(i >= BeginIndex && i < EndIndex)); V = IRB.CreateSelect(ConstantVector::get(Mask), V, Old, Name + "blend"); LLVM_DEBUG(dbgs() << " blend: " << *V << "\n"); return V; } /// Visitor to rewrite instructions using p particular slice of an alloca /// to use a new alloca. /// /// Also implements the rewriting to vector-based accesses when the partition /// passes the isVectorPromotionViable predicate. Most of the rewriting logic /// lives here. class llvm::sroa::AllocaSliceRewriter : public InstVisitor { // Befriend the base class so it can delegate to private visit methods. friend class InstVisitor; using Base = InstVisitor; const DataLayout &DL; AllocaSlices &AS; SROA &Pass; AllocaInst &OldAI, &NewAI; const uint64_t NewAllocaBeginOffset, NewAllocaEndOffset; Type *NewAllocaTy; // This is a convenience and flag variable that will be null unless the new // alloca's integer operations should be widened to this integer type due to // passing isIntegerWideningViable above. If it is non-null, the desired // integer type will be stored here for easy access during rewriting. IntegerType *IntTy; // If we are rewriting an alloca partition which can be written as pure // vector operations, we stash extra information here. When VecTy is // non-null, we have some strict guarantees about the rewritten alloca: // - The new alloca is exactly the size of the vector type here. // - The accesses all either map to the entire vector or to a single // element. // - The set of accessing instructions is only one of those handled above // in isVectorPromotionViable. Generally these are the same access kinds // which are promotable via mem2reg. 
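  // Illustrative shape of such a partition: an alloca rewritten as <4 x i32>
  // where every access is either a whole-vector load/store or an i32
  // load/store of one element; the element accesses are lowered to
  // extractelement/insertelement in the visitors below.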
VectorType *VecTy; Type *ElementTy; uint64_t ElementSize; // The original offset of the slice currently being rewritten relative to // the original alloca. uint64_t BeginOffset = 0; uint64_t EndOffset = 0; // The new offsets of the slice currently being rewritten relative to the // original alloca. uint64_t NewBeginOffset, NewEndOffset; uint64_t SliceSize; bool IsSplittable = false; bool IsSplit = false; Use *OldUse = nullptr; Instruction *OldPtr = nullptr; // Track post-rewrite users which are PHI nodes and Selects. SmallSetVector &PHIUsers; SmallSetVector &SelectUsers; // Utility IR builder, whose name prefix is setup for each visited use, and // the insertion point is set to point to the user. IRBuilderTy IRB; public: AllocaSliceRewriter(const DataLayout &DL, AllocaSlices &AS, SROA &Pass, AllocaInst &OldAI, AllocaInst &NewAI, uint64_t NewAllocaBeginOffset, uint64_t NewAllocaEndOffset, bool IsIntegerPromotable, VectorType *PromotableVecTy, SmallSetVector &PHIUsers, SmallSetVector &SelectUsers) : DL(DL), AS(AS), Pass(Pass), OldAI(OldAI), NewAI(NewAI), NewAllocaBeginOffset(NewAllocaBeginOffset), NewAllocaEndOffset(NewAllocaEndOffset), NewAllocaTy(NewAI.getAllocatedType()), IntTy(IsIntegerPromotable ? Type::getIntNTy( NewAI.getContext(), DL.getTypeSizeInBits(NewAI.getAllocatedType())) : nullptr), VecTy(PromotableVecTy), ElementTy(VecTy ? VecTy->getElementType() : nullptr), ElementSize(VecTy ? DL.getTypeSizeInBits(ElementTy) / 8 : 0), PHIUsers(PHIUsers), SelectUsers(SelectUsers), IRB(NewAI.getContext(), ConstantFolder()) { if (VecTy) { assert((DL.getTypeSizeInBits(ElementTy) % 8) == 0 && "Only multiple-of-8 sized vector elements are viable"); ++NumVectorized; } assert((!IntTy && !VecTy) || (IntTy && !VecTy) || (!IntTy && VecTy)); } bool visit(AllocaSlices::const_iterator I) { bool CanSROA = true; BeginOffset = I->beginOffset(); EndOffset = I->endOffset(); IsSplittable = I->isSplittable(); IsSplit = BeginOffset < NewAllocaBeginOffset || EndOffset > NewAllocaEndOffset; LLVM_DEBUG(dbgs() << " rewriting " << (IsSplit ? "split " : "")); LLVM_DEBUG(AS.printSlice(dbgs(), I, "")); LLVM_DEBUG(dbgs() << "\n"); // Compute the intersecting offset range. assert(BeginOffset < NewAllocaEndOffset); assert(EndOffset > NewAllocaBeginOffset); NewBeginOffset = std::max(BeginOffset, NewAllocaBeginOffset); NewEndOffset = std::min(EndOffset, NewAllocaEndOffset); SliceSize = NewEndOffset - NewBeginOffset; OldUse = I->getUse(); OldPtr = cast(OldUse->get()); Instruction *OldUserI = cast(OldUse->getUser()); IRB.SetInsertPoint(OldUserI); IRB.SetCurrentDebugLocation(OldUserI->getDebugLoc()); IRB.SetNamePrefix(Twine(NewAI.getName()) + "." + Twine(BeginOffset) + "."); CanSROA &= visit(cast(OldUse->getUser())); if (VecTy || IntTy) assert(CanSROA); return CanSROA; } private: // Make sure the other visit overloads are visible. using Base::visit; // Every instruction which can end up as a user must have a rewrite rule. bool visitInstruction(Instruction &I) { LLVM_DEBUG(dbgs() << " !!!! Cannot rewrite: " << I << "\n"); llvm_unreachable("No rewrite rule for this instruction!"); } Value *getNewAllocaSlicePtr(IRBuilderTy &IRB, Type *PointerTy) { // Note that the offset computation can use BeginOffset or NewBeginOffset // interchangeably for unsplit slices. assert(IsSplit || BeginOffset == NewBeginOffset); uint64_t Offset = NewBeginOffset - NewAllocaBeginOffset; #ifndef NDEBUG StringRef OldName = OldPtr->getName(); // Skip through the last '.sroa.' component of the name. 
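    // For example (hypothetical name), "foo.sroa.3.16.copyload" is reduced to
    // "copyload" here so that repeatedly rewritten pointers don't accumulate
    // ever longer ".sroa.N.M" name chains.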
size_t LastSROAPrefix = OldName.rfind(".sroa."); if (LastSROAPrefix != StringRef::npos) { OldName = OldName.substr(LastSROAPrefix + strlen(".sroa.")); // Look for an SROA slice index. size_t IndexEnd = OldName.find_first_not_of("0123456789"); if (IndexEnd != StringRef::npos && OldName[IndexEnd] == '.') { // Strip the index and look for the offset. OldName = OldName.substr(IndexEnd + 1); size_t OffsetEnd = OldName.find_first_not_of("0123456789"); if (OffsetEnd != StringRef::npos && OldName[OffsetEnd] == '.') // Strip the offset. OldName = OldName.substr(OffsetEnd + 1); } } // Strip any SROA suffixes as well. OldName = OldName.substr(0, OldName.find(".sroa_")); #endif return getAdjustedPtr(IRB, DL, &NewAI, APInt(DL.getPointerTypeSizeInBits(PointerTy), Offset), PointerTy, #ifndef NDEBUG Twine(OldName) + "." #else Twine() #endif ); } /// Compute suitable alignment to access this slice of the *new* /// alloca. /// /// You can optionally pass a type to this routine and if that type's ABI /// alignment is itself suitable, this will return zero. unsigned getSliceAlign(Type *Ty = nullptr) { unsigned NewAIAlign = NewAI.getAlignment(); if (!NewAIAlign) NewAIAlign = DL.getABITypeAlignment(NewAI.getAllocatedType()); unsigned Align = MinAlign(NewAIAlign, NewBeginOffset - NewAllocaBeginOffset); return (Ty && Align == DL.getABITypeAlignment(Ty)) ? 0 : Align; } unsigned getIndex(uint64_t Offset) { assert(VecTy && "Can only call getIndex when rewriting a vector"); uint64_t RelOffset = Offset - NewAllocaBeginOffset; assert(RelOffset / ElementSize < UINT32_MAX && "Index out of bounds"); uint32_t Index = RelOffset / ElementSize; assert(Index * ElementSize == RelOffset); return Index; } void deleteIfTriviallyDead(Value *V) { Instruction *I = cast(V); if (isInstructionTriviallyDead(I)) Pass.DeadInsts.insert(I); } Value *rewriteVectorizedLoadInst() { unsigned BeginIndex = getIndex(NewBeginOffset); unsigned EndIndex = getIndex(NewEndOffset); assert(EndIndex > BeginIndex && "Empty vector!"); Value *V = IRB.CreateAlignedLoad(&NewAI, NewAI.getAlignment(), "load"); return extractVector(IRB, V, BeginIndex, EndIndex, "vec"); } Value *rewriteIntegerLoad(LoadInst &LI) { assert(IntTy && "We cannot insert an integer to the alloca"); assert(!LI.isVolatile()); Value *V = IRB.CreateAlignedLoad(&NewAI, NewAI.getAlignment(), "load"); V = convertValue(DL, IRB, V, IntTy); assert(NewBeginOffset >= NewAllocaBeginOffset && "Out of bounds offset"); uint64_t Offset = NewBeginOffset - NewAllocaBeginOffset; if (Offset > 0 || NewEndOffset < NewAllocaEndOffset) { IntegerType *ExtractTy = Type::getIntNTy(LI.getContext(), SliceSize * 8); V = extractInteger(DL, IRB, V, ExtractTy, Offset, "extract"); } // It is possible that the extracted type is not the load type. This // happens if there is a load past the end of the alloca, and as // a consequence the slice is narrower but still a candidate for integer // lowering. To handle this case, we just zero extend the extracted // integer. assert(cast(LI.getType())->getBitWidth() >= SliceSize * 8 && "Can only handle an extract for an overly wide load"); if (cast(LI.getType())->getBitWidth() > SliceSize * 8) V = IRB.CreateZExt(V, LI.getType()); return V; } bool visitLoadInst(LoadInst &LI) { LLVM_DEBUG(dbgs() << " original: " << LI << "\n"); Value *OldOp = LI.getOperand(0); assert(OldOp == OldPtr); AAMDNodes AATags; LI.getAAMetadata(AATags); unsigned AS = LI.getPointerAddressSpace(); Type *TargetTy = IsSplit ? 
Type::getIntNTy(LI.getContext(), SliceSize * 8) : LI.getType(); const bool IsLoadPastEnd = DL.getTypeStoreSize(TargetTy) > SliceSize; bool IsPtrAdjusted = false; Value *V; if (VecTy) { V = rewriteVectorizedLoadInst(); } else if (IntTy && LI.getType()->isIntegerTy()) { V = rewriteIntegerLoad(LI); } else if (NewBeginOffset == NewAllocaBeginOffset && NewEndOffset == NewAllocaEndOffset && (canConvertValue(DL, NewAllocaTy, TargetTy) || (IsLoadPastEnd && NewAllocaTy->isIntegerTy() && TargetTy->isIntegerTy()))) { LoadInst *NewLI = IRB.CreateAlignedLoad(&NewAI, NewAI.getAlignment(), LI.isVolatile(), LI.getName()); if (AATags) NewLI->setAAMetadata(AATags); if (LI.isVolatile()) NewLI->setAtomic(LI.getOrdering(), LI.getSyncScopeID()); // Any !nonnull metadata or !range metadata on the old load is also valid // on the new load. This is even true in some cases even when the loads // are different types, for example by mapping !nonnull metadata to // !range metadata by modeling the null pointer constant converted to the // integer type. // FIXME: Add support for range metadata here. Currently the utilities // for this don't propagate range metadata in trivial cases from one // integer load to another, don't handle non-addrspace-0 null pointers // correctly, and don't have any support for mapping ranges as the // integer type becomes winder or narrower. if (MDNode *N = LI.getMetadata(LLVMContext::MD_nonnull)) copyNonnullMetadata(LI, N, *NewLI); // Try to preserve nonnull metadata V = NewLI; // If this is an integer load past the end of the slice (which means the // bytes outside the slice are undef or this load is dead) just forcibly // fix the integer size with correct handling of endianness. if (auto *AITy = dyn_cast(NewAllocaTy)) if (auto *TITy = dyn_cast(TargetTy)) if (AITy->getBitWidth() < TITy->getBitWidth()) { V = IRB.CreateZExt(V, TITy, "load.ext"); if (DL.isBigEndian()) V = IRB.CreateShl(V, TITy->getBitWidth() - AITy->getBitWidth(), "endian_shift"); } } else { Type *LTy = TargetTy->getPointerTo(AS); LoadInst *NewLI = IRB.CreateAlignedLoad(getNewAllocaSlicePtr(IRB, LTy), getSliceAlign(TargetTy), LI.isVolatile(), LI.getName()); if (AATags) NewLI->setAAMetadata(AATags); if (LI.isVolatile()) NewLI->setAtomic(LI.getOrdering(), LI.getSyncScopeID()); V = NewLI; IsPtrAdjusted = true; } V = convertValue(DL, IRB, V, TargetTy); if (IsSplit) { assert(!LI.isVolatile()); assert(LI.getType()->isIntegerTy() && "Only integer type loads and stores are split"); assert(SliceSize < DL.getTypeStoreSize(LI.getType()) && "Split load isn't smaller than original load"); assert(LI.getType()->getIntegerBitWidth() == DL.getTypeStoreSizeInBits(LI.getType()) && "Non-byte-multiple bit width"); // Move the insertion point just past the load so that we can refer to it. IRB.SetInsertPoint(&*std::next(BasicBlock::iterator(&LI))); // Create a placeholder value with the same type as LI to use as the // basis for the new value. This allows us to replace the uses of LI with // the computed value, and then replace the placeholder with LI, leaving // LI only used for this computation. 
Value *Placeholder = new LoadInst(UndefValue::get(LI.getType()->getPointerTo(AS))); V = insertInteger(DL, IRB, Placeholder, V, NewBeginOffset - BeginOffset, "insert"); LI.replaceAllUsesWith(V); Placeholder->replaceAllUsesWith(&LI); Placeholder->deleteValue(); } else { LI.replaceAllUsesWith(V); } Pass.DeadInsts.insert(&LI); deleteIfTriviallyDead(OldOp); LLVM_DEBUG(dbgs() << " to: " << *V << "\n"); return !LI.isVolatile() && !IsPtrAdjusted; } bool rewriteVectorizedStoreInst(Value *V, StoreInst &SI, Value *OldOp, AAMDNodes AATags) { if (V->getType() != VecTy) { unsigned BeginIndex = getIndex(NewBeginOffset); unsigned EndIndex = getIndex(NewEndOffset); assert(EndIndex > BeginIndex && "Empty vector!"); unsigned NumElements = EndIndex - BeginIndex; assert(NumElements <= VecTy->getNumElements() && "Too many elements!"); Type *SliceTy = (NumElements == 1) ? ElementTy : VectorType::get(ElementTy, NumElements); if (V->getType() != SliceTy) V = convertValue(DL, IRB, V, SliceTy); // Mix in the existing elements. Value *Old = IRB.CreateAlignedLoad(&NewAI, NewAI.getAlignment(), "load"); V = insertVector(IRB, Old, V, BeginIndex, "vec"); } StoreInst *Store = IRB.CreateAlignedStore(V, &NewAI, NewAI.getAlignment()); if (AATags) Store->setAAMetadata(AATags); Pass.DeadInsts.insert(&SI); LLVM_DEBUG(dbgs() << " to: " << *Store << "\n"); return true; } bool rewriteIntegerStore(Value *V, StoreInst &SI, AAMDNodes AATags) { assert(IntTy && "We cannot extract an integer from the alloca"); assert(!SI.isVolatile()); if (DL.getTypeSizeInBits(V->getType()) != IntTy->getBitWidth()) { Value *Old = IRB.CreateAlignedLoad(&NewAI, NewAI.getAlignment(), "oldload"); Old = convertValue(DL, IRB, Old, IntTy); assert(BeginOffset >= NewAllocaBeginOffset && "Out of bounds offset"); uint64_t Offset = BeginOffset - NewAllocaBeginOffset; V = insertInteger(DL, IRB, Old, SI.getValueOperand(), Offset, "insert"); } V = convertValue(DL, IRB, V, NewAllocaTy); StoreInst *Store = IRB.CreateAlignedStore(V, &NewAI, NewAI.getAlignment()); Store->copyMetadata(SI, LLVMContext::MD_mem_parallel_loop_access); if (AATags) Store->setAAMetadata(AATags); Pass.DeadInsts.insert(&SI); LLVM_DEBUG(dbgs() << " to: " << *Store << "\n"); return true; } bool visitStoreInst(StoreInst &SI) { LLVM_DEBUG(dbgs() << " original: " << SI << "\n"); Value *OldOp = SI.getOperand(1); assert(OldOp == OldPtr); AAMDNodes AATags; SI.getAAMetadata(AATags); Value *V = SI.getValueOperand(); // Strip all inbounds GEPs and pointer casts to try to dig out any root // alloca that should be re-examined after promoting this alloca. 
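    // Illustrative case (hypothetical IR): storing
    //   %p = getelementptr inbounds [4 x i32], [4 x i32]* %other, i64 0, i32 1
    // into this slice queues %other for re-examination, since promoting the
    // current alloca may eliminate this store and leave %other promotable.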
if (V->getType()->isPointerTy()) if (AllocaInst *AI = dyn_cast(V->stripInBoundsOffsets())) Pass.PostPromotionWorklist.insert(AI); if (SliceSize < DL.getTypeStoreSize(V->getType())) { assert(!SI.isVolatile()); assert(V->getType()->isIntegerTy() && "Only integer type loads and stores are split"); assert(V->getType()->getIntegerBitWidth() == DL.getTypeStoreSizeInBits(V->getType()) && "Non-byte-multiple bit width"); IntegerType *NarrowTy = Type::getIntNTy(SI.getContext(), SliceSize * 8); V = extractInteger(DL, IRB, V, NarrowTy, NewBeginOffset - BeginOffset, "extract"); } if (VecTy) return rewriteVectorizedStoreInst(V, SI, OldOp, AATags); if (IntTy && V->getType()->isIntegerTy()) return rewriteIntegerStore(V, SI, AATags); const bool IsStorePastEnd = DL.getTypeStoreSize(V->getType()) > SliceSize; StoreInst *NewSI; if (NewBeginOffset == NewAllocaBeginOffset && NewEndOffset == NewAllocaEndOffset && (canConvertValue(DL, V->getType(), NewAllocaTy) || (IsStorePastEnd && NewAllocaTy->isIntegerTy() && V->getType()->isIntegerTy()))) { // If this is an integer store past the end of slice (and thus the bytes // past that point are irrelevant or this is unreachable), truncate the // value prior to storing. if (auto *VITy = dyn_cast(V->getType())) if (auto *AITy = dyn_cast(NewAllocaTy)) if (VITy->getBitWidth() > AITy->getBitWidth()) { if (DL.isBigEndian()) V = IRB.CreateLShr(V, VITy->getBitWidth() - AITy->getBitWidth(), "endian_shift"); V = IRB.CreateTrunc(V, AITy, "load.trunc"); } V = convertValue(DL, IRB, V, NewAllocaTy); NewSI = IRB.CreateAlignedStore(V, &NewAI, NewAI.getAlignment(), SI.isVolatile()); } else { unsigned AS = SI.getPointerAddressSpace(); Value *NewPtr = getNewAllocaSlicePtr(IRB, V->getType()->getPointerTo(AS)); NewSI = IRB.CreateAlignedStore(V, NewPtr, getSliceAlign(V->getType()), SI.isVolatile()); } NewSI->copyMetadata(SI, LLVMContext::MD_mem_parallel_loop_access); if (AATags) NewSI->setAAMetadata(AATags); if (SI.isVolatile()) NewSI->setAtomic(SI.getOrdering(), SI.getSyncScopeID()); Pass.DeadInsts.insert(&SI); deleteIfTriviallyDead(OldOp); LLVM_DEBUG(dbgs() << " to: " << *NewSI << "\n"); return NewSI->getPointerOperand() == &NewAI && !SI.isVolatile(); } /// Compute an integer value from splatting an i8 across the given /// number of bytes. /// /// Note that this routine assumes an i8 is a byte. If that isn't true, don't /// call this routine. /// FIXME: Heed the advice above. /// /// \param V The i8 value to splat. /// \param Size The number of bytes in the output (assuming i8 is one byte) Value *getIntegerSplat(Value *V, unsigned Size) { assert(Size > 0 && "Expected a positive number of bytes."); IntegerType *VTy = cast(V->getType()); assert(VTy->getBitWidth() == 8 && "Expected an i8 value for the byte"); if (Size == 1) return V; Type *SplatIntTy = Type::getIntNTy(VTy->getContext(), Size * 8); V = IRB.CreateMul( IRB.CreateZExt(V, SplatIntTy, "zext"), ConstantExpr::getUDiv( Constant::getAllOnesValue(SplatIntTy), ConstantExpr::getZExt(Constant::getAllOnesValue(V->getType()), SplatIntTy)), "isplat"); return V; } /// Compute a vector splat for a given element value. 
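/// For example (illustrative), splatting the i8 byte of a memset across a
/// <4 x i8> slice yields <4 x i8> with all four lanes equal to that byte;
/// IRBuilder::CreateVectorSplat emits the insertelement + shufflevector pair.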
Value *getVectorSplat(Value *V, unsigned NumElements) { V = IRB.CreateVectorSplat(NumElements, V, "vsplat"); LLVM_DEBUG(dbgs() << " splat: " << *V << "\n"); return V; } bool visitMemSetInst(MemSetInst &II) { LLVM_DEBUG(dbgs() << " original: " << II << "\n"); assert(II.getRawDest() == OldPtr); AAMDNodes AATags; II.getAAMetadata(AATags); // If the memset has a variable size, it cannot be split, just adjust the // pointer to the new alloca. if (!isa(II.getLength())) { assert(!IsSplit); assert(NewBeginOffset == BeginOffset); II.setDest(getNewAllocaSlicePtr(IRB, OldPtr->getType())); II.setDestAlignment(getSliceAlign()); deleteIfTriviallyDead(OldPtr); return false; } // Record this instruction for deletion. Pass.DeadInsts.insert(&II); Type *AllocaTy = NewAI.getAllocatedType(); Type *ScalarTy = AllocaTy->getScalarType(); // If this doesn't map cleanly onto the alloca type, and that type isn't // a single value type, just emit a memset. if (!VecTy && !IntTy && (BeginOffset > NewAllocaBeginOffset || EndOffset < NewAllocaEndOffset || SliceSize != DL.getTypeStoreSize(AllocaTy) || !AllocaTy->isSingleValueType() || !DL.isLegalInteger(DL.getTypeSizeInBits(ScalarTy)) || DL.getTypeSizeInBits(ScalarTy) % 8 != 0)) { Type *SizeTy = II.getLength()->getType(); Constant *Size = ConstantInt::get(SizeTy, NewEndOffset - NewBeginOffset); CallInst *New = IRB.CreateMemSet( getNewAllocaSlicePtr(IRB, OldPtr->getType()), II.getValue(), Size, getSliceAlign(), II.isVolatile()); if (AATags) New->setAAMetadata(AATags); LLVM_DEBUG(dbgs() << " to: " << *New << "\n"); return false; } // If we can represent this as a simple value, we have to build the actual // value to store, which requires expanding the byte present in memset to // a sensible representation for the alloca type. This is essentially // splatting the byte to a sufficiently wide integer, splatting it across // any desired vector width, and bitcasting to the final type. Value *V; if (VecTy) { // If this is a memset of a vectorized alloca, insert it. assert(ElementTy == ScalarTy); unsigned BeginIndex = getIndex(NewBeginOffset); unsigned EndIndex = getIndex(NewEndOffset); assert(EndIndex > BeginIndex && "Empty vector!"); unsigned NumElements = EndIndex - BeginIndex; assert(NumElements <= VecTy->getNumElements() && "Too many elements!"); Value *Splat = getIntegerSplat(II.getValue(), DL.getTypeSizeInBits(ElementTy) / 8); Splat = convertValue(DL, IRB, Splat, ElementTy); if (NumElements > 1) Splat = getVectorSplat(Splat, NumElements); Value *Old = IRB.CreateAlignedLoad(&NewAI, NewAI.getAlignment(), "oldload"); V = insertVector(IRB, Old, Splat, BeginIndex, "vec"); } else if (IntTy) { // If this is a memset on an alloca where we can widen stores, insert the // set integer. assert(!II.isVolatile()); uint64_t Size = NewEndOffset - NewBeginOffset; V = getIntegerSplat(II.getValue(), Size); if (IntTy && (BeginOffset != NewAllocaBeginOffset || EndOffset != NewAllocaBeginOffset)) { Value *Old = IRB.CreateAlignedLoad(&NewAI, NewAI.getAlignment(), "oldload"); Old = convertValue(DL, IRB, Old, IntTy); uint64_t Offset = NewBeginOffset - NewAllocaBeginOffset; V = insertInteger(DL, IRB, Old, V, Offset, "insert"); } else { assert(V->getType() == IntTy && "Wrong type for an alloca wide integer!"); } V = convertValue(DL, IRB, V, AllocaTy); } else { // Established these invariants above. 
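      // (Illustrative instance of this branch: a memset covering all 16 bytes
      // of an alloca of <4 x i32> is rewritten below into a store of a
      // splatted <4 x i32> value instead of a call to @llvm.memset.)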
assert(NewBeginOffset == NewAllocaBeginOffset); assert(NewEndOffset == NewAllocaEndOffset); V = getIntegerSplat(II.getValue(), DL.getTypeSizeInBits(ScalarTy) / 8); if (VectorType *AllocaVecTy = dyn_cast(AllocaTy)) V = getVectorSplat(V, AllocaVecTy->getNumElements()); V = convertValue(DL, IRB, V, AllocaTy); } StoreInst *New = IRB.CreateAlignedStore(V, &NewAI, NewAI.getAlignment(), II.isVolatile()); if (AATags) New->setAAMetadata(AATags); LLVM_DEBUG(dbgs() << " to: " << *New << "\n"); return !II.isVolatile(); } bool visitMemTransferInst(MemTransferInst &II) { // Rewriting of memory transfer instructions can be a bit tricky. We break // them into two categories: split intrinsics and unsplit intrinsics. LLVM_DEBUG(dbgs() << " original: " << II << "\n"); AAMDNodes AATags; II.getAAMetadata(AATags); bool IsDest = &II.getRawDestUse() == OldUse; assert((IsDest && II.getRawDest() == OldPtr) || (!IsDest && II.getRawSource() == OldPtr)); unsigned SliceAlign = getSliceAlign(); // For unsplit intrinsics, we simply modify the source and destination // pointers in place. This isn't just an optimization, it is a matter of // correctness. With unsplit intrinsics we may be dealing with transfers // within a single alloca before SROA ran, or with transfers that have // a variable length. We may also be dealing with memmove instead of // memcpy, and so simply updating the pointers is the necessary for us to // update both source and dest of a single call. if (!IsSplittable) { Value *AdjustedPtr = getNewAllocaSlicePtr(IRB, OldPtr->getType()); if (IsDest) { II.setDest(AdjustedPtr); II.setDestAlignment(SliceAlign); } else { II.setSource(AdjustedPtr); II.setSourceAlignment(SliceAlign); } LLVM_DEBUG(dbgs() << " to: " << II << "\n"); deleteIfTriviallyDead(OldPtr); return false; } // For split transfer intrinsics we have an incredibly useful assurance: // the source and destination do not reside within the same alloca, and at // least one of them does not escape. This means that we can replace // memmove with memcpy, and we don't need to worry about all manner of // downsides to splitting and transforming the operations. // If this doesn't map cleanly onto the alloca type, and that type isn't // a single value type, just emit a memcpy. bool EmitMemCpy = !VecTy && !IntTy && (BeginOffset > NewAllocaBeginOffset || EndOffset < NewAllocaEndOffset || SliceSize != DL.getTypeStoreSize(NewAI.getAllocatedType()) || !NewAI.getAllocatedType()->isSingleValueType()); // If we're just going to emit a memcpy, the alloca hasn't changed, and the // size hasn't been shrunk based on analysis of the viable range, this is // a no-op. if (EmitMemCpy && &OldAI == &NewAI) { // Ensure the start lines up. assert(NewBeginOffset == BeginOffset); // Rewrite the size as needed. if (NewEndOffset != EndOffset) II.setLength(ConstantInt::get(II.getLength()->getType(), NewEndOffset - NewBeginOffset)); return false; } // Record this instruction for deletion. Pass.DeadInsts.insert(&II); // Strip all inbounds GEPs and pointer casts to try to dig out any root // alloca that should be re-examined after rewriting this instruction. Value *OtherPtr = IsDest ? 
II.getRawSource() : II.getRawDest(); if (AllocaInst *AI = dyn_cast(OtherPtr->stripInBoundsOffsets())) { assert(AI != &OldAI && AI != &NewAI && "Splittable transfers cannot reach the same alloca on both ends."); Pass.Worklist.insert(AI); } Type *OtherPtrTy = OtherPtr->getType(); unsigned OtherAS = OtherPtrTy->getPointerAddressSpace(); // Compute the relative offset for the other pointer within the transfer. unsigned IntPtrWidth = DL.getPointerSizeInBits(OtherAS); APInt OtherOffset(IntPtrWidth, NewBeginOffset - BeginOffset); unsigned OtherAlign = IsDest ? II.getSourceAlignment() : II.getDestAlignment(); OtherAlign = MinAlign(OtherAlign ? OtherAlign : 1, OtherOffset.zextOrTrunc(64).getZExtValue()); if (EmitMemCpy) { // Compute the other pointer, folding as much as possible to produce // a single, simple GEP in most cases. OtherPtr = getAdjustedPtr(IRB, DL, OtherPtr, OtherOffset, OtherPtrTy, OtherPtr->getName() + "."); Value *OurPtr = getNewAllocaSlicePtr(IRB, OldPtr->getType()); Type *SizeTy = II.getLength()->getType(); Constant *Size = ConstantInt::get(SizeTy, NewEndOffset - NewBeginOffset); Value *DestPtr, *SrcPtr; unsigned DestAlign, SrcAlign; // Note: IsDest is true iff we're copying into the new alloca slice if (IsDest) { DestPtr = OurPtr; DestAlign = SliceAlign; SrcPtr = OtherPtr; SrcAlign = OtherAlign; } else { DestPtr = OtherPtr; DestAlign = OtherAlign; SrcPtr = OurPtr; SrcAlign = SliceAlign; } CallInst *New = IRB.CreateMemCpy(DestPtr, DestAlign, SrcPtr, SrcAlign, Size, II.isVolatile()); if (AATags) New->setAAMetadata(AATags); LLVM_DEBUG(dbgs() << " to: " << *New << "\n"); return false; } bool IsWholeAlloca = NewBeginOffset == NewAllocaBeginOffset && NewEndOffset == NewAllocaEndOffset; uint64_t Size = NewEndOffset - NewBeginOffset; unsigned BeginIndex = VecTy ? getIndex(NewBeginOffset) : 0; unsigned EndIndex = VecTy ? getIndex(NewEndOffset) : 0; unsigned NumElements = EndIndex - BeginIndex; IntegerType *SubIntTy = IntTy ? Type::getIntNTy(IntTy->getContext(), Size * 8) : nullptr; // Reset the other pointer type to match the register type we're going to // use, but using the address space of the original other pointer. 
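    // Added note (illustrative, not in the upstream comment): if the new
    // alloca is a promoted <4 x i32> and this transfer covers only elements
    // 1 and 2, the other pointer is recast to <2 x i32>* in its original
    // address space, so the copy below becomes a plain sub-vector load/store
    // rather than a memcpy.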
if (VecTy && !IsWholeAlloca) { if (NumElements == 1) OtherPtrTy = VecTy->getElementType(); else OtherPtrTy = VectorType::get(VecTy->getElementType(), NumElements); OtherPtrTy = OtherPtrTy->getPointerTo(OtherAS); } else if (IntTy && !IsWholeAlloca) { OtherPtrTy = SubIntTy->getPointerTo(OtherAS); } else { OtherPtrTy = NewAllocaTy->getPointerTo(OtherAS); } Value *SrcPtr = getAdjustedPtr(IRB, DL, OtherPtr, OtherOffset, OtherPtrTy, OtherPtr->getName() + "."); unsigned SrcAlign = OtherAlign; Value *DstPtr = &NewAI; unsigned DstAlign = SliceAlign; if (!IsDest) { std::swap(SrcPtr, DstPtr); std::swap(SrcAlign, DstAlign); } Value *Src; if (VecTy && !IsWholeAlloca && !IsDest) { Src = IRB.CreateAlignedLoad(&NewAI, NewAI.getAlignment(), "load"); Src = extractVector(IRB, Src, BeginIndex, EndIndex, "vec"); } else if (IntTy && !IsWholeAlloca && !IsDest) { Src = IRB.CreateAlignedLoad(&NewAI, NewAI.getAlignment(), "load"); Src = convertValue(DL, IRB, Src, IntTy); uint64_t Offset = NewBeginOffset - NewAllocaBeginOffset; Src = extractInteger(DL, IRB, Src, SubIntTy, Offset, "extract"); } else { LoadInst *Load = IRB.CreateAlignedLoad(SrcPtr, SrcAlign, II.isVolatile(), "copyload"); if (AATags) Load->setAAMetadata(AATags); Src = Load; } if (VecTy && !IsWholeAlloca && IsDest) { Value *Old = IRB.CreateAlignedLoad(&NewAI, NewAI.getAlignment(), "oldload"); Src = insertVector(IRB, Old, Src, BeginIndex, "vec"); } else if (IntTy && !IsWholeAlloca && IsDest) { Value *Old = IRB.CreateAlignedLoad(&NewAI, NewAI.getAlignment(), "oldload"); Old = convertValue(DL, IRB, Old, IntTy); uint64_t Offset = NewBeginOffset - NewAllocaBeginOffset; Src = insertInteger(DL, IRB, Old, Src, Offset, "insert"); Src = convertValue(DL, IRB, Src, NewAllocaTy); } StoreInst *Store = cast( IRB.CreateAlignedStore(Src, DstPtr, DstAlign, II.isVolatile())); if (AATags) Store->setAAMetadata(AATags); LLVM_DEBUG(dbgs() << " to: " << *Store << "\n"); return !II.isVolatile(); } bool visitIntrinsicInst(IntrinsicInst &II) { assert(II.getIntrinsicID() == Intrinsic::lifetime_start || II.getIntrinsicID() == Intrinsic::lifetime_end); LLVM_DEBUG(dbgs() << " original: " << II << "\n"); assert(II.getArgOperand(1) == OldPtr); // Record this instruction for deletion. Pass.DeadInsts.insert(&II); // Lifetime intrinsics are only promotable if they cover the whole alloca. // Therefore, we drop lifetime intrinsics which don't cover the whole // alloca. // (In theory, intrinsics which partially cover an alloca could be // promoted, but PromoteMemToReg doesn't handle that case.) // FIXME: Check whether the alloca is promotable before dropping the // lifetime intrinsics? if (NewBeginOffset != NewAllocaBeginOffset || NewEndOffset != NewAllocaEndOffset) return true; ConstantInt *Size = ConstantInt::get(cast(II.getArgOperand(0)->getType()), NewEndOffset - NewBeginOffset); Value *Ptr = getNewAllocaSlicePtr(IRB, OldPtr->getType()); Value *New; if (II.getIntrinsicID() == Intrinsic::lifetime_start) New = IRB.CreateLifetimeStart(Ptr, Size); else New = IRB.CreateLifetimeEnd(Ptr, Size); (void)New; LLVM_DEBUG(dbgs() << " to: " << *New << "\n"); return true; } + void fixLoadStoreAlign(Instruction &Root) { + // This algorithm implements the same visitor loop as + // hasUnsafePHIOrSelectUse, and fixes the alignment of each load + // or store found. 
+    SmallPtrSet<Instruction *, 4> Visited;
+    SmallVector<Instruction *, 4> Uses;
+    Visited.insert(&Root);
+    Uses.push_back(&Root);
+    do {
+      Instruction *I = Uses.pop_back_val();
+
+      if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
+        unsigned LoadAlign = LI->getAlignment();
+        if (!LoadAlign)
+          LoadAlign = DL.getABITypeAlignment(LI->getType());
+        LI->setAlignment(std::min(LoadAlign, getSliceAlign()));
+        continue;
+      }
+      if (StoreInst *SI = dyn_cast<StoreInst>(I)) {
+        unsigned StoreAlign = SI->getAlignment();
+        if (!StoreAlign) {
+          Value *Op = SI->getOperand(0);
+          StoreAlign = DL.getABITypeAlignment(Op->getType());
+        }
+        SI->setAlignment(std::min(StoreAlign, getSliceAlign()));
+        continue;
+      }
+
+      assert(isa<BitCastInst>(I) || isa<PHINode>(I) ||
+             isa<SelectInst>(I) || isa<GetElementPtrInst>(I));
+      for (User *U : I->users())
+        if (Visited.insert(cast<Instruction>(U)).second)
+          Uses.push_back(cast<Instruction>(U));
+    } while (!Uses.empty());
+  }
+
  bool visitPHINode(PHINode &PN) {
    LLVM_DEBUG(dbgs() << " original: " << PN << "\n");
    assert(BeginOffset >= NewAllocaBeginOffset && "PHIs are unsplittable");
    assert(EndOffset <= NewAllocaEndOffset && "PHIs are unsplittable");

    // We would like to compute a new pointer in only one place, but have it be
    // as local as possible to the PHI. To do that, we re-use the location of
    // the old pointer, which necessarily must be in the right position to
    // dominate the PHI.
    IRBuilderTy PtrBuilder(IRB);
    if (isa<PHINode>(OldPtr))
      PtrBuilder.SetInsertPoint(&*OldPtr->getParent()->getFirstInsertionPt());
    else
      PtrBuilder.SetInsertPoint(OldPtr);
    PtrBuilder.SetCurrentDebugLocation(OldPtr->getDebugLoc());

    Value *NewPtr = getNewAllocaSlicePtr(PtrBuilder, OldPtr->getType());
    // Replace the operands which were using the old pointer.
    std::replace(PN.op_begin(), PN.op_end(), cast<Value>(OldPtr), NewPtr);

    LLVM_DEBUG(dbgs() << " to: " << PN << "\n");
    deleteIfTriviallyDead(OldPtr);

+   // Fix the alignment of any loads or stores using this PHI node.
+   fixLoadStoreAlign(PN);
+
    // PHIs can't be promoted on their own, but often can be speculated. We
    // check the speculation outside of the rewriter so that we see the
    // fully-rewritten alloca.
    PHIUsers.insert(&PN);
    return true;
  }

  bool visitSelectInst(SelectInst &SI) {
    LLVM_DEBUG(dbgs() << " original: " << SI << "\n");
    assert((SI.getTrueValue() == OldPtr || SI.getFalseValue() == OldPtr) &&
           "Pointer isn't an operand!");
    assert(BeginOffset >= NewAllocaBeginOffset && "Selects are unsplittable");
    assert(EndOffset <= NewAllocaEndOffset && "Selects are unsplittable");

    Value *NewPtr = getNewAllocaSlicePtr(IRB, OldPtr->getType());
    // Replace the operands which were using the old pointer.
    if (SI.getOperand(1) == OldPtr)
      SI.setOperand(1, NewPtr);
    if (SI.getOperand(2) == OldPtr)
      SI.setOperand(2, NewPtr);

    LLVM_DEBUG(dbgs() << " to: " << SI << "\n");
    deleteIfTriviallyDead(OldPtr);
+
+   // Fix the alignment of any loads or stores using this select.
+   fixLoadStoreAlign(SI);

    // Selects can't be promoted on their own, but often can be speculated. We
    // check the speculation outside of the rewriter so that we see the
    // fully-rewritten alloca.
    SelectUsers.insert(&SI);
    return true;
  }
};

namespace {

/// Visitor to rewrite aggregate loads and stores as scalar.
///
/// This pass aggressively rewrites all aggregate loads and stores on
/// a particular pointer (or any pointer derived from it which we can identify)
/// with scalar loads and stores.
class AggLoadStoreRewriter : public InstVisitor<AggLoadStoreRewriter, bool> {
  // Befriend the base class so it can delegate to private visit methods.
  friend class InstVisitor<AggLoadStoreRewriter, bool>;

  /// Queue of pointer uses to analyze and potentially rewrite.
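  /// Added note (illustrative, not in the upstream comment): the queue holds
  /// Uses rather than Users so that each visit can tell exactly which operand
  /// of the user referred to the pointer being rewritten.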
SmallVector Queue; /// Set to prevent us from cycling with phi nodes and loops. SmallPtrSet Visited; /// The current pointer use being rewritten. This is used to dig up the used /// value (as opposed to the user). Use *U; public: /// Rewrite loads and stores through a pointer and all pointers derived from /// it. bool rewrite(Instruction &I) { LLVM_DEBUG(dbgs() << " Rewriting FCA loads and stores...\n"); enqueueUsers(I); bool Changed = false; while (!Queue.empty()) { U = Queue.pop_back_val(); Changed |= visit(cast(U->getUser())); } return Changed; } private: /// Enqueue all the users of the given instruction for further processing. /// This uses a set to de-duplicate users. void enqueueUsers(Instruction &I) { for (Use &U : I.uses()) if (Visited.insert(U.getUser()).second) Queue.push_back(&U); } // Conservative default is to not rewrite anything. bool visitInstruction(Instruction &I) { return false; } /// Generic recursive split emission class. template class OpSplitter { protected: /// The builder used to form new instructions. IRBuilderTy IRB; /// The indices which to be used with insert- or extractvalue to select the /// appropriate value within the aggregate. SmallVector Indices; /// The indices to a GEP instruction which will move Ptr to the correct slot /// within the aggregate. SmallVector GEPIndices; /// The base pointer of the original op, used as a base for GEPing the /// split operations. Value *Ptr; /// Initialize the splitter with an insertion point, Ptr and start with a /// single zero GEP index. OpSplitter(Instruction *InsertionPoint, Value *Ptr) : IRB(InsertionPoint), GEPIndices(1, IRB.getInt32(0)), Ptr(Ptr) {} public: /// Generic recursive split emission routine. /// /// This method recursively splits an aggregate op (load or store) into /// scalar or vector ops. It splits recursively until it hits a single value /// and emits that single value operation via the template argument. /// /// The logic of this routine relies on GEPs and insertvalue and /// extractvalue all operating with the same fundamental index list, merely /// formatted differently (GEPs need actual values). /// /// \param Ty The type being split recursively into smaller ops. /// \param Agg The aggregate value being built up or stored, depending on /// whether this is splitting a load or a store respectively. void emitSplitOps(Type *Ty, Value *&Agg, const Twine &Name) { if (Ty->isSingleValueType()) return static_cast(this)->emitFunc(Ty, Agg, Name); if (ArrayType *ATy = dyn_cast(Ty)) { unsigned OldSize = Indices.size(); (void)OldSize; for (unsigned Idx = 0, Size = ATy->getNumElements(); Idx != Size; ++Idx) { assert(Indices.size() == OldSize && "Did not return to the old size"); Indices.push_back(Idx); GEPIndices.push_back(IRB.getInt32(Idx)); emitSplitOps(ATy->getElementType(), Agg, Name + "." + Twine(Idx)); GEPIndices.pop_back(); Indices.pop_back(); } return; } if (StructType *STy = dyn_cast(Ty)) { unsigned OldSize = Indices.size(); (void)OldSize; for (unsigned Idx = 0, Size = STy->getNumElements(); Idx != Size; ++Idx) { assert(Indices.size() == OldSize && "Did not return to the old size"); Indices.push_back(Idx); GEPIndices.push_back(IRB.getInt32(Idx)); emitSplitOps(STy->getElementType(Idx), Agg, Name + "." 
+ Twine(Idx)); GEPIndices.pop_back(); Indices.pop_back(); } return; } llvm_unreachable("Only arrays and structs are aggregate loadable types"); } }; struct LoadOpSplitter : public OpSplitter { AAMDNodes AATags; LoadOpSplitter(Instruction *InsertionPoint, Value *Ptr, AAMDNodes AATags) : OpSplitter(InsertionPoint, Ptr), AATags(AATags) {} /// Emit a leaf load of a single value. This is called at the leaves of the /// recursive emission to actually load values. void emitFunc(Type *Ty, Value *&Agg, const Twine &Name) { assert(Ty->isSingleValueType()); // Load the single value and insert it using the indices. Value *GEP = IRB.CreateInBoundsGEP(nullptr, Ptr, GEPIndices, Name + ".gep"); LoadInst *Load = IRB.CreateLoad(GEP, Name + ".load"); if (AATags) Load->setAAMetadata(AATags); Agg = IRB.CreateInsertValue(Agg, Load, Indices, Name + ".insert"); LLVM_DEBUG(dbgs() << " to: " << *Load << "\n"); } }; bool visitLoadInst(LoadInst &LI) { assert(LI.getPointerOperand() == *U); if (!LI.isSimple() || LI.getType()->isSingleValueType()) return false; // We have an aggregate being loaded, split it apart. LLVM_DEBUG(dbgs() << " original: " << LI << "\n"); AAMDNodes AATags; LI.getAAMetadata(AATags); LoadOpSplitter Splitter(&LI, *U, AATags); Value *V = UndefValue::get(LI.getType()); Splitter.emitSplitOps(LI.getType(), V, LI.getName() + ".fca"); LI.replaceAllUsesWith(V); LI.eraseFromParent(); return true; } struct StoreOpSplitter : public OpSplitter { StoreOpSplitter(Instruction *InsertionPoint, Value *Ptr, AAMDNodes AATags) : OpSplitter(InsertionPoint, Ptr), AATags(AATags) {} AAMDNodes AATags; /// Emit a leaf store of a single value. This is called at the leaves of the /// recursive emission to actually produce stores. void emitFunc(Type *Ty, Value *&Agg, const Twine &Name) { assert(Ty->isSingleValueType()); // Extract the single value and store it using the indices. // // The gep and extractvalue values are factored out of the CreateStore // call to make the output independent of the argument evaluation order. Value *ExtractValue = IRB.CreateExtractValue(Agg, Indices, Name + ".extract"); Value *InBoundsGEP = IRB.CreateInBoundsGEP(nullptr, Ptr, GEPIndices, Name + ".gep"); StoreInst *Store = IRB.CreateStore(ExtractValue, InBoundsGEP); if (AATags) Store->setAAMetadata(AATags); LLVM_DEBUG(dbgs() << " to: " << *Store << "\n"); } }; bool visitStoreInst(StoreInst &SI) { if (!SI.isSimple() || SI.getPointerOperand() != *U) return false; Value *V = SI.getValueOperand(); if (V->getType()->isSingleValueType()) return false; // We have an aggregate being stored, split it apart. LLVM_DEBUG(dbgs() << " original: " << SI << "\n"); AAMDNodes AATags; SI.getAAMetadata(AATags); StoreOpSplitter Splitter(&SI, *U, AATags); Splitter.emitSplitOps(V->getType(), V, V->getName() + ".fca"); SI.eraseFromParent(); return true; } bool visitBitCastInst(BitCastInst &BC) { enqueueUsers(BC); return false; } bool visitGetElementPtrInst(GetElementPtrInst &GEPI) { enqueueUsers(GEPI); return false; } bool visitPHINode(PHINode &PN) { enqueueUsers(PN); return false; } bool visitSelectInst(SelectInst &SI) { enqueueUsers(SI); return false; } }; } // end anonymous namespace /// Strip aggregate type wrapping. /// /// This removes no-op aggregate types wrapping an underlying type. It will /// strip as many layers of types as it can without changing either the type /// size or the allocated size. 
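/// Added note (illustrative, not in the upstream comment): a struct such as
/// { { float } } is stripped down to float, since the wrappers change neither
/// the type size nor the allocated size, while { float, i8 } is returned
/// unchanged because dropping the outer struct would change the allocated
/// size.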
static Type *stripAggregateTypeWrapping(const DataLayout &DL, Type *Ty) { if (Ty->isSingleValueType()) return Ty; uint64_t AllocSize = DL.getTypeAllocSize(Ty); uint64_t TypeSize = DL.getTypeSizeInBits(Ty); Type *InnerTy; if (ArrayType *ArrTy = dyn_cast(Ty)) { InnerTy = ArrTy->getElementType(); } else if (StructType *STy = dyn_cast(Ty)) { const StructLayout *SL = DL.getStructLayout(STy); unsigned Index = SL->getElementContainingOffset(0); InnerTy = STy->getElementType(Index); } else { return Ty; } if (AllocSize > DL.getTypeAllocSize(InnerTy) || TypeSize > DL.getTypeSizeInBits(InnerTy)) return Ty; return stripAggregateTypeWrapping(DL, InnerTy); } /// Try to find a partition of the aggregate type passed in for a given /// offset and size. /// /// This recurses through the aggregate type and tries to compute a subtype /// based on the offset and size. When the offset and size span a sub-section /// of an array, it will even compute a new array type for that sub-section, /// and the same for structs. /// /// Note that this routine is very strict and tries to find a partition of the /// type which produces the *exact* right offset and size. It is not forgiving /// when the size or offset cause either end of type-based partition to be off. /// Also, this is a best-effort routine. It is reasonable to give up and not /// return a type if necessary. static Type *getTypePartition(const DataLayout &DL, Type *Ty, uint64_t Offset, uint64_t Size) { if (Offset == 0 && DL.getTypeAllocSize(Ty) == Size) return stripAggregateTypeWrapping(DL, Ty); if (Offset > DL.getTypeAllocSize(Ty) || (DL.getTypeAllocSize(Ty) - Offset) < Size) return nullptr; if (SequentialType *SeqTy = dyn_cast(Ty)) { Type *ElementTy = SeqTy->getElementType(); uint64_t ElementSize = DL.getTypeAllocSize(ElementTy); uint64_t NumSkippedElements = Offset / ElementSize; if (NumSkippedElements >= SeqTy->getNumElements()) return nullptr; Offset -= NumSkippedElements * ElementSize; // First check if we need to recurse. if (Offset > 0 || Size < ElementSize) { // Bail if the partition ends in a different array element. if ((Offset + Size) > ElementSize) return nullptr; // Recurse through the element type trying to peel off offset bytes. return getTypePartition(DL, ElementTy, Offset, Size); } assert(Offset == 0); if (Size == ElementSize) return stripAggregateTypeWrapping(DL, ElementTy); assert(Size > ElementSize); uint64_t NumElements = Size / ElementSize; if (NumElements * ElementSize != Size) return nullptr; return ArrayType::get(ElementTy, NumElements); } StructType *STy = dyn_cast(Ty); if (!STy) return nullptr; const StructLayout *SL = DL.getStructLayout(STy); if (Offset >= SL->getSizeInBytes()) return nullptr; uint64_t EndOffset = Offset + Size; if (EndOffset > SL->getSizeInBytes()) return nullptr; unsigned Index = SL->getElementContainingOffset(Offset); Offset -= SL->getElementOffset(Index); Type *ElementTy = STy->getElementType(Index); uint64_t ElementSize = DL.getTypeAllocSize(ElementTy); if (Offset >= ElementSize) return nullptr; // The offset points into alignment padding. // See if any partition must be contained by the element. 
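  // Added note (illustrative, not in the upstream comment): for a struct
  // { i32, i32 }, an access at Offset = 4 with Size = 4 lands exactly on the
  // second element and yields i32, while an access at Offset = 2 straddles
  // the end of the first element, so the routine returns nullptr below.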
if (Offset > 0 || Size < ElementSize) { if ((Offset + Size) > ElementSize) return nullptr; return getTypePartition(DL, ElementTy, Offset, Size); } assert(Offset == 0); if (Size == ElementSize) return stripAggregateTypeWrapping(DL, ElementTy); StructType::element_iterator EI = STy->element_begin() + Index, EE = STy->element_end(); if (EndOffset < SL->getSizeInBytes()) { unsigned EndIndex = SL->getElementContainingOffset(EndOffset); if (Index == EndIndex) return nullptr; // Within a single element and its padding. // Don't try to form "natural" types if the elements don't line up with the // expected size. // FIXME: We could potentially recurse down through the last element in the // sub-struct to find a natural end point. if (SL->getElementOffset(EndIndex) != EndOffset) return nullptr; assert(Index < EndIndex); EE = STy->element_begin() + EndIndex; } // Try to build up a sub-structure. StructType *SubTy = StructType::get(STy->getContext(), makeArrayRef(EI, EE), STy->isPacked()); const StructLayout *SubSL = DL.getStructLayout(SubTy); if (Size != SubSL->getSizeInBytes()) return nullptr; // The sub-struct doesn't have quite the size needed. return SubTy; } /// Pre-split loads and stores to simplify rewriting. /// /// We want to break up the splittable load+store pairs as much as /// possible. This is important to do as a preprocessing step, as once we /// start rewriting the accesses to partitions of the alloca we lose the /// necessary information to correctly split apart paired loads and stores /// which both point into this alloca. The case to consider is something like /// the following: /// /// %a = alloca [12 x i8] /// %gep1 = getelementptr [12 x i8]* %a, i32 0, i32 0 /// %gep2 = getelementptr [12 x i8]* %a, i32 0, i32 4 /// %gep3 = getelementptr [12 x i8]* %a, i32 0, i32 8 /// %iptr1 = bitcast i8* %gep1 to i64* /// %iptr2 = bitcast i8* %gep2 to i64* /// %fptr1 = bitcast i8* %gep1 to float* /// %fptr2 = bitcast i8* %gep2 to float* /// %fptr3 = bitcast i8* %gep3 to float* /// store float 0.0, float* %fptr1 /// store float 1.0, float* %fptr2 /// %v = load i64* %iptr1 /// store i64 %v, i64* %iptr2 /// %f1 = load float* %fptr2 /// %f2 = load float* %fptr3 /// /// Here we want to form 3 partitions of the alloca, each 4 bytes large, and /// promote everything so we recover the 2 SSA values that should have been /// there all along. /// /// \returns true if any changes are made. bool SROA::presplitLoadsAndStores(AllocaInst &AI, AllocaSlices &AS) { LLVM_DEBUG(dbgs() << "Pre-splitting loads and stores\n"); // Track the loads and stores which are candidates for pre-splitting here, in // the order they first appear during the partition scan. These give stable // iteration order and a basis for tracking which loads and stores we // actually split. SmallVector Loads; SmallVector Stores; // We need to accumulate the splits required of each load or store where we // can find them via a direct lookup. This is important to cross-check loads // and stores against each other. We also track the slice so that we can kill // all the slices that end up split. struct SplitOffsets { Slice *S; std::vector Splits; }; SmallDenseMap SplitOffsetsMap; // Track loads out of this alloca which cannot, for any reason, be pre-split. // This is important as we also cannot pre-split stores of those loads! // FIXME: This is all pretty gross. It means that we can be more aggressive // in pre-splitting when the load feeding the store happens to come from // a separate alloca. 
Put another way, the effectiveness of SROA would be // decreased by a frontend which just concatenated all of its local allocas // into one big flat alloca. But defeating such patterns is exactly the job // SROA is tasked with! Sadly, to not have this discrepancy we would have // change store pre-splitting to actually force pre-splitting of the load // that feeds it *and all stores*. That makes pre-splitting much harder, but // maybe it would make it more principled? SmallPtrSet UnsplittableLoads; LLVM_DEBUG(dbgs() << " Searching for candidate loads and stores\n"); for (auto &P : AS.partitions()) { for (Slice &S : P) { Instruction *I = cast(S.getUse()->getUser()); if (!S.isSplittable() || S.endOffset() <= P.endOffset()) { // If this is a load we have to track that it can't participate in any // pre-splitting. If this is a store of a load we have to track that // that load also can't participate in any pre-splitting. if (auto *LI = dyn_cast(I)) UnsplittableLoads.insert(LI); else if (auto *SI = dyn_cast(I)) if (auto *LI = dyn_cast(SI->getValueOperand())) UnsplittableLoads.insert(LI); continue; } assert(P.endOffset() > S.beginOffset() && "Empty or backwards partition!"); // Determine if this is a pre-splittable slice. if (auto *LI = dyn_cast(I)) { assert(!LI->isVolatile() && "Cannot split volatile loads!"); // The load must be used exclusively to store into other pointers for // us to be able to arbitrarily pre-split it. The stores must also be // simple to avoid changing semantics. auto IsLoadSimplyStored = [](LoadInst *LI) { for (User *LU : LI->users()) { auto *SI = dyn_cast(LU); if (!SI || !SI->isSimple()) return false; } return true; }; if (!IsLoadSimplyStored(LI)) { UnsplittableLoads.insert(LI); continue; } Loads.push_back(LI); } else if (auto *SI = dyn_cast(I)) { if (S.getUse() != &SI->getOperandUse(SI->getPointerOperandIndex())) // Skip stores *of* pointers. FIXME: This shouldn't even be possible! continue; auto *StoredLoad = dyn_cast(SI->getValueOperand()); if (!StoredLoad || !StoredLoad->isSimple()) continue; assert(!SI->isVolatile() && "Cannot split volatile stores!"); Stores.push_back(SI); } else { // Other uses cannot be pre-split. continue; } // Record the initial split. LLVM_DEBUG(dbgs() << " Candidate: " << *I << "\n"); auto &Offsets = SplitOffsetsMap[I]; assert(Offsets.Splits.empty() && "Should not have splits the first time we see an instruction!"); Offsets.S = &S; Offsets.Splits.push_back(P.endOffset() - S.beginOffset()); } // Now scan the already split slices, and add a split for any of them which // we're going to pre-split. for (Slice *S : P.splitSliceTails()) { auto SplitOffsetsMapI = SplitOffsetsMap.find(cast(S->getUse()->getUser())); if (SplitOffsetsMapI == SplitOffsetsMap.end()) continue; auto &Offsets = SplitOffsetsMapI->second; assert(Offsets.S == S && "Found a mismatched slice!"); assert(!Offsets.Splits.empty() && "Cannot have an empty set of splits on the second partition!"); assert(Offsets.Splits.back() == P.beginOffset() - Offsets.S->beginOffset() && "Previous split does not end where this one begins!"); // Record each split. The last partition's end isn't needed as the size // of the slice dictates that. if (S->endOffset() > P.endOffset()) Offsets.Splits.push_back(P.endOffset() - Offsets.S->beginOffset()); } } // We may have split loads where some of their stores are split stores. For // such loads and stores, we can only pre-split them if their splits exactly // match relative to their starting offset. We have to verify this prior to // any rewriting. 
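  // Added note (illustrative, not in the upstream comment): an 8-byte load
  // split at relative offset {4} may feed a store that is also split at {4},
  // but if the store's slice is split at {2, 4, 6} the split offsets disagree
  // and both instructions are dropped from the candidate lists below.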
Stores.erase( llvm::remove_if(Stores, [&UnsplittableLoads, &SplitOffsetsMap](StoreInst *SI) { // Lookup the load we are storing in our map of split // offsets. auto *LI = cast(SI->getValueOperand()); // If it was completely unsplittable, then we're done, // and this store can't be pre-split. if (UnsplittableLoads.count(LI)) return true; auto LoadOffsetsI = SplitOffsetsMap.find(LI); if (LoadOffsetsI == SplitOffsetsMap.end()) return false; // Unrelated loads are definitely safe. auto &LoadOffsets = LoadOffsetsI->second; // Now lookup the store's offsets. auto &StoreOffsets = SplitOffsetsMap[SI]; // If the relative offsets of each split in the load and // store match exactly, then we can split them and we // don't need to remove them here. if (LoadOffsets.Splits == StoreOffsets.Splits) return false; LLVM_DEBUG( dbgs() << " Mismatched splits for load and store:\n" << " " << *LI << "\n" << " " << *SI << "\n"); // We've found a store and load that we need to split // with mismatched relative splits. Just give up on them // and remove both instructions from our list of // candidates. UnsplittableLoads.insert(LI); return true; }), Stores.end()); // Now we have to go *back* through all the stores, because a later store may // have caused an earlier store's load to become unsplittable and if it is // unsplittable for the later store, then we can't rely on it being split in // the earlier store either. Stores.erase(llvm::remove_if(Stores, [&UnsplittableLoads](StoreInst *SI) { auto *LI = cast(SI->getValueOperand()); return UnsplittableLoads.count(LI); }), Stores.end()); // Once we've established all the loads that can't be split for some reason, // filter any that made it into our list out. Loads.erase(llvm::remove_if(Loads, [&UnsplittableLoads](LoadInst *LI) { return UnsplittableLoads.count(LI); }), Loads.end()); // If no loads or stores are left, there is no pre-splitting to be done for // this alloca. if (Loads.empty() && Stores.empty()) return false; // From here on, we can't fail and will be building new accesses, so rig up // an IR builder. IRBuilderTy IRB(&AI); // Collect the new slices which we will merge into the alloca slices. SmallVector NewSlices; // Track any allocas we end up splitting loads and stores for so we iterate // on them. SmallPtrSet ResplitPromotableAllocas; // At this point, we have collected all of the loads and stores we can // pre-split, and the specific splits needed for them. We actually do the // splitting in a specific order in order to handle when one of the loads in // the value operand to one of the stores. // // First, we rewrite all of the split loads, and just accumulate each split // load in a parallel structure. We also build the slices for them and append // them to the alloca slices. 
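  // Added note (not in the upstream comment): SplitLoadsMap keys each
  // original load to its split pieces so that the second phase below can
  // reuse them when rewriting stores whose splitting was deferred, instead of
  // emitting fresh loads.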
SmallDenseMap, 1> SplitLoadsMap; std::vector SplitLoads; const DataLayout &DL = AI.getModule()->getDataLayout(); for (LoadInst *LI : Loads) { SplitLoads.clear(); IntegerType *Ty = cast(LI->getType()); uint64_t LoadSize = Ty->getBitWidth() / 8; assert(LoadSize > 0 && "Cannot have a zero-sized integer load!"); auto &Offsets = SplitOffsetsMap[LI]; assert(LoadSize == Offsets.S->endOffset() - Offsets.S->beginOffset() && "Slice size should always match load size exactly!"); uint64_t BaseOffset = Offsets.S->beginOffset(); assert(BaseOffset + LoadSize > BaseOffset && "Cannot represent alloca access size using 64-bit integers!"); Instruction *BasePtr = cast(LI->getPointerOperand()); IRB.SetInsertPoint(LI); LLVM_DEBUG(dbgs() << " Splitting load: " << *LI << "\n"); uint64_t PartOffset = 0, PartSize = Offsets.Splits.front(); int Idx = 0, Size = Offsets.Splits.size(); for (;;) { auto *PartTy = Type::getIntNTy(Ty->getContext(), PartSize * 8); auto AS = LI->getPointerAddressSpace(); auto *PartPtrTy = PartTy->getPointerTo(AS); LoadInst *PLoad = IRB.CreateAlignedLoad( getAdjustedPtr(IRB, DL, BasePtr, APInt(DL.getIndexSizeInBits(AS), PartOffset), PartPtrTy, BasePtr->getName() + "."), getAdjustedAlignment(LI, PartOffset, DL), /*IsVolatile*/ false, LI->getName()); PLoad->copyMetadata(*LI, LLVMContext::MD_mem_parallel_loop_access); // Append this load onto the list of split loads so we can find it later // to rewrite the stores. SplitLoads.push_back(PLoad); // Now build a new slice for the alloca. NewSlices.push_back( Slice(BaseOffset + PartOffset, BaseOffset + PartOffset + PartSize, &PLoad->getOperandUse(PLoad->getPointerOperandIndex()), /*IsSplittable*/ false)); LLVM_DEBUG(dbgs() << " new slice [" << NewSlices.back().beginOffset() << ", " << NewSlices.back().endOffset() << "): " << *PLoad << "\n"); // See if we've handled all the splits. if (Idx >= Size) break; // Setup the next partition. PartOffset = Offsets.Splits[Idx]; ++Idx; PartSize = (Idx < Size ? Offsets.Splits[Idx] : LoadSize) - PartOffset; } // Now that we have the split loads, do the slow walk over all uses of the // load and rewrite them as split stores, or save the split loads to use // below if the store is going to be split there anyways. bool DeferredStores = false; for (User *LU : LI->users()) { StoreInst *SI = cast(LU); if (!Stores.empty() && SplitOffsetsMap.count(SI)) { DeferredStores = true; LLVM_DEBUG(dbgs() << " Deferred splitting of store: " << *SI << "\n"); continue; } Value *StoreBasePtr = SI->getPointerOperand(); IRB.SetInsertPoint(SI); LLVM_DEBUG(dbgs() << " Splitting store of load: " << *SI << "\n"); for (int Idx = 0, Size = SplitLoads.size(); Idx < Size; ++Idx) { LoadInst *PLoad = SplitLoads[Idx]; uint64_t PartOffset = Idx == 0 ? 0 : Offsets.Splits[Idx - 1]; auto *PartPtrTy = PLoad->getType()->getPointerTo(SI->getPointerAddressSpace()); auto AS = SI->getPointerAddressSpace(); StoreInst *PStore = IRB.CreateAlignedStore( PLoad, getAdjustedPtr(IRB, DL, StoreBasePtr, APInt(DL.getIndexSizeInBits(AS), PartOffset), PartPtrTy, StoreBasePtr->getName() + "."), getAdjustedAlignment(SI, PartOffset, DL), /*IsVolatile*/ false); PStore->copyMetadata(*LI, LLVMContext::MD_mem_parallel_loop_access); LLVM_DEBUG(dbgs() << " +" << PartOffset << ":" << *PStore << "\n"); } // We want to immediately iterate on any allocas impacted by splitting // this store, and we have to track any promotable alloca (indicated by // a direct store) as needing to be resplit because it is no longer // promotable. 
if (AllocaInst *OtherAI = dyn_cast(StoreBasePtr)) { ResplitPromotableAllocas.insert(OtherAI); Worklist.insert(OtherAI); } else if (AllocaInst *OtherAI = dyn_cast( StoreBasePtr->stripInBoundsOffsets())) { Worklist.insert(OtherAI); } // Mark the original store as dead. DeadInsts.insert(SI); } // Save the split loads if there are deferred stores among the users. if (DeferredStores) SplitLoadsMap.insert(std::make_pair(LI, std::move(SplitLoads))); // Mark the original load as dead and kill the original slice. DeadInsts.insert(LI); Offsets.S->kill(); } // Second, we rewrite all of the split stores. At this point, we know that // all loads from this alloca have been split already. For stores of such // loads, we can simply look up the pre-existing split loads. For stores of // other loads, we split those loads first and then write split stores of // them. for (StoreInst *SI : Stores) { auto *LI = cast(SI->getValueOperand()); IntegerType *Ty = cast(LI->getType()); uint64_t StoreSize = Ty->getBitWidth() / 8; assert(StoreSize > 0 && "Cannot have a zero-sized integer store!"); auto &Offsets = SplitOffsetsMap[SI]; assert(StoreSize == Offsets.S->endOffset() - Offsets.S->beginOffset() && "Slice size should always match load size exactly!"); uint64_t BaseOffset = Offsets.S->beginOffset(); assert(BaseOffset + StoreSize > BaseOffset && "Cannot represent alloca access size using 64-bit integers!"); Value *LoadBasePtr = LI->getPointerOperand(); Instruction *StoreBasePtr = cast(SI->getPointerOperand()); LLVM_DEBUG(dbgs() << " Splitting store: " << *SI << "\n"); // Check whether we have an already split load. auto SplitLoadsMapI = SplitLoadsMap.find(LI); std::vector *SplitLoads = nullptr; if (SplitLoadsMapI != SplitLoadsMap.end()) { SplitLoads = &SplitLoadsMapI->second; assert(SplitLoads->size() == Offsets.Splits.size() + 1 && "Too few split loads for the number of splits in the store!"); } else { LLVM_DEBUG(dbgs() << " of load: " << *LI << "\n"); } uint64_t PartOffset = 0, PartSize = Offsets.Splits.front(); int Idx = 0, Size = Offsets.Splits.size(); for (;;) { auto *PartTy = Type::getIntNTy(Ty->getContext(), PartSize * 8); auto *LoadPartPtrTy = PartTy->getPointerTo(LI->getPointerAddressSpace()); auto *StorePartPtrTy = PartTy->getPointerTo(SI->getPointerAddressSpace()); // Either lookup a split load or create one. LoadInst *PLoad; if (SplitLoads) { PLoad = (*SplitLoads)[Idx]; } else { IRB.SetInsertPoint(LI); auto AS = LI->getPointerAddressSpace(); PLoad = IRB.CreateAlignedLoad( getAdjustedPtr(IRB, DL, LoadBasePtr, APInt(DL.getIndexSizeInBits(AS), PartOffset), LoadPartPtrTy, LoadBasePtr->getName() + "."), getAdjustedAlignment(LI, PartOffset, DL), /*IsVolatile*/ false, LI->getName()); } // And store this partition. IRB.SetInsertPoint(SI); auto AS = SI->getPointerAddressSpace(); StoreInst *PStore = IRB.CreateAlignedStore( PLoad, getAdjustedPtr(IRB, DL, StoreBasePtr, APInt(DL.getIndexSizeInBits(AS), PartOffset), StorePartPtrTy, StoreBasePtr->getName() + "."), getAdjustedAlignment(SI, PartOffset, DL), /*IsVolatile*/ false); // Now build a new slice for the alloca. NewSlices.push_back( Slice(BaseOffset + PartOffset, BaseOffset + PartOffset + PartSize, &PStore->getOperandUse(PStore->getPointerOperandIndex()), /*IsSplittable*/ false)); LLVM_DEBUG(dbgs() << " new slice [" << NewSlices.back().beginOffset() << ", " << NewSlices.back().endOffset() << "): " << *PStore << "\n"); if (!SplitLoads) { LLVM_DEBUG(dbgs() << " of split load: " << *PLoad << "\n"); } // See if we've finished all the splits. 
if (Idx >= Size) break; // Setup the next partition. PartOffset = Offsets.Splits[Idx]; ++Idx; PartSize = (Idx < Size ? Offsets.Splits[Idx] : StoreSize) - PartOffset; } // We want to immediately iterate on any allocas impacted by splitting // this load, which is only relevant if it isn't a load of this alloca and // thus we didn't already split the loads above. We also have to keep track // of any promotable allocas we split loads on as they can no longer be // promoted. if (!SplitLoads) { if (AllocaInst *OtherAI = dyn_cast(LoadBasePtr)) { assert(OtherAI != &AI && "We can't re-split our own alloca!"); ResplitPromotableAllocas.insert(OtherAI); Worklist.insert(OtherAI); } else if (AllocaInst *OtherAI = dyn_cast( LoadBasePtr->stripInBoundsOffsets())) { assert(OtherAI != &AI && "We can't re-split our own alloca!"); Worklist.insert(OtherAI); } } // Mark the original store as dead now that we've split it up and kill its // slice. Note that we leave the original load in place unless this store // was its only use. It may in turn be split up if it is an alloca load // for some other alloca, but it may be a normal load. This may introduce // redundant loads, but where those can be merged the rest of the optimizer // should handle the merging, and this uncovers SSA splits which is more // important. In practice, the original loads will almost always be fully // split and removed eventually, and the splits will be merged by any // trivial CSE, including instcombine. if (LI->hasOneUse()) { assert(*LI->user_begin() == SI && "Single use isn't this store!"); DeadInsts.insert(LI); } DeadInsts.insert(SI); Offsets.S->kill(); } // Remove the killed slices that have ben pre-split. AS.erase(llvm::remove_if(AS, [](const Slice &S) { return S.isDead(); }), AS.end()); // Insert our new slices. This will sort and merge them into the sorted // sequence. AS.insert(NewSlices); LLVM_DEBUG(dbgs() << " Pre-split slices:\n"); #ifndef NDEBUG for (auto I = AS.begin(), E = AS.end(); I != E; ++I) LLVM_DEBUG(AS.print(dbgs(), I, " ")); #endif // Finally, don't try to promote any allocas that new require re-splitting. // They have already been added to the worklist above. PromotableAllocas.erase( llvm::remove_if( PromotableAllocas, [&](AllocaInst *AI) { return ResplitPromotableAllocas.count(AI); }), PromotableAllocas.end()); return true; } /// Rewrite an alloca partition's users. /// /// This routine drives both of the rewriting goals of the SROA pass. It tries /// to rewrite uses of an alloca partition to be conducive for SSA value /// promotion. If the partition needs a new, more refined alloca, this will /// build that new alloca, preserving as much type information as possible, and /// rewrite the uses of the old alloca to point at the new one and have the /// appropriate new offsets. It also evaluates how successful the rewrite was /// at enabling promotion and if it was successful queues the alloca to be /// promoted. AllocaInst *SROA::rewritePartition(AllocaInst &AI, AllocaSlices &AS, Partition &P) { // Try to compute a friendly type for this partition of the alloca. This // won't always succeed, in which case we fall back to a legal integer type // or an i8 array of an appropriate size. 
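  // Added note (illustrative, not in the upstream comment): for a 4-byte
  // partition with no common use type and no matching sub-type of the alloca,
  // this selects i32 on targets where a 32-bit integer is legal and falls
  // back to [4 x i8] otherwise.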
Type *SliceTy = nullptr; const DataLayout &DL = AI.getModule()->getDataLayout(); if (Type *CommonUseTy = findCommonType(P.begin(), P.end(), P.endOffset())) if (DL.getTypeAllocSize(CommonUseTy) >= P.size()) SliceTy = CommonUseTy; if (!SliceTy) if (Type *TypePartitionTy = getTypePartition(DL, AI.getAllocatedType(), P.beginOffset(), P.size())) SliceTy = TypePartitionTy; if ((!SliceTy || (SliceTy->isArrayTy() && SliceTy->getArrayElementType()->isIntegerTy())) && DL.isLegalInteger(P.size() * 8)) SliceTy = Type::getIntNTy(*C, P.size() * 8); if (!SliceTy) SliceTy = ArrayType::get(Type::getInt8Ty(*C), P.size()); assert(DL.getTypeAllocSize(SliceTy) >= P.size()); bool IsIntegerPromotable = isIntegerWideningViable(P, SliceTy, DL); VectorType *VecTy = IsIntegerPromotable ? nullptr : isVectorPromotionViable(P, DL); if (VecTy) SliceTy = VecTy; // Check for the case where we're going to rewrite to a new alloca of the // exact same type as the original, and with the same access offsets. In that // case, re-use the existing alloca, but still run through the rewriter to // perform phi and select speculation. // P.beginOffset() can be non-zero even with the same type in a case with // out-of-bounds access (e.g. @PR35657 function in SROA/basictest.ll). AllocaInst *NewAI; if (SliceTy == AI.getAllocatedType() && P.beginOffset() == 0) { NewAI = &AI; // FIXME: We should be able to bail at this point with "nothing changed". // FIXME: We might want to defer PHI speculation until after here. // FIXME: return nullptr; } else { unsigned Alignment = AI.getAlignment(); if (!Alignment) { // The minimum alignment which users can rely on when the explicit // alignment is omitted or zero is that required by the ABI for this // type. Alignment = DL.getABITypeAlignment(AI.getAllocatedType()); } Alignment = MinAlign(Alignment, P.beginOffset()); // If we will get at least this much alignment from the type alone, leave // the alloca's alignment unconstrained. if (Alignment <= DL.getABITypeAlignment(SliceTy)) Alignment = 0; NewAI = new AllocaInst( SliceTy, AI.getType()->getAddressSpace(), nullptr, Alignment, AI.getName() + ".sroa." + Twine(P.begin() - AS.begin()), &AI); // Copy the old AI debug location over to the new one. NewAI->setDebugLoc(AI.getDebugLoc()); ++NumNewAllocas; } LLVM_DEBUG(dbgs() << "Rewriting alloca partition " << "[" << P.beginOffset() << "," << P.endOffset() << ") to: " << *NewAI << "\n"); // Track the high watermark on the worklist as it is only relevant for // promoted allocas. We will reset it to this point if the alloca is not in // fact scheduled for promotion. unsigned PPWOldSize = PostPromotionWorklist.size(); unsigned NumUses = 0; SmallSetVector PHIUsers; SmallSetVector SelectUsers; AllocaSliceRewriter Rewriter(DL, AS, *this, AI, *NewAI, P.beginOffset(), P.endOffset(), IsIntegerPromotable, VecTy, PHIUsers, SelectUsers); bool Promotable = true; for (Slice *S : P.splitSliceTails()) { Promotable &= Rewriter.visit(S); ++NumUses; } for (Slice &S : P) { Promotable &= Rewriter.visit(&S); ++NumUses; } NumAllocaPartitionUses += NumUses; MaxUsesPerAllocaPartition.updateMax(NumUses); // Now that we've processed all the slices in the new partition, check if any // PHIs or Selects would block promotion. 
for (PHINode *PHI : PHIUsers) if (!isSafePHIToSpeculate(*PHI)) { Promotable = false; PHIUsers.clear(); SelectUsers.clear(); break; } for (SelectInst *Sel : SelectUsers) if (!isSafeSelectToSpeculate(*Sel)) { Promotable = false; PHIUsers.clear(); SelectUsers.clear(); break; } if (Promotable) { if (PHIUsers.empty() && SelectUsers.empty()) { // Promote the alloca. PromotableAllocas.push_back(NewAI); } else { // If we have either PHIs or Selects to speculate, add them to those // worklists and re-queue the new alloca so that we promote in on the // next iteration. for (PHINode *PHIUser : PHIUsers) SpeculatablePHIs.insert(PHIUser); for (SelectInst *SelectUser : SelectUsers) SpeculatableSelects.insert(SelectUser); Worklist.insert(NewAI); } } else { // Drop any post-promotion work items if promotion didn't happen. while (PostPromotionWorklist.size() > PPWOldSize) PostPromotionWorklist.pop_back(); // We couldn't promote and we didn't create a new partition, nothing // happened. if (NewAI == &AI) return nullptr; // If we can't promote the alloca, iterate on it to check for new // refinements exposed by splitting the current alloca. Don't iterate on an // alloca which didn't actually change and didn't get promoted. Worklist.insert(NewAI); } return NewAI; } /// Walks the slices of an alloca and form partitions based on them, /// rewriting each of their uses. bool SROA::splitAlloca(AllocaInst &AI, AllocaSlices &AS) { if (AS.begin() == AS.end()) return false; unsigned NumPartitions = 0; bool Changed = false; const DataLayout &DL = AI.getModule()->getDataLayout(); // First try to pre-split loads and stores. Changed |= presplitLoadsAndStores(AI, AS); // Now that we have identified any pre-splitting opportunities, // mark loads and stores unsplittable except for the following case. // We leave a slice splittable if all other slices are disjoint or fully // included in the slice, such as whole-alloca loads and stores. // If we fail to split these during pre-splitting, we want to force them // to be rewritten into a partition. bool IsSorted = true; uint64_t AllocaSize = DL.getTypeAllocSize(AI.getAllocatedType()); const uint64_t MaxBitVectorSize = 1024; if (AllocaSize <= MaxBitVectorSize) { // If a byte boundary is included in any load or store, a slice starting or // ending at the boundary is not splittable. SmallBitVector SplittableOffset(AllocaSize + 1, true); for (Slice &S : AS) for (unsigned O = S.beginOffset() + 1; O < S.endOffset() && O < AllocaSize; O++) SplittableOffset.reset(O); for (Slice &S : AS) { if (!S.isSplittable()) continue; if ((S.beginOffset() > AllocaSize || SplittableOffset[S.beginOffset()]) && (S.endOffset() > AllocaSize || SplittableOffset[S.endOffset()])) continue; if (isa(S.getUse()->getUser()) || isa(S.getUse()->getUser())) { S.makeUnsplittable(); IsSorted = false; } } } else { // We only allow whole-alloca splittable loads and stores // for a large alloca to avoid creating too large BitVector. for (Slice &S : AS) { if (!S.isSplittable()) continue; if (S.beginOffset() == 0 && S.endOffset() >= AllocaSize) continue; if (isa(S.getUse()->getUser()) || isa(S.getUse()->getUser())) { S.makeUnsplittable(); IsSorted = false; } } } if (!IsSorted) llvm::sort(AS.begin(), AS.end()); /// Describes the allocas introduced by rewritePartition in order to migrate /// the debug info. struct Fragment { AllocaInst *Alloca; uint64_t Offset; uint64_t Size; Fragment(AllocaInst *AI, uint64_t O, uint64_t S) : Alloca(AI), Offset(O), Size(S) {} }; SmallVector Fragments; // Rewrite each partition. 
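  // Added note (not in the upstream comment): each partition may receive its
  // own, smaller alloca; its offset and size are recorded as a Fragment so
  // that the debug info migration further down can describe the new alloca
  // via DIExpression::createFragmentExpression.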
for (auto &P : AS.partitions()) { if (AllocaInst *NewAI = rewritePartition(AI, AS, P)) { Changed = true; if (NewAI != &AI) { uint64_t SizeOfByte = 8; uint64_t AllocaSize = DL.getTypeSizeInBits(NewAI->getAllocatedType()); // Don't include any padding. uint64_t Size = std::min(AllocaSize, P.size() * SizeOfByte); Fragments.push_back(Fragment(NewAI, P.beginOffset() * SizeOfByte, Size)); } } ++NumPartitions; } NumAllocaPartitions += NumPartitions; MaxPartitionsPerAlloca.updateMax(NumPartitions); // Migrate debug information from the old alloca to the new alloca(s) // and the individual partitions. TinyPtrVector DbgDeclares = FindDbgAddrUses(&AI); if (!DbgDeclares.empty()) { auto *Var = DbgDeclares.front()->getVariable(); auto *Expr = DbgDeclares.front()->getExpression(); auto VarSize = Var->getSizeInBits(); DIBuilder DIB(*AI.getModule(), /*AllowUnresolved*/ false); uint64_t AllocaSize = DL.getTypeSizeInBits(AI.getAllocatedType()); for (auto Fragment : Fragments) { // Create a fragment expression describing the new partition or reuse AI's // expression if there is only one partition. auto *FragmentExpr = Expr; if (Fragment.Size < AllocaSize || Expr->isFragment()) { // If this alloca is already a scalar replacement of a larger aggregate, // Fragment.Offset describes the offset inside the scalar. auto ExprFragment = Expr->getFragmentInfo(); uint64_t Offset = ExprFragment ? ExprFragment->OffsetInBits : 0; uint64_t Start = Offset + Fragment.Offset; uint64_t Size = Fragment.Size; if (ExprFragment) { uint64_t AbsEnd = ExprFragment->OffsetInBits + ExprFragment->SizeInBits; if (Start >= AbsEnd) // No need to describe a SROAed padding. continue; Size = std::min(Size, AbsEnd - Start); } // The new, smaller fragment is stenciled out from the old fragment. if (auto OrigFragment = FragmentExpr->getFragmentInfo()) { assert(Start >= OrigFragment->OffsetInBits && "new fragment is outside of original fragment"); Start -= OrigFragment->OffsetInBits; } // The alloca may be larger than the variable. if (VarSize) { if (Size > *VarSize) Size = *VarSize; if (Size == 0 || Start + Size > *VarSize) continue; } // Avoid creating a fragment expression that covers the entire variable. if (!VarSize || *VarSize != Size) { if (auto E = DIExpression::createFragmentExpression(Expr, Start, Size)) FragmentExpr = *E; else continue; } } // Remove any existing intrinsics describing the same alloca. for (DbgInfoIntrinsic *OldDII : FindDbgAddrUses(Fragment.Alloca)) OldDII->eraseFromParent(); DIB.insertDeclare(Fragment.Alloca, Var, FragmentExpr, DbgDeclares.front()->getDebugLoc(), &AI); } } return Changed; } /// Clobber a use with undef, deleting the used value if it becomes dead. void SROA::clobberUse(Use &U) { Value *OldV = U; // Replace the use with an undef value. U = UndefValue::get(OldV->getType()); // Check for this making an instruction dead. We have to garbage collect // all the dead instructions to ensure the uses of any alloca end up being // minimal. if (Instruction *OldI = dyn_cast(OldV)) if (isInstructionTriviallyDead(OldI)) { DeadInsts.insert(OldI); } } /// Analyze an alloca for SROA. /// /// This analyzes the alloca to ensure we can reason about it, builds /// the slices of the alloca, and then hands it off to be split and /// rewritten as needed. bool SROA::runOnAlloca(AllocaInst &AI) { LLVM_DEBUG(dbgs() << "SROA alloca: " << AI << "\n"); ++NumAllocasAnalyzed; // Special case dead allocas, as they're trivial. 
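  // Added note (not in the upstream comment): an alloca with no remaining
  // uses is erased outright and reported as a change without building any
  // slices.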
if (AI.use_empty()) { AI.eraseFromParent(); return true; } const DataLayout &DL = AI.getModule()->getDataLayout(); // Skip alloca forms that this analysis can't handle. if (AI.isArrayAllocation() || !AI.getAllocatedType()->isSized() || DL.getTypeAllocSize(AI.getAllocatedType()) == 0) return false; bool Changed = false; // First, split any FCA loads and stores touching this alloca to promote // better splitting and promotion opportunities. AggLoadStoreRewriter AggRewriter; Changed |= AggRewriter.rewrite(AI); // Build the slices using a recursive instruction-visiting builder. AllocaSlices AS(DL, AI); LLVM_DEBUG(AS.print(dbgs())); if (AS.isEscaped()) return Changed; // Delete all the dead users of this alloca before splitting and rewriting it. for (Instruction *DeadUser : AS.getDeadUsers()) { // Free up everything used by this instruction. for (Use &DeadOp : DeadUser->operands()) clobberUse(DeadOp); // Now replace the uses of this instruction. DeadUser->replaceAllUsesWith(UndefValue::get(DeadUser->getType())); // And mark it for deletion. DeadInsts.insert(DeadUser); Changed = true; } for (Use *DeadOp : AS.getDeadOperands()) { clobberUse(*DeadOp); Changed = true; } // No slices to split. Leave the dead alloca for a later pass to clean up. if (AS.begin() == AS.end()) return Changed; Changed |= splitAlloca(AI, AS); LLVM_DEBUG(dbgs() << " Speculating PHIs\n"); while (!SpeculatablePHIs.empty()) speculatePHINodeLoads(*SpeculatablePHIs.pop_back_val()); LLVM_DEBUG(dbgs() << " Speculating Selects\n"); while (!SpeculatableSelects.empty()) speculateSelectInstLoads(*SpeculatableSelects.pop_back_val()); return Changed; } /// Delete the dead instructions accumulated in this run. /// /// Recursively deletes the dead instructions we've accumulated. This is done /// at the very end to maximize locality of the recursive delete and to /// minimize the problems of invalidated instruction pointers as such pointers /// are used heavily in the intermediate stages of the algorithm. /// /// We also record the alloca instructions deleted here so that they aren't /// subsequently handed to mem2reg to promote. bool SROA::deleteDeadInstructions( SmallPtrSetImpl &DeletedAllocas) { bool Changed = false; while (!DeadInsts.empty()) { Instruction *I = DeadInsts.pop_back_val(); LLVM_DEBUG(dbgs() << "Deleting dead instruction: " << *I << "\n"); // If the instruction is an alloca, find the possible dbg.declare connected // to it, and remove it too. We must do this before calling RAUW or we will // not be able to find it. if (AllocaInst *AI = dyn_cast(I)) { DeletedAllocas.insert(AI); for (DbgInfoIntrinsic *OldDII : FindDbgAddrUses(AI)) OldDII->eraseFromParent(); } I->replaceAllUsesWith(UndefValue::get(I->getType())); for (Use &Operand : I->operands()) if (Instruction *U = dyn_cast(Operand)) { // Zero out the operand and see if it becomes trivially dead. Operand = nullptr; if (isInstructionTriviallyDead(U)) DeadInsts.insert(U); } ++NumDeleted; I->eraseFromParent(); Changed = true; } return Changed; } /// Promote the allocas, using the best available technique. /// /// This attempts to promote whatever allocas have been identified as viable in /// the PromotableAllocas list. If that list is empty, there is nothing to do. /// This function returns whether any promotion occurred. 
bool SROA::promoteAllocas(Function &F) { if (PromotableAllocas.empty()) return false; NumPromoted += PromotableAllocas.size(); LLVM_DEBUG(dbgs() << "Promoting allocas with mem2reg...\n"); PromoteMemToReg(PromotableAllocas, *DT, AC); PromotableAllocas.clear(); return true; } PreservedAnalyses SROA::runImpl(Function &F, DominatorTree &RunDT, AssumptionCache &RunAC) { LLVM_DEBUG(dbgs() << "SROA function: " << F.getName() << "\n"); C = &F.getContext(); DT = &RunDT; AC = &RunAC; BasicBlock &EntryBB = F.getEntryBlock(); for (BasicBlock::iterator I = EntryBB.begin(), E = std::prev(EntryBB.end()); I != E; ++I) { if (AllocaInst *AI = dyn_cast(I)) Worklist.insert(AI); } bool Changed = false; // A set of deleted alloca instruction pointers which should be removed from // the list of promotable allocas. SmallPtrSet DeletedAllocas; do { while (!Worklist.empty()) { Changed |= runOnAlloca(*Worklist.pop_back_val()); Changed |= deleteDeadInstructions(DeletedAllocas); // Remove the deleted allocas from various lists so that we don't try to // continue processing them. if (!DeletedAllocas.empty()) { auto IsInSet = [&](AllocaInst *AI) { return DeletedAllocas.count(AI); }; Worklist.remove_if(IsInSet); PostPromotionWorklist.remove_if(IsInSet); PromotableAllocas.erase(llvm::remove_if(PromotableAllocas, IsInSet), PromotableAllocas.end()); DeletedAllocas.clear(); } } Changed |= promoteAllocas(F); Worklist = PostPromotionWorklist; PostPromotionWorklist.clear(); } while (!Worklist.empty()); if (!Changed) return PreservedAnalyses::all(); PreservedAnalyses PA; PA.preserveSet(); PA.preserve(); return PA; } PreservedAnalyses SROA::run(Function &F, FunctionAnalysisManager &AM) { return runImpl(F, AM.getResult(F), AM.getResult(F)); } /// A legacy pass for the legacy pass manager that wraps the \c SROA pass. /// /// This is in the llvm namespace purely to allow it to be a friend of the \c /// SROA pass. class llvm::sroa::SROALegacyPass : public FunctionPass { /// The SROA implementation. SROA Impl; public: static char ID; SROALegacyPass() : FunctionPass(ID) { initializeSROALegacyPassPass(*PassRegistry::getPassRegistry()); } bool runOnFunction(Function &F) override { if (skipFunction(F)) return false; auto PA = Impl.runImpl( F, getAnalysis().getDomTree(), getAnalysis().getAssumptionCache(F)); return !PA.areAllPreserved(); } void getAnalysisUsage(AnalysisUsage &AU) const override { AU.addRequired(); AU.addRequired(); AU.addPreserved(); AU.setPreservesCFG(); } StringRef getPassName() const override { return "SROA"; } }; char SROALegacyPass::ID = 0; FunctionPass *llvm::createSROAPass() { return new SROALegacyPass(); } INITIALIZE_PASS_BEGIN(SROALegacyPass, "sroa", "Scalar Replacement Of Aggregates", false, false) INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker) INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass) INITIALIZE_PASS_END(SROALegacyPass, "sroa", "Scalar Replacement Of Aggregates", false, false) Index: vendor/llvm/dist-release_70/lib/Transforms/Utils/CloneFunction.cpp =================================================================== --- vendor/llvm/dist-release_70/lib/Transforms/Utils/CloneFunction.cpp (revision 338574) +++ vendor/llvm/dist-release_70/lib/Transforms/Utils/CloneFunction.cpp (revision 338575) @@ -1,825 +1,834 @@ //===- CloneFunction.cpp - Clone a function into another function ---------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. 
// //===----------------------------------------------------------------------===// // // This file implements the CloneFunctionInto interface, which is used as the // low-level function cloner. This is used by the CloneFunction and function // inliner to do the dirty work of copying the body of a function around. // //===----------------------------------------------------------------------===// #include "llvm/ADT/SetVector.h" #include "llvm/ADT/SmallVector.h" #include "llvm/Analysis/ConstantFolding.h" #include "llvm/Analysis/InstructionSimplify.h" #include "llvm/Analysis/LoopInfo.h" #include "llvm/Transforms/Utils/Local.h" #include "llvm/IR/CFG.h" #include "llvm/IR/Constants.h" #include "llvm/IR/DebugInfo.h" #include "llvm/IR/DerivedTypes.h" #include "llvm/IR/Function.h" #include "llvm/IR/GlobalVariable.h" #include "llvm/IR/Instructions.h" #include "llvm/IR/IntrinsicInst.h" #include "llvm/IR/LLVMContext.h" #include "llvm/IR/Metadata.h" #include "llvm/IR/Module.h" #include "llvm/Transforms/Utils/BasicBlockUtils.h" #include "llvm/Transforms/Utils/Cloning.h" #include "llvm/Transforms/Utils/ValueMapper.h" #include using namespace llvm; /// See comments in Cloning.h. BasicBlock *llvm::CloneBasicBlock(const BasicBlock *BB, ValueToValueMapTy &VMap, const Twine &NameSuffix, Function *F, ClonedCodeInfo *CodeInfo, DebugInfoFinder *DIFinder) { DenseMap Cache; BasicBlock *NewBB = BasicBlock::Create(BB->getContext(), "", F); if (BB->hasName()) NewBB->setName(BB->getName() + NameSuffix); bool hasCalls = false, hasDynamicAllocas = false, hasStaticAllocas = false; Module *TheModule = F ? F->getParent() : nullptr; // Loop over all instructions, and copy them over. for (const Instruction &I : *BB) { if (DIFinder && TheModule) DIFinder->processInstruction(*TheModule, I); Instruction *NewInst = I.clone(); if (I.hasName()) NewInst->setName(I.getName() + NameSuffix); NewBB->getInstList().push_back(NewInst); VMap[&I] = NewInst; // Add instruction map to value. hasCalls |= (isa(I) && !isa(I)); if (const AllocaInst *AI = dyn_cast(&I)) { if (isa(AI->getArraySize())) hasStaticAllocas = true; else hasDynamicAllocas = true; } } if (CodeInfo) { CodeInfo->ContainsCalls |= hasCalls; CodeInfo->ContainsDynamicAllocas |= hasDynamicAllocas; CodeInfo->ContainsDynamicAllocas |= hasStaticAllocas && BB != &BB->getParent()->getEntryBlock(); } return NewBB; } // Clone OldFunc into NewFunc, transforming the old arguments into references to // VMap values. // void llvm::CloneFunctionInto(Function *NewFunc, const Function *OldFunc, ValueToValueMapTy &VMap, bool ModuleLevelChanges, SmallVectorImpl &Returns, const char *NameSuffix, ClonedCodeInfo *CodeInfo, ValueMapTypeRemapper *TypeMapper, ValueMaterializer *Materializer) { assert(NameSuffix && "NameSuffix cannot be null!"); #ifndef NDEBUG for (const Argument &I : OldFunc->args()) assert(VMap.count(&I) && "No mapping from source argument specified!"); #endif // Copy all attributes other than those stored in the AttributeList. We need // to remap the parameter indices of the AttributeList. AttributeList NewAttrs = NewFunc->getAttributes(); NewFunc->copyAttributesFrom(OldFunc); NewFunc->setAttributes(NewAttrs); // Fix up the personality function that got copied over. if (OldFunc->hasPersonalityFn()) NewFunc->setPersonalityFn( MapValue(OldFunc->getPersonalityFn(), VMap, ModuleLevelChanges ? 
RF_None : RF_NoModuleLevelChanges, TypeMapper, Materializer)); SmallVector NewArgAttrs(NewFunc->arg_size()); AttributeList OldAttrs = OldFunc->getAttributes(); // Clone any argument attributes that are present in the VMap. for (const Argument &OldArg : OldFunc->args()) { if (Argument *NewArg = dyn_cast(VMap[&OldArg])) { NewArgAttrs[NewArg->getArgNo()] = OldAttrs.getParamAttributes(OldArg.getArgNo()); } } NewFunc->setAttributes( AttributeList::get(NewFunc->getContext(), OldAttrs.getFnAttributes(), OldAttrs.getRetAttributes(), NewArgAttrs)); bool MustCloneSP = OldFunc->getParent() && OldFunc->getParent() == NewFunc->getParent(); DISubprogram *SP = OldFunc->getSubprogram(); if (SP) { assert(!MustCloneSP || ModuleLevelChanges); // Add mappings for some DebugInfo nodes that we don't want duplicated // even if they're distinct. auto &MD = VMap.MD(); MD[SP->getUnit()].reset(SP->getUnit()); MD[SP->getType()].reset(SP->getType()); MD[SP->getFile()].reset(SP->getFile()); // If we're not cloning into the same module, no need to clone the // subprogram if (!MustCloneSP) MD[SP].reset(SP); } SmallVector, 1> MDs; OldFunc->getAllMetadata(MDs); for (auto MD : MDs) { NewFunc->addMetadata( MD.first, *MapMetadata(MD.second, VMap, ModuleLevelChanges ? RF_None : RF_NoModuleLevelChanges, TypeMapper, Materializer)); } // When we remap instructions, we want to avoid duplicating inlined // DISubprograms, so record all subprograms we find as we duplicate // instructions and then freeze them in the MD map. // We also record information about dbg.value and dbg.declare to avoid // duplicating the types. DebugInfoFinder DIFinder; // Loop over all of the basic blocks in the function, cloning them as // appropriate. Note that we save BE this way in order to handle cloning of // recursive functions into themselves. // for (Function::const_iterator BI = OldFunc->begin(), BE = OldFunc->end(); BI != BE; ++BI) { const BasicBlock &BB = *BI; // Create a new basic block and copy instructions into it! BasicBlock *CBB = CloneBasicBlock(&BB, VMap, NameSuffix, NewFunc, CodeInfo, ModuleLevelChanges ? &DIFinder : nullptr); // Add basic block mapping. VMap[&BB] = CBB; // It is only legal to clone a function if a block address within that // function is never referenced outside of the function. Given that, we // want to map block addresses from the old function to block addresses in // the clone. (This is different from the generic ValueMapper // implementation, which generates an invalid blockaddress when // cloning a function.) if (BB.hasAddressTaken()) { Constant *OldBBAddr = BlockAddress::get(const_cast(OldFunc), const_cast(&BB)); VMap[OldBBAddr] = BlockAddress::get(NewFunc, CBB); } // Note return instructions for the caller. if (ReturnInst *RI = dyn_cast(CBB->getTerminator())) Returns.push_back(RI); } for (DISubprogram *ISP : DIFinder.subprograms()) if (ISP != SP) VMap.MD()[ISP].reset(ISP); for (DICompileUnit *CU : DIFinder.compile_units()) VMap.MD()[CU].reset(CU); for (DIType *Type : DIFinder.types()) VMap.MD()[Type].reset(Type); // Loop over all of the instructions in the function, fixing up operand // references as we go. This uses VMap to do all the hard work. for (Function::iterator BB = cast(VMap[&OldFunc->front()])->getIterator(), BE = NewFunc->end(); BB != BE; ++BB) // Loop over all instructions, fixing each one as we find it... for (Instruction &II : *BB) RemapInstruction(&II, VMap, ModuleLevelChanges ? 
RF_None : RF_NoModuleLevelChanges, TypeMapper, Materializer); } /// Return a copy of the specified function and add it to that function's /// module. Also, any references specified in the VMap are changed to refer to /// their mapped value instead of the original one. If any of the arguments to /// the function are in the VMap, the arguments are deleted from the resultant /// function. The VMap is updated to include mappings from all of the /// instructions and basicblocks in the function from their old to new values. /// Function *llvm::CloneFunction(Function *F, ValueToValueMapTy &VMap, ClonedCodeInfo *CodeInfo) { std::vector ArgTypes; // The user might be deleting arguments to the function by specifying them in // the VMap. If so, we need to not add the arguments to the arg ty vector // for (const Argument &I : F->args()) if (VMap.count(&I) == 0) // Haven't mapped the argument to anything yet? ArgTypes.push_back(I.getType()); // Create a new function type... FunctionType *FTy = FunctionType::get(F->getFunctionType()->getReturnType(), ArgTypes, F->getFunctionType()->isVarArg()); // Create the new function... Function *NewF = Function::Create(FTy, F->getLinkage(), F->getName(), F->getParent()); // Loop over the arguments, copying the names of the mapped arguments over... Function::arg_iterator DestI = NewF->arg_begin(); for (const Argument & I : F->args()) if (VMap.count(&I) == 0) { // Is this argument preserved? DestI->setName(I.getName()); // Copy the name over... VMap[&I] = &*DestI++; // Add mapping to VMap } SmallVector Returns; // Ignore returns cloned. CloneFunctionInto(NewF, F, VMap, F->getSubprogram() != nullptr, Returns, "", CodeInfo); return NewF; } namespace { /// This is a private class used to implement CloneAndPruneFunctionInto. struct PruningFunctionCloner { Function *NewFunc; const Function *OldFunc; ValueToValueMapTy &VMap; bool ModuleLevelChanges; const char *NameSuffix; ClonedCodeInfo *CodeInfo; public: PruningFunctionCloner(Function *newFunc, const Function *oldFunc, ValueToValueMapTy &valueMap, bool moduleLevelChanges, const char *nameSuffix, ClonedCodeInfo *codeInfo) : NewFunc(newFunc), OldFunc(oldFunc), VMap(valueMap), ModuleLevelChanges(moduleLevelChanges), NameSuffix(nameSuffix), CodeInfo(codeInfo) {} /// The specified block is found to be reachable, clone it and /// anything that it can reach. void CloneBlock(const BasicBlock *BB, BasicBlock::const_iterator StartingInst, std::vector &ToClone); }; } /// The specified block is found to be reachable, clone it and /// anything that it can reach. void PruningFunctionCloner::CloneBlock(const BasicBlock *BB, BasicBlock::const_iterator StartingInst, std::vector &ToClone){ WeakTrackingVH &BBEntry = VMap[BB]; // Have we already cloned this block? if (BBEntry) return; // Nope, clone it now. BasicBlock *NewBB; BBEntry = NewBB = BasicBlock::Create(BB->getContext()); if (BB->hasName()) NewBB->setName(BB->getName()+NameSuffix); // It is only legal to clone a function if a block address within that // function is never referenced outside of the function. Given that, we // want to map block addresses from the old function to block addresses in // the clone. (This is different from the generic ValueMapper // implementation, which generates an invalid blockaddress when // cloning a function.) // // Note that we don't need to fix the mapping for unreachable blocks; // the default mapping there is safe. 
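//
// Illustrative sketch (hand-written, not taken from these sources): minimal
// use of the CloneFunction entry point defined earlier in this file, assuming
// a Function *F that already lives in some module.
//
//   ValueToValueMapTy VMap;
//   Function *NewF = CloneFunction(F, VMap);
//   // NewF is inserted into F's module; VMap now maps every argument, basic
//   // block and instruction of F to its counterpart in NewF.
//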
if (BB->hasAddressTaken()) { Constant *OldBBAddr = BlockAddress::get(const_cast(OldFunc), const_cast(BB)); VMap[OldBBAddr] = BlockAddress::get(NewFunc, NewBB); } bool hasCalls = false, hasDynamicAllocas = false, hasStaticAllocas = false; // Loop over all instructions, and copy them over, DCE'ing as we go. This // loop doesn't include the terminator. for (BasicBlock::const_iterator II = StartingInst, IE = --BB->end(); II != IE; ++II) { Instruction *NewInst = II->clone(); // Eagerly remap operands to the newly cloned instruction, except for PHI // nodes for which we defer processing until we update the CFG. if (!isa(NewInst)) { RemapInstruction(NewInst, VMap, ModuleLevelChanges ? RF_None : RF_NoModuleLevelChanges); // If we can simplify this instruction to some other value, simply add // a mapping to that value rather than inserting a new instruction into // the basic block. if (Value *V = SimplifyInstruction(NewInst, BB->getModule()->getDataLayout())) { // On the off-chance that this simplifies to an instruction in the old // function, map it back into the new function. if (NewFunc != OldFunc) if (Value *MappedV = VMap.lookup(V)) V = MappedV; if (!NewInst->mayHaveSideEffects()) { VMap[&*II] = V; NewInst->deleteValue(); continue; } } } if (II->hasName()) NewInst->setName(II->getName()+NameSuffix); VMap[&*II] = NewInst; // Add instruction map to value. NewBB->getInstList().push_back(NewInst); hasCalls |= (isa(II) && !isa(II)); if (CodeInfo) if (auto CS = ImmutableCallSite(&*II)) if (CS.hasOperandBundles()) CodeInfo->OperandBundleCallSites.push_back(NewInst); if (const AllocaInst *AI = dyn_cast(II)) { if (isa(AI->getArraySize())) hasStaticAllocas = true; else hasDynamicAllocas = true; } } // Finally, clone over the terminator. const TerminatorInst *OldTI = BB->getTerminator(); bool TerminatorDone = false; if (const BranchInst *BI = dyn_cast(OldTI)) { if (BI->isConditional()) { // If the condition was a known constant in the callee... ConstantInt *Cond = dyn_cast(BI->getCondition()); // Or is a known constant in the caller... if (!Cond) { Value *V = VMap.lookup(BI->getCondition()); Cond = dyn_cast_or_null(V); } // Constant fold to uncond branch! if (Cond) { BasicBlock *Dest = BI->getSuccessor(!Cond->getZExtValue()); VMap[OldTI] = BranchInst::Create(Dest, NewBB); ToClone.push_back(Dest); TerminatorDone = true; } } } else if (const SwitchInst *SI = dyn_cast(OldTI)) { // If switching on a value known constant in the caller. ConstantInt *Cond = dyn_cast(SI->getCondition()); if (!Cond) { // Or known constant after constant prop in the callee... Value *V = VMap.lookup(SI->getCondition()); Cond = dyn_cast_or_null(V); } if (Cond) { // Constant fold to uncond branch! SwitchInst::ConstCaseHandle Case = *SI->findCaseValue(Cond); BasicBlock *Dest = const_cast(Case.getCaseSuccessor()); VMap[OldTI] = BranchInst::Create(Dest, NewBB); ToClone.push_back(Dest); TerminatorDone = true; } } if (!TerminatorDone) { Instruction *NewInst = OldTI->clone(); if (OldTI->hasName()) NewInst->setName(OldTI->getName()+NameSuffix); NewBB->getInstList().push_back(NewInst); VMap[OldTI] = NewInst; // Add instruction map to value. if (CodeInfo) if (auto CS = ImmutableCallSite(OldTI)) if (CS.hasOperandBundles()) CodeInfo->OperandBundleCallSites.push_back(NewInst); // Recursively clone any reachable successor blocks. 
const TerminatorInst *TI = BB->getTerminator(); for (const BasicBlock *Succ : TI->successors()) ToClone.push_back(Succ); } if (CodeInfo) { CodeInfo->ContainsCalls |= hasCalls; CodeInfo->ContainsDynamicAllocas |= hasDynamicAllocas; CodeInfo->ContainsDynamicAllocas |= hasStaticAllocas && BB != &BB->getParent()->front(); } } /// This works like CloneAndPruneFunctionInto, except that it does not clone the /// entire function. Instead it starts at an instruction provided by the caller /// and copies (and prunes) only the code reachable from that instruction. void llvm::CloneAndPruneIntoFromInst(Function *NewFunc, const Function *OldFunc, const Instruction *StartingInst, ValueToValueMapTy &VMap, bool ModuleLevelChanges, SmallVectorImpl &Returns, const char *NameSuffix, ClonedCodeInfo *CodeInfo) { assert(NameSuffix && "NameSuffix cannot be null!"); ValueMapTypeRemapper *TypeMapper = nullptr; ValueMaterializer *Materializer = nullptr; #ifndef NDEBUG // If the cloning starts at the beginning of the function, verify that // the function arguments are mapped. if (!StartingInst) for (const Argument &II : OldFunc->args()) assert(VMap.count(&II) && "No mapping from source argument specified!"); #endif PruningFunctionCloner PFC(NewFunc, OldFunc, VMap, ModuleLevelChanges, NameSuffix, CodeInfo); const BasicBlock *StartingBB; if (StartingInst) StartingBB = StartingInst->getParent(); else { StartingBB = &OldFunc->getEntryBlock(); StartingInst = &StartingBB->front(); } // Clone the entry block, and anything recursively reachable from it. std::vector CloneWorklist; PFC.CloneBlock(StartingBB, StartingInst->getIterator(), CloneWorklist); while (!CloneWorklist.empty()) { const BasicBlock *BB = CloneWorklist.back(); CloneWorklist.pop_back(); PFC.CloneBlock(BB, BB->begin(), CloneWorklist); } // Loop over all of the basic blocks in the old function. If the block was // reachable, we have cloned it and the old block is now in the value map: // insert it into the new function in the right order. If not, ignore it. // // Defer PHI resolution until rest of function is resolved. SmallVector PHIToResolve; for (const BasicBlock &BI : *OldFunc) { Value *V = VMap.lookup(&BI); BasicBlock *NewBB = cast_or_null(V); if (!NewBB) continue; // Dead block. // Add the new block to the new function. NewFunc->getBasicBlockList().push_back(NewBB); // Handle PHI nodes specially, as we have to remove references to dead // blocks. for (const PHINode &PN : BI.phis()) { // PHI nodes may have been remapped to non-PHI nodes by the caller or // during the cloning process. if (isa(VMap[&PN])) PHIToResolve.push_back(&PN); else break; } // Finally, remap the terminator instructions, as those can't be remapped // until all BBs are mapped. RemapInstruction(NewBB->getTerminator(), VMap, ModuleLevelChanges ? RF_None : RF_NoModuleLevelChanges, TypeMapper, Materializer); } // Defer PHI resolution until rest of function is resolved, PHI resolution // requires the CFG to be up-to-date. for (unsigned phino = 0, e = PHIToResolve.size(); phino != e; ) { const PHINode *OPN = PHIToResolve[phino]; unsigned NumPreds = OPN->getNumIncomingValues(); const BasicBlock *OldBB = OPN->getParent(); BasicBlock *NewBB = cast(VMap[OldBB]); // Map operands for blocks that are live and remove operands for blocks // that are dead. 
for (; phino != PHIToResolve.size() && PHIToResolve[phino]->getParent() == OldBB; ++phino) { OPN = PHIToResolve[phino]; PHINode *PN = cast(VMap[OPN]); for (unsigned pred = 0, e = NumPreds; pred != e; ++pred) { Value *V = VMap.lookup(PN->getIncomingBlock(pred)); if (BasicBlock *MappedBlock = cast_or_null(V)) { Value *InVal = MapValue(PN->getIncomingValue(pred), VMap, ModuleLevelChanges ? RF_None : RF_NoModuleLevelChanges); assert(InVal && "Unknown input value?"); PN->setIncomingValue(pred, InVal); PN->setIncomingBlock(pred, MappedBlock); } else { PN->removeIncomingValue(pred, false); --pred; // Revisit the next entry. --e; } } } // The loop above has removed PHI entries for those blocks that are dead // and has updated others. However, if a block is live (i.e. copied over) // but its terminator has been changed to not go to this block, then our // phi nodes will have invalid entries. Update the PHI nodes in this // case. PHINode *PN = cast(NewBB->begin()); NumPreds = pred_size(NewBB); if (NumPreds != PN->getNumIncomingValues()) { assert(NumPreds < PN->getNumIncomingValues()); // Count how many times each predecessor comes to this block. std::map PredCount; for (pred_iterator PI = pred_begin(NewBB), E = pred_end(NewBB); PI != E; ++PI) --PredCount[*PI]; // Figure out how many entries to remove from each PHI. for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i) ++PredCount[PN->getIncomingBlock(i)]; // At this point, the excess predecessor entries are positive in the // map. Loop over all of the PHIs and remove excess predecessor // entries. BasicBlock::iterator I = NewBB->begin(); for (; (PN = dyn_cast(I)); ++I) { for (const auto &PCI : PredCount) { BasicBlock *Pred = PCI.first; for (unsigned NumToRemove = PCI.second; NumToRemove; --NumToRemove) PN->removeIncomingValue(Pred, false); } } } // If the loops above have made these phi nodes have 0 or 1 operand, // replace them with undef or the input value. We must do this for // correctness, because 0-operand phis are not valid. PN = cast(NewBB->begin()); if (PN->getNumIncomingValues() == 0) { BasicBlock::iterator I = NewBB->begin(); BasicBlock::const_iterator OldI = OldBB->begin(); while ((PN = dyn_cast(I++))) { Value *NV = UndefValue::get(PN->getType()); PN->replaceAllUsesWith(NV); assert(VMap[&*OldI] == PN && "VMap mismatch"); VMap[&*OldI] = NV; PN->eraseFromParent(); ++OldI; } } } // Make a second pass over the PHINodes now that all of them have been // remapped into the new function, simplifying the PHINode and performing any // recursive simplifications exposed. This will transparently update the // WeakTrackingVH in the VMap. Notably, we rely on that so that if we coalesce // two PHINodes, the iteration over the old PHIs remains valid, and the // mapping will just map us to the new node (which may not even be a PHI // node). const DataLayout &DL = NewFunc->getParent()->getDataLayout(); SmallSetVector Worklist; for (unsigned Idx = 0, Size = PHIToResolve.size(); Idx != Size; ++Idx) if (isa(VMap[PHIToResolve[Idx]])) Worklist.insert(PHIToResolve[Idx]); // Note that we must test the size on each iteration, the worklist can grow. for (unsigned Idx = 0; Idx != Worklist.size(); ++Idx) { const Value *OrigV = Worklist[Idx]; auto *I = dyn_cast_or_null(VMap.lookup(OrigV)); if (!I) continue; // Skip over non-intrinsic callsites, we don't want to remove any nodes from // the CGSCC. CallSite CS = CallSite(I); if (CS && CS.getCalledFunction() && !CS.getCalledFunction()->isIntrinsic()) continue; // See if this instruction simplifies. 
Value *SimpleV = SimplifyInstruction(I, DL); if (!SimpleV) continue; // Stash away all the uses of the old instruction so we can check them for // recursive simplifications after a RAUW. This is cheaper than checking all // uses of To on the recursive step in most cases. for (const User *U : OrigV->users()) Worklist.insert(cast(U)); // Replace the instruction with its simplified value. I->replaceAllUsesWith(SimpleV); // If the original instruction had no side effects, remove it. if (isInstructionTriviallyDead(I)) I->eraseFromParent(); else VMap[OrigV] = I; } // Now that the inlined function body has been fully constructed, go through // and zap unconditional fall-through branches. This happens all the time when // specializing code: code specialization turns conditional branches into // uncond branches, and this code folds them. Function::iterator Begin = cast(VMap[StartingBB])->getIterator(); Function::iterator I = Begin; while (I != NewFunc->end()) { + // We need to simplify conditional branches and switches with a constant + // operand. We try to prune these out when cloning, but if the + // simplification required looking through PHI nodes, those are only + // available after forming the full basic block. That may leave some here, + // and we still want to prune the dead code as early as possible. + // + // Do the folding before we check if the block is dead since we want code + // like + // bb: + // br i1 undef, label %bb, label %bb + // to be simplified to + // bb: + // br label %bb + // before we call I->getSinglePredecessor(). + ConstantFoldTerminator(&*I); + // Check if this block has become dead during inlining or other // simplifications. Note that the first block will appear dead, as it has // not yet been wired up properly. if (I != Begin && (pred_begin(&*I) == pred_end(&*I) || I->getSinglePredecessor() == &*I)) { BasicBlock *DeadBB = &*I++; DeleteDeadBlock(DeadBB); continue; } - - // We need to simplify conditional branches and switches with a constant - // operand. We try to prune these out when cloning, but if the - // simplification required looking through PHI nodes, those are only - // available after forming the full basic block. That may leave some here, - // and we still want to prune the dead code as early as possible. - ConstantFoldTerminator(&*I); BranchInst *BI = dyn_cast(I->getTerminator()); if (!BI || BI->isConditional()) { ++I; continue; } BasicBlock *Dest = BI->getSuccessor(0); if (!Dest->getSinglePredecessor()) { ++I; continue; } // We shouldn't be able to get single-entry PHI nodes here, as instsimplify // above should have zapped all of them.. assert(!isa(Dest->begin())); // We know all single-entry PHI nodes in the inlined function have been // removed, so we just need to splice the blocks. BI->eraseFromParent(); // Make all PHI nodes that referred to Dest now refer to I as their source. Dest->replaceAllUsesWith(&*I); // Move all the instructions in the succ to the pred. I->getInstList().splice(I->end(), Dest->getInstList()); // Remove the dest block. Dest->eraseFromParent(); // Do not increment I, iteratively merge all things this block branches to. } // Make a final pass over the basic blocks from the old function to gather // any return instructions which survived folding. We have to do this here // because we can iteratively remove and merge returns above. 
for (Function::iterator I = cast(VMap[StartingBB])->getIterator(), E = NewFunc->end(); I != E; ++I) if (ReturnInst *RI = dyn_cast(I->getTerminator())) Returns.push_back(RI); } /// This works exactly like CloneFunctionInto, /// except that it does some simple constant prop and DCE on the fly. The /// effect of this is to copy significantly less code in cases where (for /// example) a function call with constant arguments is inlined, and those /// constant arguments cause a significant amount of code in the callee to be /// dead. Since this doesn't produce an exact copy of the input, it can't be /// used for things like CloneFunction or CloneModule. void llvm::CloneAndPruneFunctionInto(Function *NewFunc, const Function *OldFunc, ValueToValueMapTy &VMap, bool ModuleLevelChanges, SmallVectorImpl &Returns, const char *NameSuffix, ClonedCodeInfo *CodeInfo, Instruction *TheCall) { CloneAndPruneIntoFromInst(NewFunc, OldFunc, &OldFunc->front().front(), VMap, ModuleLevelChanges, Returns, NameSuffix, CodeInfo); } /// Remaps instructions in \p Blocks using the mapping in \p VMap. void llvm::remapInstructionsInBlocks( const SmallVectorImpl &Blocks, ValueToValueMapTy &VMap) { // Rewrite the code to refer to itself. for (auto *BB : Blocks) for (auto &Inst : *BB) RemapInstruction(&Inst, VMap, RF_NoModuleLevelChanges | RF_IgnoreMissingLocals); } /// Clones a loop \p OrigLoop. Returns the loop and the blocks in \p /// Blocks. /// /// Updates LoopInfo and DominatorTree assuming the loop is dominated by block /// \p LoopDomBB. Insert the new blocks before block specified in \p Before. Loop *llvm::cloneLoopWithPreheader(BasicBlock *Before, BasicBlock *LoopDomBB, Loop *OrigLoop, ValueToValueMapTy &VMap, const Twine &NameSuffix, LoopInfo *LI, DominatorTree *DT, SmallVectorImpl &Blocks) { assert(OrigLoop->getSubLoops().empty() && "Loop to be cloned cannot have inner loop"); Function *F = OrigLoop->getHeader()->getParent(); Loop *ParentLoop = OrigLoop->getParentLoop(); Loop *NewLoop = LI->AllocateLoop(); if (ParentLoop) ParentLoop->addChildLoop(NewLoop); else LI->addTopLevelLoop(NewLoop); BasicBlock *OrigPH = OrigLoop->getLoopPreheader(); assert(OrigPH && "No preheader"); BasicBlock *NewPH = CloneBasicBlock(OrigPH, VMap, NameSuffix, F); // To rename the loop PHIs. VMap[OrigPH] = NewPH; Blocks.push_back(NewPH); // Update LoopInfo. if (ParentLoop) ParentLoop->addBasicBlockToLoop(NewPH, *LI); // Update DominatorTree. DT->addNewBlock(NewPH, LoopDomBB); for (BasicBlock *BB : OrigLoop->getBlocks()) { BasicBlock *NewBB = CloneBasicBlock(BB, VMap, NameSuffix, F); VMap[BB] = NewBB; // Update LoopInfo. NewLoop->addBasicBlockToLoop(NewBB, *LI); // Add DominatorTree node. After seeing all blocks, update to correct IDom. DT->addNewBlock(NewBB, NewPH); Blocks.push_back(NewBB); } for (BasicBlock *BB : OrigLoop->getBlocks()) { // Update DominatorTree. BasicBlock *IDomBB = DT->getNode(BB)->getIDom()->getBlock(); DT->changeImmediateDominator(cast(VMap[BB]), cast(VMap[IDomBB])); } // Move them physically from the end of the block list. F->getBasicBlockList().splice(Before->getIterator(), F->getBasicBlockList(), NewPH); F->getBasicBlockList().splice(Before->getIterator(), F->getBasicBlockList(), NewLoop->getHeader()->getIterator(), F->end()); return NewLoop; } /// Duplicate non-Phi instructions from the beginning of block up to /// StopAt instruction into a split block between BB and its predecessor. 
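//
// Illustrative sketch (hand-written, not taken from these sources): the usual
// pairing of the block-cloning helpers above.  InsertBefore, LoopDomBB,
// OrigLoop, LI and DT are placeholder names for values the caller already has.
//
//   SmallVector<BasicBlock *, 8> NewBlocks;
//   ValueToValueMapTy VMap;
//   Loop *NewLoop = cloneLoopWithPreheader(InsertBefore, LoopDomBB, OrigLoop,
//                                          VMap, ".copy", LI, DT, NewBlocks);
//   // Patch up intra-loop operand references in the cloned blocks.
//   remapInstructionsInBlocks(NewBlocks, VMap);
//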
BasicBlock * llvm::DuplicateInstructionsInSplitBetween(BasicBlock *BB, BasicBlock *PredBB, Instruction *StopAt, ValueToValueMapTy &ValueMapping, DominatorTree *DT) { // We are going to have to map operands from the original BB block to the new // copy of the block 'NewBB'. If there are PHI nodes in BB, evaluate them to // account for entry from PredBB. BasicBlock::iterator BI = BB->begin(); for (; PHINode *PN = dyn_cast(BI); ++BI) ValueMapping[PN] = PN->getIncomingValueForBlock(PredBB); BasicBlock *NewBB = SplitEdge(PredBB, BB, DT); NewBB->setName(PredBB->getName() + ".split"); Instruction *NewTerm = NewBB->getTerminator(); // Clone the non-phi instructions of BB into NewBB, keeping track of the // mapping and using it to remap operands in the cloned instructions. // Stop once we see the terminator too. This covers the case where BB's // terminator gets replaced and StopAt == BB's terminator. for (; StopAt != &*BI && BB->getTerminator() != &*BI; ++BI) { Instruction *New = BI->clone(); New->setName(BI->getName()); New->insertBefore(NewTerm); ValueMapping[&*BI] = New; // Remap operands to patch up intra-block references. for (unsigned i = 0, e = New->getNumOperands(); i != e; ++i) if (Instruction *Inst = dyn_cast(New->getOperand(i))) { auto I = ValueMapping.find(Inst); if (I != ValueMapping.end()) New->setOperand(i, I->second); } } return NewBB; } Index: vendor/llvm/dist-release_70/lib/Transforms/Vectorize/LoopVectorize.cpp =================================================================== --- vendor/llvm/dist-release_70/lib/Transforms/Vectorize/LoopVectorize.cpp (revision 338574) +++ vendor/llvm/dist-release_70/lib/Transforms/Vectorize/LoopVectorize.cpp (revision 338575) @@ -1,7667 +1,7674 @@ //===- LoopVectorize.cpp - A Loop Vectorizer ------------------------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This is the LLVM loop vectorizer. This pass modifies 'vectorizable' loops // and generates target-independent LLVM-IR. // The vectorizer uses the TargetTransformInfo analysis to estimate the costs // of instructions in order to estimate the profitability of vectorization. // // The loop vectorizer combines consecutive loop iterations into a single // 'wide' iteration. After this transformation the index is incremented // by the SIMD vector width, and not by one. // // This pass has three parts: // 1. The main loop pass that drives the different parts. // 2. LoopVectorizationLegality - A unit that checks for the legality // of the vectorization. // 3. InnerLoopVectorizer - A unit that performs the actual // widening of instructions. // 4. LoopVectorizationCostModel - A unit that checks for the profitability // of vectorization. It decides on the optimal vector width, which // can be one, if vectorization is not profitable. // // There is a development effort going on to migrate loop vectorizer to the // VPlan infrastructure and to introduce outer loop vectorization support (see // docs/Proposal/VectorizationPlan.rst and // http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). For this // purpose, we temporarily introduced the VPlan-native vectorization path: an // alternative vectorization path that is natively implemented on top of the // VPlan infrastructure. See EnableVPlanNativePath for enabling. 
// //===----------------------------------------------------------------------===// // // The reduction-variable vectorization is based on the paper: // D. Nuzman and R. Henderson. Multi-platform Auto-vectorization. // // Variable uniformity checks are inspired by: // Karrenberg, R. and Hack, S. Whole Function Vectorization. // // The interleaved access vectorization is based on the paper: // Dorit Nuzman, Ira Rosen and Ayal Zaks. Auto-Vectorization of Interleaved // Data for SIMD // // Other ideas/concepts are from: // A. Zaks and D. Nuzman. Autovectorization in GCC-two years later. // // S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of // Vectorizing Compilers. // //===----------------------------------------------------------------------===// #include "llvm/Transforms/Vectorize/LoopVectorize.h" #include "LoopVectorizationPlanner.h" #include "VPRecipeBuilder.h" #include "VPlanHCFGBuilder.h" #include "llvm/ADT/APInt.h" #include "llvm/ADT/ArrayRef.h" #include "llvm/ADT/DenseMap.h" #include "llvm/ADT/DenseMapInfo.h" #include "llvm/ADT/Hashing.h" #include "llvm/ADT/MapVector.h" #include "llvm/ADT/None.h" #include "llvm/ADT/Optional.h" #include "llvm/ADT/STLExtras.h" #include "llvm/ADT/SetVector.h" #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/Statistic.h" #include "llvm/ADT/StringRef.h" #include "llvm/ADT/Twine.h" #include "llvm/ADT/iterator_range.h" #include "llvm/Analysis/AssumptionCache.h" #include "llvm/Analysis/BasicAliasAnalysis.h" #include "llvm/Analysis/BlockFrequencyInfo.h" #include "llvm/Analysis/CFG.h" #include "llvm/Analysis/CodeMetrics.h" #include "llvm/Analysis/DemandedBits.h" #include "llvm/Analysis/GlobalsModRef.h" #include "llvm/Analysis/LoopAccessAnalysis.h" #include "llvm/Analysis/LoopAnalysisManager.h" #include "llvm/Analysis/LoopInfo.h" #include "llvm/Analysis/LoopIterator.h" #include "llvm/Analysis/OptimizationRemarkEmitter.h" #include "llvm/Analysis/ScalarEvolution.h" #include "llvm/Analysis/ScalarEvolutionExpander.h" #include "llvm/Analysis/ScalarEvolutionExpressions.h" #include "llvm/Analysis/TargetLibraryInfo.h" #include "llvm/Analysis/TargetTransformInfo.h" #include "llvm/Analysis/VectorUtils.h" #include "llvm/IR/Attributes.h" #include "llvm/IR/BasicBlock.h" #include "llvm/IR/CFG.h" #include "llvm/IR/Constant.h" #include "llvm/IR/Constants.h" #include "llvm/IR/DataLayout.h" #include "llvm/IR/DebugInfoMetadata.h" #include "llvm/IR/DebugLoc.h" #include "llvm/IR/DerivedTypes.h" #include "llvm/IR/DiagnosticInfo.h" #include "llvm/IR/Dominators.h" #include "llvm/IR/Function.h" #include "llvm/IR/IRBuilder.h" #include "llvm/IR/InstrTypes.h" #include "llvm/IR/Instruction.h" #include "llvm/IR/Instructions.h" #include "llvm/IR/IntrinsicInst.h" #include "llvm/IR/Intrinsics.h" #include "llvm/IR/LLVMContext.h" #include "llvm/IR/Metadata.h" #include "llvm/IR/Module.h" #include "llvm/IR/Operator.h" #include "llvm/IR/Type.h" #include "llvm/IR/Use.h" #include "llvm/IR/User.h" #include "llvm/IR/Value.h" #include "llvm/IR/ValueHandle.h" #include "llvm/IR/Verifier.h" #include "llvm/Pass.h" #include "llvm/Support/Casting.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/Compiler.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Transforms/Utils/BasicBlockUtils.h" #include "llvm/Transforms/Utils/LoopSimplify.h" #include "llvm/Transforms/Utils/LoopUtils.h" #include 
"llvm/Transforms/Utils/LoopVersioning.h" #include "llvm/Transforms/Vectorize/LoopVectorizationLegality.h" #include #include #include #include #include #include #include #include #include #include #include #include using namespace llvm; #define LV_NAME "loop-vectorize" #define DEBUG_TYPE LV_NAME STATISTIC(LoopsVectorized, "Number of loops vectorized"); STATISTIC(LoopsAnalyzed, "Number of loops analyzed for vectorization"); /// Loops with a known constant trip count below this number are vectorized only /// if no scalar iteration overheads are incurred. static cl::opt TinyTripCountVectorThreshold( "vectorizer-min-trip-count", cl::init(16), cl::Hidden, cl::desc("Loops with a constant trip count that is smaller than this " "value are vectorized only if no scalar iteration overheads " "are incurred.")); static cl::opt MaximizeBandwidth( "vectorizer-maximize-bandwidth", cl::init(false), cl::Hidden, cl::desc("Maximize bandwidth when selecting vectorization factor which " "will be determined by the smallest type in loop.")); static cl::opt EnableInterleavedMemAccesses( "enable-interleaved-mem-accesses", cl::init(false), cl::Hidden, cl::desc("Enable vectorization on interleaved memory accesses in a loop")); /// Maximum factor for an interleaved memory access. static cl::opt MaxInterleaveGroupFactor( "max-interleave-group-factor", cl::Hidden, cl::desc("Maximum factor for an interleaved access group (default = 8)"), cl::init(8)); /// We don't interleave loops with a known constant trip count below this /// number. static const unsigned TinyTripCountInterleaveThreshold = 128; static cl::opt ForceTargetNumScalarRegs( "force-target-num-scalar-regs", cl::init(0), cl::Hidden, cl::desc("A flag that overrides the target's number of scalar registers.")); static cl::opt ForceTargetNumVectorRegs( "force-target-num-vector-regs", cl::init(0), cl::Hidden, cl::desc("A flag that overrides the target's number of vector registers.")); static cl::opt ForceTargetMaxScalarInterleaveFactor( "force-target-max-scalar-interleave", cl::init(0), cl::Hidden, cl::desc("A flag that overrides the target's max interleave factor for " "scalar loops.")); static cl::opt ForceTargetMaxVectorInterleaveFactor( "force-target-max-vector-interleave", cl::init(0), cl::Hidden, cl::desc("A flag that overrides the target's max interleave factor for " "vectorized loops.")); static cl::opt ForceTargetInstructionCost( "force-target-instruction-cost", cl::init(0), cl::Hidden, cl::desc("A flag that overrides the target's expected cost for " "an instruction to a single constant value. Mostly " "useful for getting consistent testing.")); static cl::opt SmallLoopCost( "small-loop-cost", cl::init(20), cl::Hidden, cl::desc( "The cost of a loop that is considered 'small' by the interleaver.")); static cl::opt LoopVectorizeWithBlockFrequency( "loop-vectorize-with-block-frequency", cl::init(true), cl::Hidden, cl::desc("Enable the use of the block frequency analysis to access PGO " "heuristics minimizing code growth in cold regions and being more " "aggressive in hot regions.")); // Runtime interleave loops for load/store throughput. static cl::opt EnableLoadStoreRuntimeInterleave( "enable-loadstore-runtime-interleave", cl::init(true), cl::Hidden, cl::desc( "Enable runtime interleaving until load/store ports are saturated")); /// The number of stores in a loop that are allowed to need predication. 
static cl::opt NumberOfStoresToPredicate( "vectorize-num-stores-pred", cl::init(1), cl::Hidden, cl::desc("Max number of stores to be predicated behind an if.")); static cl::opt EnableIndVarRegisterHeur( "enable-ind-var-reg-heur", cl::init(true), cl::Hidden, cl::desc("Count the induction variable only once when interleaving")); static cl::opt EnableCondStoresVectorization( "enable-cond-stores-vec", cl::init(true), cl::Hidden, cl::desc("Enable if predication of stores during vectorization.")); static cl::opt MaxNestedScalarReductionIC( "max-nested-scalar-reduction-interleave", cl::init(2), cl::Hidden, cl::desc("The maximum interleave count to use when interleaving a scalar " "reduction in a nested loop.")); static cl::opt EnableVPlanNativePath( "enable-vplan-native-path", cl::init(false), cl::Hidden, cl::desc("Enable VPlan-native vectorization path with " "support for outer loop vectorization.")); // This flag enables the stress testing of the VPlan H-CFG construction in the // VPlan-native vectorization path. It must be used in conjuction with // -enable-vplan-native-path. -vplan-verify-hcfg can also be used to enable the // verification of the H-CFGs built. static cl::opt VPlanBuildStressTest( "vplan-build-stress-test", cl::init(false), cl::Hidden, cl::desc( "Build VPlan for every supported loop nest in the function and bail " "out right after the build (stress test the VPlan H-CFG construction " "in the VPlan-native vectorization path).")); /// A helper function for converting Scalar types to vector types. /// If the incoming type is void, we return void. If the VF is 1, we return /// the scalar type. static Type *ToVectorTy(Type *Scalar, unsigned VF) { if (Scalar->isVoidTy() || VF == 1) return Scalar; return VectorType::get(Scalar, VF); } // FIXME: The following helper functions have multiple implementations // in the project. They can be effectively organized in a common Load/Store // utilities unit. /// A helper function that returns the type of loaded or stored value. static Type *getMemInstValueType(Value *I) { assert((isa(I) || isa(I)) && "Expected Load or Store instruction"); if (auto *LI = dyn_cast(I)) return LI->getType(); return cast(I)->getValueOperand()->getType(); } /// A helper function that returns the alignment of load or store instruction. static unsigned getMemInstAlignment(Value *I) { assert((isa(I) || isa(I)) && "Expected Load or Store instruction"); if (auto *LI = dyn_cast(I)) return LI->getAlignment(); return cast(I)->getAlignment(); } /// A helper function that returns the address space of the pointer operand of /// load or store instruction. static unsigned getMemInstAddressSpace(Value *I) { assert((isa(I) || isa(I)) && "Expected Load or Store instruction"); if (auto *LI = dyn_cast(I)) return LI->getPointerAddressSpace(); return cast(I)->getPointerAddressSpace(); } /// A helper function that returns true if the given type is irregular. The /// type is irregular if its allocated size doesn't equal the store size of an /// element of the corresponding vector type at the given vectorization factor. static bool hasIrregularType(Type *Ty, const DataLayout &DL, unsigned VF) { // Determine if an array of VF elements of type Ty is "bitcast compatible" // with a vector. if (VF > 1) { auto *VectorTy = VectorType::get(Ty, VF); return VF * DL.getTypeAllocSize(Ty) != DL.getTypeStoreSize(VectorTy); } // If the vectorization factor is one, we just check if an array of type Ty // requires padding between elements. 
return DL.getTypeAllocSizeInBits(Ty) != DL.getTypeSizeInBits(Ty); } /// A helper function that returns the reciprocal of the block probability of /// predicated blocks. If we return X, we are assuming the predicated block /// will execute once for every X iterations of the loop header. /// /// TODO: We should use actual block probability here, if available. Currently, /// we always assume predicated blocks have a 50% chance of executing. static unsigned getReciprocalPredBlockProb() { return 2; } /// A helper function that adds a 'fast' flag to floating-point operations. static Value *addFastMathFlag(Value *V) { if (isa(V)) { FastMathFlags Flags; Flags.setFast(); cast(V)->setFastMathFlags(Flags); } return V; } /// A helper function that returns an integer or floating-point constant with /// value C. static Constant *getSignedIntOrFpConstant(Type *Ty, int64_t C) { return Ty->isIntegerTy() ? ConstantInt::getSigned(Ty, C) : ConstantFP::get(Ty, C); } namespace llvm { /// InnerLoopVectorizer vectorizes loops which contain only one basic /// block to a specified vectorization factor (VF). /// This class performs the widening of scalars into vectors, or multiple /// scalars. This class also implements the following features: /// * It inserts an epilogue loop for handling loops that don't have iteration /// counts that are known to be a multiple of the vectorization factor. /// * It handles the code generation for reduction variables. /// * Scalarization (implementation using scalars) of un-vectorizable /// instructions. /// InnerLoopVectorizer does not perform any vectorization-legality /// checks, and relies on the caller to check for the different legality /// aspects. The InnerLoopVectorizer relies on the /// LoopVectorizationLegality class to provide information about the induction /// and reduction variables that were found to a given vectorization factor. class InnerLoopVectorizer { public: InnerLoopVectorizer(Loop *OrigLoop, PredicatedScalarEvolution &PSE, LoopInfo *LI, DominatorTree *DT, const TargetLibraryInfo *TLI, const TargetTransformInfo *TTI, AssumptionCache *AC, OptimizationRemarkEmitter *ORE, unsigned VecWidth, unsigned UnrollFactor, LoopVectorizationLegality *LVL, LoopVectorizationCostModel *CM) : OrigLoop(OrigLoop), PSE(PSE), LI(LI), DT(DT), TLI(TLI), TTI(TTI), AC(AC), ORE(ORE), VF(VecWidth), UF(UnrollFactor), Builder(PSE.getSE()->getContext()), VectorLoopValueMap(UnrollFactor, VecWidth), Legal(LVL), Cost(CM) {} virtual ~InnerLoopVectorizer() = default; /// Create a new empty loop. Unlink the old loop and connect the new one. /// Return the pre-header block of the new loop. BasicBlock *createVectorizedLoopSkeleton(); /// Widen a single instruction within the innermost loop. void widenInstruction(Instruction &I); /// Fix the vectorized code, taking care of header phi's, live-outs, and more. void fixVectorizedLoop(); // Return true if any runtime check is added. bool areSafetyChecksAdded() { return AddedSafetyChecks; } /// A type for vectorized values in the new loop. Each value from the /// original loop, when vectorized, is represented by UF vector values in the /// new unrolled loop, where UF is the unroll factor. using VectorParts = SmallVector; /// Vectorize a single PHINode in a block. This method handles the induction /// variable canonicalization. It supports both VF = 1 for unrolled loops and /// arbitrary length vectors. void widenPHIInstruction(Instruction *PN, unsigned UF, unsigned VF); /// A helper function to scalarize a single Instruction in the innermost loop. 
/// Generates a sequence of scalar instances for each lane between \p MinLane /// and \p MaxLane, times each part between \p MinPart and \p MaxPart, /// inclusive.. void scalarizeInstruction(Instruction *Instr, const VPIteration &Instance, bool IfPredicateInstr); /// Widen an integer or floating-point induction variable \p IV. If \p Trunc /// is provided, the integer induction variable will first be truncated to /// the corresponding type. void widenIntOrFpInduction(PHINode *IV, TruncInst *Trunc = nullptr); /// getOrCreateVectorValue and getOrCreateScalarValue coordinate to generate a /// vector or scalar value on-demand if one is not yet available. When /// vectorizing a loop, we visit the definition of an instruction before its /// uses. When visiting the definition, we either vectorize or scalarize the /// instruction, creating an entry for it in the corresponding map. (In some /// cases, such as induction variables, we will create both vector and scalar /// entries.) Then, as we encounter uses of the definition, we derive values /// for each scalar or vector use unless such a value is already available. /// For example, if we scalarize a definition and one of its uses is vector, /// we build the required vector on-demand with an insertelement sequence /// when visiting the use. Otherwise, if the use is scalar, we can use the /// existing scalar definition. /// /// Return a value in the new loop corresponding to \p V from the original /// loop at unroll index \p Part. If the value has already been vectorized, /// the corresponding vector entry in VectorLoopValueMap is returned. If, /// however, the value has a scalar entry in VectorLoopValueMap, we construct /// a new vector value on-demand by inserting the scalar values into a vector /// with an insertelement sequence. If the value has been neither vectorized /// nor scalarized, it must be loop invariant, so we simply broadcast the /// value into a vector. Value *getOrCreateVectorValue(Value *V, unsigned Part); /// Return a value in the new loop corresponding to \p V from the original /// loop at unroll and vector indices \p Instance. If the value has been /// vectorized but not scalarized, the necessary extractelement instruction /// will be generated. Value *getOrCreateScalarValue(Value *V, const VPIteration &Instance); /// Construct the vector value of a scalarized value \p V one lane at a time. void packScalarIntoVectorValue(Value *V, const VPIteration &Instance); /// Try to vectorize the interleaved access group that \p Instr belongs to. void vectorizeInterleaveGroup(Instruction *Instr); /// Vectorize Load and Store instructions, optionally masking the vector /// operations if \p BlockInMask is non-null. void vectorizeMemoryInstruction(Instruction *Instr, VectorParts *BlockInMask = nullptr); /// Set the debug location in the builder using the debug location in /// the instruction. void setDebugLocFromInst(IRBuilder<> &B, const Value *Ptr); protected: friend class LoopVectorizationPlanner; /// A small list of PHINodes. using PhiVector = SmallVector; /// A type for scalarized values in the new loop. Each value from the /// original loop, when scalarized, is represented by UF x VF scalar values /// in the new unrolled loop, where UF is the unroll factor and VF is the /// vectorization factor. using ScalarParts = SmallVector, 2>; /// Set up the values of the IVs correctly when exiting the vector loop. 
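//
// Illustrative sketch (hand-written, not taken from these sources): the
// on-demand packing described above for a scalarized definition %d that is
// later needed as a vector, with VF = 4 and UF = 1.
//
//   scalar entries: %d0, %d1, %d2, %d3            (one value per lane)
//   vector use:
//     %p0 = insertelement <4 x i32> undef, i32 %d0, i32 0
//     %p1 = insertelement <4 x i32> %p0,   i32 %d1, i32 1
//     %p2 = insertelement <4 x i32> %p1,   i32 %d2, i32 2
//     %v  = insertelement <4 x i32> %p2,   i32 %d3, i32 3
//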
void fixupIVUsers(PHINode *OrigPhi, const InductionDescriptor &II, Value *CountRoundDown, Value *EndValue, BasicBlock *MiddleBlock); /// Create a new induction variable inside L. PHINode *createInductionVariable(Loop *L, Value *Start, Value *End, Value *Step, Instruction *DL); /// Handle all cross-iteration phis in the header. void fixCrossIterationPHIs(); /// Fix a first-order recurrence. This is the second phase of vectorizing /// this phi node. void fixFirstOrderRecurrence(PHINode *Phi); /// Fix a reduction cross-iteration phi. This is the second phase of /// vectorizing this phi node. void fixReduction(PHINode *Phi); /// The Loop exit block may have single value PHI nodes with some /// incoming value. While vectorizing we only handled real values /// that were defined inside the loop and we should have one value for /// each predecessor of its parent basic block. See PR14725. void fixLCSSAPHIs(); /// Iteratively sink the scalarized operands of a predicated instruction into /// the block that was created for it. void sinkScalarOperands(Instruction *PredInst); /// Shrinks vector element sizes to the smallest bitwidth they can be legally /// represented as. void truncateToMinimalBitwidths(); /// Insert the new loop to the loop hierarchy and pass manager /// and update the analysis passes. void updateAnalysis(); /// Create a broadcast instruction. This method generates a broadcast /// instruction (shuffle) for loop invariant values and for the induction /// value. If this is the induction variable then we extend it to N, N+1, ... /// this is needed because each iteration in the loop corresponds to a SIMD /// element. virtual Value *getBroadcastInstrs(Value *V); /// This function adds (StartIdx, StartIdx + Step, StartIdx + 2*Step, ...) /// to each vector element of Val. The sequence starts at StartIndex. /// \p Opcode is relevant for FP induction variable. virtual Value *getStepVector(Value *Val, int StartIdx, Value *Step, Instruction::BinaryOps Opcode = Instruction::BinaryOpsEnd); /// Compute scalar induction steps. \p ScalarIV is the scalar induction /// variable on which to base the steps, \p Step is the size of the step, and /// \p EntryVal is the value from the original loop that maps to the steps. /// Note that \p EntryVal doesn't have to be an induction variable - it /// can also be a truncate instruction. void buildScalarSteps(Value *ScalarIV, Value *Step, Instruction *EntryVal, const InductionDescriptor &ID); /// Create a vector induction phi node based on an existing scalar one. \p /// EntryVal is the value from the original loop that maps to the vector phi /// node, and \p Step is the loop-invariant step. If \p EntryVal is a /// truncate instruction, instead of widening the original IV, we widen a /// version of the IV truncated to \p EntryVal's type. void createVectorIntOrFpInductionPHI(const InductionDescriptor &II, Value *Step, Instruction *EntryVal); /// Returns true if an instruction \p I should be scalarized instead of /// vectorized for the chosen vectorization factor. bool shouldScalarizeInstruction(Instruction *I) const; /// Returns true if we should generate a scalar version of \p IV. bool needsScalarInduction(Instruction *IV) const; /// If there is a cast involved in the induction variable \p ID, which should /// be ignored in the vectorized loop body, this function records the /// VectorLoopValue of the respective Phi also as the VectorLoopValue of the /// cast. 
We had already proved that the casted Phi is equal to the uncasted /// Phi in the vectorized loop (under a runtime guard), and therefore /// there is no need to vectorize the cast - the same value can be used in the /// vector loop for both the Phi and the cast. /// If \p VectorLoopValue is a scalarized value, \p Lane is also specified, /// Otherwise, \p VectorLoopValue is a widened/vectorized value. /// /// \p EntryVal is the value from the original loop that maps to the vector /// phi node and is used to distinguish what is the IV currently being /// processed - original one (if \p EntryVal is a phi corresponding to the /// original IV) or the "newly-created" one based on the proof mentioned above /// (see also buildScalarSteps() and createVectorIntOrFPInductionPHI()). In the /// latter case \p EntryVal is a TruncInst and we must not record anything for /// that IV, but it's error-prone to expect callers of this routine to care /// about that, hence this explicit parameter. void recordVectorLoopValueForInductionCast(const InductionDescriptor &ID, const Instruction *EntryVal, Value *VectorLoopValue, unsigned Part, unsigned Lane = UINT_MAX); /// Generate a shuffle sequence that will reverse the vector Vec. virtual Value *reverseVector(Value *Vec); /// Returns (and creates if needed) the original loop trip count. Value *getOrCreateTripCount(Loop *NewLoop); /// Returns (and creates if needed) the trip count of the widened loop. Value *getOrCreateVectorTripCount(Loop *NewLoop); /// Returns a bitcasted value to the requested vector type. /// Also handles bitcasts of vector <-> vector types. Value *createBitOrPointerCast(Value *V, VectorType *DstVTy, const DataLayout &DL); /// Emit a bypass check to see if the vector trip count is zero, including if /// it overflows. void emitMinimumIterationCountCheck(Loop *L, BasicBlock *Bypass); /// Emit a bypass check to see if all of the SCEV assumptions we've /// had to make are correct. void emitSCEVChecks(Loop *L, BasicBlock *Bypass); /// Emit bypass checks to check any memory assumptions we may have made. void emitMemRuntimeChecks(Loop *L, BasicBlock *Bypass); /// Add additional metadata to \p To that was not present on \p Orig. /// /// Currently this is used to add the noalias annotations based on the /// inserted memchecks. Use this for instructions that are *cloned* into the /// vector loop. void addNewMetadata(Instruction *To, const Instruction *Orig); /// Add metadata from one instruction to another. /// /// This includes both the original MDs from \p From and additional ones (\see /// addNewMetadata). Use this for *newly created* instructions in the vector /// loop. void addMetadata(Instruction *To, Instruction *From); /// Similar to the previous function but it adds the metadata to a /// vector of instructions. void addMetadata(ArrayRef To, Instruction *From); /// The original loop. Loop *OrigLoop; /// A wrapper around ScalarEvolution used to add runtime SCEV checks. Applies /// dynamic knowledge to simplify SCEV expressions and converts them to a /// more usable form. PredicatedScalarEvolution &PSE; /// Loop Info. LoopInfo *LI; /// Dominator Tree. DominatorTree *DT; /// Alias Analysis. AliasAnalysis *AA; /// Target Library Info. const TargetLibraryInfo *TLI; /// Target Transform Info. const TargetTransformInfo *TTI; /// Assumption Cache. AssumptionCache *AC; /// Interface to emit optimization remarks. OptimizationRemarkEmitter *ORE; /// LoopVersioning. It's only set up (non-null) if memchecks were /// used. 
/// /// This is currently only used to add no-alias metadata based on the /// memchecks. The actually versioning is performed manually. std::unique_ptr LVer; /// The vectorization SIMD factor to use. Each vector will have this many /// vector elements. unsigned VF; /// The vectorization unroll factor to use. Each scalar is vectorized to this /// many different vector instructions. unsigned UF; /// The builder that we use IRBuilder<> Builder; // --- Vectorization state --- /// The vector-loop preheader. BasicBlock *LoopVectorPreHeader; /// The scalar-loop preheader. BasicBlock *LoopScalarPreHeader; /// Middle Block between the vector and the scalar. BasicBlock *LoopMiddleBlock; /// The ExitBlock of the scalar loop. BasicBlock *LoopExitBlock; /// The vector loop body. BasicBlock *LoopVectorBody; /// The scalar loop body. BasicBlock *LoopScalarBody; /// A list of all bypass blocks. The first block is the entry of the loop. SmallVector LoopBypassBlocks; /// The new Induction variable which was added to the new block. PHINode *Induction = nullptr; /// The induction variable of the old basic block. PHINode *OldInduction = nullptr; /// Maps values from the original loop to their corresponding values in the /// vectorized loop. A key value can map to either vector values, scalar /// values or both kinds of values, depending on whether the key was /// vectorized and scalarized. VectorizerValueMap VectorLoopValueMap; /// Store instructions that were predicated. SmallVector PredicatedInstructions; /// Trip count of the original loop. Value *TripCount = nullptr; /// Trip count of the widened loop (TripCount - TripCount % (VF*UF)) Value *VectorTripCount = nullptr; /// The legality analysis. LoopVectorizationLegality *Legal; /// The profitablity analysis. LoopVectorizationCostModel *Cost; // Record whether runtime checks are added. bool AddedSafetyChecks = false; // Holds the end values for each induction variable. We save the end values // so we can later fix-up the external users of the induction variables. DenseMap IVEndValues; }; class InnerLoopUnroller : public InnerLoopVectorizer { public: InnerLoopUnroller(Loop *OrigLoop, PredicatedScalarEvolution &PSE, LoopInfo *LI, DominatorTree *DT, const TargetLibraryInfo *TLI, const TargetTransformInfo *TTI, AssumptionCache *AC, OptimizationRemarkEmitter *ORE, unsigned UnrollFactor, LoopVectorizationLegality *LVL, LoopVectorizationCostModel *CM) : InnerLoopVectorizer(OrigLoop, PSE, LI, DT, TLI, TTI, AC, ORE, 1, UnrollFactor, LVL, CM) {} private: Value *getBroadcastInstrs(Value *V) override; Value *getStepVector(Value *Val, int StartIdx, Value *Step, Instruction::BinaryOps Opcode = Instruction::BinaryOpsEnd) override; Value *reverseVector(Value *Vec) override; }; } // end namespace llvm /// Look for a meaningful debug location on the instruction or it's /// operands. 
static Instruction *getDebugLocFromInstOrOperands(Instruction *I) {
  if (!I)
    return I;

  DebugLoc Empty;
  if (I->getDebugLoc() != Empty)
    return I;

  for (User::op_iterator OI = I->op_begin(), OE = I->op_end(); OI != OE; ++OI) {
    if (Instruction *OpInst = dyn_cast<Instruction>(*OI))
      if (OpInst->getDebugLoc() != Empty)
        return OpInst;
  }

  return I;
}

void InnerLoopVectorizer::setDebugLocFromInst(IRBuilder<> &B,
                                              const Value *Ptr) {
  if (const Instruction *Inst = dyn_cast_or_null<Instruction>(Ptr)) {
    const DILocation *DIL = Inst->getDebugLoc();
    if (DIL && Inst->getFunction()->isDebugInfoForProfiling() &&
        !isa<DbgInfoIntrinsic>(Inst))
      B.SetCurrentDebugLocation(DIL->cloneWithDuplicationFactor(UF * VF));
    else
      B.SetCurrentDebugLocation(DIL);
  } else
    B.SetCurrentDebugLocation(DebugLoc());
}

#ifndef NDEBUG
/// \return string containing a file name and a line # for the given loop.
static std::string getDebugLocString(const Loop *L) {
  std::string Result;
  if (L) {
    raw_string_ostream OS(Result);
    if (const DebugLoc LoopDbgLoc = L->getStartLoc())
      LoopDbgLoc.print(OS);
    else
      // Just print the module name.
      OS << L->getHeader()->getParent()->getParent()->getModuleIdentifier();
    OS.flush();
  }
  return Result;
}
#endif

void InnerLoopVectorizer::addNewMetadata(Instruction *To,
                                         const Instruction *Orig) {
  // If the loop was versioned with memchecks, add the corresponding no-alias
  // metadata.
  if (LVer && (isa<LoadInst>(Orig) || isa<StoreInst>(Orig)))
    LVer->annotateInstWithNoAlias(To, Orig);
}

void InnerLoopVectorizer::addMetadata(Instruction *To,
                                      Instruction *From) {
  propagateMetadata(To, From);
  addNewMetadata(To, From);
}

void InnerLoopVectorizer::addMetadata(ArrayRef<Value *> To,
                                      Instruction *From) {
  for (Value *V : To) {
    if (Instruction *I = dyn_cast<Instruction>(V))
      addMetadata(I, From);
  }
}

namespace llvm {

/// The group of interleaved loads/stores sharing the same stride and
/// close to each other.
///
/// Each member in this group has an index starting from 0, and the largest
/// index should be less than the interleave factor, which is equal to the
/// absolute value of the access's stride.
///
/// E.g. An interleaved load group of factor 4:
///        for (unsigned i = 0; i < 1024; i+=4) {
///          a = A[i];                           // Member of index 0
///          b = A[i+1];                         // Member of index 1
///          d = A[i+3];                         // Member of index 3
///          ...
///        }
///
///      An interleaved store group of factor 4:
///        for (unsigned i = 0; i < 1024; i+=4) {
///          ...
///          A[i]   = a;                         // Member of index 0
///          A[i+1] = b;                         // Member of index 1
///          A[i+2] = c;                         // Member of index 2
///          A[i+3] = d;                         // Member of index 3
///        }
///
/// Note: the interleaved load group could have gaps (missing members), but
/// the interleaved store group doesn't allow gaps.
class InterleaveGroup {
public:
  InterleaveGroup(Instruction *Instr, int Stride, unsigned Align)
      : Align(Align), InsertPos(Instr) {
    assert(Align && "The alignment should be non-zero");

    Factor = std::abs(Stride);
    assert(Factor > 1 && "Invalid interleave factor");

    Reverse = Stride < 0;
    Members[0] = Instr;
  }

  bool isReverse() const { return Reverse; }
  unsigned getFactor() const { return Factor; }
  unsigned getAlignment() const { return Align; }
  unsigned getNumMembers() const { return Members.size(); }

  /// Try to insert a new member \p Instr with index \p Index and
  /// alignment \p NewAlign. The index is relative to the leader and it could
  /// be negative if it is the new leader.
  ///
  /// \returns false if the instruction doesn't belong to the group.
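  ///
  /// Illustrative sketch (not part of the original source): for a group whose
  /// leader is a hypothetical access A[i+1] with factor 3, the leader starts
  /// at key 0. Inserting the member for A[i+2] uses Index 1 (key 1), while
  /// inserting the member for A[i] uses Index -1; that becomes the new
  /// smallest key, so A[i] becomes the new index-0 member and the former
  /// leader moves to index 1.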
  bool insertMember(Instruction *Instr, int Index, unsigned NewAlign) {
    assert(NewAlign && "The new member's alignment should be non-zero");

    int Key = Index + SmallestKey;

    // Skip if there is already a member with the same index.
    if (Members.count(Key))
      return false;

    if (Key > LargestKey) {
      // The largest index is always less than the interleave factor.
      if (Index >= static_cast<int>(Factor))
        return false;

      LargestKey = Key;
    } else if (Key < SmallestKey) {
      // The distance between the largest and smallest keys is always less
      // than the interleave factor.
      if (LargestKey - Key >= static_cast<int>(Factor))
        return false;

      SmallestKey = Key;
    }

    // It's always safe to select the minimum alignment.
    Align = std::min(Align, NewAlign);
    Members[Key] = Instr;
    return true;
  }

  /// Get the member with the given index \p Index.
  ///
  /// \returns nullptr if it contains no such member.
  Instruction *getMember(unsigned Index) const {
    int Key = SmallestKey + Index;
    if (!Members.count(Key))
      return nullptr;

    return Members.find(Key)->second;
  }

  /// Get the index for the given member. Unlike the key in the member
  /// map, the index starts from 0.
  unsigned getIndex(Instruction *Instr) const {
    for (auto I : Members)
      if (I.second == Instr)
        return I.first - SmallestKey;

    llvm_unreachable("InterleaveGroup contains no such member");
  }

  Instruction *getInsertPos() const { return InsertPos; }
  void setInsertPos(Instruction *Inst) { InsertPos = Inst; }

  /// Add metadata (e.g. alias info) from the instructions in this group to \p
  /// NewInst.
  ///
  /// FIXME: this function currently does not add noalias metadata a la
  /// addNewMetadata. To do that we need to compute the intersection of the
  /// noalias info from all members.
  void addMetadata(Instruction *NewInst) const {
    SmallVector<Value *, 4> VL;
    std::transform(Members.begin(), Members.end(), std::back_inserter(VL),
                   [](std::pair<int, Instruction *> p) { return p.second; });
    propagateMetadata(NewInst, VL);
  }

private:
  unsigned Factor; // Interleave Factor.
  bool Reverse;
  unsigned Align;
  DenseMap<int, Instruction *> Members;
  int SmallestKey = 0;
  int LargestKey = 0;

  // To avoid breaking dependences, vectorized instructions of an interleave
  // group should be inserted at either the first load or the last store in
  // program order.
  //
  // E.g. %even = load i32             // Insert Position
  //      %add = add i32 %even         // Use of %even
  //      %odd = load i32
  //
  //      store i32 %even
  //      %odd = add i32               // Def of %odd
  //      store i32 %odd               // Insert Position
  Instruction *InsertPos;
};

} // end namespace llvm

namespace {

/// Drive the analysis of interleaved memory accesses in the loop.
///
/// Use this class to analyze interleaved accesses only when we can vectorize
/// a loop. Otherwise it's meaningless to do analysis as the vectorization
/// on interleaved accesses is unsafe.
///
/// The analysis collects interleave groups and records the relationships
/// between the member and the group in a map.
class InterleavedAccessInfo {
public:
  InterleavedAccessInfo(PredicatedScalarEvolution &PSE, Loop *L,
                        DominatorTree *DT, LoopInfo *LI,
                        const LoopAccessInfo *LAI)
      : PSE(PSE), TheLoop(L), DT(DT), LI(LI), LAI(LAI) {}

  ~InterleavedAccessInfo() {
    SmallPtrSet<InterleaveGroup *, 4> DelSet;
    // Avoid releasing a pointer twice.
    for (auto &I : InterleaveGroupMap)
      DelSet.insert(I.second);
    for (auto *Ptr : DelSet)
      delete Ptr;
  }

  /// Analyze the interleaved accesses and collect them in interleave
  /// groups.
  void analyzeInterleaving();

  /// Check if \p Instr belongs to any interleave group.
bool isInterleaved(Instruction *Instr) const { return InterleaveGroupMap.count(Instr); } /// Get the interleave group that \p Instr belongs to. /// /// \returns nullptr if doesn't have such group. InterleaveGroup *getInterleaveGroup(Instruction *Instr) const { if (InterleaveGroupMap.count(Instr)) return InterleaveGroupMap.find(Instr)->second; return nullptr; } /// Returns true if an interleaved group that may access memory /// out-of-bounds requires a scalar epilogue iteration for correctness. bool requiresScalarEpilogue() const { return RequiresScalarEpilogue; } private: /// A wrapper around ScalarEvolution, used to add runtime SCEV checks. /// Simplifies SCEV expressions in the context of existing SCEV assumptions. /// The interleaved access analysis can also add new predicates (for example /// by versioning strides of pointers). PredicatedScalarEvolution &PSE; Loop *TheLoop; DominatorTree *DT; LoopInfo *LI; const LoopAccessInfo *LAI; /// True if the loop may contain non-reversed interleaved groups with /// out-of-bounds accesses. We ensure we don't speculatively access memory /// out-of-bounds by executing at least one scalar epilogue iteration. bool RequiresScalarEpilogue = false; /// Holds the relationships between the members and the interleave group. DenseMap InterleaveGroupMap; /// Holds dependences among the memory accesses in the loop. It maps a source /// access to a set of dependent sink accesses. DenseMap> Dependences; /// The descriptor for a strided memory access. struct StrideDescriptor { StrideDescriptor() = default; StrideDescriptor(int64_t Stride, const SCEV *Scev, uint64_t Size, unsigned Align) : Stride(Stride), Scev(Scev), Size(Size), Align(Align) {} // The access's stride. It is negative for a reverse access. int64_t Stride = 0; // The scalar expression of this access. const SCEV *Scev = nullptr; // The size of the memory object. uint64_t Size = 0; // The alignment of this access. unsigned Align = 0; }; /// A type for holding instructions and their stride descriptors. using StrideEntry = std::pair; /// Create a new interleave group with the given instruction \p Instr, /// stride \p Stride and alignment \p Align. /// /// \returns the newly created interleave group. InterleaveGroup *createInterleaveGroup(Instruction *Instr, int Stride, unsigned Align) { assert(!InterleaveGroupMap.count(Instr) && "Already in an interleaved access group"); InterleaveGroupMap[Instr] = new InterleaveGroup(Instr, Stride, Align); return InterleaveGroupMap[Instr]; } /// Release the group and remove all the relationships. void releaseGroup(InterleaveGroup *Group) { for (unsigned i = 0; i < Group->getFactor(); i++) if (Instruction *Member = Group->getMember(i)) InterleaveGroupMap.erase(Member); delete Group; } /// Collect all the accesses with a constant stride in program order. void collectConstStrideAccesses( MapVector &AccessStrideInfo, const ValueToValueMap &Strides); /// Returns true if \p Stride is allowed in an interleaved group. static bool isStrided(int Stride) { unsigned Factor = std::abs(Stride); return Factor >= 2 && Factor <= MaxInterleaveGroupFactor; } /// Returns true if \p BB is a predicated block. bool isPredicated(BasicBlock *BB) const { return LoopAccessInfo::blockNeedsPredication(BB, TheLoop, DT); } /// Returns true if LoopAccessInfo can be used for dependence queries. 
bool areDependencesValid() const { return LAI && LAI->getDepChecker().getDependences(); } /// Returns true if memory accesses \p A and \p B can be reordered, if /// necessary, when constructing interleaved groups. /// /// \p A must precede \p B in program order. We return false if reordering is /// not necessary or is prevented because \p A and \p B may be dependent. bool canReorderMemAccessesForInterleavedGroups(StrideEntry *A, StrideEntry *B) const { // Code motion for interleaved accesses can potentially hoist strided loads // and sink strided stores. The code below checks the legality of the // following two conditions: // // 1. Potentially moving a strided load (B) before any store (A) that // precedes B, or // // 2. Potentially moving a strided store (A) after any load or store (B) // that A precedes. // // It's legal to reorder A and B if we know there isn't a dependence from A // to B. Note that this determination is conservative since some // dependences could potentially be reordered safely. // A is potentially the source of a dependence. auto *Src = A->first; auto SrcDes = A->second; // B is potentially the sink of a dependence. auto *Sink = B->first; auto SinkDes = B->second; // Code motion for interleaved accesses can't violate WAR dependences. // Thus, reordering is legal if the source isn't a write. if (!Src->mayWriteToMemory()) return true; // At least one of the accesses must be strided. if (!isStrided(SrcDes.Stride) && !isStrided(SinkDes.Stride)) return true; // If dependence information is not available from LoopAccessInfo, // conservatively assume the instructions can't be reordered. if (!areDependencesValid()) return false; // If we know there is a dependence from source to sink, assume the // instructions can't be reordered. Otherwise, reordering is legal. return !Dependences.count(Src) || !Dependences.lookup(Src).count(Sink); } /// Collect the dependences from LoopAccessInfo. /// /// We process the dependences once during the interleaved access analysis to /// enable constant-time dependence queries. void collectDependences() { if (!areDependencesValid()) return; auto *Deps = LAI->getDepChecker().getDependences(); for (auto Dep : *Deps) Dependences[Dep.getSource(*LAI)].insert(Dep.getDestination(*LAI)); } }; } // end anonymous namespace static void emitMissedWarning(Function *F, Loop *L, const LoopVectorizeHints &LH, OptimizationRemarkEmitter *ORE) { LH.emitRemarkWithHints(); if (LH.getForce() == LoopVectorizeHints::FK_Enabled) { if (LH.getWidth() != 1) ORE->emit(DiagnosticInfoOptimizationFailure( DEBUG_TYPE, "FailedRequestedVectorization", L->getStartLoc(), L->getHeader()) << "loop not vectorized: " << "failed explicitly specified loop vectorization"); else if (LH.getInterleave() != 1) ORE->emit(DiagnosticInfoOptimizationFailure( DEBUG_TYPE, "FailedRequestedInterleaving", L->getStartLoc(), L->getHeader()) << "loop not interleaved: " << "failed explicitly specified loop interleaving"); } } namespace llvm { /// LoopVectorizationCostModel - estimates the expected speedups due to /// vectorization. /// In many cases vectorization is not profitable. This can happen because of /// a number of reasons. In this class we mainly attempt to predict the /// expected speedup/slowdowns due to the supported instruction set. We use the /// TargetTransformInfo to query the different backends for the cost of /// different operations. 
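///
/// Illustrative sketch only (an assumption about the overall flow, not the
/// exact implementation): selectVectorizationFactor() conceptually compares
/// the per-lane cost of every candidate width, roughly
///
///   float BestCost = expectedCost(1).first; unsigned BestVF = 1;
///   for (unsigned VF = 2; VF <= MaxVF; VF *= 2)
///     if (expectedCost(VF).first / (float)VF < BestCost)
///       { BestCost = expectedCost(VF).first / (float)VF; BestVF = VF; }
///
/// with the real code additionally handling forced widths and diagnostics.
/// BestCost/BestVF are hypothetical names used only for this sketch.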
class LoopVectorizationCostModel { public: LoopVectorizationCostModel(Loop *L, PredicatedScalarEvolution &PSE, LoopInfo *LI, LoopVectorizationLegality *Legal, const TargetTransformInfo &TTI, const TargetLibraryInfo *TLI, DemandedBits *DB, AssumptionCache *AC, OptimizationRemarkEmitter *ORE, const Function *F, const LoopVectorizeHints *Hints, InterleavedAccessInfo &IAI) : TheLoop(L), PSE(PSE), LI(LI), Legal(Legal), TTI(TTI), TLI(TLI), DB(DB), AC(AC), ORE(ORE), TheFunction(F), Hints(Hints), InterleaveInfo(IAI) {} /// \return An upper bound for the vectorization factor, or None if /// vectorization should be avoided up front. Optional computeMaxVF(bool OptForSize); /// \return The most profitable vectorization factor and the cost of that VF. /// This method checks every power of two up to MaxVF. If UserVF is not ZERO /// then this vectorization factor will be selected if vectorization is /// possible. VectorizationFactor selectVectorizationFactor(unsigned MaxVF); /// Setup cost-based decisions for user vectorization factor. void selectUserVectorizationFactor(unsigned UserVF) { collectUniformsAndScalars(UserVF); collectInstsToScalarize(UserVF); } /// \return The size (in bits) of the smallest and widest types in the code /// that needs to be vectorized. We ignore values that remain scalar such as /// 64 bit loop indices. std::pair getSmallestAndWidestTypes(); /// \return The desired interleave count. /// If interleave count has been specified by metadata it will be returned. /// Otherwise, the interleave count is computed and returned. VF and LoopCost /// are the selected vectorization factor and the cost of the selected VF. unsigned selectInterleaveCount(bool OptForSize, unsigned VF, unsigned LoopCost); /// Memory access instruction may be vectorized in more than one way. /// Form of instruction after vectorization depends on cost. /// This function takes cost-based decisions for Load/Store instructions /// and collects them in a map. This decisions map is used for building /// the lists of loop-uniform and loop-scalar instructions. /// The calculated cost is saved with widening decision in order to /// avoid redundant calculations. void setCostBasedWideningDecision(unsigned VF); /// A struct that represents some properties of the register usage /// of a loop. struct RegisterUsage { /// Holds the number of loop invariant values that are used in the loop. unsigned LoopInvariantRegs; /// Holds the maximum number of concurrent live intervals in the loop. unsigned MaxLocalUsers; }; /// \return Returns information about the register usages of the loop for the /// given vectorization factors. SmallVector calculateRegisterUsage(ArrayRef VFs); /// Collect values we want to ignore in the cost model. void collectValuesToIgnore(); /// \returns The smallest bitwidth each instruction can be represented with. /// The vector equivalents of these instructions should be truncated to this /// type. const MapVector &getMinimalBitwidths() const { return MinBWs; } /// \returns True if it is more profitable to scalarize instruction \p I for /// vectorization factor \p VF. bool isProfitableToScalarize(Instruction *I, unsigned VF) const { assert(VF > 1 && "Profitable to scalarize relevant only for VF > 1."); auto Scalars = InstsToScalarize.find(VF); assert(Scalars != InstsToScalarize.end() && "VF not yet analyzed for scalarization profitability"); return Scalars->second.count(I); } /// Returns true if \p I is known to be uniform after vectorization. 
bool isUniformAfterVectorization(Instruction *I, unsigned VF) const { if (VF == 1) return true; assert(Uniforms.count(VF) && "VF not yet analyzed for uniformity"); auto UniformsPerVF = Uniforms.find(VF); return UniformsPerVF->second.count(I); } /// Returns true if \p I is known to be scalar after vectorization. bool isScalarAfterVectorization(Instruction *I, unsigned VF) const { if (VF == 1) return true; assert(Scalars.count(VF) && "Scalar values are not calculated for VF"); auto ScalarsPerVF = Scalars.find(VF); return ScalarsPerVF->second.count(I); } /// \returns True if instruction \p I can be truncated to a smaller bitwidth /// for vectorization factor \p VF. bool canTruncateToMinimalBitwidth(Instruction *I, unsigned VF) const { return VF > 1 && MinBWs.count(I) && !isProfitableToScalarize(I, VF) && !isScalarAfterVectorization(I, VF); } /// Decision that was taken during cost calculation for memory instruction. enum InstWidening { CM_Unknown, CM_Widen, // For consecutive accesses with stride +1. CM_Widen_Reverse, // For consecutive accesses with stride -1. CM_Interleave, CM_GatherScatter, CM_Scalarize }; /// Save vectorization decision \p W and \p Cost taken by the cost model for /// instruction \p I and vector width \p VF. void setWideningDecision(Instruction *I, unsigned VF, InstWidening W, unsigned Cost) { assert(VF >= 2 && "Expected VF >=2"); WideningDecisions[std::make_pair(I, VF)] = std::make_pair(W, Cost); } /// Save vectorization decision \p W and \p Cost taken by the cost model for /// interleaving group \p Grp and vector width \p VF. void setWideningDecision(const InterleaveGroup *Grp, unsigned VF, InstWidening W, unsigned Cost) { assert(VF >= 2 && "Expected VF >=2"); /// Broadcast this decicion to all instructions inside the group. /// But the cost will be assigned to one instruction only. for (unsigned i = 0; i < Grp->getFactor(); ++i) { if (auto *I = Grp->getMember(i)) { if (Grp->getInsertPos() == I) WideningDecisions[std::make_pair(I, VF)] = std::make_pair(W, Cost); else WideningDecisions[std::make_pair(I, VF)] = std::make_pair(W, 0); } } } /// Return the cost model decision for the given instruction \p I and vector /// width \p VF. Return CM_Unknown if this instruction did not pass /// through the cost modeling. InstWidening getWideningDecision(Instruction *I, unsigned VF) { assert(VF >= 2 && "Expected VF >=2"); std::pair InstOnVF = std::make_pair(I, VF); auto Itr = WideningDecisions.find(InstOnVF); if (Itr == WideningDecisions.end()) return CM_Unknown; return Itr->second.first; } /// Return the vectorization cost for the given instruction \p I and vector /// width \p VF. unsigned getWideningCost(Instruction *I, unsigned VF) { assert(VF >= 2 && "Expected VF >=2"); std::pair InstOnVF = std::make_pair(I, VF); assert(WideningDecisions.count(InstOnVF) && "The cost is not calculated"); return WideningDecisions[InstOnVF].second; } /// Return True if instruction \p I is an optimizable truncate whose operand /// is an induction variable. Such a truncate will be removed by adding a new /// induction variable with the destination type. bool isOptimizableIVTruncate(Instruction *I, unsigned VF) { // If the instruction is not a truncate, return false. auto *Trunc = dyn_cast(I); if (!Trunc) return false; // Get the source and destination types of the truncate. Type *SrcTy = ToVectorTy(cast(I)->getSrcTy(), VF); Type *DestTy = ToVectorTy(cast(I)->getDestTy(), VF); // If the truncate is free for the given types, return false. 
Replacing a // free truncate with an induction variable would add an induction variable // update instruction to each iteration of the loop. We exclude from this // check the primary induction variable since it will need an update // instruction regardless. Value *Op = Trunc->getOperand(0); if (Op != Legal->getPrimaryInduction() && TTI.isTruncateFree(SrcTy, DestTy)) return false; // If the truncated value is not an induction variable, return false. return Legal->isInductionPhi(Op); } /// Collects the instructions to scalarize for each predicated instruction in /// the loop. void collectInstsToScalarize(unsigned VF); /// Collect Uniform and Scalar values for the given \p VF. /// The sets depend on CM decision for Load/Store instructions /// that may be vectorized as interleave, gather-scatter or scalarized. void collectUniformsAndScalars(unsigned VF) { // Do the analysis once. if (VF == 1 || Uniforms.count(VF)) return; setCostBasedWideningDecision(VF); collectLoopUniforms(VF); collectLoopScalars(VF); } /// Returns true if the target machine supports masked store operation /// for the given \p DataType and kind of access to \p Ptr. bool isLegalMaskedStore(Type *DataType, Value *Ptr) { return Legal->isConsecutivePtr(Ptr) && TTI.isLegalMaskedStore(DataType); } /// Returns true if the target machine supports masked load operation /// for the given \p DataType and kind of access to \p Ptr. bool isLegalMaskedLoad(Type *DataType, Value *Ptr) { return Legal->isConsecutivePtr(Ptr) && TTI.isLegalMaskedLoad(DataType); } /// Returns true if the target machine supports masked scatter operation /// for the given \p DataType. bool isLegalMaskedScatter(Type *DataType) { return TTI.isLegalMaskedScatter(DataType); } /// Returns true if the target machine supports masked gather operation /// for the given \p DataType. bool isLegalMaskedGather(Type *DataType) { return TTI.isLegalMaskedGather(DataType); } /// Returns true if the target machine can represent \p V as a masked gather /// or scatter operation. bool isLegalGatherOrScatter(Value *V) { bool LI = isa(V); bool SI = isa(V); if (!LI && !SI) return false; auto *Ty = getMemInstValueType(V); return (LI && isLegalMaskedGather(Ty)) || (SI && isLegalMaskedScatter(Ty)); } /// Returns true if \p I is an instruction that will be scalarized with /// predication. Such instructions include conditional stores and /// instructions that may divide by zero. bool isScalarWithPredication(Instruction *I); /// Returns true if \p I is a memory instruction with consecutive memory /// access that can be widened. bool memoryInstructionCanBeWidened(Instruction *I, unsigned VF = 1); /// Check if \p Instr belongs to any interleaved access group. bool isAccessInterleaved(Instruction *Instr) { return InterleaveInfo.isInterleaved(Instr); } /// Get the interleaved access group that \p Instr belongs to. const InterleaveGroup *getInterleavedAccessGroup(Instruction *Instr) { return InterleaveInfo.getInterleaveGroup(Instr); } /// Returns true if an interleaved group requires a scalar iteration /// to handle accesses with gaps. bool requiresScalarEpilogue() const { return InterleaveInfo.requiresScalarEpilogue(); } private: unsigned NumPredStores = 0; /// \return An upper bound for the vectorization factor, larger than zero. /// One is returned if vectorization should best be avoided due to cost. 
unsigned computeFeasibleMaxVF(bool OptForSize, unsigned ConstTripCount); /// The vectorization cost is a combination of the cost itself and a boolean /// indicating whether any of the contributing operations will actually /// operate on /// vector values after type legalization in the backend. If this latter value /// is /// false, then all operations will be scalarized (i.e. no vectorization has /// actually taken place). using VectorizationCostTy = std::pair; /// Returns the expected execution cost. The unit of the cost does /// not matter because we use the 'cost' units to compare different /// vector widths. The cost that is returned is *not* normalized by /// the factor width. VectorizationCostTy expectedCost(unsigned VF); /// Returns the execution time cost of an instruction for a given vector /// width. Vector width of one means scalar. VectorizationCostTy getInstructionCost(Instruction *I, unsigned VF); /// The cost-computation logic from getInstructionCost which provides /// the vector type as an output parameter. unsigned getInstructionCost(Instruction *I, unsigned VF, Type *&VectorTy); /// Calculate vectorization cost of memory instruction \p I. unsigned getMemoryInstructionCost(Instruction *I, unsigned VF); /// The cost computation for scalarized memory instruction. unsigned getMemInstScalarizationCost(Instruction *I, unsigned VF); /// The cost computation for interleaving group of memory instructions. unsigned getInterleaveGroupCost(Instruction *I, unsigned VF); /// The cost computation for Gather/Scatter instruction. unsigned getGatherScatterCost(Instruction *I, unsigned VF); /// The cost computation for widening instruction \p I with consecutive /// memory access. unsigned getConsecutiveMemOpCost(Instruction *I, unsigned VF); /// The cost calculation for Load instruction \p I with uniform pointer - /// scalar load + broadcast. unsigned getUniformMemOpCost(Instruction *I, unsigned VF); /// Returns whether the instruction is a load or store and will be a emitted /// as a vector operation. bool isConsecutiveLoadOrStore(Instruction *I); /// Returns true if an artificially high cost for emulated masked memrefs /// should be used. bool useEmulatedMaskMemRefHack(Instruction *I); /// Create an analysis remark that explains why vectorization failed /// /// \p RemarkName is the identifier for the remark. \return the remark object /// that can be streamed to. OptimizationRemarkAnalysis createMissedAnalysis(StringRef RemarkName) { return createLVMissedAnalysis(Hints->vectorizeAnalysisPassName(), RemarkName, TheLoop); } /// Map of scalar integer values to the smallest bitwidth they can be legally /// represented as. The vector equivalents of these values should be truncated /// to this type. MapVector MinBWs; /// A type representing the costs for instructions if they were to be /// scalarized rather than vectorized. The entries are Instruction-Cost /// pairs. using ScalarCostsTy = DenseMap; /// A set containing all BasicBlocks that are known to present after /// vectorization as a predicated block. SmallPtrSet PredicatedBBsAfterVectorization; /// A map holding scalar costs for different vectorization factors. The /// presence of a cost for an instruction in the mapping indicates that the /// instruction will be scalarized when vectorizing with the associated /// vectorization factor. The entries are VF-ScalarCostTy pairs. DenseMap InstsToScalarize; /// Holds the instructions known to be uniform after vectorization. /// The data is collected per VF. 
DenseMap> Uniforms; /// Holds the instructions known to be scalar after vectorization. /// The data is collected per VF. DenseMap> Scalars; /// Holds the instructions (address computations) that are forced to be /// scalarized. DenseMap> ForcedScalars; /// Returns the expected difference in cost from scalarizing the expression /// feeding a predicated instruction \p PredInst. The instructions to /// scalarize and their scalar costs are collected in \p ScalarCosts. A /// non-negative return value implies the expression will be scalarized. /// Currently, only single-use chains are considered for scalarization. int computePredInstDiscount(Instruction *PredInst, ScalarCostsTy &ScalarCosts, unsigned VF); /// Collect the instructions that are uniform after vectorization. An /// instruction is uniform if we represent it with a single scalar value in /// the vectorized loop corresponding to each vector iteration. Examples of /// uniform instructions include pointer operands of consecutive or /// interleaved memory accesses. Note that although uniformity implies an /// instruction will be scalar, the reverse is not true. In general, a /// scalarized instruction will be represented by VF scalar values in the /// vectorized loop, each corresponding to an iteration of the original /// scalar loop. void collectLoopUniforms(unsigned VF); /// Collect the instructions that are scalar after vectorization. An /// instruction is scalar if it is known to be uniform or will be scalarized /// during vectorization. Non-uniform scalarized instructions will be /// represented by VF values in the vectorized loop, each corresponding to an /// iteration of the original scalar loop. void collectLoopScalars(unsigned VF); /// Keeps cost model vectorization decision and cost for instructions. /// Right now it is used for memory instructions only. using DecisionList = DenseMap, std::pair>; DecisionList WideningDecisions; public: /// The loop that we evaluate. Loop *TheLoop; /// Predicated scalar evolution analysis. PredicatedScalarEvolution &PSE; /// Loop Info analysis. LoopInfo *LI; /// Vectorization legality. LoopVectorizationLegality *Legal; /// Vector target information. const TargetTransformInfo &TTI; /// Target Library Info. const TargetLibraryInfo *TLI; /// Demanded bits analysis. DemandedBits *DB; /// Assumption cache. AssumptionCache *AC; /// Interface to emit optimization remarks. OptimizationRemarkEmitter *ORE; const Function *TheFunction; /// Loop Vectorize Hint. const LoopVectorizeHints *Hints; /// The interleave access information contains groups of interleaved accesses /// with the same stride and close to each other. InterleavedAccessInfo &InterleaveInfo; /// Values to ignore in the cost model. SmallPtrSet ValuesToIgnore; /// Values to ignore in the cost model when VF > 1. SmallPtrSet VecValuesToIgnore; }; } // end namespace llvm // Return true if \p OuterLp is an outer loop annotated with hints for explicit // vectorization. The loop needs to be annotated with #pragma omp simd // simdlen(#) or #pragma clang vectorize(enable) vectorize_width(#). If the // vector length information is not provided, vectorization is not considered // explicit. Interleave hints are not allowed either. These limitations will be // relaxed in the future. // Please, note that we are currently forced to abuse the pragma 'clang // vectorize' semantics. 
This pragma provides *auto-vectorization hints* // (i.e., LV must check that vectorization is legal) whereas pragma 'omp simd' // provides *explicit vectorization hints* (LV can bypass legal checks and // assume that vectorization is legal). However, both hints are implemented // using the same metadata (llvm.loop.vectorize, processed by // LoopVectorizeHints). This will be fixed in the future when the native IR // representation for pragma 'omp simd' is introduced. static bool isExplicitVecOuterLoop(Loop *OuterLp, OptimizationRemarkEmitter *ORE) { assert(!OuterLp->empty() && "This is not an outer loop"); LoopVectorizeHints Hints(OuterLp, true /*DisableInterleaving*/, *ORE); // Only outer loops with an explicit vectorization hint are supported. // Unannotated outer loops are ignored. if (Hints.getForce() == LoopVectorizeHints::FK_Undefined) return false; Function *Fn = OuterLp->getHeader()->getParent(); if (!Hints.allowVectorization(Fn, OuterLp, false /*AlwaysVectorize*/)) { LLVM_DEBUG(dbgs() << "LV: Loop hints prevent outer loop vectorization.\n"); return false; } if (!Hints.getWidth()) { LLVM_DEBUG(dbgs() << "LV: Not vectorizing: No user vector width.\n"); emitMissedWarning(Fn, OuterLp, Hints, ORE); return false; } if (Hints.getInterleave() > 1) { // TODO: Interleave support is future work. LLVM_DEBUG(dbgs() << "LV: Not vectorizing: Interleave is not supported for " "outer loops.\n"); emitMissedWarning(Fn, OuterLp, Hints, ORE); return false; } return true; } static void collectSupportedLoops(Loop &L, LoopInfo *LI, OptimizationRemarkEmitter *ORE, SmallVectorImpl &V) { // Collect inner loops and outer loops without irreducible control flow. For // now, only collect outer loops that have explicit vectorization hints. If we // are stress testing the VPlan H-CFG construction, we collect the outermost // loop of every loop nest. if (L.empty() || VPlanBuildStressTest || (EnableVPlanNativePath && isExplicitVecOuterLoop(&L, ORE))) { LoopBlocksRPO RPOT(&L); RPOT.perform(LI); if (!containsIrreducibleCFG(RPOT, *LI)) { V.push_back(&L); // TODO: Collect inner loops inside marked outer loops in case // vectorization fails for the outer loop. Do not invoke // 'containsIrreducibleCFG' again for inner loops when the outer loop is // already known to be reducible. We can use an inherited attribute for // that. return; } } for (Loop *InnerL : L) collectSupportedLoops(*InnerL, LI, ORE, V); } namespace { /// The LoopVectorize Pass. struct LoopVectorize : public FunctionPass { /// Pass identification, replacement for typeid static char ID; LoopVectorizePass Impl; explicit LoopVectorize(bool NoUnrolling = false, bool AlwaysVectorize = true) : FunctionPass(ID) { Impl.DisableUnrolling = NoUnrolling; Impl.AlwaysVectorize = AlwaysVectorize; initializeLoopVectorizePass(*PassRegistry::getPassRegistry()); } bool runOnFunction(Function &F) override { if (skipFunction(F)) return false; auto *SE = &getAnalysis().getSE(); auto *LI = &getAnalysis().getLoopInfo(); auto *TTI = &getAnalysis().getTTI(F); auto *DT = &getAnalysis().getDomTree(); auto *BFI = &getAnalysis().getBFI(); auto *TLIP = getAnalysisIfAvailable(); auto *TLI = TLIP ? 
&TLIP->getTLI() : nullptr; auto *AA = &getAnalysis().getAAResults(); auto *AC = &getAnalysis().getAssumptionCache(F); auto *LAA = &getAnalysis(); auto *DB = &getAnalysis().getDemandedBits(); auto *ORE = &getAnalysis().getORE(); std::function GetLAA = [&](Loop &L) -> const LoopAccessInfo & { return LAA->getInfo(&L); }; return Impl.runImpl(F, *SE, *LI, *TTI, *DT, *BFI, TLI, *DB, *AA, *AC, GetLAA, *ORE); } void getAnalysisUsage(AnalysisUsage &AU) const override { AU.addRequired(); AU.addRequired(); AU.addRequired(); AU.addRequired(); AU.addRequired(); AU.addRequired(); AU.addRequired(); AU.addRequired(); AU.addRequired(); AU.addRequired(); AU.addPreserved(); AU.addPreserved(); AU.addPreserved(); AU.addPreserved(); } }; } // end anonymous namespace //===----------------------------------------------------------------------===// // Implementation of LoopVectorizationLegality, InnerLoopVectorizer and // LoopVectorizationCostModel and LoopVectorizationPlanner. //===----------------------------------------------------------------------===// Value *InnerLoopVectorizer::getBroadcastInstrs(Value *V) { // We need to place the broadcast of invariant variables outside the loop, // but only if it's proven safe to do so. Else, broadcast will be inside // vector loop body. Instruction *Instr = dyn_cast(V); bool SafeToHoist = OrigLoop->isLoopInvariant(V) && (!Instr || DT->dominates(Instr->getParent(), LoopVectorPreHeader)); // Place the code for broadcasting invariant variables in the new preheader. IRBuilder<>::InsertPointGuard Guard(Builder); if (SafeToHoist) Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator()); // Broadcast the scalar into all locations in the vector. Value *Shuf = Builder.CreateVectorSplat(VF, V, "broadcast"); return Shuf; } void InnerLoopVectorizer::createVectorIntOrFpInductionPHI( const InductionDescriptor &II, Value *Step, Instruction *EntryVal) { assert((isa(EntryVal) || isa(EntryVal)) && "Expected either an induction phi-node or a truncate of it!"); Value *Start = II.getStartValue(); // Construct the initial value of the vector IV in the vector loop preheader auto CurrIP = Builder.saveIP(); Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator()); if (isa(EntryVal)) { assert(Start->getType()->isIntegerTy() && "Truncation requires an integer type"); auto *TruncType = cast(EntryVal->getType()); Step = Builder.CreateTrunc(Step, TruncType); Start = Builder.CreateCast(Instruction::Trunc, Start, TruncType); } Value *SplatStart = Builder.CreateVectorSplat(VF, Start); Value *SteppedStart = getStepVector(SplatStart, 0, Step, II.getInductionOpcode()); // We create vector phi nodes for both integer and floating-point induction // variables. Here, we determine the kind of arithmetic we will perform. Instruction::BinaryOps AddOp; Instruction::BinaryOps MulOp; if (Step->getType()->isIntegerTy()) { AddOp = Instruction::Add; MulOp = Instruction::Mul; } else { AddOp = II.getInductionOpcode(); MulOp = Instruction::FMul; } // Multiply the vectorization factor by the step using integer or // floating-point arithmetic as appropriate. Value *ConstVF = getSignedIntOrFpConstant(Step->getType(), VF); Value *Mul = addFastMathFlag(Builder.CreateBinOp(MulOp, Step, ConstVF)); // Create a vector splat to use in the induction update. // // FIXME: If the step is non-constant, we create the vector splat with // IRBuilder. IRBuilder can constant-fold the multiply, but it doesn't // handle a constant vector splat. Value *SplatVF = isa(Mul) ? 
ConstantVector::getSplat(VF, cast(Mul)) : Builder.CreateVectorSplat(VF, Mul); Builder.restoreIP(CurrIP); // We may need to add the step a number of times, depending on the unroll // factor. The last of those goes into the PHI. PHINode *VecInd = PHINode::Create(SteppedStart->getType(), 2, "vec.ind", &*LoopVectorBody->getFirstInsertionPt()); VecInd->setDebugLoc(EntryVal->getDebugLoc()); Instruction *LastInduction = VecInd; for (unsigned Part = 0; Part < UF; ++Part) { VectorLoopValueMap.setVectorValue(EntryVal, Part, LastInduction); if (isa(EntryVal)) addMetadata(LastInduction, EntryVal); recordVectorLoopValueForInductionCast(II, EntryVal, LastInduction, Part); LastInduction = cast(addFastMathFlag( Builder.CreateBinOp(AddOp, LastInduction, SplatVF, "step.add"))); LastInduction->setDebugLoc(EntryVal->getDebugLoc()); } // Move the last step to the end of the latch block. This ensures consistent // placement of all induction updates. auto *LoopVectorLatch = LI->getLoopFor(LoopVectorBody)->getLoopLatch(); auto *Br = cast(LoopVectorLatch->getTerminator()); auto *ICmp = cast(Br->getCondition()); LastInduction->moveBefore(ICmp); LastInduction->setName("vec.ind.next"); VecInd->addIncoming(SteppedStart, LoopVectorPreHeader); VecInd->addIncoming(LastInduction, LoopVectorLatch); } bool InnerLoopVectorizer::shouldScalarizeInstruction(Instruction *I) const { return Cost->isScalarAfterVectorization(I, VF) || Cost->isProfitableToScalarize(I, VF); } bool InnerLoopVectorizer::needsScalarInduction(Instruction *IV) const { if (shouldScalarizeInstruction(IV)) return true; auto isScalarInst = [&](User *U) -> bool { auto *I = cast(U); return (OrigLoop->contains(I) && shouldScalarizeInstruction(I)); }; return llvm::any_of(IV->users(), isScalarInst); } void InnerLoopVectorizer::recordVectorLoopValueForInductionCast( const InductionDescriptor &ID, const Instruction *EntryVal, Value *VectorLoopVal, unsigned Part, unsigned Lane) { assert((isa(EntryVal) || isa(EntryVal)) && "Expected either an induction phi-node or a truncate of it!"); // This induction variable is not the phi from the original loop but the // newly-created IV based on the proof that casted Phi is equal to the // uncasted Phi in the vectorized loop (under a runtime guard possibly). It // re-uses the same InductionDescriptor that original IV uses but we don't // have to do any recording in this case - that is done when original IV is // processed. if (isa(EntryVal)) return; const SmallVectorImpl &Casts = ID.getCastInsts(); if (Casts.empty()) return; // Only the first Cast instruction in the Casts vector is of interest. // The rest of the Casts (if exist) have no uses outside the // induction update chain itself. Instruction *CastInst = *Casts.begin(); if (Lane < UINT_MAX) VectorLoopValueMap.setScalarValue(CastInst, {Part, Lane}, VectorLoopVal); else VectorLoopValueMap.setVectorValue(CastInst, Part, VectorLoopVal); } void InnerLoopVectorizer::widenIntOrFpInduction(PHINode *IV, TruncInst *Trunc) { assert((IV->getType()->isIntegerTy() || IV != OldInduction) && "Primary induction variable must have an integer type"); auto II = Legal->getInductionVars()->find(IV); assert(II != Legal->getInductionVars()->end() && "IV is not an induction"); auto ID = II->second; assert(IV->getType() == ID.getStartValue()->getType() && "Types must match"); // The scalar value to broadcast. This will be derived from the canonical // induction variable. Value *ScalarIV = nullptr; // The value from the original loop to which we are mapping the new induction // variable. 
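  // Illustrative example (not from the original source): for a loop such as
  //   for (int i = 0; i < n; ++i) use((short)i);
  // this routine may be called with Trunc set to the 'trunc i32 -> i16'
  // instruction. In that case EntryVal below is the truncate rather than the
  // phi itself, and the widened induction is created directly in the
  // narrower type.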
Instruction *EntryVal = Trunc ? cast(Trunc) : IV; // True if we have vectorized the induction variable. auto VectorizedIV = false; // Determine if we want a scalar version of the induction variable. This is // true if the induction variable itself is not widened, or if it has at // least one user in the loop that is not widened. auto NeedsScalarIV = VF > 1 && needsScalarInduction(EntryVal); // Generate code for the induction step. Note that induction steps are // required to be loop-invariant assert(PSE.getSE()->isLoopInvariant(ID.getStep(), OrigLoop) && "Induction step should be loop invariant"); auto &DL = OrigLoop->getHeader()->getModule()->getDataLayout(); Value *Step = nullptr; if (PSE.getSE()->isSCEVable(IV->getType())) { SCEVExpander Exp(*PSE.getSE(), DL, "induction"); Step = Exp.expandCodeFor(ID.getStep(), ID.getStep()->getType(), LoopVectorPreHeader->getTerminator()); } else { Step = cast(ID.getStep())->getValue(); } // Try to create a new independent vector induction variable. If we can't // create the phi node, we will splat the scalar induction variable in each // loop iteration. if (VF > 1 && !shouldScalarizeInstruction(EntryVal)) { createVectorIntOrFpInductionPHI(ID, Step, EntryVal); VectorizedIV = true; } // If we haven't yet vectorized the induction variable, or if we will create // a scalar one, we need to define the scalar induction variable and step // values. If we were given a truncation type, truncate the canonical // induction variable and step. Otherwise, derive these values from the // induction descriptor. if (!VectorizedIV || NeedsScalarIV) { ScalarIV = Induction; if (IV != OldInduction) { ScalarIV = IV->getType()->isIntegerTy() ? Builder.CreateSExtOrTrunc(Induction, IV->getType()) : Builder.CreateCast(Instruction::SIToFP, Induction, IV->getType()); ScalarIV = ID.transform(Builder, ScalarIV, PSE.getSE(), DL); ScalarIV->setName("offset.idx"); } if (Trunc) { auto *TruncType = cast(Trunc->getType()); assert(Step->getType()->isIntegerTy() && "Truncation requires an integer step"); ScalarIV = Builder.CreateTrunc(ScalarIV, TruncType); Step = Builder.CreateTrunc(Step, TruncType); } } // If we haven't yet vectorized the induction variable, splat the scalar // induction variable, and build the necessary step vectors. // TODO: Don't do it unless the vectorized IV is really required. if (!VectorizedIV) { Value *Broadcasted = getBroadcastInstrs(ScalarIV); for (unsigned Part = 0; Part < UF; ++Part) { Value *EntryPart = getStepVector(Broadcasted, VF * Part, Step, ID.getInductionOpcode()); VectorLoopValueMap.setVectorValue(EntryVal, Part, EntryPart); if (Trunc) addMetadata(EntryPart, Trunc); recordVectorLoopValueForInductionCast(ID, EntryVal, EntryPart, Part); } } // If an induction variable is only used for counting loop iterations or // calculating addresses, it doesn't need to be widened. Create scalar steps // that can be used by instructions we will later scalarize. Note that the // addition of the scalar steps will not increase the number of instructions // in the loop in the common case prior to InstCombine. We will be trading // one vector extract for each scalar step. if (NeedsScalarIV) buildScalarSteps(ScalarIV, Step, EntryVal, ID); } Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx, Value *Step, Instruction::BinaryOps BinOp) { // Create and check the types. 
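  // Illustrative example (not from the original source): for an integer IV
  // with VF = 4, StartIdx = 4 and step %s, the result is effectively
  //   Val + <4, 5, 6, 7> * splat(%s)
  // i.e. each lane of Val advanced by 4, 5, 6 and 7 steps respectively.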
assert(Val->getType()->isVectorTy() && "Must be a vector"); int VLen = Val->getType()->getVectorNumElements(); Type *STy = Val->getType()->getScalarType(); assert((STy->isIntegerTy() || STy->isFloatingPointTy()) && "Induction Step must be an integer or FP"); assert(Step->getType() == STy && "Step has wrong type"); SmallVector Indices; if (STy->isIntegerTy()) { // Create a vector of consecutive numbers from zero to VF. for (int i = 0; i < VLen; ++i) Indices.push_back(ConstantInt::get(STy, StartIdx + i)); // Add the consecutive indices to the vector value. Constant *Cv = ConstantVector::get(Indices); assert(Cv->getType() == Val->getType() && "Invalid consecutive vec"); Step = Builder.CreateVectorSplat(VLen, Step); assert(Step->getType() == Val->getType() && "Invalid step vec"); // FIXME: The newly created binary instructions should contain nsw/nuw flags, // which can be found from the original scalar operations. Step = Builder.CreateMul(Cv, Step); return Builder.CreateAdd(Val, Step, "induction"); } // Floating point induction. assert((BinOp == Instruction::FAdd || BinOp == Instruction::FSub) && "Binary Opcode should be specified for FP induction"); // Create a vector of consecutive numbers from zero to VF. for (int i = 0; i < VLen; ++i) Indices.push_back(ConstantFP::get(STy, (double)(StartIdx + i))); // Add the consecutive indices to the vector value. Constant *Cv = ConstantVector::get(Indices); Step = Builder.CreateVectorSplat(VLen, Step); // Floating point operations had to be 'fast' to enable the induction. FastMathFlags Flags; Flags.setFast(); Value *MulOp = Builder.CreateFMul(Cv, Step); if (isa(MulOp)) // Have to check, MulOp may be a constant cast(MulOp)->setFastMathFlags(Flags); Value *BOp = Builder.CreateBinOp(BinOp, Val, MulOp, "induction"); if (isa(BOp)) cast(BOp)->setFastMathFlags(Flags); return BOp; } void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step, Instruction *EntryVal, const InductionDescriptor &ID) { // We shouldn't have to build scalar steps if we aren't vectorizing. assert(VF > 1 && "VF should be greater than one"); // Get the value type and ensure it and the step have the same integer type. Type *ScalarIVTy = ScalarIV->getType()->getScalarType(); assert(ScalarIVTy == Step->getType() && "Val and Step should have the same type"); // We build scalar steps for both integer and floating-point induction // variables. Here, we determine the kind of arithmetic we will perform. Instruction::BinaryOps AddOp; Instruction::BinaryOps MulOp; if (ScalarIVTy->isIntegerTy()) { AddOp = Instruction::Add; MulOp = Instruction::Mul; } else { AddOp = ID.getInductionOpcode(); MulOp = Instruction::FMul; } // Determine the number of scalars we need to generate for each unroll // iteration. If EntryVal is uniform, we only need to generate the first // lane. Otherwise, we generate all VF values. unsigned Lanes = Cost->isUniformAfterVectorization(cast(EntryVal), VF) ? 1 : VF; // Compute the scalar steps and save the results in VectorLoopValueMap. 
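  // Illustrative example (not from the original source): with UF = 2, VF = 4
  // and a non-uniform EntryVal, the loop below emits eight scalars
  //   ScalarIV + (4 * Part + Lane) * Step,  Part in {0,1}, Lane in {0..3}
  // whereas a uniform EntryVal only needs lane 0 of each part.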
for (unsigned Part = 0; Part < UF; ++Part) { for (unsigned Lane = 0; Lane < Lanes; ++Lane) { auto *StartIdx = getSignedIntOrFpConstant(ScalarIVTy, VF * Part + Lane); auto *Mul = addFastMathFlag(Builder.CreateBinOp(MulOp, StartIdx, Step)); auto *Add = addFastMathFlag(Builder.CreateBinOp(AddOp, ScalarIV, Mul)); VectorLoopValueMap.setScalarValue(EntryVal, {Part, Lane}, Add); recordVectorLoopValueForInductionCast(ID, EntryVal, Add, Part, Lane); } } } Value *InnerLoopVectorizer::getOrCreateVectorValue(Value *V, unsigned Part) { assert(V != Induction && "The new induction variable should not be used."); assert(!V->getType()->isVectorTy() && "Can't widen a vector"); assert(!V->getType()->isVoidTy() && "Type does not produce a value"); // If we have a stride that is replaced by one, do it here. if (Legal->hasStride(V)) V = ConstantInt::get(V->getType(), 1); // If we have a vector mapped to this value, return it. if (VectorLoopValueMap.hasVectorValue(V, Part)) return VectorLoopValueMap.getVectorValue(V, Part); // If the value has not been vectorized, check if it has been scalarized // instead. If it has been scalarized, and we actually need the value in // vector form, we will construct the vector values on demand. if (VectorLoopValueMap.hasAnyScalarValue(V)) { Value *ScalarValue = VectorLoopValueMap.getScalarValue(V, {Part, 0}); // If we've scalarized a value, that value should be an instruction. auto *I = cast(V); // If we aren't vectorizing, we can just copy the scalar map values over to // the vector map. if (VF == 1) { VectorLoopValueMap.setVectorValue(V, Part, ScalarValue); return ScalarValue; } // Get the last scalar instruction we generated for V and Part. If the value // is known to be uniform after vectorization, this corresponds to lane zero // of the Part unroll iteration. Otherwise, the last instruction is the one // we created for the last vector lane of the Part unroll iteration. unsigned LastLane = Cost->isUniformAfterVectorization(I, VF) ? 0 : VF - 1; auto *LastInst = cast( VectorLoopValueMap.getScalarValue(V, {Part, LastLane})); // Set the insert point after the last scalarized instruction. This ensures // the insertelement sequence will directly follow the scalar definitions. auto OldIP = Builder.saveIP(); auto NewIP = std::next(BasicBlock::iterator(LastInst)); Builder.SetInsertPoint(&*NewIP); // However, if we are vectorizing, we need to construct the vector values. // If the value is known to be uniform after vectorization, we can just // broadcast the scalar value corresponding to lane zero for each unroll // iteration. Otherwise, we construct the vector values using insertelement // instructions. Since the resulting vectors are stored in // VectorLoopValueMap, we will only generate the insertelements once. Value *VectorValue = nullptr; if (Cost->isUniformAfterVectorization(I, VF)) { VectorValue = getBroadcastInstrs(ScalarValue); VectorLoopValueMap.setVectorValue(V, Part, VectorValue); } else { // Initialize packing with insertelements to start from undef. Value *Undef = UndefValue::get(VectorType::get(V->getType(), VF)); VectorLoopValueMap.setVectorValue(V, Part, Undef); for (unsigned Lane = 0; Lane < VF; ++Lane) packScalarIntoVectorValue(V, {Part, Lane}); VectorValue = VectorLoopValueMap.getVectorValue(V, Part); } Builder.restoreIP(OldIP); return VectorValue; } // If this scalar is unknown, assume that it is a constant or that it is // loop invariant. Broadcast V and save the value for future uses. 
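  // For illustration (an assumption, not from the original source): the
  // broadcast typically lowers to an insertelement of V into lane 0 followed
  // by a shufflevector with an all-zero mask, and getBroadcastInstrs hoists
  // that sequence to the vector-loop preheader when it can prove this is safe.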
Value *B = getBroadcastInstrs(V); VectorLoopValueMap.setVectorValue(V, Part, B); return B; } Value * InnerLoopVectorizer::getOrCreateScalarValue(Value *V, const VPIteration &Instance) { // If the value is not an instruction contained in the loop, it should // already be scalar. if (OrigLoop->isLoopInvariant(V)) return V; assert(Instance.Lane > 0 ? !Cost->isUniformAfterVectorization(cast(V), VF) : true && "Uniform values only have lane zero"); // If the value from the original loop has not been vectorized, it is // represented by UF x VF scalar values in the new loop. Return the requested // scalar value. if (VectorLoopValueMap.hasScalarValue(V, Instance)) return VectorLoopValueMap.getScalarValue(V, Instance); // If the value has not been scalarized, get its entry in VectorLoopValueMap // for the given unroll part. If this entry is not a vector type (i.e., the // vectorization factor is one), there is no need to generate an // extractelement instruction. auto *U = getOrCreateVectorValue(V, Instance.Part); if (!U->getType()->isVectorTy()) { assert(VF == 1 && "Value not scalarized has non-vector type"); return U; } // Otherwise, the value from the original loop has been vectorized and is // represented by UF vector values. Extract and return the requested scalar // value from the appropriate vector lane. return Builder.CreateExtractElement(U, Builder.getInt32(Instance.Lane)); } void InnerLoopVectorizer::packScalarIntoVectorValue( Value *V, const VPIteration &Instance) { assert(V != Induction && "The new induction variable should not be used."); assert(!V->getType()->isVectorTy() && "Can't pack a vector"); assert(!V->getType()->isVoidTy() && "Type does not produce a value"); Value *ScalarInst = VectorLoopValueMap.getScalarValue(V, Instance); Value *VectorValue = VectorLoopValueMap.getVectorValue(V, Instance.Part); VectorValue = Builder.CreateInsertElement(VectorValue, ScalarInst, Builder.getInt32(Instance.Lane)); VectorLoopValueMap.resetVectorValue(V, Instance.Part, VectorValue); } Value *InnerLoopVectorizer::reverseVector(Value *Vec) { assert(Vec->getType()->isVectorTy() && "Invalid type"); SmallVector ShuffleMask; for (unsigned i = 0; i < VF; ++i) ShuffleMask.push_back(Builder.getInt32(VF - i - 1)); return Builder.CreateShuffleVector(Vec, UndefValue::get(Vec->getType()), ConstantVector::get(ShuffleMask), "reverse"); } // Try to vectorize the interleave group that \p Instr belongs to. // // E.g. Translate following interleaved load group (factor = 3): // for (i = 0; i < N; i+=3) { // R = Pic[i]; // Member of index 0 // G = Pic[i+1]; // Member of index 1 // B = Pic[i+2]; // Member of index 2 // ... // do something to R, G, B // } // To: // %wide.vec = load <12 x i32> ; Read 4 tuples of R,G,B // %R.vec = shuffle %wide.vec, undef, <0, 3, 6, 9> ; R elements // %G.vec = shuffle %wide.vec, undef, <1, 4, 7, 10> ; G elements // %B.vec = shuffle %wide.vec, undef, <2, 5, 8, 11> ; B elements // // Or translate following interleaved store group (factor = 3): // for (i = 0; i < N; i+=3) { // ... 
do something to R, G, B // Pic[i] = R; // Member of index 0 // Pic[i+1] = G; // Member of index 1 // Pic[i+2] = B; // Member of index 2 // } // To: // %R_G.vec = shuffle %R.vec, %G.vec, <0, 1, 2, ..., 7> // %B_U.vec = shuffle %B.vec, undef, <0, 1, 2, 3, u, u, u, u> // %interleaved.vec = shuffle %R_G.vec, %B_U.vec, // <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11> ; Interleave R,G,B elements // store <12 x i32> %interleaved.vec ; Write 4 tuples of R,G,B void InnerLoopVectorizer::vectorizeInterleaveGroup(Instruction *Instr) { const InterleaveGroup *Group = Cost->getInterleavedAccessGroup(Instr); assert(Group && "Fail to get an interleaved access group."); // Skip if current instruction is not the insert position. if (Instr != Group->getInsertPos()) return; const DataLayout &DL = Instr->getModule()->getDataLayout(); Value *Ptr = getLoadStorePointerOperand(Instr); // Prepare for the vector type of the interleaved load/store. Type *ScalarTy = getMemInstValueType(Instr); unsigned InterleaveFactor = Group->getFactor(); Type *VecTy = VectorType::get(ScalarTy, InterleaveFactor * VF); Type *PtrTy = VecTy->getPointerTo(getMemInstAddressSpace(Instr)); // Prepare for the new pointers. setDebugLocFromInst(Builder, Ptr); SmallVector NewPtrs; unsigned Index = Group->getIndex(Instr); // If the group is reverse, adjust the index to refer to the last vector lane // instead of the first. We adjust the index from the first vector lane, // rather than directly getting the pointer for lane VF - 1, because the // pointer operand of the interleaved access is supposed to be uniform. For // uniform instructions, we're only required to generate a value for the // first vector lane in each unroll iteration. if (Group->isReverse()) Index += (VF - 1) * Group->getFactor(); bool InBounds = false; if (auto *gep = dyn_cast(Ptr->stripPointerCasts())) InBounds = gep->isInBounds(); for (unsigned Part = 0; Part < UF; Part++) { Value *NewPtr = getOrCreateScalarValue(Ptr, {Part, 0}); // Notice current instruction could be any index. Need to adjust the address // to the member of index 0. // // E.g. a = A[i+1]; // Member of index 1 (Current instruction) // b = A[i]; // Member of index 0 // Current pointer is pointed to A[i+1], adjust it to A[i]. // // E.g. A[i+1] = a; // Member of index 1 // A[i] = b; // Member of index 0 // A[i+2] = c; // Member of index 2 (Current instruction) // Current pointer is pointed to A[i+2], adjust it to A[i]. NewPtr = Builder.CreateGEP(NewPtr, Builder.getInt32(-Index)); if (InBounds) cast(NewPtr)->setIsInBounds(true); // Cast to the vector pointer type. NewPtrs.push_back(Builder.CreateBitCast(NewPtr, PtrTy)); } setDebugLocFromInst(Builder, Instr); Value *UndefVec = UndefValue::get(VecTy); // Vectorize the interleaved load group. if (isa(Instr)) { // For each unroll part, create a wide load for the group. SmallVector NewLoads; for (unsigned Part = 0; Part < UF; Part++) { auto *NewLoad = Builder.CreateAlignedLoad( NewPtrs[Part], Group->getAlignment(), "wide.vec"); Group->addMetadata(NewLoad); NewLoads.push_back(NewLoad); } // For each member in the group, shuffle out the appropriate data from the // wide loads. for (unsigned I = 0; I < InterleaveFactor; ++I) { Instruction *Member = Group->getMember(I); // Skip the gaps in the group. 
if (!Member) continue; Constant *StrideMask = createStrideMask(Builder, I, InterleaveFactor, VF); for (unsigned Part = 0; Part < UF; Part++) { Value *StridedVec = Builder.CreateShuffleVector( NewLoads[Part], UndefVec, StrideMask, "strided.vec"); // If this member has different type, cast the result type. if (Member->getType() != ScalarTy) { VectorType *OtherVTy = VectorType::get(Member->getType(), VF); StridedVec = createBitOrPointerCast(StridedVec, OtherVTy, DL); } if (Group->isReverse()) StridedVec = reverseVector(StridedVec); VectorLoopValueMap.setVectorValue(Member, Part, StridedVec); } } return; } // The sub vector type for current instruction. VectorType *SubVT = VectorType::get(ScalarTy, VF); // Vectorize the interleaved store group. for (unsigned Part = 0; Part < UF; Part++) { // Collect the stored vector from each member. SmallVector StoredVecs; for (unsigned i = 0; i < InterleaveFactor; i++) { // Interleaved store group doesn't allow a gap, so each index has a member Instruction *Member = Group->getMember(i); assert(Member && "Fail to get a member from an interleaved store group"); Value *StoredVec = getOrCreateVectorValue( cast(Member)->getValueOperand(), Part); if (Group->isReverse()) StoredVec = reverseVector(StoredVec); // If this member has different type, cast it to a unified type. if (StoredVec->getType() != SubVT) StoredVec = createBitOrPointerCast(StoredVec, SubVT, DL); StoredVecs.push_back(StoredVec); } // Concatenate all vectors into a wide vector. Value *WideVec = concatenateVectors(Builder, StoredVecs); // Interleave the elements in the wide vector. Constant *IMask = createInterleaveMask(Builder, VF, InterleaveFactor); Value *IVec = Builder.CreateShuffleVector(WideVec, UndefVec, IMask, "interleaved.vec"); Instruction *NewStoreInstr = Builder.CreateAlignedStore(IVec, NewPtrs[Part], Group->getAlignment()); Group->addMetadata(NewStoreInstr); } } void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr, VectorParts *BlockInMask) { // Attempt to issue a wide load. LoadInst *LI = dyn_cast(Instr); StoreInst *SI = dyn_cast(Instr); assert((LI || SI) && "Invalid Load/Store instruction"); LoopVectorizationCostModel::InstWidening Decision = Cost->getWideningDecision(Instr, VF); assert(Decision != LoopVectorizationCostModel::CM_Unknown && "CM decision should be taken at this point"); if (Decision == LoopVectorizationCostModel::CM_Interleave) return vectorizeInterleaveGroup(Instr); Type *ScalarDataTy = getMemInstValueType(Instr); Type *DataTy = VectorType::get(ScalarDataTy, VF); Value *Ptr = getLoadStorePointerOperand(Instr); unsigned Alignment = getMemInstAlignment(Instr); // An alignment of 0 means target abi alignment. We need to use the scalar's // target abi alignment in such a case. const DataLayout &DL = Instr->getModule()->getDataLayout(); if (!Alignment) Alignment = DL.getABITypeAlignment(ScalarDataTy); unsigned AddressSpace = getMemInstAddressSpace(Instr); // Determine if the pointer operand of the access is either consecutive or // reverse consecutive. bool Reverse = (Decision == LoopVectorizationCostModel::CM_Widen_Reverse); bool ConsecutiveStride = Reverse || (Decision == LoopVectorizationCostModel::CM_Widen); bool CreateGatherScatter = (Decision == LoopVectorizationCostModel::CM_GatherScatter); // Either Ptr feeds a vector load/store, or a vector GEP should feed a vector // gather/scatter. Otherwise Decision should have been to Scalarize. 
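  // As an illustrative sketch (not taken from the cost model itself), the
  // widening decisions that can reach this point roughly correspond to access
  // patterns like the following, for a loop over i:
  //
  //   a[i]       = ...;                  // consecutive         -> CM_Widen
  //   a[N - i]   = ...;                  // reverse consecutive  -> CM_Widen_Reverse
  //   a[b[i]]    = ...;                  // arbitrary indices    -> CM_GatherScatter
  //   Pic[3*i], Pic[3*i+1], Pic[3*i+2]   // interleave group     -> CM_Interleave
  //                                      //   (handled above)
  //
  // Anything else should have been decided as CM_Scalarize and never reach
  // this point.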
assert((ConsecutiveStride || CreateGatherScatter) && "The instruction should be scalarized"); // Handle consecutive loads/stores. if (ConsecutiveStride) Ptr = getOrCreateScalarValue(Ptr, {0, 0}); VectorParts Mask; bool isMaskRequired = BlockInMask; if (isMaskRequired) Mask = *BlockInMask; bool InBounds = false; if (auto *gep = dyn_cast( getLoadStorePointerOperand(Instr)->stripPointerCasts())) InBounds = gep->isInBounds(); const auto CreateVecPtr = [&](unsigned Part, Value *Ptr) -> Value * { // Calculate the pointer for the specific unroll-part. GetElementPtrInst *PartPtr = nullptr; if (Reverse) { // If the address is consecutive but reversed, then the // wide store needs to start at the last vector element. PartPtr = cast( Builder.CreateGEP(Ptr, Builder.getInt32(-Part * VF))); PartPtr->setIsInBounds(InBounds); PartPtr = cast( Builder.CreateGEP(PartPtr, Builder.getInt32(1 - VF))); PartPtr->setIsInBounds(InBounds); if (isMaskRequired) // Reverse of a null all-one mask is a null mask. Mask[Part] = reverseVector(Mask[Part]); } else { PartPtr = cast( Builder.CreateGEP(Ptr, Builder.getInt32(Part * VF))); PartPtr->setIsInBounds(InBounds); } return Builder.CreateBitCast(PartPtr, DataTy->getPointerTo(AddressSpace)); }; // Handle Stores: if (SI) { setDebugLocFromInst(Builder, SI); for (unsigned Part = 0; Part < UF; ++Part) { Instruction *NewSI = nullptr; Value *StoredVal = getOrCreateVectorValue(SI->getValueOperand(), Part); if (CreateGatherScatter) { Value *MaskPart = isMaskRequired ? Mask[Part] : nullptr; Value *VectorGep = getOrCreateVectorValue(Ptr, Part); NewSI = Builder.CreateMaskedScatter(StoredVal, VectorGep, Alignment, MaskPart); } else { if (Reverse) { // If we store to reverse consecutive memory locations, then we need // to reverse the order of elements in the stored value. StoredVal = reverseVector(StoredVal); // We don't want to update the value in the map as it might be used in // another expression. So don't call resetVectorValue(StoredVal). } auto *VecPtr = CreateVecPtr(Part, Ptr); if (isMaskRequired) NewSI = Builder.CreateMaskedStore(StoredVal, VecPtr, Alignment, Mask[Part]); else NewSI = Builder.CreateAlignedStore(StoredVal, VecPtr, Alignment); } addMetadata(NewSI, SI); } return; } // Handle loads. assert(LI && "Must have a load instruction"); setDebugLocFromInst(Builder, LI); for (unsigned Part = 0; Part < UF; ++Part) { Value *NewLI; if (CreateGatherScatter) { Value *MaskPart = isMaskRequired ? Mask[Part] : nullptr; Value *VectorGep = getOrCreateVectorValue(Ptr, Part); NewLI = Builder.CreateMaskedGather(VectorGep, Alignment, MaskPart, nullptr, "wide.masked.gather"); addMetadata(NewLI, LI); } else { auto *VecPtr = CreateVecPtr(Part, Ptr); if (isMaskRequired) NewLI = Builder.CreateMaskedLoad(VecPtr, Alignment, Mask[Part], UndefValue::get(DataTy), "wide.masked.load"); else NewLI = Builder.CreateAlignedLoad(VecPtr, Alignment, "wide.load"); // Add metadata to the load, but setVectorValue to the reverse shuffle. addMetadata(NewLI, LI); if (Reverse) NewLI = reverseVector(NewLI); } VectorLoopValueMap.setVectorValue(Instr, Part, NewLI); } } void InnerLoopVectorizer::scalarizeInstruction(Instruction *Instr, const VPIteration &Instance, bool IfPredicateInstr) { assert(!Instr->getType()->isAggregateType() && "Can't handle vectors"); setDebugLocFromInst(Builder, Instr); // Does this instruction return a value ? 
  bool IsVoidRetTy = Instr->getType()->isVoidTy();

  Instruction *Cloned = Instr->clone();
  if (!IsVoidRetTy)
    Cloned->setName(Instr->getName() + ".cloned");

  // Replace the operands of the cloned instructions with their scalar
  // equivalents in the new loop.
  for (unsigned op = 0, e = Instr->getNumOperands(); op != e; ++op) {
    auto *NewOp = getOrCreateScalarValue(Instr->getOperand(op), Instance);
    Cloned->setOperand(op, NewOp);
  }
  addNewMetadata(Cloned, Instr);

  // Place the cloned scalar in the new loop.
  Builder.Insert(Cloned);

  // Add the cloned scalar to the scalar map entry.
  VectorLoopValueMap.setScalarValue(Instr, Instance, Cloned);

  // If we just cloned a new assumption, add it to the assumption cache.
  if (auto *II = dyn_cast<IntrinsicInst>(Cloned))
    if (II->getIntrinsicID() == Intrinsic::assume)
      AC->registerAssumption(II);

  // End if-block.
  if (IfPredicateInstr)
    PredicatedInstructions.push_back(Cloned);
}

PHINode *InnerLoopVectorizer::createInductionVariable(Loop *L, Value *Start,
                                                      Value *End, Value *Step,
                                                      Instruction *DL) {
  BasicBlock *Header = L->getHeader();
  BasicBlock *Latch = L->getLoopLatch();
  // As we're just creating this loop, it's possible no latch exists
  // yet. If so, use the header as this will be a single block loop.
  if (!Latch)
    Latch = Header;

  IRBuilder<> Builder(&*Header->getFirstInsertionPt());
  Instruction *OldInst = getDebugLocFromInstOrOperands(OldInduction);
  setDebugLocFromInst(Builder, OldInst);
  auto *Induction = Builder.CreatePHI(Start->getType(), 2, "index");

  Builder.SetInsertPoint(Latch->getTerminator());
  setDebugLocFromInst(Builder, OldInst);

  // Create i+1 and fill the PHINode.
  Value *Next = Builder.CreateAdd(Induction, Step, "index.next");
  Induction->addIncoming(Start, L->getLoopPreheader());
  Induction->addIncoming(Next, Latch);
  // Create the compare.
  Value *ICmp = Builder.CreateICmpEQ(Next, End);
  Builder.CreateCondBr(ICmp, L->getExitBlock(), Header);

  // Now we have two terminators. Remove the old one from the block.
  Latch->getTerminator()->eraseFromParent();

  return Induction;
}

Value *InnerLoopVectorizer::getOrCreateTripCount(Loop *L) {
  if (TripCount)
    return TripCount;

  IRBuilder<> Builder(L->getLoopPreheader()->getTerminator());
  // Find the loop boundaries.
  ScalarEvolution *SE = PSE.getSE();
  const SCEV *BackedgeTakenCount = PSE.getBackedgeTakenCount();
  assert(BackedgeTakenCount != SE->getCouldNotCompute() &&
         "Invalid loop count");

  Type *IdxTy = Legal->getWidestInductionType();

  // The exit count might have the type of i64 while the phi is i32. This can
  // happen if we have an induction variable that is sign extended before the
  // compare. The only way that we get a backedge taken count is that the
  // induction variable was signed and as such will not overflow. In such a
  // case truncation is legal.
  if (BackedgeTakenCount->getType()->getPrimitiveSizeInBits() >
      IdxTy->getPrimitiveSizeInBits())
    BackedgeTakenCount = SE->getTruncateOrNoop(BackedgeTakenCount, IdxTy);
  BackedgeTakenCount = SE->getNoopOrZeroExtend(BackedgeTakenCount, IdxTy);

  // Get the total trip count from the count by adding 1.
  const SCEV *ExitCount = SE->getAddExpr(
      BackedgeTakenCount, SE->getOne(BackedgeTakenCount->getType()));

  const DataLayout &DL = L->getHeader()->getModule()->getDataLayout();

  // Expand the trip count and place the new instructions in the preheader.
  // Notice that the pre-header does not change, only the loop body.
  SCEVExpander Exp(*SE, DL, "induction");

  // Count holds the overall loop count (N).
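  // For example (illustrative only), a loop that runs n > 0 times, such as
  //
  //   for (i = 0; i != n; ++i) { ... }
  //
  // has a backedge-taken count of n - 1, so the trip count expanded below is
  // simply that SCEV plus one, i.e. an IR sequence along the lines of
  //
  //   %trip.count = add i64 %backedge.taken.count, 1
  //
  // The overflow case (backedge-taken count == UINT64_MAX) is caught later by
  // the minimum-iteration check, which then branches to the scalar loop.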
  TripCount =
      Exp.expandCodeFor(ExitCount, ExitCount->getType(),
                        L->getLoopPreheader()->getTerminator());

  if (TripCount->getType()->isPointerTy())
    TripCount =
        CastInst::CreatePointerCast(TripCount, IdxTy, "exitcount.ptrcnt.to.int",
                                    L->getLoopPreheader()->getTerminator());

  return TripCount;
}

Value *InnerLoopVectorizer::getOrCreateVectorTripCount(Loop *L) {
  if (VectorTripCount)
    return VectorTripCount;

  Value *TC = getOrCreateTripCount(L);
  IRBuilder<> Builder(L->getLoopPreheader()->getTerminator());

  // Now we need to generate the expression for the part of the loop that the
  // vectorized body will execute. This is equal to N - (N % Step) if scalar
  // iterations are not required for correctness, or N - Step, otherwise. Step
  // is equal to the vectorization factor (number of SIMD elements) times the
  // unroll factor (number of SIMD instructions).
  Constant *Step = ConstantInt::get(TC->getType(), VF * UF);
  Value *R = Builder.CreateURem(TC, Step, "n.mod.vf");

  // If there is a non-reversed interleaved group that may speculatively access
  // memory out-of-bounds, we need to ensure that there will be at least one
  // iteration of the scalar epilogue loop. Thus, if the step evenly divides
  // the trip count, we set the remainder to be equal to the step. If the step
  // does not evenly divide the trip count, no adjustment is necessary since
  // there will already be scalar iterations. Note that the minimum iterations
  // check ensures that N >= Step.
  if (VF > 1 && Cost->requiresScalarEpilogue()) {
    auto *IsZero = Builder.CreateICmpEQ(R, ConstantInt::get(R->getType(), 0));
    R = Builder.CreateSelect(IsZero, Step, R);
  }

  VectorTripCount = Builder.CreateSub(TC, R, "n.vec");

  return VectorTripCount;
}

Value *InnerLoopVectorizer::createBitOrPointerCast(Value *V, VectorType *DstVTy,
                                                   const DataLayout &DL) {
  // Verify that V is a vector type with same number of elements as DstVTy.
  unsigned VF = DstVTy->getNumElements();
  VectorType *SrcVecTy = cast<VectorType>(V->getType());
  assert((VF == SrcVecTy->getNumElements()) &&
         "Vector dimensions do not match");
  Type *SrcElemTy = SrcVecTy->getElementType();
  Type *DstElemTy = DstVTy->getElementType();
  assert((DL.getTypeSizeInBits(SrcElemTy) == DL.getTypeSizeInBits(DstElemTy)) &&
         "Vector elements must have same size");

  // Do a direct cast if element types are castable.
  if (CastInst::isBitOrNoopPointerCastable(SrcElemTy, DstElemTy, DL)) {
    return Builder.CreateBitOrPointerCast(V, DstVTy);
  }
  // V cannot be directly casted to desired vector type.
  // May happen when V is a floating point vector but DstVTy is a vector of
  // pointers or vice-versa. Handle this using a two-step bitcast using an
  // intermediate Integer type for the bitcast i.e. Ptr <-> Int <-> Float.
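  // For instance (a hypothetical example, assuming 64-bit pointers), casting
  // <4 x double> to <4 x i8*> is performed as
  //
  //   <4 x double>  --bitcast-->  <4 x i64>  --inttoptr-->  <4 x i8*>
  //
  // because a single bitcast between floating-point and pointer vectors is
  // not a legal IR cast.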
  assert((DstElemTy->isPointerTy() != SrcElemTy->isPointerTy()) &&
         "Only one type should be a pointer type");
  assert((DstElemTy->isFloatingPointTy() != SrcElemTy->isFloatingPointTy()) &&
         "Only one type should be a floating point type");
  Type *IntTy =
      IntegerType::getIntNTy(V->getContext(), DL.getTypeSizeInBits(SrcElemTy));
  VectorType *VecIntTy = VectorType::get(IntTy, VF);
  Value *CastVal = Builder.CreateBitOrPointerCast(V, VecIntTy);
  return Builder.CreateBitOrPointerCast(CastVal, DstVTy);
}

void InnerLoopVectorizer::emitMinimumIterationCountCheck(Loop *L,
                                                         BasicBlock *Bypass) {
  Value *Count = getOrCreateTripCount(L);
  BasicBlock *BB = L->getLoopPreheader();
  IRBuilder<> Builder(BB->getTerminator());

  // Generate code to check if the loop's trip count is less than VF * UF, or
  // equal to it in case a scalar epilogue is required; this implies that the
  // vector trip count is zero. This check also covers the case where adding
  // one to the backedge-taken count overflowed leading to an incorrect trip
  // count of zero. In this case we will also jump to the scalar loop.
  auto P = Cost->requiresScalarEpilogue() ? ICmpInst::ICMP_ULE
                                          : ICmpInst::ICMP_ULT;
  Value *CheckMinIters = Builder.CreateICmp(
      P, Count, ConstantInt::get(Count->getType(), VF * UF), "min.iters.check");

  BasicBlock *NewBB = BB->splitBasicBlock(BB->getTerminator(), "vector.ph");
  // Update dominator tree immediately if the generated block is a
  // LoopBypassBlock because SCEV expansions to generate loop bypass
  // checks may query it before the current function is finished.
  DT->addNewBlock(NewBB, BB);
  if (L->getParentLoop())
    L->getParentLoop()->addBasicBlockToLoop(NewBB, *LI);
  ReplaceInstWithInst(BB->getTerminator(),
                      BranchInst::Create(Bypass, NewBB, CheckMinIters));
  LoopBypassBlocks.push_back(BB);
}

void InnerLoopVectorizer::emitSCEVChecks(Loop *L, BasicBlock *Bypass) {
  BasicBlock *BB = L->getLoopPreheader();

  // Generate the code to check that the SCEV assumptions that we made.
  // We want the new basic block to start at the first instruction in a
  // sequence of instructions that form a check.
  SCEVExpander Exp(*PSE.getSE(), Bypass->getModule()->getDataLayout(),
                   "scev.check");
  Value *SCEVCheck =
      Exp.expandCodeForPredicate(&PSE.getUnionPredicate(), BB->getTerminator());

  if (auto *C = dyn_cast<ConstantInt>(SCEVCheck))
    if (C->isZero())
      return;

  // Create a new block containing the stride check.
  BB->setName("vector.scevcheck");
  auto *NewBB = BB->splitBasicBlock(BB->getTerminator(), "vector.ph");
  // Update dominator tree immediately if the generated block is a
  // LoopBypassBlock because SCEV expansions to generate loop bypass
  // checks may query it before the current function is finished.
  DT->addNewBlock(NewBB, BB);
  if (L->getParentLoop())
    L->getParentLoop()->addBasicBlockToLoop(NewBB, *LI);
  ReplaceInstWithInst(BB->getTerminator(),
                      BranchInst::Create(Bypass, NewBB, SCEVCheck));
  LoopBypassBlocks.push_back(BB);
  AddedSafetyChecks = true;
}

void InnerLoopVectorizer::emitMemRuntimeChecks(Loop *L, BasicBlock *Bypass) {
  BasicBlock *BB = L->getLoopPreheader();

  // Generate the code that checks in runtime if arrays overlap. We put the
  // checks into a separate block to make the more common case of few elements
  // faster.
  Instruction *FirstCheckInst;
  Instruction *MemRuntimeCheck;
  std::tie(FirstCheckInst, MemRuntimeCheck) =
      Legal->getLAI()->addRuntimeChecks(BB->getTerminator());
  if (!MemRuntimeCheck)
    return;

  // Create a new block containing the memory check.
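  // Conceptually (a sketch only, not the exact IR produced by LoopAccessInfo),
  // each pair of pointer groups A and B is tested for overlap of their
  // accessed ranges, e.g.
  //
  //   %overlap.AB = and (A.start < B.end), (B.start < A.end)
  //
  // the per-pair results are or'ed together, and a true result branches to
  // the scalar loop. The block created just below is named "vector.memcheck"
  // so these checks are easy to spot in the emitted IR.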
BB->setName("vector.memcheck"); auto *NewBB = BB->splitBasicBlock(BB->getTerminator(), "vector.ph"); // Update dominator tree immediately if the generated block is a // LoopBypassBlock because SCEV expansions to generate loop bypass // checks may query it before the current function is finished. DT->addNewBlock(NewBB, BB); if (L->getParentLoop()) L->getParentLoop()->addBasicBlockToLoop(NewBB, *LI); ReplaceInstWithInst(BB->getTerminator(), BranchInst::Create(Bypass, NewBB, MemRuntimeCheck)); LoopBypassBlocks.push_back(BB); AddedSafetyChecks = true; // We currently don't use LoopVersioning for the actual loop cloning but we // still use it to add the noalias metadata. LVer = llvm::make_unique(*Legal->getLAI(), OrigLoop, LI, DT, PSE.getSE()); LVer->prepareNoAliasMetadata(); } BasicBlock *InnerLoopVectorizer::createVectorizedLoopSkeleton() { /* In this function we generate a new loop. The new loop will contain the vectorized instructions while the old loop will continue to run the scalar remainder. [ ] <-- loop iteration number check. / | / v | [ ] <-- vector loop bypass (may consist of multiple blocks). | / | | / v || [ ] <-- vector pre header. |/ | | v | [ ] \ | [ ]_| <-- vector loop. | | | v | -[ ] <--- middle-block. | / | | / v -|- >[ ] <--- new preheader. | | | v | [ ] \ | [ ]_| <-- old scalar loop to handle remainder. \ | \ v >[ ] <-- exit block. ... */ BasicBlock *OldBasicBlock = OrigLoop->getHeader(); BasicBlock *VectorPH = OrigLoop->getLoopPreheader(); BasicBlock *ExitBlock = OrigLoop->getExitBlock(); assert(VectorPH && "Invalid loop structure"); assert(ExitBlock && "Must have an exit block"); // Some loops have a single integer induction variable, while other loops // don't. One example is c++ iterators that often have multiple pointer // induction variables. In the code below we also support a case where we // don't have a single induction variable. // // We try to obtain an induction variable from the original loop as hard // as possible. However if we don't find one that: // - is an integer // - counts from zero, stepping by one // - is the size of the widest induction variable type // then we create a new one. OldInduction = Legal->getPrimaryInduction(); Type *IdxTy = Legal->getWidestInductionType(); // Split the single block loop into the two loop structure described above. BasicBlock *VecBody = VectorPH->splitBasicBlock(VectorPH->getTerminator(), "vector.body"); BasicBlock *MiddleBlock = VecBody->splitBasicBlock(VecBody->getTerminator(), "middle.block"); BasicBlock *ScalarPH = MiddleBlock->splitBasicBlock(MiddleBlock->getTerminator(), "scalar.ph"); // Create and register the new vector loop. Loop *Lp = LI->AllocateLoop(); Loop *ParentLoop = OrigLoop->getParentLoop(); // Insert the new loop into the loop nest and register the new basic blocks // before calling any utilities such as SCEV that require valid LoopInfo. if (ParentLoop) { ParentLoop->addChildLoop(Lp); ParentLoop->addBasicBlockToLoop(ScalarPH, *LI); ParentLoop->addBasicBlockToLoop(MiddleBlock, *LI); } else { LI->addTopLevelLoop(Lp); } Lp->addBasicBlockToLoop(VecBody, *LI); // Find the loop boundaries. Value *Count = getOrCreateTripCount(Lp); Value *StartIdx = ConstantInt::get(IdxTy, 0); // Now, compare the new count to zero. If it is zero skip the vector loop and // jump to the scalar loop. This check also covers the case where the // backedge-taken count is uint##_max: adding one to it will overflow leading // to an incorrect trip count of zero. In this (rare) case we will also jump // to the scalar loop. 
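  // As a rough sketch (block and value names other than "min.iters.check",
  // "vector.ph" and "scalar.ph" are illustrative), with VF = 4 and UF = 2 the
  // guard emitted by the call below looks like:
  //
  //   %min.iters.check = icmp ult i64 %trip.count, 8
  //   br i1 %min.iters.check, label %scalar.ph, label %vector.ph
  //
  // (icmp ule is used instead when a scalar epilogue is required.)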
emitMinimumIterationCountCheck(Lp, ScalarPH); // Generate the code to check any assumptions that we've made for SCEV // expressions. emitSCEVChecks(Lp, ScalarPH); // Generate the code that checks in runtime if arrays overlap. We put the // checks into a separate block to make the more common case of few elements // faster. emitMemRuntimeChecks(Lp, ScalarPH); // Generate the induction variable. // The loop step is equal to the vectorization factor (num of SIMD elements) // times the unroll factor (num of SIMD instructions). Value *CountRoundDown = getOrCreateVectorTripCount(Lp); Constant *Step = ConstantInt::get(IdxTy, VF * UF); Induction = createInductionVariable(Lp, StartIdx, CountRoundDown, Step, getDebugLocFromInstOrOperands(OldInduction)); // We are going to resume the execution of the scalar loop. // Go over all of the induction variables that we found and fix the // PHIs that are left in the scalar version of the loop. // The starting values of PHI nodes depend on the counter of the last // iteration in the vectorized loop. // If we come from a bypass edge then we need to start from the original // start value. // This variable saves the new starting index for the scalar loop. It is used // to test if there are any tail iterations left once the vector loop has // completed. LoopVectorizationLegality::InductionList *List = Legal->getInductionVars(); for (auto &InductionEntry : *List) { PHINode *OrigPhi = InductionEntry.first; InductionDescriptor II = InductionEntry.second; // Create phi nodes to merge from the backedge-taken check block. PHINode *BCResumeVal = PHINode::Create( OrigPhi->getType(), 3, "bc.resume.val", ScalarPH->getTerminator()); // Copy original phi DL over to the new one. BCResumeVal->setDebugLoc(OrigPhi->getDebugLoc()); Value *&EndValue = IVEndValues[OrigPhi]; if (OrigPhi == OldInduction) { // We know what the end value is. EndValue = CountRoundDown; } else { IRBuilder<> B(Lp->getLoopPreheader()->getTerminator()); Type *StepType = II.getStep()->getType(); Instruction::CastOps CastOp = CastInst::getCastOpcode(CountRoundDown, true, StepType, true); Value *CRD = B.CreateCast(CastOp, CountRoundDown, StepType, "cast.crd"); const DataLayout &DL = OrigLoop->getHeader()->getModule()->getDataLayout(); EndValue = II.transform(B, CRD, PSE.getSE(), DL); EndValue->setName("ind.end"); } // The new PHI merges the original incoming value, in case of a bypass, // or the value at the end of the vectorized loop. BCResumeVal->addIncoming(EndValue, MiddleBlock); // Fix the scalar body counter (PHI node). unsigned BlockIdx = OrigPhi->getBasicBlockIndex(ScalarPH); // The old induction's phi node in the scalar body needs the truncated // value. for (BasicBlock *BB : LoopBypassBlocks) BCResumeVal->addIncoming(II.getStartValue(), BB); OrigPhi->setIncomingValue(BlockIdx, BCResumeVal); } // Add a check in the middle block to see if we have completed // all of the iterations in the first vector loop. // If (N - N%VF) == N, then we *don't* need to run the remainder. Value *CmpN = CmpInst::Create(Instruction::ICmp, CmpInst::ICMP_EQ, Count, CountRoundDown, "cmp.n", MiddleBlock->getTerminator()); ReplaceInstWithInst(MiddleBlock->getTerminator(), BranchInst::Create(ExitBlock, ScalarPH, CmpN)); // Get ready to start creating new instructions into the vectorized body. Builder.SetInsertPoint(&*VecBody->getFirstInsertionPt()); // Save the state. 
LoopVectorPreHeader = Lp->getLoopPreheader(); LoopScalarPreHeader = ScalarPH; LoopMiddleBlock = MiddleBlock; LoopExitBlock = ExitBlock; LoopVectorBody = VecBody; LoopScalarBody = OldBasicBlock; // Keep all loop hints from the original loop on the vector loop (we'll // replace the vectorizer-specific hints below). if (MDNode *LID = OrigLoop->getLoopID()) Lp->setLoopID(LID); LoopVectorizeHints Hints(Lp, true, *ORE); Hints.setAlreadyVectorized(); return LoopVectorPreHeader; } // Fix up external users of the induction variable. At this point, we are // in LCSSA form, with all external PHIs that use the IV having one input value, // coming from the remainder loop. We need those PHIs to also have a correct // value for the IV when arriving directly from the middle block. void InnerLoopVectorizer::fixupIVUsers(PHINode *OrigPhi, const InductionDescriptor &II, Value *CountRoundDown, Value *EndValue, BasicBlock *MiddleBlock) { // There are two kinds of external IV usages - those that use the value // computed in the last iteration (the PHI) and those that use the penultimate // value (the value that feeds into the phi from the loop latch). // We allow both, but they, obviously, have different values. assert(OrigLoop->getExitBlock() && "Expected a single exit block"); DenseMap MissingVals; // An external user of the last iteration's value should see the value that // the remainder loop uses to initialize its own IV. Value *PostInc = OrigPhi->getIncomingValueForBlock(OrigLoop->getLoopLatch()); for (User *U : PostInc->users()) { Instruction *UI = cast(U); if (!OrigLoop->contains(UI)) { assert(isa(UI) && "Expected LCSSA form"); MissingVals[UI] = EndValue; } } // An external user of the penultimate value need to see EndValue - Step. // The simplest way to get this is to recompute it from the constituent SCEVs, // that is Start + (Step * (CRD - 1)). for (User *U : OrigPhi->users()) { auto *UI = cast(U); if (!OrigLoop->contains(UI)) { const DataLayout &DL = OrigLoop->getHeader()->getModule()->getDataLayout(); assert(isa(UI) && "Expected LCSSA form"); IRBuilder<> B(MiddleBlock->getTerminator()); Value *CountMinusOne = B.CreateSub( CountRoundDown, ConstantInt::get(CountRoundDown->getType(), 1)); Value *CMO = !II.getStep()->getType()->isIntegerTy() ? B.CreateCast(Instruction::SIToFP, CountMinusOne, II.getStep()->getType()) : B.CreateSExtOrTrunc(CountMinusOne, II.getStep()->getType()); CMO->setName("cast.cmo"); Value *Escape = II.transform(B, CMO, PSE.getSE(), DL); Escape->setName("ind.escape"); MissingVals[UI] = Escape; } } for (auto &I : MissingVals) { PHINode *PHI = cast(I.first); // One corner case we have to handle is two IVs "chasing" each-other, // that is %IV2 = phi [...], [ %IV1, %latch ] // In this case, if IV1 has an external use, we need to avoid adding both // "last value of IV1" and "penultimate value of IV2". So, verify that we // don't already have an incoming value for the middle block. 
if (PHI->getBasicBlockIndex(MiddleBlock) == -1) PHI->addIncoming(I.second, MiddleBlock); } } namespace { struct CSEDenseMapInfo { static bool canHandle(const Instruction *I) { return isa(I) || isa(I) || isa(I) || isa(I); } static inline Instruction *getEmptyKey() { return DenseMapInfo::getEmptyKey(); } static inline Instruction *getTombstoneKey() { return DenseMapInfo::getTombstoneKey(); } static unsigned getHashValue(const Instruction *I) { assert(canHandle(I) && "Unknown instruction!"); return hash_combine(I->getOpcode(), hash_combine_range(I->value_op_begin(), I->value_op_end())); } static bool isEqual(const Instruction *LHS, const Instruction *RHS) { if (LHS == getEmptyKey() || RHS == getEmptyKey() || LHS == getTombstoneKey() || RHS == getTombstoneKey()) return LHS == RHS; return LHS->isIdenticalTo(RHS); } }; } // end anonymous namespace ///Perform cse of induction variable instructions. static void cse(BasicBlock *BB) { // Perform simple cse. SmallDenseMap CSEMap; for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E;) { Instruction *In = &*I++; if (!CSEDenseMapInfo::canHandle(In)) continue; // Check if we can replace this instruction with any of the // visited instructions. if (Instruction *V = CSEMap.lookup(In)) { In->replaceAllUsesWith(V); In->eraseFromParent(); continue; } CSEMap[In] = In; } } /// Estimate the overhead of scalarizing an instruction. This is a /// convenience wrapper for the type-based getScalarizationOverhead API. static unsigned getScalarizationOverhead(Instruction *I, unsigned VF, const TargetTransformInfo &TTI) { if (VF == 1) return 0; unsigned Cost = 0; Type *RetTy = ToVectorTy(I->getType(), VF); if (!RetTy->isVoidTy() && (!isa(I) || !TTI.supportsEfficientVectorElementLoadStore())) Cost += TTI.getScalarizationOverhead(RetTy, true, false); if (CallInst *CI = dyn_cast(I)) { SmallVector Operands(CI->arg_operands()); Cost += TTI.getOperandsScalarizationOverhead(Operands, VF); } else if (!isa(I) || !TTI.supportsEfficientVectorElementLoadStore()) { SmallVector Operands(I->operand_values()); Cost += TTI.getOperandsScalarizationOverhead(Operands, VF); } return Cost; } // Estimate cost of a call instruction CI if it were vectorized with factor VF. // Return the cost of the instruction, including scalarization overhead if it's // needed. The flag NeedToScalarize shows if the call needs to be scalarized - // i.e. either vector version isn't available, or is too expensive. static unsigned getVectorCallCost(CallInst *CI, unsigned VF, const TargetTransformInfo &TTI, const TargetLibraryInfo *TLI, bool &NeedToScalarize) { Function *F = CI->getCalledFunction(); StringRef FnName = CI->getCalledFunction()->getName(); Type *ScalarRetTy = CI->getType(); SmallVector Tys, ScalarTys; for (auto &ArgOp : CI->arg_operands()) ScalarTys.push_back(ArgOp->getType()); // Estimate cost of scalarized vector call. The source operands are assumed // to be vectors, so we need to extract individual elements from there, // execute VF scalar calls, and then gather the result into the vector return // value. unsigned ScalarCallCost = TTI.getCallInstrCost(F, ScalarRetTy, ScalarTys); if (VF == 1) return ScalarCallCost; // Compute corresponding vector type for return value and arguments. Type *RetTy = ToVectorTy(ScalarRetTy, VF); for (Type *ScalarTy : ScalarTys) Tys.push_back(ToVectorTy(ScalarTy, VF)); // Compute costs of unpacking argument values for the scalar calls and // packing the return values to a vector. 
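  // Worked example with made-up numbers: for VF = 4, a scalar call cost of 10
  // and a scalarization overhead of 8 (extracts for the arguments plus inserts
  // for the results), the scalarized estimate computed below is
  //
  //   Cost = 10 * 4 + 8 = 48
  //
  // and it is kept only if no cheaper vectorized variant of the call exists.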
unsigned ScalarizationCost = getScalarizationOverhead(CI, VF, TTI); unsigned Cost = ScalarCallCost * VF + ScalarizationCost; // If we can't emit a vector call for this function, then the currently found // cost is the cost we need to return. NeedToScalarize = true; if (!TLI || !TLI->isFunctionVectorizable(FnName, VF) || CI->isNoBuiltin()) return Cost; // If the corresponding vector cost is cheaper, return its cost. unsigned VectorCallCost = TTI.getCallInstrCost(nullptr, RetTy, Tys); if (VectorCallCost < Cost) { NeedToScalarize = false; return VectorCallCost; } return Cost; } // Estimate cost of an intrinsic call instruction CI if it were vectorized with // factor VF. Return the cost of the instruction, including scalarization // overhead if it's needed. static unsigned getVectorIntrinsicCost(CallInst *CI, unsigned VF, const TargetTransformInfo &TTI, const TargetLibraryInfo *TLI) { Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI); assert(ID && "Expected intrinsic call!"); FastMathFlags FMF; if (auto *FPMO = dyn_cast(CI)) FMF = FPMO->getFastMathFlags(); SmallVector Operands(CI->arg_operands()); return TTI.getIntrinsicInstrCost(ID, CI->getType(), Operands, FMF, VF); } static Type *smallestIntegerVectorType(Type *T1, Type *T2) { auto *I1 = cast(T1->getVectorElementType()); auto *I2 = cast(T2->getVectorElementType()); return I1->getBitWidth() < I2->getBitWidth() ? T1 : T2; } static Type *largestIntegerVectorType(Type *T1, Type *T2) { auto *I1 = cast(T1->getVectorElementType()); auto *I2 = cast(T2->getVectorElementType()); return I1->getBitWidth() > I2->getBitWidth() ? T1 : T2; } void InnerLoopVectorizer::truncateToMinimalBitwidths() { // For every instruction `I` in MinBWs, truncate the operands, create a // truncated version of `I` and reextend its result. InstCombine runs // later and will remove any ext/trunc pairs. SmallPtrSet Erased; for (const auto &KV : Cost->getMinimalBitwidths()) { // If the value wasn't vectorized, we must maintain the original scalar // type. The absence of the value from VectorLoopValueMap indicates that it // wasn't vectorized. if (!VectorLoopValueMap.hasAnyVectorValue(KV.first)) continue; for (unsigned Part = 0; Part < UF; ++Part) { Value *I = getOrCreateVectorValue(KV.first, Part); if (Erased.count(I) || I->use_empty() || !isa(I)) continue; Type *OriginalTy = I->getType(); Type *ScalarTruncatedTy = IntegerType::get(OriginalTy->getContext(), KV.second); Type *TruncatedTy = VectorType::get(ScalarTruncatedTy, OriginalTy->getVectorNumElements()); if (TruncatedTy == OriginalTy) continue; IRBuilder<> B(cast(I)); auto ShrinkOperand = [&](Value *V) -> Value * { if (auto *ZI = dyn_cast(V)) if (ZI->getSrcTy() == TruncatedTy) return ZI->getOperand(0); return B.CreateZExtOrTrunc(V, TruncatedTy); }; // The actual instruction modification depends on the instruction type, // unfortunately. Value *NewI = nullptr; if (auto *BO = dyn_cast(I)) { NewI = B.CreateBinOp(BO->getOpcode(), ShrinkOperand(BO->getOperand(0)), ShrinkOperand(BO->getOperand(1))); // Any wrapping introduced by shrinking this operation shouldn't be // considered undefined behavior. So, we can't unconditionally copy // arithmetic wrapping flags to NewI. 
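  // For example, an 'add nsw i32 %a, %b' that is proven to need only 8 bits
  // and is shrunk to i8 may now wrap (e.g. 127 + 1 == -128 in i8) even though
  // the original i32 addition could not, so the nsw/nuw flags are dropped
  // below. (Illustrative example, not derived from a specific test case.)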
cast(NewI)->copyIRFlags(I, /*IncludeWrapFlags=*/false); } else if (auto *CI = dyn_cast(I)) { NewI = B.CreateICmp(CI->getPredicate(), ShrinkOperand(CI->getOperand(0)), ShrinkOperand(CI->getOperand(1))); } else if (auto *SI = dyn_cast(I)) { NewI = B.CreateSelect(SI->getCondition(), ShrinkOperand(SI->getTrueValue()), ShrinkOperand(SI->getFalseValue())); } else if (auto *CI = dyn_cast(I)) { switch (CI->getOpcode()) { default: llvm_unreachable("Unhandled cast!"); case Instruction::Trunc: NewI = ShrinkOperand(CI->getOperand(0)); break; case Instruction::SExt: NewI = B.CreateSExtOrTrunc( CI->getOperand(0), smallestIntegerVectorType(OriginalTy, TruncatedTy)); break; case Instruction::ZExt: NewI = B.CreateZExtOrTrunc( CI->getOperand(0), smallestIntegerVectorType(OriginalTy, TruncatedTy)); break; } } else if (auto *SI = dyn_cast(I)) { auto Elements0 = SI->getOperand(0)->getType()->getVectorNumElements(); auto *O0 = B.CreateZExtOrTrunc( SI->getOperand(0), VectorType::get(ScalarTruncatedTy, Elements0)); auto Elements1 = SI->getOperand(1)->getType()->getVectorNumElements(); auto *O1 = B.CreateZExtOrTrunc( SI->getOperand(1), VectorType::get(ScalarTruncatedTy, Elements1)); NewI = B.CreateShuffleVector(O0, O1, SI->getMask()); } else if (isa(I) || isa(I)) { // Don't do anything with the operands, just extend the result. continue; } else if (auto *IE = dyn_cast(I)) { auto Elements = IE->getOperand(0)->getType()->getVectorNumElements(); auto *O0 = B.CreateZExtOrTrunc( IE->getOperand(0), VectorType::get(ScalarTruncatedTy, Elements)); auto *O1 = B.CreateZExtOrTrunc(IE->getOperand(1), ScalarTruncatedTy); NewI = B.CreateInsertElement(O0, O1, IE->getOperand(2)); } else if (auto *EE = dyn_cast(I)) { auto Elements = EE->getOperand(0)->getType()->getVectorNumElements(); auto *O0 = B.CreateZExtOrTrunc( EE->getOperand(0), VectorType::get(ScalarTruncatedTy, Elements)); NewI = B.CreateExtractElement(O0, EE->getOperand(2)); } else { // If we don't know what to do, be conservative and don't do anything. continue; } // Lastly, extend the result. NewI->takeName(cast(I)); Value *Res = B.CreateZExtOrTrunc(NewI, OriginalTy); I->replaceAllUsesWith(Res); cast(I)->eraseFromParent(); Erased.insert(I); VectorLoopValueMap.resetVectorValue(KV.first, Part, Res); } } // We'll have created a bunch of ZExts that are now parentless. Clean up. for (const auto &KV : Cost->getMinimalBitwidths()) { // If the value wasn't vectorized, we must maintain the original scalar // type. The absence of the value from VectorLoopValueMap indicates that it // wasn't vectorized. if (!VectorLoopValueMap.hasAnyVectorValue(KV.first)) continue; for (unsigned Part = 0; Part < UF; ++Part) { Value *I = getOrCreateVectorValue(KV.first, Part); ZExtInst *Inst = dyn_cast(I); if (Inst && Inst->use_empty()) { Value *NewI = Inst->getOperand(0); Inst->eraseFromParent(); VectorLoopValueMap.resetVectorValue(KV.first, Part, NewI); } } } } void InnerLoopVectorizer::fixVectorizedLoop() { // Insert truncates and extends for any truncated instructions as hints to // InstCombine. if (VF > 1) truncateToMinimalBitwidths(); // At this point every instruction in the original loop is widened to a // vector form. Now we need to fix the recurrences in the loop. These PHI // nodes are currently empty because we did not want to introduce cycles. // This is the second stage of vectorizing recurrences. fixCrossIterationPHIs(); // Update the dominator tree. 
// // FIXME: After creating the structure of the new loop, the dominator tree is // no longer up-to-date, and it remains that way until we update it // here. An out-of-date dominator tree is problematic for SCEV, // because SCEVExpander uses it to guide code generation. The // vectorizer use SCEVExpanders in several places. Instead, we should // keep the dominator tree up-to-date as we go. updateAnalysis(); // Fix-up external users of the induction variables. for (auto &Entry : *Legal->getInductionVars()) fixupIVUsers(Entry.first, Entry.second, getOrCreateVectorTripCount(LI->getLoopFor(LoopVectorBody)), IVEndValues[Entry.first], LoopMiddleBlock); fixLCSSAPHIs(); for (Instruction *PI : PredicatedInstructions) sinkScalarOperands(&*PI); // Remove redundant induction instructions. cse(LoopVectorBody); } void InnerLoopVectorizer::fixCrossIterationPHIs() { // In order to support recurrences we need to be able to vectorize Phi nodes. // Phi nodes have cycles, so we need to vectorize them in two stages. This is // stage #2: We now need to fix the recurrences by adding incoming edges to // the currently empty PHI nodes. At this point every instruction in the // original loop is widened to a vector form so we can use them to construct // the incoming edges. for (PHINode &Phi : OrigLoop->getHeader()->phis()) { // Handle first-order recurrences and reductions that need to be fixed. if (Legal->isFirstOrderRecurrence(&Phi)) fixFirstOrderRecurrence(&Phi); else if (Legal->isReductionVariable(&Phi)) fixReduction(&Phi); } } void InnerLoopVectorizer::fixFirstOrderRecurrence(PHINode *Phi) { // This is the second phase of vectorizing first-order recurrences. An // overview of the transformation is described below. Suppose we have the // following loop. // // for (int i = 0; i < n; ++i) // b[i] = a[i] - a[i - 1]; // // There is a first-order recurrence on "a". For this loop, the shorthand // scalar IR looks like: // // scalar.ph: // s_init = a[-1] // br scalar.body // // scalar.body: // i = phi [0, scalar.ph], [i+1, scalar.body] // s1 = phi [s_init, scalar.ph], [s2, scalar.body] // s2 = a[i] // b[i] = s2 - s1 // br cond, scalar.body, ... // // In this example, s1 is a recurrence because it's value depends on the // previous iteration. In the first phase of vectorization, we created a // temporary value for s1. We now complete the vectorization and produce the // shorthand vector IR shown below (for VF = 4, UF = 1). // // vector.ph: // v_init = vector(..., ..., ..., a[-1]) // br vector.body // // vector.body // i = phi [0, vector.ph], [i+4, vector.body] // v1 = phi [v_init, vector.ph], [v2, vector.body] // v2 = a[i, i+1, i+2, i+3]; // v3 = vector(v1(3), v2(0, 1, 2)) // b[i, i+1, i+2, i+3] = v2 - v3 // br cond, vector.body, middle.block // // middle.block: // x = v2(3) // br scalar.ph // // scalar.ph: // s_init = phi [x, middle.block], [a[-1], otherwise] // br scalar.body // // After execution completes the vector loop, we extract the next value of // the recurrence (x) to use as the initial value in the scalar loop. // Get the original loop preheader and single loop latch. auto *Preheader = OrigLoop->getLoopPreheader(); auto *Latch = OrigLoop->getLoopLatch(); // Get the initial and previous values of the scalar recurrence. auto *ScalarInit = Phi->getIncomingValueForBlock(Preheader); auto *Previous = Phi->getIncomingValueForBlock(Latch); // Create a vector from the initial value. 
auto *VectorInit = ScalarInit; if (VF > 1) { Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator()); VectorInit = Builder.CreateInsertElement( UndefValue::get(VectorType::get(VectorInit->getType(), VF)), VectorInit, Builder.getInt32(VF - 1), "vector.recur.init"); } // We constructed a temporary phi node in the first phase of vectorization. // This phi node will eventually be deleted. Builder.SetInsertPoint( cast(VectorLoopValueMap.getVectorValue(Phi, 0))); // Create a phi node for the new recurrence. The current value will either be // the initial value inserted into a vector or loop-varying vector value. auto *VecPhi = Builder.CreatePHI(VectorInit->getType(), 2, "vector.recur"); VecPhi->addIncoming(VectorInit, LoopVectorPreHeader); // Get the vectorized previous value of the last part UF - 1. It appears last // among all unrolled iterations, due to the order of their construction. Value *PreviousLastPart = getOrCreateVectorValue(Previous, UF - 1); // Set the insertion point after the previous value if it is an instruction. // Note that the previous value may have been constant-folded so it is not // guaranteed to be an instruction in the vector loop. Also, if the previous // value is a phi node, we should insert after all the phi nodes to avoid // breaking basic block verification. if (LI->getLoopFor(LoopVectorBody)->isLoopInvariant(PreviousLastPart) || isa(PreviousLastPart)) Builder.SetInsertPoint(&*LoopVectorBody->getFirstInsertionPt()); else Builder.SetInsertPoint( &*++BasicBlock::iterator(cast(PreviousLastPart))); // We will construct a vector for the recurrence by combining the values for // the current and previous iterations. This is the required shuffle mask. SmallVector ShuffleMask(VF); ShuffleMask[0] = Builder.getInt32(VF - 1); for (unsigned I = 1; I < VF; ++I) ShuffleMask[I] = Builder.getInt32(I + VF - 1); // The vector from which to take the initial value for the current iteration // (actual or unrolled). Initially, this is the vector phi node. Value *Incoming = VecPhi; // Shuffle the current and previous vector and update the vector parts. for (unsigned Part = 0; Part < UF; ++Part) { Value *PreviousPart = getOrCreateVectorValue(Previous, Part); Value *PhiPart = VectorLoopValueMap.getVectorValue(Phi, Part); auto *Shuffle = VF > 1 ? Builder.CreateShuffleVector(Incoming, PreviousPart, ConstantVector::get(ShuffleMask)) : Incoming; PhiPart->replaceAllUsesWith(Shuffle); cast(PhiPart)->eraseFromParent(); VectorLoopValueMap.resetVectorValue(Phi, Part, Shuffle); Incoming = PreviousPart; } // Fix the latch value of the new recurrence in the vector loop. VecPhi->addIncoming(Incoming, LI->getLoopFor(LoopVectorBody)->getLoopLatch()); // Extract the last vector element in the middle block. This will be the // initial value for the recurrence when jumping to the scalar loop. auto *ExtractForScalar = Incoming; if (VF > 1) { Builder.SetInsertPoint(LoopMiddleBlock->getTerminator()); ExtractForScalar = Builder.CreateExtractElement( ExtractForScalar, Builder.getInt32(VF - 1), "vector.recur.extract"); } // Extract the second last element in the middle block if the // Phi is used outside the loop. We need to extract the phi itself // and not the last element (the phi update in the current iteration). This // will be the value when jumping to the exit block from the LoopMiddleBlock, // when the scalar loop is not run at all. 
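  // To illustrate with VF = 4 (lanes 0..3): lane 3 of `Incoming` holds the
  // last *update* of the recurrence, while the value the phi itself carried
  // in the final vector iteration lives in lane 2, hence the extract at index
  // VF - 2 below. (The lane numbering here is only an illustration.)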
Value *ExtractForPhiUsedOutsideLoop = nullptr; if (VF > 1) ExtractForPhiUsedOutsideLoop = Builder.CreateExtractElement( Incoming, Builder.getInt32(VF - 2), "vector.recur.extract.for.phi"); // When loop is unrolled without vectorizing, initialize // ExtractForPhiUsedOutsideLoop with the value just prior to unrolled value of // `Incoming`. This is analogous to the vectorized case above: extracting the // second last element when VF > 1. else if (UF > 1) ExtractForPhiUsedOutsideLoop = getOrCreateVectorValue(Previous, UF - 2); // Fix the initial value of the original recurrence in the scalar loop. Builder.SetInsertPoint(&*LoopScalarPreHeader->begin()); auto *Start = Builder.CreatePHI(Phi->getType(), 2, "scalar.recur.init"); for (auto *BB : predecessors(LoopScalarPreHeader)) { auto *Incoming = BB == LoopMiddleBlock ? ExtractForScalar : ScalarInit; Start->addIncoming(Incoming, BB); } Phi->setIncomingValue(Phi->getBasicBlockIndex(LoopScalarPreHeader), Start); Phi->setName("scalar.recur"); // Finally, fix users of the recurrence outside the loop. The users will need // either the last value of the scalar recurrence or the last value of the // vector recurrence we extracted in the middle block. Since the loop is in // LCSSA form, we just need to find all the phi nodes for the original scalar // recurrence in the exit block, and then add an edge for the middle block. for (PHINode &LCSSAPhi : LoopExitBlock->phis()) { if (LCSSAPhi.getIncomingValue(0) == Phi) { LCSSAPhi.addIncoming(ExtractForPhiUsedOutsideLoop, LoopMiddleBlock); } } } void InnerLoopVectorizer::fixReduction(PHINode *Phi) { Constant *Zero = Builder.getInt32(0); // Get it's reduction variable descriptor. assert(Legal->isReductionVariable(Phi) && "Unable to find the reduction variable"); RecurrenceDescriptor RdxDesc = (*Legal->getReductionVars())[Phi]; RecurrenceDescriptor::RecurrenceKind RK = RdxDesc.getRecurrenceKind(); TrackingVH ReductionStartValue = RdxDesc.getRecurrenceStartValue(); Instruction *LoopExitInst = RdxDesc.getLoopExitInstr(); RecurrenceDescriptor::MinMaxRecurrenceKind MinMaxKind = RdxDesc.getMinMaxRecurrenceKind(); setDebugLocFromInst(Builder, ReductionStartValue); // We need to generate a reduction vector from the incoming scalar. // To do so, we need to generate the 'identity' vector and override // one of the elements with the incoming scalar reduction. We need // to do it in the vector-loop preheader. Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator()); // This is the vector-clone of the value that leaves the loop. Type *VecTy = getOrCreateVectorValue(LoopExitInst, 0)->getType(); // Find the reduction identity variable. Zero for addition, or, xor, // one for multiplication, -1 for And. Value *Identity; Value *VectorStart; if (RK == RecurrenceDescriptor::RK_IntegerMinMax || RK == RecurrenceDescriptor::RK_FloatMinMax) { // MinMax reduction have the start value as their identify. if (VF == 1) { VectorStart = Identity = ReductionStartValue; } else { VectorStart = Identity = Builder.CreateVectorSplat(VF, ReductionStartValue, "minmax.ident"); } } else { // Handle other reduction kinds: Constant *Iden = RecurrenceDescriptor::getRecurrenceIdentity( RK, VecTy->getScalarType()); if (VF == 1) { Identity = Iden; // This vector is the Identity vector where the first element is the // incoming scalar reduction. VectorStart = ReductionStartValue; } else { Identity = ConstantVector::getSplat(VF, Iden); // This vector is the Identity vector where the first element is the // incoming scalar reduction. 
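  // E.g. (illustrative) for an integer add reduction with start value 5 and
  // VF = 4:
  //
  //   Identity    = <0, 0, 0, 0>
  //   VectorStart = <5, 0, 0, 0>
  //
  // so that summing all lanes at the end yields 5 plus the vectorized sum.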
VectorStart = Builder.CreateInsertElement(Identity, ReductionStartValue, Zero); } } // Fix the vector-loop phi. // Reductions do not have to start at zero. They can start with // any loop invariant values. BasicBlock *Latch = OrigLoop->getLoopLatch(); Value *LoopVal = Phi->getIncomingValueForBlock(Latch); for (unsigned Part = 0; Part < UF; ++Part) { Value *VecRdxPhi = getOrCreateVectorValue(Phi, Part); Value *Val = getOrCreateVectorValue(LoopVal, Part); // Make sure to add the reduction stat value only to the // first unroll part. Value *StartVal = (Part == 0) ? VectorStart : Identity; cast(VecRdxPhi)->addIncoming(StartVal, LoopVectorPreHeader); cast(VecRdxPhi) ->addIncoming(Val, LI->getLoopFor(LoopVectorBody)->getLoopLatch()); } // Before each round, move the insertion point right between // the PHIs and the values we are going to write. // This allows us to write both PHINodes and the extractelement // instructions. Builder.SetInsertPoint(&*LoopMiddleBlock->getFirstInsertionPt()); setDebugLocFromInst(Builder, LoopExitInst); // If the vector reduction can be performed in a smaller type, we truncate // then extend the loop exit value to enable InstCombine to evaluate the // entire expression in the smaller type. if (VF > 1 && Phi->getType() != RdxDesc.getRecurrenceType()) { Type *RdxVecTy = VectorType::get(RdxDesc.getRecurrenceType(), VF); Builder.SetInsertPoint( LI->getLoopFor(LoopVectorBody)->getLoopLatch()->getTerminator()); VectorParts RdxParts(UF); for (unsigned Part = 0; Part < UF; ++Part) { RdxParts[Part] = VectorLoopValueMap.getVectorValue(LoopExitInst, Part); Value *Trunc = Builder.CreateTrunc(RdxParts[Part], RdxVecTy); Value *Extnd = RdxDesc.isSigned() ? Builder.CreateSExt(Trunc, VecTy) : Builder.CreateZExt(Trunc, VecTy); for (Value::user_iterator UI = RdxParts[Part]->user_begin(); UI != RdxParts[Part]->user_end();) if (*UI != Trunc) { (*UI++)->replaceUsesOfWith(RdxParts[Part], Extnd); RdxParts[Part] = Extnd; } else { ++UI; } } Builder.SetInsertPoint(&*LoopMiddleBlock->getFirstInsertionPt()); for (unsigned Part = 0; Part < UF; ++Part) { RdxParts[Part] = Builder.CreateTrunc(RdxParts[Part], RdxVecTy); VectorLoopValueMap.resetVectorValue(LoopExitInst, Part, RdxParts[Part]); } } // Reduce all of the unrolled parts into a single vector. Value *ReducedPartRdx = VectorLoopValueMap.getVectorValue(LoopExitInst, 0); unsigned Op = RecurrenceDescriptor::getRecurrenceBinOp(RK); setDebugLocFromInst(Builder, ReducedPartRdx); for (unsigned Part = 1; Part < UF; ++Part) { Value *RdxPart = VectorLoopValueMap.getVectorValue(LoopExitInst, Part); if (Op != Instruction::ICmp && Op != Instruction::FCmp) // Floating point operations had to be 'fast' to enable the reduction. ReducedPartRdx = addFastMathFlag( Builder.CreateBinOp((Instruction::BinaryOps)Op, RdxPart, ReducedPartRdx, "bin.rdx")); else ReducedPartRdx = RecurrenceDescriptor::createMinMaxOp( Builder, MinMaxKind, ReducedPartRdx, RdxPart); } if (VF > 1) { bool NoNaN = Legal->hasFunNoNaNAttr(); ReducedPartRdx = createTargetReduction(Builder, TTI, RdxDesc, ReducedPartRdx, NoNaN); // If the reduction can be performed in a smaller type, we need to extend // the reduction to the wider type before we branch to the original loop. if (Phi->getType() != RdxDesc.getRecurrenceType()) ReducedPartRdx = RdxDesc.isSigned() ? Builder.CreateSExt(ReducedPartRdx, Phi->getType()) : Builder.CreateZExt(ReducedPartRdx, Phi->getType()); } // Create a phi node that merges control-flow from the backedge-taken check // block and the middle block. 
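  // Roughly (names other than "bc.merge.rdx" are illustrative, with one
  // incoming value per bypass block):
  //
  //   scalar.ph:
  //     %bc.merge.rdx = phi i32 [ %start.value, %min.iters.bypass ],
  //                             [ %reduced.rdx, %middle.block ]
  //
  // The scalar remainder loop then resumes the reduction from this phi.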
PHINode *BCBlockPhi = PHINode::Create(Phi->getType(), 2, "bc.merge.rdx", LoopScalarPreHeader->getTerminator()); for (unsigned I = 0, E = LoopBypassBlocks.size(); I != E; ++I) BCBlockPhi->addIncoming(ReductionStartValue, LoopBypassBlocks[I]); BCBlockPhi->addIncoming(ReducedPartRdx, LoopMiddleBlock); // Now, we need to fix the users of the reduction variable // inside and outside of the scalar remainder loop. // We know that the loop is in LCSSA form. We need to update the // PHI nodes in the exit blocks. for (PHINode &LCSSAPhi : LoopExitBlock->phis()) { // All PHINodes need to have a single entry edge, or two if // we already fixed them. assert(LCSSAPhi.getNumIncomingValues() < 3 && "Invalid LCSSA PHI"); // We found a reduction value exit-PHI. Update it with the // incoming bypass edge. if (LCSSAPhi.getIncomingValue(0) == LoopExitInst) LCSSAPhi.addIncoming(ReducedPartRdx, LoopMiddleBlock); } // end of the LCSSA phi scan. // Fix the scalar loop reduction variable with the incoming reduction sum // from the vector body and from the backedge value. int IncomingEdgeBlockIdx = Phi->getBasicBlockIndex(OrigLoop->getLoopLatch()); assert(IncomingEdgeBlockIdx >= 0 && "Invalid block index"); // Pick the other block. int SelfEdgeBlockIdx = (IncomingEdgeBlockIdx ? 0 : 1); Phi->setIncomingValue(SelfEdgeBlockIdx, BCBlockPhi); Phi->setIncomingValue(IncomingEdgeBlockIdx, LoopExitInst); } void InnerLoopVectorizer::fixLCSSAPHIs() { for (PHINode &LCSSAPhi : LoopExitBlock->phis()) { if (LCSSAPhi.getNumIncomingValues() == 1) { assert(OrigLoop->isLoopInvariant(LCSSAPhi.getIncomingValue(0)) && "Incoming value isn't loop invariant"); LCSSAPhi.addIncoming(LCSSAPhi.getIncomingValue(0), LoopMiddleBlock); } } } void InnerLoopVectorizer::sinkScalarOperands(Instruction *PredInst) { // The basic block and loop containing the predicated instruction. auto *PredBB = PredInst->getParent(); auto *VectorLoop = LI->getLoopFor(PredBB); // Initialize a worklist with the operands of the predicated instruction. SetVector Worklist(PredInst->op_begin(), PredInst->op_end()); // Holds instructions that we need to analyze again. An instruction may be // reanalyzed if we don't yet know if we can sink it or not. SmallVector InstsToReanalyze; // Returns true if a given use occurs in the predicated block. Phi nodes use // their operands in their corresponding predecessor blocks. auto isBlockOfUsePredicated = [&](Use &U) -> bool { auto *I = cast(U.getUser()); BasicBlock *BB = I->getParent(); if (auto *Phi = dyn_cast(I)) BB = Phi->getIncomingBlock( PHINode::getIncomingValueNumForOperand(U.getOperandNo())); return BB == PredBB; }; // Iteratively sink the scalarized operands of the predicated instruction // into the block we created for it. When an instruction is sunk, it's // operands are then added to the worklist. The algorithm ends after one pass // through the worklist doesn't sink a single instruction. bool Changed; do { // Add the instructions that need to be reanalyzed to the worklist, and // reset the changed indicator. Worklist.insert(InstsToReanalyze.begin(), InstsToReanalyze.end()); InstsToReanalyze.clear(); Changed = false; while (!Worklist.empty()) { auto *I = dyn_cast(Worklist.pop_back_val()); // We can't sink an instruction if it is a phi node, is already in the // predicated block, is not in the loop, or may have side effects. if (!I || isa(I) || I->getParent() == PredBB || !VectorLoop->contains(I) || I->mayHaveSideEffects()) continue; // It's legal to sink the instruction if all its uses occur in the // predicated block. 
Otherwise, there's nothing to do yet, and we may // need to reanalyze the instruction. if (!llvm::all_of(I->uses(), isBlockOfUsePredicated)) { InstsToReanalyze.push_back(I); continue; } // Move the instruction to the beginning of the predicated block, and add // it's operands to the worklist. I->moveBefore(&*PredBB->getFirstInsertionPt()); Worklist.insert(I->op_begin(), I->op_end()); // The sinking may have enabled other instructions to be sunk, so we will // need to iterate. Changed = true; } } while (Changed); } void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN, unsigned UF, unsigned VF) { assert(PN->getParent() == OrigLoop->getHeader() && "Non-header phis should have been handled elsewhere"); PHINode *P = cast(PN); // In order to support recurrences we need to be able to vectorize Phi nodes. // Phi nodes have cycles, so we need to vectorize them in two stages. This is // stage #1: We create a new vector PHI node with no incoming edges. We'll use // this value when we vectorize all of the instructions that use the PHI. if (Legal->isReductionVariable(P) || Legal->isFirstOrderRecurrence(P)) { for (unsigned Part = 0; Part < UF; ++Part) { // This is phase one of vectorizing PHIs. Type *VecTy = (VF == 1) ? PN->getType() : VectorType::get(PN->getType(), VF); Value *EntryPart = PHINode::Create( VecTy, 2, "vec.phi", &*LoopVectorBody->getFirstInsertionPt()); VectorLoopValueMap.setVectorValue(P, Part, EntryPart); } return; } setDebugLocFromInst(Builder, P); // This PHINode must be an induction variable. // Make sure that we know about it. assert(Legal->getInductionVars()->count(P) && "Not an induction variable"); InductionDescriptor II = Legal->getInductionVars()->lookup(P); const DataLayout &DL = OrigLoop->getHeader()->getModule()->getDataLayout(); // FIXME: The newly created binary instructions should contain nsw/nuw flags, // which can be found from the original scalar operations. switch (II.getKind()) { case InductionDescriptor::IK_NoInduction: llvm_unreachable("Unknown induction"); case InductionDescriptor::IK_IntInduction: case InductionDescriptor::IK_FpInduction: llvm_unreachable("Integer/fp induction is handled elsewhere."); case InductionDescriptor::IK_PtrInduction: { // Handle the pointer induction variable case. assert(P->getType()->isPointerTy() && "Unexpected type."); // This is the normalized GEP that starts counting at zero. Value *PtrInd = Induction; PtrInd = Builder.CreateSExtOrTrunc(PtrInd, II.getStep()->getType()); // Determine the number of scalars we need to generate for each unroll // iteration. If the instruction is uniform, we only need to generate the // first lane. Otherwise, we generate all VF values. unsigned Lanes = Cost->isUniformAfterVectorization(P, VF) ? 1 : VF; // These are the scalar results. Notice that we don't generate vector GEPs // because scalar GEPs result in better code. for (unsigned Part = 0; Part < UF; ++Part) { for (unsigned Lane = 0; Lane < Lanes; ++Lane) { Constant *Idx = ConstantInt::get(PtrInd->getType(), Lane + Part * VF); Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx); Value *SclrGep = II.transform(Builder, GlobalIdx, PSE.getSE(), DL); SclrGep->setName("next.gep"); VectorLoopValueMap.setScalarValue(P, {Part, Lane}, SclrGep); } } return; } } } /// A helper function for checking whether an integer division-related /// instruction may divide by zero (in which case it must be predicated if /// executed conditionally in the scalar code). /// TODO: It may be worthwhile to generalize and check isKnownNonZero(). 
/// Non-zero divisors that are non compile-time constants will not be /// converted into multiplication, so we will still end up scalarizing /// the division, but can do so w/o predication. static bool mayDivideByZero(Instruction &I) { assert((I.getOpcode() == Instruction::UDiv || I.getOpcode() == Instruction::SDiv || I.getOpcode() == Instruction::URem || I.getOpcode() == Instruction::SRem) && "Unexpected instruction"); Value *Divisor = I.getOperand(1); auto *CInt = dyn_cast(Divisor); return !CInt || CInt->isZero(); } void InnerLoopVectorizer::widenInstruction(Instruction &I) { switch (I.getOpcode()) { case Instruction::Br: case Instruction::PHI: llvm_unreachable("This instruction is handled by a different recipe."); case Instruction::GetElementPtr: { // Construct a vector GEP by widening the operands of the scalar GEP as // necessary. We mark the vector GEP 'inbounds' if appropriate. A GEP // results in a vector of pointers when at least one operand of the GEP // is vector-typed. Thus, to keep the representation compact, we only use // vector-typed operands for loop-varying values. auto *GEP = cast(&I); if (VF > 1 && OrigLoop->hasLoopInvariantOperands(GEP)) { // If we are vectorizing, but the GEP has only loop-invariant operands, // the GEP we build (by only using vector-typed operands for // loop-varying values) would be a scalar pointer. Thus, to ensure we // produce a vector of pointers, we need to either arbitrarily pick an // operand to broadcast, or broadcast a clone of the original GEP. // Here, we broadcast a clone of the original. // // TODO: If at some point we decide to scalarize instructions having // loop-invariant operands, this special case will no longer be // required. We would add the scalarization decision to // collectLoopScalars() and teach getVectorValue() to broadcast // the lane-zero scalar value. auto *Clone = Builder.Insert(GEP->clone()); for (unsigned Part = 0; Part < UF; ++Part) { Value *EntryPart = Builder.CreateVectorSplat(VF, Clone); VectorLoopValueMap.setVectorValue(&I, Part, EntryPart); addMetadata(EntryPart, GEP); } } else { // If the GEP has at least one loop-varying operand, we are sure to // produce a vector of pointers. But if we are only unrolling, we want // to produce a scalar GEP for each unroll part. Thus, the GEP we // produce with the code below will be scalar (if VF == 1) or vector // (otherwise). Note that for the unroll-only case, we still maintain // values in the vector mapping with initVector, as we do for other // instructions. for (unsigned Part = 0; Part < UF; ++Part) { // The pointer operand of the new GEP. If it's loop-invariant, we // won't broadcast it. auto *Ptr = OrigLoop->isLoopInvariant(GEP->getPointerOperand()) ? GEP->getPointerOperand() : getOrCreateVectorValue(GEP->getPointerOperand(), Part); // Collect all the indices for the new GEP. If any index is // loop-invariant, we won't broadcast it. SmallVector Indices; for (auto &U : make_range(GEP->idx_begin(), GEP->idx_end())) { if (OrigLoop->isLoopInvariant(U.get())) Indices.push_back(U.get()); else Indices.push_back(getOrCreateVectorValue(U.get(), Part)); } // Create the new GEP. Note that this GEP may be a scalar if VF == 1, // but it should be a vector, otherwise. auto *NewGEP = GEP->isInBounds() ? 
Builder.CreateInBoundsGEP(Ptr, Indices) : Builder.CreateGEP(Ptr, Indices); assert((VF == 1 || NewGEP->getType()->isVectorTy()) && "NewGEP is not a pointer vector"); VectorLoopValueMap.setVectorValue(&I, Part, NewGEP); addMetadata(NewGEP, GEP); } } break; } case Instruction::UDiv: case Instruction::SDiv: case Instruction::SRem: case Instruction::URem: case Instruction::Add: case Instruction::FAdd: case Instruction::Sub: case Instruction::FSub: case Instruction::Mul: case Instruction::FMul: case Instruction::FDiv: case Instruction::FRem: case Instruction::Shl: case Instruction::LShr: case Instruction::AShr: case Instruction::And: case Instruction::Or: case Instruction::Xor: { // Just widen binops. auto *BinOp = cast(&I); setDebugLocFromInst(Builder, BinOp); for (unsigned Part = 0; Part < UF; ++Part) { Value *A = getOrCreateVectorValue(BinOp->getOperand(0), Part); Value *B = getOrCreateVectorValue(BinOp->getOperand(1), Part); Value *V = Builder.CreateBinOp(BinOp->getOpcode(), A, B); if (BinaryOperator *VecOp = dyn_cast(V)) VecOp->copyIRFlags(BinOp); // Use this vector value for all users of the original instruction. VectorLoopValueMap.setVectorValue(&I, Part, V); addMetadata(V, BinOp); } break; } case Instruction::Select: { // Widen selects. // If the selector is loop invariant we can create a select // instruction with a scalar condition. Otherwise, use vector-select. auto *SE = PSE.getSE(); bool InvariantCond = SE->isLoopInvariant(PSE.getSCEV(I.getOperand(0)), OrigLoop); setDebugLocFromInst(Builder, &I); // The condition can be loop invariant but still defined inside the // loop. This means that we can't just use the original 'cond' value. // We have to take the 'vectorized' value and pick the first lane. // Instcombine will make this a no-op. auto *ScalarCond = getOrCreateScalarValue(I.getOperand(0), {0, 0}); for (unsigned Part = 0; Part < UF; ++Part) { Value *Cond = getOrCreateVectorValue(I.getOperand(0), Part); Value *Op0 = getOrCreateVectorValue(I.getOperand(1), Part); Value *Op1 = getOrCreateVectorValue(I.getOperand(2), Part); Value *Sel = Builder.CreateSelect(InvariantCond ? ScalarCond : Cond, Op0, Op1); VectorLoopValueMap.setVectorValue(&I, Part, Sel); addMetadata(Sel, &I); } break; } case Instruction::ICmp: case Instruction::FCmp: { // Widen compares. Generate vector compares. bool FCmp = (I.getOpcode() == Instruction::FCmp); auto *Cmp = dyn_cast(&I); setDebugLocFromInst(Builder, Cmp); for (unsigned Part = 0; Part < UF; ++Part) { Value *A = getOrCreateVectorValue(Cmp->getOperand(0), Part); Value *B = getOrCreateVectorValue(Cmp->getOperand(1), Part); Value *C = nullptr; if (FCmp) { // Propagate fast math flags. IRBuilder<>::FastMathFlagGuard FMFG(Builder); Builder.setFastMathFlags(Cmp->getFastMathFlags()); C = Builder.CreateFCmp(Cmp->getPredicate(), A, B); } else { C = Builder.CreateICmp(Cmp->getPredicate(), A, B); } VectorLoopValueMap.setVectorValue(&I, Part, C); addMetadata(C, &I); } break; } case Instruction::ZExt: case Instruction::SExt: case Instruction::FPToUI: case Instruction::FPToSI: case Instruction::FPExt: case Instruction::PtrToInt: case Instruction::IntToPtr: case Instruction::SIToFP: case Instruction::UIToFP: case Instruction::Trunc: case Instruction::FPTrunc: case Instruction::BitCast: { auto *CI = dyn_cast(&I); setDebugLocFromInst(Builder, CI); /// Vectorize casts. Type *DestTy = (VF == 1) ? 
CI->getType() : VectorType::get(CI->getType(), VF); for (unsigned Part = 0; Part < UF; ++Part) { Value *A = getOrCreateVectorValue(CI->getOperand(0), Part); Value *Cast = Builder.CreateCast(CI->getOpcode(), A, DestTy); VectorLoopValueMap.setVectorValue(&I, Part, Cast); addMetadata(Cast, &I); } break; } case Instruction::Call: { // Ignore dbg intrinsics. if (isa(I)) break; setDebugLocFromInst(Builder, &I); Module *M = I.getParent()->getParent()->getParent(); auto *CI = cast(&I); StringRef FnName = CI->getCalledFunction()->getName(); Function *F = CI->getCalledFunction(); Type *RetTy = ToVectorTy(CI->getType(), VF); SmallVector Tys; for (Value *ArgOperand : CI->arg_operands()) Tys.push_back(ToVectorTy(ArgOperand->getType(), VF)); Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI); // The flag shows whether we use Intrinsic or a usual Call for vectorized // version of the instruction. // Is it beneficial to perform intrinsic call compared to lib call? bool NeedToScalarize; unsigned CallCost = getVectorCallCost(CI, VF, *TTI, TLI, NeedToScalarize); bool UseVectorIntrinsic = ID && getVectorIntrinsicCost(CI, VF, *TTI, TLI) <= CallCost; assert((UseVectorIntrinsic || !NeedToScalarize) && "Instruction should be scalarized elsewhere."); for (unsigned Part = 0; Part < UF; ++Part) { SmallVector Args; for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) { Value *Arg = CI->getArgOperand(i); // Some intrinsics have a scalar argument - don't replace it with a // vector. if (!UseVectorIntrinsic || !hasVectorInstrinsicScalarOpd(ID, i)) Arg = getOrCreateVectorValue(CI->getArgOperand(i), Part); Args.push_back(Arg); } Function *VectorF; if (UseVectorIntrinsic) { // Use vector version of the intrinsic. Type *TysForDecl[] = {CI->getType()}; if (VF > 1) TysForDecl[0] = VectorType::get(CI->getType()->getScalarType(), VF); VectorF = Intrinsic::getDeclaration(M, ID, TysForDecl); } else { // Use vector version of the library call. StringRef VFnName = TLI->getVectorizedFunction(FnName, VF); assert(!VFnName.empty() && "Vector function name is empty."); VectorF = M->getFunction(VFnName); if (!VectorF) { // Generate a declaration FunctionType *FTy = FunctionType::get(RetTy, Tys, false); VectorF = Function::Create(FTy, Function::ExternalLinkage, VFnName, M); VectorF->copyAttributesFrom(F); } } assert(VectorF && "Can't create vector function."); SmallVector OpBundles; CI->getOperandBundlesAsDefs(OpBundles); CallInst *V = Builder.CreateCall(VectorF, Args, OpBundles); if (isa(V)) V->copyFastMathFlags(CI); VectorLoopValueMap.setVectorValue(&I, Part, V); addMetadata(V, &I); } break; } default: // This instruction is not vectorized by simple widening. LLVM_DEBUG(dbgs() << "LV: Found an unhandled instruction: " << I); llvm_unreachable("Unhandled instruction!"); } // end of switch. } void InnerLoopVectorizer::updateAnalysis() { // Forget the original basic block. PSE.getSE()->forgetLoop(OrigLoop); // Update the dominator tree information. assert(DT->properlyDominates(LoopBypassBlocks.front(), LoopExitBlock) && "Entry does not dominate exit."); DT->addNewBlock(LoopMiddleBlock, LI->getLoopFor(LoopVectorBody)->getLoopLatch()); DT->addNewBlock(LoopScalarPreHeader, LoopBypassBlocks[0]); DT->changeImmediateDominator(LoopScalarBody, LoopScalarPreHeader); DT->changeImmediateDominator(LoopExitBlock, LoopBypassBlocks[0]); assert(DT->verify(DominatorTree::VerificationLevel::Fast)); } void LoopVectorizationCostModel::collectLoopScalars(unsigned VF) { // We should not collect Scalars more than once per VF. 
Right now, this // function is called from collectUniformsAndScalars(), which already does // this check. Collecting Scalars for VF=1 does not make any sense. assert(VF >= 2 && !Scalars.count(VF) && "This function should not be visited twice for the same VF"); SmallSetVector Worklist; // These sets are used to seed the analysis with pointers used by memory // accesses that will remain scalar. SmallSetVector ScalarPtrs; SmallPtrSet PossibleNonScalarPtrs; // A helper that returns true if the use of Ptr by MemAccess will be scalar. // The pointer operands of loads and stores will be scalar as long as the // memory access is not a gather or scatter operation. The value operand of a // store will remain scalar if the store is scalarized. auto isScalarUse = [&](Instruction *MemAccess, Value *Ptr) { InstWidening WideningDecision = getWideningDecision(MemAccess, VF); assert(WideningDecision != CM_Unknown && "Widening decision should be ready at this moment"); if (auto *Store = dyn_cast(MemAccess)) if (Ptr == Store->getValueOperand()) return WideningDecision == CM_Scalarize; assert(Ptr == getLoadStorePointerOperand(MemAccess) && "Ptr is neither a value or pointer operand"); return WideningDecision != CM_GatherScatter; }; // A helper that returns true if the given value is a bitcast or // getelementptr instruction contained in the loop. auto isLoopVaryingBitCastOrGEP = [&](Value *V) { return ((isa(V) && V->getType()->isPointerTy()) || isa(V)) && !TheLoop->isLoopInvariant(V); }; // A helper that evaluates a memory access's use of a pointer. If the use // will be a scalar use, and the pointer is only used by memory accesses, we // place the pointer in ScalarPtrs. Otherwise, the pointer is placed in // PossibleNonScalarPtrs. auto evaluatePtrUse = [&](Instruction *MemAccess, Value *Ptr) { // We only care about bitcast and getelementptr instructions contained in // the loop. if (!isLoopVaryingBitCastOrGEP(Ptr)) return; // If the pointer has already been identified as scalar (e.g., if it was // also identified as uniform), there's nothing to do. auto *I = cast(Ptr); if (Worklist.count(I)) return; // If the use of the pointer will be a scalar use, and all users of the // pointer are memory accesses, place the pointer in ScalarPtrs. Otherwise, // place the pointer in PossibleNonScalarPtrs. if (isScalarUse(MemAccess, Ptr) && llvm::all_of(I->users(), [&](User *U) { return isa(U) || isa(U); })) ScalarPtrs.insert(I); else PossibleNonScalarPtrs.insert(I); }; // We seed the scalars analysis with three classes of instructions: (1) // instructions marked uniform-after-vectorization, (2) bitcast and // getelementptr instructions used by memory accesses requiring a scalar use, // and (3) pointer induction variables and their update instructions (we // currently only scalarize these). // // (1) Add to the worklist all instructions that have been identified as // uniform-after-vectorization. Worklist.insert(Uniforms[VF].begin(), Uniforms[VF].end()); // (2) Add to the worklist all bitcast and getelementptr instructions used by // memory accesses requiring a scalar use. The pointer operands of loads and // stores will be scalar as long as the memory accesses is not a gather or // scatter operation. The value operand of a store will remain scalar if the // store is scalarized. 
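  // For example, if a store to A[i] will be scalarized or widened as a
  // consecutive access, the getelementptr computing its address is a scalar
  // use; if the store instead becomes a scatter, the address must be a vector
  // of pointers and the GEP is not placed in ScalarPtrs.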
for (auto *BB : TheLoop->blocks()) for (auto &I : *BB) { if (auto *Load = dyn_cast(&I)) { evaluatePtrUse(Load, Load->getPointerOperand()); } else if (auto *Store = dyn_cast(&I)) { evaluatePtrUse(Store, Store->getPointerOperand()); evaluatePtrUse(Store, Store->getValueOperand()); } } for (auto *I : ScalarPtrs) if (!PossibleNonScalarPtrs.count(I)) { LLVM_DEBUG(dbgs() << "LV: Found scalar instruction: " << *I << "\n"); Worklist.insert(I); } // (3) Add to the worklist all pointer induction variables and their update // instructions. // // TODO: Once we are able to vectorize pointer induction variables we should // no longer insert them into the worklist here. auto *Latch = TheLoop->getLoopLatch(); for (auto &Induction : *Legal->getInductionVars()) { auto *Ind = Induction.first; auto *IndUpdate = cast(Ind->getIncomingValueForBlock(Latch)); if (Induction.second.getKind() != InductionDescriptor::IK_PtrInduction) continue; Worklist.insert(Ind); Worklist.insert(IndUpdate); LLVM_DEBUG(dbgs() << "LV: Found scalar instruction: " << *Ind << "\n"); LLVM_DEBUG(dbgs() << "LV: Found scalar instruction: " << *IndUpdate << "\n"); } // Insert the forced scalars. // FIXME: Currently widenPHIInstruction() often creates a dead vector // induction variable when the PHI user is scalarized. if (ForcedScalars.count(VF)) for (auto *I : ForcedScalars.find(VF)->second) Worklist.insert(I); // Expand the worklist by looking through any bitcasts and getelementptr // instructions we've already identified as scalar. This is similar to the // expansion step in collectLoopUniforms(); however, here we're only // expanding to include additional bitcasts and getelementptr instructions. unsigned Idx = 0; while (Idx != Worklist.size()) { Instruction *Dst = Worklist[Idx++]; if (!isLoopVaryingBitCastOrGEP(Dst->getOperand(0))) continue; auto *Src = cast(Dst->getOperand(0)); if (llvm::all_of(Src->users(), [&](User *U) -> bool { auto *J = cast(U); return !TheLoop->contains(J) || Worklist.count(J) || ((isa(J) || isa(J)) && isScalarUse(J, Src)); })) { Worklist.insert(Src); LLVM_DEBUG(dbgs() << "LV: Found scalar instruction: " << *Src << "\n"); } } // An induction variable will remain scalar if all users of the induction // variable and induction variable update remain scalar. for (auto &Induction : *Legal->getInductionVars()) { auto *Ind = Induction.first; auto *IndUpdate = cast(Ind->getIncomingValueForBlock(Latch)); // We already considered pointer induction variables, so there's no reason // to look at their users again. // // TODO: Once we are able to vectorize pointer induction variables we // should no longer skip over them here. if (Induction.second.getKind() == InductionDescriptor::IK_PtrInduction) continue; // Determine if all users of the induction variable are scalar after // vectorization. auto ScalarInd = llvm::all_of(Ind->users(), [&](User *U) -> bool { auto *I = cast(U); return I == IndUpdate || !TheLoop->contains(I) || Worklist.count(I); }); if (!ScalarInd) continue; // Determine if all users of the induction variable update instruction are // scalar after vectorization. auto ScalarIndUpdate = llvm::all_of(IndUpdate->users(), [&](User *U) -> bool { auto *I = cast(U); return I == Ind || !TheLoop->contains(I) || Worklist.count(I); }); if (!ScalarIndUpdate) continue; // The induction variable and its update instruction will remain scalar. 
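    // For example, an induction variable whose only in-loop users are its own
    // update and address computations already known to be scalar stays
    // scalar; one that also feeds a widened arithmetic instruction does not.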
Worklist.insert(Ind); Worklist.insert(IndUpdate); LLVM_DEBUG(dbgs() << "LV: Found scalar instruction: " << *Ind << "\n"); LLVM_DEBUG(dbgs() << "LV: Found scalar instruction: " << *IndUpdate << "\n"); } Scalars[VF].insert(Worklist.begin(), Worklist.end()); } bool LoopVectorizationCostModel::isScalarWithPredication(Instruction *I) { if (!Legal->blockNeedsPredication(I->getParent())) return false; switch(I->getOpcode()) { default: break; case Instruction::Load: case Instruction::Store: { if (!Legal->isMaskRequired(I)) return false; auto *Ptr = getLoadStorePointerOperand(I); auto *Ty = getMemInstValueType(I); return isa(I) ? !(isLegalMaskedLoad(Ty, Ptr) || isLegalMaskedGather(Ty)) : !(isLegalMaskedStore(Ty, Ptr) || isLegalMaskedScatter(Ty)); } case Instruction::UDiv: case Instruction::SDiv: case Instruction::SRem: case Instruction::URem: return mayDivideByZero(*I); } return false; } bool LoopVectorizationCostModel::memoryInstructionCanBeWidened(Instruction *I, unsigned VF) { // Get and ensure we have a valid memory instruction. LoadInst *LI = dyn_cast(I); StoreInst *SI = dyn_cast(I); assert((LI || SI) && "Invalid memory instruction"); auto *Ptr = getLoadStorePointerOperand(I); // In order to be widened, the pointer should be consecutive, first of all. if (!Legal->isConsecutivePtr(Ptr)) return false; // If the instruction is a store located in a predicated block, it will be // scalarized. if (isScalarWithPredication(I)) return false; // If the instruction's allocated size doesn't equal it's type size, it // requires padding and will be scalarized. auto &DL = I->getModule()->getDataLayout(); auto *ScalarTy = LI ? LI->getType() : SI->getValueOperand()->getType(); if (hasIrregularType(ScalarTy, DL, VF)) return false; return true; } void LoopVectorizationCostModel::collectLoopUniforms(unsigned VF) { // We should not collect Uniforms more than once per VF. Right now, // this function is called from collectUniformsAndScalars(), which // already does this check. Collecting Uniforms for VF=1 does not make any // sense. assert(VF >= 2 && !Uniforms.count(VF) && "This function should not be visited twice for the same VF"); // Visit the list of Uniforms. If we'll not find any uniform value, we'll // not analyze again. Uniforms.count(VF) will return 1. Uniforms[VF].clear(); // We now know that the loop is vectorizable! // Collect instructions inside the loop that will remain uniform after // vectorization. // Global values, params and instructions outside of current loop are out of // scope. auto isOutOfScope = [&](Value *V) -> bool { Instruction *I = dyn_cast(V); return (!I || !TheLoop->contains(I)); }; SetVector Worklist; BasicBlock *Latch = TheLoop->getLoopLatch(); // Start with the conditional branch. If the branch condition is an // instruction contained in the loop that is only used by the branch, it is // uniform. auto *Cmp = dyn_cast(Latch->getTerminator()->getOperand(0)); if (Cmp && TheLoop->contains(Cmp) && Cmp->hasOneUse()) { Worklist.insert(Cmp); LLVM_DEBUG(dbgs() << "LV: Found uniform instruction: " << *Cmp << "\n"); } // Holds consecutive and consecutive-like pointers. Consecutive-like pointers // are pointers that are treated like consecutive pointers during // vectorization. The pointer operands of interleaved accesses are an // example. SmallSetVector ConsecutiveLikePtrs; // Holds pointer operands of instructions that are possibly non-uniform. 
  SmallPtrSet<Instruction *, 8> PossibleNonUniformPtrs;

  auto isUniformDecision = [&](Instruction *I, unsigned VF) {
    InstWidening WideningDecision = getWideningDecision(I, VF);
    assert(WideningDecision != CM_Unknown &&
           "Widening decision should be ready at this moment");

    return (WideningDecision == CM_Widen ||
            WideningDecision == CM_Widen_Reverse ||
            WideningDecision == CM_Interleave);
  };
  // Iterate over the instructions in the loop, and collect all
  // consecutive-like pointer operands in ConsecutiveLikePtrs. If it's possible
  // that a consecutive-like pointer operand will be scalarized, we collect it
  // in PossibleNonUniformPtrs instead. We use two sets here because a single
  // getelementptr instruction can be used by both vectorized and scalarized
  // memory instructions. For example, if a loop loads and stores from the same
  // location, but the store is conditional, the store will be scalarized, and
  // the getelementptr won't remain uniform.
  for (auto *BB : TheLoop->blocks())
    for (auto &I : *BB) {
      // If there's no pointer operand, there's nothing to do.
      auto *Ptr = dyn_cast_or_null<Instruction>(getLoadStorePointerOperand(&I));
      if (!Ptr)
        continue;

      // True if all users of Ptr are memory accesses that have Ptr as their
      // pointer operand.
      auto UsersAreMemAccesses =
          llvm::all_of(Ptr->users(), [&](User *U) -> bool {
            return getLoadStorePointerOperand(U) == Ptr;
          });

      // Ensure the memory instruction will not be scalarized or used by
      // gather/scatter, making its pointer operand non-uniform. If the pointer
      // operand is used by any instruction other than a memory access, we
      // conservatively assume the pointer operand may be non-uniform.
      if (!UsersAreMemAccesses || !isUniformDecision(&I, VF))
        PossibleNonUniformPtrs.insert(Ptr);

      // If the memory instruction will be vectorized and its pointer operand
      // is consecutive-like, or interleaving - the pointer operand should
      // remain uniform.
      else
        ConsecutiveLikePtrs.insert(Ptr);
    }

  // Add to the Worklist all consecutive and consecutive-like pointers that
  // aren't also identified as possibly non-uniform.
  for (auto *V : ConsecutiveLikePtrs)
    if (!PossibleNonUniformPtrs.count(V)) {
      LLVM_DEBUG(dbgs() << "LV: Found uniform instruction: " << *V << "\n");
      Worklist.insert(V);
    }

  // Expand Worklist in topological order: whenever a new instruction
  // is added, its users should be either already inside Worklist, or
  // out of scope. It ensures a uniform instruction will only be used
  // by uniform instructions or out-of-scope instructions.
  unsigned idx = 0;
  while (idx != Worklist.size()) {
    Instruction *I = Worklist[idx++];

    for (auto OV : I->operand_values()) {
      if (isOutOfScope(OV))
        continue;
+     // First order recurrence Phi's should typically be considered
+     // non-uniform.
+     auto *OP = dyn_cast<PHINode>(OV);
+     if (OP && Legal->isFirstOrderRecurrence(OP))
+       continue;
+     // If all the users of the operand are uniform, then add the
+     // operand into the uniform worklist.
      auto *OI = cast<Instruction>(OV);
      if (llvm::all_of(OI->users(), [&](User *U) -> bool {
            auto *J = cast<Instruction>(U);
            return !TheLoop->contains(J) || Worklist.count(J) ||
                   (OI == getLoadStorePointerOperand(J) &&
                    isUniformDecision(J, VF));
          })) {
        Worklist.insert(OI);
        LLVM_DEBUG(dbgs() << "LV: Found uniform instruction: " << *OI
                          << "\n");
      }
    }
  }

  // Returns true if Ptr is the pointer operand of a memory access instruction
  // I, and I is known to not require scalarization.
auto isVectorizedMemAccessUse = [&](Instruction *I, Value *Ptr) -> bool { return getLoadStorePointerOperand(I) == Ptr && isUniformDecision(I, VF); }; // For an instruction to be added into Worklist above, all its users inside // the loop should also be in Worklist. However, this condition cannot be // true for phi nodes that form a cyclic dependence. We must process phi // nodes separately. An induction variable will remain uniform if all users // of the induction variable and induction variable update remain uniform. // The code below handles both pointer and non-pointer induction variables. for (auto &Induction : *Legal->getInductionVars()) { auto *Ind = Induction.first; auto *IndUpdate = cast(Ind->getIncomingValueForBlock(Latch)); // Determine if all users of the induction variable are uniform after // vectorization. auto UniformInd = llvm::all_of(Ind->users(), [&](User *U) -> bool { auto *I = cast(U); return I == IndUpdate || !TheLoop->contains(I) || Worklist.count(I) || isVectorizedMemAccessUse(I, Ind); }); if (!UniformInd) continue; // Determine if all users of the induction variable update instruction are // uniform after vectorization. auto UniformIndUpdate = llvm::all_of(IndUpdate->users(), [&](User *U) -> bool { auto *I = cast(U); return I == Ind || !TheLoop->contains(I) || Worklist.count(I) || isVectorizedMemAccessUse(I, IndUpdate); }); if (!UniformIndUpdate) continue; // The induction variable and its update instruction will remain uniform. Worklist.insert(Ind); Worklist.insert(IndUpdate); LLVM_DEBUG(dbgs() << "LV: Found uniform instruction: " << *Ind << "\n"); LLVM_DEBUG(dbgs() << "LV: Found uniform instruction: " << *IndUpdate << "\n"); } Uniforms[VF].insert(Worklist.begin(), Worklist.end()); } void InterleavedAccessInfo::collectConstStrideAccesses( MapVector &AccessStrideInfo, const ValueToValueMap &Strides) { auto &DL = TheLoop->getHeader()->getModule()->getDataLayout(); // Since it's desired that the load/store instructions be maintained in // "program order" for the interleaved access analysis, we have to visit the // blocks in the loop in reverse postorder (i.e., in a topological order). // Such an ordering will ensure that any load/store that may be executed // before a second load/store will precede the second load/store in // AccessStrideInfo. LoopBlocksDFS DFS(TheLoop); DFS.perform(LI); for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) for (auto &I : *BB) { auto *LI = dyn_cast(&I); auto *SI = dyn_cast(&I); if (!LI && !SI) continue; Value *Ptr = getLoadStorePointerOperand(&I); // We don't check wrapping here because we don't know yet if Ptr will be // part of a full group or a group with gaps. Checking wrapping for all // pointers (even those that end up in groups with no gaps) will be overly // conservative. For full groups, wrapping should be ok since if we would // wrap around the address space we would do a memory access at nullptr // even without the transformation. The wrapping checks are therefore // deferred until after we've formed the interleaved groups. int64_t Stride = getPtrStride(PSE, Ptr, TheLoop, Strides, /*Assume=*/true, /*ShouldCheckWrap=*/false); const SCEV *Scev = replaceSymbolicStrideSCEV(PSE, Strides, Ptr); PointerType *PtrTy = dyn_cast(Ptr->getType()); uint64_t Size = DL.getTypeAllocSize(PtrTy->getElementType()); // An alignment of 0 means target ABI alignment. 
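      // For example, a load of a 32-bit integer carrying no explicit
      // alignment would typically fall back to an ABI alignment of 4 bytes
      // here on common targets.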
unsigned Align = getMemInstAlignment(&I); if (!Align) Align = DL.getABITypeAlignment(PtrTy->getElementType()); AccessStrideInfo[&I] = StrideDescriptor(Stride, Scev, Size, Align); } } // Analyze interleaved accesses and collect them into interleaved load and // store groups. // // When generating code for an interleaved load group, we effectively hoist all // loads in the group to the location of the first load in program order. When // generating code for an interleaved store group, we sink all stores to the // location of the last store. This code motion can change the order of load // and store instructions and may break dependences. // // The code generation strategy mentioned above ensures that we won't violate // any write-after-read (WAR) dependences. // // E.g., for the WAR dependence: a = A[i]; // (1) // A[i] = b; // (2) // // The store group of (2) is always inserted at or below (2), and the load // group of (1) is always inserted at or above (1). Thus, the instructions will // never be reordered. All other dependences are checked to ensure the // correctness of the instruction reordering. // // The algorithm visits all memory accesses in the loop in bottom-up program // order. Program order is established by traversing the blocks in the loop in // reverse postorder when collecting the accesses. // // We visit the memory accesses in bottom-up order because it can simplify the // construction of store groups in the presence of write-after-write (WAW) // dependences. // // E.g., for the WAW dependence: A[i] = a; // (1) // A[i] = b; // (2) // A[i + 1] = c; // (3) // // We will first create a store group with (3) and (2). (1) can't be added to // this group because it and (2) are dependent. However, (1) can be grouped // with other accesses that may precede it in program order. Note that a // bottom-up order does not imply that WAW dependences should not be checked. void InterleavedAccessInfo::analyzeInterleaving() { LLVM_DEBUG(dbgs() << "LV: Analyzing interleaved accesses...\n"); const ValueToValueMap &Strides = LAI->getSymbolicStrides(); // Holds all accesses with a constant stride. MapVector AccessStrideInfo; collectConstStrideAccesses(AccessStrideInfo, Strides); if (AccessStrideInfo.empty()) return; // Collect the dependences in the loop. collectDependences(); // Holds all interleaved store groups temporarily. SmallSetVector StoreGroups; // Holds all interleaved load groups temporarily. SmallSetVector LoadGroups; // Search in bottom-up program order for pairs of accesses (A and B) that can // form interleaved load or store groups. In the algorithm below, access A // precedes access B in program order. We initialize a group for B in the // outer loop of the algorithm, and then in the inner loop, we attempt to // insert each A into B's group if: // // 1. A and B have the same stride, // 2. A and B have the same memory object size, and // 3. A belongs in B's group according to its distance from B. // // Special care is taken to ensure group formation will not break any // dependences. for (auto BI = AccessStrideInfo.rbegin(), E = AccessStrideInfo.rend(); BI != E; ++BI) { Instruction *B = BI->first; StrideDescriptor DesB = BI->second; // Initialize a group for B if it has an allowable stride. Even if we don't // create a group for B, we continue with the bottom-up algorithm to ensure // we don't break any of B's dependences. 
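    // An allowable stride here is, roughly, a constant stride whose absolute
    // value is at least 2 and no larger than the maximum supported group
    // factor; unit-stride accesses are ordinary consecutive accesses and need
    // no interleave group.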
InterleaveGroup *Group = nullptr; if (isStrided(DesB.Stride)) { Group = getInterleaveGroup(B); if (!Group) { LLVM_DEBUG(dbgs() << "LV: Creating an interleave group with:" << *B << '\n'); Group = createInterleaveGroup(B, DesB.Stride, DesB.Align); } if (B->mayWriteToMemory()) StoreGroups.insert(Group); else LoadGroups.insert(Group); } for (auto AI = std::next(BI); AI != E; ++AI) { Instruction *A = AI->first; StrideDescriptor DesA = AI->second; // Our code motion strategy implies that we can't have dependences // between accesses in an interleaved group and other accesses located // between the first and last member of the group. Note that this also // means that a group can't have more than one member at a given offset. // The accesses in a group can have dependences with other accesses, but // we must ensure we don't extend the boundaries of the group such that // we encompass those dependent accesses. // // For example, assume we have the sequence of accesses shown below in a // stride-2 loop: // // (1, 2) is a group | A[i] = a; // (1) // | A[i-1] = b; // (2) | // A[i-3] = c; // (3) // A[i] = d; // (4) | (2, 4) is not a group // // Because accesses (2) and (3) are dependent, we can group (2) with (1) // but not with (4). If we did, the dependent access (3) would be within // the boundaries of the (2, 4) group. if (!canReorderMemAccessesForInterleavedGroups(&*AI, &*BI)) { // If a dependence exists and A is already in a group, we know that A // must be a store since A precedes B and WAR dependences are allowed. // Thus, A would be sunk below B. We release A's group to prevent this // illegal code motion. A will then be free to form another group with // instructions that precede it. if (isInterleaved(A)) { InterleaveGroup *StoreGroup = getInterleaveGroup(A); StoreGroups.remove(StoreGroup); releaseGroup(StoreGroup); } // If a dependence exists and A is not already in a group (or it was // and we just released it), B might be hoisted above A (if B is a // load) or another store might be sunk below A (if B is a store). In // either case, we can't add additional instructions to B's group. B // will only form a group with instructions that it precedes. break; } // At this point, we've checked for illegal code motion. If either A or B // isn't strided, there's nothing left to do. if (!isStrided(DesA.Stride) || !isStrided(DesB.Stride)) continue; // Ignore A if it's already in a group or isn't the same kind of memory // operation as B. // Note that mayReadFromMemory() isn't mutually exclusive to mayWriteToMemory // in the case of atomic loads. We shouldn't see those here, canVectorizeMemory() // should have returned false - except for the case we asked for optimization // remarks. if (isInterleaved(A) || (A->mayReadFromMemory() != B->mayReadFromMemory()) || (A->mayWriteToMemory() != B->mayWriteToMemory())) continue; // Check rules 1 and 2. Ignore A if its stride or size is different from // that of B. if (DesA.Stride != DesB.Stride || DesA.Size != DesB.Size) continue; // Ignore A if the memory object of A and B don't belong to the same // address space if (getMemInstAddressSpace(A) != getMemInstAddressSpace(B)) continue; // Calculate the distance from A to B. const SCEVConstant *DistToB = dyn_cast( PSE.getSE()->getMinusSCEV(DesA.Scev, DesB.Scev)); if (!DistToB) continue; int64_t DistanceToB = DistToB->getAPInt().getSExtValue(); // Check rule 3. Ignore A if its distance to B is not a multiple of the // size. 
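      // For example, with 4-byte members (DesB.Size == 4), a distance of -4
      // or 8 bytes between A and B is acceptable, while a distance of 6 bytes
      // is not, so A could not be placed in B's group.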
      if (DistanceToB % static_cast<int64_t>(DesB.Size))
        continue;

      // Ignore A if either A or B is in a predicated block. Although we
      // currently prevent group formation for predicated accesses, we may be
      // able to relax this limitation in the future once we handle more
      // complicated blocks.
      if (isPredicated(A->getParent()) || isPredicated(B->getParent()))
        continue;

      // The index of A is the index of B plus A's distance to B in multiples
      // of the size.
      int IndexA =
          Group->getIndex(B) + DistanceToB / static_cast<int64_t>(DesB.Size);

      // Try to insert A into B's group.
      if (Group->insertMember(A, IndexA, DesA.Align)) {
        LLVM_DEBUG(dbgs() << "LV: Inserted:" << *A << '\n'
                          << "    into the interleave group with" << *B
                          << '\n');
        InterleaveGroupMap[A] = Group;

        // Set the first load in program order as the insert position.
        if (A->mayReadFromMemory())
          Group->setInsertPos(A);
      }
    } // Iteration over A accesses.
  }   // Iteration over B accesses.

  // Remove interleaved store groups with gaps.
  for (InterleaveGroup *Group : StoreGroups)
    if (Group->getNumMembers() != Group->getFactor()) {
      LLVM_DEBUG(
          dbgs() << "LV: Invalidate candidate interleaved store group due "
                    "to gaps.\n");
      releaseGroup(Group);
    }

  // Remove interleaved groups with gaps (currently only loads) whose memory
  // accesses may wrap around. We have to revisit the getPtrStride analysis,
  // this time with ShouldCheckWrap=true, since collectConstStrideAccesses does
  // not check wrapping (see documentation there).
  // FORNOW we use Assume=false;
  // TODO: Change to Assume=true but making sure we don't exceed the threshold
  // of runtime SCEV assumptions checks (thereby potentially failing to
  // vectorize altogether).
  // Additional optional optimizations:
  // TODO: If we are peeling the loop and we know that the first pointer
  // doesn't wrap then we can deduce that all pointers in the group don't
  // wrap. This means that we can forcefully peel the loop in order to only
  // have to check the first pointer for no-wrap. When we change to
  // Assume=true, we will only need at most one runtime check per interleaved
  // group.
  for (InterleaveGroup *Group : LoadGroups) {
    // Case 1: A full group. We can skip the checks; for full groups, if the
    // wide load would wrap around the address space we would do a memory
    // access at nullptr even without the transformation.
    if (Group->getNumMembers() == Group->getFactor())
      continue;

    // Case 2: If the first and last members of the group don't wrap, this
    // implies that all the pointers in the group don't wrap. So we check only
    // group member 0 (which is always guaranteed to exist) and group member
    // Factor - 1; if the latter doesn't exist we rely on peeling (if it is a
    // non-reversed access -- see Case 3).
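    // For example, a factor-4 load group that only has members at indices 0
    // and 1 is missing member 3, so for a non-reversed group we fall through
    // to Case 3 below and require a scalar epilogue iteration instead of
    // proving no-wrap for the last member.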
Value *FirstMemberPtr = getLoadStorePointerOperand(Group->getMember(0)); if (!getPtrStride(PSE, FirstMemberPtr, TheLoop, Strides, /*Assume=*/false, /*ShouldCheckWrap=*/true)) { LLVM_DEBUG( dbgs() << "LV: Invalidate candidate interleaved group due to " "first group member potentially pointer-wrapping.\n"); releaseGroup(Group); continue; } Instruction *LastMember = Group->getMember(Group->getFactor() - 1); if (LastMember) { Value *LastMemberPtr = getLoadStorePointerOperand(LastMember); if (!getPtrStride(PSE, LastMemberPtr, TheLoop, Strides, /*Assume=*/false, /*ShouldCheckWrap=*/true)) { LLVM_DEBUG( dbgs() << "LV: Invalidate candidate interleaved group due to " "last group member potentially pointer-wrapping.\n"); releaseGroup(Group); } } else { // Case 3: A non-reversed interleaved load group with gaps: We need // to execute at least one scalar epilogue iteration. This will ensure // we don't speculatively access memory out-of-bounds. We only need // to look for a member at index factor - 1, since every group must have // a member at index zero. if (Group->isReverse()) { LLVM_DEBUG( dbgs() << "LV: Invalidate candidate interleaved group due to " "a reverse access with gaps.\n"); releaseGroup(Group); continue; } LLVM_DEBUG( dbgs() << "LV: Interleaved group requires epilogue iteration.\n"); RequiresScalarEpilogue = true; } } } Optional LoopVectorizationCostModel::computeMaxVF(bool OptForSize) { if (Legal->getRuntimePointerChecking()->Need && TTI.hasBranchDivergence()) { // TODO: It may by useful to do since it's still likely to be dynamically // uniform if the target can skip. LLVM_DEBUG( dbgs() << "LV: Not inserting runtime ptr check for divergent target"); ORE->emit( createMissedAnalysis("CantVersionLoopWithDivergentTarget") << "runtime pointer checks needed. Not enabled for divergent target"); return None; } unsigned TC = PSE.getSE()->getSmallConstantTripCount(TheLoop); if (!OptForSize) // Remaining checks deal with scalar loop when OptForSize. return computeFeasibleMaxVF(OptForSize, TC); if (Legal->getRuntimePointerChecking()->Need) { ORE->emit(createMissedAnalysis("CantVersionLoopWithOptForSize") << "runtime pointer checks needed. Enable vectorization of this " "loop with '#pragma clang loop vectorize(enable)' when " "compiling with -Os/-Oz"); LLVM_DEBUG( dbgs() << "LV: Aborting. Runtime ptr check is required with -Os/-Oz.\n"); return None; } // If we optimize the program for size, avoid creating the tail loop. LLVM_DEBUG(dbgs() << "LV: Found trip count: " << TC << '\n'); // If we don't know the precise trip count, don't try to vectorize. if (TC < 2) { ORE->emit( createMissedAnalysis("UnknownLoopCountComplexCFG") << "unable to calculate the loop count due to complex control flow"); LLVM_DEBUG( dbgs() << "LV: Aborting. A tail loop is required with -Os/-Oz.\n"); return None; } unsigned MaxVF = computeFeasibleMaxVF(OptForSize, TC); if (TC % MaxVF != 0) { // If the trip count that we found modulo the vectorization factor is not // zero then we require a tail. // FIXME: look for a smaller MaxVF that does divide TC rather than give up. // FIXME: return None if loop requiresScalarEpilog(), or look for a // smaller MaxVF that does not require a scalar epilog. ORE->emit(createMissedAnalysis("NoTailLoopWithOptForSize") << "cannot optimize for size and vectorize at the " "same time. Enable vectorization of this loop " "with '#pragma clang loop vectorize(enable)' " "when compiling with -Os/-Oz"); LLVM_DEBUG( dbgs() << "LV: Aborting. 
A tail loop is required with -Os/-Oz.\n"); return None; } return MaxVF; } unsigned LoopVectorizationCostModel::computeFeasibleMaxVF(bool OptForSize, unsigned ConstTripCount) { MinBWs = computeMinimumValueSizes(TheLoop->getBlocks(), *DB, &TTI); unsigned SmallestType, WidestType; std::tie(SmallestType, WidestType) = getSmallestAndWidestTypes(); unsigned WidestRegister = TTI.getRegisterBitWidth(true); // Get the maximum safe dependence distance in bits computed by LAA. // It is computed by MaxVF * sizeOf(type) * 8, where type is taken from // the memory accesses that is most restrictive (involved in the smallest // dependence distance). unsigned MaxSafeRegisterWidth = Legal->getMaxSafeRegisterWidth(); WidestRegister = std::min(WidestRegister, MaxSafeRegisterWidth); unsigned MaxVectorSize = WidestRegister / WidestType; LLVM_DEBUG(dbgs() << "LV: The Smallest and Widest types: " << SmallestType << " / " << WidestType << " bits.\n"); LLVM_DEBUG(dbgs() << "LV: The Widest register safe to use is: " << WidestRegister << " bits.\n"); assert(MaxVectorSize <= 256 && "Did not expect to pack so many elements" " into one vector!"); if (MaxVectorSize == 0) { LLVM_DEBUG(dbgs() << "LV: The target has no vector registers.\n"); MaxVectorSize = 1; return MaxVectorSize; } else if (ConstTripCount && ConstTripCount < MaxVectorSize && isPowerOf2_32(ConstTripCount)) { // We need to clamp the VF to be the ConstTripCount. There is no point in // choosing a higher viable VF as done in the loop below. LLVM_DEBUG(dbgs() << "LV: Clamping the MaxVF to the constant trip count: " << ConstTripCount << "\n"); MaxVectorSize = ConstTripCount; return MaxVectorSize; } unsigned MaxVF = MaxVectorSize; if (TTI.shouldMaximizeVectorBandwidth(OptForSize) || (MaximizeBandwidth && !OptForSize)) { // Collect all viable vectorization factors larger than the default MaxVF // (i.e. MaxVectorSize). SmallVector VFs; unsigned NewMaxVectorSize = WidestRegister / SmallestType; for (unsigned VS = MaxVectorSize * 2; VS <= NewMaxVectorSize; VS *= 2) VFs.push_back(VS); // For each VF calculate its register usage. auto RUs = calculateRegisterUsage(VFs); // Select the largest VF which doesn't require more registers than existing // ones. unsigned TargetNumRegisters = TTI.getNumberOfRegisters(true); for (int i = RUs.size() - 1; i >= 0; --i) { if (RUs[i].MaxLocalUsers <= TargetNumRegisters) { MaxVF = VFs[i]; break; } } if (unsigned MinVF = TTI.getMinimumVF(SmallestType)) { if (MaxVF < MinVF) { LLVM_DEBUG(dbgs() << "LV: Overriding calculated MaxVF(" << MaxVF << ") with target's minimum: " << MinVF << '\n'); MaxVF = MinVF; } } } return MaxVF; } VectorizationFactor LoopVectorizationCostModel::selectVectorizationFactor(unsigned MaxVF) { float Cost = expectedCost(1).first; const float ScalarCost = Cost; unsigned Width = 1; LLVM_DEBUG(dbgs() << "LV: Scalar loop costs: " << (int)ScalarCost << ".\n"); bool ForceVectorization = Hints->getForce() == LoopVectorizeHints::FK_Enabled; if (ForceVectorization && MaxVF > 1) { // Ignore scalar width, because the user explicitly wants vectorization. // Initialize cost to max so that VF = 2 is, at least, chosen during cost // evaluation. Cost = std::numeric_limits::max(); } for (unsigned i = 2; i <= MaxVF; i *= 2) { // Notice that the vector loop needs to be executed less times, so // we need to divide the cost of the vector loops by the width of // the vector elements. 
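    // For example, if the scalar loop costs 8 per iteration and the loop of
    // width 4 costs 20 per vector iteration, the per-lane cost is 20 / 4 = 5,
    // so width 4 would be preferred over the scalar loop.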
VectorizationCostTy C = expectedCost(i); float VectorCost = C.first / (float)i; LLVM_DEBUG(dbgs() << "LV: Vector loop of width " << i << " costs: " << (int)VectorCost << ".\n"); if (!C.second && !ForceVectorization) { LLVM_DEBUG( dbgs() << "LV: Not considering vector loop of width " << i << " because it will not generate any vector instructions.\n"); continue; } if (VectorCost < Cost) { Cost = VectorCost; Width = i; } } if (!EnableCondStoresVectorization && NumPredStores) { ORE->emit(createMissedAnalysis("ConditionalStore") << "store that is conditionally executed prevents vectorization"); LLVM_DEBUG( dbgs() << "LV: No vectorization. There are conditional stores.\n"); Width = 1; Cost = ScalarCost; } LLVM_DEBUG(if (ForceVectorization && Width > 1 && Cost >= ScalarCost) dbgs() << "LV: Vectorization seems to be not beneficial, " << "but was forced by a user.\n"); LLVM_DEBUG(dbgs() << "LV: Selecting VF: " << Width << ".\n"); VectorizationFactor Factor = {Width, (unsigned)(Width * Cost)}; return Factor; } std::pair LoopVectorizationCostModel::getSmallestAndWidestTypes() { unsigned MinWidth = -1U; unsigned MaxWidth = 8; const DataLayout &DL = TheFunction->getParent()->getDataLayout(); // For each block. for (BasicBlock *BB : TheLoop->blocks()) { // For each instruction in the loop. for (Instruction &I : *BB) { Type *T = I.getType(); // Skip ignored values. if (ValuesToIgnore.count(&I)) continue; // Only examine Loads, Stores and PHINodes. if (!isa(I) && !isa(I) && !isa(I)) continue; // Examine PHI nodes that are reduction variables. Update the type to // account for the recurrence type. if (auto *PN = dyn_cast(&I)) { if (!Legal->isReductionVariable(PN)) continue; RecurrenceDescriptor RdxDesc = (*Legal->getReductionVars())[PN]; T = RdxDesc.getRecurrenceType(); } // Examine the stored values. if (auto *ST = dyn_cast(&I)) T = ST->getValueOperand()->getType(); // Ignore loaded pointer types and stored pointer types that are not // vectorizable. // // FIXME: The check here attempts to predict whether a load or store will // be vectorized. We only know this for certain after a VF has // been selected. Here, we assume that if an access can be // vectorized, it will be. We should also look at extending this // optimization to non-pointer types. // if (T->isPointerTy() && !isConsecutiveLoadOrStore(&I) && !isAccessInterleaved(&I) && !isLegalGatherOrScatter(&I)) continue; MinWidth = std::min(MinWidth, (unsigned)DL.getTypeSizeInBits(T->getScalarType())); MaxWidth = std::max(MaxWidth, (unsigned)DL.getTypeSizeInBits(T->getScalarType())); } } return {MinWidth, MaxWidth}; } unsigned LoopVectorizationCostModel::selectInterleaveCount(bool OptForSize, unsigned VF, unsigned LoopCost) { // -- The interleave heuristics -- // We interleave the loop in order to expose ILP and reduce the loop overhead. // There are many micro-architectural considerations that we can't predict // at this level. For example, frontend pressure (on decode or fetch) due to // code size, or the number and capabilities of the execution ports. // // We use the following heuristics to select the interleave count: // 1. If the code has reductions, then we interleave to break the cross // iteration dependency. // 2. If the loop is really small, then we interleave to reduce the loop // overhead. // 3. We don't interleave if we think that we will spill registers to memory // due to the increased register pressure. // When we optimize for size, we don't interleave. if (OptForSize) return 1; // We used the distance for the interleave count. 
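  // That is, when a maximum safe dependence distance was computed, the chosen
  // vectorization factor already consumes it; interleaving on top of that
  // could widen the span of accesses per iteration beyond the proven-safe
  // distance, so we conservatively do not interleave.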
if (Legal->getMaxSafeDepDistBytes() != -1U) return 1; // Do not interleave loops with a relatively small trip count. unsigned TC = PSE.getSE()->getSmallConstantTripCount(TheLoop); if (TC > 1 && TC < TinyTripCountInterleaveThreshold) return 1; unsigned TargetNumRegisters = TTI.getNumberOfRegisters(VF > 1); LLVM_DEBUG(dbgs() << "LV: The target has " << TargetNumRegisters << " registers\n"); if (VF == 1) { if (ForceTargetNumScalarRegs.getNumOccurrences() > 0) TargetNumRegisters = ForceTargetNumScalarRegs; } else { if (ForceTargetNumVectorRegs.getNumOccurrences() > 0) TargetNumRegisters = ForceTargetNumVectorRegs; } RegisterUsage R = calculateRegisterUsage({VF})[0]; // We divide by these constants so assume that we have at least one // instruction that uses at least one register. R.MaxLocalUsers = std::max(R.MaxLocalUsers, 1U); // We calculate the interleave count using the following formula. // Subtract the number of loop invariants from the number of available // registers. These registers are used by all of the interleaved instances. // Next, divide the remaining registers by the number of registers that is // required by the loop, in order to estimate how many parallel instances // fit without causing spills. All of this is rounded down if necessary to be // a power of two. We want power of two interleave count to simplify any // addressing operations or alignment considerations. unsigned IC = PowerOf2Floor((TargetNumRegisters - R.LoopInvariantRegs) / R.MaxLocalUsers); // Don't count the induction variable as interleaved. if (EnableIndVarRegisterHeur) IC = PowerOf2Floor((TargetNumRegisters - R.LoopInvariantRegs - 1) / std::max(1U, (R.MaxLocalUsers - 1))); // Clamp the interleave ranges to reasonable counts. unsigned MaxInterleaveCount = TTI.getMaxInterleaveFactor(VF); // Check if the user has overridden the max. if (VF == 1) { if (ForceTargetMaxScalarInterleaveFactor.getNumOccurrences() > 0) MaxInterleaveCount = ForceTargetMaxScalarInterleaveFactor; } else { if (ForceTargetMaxVectorInterleaveFactor.getNumOccurrences() > 0) MaxInterleaveCount = ForceTargetMaxVectorInterleaveFactor; } // If we did not calculate the cost for VF (because the user selected the VF) // then we calculate the cost of VF here. if (LoopCost == 0) LoopCost = expectedCost(VF).first; // Clamp the calculated IC to be between the 1 and the max interleave count // that the target allows. if (IC > MaxInterleaveCount) IC = MaxInterleaveCount; else if (IC < 1) IC = 1; // Interleave if we vectorized this loop and there is a reduction that could // benefit from interleaving. if (VF > 1 && !Legal->getReductionVars()->empty()) { LLVM_DEBUG(dbgs() << "LV: Interleaving because of reductions.\n"); return IC; } // Note that if we've already vectorized the loop we will have done the // runtime check and so interleaving won't require further checks. bool InterleavingRequiresRuntimePointerCheck = (VF == 1 && Legal->getRuntimePointerChecking()->Need); // We want to interleave small loops in order to reduce the loop overhead and // potentially expose ILP opportunities. LLVM_DEBUG(dbgs() << "LV: Loop cost is " << LoopCost << '\n'); if (!InterleavingRequiresRuntimePointerCheck && LoopCost < SmallLoopCost) { // We assume that the cost overhead is 1 and we use the cost model // to estimate the cost of the loop and interleave until the cost of the // loop overhead is about 5% of the cost of the loop. 
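    // For example, assuming the default small-loop threshold of 20, a loop
    // with an estimated cost of 5 would allow at most
    // PowerOf2Floor(20 / 5) == 4 here, further clamped by IC below.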
unsigned SmallIC = std::min(IC, (unsigned)PowerOf2Floor(SmallLoopCost / LoopCost)); // Interleave until store/load ports (estimated by max interleave count) are // saturated. unsigned NumStores = Legal->getNumStores(); unsigned NumLoads = Legal->getNumLoads(); unsigned StoresIC = IC / (NumStores ? NumStores : 1); unsigned LoadsIC = IC / (NumLoads ? NumLoads : 1); // If we have a scalar reduction (vector reductions are already dealt with // by this point), we can increase the critical path length if the loop // we're interleaving is inside another loop. Limit, by default to 2, so the // critical path only gets increased by one reduction operation. if (!Legal->getReductionVars()->empty() && TheLoop->getLoopDepth() > 1) { unsigned F = static_cast(MaxNestedScalarReductionIC); SmallIC = std::min(SmallIC, F); StoresIC = std::min(StoresIC, F); LoadsIC = std::min(LoadsIC, F); } if (EnableLoadStoreRuntimeInterleave && std::max(StoresIC, LoadsIC) > SmallIC) { LLVM_DEBUG( dbgs() << "LV: Interleaving to saturate store or load ports.\n"); return std::max(StoresIC, LoadsIC); } LLVM_DEBUG(dbgs() << "LV: Interleaving to reduce branch cost.\n"); return SmallIC; } // Interleave if this is a large loop (small loops are already dealt with by // this point) that could benefit from interleaving. bool HasReductions = !Legal->getReductionVars()->empty(); if (TTI.enableAggressiveInterleaving(HasReductions)) { LLVM_DEBUG(dbgs() << "LV: Interleaving to expose ILP.\n"); return IC; } LLVM_DEBUG(dbgs() << "LV: Not Interleaving.\n"); return 1; } SmallVector LoopVectorizationCostModel::calculateRegisterUsage(ArrayRef VFs) { // This function calculates the register usage by measuring the highest number // of values that are alive at a single location. Obviously, this is a very // rough estimation. We scan the loop in a topological order in order and // assign a number to each instruction. We use RPO to ensure that defs are // met before their users. We assume that each instruction that has in-loop // users starts an interval. We record every time that an in-loop value is // used, so we have a list of the first and last occurrences of each // instruction. Next, we transpose this data structure into a multi map that // holds the list of intervals that *end* at a specific location. This multi // map allows us to perform a linear search. We scan the instructions linearly // and record each time that a new interval starts, by placing it in a set. // If we find this value in the multi-map then we remove it from the set. // The max register usage is the maximum size of the set. // We also search for instructions that are defined outside the loop, but are // used inside the loop. We need this number separately from the max-interval // usage number because when we unroll, loop-invariant values do not take // more register. LoopBlocksDFS DFS(TheLoop); DFS.perform(LI); RegisterUsage RU; // Each 'key' in the map opens a new interval. The values // of the map are the index of the 'last seen' usage of the // instruction that is the key. using IntervalMap = DenseMap; // Maps instruction to its index. DenseMap IdxToInstr; // Marks the end of each interval. IntervalMap EndPoint; // Saves the list of instruction indices that are used in the loop. SmallPtrSet Ends; // Saves the list of values that are used in the loop but are // defined outside the loop, such as arguments and constants. 
SmallPtrSet LoopInvariants; unsigned Index = 0; for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) { for (Instruction &I : *BB) { IdxToInstr[Index++] = &I; // Save the end location of each USE. for (Value *U : I.operands()) { auto *Instr = dyn_cast(U); // Ignore non-instruction values such as arguments, constants, etc. if (!Instr) continue; // If this instruction is outside the loop then record it and continue. if (!TheLoop->contains(Instr)) { LoopInvariants.insert(Instr); continue; } // Overwrite previous end points. EndPoint[Instr] = Index; Ends.insert(Instr); } } } // Saves the list of intervals that end with the index in 'key'. using InstrList = SmallVector; DenseMap TransposeEnds; // Transpose the EndPoints to a list of values that end at each index. for (auto &Interval : EndPoint) TransposeEnds[Interval.second].push_back(Interval.first); SmallPtrSet OpenIntervals; // Get the size of the widest register. unsigned MaxSafeDepDist = -1U; if (Legal->getMaxSafeDepDistBytes() != -1U) MaxSafeDepDist = Legal->getMaxSafeDepDistBytes() * 8; unsigned WidestRegister = std::min(TTI.getRegisterBitWidth(true), MaxSafeDepDist); const DataLayout &DL = TheFunction->getParent()->getDataLayout(); SmallVector RUs(VFs.size()); SmallVector MaxUsages(VFs.size(), 0); LLVM_DEBUG(dbgs() << "LV(REG): Calculating max register usage:\n"); // A lambda that gets the register usage for the given type and VF. auto GetRegUsage = [&DL, WidestRegister](Type *Ty, unsigned VF) { if (Ty->isTokenTy()) return 0U; unsigned TypeSize = DL.getTypeSizeInBits(Ty->getScalarType()); return std::max(1, VF * TypeSize / WidestRegister); }; for (unsigned int i = 0; i < Index; ++i) { Instruction *I = IdxToInstr[i]; // Remove all of the instructions that end at this location. InstrList &List = TransposeEnds[i]; for (Instruction *ToRemove : List) OpenIntervals.erase(ToRemove); // Ignore instructions that are never used within the loop. if (!Ends.count(I)) continue; // Skip ignored values. if (ValuesToIgnore.count(I)) continue; // For each VF find the maximum usage of registers. for (unsigned j = 0, e = VFs.size(); j < e; ++j) { if (VFs[j] == 1) { MaxUsages[j] = std::max(MaxUsages[j], OpenIntervals.size()); continue; } collectUniformsAndScalars(VFs[j]); // Count the number of live intervals. unsigned RegUsage = 0; for (auto Inst : OpenIntervals) { // Skip ignored values for VF > 1. if (VecValuesToIgnore.count(Inst) || isScalarAfterVectorization(Inst, VFs[j])) continue; RegUsage += GetRegUsage(Inst->getType(), VFs[j]); } MaxUsages[j] = std::max(MaxUsages[j], RegUsage); } LLVM_DEBUG(dbgs() << "LV(REG): At #" << i << " Interval # " << OpenIntervals.size() << '\n'); // Add the current instruction to the list of open intervals. OpenIntervals.insert(I); } for (unsigned i = 0, e = VFs.size(); i < e; ++i) { unsigned Invariant = 0; if (VFs[i] == 1) Invariant = LoopInvariants.size(); else { for (auto Inst : LoopInvariants) Invariant += GetRegUsage(Inst->getType(), VFs[i]); } LLVM_DEBUG(dbgs() << "LV(REG): VF = " << VFs[i] << '\n'); LLVM_DEBUG(dbgs() << "LV(REG): Found max usage: " << MaxUsages[i] << '\n'); LLVM_DEBUG(dbgs() << "LV(REG): Found invariant usage: " << Invariant << '\n'); RU.LoopInvariantRegs = Invariant; RU.MaxLocalUsers = MaxUsages[i]; RUs[i] = RU; } return RUs; } bool LoopVectorizationCostModel::useEmulatedMaskMemRefHack(Instruction *I){ // TODO: Cost model for emulated masked load/store is completely // broken. 
This hack guides the cost model to use an artificially // high enough value to practically disable vectorization with such // operations, except where previously deployed legality hack allowed // using very low cost values. This is to avoid regressions coming simply // from moving "masked load/store" check from legality to cost model. // Masked Load/Gather emulation was previously never allowed. // Limited number of Masked Store/Scatter emulation was allowed. assert(isScalarWithPredication(I) && "Expecting a scalar emulated instruction"); return isa(I) || (isa(I) && NumPredStores > NumberOfStoresToPredicate); } void LoopVectorizationCostModel::collectInstsToScalarize(unsigned VF) { // If we aren't vectorizing the loop, or if we've already collected the // instructions to scalarize, there's nothing to do. Collection may already // have occurred if we have a user-selected VF and are now computing the // expected cost for interleaving. if (VF < 2 || InstsToScalarize.count(VF)) return; // Initialize a mapping for VF in InstsToScalalarize. If we find that it's // not profitable to scalarize any instructions, the presence of VF in the // map will indicate that we've analyzed it already. ScalarCostsTy &ScalarCostsVF = InstsToScalarize[VF]; // Find all the instructions that are scalar with predication in the loop and // determine if it would be better to not if-convert the blocks they are in. // If so, we also record the instructions to scalarize. for (BasicBlock *BB : TheLoop->blocks()) { if (!Legal->blockNeedsPredication(BB)) continue; for (Instruction &I : *BB) if (isScalarWithPredication(&I)) { ScalarCostsTy ScalarCosts; // Do not apply discount logic if hacked cost is needed // for emulated masked memrefs. if (!useEmulatedMaskMemRefHack(&I) && computePredInstDiscount(&I, ScalarCosts, VF) >= 0) ScalarCostsVF.insert(ScalarCosts.begin(), ScalarCosts.end()); // Remember that BB will remain after vectorization. PredicatedBBsAfterVectorization.insert(BB); } } } int LoopVectorizationCostModel::computePredInstDiscount( Instruction *PredInst, DenseMap &ScalarCosts, unsigned VF) { assert(!isUniformAfterVectorization(PredInst, VF) && "Instruction marked uniform-after-vectorization will be predicated"); // Initialize the discount to zero, meaning that the scalar version and the // vector version cost the same. int Discount = 0; // Holds instructions to analyze. The instructions we visit are mapped in // ScalarCosts. Those instructions are the ones that would be scalarized if // we find that the scalar version costs less. SmallVector Worklist; // Returns true if the given instruction can be scalarized. auto canBeScalarized = [&](Instruction *I) -> bool { // We only attempt to scalarize instructions forming a single-use chain // from the original predicated block that would otherwise be vectorized. // Although not strictly necessary, we give up on instructions we know will // already be scalar to avoid traversing chains that are unlikely to be // beneficial. if (!I->hasOneUse() || PredInst->getParent() != I->getParent() || isScalarAfterVectorization(I, VF)) return false; // If the instruction is scalar with predication, it will be analyzed // separately. We ignore it within the context of PredInst. if (isScalarWithPredication(I)) return false; // If any of the instruction's operands are uniform after vectorization, // the instruction cannot be scalarized. This prevents, for example, a // masked load from being scalarized. 
// // We assume we will only emit a value for lane zero of an instruction // marked uniform after vectorization, rather than VF identical values. // Thus, if we scalarize an instruction that uses a uniform, we would // create uses of values corresponding to the lanes we aren't emitting code // for. This behavior can be changed by allowing getScalarValue to clone // the lane zero values for uniforms rather than asserting. for (Use &U : I->operands()) if (auto *J = dyn_cast(U.get())) if (isUniformAfterVectorization(J, VF)) return false; // Otherwise, we can scalarize the instruction. return true; }; // Returns true if an operand that cannot be scalarized must be extracted // from a vector. We will account for this scalarization overhead below. Note // that the non-void predicated instructions are placed in their own blocks, // and their return values are inserted into vectors. Thus, an extract would // still be required. auto needsExtract = [&](Instruction *I) -> bool { return TheLoop->contains(I) && !isScalarAfterVectorization(I, VF); }; // Compute the expected cost discount from scalarizing the entire expression // feeding the predicated instruction. We currently only consider expressions // that are single-use instruction chains. Worklist.push_back(PredInst); while (!Worklist.empty()) { Instruction *I = Worklist.pop_back_val(); // If we've already analyzed the instruction, there's nothing to do. if (ScalarCosts.count(I)) continue; // Compute the cost of the vector instruction. Note that this cost already // includes the scalarization overhead of the predicated instruction. unsigned VectorCost = getInstructionCost(I, VF).first; // Compute the cost of the scalarized instruction. This cost is the cost of // the instruction as if it wasn't if-converted and instead remained in the // predicated block. We will scale this cost by block probability after // computing the scalarization overhead. unsigned ScalarCost = VF * getInstructionCost(I, 1).first; // Compute the scalarization overhead of needed insertelement instructions // and phi nodes. if (isScalarWithPredication(I) && !I->getType()->isVoidTy()) { ScalarCost += TTI.getScalarizationOverhead(ToVectorTy(I->getType(), VF), true, false); ScalarCost += VF * TTI.getCFInstrCost(Instruction::PHI); } // Compute the scalarization overhead of needed extractelement // instructions. For each of the instruction's operands, if the operand can // be scalarized, add it to the worklist; otherwise, account for the // overhead. for (Use &U : I->operands()) if (auto *J = dyn_cast(U.get())) { assert(VectorType::isValidElementType(J->getType()) && "Instruction has non-scalar type"); if (canBeScalarized(J)) Worklist.push_back(J); else if (needsExtract(J)) ScalarCost += TTI.getScalarizationOverhead( ToVectorTy(J->getType(),VF), false, true); } // Scale the total scalar cost by block probability. ScalarCost /= getReciprocalPredBlockProb(); // Compute the discount. A non-negative discount means the vector version // of the instruction costs more, and scalarizing would be beneficial. Discount += VectorCost - ScalarCost; ScalarCosts[I] = ScalarCost; } return Discount; } LoopVectorizationCostModel::VectorizationCostTy LoopVectorizationCostModel::expectedCost(unsigned VF) { VectorizationCostTy Cost; // For each block. for (BasicBlock *BB : TheLoop->blocks()) { VectorizationCostTy BlockCost; // For each instruction in the old loop. for (Instruction &I : BB->instructionsWithoutDebug()) { // Skip ignored values. 
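      // (The ignored sets are filled in collectValuesToIgnore(): ephemeral
      // values, plus the type-promoting and type-casting instructions
      // identified during reduction and induction detection.)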
if (ValuesToIgnore.count(&I) || (VF > 1 && VecValuesToIgnore.count(&I))) continue; VectorizationCostTy C = getInstructionCost(&I, VF); // Check if we should override the cost. if (ForceTargetInstructionCost.getNumOccurrences() > 0) C.first = ForceTargetInstructionCost; BlockCost.first += C.first; BlockCost.second |= C.second; LLVM_DEBUG(dbgs() << "LV: Found an estimated cost of " << C.first << " for VF " << VF << " For instruction: " << I << '\n'); } // If we are vectorizing a predicated block, it will have been // if-converted. This means that the block's instructions (aside from // stores and instructions that may divide by zero) will now be // unconditionally executed. For the scalar case, we may not always execute // the predicated block. Thus, scale the block's cost by the probability of // executing it. if (VF == 1 && Legal->blockNeedsPredication(BB)) BlockCost.first /= getReciprocalPredBlockProb(); Cost.first += BlockCost.first; Cost.second |= BlockCost.second; } return Cost; } /// Gets Address Access SCEV after verifying that the access pattern /// is loop invariant except the induction variable dependence. /// /// This SCEV can be sent to the Target in order to estimate the address /// calculation cost. static const SCEV *getAddressAccessSCEV( Value *Ptr, LoopVectorizationLegality *Legal, PredicatedScalarEvolution &PSE, const Loop *TheLoop) { auto *Gep = dyn_cast(Ptr); if (!Gep) return nullptr; // We are looking for a gep with all loop invariant indices except for one // which should be an induction variable. auto SE = PSE.getSE(); unsigned NumOperands = Gep->getNumOperands(); for (unsigned i = 1; i < NumOperands; ++i) { Value *Opd = Gep->getOperand(i); if (!SE->isLoopInvariant(SE->getSCEV(Opd), TheLoop) && !Legal->isInductionVariable(Opd)) return nullptr; } // Now we know we have a GEP ptr, %inv, %ind, %inv. return the Ptr SCEV. return PSE.getSCEV(Ptr); } static bool isStrideMul(Instruction *I, LoopVectorizationLegality *Legal) { return Legal->hasStride(I->getOperand(0)) || Legal->hasStride(I->getOperand(1)); } unsigned LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I, unsigned VF) { Type *ValTy = getMemInstValueType(I); auto SE = PSE.getSE(); unsigned Alignment = getMemInstAlignment(I); unsigned AS = getMemInstAddressSpace(I); Value *Ptr = getLoadStorePointerOperand(I); Type *PtrTy = ToVectorTy(Ptr->getType(), VF); // Figure out whether the access is strided and get the stride value // if it's known in compile time const SCEV *PtrSCEV = getAddressAccessSCEV(Ptr, Legal, PSE, TheLoop); // Get the cost of the scalar memory instruction and address computation. unsigned Cost = VF * TTI.getAddressComputationCost(PtrTy, SE, PtrSCEV); Cost += VF * TTI.getMemoryOpCost(I->getOpcode(), ValTy->getScalarType(), Alignment, AS, I); // Get the overhead of the extractelement and insertelement instructions // we might create due to scalarization. Cost += getScalarizationOverhead(I, VF, TTI); // If we have a predicated store, it may not be executed for each vector // lane. Scale the cost by the probability of executing the predicated // block. if (isScalarWithPredication(I)) { Cost /= getReciprocalPredBlockProb(); if (useEmulatedMaskMemRefHack(I)) // Artificially setting to a high enough value to practically disable // vectorization with such operations. 
Cost = 3000000; } return Cost; } unsigned LoopVectorizationCostModel::getConsecutiveMemOpCost(Instruction *I, unsigned VF) { Type *ValTy = getMemInstValueType(I); Type *VectorTy = ToVectorTy(ValTy, VF); unsigned Alignment = getMemInstAlignment(I); Value *Ptr = getLoadStorePointerOperand(I); unsigned AS = getMemInstAddressSpace(I); int ConsecutiveStride = Legal->isConsecutivePtr(Ptr); assert((ConsecutiveStride == 1 || ConsecutiveStride == -1) && "Stride should be 1 or -1 for consecutive memory access"); unsigned Cost = 0; if (Legal->isMaskRequired(I)) Cost += TTI.getMaskedMemoryOpCost(I->getOpcode(), VectorTy, Alignment, AS); else Cost += TTI.getMemoryOpCost(I->getOpcode(), VectorTy, Alignment, AS, I); bool Reverse = ConsecutiveStride < 0; if (Reverse) Cost += TTI.getShuffleCost(TargetTransformInfo::SK_Reverse, VectorTy, 0); return Cost; } unsigned LoopVectorizationCostModel::getUniformMemOpCost(Instruction *I, unsigned VF) { LoadInst *LI = cast(I); Type *ValTy = LI->getType(); Type *VectorTy = ToVectorTy(ValTy, VF); unsigned Alignment = LI->getAlignment(); unsigned AS = LI->getPointerAddressSpace(); return TTI.getAddressComputationCost(ValTy) + TTI.getMemoryOpCost(Instruction::Load, ValTy, Alignment, AS) + TTI.getShuffleCost(TargetTransformInfo::SK_Broadcast, VectorTy); } unsigned LoopVectorizationCostModel::getGatherScatterCost(Instruction *I, unsigned VF) { Type *ValTy = getMemInstValueType(I); Type *VectorTy = ToVectorTy(ValTy, VF); unsigned Alignment = getMemInstAlignment(I); Value *Ptr = getLoadStorePointerOperand(I); return TTI.getAddressComputationCost(VectorTy) + TTI.getGatherScatterOpCost(I->getOpcode(), VectorTy, Ptr, Legal->isMaskRequired(I), Alignment); } unsigned LoopVectorizationCostModel::getInterleaveGroupCost(Instruction *I, unsigned VF) { Type *ValTy = getMemInstValueType(I); Type *VectorTy = ToVectorTy(ValTy, VF); unsigned AS = getMemInstAddressSpace(I); auto Group = getInterleavedAccessGroup(I); assert(Group && "Fail to get an interleaved access group."); unsigned InterleaveFactor = Group->getFactor(); Type *WideVecTy = VectorType::get(ValTy, VF * InterleaveFactor); // Holds the indices of existing members in an interleaved load group. // An interleaved store group doesn't need this as it doesn't allow gaps. SmallVector Indices; if (isa(I)) { for (unsigned i = 0; i < InterleaveFactor; i++) if (Group->getMember(i)) Indices.push_back(i); } // Calculate the cost of the whole interleaved group. unsigned Cost = TTI.getInterleavedMemoryOpCost(I->getOpcode(), WideVecTy, Group->getFactor(), Indices, Group->getAlignment(), AS); if (Group->isReverse()) Cost += Group->getNumMembers() * TTI.getShuffleCost(TargetTransformInfo::SK_Reverse, VectorTy, 0); return Cost; } unsigned LoopVectorizationCostModel::getMemoryInstructionCost(Instruction *I, unsigned VF) { // Calculate scalar cost only. Vectorization cost should be ready at this // moment. if (VF == 1) { Type *ValTy = getMemInstValueType(I); unsigned Alignment = getMemInstAlignment(I); unsigned AS = getMemInstAddressSpace(I); return TTI.getAddressComputationCost(ValTy) + TTI.getMemoryOpCost(I->getOpcode(), ValTy, Alignment, AS, I); } return getWideningCost(I, VF); } LoopVectorizationCostModel::VectorizationCostTy LoopVectorizationCostModel::getInstructionCost(Instruction *I, unsigned VF) { // If we know that this instruction will remain uniform, check the cost of // the scalar version. 
if (isUniformAfterVectorization(I, VF)) VF = 1; if (VF > 1 && isProfitableToScalarize(I, VF)) return VectorizationCostTy(InstsToScalarize[VF][I], false); // Forced scalars do not have any scalarization overhead. if (VF > 1 && ForcedScalars.count(VF) && ForcedScalars.find(VF)->second.count(I)) return VectorizationCostTy((getInstructionCost(I, 1).first * VF), false); Type *VectorTy; unsigned C = getInstructionCost(I, VF, VectorTy); bool TypeNotScalarized = VF > 1 && VectorTy->isVectorTy() && TTI.getNumberOfParts(VectorTy) < VF; return VectorizationCostTy(C, TypeNotScalarized); } void LoopVectorizationCostModel::setCostBasedWideningDecision(unsigned VF) { if (VF == 1) return; NumPredStores = 0; for (BasicBlock *BB : TheLoop->blocks()) { // For each instruction in the old loop. for (Instruction &I : *BB) { Value *Ptr = getLoadStorePointerOperand(&I); if (!Ptr) continue; if (isa(&I) && isScalarWithPredication(&I)) NumPredStores++; if (isa(&I) && Legal->isUniform(Ptr)) { // Scalar load + broadcast unsigned Cost = getUniformMemOpCost(&I, VF); setWideningDecision(&I, VF, CM_Scalarize, Cost); continue; } // We assume that widening is the best solution when possible. if (memoryInstructionCanBeWidened(&I, VF)) { unsigned Cost = getConsecutiveMemOpCost(&I, VF); int ConsecutiveStride = Legal->isConsecutivePtr(getLoadStorePointerOperand(&I)); assert((ConsecutiveStride == 1 || ConsecutiveStride == -1) && "Expected consecutive stride."); InstWidening Decision = ConsecutiveStride == 1 ? CM_Widen : CM_Widen_Reverse; setWideningDecision(&I, VF, Decision, Cost); continue; } // Choose between Interleaving, Gather/Scatter or Scalarization. unsigned InterleaveCost = std::numeric_limits::max(); unsigned NumAccesses = 1; if (isAccessInterleaved(&I)) { auto Group = getInterleavedAccessGroup(&I); assert(Group && "Fail to get an interleaved access group."); // Make one decision for the whole group. if (getWideningDecision(&I, VF) != CM_Unknown) continue; NumAccesses = Group->getNumMembers(); InterleaveCost = getInterleaveGroupCost(&I, VF); } unsigned GatherScatterCost = isLegalGatherOrScatter(&I) ? getGatherScatterCost(&I, VF) * NumAccesses : std::numeric_limits::max(); unsigned ScalarizationCost = getMemInstScalarizationCost(&I, VF) * NumAccesses; // Choose better solution for the current VF, // write down this decision and use it during vectorization. unsigned Cost; InstWidening Decision; if (InterleaveCost <= GatherScatterCost && InterleaveCost < ScalarizationCost) { Decision = CM_Interleave; Cost = InterleaveCost; } else if (GatherScatterCost < ScalarizationCost) { Decision = CM_GatherScatter; Cost = GatherScatterCost; } else { Decision = CM_Scalarize; Cost = ScalarizationCost; } // If the instructions belongs to an interleave group, the whole group // receives the same decision. The whole group receives the cost, but // the cost will actually be assigned to one instruction. if (auto Group = getInterleavedAccessGroup(&I)) setWideningDecision(Group, VF, Decision, Cost); else setWideningDecision(&I, VF, Decision, Cost); } } // Make sure that any load of address and any other address computation // remains scalar unless there is gather/scatter support. This avoids // inevitable extracts into address registers, and also has the benefit of // activating LSR more, since that pass can't optimize vectorized // addresses. if (TTI.prefersVectorizedAddressing()) return; // Start with all scalar pointer uses. 
SmallPtrSet AddrDefs; for (BasicBlock *BB : TheLoop->blocks()) for (Instruction &I : *BB) { Instruction *PtrDef = dyn_cast_or_null(getLoadStorePointerOperand(&I)); if (PtrDef && TheLoop->contains(PtrDef) && getWideningDecision(&I, VF) != CM_GatherScatter) AddrDefs.insert(PtrDef); } // Add all instructions used to generate the addresses. SmallVector Worklist; for (auto *I : AddrDefs) Worklist.push_back(I); while (!Worklist.empty()) { Instruction *I = Worklist.pop_back_val(); for (auto &Op : I->operands()) if (auto *InstOp = dyn_cast(Op)) if ((InstOp->getParent() == I->getParent()) && !isa(InstOp) && AddrDefs.insert(InstOp).second) Worklist.push_back(InstOp); } for (auto *I : AddrDefs) { if (isa(I)) { // Setting the desired widening decision should ideally be handled in // by cost functions, but since this involves the task of finding out // if the loaded register is involved in an address computation, it is // instead changed here when we know this is the case. InstWidening Decision = getWideningDecision(I, VF); if (Decision == CM_Widen || Decision == CM_Widen_Reverse) // Scalarize a widened load of address. setWideningDecision(I, VF, CM_Scalarize, (VF * getMemoryInstructionCost(I, 1))); else if (auto Group = getInterleavedAccessGroup(I)) { // Scalarize an interleave group of address loads. for (unsigned I = 0; I < Group->getFactor(); ++I) { if (Instruction *Member = Group->getMember(I)) setWideningDecision(Member, VF, CM_Scalarize, (VF * getMemoryInstructionCost(Member, 1))); } } } else // Make sure I gets scalarized and a cost estimate without // scalarization overhead. ForcedScalars[VF].insert(I); } } unsigned LoopVectorizationCostModel::getInstructionCost(Instruction *I, unsigned VF, Type *&VectorTy) { Type *RetTy = I->getType(); if (canTruncateToMinimalBitwidth(I, VF)) RetTy = IntegerType::get(RetTy->getContext(), MinBWs[I]); VectorTy = isScalarAfterVectorization(I, VF) ? RetTy : ToVectorTy(RetTy, VF); auto SE = PSE.getSE(); // TODO: We need to estimate the cost of intrinsic calls. switch (I->getOpcode()) { case Instruction::GetElementPtr: // We mark this instruction as zero-cost because the cost of GEPs in // vectorized code depends on whether the corresponding memory instruction // is scalarized or not. Therefore, we handle GEPs with the memory // instruction cost. return 0; case Instruction::Br: { // In cases of scalarized and predicated instructions, there will be VF // predicated blocks in the vectorized loop. Each branch around these // blocks requires also an extract of its vector compare i1 element. bool ScalarPredicatedBB = false; BranchInst *BI = cast(I); if (VF > 1 && BI->isConditional() && (PredicatedBBsAfterVectorization.count(BI->getSuccessor(0)) || PredicatedBBsAfterVectorization.count(BI->getSuccessor(1)))) ScalarPredicatedBB = true; if (ScalarPredicatedBB) { // Return cost for branches around scalarized and predicated blocks. Type *Vec_i1Ty = VectorType::get(IntegerType::getInt1Ty(RetTy->getContext()), VF); return (TTI.getScalarizationOverhead(Vec_i1Ty, false, true) + (TTI.getCFInstrCost(Instruction::Br) * VF)); } else if (I->getParent() == TheLoop->getLoopLatch() || VF == 1) // The back-edge branch will remain, as will all scalar branches. return TTI.getCFInstrCost(Instruction::Br); else // This branch will be eliminated by if-conversion. return 0; // Note: We currently assume zero cost for an unconditional branch inside // a predicated block since it will become a fall-through, although we // may decide in the future to call TTI for all branches. 
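    // Illustrative example (the actual numbers come from TTI and are
    // target-dependent, so this is only a sketch): for VF = 4, the cost
    // returned above for a branch around a scalarized, predicated block is
    //   getScalarizationOverhead(<4 x i1>, /*Insert=*/false, /*Extract=*/true)
    //     + 4 * getCFInstrCost(Br)
    // i.e. one extract per lane of the vector compare plus one scalar branch
    // per lane.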
} case Instruction::PHI: { auto *Phi = cast(I); // First-order recurrences are replaced by vector shuffles inside the loop. if (VF > 1 && Legal->isFirstOrderRecurrence(Phi)) return TTI.getShuffleCost(TargetTransformInfo::SK_ExtractSubvector, VectorTy, VF - 1, VectorTy); // Phi nodes in non-header blocks (not inductions, reductions, etc.) are // converted into select instructions. We require N - 1 selects per phi // node, where N is the number of incoming values. if (VF > 1 && Phi->getParent() != TheLoop->getHeader()) return (Phi->getNumIncomingValues() - 1) * TTI.getCmpSelInstrCost( Instruction::Select, ToVectorTy(Phi->getType(), VF), ToVectorTy(Type::getInt1Ty(Phi->getContext()), VF)); return TTI.getCFInstrCost(Instruction::PHI); } case Instruction::UDiv: case Instruction::SDiv: case Instruction::URem: case Instruction::SRem: // If we have a predicated instruction, it may not be executed for each // vector lane. Get the scalarization cost and scale this amount by the // probability of executing the predicated block. If the instruction is not // predicated, we fall through to the next case. if (VF > 1 && isScalarWithPredication(I)) { unsigned Cost = 0; // These instructions have a non-void type, so account for the phi nodes // that we will create. This cost is likely to be zero. The phi node // cost, if any, should be scaled by the block probability because it // models a copy at the end of each predicated block. Cost += VF * TTI.getCFInstrCost(Instruction::PHI); // The cost of the non-predicated instruction. Cost += VF * TTI.getArithmeticInstrCost(I->getOpcode(), RetTy); // The cost of insertelement and extractelement instructions needed for // scalarization. Cost += getScalarizationOverhead(I, VF, TTI); // Scale the cost by the probability of executing the predicated blocks. // This assumes the predicated block for each vector lane is equally // likely. return Cost / getReciprocalPredBlockProb(); } LLVM_FALLTHROUGH; case Instruction::Add: case Instruction::FAdd: case Instruction::Sub: case Instruction::FSub: case Instruction::Mul: case Instruction::FMul: case Instruction::FDiv: case Instruction::FRem: case Instruction::Shl: case Instruction::LShr: case Instruction::AShr: case Instruction::And: case Instruction::Or: case Instruction::Xor: { // Since we will replace the stride by 1 the multiplication should go away. if (I->getOpcode() == Instruction::Mul && isStrideMul(I, Legal)) return 0; // Certain instructions can be cheaper to vectorize if they have a constant // second vector operand. One example of this are shifts on x86. TargetTransformInfo::OperandValueKind Op1VK = TargetTransformInfo::OK_AnyValue; TargetTransformInfo::OperandValueKind Op2VK = TargetTransformInfo::OK_AnyValue; TargetTransformInfo::OperandValueProperties Op1VP = TargetTransformInfo::OP_None; TargetTransformInfo::OperandValueProperties Op2VP = TargetTransformInfo::OP_None; Value *Op2 = I->getOperand(1); // Check for a splat or for a non uniform vector of constants. 
if (isa(Op2)) { ConstantInt *CInt = cast(Op2); if (CInt && CInt->getValue().isPowerOf2()) Op2VP = TargetTransformInfo::OP_PowerOf2; Op2VK = TargetTransformInfo::OK_UniformConstantValue; } else if (isa(Op2) || isa(Op2)) { Op2VK = TargetTransformInfo::OK_NonUniformConstantValue; Constant *SplatValue = cast(Op2)->getSplatValue(); if (SplatValue) { ConstantInt *CInt = dyn_cast(SplatValue); if (CInt && CInt->getValue().isPowerOf2()) Op2VP = TargetTransformInfo::OP_PowerOf2; Op2VK = TargetTransformInfo::OK_UniformConstantValue; } } else if (Legal->isUniform(Op2)) { Op2VK = TargetTransformInfo::OK_UniformValue; } SmallVector Operands(I->operand_values()); unsigned N = isScalarAfterVectorization(I, VF) ? VF : 1; return N * TTI.getArithmeticInstrCost(I->getOpcode(), VectorTy, Op1VK, Op2VK, Op1VP, Op2VP, Operands); } case Instruction::Select: { SelectInst *SI = cast(I); const SCEV *CondSCEV = SE->getSCEV(SI->getCondition()); bool ScalarCond = (SE->isLoopInvariant(CondSCEV, TheLoop)); Type *CondTy = SI->getCondition()->getType(); if (!ScalarCond) CondTy = VectorType::get(CondTy, VF); return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, CondTy, I); } case Instruction::ICmp: case Instruction::FCmp: { Type *ValTy = I->getOperand(0)->getType(); Instruction *Op0AsInstruction = dyn_cast(I->getOperand(0)); if (canTruncateToMinimalBitwidth(Op0AsInstruction, VF)) ValTy = IntegerType::get(ValTy->getContext(), MinBWs[Op0AsInstruction]); VectorTy = ToVectorTy(ValTy, VF); return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, nullptr, I); } case Instruction::Store: case Instruction::Load: { unsigned Width = VF; if (Width > 1) { InstWidening Decision = getWideningDecision(I, Width); assert(Decision != CM_Unknown && "CM decision should be taken at this point"); if (Decision == CM_Scalarize) Width = 1; } VectorTy = ToVectorTy(getMemInstValueType(I), Width); return getMemoryInstructionCost(I, VF); } case Instruction::ZExt: case Instruction::SExt: case Instruction::FPToUI: case Instruction::FPToSI: case Instruction::FPExt: case Instruction::PtrToInt: case Instruction::IntToPtr: case Instruction::SIToFP: case Instruction::UIToFP: case Instruction::Trunc: case Instruction::FPTrunc: case Instruction::BitCast: { // We optimize the truncation of induction variables having constant // integer steps. The cost of these truncations is the same as the scalar // operation. if (isOptimizableIVTruncate(I, VF)) { auto *Trunc = cast(I); return TTI.getCastInstrCost(Instruction::Trunc, Trunc->getDestTy(), Trunc->getSrcTy(), Trunc); } Type *SrcScalarTy = I->getOperand(0)->getType(); Type *SrcVecTy = VectorTy->isVectorTy() ? ToVectorTy(SrcScalarTy, VF) : SrcScalarTy; if (canTruncateToMinimalBitwidth(I, VF)) { // This cast is going to be shrunk. This may remove the cast or it might // turn it into slightly different cast. For example, if MinBW == 16, // "zext i8 %1 to i32" becomes "zext i8 %1 to i16". // // Calculate the modified src and dest types. Type *MinVecTy = VectorTy; if (I->getOpcode() == Instruction::Trunc) { SrcVecTy = smallestIntegerVectorType(SrcVecTy, MinVecTy); VectorTy = largestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy); } else if (I->getOpcode() == Instruction::ZExt || I->getOpcode() == Instruction::SExt) { SrcVecTy = largestIntegerVectorType(SrcVecTy, MinVecTy); VectorTy = smallestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy); } } unsigned N = isScalarAfterVectorization(I, VF) ? 
VF : 1; return N * TTI.getCastInstrCost(I->getOpcode(), VectorTy, SrcVecTy, I); } case Instruction::Call: { bool NeedToScalarize; CallInst *CI = cast(I); unsigned CallCost = getVectorCallCost(CI, VF, TTI, TLI, NeedToScalarize); if (getVectorIntrinsicIDForCall(CI, TLI)) return std::min(CallCost, getVectorIntrinsicCost(CI, VF, TTI, TLI)); return CallCost; } default: // The cost of executing VF copies of the scalar instruction. This opcode // is unknown. Assume that it is the same as 'mul'. return VF * TTI.getArithmeticInstrCost(Instruction::Mul, VectorTy) + getScalarizationOverhead(I, VF, TTI); } // end of switch. } char LoopVectorize::ID = 0; static const char lv_name[] = "Loop Vectorization"; INITIALIZE_PASS_BEGIN(LoopVectorize, LV_NAME, lv_name, false, false) INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass) INITIALIZE_PASS_DEPENDENCY(BasicAAWrapperPass) INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass) INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass) INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker) INITIALIZE_PASS_DEPENDENCY(BlockFrequencyInfoWrapperPass) INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass) INITIALIZE_PASS_DEPENDENCY(ScalarEvolutionWrapperPass) INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass) INITIALIZE_PASS_DEPENDENCY(LoopAccessLegacyAnalysis) INITIALIZE_PASS_DEPENDENCY(DemandedBitsWrapperPass) INITIALIZE_PASS_DEPENDENCY(OptimizationRemarkEmitterWrapperPass) INITIALIZE_PASS_END(LoopVectorize, LV_NAME, lv_name, false, false) namespace llvm { Pass *createLoopVectorizePass(bool NoUnrolling, bool AlwaysVectorize) { return new LoopVectorize(NoUnrolling, AlwaysVectorize); } } // end namespace llvm bool LoopVectorizationCostModel::isConsecutiveLoadOrStore(Instruction *Inst) { // Check if the pointer operand of a load or store instruction is // consecutive. if (auto *Ptr = getLoadStorePointerOperand(Inst)) return Legal->isConsecutivePtr(Ptr); return false; } void LoopVectorizationCostModel::collectValuesToIgnore() { // Ignore ephemeral values. CodeMetrics::collectEphemeralValues(TheLoop, AC, ValuesToIgnore); // Ignore type-promoting instructions we identified during reduction // detection. for (auto &Reduction : *Legal->getReductionVars()) { RecurrenceDescriptor &RedDes = Reduction.second; SmallPtrSetImpl &Casts = RedDes.getCastInsts(); VecValuesToIgnore.insert(Casts.begin(), Casts.end()); } // Ignore type-casting instructions we identified during induction // detection. for (auto &Induction : *Legal->getInductionVars()) { InductionDescriptor &IndDes = Induction.second; const SmallVectorImpl &Casts = IndDes.getCastInsts(); VecValuesToIgnore.insert(Casts.begin(), Casts.end()); } } VectorizationFactor LoopVectorizationPlanner::planInVPlanNativePath(bool OptForSize, unsigned UserVF) { // Width 1 means no vectorization, cost 0 means uncomputed cost. const VectorizationFactor NoVectorization = {1U, 0U}; // Outer loop handling: They may require CFG and instruction level // transformations before even evaluating whether vectorization is profitable. // Since we cannot modify the incoming IR, we need to build VPlan upfront in // the vectorization pipeline. if (!OrigLoop->empty()) { // TODO: If UserVF is not provided, we set UserVF to 4 for stress testing. // This won't be necessary when UserVF is not required in the VPlan-native // path. 
if (VPlanBuildStressTest && !UserVF) UserVF = 4; assert(EnableVPlanNativePath && "VPlan-native path is not enabled."); assert(UserVF && "Expected UserVF for outer loop vectorization."); assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two"); LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n"); buildVPlans(UserVF, UserVF); // For VPlan build stress testing, we bail out after VPlan construction. if (VPlanBuildStressTest) return NoVectorization; return {UserVF, 0}; } LLVM_DEBUG( dbgs() << "LV: Not vectorizing. Inner loops aren't supported in the " "VPlan-native path.\n"); return NoVectorization; } VectorizationFactor LoopVectorizationPlanner::plan(bool OptForSize, unsigned UserVF) { assert(OrigLoop->empty() && "Inner loop expected."); // Width 1 means no vectorization, cost 0 means uncomputed cost. const VectorizationFactor NoVectorization = {1U, 0U}; Optional MaybeMaxVF = CM.computeMaxVF(OptForSize); if (!MaybeMaxVF.hasValue()) // Cases considered too costly to vectorize. return NoVectorization; if (UserVF) { LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n"); assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two"); // Collect the instructions (and their associated costs) that will be more // profitable to scalarize. CM.selectUserVectorizationFactor(UserVF); buildVPlansWithVPRecipes(UserVF, UserVF); LLVM_DEBUG(printPlans(dbgs())); return {UserVF, 0}; } unsigned MaxVF = MaybeMaxVF.getValue(); assert(MaxVF != 0 && "MaxVF is zero."); for (unsigned VF = 1; VF <= MaxVF; VF *= 2) { // Collect Uniform and Scalar instructions after vectorization with VF. CM.collectUniformsAndScalars(VF); // Collect the instructions (and their associated costs) that will be more // profitable to scalarize. if (VF > 1) CM.collectInstsToScalarize(VF); } buildVPlansWithVPRecipes(1, MaxVF); LLVM_DEBUG(printPlans(dbgs())); if (MaxVF == 1) return NoVectorization; // Select the optimal vectorization factor. return CM.selectVectorizationFactor(MaxVF); } void LoopVectorizationPlanner::setBestPlan(unsigned VF, unsigned UF) { LLVM_DEBUG(dbgs() << "Setting best plan to VF=" << VF << ", UF=" << UF << '\n'); BestVF = VF; BestUF = UF; erase_if(VPlans, [VF](const VPlanPtr &Plan) { return !Plan->hasVF(VF); }); assert(VPlans.size() == 1 && "Best VF has not a single VPlan."); } void LoopVectorizationPlanner::executePlan(InnerLoopVectorizer &ILV, DominatorTree *DT) { // Perform the actual loop transformation. // 1. Create a new empty loop. Unlink the old loop and connect the new one. VPCallbackILV CallbackILV(ILV); VPTransformState State{BestVF, BestUF, LI, DT, ILV.Builder, ILV.VectorLoopValueMap, &ILV, CallbackILV}; State.CFG.PrevBB = ILV.createVectorizedLoopSkeleton(); //===------------------------------------------------===// // // Notice: any optimization or new instruction that go // into the code below should also be implemented in // the cost-model. // //===------------------------------------------------===// // 2. Copy and widen instructions from the old loop into the new loop. assert(VPlans.size() == 1 && "Not a single VPlan to execute."); VPlans.front()->execute(&State); // 3. Fix the vectorized code: take care of header phi's, live-outs, // predication, updating analyses. 
ILV.fixVectorizedLoop(); } void LoopVectorizationPlanner::collectTriviallyDeadInstructions( SmallPtrSetImpl &DeadInstructions) { BasicBlock *Latch = OrigLoop->getLoopLatch(); // We create new control-flow for the vectorized loop, so the original // condition will be dead after vectorization if it's only used by the // branch. auto *Cmp = dyn_cast(Latch->getTerminator()->getOperand(0)); if (Cmp && Cmp->hasOneUse()) DeadInstructions.insert(Cmp); // We create new "steps" for induction variable updates to which the original // induction variables map. An original update instruction will be dead if // all its users except the induction variable are dead. for (auto &Induction : *Legal->getInductionVars()) { PHINode *Ind = Induction.first; auto *IndUpdate = cast(Ind->getIncomingValueForBlock(Latch)); if (llvm::all_of(IndUpdate->users(), [&](User *U) -> bool { return U == Ind || DeadInstructions.count(cast(U)); })) DeadInstructions.insert(IndUpdate); // We record as "Dead" also the type-casting instructions we had identified // during induction analysis. We don't need any handling for them in the // vectorized loop because we have proven that, under a proper runtime // test guarding the vectorized loop, the value of the phi, and the casted // value of the phi, are the same. The last instruction in this casting chain // will get its scalar/vector/widened def from the scalar/vector/widened def // of the respective phi node. Any other casts in the induction def-use chain // have no other uses outside the phi update chain, and will be ignored. InductionDescriptor &IndDes = Induction.second; const SmallVectorImpl &Casts = IndDes.getCastInsts(); DeadInstructions.insert(Casts.begin(), Casts.end()); } } Value *InnerLoopUnroller::reverseVector(Value *Vec) { return Vec; } Value *InnerLoopUnroller::getBroadcastInstrs(Value *V) { return V; } Value *InnerLoopUnroller::getStepVector(Value *Val, int StartIdx, Value *Step, Instruction::BinaryOps BinOp) { // When unrolling and the VF is 1, we only need to add a simple scalar. Type *Ty = Val->getType(); assert(!Ty->isVectorTy() && "Val must be a scalar"); if (Ty->isFloatingPointTy()) { Constant *C = ConstantFP::get(Ty, (double)StartIdx); // Floating point operations had to be 'fast' to enable the unrolling. Value *MulOp = addFastMathFlag(Builder.CreateFMul(C, Step)); return addFastMathFlag(Builder.CreateBinOp(BinOp, Val, MulOp)); } Constant *C = ConstantInt::get(Ty, StartIdx); return Builder.CreateAdd(Val, Builder.CreateMul(C, Step), "induction"); } static void AddRuntimeUnrollDisableMetaData(Loop *L) { SmallVector MDs; // Reserve first location for self reference to the LoopID metadata node. MDs.push_back(nullptr); bool IsUnrollMetadata = false; MDNode *LoopID = L->getLoopID(); if (LoopID) { // First find existing loop unrolling disable metadata. for (unsigned i = 1, ie = LoopID->getNumOperands(); i < ie; ++i) { auto *MD = dyn_cast(LoopID->getOperand(i)); if (MD) { const auto *S = dyn_cast(MD->getOperand(0)); IsUnrollMetadata = S && S->getString().startswith("llvm.loop.unroll.disable"); } MDs.push_back(LoopID->getOperand(i)); } } if (!IsUnrollMetadata) { // Add runtime unroll disable metadata. LLVMContext &Context = L->getHeader()->getContext(); SmallVector DisableOperands; DisableOperands.push_back( MDString::get(Context, "llvm.loop.unroll.runtime.disable")); MDNode *DisableNode = MDNode::get(Context, DisableOperands); MDs.push_back(DisableNode); MDNode *NewLoopID = MDNode::get(Context, MDs); // Set operand 0 to refer to the loop id itself. 
    NewLoopID->replaceOperandWith(0, NewLoopID);
    L->setLoopID(NewLoopID);
  }
}

bool LoopVectorizationPlanner::getDecisionAndClampRange(
    const std::function<bool(unsigned)> &Predicate, VFRange &Range) {
  assert(Range.End > Range.Start && "Trying to test an empty VF range.");
  bool PredicateAtRangeStart = Predicate(Range.Start);

  for (unsigned TmpVF = Range.Start * 2; TmpVF < Range.End; TmpVF *= 2)
    if (Predicate(TmpVF) != PredicateAtRangeStart) {
      Range.End = TmpVF;
      break;
    }

  return PredicateAtRangeStart;
}

/// Build VPlans for the full range of feasible VF's = {\p MinVF, 2 * \p MinVF,
/// 4 * \p MinVF, ..., \p MaxVF} by repeatedly building a VPlan for a sub-range
/// of VF's starting at a given VF and extending it as much as possible. Each
/// vectorization decision can potentially shorten this sub-range during
/// buildVPlan().
void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF) {
  for (unsigned VF = MinVF; VF < MaxVF + 1;) {
    VFRange SubRange = {VF, MaxVF + 1};
    VPlans.push_back(buildVPlan(SubRange));
    VF = SubRange.End;
  }
}

VPValue *VPRecipeBuilder::createEdgeMask(BasicBlock *Src, BasicBlock *Dst,
                                         VPlanPtr &Plan) {
  assert(is_contained(predecessors(Dst), Src) && "Invalid edge");

  // Look for cached value.
  std::pair<BasicBlock *, BasicBlock *> Edge(Src, Dst);
  EdgeMaskCacheTy::iterator ECEntryIt = EdgeMaskCache.find(Edge);
  if (ECEntryIt != EdgeMaskCache.end())
    return ECEntryIt->second;

  VPValue *SrcMask = createBlockInMask(Src, Plan);

  // The terminator has to be a branch inst!
  BranchInst *BI = dyn_cast<BranchInst>(Src->getTerminator());
  assert(BI && "Unexpected terminator found");

  if (!BI->isConditional())
    return EdgeMaskCache[Edge] = SrcMask;

  VPValue *EdgeMask = Plan->getVPValue(BI->getCondition());
  assert(EdgeMask && "No Edge Mask found for condition");

  if (BI->getSuccessor(0) != Dst)
    EdgeMask = Builder.createNot(EdgeMask);

  if (SrcMask) // Otherwise block in-mask is all-one, no need to AND.
    EdgeMask = Builder.createAnd(EdgeMask, SrcMask);

  return EdgeMaskCache[Edge] = EdgeMask;
}

VPValue *VPRecipeBuilder::createBlockInMask(BasicBlock *BB, VPlanPtr &Plan) {
  assert(OrigLoop->contains(BB) && "Block is not a part of a loop");

  // Look for cached value.
  BlockMaskCacheTy::iterator BCEntryIt = BlockMaskCache.find(BB);
  if (BCEntryIt != BlockMaskCache.end())
    return BCEntryIt->second;

  // All-one mask is modelled as no-mask following the convention for masked
  // load/store/gather/scatter. Initialize BlockMask to no-mask.
  VPValue *BlockMask = nullptr;

  // Loop incoming mask is all-one.
  if (OrigLoop->getHeader() == BB)
    return BlockMaskCache[BB] = BlockMask;

  // This is the block mask. We OR all incoming edges.
  for (auto *Predecessor : predecessors(BB)) {
    VPValue *EdgeMask = createEdgeMask(Predecessor, BB, Plan);
    if (!EdgeMask) // Mask of predecessor is all-one so mask of block is too.
      return BlockMaskCache[BB] = EdgeMask;

    if (!BlockMask) { // BlockMask has its initialized nullptr value.
      BlockMask = EdgeMask;
      continue;
    }

    BlockMask = Builder.createOr(BlockMask, EdgeMask);
  }

  return BlockMaskCache[BB] = BlockMask;
}

VPInterleaveRecipe *VPRecipeBuilder::tryToInterleaveMemory(Instruction *I,
                                                           VFRange &Range) {
  const InterleaveGroup *IG = CM.getInterleavedAccessGroup(I);
  if (!IG)
    return nullptr;

  // Now check if IG is relevant for VF's in the given range.
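  // Illustrative example (VF values are hypothetical): if the cost model
  // chose CM_Interleave for I at VF = 2 and VF = 4 but not at VF = 8,
  // getDecisionAndClampRange() below clamps a range of [2, 16) down to
  // [2, 8) and this recipe then covers only that sub-range; the remaining
  // VFs are handled when the planner builds the next VPlan starting at 8.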
auto isIGMember = [&](Instruction *I) -> std::function { return [=](unsigned VF) -> bool { return (VF >= 2 && // Query is illegal for VF == 1 CM.getWideningDecision(I, VF) == LoopVectorizationCostModel::CM_Interleave); }; }; if (!LoopVectorizationPlanner::getDecisionAndClampRange(isIGMember(I), Range)) return nullptr; // I is a member of an InterleaveGroup for VF's in the (possibly trimmed) // range. If it's the primary member of the IG construct a VPInterleaveRecipe. // Otherwise, it's an adjunct member of the IG, do not construct any Recipe. assert(I == IG->getInsertPos() && "Generating a recipe for an adjunct member of an interleave group"); return new VPInterleaveRecipe(IG); } VPWidenMemoryInstructionRecipe * VPRecipeBuilder::tryToWidenMemory(Instruction *I, VFRange &Range, VPlanPtr &Plan) { if (!isa(I) && !isa(I)) return nullptr; auto willWiden = [&](unsigned VF) -> bool { if (VF == 1) return false; if (CM.isScalarAfterVectorization(I, VF) || CM.isProfitableToScalarize(I, VF)) return false; LoopVectorizationCostModel::InstWidening Decision = CM.getWideningDecision(I, VF); assert(Decision != LoopVectorizationCostModel::CM_Unknown && "CM decision should be taken at this point."); assert(Decision != LoopVectorizationCostModel::CM_Interleave && "Interleave memory opportunity should be caught earlier."); return Decision != LoopVectorizationCostModel::CM_Scalarize; }; if (!LoopVectorizationPlanner::getDecisionAndClampRange(willWiden, Range)) return nullptr; VPValue *Mask = nullptr; if (Legal->isMaskRequired(I)) Mask = createBlockInMask(I->getParent(), Plan); return new VPWidenMemoryInstructionRecipe(*I, Mask); } VPWidenIntOrFpInductionRecipe * VPRecipeBuilder::tryToOptimizeInduction(Instruction *I, VFRange &Range) { if (PHINode *Phi = dyn_cast(I)) { // Check if this is an integer or fp induction. If so, build the recipe that // produces its scalar and vector values. InductionDescriptor II = Legal->getInductionVars()->lookup(Phi); if (II.getKind() == InductionDescriptor::IK_IntInduction || II.getKind() == InductionDescriptor::IK_FpInduction) return new VPWidenIntOrFpInductionRecipe(Phi); return nullptr; } // Optimize the special case where the source is a constant integer // induction variable. Notice that we can only optimize the 'trunc' case // because (a) FP conversions lose precision, (b) sext/zext may wrap, and // (c) other casts depend on pointer size. // Determine whether \p K is a truncation based on an induction variable that // can be optimized. auto isOptimizableIVTruncate = [&](Instruction *K) -> std::function { return [=](unsigned VF) -> bool { return CM.isOptimizableIVTruncate(K, VF); }; }; if (isa(I) && LoopVectorizationPlanner::getDecisionAndClampRange( isOptimizableIVTruncate(I), Range)) return new VPWidenIntOrFpInductionRecipe(cast(I->getOperand(0)), cast(I)); return nullptr; } VPBlendRecipe *VPRecipeBuilder::tryToBlend(Instruction *I, VPlanPtr &Plan) { PHINode *Phi = dyn_cast(I); if (!Phi || Phi->getParent() == OrigLoop->getHeader()) return nullptr; // We know that all PHIs in non-header blocks are converted into selects, so // we don't have to worry about the insertion order and we can just use the // builder. At this point we generate the predication tree. There may be // duplications since this is a simple recursive scan, but future // optimizations will clean it up. 
SmallVector Masks; unsigned NumIncoming = Phi->getNumIncomingValues(); for (unsigned In = 0; In < NumIncoming; In++) { VPValue *EdgeMask = createEdgeMask(Phi->getIncomingBlock(In), Phi->getParent(), Plan); assert((EdgeMask || NumIncoming == 1) && "Multiple predecessors with one having a full mask"); if (EdgeMask) Masks.push_back(EdgeMask); } return new VPBlendRecipe(Phi, Masks); } bool VPRecipeBuilder::tryToWiden(Instruction *I, VPBasicBlock *VPBB, VFRange &Range) { if (CM.isScalarWithPredication(I)) return false; auto IsVectorizableOpcode = [](unsigned Opcode) { switch (Opcode) { case Instruction::Add: case Instruction::And: case Instruction::AShr: case Instruction::BitCast: case Instruction::Br: case Instruction::Call: case Instruction::FAdd: case Instruction::FCmp: case Instruction::FDiv: case Instruction::FMul: case Instruction::FPExt: case Instruction::FPToSI: case Instruction::FPToUI: case Instruction::FPTrunc: case Instruction::FRem: case Instruction::FSub: case Instruction::GetElementPtr: case Instruction::ICmp: case Instruction::IntToPtr: case Instruction::Load: case Instruction::LShr: case Instruction::Mul: case Instruction::Or: case Instruction::PHI: case Instruction::PtrToInt: case Instruction::SDiv: case Instruction::Select: case Instruction::SExt: case Instruction::Shl: case Instruction::SIToFP: case Instruction::SRem: case Instruction::Store: case Instruction::Sub: case Instruction::Trunc: case Instruction::UDiv: case Instruction::UIToFP: case Instruction::URem: case Instruction::Xor: case Instruction::ZExt: return true; } return false; }; if (!IsVectorizableOpcode(I->getOpcode())) return false; if (CallInst *CI = dyn_cast(I)) { Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI); if (ID && (ID == Intrinsic::assume || ID == Intrinsic::lifetime_end || ID == Intrinsic::lifetime_start || ID == Intrinsic::sideeffect)) return false; } auto willWiden = [&](unsigned VF) -> bool { if (!isa(I) && (CM.isScalarAfterVectorization(I, VF) || CM.isProfitableToScalarize(I, VF))) return false; if (CallInst *CI = dyn_cast(I)) { Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI); // The following case may be scalarized depending on the VF. // The flag shows whether we use Intrinsic or a usual Call for vectorized // version of the instruction. // Is it beneficial to perform intrinsic call compared to lib call? bool NeedToScalarize; unsigned CallCost = getVectorCallCost(CI, VF, *TTI, TLI, NeedToScalarize); bool UseVectorIntrinsic = ID && getVectorIntrinsicCost(CI, VF, *TTI, TLI) <= CallCost; return UseVectorIntrinsic || !NeedToScalarize; } if (isa(I) || isa(I)) { assert(CM.getWideningDecision(I, VF) == LoopVectorizationCostModel::CM_Scalarize && "Memory widening decisions should have been taken care by now"); return false; } return true; }; if (!LoopVectorizationPlanner::getDecisionAndClampRange(willWiden, Range)) return false; // Success: widen this instruction. We optimize the common case where // consecutive instructions can be represented by a single recipe. 
if (!VPBB->empty()) { VPWidenRecipe *LastWidenRecipe = dyn_cast(&VPBB->back()); if (LastWidenRecipe && LastWidenRecipe->appendInstruction(I)) return true; } VPBB->appendRecipe(new VPWidenRecipe(I)); return true; } VPBasicBlock *VPRecipeBuilder::handleReplication( Instruction *I, VFRange &Range, VPBasicBlock *VPBB, DenseMap &PredInst2Recipe, VPlanPtr &Plan) { bool IsUniform = LoopVectorizationPlanner::getDecisionAndClampRange( [&](unsigned VF) { return CM.isUniformAfterVectorization(I, VF); }, Range); bool IsPredicated = CM.isScalarWithPredication(I); auto *Recipe = new VPReplicateRecipe(I, IsUniform, IsPredicated); // Find if I uses a predicated instruction. If so, it will use its scalar // value. Avoid hoisting the insert-element which packs the scalar value into // a vector value, as that happens iff all users use the vector value. for (auto &Op : I->operands()) if (auto *PredInst = dyn_cast(Op)) if (PredInst2Recipe.find(PredInst) != PredInst2Recipe.end()) PredInst2Recipe[PredInst]->setAlsoPack(false); // Finalize the recipe for Instr, first if it is not predicated. if (!IsPredicated) { LLVM_DEBUG(dbgs() << "LV: Scalarizing:" << *I << "\n"); VPBB->appendRecipe(Recipe); return VPBB; } LLVM_DEBUG(dbgs() << "LV: Scalarizing and predicating:" << *I << "\n"); assert(VPBB->getSuccessors().empty() && "VPBB has successors when handling predicated replication."); // Record predicated instructions for above packing optimizations. PredInst2Recipe[I] = Recipe; VPBlockBase *Region = createReplicateRegion(I, Recipe, Plan); VPBlockUtils::insertBlockAfter(Region, VPBB); auto *RegSucc = new VPBasicBlock(); VPBlockUtils::insertBlockAfter(RegSucc, Region); return RegSucc; } VPRegionBlock *VPRecipeBuilder::createReplicateRegion(Instruction *Instr, VPRecipeBase *PredRecipe, VPlanPtr &Plan) { // Instructions marked for predication are replicated and placed under an // if-then construct to prevent side-effects. // Generate recipes to compute the block mask for this region. VPValue *BlockInMask = createBlockInMask(Instr->getParent(), Plan); // Build the triangular if-then region. std::string RegionName = (Twine("pred.") + Instr->getOpcodeName()).str(); assert(Instr->getParent() && "Predicated instruction not in any basic block"); auto *BOMRecipe = new VPBranchOnMaskRecipe(BlockInMask); auto *Entry = new VPBasicBlock(Twine(RegionName) + ".entry", BOMRecipe); auto *PHIRecipe = Instr->getType()->isVoidTy() ? nullptr : new VPPredInstPHIRecipe(Instr); auto *Exit = new VPBasicBlock(Twine(RegionName) + ".continue", PHIRecipe); auto *Pred = new VPBasicBlock(Twine(RegionName) + ".if", PredRecipe); VPRegionBlock *Region = new VPRegionBlock(Entry, Exit, RegionName, true); // Note: first set Entry as region entry and then connect successors starting // from it in order, to propagate the "parent" of each VPBasicBlock. VPBlockUtils::insertTwoBlocksAfter(Pred, Exit, BlockInMask, Entry); VPBlockUtils::connectBlocks(Pred, Exit); return Region; } bool VPRecipeBuilder::tryToCreateRecipe(Instruction *Instr, VFRange &Range, VPlanPtr &Plan, VPBasicBlock *VPBB) { VPRecipeBase *Recipe = nullptr; // Check if Instr should belong to an interleave memory recipe, or already // does. In the latter case Instr is irrelevant. if ((Recipe = tryToInterleaveMemory(Instr, Range))) { VPBB->appendRecipe(Recipe); return true; } // Check if Instr is a memory operation that should be widened. if ((Recipe = tryToWidenMemory(Instr, Range, Plan))) { VPBB->appendRecipe(Recipe); return true; } // Check if Instr should form some PHI recipe. 
if ((Recipe = tryToOptimizeInduction(Instr, Range))) { VPBB->appendRecipe(Recipe); return true; } if ((Recipe = tryToBlend(Instr, Plan))) { VPBB->appendRecipe(Recipe); return true; } if (PHINode *Phi = dyn_cast(Instr)) { VPBB->appendRecipe(new VPWidenPHIRecipe(Phi)); return true; } // Check if Instr is to be widened by a general VPWidenRecipe, after // having first checked for specific widening recipes that deal with // Interleave Groups, Inductions and Phi nodes. if (tryToWiden(Instr, VPBB, Range)) return true; return false; } void LoopVectorizationPlanner::buildVPlansWithVPRecipes(unsigned MinVF, unsigned MaxVF) { assert(OrigLoop->empty() && "Inner loop expected."); // Collect conditions feeding internal conditional branches; they need to be // represented in VPlan for it to model masking. SmallPtrSet NeedDef; auto *Latch = OrigLoop->getLoopLatch(); for (BasicBlock *BB : OrigLoop->blocks()) { if (BB == Latch) continue; BranchInst *Branch = dyn_cast(BB->getTerminator()); if (Branch && Branch->isConditional()) NeedDef.insert(Branch->getCondition()); } // Collect instructions from the original loop that will become trivially dead // in the vectorized loop. We don't need to vectorize these instructions. For // example, original induction update instructions can become dead because we // separately emit induction "steps" when generating code for the new loop. // Similarly, we create a new latch condition when setting up the structure // of the new loop, so the old one can become dead. SmallPtrSet DeadInstructions; collectTriviallyDeadInstructions(DeadInstructions); for (unsigned VF = MinVF; VF < MaxVF + 1;) { VFRange SubRange = {VF, MaxVF + 1}; VPlans.push_back( buildVPlanWithVPRecipes(SubRange, NeedDef, DeadInstructions)); VF = SubRange.End; } } LoopVectorizationPlanner::VPlanPtr LoopVectorizationPlanner::buildVPlanWithVPRecipes( VFRange &Range, SmallPtrSetImpl &NeedDef, SmallPtrSetImpl &DeadInstructions) { // Hold a mapping from predicated instructions to their recipes, in order to // fix their AlsoPack behavior if a user is determined to replicate and use a // scalar instead of vector value. DenseMap PredInst2Recipe; DenseMap &SinkAfter = Legal->getSinkAfter(); DenseMap SinkAfterInverse; // Create a dummy pre-entry VPBasicBlock to start building the VPlan. VPBasicBlock *VPBB = new VPBasicBlock("Pre-Entry"); auto Plan = llvm::make_unique(VPBB); VPRecipeBuilder RecipeBuilder(OrigLoop, TLI, TTI, Legal, CM, Builder); // Represent values that will have defs inside VPlan. for (Value *V : NeedDef) Plan->addVPValue(V); // Scan the body of the loop in a topological order to visit each basic block // after having visited its predecessor basic blocks. LoopBlocksDFS DFS(OrigLoop); DFS.perform(LI); for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) { // Relevant instructions from basic block BB will be grouped into VPRecipe // ingredients and fill a new VPBasicBlock. unsigned VPBBsForBB = 0; auto *FirstVPBBForBB = new VPBasicBlock(BB->getName()); VPBlockUtils::insertBlockAfter(FirstVPBBForBB, VPBB); VPBB = FirstVPBBForBB; Builder.setInsertPoint(VPBB); std::vector Ingredients; // Organize the ingredients to vectorize from current basic block in the // right order. for (Instruction &I : BB->instructionsWithoutDebug()) { Instruction *Instr = &I; // First filter out irrelevant instructions, to ensure no recipes are // built for them. if (isa(Instr) || DeadInstructions.count(Instr)) continue; // I is a member of an InterleaveGroup for Range.Start. 
If it's an adjunct // member of the IG, do not construct any Recipe for it. const InterleaveGroup *IG = CM.getInterleavedAccessGroup(Instr); if (IG && Instr != IG->getInsertPos() && Range.Start >= 2 && // Query is illegal for VF == 1 CM.getWideningDecision(Instr, Range.Start) == LoopVectorizationCostModel::CM_Interleave) { if (SinkAfterInverse.count(Instr)) Ingredients.push_back(SinkAfterInverse.find(Instr)->second); continue; } // Move instructions to handle first-order recurrences, step 1: avoid // handling this instruction until after we've handled the instruction it // should follow. auto SAIt = SinkAfter.find(Instr); if (SAIt != SinkAfter.end()) { LLVM_DEBUG(dbgs() << "Sinking" << *SAIt->first << " after" << *SAIt->second << " to vectorize a 1st order recurrence.\n"); SinkAfterInverse[SAIt->second] = Instr; continue; } Ingredients.push_back(Instr); // Move instructions to handle first-order recurrences, step 2: push the // instruction to be sunk at its insertion point. auto SAInvIt = SinkAfterInverse.find(Instr); if (SAInvIt != SinkAfterInverse.end()) Ingredients.push_back(SAInvIt->second); } // Introduce each ingredient into VPlan. for (Instruction *Instr : Ingredients) { if (RecipeBuilder.tryToCreateRecipe(Instr, Range, Plan, VPBB)) continue; // Otherwise, if all widening options failed, Instruction is to be // replicated. This may create a successor for VPBB. VPBasicBlock *NextVPBB = RecipeBuilder.handleReplication( Instr, Range, VPBB, PredInst2Recipe, Plan); if (NextVPBB != VPBB) { VPBB = NextVPBB; VPBB->setName(BB->hasName() ? BB->getName() + "." + Twine(VPBBsForBB++) : ""); } } } // Discard empty dummy pre-entry VPBasicBlock. Note that other VPBasicBlocks // may also be empty, such as the last one VPBB, reflecting original // basic-blocks with no recipes. VPBasicBlock *PreEntry = cast(Plan->getEntry()); assert(PreEntry->empty() && "Expecting empty pre-entry block."); VPBlockBase *Entry = Plan->setEntry(PreEntry->getSingleSuccessor()); VPBlockUtils::disconnectBlocks(PreEntry, Entry); delete PreEntry; std::string PlanName; raw_string_ostream RSO(PlanName); unsigned VF = Range.Start; Plan->addVF(VF); RSO << "Initial VPlan for VF={" << VF; for (VF *= 2; VF < Range.End; VF *= 2) { Plan->addVF(VF); RSO << "," << VF; } RSO << "},UF>=1"; RSO.flush(); Plan->setName(PlanName); return Plan; } LoopVectorizationPlanner::VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) { // Outer loop handling: They may require CFG and instruction level // transformations before even evaluating whether vectorization is profitable. // Since we cannot modify the incoming IR, we need to build VPlan upfront in // the vectorization pipeline. 
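  // (Note: in this release the VPlan-native path only builds the plan and its
  // hierarchical CFG below; processLoopInVPlanNativePath() still returns false
  // and no vector code is generated for outer loops yet.)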
assert(!OrigLoop->empty()); assert(EnableVPlanNativePath && "VPlan-native path is not enabled."); // Create new empty VPlan auto Plan = llvm::make_unique(); // Build hierarchical CFG VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan); HCFGBuilder.buildHierarchicalCFG(); return Plan; } Value* LoopVectorizationPlanner::VPCallbackILV:: getOrCreateVectorValues(Value *V, unsigned Part) { return ILV.getOrCreateVectorValue(V, Part); } void VPInterleaveRecipe::print(raw_ostream &O, const Twine &Indent) const { O << " +\n" << Indent << "\"INTERLEAVE-GROUP with factor " << IG->getFactor() << " at "; IG->getInsertPos()->printAsOperand(O, false); O << "\\l\""; for (unsigned i = 0; i < IG->getFactor(); ++i) if (Instruction *I = IG->getMember(i)) O << " +\n" << Indent << "\" " << VPlanIngredient(I) << " " << i << "\\l\""; } void VPWidenRecipe::execute(VPTransformState &State) { for (auto &Instr : make_range(Begin, End)) State.ILV->widenInstruction(Instr); } void VPWidenIntOrFpInductionRecipe::execute(VPTransformState &State) { assert(!State.Instance && "Int or FP induction being replicated."); State.ILV->widenIntOrFpInduction(IV, Trunc); } void VPWidenPHIRecipe::execute(VPTransformState &State) { State.ILV->widenPHIInstruction(Phi, State.UF, State.VF); } void VPBlendRecipe::execute(VPTransformState &State) { State.ILV->setDebugLocFromInst(State.Builder, Phi); // We know that all PHIs in non-header blocks are converted into // selects, so we don't have to worry about the insertion order and we // can just use the builder. // At this point we generate the predication tree. There may be // duplications since this is a simple recursive scan, but future // optimizations will clean it up. unsigned NumIncoming = Phi->getNumIncomingValues(); assert((User || NumIncoming == 1) && "Multiple predecessors with predecessors having a full mask"); // Generate a sequence of selects of the form: // SELECT(Mask3, In3, // SELECT(Mask2, In2, // ( ...))) InnerLoopVectorizer::VectorParts Entry(State.UF); for (unsigned In = 0; In < NumIncoming; ++In) { for (unsigned Part = 0; Part < State.UF; ++Part) { // We might have single edge PHIs (blocks) - use an identity // 'select' for the first PHI operand. Value *In0 = State.ILV->getOrCreateVectorValue(Phi->getIncomingValue(In), Part); if (In == 0) Entry[Part] = In0; // Initialize with the first incoming value. else { // Select between the current value and the previous incoming edge // based on the incoming mask. Value *Cond = State.get(User->getOperand(In), Part); Entry[Part] = State.Builder.CreateSelect(Cond, In0, Entry[Part], "predphi"); } } } for (unsigned Part = 0; Part < State.UF; ++Part) State.ValueMap.setVectorValue(Phi, Part, Entry[Part]); } void VPInterleaveRecipe::execute(VPTransformState &State) { assert(!State.Instance && "Interleave group being replicated."); State.ILV->vectorizeInterleaveGroup(IG->getInsertPos()); } void VPReplicateRecipe::execute(VPTransformState &State) { if (State.Instance) { // Generate a single instance. State.ILV->scalarizeInstruction(Ingredient, *State.Instance, IsPredicated); // Insert scalar instance packing it into a vector. if (AlsoPack && State.VF > 1) { // If we're constructing lane 0, initialize to start from undef. 
if (State.Instance->Lane == 0) { Value *Undef = UndefValue::get(VectorType::get(Ingredient->getType(), State.VF)); State.ValueMap.setVectorValue(Ingredient, State.Instance->Part, Undef); } State.ILV->packScalarIntoVectorValue(Ingredient, *State.Instance); } return; } // Generate scalar instances for all VF lanes of all UF parts, unless the // instruction is uniform in which case generate only the first lane for each // of the UF parts. unsigned EndLane = IsUniform ? 1 : State.VF; for (unsigned Part = 0; Part < State.UF; ++Part) for (unsigned Lane = 0; Lane < EndLane; ++Lane) State.ILV->scalarizeInstruction(Ingredient, {Part, Lane}, IsPredicated); } void VPBranchOnMaskRecipe::execute(VPTransformState &State) { assert(State.Instance && "Branch on Mask works only on single instance."); unsigned Part = State.Instance->Part; unsigned Lane = State.Instance->Lane; Value *ConditionBit = nullptr; if (!User) // Block in mask is all-one. ConditionBit = State.Builder.getTrue(); else { VPValue *BlockInMask = User->getOperand(0); ConditionBit = State.get(BlockInMask, Part); if (ConditionBit->getType()->isVectorTy()) ConditionBit = State.Builder.CreateExtractElement( ConditionBit, State.Builder.getInt32(Lane)); } // Replace the temporary unreachable terminator with a new conditional branch, // whose two destinations will be set later when they are created. auto *CurrentTerminator = State.CFG.PrevBB->getTerminator(); assert(isa<UnreachableInst>(CurrentTerminator) && "Expected to replace unreachable terminator with conditional branch."); auto *CondBr = BranchInst::Create(State.CFG.PrevBB, nullptr, ConditionBit); CondBr->setSuccessor(0, nullptr); ReplaceInstWithInst(CurrentTerminator, CondBr); } void VPPredInstPHIRecipe::execute(VPTransformState &State) { assert(State.Instance && "Predicated instruction PHI works per instance."); Instruction *ScalarPredInst = cast<Instruction>( State.ValueMap.getScalarValue(PredInst, *State.Instance)); BasicBlock *PredicatedBB = ScalarPredInst->getParent(); BasicBlock *PredicatingBB = PredicatedBB->getSinglePredecessor(); assert(PredicatingBB && "Predicated block has no single predecessor."); // By current pack/unpack logic we need to generate only a single phi node: if // a vector value for the predicated instruction exists at this point it means // the instruction has vector users only, and a phi for the vector value is // needed. In this case the recipe of the predicated instruction is marked to // also do that packing, thereby "hoisting" the insert-element sequence. // Otherwise, a phi node for the scalar value is needed. unsigned Part = State.Instance->Part; if (State.ValueMap.hasVectorValue(PredInst, Part)) { Value *VectorValue = State.ValueMap.getVectorValue(PredInst, Part); InsertElementInst *IEI = cast<InsertElementInst>(VectorValue); PHINode *VPhi = State.Builder.CreatePHI(IEI->getType(), 2); VPhi->addIncoming(IEI->getOperand(0), PredicatingBB); // Unmodified vector. VPhi->addIncoming(IEI, PredicatedBB); // New vector with inserted element. State.ValueMap.resetVectorValue(PredInst, Part, VPhi); // Update cache. } else { Type *PredInstType = PredInst->getType(); PHINode *Phi = State.Builder.CreatePHI(PredInstType, 2); Phi->addIncoming(UndefValue::get(ScalarPredInst->getType()), PredicatingBB); Phi->addIncoming(ScalarPredInst, PredicatedBB); State.ValueMap.resetScalarValue(PredInst, *State.Instance, Phi); } } void VPWidenMemoryInstructionRecipe::execute(VPTransformState &State) { if (!User) return State.ILV->vectorizeMemoryInstruction(&Instr); // Last (and currently only) operand is a mask.
InnerLoopVectorizer::VectorParts MaskValues(State.UF); VPValue *Mask = User->getOperand(User->getNumOperands() - 1); for (unsigned Part = 0; Part < State.UF; ++Part) MaskValues[Part] = State.get(Mask, Part); State.ILV->vectorizeMemoryInstruction(&Instr, &MaskValues); } // Process the loop in the VPlan-native vectorization path. This path builds // VPlan upfront in the vectorization pipeline, which allows to apply // VPlan-to-VPlan transformations from the very beginning without modifying the // input LLVM IR. static bool processLoopInVPlanNativePath( Loop *L, PredicatedScalarEvolution &PSE, LoopInfo *LI, DominatorTree *DT, LoopVectorizationLegality *LVL, TargetTransformInfo *TTI, TargetLibraryInfo *TLI, DemandedBits *DB, AssumptionCache *AC, OptimizationRemarkEmitter *ORE, LoopVectorizeHints &Hints) { assert(EnableVPlanNativePath && "VPlan-native path is disabled."); Function *F = L->getHeader()->getParent(); InterleavedAccessInfo IAI(PSE, L, DT, LI, LVL->getLAI()); LoopVectorizationCostModel CM(L, PSE, LI, LVL, *TTI, TLI, DB, AC, ORE, F, &Hints, IAI); // Use the planner for outer loop vectorization. // TODO: CM is not used at this point inside the planner. Turn CM into an // optional argument if we don't need it in the future. LoopVectorizationPlanner LVP(L, LI, TLI, TTI, LVL, CM); // Get user vectorization factor. unsigned UserVF = Hints.getWidth(); // Check the function attributes to find out if this function should be // optimized for size. bool OptForSize = Hints.getForce() != LoopVectorizeHints::FK_Enabled && F->optForSize(); // Plan how to best vectorize, return the best VF and its cost. LVP.planInVPlanNativePath(OptForSize, UserVF); // Returning false. We are currently not generating vector code in the VPlan // native path. return false; } bool LoopVectorizePass::processLoop(Loop *L) { assert((EnableVPlanNativePath || L->empty()) && "VPlan-native path is not enabled. Only process inner loops."); #ifndef NDEBUG const std::string DebugLocStr = getDebugLocString(L); #endif /* NDEBUG */ LLVM_DEBUG(dbgs() << "\nLV: Checking a loop in \"" << L->getHeader()->getParent()->getName() << "\" from " << DebugLocStr << "\n"); LoopVectorizeHints Hints(L, DisableUnrolling, *ORE); LLVM_DEBUG( dbgs() << "LV: Loop hints:" << " force=" << (Hints.getForce() == LoopVectorizeHints::FK_Disabled ? "disabled" : (Hints.getForce() == LoopVectorizeHints::FK_Enabled ? "enabled" : "?")) << " width=" << Hints.getWidth() << " unroll=" << Hints.getInterleave() << "\n"); // Function containing loop Function *F = L->getHeader()->getParent(); // Looking at the diagnostic output is the only way to determine if a loop // was vectorized (other than looking at the IR or machine code), so it // is important to generate an optimization remark for each loop. Most of // these messages are generated as OptimizationRemarkAnalysis. Remarks // generated as OptimizationRemark and OptimizationRemarkMissed are // less verbose reporting vectorized loops and unvectorized loops that may // benefit from vectorization, respectively. if (!Hints.allowVectorization(F, L, AlwaysVectorize)) { LLVM_DEBUG(dbgs() << "LV: Loop hints prevent vectorization.\n"); return false; } PredicatedScalarEvolution PSE(*SE, *L); // Check if it is legal to vectorize the loop. 
LoopVectorizationRequirements Requirements(*ORE); LoopVectorizationLegality LVL(L, PSE, DT, TLI, AA, F, GetLAA, LI, ORE, &Requirements, &Hints, DB, AC); if (!LVL.canVectorize(EnableVPlanNativePath)) { LLVM_DEBUG(dbgs() << "LV: Not vectorizing: Cannot prove legality.\n"); emitMissedWarning(F, L, Hints, ORE); return false; } // Check the function attributes to find out if this function should be // optimized for size. bool OptForSize = Hints.getForce() != LoopVectorizeHints::FK_Enabled && F->optForSize(); // Entrance to the VPlan-native vectorization path. Outer loops are processed // here. They may require CFG and instruction level transformations before // even evaluating whether vectorization is profitable. Since we cannot modify // the incoming IR, we need to build VPlan upfront in the vectorization // pipeline. if (!L->empty()) return processLoopInVPlanNativePath(L, PSE, LI, DT, &LVL, TTI, TLI, DB, AC, ORE, Hints); assert(L->empty() && "Inner loop expected."); // Check the loop for a trip count threshold: vectorize loops with a tiny trip // count by optimizing for size, to minimize overheads. // Prefer constant trip counts over profile data, over upper bound estimate. unsigned ExpectedTC = 0; bool HasExpectedTC = false; if (const SCEVConstant *ConstExits = dyn_cast(SE->getBackedgeTakenCount(L))) { const APInt &ExitsCount = ConstExits->getAPInt(); // We are interested in small values for ExpectedTC. Skip over those that // can't fit an unsigned. if (ExitsCount.ult(std::numeric_limits::max())) { ExpectedTC = static_cast(ExitsCount.getZExtValue()) + 1; HasExpectedTC = true; } } // ExpectedTC may be large because it's bound by a variable. Check // profiling information to validate we should vectorize. if (!HasExpectedTC && LoopVectorizeWithBlockFrequency) { auto EstimatedTC = getLoopEstimatedTripCount(L); if (EstimatedTC) { ExpectedTC = *EstimatedTC; HasExpectedTC = true; } } if (!HasExpectedTC) { ExpectedTC = SE->getSmallConstantMaxTripCount(L); HasExpectedTC = (ExpectedTC > 0); } if (HasExpectedTC && ExpectedTC < TinyTripCountVectorThreshold) { LLVM_DEBUG(dbgs() << "LV: Found a loop with a very small trip count. " << "This loop is worth vectorizing only if no scalar " << "iteration overheads are incurred."); if (Hints.getForce() == LoopVectorizeHints::FK_Enabled) LLVM_DEBUG(dbgs() << " But vectorizing was explicitly forced.\n"); else { LLVM_DEBUG(dbgs() << "\n"); // Loops with a very small trip count are considered for vectorization // under OptForSize, thereby making sure the cost of their loop body is // dominant, free of runtime guards and scalar iteration overheads. OptForSize = true; } } // Check the function attributes to see if implicit floats are allowed. // FIXME: This check doesn't seem possibly correct -- what if the loop is // an integer loop and the vector instructions selected are purely integer // vector instructions? if (F->hasFnAttribute(Attribute::NoImplicitFloat)) { LLVM_DEBUG(dbgs() << "LV: Can't vectorize when the NoImplicitFloat" "attribute is used.\n"); ORE->emit(createLVMissedAnalysis(Hints.vectorizeAnalysisPassName(), "NoImplicitFloat", L) << "loop not vectorized due to NoImplicitFloat attribute"); emitMissedWarning(F, L, Hints, ORE); return false; } // Check if the target supports potentially unsafe FP vectorization. // FIXME: Add a check for the type of safety issue (denormal, signaling) // for the target we're vectorizing for, to make sure none of the // additional fp-math flags can help. 
if (Hints.isPotentiallyUnsafe() && TTI->isFPVectorizationPotentiallyUnsafe()) { LLVM_DEBUG( dbgs() << "LV: Potentially unsafe FP op prevents vectorization.\n"); ORE->emit( createLVMissedAnalysis(Hints.vectorizeAnalysisPassName(), "UnsafeFP", L) << "loop not vectorized due to unsafe FP support."); emitMissedWarning(F, L, Hints, ORE); return false; } bool UseInterleaved = TTI->enableInterleavedAccessVectorization(); InterleavedAccessInfo IAI(PSE, L, DT, LI, LVL.getLAI()); // If an override option has been passed in for interleaved accesses, use it. if (EnableInterleavedMemAccesses.getNumOccurrences() > 0) UseInterleaved = EnableInterleavedMemAccesses; // Analyze interleaved memory accesses. if (UseInterleaved) { IAI.analyzeInterleaving(); } // Use the cost model. LoopVectorizationCostModel CM(L, PSE, LI, &LVL, *TTI, TLI, DB, AC, ORE, F, &Hints, IAI); CM.collectValuesToIgnore(); // Use the planner for vectorization. LoopVectorizationPlanner LVP(L, LI, TLI, TTI, &LVL, CM); // Get user vectorization factor. unsigned UserVF = Hints.getWidth(); // Plan how to best vectorize, return the best VF and its cost. VectorizationFactor VF = LVP.plan(OptForSize, UserVF); // Select the interleave count. unsigned IC = CM.selectInterleaveCount(OptForSize, VF.Width, VF.Cost); // Get user interleave count. unsigned UserIC = Hints.getInterleave(); // Identify the diagnostic messages that should be produced. std::pair VecDiagMsg, IntDiagMsg; bool VectorizeLoop = true, InterleaveLoop = true; if (Requirements.doesNotMeet(F, L, Hints)) { LLVM_DEBUG(dbgs() << "LV: Not vectorizing: loop did not meet vectorization " "requirements.\n"); emitMissedWarning(F, L, Hints, ORE); return false; } if (VF.Width == 1) { LLVM_DEBUG(dbgs() << "LV: Vectorization is possible but not beneficial.\n"); VecDiagMsg = std::make_pair( "VectorizationNotBeneficial", "the cost-model indicates that vectorization is not beneficial"); VectorizeLoop = false; } if (IC == 1 && UserIC <= 1) { // Tell the user interleaving is not beneficial. LLVM_DEBUG(dbgs() << "LV: Interleaving is not beneficial.\n"); IntDiagMsg = std::make_pair( "InterleavingNotBeneficial", "the cost-model indicates that interleaving is not beneficial"); InterleaveLoop = false; if (UserIC == 1) { IntDiagMsg.first = "InterleavingNotBeneficialAndDisabled"; IntDiagMsg.second += " and is explicitly disabled or interleave count is set to 1"; } } else if (IC > 1 && UserIC == 1) { // Tell the user interleaving is beneficial, but it explicitly disabled. LLVM_DEBUG( dbgs() << "LV: Interleaving is beneficial but is explicitly disabled."); IntDiagMsg = std::make_pair( "InterleavingBeneficialButDisabled", "the cost-model indicates that interleaving is beneficial " "but is explicitly disabled or interleave count is set to 1"); InterleaveLoop = false; } // Override IC if user provided an interleave count. IC = UserIC > 0 ? UserIC : IC; // Emit diagnostic messages, if any. const char *VAPassName = Hints.vectorizeAnalysisPassName(); if (!VectorizeLoop && !InterleaveLoop) { // Do not vectorize or interleaving the loop. 
ORE->emit([&]() { return OptimizationRemarkMissed(VAPassName, VecDiagMsg.first, L->getStartLoc(), L->getHeader()) << VecDiagMsg.second; }); ORE->emit([&]() { return OptimizationRemarkMissed(LV_NAME, IntDiagMsg.first, L->getStartLoc(), L->getHeader()) << IntDiagMsg.second; }); return false; } else if (!VectorizeLoop && InterleaveLoop) { LLVM_DEBUG(dbgs() << "LV: Interleave Count is " << IC << '\n'); ORE->emit([&]() { return OptimizationRemarkAnalysis(VAPassName, VecDiagMsg.first, L->getStartLoc(), L->getHeader()) << VecDiagMsg.second; }); } else if (VectorizeLoop && !InterleaveLoop) { LLVM_DEBUG(dbgs() << "LV: Found a vectorizable loop (" << VF.Width << ") in " << DebugLocStr << '\n'); ORE->emit([&]() { return OptimizationRemarkAnalysis(LV_NAME, IntDiagMsg.first, L->getStartLoc(), L->getHeader()) << IntDiagMsg.second; }); } else if (VectorizeLoop && InterleaveLoop) { LLVM_DEBUG(dbgs() << "LV: Found a vectorizable loop (" << VF.Width << ") in " << DebugLocStr << '\n'); LLVM_DEBUG(dbgs() << "LV: Interleave Count is " << IC << '\n'); } LVP.setBestPlan(VF.Width, IC); using namespace ore; if (!VectorizeLoop) { assert(IC > 1 && "interleave count should not be 1 or 0"); // If we decided that it is not legal to vectorize the loop, then // interleave it. InnerLoopUnroller Unroller(L, PSE, LI, DT, TLI, TTI, AC, ORE, IC, &LVL, &CM); LVP.executePlan(Unroller, DT); ORE->emit([&]() { return OptimizationRemark(LV_NAME, "Interleaved", L->getStartLoc(), L->getHeader()) << "interleaved loop (interleaved count: " << NV("InterleaveCount", IC) << ")"; }); } else { // If we decided that it is *legal* to vectorize the loop, then do it. InnerLoopVectorizer LB(L, PSE, LI, DT, TLI, TTI, AC, ORE, VF.Width, IC, &LVL, &CM); LVP.executePlan(LB, DT); ++LoopsVectorized; // Add metadata to disable runtime unrolling a scalar loop when there are // no runtime checks about strides and memory. A scalar loop that is // rarely used is not worth unrolling. if (!LB.areSafetyChecksAdded()) AddRuntimeUnrollDisableMetaData(L); // Report the vectorization decision. ORE->emit([&]() { return OptimizationRemark(LV_NAME, "Vectorized", L->getStartLoc(), L->getHeader()) << "vectorized loop (vectorization width: " << NV("VectorizationFactor", VF.Width) << ", interleaved count: " << NV("InterleaveCount", IC) << ")"; }); } // Mark the loop as already vectorized to avoid vectorizing again. Hints.setAlreadyVectorized(); LLVM_DEBUG(verifyFunction(*L->getHeader()->getParent())); return true; } bool LoopVectorizePass::runImpl( Function &F, ScalarEvolution &SE_, LoopInfo &LI_, TargetTransformInfo &TTI_, DominatorTree &DT_, BlockFrequencyInfo &BFI_, TargetLibraryInfo *TLI_, DemandedBits &DB_, AliasAnalysis &AA_, AssumptionCache &AC_, std::function &GetLAA_, OptimizationRemarkEmitter &ORE_) { SE = &SE_; LI = &LI_; TTI = &TTI_; DT = &DT_; BFI = &BFI_; TLI = TLI_; AA = &AA_; AC = &AC_; GetLAA = &GetLAA_; DB = &DB_; ORE = &ORE_; // Don't attempt if // 1. the target claims to have no vector registers, and // 2. interleaving won't help ILP. // // The second condition is necessary because, even if the target has no // vector registers, loop vectorization may still enable scalar // interleaving. if (!TTI->getNumberOfRegisters(true) && TTI->getMaxInterleaveFactor(1) < 2) return false; bool Changed = false; // The vectorizer requires loops to be in simplified form. // Since simplification may add new inner loops, it has to run before the // legality and profitability checks. 
This means running the loop vectorizer // will simplify all loops, regardless of whether anything ends up being // vectorized. for (auto &L : *LI) Changed |= simplifyLoop(L, DT, LI, SE, AC, false /* PreserveLCSSA */); // Build up a worklist of inner-loops to vectorize. This is necessary as // the act of vectorizing or partially unrolling a loop creates new loops // and can invalidate iterators across the loops. SmallVector<Loop *, 8> Worklist; for (Loop *L : *LI) collectSupportedLoops(*L, LI, ORE, Worklist); LoopsAnalyzed += Worklist.size(); // Now walk the identified inner loops. while (!Worklist.empty()) { Loop *L = Worklist.pop_back_val(); // For the inner loops we actually process, form LCSSA to simplify the // transform. Changed |= formLCSSARecursively(*L, *DT, LI, SE); Changed |= processLoop(L); } // Process each loop nest in the function. return Changed; } PreservedAnalyses LoopVectorizePass::run(Function &F, FunctionAnalysisManager &AM) { auto &SE = AM.getResult<ScalarEvolutionAnalysis>(F); auto &LI = AM.getResult<LoopAnalysis>(F); auto &TTI = AM.getResult<TargetIRAnalysis>(F); auto &DT = AM.getResult<DominatorTreeAnalysis>(F); auto &BFI = AM.getResult<BlockFrequencyAnalysis>(F); auto &TLI = AM.getResult<TargetLibraryAnalysis>(F); auto &AA = AM.getResult<AAManager>(F); auto &AC = AM.getResult<AssumptionAnalysis>(F); auto &DB = AM.getResult<DemandedBitsAnalysis>(F); auto &ORE = AM.getResult<OptimizationRemarkEmitterAnalysis>(F); auto &LAM = AM.getResult<LoopAnalysisManagerFunctionProxy>(F).getManager(); std::function<const LoopAccessInfo &(Loop &)> GetLAA = [&](Loop &L) -> const LoopAccessInfo & { LoopStandardAnalysisResults AR = {AA, AC, DT, LI, SE, TLI, TTI, nullptr}; return LAM.getResult<LoopAccessAnalysis>(L, AR); }; bool Changed = runImpl(F, SE, LI, TTI, DT, BFI, &TLI, DB, AA, AC, GetLAA, ORE); if (!Changed) return PreservedAnalyses::all(); PreservedAnalyses PA; PA.preserve<LoopAnalysis>(); PA.preserve<DominatorTreeAnalysis>(); PA.preserve<BasicAA>(); PA.preserve<GlobalsAA>(); return PA; } Index: vendor/llvm/dist-release_70/test/CodeGen/AMDGPU/amdgpu-alias-analysis.ll =================================================================== --- vendor/llvm/dist-release_70/test/CodeGen/AMDGPU/amdgpu-alias-analysis.ll (revision 338574) +++ vendor/llvm/dist-release_70/test/CodeGen/AMDGPU/amdgpu-alias-analysis.ll (revision 338575) @@ -1,9 +1,33 @@ ; RUN: opt -mtriple=amdgcn-- -O3 -aa-eval -print-all-alias-modref-info -disable-output < %s 2>&1 | FileCheck %s ; RUN: opt -mtriple=r600-- -O3 -aa-eval -print-all-alias-modref-info -disable-output < %s 2>&1 | FileCheck %s ; CHECK: NoAlias: i8 addrspace(1)* %p1, i8 addrspace(5)* %p define void @test(i8 addrspace(5)* %p, i8 addrspace(1)* %p1) { ret void } +; CHECK: MayAlias: i8 addrspace(1)* %p1, i8 addrspace(4)* %p + +define void @test_constant_vs_global(i8 addrspace(4)* %p, i8 addrspace(1)* %p1) { + ret void +} + +; CHECK: MayAlias: i8 addrspace(1)* %p, i8 addrspace(4)* %p1 + +define void @test_global_vs_constant(i8 addrspace(1)* %p, i8 addrspace(4)* %p1) { + ret void +} + +; CHECK: MayAlias: i8 addrspace(1)* %p1, i8 addrspace(6)* %p + +define void @test_constant_32bit_vs_global(i8 addrspace(6)* %p, i8 addrspace(1)* %p1) { + ret void +} + +; CHECK: MayAlias: i8 addrspace(4)* %p1, i8 addrspace(6)* %p + +define void @test_constant_32bit_vs_constant(i8 addrspace(6)* %p, i8 addrspace(4)* %p1) { + ret void +} + Index: vendor/llvm/dist-release_70/test/CodeGen/AMDGPU/constant-address-space-32bit.ll =================================================================== --- vendor/llvm/dist-release_70/test/CodeGen/AMDGPU/constant-address-space-32bit.ll (revision 338574) +++ vendor/llvm/dist-release_70/test/CodeGen/AMDGPU/constant-address-space-32bit.ll (revision 338575) @@ -1,288 +1,299 @@ ; RUN: llc -march=amdgcn -mcpu=tahiti < %s | FileCheck -check-prefixes=GCN,SICI,SI %s ; RUN: llc -march=amdgcn -mcpu=bonaire < %s |
FileCheck -check-prefixes=GCN,SICI %s ; RUN: llc -march=amdgcn -mcpu=tonga < %s | FileCheck -check-prefixes=GCN,VIGFX9 %s ; RUN: llc -march=amdgcn -mcpu=gfx900 < %s | FileCheck -check-prefixes=GCN,VIGFX9 %s ; GCN-LABEL: {{^}}load_i32: ; GCN-DAG: s_mov_b32 s3, 0 ; GCN-DAG: s_mov_b32 s2, s1 ; GCN-DAG: s_mov_b32 s1, s3 ; SICI-DAG: s_load_dword s{{[0-9]}}, s[0:1], 0x0 ; SICI-DAG: s_load_dword s{{[0-9]}}, s[2:3], 0x2 ; VIGFX9-DAG: s_load_dword s{{[0-9]}}, s[0:1], 0x0 ; VIGFX9-DAG: s_load_dword s{{[0-9]}}, s[2:3], 0x8 define amdgpu_vs float @load_i32(i32 addrspace(6)* inreg %p0, i32 addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr i32, i32 addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds i32, i32 addrspace(6)* %p1, i32 2 %r0 = load i32, i32 addrspace(6)* %p0 %r1 = load i32, i32 addrspace(6)* %gep1 %r = add i32 %r0, %r1 %r2 = bitcast i32 %r to float ret float %r2 } ; GCN-LABEL: {{^}}load_v2i32: ; GCN-DAG: s_mov_b32 s3, 0 ; GCN-DAG: s_mov_b32 s2, s1 ; GCN-DAG: s_mov_b32 s1, s3 ; SICI-DAG: s_load_dwordx2 s[{{.*}}], s[0:1], 0x0 ; SICI-DAG: s_load_dwordx2 s[{{.*}}], s[2:3], 0x4 ; VIGFX9-DAG: s_load_dwordx2 s[{{.*}}], s[0:1], 0x0 ; VIGFX9-DAG: s_load_dwordx2 s[{{.*}}], s[2:3], 0x10 define amdgpu_vs <2 x float> @load_v2i32(<2 x i32> addrspace(6)* inreg %p0, <2 x i32> addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr <2 x i32>, <2 x i32> addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds <2 x i32>, <2 x i32> addrspace(6)* %p1, i32 2 %r0 = load <2 x i32>, <2 x i32> addrspace(6)* %p0 %r1 = load <2 x i32>, <2 x i32> addrspace(6)* %gep1 %r = add <2 x i32> %r0, %r1 %r2 = bitcast <2 x i32> %r to <2 x float> ret <2 x float> %r2 } ; GCN-LABEL: {{^}}load_v4i32: ; GCN-DAG: s_mov_b32 s3, 0 ; GCN-DAG: s_mov_b32 s2, s1 ; GCN-DAG: s_mov_b32 s1, s3 ; SICI-DAG: s_load_dwordx4 s[{{.*}}], s[0:1], 0x0 ; SICI-DAG: s_load_dwordx4 s[{{.*}}], s[2:3], 0x8 ; VIGFX9-DAG: s_load_dwordx4 s[{{.*}}], s[0:1], 0x0 ; VIGFX9-DAG: s_load_dwordx4 s[{{.*}}], s[2:3], 0x20 define amdgpu_vs <4 x float> @load_v4i32(<4 x i32> addrspace(6)* inreg %p0, <4 x i32> addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr <4 x i32>, <4 x i32> addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds <4 x i32>, <4 x i32> addrspace(6)* %p1, i32 2 %r0 = load <4 x i32>, <4 x i32> addrspace(6)* %p0 %r1 = load <4 x i32>, <4 x i32> addrspace(6)* %gep1 %r = add <4 x i32> %r0, %r1 %r2 = bitcast <4 x i32> %r to <4 x float> ret <4 x float> %r2 } ; GCN-LABEL: {{^}}load_v8i32: ; GCN-DAG: s_mov_b32 s3, 0 ; GCN-DAG: s_mov_b32 s2, s1 ; GCN-DAG: s_mov_b32 s1, s3 ; SICI-DAG: s_load_dwordx8 s[{{.*}}], s[0:1], 0x0 ; SICI-DAG: s_load_dwordx8 s[{{.*}}], s[2:3], 0x10 ; VIGFX9-DAG: s_load_dwordx8 s[{{.*}}], s[0:1], 0x0 ; VIGFX9-DAG: s_load_dwordx8 s[{{.*}}], s[2:3], 0x40 define amdgpu_vs <8 x float> @load_v8i32(<8 x i32> addrspace(6)* inreg %p0, <8 x i32> addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr <8 x i32>, <8 x i32> addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds <8 x i32>, <8 x i32> addrspace(6)* %p1, i32 2 %r0 = load <8 x i32>, <8 x i32> addrspace(6)* %p0 %r1 = load <8 x i32>, <8 x i32> addrspace(6)* %gep1 %r = add <8 x i32> %r0, %r1 %r2 = bitcast <8 x i32> %r to <8 x float> ret <8 x float> %r2 } ; GCN-LABEL: {{^}}load_v16i32: ; GCN-DAG: s_mov_b32 s3, 0 ; GCN-DAG: s_mov_b32 s2, s1 ; GCN-DAG: s_mov_b32 s1, s3 ; SICI-DAG: s_load_dwordx16 s[{{.*}}], s[0:1], 0x0 ; SICI-DAG: s_load_dwordx16 s[{{.*}}], s[2:3], 0x20 ; VIGFX9-DAG: s_load_dwordx16 s[{{.*}}], s[0:1], 0x0 ; VIGFX9-DAG: s_load_dwordx16 s[{{.*}}], s[2:3], 0x80 define 
amdgpu_vs <16 x float> @load_v16i32(<16 x i32> addrspace(6)* inreg %p0, <16 x i32> addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr <16 x i32>, <16 x i32> addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds <16 x i32>, <16 x i32> addrspace(6)* %p1, i32 2 %r0 = load <16 x i32>, <16 x i32> addrspace(6)* %p0 %r1 = load <16 x i32>, <16 x i32> addrspace(6)* %gep1 %r = add <16 x i32> %r0, %r1 %r2 = bitcast <16 x i32> %r to <16 x float> ret <16 x float> %r2 } ; GCN-LABEL: {{^}}load_float: ; GCN-DAG: s_mov_b32 s3, 0 ; GCN-DAG: s_mov_b32 s2, s1 ; GCN-DAG: s_mov_b32 s1, s3 ; SICI-DAG: s_load_dword s{{[0-9]}}, s[0:1], 0x0 ; SICI-DAG: s_load_dword s{{[0-9]}}, s[2:3], 0x2 ; VIGFX9-DAG: s_load_dword s{{[0-9]}}, s[0:1], 0x0 ; VIGFX9-DAG: s_load_dword s{{[0-9]}}, s[2:3], 0x8 define amdgpu_vs float @load_float(float addrspace(6)* inreg %p0, float addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr float, float addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds float, float addrspace(6)* %p1, i32 2 %r0 = load float, float addrspace(6)* %p0 %r1 = load float, float addrspace(6)* %gep1 %r = fadd float %r0, %r1 ret float %r } ; GCN-LABEL: {{^}}load_v2float: ; GCN-DAG: s_mov_b32 s3, 0 ; GCN-DAG: s_mov_b32 s2, s1 ; GCN-DAG: s_mov_b32 s1, s3 ; SICI-DAG: s_load_dwordx2 s[{{.*}}], s[0:1], 0x0 ; SICI-DAG: s_load_dwordx2 s[{{.*}}], s[2:3], 0x4 ; VIGFX9-DAG: s_load_dwordx2 s[{{.*}}], s[0:1], 0x0 ; VIGFX9-DAG: s_load_dwordx2 s[{{.*}}], s[2:3], 0x10 define amdgpu_vs <2 x float> @load_v2float(<2 x float> addrspace(6)* inreg %p0, <2 x float> addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr <2 x float>, <2 x float> addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds <2 x float>, <2 x float> addrspace(6)* %p1, i32 2 %r0 = load <2 x float>, <2 x float> addrspace(6)* %p0 %r1 = load <2 x float>, <2 x float> addrspace(6)* %gep1 %r = fadd <2 x float> %r0, %r1 ret <2 x float> %r } ; GCN-LABEL: {{^}}load_v4float: ; GCN-DAG: s_mov_b32 s3, 0 ; GCN-DAG: s_mov_b32 s2, s1 ; GCN-DAG: s_mov_b32 s1, s3 ; SICI-DAG: s_load_dwordx4 s[{{.*}}], s[0:1], 0x0 ; SICI-DAG: s_load_dwordx4 s[{{.*}}], s[2:3], 0x8 ; VIGFX9-DAG: s_load_dwordx4 s[{{.*}}], s[0:1], 0x0 ; VIGFX9-DAG: s_load_dwordx4 s[{{.*}}], s[2:3], 0x20 define amdgpu_vs <4 x float> @load_v4float(<4 x float> addrspace(6)* inreg %p0, <4 x float> addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr <4 x float>, <4 x float> addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds <4 x float>, <4 x float> addrspace(6)* %p1, i32 2 %r0 = load <4 x float>, <4 x float> addrspace(6)* %p0 %r1 = load <4 x float>, <4 x float> addrspace(6)* %gep1 %r = fadd <4 x float> %r0, %r1 ret <4 x float> %r } ; GCN-LABEL: {{^}}load_v8float: ; GCN-DAG: s_mov_b32 s3, 0 ; GCN-DAG: s_mov_b32 s2, s1 ; GCN-DAG: s_mov_b32 s1, s3 ; SICI-DAG: s_load_dwordx8 s[{{.*}}], s[0:1], 0x0 ; SICI-DAG: s_load_dwordx8 s[{{.*}}], s[2:3], 0x10 ; VIGFX9-DAG: s_load_dwordx8 s[{{.*}}], s[0:1], 0x0 ; VIGFX9-DAG: s_load_dwordx8 s[{{.*}}], s[2:3], 0x40 define amdgpu_vs <8 x float> @load_v8float(<8 x float> addrspace(6)* inreg %p0, <8 x float> addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr <8 x float>, <8 x float> addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds <8 x float>, <8 x float> addrspace(6)* %p1, i32 2 %r0 = load <8 x float>, <8 x float> addrspace(6)* %p0 %r1 = load <8 x float>, <8 x float> addrspace(6)* %gep1 %r = fadd <8 x float> %r0, %r1 ret <8 x float> %r } ; GCN-LABEL: {{^}}load_v16float: ; GCN-DAG: s_mov_b32 s3, 0 ; GCN-DAG: s_mov_b32 s2, s1 ; GCN-DAG: s_mov_b32 s1, s3 ; SICI-DAG: 
s_load_dwordx16 s[{{.*}}], s[0:1], 0x0 ; SICI-DAG: s_load_dwordx16 s[{{.*}}], s[2:3], 0x20 ; VIGFX9-DAG: s_load_dwordx16 s[{{.*}}], s[0:1], 0x0 ; VIGFX9-DAG: s_load_dwordx16 s[{{.*}}], s[2:3], 0x80 define amdgpu_vs <16 x float> @load_v16float(<16 x float> addrspace(6)* inreg %p0, <16 x float> addrspace(6)* inreg %p1) #0 { - %gep1 = getelementptr <16 x float>, <16 x float> addrspace(6)* %p1, i64 2 + %gep1 = getelementptr inbounds <16 x float>, <16 x float> addrspace(6)* %p1, i32 2 %r0 = load <16 x float>, <16 x float> addrspace(6)* %p0 %r1 = load <16 x float>, <16 x float> addrspace(6)* %gep1 %r = fadd <16 x float> %r0, %r1 ret <16 x float> %r } ; GCN-LABEL: {{^}}load_i32_hi0: ; GCN: s_mov_b32 s1, 0 ; GCN-NEXT: s_load_dword s0, s[0:1], 0x0 define amdgpu_vs i32 @load_i32_hi0(i32 addrspace(6)* inreg %p) #1 { %r0 = load i32, i32 addrspace(6)* %p ret i32 %r0 } ; GCN-LABEL: {{^}}load_i32_hi1: ; GCN: s_mov_b32 s1, 1 ; GCN-NEXT: s_load_dword s0, s[0:1], 0x0 define amdgpu_vs i32 @load_i32_hi1(i32 addrspace(6)* inreg %p) #2 { %r0 = load i32, i32 addrspace(6)* %p ret i32 %r0 } ; GCN-LABEL: {{^}}load_i32_hiffff8000: ; GCN: s_movk_i32 s1, 0x8000 ; GCN-NEXT: s_load_dword s0, s[0:1], 0x0 define amdgpu_vs i32 @load_i32_hiffff8000(i32 addrspace(6)* inreg %p) #3 { %r0 = load i32, i32 addrspace(6)* %p ret i32 %r0 } ; GCN-LABEL: {{^}}load_i32_hifffffff0: ; GCN: s_mov_b32 s1, -16 ; GCN-NEXT: s_load_dword s0, s[0:1], 0x0 define amdgpu_vs i32 @load_i32_hifffffff0(i32 addrspace(6)* inreg %p) #4 { %r0 = load i32, i32 addrspace(6)* %p ret i32 %r0 } ; GCN-LABEL: {{^}}load_sampler ; GCN: v_readfirstlane_b32 ; GCN-NEXT: v_readfirstlane_b32 ; SI: s_nop ; GCN: s_load_dwordx8 ; GCN-NEXT: s_load_dwordx4 ; GCN: image_sample define amdgpu_ps <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> @load_sampler([0 x <4 x i32>] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), [0 x <8 x i32>] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), [0 x <4 x i32>] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), [0 x <8 x i32>] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), float inreg, i32 inreg, <2 x i32>, <2 x i32>, <2 x i32>, <3 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, float, float, float, float, float, i32, i32, float, i32) #5 { main_body: %22 = call nsz float @llvm.amdgcn.interp.mov(i32 2, i32 0, i32 0, i32 %5) #8 %23 = bitcast float %22 to i32 %24 = shl i32 %23, 1 - %25 = getelementptr [0 x <8 x i32>], [0 x <8 x i32>] addrspace(6)* %1, i32 0, i32 %24, !amdgpu.uniform !0 + %25 = getelementptr inbounds [0 x <8 x i32>], [0 x <8 x i32>] addrspace(6)* %1, i32 0, i32 %24, !amdgpu.uniform !0 %26 = load <8 x i32>, <8 x i32> addrspace(6)* %25, align 32, !invariant.load !0 %27 = shl i32 %23, 2 %28 = or i32 %27, 3 %29 = bitcast [0 x <8 x i32>] addrspace(6)* %1 to [0 x <4 x i32>] addrspace(6)* - %30 = getelementptr [0 x <4 x i32>], [0 x <4 x i32>] addrspace(6)* %29, i32 0, i32 %28, !amdgpu.uniform !0 + %30 = getelementptr inbounds [0 x <4 x i32>], [0 x <4 x i32>] addrspace(6)* %29, i32 0, i32 %28, !amdgpu.uniform !0 %31 = load <4 x i32>, <4 x i32> addrspace(6)* %30, align 16, !invariant.load !0 %32 = call nsz <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float 0.0, <8 x i32> %26, <4 x i32> %31, i1 0, i32 0, i32 0) #8 %33 = extractelement <4 x float> %32, i32 0 %34 = extractelement <4 x float> %32, i32 1 %35 = extractelement <4 x float> %32, i32 2 %36 = extractelement <4 x 
float> %32, i32 3 %37 = bitcast float %4 to i32 %38 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> undef, i32 %37, 4 %39 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %38, float %33, 5 %40 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %39, float %34, 6 %41 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %40, float %35, 7 %42 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %41, float %36, 8 %43 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %42, float %20, 19 ret <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %43 } ; GCN-LABEL: {{^}}load_sampler_nouniform ; GCN: v_readfirstlane_b32 ; GCN-NEXT: v_readfirstlane_b32 ; SI: s_nop ; GCN: s_load_dwordx8 ; GCN-NEXT: s_load_dwordx4 ; GCN: image_sample define amdgpu_ps <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> @load_sampler_nouniform([0 x <4 x i32>] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), [0 x <8 x i32>] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), [0 x <4 x i32>] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), [0 x <8 x i32>] addrspace(6)* inreg noalias dereferenceable(18446744073709551615), float inreg, i32 inreg, <2 x i32>, <2 x i32>, <2 x i32>, <3 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, float, float, float, float, float, i32, i32, float, i32) #5 { main_body: %22 = call nsz float @llvm.amdgcn.interp.mov(i32 2, i32 0, i32 0, i32 %5) #8 %23 = bitcast float %22 to i32 %24 = shl i32 %23, 1 - %25 = getelementptr [0 x <8 x i32>], [0 x <8 x i32>] addrspace(6)* %1, i32 0, i32 %24 + %25 = getelementptr inbounds [0 x <8 x i32>], [0 x <8 x i32>] addrspace(6)* %1, i32 0, i32 %24 %26 = load <8 x i32>, <8 x i32> addrspace(6)* %25, align 32, !invariant.load !0 %27 = shl i32 %23, 2 %28 = or i32 %27, 3 %29 = bitcast [0 x <8 x i32>] addrspace(6)* %1 to [0 x <4 x i32>] addrspace(6)* - %30 = getelementptr [0 x <4 x i32>], [0 x <4 x i32>] addrspace(6)* %29, i32 0, i32 %28 + %30 = getelementptr inbounds [0 x <4 x i32>], [0 x <4 x i32>] addrspace(6)* %29, i32 0, i32 %28 %31 = load <4 x i32>, <4 x i32> addrspace(6)* %30, align 16, !invariant.load !0 %32 = call nsz <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float 0.0, <8 x i32> %26, <4 x i32> %31, i1 0, i32 0, i32 0) #8 %33 = extractelement <4 x float> %32, i32 0 %34 = extractelement <4 x float> %32, i32 1 %35 = extractelement <4 x float> %32, i32 2 %36 = extractelement <4 x float> %32, i32 3 %37 = bitcast float %4 to i32 %38 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> undef, i32 %37, 4 %39 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %38, float %33, 5 %40 
= insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %39, float %34, 6 %41 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %40, float %35, 7 %42 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %41, float %36, 8 %43 = insertvalue <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %42, float %20, 19 ret <{ i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> %43 +} + +; GCN-LABEL: {{^}}load_addr_no_fold: +; GCN-DAG: s_add_i32 s0, s0, 4 +; GCN-DAG: s_mov_b32 s1, 0 +; GCN: s_load_dword s{{[0-9]}}, s[0:1], 0x0 +define amdgpu_vs float @load_addr_no_fold(i32 addrspace(6)* inreg noalias %p0) #0 { + %gep1 = getelementptr i32, i32 addrspace(6)* %p0, i32 1 + %r1 = load i32, i32 addrspace(6)* %gep1 + %r2 = bitcast i32 %r1 to float + ret float %r2 } ; Function Attrs: nounwind readnone speculatable declare float @llvm.amdgcn.interp.mov(i32, i32, i32, i32) #6 ; Function Attrs: nounwind readonly declare <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32, float, <8 x i32>, <4 x i32>, i1, i32, i32) #7 !0 = !{} attributes #0 = { nounwind } attributes #1 = { nounwind "amdgpu-32bit-address-high-bits"="0" } attributes #2 = { nounwind "amdgpu-32bit-address-high-bits"="1" } attributes #3 = { nounwind "amdgpu-32bit-address-high-bits"="0xffff8000" } attributes #4 = { nounwind "amdgpu-32bit-address-high-bits"="0xfffffff0" } attributes #5 = { "InitialPSInputAddr"="45175" } attributes #6 = { nounwind readnone speculatable } attributes #7 = { nounwind readonly } attributes #8 = { nounwind readnone } Index: vendor/llvm/dist-release_70/test/CodeGen/ARM/ldrex-frame-size.ll =================================================================== --- vendor/llvm/dist-release_70/test/CodeGen/ARM/ldrex-frame-size.ll (nonexistent) +++ vendor/llvm/dist-release_70/test/CodeGen/ARM/ldrex-frame-size.ll (revision 338575) @@ -0,0 +1,36 @@ +; RUN: llc -mtriple=thumbv7-linux-gnueabi -o - %s | FileCheck %s + +; This alloca is just large enough that FrameLowering decides it needs a frame +; to guarantee access, based on the range of ldrex. + +; The actual alloca size is a bit of black magic, unfortunately: the real +; maximum accessible is 1020, but FrameLowering adds 16 bytes to its estimated +; stack size just because, so the alloca is not actually what the limit gets +; compared to. The important point is that we don't go up to ~4096, which is the +; default with no strange instructions.
+define void @test_large_frame() { +; CHECK-LABEL: test_large_frame: +; CHECK: push +; CHECK: sub.w sp, sp, #1004 + + %ptr = alloca i32, i32 251 + + %addr = getelementptr i32, i32* %ptr, i32 1 + call i32 @llvm.arm.ldrex.p0i32(i32* %addr) + ret void +} + +; This alloca is just the other side of the limit, so no frame is needed +define void @test_small_frame() { +; CHECK-LABEL: test_small_frame: +; CHECK-NOT: push +; CHECK: sub.w sp, sp, #1000 + + %ptr = alloca i32, i32 250 + + %addr = getelementptr i32, i32* %ptr, i32 1 + call i32 @llvm.arm.ldrex.p0i32(i32* %addr) + ret void +} + +declare i32 @llvm.arm.ldrex.p0i32(i32*) Index: vendor/llvm/dist-release_70/test/CodeGen/ARM/ldstrex.ll =================================================================== --- vendor/llvm/dist-release_70/test/CodeGen/ARM/ldstrex.ll (revision 338574) +++ vendor/llvm/dist-release_70/test/CodeGen/ARM/ldstrex.ll (revision 338575) @@ -1,159 +1,244 @@ ; RUN: llc < %s -mtriple=armv7-apple-darwin | FileCheck %s ; RUN: llc < %s -mtriple=thumbv7-apple-darwin > %t ; RUN: FileCheck %s < %t ; RUN: FileCheck %s < %t --check-prefix=CHECK-T2ADDRMODE %0 = type { i32, i32 } ; CHECK-LABEL: f0: ; CHECK: ldrexd define i64 @f0(i8* %p) nounwind readonly { entry: %ldrexd = tail call %0 @llvm.arm.ldrexd(i8* %p) %0 = extractvalue %0 %ldrexd, 1 %1 = extractvalue %0 %ldrexd, 0 %2 = zext i32 %0 to i64 %3 = zext i32 %1 to i64 %shl = shl nuw i64 %2, 32 %4 = or i64 %shl, %3 ret i64 %4 } ; CHECK-LABEL: f1: ; CHECK: strexd define i32 @f1(i8* %ptr, i64 %val) nounwind { entry: %tmp4 = trunc i64 %val to i32 %tmp6 = lshr i64 %val, 32 %tmp7 = trunc i64 %tmp6 to i32 %strexd = tail call i32 @llvm.arm.strexd(i32 %tmp4, i32 %tmp7, i8* %ptr) ret i32 %strexd } declare %0 @llvm.arm.ldrexd(i8*) nounwind readonly declare i32 @llvm.arm.strexd(i32, i32, i8*) nounwind ; CHECK-LABEL: test_load_i8: ; CHECK: ldrexb r0, [r0] ; CHECK-NOT: uxtb ; CHECK-NOT: and define zeroext i8 @test_load_i8(i8* %addr) { %val = call i32 @llvm.arm.ldrex.p0i8(i8* %addr) %val8 = trunc i32 %val to i8 ret i8 %val8 } ; CHECK-LABEL: test_load_i16: ; CHECK: ldrexh r0, [r0] ; CHECK-NOT: uxth ; CHECK-NOT: and define zeroext i16 @test_load_i16(i16* %addr) { %val = call i32 @llvm.arm.ldrex.p0i16(i16* %addr) %val16 = trunc i32 %val to i16 ret i16 %val16 } ; CHECK-LABEL: test_load_i32: ; CHECK: ldrex r0, [r0] define i32 @test_load_i32(i32* %addr) { %val = call i32 @llvm.arm.ldrex.p0i32(i32* %addr) ret i32 %val } declare i32 @llvm.arm.ldrex.p0i8(i8*) nounwind readonly declare i32 @llvm.arm.ldrex.p0i16(i16*) nounwind readonly declare i32 @llvm.arm.ldrex.p0i32(i32*) nounwind readonly ; CHECK-LABEL: test_store_i8: ; CHECK-NOT: uxtb ; CHECK: strexb r0, r1, [r2] define i32 @test_store_i8(i32, i8 %val, i8* %addr) { %extval = zext i8 %val to i32 %res = call i32 @llvm.arm.strex.p0i8(i32 %extval, i8* %addr) ret i32 %res } ; CHECK-LABEL: test_store_i16: ; CHECK-NOT: uxth ; CHECK: strexh r0, r1, [r2] define i32 @test_store_i16(i32, i16 %val, i16* %addr) { %extval = zext i16 %val to i32 %res = call i32 @llvm.arm.strex.p0i16(i32 %extval, i16* %addr) ret i32 %res } ; CHECK-LABEL: test_store_i32: ; CHECK: strex r0, r1, [r2] define i32 @test_store_i32(i32, i32 %val, i32* %addr) { %res = call i32 @llvm.arm.strex.p0i32(i32 %val, i32* %addr) ret i32 %res } declare i32 @llvm.arm.strex.p0i8(i32, i8*) nounwind declare i32 @llvm.arm.strex.p0i16(i32, i16*) nounwind declare i32 @llvm.arm.strex.p0i32(i32, i32*) nounwind ; CHECK-LABEL: test_clear: ; CHECK: clrex define void @test_clear() { call void @llvm.arm.clrex() ret void }
declare void @llvm.arm.clrex() nounwind @base = global i32* null define void @excl_addrmode() { ; CHECK-T2ADDRMODE-LABEL: excl_addrmode: %base1020 = load i32*, i32** @base %offset1020 = getelementptr i32, i32* %base1020, i32 255 call i32 @llvm.arm.ldrex.p0i32(i32* %offset1020) call i32 @llvm.arm.strex.p0i32(i32 0, i32* %offset1020) ; CHECK-T2ADDRMODE: ldrex {{r[0-9]+}}, [{{r[0-9]+}}, #1020] ; CHECK-T2ADDRMODE: strex {{r[0-9]+}}, {{r[0-9]+}}, [{{r[0-9]+}}, #1020] %base1024 = load i32*, i32** @base %offset1024 = getelementptr i32, i32* %base1024, i32 256 call i32 @llvm.arm.ldrex.p0i32(i32* %offset1024) call i32 @llvm.arm.strex.p0i32(i32 0, i32* %offset1024) ; CHECK-T2ADDRMODE: add.w r[[ADDR:[0-9]+]], {{r[0-9]+}}, #1024 ; CHECK-T2ADDRMODE: ldrex {{r[0-9]+}}, [r[[ADDR]]] ; CHECK-T2ADDRMODE: strex {{r[0-9]+}}, {{r[0-9]+}}, [r[[ADDR]]] %base1 = load i32*, i32** @base %addr8 = bitcast i32* %base1 to i8* %offset1_8 = getelementptr i8, i8* %addr8, i32 1 %offset1 = bitcast i8* %offset1_8 to i32* call i32 @llvm.arm.ldrex.p0i32(i32* %offset1) call i32 @llvm.arm.strex.p0i32(i32 0, i32* %offset1) ; CHECK-T2ADDRMODE: adds r[[ADDR:[0-9]+]], #1 ; CHECK-T2ADDRMODE: ldrex {{r[0-9]+}}, [r[[ADDR]]] ; CHECK-T2ADDRMODE: strex {{r[0-9]+}}, {{r[0-9]+}}, [r[[ADDR]]] %local = alloca i8, i32 1024 %local32 = bitcast i8* %local to i32* call i32 @llvm.arm.ldrex.p0i32(i32* %local32) call i32 @llvm.arm.strex.p0i32(i32 0, i32* %local32) ; CHECK-T2ADDRMODE: mov r[[ADDR:[0-9]+]], sp ; CHECK-T2ADDRMODE: ldrex {{r[0-9]+}}, [r[[ADDR]]] ; CHECK-T2ADDRMODE: strex {{r[0-9]+}}, {{r[0-9]+}}, [r[[ADDR]]] ret void } +define void @test_excl_addrmode_folded() { +; CHECK-LABEL: test_excl_addrmode_folded: + %local = alloca i8, i32 4096 + + %local.0 = getelementptr i8, i8* %local, i32 4 + %local32.0 = bitcast i8* %local.0 to i32* + call i32 @llvm.arm.ldrex.p0i32(i32* %local32.0) + call i32 @llvm.arm.strex.p0i32(i32 0, i32* %local32.0) +; CHECK-T2ADDRMODE: ldrex {{r[0-9]+}}, [sp, #4] +; CHECK-T2ADDRMODE: strex {{r[0-9]+}}, {{r[0-9]+}}, [sp, #4] + + %local.1 = getelementptr i8, i8* %local, i32 1020 + %local32.1 = bitcast i8* %local.1 to i32* + call i32 @llvm.arm.ldrex.p0i32(i32* %local32.1) + call i32 @llvm.arm.strex.p0i32(i32 0, i32* %local32.1) +; CHECK-T2ADDRMODE: ldrex {{r[0-9]+}}, [sp, #1020] +; CHECK-T2ADDRMODE: strex {{r[0-9]+}}, {{r[0-9]+}}, [sp, #1020] + + ret void +} + +define void @test_excl_addrmode_range() { +; CHECK-LABEL: test_excl_addrmode_range: + %local = alloca i8, i32 4096 + + %local.0 = getelementptr i8, i8* %local, i32 1024 + %local32.0 = bitcast i8* %local.0 to i32* + call i32 @llvm.arm.ldrex.p0i32(i32* %local32.0) + call i32 @llvm.arm.strex.p0i32(i32 0, i32* %local32.0) +; CHECK-T2ADDRMODE: mov r[[TMP:[0-9]+]], sp +; CHECK-T2ADDRMODE: add.w r[[ADDR:[0-9]+]], r[[TMP]], #1024 +; CHECK-T2ADDRMODE: ldrex {{r[0-9]+}}, [r[[ADDR]]] +; CHECK-T2ADDRMODE: strex {{r[0-9]+}}, {{r[0-9]+}}, [r[[ADDR]]] + + ret void +} + +define void @test_excl_addrmode_align() { +; CHECK-LABEL: test_excl_addrmode_align: + %local = alloca i8, i32 4096 + + %local.0 = getelementptr i8, i8* %local, i32 2 + %local32.0 = bitcast i8* %local.0 to i32* + call i32 @llvm.arm.ldrex.p0i32(i32* %local32.0) + call i32 @llvm.arm.strex.p0i32(i32 0, i32* %local32.0) +; CHECK-T2ADDRMODE: mov r[[ADDR:[0-9]+]], sp +; CHECK-T2ADDRMODE: adds r[[ADDR:[0-9]+]], #2 +; CHECK-T2ADDRMODE: ldrex {{r[0-9]+}}, [r[[ADDR]]] +; CHECK-T2ADDRMODE: strex {{r[0-9]+}}, {{r[0-9]+}}, [r[[ADDR]]] + + ret void +} + +define void @test_excl_addrmode_sign() { +; CHECK-LABEL: 
test_excl_addrmode_sign: + %local = alloca i8, i32 4096 + + %local.0 = getelementptr i8, i8* %local, i32 -4 + %local32.0 = bitcast i8* %local.0 to i32* + call i32 @llvm.arm.ldrex.p0i32(i32* %local32.0) + call i32 @llvm.arm.strex.p0i32(i32 0, i32* %local32.0) +; CHECK-T2ADDRMODE: mov r[[ADDR:[0-9]+]], sp +; CHECK-T2ADDRMODE: subs r[[ADDR:[0-9]+]], #4 +; CHECK-T2ADDRMODE: ldrex {{r[0-9]+}}, [r[[ADDR]]] +; CHECK-T2ADDRMODE: strex {{r[0-9]+}}, {{r[0-9]+}}, [r[[ADDR]]] + + ret void +} + +define void @test_excl_addrmode_combination() { +; CHECK-LABEL: test_excl_addrmode_combination: + %local = alloca i8, i32 4096 + %unused = alloca i8, i32 64 + + %local.0 = getelementptr i8, i8* %local, i32 4 + %local32.0 = bitcast i8* %local.0 to i32* + call i32 @llvm.arm.ldrex.p0i32(i32* %local32.0) + call i32 @llvm.arm.strex.p0i32(i32 0, i32* %local32.0) +; CHECK-T2ADDRMODE: ldrex {{r[0-9]+}}, [sp, #68] +; CHECK-T2ADDRMODE: strex {{r[0-9]+}}, {{r[0-9]+}}, [sp, #68] + + ret void +} + + ; LLVM should know, even across basic blocks, that ldrex is setting the high ; bits of its i32 to 0. There should be no zero-extend operation. define zeroext i8 @test_cross_block_zext_i8(i1 %tst, i8* %addr) { ; CHECK: test_cross_block_zext_i8: ; CHECK-NOT: uxtb ; CHECK-NOT: and ; CHECK: bx lr %val = call i32 @llvm.arm.ldrex.p0i8(i8* %addr) br i1 %tst, label %end, label %mid mid: ret i8 42 end: %val8 = trunc i32 %val to i8 ret i8 %val8 } Index: vendor/llvm/dist-release_70/test/CodeGen/X86/eip-addressing-i386.ll =================================================================== --- vendor/llvm/dist-release_70/test/CodeGen/X86/eip-addressing-i386.ll (revision 338574) +++ vendor/llvm/dist-release_70/test/CodeGen/X86/eip-addressing-i386.ll (revision 338575) @@ -1,13 +1,13 @@ ; RUN: not llc -mtriple i386-apple-- -o /dev/null < %s 2>&1| FileCheck %s -; CHECK: :1:13: error: register %eip is only available in 64-bit mode +; CHECK: :1:13: error: IP-relative addressing requires 64-bit mode ; CHECK-NEXT: jmpl *_foo(%eip) -; Make sure that we emit an error if we encounter RIP-relative instructions in +; Make sure that we emit an error if we encounter IP-relative instructions in ; 32-bit mode. define i32 @foo() { ret i32 0 } define i32 @bar() { call void asm sideeffect "jmpl *_foo(%eip)\0A", "~{dirflag},~{fpsr},~{flags}"() ret i32 0 } Index: vendor/llvm/dist-release_70/test/MC/AsmParser/directive_file-3.s =================================================================== --- vendor/llvm/dist-release_70/test/MC/AsmParser/directive_file-3.s (nonexistent) +++ vendor/llvm/dist-release_70/test/MC/AsmParser/directive_file-3.s (revision 338575) @@ -0,0 +1,24 @@ +// RUN: llvm-mc -g -triple i386-unknown-unknown %s | FileCheck -check-prefix=CHECK-DEFAULT %s +// RUN: llvm-mc -g -triple i386-unknown-unknown %s -filetype=obj | obj2yaml | FileCheck -check-prefix=CHECK-DEBUG %s + +// Test for Bug 38695 +// This testcase has a single function and a .file directive +// without the [file-num] argument. When compiled with -g, +// this testcase will not report an error, and will generate new +// debug info.
+ + .file "hello" +.text + +f1: + nop +.size f1, .-f1 + +// CHECK-DEFAULT: .file "hello" + +// CHECK-DEBUG: Sections: +// CHECK-DEBUG: - Name: .text +// CHECK-DEBUG: - Name: .debug_info +// CHECK-DEBUG: - Name: .rel.debug_info +// CHECK-DEBUG: Info: .debug_info +// CHECK-DEBUG: Symbols: Property changes on: vendor/llvm/dist-release_70/test/MC/AsmParser/directive_file-3.s ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: vendor/llvm/dist-release_70/test/MC/X86/pr38826.s =================================================================== --- vendor/llvm/dist-release_70/test/MC/X86/pr38826.s (nonexistent) +++ vendor/llvm/dist-release_70/test/MC/X86/pr38826.s (revision 338575) @@ -0,0 +1,24 @@ +// RUN: llvm-mc %s -triple i386-unknown-unknown + +// Make sure %eip is allowed as a register in cfi directives in 32-bit mode + + .text + .align 4 + .globl foo + +foo: + .cfi_startproc + + movl (%edx), %ecx + movl 4(%edx), %ebx + movl 8(%edx), %esi + movl 12(%edx), %edi + movl 16(%edx), %ebp + .cfi_def_cfa %edx, 0 + .cfi_offset %eip, 24 + .cfi_register %esp, %ecx + movl %ecx, %esp + + jmp *24(%edx) + + .cfi_endproc Property changes on: vendor/llvm/dist-release_70/test/MC/X86/pr38826.s ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: vendor/llvm/dist-release_70/test/MC/X86/x86_errors.s =================================================================== --- vendor/llvm/dist-release_70/test/MC/X86/x86_errors.s (revision 338574) +++ vendor/llvm/dist-release_70/test/MC/X86/x86_errors.s (revision 338575) @@ -1,120 +1,120 @@ // RUN: not llvm-mc -triple x86_64-unknown-unknown %s 2> %t.err // RUN: FileCheck --check-prefix=64 < %t.err %s // RUN: not llvm-mc -triple i386-unknown-unknown %s 2> %t.err // RUN: FileCheck --check-prefix=32 < %t.err %s // rdar://8204588 // 64: error: ambiguous instructions require an explicit suffix (could be 'cmpb', 'cmpw', 'cmpl', or 'cmpq') cmp $0, 0(%eax) // 32: error: register %rax is only available in 64-bit mode addl $0, 0(%rax) // 32: test.s:8:2: error: invalid instruction mnemonic 'movi' # 8 "test.s" movi $8,%eax movl 0(%rax), 0(%edx) // error: invalid operand for instruction // 32: error: instruction requires: 64-bit mode sysexitq // rdar://10710167 // 64: error: expected scale expression lea (%rsp, %rbp, $4), %rax // rdar://10423777 // 64: error: base register is 64-bit, but index register is not movq (%rsi,%ecx),%xmm0 // 64: error: invalid 16-bit base register movl %eax,(%bp,%si) // 32: error: scale factor in 16-bit address must be 1 movl %eax,(%bp,%si,2) // 32: error: invalid 16-bit base register movl %eax,(%cx) // 32: error: invalid 16-bit base/index register combination movl %eax,(%bp,%bx) // 32: error: 16-bit memory operand may not include only index register movl %eax,(,%bx) // 32: error: invalid operand for instruction outb al, 4 // 32: error: invalid segment register // 64: error: invalid segment register movl %eax:0x00, %ebx // 32: error: invalid operand for instruction // 64: error: invalid operand for instruction cmpps $-129, %xmm0, %xmm0 // 32: error: 
invalid operand for instruction // 64: error: invalid operand for instruction cmppd $256, %xmm0, %xmm0 // 32: error: instruction requires: 64-bit mode jrcxz 1 // 64: error: instruction requires: Not 64-bit mode jcxz 1 // 32: error: register %cr8 is only available in 64-bit mode movl %edx, %cr8 // 32: error: register %dr8 is only available in 64-bit mode movl %edx, %dr8 // 32: error: register %rip is only available in 64-bit mode // 64: error: %rip can only be used as a base register mov %rip, %rax // 32: error: register %rax is only available in 64-bit mode // 64: error: %rip is not allowed as an index register mov (%rax,%rip), %rbx // 32: error: instruction requires: 64-bit mode ljmpq *(%eax) // 32: error: register %rax is only available in 64-bit mode // 64: error: invalid base+index expression leaq (%rax,%rsp), %rax // 32: error: invalid base+index expression // 64: error: invalid base+index expression leaq (%eax,%esp), %eax // 32: error: invalid 16-bit base/index register combination // 64: error: invalid 16-bit base register lea (%si,%bp), %ax // 32: error: invalid 16-bit base/index register combination // 64: error: invalid 16-bit base register lea (%di,%bp), %ax // 32: error: invalid 16-bit base/index register combination // 64: error: invalid 16-bit base register lea (%si,%bx), %ax // 32: error: invalid 16-bit base/index register combination // 64: error: invalid 16-bit base register lea (%di,%bx), %ax -// 32: error: register %eip is only available in 64-bit mode +// 32: error: invalid base+index expression // 64: error: invalid base+index expression mov (,%eip), %rbx -// 32: error: register %eip is only available in 64-bit mode +// 32: error: invalid base+index expression // 64: error: invalid base+index expression mov (%eip,%eax), %rbx // 32: error: register %rax is only available in 64-bit mode // 64: error: base register is 64-bit, but index register is not mov (%rax,%eiz), %ebx // 32: error: register %riz is only available in 64-bit mode // 64: error: base register is 32-bit, but index register is not mov (%eax,%riz), %ebx Index: vendor/llvm/dist-release_70/test/Transforms/Inline/infinite-loop-two-predecessors.ll =================================================================== --- vendor/llvm/dist-release_70/test/Transforms/Inline/infinite-loop-two-predecessors.ll (nonexistent) +++ vendor/llvm/dist-release_70/test/Transforms/Inline/infinite-loop-two-predecessors.ll (revision 338575) @@ -0,0 +1,32 @@ +; RUN: opt -S -o - %s -inline | FileCheck %s + +define void @f1() { +bb.0: + br i1 false, label %bb.2, label %bb.1 + +bb.1: ; preds = %bb.0 + br label %bb.2 + +bb.2: ; preds = %bb.0, %bb.1 + %tmp0 = phi i1 [ true, %bb.1 ], [ false, %bb.0 ] + br i1 %tmp0, label %bb.4, label %bb.3 + +bb.3: ; preds = %bb.3, %bb.3 + br i1 undef, label %bb.3, label %bb.3 + +bb.4: ; preds = %bb.2 + ret void +} + +define void @f2() { +bb.0: + call void @f1() + ret void +} + +; f1 should be inlined into f2 and simplified/collapsed to nothing. 
+ +; CHECK-LABEL: define void @f2() { +; CHECK-NEXT: bb.0: +; CHECK-NEXT: ret void +; CHECK-NEXT: } Index: vendor/llvm/dist-release_70/test/Transforms/LICM/loopsink-pr38462.ll =================================================================== --- vendor/llvm/dist-release_70/test/Transforms/LICM/loopsink-pr38462.ll (nonexistent) +++ vendor/llvm/dist-release_70/test/Transforms/LICM/loopsink-pr38462.ll (revision 338575) @@ -0,0 +1,65 @@ +; RUN: opt -S -loop-sink < %s | FileCheck %s + +target datalayout = "e-m:w-i64:64-f80:128-n8:16:32:64-S128" +target triple = "x86_64-pc-windows-msvc19.13.26128" + +%struct.FontInfoData = type { i32 (...)** } +%struct.S = type { i8 } + +; CHECK: @pr38462 +; Make sure not to assert by trying to sink into catch.dispatch. + +define void @pr38462(%struct.FontInfoData* %this) personality i8* bitcast (i32 (...)* @__C_specific_handler to i8*) !prof !1 { +entry: + %s = alloca %struct.S + %call6 = call i32 @f() + %tobool7 = icmp eq i32 %call6, 0 + br i1 %tobool7, label %for.body.lr.ph, label %for.cond.cleanup + +for.body.lr.ph: + %0 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0 + br label %for.body + +for.cond.cleanup.loopexit: + br label %for.cond.cleanup + +for.cond.cleanup: + ret void + +for.body: + %call2 = invoke i32 @f() to label %__try.cont unwind label %catch.dispatch + +catch.dispatch: + %1 = catchswitch within none [label %__except] unwind to caller + +__except: + %2 = catchpad within %1 [i8* null] + catchret from %2 to label %__except3 + +__except3: + call void @llvm.lifetime.start.p0i8(i64 1, i8* nonnull %0) + %call.i = call zeroext i1 @g(%struct.S* nonnull %s) + br i1 %call.i, label %if.then.i, label %exit + +if.then.i: + %call2.i = call i32 @f() + br label %exit + +exit: + call void @llvm.lifetime.end.p0i8(i64 1, i8* nonnull %0) + br label %__try.cont + +__try.cont: + %call = call i32 @f() + %tobool = icmp eq i32 %call, 0 + br i1 %tobool, label %for.body, label %for.cond.cleanup.loopexit +} + +declare i32 @__C_specific_handler(...) 
+declare i32 @f() +declare zeroext i1 @g(%struct.S*) +declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) +declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) + +!1 = !{!"function_entry_count", i64 1} + Index: vendor/llvm/dist-release_70/test/Transforms/LoopVectorize/X86/uniform-phi.ll =================================================================== --- vendor/llvm/dist-release_70/test/Transforms/LoopVectorize/X86/uniform-phi.ll (revision 338574) +++ vendor/llvm/dist-release_70/test/Transforms/LoopVectorize/X86/uniform-phi.ll (revision 338575) @@ -1,77 +1,99 @@ ; RUN: opt < %s -loop-vectorize -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 -debug-only=loop-vectorize -S 2>&1 | FileCheck %s ; REQUIRES: asserts target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" ; CHECK-LABEL: test ; CHECK-DAG: LV: Found uniform instruction: %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] ; CHECK-DAG: LV: Found uniform instruction: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 ; CHECK-DAG: LV: Found uniform instruction: %exitcond = icmp eq i64 %indvars.iv, 1599 define void @test(float* noalias nocapture %a, float* noalias nocapture readonly %b) #0 { entry: br label %for.body for.body: ; preds = %for.body, %entry %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %arrayidx = getelementptr inbounds float, float* %b, i64 %indvars.iv %tmp0 = load float, float* %arrayidx, align 4 %add = fadd float %tmp0, 1.000000e+00 %arrayidx5 = getelementptr inbounds float, float* %a, i64 %indvars.iv store float %add, float* %arrayidx5, align 4 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %exitcond = icmp eq i64 %indvars.iv, 1599 br i1 %exitcond, label %for.end, label %for.body for.end: ; preds = %for.body ret void } ; CHECK-LABEL: foo ; CHECK-DAG: LV: Found uniform instruction: %cond = icmp eq i64 %i.next, %n ; CHECK-DAG: LV: Found uniform instruction: %tmp1 = getelementptr inbounds i32, i32* %a, i32 %tmp0 ; CHECK-NOT: LV: Found uniform instruction: %i = phi i64 [ %i.next, %for.body ], [ 0, %entry ] define void @foo(i32* %a, i64 %n) { entry: br label %for.body for.body: %i = phi i64 [ %i.next, %for.body ], [ 0, %entry ] %tmp0 = trunc i64 %i to i32 %tmp1 = getelementptr inbounds i32, i32* %a, i32 %tmp0 store i32 %tmp0, i32* %tmp1, align 4 %i.next = add nuw nsw i64 %i, 1 %cond = icmp eq i64 %i.next, %n br i1 %cond, label %for.end, label %for.body for.end: ret void } ; CHECK-LABEL: goo ; Check %indvars.iv and %indvars.iv.next are uniform instructions even if they are used outside of loop. 
; CHECK-DAG: LV: Found uniform instruction: %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] ; CHECK-DAG: LV: Found uniform instruction: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 ; CHECK-DAG: LV: Found uniform instruction: %exitcond = icmp eq i64 %indvars.iv, 1599 define i64 @goo(float* noalias nocapture %a, float* noalias nocapture readonly %b) #0 { entry: br label %for.body for.body: ; preds = %for.body, %entry %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %arrayidx = getelementptr inbounds float, float* %b, i64 %indvars.iv %tmp0 = load float, float* %arrayidx, align 4 %add = fadd float %tmp0, 1.000000e+00 %arrayidx5 = getelementptr inbounds float, float* %a, i64 %indvars.iv store float %add, float* %arrayidx5, align 4 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %exitcond = icmp eq i64 %indvars.iv, 1599 br i1 %exitcond, label %for.end, label %for.body for.end: ; preds = %for.body %retval = add i64 %indvars.iv, %indvars.iv.next ret i64 %retval } +; CHECK-LABEL: PR38786 +; Check that first order recurrence phis (%phi32 and %phi64) are not uniform. +; CHECK-NOT: LV: Found uniform instruction: %phi +define void @PR38786(double* %y, double* %x, i64 %n) { +entry: + br label %for.body + +for.body: + %phi32 = phi i32 [ 0, %entry ], [ %i32next, %for.body ] + %phi64 = phi i64 [ 0, %entry ], [ %i64next, %for.body ] + %i32next = add i32 %phi32, 1 + %i64next = zext i32 %i32next to i64 + %xip = getelementptr inbounds double, double* %x, i64 %i64next + %yip = getelementptr inbounds double, double* %y, i64 %phi64 + %xi = load double, double* %xip, align 8 + store double %xi, double* %yip, align 8 + %cmp = icmp slt i64 %i64next, %n + br i1 %cmp, label %for.body, label %for.end + +for.end: + ret void +} Index: vendor/llvm/dist-release_70/test/Transforms/SROA/phi-and-select.ll =================================================================== --- vendor/llvm/dist-release_70/test/Transforms/SROA/phi-and-select.ll (revision 338574) +++ vendor/llvm/dist-release_70/test/Transforms/SROA/phi-and-select.ll (revision 338575) @@ -1,602 +1,634 @@ ; RUN: opt < %s -sroa -S | FileCheck %s target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n8:16:32:64" define i32 @test1() { ; CHECK-LABEL: @test1( entry: %a = alloca [2 x i32] ; CHECK-NOT: alloca %a0 = getelementptr [2 x i32], [2 x i32]* %a, i64 0, i32 0 %a1 = getelementptr [2 x i32], [2 x i32]* %a, i64 0, i32 1 store i32 0, i32* %a0 store i32 1, i32* %a1 %v0 = load i32, i32* %a0 %v1 = load i32, i32* %a1 ; CHECK-NOT: store ; CHECK-NOT: load %cond = icmp sle i32 %v0, %v1 br i1 %cond, label %then, label %exit then: br label %exit exit: %phi = phi i32* [ %a1, %then ], [ %a0, %entry ] ; CHECK: phi i32 [ 1, %{{.*}} ], [ 0, %{{.*}} ] %result = load i32, i32* %phi ret i32 %result } define i32 @test2() { ; CHECK-LABEL: @test2( entry: %a = alloca [2 x i32] ; CHECK-NOT: alloca %a0 = getelementptr [2 x i32], [2 x i32]* %a, i64 0, i32 0 %a1 = getelementptr [2 x i32], [2 x i32]* %a, i64 0, i32 1 store i32 0, i32* %a0 store i32 1, i32* %a1 %v0 = load i32, i32* %a0 %v1 = load i32, i32* %a1 ; CHECK-NOT: store ; CHECK-NOT: load %cond = icmp sle i32 %v0, %v1 %select = select i1 %cond, i32* %a1, i32* %a0 ; CHECK: select i1 %{{.*}}, i32 1, i32 0 %result = load i32, i32* %select ret i32 %result } define i32 @test3(i32 %x) { ; CHECK-LABEL: @test3( entry: %a = alloca [2 x i32] ; CHECK-NOT: alloca ; Note that we build redundant GEPs here to ensure that having 
different GEPs ; into the same alloca partation continues to work with PHI speculation. This ; was the underlying cause of PR13926. %a0 = getelementptr [2 x i32], [2 x i32]* %a, i64 0, i32 0 %a0b = getelementptr [2 x i32], [2 x i32]* %a, i64 0, i32 0 %a1 = getelementptr [2 x i32], [2 x i32]* %a, i64 0, i32 1 %a1b = getelementptr [2 x i32], [2 x i32]* %a, i64 0, i32 1 store i32 0, i32* %a0 store i32 1, i32* %a1 ; CHECK-NOT: store switch i32 %x, label %bb0 [ i32 1, label %bb1 i32 2, label %bb2 i32 3, label %bb3 i32 4, label %bb4 i32 5, label %bb5 i32 6, label %bb6 i32 7, label %bb7 ] bb0: br label %exit bb1: br label %exit bb2: br label %exit bb3: br label %exit bb4: br label %exit bb5: br label %exit bb6: br label %exit bb7: br label %exit exit: %phi = phi i32* [ %a1, %bb0 ], [ %a0, %bb1 ], [ %a0, %bb2 ], [ %a1, %bb3 ], [ %a1b, %bb4 ], [ %a0b, %bb5 ], [ %a0b, %bb6 ], [ %a1b, %bb7 ] ; CHECK: phi i32 [ 1, %{{.*}} ], [ 0, %{{.*}} ], [ 0, %{{.*}} ], [ 1, %{{.*}} ], [ 1, %{{.*}} ], [ 0, %{{.*}} ], [ 0, %{{.*}} ], [ 1, %{{.*}} ] %result = load i32, i32* %phi ret i32 %result } define i32 @test4() { ; CHECK-LABEL: @test4( entry: %a = alloca [2 x i32] ; CHECK-NOT: alloca %a0 = getelementptr [2 x i32], [2 x i32]* %a, i64 0, i32 0 %a1 = getelementptr [2 x i32], [2 x i32]* %a, i64 0, i32 1 store i32 0, i32* %a0 store i32 1, i32* %a1 %v0 = load i32, i32* %a0 %v1 = load i32, i32* %a1 ; CHECK-NOT: store ; CHECK-NOT: load %cond = icmp sle i32 %v0, %v1 %select = select i1 %cond, i32* %a0, i32* %a0 ; CHECK-NOT: select %result = load i32, i32* %select ret i32 %result ; CHECK: ret i32 0 } define i32 @test5(i32* %b) { ; CHECK-LABEL: @test5( entry: %a = alloca [2 x i32] ; CHECK-NOT: alloca %a1 = getelementptr [2 x i32], [2 x i32]* %a, i64 0, i32 1 store i32 1, i32* %a1 ; CHECK-NOT: store %select = select i1 true, i32* %a1, i32* %b ; CHECK-NOT: select %result = load i32, i32* %select ; CHECK-NOT: load ret i32 %result ; CHECK: ret i32 1 } declare void @f(i32*, i32*) define i32 @test6(i32* %b) { ; CHECK-LABEL: @test6( entry: %a = alloca [2 x i32] %c = alloca i32 ; CHECK-NOT: alloca %a1 = getelementptr [2 x i32], [2 x i32]* %a, i64 0, i32 1 store i32 1, i32* %a1 %select = select i1 true, i32* %a1, i32* %b %select2 = select i1 false, i32* %a1, i32* %b %select3 = select i1 false, i32* %c, i32* %b ; CHECK: %[[select2:.*]] = select i1 false, i32* undef, i32* %b ; CHECK: %[[select3:.*]] = select i1 false, i32* undef, i32* %b ; Note, this would potentially escape the alloca pointer except for the ; constant folding of the select. call void @f(i32* %select2, i32* %select3) ; CHECK: call void @f(i32* %[[select2]], i32* %[[select3]]) %result = load i32, i32* %select ; CHECK-NOT: load %dead = load i32, i32* %c ret i32 %result ; CHECK: ret i32 1 } define i32 @test7() { ; CHECK-LABEL: @test7( ; CHECK-NOT: alloca entry: %X = alloca i32 br i1 undef, label %good, label %bad good: %Y1 = getelementptr i32, i32* %X, i64 0 store i32 0, i32* %Y1 br label %exit bad: %Y2 = getelementptr i32, i32* %X, i64 1 store i32 0, i32* %Y2 br label %exit exit: %P = phi i32* [ %Y1, %good ], [ %Y2, %bad ] ; CHECK: %[[phi:.*]] = phi i32 [ 0, %good ], %Z2 = load i32, i32* %P ret i32 %Z2 ; CHECK: ret i32 %[[phi]] } define i32 @test8(i32 %b, i32* %ptr) { ; Ensure that we rewrite allocas to the used type when that use is hidden by ; a PHI that can be speculated. 
; CHECK-LABEL: @test8( ; CHECK-NOT: alloca ; CHECK-NOT: load ; CHECK: %[[value:.*]] = load i32, i32* %ptr ; CHECK-NOT: load ; CHECK: %[[result:.*]] = phi i32 [ undef, %else ], [ %[[value]], %then ] ; CHECK-NEXT: ret i32 %[[result]] entry: %f = alloca float %test = icmp ne i32 %b, 0 br i1 %test, label %then, label %else then: br label %exit else: %bitcast = bitcast float* %f to i32* br label %exit exit: %phi = phi i32* [ %bitcast, %else ], [ %ptr, %then ] %loaded = load i32, i32* %phi, align 4 ret i32 %loaded } define i32 @test9(i32 %b, i32* %ptr) { ; Same as @test8 but for a select rather than a PHI node. ; CHECK-LABEL: @test9( ; CHECK-NOT: alloca ; CHECK-NOT: load ; CHECK: %[[value:.*]] = load i32, i32* %ptr ; CHECK-NOT: load ; CHECK: %[[result:.*]] = select i1 %{{.*}}, i32 undef, i32 %[[value]] ; CHECK-NEXT: ret i32 %[[result]] entry: %f = alloca float store i32 0, i32* %ptr %test = icmp ne i32 %b, 0 %bitcast = bitcast float* %f to i32* %select = select i1 %test, i32* %bitcast, i32* %ptr %loaded = load i32, i32* %select, align 4 ret i32 %loaded } define float @test10(i32 %b, float* %ptr) { ; Don't try to promote allocas which are not elligible for it even after ; rewriting due to the necessity of inserting bitcasts when speculating a PHI ; node. ; CHECK-LABEL: @test10( ; CHECK: %[[alloca:.*]] = alloca ; CHECK: %[[argvalue:.*]] = load float, float* %ptr ; CHECK: %[[cast:.*]] = bitcast double* %[[alloca]] to float* ; CHECK: %[[allocavalue:.*]] = load float, float* %[[cast]] ; CHECK: %[[result:.*]] = phi float [ %[[allocavalue]], %else ], [ %[[argvalue]], %then ] ; CHECK-NEXT: ret float %[[result]] entry: %f = alloca double store double 0.0, double* %f %test = icmp ne i32 %b, 0 br i1 %test, label %then, label %else then: br label %exit else: %bitcast = bitcast double* %f to float* br label %exit exit: %phi = phi float* [ %bitcast, %else ], [ %ptr, %then ] %loaded = load float, float* %phi, align 4 ret float %loaded } define float @test11(i32 %b, float* %ptr) { ; Same as @test10 but for a select rather than a PHI node. ; CHECK-LABEL: @test11( ; CHECK: %[[alloca:.*]] = alloca ; CHECK: %[[cast:.*]] = bitcast double* %[[alloca]] to float* ; CHECK: %[[allocavalue:.*]] = load float, float* %[[cast]] ; CHECK: %[[argvalue:.*]] = load float, float* %ptr ; CHECK: %[[result:.*]] = select i1 %{{.*}}, float %[[allocavalue]], float %[[argvalue]] ; CHECK-NEXT: ret float %[[result]] entry: %f = alloca double store double 0.0, double* %f store float 0.0, float* %ptr %test = icmp ne i32 %b, 0 %bitcast = bitcast double* %f to float* %select = select i1 %test, float* %bitcast, float* %ptr %loaded = load float, float* %select, align 4 ret float %loaded } define i32 @test12(i32 %x, i32* %p) { ; Ensure we don't crash or fail to nuke dead selects of allocas if no load is ; never found. ; CHECK-LABEL: @test12( ; CHECK-NOT: alloca ; CHECK-NOT: select ; CHECK: ret i32 %x entry: %a = alloca i32 store i32 %x, i32* %a %dead = select i1 undef, i32* %a, i32* %p %load = load i32, i32* %a ret i32 %load } define i32 @test13(i32 %x, i32* %p) { ; Ensure we don't crash or fail to nuke dead phis of allocas if no load is ever ; found. 
; CHECK-LABEL: @test13( ; CHECK-NOT: alloca ; CHECK-NOT: phi ; CHECK: ret i32 %x entry: %a = alloca i32 store i32 %x, i32* %a br label %loop loop: %phi = phi i32* [ %p, %entry ], [ %a, %loop ] br i1 undef, label %loop, label %exit exit: %load = load i32, i32* %a ret i32 %load } define i32 @test14(i1 %b1, i1 %b2, i32* %ptr) { ; Check for problems when there are both selects and phis and one is ; speculatable toward promotion but the other is not. That should block all of ; the speculation. ; CHECK-LABEL: @test14( ; CHECK: alloca ; CHECK: alloca ; CHECK: select ; CHECK: phi ; CHECK: phi ; CHECK: select ; CHECK: ret i32 entry: %f = alloca i32 %g = alloca i32 store i32 0, i32* %f store i32 0, i32* %g %f.select = select i1 %b1, i32* %f, i32* %ptr br i1 %b2, label %then, label %else then: br label %exit else: br label %exit exit: %f.phi = phi i32* [ %f, %then ], [ %f.select, %else ] %g.phi = phi i32* [ %g, %then ], [ %ptr, %else ] %f.loaded = load i32, i32* %f.phi %g.select = select i1 %b1, i32* %g, i32* %g.phi %g.loaded = load i32, i32* %g.select %result = add i32 %f.loaded, %g.loaded ret i32 %result } define i32 @PR13905() { ; Check a pattern where we have a chain of dead phi nodes to ensure they are ; deleted and promotion can proceed. ; CHECK-LABEL: @PR13905( ; CHECK-NOT: alloca i32 ; CHECK: ret i32 undef entry: %h = alloca i32 store i32 0, i32* %h br i1 undef, label %loop1, label %exit loop1: %phi1 = phi i32* [ null, %entry ], [ %h, %loop1 ], [ %h, %loop2 ] br i1 undef, label %loop1, label %loop2 loop2: br i1 undef, label %loop1, label %exit exit: %phi2 = phi i32* [ %phi1, %loop2 ], [ null, %entry ] ret i32 undef } define i32 @PR13906() { ; Another pattern which can lead to crashes due to failing to clear out dead ; PHI nodes or select nodes. This triggers subtly differently from the above ; cases because the PHI node is (recursively) alive, but the select is dead. ; CHECK-LABEL: @PR13906( ; CHECK-NOT: alloca entry: %c = alloca i32 store i32 0, i32* %c br label %for.cond for.cond: %d.0 = phi i32* [ undef, %entry ], [ %c, %if.then ], [ %d.0, %for.cond ] br i1 undef, label %if.then, label %for.cond if.then: %tmpcast.d.0 = select i1 undef, i32* %c, i32* %d.0 br label %for.cond } define i64 @PR14132(i1 %flag) { ; CHECK-LABEL: @PR14132( ; Here we form a PHI-node by promoting the pointer alloca first, and then in ; order to promote the other two allocas, we speculate the load of the ; now-phi-node-pointer. In doing so we end up loading a 64-bit value from an i8 ; alloca. While this is a bit dubious, we were asserting on trying to ; rewrite it. The trick is that the code using the value may carefully take ; steps to only use the not-undef bits, and so we need to at least loosely ; support this.. 
entry: %a = alloca i64, align 8 %b = alloca i8, align 8 %ptr = alloca i64*, align 8 ; CHECK-NOT: alloca %ptr.cast = bitcast i64** %ptr to i8** store i64 0, i64* %a, align 8 store i8 1, i8* %b, align 8 store i64* %a, i64** %ptr, align 8 br i1 %flag, label %if.then, label %if.end if.then: store i8* %b, i8** %ptr.cast, align 8 br label %if.end ; CHECK-NOT: store ; CHECK: %[[ext:.*]] = zext i8 1 to i64 if.end: %tmp = load i64*, i64** %ptr, align 8 %result = load i64, i64* %tmp, align 8 ; CHECK-NOT: load ; CHECK: %[[result:.*]] = phi i64 [ %[[ext]], %if.then ], [ 0, %entry ] ret i64 %result ; CHECK-NEXT: ret i64 %[[result]] } define float @PR16687(i64 %x, i1 %flag) { ; CHECK-LABEL: @PR16687( ; Check that even when we try to speculate the same phi twice (in two slices) ; on an otherwise promotable construct, we don't get ahead of ourselves and try ; to promote one of the slices prior to speculating it. entry: %a = alloca i64, align 8 store i64 %x, i64* %a br i1 %flag, label %then, label %else ; CHECK-NOT: alloca ; CHECK-NOT: store ; CHECK: %[[lo:.*]] = trunc i64 %x to i32 ; CHECK: %[[shift:.*]] = lshr i64 %x, 32 ; CHECK: %[[hi:.*]] = trunc i64 %[[shift]] to i32 then: %a.f = bitcast i64* %a to float* br label %end ; CHECK: %[[lo_cast:.*]] = bitcast i32 %[[lo]] to float else: %a.raw = bitcast i64* %a to i8* %a.raw.4 = getelementptr i8, i8* %a.raw, i64 4 %a.raw.4.f = bitcast i8* %a.raw.4 to float* br label %end ; CHECK: %[[hi_cast:.*]] = bitcast i32 %[[hi]] to float end: %a.phi.f = phi float* [ %a.f, %then ], [ %a.raw.4.f, %else ] %f = load float, float* %a.phi.f ret float %f ; CHECK: %[[phi:.*]] = phi float [ %[[lo_cast]], %then ], [ %[[hi_cast]], %else ] ; CHECK-NOT: load ; CHECK: ret float %[[phi]] } ; Verifies we fixed PR20425. We should be able to promote all alloca's to ; registers in this test. ; ; %0 = slice ; %1 = slice ; %2 = phi(%0, %1) // == slice define float @simplify_phi_nodes_that_equal_slice(i1 %cond, float* %temp) { ; CHECK-LABEL: @simplify_phi_nodes_that_equal_slice( entry: %arr = alloca [4 x float], align 4 ; CHECK-NOT: alloca br i1 %cond, label %then, label %else then: %0 = getelementptr inbounds [4 x float], [4 x float]* %arr, i64 0, i64 3 store float 1.000000e+00, float* %0, align 4 br label %merge else: %1 = getelementptr inbounds [4 x float], [4 x float]* %arr, i64 0, i64 3 store float 2.000000e+00, float* %1, align 4 br label %merge merge: %2 = phi float* [ %0, %then ], [ %1, %else ] store float 0.000000e+00, float* %temp, align 4 %3 = load float, float* %2, align 4 ret float %3 } ; A slightly complicated example for PR20425. ; ; %0 = slice ; %1 = phi(%0) // == slice ; %2 = slice ; %3 = phi(%1, %2) // == slice define float @simplify_phi_nodes_that_equal_slice_2(i1 %cond, float* %temp) { ; CHECK-LABEL: @simplify_phi_nodes_that_equal_slice_2( entry: %arr = alloca [4 x float], align 4 ; CHECK-NOT: alloca br i1 %cond, label %then, label %else then: %0 = getelementptr inbounds [4 x float], [4 x float]* %arr, i64 0, i64 3 store float 1.000000e+00, float* %0, align 4 br label %then2 then2: %1 = phi float* [ %0, %then ] store float 2.000000e+00, float* %1, align 4 br label %merge else: %2 = getelementptr inbounds [4 x float], [4 x float]* %arr, i64 0, i64 3 store float 3.000000e+00, float* %2, align 4 br label %merge merge: %3 = phi float* [ %1, %then2 ], [ %2, %else ] store float 0.000000e+00, float* %temp, align 4 %4 = load float, float* %3, align 4 ret float %4 } %struct.S = type { i32 } ; Verifies we fixed PR20822. 
We have a foldable PHI feeding a speculatable PHI ; which requires the rewriting of the speculated PHI to handle insertion ; when the incoming pointer is itself from a PHI node. We would previously ; insert a bitcast instruction *before* a PHI, producing an invalid module; ; make sure we insert *after* the first non-PHI instruction. define void @PR20822() { ; CHECK-LABEL: @PR20822( entry: %f = alloca %struct.S, align 4 ; CHECK: %[[alloca:.*]] = alloca br i1 undef, label %if.end, label %for.cond for.cond: ; preds = %for.cond, %entry br label %if.end if.end: ; preds = %for.cond, %entry %f2 = phi %struct.S* [ %f, %entry ], [ %f, %for.cond ] ; CHECK: phi i32 ; CHECK: %[[cast:.*]] = bitcast i32* %[[alloca]] to %struct.S* phi i32 [ undef, %entry ], [ undef, %for.cond ] br i1 undef, label %if.then5, label %if.then2 if.then2: ; preds = %if.end br label %if.then5 if.then5: ; preds = %if.then2, %if.end %f1 = phi %struct.S* [ undef, %if.then2 ], [ %f2, %if.end ] ; CHECK: phi {{.*}} %[[cast]] store %struct.S undef, %struct.S* %f1, align 4 ret void } + +define i32 @phi_align(i32* %z) { +; CHECK-LABEL: @phi_align( +entry: + %a = alloca [8 x i8], align 8 +; CHECK: alloca [7 x i8] + + %a0x = getelementptr [8 x i8], [8 x i8]* %a, i64 0, i32 1 + %a0 = bitcast i8* %a0x to i32* + %a1x = getelementptr [8 x i8], [8 x i8]* %a, i64 0, i32 4 + %a1 = bitcast i8* %a1x to i32* +; CHECK: store i32 0, {{.*}}, align 1 + store i32 0, i32* %a0, align 1 +; CHECK: store i32 1, {{.*}}, align 1 + store i32 1, i32* %a1, align 4 +; CHECK: load {{.*}}, align 1 + %v0 = load i32, i32* %a0, align 1 +; CHECK: load {{.*}}, align 1 + %v1 = load i32, i32* %a1, align 4 + %cond = icmp sle i32 %v0, %v1 + br i1 %cond, label %then, label %exit + +then: + br label %exit + +exit: +; CHECK: %phi = phi i32* [ {{.*}}, %then ], [ %z, %entry ] +; CHECK-NEXT: %result = load i32, i32* %phi, align 1 + %phi = phi i32* [ %a1, %then ], [ %z, %entry ] + %result = load i32, i32* %phi, align 4 + ret i32 %result +} Index: vendor/llvm/dist-release_70/utils/lit/lit/TestRunner.py =================================================================== --- vendor/llvm/dist-release_70/utils/lit/lit/TestRunner.py (revision 338574) +++ vendor/llvm/dist-release_70/utils/lit/lit/TestRunner.py (revision 338575) @@ -1,1583 +1,1583 @@ from __future__ import absolute_import import difflib import errno import functools import io import itertools import getopt import os, signal, subprocess, sys import re import stat import platform import shutil import tempfile import threading import io try: from StringIO import StringIO except ImportError: from io import StringIO from lit.ShCommands import GlobItem import lit.ShUtil as ShUtil import lit.Test as Test import lit.util from lit.util import to_bytes, to_string from lit.BooleanExpression import BooleanExpression class InternalShellError(Exception): def __init__(self, command, message): self.command = command self.message = message kIsWindows = platform.system() == 'Windows' # Don't use close_fds on Windows. kUseCloseFDs = not kIsWindows # Use temporary files to replace /dev/null on Windows. kAvoidDevNull = kIsWindows kDevNull = "/dev/null" # A regex that matches %dbg(ARG), which lit inserts at the beginning of each # run command pipeline such that ARG specifies the pipeline's source line # number. lit later expands each %dbg(ARG) to a command that behaves as a null # command in the target shell so that the line number is seen in lit's verbose # mode. # # This regex captures ARG. 
ARG must not contain a right parenthesis, which # terminates %dbg. ARG must not contain quotes, in which ARG might be enclosed # during expansion. kPdbgRegex = '%dbg\(([^)\'"]*)\)' class ShellEnvironment(object): """Mutable shell environment containing things like CWD and env vars. Environment variables are not implemented, but cwd tracking is. """ def __init__(self, cwd, env): self.cwd = cwd self.env = dict(env) class TimeoutHelper(object): """ Object used to helper manage enforcing a timeout in _executeShCmd(). It is passed through recursive calls to collect processes that have been executed so that when the timeout happens they can be killed. """ def __init__(self, timeout): self.timeout = timeout self._procs = [] self._timeoutReached = False self._doneKillPass = False # This lock will be used to protect concurrent access # to _procs and _doneKillPass self._lock = None self._timer = None def cancel(self): if not self.active(): return self._timer.cancel() def active(self): return self.timeout > 0 def addProcess(self, proc): if not self.active(): return needToRunKill = False with self._lock: self._procs.append(proc) # Avoid re-entering the lock by finding out if kill needs to be run # again here but call it if necessary once we have left the lock. # We could use a reentrant lock here instead but this code seems # clearer to me. needToRunKill = self._doneKillPass # The initial call to _kill() from the timer thread already happened so # we need to call it again from this thread, otherwise this process # will be left to run even though the timeout was already hit if needToRunKill: assert self.timeoutReached() self._kill() def startTimer(self): if not self.active(): return # Do some late initialisation that's only needed # if there is a timeout set self._lock = threading.Lock() self._timer = threading.Timer(self.timeout, self._handleTimeoutReached) self._timer.start() def _handleTimeoutReached(self): self._timeoutReached = True self._kill() def timeoutReached(self): return self._timeoutReached def _kill(self): """ This method may be called multiple times as we might get unlucky and be in the middle of creating a new process in _executeShCmd() which won't yet be in ``self._procs``. By locking here and in addProcess() we should be able to kill processes launched after the initial call to _kill() """ with self._lock: for p in self._procs: lit.util.killProcessAndChildren(p.pid) # Empty the list and note that we've done a pass over the list self._procs = [] # Python2 doesn't have list.clear() self._doneKillPass = True class ShellCommandResult(object): """Captures the result of an individual command.""" def __init__(self, command, stdout, stderr, exitCode, timeoutReached, outputFiles = []): self.command = command self.stdout = stdout self.stderr = stderr self.exitCode = exitCode self.timeoutReached = timeoutReached self.outputFiles = list(outputFiles) def executeShCmd(cmd, shenv, results, timeout=0): """ Wrapper around _executeShCmd that handles timeout """ # Use the helper even when no timeout is required to make # other code simpler (i.e. 
avoid bunch of ``!= None`` checks) timeoutHelper = TimeoutHelper(timeout) if timeout > 0: timeoutHelper.startTimer() finalExitCode = _executeShCmd(cmd, shenv, results, timeoutHelper) timeoutHelper.cancel() timeoutInfo = None if timeoutHelper.timeoutReached(): timeoutInfo = 'Reached timeout of {} seconds'.format(timeout) return (finalExitCode, timeoutInfo) def expand_glob(arg, cwd): if isinstance(arg, GlobItem): return sorted(arg.resolve(cwd)) return [arg] def expand_glob_expressions(args, cwd): result = [args[0]] for arg in args[1:]: result.extend(expand_glob(arg, cwd)) return result def quote_windows_command(seq): """ Reimplement Python's private subprocess.list2cmdline for MSys compatibility Based on CPython implementation here: https://hg.python.org/cpython/file/849826a900d2/Lib/subprocess.py#l422 Some core util distributions (MSys) don't tokenize command line arguments the same way that MSVC CRT does. Lit rolls its own quoting logic similar to the stock CPython logic to paper over these quoting and tokenization rule differences. We use the same algorithm from MSDN as CPython (http://msdn.microsoft.com/en-us/library/17w5ykft.aspx), but we treat more characters as needing quoting, such as double quotes themselves. """ result = [] needquote = False for arg in seq: bs_buf = [] # Add a space to separate this argument from the others if result: result.append(' ') # This logic differs from upstream list2cmdline. needquote = (" " in arg) or ("\t" in arg) or ("\"" in arg) or not arg if needquote: result.append('"') for c in arg: if c == '\\': # Don't know if we need to double yet. bs_buf.append(c) elif c == '"': # Double backslashes. result.append('\\' * len(bs_buf)*2) bs_buf = [] result.append('\\"') else: # Normal char if bs_buf: result.extend(bs_buf) bs_buf = [] result.append(c) # Add remaining backslashes, if any. if bs_buf: result.extend(bs_buf) if needquote: result.extend(bs_buf) result.append('"') return ''.join(result) # cmd is export or env def updateEnv(env, cmd): arg_idx = 1 unset_next_env_var = False for arg_idx, arg in enumerate(cmd.args[1:]): # Support for the -u flag (unsetting) for env command # e.g., env -u FOO -u BAR will remove both FOO and BAR # from the environment. if arg == '-u': unset_next_env_var = True continue if unset_next_env_var: unset_next_env_var = False if arg in env.env: del env.env[arg] continue # Partition the string into KEY=VALUE. key, eq, val = arg.partition('=') # Stop if there was no equals. if eq == '': break env.env[key] = val cmd.args = cmd.args[arg_idx+1:] def executeBuiltinEcho(cmd, shenv): """Interpret a redirected echo command""" opened_files = [] stdin, stdout, stderr = processRedirects(cmd, subprocess.PIPE, shenv, opened_files) if stdin != subprocess.PIPE or stderr != subprocess.PIPE: raise InternalShellError( cmd, "stdin and stderr redirects not supported for echo") # Some tests have un-redirected echo commands to help debug test failures. # Buffer our output and return it to the caller. is_redirected = True encode = lambda x : x if stdout == subprocess.PIPE: is_redirected = False stdout = StringIO() elif kIsWindows: # Reopen stdout in binary mode to avoid CRLF translation. The versions # of echo we are replacing on Windows all emit plain LF, and the LLVM # tests now depend on this. # When we open as binary, however, this also means that we have to write # 'bytes' objects to stdout instead of 'str' objects. 
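The comment just above is why the builtin ``echo`` reopens ``stdout`` in binary mode before the write that follows. A minimal sketch of the difference, assuming a Windows host (file names are illustrative):

.. code-block:: python

  import io

  # Text mode translates '\n' into the platform separator on write ('\r\n' on Windows).
  with io.open("crlf.txt", "w") as f:
      f.write(u"junk in the echo\n")

  # Binary mode passes bytes through untouched, so only a bare LF (0x0A) is emitted,
  # which is what the replaced echo is expected to produce.
  with io.open("lf.bin", "wb") as f:
      f.write(b"junk in the echo\n")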
encode = lit.util.to_bytes stdout = open(stdout.name, stdout.mode + 'b') opened_files.append((None, None, stdout, None)) # Implement echo flags. We only support -e and -n, and not yet in # combination. We have to ignore unknown flags, because `echo "-D FOO"` # prints the dash. args = cmd.args[1:] interpret_escapes = False write_newline = True while len(args) >= 1 and args[0] in ('-e', '-n'): flag = args[0] args = args[1:] if flag == '-e': interpret_escapes = True elif flag == '-n': write_newline = False def maybeUnescape(arg): if not interpret_escapes: return arg arg = lit.util.to_bytes(arg) codec = 'string_escape' if sys.version_info < (3,0) else 'unicode_escape' return arg.decode(codec) if args: for arg in args[:-1]: stdout.write(encode(maybeUnescape(arg))) stdout.write(encode(' ')) stdout.write(encode(maybeUnescape(args[-1]))) if write_newline: stdout.write(encode('\n')) for (name, mode, f, path) in opened_files: f.close() if not is_redirected: return stdout.getvalue() return "" def executeBuiltinMkdir(cmd, cmd_shenv): """executeBuiltinMkdir - Create new directories.""" args = expand_glob_expressions(cmd.args, cmd_shenv.cwd)[1:] try: opts, args = getopt.gnu_getopt(args, 'p') except getopt.GetoptError as err: raise InternalShellError(cmd, "Unsupported: 'mkdir': %s" % str(err)) parent = False for o, a in opts: if o == "-p": parent = True else: assert False, "unhandled option" if len(args) == 0: raise InternalShellError(cmd, "Error: 'mkdir' is missing an operand") stderr = StringIO() exitCode = 0 for dir in args: if not os.path.isabs(dir): dir = os.path.realpath(os.path.join(cmd_shenv.cwd, dir)) if parent: lit.util.mkdir_p(dir) else: try: os.mkdir(dir) except OSError as err: stderr.write("Error: 'mkdir' command failed, %s\n" % str(err)) exitCode = 1 return ShellCommandResult(cmd, "", stderr.getvalue(), exitCode, False) def executeBuiltinDiff(cmd, cmd_shenv): """executeBuiltinDiff - Compare files line by line.""" args = expand_glob_expressions(cmd.args, cmd_shenv.cwd)[1:] try: opts, args = getopt.gnu_getopt(args, "wbur", ["strip-trailing-cr"]) except getopt.GetoptError as err: raise InternalShellError(cmd, "Unsupported: 'diff': %s" % str(err)) filelines, filepaths, dir_trees = ([] for i in range(3)) ignore_all_space = False ignore_space_change = False unified_diff = False recursive_diff = False strip_trailing_cr = False for o, a in opts: if o == "-w": ignore_all_space = True elif o == "-b": ignore_space_change = True elif o == "-u": unified_diff = True elif o == "-r": recursive_diff = True elif o == "--strip-trailing-cr": strip_trailing_cr = True else: assert False, "unhandled option" if len(args) != 2: raise InternalShellError(cmd, "Error: missing or extra operand") def getDirTree(path, basedir=""): # Tree is a tuple of form (dirname, child_trees). # An empty dir has child_trees = [], a file has child_trees = None. 
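As a concrete illustration of the tuple shape described in the comment above (the directory layout is hypothetical):

.. code-block:: python

  # Hypothetical layout:  root/a.txt  plus an empty directory  root/sub/
  # getDirTree("root") returns a (dirname, sorted child_trees) tuple:
  tree = ("root", [("a.txt", None),   # a file: child_trees is None
                   ("sub", [])])      # an empty directory: child_trees is []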
child_trees = [] for dirname, child_dirs, files in os.walk(os.path.join(basedir, path)): for child_dir in child_dirs: child_trees.append(getDirTree(child_dir, dirname)) for filename in files: child_trees.append((filename, None)) return path, sorted(child_trees) def compareTwoFiles(filepaths): compare_bytes = False encoding = None filelines = [] for file in filepaths: try: with open(file, 'r') as f: filelines.append(f.readlines()) except UnicodeDecodeError: try: with io.open(file, 'r', encoding="utf-8") as f: filelines.append(f.readlines()) encoding = "utf-8" except: compare_bytes = True if compare_bytes: return compareTwoBinaryFiles(filepaths) else: return compareTwoTextFiles(filepaths, encoding) def compareTwoBinaryFiles(filepaths): filelines = [] for file in filepaths: with open(file, 'rb') as f: filelines.append(f.readlines()) exitCode = 0 if hasattr(difflib, 'diff_bytes'): # python 3.5 or newer diffs = difflib.diff_bytes(difflib.unified_diff, filelines[0], filelines[1], filepaths[0].encode(), filepaths[1].encode()) diffs = [diff.decode() for diff in diffs] else: # python 2.7 func = difflib.unified_diff if unified_diff else difflib.context_diff diffs = func(filelines[0], filelines[1], filepaths[0], filepaths[1]) for diff in diffs: stdout.write(diff) exitCode = 1 return exitCode def compareTwoTextFiles(filepaths, encoding): filelines = [] for file in filepaths: if encoding is None: with open(file, 'r') as f: filelines.append(f.readlines()) else: with io.open(file, 'r', encoding=encoding) as f: filelines.append(f.readlines()) exitCode = 0 def compose2(f, g): return lambda x: f(g(x)) f = lambda x: x if strip_trailing_cr: f = compose2(lambda line: line.rstrip('\r'), f) if ignore_all_space or ignore_space_change: ignoreSpace = lambda line, separator: separator.join(line.split()) ignoreAllSpaceOrSpaceChange = functools.partial(ignoreSpace, separator='' if ignore_all_space else ' ') f = compose2(ignoreAllSpaceOrSpaceChange, f) for idx, lines in enumerate(filelines): filelines[idx]= [f(line) for line in lines] func = difflib.unified_diff if unified_diff else difflib.context_diff for diff in func(filelines[0], filelines[1], filepaths[0], filepaths[1]): stdout.write(diff) exitCode = 1 return exitCode def printDirVsFile(dir_path, file_path): if os.path.getsize(file_path): msg = "File %s is a directory while file %s is a regular file" else: msg = "File %s is a directory while file %s is a regular empty file" stdout.write(msg % (dir_path, file_path) + "\n") def printFileVsDir(file_path, dir_path): if os.path.getsize(file_path): msg = "File %s is a regular file while file %s is a directory" else: msg = "File %s is a regular empty file while file %s is a directory" stdout.write(msg % (file_path, dir_path) + "\n") def printOnlyIn(basedir, path, name): stdout.write("Only in %s: %s\n" % (os.path.join(basedir, path), name)) def compareDirTrees(dir_trees, base_paths=["", ""]): # Dirnames of the trees are not checked, it's caller's responsibility, # as top-level dirnames are always different. Base paths are important # for doing os.walk, but we don't put it into tree's dirname in order # to speed up string comparison below and while sorting in getDirTree. left_tree, right_tree = dir_trees[0], dir_trees[1] left_base, right_base = base_paths[0], base_paths[1] # Compare two files or report file vs. directory mismatch. 
if left_tree[1] is None and right_tree[1] is None: return compareTwoFiles([os.path.join(left_base, left_tree[0]), os.path.join(right_base, right_tree[0])]) if left_tree[1] is None and right_tree[1] is not None: printFileVsDir(os.path.join(left_base, left_tree[0]), os.path.join(right_base, right_tree[0])) return 1 if left_tree[1] is not None and right_tree[1] is None: printDirVsFile(os.path.join(left_base, left_tree[0]), os.path.join(right_base, right_tree[0])) return 1 # Compare two directories via recursive use of compareDirTrees. exitCode = 0 left_names = [node[0] for node in left_tree[1]] right_names = [node[0] for node in right_tree[1]] l, r = 0, 0 while l < len(left_names) and r < len(right_names): # Names are sorted in getDirTree, rely on that order. if left_names[l] < right_names[r]: exitCode = 1 printOnlyIn(left_base, left_tree[0], left_names[l]) l += 1 elif left_names[l] > right_names[r]: exitCode = 1 printOnlyIn(right_base, right_tree[0], right_names[r]) r += 1 else: exitCode |= compareDirTrees([left_tree[1][l], right_tree[1][r]], [os.path.join(left_base, left_tree[0]), os.path.join(right_base, right_tree[0])]) l += 1 r += 1 # At least one of the trees has ended. Report names from the other tree. while l < len(left_names): exitCode = 1 printOnlyIn(left_base, left_tree[0], left_names[l]) l += 1 while r < len(right_names): exitCode = 1 printOnlyIn(right_base, right_tree[0], right_names[r]) r += 1 return exitCode stderr = StringIO() stdout = StringIO() exitCode = 0 try: for file in args: if not os.path.isabs(file): file = os.path.realpath(os.path.join(cmd_shenv.cwd, file)) if recursive_diff: dir_trees.append(getDirTree(file)) else: filepaths.append(file) if not recursive_diff: exitCode = compareTwoFiles(filepaths) else: exitCode = compareDirTrees(dir_trees) except IOError as err: stderr.write("Error: 'diff' command failed, %s\n" % str(err)) exitCode = 1 return ShellCommandResult(cmd, stdout.getvalue(), stderr.getvalue(), exitCode, False) def executeBuiltinRm(cmd, cmd_shenv): """executeBuiltinRm - Removes (deletes) files or directories.""" args = expand_glob_expressions(cmd.args, cmd_shenv.cwd)[1:] try: opts, args = getopt.gnu_getopt(args, "frR", ["--recursive"]) except getopt.GetoptError as err: raise InternalShellError(cmd, "Unsupported: 'rm': %s" % str(err)) force = False recursive = False for o, a in opts: if o == "-f": force = True elif o in ("-r", "-R", "--recursive"): recursive = True else: assert False, "unhandled option" if len(args) == 0: raise InternalShellError(cmd, "Error: 'rm' is missing an operand") def on_rm_error(func, path, exc_info): # path contains the path of the file that couldn't be removed # let's just assume that it's read-only and remove it. 
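The ``on_rm_error`` hook continued below clears the read-only bit and retries the delete; the same pattern in isolation (the directory name is illustrative):

.. code-block:: python

  import os, shutil, stat

  def clear_readonly_and_retry(func, path, exc_info):
      # Assume the failure came from a read-only file: add the write bit, then remove it.
      os.chmod(path, stat.S_IMODE(os.stat(path).st_mode) | stat.S_IWRITE)
      os.remove(path)

  shutil.rmtree("scratch-dir", onerror=clear_readonly_and_retry)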
os.chmod(path, stat.S_IMODE( os.stat(path).st_mode) | stat.S_IWRITE) os.remove(path) stderr = StringIO() exitCode = 0 for path in args: if not os.path.isabs(path): path = os.path.realpath(os.path.join(cmd_shenv.cwd, path)) if force and not os.path.exists(path): continue try: if os.path.isdir(path): if not recursive: stderr.write("Error: %s is a directory\n" % path) exitCode = 1 shutil.rmtree(path, onerror = on_rm_error if force else None) else: if force and not os.access(path, os.W_OK): os.chmod(path, stat.S_IMODE(os.stat(path).st_mode) | stat.S_IWRITE) os.remove(path) except OSError as err: stderr.write("Error: 'rm' command failed, %s" % str(err)) exitCode = 1 return ShellCommandResult(cmd, "", stderr.getvalue(), exitCode, False) def processRedirects(cmd, stdin_source, cmd_shenv, opened_files): """Return the standard fds for cmd after applying redirects Returns the three standard file descriptors for the new child process. Each fd may be an open, writable file object or a sentinel value from the subprocess module. """ # Apply the redirections, we use (N,) as a sentinel to indicate stdin, # stdout, stderr for N equal to 0, 1, or 2 respectively. Redirects to or # from a file are represented with a list [file, mode, file-object] # where file-object is initially None. redirects = [(0,), (1,), (2,)] for (op, filename) in cmd.redirects: if op == ('>',2): redirects[2] = [filename, 'w', None] elif op == ('>>',2): redirects[2] = [filename, 'a', None] elif op == ('>&',2) and filename in '012': redirects[2] = redirects[int(filename)] elif op == ('>&',) or op == ('&>',): redirects[1] = redirects[2] = [filename, 'w', None] elif op == ('>',): redirects[1] = [filename, 'w', None] elif op == ('>>',): redirects[1] = [filename, 'a', None] elif op == ('<',): redirects[0] = [filename, 'r', None] else: raise InternalShellError(cmd, "Unsupported redirect: %r" % ((op, filename),)) # Open file descriptors in a second pass. std_fds = [None, None, None] for (index, r) in enumerate(redirects): # Handle the sentinel values for defaults up front. if isinstance(r, tuple): if r == (0,): fd = stdin_source elif r == (1,): if index == 0: raise InternalShellError(cmd, "Unsupported redirect for stdin") elif index == 1: fd = subprocess.PIPE else: fd = subprocess.STDOUT elif r == (2,): if index != 2: raise InternalShellError(cmd, "Unsupported redirect on stdout") fd = subprocess.PIPE else: raise InternalShellError(cmd, "Bad redirect") std_fds[index] = fd continue (filename, mode, fd) = r # Check if we already have an open fd. This can happen if stdout and # stderr go to the same place. if fd is not None: std_fds[index] = fd continue redir_filename = None name = expand_glob(filename, cmd_shenv.cwd) if len(name) != 1: raise InternalShellError(cmd, "Unsupported: glob in " "redirect expanded to multiple files") name = name[0] if kAvoidDevNull and name == kDevNull: fd = tempfile.TemporaryFile(mode=mode) elif kIsWindows and name == '/dev/tty': # Simulate /dev/tty on Windows. # "CON" is a special filename for the console. fd = open("CON", mode) else: # Make sure relative paths are relative to the cwd. redir_filename = os.path.join(cmd_shenv.cwd, name) fd = open(redir_filename, mode) # Workaround a Win32 and/or subprocess bug when appending. # # FIXME: Actually, this is probably an instance of PR6753. if mode == 'a': fd.seek(0, 2) # Mutate the underlying redirect list so that we can redirect stdout # and stderr to the same place without opening the file twice. 
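The single assignment below (``r[2] = fd``) is enough because ``&>`` makes the stdout and stderr slots refer to the same Python list, so one ``open`` is visible through both; a small sketch of that aliasing (the file name is illustrative):

.. code-block:: python

  redirect = ["out.txt", "w", None]          # [filename, mode, file-object]
  redirects = [(0,), redirect, redirect]     # '&> out.txt': slots 1 and 2 share one list

  redirect[2] = open("out.txt", "w")         # open the target once...
  assert redirects[1][2] is redirects[2][2]  # ...both standard streams see the same handle
  redirect[2].close()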
r[2] = fd opened_files.append((filename, mode, fd) + (redir_filename,)) std_fds[index] = fd return std_fds def _executeShCmd(cmd, shenv, results, timeoutHelper): if timeoutHelper.timeoutReached(): # Prevent further recursion if the timeout has been hit # as we should try avoid launching more processes. return None if isinstance(cmd, ShUtil.Seq): if cmd.op == ';': res = _executeShCmd(cmd.lhs, shenv, results, timeoutHelper) return _executeShCmd(cmd.rhs, shenv, results, timeoutHelper) if cmd.op == '&': raise InternalShellError(cmd,"unsupported shell operator: '&'") if cmd.op == '||': res = _executeShCmd(cmd.lhs, shenv, results, timeoutHelper) if res != 0: res = _executeShCmd(cmd.rhs, shenv, results, timeoutHelper) return res if cmd.op == '&&': res = _executeShCmd(cmd.lhs, shenv, results, timeoutHelper) if res is None: return res if res == 0: res = _executeShCmd(cmd.rhs, shenv, results, timeoutHelper) return res raise ValueError('Unknown shell command: %r' % cmd.op) assert isinstance(cmd, ShUtil.Pipeline) # Handle shell builtins first. if cmd.commands[0].args[0] == 'cd': if len(cmd.commands) != 1: raise ValueError("'cd' cannot be part of a pipeline") if len(cmd.commands[0].args) != 2: raise ValueError("'cd' supports only one argument") newdir = cmd.commands[0].args[1] # Update the cwd in the parent environment. if os.path.isabs(newdir): shenv.cwd = newdir else: shenv.cwd = os.path.realpath(os.path.join(shenv.cwd, newdir)) # The cd builtin always succeeds. If the directory does not exist, the # following Popen calls will fail instead. return 0 # Handle "echo" as a builtin if it is not part of a pipeline. This greatly # speeds up tests that construct input files by repeatedly echo-appending to # a file. # FIXME: Standardize on the builtin echo implementation. We can use a # temporary file to sidestep blocking pipe write issues. 
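For the ``cd`` builtin handled above, the internal shell only updates the tracked working directory and hands it to later ``Popen`` calls; the lit process itself never calls ``os.chdir``. A sketch of the idea (the class and directory names are illustrative):

.. code-block:: python

  import os, subprocess

  class Shenv(object):                        # stand-in for ShellEnvironment above
      def __init__(self, cwd, env):
          self.cwd, self.env = cwd, dict(env)

  shenv = Shenv(os.getcwd(), os.environ)
  # 'cd build': only the tracked cwd changes, nothing process-wide.
  shenv.cwd = os.path.realpath(os.path.join(shenv.cwd, "build"))
  # Any later command runs in the tracked directory ('ls' is illustrative).
  subprocess.Popen(["ls"], cwd=shenv.cwd, env=shenv.env)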
if cmd.commands[0].args[0] == 'echo' and len(cmd.commands) == 1: output = executeBuiltinEcho(cmd.commands[0], shenv) results.append(ShellCommandResult(cmd.commands[0], output, "", 0, False)) return 0 if cmd.commands[0].args[0] == 'export': if len(cmd.commands) != 1: raise ValueError("'export' cannot be part of a pipeline") if len(cmd.commands[0].args) != 2: raise ValueError("'export' supports only one argument") updateEnv(shenv, cmd.commands[0]) return 0 if cmd.commands[0].args[0] == 'mkdir': if len(cmd.commands) != 1: raise InternalShellError(cmd.commands[0], "Unsupported: 'mkdir' " "cannot be part of a pipeline") cmdResult = executeBuiltinMkdir(cmd.commands[0], shenv) results.append(cmdResult) return cmdResult.exitCode if cmd.commands[0].args[0] == 'diff': if len(cmd.commands) != 1: raise InternalShellError(cmd.commands[0], "Unsupported: 'diff' " "cannot be part of a pipeline") cmdResult = executeBuiltinDiff(cmd.commands[0], shenv) results.append(cmdResult) return cmdResult.exitCode if cmd.commands[0].args[0] == 'rm': if len(cmd.commands) != 1: raise InternalShellError(cmd.commands[0], "Unsupported: 'rm' " "cannot be part of a pipeline") cmdResult = executeBuiltinRm(cmd.commands[0], shenv) results.append(cmdResult) return cmdResult.exitCode if cmd.commands[0].args[0] == ':': if len(cmd.commands) != 1: raise InternalShellError(cmd.commands[0], "Unsupported: ':' " "cannot be part of a pipeline") results.append(ShellCommandResult(cmd.commands[0], '', '', 0, False)) return 0; procs = [] default_stdin = subprocess.PIPE stderrTempFiles = [] opened_files = [] named_temp_files = [] builtin_commands = set(['cat']) builtin_commands_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "builtin_commands") # To avoid deadlock, we use a single stderr stream for piped # output. This is null until we have seen some output using # stderr. for i,j in enumerate(cmd.commands): # Reference the global environment by default. cmd_shenv = shenv if j.args[0] == 'env': # Create a copy of the global environment and modify it for this one # command. There might be multiple envs in a pipeline: # env FOO=1 llc < %s | env BAR=2 llvm-mc | FileCheck %s cmd_shenv = ShellEnvironment(shenv.cwd, shenv.env) updateEnv(cmd_shenv, j) stdin, stdout, stderr = processRedirects(j, default_stdin, cmd_shenv, opened_files) # If stderr wants to come from stdout, but stdout isn't a pipe, then put # stderr on a pipe and treat it as stdout. if (stderr == subprocess.STDOUT and stdout != subprocess.PIPE): stderr = subprocess.PIPE stderrIsStdout = True else: stderrIsStdout = False # Don't allow stderr on a PIPE except for the last # process, this could deadlock. # # FIXME: This is slow, but so is deadlock. if stderr == subprocess.PIPE and j != cmd.commands[-1]: stderr = tempfile.TemporaryFile(mode='w+b') stderrTempFiles.append((i, stderr)) # Resolve the executable path ourselves. args = list(j.args) executable = None is_builtin_cmd = args[0] in builtin_commands; if not is_builtin_cmd: # For paths relative to cwd, use the cwd of the shell environment. if args[0].startswith('.'): exe_in_cwd = os.path.join(cmd_shenv.cwd, args[0]) if os.path.isfile(exe_in_cwd): executable = exe_in_cwd if not executable: executable = lit.util.which(args[0], cmd_shenv.env['PATH']) if not executable: raise InternalShellError(j, '%r: command not found' % j.args[0]) # Replace uses of /dev/null with temporary files. 
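The replacement announced above swaps each literal ``/dev/null`` inside an argument for a throwaway named temporary file, since Windows has no ``/dev/null``; a minimal sketch of that rewrite:

.. code-block:: python

  import tempfile

  named_temp_files = []
  arg = "-o/dev/null"                         # e.g. an argument seen on a RUN line
  f = tempfile.NamedTemporaryFile(delete=False)
  f.close()                                   # keep only the name; the tool reopens it
  named_temp_files.append(f.name)             # remembered so it can be deleted afterwards
  arg = arg.replace("/dev/null", f.name)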
if kAvoidDevNull: # In Python 2.x, basestring is the base class for all string (including unicode) # In Python 3.x, basestring no longer exist and str is always unicode try: str_type = basestring except NameError: str_type = str for i,arg in enumerate(args): if isinstance(arg, str_type) and kDevNull in arg: f = tempfile.NamedTemporaryFile(delete=False) f.close() named_temp_files.append(f.name) args[i] = arg.replace(kDevNull, f.name) # Expand all glob expressions args = expand_glob_expressions(args, cmd_shenv.cwd) if is_builtin_cmd: - args.insert(0, "python") + args.insert(0, sys.executable) args[1] = os.path.join(builtin_commands_dir ,args[1] + ".py") # On Windows, do our own command line quoting for better compatibility # with some core utility distributions. if kIsWindows: args = quote_windows_command(args) try: procs.append(subprocess.Popen(args, cwd=cmd_shenv.cwd, executable = executable, stdin = stdin, stdout = stdout, stderr = stderr, env = cmd_shenv.env, close_fds = kUseCloseFDs)) # Let the helper know about this process timeoutHelper.addProcess(procs[-1]) except OSError as e: raise InternalShellError(j, 'Could not create process ({}) due to {}'.format(executable, e)) # Immediately close stdin for any process taking stdin from us. if stdin == subprocess.PIPE: procs[-1].stdin.close() procs[-1].stdin = None # Update the current stdin source. if stdout == subprocess.PIPE: default_stdin = procs[-1].stdout elif stderrIsStdout: default_stdin = procs[-1].stderr else: default_stdin = subprocess.PIPE # Explicitly close any redirected files. We need to do this now because we # need to release any handles we may have on the temporary files (important # on Win32, for example). Since we have already spawned the subprocess, our # handles have already been transferred so we do not need them anymore. for (name, mode, f, path) in opened_files: f.close() # FIXME: There is probably still deadlock potential here. Yawn. procData = [None] * len(procs) procData[-1] = procs[-1].communicate() for i in range(len(procs) - 1): if procs[i].stdout is not None: out = procs[i].stdout.read() else: out = '' if procs[i].stderr is not None: err = procs[i].stderr.read() else: err = '' procData[i] = (out,err) # Read stderr out of the temp files. for i,f in stderrTempFiles: f.seek(0, 0) procData[i] = (procData[i][0], f.read()) exitCode = None for i,(out,err) in enumerate(procData): res = procs[i].wait() # Detect Ctrl-C in subprocess. if res == -signal.SIGINT: raise KeyboardInterrupt # Ensure the resulting output is always of string type. try: if out is None: out = '' else: out = to_string(out.decode('utf-8', errors='replace')) except: out = str(out) try: if err is None: err = '' else: err = to_string(err.decode('utf-8', errors='replace')) except: err = str(err) # Gather the redirected output files for failed commands. output_files = [] if res != 0: for (name, mode, f, path) in sorted(opened_files): if path is not None and mode in ('w', 'a'): try: with open(path, 'rb') as f: data = f.read() except: data = None if data is not None: output_files.append((name, path, data)) results.append(ShellCommandResult( cmd.commands[i], out, err, res, timeoutHelper.timeoutReached(), output_files)) if cmd.pipe_err: # Take the last failing exit code from the pipeline. if not exitCode or res != 0: exitCode = res else: exitCode = res # Remove any named temporary files we created. 
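The functional change in this hunk is a few lines up: builtin commands are now launched with ``sys.executable`` rather than a bare ``python``, so they run under the same interpreter as lit itself even when ``python`` is not on ``PATH`` or resolves to a different installation. A sketch of the resulting invocation (the script path and input file are illustrative):

.. code-block:: python

  import os, subprocess, sys

  # Before: ["python", ".../builtin_commands/cat.py", ...] relied on a PATH lookup.
  # After:  the interpreter currently running lit is used explicitly.
  script = os.path.join("builtin_commands", "cat.py")
  subprocess.check_call([sys.executable, script, "input.txt"])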
for f in named_temp_files: try: os.remove(f) except OSError: pass if cmd.negate: exitCode = not exitCode return exitCode def executeScriptInternal(test, litConfig, tmpBase, commands, cwd): cmds = [] for i, ln in enumerate(commands): ln = commands[i] = re.sub(kPdbgRegex, ": '\\1'; ", ln) try: cmds.append(ShUtil.ShParser(ln, litConfig.isWindows, test.config.pipefail).parse()) except: return lit.Test.Result(Test.FAIL, "shell parser error on: %r" % ln) cmd = cmds[0] for c in cmds[1:]: cmd = ShUtil.Seq(cmd, '&&', c) results = [] timeoutInfo = None try: shenv = ShellEnvironment(cwd, test.config.environment) exitCode, timeoutInfo = executeShCmd(cmd, shenv, results, timeout=litConfig.maxIndividualTestTime) except InternalShellError: e = sys.exc_info()[1] exitCode = 127 results.append( ShellCommandResult(e.command, '', e.message, exitCode, False)) out = err = '' for i,result in enumerate(results): # Write the command line run. out += '$ %s\n' % (' '.join('"%s"' % s for s in result.command.args),) # If nothing interesting happened, move on. if litConfig.maxIndividualTestTime == 0 and \ result.exitCode == 0 and \ not result.stdout.strip() and not result.stderr.strip(): continue # Otherwise, something failed or was printed, show it. # Add the command output, if redirected. for (name, path, data) in result.outputFiles: if data.strip(): out += "# redirected output from %r:\n" % (name,) data = to_string(data.decode('utf-8', errors='replace')) if len(data) > 1024: out += data[:1024] + "\n...\n" out += "note: data was truncated\n" else: out += data out += "\n" if result.stdout.strip(): out += '# command output:\n%s\n' % (result.stdout,) if result.stderr.strip(): out += '# command stderr:\n%s\n' % (result.stderr,) if not result.stdout.strip() and not result.stderr.strip(): out += "note: command had no output on stdout or stderr\n" # Show the error conditions: if result.exitCode != 0: # On Windows, a negative exit code indicates a signal, and those are # easier to recognize or look up if we print them in hex. if litConfig.isWindows and result.exitCode < 0: codeStr = hex(int(result.exitCode & 0xFFFFFFFF)).rstrip("L") else: codeStr = str(result.exitCode) out += "error: command failed with exit status: %s\n" % ( codeStr,) if litConfig.maxIndividualTestTime > 0: out += 'error: command reached timeout: %s\n' % ( str(result.timeoutReached),) return out, err, exitCode, timeoutInfo def executeScript(test, litConfig, tmpBase, commands, cwd): bashPath = litConfig.getBashPath() isWin32CMDEXE = (litConfig.isWindows and not bashPath) script = tmpBase + '.script' if isWin32CMDEXE: script += '.bat' # Write script file mode = 'w' if litConfig.isWindows and not isWin32CMDEXE: mode += 'b' # Avoid CRLFs when writing bash scripts. f = open(script, mode) if isWin32CMDEXE: for i, ln in enumerate(commands): commands[i] = re.sub(kPdbgRegex, "echo '\\1' > nul && ", ln) if litConfig.echo_all_commands: f.write('@echo on\n') else: f.write('@echo off\n') f.write('\n@if %ERRORLEVEL% NEQ 0 EXIT\n'.join(commands)) else: for i, ln in enumerate(commands): commands[i] = re.sub(kPdbgRegex, ": '\\1'; ", ln) if test.config.pipefail: f.write('set -o pipefail;') if litConfig.echo_all_commands: f.write('set -x;') f.write('{ ' + '; } &&\n{ '.join(commands) + '; }') f.write('\n') f.close() if isWin32CMDEXE: command = ['cmd','/c', script] else: if bashPath: command = [bashPath, script] else: command = ['/bin/sh', script] if litConfig.useValgrind: # FIXME: Running valgrind on sh is overkill. We probably could just # run on clang with no real loss. 
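``executeScriptInternal`` above (and ``executeScript`` for the external shells) rewrites each ``%dbg(ARG)`` marker with ``kPdbgRegex`` so the RUN line's source position survives as a shell no-op; a small sketch of the internal-shell rewrite:

.. code-block:: python

  import re
  from lit.TestRunner import kPdbgRegex       # the pattern defined near the top of this file

  ln = "%dbg(RUN: at line 4) opt -S %s | FileCheck %s"
  print(re.sub(kPdbgRegex, ": '\\1'; ", ln))
  # -> : 'RUN: at line 4';  opt -S %s | FileCheck %s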
command = litConfig.valgrindArgs + command try: out, err, exitCode = lit.util.executeCommand(command, cwd=cwd, env=test.config.environment, timeout=litConfig.maxIndividualTestTime) return (out, err, exitCode, None) except lit.util.ExecuteCommandTimeoutException as e: return (e.out, e.err, e.exitCode, e.msg) def parseIntegratedTestScriptCommands(source_path, keywords): """ parseIntegratedTestScriptCommands(source_path) -> commands Parse the commands in an integrated test script file into a list of (line_number, command_type, line). """ # This code is carefully written to be dual compatible with Python 2.5+ and # Python 3 without requiring input files to always have valid codings. The # trick we use is to open the file in binary mode and use the regular # expression library to find the commands, with it scanning strings in # Python2 and bytes in Python3. # # Once we find a match, we do require each script line to be decodable to # UTF-8, so we convert the outputs to UTF-8 before returning. This way the # remaining code can work with "strings" agnostic of the executing Python # version. keywords_re = re.compile( to_bytes("(%s)(.*)\n" % ("|".join(re.escape(k) for k in keywords),))) f = open(source_path, 'rb') try: # Read the entire file contents. data = f.read() # Ensure the data ends with a newline. if not data.endswith(to_bytes('\n')): data = data + to_bytes('\n') # Iterate over the matches. line_number = 1 last_match_position = 0 for match in keywords_re.finditer(data): # Compute the updated line number by counting the intervening # newlines. match_position = match.start() line_number += data.count(to_bytes('\n'), last_match_position, match_position) last_match_position = match_position # Convert the keyword and line to UTF-8 strings and yield the # command. Note that we take care to return regular strings in # Python 2, to avoid other code having to differentiate between the # str and unicode types. # # Opening the file in binary mode prevented Windows \r newline # characters from being converted to Unix \n newlines, so manually # strip those from the yielded lines. keyword,ln = match.groups() yield (line_number, to_string(keyword.decode('utf-8')), to_string(ln.decode('utf-8').rstrip('\r'))) finally: f.close() def getTempPaths(test): """Get the temporary location, this is always relative to the test suite root, not test source root.""" execpath = test.getExecPath() execdir,execbase = os.path.split(execpath) tmpDir = os.path.join(execdir, 'Output') tmpBase = os.path.join(tmpDir, execbase) return tmpDir, tmpBase def colonNormalizePath(path): if kIsWindows: return re.sub(r'^(.):', r'\1', path.replace('\\', '/')) else: assert path[0] == '/' return path[1:] def getDefaultSubstitutions(test, tmpDir, tmpBase, normalize_slashes=False): sourcepath = test.getSourcePath() sourcedir = os.path.dirname(sourcepath) # Normalize slashes, if requested. if normalize_slashes: sourcepath = sourcepath.replace('\\', '/') sourcedir = sourcedir.replace('\\', '/') tmpDir = tmpDir.replace('\\', '/') tmpBase = tmpBase.replace('\\', '/') # We use #_MARKER_# to hide %% while we do the other substitutions. substitutions = [] substitutions.extend([('%%', '#_MARKER_#')]) substitutions.extend(test.config.substitutions) tmpName = tmpBase + '.tmp' baseName = os.path.basename(tmpBase) substitutions.extend([('%s', sourcepath), ('%S', sourcedir), ('%p', sourcedir), ('%{pathsep}', os.pathsep), ('%t', tmpName), ('%basename_t', baseName), ('%T', tmpDir), ('#_MARKER_#', '%')]) # "%/[STpst]" should be normalized. 
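The substitution list assembled above hides ``%%`` behind ``#_MARKER_#`` first and restores it last, so a literal percent sign survives while ``%s``, ``%t`` and friends are expanded; each pair is later applied with ``re.sub`` over every script line. A reduced sketch with a made-up path:

.. code-block:: python

  import re

  substitutions = [('%%', '#_MARKER_#'),          # hide literal %% first
                   ('%s', '/path/to/test.ll'),    # illustrative source path
                   ('#_MARKER_#', '%')]           # restore the hidden %
  line = 'opt -S %s | FileCheck %s  # 100%% of the time'
  for a, b in substitutions:
      line = re.sub(a, b, line)
  print(line)
  # -> opt -S /path/to/test.ll | FileCheck /path/to/test.ll  # 100% of the time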
substitutions.extend([ ('%/s', sourcepath.replace('\\', '/')), ('%/S', sourcedir.replace('\\', '/')), ('%/p', sourcedir.replace('\\', '/')), ('%/t', tmpBase.replace('\\', '/') + '.tmp'), ('%/T', tmpDir.replace('\\', '/')), ]) # "%:[STpst]" are normalized paths without colons and without a leading # slash. substitutions.extend([ ('%:s', colonNormalizePath(sourcepath)), ('%:S', colonNormalizePath(sourcedir)), ('%:p', colonNormalizePath(sourcedir)), ('%:t', colonNormalizePath(tmpBase + '.tmp')), ('%:T', colonNormalizePath(tmpDir)), ]) return substitutions def applySubstitutions(script, substitutions): """Apply substitutions to the script. Allow full regular expression syntax. Replace each matching occurrence of regular expression pattern a with substitution b in line ln.""" def processLine(ln): # Apply substitutions for a,b in substitutions: if kIsWindows: b = b.replace("\\","\\\\") ln = re.sub(a, b, ln) # Strip the trailing newline and any extra whitespace. return ln.strip() # Note Python 3 map() gives an iterator rather than a list so explicitly # convert to list before returning. return list(map(processLine, script)) class ParserKind(object): """ An enumeration representing the style of an integrated test keyword or command. TAG: A keyword taking no value. Ex 'END.' COMMAND: A keyword taking a list of shell commands. Ex 'RUN:' LIST: A keyword taking a comma-separated list of values. BOOLEAN_EXPR: A keyword taking a comma-separated list of boolean expressions. Ex 'XFAIL:' CUSTOM: A keyword with custom parsing semantics. """ TAG = 0 COMMAND = 1 LIST = 2 BOOLEAN_EXPR = 3 CUSTOM = 4 @staticmethod def allowedKeywordSuffixes(value): return { ParserKind.TAG: ['.'], ParserKind.COMMAND: [':'], ParserKind.LIST: [':'], ParserKind.BOOLEAN_EXPR: [':'], ParserKind.CUSTOM: [':', '.'] } [value] @staticmethod def str(value): return { ParserKind.TAG: 'TAG', ParserKind.COMMAND: 'COMMAND', ParserKind.LIST: 'LIST', ParserKind.BOOLEAN_EXPR: 'BOOLEAN_EXPR', ParserKind.CUSTOM: 'CUSTOM' } [value] class IntegratedTestKeywordParser(object): """A parser for LLVM/Clang style integrated test scripts. keyword: The keyword to parse for. It must end in either '.' or ':'. kind: An value of ParserKind. parser: A custom parser. This value may only be specified with ParserKind.CUSTOM. 
""" def __init__(self, keyword, kind, parser=None, initial_value=None): allowedSuffixes = ParserKind.allowedKeywordSuffixes(kind) if len(keyword) == 0 or keyword[-1] not in allowedSuffixes: if len(allowedSuffixes) == 1: raise ValueError("Keyword '%s' of kind '%s' must end in '%s'" % (keyword, ParserKind.str(kind), allowedSuffixes[0])) else: raise ValueError("Keyword '%s' of kind '%s' must end in " " one of '%s'" % (keyword, ParserKind.str(kind), ' '.join(allowedSuffixes))) if parser is not None and kind != ParserKind.CUSTOM: raise ValueError("custom parsers can only be specified with " "ParserKind.CUSTOM") self.keyword = keyword self.kind = kind self.parsed_lines = [] self.value = initial_value self.parser = parser if kind == ParserKind.COMMAND: self.parser = lambda line_number, line, output: \ self._handleCommand(line_number, line, output, self.keyword) elif kind == ParserKind.LIST: self.parser = self._handleList elif kind == ParserKind.BOOLEAN_EXPR: self.parser = self._handleBooleanExpr elif kind == ParserKind.TAG: self.parser = self._handleTag elif kind == ParserKind.CUSTOM: if parser is None: raise ValueError("ParserKind.CUSTOM requires a custom parser") self.parser = parser else: raise ValueError("Unknown kind '%s'" % kind) def parseLine(self, line_number, line): try: self.parsed_lines += [(line_number, line)] self.value = self.parser(line_number, line, self.value) except ValueError as e: raise ValueError(str(e) + ("\nin %s directive on test line %d" % (self.keyword, line_number))) def getValue(self): return self.value @staticmethod def _handleTag(line_number, line, output): """A helper for parsing TAG type keywords""" return (not line.strip() or output) @staticmethod def _handleCommand(line_number, line, output, keyword): """A helper for parsing COMMAND type keywords""" # Trim trailing whitespace. line = line.rstrip() # Substitute line number expressions line = re.sub('%\(line\)', str(line_number), line) def replace_line_number(match): if match.group(1) == '+': return str(line_number + int(match.group(2))) if match.group(1) == '-': return str(line_number - int(match.group(2))) line = re.sub('%\(line *([\+-]) *(\d+)\)', replace_line_number, line) # Collapse lines with trailing '\\'. if output and output[-1][-1] == '\\': output[-1] = output[-1][:-1] + line else: if output is None: output = [] pdbg = "%dbg({keyword} at line {line_number})".format( keyword=keyword, line_number=line_number) assert re.match(kPdbgRegex + "$", pdbg), \ "kPdbgRegex expected to match actual %dbg usage" line = "{pdbg} {real_command}".format( pdbg=pdbg, real_command=line) output.append(line) return output @staticmethod def _handleList(line_number, line, output): """A parser for LIST type keywords""" if output is None: output = [] output.extend([s.strip() for s in line.split(',')]) return output @staticmethod def _handleBooleanExpr(line_number, line, output): """A parser for BOOLEAN_EXPR type keywords""" if output is None: output = [] output.extend([s.strip() for s in line.split(',')]) # Evaluate each expression to verify syntax. # We don't want any results, just the raised ValueError. for s in output: if s != '*': BooleanExpression.evaluate(s, []) return output @staticmethod def _handleRequiresAny(line_number, line, output): """A custom parser to transform REQUIRES-ANY: into REQUIRES:""" # Extract the conditions specified in REQUIRES-ANY: as written. conditions = [] IntegratedTestKeywordParser._handleList(line_number, line, conditions) # Output a `REQUIRES: a || b || c` expression in its place. 
        expression = ' || '.join(conditions)
        IntegratedTestKeywordParser._handleBooleanExpr(line_number,
                                                       expression, output)
        return output


def parseIntegratedTestScript(test, additional_parsers=[],
                              require_script=True):
    """parseIntegratedTestScript - Scan an LLVM/Clang style integrated test
    script and extract the lines to 'RUN', as well as 'XFAIL', 'REQUIRES', and
    'UNSUPPORTED' information.

    If additional parsers are specified, then the test is also scanned for the
    keywords they specify and all matches are passed to the custom parser.

    If 'require_script' is False, an empty script may be returned. This can be
    used for test formats where the actual script is optional or ignored.
    """

    # Install the built-in keyword parsers.
    script = []
    builtin_parsers = [
        IntegratedTestKeywordParser('RUN:', ParserKind.COMMAND,
                                    initial_value=script),
        IntegratedTestKeywordParser('XFAIL:', ParserKind.BOOLEAN_EXPR,
                                    initial_value=test.xfails),
        IntegratedTestKeywordParser('REQUIRES:', ParserKind.BOOLEAN_EXPR,
                                    initial_value=test.requires),
        IntegratedTestKeywordParser('REQUIRES-ANY:', ParserKind.CUSTOM,
                                    IntegratedTestKeywordParser._handleRequiresAny,
                                    initial_value=test.requires),
        IntegratedTestKeywordParser('UNSUPPORTED:', ParserKind.BOOLEAN_EXPR,
                                    initial_value=test.unsupported),
        IntegratedTestKeywordParser('END.', ParserKind.TAG)
    ]
    keyword_parsers = {p.keyword: p for p in builtin_parsers}

    # Install user-defined additional parsers.
    for parser in additional_parsers:
        if not isinstance(parser, IntegratedTestKeywordParser):
            raise ValueError('additional parser must be an instance of '
                             'IntegratedTestKeywordParser')
        if parser.keyword in keyword_parsers:
            raise ValueError("Parser for keyword '%s' already exists"
                             % parser.keyword)
        keyword_parsers[parser.keyword] = parser

    # Collect the test lines from the script.
    sourcepath = test.getSourcePath()
    for line_number, command_type, ln in \
            parseIntegratedTestScriptCommands(sourcepath,
                                              keyword_parsers.keys()):
        parser = keyword_parsers[command_type]
        parser.parseLine(line_number, ln)
        if command_type == 'END.' and parser.getValue() is True:
            break

    # Verify the script contains a run line.
    if require_script and not script:
        return lit.Test.Result(Test.UNRESOLVED, "Test has no run line!")

    # Check for unterminated run lines.
    if script and script[-1][-1] == '\\':
        return lit.Test.Result(Test.UNRESOLVED,
                               "Test has unterminated run lines (with '\\')")

    # Enforce REQUIRES:
    missing_required_features = test.getMissingRequiredFeatures()
    if missing_required_features:
        msg = ', '.join(missing_required_features)
        return lit.Test.Result(Test.UNSUPPORTED,
                               "Test requires the following unavailable "
                               "features: %s" % msg)

    # Enforce UNSUPPORTED:
    unsupported_features = test.getUnsupportedFeatures()
    if unsupported_features:
        msg = ', '.join(unsupported_features)
        return lit.Test.Result(
            Test.UNSUPPORTED,
            "Test does not support the following features "
            "and/or targets: %s" % msg)

    # Enforce limit_to_features.
    if not test.isWithinFeatureLimits():
        msg = ', '.join(test.config.limit_to_features)
        return lit.Test.Result(Test.UNSUPPORTED,
                               "Test does not require any of the features "
                               "specified in limit_to_features: %s" % msg)

    return script


def _runShTest(test, litConfig, useExternalSh, script, tmpBase):
    # Create the output directory if it does not already exist.
    lit.util.mkdir_p(os.path.dirname(tmpBase))

    execdir = os.path.dirname(test.getExecPath())
    if useExternalSh:
        res = executeScript(test, litConfig, tmpBase, script, execdir)
    else:
        res = executeScriptInternal(test, litConfig, tmpBase, script, execdir)
    if isinstance(res, lit.Test.Result):
        return res

    out, err, exitCode, timeoutInfo = res
    if exitCode == 0:
        status = Test.PASS
    else:
        if timeoutInfo is None:
            status = Test.FAIL
        else:
            status = Test.TIMEOUT

    # Form the output log.
    output = """Script:\n--\n%s\n--\nExit Code: %d\n""" % (
        '\n'.join(script), exitCode)

    if timeoutInfo is not None:
        output += """Timeout: %s\n""" % (timeoutInfo,)
    output += "\n"

    # Append the outputs, if present.
    if out:
        output += """Command Output (stdout):\n--\n%s\n--\n""" % (out,)
    if err:
        output += """Command Output (stderr):\n--\n%s\n--\n""" % (err,)
    return lit.Test.Result(status, output)


def executeShTest(test, litConfig, useExternalSh,
                  extra_substitutions=[]):
    if test.config.unsupported:
        return lit.Test.Result(Test.UNSUPPORTED, 'Test is unsupported')

    script = parseIntegratedTestScript(test)
    if isinstance(script, lit.Test.Result):
        return script
    if litConfig.noExecute:
        return lit.Test.Result(Test.PASS)

    tmpDir, tmpBase = getTempPaths(test)
    substitutions = list(extra_substitutions)
    substitutions += getDefaultSubstitutions(test, tmpDir, tmpBase,
                                             normalize_slashes=useExternalSh)
    script = applySubstitutions(script, substitutions)

    # Re-run failed tests up to test_retry_attempts times.
    attempts = 1
    if hasattr(test.config, 'test_retry_attempts'):
        attempts += test.config.test_retry_attempts
    for i in range(attempts):
        res = _runShTest(test, litConfig, useExternalSh, script, tmpBase)
        if res.code != Test.FAIL:
            break
    # If we had to run the test more than once, count it as a flaky pass. These
    # will be printed separately in the test summary.
    if i > 0 and res.code == Test.PASS:
        res.code = Test.FLAKYPASS
    return res
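

# Illustrative sketch (not part of lit's API): one way a downstream test
# format could use 'additional_parsers' to collect a custom keyword alongside
# the built-in ones. The keyword 'MYTOOL-OPTS:' and the helper name
# 'parseScriptWithToolOpts' are hypothetical, chosen only for this example;
# everything else refers to definitions earlier in this file.
def parseScriptWithToolOpts(test):
    # A LIST keyword accumulates comma-separated values from every matching
    # 'MYTOOL-OPTS:' line in the test file.
    opts_parser = IntegratedTestKeywordParser('MYTOOL-OPTS:', ParserKind.LIST)
    script = parseIntegratedTestScript(test, additional_parsers=[opts_parser])
    if isinstance(script, lit.Test.Result):
        # Parsing already produced a result (e.g. UNRESOLVED or UNSUPPORTED);
        # propagate it unchanged.
        return script
    # getValue() is None if the keyword never appeared in the test.
    return script, (opts_parser.getValue() or [])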