Index: vendor/llvm/dist/docs/ReleaseNotes.rst
===================================================================
--- vendor/llvm/dist/docs/ReleaseNotes.rst	(revision 314158)
+++ vendor/llvm/dist/docs/ReleaseNotes.rst	(revision 314159)
@@ -1,322 +1,329 @@
========================
LLVM 4.0.0 Release Notes
========================

.. contents::
    :local:

.. warning::
   These are in-progress notes for the upcoming LLVM 4.0.0 release. You may
   prefer the `LLVM 3.9 Release Notes `_.

Introduction
============

This document contains the release notes for the LLVM Compiler
Infrastructure, release 4.0.0. Here we describe the status of LLVM, including
major improvements from the previous release, improvements in various
subprojects of LLVM, and some of the current users of the code. All LLVM
releases may be downloaded from the `LLVM releases web site `_.

For more information about LLVM, including information about the latest
release, please check out the `main LLVM web site `_. If you have questions
or comments, the `LLVM Developer's Mailing List `_ is a good place to send
them.

Non-comprehensive list of changes in this release
=================================================

* The C API functions LLVMAddFunctionAttr, LLVMGetFunctionAttr,
  LLVMRemoveFunctionAttr, LLVMAddAttribute, LLVMRemoveAttribute,
  LLVMGetAttribute, LLVMAddInstrAttribute and LLVMRemoveInstrAttribute
  have been removed.

* The C API enum LLVMAttribute has been deleted.

.. NOTE
   For small 1-3 sentence descriptions, just add an entry at the end of
   this list. If your description won't fit comfortably in one bullet
   point (e.g. maybe you would like to give an example of the
   functionality, or simply have a lot to talk about), see the `NOTE` below
   for adding a new subsection.

* The definition and uses of LLVM_ATTRIBUTE_UNUSED_RESULT in the LLVM source
  were replaced with LLVM_NODISCARD, which matches the C++17 [[nodiscard]]
  semantics rather than gcc's __attribute__((warn_unused_result)).

* The minimum compiler version needed to build has been raised to GCC 4.8
  and VS 2015.

* The Timer related APIs now expect a Name and Description. When upgrading
  code, the previously used names should become descriptions, and a short
  name in the style of a programming language identifier should be added.

* LLVM now handles invariant.group across different basic blocks, which makes
  it possible to devirtualize virtual calls inside loops.

* The aggressive dead code elimination phase ("adce") now removes branches
  which do not affect program behavior. Loops are retained by default since
  they may be infinite, but these can also be removed with the LLVM option
  -adce-remove-loops when the loop body otherwise has no live operations.

+* The GVNHoist pass is now enabled by default. The new pass, based on Global
+  Value Numbering, detects similar computations in branch code and replaces
+  multiple instances of the same computation with a unique expression. The
+  transform benefits code size and generates better schedules. GVNHoist is
+  more aggressive at -Os and -Oz, hoisting more expressions at the expense
+  of potential execution-time degradation (see the illustration following
+  this list).
+
* The llvm-cov tool can now export coverage data as JSON. Its html output
  mode has also improved.

* ... next change ...

.. NOTE
   If you would like to document a larger change, then you can add a
   subsection about it right here. You can copy the following boilerplate
   and un-indent it (the indentation causes it to be inside this comment).

   Special New Feature
   -------------------

   Makes programs 10x faster by doing Special New Thing.
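As a minimal, hypothetical illustration of the redundancy that the GVNHoist
change above removes (this snippet is not from the LLVM sources; the pass
actually operates on LLVM IR, not C++), both branches below compute
``a * b``, and the computation is hoisted to the common dominator:

.. code-block:: c++

  // Before GVNHoist: a * b is computed in both branches.
  int before(bool c, int a, int b) {
    if (c)
      return a * b + 1;
    return a * b - 1;
  }

  // Conceptually after hoisting: the multiply is computed once.
  int after(bool c, int a, int b) {
    int t = a * b; // hoisted to the common dominator
    return c ? t + 1 : t - 1;
  }
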
Improvements to ThinLTO (-flto=thin)
------------------------------------
* Integration with profile data (PGO). When available, profile data
  enables more accurate function importing decisions, as well as
  cross-module indirect call promotion.
* Significant build-time and binary-size improvements when compiling with
  debug info (-g).

LLVM Coroutines
---------------

Experimental support for :doc:`Coroutines` was added, which can be enabled
with ``-enable-coroutines`` in the ``opt`` command-line tool or using the
``addCoroutinePassesToExtensionPoints`` API when building the optimization
pipeline.

For more information on LLVM Coroutines and the LLVM implementation, see
`2016 LLVM Developers’ Meeting talk on LLVM Coroutines `_.

Regcall and Vectorcall Calling Conventions
--------------------------------------------------

Support was added for the __regcall calling convention. Existing
__vectorcall calling convention support was extended to include correct
handling of HVAs.

The __vectorcall calling convention was introduced by Microsoft to
enhance register usage when passing parameters. For more information,
please read `__vectorcall documentation `_.

The __regcall calling convention was introduced by Intel to
optimize parameter transfer on function call. This calling convention
ensures that as many values as possible are passed or returned in
registers. For more information, please read `__regcall documentation `_.

Code Generation Testing
-----------------------

Passes that work on the machine instruction representation can be tested with
the .mir serialization format. ``llc`` supports the ``-run-pass``,
``-stop-after``, ``-stop-before``, ``-start-after``, and ``-start-before``
options to run a single pass of the code generation pipeline, or to stop or
start the code generation pipeline at a given point.

Additional information can be found in the :doc:`MIRLangRef`. The format is
used by the tests ending in ``.mir`` in the ``test/CodeGen`` directory.

This feature has been available since 2015; it has seen increasing use
lately and had not previously been mentioned in the release notes.

Intrusive list API overhaul
---------------------------

The intrusive list infrastructure was substantially rewritten over the last
couple of releases, primarily to excise undefined behaviour. The biggest
changes landed in this release.

* ``simple_ilist<T>`` is a lower-level intrusive list that never takes
  ownership of its nodes. New intrusive-list clients should consider using it
  instead of ``ilist<T>``.

* ``ilist_tag`` allows a single data type to be inserted into two parallel
  intrusive lists. A type can inherit twice from ``ilist_node``, first using
  ``ilist_node<T, ilist_tag<A>>`` (enabling insertion into
  ``simple_ilist<T, ilist_tag<A>>``) and second using
  ``ilist_node<T, ilist_tag<B>>`` (enabling insertion into
  ``simple_ilist<T, ilist_tag<B>>``), where ``A`` and ``B`` are arbitrary
  types. (A sketch follows this list.)

* ``ilist_sentinel_tracking<bool>`` controls whether an iterator knows
  whether it's pointing at the sentinel (``end()``). By default, sentinel
  tracking is on when ABI-breaking checks are enabled, and off otherwise;
  this is used for an assertion when dereferencing ``end()`` (this assertion
  triggered often in practice, and many backend bugs were fixed). Explicitly
  turning on sentinel tracking also enables ``iterator::isEnd()``. This is
  used by ``MachineInstrBundleIterator`` to iterate over bundles.

* ``ilist<T>`` is built on top of ``simple_ilist<T>``, and supports the same
  configuration options. As before (and unlike ``simple_ilist<T>``),
  ``ilist<T>`` takes ownership of its nodes. However, it no longer supports
  *allocating* nodes, and is now equivalent to ``iplist<T>``. ``iplist<T>``
  will likely be removed in the future.

* ``ilist<T>`` now always uses ``ilist_traits<T>``. Instead of passing a
  custom traits class in via a template parameter, clients that want to
  customize the traits should specialize ``ilist_traits<T>``. Clients that
  want to avoid ownership can specialize ``ilist_alloc_traits<T>`` to inherit
  from ``ilist_noalloc_traits<T>`` (or to do something funky); clients that
  need callbacks can specialize ``ilist_callback_traits<T>`` directly.

* The underlying data structure is now a simple recursive linked list. The
  sentinel node contains only a "next" (``begin()``) and "prev"
  (``rbegin()``) pointer and is stored in the same allocation as
  ``simple_ilist<T>``. Previously, it was malloc-allocated on-demand by
  default, although the now-defunct ``ilist_sentinel_traits<T>`` was
  sometimes specialized to avoid this.

* The ``reverse_iterator`` class no longer uses ``std::reverse_iterator``.
  Instead, it now has a handle to the same node that it dereferences to.
  Reverse iterators now have the same iterator invalidation semantics as
  forward iterators.

* ``iterator`` and ``reverse_iterator`` have explicit conversion constructors
  that match ``std::reverse_iterator``'s off-by-one semantics, so that
  reversing the end points of an iterator range results in the same range
  (albeit in reverse). I.e., ``reverse_iterator(begin())`` equals ``rend()``.

* ``iterator::getReverse()`` and ``reverse_iterator::getReverse()`` return an
  iterator that dereferences to the *same* node. I.e.,
  ``begin().getReverse()`` equals ``--rend()``.

* ``ilist_node<T>::getIterator()`` and ``ilist_node<T>::getReverseIterator()``
  return the forward and reverse iterators that dereference to the current
  node. I.e., ``begin()->getIterator()`` equals ``begin()`` and
  ``rbegin()->getReverseIterator()`` equals ``rbegin()``.

* ``iterator`` now stores an ``ilist_node_base*`` instead of a ``T*``. The
  implicit conversions between ``ilist<T>::iterator`` and ``T*`` have been
  removed. Clients may use ``N->getIterator()`` (if not ``nullptr``) or
  ``&*I`` (if not ``end()``); alternatively, clients may refactor to use
  references for known-good nodes.
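As a minimal sketch of the ``ilist_tag`` mechanism described above (the names
``A``, ``B``, and ``Node`` are illustrative, not from the LLVM sources), one
node type can live in two parallel lists at once:

.. code-block:: c++

  #include "llvm/ADT/ilist_node.h"
  #include "llvm/ADT/simple_ilist.h"

  struct A {}; // arbitrary tag types
  struct B {};

  // Inherit from ilist_node once per tag.
  struct Node : llvm::ilist_node<Node, llvm::ilist_tag<A>>,
                llvm::ilist_node<Node, llvm::ilist_tag<B>> {
    int Value = 0;
  };

  void example(Node &N) {
    // Neither list takes ownership; the same node sits in both.
    llvm::simple_ilist<Node, llvm::ilist_tag<A>> ListA;
    llvm::simple_ilist<Node, llvm::ilist_tag<B>> ListB;
    ListA.push_back(N);
    ListB.push_back(N);
  }
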
Changes to the LLVM IR
----------------------

Changes to the ARM Targets
--------------------------

**During this release the AArch64 target has:**

* Gained support for ILP32 relocations.
* Gained support for XRay.
* Made even more progress on GlobalISel. There is still some work left before
  it is production-ready though.
* Refined the support for Qualcomm's Falkor and Samsung's Exynos CPUs.
* Learned a few new tricks for lowering multiplications by constants, folding
  spilled/refilled copies, etc.

**During this release the ARM target has:**

* Gained support for ROPI (read-only position independence) and RWPI
  (read-write position independence), which can be used to remove the need
  for a dynamic linker.
* Gained support for execute-only code, which is placed in pages without read
  permissions.
* Gained a machine scheduler for Cortex-R52.
* Gained support for XRay.
* Gained Thumb1 implementations for several compiler-rt builtins. It also
  has some support for building the builtins for HF targets.
* Started using the generic bitreverse intrinsic instead of rbit.
* Gained very basic support for GlobalISel.

A lot of work has also been done in LLD for ARM, which now supports more
relocations and TLS.

Changes to the MIPS Target
--------------------------

During this release ...
Changes to the PowerPC Target
-----------------------------

During this release ...

Changes to the X86 Target
-------------------------

During this release ...

Changes to the AMDGPU Target
----------------------------

During this release ...

Changes to the AVR Target
-------------------------

This marks the first release where the AVR backend has been completely merged
from a fork into LLVM trunk. The backend is still marked experimental, but
is generally quite usable. All downstream development has halted on
`GitHub `_, and changes now go directly into LLVM trunk.

* Instruction selector and pseudo instruction expansion pass landed
* ``read_register`` and ``write_register`` intrinsics are now supported
* Support stack stores greater than 63 bytes from the bottom of the stack
* A number of assertion errors have been fixed
* Support stores to ``undef`` locations
* Very basic support for the target has been added to clang
* Small optimizations to some 16-bit boolean expressions

Most of the work behind the scenes has been on correctness of generated
assembly, and also fixing some assertions we would hit on some well-formed
inputs.

Changes to the OCaml bindings
-----------------------------

* The attribute API was completely overhauled, following the changes to the
  C API.

External Open Source Projects Using LLVM 4.0.0
==============================================

* A project...

LDC - the LLVM-based D compiler
-------------------------------

`D `_ is a language with C-like syntax and static typing. It pragmatically
combines efficiency, control, and modeling power, with safety and programmer
productivity. D supports powerful concepts like Compile-Time Function
Execution (CTFE) and Template Meta-Programming, provides an innovative
approach to concurrency and offers many classical paradigms.

`LDC `_ uses the frontend from the reference compiler combined with LLVM as
backend to produce efficient native code. LDC targets x86/x86_64 systems like
Linux, OS X, FreeBSD and Windows and also Linux on ARM and PowerPC (32/64
bit). Ports to other architectures like AArch64 and MIPS64 are underway.

Additional Information
======================

A wide variety of additional information is available on the `LLVM web page
`_, in particular in the `documentation `_ section. The web page also
contains versions of the API documentation which are up-to-date with the
Subversion version of the source code. You can access versions of these
documents specific to this release by going into the ``llvm/docs/`` directory
in the LLVM tree.

If you have any questions or comments about LLVM, please feel free to contact
us via the `mailing lists `_.
Index: vendor/llvm/dist/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
===================================================================
--- vendor/llvm/dist/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp	(revision 314158)
+++ vendor/llvm/dist/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp	(revision 314159)
@@ -1,826 +1,823 @@
#include "DwarfCompileUnit.h"
#include "DwarfExpression.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Instruction.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCStreamer.h"
#include "llvm/Target/TargetFrameLowering.h"
#include "llvm/Target/TargetLoweringObjectFile.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetRegisterInfo.h"
#include "llvm/Target/TargetSubtargetInfo.h"

namespace llvm {

DwarfCompileUnit::DwarfCompileUnit(unsigned UID, const DICompileUnit *Node,
                                   AsmPrinter *A, DwarfDebug *DW,
                                   DwarfFile *DWU)
    : DwarfUnit(dwarf::DW_TAG_compile_unit, Node, A, DW, DWU), UniqueID(UID),
      Skeleton(nullptr), BaseAddress(nullptr) {
  insertDIE(Node, &getUnitDie());
  MacroLabelBegin = Asm->createTempSymbol("cu_macro_begin");
}

/// addLabelAddress - Add a dwarf label attribute data and value using
/// DW_FORM_addr or DW_FORM_GNU_addr_index.
///
void DwarfCompileUnit::addLabelAddress(DIE &Die, dwarf::Attribute Attribute,
                                       const MCSymbol *Label) {
  // Don't use the address pool in non-fission or in the skeleton unit itself.
  // FIXME: Once GDB supports this, it's probably worthwhile using the address
  // pool from the skeleton - maybe even in non-fission (possibly fewer
  // relocations by sharing them in the pool, but we have other ideas about
  // how to reduce the number of relocations as well/instead).
  if (!DD->useSplitDwarf() || !Skeleton)
    return addLocalLabelAddress(Die, Attribute, Label);

  if (Label)
    DD->addArangeLabel(SymbolCU(this, Label));

  unsigned idx = DD->getAddressPool().getIndex(Label);
  Die.addValue(DIEValueAllocator, Attribute, dwarf::DW_FORM_GNU_addr_index,
               DIEInteger(idx));
}

void DwarfCompileUnit::addLocalLabelAddress(DIE &Die,
                                            dwarf::Attribute Attribute,
                                            const MCSymbol *Label) {
  if (Label)
    DD->addArangeLabel(SymbolCU(this, Label));

  if (Label)
    Die.addValue(DIEValueAllocator, Attribute, dwarf::DW_FORM_addr,
                 DIELabel(Label));
  else
    Die.addValue(DIEValueAllocator, Attribute, dwarf::DW_FORM_addr,
                 DIEInteger(0));
}

unsigned DwarfCompileUnit::getOrCreateSourceID(StringRef FileName,
                                               StringRef DirName) {
  // If we print assembly, we can't separate .file entries according to
  // compile units. Thus all files will belong to the default compile unit.

  // FIXME: add a better feature test than hasRawTextSupport. Even better,
  // extend .file to support this.
  return Asm->OutStreamer->EmitDwarfFileDirective(
      0, DirName, FileName,
      Asm->OutStreamer->hasRawTextSupport() ? 0 : getUniqueID());
}

DIE *DwarfCompileUnit::getOrCreateGlobalVariableDIE(
    const DIGlobalVariable *GV, ArrayRef<GlobalExpr> GlobalExprs) {
  // Check for pre-existence.
  if (DIE *Die = getDIE(GV))
    return Die;

  assert(GV);

  auto *GVContext = GV->getScope();
  auto *GTy = DD->resolve(GV->getType());

  // Construct the context before querying for the existence of the DIE in
  // case such construction creates the DIE.
  DIE *ContextDIE = getOrCreateContextDIE(GVContext);

  // Add to map.
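  // createAndAddDIE() parents the new DIE under ContextDIE and, because a
  // debug node is passed, also registers it in this unit's node-to-DIE map,
  // so the pre-existence check at the top of this function will find it.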
  DIE *VariableDIE = &createAndAddDIE(GV->getTag(), *ContextDIE, GV);
  DIScope *DeclContext;
  if (auto *SDMDecl = GV->getStaticDataMemberDeclaration()) {
    DeclContext = resolve(SDMDecl->getScope());
    assert(SDMDecl->isStaticMember() && "Expected static member decl");
    assert(GV->isDefinition());
    // We need the declaration DIE that is in the static member's class.
    DIE *VariableSpecDIE = getOrCreateStaticMemberDIE(SDMDecl);
    addDIEEntry(*VariableDIE, dwarf::DW_AT_specification, *VariableSpecDIE);
    // If the global variable's type is different from the one in the class
    // member type, assume that it's more specific and also emit it.
    if (GTy != DD->resolve(SDMDecl->getBaseType()))
      addType(*VariableDIE, GTy);
  } else {
    DeclContext = GV->getScope();
    // Add name and type.
    addString(*VariableDIE, dwarf::DW_AT_name, GV->getDisplayName());
    addType(*VariableDIE, GTy);

    // Add scoping info.
    if (!GV->isLocalToUnit())
      addFlag(*VariableDIE, dwarf::DW_AT_external);

    // Add line number info.
    addSourceLine(*VariableDIE, GV);
  }

  if (!GV->isDefinition())
    addFlag(*VariableDIE, dwarf::DW_AT_declaration);
  else
    addGlobalName(GV->getName(), *VariableDIE, DeclContext);

  if (uint32_t AlignInBytes = GV->getAlignInBytes())
    addUInt(*VariableDIE, dwarf::DW_AT_alignment, dwarf::DW_FORM_udata,
            AlignInBytes);

  // Add location.
  bool addToAccelTable = false;
  DIELoc *Loc = nullptr;
  std::unique_ptr<DIEDwarfExpression> DwarfExpr;
  bool AllConstant = std::all_of(
      GlobalExprs.begin(), GlobalExprs.end(),
      [&](const GlobalExpr GE) { return GE.Expr && GE.Expr->isConstant(); });

  for (const auto &GE : GlobalExprs) {
    const GlobalVariable *Global = GE.Var;
    const DIExpression *Expr = GE.Expr;
    // For compatibility with DWARF 3 and earlier,
    // DW_AT_location(DW_OP_constu, X, DW_OP_stack_value) becomes
    // DW_AT_const_value(X).
    if (GlobalExprs.size() == 1 && Expr && Expr->isConstant()) {
      addConstantValue(*VariableDIE, /*Unsigned=*/true, Expr->getElement(1));

    // We cannot describe the location of dllimport'd variables: the
    // computation of their address requires loads from the IAT.
    } else if ((Global && !Global->hasDLLImportStorageClass()) ||
               AllConstant) {
      if (!Loc) {
        Loc = new (DIEValueAllocator) DIELoc;
        DwarfExpr = llvm::make_unique<DIEDwarfExpression>(*Asm, *this, *Loc);
      }
      addToAccelTable = true;
      if (Global) {
        const MCSymbol *Sym = Asm->getSymbol(Global);
        if (Global->isThreadLocal()) {
          if (Asm->TM.Options.EmulatedTLS) {
            // TODO: add debug info for emulated thread local mode.
          } else {
            // FIXME: Make this work with -gsplit-dwarf.
            unsigned PointerSize = Asm->getDataLayout().getPointerSize();
            assert((PointerSize == 4 || PointerSize == 8) &&
                   "Add support for other sizes if necessary");
            // Based on GCC's support for TLS:
            if (!DD->useSplitDwarf()) {
              // 1) Start with a constu of the appropriate pointer size
              addUInt(*Loc, dwarf::DW_FORM_data1,
                      PointerSize == 4 ? dwarf::DW_OP_const4u
                                       : dwarf::DW_OP_const8u);
              // 2) containing the (relocated) offset of the TLS variable
              //    within the module's TLS block.
              addExpr(*Loc, dwarf::DW_FORM_udata,
                      Asm->getObjFileLowering().getDebugThreadLocalSymbol(Sym));
            } else {
              addUInt(*Loc, dwarf::DW_FORM_data1,
                      dwarf::DW_OP_GNU_const_index);
              addUInt(*Loc, dwarf::DW_FORM_udata,
                      DD->getAddressPool().getIndex(Sym, /* TLS */ true));
            }
            // 3) followed by an OP to make the debugger do a TLS lookup.
            addUInt(*Loc, dwarf::DW_FORM_data1,
                    DD->useGNUTLSOpcode() ?
                        dwarf::DW_OP_GNU_push_tls_address :
                        dwarf::DW_OP_form_tls_address);
          }
        } else {
          DD->addArangeLabel(SymbolCU(this, Sym));
          addOpAddress(*Loc, Sym);
        }
      }
      if (Expr) {
        DwarfExpr->addFragmentOffset(Expr);
        DwarfExpr->AddExpression(Expr);
      }
    }
  }
  if (Loc)
    addBlock(*VariableDIE, dwarf::DW_AT_location, DwarfExpr->finalize());

  if (DD->useAllLinkageNames())
    addLinkageName(*VariableDIE, GV->getLinkageName());

  if (addToAccelTable) {
    DD->addAccelName(GV->getName(), *VariableDIE);

    // If the linkage name is different than the name, go ahead and output
    // that as well into the name table.
    if (GV->getLinkageName() != "" && GV->getName() != GV->getLinkageName())
      DD->addAccelName(GV->getLinkageName(), *VariableDIE);
  }

  return VariableDIE;
}

void DwarfCompileUnit::addRange(RangeSpan Range) {
  bool SameAsPrevCU = this == DD->getPrevCU();
  DD->setPrevCU(this);
  // If we have no current ranges just add the range and return, otherwise,
  // check the current section and CU against the previous section and CU we
  // emitted into and the subprogram was contained within. If these are the
  // same then extend our current range, otherwise add this as a new range.
  if (CURanges.empty() || !SameAsPrevCU ||
      (&CURanges.back().getEnd()->getSection() !=
       &Range.getEnd()->getSection())) {
    CURanges.push_back(Range);
    return;
  }

  CURanges.back().setEnd(Range.getEnd());
}

DIE::value_iterator
DwarfCompileUnit::addSectionLabel(DIE &Die, dwarf::Attribute Attribute,
                                  const MCSymbol *Label, const MCSymbol *Sec) {
  if (Asm->MAI->doesDwarfUseRelocationsAcrossSections())
    return addLabel(Die, Attribute,
                    DD->getDwarfVersion() >= 4 ? dwarf::DW_FORM_sec_offset
                                               : dwarf::DW_FORM_data4,
                    Label);
  return addSectionDelta(Die, Attribute, Label, Sec);
}

void DwarfCompileUnit::initStmtList() {
  // Define start line table label for each Compile Unit.
  MCSymbol *LineTableStartSym =
      Asm->OutStreamer->getDwarfLineTableSymbol(getUniqueID());

  // DW_AT_stmt_list is an offset of line number information for this
  // compile unit in debug_line section. For split dwarf this is
  // left in the skeleton CU and so not included.
  // The line table entries are not always emitted in assembly, so it
  // is not okay to use line_table_start here.
  const TargetLoweringObjectFile &TLOF = Asm->getObjFileLowering();
  StmtListValue =
      addSectionLabel(getUnitDie(), dwarf::DW_AT_stmt_list, LineTableStartSym,
                      TLOF.getDwarfLineSection()->getBeginSymbol());
}

void DwarfCompileUnit::applyStmtList(DIE &D) {
  D.addValue(DIEValueAllocator, *StmtListValue);
}

void DwarfCompileUnit::attachLowHighPC(DIE &D, const MCSymbol *Begin,
                                       const MCSymbol *End) {
  assert(Begin && "Begin label should not be null!");
  assert(End && "End label should not be null!");
  assert(Begin->isDefined() && "Invalid starting label");
  assert(End->isDefined() && "Invalid end label");

  addLabelAddress(D, dwarf::DW_AT_low_pc, Begin);
  if (DD->getDwarfVersion() < 4)
    addLabelAddress(D, dwarf::DW_AT_high_pc, End);
  else
    addLabelDelta(D, dwarf::DW_AT_high_pc, End, Begin);
}

// Find DIE for the given subprogram and attach appropriate DW_AT_low_pc
// and DW_AT_high_pc attributes. If there are global variables in this
// scope then create and insert DIEs for these variables.
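// Note that attachLowHighPC (above) encodes DW_AT_high_pc as an offset from
// DW_AT_low_pc when emitting DWARF version 4 or later, and as an address
// otherwise.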
DIE &DwarfCompileUnit::updateSubprogramScopeDIE(const DISubprogram *SP) {
  DIE *SPDie = getOrCreateSubprogramDIE(SP, includeMinimalInlineScopes());

  attachLowHighPC(*SPDie, Asm->getFunctionBegin(), Asm->getFunctionEnd());
  if (DD->useAppleExtensionAttributes() &&
      !DD->getCurrentFunction()->getTarget().Options.DisableFramePointerElim(
          *DD->getCurrentFunction()))
    addFlag(*SPDie, dwarf::DW_AT_APPLE_omit_frame_ptr);

  // Only include DW_AT_frame_base in full debug info
  if (!includeMinimalInlineScopes()) {
    const TargetRegisterInfo *RI = Asm->MF->getSubtarget().getRegisterInfo();
    MachineLocation Location(RI->getFrameRegister(*Asm->MF));
    if (RI->isPhysicalRegister(Location.getReg()))
      addAddress(*SPDie, dwarf::DW_AT_frame_base, Location);
  }

  // Add name to the name table, we do this here because we're guaranteed
  // to have concrete versions of our DW_TAG_subprogram nodes.
  DD->addSubprogramNames(SP, *SPDie);

  return *SPDie;
}

// Construct a DIE for this scope.
void DwarfCompileUnit::constructScopeDIE(
    LexicalScope *Scope, SmallVectorImpl<DIE *> &FinalChildren) {
  if (!Scope || !Scope->getScopeNode())
    return;

  auto *DS = Scope->getScopeNode();

  assert((Scope->getInlinedAt() || !isa<DISubprogram>(DS)) &&
         "Only handle inlined subprograms here, use "
         "constructSubprogramScopeDIE for non-inlined "
         "subprograms");

  SmallVector<DIE *, 8> Children;

  // We try to create the scope DIE first, then the children DIEs. This will
  // avoid creating un-used children then removing them later when we find out
  // the scope DIE is null.
  DIE *ScopeDIE;
  if (Scope->getParent() && isa<DISubprogram>(DS)) {
    ScopeDIE = constructInlinedScopeDIE(Scope);
    if (!ScopeDIE)
      return;
    // We create children when the scope DIE is not null.
    createScopeChildrenDIE(Scope, Children);
  } else {
    // Early exit when we know the scope DIE is going to be null.
    if (DD->isLexicalScopeDIENull(Scope))
      return;

    unsigned ChildScopeCount;

    // We create children here when we know the scope DIE is not going to be
    // null and the children will be added to the scope DIE.
    createScopeChildrenDIE(Scope, Children, &ChildScopeCount);

    // Skip imported directives in gmlt-like data.
    if (!includeMinimalInlineScopes()) {
      // There is no need to emit empty lexical block DIE.
      for (const auto *IE : ImportedEntities[DS])
        Children.push_back(
            constructImportedEntityDIE(cast<DIImportedEntity>(IE)));
    }

    // If there are only other scopes as children, put them directly in the
    // parent instead, as this scope would serve no purpose.
    if (Children.size() == ChildScopeCount) {
      FinalChildren.insert(FinalChildren.end(),
                           std::make_move_iterator(Children.begin()),
                           std::make_move_iterator(Children.end()));
      return;
    }
    ScopeDIE = constructLexicalScopeDIE(Scope);
    assert(ScopeDIE && "Scope DIE should not be null.");
  }

  // Add children
  for (auto &I : Children)
    ScopeDIE->addChild(std::move(I));

  FinalChildren.push_back(std::move(ScopeDIE));
}

DIE::value_iterator
DwarfCompileUnit::addSectionDelta(DIE &Die, dwarf::Attribute Attribute,
                                  const MCSymbol *Hi, const MCSymbol *Lo) {
  return Die.addValue(DIEValueAllocator, Attribute,
                      DD->getDwarfVersion() >= 4 ? dwarf::DW_FORM_sec_offset
                                                 : dwarf::DW_FORM_data4,
                      new (DIEValueAllocator) DIEDelta(Hi, Lo));
}

void DwarfCompileUnit::addScopeRangeList(DIE &ScopeDIE,
                                         SmallVector<RangeSpan, 2> Range) {
  const TargetLoweringObjectFile &TLOF = Asm->getObjFileLowering();

  // Emit offset in .debug_range as a relocatable label. emitDIE will handle
  // emitting it appropriately.
  const MCSymbol *RangeSectionSym =
      TLOF.getDwarfRangesSection()->getBeginSymbol();

  RangeSpanList List(Asm->createTempSymbol("debug_ranges"), std::move(Range));

  // Under fission, ranges are specified by constant offsets relative to the
  // CU's DW_AT_GNU_ranges_base.
  if (isDwoUnit())
    addSectionDelta(ScopeDIE, dwarf::DW_AT_ranges, List.getSym(),
                    RangeSectionSym);
  else
    addSectionLabel(ScopeDIE, dwarf::DW_AT_ranges, List.getSym(),
                    RangeSectionSym);

  // Add the range list to the set of ranges to be emitted.
  (Skeleton ? Skeleton : this)->CURangeLists.push_back(std::move(List));
}

void DwarfCompileUnit::attachRangesOrLowHighPC(
    DIE &Die, SmallVector<RangeSpan, 2> Ranges) {
  if (Ranges.size() == 1) {
    const auto &single = Ranges.front();
    attachLowHighPC(Die, single.getStart(), single.getEnd());
  } else
    addScopeRangeList(Die, std::move(Ranges));
}

void DwarfCompileUnit::attachRangesOrLowHighPC(
    DIE &Die, const SmallVectorImpl<InsnRange> &Ranges) {
  SmallVector<RangeSpan, 2> List;
  List.reserve(Ranges.size());
  for (const InsnRange &R : Ranges)
    List.push_back(RangeSpan(DD->getLabelBeforeInsn(R.first),
                             DD->getLabelAfterInsn(R.second)));
  attachRangesOrLowHighPC(Die, std::move(List));
}

// This scope represents inlined body of a function. Construct DIE to
// represent this concrete inlined copy of the function.
DIE *DwarfCompileUnit::constructInlinedScopeDIE(LexicalScope *Scope) {
  assert(Scope->getScopeNode());
  auto *DS = Scope->getScopeNode();
  auto *InlinedSP = getDISubprogram(DS);
  // Find the subprogram's DwarfCompileUnit in the SPMap in case the
  // subprogram was inlined from another compile unit.
  DIE *OriginDIE = DU->getAbstractSPDies()[InlinedSP];
  assert(OriginDIE && "Unable to find original DIE for an inlined subprogram.");

  auto ScopeDIE = DIE::get(DIEValueAllocator, dwarf::DW_TAG_inlined_subroutine);
  addDIEEntry(*ScopeDIE, dwarf::DW_AT_abstract_origin, *OriginDIE);

  attachRangesOrLowHighPC(*ScopeDIE, Scope->getRanges());

  // Add the call site information to the DIE.
  const DILocation *IA = Scope->getInlinedAt();
  addUInt(*ScopeDIE, dwarf::DW_AT_call_file, None,
          getOrCreateSourceID(IA->getFilename(), IA->getDirectory()));
  addUInt(*ScopeDIE, dwarf::DW_AT_call_line, None, IA->getLine());
  if (IA->getDiscriminator() && DD->getDwarfVersion() >= 4)
    addUInt(*ScopeDIE, dwarf::DW_AT_GNU_discriminator, None,
            IA->getDiscriminator());

  // Add name to the name table, we do this here because we're guaranteed
  // to have concrete versions of our DW_TAG_inlined_subroutine nodes.
  DD->addSubprogramNames(InlinedSP, *ScopeDIE);

  return ScopeDIE;
}

// Construct new DW_TAG_lexical_block for this scope and attach
// DW_AT_low_pc/DW_AT_high_pc labels.
DIE *DwarfCompileUnit::constructLexicalScopeDIE(LexicalScope *Scope) {
  if (DD->isLexicalScopeDIENull(Scope))
    return nullptr;

  auto ScopeDIE = DIE::get(DIEValueAllocator, dwarf::DW_TAG_lexical_block);
  if (Scope->isAbstractScope())
    return ScopeDIE;

  attachRangesOrLowHighPC(*ScopeDIE, Scope->getRanges());

  return ScopeDIE;
}

/// constructVariableDIE - Construct a DIE for the given DbgVariable.
DIE *DwarfCompileUnit::constructVariableDIE(DbgVariable &DV, bool Abstract) {
  auto D = constructVariableDIEImpl(DV, Abstract);
  DV.setDIE(*D);
  return D;
}

DIE *DwarfCompileUnit::constructVariableDIEImpl(const DbgVariable &DV,
                                                bool Abstract) {
  // Define variable debug information entry.
  auto VariableDie = DIE::get(DIEValueAllocator, DV.getTag());

  if (Abstract) {
    applyVariableAttributes(DV, *VariableDie);
    return VariableDie;
  }

  // Add variable address.
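  // A debug-loc-list index of ~0U is a sentinel meaning no .debug_loc entry
  // was created; in that case fall through and describe the variable by a
  // single DBG_VALUE instruction or by frame-index fragments below.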
  unsigned Offset = DV.getDebugLocListIndex();
  if (Offset != ~0U) {
    addLocationList(*VariableDie, dwarf::DW_AT_location, Offset);
    return VariableDie;
  }

  // Check if variable is described by a DBG_VALUE instruction.
  if (const MachineInstr *DVInsn = DV.getMInsn()) {
    assert(DVInsn->getNumOperands() == 4);
    if (DVInsn->getOperand(0).isReg()) {
      const MachineOperand RegOp = DVInsn->getOperand(0);
      // If the second operand is an immediate, this is an indirect value.
      if (DVInsn->getOperand(1).isImm()) {
        MachineLocation Location(RegOp.getReg(),
                                 DVInsn->getOperand(1).getImm());
        addVariableAddress(DV, *VariableDie, Location);
      } else if (RegOp.getReg())
        addVariableAddress(DV, *VariableDie, MachineLocation(RegOp.getReg()));
    } else if (DVInsn->getOperand(0).isImm()) {
      // This variable is described by a single constant.
      // Check whether it has a DIExpression.
      auto *Expr = DV.getSingleExpression();
      if (Expr && Expr->getNumElements()) {
        DIELoc *Loc = new (DIEValueAllocator) DIELoc;
        DIEDwarfExpression DwarfExpr(*Asm, *this, *Loc);
        // If there is an expression, emit raw unsigned bytes.
        DwarfExpr.addFragmentOffset(Expr);
        DwarfExpr.AddUnsignedConstant(DVInsn->getOperand(0).getImm());
        DwarfExpr.AddExpression(Expr);
        addBlock(*VariableDie, dwarf::DW_AT_location, DwarfExpr.finalize());
      } else
        addConstantValue(*VariableDie, DVInsn->getOperand(0), DV.getType());
    } else if (DVInsn->getOperand(0).isFPImm())
      addConstantFPValue(*VariableDie, DVInsn->getOperand(0));
    else if (DVInsn->getOperand(0).isCImm())
      addConstantValue(*VariableDie, DVInsn->getOperand(0).getCImm(),
                       DV.getType());

    return VariableDie;
  }

  // .. else use frame index.
-  if (DV.getFrameIndex().empty())
+  if (!DV.hasFrameIndexExprs())
    return VariableDie;

-  auto Expr = DV.getExpression().begin();
  DIELoc *Loc = new (DIEValueAllocator) DIELoc;
  DIEDwarfExpression DwarfExpr(*Asm, *this, *Loc);
-  for (auto FI : DV.getFrameIndex()) {
+  for (auto &Fragment : DV.getFrameIndexExprs()) {
    unsigned FrameReg = 0;
    const TargetFrameLowering *TFI =
        Asm->MF->getSubtarget().getFrameLowering();
-    int Offset = TFI->getFrameIndexReference(*Asm->MF, FI, FrameReg);
-    assert(Expr != DV.getExpression().end() && "Wrong number of expressions");
-    DwarfExpr.addFragmentOffset(*Expr);
+    int Offset = TFI->getFrameIndexReference(*Asm->MF, Fragment.FI, FrameReg);
+    DwarfExpr.addFragmentOffset(Fragment.Expr);
    DwarfExpr.AddMachineRegIndirect(*Asm->MF->getSubtarget().getRegisterInfo(),
                                    FrameReg, Offset);
-    DwarfExpr.AddExpression(*Expr);
-    ++Expr;
+    DwarfExpr.AddExpression(Fragment.Expr);
  }
  addBlock(*VariableDie, dwarf::DW_AT_location, DwarfExpr.finalize());

  return VariableDie;
}

DIE *DwarfCompileUnit::constructVariableDIE(DbgVariable &DV,
                                            const LexicalScope &Scope,
                                            DIE *&ObjectPointer) {
  auto Var = constructVariableDIE(DV, Scope.isAbstractScope());
  if (DV.isObjectPointer())
    ObjectPointer = Var;
  return Var;
}

DIE *DwarfCompileUnit::createScopeChildrenDIE(LexicalScope *Scope,
                                              SmallVectorImpl<DIE *> &Children,
                                              unsigned *ChildScopeCount) {
  DIE *ObjectPointer = nullptr;

  for (DbgVariable *DV : DU->getScopeVariables().lookup(Scope))
    Children.push_back(constructVariableDIE(*DV, *Scope, ObjectPointer));

  unsigned ChildCountWithoutScopes = Children.size();

  for (LexicalScope *LS : Scope->getChildren())
    constructScopeDIE(LS, Children);

  if (ChildScopeCount)
    *ChildScopeCount = Children.size() - ChildCountWithoutScopes;

  return ObjectPointer;
}

void DwarfCompileUnit::constructSubprogramScopeDIE(const DISubprogram *Sub,
                                                   LexicalScope *Scope) {
  DIE &ScopeDIE = updateSubprogramScopeDIE(Sub);

  if (Scope) {
    assert(!Scope->getInlinedAt());
    assert(!Scope->isAbstractScope());

    // Collect lexical scope children first.
    // ObjectPointer might be a local (non-argument) variable if it's a
    // block's synthetic this pointer.
    if (DIE *ObjectPointer = createAndAddScopeChildren(Scope, ScopeDIE))
      addDIEEntry(ScopeDIE, dwarf::DW_AT_object_pointer, *ObjectPointer);
  }

  // If this is a variadic function, add an unspecified parameter.
  DITypeRefArray FnArgs = Sub->getType()->getTypeArray();

  // If we have a single element of null, it is a function that returns void.
  // If we have more than one element and the last one is null, it is a
  // variadic function.
  if (FnArgs.size() > 1 && !FnArgs[FnArgs.size() - 1] &&
      !includeMinimalInlineScopes())
    ScopeDIE.addChild(
        DIE::get(DIEValueAllocator, dwarf::DW_TAG_unspecified_parameters));
}

DIE *DwarfCompileUnit::createAndAddScopeChildren(LexicalScope *Scope,
                                                 DIE &ScopeDIE) {
  // We create children when the scope DIE is not null.
  SmallVector<DIE *, 8> Children;
  DIE *ObjectPointer = createScopeChildrenDIE(Scope, Children);

  // Add children
  for (auto &I : Children)
    ScopeDIE.addChild(std::move(I));

  return ObjectPointer;
}

void DwarfCompileUnit::constructAbstractSubprogramScopeDIE(
    LexicalScope *Scope) {
  DIE *&AbsDef = DU->getAbstractSPDies()[Scope->getScopeNode()];
  if (AbsDef)
    return;

  auto *SP = cast<DISubprogram>(Scope->getScopeNode());

  DIE *ContextDIE;

  if (includeMinimalInlineScopes())
    ContextDIE = &getUnitDie();
  // Some of this is duplicated from DwarfUnit::getOrCreateSubprogramDIE, with
  // the important distinction that the debug node is not associated with the
  // DIE (since the debug node will be associated with the concrete DIE, if
  // any). It could be refactored to some common utility function.
  else if (auto *SPDecl = SP->getDeclaration()) {
    ContextDIE = &getUnitDie();
    getOrCreateSubprogramDIE(SPDecl);
  } else
    ContextDIE = getOrCreateContextDIE(resolve(SP->getScope()));

  // Passing null as the associated node because the abstract definition
  // shouldn't be found by lookup.
  AbsDef = &createAndAddDIE(dwarf::DW_TAG_subprogram, *ContextDIE, nullptr);
  applySubprogramAttributesToDefinition(SP, *AbsDef);

  if (!includeMinimalInlineScopes())
    addUInt(*AbsDef, dwarf::DW_AT_inline, None, dwarf::DW_INL_inlined);
  if (DIE *ObjectPointer = createAndAddScopeChildren(Scope, *AbsDef))
    addDIEEntry(*AbsDef, dwarf::DW_AT_object_pointer, *ObjectPointer);
}

DIE *DwarfCompileUnit::constructImportedEntityDIE(
    const DIImportedEntity *Module) {
  DIE *IMDie = DIE::get(DIEValueAllocator, (dwarf::Tag)Module->getTag());
  insertDIE(Module, IMDie);
  DIE *EntityDie;
  auto *Entity = resolve(Module->getEntity());
  if (auto *NS = dyn_cast<DINamespace>(Entity))
    EntityDie = getOrCreateNameSpace(NS);
  else if (auto *M = dyn_cast<DIModule>(Entity))
    EntityDie = getOrCreateModule(M);
  else if (auto *SP = dyn_cast<DISubprogram>(Entity))
    EntityDie = getOrCreateSubprogramDIE(SP);
  else if (auto *T = dyn_cast<DIType>(Entity))
    EntityDie = getOrCreateTypeDIE(T);
  else if (auto *GV = dyn_cast<DIGlobalVariable>(Entity))
    EntityDie = getOrCreateGlobalVariableDIE(GV, {});
  else
    EntityDie = getDIE(Entity);
  assert(EntityDie);
  addSourceLine(*IMDie, Module->getLine(), Module->getScope()->getFilename(),
                Module->getScope()->getDirectory());
  addDIEEntry(*IMDie, dwarf::DW_AT_import, *EntityDie);
  StringRef Name = Module->getName();
  if (!Name.empty())
    addString(*IMDie, dwarf::DW_AT_name, Name);

  return IMDie;
}

void DwarfCompileUnit::finishSubprogramDefinition(const DISubprogram *SP) {
  DIE *D = getDIE(SP);
  if (DIE *AbsSPDIE = DU->getAbstractSPDies().lookup(SP)) {
    if (D)
      // If this subprogram has an abstract definition, reference that
      addDIEEntry(*D, dwarf::DW_AT_abstract_origin, *AbsSPDIE);
  } else {
    assert(D || includeMinimalInlineScopes());
    if (D)
      // And attach the attributes
      applySubprogramAttributesToDefinition(SP, *D);
  }
}

void DwarfCompileUnit::emitHeader(bool UseOffsets) {
  // Don't bother labeling the .dwo unit, as its offset isn't used.
  if (!Skeleton) {
    LabelBegin = Asm->createTempSymbol("cu_begin");
    Asm->OutStreamer->EmitLabel(LabelBegin);
  }

  DwarfUnit::emitHeader(UseOffsets);
}

/// addGlobalName - Add a new global name to the compile unit.
void DwarfCompileUnit::addGlobalName(StringRef Name, DIE &Die,
                                     const DIScope *Context) {
  if (includeMinimalInlineScopes())
    return;
  std::string FullName = getParentContextString(Context) + Name.str();
  GlobalNames[FullName] = &Die;
}

/// Add a new global type to the unit.
void DwarfCompileUnit::addGlobalType(const DIType *Ty, const DIE &Die,
                                     const DIScope *Context) {
  if (includeMinimalInlineScopes())
    return;
  std::string FullName = getParentContextString(Context) + Ty->getName().str();
  GlobalTypes[FullName] = &Die;
}

/// addVariableAddress - Add DW_AT_location attribute for a
/// DbgVariable based on provided MachineLocation.
void DwarfCompileUnit::addVariableAddress(const DbgVariable &DV, DIE &Die,
                                          MachineLocation Location) {
  if (DV.hasComplexAddress())
    addComplexAddress(DV, Die, dwarf::DW_AT_location, Location);
  else if (DV.isBlockByrefVariable())
    addBlockByrefAddress(DV, Die, dwarf::DW_AT_location, Location);
  else
    addAddress(Die, dwarf::DW_AT_location, Location);
}

/// Add an address attribute to a die based on the location provided.
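/// The location is lowered to a DWARF expression: a register operation
/// (DW_OP_reg<n>) when the value lives in a register, or a
/// base-register-plus-offset form (DW_OP_breg<n>) for an indirect location.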
void DwarfCompileUnit::addAddress(DIE &Die, dwarf::Attribute Attribute,
                                  const MachineLocation &Location) {
  DIELoc *Loc = new (DIEValueAllocator) DIELoc;
  DIEDwarfExpression Expr(*Asm, *this, *Loc);

  bool validReg;
  if (Location.isReg())
    validReg = Expr.AddMachineReg(*Asm->MF->getSubtarget().getRegisterInfo(),
                                  Location.getReg());
  else
    validReg =
        Expr.AddMachineRegIndirect(*Asm->MF->getSubtarget().getRegisterInfo(),
                                   Location.getReg(), Location.getOffset());

  if (!validReg)
    return;

  // Now attach the location information to the DIE.
  addBlock(Die, Attribute, Expr.finalize());
}

/// Start with the address based on the location provided, and generate the
/// DWARF information necessary to find the actual variable given the extra
/// address information encoded in the DbgVariable, starting from the starting
/// location. Add the DWARF information to the die.
void DwarfCompileUnit::addComplexAddress(const DbgVariable &DV, DIE &Die,
                                         dwarf::Attribute Attribute,
                                         const MachineLocation &Location) {
  DIELoc *Loc = new (DIEValueAllocator) DIELoc;
  DIEDwarfExpression DwarfExpr(*Asm, *this, *Loc);
  const DIExpression *Expr = DV.getSingleExpression();
  DIExpressionCursor ExprCursor(Expr);
  const TargetRegisterInfo &TRI = *Asm->MF->getSubtarget().getRegisterInfo();
  auto Reg = Location.getReg();
  DwarfExpr.addFragmentOffset(Expr);
  bool ValidReg =
      Location.getOffset()
          ? DwarfExpr.AddMachineRegIndirect(TRI, Reg, Location.getOffset())
          : DwarfExpr.AddMachineRegExpression(TRI, ExprCursor, Reg);

  if (!ValidReg)
    return;

  DwarfExpr.AddExpression(std::move(ExprCursor));

  // Now attach the location information to the DIE.
  addBlock(Die, Attribute, Loc);
}

/// Add a Dwarf loclistptr attribute data and value.
void DwarfCompileUnit::addLocationList(DIE &Die, dwarf::Attribute Attribute,
                                       unsigned Index) {
  dwarf::Form Form = DD->getDwarfVersion() >= 4 ? dwarf::DW_FORM_sec_offset
                                                : dwarf::DW_FORM_data4;
  Die.addValue(DIEValueAllocator, Attribute, Form, DIELocList(Index));
}

void DwarfCompileUnit::applyVariableAttributes(const DbgVariable &Var,
                                               DIE &VariableDie) {
  StringRef Name = Var.getName();
  if (!Name.empty())
    addString(VariableDie, dwarf::DW_AT_name, Name);
  const auto *DIVar = Var.getVariable();
  if (DIVar)
    if (uint32_t AlignInBytes = DIVar->getAlignInBytes())
      addUInt(VariableDie, dwarf::DW_AT_alignment, dwarf::DW_FORM_udata,
              AlignInBytes);

  addSourceLine(VariableDie, DIVar);
  addType(VariableDie, Var.getType());
  if (Var.isArtificial())
    addFlag(VariableDie, dwarf::DW_AT_artificial);
}

/// Add a Dwarf expression attribute data and value.
void DwarfCompileUnit::addExpr(DIELoc &Die, dwarf::Form Form,
                               const MCExpr *Expr) {
  Die.addValue(DIEValueAllocator, (dwarf::Attribute)0, Form, DIEExpr(Expr));
}

void DwarfCompileUnit::applySubprogramAttributesToDefinition(
    const DISubprogram *SP, DIE &SPDie) {
  auto *SPDecl = SP->getDeclaration();
  auto *Context = resolve(SPDecl ? SPDecl->getScope() : SP->getScope());
  applySubprogramAttributes(SP, SPDie, includeMinimalInlineScopes());
  addGlobalName(SP->getName(), SPDie, Context);
}

bool DwarfCompileUnit::isDwoUnit() const {
  return DD->useSplitDwarf() && Skeleton;
}

bool DwarfCompileUnit::includeMinimalInlineScopes() const {
  return getCUNode()->getEmissionKind() == DICompileUnit::LineTablesOnly ||
         (DD->useSplitDwarf() && !Skeleton);
}
} // end llvm namespace
Index: vendor/llvm/dist/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
===================================================================
--- vendor/llvm/dist/lib/CodeGen/AsmPrinter/DwarfDebug.cpp	(revision 314158)
+++ vendor/llvm/dist/lib/CodeGen/AsmPrinter/DwarfDebug.cpp	(revision 314159)
@@ -1,2043 +1,2052 @@
//===-- llvm/CodeGen/DwarfDebug.cpp - Dwarf Debug Framework ---------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains support for writing dwarf debug info into asm files.
//
//===----------------------------------------------------------------------===//

#include "DwarfDebug.h"
#include "ByteStreamer.h"
#include "DIEHash.h"
#include "DebugLocEntry.h"
#include "DwarfCompileUnit.h"
#include "DwarfExpression.h"
#include "DwarfUnit.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/Triple.h"
#include "llvm/CodeGen/DIE.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/ValueHandle.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCDwarf.h"
#include "llvm/MC/MCSection.h"
#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCSymbol.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/Dwarf.h"
#include "llvm/Support/Endian.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/FormattedStream.h"
#include "llvm/Support/LEB128.h"
#include "llvm/Support/MD5.h"
#include "llvm/Support/Path.h"
#include "llvm/Support/Timer.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetFrameLowering.h"
#include "llvm/Target/TargetLoweringObjectFile.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"
#include "llvm/Target/TargetRegisterInfo.h"
#include "llvm/Target/TargetSubtargetInfo.h"

using namespace llvm;

#define DEBUG_TYPE "dwarfdebug"

static cl::opt<bool>
DisableDebugInfoPrinting("disable-debug-info-print", cl::Hidden,
                         cl::desc("Disable debug info printing"));

static cl::opt<bool> GenerateGnuPubSections(
    "generate-gnu-dwarf-pub-sections", cl::Hidden,
    cl::desc("Generate GNU-style pubnames and pubtypes"), cl::init(false));

static cl::opt<bool> GenerateARangeSection("generate-arange-section",
                                           cl::Hidden,
                                           cl::desc("Generate dwarf aranges"),
                                           cl::init(false));

namespace {
enum DefaultOnOff { Default, Enable, Disable };
}

static cl::opt<DefaultOnOff> UnknownLocations(
    "use-unknown-locations", cl::Hidden,
    cl::desc("Make an absence of debug location information explicit."),
    cl::values(clEnumVal(Default, "At top of block or after label"),
               clEnumVal(Enable, "In all cases"), clEnumVal(Disable, "Never")),
    cl::init(Default));

static cl::opt<DefaultOnOff>
DwarfAccelTables("dwarf-accel-tables", cl::Hidden,
                 cl::desc("Output prototype dwarf accelerator tables."),
                 cl::values(clEnumVal(Default, "Default for platform"),
                            clEnumVal(Enable, "Enabled"),
                            clEnumVal(Disable, "Disabled")),
                 cl::init(Default));

static cl::opt<DefaultOnOff>
SplitDwarf("split-dwarf", cl::Hidden,
           cl::desc("Output DWARF5 split debug info."),
           cl::values(clEnumVal(Default, "Default for platform"),
                      clEnumVal(Enable, "Enabled"),
                      clEnumVal(Disable, "Disabled")),
           cl::init(Default));

static cl::opt<DefaultOnOff>
DwarfPubSections("generate-dwarf-pub-sections", cl::Hidden,
                 cl::desc("Generate DWARF pubnames and pubtypes sections"),
                 cl::values(clEnumVal(Default, "Default for platform"),
                            clEnumVal(Enable, "Enabled"),
                            clEnumVal(Disable, "Disabled")),
                 cl::init(Default));

enum LinkageNameOption {
  DefaultLinkageNames,
  AllLinkageNames,
  AbstractLinkageNames
};
static cl::opt<LinkageNameOption>
    DwarfLinkageNames("dwarf-linkage-names", cl::Hidden,
                      cl::desc("Which DWARF linkage-name attributes to emit."),
                      cl::values(clEnumValN(DefaultLinkageNames, "Default",
                                            "Default for platform"),
                                 clEnumValN(AllLinkageNames, "All", "All"),
                                 clEnumValN(AbstractLinkageNames, "Abstract",
                                            "Abstract subprograms")),
                      cl::init(DefaultLinkageNames));

static const char *const DWARFGroupName = "dwarf";
static const char *const DWARFGroupDescription = "DWARF Emission";
static const char *const DbgTimerName = "writer";
static const char *const DbgTimerDescription = "DWARF Debug Writer";

void DebugLocDwarfExpression::EmitOp(uint8_t Op, const char *Comment) {
  BS.EmitInt8(
      Op, Comment ? Twine(Comment) + " " + dwarf::OperationEncodingString(Op)
                  : dwarf::OperationEncodingString(Op));
}

void DebugLocDwarfExpression::EmitSigned(int64_t Value) {
  BS.EmitSLEB128(Value, Twine(Value));
}

void DebugLocDwarfExpression::EmitUnsigned(uint64_t Value) {
  BS.EmitULEB128(Value, Twine(Value));
}

bool DebugLocDwarfExpression::isFrameRegister(const TargetRegisterInfo &TRI,
                                              unsigned MachineReg) {
  // This information is not available while emitting .debug_loc entries.
  return false;
}

//===----------------------------------------------------------------------===//

bool DbgVariable::isBlockByrefVariable() const {
  assert(Var && "Invalid complex DbgVariable!");
  return Var->getType().resolve()->isBlockByrefStruct();
}

const DIType *DbgVariable::getType() const {
  DIType *Ty = Var->getType().resolve();
  // FIXME: isBlockByrefVariable should be reformulated in terms of complex
  // addresses instead.
  if (Ty->isBlockByrefStruct()) {
    /* Byref variables, in Blocks, are declared by the programmer as
       "SomeType VarName;", but the compiler creates a __Block_byref_x_VarName
       struct, and gives the variable VarName either the struct, or a pointer
       to the struct, as its type. This is necessary for various
       behind-the-scenes things the compiler needs to do with by-reference
       variables in blocks.

       However, as far as the original *programmer* is concerned, the variable
       should still have type 'SomeType', as originally declared.

       The following function dives into the __Block_byref_x_VarName struct to
       find the original type of the variable. This will be passed back to the
       code generating the type for the Debug Information Entry for the
       variable 'VarName'. 'VarName' will then have the original type
       'SomeType' in its debug information.

       The original type 'SomeType' will be the type of the field named
       'VarName' inside the __Block_byref_x_VarName struct.
       NOTE: In order for this to not completely fail on the debugger side,
       the Debug Information Entry for the variable VarName needs to have a
       DW_AT_location that tells the debugger how to unwind through the
       pointers and __Block_byref_x_VarName struct to find the actual value
       of the variable. The function addBlockByrefType does this. */
    DIType *subType = Ty;
    uint16_t tag = Ty->getTag();

    if (tag == dwarf::DW_TAG_pointer_type)
      subType = resolve(cast<DIDerivedType>(Ty)->getBaseType());

    auto Elements = cast<DICompositeType>(subType)->getElements();
    for (unsigned i = 0, N = Elements.size(); i < N; ++i) {
      auto *DT = cast<DIDerivedType>(Elements[i]);
      if (getName() == DT->getName())
        return resolve(DT->getBaseType());
    }
  }
  return Ty;
}

+/// Get the frame-index entries, sorted by fragment offset.
+ArrayRef<DbgVariable::FrameIndexExpr> DbgVariable::getFrameIndexExprs() const {
+  std::sort(FrameIndexExprs.begin(), FrameIndexExprs.end(),
+            [](const FrameIndexExpr &A, const FrameIndexExpr &B) -> bool {
+              return A.Expr->getFragmentInfo()->OffsetInBits <
+                     B.Expr->getFragmentInfo()->OffsetInBits;
+            });
+  return FrameIndexExprs;
+}
+
static const DwarfAccelTable::Atom TypeAtoms[] = {
    DwarfAccelTable::Atom(dwarf::DW_ATOM_die_offset, dwarf::DW_FORM_data4),
    DwarfAccelTable::Atom(dwarf::DW_ATOM_die_tag, dwarf::DW_FORM_data2),
    DwarfAccelTable::Atom(dwarf::DW_ATOM_type_flags, dwarf::DW_FORM_data1)};

DwarfDebug::DwarfDebug(AsmPrinter *A, Module *M)
    : DebugHandlerBase(A), DebugLocs(A->OutStreamer->isVerboseAsm()),
      InfoHolder(A, "info_string", DIEValueAllocator),
      SkeletonHolder(A, "skel_string", DIEValueAllocator),
      IsDarwin(A->TM.getTargetTriple().isOSDarwin()),
      AccelNames(DwarfAccelTable::Atom(dwarf::DW_ATOM_die_offset,
                                       dwarf::DW_FORM_data4)),
      AccelObjC(DwarfAccelTable::Atom(dwarf::DW_ATOM_die_offset,
                                      dwarf::DW_FORM_data4)),
      AccelNamespace(DwarfAccelTable::Atom(dwarf::DW_ATOM_die_offset,
                                           dwarf::DW_FORM_data4)),
      AccelTypes(TypeAtoms), DebuggerTuning(DebuggerKind::Default) {

  CurFn = nullptr;
  const Triple &TT = Asm->TM.getTargetTriple();

  // Make sure we know our "debugger tuning." The target option takes
  // precedence; fall back to triple-based defaults.
  if (Asm->TM.Options.DebuggerTuning != DebuggerKind::Default)
    DebuggerTuning = Asm->TM.Options.DebuggerTuning;
  else if (IsDarwin)
    DebuggerTuning = DebuggerKind::LLDB;
  else if (TT.isPS4CPU())
    DebuggerTuning = DebuggerKind::SCE;
  else
    DebuggerTuning = DebuggerKind::GDB;

  // Turn on accelerator tables for LLDB by default.
  if (DwarfAccelTables == Default)
    HasDwarfAccelTables = tuneForLLDB();
  else
    HasDwarfAccelTables = DwarfAccelTables == Enable;

  HasAppleExtensionAttributes = tuneForLLDB();

  // Handle split DWARF. Off by default for now.
  if (SplitDwarf == Default)
    HasSplitDwarf = false;
  else
    HasSplitDwarf = SplitDwarf == Enable;

  // Pubnames/pubtypes on by default for GDB.
  if (DwarfPubSections == Default)
    HasDwarfPubSections = tuneForGDB();
  else
    HasDwarfPubSections = DwarfPubSections == Enable;

  // SCE defaults to linkage names only for abstract subprograms.
  if (DwarfLinkageNames == DefaultLinkageNames)
    UseAllLinkageNames = !tuneForSCE();
  else
    UseAllLinkageNames = DwarfLinkageNames == AllLinkageNames;

  unsigned DwarfVersionNumber = Asm->TM.Options.MCOptions.DwarfVersion;
  unsigned DwarfVersion = DwarfVersionNumber
                              ? DwarfVersionNumber
                              : MMI->getModule()->getDwarfVersion();
  // Use dwarf 4 by default if nothing is requested.
  DwarfVersion = DwarfVersion ? DwarfVersion : dwarf::DWARF_VERSION;

  // Work around a GDB bug. GDB doesn't support the standard opcode;
  // SCE doesn't support GNU's; LLDB prefers the standard opcode, which
  // is defined as of DWARF 3.
  // See GDB bug 11616 - DW_OP_form_tls_address is unimplemented
  // https://sourceware.org/bugzilla/show_bug.cgi?id=11616
  UseGNUTLSOpcode = tuneForGDB() || DwarfVersion < 3;

  // GDB does not fully support the DWARF 4 representation for bitfields.
  UseDWARF2Bitfields = (DwarfVersion < 4) || tuneForGDB();

  Asm->OutStreamer->getContext().setDwarfVersion(DwarfVersion);
}

// Define out of line so we don't have to include DwarfUnit.h in DwarfDebug.h.
DwarfDebug::~DwarfDebug() { }

static bool isObjCClass(StringRef Name) {
  return Name.startswith("+") || Name.startswith("-");
}

static bool hasObjCCategory(StringRef Name) {
  if (!isObjCClass(Name))
    return false;

  return Name.find(") ") != StringRef::npos;
}

static void getObjCClassCategory(StringRef In, StringRef &Class,
                                 StringRef &Category) {
  if (!hasObjCCategory(In)) {
    Class = In.slice(In.find('[') + 1, In.find(' '));
    Category = "";
    return;
  }

  Class = In.slice(In.find('[') + 1, In.find('('));
  Category = In.slice(In.find('[') + 1, In.find(' '));
}

static StringRef getObjCMethodName(StringRef In) {
  return In.slice(In.find(' ') + 1, In.find(']'));
}

// Add the various names to the Dwarf accelerator table names.
// TODO: Determine whether or not we should add names for programs
// that do not have a DW_AT_name or DW_AT_linkage_name field - this
// is only slightly different than the lookup of non-standard ObjC names.
void DwarfDebug::addSubprogramNames(const DISubprogram *SP, DIE &Die) {
  if (!SP->isDefinition())
    return;
  addAccelName(SP->getName(), Die);

  // If the linkage name is different than the name, go ahead and output
  // that as well into the name table.
  if (SP->getLinkageName() != "" && SP->getName() != SP->getLinkageName())
    addAccelName(SP->getLinkageName(), Die);

  // If this is an Objective-C selector name add it to the ObjC accelerator
  // too.
  if (isObjCClass(SP->getName())) {
    StringRef Class, Category;
    getObjCClassCategory(SP->getName(), Class, Category);
    addAccelObjC(Class, Die);
    if (Category != "")
      addAccelObjC(Category, Die);
    // Also add the base method name to the name table.
    addAccelName(getObjCMethodName(SP->getName()), Die);
  }
}

/// Check whether we should create a DIE for the given Scope, return true
/// if we don't create a DIE (the corresponding DIE is null).
bool DwarfDebug::isLexicalScopeDIENull(LexicalScope *Scope) {
  if (Scope->isAbstractScope())
    return false;

  // We don't create a DIE if there is no Range.
  const SmallVectorImpl<InsnRange> &Ranges = Scope->getRanges();
  if (Ranges.empty())
    return true;

  if (Ranges.size() > 1)
    return false;

  // We don't create a DIE if we have a single Range and the end label
  // is null.
  return !getLabelAfterInsn(Ranges.front().second);
}

template <typename Func> static void forBothCUs(DwarfCompileUnit &CU, Func F) {
  F(CU);
  if (auto *SkelCU = CU.getSkeleton())
    if (CU.getCUNode()->getSplitDebugInlining())
      F(*SkelCU);
}

void DwarfDebug::constructAbstractSubprogramScopeDIE(LexicalScope *Scope) {
  assert(Scope && Scope->getScopeNode());
  assert(Scope->isAbstractScope());
  assert(!Scope->getInlinedAt());

  auto *SP = cast<DISubprogram>(Scope->getScopeNode());

  ProcessedSPNodes.insert(SP);

  // Find the subprogram's DwarfCompileUnit in the SPMap in case the
  // subprogram was inlined from another compile unit.
  auto &CU = *CUMap.lookup(SP->getUnit());
  forBothCUs(CU, [&](DwarfCompileUnit &CU) {
    CU.constructAbstractSubprogramScopeDIE(Scope);
  });
}

void DwarfDebug::addGnuPubAttributes(DwarfUnit &U, DIE &D) const {
  if (!GenerateGnuPubSections)
    return;

  U.addFlag(D, dwarf::DW_AT_GNU_pubnames);
}

// Create new DwarfCompileUnit for the given metadata node with tag
// DW_TAG_compile_unit.
DwarfCompileUnit &
DwarfDebug::constructDwarfCompileUnit(const DICompileUnit *DIUnit) {
  StringRef FN = DIUnit->getFilename();
  CompilationDir = DIUnit->getDirectory();

  auto OwnedUnit = make_unique<DwarfCompileUnit>(
      InfoHolder.getUnits().size(), DIUnit, Asm, this, &InfoHolder);
  DwarfCompileUnit &NewCU = *OwnedUnit;
  DIE &Die = NewCU.getUnitDie();
  InfoHolder.addUnit(std::move(OwnedUnit));
  if (useSplitDwarf()) {
    NewCU.setSkeleton(constructSkeletonCU(NewCU));
    NewCU.addString(Die, dwarf::DW_AT_GNU_dwo_name,
                    DIUnit->getSplitDebugFilename());
  }

  // LTO with assembly output shares a single line table amongst multiple CUs.
  // To avoid the compilation directory being ambiguous, let the line table
  // explicitly describe the directory of all files, never relying on the
  // compilation directory.
  if (!Asm->OutStreamer->hasRawTextSupport() || SingleCU)
    Asm->OutStreamer->getContext().setMCLineTableCompilationDir(
        NewCU.getUniqueID(), CompilationDir);

  NewCU.addString(Die, dwarf::DW_AT_producer, DIUnit->getProducer());
  NewCU.addUInt(Die, dwarf::DW_AT_language, dwarf::DW_FORM_data2,
                DIUnit->getSourceLanguage());
  NewCU.addString(Die, dwarf::DW_AT_name, FN);

  if (!useSplitDwarf()) {
    NewCU.initStmtList();

    // If we're using split dwarf the compilation dir is going to be in the
    // skeleton CU and so we don't need to duplicate it here.
    if (!CompilationDir.empty())
      NewCU.addString(Die, dwarf::DW_AT_comp_dir, CompilationDir);

    addGnuPubAttributes(NewCU, Die);
  }

  if (useAppleExtensionAttributes()) {
    if (DIUnit->isOptimized())
      NewCU.addFlag(Die, dwarf::DW_AT_APPLE_optimized);

    StringRef Flags = DIUnit->getFlags();
    if (!Flags.empty())
      NewCU.addString(Die, dwarf::DW_AT_APPLE_flags, Flags);

    if (unsigned RVer = DIUnit->getRuntimeVersion())
      NewCU.addUInt(Die, dwarf::DW_AT_APPLE_major_runtime_vers,
                    dwarf::DW_FORM_data1, RVer);
  }

  if (useSplitDwarf())
    NewCU.setSection(Asm->getObjFileLowering().getDwarfInfoDWOSection());
  else
    NewCU.setSection(Asm->getObjFileLowering().getDwarfInfoSection());

  if (DIUnit->getDWOId()) {
    // This CU is either a clang module DWO or a skeleton CU.
    NewCU.addUInt(Die, dwarf::DW_AT_GNU_dwo_id, dwarf::DW_FORM_data8,
                  DIUnit->getDWOId());
    if (!DIUnit->getSplitDebugFilename().empty())
      // This is a prefabricated skeleton CU.
      NewCU.addString(Die, dwarf::DW_AT_GNU_dwo_name,
                      DIUnit->getSplitDebugFilename());
  }

  CUMap.insert({DIUnit, &NewCU});
  CUDieMap.insert({&Die, &NewCU});
  return NewCU;
}

void DwarfDebug::constructAndAddImportedEntityDIE(DwarfCompileUnit &TheCU,
                                                  const DIImportedEntity *N) {
  if (DIE *D = TheCU.getOrCreateContextDIE(N->getScope()))
    D->addChild(TheCU.constructImportedEntityDIE(N));
}

/// Sort and unique GVEs by comparing their fragment offset.
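/// A variable may be described by several expressions when it is split into
/// fragments; keeping them ordered by fragment offset lets the DWARF emitter
/// lay the pieces out sequentially. Entries whose expressions compare equal
/// are duplicates and are dropped.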
static SmallVectorImpl & sortGlobalExprs(SmallVectorImpl &GVEs) { std::sort(GVEs.begin(), GVEs.end(), [](DwarfCompileUnit::GlobalExpr A, DwarfCompileUnit::GlobalExpr B) { if (A.Expr != B.Expr && A.Expr && B.Expr) { auto FragmentA = A.Expr->getFragmentInfo(); auto FragmentB = B.Expr->getFragmentInfo(); if (FragmentA && FragmentB) return FragmentA->OffsetInBits < FragmentB->OffsetInBits; } return false; }); GVEs.erase(std::unique(GVEs.begin(), GVEs.end(), [](DwarfCompileUnit::GlobalExpr A, DwarfCompileUnit::GlobalExpr B) { return A.Expr == B.Expr; }), GVEs.end()); return GVEs; } // Emit all Dwarf sections that should come prior to the content. Create // global DIEs and emit initial debug info sections. This is invoked by // the target AsmPrinter. void DwarfDebug::beginModule() { NamedRegionTimer T(DbgTimerName, DbgTimerDescription, DWARFGroupName, DWARFGroupDescription, TimePassesIsEnabled); if (DisableDebugInfoPrinting) return; const Module *M = MMI->getModule(); unsigned NumDebugCUs = std::distance(M->debug_compile_units_begin(), M->debug_compile_units_end()); // Tell MMI whether we have debug info. MMI->setDebugInfoAvailability(NumDebugCUs > 0); SingleCU = NumDebugCUs == 1; DenseMap> GVMap; for (const GlobalVariable &Global : M->globals()) { SmallVector GVs; Global.getDebugInfo(GVs); for (auto *GVE : GVs) GVMap[GVE->getVariable()].push_back({&Global, GVE->getExpression()}); } for (DICompileUnit *CUNode : M->debug_compile_units()) { DwarfCompileUnit &CU = constructDwarfCompileUnit(CUNode); for (auto *IE : CUNode->getImportedEntities()) CU.addImportedEntity(IE); // Global Variables. for (auto *GVE : CUNode->getGlobalVariables()) GVMap[GVE->getVariable()].push_back({nullptr, GVE->getExpression()}); DenseSet Processed; for (auto *GVE : CUNode->getGlobalVariables()) { DIGlobalVariable *GV = GVE->getVariable(); if (Processed.insert(GV).second) CU.getOrCreateGlobalVariableDIE(GV, sortGlobalExprs(GVMap[GV])); } for (auto *Ty : CUNode->getEnumTypes()) { // The enum types array by design contains pointers to // MDNodes rather than DIRefs. Unique them here. CU.getOrCreateTypeDIE(cast(Ty)); } for (auto *Ty : CUNode->getRetainedTypes()) { // The retained types array by design contains pointers to // MDNodes rather than DIRefs. Unique them here. if (DIType *RT = dyn_cast(Ty)) if (!RT->isExternalTypeRef()) // There is no point in force-emitting a forward declaration. CU.getOrCreateTypeDIE(RT); } // Emit imported_modules last so that the relevant context is already // available. for (auto *IE : CUNode->getImportedEntities()) constructAndAddImportedEntityDIE(CU, IE); } } void DwarfDebug::finishVariableDefinitions() { for (const auto &Var : ConcreteVariables) { DIE *VariableDie = Var->getDIE(); assert(VariableDie); // FIXME: Consider the time-space tradeoff of just storing the unit pointer // in the ConcreteVariables list, rather than looking it up again here. // DIE::getUnit isn't simple - it walks parent pointers, etc. 
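    // Each concrete variable either refers back to an abstract origin (the
    // inlined case) or carries its attributes directly; now that all abstract
    // variables exist, resolve which case applies.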
    DwarfCompileUnit *Unit = CUDieMap.lookup(VariableDie->getUnitDie());
    assert(Unit);
    DbgVariable *AbsVar = getExistingAbstractVariable(
        InlinedVariable(Var->getVariable(), Var->getInlinedAt()));
    if (AbsVar && AbsVar->getDIE()) {
      Unit->addDIEEntry(*VariableDie, dwarf::DW_AT_abstract_origin,
                        *AbsVar->getDIE());
    } else
      Unit->applyVariableAttributes(*Var, *VariableDie);
  }
}

void DwarfDebug::finishSubprogramDefinitions() {
  for (const DISubprogram *SP : ProcessedSPNodes)
    if (SP->getUnit()->getEmissionKind() != DICompileUnit::NoDebug)
      forBothCUs(*CUMap.lookup(SP->getUnit()), [&](DwarfCompileUnit &CU) {
        CU.finishSubprogramDefinition(SP);
      });
}

void DwarfDebug::finalizeModuleInfo() {
  const TargetLoweringObjectFile &TLOF = Asm->getObjFileLowering();

  finishSubprogramDefinitions();

  finishVariableDefinitions();

  // Handle anything that needs to be done on a per-unit basis after
  // all other generation.
  for (const auto &P : CUMap) {
    auto &TheCU = *P.second;
    // Emit DW_AT_containing_type attribute to connect types with their
    // vtable holding type.
    TheCU.constructContainingTypeDIEs();

    // Add CU specific attributes if we need to add any.
    // If we're splitting the dwarf out now that we've got the entire
    // CU then add the dwo id to it.
    auto *SkCU = TheCU.getSkeleton();
    if (useSplitDwarf()) {
      // Emit a unique identifier for this CU.
      uint64_t ID = DIEHash(Asm).computeCUSignature(TheCU.getUnitDie());
      TheCU.addUInt(TheCU.getUnitDie(), dwarf::DW_AT_GNU_dwo_id,
                    dwarf::DW_FORM_data8, ID);
      SkCU->addUInt(SkCU->getUnitDie(), dwarf::DW_AT_GNU_dwo_id,
                    dwarf::DW_FORM_data8, ID);

      // We don't keep track of which addresses are used in which CU so this
      // is a bit pessimistic under LTO.
      if (!AddrPool.isEmpty()) {
        const MCSymbol *Sym = TLOF.getDwarfAddrSection()->getBeginSymbol();
        SkCU->addSectionLabel(SkCU->getUnitDie(), dwarf::DW_AT_GNU_addr_base,
                              Sym, Sym);
      }
      if (!SkCU->getRangeLists().empty()) {
        const MCSymbol *Sym = TLOF.getDwarfRangesSection()->getBeginSymbol();
        SkCU->addSectionLabel(SkCU->getUnitDie(), dwarf::DW_AT_GNU_ranges_base,
                              Sym, Sym);
      }
    }

    // If we have code split among multiple sections or non-contiguous
    // ranges of code then emit a DW_AT_ranges attribute on the unit that will
    // remain in the .o file, otherwise add a DW_AT_low_pc.
    // FIXME: We should use ranges to allow reordering of code, as with
    // .subsections_via_symbols in mach-o. This would mean turning on
    // ranges for all subprogram DIEs for mach-o.
    DwarfCompileUnit &U = SkCU ? *SkCU : TheCU;
    if (unsigned NumRanges = TheCU.getRanges().size()) {
      if (NumRanges > 1)
        // A DW_AT_low_pc attribute may also be specified in combination with
        // DW_AT_ranges to specify the default base address for use in
        // location lists (see Section 2.6.2) and range lists (see Section
        // 2.17.3).
        U.addUInt(U.getUnitDie(), dwarf::DW_AT_low_pc, dwarf::DW_FORM_addr, 0);
      else
        U.setBaseAddress(TheCU.getRanges().front().getStart());
      U.attachRangesOrLowHighPC(U.getUnitDie(), TheCU.takeRanges());
    }

    auto *CUNode = cast<DICompileUnit>(P.first);
    // If the compile unit has macros, emit the "DW_AT_macro_info" attribute.
    if (CUNode->getMacros())
      U.addSectionLabel(U.getUnitDie(), dwarf::DW_AT_macro_info,
                        U.getMacroLabelBegin(),
                        TLOF.getDwarfMacinfoSection()->getBeginSymbol());
  }

  // Compute DIE offsets and sizes.
  InfoHolder.computeSizeAndOffsets();
  if (useSplitDwarf())
    SkeletonHolder.computeSizeAndOffsets();
}

// Emit all Dwarf sections that should come after the content.
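// The sections below are emitted in a fixed order; the .dwo variants and the
// accelerator tables are produced only when split dwarf or accelerator tables,
// respectively, are enabled.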
void DwarfDebug::endModule() {
  assert(CurFn == nullptr);
  assert(CurMI == nullptr);

  // If we aren't actually generating debug info (check beginModule -
  // conditionalized on !DisableDebugInfoPrinting and the presence of the
  // llvm.dbg.cu metadata node)
  if (!MMI->hasDebugInfo())
    return;

  // Finalize the debug info for the module.
  finalizeModuleInfo();

  emitDebugStr();

  if (useSplitDwarf())
    emitDebugLocDWO();
  else
    // Emit info into a debug loc section.
    emitDebugLoc();

  // Emit the corresponding abbreviations into an abbrev section.
  emitAbbreviations();

  // Emit all the DIEs into a debug info section.
  emitDebugInfo();

  // Emit info into a debug aranges section.
  if (GenerateARangeSection)
    emitDebugARanges();

  // Emit info into a debug ranges section.
  emitDebugRanges();

  // Emit info into a debug macinfo section.
  emitDebugMacinfo();

  if (useSplitDwarf()) {
    emitDebugStrDWO();
    emitDebugInfoDWO();
    emitDebugAbbrevDWO();
    emitDebugLineDWO();
    // Emit DWO addresses.
    AddrPool.emit(*Asm, Asm->getObjFileLowering().getDwarfAddrSection());
  }

  // Emit info into the dwarf accelerator table sections.
  if (useDwarfAccelTables()) {
    emitAccelNames();
    emitAccelObjC();
    emitAccelNamespaces();
    emitAccelTypes();
  }

  // Emit the pubnames and pubtypes sections if requested.
  if (HasDwarfPubSections) {
    emitDebugPubNames(GenerateGnuPubSections);
    emitDebugPubTypes(GenerateGnuPubSections);
  }

  // Clean up.
  AbstractVariables.clear();
}

// Find abstract variable, if any, associated with Var.
DbgVariable *
DwarfDebug::getExistingAbstractVariable(InlinedVariable IV,
                                        const DILocalVariable *&Cleansed) {
  // More than one inlined variable corresponds to one abstract variable.
  Cleansed = IV.first;
  auto I = AbstractVariables.find(Cleansed);
  if (I != AbstractVariables.end())
    return I->second.get();
  return nullptr;
}

DbgVariable *DwarfDebug::getExistingAbstractVariable(InlinedVariable IV) {
  const DILocalVariable *Cleansed;
  return getExistingAbstractVariable(IV, Cleansed);
}

void DwarfDebug::createAbstractVariable(const DILocalVariable *Var,
                                        LexicalScope *Scope) {
  auto AbsDbgVariable = make_unique<DbgVariable>(Var, /* IA */ nullptr);
  InfoHolder.addScopeVariable(Scope, AbsDbgVariable.get());
  AbstractVariables[Var] = std::move(AbsDbgVariable);
}

void DwarfDebug::ensureAbstractVariableIsCreated(InlinedVariable IV,
                                                 const MDNode *ScopeNode) {
  const DILocalVariable *Cleansed = nullptr;
  if (getExistingAbstractVariable(IV, Cleansed))
    return;

  createAbstractVariable(Cleansed, LScopes.getOrCreateAbstractScope(
                                       cast<DILocalScope>(ScopeNode)));
}

void DwarfDebug::ensureAbstractVariableIsCreatedIfScoped(
    InlinedVariable IV, const MDNode *ScopeNode) {
  const DILocalVariable *Cleansed = nullptr;
  if (getExistingAbstractVariable(IV, Cleansed))
    return;

  if (LexicalScope *Scope =
          LScopes.findAbstractScope(cast_or_null<DILocalScope>(ScopeNode)))
    createAbstractVariable(Cleansed, Scope);
}

// Collect variable information from the side table maintained by MF.
void DwarfDebug::collectVariableInfoFromMFTable(
    DenseSet<InlinedVariable> &Processed) {
  for (const auto &VI : Asm->MF->getVariableDbgInfo()) {
    if (!VI.Var)
      continue;
    assert(VI.Var->isValidLocationForIntrinsic(VI.Loc) &&
           "Expected inlined-at fields to agree");

    InlinedVariable Var(VI.Var, VI.Loc->getInlinedAt());
    Processed.insert(Var);
    LexicalScope *Scope = LScopes.findLexicalScope(VI.Loc);

    // If the variable's scope is not found, skip this variable.
    if (!Scope)
      continue;

    ensureAbstractVariableIsCreatedIfScoped(Var, Scope->getScopeNode());
    auto RegVar = make_unique<DbgVariable>(Var.first, Var.second);
    RegVar->initializeMMI(VI.Expr, VI.Slot);
    if (InfoHolder.addScopeVariable(Scope, RegVar.get()))
      ConcreteVariables.push_back(std::move(RegVar));
  }
}

// Get .debug_loc entry for the instruction range starting at MI.
static DebugLocEntry::Value getDebugLocValue(const MachineInstr *MI) {
  const DIExpression *Expr = MI->getDebugExpression();

  assert(MI->getNumOperands() == 4);
  if (MI->getOperand(0).isReg()) {
    MachineLocation MLoc;
    // If the second operand is an immediate, this is a
    // register-indirect address.
    if (!MI->getOperand(1).isImm())
      MLoc.set(MI->getOperand(0).getReg());
    else
      MLoc.set(MI->getOperand(0).getReg(), MI->getOperand(1).getImm());
    return DebugLocEntry::Value(Expr, MLoc);
  }
  if (MI->getOperand(0).isImm())
    return DebugLocEntry::Value(Expr, MI->getOperand(0).getImm());
  if (MI->getOperand(0).isFPImm())
    return DebugLocEntry::Value(Expr, MI->getOperand(0).getFPImm());
  if (MI->getOperand(0).isCImm())
    return DebugLocEntry::Value(Expr, MI->getOperand(0).getCImm());

  llvm_unreachable("Unexpected 4-operand DBG_VALUE instruction!");
}

/// \brief If this and Next are describing different fragments of the same
/// variable, merge them by appending Next's values to the current
/// list of values.
/// Return true if the merge was successful.
bool DebugLocEntry::MergeValues(const DebugLocEntry &Next) {
  if (Begin == Next.Begin) {
    auto *FirstExpr = cast<DIExpression>(Values[0].Expression);
    auto *FirstNextExpr = cast<DIExpression>(Next.Values[0].Expression);
    if (!FirstExpr->isFragment() || !FirstNextExpr->isFragment())
      return false;

    // We can only merge entries if none of the fragments overlap any others.
    // In doing so, we can take advantage of the fact that both lists are
    // sorted.
    for (unsigned i = 0, j = 0; i < Values.size(); ++i) {
      for (; j < Next.Values.size(); ++j) {
        int res = DebugHandlerBase::fragmentCmp(
            cast<DIExpression>(Values[i].Expression),
            cast<DIExpression>(Next.Values[j].Expression));
        if (res == 0) // The two expressions overlap, we can't merge.
          return false;
        // Values[i] is entirely before Next.Values[j],
        // so go back to the next entry of Values.
        else if (res == -1)
          break;
        // Next.Values[j] is entirely before Values[i], so go on to the
        // next entry of Next.Values.
      }
    }

    addValues(Next.Values);
    End = Next.End;
    return true;
  }
  return false;
}

/// Build the location list for all DBG_VALUEs in the function that
/// describe the same variable. If the ranges of several independent
/// fragments of the same variable overlap partially, split them up and
/// combine the ranges. The resulting DebugLocEntries will have strictly
/// monotonically increasing begin addresses and will never overlap.
//
// Input:
//
//   Ranges History [var, loc, fragment ofs size]
// 0 |      [x, (reg0, fragment 0, 32)]
// 1 | |    [x, (reg1, fragment 32, 32)] <- IsFragmentOfPrevEntry
// 2 | |    ...
// 3   |    [clobber reg0]
// 4        [x, (mem, fragment 0, 64)] <- overlapping with both previous
//                                        fragments of x.
// // Output: // // [0-1] [x, (reg0, fragment 0, 32)] // [1-3] [x, (reg0, fragment 0, 32), (reg1, fragment 32, 32)] // [3-4] [x, (reg1, fragment 32, 32)] // [4- ] [x, (mem, fragment 0, 64)] void DwarfDebug::buildLocationList(SmallVectorImpl &DebugLoc, const DbgValueHistoryMap::InstrRanges &Ranges) { SmallVector OpenRanges; for (auto I = Ranges.begin(), E = Ranges.end(); I != E; ++I) { const MachineInstr *Begin = I->first; const MachineInstr *End = I->second; assert(Begin->isDebugValue() && "Invalid History entry"); // Check if a variable is inaccessible in this range. if (Begin->getNumOperands() > 1 && Begin->getOperand(0).isReg() && !Begin->getOperand(0).getReg()) { OpenRanges.clear(); continue; } // If this fragment overlaps with any open ranges, truncate them. const DIExpression *DIExpr = Begin->getDebugExpression(); auto Last = remove_if(OpenRanges, [&](DebugLocEntry::Value R) { return fragmentsOverlap(DIExpr, R.getExpression()); }); OpenRanges.erase(Last, OpenRanges.end()); const MCSymbol *StartLabel = getLabelBeforeInsn(Begin); assert(StartLabel && "Forgot label before DBG_VALUE starting a range!"); const MCSymbol *EndLabel; if (End != nullptr) EndLabel = getLabelAfterInsn(End); else if (std::next(I) == Ranges.end()) EndLabel = Asm->getFunctionEnd(); else EndLabel = getLabelBeforeInsn(std::next(I)->first); assert(EndLabel && "Forgot label after instruction ending a range!"); DEBUG(dbgs() << "DotDebugLoc: " << *Begin << "\n"); auto Value = getDebugLocValue(Begin); DebugLocEntry Loc(StartLabel, EndLabel, Value); bool couldMerge = false; // If this is a fragment, it may belong to the current DebugLocEntry. if (DIExpr->isFragment()) { // Add this value to the list of open ranges. OpenRanges.push_back(Value); // Attempt to add the fragment to the last entry. if (!DebugLoc.empty()) if (DebugLoc.back().MergeValues(Loc)) couldMerge = true; } if (!couldMerge) { // Need to add a new DebugLocEntry. Add all values from still // valid non-overlapping fragments. if (OpenRanges.size()) Loc.addValues(OpenRanges); DebugLoc.push_back(std::move(Loc)); } // Attempt to coalesce the ranges of two otherwise identical // DebugLocEntries. auto CurEntry = DebugLoc.rbegin(); DEBUG({ dbgs() << CurEntry->getValues().size() << " Values:\n"; for (auto &Value : CurEntry->getValues()) Value.dump(); dbgs() << "-----\n"; }); auto PrevEntry = std::next(CurEntry); if (PrevEntry != DebugLoc.rend() && PrevEntry->MergeRanges(*CurEntry)) DebugLoc.pop_back(); } } DbgVariable *DwarfDebug::createConcreteVariable(LexicalScope &Scope, InlinedVariable IV) { ensureAbstractVariableIsCreatedIfScoped(IV, Scope.getScopeNode()); ConcreteVariables.push_back(make_unique(IV.first, IV.second)); InfoHolder.addScopeVariable(&Scope, ConcreteVariables.back().get()); return ConcreteVariables.back().get(); } // Determine whether this DBG_VALUE is valid at the beginning of the function. static bool validAtEntry(const MachineInstr *MInsn) { auto MBB = MInsn->getParent(); // Is it in the entry basic block? if (!MBB->pred_empty()) return false; for (MachineBasicBlock::const_reverse_iterator I(MInsn); I != MBB->rend(); ++I) if (!(I->isDebugValue() || I->getFlag(MachineInstr::FrameSetup))) return false; return true; } // Find variables for each lexical scope. void DwarfDebug::collectVariableInfo(DwarfCompileUnit &TheCU, const DISubprogram *SP, DenseSet &Processed) { // Grab the variable info that was squirreled away in the MMI side-table. 
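  // Entries found there (typically variables that live in fixed frame slots)
  // are recorded in Processed first, so the DBG_VALUE walk below does not
  // describe the same variable twice.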
collectVariableInfoFromMFTable(Processed); for (const auto &I : DbgValues) { InlinedVariable IV = I.first; if (Processed.count(IV)) continue; // Instruction ranges, specifying where IV is accessible. const auto &Ranges = I.second; if (Ranges.empty()) continue; LexicalScope *Scope = nullptr; if (const DILocation *IA = IV.second) Scope = LScopes.findInlinedScope(IV.first->getScope(), IA); else Scope = LScopes.findLexicalScope(IV.first->getScope()); // If variable scope is not found then skip this variable. if (!Scope) continue; Processed.insert(IV); DbgVariable *RegVar = createConcreteVariable(*Scope, IV); const MachineInstr *MInsn = Ranges.front().first; assert(MInsn->isDebugValue() && "History must begin with debug value"); // Check if there is a single DBG_VALUE, valid throughout the function. // A single constant is also considered valid for the entire function. if (Ranges.size() == 1 && (MInsn->getOperand(0).isImm() || (validAtEntry(MInsn) && Ranges.front().second == nullptr))) { RegVar->initializeDbgValue(MInsn); continue; } // Handle multiple DBG_VALUE instructions describing one variable. DebugLocStream::ListBuilder List(DebugLocs, TheCU, *Asm, *RegVar, *MInsn); // Build the location list for this variable. SmallVector Entries; buildLocationList(Entries, Ranges); // If the variable has a DIBasicType, extract it. Basic types cannot have // unique identifiers, so don't bother resolving the type with the // identifier map. const DIBasicType *BT = dyn_cast( static_cast(IV.first->getType())); // Finalize the entry by lowering it into a DWARF bytestream. for (auto &Entry : Entries) Entry.finalize(*Asm, List, BT); } // Collect info for variables that were optimized out. for (const DILocalVariable *DV : SP->getVariables()) { if (Processed.insert(InlinedVariable(DV, nullptr)).second) if (LexicalScope *Scope = LScopes.findLexicalScope(DV->getScope())) createConcreteVariable(*Scope, InlinedVariable(DV, nullptr)); } } // Process beginning of an instruction. void DwarfDebug::beginInstruction(const MachineInstr *MI) { DebugHandlerBase::beginInstruction(MI); assert(CurMI); // Check if source location changes, but ignore DBG_VALUE and CFI locations. if (MI->isDebugValue() || MI->isCFIInstruction()) return; const DebugLoc &DL = MI->getDebugLoc(); // When we emit a line-0 record, we don't update PrevInstLoc; so look at // the last line number actually emitted, to see if it was line 0. unsigned LastAsmLine = Asm->OutStreamer->getContext().getCurrentDwarfLoc().getLine(); if (DL == PrevInstLoc) { // If we have an ongoing unspecified location, nothing to do here. if (!DL) return; // We have an explicit location, same as the previous location. // But we might be coming back to it after a line 0 record. if (LastAsmLine == 0 && DL.getLine() != 0) { // Reinstate the source location but not marked as a statement. const MDNode *Scope = DL.getScope(); recordSourceLine(DL.getLine(), DL.getCol(), Scope, /*Flags=*/0); } return; } if (!DL) { // We have an unspecified location, which might want to be line 0. // If we have already emitted a line-0 record, don't repeat it. if (LastAsmLine == 0) return; // If user said Don't Do That, don't do that. if (UnknownLocations == Disable) return; // See if we have a reason to emit a line-0 record now. // Reasons to emit a line-0 record include: // - User asked for it (UnknownLocations). // - Instruction has a label, so it's referenced from somewhere else, // possibly debug information; we want it to have a source location. 
// - Instruction is at the top of a block; we don't want to inherit the // location from the physically previous (maybe unrelated) block. if (UnknownLocations == Enable || PrevLabel || (PrevInstBB && PrevInstBB != MI->getParent())) { // Preserve the file and column numbers, if we can, to save space in // the encoded line table. // Do not update PrevInstLoc, it remembers the last non-0 line. const MDNode *Scope = nullptr; unsigned Column = 0; if (PrevInstLoc) { Scope = PrevInstLoc.getScope(); Column = PrevInstLoc.getCol(); } recordSourceLine(/*Line=*/0, Column, Scope, /*Flags=*/0); } return; } // We have an explicit location, different from the previous location. // Don't repeat a line-0 record, but otherwise emit the new location. // (The new location might be an explicit line 0, which we do emit.) if (PrevInstLoc && DL.getLine() == 0 && LastAsmLine == 0) return; unsigned Flags = 0; if (DL == PrologEndLoc) { Flags |= DWARF2_FLAG_PROLOGUE_END | DWARF2_FLAG_IS_STMT; PrologEndLoc = DebugLoc(); } // If the line changed, we call that a new statement; unless we went to // line 0 and came back, in which case it is not a new statement. unsigned OldLine = PrevInstLoc ? PrevInstLoc.getLine() : LastAsmLine; if (DL.getLine() && DL.getLine() != OldLine) Flags |= DWARF2_FLAG_IS_STMT; const MDNode *Scope = DL.getScope(); recordSourceLine(DL.getLine(), DL.getCol(), Scope, Flags); // If we're not at line 0, remember this location. if (DL.getLine()) PrevInstLoc = DL; } static DebugLoc findPrologueEndLoc(const MachineFunction *MF) { // First known non-DBG_VALUE and non-frame setup location marks // the beginning of the function body. for (const auto &MBB : *MF) for (const auto &MI : MBB) if (!MI.isDebugValue() && !MI.getFlag(MachineInstr::FrameSetup) && MI.getDebugLoc()) return MI.getDebugLoc(); return DebugLoc(); } // Gather pre-function debug information. Assumes being called immediately // after the function entry point has been emitted. void DwarfDebug::beginFunction(const MachineFunction *MF) { CurFn = MF; // If there's no debug info for the function we're not going to do anything. if (!MMI->hasDebugInfo()) return; auto DI = MF->getFunction()->getSubprogram(); if (!DI) return; // Grab the lexical scopes for the function, if we don't have any of those // then we're not going to be able to do anything. DebugHandlerBase::beginFunction(MF); if (LScopes.empty()) return; // Set DwarfDwarfCompileUnitID in MCContext to the Compile Unit this function // belongs to so that we add to the correct per-cu line table in the // non-asm case. LexicalScope *FnScope = LScopes.getCurrentFunctionScope(); // FnScope->getScopeNode() and DI->second should represent the same function, // though they may not be the same MDNode due to inline functions merged in // LTO where the debug info metadata still differs (either due to distinct // written differences - two versions of a linkonce_odr function // written/copied into two separate files, or some sub-optimal metadata that // isn't structurally identical (see: file path/name info from clang, which // includes the directory of the cpp file being built, even when the file name // is absolute (such as an <> lookup header))) auto *SP = cast(FnScope->getScopeNode()); DwarfCompileUnit *TheCU = CUMap.lookup(SP->getUnit()); if (!TheCU) { assert(SP->getUnit()->getEmissionKind() == DICompileUnit::NoDebug && "DICompileUnit missing from llvm.dbg.cu?"); return; } if (Asm->OutStreamer->hasRawTextSupport()) // Use a single line table if we are generating assembly. 
Asm->OutStreamer->getContext().setDwarfCompileUnitID(0); else Asm->OutStreamer->getContext().setDwarfCompileUnitID(TheCU->getUniqueID()); // Record beginning of function. PrologEndLoc = findPrologueEndLoc(MF); if (DILocation *L = PrologEndLoc) { // We'd like to list the prologue as "not statements" but GDB behaves // poorly if we do that. Revisit this with caution/GDB (7.5+) testing. auto *SP = L->getInlinedAtScope()->getSubprogram(); recordSourceLine(SP->getScopeLine(), 0, SP, DWARF2_FLAG_IS_STMT); } } // Gather and emit post-function debug information. void DwarfDebug::endFunction(const MachineFunction *MF) { assert(CurFn == MF && "endFunction should be called with the same function as beginFunction"); const DISubprogram *SP = MF->getFunction()->getSubprogram(); if (!MMI->hasDebugInfo() || !SP || SP->getUnit()->getEmissionKind() == DICompileUnit::NoDebug) { // If we don't have a subprogram for this function then there will be a hole // in the range information. Keep note of this by setting the previously // used section to nullptr. PrevCU = nullptr; CurFn = nullptr; DebugHandlerBase::endFunction(MF); return; } // Set DwarfDwarfCompileUnitID in MCContext to default value. Asm->OutStreamer->getContext().setDwarfCompileUnitID(0); LexicalScope *FnScope = LScopes.getCurrentFunctionScope(); assert(!FnScope || SP == FnScope->getScopeNode()); DwarfCompileUnit &TheCU = *CUMap.lookup(SP->getUnit()); DenseSet ProcessedVars; collectVariableInfo(TheCU, SP, ProcessedVars); // Add the range of this function to the list of ranges for the CU. TheCU.addRange(RangeSpan(Asm->getFunctionBegin(), Asm->getFunctionEnd())); // Under -gmlt, skip building the subprogram if there are no inlined // subroutines inside it. if (TheCU.getCUNode()->getEmissionKind() == DICompileUnit::LineTablesOnly && LScopes.getAbstractScopesList().empty() && !IsDarwin) { assert(InfoHolder.getScopeVariables().empty()); assert(DbgValues.empty()); // FIXME: This wouldn't be true in LTO with a -g (with inlining) CU followed // by a -gmlt CU. Add a test and remove this assertion. assert(AbstractVariables.empty()); PrevLabel = nullptr; CurFn = nullptr; DebugHandlerBase::endFunction(MF); return; } #ifndef NDEBUG size_t NumAbstractScopes = LScopes.getAbstractScopesList().size(); #endif // Construct abstract scopes. for (LexicalScope *AScope : LScopes.getAbstractScopesList()) { auto *SP = cast(AScope->getScopeNode()); // Collect info for variables that were optimized out. for (const DILocalVariable *DV : SP->getVariables()) { if (!ProcessedVars.insert(InlinedVariable(DV, nullptr)).second) continue; ensureAbstractVariableIsCreated(InlinedVariable(DV, nullptr), DV->getScope()); assert(LScopes.getAbstractScopesList().size() == NumAbstractScopes && "ensureAbstractVariableIsCreated inserted abstract scopes"); } constructAbstractSubprogramScopeDIE(AScope); } ProcessedSPNodes.insert(SP); TheCU.constructSubprogramScopeDIE(SP, FnScope); if (auto *SkelCU = TheCU.getSkeleton()) if (!LScopes.getAbstractScopesList().empty() && TheCU.getCUNode()->getSplitDebugInlining()) SkelCU->constructSubprogramScopeDIE(SP, FnScope); // Clear debug info // Ownership of DbgVariables is a bit subtle - ScopeVariables owns all the // DbgVariables except those that are also in AbstractVariables (since they // can be used cross-function) InfoHolder.getScopeVariables().clear(); PrevLabel = nullptr; CurFn = nullptr; DebugHandlerBase::endFunction(MF); } // Register a source line with debug info. 
// The emitted .loc directive provides correspondence to the source line list.
void DwarfDebug::recordSourceLine(unsigned Line, unsigned Col, const MDNode *S,
                                  unsigned Flags) {
  StringRef Fn;
  StringRef Dir;
  unsigned Src = 1;
  unsigned Discriminator = 0;
  if (auto *Scope = cast_or_null<DIScope>(S)) {
    Fn = Scope->getFilename();
    Dir = Scope->getDirectory();
    if (auto *LBF = dyn_cast<DILexicalBlockFile>(Scope))
      if (getDwarfVersion() >= 4)
        Discriminator = LBF->getDiscriminator();

    unsigned CUID = Asm->OutStreamer->getContext().getDwarfCompileUnitID();
    Src = static_cast<DwarfCompileUnit &>(*InfoHolder.getUnits()[CUID])
              .getOrCreateSourceID(Fn, Dir);
  }
  Asm->OutStreamer->EmitDwarfLocDirective(Src, Line, Col, Flags, 0,
                                          Discriminator, Fn);
}

//===----------------------------------------------------------------------===//
// Emit Methods
//===----------------------------------------------------------------------===//

// Emit the debug info section.
void DwarfDebug::emitDebugInfo() {
  DwarfFile &Holder = useSplitDwarf() ? SkeletonHolder : InfoHolder;
  Holder.emitUnits(/* UseOffsets */ false);
}

// Emit the abbreviation section.
void DwarfDebug::emitAbbreviations() {
  DwarfFile &Holder = useSplitDwarf() ? SkeletonHolder : InfoHolder;

  Holder.emitAbbrevs(Asm->getObjFileLowering().getDwarfAbbrevSection());
}

void DwarfDebug::emitAccel(DwarfAccelTable &Accel, MCSection *Section,
                           StringRef TableName) {
  Accel.FinalizeTable(Asm, TableName);
  Asm->OutStreamer->SwitchSection(Section);

  // Emit the full data.
  Accel.emit(Asm, Section->getBeginSymbol(), this);
}

// Emit visible names into a hashed accelerator table section.
void DwarfDebug::emitAccelNames() {
  emitAccel(AccelNames, Asm->getObjFileLowering().getDwarfAccelNamesSection(),
            "Names");
}

// Emit Objective-C classes and categories into a hashed accelerator table
// section.
void DwarfDebug::emitAccelObjC() {
  emitAccel(AccelObjC, Asm->getObjFileLowering().getDwarfAccelObjCSection(),
            "ObjC");
}

// Emit namespace dies into a hashed accelerator table.
void DwarfDebug::emitAccelNamespaces() {
  emitAccel(AccelNamespace,
            Asm->getObjFileLowering().getDwarfAccelNamespaceSection(),
            "namespac");
}

// Emit type dies into a hashed accelerator table.
void DwarfDebug::emitAccelTypes() {
  emitAccel(AccelTypes, Asm->getObjFileLowering().getDwarfAccelTypesSection(),
            "types");
}

// Public name handling.
// The format for the various pubnames:
//
// dwarf pubnames - offset/name pairs where the offset is the offset into the CU
// for the DIE that is named.
//
// gnu pubnames - offset/index value/name tuples where the offset is the offset
// into the CU and the index value is computed according to the type of value
// for the DIE that is named.
//
// For type units the offset is the offset of the skeleton DIE. For split dwarf
// it's the offset within the debug_info/debug_types dwo section, however, the
// reference in the pubname header doesn't change.

/// computeIndexValue - Compute the gdb index value for the DIE and CU.
static dwarf::PubIndexEntryDescriptor computeIndexValue(DwarfUnit *CU,
                                                        const DIE *Die) {
  dwarf::GDBIndexEntryLinkage Linkage = dwarf::GIEL_STATIC;

  // We could have a specification DIE that holds most of our knowledge,
  // look for that now.
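  // For example, a C++ method defined out of line reaches DW_AT_external
  // through its specification DIE and is classified below as
  // (GIEK_FUNCTION, GIEL_EXTERNAL), while a file-static function keeps the
  // default GIEL_STATIC linkage.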
if (DIEValue SpecVal = Die->findAttribute(dwarf::DW_AT_specification)) { DIE &SpecDIE = SpecVal.getDIEEntry().getEntry(); if (SpecDIE.findAttribute(dwarf::DW_AT_external)) Linkage = dwarf::GIEL_EXTERNAL; } else if (Die->findAttribute(dwarf::DW_AT_external)) Linkage = dwarf::GIEL_EXTERNAL; switch (Die->getTag()) { case dwarf::DW_TAG_class_type: case dwarf::DW_TAG_structure_type: case dwarf::DW_TAG_union_type: case dwarf::DW_TAG_enumeration_type: return dwarf::PubIndexEntryDescriptor( dwarf::GIEK_TYPE, CU->getLanguage() != dwarf::DW_LANG_C_plus_plus ? dwarf::GIEL_STATIC : dwarf::GIEL_EXTERNAL); case dwarf::DW_TAG_typedef: case dwarf::DW_TAG_base_type: case dwarf::DW_TAG_subrange_type: return dwarf::PubIndexEntryDescriptor(dwarf::GIEK_TYPE, dwarf::GIEL_STATIC); case dwarf::DW_TAG_namespace: return dwarf::GIEK_TYPE; case dwarf::DW_TAG_subprogram: return dwarf::PubIndexEntryDescriptor(dwarf::GIEK_FUNCTION, Linkage); case dwarf::DW_TAG_variable: return dwarf::PubIndexEntryDescriptor(dwarf::GIEK_VARIABLE, Linkage); case dwarf::DW_TAG_enumerator: return dwarf::PubIndexEntryDescriptor(dwarf::GIEK_VARIABLE, dwarf::GIEL_STATIC); default: return dwarf::GIEK_NONE; } } /// emitDebugPubNames - Emit visible names into a debug pubnames section. /// void DwarfDebug::emitDebugPubNames(bool GnuStyle) { MCSection *PSec = GnuStyle ? Asm->getObjFileLowering().getDwarfGnuPubNamesSection() : Asm->getObjFileLowering().getDwarfPubNamesSection(); emitDebugPubSection(GnuStyle, PSec, "Names", &DwarfCompileUnit::getGlobalNames); } void DwarfDebug::emitDebugPubSection( bool GnuStyle, MCSection *PSec, StringRef Name, const StringMap &(DwarfCompileUnit::*Accessor)() const) { for (const auto &NU : CUMap) { DwarfCompileUnit *TheU = NU.second; const auto &Globals = (TheU->*Accessor)(); if (Globals.empty()) continue; if (auto *Skeleton = TheU->getSkeleton()) TheU = Skeleton; // Start the dwarf pubnames section. Asm->OutStreamer->SwitchSection(PSec); // Emit the header. Asm->OutStreamer->AddComment("Length of Public " + Name + " Info"); MCSymbol *BeginLabel = Asm->createTempSymbol("pub" + Name + "_begin"); MCSymbol *EndLabel = Asm->createTempSymbol("pub" + Name + "_end"); Asm->EmitLabelDifference(EndLabel, BeginLabel, 4); Asm->OutStreamer->EmitLabel(BeginLabel); Asm->OutStreamer->AddComment("DWARF Version"); Asm->EmitInt16(dwarf::DW_PUBNAMES_VERSION); Asm->OutStreamer->AddComment("Offset of Compilation Unit Info"); Asm->emitDwarfSymbolReference(TheU->getLabelBegin()); Asm->OutStreamer->AddComment("Compilation Unit Length"); Asm->EmitInt32(TheU->getLength()); // Emit the pubnames for this compilation unit. for (const auto &GI : Globals) { const char *Name = GI.getKeyData(); const DIE *Entity = GI.second; Asm->OutStreamer->AddComment("DIE offset"); Asm->EmitInt32(Entity->getOffset()); if (GnuStyle) { dwarf::PubIndexEntryDescriptor Desc = computeIndexValue(TheU, Entity); Asm->OutStreamer->AddComment( Twine("Kind: ") + dwarf::GDBIndexEntryKindString(Desc.Kind) + ", " + dwarf::GDBIndexEntryLinkageString(Desc.Linkage)); Asm->EmitInt8(Desc.toBits()); } Asm->OutStreamer->AddComment("External Name"); Asm->OutStreamer->EmitBytes(StringRef(Name, GI.getKeyLength() + 1)); } Asm->OutStreamer->AddComment("End Mark"); Asm->EmitInt32(0); Asm->OutStreamer->EmitLabel(EndLabel); } } void DwarfDebug::emitDebugPubTypes(bool GnuStyle) { MCSection *PSec = GnuStyle ? 
Asm->getObjFileLowering().getDwarfGnuPubTypesSection() : Asm->getObjFileLowering().getDwarfPubTypesSection(); emitDebugPubSection(GnuStyle, PSec, "Types", &DwarfCompileUnit::getGlobalTypes); } /// Emit null-terminated strings into a debug str section. void DwarfDebug::emitDebugStr() { DwarfFile &Holder = useSplitDwarf() ? SkeletonHolder : InfoHolder; Holder.emitStrings(Asm->getObjFileLowering().getDwarfStrSection()); } void DwarfDebug::emitDebugLocEntry(ByteStreamer &Streamer, const DebugLocStream::Entry &Entry) { auto &&Comments = DebugLocs.getComments(Entry); auto Comment = Comments.begin(); auto End = Comments.end(); for (uint8_t Byte : DebugLocs.getBytes(Entry)) Streamer.EmitInt8(Byte, Comment != End ? *(Comment++) : ""); } static void emitDebugLocValue(const AsmPrinter &AP, const DIBasicType *BT, ByteStreamer &Streamer, const DebugLocEntry::Value &Value, DwarfExpression &DwarfExpr) { DIExpressionCursor ExprCursor(Value.getExpression()); DwarfExpr.addFragmentOffset(Value.getExpression()); // Regular entry. if (Value.isInt()) { if (BT && (BT->getEncoding() == dwarf::DW_ATE_signed || BT->getEncoding() == dwarf::DW_ATE_signed_char)) DwarfExpr.AddSignedConstant(Value.getInt()); else DwarfExpr.AddUnsignedConstant(Value.getInt()); } else if (Value.isLocation()) { MachineLocation Loc = Value.getLoc(); const TargetRegisterInfo &TRI = *AP.MF->getSubtarget().getRegisterInfo(); if (Loc.getOffset()) DwarfExpr.AddMachineRegIndirect(TRI, Loc.getReg(), Loc.getOffset()); else DwarfExpr.AddMachineRegExpression(TRI, ExprCursor, Loc.getReg()); } else if (Value.isConstantFP()) { APInt RawBytes = Value.getConstantFP()->getValueAPF().bitcastToAPInt(); DwarfExpr.AddUnsignedConstant(RawBytes); } DwarfExpr.AddExpression(std::move(ExprCursor)); } void DebugLocEntry::finalize(const AsmPrinter &AP, DebugLocStream::ListBuilder &List, const DIBasicType *BT) { DebugLocStream::EntryBuilder Entry(List, Begin, End); BufferByteStreamer Streamer = Entry.getStreamer(); DebugLocDwarfExpression DwarfExpr(AP.getDwarfVersion(), Streamer); const DebugLocEntry::Value &Value = Values[0]; if (Value.isFragment()) { // Emit all fragments that belong to the same variable and range. assert(all_of(Values, [](DebugLocEntry::Value P) { return P.isFragment(); }) && "all values are expected to be fragments"); assert(std::is_sorted(Values.begin(), Values.end()) && "fragments are expected to be sorted"); for (auto Fragment : Values) emitDebugLocValue(AP, BT, Streamer, Fragment, DwarfExpr); } else { assert(Values.size() == 1 && "only fragments may have >1 value"); emitDebugLocValue(AP, BT, Streamer, Value, DwarfExpr); } DwarfExpr.finalize(); } void DwarfDebug::emitDebugLocEntryLocation(const DebugLocStream::Entry &Entry) { // Emit the size. Asm->OutStreamer->AddComment("Loc expr size"); Asm->EmitInt16(DebugLocs.getBytes(Entry).size()); // Emit the entry. APByteStreamer Streamer(*Asm); emitDebugLocEntry(Streamer, Entry); } // Emit locations into the debug loc section. void DwarfDebug::emitDebugLoc() { // Start the dwarf loc section. Asm->OutStreamer->SwitchSection( Asm->getObjFileLowering().getDwarfLocSection()); unsigned char Size = Asm->getDataLayout().getPointerSize(); for (const auto &List : DebugLocs.getLists()) { Asm->OutStreamer->EmitLabel(List.Label); const DwarfCompileUnit *CU = List.CU; for (const auto &Entry : DebugLocs.getEntries(List)) { // Set up the range. This range is relative to the entry point of the // compile unit. 
      // This is a hardcoded 0 for low_pc when we're emitting
      // ranges, or the DW_AT_low_pc on the compile unit otherwise.
      if (auto *Base = CU->getBaseAddress()) {
        Asm->EmitLabelDifference(Entry.BeginSym, Base, Size);
        Asm->EmitLabelDifference(Entry.EndSym, Base, Size);
      } else {
        Asm->OutStreamer->EmitSymbolValue(Entry.BeginSym, Size);
        Asm->OutStreamer->EmitSymbolValue(Entry.EndSym, Size);
      }

      emitDebugLocEntryLocation(Entry);
    }
    Asm->OutStreamer->EmitIntValue(0, Size);
    Asm->OutStreamer->EmitIntValue(0, Size);
  }
}

void DwarfDebug::emitDebugLocDWO() {
  Asm->OutStreamer->SwitchSection(
      Asm->getObjFileLowering().getDwarfLocDWOSection());
  for (const auto &List : DebugLocs.getLists()) {
    Asm->OutStreamer->EmitLabel(List.Label);
    for (const auto &Entry : DebugLocs.getEntries(List)) {
      // Just always use start_length for now - at least that's one address
      // rather than two. We could get fancier and try to, say, reuse an
      // address we know we've emitted elsewhere (the start of the function?
      // The start of the CU or CU subrange that encloses this range?)
      Asm->EmitInt8(dwarf::DW_LLE_startx_length);
      unsigned idx = AddrPool.getIndex(Entry.BeginSym);
      Asm->EmitULEB128(idx);
      Asm->EmitLabelDifference(Entry.EndSym, Entry.BeginSym, 4);

      emitDebugLocEntryLocation(Entry);
    }
    Asm->EmitInt8(dwarf::DW_LLE_end_of_list);
  }
}

struct ArangeSpan {
  const MCSymbol *Start, *End;
};

// Emit a debug aranges section, containing a CU lookup for any
// address we can tie back to a CU.
void DwarfDebug::emitDebugARanges() {
  // Provides a unique id per text section.
  MapVector<MCSection *, SmallVector<SymbolCU, 4>> SectionMap;

  // Filter labels by section.
  for (const SymbolCU &SCU : ArangeLabels) {
    if (SCU.Sym->isInSection()) {
      // Make a note of this symbol and its section.
      MCSection *Section = &SCU.Sym->getSection();
      if (!Section->getKind().isMetadata())
        SectionMap[Section].push_back(SCU);
    } else {
      // Some symbols (e.g. common/bss on mach-o) can have no section but still
      // appear in the output. This sucks as we rely on sections to build
      // arange spans. We can do it without, but it's icky.
      SectionMap[nullptr].push_back(SCU);
    }
  }

  DenseMap<DwarfCompileUnit *, std::vector<ArangeSpan>> Spans;

  for (auto &I : SectionMap) {
    MCSection *Section = I.first;
    SmallVector<SymbolCU, 4> &List = I.second;
    if (List.size() < 1)
      continue;

    // If we have no section (e.g. common), just write out
    // individual spans for each symbol.
    if (!Section) {
      for (const SymbolCU &Cur : List) {
        ArangeSpan Span;
        Span.Start = Cur.Sym;
        Span.End = nullptr;
        assert(Cur.CU);
        Spans[Cur.CU].push_back(Span);
      }
      continue;
    }

    // Sort the symbols by offset within the section.
    std::sort(
        List.begin(), List.end(), [&](const SymbolCU &A, const SymbolCU &B) {
          unsigned IA = A.Sym ? Asm->OutStreamer->GetSymbolOrder(A.Sym) : 0;
          unsigned IB = B.Sym ? Asm->OutStreamer->GetSymbolOrder(B.Sym) : 0;

          // Symbols with no order assigned should be placed at the end.
          // (e.g. section end labels)
          if (IA == 0)
            return false;
          if (IB == 0)
            return true;
          return IA < IB;
        });

    // Insert a final terminator.
    List.push_back(SymbolCU(nullptr, Asm->OutStreamer->endSection(Section)));

    // Build spans between each label.
    const MCSymbol *StartSym = List[0].Sym;
    for (size_t n = 1, e = List.size(); n < e; n++) {
      const SymbolCU &Prev = List[n - 1];
      const SymbolCU &Cur = List[n];

      // Try to build the longest span we can within the same CU.
      if (Cur.CU != Prev.CU) {
        ArangeSpan Span;
        Span.Start = StartSym;
        Span.End = Cur.Sym;
        assert(Prev.CU);
        Spans[Prev.CU].push_back(Span);
        StartSym = Cur.Sym;
      }
    }
  }

  // Start the dwarf aranges section.
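  // Each arange set emitted below consists of a header (length, version,
  // .debug_info offset, address size, segment size), padding so that the
  // address/length tuples that follow are tuple-aligned, the tuples
  // themselves, and a terminating zero tuple.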
  Asm->OutStreamer->SwitchSection(
      Asm->getObjFileLowering().getDwarfARangesSection());

  unsigned PtrSize = Asm->getDataLayout().getPointerSize();

  // Build a list of CUs used.
  std::vector<DwarfCompileUnit *> CUs;
  for (const auto &it : Spans) {
    DwarfCompileUnit *CU = it.first;
    CUs.push_back(CU);
  }

  // Sort the CU list (again, to ensure consistent output order).
  std::sort(CUs.begin(), CUs.end(),
            [](const DwarfCompileUnit *A, const DwarfCompileUnit *B) {
              return A->getUniqueID() < B->getUniqueID();
            });

  // Emit an arange table for each CU we used.
  for (DwarfCompileUnit *CU : CUs) {
    std::vector<ArangeSpan> &List = Spans[CU];

    // Describe the skeleton CU's offset and length, not the dwo file's.
    if (auto *Skel = CU->getSkeleton())
      CU = Skel;

    // Emit size of content not including length itself.
    unsigned ContentSize =
        sizeof(int16_t) + // DWARF ARange version number
        sizeof(int32_t) + // Offset of CU in the .debug_info section
        sizeof(int8_t) +  // Pointer Size (in bytes)
        sizeof(int8_t);   // Segment Size (in bytes)

    unsigned TupleSize = PtrSize * 2;

    // Section 7.20 of the DWARF spec requires the table to be aligned to a
    // tuple.
    unsigned Padding =
        OffsetToAlignment(sizeof(int32_t) + ContentSize, TupleSize);

    ContentSize += Padding;
    ContentSize += (List.size() + 1) * TupleSize;

    // For each compile unit, write the list of spans it covers.
    Asm->OutStreamer->AddComment("Length of ARange Set");
    Asm->EmitInt32(ContentSize);
    Asm->OutStreamer->AddComment("DWARF Arange version number");
    Asm->EmitInt16(dwarf::DW_ARANGES_VERSION);
    Asm->OutStreamer->AddComment("Offset Into Debug Info Section");
    Asm->emitDwarfSymbolReference(CU->getLabelBegin());
    Asm->OutStreamer->AddComment("Address Size (in bytes)");
    Asm->EmitInt8(PtrSize);
    Asm->OutStreamer->AddComment("Segment Size (in bytes)");
    Asm->EmitInt8(0);

    Asm->OutStreamer->emitFill(Padding, 0xff);

    for (const ArangeSpan &Span : List) {
      Asm->EmitLabelReference(Span.Start, PtrSize);

      // Calculate the size as being from the span start to its end.
      if (Span.End) {
        Asm->EmitLabelDifference(Span.End, Span.Start, PtrSize);
      } else {
        // For symbols without an end marker (e.g. common), we
        // write a single arange entry containing just that one symbol.
        uint64_t Size = SymSize[Span.Start];
        if (Size == 0)
          Size = 1;

        Asm->OutStreamer->EmitIntValue(Size, PtrSize);
      }
    }

    Asm->OutStreamer->AddComment("ARange terminator");
    Asm->OutStreamer->EmitIntValue(0, PtrSize);
    Asm->OutStreamer->EmitIntValue(0, PtrSize);
  }
}

/// Emit address ranges into a debug ranges section.
void DwarfDebug::emitDebugRanges() {
  // Start the dwarf ranges section.
  Asm->OutStreamer->SwitchSection(
      Asm->getObjFileLowering().getDwarfRangesSection());

  // Size for our labels.
  unsigned char Size = Asm->getDataLayout().getPointerSize();

  // Grab the specific ranges for the compile units in the module.
  for (const auto &I : CUMap) {
    DwarfCompileUnit *TheCU = I.second;

    if (auto *Skel = TheCU->getSkeleton())
      TheCU = Skel;

    // Iterate over the misc ranges for the compile units in the module.
    for (const RangeSpanList &List : TheCU->getRangeLists()) {
      // Emit our symbol so we can find the beginning of the range.
Asm->OutStreamer->EmitLabel(List.getSym()); for (const RangeSpan &Range : List.getRanges()) { const MCSymbol *Begin = Range.getStart(); const MCSymbol *End = Range.getEnd(); assert(Begin && "Range without a begin symbol?"); assert(End && "Range without an end symbol?"); if (auto *Base = TheCU->getBaseAddress()) { Asm->EmitLabelDifference(Begin, Base, Size); Asm->EmitLabelDifference(End, Base, Size); } else { Asm->OutStreamer->EmitSymbolValue(Begin, Size); Asm->OutStreamer->EmitSymbolValue(End, Size); } } // And terminate the list with two 0 values. Asm->OutStreamer->EmitIntValue(0, Size); Asm->OutStreamer->EmitIntValue(0, Size); } } } void DwarfDebug::handleMacroNodes(DIMacroNodeArray Nodes, DwarfCompileUnit &U) { for (auto *MN : Nodes) { if (auto *M = dyn_cast(MN)) emitMacro(*M); else if (auto *F = dyn_cast(MN)) emitMacroFile(*F, U); else llvm_unreachable("Unexpected DI type!"); } } void DwarfDebug::emitMacro(DIMacro &M) { Asm->EmitULEB128(M.getMacinfoType()); Asm->EmitULEB128(M.getLine()); StringRef Name = M.getName(); StringRef Value = M.getValue(); Asm->OutStreamer->EmitBytes(Name); if (!Value.empty()) { // There should be one space between macro name and macro value. Asm->EmitInt8(' '); Asm->OutStreamer->EmitBytes(Value); } Asm->EmitInt8('\0'); } void DwarfDebug::emitMacroFile(DIMacroFile &F, DwarfCompileUnit &U) { assert(F.getMacinfoType() == dwarf::DW_MACINFO_start_file); Asm->EmitULEB128(dwarf::DW_MACINFO_start_file); Asm->EmitULEB128(F.getLine()); DIFile *File = F.getFile(); unsigned FID = U.getOrCreateSourceID(File->getFilename(), File->getDirectory()); Asm->EmitULEB128(FID); handleMacroNodes(F.getElements(), U); Asm->EmitULEB128(dwarf::DW_MACINFO_end_file); } /// Emit macros into a debug macinfo section. void DwarfDebug::emitDebugMacinfo() { // Start the dwarf macinfo section. Asm->OutStreamer->SwitchSection( Asm->getObjFileLowering().getDwarfMacinfoSection()); for (const auto &P : CUMap) { auto &TheCU = *P.second; auto *SkCU = TheCU.getSkeleton(); DwarfCompileUnit &U = SkCU ? *SkCU : TheCU; auto *CUNode = cast(P.first); Asm->OutStreamer->EmitLabel(U.getMacroLabelBegin()); handleMacroNodes(CUNode->getMacros(), U); } Asm->OutStreamer->AddComment("End Of Macro List Mark"); Asm->EmitInt8(0); } // DWARF5 Experimental Separate Dwarf emitters. void DwarfDebug::initSkeletonUnit(const DwarfUnit &U, DIE &Die, std::unique_ptr NewU) { NewU->addString(Die, dwarf::DW_AT_GNU_dwo_name, U.getCUNode()->getSplitDebugFilename()); if (!CompilationDir.empty()) NewU->addString(Die, dwarf::DW_AT_comp_dir, CompilationDir); addGnuPubAttributes(*NewU, Die); SkeletonHolder.addUnit(std::move(NewU)); } // This DIE has the following attributes: DW_AT_comp_dir, DW_AT_stmt_list, // DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges, DW_AT_dwo_name, DW_AT_dwo_id, // DW_AT_addr_base, DW_AT_ranges_base. DwarfCompileUnit &DwarfDebug::constructSkeletonCU(const DwarfCompileUnit &CU) { auto OwnedUnit = make_unique( CU.getUniqueID(), CU.getCUNode(), Asm, this, &SkeletonHolder); DwarfCompileUnit &NewCU = *OwnedUnit; NewCU.setSection(Asm->getObjFileLowering().getDwarfInfoSection()); NewCU.initStmtList(); initSkeletonUnit(CU, NewCU.getUnitDie(), std::move(OwnedUnit)); return NewCU; } // Emit the .debug_info.dwo section for separated dwarf. This contains the // compile units that would normally be in debug_info. void DwarfDebug::emitDebugInfoDWO() { assert(useSplitDwarf() && "No split dwarf debug info?"); // Don't emit relocations into the dwo file. 
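  // Passing UseOffsets=true keeps cross-unit references as plain section
  // offsets rather than relocations, since the linker never processes the
  // .dwo file.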
InfoHolder.emitUnits(/* UseOffsets */ true); } // Emit the .debug_abbrev.dwo section for separated dwarf. This contains the // abbreviations for the .debug_info.dwo section. void DwarfDebug::emitDebugAbbrevDWO() { assert(useSplitDwarf() && "No split dwarf?"); InfoHolder.emitAbbrevs(Asm->getObjFileLowering().getDwarfAbbrevDWOSection()); } void DwarfDebug::emitDebugLineDWO() { assert(useSplitDwarf() && "No split dwarf?"); Asm->OutStreamer->SwitchSection( Asm->getObjFileLowering().getDwarfLineDWOSection()); SplitTypeUnitFileTable.Emit(*Asm->OutStreamer, MCDwarfLineTableParams()); } // Emit the .debug_str.dwo section for separated dwarf. This contains the // string section and is identical in format to traditional .debug_str // sections. void DwarfDebug::emitDebugStrDWO() { assert(useSplitDwarf() && "No split dwarf?"); MCSection *OffSec = Asm->getObjFileLowering().getDwarfStrOffDWOSection(); InfoHolder.emitStrings(Asm->getObjFileLowering().getDwarfStrDWOSection(), OffSec); } MCDwarfDwoLineTable *DwarfDebug::getDwoLineTable(const DwarfCompileUnit &CU) { if (!useSplitDwarf()) return nullptr; if (SingleCU) SplitTypeUnitFileTable.setCompilationDir(CU.getCUNode()->getDirectory()); return &SplitTypeUnitFileTable; } uint64_t DwarfDebug::makeTypeSignature(StringRef Identifier) { MD5 Hash; Hash.update(Identifier); // ... take the least significant 8 bytes and return those. Our MD5 // implementation always returns its results in little endian, swap bytes // appropriately. MD5::MD5Result Result; Hash.final(Result); return support::endian::read64le(Result + 8); } void DwarfDebug::addDwarfTypeUnitType(DwarfCompileUnit &CU, StringRef Identifier, DIE &RefDie, const DICompositeType *CTy) { // Fast path if we're building some type units and one has already used the // address pool we know we're going to throw away all this work anyway, so // don't bother building dependent types. if (!TypeUnitsUnderConstruction.empty() && AddrPool.hasBeenUsed()) return; auto Ins = TypeSignatures.insert(std::make_pair(CTy, 0)); if (!Ins.second) { CU.addDIETypeSignature(RefDie, Ins.first->second); return; } bool TopLevelType = TypeUnitsUnderConstruction.empty(); AddrPool.resetUsedFlag(); auto OwnedUnit = make_unique(CU, Asm, this, &InfoHolder, getDwoLineTable(CU)); DwarfTypeUnit &NewTU = *OwnedUnit; DIE &UnitDie = NewTU.getUnitDie(); TypeUnitsUnderConstruction.emplace_back(std::move(OwnedUnit), CTy); NewTU.addUInt(UnitDie, dwarf::DW_AT_language, dwarf::DW_FORM_data2, CU.getLanguage()); uint64_t Signature = makeTypeSignature(Identifier); NewTU.setTypeSignature(Signature); Ins.first->second = Signature; if (useSplitDwarf()) NewTU.setSection(Asm->getObjFileLowering().getDwarfTypesDWOSection()); else { CU.applyStmtList(UnitDie); NewTU.setSection(Asm->getObjFileLowering().getDwarfTypesSection(Signature)); } NewTU.setType(NewTU.createTypeDIE(CTy)); if (TopLevelType) { auto TypeUnitsToAdd = std::move(TypeUnitsUnderConstruction); TypeUnitsUnderConstruction.clear(); // Types referencing entries in the address table cannot be placed in type // units. if (AddrPool.hasBeenUsed()) { // Remove all the types built while building this type. // This is pessimistic as some of these types might not be dependent on // the type that used an address. for (const auto &TU : TypeUnitsToAdd) TypeSignatures.erase(TU.second); // Construct this type in the CU directly. 
// This is inefficient because all the dependent types will be rebuilt // from scratch, including building them in type units, discovering that // they depend on addresses, throwing them out and rebuilding them. CU.constructTypeDIE(RefDie, cast(CTy)); return; } // If the type wasn't dependent on fission addresses, finish adding the type // and all its dependent types. for (auto &TU : TypeUnitsToAdd) { InfoHolder.computeSizeAndOffsetsForUnit(TU.first.get()); InfoHolder.emitUnit(TU.first.get(), useSplitDwarf()); } } CU.addDIETypeSignature(RefDie, Signature); } // Accelerator table mutators - add each name along with its companion // DIE to the proper table while ensuring that the name that we're going // to reference is in the string table. We do this since the names we // add may not only be identical to the names in the DIE. void DwarfDebug::addAccelName(StringRef Name, const DIE &Die) { if (!useDwarfAccelTables()) return; AccelNames.AddName(InfoHolder.getStringPool().getEntry(*Asm, Name), &Die); } void DwarfDebug::addAccelObjC(StringRef Name, const DIE &Die) { if (!useDwarfAccelTables()) return; AccelObjC.AddName(InfoHolder.getStringPool().getEntry(*Asm, Name), &Die); } void DwarfDebug::addAccelNamespace(StringRef Name, const DIE &Die) { if (!useDwarfAccelTables()) return; AccelNamespace.AddName(InfoHolder.getStringPool().getEntry(*Asm, Name), &Die); } void DwarfDebug::addAccelType(StringRef Name, const DIE &Die, char Flags) { if (!useDwarfAccelTables()) return; AccelTypes.AddName(InfoHolder.getStringPool().getEntry(*Asm, Name), &Die); } uint16_t DwarfDebug::getDwarfVersion() const { return Asm->OutStreamer->getContext().getDwarfVersion(); } Index: vendor/llvm/dist/lib/CodeGen/AsmPrinter/DwarfDebug.h =================================================================== --- vendor/llvm/dist/lib/CodeGen/AsmPrinter/DwarfDebug.h (revision 314158) +++ vendor/llvm/dist/lib/CodeGen/AsmPrinter/DwarfDebug.h (revision 314159) @@ -1,559 +1,561 @@ //===-- llvm/CodeGen/DwarfDebug.h - Dwarf Debug Framework ------*- C++ -*--===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file contains support for writing dwarf debug info into asm files. 
// //===----------------------------------------------------------------------===// #ifndef LLVM_LIB_CODEGEN_ASMPRINTER_DWARFDEBUG_H #define LLVM_LIB_CODEGEN_ASMPRINTER_DWARFDEBUG_H #include "DbgValueHistoryCalculator.h" #include "DebugHandlerBase.h" #include "DebugLocStream.h" #include "DwarfAccelTable.h" #include "DwarfFile.h" #include "llvm/ADT/DenseMap.h" #include "llvm/ADT/DenseSet.h" #include "llvm/ADT/MapVector.h" #include "llvm/ADT/SetVector.h" #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/StringMap.h" #include "llvm/CodeGen/DIE.h" #include "llvm/CodeGen/LexicalScopes.h" #include "llvm/CodeGen/MachineInstr.h" #include "llvm/IR/DebugInfo.h" #include "llvm/IR/DebugLoc.h" #include "llvm/MC/MCDwarf.h" #include "llvm/MC/MachineLocation.h" #include "llvm/Support/Allocator.h" #include "llvm/Target/TargetOptions.h" #include namespace llvm { class AsmPrinter; class ByteStreamer; class ConstantInt; class ConstantFP; class DebugLocEntry; class DwarfCompileUnit; class DwarfDebug; class DwarfTypeUnit; class DwarfUnit; class MachineModuleInfo; //===----------------------------------------------------------------------===// /// This class is used to track local variable information. /// /// Variables can be created from allocas, in which case they're generated from /// the MMI table. Such variables can have multiple expressions and frame -/// indices. The \a Expr and \a FrameIndices array must match. +/// indices. /// /// Variables can be created from \c DBG_VALUE instructions. Those whose /// location changes over time use \a DebugLocListIndex, while those with a /// single instruction use \a MInsn and (optionally) a single entry of \a Expr. /// /// Variables that have been optimized out use none of these fields. class DbgVariable { const DILocalVariable *Var; /// Variable Descriptor. const DILocation *IA; /// Inlined at location. - SmallVector Expr; /// Complex address. DIE *TheDIE = nullptr; /// Variable DIE. unsigned DebugLocListIndex = ~0u; /// Offset in DebugLocs. const MachineInstr *MInsn = nullptr; /// DBG_VALUE instruction. - SmallVector FrameIndex; /// Frame index. + struct FrameIndexExpr { + int FI; + const DIExpression *Expr; + }; + mutable SmallVector + FrameIndexExprs; /// Frame index + expression. + public: /// Construct a DbgVariable. /// /// Creates a variable without any DW_AT_location. Call \a initializeMMI() /// for MMI entries, or \a initializeDbgValue() for DBG_VALUE instructions. DbgVariable(const DILocalVariable *V, const DILocation *IA) : Var(V), IA(IA) {} /// Initialize from the MMI table. void initializeMMI(const DIExpression *E, int FI) { - assert(Expr.empty() && "Already initialized?"); - assert(FrameIndex.empty() && "Already initialized?"); + assert(FrameIndexExprs.empty() && "Already initialized?"); assert(!MInsn && "Already initialized?"); assert((!E || E->isValid()) && "Expected valid expression"); assert(~FI && "Expected valid index"); - Expr.push_back(E); - FrameIndex.push_back(FI); + FrameIndexExprs.push_back({FI, E}); } /// Initialize from a DBG_VALUE instruction. 
void initializeDbgValue(const MachineInstr *DbgValue) { - assert(Expr.empty() && "Already initialized?"); - assert(FrameIndex.empty() && "Already initialized?"); + assert(FrameIndexExprs.empty() && "Already initialized?"); assert(!MInsn && "Already initialized?"); assert(Var == DbgValue->getDebugVariable() && "Wrong variable"); assert(IA == DbgValue->getDebugLoc()->getInlinedAt() && "Wrong inlined-at"); MInsn = DbgValue; if (auto *E = DbgValue->getDebugExpression()) if (E->getNumElements()) - Expr.push_back(E); + FrameIndexExprs.push_back({0, E}); } // Accessors. const DILocalVariable *getVariable() const { return Var; } const DILocation *getInlinedAt() const { return IA; } - ArrayRef getExpression() const { return Expr; } const DIExpression *getSingleExpression() const { - assert(MInsn && Expr.size() <= 1); - return Expr.size() ? Expr[0] : nullptr; + assert(MInsn && FrameIndexExprs.size() <= 1); + return FrameIndexExprs.size() ? FrameIndexExprs[0].Expr : nullptr; } void setDIE(DIE &D) { TheDIE = &D; } DIE *getDIE() const { return TheDIE; } void setDebugLocListIndex(unsigned O) { DebugLocListIndex = O; } unsigned getDebugLocListIndex() const { return DebugLocListIndex; } StringRef getName() const { return Var->getName(); } const MachineInstr *getMInsn() const { return MInsn; } - ArrayRef getFrameIndex() const { return FrameIndex; } + /// Get the FI entries, sorted by fragment offset. + ArrayRef getFrameIndexExprs() const; + bool hasFrameIndexExprs() const { return !FrameIndexExprs.empty(); } void addMMIEntry(const DbgVariable &V) { assert(DebugLocListIndex == ~0U && !MInsn && "not an MMI entry"); assert(V.DebugLocListIndex == ~0U && !V.MInsn && "not an MMI entry"); assert(V.Var == Var && "conflicting variable"); assert(V.IA == IA && "conflicting inlined-at location"); - assert(!FrameIndex.empty() && "Expected an MMI entry"); - assert(!V.FrameIndex.empty() && "Expected an MMI entry"); - assert(Expr.size() == FrameIndex.size() && "Mismatched expressions"); - assert(V.Expr.size() == V.FrameIndex.size() && "Mismatched expressions"); + assert(!FrameIndexExprs.empty() && "Expected an MMI entry"); + assert(!V.FrameIndexExprs.empty() && "Expected an MMI entry"); - Expr.append(V.Expr.begin(), V.Expr.end()); - FrameIndex.append(V.FrameIndex.begin(), V.FrameIndex.end()); - assert(all_of(Expr, [](const DIExpression *E) { - return E && E->isFragment(); - }) && "conflicting locations for variable"); + FrameIndexExprs.append(V.FrameIndexExprs.begin(), V.FrameIndexExprs.end()); + assert(all_of(FrameIndexExprs, + [](FrameIndexExpr &FIE) { + return FIE.Expr && FIE.Expr->isFragment(); + }) && + "conflicting locations for variable"); } // Translate tag to proper Dwarf tag. dwarf::Tag getTag() const { // FIXME: Why don't we just infer this tag and store it all along? if (Var->isParameter()) return dwarf::DW_TAG_formal_parameter; return dwarf::DW_TAG_variable; } /// Return true if DbgVariable is artificial. 
bool isArtificial() const { if (Var->isArtificial()) return true; if (getType()->isArtificial()) return true; return false; } bool isObjectPointer() const { if (Var->isObjectPointer()) return true; if (getType()->isObjectPointer()) return true; return false; } bool hasComplexAddress() const { assert(MInsn && "Expected DBG_VALUE, not MMI variable"); - assert(FrameIndex.empty() && "Expected DBG_VALUE, not MMI variable"); - assert( - (Expr.empty() || (Expr.size() == 1 && Expr.back()->getNumElements())) && - "Invalid Expr for DBG_VALUE"); - return !Expr.empty(); + assert((FrameIndexExprs.empty() || + (FrameIndexExprs.size() == 1 && + FrameIndexExprs[0].Expr->getNumElements())) && + "Invalid Expr for DBG_VALUE"); + return !FrameIndexExprs.empty(); } bool isBlockByrefVariable() const; const DIType *getType() const; private: template T *resolve(TypedDINodeRef Ref) const { return Ref.resolve(); } }; /// Helper used to pair up a symbol and its DWARF compile unit. struct SymbolCU { SymbolCU(DwarfCompileUnit *CU, const MCSymbol *Sym) : Sym(Sym), CU(CU) {} const MCSymbol *Sym; DwarfCompileUnit *CU; }; /// Collects and handles dwarf debug information. class DwarfDebug : public DebugHandlerBase { /// All DIEValues are allocated through this allocator. BumpPtrAllocator DIEValueAllocator; /// Maps MDNode with its corresponding DwarfCompileUnit. MapVector CUMap; /// Maps a CU DIE with its corresponding DwarfCompileUnit. DenseMap CUDieMap; /// List of all labels used in aranges generation. std::vector ArangeLabels; /// Size of each symbol emitted (for those symbols that have a specific size). DenseMap SymSize; /// Collection of abstract variables. DenseMap> AbstractVariables; SmallVector, 64> ConcreteVariables; /// Collection of DebugLocEntry. Stored in a linked list so that DIELocLists /// can refer to them in spite of insertions into this list. DebugLocStream DebugLocs; /// This is a collection of subprogram MDNodes that are processed to /// create DIEs. SetVector, SmallPtrSet> ProcessedSPNodes; /// If nonnull, stores the current machine function we're processing. const MachineFunction *CurFn; /// If nonnull, stores the CU in which the previous subprogram was contained. const DwarfCompileUnit *PrevCU; /// As an optimization, there is no need to emit an entry in the directory /// table for the same directory as DW_AT_comp_dir. StringRef CompilationDir; /// Holder for the file specific debug information. DwarfFile InfoHolder; /// Holders for the various debug information flags that we might need to /// have exposed. See accessor functions below for description. /// Map from MDNodes for user-defined types to their type signatures. Also /// used to keep track of which types we have emitted type units for. DenseMap TypeSignatures; SmallVector< std::pair, const DICompositeType *>, 1> TypeUnitsUnderConstruction; /// Whether to emit the pubnames/pubtypes sections. bool HasDwarfPubSections; /// Whether to use the GNU TLS opcode (instead of the standard opcode). bool UseGNUTLSOpcode; /// Whether to use DWARF 2 bitfields (instead of the DWARF 4 format). bool UseDWARF2Bitfields; /// Whether to emit all linkage names, or just abstract subprograms. bool UseAllLinkageNames; /// DWARF5 Experimental Options /// @{ bool HasDwarfAccelTables; bool HasAppleExtensionAttributes; bool HasSplitDwarf; /// Separated Dwarf Variables /// In general these will all be for bits that are left in the /// original object file, rather than things that are meant /// to be in the .dwo sections. /// Holder for the skeleton information. 
DwarfFile SkeletonHolder; /// Store file names for type units under fission in a line table /// header that will be emitted into debug_line.dwo. // FIXME: replace this with a map from comp_dir to table so that we // can emit multiple tables during LTO each of which uses directory // 0, referencing the comp_dir of all the type units that use it. MCDwarfDwoLineTable SplitTypeUnitFileTable; /// @} /// True iff there are multiple CUs in this module. bool SingleCU; bool IsDarwin; AddressPool AddrPool; DwarfAccelTable AccelNames; DwarfAccelTable AccelObjC; DwarfAccelTable AccelNamespace; DwarfAccelTable AccelTypes; // Identify a debugger for "tuning" the debug info. DebuggerKind DebuggerTuning; /// \defgroup DebuggerTuning Predicates to tune DWARF for a given debugger. /// /// Returns whether we are "tuning" for a given debugger. /// Should be used only within the constructor, to set feature flags. /// @{ bool tuneForGDB() const { return DebuggerTuning == DebuggerKind::GDB; } bool tuneForLLDB() const { return DebuggerTuning == DebuggerKind::LLDB; } bool tuneForSCE() const { return DebuggerTuning == DebuggerKind::SCE; } /// @} MCDwarfDwoLineTable *getDwoLineTable(const DwarfCompileUnit &); const SmallVectorImpl> &getUnits() { return InfoHolder.getUnits(); } typedef DbgValueHistoryMap::InlinedVariable InlinedVariable; /// Find abstract variable associated with Var. DbgVariable *getExistingAbstractVariable(InlinedVariable IV, const DILocalVariable *&Cleansed); DbgVariable *getExistingAbstractVariable(InlinedVariable IV); void createAbstractVariable(const DILocalVariable *DV, LexicalScope *Scope); void ensureAbstractVariableIsCreated(InlinedVariable Var, const MDNode *Scope); void ensureAbstractVariableIsCreatedIfScoped(InlinedVariable Var, const MDNode *Scope); DbgVariable *createConcreteVariable(LexicalScope &Scope, InlinedVariable IV); /// Construct a DIE for this abstract scope. void constructAbstractSubprogramScopeDIE(LexicalScope *Scope); void finishVariableDefinitions(); void finishSubprogramDefinitions(); /// Finish off debug information after all functions have been /// processed. void finalizeModuleInfo(); /// Emit the debug info section. void emitDebugInfo(); /// Emit the abbreviation section. void emitAbbreviations(); /// Emit a specified accelerator table. void emitAccel(DwarfAccelTable &Accel, MCSection *Section, StringRef TableName); /// Emit visible names into a hashed accelerator table section. void emitAccelNames(); /// Emit objective C classes and categories into a hashed /// accelerator table section. void emitAccelObjC(); /// Emit namespace dies into a hashed accelerator table. void emitAccelNamespaces(); /// Emit type dies into a hashed accelerator table. void emitAccelTypes(); /// Emit visible names into a debug pubnames section. /// \param GnuStyle determines whether or not we want to emit /// additional information into the table ala newer gcc for gdb /// index. void emitDebugPubNames(bool GnuStyle = false); /// Emit visible types into a debug pubtypes section. /// \param GnuStyle determines whether or not we want to emit /// additional information into the table ala newer gcc for gdb /// index. void emitDebugPubTypes(bool GnuStyle = false); void emitDebugPubSection( bool GnuStyle, MCSection *PSec, StringRef Name, const StringMap &(DwarfCompileUnit::*Accessor)() const); /// Emit null-terminated strings into a debug str section. void emitDebugStr(); /// Emit variable locations into a debug loc section. 
void emitDebugLoc(); /// Emit variable locations into a debug loc dwo section. void emitDebugLocDWO(); /// Emit address ranges into a debug aranges section. void emitDebugARanges(); /// Emit address ranges into a debug ranges section. void emitDebugRanges(); /// Emit macros into a debug macinfo section. void emitDebugMacinfo(); void emitMacro(DIMacro &M); void emitMacroFile(DIMacroFile &F, DwarfCompileUnit &U); void handleMacroNodes(DIMacroNodeArray Nodes, DwarfCompileUnit &U); /// DWARF 5 Experimental Split Dwarf Emitters /// Initialize common features of skeleton units. void initSkeletonUnit(const DwarfUnit &U, DIE &Die, std::unique_ptr NewU); /// Construct the split debug info compile unit for the debug info /// section. DwarfCompileUnit &constructSkeletonCU(const DwarfCompileUnit &CU); /// Emit the debug info dwo section. void emitDebugInfoDWO(); /// Emit the debug abbrev dwo section. void emitDebugAbbrevDWO(); /// Emit the debug line dwo section. void emitDebugLineDWO(); /// Emit the debug str dwo section. void emitDebugStrDWO(); /// Flags to let the linker know we have emitted new style pubnames. Only /// emit it here if we don't have a skeleton CU for split dwarf. void addGnuPubAttributes(DwarfUnit &U, DIE &D) const; /// Create new DwarfCompileUnit for the given metadata node with tag /// DW_TAG_compile_unit. DwarfCompileUnit &constructDwarfCompileUnit(const DICompileUnit *DIUnit); /// Construct imported_module or imported_declaration DIE. void constructAndAddImportedEntityDIE(DwarfCompileUnit &TheCU, const DIImportedEntity *N); /// Register a source line with debug info. Returns the unique /// label that was emitted and which provides correspondence to the /// source line list. void recordSourceLine(unsigned Line, unsigned Col, const MDNode *Scope, unsigned Flags); /// Populate LexicalScope entries with variables' info. void collectVariableInfo(DwarfCompileUnit &TheCU, const DISubprogram *SP, DenseSet &ProcessedVars); /// Build the location list for all DBG_VALUEs in the /// function that describe the same variable. void buildLocationList(SmallVectorImpl &DebugLoc, const DbgValueHistoryMap::InstrRanges &Ranges); /// Collect variable information from the side table maintained by MF. void collectVariableInfoFromMFTable(DenseSet &P); public: //===--------------------------------------------------------------------===// // Main entry points. // DwarfDebug(AsmPrinter *A, Module *M); ~DwarfDebug() override; /// Emit all Dwarf sections that should come prior to the /// content. void beginModule(); /// Emit all Dwarf sections that should come after the content. void endModule() override; /// Gather pre-function debug information. void beginFunction(const MachineFunction *MF) override; /// Gather and emit post-function debug information. void endFunction(const MachineFunction *MF) override; /// Process beginning of an instruction. void beginInstruction(const MachineInstr *MI) override; /// Perform an MD5 checksum of \p Identifier and return the lower 64 bits. static uint64_t makeTypeSignature(StringRef Identifier); /// Add a DIE to the set of types that we're going to pull into /// type units. void addDwarfTypeUnitType(DwarfCompileUnit &CU, StringRef Identifier, DIE &Die, const DICompositeType *CTy); /// Add a label so that arange data can be generated for it. void addArangeLabel(SymbolCU SCU) { ArangeLabels.push_back(SCU); } /// For symbols that have a size designated (e.g. common symbols), /// this tracks that size. 
  void setSymbolSize(const MCSymbol *Sym, uint64_t Size) override {
    SymSize[Sym] = Size;
  }

  /// Returns whether we should emit all DW_AT_[MIPS_]linkage_name.
  /// If not, we still might emit certain cases.
  bool useAllLinkageNames() const { return UseAllLinkageNames; }

  /// Returns whether to use DW_OP_GNU_push_tls_address, instead of the
  /// standard DW_OP_form_tls_address opcode.
  bool useGNUTLSOpcode() const { return UseGNUTLSOpcode; }

  /// Returns whether to use the DWARF2 format for bitfields instead of the
  /// DWARF4 format.
  bool useDWARF2Bitfields() const { return UseDWARF2Bitfields; }

  // Experimental DWARF5 features.

  /// Returns whether or not to emit tables that dwarf consumers can
  /// use to accelerate lookup.
  bool useDwarfAccelTables() const { return HasDwarfAccelTables; }

  bool useAppleExtensionAttributes() const {
    return HasAppleExtensionAttributes;
  }

  /// Returns whether or not to change the current debug info for the
  /// split dwarf proposal support.
  bool useSplitDwarf() const { return HasSplitDwarf; }

  /// Returns the Dwarf Version.
  uint16_t getDwarfVersion() const;

  /// Returns the previous CU that was being updated.
  const DwarfCompileUnit *getPrevCU() const { return PrevCU; }
  void setPrevCU(const DwarfCompileUnit *PrevCU) { this->PrevCU = PrevCU; }

  /// Returns the entries for the .debug_loc section.
  const DebugLocStream &getDebugLocs() const { return DebugLocs; }

  /// Emit an entry for the debug loc section. This can be used to
  /// handle an entry that's going to be emitted into the debug loc section.
  void emitDebugLocEntry(ByteStreamer &Streamer,
                         const DebugLocStream::Entry &Entry);

  /// Emit the location for a debug loc entry, including the size header.
  void emitDebugLocEntryLocation(const DebugLocStream::Entry &Entry);

  /// Find the MDNode for the given reference.
  template <typename T> T *resolve(TypedDINodeRef<T> Ref) const {
    return Ref.resolve();
  }

  void addSubprogramNames(const DISubprogram *SP, DIE &Die);

  AddressPool &getAddressPool() { return AddrPool; }

  void addAccelName(StringRef Name, const DIE &Die);

  void addAccelObjC(StringRef Name, const DIE &Die);

  void addAccelNamespace(StringRef Name, const DIE &Die);

  void addAccelType(StringRef Name, const DIE &Die, char Flags);

  const MachineFunction *getCurrentFunction() const { return CurFn; }

  /// A helper function to check whether the DIE for a given Scope is
  /// going to be null.
  bool isLexicalScopeDIENull(LexicalScope *Scope);
};

} // End of namespace llvm

#endif
Index: vendor/llvm/dist/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
===================================================================
--- vendor/llvm/dist/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp	(revision 314158)
+++ vendor/llvm/dist/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp	(revision 314159)
@@ -1,1752 +1,1754 @@
//=- AArch64LoadStoreOptimizer.cpp - AArch64 load/store opt. pass -*- C++ -*-=//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains a pass that performs load / store related peephole
// optimizations. This pass should be run after register allocation.
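// For example (illustrative, mirroring the cases handled below), it can merge
//   ldr x0, [x2]
//   ldr x1, [x2, #8]
// into
//   ldp x0, x1, [x2]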
// //===----------------------------------------------------------------------===// #include "AArch64InstrInfo.h" #include "AArch64Subtarget.h" #include "MCTargetDesc/AArch64AddressingModes.h" #include "llvm/ADT/BitVector.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/Statistic.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/MachineFunctionPass.h" #include "llvm/CodeGen/MachineInstr.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Target/TargetInstrInfo.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Target/TargetRegisterInfo.h" using namespace llvm; #define DEBUG_TYPE "aarch64-ldst-opt" STATISTIC(NumPairCreated, "Number of load/store pair instructions generated"); STATISTIC(NumPostFolded, "Number of post-index updates folded"); STATISTIC(NumPreFolded, "Number of pre-index updates folded"); STATISTIC(NumUnscaledPairCreated, "Number of load/store from unscaled generated"); STATISTIC(NumZeroStoresPromoted, "Number of narrow zero stores promoted"); STATISTIC(NumLoadsFromStoresPromoted, "Number of loads from stores promoted"); // The LdStLimit limits how far we search for load/store pairs. static cl::opt LdStLimit("aarch64-load-store-scan-limit", cl::init(20), cl::Hidden); // The UpdateLimit limits how far we search for update instructions when we form // pre-/post-index instructions. static cl::opt UpdateLimit("aarch64-update-scan-limit", cl::init(100), cl::Hidden); #define AARCH64_LOAD_STORE_OPT_NAME "AArch64 load / store optimization pass" namespace { typedef struct LdStPairFlags { // If a matching instruction is found, MergeForward is set to true if the // merge is to remove the first instruction and replace the second with // a pair-wise insn, and false if the reverse is true. bool MergeForward; // SExtIdx gives the index of the result of the load pair that must be // extended. The value of SExtIdx assumes that the paired load produces the // value in this order: (I, returned iterator), i.e., -1 means no value has // to be extended, 0 means I, and 1 means the returned iterator. int SExtIdx; LdStPairFlags() : MergeForward(false), SExtIdx(-1) {} void setMergeForward(bool V = true) { MergeForward = V; } bool getMergeForward() const { return MergeForward; } void setSExtIdx(int V) { SExtIdx = V; } int getSExtIdx() const { return SExtIdx; } } LdStPairFlags; struct AArch64LoadStoreOpt : public MachineFunctionPass { static char ID; AArch64LoadStoreOpt() : MachineFunctionPass(ID) { initializeAArch64LoadStoreOptPass(*PassRegistry::getPassRegistry()); } const AArch64InstrInfo *TII; const TargetRegisterInfo *TRI; const AArch64Subtarget *Subtarget; // Track which registers have been modified and used. BitVector ModifiedRegs, UsedRegs; // Scan the instructions looking for a load/store that can be combined // with the current instruction into a load/store pair. // Return the matching instruction if one is found, else MBB->end(). MachineBasicBlock::iterator findMatchingInsn(MachineBasicBlock::iterator I, LdStPairFlags &Flags, unsigned Limit, bool FindNarrowMerge); // Scan the instructions looking for a store that writes to the address from // which the current load instruction reads. Return true if one is found. bool findMatchingStore(MachineBasicBlock::iterator I, unsigned Limit, MachineBasicBlock::iterator &StoreI); // Merge the two instructions indicated into a wider narrow store instruction. 
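  // (Illustrative example: two adjacent zero stores
  //   strh wzr, [x0]
  //   strh wzr, [x0, #2]
  // become the single wider store
  //   str wzr, [x0].)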
MachineBasicBlock::iterator mergeNarrowZeroStores(MachineBasicBlock::iterator I, MachineBasicBlock::iterator MergeMI, const LdStPairFlags &Flags); // Merge the two instructions indicated into a single pair-wise instruction. MachineBasicBlock::iterator mergePairedInsns(MachineBasicBlock::iterator I, MachineBasicBlock::iterator Paired, const LdStPairFlags &Flags); // Promote the load that reads directly from the address stored to. MachineBasicBlock::iterator promoteLoadFromStore(MachineBasicBlock::iterator LoadI, MachineBasicBlock::iterator StoreI); // Scan the instruction list to find a base register update that can // be combined with the current instruction (a load or store) using // pre or post indexed addressing with writeback. Scan forwards. MachineBasicBlock::iterator findMatchingUpdateInsnForward(MachineBasicBlock::iterator I, int UnscaledOffset, unsigned Limit); // Scan the instruction list to find a base register update that can // be combined with the current instruction (a load or store) using // pre or post indexed addressing with writeback. Scan backwards. MachineBasicBlock::iterator findMatchingUpdateInsnBackward(MachineBasicBlock::iterator I, unsigned Limit); // Find an instruction that updates the base register of the ld/st // instruction. bool isMatchingUpdateInsn(MachineInstr &MemMI, MachineInstr &MI, unsigned BaseReg, int Offset); // Merge a pre- or post-index base register update into a ld/st instruction. MachineBasicBlock::iterator mergeUpdateInsn(MachineBasicBlock::iterator I, MachineBasicBlock::iterator Update, bool IsPreIdx); // Find and merge zero store instructions. bool tryToMergeZeroStInst(MachineBasicBlock::iterator &MBBI); // Find and pair ldr/str instructions. bool tryToPairLdStInst(MachineBasicBlock::iterator &MBBI); // Find and promote load instructions which read directly from store. bool tryToPromoteLoadFromStore(MachineBasicBlock::iterator &MBBI); bool optimizeBlock(MachineBasicBlock &MBB, bool EnableNarrowZeroStOpt); bool runOnMachineFunction(MachineFunction &Fn) override; MachineFunctionProperties getRequiredProperties() const override { return MachineFunctionProperties().set( MachineFunctionProperties::Property::NoVRegs); } StringRef getPassName() const override { return AARCH64_LOAD_STORE_OPT_NAME; } }; char AArch64LoadStoreOpt::ID = 0; } // namespace INITIALIZE_PASS(AArch64LoadStoreOpt, "aarch64-ldst-opt", AARCH64_LOAD_STORE_OPT_NAME, false, false) static bool isNarrowStore(unsigned Opc) { switch (Opc) { default: return false; case AArch64::STRBBui: case AArch64::STURBBi: case AArch64::STRHHui: case AArch64::STURHHi: return true; } } // Scaling factor for unscaled load or store. 
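// For example, LDRWui (scaled) and LDURWi (unscaled) both access 4 bytes, so
// getMemScale returns 4 for either; the scaled form counts its immediate in
// 4-byte units, while the unscaled form counts it in bytes.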
static int getMemScale(MachineInstr &MI) { switch (MI.getOpcode()) { default: llvm_unreachable("Opcode has unknown scale!"); case AArch64::LDRBBui: case AArch64::LDURBBi: case AArch64::LDRSBWui: case AArch64::LDURSBWi: case AArch64::STRBBui: case AArch64::STURBBi: return 1; case AArch64::LDRHHui: case AArch64::LDURHHi: case AArch64::LDRSHWui: case AArch64::LDURSHWi: case AArch64::STRHHui: case AArch64::STURHHi: return 2; case AArch64::LDRSui: case AArch64::LDURSi: case AArch64::LDRSWui: case AArch64::LDURSWi: case AArch64::LDRWui: case AArch64::LDURWi: case AArch64::STRSui: case AArch64::STURSi: case AArch64::STRWui: case AArch64::STURWi: case AArch64::LDPSi: case AArch64::LDPSWi: case AArch64::LDPWi: case AArch64::STPSi: case AArch64::STPWi: return 4; case AArch64::LDRDui: case AArch64::LDURDi: case AArch64::LDRXui: case AArch64::LDURXi: case AArch64::STRDui: case AArch64::STURDi: case AArch64::STRXui: case AArch64::STURXi: case AArch64::LDPDi: case AArch64::LDPXi: case AArch64::STPDi: case AArch64::STPXi: return 8; case AArch64::LDRQui: case AArch64::LDURQi: case AArch64::STRQui: case AArch64::STURQi: case AArch64::LDPQi: case AArch64::STPQi: return 16; } } static unsigned getMatchingNonSExtOpcode(unsigned Opc, bool *IsValidLdStrOpc = nullptr) { if (IsValidLdStrOpc) *IsValidLdStrOpc = true; switch (Opc) { default: if (IsValidLdStrOpc) *IsValidLdStrOpc = false; return UINT_MAX; case AArch64::STRDui: case AArch64::STURDi: case AArch64::STRQui: case AArch64::STURQi: case AArch64::STRBBui: case AArch64::STURBBi: case AArch64::STRHHui: case AArch64::STURHHi: case AArch64::STRWui: case AArch64::STURWi: case AArch64::STRXui: case AArch64::STURXi: case AArch64::LDRDui: case AArch64::LDURDi: case AArch64::LDRQui: case AArch64::LDURQi: case AArch64::LDRWui: case AArch64::LDURWi: case AArch64::LDRXui: case AArch64::LDURXi: case AArch64::STRSui: case AArch64::STURSi: case AArch64::LDRSui: case AArch64::LDURSi: return Opc; case AArch64::LDRSWui: return AArch64::LDRWui; case AArch64::LDURSWi: return AArch64::LDURWi; } } static unsigned getMatchingWideOpcode(unsigned Opc) { switch (Opc) { default: llvm_unreachable("Opcode has no wide equivalent!"); case AArch64::STRBBui: return AArch64::STRHHui; case AArch64::STRHHui: return AArch64::STRWui; case AArch64::STURBBi: return AArch64::STURHHi; case AArch64::STURHHi: return AArch64::STURWi; case AArch64::STURWi: return AArch64::STURXi; case AArch64::STRWui: return AArch64::STRXui; } } static unsigned getMatchingPairOpcode(unsigned Opc) { switch (Opc) { default: llvm_unreachable("Opcode has no pairwise equivalent!"); case AArch64::STRSui: case AArch64::STURSi: return AArch64::STPSi; case AArch64::STRDui: case AArch64::STURDi: return AArch64::STPDi; case AArch64::STRQui: case AArch64::STURQi: return AArch64::STPQi; case AArch64::STRWui: case AArch64::STURWi: return AArch64::STPWi; case AArch64::STRXui: case AArch64::STURXi: return AArch64::STPXi; case AArch64::LDRSui: case AArch64::LDURSi: return AArch64::LDPSi; case AArch64::LDRDui: case AArch64::LDURDi: return AArch64::LDPDi; case AArch64::LDRQui: case AArch64::LDURQi: return AArch64::LDPQi; case AArch64::LDRWui: case AArch64::LDURWi: return AArch64::LDPWi; case AArch64::LDRXui: case AArch64::LDURXi: return AArch64::LDPXi; case AArch64::LDRSWui: case AArch64::LDURSWi: return AArch64::LDPSWi; } } static unsigned isMatchingStore(MachineInstr &LoadInst, MachineInstr &StoreInst) { unsigned LdOpc = LoadInst.getOpcode(); unsigned StOpc = StoreInst.getOpcode(); switch (LdOpc) { default: 
llvm_unreachable("Unsupported load instruction!"); case AArch64::LDRBBui: return StOpc == AArch64::STRBBui || StOpc == AArch64::STRHHui || StOpc == AArch64::STRWui || StOpc == AArch64::STRXui; case AArch64::LDURBBi: return StOpc == AArch64::STURBBi || StOpc == AArch64::STURHHi || StOpc == AArch64::STURWi || StOpc == AArch64::STURXi; case AArch64::LDRHHui: return StOpc == AArch64::STRHHui || StOpc == AArch64::STRWui || StOpc == AArch64::STRXui; case AArch64::LDURHHi: return StOpc == AArch64::STURHHi || StOpc == AArch64::STURWi || StOpc == AArch64::STURXi; case AArch64::LDRWui: return StOpc == AArch64::STRWui || StOpc == AArch64::STRXui; case AArch64::LDURWi: return StOpc == AArch64::STURWi || StOpc == AArch64::STURXi; case AArch64::LDRXui: return StOpc == AArch64::STRXui; case AArch64::LDURXi: return StOpc == AArch64::STURXi; } } static unsigned getPreIndexedOpcode(unsigned Opc) { switch (Opc) { default: llvm_unreachable("Opcode has no pre-indexed equivalent!"); case AArch64::STRSui: return AArch64::STRSpre; case AArch64::STRDui: return AArch64::STRDpre; case AArch64::STRQui: return AArch64::STRQpre; case AArch64::STRBBui: return AArch64::STRBBpre; case AArch64::STRHHui: return AArch64::STRHHpre; case AArch64::STRWui: return AArch64::STRWpre; case AArch64::STRXui: return AArch64::STRXpre; case AArch64::LDRSui: return AArch64::LDRSpre; case AArch64::LDRDui: return AArch64::LDRDpre; case AArch64::LDRQui: return AArch64::LDRQpre; case AArch64::LDRBBui: return AArch64::LDRBBpre; case AArch64::LDRHHui: return AArch64::LDRHHpre; case AArch64::LDRWui: return AArch64::LDRWpre; case AArch64::LDRXui: return AArch64::LDRXpre; case AArch64::LDRSWui: return AArch64::LDRSWpre; case AArch64::LDPSi: return AArch64::LDPSpre; case AArch64::LDPSWi: return AArch64::LDPSWpre; case AArch64::LDPDi: return AArch64::LDPDpre; case AArch64::LDPQi: return AArch64::LDPQpre; case AArch64::LDPWi: return AArch64::LDPWpre; case AArch64::LDPXi: return AArch64::LDPXpre; case AArch64::STPSi: return AArch64::STPSpre; case AArch64::STPDi: return AArch64::STPDpre; case AArch64::STPQi: return AArch64::STPQpre; case AArch64::STPWi: return AArch64::STPWpre; case AArch64::STPXi: return AArch64::STPXpre; } } static unsigned getPostIndexedOpcode(unsigned Opc) { switch (Opc) { default: llvm_unreachable("Opcode has no post-indexed wise equivalent!"); case AArch64::STRSui: return AArch64::STRSpost; case AArch64::STRDui: return AArch64::STRDpost; case AArch64::STRQui: return AArch64::STRQpost; case AArch64::STRBBui: return AArch64::STRBBpost; case AArch64::STRHHui: return AArch64::STRHHpost; case AArch64::STRWui: return AArch64::STRWpost; case AArch64::STRXui: return AArch64::STRXpost; case AArch64::LDRSui: return AArch64::LDRSpost; case AArch64::LDRDui: return AArch64::LDRDpost; case AArch64::LDRQui: return AArch64::LDRQpost; case AArch64::LDRBBui: return AArch64::LDRBBpost; case AArch64::LDRHHui: return AArch64::LDRHHpost; case AArch64::LDRWui: return AArch64::LDRWpost; case AArch64::LDRXui: return AArch64::LDRXpost; case AArch64::LDRSWui: return AArch64::LDRSWpost; case AArch64::LDPSi: return AArch64::LDPSpost; case AArch64::LDPSWi: return AArch64::LDPSWpost; case AArch64::LDPDi: return AArch64::LDPDpost; case AArch64::LDPQi: return AArch64::LDPQpost; case AArch64::LDPWi: return AArch64::LDPWpost; case AArch64::LDPXi: return AArch64::LDPXpost; case AArch64::STPSi: return AArch64::STPSpost; case AArch64::STPDi: return AArch64::STPDpost; case AArch64::STPQi: return AArch64::STPQpost; case AArch64::STPWi: return AArch64::STPWpost; case 
AArch64::STPXi: return AArch64::STPXpost; } } static bool isPairedLdSt(const MachineInstr &MI) { switch (MI.getOpcode()) { default: return false; case AArch64::LDPSi: case AArch64::LDPSWi: case AArch64::LDPDi: case AArch64::LDPQi: case AArch64::LDPWi: case AArch64::LDPXi: case AArch64::STPSi: case AArch64::STPDi: case AArch64::STPQi: case AArch64::STPWi: case AArch64::STPXi: return true; } } static const MachineOperand &getLdStRegOp(const MachineInstr &MI, unsigned PairedRegOp = 0) { assert(PairedRegOp < 2 && "Unexpected register operand idx."); unsigned Idx = isPairedLdSt(MI) ? PairedRegOp : 0; return MI.getOperand(Idx); } static const MachineOperand &getLdStBaseOp(const MachineInstr &MI) { unsigned Idx = isPairedLdSt(MI) ? 2 : 1; return MI.getOperand(Idx); } static const MachineOperand &getLdStOffsetOp(const MachineInstr &MI) { unsigned Idx = isPairedLdSt(MI) ? 3 : 2; return MI.getOperand(Idx); } static bool isLdOffsetInRangeOfSt(MachineInstr &LoadInst, MachineInstr &StoreInst, const AArch64InstrInfo *TII) { assert(isMatchingStore(LoadInst, StoreInst) && "Expect only matched ld/st."); int LoadSize = getMemScale(LoadInst); int StoreSize = getMemScale(StoreInst); int UnscaledStOffset = TII->isUnscaledLdSt(StoreInst) ? getLdStOffsetOp(StoreInst).getImm() : getLdStOffsetOp(StoreInst).getImm() * StoreSize; int UnscaledLdOffset = TII->isUnscaledLdSt(LoadInst) ? getLdStOffsetOp(LoadInst).getImm() : getLdStOffsetOp(LoadInst).getImm() * LoadSize; return (UnscaledStOffset <= UnscaledLdOffset) && (UnscaledLdOffset + LoadSize <= (UnscaledStOffset + StoreSize)); } static bool isPromotableZeroStoreInst(MachineInstr &MI) { unsigned Opc = MI.getOpcode(); return (Opc == AArch64::STRWui || Opc == AArch64::STURWi || isNarrowStore(Opc)) && getLdStRegOp(MI).getReg() == AArch64::WZR; } MachineBasicBlock::iterator AArch64LoadStoreOpt::mergeNarrowZeroStores(MachineBasicBlock::iterator I, MachineBasicBlock::iterator MergeMI, const LdStPairFlags &Flags) { assert(isPromotableZeroStoreInst(*I) && isPromotableZeroStoreInst(*MergeMI) && "Expected promotable zero stores."); MachineBasicBlock::iterator NextI = I; ++NextI; // If NextI is the second of the two instructions to be merged, we need // to skip one further. Either way we merge will invalidate the iterator, // and we don't need to scan the new instruction, as it's a pairwise // instruction, which we're not considering for further action anyway. if (NextI == MergeMI) ++NextI; unsigned Opc = I->getOpcode(); bool IsScaled = !TII->isUnscaledLdSt(Opc); int OffsetStride = IsScaled ? 1 : getMemScale(*I); bool MergeForward = Flags.getMergeForward(); // Insert our new paired instruction after whichever of the paired // instructions MergeForward indicates. MachineBasicBlock::iterator InsertionPoint = MergeForward ? MergeMI : I; // Also based on MergeForward is from where we copy the base register operand // so we get the flags compatible with the input code. const MachineOperand &BaseRegOp = MergeForward ? getLdStBaseOp(*MergeMI) : getLdStBaseOp(*I); // Which register is Rt and which is Rt2 depends on the offset order. MachineInstr *RtMI; if (getLdStOffsetOp(*I).getImm() == getLdStOffsetOp(*MergeMI).getImm() + OffsetStride) RtMI = &*MergeMI; else RtMI = &*I; int OffsetImm = getLdStOffsetOp(*RtMI).getImm(); // Change the scaled offset from small to large type. if (IsScaled) { assert(((OffsetImm & 1) == 0) && "Unexpected offset to merge"); OffsetImm /= 2; } // Construct the new instruction. 
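  // (Illustrative: merging two STRWui zero stores at offsets #2 and #3 yields
  // an STRXui at offset #1, since the wide store counts units twice as large;
  // the assert above checks that the smaller offset is even.)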
DebugLoc DL = I->getDebugLoc(); MachineBasicBlock *MBB = I->getParent(); MachineInstrBuilder MIB; MIB = BuildMI(*MBB, InsertionPoint, DL, TII->get(getMatchingWideOpcode(Opc))) .addReg(isNarrowStore(Opc) ? AArch64::WZR : AArch64::XZR) .addOperand(BaseRegOp) .addImm(OffsetImm) .setMemRefs(I->mergeMemRefsWith(*MergeMI)); (void)MIB; DEBUG(dbgs() << "Creating wider store. Replacing instructions:\n "); DEBUG(I->print(dbgs())); DEBUG(dbgs() << " "); DEBUG(MergeMI->print(dbgs())); DEBUG(dbgs() << " with instruction:\n "); DEBUG(((MachineInstr *)MIB)->print(dbgs())); DEBUG(dbgs() << "\n"); // Erase the old instructions. I->eraseFromParent(); MergeMI->eraseFromParent(); return NextI; } MachineBasicBlock::iterator AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I, MachineBasicBlock::iterator Paired, const LdStPairFlags &Flags) { MachineBasicBlock::iterator NextI = I; ++NextI; // If NextI is the second of the two instructions to be merged, we need // to skip one further. Either way we merge will invalidate the iterator, // and we don't need to scan the new instruction, as it's a pairwise // instruction, which we're not considering for further action anyway. if (NextI == Paired) ++NextI; int SExtIdx = Flags.getSExtIdx(); unsigned Opc = SExtIdx == -1 ? I->getOpcode() : getMatchingNonSExtOpcode(I->getOpcode()); bool IsUnscaled = TII->isUnscaledLdSt(Opc); int OffsetStride = IsUnscaled ? getMemScale(*I) : 1; bool MergeForward = Flags.getMergeForward(); // Insert our new paired instruction after whichever of the paired // instructions MergeForward indicates. MachineBasicBlock::iterator InsertionPoint = MergeForward ? Paired : I; // Also based on MergeForward is from where we copy the base register operand // so we get the flags compatible with the input code. const MachineOperand &BaseRegOp = MergeForward ? getLdStBaseOp(*Paired) : getLdStBaseOp(*I); int Offset = getLdStOffsetOp(*I).getImm(); int PairedOffset = getLdStOffsetOp(*Paired).getImm(); bool PairedIsUnscaled = TII->isUnscaledLdSt(Paired->getOpcode()); if (IsUnscaled != PairedIsUnscaled) { // We're trying to pair instructions that differ in how they are scaled. If // I is scaled then scale the offset of Paired accordingly. Otherwise, do // the opposite (i.e., make Paired's offset unscaled). int MemSize = getMemScale(*Paired); if (PairedIsUnscaled) { // If the unscaled offset isn't a multiple of the MemSize, we can't // pair the operations together. assert(!(PairedOffset % getMemScale(*Paired)) && "Offset should be a multiple of the stride!"); PairedOffset /= MemSize; } else { PairedOffset *= MemSize; } } // Which register is Rt and which is Rt2 depends on the offset order. MachineInstr *RtMI, *Rt2MI; if (Offset == PairedOffset + OffsetStride) { RtMI = &*Paired; Rt2MI = &*I; // Here we swapped the assumption made for SExtIdx. // I.e., we turn ldp I, Paired into ldp Paired, I. // Update the index accordingly. if (SExtIdx != -1) SExtIdx = (SExtIdx + 1) % 2; } else { RtMI = &*I; Rt2MI = &*Paired; } int OffsetImm = getLdStOffsetOp(*RtMI).getImm(); // Scale the immediate offset, if necessary. if (TII->isUnscaledLdSt(RtMI->getOpcode())) { assert(!(OffsetImm % getMemScale(*RtMI)) && "Unscaled offset cannot be scaled."); OffsetImm /= getMemScale(*RtMI); } // Construct the new instruction. MachineInstrBuilder MIB; DebugLoc DL = I->getDebugLoc(); MachineBasicBlock *MBB = I->getParent(); MachineOperand RegOp0 = getLdStRegOp(*RtMI); MachineOperand RegOp1 = getLdStRegOp(*Rt2MI); // Kill flags may become invalid when moving stores for pairing. 
if (RegOp0.isUse()) { if (!MergeForward) { // Clear kill flags on store if moving upwards. Example: // STRWui %w0, ... // USE %w1 // STRWui kill %w1 ; need to clear kill flag when moving STRWui upwards RegOp0.setIsKill(false); RegOp1.setIsKill(false); } else { // Clear kill flags of the first stores register. Example: // STRWui %w1, ... // USE kill %w1 ; need to clear kill flag when moving STRWui downwards // STRW %w0 unsigned Reg = getLdStRegOp(*I).getReg(); for (MachineInstr &MI : make_range(std::next(I), Paired)) MI.clearRegisterKills(Reg, TRI); } } MIB = BuildMI(*MBB, InsertionPoint, DL, TII->get(getMatchingPairOpcode(Opc))) .addOperand(RegOp0) .addOperand(RegOp1) .addOperand(BaseRegOp) .addImm(OffsetImm) .setMemRefs(I->mergeMemRefsWith(*Paired)); (void)MIB; DEBUG(dbgs() << "Creating pair load/store. Replacing instructions:\n "); DEBUG(I->print(dbgs())); DEBUG(dbgs() << " "); DEBUG(Paired->print(dbgs())); DEBUG(dbgs() << " with instruction:\n "); if (SExtIdx != -1) { // Generate the sign extension for the proper result of the ldp. // I.e., with X1, that would be: // %W1 = KILL %W1, %X1 // %X1 = SBFMXri %X1, 0, 31 MachineOperand &DstMO = MIB->getOperand(SExtIdx); // Right now, DstMO has the extended register, since it comes from an // extended opcode. unsigned DstRegX = DstMO.getReg(); // Get the W variant of that register. unsigned DstRegW = TRI->getSubReg(DstRegX, AArch64::sub_32); // Update the result of LDP to use the W instead of the X variant. DstMO.setReg(DstRegW); DEBUG(((MachineInstr *)MIB)->print(dbgs())); DEBUG(dbgs() << "\n"); // Make the machine verifier happy by providing a definition for // the X register. // Insert this definition right after the generated LDP, i.e., before // InsertionPoint. MachineInstrBuilder MIBKill = BuildMI(*MBB, InsertionPoint, DL, TII->get(TargetOpcode::KILL), DstRegW) .addReg(DstRegW) .addReg(DstRegX, RegState::Define); MIBKill->getOperand(2).setImplicit(); // Create the sign extension. MachineInstrBuilder MIBSXTW = BuildMI(*MBB, InsertionPoint, DL, TII->get(AArch64::SBFMXri), DstRegX) .addReg(DstRegX) .addImm(0) .addImm(31); (void)MIBSXTW; DEBUG(dbgs() << " Extend operand:\n "); DEBUG(((MachineInstr *)MIBSXTW)->print(dbgs())); } else { DEBUG(((MachineInstr *)MIB)->print(dbgs())); } DEBUG(dbgs() << "\n"); // Erase the old instructions. I->eraseFromParent(); Paired->eraseFromParent(); return NextI; } MachineBasicBlock::iterator AArch64LoadStoreOpt::promoteLoadFromStore(MachineBasicBlock::iterator LoadI, MachineBasicBlock::iterator StoreI) { MachineBasicBlock::iterator NextI = LoadI; ++NextI; int LoadSize = getMemScale(*LoadI); int StoreSize = getMemScale(*StoreI); unsigned LdRt = getLdStRegOp(*LoadI).getReg(); unsigned StRt = getLdStRegOp(*StoreI).getReg(); bool IsStoreXReg = TRI->getRegClass(AArch64::GPR64RegClassID)->contains(StRt); assert((IsStoreXReg || TRI->getRegClass(AArch64::GPR32RegClassID)->contains(StRt)) && "Unexpected RegClass"); MachineInstr *BitExtMI; if (LoadSize == StoreSize && (LoadSize == 4 || LoadSize == 8)) { // Remove the load, if the destination register of the loads is the same // register for stored value. if (StRt == LdRt && LoadSize == 8) { StoreI->clearRegisterKills(StRt, TRI); DEBUG(dbgs() << "Remove load instruction:\n "); DEBUG(LoadI->print(dbgs())); DEBUG(dbgs() << "\n"); LoadI->eraseFromParent(); return NextI; } // Replace the load with a mov if the load and store are in the same size. BitExtMI = BuildMI(*LoadI->getParent(), LoadI, LoadI->getDebugLoc(), TII->get(IsStoreXReg ? 
AArch64::ORRXrs : AArch64::ORRWrs), LdRt) .addReg(IsStoreXReg ? AArch64::XZR : AArch64::WZR) .addReg(StRt) .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0)); } else { // FIXME: Currently we disable this transformation in big-endian targets as // performance and correctness are verified only in little-endian. if (!Subtarget->isLittleEndian()) return NextI; bool IsUnscaled = TII->isUnscaledLdSt(*LoadI); assert(IsUnscaled == TII->isUnscaledLdSt(*StoreI) && "Unsupported ld/st match"); assert(LoadSize <= StoreSize && "Invalid load size"); int UnscaledLdOffset = IsUnscaled ? getLdStOffsetOp(*LoadI).getImm() : getLdStOffsetOp(*LoadI).getImm() * LoadSize; int UnscaledStOffset = IsUnscaled ? getLdStOffsetOp(*StoreI).getImm() : getLdStOffsetOp(*StoreI).getImm() * StoreSize; int Width = LoadSize * 8; int Immr = 8 * (UnscaledLdOffset - UnscaledStOffset); int Imms = Immr + Width - 1; unsigned DestReg = IsStoreXReg ? TRI->getMatchingSuperReg(LdRt, AArch64::sub_32, &AArch64::GPR64RegClass) : LdRt; assert((UnscaledLdOffset >= UnscaledStOffset && (UnscaledLdOffset + LoadSize) <= UnscaledStOffset + StoreSize) && "Invalid offset"); Immr = 8 * (UnscaledLdOffset - UnscaledStOffset); Imms = Immr + Width - 1; if (UnscaledLdOffset == UnscaledStOffset) { uint32_t AndMaskEncoded = ((IsStoreXReg ? 1 : 0) << 12) // N | ((Immr) << 6) // immr | ((Imms) << 0) // imms ; BitExtMI = BuildMI(*LoadI->getParent(), LoadI, LoadI->getDebugLoc(), TII->get(IsStoreXReg ? AArch64::ANDXri : AArch64::ANDWri), DestReg) .addReg(StRt) .addImm(AndMaskEncoded); } else { BitExtMI = BuildMI(*LoadI->getParent(), LoadI, LoadI->getDebugLoc(), TII->get(IsStoreXReg ? AArch64::UBFMXri : AArch64::UBFMWri), DestReg) .addReg(StRt) .addImm(Immr) .addImm(Imms); } } - StoreI->clearRegisterKills(StRt, TRI); - (void)BitExtMI; + // Clear kill flags between store and load. + for (MachineInstr &MI : make_range(StoreI->getIterator(), + BitExtMI->getIterator())) + MI.clearRegisterKills(StRt, TRI); DEBUG(dbgs() << "Promoting load by replacing :\n "); DEBUG(StoreI->print(dbgs())); DEBUG(dbgs() << " "); DEBUG(LoadI->print(dbgs())); DEBUG(dbgs() << " with instructions:\n "); DEBUG(StoreI->print(dbgs())); DEBUG(dbgs() << " "); DEBUG((BitExtMI)->print(dbgs())); DEBUG(dbgs() << "\n"); // Erase the old instructions. LoadI->eraseFromParent(); return NextI; } /// trackRegDefsUses - Remember what registers the specified instruction uses /// and modifies. static void trackRegDefsUses(const MachineInstr &MI, BitVector &ModifiedRegs, BitVector &UsedRegs, const TargetRegisterInfo *TRI) { for (const MachineOperand &MO : MI.operands()) { if (MO.isRegMask()) ModifiedRegs.setBitsNotInMask(MO.getRegMask()); if (!MO.isReg()) continue; unsigned Reg = MO.getReg(); if (!Reg) continue; if (MO.isDef()) { // WZR/XZR are not modified even when used as a destination register. if (Reg != AArch64::WZR && Reg != AArch64::XZR) for (MCRegAliasIterator AI(Reg, TRI, true); AI.isValid(); ++AI) ModifiedRegs.set(*AI); } else { assert(MO.isUse() && "Reg operand not a def and not a use?!?"); for (MCRegAliasIterator AI(Reg, TRI, true); AI.isValid(); ++AI) UsedRegs.set(*AI); } } } static bool inBoundsForPair(bool IsUnscaled, int Offset, int OffsetStride) { // Convert the byte-offset used by unscaled into an "element" offset used // by the scaled pair load/store instructions. if (IsUnscaled) { // If the byte-offset isn't a multiple of the stride, there's no point // trying to match it. 
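    // For example (illustrative): a byte offset of #16 with an 8-byte stride
    // becomes element offset 2, which fits the signed 7-bit pair range
    // checked below, while a byte offset of #12 is rejected outright.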
    if (Offset % OffsetStride)
      return false;
    Offset /= OffsetStride;
  }
  return Offset <= 63 && Offset >= -64;
}

// Do alignment, specialized to power of 2 and for signed ints,
// avoiding having to do a C-style cast from uint64_t to int when
// using alignTo from include/llvm/Support/MathExtras.h.
// FIXME: Move this function to include/MathExtras.h?
static int alignTo(int Num, int PowOf2) {
  return (Num + PowOf2 - 1) & ~(PowOf2 - 1);
}

static bool mayAlias(MachineInstr &MIa, MachineInstr &MIb,
                     const AArch64InstrInfo *TII) {
  // One of the instructions must modify memory.
  if (!MIa.mayStore() && !MIb.mayStore())
    return false;

  // Both instructions must be memory operations.
  if (!MIa.mayLoadOrStore() && !MIb.mayLoadOrStore())
    return false;

  return !TII->areMemAccessesTriviallyDisjoint(MIa, MIb);
}

static bool mayAlias(MachineInstr &MIa,
                     SmallVectorImpl<MachineInstr *> &MemInsns,
                     const AArch64InstrInfo *TII) {
  for (MachineInstr *MIb : MemInsns)
    if (mayAlias(MIa, *MIb, TII))
      return true;

  return false;
}

bool AArch64LoadStoreOpt::findMatchingStore(
    MachineBasicBlock::iterator I, unsigned Limit,
    MachineBasicBlock::iterator &StoreI) {
  MachineBasicBlock::iterator B = I->getParent()->begin();
  MachineBasicBlock::iterator MBBI = I;
  MachineInstr &LoadMI = *I;
  unsigned BaseReg = getLdStBaseOp(LoadMI).getReg();

  // If the load is the first instruction in the block, there's obviously
  // not any matching store.
  if (MBBI == B)
    return false;

  // Track which registers have been modified and used between the first insn
  // and the second insn.
  ModifiedRegs.reset();
  UsedRegs.reset();

  unsigned Count = 0;
  do {
    --MBBI;
    MachineInstr &MI = *MBBI;

    // Don't count transient instructions towards the search limit since there
    // may be different numbers of them if e.g. debug information is present.
    if (!MI.isTransient())
      ++Count;

    // If the load instruction reads directly from the address to which the
    // store instruction writes and the stored value is not modified, we can
    // promote the load. Since we do not handle stores with pre-/post-index,
    // it's unnecessary to check if BaseReg is modified by the store itself.
    if (MI.mayStore() && isMatchingStore(LoadMI, MI) &&
        BaseReg == getLdStBaseOp(MI).getReg() &&
        isLdOffsetInRangeOfSt(LoadMI, MI, TII) &&
        !ModifiedRegs[getLdStRegOp(MI).getReg()]) {
      StoreI = MBBI;
      return true;
    }

    if (MI.isCall())
      return false;

    // Update modified / uses register lists.
    trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI);

    // Otherwise, if the base register is modified, we have no match, so
    // return early.
    if (ModifiedRegs[BaseReg])
      return false;

    // If we encounter a store aliased with the load, return early.
    if (MI.mayStore() && mayAlias(LoadMI, MI, TII))
      return false;
  } while (MBBI != B && Count < Limit);
  return false;
}

// Returns true if FirstMI and MI are candidates for merging or pairing.
// Otherwise, returns false.
static bool areCandidatesToMergeOrPair(MachineInstr &FirstMI, MachineInstr &MI,
                                       LdStPairFlags &Flags,
                                       const AArch64InstrInfo *TII) {
  // If this is volatile or if pairing is suppressed, not a candidate.
  if (MI.hasOrderedMemoryRef() || TII->isLdStPairSuppressed(MI))
    return false;

  // We should have already checked FirstMI for pair suppression and volatility.
  assert(!FirstMI.hasOrderedMemoryRef() &&
         !TII->isLdStPairSuppressed(FirstMI) &&
         "FirstMI shouldn't get here if either of these checks are true.");

  unsigned OpcA = FirstMI.getOpcode();
  unsigned OpcB = MI.getOpcode();

  // Opcodes match: nothing more to check.
if (OpcA == OpcB) return true; // Try to match a sign-extended load/store with a zero-extended load/store. bool IsValidLdStrOpc, PairIsValidLdStrOpc; unsigned NonSExtOpc = getMatchingNonSExtOpcode(OpcA, &IsValidLdStrOpc); assert(IsValidLdStrOpc && "Given Opc should be a Load or Store with an immediate"); // OpcA will be the first instruction in the pair. if (NonSExtOpc == getMatchingNonSExtOpcode(OpcB, &PairIsValidLdStrOpc)) { Flags.setSExtIdx(NonSExtOpc == (unsigned)OpcA ? 1 : 0); return true; } // If the second instruction isn't even a mergable/pairable load/store, bail // out. if (!PairIsValidLdStrOpc) return false; // FIXME: We don't support merging narrow stores with mixed scaled/unscaled // offsets. if (isNarrowStore(OpcA) || isNarrowStore(OpcB)) return false; // Try to match an unscaled load/store with a scaled load/store. return TII->isUnscaledLdSt(OpcA) != TII->isUnscaledLdSt(OpcB) && getMatchingPairOpcode(OpcA) == getMatchingPairOpcode(OpcB); // FIXME: Can we also match a mixed sext/zext unscaled/scaled pair? } /// Scan the instructions looking for a load/store that can be combined with the /// current instruction into a wider equivalent or a load/store pair. MachineBasicBlock::iterator AArch64LoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I, LdStPairFlags &Flags, unsigned Limit, bool FindNarrowMerge) { MachineBasicBlock::iterator E = I->getParent()->end(); MachineBasicBlock::iterator MBBI = I; MachineInstr &FirstMI = *I; ++MBBI; bool MayLoad = FirstMI.mayLoad(); bool IsUnscaled = TII->isUnscaledLdSt(FirstMI); unsigned Reg = getLdStRegOp(FirstMI).getReg(); unsigned BaseReg = getLdStBaseOp(FirstMI).getReg(); int Offset = getLdStOffsetOp(FirstMI).getImm(); int OffsetStride = IsUnscaled ? getMemScale(FirstMI) : 1; bool IsPromotableZeroStore = isPromotableZeroStoreInst(FirstMI); // Track which registers have been modified and used between the first insn // (inclusive) and the second insn. ModifiedRegs.reset(); UsedRegs.reset(); // Remember any instructions that read/write memory between FirstMI and MI. SmallVector MemInsns; for (unsigned Count = 0; MBBI != E && Count < Limit; ++MBBI) { MachineInstr &MI = *MBBI; // Don't count transient instructions towards the search limit since there // may be different numbers of them if e.g. debug information is present. if (!MI.isTransient()) ++Count; Flags.setSExtIdx(-1); if (areCandidatesToMergeOrPair(FirstMI, MI, Flags, TII) && getLdStOffsetOp(MI).isImm()) { assert(MI.mayLoadOrStore() && "Expected memory operation."); // If we've found another instruction with the same opcode, check to see // if the base and offset are compatible with our starting instruction. // These instructions all have scaled immediate operands, so we just // check for +1/-1. Make sure to check the new instruction offset is // actually an immediate and not a symbolic reference destined for // a relocation. unsigned MIBaseReg = getLdStBaseOp(MI).getReg(); int MIOffset = getLdStOffsetOp(MI).getImm(); bool MIIsUnscaled = TII->isUnscaledLdSt(MI); if (IsUnscaled != MIIsUnscaled) { // We're trying to pair instructions that differ in how they are scaled. // If FirstMI is scaled then scale the offset of MI accordingly. // Otherwise, do the opposite (i.e., make MI's offset unscaled). int MemSize = getMemScale(MI); if (MIIsUnscaled) { // If the unscaled offset isn't a multiple of the MemSize, we can't // pair the operations together: bail and keep looking. 
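        // (Illustrative: a scaled LDRXui cannot pair with an LDURXi whose
        // byte offset is #12, since 12 is not a multiple of the 8-byte
        // stride.)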
if (MIOffset % MemSize) { trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI); MemInsns.push_back(&MI); continue; } MIOffset /= MemSize; } else { MIOffset *= MemSize; } } if (BaseReg == MIBaseReg && ((Offset == MIOffset + OffsetStride) || (Offset + OffsetStride == MIOffset))) { int MinOffset = Offset < MIOffset ? Offset : MIOffset; if (FindNarrowMerge) { // If the alignment requirements of the scaled wide load/store // instruction can't express the offset of the scaled narrow input, // bail and keep looking. For promotable zero stores, allow only when // the stored value is the same (i.e., WZR). if ((!IsUnscaled && alignTo(MinOffset, 2) != MinOffset) || (IsPromotableZeroStore && Reg != getLdStRegOp(MI).getReg())) { trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI); MemInsns.push_back(&MI); continue; } } else { // Pairwise instructions have a 7-bit signed offset field. Single // insns have a 12-bit unsigned offset field. If the resultant // immediate offset of merging these instructions is out of range for // a pairwise instruction, bail and keep looking. if (!inBoundsForPair(IsUnscaled, MinOffset, OffsetStride)) { trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI); MemInsns.push_back(&MI); continue; } // If the alignment requirements of the paired (scaled) instruction // can't express the offset of the unscaled input, bail and keep // looking. if (IsUnscaled && (alignTo(MinOffset, OffsetStride) != MinOffset)) { trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI); MemInsns.push_back(&MI); continue; } } // If the destination register of the loads is the same register, bail // and keep looking. A load-pair instruction with both destination // registers the same is UNPREDICTABLE and will result in an exception. if (MayLoad && Reg == getLdStRegOp(MI).getReg()) { trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI); MemInsns.push_back(&MI); continue; } // If the Rt of the second instruction was not modified or used between // the two instructions and none of the instructions between the second // and first alias with the second, we can combine the second into the // first. if (!ModifiedRegs[getLdStRegOp(MI).getReg()] && !(MI.mayLoad() && UsedRegs[getLdStRegOp(MI).getReg()]) && !mayAlias(MI, MemInsns, TII)) { Flags.setMergeForward(false); return MBBI; } // Likewise, if the Rt of the first instruction is not modified or used // between the two instructions and none of the instructions between the // first and the second alias with the first, we can combine the first // into the second. if (!ModifiedRegs[getLdStRegOp(FirstMI).getReg()] && !(MayLoad && UsedRegs[getLdStRegOp(FirstMI).getReg()]) && !mayAlias(FirstMI, MemInsns, TII)) { Flags.setMergeForward(true); return MBBI; } // Unable to combine these instructions due to interference in between. // Keep looking. } } // If the instruction wasn't a matching load or store. Stop searching if we // encounter a call instruction that might modify memory. if (MI.isCall()) return E; // Update modified / uses register lists. trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI); // Otherwise, if the base register is modified, we have no match, so // return early. if (ModifiedRegs[BaseReg]) return E; // Update list of instructions that read/write memory. 
if (MI.mayLoadOrStore()) MemInsns.push_back(&MI); } return E; } MachineBasicBlock::iterator AArch64LoadStoreOpt::mergeUpdateInsn(MachineBasicBlock::iterator I, MachineBasicBlock::iterator Update, bool IsPreIdx) { assert((Update->getOpcode() == AArch64::ADDXri || Update->getOpcode() == AArch64::SUBXri) && "Unexpected base register update instruction to merge!"); MachineBasicBlock::iterator NextI = I; // Return the instruction following the merged instruction, which is // the instruction following our unmerged load. Unless that's the add/sub // instruction we're merging, in which case it's the one after that. if (++NextI == Update) ++NextI; int Value = Update->getOperand(2).getImm(); assert(AArch64_AM::getShiftValue(Update->getOperand(3).getImm()) == 0 && "Can't merge 1 << 12 offset into pre-/post-indexed load / store"); if (Update->getOpcode() == AArch64::SUBXri) Value = -Value; unsigned NewOpc = IsPreIdx ? getPreIndexedOpcode(I->getOpcode()) : getPostIndexedOpcode(I->getOpcode()); MachineInstrBuilder MIB; if (!isPairedLdSt(*I)) { // Non-paired instruction. MIB = BuildMI(*I->getParent(), I, I->getDebugLoc(), TII->get(NewOpc)) .addOperand(getLdStRegOp(*Update)) .addOperand(getLdStRegOp(*I)) .addOperand(getLdStBaseOp(*I)) .addImm(Value) .setMemRefs(I->memoperands_begin(), I->memoperands_end()); } else { // Paired instruction. int Scale = getMemScale(*I); MIB = BuildMI(*I->getParent(), I, I->getDebugLoc(), TII->get(NewOpc)) .addOperand(getLdStRegOp(*Update)) .addOperand(getLdStRegOp(*I, 0)) .addOperand(getLdStRegOp(*I, 1)) .addOperand(getLdStBaseOp(*I)) .addImm(Value / Scale) .setMemRefs(I->memoperands_begin(), I->memoperands_end()); } (void)MIB; if (IsPreIdx) DEBUG(dbgs() << "Creating pre-indexed load/store."); else DEBUG(dbgs() << "Creating post-indexed load/store."); DEBUG(dbgs() << " Replacing instructions:\n "); DEBUG(I->print(dbgs())); DEBUG(dbgs() << " "); DEBUG(Update->print(dbgs())); DEBUG(dbgs() << " with instruction:\n "); DEBUG(((MachineInstr *)MIB)->print(dbgs())); DEBUG(dbgs() << "\n"); // Erase the old instructions for the block. I->eraseFromParent(); Update->eraseFromParent(); return NextI; } bool AArch64LoadStoreOpt::isMatchingUpdateInsn(MachineInstr &MemMI, MachineInstr &MI, unsigned BaseReg, int Offset) { switch (MI.getOpcode()) { default: break; case AArch64::SUBXri: case AArch64::ADDXri: // Make sure it's a vanilla immediate operand, not a relocation or // anything else we can't handle. if (!MI.getOperand(2).isImm()) break; // Watch out for 1 << 12 shifted value. if (AArch64_AM::getShiftValue(MI.getOperand(3).getImm())) break; // The update instruction source and destination register must be the // same as the load/store base register. if (MI.getOperand(0).getReg() != BaseReg || MI.getOperand(1).getReg() != BaseReg) break; bool IsPairedInsn = isPairedLdSt(MemMI); int UpdateOffset = MI.getOperand(2).getImm(); if (MI.getOpcode() == AArch64::SUBXri) UpdateOffset = -UpdateOffset; // For non-paired load/store instructions, the immediate must fit in a // signed 9-bit integer. if (!IsPairedInsn && (UpdateOffset > 255 || UpdateOffset < -256)) break; // For paired load/store instructions, the immediate must be a multiple of // the scaling factor. The scaled offset must also fit into a signed 7-bit // integer. 
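    // For example (illustrative): merging 'add x2, x2, #16' into an LDPXi
    // (scale 8) gives scaled offset 2, which is in range; 'add x2, x2, #4'
    // is rejected because 4 is not a multiple of the scale.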
if (IsPairedInsn) { int Scale = getMemScale(MemMI); if (UpdateOffset % Scale != 0) break; int ScaledOffset = UpdateOffset / Scale; if (ScaledOffset > 63 || ScaledOffset < -64) break; } // If we have a non-zero Offset, we check that it matches the amount // we're adding to the register. if (!Offset || Offset == UpdateOffset) return true; break; } return false; } MachineBasicBlock::iterator AArch64LoadStoreOpt::findMatchingUpdateInsnForward( MachineBasicBlock::iterator I, int UnscaledOffset, unsigned Limit) { MachineBasicBlock::iterator E = I->getParent()->end(); MachineInstr &MemMI = *I; MachineBasicBlock::iterator MBBI = I; unsigned BaseReg = getLdStBaseOp(MemMI).getReg(); int MIUnscaledOffset = getLdStOffsetOp(MemMI).getImm() * getMemScale(MemMI); // Scan forward looking for post-index opportunities. Updating instructions // can't be formed if the memory instruction doesn't have the offset we're // looking for. if (MIUnscaledOffset != UnscaledOffset) return E; // If the base register overlaps a destination register, we can't // merge the update. bool IsPairedInsn = isPairedLdSt(MemMI); for (unsigned i = 0, e = IsPairedInsn ? 2 : 1; i != e; ++i) { unsigned DestReg = getLdStRegOp(MemMI, i).getReg(); if (DestReg == BaseReg || TRI->isSubRegister(BaseReg, DestReg)) return E; } // Track which registers have been modified and used between the first insn // (inclusive) and the second insn. ModifiedRegs.reset(); UsedRegs.reset(); ++MBBI; for (unsigned Count = 0; MBBI != E && Count < Limit; ++MBBI) { MachineInstr &MI = *MBBI; // Don't count transient instructions towards the search limit since there // may be different numbers of them if e.g. debug information is present. if (!MI.isTransient()) ++Count; // If we found a match, return it. if (isMatchingUpdateInsn(*I, MI, BaseReg, UnscaledOffset)) return MBBI; // Update the status of what the instruction clobbered and used. trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI); // Otherwise, if the base register is used or modified, we have no match, so // return early. if (ModifiedRegs[BaseReg] || UsedRegs[BaseReg]) return E; } return E; } MachineBasicBlock::iterator AArch64LoadStoreOpt::findMatchingUpdateInsnBackward( MachineBasicBlock::iterator I, unsigned Limit) { MachineBasicBlock::iterator B = I->getParent()->begin(); MachineBasicBlock::iterator E = I->getParent()->end(); MachineInstr &MemMI = *I; MachineBasicBlock::iterator MBBI = I; unsigned BaseReg = getLdStBaseOp(MemMI).getReg(); int Offset = getLdStOffsetOp(MemMI).getImm(); // If the load/store is the first instruction in the block, there's obviously // not any matching update. Ditto if the memory offset isn't zero. if (MBBI == B || Offset != 0) return E; // If the base register overlaps a destination register, we can't // merge the update. bool IsPairedInsn = isPairedLdSt(MemMI); for (unsigned i = 0, e = IsPairedInsn ? 2 : 1; i != e; ++i) { unsigned DestReg = getLdStRegOp(MemMI, i).getReg(); if (DestReg == BaseReg || TRI->isSubRegister(BaseReg, DestReg)) return E; } // Track which registers have been modified and used between the first insn // (inclusive) and the second insn. ModifiedRegs.reset(); UsedRegs.reset(); unsigned Count = 0; do { --MBBI; MachineInstr &MI = *MBBI; // Don't count transient instructions towards the search limit since there // may be different numbers of them if e.g. debug information is present. if (!MI.isTransient()) ++Count; // If we found a match, return it. 
    if (isMatchingUpdateInsn(*I, MI, BaseReg, Offset))
      return MBBI;

    // Update the status of what the instruction clobbered and used.
    trackRegDefsUses(MI, ModifiedRegs, UsedRegs, TRI);

    // Otherwise, if the base register is used or modified, we have no match,
    // so return early.
    if (ModifiedRegs[BaseReg] || UsedRegs[BaseReg])
      return E;
  } while (MBBI != B && Count < Limit);
  return E;
}

bool AArch64LoadStoreOpt::tryToPromoteLoadFromStore(
    MachineBasicBlock::iterator &MBBI) {
  MachineInstr &MI = *MBBI;
  // If this is a volatile load, don't mess with it.
  if (MI.hasOrderedMemoryRef())
    return false;

  // Make sure this is a reg+imm.
  // FIXME: It is possible to extend it to handle reg+reg cases.
  if (!getLdStOffsetOp(MI).isImm())
    return false;

  // Look backward up to LdStLimit instructions.
  MachineBasicBlock::iterator StoreI;
  if (findMatchingStore(MBBI, LdStLimit, StoreI)) {
    ++NumLoadsFromStoresPromoted;
    // Promote the load. Keeping the iterator straight is a
    // pain, so we let the merge routine tell us what the next instruction
    // is after it's done mucking about.
    MBBI = promoteLoadFromStore(MBBI, StoreI);
    return true;
  }
  return false;
}

// Merge adjacent zero stores into a wider store.
bool AArch64LoadStoreOpt::tryToMergeZeroStInst(
    MachineBasicBlock::iterator &MBBI) {
  assert(isPromotableZeroStoreInst(*MBBI) && "Expected narrow store.");
  MachineInstr &MI = *MBBI;
  MachineBasicBlock::iterator E = MI.getParent()->end();

  if (!TII->isCandidateToMergeOrPair(MI))
    return false;

  // Look ahead up to LdStLimit instructions for a mergable instruction.
  LdStPairFlags Flags;
  MachineBasicBlock::iterator MergeMI =
      findMatchingInsn(MBBI, Flags, LdStLimit, /* FindNarrowMerge = */ true);
  if (MergeMI != E) {
    ++NumZeroStoresPromoted;

    // Keeping the iterator straight is a pain, so we let the merge routine
    // tell us what the next instruction is after it's done mucking about.
    MBBI = mergeNarrowZeroStores(MBBI, MergeMI, Flags);
    return true;
  }
  return false;
}

// Find loads and stores that can be merged into a single load or store pair
// instruction.
bool AArch64LoadStoreOpt::tryToPairLdStInst(MachineBasicBlock::iterator &MBBI) {
  MachineInstr &MI = *MBBI;
  MachineBasicBlock::iterator E = MI.getParent()->end();

  if (!TII->isCandidateToMergeOrPair(MI))
    return false;

  // Early exit if the offset is not possible to match. (6 bits of positive
  // range, plus allow an extra one in case we find a later insn that matches
  // with Offset-1)
  bool IsUnscaled = TII->isUnscaledLdSt(MI);
  int Offset = getLdStOffsetOp(MI).getImm();
  int OffsetStride = IsUnscaled ? getMemScale(MI) : 1;
  // Allow one more for offset.
  if (Offset > 0)
    Offset -= OffsetStride;
  if (!inBoundsForPair(IsUnscaled, Offset, OffsetStride))
    return false;

  // Look ahead up to LdStLimit instructions for a pairable instruction.
  LdStPairFlags Flags;
  MachineBasicBlock::iterator Paired =
      findMatchingInsn(MBBI, Flags, LdStLimit, /* FindNarrowMerge = */ false);
  if (Paired != E) {
    ++NumPairCreated;
    if (TII->isUnscaledLdSt(MI))
      ++NumUnscaledPairCreated;
    // Keeping the iterator straight is a pain, so we let the merge routine
    // tell us what the next instruction is after it's done mucking about.
    MBBI = mergePairedInsns(MBBI, Paired, Flags);
    return true;
  }
  return false;
}

bool AArch64LoadStoreOpt::optimizeBlock(MachineBasicBlock &MBB,
                                        bool EnableNarrowZeroStOpt) {
  bool Modified = false;
  // Four transformations to do here:
  // 1) Find loads that directly read from stores and promote them by
  //    replacing with mov instructions. If the store is wider than the load,
  //    the load will be replaced with a bitfield extract.
  // e.g.,
  //   str w1, [x0, #4]
  //   ldrh w2, [x0, #6]
  //   ; becomes
  //   str w1, [x0, #4]
  //   lsr w2, w1, #16
  for (MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();
       MBBI != E;) {
    MachineInstr &MI = *MBBI;
    switch (MI.getOpcode()) {
    default:
      // Just move on to the next instruction.
      ++MBBI;
      break;
    // Scaled instructions.
    case AArch64::LDRBBui:
    case AArch64::LDRHHui:
    case AArch64::LDRWui:
    case AArch64::LDRXui:
    // Unscaled instructions.
    case AArch64::LDURBBi:
    case AArch64::LDURHHi:
    case AArch64::LDURWi:
    case AArch64::LDURXi: {
      if (tryToPromoteLoadFromStore(MBBI)) {
        Modified = true;
        break;
      }
      ++MBBI;
      break;
    }
    }
  }
  // 2) Merge adjacent zero stores into a wider store.
  // e.g.,
  //   strh wzr, [x0]
  //   strh wzr, [x0, #2]
  //   ; becomes
  //   str wzr, [x0]
  // e.g.,
  //   str wzr, [x0]
  //   str wzr, [x0, #4]
  //   ; becomes
  //   str xzr, [x0]
  for (MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();
       EnableNarrowZeroStOpt && MBBI != E;) {
    if (isPromotableZeroStoreInst(*MBBI)) {
      if (tryToMergeZeroStInst(MBBI)) {
        Modified = true;
      } else
        ++MBBI;
    } else
      ++MBBI;
  }

  // 3) Find loads and stores that can be merged into a single load or store
  //    pair instruction.
  // e.g.,
  //   ldr x0, [x2]
  //   ldr x1, [x2, #8]
  //   ; becomes
  //   ldp x0, x1, [x2]
  for (MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();
       MBBI != E;) {
    if (TII->isPairableLdStInst(*MBBI) && tryToPairLdStInst(MBBI))
      Modified = true;
    else
      ++MBBI;
  }

  // 4) Find base register updates that can be merged into the load or store
  //    as a base-reg writeback.
  // e.g.,
  //   ldr x0, [x2]
  //   add x2, x2, #4
  //   ; becomes
  //   ldr x0, [x2], #4
  for (MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();
       MBBI != E;) {
    MachineInstr &MI = *MBBI;
    // Do update merging. It's simpler to keep this separate from the above
    // switches, though not strictly necessary.
    unsigned Opc = MI.getOpcode();
    switch (Opc) {
    default:
      // Just move on to the next instruction.
      ++MBBI;
      break;
    // Scaled instructions.
    case AArch64::STRSui:
    case AArch64::STRDui:
    case AArch64::STRQui:
    case AArch64::STRXui:
    case AArch64::STRWui:
    case AArch64::STRHHui:
    case AArch64::STRBBui:
    case AArch64::LDRSui:
    case AArch64::LDRDui:
    case AArch64::LDRQui:
    case AArch64::LDRXui:
    case AArch64::LDRWui:
    case AArch64::LDRHHui:
    case AArch64::LDRBBui:
    // Unscaled instructions.
    case AArch64::STURSi:
    case AArch64::STURDi:
    case AArch64::STURQi:
    case AArch64::STURWi:
    case AArch64::STURXi:
    case AArch64::LDURSi:
    case AArch64::LDURDi:
    case AArch64::LDURQi:
    case AArch64::LDURWi:
    case AArch64::LDURXi:
    // Paired instructions.
    case AArch64::LDPSi:
    case AArch64::LDPSWi:
    case AArch64::LDPDi:
    case AArch64::LDPQi:
    case AArch64::LDPWi:
    case AArch64::LDPXi:
    case AArch64::STPSi:
    case AArch64::STPDi:
    case AArch64::STPQi:
    case AArch64::STPWi:
    case AArch64::STPXi: {
      // Make sure this is a reg+imm (as opposed to an address reloc).
      if (!getLdStOffsetOp(MI).isImm()) {
        ++MBBI;
        break;
      }
      // Look forward to try to form a post-index instruction. For example,
      //   ldr x0, [x20]
      //   add x20, x20, #32
      // merged into:
      //   ldr x0, [x20], #32
      MachineBasicBlock::iterator Update =
          findMatchingUpdateInsnForward(MBBI, 0, UpdateLimit);
      if (Update != E) {
        // Merge the update into the ld/st.
        MBBI = mergeUpdateInsn(MBBI, Update, /*IsPreIdx=*/false);
        Modified = true;
        ++NumPostFolded;
        break;
      }

      // Don't know how to handle pre/post-index versions, so move to the
      // next instruction.
      if (TII->isUnscaledLdSt(Opc)) {
        ++MBBI;
        break;
      }

      // Look back to try to find a pre-index instruction. For example,
      //   add x0, x0, #8
      //   ldr x1, [x0]
      // merged into:
      //   ldr x1, [x0, #8]!
      Update = findMatchingUpdateInsnBackward(MBBI, UpdateLimit);
      if (Update != E) {
        // Merge the update into the ld/st.
        MBBI = mergeUpdateInsn(MBBI, Update, /*IsPreIdx=*/true);
        Modified = true;
        ++NumPreFolded;
        break;
      }

      // The immediate in the load/store is scaled by the size of the memory
      // operation. The immediate in the add we're looking for,
      // however, is not, so adjust here.
      int UnscaledOffset = getLdStOffsetOp(MI).getImm() * getMemScale(MI);

      // Look forward to try to find a pre-index instruction. For example,
      //   ldr x1, [x0, #64]
      //   add x0, x0, #64
      // merged into:
      //   ldr x1, [x0, #64]!
      Update = findMatchingUpdateInsnForward(MBBI, UnscaledOffset, UpdateLimit);
      if (Update != E) {
        // Merge the update into the ld/st.
        MBBI = mergeUpdateInsn(MBBI, Update, /*IsPreIdx=*/true);
        Modified = true;
        ++NumPreFolded;
        break;
      }

      // Nothing found. Just move to the next instruction.
      ++MBBI;
      break;
    }
    }
  }

  return Modified;
}

bool AArch64LoadStoreOpt::runOnMachineFunction(MachineFunction &Fn) {
  if (skipFunction(*Fn.getFunction()))
    return false;

  Subtarget = &static_cast<const AArch64Subtarget &>(Fn.getSubtarget());
  TII = static_cast<const AArch64InstrInfo *>(Subtarget->getInstrInfo());
  TRI = Subtarget->getRegisterInfo();

  // Resize the modified and used register bitfield trackers. We do this once
  // per function and then clear the bitfield each time we optimize a load or
  // store.
  ModifiedRegs.resize(TRI->getNumRegs());
  UsedRegs.resize(TRI->getNumRegs());

  bool Modified = false;
  bool enableNarrowZeroStOpt = !Subtarget->requiresStrictAlign();
  for (auto &MBB : Fn)
    Modified |= optimizeBlock(MBB, enableNarrowZeroStOpt);

  return Modified;
}

// FIXME: Do we need/want a pre-alloc pass like ARM has to try to keep loads
// and stores near one another? Note: The pre-RA instruction scheduler already
// has hooks to try and schedule pairable loads/stores together to improve
// pairing opportunities. Thus, a pre-RA pairing pass may not be worth the
// effort.

// FIXME: When pairing store instructions it's very possible for this pass to
// hoist a store with a KILL marker above another use (without a KILL marker).
// The resulting IR is invalid, but nothing uses the KILL markers after this
// pass, so it's never caused a problem in practice.

/// createAArch64LoadStoreOptimizationPass - returns an instance of the
/// load / store optimization pass.
FunctionPass *llvm::createAArch64LoadStoreOptimizationPass() {
  return new AArch64LoadStoreOpt();
}
Index: vendor/llvm/dist/lib/Target/ARM/ARMExpandPseudoInsts.cpp
===================================================================
--- vendor/llvm/dist/lib/Target/ARM/ARMExpandPseudoInsts.cpp (revision 314158)
+++ vendor/llvm/dist/lib/Target/ARM/ARMExpandPseudoInsts.cpp (revision 314159)
@@ -1,1686 +1,1706 @@
//===-- ARMExpandPseudoInsts.cpp - Expand pseudo instructions -------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains a pass that expands pseudo instructions into target
// instructions to allow proper scheduling, if-conversion, and other late
// optimizations. This pass should be run after register allocation but before
// the post-regalloc scheduling pass.
//
//===----------------------------------------------------------------------===//

#include "ARM.h"
#include "ARMBaseInstrInfo.h"
#include "ARMBaseRegisterInfo.h"
#include "ARMConstantPoolValue.h"
#include "ARMMachineFunctionInfo.h"
#include "MCTargetDesc/ARMAddressingModes.h"
#include "llvm/CodeGen/LivePhysRegs.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineInstrBundle.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/raw_ostream.h" // FIXME: for debug only. remove!
#include "llvm/Target/TargetFrameLowering.h"
#include "llvm/Target/TargetRegisterInfo.h"

using namespace llvm;

#define DEBUG_TYPE "arm-pseudo"

static cl::opt<bool>
VerifyARMPseudo("verify-arm-pseudo-expand", cl::Hidden,
                cl::desc("Verify machine code after expanding ARM pseudos"));

namespace {
  class ARMExpandPseudo : public MachineFunctionPass {
  public:
    static char ID;
    ARMExpandPseudo() : MachineFunctionPass(ID) {}

    const ARMBaseInstrInfo *TII;
    const TargetRegisterInfo *TRI;
    const ARMSubtarget *STI;
    ARMFunctionInfo *AFI;

    bool runOnMachineFunction(MachineFunction &Fn) override;

    MachineFunctionProperties getRequiredProperties() const override {
      return MachineFunctionProperties().set(
          MachineFunctionProperties::Property::NoVRegs);
    }

    StringRef getPassName() const override {
      return "ARM pseudo instruction expansion pass";
    }

  private:
    void TransferImpOps(MachineInstr &OldMI,
                        MachineInstrBuilder &UseMI,
                        MachineInstrBuilder &DefMI);
    bool ExpandMI(MachineBasicBlock &MBB,
                  MachineBasicBlock::iterator MBBI,
                  MachineBasicBlock::iterator &NextMBBI);
    bool ExpandMBB(MachineBasicBlock &MBB);
    void ExpandVLD(MachineBasicBlock::iterator &MBBI);
    void ExpandVST(MachineBasicBlock::iterator &MBBI);
    void ExpandLaneOp(MachineBasicBlock::iterator &MBBI);
    void ExpandVTBL(MachineBasicBlock::iterator &MBBI,
                    unsigned Opc, bool IsExt);
    void ExpandMOV32BitImm(MachineBasicBlock &MBB,
                           MachineBasicBlock::iterator &MBBI);
    bool ExpandCMP_SWAP(MachineBasicBlock &MBB,
                        MachineBasicBlock::iterator MBBI, unsigned LdrexOp,
                        unsigned StrexOp, unsigned UxtOp,
                        MachineBasicBlock::iterator &NextMBBI);

    bool ExpandCMP_SWAP_64(MachineBasicBlock &MBB,
                           MachineBasicBlock::iterator MBBI,
                           MachineBasicBlock::iterator &NextMBBI);
  };
  char ARMExpandPseudo::ID = 0;
}

/// TransferImpOps - Transfer implicit operands on the pseudo instruction to
/// the instructions created from the expansion.
void ARMExpandPseudo::TransferImpOps(MachineInstr &OldMI,
                                     MachineInstrBuilder &UseMI,
                                     MachineInstrBuilder &DefMI) {
  const MCInstrDesc &Desc = OldMI.getDesc();
  for (unsigned i = Desc.getNumOperands(), e = OldMI.getNumOperands();
       i != e; ++i) {
    const MachineOperand &MO = OldMI.getOperand(i);
    assert(MO.isReg() && MO.getReg());
    if (MO.isUse())
      UseMI.addOperand(MO);
    else
      DefMI.addOperand(MO);
  }
}

namespace {
  // Constants for register spacing in NEON load/store instructions.
  // For quad-register load-lane and store-lane pseudo instructions, the
  // spacing is initially assumed to be EvenDblSpc, and that is changed to
  // OddDblSpc depending on the lane number operand.
  enum NEONRegSpacing {
    SingleSpc,
    EvenDblSpc,
    OddDblSpc
  };

  // Entries for NEON load/store information table. The table is sorted by
  // PseudoOpc for fast binary-search lookups.
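  // For example, the { ARM::VLD1LNq16Pseudo, ARM::VLD1LNd16, ... } entry
  // below maps that lane-load pseudo to its real D-register instruction: a
  // non-updating load of one D register with four 16-bit elements per D
  // register, using even double-spaced subregisters.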
struct NEONLdStTableEntry { uint16_t PseudoOpc; uint16_t RealOpc; bool IsLoad; bool isUpdating; bool hasWritebackOperand; uint8_t RegSpacing; // One of type NEONRegSpacing uint8_t NumRegs; // D registers loaded or stored uint8_t RegElts; // elements per D register; used for lane ops // FIXME: Temporary flag to denote whether the real instruction takes // a single register (like the encoding) or all of the registers in // the list (like the asm syntax and the isel DAG). When all definitions // are converted to take only the single encoded register, this will // go away. bool copyAllListRegs; // Comparison methods for binary search of the table. bool operator<(const NEONLdStTableEntry &TE) const { return PseudoOpc < TE.PseudoOpc; } friend bool operator<(const NEONLdStTableEntry &TE, unsigned PseudoOpc) { return TE.PseudoOpc < PseudoOpc; } friend bool LLVM_ATTRIBUTE_UNUSED operator<(unsigned PseudoOpc, const NEONLdStTableEntry &TE) { return PseudoOpc < TE.PseudoOpc; } }; } static const NEONLdStTableEntry NEONLdStTable[] = { { ARM::VLD1LNq16Pseudo, ARM::VLD1LNd16, true, false, false, EvenDblSpc, 1, 4 ,true}, { ARM::VLD1LNq16Pseudo_UPD, ARM::VLD1LNd16_UPD, true, true, true, EvenDblSpc, 1, 4 ,true}, { ARM::VLD1LNq32Pseudo, ARM::VLD1LNd32, true, false, false, EvenDblSpc, 1, 2 ,true}, { ARM::VLD1LNq32Pseudo_UPD, ARM::VLD1LNd32_UPD, true, true, true, EvenDblSpc, 1, 2 ,true}, { ARM::VLD1LNq8Pseudo, ARM::VLD1LNd8, true, false, false, EvenDblSpc, 1, 8 ,true}, { ARM::VLD1LNq8Pseudo_UPD, ARM::VLD1LNd8_UPD, true, true, true, EvenDblSpc, 1, 8 ,true}, { ARM::VLD1d64QPseudo, ARM::VLD1d64Q, true, false, false, SingleSpc, 4, 1 ,false}, { ARM::VLD1d64QPseudoWB_fixed, ARM::VLD1d64Qwb_fixed, true, true, false, SingleSpc, 4, 1 ,false}, { ARM::VLD1d64TPseudo, ARM::VLD1d64T, true, false, false, SingleSpc, 3, 1 ,false}, { ARM::VLD1d64TPseudoWB_fixed, ARM::VLD1d64Twb_fixed, true, true, false, SingleSpc, 3, 1 ,false}, { ARM::VLD2LNd16Pseudo, ARM::VLD2LNd16, true, false, false, SingleSpc, 2, 4 ,true}, { ARM::VLD2LNd16Pseudo_UPD, ARM::VLD2LNd16_UPD, true, true, true, SingleSpc, 2, 4 ,true}, { ARM::VLD2LNd32Pseudo, ARM::VLD2LNd32, true, false, false, SingleSpc, 2, 2 ,true}, { ARM::VLD2LNd32Pseudo_UPD, ARM::VLD2LNd32_UPD, true, true, true, SingleSpc, 2, 2 ,true}, { ARM::VLD2LNd8Pseudo, ARM::VLD2LNd8, true, false, false, SingleSpc, 2, 8 ,true}, { ARM::VLD2LNd8Pseudo_UPD, ARM::VLD2LNd8_UPD, true, true, true, SingleSpc, 2, 8 ,true}, { ARM::VLD2LNq16Pseudo, ARM::VLD2LNq16, true, false, false, EvenDblSpc, 2, 4 ,true}, { ARM::VLD2LNq16Pseudo_UPD, ARM::VLD2LNq16_UPD, true, true, true, EvenDblSpc, 2, 4 ,true}, { ARM::VLD2LNq32Pseudo, ARM::VLD2LNq32, true, false, false, EvenDblSpc, 2, 2 ,true}, { ARM::VLD2LNq32Pseudo_UPD, ARM::VLD2LNq32_UPD, true, true, true, EvenDblSpc, 2, 2 ,true}, { ARM::VLD2q16Pseudo, ARM::VLD2q16, true, false, false, SingleSpc, 4, 4 ,false}, { ARM::VLD2q16PseudoWB_fixed, ARM::VLD2q16wb_fixed, true, true, false, SingleSpc, 4, 4 ,false}, { ARM::VLD2q16PseudoWB_register, ARM::VLD2q16wb_register, true, true, true, SingleSpc, 4, 4 ,false}, { ARM::VLD2q32Pseudo, ARM::VLD2q32, true, false, false, SingleSpc, 4, 2 ,false}, { ARM::VLD2q32PseudoWB_fixed, ARM::VLD2q32wb_fixed, true, true, false, SingleSpc, 4, 2 ,false}, { ARM::VLD2q32PseudoWB_register, ARM::VLD2q32wb_register, true, true, true, SingleSpc, 4, 2 ,false}, { ARM::VLD2q8Pseudo, ARM::VLD2q8, true, false, false, SingleSpc, 4, 8 ,false}, { ARM::VLD2q8PseudoWB_fixed, ARM::VLD2q8wb_fixed, true, true, false, SingleSpc, 4, 8 ,false}, { ARM::VLD2q8PseudoWB_register, 
ARM::VLD2q8wb_register, true, true, true, SingleSpc, 4, 8 ,false}, { ARM::VLD3DUPd16Pseudo, ARM::VLD3DUPd16, true, false, false, SingleSpc, 3, 4,true}, { ARM::VLD3DUPd16Pseudo_UPD, ARM::VLD3DUPd16_UPD, true, true, true, SingleSpc, 3, 4,true}, { ARM::VLD3DUPd32Pseudo, ARM::VLD3DUPd32, true, false, false, SingleSpc, 3, 2,true}, { ARM::VLD3DUPd32Pseudo_UPD, ARM::VLD3DUPd32_UPD, true, true, true, SingleSpc, 3, 2,true}, { ARM::VLD3DUPd8Pseudo, ARM::VLD3DUPd8, true, false, false, SingleSpc, 3, 8,true}, { ARM::VLD3DUPd8Pseudo_UPD, ARM::VLD3DUPd8_UPD, true, true, true, SingleSpc, 3, 8,true}, { ARM::VLD3LNd16Pseudo, ARM::VLD3LNd16, true, false, false, SingleSpc, 3, 4 ,true}, { ARM::VLD3LNd16Pseudo_UPD, ARM::VLD3LNd16_UPD, true, true, true, SingleSpc, 3, 4 ,true}, { ARM::VLD3LNd32Pseudo, ARM::VLD3LNd32, true, false, false, SingleSpc, 3, 2 ,true}, { ARM::VLD3LNd32Pseudo_UPD, ARM::VLD3LNd32_UPD, true, true, true, SingleSpc, 3, 2 ,true}, { ARM::VLD3LNd8Pseudo, ARM::VLD3LNd8, true, false, false, SingleSpc, 3, 8 ,true}, { ARM::VLD3LNd8Pseudo_UPD, ARM::VLD3LNd8_UPD, true, true, true, SingleSpc, 3, 8 ,true}, { ARM::VLD3LNq16Pseudo, ARM::VLD3LNq16, true, false, false, EvenDblSpc, 3, 4 ,true}, { ARM::VLD3LNq16Pseudo_UPD, ARM::VLD3LNq16_UPD, true, true, true, EvenDblSpc, 3, 4 ,true}, { ARM::VLD3LNq32Pseudo, ARM::VLD3LNq32, true, false, false, EvenDblSpc, 3, 2 ,true}, { ARM::VLD3LNq32Pseudo_UPD, ARM::VLD3LNq32_UPD, true, true, true, EvenDblSpc, 3, 2 ,true}, { ARM::VLD3d16Pseudo, ARM::VLD3d16, true, false, false, SingleSpc, 3, 4 ,true}, { ARM::VLD3d16Pseudo_UPD, ARM::VLD3d16_UPD, true, true, true, SingleSpc, 3, 4 ,true}, { ARM::VLD3d32Pseudo, ARM::VLD3d32, true, false, false, SingleSpc, 3, 2 ,true}, { ARM::VLD3d32Pseudo_UPD, ARM::VLD3d32_UPD, true, true, true, SingleSpc, 3, 2 ,true}, { ARM::VLD3d8Pseudo, ARM::VLD3d8, true, false, false, SingleSpc, 3, 8 ,true}, { ARM::VLD3d8Pseudo_UPD, ARM::VLD3d8_UPD, true, true, true, SingleSpc, 3, 8 ,true}, { ARM::VLD3q16Pseudo_UPD, ARM::VLD3q16_UPD, true, true, true, EvenDblSpc, 3, 4 ,true}, { ARM::VLD3q16oddPseudo, ARM::VLD3q16, true, false, false, OddDblSpc, 3, 4 ,true}, { ARM::VLD3q16oddPseudo_UPD, ARM::VLD3q16_UPD, true, true, true, OddDblSpc, 3, 4 ,true}, { ARM::VLD3q32Pseudo_UPD, ARM::VLD3q32_UPD, true, true, true, EvenDblSpc, 3, 2 ,true}, { ARM::VLD3q32oddPseudo, ARM::VLD3q32, true, false, false, OddDblSpc, 3, 2 ,true}, { ARM::VLD3q32oddPseudo_UPD, ARM::VLD3q32_UPD, true, true, true, OddDblSpc, 3, 2 ,true}, { ARM::VLD3q8Pseudo_UPD, ARM::VLD3q8_UPD, true, true, true, EvenDblSpc, 3, 8 ,true}, { ARM::VLD3q8oddPseudo, ARM::VLD3q8, true, false, false, OddDblSpc, 3, 8 ,true}, { ARM::VLD3q8oddPseudo_UPD, ARM::VLD3q8_UPD, true, true, true, OddDblSpc, 3, 8 ,true}, { ARM::VLD4DUPd16Pseudo, ARM::VLD4DUPd16, true, false, false, SingleSpc, 4, 4,true}, { ARM::VLD4DUPd16Pseudo_UPD, ARM::VLD4DUPd16_UPD, true, true, true, SingleSpc, 4, 4,true}, { ARM::VLD4DUPd32Pseudo, ARM::VLD4DUPd32, true, false, false, SingleSpc, 4, 2,true}, { ARM::VLD4DUPd32Pseudo_UPD, ARM::VLD4DUPd32_UPD, true, true, true, SingleSpc, 4, 2,true}, { ARM::VLD4DUPd8Pseudo, ARM::VLD4DUPd8, true, false, false, SingleSpc, 4, 8,true}, { ARM::VLD4DUPd8Pseudo_UPD, ARM::VLD4DUPd8_UPD, true, true, true, SingleSpc, 4, 8,true}, { ARM::VLD4LNd16Pseudo, ARM::VLD4LNd16, true, false, false, SingleSpc, 4, 4 ,true}, { ARM::VLD4LNd16Pseudo_UPD, ARM::VLD4LNd16_UPD, true, true, true, SingleSpc, 4, 4 ,true}, { ARM::VLD4LNd32Pseudo, ARM::VLD4LNd32, true, false, false, SingleSpc, 4, 2 ,true}, { ARM::VLD4LNd32Pseudo_UPD, 
ARM::VLD4LNd32_UPD, true, true, true, SingleSpc, 4, 2 ,true}, { ARM::VLD4LNd8Pseudo, ARM::VLD4LNd8, true, false, false, SingleSpc, 4, 8 ,true}, { ARM::VLD4LNd8Pseudo_UPD, ARM::VLD4LNd8_UPD, true, true, true, SingleSpc, 4, 8 ,true}, { ARM::VLD4LNq16Pseudo, ARM::VLD4LNq16, true, false, false, EvenDblSpc, 4, 4 ,true}, { ARM::VLD4LNq16Pseudo_UPD, ARM::VLD4LNq16_UPD, true, true, true, EvenDblSpc, 4, 4 ,true}, { ARM::VLD4LNq32Pseudo, ARM::VLD4LNq32, true, false, false, EvenDblSpc, 4, 2 ,true}, { ARM::VLD4LNq32Pseudo_UPD, ARM::VLD4LNq32_UPD, true, true, true, EvenDblSpc, 4, 2 ,true}, { ARM::VLD4d16Pseudo, ARM::VLD4d16, true, false, false, SingleSpc, 4, 4 ,true}, { ARM::VLD4d16Pseudo_UPD, ARM::VLD4d16_UPD, true, true, true, SingleSpc, 4, 4 ,true}, { ARM::VLD4d32Pseudo, ARM::VLD4d32, true, false, false, SingleSpc, 4, 2 ,true}, { ARM::VLD4d32Pseudo_UPD, ARM::VLD4d32_UPD, true, true, true, SingleSpc, 4, 2 ,true}, { ARM::VLD4d8Pseudo, ARM::VLD4d8, true, false, false, SingleSpc, 4, 8 ,true}, { ARM::VLD4d8Pseudo_UPD, ARM::VLD4d8_UPD, true, true, true, SingleSpc, 4, 8 ,true}, { ARM::VLD4q16Pseudo_UPD, ARM::VLD4q16_UPD, true, true, true, EvenDblSpc, 4, 4 ,true}, { ARM::VLD4q16oddPseudo, ARM::VLD4q16, true, false, false, OddDblSpc, 4, 4 ,true}, { ARM::VLD4q16oddPseudo_UPD, ARM::VLD4q16_UPD, true, true, true, OddDblSpc, 4, 4 ,true}, { ARM::VLD4q32Pseudo_UPD, ARM::VLD4q32_UPD, true, true, true, EvenDblSpc, 4, 2 ,true}, { ARM::VLD4q32oddPseudo, ARM::VLD4q32, true, false, false, OddDblSpc, 4, 2 ,true}, { ARM::VLD4q32oddPseudo_UPD, ARM::VLD4q32_UPD, true, true, true, OddDblSpc, 4, 2 ,true}, { ARM::VLD4q8Pseudo_UPD, ARM::VLD4q8_UPD, true, true, true, EvenDblSpc, 4, 8 ,true}, { ARM::VLD4q8oddPseudo, ARM::VLD4q8, true, false, false, OddDblSpc, 4, 8 ,true}, { ARM::VLD4q8oddPseudo_UPD, ARM::VLD4q8_UPD, true, true, true, OddDblSpc, 4, 8 ,true}, { ARM::VST1LNq16Pseudo, ARM::VST1LNd16, false, false, false, EvenDblSpc, 1, 4 ,true}, { ARM::VST1LNq16Pseudo_UPD, ARM::VST1LNd16_UPD, false, true, true, EvenDblSpc, 1, 4 ,true}, { ARM::VST1LNq32Pseudo, ARM::VST1LNd32, false, false, false, EvenDblSpc, 1, 2 ,true}, { ARM::VST1LNq32Pseudo_UPD, ARM::VST1LNd32_UPD, false, true, true, EvenDblSpc, 1, 2 ,true}, { ARM::VST1LNq8Pseudo, ARM::VST1LNd8, false, false, false, EvenDblSpc, 1, 8 ,true}, { ARM::VST1LNq8Pseudo_UPD, ARM::VST1LNd8_UPD, false, true, true, EvenDblSpc, 1, 8 ,true}, { ARM::VST1d64QPseudo, ARM::VST1d64Q, false, false, false, SingleSpc, 4, 1 ,false}, { ARM::VST1d64QPseudoWB_fixed, ARM::VST1d64Qwb_fixed, false, true, false, SingleSpc, 4, 1 ,false}, { ARM::VST1d64QPseudoWB_register, ARM::VST1d64Qwb_register, false, true, true, SingleSpc, 4, 1 ,false}, { ARM::VST1d64TPseudo, ARM::VST1d64T, false, false, false, SingleSpc, 3, 1 ,false}, { ARM::VST1d64TPseudoWB_fixed, ARM::VST1d64Twb_fixed, false, true, false, SingleSpc, 3, 1 ,false}, { ARM::VST1d64TPseudoWB_register, ARM::VST1d64Twb_register, false, true, true, SingleSpc, 3, 1 ,false}, { ARM::VST2LNd16Pseudo, ARM::VST2LNd16, false, false, false, SingleSpc, 2, 4 ,true}, { ARM::VST2LNd16Pseudo_UPD, ARM::VST2LNd16_UPD, false, true, true, SingleSpc, 2, 4 ,true}, { ARM::VST2LNd32Pseudo, ARM::VST2LNd32, false, false, false, SingleSpc, 2, 2 ,true}, { ARM::VST2LNd32Pseudo_UPD, ARM::VST2LNd32_UPD, false, true, true, SingleSpc, 2, 2 ,true}, { ARM::VST2LNd8Pseudo, ARM::VST2LNd8, false, false, false, SingleSpc, 2, 8 ,true}, { ARM::VST2LNd8Pseudo_UPD, ARM::VST2LNd8_UPD, false, true, true, SingleSpc, 2, 8 ,true}, { ARM::VST2LNq16Pseudo, ARM::VST2LNq16, false, false, false, EvenDblSpc, 2, 
4,true}, { ARM::VST2LNq16Pseudo_UPD, ARM::VST2LNq16_UPD, false, true, true, EvenDblSpc, 2, 4,true}, { ARM::VST2LNq32Pseudo, ARM::VST2LNq32, false, false, false, EvenDblSpc, 2, 2,true}, { ARM::VST2LNq32Pseudo_UPD, ARM::VST2LNq32_UPD, false, true, true, EvenDblSpc, 2, 2,true}, { ARM::VST2q16Pseudo, ARM::VST2q16, false, false, false, SingleSpc, 4, 4 ,false}, { ARM::VST2q16PseudoWB_fixed, ARM::VST2q16wb_fixed, false, true, false, SingleSpc, 4, 4 ,false}, { ARM::VST2q16PseudoWB_register, ARM::VST2q16wb_register, false, true, true, SingleSpc, 4, 4 ,false}, { ARM::VST2q32Pseudo, ARM::VST2q32, false, false, false, SingleSpc, 4, 2 ,false}, { ARM::VST2q32PseudoWB_fixed, ARM::VST2q32wb_fixed, false, true, false, SingleSpc, 4, 2 ,false}, { ARM::VST2q32PseudoWB_register, ARM::VST2q32wb_register, false, true, true, SingleSpc, 4, 2 ,false}, { ARM::VST2q8Pseudo, ARM::VST2q8, false, false, false, SingleSpc, 4, 8 ,false}, { ARM::VST2q8PseudoWB_fixed, ARM::VST2q8wb_fixed, false, true, false, SingleSpc, 4, 8 ,false}, { ARM::VST2q8PseudoWB_register, ARM::VST2q8wb_register, false, true, true, SingleSpc, 4, 8 ,false}, { ARM::VST3LNd16Pseudo, ARM::VST3LNd16, false, false, false, SingleSpc, 3, 4 ,true}, { ARM::VST3LNd16Pseudo_UPD, ARM::VST3LNd16_UPD, false, true, true, SingleSpc, 3, 4 ,true}, { ARM::VST3LNd32Pseudo, ARM::VST3LNd32, false, false, false, SingleSpc, 3, 2 ,true}, { ARM::VST3LNd32Pseudo_UPD, ARM::VST3LNd32_UPD, false, true, true, SingleSpc, 3, 2 ,true}, { ARM::VST3LNd8Pseudo, ARM::VST3LNd8, false, false, false, SingleSpc, 3, 8 ,true}, { ARM::VST3LNd8Pseudo_UPD, ARM::VST3LNd8_UPD, false, true, true, SingleSpc, 3, 8 ,true}, { ARM::VST3LNq16Pseudo, ARM::VST3LNq16, false, false, false, EvenDblSpc, 3, 4,true}, { ARM::VST3LNq16Pseudo_UPD, ARM::VST3LNq16_UPD, false, true, true, EvenDblSpc, 3, 4,true}, { ARM::VST3LNq32Pseudo, ARM::VST3LNq32, false, false, false, EvenDblSpc, 3, 2,true}, { ARM::VST3LNq32Pseudo_UPD, ARM::VST3LNq32_UPD, false, true, true, EvenDblSpc, 3, 2,true}, { ARM::VST3d16Pseudo, ARM::VST3d16, false, false, false, SingleSpc, 3, 4 ,true}, { ARM::VST3d16Pseudo_UPD, ARM::VST3d16_UPD, false, true, true, SingleSpc, 3, 4 ,true}, { ARM::VST3d32Pseudo, ARM::VST3d32, false, false, false, SingleSpc, 3, 2 ,true}, { ARM::VST3d32Pseudo_UPD, ARM::VST3d32_UPD, false, true, true, SingleSpc, 3, 2 ,true}, { ARM::VST3d8Pseudo, ARM::VST3d8, false, false, false, SingleSpc, 3, 8 ,true}, { ARM::VST3d8Pseudo_UPD, ARM::VST3d8_UPD, false, true, true, SingleSpc, 3, 8 ,true}, { ARM::VST3q16Pseudo_UPD, ARM::VST3q16_UPD, false, true, true, EvenDblSpc, 3, 4 ,true}, { ARM::VST3q16oddPseudo, ARM::VST3q16, false, false, false, OddDblSpc, 3, 4 ,true}, { ARM::VST3q16oddPseudo_UPD, ARM::VST3q16_UPD, false, true, true, OddDblSpc, 3, 4 ,true}, { ARM::VST3q32Pseudo_UPD, ARM::VST3q32_UPD, false, true, true, EvenDblSpc, 3, 2 ,true}, { ARM::VST3q32oddPseudo, ARM::VST3q32, false, false, false, OddDblSpc, 3, 2 ,true}, { ARM::VST3q32oddPseudo_UPD, ARM::VST3q32_UPD, false, true, true, OddDblSpc, 3, 2 ,true}, { ARM::VST3q8Pseudo_UPD, ARM::VST3q8_UPD, false, true, true, EvenDblSpc, 3, 8 ,true}, { ARM::VST3q8oddPseudo, ARM::VST3q8, false, false, false, OddDblSpc, 3, 8 ,true}, { ARM::VST3q8oddPseudo_UPD, ARM::VST3q8_UPD, false, true, true, OddDblSpc, 3, 8 ,true}, { ARM::VST4LNd16Pseudo, ARM::VST4LNd16, false, false, false, SingleSpc, 4, 4 ,true}, { ARM::VST4LNd16Pseudo_UPD, ARM::VST4LNd16_UPD, false, true, true, SingleSpc, 4, 4 ,true}, { ARM::VST4LNd32Pseudo, ARM::VST4LNd32, false, false, false, SingleSpc, 4, 2 ,true}, { 
ARM::VST4LNd32Pseudo_UPD, ARM::VST4LNd32_UPD, false, true, true, SingleSpc, 4, 2 ,true}, { ARM::VST4LNd8Pseudo, ARM::VST4LNd8, false, false, false, SingleSpc, 4, 8 ,true}, { ARM::VST4LNd8Pseudo_UPD, ARM::VST4LNd8_UPD, false, true, true, SingleSpc, 4, 8 ,true}, { ARM::VST4LNq16Pseudo, ARM::VST4LNq16, false, false, false, EvenDblSpc, 4, 4,true}, { ARM::VST4LNq16Pseudo_UPD, ARM::VST4LNq16_UPD, false, true, true, EvenDblSpc, 4, 4,true}, { ARM::VST4LNq32Pseudo, ARM::VST4LNq32, false, false, false, EvenDblSpc, 4, 2,true}, { ARM::VST4LNq32Pseudo_UPD, ARM::VST4LNq32_UPD, false, true, true, EvenDblSpc, 4, 2,true}, { ARM::VST4d16Pseudo, ARM::VST4d16, false, false, false, SingleSpc, 4, 4 ,true}, { ARM::VST4d16Pseudo_UPD, ARM::VST4d16_UPD, false, true, true, SingleSpc, 4, 4 ,true}, { ARM::VST4d32Pseudo, ARM::VST4d32, false, false, false, SingleSpc, 4, 2 ,true}, { ARM::VST4d32Pseudo_UPD, ARM::VST4d32_UPD, false, true, true, SingleSpc, 4, 2 ,true}, { ARM::VST4d8Pseudo, ARM::VST4d8, false, false, false, SingleSpc, 4, 8 ,true}, { ARM::VST4d8Pseudo_UPD, ARM::VST4d8_UPD, false, true, true, SingleSpc, 4, 8 ,true}, { ARM::VST4q16Pseudo_UPD, ARM::VST4q16_UPD, false, true, true, EvenDblSpc, 4, 4 ,true}, { ARM::VST4q16oddPseudo, ARM::VST4q16, false, false, false, OddDblSpc, 4, 4 ,true}, { ARM::VST4q16oddPseudo_UPD, ARM::VST4q16_UPD, false, true, true, OddDblSpc, 4, 4 ,true}, { ARM::VST4q32Pseudo_UPD, ARM::VST4q32_UPD, false, true, true, EvenDblSpc, 4, 2 ,true}, { ARM::VST4q32oddPseudo, ARM::VST4q32, false, false, false, OddDblSpc, 4, 2 ,true}, { ARM::VST4q32oddPseudo_UPD, ARM::VST4q32_UPD, false, true, true, OddDblSpc, 4, 2 ,true}, { ARM::VST4q8Pseudo_UPD, ARM::VST4q8_UPD, false, true, true, EvenDblSpc, 4, 8 ,true}, { ARM::VST4q8oddPseudo, ARM::VST4q8, false, false, false, OddDblSpc, 4, 8 ,true}, { ARM::VST4q8oddPseudo_UPD, ARM::VST4q8_UPD, false, true, true, OddDblSpc, 4, 8 ,true} }; /// LookupNEONLdSt - Search the NEONLdStTable for information about a NEON /// load or store pseudo instruction. static const NEONLdStTableEntry *LookupNEONLdSt(unsigned Opcode) { #ifndef NDEBUG // Make sure the table is sorted. static bool TableChecked = false; if (!TableChecked) { assert(std::is_sorted(std::begin(NEONLdStTable), std::end(NEONLdStTable)) && "NEONLdStTable is not sorted!"); TableChecked = true; } #endif auto I = std::lower_bound(std::begin(NEONLdStTable), std::end(NEONLdStTable), Opcode); if (I != std::end(NEONLdStTable) && I->PseudoOpc == Opcode) return I; return nullptr; } /// GetDSubRegs - Get 4 D subregisters of a Q, QQ, or QQQQ register, /// corresponding to the specified register spacing. Not all of the results /// are necessarily valid, e.g., a Q register only has 2 D subregisters. 
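/// For example, for a QQQQ register based at D0, SingleSpc yields D0-D3,
/// EvenDblSpc yields D0/D2/D4/D6, and OddDblSpc yields D1/D3/D5/D7.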
static void GetDSubRegs(unsigned Reg, NEONRegSpacing RegSpc, const TargetRegisterInfo *TRI, unsigned &D0, unsigned &D1, unsigned &D2, unsigned &D3) { if (RegSpc == SingleSpc) { D0 = TRI->getSubReg(Reg, ARM::dsub_0); D1 = TRI->getSubReg(Reg, ARM::dsub_1); D2 = TRI->getSubReg(Reg, ARM::dsub_2); D3 = TRI->getSubReg(Reg, ARM::dsub_3); } else if (RegSpc == EvenDblSpc) { D0 = TRI->getSubReg(Reg, ARM::dsub_0); D1 = TRI->getSubReg(Reg, ARM::dsub_2); D2 = TRI->getSubReg(Reg, ARM::dsub_4); D3 = TRI->getSubReg(Reg, ARM::dsub_6); } else { assert(RegSpc == OddDblSpc && "unknown register spacing"); D0 = TRI->getSubReg(Reg, ARM::dsub_1); D1 = TRI->getSubReg(Reg, ARM::dsub_3); D2 = TRI->getSubReg(Reg, ARM::dsub_5); D3 = TRI->getSubReg(Reg, ARM::dsub_7); } } /// ExpandVLD - Translate VLD pseudo instructions with Q, QQ or QQQQ register /// operands to real VLD instructions with D register operands. void ARMExpandPseudo::ExpandVLD(MachineBasicBlock::iterator &MBBI) { MachineInstr &MI = *MBBI; MachineBasicBlock &MBB = *MI.getParent(); const NEONLdStTableEntry *TableEntry = LookupNEONLdSt(MI.getOpcode()); assert(TableEntry && TableEntry->IsLoad && "NEONLdStTable lookup failed"); NEONRegSpacing RegSpc = (NEONRegSpacing)TableEntry->RegSpacing; unsigned NumRegs = TableEntry->NumRegs; MachineInstrBuilder MIB = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(TableEntry->RealOpc)); unsigned OpIdx = 0; bool DstIsDead = MI.getOperand(OpIdx).isDead(); unsigned DstReg = MI.getOperand(OpIdx++).getReg(); unsigned D0, D1, D2, D3; GetDSubRegs(DstReg, RegSpc, TRI, D0, D1, D2, D3); MIB.addReg(D0, RegState::Define | getDeadRegState(DstIsDead)); if (NumRegs > 1 && TableEntry->copyAllListRegs) MIB.addReg(D1, RegState::Define | getDeadRegState(DstIsDead)); if (NumRegs > 2 && TableEntry->copyAllListRegs) MIB.addReg(D2, RegState::Define | getDeadRegState(DstIsDead)); if (NumRegs > 3 && TableEntry->copyAllListRegs) MIB.addReg(D3, RegState::Define | getDeadRegState(DstIsDead)); if (TableEntry->isUpdating) MIB.addOperand(MI.getOperand(OpIdx++)); // Copy the addrmode6 operands. MIB.addOperand(MI.getOperand(OpIdx++)); MIB.addOperand(MI.getOperand(OpIdx++)); // Copy the am6offset operand. if (TableEntry->hasWritebackOperand) MIB.addOperand(MI.getOperand(OpIdx++)); // For an instruction writing double-spaced subregs, the pseudo instruction // has an extra operand that is a use of the super-register. Record the // operand index and skip over it. unsigned SrcOpIdx = 0; if (RegSpc == EvenDblSpc || RegSpc == OddDblSpc) SrcOpIdx = OpIdx++; // Copy the predicate operands. MIB.addOperand(MI.getOperand(OpIdx++)); MIB.addOperand(MI.getOperand(OpIdx++)); // Copy the super-register source operand used for double-spaced subregs over // to the new instruction as an implicit operand. if (SrcOpIdx != 0) { MachineOperand MO = MI.getOperand(SrcOpIdx); MO.setImplicit(true); MIB.addOperand(MO); } // Add an implicit def for the super-register. MIB.addReg(DstReg, RegState::ImplicitDefine | getDeadRegState(DstIsDead)); TransferImpOps(MI, MIB, MIB); // Transfer memoperands. MIB->setMemRefs(MI.memoperands_begin(), MI.memoperands_end()); MI.eraseFromParent(); } /// ExpandVST - Translate VST pseudo instructions with Q, QQ or QQQQ register /// operands to real VST instructions with D register operands. 
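/// For example, ARM::VST1d64QPseudo (whose source is a QQ register) becomes a
/// real ARM::VST1d64Q; since its table entry clears copyAllListRegs, only the
/// first D subregister is encoded explicitly and the super-register is kept
/// as an implicit use.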
void ARMExpandPseudo::ExpandVST(MachineBasicBlock::iterator &MBBI) { MachineInstr &MI = *MBBI; MachineBasicBlock &MBB = *MI.getParent(); const NEONLdStTableEntry *TableEntry = LookupNEONLdSt(MI.getOpcode()); assert(TableEntry && !TableEntry->IsLoad && "NEONLdStTable lookup failed"); NEONRegSpacing RegSpc = (NEONRegSpacing)TableEntry->RegSpacing; unsigned NumRegs = TableEntry->NumRegs; MachineInstrBuilder MIB = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(TableEntry->RealOpc)); unsigned OpIdx = 0; if (TableEntry->isUpdating) MIB.addOperand(MI.getOperand(OpIdx++)); // Copy the addrmode6 operands. MIB.addOperand(MI.getOperand(OpIdx++)); MIB.addOperand(MI.getOperand(OpIdx++)); // Copy the am6offset operand. if (TableEntry->hasWritebackOperand) MIB.addOperand(MI.getOperand(OpIdx++)); bool SrcIsKill = MI.getOperand(OpIdx).isKill(); bool SrcIsUndef = MI.getOperand(OpIdx).isUndef(); unsigned SrcReg = MI.getOperand(OpIdx++).getReg(); unsigned D0, D1, D2, D3; GetDSubRegs(SrcReg, RegSpc, TRI, D0, D1, D2, D3); MIB.addReg(D0, getUndefRegState(SrcIsUndef)); if (NumRegs > 1 && TableEntry->copyAllListRegs) MIB.addReg(D1, getUndefRegState(SrcIsUndef)); if (NumRegs > 2 && TableEntry->copyAllListRegs) MIB.addReg(D2, getUndefRegState(SrcIsUndef)); if (NumRegs > 3 && TableEntry->copyAllListRegs) MIB.addReg(D3, getUndefRegState(SrcIsUndef)); // Copy the predicate operands. MIB.addOperand(MI.getOperand(OpIdx++)); MIB.addOperand(MI.getOperand(OpIdx++)); if (SrcIsKill && !SrcIsUndef) // Add an implicit kill for the super-reg. MIB->addRegisterKilled(SrcReg, TRI, true); else if (!SrcIsUndef) MIB.addReg(SrcReg, RegState::Implicit); // Add implicit uses for src reg. TransferImpOps(MI, MIB, MIB); // Transfer memoperands. MIB->setMemRefs(MI.memoperands_begin(), MI.memoperands_end()); MI.eraseFromParent(); } /// ExpandLaneOp - Translate VLD*LN and VST*LN instructions with Q, QQ or QQQQ /// register operands to real instructions with D register operands. void ARMExpandPseudo::ExpandLaneOp(MachineBasicBlock::iterator &MBBI) { MachineInstr &MI = *MBBI; MachineBasicBlock &MBB = *MI.getParent(); const NEONLdStTableEntry *TableEntry = LookupNEONLdSt(MI.getOpcode()); assert(TableEntry && "NEONLdStTable lookup failed"); NEONRegSpacing RegSpc = (NEONRegSpacing)TableEntry->RegSpacing; unsigned NumRegs = TableEntry->NumRegs; unsigned RegElts = TableEntry->RegElts; MachineInstrBuilder MIB = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(TableEntry->RealOpc)); unsigned OpIdx = 0; // The lane operand is always the 3rd from last operand, before the 2 // predicate operands. unsigned Lane = MI.getOperand(MI.getDesc().getNumOperands() - 3).getImm(); // Adjust the lane and spacing as needed for Q registers. 
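  // Illustrative example: a 16-bit lane operation on a Q register has four
  // elements per D register, so lane 6 is rewritten as lane 2 with odd
  // D-register spacing.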
  assert(RegSpc != OddDblSpc && "unexpected register spacing for VLD/VST-lane");
  if (RegSpc == EvenDblSpc && Lane >= RegElts) {
    RegSpc = OddDblSpc;
    Lane -= RegElts;
  }
  assert(Lane < RegElts && "out of range lane for VLD/VST-lane");

  unsigned D0 = 0, D1 = 0, D2 = 0, D3 = 0;
  unsigned DstReg = 0;
  bool DstIsDead = false;
  if (TableEntry->IsLoad) {
    DstIsDead = MI.getOperand(OpIdx).isDead();
    DstReg = MI.getOperand(OpIdx++).getReg();
    GetDSubRegs(DstReg, RegSpc, TRI, D0, D1, D2, D3);
    MIB.addReg(D0, RegState::Define | getDeadRegState(DstIsDead));
    if (NumRegs > 1)
      MIB.addReg(D1, RegState::Define | getDeadRegState(DstIsDead));
    if (NumRegs > 2)
      MIB.addReg(D2, RegState::Define | getDeadRegState(DstIsDead));
    if (NumRegs > 3)
      MIB.addReg(D3, RegState::Define | getDeadRegState(DstIsDead));
  }

  if (TableEntry->isUpdating)
    MIB.addOperand(MI.getOperand(OpIdx++));

  // Copy the addrmode6 operands.
  MIB.addOperand(MI.getOperand(OpIdx++));
  MIB.addOperand(MI.getOperand(OpIdx++));
  // Copy the am6offset operand.
  if (TableEntry->hasWritebackOperand)
    MIB.addOperand(MI.getOperand(OpIdx++));

  // Grab the super-register source.
  MachineOperand MO = MI.getOperand(OpIdx++);
  if (!TableEntry->IsLoad)
    GetDSubRegs(MO.getReg(), RegSpc, TRI, D0, D1, D2, D3);

  // Add the subregs as sources of the new instruction.
  unsigned SrcFlags = (getUndefRegState(MO.isUndef()) |
                       getKillRegState(MO.isKill()));
  MIB.addReg(D0, SrcFlags);
  if (NumRegs > 1)
    MIB.addReg(D1, SrcFlags);
  if (NumRegs > 2)
    MIB.addReg(D2, SrcFlags);
  if (NumRegs > 3)
    MIB.addReg(D3, SrcFlags);

  // Add the lane number operand.
  MIB.addImm(Lane);
  OpIdx += 1;

  // Copy the predicate operands.
  MIB.addOperand(MI.getOperand(OpIdx++));
  MIB.addOperand(MI.getOperand(OpIdx++));

  // Copy the super-register source to be an implicit source.
  MO.setImplicit(true);
  MIB.addOperand(MO);
  if (TableEntry->IsLoad)
    // Add an implicit def for the super-register.
    MIB.addReg(DstReg, RegState::ImplicitDefine | getDeadRegState(DstIsDead));
  TransferImpOps(MI, MIB, MIB);
  // Transfer memoperands.
  MIB->setMemRefs(MI.memoperands_begin(), MI.memoperands_end());
  MI.eraseFromParent();
}

/// ExpandVTBL - Translate VTBL and VTBX pseudo instructions with Q or QQ
/// register operands to real instructions with D register operands.
void ARMExpandPseudo::ExpandVTBL(MachineBasicBlock::iterator &MBBI,
                                 unsigned Opc, bool IsExt) {
  MachineInstr &MI = *MBBI;
  MachineBasicBlock &MBB = *MI.getParent();

  MachineInstrBuilder MIB = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(Opc));
  unsigned OpIdx = 0;

  // Transfer the destination register operand.
  MIB.addOperand(MI.getOperand(OpIdx++));
  if (IsExt)
    MIB.addOperand(MI.getOperand(OpIdx++));

  bool SrcIsKill = MI.getOperand(OpIdx).isKill();
  unsigned SrcReg = MI.getOperand(OpIdx++).getReg();
  unsigned D0, D1, D2, D3;
  GetDSubRegs(SrcReg, SingleSpc, TRI, D0, D1, D2, D3);
  MIB.addReg(D0);

  // Copy the other source register operand.
  MIB.addOperand(MI.getOperand(OpIdx++));

  // Copy the predicate operands.
  MIB.addOperand(MI.getOperand(OpIdx++));
  MIB.addOperand(MI.getOperand(OpIdx++));

  // Add an implicit kill and use for the super-reg.
  MIB.addReg(SrcReg, RegState::Implicit | getKillRegState(SrcIsKill));
  TransferImpOps(MI, MIB, MIB);
  MI.eraseFromParent();
}

static bool IsAnAddressOperand(const MachineOperand &MO) {
  // This check is overly conservative. Unless we are certain that the machine
  // operand is not a symbol reference, we return that it is a symbol
  // reference. This is important as the load pair may not be split up on
  // Windows.
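  // ExpandMOV32BitImm below uses this result to decide whether a movw/movt
  // pair must be bundled when targeting Windows (see RequiresBundling).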
switch (MO.getType()) { case MachineOperand::MO_Register: case MachineOperand::MO_Immediate: case MachineOperand::MO_CImmediate: case MachineOperand::MO_FPImmediate: return false; case MachineOperand::MO_MachineBasicBlock: return true; case MachineOperand::MO_FrameIndex: return false; case MachineOperand::MO_ConstantPoolIndex: case MachineOperand::MO_TargetIndex: case MachineOperand::MO_JumpTableIndex: case MachineOperand::MO_ExternalSymbol: case MachineOperand::MO_GlobalAddress: case MachineOperand::MO_BlockAddress: return true; case MachineOperand::MO_RegisterMask: case MachineOperand::MO_RegisterLiveOut: return false; case MachineOperand::MO_Metadata: case MachineOperand::MO_MCSymbol: return true; case MachineOperand::MO_CFIIndex: return false; case MachineOperand::MO_IntrinsicID: case MachineOperand::MO_Predicate: llvm_unreachable("should not exist post-isel"); } llvm_unreachable("unhandled machine operand type"); } void ARMExpandPseudo::ExpandMOV32BitImm(MachineBasicBlock &MBB, MachineBasicBlock::iterator &MBBI) { MachineInstr &MI = *MBBI; unsigned Opcode = MI.getOpcode(); unsigned PredReg = 0; ARMCC::CondCodes Pred = getInstrPredicate(MI, PredReg); unsigned DstReg = MI.getOperand(0).getReg(); bool DstIsDead = MI.getOperand(0).isDead(); bool isCC = Opcode == ARM::MOVCCi32imm || Opcode == ARM::t2MOVCCi32imm; const MachineOperand &MO = MI.getOperand(isCC ? 2 : 1); bool RequiresBundling = STI->isTargetWindows() && IsAnAddressOperand(MO); MachineInstrBuilder LO16, HI16; if (!STI->hasV6T2Ops() && (Opcode == ARM::MOVi32imm || Opcode == ARM::MOVCCi32imm)) { // FIXME Windows CE supports older ARM CPUs assert(!STI->isTargetWindows() && "Windows on ARM requires ARMv7+"); // Expand into a movi + orr. LO16 = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(ARM::MOVi), DstReg); HI16 = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(ARM::ORRri)) .addReg(DstReg, RegState::Define | getDeadRegState(DstIsDead)) .addReg(DstReg); assert (MO.isImm() && "MOVi32imm w/ non-immediate source operand!"); unsigned ImmVal = (unsigned)MO.getImm(); unsigned SOImmValV1 = ARM_AM::getSOImmTwoPartFirst(ImmVal); unsigned SOImmValV2 = ARM_AM::getSOImmTwoPartSecond(ImmVal); LO16 = LO16.addImm(SOImmValV1); HI16 = HI16.addImm(SOImmValV2); LO16->setMemRefs(MI.memoperands_begin(), MI.memoperands_end()); HI16->setMemRefs(MI.memoperands_begin(), MI.memoperands_end()); LO16.addImm(Pred).addReg(PredReg).addReg(0); HI16.addImm(Pred).addReg(PredReg).addReg(0); TransferImpOps(MI, LO16, HI16); MI.eraseFromParent(); return; } unsigned LO16Opc = 0; unsigned HI16Opc = 0; if (Opcode == ARM::t2MOVi32imm || Opcode == ARM::t2MOVCCi32imm) { LO16Opc = ARM::t2MOVi16; HI16Opc = ARM::t2MOVTi16; } else { LO16Opc = ARM::MOVi16; HI16Opc = ARM::MOVTi16; } LO16 = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(LO16Opc), DstReg); HI16 = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(HI16Opc)) .addReg(DstReg, RegState::Define | getDeadRegState(DstIsDead)) .addReg(DstReg); switch (MO.getType()) { case MachineOperand::MO_Immediate: { unsigned Imm = MO.getImm(); unsigned Lo16 = Imm & 0xffff; unsigned Hi16 = (Imm >> 16) & 0xffff; LO16 = LO16.addImm(Lo16); HI16 = HI16.addImm(Hi16); break; } case MachineOperand::MO_ExternalSymbol: { const char *ES = MO.getSymbolName(); unsigned TF = MO.getTargetFlags(); LO16 = LO16.addExternalSymbol(ES, TF | ARMII::MO_LO16); HI16 = HI16.addExternalSymbol(ES, TF | ARMII::MO_HI16); break; } default: { const GlobalValue *GV = MO.getGlobal(); unsigned TF = MO.getTargetFlags(); LO16 = LO16.addGlobalAddress(GV, MO.getOffset(), TF | 
ARMII::MO_LO16); HI16 = HI16.addGlobalAddress(GV, MO.getOffset(), TF | ARMII::MO_HI16); break; } } LO16->setMemRefs(MI.memoperands_begin(), MI.memoperands_end()); HI16->setMemRefs(MI.memoperands_begin(), MI.memoperands_end()); LO16.addImm(Pred).addReg(PredReg); HI16.addImm(Pred).addReg(PredReg); if (RequiresBundling) finalizeBundle(MBB, LO16->getIterator(), MBBI->getIterator()); TransferImpOps(MI, LO16, HI16); MI.eraseFromParent(); } static void addPostLoopLiveIns(MachineBasicBlock *MBB, LivePhysRegs &LiveRegs) { for (auto I = LiveRegs.begin(); I != LiveRegs.end(); ++I) MBB->addLiveIn(*I); } /// Expand a CMP_SWAP pseudo-inst to an ldrex/strex loop as simply as /// possible. This only gets used at -O0 so we don't care about efficiency of the /// generated code. bool ARMExpandPseudo::ExpandCMP_SWAP(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, unsigned LdrexOp, unsigned StrexOp, unsigned UxtOp, MachineBasicBlock::iterator &NextMBBI) { bool IsThumb = STI->isThumb(); MachineInstr &MI = *MBBI; DebugLoc DL = MI.getDebugLoc(); MachineOperand &Dest = MI.getOperand(0); unsigned StatusReg = MI.getOperand(1).getReg(); MachineOperand &Addr = MI.getOperand(2); MachineOperand &Desired = MI.getOperand(3); MachineOperand &New = MI.getOperand(4); LivePhysRegs LiveRegs(&TII->getRegisterInfo()); LiveRegs.addLiveOuts(MBB); for (auto I = std::prev(MBB.end()); I != MBBI; --I) LiveRegs.stepBackward(*I); MachineFunction *MF = MBB.getParent(); auto LoadCmpBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock()); auto StoreBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock()); auto DoneBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock()); MF->insert(++MBB.getIterator(), LoadCmpBB); MF->insert(++LoadCmpBB->getIterator(), StoreBB); MF->insert(++StoreBB->getIterator(), DoneBB); if (UxtOp) { MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(UxtOp), Desired.getReg()) .addReg(Desired.getReg(), RegState::Kill); if (!IsThumb) MIB.addImm(0); AddDefaultPred(MIB); } // .Lloadcmp: // ldrex rDest, [rAddr] // cmp rDest, rDesired // bne .Ldone LoadCmpBB->addLiveIn(Addr.getReg()); LoadCmpBB->addLiveIn(Dest.getReg()); LoadCmpBB->addLiveIn(Desired.getReg()); addPostLoopLiveIns(LoadCmpBB, LiveRegs); MachineInstrBuilder MIB; MIB = BuildMI(LoadCmpBB, DL, TII->get(LdrexOp), Dest.getReg()); MIB.addReg(Addr.getReg()); if (LdrexOp == ARM::t2LDREX) MIB.addImm(0); // a 32-bit Thumb ldrex (only) allows an offset. AddDefaultPred(MIB); unsigned CMPrr = IsThumb ? ARM::tCMPhir : ARM::CMPrr; AddDefaultPred(BuildMI(LoadCmpBB, DL, TII->get(CMPrr)) .addReg(Dest.getReg(), getKillRegState(Dest.isDead())) .addOperand(Desired)); unsigned Bcc = IsThumb ? ARM::tBcc : ARM::Bcc; BuildMI(LoadCmpBB, DL, TII->get(Bcc)) .addMBB(DoneBB) .addImm(ARMCC::NE) .addReg(ARM::CPSR, RegState::Kill); LoadCmpBB->addSuccessor(DoneBB); LoadCmpBB->addSuccessor(StoreBB); // .Lstore: // strex rStatus, rNew, [rAddr] // cmp rStatus, #0 // bne .Lloadcmp StoreBB->addLiveIn(Addr.getReg()); StoreBB->addLiveIn(New.getReg()); addPostLoopLiveIns(StoreBB, LiveRegs); MIB = BuildMI(StoreBB, DL, TII->get(StrexOp), StatusReg); MIB.addOperand(New); MIB.addOperand(Addr); if (StrexOp == ARM::t2STREX) MIB.addImm(0); // a 32-bit Thumb strex (only) allows an offset. AddDefaultPred(MIB); unsigned CMPri = IsThumb ? 
ARM::t2CMPri : ARM::CMPri; AddDefaultPred(BuildMI(StoreBB, DL, TII->get(CMPri)) .addReg(StatusReg, RegState::Kill) .addImm(0)); BuildMI(StoreBB, DL, TII->get(Bcc)) .addMBB(LoadCmpBB) .addImm(ARMCC::NE) .addReg(ARM::CPSR, RegState::Kill); StoreBB->addSuccessor(LoadCmpBB); StoreBB->addSuccessor(DoneBB); DoneBB->splice(DoneBB->end(), &MBB, MI, MBB.end()); DoneBB->transferSuccessors(&MBB); addPostLoopLiveIns(DoneBB, LiveRegs); MBB.addSuccessor(LoadCmpBB); NextMBBI = MBB.end(); MI.eraseFromParent(); return true; } /// ARM's ldrexd/strexd take a consecutive register pair (represented as a /// single GPRPair register), Thumb's take two separate registers so we need to /// extract the subregs from the pair. static void addExclusiveRegPair(MachineInstrBuilder &MIB, MachineOperand &Reg, unsigned Flags, bool IsThumb, const TargetRegisterInfo *TRI) { if (IsThumb) { unsigned RegLo = TRI->getSubReg(Reg.getReg(), ARM::gsub_0); unsigned RegHi = TRI->getSubReg(Reg.getReg(), ARM::gsub_1); MIB.addReg(RegLo, Flags | getKillRegState(Reg.isDead())); MIB.addReg(RegHi, Flags | getKillRegState(Reg.isDead())); } else MIB.addReg(Reg.getReg(), Flags | getKillRegState(Reg.isDead())); } /// Expand a 64-bit CMP_SWAP to an ldrexd/strexd loop. bool ARMExpandPseudo::ExpandCMP_SWAP_64(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, MachineBasicBlock::iterator &NextMBBI) { bool IsThumb = STI->isThumb(); MachineInstr &MI = *MBBI; DebugLoc DL = MI.getDebugLoc(); MachineOperand &Dest = MI.getOperand(0); unsigned StatusReg = MI.getOperand(1).getReg(); MachineOperand &Addr = MI.getOperand(2); MachineOperand &Desired = MI.getOperand(3); MachineOperand &New = MI.getOperand(4); unsigned DestLo = TRI->getSubReg(Dest.getReg(), ARM::gsub_0); unsigned DestHi = TRI->getSubReg(Dest.getReg(), ARM::gsub_1); unsigned DesiredLo = TRI->getSubReg(Desired.getReg(), ARM::gsub_0); unsigned DesiredHi = TRI->getSubReg(Desired.getReg(), ARM::gsub_1); LivePhysRegs LiveRegs(&TII->getRegisterInfo()); LiveRegs.addLiveOuts(MBB); for (auto I = std::prev(MBB.end()); I != MBBI; --I) LiveRegs.stepBackward(*I); MachineFunction *MF = MBB.getParent(); auto LoadCmpBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock()); auto StoreBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock()); auto DoneBB = MF->CreateMachineBasicBlock(MBB.getBasicBlock()); MF->insert(++MBB.getIterator(), LoadCmpBB); MF->insert(++LoadCmpBB->getIterator(), StoreBB); MF->insert(++StoreBB->getIterator(), DoneBB); // .Lloadcmp: // ldrexd rDestLo, rDestHi, [rAddr] // cmp rDestLo, rDesiredLo // sbcs rStatus, rDestHi, rDesiredHi // bne .Ldone LoadCmpBB->addLiveIn(Addr.getReg()); LoadCmpBB->addLiveIn(Dest.getReg()); LoadCmpBB->addLiveIn(Desired.getReg()); addPostLoopLiveIns(LoadCmpBB, LiveRegs); unsigned LDREXD = IsThumb ? ARM::t2LDREXD : ARM::LDREXD; MachineInstrBuilder MIB; MIB = BuildMI(LoadCmpBB, DL, TII->get(LDREXD)); addExclusiveRegPair(MIB, Dest, RegState::Define, IsThumb, TRI); MIB.addReg(Addr.getReg()); AddDefaultPred(MIB); unsigned CMPrr = IsThumb ? ARM::tCMPhir : ARM::CMPrr; AddDefaultPred(BuildMI(LoadCmpBB, DL, TII->get(CMPrr)) .addReg(DestLo, getKillRegState(Dest.isDead())) .addReg(DesiredLo, getKillRegState(Desired.isDead()))); BuildMI(LoadCmpBB, DL, TII->get(CMPrr)) .addReg(DestHi, getKillRegState(Dest.isDead())) .addReg(DesiredHi, getKillRegState(Desired.isDead())) .addImm(ARMCC::EQ).addReg(ARM::CPSR, RegState::Kill); unsigned Bcc = IsThumb ? 
                       ARM::tBcc : ARM::Bcc;
  BuildMI(LoadCmpBB, DL, TII->get(Bcc))
      .addMBB(DoneBB)
      .addImm(ARMCC::NE)
      .addReg(ARM::CPSR, RegState::Kill);
  LoadCmpBB->addSuccessor(DoneBB);
  LoadCmpBB->addSuccessor(StoreBB);

  // .Lstore:
  //     strexd rStatus, rNewLo, rNewHi, [rAddr]
  //     cmp rStatus, #0
  //     bne .Lloadcmp
  StoreBB->addLiveIn(Addr.getReg());
  StoreBB->addLiveIn(New.getReg());
  addPostLoopLiveIns(StoreBB, LiveRegs);

  unsigned STREXD = IsThumb ? ARM::t2STREXD : ARM::STREXD;
  MIB = BuildMI(StoreBB, DL, TII->get(STREXD), StatusReg);
  addExclusiveRegPair(MIB, New, 0, IsThumb, TRI);
  MIB.addOperand(Addr);
  AddDefaultPred(MIB);

  unsigned CMPri = IsThumb ? ARM::t2CMPri : ARM::CMPri;
  AddDefaultPred(BuildMI(StoreBB, DL, TII->get(CMPri))
                     .addReg(StatusReg, RegState::Kill)
                     .addImm(0));
  BuildMI(StoreBB, DL, TII->get(Bcc))
      .addMBB(LoadCmpBB)
      .addImm(ARMCC::NE)
      .addReg(ARM::CPSR, RegState::Kill);
  StoreBB->addSuccessor(LoadCmpBB);
  StoreBB->addSuccessor(DoneBB);

  DoneBB->splice(DoneBB->end(), &MBB, MI, MBB.end());
  DoneBB->transferSuccessors(&MBB);
  addPostLoopLiveIns(DoneBB, LiveRegs);

  MBB.addSuccessor(LoadCmpBB);

  NextMBBI = MBB.end();
  MI.eraseFromParent();
  return true;
}

bool ARMExpandPseudo::ExpandMI(MachineBasicBlock &MBB,
                               MachineBasicBlock::iterator MBBI,
                               MachineBasicBlock::iterator &NextMBBI) {
  MachineInstr &MI = *MBBI;
  unsigned Opcode = MI.getOpcode();
  switch (Opcode) {
    default:
      return false;

    case ARM::TCRETURNdi:
    case ARM::TCRETURNri: {
      MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
      assert(MBBI->isReturn() &&
             "Can only insert epilog into returning blocks");
      unsigned RetOpcode = MBBI->getOpcode();
      DebugLoc dl = MBBI->getDebugLoc();
      const ARMBaseInstrInfo &TII = *static_cast<const ARMBaseInstrInfo *>(
          MBB.getParent()->getSubtarget().getInstrInfo());

      // Tail call return: adjust the stack pointer and jump to callee.
      MBBI = MBB.getLastNonDebugInstr();
      MachineOperand &JumpTarget = MBBI->getOperand(0);

      // Jump to label or value in register.
      if (RetOpcode == ARM::TCRETURNdi) {
        unsigned TCOpcode =
            STI->isThumb()
                ? (STI->isTargetMachO() ? ARM::tTAILJMPd : ARM::tTAILJMPdND)
                : ARM::TAILJMPd;
        MachineInstrBuilder MIB = BuildMI(MBB, MBBI, dl, TII.get(TCOpcode));
        if (JumpTarget.isGlobal())
          MIB.addGlobalAddress(JumpTarget.getGlobal(), JumpTarget.getOffset(),
                               JumpTarget.getTargetFlags());
        else {
          assert(JumpTarget.isSymbol());
          MIB.addExternalSymbol(JumpTarget.getSymbolName(),
                                JumpTarget.getTargetFlags());
        }

        // Add the default predicate in Thumb mode.
        if (STI->isThumb())
          MIB.addImm(ARMCC::AL).addReg(0);
      } else if (RetOpcode == ARM::TCRETURNri) {
        BuildMI(MBB, MBBI, dl,
                TII.get(STI->isThumb() ? ARM::tTAILJMPr : ARM::TAILJMPr))
            .addReg(JumpTarget.getReg(), RegState::Kill);
      }

      auto NewMI = std::prev(MBBI);
      for (unsigned i = 1, e = MBBI->getNumOperands(); i != e; ++i)
        NewMI->addOperand(MBBI->getOperand(i));

      // Delete the pseudo instruction TCRETURN.
      MBB.erase(MBBI);
      MBBI = NewMI;
      return true;
    }
    case ARM::VMOVScc:
    case ARM::VMOVDcc: {
      unsigned newOpc = Opcode == ARM::VMOVScc ? ARM::VMOVS : ARM::VMOVD;
      BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(newOpc),
              MI.getOperand(1).getReg())
          .addOperand(MI.getOperand(2))
          .addImm(MI.getOperand(3).getImm()) // 'pred'
          .addOperand(MI.getOperand(4));
      MI.eraseFromParent();
      return true;
    }
    case ARM::t2MOVCCr:
    case ARM::MOVCCr: {
      unsigned Opc = AFI->isThumbFunction() ? ARM::t2MOVr : ARM::MOVr;
      BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(Opc),
              MI.getOperand(1).getReg())
          .addOperand(MI.getOperand(2))
          .addImm(MI.getOperand(3).getImm()) // 'pred'
          .addOperand(MI.getOperand(4))
          .addReg(0); // 's' bit
      MI.eraseFromParent();
      return true;
    }
    case ARM::MOVCCsi: {
      BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(ARM::MOVsi),
              (MI.getOperand(1).getReg()))
          .addOperand(MI.getOperand(2))
          .addImm(MI.getOperand(3).getImm())
          .addImm(MI.getOperand(4).getImm()) // 'pred'
          .addOperand(MI.getOperand(5))
          .addReg(0); // 's' bit
      MI.eraseFromParent();
      return true;
    }
    case ARM::MOVCCsr: {
      BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(ARM::MOVsr),
              (MI.getOperand(1).getReg()))
          .addOperand(MI.getOperand(2))
          .addOperand(MI.getOperand(3))
          .addImm(MI.getOperand(4).getImm())
          .addImm(MI.getOperand(5).getImm()) // 'pred'
          .addOperand(MI.getOperand(6))
          .addReg(0); // 's' bit
      MI.eraseFromParent();
      return true;
    }
    case ARM::t2MOVCCi16:
    case ARM::MOVCCi16: {
      unsigned NewOpc = AFI->isThumbFunction() ? ARM::t2MOVi16 : ARM::MOVi16;
      BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(NewOpc),
              MI.getOperand(1).getReg())
          .addImm(MI.getOperand(2).getImm())
          .addImm(MI.getOperand(3).getImm()) // 'pred'
          .addOperand(MI.getOperand(4));
      MI.eraseFromParent();
      return true;
    }
    case ARM::t2MOVCCi:
    case ARM::MOVCCi: {
      unsigned Opc = AFI->isThumbFunction() ? ARM::t2MOVi : ARM::MOVi;
      BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(Opc),
              MI.getOperand(1).getReg())
          .addImm(MI.getOperand(2).getImm())
          .addImm(MI.getOperand(3).getImm()) // 'pred'
          .addOperand(MI.getOperand(4))
          .addReg(0); // 's' bit
      MI.eraseFromParent();
      return true;
    }
    case ARM::t2MVNCCi:
    case ARM::MVNCCi: {
      unsigned Opc = AFI->isThumbFunction() ? ARM::t2MVNi : ARM::MVNi;
      BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(Opc),
              MI.getOperand(1).getReg())
          .addImm(MI.getOperand(2).getImm())
          .addImm(MI.getOperand(3).getImm()) // 'pred'
          .addOperand(MI.getOperand(4))
          .addReg(0); // 's' bit
      MI.eraseFromParent();
      return true;
    }
    case ARM::t2MOVCClsl:
    case ARM::t2MOVCClsr:
    case ARM::t2MOVCCasr:
    case ARM::t2MOVCCror: {
      unsigned NewOpc;
      switch (Opcode) {
      case ARM::t2MOVCClsl: NewOpc = ARM::t2LSLri; break;
      case ARM::t2MOVCClsr: NewOpc = ARM::t2LSRri; break;
      case ARM::t2MOVCCasr: NewOpc = ARM::t2ASRri; break;
      case ARM::t2MOVCCror: NewOpc = ARM::t2RORri; break;
      default: llvm_unreachable("unexpected conditional move");
      }
      BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(NewOpc),
              MI.getOperand(1).getReg())
          .addOperand(MI.getOperand(2))
          .addImm(MI.getOperand(3).getImm())
          .addImm(MI.getOperand(4).getImm()) // 'pred'
          .addOperand(MI.getOperand(5))
          .addReg(0); // 's' bit
      MI.eraseFromParent();
      return true;
    }
    case ARM::Int_eh_sjlj_dispatchsetup: {
      MachineFunction &MF = *MI.getParent()->getParent();
      const ARMBaseInstrInfo *AII =
          static_cast<const ARMBaseInstrInfo *>(TII);
      const ARMBaseRegisterInfo &RI = AII->getRegisterInfo();
      // For functions using a base pointer, we rematerialize it (via the frame
      // pointer) here since eh.sjlj.setjmp and eh.sjlj.longjmp don't do it
      // for us. Otherwise, expand to nothing.
      if (RI.hasBasePointer(MF)) {
        int32_t NumBytes = AFI->getFramePtrSpillOffset();
        unsigned FramePtr = RI.getFrameRegister(MF);
        assert(MF.getSubtarget().getFrameLowering()->hasFP(MF) &&
               "base pointer without frame pointer?");

        if (AFI->isThumb2Function()) {
          emitT2RegPlusImmediate(MBB, MBBI, MI.getDebugLoc(), ARM::R6,
                                 FramePtr, -NumBytes, ARMCC::AL, 0, *TII);
        } else if (AFI->isThumbFunction()) {
          emitThumbRegPlusImmediate(MBB, MBBI, MI.getDebugLoc(), ARM::R6,
                                    FramePtr, -NumBytes, *TII, RI);
        } else {
          emitARMRegPlusImmediate(MBB, MBBI, MI.getDebugLoc(), ARM::R6,
                                  FramePtr, -NumBytes, ARMCC::AL, 0,
                                  *TII);
        }
        // If there's dynamic realignment, adjust for it.
        if (RI.needsStackRealignment(MF)) {
          MachineFrameInfo &MFI = MF.getFrameInfo();
          unsigned MaxAlign = MFI.getMaxAlignment();
          assert (!AFI->isThumb1OnlyFunction());
          // Emit bic r6, r6, MaxAlign
          assert(MaxAlign <= 256 &&
                 "The BIC instruction cannot encode "
                 "immediates larger than 256 with all lower "
                 "bits set.");
          unsigned bicOpc = AFI->isThumbFunction() ?
            ARM::t2BICri : ARM::BICri;
          AddDefaultCC(AddDefaultPred(BuildMI(MBB, MBBI, MI.getDebugLoc(),
                                              TII->get(bicOpc), ARM::R6)
                                      .addReg(ARM::R6, RegState::Kill)
                                      .addImm(MaxAlign-1)));
        }
      }
      MI.eraseFromParent();
      return true;
    }

    case ARM::MOVsrl_flag:
    case ARM::MOVsra_flag: {
      // These are just fancy MOV instructions.
      AddDefaultPred(BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(ARM::MOVsi),
                             MI.getOperand(0).getReg())
                     .addOperand(MI.getOperand(1))
                     .addImm(ARM_AM::getSORegOpc((Opcode == ARM::MOVsrl_flag ?
                                                  ARM_AM::lsr : ARM_AM::asr),
                                                 1)))
          .addReg(ARM::CPSR, RegState::Define);
      MI.eraseFromParent();
      return true;
    }
    case ARM::RRX: {
      // This encodes as "MOVs Rd, Rm, rrx".
      MachineInstrBuilder MIB =
          AddDefaultPred(BuildMI(MBB, MBBI, MI.getDebugLoc(),
                                 TII->get(ARM::MOVsi),
                                 MI.getOperand(0).getReg())
                         .addOperand(MI.getOperand(1))
                         .addImm(ARM_AM::getSORegOpc(ARM_AM::rrx, 0)))
              .addReg(0);
      TransferImpOps(MI, MIB, MIB);
      MI.eraseFromParent();
      return true;
    }
    case ARM::tTPsoft:
    case ARM::TPsoft: {
+      const bool Thumb = Opcode == ARM::tTPsoft;
+
      MachineInstrBuilder MIB;
-      if (Opcode == ARM::tTPsoft)
+      if (STI->genLongCalls()) {
+        MachineFunction *MF = MBB.getParent();
+        MachineConstantPool *MCP = MF->getConstantPool();
+        unsigned PCLabelID = AFI->createPICLabelUId();
+        MachineConstantPoolValue *CPV =
+            ARMConstantPoolSymbol::Create(MF->getFunction()->getContext(),
+                                          "__aeabi_read_tp", PCLabelID, 0);
+        unsigned Reg = MI.getOperand(0).getReg();
        MIB = BuildMI(MBB, MBBI, MI.getDebugLoc(),
-                      TII->get( ARM::tBL))
-                .addImm((unsigned)ARMCC::AL).addReg(0)
-                .addExternalSymbol("__aeabi_read_tp", 0);
-      else
+                      TII->get(Thumb ? ARM::tLDRpci : ARM::LDRi12), Reg)
+                  .addConstantPoolIndex(MCP->getConstantPoolIndex(CPV, 4));
+        if (!Thumb)
+          MIB.addImm(0);
+        MIB.addImm(static_cast<unsigned>(ARMCC::AL)).addReg(0);
+
        MIB = BuildMI(MBB, MBBI, MI.getDebugLoc(),
-                      TII->get( ARM::BL))
-                .addExternalSymbol("__aeabi_read_tp", 0);
+                      TII->get(Thumb ? ARM::tBLXr : ARM::BLX));
+        if (Thumb)
+          MIB.addImm(static_cast<unsigned>(ARMCC::AL)).addReg(0);
+        MIB.addReg(Reg, RegState::Kill);
+      } else {
+        MIB = BuildMI(MBB, MBBI, MI.getDebugLoc(),
+                      TII->get(Thumb ? ARM::tBL : ARM::BL));
+        if (Thumb)
+          MIB.addImm(static_cast<unsigned>(ARMCC::AL)).addReg(0);
+        MIB.addExternalSymbol("__aeabi_read_tp", 0);
+      }

      MIB->setMemRefs(MI.memoperands_begin(), MI.memoperands_end());
      TransferImpOps(MI, MIB, MIB);
      MI.eraseFromParent();
      return true;
    }
    case ARM::tLDRpci_pic:
    case ARM::t2LDRpci_pic: {
      unsigned NewLdOpc = (Opcode == ARM::tLDRpci_pic) ? ARM::tLDRpci
                                                       : ARM::t2LDRpci;
ARM::tLDRpci : ARM::t2LDRpci;
    unsigned DstReg = MI.getOperand(0).getReg();
    bool DstIsDead = MI.getOperand(0).isDead();
    MachineInstrBuilder MIB1 =
        AddDefaultPred(BuildMI(MBB, MBBI, MI.getDebugLoc(),
                               TII->get(NewLdOpc), DstReg)
                           .addOperand(MI.getOperand(1)));
    MIB1->setMemRefs(MI.memoperands_begin(), MI.memoperands_end());
    MachineInstrBuilder MIB2 =
        BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(ARM::tPICADD))
            .addReg(DstReg, RegState::Define | getDeadRegState(DstIsDead))
            .addReg(DstReg)
            .addOperand(MI.getOperand(2));
    TransferImpOps(MI, MIB1, MIB2);
    MI.eraseFromParent();
    return true;
  }
  case ARM::LDRLIT_ga_abs:
  case ARM::LDRLIT_ga_pcrel:
  case ARM::LDRLIT_ga_pcrel_ldr:
  case ARM::tLDRLIT_ga_abs:
  case ARM::tLDRLIT_ga_pcrel: {
    unsigned DstReg = MI.getOperand(0).getReg();
    bool DstIsDead = MI.getOperand(0).isDead();
    const MachineOperand &MO1 = MI.getOperand(1);
    const GlobalValue *GV = MO1.getGlobal();
    bool IsARM =
        Opcode != ARM::tLDRLIT_ga_pcrel && Opcode != ARM::tLDRLIT_ga_abs;
    bool IsPIC =
        Opcode != ARM::LDRLIT_ga_abs && Opcode != ARM::tLDRLIT_ga_abs;
    unsigned LDRLITOpc = IsARM ? ARM::LDRi12 : ARM::tLDRpci;
    unsigned PICAddOpc =
        IsARM
            ? (Opcode == ARM::LDRLIT_ga_pcrel_ldr ? ARM::PICLDR : ARM::PICADD)
            : ARM::tPICADD;

    // We need a new const-pool entry to load from.
    MachineConstantPool *MCP = MBB.getParent()->getConstantPool();
    unsigned ARMPCLabelIndex = 0;
    MachineConstantPoolValue *CPV;

    if (IsPIC) {
      unsigned PCAdj = IsARM ? 8 : 4;
      ARMPCLabelIndex = AFI->createPICLabelUId();
      CPV = ARMConstantPoolConstant::Create(GV, ARMPCLabelIndex,
                                            ARMCP::CPValue, PCAdj);
    } else
      CPV = ARMConstantPoolConstant::Create(GV, ARMCP::no_modifier);

    MachineInstrBuilder MIB =
        BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(LDRLITOpc), DstReg)
            .addConstantPoolIndex(MCP->getConstantPoolIndex(CPV, 4));
    if (IsARM)
      MIB.addImm(0);
    AddDefaultPred(MIB);

    if (IsPIC) {
      MachineInstrBuilder MIB =
          BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(PICAddOpc))
              .addReg(DstReg, RegState::Define | getDeadRegState(DstIsDead))
              .addReg(DstReg)
              .addImm(ARMPCLabelIndex);

      if (IsARM)
        AddDefaultPred(MIB);
    }

    MI.eraseFromParent();
    return true;
  }
  case ARM::MOV_ga_pcrel:
  case ARM::MOV_ga_pcrel_ldr:
  case ARM::t2MOV_ga_pcrel: {
    // Expand into movw + movt. Also "add pc" / ldr [pc] in PIC mode.
    unsigned LabelId = AFI->createPICLabelUId();
    unsigned DstReg = MI.getOperand(0).getReg();
    bool DstIsDead = MI.getOperand(0).isDead();
    const MachineOperand &MO1 = MI.getOperand(1);
    const GlobalValue *GV = MO1.getGlobal();
    unsigned TF = MO1.getTargetFlags();
    bool isARM = Opcode != ARM::t2MOV_ga_pcrel;
    unsigned LO16Opc = isARM ? ARM::MOVi16_ga_pcrel : ARM::t2MOVi16_ga_pcrel;
    unsigned HI16Opc = isARM ? ARM::MOVTi16_ga_pcrel : ARM::t2MOVTi16_ga_pcrel;
    unsigned LO16TF = TF | ARMII::MO_LO16;
    unsigned HI16TF = TF | ARMII::MO_HI16;
    unsigned PICAddOpc = isARM ? (Opcode == ARM::MOV_ga_pcrel_ldr ?
ARM::PICLDR : ARM::PICADD) : ARM::tPICADD; MachineInstrBuilder MIB1 = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(LO16Opc), DstReg) .addGlobalAddress(GV, MO1.getOffset(), TF | LO16TF) .addImm(LabelId); BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(HI16Opc), DstReg) .addReg(DstReg) .addGlobalAddress(GV, MO1.getOffset(), TF | HI16TF) .addImm(LabelId); MachineInstrBuilder MIB3 = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(PICAddOpc)) .addReg(DstReg, RegState::Define | getDeadRegState(DstIsDead)) .addReg(DstReg).addImm(LabelId); if (isARM) { AddDefaultPred(MIB3); if (Opcode == ARM::MOV_ga_pcrel_ldr) MIB3->setMemRefs(MI.memoperands_begin(), MI.memoperands_end()); } TransferImpOps(MI, MIB1, MIB3); MI.eraseFromParent(); return true; } case ARM::MOVi32imm: case ARM::MOVCCi32imm: case ARM::t2MOVi32imm: case ARM::t2MOVCCi32imm: ExpandMOV32BitImm(MBB, MBBI); return true; case ARM::SUBS_PC_LR: { MachineInstrBuilder MIB = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(ARM::SUBri), ARM::PC) .addReg(ARM::LR) .addOperand(MI.getOperand(0)) .addOperand(MI.getOperand(1)) .addOperand(MI.getOperand(2)) .addReg(ARM::CPSR, RegState::Undef); TransferImpOps(MI, MIB, MIB); MI.eraseFromParent(); return true; } case ARM::VLDMQIA: { unsigned NewOpc = ARM::VLDMDIA; MachineInstrBuilder MIB = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(NewOpc)); unsigned OpIdx = 0; // Grab the Q register destination. bool DstIsDead = MI.getOperand(OpIdx).isDead(); unsigned DstReg = MI.getOperand(OpIdx++).getReg(); // Copy the source register. MIB.addOperand(MI.getOperand(OpIdx++)); // Copy the predicate operands. MIB.addOperand(MI.getOperand(OpIdx++)); MIB.addOperand(MI.getOperand(OpIdx++)); // Add the destination operands (D subregs). unsigned D0 = TRI->getSubReg(DstReg, ARM::dsub_0); unsigned D1 = TRI->getSubReg(DstReg, ARM::dsub_1); MIB.addReg(D0, RegState::Define | getDeadRegState(DstIsDead)) .addReg(D1, RegState::Define | getDeadRegState(DstIsDead)); // Add an implicit def for the super-register. MIB.addReg(DstReg, RegState::ImplicitDefine | getDeadRegState(DstIsDead)); TransferImpOps(MI, MIB, MIB); MIB.setMemRefs(MI.memoperands_begin(), MI.memoperands_end()); MI.eraseFromParent(); return true; } case ARM::VSTMQIA: { unsigned NewOpc = ARM::VSTMDIA; MachineInstrBuilder MIB = BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(NewOpc)); unsigned OpIdx = 0; // Grab the Q register source. bool SrcIsKill = MI.getOperand(OpIdx).isKill(); unsigned SrcReg = MI.getOperand(OpIdx++).getReg(); // Copy the destination register. MIB.addOperand(MI.getOperand(OpIdx++)); // Copy the predicate operands. MIB.addOperand(MI.getOperand(OpIdx++)); MIB.addOperand(MI.getOperand(OpIdx++)); // Add the source operands (D subregs). unsigned D0 = TRI->getSubReg(SrcReg, ARM::dsub_0); unsigned D1 = TRI->getSubReg(SrcReg, ARM::dsub_1); MIB.addReg(D0, SrcIsKill ? RegState::Kill : 0) .addReg(D1, SrcIsKill ? RegState::Kill : 0); if (SrcIsKill) // Add an implicit kill for the Q register. 
MIB->addRegisterKilled(SrcReg, TRI, true); TransferImpOps(MI, MIB, MIB); MIB.setMemRefs(MI.memoperands_begin(), MI.memoperands_end()); MI.eraseFromParent(); return true; } case ARM::VLD2q8Pseudo: case ARM::VLD2q16Pseudo: case ARM::VLD2q32Pseudo: case ARM::VLD2q8PseudoWB_fixed: case ARM::VLD2q16PseudoWB_fixed: case ARM::VLD2q32PseudoWB_fixed: case ARM::VLD2q8PseudoWB_register: case ARM::VLD2q16PseudoWB_register: case ARM::VLD2q32PseudoWB_register: case ARM::VLD3d8Pseudo: case ARM::VLD3d16Pseudo: case ARM::VLD3d32Pseudo: case ARM::VLD1d64TPseudo: case ARM::VLD1d64TPseudoWB_fixed: case ARM::VLD3d8Pseudo_UPD: case ARM::VLD3d16Pseudo_UPD: case ARM::VLD3d32Pseudo_UPD: case ARM::VLD3q8Pseudo_UPD: case ARM::VLD3q16Pseudo_UPD: case ARM::VLD3q32Pseudo_UPD: case ARM::VLD3q8oddPseudo: case ARM::VLD3q16oddPseudo: case ARM::VLD3q32oddPseudo: case ARM::VLD3q8oddPseudo_UPD: case ARM::VLD3q16oddPseudo_UPD: case ARM::VLD3q32oddPseudo_UPD: case ARM::VLD4d8Pseudo: case ARM::VLD4d16Pseudo: case ARM::VLD4d32Pseudo: case ARM::VLD1d64QPseudo: case ARM::VLD1d64QPseudoWB_fixed: case ARM::VLD4d8Pseudo_UPD: case ARM::VLD4d16Pseudo_UPD: case ARM::VLD4d32Pseudo_UPD: case ARM::VLD4q8Pseudo_UPD: case ARM::VLD4q16Pseudo_UPD: case ARM::VLD4q32Pseudo_UPD: case ARM::VLD4q8oddPseudo: case ARM::VLD4q16oddPseudo: case ARM::VLD4q32oddPseudo: case ARM::VLD4q8oddPseudo_UPD: case ARM::VLD4q16oddPseudo_UPD: case ARM::VLD4q32oddPseudo_UPD: case ARM::VLD3DUPd8Pseudo: case ARM::VLD3DUPd16Pseudo: case ARM::VLD3DUPd32Pseudo: case ARM::VLD3DUPd8Pseudo_UPD: case ARM::VLD3DUPd16Pseudo_UPD: case ARM::VLD3DUPd32Pseudo_UPD: case ARM::VLD4DUPd8Pseudo: case ARM::VLD4DUPd16Pseudo: case ARM::VLD4DUPd32Pseudo: case ARM::VLD4DUPd8Pseudo_UPD: case ARM::VLD4DUPd16Pseudo_UPD: case ARM::VLD4DUPd32Pseudo_UPD: ExpandVLD(MBBI); return true; case ARM::VST2q8Pseudo: case ARM::VST2q16Pseudo: case ARM::VST2q32Pseudo: case ARM::VST2q8PseudoWB_fixed: case ARM::VST2q16PseudoWB_fixed: case ARM::VST2q32PseudoWB_fixed: case ARM::VST2q8PseudoWB_register: case ARM::VST2q16PseudoWB_register: case ARM::VST2q32PseudoWB_register: case ARM::VST3d8Pseudo: case ARM::VST3d16Pseudo: case ARM::VST3d32Pseudo: case ARM::VST1d64TPseudo: case ARM::VST3d8Pseudo_UPD: case ARM::VST3d16Pseudo_UPD: case ARM::VST3d32Pseudo_UPD: case ARM::VST1d64TPseudoWB_fixed: case ARM::VST1d64TPseudoWB_register: case ARM::VST3q8Pseudo_UPD: case ARM::VST3q16Pseudo_UPD: case ARM::VST3q32Pseudo_UPD: case ARM::VST3q8oddPseudo: case ARM::VST3q16oddPseudo: case ARM::VST3q32oddPseudo: case ARM::VST3q8oddPseudo_UPD: case ARM::VST3q16oddPseudo_UPD: case ARM::VST3q32oddPseudo_UPD: case ARM::VST4d8Pseudo: case ARM::VST4d16Pseudo: case ARM::VST4d32Pseudo: case ARM::VST1d64QPseudo: case ARM::VST4d8Pseudo_UPD: case ARM::VST4d16Pseudo_UPD: case ARM::VST4d32Pseudo_UPD: case ARM::VST1d64QPseudoWB_fixed: case ARM::VST1d64QPseudoWB_register: case ARM::VST4q8Pseudo_UPD: case ARM::VST4q16Pseudo_UPD: case ARM::VST4q32Pseudo_UPD: case ARM::VST4q8oddPseudo: case ARM::VST4q16oddPseudo: case ARM::VST4q32oddPseudo: case ARM::VST4q8oddPseudo_UPD: case ARM::VST4q16oddPseudo_UPD: case ARM::VST4q32oddPseudo_UPD: ExpandVST(MBBI); return true; case ARM::VLD1LNq8Pseudo: case ARM::VLD1LNq16Pseudo: case ARM::VLD1LNq32Pseudo: case ARM::VLD1LNq8Pseudo_UPD: case ARM::VLD1LNq16Pseudo_UPD: case ARM::VLD1LNq32Pseudo_UPD: case ARM::VLD2LNd8Pseudo: case ARM::VLD2LNd16Pseudo: case ARM::VLD2LNd32Pseudo: case ARM::VLD2LNq16Pseudo: case ARM::VLD2LNq32Pseudo: case ARM::VLD2LNd8Pseudo_UPD: case ARM::VLD2LNd16Pseudo_UPD: case ARM::VLD2LNd32Pseudo_UPD: 
case ARM::VLD2LNq16Pseudo_UPD: case ARM::VLD2LNq32Pseudo_UPD: case ARM::VLD3LNd8Pseudo: case ARM::VLD3LNd16Pseudo: case ARM::VLD3LNd32Pseudo: case ARM::VLD3LNq16Pseudo: case ARM::VLD3LNq32Pseudo: case ARM::VLD3LNd8Pseudo_UPD: case ARM::VLD3LNd16Pseudo_UPD: case ARM::VLD3LNd32Pseudo_UPD: case ARM::VLD3LNq16Pseudo_UPD: case ARM::VLD3LNq32Pseudo_UPD: case ARM::VLD4LNd8Pseudo: case ARM::VLD4LNd16Pseudo: case ARM::VLD4LNd32Pseudo: case ARM::VLD4LNq16Pseudo: case ARM::VLD4LNq32Pseudo: case ARM::VLD4LNd8Pseudo_UPD: case ARM::VLD4LNd16Pseudo_UPD: case ARM::VLD4LNd32Pseudo_UPD: case ARM::VLD4LNq16Pseudo_UPD: case ARM::VLD4LNq32Pseudo_UPD: case ARM::VST1LNq8Pseudo: case ARM::VST1LNq16Pseudo: case ARM::VST1LNq32Pseudo: case ARM::VST1LNq8Pseudo_UPD: case ARM::VST1LNq16Pseudo_UPD: case ARM::VST1LNq32Pseudo_UPD: case ARM::VST2LNd8Pseudo: case ARM::VST2LNd16Pseudo: case ARM::VST2LNd32Pseudo: case ARM::VST2LNq16Pseudo: case ARM::VST2LNq32Pseudo: case ARM::VST2LNd8Pseudo_UPD: case ARM::VST2LNd16Pseudo_UPD: case ARM::VST2LNd32Pseudo_UPD: case ARM::VST2LNq16Pseudo_UPD: case ARM::VST2LNq32Pseudo_UPD: case ARM::VST3LNd8Pseudo: case ARM::VST3LNd16Pseudo: case ARM::VST3LNd32Pseudo: case ARM::VST3LNq16Pseudo: case ARM::VST3LNq32Pseudo: case ARM::VST3LNd8Pseudo_UPD: case ARM::VST3LNd16Pseudo_UPD: case ARM::VST3LNd32Pseudo_UPD: case ARM::VST3LNq16Pseudo_UPD: case ARM::VST3LNq32Pseudo_UPD: case ARM::VST4LNd8Pseudo: case ARM::VST4LNd16Pseudo: case ARM::VST4LNd32Pseudo: case ARM::VST4LNq16Pseudo: case ARM::VST4LNq32Pseudo: case ARM::VST4LNd8Pseudo_UPD: case ARM::VST4LNd16Pseudo_UPD: case ARM::VST4LNd32Pseudo_UPD: case ARM::VST4LNq16Pseudo_UPD: case ARM::VST4LNq32Pseudo_UPD: ExpandLaneOp(MBBI); return true; case ARM::VTBL3Pseudo: ExpandVTBL(MBBI, ARM::VTBL3, false); return true; case ARM::VTBL4Pseudo: ExpandVTBL(MBBI, ARM::VTBL4, false); return true; case ARM::VTBX3Pseudo: ExpandVTBL(MBBI, ARM::VTBX3, true); return true; case ARM::VTBX4Pseudo: ExpandVTBL(MBBI, ARM::VTBX4, true); return true; case ARM::CMP_SWAP_8: if (STI->isThumb()) return ExpandCMP_SWAP(MBB, MBBI, ARM::t2LDREXB, ARM::t2STREXB, ARM::tUXTB, NextMBBI); else return ExpandCMP_SWAP(MBB, MBBI, ARM::LDREXB, ARM::STREXB, ARM::UXTB, NextMBBI); case ARM::CMP_SWAP_16: if (STI->isThumb()) return ExpandCMP_SWAP(MBB, MBBI, ARM::t2LDREXH, ARM::t2STREXH, ARM::tUXTH, NextMBBI); else return ExpandCMP_SWAP(MBB, MBBI, ARM::LDREXH, ARM::STREXH, ARM::UXTH, NextMBBI); case ARM::CMP_SWAP_32: if (STI->isThumb()) return ExpandCMP_SWAP(MBB, MBBI, ARM::t2LDREX, ARM::t2STREX, 0, NextMBBI); else return ExpandCMP_SWAP(MBB, MBBI, ARM::LDREX, ARM::STREX, 0, NextMBBI); case ARM::CMP_SWAP_64: return ExpandCMP_SWAP_64(MBB, MBBI, NextMBBI); } } bool ARMExpandPseudo::ExpandMBB(MachineBasicBlock &MBB) { bool Modified = false; MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end(); while (MBBI != E) { MachineBasicBlock::iterator NMBBI = std::next(MBBI); Modified |= ExpandMI(MBB, MBBI, NMBBI); MBBI = NMBBI; } return Modified; } bool ARMExpandPseudo::runOnMachineFunction(MachineFunction &MF) { STI = &static_cast(MF.getSubtarget()); TII = STI->getInstrInfo(); TRI = STI->getRegisterInfo(); AFI = MF.getInfo(); bool Modified = false; for (MachineFunction::iterator MFI = MF.begin(), E = MF.end(); MFI != E; ++MFI) Modified |= ExpandMBB(*MFI); if (VerifyARMPseudo) MF.verify(this, "After expanding ARM pseudo instructions."); return Modified; } /// createARMExpandPseudoPass - returns an instance of the pseudo instruction /// expansion pass. 
FunctionPass *llvm::createARMExpandPseudoPass() { return new ARMExpandPseudo(); } Index: vendor/llvm/dist/lib/Transforms/Scalar/LICM.cpp =================================================================== --- vendor/llvm/dist/lib/Transforms/Scalar/LICM.cpp (revision 314158) +++ vendor/llvm/dist/lib/Transforms/Scalar/LICM.cpp (revision 314159) @@ -1,1264 +1,1261 @@ //===-- LICM.cpp - Loop Invariant Code Motion Pass ------------------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This pass performs loop invariant code motion, attempting to remove as much // code from the body of a loop as possible. It does this by either hoisting // code into the preheader block, or by sinking code to the exit blocks if it is // safe. This pass also promotes must-aliased memory locations in the loop to // live in registers, thus hoisting and sinking "invariant" loads and stores. // // This pass uses alias analysis for two purposes: // // 1. Moving loop invariant loads and calls out of loops. If we can determine // that a load or call inside of a loop never aliases anything stored to, // we can hoist it or sink it like any other instruction. // 2. Scalar Promotion of Memory - If there is a store instruction inside of // the loop, we try to move the store to happen AFTER the loop instead of // inside of the loop. This can only happen if a few conditions are true: // A. The pointer stored through is loop invariant // B. There are no stores or loads in the loop which _may_ alias the // pointer. There are no calls in the loop which mod/ref the pointer. // If these conditions are true, we can promote the loads and stores in the // loop of the pointer to use a temporary alloca'd variable. We then use // the SSAUpdater to construct the appropriate SSA form for the value. 
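// As an illustration of the promotion just described (a hypothetical
// source-level sketch, not part of this pass), a loop such as
//
//   for (i = 0; i < n; i++)
//     *P += i;        // P loop-invariant and must-aliased only
//
// is effectively rewritten to
//
//   tmp = *P;         // load hoisted into the preheader
//   for (i = 0; i < n; i++)
//     tmp += i;
//   *P = tmp;         // store sunk to the loop exit
//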
//
//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Scalar/LICM.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/AliasSetTracker.h"
#include "llvm/Analysis/BasicAliasAnalysis.h"
#include "llvm/Analysis/CaptureTracking.h"
#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/Loads.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopPass.h"
#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Analysis/OptimizationDiagnosticInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/PredIteratorCache.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Scalar/LoopPassManager.h"
#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/LoopUtils.h"
#include "llvm/Transforms/Utils/SSAUpdater.h"
#include <algorithm>
#include <utility>
using namespace llvm;

#define DEBUG_TYPE "licm"

STATISTIC(NumSunk, "Number of instructions sunk out of loop");
STATISTIC(NumHoisted, "Number of instructions hoisted out of loop");
STATISTIC(NumMovedLoads, "Number of load insts hoisted or sunk");
STATISTIC(NumMovedCalls, "Number of call insts hoisted or sunk");
STATISTIC(NumPromoted, "Number of memory locations promoted to registers");

static cl::opt<bool>
    DisablePromotion("disable-licm-promotion", cl::Hidden,
                     cl::desc("Disable memory promotion in LICM pass"));

static bool inSubLoop(BasicBlock *BB, Loop *CurLoop, LoopInfo *LI);
static bool isNotUsedInLoop(const Instruction &I, const Loop *CurLoop,
                            const LoopSafetyInfo *SafetyInfo);
static bool hoist(Instruction &I, const DominatorTree *DT, const Loop *CurLoop,
                  const LoopSafetyInfo *SafetyInfo,
                  OptimizationRemarkEmitter *ORE);
static bool sink(Instruction &I, const LoopInfo *LI, const DominatorTree *DT,
                 const Loop *CurLoop, AliasSetTracker *CurAST,
                 const LoopSafetyInfo *SafetyInfo,
                 OptimizationRemarkEmitter *ORE);
static bool isSafeToExecuteUnconditionally(Instruction &Inst,
                                           const DominatorTree *DT,
                                           const Loop *CurLoop,
                                           const LoopSafetyInfo *SafetyInfo,
                                           OptimizationRemarkEmitter *ORE,
                                           const Instruction *CtxI = nullptr);
static bool pointerInvalidatedByLoop(Value *V, uint64_t Size,
                                     const AAMDNodes &AAInfo,
                                     AliasSetTracker *CurAST);
static Instruction *
CloneInstructionInExitBlock(Instruction &I, BasicBlock &ExitBlock, PHINode &PN,
                            const LoopInfo *LI,
                            const LoopSafetyInfo *SafetyInfo);

namespace {
struct LoopInvariantCodeMotion {
  bool runOnLoop(Loop *L, AliasAnalysis *AA, LoopInfo *LI, DominatorTree *DT,
                 TargetLibraryInfo *TLI, ScalarEvolution *SE,
                 OptimizationRemarkEmitter *ORE, bool DeleteAST);

  DenseMap<Loop *, AliasSetTracker *> &getLoopToAliasSetMap() {
    return LoopToAliasSetMap;
  }

private:
  DenseMap<Loop *, AliasSetTracker *> LoopToAliasSetMap;

  AliasSetTracker *collectAliasInfoForLoop(Loop *L, LoopInfo *LI,
                                           AliasAnalysis *AA);
};

struct LegacyLICMPass : public LoopPass {
  static char ID; // Pass identification, replacement for typeid
  LegacyLICMPass() : LoopPass(ID) {
    initializeLegacyLICMPassPass(*PassRegistry::getPassRegistry());
  }

  bool runOnLoop(Loop *L, LPPassManager &LPM) override {
    if (skipLoop(L)) {
      // If we have run LICM on a previous loop but now we are skipping
      // (because we've hit the opt-bisect limit), we need to clear the
      // loop alias information.
      for (auto &LTAS : LICM.getLoopToAliasSetMap())
        delete LTAS.second;
      LICM.getLoopToAliasSetMap().clear();
      return false;
    }

    auto *SE = getAnalysisIfAvailable<ScalarEvolutionWrapperPass>();
    // For the old PM, we can't use OptimizationRemarkEmitter as an analysis
    // pass. Function analyses need to be preserved across loop transformations
    // but ORE cannot be preserved (see comment before the pass definition).
    OptimizationRemarkEmitter ORE(L->getHeader()->getParent());
    return LICM.runOnLoop(L,
                          &getAnalysis<AAResultsWrapperPass>().getAAResults(),
                          &getAnalysis<LoopInfoWrapperPass>().getLoopInfo(),
                          &getAnalysis<DominatorTreeWrapperPass>().getDomTree(),
                          &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(),
                          SE ? &SE->getSE() : nullptr, &ORE, false);
  }

  /// This transformation requires natural loop information & requires that
  /// loop preheaders be inserted into the CFG...
  ///
  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.setPreservesCFG();
    AU.addRequired<TargetLibraryInfoWrapperPass>();
    getLoopAnalysisUsage(AU);
  }

  using llvm::Pass::doFinalization;

  bool doFinalization() override {
    assert(LICM.getLoopToAliasSetMap().empty() &&
           "Didn't free loop alias sets");
    return false;
  }

private:
  LoopInvariantCodeMotion LICM;

  /// cloneBasicBlockAnalysis - Simple Analysis hook. Clone alias set info.
  void cloneBasicBlockAnalysis(BasicBlock *From, BasicBlock *To,
                               Loop *L) override;

  /// deleteAnalysisValue - Simple Analysis hook. Delete value V from alias
  /// set.
  void deleteAnalysisValue(Value *V, Loop *L) override;

  /// Simple Analysis hook. Delete loop L from alias set map.
  void deleteAnalysisLoop(Loop *L) override;
};
}

PreservedAnalyses LICMPass::run(Loop &L, LoopAnalysisManager &AM,
                                LoopStandardAnalysisResults &AR, LPMUpdater &) {
  const auto &FAM =
      AM.getResult<FunctionAnalysisManagerLoopProxy>(L, AR).getManager();
  Function *F = L.getHeader()->getParent();

  auto *ORE = FAM.getCachedResult<OptimizationRemarkEmitterAnalysis>(*F);
  // FIXME: This should probably be optional rather than required.
  if (!ORE)
    report_fatal_error("LICM: OptimizationRemarkEmitterAnalysis not "
                       "cached at a higher level");

  LoopInvariantCodeMotion LICM;
  if (!LICM.runOnLoop(&L, &AR.AA, &AR.LI, &AR.DT, &AR.TLI, &AR.SE, ORE, true))
    return PreservedAnalyses::all();

  // FIXME: There is no setPreservesCFG in the new PM. When that becomes
  // available, it should be used here.
  return getLoopPassPreservedAnalyses();
}

char LegacyLICMPass::ID = 0;
INITIALIZE_PASS_BEGIN(LegacyLICMPass, "licm", "Loop Invariant Code Motion",
                      false, false)
INITIALIZE_PASS_DEPENDENCY(LoopPass)
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
INITIALIZE_PASS_END(LegacyLICMPass, "licm", "Loop Invariant Code Motion",
                    false, false)

Pass *llvm::createLICMPass() { return new LegacyLICMPass(); }

/// Hoist expressions out of the specified loop. Note, alias info for inner
/// loop is not preserved so it is not a good idea to run LICM multiple
/// times on one loop.
/// We should delete AST for inner loops in the new pass manager to avoid
/// a memory leak.
///
bool LoopInvariantCodeMotion::runOnLoop(Loop *L, AliasAnalysis *AA,
                                        LoopInfo *LI, DominatorTree *DT,
                                        TargetLibraryInfo *TLI,
                                        ScalarEvolution *SE,
                                        OptimizationRemarkEmitter *ORE,
                                        bool DeleteAST) {
  bool Changed = false;

  assert(L->isLCSSAForm(*DT) && "Loop is not in LCSSA form.");

  AliasSetTracker *CurAST = collectAliasInfoForLoop(L, LI, AA);

  // Get the preheader block to move instructions into...
  BasicBlock *Preheader = L->getLoopPreheader();

  // Compute loop safety information.
  LoopSafetyInfo SafetyInfo;
  computeLoopSafetyInfo(&SafetyInfo, L);

  // We want to visit all of the instructions in this loop... that are not parts
  // of our subloops (they have already had their invariants hoisted out of
  // their loop, into this loop, so there is no need to process the BODIES of
  // the subloops).
  //
  // Traverse the body of the loop in depth first order on the dominator tree so
  // that we are guaranteed to see definitions before we see uses. This allows
  // us to sink instructions in one pass, without iteration. After sinking
  // instructions, we perform another pass to hoist them out of the loop.
  //
  if (L->hasDedicatedExits())
    Changed |= sinkRegion(DT->getNode(L->getHeader()), AA, LI, DT, TLI, L,
                          CurAST, &SafetyInfo, ORE);
  if (Preheader)
    Changed |= hoistRegion(DT->getNode(L->getHeader()), AA, LI, DT, TLI, L,
                           CurAST, &SafetyInfo, ORE);

  // Now that all loop invariants have been removed from the loop, promote any
  // memory references to scalars that we can.
  // Don't sink stores from loops without dedicated block exits. Exits
  // containing indirect branches are not transformed by loop simplify, so
  // make sure we catch that. An additional load may be generated in the
  // preheader for SSA updater, so also avoid sinking when no preheader
  // is available.
  if (!DisablePromotion && Preheader && L->hasDedicatedExits()) {
    // Figure out the loop exits and their insertion points
    SmallVector<BasicBlock *, 8> ExitBlocks;
    L->getUniqueExitBlocks(ExitBlocks);

    // We can't insert into a catchswitch.
    bool HasCatchSwitch = llvm::any_of(ExitBlocks, [](BasicBlock *Exit) {
      return isa<CatchSwitchInst>(Exit->getTerminator());
    });

    if (!HasCatchSwitch) {
      SmallVector<Instruction *, 8> InsertPts;
      InsertPts.reserve(ExitBlocks.size());
      for (BasicBlock *ExitBlock : ExitBlocks)
        InsertPts.push_back(&*ExitBlock->getFirstInsertionPt());

      PredIteratorCache PIC;

      bool Promoted = false;

      // Loop over all of the alias sets in the tracker object.
      for (AliasSet &AS : *CurAST)
        Promoted |=
            promoteLoopAccessesToScalars(AS, ExitBlocks, InsertPts, PIC, LI,
                                         DT, TLI, L, CurAST, &SafetyInfo, ORE);

      // Once we have promoted values across the loop body we have to
      // recursively reform LCSSA as any nested loop may now have values defined
      // within the loop used in the outer loop.
      // FIXME: This is really heavy handed. It would be a bit better to use an
      // SSAUpdater strategy during promotion that was LCSSA aware and reformed
      // it as it went.
      if (Promoted)
        formLCSSARecursively(*L, *DT, LI, SE);

      Changed |= Promoted;
    }
  }

  // Check that neither this loop nor its parent have had LCSSA broken. LICM is
  // specifically moving instructions across the loop boundary and so it is
  // especially in need of sanity checking here.
  assert(L->isLCSSAForm(*DT) && "Loop not left in LCSSA form after LICM!");
  assert((!L->getParentLoop() || L->getParentLoop()->isLCSSAForm(*DT)) &&
         "Parent loop not left in LCSSA form after LICM!");

  // If this loop is nested inside of another one, save the alias information
  // for when we process the outer loop.
  if (L->getParentLoop() && !DeleteAST)
    LoopToAliasSetMap[L] = CurAST;
  else
    delete CurAST;

  if (Changed && SE)
    SE->forgetLoopDispositions(L);
  return Changed;
}

/// Walk the specified region of the CFG (defined by all blocks dominated by
/// the specified block, and that are in the current loop) in reverse depth
/// first order w.r.t the DominatorTree. This allows us to visit uses before
/// definitions, allowing us to sink a loop body in one pass without iteration.
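/// For example (hypothetical IR, not from this file): with uses visited
/// before definitions, a chain such as
///   %a = add i32 %x, 1
///   %b = mul i32 %a, 2     ; only %b is used outside the loop
/// is handled in a single pass: %b is sunk first, which leaves %a used only
/// outside the loop, so %a can be sunk when it is visited afterwards.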
///
bool llvm::sinkRegion(DomTreeNode *N, AliasAnalysis *AA, LoopInfo *LI,
                      DominatorTree *DT, TargetLibraryInfo *TLI, Loop *CurLoop,
                      AliasSetTracker *CurAST, LoopSafetyInfo *SafetyInfo,
                      OptimizationRemarkEmitter *ORE) {

  // Verify inputs.
  assert(N != nullptr && AA != nullptr && LI != nullptr && DT != nullptr &&
         CurLoop != nullptr && CurAST != nullptr && SafetyInfo != nullptr &&
         "Unexpected input to sinkRegion");

  BasicBlock *BB = N->getBlock();
  // If this subregion is not in the top level loop at all, exit.
  if (!CurLoop->contains(BB))
    return false;

  // We are processing blocks in reverse dfo, so process children first.
  bool Changed = false;
  const std::vector<DomTreeNode *> &Children = N->getChildren();
  for (DomTreeNode *Child : Children)
    Changed |=
        sinkRegion(Child, AA, LI, DT, TLI, CurLoop, CurAST, SafetyInfo, ORE);

  // Only need to process the contents of this block if it is not part of a
  // subloop (which would already have been processed).
  if (inSubLoop(BB, CurLoop, LI))
    return Changed;

  for (BasicBlock::iterator II = BB->end(); II != BB->begin();) {
    Instruction &I = *--II;

    // If the instruction is dead, we would try to sink it because it isn't
    // used in the loop. Instead, just delete it.
    if (isInstructionTriviallyDead(&I, TLI)) {
      DEBUG(dbgs() << "LICM deleting dead inst: " << I << '\n');
      ++II;
      CurAST->deleteValue(&I);
      I.eraseFromParent();
      Changed = true;
      continue;
    }

    // Check to see if we can sink this instruction to the exit blocks
    // of the loop. We can do this if all users of the instruction are
    // outside of the loop. In this case, it doesn't even matter if the
    // operands of the instruction are loop invariant.
    //
    if (isNotUsedInLoop(I, CurLoop, SafetyInfo) &&
        canSinkOrHoistInst(I, AA, DT, CurLoop, CurAST, SafetyInfo, ORE)) {
      ++II;
      Changed |= sink(I, LI, DT, CurLoop, CurAST, SafetyInfo, ORE);
    }
  }
  return Changed;
}

/// Walk the specified region of the CFG (defined by all blocks dominated by
/// the specified block, and that are in the current loop) in depth first
/// order w.r.t the DominatorTree. This allows us to visit definitions before
/// uses, allowing us to hoist a loop body in one pass without iteration.
///
bool llvm::hoistRegion(DomTreeNode *N, AliasAnalysis *AA, LoopInfo *LI,
                       DominatorTree *DT, TargetLibraryInfo *TLI, Loop *CurLoop,
                       AliasSetTracker *CurAST, LoopSafetyInfo *SafetyInfo,
                       OptimizationRemarkEmitter *ORE) {
  // Verify inputs.
  assert(N != nullptr && AA != nullptr && LI != nullptr && DT != nullptr &&
         CurLoop != nullptr && CurAST != nullptr && SafetyInfo != nullptr &&
         "Unexpected input to hoistRegion");

  BasicBlock *BB = N->getBlock();

  // If this subregion is not in the top level loop at all, exit.
  if (!CurLoop->contains(BB))
    return false;

  // Only need to process the contents of this block if it is not part of a
  // subloop (which would already have been processed).
  bool Changed = false;
  if (!inSubLoop(BB, CurLoop, LI))
    for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E;) {
      Instruction &I = *II++;
      // Try constant folding this instruction. If all the operands are
      // constants, it is technically hoistable, but it would be better to just
      // fold it.
      if (Constant *C = ConstantFoldInstruction(
              &I, I.getModule()->getDataLayout(), TLI)) {
        DEBUG(dbgs() << "LICM folding inst: " << I << "  --> " << *C << '\n');
        CurAST->copyValue(&I, C);
        I.replaceAllUsesWith(C);
        if (isInstructionTriviallyDead(&I, TLI)) {
          CurAST->deleteValue(&I);
          I.eraseFromParent();
        }
        Changed = true;
        continue;
      }

      // Try hoisting the instruction out to the preheader. We can only do
      // this if all of the operands of the instruction are loop invariant and
      // if it is safe to hoist the instruction.
      //
      if (CurLoop->hasLoopInvariantOperands(&I) &&
          canSinkOrHoistInst(I, AA, DT, CurLoop, CurAST, SafetyInfo, ORE) &&
          isSafeToExecuteUnconditionally(
              I, DT, CurLoop, SafetyInfo, ORE,
              CurLoop->getLoopPreheader()->getTerminator()))
        Changed |= hoist(I, DT, CurLoop, SafetyInfo, ORE);
    }

  const std::vector<DomTreeNode *> &Children = N->getChildren();
  for (DomTreeNode *Child : Children)
    Changed |=
        hoistRegion(Child, AA, LI, DT, TLI, CurLoop, CurAST, SafetyInfo, ORE);
  return Changed;
}

/// Computes loop safety information, checks the loop body & header
/// for the possibility of a may-throw exception.
///
void llvm::computeLoopSafetyInfo(LoopSafetyInfo *SafetyInfo, Loop *CurLoop) {
  assert(CurLoop != nullptr && "CurLoop can't be null");
  BasicBlock *Header = CurLoop->getHeader();
  // Setting default safety values.
  SafetyInfo->MayThrow = false;
  SafetyInfo->HeaderMayThrow = false;
  // Iterate over header and compute safety info.
  for (BasicBlock::iterator I = Header->begin(), E = Header->end();
       (I != E) && !SafetyInfo->HeaderMayThrow; ++I)
    SafetyInfo->HeaderMayThrow |=
        !isGuaranteedToTransferExecutionToSuccessor(&*I);

  SafetyInfo->MayThrow = SafetyInfo->HeaderMayThrow;
  // Iterate over loop instructions and compute safety info.
  for (Loop::block_iterator BB = CurLoop->block_begin(),
                            BBE = CurLoop->block_end();
       (BB != BBE) && !SafetyInfo->MayThrow; ++BB)
    for (BasicBlock::iterator I = (*BB)->begin(), E = (*BB)->end();
         (I != E) && !SafetyInfo->MayThrow; ++I)
      SafetyInfo->MayThrow |= !isGuaranteedToTransferExecutionToSuccessor(&*I);

  // Compute funclet colors if we might sink/hoist in a function with a funclet
  // personality routine.
  Function *Fn = CurLoop->getHeader()->getParent();
  if (Fn->hasPersonalityFn())
    if (Constant *PersonalityFn = Fn->getPersonalityFn())
      if (isFuncletEHPersonality(classifyEHPersonality(PersonalityFn)))
        SafetyInfo->BlockColors = colorEHFunclets(*Fn);
}

bool llvm::canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
                              Loop *CurLoop, AliasSetTracker *CurAST,
                              LoopSafetyInfo *SafetyInfo,
                              OptimizationRemarkEmitter *ORE) {
  // Loads have extra constraints we have to verify before we can hoist them.
  if (LoadInst *LI = dyn_cast<LoadInst>(&I)) {
    if (!LI->isUnordered())
      return false; // Don't hoist volatile/atomic loads!

    // Loads from constant memory are always safe to move, even if they end up
    // in the same alias set as something that ends up being modified.
    if (AA->pointsToConstantMemory(LI->getOperand(0)))
      return true;
    if (LI->getMetadata(LLVMContext::MD_invariant_load))
      return true;

    // Don't hoist loads which have may-aliased stores in the loop.
    uint64_t Size = 0;
    if (LI->getType()->isSized())
      Size = I.getModule()->getDataLayout().getTypeStoreSize(LI->getType());

    AAMDNodes AAInfo;
    LI->getAAMetadata(AAInfo);

    bool Invalidated =
        pointerInvalidatedByLoop(LI->getOperand(0), Size, AAInfo, CurAST);
    // Check loop-invariant address because this may also be a sinkable load
    // whose address is not necessarily loop-invariant.
    if (ORE && Invalidated && CurLoop->isLoopInvariant(LI->getPointerOperand()))
      ORE->emit(OptimizationRemarkMissed(
                    DEBUG_TYPE, "LoadWithLoopInvariantAddressInvalidated", LI)
                << "failed to move load with loop-invariant address "
                   "because the loop may invalidate its value");

    return !Invalidated;
  } else if (CallInst *CI = dyn_cast<CallInst>(&I)) {
    // Don't sink or hoist dbg info; it's legal, but not useful.
    if (isa<DbgInfoIntrinsic>(I))
      return false;

    // Don't sink calls which can throw.
    if (CI->mayThrow())
      return false;

    // Handle simple cases by querying alias analysis.
    FunctionModRefBehavior Behavior = AA->getModRefBehavior(CI);
    if (Behavior == FMRB_DoesNotAccessMemory)
      return true;
    if (AliasAnalysis::onlyReadsMemory(Behavior)) {
      // A readonly argmemonly function only reads from memory pointed to by
      // its arguments with arbitrary offsets. If we can prove there are no
      // writes to this memory in the loop, we can hoist or sink.
      if (AliasAnalysis::onlyAccessesArgPointees(Behavior)) {
        for (Value *Op : CI->arg_operands())
          if (Op->getType()->isPointerTy() &&
              pointerInvalidatedByLoop(Op, MemoryLocation::UnknownSize,
                                       AAMDNodes(), CurAST))
            return false;
        return true;
      }
      // If this call only reads from memory and there are no writes to memory
      // in the loop, we can hoist or sink the call as appropriate.
      bool FoundMod = false;
      for (AliasSet &AS : *CurAST) {
        if (!AS.isForwardingAliasSet() && AS.isMod()) {
          FoundMod = true;
          break;
        }
      }
      if (!FoundMod)
        return true;
    }

    // FIXME: This should use mod/ref information to see if we can hoist or
    // sink the call.

    return false;
  }

  // Only these instructions are hoistable/sinkable.
  if (!isa<BinaryOperator>(I) && !isa<CastInst>(I) && !isa<SelectInst>(I) &&
      !isa<GetElementPtrInst>(I) && !isa<CmpInst>(I) &&
      !isa<InsertElementInst>(I) && !isa<ExtractElementInst>(I) &&
      !isa<ShuffleVectorInst>(I) && !isa<ExtractValueInst>(I) &&
      !isa<InsertValueInst>(I))
    return false;

  // SafetyInfo is nullptr if we are checking for sinking from preheader to
  // loop body. It will always be safe as there is no speculative execution.
  if (!SafetyInfo)
    return true;

  // TODO: Plumb the context instruction through to make hoisting and sinking
  // more powerful. Hoisting of loads already works due to the special casing
  // above.
  return isSafeToExecuteUnconditionally(I, DT, CurLoop, SafetyInfo, nullptr);
}

/// Returns true if a PHINode is trivially replaceable with an
/// Instruction.
/// This is true when all incoming values are that instruction.
/// This pattern occurs most often with LCSSA PHI nodes.
///
static bool isTriviallyReplacablePHI(const PHINode &PN, const Instruction &I) {
  for (const Value *IncValue : PN.incoming_values())
    if (IncValue != &I)
      return false;

  return true;
}

/// Return true if the only users of this instruction are outside of
/// the loop. If this is true, we can sink the instruction to the exit
/// blocks of the loop.
///
static bool isNotUsedInLoop(const Instruction &I, const Loop *CurLoop,
                            const LoopSafetyInfo *SafetyInfo) {
  const auto &BlockColors = SafetyInfo->BlockColors;
  for (const User *U : I.users()) {
    const Instruction *UI = cast<Instruction>(U);
    if (const PHINode *PN = dyn_cast<PHINode>(UI)) {
      const BasicBlock *BB = PN->getParent();
      // We cannot sink uses in catchswitches.
      if (isa<CatchSwitchInst>(BB->getTerminator()))
        return false;

      // We need to sink a callsite to a unique funclet. Avoid sinking if the
      // phi use is too muddled.
      if (isa<CallInst>(I))
        if (!BlockColors.empty() &&
            BlockColors.find(const_cast<BasicBlock *>(BB))->second.size() != 1)
          return false;

      // A PHI node where all of the incoming values are this instruction is
      // special -- it can just be RAUW'ed with the instruction and thus
      // doesn't require a use in the predecessor. This is a particularly
      // important special case because it is the pattern found in LCSSA form.
      if (isTriviallyReplacablePHI(*PN, I)) {
        if (CurLoop->contains(PN))
          return false;
        else
          continue;
      }

      // Otherwise, PHI node uses occur in the predecessor blocks corresponding
      // to the incoming values. Check for such a use being inside the loop.
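      // For illustration (hypothetical IR, not from this file): given an
      // exit-block PHI such as
      //   %v.lcssa = phi i32 [ %v, %loop.latch ], [ %other, %bb ]
      // the use of %v is treated as occurring in %loop.latch, so if that
      // predecessor is inside the loop, the instruction is still used there.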
      for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i)
        if (PN->getIncomingValue(i) == &I)
          if (CurLoop->contains(PN->getIncomingBlock(i)))
            return false;

      continue;
    }

    if (CurLoop->contains(UI))
      return false;
  }
  return true;
}

static Instruction *
CloneInstructionInExitBlock(Instruction &I, BasicBlock &ExitBlock, PHINode &PN,
                            const LoopInfo *LI,
                            const LoopSafetyInfo *SafetyInfo) {
  Instruction *New;
  if (auto *CI = dyn_cast<CallInst>(&I)) {
    const auto &BlockColors = SafetyInfo->BlockColors;

    // Sinking call-sites need to be handled differently from other
    // instructions. The cloned call-site needs a funclet bundle operand
    // appropriate for its location in the CFG.
    SmallVector<OperandBundleDef, 1> OpBundles;
    for (unsigned BundleIdx = 0, BundleEnd = CI->getNumOperandBundles();
         BundleIdx != BundleEnd; ++BundleIdx) {
      OperandBundleUse Bundle = CI->getOperandBundleAt(BundleIdx);
      if (Bundle.getTagID() == LLVMContext::OB_funclet)
        continue;

      OpBundles.emplace_back(Bundle);
    }

    if (!BlockColors.empty()) {
      const ColorVector &CV = BlockColors.find(&ExitBlock)->second;
      assert(CV.size() == 1 && "non-unique color for exit block!");
      BasicBlock *BBColor = CV.front();
      Instruction *EHPad = BBColor->getFirstNonPHI();
      if (EHPad->isEHPad())
        OpBundles.emplace_back("funclet", EHPad);
    }

    New = CallInst::Create(CI, OpBundles);
  } else {
    New = I.clone();
  }

  ExitBlock.getInstList().insert(ExitBlock.getFirstInsertionPt(), New);
  if (!I.getName().empty())
    New->setName(I.getName() + ".le");

  // Build LCSSA PHI nodes for any in-loop operands. Note that this is
  // particularly cheap because we can rip off the PHI node that we're
  // replacing for the number and blocks of the predecessors.
  // OPT: If this shows up in a profile, we can instead finish sinking all
  // invariant instructions, and then walk their operands to re-establish
  // LCSSA. That will eliminate creating PHI nodes just to nuke them when
  // sinking bottom-up.
  for (User::op_iterator OI = New->op_begin(), OE = New->op_end(); OI != OE;
       ++OI)
    if (Instruction *OInst = dyn_cast<Instruction>(*OI))
      if (Loop *OLoop = LI->getLoopFor(OInst->getParent()))
        if (!OLoop->contains(&PN)) {
          PHINode *OpPN =
              PHINode::Create(OInst->getType(), PN.getNumIncomingValues(),
                              OInst->getName() + ".lcssa", &ExitBlock.front());
          for (unsigned i = 0, e = PN.getNumIncomingValues(); i != e; ++i)
            OpPN->addIncoming(OInst, PN.getIncomingBlock(i));
          *OI = OpPN;
        }
  return New;
}

/// When an instruction is found to only be used outside of the loop, this
/// function moves it to the exit blocks and patches up SSA form as needed.
/// This method is guaranteed to remove the original instruction from its
/// position, and may either delete it or move it to outside of the loop.
///
static bool sink(Instruction &I, const LoopInfo *LI, const DominatorTree *DT,
                 const Loop *CurLoop, AliasSetTracker *CurAST,
                 const LoopSafetyInfo *SafetyInfo,
                 OptimizationRemarkEmitter *ORE) {
  DEBUG(dbgs() << "LICM sinking instruction: " << I << "\n");
  ORE->emit(OptimizationRemark(DEBUG_TYPE, "InstSunk", &I)
            << "sinking " << ore::NV("Inst", &I));
  bool Changed = false;
  if (isa<LoadInst>(I))
    ++NumMovedLoads;
  else if (isa<CallInst>(I))
    ++NumMovedCalls;
  ++NumSunk;
  Changed = true;

#ifndef NDEBUG
  SmallVector<BasicBlock *, 8> ExitBlocks;
  CurLoop->getUniqueExitBlocks(ExitBlocks);
  SmallPtrSet<BasicBlock *, 8> ExitBlockSet(ExitBlocks.begin(),
                                            ExitBlocks.end());
#endif

  // Clones of this instruction. Don't create more than one per exit block!
  SmallDenseMap<BasicBlock *, Instruction *, 32> SunkCopies;

  // If this instruction is only used outside of the loop, then all users are
  // PHI nodes in exit blocks due to LCSSA form. Just RAUW them with clones of
  // the instruction.
  while (!I.use_empty()) {
    Value::user_iterator UI = I.user_begin();
    auto *User = cast<Instruction>(*UI);
    if (!DT->isReachableFromEntry(User->getParent())) {
      User->replaceUsesOfWith(&I, UndefValue::get(I.getType()));
      continue;
    }
    // The user must be a PHI node.
    PHINode *PN = cast<PHINode>(User);

    // Surprisingly, instructions can be used outside of loops without any
    // exits. This can only happen in PHI nodes if the incoming block is
    // unreachable.
    Use &U = UI.getUse();
    BasicBlock *BB = PN->getIncomingBlock(U);
    if (!DT->isReachableFromEntry(BB)) {
      U = UndefValue::get(I.getType());
      continue;
    }

    BasicBlock *ExitBlock = PN->getParent();
    assert(ExitBlockSet.count(ExitBlock) &&
           "The LCSSA PHI is not in an exit block!");

    Instruction *New;
    auto It = SunkCopies.find(ExitBlock);
    if (It != SunkCopies.end())
      New = It->second;
    else
      New = SunkCopies[ExitBlock] =
          CloneInstructionInExitBlock(I, *ExitBlock, *PN, LI, SafetyInfo);

    PN->replaceAllUsesWith(New);
    PN->eraseFromParent();
  }

  CurAST->deleteValue(&I);
  I.eraseFromParent();
  return Changed;
}

/// When an instruction is found to only use loop invariant operands that
/// are safe to hoist, this function is called to do the dirty work.
///
static bool hoist(Instruction &I, const DominatorTree *DT, const Loop *CurLoop,
                  const LoopSafetyInfo *SafetyInfo,
                  OptimizationRemarkEmitter *ORE) {
  auto *Preheader = CurLoop->getLoopPreheader();
  DEBUG(dbgs() << "LICM hoisting to " << Preheader->getName() << ": " << I
               << "\n");
  ORE->emit(OptimizationRemark(DEBUG_TYPE, "Hoisted", &I)
            << "hoisting " << ore::NV("Inst", &I));

  // Metadata can be dependent on conditions we are hoisting above.
  // Conservatively strip all metadata on the instruction unless we were
  // guaranteed to execute I if we entered the loop, in which case the metadata
  // is valid in the loop preheader.
  if (I.hasMetadataOtherThanDebugLoc() &&
      // The check on hasMetadataOtherThanDebugLoc is to prevent us from burning
      // time in isGuaranteedToExecute if we don't actually have anything to
      // drop. It is a compile time optimization, not required for correctness.
      !isGuaranteedToExecute(I, DT, CurLoop, SafetyInfo))
    I.dropUnknownNonDebugMetadata();

  // Move the new node to the Preheader, before its terminator.
  I.moveBefore(Preheader->getTerminator());

  // Do not retain debug locations when we are moving instructions to different
  // basic blocks, because we want to avoid jumpy line tables. Calls, however,
  // need to retain their debug locs because they may be inlined.
  // FIXME: How do we retain source locations without causing poor debugging
  // behavior?
  if (!isa<CallInst>(I))
    I.setDebugLoc(DebugLoc());

  if (isa<LoadInst>(I))
    ++NumMovedLoads;
  else if (isa<CallInst>(I))
    ++NumMovedCalls;
  ++NumHoisted;
  return true;
}

/// Only sink or hoist an instruction if it is not a trapping instruction,
/// if it is known not to trap when moved to the preheader,
/// or if it is a trapping instruction and is guaranteed to execute.
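/// For example (an illustrative assumption, not from this file): an sdiv
/// whose divisor is not known to be non-zero may trap, so it can be moved
/// to the preheader only when it is guaranteed to execute whenever the loop
/// is entered.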
static bool isSafeToExecuteUnconditionally(Instruction &Inst,
                                           const DominatorTree *DT,
                                           const Loop *CurLoop,
                                           const LoopSafetyInfo *SafetyInfo,
                                           OptimizationRemarkEmitter *ORE,
                                           const Instruction *CtxI) {
  if (isSafeToSpeculativelyExecute(&Inst, CtxI, DT))
    return true;

  bool GuaranteedToExecute =
      isGuaranteedToExecute(Inst, DT, CurLoop, SafetyInfo);

  if (!GuaranteedToExecute) {
    auto *LI = dyn_cast<LoadInst>(&Inst);
    if (LI && CurLoop->isLoopInvariant(LI->getPointerOperand()))
      ORE->emit(OptimizationRemarkMissed(
                    DEBUG_TYPE, "LoadWithLoopInvariantAddressCondExecuted", LI)
                << "failed to hoist load with loop-invariant address "
                   "because load is conditionally executed");
  }

  return GuaranteedToExecute;
}

namespace {
class LoopPromoter : public LoadAndStorePromoter {
  Value *SomePtr; // Designated pointer to store to.
  SmallPtrSetImpl<Value *> &PointerMustAliases;
  SmallVectorImpl<BasicBlock *> &LoopExitBlocks;
  SmallVectorImpl<Instruction *> &LoopInsertPts;
  PredIteratorCache &PredCache;
  AliasSetTracker &AST;
  LoopInfo &LI;
  DebugLoc DL;
  int Alignment;
  AAMDNodes AATags;

  Value *maybeInsertLCSSAPHI(Value *V, BasicBlock *BB) const {
    if (Instruction *I = dyn_cast<Instruction>(V))
      if (Loop *L = LI.getLoopFor(I->getParent()))
        if (!L->contains(BB)) {
          // We need to create an LCSSA PHI node for the incoming value and
          // store that.
          PHINode *PN = PHINode::Create(I->getType(), PredCache.size(BB),
                                        I->getName() + ".lcssa", &BB->front());
          for (BasicBlock *Pred : PredCache.get(BB))
            PN->addIncoming(I, Pred);
          return PN;
        }
    return V;
  }

public:
  LoopPromoter(Value *SP, ArrayRef<const Instruction *> Insts, SSAUpdater &S,
               SmallPtrSetImpl<Value *> &PMA,
               SmallVectorImpl<BasicBlock *> &LEB,
               SmallVectorImpl<Instruction *> &LIP, PredIteratorCache &PIC,
               AliasSetTracker &ast, LoopInfo &li, DebugLoc dl, int alignment,
               const AAMDNodes &AATags)
      : LoadAndStorePromoter(Insts, S), SomePtr(SP), PointerMustAliases(PMA),
        LoopExitBlocks(LEB), LoopInsertPts(LIP), PredCache(PIC), AST(ast),
        LI(li), DL(std::move(dl)), Alignment(alignment), AATags(AATags) {}

  bool isInstInList(Instruction *I,
                    const SmallVectorImpl<Instruction *> &) const override {
    Value *Ptr;
    if (LoadInst *LI = dyn_cast<LoadInst>(I))
      Ptr = LI->getOperand(0);
    else
      Ptr = cast<StoreInst>(I)->getPointerOperand();
    return PointerMustAliases.count(Ptr);
  }

  void doExtraRewritesBeforeFinalDeletion() const override {
    // Insert stores in the loop exit blocks. Each exit block gets a store of
    // the live-out value that feeds it. Since we've already told the SSA
    // updater about the defs in the loop and the preheader definition, it is
    // all set and we can start using it.
    for (unsigned i = 0, e = LoopExitBlocks.size(); i != e; ++i) {
      BasicBlock *ExitBlock = LoopExitBlocks[i];
      Value *LiveInValue = SSA.GetValueInMiddleOfBlock(ExitBlock);
      LiveInValue = maybeInsertLCSSAPHI(LiveInValue, ExitBlock);
      Value *Ptr = maybeInsertLCSSAPHI(SomePtr, ExitBlock);
      Instruction *InsertPos = LoopInsertPts[i];
      StoreInst *NewSI = new StoreInst(LiveInValue, Ptr, InsertPos);
      NewSI->setAlignment(Alignment);
      NewSI->setDebugLoc(DL);
      if (AATags)
        NewSI->setAAMetadata(AATags);
    }
  }

  void replaceLoadWithValue(LoadInst *LI, Value *V) const override {
    // Update alias analysis.
    AST.copyValue(LI, V);
  }
  void instructionDeleted(Instruction *I) const override {
    AST.deleteValue(I);
  }
};
} // end anon namespace

/// Try to promote memory values to scalars by sinking stores out of the
/// loop and moving loads to before the loop. We do this by looping over
/// the stores in the loop, looking for stores to Must pointers which are
/// loop invariant.
///
bool llvm::promoteLoopAccessesToScalars(
    AliasSet &AS, SmallVectorImpl<BasicBlock *> &ExitBlocks,
    SmallVectorImpl<Instruction *> &InsertPts, PredIteratorCache &PIC,
    LoopInfo *LI, DominatorTree *DT, const TargetLibraryInfo *TLI,
    Loop *CurLoop, AliasSetTracker *CurAST, LoopSafetyInfo *SafetyInfo,
    OptimizationRemarkEmitter *ORE) {
  // Verify inputs.
  assert(LI != nullptr && DT != nullptr && CurLoop != nullptr &&
         CurAST != nullptr && SafetyInfo != nullptr &&
         "Unexpected Input to promoteLoopAccessesToScalars");

  // We can promote this alias set if it has a store, if it is a "Must" alias
  // set, if the pointer is loop invariant, and if we are not eliminating any
  // volatile loads or stores.
  if (AS.isForwardingAliasSet() || !AS.isMod() || !AS.isMustAlias() ||
      AS.isVolatile() || !CurLoop->isLoopInvariant(AS.begin()->getValue()))
    return false;

  assert(!AS.empty() &&
         "Must alias set should have at least one pointer element in it!");

  Value *SomePtr = AS.begin()->getValue();
  BasicBlock *Preheader = CurLoop->getLoopPreheader();

  // It isn't safe to promote a load/store from the loop if the load/store is
  // conditional. For example, turning:
  //
  //    for () { if (c) *P += 1; }
  //
  // into:
  //
  //    tmp = *P;  for () { if (c) tmp += 1; } *P = tmp;
  //
  // is not safe, because *P may only be valid to access if 'c' is true.
  //
  // The safety property divides into two parts:
  // p1) The memory may not be dereferenceable on entry to the loop. In this
  //     case, we can't insert the required load in the preheader.
  // p2) The memory model does not allow us to insert a store along any dynamic
  //     path which did not originally have one.
  //
  // If at least one store is guaranteed to execute, both properties are
  // satisfied, and promotion is legal.
  //
  // This, however, is not a necessary condition. Even if no store/load is
  // guaranteed to execute, we can still establish these properties.
  // We can establish (p1) by proving that hoisting the load into the preheader
  // is safe (i.e. proving dereferenceability on all paths through the loop). We
  // can use any access within the alias set to prove dereferenceability,
  // since they're all must alias.
  //
  // There are two ways to establish (p2):
  // a) Prove the location is thread-local. In this case the memory model
  //    requirement does not apply, and stores are safe to insert.
  // b) Prove a store dominates every exit block. In this case, if an exit
  //    block is reached, the original dynamic path would have taken us through
  //    the store, so inserting a store into the exit block is safe. Note that
  //    this is different from the store being guaranteed to execute. For
  //    instance, if an exception is thrown on the first iteration of the loop,
  //    the original store is never executed, but the exit blocks are not
  //    executed either.

  bool DereferenceableInPH = false;
  bool SafeToInsertStore = false;

  SmallVector<Instruction *, 64> LoopUses;
  SmallPtrSet<Value *, 8> PointerMustAliases;

  // We start with an alignment of one and try to find instructions that allow
  // us to prove better alignment.
  unsigned Alignment = 1;
  AAMDNodes AATags;

  const DataLayout &MDL = Preheader->getModule()->getDataLayout();

  if (SafetyInfo->MayThrow) {
    // If a loop can throw, we have to insert a store along each unwind edge.
    // That said, we can't actually make the unwind edge explicit. Therefore,
    // we have to prove that the store is dead along the unwind edge.
    //
    // Currently, this code just special-cases alloca instructions.
    if (!isa<AllocaInst>(GetUnderlyingObject(SomePtr, MDL)))
      return false;
  }

  // Check that all of the pointers in the alias set have the same type. We
  // cannot (yet) promote a memory location that is loaded and stored in
  // different sizes. While we are at it, collect alignment and AA info.
  for (const auto &ASI : AS) {
    Value *ASIV = ASI.getValue();
    PointerMustAliases.insert(ASIV);

    // Check that all of the pointers in the alias set have the same type. We
    // cannot (yet) promote a memory location that is loaded and stored in
    // different sizes.
    if (SomePtr->getType() != ASIV->getType())
      return false;

    for (User *U : ASIV->users()) {
      // Ignore instructions that are outside the loop.
      Instruction *UI = dyn_cast<Instruction>(U);
      if (!UI || !CurLoop->contains(UI))
        continue;

      // If there is a non-load/store instruction in the loop, we can't promote
      // it.
      if (LoadInst *Load = dyn_cast<LoadInst>(UI)) {
        assert(!Load->isVolatile() && "AST broken");
        if (!Load->isSimple())
          return false;

        if (!DereferenceableInPH)
          DereferenceableInPH = isSafeToExecuteUnconditionally(
              *Load, DT, CurLoop, SafetyInfo, ORE, Preheader->getTerminator());
      } else if (const StoreInst *Store = dyn_cast<StoreInst>(UI)) {
        // Stores *of* the pointer are not interesting, only stores *to* the
        // pointer.
        if (UI->getOperand(1) != ASIV)
          continue;
        assert(!Store->isVolatile() && "AST broken");
        if (!Store->isSimple())
          return false;

        // If the store is guaranteed to execute, both properties are satisfied.
        // We may want to check if a store is guaranteed to execute even if we
        // already know that promotion is safe, since it may have higher
        // alignment than any other guaranteed stores, in which case we can
        // raise the alignment on the promoted store.
        unsigned InstAlignment = Store->getAlignment();
        if (!InstAlignment)
          InstAlignment =
              MDL.getABITypeAlignment(Store->getValueOperand()->getType());

        if (!DereferenceableInPH || !SafeToInsertStore ||
            (InstAlignment > Alignment)) {
          if (isGuaranteedToExecute(*UI, DT, CurLoop, SafetyInfo)) {
            DereferenceableInPH = true;
            SafeToInsertStore = true;
            Alignment = std::max(Alignment, InstAlignment);
          }
        }

        // If a store dominates all exit blocks, it is safe to sink.
        // As explained above, if an exit block was executed, a dominating
        // store must have been executed at least once, so we are not
        // introducing stores on paths that did not have them.
        // Note that this only looks at explicit exit blocks. If we ever
        // start sinking stores into unwind edges (see above), this will break.
        if (!SafeToInsertStore)
          SafeToInsertStore = llvm::all_of(ExitBlocks, [&](BasicBlock *Exit) {
            return DT->dominates(Store->getParent(), Exit);
          });

        // If the store is not guaranteed to execute, we may still get
        // deref info through it.
        if (!DereferenceableInPH) {
          DereferenceableInPH = isDereferenceableAndAlignedPointer(
              Store->getPointerOperand(), Store->getAlignment(), MDL,
              Preheader->getTerminator(), DT);
        }
      } else
        return false; // Not a load or store.

      // Merge the AA tags.
      if (LoopUses.empty()) {
        // On the first load/store, just take its AA tags.
        UI->getAAMetadata(AATags);
      } else if (AATags) {
        UI->getAAMetadata(AATags, /* Merge = */ true);
      }

      LoopUses.push_back(UI);
    }
  }

  // If we couldn't prove we can hoist the load, bail.
  if (!DereferenceableInPH)
    return false;

  // We know we can hoist the load, but don't have a guaranteed store.
  // Check whether the location is thread-local. If it is, then we can insert
  // stores along paths which originally didn't have them without violating the
  // memory model.
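  // For example (an illustrative assumption, not from this file): if the
  // underlying object is a non-escaping heap allocation, no other thread can
  // observe the location, so introducing a store on a path that originally
  // had none cannot create a data race.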
  if (!SafeToInsertStore) {
    Value *Object = GetUnderlyingObject(SomePtr, MDL);
    SafeToInsertStore =
        (isAllocLikeFn(Object, TLI) || isa<AllocaInst>(Object)) &&
        !PointerMayBeCaptured(Object, true, true);
  }

  // If we've still failed to prove we can sink the store, give up.
  if (!SafeToInsertStore)
    return false;

  // Otherwise, this is safe to promote, let's do it!
  DEBUG(dbgs() << "LICM: Promoting value stored to in loop: " << *SomePtr
               << '\n');
  ORE->emit(
      OptimizationRemark(DEBUG_TYPE, "PromoteLoopAccessesToScalar", LoopUses[0])
      << "Moving accesses to memory location out of the loop");
  ++NumPromoted;

  // Grab a debug location for the inserted loads/stores; given that the
  // inserted loads/stores have little relation to the original loads/stores,
  // this code just arbitrarily picks a location from one, since any debug
  // location is better than none.
  DebugLoc DL = LoopUses[0]->getDebugLoc();

  // We use the SSAUpdater interface to insert phi nodes as required.
  SmallVector<PHINode *, 16> NewPHIs;
  SSAUpdater SSA(&NewPHIs);
  LoopPromoter Promoter(SomePtr, LoopUses, SSA, PointerMustAliases, ExitBlocks,
                        InsertPts, PIC, *CurAST, *LI, DL, Alignment, AATags);

  // Set up the preheader to have a definition of the value. It is the live-out
  // value from the preheader that uses in the loop will use.
  LoadInst *PreheaderLoad = new LoadInst(
      SomePtr, SomePtr->getName() + ".promoted", Preheader->getTerminator());
  PreheaderLoad->setAlignment(Alignment);
  PreheaderLoad->setDebugLoc(DL);
  if (AATags)
    PreheaderLoad->setAAMetadata(AATags);
  SSA.AddAvailableValue(Preheader, PreheaderLoad);

  // Rewrite all the loads in the loop and remember all the definitions from
  // stores in the loop.
  Promoter.run(LoopUses);

  // If the SSAUpdater didn't use the load in the preheader, just zap it now.
  if (PreheaderLoad->use_empty())
    PreheaderLoad->eraseFromParent();

  return true;
}

/// Returns an owning pointer to an alias set which incorporates aliasing info
/// from L and all subloops of L.
/// FIXME: In new pass manager, there is no helper function to handle loop
/// analysis such as cloneBasicBlockAnalysis, so the AST needs to be recomputed
/// from scratch for every loop. Hook up with the helper functions when
/// available in the new pass manager to avoid redundant computation.
AliasSetTracker *
LoopInvariantCodeMotion::collectAliasInfoForLoop(Loop *L, LoopInfo *LI,
                                                 AliasAnalysis *AA) {
  AliasSetTracker *CurAST = nullptr;
  SmallVector<Loop *, 4> RecomputeLoops;
  for (Loop *InnerL : L->getSubLoops()) {
    auto MapI = LoopToAliasSetMap.find(InnerL);
    // If the AST for this inner loop is missing it may have been merged into
    // some other loop's AST and then that loop unrolled, and so we need to
    // recompute it.
    if (MapI == LoopToAliasSetMap.end()) {
      RecomputeLoops.push_back(InnerL);
      continue;
    }
    AliasSetTracker *InnerAST = MapI->second;

    if (CurAST != nullptr) {
      // What if InnerLoop was modified by other passes?
      CurAST->add(*InnerAST);

      // Once we've incorporated the inner loop's AST into ours, we don't need
      // the subloop's anymore.
      delete InnerAST;
    } else {
      CurAST = InnerAST;
    }
    LoopToAliasSetMap.erase(MapI);
  }
  if (CurAST == nullptr)
    CurAST = new AliasSetTracker(*AA);

  auto mergeLoop = [&](Loop *L) {
    // Loop over the body of this loop, looking for calls, invokes, and stores.
-    // Because subloops have already been incorporated into AST, we skip blocks
-    // in subloops.
    for (BasicBlock *BB : L->blocks())
-      if (LI->getLoopFor(BB) == L) // Ignore blocks in subloops.
CurAST->add(*BB); // Incorporate the specified basic block }; // Add everything from the sub loops that are no longer directly available. for (Loop *InnerL : RecomputeLoops) mergeLoop(InnerL); // And merge in this loop. mergeLoop(L); return CurAST; } /// Simple analysis hook. Clone alias set info. /// void LegacyLICMPass::cloneBasicBlockAnalysis(BasicBlock *From, BasicBlock *To, Loop *L) { AliasSetTracker *AST = LICM.getLoopToAliasSetMap().lookup(L); if (!AST) return; AST->copyValue(From, To); } /// Simple Analysis hook. Delete value V from alias set /// void LegacyLICMPass::deleteAnalysisValue(Value *V, Loop *L) { AliasSetTracker *AST = LICM.getLoopToAliasSetMap().lookup(L); if (!AST) return; AST->deleteValue(V); } /// Simple Analysis hook. Delete value L from alias set map. /// void LegacyLICMPass::deleteAnalysisLoop(Loop *L) { AliasSetTracker *AST = LICM.getLoopToAliasSetMap().lookup(L); if (!AST) return; delete AST; LICM.getLoopToAliasSetMap().erase(L); } /// Return true if the body of this loop may store into the memory /// location pointed to by V. /// static bool pointerInvalidatedByLoop(Value *V, uint64_t Size, const AAMDNodes &AAInfo, AliasSetTracker *CurAST) { // Check to see if any of the basic blocks in CurLoop invalidate *V. return CurAST->getAliasSetForPointer(V, Size, AAInfo).isMod(); } /// Little predicate that returns true if the specified basic block is in /// a subloop of the current one, not the current one itself. /// static bool inSubLoop(BasicBlock *BB, Loop *CurLoop, LoopInfo *LI) { assert(CurLoop->contains(BB) && "Only valid if BB is IN the loop"); return LI->getLoopFor(BB) != CurLoop; } Index: vendor/llvm/dist/test/CodeGen/AArch64/ldst-opt.mir =================================================================== --- vendor/llvm/dist/test/CodeGen/AArch64/ldst-opt.mir (revision 314158) +++ vendor/llvm/dist/test/CodeGen/AArch64/ldst-opt.mir (revision 314159) @@ -1,132 +1,146 @@ # RUN: llc -mtriple=aarch64--linux-gnu -run-pass=aarch64-ldst-opt %s -verify-machineinstrs -o - 2>&1 | FileCheck %s ---- | - define void @promote-load-from-store() { ret void } - define void @store-pair() { ret void } - define void @store-pair-clearkill0() { ret void } - define void @store-pair-clearkill1() { ret void } -... --- name: promote-load-from-store tracksRegLiveness: true body: | bb.0: liveins: %w1, %x0, %lr STRWui killed %w1, %x0, 0 :: (store 4) CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 %w0 = LDRHHui killed %x0, 1 :: (load 2) RET %lr, implicit %w0 ... # Don't count transient instructions towards search limits. # CHECK-LABEL: name: promote-load-from-store # CHECK: STRWui %w1 # CHECK: UBFMWri %w1 --- name: store-pair tracksRegLiveness: true body: | bb.0: liveins: %w1, %x0, %lr STRWui %w1, %x0, 0 :: (store 4) CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 CFI_INSTRUCTION 0 STRWui killed %w1, killed %x0, 1 :: (store 4) RET %lr ... 
# CHECK-LABEL: name: store-pair
# CHECK: STPWi
---
name: store-pair-clearkill0
tracksRegLiveness: true
body: |
  bb.0:
    liveins: %w1, %x0, %lr

    STRWui %w1, %x0, 0 :: (store 4)
    %w2 = COPY %w1
    %x3 = COPY %x0
    STRWui killed %w1, killed %x0, 1 :: (store 4)
    RET %lr
...
# When merging a lower store with an upper one, we must clear kill flags on
# the lower store.
# CHECK-LABEL: store-pair-clearkill0
# CHECK-NOT: STPWi %w1, killed %w1, %x0, 0 :: (store 4)
# CHECK: STPWi %w1, %w1, %x0, 0 :: (store 4)
# CHECK: %w2 = COPY %w1
# CHECK: RET %lr
---
name: store-pair-clearkill1
tracksRegLiveness: true
body: |
  bb.0:
    liveins: %x0, %lr

    %w1 = MOVi32imm 13
    %w2 = MOVi32imm 7
    STRWui %w1, %x0, 1 :: (store 4)
    %w2 = COPY killed %w1
    STRWui killed %w2, %x0, 0 :: (store 4)

    %w1 = MOVi32imm 42
    %w2 = MOVi32imm 7
    STRWui %w1, %x0, 0 :: (store 4)
    %w2 = COPY killed %w1
    STRWui killed %w2, killed %x0, 1 :: (store 4)

    RET %lr
...
# When merging an upper store with a lower one, kill flags along the way need
# to be removed; in this case, the kill flag on %w1.
# CHECK-LABEL: store-pair-clearkill1
# CHECK: %w1 = MOVi32imm
# CHECK: %w2 = MOVi32imm
# CHECK-NOT: %w2 = COPY killed %w1
# CHECK: %w2 = COPY %w1
# CHECK: STPWi killed %w2, %w1, %x0, 0
# CHECK: %w1 = MOVi32imm
# CHECK: %w2 = MOVi32imm
# CHECK-NOT: %w2 = COPY killed %w1
# CHECK: %w2 = COPY %w1
# CHECK: STPWi %w1, killed %w2, killed %x0, 0
+---
+name: store-load-clearkill
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: %w1
+
+    STRWui %w1, %sp, 0 :: (store 4)
+    %wzr = COPY killed %w1 ; killing use of %w1
+    %w11 = LDRWui %sp, 0 :: (load 4)
+    HINT 0, implicit %w11 ; some use of %w11
+...
+# When replacing the load of a store-load pair with a copy, the kill flags
+# along the way need to be cleared.
+# CHECK-LABEL: name: store-load-clearkill
+# CHECK: STRWui %w1, %sp, 0 :: (store 4)
+# CHECK-NOT: COPY killed %w1
+# CHECK: %wzr = COPY %w1
+# CHECK: %w11 = ORRWrs %wzr, %w1, 0
+# CHECK: HINT 0, implicit %w11
Index: vendor/llvm/dist/test/CodeGen/ARM/aeabi-read-tp.ll
===================================================================
--- vendor/llvm/dist/test/CodeGen/ARM/aeabi-read-tp.ll (nonexistent)
+++ vendor/llvm/dist/test/CodeGen/ARM/aeabi-read-tp.ll (revision 314159)
@@ -0,0 +1,26 @@
+; RUN: llc -mtriple armv7---eabi -filetype asm -o - %s | FileCheck %s -check-prefix CHECK -check-prefix CHECK-SHORT
+; RUN: llc -mtriple thumbv7---eabi -filetype asm -o - %s | FileCheck %s -check-prefix CHECK -check-prefix CHECK-SHORT
+; RUN: llc -mtriple armv7---eabi -mattr=+long-calls -filetype asm -o - %s | FileCheck %s -check-prefix CHECK -check-prefix CHECK-LONG
+; RUN: llc -mtriple thumbv7---eabi -mattr=+long-calls -filetype asm -o - %s | FileCheck %s -check-prefix CHECK -check-prefix CHECK-LONG
+
+@i = thread_local local_unnamed_addr global i32 0, align 4
+
+define i32 @f() local_unnamed_addr {
+entry:
+  %0 = load i32, i32* @i, align 4
+  ret i32 %0
+}
+
+; CHECK-LABEL: f:
+; CHECK-SHORT: ldr r1, [[VAR:.LCPI[0-9]+_[0-9]+]]
+; CHECK-SHORT-NEXT: bl __aeabi_read_tp
+; CHECK-SHORT: [[VAR]]:
+; CHECK-SHORT-NEXT: .long i(TPOFF)
+
+; CHECK-LONG: ldr [[REG:r[0-9]+]], [[FUN:.LCPI[0-9]+_[0-9]+]]
+; CHECK-LONG-NEXT: ldr r1, [[VAR:.LCPI[0-9]+_[0-9]+]]
+; CHECK-LONG-NEXT: blx [[REG]]
+; CHECK-LONG: [[VAR]]:
+; CHECK-LONG-NEXT: .long i(TPOFF)
+; CHECK-LONG: [[FUN]]:
+; CHECK-LONG-NEXT: .long __aeabi_read_tp
Index: vendor/llvm/dist/test/DebugInfo/X86/FrameIndexExprs.ll
===================================================================
--- vendor/llvm/dist/test/DebugInfo/X86/FrameIndexExprs.ll (nonexistent)
+++ vendor/llvm/dist/test/DebugInfo/X86/FrameIndexExprs.ll (revision 314159)
@@ -0,0 +1,85 @@
+; PR31381: An assertion in the DWARF backend when fragments in MMI slots are
+; sorted by largest offset first.
+; RUN: llc -mtriple=x86_64-apple-darwin -filetype=obj < %s | llvm-dwarfdump -debug-dump=info - | FileCheck %s
+; CHECK: DW_TAG_formal_parameter
+; CHECK: DW_TAG_formal_parameter
+; CHECK-NEXT: DW_AT_location [DW_FORM_exprloc] (<0xa> 91 78 93 03 93 06 91 7d 93 03 )
+;   fbreg -8, piece 0x00000003, piece 0x00000006, fbreg -3, piece 0x00000003
+; CHECK-NEXT: DW_AT_abstract_origin {{.*}}"p"
+source_filename = "bugpoint-reduced-simplified.ll"
+target triple = "x86_64-apple-darwin"
+
+@f = common local_unnamed_addr global i32 0, align 4, !dbg !0
+@h = common local_unnamed_addr global i32 0, align 4, !dbg !6
+
+; Function Attrs: nounwind readnone
+declare void @llvm.dbg.declare(metadata, metadata, metadata) #0
+
+define void @fn4() local_unnamed_addr !dbg !12 {
+entry:
+  %l1.sroa.7.i = alloca [3 x i8], align 1
+  tail call void @llvm.dbg.declare(metadata [3 x i8]* %l1.sroa.7.i, metadata !15, metadata !26), !dbg !27
+  %i.sroa.4.i = alloca [3 x i8], align 8
+  tail call void @llvm.dbg.declare(metadata [3 x i8]* %i.sroa.4.i, metadata !15, metadata !32), !dbg !27
+  %0 = load i32, i32* @h, align 4
+  br label %while.body.i.i, !dbg !33
+
+while.body.i.i:                                   ; preds = %while.body.i.i, %entry
+  br label %while.body.i.i, !dbg !34
+
+fn3.exit:                                         ; No predecessors!
+  %1 = load i32, i32* @f, align 4
+  %tobool.i = icmp eq i32 %1, 0
+  br label %while.body.i
+
+while.body.i:                                     ; preds = %if.end.i, %fn3.exit
+  br i1 %tobool.i, label %if.end.i, label %if.then.i
+
+if.then.i:                                        ; preds = %while.body.i
+  br label %if.end.i
+
+if.end.i:                                         ; preds = %if.then.i, %while.body.i
+  br label %while.body.i
+}
+
+attributes #0 = { nounwind readnone }
+
+!llvm.dbg.cu = !{!2}
+!llvm.module.flags = !{!9, !10, !11}
+
+!0 = !DIGlobalVariableExpression(var: !1)
+!1 = distinct !DIGlobalVariable(name: "f", scope: !2, file: !3, line: 8, type: !8, isLocal: false, isDefinition: true)
+!2 = distinct !DICompileUnit(language: DW_LANG_C99, file: !3, isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !4, globals: !5)
+!3 = !DIFile(filename: "PR31381.c", directory: "/")
+!4 = !{}
+!5 = !{!0, !6}
+!6 = !DIGlobalVariableExpression(var: !7)
+!7 = distinct !DIGlobalVariable(name: "h", scope: !2, file: !3, line: 8, type: !8, isLocal: false, isDefinition: true)
+!8 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!9 = !{i32 2, !"Dwarf Version", i32 4}
+!10 = !{i32 2, !"Debug Info Version", i32 3}
+!11 = !{i32 1, !"PIC Level", i32 2}
+!12 = distinct !DISubprogram(name: "fn4", scope: !3, file: !3, line: 31, type: !13, isLocal: false, isDefinition: true, scopeLine: 32, isOptimized: true, unit: !2, variables: !4)
+!13 = !DISubroutineType(types: !14)
+!14 = !{null}
+!15 = !DILocalVariable(name: "p", arg: 1, scope: !16, file: !3, line: 19, type: !19)
+!16 = distinct !DISubprogram(name: "fn2", scope: !3, file: !3, line: 19, type: !17, isLocal: false, isDefinition: true, scopeLine: 20, flags: DIFlagPrototyped, isOptimized: true, unit: !2, variables: !25)
+!17 = !DISubroutineType(types: !18)
+!18 = !{null, !19}
+!19 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "S", file: !3, line: 1, size: 96, elements: !20)
+!20 = !{!21, !23, !24}
+!21 = !DIDerivedType(tag: DW_TAG_member, name: "b", scope: !19, file: !3, line: 4, baseType: !22, size: 8, offset: 24)
+!22 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char)
+!23 = !DIDerivedType(tag: DW_TAG_member, name: "c", scope: !19, file: !3, line: 5, baseType: !8, size: 32, offset: 32)
+!24 = !DIDerivedType(tag: DW_TAG_member, name: "d", scope: !19, file: !3, line: 6, baseType: !8, size: 6, offset: 64, flags: DIFlagBitField, extraData: i64 64)
+!25 = !{!15}
+!26 = !DIExpression(DW_OP_LLVM_fragment, 72, 24)
+!27 = !DILocation(line: 19, column: 20, scope: !16, inlinedAt: !28)
+!28 = distinct !DILocation(line: 27, column: 3, scope: !29, inlinedAt: !30)
+!29 = distinct !DISubprogram(name: "fn3", scope: !3, file: !3, line: 24, type: !13, isLocal: false, isDefinition: true, scopeLine: 25, isOptimized: true, unit: !2, variables: !4)
+!30 = distinct !DILocation(line: 34, column: 7, scope: !31)
+!31 = distinct !DILexicalBlock(scope: !12, file: !3, line: 33, column: 5)
+!32 = !DIExpression(DW_OP_LLVM_fragment, 0, 24)
+!33 = !DILocation(line: 22, column: 9, scope: !16, inlinedAt: !28)
+!34 = !DILocation(line: 21, column: 3, scope: !35, inlinedAt: !28)
+!35 = !DILexicalBlockFile(scope: !16, file: !3, discriminator: 2)
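+
+; Note on the expected location: the two dbg.declare fragments describe bytes
+; 0-2 (!32) and bytes 9-11 (!26) of the 12-byte struct parameter "p". Sorted
+; by ascending offset, they lower to the expression in the CHECK line above:
+; fbreg -8, piece 0x3 for the low fragment, piece 0x6 for the undescribed
+; middle bytes, then fbreg -3, piece 0x3 for the high fragment.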