
LLVM Alias Analysis Infrastructure

  1. Introduction
  2. AliasAnalysis Class Overview
  3. Writing a new AliasAnalysis Implementation
  4. Using alias analysis results
  5. Existing alias analysis implementations and clients
  6. Memory Dependence Analysis

Written by Chris Lattner

Introduction

Alias Analysis (aka Pointer Analysis) is a class of techniques which attempt to determine whether or not two pointers ever can point to the same object in memory. There are many different algorithms for alias analysis and many different ways of classifying them: flow-sensitive vs flow-insensitive, context-sensitive vs context-insensitive, field-sensitive vs field-insensitive, unification-based vs subset-based, etc. Traditionally, alias analyses respond to a query with a Must, May, or No alias response, indicating that two pointers always point to the same object, might point to the same object, or are known to never point to the same object.

The LLVM AliasAnalysis class is the primary interface used by clients and implementations of alias analyses in the LLVM system. This class is the common interface between clients of alias analysis information and the implementations providing it, and is designed to support a wide range of implementations and clients (but currently all clients are assumed to be flow-insensitive). In addition to simple alias analysis information, this class exposes Mod/Ref information from those implementations which can provide it, allowing for powerful analyses and transformations to work well together.

This document contains information necessary to successfully implement this interface, use it, and to test both sides. It also explains some of the finer points about what exactly results mean. If you feel that something is unclear or should be added, please let me know.

AliasAnalysis Class Overview

The AliasAnalysis class defines the interface that the various alias analysis implementations should support. This class exports two important enums: AliasResult and ModRefResult which represent the result of an alias query or a mod/ref query, respectively.

The AliasAnalysis interface exposes information about memory, represented in several different ways. In particular, memory objects are represented as a starting address and size, and function calls are represented as the actual call or invoke instructions that perform the call. The AliasAnalysis interface also exposes some helper methods which allow you to get mod/ref information for arbitrary instructions.

All AliasAnalysis interfaces require that in queries involving multiple values, values which are not constants are all defined within the same function.

Representation of Pointers

Most importantly, the AliasAnalysis class provides several methods which are used to query whether or not two memory objects alias, whether function calls can modify or read a memory object, etc. For all of these queries, memory objects are represented as a pair of their starting address (a symbolic LLVM Value*) and a static size.

Representing memory objects as a starting address and a size is critically important for correct Alias Analyses. For example, consider this (silly, but possible) C code:

 int i;
 char C[2];
 char A[10]; 
 /* ... */
 for (i = 0; i != 10; ++i) {
   C[0] = A[i];          /* One byte store */
   C[1] = A[9-i];        /* One byte store */
 }
 

In this case, the basicaa pass will disambiguate the stores to C[0] and C[1] because they are accesses to two distinct locations one byte apart, and the accesses are each one byte. The LICM pass can then use store motion to remove the stores from the loop. In contrast, the following code:

 int i;
 char C[2];
 char A[10]; 
 /* ... */
 for (i = 0; i != 10; ++i) {
   ((short*)C)[0] = A[i];  /* Two byte store! */
   C[1] = A[9-i];          /* One byte store */
 }
 

In this case, the two stores to C do alias each other, because the access to the &C[0] element is a two byte access. If size information wasn't available in the query, even the first case would have to conservatively assume that the accesses alias.

The alias method

The alias method is the primary interface used to determine whether or not two memory objects alias each other. It takes two memory objects as input and returns MustAlias, PartialAlias, MayAlias, or NoAlias as appropriate.

Like all AliasAnalysis interfaces, the alias method requires that either the two pointer values be defined within the same function, or at least one of the values is a constant.
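
For illustration, a client might issue an alias query and act on the result roughly as follows. This is a sketch only: the helper name is made up, and the exact signature of alias (and whether sizes are unsigned or uint64_t) has varied between LLVM versions.

 // Sketch of a client-side alias query; the helper and its surrounding pass
 // boilerplate are hypothetical.
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Instructions.h"
 using namespace llvm;

 static bool storesCannotInterfere(AliasAnalysis &AA,
                                   StoreInst *S1, StoreInst *S2) {
   uint64_t Size1 = AA.getTypeStoreSize(S1->getValueOperand()->getType());
   uint64_t Size2 = AA.getTypeStoreSize(S2->getValueOperand()->getType());

   switch (AA.alias(S1->getPointerOperand(), Size1,
                    S2->getPointerOperand(), Size2)) {
   case AliasAnalysis::NoAlias:
     return true;   // The two stores access disjoint memory.
   default:
     return false;  // MustAlias, PartialAlias, or MayAlias: be conservative.
   }
 }
 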

Must, May, and No Alias Responses

The NoAlias response may be used when there is never an immediate dependence between any memory reference based on one pointer and any memory reference based the other. The most obvious example is when the two pointers point to non-overlapping memory ranges. Another is when the two pointers are only ever used for reading memory. Another is when the memory is freed and reallocated between accesses through one pointer and accesses through the other -- in this case, there is a dependence, but it's mediated by the free and reallocation.

An exception to this is the noalias keyword; its "irrelevant" dependencies are ignored.

The MayAlias response is used whenever the two pointers might refer to the same object.

The PartialAlias response is used when the two memory objects are known to be overlapping in some way, but do not start at the same address.

The MustAlias response may only be returned if the two memory objects are guaranteed to always start at exactly the same location. A MustAlias response implies that the pointers compare equal.

The getModRefInfo methods

The getModRefInfo methods return information about whether the execution of an instruction can read or modify a memory location. Mod/Ref information is always conservative: if an instruction might read or write a location, ModRef is returned.

The AliasAnalysis class also provides a getModRefInfo method for testing dependencies between function calls. This method takes two call sites (CS1 and CS2) and returns NoModRef if neither call writes to memory read or written by the other, Ref if CS1 reads memory written by CS2, Mod if CS1 writes to memory read or written by CS2, or ModRef if CS1 might read or write memory written to by CS2. Note that this relation is not commutative.
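
As a rough example, a client wondering whether a call might clobber the memory read by a particular load could ask getModRefInfo as sketched below. The helper name is invented, and the instruction/pointer/size overload shown here is an assumption that may not match every LLVM version.

 // Sketch: may this call write the location loaded by LI?
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Instructions.h"
 using namespace llvm;

 static bool callMayClobberLoad(AliasAnalysis &AA, CallInst *Call, LoadInst *LI) {
   uint64_t Size = AA.getTypeStoreSize(LI->getType());
   AliasAnalysis::ModRefResult MR =
       AA.getModRefInfo(Call, LI->getPointerOperand(), Size);
   // Mod or ModRef means the call may modify the loaded location.
   return (MR & AliasAnalysis::Mod) != 0;
 }
 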

Other useful AliasAnalysis methods

Several other tidbits of information are often collected by various alias analysis implementations and can be put to good use by various clients.

The pointsToConstantMemory method

The pointsToConstantMemory method returns true if and only if the analysis can prove that the pointer only points to unchanging memory locations (functions, constant global variables, and the null pointer). This information can be used to refine mod/ref information: it is impossible for an unchanging memory location to be modified.

The doesNotAccessMemory and onlyReadsMemory methods

These methods are used to provide very simple mod/ref information for function calls. The doesNotAccessMemory method returns true for a function if the analysis can prove that the function never reads or writes to memory, or if the function only reads from constant memory. Functions with this property are side-effect free and only depend on their input arguments, allowing them to be eliminated if they form common subexpressions or be hoisted out of loops. Many common functions behave this way (e.g., sin and cos) but many others do not (e.g., acos, which modifies the errno variable).

The onlyReadsMemory method returns true for a function if the analysis can prove that (at most) the function only reads from non-volatile memory. Functions with this property are side-effect free, only depending on their input arguments and the state of memory when they are called. This property allows calls to these functions to be eliminated and moved around, as long as there is no store instruction that changes the contents of memory. Note that all functions that satisfy the doesNotAccessMemory method also satisfy onlyReadsMemory.
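
As an illustrative use of these predicates, a client classifying the side effects of a call before deciding whether it is safe to CSE or hoist might do something like the following sketch (the enum and helper are hypothetical):

 // Sketch: classify a call using the simple mod/ref predicates.
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Function.h"
 #include "llvm/Instructions.h"
 using namespace llvm;

 enum CallEffects { NoMemoryAccess, ReadsOnly, MayWrite };

 static CallEffects classifyCall(AliasAnalysis &AA, CallInst *Call) {
   if (Function *F = Call->getCalledFunction()) {
     if (AA.doesNotAccessMemory(F))
       return NoMemoryAccess;  // e.g. sin/cos: freely CSE-able and hoistable.
     if (AA.onlyReadsMemory(F))
       return ReadsOnly;       // Movable as long as no intervening store may interfere.
   }
   return MayWrite;            // Indirect call or unknown effects: be conservative.
 }
 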

Writing a new AliasAnalysis Implementation

Writing a new alias analysis implementation for LLVM is quite straightforward. There are already several implementations that you can use for examples, and the following information should help fill in any details. For examples, take a look at the various alias analysis implementations included with LLVM.

Different Pass styles

The first step is to determine what type of LLVM pass you need to use for your Alias Analysis. As is the case with most other analyses and transformations, the answer should be fairly obvious from what type of problem you are trying to solve:

  1. If you require interprocedural analysis, it should be a Pass.
  2. If you are a function-local analysis, subclass FunctionPass.
  3. If you don't need to look at the program at all, subclass ImmutablePass.

In addition to the pass that you subclass, you should also inherit from the AliasAnalysis interface, of course, and use the RegisterAnalysisGroup template to register as an implementation of AliasAnalysis.
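
Putting together the inheritance, the required initialization calls described in the next section, and the registration, a minimal skeleton might look like the sketch below. This is assumption-laden: the registration style shown (RegisterPass/RegisterAnalysisGroup) and the exact virtual signatures have changed across LLVM versions, and the pass name "my-aa" is made up.

 // Hypothetical skeleton of an AliasAnalysis implementation.
 #include "llvm/Pass.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 using namespace llvm;

 namespace {
   struct MyAA : public ImmutablePass, public AliasAnalysis {
     static char ID;
     MyAA() : ImmutablePass(ID) {}

     virtual void initializePass() {
       InitializeAliasAnalysis(this);        // required, see below
     }

     virtual void getAnalysisUsage(AnalysisUsage &AU) const {
       AliasAnalysis::getAnalysisUsage(AU);  // required, see below
     }

     // Needed because this pass multiply inherits from Pass and AliasAnalysis.
     virtual void *getAdjustedAnalysisPointer(AnalysisID PI) {
       if (PI == &AliasAnalysis::ID)
         return (AliasAnalysis*)this;
       return this;
     }

     virtual AliasResult alias(const Value *V1, unsigned V1Size,
                               const Value *V2, unsigned V2Size) {
       // ... analysis-specific reasoning goes here ...
       return AliasAnalysis::alias(V1, V1Size, V2, V2Size);  // chain to the next AA
     }
   };
 }

 char MyAA::ID = 0;
 static RegisterPass<MyAA> P("my-aa", "My alias analysis", false, true);
 static RegisterAnalysisGroup<AliasAnalysis> G(P);
 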

Required initialization calls

Your subclass of AliasAnalysis is required to invoke two methods on the AliasAnalysis base class: getAnalysisUsage and InitializeAliasAnalysis. In particular, your implementation of getAnalysisUsage should explicitly call into the AliasAnalysis::getAnalysisUsage method in addition to declaring any pass dependencies your pass has. Thus you should have something like this:

 void getAnalysisUsage(AnalysisUsage &AU) const {
   AliasAnalysis::getAnalysisUsage(AU);
   // declare your dependencies here.
 }
 

Additionally, you must invoke the InitializeAliasAnalysis method from your analysis run method (run for a Pass, runOnFunction for a FunctionPass, or InitializePass for an ImmutablePass). For example (as part of a Pass):

 bool run(Module &M) {
   InitializeAliasAnalysis(this);
   // Perform analysis here...
   return false;
 }
 

Interfaces which may be specified

All of the AliasAnalysis virtual methods default to providing chaining to another alias analysis implementation, which ends up returning conservatively correct information (returning "May" Alias and "Mod/Ref" for alias and mod/ref queries respectively). Depending on the capabilities of the analysis you are implementing, you just override the interfaces you can improve.

AliasAnalysis chaining behavior

With only two special exceptions (the basicaa and no-aa passes) every alias analysis pass chains to another alias analysis implementation (for example, the user can specify "-basicaa -ds-aa -licm" to get the maximum benefit from both alias analyses). The alias analysis class automatically takes care of most of this for methods that you don't override. For methods that you do override, in code paths that return a conservative MayAlias or Mod/Ref result, simply return whatever the superclass computes. For example:

 AliasAnalysis::AliasResult alias(const Value *V1, unsigned V1Size,
                                  const Value *V2, unsigned V2Size) {
   if (...)
     return NoAlias;
   ...
 
   // Couldn't determine a must or no-alias result.
   return AliasAnalysis::alias(V1, V1Size, V2, V2Size);
 }
 

In addition to analysis queries, you must make sure to unconditionally pass LLVM update notification methods to the superclass as well if you override them, which allows all alias analyses in a chain to be updated.

Updating analysis results for transformations

Alias analysis information is initially computed for a static snapshot of the program, but clients will use this information to make transformations to the code. All but the most trivial forms of alias analysis will need to have their analysis results updated to reflect the changes made by these transformations.

The AliasAnalysis interface exposes four methods which are used to communicate program changes from the clients to the analysis implementations. Various alias analysis implementations should use these methods to ensure that their internal data structures are kept up-to-date as the program changes (for example, when an instruction is deleted), and clients of alias analysis must be sure to call these interfaces appropriately.

The deleteValue method

The deleteValue method is called by transformations when they remove an instruction or any other value from the program (including values that do not use pointers). Typically alias analyses keep data structures that have entries for each value in the program. When this method is called, they should remove any entries for the specified value, if they exist.

The copyValue method

The copyValue method is used when a new value is introduced into the program. There is no way to introduce a value into the program that did not exist before (this doesn't make sense for a safe compiler transformation), so this is the only way to introduce a new value. This method indicates that the new value has exactly the same properties as the value being copied.

The replaceWithNewValue method

This method is a simple helper method that is provided to make clients easier to use. It is implemented by copying the old analysis information to the new value, then deleting the old value. This method cannot be overridden by alias analysis implementations.
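
A short sketch of a client transformation using these hooks when it rewrites an instruction is given below; the helper is hypothetical.

 // Sketch: keep alias analysis consistent while replacing an instruction
 // with an equivalent new one.
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Instruction.h"
 using namespace llvm;

 static void replaceInstKeepingAA(AliasAnalysis &AA,
                                  Instruction *Old, Instruction *New) {
   // replaceWithNewValue copies the old value's analysis info to the new
   // value and then deletes the old value's entry.
   AA.replaceWithNewValue(Old, New);

   Old->replaceAllUsesWith(New);
   Old->eraseFromParent();
 }
 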

The addEscapingUse method

The addEscapingUse method is used when the uses of a pointer value have changed in ways that may invalidate precomputed analysis information. Implementations may either use this callback to provide conservative responses for pointers whose uses have changed since analysis time, or may recompute some or all of their internal state to continue providing accurate responses.

In general, any new use of a pointer value is considered an escaping use, and must be reported through this callback, except for the uses below:

  • A bitcast or getelementptr of the pointer
  • A store through the pointer (but not a store of the pointer)
  • A load through the pointer

Efficiency Issues

From the LLVM perspective, the only thing you need to do to provide an efficient alias analysis is to make sure that alias analysis queries are serviced quickly. The actual calculation of the alias analysis results (the "run" method) is only performed once, but many (perhaps duplicate) queries may be performed. Because of this, try to move as much computation to the run method as possible (within reason).

Limitations

The AliasAnalysis infrastructure has several limitations which make writing a new AliasAnalysis implementation difficult.

There is no way to override the default alias analysis. It would be very useful to be able to do something like "opt -my-aa -O2" and have it use -my-aa for all passes which need AliasAnalysis, but there is currently no support for that, short of changing the source code and recompiling. Similarly, there is also no way of setting a chain of analyses as the default.

There is no way for transform passes to declare that they preserve AliasAnalysis implementations. The AliasAnalysis interface includes deleteValue and copyValue methods which are intended to allow a pass to keep an AliasAnalysis consistent, however there's no way for a pass to declare in its getAnalysisUsage that it does so. Some passes attempt to use AU.addPreserved<AliasAnalysis>, however this doesn't actually have any effect.

AliasAnalysisCounter (-count-aa) and AliasDebugger (-debug-aa) are implemented as ModulePass classes, so if your alias analysis uses FunctionPass, it won't be able to use these utilities. If you try to use them, the pass manager will silently route alias analysis queries directly to BasicAliasAnalysis instead.

Similarly, the opt -p option introduces ModulePass passes between each pass, which prevents the use of FunctionPass alias analysis passes.

The AliasAnalysis API does have functions for notifying implementations when values are deleted or copied, however these aren't sufficient. There are many other ways that LLVM IR can be modified which could be relevant to AliasAnalysis implementations but which cannot be expressed through these interfaces.

The AliasAnalysisDebugger utility seems to suggest that AliasAnalysis implementations can expect that they will be informed of any relevant Value before it appears in an alias query. However, popular clients such as GVN don't support this, and are known to trigger errors when run with the AliasAnalysisDebugger.

Due to several of the above limitations, the most obvious use for the AliasAnalysisCounter utility, collecting stats on all alias queries in a compilation, doesn't work, even if the AliasAnalysis implementations don't use FunctionPass. There's no way to set a default, much less a default sequence, and there's no way to preserve it.

The AliasSetTracker class (which is used by LICM) makes a non-deterministic number of alias queries. This can cause stats collected by AliasAnalysisCounter to have fluctuations among identical runs, for example. Another consequence is that debugging techniques involving pausing execution after a predetermined number of queries can be unreliable.

Many alias queries can be reformulated in terms of other alias queries. When multiple AliasAnalysis queries are chained together, it would make sense to start those queries from the beginning of the chain, with care taken to avoid infinite looping, however currently an implementation which wants to do this can only start such queries from itself.

Using alias analysis results

There are several different ways to use alias analysis results. In order of preference, these are...

Using the MemoryDependenceAnalysis Pass

The memdep pass uses alias analysis to provide high-level dependence information about memory-using instructions. This will tell you which store feeds into a load, for example. It uses caching and other techniques to be efficient, and is used by Dead Store Elimination, GVN, and memcpy optimizations.

Using the AliasSetTracker class

Many transformations need information about alias sets that are active in some scope, rather than information about pairwise aliasing. The AliasSetTracker class is used to efficiently build these Alias Sets from the pairwise alias analysis information provided by the AliasAnalysis interface.

First you initialize the AliasSetTracker by using the "add" methods to add information about various potentially aliasing instructions in the scope you are interested in. Once all of the alias sets are completed, your pass should simply iterate through the constructed alias sets, using the AliasSetTracker begin()/end() methods.

The AliasSets formed by the AliasSetTracker are guaranteed to be disjoint, calculate mod/ref information and volatility for the set, and keep track of whether or not all of the pointers in the set are Must aliases. The AliasSetTracker also makes sure that sets are properly folded due to call instructions, and can provide a list of pointers in each set.
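
The add-then-iterate pattern described above might look roughly like the following sketch; the helper is hypothetical, and the particular add overloads and AliasSet accessors may vary by version.

 // Sketch: build alias sets for a whole function and inspect them.
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/AliasSetTracker.h"
 #include "llvm/Function.h"
 using namespace llvm;

 static void inspectAliasSets(AliasAnalysis &AA, Function &F) {
   AliasSetTracker Tracker(AA);

   // Add every instruction in the scope of interest (here, the whole function).
   for (Function::iterator BB = F.begin(), E = F.end(); BB != E; ++BB)
     Tracker.add(*BB);

   // Iterate over the disjoint alias sets that were formed.
   for (AliasSetTracker::iterator I = Tracker.begin(), E = Tracker.end();
        I != E; ++I) {
     AliasSet &AS = *I;
     if (AS.isMustAlias() && !AS.isMod()) {
       // Every pointer in this set refers to the same location, and nothing
       // in the added scope modifies it.
     }
   }
 }
 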

As an example user of this, the Loop Invariant Code Motion pass uses AliasSetTrackers to calculate alias sets for each loop nest. If an AliasSet in a loop is not modified, then all load instructions from that set may be hoisted out of the loop. If any alias sets are stored to and are must alias sets, then the stores may be sunk to outside of the loop, promoting the memory location to a register for the duration of the loop nest. Both of these transformations only apply if the pointer argument is loop-invariant.

The AliasSetTracker implementation

The AliasSetTracker class is implemented to be as efficient as possible. It uses the union-find algorithm to efficiently merge AliasSets when a pointer is inserted into the AliasSetTracker that aliases multiple sets. The primary data structure is a hash table mapping pointers to the AliasSet they are in.

The AliasSetTracker class must maintain a list of all of the LLVM Value*'s that are in each AliasSet. Since the hash table already has entries for each LLVM Value* of interest, the AliasSets thread the linked list through these hash-table nodes to avoid having to allocate memory unnecessarily, and to make merging alias sets extremely efficient (the linked list merge is constant time).

You shouldn't need to understand these details if you are just a client of the AliasSetTracker, but if you look at the code, hopefully this brief description will help make sense of why things are designed the way they are.

Using the AliasAnalysis interface directly

If neither of these utility classes is what your pass needs, you should use the interfaces exposed by the AliasAnalysis class directly. Try to use the higher-level methods when possible (e.g., use mod/ref information instead of the alias method directly if possible) to get the best precision and efficiency.

Existing alias analysis implementations and clients

If you're going to be working with the LLVM alias analysis infrastructure, you should know what clients and implementations of alias analysis are available. In particular, if you are implementing an alias analysis, you should be aware of the clients that are useful for monitoring and evaluating different implementations.

Available AliasAnalysis implementations

This section lists the various implementations of the AliasAnalysis interface. With the exception of the -no-aa implementation, all of these chain to other alias analysis implementations.

The -no-aa pass

The -no-aa pass is just like what it sounds: an alias analysis that never returns any useful information. This pass can be useful if you think that alias analysis is doing something wrong and are trying to narrow down a problem.

The -basicaa pass

The -basicaa pass is an aggressive local analysis that "knows" many important facts:

  • Distinct globals, stack allocations, and heap allocations can never alias.
  • Globals, stack allocations, and heap allocations never alias the null pointer.
  • Different fields of a structure do not alias.
  • Indexes into arrays with statically differing subscripts cannot alias.
  • Many common standard C library functions never access memory or only read memory.
  • Pointers that obviously point to constant globals satisfy pointsToConstantMemory.
  • Function calls cannot modify or reference stack allocations if they never escape from the function that allocates them (a common case for automatic arrays).

The -globalsmodref-aa pass

This pass implements a simple context-sensitive mod/ref and alias analysis for internal global variables that don't "have their address taken". If a global does not have its address taken, the pass knows that no pointers alias the global. This pass also keeps track of functions that it knows never access memory or never read memory. This allows certain optimizations (e.g. GVN) to eliminate call instructions entirely.

The real power of this pass is that it provides context-sensitive mod/ref information for call instructions. This allows the optimizer to know that calls to a function do not clobber or read the value of the global, allowing loads and stores to be eliminated.

Note that this pass is somewhat limited in its scope (it only supports non-address-taken globals), but it is a very quick analysis.

The -steens-aa pass

The -steens-aa pass implements a variation on the well-known "Steensgaard's algorithm" for interprocedural alias analysis. Steensgaard's algorithm is a unification-based, flow-insensitive, context-insensitive, and field-insensitive alias analysis that is also very scalable (effectively linear time).

The LLVM -steens-aa pass implements a "speculatively field-sensitive" version of Steensgaard's algorithm using the Data Structure Analysis framework. This gives it substantially more precision than the standard algorithm while maintaining excellent analysis scalability.

Note that -steens-aa is available in the optional "poolalloc" module, it is not part of the LLVM core.

The -ds-aa pass

The -ds-aa pass implements the full Data Structure Analysis algorithm. Data Structure Analysis is a modular unification-based, flow-insensitive, context-sensitive, and speculatively field-sensitive alias analysis that is also quite scalable, usually at O(n*log(n)).

This algorithm is capable of responding to a full variety of alias analysis queries, and can provide context-sensitive mod/ref information as well. The only major facility not implemented so far is support for must-alias information.

Note that -ds-aa is available in the optional "poolalloc" module, it is not part of the LLVM core.

The -scev-aa pass

The -scev-aa pass implements AliasAnalysis queries by translating them into ScalarEvolution queries. This gives it a more complete understanding of getelementptr instructions and loop induction variables than other alias analyses have.

Alias analysis driven transformations

LLVM includes several alias-analysis driven transformations which can be used with any of the implementations above.

The -adce pass

The -adce pass, which implements Aggressive Dead Code Elimination, uses the AliasAnalysis interface to delete calls to functions that do not have side-effects and are not used.

The -licm pass

The -licm pass implements various Loop Invariant Code Motion related transformations. It uses the AliasAnalysis interface for several different transformations:

  • It uses mod/ref information to hoist or sink load instructions out of loops if there are no instructions in the loop that modify the memory loaded.
  • It uses mod/ref information to hoist function calls out of loops that do not write to memory and are loop-invariant.
  • It uses alias information to promote memory objects that are loaded and stored to in loops to live in a register instead. It can do this if there are no may aliases to the loaded/stored memory location.

The -argpromotion pass

The -argpromotion pass promotes by-reference arguments to be passed by value instead. In particular, if a pointer argument is only loaded from, it passes the loaded value to the function instead of the address. This pass uses alias information to make sure that the value loaded from the argument pointer is not modified between the entry of the function and any load of the pointer.

The -gvn, -memcpyopt, and -dse passes

These passes use AliasAnalysis information to reason about loads and stores.

Clients for debugging and evaluation of implementations

These passes are useful for evaluating the various alias analysis implementations. You can use them with commands like 'opt -ds-aa -aa-eval foo.bc -disable-output -stats'.

The -print-alias-sets pass

The -print-alias-sets pass is exposed as part of the opt tool to print out the Alias Sets formed by the AliasSetTracker class. This is useful if you're using the AliasSetTracker class. To use it, use something like:

 % opt -ds-aa -print-alias-sets -disable-output
 

The -count-aa pass

The -count-aa pass is useful to see how many queries a particular pass is making and what responses are returned by the alias analysis. As an example,

 % opt -basicaa -count-aa -ds-aa -count-aa -licm
 

will print out how many queries (and what responses are returned) the -licm pass makes of the -ds-aa pass, and how many queries the -ds-aa pass makes of the -basicaa pass. This can be useful when debugging a transformation or an alias analysis implementation.

The -aa-eval pass

The -aa-eval pass simply iterates through all pairs of pointers in a function and asks an alias analysis whether or not the pointers alias. This gives an indication of the precision of the alias analysis. Statistics are printed indicating the percent of no/may/must aliases found (a more precise algorithm will have a lower number of may aliases).

Memory Dependence Analysis

If you're just looking to be a client of alias analysis information, consider using the Memory Dependence Analysis interface instead. MemDep is a lazy, caching layer on top of alias analysis that is able to answer the question of what preceding memory operations a given instruction depends on, either at an intra- or inter-block level. Because of its laziness and caching policy, using MemDep can be a significant performance win over accessing alias analysis directly.
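
A simple MemDep query might look like the sketch below: given a load, ask what it locally depends on. The helper is hypothetical, and the interface shown reflects roughly the vintage of this document.

 // Sketch: find the in-block memory dependence of a load using MemDep.
 #include "llvm/Analysis/MemoryDependenceAnalysis.h"
 #include "llvm/Instructions.h"
 using namespace llvm;

 static StoreInst *findLocalDefiningStore(MemoryDependenceAnalysis &MDA,
                                          LoadInst *LI) {
   MemDepResult Dep = MDA.getDependency(LI);
   if (Dep.isDef())
     // The load is directly defined by this instruction (e.g. a prior store
     // to the same location in the same block).
     return dyn_cast<StoreInst>(Dep.getInst());
   // isClobber() and non-local results would need further handling.
   return 0;
 }
 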



LLVM Branch Weight Metadata

  1. Introduction
  2. Supported Instructions
  3. Built-in "expect" Instruction
  4. CFG Modifications

Written by Jakub Staszak

Introduction

Branch Weight Metadata represents branch weights as the likelihood of a branch being taken. Metadata is assigned to the TerminatorInst as an MDNode of the MD_prof kind. The first operand is always an MDString node with the string "branch_weights". The number of operands depends on the terminator type.

Branch weights might be fetched from the profiling file, or generated based on __builtin_expect.

All weights are represented as unsigned 32-bit values, where a higher value indicates a greater chance of the branch being taken.

Supported Instructions

BranchInst

Metadata is only assigned to conditional branches. There are two extra operands, for the true and the false branch.

 !0 = metadata !{
   metadata !"branch_weights",
   i32 <TRUE_BRANCH_WEIGHT>,
   i32 <FALSE_BRANCH_WEIGHT>
 }
   

SwitchInst

Branch weights are assigned to every case (including the default case, which is always case #0).

 !0 = metadata !{
   metadata !"branch_weights",
   i32 <DEFAULT_BRANCH_WEIGHT>
   [ , i32 <CASE_BRANCH_WEIGHT> ... ]
 }
   

IndirectBrInst

Branch weights are assigned to every destination.

 !0 = metadata !{
   metadata !"branch_weights",
   i32 <LABEL_BRANCH_WEIGHT>
   [ , i32 <LABEL_BRANCH_WEIGHT> ... ]
 }
   

Other

Other terminator instructions are not allowed to contain Branch Weight Metadata.
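
For illustration, the sketch below shows one way a pass might attach branch_weights metadata to a conditional branch from C++. It builds the MDNode by hand; the helper name is invented and the metadata construction calls reflect an assumption about the API of this era rather than the canonical recipe.

 // Sketch: attach branch_weights metadata to a conditional BranchInst.
 #include "llvm/Instructions.h"
 #include "llvm/Metadata.h"
 #include "llvm/Constants.h"
 #include "llvm/LLVMContext.h"
 using namespace llvm;

 static void setBranchWeights(BranchInst *BI,
                              unsigned TrueWeight, unsigned FalseWeight) {
   if (!BI->isConditional())
     return;                                  // only conditional branches carry weights

   LLVMContext &Ctx = BI->getContext();
   Value *Ops[] = {
     MDString::get(Ctx, "branch_weights"),    // first operand: the kind string
     ConstantInt::get(Type::getInt32Ty(Ctx), TrueWeight),
     ConstantInt::get(Type::getInt32Ty(Ctx), FalseWeight)
   };
   BI->setMetadata(LLVMContext::MD_prof, MDNode::get(Ctx, Ops));
 }
 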

Built-in "expect" Instructions

__builtin_expect(long exp, long c) provides branch prediction information. The return value is the value of exp.

It is especially useful in conditional statements. Currently Clang supports two conditional statements:

if statement

The exp parameter is the condition. The c parameter is the expected comparison value. If it is equal to 1 (true), the condition is likely to be true; otherwise, the condition is likely to be false. For example:

   if (__builtin_expect(x > 0, 1)) {
     // This block is likely to be taken.
   }
   

switch statement

The exp parameter is the value. The c parameter is the expected value. If the expected value doesn't appear in the case list, the default case is assumed to be likely taken.

   switch (__builtin_expect(x, 5)) {
   default: break;
   case 0:  // ...
   case 3:  // ...
   case 5:  // This case is likely to be taken.
   }
   

CFG Modifications

Branch Weight Metadata is not proof against CFG changes. If the terminator's operands are changed, some action should be taken. Otherwise, misoptimizations may occur due to incorrect branch prediction information.



LLVM bugpoint tool: design and usage

Written by Chris Lattner

Description

bugpoint narrows down the source of problems in LLVM tools and passes. It can be used to debug three types of failures: optimizer crashes, miscompilations by optimizers, or bad native code generation (including problems in the static and JIT compilers). It aims to reduce large test cases to small, useful ones. For example, if opt crashes while optimizing a file, it will identify the optimization (or combination of optimizations) that causes the crash, and reduce the file down to a small example which triggers the crash.

For detailed case scenarios, such as debugging opt, llvm-ld, or one of the LLVM code generators, see the How To Submit a Bug Report document.

Design Philosophy

bugpoint is designed to be a useful tool without requiring any hooks into the LLVM infrastructure at all. It works with any and all LLVM passes and code generators, and does not need to "know" how they work. Because of this, it may appear to do stupid things or miss obvious simplifications. bugpoint is also designed to trade off programmer time for computer time in the compiler-debugging process; consequently, it may take a long period of (unattended) time to reduce a test case, but we feel it is still worth it. Note that bugpoint is generally very quick unless debugging a miscompilation where each test of the program (which requires executing it) takes a long time.

Automatic Debugger Selection

bugpoint reads each .bc or .ll file specified on the command line and links them together into a single module, called the test program. If any LLVM passes are specified on the command line, it runs these passes on the test program. If any of the passes crash, or if they produce malformed output (which causes the verifier to abort), bugpoint starts the crash debugger.

Otherwise, if the -output option was not specified, bugpoint runs the test program with the C backend (which is assumed to generate good code) to generate a reference output. Once bugpoint has a reference output for the test program, it tries executing it with the selected code generator. If the selected code generator crashes, bugpoint starts the crash debugger on the code generator. Otherwise, if the resulting output differs from the reference output, it assumes the difference resulted from a code generator failure, and starts the code generator debugger.

Finally, if the output of the selected code generator matches the reference output, bugpoint runs the test program after all of the LLVM passes have been applied to it. If its output differs from the reference output, it assumes the difference resulted from a failure in one of the LLVM passes, and enters the miscompilation debugger. Otherwise, there is no problem bugpoint can debug.

Crash debugger

If an optimizer or code generator crashes, bugpoint will try as hard as it can to reduce the list of passes (for optimizer crashes) and the size of the test program. First, bugpoint figures out which combination of optimizer passes triggers the bug. This is useful when debugging a problem exposed by opt, for example, because it runs over 38 passes.

Next, bugpoint tries removing functions from the test program, to reduce its size. Usually it is able to reduce a test program to a single function, when debugging intraprocedural optimizations. Once the number of functions has been reduced, it attempts to delete various edges in the control flow graph, to reduce the size of the function as much as possible. Finally, bugpoint deletes any individual LLVM instructions whose absence does not eliminate the failure. At the end, bugpoint should tell you what passes crash, give you a bitcode file, and give you instructions on how to reproduce the failure with opt or llc.

Code generator debugger

The code generator debugger attempts to narrow down the amount of code that is being miscompiled by the selected code generator. To do this, it takes the test program and partitions it into two pieces: one piece which it compiles with the C backend (into a shared object), and one piece which it runs with either the JIT or the static LLC compiler. It uses several techniques to reduce the amount of code pushed through the LLVM code generator, to reduce the potential scope of the problem. After it is finished, it emits two bitcode files (called "test" [to be compiled with the code generator] and "safe" [to be compiled with the C backend], respectively), and instructions for reproducing the problem. The code generator debugger assumes that the C backend produces good code.

Miscompilation debugger

The miscompilation debugger works similarly to the code generator debugger. It works by splitting the test program into two pieces, running the optimizations specified on one piece, linking the two pieces back together, and then executing the result. It attempts to narrow down the list of passes to the one (or few) which are causing the miscompilation, then reduce the portion of the test program which is being miscompiled. The miscompilation debugger assumes that the selected code generator is working properly.

Advice for using bugpoint

bugpoint can be a remarkably useful tool, but it sometimes works in non-obvious ways. Here are some hints and tips:

  1. In the code generator and miscompilation debuggers, bugpoint only works with programs that have deterministic output. Thus, if the program outputs argv[0], the date, time, or any other "random" data, bugpoint may misinterpret differences in these data, when output, as the result of a miscompilation. Programs should be temporarily modified to disable outputs that are likely to vary from run to run.
  2. In the code generator and miscompilation debuggers, debugging will go faster if you manually modify the program or its inputs to reduce the runtime, but still exhibit the problem.
  3. bugpoint is extremely useful when working on a new optimization: it helps track down regressions quickly. To avoid having to relink bugpoint every time you change your optimization however, have bugpoint dynamically load your optimization with the -load option.
  4. bugpoint can generate a lot of output and run for a long period of time. It is often useful to capture the output of the program to file. For example, in the C shell, you can run:

    bugpoint ... |& tee bugpoint.log

    to get a copy of bugpoint's output in the file bugpoint.log, as well as on your terminal.

  5. bugpoint cannot debug problems with the LLVM linker. If bugpoint crashes before you see its "All input ok" message, you might try llvm-link -v on the same set of input files. If that also crashes, you may be experiencing a linker bug.
  6. bugpoint is useful for proactively finding bugs in LLVM. Invoking bugpoint with the -find-bugs option will cause the list of specified optimizations to be randomized and applied to the program. This process will repeat until a bug is found or the user kills bugpoint.


Building LLVM with CMake

Written by Oscar Fuentes

Introduction

CMake is a cross-platform build-generator tool. CMake does not build the project, it generates the files needed by your build tool (GNU make, Visual Studio, etc) for building LLVM.

If you are really anxious about getting a functional LLVM build, go to the Quick start section. If you are a CMake novice, start on Basic CMake usage and then go back to the Quick start once you know what you are doing. The Options and variables section is a reference for customizing your build. If you already have experience with CMake, this is the recommended starting point.

Quick start

Here we use the command-line, non-interactive CMake interface.

  1. Download and install CMake. Version 2.8 is the minimum required.

  2. Open a shell. Your development tools must be reachable from this shell through the PATH environment variable.

  3. Create a directory to contain the build. Building LLVM in the source directory is not supported. cd to this directory:

    mkdir mybuilddir

    cd mybuilddir

  4. Execute this command on the shell replacing path/to/llvm/source/root with the path to the root of your LLVM source tree:

    cmake path/to/llvm/source/root

    CMake will detect your development environment, perform a series of tests, and generate the files required for building LLVM. CMake will use default values for all build parameters. See the Options and variables section for fine-tuning your build.

    This can fail if CMake can't detect your toolset, or if it thinks that the environment is not sane enough. In this case, make sure that the toolset that you intend to use is the only one reachable from the shell and that the shell itself is the correct one for your development environment. CMake will refuse to build MinGW makefiles if you have a POSIX shell reachable through the PATH environment variable, for instance. You can force CMake to use a given build tool; see the Usage section.

Basic CMake usage

This section explains basic aspects of CMake, mostly for explaining those options which you may need on your day-to-day usage.

CMake comes with extensive documentation in the form of html files and on the cmake executable itself. Execute cmake --help for further help options.

CMake needs to know which build tool it shall generate files for (GNU make, Visual Studio, Xcode, etc). If not specified on the command line, it tries to guess based on your environment. Once the build tool is identified, CMake uses the corresponding Generator to create files for your build tool. You can explicitly specify the generator with the command line option -G "Name of the generator". To see the available generators on your platform, execute

cmake --help

This will list the generator names at the end of the help text. Generator names are case-sensitive. Example:

cmake -G "Visual Studio 8 2005" path/to/llvm/source/root

For a given development platform there can be more than one adequate generator. If you use Visual Studio, "NMake Makefiles" is a generator you can use for building with NMake. By default, CMake chooses the most specific generator supported by your development environment. If you want an alternative generator, you must tell this to CMake with the -G option.

TODO: explain variables and cache. Move explanation here from #options section.

Options and variables

Variables customize how the build will be generated. Options are boolean variables, with possible values ON/OFF. Options and variables are defined on the CMake command line like this:

cmake -DVARIABLE=value path/to/llvm/source

You can set a variable after the initial CMake invocation to change its value. You can also undefine a variable:

cmake -UVARIABLE path/to/llvm/source

Variables are stored on the CMake cache. This is a file named CMakeCache.txt on the root of the build directory. Do not hand-edit it.

Variables are listed here with their type appended after a colon. It is also valid to write the variable and the type on the CMake command line:

cmake -DVARIABLE:TYPE=value path/to/llvm/source

Frequently-used CMake variables

Here are listed some of the CMake variables that are used often, along with a brief explanation and LLVM-specific notes. For full documentation, check the CMake docs or execute cmake --help-variable VARIABLE_NAME.

CMAKE_BUILD_TYPE:STRING
Sets the build type for make based generators. Possible values are Release, Debug, RelWithDebInfo and MinSizeRel. On systems like Visual Studio the user sets the build type with the IDE settings.
CMAKE_INSTALL_PREFIX:PATH
Path where LLVM will be installed if "make install" is invoked or the "INSTALL" target is built.
LLVM_LIBDIR_SUFFIX:STRING
Extra suffix to append to the directory where libraries are to be installed. On a 64-bit architecture, one could use -DLLVM_LIBDIR_SUFFIX=64 to install libraries to /usr/lib64.
CMAKE_C_FLAGS:STRING
Extra flags to use when compiling C source files.
CMAKE_CXX_FLAGS:STRING
Extra flags to use when compiling C++ source files.
BUILD_SHARED_LIBS:BOOL
Flag indicating if shared libraries will be built. Its default value is OFF. Shared libraries are not supported on Windows and not recommended on other OSes.

LLVM-specific variables

LLVM_TARGETS_TO_BUILD:STRING
Semicolon-separated list of targets to build, or all for building all targets. Case-sensitive. Defaults to X86 for Visual C++; in other cases it defaults to all. Example: -DLLVM_TARGETS_TO_BUILD="X86;PowerPC;Alpha".
LLVM_BUILD_TOOLS:BOOL
Build LLVM tools. Defaults to ON. Targets for building each tool are generated in any case. You can build a tool separately by invoking its target. For example, you can build llvm-as with a makefile-based system by executing make llvm-as at the root of your build directory.
LLVM_INCLUDE_TOOLS:BOOL
Generate build targets for the LLVM tools. Defaults to ON. You can use that option for disabling the generation of build targets for the LLVM tools.
LLVM_BUILD_EXAMPLES:BOOL
Build LLVM examples. Defaults to OFF. Targets for building each example are generated in any case. See documentation for LLVM_BUILD_TOOLS above for more details.
LLVM_INCLUDE_EXAMPLES:BOOL
Generate build targets for the LLVM examples. Defaults to ON. You can use that option for disabling the generation of build targets for the LLVM examples.
LLVM_BUILD_TESTS:BOOL
Build LLVM unit tests. Defaults to OFF. Targets for building each unit test are generated in any case. You can build a specific unit test with the target UnitTestNameTests (where at this time UnitTestName can be ADT, Analysis, ExecutionEngine, JIT, Support, Transform, VMCore; see the subdirectories of unittests for an updated list.) It is possible to build all unit tests with the target UnitTests.
LLVM_INCLUDE_TESTS:BOOL
Generate build targets for the LLVM unit tests. Defaults to ON. You can use that option for disabling the generation of build targets for the LLVM unit tests.
LLVM_APPEND_VC_REV:BOOL
Append version control revision info (svn revision number or git revision id) to LLVM version string (stored in the PACKAGE_VERSION macro). For this to work cmake must be invoked before the build. Defaults to OFF.
LLVM_ENABLE_THREADS:BOOL
Build with threads support, if available. Defaults to ON.
LLVM_ENABLE_ASSERTIONS:BOOL
Enables code assertions. Defaults to OFF if and only if CMAKE_BUILD_TYPE is Release.
LLVM_ENABLE_PIC:BOOL
Add the -fPIC flag to the compiler command line, if the compiler supports this flag. Some systems, like Windows, do not need this flag. Defaults to ON.
LLVM_ENABLE_WARNINGS:BOOL
Enable all compiler warnings. Defaults to ON.
LLVM_ENABLE_PEDANTIC:BOOL
Enable pedantic mode. This disables compiler-specific extensions, if possible. Defaults to ON.
LLVM_ENABLE_WERROR:BOOL
Stop and fail the build if a compiler warning is triggered. Defaults to OFF.
LLVM_BUILD_32_BITS:BOOL
Build 32-bit executables and libraries on 64-bit systems. This option is available only on some 64-bit Unix systems. Defaults to OFF.
LLVM_TARGET_ARCH:STRING
LLVM target to use for native code generation. This is required for JIT generation. It defaults to "host", meaning that it shall pick the architecture of the machine where LLVM is being built. If you are cross-compiling, set it to the target architecture name.
LLVM_TABLEGEN:STRING
Full path to a native TableGen executable (usually named tblgen). This is intended for cross-compiling: if the user sets this variable, no native TableGen will be created.
LLVM_LIT_ARGS:STRING
Arguments given to lit. make check and make clang-test are affected. By default, "-sv --no-progress-bar" on Visual C++ and Xcode, "-sv" on others.
LLVM_LIT_TOOLS_DIR:PATH
The path to GnuWin32 tools for tests. Valid on a Windows host. Defaults to "", in which case Lit looks for tools according to %PATH%. If set, Lit looks for tools (e.g. grep, sort, etc.) in LLVM_LIT_TOOLS_DIR first, without requiring GnuWin32 to be added to %PATH%.
LLVM_ENABLE_FFI:BOOL
Indicates whether the LLVM Interpreter will be linked with the Foreign Function Interface library. If the library or its headers are installed in a custom location, you can set the variables FFI_INCLUDE_DIR and FFI_LIBRARY_DIR. Defaults to OFF.

Executing the test suite

Testing is performed when the check target is built. For instance, if you are using makefiles, execute this command at the top level of your build directory:

make check

On Visual Studio, you may run tests by building the project "check".

Cross compiling

See this wiki page for generic instructions on how to cross-compile with CMake. It goes into detailed explanations and may seem daunting, but it is not. On the wiki page there are several examples including toolchain files. Go directly to this section for a quick solution.

Also see the LLVM-specific variables section for variables used when cross-compiling.

Embedding LLVM in your project

The most difficult part of adding LLVM to the build of a project is to determine the set of LLVM libraries corresponding to the set of required LLVM features. What follows is an example of how to obtain this information:

     # A convenience variable:
     set(LLVM_ROOT "" CACHE PATH "Root of LLVM install.")
     # A bit of a sanity check:
     if( NOT EXISTS ${LLVM_ROOT}/include/llvm )
     message(FATAL_ERROR "LLVM_ROOT (${LLVM_ROOT}) is not a valid LLVM install")
     endif()
     # We incorporate the CMake features provided by LLVM:
     set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${LLVM_ROOT}/share/llvm/cmake")
     include(LLVMConfig)
     # Now set the header and library paths:
     include_directories( ${LLVM_INCLUDE_DIRS} )
     link_directories( ${LLVM_LIBRARY_DIRS} )
     add_definitions( ${LLVM_DEFINITIONS} )
     # Let's suppose we want to build a JIT compiler with support for
     # binary code (no interpreter):
     llvm_map_components_to_libraries(REQ_LLVM_LIBRARIES jit native)
     # Finally, we link the LLVM libraries to our executable:
     target_link_libraries(mycompiler ${REQ_LLVM_LIBRARIES})
     

This assumes that LLVM_ROOT points to an install of LLVM. The procedure works for uninstalled builds too, although we need to take care to add an include_directories for the location of the headers in the LLVM source directory (if we are building out-of-source).

Alternatively, you can utilize CMake's find_package functionality. Here is an equivalent variant of the snippet shown above:

     find_package(LLVM)
 
     if( NOT LLVM_FOUND )
       message(FATAL_ERROR "LLVM package can't be found. Set CMAKE_PREFIX_PATH variable to LLVM's installation prefix.")
     endif()
 
     include_directories( ${LLVM_INCLUDE_DIRS} )
     link_directories( ${LLVM_LIBRARY_DIRS} )
 
     llvm_map_components_to_libraries(REQ_LLVM_LIBRARIES jit native)
 
     target_link_libraries(mycompiler ${REQ_LLVM_LIBRARIES})
     

Developing LLVM pass out of source

It is possible to develop LLVM passes against an installed LLVM. An example project layout is provided below:

       <project dir>/
           |
           CMakeLists.txt
           <pass name>/
               |
               CMakeLists.txt
               Pass.cpp
               ...
     

Contents of <project dir>/CMakeLists.txt:

     find_package(LLVM)
 
     # Define add_llvm_* macro's.
     include(AddLLVM)
 
     add_definitions(${LLVM_DEFINITIONS})
     include_directories(${LLVM_INCLUDE_DIRS})
     link_directories(${LLVM_LIBRARY_DIRS})
 
     add_subdirectory(<pass name>)
     

Contents of <project dir>/<pass name>/CMakeLists.txt:

     add_llvm_loadable_module(LLVMPassname
       Pass.cpp
       )
     

When you are done developing your pass, you may wish to integrate it into the LLVM source tree. You can achieve this in two easy steps:
1. Copy the <pass name> folder into the <LLVM root>/lib/Transform directory.
2. Add the line "add_subdirectory(<pass name>)" to <LLVM root>/lib/Transform/CMakeLists.txt.

Compiler/Platform specific topics

Notes for specific compilers and/or platforms.

Microsoft Visual C++

LLVM_COMPILER_JOBS:STRING
Specifies the maximum number of parallel compiler jobs to use per project when building with msbuild or Visual Studio. Only supported for the Visual Studio 2008 and Visual Studio 2010 CMake generators. 0 means use all processors. Default is 0.


The LLVM Target-Independent Code Generator

  1. Introduction
  2. Target description classes
  3. The "Machine" Code Generator classes
  4. The "MC" Layer
  5. Target-independent code generation algorithms
  6. Implementing a Native Assembler
  7. Target-specific Implementation Notes

Written by the LLVM Team.

Warning: This is a work in progress.

Introduction

The LLVM target-independent code generator is a framework that provides a suite of reusable components for translating the LLVM internal representation to the machine code for a specified target—either in assembly form (suitable for a static compiler) or in binary machine code format (usable for a JIT compiler). The LLVM target-independent code generator consists of six main components:

  1. Abstract target description interfaces which capture important properties about various aspects of the machine, independently of how they will be used. These interfaces are defined in include/llvm/Target/.
  2. Classes used to represent the code being generated for a target. These classes are intended to be abstract enough to represent the machine code for any target machine. These classes are defined in include/llvm/CodeGen/. At this level, concepts like "constant pool entries" and "jump tables" are explicitly exposed.
  3. Classes and algorithms used to represent code at the object file level, the MC Layer. These classes represent assembly level constructs like labels, sections, and instructions. At this level, concepts like "constant pool entries" and "jump tables" don't exist.
  4. Target-independent algorithms used to implement various phases of native code generation (register allocation, scheduling, stack frame representation, etc). This code lives in lib/CodeGen/.
  5. Implementations of the abstract target description interfaces for particular targets. These machine descriptions make use of the components provided by LLVM, and can optionally provide custom target-specific passes, to build complete code generators for a specific target. Target descriptions live in lib/Target/.
  6. The target-independent JIT components. The LLVM JIT is completely target independent (it uses the TargetJITInfo structure to interface with target-specific issues). The code for the target-independent JIT lives in lib/ExecutionEngine/JIT.

Depending on which part of the code generator you are interested in working on, different pieces of this will be useful to you. In any case, you should be familiar with the target description and machine code representation classes. If you want to add a backend for a new target, you will need to implement the target description classes for your new target and understand the LLVM code representation. If you are interested in implementing a new code generation algorithm, it should only depend on the target-description and machine code representation classes, ensuring that it is portable.

Required components in the code generator

The two pieces of the LLVM code generator are the high-level interface to the code generator and the set of reusable components that can be used to build target-specific backends. The two most important interfaces (TargetMachine and TargetData) are the only ones that are required to be defined for a backend to fit into the LLVM system, but the others must be defined if the reusable code generator components are going to be used.

This design has two important implications. The first is that LLVM can support completely non-traditional code generation targets. For example, the C backend does not require register allocation, instruction selection, or any of the other standard components provided by the system. As such, it only implements these two interfaces, and does its own thing. Another example of a code generator like this is a (purely hypothetical) backend that converts LLVM to the GCC RTL form and uses GCC to emit machine code for a target.

This design also implies that it is possible to design and implement radically different code generators in the LLVM system that do not make use of any of the built-in components. Doing so is not recommended at all, but could be required for radically different targets that do not fit into the LLVM machine description model: FPGAs for example.

The high-level design of the code generator

The LLVM target-independent code generator is designed to support efficient and quality code generation for standard register-based microprocessors. Code generation in this model is divided into the following stages:

  1. Instruction Selection — This phase determines an efficient way to express the input LLVM code in the target instruction set. This stage produces the initial code for the program in the target instruction set, then makes use of virtual registers in SSA form and physical registers that represent any required register assignments due to target constraints or calling conventions. This step turns the LLVM code into a DAG of target instructions.
  2. Scheduling and Formation — This phase takes the DAG of target instructions produced by the instruction selection phase, determines an ordering of the instructions, then emits the instructions as MachineInstrs with that ordering. Note that we describe this in the instruction selection section because it operates on a SelectionDAG.
  3. SSA-based Machine Code Optimizations — This optional stage consists of a series of machine-code optimizations that operate on the SSA-form produced by the instruction selector. Optimizations like modulo-scheduling or peephole optimization work here.
  4. Register Allocation — The target code is transformed from an infinite virtual register file in SSA form to the concrete register file used by the target. This phase introduces spill code and eliminates all virtual register references from the program.
  5. Prolog/Epilog Code Insertion — Once the machine code has been generated for the function and the amount of stack space required is known (used for LLVM alloca's and spill slots), the prolog and epilog code for the function can be inserted and "abstract stack location references" can be eliminated. This stage is responsible for implementing optimizations like frame-pointer elimination and stack packing.
  6. Late Machine Code Optimizations — Optimizations that operate on "final" machine code can go here, such as spill code scheduling and peephole optimizations.
  7. Code Emission — The final stage actually puts out the code for the current function, either in the target assembler format or in machine code.

The code generator is based on the assumption that the instruction selector will use an optimal pattern matching selector to create high-quality sequences of native instructions. Alternative code generator designs based on pattern expansion and aggressive iterative peephole optimization are much slower. This design permits efficient compilation (important for JIT environments) and aggressive optimization (used when generating code offline) by allowing components of varying levels of sophistication to be used for any step of compilation.

In addition to these stages, target implementations can insert arbitrary target-specific passes into the flow. For example, the X86 target uses a special pass to handle the 80x87 floating point stack architecture. Other targets with unusual requirements can be supported with custom passes as needed.

Using TableGen for target description

The target description classes require a detailed description of the target architecture. These target descriptions often have a large amount of common information (e.g., an add instruction is almost identical to a sub instruction). In order to allow the maximum amount of commonality to be factored out, the LLVM code generator uses the TableGen tool to describe big chunks of the target machine, which allows the use of domain-specific and target-specific abstractions to reduce the amount of repetition.

As LLVM continues to be developed and refined, we plan to move more and more of the target description to the .td form. Doing so gives us a number of advantages. The most important is that it makes it easier to port LLVM because it reduces the amount of C++ code that has to be written, and the surface area of the code generator that needs to be understood before someone can get something working. Second, it makes it easier to change things. In particular, if tables and other things are all emitted by tblgen, we only need a change in one place (tblgen) to update all of the targets to a new interface.

Target description classes

The LLVM target description classes (located in the include/llvm/Target directory) provide an abstract description of the target machine independent of any particular client. These classes are designed to capture the abstract properties of the target (such as the instructions and registers it has), and do not incorporate any particular pieces of code generation algorithms.

All of the target description classes (except the TargetData class) are designed to be subclassed by the concrete target implementation, and have virtual methods implemented. To get to these implementations, the TargetMachine class provides accessors that should be implemented by the target.

The TargetMachine class

The TargetMachine class provides virtual methods that are used to access the target-specific implementations of the various target description classes via the get*Info methods (getInstrInfo, getRegisterInfo, getFrameInfo, etc.). This class is designed to be specialized by a concrete target implementation (e.g., X86TargetMachine) which implements the various virtual methods. The only required target description class is the TargetData class, but if the code generator components are to be used, the other interfaces should be implemented as well.

The TargetData class

The TargetData class is the only required target description class, and it is the only class that is not extensible (you cannot derive a new class from it). TargetData specifies information about how the target lays out memory for structures, the alignment requirements for various data types, the size of pointers in the target, and whether the target is little-endian or big-endian.

The TargetLowering class

The TargetLowering class is used by SelectionDAG based instruction selectors primarily to describe how LLVM code should be lowered to SelectionDAG operations. Among other things, this class indicates an initial register class to use for various ValueTypes, which operations are natively supported by the target machine, the return type of setcc operations, the type to use for shift amounts, and various high-level characteristics, such as whether it is profitable to turn division by a constant into a multiplication sequence.

The TargetRegisterInfo class

The TargetRegisterInfo class is used to describe the register file of the target and any interactions between the registers.

Registers in the code generator are represented by unsigned integers. Physical registers (those that actually exist in the target description) are unique small numbers, and virtual registers are generally large. Note that register #0 is reserved as a flag value.

Each register in the processor description has an associated TargetRegisterDesc entry, which provides a textual name for the register (used for assembly output and debugging dumps) and a set of aliases (used to indicate whether one register overlaps with another).

In addition to the per-register description, the TargetRegisterInfo class exposes a set of processor specific register classes (instances of the TargetRegisterClass class). Each register class contains sets of registers that have the same properties (for example, they are all 32-bit integer registers). Each SSA virtual register created by the instruction selector has an associated register class. When the register allocator runs, it replaces virtual registers with a physical register in the set.
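
For illustration, the sketch below (not code from LLVM itself; the function and parameter names are made up) shows how a pass might ask for the class of a virtual register and walk the physical registers that the class contains:

 // Sketch only: list the physical registers that a given virtual register
 // could legally be assigned to, based on its register class.
 void listCandidates(const MachineRegisterInfo &MRI,
                     const TargetRegisterInfo &TRI, unsigned VReg) {
   const TargetRegisterClass *RC = MRI.getRegClass(VReg);
   for (TargetRegisterClass::iterator I = RC->begin(), E = RC->end(); I != E; ++I)
     errs() << TRI.getName(*I) << "\n";  // textual name from the register description
 }
 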

The target-specific implementations of these classes are auto-generated from a TableGen description of the register file.

The TargetInstrInfo class

The TargetInstrInfo class is used to describe the machine instructions supported by the target. It is essentially an array of TargetInstrDescriptor objects, each of which describes one instruction the target supports. Descriptors define things like the mnemonic for the opcode, the number of operands, the list of implicit register uses and defs, whether the instruction has certain target-independent properties (accesses memory, is commutable, etc), and holds any target-specific flags.

The TargetFrameInfo class

The TargetFrameInfo class is used to provide information about the stack frame layout of the target. It holds the direction of stack growth, the known stack alignment on entry to each function, and the offset to the local area. The offset to the local area is the offset from the stack pointer on function entry to the first location where function data (local variables, spill locations) can be stored.

The TargetSubtarget class

The TargetSubtarget class is used to provide information about the specific chip set being targeted. A sub-target informs code generation of which instructions are supported, instruction latencies and instruction execution itinerary; i.e., which processing units are used, in what order, and for how long.

The TargetJITInfo class

The TargetJITInfo class exposes an abstract interface used by the Just-In-Time code generator to perform target-specific activities, such as emitting stubs. If a TargetMachine supports JIT code generation, it should provide one of these objects through the getJITInfo method.

Machine code description classes

At the high-level, LLVM code is translated to a machine specific representation formed out of MachineFunction, MachineBasicBlock, and MachineInstr instances (defined in include/llvm/CodeGen). This representation is completely target agnostic, representing instructions in their most abstract form: an opcode and a series of operands. This representation is designed to support both an SSA representation for machine code, as well as a register allocated, non-SSA form.

The MachineInstr class

Target machine instructions are represented as instances of the MachineInstr class. This class is an extremely abstract way of representing machine instructions. In particular, it only keeps track of an opcode number and a set of operands.

The opcode number is a simple unsigned integer that only has meaning to a specific backend. All of the instructions for a target should be defined in the *InstrInfo.td file for the target. The opcode enum values are auto-generated from this description. The MachineInstr class does not have any information about how to interpret the instruction (i.e., what the semantics of the instruction are); for that you must refer to the TargetInstrInfo class.

The operands of a machine instruction can be of several different types: a register reference, a constant integer, a basic block reference, etc. In addition, a machine operand should be marked as a def or a use of the value (though only registers are allowed to be defs).

By convention, the LLVM code generator orders instruction operands so that all register definitions come before the register uses, even on architectures that are normally printed in other orders. For example, the SPARC add instruction: "add %i1, %i2, %i3" adds the "%i1" and "%i2" registers and stores the result into the "%i3" register. In the LLVM code generator, the operands should be stored as "%i3, %i1, %i2": with the destination first.

Keeping destination (definition) operands at the beginning of the operand list has several advantages. In particular, the debugging printer will print the instruction like this:

 %r3 = add %i1, %i2
 

Also if the first operand is a def, it is easier to create instructions whose only def is the first operand.
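
As a hedged illustration (not library code; the helper name is hypothetical), a pass that wants the registers defined and used by an instruction can walk the operand list and test each operand:

 // Sketch only: collect register defs and uses from a MachineInstr.
 void collectRegs(const MachineInstr &MI,
                  SmallVectorImpl<unsigned> &Defs,
                  SmallVectorImpl<unsigned> &Uses) {
   for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
     const MachineOperand &MO = MI.getOperand(i);
     if (!MO.isReg())
       continue;                     // skip immediates, basic blocks, etc.
     if (MO.isDef())
       Defs.push_back(MO.getReg());  // by convention these come first
     else
       Uses.push_back(MO.getReg());
   }
 }
 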

Using the MachineInstrBuilder.h functions

Machine instructions are created by using the BuildMI functions, located in the include/llvm/CodeGen/MachineInstrBuilder.h file. The BuildMI functions make it easy to build arbitrary machine instructions. Usage of the BuildMI functions looks like this:

 // Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
 // instruction.  The '1' specifies how many operands will be added.
 MachineInstr *MI = BuildMI(X86::MOV32ri, 1, DestReg).addImm(42);
 
 // Create the same instr, but insert it at the end of a basic block.
 MachineBasicBlock &MBB = ...
 BuildMI(MBB, X86::MOV32ri, 1, DestReg).addImm(42);
 
 // Create the same instr, but insert it before a specified iterator point.
 MachineBasicBlock::iterator MBBI = ...
 BuildMI(MBB, MBBI, X86::MOV32ri, 1, DestReg).addImm(42);
 
 // Create a 'cmp Reg, 0' instruction, no destination reg.
 MI = BuildMI(X86::CMP32ri, 2).addReg(Reg).addImm(0);
 // Create an 'sahf' instruction which takes no operands and stores nothing.
 MI = BuildMI(X86::SAHF, 0);
 
 // Create a self looping branch instruction.
 BuildMI(MBB, X86::JNE, 1).addMBB(&MBB);
 

The key thing to remember with the BuildMI functions is that you have to specify the number of operands that the machine instruction will take, which allows for efficient memory allocation. Also note that operands default to being uses of values, not definitions. If you need to add a definition operand (other than the optional destination register), you must explicitly mark it as such:

 MI.addReg(Reg, RegState::Define);
 

Fixed (preassigned) registers

One important issue that the code generator needs to be aware of is the presence of fixed registers. In particular, there are often places in the instruction stream where the register allocator must arrange for a particular value to be in a particular register. This can occur due to limitations of the instruction set (e.g., the X86 can only do a 32-bit divide with the EAX/EDX registers), or external factors like calling conventions. In any case, the instruction selector should emit code that copies a virtual register into or out of a physical register when needed.

For example, consider this simple LLVM example:

 define i32 @test(i32 %X, i32 %Y) {
   %Z = udiv i32 %X, %Y
   ret i32 %Z
 }
 

The X86 instruction selector produces this machine code for the div and ret (use "llc X.bc -march=x86 -print-machineinstrs" to get this):

 ;; Start of div
 %EAX = mov %reg1024           ;; Copy X (in reg1024) into EAX
 %reg1027 = sar %reg1024, 31
 %EDX = mov %reg1027           ;; Sign extend X into EDX
 idiv %reg1025                 ;; Divide by Y (in reg1025)
 %reg1026 = mov %EAX           ;; Read the result (Z) out of EAX
 
 ;; Start of ret
 %EAX = mov %reg1026           ;; 32-bit return value goes in EAX
 ret
 

By the end of code generation, the register allocator has coalesced the registers and deleted the resultant identity moves, producing the following code:

 ;; X is in EAX, Y is in ECX
 mov %EAX, %EDX
 sar %EDX, 31
 idiv %ECX
 ret 
 

This approach is extremely general (if it can handle the X86 architecture, it can handle anything!) and allows all of the target specific knowledge about the instruction stream to be isolated in the instruction selector. Note that physical registers should have a short lifetime for good code generation, and all physical registers are assumed dead on entry to and exit from basic blocks (before register allocation). Thus, if you need a value to be live across basic block boundaries, it must live in a virtual register.
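
For instance, the selector in the example above must place the dividend in EAX before the divide. A minimal sketch of how such a copy can be emitted, assuming TII, DL, MBB, MBBI, and VRegX are available in the selector (this uses the full BuildMI signature with the generic COPY opcode, rather than the simplified form shown earlier):

 // Sketch only: copy the virtual register holding X into the fixed EAX
 // register required by the divide, just before the point of use.
 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::COPY), X86::EAX)
     .addReg(VRegX);
 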

Machine code in SSA form

MachineInstr's are initially selected in SSA-form, and are maintained in SSA-form until register allocation happens. For the most part, this is trivially simple since LLVM is already in SSA form; LLVM PHI nodes become machine code PHI nodes, and virtual registers are only allowed to have a single definition.

After register allocation, machine code is no longer in SSA-form because there are no virtual registers left in the code.

The MachineBasicBlock class

The MachineBasicBlock class contains a list of machine instructions (MachineInstr instances). It roughly corresponds to the LLVM code input to the instruction selector, but there can be a one-to-many mapping (i.e. one LLVM basic block can map to multiple machine basic blocks). The MachineBasicBlock class has a "getBasicBlock" method, which returns the LLVM basic block that it comes from.

The MachineFunction class

The MachineFunction class contains a list of machine basic blocks (MachineBasicBlock instances). It corresponds one-to-one with the LLVM function input to the instruction selector. In addition to a list of basic blocks, the MachineFunction contains a MachineConstantPool, a MachineFrameInfo, a MachineFunctionInfo, and a MachineRegisterInfo. See include/llvm/CodeGen/MachineFunction.h for more information.
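
A very common pattern in target-independent passes is to walk every instruction in a function through these classes. A minimal sketch (the function name is made up):

 // Sketch only: visit every MachineInstr in a MachineFunction.
 void visitAll(MachineFunction &MF) {
   for (MachineFunction::iterator MBB = MF.begin(), MBBE = MF.end();
        MBB != MBBE; ++MBB)
     for (MachineBasicBlock::iterator MI = MBB->begin(), MIE = MBB->end();
          MI != MIE; ++MI)
       MI->dump();  // each MI is a MachineInstr in the current block
 }
 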

The "MC" Layer

The MC Layer is used to represent and process code at the raw machine code level, devoid of "high level" information like "constant pools", "jump tables", "global variables" or anything like that. At this level, LLVM handles things like label names, machine instructions, and sections in the object file. The code in this layer is used for a number of important purposes: the tail end of the code generator uses it to write a .s or .o file, and it is also used by the llvm-mc tool to implement standalone machine code assemblers and disassemblers.

This section describes some of the important classes. There are also a number of important subsystems that interact at this layer; they are described later in this manual.

The MCStreamer API

MCStreamer is best thought of as an assembler API. It is an abstract API which is implemented in different ways (e.g. to output a .s file, output an ELF .o file, etc) but whose API corresponds directly to what you see in a .s file. MCStreamer has one method per directive, such as EmitLabel, EmitSymbolAttribute, SwitchSection, EmitValue (for .byte, .word), etc, which directly correspond to assembly level directives. It also has an EmitInstruction method, which is used to output an MCInst to the streamer.

This API is most important for two clients: the llvm-mc stand-alone assembler is effectively a parser that parses a line, then invokes a method on MCStreamer. In the code generator, the Code Emission phase of the code generator lowers higher level LLVM IR and Machine* constructs down to the MC layer, emitting directives through MCStreamer.

On the implementation side of MCStreamer, there are two major implementations: one for writing out a .s file (MCAsmStreamer), and one for writing out a .o file (MCObjectStreamer). MCAsmStreamer is a straight-forward implementation that prints out a directive for each method (e.g. EmitValue -> .byte), but MCObjectStreamer implements a full assembler.

The MCContext class

The MCContext class is the owner of a variety of uniqued data structures at the MC layer, including symbols, sections, etc. As such, this is the class that you interact with to create symbols and sections. This class cannot be subclassed.

The MCSymbol class

The MCSymbol class represents a symbol (aka label) in the assembly file. There are two interesting kinds of symbols: assembler temporary symbols, and normal symbols. Assembler temporary symbols are used and processed by the assembler but are discarded when the object file is produced. The distinction is usually represented by adding a prefix to the label, for example "L" labels are assembler temporary labels in MachO.

MCSymbols are created by MCContext and uniqued there. This means that MCSymbols can be compared for pointer equivalence to find out if they are the same symbol. Note that pointer inequality does not guarantee the labels will end up at different addresses though. It's perfectly legal to output something like this to the .s file:

   foo:
   bar:
     .byte 4
 

In this case, both the foo and bar symbols will have the same address.
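
As a rough sketch of how MCContext and MCStreamer fit together (exact method names vary slightly between LLVM versions, and Ctx and Streamer are assumed to already exist), the listing above could be produced like this:

 // Sketch only: create two symbols through the MCContext and emit them at the
 // same address, followed by a single byte, through an MCStreamer.
 MCSymbol *Foo = Ctx.GetOrCreateSymbol("foo");
 MCSymbol *Bar = Ctx.GetOrCreateSymbol("bar");
 Streamer.EmitLabel(Foo);
 Streamer.EmitLabel(Bar);
 Streamer.EmitIntValue(4, 1);  // the ".byte 4" from the listing above
 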

The MCSection class

The MCSection class represents an object-file specific section. It is subclassed by object file specific implementations (e.g. MCSectionMachO, MCSectionCOFF, MCSectionELF) and these are created and uniqued by MCContext. The MCStreamer has a notion of the current section, which can be changed with the SwitchToSection method (which corresponds to a ".section" directive in a .s file).

The MCInst class

The MCInst class is a target-independent representation of an instruction. It is a simple class (much more so than MachineInstr) that holds a target-specific opcode and a vector of MCOperands. MCOperand, in turn, is a simple discriminated union of three cases: 1) a simple immediate, 2) a target register ID, 3) a symbolic expression (e.g. "Lfoo-Lbar+42") as an MCExpr.

MCInst is the common currency used to represent machine instructions at the MC layer. It is the type used by the instruction encoder, the instruction printer, and the type generated by the assembly parser and disassembler.
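
As a hedged sketch (the MCOperand factory names have changed case across LLVM versions), constructing an MCInst by hand looks roughly like this:

 // Sketch only: build "mov EAX, 42" as an MCInst.
 MCInst Inst;
 Inst.setOpcode(X86::MOV32ri);
 Inst.addOperand(MCOperand::CreateReg(X86::EAX));  // destination register
 Inst.addOperand(MCOperand::CreateImm(42));        // immediate operand
 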

Target-independent code generation algorithms

This section documents the phases described in the high-level design of the code generator. It explains how they work and some of the rationale behind their design.

Instruction Selection

Instruction Selection is the process of translating LLVM code presented to the code generator into target-specific machine instructions. There are several well-known ways to do this in the literature. LLVM uses a SelectionDAG based instruction selector.

Portions of the DAG instruction selector are generated from the target description (*.td) files. Our goal is for the entire instruction selector to be generated from these .td files, though currently there are still things that require custom C++ code.

Introduction to SelectionDAGs

The SelectionDAG provides an abstraction for code representation in a way that is amenable to instruction selection using automatic techniques (e.g. dynamic-programming based optimal pattern matching selectors). It is also well-suited to other phases of code generation; in particular, instruction scheduling (SelectionDAG's are very close to scheduling DAGs post-selection). Additionally, the SelectionDAG provides a host representation where a large variety of very-low-level (but target-independent) optimizations may be performed; ones which require extensive information about the instructions efficiently supported by the target.

The SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the SDNode class. The primary payload of the SDNode is its operation code (Opcode) that indicates what operation the node performs and the operands to the operation. The various operation node types are described at the top of the include/llvm/CodeGen/SelectionDAGNodes.h file.

Although most operations define a single value, each node in the graph may define multiple values. For example, a combined div/rem operation will define both the dividend and the remainder. Many other situations require multiple values as well. Each node also has some number of operands, which are edges to the node defining the used value. Because nodes may define multiple values, edges are represented by instances of the SDValue class, which is a <SDNode, unsigned> pair, indicating the node and result value being used, respectively. Each value produced by an SDNode has an associated MVT (Machine Value Type) indicating what the type of the value is.
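
For example, creating a node and naming one of its results might look like the following sketch (DAG, DL, LHS, and RHS are assumed to already exist):

 // Sketch only: build an i32 add node; the returned SDValue names the node
 // and the result number (0 here, since ISD::ADD produces a single value).
 SDValue Sum = DAG.getNode(ISD::ADD, DL, MVT::i32, LHS, RHS);
 EVT Ty = Sum.getValueType();  // the type of result #0
 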

SelectionDAGs contain two different kinds of values: those that represent data flow and those that represent control flow dependencies. Data values are simple edges with an integer or floating point value type. Control edges are represented as "chain" edges which are of type MVT::Other. These edges provide an ordering between nodes that have side effects (such as loads, stores, calls, returns, etc). All nodes that have side effects should take a token chain as input and produce a new one as output. By convention, token chain inputs are always operand #0, and chain results are always the last value produced by an operation.

A SelectionDAG has designated "Entry" and "Root" nodes. The Entry node is always a marker node with an Opcode of ISD::EntryToken. The Root node is the final side-effecting node in the token chain. For example, in a single basic block function it would be the return node.

One important concept for SelectionDAGs is the notion of a "legal" vs. "illegal" DAG. A legal DAG for a target is one that only uses supported operations and supported types. On a 32-bit PowerPC, for example, a DAG with a value of type i1, i8, i16, or i64 would be illegal, as would a DAG that uses a SREM or UREM operation. The legalize types and legalize operations phases are responsible for turning an illegal DAG into a legal DAG.

SelectionDAG Instruction Selection Process

SelectionDAG-based instruction selection consists of the following steps:

  1. Build initial DAG — This stage performs a simple translation from the input LLVM code to an illegal SelectionDAG.
  2. Optimize SelectionDAG — This stage performs simple optimizations on the SelectionDAG to simplify it, and recognize meta instructions (like rotates and div/rem pairs) for targets that support these meta operations. This makes the resultant code more efficient and the select instructions from DAG phase (below) simpler.
  3. Legalize SelectionDAG Types — This stage transforms SelectionDAG nodes to eliminate any types that are unsupported on the target.
  4. Optimize SelectionDAG — The SelectionDAG optimizer is run to clean up redundancies exposed by type legalization.
  5. Legalize SelectionDAG Ops — This stage transforms SelectionDAG nodes to eliminate any operations that are unsupported on the target.
  6. Optimize SelectionDAG — The SelectionDAG optimizer is run to eliminate inefficiencies introduced by operation legalization.
  7. Select instructions from DAG — Finally, the target instruction selector matches the DAG operations to target instructions. This process translates the target-independent input DAG into another DAG of target instructions.
  8. SelectionDAG Scheduling and Formation — The last phase assigns a linear order to the instructions in the target-instruction DAG and emits them into the MachineFunction being compiled. This step uses traditional prepass scheduling techniques.

After all of these steps are complete, the SelectionDAG is destroyed and the rest of the code generation passes are run.

One great way to visualize what is going on here is to take advantage of a few LLC command line options. The following options pop up a window displaying the SelectionDAG at specific times (if you only get errors printed to the console while using this, you probably need to configure your system to add support for it).

  • -view-dag-combine1-dags displays the DAG after being built, before the first optimization pass.
  • -view-legalize-dags displays the DAG before Legalization.
  • -view-dag-combine2-dags displays the DAG before the second optimization pass.
  • -view-isel-dags displays the DAG before the Select phase.
  • -view-sched-dags displays the DAG before Scheduling.

The -view-sunit-dags option displays the Scheduler's dependency graph. This graph is based on the final SelectionDAG, with nodes that must be scheduled together bundled into a single scheduling-unit node, and with immediate operands and other nodes that aren't relevant for scheduling omitted.

Initial SelectionDAG Construction

The initial SelectionDAG is naïvely peephole expanded from the LLVM input by the SelectionDAGLowering class in the lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp file. The intent of this pass is to expose as much low-level, target-specific detail to the SelectionDAG as possible. This pass is mostly hard-coded (e.g. an LLVM add turns into an SDNode add while a getelementptr is expanded into the obvious arithmetic). This pass requires target-specific hooks to lower calls, returns, varargs, etc. For these features, the TargetLowering interface is used.

SelectionDAG LegalizeTypes Phase

The Legalize phase is in charge of converting a DAG to only use the types that are natively supported by the target.

There are two main ways of converting values of unsupported scalar types to values of supported types: converting small types to larger types ("promoting"), and breaking up large integer types into smaller ones ("expanding"). For example, a target might require that all f32 values are promoted to f64 and that all i1/i8/i16 values are promoted to i32. The same target might require that all i64 values be expanded into pairs of i32 values. These changes can insert sign and zero extensions as needed to make sure that the final code has the same behavior as the input.

There are two main ways of converting values of unsupported vector types to values of supported types: splitting vector types, multiple times if necessary, until a legal type is found, and extending vector types by adding elements to the end to round them out to legal types ("widening"). If a vector gets split all the way down to single-element parts with no supported vector type being found, the elements are converted to scalars ("scalarizing").

A target implementation tells the legalizer which types are supported (and which register class to use for them) by calling the addRegisterClass method in its TargetLowering constructor.
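
For example, a 32-bit x86-like target might declare in its TargetLowering constructor that i32 values are legal and live in a 32-bit general purpose register class (a sketch only; the register class chosen here is illustrative):

 // Sketch only: i32 is a legal type, held in the GR32 register class.
 addRegisterClass(MVT::i32, &X86::GR32RegClass);
 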

SelectionDAG Legalize Phase

The Legalize phase is in charge of converting a DAG to only use the operations that are natively supported by the target.

Targets often have weird constraints, such as not supporting every operation on every supported datatype (e.g. X86 does not support byte conditional moves and PowerPC does not support sign-extending loads from a 16-bit memory location). Legalize takes care of this by open-coding another sequence of operations to emulate the operation ("expansion"), by promoting one type to a larger type that supports the operation ("promotion"), or by using a target-specific hook to implement the legalization ("custom").

A target implementation tells the legalizer which operations are not supported (and which of the above three actions to take) by calling the setOperationAction method in its TargetLowering constructor.
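
For example, a target whose hardware lacks a signed remainder instruction and prefers to count trailing zeros in a wider type might write something like this in its TargetLowering constructor (a sketch only; the specific operations chosen here are illustrative):

 // Sketch only: SREM is open-coded from other operations, and CTTZ on i8 is
 // performed in a wider type.
 setOperationAction(ISD::SREM, MVT::i32, Expand);
 setOperationAction(ISD::CTTZ, MVT::i8, Promote);
 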

Prior to the existence of the Legalize passes, we required that every target selector supported and handled every operator and type even if they were not natively supported. The introduction of the Legalize phases allows all of the canonicalization patterns to be shared across targets, and makes it very easy to optimize the canonicalized code because it is still in the form of a DAG.

SelectionDAG Optimization Phase: the DAG Combiner

The SelectionDAG optimization phase is run multiple times for code generation, immediately after the DAG is built and once after each legalization. The first run of the pass allows the initial code to be cleaned up (e.g. performing optimizations that depend on knowing that the operators have restricted type inputs). Subsequent runs of the pass clean up the messy code generated by the Legalize passes, which allows Legalize to be very simple (it can focus on making code legal instead of focusing on generating good and legal code).

One important class of optimizations performed is optimizing inserted sign and zero extension instructions. We currently use ad-hoc techniques, but could move to more rigorous techniques in the future. Here are some good papers on the subject:

"Widening integer arithmetic"
Kevin Redwine and Norman Ramsey
International Conference on Compiler Construction (CC) 2004

"Effective sign extension elimination"
Motohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani
Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation.

SelectionDAG Select Phase

The Select phase is the bulk of the target-specific code for instruction selection. This phase takes a legal SelectionDAG as input, pattern matches the instructions supported by the target to this DAG, and produces a new DAG of target code. For example, consider the following LLVM fragment:

 %t1 = fadd float %W, %X
 %t2 = fmul float %t1, %Y
 %t3 = fadd float %t2, %Z
 

This LLVM code corresponds to a SelectionDAG that looks basically like this:

 (fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z)
 

If a target supports floating point multiply-and-add (FMA) operations, one of the adds can be merged with the multiply. On the PowerPC, for example, the output of the instruction selector might look like this DAG:

 (FMADDS (FADDS W, X), Y, Z)
 

The FMADDS instruction is a ternary instruction that multiplies its first two operands and adds the third (as single-precision floating-point numbers). The FADDS instruction is a simple binary single-precision add instruction. To perform this pattern match, the PowerPC backend includes the following instruction definitions:

 def FMADDS : AForm_1<59, 29,
                     (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
                     "fmadds $FRT, $FRA, $FRC, $FRB",
                     [(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC),
                                            F4RC:$FRB))]>;
 def FADDS : AForm_2<59, 21,
                     (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB),
                     "fadds $FRT, $FRA, $FRB",
                     [(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))]>;
 

The last, bracketed operand of each instruction definition is the pattern used to match the instruction. The DAG operators (like fmul/fadd) are defined in the include/llvm/Target/TargetSelectionDAG.td file. "F4RC" is the register class of the input and result values.

The TableGen DAG instruction selector generator reads the instruction patterns in the .td file and automatically builds parts of the pattern matching code for your target. It has the following strengths:

  • At compiler-compiler time, it analyzes your instruction patterns and tells you if your patterns make sense or not.
  • It can handle arbitrary constraints on operands for the pattern match. In particular, it is straight-forward to say things like "match any immediate that is a 13-bit sign-extended value". For examples, see the immSExt16 and related tblgen classes in the PowerPC backend.
  • It knows several important identities for the patterns defined. For example, it knows that addition is commutative, so it allows the FMADDS pattern above to match "(fadd X, (fmul Y, Z))" as well as "(fadd (fmul X, Y), Z)", without the target author having to specially handle this case.
  • It has a full-featured type-inferencing system. In particular, you should rarely have to explicitly tell the system what type parts of your patterns are. In the FMADDS case above, we didn't have to tell tblgen that all of the nodes in the pattern are of type 'f32'. It was able to infer and propagate this knowledge from the fact that F4RC has type 'f32'.
  • Targets can define their own (and rely on built-in) "pattern fragments". Pattern fragments are chunks of reusable patterns that get inlined into your patterns during compiler-compiler time. For example, the integer "(not x)" operation is actually defined as a pattern fragment that expands as "(xor x, -1)", since the SelectionDAG does not have a native 'not' operation. Targets can define their own short-hand fragments as they see fit. See the definition of 'not' and 'ineg' for examples.
  • In addition to instructions, targets can specify arbitrary patterns that map to one or more instructions using the 'Pat' class. For example, the PowerPC has no way to load an arbitrary integer immediate into a register in one instruction. To tell tblgen how to do this, it defines:

     // Arbitrary immediate support.  Implement in terms of LIS/ORI.
     def : Pat<(i32 imm:$imm),
               (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))>;
     

    If none of the single-instruction patterns for loading an immediate into a register match, this will be used. This rule says "match an arbitrary i32 immediate, turning it into an ORI ('or a 16-bit immediate') and an LIS ('load 16-bit immediate, where the immediate is shifted to the left 16 bits') instruction". To make this work, the LO16/HI16 node transformations are used to manipulate the input immediate (in this case, take the high or low 16-bits of the immediate).
  • While the system does automate a lot, it still allows you to write custom C++ code to match special cases if there is something that is hard to express.

While it has many strengths, the system currently has some limitations, primarily because it is a work in progress and is not yet finished:

  • Overall, there is no way to define or match SelectionDAG nodes that define multiple values (e.g. SMUL_LOHI, LOAD, CALL, etc). This is the biggest reason that you currently still have to write custom C++ code for your instruction selector.
  • There is no great way to support matching complex addressing modes yet. In the future, we will extend pattern fragments to allow them to define multiple values (e.g. the four operands of the X86 addressing mode, which are currently matched with custom C++ code). In addition, we'll extend fragments so that a fragment can match multiple different patterns.
  • We don't automatically infer flags like isStore/isLoad yet.
  • We don't automatically generate the set of supported registers and operations for the Legalizer yet.
  • We don't have a way of tying in custom legalized nodes yet.

Despite these limitations, the instruction selector generator is still quite useful for most of the binary and logical operations in typical instruction sets. If you run into any problems or can't figure out how to do something, please let Chris know!

SelectionDAG Scheduling and Formation Phase

The scheduling phase takes the DAG of target instructions from the selection phase and assigns an order. The scheduler can pick an order depending on various constraints of the machines (i.e. order for minimal register pressure or try to cover instruction latencies). Once an order is established, the DAG is converted to a list of MachineInstrs and the SelectionDAG is destroyed.

Note that this phase is logically separate from the instruction selection phase, but is tied to it closely in the code because it operates on SelectionDAGs.

Future directions for the SelectionDAG

  1. Optional function-at-a-time selection.
  2. Auto-generate entire selector from .td file.

SSA-based Machine Code Optimizations

To Be Written

Live Intervals

Live Intervals are the ranges (intervals) where a variable is live. They are used by some register allocator passes to determine if two or more virtual registers which require the same physical register are live at the same point in the program (i.e., they conflict). When this situation occurs, one virtual register must be spilled.

Live Variable Analysis

The first step in determining the live intervals of variables is to calculate the set of registers that are immediately dead after the instruction (i.e., the instruction calculates the value, but it is never used) and the set of registers that are used by the instruction, but are never used after the instruction (i.e., they are killed). Live variable information is computed for each virtual register and register allocatable physical register in the function. This is done in a very efficient manner because it uses SSA to sparsely compute lifetime information for virtual registers (which are in SSA form) and only has to track physical registers within a block. Before register allocation, LLVM can assume that physical registers are only live within a single basic block. This allows it to do a single, local analysis to resolve physical register lifetimes within each basic block. If a physical register is not register allocatable (e.g., a stack pointer or condition codes), it is not tracked.

Physical registers may be live in to or out of a function. Live in values are typically arguments in registers. Live out values are typically return values in registers. Live in values are marked as such, and are given a dummy "defining" instruction during live intervals analysis. If the last basic block of a function is a return, then it's marked as using all live out values in the function.

PHI nodes need to be handled specially, because the calculation of the live variable information from a depth first traversal of the CFG of the function won't guarantee that a virtual register used by the PHI node is defined before it's used. When a PHI node is encountered, only the definition is handled, because the uses will be handled in other basic blocks.

For each PHI node of the current basic block, we simulate an assignment at the end of the current basic block and traverse the successor basic blocks. If a successor basic block has a PHI node and one of the PHI node's operands is coming from the current basic block, then the variable is marked as alive within the current basic block and all of its predecessor basic blocks, until the basic block with the defining instruction is encountered.

Live Intervals Analysis

We now have the information available to perform the live intervals analysis and build the live intervals themselves. We start off by numbering the basic blocks and machine instructions. We then handle the "live-in" values. These are in physical registers, so the physical register is assumed to be killed by the end of the basic block. Live intervals for virtual registers are computed for some ordering of the machine instructions [1, N]. A live interval is an interval [i, j), where 1 <= i <= j < N, for which a variable is live.

More to come...

Register Allocation

The Register Allocation problem consists of mapping a program Pv, that can use an unbounded number of virtual registers, to a program Pp that contains a finite (possibly small) number of physical registers. Each target architecture has a different number of physical registers. If the number of physical registers is not enough to accommodate all the virtual registers, some of them will have to be mapped into memory. These virtuals are called spilled virtuals.

How registers are represented in LLVM

In LLVM, physical registers are denoted by integer numbers that normally range from 1 to 1023. To see how this numbering is defined for a particular architecture, you can read the GenRegisterNames.inc file for that architecture. For instance, by inspecting lib/Target/X86/X86GenRegisterNames.inc we see that the 32-bit register EAX is denoted by 15, and the MMX register MM0 is mapped to 48.

Some architectures contain registers that share the same physical location. A notable example is the X86 platform. For instance, in the X86 architecture, the registers EAX, AX and AL share the first eight bits. These physical registers are marked as aliased in LLVM. Given a particular architecture, you can check which registers are aliased by inspecting its RegisterInfo.td file. Moreover, the method TargetRegisterInfo::getAliasSet(p_reg) returns an array containing all the physical registers aliased to the register p_reg.

Physical registers, in LLVM, are grouped in Register Classes. Elements in the same register class are functionally equivalent, and can be interchangeably used. Each virtual register can only be mapped to physical registers of a particular class. For instance, in the X86 architecture, some virtuals can only be allocated to 8 bit registers. A register class is described by TargetRegisterClass objects. To discover if a virtual register is compatible with a given physical, this code can be used:

 bool RegMapping_Fer::compatible_class(MachineFunction &mf,
                                       unsigned v_reg,
                                       unsigned p_reg) {
   assert(TargetRegisterInfo::isPhysicalRegister(p_reg) &&
          "Target register must be physical");
   const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg);
   return trc->contains(p_reg);
 }
 

Sometimes, mostly for debugging purposes, it is useful to change the number of physical registers available in the target architecture. This must be done statically, inside the TargetRegisterInfo.td file. Just grep for RegisterClass, the last parameter of which is a list of registers. Commenting some out is one simple way to avoid them being used. A more polite way is to explicitly exclude some registers from the allocation order. See the definition of the GR8 register class in lib/Target/X86/X86RegisterInfo.td for an example of this.

Virtual registers are also denoted by integer numbers. Contrary to physical registers, different virtual registers never share the same number. Whereas physical registers are statically defined in a TargetRegisterInfo.td file and cannot be created by the application developer, that is not the case with virtual registers. In order to create new virtual registers, use the method MachineRegisterInfo::createVirtualRegister(). This method will return a new virtual register. Use an IndexedMap<Foo, VirtReg2IndexFunctor> to hold information per virtual register. If you need to enumerate all virtual registers, use the function TargetRegisterInfo::index2VirtReg() to find the virtual register numbers:

   for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
     unsigned VirtReg = TargetRegisterInfo::index2VirtReg(i);
     stuff(VirtReg);
   }
 

Before register allocation, the operands of an instruction are mostly virtual registers, although physical registers may also be used. In order to check if a given machine operand is a register, use the boolean function MachineOperand::isRegister(). To obtain the integer code of a register, use MachineOperand::getReg(). An instruction may define or use a register. For instance, ADD reg:1026 := reg:1025 reg:1024 defines register 1026 and uses registers 1025 and 1024. Given a register operand, the method MachineOperand::isUse() informs if that register is being used by the instruction. The method MachineOperand::isDef() informs if that register is being defined.

We will call physical registers present in the LLVM bitcode before register allocation pre-colored registers. Pre-colored registers are used in many different situations, for instance, to pass parameters of function calls, and to store results of particular instructions. There are two types of pre-colored registers: the ones implicitly defined, and those explicitly defined. Explicitly defined registers are normal operands, and can be accessed with MachineInstr::getOperand(int)::getReg(). In order to check which registers are implicitly defined by an instruction, use the TargetInstrInfo::get(opcode)::ImplicitDefs, where opcode is the opcode of the target instruction. One important difference between explicit and implicit physical registers is that the latter are defined statically for each instruction, whereas the former may vary depending on the program being compiled. For example, an instruction that represents a function call will always implicitly define or use the same set of physical registers. To read the registers implicitly used by an instruction, use TargetInstrInfo::get(opcode)::ImplicitUses. Pre-colored registers impose constraints on any register allocation algorithm. The register allocator must make sure that none of them are overwritten by the values of virtual registers while still alive.
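
As a hedged sketch (the helper name is made up), the implicit definitions of an opcode can be read from its descriptor like this:

 // Sketch only: print the physical registers an opcode implicitly defines.
 void printImplicitDefs(const TargetInstrInfo &TII,
                        const TargetRegisterInfo &TRI, unsigned Opcode) {
   const MCInstrDesc &Desc = TII.get(Opcode);
   for (unsigned i = 0, e = Desc.getNumImplicitDefs(); i != e; ++i)
     errs() << TRI.getName(Desc.getImplicitDefs()[i]) << "\n";
 }
 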

Mapping virtual registers to physical registers

There are two ways to map virtual registers to physical registers (or to memory slots). The first way, that we will call direct mapping, is based on the use of methods of the classes TargetRegisterInfo, and MachineOperand. The second way, that we will call indirect mapping, relies on the VirtRegMap class in order to insert loads and stores sending and getting values to and from memory.

The direct mapping provides more flexibility to the developer of the register allocator; however, it is more error prone, and demands more implementation work. Basically, the programmer will have to specify where load and store instructions should be inserted in the target function being compiled in order to get and store values in memory. To assign a physical register to a virtual register present in a given operand, use MachineOperand::setReg(p_reg). To insert a store instruction, use TargetInstrInfo::storeRegToStackSlot(...), and to insert a load instruction, use TargetInstrInfo::loadRegFromStackSlot.

The indirect mapping shields the application developer from the complexities of inserting load and store instructions. In order to map a virtual register to a physical one, use VirtRegMap::assignVirt2Phys(vreg, preg). In order to map a certain virtual register to memory, use VirtRegMap::assignVirt2StackSlot(vreg). This method will return the stack slot where vreg's value will be located. If it is necessary to map another virtual register to the same stack slot, use VirtRegMap::assignVirt2StackSlot(vreg, stack_location). One important point to consider when using the indirect mapping, is that even if a virtual register is mapped to memory, it still needs to be mapped to a physical register. This physical register is the location where the virtual register is supposed to be found before being stored or after being reloaded.
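
A minimal sketch of the indirect mapping, assuming VRM is the allocator's VirtRegMap and the register names are hypothetical:

 // Sketch only: one virtual register stays in a physical register, another is
 // sent to a fresh stack slot.
 VRM.assignVirt2Phys(KeptVReg, PhysReg);
 int Slot = VRM.assignVirt2StackSlot(SpilledVReg);  // returns the slot chosen
 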

If the indirect strategy is used, after all the virtual registers have been mapped to physical registers or stack slots, it is necessary to use a spiller object to place load and store instructions in the code. Every virtual that has been mapped to a stack slot will be stored to memory after being defined and will be loaded before being used. The implementation of the spiller tries to recycle load/store instructions, avoiding unnecessary instructions. For an example of how to invoke the spiller, see RegAllocLinearScan::runOnMachineFunction in lib/CodeGen/RegAllocLinearScan.cpp.

Handling two address instructions

With very rare exceptions (e.g., function calls), the LLVM machine code instructions are three address instructions. That is, each instruction is expected to define at most one register, and to use at most two registers. However, some architectures use two address instructions. In this case, the defined register is also one of the used registers. For instance, an instruction such as ADD %EAX, %EBX in X86 is actually equivalent to %EAX = %EAX + %EBX.

In order to produce correct code, LLVM must convert three address instructions that represent two address instructions into true two address instructions. LLVM provides the pass TwoAddressInstructionPass for this specific purpose. It must be run before register allocation takes place. After its execution, the resulting code may no longer be in SSA form. This happens, for instance, in situations where an instruction such as %a = ADD %b %c is converted to two instructions such as:

 %a = MOVE %b
 %a = ADD %a %c
 

Notice that, internally, the second instruction is represented as ADD %a[def/use] %c. I.e., the register operand %a is both used and defined by the instruction.

The SSA deconstruction phase

An important transformation that happens during register allocation is called the SSA Deconstruction Phase. The SSA form simplifies many analyses that are performed on the control flow graph of programs. However, traditional instruction sets do not implement PHI instructions. Thus, in order to generate executable code, compilers must replace PHI instructions with other instructions that preserve their semantics.

There are many ways in which PHI instructions can safely be removed from the target code. The most traditional PHI deconstruction algorithm replaces PHI instructions with copy instructions. That is the strategy adopted by LLVM. The SSA deconstruction algorithm is implemented in lib/CodeGen/PHIElimination.cpp. In order to invoke this pass, the identifier PHIEliminationID must be marked as required in the code of the register allocator.
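
In practice this means the register allocator's getAnalysisUsage method requests the pass by its identifier, roughly as in this sketch:

 // Sketch only: require PHI elimination to run before this allocator.
 void getAnalysisUsage(AnalysisUsage &AU) const override {
   AU.addRequiredID(PHIEliminationID);
   MachineFunctionPass::getAnalysisUsage(AU);
 }
 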

Instruction folding

Instruction folding is an optimization performed during register allocation that removes unnecessary copy instructions. For instance, a sequence of instructions such as:

 %EBX = LOAD %mem_address
 %EAX = COPY %EBX
 

can be safely substituted by the single instruction:

 %EAX = LOAD %mem_address
 

Instructions can be folded with the TargetRegisterInfo::foldMemoryOperand(...) method. Care must be taken when folding instructions; a folded instruction can be quite different from the original instruction. See LiveIntervals::addIntervalsForSpills in lib/CodeGen/LiveIntervalAnalysis.cpp for an example of its use.

Built in register allocators

The LLVM infrastructure provides the application developer with three different register allocators:

  • Fast — This register allocator is the default for debug builds. It allocates registers on a basic block level, attempting to keep values in registers and reusing registers as appropriate.
  • Basic — This is an incremental approach to register allocation. Live ranges are assigned to registers one at a time in an order that is driven by heuristics. Since code can be rewritten on-the-fly during allocation, this framework allows interesting allocators to be developed as extensions. It is not itself a production register allocator but is a potentially useful stand-alone mode for triaging bugs and as a performance baseline.
  • Greedy — The default allocator. This is a highly tuned implementation of the Basic allocator that incorporates global live range splitting. This allocator works hard to minimize the cost of spill code.
  • PBQP — A Partitioned Boolean Quadratic Programming (PBQP) based register allocator. This allocator works by constructing a PBQP problem representing the register allocation problem under consideration, solving this using a PBQP solver, and mapping the solution back to a register assignment.

The type of register allocator used in llc can be chosen with the command line option -regalloc=...:

 $ llc -regalloc=basic file.bc -o ba.s;
 $ llc -regalloc=fast file.bc -o fa.s;
 $ llc -regalloc=pbqp file.bc -o pbqp.s;
 

Prolog/Epilog Code Insertion


Compact Unwind

Throwing an exception requires unwinding out of a function. The information on how to unwind a given function is traditionally expressed in DWARF unwind (a.k.a. frame) info. But that format was originally developed for debuggers to backtrace, and each Frame Description Entry (FDE) requires ~20-30 bytes per function. There is also the cost of mapping from an address in a function to the corresponding FDE at runtime. An alternative unwind encoding is called compact unwind and requires just 4 bytes per function.

The compact unwind encoding is a 32-bit value, which is encoded in an architecture-specific way. It specifies which registers to restore and from where, and how to unwind out of the function. When the linker creates a final linked image, it will create a __TEXT,__unwind_info section. This section is a small and fast way for the runtime to access unwind info for any given function. If we emit compact unwind info for the function, that compact unwind info will be encoded in the __TEXT,__unwind_info section. If we emit DWARF unwind info, the __TEXT,__unwind_info section will contain the offset of the FDE in the __TEXT,__eh_frame section in the final linked image.

For X86, there are three modes for the compact unwind encoding:

Function with a Frame Pointer (EBP or RBP)

EBP/RBP-based frame, where EBP/RBP is pushed onto the stack immediately after the return address, then ESP/RSP is moved to EBP/RBP. Thus to unwind, ESP/RSP is restored with the current EBP/RBP value, then EBP/RBP is restored by popping the stack, and the return is done by popping the stack once more into the PC. All non-volatile registers that need to be restored must have been saved in a small range on the stack spanning EBP-4 to EBP-1020 (RBP-8 to RBP-1020). The offset (divided by 4 in 32-bit mode and 8 in 64-bit mode) is encoded in bits 16-23 (mask: 0x00FF0000). The registers saved are encoded in bits 0-14 (mask: 0x00007FFF) as five 3-bit entries from the following table:

Compact Number   i386 Register   x86-64 Register
1                EBX             RBX
2                ECX             R12
3                EDX             R13
4                EDI             R14
5                ESI             R15
6                EBP             RBP
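
As a hedged illustration of this layout (not code from LLVM; the helper name is made up), the byte offset of the saved-register area can be recovered from an encoding like this:

 // Sketch only: extract the scaled offset from bits 16-23 and convert it back
 // to a byte offset using the mode's word size.
 unsigned savedRegAreaOffset(unsigned Encoding, bool Is64Bit) {
   unsigned Scaled = (Encoding & 0x00FF0000) >> 16;
   return Scaled * (Is64Bit ? 8 : 4);
 }
 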
Frameless with a Small Constant Stack Size (EBP or RBP is not used as a frame pointer)

To return, a constant (encoded in the compact unwind encoding) is added to the ESP/RSP. Then the return is done by popping the stack into the PC. All non-volatile registers that need to be restored must have been saved on the stack immediately after the return address. The stack size (divided by 4 in 32-bit mode and 8 in 64-bit mode) is encoded in bits 16-23 (mask: 0x00FF0000). There is a maximum stack size of 1024 bytes in 32-bit mode and 2048 in 64-bit mode. The number of registers saved is encoded in bits 9-12 (mask: 0x00001C00). Bits 0-9 (mask: 0x000003FF) contain which registers were saved and their order. (See the encodeCompactUnwindRegistersWithoutFrame() function in lib/Target/X86/X86FrameLowering.cpp for the encoding algorithm.)

Frameless with a Large Constant Stack Size (EBP or RBP is not used as a frame pointer)

This case is like the "Frameless with a Small Constant Stack Size" case, but the stack size is too large to encode in the compact unwind encoding. Instead it requires that the function contains "subl $nnnnnn, %esp" in its prolog. The compact encoding contains the offset to the $nnnnnn value in the function in bits 9-12 (mask: 0x00001C00).


Late Machine Code Optimizations

To Be Written

Code Emission

The code emission step of code generation is responsible for lowering from the code generator abstractions (like MachineFunction, MachineInstr, etc) down to the abstractions used by the MC layer (MCInst, MCStreamer, etc). This is done with a combination of several different classes: the (misnamed) target-independent AsmPrinter class, target-specific subclasses of AsmPrinter (such as SparcAsmPrinter), and the TargetLoweringObjectFile class.

Since the MC layer works at the level of abstraction of object files, it doesn't have a notion of functions, global variables etc. Instead, it thinks about labels, directives, and instructions. A key class used at this time is the MCStreamer class. This is an abstract API that is implemented in different ways (e.g. to output a .s file, output an ELF .o file, etc) that is effectively an "assembler API". MCStreamer has one method per directive, such as EmitLabel, EmitSymbolAttribute, SwitchSection, etc, which directly correspond to assembly level directives.

If you are interested in implementing a code generator for a target, there are three important things that you have to implement for your target:

  1. First, you need a subclass of AsmPrinter for your target. This class implements the general lowering process converting MachineFunction's into MC label constructs. The AsmPrinter base class provides a number of useful methods and routines, and also allows you to override the lowering process in some important ways. You should get much of the lowering for free if you are implementing an ELF, COFF, or MachO target, because the TargetLoweringObjectFile class implements much of the common logic.
  2. Second, you need to implement an instruction printer for your target. The instruction printer takes an MCInst and renders it to a raw_ostream as text. Most of this is automatically generated from the .td file (when you specify something like "add $dst, $src1, $src2" in the instructions), but you need to implement routines to print operands.
  3. Third, you need to implement code that lowers a MachineInstr to an MCInst, usually implemented in "<target>MCInstLower.cpp". This lowering process is often target specific, and is responsible for turning jump table entries, constant pool indices, global variable addresses, etc into MCLabels as appropriate. This translation layer is also responsible for expanding pseudo ops used by the code generator into the actual machine instructions they correspond to. The MCInsts that are generated by this are fed into the instruction printer or the encoder.

Finally, at your choosing, you can also implement a subclass of MCCodeEmitter which lowers MCInst's into machine code bytes and relocations. This is important if you want to support direct .o file emission, or would like to implement an assembler for your target.
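
For illustration, here is a hedged C++ sketch of the general shape of the MachineInstr-to-MCInst lowering step described above; the function name is hypothetical, and only register and immediate operands are handled (real targets also lower global addresses, jump table indices, constant pool entries, and pseudo instructions).

 #include "llvm/CodeGen/MachineInstr.h"
 #include "llvm/MC/MCInst.h"
 
 using namespace llvm;
 
 // Sketch of a <target>MCInstLower-style routine: copy the opcode and lower
 // each MachineOperand into an MCOperand.
 static void lowerToMCInst(const MachineInstr *MI, MCInst &OutMI) {
   OutMI.setOpcode(MI->getOpcode());
   for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
     const MachineOperand &MO = MI->getOperand(i);
     switch (MO.getType()) {
     case MachineOperand::MO_Register:
       OutMI.addOperand(MCOperand::CreateReg(MO.getReg()));
       break;
     case MachineOperand::MO_Immediate:
       OutMI.addOperand(MCOperand::CreateImm(MO.getImm()));
       break;
     default:
       break; // Other operand kinds are omitted from this sketch.
     }
   }
 }
 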

Implementing a Native Assembler

Though you're probably reading this because you want to write or maintain a compiler backend, LLVM also fully supports building native assemblers. We've tried hard to automate the generation of the assembler from the .td files (in particular the instruction syntax and encodings), which means that a large part of the manual and repetitive data entry can be factored and shared with the compiler.

Instruction Parsing

To Be Written

Instruction Alias Processing

Once the instruction is parsed, it enters the MatchInstructionImpl function. The MatchInstructionImpl function performs alias processing and then does actual matching.

Alias processing is the phase that canonicalizes different lexical forms of the same instructions down to one representation. There are several different kinds of aliases that can be implemented; they are listed below in the order in which they are processed (from simplest/weakest to most complex/powerful). Generally you want to use the first alias mechanism that meets the needs of your instruction, because it will allow a more concise description.

Mnemonic Aliases

The first phase of alias processing is simple instruction mnemonic remapping for classes of instructions which are allowed with two different mnemonics. This phase is a simple and unconditional remapping from one input mnemonic to one output mnemonic. It isn't possible for this form of alias to look at the operands at all, so the remapping must apply for all forms of a given mnemonic. Mnemonic aliases are defined simply, for example X86 has:

 def : MnemonicAlias<"cbw",     "cbtw">;
 def : MnemonicAlias<"smovq",   "movsq">;
 def : MnemonicAlias<"fldcww",  "fldcw">;
 def : MnemonicAlias<"fucompi", "fucomip">;
 def : MnemonicAlias<"ud2a",    "ud2">;
 

... and many others. With a MnemonicAlias definition, the mnemonic is remapped simply and directly. Though MnemonicAlias's can't look at any aspect of the instruction (such as the operands) they can depend on global modes (the same ones supported by the matcher), through a Requires clause:

 def : MnemonicAlias<"pushf", "pushfq">, Requires<[In64BitMode]>;
 def : MnemonicAlias<"pushf", "pushfl">, Requires<[In32BitMode]>;
 

In this example, the mnemonic is mapped to a different one depending on the current instruction set.

Instruction Aliases

The most general phase of alias processing occurs while matching is happening: it provides new forms for the matcher to match along with a specific instruction to generate. An instruction alias has two parts: the string to match and the instruction to generate. For example:

 def : InstAlias<"movsx $src, $dst", (MOVSX16rr8W GR16:$dst, GR8  :$src)>;
 def : InstAlias<"movsx $src, $dst", (MOVSX16rm8W GR16:$dst, i8mem:$src)>;
 def : InstAlias<"movsx $src, $dst", (MOVSX32rr8  GR32:$dst, GR8  :$src)>;
 def : InstAlias<"movsx $src, $dst", (MOVSX32rr16 GR32:$dst, GR16 :$src)>;
 def : InstAlias<"movsx $src, $dst", (MOVSX64rr8  GR64:$dst, GR8  :$src)>;
 def : InstAlias<"movsx $src, $dst", (MOVSX64rr16 GR64:$dst, GR16 :$src)>;
 def : InstAlias<"movsx $src, $dst", (MOVSX64rr32 GR64:$dst, GR32 :$src)>;
 

This shows a powerful example of the instruction aliases, matching the same mnemonic in multiple different ways depending on what operands are present in the assembly. The result of instruction aliases can include operands in a different order than the destination instruction, and can use an input multiple times, for example:

 def : InstAlias<"clrb $reg", (XOR8rr  GR8 :$reg, GR8 :$reg)>;
 def : InstAlias<"clrw $reg", (XOR16rr GR16:$reg, GR16:$reg)>;
 def : InstAlias<"clrl $reg", (XOR32rr GR32:$reg, GR32:$reg)>;
 def : InstAlias<"clrq $reg", (XOR64rr GR64:$reg, GR64:$reg)>;
 

This example also shows that tied operands are only listed once. In the X86 backend, XOR8rr has two input GR8's and one output GR8 (where an input is tied to the output). InstAliases take a flattened operand list without duplicates for tied operands. The result of an instruction alias can also use immediates and fixed physical registers which are added as simple immediate operands in the result, for example:

 // Fixed Immediate operand.
 def : InstAlias<"aad", (AAD8i8 10)>;
 
 // Fixed register operand.
 def : InstAlias<"fcomi", (COM_FIr ST1)>;
 
 // Simple alias.
 def : InstAlias<"fcomi $reg", (COM_FIr RST:$reg)>;
 

Instruction aliases can also have a Requires clause to make them subtarget specific.

If the back-end supports it, the instruction printer can automatically emit the alias rather than what's being aliased. It typically leads to better, more readable code. If it's better to print out what's being aliased, then pass a '0' as the third parameter to the InstAlias definition.

Instruction Matching

To Be Written

Target-specific Implementation Notes

This section of the document explains features or design decisions that are specific to the code generator for a particular target. First we start with a table that summarizes what features are supported by each target.

Target Feature Matrix

Note that this table does not include the C backend or Cpp backends, since they do not use the target independent code generator infrastructure. It also doesn't list features that are not supported fully by any target yet. It considers a feature to be supported if at least one subtarget supports it. A feature being supported means that it is useful and works for most cases; it does not indicate that there are zero known bugs in the implementation. Here is the key:

Unknown No support Partial Support Complete Support

Here is the table:

[Target feature matrix: for each of ARM, Alpha, Blackfin, CellSPU, MBlaze, MSP430, Mips, PTX, PowerPC, Sparc, SystemZ, X86, and XCore, the original table shows the level of support for the following features: is generally reliable, assembly parser, disassembler, inline asm, jit (* see the JIT note below), .o file writing, and tail calls. The color-coded cells do not survive in this text rendering.]

Is Generally Reliable

This box indicates whether the target is considered to be production quality. This indicates that the target has been used as a static compiler to compile large amounts of code by a variety of different people and is in continuous use.

Assembly Parser

This box indicates whether the target supports parsing target specific .s files by implementing the MCAsmParser interface. This is required for llvm-mc to be able to act as a native assembler and is required for inline assembly support in the native .o file writer.

Disassembler

This box indicates whether the target supports the MCDisassembler API for disassembling machine opcode bytes into MCInst's.

Inline Asm

This box indicates whether the target supports most popular inline assembly constraints and modifiers.

JIT Support

This box indicates whether the target supports the JIT compiler through the ExecutionEngine interface.

The ARM backend has basic support for integer code in ARM codegen mode, but lacks NEON and full Thumb support.

.o File Writing

This box indicates whether the target supports writing .o files (e.g. MachO, ELF, and/or COFF) directly from the target. Note that the target must also include an assembly parser and general inline assembly support for full inline assembly support in the .o writer.

Targets that don't support this feature can obviously still write out .o files; they just rely on having an external assembler to translate from a .s file to a .o file (as is the case for many C compilers).

Tail Calls

This box indicates whether the target supports guaranteed tail calls. These are calls marked "tail" and use the fastcc calling convention. Please see the tail call section for more details.

Tail call optimization

Tail call optimization, in which the callee reuses the stack of the caller, is currently supported on x86/x86-64 and PowerPC. It is performed if:

x86/x86-64 constraints:

PowerPC constraints:

Example:

Call as llc -tailcallopt test.ll.

 declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4)
 
 define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
   %l1 = add i32 %in1, %in2
   %tmp = tail call fastcc i32 @tailcallee(i32 %in1 inreg, i32 %in2 inreg, i32 %in1, i32 %l1)
   ret i32 %tmp
 }
 

Implications of -tailcallopt:

To support tail call optimization in situations where the callee has more arguments than the caller, a 'callee pops arguments' convention is used. This currently causes each fastcc call that is not tail call optimized (because one or more of the above constraints are not met) to be followed by a readjustment of the stack. So performance might be worse in such cases.

Sibling call optimization

Sibling call optimization is a restricted form of tail call optimization. Unlike the tail call optimization described in the previous section, it can be performed automatically on any tail call when the -tailcallopt option is not specified.

Sibling call optimization is currently performed on x86/x86-64 when the following constraints are met:

Example:

 declare i32 @bar(i32, i32)
 
 define i32 @foo(i32 %a, i32 %b, i32 %c) {
 entry:
   %0 = tail call i32 @bar(i32 %a, i32 %b)
   ret i32 %0
 }
 

The X86 backend

The X86 code generator lives in the lib/Target/X86 directory. This code generator is capable of targeting a variety of x86-32 and x86-64 processors, and includes support for ISA extensions such as MMX and SSE.

X86 Target Triples supported

The following are the known target triples that are supported by the X86 backend. This is not an exhaustive list, and it would be useful to add those that people test.

  • i686-pc-linux-gnu — Linux
  • i386-unknown-freebsd5.3 — FreeBSD 5.3
  • i686-pc-cygwin — Cygwin on Win32
  • i686-pc-mingw32 — MingW on Win32
  • i386-pc-mingw32msvc — MingW crosscompiler on Linux
  • i686-apple-darwin* — Apple Darwin on X86
  • x86_64-unknown-linux-gnu — Linux

X86 Calling Conventions supported

The following target-specific calling conventions are known to the backend:

  • x86_StdCall — stdcall calling convention seen on Microsoft Windows platform (CC ID = 64).
  • x86_FastCall — fastcall calling convention seen on Microsoft Windows platform (CC ID = 65).
  • x86_ThisCall — Similar to X86_StdCall. Passes the first argument in ECX and the others on the stack. The callee is responsible for cleaning up the stack. This convention is used by MSVC by default for methods in its ABI (CC ID = 70).

Representing X86 addressing modes in MachineInstrs

The x86 has a very flexible way of accessing memory. It is capable of directly forming memory addresses of the following form in integer instructions (which use ModR/M addressing):

 SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32
 

In order to represent this, LLVM tracks no less than 5 operands for each memory operand of this form. This means that the "load" form of 'mov' has the following MachineOperands in this order:

 Index:        0     |    1        2       3           4          5
 Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement Segment
 OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,   SignExtImm  PhysReg
 

Stores, and all other instructions, treat the five memory operands in the same way and in the same order. If the segment register is unspecified (regno = 0), then no segment override is generated. "Lea" operations do not have a segment register specified, so they only have 4 operands for their memory reference.
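
As a hedged illustration (the helper and its parameters are hypothetical, not code from the X86 backend), a load with such a memory reference could be built with MachineInstrBuilder by adding the destination register followed by the five memory operands in the order above:

 #include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/Target/TargetInstrInfo.h"
 
 using namespace llvm;
 
 // Sketch: build something like "movl %seg:4(%base,%index,2), %dest".
 static void emitExampleLoad(MachineBasicBlock &MBB,
                             MachineBasicBlock::iterator InsertPt, DebugLoc DL,
                             const TargetInstrInfo *TII,
                             unsigned Opcode,  // e.g. an X86 MOV32rm-style opcode
                             unsigned DestReg, unsigned BaseReg,
                             unsigned IndexReg, unsigned SegReg) {
   BuildMI(MBB, InsertPt, DL, TII->get(Opcode), DestReg)
       .addReg(BaseReg)   // BaseReg
       .addImm(2)         // Scale
       .addReg(IndexReg)  // IndexReg
       .addImm(4)         // Displacement
       .addReg(SegReg);   // Segment (0 means no segment override)
 }
 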

X86 address spaces supported

x86 has a feature which provides the ability to perform loads and stores to different address spaces via the x86 segment registers. A segment override prefix byte on an instruction causes the instruction's memory access to go to the specified segment. LLVM address space 0 is the default address space, which includes the stack, and any unqualified memory accesses in a program. Address spaces 1-255 are currently reserved for user-defined code. The GS-segment is represented by address space 256, while the FS-segment is represented by address space 257. Other x86 segments have yet to be allocated address space numbers.

While these address spaces may seem similar to TLS via the thread_local keyword, and often use the same underlying hardware, there are some fundamental differences.

The thread_local keyword applies to global variables and specifies that they are to be allocated in thread-local memory. There are no type qualifiers involved, and these variables can be pointed to with normal pointers and accessed with normal loads and stores. The thread_local keyword is target-independent at the LLVM IR level (though LLVM doesn't yet have implementations of it for some configurations).

Special address spaces, in contrast, apply to static types. Every load and store has a particular address space in its address operand type, and this is what determines which address space is accessed. LLVM ignores these special address space qualifiers on global variables, and does not provide a way to directly allocate storage in them. At the LLVM IR level, the behavior of these special address spaces depends in part on the underlying OS or runtime environment, and they are specific to x86 (and LLVM doesn't yet handle them correctly in some cases).

Some operating systems and runtime environments use (or may in the future use) the FS/GS-segment registers for various low-level purposes, so care should be taken when considering them.
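
For example, assuming Clang as the frontend, a GS-relative load can be expressed in C/C++ with the address_space attribute; the typedef and function names below are made up for illustration:

 // Address space 256 maps to the GS segment on x86, so loads through this
 // pointer type are emitted with a GS segment override.
 typedef __attribute__((address_space(256))) unsigned int gs_uint;
 
 unsigned int readGSWord(const gs_uint *P) {
   return *P;
 }
 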

Instruction naming

An instruction name consists of the base name, a default operand size, and a character per operand with an optional special size. For example:

 ADD8rr      -> add, 8-bit register, 8-bit register
 IMUL16rmi   -> imul, 16-bit register, 16-bit memory, 16-bit immediate
 IMUL16rmi8  -> imul, 16-bit register, 16-bit memory, 8-bit immediate
 MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
 

The PowerPC backend

The PowerPC code generator lives in the lib/Target/PowerPC directory. The code generation is retargetable to several variations or subtargets of the PowerPC ISA, including ppc32, ppc64, and altivec.

LLVM PowerPC ABI

LLVM follows the AIX PowerPC ABI, with two deviations. First, LLVM uses PC-relative (PIC) or static addressing for accessing global values, so no TOC (r2) is used. Second, r31 is used as a frame pointer to allow dynamic growth of a stack frame. LLVM takes advantage of having no TOC to provide space to save the frame pointer in the PowerPC linkage area of the caller frame. Other details of the PowerPC ABI can be found at PowerPC ABI. Note: This link describes the 32 bit ABI. The 64 bit ABI is similar except that space for GPRs is 8 bytes wide (not 4) and r13 is reserved for system use.

Frame Layout

The size of a PowerPC frame is usually fixed for the duration of a function's invocation. Since the frame is fixed size, all references into the frame can be accessed via fixed offsets from the stack pointer. The exception to this is when dynamic alloca or variable sized arrays are present; then a base pointer (r31) is used as a proxy for the stack pointer, and the stack pointer is free to grow or shrink. A base pointer is also used if llvm-gcc is not passed the -fomit-frame-pointer flag. The stack pointer is always aligned to 16 bytes, so that space allocated for altivec vectors will be properly aligned.

An invocation frame is laid out as follows (low memory at top):

Linkage

Parameter area

Dynamic area

Locals area

Saved registers area


Previous Frame

The linkage area is used by a callee to save special registers prior to allocating its own frame. Only three entries are relevant to LLVM. The first entry is the previous stack pointer (sp), aka link. This allows probing tools like gdb or exception handlers to quickly scan the frames in the stack. A function epilog can also use the link to pop the frame from the stack. The third entry in the linkage area is used to save the return address from the lr register. Finally, as mentioned above, the last entry is used to save the previous frame pointer (r31.) The entries in the linkage area are the size of a GPR, thus the linkage area is 24 bytes long in 32 bit mode and 48 bytes in 64 bit mode.

32 bit linkage area

0 Saved SP (r1)
4 Saved CR
8 Saved LR
12 Reserved
16 Reserved
20 Saved FP (r31)

64 bit linkage area

0 Saved SP (r1)
8 Saved CR
16 Saved LR
24 Reserved
32 Reserved
40 Saved FP (r31)

The parameter area is used to store arguments being passed to a callee function. Following the PowerPC ABI, the first few arguments are actually passed in registers, with the space in the parameter area unused. However, if there are not enough registers or the callee is a thunk or vararg function, these register arguments can be spilled into the parameter area. Thus, the parameter area must be large enough to store all the parameters for the largest call sequence made by the caller. The size must also be minimally large enough to spill registers r3-r10. This allows callees blind to the call signature, such as thunks and vararg functions, enough space to cache the argument registers. Therefore, the parameter area is minimally 32 bytes (64 bytes in 64 bit mode). Also note that since the parameter area is a fixed offset from the top of the frame, a callee can access its spilled arguments using fixed offsets from the stack pointer (or base pointer).

Combining the information about the linkage area, the parameter area, and alignment: a stack frame is minimally 64 bytes in 32 bit mode (24 bytes of linkage plus the 32 byte minimum parameter area is 56 bytes, rounded up to the 16 byte stack alignment) and 128 bytes in 64 bit mode (48 plus 64 is 112 bytes, rounded up to 128).

The dynamic area starts out as size zero. If a function uses dynamic alloca then space is added to the stack, the linkage and parameter areas are shifted to the top of the stack, and the new space is available immediately below the linkage and parameter areas. The cost of shifting the linkage and parameter areas is minor since only the link value needs to be copied. The link value can be easily fetched by adding the original frame size to the base pointer. Note that allocations in the dynamic space need to observe 16 byte alignment.

The locals area is where the llvm compiler reserves space for local variables.

The saved registers area is where the llvm compiler spills callee saved registers on entry to the callee.

Prolog/Epilog

The llvm prolog and epilog are the same as described in the PowerPC ABI, with the following exceptions. Callee saved registers are spilled after the frame is created. This allows the llvm epilog/prolog support to be common with other targets. The base pointer callee saved register r31 is saved in the TOC slot of the linkage area. This simplifies allocation of space for the base pointer and makes it convenient to locate programmatically and during debugging.

Dynamic Allocation

TODO - More to come.

The PTX backend

The PTX code generator lives in the lib/Target/PTX directory. It is currently a work-in-progress, but already supports most of the code generation functionality needed to generate correct PTX kernels for CUDA devices.

The code generator can target PTX 2.0+, and shader model 1.0+. The PTX ISA Reference Manual is used as the primary source of ISA information, though an effort is made to make the output of the code generator match the output of the NVidia nvcc compiler, whenever possible.

Code Generator Options:

  • double: If enabled, the map_f64_to_f32 directive is disabled in the PTX output, allowing native double-precision arithmetic.
  • no-fma: Disable generation of Fused-Multiply Add instructions, which may be beneficial for some devices.
  • smxy / computexy: Set the shader model/compute capability to x.y, e.g. sm20 or compute13.

Working:

In Progress:


Index: vendor/llvm/dist/docs/CodingStandards.html =================================================================== --- vendor/llvm/dist/docs/CodingStandards.html (revision 228363) +++ vendor/llvm/dist/docs/CodingStandards.html (revision 228364) @@ -1,1533 +1,1534 @@ + LLVM Coding Standards

LLVM Coding Standards

  1. Introduction
  2. Mechanical Source Issues
    1. Source Code Formatting
      1. Commenting
      2. Comment Formatting
      3. #include Style
      4. Source Code Width
      5. Use Spaces Instead of Tabs
      6. Indent Code Consistently
    2. Compiler Issues
      1. Treat Compiler Warnings Like Errors
      2. Write Portable Code
      3. Do not use RTTI or Exceptions
      4. Use of class/struct Keywords
  3. Style Issues
    1. The High-Level Issues
      1. A Public Header File is a Module
      2. #include as Little as Possible
      3. Keep "internal" Headers Private
      4. Use Early Exits and continue to Simplify Code
      5. Don't use else after a return
      6. Turn Predicate Loops into Predicate Functions
    2. The Low-Level Issues
      1. Name Types, Functions, Variables, and Enumerators Properly
      2. Assert Liberally
      3. Do not use 'using namespace std'
      4. Provide a virtual method anchor for classes in headers
      5. Don't evaluate end() every time through a loop
      6. #include <iostream> is forbidden
      7. Use raw_ostream
      8. Avoid std::endl
    3. Microscopic Details
      1. Spaces Before Parentheses
      2. Prefer Preincrement
      3. Namespace Indentation
      4. Anonymous Namespaces
  4. See Also

Written by Chris Lattner

Introduction

This document attempts to describe a few coding standards that are being used in the LLVM source tree. Although no coding standards should be regarded as absolute requirements to be followed in all instances, coding standards can be useful.

This document intentionally does not prescribe fixed standards for religious issues such as brace placement and space usage. For issues like this, follow the golden rule:

If you are adding a significant body of source to a project, feel free to use whatever style you are most comfortable with. If you are extending, enhancing, or bug fixing already implemented code, use the style that is already being used so that the source is uniform and easy to follow.

The ultimate goal of these guidelines is to increase the readability and maintainability of our common source base. If you have suggestions for topics to be included, please mail them to Chris.

Mechanical Source Issues

Source Code Formatting

Commenting

Comments are one critical part of readability and maintainability. Everyone knows they should comment their code, and so should you. When writing comments, write them as English prose, which means they should use proper capitalization, punctuation, etc. Although we all should probably comment our code more than we do, there are a few critical places where documentation is very useful:

File Headers

Every source file should have a header on it that describes the basic purpose of the file. If a file does not have a header, it should not be checked into Subversion. Most source trees will probably have a standard file header format. The standard format for the LLVM source tree looks like this:

 //===-- llvm/Instruction.h - Instruction class definition -------*- C++ -*-===//
 //
 //                     The LLVM Compiler Infrastructure
 //
 // This file is distributed under the University of Illinois Open Source
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
 //
 // This file contains the declaration of the Instruction class, which is the
 // base class for all of the VM instructions.
 //
 //===----------------------------------------------------------------------===//
 

A few things to note about this particular format: The "-*- C++ -*-" string on the first line is there to tell Emacs that the source file is a C++ file, not a C file (Emacs assumes .h files are C files by default). Note that this tag is not necessary in .cpp files. The name of the file is also on the first line, along with a very short description of the purpose of the file. This is important when printing out code and flipping through lots of pages.

The next section in the file is a concise note that defines the license that the file is released under. This makes it perfectly clear what terms the source code can be distributed under and should not be modified in any way.

The main body of the description does not have to be very long in most cases. Here it's only two lines. If an algorithm is being implemented or something tricky is going on, a reference to the paper where it is published should be included, as well as any notes or "gotchas" in the code to watch out for.

Class overviews

Classes are one fundamental part of a good object oriented design. As such, a class definition should have a comment block that explains what the class is used for... if it's not obvious. If it's so completely obvious your grandma could figure it out, it's probably safe to leave it out. Naming classes something sane goes a long way towards avoiding writing documentation.

Method information

Methods defined in a class (as well as any global functions) should also be documented properly. A quick note about what it does and a description of the borderline behaviour is all that is necessary here (unless something particularly tricky or insidious is going on). The hope is that people can figure out how to use your interfaces without reading the code itself... that is the goal metric.

Good things to talk about here are what happens when something unexpected happens: does the method return null? Abort? Format your hard disk?

Comment Formatting

In general, prefer C++ style (//) comments. They take less space, require less typing, don't have nesting problems, etc. There are a few cases when it is useful to use C style (/* */) comments however:

  1. When writing C code: Obviously if you are writing C code, use C style comments.
  2. When writing a header file that may be #included by a C source file.
  3. When writing a source file that is used by a tool that only accepts C style comments.

To comment out a large block of code, use #if 0 and #endif. These nest properly and are better behaved in general than C style comments.
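
For example (the disabled call below is hypothetical):

 #if 0
   // Temporarily disabled while the replacement lowering is being debugged.
   runOldLowering(F);
 #endif
 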

#include Style

Immediately after the header file comment (and include guards if working on a header file), the minimal list of #includes required by the file should be listed. We prefer these #includes to be listed in this order:

  1. Main Module Header
  2. Local/Private Headers
  3. llvm/*
  4. llvm/Analysis/*
  5. llvm/Assembly/*
  6. llvm/Bitcode/*
  7. llvm/CodeGen/*
  8. ...
  9. Support/*
  10. Config/*
  11. System #includes

and each category should be sorted by name.

The "Main Module Header" file applies to .cpp files which implement an interface defined by a .h file. This #include should always be included first regardless of where it lives on the file system. By including a header file first in the .cpp files that implement the interfaces, we ensure that the header does not have any hidden dependencies which are not explicitly #included in the header, but should be. It is also a form of documentation in the .cpp file to indicate where the interfaces it implements are defined.

Source Code Width

Write your code to fit within 80 columns of text. This helps those of us who like to print out code and look at your code in an xterm without resizing it.

The longer answer is that there must be some limit to the width of the code in order to reasonably allow developers to have multiple files side-by-side in windows on a modest display. If you are going to pick a width limit, it is somewhat arbitrary but you might as well pick something standard. Going with 90 columns (for example) instead of 80 columns wouldn't add any significant value and would be detrimental to printing out code. Also many other projects have standardized on 80 columns, so some people have already configured their editors for it (vs something else, like 90 columns).

This is one of many contentious issues in coding standards, but it is not up for debate.

Use Spaces Instead of Tabs

In all cases, prefer spaces to tabs in source files. People have different preferred indentation levels, and different styles of indentation that they like; this is fine. What isn't fine is that different editors/viewers expand tabs out to different tab stops. This can cause your code to look completely unreadable, and it is not worth dealing with.

As always, follow the Golden Rule above: follow the style of existing code if you are modifying and extending it. If you like four spaces of indentation, DO NOT do that in the middle of a chunk of code with two spaces of indentation. Also, do not reindent a whole source file: it makes for incredible diffs that are absolutely worthless.

Indent Code Consistently

Okay, in your first year of programming you were told that indentation is important. If you didn't believe and internalize this then, now is the time. Just do it.

Compiler Issues

Treat Compiler Warnings Like Errors

If your code has compiler warnings in it, something is wrong — you aren't casting values correctly, you have "questionable" constructs in your code, or you are doing something legitimately wrong. Compiler warnings can cover up legitimate errors in output and make dealing with a translation unit difficult.

It is not possible to prevent all warnings from all compilers, nor is it desirable. Instead, pick a standard compiler (like gcc) that provides a good thorough set of warnings, and stick to it. At least in the case of gcc, it is possible to work around any spurious errors by changing the syntax of the code slightly. For example, a warning that annoys me occurs when I write code like this:

 if (V = getValue()) {
   ...
 }
 

gcc will warn me that I probably want to use the == operator, and that I probably mistyped it. In most cases, I haven't, and I really don't want the spurious errors. To fix this particular problem, I rewrite the code like this:

 if ((V = getValue())) {
   ...
 }
 

which shuts gcc up. Any gcc warning that annoys you can be fixed by massaging the code appropriately.

These are the gcc warnings that I prefer to enable:

 -Wall -Winline -W -Wwrite-strings -Wno-unused
 

Write Portable Code

In almost all cases, it is possible and within reason to write completely portable code. If there are cases where it isn't possible to write portable code, isolate it behind a well defined (and well documented) interface.

In practice, this means that you shouldn't assume much about the host compiler, and Visual Studio tends to be the lowest common denominator. If advanced features are used, they should only be an implementation detail of a library which has a simple exposed API, and preferably be buried in libSystem.

Do not use RTTI or Exceptions

In an effort to reduce code and executable size, LLVM does not use RTTI (e.g. dynamic_cast<>) or exceptions. These two language features violate the general C++ principle of "you only pay for what you use", causing executable bloat even if exceptions are never used in the code base, or if RTTI is never used for a class. Because of this, we turn them off globally in the code.

That said, LLVM does make extensive use of a hand-rolled form of RTTI that use templates like isa<>, cast<>, and dyn_cast<>. This form of RTTI is opt-in and can be added to any class. It is also substantially more efficient than dynamic_cast<>.

Use of class and struct Keywords

In C++, the class and struct keywords can be used almost interchangeably. The only difference is when they are used to declare a class: class makes all members private by default while struct makes all members public by default.

Unfortunately, not all compilers follow the rules and some will generate different symbols based on whether class or struct was used to declare the symbol. This can lead to problems at link time.

So, the rule for LLVM is to always use the class keyword, unless all members are public and the type is a C++ POD type, in which case struct is allowed.
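
A short illustrative example (the type names are made up):

 // OK to use 'struct': every member is public and the type is a C++ POD.
 struct Coordinate {
   unsigned X;
   unsigned Y;
 };
 
 // Everything else uses 'class', even when most members happen to be public.
 class SymbolCounter {
 public:
   explicit SymbolCounter(unsigned Initial) : Count(Initial) {}
   unsigned getCount() const { return Count; }
 private:
   unsigned Count;
 };
 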

Style Issues

The High-Level Issues

A Public Header File is a Module

C++ doesn't do too well in the modularity department. There is no real encapsulation or data hiding (unless you use expensive protocol classes), but it is what we have to work with. When you write a public header file (in the LLVM source tree, they live in the top level "include" directory), you are defining a module of functionality.

Ideally, modules should be completely independent of each other, and their header files should only #include the absolute minimum number of headers possible. A module is not just a class, a function, or a namespace: it's a collection of these that defines an interface. This interface may be several functions, classes, or data structures, but the important issue is how they work together.

In general, a module should be implemented by one or more .cpp files. Each of these .cpp files should include the header that defines their interface first. This ensures that all of the dependences of the module header have been properly added to the module header itself, and are not implicit. System headers should be included after user headers for a translation unit.

#include as Little as Possible

#include hurts compile time performance. Don't do it unless you have to, especially in header files.

But wait! Sometimes you need to have the definition of a class to use it, or to inherit from it. In these cases go ahead and #include that header file. Be aware however that there are many cases where you don't need to have the full definition of a class. If you are using a pointer or reference to a class, you don't need the header file. If you are simply returning a class instance from a prototyped function or method, you don't need it. In fact, for most cases, you simply don't need the definition of a class. And not #include'ing speeds up compilation.

It is easy to go overboard on this recommendation, however. You must include all of the header files that you are using — you can include them either directly or indirectly (through another header file). To make sure that you don't accidentally forget to include a header file in your module header, make sure to include your module header first in the implementation file (as mentioned above). This way there won't be any hidden dependencies that you'll find out about later.

Keep "Internal" Headers Private

Many modules have a complex implementation that causes them to use more than one implementation (.cpp) file. It is often tempting to put the internal communication interface (helper classes, extra functions, etc) in the public module header file. Don't do this!

If you really need to do something like this, put a private header file in the same directory as the source files, and include it locally. This ensures that your private interface remains private and undisturbed by outsiders.

Note however, that it's okay to put extra implementation methods in a public class itself. Just make them private (or protected) and all is well.

Use Early Exits and continue to Simplify Code

When reading code, keep in mind how much state and how many previous decisions have to be remembered by the reader to understand a block of code. Aim to reduce indentation where possible when it doesn't make it more difficult to understand the code. One great way to do this is by making use of early exits and the continue keyword in long loops. As an example of using an early exit from a function, consider this "bad" code:

 Value *DoSomething(Instruction *I) {
   if (!isa<TerminatorInst>(I) &&
       I->hasOneUse() && SomeOtherThing(I)) {
     ... some long code ....
   }
   
   return 0;
 }
 

This code has several problems if the body of the 'if' is large. When you're looking at the top of the function, it isn't immediately clear that this only does interesting things with non-terminator instructions, and only applies to things with the other predicates. Second, it is relatively difficult to describe (in comments) why these predicates are important because the if statement makes it difficult to lay out the comments. Third, when you're deep within the body of the code, it is indented an extra level. Finally, when reading the top of the function, it isn't clear what the result is if the predicate isn't true; you have to read to the end of the function to know that it returns null.

It is much preferred to format the code like this:

 Value *DoSomething(Instruction *I) {
   // Terminators never need 'something' done to them because ... 
   if (isa<TerminatorInst>(I))
     return 0;
 
   // We conservatively avoid transforming instructions with multiple uses
   // because goats like cheese.
   if (!I->hasOneUse())
     return 0;
 
   // This is really just here for example.
   if (!SomeOtherThing(I))
     return 0;
     
   ... some long code ....
 }
 

This fixes these problems. A similar problem frequently happens in for loops. A silly example is something like this:

   for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E; ++II) {
     if (BinaryOperator *BO = dyn_cast<BinaryOperator>(II)) {
       Value *LHS = BO->getOperand(0);
       Value *RHS = BO->getOperand(1);
       if (LHS != RHS) {
         ...
       }
     }
   }
 

When you have very, very small loops, this sort of structure is fine. But if it exceeds 10-15 lines, it becomes difficult for people to read and understand at a glance. The problem with this sort of code is that it gets very nested very quickly, meaning that the reader of the code has to keep a lot of context in their brain to remember what is going on in the loop, because they don't know if/when the if conditions will have elses, etc. It is strongly preferred to structure the loop like this:

   for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E; ++II) {
     BinaryOperator *BO = dyn_cast<BinaryOperator>(II);
     if (!BO) continue;
     
     Value *LHS = BO->getOperand(0);
     Value *RHS = BO->getOperand(1);
     if (LHS == RHS) continue;
 
     ...
   }
 

This has all the benefits of using early exits for functions: it reduces nesting of the loop, it makes it easier to describe why the conditions are true, and it makes it obvious to the reader that there is no else coming up that they have to push context into their brain for. If a loop is large, this can be a big understandability win.

Don't use else after a return

For reasons similar to those above (reduction of indentation and easier reading), please do not use 'else' or 'else if' after something that interrupts control flow — like return, break, continue, goto, etc. For example, this is bad:

   case 'J': {
     if (Signed) {
       Type = Context.getsigjmp_bufType();
       if (Type.isNull()) {
         Error = ASTContext::GE_Missing_sigjmp_buf;
         return QualType();
       } else {
         break;
       }
     } else {
       Type = Context.getjmp_bufType();
       if (Type.isNull()) {
         Error = ASTContext::GE_Missing_jmp_buf;
         return QualType();
       } else {
         break;
       }
     }
   }
   }
 

It is better to write it like this:

   case 'J':
     if (Signed) {
       Type = Context.getsigjmp_bufType();
       if (Type.isNull()) {
         Error = ASTContext::GE_Missing_sigjmp_buf;
         return QualType();
       }
     } else {
       Type = Context.getjmp_bufType();
       if (Type.isNull()) {
         Error = ASTContext::GE_Missing_jmp_buf;
         return QualType();
       }
     }
     break;
 

Or better yet (in this case) as:

   case 'J':
     if (Signed)
       Type = Context.getsigjmp_bufType();
     else
       Type = Context.getjmp_bufType();
     
     if (Type.isNull()) {
       Error = Signed ? ASTContext::GE_Missing_sigjmp_buf :
                        ASTContext::GE_Missing_jmp_buf;
       return QualType();
     }
     break;
 

The idea is to reduce indentation and the amount of code you have to keep track of when reading the code.

Turn Predicate Loops into Predicate Functions

It is very common to write small loops that just compute a boolean value. There are a number of ways that people commonly write these, but an example of this sort of thing is:

   bool FoundFoo = false;
   for (unsigned i = 0, e = BarList.size(); i != e; ++i)
     if (BarList[i]->isFoo()) {
       FoundFoo = true;
       break;
     }
     
   if (FoundFoo) {
     ...
   }
 

This sort of code is awkward to write, and is almost always a bad sign. Instead of this sort of loop, we strongly prefer to use a predicate function (which may be static) that uses early exits to compute the predicate. We prefer the code to be structured like this:

 /// ListContainsFoo - Return true if the specified list has an element that is
 /// a foo.
 static bool ListContainsFoo(const std::vector<Bar*> &List) {
   for (unsigned i = 0, e = List.size(); i != e; ++i)
     if (List[i]->isFoo())
       return true;
   return false;
 }
 ...
 
   if (ListContainsFoo(BarList)) {
     ...
   }
 

There are many reasons for doing this: it reduces indentation and factors out code which can often be shared by other code that checks for the same predicate. More importantly, it forces you to pick a name for the function, and forces you to write a comment for it. In this silly example, this doesn't add much value. However, if the condition is complex, this can make it a lot easier for the reader to understand the code that queries for this predicate. Instead of being faced with the in-line details of how we check to see if the BarList contains a foo, we can trust the function name and continue reading with better locality.

The Low-Level Issues

Name Types, Functions, Variables, and Enumerators Properly

Poorly-chosen names can mislead the reader and cause bugs. We cannot stress enough how important it is to use descriptive names. Pick names that match the semantics and role of the underlying entities, within reason. Avoid abbreviations unless they are well known. After picking a good name, make sure to use consistent capitalization for the name, as inconsistency requires clients to either memorize the APIs or to look it up to find the exact spelling.

In general, names should be in camel case (e.g. TextFileReader and isLValue()). Different kinds of declarations have different rules:

  • Type names (including classes, structs, enums, typedefs, etc) should be nouns and start with an upper-case letter (e.g. TextFileReader).

  • Variable names should be nouns (as they represent state). The name should be camel case, and start with an upper case letter (e.g. Leader or Boats).

  • Function names should be verb phrases (as they represent actions), and command-like function should be imperative. The name should be camel case, and start with a lower case letter (e.g. openFile() or isFoo()).

  • Enum declarations (e.g. enum Foo {...}) are types, so they should follow the naming conventions for types. A common use for enums is as a discriminator for a union, or an indicator of a subclass. When an enum is used for something like this, it should have a Kind suffix (e.g. ValueKind).

  • Enumerators (e.g. enum { Foo, Bar }) and public member variables should start with an upper-case letter, just like types. Unless the enumerators are defined in their own small namespace or inside a class, enumerators should have a prefix corresponding to the enum declaration name. For example, enum ValueKind { ... }; may contain enumerators like VK_Argument, VK_BasicBlock, etc. Enumerators that are just convenience constants are exempt from the requirement for a prefix. For instance:

     enum {
       MaxSize = 42,
       Density = 12
     };
     

As an exception, classes that mimic STL classes can have member names in STL's style of lower-case words separated by underscores (e.g. begin(), push_back(), and empty()).

Here are some examples of good and bad names:

 class VehicleMaker {
   ...
   Factory<Tire> F;            // Bad -- abbreviation and non-descriptive.
   Factory<Tire> Factory;      // Better.
   Factory<Tire> TireFactory;  // Even better -- if VehicleMaker has more than one
                              // kind of factory.
 };
 
 Vehicle MakeVehicle(VehicleType Type) {
   VehicleMaker M;                         // Might be OK if having a short life-span.
   Tire tmp1 = M.makeTire();               // Bad -- 'tmp1' provides no information.
   Light headlight = M.makeLight("head");  // Good -- descriptive.
   ...
 }
 

Assert Liberally

Use the "assert" macro to its fullest. Check all of your preconditions and assumptions, you never know when a bug (not necessarily even yours) might be caught early by an assertion, which reduces debugging time dramatically. The "<cassert>" header file is probably already included by the header files you are using, so it doesn't cost anything to use it.

To further assist with debugging, make sure to put some kind of error message in the assertion statement, which is printed if the assertion is tripped. This helps the poor debugger make sense of why an assertion is being made and enforced, and hopefully what to do about it. Here is one complete example:

 inline Value *getOperand(unsigned i) { 
   assert(i < Operands.size() && "getOperand() out of range!");
   return Operands[i]; 
 }
 

Here are more examples:

 assert(Ty->isPointerType() && "Can't allocate a non pointer type!");
 
 assert((Opcode == Shl || Opcode == Shr) && "ShiftInst Opcode invalid!");
 
 assert(idx < getNumSuccessors() && "Successor # out of range!");
 
 assert(V1.getType() == V2.getType() && "Constant types must be identical!");
 
 assert(isa<PHINode>(Succ->front()) && "Only works on PHId BBs!");
 

You get the idea.

Please be aware that, when adding assert statements, not all compilers are aware of the semantics of the assert. In some places, asserts are used to indicate a piece of code that should not be reached. These are typically of the form:

 assert(0 && "Some helpful error message");
 

When used in a function that returns a value, they should be followed with a return statement and a comment indicating that this line is never reached. This will prevent a compiler which is unable to deduce that the assert statement never returns from generating a warning.

 assert(0 && "Some helpful error message");
 // Not reached
 return 0;
 

Another issue is that values used only by assertions will produce an "unused value" warning when assertions are disabled. For example, this code will warn:

 unsigned Size = V.size();
 assert(Size > 42 && "Vector smaller than it should be");
 
 bool NewToSet = Myset.insert(Value);
 assert(NewToSet && "The value shouldn't be in the set yet");
 

These two cases are interestingly different. In the first case, the call to V.size() is only useful for the assert, and we don't want it executed when assertions are disabled. Code like this should move the call into the assert itself. In the second case, the side effects of the call must happen whether the assert is enabled or not. In this case, the value should be cast to void to disable the warning. To be specific, it is preferred to write the code like this:

 assert(V.size() > 42 && "Vector smaller than it should be");
 
 bool NewToSet = Myset.insert(Value); (void)NewToSet;
 assert(NewToSet && "The value shouldn't be in the set yet");
 

Do Not Use 'using namespace std'

In LLVM, we prefer to explicitly prefix all identifiers from the standard namespace with an "std::" prefix, rather than rely on "using namespace std;".

In header files, adding a 'using namespace XXX' directive pollutes the namespace of any source file that #includes the header. This is clearly a bad thing.

In implementation files (e.g. .cpp files), the rule is more of a stylistic rule, but is still important. Basically, using explicit namespace prefixes makes the code clearer, because it is immediately obvious what facilities are being used and where they are coming from. It also makes the code more portable, because namespace clashes cannot occur between LLVM code and other namespaces. The portability rule is important because different standard library implementations expose different symbols (potentially ones they shouldn't), and future revisions to the C++ standard will add more symbols to the std namespace. As such, we never use 'using namespace std;' in LLVM.

The exception to the general rule (i.e. it's not an exception for the std namespace) is for implementation files. For example, all of the code in the LLVM project implements code that lives in the 'llvm' namespace. As such, it is ok, and actually clearer, for the .cpp files to have a 'using namespace llvm;' directive at the top, after the #includes. This reduces indentation in the body of the file for source editors that indent based on braces, and keeps the conceptual context cleaner. The general form of this rule is that any .cpp file that implements code in any namespace may use that namespace (and its parents'), but should not use any others.
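
For example, a typical implementation file in the tree might start like this (the file itself is hypothetical; the headers are real):

 //===-- ExampleTool.cpp ---------------------------------------------------===//
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Support/raw_ostream.h"
 #include <string>
 
 using namespace llvm;  // OK: this file implements code in the 'llvm' namespace.
 
 // Standard library names keep their explicit std:: prefix.
 static std::string makeBanner() { return std::string("example"); }
 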

Provide a Virtual Method Anchor for Classes in Headers

If a class is defined in a header file and has a v-table (either it has virtual methods or it derives from classes with virtual methods), it must always have at least one out-of-line virtual method in the class. Without this, the compiler will copy the vtable and RTTI into every .o file that #includes the header, bloating .o file sizes and increasing link times.
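
One common way to do this is to declare a dedicated out-of-line virtual method; here is a sketch with hypothetical class names:

 // In the header: at least one virtual method is only declared here.
 class Shape {
 public:
   virtual ~Shape();
   virtual double area() const = 0;
 private:
   virtual void anchor();  // Out-of-line virtual method anchor.
 };
 
 // In exactly one .cpp file: the anchor (and the destructor) are defined,
 // so the vtable is emitted only in that translation unit.
 void Shape::anchor() {}
 Shape::~Shape() {}
 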

Don't evaluate end() every time through a loop

Because C++ doesn't have a standard "foreach" loop (though it can be emulated with macros and may be coming in C++'0x) we end up writing a lot of loops that manually iterate from begin to end on a variety of containers or through other data structures. One common mistake is to write a loop in this style:

   BasicBlock *BB = ...
   for (BasicBlock::iterator I = BB->begin(); I != BB->end(); ++I)
      ... use I ...
 

The problem with this construct is that it evaluates "BB->end()" every time through the loop. Instead of writing the loop like this, we strongly prefer loops to be written so that they evaluate it once before the loop starts. A convenient way to do this is like so:

   BasicBlock *BB = ...
   for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
      ... use I ...
 

The observant may quickly point out that these two loops may have different semantics: if the container (a basic block in this case) is being mutated, then "BB->end()" may change its value every time through the loop and the second loop may not in fact be correct. If you actually do depend on this behavior, please write the loop in the first form and add a comment indicating that you did it intentionally.

Why do we prefer the second form (when correct)? Writing the loop in the first form has two problems. First it may be less efficient than evaluating it at the start of the loop. In this case, the cost is probably minor — a few extra loads every time through the loop. However, if the base expression is more complex, then the cost can rise quickly. I've seen loops where the end expression was actually something like: "SomeMap[x]->end()" and map lookups really aren't cheap. By writing it in the second form consistently, you eliminate the issue entirely and don't even have to think about it.

The second (even bigger) issue is that writing the loop in the first form hints to the reader that the loop is mutating the container (a fact that a comment would handily confirm!). If you write the loop in the second form, it is immediately obvious without even looking at the body of the loop that the container isn't being modified, which makes it easier to read the code and understand what it does.

While the second form of the loop is a few extra keystrokes, we do strongly prefer it.

#include <iostream> is Forbidden

The use of #include <iostream> in library files is hereby forbidden. The primary reason for doing this is to support clients using LLVM libraries as part of larger systems. In particular, we statically link LLVM into some dynamic libraries. Even if LLVM isn't used, the static constructors are run whenever an application starts up that uses the dynamic library. There are two problems with this:

  1. The time to run the static c'tors impacts startup time of applications — a critical time for GUI apps.
  2. The static c'tors cause the app to pull many extra pages of memory off the disk: both the code for the static c'tors in each .o file and the small amount of data that gets touched. In addition, touched/dirty pages put more pressure on the VM system on low-memory machines.

Note that using the other stream headers (<sstream> for example) is not problematic in this regard — just <iostream>. However, raw_ostream provides various APIs that are better performing for almost every use than std::ostream style APIs. Therefore new code should always use raw_ostream for writing, or the llvm::MemoryBuffer API for reading files.

Use raw_ostream

LLVM includes a lightweight, simple, and efficient stream implementation in llvm/Support/raw_ostream.h, which provides all of the common features of std::ostream. All new code should use raw_ostream instead of ostream.

Unlike std::ostream, raw_ostream is not a template and can be forward declared as class raw_ostream. Public headers should generally not include the raw_ostream header, but use forward declarations and constant references to raw_ostream instances.
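
For example, a public header might look like this (the Diagnostic class is hypothetical):

 // In a public header: forward declare raw_ostream rather than including
 // llvm/Support/raw_ostream.h.
 namespace llvm {
 class raw_ostream;
 
 class Diagnostic {
 public:
   void print(raw_ostream &OS) const;  // A reference works with a forward decl.
 };
 } // end namespace llvm
 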

Avoid std::endl

The std::endl modifier, when used with iostreams, outputs a newline to the specified output stream. In addition to doing this, however, it also flushes the output stream. In other words, these are equivalent:

 std::cout << std::endl;
 std::cout << '\n' << std::flush;
 

Most of the time, you probably have no reason to flush the output stream, so it's better to use a literal '\n'.

Microscopic Details

This section describes preferred low-level formatting guidelines along with reasoning on why we prefer them.

Spaces Before Parentheses

We prefer to put a space before an open parenthesis only in control flow statements, but not in normal function call expressions and function-like macros. For example, this is good:

 if (x) ...
 for (i = 0; i != 100; ++i) ...
 while (llvm_rocks) ...
 
 somefunc(42);
 assert(3 != 4 && "laws of math are failing me");
   
 a = foo(42, 92) + bar(x);
 

and this is bad:

 if(x) ...
 for(i = 0; i != 100; ++i) ...
 while(llvm_rocks) ...
 
 somefunc (42);
 assert (3 != 4 && "laws of math are failing me");
   
 a = foo (42, 92) + bar (x);
 

The reason for doing this is not completely arbitrary. This style makes control flow operators stand out more, and makes expressions flow better. The function call operator binds very tightly as a postfix operator. Putting a space after a function name (as in the last example) weakens that visual binding, so the argument list can appear to associate with the surrounding binary operator rather than with the function name. More specifically, it is easy to misread the "a" example as:

 a = foo ((42, 92) + bar) (x);
 

when skimming through the code. By not putting a space before a function's argument list, we avoid this misinterpretation.

Prefer Preincrement

Hard and fast rule: Preincrement (++X) may be no slower than postincrement (X++) and could very well be a lot faster. Use preincrement whenever possible.

The semantics of postincrement include making a copy of the value being incremented, returning it, and then preincrementing the "work value". For primitive types, this isn't a big deal... but for iterators, it can be a huge issue (for example, some iterators contain stack and set objects in them... copying an iterator could invoke the copy ctors of these as well). In general, get in the habit of always using preincrement, and you won't have a problem.
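As a minimal sketch of the habit (countLongNames is hypothetical; std::set stands in for any container whose iterators are non-trivial to copy):

 #include <set>
 #include <string>
 
 // Preincrement the iterator and the counter: no temporary copies are made.
 unsigned countLongNames(const std::set<std::string> &Names) {
   unsigned Count = 0;
   for (std::set<std::string>::const_iterator I = Names.begin(),
        E = Names.end(); I != E; ++I)
     if (I->size() > 8)
       ++Count;
   return Count;
 }
 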

Namespace Indentation

In general, we strive to reduce indentation wherever possible. This is useful because we want code to fit into 80 columns without wrapping horribly, but also because it makes it easier to understand the code. Namespaces are a funny thing: they are often large, and we often desire to put lots of stuff into them (so they can be large). Other times they are tiny, because they just hold an enum or something similar. In order to balance this, we use different approaches for small versus large namespaces.

If a namespace definition is small and easily fits on a screen (say, less than 35 lines of code), then you should indent its body. Here's an example:

 namespace llvm {
   namespace X86 {
     /// RelocationType - An enum for the x86 relocation codes. Note that
     /// the terminology here doesn't follow x86 convention - word means
     /// 32-bit and dword means 64-bit.
     enum RelocationType {
       /// reloc_pcrel_word - PC relative relocation, add the relocated value to
       /// the value already in memory, after we adjust it for where the PC is.
       reloc_pcrel_word = 0,
 
       /// reloc_picrel_word - PIC base relative relocation, add the relocated
       /// value to the value already in memory, after we adjust it for where the
       /// PIC base is.
       reloc_picrel_word = 1,
       
       /// reloc_absolute_word, reloc_absolute_dword - Absolute relocation, just
       /// add the relocated value to the value already in memory.
       reloc_absolute_word = 2,
       reloc_absolute_dword = 3
     };
   }
 }
 

Since the body is small, indenting adds value because it makes it very clear where the namespace starts and ends, and it is easy to take the whole thing in in one "gulp" when reading the code. If the blob of code in the namespace is larger (as it typically is in a header in the llvm or clang namespaces), do not indent the code, and add a comment indicating what namespace is being closed. For example:

 namespace llvm {
 namespace knowledge {
 
 /// Grokable - This class represents things that Smith can have an intimate
 /// understanding of and contains the data associated with it.
 class Grokable {
 ...
 public:
   explicit Grokable() { ... }
   virtual ~Grokable() = 0;
   
   ...
 
 };
 
 } // end namespace knowledge
 } // end namespace llvm
 

Because the class is large, we don't expect that the reader can easily understand the entire concept in a glance, and the end of the file (where the namespaces end) may be a long way away from the place they open. As such, indenting the contents of the namespace doesn't add any value, and detracts from the readability of the class. In these cases it is best to not indent the contents of the namespace.

Anonymous Namespaces

After talking about namespaces in general, you may be wondering about anonymous namespaces in particular. Anonymous namespaces are a great language feature that tells the C++ compiler that the contents of the namespace are only visible within the current translation unit, allowing more aggressive optimization and eliminating the possibility of symbol name collisions. Anonymous namespaces are to C++ as "static" is to C functions and global variables. While "static" is available in C++, anonymous namespaces are more general: they can make entire classes private to a file.

The problem with anonymous namespaces is that they naturally want to encourage indentation of their body, and they reduce locality of reference: if you see a random function definition in a C++ file, it is easy to see if it is marked static, but seeing if it is in an anonymous namespace requires scanning a big chunk of the file.

Because of this, we have a simple guideline: make anonymous namespaces as small as possible, and only use them for class declarations. For example, this is good:

 namespace {
   class StringSort {
   ...
   public:
     StringSort(...)
     bool operator<(const char *RHS) const;
   };
 } // end anonymous namespace
 
 static void Helper() { 
   ... 
 }
 
 bool StringSort::operator<(const char *RHS) const {
   ...
 }
 
 

This is bad:

 namespace {
 class StringSort {
 ...
 public:
   StringSort(...)
   bool operator<(const char *RHS) const;
 };
 
 void Helper() { 
   ... 
 }
 
 bool StringSort::operator<(const char *RHS) const {
   ...
 }
 
 } // end anonymous namespace
 
 

This is bad specifically because if you're looking at "Helper" in the middle of a large C++ file, you have no immediate way to tell if it is local to the file. When it is marked static explicitly, this is immediately obvious. Also, there is no reason to enclose the definition of "operator<" in the namespace just because it was declared there.

See Also

A lot of these comments and recommendations have been culled from other sources. Two particularly important books for our work are:

  1. Effective C++ by Scott Meyers. Also interesting and useful are "More Effective C++" and "Effective STL" by the same author.
  2. Large-Scale C++ Software Design by John Lakos

If you get some free time, and you haven't read them: do so, you might learn something.


Chris Lattner
LLVM Compiler Infrastructure
- Last modified: $Date: 2011-08-12 21:49:16 +0200 (Fri, 12 Aug 2011) $ + Last modified: $Date: 2011-11-03 07:43:23 +0100 (Thu, 03 Nov 2011) $
Index: vendor/llvm/dist/docs/DebuggingJITedCode.html =================================================================== --- vendor/llvm/dist/docs/DebuggingJITedCode.html (revision 228363) +++ vendor/llvm/dist/docs/DebuggingJITedCode.html (revision 228364) @@ -1,152 +1,153 @@ + Debugging JITed Code With GDB

Debugging JITed Code With GDB

  1. Example usage
  2. Background
Written by Reid Kleckner

Example usage

In order to debug code JITed by LLVM, you need GDB 7.0 or newer, which is available on most modern distributions of Linux. The version of GDB that Apple ships with Xcode has been frozen at 6.3 for a while. LLDB may be a better option for debugging JITed code on Mac OS X.

Consider debugging the following code compiled with clang and run through lli:

 #include <stdio.h>
 
 void foo() {
     printf("%d\n", *(int*)NULL);  // Crash here
 }
 
 void bar() {
     foo();
 }
 
 void baz() {
     bar();
 }
 
 int main(int argc, char **argv) {
     baz();
 }
 

Here are the commands to run that application under GDB and print the stack trace at the crash:

 # Compile foo.c to bitcode.  You can use either clang or llvm-gcc with this
 # command line.  Both require -fexceptions, or the calls are all marked
 # 'nounwind' which disables DWARF exception handling info.  Custom frontends
 # should avoid adding this attribute to JITed code, since it interferes with
 # DWARF CFA generation at the moment.
 $ clang foo.c -fexceptions -emit-llvm -c -o foo.bc
 
 # Run foo.bc under lli with -jit-emit-debug.  If you built lli in debug mode,
 # -jit-emit-debug defaults to true.
 $ $GDB_INSTALL/gdb --args lli -jit-emit-debug foo.bc
 ...
 
 # Run the code.
 (gdb) run
 Starting program: /tmp/gdb/lli -jit-emit-debug foo.bc
 [Thread debugging using libthread_db enabled]
 
 Program received signal SIGSEGV, Segmentation fault.
 0x00007ffff7f55164 in foo ()
 
 # Print the backtrace, this time with symbols instead of ??.
 (gdb) bt
 #0  0x00007ffff7f55164 in foo ()
 #1  0x00007ffff7f550f9 in bar ()
 #2  0x00007ffff7f55099 in baz ()
 #3  0x00007ffff7f5502a in main ()
 #4  0x00000000007c0225 in llvm::JIT::runFunction(llvm::Function*,
     std::vector<llvm::GenericValue,
     std::allocator<llvm::GenericValue> > const&) ()
 #5  0x00000000007d6d98 in
     llvm::ExecutionEngine::runFunctionAsMain(llvm::Function*,
     std::vector<std::string,
     std::allocator<std::string> > const&, char const* const*) ()
 #6  0x00000000004dab76 in main ()
 

As you can see, GDB can correctly unwind the stack and has the appropriate function names.

Background

Without special runtime support, debugging dynamically generated code with GDB (as well as most debuggers) can be quite painful. Debuggers generally read debug information from the object file of the code, but for JITed code, there is no such file to look for.

Depending on the architecture, this can impact the debugging experience in different ways. For example, on most 32-bit x86 architectures, you can simply compile with -fno-omit-frame-pointer for GCC and -disable-fp-elim for LLVM. When GDB creates a backtrace, it can properly unwind the stack, but the stack frames owned by JITed code have ??'s instead of the appropriate symbol name. However, on Linux x86_64 in particular, GDB relies on the DWARF call frame address (CFA) debug information to unwind the stack, so even if you compile your program to leave the frame pointer untouched, GDB will usually be unable to unwind the stack past any JITed code stack frames.

In order to communicate the necessary debug info to GDB, an interface for registering JITed code with debuggers has been designed and implemented for GDB and LLVM. At a high level, whenever LLVM generates new machine code, it also generates an object file in memory containing the debug information. LLVM then adds the object file to the global list of object files and calls a special function (__jit_debug_register_code) marked noinline that GDB knows about. When GDB attaches to a process, it puts a breakpoint in this function and loads all of the object files in the global list. When LLVM calls the registration function, GDB catches the breakpoint signal, loads the new object file from LLVM's memory, and resumes the execution. In this way, GDB can get the necessary debug information.
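For reference, here is a sketch of that registration interface as it is described in the GDB manual; the layout below comes from that documentation rather than from LLVM's sources, so treat it as an approximation:

 #include <stdint.h>
 
 // Each in-memory object file is described by one entry in a doubly linked list.
 struct jit_code_entry {
   jit_code_entry *next_entry;
   jit_code_entry *prev_entry;
   const char *symfile_addr;   // start of the in-memory object file
   uint64_t symfile_size;
 };
 
 // A single global descriptor that the debugger locates by name in the process.
 struct jit_descriptor {
   uint32_t version;
   uint32_t action_flag;       // register or unregister the relevant entry
   jit_code_entry *relevant_entry;
   jit_code_entry *first_entry;
 };
 
 // GDB sets a breakpoint here; the JIT calls it after updating the list.
 extern "C" void __jit_debug_register_code();
 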

At the time of this writing, LLVM only supports architectures that use ELF object files and it only generates symbols and DWARF CFA information. However, it would be easy to add more information to the object file, so we don't need to coordinate with GDB to get better debug information.


Reid Kleckner
The LLVM Compiler Infrastructure
- Last modified: $Date: 2011-04-23 02:30:22 +0200 (Sat, 23 Apr 2011) $ + Last modified: $Date: 2011-11-03 07:43:23 +0100 (Thu, 03 Nov 2011) $
Index: vendor/llvm/dist/docs/ExtendingLLVM.html =================================================================== --- vendor/llvm/dist/docs/ExtendingLLVM.html (revision 228363) +++ vendor/llvm/dist/docs/ExtendingLLVM.html (revision 228364) @@ -1,391 +1,392 @@ + Extending LLVM: Adding instructions, intrinsics, types, etc.

Extending LLVM: Adding instructions, intrinsics, types, etc.

  1. Introduction and Warning
  2. Adding a new intrinsic function
  3. Adding a new instruction
  4. Adding a new SelectionDAG node
  5. Adding a new type
    1. Adding a new fundamental type
    2. Adding a new derived type

Written by Misha Brukman, Brad Jones, Nate Begeman, and Chris Lattner

Introduction and Warning

During the course of using LLVM, you may wish to customize it for your research project or for experimentation. At this point, you may realize that you need to add something to LLVM, whether it be a new fundamental type, a new intrinsic function, or a whole new instruction.

When you come to this realization, stop and think. Do you really need to extend LLVM? Is it a new fundamental capability that LLVM does not support in its current incarnation, or can it be synthesized from pre-existing LLVM elements? If you are not sure, ask on the LLVM-dev list. The reason is that extending LLVM can become involved, since you need to update all the different passes that you intend to use with your extension, and there are many LLVM analyses and transformations, so it may be quite a bit of work.

Adding an intrinsic function is far easier than adding an instruction, and is transparent to optimization passes. If your added functionality can be expressed as a function call, an intrinsic function is the method of choice for LLVM extension.

Before you invest a significant amount of effort into a non-trivial extension, ask on the list if what you are looking to do can be done with already-existing infrastructure, or if maybe someone else is already working on it. You will save yourself a lot of time and effort by doing so.

Adding a new intrinsic function

Adding a new intrinsic function to LLVM is much easier than adding a new instruction. Almost all extensions to LLVM should start as an intrinsic function and then be turned into an instruction if warranted.

  1. llvm/docs/LangRef.html: Document the intrinsic. Decide whether it is code generator specific and what the restrictions are. Talk to other people about it so that you are sure it's a good idea.
  2. llvm/include/llvm/Intrinsics*.td: Add an entry for your intrinsic. Describe its memory access characteristics for optimization (this controls whether it will be DCE'd, CSE'd, etc). Note that any intrinsic using the llvm_int_ty type for an argument will be deemed by tblgen as overloaded and the corresponding suffix will be required on the intrinsic's name.
  3. llvm/lib/Analysis/ConstantFolding.cpp: If it is possible to constant fold your intrinsic, add support to it in the canConstantFoldCallTo and ConstantFoldCall functions.
  4. llvm/test/Regression/*: Add test cases for your intrinsic to the test suite

Once the intrinsic has been added to the system, you must add code generator support for it. Generally you must do the following steps:

Add support to the C backend in lib/Target/CBackend/
Depending on the intrinsic, there are a few ways to implement this. For most intrinsics, it makes sense to add code to lower your intrinsic in LowerIntrinsicCall in lib/CodeGen/IntrinsicLowering.cpp. Alternatively, if it makes sense to lower the intrinsic to an expanded sequence of C code in all cases, just emit the expansion in visitCallInst in Writer.cpp. If the intrinsic has some way to express it with GCC (or any other compiler) extensions, it can be conditionally supported based on the compiler compiling the CBE output (see llvm.prefetch for an example). Finally, if the intrinsic really has no way to be lowered, just have the code generator emit code that prints an error message and calls abort if executed.
Add support to the .td file for the target(s) of your choice in lib/Target/*/*.td.
This is usually a matter of adding a pattern to the .td file that matches the intrinsic, though it may obviously require adding the instructions you want to generate as well. There are lots of examples in the PowerPC and X86 backend to follow.

Adding a new SelectionDAG node

As with intrinsics, adding a new SelectionDAG node to LLVM is much easier than adding a new instruction. New nodes are often added to help represent instructions common to many targets. These nodes often map to an LLVM instruction (add, sub) or intrinsic (byteswap, population count). In other cases, new nodes have been added to allow many targets to perform a common task (converting between floating point and integer representation) or capture more complicated behavior in a single node (rotate).

  1. include/llvm/CodeGen/ISDOpcodes.h: Add an enum value for the new SelectionDAG node.
  2. lib/CodeGen/SelectionDAG/SelectionDAG.cpp: Add code to print the node to getOperationName. If your new node can be evaluated at compile time when given constant arguments (such as an add of a constant with another constant), find the getNode method that takes the appropriate number of arguments, and add a case for your node to the switch statement that performs constant folding for nodes that take the same number of arguments as your new node.
  3. lib/CodeGen/SelectionDAG/LegalizeDAG.cpp: Add code to legalize, promote, and expand the node as necessary. At a minimum, you will need to add a case statement for your node in LegalizeOp which calls LegalizeOp on the node's operands, and returns a new node if any of the operands changed as a result of being legalized. It is likely that not all targets supported by the SelectionDAG framework will natively support the new node. In this case, you must also add code in your node's case statement in LegalizeOp to Expand your node into simpler, legal operations. The case for ISD::UREM for expanding a remainder into a divide, multiply, and a subtract is a good example.
  4. lib/CodeGen/SelectionDAG/LegalizeDAG.cpp: If targets may support the new node being added only at certain sizes, you will also need to add code to your node's case statement in LegalizeOp to Promote your node's operands to a larger size, and perform the correct operation. You will also need to add code to PromoteOp to do this as well. For a good example, see ISD::BSWAP, which promotes its operand to a wider size, performs the byteswap, and then shifts the correct bytes right to emulate the narrower byteswap in the wider type.
  5. lib/CodeGen/SelectionDAG/LegalizeDAG.cpp: Add a case for your node in ExpandOp to teach the legalizer how to perform the action represented by the new node on a value that has been split into high and low halves. This case will be used to support your node with a 64 bit operand on a 32 bit target.
  6. lib/CodeGen/SelectionDAG/DAGCombiner.cpp: If your node can be combined with itself, or other existing nodes in a peephole-like fashion, add a visit function for it, and call that function from the DAG combiner's node dispatch. There are several good examples for simple combines you can do; visitFABS and visitSRL are good starting places.
  7. lib/Target/PowerPC/PPCISelLowering.cpp: Each target has an implementation of the TargetLowering class, usually in its own file (although some targets include it in the same file as the DAGToDAGISel). The default behavior for a target is to assume that your new node is legal for all types that are legal for that target. If this target does not natively support your node, then tell the target to either Promote it (if it is supported at a larger type) or Expand it. This will cause the code you wrote in LegalizeOp above to decompose your new node into other legal nodes for this target.
  8. lib/Target/TargetSelectionDAG.td: Most current targets supported by LLVM generate code using the DAGToDAG method, where SelectionDAG nodes are pattern matched to target-specific nodes, which represent individual instructions. In order for the targets to match an instruction to your new node, you must add a def for that node to the list in this file, with the appropriate type constraints. Look at add, bswap, and fadd for examples.
  9. lib/Target/PowerPC/PPCInstrInfo.td: Each target has a tablegen file that describes the target's instruction set. For targets that use the DAGToDAG instruction selection framework, add a pattern for your new node that uses one or more target nodes. Documentation for this is a bit sparse right now, but there are several decent examples. See the patterns for rotl in PPCInstrInfo.td.
  10. TODO: document complex patterns.
  11. llvm/test/Regression/CodeGen/*: Add test cases for your new node to the test suite. llvm/test/Regression/CodeGen/X86/bswap.ll is a good example.

Adding a new instruction

WARNING: adding instructions changes the bitcode format, and it will take some effort to maintain compatibility with the previous version. Only add an instruction if it is absolutely necessary.

  1. llvm/include/llvm/Instruction.def: add a number for your instruction and an enum name
  2. llvm/include/llvm/Instructions.h: add a definition for the class that will represent your instruction
  3. llvm/include/llvm/Support/InstVisitor.h: add a prototype for a visitor to your new instruction type
  4. llvm/lib/AsmParser/Lexer.l: add a new token to parse your instruction from assembly text file
  5. llvm/lib/AsmParser/llvmAsmParser.y: add the grammar on how your instruction can be read and what it will construct as a result
  6. llvm/lib/Bitcode/Reader/Reader.cpp: add a case for your instruction and how it will be parsed from bitcode
  7. llvm/lib/VMCore/Instruction.cpp: add a case for how your instruction will be printed out to assembly
  8. llvm/lib/VMCore/Instructions.cpp: implement the class you defined in llvm/include/llvm/Instructions.h
  9. Test your instruction
  10. llvm/lib/Target/*: Add support for your instruction to code generators, or add a lowering pass.
  11. llvm/test/Regression/*: add your test cases to the test suite.

Also, you need to implement (or modify) any analyses or passes that you want to understand this new instruction.

Adding a new type

WARNING: adding new types changes the bitcode format, and will break compatibility with currently-existing LLVM installations. Only add new types if it is absolutely necessary.

Adding a fundamental type

  1. llvm/include/llvm/Type.h: add enum for the new type; add static Type* for this type
  2. llvm/lib/VMCore/Type.cpp: add mapping from TypeID => Type*; initialize the static Type*
  3. llvm/lib/AsmReader/Lexer.l: add ability to parse in the type from text assembly
  4. llvm/lib/AsmReader/llvmAsmParser.y: add a token for that type

Adding a derived type

  1. llvm/include/llvm/Type.h: add enum for the new type; add a forward declaration of the type also
  2. llvm/include/llvm/DerivedTypes.h: add a new class to represent the new type in the hierarchy; add a forward declaration to the TypeMap value type
  3. llvm/lib/VMCore/Type.cpp: add support for derived type to:
     std::string getTypeDescription(const Type &Ty,
       std::vector<const Type*> &TypeStack)
     bool TypesEqual(const Type *Ty, const Type *Ty2,
       std::map<const Type*, const Type*> & EqTypes)
     
    add necessary member functions for type, and factory methods
  4. llvm/lib/AsmReader/Lexer.l: add ability to parse in the type from text assembly
  5. llvm/lib/BitCode/Writer/Writer.cpp: modify void BitcodeWriter::outputType(const Type *T) to serialize your type
  6. llvm/lib/BitCode/Reader/Reader.cpp: modify const Type *BitcodeReader::ParseType() to read your data type
  7. llvm/lib/VMCore/AsmWriter.cpp: modify
     void calcTypeName(const Type *Ty,
                       std::vector<const Type*> &TypeStack,
                       std::map<const Type*,std::string> &TypeNames,
                       std::string & Result)
     
    to output the new derived type

The LLVM Compiler Infrastructure
- Last modified: $Date: 2011-06-30 08:37:07 +0200 (Thu, 30 Jun 2011) $ + Last modified: $Date: 2011-11-03 07:43:23 +0100 (Thu, 03 Nov 2011) $
Index: vendor/llvm/dist/docs/GetElementPtr.html =================================================================== --- vendor/llvm/dist/docs/GetElementPtr.html (revision 228363) +++ vendor/llvm/dist/docs/GetElementPtr.html (revision 228364) @@ -1,753 +1,753 @@ The Often Misunderstood GEP Instruction

The Often Misunderstood GEP Instruction

  1. Introduction
  2. Address Computation
    1. Why is the extra 0 index required?
    2. What is dereferenced by GEP?
    3. Why can you index through the first pointer but not subsequent ones?
    4. Why don't GEP x,0,0,1 and GEP x,1 alias?
    5. Why do GEP x,1,0,0 and GEP x,1 alias?
    6. Can GEP index into vector elements?
    7. What effect do address spaces have on GEPs?
    8. How is GEP different from ptrtoint, arithmetic, and inttoptr?
    9. I'm writing a backend for a target which needs custom lowering for GEP. How do I do this?
    10. How does VLA addressing work with GEPs?
  3. Rules
    1. What happens if an array index is out of bounds?
    2. Can array indices be negative?
    3. Can I compare two values computed with GEPs?
    4. Can I do GEP with a different pointer type than the type of the underlying object?
    5. Can I cast an object's address to integer and add it to null?
    6. Can I compute the distance between two objects, and add that value to one address to compute the other address?
    7. Can I do type-based alias analysis on LLVM IR?
    8. What happens if a GEP computation overflows?
    9. How can I tell if my front-end is following the rules?
  4. Rationale
    1. Why is GEP designed this way?
    2. Why do struct member indices always use i32?
    3. What's an uglygep?
  5. Summary

Written by: Reid Spencer.

Introduction

This document seeks to dispel the mystery and confusion surrounding LLVM's GetElementPtr (GEP) instruction. Questions about the wily GEP instruction are probably the most frequently occurring questions once a developer gets down to coding with LLVM. Here we lay out the sources of confusion and show that the GEP instruction is really quite simple.

Address Computation

When people are first confronted with the GEP instruction, they tend to relate it to known concepts from other programming paradigms, most notably C array indexing and field selection. GEP closely resembles C array indexing and field selection; however, it is a little different, and this leads to the following questions.

What is the first index of the GEP instruction?

Quick answer: The index stepping through the first operand.

The confusion with the first index usually arises from thinking about the GetElementPtr instruction as if it was a C index operator. They aren't the same. For example, when we write, in "C":

 AType *Foo;
 ...
 X = &Foo->F;
 

it is natural to think that there is only one index, the selection of the field F. However, in this example, Foo is a pointer. That pointer must be indexed explicitly in LLVM. C, on the other hand, indexes through it transparently. To arrive at the same address location as the C code, you would provide the GEP instruction with two index operands. The first operand indexes through the pointer; the second operand indexes the field F of the structure, just as if you wrote:

 X = &Foo[0].F;
 

Sometimes this question gets rephrased as:

Why is it okay to index through the first pointer, but subsequent pointers won't be dereferenced?

The answer is simply because memory does not have to be accessed to perform the computation. The first operand to the GEP instruction must be a value of a pointer type. The value of the pointer is provided directly to the GEP instruction as an operand without any need for accessing memory. It must, therefore, be indexed and requires an index operand. Consider this example:

 struct munger_struct {
   int f1;
   int f2;
 };
 void munge(struct munger_struct *P) {
   P[0].f1 = P[1].f1 + P[2].f2;
 }
 ...
 munger_struct Array[3];
 ...
 munge(Array);
 

In this "C" example, the front end compiler (llvm-gcc) will generate three GEP instructions for the three indices through "P" in the assignment statement. The function argument P will be the first operand of each of these GEP instructions. The second operand indexes through that pointer. The third operand will be the field offset into the struct munger_struct type, for either the f1 or f2 field. So, in LLVM assembly the munge function looks like:

 void %munge(%struct.munger_struct* %P) {
 entry:
   %tmp = getelementptr %struct.munger_struct* %P, i32 1, i32 0
   %tmp = load i32* %tmp
   %tmp6 = getelementptr %struct.munger_struct* %P, i32 2, i32 1
   %tmp7 = load i32* %tmp6
   %tmp8 = add i32 %tmp7, %tmp
   %tmp9 = getelementptr %struct.munger_struct* %P, i32 0, i32 0
   store i32 %tmp8, i32* %tmp9
   ret void
 }
 

In each case the first operand is the pointer through which the GEP instruction starts. The same is true whether the first operand is an argument, allocated memory, or a global variable.

To make this clear, let's consider a more obtuse example:

 %MyVar = uninitialized global i32
 ...
 %idx1 = getelementptr i32* %MyVar, i64 0
 %idx2 = getelementptr i32* %MyVar, i64 1
 %idx3 = getelementptr i32* %MyVar, i64 2
 

These GEP instructions are simply making address computations from the base address of MyVar. They compute, as follows (using C syntax):

 idx1 = (char*) &MyVar + 0
 idx2 = (char*) &MyVar + 4
 idx3 = (char*) &MyVar + 8
 

Since the type i32 is known to be four bytes long, the indices 0, 1 and 2 translate into memory offsets of 0, 4, and 8, respectively. No memory is accessed to make these computations because the address of %MyVar is passed directly to the GEP instructions.

The obtuse part of this example is in the cases of %idx2 and %idx3. They result in the computation of addresses that point to memory past the end of the %MyVar global, which is only one i32 long, not three i32s long. While this is legal in LLVM, it is inadvisable because any load or store with the pointer that results from these GEP instructions would produce undefined results.

Why is the extra 0 index required?

Quick answer: there are no superfluous indices.

This question arises most often when the GEP instruction is applied to a global variable which is always a pointer type. For example, consider this:

 %MyStruct = uninitialized global { float*, i32 }
 ...
 %idx = getelementptr { float*, i32 }* %MyStruct, i64 0, i32 1
 

The GEP above yields an i32* by indexing the i32 typed field of the structure %MyStruct. When people first look at it, they wonder why the i64 0 index is needed. However, a closer inspection of how globals and GEPs work reveals the need. Becoming aware of the following facts will dispel the confusion:

  1. The type of %MyStruct is not { float*, i32 } but rather { float*, i32 }*. That is, %MyStruct is a pointer to a structure containing a pointer to a float and an i32.
  2. Point #1 is evidenced by noticing the type of the first operand of the GEP instruction (%MyStruct) which is { float*, i32 }*.
  3. The first index, i64 0 is required to step over the global variable %MyStruct. Since the first argument to the GEP instruction must always be a value of pointer type, the first index steps through that pointer. A value of 0 means 0 elements offset from that pointer.
  4. The second index, i32 1 selects the second field of the structure (the i32).

What is dereferenced by GEP?

Quick answer: nothing.

The GetElementPtr instruction dereferences nothing. That is, it doesn't access memory in any way. That's what the Load and Store instructions are for. GEP is only involved in the computation of addresses. For example, consider this:

 %MyVar = uninitialized global { [40 x i32 ]* }
 ...
 %idx = getelementptr { [40 x i32]* }* %MyVar, i64 0, i32 0, i64 0, i64 17
 

In this example, we have a global variable, %MyVar that is a pointer to a structure containing a pointer to an array of 40 ints. The GEP instruction seems to be accessing the 18th integer of the structure's array of ints. However, this is actually an illegal GEP instruction. It won't compile. The reason is that the pointer in the structure must be dereferenced in order to index into the array of 40 ints. Since the GEP instruction never accesses memory, it is illegal.

In order to access the 18th integer in the array, you would need to do the following:

 %idx1 = getelementptr { [40 x i32]* }* %MyVar, i64 0, i32 0
 %arr = load [40 x i32]** %idx1
 %idx2 = getelementptr [40 x i32]* %arr, i64 0, i64 17
 

In this case, we have to load the pointer in the structure with a load instruction before we can index into the array. If the example was changed to:

 %MyVar = uninitialized global { [40 x i32 ] }
 ...
 %idx = getelementptr { [40 x i32] }* %MyVar, i64 0, i32 0, i64 17
 

then everything works fine. In this case, the structure does not contain a pointer and the GEP instruction can index through the global variable, into the first field of the structure and access the 18th i32 in the array there.

Why don't GEP x,0,0,1 and GEP x,1 alias?

Quick Answer: They compute different address locations.

If you look at the first indices in these GEP instructions you find that they are different (0 and 1), therefore the address computation diverges with that index. Consider this example:

 %MyVar = global { [10 x i32 ] }
 %idx1 = getelementptr { [10 x i32 ] }* %MyVar, i64 0, i32 0, i64 1
 %idx2 = getelementptr { [10 x i32 ] }* %MyVar, i64 1
 

In this example, idx1 computes the address of the second integer in the array that is in the structure in %MyVar, that is MyVar+4. The type of idx1 is i32*. However, idx2 computes the address of the next structure after %MyVar. The type of idx2 is { [10 x i32] }* and its value is equivalent to MyVar + 40 because it indexes past the ten 4-byte integers in MyVar. Obviously, in such a situation, the pointers don't alias.

Why do GEP x,1,0,0 and GEP x,1 alias?

Quick Answer: They compute the same address location.

These two GEP instructions will compute the same address because indexing through the 0th element does not change the address. However, it does change the type. Consider this example:

 %MyVar = global { [10 x i32 ] }
 %idx1 = getelementptr { [10 x i32 ] }* %MyVar, i64 1, i32 0, i64 0
 %idx2 = getelementptr { [10 x i32 ] }* %MyVar, i64 1
 

In this example, the value of %idx1 is %MyVar+40 and its type is i32*. The value of %idx2 is also MyVar+40 but its type is { [10 x i32] }*.

Can GEP index into vector elements?

This hasn't always been forcefully disallowed, though it's not recommended. It leads to awkward special cases in the optimizers, and fundamental inconsistency in the IR. In the future, it will probably be outright disallowed.

What effect do address spaces have on GEPs?

None, except that the address space qualifier on the first operand pointer type always matches the address space qualifier on the result type.

How is GEP different from ptrtoint, arithmetic, and inttoptr?

It's very similar; there are only subtle differences.

With ptrtoint, you have to pick an integer type. One approach is to pick i64; this is safe on everything LLVM supports (LLVM internally assumes pointers are never wider than 64 bits in many places), and the optimizer will actually narrow the i64 arithmetic down to the actual pointer size on targets which don't support 64-bit arithmetic in most cases. However, there are some cases where it doesn't do this. With GEP you can avoid this problem.

Also, GEP carries additional pointer aliasing rules. It's invalid to take a GEP from one object, address into a different separately allocated object, and dereference it. IR producers (front-ends) must follow this rule, and consumers (optimizers, specifically alias analysis) benefit from being able to rely on it. See the Rules section for more information.

And, GEP is more concise in common cases.

However, for the underlying integer computation implied, there is no difference.

I'm writing a backend for a target which needs custom lowering for GEP. How do I do this?

You don't. The integer computation implied by a GEP is target-independent. Typically what you'll need to do is make your backend pattern-match expression trees involving ADD, MUL, etc., which are what GEP is lowered into. This has the advantage of letting your code work correctly in more cases.

GEP does use target-dependent parameters for the size and layout of data types, which targets can customize.

If you require support for addressing units which are not 8 bits, you'll need to fix a lot of code in the backend, with GEP lowering being only a small piece of the overall picture.

How does VLA addressing work with GEPs?

GEPs don't natively support VLAs. LLVM's type system is entirely static, and GEP address computations are guided by an LLVM type.

VLA indices can be implemented as linearized indices. For example, an expression like X[a][b][c] must be effectively lowered into a form like X[a*m+b*n+c], so that it appears to the GEP as a single-dimensional array reference.
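A hypothetical C-style sketch of that lowering, for a dynamically sized array whose two innermost runtime dimensions are d1 and d2 (the name loadElement is made up for illustration):

 // Hypothetical illustration of the linearization a front-end performs for a
 // VLA access X[a][b][c]: here m corresponds to d1*d2 and n corresponds to d2.
 int loadElement(int *X, long a, long b, long c, long d1, long d2) {
   long idx = (a * d1 + b) * d2 + c;   // same as a*(d1*d2) + b*d2 + c
   return X[idx];                      // a single-dimensional access underneath
 }
 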

This means if you want to write an analysis which understands array indices and you want to support VLAs, your code will have to be prepared to reverse-engineer the linearization. One way to solve this problem is to use the ScalarEvolution library, which always presents VLA and non-VLA indexing in the same manner.

Rules

What happens if an array index is out of bounds?

There are two senses in which an array index can be out of bounds.

First, there's the array type which comes from the (static) type of the first operand to the GEP. Indices greater than the number of elements in the corresponding static array type are valid. There is no problem with out of bounds indices in this sense. Indexing into an array only depends on the size of the array element, not the number of elements.

A common example of how this is used is arrays where the size is not known. It's common to use array types with zero length to represent these. The fact that the static type says there are zero elements is irrelevant; it's perfectly valid to compute arbitrary element indices, as the computation only depends on the size of the array element, not the number of elements. Note that zero-sized arrays are not a special case here.
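For example, a hypothetical sketch of that idiom (Packet and getByte are made-up names; the zero-length array is a GNU extension):

 // The static type says Data has zero elements, but the allocation is made
 // large enough at run time; computing &P->Data[i] for arbitrary i is still a
 // well-defined address computation, because only the element size matters.
 struct Packet {
   unsigned Length;
   char Data[0];   // zero-length array: size not known statically (GNU extension)
 };
 
 char getByte(Packet *P, unsigned i) {
   return P->Data[i];
 }
 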

This sense is unconnected with the inbounds keyword. The inbounds keyword is designed to describe low-level pointer arithmetic overflow conditions, rather than high-level array indexing rules.

Analysis passes which wish to understand array indexing should not assume that the static array type bounds are respected.

The second sense of being out of bounds is computing an address that's beyond the actual underlying allocated object.

With the inbounds keyword, the result value of the GEP is undefined if the address is outside the actual underlying allocated object and not the address one-past-the-end.

Without the inbounds keyword, there are no restrictions on computing out-of-bounds addresses. Obviously, performing a load or a store requires an address of allocated and sufficiently aligned memory. But the GEP itself is only concerned with computing addresses.

Can array indices be negative?

Yes. This is basically a special case of array indices being out of bounds.

Can I compare two values computed with GEPs?

Yes. If both addresses are within the same allocated object, or one-past-the-end, you'll get the comparison result you expect. If either is outside of it, integer arithmetic wrapping may occur, so the comparison may not be meaningful.

Can I do GEP with a different pointer type than the type of the underlying object?

Yes. There are no restrictions on bitcasting a pointer value to an arbitrary pointer type. The types in a GEP serve only to define the parameters for the underlying integer computation. They need not correspond with the actual type of the underlying object.

Furthermore, loads and stores don't have to use the same types as the type of the underlying object. Types in this context serve only to specify memory size and alignment. Beyond that, they are merely a hint to the optimizer indicating how the value will likely be used.

Can I cast an object's address to integer and add it to null?

You can compute an address that way, but if you use GEP to do the add, you can't use that pointer to actually access the object, unless the object is managed outside of LLVM.

The underlying integer computation is sufficiently defined; null has a defined value -- zero -- and you can add whatever value you want to it.

However, it's invalid to access (load from or store to) an LLVM-aware object with such a pointer. This includes GlobalVariables, Allocas, and objects pointed to by noalias pointers.

If you really need this functionality, you can do the arithmetic with explicit integer instructions, and use inttoptr to convert the result to an address. Most of GEP's special aliasing rules do not apply to pointers computed from ptrtoint, arithmetic, and inttoptr sequences.

Can I compute the distance between two objects, and add that value to one address to compute the other address?

As with arithmetic on null, you can use GEP to compute an address that way, but you can't use that pointer to actually access the object if you do, unless the object is managed outside of LLVM.

Also as above, ptrtoint and inttoptr provide an alternative way to do this that does not have this restriction.

Can I do type-based alias analysis on LLVM IR?

You can't do type-based alias analysis using LLVM's built-in type system, because LLVM has no restrictions on mixing types in addressing, loads or stores.

It would be possible to add special annotations to the IR, probably using metadata, to describe a different type system (such as the C type system), and do type-based aliasing on top of that. This is a much bigger undertaking though.

What happens if a GEP computation overflows?

If the GEP lacks the inbounds keyword, the value is the result from evaluating the implied two's complement integer computation. However, since there's no guarantee of where an object will be allocated in the address space, such values have limited meaning.

If the GEP has the inbounds keyword, the result value is undefined (a "trap value") if the GEP overflows (i.e. wraps around the end of the address space).

As such, there are some ramifications of this for inbounds GEPs: scales implied by array/vector/pointer indices are always known to be "nsw" since they are signed values that are scaled by the element size. These values are also allowed to be negative (e.g. "gep i32 *%P, i32 -1") but the pointer itself is logically treated as an unsigned value. This means that GEPs have an asymmetric relation between the pointer base (which is treated as unsigned) and the offset applied to it (which is treated as signed). The result of the additions within the offset calculation cannot have signed overflow, but when applied to the base pointer, there can be signed overflow.

How can I tell if my front-end is following the rules?

There is currently no checker for the getelementptr rules. Currently, the only way to do this is to manually check each place in your front-end where GetElementPtr operators are created.

It's not possible to write a checker which could find all rule violations statically. It would be possible to write a checker which works by instrumenting the code with dynamic checks though. Alternatively, it would be possible to write a static checker which catches a subset of possible problems. However, no such checker exists today.

Rationale

Why is GEP designed this way?

The design of GEP has the following goals, in rough unofficial order of priority:

Why do struct member indices always use i32?

The specific type i32 is probably just a historical artifact, however it's wide enough for all practical purposes, so there's been no need to change it. It doesn't necessarily imply i32 address arithmetic; it's just an identifier which identifies a field in a struct. Requiring that all struct indices be the same reduces the range of possibilities for cases where two GEPs are effectively the same but have distinct operand types.

What's an uglygep?

Some LLVM optimizers operate on GEPs by internally lowering them into more primitive integer expressions, which allows them to be combined with other integer expressions and/or split into multiple separate integer expressions. If they've made non-trivial changes, translating back into LLVM IR can involve reverse-engineering the structure of the addressing in order to fit it into the static type of the original first operand. It isn't always possible to fully reconstruct this structure; sometimes the underlying addressing doesn't correspond with the static type at all. In such cases the optimizer instead will emit a GEP with the base pointer casted to a simple address-unit pointer, using the name "uglygep". This isn't pretty, but it's just as valid, and it's sufficient to preserve the pointer aliasing guarantees that GEP provides.

Summary

In summary, here are some things to always remember about the GetElementPtr instruction:

  1. The GEP instruction never accesses memory, it only provides pointer computations.
  2. The first operand to the GEP instruction is always a pointer and it must be indexed.
  3. There are no superfluous indices for the GEP instruction.
  4. Trailing zero indices are superfluous for pointer aliasing, but not for the types of the pointers.
  5. Leading zero indices are not superfluous for pointer aliasing nor the types of the pointers.

- The LLVM Compiler Infrastructure
- Last modified: $Date: 2011-04-23 02:30:22 +0200 (Sat, 23 Apr 2011) $ + The LLVM Compiler Infrastructure
+ Last modified: $Date: 2011-11-03 07:43:54 +0100 (Thu, 03 Nov 2011) $
Index: vendor/llvm/dist/docs/GoldPlugin.html =================================================================== --- vendor/llvm/dist/docs/GoldPlugin.html (revision 228363) +++ vendor/llvm/dist/docs/GoldPlugin.html (revision 228364) @@ -1,227 +1,228 @@ + LLVM gold plugin

LLVM gold plugin

  1. Introduction
  2. How to build it
  3. Usage
  4. Licensing
Written by Nick Lewycky

Introduction

Building with link time optimization requires cooperation from the system linker. LTO support on Linux systems requires that you use the gold linker which supports LTO via plugins. This is the same mechanism used by the GCC LTO project.

The LLVM gold plugin implements the gold plugin interface on top of libLTO. The same plugin can also be used by other tools such as ar and nm.

How to build it

You need to have gold with plugin support and build the LLVMgold plugin. Check whether you have gold by running /usr/bin/ld -v. It will report “GNU gold” if you do, or “GNU ld” if not. If you have gold, check for plugin support by running /usr/bin/ld -plugin. If it complains “missing argument”, then you have plugin support. If it instead reports an error such as “unknown option”, you will either need to build gold or install a version with plugin support.

Usage

The linker takes a -plugin option that points to the path of the plugin .so file. To find out what link command gcc would run in a given situation, run gcc -v [...] and look for the line where it runs collect2. Replace that with ld-new -plugin /path/to/LLVMgold.so to test it out. Once you're ready to switch to using gold, backup your existing /usr/bin/ld then replace it with ld-new.

You can produce bitcode files from clang using -emit-llvm or -flto, or the -O4 flag which is synonymous with -O3 -flto.

Clang has a -use-gold-plugin option which looks for the gold plugin in the same directories as it looks for cc1 and passes the -plugin option to ld. It will not look for an alternate linker, which is why you need gold to be the installed system linker in your path.

If you want ar and nm to work seamlessly as well, install LLVMgold.so to /usr/lib/bfd-plugins. If you built your own gold, be sure to install the ar and nm-new you built to /usr/bin.

Example of link time optimization

The following is a worked example of the gold plugin mixing LLVM bitcode and native code.

 --- a.c ---
 #include <stdio.h>
 
 extern void foo1(void);
 extern void foo4(void);
 
 void foo2(void) {
   printf("Foo2\n");
 }
 
 void foo3(void) {
   foo4();
 }
 
 int main(void) {
   foo1();
 }
 
 --- b.c ---
 #include <stdio.h>
 
 extern void foo2(void);
 
 void foo1(void) {
   foo2();
 }
 
 void foo4(void) {
   printf("Foo4");
 }
 
 --- command lines ---
 $ clang -flto a.c -c -o a.o                 # <-- a.o is LLVM bitcode file
 $ ar q a.a a.o                              # <-- a.a is an archive with LLVM bitcode
 $ clang b.c -c -o b.o                       # <-- b.o is native object file
 $ clang -use-gold-plugin a.a b.o -o main    # <-- link with LLVMgold plugin
 

Gold informs the plugin that foo3 is never referenced outside the IR, leading LLVM to delete that function. However, unlike in the libLTO example, gold does not currently eliminate foo4.

Quickstart for using LTO with autotooled projects

Once your system ld, ar, and nm all support LLVM bitcode, everything is in place for an easy-to-use LTO build of autotooled projects:

The environment variable settings may work for non-autotooled projects too, but you may need to set the LD environment variable as well.

Licensing

Gold is licensed under the GPLv3. LLVMgold uses the interface file plugin-api.h from gold which means that the resulting LLVMgold.so binary is also GPLv3. This can still be used to link non-GPLv3 programs just as much as gold could without the plugin.


Nick Lewycky
The LLVM Compiler Infrastructure
Last modified: $Date: 2010-04-16 23:58:21 -0800 (Fri, 16 Apr 2010) $
Index: vendor/llvm/dist/docs/HowToReleaseLLVM.html =================================================================== --- vendor/llvm/dist/docs/HowToReleaseLLVM.html (revision 228363) +++ vendor/llvm/dist/docs/HowToReleaseLLVM.html (revision 228364) @@ -1,580 +1,581 @@ + How To Release LLVM To The Public

How To Release LLVM To The Public

  1. Introduction
  2. Qualification Criteria
  3. Release Timeline
  4. Release Process

Written by Tanya Lattner, Reid Spencer, John Criswell, & Bill Wendling

Introduction

This document contains information about successfully releasing LLVM — including subprojects: e.g., clang and dragonegg — to the public. It is the Release Manager's responsibility to ensure that a high quality build of LLVM is released.

Release Timeline

LLVM is released on a time based schedule — roughly every 6 months. We do not normally have dot releases because of the nature of LLVM's incremental development philosophy. That said, the only thing preventing dot releases for critical bug fixes from happening is a lack of resources — testers, machines, time, etc. And, because of the high quality we desire for LLVM releases, we cannot allow for a truncated form of release qualification.

The release process is roughly as follows:

Release Process

  1. Release Administrative Tasks
    1. Create Release Branch
    2. Update Version Numbers
  2. Building the Release
    1. Build the LLVM Source Distributions
    2. Build LLVM
    3. Build the Clang Binary Distribution
    4. Target Specific Build Details
  3. Release Qualification Criteria
    1. Qualify LLVM
    2. Qualify Clang
    3. Specific Target Qualification Details
  4. Community Testing
  5. Release Patch Rules
  6. Release final tasks
    1. Update Documentation
    2. Tag the LLVM Final Release
    3. Update the LLVM Demo Page
    4. Update the LLVM Website
    5. Announce the Release

Release Administrative Tasks

This section describes a few administrative tasks that need to be done for the release process to begin. Specifically, it involves:

Create Release Branch

Branch the Subversion trunk using the following procedure:

  1. Remind developers that the release branching is imminent and to refrain from committing patches that might break the build. E.g., new features, large patches for works in progress, an overhaul of the type system, an exciting new TableGen feature, etc.

  2. Verify that the current Subversion trunk is in decent shape by examining nightly tester and buildbot results.

  3. Create the release branch for llvm, clang, the test-suite, and dragonegg from the last known good revision. The branch's name is release_XY, where X is the major and Y the minor release numbers. The branches should be created using the following commands:

     $ svn copy https://llvm.org/svn/llvm-project/llvm/trunk \
                https://llvm.org/svn/llvm-project/llvm/branches/release_XY
     
     $ svn copy https://llvm.org/svn/llvm-project/cfe/trunk \
                https://llvm.org/svn/llvm-project/cfe/branches/release_XY
     
     $ svn copy https://llvm.org/svn/llvm-project/dragonegg/trunk \
                https://llvm.org/svn/llvm-project/dragonegg/branches/release_XY
     
     $ svn copy https://llvm.org/svn/llvm-project/test-suite/trunk \
                https://llvm.org/svn/llvm-project/test-suite/branches/release_XY
     
  4. Advise developers that they may now check their patches into the Subversion tree again.

  5. The Release Manager should switch to the release branch, because all changes to the release will now be done in the branch. The easiest way to do this is to grab a working copy using the following commands:

     $ svn co https://llvm.org/svn/llvm-project/llvm/branches/release_XY llvm-X.Y
     
     $ svn co https://llvm.org/svn/llvm-project/cfe/branches/release_XY clang-X.Y
     
     $ svn co https://llvm.org/svn/llvm-project/dragonegg/branches/release_XY dragonegg-X.Y
     
     $ svn co https://llvm.org/svn/llvm-project/test-suite/branches/release_XY test-suite-X.Y
     

Update LLVM Version

After creating the LLVM release branch, update the release branches' autoconf and configure.ac versions from 'X.Ysvn' to 'X.Y'. Update it on mainline as well to be the next version ('X.Y+1svn'). Regenerate the configure scripts for both llvm and the test-suite.

In addition, the version numbers of all the Bugzilla components must be updated for the next release.

Build the LLVM Release Candidates

Create release candidates for llvm, clang, dragonegg, and the LLVM test-suite by tagging the branch with the respective release candidate number. For instance, to create Release Candidate 1 you would issue the following commands:

 $ svn mkdir https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_XY
 $ svn copy https://llvm.org/svn/llvm-project/llvm/branches/release_XY \
            https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_XY/rc1
 
 $ svn mkdir https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_XY
 $ svn copy https://llvm.org/svn/llvm-project/cfe/branches/release_XY \
            https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_XY/rc1
 
 $ svn mkdir https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_XY
 $ svn copy https://llvm.org/svn/llvm-project/dragonegg/branches/release_XY \
            https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_XY/rc1
 
 $ svn mkdir https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_XY
 $ svn copy https://llvm.org/svn/llvm-project/test-suite/branches/release_XY \
            https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_XY/rc1
 

Similarly, Release Candidate 2 would be named RC2 and so on. This keeps a permanent copy of the release candidate around for people to export and build as they wish. The final released sources will be tagged in the RELEASE_XY directory as Final (c.f. Tag the LLVM Final Release).

The Release Manager may supply pre-packaged source tarballs for users. This can be done with the following commands:

 $ svn export https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_XY/rc1 llvm-X.Yrc1
 $ svn export https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_XY/rc1 clang-X.Yrc1
 $ svn export https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_XY/rc1 dragonegg-X.Yrc1
 $ svn export https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_XY/rc1 llvm-test-X.Yrc1
 
 $ tar -cvf - llvm-X.Yrc1        | gzip > llvm-X.Yrc1.src.tar.gz
 $ tar -cvf - clang-X.Yrc1       | gzip > clang-X.Yrc1.src.tar.gz
 $ tar -cvf - dragonegg-X.Yrc1   | gzip > dragonegg-X.Yrc1.src.tar.gz
 $ tar -cvf - llvm-test-X.Yrc1   | gzip > llvm-test-X.Yrc1.src.tar.gz
 

Building the Release

The builds of llvm, clang, and dragonegg must be free of errors and warnings in Debug, Release+Asserts, and Release builds. If all builds are clean, then the release passes Build Qualification.

The make options for building the different modes:

 Mode              Options
 Debug             ENABLE_OPTIMIZED=0
 Release+Asserts   ENABLE_OPTIMIZED=1
 Release           ENABLE_OPTIMIZED=1 DISABLE_ASSERTIONS=1

Build LLVM

Build Debug, Release+Asserts, and Release versions of llvm on all supported platforms. Directions to build llvm are here.

Build Clang Binary Distribution

Creating the clang binary distribution (Debug/Release+Asserts/Release) requires performing the following steps for each supported platform:

  1. Build clang according to the directions here.
  2. Build both a Debug and Release version of clang. The binary will be the Release build.
  3. Package clang (details to follow).

Target Specific Build Details

The table below specifies which compilers are used for each Arch/OS combination when qualifying the build of llvm, clang, and dragonegg.

Architecture OS compiler
x86-32 Mac OS 10.5 gcc 4.0.1
x86-32 Linux gcc 4.2.X, gcc 4.3.X
x86-32 FreeBSD gcc 4.2.X
x86-32 mingw gcc 3.4.5
x86-64 Mac OS 10.5 gcc 4.0.1
x86-64 Linux gcc 4.2.X, gcc 4.3.X
x86-64 FreeBSD gcc 4.2.X

Release Qualification Criteria

A release is qualified when it has no regressions from the previous release (or baseline). Regressions are related to correctness first and performance second. (We may tolerate some minor performance regressions if they are deemed necessary for the general quality of the compiler.)

Regressions are new failures in the set of tests that are used to qualify each product; only failures in those tests count. Every release will have some bugs in it; that is the reality of developing a complex piece of software. We need concrete and definitive release criteria that ensure we have monotonically improving quality on some metric. The metric we use is described below. This doesn't mean that we don't care about other criteria, but these are the criteria which we found to be most important and which must be satisfied before a release can go out.

Qualify LLVM

LLVM is qualified when it has a clean test run without a front-end and no regressions when using either clang or dragonegg with the test-suite from the previous release.

Qualify Clang

Clang is qualified when front-end specific tests in the llvm dejagnu test suite all pass, clang's own test suite passes cleanly, and there are no regressions in the test-suite.

Specific Target Qualification Details

Architecture OS clang baseline tests
x86-32 Linux last release llvm dejagnu, clang tests, test-suite (including spec)
x86-32 FreeBSD last release llvm dejagnu, clang tests, test-suite
x86-32 mingw none QT
x86-64 Mac OS 10.X last release llvm dejagnu, clang tests, test-suite (including spec)
x86-64 Linux last release llvm dejagnu, clang tests, test-suite (including spec)
x86-64 FreeBSD last release llvm dejagnu, clang tests, test-suite

Community Testing

Once all testing has been completed and appropriate bugs filed, the release candidate tarballs are put on the website and the LLVM community is notified. Ask that all LLVM developers test the release in 2 ways:

  1. Download llvm-X.Y, llvm-test-X.Y, and the appropriate clang binary. Build LLVM. Run make check and the full LLVM test suite (make TEST=nightly report).
  2. Download llvm-X.Y, llvm-test-X.Y, and the clang sources. Compile everything. Run make check and the full LLVM test suite (make TEST=nightly report).

Ask LLVM developers to submit the test suite report and make check results to the list. Verify that there are no regressions from the previous release. The results are not used to qualify a release, but to spot other potential problems. For unsupported targets, verify that make check is at least clean.

During the first round of testing, all regressions must be fixed before the second release candidate is tagged.

If this is the second round of testing, the testing is only to ensure that bug fixes previously merged in have not created new major problems. This is not the time to solve additional and unrelated bugs! If no patches are merged in, the release is determined to be ready and the release manager may move on to the next stage.

Release Patch Rules

Below are the rules regarding patching the release branch:

  1. Patches applied to the release branch may only be applied by the release manager.

  2. During the first round of testing, patches that fix regressions or that are small and relatively risk free (verified by the appropriate code owner) are applied to the branch. Code owners are asked to be very conservative in approving patches for the branch. We reserve the right to reject any patch that does not fix a regression as previously defined.

  3. During the remaining rounds of testing, only patches that fix critical regressions may be applied.

Release Final Tasks

The final stages of the release process involve tagging the "final" release branch, updating documentation that refers to the release, and updating the demo page.

Update Documentation

Review the documentation and ensure that it is up to date. The "Release Notes" must be updated to reflect new features, bug fixes, new known issues, and changes in the list of supported platforms. The "Getting Started Guide" should be updated to reflect the new release version number tag available from Subversion and changes in basic system requirements. Merge both changes from mainline into the release branch.

Tag the LLVM Final Release

Tag the final release sources using the following procedure:

 $ svn copy https://llvm.org/svn/llvm-project/llvm/branches/release_XY \
            https://llvm.org/svn/llvm-project/llvm/tags/RELEASE_XY/Final
 
 $ svn copy https://llvm.org/svn/llvm-project/cfe/branches/release_XY \
            https://llvm.org/svn/llvm-project/cfe/tags/RELEASE_XY/Final
 
 $ svn copy https://llvm.org/svn/llvm-project/dragonegg/branches/release_XY \
            https://llvm.org/svn/llvm-project/dragonegg/tags/RELEASE_XY/Final
 
 $ svn copy https://llvm.org/svn/llvm-project/test-suite/branches/release_XY \
            https://llvm.org/svn/llvm-project/test-suite/tags/RELEASE_XY/Final
 

Update the LLVM Demo Page

The LLVM demo page must be updated to use the new release. This consists of using the new clang binary and building LLVM.

Update the LLVM Website

The website must be updated before the release announcement is sent out. Here is what to do:

  1. Check out the www module from Subversion.
  2. Create a new subdirectory X.Y in the releases directory.
  3. Commit the llvm, test-suite, clang source, clang binaries, dragonegg source, and dragonegg binaries in this new directory.
  4. Copy and commit the llvm/docs and LICENSE.txt files into this new directory. The docs should be built with BUILD_FOR_WEBSITE=1.
  5. Commit the index.html to the releases/X.Y directory to redirect (use the one from the previous release).
  6. Update the releases/download.html file with the new release.
  7. Update the releases/index.html with the new release and link to release documentation.
  8. Finally, update the main page (index.html and sidebar) to point to the new release and release announcement. Make sure this all gets committed back into Subversion.

Announce the Release

Have Chris send out the release announcement when everything is finished.


Index: vendor/llvm/dist/docs/HowToSubmitABug.html =================================================================== --- vendor/llvm/dist/docs/HowToSubmitABug.html (revision 228363) +++ vendor/llvm/dist/docs/HowToSubmitABug.html (revision 228364) @@ -1,347 +1,348 @@ + How to submit an LLVM bug report

How to submit an LLVM bug report

  1. Introduction - Got bugs?
  2. Crashing Bugs
  3. Miscompilations
  4. Incorrect code generation (JIT and LLC)

Written by Chris Lattner and Misha Brukman


Introduction - Got bugs?

If you're working with LLVM and run into a bug, we definitely want to know about it. This document describes what you can do to increase the odds of getting it fixed quickly.

Basically you have to do two things at a minimum. First, decide whether the bug crashes the compiler (or an LLVM pass), or if the compiler is miscompiling the program (i.e., the compiler successfully produces an executable, but it doesn't run right). Based on what type of bug it is, follow the instructions in the linked section to narrow down the bug so that the person who fixes it will be able to find the problem more easily.

Once you have a reduced test-case, go to the LLVM Bug Tracking System and fill out the form with the necessary details (note that you don't need to pick a category, just use the "new-bugs" category if you're not sure). The bug description should contain the following information:

Thanks for helping us make LLVM better!

Crashing Bugs

More often than not, bugs in the compiler cause it to crash—often due to an assertion failure of some sort. The most important piece of the puzzle is to figure out if it is crashing in the GCC front-end or if it is one of the LLVM libraries (e.g. the optimizer or code generator) that has problems.

To figure out which component is crashing (the front-end, optimizer or code generator), run the llvm-gcc command line as you were when the crash occurred, but with the following extra command line options:

Front-end bugs

If the problem is in the front-end, you should re-run the same llvm-gcc command that resulted in the crash, but add the -save-temps option. The compiler will crash again, but it will leave behind a foo.i file (containing preprocessed C source code) and possibly foo.s for each compiled foo.c file. Send us the foo.i file, along with the options you passed to llvm-gcc, and a brief description of the error it caused.

The delta tool helps to reduce the preprocessed file down to the smallest amount of code that still replicates the problem. You're encouraged to use delta to reduce the code to make the developers' lives easier. This website has instructions on the best way to use delta.

Compile-time optimization bugs

If you find that a bug crashes in the optimizer, compile your test-case to a .bc file by passing "-emit-llvm -O0 -c -o foo.bc". Then run:

opt -std-compile-opts -debug-pass=Arguments foo.bc -disable-output

This command should do two things: it should print out a list of passes, and then it should crash in the same way as llvm-gcc. If it doesn't crash, please follow the instructions for a front-end bug.

If this does crash, then you should be able to debug this with the following bugpoint command:

bugpoint foo.bc <list of passes printed by opt>

Please run this, then file a bug with the instructions and reduced .bc files that bugpoint emits. If something goes wrong with bugpoint, please submit the "foo.bc" file and the list of passes printed by opt.

Code generator bugs

If you find a bug that crashes llvm-gcc in the code generator, compile your source file to a .bc file by passing "-emit-llvm -c -o foo.bc" to llvm-gcc (in addition to the options you already pass). Once you have foo.bc, one of the following commands should fail:

  1. llc foo.bc
  2. llc foo.bc -relocation-model=pic
  3. llc foo.bc -relocation-model=static

If none of these crash, please follow the instructions for a front-end bug. If one of these does crash, you should be able to reduce this with one of the following bugpoint command lines (use the one corresponding to the command above that failed):

  1. bugpoint -run-llc foo.bc
  2. bugpoint -run-llc foo.bc --tool-args -relocation-model=pic
  3. bugpoint -run-llc foo.bc --tool-args -relocation-model=static

Please run this, then file a bug with the instructions and reduced .bc file that bugpoint emits. If something goes wrong with bugpoint, please submit the "foo.bc" file and the option that llc crashes with.

Miscompilations

If llvm-gcc successfully produces an executable, but that executable doesn't run right, this is either a bug in the code or a bug in the compiler. The first thing to check is to make sure it is not using undefined behavior (e.g. reading a variable before it is defined). In particular, check to see if the program valgrinds clean, passes purify, or some other memory checker tool. Many of the "LLVM bugs" that we have chased down ended up being bugs in the program being compiled, not LLVM.

Once you determine that the program itself is not buggy, you should choose which code generator you wish to compile the program with (e.g. C backend, the JIT, or LLC) and optionally a series of LLVM passes to run. For example:

bugpoint -run-cbe [... optzn passes ...] file-to-test.bc --args -- [program arguments]

bugpoint will try to narrow down your list of passes to the one pass that causes an error, and simplify the bitcode file as much as it can to assist you. It will print a message letting you know how to reproduce the resulting error.

Incorrect code generation

Similar to debugging incorrect compilation by mis-behaving passes, you can debug incorrect code generation by either LLC or the JIT using bugpoint. In this case, bugpoint tries to narrow the code down to a function that is miscompiled by one or the other method; but since the entire program must be run for correctness, bugpoint compiles the code it deems unaffected with the C Backend and then links in the shared object it generates.

To debug the JIT:

 bugpoint -run-jit -output=[correct output file] [bitcode file]  \
          --tool-args -- [arguments to pass to lli]              \
          --args -- [program arguments]
 

Similarly, to debug LLC, one would run:

 bugpoint -run-llc -output=[correct output file] [bitcode file]  \
          --tool-args -- [arguments to pass to llc]              \
          --args -- [program arguments]
 

Special note: if you are debugging MultiSource or SPEC tests that already exist in the llvm/test hierarchy, there is an easier way to debug the JIT, LLC, and CBE, using the pre-written Makefile targets, which will pass the program options specified in the Makefiles:

cd llvm/test/../../program
make bugpoint-jit

At the end of a successful bugpoint run, you will be presented with two bitcode files: a safe file which can be compiled with the C backend and the test file which either LLC or the JIT mis-codegenerates, and thus causes the error.

To reproduce the error that bugpoint found, it is sufficient to do the following:

  1. Regenerate the shared object from the safe bitcode file:

    llc -march=c safe.bc -o safe.c
    gcc -shared safe.c -o safe.so

  2. If debugging LLC, compile the test bitcode natively and link with the shared object:

    llc test.bc -o test.s
    gcc test.s safe.so -o test.llc
    ./test.llc [program options]

  3. If debugging the JIT, load the shared object and supply the test bitcode:

    lli -load=safe.so test.bc [program options]


Index: vendor/llvm/dist/docs/LangRef.html =================================================================== --- vendor/llvm/dist/docs/LangRef.html (revision 228363) +++ vendor/llvm/dist/docs/LangRef.html (revision 228364) @@ -1,8622 +1,8108 @@ LLVM Assembly Language Reference Manual

LLVM Language Reference Manual

  1. Abstract
  2. Introduction
  3. Identifiers
  4. High Level Structure
    1. Module Structure
    2. Linkage Types
      1. 'private' Linkage
      2. 'linker_private' Linkage
      3. 'linker_private_weak' Linkage
      4. 'linker_private_weak_def_auto' Linkage
      5. 'internal' Linkage
      6. 'available_externally' Linkage
      7. 'linkonce' Linkage
      8. 'common' Linkage
      9. 'weak' Linkage
      10. 'appending' Linkage
      11. 'extern_weak' Linkage
      12. 'linkonce_odr' Linkage
      13. 'weak_odr' Linkage
      14. 'external' Linkage
      15. 'dllimport' Linkage
      16. 'dllexport' Linkage
    3. Calling Conventions
    4. Named Types
    5. Global Variables
    6. Functions
    7. Aliases
    8. Named Metadata
    9. Parameter Attributes
    10. Function Attributes
    11. Garbage Collector Names
    12. Module-Level Inline Assembly
    13. Data Layout
    14. Pointer Aliasing Rules
    15. Volatile Memory Accesses
    16. Memory Model for Concurrent Operations
    17. Atomic Memory Ordering Constraints
  5. Type System
    1. Type Classifications
    2. Primitive Types
      1. Integer Type
      2. Floating Point Types
      3. X86mmx Type
      4. Void Type
      5. Label Type
      6. Metadata Type
    3. Derived Types
      1. Aggregate Types
        1. Array Type
        2. Structure Type
        3. Opaque Structure Types
        4. Vector Type
      2. Function Type
      3. Pointer Type
  6. Constants
    1. Simple Constants
    2. Complex Constants
    3. Global Variable and Function Addresses
    4. Undefined Values
    5. Trap Values
    6. Addresses of Basic Blocks
    7. Constant Expressions
  7. Other Values
    1. Inline Assembler Expressions
    2. Metadata Nodes and Metadata Strings
  8. Intrinsic Global Variables
    1. The 'llvm.used' Global Variable
    2. The 'llvm.compiler.used' Global Variable
    3. The 'llvm.global_ctors' Global Variable
    4. The 'llvm.global_dtors' Global Variable
  9. Instruction Reference
    1. Terminator Instructions
      1. 'ret' Instruction
      2. 'br' Instruction
      3. 'switch' Instruction
      4. 'indirectbr' Instruction
      5. 'invoke' Instruction
      6. 'unwind' Instruction
      7. 'resume' Instruction
      8. 'unreachable' Instruction
    2. Binary Operations
      1. 'add' Instruction
      2. 'fadd' Instruction
      3. 'sub' Instruction
      4. 'fsub' Instruction
      5. 'mul' Instruction
      6. 'fmul' Instruction
      7. 'udiv' Instruction
      8. 'sdiv' Instruction
      9. 'fdiv' Instruction
      10. 'urem' Instruction
      11. 'srem' Instruction
      12. 'frem' Instruction
    3. Bitwise Binary Operations
      1. 'shl' Instruction
      2. 'lshr' Instruction
      3. 'ashr' Instruction
      4. 'and' Instruction
      5. 'or' Instruction
      6. 'xor' Instruction
    4. Vector Operations
      1. 'extractelement' Instruction
      2. 'insertelement' Instruction
      3. 'shufflevector' Instruction
    5. Aggregate Operations
      1. 'extractvalue' Instruction
      2. 'insertvalue' Instruction
    6. Memory Access and Addressing Operations
      1. 'alloca' Instruction
      2. 'load' Instruction
      3. 'store' Instruction
      4. 'fence' Instruction
      5. 'cmpxchg' Instruction
      6. 'atomicrmw' Instruction
      7. 'getelementptr' Instruction
    7. Conversion Operations
      1. 'trunc .. to' Instruction
      2. 'zext .. to' Instruction
      3. 'sext .. to' Instruction
      4. 'fptrunc .. to' Instruction
      5. 'fpext .. to' Instruction
      6. 'fptoui .. to' Instruction
      7. 'fptosi .. to' Instruction
      8. 'uitofp .. to' Instruction
      9. 'sitofp .. to' Instruction
      10. 'ptrtoint .. to' Instruction
      11. 'inttoptr .. to' Instruction
      12. 'bitcast .. to' Instruction
    8. Other Operations
      1. 'icmp' Instruction
      2. 'fcmp' Instruction
      3. 'phi' Instruction
      4. 'select' Instruction
      5. 'call' Instruction
      6. 'va_arg' Instruction
      7. 'landingpad' Instruction
  10. Intrinsic Functions
    1. Variable Argument Handling Intrinsics
      1. 'llvm.va_start' Intrinsic
      2. 'llvm.va_end' Intrinsic
      3. 'llvm.va_copy' Intrinsic
    2. Accurate Garbage Collection Intrinsics
      1. 'llvm.gcroot' Intrinsic
      2. 'llvm.gcread' Intrinsic
      3. 'llvm.gcwrite' Intrinsic
    3. Code Generator Intrinsics
      1. 'llvm.returnaddress' Intrinsic
      2. 'llvm.frameaddress' Intrinsic
      3. 'llvm.stacksave' Intrinsic
      4. 'llvm.stackrestore' Intrinsic
      5. 'llvm.prefetch' Intrinsic
      6. 'llvm.pcmarker' Intrinsic
      7. 'llvm.readcyclecounter' Intrinsic
    4. Standard C Library Intrinsics
      1. 'llvm.memcpy.*' Intrinsic
      2. 'llvm.memmove.*' Intrinsic
      3. 'llvm.memset.*' Intrinsic
      4. 'llvm.sqrt.*' Intrinsic
      5. 'llvm.powi.*' Intrinsic
      6. 'llvm.sin.*' Intrinsic
      7. 'llvm.cos.*' Intrinsic
      8. 'llvm.pow.*' Intrinsic
      9. 'llvm.exp.*' Intrinsic
      10. 'llvm.log.*' Intrinsic
      11. 'llvm.fma.*' Intrinsic
    5. Bit Manipulation Intrinsics
      1. 'llvm.bswap.*' Intrinsics
      2. 'llvm.ctpop.*' Intrinsic
      3. 'llvm.ctlz.*' Intrinsic
      4. 'llvm.cttz.*' Intrinsic
    6. Arithmetic with Overflow Intrinsics
      1. 'llvm.sadd.with.overflow.* Intrinsics
      2. 'llvm.uadd.with.overflow.* Intrinsics
      3. 'llvm.ssub.with.overflow.* Intrinsics
      4. 'llvm.usub.with.overflow.* Intrinsics
      5. 'llvm.smul.with.overflow.* Intrinsics
      6. 'llvm.umul.with.overflow.* Intrinsics
    7. Half Precision Floating Point Intrinsics
      1. 'llvm.convert.to.fp16' Intrinsic
      2. 'llvm.convert.from.fp16' Intrinsic
    8. Debugger intrinsics
    9. Exception Handling intrinsics
    10. Trampoline Intrinsics
      1. 'llvm.init.trampoline' Intrinsic
      2. 'llvm.adjust.trampoline' Intrinsic
    11. Memory Use Markers
      1. llvm.lifetime.start
      2. llvm.lifetime.end
      3. llvm.invariant.start
      4. llvm.invariant.end
    12. General intrinsics
      1. 'llvm.var.annotation' Intrinsic
      2. 'llvm.annotation.*' Intrinsic
      3. 'llvm.trap' Intrinsic
      4. 'llvm.stackprotector' Intrinsic
      5. 'llvm.objectsize' Intrinsic

Written by Chris Lattner and Vikram Adve

Abstract

This document is a reference manual for the LLVM assembly language. LLVM is a Static Single Assignment (SSA) based representation that provides type safety, low-level operations, flexibility, and the capability of representing 'all' high-level languages cleanly. It is the common code representation used throughout all phases of the LLVM compilation strategy.

Introduction

The LLVM code representation is designed to be used in three different forms: as an in-memory compiler IR, as an on-disk bitcode representation (suitable for fast loading by a Just-In-Time compiler), and as a human readable assembly language representation. This allows LLVM to provide a powerful intermediate representation for efficient compiler transformations and analysis, while providing a natural means to debug and visualize the transformations. The three different forms of LLVM are all equivalent. This document describes the human readable representation and notation.

The LLVM representation aims to be light-weight and low-level while being expressive, typed, and extensible at the same time. It aims to be a "universal IR" of sorts, by being at a low enough level that high-level ideas may be cleanly mapped to it (similar to how microprocessors are "universal IR's", allowing many source languages to be mapped to them). By providing type information, LLVM can be used as the target of optimizations: for example, through pointer analysis, it can be proven that a C automatic variable is never accessed outside of the current function, allowing it to be promoted to a simple SSA value instead of a memory location.

Well-Formedness

It is important to note that this document describes 'well formed' LLVM assembly language. There is a difference between what the parser accepts and what is considered 'well formed'. For example, the following instruction is syntactically okay, but not well formed:

 %x = add i32 1, %x
 

because the definition of %x does not dominate all of its uses. The LLVM infrastructure provides a verification pass that may be used to verify that an LLVM module is well formed. This pass is automatically run by the parser after parsing input assembly and by the optimizer before it outputs bitcode. The violations pointed out by the verifier pass indicate bugs in transformation passes or input to the parser.
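
For example, a module containing the instruction above can be checked by assembling it with llvm-as (the parser runs the verifier after parsing) or by running the verification pass explicitly through opt. This is only a sketch; the file name is hypothetical:

 $ llvm-as bad.ll -o /dev/null
 $ opt -verify bad.ll -disable-output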

Identifiers

LLVM identifiers come in two basic types: global and local. Global identifiers (functions, global variables) begin with the '@' character. Local identifiers (register names, types) begin with the '%' character. Additionally, there are three different formats for identifiers, for different purposes:

  1. Named values are represented as a string of characters with their prefix. For example, %foo, @DivisionByZero, %a.really.long.identifier. The actual regular expression used is '[%@][a-zA-Z$._][a-zA-Z$._0-9]*'. Identifiers which require other characters in their names can be surrounded with quotes. Special characters may be escaped using "\xx" where xx is the ASCII code for the character in hexadecimal. In this way, any character can be used in a name value, even quotes themselves.
  2. Unnamed values are represented as an unsigned numeric value with their prefix. For example, %12, @2, %44.
  3. Constants, which are described in a section about constants, below.

LLVM requires that values start with a prefix for two reasons: Compilers don't need to worry about name clashes with reserved words, and the set of reserved words may be expanded in the future without penalty. Additionally, unnamed identifiers allow a compiler to quickly come up with a temporary variable without having to avoid symbol table conflicts.

Reserved words in LLVM are very similar to reserved words in other languages. There are keywords for different opcodes ('add', 'bitcast', 'ret', etc...), for primitive type names ('void', 'i32', etc...), and others. These reserved words cannot conflict with variable names, because none of them start with a prefix character ('%' or '@').

Here is an example of LLVM code to multiply the integer variable '%X' by 8:

The easy way:

 %result = mul i32 %X, 8
 

After strength reduction:

 %result = shl i32 %X, 3
 

And the hard way:

 %0 = add i32 %X, %X           ; yields {i32}:%0
 %1 = add i32 %0, %0           ; yields {i32}:%1
 %result = add i32 %1, %1
 

This last way of multiplying %X by 8 illustrates several important lexical features of LLVM:

  1. Comments are delimited with a ';' and go until the end of line.
  2. Unnamed temporaries are created when the result of a computation is not assigned to a named value.
  3. Unnamed temporaries are numbered sequentially.

It also shows a convention that we follow in this document. When demonstrating instructions, we will follow an instruction with a comment that defines the type and name of value produced. Comments are shown in italic text.

High Level Structure

Module Structure

LLVM programs are composed of "Module"s, each of which is a translation unit of the input programs. Each module consists of functions, global variables, and symbol table entries. Modules may be combined together with the LLVM linker, which merges function (and global variable) definitions, resolves forward declarations, and merges symbol table entries. Here is an example of the "hello world" module:

 ; Declare the string constant as a global constant. 
 @.LC0 = internal constant [13 x i8] c"hello world\0A\00"      ; [13 x i8]* 
 
 ; External declaration of the puts function 
 declare i32 @puts(i8*)                                      ; i32 (i8*)*  
 
 ; Definition of main function
 define i32 @main() {   ; i32()*  
   ; Convert [13 x i8]* to i8*... 
   %cast210 = getelementptr [13 x i8]* @.LC0, i64 0, i64 0   ; i8* 
 
   ; Call puts function to write out the string to stdout. 
   call i32 @puts(i8* %cast210)           ; i32 
   ret i32 0 
 }
 
 ; Named metadata
 !1 = metadata !{i32 41}
 !foo = !{!1, null}
 

This example is made up of a global variable named ".LC0", an external declaration of the "puts" function, a function definition for "main" and named metadata "foo".

In general, a module is made up of a list of global values, where both functions and global variables are global values. Global values are represented by a pointer to a memory location (in this case, a pointer to an array of char, and a pointer to a function), and have one of the following linkage types.

Linkage Types

All Global Variables and Functions have one of the following types of linkage:

private
Global values with "private" linkage are only directly accessible by objects in the current module. In particular, linking code into a module with an private global value may cause the private to be renamed as necessary to avoid collisions. Because the symbol is private to the module, all references can be updated. This doesn't show up in any symbol table in the object file.
linker_private
Similar to private, but the symbol is passed through the assembler and evaluated by the linker. Unlike normal strong symbols, they are removed by the linker from the final linked image (executable or dynamic library).
linker_private_weak
Similar to "linker_private", but the symbol is weak. Note that linker_private_weak symbols are subject to coalescing by the linker. The symbols are removed by the linker from the final linked image (executable or dynamic library).
linker_private_weak_def_auto
Similar to "linker_private_weak", but it's known that the address of the object is not taken. For instance, functions that had an inline definition, but the compiler decided not to inline it. Note, unlike linker_private and linker_private_weak, linker_private_weak_def_auto may have only default visibility. The symbols are removed by the linker from the final linked image (executable or dynamic library).
internal
Similar to private, but the value shows as a local symbol (STB_LOCAL in the case of ELF) in the object file. This corresponds to the notion of the 'static' keyword in C.
available_externally
Globals with "available_externally" linkage are never emitted into the object file corresponding to the LLVM module. They exist to allow inlining and other optimizations to take place given knowledge of the definition of the global, which is known to be somewhere outside the module. Globals with available_externally linkage are allowed to be discarded at will, and are otherwise the same as linkonce_odr. This linkage type is only allowed on definitions, not declarations.
linkonce
Globals with "linkonce" linkage are merged with other globals of the same name when linkage occurs. This can be used to implement some forms of inline functions, templates, or other code which must be generated in each translation unit that uses it, but where the body may be overridden with a more definitive definition later. Unreferenced linkonce globals are allowed to be discarded. Note that linkonce linkage does not actually allow the optimizer to inline the body of this function into callers because it doesn't know if this definition of the function is the definitive definition within the program or whether it will be overridden by a stronger definition. To enable inlining and other optimizations, use "linkonce_odr" linkage.
weak
"weak" linkage has the same merging semantics as linkonce linkage, except that unreferenced globals with weak linkage may not be discarded. This is used for globals that are declared "weak" in C source code.
common
"common" linkage is most similar to "weak" linkage, but they are used for tentative definitions in C, such as "int X;" at global scope. Symbols with "common" linkage are merged in the same way as weak symbols, and they may not be deleted if unreferenced. common symbols may not have an explicit section, must have a zero initializer, and may not be marked 'constant'. Functions and aliases may not have common linkage.
appending
"appending" linkage may only be applied to global variables of pointer to array type. When two global variables with appending linkage are linked together, the two global arrays are appended together. This is the LLVM, typesafe, equivalent of having the system linker append together "sections" with identical names when .o files are linked.
extern_weak
The semantics of this linkage follow the ELF object file model: the symbol is weak until linked; if not linked, the symbol becomes null instead of being an undefined reference.
linkonce_odr
weak_odr
Some languages allow differing globals to be merged, such as two functions with different semantics. Other languages, such as C++, ensure that only equivalent globals are ever merged (the "one definition rule" — "ODR"). Such languages can use the linkonce_odr and weak_odr linkage types to indicate that the global will only be merged with equivalent globals. These linkage types are otherwise the same as their non-odr versions.
external
If none of the above identifiers are used, the global is externally visible, meaning that it participates in linkage and can be used to resolve external symbol references.

The next two types of linkage are targeted at the Microsoft Windows platform only. They are designed to support importing (exporting) symbols from (to) DLLs (Dynamic Link Libraries).

dllimport
"dllimport" linkage causes the compiler to reference a function or variable via a global pointer to a pointer that is set up by the DLL exporting the symbol. On Microsoft Windows targets, the pointer name is formed by combining __imp_ and the function or variable name.
dllexport
"dllexport" linkage causes the compiler to provide a global pointer to a pointer in a DLL, so that it can be referenced with the dllimport attribute. On Microsoft Windows targets, the pointer name is formed by combining __imp_ and the function or variable name.

For example, since the ".LC0" variable is defined to be internal, if another module defined a ".LC0" variable and was linked with this one, one of the two would be renamed, preventing a collision. Since "main" and "puts" are external (i.e., lacking any linkage declarations), they are accessible outside of the current module.

It is illegal for a function declaration to have any linkage type other than external, dllimport or extern_weak.

Aliases can have only external, internal, weak or weak_odr linkages.
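
As an illustrative sketch (all names are made up), a single module might combine several of these linkage types:

 @counter = internal global i32 0                        ; like a C 'static' variable
 @.str = private unnamed_addr constant [4 x i8] c"hi\0A\00"
 @maybe = extern_weak global i32                         ; null if not linked in

 define linkonce_odr i32 @square(i32 %x) {
   %r = mul i32 %x, %x
   ret i32 %r
 }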

Calling Conventions

LLVM functions, calls and invokes can all have an optional calling convention specified for the call. The calling convention of any pair of dynamic caller/callee must match, or the behavior of the program is undefined. The following calling conventions are supported by LLVM, and more may be added in the future:

"ccc" - The C calling convention:
This calling convention (the default if no other calling convention is specified) matches the target C calling conventions. This calling convention supports varargs function calls and tolerates some mismatch in the declared prototype and implemented declaration of the function (as does normal C).
"fastcc" - The fast calling convention:
This calling convention attempts to make calls as fast as possible (e.g. by passing things in registers). This calling convention allows the target to use whatever tricks it wants to produce fast code for the target, without having to conform to an externally specified ABI (Application Binary Interface). Tail calls can only be optimized when this or the GHC convention is used. This calling convention does not support varargs and requires the prototype of all callees to exactly match the prototype of the function definition.
"coldcc" - The cold calling convention:
This calling convention attempts to make code in the caller as efficient as possible under the assumption that the call is not commonly executed. As such, these calls often preserve all registers so that the call does not break any live ranges in the caller side. This calling convention does not support varargs and requires the prototype of all callees to exactly match the prototype of the function definition.
"cc 10" - GHC convention:
This calling convention has been implemented specifically for use by the Glasgow Haskell Compiler (GHC). It passes everything in registers, going to extremes to achieve this by disabling callee save registers. This calling convention should not be used lightly but only for specific situations such as an alternative to the register pinning performance technique often used when implementing functional programming languages. At the moment only X86 supports this convention and it has the following limitations:
  • On X86-32 only supports up to 4 bit type parameters. No floating point types are supported.
  • On X86-64 only supports up to 10 bit type parameters and 6 floating point parameters.
This calling convention supports tail call optimization, but requires that both the caller and the callee use it.
"cc <n>" - Numbered convention:
Any calling convention may be specified by number, allowing target-specific calling conventions to be used. Target specific calling conventions start at 64.

More calling conventions can be added/defined on an as-needed basis, to support Pascal conventions or any other well-known target-independent convention.
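
For illustration, a caller and callee using the fast calling convention might look like the following sketch (hypothetical names); note that both the call site and the definition specify fastcc:

 define fastcc i32 @callee(i32 %x) {
   ret i32 %x
 }

 define i32 @caller(i32 %a) {
   %r = call fastcc i32 @callee(i32 %a)   ; convention matches the callee
   ret i32 %r
 }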

Visibility Styles

All Global Variables and Functions have one of the following visibility styles:

"default" - Default style:
On targets that use the ELF object file format, default visibility means that the declaration is visible to other modules and, in shared libraries, means that the declared entity may be overridden. On Darwin, default visibility means that the declaration is visible to other modules. Default visibility corresponds to "external linkage" in the language.
"hidden" - Hidden style:
Two declarations of an object with hidden visibility refer to the same object if they are in the same shared object. Usually, hidden visibility indicates that the symbol will not be placed into the dynamic symbol table, so no other module (executable or shared library) can reference it directly.
"protected" - Protected style:
On ELF, protected visibility indicates that the symbol will be placed in the dynamic symbol table, but that references within the defining module will bind to the local symbol. That is, the symbol cannot be overridden by another module.
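
A small sketch with hypothetical names:

 @api_data = global i32 0                  ; default visibility
 @impl_detail = hidden global i32 0        ; kept out of the dynamic symbol table

 define protected void @entry_point() {
   ret void
 }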

Named Types

LLVM IR allows you to specify name aliases for certain types. This can make it easier to read the IR and make the IR more condensed (particularly when recursive types are involved). An example of a name specification is:

 %mytype = type { %mytype*, i32 }
 

You may give a name to any type except "void". Type name aliases may be used anywhere a type is expected with the syntax "%mytype".

Note that type names are aliases for the structural type that they indicate, and that you can therefore specify multiple names for the same type. This often leads to confusing behavior when dumping out a .ll file. Since LLVM IR uses structural typing, the name is not part of the type. When printing out LLVM IR, the printer will pick one name to render all types of a particular shape. This means that if you have code where two different source types end up having the same LLVM type, the dumper will sometimes print the "wrong" or unexpected type. This is an important design point and isn't going to change.
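
As a minimal sketch of this structural behavior (made-up names), both of the following names denote the same underlying type, and the printer is free to use either one when dumping values of that shape:

 %Point = type { i32, i32 }
 %Pair  = type { i32, i32 }    ; structurally identical to %Point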

Global Variables

Global variables define regions of memory allocated at compilation time instead of run-time. Global variables may optionally be initialized, may have an explicit section to be placed in, and may have an optional explicit alignment specified. A variable may be defined as "thread_local", which means that it will not be shared by threads (each thread will have a separate copy of the variable). A variable may be defined as a global "constant," which indicates that the contents of the variable will never be modified (enabling better optimization, allowing the global data to be placed in the read-only section of an executable, etc). Note that variables that need runtime initialization cannot be marked "constant" as there is a store to the variable.

LLVM explicitly allows declarations of global variables to be marked constant, even if the final definition of the global is not. This capability can be used to enable slightly better optimization of the program, but requires the language definition to guarantee that optimizations based on the 'constantness' are valid for the translation units that do not include the definition.

As SSA values, global variables define pointer values that are in scope (i.e. they dominate) all basic blocks in the program. Global variables always define a pointer to their "content" type because they describe a region of memory, and all memory objects in LLVM are accessed through pointers.

Global variables can be marked with unnamed_addr which indicates that the address is not significant, only the content. Constants marked like this can be merged with other constants if they have the same initializer. Note that a constant with a significant address can be merged with an unnamed_addr constant, the result being a constant whose address is significant.

A global variable may be declared to reside in a target-specific numbered address space. For targets that support them, address spaces may affect how optimizations are performed and/or what target instructions are used to access the variable. The default address space is zero. The address space qualifier must precede any other attributes.

LLVM allows an explicit section to be specified for globals. If the target supports it, it will emit globals to the section specified.

An explicit alignment may be specified for a global, which must be a power of 2. If not present, or if the alignment is set to zero, the alignment of the global is set by the target to whatever it feels convenient. If an explicit alignment is specified, the global is forced to have exactly that alignment. Targets and optimizers are not allowed to over-align the global if the global has an assigned section. In this case, the extra alignment could be observable: for example, code could assume that the globals are densely packed in their section and try to iterate over them as an array; alignment padding would break this iteration.

For example, the following defines a global in a numbered address space with an initializer, section, and alignment:

 @G = addrspace(5) constant float 1.0, section "foo", align 4
 

Functions

LLVM function definitions consist of the "define" keyword, an optional linkage type, an optional visibility style, an optional calling convention, an optional unnamed_addr attribute, a return type, an optional parameter attribute for the return type, a function name, a (possibly empty) argument list (each with optional parameter attributes), optional function attributes, an optional section, an optional alignment, an optional garbage collector name, an opening curly brace, a list of basic blocks, and a closing curly brace.

LLVM function declarations consist of the "declare" keyword, an optional linkage type, an optional visibility style, an optional calling convention, an optional unnamed_addr attribute, a return type, an optional parameter attribute for the return type, a function name, a possibly empty list of arguments, an optional alignment, and an optional garbage collector name.

A function definition contains a list of basic blocks, forming the CFG (Control Flow Graph) for the function. Each basic block may optionally start with a label (giving the basic block a symbol table entry), contains a list of instructions, and ends with a terminator instruction (such as a branch or function return).

The first basic block in a function is special in two ways: it is immediately executed on entrance to the function, and it is not allowed to have predecessor basic blocks (i.e. there can not be any branches to the entry block of a function). Because the block can have no predecessors, it also cannot have any PHI nodes.

LLVM allows an explicit section to be specified for functions. If the target supports it, it will emit functions to the section specified.

An explicit alignment may be specified for a function. If not present, or if the alignment is set to zero, the alignment of the function is set by the target to whatever it feels convenient. If an explicit alignment is specified, the function is forced to have at least that much alignment. All alignments must be a power of 2.

If the unnamed_addr attribute is given, the address is known to not be significant and two identical functions can be merged.

Syntax:
 define [linkage] [visibility]
        [cconv] [ret attrs]
        <ResultType> @<FunctionName> ([argument list])
        [fn Attrs] [section "name"] [align N]
        [gc] { ... }
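
A concrete sketch that combines several of the optional pieces (the names and values are illustrative only):

 define private fastcc i32 @add1(i32 %x) nounwind align 16 {
 entry:
   %r = add i32 %x, 1
   ret i32 %r
 }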
 

Aliases

Aliases act as "second name" for the aliasee value (which can be either function, global variable, another alias or bitcast of global value). Aliases may have an optional linkage type, and an optional visibility style.

Syntax:
 @<Name> = alias [Linkage] [Visibility] <AliaseeTy> @<Aliasee>
 

Named Metadata

Named metadata is a collection of metadata. Metadata nodes (but not metadata strings) are the only valid operands for a named metadata.

Syntax:
 ; Some unnamed metadata nodes, which are referenced by the named metadata.
 !0 = metadata !{metadata !"zero"}
 !1 = metadata !{metadata !"one"}
 !2 = metadata !{metadata !"two"}
 ; A named metadata.
 !name = !{!0, !1, !2}
 

Parameter Attributes

The return type and each parameter of a function type may have a set of parameter attributes associated with them. Parameter attributes are used to communicate additional information about the result or parameters of a function. Parameter attributes are considered to be part of the function, not of the function type, so functions with different parameter attributes can have the same function type.

Parameter attributes are simple keywords that follow the type specified. If multiple parameter attributes are needed, they are space separated. For example:

 declare i32 @printf(i8* noalias nocapture, ...)
 declare i32 @atoi(i8 zeroext)
 declare signext i8 @returns_signed_char()
 

Note that any attributes for the function result (nounwind, readonly) come immediately after the argument list.

Currently, only the following parameter attributes are defined:

zeroext
This indicates to the code generator that the parameter or return value should be zero-extended to the extent required by the target's ABI (which is usually 32-bits, but is 8-bits for a i1 on x86-64) by the caller (for a parameter) or the callee (for a return value).
signext
This indicates to the code generator that the parameter or return value should be sign-extended to the extent required by the target's ABI (which is usually 32-bits) by the caller (for a parameter) or the callee (for a return value).
inreg
This indicates that this parameter or return value should be treated in a special target-dependent fashion while emitting code for a function call or return (usually, by putting it in a register as opposed to memory, though some targets use it to distinguish between two different kinds of registers). Use of this attribute is target-specific.
byval

This indicates that the pointer parameter should really be passed by value to the function. The attribute implies that a hidden copy of the pointee is made between the caller and the callee, so the callee is unable to modify the value in the caller. This attribute is only valid on LLVM pointer arguments. It is generally used to pass structs and arrays by value, but is also valid on pointers to scalars. The copy is considered to belong to the caller, not the callee (for example, readonly functions should not write to byval parameters). This is not a valid attribute for return values.

The byval attribute also supports specifying an alignment with the align attribute. It indicates the alignment of the stack slot to form and the known alignment of the pointer specified to the call site. If the alignment is not specified, then the code generator makes a target-specific assumption.

sret
This indicates that the pointer parameter specifies the address of a structure that is the return value of the function in the source program. This pointer must be guaranteed by the caller to be valid: loads and stores to the structure may be assumed by the callee not to trap. This may only be applied to the first parameter. This is not a valid attribute for return values.
noalias
This indicates that pointer values based on the argument or return value do not alias pointer values which are not based on it, ignoring certain "irrelevant" dependencies. For a call to the parent function, dependencies between memory references from before or after the call and from those during the call are "irrelevant" to the noalias keyword for the arguments and return value used in that call. The caller shares the responsibility with the callee for ensuring that these requirements are met. For further details, please see the discussion of the NoAlias response in alias analysis.

Note that this definition of noalias is intentionally similar to the definition of restrict in C99 for function arguments, though it is slightly weaker.
For function return values, C99's restrict is not meaningful, while LLVM's noalias is.
nocapture
This indicates that the callee does not make any copies of the pointer that outlive the callee itself. This is not a valid attribute for return values.
nest
This indicates that the pointer parameter can be excised using the trampoline intrinsics. This is not a valid attribute for return values.

Garbage Collector Names

Each function may specify a garbage collector name, which is simply a string:

 define void @f() gc "name" { ... }
 

The compiler declares the supported values of name. Specifying a collector will cause the compiler to alter its output in order to support the named garbage collection algorithm.

Function Attributes

Function attributes are set to communicate additional information about a function. Function attributes are considered to be part of the function, not of the function type, so functions with different function attributes can have the same function type.

Function attributes are simple keywords that follow the type specified. If multiple attributes are needed, they are space separated. For example:

 define void @f() noinline { ... }
 define void @f() alwaysinline { ... }
 define void @f() alwaysinline optsize { ... }
 define void @f() optsize { ... }
 
alignstack(<n>)
This attribute indicates that, when emitting the prologue and epilogue, the backend should forcibly align the stack pointer. Specify the desired alignment, which must be a power of two, in parentheses.
alwaysinline
This attribute indicates that the inliner should attempt to inline this function into callers whenever possible, ignoring any active inlining size threshold for this caller.
nonlazybind
This attribute suppresses lazy symbol binding for the function. This may make calls to the function faster, at the cost of extra program startup time if the function is not called during program startup.
inlinehint
This attribute indicates that the source code contained a hint that inlining this function is desirable (such as the "inline" keyword in C/C++). It is just a hint; it imposes no requirements on the inliner.
naked
This attribute disables prologue / epilogue emission for the function. This can have very system-specific consequences.
noimplicitfloat
This attribute disables implicit floating point instructions.
noinline
This attribute indicates that the inliner should never inline this function in any situation. This attribute may not be used together with the alwaysinline attribute.
noredzone
This attribute indicates that the code generator should not use a red zone, even if the target-specific ABI normally permits it.
noreturn
This function attribute indicates that the function never returns normally. This produces undefined behavior at runtime if the function ever does dynamically return.
nounwind
This function attribute indicates that the function never returns with an unwind or exceptional control flow. If the function does unwind, its runtime behavior is undefined.
optsize
This attribute suggests that optimization passes and code generator passes make choices that keep the code size of this function low, and otherwise do optimizations specifically to reduce code size.
readnone
This attribute indicates that the function computes its result (or decides to unwind an exception) based strictly on its arguments, without dereferencing any pointer arguments or otherwise accessing any mutable state (e.g. memory, control registers, etc) visible to caller functions. It does not write through any pointer arguments (including byval arguments) and never changes any state visible to callers. This means that it cannot unwind exceptions by calling the C++ exception throwing methods, but could use the unwind instruction.
readonly
This attribute indicates that the function does not write through any pointer arguments (including byval arguments) or otherwise modify any state (e.g. memory, control registers, etc) visible to caller functions. It may dereference pointer arguments and read state that may be set in the caller. A readonly function always returns the same value (or unwinds an exception identically) when called with the same set of arguments and global state. It cannot unwind an exception by calling the C++ exception throwing methods, but may use the unwind instruction.
ssp
This attribute indicates that the function should emit a stack smashing protector. It is in the form of a "canary"—a random value placed on the stack before the local variables that's checked upon return from the function to see if it has been overwritten. A heuristic is used to determine if a function needs stack protectors or not.

If a function that has an ssp attribute is inlined into a function that doesn't have an ssp attribute, then the resulting function will have an ssp attribute.
sspreq
This attribute indicates that the function should always emit a stack smashing protector. This overrides the ssp function attribute.

If a function that has an sspreq attribute is inlined into a function that doesn't have an sspreq attribute or which has an ssp attribute, then the resulting function will have an sspreq attribute.
uwtable
This attribute indicates that the ABI being targeted requires that an unwind table entry be produced for this function even if we can show that no exceptions pass through it. This is normally the case for the ELF x86-64 ABI, but it can be disabled for some compilation units.
returns_twice
This attribute indicates that this function can return twice. The C setjmp is an example of such a function. The compiler disables some optimizations (like tail calls) in the caller of these functions.

Module-Level Inline Assembly

Modules may contain "module-level inline asm" blocks, which correspond to the GCC "file scope inline asm" blocks. These blocks are internally concatenated by LLVM and treated as a single unit, but may be separated in the .ll file if desired. The syntax is very simple:

 module asm "inline asm code goes here"
 module asm "more can go here"
 

The strings can contain any character by escaping non-printable characters. The escape sequence used is simply "\xx" where "xx" is the two digit hex code for the number.

The inline asm code is simply printed to the machine code .s file when assembly code is generated.

Data Layout

A module may specify a target specific data layout string that specifies how data is to be laid out in memory. The syntax for the data layout is simply:

 target datalayout = "layout specification"
 

The layout specification consists of a list of specifications separated by the minus sign character ('-'). Each specification starts with a letter and may include other information after the letter to define some aspect of the data layout. The specifications accepted are as follows:

E
Specifies that the target lays out data in big-endian form. That is, the bits with the most significance have the lowest address location.
e
Specifies that the target lays out data in little-endian form. That is, the bits with the least significance have the lowest address location.
Ssize
Specifies the natural alignment of the stack in bits. Alignment promotion of stack variables is limited to the natural stack alignment to avoid dynamic stack realignment. The stack alignment must be a multiple of 8-bits. If omitted, the natural stack alignment defaults to "unspecified", which does not prevent any alignment promotions.
p:size:abi:pref
This specifies the size of a pointer and its abi and preferred alignments. All sizes are in bits. Specifying the pref alignment is optional. If omitted, the preceding : should be omitted too.
isize:abi:pref
This specifies the alignment for an integer type of a given bit size. The value of size must be in the range [1,2^23).
vsize:abi:pref
This specifies the alignment for a vector type of a given bit size.
fsize:abi:pref
This specifies the alignment for a floating point type of a given bit size. Only values of size that are supported by the target will work. 32 (float) and 64 (double) are supported on all targets; 80 or 128 (different flavors of long double) are also supported on some targets.
asize:abi:pref
This specifies the alignment for an aggregate type of a given bit size.
ssize:abi:pref
This specifies the alignment for a stack object of a given bit size.
nsize1:size2:size3...
This specifies a set of native integer widths for the target CPU in bits. For example, it might contain "n32" for 32-bit PowerPC, "n32:64" for PowerPC 64, or "n8:16:32:64" for X86-64. Elements of this set are considered to support most general arithmetic operations efficiently.
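
As an illustration, a layout string for a little-endian target with 64-bit pointers might look like the following; the specific values here are only a plausible sketch, not the canonical layout of any particular target:

  target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-n8:16:32:64-S128"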

When constructing the data layout for a given target, LLVM starts with a default set of specifications which are then (possibly) overridden by the specifications in the datalayout keyword. The default specifications are given in this list:

When LLVM is determining the alignment for a given type, it uses the following rules:

  1. If the type sought is an exact match for one of the specifications, that specification is used.
  2. If no match is found, and the type sought is an integer type, then the smallest integer type that is larger than the bitwidth of the sought type is used. If none of the specifications are larger than the bitwidth then the largest integer type is used. For example, given the default specifications above, the i7 type will use the alignment of i8 (next largest) while both i65 and i256 will use the alignment of i64 (largest specified).
  3. If no match is found, and the type sought is a vector type, then the largest vector type that is smaller than the sought vector type will be used as a fall back. This happens because <128 x double> can be implemented in terms of 64 <2 x double>, for example.

The function of the data layout string may not be what you expect. Notably, this is not a specification from the frontend of what alignment the code generator should use.

Instead, if specified, the target data layout is required to match what the ultimate code generator expects. This string is used by the mid-level optimizers to improve code, and this only works if it matches what the ultimate code generator uses. If you would like to generate IR that does not embed this target-specific detail into the IR, then you don't have to specify the string. This will disable some optimizations that require precise layout information, but this also prevents those optimizations from introducing target specificity into the IR.

Pointer Aliasing Rules

Any memory access must be done through a pointer value associated with an address range of the memory access, otherwise the behavior is undefined. Pointer values are associated with address ranges according to the following rules:

A pointer value is based on another pointer value according to the following rules:

Note that this definition of "based" is intentionally similar to the definition of "based" in C99, though it is slightly weaker.

LLVM IR does not associate types with memory. The result type of a load merely indicates the size and alignment of the memory from which to load, as well as the interpretation of the value. The first operand type of a store similarly only indicates the size and alignment of the store.

Consequently, type-based alias analysis, aka TBAA, aka -fstrict-aliasing, is not applicable to general unadorned LLVM IR. Metadata may be used to encode additional information which specialized optimization passes may use to implement type-based alias analysis.

Volatile Memory Accesses

Certain memory accesses, such as loads, stores, and llvm.memcpys, may be marked volatile. The optimizers must not change the number of volatile operations or change their order of execution relative to other volatile operations. The optimizers may change the order of volatile operations relative to non-volatile operations. This is not Java's "volatile" and has no cross-thread synchronization behavior.
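
For example, written with the same spelling used by the examples later in this document (the global @g is hypothetical):

   %old = volatile load i32* @g       ; volatile read: may not be deleted or reordered
                                      ; past other volatile operations
   volatile store i32 0, i32* @g      ; volatile write: likewise preserved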

Memory Model for Concurrent Operations

The LLVM IR does not define any way to start parallel threads of execution or to register signal handlers. Nonetheless, there are platform-specific ways to create them, and we define LLVM IR's behavior in their presence. This model is inspired by the C++0x memory model.

For a more informal introduction to this model, see the LLVM Atomic Instructions and Concurrency Guide.

We define a happens-before partial order as the least partial order that

Note that program order does not introduce happens-before edges between a thread and signals executing inside that thread.

Every (defined) read operation (load instructions, memcpy, atomic loads/read-modify-writes, etc.) R reads a series of bytes written by (defined) write operations (store instructions, atomic stores/read-modify-writes, memcpy, etc.). For the purposes of this section, initialized globals are considered to have a write of the initializer which is atomic and happens before any other read or write of the memory in question. For each byte of a read R, R_byte may see any write to the same byte, except:

Given that definition, R_byte is defined as follows:

R returns the value composed of the series of bytes it read. This implies that some bytes within the value may be undef without the entire value being undef. Note that this only defines the semantics of the operation; it doesn't mean that targets will emit more than one instruction to read the series of bytes.

Note that in cases where none of the atomic intrinsics are used, this model places only one restriction on IR transformations on top of what is required for single-threaded execution: introducing a store to a byte which might not otherwise be stored is not allowed in general. (Specifically, in the case where another thread might write to and read from an address, introducing a store can change a load that may see exactly one write into a load that may see multiple writes.)

Atomic Memory Ordering Constraints

Atomic instructions (cmpxchg, atomicrmw, fence, atomic load, and atomic store) take an ordering parameter that determines which other atomic instructions on the same address they synchronize with. These semantics are borrowed from Java and C++0x, but are somewhat more colloquial. If these descriptions aren't precise enough, check those specs (see spec references in the atomics guide). fence instructions treat these orderings somewhat differently since they don't take an address. See that instruction's documentation for details.

For a simpler introduction to the ordering constraints, see the LLVM Atomic Instructions and Concurrency Guide.

unordered
The set of values that can be read is governed by the happens-before partial order. A value cannot be read unless some operation wrote it. This is intended to provide a guarantee strong enough to model Java's non-volatile shared variables. This ordering cannot be specified for read-modify-write operations; it is not strong enough to make them atomic in any interesting way.
monotonic
In addition to the guarantees of unordered, there is a single total order for modifications by monotonic operations on each address. All modification orders must be compatible with the happens-before order. There is no guarantee that the modification orders can be combined to a global total order for the whole program (and this often will not be possible). The read in an atomic read-modify-write operation (cmpxchg and atomicrmw) reads the value in the modification order immediately before the value it writes. If one atomic read happens before another atomic read of the same address, the later read must see the same value or a later value in the address's modification order. This disallows reordering of monotonic (or stronger) operations on the same address. If an address is written monotonically by one thread, and other threads monotonically read that address repeatedly, the other threads must eventually see the write. This corresponds to the C++0x/C1x memory_order_relaxed.
acquire
In addition to the guarantees of monotonic, a synchronizes-with edge may be formed with a release operation. This is intended to model C++'s memory_order_acquire.
release
In addition to the guarantees of monotonic, if this operation writes a value which is subsequently read by an acquire operation, it synchronizes-with that operation. (This isn't a complete description; see the C++0x definition of a release sequence.) This corresponds to the C++0x/C1x memory_order_release.
acq_rel (acquire+release)
Acts as both an acquire and release operation on its address. This corresponds to the C++0x/C1x memory_order_acq_rel.
seq_cst (sequentially consistent)
In addition to the guarantees of acq_rel (acquire for an operation which only reads, release for an operation which only writes), there is a global total order on all sequentially-consistent operations on all addresses, which is consistent with the happens-before partial order and with the modification orders of all the affected addresses. Each sequentially-consistent read sees the last preceding write to the same address in this global order. This corresponds to the C++0x/C1x memory_order_seq_cst and Java volatile.

If an atomic operation is marked singlethread, it only synchronizes with or participates in modification and seq_cst total orderings with other operations running in the same thread (for example, in signal handlers).
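
As a rough sketch of how these orderings appear in the IR (the globals @flag and @counter are hypothetical), the ordering keyword is written after the operands of the atomic instruction:

   store atomic i32 1, i32* @flag release, align 4      ; release store
   %f = load atomic i32* @flag acquire, align 4         ; acquire load that can pair with the store above
   %old = atomicrmw add i32* @counter, i32 1 monotonic  ; atomic increment with relaxed ordering
   fence seq_cst                                        ; sequentially consistent fence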

Type System

The LLVM type system is one of the most important features of the intermediate representation. Being typed enables a number of optimizations to be performed on the intermediate representation directly, without having to do extra analyses on the side before the transformation. A strong type system makes it easier to read the generated code and enables novel analyses and transformations that are not feasible to perform on normal three address code representations.

Type Classifications

The types fall into a few useful classifications:

Classification   Types
integer          i1, i2, i3, ... i8, ... i16, ... i32, ... i64, ...
floating point   float, double, x86_fp80, fp128, ppc_fp128
first class      integer, floating point, pointer, vector, structure, array, label, metadata.
primitive        label, void, integer, floating point, x86mmx, metadata.
derived          array, function, pointer, structure, vector, opaque.

The first class types are perhaps the most important. Values of these types are the only ones which can be produced by instructions.

Primitive Types

The primitive types are the fundamental building blocks of the LLVM system.

Integer Type

Overview:

The integer type is a very simple type that simply specifies an arbitrary bit width for the integer type desired. Any bit width from 1 bit to 2^23-1 (about 8 million) can be specified.

Syntax:
   iN
 

The number of bits the integer will occupy is specified by the N value.

Examples:
i1 a single-bit integer.
i32 a 32-bit integer.
i1942652 a really big integer of over 1 million bits.

Floating Point Types

Type        Description
float       32-bit floating point value
double      64-bit floating point value
fp128       128-bit floating point value (112-bit mantissa)
x86_fp80    80-bit floating point value (X87)
ppc_fp128   128-bit floating point value (two 64-bits)

X86mmx Type

Overview:

The x86mmx type represents a value held in an MMX register on an x86 machine. The operations allowed on it are quite limited: parameters and return values, load and store, and bitcast. User-specified MMX instructions are represented as intrinsic or asm calls with arguments and/or results of this type. There are no arrays, vectors or constants of this type.

Syntax:
   x86mmx
 

Void Type

Overview:

The void type does not represent any value and has no size.

Syntax:
   void
 

Label Type

Overview:

The label type represents code labels.

Syntax:
   label
 

Metadata Type

Overview:

The metadata type represents embedded metadata. No derived types may be created from metadata except for function arguments.

Syntax:
   metadata
 

Derived Types

The real power in LLVM comes from the derived types in the system. This is what allows a programmer to represent arrays, functions, pointers, and other useful types. Each of these types contains one or more element types, which may be a primitive type or another derived type. For example, it is possible to have a two dimensional array, using an array as the element type of another array.


Aggregate Types

Aggregate Types are a subset of derived types that can contain multiple member types. Arrays, structs, and vectors are aggregate types.

Array Type

Overview:

The array type is a very simple derived type that arranges elements sequentially in memory. The array type requires a size (number of elements) and an underlying data type.

Syntax:
   [<# elements> x <elementtype>]
 

The number of elements is a constant integer value; elementtype may be any type with a size.

Examples:
[40 x i32] Array of 40 32-bit integer values.
[41 x i32] Array of 41 32-bit integer values.
[4 x i8] Array of 4 8-bit integer values.

Here are some examples of multidimensional arrays:

[3 x [4 x i32]] 3x4 array of 32-bit integer values.
[12 x [10 x float]] 12x10 array of single precision floating point values.
[2 x [3 x [4 x i16]]] 2x3x4 array of 16-bit integer values.

There is no restriction on indexing beyond the end of the array implied by a static type (though there are restrictions on indexing beyond the bounds of an allocated object in some cases). This means that single-dimension 'variable sized array' addressing can be implemented in LLVM with a zero length array type. An implementation of 'pascal style arrays' in LLVM could use the type "{ i32, [0 x float]}", for example.
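
A small sketch of the 'pascal style array' idea (the value names %pascal and %idx are hypothetical, and LLVM itself does not check %idx against the stored length):

   %lenptr = getelementptr { i32, [0 x float] }* %pascal, i32 0, i32 0
   %len    = load i32* %lenptr
   %eltptr = getelementptr { i32, [0 x float] }* %pascal, i32 0, i32 1, i32 %idx
   %elt    = load float* %eltptr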

Function Type

Overview:

The function type can be thought of as a function signature. It consists of a return type and a list of formal parameter types. The return type of a function type is a first class type or a void type.

Syntax:
   <returntype> (<parameter list>)
 

...where '<parameter list>' is a comma-separated list of type specifiers. Optionally, the parameter list may include a type ..., which indicates that the function takes a variable number of arguments. Variable argument functions can access their arguments with the variable argument handling intrinsic functions. '<returntype>' is any type except label.

Examples:
i32 (i32) function taking an i32, returning an i32
float (i16, i32 *) * Pointer to a function that takes an i16 and a pointer to i32, returning float.
i32 (i8*, ...) A vararg function that takes at least one pointer to i8 (char in C), which returns an integer. This is the signature for printf in LLVM.
{i32, i32} (i32) A function taking an i32, returning a structure containing two i32 values

Structure Type

Overview:

The structure type is used to represent a collection of data members together in memory. The elements of a structure may be any type that has a size.

Structures in memory are accessed using 'load' and 'store' by getting a pointer to a field with the 'getelementptr' instruction. Structures in registers are accessed using the 'extractvalue' and 'insertvalue' instructions.

Structures may optionally be "packed" structures, which indicate that the alignment of the struct is one byte, and that there is no padding between the elements. In non-packed structs, padding between field types is inserted as defined by the TargetData string in the module, which is required to match what the underlying code generator expects.

Structures can either be "literal" or "identified". A literal structure is defined inline with other types (e.g. {i32, i32}*) whereas identified types are always defined at the top level with a name. Literal types are uniqued by their contents and can never be recursive or opaque since there is no way to write one. Identified types can be recursive, can be opaque, and are never uniqued.

Syntax:
   %T1 = type { <type list> }     ; Identified normal struct type
   %T2 = type <{ <type list> }>   ; Identified packed struct type
 
Examples:
{ i32, i32, i32 } A triple of three i32 values
{ float, i32 (i32) * } A pair, where the first element is a float and the second element is a pointer to a function that takes an i32, returning an i32.
<{ i8, i32 }> A packed struct known to be 5 bytes in size.

Opaque Structure Types

Overview:

Opaque structure types are used to represent named structure types that do not have a body specified. This corresponds (for example) to the C notion of a forward declared structure.

Syntax:
   %X = type opaque
   %52 = type opaque
 
Examples:
opaque An opaque type.

Pointer Type

Overview:

The pointer type is used to specify memory locations. Pointers are commonly used to reference objects in memory.

Pointer types may have an optional address space attribute defining the numbered address space where the pointed-to object resides. The default address space is number zero. The semantics of non-zero address spaces are target-specific.

Note that LLVM does not permit pointers to void (void*) nor does it permit pointers to labels (label*). Use i8* instead.

Syntax:
   <type> *
 
Examples:
[4 x i32]* A pointer to array of four i32 values.
i32 (i32*) * A pointer to a function that takes an i32*, returning an i32.
i32 addrspace(5)* A pointer to an i32 value that resides in address space #5.

Vector Type

Overview:

A vector type is a simple derived type that represents a vector of elements. Vector types are used when multiple primitive data are operated in parallel using a single instruction (SIMD). A vector type requires a size (number of elements) and an underlying primitive data type. Vector types are considered first class.

Syntax:
   < <# elements> x <elementtype> >
 

The number of elements is a constant integer value larger than 0; elementtype may be any integer or floating point type. Vectors of size zero are not allowed, and pointers are not allowed as the element type.

Examples:
<4 x i32> Vector of 4 32-bit integer values.
<8 x float> Vector of 8 32-bit floating-point values.
<2 x i64> Vector of 2 64-bit integer values.

Constants

LLVM has several different basic types of constants. This section describes them all and their syntax.

Simple Constants

Boolean constants
The two strings 'true' and 'false' are both valid constants of the i1 type.
Integer constants
Standard integers (such as '4') are constants of the integer type. Negative numbers may be used with integer types.
Floating point constants
Floating point constants use standard decimal notation (e.g. 123.421), exponential notation (e.g. 1.23421e+2), or a more precise hexadecimal notation (see below). The assembler requires the exact decimal value of a floating-point constant. For example, the assembler accepts 1.25 but rejects 1.3 because 1.3 is a repeating decimal in binary. Floating point constants must have a floating point type.
Null pointer constants
The identifier 'null' is recognized as a null pointer constant and must be of pointer type.

The one non-intuitive notation for constants is the hexadecimal form of floating point constants. For example, the form 'double 0x432ff973cafa8000' is equivalent to (but harder to read than) 'double 4.5e+15'. The only time hexadecimal floating point constants are required (and the only time that they are generated by the disassembler) is when a floating point constant must be emitted but it cannot be represented as a decimal floating point number in a reasonable number of digits. For example, NaN's, infinities, and other special values are represented in their IEEE hexadecimal format so that assembly and disassembly do not cause any bits to change in the constants.

When using the hexadecimal form, constants of types float and double are represented using the 16-digit form shown above (which matches the IEEE754 representation for double); float values must, however, be exactly representable as IEEE754 single precision. Hexadecimal format is always used for long double, and there are three forms of long double. The 80-bit format used by x86 is represented as 0xK followed by 20 hexadecimal digits. The 128-bit format used by PowerPC (two adjacent doubles) is represented by 0xM followed by 32 hexadecimal digits. The IEEE 128-bit format is represented by 0xL followed by 32 hexadecimal digits; no currently supported target uses this format. Long doubles will only work if they match the long double format on your target. All hexadecimal formats are big-endian (sign bit at the left).
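
A few sketched global definitions illustrating these forms (the global names are made up):

   @d1  = global double 4.5e+15              ; ordinary exponential notation
   @d2  = global double 0x432ff973cafa8000   ; the same value written in hexadecimal form
   @inf = global double 0x7FF0000000000000   ; +infinity, which has no exact decimal spelling
   @f1  = global float  0x3FF4000000000000   ; the value 1.25 as a float, using the 16-digit form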

There are no constants of type x86mmx.

Complex Constants

Complex constants are a (potentially recursive) combination of simple constants and smaller complex constants.

Structure constants
Structure constants are represented with notation similar to structure type definitions (a comma separated list of elements, surrounded by braces ({})). For example: "{ i32 4, float 17.0, i32* @G }", where "@G" is declared as "@G = external global i32". Structure constants must have structure type, and the number and types of elements must match those specified by the type.
Array constants
Array constants are represented with notation similar to array type definitions (a comma separated list of elements, surrounded by square brackets ([])). For example: "[ i32 42, i32 11, i32 74 ]". Array constants must have array type, and the number and types of elements must match those specified by the type.
Vector constants
Vector constants are represented with notation similar to vector type definitions (a comma separated list of elements, surrounded by less-than/greater-than's (<>)). For example: "< i32 42, i32 11, i32 74, i32 100 >". Vector constants must have vector type, and the number and types of elements must match those specified by the type.
Zero initialization
The string 'zeroinitializer' can be used to zero initialize a value of any type, including scalar and aggregate types. This is often used to avoid having to print large zero initializers (e.g. for large arrays) and is always exactly equivalent to using explicit zero initializers; a short example appears after this list.
Metadata node
A metadata node is a structure-like constant with metadata type. For example: "metadata !{ i32 0, metadata !"test" }". Unlike other constants that are meant to be interpreted as part of the instruction stream, metadata is a place to attach additional information such as debug info.
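
For example, a sketched global using 'zeroinitializer' (the name @buffer is hypothetical):

   @buffer = global [4096 x i8] zeroinitializer   ; equivalent to writing out 4096 explicit "i8 0" elements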

Global Variable and Function Addresses

The addresses of global variables and functions are always implicitly valid (link-time) constants. These constants are explicitly referenced when the identifier for the global is used and always have pointer type. For example, the following is a legal LLVM file:

 @X = global i32 17
 @Y = global i32 42
 @Z = global [2 x i32*] [ i32* @X, i32* @Y ]
 

Undefined Values

The string 'undef' can be used anywhere a constant is expected, and indicates that the user of the value may receive an unspecified bit-pattern. Undefined values may be of any type (other than 'label' or 'void') and be used anywhere a constant is permitted.

Undefined values are useful because they indicate to the compiler that the program is well defined no matter what value is used. This gives the compiler more freedom to optimize. Here are some examples of (potentially surprising) transformations that are valid (in pseudo IR):

   %A = add %X, undef
   %B = sub %X, undef
   %C = xor %X, undef
 Safe:
   %A = undef
   %B = undef
   %C = undef
 

This is safe because all of the output bits are affected by the undef bits. Any output bit can have a zero or one depending on the input bits.

   %A = or %X, undef
   %B = and %X, undef
 Safe:
   %A = -1
   %B = 0
 Unsafe:
   %A = undef
   %B = undef
 

These logical operations have bits that are not always affected by the input. For example, if %X has a zero bit, then the output of the 'and' operation will always be a zero for that bit, no matter what the corresponding bit from the 'undef' is. As such, it is unsafe to optimize or assume that the result of the 'and' is 'undef'. However, it is safe to assume that all bits of the 'undef' could be 0, and optimize the 'and' to 0. Likewise, it is safe to assume that all the bits of the 'undef' operand to the 'or' could be set, allowing the 'or' to be folded to -1.

   %A = select undef, %X, %Y
   %B = select undef, 42, %Y
   %C = select %X, %Y, undef
 Safe:
   %A = %X     (or %Y)
   %B = 42     (or %Y)
   %C = %Y
 Unsafe:
   %A = undef
   %B = undef
   %C = undef
 

This set of examples shows that undefined 'select' (and conditional branch) conditions can go either way, but they have to come from one of the two operands. In the %A example, if %X and %Y were both known to have a clear low bit, then %A would have to have a cleared low bit. However, in the %C example, the optimizer is allowed to assume that the 'undef' operand could be the same as %Y, allowing the whole 'select' to be eliminated.

   %A = xor undef, undef
 
   %B = undef
   %C = xor %B, %B
 
   %D = undef
   %E = icmp lt %D, 4
   %F = icmp gte %D, 4
 
 Safe:
   %A = undef
   %B = undef
   %C = undef
   %D = undef
   %E = undef
   %F = undef
 

This example points out that two 'undef' operands are not necessarily the same. This can be surprising to people who assume that "X^X" is always zero, even if X is undefined (C semantics, incidentally, allow the same behavior). This isn't true for a number of reasons, but the short answer is that an 'undef' "variable" can arbitrarily change its value over its "live range". This is true because the variable doesn't actually have a live range. Instead, the value is logically read from arbitrary registers that happen to be around when needed, so the value is not necessarily consistent over time. In fact, %A and %C need to have the same semantics or the core LLVM "replace all uses with" concept would not hold.

   %A = fdiv undef, %X
   %B = fdiv %X, undef
 Safe:
   %A = undef
 b: unreachable
 

These examples show the crucial difference between an undefined value and undefined behavior. An undefined value (like 'undef') is allowed to have an arbitrary bit-pattern. This means that the %A operation can be constant folded to 'undef', because the 'undef' could be an SNaN, and fdiv is not (currently) defined on SNaN's. However, in the second example, we can make a more aggressive assumption: because the undef is allowed to be an arbitrary value, we are allowed to assume that it could be zero. Since a divide by zero has undefined behavior, we are allowed to assume that the operation does not execute at all. This allows us to delete the divide and all code after it. Because the undefined operation "can't happen", the optimizer can assume that it occurs in dead code.

 a:  store undef -> %X
 b:  store %X -> undef
 Safe:
 a: <deleted>
 b: unreachable
 

These examples reiterate the fdiv example: a store of an undefined value can be assumed to not have any effect; we can assume that the value is overwritten with bits that happen to match what was already there. However, a store to an undefined location could clobber arbitrary memory, therefore, it has undefined behavior.

Trap Values

Trap values are similar to undef values; however, instead of representing an unspecified bit pattern, they represent the fact that an instruction or constant expression which cannot evoke side effects has nevertheless detected a condition which results in undefined behavior.

There is currently no way of representing a trap value in the IR; they only exist when produced by operations such as add with the nsw flag.

Trap value behavior is defined in terms of value dependence:

Whenever a trap value is generated, all values which depend on it evaluate to trap. If they have side effects, they evoke their side effects as if each operand with a trap value were undef. If they have externally-visible side effects, the behavior is undefined.

Here are some examples:

 entry:
   %trap = sub nuw i32 0, 1           ; Results in a trap value.
   %still_trap = and i32 %trap, 0     ; Whereas (and i32 undef, 0) would return 0.
   %trap_yet_again = getelementptr i32* @h, i32 %still_trap
   store i32 0, i32* %trap_yet_again  ; undefined behavior
 
   store i32 %trap, i32* @g           ; Trap value conceptually stored to memory.
   %trap2 = load i32* @g              ; Returns a trap value, not just undef.
 
   volatile store i32 %trap, i32* @g  ; External observation; undefined behavior.
 
   %narrowaddr = bitcast i32* @g to i16*
   %wideaddr = bitcast i32* @g to i64*
   %trap3 = load i16* %narrowaddr     ; Returns a trap value.
   %trap4 = load i64* %wideaddr       ; Returns a trap value.
 
   %cmp = icmp slt i32 %trap, 0       ; Returns a trap value.
   br i1 %cmp, label %true, label %end ; Branch to either destination.
 
 true:
   volatile store i32 0, i32* @g      ; This is control-dependent on %cmp, so
                                      ; it has undefined behavior.
   br label %end
 
 end:
   %p = phi i32 [ 0, %entry ], [ 1, %true ]
                                      ; Both edges into this PHI are
                                      ; control-dependent on %cmp, so this
                                      ; always results in a trap value.
 
   volatile store i32 0, i32* @g      ; This would depend on the store in %true
                                      ; if %cmp is true, or the store in %entry
                                      ; otherwise, so this is undefined behavior.
 
   br i1 %cmp, label %second_true, label %second_end
                                      ; The same branch again, but this time the
                                      ; true block doesn't have side effects.
 
 second_true:
   ; No side effects!
   ret void
 
 second_end:
   volatile store i32 0, i32* @g      ; This time, the instruction always depends
                                      ; on the store in %end. Also, it is
                                      ; control-equivalent to %end, so this is
                                      ; well-defined (again, ignoring earlier
                                      ; undefined behavior in this example).
 

Addresses of Basic Blocks

blockaddress(@function, %block)

The 'blockaddress' constant computes the address of the specified basic block in the specified function, and always has an i8* type. Taking the address of the entry block is illegal.

This value only has defined behavior when used as an operand to the 'indirectbr' instruction, or for comparisons against null. Pointer equality tests between label addresses result in undefined behavior; comparison against null is ok, however, and no label is equal to the null pointer. This may be passed around as an opaque pointer sized value as long as the bits are not inspected. This allows ptrtoint and arithmetic to be performed on these values so long as the original value is reconstituted before the indirectbr instruction.

Finally, some targets may provide defined semantics when using the value as the operand to an inline assembly, but that is target specific.
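
A minimal sketch of the intended use (the function, block, and global names are hypothetical):

   @addr = global i8* blockaddress(@dispatch, %target)

   define void @dispatch() {
   entry:
     %dest = load i8** @addr
     indirectbr i8* %dest, [ label %target ]
   target:
     ret void
   }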

Constant Expressions

Constant expressions are used to allow expressions involving other constants to be used as constants. Constant expressions may be of any first class type and may involve any LLVM operation that does not have side effects (e.g. load and call are not supported). The following is the syntax for constant expressions:

trunc (CST to TYPE)
Truncate a constant to another type. The bit size of CST must be larger than the bit size of TYPE. Both types must be integers.
zext (CST to TYPE)
Zero extend a constant to another type. The bit size of CST must be smaller than the bit size of TYPE. Both types must be integers.
sext (CST to TYPE)
Sign extend a constant to another type. The bit size of CST must be smaller than the bit size of TYPE. Both types must be integers.
fptrunc (CST to TYPE)
Truncate a floating point constant to another floating point type. The size of CST must be larger than the size of TYPE. Both types must be floating point.
fpext (CST to TYPE)
Floating point extend a constant to another type. The size of CST must be smaller or equal to the size of TYPE. Both types must be floating point.
fptoui (CST to TYPE)
Convert a floating point constant to the corresponding unsigned integer constant. TYPE must be a scalar or vector integer type. CST must be of scalar or vector floating point type. Both CST and TYPE must be scalars, or vectors of the same number of elements. If the value won't fit in the integer type, the results are undefined.
fptosi (CST to TYPE)
Convert a floating point constant to the corresponding signed integer constant. TYPE must be a scalar or vector integer type. CST must be of scalar or vector floating point type. Both CST and TYPE must be scalars, or vectors of the same number of elements. If the value won't fit in the integer type, the results are undefined.
uitofp (CST to TYPE)
Convert an unsigned integer constant to the corresponding floating point constant. TYPE must be a scalar or vector floating point type. CST must be of scalar or vector integer type. Both CST and TYPE must be scalars, or vectors of the same number of elements. If the value won't fit in the floating point type, the results are undefined.
sitofp (CST to TYPE)
Convert a signed integer constant to the corresponding floating point constant. TYPE must be a scalar or vector floating point type. CST must be of scalar or vector integer type. Both CST and TYPE must be scalars, or vectors of the same number of elements. If the value won't fit in the floating point type, the results are undefined.
ptrtoint (CST to TYPE)
Convert a pointer typed constant to the corresponding integer constant. TYPE must be an integer type. CST must be of pointer type. The CST value is zero extended, truncated, or unchanged to make it fit in TYPE.
inttoptr (CST to TYPE)
Convert an integer constant to a pointer constant. TYPE must be a pointer type. CST must be of integer type. The CST value is zero extended, truncated, or unchanged to make it fit in a pointer size. This one is really dangerous!
bitcast (CST to TYPE)
Convert a constant, CST, to another TYPE. The constraints of the operands are the same as those for the bitcast instruction.
getelementptr (CSTPTR, IDX0, IDX1, ...)
getelementptr inbounds (CSTPTR, IDX0, IDX1, ...)
Perform the getelementptr operation on constants. As with the getelementptr instruction, the index list may have zero or more indexes, which are required to make sense for the type of "CSTPTR".
select (COND, VAL1, VAL2)
Perform the select operation on constants.
icmp COND (VAL1, VAL2)
Performs the icmp operation on constants.
fcmp COND (VAL1, VAL2)
Performs the fcmp operation on constants.
extractelement (VAL, IDX)
Perform the extractelement operation on constants.
insertelement (VAL, ELT, IDX)
Perform the insertelement operation on constants.
shufflevector (VEC1, VEC2, IDXMASK)
Perform the shufflevector operation on constants.
extractvalue (VAL, IDX0, IDX1, ...)
Perform the extractvalue operation on constants. The index list is interpreted in a similar manner as indices in a 'getelementptr' operation. At least one index value must be specified.
insertvalue (VAL, ELT, IDX0, IDX1, ...)
Perform the insertvalue operation on constants. The index list is interpreted in a similar manner as indices in a 'getelementptr' operation. At least one index value must be specified.
OPCODE (LHS, RHS)
Perform the specified operation of the LHS and RHS constants. OPCODE may be any of the binary or bitwise binary operations. The constraints on operands are the same as those for the corresponding instruction (e.g. no bitwise operations on floating point values are allowed).
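
A few sketched globals whose initializers use constant expressions (all of the names are made up):

   @arr    = global [10 x i32] zeroinitializer
   @third  = global i32* getelementptr ([10 x i32]* @arr, i32 0, i32 3)   ; address of element 3 of @arr
   @as_int = global i64 ptrtoint ([10 x i32]* @arr to i64)                ; the array's address as an integer
   @eleven = global i32 add (i32 4, i32 7)                                ; a folded binary operation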

Other Values

Inline Assembler Expressions

LLVM supports inline assembler expressions (as opposed to Module-Level Inline Assembly) through the use of a special value. This value represents the inline assembler as a string (containing the instructions to emit), a list of operand constraints (stored as a string), a flag that indicates whether or not the inline asm expression has side effects, and a flag indicating whether the function containing the asm needs to align its stack conservatively. An example inline assembler expression is:

 i32 (i32) asm "bswap $0", "=r,r"
 

Inline assembler expressions may only be used as the callee operand of a call instruction. Thus, typically we have:

 %X = call i32 asm "bswap $0", "=r,r"(i32 %Y)
 

Inline asms with side effects not visible in the constraint list must be marked as having side effects. This is done through the use of the 'sideeffect' keyword, like so:

 call void asm sideeffect "eieio", ""()
 

In some cases inline asms will contain code that will not work unless the stack is aligned in some way, such as calls or SSE instructions on x86, yet will not contain code that does that alignment within the asm. The compiler should make conservative assumptions about what the asm might contain and should generate its usual stack alignment code in the prologue if the 'alignstack' keyword is present:

 call void asm alignstack "eieio", ""()
 

If both keywords appear the 'sideeffect' keyword must come first.

TODO: The format of the asm and constraints string still needs to be documented here. Constraints on what can be done (e.g. duplication, moving, etc.) need to be documented. This is probably best done by reference to another document that covers inline asm from a holistic perspective.

Inline Asm Metadata

The call instructions that wrap inline asm nodes may have a "!srcloc" MDNode attached to them that contains a list of constant integers. If present, the code generator will use the integer as the location cookie value when reporting errors through the LLVMContext error reporting mechanisms. This allows a front-end to correlate backend errors that occur with inline asm back to the source code that produced it. For example:

 call void asm sideeffect "something bad", ""(), !srcloc !42
 ...
 !42 = !{ i32 1234567 }
 

It is up to the front-end to make sense of the magic numbers it places in the IR. If the MDNode contains multiple constants, the code generator will use the one that corresponds to the line of the asm that the error occurs on.

Metadata Nodes and Metadata Strings

LLVM IR allows metadata to be attached to instructions in the program that can convey extra information about the code to the optimizers and code generator. One example application of metadata is source-level debug information. There are two metadata primitives: strings and nodes. All metadata has the metadata type and is identified in syntax by a preceding exclamation point ('!').

A metadata string is a string surrounded by double quotes. It can contain any character by escaping non-printable characters with "\xx" where "xx" is the two digit hex code. For example: "!"test\00"".

Metadata nodes are represented with notation similar to structure constants (a comma separated list of elements, surrounded by braces and preceded by an exclamation point). For example: "!{ metadata !"test\00", i32 10}". Metadata nodes can have any values as their operands.

Named metadata is a collection of metadata nodes, which can be looked up in the module symbol table. For example: "!foo = metadata !{!4, !3}".

Metadata can be used as function arguments. Here the llvm.dbg.value function is using two metadata arguments.

 call void @llvm.dbg.value(metadata !24, i64 0, metadata !25)
 

Metadata can be attached to an instruction. Here metadata !21 is attached to the add instruction using the !dbg identifier.

 %indvar.next = add i64 %indvar, 1, !dbg !21
 

Intrinsic Global Variables

LLVM has a number of "magic" global variables that contain data that affect code generation or other IR semantics. These are documented here. All globals of this sort should have a section specified as "llvm.metadata". This section and all globals that start with "llvm." are reserved for use by LLVM.

The 'llvm.used' Global Variable

The @llvm.used global is an array with i8* element type which has appending linkage. This array contains a list of pointers to global variables and functions which may optionally have a pointer cast formed of bitcast or getelementptr. For example, a legal use of it is:

   @X = global i8 4
   @Y = global i32 123
 
   @llvm.used = appending global [2 x i8*] [
      i8* @X,
      i8* bitcast (i32* @Y to i8*)
   ], section "llvm.metadata"
 

If a global variable appears in the @llvm.used list, then the compiler, assembler, and linker are required to treat the symbol as if there is a reference to the global that it cannot see. For example, if a variable has internal linkage and no references other than that from the @llvm.used list, it cannot be deleted. This is commonly used to represent references from inline asms and other things the compiler cannot "see", and corresponds to "__attribute__((used))" in GNU C.

On some targets, the code generator must emit a directive to the assembler or object file to prevent the assembler and linker from molesting the symbol.

The 'llvm.compiler.used' Global Variable

The @llvm.compiler.used directive is the same as the @llvm.used directive, except that it only prevents the compiler from touching the symbol. On targets that support it, this allows an intelligent linker to optimize references to the symbol without being impeded as it would be by @llvm.used.

This is a rare construct that should only be used in rare circumstances, and should not be exposed to source languages.

The 'llvm.global_ctors' Global Variable

 %0 = type { i32, void ()* }
 @llvm.global_ctors = appending global [1 x %0] [%0 { i32 65535, void ()* @ctor }]
 

The @llvm.global_ctors array contains a list of constructor functions and associated priorities. The functions referenced by this array will be called in ascending order of priority (i.e. lowest first) when the module is loaded. The order of functions with the same priority is not defined.

The 'llvm.global_dtors' Global Variable

 %0 = type { i32, void ()* }
 @llvm.global_dtors = appending global [1 x %0] [%0 { i32 65535, void ()* @dtor }]
 

The @llvm.global_dtors array contains a list of destructor functions and associated priorities. The functions referenced by this array will be called in descending order of priority (i.e. highest first) when the module is unloaded. The order of functions with the same priority is not defined.

Instruction Reference

The LLVM instruction set consists of several different classifications of instructions: terminator instructions, binary instructions, bitwise binary instructions, memory instructions, and other instructions.

Terminator Instructions

As mentioned previously, every basic block in a program ends with a "Terminator" instruction, which indicates which block should be executed after the current block is finished. These terminator instructions typically yield a 'void' value: they produce control flow, not values (the one exception being the 'invoke' instruction).

The terminator instructions are: 'ret', 'br', 'switch', 'indirectbr', 'invoke', 'unwind', 'resume', and 'unreachable'.

'ret' Instruction

Syntax:
   ret <type> <value>       ; Return a value from a non-void function
   ret void                 ; Return from void function
 
Overview:

The 'ret' instruction is used to return control flow (and optionally a value) from a function back to the caller.

There are two forms of the 'ret' instruction: one that returns a value and then causes control flow, and one that just causes control flow to occur.

Arguments:

The 'ret' instruction optionally accepts a single argument, the return value. The type of the return value must be a 'first class' type.

A function is not well formed if it has a non-void return type and contains a 'ret' instruction with no return value or a return value with a type that does not match the function's return type, or if it has a void return type and contains a 'ret' instruction with a return value.

Semantics:

When the 'ret' instruction is executed, control flow returns back to the calling function's context. If the caller is a "call" instruction, execution continues at the instruction after the call. If the caller was an "invoke" instruction, execution continues at the beginning of the "normal" destination block. If the instruction returns a value, that value shall set the call or invoke instruction's return value.

Example:
   ret i32 5                       ; Return an integer value of 5
   ret void                        ; Return from a void function
   ret { i32, i8 } { i32 4, i8 2 } ; Return a struct of values 4 and 2
 

'br' Instruction

Syntax:
   br i1 <cond>, label <iftrue>, label <iffalse>
   br label <dest>          ; Unconditional branch
 
Overview:

The 'br' instruction is used to cause control flow to transfer to a different basic block in the current function. There are two forms of this instruction, corresponding to a conditional branch and an unconditional branch.

Arguments:

The conditional branch form of the 'br' instruction takes a single 'i1' value and two 'label' values. The unconditional form of the 'br' instruction takes a single 'label' value as a target.

Semantics:

Upon execution of a conditional 'br' instruction, the 'i1' argument is evaluated. If the value is true, control flows to the 'iftrue' label argument. If "cond" is false, control flows to the 'iffalse' label argument.

Example:
 Test:
   %cond = icmp eq i32 %a, %b
   br i1 %cond, label %IfEqual, label %IfUnequal
 IfEqual:
   ret i32 1
 IfUnequal:
   ret i32 0
 

'switch' Instruction

Syntax:
   switch <intty> <value>, label <defaultdest> [ <intty> <val>, label <dest> ... ]
 
Overview:

The 'switch' instruction is used to transfer control flow to one of several different places. It is a generalization of the 'br' instruction, allowing a branch to occur to one of many possible destinations.

Arguments:

The 'switch' instruction uses three parameters: an integer comparison value 'value', a default 'label' destination, and an array of pairs of comparison value constants and 'label's. The table is not allowed to contain duplicate constant entries.

Semantics:

The switch instruction specifies a table of values and destinations. When the 'switch' instruction is executed, this table is searched for the given value. If the value is found, control flow is transferred to the corresponding destination; otherwise, control flow is transferred to the default destination.

Implementation:

Depending on properties of the target machine and the particular switch instruction, this instruction may be code generated in different ways. For example, it could be generated as a series of chained conditional branches or with a lookup table.

Example:
  ; Emulate a conditional br instruction
  %Val = zext i1 %value to i32
  switch i32 %Val, label %truedest [ i32 0, label %falsedest ]
 
  ; Emulate an unconditional br instruction
  switch i32 0, label %dest [ ]
 
  ; Implement a jump table:
  switch i32 %val, label %otherwise [ i32 0, label %onzero
                                      i32 1, label %onone
                                      i32 2, label %ontwo ]
 

'indirectbr' Instruction

Syntax:
   indirectbr <somety>* <address>, [ label <dest1>, label <dest2>, ... ]
 
Overview:

The 'indirectbr' instruction implements an indirect branch to a label within the current function, whose address is specified by "address". Address must be derived from a blockaddress constant.

Arguments:

The 'address' argument is the address of the label to jump to. The rest of the arguments indicate the full set of possible destinations that the address may point to. Blocks are allowed to occur multiple times in the destination list, though this isn't particularly useful.

This destination list is required so that dataflow analysis has an accurate understanding of the CFG.

Semantics:

Control transfers to the block specified in the address argument. All possible destination blocks must be listed in the label list, otherwise this instruction has undefined behavior. This implies that jumps to labels defined in other functions have undefined behavior as well.

Implementation:

This is typically implemented with a jump through a register.

Example:
  indirectbr i8* %Addr, [ label %bb1, label %bb2, label %bb3 ]
 

'invoke' Instruction

Syntax:
   <result> = invoke [cconv] [ret attrs] <ptr to function ty> <function ptr val>(<function args>) [fn attrs]
                 to label <normal label> unwind label <exception label>
 
Overview:

The 'invoke' instruction causes control to transfer to a specified function, with the possibility of control flow transfer to either the 'normal' label or the 'exception' label. If the callee function returns with the "ret" instruction, control flow will return to the "normal" label. If the callee (or any indirect callees) returns with the "unwind" instruction, control is interrupted and continued at the dynamically nearest "exception" label.

The 'exception' label is a landing pad for the exception. As such, the 'exception' label is required to have the "landingpad" instruction, which contains the information about the behavior of the program after unwinding happens, as its first non-PHI instruction. The restrictions on the "landingpad" instruction tightly couple it to the "invoke" instruction, so that the important information contained within the "landingpad" instruction can't be lost through normal code motion.

Arguments:

This instruction requires several arguments:

  1. The optional "cconv" marker indicates which calling convention the call should use. If none is specified, the call defaults to using C calling conventions.
  2. The optional Parameter Attributes list for return values. Only 'zeroext', 'signext', and 'inreg' attributes are valid here.
  3. 'ptr to function ty': shall be the signature of the pointer to function value being invoked. In most cases, this is a direct function invocation, but indirect invokes are just as possible, branching off an arbitrary pointer to function value.
  4. 'function ptr val': An LLVM value containing a pointer to a function to be invoked.
  5. 'function args': argument list whose types match the function signature argument types and parameter attributes. All arguments must be of first class type. If the function signature indicates the function accepts a variable number of arguments, the extra arguments can be specified.
  6. 'normal label': the label reached when the called function executes a 'ret' instruction.
  7. 'exception label': the label reached when a callee returns with the unwind instruction.
  8. The optional function attributes list. Only 'noreturn', 'nounwind', 'readonly' and 'readnone' attributes are valid here.
Semantics:

This instruction is designed to operate as a standard 'call' instruction in most regards. The primary difference is that it establishes an association with a label, which is used by the runtime library to unwind the stack.

This instruction is used in languages with destructors to ensure that proper cleanup is performed in the case of either a longjmp or a thrown exception. Additionally, this is important for implementation of 'catch' clauses in high-level languages that support them.

For the purposes of the SSA form, the definition of the value returned by the 'invoke' instruction is deemed to occur on the edge from the current block to the "normal" label. If the callee unwinds then no return value is available.

Note that the code generator does not yet completely support unwind, and that the invoke/unwind semantics are likely to change in future versions.

Example:
   %retval = invoke i32 @Test(i32 15) to label %Continue
               unwind label %TestCleanup              ; {i32}:retval set
   %retval = invoke coldcc i32 %Testfnptr(i32 15) to label %Continue
               unwind label %TestCleanup              ; {i32}:retval set
 

'unwind' Instruction

Syntax:
   unwind
 
Overview:

The 'unwind' instruction unwinds the stack, continuing control flow at the first callee in the dynamic call stack which used an invoke instruction to perform the call. This is primarily used to implement exception handling.

Semantics:

The 'unwind' instruction causes execution of the current function to immediately halt. The dynamic call stack is then searched for the first invoke instruction on the call stack. Once found, execution continues at the "exceptional" destination block specified by the invoke instruction. If there is no invoke instruction in the dynamic call chain, undefined behavior results.

Note that the code generator does not yet completely support unwind, and that the invoke/unwind semantics are likely to change in future versions.

'resume' Instruction

Syntax:
   resume <type> <value>
 
Overview:

The 'resume' instruction is a terminator instruction that has no successors.

Arguments:

The 'resume' instruction requires one argument, which must have the same type as the result of any 'landingpad' instruction in the same function.

Semantics:

The 'resume' instruction resumes propagation of an existing (in-flight) exception whose unwinding was interrupted with a landingpad instruction.

Example:
   resume { i8*, i32 } %exn
 

'unreachable' Instruction

Syntax:
   unreachable
 
Overview:

The 'unreachable' instruction has no defined semantics. This instruction is used to inform the optimizer that a particular portion of the code is not reachable. This can be used to indicate that the code after a no-return function cannot be reached, and other facts.

Semantics:

The 'unreachable' instruction has no defined semantics.

Binary Operations

Binary operators are used to do most of the computation in a program. They require two operands of the same type, execute an operation on them, and produce a single value. The operands might represent multiple data, as is the case with the vector data type. The result value has the same type as its operands.

There are several different binary operators:

'add' Instruction

Syntax:
   <result> = add <ty> <op1>, <op2>          ; yields {ty}:result
   <result> = add nuw <ty> <op1>, <op2>      ; yields {ty}:result
   <result> = add nsw <ty> <op1>, <op2>      ; yields {ty}:result
   <result> = add nuw nsw <ty> <op1>, <op2>  ; yields {ty}:result
 
Overview:

The 'add' instruction returns the sum of its two operands.

Arguments:

The two arguments to the 'add' instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

The value produced is the integer sum of the two operands.

If the sum has unsigned overflow, the result returned is the mathematical result modulo 2^n, where n is the bit width of the result.

Because LLVM integers use a two's complement representation, this instruction is appropriate for both signed and unsigned integers.

nuw and nsw stand for "No Unsigned Wrap" and "No Signed Wrap", respectively. If the nuw and/or nsw keywords are present, the result value of the add is a trap value if unsigned and/or signed overflow, respectively, occurs.

Example:
   <result> = add i32 4, %var          ; yields {i32}:result = 4 + %var
 

'fadd' Instruction

Syntax:
   <result> = fadd <ty> <op1>, <op2>   ; yields {ty}:result
 
Overview:

The 'fadd' instruction returns the sum of its two operands.

Arguments:

The two arguments to the 'fadd' instruction must be floating point or vector of floating point values. Both arguments must have identical types.

Semantics:

The value produced is the floating point sum of the two operands.

Example:
   <result> = fadd float 4.0, %var          ; yields {float}:result = 4.0 + %var
 

'sub' Instruction

Syntax:
   <result> = sub <ty> <op1>, <op2>          ; yields {ty}:result
   <result> = sub nuw <ty> <op1>, <op2>      ; yields {ty}:result
   <result> = sub nsw <ty> <op1>, <op2>      ; yields {ty}:result
   <result> = sub nuw nsw <ty> <op1>, <op2>  ; yields {ty}:result
 
Overview:

The 'sub' instruction returns the difference of its two operands.

Note that the 'sub' instruction is used to represent the 'neg' instruction present in most other intermediate representations.

Arguments:

The two arguments to the 'sub' instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

The value produced is the integer difference of the two operands.

If the difference has unsigned overflow, the result returned is the mathematical result modulo 2^n, where n is the bit width of the result.

Because LLVM integers use a two's complement representation, this instruction is appropriate for both signed and unsigned integers.

nuw and nsw stand for "No Unsigned Wrap" and "No Signed Wrap", respectively. If the nuw and/or nsw keywords are present, the result value of the sub is a trap value if unsigned and/or signed overflow, respectively, occurs.

Example:
   <result> = sub i32 4, %var          ; yields {i32}:result = 4 - %var
   <result> = sub i32 0, %val          ; yields {i32}:result = -%val
 

'fsub' Instruction

Syntax:
   <result> = fsub <ty> <op1>, <op2>   ; yields {ty}:result
 
Overview:

The 'fsub' instruction returns the difference of its two operands.

Note that the 'fsub' instruction is used to represent the 'fneg' instruction present in most other intermediate representations.

Arguments:

The two arguments to the 'fsub' instruction must be floating point or vector of floating point values. Both arguments must have identical types.

Semantics:

The value produced is the floating point difference of the two operands.

Example:
   <result> = fsub float 4.0, %var           ; yields {float}:result = 4.0 - %var
   <result> = fsub float -0.0, %val          ; yields {float}:result = -%val
 

'mul' Instruction

Syntax:
   <result> = mul <ty> <op1>, <op2>          ; yields {ty}:result
   <result> = mul nuw <ty> <op1>, <op2>      ; yields {ty}:result
   <result> = mul nsw <ty> <op1>, <op2>      ; yields {ty}:result
   <result> = mul nuw nsw <ty> <op1>, <op2>  ; yields {ty}:result
 
Overview:

The 'mul' instruction returns the product of its two operands.

Arguments:

The two arguments to the 'mul' instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

The value produced is the integer product of the two operands.

If the result of the multiplication has unsigned overflow, the result returned is the mathematical result modulo 2^n, where n is the bit width of the result.

Because LLVM integers use a two's complement representation, and the result is the same width as the operands, this instruction returns the correct result for both signed and unsigned integers. If a full product (e.g. i32xi32->i64) is needed, the operands should be sign-extended or zero-extended as appropriate to the width of the full product.
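
For example, a sketch of the widening idiom described above (%a and %b are assumed to be i32 values; use sext instead of zext for a signed full product):

   %a64  = zext i32 %a to i64
   %b64  = zext i32 %b to i64
   %full = mul i64 %a64, %b64        ; yields {i64}: full 64-bit unsigned product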

nuw and nsw stand for "No Unsigned Wrap" and "No Signed Wrap", respectively. If the nuw and/or nsw keywords are present, the result value of the mul is a trap value if unsigned and/or signed overflow, respectively, occurs.

Example:
   <result> = mul i32 4, %var          ; yields {i32}:result = 4 * %var
 

'fmul' Instruction

Syntax:
   <result> = fmul <ty> <op1>, <op2>   ; yields {ty}:result
 
Overview:

The 'fmul' instruction returns the product of its two operands.

Arguments:

The two arguments to the 'fmul' instruction must be floating point or vector of floating point values. Both arguments must have identical types.

Semantics:

The value produced is the floating point product of the two operands.

Example:
   <result> = fmul float 4.0, %var          ; yields {float}:result = 4.0 * %var
 

'udiv' Instruction

Syntax:
   <result> = udiv <ty> <op1>, <op2>         ; yields {ty}:result
   <result> = udiv exact <ty> <op1>, <op2>   ; yields {ty}:result
 
Overview:

The 'udiv' instruction returns the quotient of its two operands.

Arguments:

The two arguments to the 'udiv' instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

The value produced is the unsigned integer quotient of the two operands.

Note that unsigned integer division and signed integer division are distinct operations; for signed integer division, use 'sdiv'.

Division by zero leads to undefined behavior.

If the exact keyword is present, the result value of the udiv is a trap value if %op1 is not a multiple of %op2 (as such, "((a udiv exact b) mul b) == a").
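
For instance (the constants are assumptions chosen for illustration):

   %a = udiv exact i32 64, 8         ; yields {i32}:a = 8 (64 is a multiple of 8)
   %b = udiv exact i32 65, 8         ; %b is a trap value (65 is not a multiple of 8)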

Example:
   <result> = udiv i32 4, %var          ; yields {i32}:result = 4 / %var
 

'sdiv' Instruction

Syntax:
   <result> = sdiv <ty> <op1>, <op2>         ; yields {ty}:result
   <result> = sdiv exact <ty> <op1>, <op2>   ; yields {ty}:result
 
Overview:

The 'sdiv' instruction returns the quotient of its two operands.

Arguments:

The two arguments to the 'sdiv' instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

The value produced is the signed integer quotient of the two operands rounded towards zero.

Note that signed integer division and unsigned integer division are distinct operations; for unsigned integer division, use 'udiv'.

Division by zero leads to undefined behavior. Overflow also leads to undefined behavior; this is a rare case, but can occur, for example, by doing a 32-bit division of -2147483648 by -1.

If the exact keyword is present, the result value of the sdiv is a trap value if the result would be rounded.
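
For instance (the constants are assumptions chosen for illustration):

   %a = sdiv exact i32 -8, 2         ; yields {i32}:a = -4 (no rounding needed)
   %b = sdiv exact i32 -9, 2         ; %b is a trap value (the result would be rounded)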

Example:
   <result> = sdiv i32 4, %var          ; yields {i32}:result = 4 / %var
 

'fdiv' Instruction

Syntax:
   <result> = fdiv <ty> <op1>, <op2>   ; yields {ty}:result
 
Overview:

The 'fdiv' instruction returns the quotient of its two operands.

Arguments:

The two arguments to the 'fdiv' instruction must be floating point or vector of floating point values. Both arguments must have identical types.

Semantics:

The value produced is the floating point quotient of the two operands.

Example:
   <result> = fdiv float 4.0, %var          ; yields {float}:result = 4.0 / %var
 

'urem' Instruction

Syntax:
   <result> = urem <ty> <op1>, <op2>   ; yields {ty}:result
 
Overview:

The 'urem' instruction returns the remainder from the unsigned division of its two arguments.

Arguments:

The two arguments to the 'urem' instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

This instruction returns the unsigned integer remainder of a division. This instruction always performs an unsigned division to get the remainder.

Note that unsigned integer remainder and signed integer remainder are distinct operations; for signed integer remainder, use 'srem'.

Taking the remainder of a division by zero leads to undefined behavior.

Example:
   <result> = urem i32 4, %var          ; yields {i32}:result = 4 % %var
 

'srem' Instruction

Syntax:
   <result> = srem <ty> <op1>, <op2>   ; yields {ty}:result
 
Overview:

The 'srem' instruction returns the remainder from the signed division of its two operands. This instruction can also take vector versions of the values in which case the elements must be integers.

Arguments:

The two arguments to the 'srem' instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

This instruction returns the remainder of a division (where the result is either zero or has the same sign as the dividend, op1), not the modulo operator (where the result is either zero or has the same sign as the divisor, op2) of a value. For more information about the difference, see The Math Forum. For a table of how this is implemented in various languages, please see Wikipedia: modulo operation.
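
For example, the sign of the result follows the dividend (the constants are assumptions chosen for illustration):

   %a = srem i32 -7, 3               ; yields {i32}:a = -1 (sign of the dividend -7)
   %b = srem i32 7, -3               ; yields {i32}:b = 1 (sign of the dividend 7)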

Note that signed integer remainder and unsigned integer remainder are distinct operations; for unsigned integer remainder, use 'urem'.

Taking the remainder of a division by zero leads to undefined behavior. Overflow also leads to undefined behavior; this is a rare case, but can occur, for example, by taking the remainder of a 32-bit division of -2147483648 by -1. (The remainder doesn't actually overflow, but this rule lets srem be implemented using instructions that return both the result of the division and the remainder.)

Example:
   <result> = srem i32 4, %var          ; yields {i32}:result = 4 % %var
 

'frem' Instruction

Syntax:
   <result> = frem <ty> <op1>, <op2>   ; yields {ty}:result
 
Overview:

The 'frem' instruction returns the remainder from the division of its two operands.

Arguments:

The two arguments to the 'frem' instruction must be floating point or vector of floating point values. Both arguments must have identical types.

Semantics:

This instruction returns the remainder of a division. The remainder has the same sign as the dividend.

Example:
   <result> = frem float 4.0, %var          ; yields {float}:result = 4.0 % %var
 

Bitwise Binary Operations

Bitwise binary operators are used to do various forms of bit-twiddling in a program. They are generally very efficient instructions and can commonly be strength reduced from other instructions. They require two operands of the same type, execute an operation on them, and produce a single value. The resulting value is the same type as its operands.

'shl' Instruction

Syntax:
   <result> = shl <ty> <op1>, <op2>           ; yields {ty}:result
   <result> = shl nuw <ty> <op1>, <op2>       ; yields {ty}:result
   <result> = shl nsw <ty> <op1>, <op2>       ; yields {ty}:result
   <result> = shl nuw nsw <ty> <op1>, <op2>   ; yields {ty}:result
 
Overview:

The 'shl' instruction returns the first operand shifted to the left a specified number of bits.

Arguments:

Both arguments to the 'shl' instruction must be the same integer or vector of integer type. 'op2' is treated as an unsigned value.

Semantics:

The value produced is op1 * 2^op2 mod 2^n, where n is the width of the result. If op2 is (statically or dynamically) negative or equal to or larger than the number of bits in op1, the result is undefined. If the arguments are vectors, each vector element of op1 is shifted by the corresponding shift amount in op2.

If the nuw keyword is present, then the shift produces a trap value if it shifts out any non-zero bits. If the nsw keyword is present, then the shift produces a trap value if it shifts out any bits that disagree with the resultant sign bit. As such, NUW/NSW have the same semantics as they would if the shift were expressed as a mul instruction with the same nsw/nuw bits in (mul %op1, (shl 1, %op2)).
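
For instance (the i8 constants are assumptions chosen for illustration):

   %a = shl nuw i8 64, 2             ; a non-zero bit is shifted out: %a is a trap value
   %b = shl nsw i8 1, 6              ; yields {i8}:b = 64 (no bits disagreeing with the sign bit are shifted out)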

Example:
   <result> = shl i32 4, %var   ; yields {i32}: 4 << %var
   <result> = shl i32 4, 2      ; yields {i32}: 16
   <result> = shl i32 1, 10     ; yields {i32}: 1024
   <result> = shl i32 1, 32     ; undefined
   <result> = shl <2 x i32> < i32 1, i32 1>, < i32 1, i32 2>   ; yields: result=<2 x i32> < i32 2, i32 4>
 

'lshr' Instruction

Syntax:
   <result> = lshr <ty> <op1>, <op2>         ; yields {ty}:result
   <result> = lshr exact <ty> <op1>, <op2>   ; yields {ty}:result
 
Overview:

The 'lshr' instruction (logical shift right) returns the first operand shifted to the right a specified number of bits with zero fill.

Arguments:

Both arguments to the 'lshr' instruction must be the same integer or vector of integer type. 'op2' is treated as an unsigned value.

Semantics:

This instruction always performs a logical shift right operation. The most significant bits of the result will be filled with zero bits after the shift. If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, the result is undefined. If the arguments are vectors, each vector element of op1 is shifted by the corresponding shift amount in op2.

If the exact keyword is present, the result value of the lshr is a trap value if any of the bits shifted out are non-zero.
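
For instance (the constants are assumptions chosen for illustration):

   %a = lshr exact i32 8, 2          ; yields {i32}:a = 2 (only zero bits are shifted out)
   %b = lshr exact i32 9, 2          ; %b is a trap value (a non-zero bit is shifted out)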

Example:
   <result> = lshr i32 4, 1   ; yields {i32}:result = 2
   <result> = lshr i32 4, 2   ; yields {i32}:result = 1
   <result> = lshr i8  4, 3   ; yields {i8}:result = 0
   <result> = lshr i8 -2, 1   ; yields {i8}:result = 0x7F
   <result> = lshr i32 1, 32  ; undefined
   <result> = lshr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 2>   ; yields: result=<2 x i32> < i32 0x7FFFFFFF, i32 1>
 

'ashr' Instruction

Syntax:
   <result> = ashr <ty> <op1>, <op2>         ; yields {ty}:result
   <result> = ashr exact <ty> <op1>, <op2>   ; yields {ty}:result
 
Overview:

The 'ashr' instruction (arithmetic shift right) returns the first operand shifted to the right a specified number of bits with sign extension.

Arguments:

Both arguments to the 'ashr' instruction must be the same integer or vector of integer type. 'op2' is treated as an unsigned value.

Semantics:

This instruction always performs an arithmetic shift right operation. The most significant bits of the result will be filled with the sign bit of op1. If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, the result is undefined. If the arguments are vectors, each vector element of op1 is shifted by the corresponding shift amount in op2.

If the exact keyword is present, the result value of the ashr is a trap value if any of the bits shifted out are non-zero.

Example:
   <result> = ashr i32 4, 1   ; yields {i32}:result = 2
   <result> = ashr i32 4, 2   ; yields {i32}:result = 1
   <result> = ashr i8  4, 3   ; yields {i8}:result = 0
   <result> = ashr i8 -2, 1   ; yields {i8}:result = -1
   <result> = ashr i32 1, 32  ; undefined
   <result> = ashr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 3>   ; yields: result=<2 x i32> < i32 -1, i32 0>
 

'and' Instruction

Syntax:
   <result> = and <ty> <op1>, <op2>   ; yields {ty}:result
 
Overview:

The 'and' instruction returns the bitwise logical and of its two operands.

Arguments:

The two arguments to the 'and' instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

The truth table used for the 'and' instruction is:

   In0   In1   Out
    0     0     0
    0     1     0
    1     0     0
    1     1     1

Example:
   <result> = and i32 4, %var         ; yields {i32}:result = 4 & %var
   <result> = and i32 15, 40          ; yields {i32}:result = 8
   <result> = and i32 4, 8            ; yields {i32}:result = 0
 

'or' Instruction

Syntax:
   <result> = or <ty> <op1>, <op2>   ; yields {ty}:result
 
Overview:

The 'or' instruction returns the bitwise logical inclusive or of its two operands.

Arguments:

The two arguments to the 'or' instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

The truth table used for the 'or' instruction is:

   In0   In1   Out
    0     0     0
    0     1     1
    1     0     1
    1     1     1

Example:
   <result> = or i32 4, %var         ; yields {i32}:result = 4 | %var
   <result> = or i32 15, 40          ; yields {i32}:result = 47
   <result> = or i32 4, 8            ; yields {i32}:result = 12
 

'xor' Instruction

Syntax:
   <result> = xor <ty> <op1>, <op2>   ; yields {ty}:result
 
Overview:

The 'xor' instruction returns the bitwise logical exclusive or of its two operands. The xor is used to implement the "one's complement" operation, which is the "~" operator in C.

Arguments:

The two arguments to the 'xor' instruction must be integer or vector of integer values. Both arguments must have identical types.

Semantics:

The truth table used for the 'xor' instruction is:

   In0   In1   Out
    0     0     0
    0     1     1
    1     0     1
    1     1     0

Example:
   <result> = xor i32 4, %var         ; yields {i32}:result = 4 ^ %var
   <result> = xor i32 15, 40          ; yields {i32}:result = 39
   <result> = xor i32 4, 8            ; yields {i32}:result = 12
   <result> = xor i32 %V, -1          ; yields {i32}:result = ~%V
 

Vector Operations

LLVM supports several instructions to represent vector operations in a target-independent manner. These instructions cover the element-access and vector-specific operations needed to process vectors effectively. While LLVM does directly support these vector operations, many sophisticated algorithms will want to use target-specific intrinsics to take full advantage of a specific target.

'extractelement' Instruction

Syntax:
   <result> = extractelement <n x <ty>> <val>, i32 <idx>    ; yields <ty>
 
Overview:

The 'extractelement' instruction extracts a single scalar element from a vector at a specified index.

Arguments:

The first operand of an 'extractelement' instruction is a value of vector type. The second operand is an index indicating the position from which to extract the element. The index may be a variable.

Semantics:

The result is a scalar of the same type as the element type of val. Its value is the value at position idx of val. If idx exceeds the length of val, the results are undefined.

Example:
   <result> = extractelement <4 x i32> %vec, i32 0    ; yields i32
 

'insertelement' Instruction

Syntax:
   <result> = insertelement <n x <ty>> <val>, <ty> <elt>, i32 <idx>    ; yields <n x <ty>>
 
Overview:

The 'insertelement' instruction inserts a scalar element into a vector at a specified index.

Arguments:

The first operand of an 'insertelement' instruction is a value of vector type. The second operand is a scalar value whose type must equal the element type of the first operand. The third operand is an index indicating the position at which to insert the value. The index may be a variable.

Semantics:

The result is a vector of the same type as val. Its element values are those of val except at position idx, where it gets the value elt. If idx exceeds the length of val, the results are undefined.

Example:
   <result> = insertelement <4 x i32> %vec, i32 1, i32 0    ; yields <4 x i32>
 

'shufflevector' Instruction

Syntax:
   <result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <m x i32> <mask>    ; yields <m x <ty>>
 
Overview:

The 'shufflevector' instruction constructs a permutation of elements from two input vectors, returning a vector with the same element type as the input and length that is the same as the shuffle mask.

Arguments:

The first two operands of a 'shufflevector' instruction are vectors with types that match each other. The third argument is a shuffle mask whose element type is always 'i32'. The result of the instruction is a vector whose length is the same as the shuffle mask and whose element type is the same as the element type of the first two operands.

The shuffle mask operand is required to be a constant vector with either constant integer or undef values.

Semantics:

The elements of the two input vectors are numbered from left to right across both of the vectors. The shuffle mask operand specifies, for each element of the result vector, which element of the two input vectors the result element gets. The element selector may be undef (meaning "don't care") and the second operand may be undef if performing a shuffle from only one vector.

Example:
   <result> = shufflevector <4 x i32> %v1, <4 x i32> %v2,
                           <4 x i32> <i32 0, i32 4, i32 1, i32 5>  ; yields <4 x i32>
   <result> = shufflevector <4 x i32> %v1, <4 x i32> undef,
                           <4 x i32> <i32 0, i32 1, i32 2, i32 3>  ; yields <4 x i32> - Identity shuffle.
   <result> = shufflevector <8 x i32> %v1, <8 x i32> undef,
                           <4 x i32> <i32 0, i32 1, i32 2, i32 3>  ; yields <4 x i32>
   <result> = shufflevector <4 x i32> %v1, <4 x i32> %v2,
                           <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7 >  ; yields <8 x i32>
 

Aggregate Operations

LLVM supports several instructions for working with aggregate values.

'extractvalue' Instruction

Syntax:
   <result> = extractvalue <aggregate type> <val>, <idx>{, <idx>}*
 
Overview:

The 'extractvalue' instruction extracts the value of a member field from an aggregate value.

Arguments:

The first operand of an 'extractvalue' instruction is a value of struct or array type. The operands are constant indices to specify which value to extract in a similar manner as indices in a 'getelementptr' instruction.

The major differences to getelementptr indexing are:

  • Since the value being indexed is not a pointer, the first index is omitted and assumed to be zero.
  • At least one index must be specified.
  • Not only struct indices but also array indices must be in bounds.

Semantics:

The result is the value at the position in the aggregate specified by the index operands.
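
For example, a nested aggregate can be indexed through several levels with one instruction (the aggregate type here is only an illustrative assumption):

   %inner = extractvalue {i32, {float, i8}} %agg, 1, 0    ; yields float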

Example:
   <result> = extractvalue {i32, float} %agg, 0    ; yields i32
 

'insertvalue' Instruction

Syntax:
   <result> = insertvalue <aggregate type> <val>, <ty> <elt>, <idx>{, <idx>}*    ; yields <aggregate type>
 
Overview:

The 'insertvalue' instruction inserts a value into a member field in an aggregate value.

Arguments:

The first operand of an 'insertvalue' instruction is a value of struct or array type. The second operand is a first-class value to insert. The following operands are constant indices indicating the position at which to insert the value in a similar manner as indices in a 'extractvalue' instruction. The value to insert must have the same type as the value identified by the indices.

Semantics:

The result is an aggregate of the same type as val. Its value is that of val except that the value at the position specified by the indices is that of elt.

Example:
   %agg1 = insertvalue {i32, float} undef, i32 1, 0              ; yields {i32 1, float undef}
   %agg2 = insertvalue {i32, float} %agg1, float %val, 1         ; yields {i32 1, float %val}
   %agg3 = insertvalue {i32, {float}} %agg1, float %val, 1, 0    ; yields {i32 1, float %val}
 

Memory Access and Addressing Operations

A key design point of an SSA-based representation is how it represents memory. In LLVM, no memory locations are in SSA form, which makes things very simple. This section describes how to read, write, and allocate memory in LLVM.

'alloca' Instruction

Syntax:
   <result> = alloca <type>[, <ty> <NumElements>][, align <alignment>]     ; yields {type*}:result
 
Overview:

The 'alloca' instruction allocates memory on the stack frame of the currently executing function, to be automatically released when this function returns to its caller. The object is always allocated in the generic address space (address space zero).

Arguments:

The 'alloca' instruction allocates sizeof(<type>)*NumElements bytes of memory on the runtime stack, returning a pointer of the appropriate type to the program. If "NumElements" is specified, it is the number of elements allocated, otherwise "NumElements" is defaulted to be one. If a constant alignment is specified, the value result of the allocation is guaranteed to be aligned to at least that boundary. If not specified, or if zero, the target can choose to align the allocation on any convenient boundary compatible with the type.

'type' may be any sized type.

Semantics:

Memory is allocated; a pointer is returned. The operation is undefined if there is insufficient stack space for the allocation. 'alloca'd memory is automatically released when the function returns. The 'alloca' instruction is commonly used to represent automatic variables that must have an address available. When the function returns (either with the ret or unwind instructions), the memory is reclaimed. Allocating zero bytes is legal, but the result is undefined.

Example:
   %ptr = alloca i32                             ; yields {i32*}:ptr
   %ptr = alloca i32, i32 4                      ; yields {i32*}:ptr
   %ptr = alloca i32, i32 4, align 1024          ; yields {i32*}:ptr
   %ptr = alloca i32, align 1024                 ; yields {i32*}:ptr
 

'load' Instruction

Syntax:
   <result> = load [volatile] <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>]
   <result> = load atomic [volatile] <ty>* <pointer> [singlethread] <ordering>, align <alignment>
   !<index> = !{ i32 1 }
 
Overview:

The 'load' instruction is used to read from memory.

Arguments:

The argument to the 'load' instruction specifies the memory address from which to load. The pointer must point to a first class type. If the load is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this load with other volatile operations.

If the load is marked as atomic, it takes an extra ordering and optional singlethread argument. The release and acq_rel orderings are not valid on load instructions. Atomic loads produce defined results when they may see multiple atomic stores. The type of the pointee must be an integer type whose bit width is a power of two greater than or equal to eight and less than or equal to a target-specific size limit. align must be explicitly specified on atomic loads, and the load has undefined behavior if the alignment is not set to a value which is at least the size in bytes of the pointee. !nontemporal does not have any defined semantics for atomic loads.
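
As a sketch of the atomic form (the acquire ordering and align 4 below are assumptions chosen for illustration):

   %val = load atomic i32* %ptr acquire, align 4          ; yields {i32}:val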

The optional constant align argument specifies the alignment of the operation (that is, the alignment of the memory address). A value of 0 or an omitted align argument means that the operation has the preferential alignment for the target. It is the responsibility of the code emitter to ensure that the alignment information is correct. Overestimating the alignment results in undefined behavior. Underestimating the alignment may produce less efficient code. An alignment of 1 is always safe.

The optional !nontemporal metadata must reference a single metadata name <index> corresponding to a metadata node with one i32 entry of value 1. The existence of the !nontemporal metadata on the instruction tells the optimizer and code generator that this load is not expected to be reused in the cache. The code generator may select special instructions to save cache bandwidth, such as the MOVNT instruction on x86.

Semantics:

The location of memory pointed to is loaded. If the value being loaded is of scalar type then the number of bytes read does not exceed the minimum number of bytes needed to hold all bits of the type. For example, loading an i24 reads at most three bytes. When loading a value of a type like i20 with a size that is not an integral number of bytes, the result is undefined if the value was not originally written using a store of the same type.

Examples:
   %ptr = alloca i32                               ; yields {i32*}:ptr
   store i32 3, i32* %ptr                          ; yields {void}
   %val = load i32* %ptr                           ; yields {i32}:val = i32 3
 

'store' Instruction

Syntax:
   store [volatile] <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>]                   ; yields {void}
   store atomic [volatile] <ty> <value>, <ty>* <pointer> [singlethread] <ordering>, align <alignment>             ; yields {void}
 
Overview:

The 'store' instruction is used to write to memory.

Arguments:

There are two arguments to the 'store' instruction: a value to store and an address at which to store it. The type of the '<pointer>' operand must be a pointer to the first class type of the '<value>' operand. If the store is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this store with other volatile operations.

If the store is marked as atomic, it takes an extra ordering and optional singlethread argument. The acquire and acq_rel orderings aren't valid on store instructions. Atomic loads produce defined results when they may see multiple atomic stores. The type of the pointee must be an integer type whose bit width is a power of two greater than or equal to eight and less than or equal to a target-specific size limit. align must be explicitly specified on atomic stores, and the store has undefined behavior if the alignment is not set to a value which is at least the size in bytes of the pointee. !nontemporal does not have any defined semantics for atomic stores.
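
As a sketch of the atomic form (the release ordering and align 4 below are assumptions chosen for illustration):

   store atomic i32 3, i32* %ptr release, align 4         ; yields {void}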

The optional constant "align" argument specifies the alignment of the operation (that is, the alignment of the memory address). A value of 0 or an omitted "align" argument means that the operation has the preferential alignment for the target. It is the responsibility of the code emitter to ensure that the alignment information is correct. Overestimating the alignment results in undefined behavior. Underestimating the alignment may produce less efficient code. An alignment of 1 is always safe.

The optional !nontemporal metadata must reference a single metadata name <index> corresponding to a metadata node with one i32 entry of value 1. The existence of the !nontemporal metadata on the instruction tells the optimizer and code generator that this store is not expected to be reused in the cache. The code generator may select special instructions to save cache bandwidth, such as the MOVNT instruction on x86.

Semantics:

The contents of memory are updated to contain '<value>' at the location specified by the '<pointer>' operand. If '<value>' is of scalar type then the number of bytes written does not exceed the minimum number of bytes needed to hold all bits of the type. For example, storing an i24 writes at most three bytes. When writing a value of a type like i20 with a size that is not an integral number of bytes, it is unspecified what happens to the extra bits that do not belong to the type, but they will typically be overwritten.

Example:
   %ptr = alloca i32                               ; yields {i32*}:ptr
   store i32 3, i32* %ptr                          ; yields {void}
   %val = load i32* %ptr                           ; yields {i32}:val = i32 3
 

'fence' Instruction

Syntax:
   fence [singlethread] <ordering>                   ; yields {void}
 
Overview:

The 'fence' instruction is used to introduce happens-before edges between operations.

Arguments:

'fence' instructions take an ordering argument which defines what synchronizes-with edges they add. They can only be given acquire, release, acq_rel, and seq_cst orderings.

Semantics:

A fence A which has (at least) release ordering semantics synchronizes with a fence B with (at least) acquire ordering semantics if and only if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M (either directly or through some side effect of a sequence headed by X), Y is sequenced before B, and Y observes M. This provides a happens-before dependency between A and B. Rather than an explicit fence, one (but not both) of the atomic operations X or Y might provide a release or acquire (resp.) ordering constraint and still synchronize-with the explicit fence and establish the happens-before edge.

A fence which has seq_cst ordering, in addition to having both acquire and release semantics specified above, participates in the global program order of other seq_cst operations and/or fences.

The optional "singlethread" argument specifies that the fence only synchronizes with other fences in the same thread. (This is useful for interacting with signal handlers.)

Example:
   fence acquire                          ; yields {void}
   fence singlethread seq_cst             ; yields {void}
 

'cmpxchg' Instruction

Syntax:
   cmpxchg [volatile] <ty>* <pointer>, <ty> <cmp>, <ty> <new> [singlethread] <ordering>                   ; yields {ty}
 
Overview:

The 'cmpxchg' instruction is used to atomically modify memory. It loads a value in memory and compares it to a given value. If they are equal, it stores a new value into the memory.

Arguments:

There are three arguments to the 'cmpxchg' instruction: an address to operate on, a value to compare to the value currently be at that address, and a new value to place at that address if the compared values are equal. The type of '<cmp>' must be an integer type whose bit width is a power of two greater than or equal to eight and less than or equal to a target-specific size limit. '<cmp>' and '<new>' must have the same type, and the type of '<pointer>' must be a pointer to that type. If the cmpxchg is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this cmpxchg with other volatile operations.

The ordering argument specifies how this cmpxchg synchronizes with other atomic operations.

The optional "singlethread" argument declares that the cmpxchg is only atomic with respect to code (usually signal handlers) running in the same thread as the cmpxchg. Otherwise the cmpxchg is atomic with respect to all other code in the system.

The pointer passed into cmpxchg must have alignment greater than or equal to the size in memory of the operand.

Semantics:

The contents of memory at the location specified by the '<pointer>' operand are read and compared to '<cmp>'; if the values are equal, '<new>' is written. The original value at the location is returned.

A successful cmpxchg is a read-modify-write instruction for the purpose of identifying release sequences. A failed cmpxchg is equivalent to an atomic load with an ordering parameter determined by dropping any release part of the cmpxchg's ordering.

Example:
 entry:
   %orig = load atomic i32* %ptr unordered, align 4              ; yields {i32}
   br label %loop
 
 loop:
   %cmp = phi i32 [ %orig, %entry ], [%old, %loop]
   %squared = mul i32 %cmp, %cmp
   %old = cmpxchg i32* %ptr, i32 %cmp, i32 %squared                       ; yields {i32}
   %success = icmp eq i32 %cmp, %old
   br i1 %success, label %done, label %loop
 
 done:
   ...
 

'atomicrmw' Instruction

Syntax:
   atomicrmw [volatile] <operation> <ty>* <pointer>, <ty> <value> [singlethread] <ordering>                   ; yields {ty}
 
Overview:

The 'atomicrmw' instruction is used to atomically modify memory.

Arguments:

There are three arguments to the 'atomicrmw' instruction: an operation to apply, an address whose value to modify, and an argument to the operation. The operation must be one of the following keywords:

  • xchg
  • add
  • sub
  • and
  • nand
  • or
  • xor
  • max
  • min
  • umax
  • umin

The type of '<value>' must be an integer type whose bit width is a power of two greater than or equal to eight and less than or equal to a target-specific size limit. The type of the '<pointer>' operand must be a pointer to that type. If the atomicrmw is marked as volatile, then the optimizer is not allowed to modify the number or order of execution of this atomicrmw with other volatile operations.

Semantics:

The contents of memory at the location specified by the '<pointer>' operand are atomically read, modified, and written back. The original value at the location is returned. The modification is specified by the operation argument:

  • xchg: *ptr = val
  • add: *ptr = *ptr + val
  • sub: *ptr = *ptr - val
  • and: *ptr = *ptr & val
  • nand: *ptr = ~(*ptr & val)
  • or: *ptr = *ptr | val
  • xor: *ptr = *ptr ^ val
  • max: *ptr = *ptr > val ? *ptr : val (using a signed comparison)
  • min: *ptr = *ptr < val ? *ptr : val (using a signed comparison)
  • umax: *ptr = *ptr > val ? *ptr : val (using an unsigned comparison)
  • umin: *ptr = *ptr < val ? *ptr : val (using an unsigned comparison)

Example:
   %old = atomicrmw add i32* %ptr, i32 1 acquire                        ; yields {i32}
 

'getelementptr' Instruction

Syntax:
   <result> = getelementptr <pty>* <ptrval>{, <ty> <idx>}*
   <result> = getelementptr inbounds <pty>* <ptrval>{, <ty> <idx>}*
 
Overview:

The 'getelementptr' instruction is used to get the address of a subelement of an aggregate data structure. It performs address calculation only and does not access memory.

Arguments:

The first argument is always a pointer, and forms the basis of the calculation. The remaining arguments are indices that indicate which of the elements of the aggregate object are indexed. The interpretation of each index is dependent on the type being indexed into. The first index always indexes the pointer value given as the first argument, the second index indexes a value of the type pointed to (not necessarily the value directly pointed to, since the first index can be non-zero), etc. The first type indexed into must be a pointer value, subsequent types can be arrays, vectors, and structs. Note that subsequent types being indexed into can never be pointers, since that would require loading the pointer before continuing calculation.

The type of each index argument depends on the type it is indexing into. When indexing into a (optionally packed) structure, only i32 integer constants are allowed. When indexing into an array, pointer or vector, integers of any width are allowed, and they are not required to be constant. These integers are treated as signed values where relevant.

For example, let's consider a C code fragment and how it gets compiled to LLVM:

 struct RT {
   char A;
   int B[10][20];
   char C;
 };
 struct ST {
   int X;
   double Y;
   struct RT Z;
 };
 
 int *foo(struct ST *s) {
   return &s[1].Z.B[5][13];
 }
 

The LLVM code generated by the GCC frontend is:

 %RT = type { i8 , [10 x [20 x i32]], i8  }
 %ST = type { i32, double, %RT }
 
 define i32* @foo(%ST* %s) {
 entry:
   %reg = getelementptr %ST* %s, i32 1, i32 2, i32 1, i32 5, i32 13
   ret i32* %reg
 }
 
Semantics:

In the example above, the first index is indexing into the '%ST*' type, which is a pointer, yielding a '%ST' = '{ i32, double, %RT }' type, a structure. The second index indexes into the third element of the structure, yielding a '%RT' = '{ i8 , [10 x [20 x i32]], i8 }' type, another structure. The third index indexes into the second element of the structure, yielding a '[10 x [20 x i32]]' type, an array. The two dimensions of the array are subscripted into, yielding an 'i32' type. The 'getelementptr' instruction returns a pointer to this element, thus computing a value of 'i32*' type.

Note that it is perfectly legal to index partially through a structure, returning a pointer to an inner element. Because of this, the LLVM code for the given testcase is equivalent to:

   define i32* @foo(%ST* %s) {
     %t1 = getelementptr %ST* %s, i32 1                        ; yields %ST*:%t1
     %t2 = getelementptr %ST* %t1, i32 0, i32 2                ; yields %RT*:%t2
     %t3 = getelementptr %RT* %t2, i32 0, i32 1                ; yields [10 x [20 x i32]]*:%t3
     %t4 = getelementptr [10 x [20 x i32]]* %t3, i32 0, i32 5  ; yields [20 x i32]*:%t4
     %t5 = getelementptr [20 x i32]* %t4, i32 0, i32 13        ; yields i32*:%t5
     ret i32* %t5
   }
 

If the inbounds keyword is present, the result value of the getelementptr is a trap value if the base pointer is not an in bounds address of an allocated object, or if any of the addresses that would be formed by successive addition of the offsets implied by the indices to the base address with infinitely precise signed arithmetic are not an in bounds address of that allocated object. The in bounds addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end.

If the inbounds keyword is not present, the offsets are added to the base address with silently-wrapping two's complement arithmetic. If the offsets have a different width from the pointer, they are sign-extended or truncated to the width of the pointer. The result value of the getelementptr may be outside the object pointed to by the base pointer. The result value may not necessarily be used to access memory though, even if it happens to point into allocated storage. See the Pointer Aliasing Rules section for more information.

The getelementptr instruction is often confusing. For some more insight into how it works, see the getelementptr FAQ.

Example:
     ; yields [12 x i8]*:aptr
     %aptr = getelementptr {i32, [12 x i8]}* %saptr, i64 0, i32 1
     ; yields i8*:vptr
     %vptr = getelementptr {i32, <2 x i8>}* %svptr, i64 0, i32 1, i32 1
     ; yields i8*:eptr
     %eptr = getelementptr [12 x i8]* %aptr, i64 0, i32 1
     ; yields i32*:iptr
     %iptr = getelementptr [10 x i32]* @arr, i16 0, i16 0
 

Conversion Operations

The instructions in this category are the conversion instructions (casting) which all take a single operand and a type. They perform various bit conversions on the operand.

'trunc .. to' Instruction

Syntax:
   <result> = trunc <ty> <value> to <ty2>             ; yields ty2
 
Overview:

The 'trunc' instruction truncates its operand to the type ty2.

Arguments:

The 'trunc' instruction takes a value to trunc, and a type to trunc it to. Both types must be of integer types, or vectors of the same number of integers. The bit size of the value must be larger than the bit size of the destination type, ty2. Equal sized types are not allowed.

Semantics:

The 'trunc' instruction truncates the high order bits in value and converts the remaining bits to ty2. Since the source size must be larger than the destination size, trunc cannot be a no-op cast. It will always truncate bits.

Example:
   %X = trunc i32 257 to i8                        ; yields i8:1
   %Y = trunc i32 123 to i1                        ; yields i1:true
   %Z = trunc i32 122 to i1                        ; yields i1:false
   %W = trunc <2 x i16> <i16 8, i16 7> to <2 x i8> ; yields <i8 8, i8 7>
 

'zext .. to' Instruction

Syntax:
   <result> = zext <ty> <value> to <ty2>             ; yields ty2
 
Overview:

The 'zext' instruction zero extends its operand to type ty2.

Arguments:

The 'zext' instruction takes a value to cast, and a type to cast it to. Both types must be of integer types, or vectors of the same number of integers. The bit size of the value must be smaller than the bit size of the destination type, ty2.

Semantics:

The zext fills the high order bits of the value with zero bits until it reaches the size of the destination type, ty2.

When zero extending from i1, the result will always be either 0 or 1.

Example:
   %X = zext i32 257 to i64              ; yields i64:257
   %Y = zext i1 true to i32              ; yields i32:1
   %Z = zext <2 x i16> <i16 8, i16 7> to <2 x i32> ; yields <i32 8, i32 7>
 

'sext .. to' Instruction

Syntax:
   <result> = sext <ty> <value> to <ty2>             ; yields ty2
 
Overview:

The 'sext' sign extends value to the type ty2.

Arguments:

The 'sext' instruction takes a value to cast, and a type to cast it to. Both types must be of integer types, or vectors of the same number of integers. The bit size of the value must be smaller than the bit size of the destination type, ty2.

Semantics:

The 'sext' instruction performs a sign extension by copying the sign bit (highest order bit) of the value until it reaches the bit size of the type ty2.

When sign extending from i1, the extension always results in -1 or 0.

Example:
   %X = sext i8  -1 to i16              ; yields i16   :65535
   %Y = sext i1 true to i32             ; yields i32:-1
   %Z = sext <2 x i16> <i16 8, i16 7> to <2 x i32> ; yields <i32 8, i32 7>
 

'fptrunc .. to' Instruction

Syntax:
   <result> = fptrunc <ty> <value> to <ty2>             ; yields ty2
 
Overview:

The 'fptrunc' instruction truncates value to type ty2.

Arguments:

The 'fptrunc' instruction takes a floating point value to cast and a floating point type to cast it to. The size of value must be larger than the size of ty2. This implies that fptrunc cannot be used to make a no-op cast.

Semantics:

The 'fptrunc' instruction truncates a value from a larger floating point type to a smaller floating point type. If the value cannot fit within the destination type, ty2, then the results are undefined.

Example:
   %X = fptrunc double 123.0 to float         ; yields float:123.0
   %Y = fptrunc double 1.0E+300 to float      ; yields undefined
 

'fpext .. to' Instruction

Syntax:
   <result> = fpext <ty> <value> to <ty2>             ; yields ty2
 
Overview:

The 'fpext' extends a floating point value to a larger floating point value.

Arguments:

The 'fpext' instruction takes a floating point value to cast, and a floating point type to cast it to. The source type must be smaller than the destination type.

Semantics:

The 'fpext' instruction extends the value from a smaller floating point type to a larger floating point type. The fpext cannot be used to make a no-op cast because it always changes bits. Use bitcast to make a no-op cast for a floating point cast.

Example:
   %X = fpext float 3.125 to double         ; yields double:3.125000e+00
   %Y = fpext double %X to fp128            ; yields fp128:0xL00000000000000004000900000000000
 

'fptoui .. to' Instruction

Syntax:
   <result> = fptoui <ty> <value> to <ty2>             ; yields ty2
 
Overview:

The 'fptoui' converts a floating point value to its unsigned integer equivalent of type ty2.

Arguments:

The 'fptoui' instruction takes a value to cast, which must be a scalar or vector floating point value, and a type to cast it to ty2, which must be an integer type. If ty is a vector floating point type, ty2 must be a vector integer type with the same number of elements as ty.

Semantics:

The 'fptoui' instruction converts its floating point operand into the nearest (rounding towards zero) unsigned integer value. If the value cannot fit in ty2, the results are undefined.

Example:
   %X = fptoui double 123.0 to i32      ; yields i32:123
   %Y = fptoui float 1.0E+300 to i1     ; yields undefined:1
   %Z = fptoui float 1.04E+17 to i8     ; yields undefined:1
 

'fptosi .. to' Instruction

Syntax:
   <result> = fptosi <ty> <value> to <ty2>             ; yields ty2
 
Overview:

The 'fptosi' instruction converts floating point value to type ty2.

Arguments:

The 'fptosi' instruction takes a value to cast, which must be a scalar or vector floating point value, and a type to cast it to ty2, which must be an integer type. If ty is a vector floating point type, ty2 must be a vector integer type with the same number of elements as ty.

Semantics:

The 'fptosi' instruction converts its floating point operand into the nearest (rounding towards zero) signed integer value. If the value cannot fit in ty2, the results are undefined.

Example:
   %X = fptosi double -123.0 to i32      ; yields i32:-123
   %Y = fptosi float 1.0E-247 to i1      ; yields undefined:1
   %Z = fptosi float 1.04E+17 to i8      ; yields undefined:1
 

'uitofp .. to' Instruction

Syntax:
   <result> = uitofp <ty> <value> to <ty2>             ; yields ty2
 
Overview:

The 'uitofp' instruction regards value as an unsigned integer and converts that value to the ty2 type.

Arguments:

The 'uitofp' instruction takes a value to cast, which must be a scalar or vector integer value, and a type to cast it to ty2, which must be a floating point type. If ty is a vector integer type, ty2 must be a vector floating point type with the same number of elements as ty.

Semantics:

The 'uitofp' instruction interprets its operand as an unsigned integer quantity and converts it to the corresponding floating point value. If the value cannot fit in the floating point value, the results are undefined.

Example:
   %X = uitofp i32 257 to float         ; yields float:257.0
   %Y = uitofp i8 -1 to double          ; yields double:255.0
 

'sitofp .. to' Instruction

Syntax:
   <result> = sitofp <ty> <value> to <ty2>             ; yields ty2
 
Overview:

The 'sitofp' instruction regards value as a signed integer and converts that value to the ty2 type.

Arguments:

The 'sitofp' instruction takes a value to cast, which must be a scalar or vector integer value, and a type to cast it to ty2, which must be a floating point type. If ty is a vector integer type, ty2 must be a vector floating point type with the same number of elements as ty.

Semantics:

The 'sitofp' instruction interprets its operand as a signed integer quantity and converts it to the corresponding floating point value. If the value cannot fit in the floating point value, the results are undefined.

Example:
   %X = sitofp i32 257 to float         ; yields float:257.0
   %Y = sitofp i8 -1 to double          ; yields double:-1.0
 

'ptrtoint .. to' Instruction

Syntax:
   <result> = ptrtoint <ty> <value> to <ty2>             ; yields ty2
 
Overview:

The 'ptrtoint' instruction converts the pointer value to the integer type ty2.

Arguments:

The 'ptrtoint' instruction takes a value to cast, which must be a pointer value, and a type to cast it to ty2, which must be an integer type.

Semantics:

The 'ptrtoint' instruction converts value to integer type ty2 by interpreting the pointer value as an integer and either truncating or zero extending that value to the size of the integer type. If value is smaller than ty2 then a zero extension is done. If value is larger than ty2 then a truncation is done. If they are the same size, then nothing is done (no-op cast) other than a type change.

Example:
   %X = ptrtoint i32* %x to i8           ; yields truncation on 32-bit architecture
   %Y = ptrtoint i32* %x to i64          ; yields zero extension on 32-bit architecture
 

'inttoptr .. to' Instruction

Syntax:
   <result> = inttoptr <ty> <value> to <ty2>             ; yields ty2
 
Overview:

The 'inttoptr' instruction converts an integer value to a pointer type, ty2.

Arguments:

The 'inttoptr' instruction takes an integer value to cast, and a type to cast it to, which must be a pointer type.

Semantics:

The 'inttoptr' instruction converts value to type ty2 by applying either a zero extension or a truncation depending on the size of the integer value. If value is larger than the size of a pointer then a truncation is done. If value is smaller than the size of a pointer then a zero extension is done. If they are the same size, nothing is done (no-op cast).
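
A common round-trip sketch, assuming a target where pointers are 64 bits wide (so both casts are no-ops):

   %int = ptrtoint i32* %p to i64        ; pointer bits reinterpreted as an integer
   %p2  = inttoptr i64 %int to i32*      ; the integer converted back to a pointer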

Example:
   %X = inttoptr i32 255 to i32*          ; yields zero extension on 64-bit architecture
   %Y = inttoptr i32 255 to i32*          ; yields no-op on 32-bit architecture
   %Z = inttoptr i64 0 to i32*            ; yields truncation on 32-bit architecture
 

'bitcast .. to' Instruction

Syntax:
   <result> = bitcast <ty> <value> to <ty2>             ; yields ty2
 
Overview:

The 'bitcast' instruction converts value to type ty2 without changing any bits.

Arguments:

The 'bitcast' instruction takes a value to cast, which must be a non-aggregate first class value, and a type to cast it to, which must also be a non-aggregate first class type. The bit sizes of value and the destination type, ty2, must be identical. If the source type is a pointer, the destination type must also be a pointer. This instruction supports bitwise conversion of vectors to integers and to vectors of other types (as long as they have the same size).

Semantics:

The 'bitcast' instruction converts value to type ty2. It is always a no-op cast because no bits change with this conversion. The conversion is done as if the value had been stored to memory and read back as type ty2. Pointer types may only be converted to other pointer types with this instruction. To convert pointers to other types, use the inttoptr or ptrtoint instructions first.

Example:
   %X = bitcast i8 255 to i8              ; yields i8 :-1
   %Y = bitcast i32* %x to float*         ; yields float*:%x
   %Z = bitcast <2 x i32> %V to i64       ; yields i64: %V
 

Other Operations

The instructions in this category are the "miscellaneous" instructions, which defy better classification.

'icmp' Instruction

Syntax:
   <result> = icmp <cond> <ty> <op1>, <op2>   ; yields {i1} or {<N x i1>}:result
 
Overview:

The 'icmp' instruction returns a boolean value or a vector of boolean values based on comparison of its two integer, integer vector, or pointer operands.

Arguments:

The 'icmp' instruction takes three operands. The first operand is the condition code indicating the kind of comparison to perform. It is not a value, just a keyword. The possible condition codes are:

  1. eq: equal
  2. ne: not equal
  3. ugt: unsigned greater than
  4. uge: unsigned greater or equal
  5. ult: unsigned less than
  6. ule: unsigned less or equal
  7. sgt: signed greater than
  8. sge: signed greater or equal
  9. slt: signed less than
  10. sle: signed less or equal

The remaining two arguments must be integer or pointer or integer vector typed. They must also be identical types.

Semantics:

The 'icmp' compares op1 and op2 according to the condition code given as cond. The comparison performed always yields either an i1 or vector of i1 result, as follows:

  1. eq: yields true if the operands are equal, false otherwise. No sign interpretation is necessary or performed.
  2. ne: yields true if the operands are unequal, false otherwise. No sign interpretation is necessary or performed.
  3. ugt: interprets the operands as unsigned values and yields true if op1 is greater than op2.
  4. uge: interprets the operands as unsigned values and yields true if op1 is greater than or equal to op2.
  5. ult: interprets the operands as unsigned values and yields true if op1 is less than op2.
  6. ule: interprets the operands as unsigned values and yields true if op1 is less than or equal to op2.
  7. sgt: interprets the operands as signed values and yields true if op1 is greater than op2.
  8. sge: interprets the operands as signed values and yields true if op1 is greater than or equal to op2.
  9. slt: interprets the operands as signed values and yields true if op1 is less than op2.
  10. sle: interprets the operands as signed values and yields true if op1 is less than or equal to op2.

If the operands are pointer typed, the pointer values are compared as if they were integers.

If the operands are integer vectors, then they are compared element by element. The result is an i1 vector with the same number of elements as the values being compared. Otherwise, the result is an i1.
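
For example, an element-wise comparison (the vector constants are assumptions chosen for illustration):

   <result> = icmp slt <2 x i32> <i32 1, i32 4>, <i32 2, i32 3>   ; yields: result=<2 x i1> <i1 true, i1 false>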

Example:
   <result> = icmp eq i32 4, 5          ; yields: result=false
   <result> = icmp ne float* %X, %X     ; yields: result=false
   <result> = icmp ult i16  4, 5        ; yields: result=true
   <result> = icmp sgt i16  4, 5        ; yields: result=false
   <result> = icmp ule i16 -4, 5        ; yields: result=false
   <result> = icmp sge i16  4, 5        ; yields: result=false
 

Note that the code generator does not yet support vector types with the icmp instruction.

'fcmp' Instruction

Syntax:
   <result> = fcmp <cond> <ty> <op1>, <op2>     ; yields {i1} or {<N x i1>}:result
 
Overview:

The 'fcmp' instruction returns a boolean value or vector of boolean values based on comparison of its operands.

If the operands are floating point scalars, then the result type is a boolean (i1).

If the operands are floating point vectors, then the result type is a vector of boolean with the same number of elements as the operands being compared.

Arguments:

The 'fcmp' instruction takes three operands. The first operand is the condition code indicating the kind of comparison to perform. It is not a value, just a keyword. The possible condition codes are:

  1. false: no comparison, always returns false
  2. oeq: ordered and equal
  3. ogt: ordered and greater than
  4. oge: ordered and greater than or equal
  5. olt: ordered and less than
  6. ole: ordered and less than or equal
  7. one: ordered and not equal
  8. ord: ordered (no nans)
  9. ueq: unordered or equal
  10. ugt: unordered or greater than
  11. uge: unordered or greater than or equal
  12. ult: unordered or less than
  13. ule: unordered or less than or equal
  14. une: unordered or not equal
  15. uno: unordered (either nans)
  16. true: no comparison, always returns true

Ordered means that neither operand is a QNAN while unordered means that either operand may be a QNAN.

The val1 and val2 arguments must each be either a floating point type or a vector of floating point type. They must have identical types.

Semantics:

The 'fcmp' instruction compares op1 and op2 according to the condition code given as cond. If the operands are vectors, then the vectors are compared element by element. Each comparison performed always yields an i1 result, as follows:

  1. false: always yields false, regardless of operands.
  2. oeq: yields true if both operands are not a QNAN and op1 is equal to op2.
  3. ogt: yields true if both operands are not a QNAN and op1 is greater than op2.
  4. oge: yields true if both operands are not a QNAN and op1 is greater than or equal to op2.
  5. olt: yields true if both operands are not a QNAN and op1 is less than op2.
  6. ole: yields true if both operands are not a QNAN and op1 is less than or equal to op2.
  7. one: yields true if both operands are not a QNAN and op1 is not equal to op2.
  8. ord: yields true if both operands are not a QNAN.
  9. ueq: yields true if either operand is a QNAN or op1 is equal to op2.
  10. ugt: yields true if either operand is a QNAN or op1 is greater than op2.
  11. uge: yields true if either operand is a QNAN or op1 is greater than or equal to op2.
  12. ult: yields true if either operand is a QNAN or op1 is less than op2.
  13. ule: yields true if either operand is a QNAN or op1 is less than or equal to op2.
  14. une: yields true if either operand is a QNAN or op1 is not equal to op2.
  15. uno: yields true if either operand is a QNAN.
  16. true: always yields true, regardless of operands.

Example:
   <result> = fcmp oeq float 4.0, 5.0    ; yields: result=false
   <result> = fcmp one float 4.0, 5.0    ; yields: result=true
   <result> = fcmp olt float 4.0, 5.0    ; yields: result=true
   <result> = fcmp ueq double 1.0, 2.0   ; yields: result=false
 

Note that the code generator does not yet support vector types with the fcmp instruction.

'phi' Instruction

Syntax:
   <result> = phi <ty> [ <val0>, <label0>], ...
 
Overview:

The 'phi' instruction is used to implement the φ node in the SSA graph representing the function.

Arguments:

The type of the incoming values is specified with the first type field. After this, the 'phi' instruction takes a list of pairs as arguments, with one pair for each predecessor basic block of the current block. Only values of first class type may be used as the value arguments to the PHI node. Only labels may be used as the label arguments.

There must be no non-phi instructions between the start of a basic block and the PHI instructions: i.e. PHI instructions must be first in a basic block.

For the purposes of the SSA form, the use of each incoming value is deemed to occur on the edge from the corresponding predecessor block to the current block (but after any definition of an 'invoke' instruction's return value on the same edge).

Semantics:

At runtime, the 'phi' instruction logically takes on the value specified by the pair corresponding to the predecessor basic block that executed just prior to the current block.

Example:
 Loop:       ; Infinite loop that counts from 0 on up...
   %indvar = phi i32 [ 0, %LoopHeader ], [ %nextindvar, %Loop ]
   %nextindvar = add i32 %indvar, 1
   br label %Loop
 

'select' Instruction

Syntax:
   <result> = select selty <cond>, <ty> <val1>, <ty> <val2>             ; yields ty
 
   selty is either i1 or {<N x i1>}
 
Overview:

The 'select' instruction is used to choose one value based on a condition, without branching.

Arguments:

The 'select' instruction requires an 'i1' value or a vector of 'i1' values indicating the condition, and two values of the same first class type. If the val1/val2 are vectors and the condition is a scalar, then entire vectors are selected, not individual elements.

Semantics:

If the condition is an i1 and it evaluates to 1, the instruction returns the first value argument; otherwise, it returns the second value argument.

If the condition is a vector of i1, then the value arguments must be vectors of the same size, and the selection is done element by element.

Example:
   %X = select i1 true, i8 17, i8 42          ; yields i8:17
 

Note that the code generator does not yet support conditions with vector type.

'call' Instruction

Syntax:
   <result> = [tail] call [cconv] [ret attrs] <ty> [<fnty>*] <fnptrval>(<function args>) [fn attrs]
 
Overview:

The 'call' instruction represents a simple function call.

Arguments:

This instruction requires several arguments:

  1. The optional "tail" marker indicates that the callee function does not access any allocas or varargs in the caller. Note that calls may be marked "tail" even if they do not occur before a ret instruction. If the "tail" marker is present, the function call is eligible for tail call optimization, but might not in fact be optimized into a jump. The code generator may optimize calls marked "tail" with either 1) automatic sibling call optimization when the caller and callee have matching signatures, or 2) forced tail call optimization when the following extra requirements are met:
    • Caller and callee both have the calling convention fastcc.
    • The call is in tail position (ret immediately follows call and ret uses value of call or is void).
    • Option -tailcallopt is enabled, or llvm::GuaranteedTailCallOpt is true.
    • Platform specific constraints are met.
  2. The optional "cconv" marker indicates which calling convention the call should use. If none is specified, the call defaults to using C calling conventions. The calling convention of the call must match the calling convention of the target function, or else the behavior is undefined.
  3. The optional Parameter Attributes list for return values. Only 'zeroext', 'signext', and 'inreg' attributes are valid here.
  4. 'ty': the type of the call instruction itself which is also the type of the return value. Functions that return no value are marked void.
  5. 'fnty': shall be the signature of the pointer to function value being invoked. The argument types must match the types implied by this signature. This type can be omitted if the function is not varargs and if the function type does not return a pointer to a function.
  6. 'fnptrval': An LLVM value containing a pointer to a function to be invoked. In most cases, this is a direct function invocation, but indirect calls are just as possible, calling an arbitrary pointer to function value.
  7. 'function args': argument list whose types match the function signature argument types and parameter attributes. All arguments must be of first class type. If the function signature indicates the function accepts a variable number of arguments, the extra arguments can be specified.
  8. The optional function attributes list. Only 'noreturn', 'nounwind', 'readonly' and 'readnone' attributes are valid here.
Semantics:

The 'call' instruction is used to cause control flow to transfer to a specified function, with its incoming arguments bound to the specified values. Upon a 'ret' instruction in the called function, control flow continues with the instruction after the function call, and the return value of the function is bound to the result argument.

Example:
   %retval = call i32 @test(i32 %argc)
   call i32 (i8*, ...)* @printf(i8* %msg, i32 12, i8 42)        ; yields i32
   %X = tail call i32 @foo()                                    ; yields i32
   %Y = tail call fastcc i32 @foo()  ; yields i32
   call void %foo(i8 97 signext)
 
   %struct.A = type { i32, i8 }
   %r = call %struct.A @foo()                        ; yields { i32, i8 }
   %gr = extractvalue %struct.A %r, 0                ; yields i32
   %gr1 = extractvalue %struct.A %r, 1               ; yields i8
   %Z = call void @foo() noreturn                    ; indicates that %foo never returns normally
   %ZZ = call zeroext i32 @bar()                     ; Return value is zero extended
 

LLVM treats calls to some functions with names and arguments that match the standard C99 library as being the C99 library functions, and may perform optimizations or generate code for them under that assumption. This is something we'd like to change in the future to provide better support for freestanding environments and non-C-based languages.

'va_arg' Instruction

Syntax:
   <resultval> = va_arg <va_list*> <arglist>, <argty>
 
Overview:

The 'va_arg' instruction is used to access arguments passed through the "variable argument" area of a function call. It is used to implement the va_arg macro in C.

Arguments:

This instruction takes a va_list* value and the type of the argument. It returns a value of the specified argument type and increments the va_list to point to the next argument. The actual type of va_list is target specific.

Semantics:

The 'va_arg' instruction loads an argument of the specified type from the specified va_list and causes the va_list to point to the next argument. For more information, see the variable argument handling Intrinsic Functions.

It is legal for this instruction to be called in a function which does not take a variable number of arguments, for example, the vfprintf function.

va_arg is an LLVM instruction instead of an intrinsic function because it takes a type as an argument.

Example:

See the variable argument processing section.

Note that the code generator does not yet fully support va_arg on many targets. Also, it does not currently support va_arg with aggregate types on any target.

'landingpad' Instruction

Syntax:
   <resultval> = landingpad <somety> personality <type> <pers_fn> <clause>+
   <resultval> = landingpad <somety> personality <type> <pers_fn> cleanup <clause>*
 
   <clause> := catch <type> <value>
   <clause> := filter <array constant type> <array constant>
 
Overview:

The 'landingpad' instruction is used by LLVM's exception handling system to specify that a basic block is a landing pad — one where the exception lands, and corresponds to the code found in the catch portion of a try/catch sequence. It defines values supplied by the personality function (pers_fn) upon re-entry to the function. The resultval has the type somety.

Arguments:

This instruction takes a pers_fn value. This is the personality function associated with the unwinding mechanism. The optional cleanup flag indicates that the landing pad block is a cleanup.

A clause begins with the clause type — catch or filter — and contains the global variable representing the "type" that may be caught or filtered respectively. Unlike the catch clause, the filter clause takes an array constant as its argument. Use "[0 x i8**] undef" for a filter which cannot throw. The 'landingpad' instruction must contain at least one clause or the cleanup flag.

Semantics:

The 'landingpad' instruction defines the values which are set by the personality function (pers_fn) upon re-entry to the function, and therefore the "result type" of the landingpad instruction. As with calling conventions, how the personality function results are represented in LLVM IR is target specific.

The clauses are applied in order from top to bottom. If two landingpad instructions are merged together through inlining, the clauses from the calling function are appended to the list of clauses.

The landingpad instruction has several restrictions:

  • A landing pad block is a basic block which is the unwind destination of an 'invoke' instruction.
  • A landing pad block must have a 'landingpad' instruction as its first non-PHI instruction.
  • There can be only one 'landingpad' instruction within the landing pad block.
  • A basic block that is not a landing pad block may not include a 'landingpad' instruction.
  • All 'landingpad' instructions in a function must have the same personality function.
Example:
   ;; A landing pad which can catch an integer.
   %res = landingpad { i8*, i32 } personality i32 (...)* @__gxx_personality_v0
            catch i8** @_ZTIi
   ;; A landing pad that is a cleanup.
   %res = landingpad { i8*, i32 } personality i32 (...)* @__gxx_personality_v0
            cleanup
   ;; A landing pad which can catch an integer and can only throw a double.
   %res = landingpad { i8*, i32 } personality i32 (...)* @__gxx_personality_v0
            catch i8** @_ZTIi
            filter [1 x i8**] [@_ZTId]
 

Intrinsic Functions

LLVM supports the notion of an "intrinsic function". These functions have well known names and semantics and are required to follow certain restrictions. Overall, these intrinsics represent an extension mechanism for the LLVM language that does not require changing all of the transformations in LLVM when adding to the language (or the bitcode reader/writer, the parser, etc...).

Intrinsic function names must all start with an "llvm." prefix. This prefix is reserved in LLVM for intrinsic names; thus, function names may not begin with this prefix. Intrinsic functions must always be external functions: you cannot define the body of intrinsic functions. Intrinsic functions may only be used in call or invoke instructions: it is illegal to take the address of an intrinsic function. Additionally, because intrinsic functions are part of the LLVM language, it is required if any are added that they be documented here.

Some intrinsic functions can be overloaded, i.e., the intrinsic represents a family of functions that perform the same operation but on different data types. Because LLVM can represent over 8 million different integer types, overloading is used commonly to allow an intrinsic function to operate on any integer type. One or more of the argument types or the result type can be overloaded to accept any integer type. Argument types may also be defined as exactly matching a previous argument's type or the result type. This allows an intrinsic function which accepts multiple arguments, but needs all of them to be of the same type, to only be overloaded with respect to a single argument or the result.

Overloaded intrinsics will have the names of their overloaded argument types encoded into their function name, each preceded by a period. Only those types which are overloaded result in a name suffix. Arguments whose type is matched against another type do not. For example, the llvm.ctpop function can take an integer of any width and returns an integer of exactly the same integer width. This leads to a family of functions such as i8 @llvm.ctpop.i8(i8 %val) and i29 @llvm.ctpop.i29(i29 %val). Only one type, the return type, is overloaded, and only one type suffix is required. Because the argument's type is matched against the return type, it does not require its own name suffix.

To learn how to add an intrinsic function, please see the Extending LLVM Guide.

Variable Argument Handling Intrinsics

Variable argument support is defined in LLVM with the va_arg instruction and these three intrinsic functions. These functions are related to the similarly named macros defined in the <stdarg.h> header file.

All of these functions operate on arguments that use a target-specific value type "va_list". The LLVM assembly language reference manual does not define what this type is, so all transformations should be prepared to handle these functions regardless of the type used.

This example shows how the va_arg instruction and the variable argument handling intrinsic functions are used.

 define i32 @test(i32 %X, ...) {
   ; Initialize variable argument processing
   %ap = alloca i8*
   %ap2 = bitcast i8** %ap to i8*
   call void @llvm.va_start(i8* %ap2)
 
   ; Read a single integer argument
   %tmp = va_arg i8** %ap, i32
 
   ; Demonstrate usage of llvm.va_copy and llvm.va_end
   %aq = alloca i8*
   %aq2 = bitcast i8** %aq to i8*
   call void @llvm.va_copy(i8* %aq2, i8* %ap2)
   call void @llvm.va_end(i8* %aq2)
 
   ; Stop processing of arguments.
   call void @llvm.va_end(i8* %ap2)
   ret i32 %tmp
 }
 
 declare void @llvm.va_start(i8*)
 declare void @llvm.va_copy(i8*, i8*)
 declare void @llvm.va_end(i8*)
 

'llvm.va_start' Intrinsic

Syntax:
   declare void @llvm.va_start(i8* <arglist>)
 
Overview:

The 'llvm.va_start' intrinsic initializes *<arglist> for subsequent use by va_arg.

Arguments:

The argument is a pointer to a va_list element to initialize.

Semantics:

The 'llvm.va_start' intrinsic works just like the va_start macro available in C. In a target-dependent way, it initializes the va_list element to which the argument points, so that the next call to va_arg will produce the first variable argument passed to the function. Unlike the C va_start macro, this intrinsic does not need to know the last argument of the function as the compiler can figure that out.

'llvm.va_end' Intrinsic

Syntax:
   declare void @llvm.va_end(i8* <arglist>)
 
Overview:

The 'llvm.va_end' intrinsic destroys *<arglist>, which has been initialized previously with llvm.va_start or llvm.va_copy.

Arguments:

The argument is a pointer to a va_list to destroy.

Semantics:

The 'llvm.va_end' intrinsic works just like the va_end macro available in C. In a target-dependent way, it destroys the va_list element to which the argument points. Calls to llvm.va_start and llvm.va_copy must be matched exactly with calls to llvm.va_end.

'llvm.va_copy' Intrinsic

Syntax:
   declare void @llvm.va_copy(i8* <destarglist>, i8* <srcarglist>)
 
Overview:

The 'llvm.va_copy' intrinsic copies the current argument position from the source argument list to the destination argument list.

Arguments:

The first argument is a pointer to a va_list element to initialize. The second argument is a pointer to a va_list element to copy from.

Semantics:

The 'llvm.va_copy' intrinsic works just like the va_copy macro available in C. In a target-dependent way, it copies the source va_list element into the destination va_list element. This intrinsic is necessary because the llvm.va_start intrinsic may be arbitrarily complex and require, for example, memory allocation.


Accurate Garbage Collection Intrinsics

LLVM support for Accurate Garbage Collection (GC) requires the implementation and generation of these intrinsics. These intrinsics allow identification of GC roots on the stack, as well as garbage collector implementations that require read and write barriers. Front-ends for type-safe garbage collected languages should generate these intrinsics to make use of the LLVM garbage collectors. For more details, see Accurate Garbage Collection with LLVM.

The garbage collection intrinsics only operate on objects in the generic address space (address space zero).

'llvm.gcroot' Intrinsic

Syntax:
   declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata)
 
Overview:

The 'llvm.gcroot' intrinsic declares the existence of a GC root to the code generator, and allows some metadata to be associated with it.

Arguments:

The first argument specifies the address of a stack object that contains the root pointer. The second pointer (which must be either a constant or a global value address) contains the meta-data to be associated with the root.

Semantics:

At runtime, a call to this intrinsic stores a null pointer into the "ptrloc" location. At compile-time, the code generator generates information to allow the runtime to find the pointer at GC safe points. The 'llvm.gcroot' intrinsic may only be used in a function which specifies a GC algorithm.
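
For illustration, a minimal sketch of how a front-end might use llvm.gcroot; the collector name "my-collector", the function @f, and the slot name %root are hypothetical:

   define void @f() gc "my-collector" {
   entry:
     %root = alloca i8*                              ; stack slot holding the root pointer
     call void @llvm.gcroot(i8** %root, i8* null)    ; register it with the collector
     ; ... store a heap pointer into %root and use it ...
     ret void
   }
 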

'llvm.gcread' Intrinsic

Syntax:
   declare i8* @llvm.gcread(i8* %ObjPtr, i8** %Ptr)
 
Overview:

The 'llvm.gcread' intrinsic identifies reads of references from heap locations, allowing garbage collector implementations that require read barriers.

Arguments:

The second argument is the address to read from, which should be an address allocated from the garbage collector. The first argument is a pointer to the start of the referenced object, if needed by the language runtime (otherwise null).

Semantics:

The 'llvm.gcread' intrinsic has the same semantics as a load instruction, but may be replaced with substantially more complex code by the garbage collector runtime, as needed. The 'llvm.gcread' intrinsic may only be used in a function which specifies a GC algorithm.

'llvm.gcwrite' Intrinsic

Syntax:
   declare void @llvm.gcwrite(i8* %P1, i8* %Obj, i8** %P2)
 
Overview:

The 'llvm.gcwrite' intrinsic identifies writes of references to heap locations, allowing garbage collector implementations that require write barriers (such as generational or reference counting collectors).

Arguments:

The first argument is the reference to store, the second is the start of the object to store it to, and the third is the address of the field of Obj to store to. If the runtime does not require a pointer to the object, Obj may be null.

Semantics:

The 'llvm.gcwrite' intrinsic has the same semantics as a store instruction, but may be replaced with substantially more complex code by the garbage collector runtime, as needed. The 'llvm.gcwrite' intrinsic may only be used in a function which specifies a GC algorithm.

Code Generator Intrinsics

These intrinsics are provided by LLVM to expose special features that may only be implemented with code generator support.

'llvm.returnaddress' Intrinsic

Syntax:
   declare i8* @llvm.returnaddress(i32 <level>)
 
Overview:

The 'llvm.returnaddress' intrinsic attempts to compute a target-specific value indicating the return address of the current function or one of its callers.

Arguments:

The argument to this intrinsic indicates which function to return the address for. Zero indicates the calling function, one indicates its caller, etc. The argument is required to be a constant integer value.

Semantics:

The 'llvm.returnaddress' intrinsic either returns a pointer indicating the return address of the specified call frame, or zero if it cannot be identified. The value returned by this intrinsic is likely to be incorrect or 0 for arguments other than zero, so it should only be used for debugging purposes.

Note that calling this intrinsic does not prevent function inlining or other aggressive transformations, so the value returned may not be that of the obvious source-language caller.
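
For example, a call requesting the return address of the current call frame (the result name is hypothetical):

   %ra = call i8* @llvm.returnaddress(i32 0)    ; return address of the current frame
 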

'llvm.frameaddress' Intrinsic

Syntax:
   declare i8* @llvm.frameaddress(i32 <level>)
 
Overview:

The 'llvm.frameaddress' intrinsic attempts to return the target-specific frame pointer value for the specified stack frame.

Arguments:

The argument to this intrinsic indicates which function to return the frame pointer for. Zero indicates the calling function, one indicates its caller, etc. The argument is required to be a constant integer value.

Semantics:

The 'llvm.frameaddress' intrinsic either returns a pointer indicating the frame address of the specified call frame, or zero if it cannot be identified. The value returned by this intrinsic is likely to be incorrect or 0 for arguments other than zero, so it should only be used for debugging purposes.

Note that calling this intrinsic does not prevent function inlining or other aggressive transformations, so the value returned may not be that of the obvious source-language caller.

'llvm.stacksave' Intrinsic

Syntax:
   declare i8* @llvm.stacksave()
 
Overview:

The 'llvm.stacksave' intrinsic is used to remember the current state of the function stack, for use with llvm.stackrestore. This is useful for implementing language features like scoped automatic variable sized arrays in C99.

Semantics:

This intrinsic returns an opaque pointer value that can be passed to llvm.stackrestore. When an llvm.stackrestore intrinsic is executed with a value saved from llvm.stacksave, it effectively restores the state of the stack to the state it was in when the llvm.stacksave intrinsic executed. In practice, this pops any alloca blocks from the stack that were allocated after the llvm.stacksave was executed.

'llvm.stackrestore' Intrinsic

Syntax:
   declare void @llvm.stackrestore(i8* %ptr)
 
Overview:

The 'llvm.stackrestore' intrinsic is used to restore the state of the function stack to the state it was in when the corresponding llvm.stacksave intrinsic executed. This is useful for implementing language features like scoped automatic variable sized arrays in C99.

Semantics:

See the description for llvm.stacksave.
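
A minimal sketch of the intended pairing, bracketing a hypothetical variable-sized allocation %buf of %n bytes:

   %sp  = call i8* @llvm.stacksave()        ; remember the current stack state
   %buf = alloca i8, i32 %n                 ; scoped variable-sized allocation
   ; ... use %buf ...
   call void @llvm.stackrestore(i8* %sp)    ; pops %buf from the stack
 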

'llvm.prefetch' Intrinsic

Syntax:
   declare void @llvm.prefetch(i8* <address>, i32 <rw>, i32 <locality>, i32 <cache type>)
 
Overview:

The 'llvm.prefetch' intrinsic is a hint to the code generator to insert a prefetch instruction if supported; otherwise, it is a noop. Prefetches have no effect on the behavior of the program but can change its performance characteristics.

Arguments:

address is the address to be prefetched, rw is the specifier determining if the fetch should be for a read (0) or write (1), and locality is a temporal locality specifier ranging from (0) - no locality, to (3) - extremely local keep in cache. The cache type specifies whether the prefetch is performed on the data (1) or instruction (0) cache. The rw, locality and cache type arguments must be constant integers.

Semantics:

This intrinsic does not modify the behavior of the program. In particular, prefetches cannot trap and do not produce a value. On targets that support this intrinsic, the prefetch can provide hints to the processor cache for better performance.
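
For example, a hint that a hypothetical pointer %ptr will soon be read and should be kept in the data cache with maximum temporal locality:

   call void @llvm.prefetch(i8* %ptr, i32 0, i32 3, i32 1)
 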

'llvm.pcmarker' Intrinsic

Syntax:
   declare void @llvm.pcmarker(i32 <id>)
 
Overview:

The 'llvm.pcmarker' intrinsic is a method to export a Program Counter (PC) in a region of code to simulators and other tools. The method is target specific, but it is expected that the marker will use exported symbols to transmit the PC of the marker. The marker makes no guarantees that it will remain with any specific instruction after optimizations. It is possible that the presence of a marker will inhibit optimizations. The intended use is to be inserted after optimizations to allow correlations of simulation runs.

Arguments:

id is a numerical id identifying the marker.

Semantics:

This intrinsic does not modify the behavior of the program. Backends that do not support this intrinsic may ignore it.

'llvm.readcyclecounter' Intrinsic

Syntax:
   declare i64 @llvm.readcyclecounter()
 
Overview:

The 'llvm.readcyclecounter' intrinsic provides access to the cycle counter register (or similar low latency, high accuracy clocks) on those targets that support it. On X86, it should map to RDTSC. On Alpha, it should map to RPCC. As the backing counters overflow quickly (on the order of 9 seconds on alpha), this should only be used for small timings.

Semantics:

When directly supported, reading the cycle counter should not modify any memory. Implementations are allowed to either return an application-specific value or a system-wide value. On backends without support, this is lowered to a constant 0.

Standard C Library Intrinsics

LLVM provides intrinsics for a few important standard C library functions. These intrinsics allow source-language front-ends to pass information about the alignment of the pointer arguments to the code generator, providing opportunity for more efficient code generation.

'llvm.memcpy' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.memcpy on any integer bit width and for different address spaces. Not all targets support all bit widths however.

   declare void @llvm.memcpy.p0i8.p0i8.i32(i8* <dest>, i8* <src>,
                                           i32 <len>, i32 <align>, i1 <isvolatile>)
   declare void @llvm.memcpy.p0i8.p0i8.i64(i8* <dest>, i8* <src>,
                                           i64 <len>, i32 <align>, i1 <isvolatile>)
 
Overview:

The 'llvm.memcpy.*' intrinsics copy a block of memory from the source location to the destination location.

Note that, unlike the standard libc function, the llvm.memcpy.* intrinsics do not return a value, take extra alignment and isvolatile arguments, and the pointers can be in specified address spaces.

Arguments:

The first argument is a pointer to the destination, the second is a pointer to the source. The third argument is an integer argument specifying the number of bytes to copy, the fourth argument is the alignment of the source and destination locations, and the fifth is a boolean indicating a volatile access.

If the call to this intrinsic has an alignment value that is not 0 or 1, then the caller guarantees that both the source and destination pointers are aligned to that boundary.

If the isvolatile parameter is true, the llvm.memcpy call is a volatile operation. The detailed access behavior is not very cleanly specified and it is unwise to depend on it.

Semantics:

The 'llvm.memcpy.*' intrinsics copy a block of memory from the source location to the destination location, which are not allowed to overlap. It copies "len" bytes of memory over. If the argument is known to be aligned to some boundary, this can be specified as the fourth argument, otherwise it should be set to 0 or 1.
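
For example, a non-volatile copy of 16 bytes between two hypothetical buffers known to be 4-byte aligned:

   call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst, i8* %src, i32 16, i32 4, i1 false)
 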

'llvm.memmove' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.memmove on any integer bit width and for different address space. Not all targets support all bit widths however.

   declare void @llvm.memmove.p0i8.p0i8.i32(i8* <dest>, i8* <src>,
                                            i32 <len>, i32 <align>, i1 <isvolatile>)
   declare void @llvm.memmove.p0i8.p0i8.i64(i8* <dest>, i8* <src>,
                                            i64 <len>, i32 <align>, i1 <isvolatile>)
 
Overview:

The 'llvm.memmove.*' intrinsics move a block of memory from the source location to the destination location. It is similar to the 'llvm.memcpy' intrinsic but allows the two memory locations to overlap.

Note that, unlike the standard libc function, the llvm.memmove.* intrinsics do not return a value, take extra alignment and isvolatile arguments, and the pointers can be in specified address spaces.

Arguments:

The first argument is a pointer to the destination, the second is a pointer to the source. The third argument is an integer argument specifying the number of bytes to copy, the fourth argument is the alignment of the source and destination locations, and the fifth is a boolean indicating a volatile access.

If the call to this intrinsic has an alignment value that is not 0 or 1, then the caller guarantees that the source and destination pointers are aligned to that boundary.

If the isvolatile parameter is true, the llvm.memmove call is a volatile operation. The detailed access behavior is not very cleanly specified and it is unwise to depend on it.

Semantics:

The 'llvm.memmove.*' intrinsics copy a block of memory from the source location to the destination location, which may overlap. It copies "len" bytes of memory over. If the argument is known to be aligned to some boundary, this can be specified as the fourth argument, otherwise it should be set to 0 or 1.

'llvm.memset.*' Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.memset on any integer bit width and for different address spaces. However, not all targets support all bit widths.

   declare void @llvm.memset.p0i8.i32(i8* <dest>, i8 <val>,
                                      i32 <len>, i32 <align>, i1 <isvolatile>)
   declare void @llvm.memset.p0i8.i64(i8* <dest>, i8 <val>,
                                      i64 <len>, i32 <align>, i1 <isvolatile>)
 
Overview:

The 'llvm.memset.*' intrinsics fill a block of memory with a particular byte value.

Note that, unlike the standard libc function, the llvm.memset intrinsic does not return a value and takes extra alignment and isvolatile arguments. Also, the destination can be in an arbitrary address space.

Arguments:

The first argument is a pointer to the destination to fill, the second is the byte value with which to fill it, the third argument is an integer argument specifying the number of bytes to fill, the fourth argument is the known alignment of the destination location, and the fifth is a boolean indicating a volatile access.

If the call to this intrinsic has an alignment value that is not 0 or 1, then the caller guarantees that the destination pointer is aligned to that boundary.

If the isvolatile parameter is true, the llvm.memset call is a volatile operation. The detailed access behavior is not very cleanly specified and it is unwise to depend on it.

Semantics:

The 'llvm.memset.*' intrinsics fill "len" bytes of memory starting at the destination location. If the argument is known to be aligned to some boundary, this can be specified as the fourth argument, otherwise it should be set to 0 or 1.
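
For example, zeroing a hypothetical 64-byte buffer %buf with no known alignment:

   call void @llvm.memset.p0i8.i32(i8* %buf, i8 0, i32 64, i32 1, i1 false)
 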

'llvm.sqrt.*' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.sqrt on any floating point or vector of floating point type. Not all targets support all types however.

   declare float     @llvm.sqrt.f32(float %Val)
   declare double    @llvm.sqrt.f64(double %Val)
   declare x86_fp80  @llvm.sqrt.f80(x86_fp80 %Val)
   declare fp128     @llvm.sqrt.f128(fp128 %Val)
   declare ppc_fp128 @llvm.sqrt.ppcf128(ppc_fp128 %Val)
 
Overview:

The 'llvm.sqrt' intrinsics return the sqrt of the specified operand, returning the same value as the libm 'sqrt' functions would. Unlike sqrt in libm, however, llvm.sqrt has undefined behavior for negative numbers other than -0.0 (which allows for better optimization, because there is no need to worry about errno being set). llvm.sqrt(-0.0) is defined to return -0.0 like IEEE sqrt.

Arguments:

The argument and return value are floating point numbers of the same type.

Semantics:

This function returns the sqrt of the specified operand if it is a nonnegative floating point number.
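
For example (the operand name is hypothetical):

   %r = call double @llvm.sqrt.f64(double %x)    ; undefined if %x is negative (other than -0.0)
 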

'llvm.powi.*' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.powi on any floating point or vector of floating point type. Not all targets support all types however.

   declare float     @llvm.powi.f32(float  %Val, i32 %power)
   declare double    @llvm.powi.f64(double %Val, i32 %power)
   declare x86_fp80  @llvm.powi.f80(x86_fp80  %Val, i32 %power)
   declare fp128     @llvm.powi.f128(fp128 %Val, i32 %power)
   declare ppc_fp128 @llvm.powi.ppcf128(ppc_fp128  %Val, i32 %power)
 
Overview:

The 'llvm.powi.*' intrinsics return the first operand raised to the specified (positive or negative) power. The order of evaluation of multiplications is not defined. When a vector of floating point type is used, the second argument remains a scalar integer value.

Arguments:

The second argument is an integer power, and the first is a value to raise to that power.

Semantics:

This function returns the first value raised to the second power with an unspecified sequence of rounding operations.

'llvm.sin.*' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.sin on any floating point or vector of floating point type. Not all targets support all types however.

   declare float     @llvm.sin.f32(float  %Val)
   declare double    @llvm.sin.f64(double %Val)
   declare x86_fp80  @llvm.sin.f80(x86_fp80  %Val)
   declare fp128     @llvm.sin.f128(fp128 %Val)
   declare ppc_fp128 @llvm.sin.ppcf128(ppc_fp128  %Val)
 
Overview:

The 'llvm.sin.*' intrinsics return the sine of the operand.

Arguments:

The argument and return value are floating point numbers of the same type.

Semantics:

This function returns the sine of the specified operand, returning the same values as the libm sin functions would, and handles error conditions in the same way.

'llvm.cos.*' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.cos on any floating point or vector of floating point type. Not all targets support all types however.

   declare float     @llvm.cos.f32(float  %Val)
   declare double    @llvm.cos.f64(double %Val)
   declare x86_fp80  @llvm.cos.f80(x86_fp80  %Val)
   declare fp128     @llvm.cos.f128(fp128 %Val)
   declare ppc_fp128 @llvm.cos.ppcf128(ppc_fp128  %Val)
 
Overview:

The 'llvm.cos.*' intrinsics return the cosine of the operand.

Arguments:

The argument and return value are floating point numbers of the same type.

Semantics:

This function returns the cosine of the specified operand, returning the same values as the libm cos functions would, and handles error conditions in the same way.

'llvm.pow.*' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.pow on any floating point or vector of floating point type. Not all targets support all types however.

   declare float     @llvm.pow.f32(float  %Val, float %Power)
   declare double    @llvm.pow.f64(double %Val, double %Power)
   declare x86_fp80  @llvm.pow.f80(x86_fp80  %Val, x86_fp80 %Power)
   declare fp128     @llvm.pow.f128(fp128 %Val, fp128 %Power)
   declare ppc_fp128 @llvm.pow.ppcf128(ppc_fp128  %Val, ppc_fp128 Power)
 
Overview:

The 'llvm.pow.*' intrinsics return the first operand raised to the specified (positive or negative) power.

Arguments:

The second argument is a floating point power, and the first is a value to raise to that power.

Semantics:

This function returns the first value raised to the second power, returning the same values as the libm pow functions would, and handles error conditions in the same way.
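
For example (the operand name is hypothetical):

   %r = call double @llvm.pow.f64(double %x, double 2.5)    ; %x raised to the power 2.5
 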


'llvm.exp.*' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.exp on any floating point or vector of floating point type. Not all targets support all types however.

   declare float     @llvm.exp.f32(float  %Val)
   declare double    @llvm.exp.f64(double %Val)
   declare x86_fp80  @llvm.exp.f80(x86_fp80  %Val)
   declare fp128     @llvm.exp.f128(fp128 %Val)
   declare ppc_fp128 @llvm.exp.ppcf128(ppc_fp128  %Val)
 
Overview:

The 'llvm.exp.*' intrinsics perform the exp function.

Arguments:

The argument and return value are floating point numbers of the same type.

Semantics:

This function returns the same values as the libm exp functions would, and handles error conditions in the same way.

'llvm.log.*' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.log on any floating point or vector of floating point type. Not all targets support all types however.

   declare float     @llvm.log.f32(float  %Val)
   declare double    @llvm.log.f64(double %Val)
   declare x86_fp80  @llvm.log.f80(x86_fp80  %Val)
   declare fp128     @llvm.log.f128(fp128 %Val)
   declare ppc_fp128 @llvm.log.ppcf128(ppc_fp128  %Val)
 
Overview:

The 'llvm.log.*' intrinsics perform the log function.

Arguments:

The argument and return value are floating point numbers of the same type.

Semantics:

This function returns the same values as the libm log functions would, and handles error conditions in the same way.


'llvm.fma.*' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.fma on any floating point or vector of floating point type. Not all targets support all types however.

   declare float     @llvm.fma.f32(float  %a, float  %b, float  %c)
   declare double    @llvm.fma.f64(double %a, double %b, double %c)
   declare x86_fp80  @llvm.fma.f80(x86_fp80 %a, x86_fp80 %b, x86_fp80 %c)
   declare fp128     @llvm.fma.f128(fp128 %a, fp128 %b, fp128 %c)
   declare ppc_fp128 @llvm.fma.ppcf128(ppc_fp128 %a, ppc_fp128 %b, ppc_fp128 %c)
 
Overview:

The 'llvm.fma.*' intrinsics perform the fused multiply-add operation.

Arguments:

The argument and return value are floating point numbers of the same type.

Semantics:

This function returns the same values as the libm fma functions would.
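
For example (the operand names are hypothetical):

   %r = call float @llvm.fma.f32(float %a, float %b, float %c)    ; (%a * %b) + %c, as libm fma would compute it
 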


Bit Manipulation Intrinsics

LLVM provides intrinsics for a few important bit manipulation operations. These allow efficient code generation for some algorithms.

'llvm.bswap.*' Intrinsics

Syntax:

This is an overloaded intrinsic function. You can use bswap on any integer type that is an even number of bytes (i.e. BitWidth % 16 == 0).

   declare i16 @llvm.bswap.i16(i16 <id>)
   declare i32 @llvm.bswap.i32(i32 <id>)
   declare i64 @llvm.bswap.i64(i64 <id>)
 
Overview:

The 'llvm.bswap' family of intrinsics is used to byte swap integer values with an even number of bytes (positive multiple of 16 bits). These are useful for performing operations on data that is not in the target's native byte order.

Semantics:

The llvm.bswap.i16 intrinsic returns an i16 value that has the high and low byte of the input i16 swapped. Similarly, the llvm.bswap.i32 intrinsic returns an i32 value that has the four bytes of the input i32 swapped, so that if the input bytes are numbered 0, 1, 2, 3 then the returned i32 will have its bytes in 3, 2, 1, 0 order. The llvm.bswap.i48, llvm.bswap.i64 and other intrinsics extend this concept to additional even-byte lengths (6 bytes, 8 bytes and more, respectively).
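
For example (the operand name is hypothetical):

   %swapped = call i32 @llvm.bswap.i32(i32 %x)    ; 0xAABBCCDD becomes 0xDDCCBBAA
 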

'llvm.ctpop.*' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.ctpop on any integer bit width, or on any vector with integer elements. Not all targets support all bit widths or vector types, however.

   declare i8 @llvm.ctpop.i8(i8  <src>)
   declare i16 @llvm.ctpop.i16(i16 <src>)
   declare i32 @llvm.ctpop.i32(i32 <src>)
   declare i64 @llvm.ctpop.i64(i64 <src>)
   declare i256 @llvm.ctpop.i256(i256 <src>)
   declare <2 x i32> @llvm.ctpop.v2i32(<2 x i32> <src>)
 
Overview:

The 'llvm.ctpop' family of intrinsics counts the number of bits set in a value.

Arguments:

The only argument is the value to be counted. The argument may be of any integer type, or a vector with integer elements. The return type must match the argument type.

Semantics:

The 'llvm.ctpop' intrinsic counts the 1's in a variable, or within each element of a vector.
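
For example (the operand name is hypothetical):

   %bits = call i32 @llvm.ctpop.i32(i32 %x)    ; ctpop(i32 255) yields 8
 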

'llvm.ctlz.*' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.ctlz on any integer bit width, or any vector whose elements are integers. Not all targets support all bit widths or vector types, however.

   declare i8 @llvm.ctlz.i8 (i8  <src>)
   declare i16 @llvm.ctlz.i16(i16 <src>)
   declare i32 @llvm.ctlz.i32(i32 <src>)
   declare i64 @llvm.ctlz.i64(i64 <src>)
   declare i256 @llvm.ctlz.i256(i256 <src>)
   declare <2 x i32> @llvm.ctlz.v2i32(<2 x i32> <src>)
 
Overview:

The 'llvm.ctlz' family of intrinsic functions counts the number of leading zeros in a variable.

Arguments:

The only argument is the value to be counted. The argument may be of any integer type, or any vector type with integer element type. The return type must match the argument type.

Semantics:

The 'llvm.ctlz' intrinsic counts the leading (most significant) zeros in a variable, or within each element of the vector if the operation is of vector type. If the src == 0 then the result is the size in bits of the type of src. For example, llvm.ctlz(i32 2) = 30.

'llvm.cttz.*' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.cttz on any integer bit width, or any vector of integer elements. Not all targets support all bit widths or vector types, however.

   declare i8 @llvm.cttz.i8 (i8  <src>)
   declare i16 @llvm.cttz.i16(i16 <src>)
   declare i32 @llvm.cttz.i32(i32 <src>)
   declare i64 @llvm.cttz.i64(i64 <src>)
   declare i256 @llvm.cttz.i256(i256 <src>)
   declare <2 x i32> @llvm.cttz.v2i32(<2 x i32> <src>)
 
Overview:

The 'llvm.cttz' family of intrinsic functions counts the number of trailing zeros.

Arguments:

The only argument is the value to be counted. The argument may be of any integer type, or a vector with integer element type. The return type must match the argument type.

Semantics:

The 'llvm.cttz' intrinsic counts the trailing (least significant) zeros in a variable, or within each element of a vector. If the src == 0 then the result is the size in bits of the type of src. For example, llvm.cttz(2) = 1.

Arithmetic with Overflow Intrinsics

LLVM provides intrinsics for some arithmetic with overflow operations.

'llvm.sadd.with.overflow.*' Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.sadd.with.overflow on any integer bit width.

   declare {i16, i1} @llvm.sadd.with.overflow.i16(i16 %a, i16 %b)
   declare {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
   declare {i64, i1} @llvm.sadd.with.overflow.i64(i64 %a, i64 %b)
 
Overview:

The 'llvm.sadd.with.overflow' family of intrinsic functions perform a signed addition of the two arguments, and indicate whether an overflow occurred during the signed summation.

Arguments:

The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo signed addition.

Semantics:

The 'llvm.sadd.with.overflow' family of intrinsic functions perform a signed addition of the two variables. They return a structure — the first element of which is the signed summation, and the second element of which is a bit specifying if the signed summation resulted in an overflow.

Examples:
   %res = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
   %sum = extractvalue {i32, i1} %res, 0
   %obit = extractvalue {i32, i1} %res, 1
   br i1 %obit, label %overflow, label %normal
 

'llvm.uadd.with.overflow.*' Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.uadd.with.overflow on any integer bit width.

   declare {i16, i1} @llvm.uadd.with.overflow.i16(i16 %a, i16 %b)
   declare {i32, i1} @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
   declare {i64, i1} @llvm.uadd.with.overflow.i64(i64 %a, i64 %b)
 
Overview:

The 'llvm.uadd.with.overflow' family of intrinsic functions perform an unsigned addition of the two arguments, and indicate whether a carry occurred during the unsigned summation.

Arguments:

The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo unsigned addition.

Semantics:

The 'llvm.uadd.with.overflow' family of intrinsic functions perform an unsigned addition of the two arguments. They return a structure — the first element of which is the sum, and the second element of which is a bit specifying if the unsigned summation resulted in a carry.

Examples:
   %res = call {i32, i1} @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
   %sum = extractvalue {i32, i1} %res, 0
   %obit = extractvalue {i32, i1} %res, 1
   br i1 %obit, label %carry, label %normal
 

'llvm.ssub.with.overflow.*' Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.ssub.with.overflow on any integer bit width.

   declare {i16, i1} @llvm.ssub.with.overflow.i16(i16 %a, i16 %b)
   declare {i32, i1} @llvm.ssub.with.overflow.i32(i32 %a, i32 %b)
   declare {i64, i1} @llvm.ssub.with.overflow.i64(i64 %a, i64 %b)
 
Overview:

The 'llvm.ssub.with.overflow' family of intrinsic functions perform a signed subtraction of the two arguments, and indicate whether an overflow occurred during the signed subtraction.

Arguments:

The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo signed subtraction.

Semantics:

The 'llvm.ssub.with.overflow' family of intrinsic functions perform a signed subtraction of the two arguments. They return a structure — the first element of which is the subtraction, and the second element of which is a bit specifying if the signed subtraction resulted in an overflow.

Examples:
   %res = call {i32, i1} @llvm.ssub.with.overflow.i32(i32 %a, i32 %b)
   %sum = extractvalue {i32, i1} %res, 0
   %obit = extractvalue {i32, i1} %res, 1
   br i1 %obit, label %overflow, label %normal
 

'llvm.usub.with.overflow.*' Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.usub.with.overflow on any integer bit width.

   declare {i16, i1} @llvm.usub.with.overflow.i16(i16 %a, i16 %b)
   declare {i32, i1} @llvm.usub.with.overflow.i32(i32 %a, i32 %b)
   declare {i64, i1} @llvm.usub.with.overflow.i64(i64 %a, i64 %b)
 
Overview:

The 'llvm.usub.with.overflow' family of intrinsic functions perform an unsigned subtraction of the two arguments, and indicate whether an overflow occurred during the unsigned subtraction.

Arguments:

The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo unsigned subtraction.

Semantics:

The 'llvm.usub.with.overflow' family of intrinsic functions perform an unsigned subtraction of the two arguments. They return a structure — the first element of which is the subtraction, and the second element of which is a bit specifying if the unsigned subtraction resulted in an overflow.

Examples:
   %res = call {i32, i1} @llvm.usub.with.overflow.i32(i32 %a, i32 %b)
   %sum = extractvalue {i32, i1} %res, 0
   %obit = extractvalue {i32, i1} %res, 1
   br i1 %obit, label %overflow, label %normal
 

'llvm.smul.with.overflow.*' Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.smul.with.overflow on any integer bit width.

   declare {i16, i1} @llvm.smul.with.overflow.i16(i16 %a, i16 %b)
   declare {i32, i1} @llvm.smul.with.overflow.i32(i32 %a, i32 %b)
   declare {i64, i1} @llvm.smul.with.overflow.i64(i64 %a, i64 %b)
 
Overview:

The 'llvm.smul.with.overflow' family of intrinsic functions perform a signed multiplication of the two arguments, and indicate whether an overflow occurred during the signed multiplication.

Arguments:

The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo signed multiplication.

Semantics:

The 'llvm.smul.with.overflow' family of intrinsic functions perform a signed multiplication of the two arguments. They return a structure — the first element of which is the multiplication, and the second element of which is a bit specifying if the signed multiplication resulted in an overflow.

Examples:
   %res = call {i32, i1} @llvm.smul.with.overflow.i32(i32 %a, i32 %b)
   %sum = extractvalue {i32, i1} %res, 0
   %obit = extractvalue {i32, i1} %res, 1
   br i1 %obit, label %overflow, label %normal
 

'llvm.umul.with.overflow.*' Intrinsics

Syntax:

This is an overloaded intrinsic. You can use llvm.umul.with.overflow on any integer bit width.

   declare {i16, i1} @llvm.umul.with.overflow.i16(i16 %a, i16 %b)
   declare {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b)
   declare {i64, i1} @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
 
Overview:

The 'llvm.umul.with.overflow' family of intrinsic functions perform an unsigned multiplication of the two arguments, and indicate whether an overflow occurred during the unsigned multiplication.

Arguments:

The arguments (%a and %b) and the first element of the result structure may be of integer types of any bit width, but they must have the same bit width. The second element of the result structure must be of type i1. %a and %b are the two values that will undergo unsigned multiplication.

Semantics:

The 'llvm.umul.with.overflow' family of intrinsic functions perform an unsigned multiplication of the two arguments. They return a structure — the first element of which is the multiplication, and the second element of which is a bit specifying if the unsigned multiplication resulted in an overflow.

Examples:
   %res = call {i32, i1} @llvm.umul.with.overflow.i32(i32 %a, i32 %b)
   %sum = extractvalue {i32, i1} %res, 0
   %obit = extractvalue {i32, i1} %res, 1
   br i1 %obit, label %overflow, label %normal
 

Half Precision Floating Point Intrinsics

Half precision floating point is a storage-only format. This means that it is a dense encoding (in memory) but does not support computation in the format.

This means that code must first load the half-precision floating point value as an i16, then convert it to float with llvm.convert.from.fp16. Computation can then be performed on the float value (including extending to double etc). To store the value back to memory, it is first converted back to float if needed, then converted to i16 with llvm.convert.to.fp16, and stored as an i16 value.
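
A sketch of that round trip, assuming a hypothetical global @x that holds the i16 storage form (the two conversion intrinsics are described below):

   %h  = load i16* @x, align 2                        ; load the storage form
   %f  = call f32 @llvm.convert.from.fp16(i16 %h)     ; widen to float for computation
   ; ... compute on %f as an ordinary float ...
   %h2 = call i16 @llvm.convert.to.fp16(f32 %f)       ; narrow back to the storage form
   store i16 %h2, i16* @x, align 2
 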

'llvm.convert.to.fp16' Intrinsic

Syntax:
   declare i16 @llvm.convert.to.fp16(f32 %a)
 
Overview:

The 'llvm.convert.to.fp16' intrinsic function performs a conversion from single precision floating point format to half precision floating point format.

Arguments:

The intrinsic function takes a single argument: the value to be converted.

Semantics:

The 'llvm.convert.to.fp16' intrinsic function performs a conversion from single precision floating point format to half precision floating point format. The return value is an i16 which contains the converted number.

Examples:
   %res = call i16 @llvm.convert.to.fp16(f32 %a)
   store i16 %res, i16* @x, align 2
 

'llvm.convert.from.fp16' Intrinsic

Syntax:
   declare f32 @llvm.convert.from.fp16(i16 %a)
 
Overview:

The 'llvm.convert.from.fp16' intrinsic function performs a conversion from half precision floating point format to single precision floating point format.

Arguments:

The intrinsic function takes a single argument: the value to be converted.

Semantics:

The 'llvm.convert.from.fp16' intrinsic function performs a conversion from half precision floating point format to single precision floating point format. The input half-float value is represented by an i16 value.

Examples:
   %a = load i16* @x, align 2
   %res = call f32 @llvm.convert.from.fp16(i16 %a)
 

Debugger Intrinsics

The LLVM debugger intrinsics (which all start with llvm.dbg. prefix), are described in the LLVM Source Level Debugging document.

Exception Handling Intrinsics

The LLVM exception handling intrinsics (which all start with llvm.eh. prefix), are described in the LLVM Exception Handling document.

Trampoline Intrinsics

These intrinsics make it possible to excise one parameter, marked with the nest attribute, from a function. The result is a callable function pointer lacking the nest parameter - the caller does not need to provide a value for it. Instead, the value to use is stored in advance in a "trampoline", a block of memory usually allocated on the stack, which also contains code to splice the nest value into the argument list. This is used to implement the GCC nested function address extension.

For example, if the function is i32 f(i8* nest %c, i32 %x, i32 %y) then the resulting function pointer has signature i32 (i32, i32)*. It can be created as follows:

   %tramp = alloca [10 x i8], align 4 ; size and alignment only correct for X86
   %tramp1 = getelementptr [10 x i8]* %tramp, i32 0, i32 0
   call i8* @llvm.init.trampoline(i8* %tramp1, i8* bitcast (i32 (i8*, i32, i32)* @f to i8*), i8* %nval)
   %p = call i8* @llvm.adjust.trampoline(i8* %tramp1)
   %fp = bitcast i8* %p to i32 (i32, i32)*
 

The call %val = call i32 %fp(i32 %x, i32 %y) is then equivalent to %val = call i32 %f(i8* %nval, i32 %x, i32 %y).

'llvm.init.trampoline' Intrinsic

Syntax:
   declare void @llvm.init.trampoline(i8* <tramp>, i8* <func>, i8* <nval>)
 
Overview:

This fills the memory pointed to by tramp with executable code, turning it into a trampoline.

Arguments:

The llvm.init.trampoline intrinsic takes three arguments, all pointers. The tramp argument must point to a sufficiently large and sufficiently aligned block of memory; this memory is written to by the intrinsic. Note that the size and the alignment are target-specific - LLVM currently provides no portable way of determining them, so a front-end that generates this intrinsic needs to have some target-specific knowledge. The func argument must hold a function bitcast to an i8*.

Semantics:

The block of memory pointed to by tramp is filled with target dependent code, turning it into a function. Then tramp needs to be passed to llvm.adjust.trampoline to get a pointer which can be bitcast (to a new function) and called. The new function's signature is the same as that of func with any arguments marked with the nest attribute removed. At most one such nest argument is allowed, and it must be of pointer type. Calling the new function is equivalent to calling func with the same argument list, but with nval used for the missing nest argument. If, after calling llvm.init.trampoline, the memory pointed to by tramp is modified, then the effect of any later call to the returned function pointer is undefined.

'llvm.adjust.trampoline' Intrinsic

Syntax:
   declare i8* @llvm.adjust.trampoline(i8* <tramp>)
 
Overview:

This performs any required machine-specific adjustment to the address of a trampoline (passed as tramp).

Arguments:

tramp must point to a block of memory which already has trampoline code filled in by a previous call to llvm.init.trampoline .

Semantics:

On some architectures the address of the code to be executed needs to be different to the address where the trampoline is actually stored. This intrinsic returns the executable address corresponding to tramp after performing the required machine specific adjustments. The pointer returned can then be bitcast and executed.

Atomic Operations and Synchronization Intrinsics

These intrinsic functions expand the "universal IR" of LLVM to represent hardware constructs for atomic operations and memory synchronization. This provides an interface to the hardware, not an interface to the programmer. It is aimed at a low enough level to allow any programming models or APIs (Application Programming Interfaces) which need atomic behaviors to map cleanly onto it. It is also modeled primarily on hardware behavior. Just as hardware provides a "universal IR" for source languages, it also provides a starting point for developing a "universal" atomic operation and synchronization IR.

These do not form an API such as high-level threading libraries, software transaction memory systems, atomic primitives, and intrinsic functions as found in BSD, GNU libc, atomic_ops, APR, and other system and application libraries. The hardware interface provided by LLVM should allow a clean implementation of all of these APIs and parallel programming models. No one model or paradigm should be selected above others unless the hardware itself ubiquitously does so.

- 'llvm.memory.barrier' Intrinsic -

- -
-
Syntax:
-
-  declare void @llvm.memory.barrier(i1 <ll>, i1 <ls>, i1 <sl>, i1 <ss>, i1 <device>)
-
- -
Overview:
-

The llvm.memory.barrier intrinsic guarantees ordering between - specific pairs of memory access types.

- -
Arguments:
-

The llvm.memory.barrier intrinsic requires five boolean arguments. - The first four arguments enables a specific barrier as listed below. The - fifth argument specifies that the barrier applies to io or device or uncached - memory.

- - - -
Semantics:

This intrinsic causes the system to enforce some ordering constraints upon the loads and stores of the program. This barrier does not indicate when any events will occur; it only enforces an order in which they occur. For any of the specified pairs of load and store operations (for example load-load, or store-load), all of the first operations preceding the barrier will complete before any of the second operations succeeding the barrier begin. Specifically the semantics for each pairing is as follows:

  ll: all loads before the barrier must complete before any load after the barrier begins.
  ls: all loads before the barrier must complete before any store after the barrier begins.
  sl: all stores before the barrier must complete before any load after the barrier begins.
  ss: all stores before the barrier must complete before any store after the barrier begins.

These semantics are applied with a logical "and" behavior when more than one is enabled in a single memory barrier intrinsic.

Backends may implement stronger barriers than those requested when they do not support as fine-grained a barrier as requested. Some architectures do not need all types of barriers and on such architectures these become no-ops.

Example:

%mallocP  = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
%ptr      = bitcast i8* %mallocP to i32*
            store i32 4, i32* %ptr

%result1  = load i32* %ptr      ; yields {i32}:result1 = 4
            call void @llvm.memory.barrier(i1 false, i1 true, i1 false, i1 false, i1 true)
                                ; guarantee the above finishes
            store i32 8, i32* %ptr   ; before this begins

'llvm.atomic.cmp.swap.*' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.atomic.cmp.swap on any integer bit width and for different address spaces. Not all targets support all bit widths however.

  declare i8 @llvm.atomic.cmp.swap.i8.p0i8(i8* <ptr>, i8 <cmp>, i8 <val>)
  declare i16 @llvm.atomic.cmp.swap.i16.p0i16(i16* <ptr>, i16 <cmp>, i16 <val>)
  declare i32 @llvm.atomic.cmp.swap.i32.p0i32(i32* <ptr>, i32 <cmp>, i32 <val>)
  declare i64 @llvm.atomic.cmp.swap.i64.p0i64(i64* <ptr>, i64 <cmp>, i64 <val>)
Overview:

This loads a value in memory and compares it to a given value. If they are equal, it stores a new value into the memory.

Arguments:

The llvm.atomic.cmp.swap intrinsic takes three arguments. The result as well as both cmp and val must be integer values with the same bit width. The ptr argument must be a pointer to a value of this integer type. While any bit width integer may be used, targets may only lower representations they support in hardware.

Semantics:

This entire intrinsic must be executed atomically. It first loads the value in memory pointed to by ptr and compares it with the value cmp. If they are equal, val is stored into the memory. The loaded value is yielded in all cases. This provides the equivalent of an atomic compare-and-swap operation within the SSA framework.

Examples:

%mallocP  = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
%ptr      = bitcast i8* %mallocP to i32*
            store i32 4, i32* %ptr

%val1     = add i32 4, 4
%result1  = call i32 @llvm.atomic.cmp.swap.i32.p0i32(i32* %ptr, i32 4, i32 %val1)
                                          ; yields {i32}:result1 = 4
%stored1  = icmp eq i32 %result1, 4       ; yields {i1}:stored1 = true
%memval1  = load i32* %ptr                ; yields {i32}:memval1 = 8

%val2     = add i32 1, 1
%result2  = call i32 @llvm.atomic.cmp.swap.i32.p0i32(i32* %ptr, i32 5, i32 %val2)
                                          ; yields {i32}:result2 = 8
%stored2  = icmp eq i32 %result2, 5       ; yields {i1}:stored2 = false
%memval2  = load i32* %ptr                ; yields {i32}:memval2 = 8

'llvm.atomic.swap.*' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.atomic.swap on any integer bit width. Not all targets support all bit widths however.

  declare i8 @llvm.atomic.swap.i8.p0i8(i8* <ptr>, i8 <val>)
  declare i16 @llvm.atomic.swap.i16.p0i16(i16* <ptr>, i16 <val>)
  declare i32 @llvm.atomic.swap.i32.p0i32(i32* <ptr>, i32 <val>)
  declare i64 @llvm.atomic.swap.i64.p0i64(i64* <ptr>, i64 <val>)
Overview:

This intrinsic loads the value stored in memory at ptr and yields the value from memory. It then stores the value in val in the memory at ptr.

Arguments:

The llvm.atomic.swap intrinsic takes two arguments. Both the val argument and the result must be integers of the same bit width. The first argument, ptr, must be a pointer to a value of this integer type. The targets may only lower integer representations they support.

Semantics:

This intrinsic loads the value pointed to by ptr, yields it, and stores val back into ptr atomically. This provides the equivalent of an atomic swap operation within the SSA framework.

Examples:

%mallocP  = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
%ptr      = bitcast i8* %mallocP to i32*
            store i32 4, i32* %ptr

%val1     = add i32 4, 4
%result1  = call i32 @llvm.atomic.swap.i32.p0i32(i32* %ptr, i32 %val1)
                                        ; yields {i32}:result1 = 4
%stored1  = icmp eq i32 %result1, 4     ; yields {i1}:stored1 = true
%memval1  = load i32* %ptr              ; yields {i32}:memval1 = 8

%val2     = add i32 1, 1
%result2  = call i32 @llvm.atomic.swap.i32.p0i32(i32* %ptr, i32 %val2)
                                        ; yields {i32}:result2 = 8

%stored2  = icmp eq i32 %result2, 8     ; yields {i1}:stored2 = true
%memval2  = load i32* %ptr              ; yields {i32}:memval2 = 2

'llvm.atomic.load.add.*' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.atomic.load.add on any integer bit width. Not all targets support all bit widths however.

  declare i8 @llvm.atomic.load.add.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.add.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.add.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.add.i64.p0i64(i64* <ptr>, i64 <delta>)
Overview:

This intrinsic adds delta to the value stored in memory at ptr. It yields the original value at ptr.

Arguments:

The intrinsic takes two arguments, the first a pointer to an integer value and the second an integer value. The result is also an integer value. These integer types can have any bit width, but they must all have the same bit width. The targets may only lower integer representations they support.

Semantics:

This intrinsic does a series of operations atomically. It first loads the value stored at ptr, then adds delta and stores the result to ptr. It yields the original value stored at ptr.

Examples:

%mallocP  = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
%ptr      = bitcast i8* %mallocP to i32*
            store i32 4, i32* %ptr
%result1  = call i32 @llvm.atomic.load.add.i32.p0i32(i32* %ptr, i32 4)
                                ; yields {i32}:result1 = 4
%result2  = call i32 @llvm.atomic.load.add.i32.p0i32(i32* %ptr, i32 2)
                                ; yields {i32}:result2 = 8
%result3  = call i32 @llvm.atomic.load.add.i32.p0i32(i32* %ptr, i32 5)
                                ; yields {i32}:result3 = 10
%memval1  = load i32* %ptr      ; yields {i32}:memval1 = 15

'llvm.atomic.load.sub.*' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use llvm.atomic.load.sub on any integer bit width and for different address spaces. Not all targets support all bit widths however.

  declare i8 @llvm.atomic.load.sub.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.sub.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.sub.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.sub.i64.p0i64(i64* <ptr>, i64 <delta>)
Overview:

This intrinsic subtracts delta from the value stored in memory at ptr. It yields the original value at ptr.

Arguments:

The intrinsic takes two arguments, the first a pointer to an integer value and the second an integer value. The result is also an integer value. These integer types can have any bit width, but they must all have the same bit width. The targets may only lower integer representations they support.

Semantics:

This intrinsic does a series of operations atomically. It first loads the value stored at ptr, then subtracts delta and stores the result to ptr. It yields the original value stored at ptr.

Examples:

%mallocP  = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
%ptr      = bitcast i8* %mallocP to i32*
            store i32 8, i32* %ptr
%result1  = call i32 @llvm.atomic.load.sub.i32.p0i32(i32* %ptr, i32 4)
                                ; yields {i32}:result1 = 8
%result2  = call i32 @llvm.atomic.load.sub.i32.p0i32(i32* %ptr, i32 2)
                                ; yields {i32}:result2 = 4
%result3  = call i32 @llvm.atomic.load.sub.i32.p0i32(i32* %ptr, i32 5)
                                ; yields {i32}:result3 = 2
%memval1  = load i32* %ptr      ; yields {i32}:memval1 = -3

'llvm.atomic.load.and.*' Intrinsic
'llvm.atomic.load.nand.*' Intrinsic
'llvm.atomic.load.or.*' Intrinsic
'llvm.atomic.load.xor.*' Intrinsic

Syntax:

These are overloaded intrinsics. You can use llvm.atomic.load.and, llvm.atomic.load.nand, llvm.atomic.load.or, and llvm.atomic.load.xor on any integer bit width and for different address spaces. Not all targets support all bit widths however.

  declare i8 @llvm.atomic.load.and.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.and.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.and.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.and.i64.p0i64(i64* <ptr>, i64 <delta>)

  declare i8 @llvm.atomic.load.or.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.or.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.or.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.or.i64.p0i64(i64* <ptr>, i64 <delta>)

  declare i8 @llvm.atomic.load.nand.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.nand.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.nand.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.nand.i64.p0i64(i64* <ptr>, i64 <delta>)

  declare i8 @llvm.atomic.load.xor.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.xor.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.xor.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.xor.i64.p0i64(i64* <ptr>, i64 <delta>)
Overview:

These intrinsics apply a bitwise operation (and, nand, or, xor) of delta to the value stored in memory at ptr. They yield the original value at ptr.

Arguments:

These intrinsics take two arguments, the first a pointer to an integer value and the second an integer value. The result is also an integer value. These integer types can have any bit width, but they must all have the same bit width. The targets may only lower integer representations they support.

Semantics:

These intrinsics perform a series of operations atomically. They first load the value stored at ptr, then apply the bitwise operation with delta and store the result to ptr. They yield the original value stored at ptr.

Examples:

%mallocP  = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
%ptr      = bitcast i8* %mallocP to i32*
            store i32 0x0F0F, i32* %ptr
%result0  = call i32 @llvm.atomic.load.nand.i32.p0i32(i32* %ptr, i32 0xFF)
                                ; yields {i32}:result0 = 0x0F0F
%result1  = call i32 @llvm.atomic.load.and.i32.p0i32(i32* %ptr, i32 0xFF)
                                ; yields {i32}:result1 = 0xFFFFFFF0
%result2  = call i32 @llvm.atomic.load.or.i32.p0i32(i32* %ptr, i32 0x0F)
                                ; yields {i32}:result2 = 0xF0
%result3  = call i32 @llvm.atomic.load.xor.i32.p0i32(i32* %ptr, i32 0x0F)
                                ; yields {i32}:result3 = 0xFF
%memval1  = load i32* %ptr      ; yields {i32}:memval1 = 0xF0

'llvm.atomic.load.max.*' Intrinsic
'llvm.atomic.load.min.*' Intrinsic
'llvm.atomic.load.umax.*' Intrinsic
'llvm.atomic.load.umin.*' Intrinsic

Syntax:

These are overloaded intrinsics. You can use llvm.atomic.load.max, llvm.atomic.load.min, llvm.atomic.load.umax, and llvm.atomic.load.umin on any integer bit width and for different address spaces. Not all targets support all bit widths however.

  declare i8 @llvm.atomic.load.max.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.max.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.max.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.max.i64.p0i64(i64* <ptr>, i64 <delta>)

  declare i8 @llvm.atomic.load.min.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.min.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.min.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.min.i64.p0i64(i64* <ptr>, i64 <delta>)

  declare i8 @llvm.atomic.load.umax.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.umax.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.umax.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.umax.i64.p0i64(i64* <ptr>, i64 <delta>)

  declare i8 @llvm.atomic.load.umin.i8.p0i8(i8* <ptr>, i8 <delta>)
  declare i16 @llvm.atomic.load.umin.i16.p0i16(i16* <ptr>, i16 <delta>)
  declare i32 @llvm.atomic.load.umin.i32.p0i32(i32* <ptr>, i32 <delta>)
  declare i64 @llvm.atomic.load.umin.i64.p0i64(i64* <ptr>, i64 <delta>)
Overview:

These intrinsics take the signed or unsigned minimum or maximum of delta and the value stored in memory at ptr. They yield the original value at ptr.

Arguments:

These intrinsics take two arguments, the first a pointer to an integer value and the second an integer value. The result is also an integer value. These integer types can have any bit width, but they must all have the same bit width. The targets may only lower integer representations they support.

Semantics:

These intrinsics perform a series of operations atomically. They first load the value stored at ptr, then compute the signed or unsigned min or max of delta and that value and store the result to ptr. They yield the original value stored at ptr.

Examples:

%mallocP  = tail call i8* @malloc(i32 ptrtoint (i32* getelementptr (i32* null, i32 1) to i32))
%ptr      = bitcast i8* %mallocP to i32*
            store i32 7, i32* %ptr
%result0  = call i32 @llvm.atomic.load.min.i32.p0i32(i32* %ptr, i32 -2)
                                ; yields {i32}:result0 = 7
%result1  = call i32 @llvm.atomic.load.max.i32.p0i32(i32* %ptr, i32 8)
                                ; yields {i32}:result1 = -2
%result2  = call i32 @llvm.atomic.load.umin.i32.p0i32(i32* %ptr, i32 10)
                                ; yields {i32}:result2 = 8
%result3  = call i32 @llvm.atomic.load.umax.i32.p0i32(i32* %ptr, i32 30)
                                ; yields {i32}:result3 = 8
%memval1  = load i32* %ptr      ; yields {i32}:memval1 = 30

Memory Use Markers

This class of intrinsics exists to provide information about the lifetime of memory objects and ranges where variables are immutable.

'llvm.lifetime.start' Intrinsic

Syntax:
   declare void @llvm.lifetime.start(i64 <size>, i8* nocapture <ptr>)
 
Overview:

The 'llvm.lifetime.start' intrinsic specifies the start of a memory object's lifetime.

Arguments:

The first argument is a constant integer representing the size of the object, or -1 if it is variable sized. The second argument is a pointer to the object.

Semantics:

This intrinsic indicates that before this point in the code, the value of the memory pointed to by ptr is dead. This means that it is known to never be used and has an undefined value. A load from the pointer that precedes this intrinsic can be replaced with 'undef'.

'llvm.lifetime.end' Intrinsic

Syntax:
   declare void @llvm.lifetime.end(i64 <size>, i8* nocapture <ptr>)
 
Overview:

The 'llvm.lifetime.end' intrinsic specifies the end of a memory object's lifetime.

Arguments:

The first argument is a constant integer representing the size of the object, or -1 if it is variable sized. The second argument is a pointer to the object.

Semantics:

This intrinsic indicates that after this point in the code, the value of the memory pointed to by ptr is dead. This means that it is known to never be used and has an undefined value. Any stores into the memory object following this intrinsic may be removed as dead.
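
A minimal sketch (not from the original text) of how a frontend might bracket the life of a stack object with these markers; the 8-byte buffer is hypothetical:

   %buf = alloca [8 x i8]
   %p   = getelementptr [8 x i8]* %buf, i32 0, i32 0
   call void @llvm.lifetime.start(i64 8, i8* %p)   ; %buf is dead before this point
   store i8 1, i8* %p                              ; uses of %buf are only meaningful here
   call void @llvm.lifetime.end(i64 8, i8* %p)     ; stores after this point may be deleted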

'llvm.invariant.start' Intrinsic

Syntax:
   declare {}* @llvm.invariant.start(i64 <size>, i8* nocapture <ptr>)
 
Overview:

The 'llvm.invariant.start' intrinsic specifies that the contents of a memory object will not change.

Arguments:

The first argument is a constant integer representing the size of the object, or -1 if it is variable sized. The second argument is a pointer to the object.

Semantics:

This intrinsic indicates that until an llvm.invariant.end that uses the return value, the referenced memory location is constant and unchanging.

'llvm.invariant.end' Intrinsic

Syntax:
   declare void @llvm.invariant.end({}* <start>, i64 <size>, i8* nocapture <ptr>)
 
Overview:

The 'llvm.invariant.end' intrinsic specifies that the contents of a memory object are mutable.

Arguments:

The first argument is the matching llvm.invariant.start intrinsic. The second argument is a constant integer representing the size of the object, or -1 if it is variable sized, and the third argument is a pointer to the object.

Semantics:

This intrinsic indicates that the memory is mutable again.
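
For illustration only, assuming a hypothetical i32 object %obj, the two markers might be paired like this:

   store i32 42, i32* %obj
   %objp = bitcast i32* %obj to i8*
   %inv  = call {}* @llvm.invariant.start(i64 4, i8* %objp)   ; contents of %obj are now constant
   %a    = load i32* %obj                                     ; known to load 42
   call void @llvm.invariant.end({}* %inv, i64 4, i8* %objp)  ; %obj may change again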

General Intrinsics

This class of intrinsics is designed to be generic and has no specific purpose.

'llvm.var.annotation' Intrinsic

Syntax:
   declare void @llvm.var.annotation(i8* <val>, i8* <str>, i8* <str>, i32  <int>)
 
Overview:

The 'llvm.var.annotation' intrinsic annotates a local variable with a string.

Arguments:

The first argument is a pointer to a value, the second is a pointer to a global string, the third is a pointer to a global string which is the source file name, and the last argument is the line number.

Semantics:

This intrinsic allows annotation of local variables with arbitrary strings. This can be useful for special purpose optimizations that want to look for these annotations. These have no other defined use; they are ignored by code generation and optimization.
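
A hypothetical example (the globals @.ann and @.file, the local %x, and line number 42 are all made up) of annotating a local variable:

   @.ann  = private constant [14 x i8] c"my annotation\00"
   @.file = private constant [7 x i8] c"test.c\00"

   %x  = alloca i32
   %xp = bitcast i32* %x to i8*
   call void @llvm.var.annotation(i8* %xp,
              i8* getelementptr ([14 x i8]* @.ann, i32 0, i32 0),
              i8* getelementptr ([7 x i8]* @.file, i32 0, i32 0), i32 42)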

'llvm.annotation.*' Intrinsic

Syntax:

This is an overloaded intrinsic. You can use 'llvm.annotation' on any integer bit width.

   declare i8 @llvm.annotation.i8(i8 <val>, i8* <str>, i8* <str>, i32  <int>)
   declare i16 @llvm.annotation.i16(i16 <val>, i8* <str>, i8* <str>, i32  <int>)
   declare i32 @llvm.annotation.i32(i32 <val>, i8* <str>, i8* <str>, i32  <int>)
   declare i64 @llvm.annotation.i64(i64 <val>, i8* <str>, i8* <str>, i32  <int>)
   declare i256 @llvm.annotation.i256(i256 <val>, i8* <str>, i8* <str>, i32  <int>)
 
Overview:

The 'llvm.annotation' intrinsic annotates an arbitrary integer expression with a string and returns the annotated value.

Arguments:

The first argument is an integer value (result of some expression), the second is a pointer to a global string, the third is a pointer to a global string which is the source file name, and the last argument is the line number. It returns the value of the first argument.

Semantics:

This intrinsic allows annotations to be put on arbitrary expressions with arbitrary strings. This can be useful for special purpose optimizations that want to look for these annotations. These have no other defined use; they are ignored by code generation and optimization.
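
As a sketch, reusing the hypothetical @.ann and @.file globals from the previous example, an expression can be annotated like this:

   %sum     = add i32 %a, %b
   %sum.ann = call i32 @llvm.annotation.i32(i32 %sum,
                  i8* getelementptr ([14 x i8]* @.ann, i32 0, i32 0),
                  i8* getelementptr ([7 x i8]* @.file, i32 0, i32 0), i32 17)
   ; %sum.ann holds the same value as %sum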

'llvm.trap' Intrinsic

Syntax:
   declare void @llvm.trap()
 
Overview:

The 'llvm.trap' intrinsic causes the program to trap.

Arguments:

None.

Semantics:

This intrinsic is lowered to the target dependent trap instruction. If the target does not have a trap instruction, this intrinsic will be lowered to a call of the abort() function.
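
For example (a sketch only, with a hypothetical %overflow flag), a front end might guard an unrecoverable condition like this:

     br i1 %overflow, label %abort, label %cont
   abort:
     call void @llvm.trap()
     unreachable
   cont:
     ret void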

'llvm.stackprotector' Intrinsic

Syntax:
   declare void @llvm.stackprotector(i8* <guard>, i8** <slot>)
 
Overview:

The llvm.stackprotector intrinsic takes the guard and stores it onto the stack at slot. The stack slot is adjusted to ensure that it is placed on the stack before local variables.

Arguments:

The llvm.stackprotector intrinsic requires two pointer arguments. The first argument is the value loaded from the stack guard @__stack_chk_guard. The second argument is an alloca that has enough space to hold the value of the guard.

Semantics:

This intrinsic causes the prologue/epilogue inserter to force the position of the AllocaInst stack slot to be before local variables on the stack. This is to ensure that if a local variable on the stack is overwritten, it will destroy the value of the guard. When the function exits, the guard on the stack is checked against the original guard. If they are different, then the program aborts by calling the __stack_chk_fail() function.
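
This intrinsic is normally emitted by the compiler itself rather than written by hand. Purely as an illustrative sketch, the inserted code is conceptually similar to:

   @__stack_chk_guard = external global i8*

   %guard.slot = alloca i8*                         ; forced to sit below the local variables
   %guard      = load i8** @__stack_chk_guard
   call void @llvm.stackprotector(i8* %guard, i8** %guard.slot)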

'llvm.objectsize' Intrinsic

Syntax:
   declare i32 @llvm.objectsize.i32(i8* <object>, i1 <type>)
   declare i64 @llvm.objectsize.i64(i8* <object>, i1 <type>)
 
Overview:

The llvm.objectsize intrinsic is designed to provide information to the optimizers to determine at compile time whether a) an operation (like memcpy) will overflow a buffer that corresponds to an object, or b) that a runtime check for overflow isn't necessary. An object in this context means an allocation of a specific class, structure, array, or other object.

Arguments:

The llvm.objectsize intrinsic takes two arguments. The first argument is a pointer to or into the object. The second argument is a boolean 0 or 1. This argument determines whether you want the maximum (0) or minimum (1) bytes remaining. This needs to be a literal 0 or 1, variables are not allowed.

Semantics:

The llvm.objectsize intrinsic is lowered to either a constant representing the size of the object concerned, or i32/i64 -1 or 0, depending on the type argument, if the size cannot be determined at compile time.
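
An illustrative sketch (the 32-byte buffer and the offset of 4 are hypothetical):

   %buf = alloca [32 x i8]
   %p   = getelementptr [32 x i8]* %buf, i32 0, i32 4
   %max = call i32 @llvm.objectsize.i32(i8* %p, i1 false)   ; may fold to 28 (maximum bytes remaining)
   %min = call i32 @llvm.objectsize.i32(i8* %p, i1 true)    ; minimum bytes remaining, here also 28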


Chris Lattner
The LLVM Compiler Infrastructure
Last modified: $Date: 2011-11-03 07:43:54 +0100 (Thu, 03 Nov 2011) $
Index: vendor/llvm/dist/docs/LinkTimeOptimization.html =================================================================== --- vendor/llvm/dist/docs/LinkTimeOptimization.html (revision 228363) +++ vendor/llvm/dist/docs/LinkTimeOptimization.html (revision 228364) @@ -1,400 +1,401 @@ + LLVM Link Time Optimization: Design and Implementation

LLVM Link Time Optimization: Design and Implementation

Written by Devang Patel and Nick Kledzik

Description

LLVM features powerful intermodular optimizations which can be used at link time. Link Time Optimization (LTO) is another name for intermodular optimization when performed during the link stage. This document describes the interface and design between the LTO optimizer and the linker.

Design Philosophy

The LLVM Link Time Optimizer provides complete transparency, while doing intermodular optimization, in the compiler tool chain. Its main goal is to let the developer take advantage of intermodular optimizations without making any significant changes to the developer's makefiles or build system. This is achieved through tight integration with the linker. In this model, the linker treats LLVM bitcode files like native object files and allows mixing and matching among them. The linker uses libLTO, a shared object, to handle LLVM bitcode files. This tight integration between the linker and LLVM optimizer helps to do optimizations that are not possible in other models. The linker input allows the optimizer to avoid relying on conservative escape analysis.

Example of link time optimization

The following example illustrates the advantages of LTO's integrated approach and clean interface. This example requires a system linker which supports LTO through the interface described in this document. Here, clang transparently invokes the system linker.

 --- a.h ---
 extern int foo1(void);
 extern void foo2(void);
 extern void foo4(void);
 
 --- a.c ---
 #include "a.h"
 
 static signed int i = 0;
 
 void foo2(void) {
   i = -1;
 }
 
 static int foo3() {
   foo4();
   return 10;
 }
 
 int foo1(void) {
   int data = 0;
 
   if (i < 0) 
     data = foo3();
 
   data = data + 42;
   return data;
 }
 
 --- main.c ---
 #include <stdio.h>
 #include "a.h"
 
 void foo4(void) {
   printf("Hi\n");
 }
 
 int main() {
   return foo1();
 }
 
 --- command lines ---
 $ clang -emit-llvm -c a.c -o a.o   # <-- a.o is LLVM bitcode file
 $ clang -c main.c -o main.o        # <-- main.o is native object file
 $ clang a.o main.o -o main         # <-- standard link command without any modifications
 

This example illustrates the advantage of tight integration with the linker. Here, the optimizer can not remove foo3() without the linker's input.

Alternative Approaches

Compiler driver invokes link time optimizer separately.
In this model the link time optimizer is not able to take advantage of information collected during the linker's normal symbol resolution phase. In the above example, the optimizer can not remove foo2() without the linker's input because it is externally visible. This in turn prohibits the optimizer from removing foo3().
Use separate tool to collect symbol information from all object files.
In this model, a new, separate, tool or library replicates the linker's capability to collect information for link time optimization. Not only is this code duplication difficult to justify, but it also has several other disadvantages. For example, the linking semantics and the features provided by the linker on various platforms are not unique. This means that this new tool needs to support all such features and platforms in one super tool, or a separate tool per platform is required. This increases the maintenance cost for the link time optimizer significantly, which is not necessary. This approach also requires staying synchronized with linker developments on various platforms, which is not the main focus of the link time optimizer. Finally, this approach increases the end user's build time due to the duplication of work done by this separate tool and the linker itself.

Multi-phase communication between libLTO and linker

The linker collects information about symbol definitions and uses in various link objects which is more accurate than any information collected by other tools during typical build cycles. The linker collects this information by looking at the definitions and uses of symbols in native .o files and using symbol visibility information. The linker also uses user-supplied information, such as a list of exported symbols. The LLVM optimizer collects control flow information, data flow information and knows much more about program structure from the optimizer's point of view. Our goal is to take advantage of tight integration between the linker and the optimizer by sharing this information during various linking phases.

Phase 1 : Read LLVM Bitcode Files

The linker first reads all object files in natural order and collects symbol information. This includes native object files as well as LLVM bitcode files. To minimize the cost to the linker in the case that all .o files are native object files, the linker only calls lto_module_create() when a supplied object file is found to not be a native object file. If lto_module_create() returns that the file is an LLVM bitcode file, the linker then iterates over the module using lto_module_get_symbol_name() and lto_module_get_symbol_attribute() to get all symbols defined and referenced. This information is added to the linker's global symbol table.

The lto* functions are all implemented in a shared object libLTO. This allows the LLVM LTO code to be updated independently of the linker tool. On platforms that support it, the shared object is lazily loaded.

Phase 2 : Symbol Resolution

In this stage, the linker resolves symbols using the global symbol table. It may report undefined symbol errors, read archive members, replace weak symbols, etc. The linker is able to do this seamlessly even though it does not know the exact content of input LLVM bitcode files. If dead code stripping is enabled then the linker collects the list of live symbols.

Phase 3 : Optimize Bitcode Files

After symbol resolution, the linker tells the LTO shared object which symbols are needed by native object files. In the example above, the linker reports that only foo1() is used by native object files using lto_codegen_add_must_preserve_symbol(). Next the linker invokes the LLVM optimizer and code generators using lto_codegen_compile() which returns a native object file created by merging the LLVM bitcode files and applying various optimization passes.

Phase 4 : Symbol Resolution after optimization

In this phase, the linker reads the optimized native object file and updates the internal global symbol table to reflect any changes. The linker also collects information about any changes in use of external symbols by LLVM bitcode files. In the example above, the linker notes that foo4() is not used any more. If dead code stripping is enabled then the linker refreshes the live symbol information appropriately and performs dead code stripping.

After this phase, the linker continues linking as if it never saw LLVM bitcode files.

libLTO

libLTO is a shared object that is part of the LLVM tools, and is intended for use by a linker. libLTO provides an abstract C interface to use the LLVM interprocedural optimizer without exposing details of LLVM's internals. The intention is to keep the interface as stable as possible even when the LLVM optimizer continues to evolve. It should even be possible for a completely different compilation technology to provide a different libLTO that works with their object files and the standard linker tool.

lto_module_t

A non-native object file is handled via an lto_module_t. The following functions allow the linker to check if a file (on disk or in a memory buffer) is a file which libLTO can process:

 lto_module_is_object_file(const char*)
 lto_module_is_object_file_for_target(const char*, const char*)
 lto_module_is_object_file_in_memory(const void*, size_t)
 lto_module_is_object_file_in_memory_for_target(const void*, size_t, const char*)
 

If the object file can be processed by libLTO, the linker creates a lto_module_t by using one of

 lto_module_create(const char*)
 lto_module_create_from_memory(const void*, size_t)
 

and when done, the handle is released via

 lto_module_dispose(lto_module_t)
 

The linker can introspect the non-native object file by getting the number of symbols and getting the name and attributes of each symbol via:

 lto_module_get_num_symbols(lto_module_t)
 lto_module_get_symbol_name(lto_module_t, unsigned int)
 lto_module_get_symbol_attribute(lto_module_t, unsigned int)
 

The attributes of a symbol include the alignment, visibility, and kind.

lto_code_gen_t

Once the linker has loaded each non-native object file into an lto_module_t, it can request libLTO to process them all and generate a native object file. This is done in a couple of steps. First, a code generator is created with:

lto_codegen_create()

Then, each non-native object file is added to the code generator with:

 lto_codegen_add_module(lto_code_gen_t, lto_module_t)
 

The linker then has the option of setting some codegen options. Whether or not to generate DWARF debug info is set with:

lto_codegen_set_debug_model(lto_code_gen_t)

Which kind of position independence is set with:

lto_codegen_set_pic_model(lto_code_gen_t) 

And each symbol that is referenced by a native object file or otherwise must not be optimized away is set with:

 lto_codegen_add_must_preserve_symbol(lto_code_gen_t, const char*)
 

After all these settings are done, the linker requests that a native object file be created from the modules with the settings using:

lto_codegen_compile(lto_code_gen_t, size_t*)

which returns a pointer to a buffer containing the generated native object file. The linker then parses that and links it with the rest of the native object files.


Devang Patel and Nick Kledzik
LLVM Compiler Infrastructure
Last modified: $Date: 2011-11-03 07:43:23 +0100 (Thu, 03 Nov 2011) $
Index: vendor/llvm/dist/docs/Packaging.html =================================================================== --- vendor/llvm/dist/docs/Packaging.html (revision 228363) +++ vendor/llvm/dist/docs/Packaging.html (revision 228364) @@ -1,118 +1,119 @@ + Advice on Packaging LLVM

Advice on Packaging LLVM

  1. Overview
  2. Compile Flags
  3. C++ Features
  4. Shared Library
  5. Dependencies

Overview

LLVM sets certain default configure options to make sure our developers don't break things for constrained platforms. These settings are not optimal for most desktop systems, and we hope that packagers (e.g., Redhat, Debian, MacPorts, etc.) will tweak them. This document lists settings we suggest you tweak.

LLVM's API changes with each release, so users are likely to want, for example, both LLVM-2.6 and LLVM-2.7 installed at the same time to support apps developed against each.

Compile Flags

LLVM runs much more quickly when it's optimized and assertions are removed. However, such a build is currently incompatible with users who build without defining NDEBUG, and the lack of assertions makes it hard to debug problems in user code. We recommend allowing users to install both optimized and debug versions of LLVM in parallel. The following configure flags are relevant:

--disable-assertions
Builds LLVM with NDEBUG defined. Changes the LLVM ABI. Also available by setting DISABLE_ASSERTIONS=0|1 in make's environment. This defaults to enabled regardless of the optimization setting, but it slows things down.
--enable-debug-symbols
Builds LLVM with -g. Also available by setting DEBUG_SYMBOLS=0|1 in make's environment. This defaults to disabled when optimizing, so you should turn it back on to let users debug their programs.
--enable-optimized
(For svn checkouts) Builds LLVM with -O2 and, by default, turns off debug symbols. Also available by setting ENABLE_OPTIMIZED=0|1 in make's environment. This defaults to enabled when not in a checkout.

C++ Features

RTTI
LLVM disables RTTI by default. Add REQUIRES_RTTI=1 to your environment while running make to re-enable it. This will allow users to build with RTTI enabled and still inherit from LLVM classes.

Shared Library

Configure with --enable-shared to build libLLVM-major.minor.(so|dylib) and link the tools against it. This saves lots of binary size at the cost of some startup time.

Dependencies

--enable-libffi
Depend on libffi to allow the LLVM interpreter to call external functions.
--with-oprofile
Depend on libopagent (>=version 0.9.4) to let the LLVM JIT tell oprofile about function addresses and line numbers.

The LLVM Compiler Infrastructure
Last modified: $Date: 2011-11-03 07:43:23 +0100 (Thu, 03 Nov 2011) $
Index: vendor/llvm/dist/docs/Passes.html =================================================================== --- vendor/llvm/dist/docs/Passes.html (revision 228363) +++ vendor/llvm/dist/docs/Passes.html (revision 228364) @@ -1,2048 +1,2046 @@ LLVM's Analysis and Transform Passes

LLVM's Analysis and Transform Passes

  1. Introduction
  2. Analysis Passes
  3. Transform Passes
  4. Utility Passes

Written by Reid Spencer and Gordon Henriksen

Introduction

This document serves as a high level summary of the optimization features that LLVM provides. Optimizations are implemented as Passes that traverse some portion of a program to either collect information or transform the program. The table below divides the passes that LLVM provides into three categories. Analysis passes compute information that other passes can use, or that is useful for debugging or program visualization. Transform passes can use (or invalidate) the analysis passes. Transform passes all mutate the program in some way. Utility passes provide some utility but don't otherwise fit categorization. For example, passes to extract functions to bitcode or write a module to bitcode are neither analysis nor transform passes.

The table below provides a quick summary of each pass and links to the more complete pass description later in the document.

ANALYSIS PASSES
OptionName
-aa-evalExhaustive Alias Analysis Precision Evaluator
-basicaaBasic Alias Analysis (stateless AA impl)
-basiccgBasic CallGraph Construction
-count-aaCount Alias Analysis Query Responses
-debug-aaAA use debugger
-domfrontierDominance Frontier Construction
-domtreeDominator Tree Construction
-dot-callgraphPrint Call Graph to 'dot' file
-dot-cfgPrint CFG of function to 'dot' file
-dot-cfg-onlyPrint CFG of function to 'dot' file (with no function bodies)
-dot-domPrint dominance tree of function to 'dot' file
-dot-dom-onlyPrint dominance tree of function to 'dot' file (with no function bodies)
-dot-postdomPrint postdominance tree of function to 'dot' file
-dot-postdom-onlyPrint postdominance tree of function to 'dot' file (with no function bodies)
-globalsmodref-aaSimple mod/ref analysis for globals
-instcountCounts the various types of Instructions
-intervalsInterval Partition Construction
-iv-usersInduction Variable Users
-lazy-value-infoLazy Value Information Analysis
-ldaLoop Dependence Analysis
-libcall-aaLibCall Alias Analysis
-lintStatically lint-checks LLVM IR
-loopsNatural Loop Information
-memdepMemory Dependence Analysis
-module-debuginfoDecodes module-level debug info
-no-aaNo Alias Analysis (always returns 'may' alias)
-no-profileNo Profile Information
-postdomfrontierPost-Dominance Frontier Construction
-postdomtreePost-Dominator Tree Construction
-print-alias-setsAlias Set Printer
-print-callgraphPrint a call graph
-print-callgraph-sccsPrint SCCs of the Call Graph
-print-cfg-sccsPrint SCCs of each function CFG
-print-dbginfoPrint debug info in human readable form
-print-dom-infoDominator Info Printer
-print-externalfnconstantsPrint external fn callsites passed constants
-print-functionPrint function to stderr
-print-modulePrint module to stderr
-print-used-typesFind Used Types
-profile-estimatorEstimate profiling information
-profile-loaderLoad profile information from llvmprof.out
-profile-verifierVerify profiling information
-regionsDetect single entry single exit regions
-scalar-evolutionScalar Evolution Analysis
-scev-aaScalarEvolution-based Alias Analysis
-targetdataTarget Data Layout
TRANSFORM PASSES
OptionName
-adceAggressive Dead Code Elimination
-always-inlineInliner for always_inline functions
-argpromotionPromote 'by reference' arguments to scalars
-block-placementProfile Guided Basic Block Placement
-break-crit-edgesBreak critical edges in CFG
-codegenprepareOptimize for code generation
-constmergeMerge Duplicate Global Constants
-constpropSimple constant propagation
-dceDead Code Elimination
-deadargelimDead Argument Elimination
-deadtypeelimDead Type Elimination
-dieDead Instruction Elimination
-dseDead Store Elimination
-functionattrsDeduce function attributes
-globaldceDead Global Elimination
-globaloptGlobal Variable Optimizer
-gvnGlobal Value Numbering
-indvarsCanonicalize Induction Variables
-inlineFunction Integration/Inlining
-insert-edge-profilingInsert instrumentation for edge profiling
-insert-optimal-edge-profilingInsert optimal instrumentation for edge profiling
-instcombineCombine redundant instructions
-internalizeInternalize Global Symbols
-ipconstpropInterprocedural constant propagation
-ipsccpInterprocedural Sparse Conditional Constant Propagation
-jump-threadingJump Threading
-lcssaLoop-Closed SSA Form Pass
-licmLoop Invariant Code Motion
-loop-deletionDelete dead loops
-loop-extractExtract loops into new functions
-loop-extract-singleExtract at most one loop into a new function
-loop-reduceLoop Strength Reduction
-loop-rotateRotate Loops
-loop-simplifyCanonicalize natural loops
-loop-unrollUnroll loops
-loop-unswitchUnswitch loops
-loweratomicLower atomic intrinsics to non-atomic form
-lowerinvokeLower invoke and unwind, for unwindless code generators
-lowerswitchLower SwitchInst's to branches
-mem2regPromote Memory to Register
-memcpyoptMemCpy Optimization
-mergefuncMerge Functions
-mergereturnUnify function exit nodes
-partial-inlinerPartial Inliner
-prune-ehRemove unused exception handling info
-reassociateReassociate expressions
-reg2memDemote all values to stack slots
-scalarreplScalar Replacement of Aggregates (DT)
-sccpSparse Conditional Constant Propagation
-simplify-libcallsSimplify well-known library calls
-simplifycfgSimplify the CFG
-sinkCode sinking
-sretpromotionPromote sret arguments to multiple ret values
-stripStrip all symbols from a module
-strip-dead-debug-infoStrip debug info for unused symbols
-strip-dead-prototypesStrip Unused Function Prototypes
-strip-debug-declareStrip all llvm.dbg.declare intrinsics
-strip-nondebugStrip all symbols, except dbg symbols, from a module
-tailcallelimTail Call Elimination
-tailduplicateTail Duplication
UTILITY PASSES
OptionName
-deadarghaX0rDead Argument Hacking (BUGPOINT USE ONLY; DO NOT USE)
-extract-blocksExtract Basic Blocks From Module (for bugpoint use)
-instnamerAssign names to anonymous instructions
-preverifyPreliminary module verification
-verifyModule Verifier
-view-cfgView CFG of function
-view-cfg-onlyView CFG of function (with no function bodies)
-view-domView dominance tree of function
-view-dom-onlyView dominance tree of function (with no function bodies)
-view-postdomView postdominance tree of function
-view-postdom-onlyView postdominance tree of function (with no function bodies)

Analysis Passes

This section describes the LLVM Analysis Passes.

-aa-eval: Exhaustive Alias Analysis Precision Evaluator

This is a simple N^2 alias analysis accuracy evaluator. Basically, for each function in the program, it simply queries to see how the alias analysis implementation answers alias queries between each pair of pointers in the function.

This is inspired and adapted from code by: Naveen Neelakantam, Francesco Spadini, and Wojciech Stryjewski.

-basicaa: Basic Alias Analysis (stateless AA impl)

A basic alias analysis pass that implements identities (two different globals cannot alias, etc), but does no stateful analysis.

-basiccg: Basic CallGraph Construction

Yet to be written.

-count-aa: Count Alias Analysis Query Responses

A pass which can be used to count how many alias queries are being made and how the alias analysis implementation being used responds.

-debug-aa: AA use debugger

This simple pass checks alias analysis users to ensure that if they create a new value, they do not query AA without informing it of the value. It acts as a shim over any other AA pass you want.

Yes keeping track of every value in the program is expensive, but this is a debugging pass.

-domfrontier: Dominance Frontier Construction

This pass is a simple dominator construction algorithm for finding forward dominator frontiers.

-domtree: Dominator Tree Construction

This pass is a simple dominator construction algorithm for finding forward dominators.

-dot-callgraph: Print Call Graph to 'dot' file

This pass, only available in opt, prints the call graph into a .dot graph. This graph can then be processed with the "dot" tool to convert it to postscript or some other suitable format.

-dot-cfg: Print CFG of function to 'dot' file

This pass, only available in opt, prints the control flow graph into a .dot graph. This graph can then be processed with the "dot" tool to convert it to postscript or some other suitable format.

-dot-cfg-only: Print CFG of function to 'dot' file (with no function bodies)

This pass, only available in opt, prints the control flow graph into a .dot graph, omitting the function bodies. This graph can then be processed with the "dot" tool to convert it to postscript or some other suitable format.

-dot-dom: Print dominance tree of function to 'dot' file

This pass, only available in opt, prints the dominator tree into a .dot graph. This graph can then be processed with the "dot" tool to convert it to postscript or some other suitable format.

-dot-dom-only: Print dominance tree of function to 'dot' file (with no function bodies)

This pass, only available in opt, prints the dominator tree into a .dot graph, omitting the function bodies. This graph can then be processed with the "dot" tool to convert it to postscript or some other suitable format.

-dot-postdom: Print postdominance tree of function to 'dot' file

This pass, only available in opt, prints the post dominator tree into a .dot graph. This graph can then be processed with the "dot" tool to convert it to postscript or some other suitable format.

-dot-postdom-only: Print postdominance tree of function to 'dot' file (with no function bodies)

This pass, only available in opt, prints the post dominator tree into a .dot graph, omitting the function bodies. This graph can then be processed with the "dot" tool to convert it to postscript or some other suitable format.

-globalsmodref-aa: Simple mod/ref analysis for globals

This simple pass provides alias and mod/ref information for global values that do not have their address taken, and keeps track of whether functions read or write memory (are "pure"). For this simple (but very common) case, we can provide pretty accurate and useful information.

-instcount: Counts the various types of Instructions

This pass collects the count of all instructions and reports them.

-intervals: Interval Partition Construction

This analysis calculates and represents the interval partition of a function, or a preexisting interval partition.

In this way, the interval partition may be used to reduce a flow graph down to its degenerate single node interval partition (unless it is irreducible).

-iv-users: Induction Variable Users

Bookkeeping for "interesting" users of expressions computed from induction variables.

-lazy-value-info: Lazy Value Information Analysis

Interface for lazy computation of value constraint information.

-lda: Loop Dependence Analysis

Loop dependence analysis framework, which is used to detect dependences in memory accesses in loops.

-libcall-aa: LibCall Alias Analysis

LibCall Alias Analysis.

-lint: Statically lint-checks LLVM IR

This pass statically checks for common and easily-identified constructs which produce undefined or likely unintended behavior in LLVM IR.

It is not a guarantee of correctness, in two ways. First, it isn't comprehensive. There are checks which could be done statically which are not yet implemented. Some of these are indicated by TODO comments, but those aren't comprehensive either. Second, many conditions cannot be checked statically. This pass does no dynamic instrumentation, so it can't check for all possible problems.

Another limitation is that it assumes all code will be executed. A store through a null pointer in a basic block which is never reached is harmless, but this pass will warn about it anyway.

Optimization passes may make conditions that this pass checks for more or less obvious. If an optimization pass appears to be introducing a warning, it may be that the optimization pass is merely exposing an existing condition in the code.

This code may be run before instcombine. In many cases, instcombine checks for the same kinds of things and turns instructions with undefined behavior into unreachable (or equivalent). Because of this, this pass makes some effort to look through bitcasts and so on.

-loops: Natural Loop Information

This analysis is used to identify natural loops and determine the loop depth of various nodes of the CFG. Note that the loops identified may actually be several natural loops that share the same header node... not just a single natural loop.

-memdep: Memory Dependence Analysis

An analysis that determines, for a given memory operation, what preceding memory operations it depends on. It builds on alias analysis information, and tries to provide a lazy, caching interface to a common kind of alias information query.

-module-debuginfo: Decodes module-level debug info

This pass decodes the debug info metadata in a module and prints in a (sufficiently-prepared-) human-readable form. For example, run this pass from opt along with the -analyze option, and it'll print to standard output.

-no-aa: No Alias Analysis (always returns 'may' alias)

This is the default implementation of the Alias Analysis interface. It always returns "I don't know" for alias queries. NoAA is unlike other alias analysis implementations, in that it does not chain to a previous analysis. As such it doesn't follow many of the rules that other alias analyses must.

-no-profile: No Profile Information

The default "no profile" implementation of the abstract ProfileInfo interface.

-postdomfrontier: Post-Dominance Frontier Construction

This pass is a simple post-dominator construction algorithm for finding post-dominator frontiers.

-postdomtree: Post-Dominator Tree Construction

This pass is a simple post-dominator construction algorithm for finding post-dominators.

-print-alias-sets: Alias Set Printer

Yet to be written.

-print-callgraph: Print a call graph

This pass, only available in opt, prints the call graph to standard error in a human-readable form.

-print-callgraph-sccs: Print SCCs of the Call Graph

This pass, only available in opt, prints the SCCs of the call graph to standard error in a human-readable form.

-print-cfg-sccs: Print SCCs of each function CFG

This pass, only available in opt, prints the SCCs of each function CFG to standard error in a human-readable form.

-print-dbginfo: Print debug info in human readable form

Pass that prints instructions, and associated debug info:

-print-dom-info: Dominator Info Printer

Dominator Info Printer.

-print-externalfnconstants: Print external fn callsites passed constants

This pass, only available in opt, prints out call sites to external functions that are called with constant arguments. This can be useful when looking for standard library functions we should constant fold or handle in alias analyses.

-print-function: Print function to stderr

The PrintFunctionPass class is designed to be pipelined with other FunctionPasses, and prints out the functions of the module as they are processed.

-print-module: Print module to stderr

This pass simply prints out the entire module when it is executed.

-print-used-types: Find Used Types

This pass is used to seek out all of the types in use by the program. Note that this analysis explicitly does not include types only used by the symbol table.

-profile-estimator: Estimate profiling information

Profiling information that estimates the profiling information in a very crude and unimaginative way.

-profile-loader: Load profile information from llvmprof.out

A concrete implementation of profiling information that loads the information from a profile dump file.

-profile-verifier: Verify profiling information

Pass that checks profiling information for plausibility.

-regions: Detect single entry single exit regions

The RegionInfo pass detects single entry single exit regions in a function, where a region is defined as any subgraph that is connected to the remaining graph at only two spots. Furthermore, a hierarchical region tree is built.

-scalar-evolution: Scalar Evolution Analysis

The ScalarEvolution analysis can be used to analyze and categorize scalar expressions in loops. It specializes in recognizing general induction variables, representing them with the abstract and opaque SCEV class. Given this analysis, trip counts of loops and other important properties can be obtained.

This analysis is primarily useful for induction variable substitution and strength reduction.

-scev-aa: ScalarEvolution-based Alias Analysis

Simple alias analysis implemented in terms of ScalarEvolution queries. This differs from traditional loop dependence analysis in that it tests for dependencies within a single iteration of a loop, rather than dependencies between different iterations. ScalarEvolution has a more complete understanding of pointer arithmetic than BasicAliasAnalysis' collection of ad-hoc analyses.

-targetdata: Target Data Layout

Provides other passes access to information about the size and alignment required by the target ABI for various data types.

Transform Passes

This section describes the LLVM Transform Passes.

-adce: Aggressive Dead Code Elimination

ADCE aggressively tries to eliminate code. This pass is similar to DCE but it assumes that values are dead until proven otherwise. This is similar to SCCP, except applied to the liveness of values.

-always-inline: Inliner for always_inline functions

A custom inliner that handles only functions that are marked as "always inline".

-argpromotion: Promote 'by reference' arguments to scalars

This pass promotes "by reference" arguments to be "by value" arguments. In practice, this means looking for internal functions that have pointer arguments. If it can prove, through the use of alias analysis, that an argument is *only* loaded, then it can pass the value into the function instead of the address of the value. This can cause recursive simplification of code and lead to the elimination of allocas (especially in C++ template code like the STL).

This pass also handles aggregate arguments that are passed into a function, scalarizing them if the elements of the aggregate are only loaded. Note that it refuses to scalarize aggregates which would require passing in more than three operands to the function, because passing thousands of operands for a large array or structure is unprofitable!

Note that this transformation could also be done for arguments that are only stored to (returning the value instead), but does not currently. This case would be best handled when and if LLVM starts supporting multiple return values from functions.

-block-placement: Profile Guided Basic Block Placement

This pass is a very simple profile guided basic block placement algorithm. The idea is to put frequently executed blocks together at the start of the function and hopefully increase the number of fall-through conditional branches. If there is no profile information for a particular function, this pass basically orders blocks in depth-first order.

-break-crit-edges: Break critical edges in CFG

Break all of the critical edges in the CFG by inserting a dummy basic block. It may be "required" by passes that cannot deal with critical edges. This transformation obviously invalidates the CFG, but can update forward dominator (set, immediate dominators, tree, and frontier) information.

-codegenprepare: Optimize for code generation

This pass munges the code in the input function to better prepare it for SelectionDAG-based code generation. This works around limitations in its basic-block-at-a-time approach. It should eventually be removed.

-constmerge: Merge Duplicate Global Constants

Merges duplicate global constants together into a single constant that is shared. This is useful because some passes (ie TraceValues) insert a lot of string constants into the program, regardless of whether or not an existing string is available.

-constprop: Simple constant propagation

This file implements constant propagation and merging. It looks for instructions involving only constant operands and replaces them with a constant value instead of an instruction. For example:

add i32 1, 2

becomes

i32 3

NOTE: this pass has a habit of making definitions be dead. It is a good idea to run a DIE (Dead Instruction Elimination) pass sometime after running this pass.

-dce: Dead Code Elimination

Dead code elimination is similar to dead instruction elimination, but it rechecks instructions that were used by removed instructions to see if they are newly dead.

-deadargelim: Dead Argument Elimination

This pass deletes dead arguments from internal functions. Dead argument elimination removes arguments which are directly dead, as well as arguments only passed into function calls as dead arguments of other functions. This pass also deletes dead return values in a similar way.

This pass is often useful as a cleanup pass to run after aggressive interprocedural passes, which add possibly-dead arguments.

-deadtypeelim: Dead Type Elimination

This pass is used to clean up the output of GCC. It eliminates names for types that are unused in the entire translation unit, using the find used types pass.

-die: Dead Instruction Elimination

Dead instruction elimination performs a single pass over the function, removing instructions that are obviously dead.

-dse: Dead Store Elimination

A trivial dead store elimination that only considers basic-block local redundant stores.

-functionattrs: Deduce function attributes

A simple interprocedural pass which walks the call-graph, looking for functions which do not access or only read non-local memory, and marking them readnone/readonly. In addition, it marks function arguments (of pointer type) 'nocapture' if a call to the function does not create any copies of the pointer value that outlive the call. This more or less means that the pointer is only dereferenced, and not returned from the function or stored in a global. This pass is implemented as a bottom-up traversal of the call-graph.

-globaldce: Dead Global Elimination

This transform is designed to eliminate unreachable internal globals from the program. It uses an aggressive algorithm, searching out globals that are known to be alive. After it finds all of the globals which are needed, it deletes whatever is left over. This allows it to delete recursive chunks of the program which are unreachable.

-globalopt: Global Variable Optimizer

This pass transforms simple global variables that never have their address taken. If obviously true, it marks read/write globals as constant, deletes variables only stored to, etc.

-gvn: Global Value Numbering

This pass performs global value numbering to eliminate fully and partially redundant instructions. It also performs redundant load elimination.

-indvars: Canonicalize Induction Variables

This transformation analyzes and transforms the induction variables (and computations derived from them) into simpler forms suitable for subsequent analysis and transformation.

This transformation makes the following changes to each loop with an identifiable induction variable:

  1. All loops are transformed to have a single canonical induction variable which starts at zero and steps by one.
  2. The canonical induction variable is guaranteed to be the first PHI node in the loop header block.
  3. Any pointer arithmetic recurrences are raised to use array subscripts.

If the trip count of a loop is computable, this pass also makes the following changes:

  1. The exit condition for the loop is canonicalized to compare the induction value against the exit value. This turns loops like:
    for (i = 7; i*i < 1000; ++i)
    into
    for (i = 0; i != 25; ++i)
  2. Any use outside of the loop of an expression derived from the indvar is changed to compute the derived value outside of the loop, eliminating the dependence on the exit value of the induction variable. If the only purpose of the loop is to compute the exit value of some derived expression, this transformation will make the loop dead.

This transformation should be followed by strength reduction after all of the desired loop transformations have been performed. Additionally, on targets where it is profitable, the loop could be transformed to count down to zero (the "do loop" optimization).

-inline: Function Integration/Inlining

Bottom-up inlining of functions into callers.

-insert-edge-profiling: Insert instrumentation for edge profiling

This pass instruments the specified program with counters for edge profiling. Edge profiling can give a reasonable approximation of the hot paths through a program, and is used for a wide variety of program transformations.

Note that this implementation is very naïve. It inserts a counter for every edge in the program, instead of using control flow information to prune the number of counters inserted.

-insert-optimal-edge-profiling: Insert optimal instrumentation for edge profiling

This pass instruments the specified program with counters for edge profiling. Edge profiling can give a reasonable approximation of the hot paths through a program, and is used for a wide variety of program transformations.

-instcombine: Combine redundant instructions

Combine instructions to form fewer, simple instructions. This pass does not modify the CFG. This pass is where algebraic simplification happens.

This pass combines things like:

 %Y = add i32 %X, 1
 %Z = add i32 %Y, 1

into:

%Z = add i32 %X, 2

This is a simple worklist driven algorithm.

This pass guarantees that the following canonicalizations are performed on the program:

-internalize: Internalize Global Symbols

This pass loops over all of the functions in the input module, looking for a main function. If a main function is found, all other functions and all global variables with initializers are marked as internal.

-ipconstprop: Interprocedural constant propagation

This pass implements an extremely simple interprocedural constant propagation pass. It could certainly be improved in many different ways, like using a worklist. This pass makes arguments dead, but does not remove them. The existing dead argument elimination pass should be run after this to clean up the mess.

-ipsccp: Interprocedural Sparse Conditional Constant Propagation

An interprocedural variant of Sparse Conditional Constant Propagation.

-jump-threading: Jump Threading

Jump threading tries to find distinct threads of control flow running through a basic block. This pass looks at blocks that have multiple predecessors and multiple successors. If one or more of the predecessors of the block can be proven to always cause a jump to one of the successors, we forward the edge from the predecessor to the successor by duplicating the contents of this block.

An example of when this can occur is code like this:

 if () { ...
   X = 4;
 }
 if (X < 3) {

In this case, the unconditional branch at the end of the first if can be revectored to the false side of the second if.

-lcssa: Loop-Closed SSA Form Pass

This pass transforms loops by placing phi nodes at the end of the loops for all values that are live across the loop boundary. For example, it turns the left into the right code:

for (...)                for (...)
   if (c)                   if (c)
     X1 = ...                 X1 = ...
   else                     else
     X2 = ...                 X2 = ...
   X3 = phi(X1, X2)         X3 = phi(X1, X2)
 ... = X3 + 4              X4 = phi(X3)
                           ... = X4 + 4

This is still valid LLVM; the extra phi nodes are purely redundant, and will be trivially eliminated by InstCombine. The major benefit of this transformation is that it makes many other loop optimizations, such as LoopUnswitching, simpler.

-licm: Loop Invariant Code Motion

This pass performs loop invariant code motion, attempting to remove as much code from the body of a loop as possible. It does this by either hoisting code into the preheader block, or by sinking code to the exit blocks if it is safe. This pass also promotes must-aliased memory locations in the loop to live in registers, thus hoisting and sinking "invariant" loads and stores.

This pass uses alias analysis for two purposes:

  1. Moving loop-invariant loads and calls out of loops. If it can be determined that a load or call inside a loop never aliases anything stored to within that loop, the instruction can be hoisted or sunk like any other loop-invariant instruction.
  2. Promoting must-aliased memory locations in the loop to live in registers, as described above. Alias analysis is needed to prove that no other instruction in the loop can read or write the promoted location.

-loop-deletion: Delete dead loops

This file implements the Dead Loop Deletion Pass. This pass is responsible for eliminating loops with non-infinite computable trip counts that have no side effects or volatile instructions, and do not contribute to the computation of the function's return value.

-loop-extract: Extract loops into new functions

A pass wrapper around the ExtractLoop() scalar transformation to extract each top-level loop into its own new function. If the loop is the only loop in a given function, it is not touched. This is a pass most useful for debugging via bugpoint.

-loop-extract-single: Extract at most one loop into a new function

Similar to Extract loops into new functions, this pass extracts one natural loop from the program into a function if it can. This is used by bugpoint.

-loop-reduce: Loop Strength Reduction

This pass performs a strength reduction on array references inside loops that have as one or more of their components the loop induction variable. This is accomplished by creating a new value to hold the initial value of the array access for the first iteration, and then creating a new GEP instruction in the loop to increment the value by the appropriate amount.

-loop-rotate: Rotate Loops

A simple loop rotation transformation.

-loop-simplify: Canonicalize natural loops

This pass performs several transformations to transform natural loops into a simpler form, which makes subsequent analyses and transformations simpler and more effective.

Loop pre-header insertion guarantees that there is a single, non-critical entry edge from outside of the loop to the loop header. This simplifies a number of analyses and transformations, such as LICM.

Loop exit-block insertion guarantees that all exit blocks from the loop (blocks which are outside of the loop that have predecessors inside of the loop) only have predecessors from inside of the loop (and are thus dominated by the loop header). This simplifies transformations such as store-sinking that are built into LICM.

This pass also guarantees that loops will have exactly one backedge.

Note that the simplifycfg pass will clean up blocks which are split out but end up being unnecessary, so usage of this pass should not pessimize generated code.

This pass obviously modifies the CFG, but updates loop information and dominator information.

-loop-unroll: Unroll loops

This pass implements a simple loop unroller. It works best when loops have been canonicalized by the -indvars pass, allowing it to determine the trip counts of loops easily.

-loop-unswitch: Unswitch loops

This pass transforms loops that contain branches on loop-invariant conditions to have multiple loops. For example, it turns the left into the right code:

for (...)                  if (lic)
   A                          for (...)
   if (lic)                     A; B; C
     B                      else
   C                          for (...)
                                A; C

This can increase the size of the code exponentially (doubling it every time a loop is unswitched) so we only unswitch if the resultant code will be smaller than a threshold.

This pass expects LICM to be run before it to hoist invariant conditions out of the loop, to make the unswitching opportunity obvious.

-loweratomic: Lower atomic intrinsics to non-atomic form

This pass lowers atomic intrinsics to non-atomic form for use in a known non-preemptible environment.

The pass does not verify that the environment is non-preemptible (in general this would require knowledge of the entire call graph of the program including any libraries which may not be available in bitcode form); it simply lowers every atomic intrinsic.

-lowerinvoke: Lower invoke and unwind, for unwindless code generators

This transformation is designed for use by code generators which do not yet support stack unwinding. This pass supports two models of exception handling lowering, the 'cheap' support and the 'expensive' support.

'Cheap' exception handling support gives the program the ability to execute any program which does not "throw an exception", by turning 'invoke' instructions into calls and by turning 'unwind' instructions into calls to abort(). If the program does dynamically use the unwind instruction, the program will print a message then abort.

'Expensive' exception handling support gives the full exception handling support to the program at the cost of making the 'invoke' instruction really expensive. It basically inserts setjmp/longjmp calls to emulate the exception handling as necessary.

Because the 'expensive' support slows down programs a lot, and EH is only used for a subset of the programs, it must be specifically enabled by the -enable-correct-eh-support option.

Note that after this pass runs the CFG is not entirely accurate (exceptional control flow edges are not correct anymore) so only very simple things should be done after the lowerinvoke pass has run (like generation of native code). This should not be used as a general purpose "my LLVM-to-LLVM pass doesn't support the invoke instruction yet" lowering pass.

-lowerswitch: Lower SwitchInst's to branches

Rewrites switch instructions with a sequence of branches, which allows targets to get away with not implementing the switch instruction until it is convenient.

-mem2reg: Promote Memory to Register

This file promotes memory references to be register references. It promotes alloca instructions which only have loads and stores as uses. An alloca is transformed by using dominator frontiers to place phi nodes, then traversing the function in depth-first order to rewrite loads and stores as appropriate. This is just the standard SSA construction algorithm to construct "pruned" SSA form.

-memcpyopt: MemCpy Optimization

This pass performs various transformations related to eliminating memcpy calls, or transforming sets of stores into memset's.

-mergefunc: Merge Functions

This pass looks for equivalent functions that are mergable and folds them. A hash is computed from the function, based on its type and number of basic blocks. Once all hashes are computed, we perform an expensive equality comparison on each function pair. This takes n^2/2 comparisons per bucket, so it's important that the hash function be high quality. The equality comparison iterates through each instruction in each basic block. When a match is found the functions are folded. If both functions are overridable, we move the functionality into a new internal function and leave two overridable thunks to it.

-mergereturn: Unify function exit nodes

Ensure that functions have at most one ret instruction in them. Additionally, it keeps track of which node is the new exit node of the CFG.

-partial-inliner: Partial Inliner

This pass performs partial inlining, typically by inlining an if statement that surrounds the body of the function.

-prune-eh: Remove unused exception handling info

This file implements a simple interprocedural pass which walks the call-graph, turning invoke instructions into call instructions if and only if the callee cannot throw an exception. It implements this as a bottom-up traversal of the call-graph.

-reassociate: Reassociate expressions

This pass reassociates commutative expressions in an order that is designed to promote better constant propagation, GCSE, LICM, PRE, etc.

For example: 4 + (x + 5) ⇒ x + (4 + 5)

In the implementation of this algorithm, constants are assigned rank = 0, function arguments are rank = 1, and other values are assigned ranks corresponding to the reverse post order traversal of current function (starting at 2), which effectively gives values in deep loops higher rank than values not in loops.

-reg2mem: Demote all values to stack slots

This file demotes all registers to memory references. It is intended to be the inverse of -mem2reg. By converting to load instructions, the only values live across basic blocks are alloca instructions and load instructions before phi nodes. It is intended that this should make CFG hacking much easier. To make later hacking easier, the entry block is split into two, such that all introduced alloca instructions (and nothing else) are in the entry block.

-scalarrepl: Scalar Replacement of Aggregates (DT)

The well-known scalar replacement of aggregates transformation. This transform breaks up alloca instructions of aggregate type (structure or array) into individual alloca instructions for each member if possible. Then, if possible, it transforms the individual alloca instructions into nice clean scalar SSA form.

This combines a simple scalar replacement of aggregates algorithm with the mem2reg algorithm because the two often interact, especially for C++ programs. As such, iterating between scalarrepl and mem2reg until we run out of things to promote works well.

-sccp: Sparse Conditional Constant Propagation

Sparse conditional constant propagation and merging, which can be summarized as:

  1. Assumes values are constant unless proven otherwise
  2. Assumes BasicBlocks are dead unless proven otherwise
  3. Proves values to be constant, and replaces them with constants
  4. Proves conditional branches to be unconditional

Note that this pass has a habit of making definitions be dead. It is a good idea to run a DCE pass sometime after running this pass.

-simplify-libcalls: Simplify well-known library calls

Applies a variety of small optimizations to calls to specific well-known functions (e.g. runtime library functions). For example, a call to exit(3) that occurs within the main() function can be transformed into simply return 3.
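
A source-level sketch of that particular rewrite (hypothetical code, shown only to illustrate the effect on main()):

 // Before -simplify-libcalls: main() ends with a library call.
 //
 //   int main() { exit(3); }
 //
 // After the transformation, the call is replaced with an equivalent
 // plain return:
 int main() {
   return 3;
 }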

-simplifycfg: Simplify the CFG

Performs dead code elimination and basic block merging. Specifically:

  1. Removes basic blocks with no predecessors.
  2. Merges a basic block into its predecessor if there is only one and the predecessor only has one successor.
  3. Eliminates PHI nodes for basic blocks with a single predecessor.
  4. Eliminates a basic block that only contains an unconditional branch.

-sink: Code sinking

This pass moves instructions into successor blocks, when possible, so that they aren't executed on paths where their results aren't needed.

-sretpromotion: Promote sret arguments to multiple ret values

This pass finds functions that return a struct (using a pointer to the struct as the first argument of the function, marked with the 'sret' attribute) and replaces them with a new function that simply returns each of the elements of that struct (using multiple return values).

This pass works under a number of conditions:

-strip: Strip all symbols from a module

Performs code stripping. This transformation can delete:

  1. names for virtual registers
  2. symbols for internal globals and functions
  3. debug information

Note that this transformation makes code much less readable, so it should only be used in situations where the strip utility would be used, such as reducing code size or making it harder to reverse engineer code.

-strip-dead-debug-info: Strip debug info for unused symbols

Performs code stripping. This transformation can delete:

  1. names for virtual registers
  2. symbols for internal globals and functions
  3. debug information

Note that this transformation makes code much less readable, so it should only be used in situations where the strip utility would be used, such as reducing code size or making it harder to reverse engineer code.

-strip-dead-prototypes: Strip Unused Function Prototypes

This pass loops over all of the functions in the input module, looking for dead declarations, and removes them. Dead declarations are declarations of functions for which no implementation is available (i.e., declarations for unused library functions).

-strip-debug-declare: Strip all llvm.dbg.declare intrinsics

This pass implements code stripping. Specifically, it can delete:

Note that this transformation makes code much less readable, so it should only be used in situations where the 'strip' utility would be used, such as reducing code size or making it harder to reverse engineer code.

-strip-nondebug: Strip all symbols, except dbg symbols, from a module

This pass implements code stripping. Specifically, it can delete:

Note that this transformation makes code much less readable, so it should only be used in situations where the 'strip' utility would be used, such as reducing code size or making it harder to reverse engineer code.

-tailcallelim: Tail Call Elimination

This file transforms calls of the current function (self recursion) followed by a return instruction with a branch to the entry of the function, creating a loop. This pass also implements the following extensions to the basic algorithm:

-tailduplicate: Tail Duplication

This pass performs a limited form of tail duplication, intended to simplify CFGs by removing some unconditional branches. This pass is necessary to straighten out loops created by the C front-end, but also is capable of making other code nicer. After this pass is run, the CFG simplify pass should be run to clean up the mess.

Utility Passes

This section describes the LLVM Utility Passes.

-deadarghaX0r: Dead Argument Hacking (BUGPOINT USE ONLY; DO NOT USE)

Same as dead argument elimination, but deletes arguments to functions which are external. This is only for use by bugpoint.

-extract-blocks: Extract Basic Blocks From Module (for bugpoint use)

This pass is used by bugpoint to extract all blocks from the module into their own functions.

-instnamer: Assign names to anonymous instructions

This is a little utility pass that gives instructions names; this is mostly useful when diffing the effect of an optimization, because deleting an unnamed instruction can change all other instruction numbering, making the diff very noisy.

-preverify: Preliminary module verification

Ensures that the module is in the form required by the Module Verifier pass.

Running the verifier runs this pass automatically, so there should be no need to use it directly.

-verify: Module Verifier

Verifies LLVM IR code. This is useful to run after an optimization which is undergoing testing. Note that llvm-as verifies its input before emitting bitcode, and also that malformed bitcode is likely to make LLVM crash. All language front-ends are therefore encouraged to verify their output before performing optimizing transformations.

Note that this does not provide full security verification (like Java), but instead just tries to ensure that code is well-formed.

-view-cfg: View CFG of function

Displays the control flow graph using the GraphViz tool.

-view-cfg-only: View CFG of function (with no function bodies)

Displays the control flow graph using the GraphViz tool, but omitting function bodies.

-view-dom: View dominance tree of function

Displays the dominator tree using the GraphViz tool.

-view-dom-only: View dominance tree of function (with no function bodies)

Displays the dominator tree using the GraphViz tool, but omitting function bodies.

-view-postdom: View postdominance tree of function

Displays the post dominator tree using the GraphViz tool.

-view-postdom-only: View postdominance tree of function (with no function bodies)

Displays the post dominator tree using the GraphViz tool, but omitting function bodies.


- Last modified: $Date: 2011-08-04 00:18:20 +0200 (Thu, 04 Aug 2011) $ + Last modified: $Date: 2011-11-04 07:30:50 +0100 (Fri, 04 Nov 2011) $
Index: vendor/llvm/dist/docs/ProgrammersManual.html =================================================================== --- vendor/llvm/dist/docs/ProgrammersManual.html (revision 228363) +++ vendor/llvm/dist/docs/ProgrammersManual.html (revision 228364) @@ -1,4062 +1,4059 @@ LLVM Programmer's Manual

LLVM Programmer's Manual

  1. Introduction
  2. General Information
  3. Important and useful LLVM APIs
  4. Picking the Right Data Structure for a Task
  5. Helpful Hints for Common Operations
  6. Threads and LLVM
  7. Advanced Topics
  8. The Core LLVM Class Hierarchy Reference

Written by Chris Lattner, Dinakar Dhurjati, Gabor Greif, Joel Stanley, Reid Spencer and Owen Anderson

Introduction

This document is meant to highlight some of the important classes and interfaces available in the LLVM source-base. This manual is not intended to explain what LLVM is, how it works, and what LLVM code looks like. It assumes that you know the basics of LLVM and are interested in writing transformations or otherwise analyzing or manipulating the code.

This document should get you oriented so that you can find your way in the continuously growing source code that makes up the LLVM infrastructure. Note that this manual is not intended to serve as a replacement for reading the source code, so if you think there should be a method in one of these classes to do something, but it's not listed, check the source. Links to the doxygen sources are provided to make this as easy as possible.

The first section of this document describes general information that is useful to know when working in the LLVM infrastructure, and the second describes the Core LLVM classes. In the future this manual will be extended with information describing how to use extension libraries, such as dominator information, CFG traversal routines, and useful utilities like the InstVisitor template.

General Information

This section contains general information that is useful if you are working in the LLVM source-base, but that isn't specific to any particular API.

The C++ Standard Template Library

LLVM makes heavy use of the C++ Standard Template Library (STL), perhaps much more than you are used to, or have seen before. Because of this, you might want to do a little background reading in the techniques used and capabilities of the library. There are many good pages that discuss the STL, and several books on the subject that you can get, so it will not be discussed in this document.

Here are some useful links:

  1. Dinkumware C++ Library reference - an excellent reference for the STL and other parts of the standard C++ library.
  2. C++ In a Nutshell - This is an O'Reilly book in the making. It has a decent Standard Library Reference that rivals Dinkumware's, and is unfortunately no longer free since the book has been published.
  3. C++ Frequently Asked Questions
  4. SGI's STL Programmer's Guide - Contains a useful Introduction to the STL.
  5. Bjarne Stroustrup's C++ Page
  6. Bruce Eckel's Thinking in C++, 2nd ed. Volume 2 Revision 4.0 (even better, get the book).

You are also encouraged to take a look at the LLVM Coding Standards guide which focuses on how to write maintainable code more than where to put your curly braces.

Other useful references

  1. Using static and shared libraries across platforms

Important and useful LLVM APIs

Here we highlight some LLVM APIs that are generally useful and good to know about when writing transformations.

The isa<>, cast<> and dyn_cast<> templates

The LLVM source-base makes extensive use of a custom form of RTTI. These templates have many similarities to the C++ dynamic_cast<> operator, but they don't suffer from some of its drawbacks (primarily stemming from the fact that dynamic_cast<> only works on classes that have a v-table). Because they are used so often, you must know what they do and how they work. All of these templates are defined in the llvm/Support/Casting.h file (note that you very rarely have to include this file directly).

isa<>:

The isa<> operator works exactly like the Java "instanceof" operator. It returns true or false depending on whether a reference or pointer points to an instance of the specified class. This can be very useful for constraint checking of various sorts (example below).

cast<>:

The cast<> operator is a "checked cast" operation. It converts a pointer or reference from a base class to a derived class, causing an assertion failure if it is not really an instance of the right type. This should be used in cases where you have some information that makes you believe that something is of the right type. An example of the isa<> and cast<> templates is:

 static bool isLoopInvariant(const Value *V, const Loop *L) {
   if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V))
     return true;
 
   // Otherwise, it must be an instruction...
   return !L->contains(cast<Instruction>(V)->getParent());
 }
 

Note that you should not use an isa<> test followed by a cast<>; for that, use the dyn_cast<> operator.

dyn_cast<>:

The dyn_cast<> operator is a "checking cast" operation. It checks to see if the operand is of the specified type, and if so, returns a pointer to it (this operator does not work with references). If the operand is not of the correct type, a null pointer is returned. Thus, this works very much like the dynamic_cast<> operator in C++, and should be used in the same circumstances. Typically, the dyn_cast<> operator is used in an if statement or some other flow control statement like this:

 if (AllocationInst *AI = dyn_cast<AllocationInst>(Val)) {
   // ...
 }
 

This form of the if statement effectively combines together a call to isa<> and a call to cast<> into one statement, which is very convenient.

Note that the dyn_cast<> operator, like C++'s dynamic_cast<> or Java's instanceof operator, can be abused. In particular, you should not use big chained if/then/else blocks to check for lots of different variants of classes. If you find yourself wanting to do this, it is much cleaner and more efficient to use the InstVisitor class to dispatch over the instruction type directly.

cast_or_null<>:

The cast_or_null<> operator works just like the cast<> operator, except that it allows for a null pointer as an argument (which it then propagates). This can sometimes be useful, allowing you to combine several null checks into one.

dyn_cast_or_null<>:

The dyn_cast_or_null<> operator works just like the dyn_cast<> operator, except that it allows for a null pointer as an argument (which it then propagates). This can sometimes be useful, allowing you to combine several null checks into one.

These five templates can be used with any classes, whether they have a v-table or not. To add support for these templates, you simply need to add classof static methods to the class you are interested in casting to. Describing this is currently outside the scope of this document, but there are lots of examples in the LLVM source base.
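
As a rough sketch of the idea (the Shape/Circle hierarchy below is invented for illustration; see the LLVM source base for the real conventions):

 #include "llvm/Support/Casting.h"
 
 class Shape {
 public:
   enum ShapeKind { CircleKind, SquareKind };
   Shape(ShapeKind K) : Kind(K) {}
   ShapeKind getKind() const { return Kind; }
 private:
   const ShapeKind Kind;
 };
 
 class Circle : public Shape {
 public:
   Circle() : Shape(CircleKind) {}
   // The classof methods are what isa<>, cast<> and dyn_cast<> consult.
   static bool classof(const Circle *) { return true; }
   static bool classof(const Shape *S) { return S->getKind() == CircleKind; }
 };
 
 bool isCircle(const Shape *S) { return llvm::isa<Circle>(S); }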

Passing strings (the StringRef and Twine classes)

Although LLVM generally does not do much string manipulation, we do have several important APIs which take strings. Two important examples are the Value class -- which has names for instructions, functions, etc. -- and the StringMap class which is used extensively in LLVM and Clang.

These are generic classes, and they need to be able to accept strings which may have embedded null characters. Therefore, they cannot simply take a const char *, and taking a const std::string& requires clients to perform a heap allocation which is usually unnecessary. Instead, many LLVM APIs use a StringRef or a const Twine& for passing strings efficiently.

The StringRef class

The StringRef data type represents a reference to a constant string (a character array and a length) and supports the common operations available on std::string, but does not require heap allocation.

It can be implicitly constructed using a C style null-terminated string, an std::string, or explicitly with a character pointer and length. For example, the StringRef find function is declared as:

   iterator find(StringRef Key);
 

and clients can call it using any one of:

   Map.find("foo");                 // Lookup "foo"
   Map.find(std::string("bar"));    // Lookup "bar"
   Map.find(StringRef("\0baz", 4)); // Lookup "\0baz"
 

Similarly, APIs which need to return a string may return a StringRef instance, which can be used directly or converted to an std::string using the str member function. See "llvm/ADT/StringRef.h" for more information.

You should rarely use the StringRef class directly; because it contains pointers to external memory, it is not generally safe to store an instance of the class (unless you know that the external storage will not be freed). StringRef is small and pervasive enough in LLVM that it should always be passed by value.

The Twine class

The Twine class is an efficient way for APIs to accept concatenated strings. For example, a common LLVM paradigm is to name one instruction based on the name of another instruction with a suffix, for example:

     New = CmpInst::Create(..., SO->getName() + ".cmp");
 

The Twine class is effectively a lightweight rope which points to temporary (stack allocated) objects. Twines can be implicitly constructed as the result of the plus operator applied to strings (i.e., a C string, an std::string, or a StringRef). The Twine delays the actual concatenation of strings until it is actually required, at which point it can be efficiently rendered directly into a character array. This avoids unnecessary heap allocation involved in constructing the temporary results of string concatenation. See "llvm/ADT/Twine.h" for more information.

As with a StringRef, Twine objects point to external memory and should almost never be stored or mentioned directly. They are intended solely for use when defining a function which should be able to efficiently accept concatenated strings.

The DEBUG() macro and -debug option

Often when working on your pass you will put a bunch of debugging printouts and other code into your pass. After you get it working, you want to remove it, but you may need it again in the future (to work out new bugs that you run across).

Naturally, because of this, you don't want to delete the debug printouts, but you don't want them to always be noisy. A standard compromise is to comment them out, allowing you to enable them if you need them in the future.

The "llvm/Support/Debug.h" file provides a macro named DEBUG() that is a much nicer solution to this problem. Basically, you can put arbitrary code into the argument of the DEBUG macro, and it is only executed if 'opt' (or any other tool) is run with the '-debug' command line argument:

 DEBUG(errs() << "I am here!\n");
 

Then you can run your pass like this:

 $ opt < a.bc > /dev/null -mypass
 <no output>
 $ opt < a.bc > /dev/null -mypass -debug
 I am here!
 

Using the DEBUG() macro instead of a home-brewed solution allows you to not have to create "yet another" command line option for the debug output for your pass. Note that DEBUG() macros are disabled for optimized builds, so they do not cause a performance impact at all (for the same reason, they should also not contain side-effects!).

One additional nice thing about the DEBUG() macro is that you can enable or disable it directly in gdb. Just use "set DebugFlag=0" or "set DebugFlag=1" from gdb if the program is running. If the program hasn't been started yet, you can always just run it with -debug.

Fine grained debug info with DEBUG_TYPE and the -debug-only option

Sometimes you may find yourself in a situation where enabling -debug just turns on too much information (such as when working on the code generator). If you want to enable debug information with more fine-grained control, you define the DEBUG_TYPE macro and use the -debug-only option as follows:

 #undef  DEBUG_TYPE
 DEBUG(errs() << "No debug type\n");
 #define DEBUG_TYPE "foo"
 DEBUG(errs() << "'foo' debug type\n");
 #undef  DEBUG_TYPE
 #define DEBUG_TYPE "bar"
 DEBUG(errs() << "'bar' debug type\n"));
 #undef  DEBUG_TYPE
 #define DEBUG_TYPE ""
 DEBUG(errs() << "No debug type (2)\n");
 

Then you can run your pass like this:

 $ opt < a.bc > /dev/null -mypass
 <no output>
 $ opt < a.bc > /dev/null -mypass -debug
 No debug type
 'foo' debug type
 'bar' debug type
 No debug type (2)
 $ opt < a.bc > /dev/null -mypass -debug-only=foo
 'foo' debug type
 $ opt < a.bc > /dev/null -mypass -debug-only=bar
 'bar' debug type
 

Of course, in practice, you should only set DEBUG_TYPE at the top of a file, to specify the debug type for the entire module (if you do this before you #include "llvm/Support/Debug.h", you don't have to insert the ugly #undef's). Also, you should use names more meaningful than "foo" and "bar", because there is no system in place to ensure that names do not conflict. If two different modules use the same string, they will all be turned on when the name is specified. This allows, for example, all debug information for instruction scheduling to be enabled with -debug-only=InstrSched, even if the source lives in multiple files.

The DEBUG_WITH_TYPE macro is also available for situations where you would like to set DEBUG_TYPE, but only for one specific DEBUG statement. It takes an additional first parameter, which is the type to use. For example, the preceding example could be written as:

 DEBUG_WITH_TYPE("", errs() << "No debug type\n");
 DEBUG_WITH_TYPE("foo", errs() << "'foo' debug type\n");
 DEBUG_WITH_TYPE("bar", errs() << "'bar' debug type\n"));
 DEBUG_WITH_TYPE("", errs() << "No debug type (2)\n");
 

The Statistic class & -stats option

The "llvm/ADT/Statistic.h" file provides a class named Statistic that is used as a unified way to keep track of what the LLVM compiler is doing and how effective various optimizations are. It is useful to see what optimizations are contributing to making a particular program run faster.

Often you may run your pass on some big program, and you're interested to see how many times it makes a certain transformation. Although you can do this with hand inspection, or some ad-hoc method, this is a real pain and not very useful for big programs. Using the Statistic class makes it very easy to keep track of this information, and the calculated information is presented in a uniform manner with the rest of the passes being executed.

There are many examples of Statistic uses, but the basics of using it are as follows:

  1. Define your statistic like this:

     #define DEBUG_TYPE "mypassname"   // This goes before any #includes.
     STATISTIC(NumXForms, "The # of times I did stuff");
     

    The STATISTIC macro defines a static variable, whose name is specified by the first argument. The pass name is taken from the DEBUG_TYPE macro, and the description is taken from the second argument. The variable defined ("NumXForms" in this case) acts like an unsigned integer.

  2. Whenever you make a transformation, bump the counter:

     ++NumXForms;   // I did stuff!
     

That's all you have to do. To get 'opt' to print out the statistics gathered, use the '-stats' option:

 $ opt -stats -mypassname < program.bc > /dev/null
 ... statistics output ...
 

When running opt on a C file from the SPEC benchmark suite, it gives a report that looks like this:

    7646 bitcodewriter   - Number of normal instructions
     725 bitcodewriter   - Number of oversized instructions
  129996 bitcodewriter   - Number of bitcode bytes written
    2817 raise           - Number of insts DCEd or constprop'd
    3213 raise           - Number of cast-of-self removed
    5046 raise           - Number of expression trees converted
      75 raise           - Number of other getelementptr's formed
     138 raise           - Number of load/store peepholes
      42 deadtypeelim    - Number of unused typenames removed from symtab
     392 funcresolve     - Number of varargs functions resolved
      27 globaldce       - Number of global variables removed
       2 adce            - Number of basic blocks removed
     134 cee             - Number of branches revectored
      49 cee             - Number of setcc instruction eliminated
     532 gcse            - Number of loads removed
    2919 gcse            - Number of instructions removed
      86 indvars         - Number of canonical indvars added
      87 indvars         - Number of aux indvars removed
      25 instcombine     - Number of dead inst eliminate
     434 instcombine     - Number of insts combined
     248 licm            - Number of load insts hoisted
    1298 licm            - Number of insts hoisted to a loop pre-header
       3 licm            - Number of insts hoisted to multiple loop preds (bad, no loop pre-header)
      75 mem2reg         - Number of alloca's promoted
    1444 cfgsimplify     - Number of blocks simplified
 

Obviously, with so many optimizations, having a unified framework for this stuff is very nice. Making your pass fit well into the framework makes it more maintainable and useful.

Viewing graphs while debugging code

Several of the important data structures in LLVM are graphs: for example CFGs made out of LLVM BasicBlocks, CFGs made out of LLVM MachineBasicBlocks, and Instruction Selection DAGs. In many cases, while debugging various parts of the compiler, it is nice to instantly visualize these graphs.

LLVM provides several callbacks that are available in a debug build to do exactly that. If you call the Function::viewCFG() method, for example, the current LLVM tool will pop up a window containing the CFG for the function where each basic block is a node in the graph, and each node contains the instructions in the block. Similarly, there also exists Function::viewCFGOnly() (does not include the instructions), the MachineFunction::viewCFG() and MachineFunction::viewCFGOnly(), and the SelectionDAG::viewGraph() methods. Within GDB, for example, you can usually use something like call DAG.viewGraph() to pop up a window. Alternatively, you can sprinkle calls to these functions in your code in places you want to debug.

Getting this to work requires a small amount of configuration. On Unix systems with X11, install the graphviz toolkit, and make sure 'dot' and 'gv' are in your path. If you are running on Mac OS/X, download and install the Mac OS/X Graphviz program, and add /Applications/Graphviz.app/Contents/MacOS/ (or wherever you install it) to your path. Once your system and path are set up, rerun the LLVM configure script and rebuild LLVM to enable this functionality.

SelectionDAG has been extended to make it easier to locate interesting nodes in large complex graphs. From gdb, if you call DAG.setGraphColor(node, "color"), then the next call to DAG.viewGraph() would highlight the node in the specified color (choices of colors can be found at colors.) More complex node attributes can be provided with a call to DAG.setGraphAttrs(node, "attributes") (choices can be found at Graph Attributes.) If you want to restart and clear all the current graph attributes, then you can call DAG.clearGraphAttrs().

Note that graph visualization features are compiled out of Release builds to reduce file size. This means that you need a Debug+Asserts or Release+Asserts build to use these features.

Picking the Right Data Structure for a Task

LLVM has a plethora of data structures in the llvm/ADT/ directory, and we commonly use STL data structures. This section describes the trade-offs you should consider when you pick one.

The first step is a choose your own adventure: do you want a sequential container, a set-like container, or a map-like container? The most important thing when choosing a container is the algorithmic properties of how you plan to access the container. Based on that, you should use:

Once the proper category of container is determined, you can fine tune the memory use, constant factors, and cache behaviors of access by intelligently picking a member of the category. Note that constant factors and cache behavior can be a big deal. If you have a vector that usually only contains a few elements (but could contain many), for example, it's much better to use SmallVector than vector. Doing so avoids (relatively) expensive malloc/free calls, which dwarf the cost of adding the elements to the container.


Sequential Containers (std::vector, std::list, etc)

There are a variety of sequential containers available for you, based on your needs. Pick the first in this section that will do what you want.

llvm/ADT/ArrayRef.h

The llvm::ArrayRef class is the preferred class to use in an interface that accepts a sequential list of elements in memory and just reads from them. By taking an ArrayRef, the API can be passed a fixed size array, an std::vector, an llvm::SmallVector and anything else that is contiguous in memory.
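
A small sketch of such an interface (the sum function and its callers are hypothetical):

 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/SmallVector.h"
 #include <vector>
 
 // One read-only interface; anything contiguous in memory converts to it.
 static int sum(llvm::ArrayRef<int> Values) {
   int Total = 0;
   for (unsigned i = 0, e = Values.size(); i != e; ++i)
     Total += Values[i];
   return Total;
 }
 
 void callers() {
   int Arr[] = {1, 2, 3};
   std::vector<int> Vec(Arr, Arr + 3);
   llvm::SmallVector<int, 4> SV(Arr, Arr + 3);
 
   sum(Arr);   // fixed-size array
   sum(Vec);   // std::vector
   sum(SV);    // SmallVector
 }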

Fixed Size Arrays

Fixed size arrays are very simple and very fast. They are good if you know exactly how many elements you have, or you have a (low) upper bound on how many you have.

Heap Allocated Arrays

Heap allocated arrays (new[] + delete[]) are also simple. They are good if the number of elements is variable, if you know how many elements you will need before the array is allocated, and if the array is usually large (if not, consider a SmallVector). The cost of a heap allocated array is the cost of the new/delete (aka malloc/free). Also note that if you are allocating an array of a type with a constructor, the constructor and destructors will be run for every element in the array (re-sizable vectors only construct those elements actually used).

"llvm/ADT/TinyPtrVector.h"

TinyPtrVector<Type> is a highly specialized collection class that is optimized to avoid allocation in the case when a vector has zero or one elements. It has two major restrictions: 1) it can only hold values of pointer type, and 2) it cannot hold a null pointer.

Since this container is highly specialized, it is rarely used.

"llvm/ADT/SmallVector.h"

SmallVector<Type, N> is a simple class that looks and smells just like vector<Type>: it supports efficient iteration, lays out elements in memory order (so you can do pointer arithmetic between elements), supports efficient push_back/pop_back operations, supports efficient random access to its elements, etc.

The advantage of SmallVector is that it allocates space for some number of elements (N) in the object itself. Because of this, if the SmallVector is dynamically smaller than N, no malloc is performed. This can be a big win in cases where the malloc/free call is far more expensive than the code that fiddles around with the elements.

This is good for vectors that are "usually small" (e.g. the number of predecessors/successors of a block is usually less than 8). On the other hand, this makes the size of the SmallVector itself large, so you don't want to allocate lots of them (doing so will waste a lot of space). As such, SmallVectors are most useful when on the stack.

SmallVector also provides a nice portable and efficient replacement for alloca.
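
A brief usage sketch (the worklist scenario is invented):

 #include "llvm/ADT/SmallVector.h"
 
 int sumWorklist() {
   // Space for 8 elements lives inside the object itself, so no heap
   // allocation happens unless the vector grows beyond 8 entries.
   llvm::SmallVector<int, 8> Worklist;
   for (int i = 0; i != 6; ++i)
     Worklist.push_back(i);
 
   int Total = 0;
   while (!Worklist.empty()) {
     Total += Worklist.back();
     Worklist.pop_back();
   }
   return Total;
 }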

<vector>

std::vector is well loved and respected. It is useful when SmallVector isn't: when the size of the vector is often large (thus the small optimization will rarely be a benefit) or if you will be allocating many instances of the vector itself (which would waste space for elements that aren't in the container). vector is also useful when interfacing with code that expects vectors :).

One worthwhile note about std::vector: avoid code like this:

 for ( ... ) {
    std::vector<foo> V;
    use V;
 }
 

Instead, write this as:

 std::vector<foo> V;
 for ( ... ) {
    use V;
    V.clear();
 }
 

Doing so will save (at least) one heap allocation and free per iteration of the loop.

<deque>

std::deque is, in some senses, a generalized version of std::vector. Like std::vector, it provides constant time random access and other similar properties, but it also provides efficient access to the front of the list. It does not guarantee continuity of elements within memory.

In exchange for this extra flexibility, std::deque has significantly higher constant factor costs than std::vector. If possible, use std::vector or something cheaper.

<list>

std::list is an extremely inefficient class that is rarely useful. It performs a heap allocation for every element inserted into it, thus having an extremely high constant factor, particularly for small data types. std::list also only supports bidirectional iteration, not random access iteration.

In exchange for this high cost, std::list supports efficient access to both ends of the list (like std::deque, but unlike std::vector or SmallVector). In addition, the iterator invalidation characteristics of std::list are stronger than those of a vector class: inserting or removing an element into the list does not invalidate iterators or pointers to other elements in the list.

llvm/ADT/ilist.h

ilist<T> implements an 'intrusive' doubly-linked list. It is intrusive, because it requires the element to store and provide access to the prev/next pointers for the list.

ilist has the same drawbacks as std::list, and additionally requires an ilist_traits implementation for the element type, but it provides some novel characteristics. In particular, it can efficiently store polymorphic objects, the traits class is informed when an element is inserted or removed from the list, and ilists are guaranteed to support a constant-time splice operation.

These properties are exactly what we want for things like Instructions and basic blocks, which is why these are implemented with ilists.

Related classes of interest are explained in the following subsections:

llvm/ADT/PackedVector.h

Useful for storing a vector of values using only a small number of bits for each value. Apart from the standard operations of a vector-like container, it can also perform an 'or' set operation.

For example:

 enum State {
     None = 0x0,
     FirstCondition = 0x1,
     SecondCondition = 0x2,
     Both = 0x3
 };
 
 State get() {
     PackedVector<State, 2> Vec1;
     Vec1.push_back(FirstCondition);
 
     PackedVector<State, 2> Vec2;
     Vec2.push_back(SecondCondition);
 
     Vec1 |= Vec2;
     return Vec1[0]; // returns 'Both'.
 }
 

ilist_traits

ilist_traits<T> is ilist<T>'s customization mechanism. iplist<T> (and consequently ilist<T>) publicly derive from this traits class.

iplist

iplist<T> is ilist<T>'s base and as such supports a slightly narrower interface. Notably, inserters from T& are absent.

ilist_traits<T> is a public base of this class and can be used for a wide variety of customizations.

llvm/ADT/ilist_node.h

ilist_node<T> implements the forward and backward links that are expected by the ilist<T> (and analogous containers) in the default manner.

ilist_node<T>s are meant to be embedded in the node type T, usually T publicly derives from ilist_node<T>.

Sentinels

ilists have another specialty that must be considered. To be a good citizen in the C++ ecosystem, they need to support the standard container operations, such as begin and end iterators, etc. Also, the operator-- must work correctly on the end iterator in the case of non-empty ilists.

The only sensible solution to this problem is to allocate a so-called sentinel along with the intrusive list, which serves as the end iterator, providing the back-link to the last element. However, conforming to the C++ convention, it is illegal to operator++ beyond the sentinel, and it also must not be dereferenced.

These constraints allow the ilist some implementation freedom in how it allocates and stores the sentinel. The corresponding policy is dictated by ilist_traits<T>. By default, a T gets heap-allocated whenever the need for a sentinel arises.

While the default policy is sufficient in most cases, it may break down when T does not provide a default constructor. Also, in the case of many instances of ilists, the memory overhead of the associated sentinels is wasted. To alleviate the situation with numerous and voluminous T-sentinels, sometimes a trick is employed, leading to ghostly sentinels.

Ghostly sentinels are obtained by specially-crafted ilist_traits<T> which superpose the sentinel with the ilist instance in memory. Pointer arithmetic is used to obtain the sentinel, which is relative to the ilist's this pointer. The ilist is augmented by an extra pointer, which serves as the back-link of the sentinel. This is the only field in the ghostly sentinel which can be legally accessed.

Other Sequential Container options

Other STL containers are available, such as std::string.

There are also various STL adapter classes such as std::queue, std::priority_queue, std::stack, etc. These provide simplified access to an underlying container but don't affect the cost of the container itself.

String-like containers

There are a variety of ways to pass around and use strings in C and C++, and LLVM adds a few new options to choose from. Pick the first option on this list that will do what you need; they are ordered according to their relative cost.

Note that it is generally preferred not to pass strings around as "const char*"s. These have a number of problems, including the fact that they cannot represent embedded nul ("\0") characters and do not have a length available efficiently. The general replacement for 'const char*' is StringRef.

For more information on choosing string containers for APIs, please see Passing strings.

llvm/ADT/StringRef.h

The StringRef class is a simple value class that contains a pointer to a character and a length, and is quite related to the ArrayRef class (but specialized for arrays of characters). Because StringRef carries a length with it, it safely handles strings with embedded nul characters in it, getting the length does not require a strlen call, and it even has very convenient APIs for slicing and dicing the character range that it represents.

StringRef is ideal for passing simple strings around that are known to be live, either because they are C string literals, std::string, a C array, or a SmallVector. Each of these cases has an efficient implicit conversion to StringRef, which doesn't result in a dynamic strlen being executed.

StringRef has a few major limitations which make more powerful string containers useful:

  1. You cannot directly convert a StringRef to a 'const char*' because there is no way to add a trailing nul (unlike the .c_str() method on various stronger classes).
  2. StringRef doesn't own or keep alive the underlying string bytes. As such it can easily lead to dangling pointers, and is not suitable for embedding in datastructures in most cases (instead, use an std::string or something like that).
  3. For the same reason, StringRef cannot be used as the return value of a method if the method "computes" the result string. Instead, use std::string.
  4. StringRef's do not allow you to mutate the pointed-to string bytes and it doesn't allow you to insert or remove bytes from the range. For editing operations like this, it interoperates with the Twine class.

Because of its strengths and limitations, it is very common for a function to take a StringRef and for a method on an object to return a StringRef that points into some string that it owns.
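
A sketch of that pattern (the Symbol class and the "llvm." prefix check are made up for illustration):

 #include "llvm/ADT/StringRef.h"
 #include <string>
 
 // The object owns the underlying bytes; its accessor hands out a
 // StringRef that points into that owned storage.
 class Symbol {
   std::string Name;
 public:
   explicit Symbol(llvm::StringRef N) : Name(N.str()) {}
   llvm::StringRef getName() const { return Name; }
 };
 
 // Functions simply take StringRef by value; no bytes are copied.
 static bool isReserved(llvm::StringRef Name) {
   return Name.startswith("llvm.");
 }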

llvm/ADT/Twine.h

The Twine class is used as an intermediary datatype for APIs that want to take a string that can be constructed inline with a series of concatenations. Twine works by forming recursive instances of the Twine datatype (a simple value object) on the stack as temporary objects, linking them together into a tree which is then linearized when the Twine is consumed. Twine is only safe to use as the argument to a function, and should always be a const reference, e.g.:

     void foo(const Twine &T);
     ...
     StringRef X = ...
     unsigned i = ...
     foo(X + "." + Twine(i));
   

This example forms a string like "blarg.42" by concatenating the values together, and does not form intermediate strings containing "blarg" or "blarg.".

Because Twine is constructed with temporary objects on the stack, and because these instances are destroyed at the end of the current statement, it is an inherently dangerous API. For example, this simple variant contains undefined behavior and will probably crash:

     void foo(const Twine &T);
     ...
     StringRef X = ...
     unsigned i = ...
     const Twine &Tmp = X + "." + Twine(i);
     foo(Tmp);
   

... because the temporaries are destroyed before the call. That said, Twine's are much more efficient than intermediate std::string temporaries, and they work really well with StringRef. Just be aware of their limitations.

llvm/ADT/SmallString.h

SmallString is a subclass of SmallVector that adds some convenience APIs like += that takes StringRef's. SmallString avoids allocating memory in the case when the preallocated space is enough to hold its data, and it calls back to general heap allocation when required. Since it owns its data, it is very safe to use and supports full mutation of the string.

Like SmallVector's, the big downside to SmallString is their sizeof. While they are optimized for small strings, they themselves are not particularly small. This means that they work great for temporary scratch buffers on the stack, but should not generally be put into the heap: it is very rare to see a SmallString as the member of a frequently-allocated heap data structure or returned by-value.
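
A short sketch of using SmallString as a scratch buffer (the naming scheme is invented):

 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/StringRef.h"
 #include <string>
 
 static std::string makeTempName(llvm::StringRef Base, llvm::StringRef Suffix) {
   // The first 128 bytes are built up in place on the stack; the heap is
   // only touched if the result outgrows that space.
   llvm::SmallString<128> Buffer;
   Buffer += Base;
   Buffer += ".";
   Buffer += Suffix;
   return Buffer.str().str();   // persist the result as a std::string
 }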

std::string

The standard C++ std::string class is a very general class that (like SmallString) owns its underlying data. sizeof(std::string) is very reasonable so it can be embedded into heap data structures and returned by-value. On the other hand, std::string is highly inefficient for inline editing (e.g. concatenating a bunch of stuff together) and because it is provided by the standard library, its performance characteristics depend a lot on the host standard library (e.g. libc++ and MSVC provide a highly optimized string class, GCC contains a really slow implementation).

The major disadvantage of std::string is that almost every operation that makes them larger can allocate memory, which is slow. As such, it is better to use SmallVector or Twine as a scratch buffer, but then use std::string to persist the result.

Set-Like Containers (std::set, SmallSet, SetVector, etc)

Set-like containers are useful when you need to canonicalize multiple values into a single representation. There are several different choices for how to do this, providing various trade-offs.

A sorted 'vector'

If you intend to insert a lot of elements, then do a lot of queries, a great approach is to use a vector (or other sequential container) with std::sort+std::unique to remove duplicates. This approach works really well if your usage pattern has these two distinct phases (insert then query), and can be coupled with a good choice of sequential container.

This combination provides several nice properties: the result data is contiguous in memory (good for cache locality), has few allocations, is easy to address (iterators in the final vector are just indices or pointers), and can be efficiently queried with a standard binary or radix search.
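
A rough sketch of this insert-then-query pattern follows; the element type and values are arbitrary:

 #include <algorithm>
 #include <vector>

 std::vector<unsigned> Set;
 // Phase 1: append everything, duplicates and all.
 Set.push_back(42);
 Set.push_back(7);
 Set.push_back(42);
 // Phase 2: sort once, strip duplicates, then query with binary search.
 std::sort(Set.begin(), Set.end());
 Set.erase(std::unique(Set.begin(), Set.end()), Set.end());
 bool Present = std::binary_search(Set.begin(), Set.end(), 42);   // true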

"llvm/ADT/SmallSet.h"

If you have a set-like data structure that is usually small and whose elements are reasonably small, a SmallSet<Type, N> is a good choice. This set has space for N elements in place (thus, if the set is dynamically smaller than N, no malloc traffic is required) and accesses them with a simple linear search. When the set grows beyond 'N' elements, it allocates a more expensive representation that guarantees efficient access (for most types, it falls back to std::set, but for pointers it uses something far better, SmallPtrSet).

The magic of this class is that it handles small sets extremely efficiently, but gracefully handles extremely large sets without loss of efficiency. The drawback is that the interface is quite small: it supports insertion, queries and erasing, but does not support iteration.
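
A minimal sketch of typical SmallSet usage; the element type, inline size and variable name are arbitrary:

 #include "llvm/ADT/SmallSet.h"

 llvm::SmallSet<unsigned, 8> Seen;  // room for 8 elements in place
 Seen.insert(42);                   // no heap traffic while the set stays small
 if (Seen.count(42)) {
   // ... the element is present ...
 }
 Seen.erase(42);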

"llvm/ADT/SmallPtrSet.h"

SmallPtrSet has all the advantages of SmallSet (and a SmallSet of pointers is transparently implemented with a SmallPtrSet), but also supports iterators. If more than 'N' insertions are performed, a single quadratically probed hash table is allocated and grows as needed, providing extremely efficient access (constant time insertion/deleting/queries with low constant factors) and is very stingy with malloc traffic.

Note that, unlike std::set, the iterators of SmallPtrSet are invalidated whenever an insertion occurs. Also, the values visited by the iterators are not visited in sorted order.
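
For illustration, a small SmallPtrSet sketch (the element type, inline size and names are arbitrary); note the unsorted iteration:

 #include "llvm/ADT/SmallPtrSet.h"

 Instruction *I = ...;
 llvm::SmallPtrSet<Instruction*, 16> Visited;
 Visited.insert(I);                 // very cheap while the set is small
 for (llvm::SmallPtrSet<Instruction*, 16>::iterator it = Visited.begin(),
        e = Visited.end(); it != e; ++it) {
   Instruction *P = *it;            // iteration order is not sorted
   // ...
 }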

"llvm/ADT/DenseSet.h"

DenseSet is a simple quadratically probed hash table. It excels at supporting small values: it uses a single allocation to hold all of the pairs that are currently inserted in the set. DenseSet is a great way to unique small values that are not simple pointers (use SmallPtrSet for pointers). Note that DenseSet has the same requirements for the value type that DenseMap has.

"llvm/ADT/FoldingSet.h"

FoldingSet is an aggregate class that is really good at uniquing expensive-to-create or polymorphic objects. It is a combination of a chained hash table with intrusive links (uniqued objects are required to inherit from FoldingSetNode) that uses SmallVector as part of its ID process.

Consider a case where you want to implement a "getOrCreateFoo" method for a complex object (for example, a node in the code generator). The client has a description of *what* it wants to generate (it knows the opcode and all the operands), but we don't want to 'new' a node, then try inserting it into a set only to find out it already exists, at which point we would have to delete it and return the node that already exists.

To support this style of client, FoldingSet performs a query with a FoldingSetNodeID (which wraps SmallVector) that can be used to describe the element that we want to query for. The query either returns the element matching the ID or it returns an opaque ID that indicates where insertion should take place. Construction of the ID usually does not require heap traffic.

Because FoldingSet uses intrusive links, it can support polymorphic objects in the set (for example, you can have SDNode instances mixed with LoadSDNodes). Because the elements are individually allocated, pointers to the elements are stable: inserting or removing elements does not invalidate any pointers to other elements.
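
As a sketch of the "getOrCreateFoo" pattern, the FooNode class and getOrCreateFoo function below are hypothetical; only FoldingSetNode, FoldingSetNodeID, FindNodeOrInsertPos and InsertNode come from the FoldingSet API:

 #include "llvm/ADT/FoldingSet.h"

 // FooNode is a made-up uniqued object; it describes itself to the folding
 // set through its Profile() method.
 class FooNode : public llvm::FoldingSetNode {
   unsigned Opcode, Operand;
 public:
   FooNode(unsigned Op, unsigned Val) : Opcode(Op), Operand(Val) {}
   void Profile(llvm::FoldingSetNodeID &ID) const {
     ID.AddInteger(Opcode);
     ID.AddInteger(Operand);
   }
 };

 FooNode *getOrCreateFoo(llvm::FoldingSet<FooNode> &Set,
                         unsigned Op, unsigned Val) {
   llvm::FoldingSetNodeID ID;
   ID.AddInteger(Op);
   ID.AddInteger(Val);
   void *InsertPos;
   if (FooNode *Existing = Set.FindNodeOrInsertPos(ID, InsertPos))
     return Existing;                    // already uniqued; nothing allocated
   FooNode *New = new FooNode(Op, Val);  // only 'new' when genuinely new
   Set.InsertNode(New, InsertPos);
   return New;
 }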

<set>

std::set is a reasonable all-around set class, which is decent at many things but great at nothing. std::set allocates memory for each element inserted (thus it is very malloc intensive) and typically stores three pointers per element in the set (thus adding a large amount of per-element space overhead). It offers guaranteed log(n) performance, which is not particularly fast from a complexity standpoint (particularly if the elements of the set are expensive to compare, like strings), and has extremely high constant factors for lookup, insertion and removal.

The advantages of std::set are that its iterators are stable (deleting or inserting an element from the set does not affect iterators or pointers to other elements) and that iteration over the set is guaranteed to be in sorted order. If the elements in the set are large, then the relative overhead of the pointers and malloc traffic is not a big deal, but if the elements of the set are small, std::set is almost never a good choice.

"llvm/ADT/SetVector.h"

LLVM's SetVector<Type> is an adapter class that combines your choice of a set-like container along with a Sequential Container. The important property that this provides is efficient insertion with uniquing (duplicate elements are ignored) with iteration support. It implements this by inserting elements into both a set-like container and the sequential container, using the set-like container for uniquing and the sequential container for iteration.

The difference between SetVector and other sets is that the order of iteration is guaranteed to match the order of insertion into the SetVector. This property is really important for things like sets of pointers. Because pointer values are non-deterministic (e.g. vary across runs of the program on different machines), iterating over the pointers in the set will not be in a well-defined order.

The drawback of SetVector is that it requires twice as much space as a normal set and has the sum of constant factors from the set-like container and the sequential container that it uses. Use it *only* if you need to iterate over the elements in a deterministic order. SetVector is also expensive to delete elements out of (linear time), unless you use its "pop_back" method, which is faster.

SetVector is an adapter class that defaults to using std::vector and a size 16 SmallSet for the underlying containers, so it is quite expensive. However, "llvm/ADT/SetVector.h" also provides a SmallSetVector class, which defaults to using a SmallVector and SmallSet of a specified size. If you use this, and if your sets are dynamically smaller than N, you will save a lot of heap traffic.
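
A short sketch of a SmallSetVector used as a deterministic worklist; the element type, inline size and names are arbitrary:

 #include "llvm/ADT/SetVector.h"

 BasicBlock *BB = ...;
 llvm::SmallSetVector<BasicBlock*, 8> Worklist;
 Worklist.insert(BB);                 // duplicates are silently ignored
 while (!Worklist.empty()) {
   BasicBlock *Next = Worklist.back();
   Worklist.pop_back();               // cheap removal from the back
   // ... iteration order matches insertion order ...
 }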

"llvm/ADT/UniqueVector.h"

UniqueVector is similar to SetVector, but it retains a unique ID for each element inserted into the set. It internally contains a map and a vector, and it assigns a unique ID for each value inserted into the set.

UniqueVector is very expensive: its cost is the sum of the cost of maintaining both the map and vector, it has high complexity, high constant factors, and produces a lot of malloc traffic. It should be avoided.

Other Set-Like Container Options

The STL provides several other options, such as std::multiset and the various "hash_set" like containers (whether from C++ TR1 or from the SGI library). We never use hash_set and unordered_set because they are generally very expensive (each insertion requires a malloc) and very non-portable.

std::multiset is useful if you're not interested in elimination of duplicates, but has all the drawbacks of std::set. A sorted vector (where you don't delete duplicate entries) or some other approach is almost always better.

Map-Like Containers (std::map, DenseMap, etc)

Map-like containers are useful when you want to associate data to a key. As usual, there are a lot of different ways to do this. :)

A sorted 'vector'

If your usage pattern follows a strict insert-then-query approach, you can trivially use the same approach as sorted vectors for set-like containers. The only difference is that your query function (which uses std::lower_bound to get efficient log(n) lookup) should only compare the key, not both the key and value. This yields the same advantages as sorted vectors for sets.
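
A rough sketch of that key-only query, with an arbitrary key/value choice and a hypothetical KeyLess comparator:

 #include <algorithm>
 #include <utility>
 #include <vector>

 typedef std::pair<unsigned, const char*> Entry;   // (key, value)

 // Comparator that looks only at the key half of each entry (namespace scope).
 static bool KeyLess(const Entry &E, unsigned K) { return E.first < K; }

 std::vector<Entry> Map;
 // Insert phase: append everything, then sort by key once.
 Map.push_back(std::make_pair(3u, "three"));
 Map.push_back(std::make_pair(1u, "one"));
 std::sort(Map.begin(), Map.end());

 // Query phase: log(n) lookup that compares only the key.
 std::vector<Entry>::iterator I =
     std::lower_bound(Map.begin(), Map.end(), 3u, KeyLess);
 if (I != Map.end() && I->first == 3u) {
   const char *Found = I->second;   // "three"
   // ...
 }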

"llvm/ADT/StringMap.h"

Strings are commonly used as keys in maps, and they are difficult to support efficiently: they are variable length, inefficient to hash and compare when long, expensive to copy, etc. StringMap is a specialized container designed to cope with these issues. It supports mapping an arbitrary range of bytes to an arbitrary other object.

The StringMap implementation uses a quadratically-probed hash table, where the buckets store a pointer to the heap allocated entries (and some other stuff). The entries in the map must be heap allocated because the strings are variable length. The string data (key) and the element object (value) are stored in the same allocation with the string data immediately after the element object. This container guarantees that "(char*)(&Value+1)" points to the key string for a value.

The StringMap is very fast for several reasons: quadratic probing is very cache efficient for lookups, the hash value of strings in buckets is not recomputed when looking up an element, StringMap rarely has to touch the memory for unrelated objects when looking up a value (even when hash collisions happen), hash table growth does not recompute the hash values for strings already in the table, and each pair in the map is stored in a single allocation (the string data is stored in the same allocation as the Value of a pair).

StringMap also provides query methods that take byte ranges, so it only ever copies a string if a value is inserted into the table.
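
A small usage sketch; the map name and contents are arbitrary:

 #include "llvm/ADT/StringMap.h"

 llvm::StringMap<unsigned> Histogram;
 // GetOrCreateValue copies the key bytes only when the entry is first created.
 llvm::StringMapEntry<unsigned> &E = Histogram.GetOrCreateValue("foo");
 E.setValue(E.getValue() + 1);

 if (Histogram.count("bar")) {
   unsigned N = Histogram.lookup("bar");   // default-constructed if absent
   // ...
 }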

"llvm/ADT/IndexedMap.h"

IndexedMap is a specialized container for mapping small dense integers (or values that can be mapped to small dense integers) to some other type. It is internally implemented as a vector with a mapping function that maps the keys to the dense integer range.

This is useful for cases like virtual registers in the LLVM code generator: they have a dense mapping that is offset by a compile-time constant (the first virtual register ID).

"llvm/ADT/DenseMap.h"

DenseMap is a simple quadratically probed hash table. It excels at supporting small keys and values: it uses a single allocation to hold all of the pairs that are currently inserted in the map. DenseMap is a great way to map pointers to pointers, or map other small types to each other.

There are several aspects of DenseMap that you should be aware of, however. The iterators in a DenseMap are invalidated whenever an insertion occurs, unlike map. Also, because DenseMap allocates space for a large number of key/value pairs (it starts with 64 by default), it will waste a lot of space if your keys or values are large. Finally, you must implement a partial specialization of DenseMapInfo for the key that you want, if it isn't already supported. This is required to tell DenseMap about two special marker values (which can never be inserted into the map) that it needs internally.
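
A brief sketch of DenseMap usage, plus what a DenseMapInfo specialization roughly looks like for a hypothetical MyKey type (the names and hash function are illustrative only):

 #include "llvm/ADT/DenseMap.h"

 Instruction *I = ...;
 llvm::DenseMap<Instruction*, unsigned> Order;
 Order[I] = 1;                          // note: may invalidate live iterators
 llvm::DenseMap<Instruction*, unsigned>::iterator It = Order.find(I);
 if (It != Order.end()) {
   unsigned N = It->second;
   // ...
 }

 // Sketch of a DenseMapInfo specialization for a made-up key type.
 // The empty and tombstone keys must be values that are never inserted.
 struct MyKey { unsigned A, B; };
 namespace llvm {
   template<> struct DenseMapInfo<MyKey> {
     static inline MyKey getEmptyKey()     { MyKey K = { ~0U, 0 };     return K; }
     static inline MyKey getTombstoneKey() { MyKey K = { ~0U - 1, 0 }; return K; }
     static unsigned getHashValue(const MyKey &K) { return K.A * 37u + K.B; }
     static bool isEqual(const MyKey &L, const MyKey &R) {
       return L.A == R.A && L.B == R.B;
     }
   };
 }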

"llvm/ADT/ValueMap.h"

ValueMap is a wrapper around a DenseMap mapping Value*s (or subclasses) to another type. When a Value is deleted or RAUW'ed, ValueMap will update itself so the new version of the key is mapped to the same value, just as if the key were a WeakVH. You can configure exactly how this happens, and what else happens on these two events, by passing a Config parameter to the ValueMap template.

"llvm/ADT/IntervalMap.h"

IntervalMap is a compact map for small keys and values. It maps key intervals instead of single keys, and it will automatically coalesce adjacent intervals. When the map only contains a few intervals, they are stored in the map object itself to avoid allocations.

The IntervalMap iterators are quite big, so they should not be passed around as STL iterators. The heavyweight iterators allow a smaller data structure.
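
A minimal sketch of IntervalMap usage; the key/value types and interval bounds here are arbitrary:

 #include "llvm/ADT/IntervalMap.h"

 typedef llvm::IntervalMap<unsigned, unsigned> RangeMap;
 RangeMap::Allocator Alloc;         // node allocator for maps of this type
 RangeMap M(Alloc);
 M.insert(10, 20, 1);               // map the closed interval [10,20] to 1
 M.insert(21, 30, 1);               // adjacent equal-valued intervals coalesce
 unsigned V = M.lookup(25);         // 1; lookup of an unmapped key returns 0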

<map>

std::map has similar characteristics to std::set: it uses a single allocation per pair inserted into the map, it offers log(n) lookup with an extremely large constant factor, imposes a space penalty of 3 pointers per pair in the map, etc.

std::map is most useful when your keys or values are very large, if you need to iterate over the collection in sorted order, or if you need stable iterators into the map (i.e. they don't get invalidated if an insertion or deletion of another element takes place).

"llvm/ADT/IntEqClasses.h"

IntEqClasses provides a compact representation of equivalence classes of small integers. Initially, each integer in the range 0..n-1 has its own equivalence class. Classes can be joined by passing two class representatives to the join(a, b) method. Two integers are in the same class when findLeader() returns the same representative.

Once all equivalence classes are formed, the map can be compressed so each integer 0..n-1 maps to an equivalence class number in the range 0..m-1, where m is the total number of equivalence classes. The map must be uncompressed before it can be edited again.
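
For example (the class memberships below are arbitrary):

 #include "llvm/ADT/IntEqClasses.h"

 llvm::IntEqClasses EC(10);       // integers 0..9, each in its own class
 EC.join(0, 7);
 EC.join(7, 3);                   // now 0, 3 and 7 are equivalent
 bool Same = EC.findLeader(0) == EC.findLeader(3);   // true

 EC.compress();                   // renumber classes densely as 0..m-1
 unsigned ClassOf3 = EC[3];       // compressed class number of 3
 unsigned NumClasses = EC.getNumClasses();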

Other Map-Like Container Options

The STL provides several other options, such as std::multimap and the various "hash_map" like containers (whether from C++ TR1 or from the SGI library). We never use hash_map and unordered_map because they are generally very expensive (each insertion requires a malloc) and very non-portable.

std::multimap is useful if you want to map a key to multiple values, but has all the drawbacks of std::map. A sorted vector or some other approach is almost always better.

Bit storage containers (BitVector, SparseBitVector)

Unlike the other containers, there are only two bit storage containers, and choosing when to use each is relatively straightforward.

One additional option is std::vector<bool>: we discourage its use for two reasons: 1) the implementation in many common compilers (e.g. commonly available versions of GCC) is extremely inefficient, and 2) the C++ standards committee is likely to deprecate this container and/or change it significantly somehow. In any case, please don't use it.

BitVector

The BitVector container provides a dynamic size set of bits for manipulation. It supports individual bit setting/testing, as well as set operations. The set operations take time O(size of bitvector), but operations are performed one word at a time, instead of one bit at a time. This makes the BitVector very fast for set operations compared to other containers. Use the BitVector when you expect the number of set bits to be high (i.e., a dense set).
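
A short sketch; the sizes and bit indices are arbitrary:

 #include "llvm/ADT/BitVector.h"

 llvm::BitVector Live(128);       // 128 bits, all initially false
 Live.set(5);
 Live.set(64);
 bool IsLive = Live.test(5);      // individual bit test

 llvm::BitVector Other(128);
 Other.set(64);
 Live &= Other;                   // word-at-a-time set intersection
 unsigned Count = Live.count();   // number of set bits (1 here)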

SmallBitVector

The SmallBitVector container provides the same interface as BitVector, but it is optimized for the case where only a small number of bits, less than 25 or so, are needed. It also transparently supports larger bit counts, but slightly less efficiently than a plain BitVector, so SmallBitVector should only be used when larger counts are rare.

At this time, SmallBitVector does not support set operations (and, or, xor), and its operator[] does not provide an assignable lvalue.

SparseBitVector

The SparseBitVector container is much like BitVector, with one major difference: Only the bits that are set, are stored. This makes the SparseBitVector much more space efficient than BitVector when the set is sparse, as well as making set operations O(number of set bits) instead of O(size of universe). The downside to the SparseBitVector is that setting and testing of random bits is O(N), and on large SparseBitVectors, this can be slower than BitVector. In our implementation, setting or testing bits in sorted order (either forwards or reverse) is O(1) worst case. Testing and setting bits within 128 bits (depends on size) of the current bit is also O(1). As a general statement, testing/setting bits in a SparseBitVector is O(distance away from last set bit).
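
A brief sketch; the bit indices are arbitrary:

 #include "llvm/ADT/SparseBitVector.h"

 llvm::SparseBitVector<> Reachable;   // only the set bits consume memory
 Reachable.set(5);
 Reachable.set(100000);               // fine: the universe size is implicit
 bool B = Reachable.test(5);

 llvm::SparseBitVector<> Other;
 Other.set(5);
 Reachable &= Other;                  // O(number of set bits)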

Helpful Hints for Common Operations

This section describes how to perform some very simple transformations of LLVM code. This is meant to give examples of common idioms used, showing the practical side of LLVM transformations.

Because this is a "how-to" section, you should also read about the main classes that you will be working with. The Core LLVM Class Hierarchy Reference contains details and descriptions of the main classes that you should know about.

Basic Inspection and Traversal Routines

The LLVM compiler infrastructure has many different data structures that may be traversed. Following the example of the C++ standard template library, the techniques used to traverse these various data structures are all basically the same. For an enumerable sequence of values, the XXXbegin() function (or method) returns an iterator to the start of the sequence, the XXXend() function returns an iterator pointing to one past the last valid element of the sequence, and there is some XXXiterator data type that is common between the two operations.

Because the pattern for iteration is common across many different aspects of the program representation, the standard template library algorithms may be used on them, and it is easier to remember how to iterate. First we show a few common examples of the data structures that need to be traversed. Other data structures are traversed in very similar ways.

Iterating over the BasicBlocks in a Function

It's quite common to have a Function instance that you'd like to transform in some way; in particular, you'd like to manipulate its BasicBlocks. To facilitate this, you'll need to iterate over all of the BasicBlocks that constitute the Function. The following is an example that prints the name of a BasicBlock and the number of Instructions it contains:

 // func is a pointer to a Function instance
 for (Function::iterator i = func->begin(), e = func->end(); i != e; ++i)
   // Print out the name of the basic block if it has one, and then the
   // number of instructions that it contains
   errs() << "Basic block (name=" << i->getName() << ") has "
              << i->size() << " instructions.\n";
 

Note that i can be used as if it were a pointer for the purposes of invoking member functions of the BasicBlock class. This is because the indirection operator is overloaded for the iterator classes. In the above code, the expression i->size() is exactly equivalent to (*i).size() just like you'd expect.

Iterating over the Instructions in a BasicBlock

Just like when dealing with BasicBlocks in Functions, it's easy to iterate over the individual instructions that make up BasicBlocks. Here's a code snippet that prints out each instruction in a BasicBlock:

 // blk is a pointer to a BasicBlock instance
 for (BasicBlock::iterator i = blk->begin(), e = blk->end(); i != e; ++i)
    // The next statement works since operator<<(ostream&,...)
    // is overloaded for Instruction&
    errs() << *i << "\n";
 

However, this isn't really the best way to print out the contents of a BasicBlock! Since the ostream operators are overloaded for virtually anything you'll care about, you could have just invoked the print routine on the basic block itself: errs() << *blk << "\n";.

Iterating over the Instructions in a Function

If you're finding that you commonly iterate over a Function's BasicBlocks and then that BasicBlock's Instructions, InstIterator should be used instead. You'll need to include llvm/Support/InstIterator.h, and then instantiate InstIterators explicitly in your code. Here's a small example that shows how to dump all instructions in a function to the standard error stream:

 #include "llvm/Support/InstIterator.h"
 
 // F is a pointer to a Function instance
 for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
   errs() << *I << "\n";
 

Easy, isn't it? You can also use InstIterators to fill a work list with its initial contents. For example, if you wanted to initialize a work list to contain all instructions in a Function F, all you would need to do is something like:

 std::set<Instruction*> worklist;
 // or better yet, SmallPtrSet<Instruction*, 64> worklist;
 
 for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
    worklist.insert(&*I);
 

The STL set worklist would now contain all instructions in the Function pointed to by F.

Turning an iterator into a class pointer (and vice-versa)

Sometimes, it'll be useful to grab a reference (or pointer) to a class instance when all you've got at hand is an iterator. Well, extracting a reference or a pointer from an iterator is very straight-forward. Assuming that i is a BasicBlock::iterator and j is a BasicBlock::const_iterator:

 Instruction& inst = *i;   // Grab reference to the instruction
 Instruction* pinst = &*i; // Grab pointer to the instruction
 const Instruction& inst = *j;
 

However, the iterators you'll be working with in the LLVM framework are special: they will automatically convert to a ptr-to-instance type whenever they need to. Instead of dereferencing the iterator and then taking the address of the result, you can simply assign the iterator to the proper pointer type and you get the dereference and address-of operation as a result of the assignment (behind the scenes, this is a result of overloading casting mechanisms). Thus the last line of the last example,

 Instruction *pinst = &*i;
 

is semantically equivalent to

 Instruction *pinst = i;
 

It's also possible to turn a class pointer into the corresponding iterator, and this is a constant time operation (very efficient). The following code snippet illustrates use of the conversion constructors provided by LLVM iterators. By using these, you can explicitly grab the iterator of something without actually obtaining it via iteration over some structure:

 void printNextInstruction(Instruction* inst) {
   BasicBlock::iterator it(inst);
   ++it; // After this line, it refers to the instruction after *inst
   if (it != inst->getParent()->end()) errs() << *it << "\n";
 }
 

Unfortunately, these implicit conversions come at a cost; they prevent these iterators from conforming to standard iterator conventions, and thus from being usable with standard algorithms and containers. For example, they prevent the following code, where B is a BasicBlock, from compiling:

   llvm::SmallVector<llvm::Instruction *, 16>(B->begin(), B->end());
 

Because of this, these implicit conversions may be removed some day, and operator* changed to return a pointer instead of a reference.

Finding call sites: a slightly more complex example

Say that you're writing a FunctionPass and would like to count all the locations in the entire module (that is, across every Function) where a certain function (i.e., some Function*) is already in scope. As you'll learn later, you may want to use an InstVisitor to accomplish this in a much more straight-forward manner, but this example will allow us to explore how you'd do it if you didn't have InstVisitor around. In pseudo-code, this is what we want to do:

 initialize callCounter to zero
 for each Function f in the Module
   for each BasicBlock b in f
     for each Instruction i in b
       if (i is a CallInst and calls the given function)
         increment callCounter
 

And the actual code is (remember, because we're writing a FunctionPass, our FunctionPass-derived class simply has to override the runOnFunction method):

 Function* targetFunc = ...;
 
 class OurFunctionPass : public FunctionPass {
   public:
     OurFunctionPass(): callCounter(0) { }
 
     virtual bool runOnFunction(Function& F) {
       for (Function::iterator b = F.begin(), be = F.end(); b != be; ++b) {
         for (BasicBlock::iterator i = b->begin(), ie = b->end(); i != ie; ++i) {
           if (CallInst* callInst = dyn_cast<CallInst>(&*i)) {
             // We know we've encountered a call instruction, so we
             // need to determine if it's a call to the
             // function pointed to by targetFunc or not.
             if (callInst->getCalledFunction() == targetFunc)
               ++callCounter;
           }
         }
       }
       return false;
     }
 
   private:
     unsigned callCounter;
 };
 

Treating calls and invokes the same way

You may have noticed that the previous example was a bit oversimplified in that it did not deal with call sites generated by 'invoke' instructions. In this, and in other situations, you may find that you want to treat CallInsts and InvokeInsts the same way, even though their most-specific common base class is Instruction, which includes lots of less closely-related things. For these cases, LLVM provides a handy wrapper class called CallSite. It is essentially a wrapper around an Instruction pointer, with some methods that provide functionality common to CallInsts and InvokeInsts.

This class has "value semantics": it should be passed by value, not by reference, and it should not be dynamically allocated or deallocated using operator new or operator delete. It is efficiently copyable, assignable and constructable, with costs equivalent to those of a bare pointer. If you look at its definition, it has only a single pointer member.
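
A short sketch of how a CallSite is typically used, assuming I is an Instruction* known to be either a call or an invoke:

 #include "llvm/Support/CallSite.h"

 Instruction *I = ...;              // must be a CallInst or InvokeInst
 llvm::CallSite CS(I);
 Function *Callee = CS.getCalledFunction();   // null for indirect calls
 for (llvm::CallSite::arg_iterator AI = CS.arg_begin(), AE = CS.arg_end();
      AI != AE; ++AI) {
   Value *Arg = *AI;
   // ...
 }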

Iterating over def-use & use-def chains

Frequently, we might have an instance of the Value Class and we want to determine which Users use the Value. The list of all Users of a particular Value is called a def-use chain. For example, let's say we have a Function* named F to a particular function foo. Finding all of the instructions that use foo is as simple as iterating over the def-use chain of F:

 Function *F = ...;
 
 for (Value::use_iterator i = F->use_begin(), e = F->use_end(); i != e; ++i)
   if (Instruction *Inst = dyn_cast<Instruction>(*i)) {
     errs() << "F is used in instruction:\n";
     errs() << *Inst << "\n";
   }
 

Note that dereferencing a Value::use_iterator is not a very cheap operation. Instead of performing *i above several times, consider doing it only once in the loop body and reusing its result.

Alternatively, it's common to have an instance of the User Class and need to know what Values are used by it. The list of all Values used by a User is known as a use-def chain. Instances of class Instruction are common Users, so we might want to iterate over all of the values that a particular instruction uses (that is, the operands of the particular Instruction):

 Instruction *pi = ...;
 
 for (User::op_iterator i = pi->op_begin(), e = pi->op_end(); i != e; ++i) {
   Value *v = *i;
   // ...
 }
 

Declaring objects as const is an important tool for enforcing mutation-free algorithms (such as analyses). For this purpose, the above iterators come in constant flavors as Value::const_use_iterator and Value::const_op_iterator. They automatically arise when calling use/op_begin() on const Value*s or const User*s respectively. Upon dereferencing, they return const Use*s. Otherwise the above patterns remain unchanged.

Iterating over predecessors & successors of blocks

Iterating over the predecessors and successors of a block is quite easy with the routines defined in "llvm/Support/CFG.h". Just use code like this to iterate over all predecessors of BB:

 #include "llvm/Support/CFG.h"
 BasicBlock *BB = ...;
 
 for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
   BasicBlock *Pred = *PI;
   // ...
 }
 

Similarly, to iterate over successors use succ_iterator/succ_begin/succ_end.
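
For example, the successor loop is symmetric to the predecessor loop above:

 BasicBlock *BB = ...;

 for (succ_iterator SI = succ_begin(BB), E = succ_end(BB); SI != E; ++SI) {
   BasicBlock *Succ = *SI;
   // ...
 }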

Making simple changes

There are some primitive transformation operations present in the LLVM infrastructure that are worth knowing about. When performing transformations, it's fairly common to manipulate the contents of basic blocks. This section describes some of the common methods for doing so and gives example code.

Creating and inserting new Instructions

Instantiating Instructions

Creation of Instructions is straight-forward: simply call the constructor for the kind of instruction to instantiate and provide the necessary parameters. For example, an AllocaInst only requires a (const-ptr-to) Type. Thus:

 AllocaInst* ai = new AllocaInst(Type::Int32Ty);
 

will create an AllocaInst instance that represents the allocation of one integer in the current stack frame, at run time. Each Instruction subclass is likely to have varying default parameters which change the semantics of the instruction, so refer to the doxygen documentation for the subclass of Instruction that you're interested in instantiating.

Naming values

It is very useful to name the values of instructions when you're able to, as this facilitates the debugging of your transformations. If you end up looking at generated LLVM machine code, you definitely want to have logical names associated with the results of instructions! By supplying a value for the Name (default) parameter of the Instruction constructor, you associate a logical name with the result of the instruction's execution at run time. For example, say that I'm writing a transformation that dynamically allocates space for an integer on the stack, and that integer is going to be used as some kind of index by some other code. To accomplish this, I place an AllocaInst at the first point in the first BasicBlock of some Function, and I'm intending to use it within the same Function. I might do:

 AllocaInst* pa = new AllocaInst(Type::Int32Ty, 0, "indexLoc");
 

where indexLoc is now the logical name of the instruction's execution value, which is a pointer to an integer on the run time stack.

Inserting instructions

There are essentially two ways to insert an Instruction into an existing sequence of instructions that form a BasicBlock:

  • Insertion into an explicit instruction list

    Given a BasicBlock* pb, an Instruction* pi within that BasicBlock, and a newly-created instruction we wish to insert before *pi, we do the following:

     BasicBlock *pb = ...;
     Instruction *pi = ...;
     Instruction *newInst = new Instruction(...);
     
     pb->getInstList().insert(pi, newInst); // Inserts newInst before pi in pb
     

    Appending to the end of a BasicBlock is so common that the Instruction class and Instruction-derived classes provide constructors which take a pointer to a BasicBlock to be appended to. For example code that looked like:

     BasicBlock *pb = ...;
     Instruction *newInst = new Instruction(...);
     
     pb->getInstList().push_back(newInst); // Appends newInst to pb
     

    becomes:

     BasicBlock *pb = ...;
     Instruction *newInst = new Instruction(..., pb);
     

    which is much cleaner, especially if you are creating long instruction streams.

  • Insertion into an implicit instruction list

    Instruction instances that are already in BasicBlocks are implicitly associated with an existing instruction list: the instruction list of the enclosing basic block. Thus, we could have accomplished the same thing as the above code without being given a BasicBlock by doing:

     Instruction *pi = ...;
     Instruction *newInst = new Instruction(...);
     
     pi->getParent()->getInstList().insert(pi, newInst);
     

    In fact, this sequence of steps occurs so frequently that the Instruction class and Instruction-derived classes provide constructors which take (as a default parameter) a pointer to an Instruction which the newly-created Instruction should precede. That is, Instruction constructors are capable of inserting the newly-created instance into the BasicBlock of a provided instruction, immediately before that instruction. Using an Instruction constructor with an insertBefore (default) parameter, the above code becomes:

     Instruction* pi = ...;
     Instruction* newInst = new Instruction(..., pi);
     

    which is much cleaner, especially if you're creating a lot of instructions and adding them to BasicBlocks.

Deleting Instructions

Deleting an instruction from an existing sequence of instructions that form a BasicBlock is very straight-forward: just call the instruction's eraseFromParent() method. For example:

 Instruction *I = .. ;
 I->eraseFromParent();
 

This unlinks the instruction from its containing basic block and deletes it. If you'd just like to unlink the instruction from its containing basic block but not delete it, you can use the removeFromParent() method.

Replacing an Instruction with another Value

Replacing individual instructions

Including "llvm/Transforms/Utils/BasicBlockUtils.h" permits use of two very useful replace functions: ReplaceInstWithValue and ReplaceInstWithInst.

  • ReplaceInstWithValue

    This function replaces all uses of a given instruction with a value, and then removes the original instruction. The following example illustrates the replacement of the result of a particular AllocaInst that allocates memory for a single integer with a null pointer to an integer.

     AllocaInst* instToReplace = ...;
     BasicBlock::iterator ii(instToReplace);
     
     ReplaceInstWithValue(instToReplace->getParent()->getInstList(), ii,
                          Constant::getNullValue(PointerType::getUnqual(Type::Int32Ty)));
     
  • ReplaceInstWithInst

    This function replaces a particular instruction with another instruction, inserting the new instruction into the basic block at the location where the old instruction was, and replacing any uses of the old instruction with the new instruction. The following example illustrates the replacement of one AllocaInst with another.

     AllocaInst* instToReplace = ...;
     BasicBlock::iterator ii(instToReplace);
     
     ReplaceInstWithInst(instToReplace->getParent()->getInstList(), ii,
                         new AllocaInst(Type::Int32Ty, 0, "ptrToReplacedInt"));
     

Replacing multiple uses of Users and Values

You can use Value::replaceAllUsesWith and User::replaceUsesOfWith to change more than one use at a time. See the doxygen documentation for the Value Class and User Class, respectively, for more information.
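
As a brief sketch (Inst, NewVal and U are placeholders):

 Instruction *Inst = ...;
 Value *NewVal = ...;

 // Rewrite every use of Inst, wherever it occurs, to use NewVal instead.
 Inst->replaceAllUsesWith(NewVal);

 // Alternatively, rewrite the operands of just one User.
 User *U = ...;
 U->replaceUsesOfWith(Inst, NewVal);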

Deleting GlobalVariables

Deleting a global variable from a module is just as easy as deleting an Instruction. First, you must have a pointer to the global variable that you wish to delete. You use this pointer to erase it from its parent, the module. For example:

 GlobalVariable *GV = .. ;
 
 GV->eraseFromParent();
 

How to Create Types

In generating IR, you may need some complex types. If you know these types statically, you can use TypeBuilder<...>::get(), defined in llvm/Support/TypeBuilder.h, to retrieve them. TypeBuilder has two forms depending on whether you're building types for cross-compilation or native library use. TypeBuilder<T, true> requires that T be independent of the host environment, meaning that it's built out of types from the llvm::types namespace and pointers, functions, arrays, etc. built of those. TypeBuilder<T, false> additionally allows native C types whose size may depend on the host compiler. For example,

 FunctionType *ft = TypeBuilder<types::i<8>(types::i<32>*), true>::get();
 

is easier to read and write than the equivalent

 std::vector<const Type*> params;
 params.push_back(PointerType::getUnqual(Type::Int32Ty));
 FunctionType *ft = FunctionType::get(Type::Int8Ty, params, false);
 

See the class comment for more details.

Threads and LLVM

This section describes the interaction of the LLVM APIs with multithreading, both on the part of client applications, and in the JIT, in the hosted application.

Note that LLVM's support for multithreading is still relatively young. Up through version 2.5, the execution of threaded hosted applications was supported, but not threaded client access to the APIs. While this use case is now supported, clients must adhere to the guidelines specified below to ensure proper operation in multithreaded mode.

Note that, on Unix-like platforms, LLVM requires the presence of GCC's atomic intrinsics in order to support threaded operation. If you need a multithreading-capable LLVM on a platform without a suitably modern system compiler, consider compiling LLVM and LLVM-GCC in single-threaded mode, and using the resultant compiler to build a copy of LLVM with multithreading support.

Entering and Exiting Multithreaded Mode

In order to properly protect its internal data structures while avoiding excessive locking overhead in the single-threaded case, LLVM must initialize certain data structures necessary to provide guards around its internals. To do so, the client program must invoke llvm_start_multithreaded() before making any concurrent LLVM API calls. To subsequently tear down these structures, use the llvm_stop_multithreaded() call. You can also use the llvm_is_multithreaded() call to check the status of multithreaded mode.

Note that both of these calls must be made in isolation. That is to say that no other LLVM API calls may be executing at any time during the execution of llvm_start_multithreaded() or llvm_stop_multithreaded(). It is the client's responsibility to enforce this isolation.

The return value of llvm_start_multithreaded() indicates the success or failure of the initialization. Failure typically indicates that your copy of LLVM was built without multithreading support, typically because GCC atomic intrinsics were not found in your system compiler. In this case, the LLVM API will not be safe for concurrent calls. However, it will be safe for hosting threaded applications in the JIT, though care must be taken to ensure that side exits and the like do not accidentally result in concurrent LLVM API calls.
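
A minimal sketch of the expected call sequence; how a client reacts to a failed initialization is up to the application:

 #include "llvm/Support/Threading.h"

 if (!llvm::llvm_start_multithreaded()) {
   // This build of LLVM has no multithreading support; concurrent calls
   // into the LLVM APIs would not be safe.
 }

 // ... concurrent use of the LLVM APIs ...

 llvm::llvm_stop_multithreaded();   // must also be called in isolation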

Ending Execution with llvm_shutdown()

When you are done using the LLVM APIs, you should call llvm_shutdown() to deallocate memory used for internal structures. This will also invoke llvm_stop_multithreaded() if LLVM is operating in multithreaded mode. As such, llvm_shutdown() requires the same isolation guarantees as llvm_stop_multithreaded().

Note that, if you use scope-based shutdown, you can use the llvm_shutdown_obj class, which calls llvm_shutdown() in its destructor.
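
For example, a scope-based sketch might look like this:

 #include "llvm/Support/ManagedStatic.h"

 int main(int argc, char **argv) {
   llvm::llvm_shutdown_obj Y;   // calls llvm_shutdown() when it goes out of scope
   // ... use the LLVM APIs ...
   return 0;
 }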

Lazy Initialization with ManagedStatic

ManagedStatic is a utility class in LLVM used to implement static initialization of static resources, such as the global type tables. Before the invocation of llvm_shutdown(), it implements a simple lazy initialization scheme. Once llvm_start_multithreaded() returns, however, it uses double-checked locking to implement thread-safe lazy initialization.

Note that, because no other threads are allowed to issue LLVM API calls before llvm_start_multithreaded() returns, it is possible to have ManagedStatics of llvm::sys::Mutexs.

The llvm_acquire_global_lock() and llvm_release_global_lock() APIs provide access to the global lock used to implement the double-checked locking for lazy initialization. These should only be used internally to LLVM, and only if you know what you're doing!

Achieving Isolation with LLVMContext

LLVMContext is an opaque class in the LLVM API which clients can use to operate multiple, isolated instances of LLVM concurrently within the same address space. For instance, in a hypothetical compile-server, the compilation of an individual translation unit is conceptually independent from all the others, and it would be desirable to be able to compile incoming translation units concurrently on independent server threads. Fortunately, LLVMContext exists to enable just this kind of scenario!

Conceptually, LLVMContext provides isolation. Every LLVM entity (Modules, Values, Types, Constants, etc.) in LLVM's in-memory IR belongs to an LLVMContext. Entities in different contexts cannot interact with each other: Modules in different contexts cannot be linked together, Functions cannot be added to Modules in different contexts, etc. What this means is that it is safe to compile on multiple threads simultaneously, as long as no two threads operate on entities within the same context.

In practice, very few places in the API require the explicit specification of an LLVMContext, other than the Type creation/lookup APIs. Because every Type carries a reference to its owning context, most other entities can determine what context they belong to by looking at their own Type. If you are adding new entities to LLVM IR, please try to maintain this interface design.

For clients that do not require the benefits of isolation, LLVM provides a convenience API getGlobalContext(). This returns a global, lazily initialized LLVMContext that may be used in situations where isolation is not a concern.
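
As a rough sketch (the module names and context variable are arbitrary):

 #include "llvm/LLVMContext.h"
 #include "llvm/Module.h"

 // One private context per thread gives fully isolated compilation.
 llvm::LLVMContext MyContext;
 llvm::Module *M = new llvm::Module("my_translation_unit", MyContext);

 // Clients that do not need isolation can share the default global context.
 llvm::Module *N = new llvm::Module("other_module", llvm::getGlobalContext());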

Threads and the JIT

LLVM's "eager" JIT compiler is safe to use in threaded programs. Multiple threads can call ExecutionEngine::getPointerToFunction() or ExecutionEngine::runFunction() concurrently, and multiple threads can run code output by the JIT concurrently. The user must still ensure that only one thread accesses IR in a given LLVMContext while another thread might be modifying it. One way to do that is to always hold the JIT lock while accessing IR outside the JIT (the JIT modifies the IR by adding CallbackVHs). Another way is to only call getPointerToFunction() from the LLVMContext's thread.

When the JIT is configured to compile lazily (using ExecutionEngine::DisableLazyCompilation(false)), there is currently a race condition in updating call sites after a function is lazily-jitted. It's still possible to use the lazy JIT in a threaded program if you ensure that only one thread at a time can call any particular lazy stub and that the JIT lock guards any IR access, but we suggest using only the eager JIT in threaded programs.

Advanced Topics

This section describes some of the advanced or obscure APIs that most clients do not need to be aware of. These APIs tend to manage the inner workings of the LLVM system, and only need to be accessed in unusual circumstances.

The ValueSymbolTable class

The ValueSymbolTable class provides a symbol table that the Function and Module classes use for naming value definitions. The symbol table can provide a name for any Value.

Note that the SymbolTable class should not be directly accessed by most clients. It should only be used when iteration over the symbol table names themselves is required, which is very special purpose. Note that not all LLVM Values have names, and those without names (i.e. they have an empty name) do not exist in the symbol table.

Symbol tables support iteration over the values in the symbol table with begin/end/iterator and support querying to see if a specific name is in the symbol table (with lookup). The ValueSymbolTable class exposes no public mutator methods; instead, simply call setName on a value, which will autoinsert it into the appropriate symbol table.

The User and owned Use classes' memory layout

The User class provides a basis for expressing the ownership of User towards other Values. The Use helper class is employed to do the bookkeeping and to facilitate O(1) addition and removal.

Interaction and relationship between User and Use objects

A subclass of User can choose between incorporating its Use objects or referring to them out-of-line by means of a pointer. A mixed variant (some Uses inline, others hung off) is impractical and breaks the invariant that the Use objects belonging to the same User form a contiguous array.

We have 2 different layouts in the User (sub)classes:

  • Layout a) The Use object(s) are inside (resp. at fixed offset) of the User object and there are a fixed number of them.

  • Layout b) The Use object(s) are referenced by a pointer to an array from the User object and there may be a variable number of them.

As of v2.4 each layout still possesses a direct pointer to the start of the array of Uses. Though not mandatory for layout a), we stick to this redundancy for the sake of simplicity. The User object also stores the number of Use objects it has. (Theoretically this information can also be calculated given the scheme presented below.)

Special forms of allocation operators (operator new) enforce the following memory layouts:

  • Layout a) is modelled by prepending the User object by the Use[] array.

     ...---.---.---.---.-------...
       | P | P | P | P | User
     '''---'---'---'---'-------'''
     
  • Layout b) is modelled by pointing at the Use[] array.

     .-------...
     | User
     '-------'''
         |
         v
         .---.---.---.---...
         | P | P | P | P |
         '---'---'---'---'''
     
(In the above figures 'P' stands for the Use** that is stored in each Use object in the member Use::Prev)

The waymarking algorithm

Since the Use objects are deprived of the direct (back)pointer to their User objects, there must be a fast and exact method to recover it. This is accomplished by the following scheme:

A bit-encoding in the 2 LSBits (least significant bits) of the Use::Prev allows finding the start of the User object:
  • 00 —> binary digit 0
  • 01 —> binary digit 1
  • 10 —> stop and calculate (s)
  • 11 —> full stop (S)

Given a Use*, all we have to do is to walk till we get a stop and we either have a User immediately behind or we have to walk to the next stop picking up digits and calculating the offset:

 .---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.----------------
 | 1 | s | 1 | 0 | 1 | 0 | s | 1 | 1 | 0 | s | 1 | 1 | s | 1 | S | User (or User*)
 '---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'----------------
     |+15                |+10            |+6         |+3     |+1
     |                   |               |           |       |__>
     |                   |               |           |__________>
     |                   |               |______________________>
     |                   |______________________________________>
     |__________________________________________________________>
 

Only the significant number of bits need to be stored between the stops, so that the worst case is 20 memory accesses when there are 1000 Use objects associated with a User.

Reference implementation

The following literate Haskell fragment demonstrates the concept:

 > import Test.QuickCheck
 > 
 > digits :: Int -> [Char] -> [Char]
 > digits 0 acc = '0' : acc
 > digits 1 acc = '1' : acc
 > digits n acc = digits (n `div` 2) $ digits (n `mod` 2) acc
 > 
 > dist :: Int -> [Char] -> [Char]
 > dist 0 [] = ['S']
 > dist 0 acc = acc
 > dist 1 acc = let r = dist 0 acc in 's' : digits (length r) r
 > dist n acc = dist (n - 1) $ dist 1 acc
 > 
 > takeLast n ss = reverse $ take n $ reverse ss
 > 
 > test = takeLast 40 $ dist 20 []
 > 
 

Printing <test> gives: "1s100000s11010s10100s1111s1010s110s11s1S"

The reverse algorithm computes the length of the string just by examining a certain prefix:

 > pref :: [Char] -> Int
 > pref "S" = 1
 > pref ('s':'1':rest) = decode 2 1 rest
 > pref (_:rest) = 1 + pref rest
 > 
 > decode walk acc ('0':rest) = decode (walk + 1) (acc * 2) rest
 > decode walk acc ('1':rest) = decode (walk + 1) (acc * 2 + 1) rest
 > decode walk acc _ = walk + acc
 > 
 

Now, as expected, printing <pref test> gives 40.

We can quickCheck this with the following property:

 > testcase = dist 2000 []
 > testcaseLength = length testcase
 > 
 > identityProp n = n > 0 && n <= testcaseLength ==> length arr == pref arr
 >     where arr = takeLast n testcase
 > 
 

As expected <quickCheck identityProp> gives:

 *Main> quickCheck identityProp
 OK, passed 100 tests.
 

Let's be a bit more exhaustive:

 > 
 > deepCheck p = check (defaultConfig { configMaxTest = 500 }) p
 > 
 

And here is the result of <deepCheck identityProp>:

 *Main> deepCheck identityProp
 OK, passed 500 tests.
 

Tagging considerations

To maintain the invariant that the 2 LSBits of each Use** in Use never change after being set up, setters of Use::Prev must re-tag the new Use** on every modification. Accordingly getters must strip the tag bits.

For layout b) instead of the User we find a pointer (User* with LSBit set). Following this pointer brings us to the User. A portable trick ensures that the first bytes of User (if interpreted as a pointer) never have the LSBit set. (Portability relies on the fact that all known compilers place the vptr in the first word of the instances.)

The Core LLVM Class Hierarchy Reference

#include "llvm/Type.h"
doxygen info: Type Class

The Core LLVM classes are the primary means of representing the program being inspected or transformed. The core LLVM classes are defined in header files in the include/llvm/ directory, and implemented in the lib/VMCore directory.

The Type class and Derived Types

Type is a superclass of all type classes. Every Value has a Type. Type cannot be instantiated directly but only through its subclasses. Certain primitive types (VoidType, LabelType, FloatType and DoubleType) have hidden subclasses. They are hidden because they offer no useful functionality beyond what the Type class offers except to distinguish themselves from other subclasses of Type.

All other types are subclasses of DerivedType. Types can be named, but this is not a requirement. There exists exactly one instance of a given shape at any one time. This allows type equality to be performed with address equality of the Type Instance. That is, given two Type* values, the types are identical if the pointers are identical.

Important Public Methods

  • bool isIntegerTy() const: Returns true for any integer type.
  • bool isFloatingPointTy(): Return true if this is one of the five floating point types.
  • bool isSized(): Return true if the type has known size. Things that don't have a size are abstract types, labels and void.

Important Derived Types

IntegerType
Subclass of DerivedType that represents integer types of any bit width. Any bit width between IntegerType::MIN_INT_BITS (1) and IntegerType::MAX_INT_BITS (~8 million) can be represented.
  • static const IntegerType* get(unsigned NumBits): get an integer type of a specific bit width.
  • unsigned getBitWidth() const: Get the bit width of an integer type.
SequentialType
This is subclassed by ArrayType, PointerType and VectorType.
  • const Type * getElementType() const: Returns the type of each of the elements in the sequential type.
ArrayType
This is a subclass of SequentialType and defines the interface for array types.
  • unsigned getNumElements() const: Returns the number of elements in the array.
PointerType
Subclass of SequentialType for pointer types.
VectorType
Subclass of SequentialType for vector types. A vector type is similar to an ArrayType but is distinguished because it is a first class type whereas ArrayType is not. Vector types are used for vector operations and are usually small vectors of an integer or floating point type.
StructType
Subclass of DerivedTypes for struct types.
FunctionType
Subclass of DerivedTypes for function types.
  • bool isVarArg() const: Returns true if it's a vararg function
  • const Type * getReturnType() const: Returns the return type of the function.
  • const Type * getParamType (unsigned i): Returns the type of the ith parameter.
  • const unsigned getNumParams() const: Returns the number of formal parameters.

The Module class

#include "llvm/Module.h"
doxygen info: Module Class

The Module class represents the top level structure present in LLVM programs. An LLVM module is effectively either a translation unit of the original program or a combination of several translation units merged by the linker. The Module class keeps track of a list of Functions, a list of GlobalVariables, and a SymbolTable. Additionally, it contains a few helpful member functions that try to make common operations easy.

Important Public Members of the Module class

  • Module::Module(std::string name = "")

Constructing a Module is easy. You can optionally provide a name for it (probably based on the name of the translation unit).

  • Module::iterator - Typedef for function list iterator
    Module::const_iterator - Typedef for const_iterator.
    begin(), end(), size(), empty()

    These are forwarding methods that make it easy to access the contents of a Module object's Function list.

  • Module::FunctionListType &getFunctionList()

    Returns the list of Functions. This is necessary to use when you need to update the list or perform a complex action that doesn't have a forwarding method.


  • Module::global_iterator - Typedef for global variable list iterator
    Module::const_global_iterator - Typedef for const_iterator.
    global_begin(), global_end(), global_size(), global_empty()

    These are forwarding methods that make it easy to access the contents of a Module object's GlobalVariable list.

  • Module::GlobalListType &getGlobalList()

    Returns the list of GlobalVariables. This is necessary to use when you need to update the list or perform a complex action that doesn't have a forwarding method.



  • Function *getFunction(const std::string &Name, const FunctionType *Ty)

    Look up the specified function in the Module SymbolTable. If it does not exist, return null.

  • Function *getOrInsertFunction(const std::string &Name, const FunctionType *T)

    Look up the specified function in the Module SymbolTable. If it does not exist, add an external declaration for the function and return it.

  • std::string getTypeName(const Type *Ty)

    If there is at least one entry in the SymbolTable for the specified Type, return it. Otherwise return the empty string.

  • bool addTypeName(const std::string &Name, const Type *Ty)

    Insert an entry in the SymbolTable mapping Name to Ty. If there is already an entry for this name, true is returned and the SymbolTable is not modified.

The Value class

#include "llvm/Value.h"
doxygen info: Value Class

The Value class is the most important class in the LLVM Source base. It represents a typed value that may be used (among other things) as an operand to an instruction. There are many different types of Values, such as Constants and Arguments; even Instructions and Functions are Values.

A particular Value may be used many times in the LLVM representation for a program. For example, an incoming argument to a function (represented with an instance of the Argument class) is "used" by every instruction in the function that references the argument. To keep track of this relationship, the Value class keeps a list of all of the Users that are using it (the User class is a base class for all nodes in the LLVM graph that can refer to Values). This use list is how LLVM represents def-use information in the program, and is accessible through the use_* methods, shown below.

Because LLVM is a typed representation, every LLVM Value is typed, and this Type is available through the getType() method. In addition, all LLVM values can be named. The "name" of the Value is a symbolic string printed in the LLVM code:

 %foo = add i32 1, 2
 

The name of this instruction is "foo". NOTE that the name of any value may be missing (an empty string), so names should ONLY be used for debugging (making the source code easier to read, debugging printouts), they should not be used to keep track of values or map between them. For this purpose, use a std::map of pointers to the Value itself instead.

One important aspect of LLVM is that there is no distinction between an SSA variable and the operation that produces it. Because of this, any reference to the value produced by an instruction (or the value available as an incoming argument, for example) is represented as a direct pointer to the instance of the class that represents this value. Although this may take some getting used to, it simplifies the representation and makes it easier to manipulate.

Important Public Members of the Value class

  • Value::use_iterator - Typedef for iterator over the use-list
    Value::const_use_iterator - Typedef for const_iterator over the use-list
    unsigned use_size() - Returns the number of users of the value.
    bool use_empty() - Returns true if there are no users.
    use_iterator use_begin() - Get an iterator to the start of the use-list.
    use_iterator use_end() - Get an iterator to the end of the use-list.
    User *use_back() - Returns the last element in the list.

    These methods are the interface to access the def-use information in LLVM. As with all other iterators in LLVM, the naming conventions follow the conventions defined by the STL.

  • Type *getType() const

    This method returns the Type of the Value.

  • bool hasName() const
    std::string getName() const
    void setName(const std::string &Name)

    This family of methods is used to access and assign a name to a Value, be aware of the precaution above.

  • void replaceAllUsesWith(Value *V)

    This method traverses the use list of a Value changing all Users of the current value to refer to "V" instead. For example, if you detect that an instruction always produces a constant value (for example through constant folding), you can replace all uses of the instruction with the constant like this:

     Inst->replaceAllUsesWith(ConstVal);
     

The User class

#include "llvm/User.h"
doxygen info: User Class
Superclass: Value

The User class is the common base class of all LLVM nodes that may refer to Values. It exposes a list of "Operands" that are all of the Values that the User is referring to. The User class itself is a subclass of Value.

The operands of a User point directly to the LLVM Value that it refers to. Because LLVM uses Static Single Assignment (SSA) form, there can only be one definition referred to, allowing this direct connection. This connection provides the use-def information in LLVM.

Important Public Members of the User class

The User class exposes the operand list in two ways: through an index access interface and through an iterator based interface.

  • Value *getOperand(unsigned i)
    unsigned getNumOperands()

    These two methods expose the operands of the User in a convenient form for direct access.

  • User::op_iterator - Typedef for iterator over the operand list
    op_iterator op_begin() - Get an iterator to the start of the operand list.
    op_iterator op_end() - Get an iterator to the end of the operand list.

    Together, these methods make up the iterator based interface to the operands of a User.
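As a brief illustration (a sketch, not from the original text), the same operands can be visited through either interface; note that dereferencing an op_iterator yields a Use that converts to the operand's Value:

 #include "llvm/User.h"
 #include "llvm/Support/raw_ostream.h"
 using namespace llvm;

 // Walk the operands of U, first by index, then with the iterator form.
 void dumpOperands(User *U) {
   for (unsigned i = 0, e = U->getNumOperands(); i != e; ++i)
     errs() << "operand " << i << ": " << *U->getOperand(i) << "\n";

   for (User::op_iterator OI = U->op_begin(), OE = U->op_end(); OI != OE; ++OI) {
     Value *Operand = *OI;   // the Use converts to the operand's Value*
     errs() << *Operand << "\n";
   }
 }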

The Instruction class

#include "llvm/Instruction.h"
doxygen info: Instruction Class
Superclasses: User, Value

The Instruction class is the common base class for all LLVM instructions. It provides only a few methods, but is a very commonly used class. The primary data tracked by the Instruction class itself is the opcode (instruction type) and the parent BasicBlock the Instruction is embedded into. To represent a specific type of instruction, one of many subclasses of Instruction are used.

Because the Instruction class subclasses the User class, its operands can be accessed in the same way as for other Users (with the getOperand()/getNumOperands() and op_begin()/op_end() methods).

An important file for the Instruction class is the llvm/Instruction.def file. This file contains some meta-data about the various different types of instructions in LLVM. It describes the enum values that are used as opcodes (for example Instruction::Add and Instruction::ICmp), as well as the concrete sub-classes of Instruction that implement the instruction (for example BinaryOperator and CmpInst). Unfortunately, the use of macros in this file confuses doxygen, so these enum values don't show up correctly in the doxygen output.
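For example, a minimal sketch (assuming the 3.0-era headers) that distinguishes an instruction both by its opcode and by its concrete subclass:

 #include "llvm/Instruction.h"
 #include "llvm/Instructions.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/raw_ostream.h"
 using namespace llvm;

 // Classify I either through the opcode enum or through the subclass hierarchy.
 void classify(Instruction *I) {
   if (I->getOpcode() == Instruction::Add)
     errs() << "an integer add\n";
   if (isa<CmpInst>(I))
     errs() << "a comparison (ICmpInst or FCmpInst)\n";
   if (BinaryOperator *BO = dyn_cast<BinaryOperator>(I))
     errs() << "a two-operand instruction: " << *BO << "\n";
 }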

Important Subclasses of the Instruction class

  • BinaryOperator

    This subclass represents all two-operand instructions whose operands must be of the same type, except for the comparison instructions.

  • CastInst

    This subclass is the parent of the 12 casting instructions. It provides common operations on cast instructions.

  • CmpInst

    This subclass represents the two comparison instructions, ICmpInst (integer operands) and FCmpInst (floating point operands).

  • TerminatorInst

    This subclass is the parent of all terminator instructions (those which can terminate a block).

Important Public Members of the Instruction class

  • BasicBlock *getParent()

    Returns the BasicBlock that this Instruction is embedded into.

  • bool mayWriteToMemory()

    Returns true if the instruction writes to memory, i.e., it is a call, free, invoke, or store.

  • unsigned getOpcode()

    Returns the opcode for the Instruction.

  • Instruction *clone() const

    Returns another instance of the specified instruction, identical in all ways to the original except that the instruction has no parent (i.e., it is not embedded into a BasicBlock) and no name.
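For example (a sketch under the 3.0-era API; the helper name cloneInto is illustrative), a clone has to be named and inserted explicitly:

 #include "llvm/Instruction.h"
 #include "llvm/BasicBlock.h"
 using namespace llvm;

 // Copy Orig and append the copy to BB (assumes BB has no terminator yet).
 Instruction *cloneInto(Instruction *Orig, BasicBlock *BB) {
   Instruction *Copy = Orig->clone();
   Copy->setName(Orig->getName());     // the clone starts out unnamed
   BB->getInstList().push_back(Copy);  // the clone starts out parentless
   return Copy;
 }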

The Constant class and subclasses

Constant represents a base class for different types of constants. It is subclassed by ConstantInt, ConstantArray, etc. for representing the various types of Constants. GlobalValue is also a subclass, which represents the address of a global variable or function.

Important Subclasses of Constant

  • ConstantInt : This subclass of Constant represents an integer constant of any width.
    • const APInt& getValue() const: Returns the underlying value of this constant, an APInt value.
    • int64_t getSExtValue() const: Converts the underlying APInt value to an int64_t via sign extension. If the value (not the bit width) of the APInt is too large to fit in an int64_t, an assertion will result. For this reason, use of this method is discouraged.
    • uint64_t getZExtValue() const: Converts the underlying APInt value to a uint64_t via zero extension. If the value (not the bit width) of the APInt is too large to fit in a uint64_t, an assertion will result. For this reason, use of this method is discouraged.
    • static ConstantInt* get(const APInt& Val): Returns the ConstantInt object that represents the value provided by Val. The type is implied as the IntegerType that corresponds to the bit width of Val.
    • static ConstantInt* get(const Type *Ty, uint64_t Val): Returns the ConstantInt object that represents the value provided by Val for integer type Ty.
  • ConstantFP : This class represents a floating point constant.
    • double getValue() const: Returns the underlying value of this constant.
  • ConstantArray : This represents a constant array.
    • const std::vector<Use> &getValues() const: Returns a vector of component constants that make up this array.
  • ConstantStruct : This represents a constant struct.
    • const std::vector<Use> &getValues() const: Returns a vector of component constants that make up this struct.
  • GlobalValue : This represents either a global variable or a function. In either case, the value is a constant fixed address (after linking).
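A short sketch of building integer constants with the ConstantInt interfaces listed above (the exact overload set has shifted between releases; the forms below follow the 3.0-era API and assume an LLVMContext is at hand):

 #include "llvm/Constants.h"
 #include "llvm/Type.h"
 #include "llvm/LLVMContext.h"
 #include "llvm/ADT/APInt.h"
 using namespace llvm;

 void makeConstants(LLVMContext &Context) {
   // i32 42: the type is given explicitly.
   ConstantInt *FortyTwo = ConstantInt::get(Type::getInt32Ty(Context), 42);
   // i128 1: the type is implied by the APInt's bit width.
   ConstantInt *WideOne = ConstantInt::get(Context, APInt(128, 1));
   (void)FortyTwo; (void)WideOne;
 }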

The GlobalValue class

#include "llvm/GlobalValue.h"
doxygen info: GlobalValue Class
Superclasses: Constant, User, Value

Global values (GlobalVariables or Functions) are the only LLVM values that are visible in the bodies of all Functions. Because they are visible at global scope, they are also subject to linking with other globals defined in different translation units. To control the linking process, GlobalValues know their linkage rules. Specifically, GlobalValues know whether they have internal or external linkage, as defined by the LinkageTypes enumeration.

If a GlobalValue has internal linkage (equivalent to being static in C), it is not visible to code outside the current translation unit, and does not participate in linking. If it has external linkage, it is visible to external code, and does participate in linking. In addition to linkage information, GlobalValues keep track of which Module they are currently part of.

Because GlobalValues are memory objects, they are always referred to by their address. As such, the Type of a global is always a pointer to its contents. It is important to remember this when using the GetElementPtrInst instruction, because this pointer must be indexed through first. For example, if you have a GlobalVariable (a subclass of GlobalValue) whose contents are an array of 24 ints, type [24 x i32], then the GlobalVariable is a pointer to that array. Although the address of the first element of this array and the value of the GlobalVariable are the same address, they have different types: the GlobalVariable's value has type [24 x i32]* (a pointer to the array), while the address of the first element has type i32*. Because of this, accessing the elements of a global requires you to index through the pointer with GetElementPtrInst first; see the sketch below, and the LLVM Language Reference Manual, for details.
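As a hedged sketch of the consequence for GetElementPtrInst (the IRBuilder, the GlobalVariable GV of type [24 x i32], and the 3.0-era header location are assumptions, not from the original text), two zero indices are needed: one to step through the pointer and one to select the first array element:

 #include "llvm/GlobalVariable.h"
 #include "llvm/Support/IRBuilder.h"
 using namespace llvm;

 // GV's value has type [24 x i32]*; the result below has type i32*.
 Value *firstElementAddr(IRBuilder<> &Builder, GlobalVariable *GV) {
   Value *Zero = Builder.getInt32(0);
   Value *Idx[] = { Zero, Zero };
   return Builder.CreateGEP(GV, Idx, "first.elem");
 }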

Important Public Members of the GlobalValue class

  • bool hasInternalLinkage() const
    bool hasExternalLinkage() const
    void setInternalLinkage(bool HasInternalLinkage)

    These methods manipulate the linkage characteristics of the GlobalValue.

  • Module *getParent()

    This returns the Module that the GlobalValue is currently embedded into.

The Function class

#include "llvm/Function.h"
doxygen info: Function Class
Superclasses: GlobalValue, Constant, User, Value

The Function class represents a single procedure in LLVM. It is actually one of the more complex classes in the LLVM hierarchy because it must keep track of a large amount of data. The Function class keeps track of a list of BasicBlocks, a list of formal Arguments, and a SymbolTable.

The list of BasicBlocks is the most commonly used part of Function objects. The list imposes an implicit ordering of the blocks in the function, which indicates how the code will be laid out by the backend. Additionally, the first BasicBlock is the implicit entry node for the Function. It is not legal in LLVM to explicitly branch to this initial block. There are no implicit exit nodes, and in fact there may be multiple exit nodes from a single Function. If the BasicBlock list is empty, this indicates that the Function is actually a function declaration: the actual body of the function hasn't been linked in yet.

In addition to a list of BasicBlocks, the Function class also keeps track of the list of formal Arguments that the function receives. This container manages the lifetime of the Argument nodes, just like the BasicBlock list does for the BasicBlocks.

The SymbolTable is a very rarely used LLVM feature that is only used when you have to look up a value by name. Aside from that, the SymbolTable is used internally to make sure that there are no conflicts between the names of Instructions, BasicBlocks, or Arguments in the function body.

Note that Function is a GlobalValue and therefore also a Constant. The value of the function is its address (after linking) which is guaranteed to be constant.

Important Public Members of the Function class

  • Function(const FunctionType *Ty, LinkageTypes Linkage, const std::string &N = "", Module* Parent = 0)

    Constructor used when you need to create new Functions to add to the program. The constructor must specify the type of the function to create and what type of linkage the function should have. The FunctionType argument specifies the formal arguments and return value for the function. The same FunctionType value can be used to create multiple functions. The Parent argument specifies the Module in which the function is defined. If this argument is provided, the function will automatically be inserted into that module's list of functions.

  • bool isDeclaration()

    Return whether or not the Function has a body defined. If the function is "external", it does not have a body, and thus must be resolved by linking with a function defined in a different translation unit.

  • Function::iterator - Typedef for basic block list iterator
    Function::const_iterator - Typedef for const_iterator.
    begin(), end(), size(), empty()

    These are forwarding methods that make it easy to access the contents of a Function object's BasicBlock list.

  • Function::BasicBlockListType &getBasicBlockList()

    Returns the list of BasicBlocks. This is necessary to use when you need to update the list or perform a complex action that doesn't have a forwarding method.

  • Function::arg_iterator - Typedef for the argument list iterator
    Function::const_arg_iterator - Typedef for const_iterator.
    arg_begin(), arg_end(), arg_size(), arg_empty()

    These are forwarding methods that make it easy to access the contents of a Function object's Argument list.

  • Function::ArgumentListType &getArgumentList()

    Returns the list of Arguments. This is necessary to use when you need to update the list or perform a complex action that doesn't have a forwarding method.

  • BasicBlock &getEntryBlock()

    Returns the entry BasicBlock for the function. Because the entry block for the function is always the first block, this returns the first block of the Function.

  • Type *getReturnType()
    FunctionType *getFunctionType()

    These methods inspect the Type of the Function: getReturnType() returns the return type of the function, and getFunctionType() returns the FunctionType of the function itself.

  • SymbolTable *getSymbolTable()

    Return a pointer to the SymbolTable for this Function.
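Putting several of the members listed above together, a minimal sketch (not from the original manual) that inspects a Function might look like this:

 #include "llvm/Function.h"
 #include "llvm/BasicBlock.h"
 #include "llvm/Argument.h"
 #include "llvm/Support/raw_ostream.h"
 using namespace llvm;

 // Report the shape of F: its arguments and the size of each basic block.
 void describeFunction(Function &F) {
   if (F.isDeclaration()) {
     errs() << F.getName() << " has no body\n";
     return;
   }
   for (Function::arg_iterator AI = F.arg_begin(), AE = F.arg_end();
        AI != AE; ++AI)
     errs() << "argument: " << *AI << "\n";
   for (Function::iterator BB = F.begin(), BE = F.end(); BB != BE; ++BB)
     errs() << "block with " << BB->size() << " instructions\n";
 }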

The GlobalVariable class

#include "llvm/GlobalVariable.h"
doxygen info: GlobalVariable Class
Superclasses: GlobalValue, Constant, User, Value

Global variables are represented with the (surprise surprise) GlobalVariable class. Like functions, GlobalVariables are also subclasses of GlobalValue, and as such are always referenced by their address (global values must live in memory, so their "name" refers to their constant address). See GlobalValue for more on this. Global variables may have an initial value (which must be a Constant), and if they have an initializer, they may be marked as "constant" themselves (indicating that their contents never change at runtime).

Important Public Members of the GlobalVariable class

  • GlobalVariable(const Type *Ty, bool isConstant, LinkageTypes& Linkage, Constant *Initializer = 0, const std::string &Name = "", Module* Parent = 0)

    Create a new global variable of the specified type. If isConstant is true then the global variable will be marked as unchanging for the program. The Linkage parameter specifies the type of linkage (internal, external, weak, linkonce, appending) for the variable. If the linkage is InternalLinkage, WeakAnyLinkage, WeakODRLinkage, LinkOnceAnyLinkage or LinkOnceODRLinkage,  then the resultant global variable will have internal linkage. AppendingLinkage concatenates together all instances (in different translation units) of the variable into a single variable but is only applicable to arrays.  See the LLVM Language Reference for further details on linkage types. Optionally an initializer, a name, and the module to put the variable into may be specified for the global variable as well.

  • bool isConstant() const

    Returns true if this is a global variable that is known not to be modified at runtime.

  • bool hasInitializer()

    Returns true if this GlobalVariable has an initializer.

  • Constant *getInitializer()

    Returns the initial value for a GlobalVariable. It is not legal to call this method if there is no initializer.
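A brief sketch of creating a global with an initializer (the constructor signature has varied across releases; the form below follows the 3.0-era overload that also takes the owning Module):

 #include "llvm/Module.h"
 #include "llvm/GlobalVariable.h"
 #include "llvm/Constants.h"
 #include "llvm/Type.h"
 #include "llvm/LLVMContext.h"
 using namespace llvm;

 // Create "@answer = internal constant i32 42" inside M.
 GlobalVariable *makeAnswer(Module &M) {
   LLVMContext &Ctx = M.getContext();
   Constant *Init = ConstantInt::get(Type::getInt32Ty(Ctx), 42);
   return new GlobalVariable(M, Type::getInt32Ty(Ctx), /*isConstant=*/true,
                             GlobalValue::InternalLinkage, Init, "answer");
 }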

The BasicBlock class

#include "llvm/BasicBlock.h"
doxygen info: BasicBlock Class
Superclass: Value

This class represents a single entry single exit section of the code, commonly known as a basic block by the compiler community. The BasicBlock class maintains a list of Instructions, which form the body of the block. Matching the language definition, the last element of this list of instructions is always a terminator instruction (a subclass of the TerminatorInst class).

In addition to tracking the list of instructions that make up the block, the BasicBlock class also keeps track of the Function that it is embedded into.

Note that BasicBlocks themselves are Values, because they are referenced by instructions like branches and can go in the switch tables. BasicBlocks have type label.

Important Public Members of the BasicBlock class

  • BasicBlock(const std::string &Name = "", Function *Parent = 0)

    The BasicBlock constructor is used to create new basic blocks for insertion into a function. The constructor optionally takes a name for the new block, and a Function to insert it into. If the Parent parameter is specified, the new BasicBlock is automatically inserted at the end of the specified Function; if not specified, the BasicBlock must be manually inserted into the Function.

  • BasicBlock::iterator - Typedef for instruction list iterator
    BasicBlock::const_iterator - Typedef for const_iterator.
    begin(), end(), front(), back(), size(), empty() - STL-style functions for accessing the instruction list.

    These methods and typedefs are forwarding functions that have the same semantics as the standard library methods of the same names. These methods expose the underlying instruction list of a basic block in a way that is easy to manipulate. To get the full complement of container operations (including operations to update the list), you must use the getInstList() method.

  • BasicBlock::InstListType &getInstList()

    This method is used to get access to the underlying container that actually holds the Instructions. This method must be used when there isn't a forwarding function in the BasicBlock class for the operation that you would like to perform. Because there are no forwarding functions for "updating" operations, you need to use this if you want to update the contents of a BasicBlock.

  • Function *getParent()

    Returns a pointer to the Function the block is embedded into, or a null pointer if it is homeless.

  • TerminatorInst *getTerminator()

    Returns a pointer to the terminator instruction that appears at the end of the BasicBlock. If there is no terminator instruction, or if the last instruction in the block is not a terminator, then a null pointer is returned.
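A small sketch combining these members (BasicBlock::Create is the 3.0-era factory corresponding to the constructor described above; the helper addExitBlock is illustrative):

 #include "llvm/BasicBlock.h"
 #include "llvm/Function.h"
 #include "llvm/Instructions.h"
 #include "llvm/LLVMContext.h"
 #include "llvm/Support/raw_ostream.h"
 using namespace llvm;

 // Append an empty block named "exit" to F, then examine each block's terminator.
 void addExitBlock(Function *F, LLVMContext &Ctx) {
   BasicBlock *Exit = BasicBlock::Create(Ctx, "exit", F);
   (void)Exit;   // a real pass would now populate and terminate this block
   for (Function::iterator BB = F->begin(), BE = F->end(); BB != BE; ++BB) {
     if (TerminatorInst *TI = BB->getTerminator())
       errs() << BB->getName() << " ends with: " << *TI << "\n";
     else
       errs() << BB->getName() << " is not yet terminated\n";
   }
 }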

The Argument class

This subclass of Value defines the interface for incoming formal arguments to a function. A Function maintains a list of its formal arguments. An argument has a pointer to the parent Function.


Dinakar Dhurjati and Chris Lattner
The LLVM Compiler Infrastructure
- Last modified: $Date: 2011-10-11 08:33:56 +0200 (Tue, 11 Oct 2011) $
+ Last modified: $Date: 2011-11-03 07:43:54 +0100 (Thu, 03 Nov 2011) $
Index: vendor/llvm/dist/docs/Projects.html =================================================================== --- vendor/llvm/dist/docs/Projects.html (revision 228363) +++ vendor/llvm/dist/docs/Projects.html (revision 228364) @@ -1,488 +1,489 @@ + Creating an LLVM Project

Creating an LLVM Project

  1. Overview
  2. Create a project from the Sample Project
  3. Source tree layout
  4. Writing LLVM-style Makefiles
    1. Required Variables
    2. Variables for Building Subdirectories
    3. Variables for Building Libraries
    4. Variables for Building Programs
    5. Miscellaneous Variables
  5. Placement of object code
  6. Further help

Written by John Criswell

Overview

The LLVM build system is designed to facilitate the building of third party projects that use LLVM header files, libraries, and tools. In order to use these facilities, a Makefile from a project must do the following things:

  1. Set make variables. There are several variables that a Makefile needs to set to use the LLVM build system:
    • PROJECT_NAME - The name by which your project is known.
    • LLVM_SRC_ROOT - The root of the LLVM source tree.
    • LLVM_OBJ_ROOT - The root of the LLVM object tree.
    • PROJ_SRC_ROOT - The root of the project's source tree.
    • PROJ_OBJ_ROOT - The root of the project's object tree.
    • PROJ_INSTALL_ROOT - The root installation directory.
    • LEVEL - The relative path from the current directory to the project's root ($PROJ_OBJ_ROOT).
  2. Include Makefile.config from $(LLVM_OBJ_ROOT).
  3. Include Makefile.rules from $(LLVM_SRC_ROOT).

There are two ways that you can set all of these variables:

  1. You can write your own Makefiles which hard-code these values.
  2. You can use the pre-made LLVM sample project. This sample project includes Makefiles, a configure script that can be used to configure the location of LLVM, and the ability to support multiple object directories from a single source directory.

This document assumes that you will base your project on the LLVM sample project found in llvm/projects/sample. If you want to devise your own build system, studying the sample project and LLVM Makefiles will probably provide enough information on how to write your own Makefiles.

Create a Project from the Sample Project

Follow these simple steps to start your project:

  1. Copy the llvm/projects/sample directory to any place of your choosing and rename the copy to match the name of your project.
  2. If you downloaded LLVM using Subversion, remove all the directories named .svn (and all the files therein) from your project's new source tree. This will keep Subversion from thinking that your project is inside llvm/trunk/projects/sample.
  3. Add your source code and Makefiles to your source tree.
  4. If you want your project to be configured with the configure script then you need to edit autoconf/configure.ac as follows:
    • AC_INIT. Place the name of your project, its version number and a contact email address for your project as the arguments to this macro.
    • AC_CONFIG_AUX_DIR. If your project isn't in the llvm/projects directory then you might need to adjust this so that it specifies a relative path to the llvm/autoconf directory.
    • LLVM_CONFIG_PROJECT. Just leave this alone.
    • AC_CONFIG_SRCDIR. Specify a path to a file name that identifies your project, or just leave it set to Makefile.common.in.
    • AC_CONFIG_FILES. Do not change.
    • AC_CONFIG_MAKEFILE. Use one of these macros for each Makefile that your project uses. This macro arranges for your makefiles to be copied from the source directory, unmodified, to the build directory.
  5. After updating autoconf/configure.ac, regenerate the configure script with these commands:

    % cd autoconf
    % ./AutoRegen.sh

    You must be using Autoconf version 2.59 or later and your aclocal version should be 1.9 or later.

  6. Run configure in the directory in which you want to place object code. Use the following options to tell your project where it can find LLVM:
    --with-llvmsrc=<directory>
    Tell your project where the LLVM source tree is located.

    --with-llvmobj=<directory>
    Tell your project where the LLVM object tree is located.

    --prefix=<directory>
    Tell your project where it should get installed.

That's it! Now all you have to do is type gmake (or make if you're on a GNU/Linux system) in the root of your object directory, and your project should build.

Source Tree Layout

In order to use the LLVM build system, you will want to organize your source code so that it can benefit from the build system's features. Mainly, you want your source tree layout to look similar to the LLVM source tree layout. The best way to do this is to just copy the project tree from llvm/projects/sample and modify it to meet your needs, but you can certainly add to it if you want.

Underneath your top level directory, you should have the following directories:

lib
This subdirectory should contain all of your library source code. For each library that you build, you will have one directory in lib that will contain that library's source code.

Libraries can be object files, archives, or dynamic libraries. The lib directory is just a convenient place for libraries as it places them all in a directory from which they can be linked later.

include
This subdirectory should contain any header files that are global to your project. By global, we mean that they are used by more than one library or executable of your project.

By placing your header files in include, they will be found automatically by the LLVM build system. For example, if you have a file include/jazz/note.h, then your source files can include it simply with #include "jazz/note.h".

tools
This subdirectory should contain all of your source code for executables. For each program that you build, you will have one directory in tools that will contain that program's source code.

test
This subdirectory should contain tests that verify that your code works correctly. Automated tests are especially useful.

Currently, the LLVM build system provides basic support for tests. The LLVM system provides the following:

  • LLVM provides a tcl procedure that is used by Dejagnu to run tests. It can be found in llvm/lib/llvm-dg.exp. This test procedure uses RUN lines in the actual test case to determine how to run the test. See the TestingGuide for more details. You can easily write Makefile support similar to the Makefiles in llvm/test to use Dejagnu to run your project's tests.
  • LLVM contains an optional package called llvm-test which provides benchmarks and programs that are known to compile with the LLVM GCC front ends. You can use these programs to test your code, gather statistics information, and compare it to the current LLVM performance statistics.
    Currently, there is no way to hook your tests directly into the llvm/test testing harness. You will simply need to find a way to use the source provided within that directory on your own.

Typically, you will want to build your lib directory first followed by your tools directory.

Writing LLVM Style Makefiles

The LLVM build system provides a convenient way to build libraries and executables. Most of your project Makefiles will only need to define a few variables. Below is a list of the variables one can set and what they can do:

Required Variables

LEVEL
This variable is the relative path from this Makefile to the top directory of your project's source code. For example, if your source code is in /tmp/src, then the Makefile in /tmp/src/jump/high would set LEVEL to "../..".

Variables for Building Subdirectories

DIRS
This is a space separated list of subdirectories that should be built. They will be built, one at a time, in the order specified.

PARALLEL_DIRS
This is a list of directories that can be built in parallel. These will be built after the directories in DIRS have been built.

OPTIONAL_DIRS
This is a list of directories that can be built if they exist, but will not cause an error if they do not exist. They are built serially in the order in which they are listed.

Variables for Building Libraries

LIBRARYNAME
This variable contains the base name of the library that will be built. For example, to build a library named libsample.a, LIBRARYNAME should be set to sample.

BUILD_ARCHIVE
By default, a library is a .o file that is linked directly into a program. To build an archive (also known as a static library), set the BUILD_ARCHIVE variable.

SHARED_LIBRARY
If SHARED_LIBRARY is defined in your Makefile, a shared (or dynamic) library will be built.

Variables for Building Programs

TOOLNAME
This variable contains the name of the program that will be built. For example, to build an executable named sample, TOOLNAME should be set to sample.

USEDLIBS
This variable holds a space separated list of libraries that should be linked into the program. These libraries must be libraries that come from your lib directory. The libraries must be specified without their "lib" prefix. For example, to link libsample.a, you would set USEDLIBS to sample.a.

Note that this works only for statically linked libraries.

LLVMLIBS
This variable holds a space separated list of libraries that should be linked into the program. These libraries must be LLVM libraries. The libraries must be specified without their "lib" prefix. For example, to link with a driver that performs an IR transformation you might set LLVMLIBS to this minimal set of libraries LLVMSupport.a LLVMCore.a LLVMBitReader.a LLVMAsmParser.a LLVMAnalysis.a LLVMTransformUtils.a LLVMScalarOpts.a LLVMTarget.a.

Note that this works only for statically linked libraries. LLVM is split into a large number of static libraries, and the list of libraries you require may be much longer than the list above. To see a full list of libraries use: llvm-config --libs all. Using LINK_COMPONENTS, as described below, obviates the need to set LLVMLIBS.

LINK_COMPONENTS
This variable holds a space separated list of components that the LLVM Makefiles pass to the llvm-config tool to generate a link line for the program. For example, to link with all LLVM libraries use LINK_COMPONENTS = all.

LIBS
To link dynamic libraries, add -l<library base name> to the LIBS variable. The LLVM build system will look in the same places for dynamic libraries as it does for static libraries.

For example, to link libsample.so, you would have the following line in your Makefile:

LIBS += -lsample

Note that LIBS must occur in the Makefile after the inclusion of Makefile.common.

Miscellaneous Variables

ExtraSource
This variable contains a space separated list of extra source files that need to be built. It is useful for including the output of Lex and Yacc programs.

CFLAGS
CPPFLAGS
These variables can be used to add options to the C and C++ compilers, respectively. They are typically used to add options that tell the compiler the location of additional directories to search for header files.

It is highly suggested that you append to CFLAGS and CPPFLAGS as opposed to overwriting them. The master Makefiles may already have useful options in them that you may not want to overwrite.

Placement of Object Code

The final location of built libraries and executables will depend upon whether you do a Debug, Release, or Profile build.

Libraries
All libraries (static and dynamic) will be stored in PROJ_OBJ_ROOT/<type>/lib, where type is Debug, Release, or Profile for a debug, optimized, or profiled build, respectively.

Executables
All executables will be stored in PROJ_OBJ_ROOT/<type>/bin, where type is Debug, Release, or Profile for a debug, optimized, or profiled build, respectively.

Further Help

If you have any questions or need any help creating an LLVM project, the LLVM team would be more than happy to help. You can always post your questions to the LLVM Developers Mailing List.


John Criswell
The LLVM Compiler Infrastructure
- Last modified: $Date: 2011-06-03 04:20:48 +0200 (Fri, 03 Jun 2011) $
+ Last modified: $Date: 2011-11-03 07:43:23 +0100 (Thu, 03 Nov 2011) $
Index: vendor/llvm/dist/docs/ReleaseNotes.html =================================================================== --- vendor/llvm/dist/docs/ReleaseNotes.html (revision 228363) +++ vendor/llvm/dist/docs/ReleaseNotes.html (revision 228364) @@ -1,1103 +1,1349 @@ LLVM 3.0 Release Notes

LLVM 3.0 Release Notes

LLVM Dragon Logo
  1. Introduction
  2. Sub-project Status Update
  3. External Projects Using LLVM 3.0
  4. What's New in LLVM 3.0?
  5. Installation Instructions
  6. Known Problems
  7. Additional Information

Written by the LLVM Team

Introduction

This document contains the release notes for the LLVM Compiler Infrastructure, release 3.0. Here we describe the status of LLVM, including major improvements from the previous release and significant known problems. All LLVM releases may be downloaded from the LLVM releases web site.

For more information about LLVM, including information about the latest release, please check out the main LLVM web site. If you have questions or comments, the LLVM Developer's Mailing List is a good place to send them.

Note that if you are reading this file from a Subversion checkout or the main LLVM web page, this document applies to the next release, not the current one. To see the release notes for a specific release, please see the releases page.

Sub-project Status Update

The LLVM 3.0 distribution currently consists of code from the core LLVM repository (which roughly includes the LLVM optimizers, code generators and supporting tools), the Clang repository and the llvm-gcc repository. In addition to this code, the LLVM Project includes other sub-projects that are in development. Here we include updates on these subprojects.

Clang: C/C++/Objective-C Frontend Toolkit

Clang is an LLVM front end for the C, C++, and Objective-C languages. Clang aims to provide a better user experience through expressive diagnostics, a high level of conformance to language standards, fast compilation, and low memory use. Like LLVM, Clang provides a modular, library-based architecture that makes it suitable for creating or integrating with other development tools. Clang is considered a production-quality compiler for C, Objective-C, C++ and Objective-C++ on x86 (32- and 64-bit), and for darwin/arm targets.

In the LLVM 3.0 time-frame, the Clang team has made many improvements:

If Clang rejects your code but another compiler accepts it, please take a look at the language compatibility guide to make sure this is not intentional or a known issue.

DragonEgg: GCC front-ends, LLVM back-end

DragonEgg is a gcc plugin that replaces GCC's optimizers and code generators with LLVM's. Currently it requires a patched version of gcc-4.5. The plugin can target the x86-32 and x86-64 processor families and has been used successfully on the Darwin, FreeBSD and Linux platforms. The Ada, C, C++ and Fortran languages work well. The plugin is capable of compiling plenty of Obj-C, Obj-C++ and Java but it is not known whether the compiled code actually works or not!

The 3.0 release has the following notable changes:

compiler-rt: Compiler Runtime Library

The new LLVM compiler-rt project is a simple library that provides an implementation of the low-level target-specific hooks required by code generation and other runtime components. For example, when compiling for a 32-bit target, converting a double to a 64-bit unsigned integer is compiled into a runtime call to the "__fixunsdfdi" function. The compiler-rt library provides highly optimized implementations of this and other low-level routines (some are 3x faster than the equivalent libgcc routines).

In the LLVM 3.0 timeframe,

LLDB: Low Level Debugger

LLDB is a brand new member of the LLVM umbrella of projects. LLDB is a next generation, high-performance debugger. It is built as a set of reusable components which highly leverage existing libraries in the larger LLVM Project, such as the Clang expression parser, the LLVM disassembler and the LLVM JIT.

LLDB has advanced by leaps and bounds in the 3.0 timeframe. It is dramatically more stable and useful, and includes both a new tutorial and a side-by-side comparison with GDB.

libc++: C++ Standard Library

libc++ is another new member of the LLVM family. It is an implementation of the C++ standard library, written from the ground up to specifically target the forthcoming C++'0X standard and focus on delivering great performance.

In the LLVM 3.0 timeframe,


Like compiler_rt, libc++ is now dual licensed under the MIT and UIUC license, allowing it to be used more permissively.

LLBrowse: IR Browser

LLBrowse is an interactive viewer for LLVM modules. It can load any LLVM module and displays its contents as an expandable tree view, facilitating an easy way to inspect types, functions, global variables, or metadata nodes. It is fully cross-platform, being based on the popular wxWidgets GUI toolkit.

VMKit

The VMKit project is an implementation of a Java Virtual Machine (Java VM or JVM) that uses LLVM for static and just-in-time compilation. As of LLVM 3.0, VMKit now supports generational garbage collectors. The garbage collectors are provided by the MMTk framework, and VMKit can be configured to use one of the numerous implemented collectors of MMTk.

External Open Source Projects Using LLVM 3.0

An exciting aspect of LLVM is that it is used as an enabling technology for a lot of other language and tools projects. This section lists some of the projects that have already been updated to work with LLVM 3.0.

AddressSanitizer

AddressSanitizer uses compiler instrumentation and a specialized malloc library to find C/C++ bugs such as use-after-free and out-of-bound accesses to heap, stack, and globals. The key feature of the tool is speed: the average slowdown introduced by AddressSanitizer is less than 2x.

ClamAV

Clam AntiVirus is an open source (GPL) anti-virus toolkit for UNIX, designed especially for e-mail scanning on mail gateways.

Since version 0.96 it has bytecode signatures that allow writing detections for complex malware.

It uses LLVM's JIT to speed up the execution of bytecode on X86, X86-64, PPC32/64, falling back to its own interpreter otherwise. The git version was updated to work with LLVM 3.0.

Crack Programming Language

Crack aims to provide the ease of development of a scripting language with the performance of a compiled language. The language derives concepts from C++, Java and Python, incorporating object-oriented programming, operator overloading and strong typing.

clReflect

clReflect is a C++ parser that uses clang/LLVM to derive a light-weight reflection database suitable for use in game development. It comes with a very simple runtime library for loading and querying the database, requiring no external dependencies (including CRT), and an additional utility library for object management and serialisation.

Cling C++ Interpreter

Cling is an interactive compiler interface (aka C++ interpreter). It uses LLVM's JIT and clang; it currently supports C++ and C. It has a prompt interface, runs source files, calls into shared libraries, prints the value of expressions, even does runtime lookup of identifiers (dynamic scopes). And it just behaves like one would expect from an interpreter.

Glasgow Haskell Compiler (GHC)

GHC is an open source, state-of-the-art programming suite for Haskell, a standard lazy functional programming language. It includes an optimizing static compiler generating good code for a variety of platforms, together with an interactive system for convenient, quick development.

GHC 7.0 and onwards include an LLVM code generator, supporting LLVM 2.8 and later. Since LLVM 2.9, GHC now includes experimental support for the ARM platform with LLVM 3.0.

gwXscript

gwXscript is an object oriented, aspect oriented programming language which can create both executables (ELF, EXE) and shared libraries (DLL, SO, DYNLIB). The compiler is implemented in its own language and translates scripts into LLVM-IR which can be optimized and translated into native code by the LLVM framework. Source code in gwScript contains definitions that expand the namespaces. So you can build your project and simply 'plug out' features by removing a file. The remaining project does not leave scars since you directly separate concerns by the 'template' feature of gwX. It is also possible to add new features to a project by just adding files and without editing the original project. This language is used for example to create games or content management systems that should be extendable.

gwXscript is strongly typed and offers comfort with its native types string, hash and array. You can easily write new libraries in gwXscript or native code. gwXscript is type safe and users should not be able to crash your program or execute malicious code except code that is eating CPU time.

include-what-you-use

include-what-you-use is a tool to ensure that a file directly #includes all .h files that provide a symbol that the file uses. It also removes superfluous #includes from source files.

LanguageKit and Pragmatic Smalltalk

LanguageKit is a framework for implementing dynamic languages sharing an object model with Objective-C. It provides static and JIT compilation using LLVM along with its own interpreter. Pragmatic Smalltalk is a dialect of Smalltalk, built on top of LanguageKit, that interfaces directly with Objective-C, sharing the same object representation and message sending behaviour. These projects are developed as part of the Étoilé desktop environment.

LuaAV

LuaAV is a real-time audiovisual scripting environment based around the Lua language and a collection of libraries for sound, graphics, and other media protocols. LuaAV uses LLVM and Clang to JIT compile efficient user-defined audio synthesis routines specified in a declarative syntax.

Mono

An open source, cross-platform implementation of C# and the CLR that is binary compatible with Microsoft.NET. Has an optional, dynamically-loaded LLVM code generation backend in Mini, the JIT compiler.

Note that we use a Git mirror of LLVM with some patches. See: https://github.com/mono/llvm

Portable OpenCL (pocl)

Portable OpenCL is an open source implementation of the OpenCL standard which can be easily adapted for new targets. One of the goals of the project is improving performance portability of OpenCL programs, avoiding the need for target-dependent manual optimizations. A "native" target is included, which allows running OpenCL kernels on the host (CPU).

Pure

Pure is an algebraic/functional programming language based on term rewriting. Programs are collections of equations which are used to evaluate expressions in a symbolic fashion. The interpreter uses LLVM as a backend to JIT-compile Pure programs to fast native code. Pure offers dynamic typing, eager and lazy evaluation, lexical closures, a hygienic macro system (also based on term rewriting), built-in list and matrix support (including list and matrix comprehensions) and an easy-to-use interface to C and other programming languages (including the ability to load LLVM bitcode modules, and inline C, C++, Fortran and Faust code in Pure programs if the corresponding LLVM-enabled compilers are installed).

Pure version 0.48 has been tested and is known to work with LLVM 3.0 (and continues to work with older LLVM releases >= 2.5).

Renderscript

Renderscript is Android's advanced 3D graphics rendering and compute API. It provides a portable C99-based language with extensions to facilitate common use cases for enhancing graphics and thread level parallelism. The Renderscript compiler frontend is based on Clang/LLVM. It emits a portable bitcode format for the actual compiled script code, as well as reflects a Java interface for developers to control the execution of the compiled bitcode. Executable machine code is then generated from this bitcode by an LLVM backend on the device. Renderscript is thus able to provide a mechanism by which Android developers can improve performance of their applications while retaining portability.

SAFECode

SAFECode is a memory safe C/C++ compiler built using LLVM. It takes standard, unannotated C/C++ code, analyzes the code to ensure that memory accesses and array indexing operations are safe, and instruments the code with run-time checks when safety cannot be proven statically. SAFECode can be used as a debugging aid (like Valgrind) to find and repair memory safety bugs. It can also be used to protect code from security attacks at run-time.

The Stupid D Compiler (SDC)

The Stupid D Compiler is a project seeking to write a self-hosting compiler for the D programming language without using the frontend of the reference compiler (DMD).

TTA-based Co-design Environment (TCE)


TCE is a toolset for designing application-specific processors (ASP) based on the Transport triggered architecture (TTA). The toolset provides a complete co-design flow from C/C++ programs down to synthesizable VHDL and parallel program binaries. Processor customization points include the register files, function units, supported operations, and the interconnection network.

TCE uses Clang and LLVM for C/C++ language support, target independent optimizations and also for parts of code generation. It generates new LLVM-based code generators "on the fly" for the designed TTA processors and loads them in to the compiler backend as runtime libraries to avoid per-target recompilation of larger parts of the compiler chain.

PinaVM

PinaVM is an open source, SystemC front-end. Unlike many other front-ends, PinaVM actually executes the elaboration of the program analyzed using LLVM's JIT infrastructure. It later enriches the bitcode with SystemC-specific information.

Tart Programming Language

Tart is a general-purpose, strongly typed programming language designed for application developers. Strongly inspired by Python and C#, Tart focuses on practical solutions for the professional software developer, while avoiding the clutter and boilerplate of legacy languages like Java and C++. Although Tart is still in development, the current implementation supports many features expected of a modern programming language, such as garbage collection, powerful bidirectional type inference, a greatly simplified syntax for template metaprogramming, closures and function literals, reflection, operator overloading, explicit mutability and immutability, and much more. Tart is flexible enough to accommodate a broad range of programming styles and philosophies, while maintaining a strong commitment to simplicity, minimalism and elegance in design.

ThreadSanitizer

ThreadSanitizer is a data race detector for (mostly) C and C++ code, available for Linux, Mac OS and Windows. On different systems, we use binary instrumentation frameworks (Valgrind and Pin) as frontends that generate the program events for the race detection algorithm. On Linux, there's an option of using LLVM-based compile-time instrumentation.

IcedTea Java Virtual Machine Implementation

IcedTea provides a harness to build OpenJDK using only free software build tools and to provide replacements for the not-yet free parts of OpenJDK. One of the extensions that IcedTea provides is a new JIT compiler named Shark which uses LLVM to provide native code generation without introducing processor-dependent code.

OpenJDK 7 b112, IcedTea6 1.9 and IcedTea7 1.13 and later have been tested and are known to work with LLVM 3.0 (and continue to work with older LLVM releases >= 2.6 as well).

The ZooLib C++ Cross-Platform Application Framework

ZooLib is Open Source under the MIT License. It provides GUI, filesystem access, TCP networking, thread-safe memory management, threading and locking for Mac OS X, Classic Mac OS, Microsoft Windows, POSIX operating systems with X11, BeOS, Haiku, Apple's iOS and Research in Motion's BlackBerry.

My current work is to use CLang's static analyzer to improve ZooLib's code quality. I also plan to set up LLVM compiles of the demo programs and test programs using CLang and LLVM on all the platforms that CLang, LLVM and ZooLib all support.

What's New in LLVM 3.0?

This release includes a huge number of bug fixes, performance tweaks and minor improvements. Some of the major improvements and new features are listed in this section.

Major New Features

LLVM 3.0 includes several major new capabilities:

LLVM IR and Core Improvements

LLVM IR has several new features for better support of new targets and that expose new optimization opportunities:

One of the biggest changes is that 3.0 has a new exception handling system. The old system used LLVM intrinsics to convey the exception handling information to the code generator. It worked in most cases, but not all. Inlining was especially difficult to get right. Also, the intrinsics could be moved away from the invoke instruction, making it hard to recover that information.

The new EH system makes exception handling a first-class member of the IR. It adds two new instructions:

Converting from the old EH API to the new EH API is rather simple, because a lot of complexity has been removed. The two intrinsics, @llvm.eh.exception and @llvm.eh.selector, have been superseded by the landingpad instruction. Instead of generating a call to @llvm.eh.exception and @llvm.eh.selector:

 Function *ExcIntr = Intrinsic::getDeclaration(TheModule,
                                               Intrinsic::eh_exception);
 Function *SlctrIntr = Intrinsic::getDeclaration(TheModule,
                                                 Intrinsic::eh_selector);
 
 // The exception pointer.
 Value *ExnPtr = Builder.CreateCall(ExcIntr, "exc_ptr");
 
 std::vector<Value*> Args;
 Args.push_back(ExnPtr);
 Args.push_back(Builder.CreateBitCast(Personality,
                                      Type::getInt8PtrTy(Context)));
 
 // Add selector clauses to Args.
 
 // The selector call.
 Builder.CreateCall(SlctrIntr, Args, "exc_sel");
 

You should instead generate a landingpad instruction that returns an exception object and selector value:

 LandingPadInst *LPadInst =
   Builder.CreateLandingPad(StructType::get(Int8PtrTy, Int32Ty, NULL),
                            Personality, 0);
 
 Value *LPadExn = Builder.CreateExtractValue(LPadInst, 0);
 Builder.CreateStore(LPadExn, getExceptionSlot());
 
 Value *LPadSel = Builder.CreateExtractValue(LPadInst, 1);
 Builder.CreateStore(LPadSel, getEHSelectorSlot());
 

It's now trivial to add the individual clauses to the landingpad instruction.

 // Adding a catch clause
 Constant *TypeInfo = getTypeInfo();
 LPadInst->addClause(TypeInfo);
 
 // Adding a C++ catch-all
 LPadInst->addClause(Constant::getNullValue(Builder.getInt8PtrTy()));
 
 // Adding a cleanup
 LPadInst->setCleanup(true);
 
 // Adding a filter clause
 std::vector<Constant*> TypeInfos;
 Constant *TypeInfo = getFilterTypeInfo();
 TypeInfos.push_back(Builder.CreateBitCast(TypeInfo, Builder.getInt8PtrTy()));
 
 ArrayType *FilterTy = ArrayType::get(Int8PtrTy, TypeInfos.size());
 LPadInst->addClause(ConstantArray::get(FilterTy, TypeInfos));
 

Converting from using the @llvm.eh.resume intrinsic to the resume instruction is trivial. It takes the exception pointer and exception selector values returned by the landingpad instruction:

 Type *UnwindDataTy = StructType::get(Builder.getInt8PtrTy(),
                                      Builder.getInt32Ty(), NULL);
 Value *UnwindData = UndefValue::get(UnwindDataTy);
 Value *ExcPtr = Builder.CreateLoad(getExceptionObjSlot());
 Value *ExcSel = Builder.CreateLoad(getExceptionSelSlot());
 UnwindData = Builder.CreateInsertValue(UnwindData, ExcPtr, 0, "exc_ptr");
 UnwindData = Builder.CreateInsertValue(UnwindData, ExcSel, 1, "exc_sel");
 Builder.CreateResume(UnwindData);
 

Optimizer Improvements

In addition to a large array of minor performance tweaks and bug fixes, this release includes a few major enhancements and additions to the optimizers:

MC Level Improvements

The LLVM Machine Code (aka MC) subsystem was created to solve a number of problems in the realm of assembly, disassembly, object file format handling, and a number of other related areas that CPU instruction-set level tools work in.

For more information, please see the Intro to the LLVM MC Project Blog Post.

Target Independent Code Generator Improvements

We have put a significant amount of work into the code generator infrastructure, which allows us to implement more aggressive algorithms and make it run faster:

X86-32 and X86-64 Target Improvements

New features and major changes in the X86 target include:

ARM Target Improvements

New features of the ARM target include:

Other Target Specific Improvements

  • PPC32/ELF va_arg was implemented.
  • PPC32 initial support for .o file writing was implemented.

Major Changes and Removed Features

If you're already an LLVM user or developer with out-of-tree changes based on LLVM 2.9, this section lists some "gotchas" that you may run into upgrading from the previous release.

Windows (32-bit)

  • On Win32 (MinGW32 and MSVC), Windows 2000 will not be supported. Windows XP or higher is required.

Internal API Changes

In addition, many APIs have changed in this release. Some of the major LLVM API changes are:

Known Problems

This section contains significant known problems with the LLVM system, listed by component. If you run into a problem, please check the LLVM bug database and submit a bug if there isn't already one.

Experimental features included with this release

The following components of this LLVM release are either untested, known to be broken or unreliable, or are in early development. These components should not be relied on, and bugs should not be filed against them, but they may be useful to some people. In particular, if you would like to work on one of these components, please contact us on the LLVMdev list.

Known problems with the X86 back-end

Known problems with the PowerPC back-end

Known problems with the ARM back-end

Known problems with the SPARC back-end

Known problems with the MIPS back-end

Known problems with the Alpha back-end

Known problems with the C back-end

The C backend has numerous problems and is not being actively maintained. Depending on it for anything serious is not advised.

Known problems with the llvm-gcc front-end

LLVM 2.9 was the last release of llvm-gcc.

llvm-gcc is generally very stable for the C family of languages. The only major language feature of GCC not supported by llvm-gcc is the __builtin_apply family of builtins. However, some GCC extensions, such as trampolines (which are used when you take the address of a nested function), are only supported on some targets.

Fortran support generally works, but there are still several unresolved bugs in Bugzilla. Please see the tools/gfortran component for details. Note that llvm-gcc is missing major Fortran performance work in the frontend and library that went into GCC after 4.2. If you are interested in Fortran, we recommend that you consider using dragonegg instead.

The llvm-gcc 4.2 Ada compiler has basic functionality, but is no longer being actively maintained. If you are interested in Ada, we recommend that you consider using dragonegg instead.

Additional Information

A wide variety of additional information is available on the LLVM web page, in particular in the documentation section. The web page also contains versions of the API documentation which is up-to-date with the Subversion version of the source code. You can access versions of these documents specific to this release by going into the "llvm/docs/" directory in the LLVM tree.

If you have any questions or comments about LLVM, please feel free to contact us via the mailing lists.


LLVM Compiler Infrastructure
- Last modified: $Date: 2011-10-17 08:31:58 +0200 (Mon, 17 Oct 2011) $
+ Last modified: $Date: 2011-11-01 05:51:35 +0100 (Tue, 01 Nov 2011) $
Index: vendor/llvm/dist/docs/SystemLibrary.html =================================================================== --- vendor/llvm/dist/docs/SystemLibrary.html (revision 228363) +++ vendor/llvm/dist/docs/SystemLibrary.html (revision 228364) @@ -1,315 +1,316 @@ + System Library

System Library

Written by Reid Spencer

Abstract

This document provides some details on LLVM's System Library, located in the source at lib/System and include/llvm/System. The library's purpose is to shield LLVM from the differences between operating systems for the few services LLVM needs from the operating system. Much of LLVM is written using portability features of standard C++. However, in a few areas, system dependent facilities are needed and the System Library is the wrapper around those system calls.

By centralizing LLVM's use of operating system interfaces, we make it possible for the LLVM tool chain and runtime libraries to be more easily ported to new platforms since (theoretically) only lib/System needs to be ported. This library also unclutters the rest of LLVM from #ifdef use and special cases for specific operating systems. Such uses are replaced with simple calls to the interfaces provided in include/llvm/System.

Note that the System Library is not intended to be a complete operating system wrapper (such as the Adaptive Communications Environment (ACE) or Apache Portable Runtime (APR)), but only provides the functionality necessary to support LLVM.

The System Library was written by Reid Spencer who formulated the design based on similar work originating from the eXtensible Programming System (XPS). Several people helped with the effort; especially, Jeff Cohen and Henrik Bach on the Win32 port.

Keeping LLVM Portable

In order to keep LLVM portable, LLVM developers should adhere to a set of portability rules associated with the System Library. Adherence to these rules should help the System Library achieve its goal of shielding LLVM from the variations in operating system interfaces and doing so efficiently. The following sections define the rules needed to fulfill this objective.

Don't Include System Headers

Except in lib/System, no LLVM source code should directly #include a system header. Care has been taken to remove all such #includes from LLVM while lib/System was being developed. Specifically this means that header files like "unistd.h", "windows.h", "stdio.h", and "string.h" are forbidden to be included by LLVM source code outside the implementation of lib/System.

To obtain system-dependent functionality, existing interfaces to the system found in include/llvm/System should be used. If an appropriate interface is not available, it should be added to include/llvm/System and implemented in lib/System for all supported platforms.

Don't Expose System Headers

The System Library must shield LLVM from all system headers. To obtain system level functionality, LLVM source must #include "llvm/System/Thing.h" and nothing else. This means that Thing.h cannot expose any system header files. This protects LLVM from accidentally using system specific functionality and only allows it via the lib/System interface.

Use Standard C Headers

The standard C headers (the ones beginning with "c") are allowed to be exposed through the lib/System interface. These headers and the things they declare are considered to be platform agnostic. LLVM source files may include them directly or obtain their inclusion through lib/System interfaces.

Use Standard C++ Headers

The standard C++ headers from the standard C++ library and standard template library may be exposed through the lib/System interface. These headers and the things they declare are considered to be platform agnostic. LLVM source files may include them or obtain their inclusion through lib/System interfaces.

High Level Interface

The entry points specified in the interface of lib/System must be aimed at completing some reasonably high level task needed by LLVM. We do not want to simply wrap each operating system call. It would be preferable to wrap several operating system calls that are always used in conjunction with one another by LLVM.

For example, consider what is needed to execute a program, wait for it to complete, and return its result code. On Unix, this involves the following operating system calls: getenv, fork, execve, and wait. The correct thing for lib/System to provide is a function, say ExecuteProgramAndWait, that implements the functionality completely. What we don't want is wrappers for the operating system calls involved.

There must not be a one-to-one relationship between operating system calls and the System library's interface. Any such interface function will be suspicious.

No Unused Functionality

There must be no functionality specified in the interface of lib/System that isn't actually used by LLVM. We're not writing a general purpose operating system wrapper here, just enough to satisfy LLVM's needs. And, LLVM doesn't need much. This design goal aims to keep the lib/System interface small and understandable which should foster its actual use and adoption.

No Duplicate Implementations

The implementation of a function for a given platform must be written exactly once. This implies that it must be possible to apply a function's implementation to multiple operating systems if those operating systems can share the same implementation. This rule applies to the set of operating systems supported for a given class of operating system (e.g. Unix, Win32).

No Virtual Methods

The System Library interfaces can be called quite frequently by LLVM. In order to make those calls as efficient as possible, we discourage the use of virtual methods. There is no need to use inheritance for implementation differences; it just adds complexity. The #include mechanism works just fine.

No Exposed Functions

Any functions defined by system libraries (i.e. not defined by lib/System) must not be exposed through the lib/System interface, even if the header file for that function is not exposed. This prevents inadvertent use of system specific functionality.

For example, the stat system call is notorious for having variations in the data it provides. lib/System must not declare stat nor allow it to be declared. Instead it should provide its own interface to discovering information about files and directories. Those interfaces may be implemented in terms of stat but that is strictly an implementation detail. The interface provided by the System Library must be implemented on all platforms (even those without stat).

No Exposed Data

Any data defined by system libraries (i.e. not defined by lib/System) must not be exposed through the lib/System interface, even if the header file for that data is not exposed. As with functions, this prevents inadvertent use of data that might not exist on all platforms.

Minimize Soft Errors

Operating system interfaces will generally provide error results for every little thing that could go wrong. In almost all cases, you can divide these error results into two groups: normal/good/soft and abnormal/bad/hard. That is, some of the errors are simply information like "file not found", "insufficient privileges", etc. while other errors are much harder like "out of space", "bad disk sector", or "system call interrupted". We'll call the first group "soft" errors and the second group "hard" errors.

lib/System must always attempt to minimize soft errors. This is a design requirement because the minimization of soft errors can affect the granularity and the nature of the interface. In general, if you find that you're wanting to throw soft errors, you must review the granularity of the interface because it is likely you're trying to implement something that is too low level. The rule of thumb is to provide interface functions that can't fail, except when faced with hard errors.

For a trivial example, suppose we wanted to add an "OpenFileForWriting" function. For many operating systems, if the file doesn't exist, attempting to open the file will produce an error. However, lib/System should not simply throw that error if it occurs because it's a soft error. The problem is that the interface function, OpenFileForWriting, is too low level. It should be OpenOrCreateFileForWriting. In the case of the soft "doesn't exist" error, this function would just create it and then open it for writing.

This design principle needs to be maintained in lib/System because it avoids the propagation of soft error handling throughout the rest of LLVM. Hard errors will generally just cause a termination for an LLVM tool so don't be bashful about throwing them.

Rules of thumb:

  1. Don't throw soft errors, only hard errors.
  2. If you're tempted to throw a soft error, re-think the interface.
  3. Handle internally the most common normal/good/soft error conditions so the rest of LLVM doesn't have to.

No throw Specifications

None of the lib/System interface functions may be declared with C++ throw() specifications on them. This requirement makes sure that the compiler does not insert additional exception handling code into the interface functions. This is a performance consideration: lib/System functions are at the bottom of many call chains and as such can be frequently called. We need them to be as efficient as possible. However, no routines in the system library should actually throw exceptions.

Code Organization

Implementations of the System Library interface are separated by their general class of operating system. Currently only Unix and Win32 classes are defined, but more could be added for other operating system classifications. To distinguish which implementation to compile, the code in lib/System uses the LLVM_ON_UNIX and LLVM_ON_WIN32 #defines provided via configure through the llvm/Config/config.h file. Each source file in lib/System, after implementing the generic (operating system independent) functionality, needs to include the correct implementation using a set of #if defined(LLVM_ON_XYZ) directives. For example, if we had lib/System/File.cpp, we'd expect to see in that file:


   #if defined(LLVM_ON_UNIX)
   #include "Unix/File.cpp"
   #endif
   #if defined(LLVM_ON_WIN32)
   #include "Win32/File.cpp"
   #endif
   

The implementation in lib/System/Unix/File.cpp should handle all Unix variants. The implementation in lib/System/Win32/File.cpp should handle all Win32 variants. What this does is quickly differentiate the basic class of operating system that will provide the implementation. The specific details for a given platform must still be determined through the use of #ifdef.

Consistent Semantics

The implementation of a lib/System interface can vary drastically between platforms. That's okay as long as the end result of the interface function is the same. For example, a function to create a directory is pretty straightforward on all operating systems. System V IPC on the other hand isn't even supported on all platforms. Instead of "supporting" System V IPC, lib/System should provide an interface to the basic concept of inter-process communications. The implementations might use System V IPC if that was available or named pipes, or whatever gets the job done effectively for a given operating system. In all cases, the interface and the implementation must be semantically consistent.

Bug 351

See bug 351 for further details on the progress of this work.


Reid Spencer
LLVM Compiler Infrastructure
- Last modified: $Date: 2011-04-23 02:30:22 +0200 (Sat, 23 Apr 2011) $ + Last modified: $Date: 2011-11-03 07:43:23 +0100 (Thu, 03 Nov 2011) $
Index: vendor/llvm/dist/docs/TableGenFundamentals.html =================================================================== --- vendor/llvm/dist/docs/TableGenFundamentals.html (revision 228363) +++ vendor/llvm/dist/docs/TableGenFundamentals.html (revision 228364) @@ -1,918 +1,919 @@ + TableGen Fundamentals

TableGen Fundamentals

Written by Chris Lattner

Introduction

TableGen's purpose is to help a human develop and maintain records of domain-specific information. Because there may be a large number of these records, it is specifically designed to allow writing flexible descriptions and for common features of these records to be factored out. This reduces the amount of duplication in the description, reduces the chance of error, and makes it easier to structure domain specific information.

The core part of TableGen parses a file, instantiates the declarations, and hands the result off to a domain-specific "TableGen backend" for processing. The current major user of TableGen is the LLVM code generator.

Note that if you work on TableGen much and use emacs or vim, you can find an emacs "TableGen mode" and a vim language file in the llvm/utils/emacs and llvm/utils/vim directories of your LLVM distribution, respectively.

Basic concepts

TableGen files consist of two key parts: 'classes' and 'definitions', both of which are considered 'records'.

TableGen records have a unique name, a list of values, and a list of superclasses. The list of values is the main data that TableGen builds for each record; it is this that holds the domain specific information for the application. The interpretation of this data is left to a specific TableGen backend, but the structure and format rules are taken care of and are fixed by TableGen.

TableGen definitions are the concrete form of 'records'. These generally do not have any undefined values, and are marked with the 'def' keyword.

TableGen classes are abstract records that are used to build and describe other records. These 'classes' allow the end-user to build abstractions for either the domain they are targeting (such as "Register", "RegisterClass", and "Instruction" in the LLVM code generator) or for the implementor to help factor out common properties of records (such as "FPInst", which is used to represent floating point instructions in the X86 backend). TableGen keeps track of all of the classes that are used to build up a definition, so the backend can find all definitions of a particular class, such as "Instruction".

TableGen multiclasses are groups of abstract records that are instantiated all at once. Each instantiation can result in multiple TableGen definitions. If a multiclass inherits from another multiclass, the definitions in the sub-multiclass become part of the current multiclass, as if they were declared in the current multiclass.

An example record

With no other arguments, TableGen parses the specified file and prints out all of the classes, then all of the definitions. This is a good way to see what the various definitions expand to fully. Running this on the X86.td file prints this (at the time of this writing):

 ...
 def ADD32rr {   // Instruction X86Inst I
   string Namespace = "X86";
   dag OutOperandList = (outs GR32:$dst);
   dag InOperandList = (ins GR32:$src1, GR32:$src2);
   string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
   list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
   list<Register> Uses = [];
   list<Register> Defs = [EFLAGS];
   list<Predicate> Predicates = [];
   int CodeSize = 3;
   int AddedComplexity = 0;
   bit isReturn = 0;
   bit isBranch = 0;
   bit isIndirectBranch = 0;
   bit isBarrier = 0;
   bit isCall = 0;
   bit canFoldAsLoad = 0;
   bit mayLoad = 0;
   bit mayStore = 0;
   bit isImplicitDef = 0;
   bit isConvertibleToThreeAddress = 1;
   bit isCommutable = 1;
   bit isTerminator = 0;
   bit isReMaterializable = 0;
   bit isPredicable = 0;
   bit hasDelaySlot = 0;
   bit usesCustomInserter = 0;
   bit hasCtrlDep = 0;
   bit isNotDuplicable = 0;
   bit hasSideEffects = 0;
   bit neverHasSideEffects = 0;
   InstrItinClass Itinerary = NoItinerary;
   string Constraints = "";
   string DisableEncoding = "";
   bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
   Format Form = MRMDestReg;
   bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
   ImmType ImmT = NoImm;
   bits<3> ImmTypeBits = { 0, 0, 0 };
   bit hasOpSizePrefix = 0;
   bit hasAdSizePrefix = 0;
   bits<4> Prefix = { 0, 0, 0, 0 };
   bit hasREX_WPrefix = 0;
   FPFormat FPForm = ?;
   bits<3> FPFormBits = { 0, 0, 0 };
 }
 ...
 

This definition corresponds to a 32-bit register-register add instruction in the X86. The string after the 'def' string indicates the name of the record—"ADD32rr" in this case—and the comment at the end of the line indicates the superclasses of the definition. The body of the record contains all of the data that TableGen assembled for the record, indicating that the instruction is part of the "X86" namespace, the pattern indicating how the instruction should be emitted into the assembly file, that it is a two-address instruction, has a particular encoding, etc. The contents and semantics of the information in the record are specific to the needs of the X86 backend, and are only shown as an example.

As you can see, a lot of information is needed for every instruction supported by the code generator, and specifying it all manually would be unmaintainable, prone to bugs, and tiring to do in the first place. Because we are using TableGen, all of the information was derived from the following definition:

 let Defs = [EFLAGS],
     isCommutable = 1,                  // X = ADD Y,Z --> X = ADD Z,Y
     isConvertibleToThreeAddress = 1 in // Can transform into LEA.
 def ADD32rr  : I<0x01, MRMDestReg, (outs GR32:$dst),
                                    (ins GR32:$src1, GR32:$src2),
                  "add{l}\t{$src2, $dst|$dst, $src2}",
                  [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;
 

This definition makes use of the custom class I (extended from the custom class X86Inst), which is defined in the X86-specific TableGen file, to factor out the common features that instructions of its class share. A key feature of TableGen is that it allows the end-user to define the abstractions they prefer to use when describing their information.

Running TableGen

TableGen runs just like any other LLVM tool. The first (optional) argument specifies the file to read. If a filename is not specified, tblgen reads from standard input.

To be useful, one of the TableGen backends must be used. These backends are selectable on the command line (type 'tblgen -help' for a list). For example, to get a list of all of the definitions that subclass a particular type (which can be useful for building up an enum list of these records), use the -print-enums option:

 $ tblgen X86.td -print-enums -class=Register
 AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
 ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
 MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
 R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
 R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
 RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
 XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
 XMM6, XMM7, XMM8, XMM9,
 
 $ tblgen X86.td -print-enums -class=Instruction 
 ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
 ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
 ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
 ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
 ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...
 

The default backend prints out all of the records, as described above.

If you plan to use TableGen, you will most likely have to write a backend that extracts the information specific to what you need and formats it in the appropriate way.

TableGen syntax

TableGen doesn't care about the meaning of data (that is up to the backend to define), but it does care about syntax, and it enforces a simple type system. This section describes the syntax and the constructs allowed in a TableGen file.

TableGen primitives

TableGen comments

TableGen supports BCPL style "//" comments, which run to the end of the line, and it also supports nestable "/* */" comments.
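
As a small illustration (the record names X and Y below are arbitrary placeholders, not taken from any LLVM .td file), both comment forms can appear in the same file:

 // A line comment runs to the end of the line.
 def X;          // Trailing comments are fine too.
 /* Block comments can span lines
    /* and, as noted above, may be nested */
    before closing. */
 def Y;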

The TableGen type system

TableGen files are strongly typed, in a simple (but complete) type-system. These types are used to perform automatic conversions, check for errors, and to help interface designers constrain the input that they allow. Every value definition is required to have an associated type.

TableGen supports a mixture of very low-level types (such as bit) and very high-level types (such as dag). This flexibility is what allows it to describe a wide range of information conveniently and compactly. The TableGen types are:

bit
A 'bit' is a boolean value that can hold either 0 or 1.
int
The 'int' type represents a simple 32-bit integer value, such as 5.
string
The 'string' type represents an ordered sequence of characters of arbitrary length.
bits<n>
A 'bits' type is an arbitrary, but fixed, size integer that is broken up into individual bits. This type is useful because it can handle some bits being defined while others are undefined.
list<ty>
This type represents a list whose elements are some other type. The contained type is arbitrary: it can even be another list type.
Class type
Specifying a class name in a type context means that the defined value must be a subclass of the specified class. This is useful in conjunction with the list type, for example, to constrain the elements of the list to a common base class (e.g., a list<Register> can only contain definitions derived from the "Register" class).
dag
This type represents a nestable directed graph of elements.
code
This represents a big hunk of text. This is lexically distinct from string values because it doesn't require escaping double quotes and other common characters that occur in code.

To date, these types have been sufficient for describing things that TableGen has been used for, but it is straightforward to extend this list if needed.
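
As a hedged sketch (the class name, field names, and values below are invented for illustration and are not part of any LLVM .td file), a single record can mix most of these types:

 def ops;                  // used only as the operator of the example dag below
 class Example {
   bit       IsSimple  = 1;                           // bit: holds 0 or 1
   int       Size      = 4;                           // int: a 32-bit integer
   string    Name      = "example";                   // string: arbitrary text
   bits<8>   Opcode    = { 0, 0, 0, 0, 0, 0, 0, 1 };  // bits<8>: individually addressable bits
   list<int> Latencies = [1, 2, 3];                   // list<ty>: elements of another type
   dag       Operands  = (ops);                       // dag: a nestable directed graph
   code      Fragment  = [{ return true; }];          // code: verbatim text, no escaping needed
 }
 def AnExample : Example;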

TableGen values and expressions

TableGen allows for a pretty reasonable number of different expression forms when building up values. These forms allow the TableGen file to be written in a natural syntax and flavor for the application. The current expression forms supported include:

?
uninitialized field
0b1001011
binary integer value
07654321
octal integer value (indicated by a leading 0)
7
decimal integer value
0x7F
hexadecimal integer value
"foo"
string value
[{ ... }]
code fragment
[ X, Y, Z ]<type>
list value. <type> is the type of the list element and is usually optional. In rare cases, TableGen is unable to deduce the element type in which case the user must specify it explicitly.
{ a, b, c }
initializer for a "bits<3>" value
value
value reference
value{17}
access to one bit of a value
value{15-17}
access to multiple bits of a value
DEF
reference to a record definition
CLASS<val list>
reference to a new anonymous definition of CLASS with the specified template arguments.
X.Y
reference to the subfield of a value
list[4-7,17,2-3]
A slice of the 'list' list, including elements 4,5,6,7,17,2, and 3 from it. Elements may be included multiple times.
(DEF a, b)
a dag value. The first element is required to be a record definition, the remaining elements in the list may be arbitrary other values, including nested `dag' values.
!strconcat(a, b)
A string value that is the result of concatenating the 'a' and 'b' strings.
!cast<type>(a)
A symbol of type type obtained by looking up the string 'a' in the symbol table. If the type of 'a' does not match type, TableGen aborts with an error. !cast<string> is a special case in that the argument must be an object defined by a 'def' construct.
!subst(a, b, c)
If 'a' and 'b' are of string type or are symbol references, substitute 'b' for 'a' in 'c.' This operation is analogous to $(subst) in GNU make.
!foreach(a, b, c)
For each member 'b' of dag or list 'a' apply operator 'c.' 'b' is a dummy variable that should be declared as a member variable of an instantiated class. This operation is analogous to $(foreach) in GNU make.
!head(a)
The first element of list 'a.'
!tail(a)
The 2nd-N elements of list 'a.'
!empty(a)
An integer {0,1} indicating whether list 'a' is empty.
!if(a,b,c)
'b' if the result of 'int' or 'bit' operator 'a' is nonzero, 'c' otherwise.
!eq(a,b)
'bit 1' if string a is equal to string b, 0 otherwise. This only operates on string, int and bit objects. Use !cast<string> to compare other types of objects.

Note that all of the values have rules specifying how they convert to values for different types. These rules allow you to assign a value like "7" to a "bits<4>" value, for example.
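
For instance, here is a sketch of how a few of these forms and conversions fit together (the class and record names are invented for this example):

 class Pair<string a, string b> {
   string Whole = !strconcat(a, b);   // "AB" for the def below
   bit    Same  = !eq(a, b);          // 0, since the two strings differ
 }
 def AB : Pair<"A", "B">;

 class Nibble<bits<4> val> {
   bits<4> Value = val;
   bit     Low   = Value{0};          // access to one bit of a value
 }
 def Seven : Nibble<7>;               // the integer literal 7 converts to a bits<4> value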

Classes and definitions

As mentioned in the intro, classes and definitions (collectively known as 'records') in TableGen are the main high-level unit of information that TableGen collects. Records are defined with a def or class keyword, the record name, and an optional list of "template arguments". If the record has superclasses, they are specified as a comma separated list that starts with a colon character (":"). If value definitions or let expressions are needed for the class, they are enclosed in curly braces ("{}"); otherwise, the record ends with a semicolon.

Here is a simple TableGen file:

 class C { bit V = 1; }
 def X : C;
 def Y : C {
   string Greeting = "hello";
 }
 

This example defines two definitions, X and Y, both of which derive from the C class. Because of this, they both get the V bit value. The Y definition also gets the Greeting member as well.

In general, classes are useful for collecting together the commonality between a group of records and isolating it in a single place. Also, classes permit the specification of default values for their subclasses, allowing the subclasses to override them as they wish.

Value definitions

Value definitions define named entries in records. A value must be defined before it can be referred to as the operand for another value definition or before the value is reset with a let expression. A value is defined by specifying a TableGen type and a name. If an initial value is available, it may be specified after the type with an equal sign. Value definitions require terminating semicolons.
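
A short sketch (the Greeting class and its fields are made up for this example): each entry has a type, a name, an optional initializer, and a terminating semicolon, and later entries may refer to earlier ones:

 class Greeting {
   string Who   = "world";                      // type, name, and an initial value
   string Text  = !strconcat("hello, ", Who);   // refers to a previously defined value
   int    Count;                                // declared but left uninitialized
 }
 def Hello : Greeting;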

'let' expressions

A record-level let expression is used to change the value of a value definition in a record. This is primarily useful when a superclass defines a value that a derived class or definition wants to override. Let expressions consist of the 'let' keyword followed by a value name, an equal sign ("="), and a new value. For example, a new class could be added to the example above, redefining the V field for all of its subclasses:

 class D : C { let V = 0; }
 def Z : D;
 

In this case, the Z definition will have a zero value for its "V" value, despite the fact that it derives (indirectly) from the C class, because the D class overrode its value.

Class template arguments

TableGen permits the definition of parameterized classes as well as normal concrete classes. Parameterized TableGen classes specify a list of variable bindings (which may optionally have defaults) that are bound when used. Here is a simple example:

 class FPFormat<bits<3> val> {
   bits<3> Value = val;
 }
 def NotFP      : FPFormat<0>;
 def ZeroArgFP  : FPFormat<1>;
 def OneArgFP   : FPFormat<2>;
 def OneArgFPRW : FPFormat<3>;
 def TwoArgFP   : FPFormat<4>;
 def CompareFP  : FPFormat<5>;
 def CondMovFP  : FPFormat<6>;
 def SpecialFP  : FPFormat<7>;
 

In this case, template arguments are used as a space-efficient way to specify a list of "enumeration values", each with a "Value" field set to the specified integer.
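
Template arguments may also be given default values, as mentioned above. A hedged sketch with invented names:

 class IntOp<string name, int numOperands = 2> {
   string Name        = name;
   int    NumOperands = numOperands;
 }
 def AddOp : IntOp<"add">;       // uses the default operand count of 2
 def NegOp : IntOp<"neg", 1>;    // overrides the default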

The more esoteric forms of TableGen expressions are useful in conjunction with template arguments. As an example:

 class ModRefVal<bits<2> val> {
   bits<2> Value = val;
 }
 
 def None   : ModRefVal<0>;
 def Mod    : ModRefVal<1>;
 def Ref    : ModRefVal<2>;
 def ModRef : ModRefVal<3>;
 
 class Value<ModRefVal MR> {
   // Decode some information into a more convenient format, while providing
   // a nice interface to the user of the "Value" class.
   bit isMod = MR.Value{0};
   bit isRef = MR.Value{1};
 
   // other stuff...
 }
 
 // Example uses
 def bork : Value<Mod>;
 def zork : Value<Ref>;
 def hork : Value<ModRef>;
 

This is obviously a contrived example, but it shows how template arguments can be used to decouple the interface provided to the user of the class from the actual internal data representation expected by the class. In this case, running tblgen on the example prints the following definitions:

 def bork {      // Value
   bit isMod = 1;
   bit isRef = 0;
 }
 def hork {      // Value
   bit isMod = 1;
   bit isRef = 1;
 }
 def zork {      // Value
   bit isMod = 0;
   bit isRef = 1;
 }
 

This shows that TableGen was able to dig into the argument and extract a piece of information that was requested by the designer of the "Value" class. For more realistic examples, please see existing users of TableGen, such as the X86 backend.

Multiclass definitions and instances

While classes with template arguments are a good way to factor commonality between two instances of a definition, multiclasses allow a convenient notation for defining multiple definitions at once (instances of implicitly constructed classes). For example, consider a 3-address instruction set whose instructions come in two forms: "reg = reg op reg" and "reg = reg op imm" (e.g. SPARC). In this case, you'd like to specify in one place that this commonality exists, then in a separate place indicate what all the ops are.

Here is an example TableGen fragment that shows this idea:

 def ops;
 def GPR;
 def Imm;
 class inst<int opc, string asmstr, dag operandlist>;
 
 multiclass ri_inst<int opc, string asmstr> {
   def _rr : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"),
                  (ops GPR:$dst, GPR:$src1, GPR:$src2)>;
   def _ri : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"),
                  (ops GPR:$dst, GPR:$src1, Imm:$src2)>;
 }
 
 // Instantiations of the ri_inst multiclass.
 defm ADD : ri_inst<0b111, "add">;
 defm SUB : ri_inst<0b101, "sub">;
 defm MUL : ri_inst<0b100, "mul">;
 ...
 

The names of the resultant definitions have the multiclass's def fragment names appended to them, so this defines ADD_rr, ADD_ri, SUB_rr, etc. A defm may inherit from multiple multiclasses, instantiating definitions from each multiclass. Using a multiclass this way is exactly equivalent to instantiating the classes multiple times yourself, e.g. by writing:

 def ops;
 def GPR;
 def Imm;
 class inst<int opc, string asmstr, dag operandlist>;
 
 class rrinst<int opc, string asmstr>
   : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"),
          (ops GPR:$dst, GPR:$src1, GPR:$src2)>;
 
 class riinst<int opc, string asmstr>
   : inst<opc, !strconcat(asmstr, " $dst, $src1, $src2"),
          (ops GPR:$dst, GPR:$src1, Imm:$src2)>;
 
 // Instantiations of the ri_inst multiclass.
 def ADD_rr : rrinst<0b111, "add">;
 def ADD_ri : riinst<0b111, "add">;
 def SUB_rr : rrinst<0b101, "sub">;
 def SUB_ri : riinst<0b101, "sub">;
 def MUL_rr : rrinst<0b100, "mul">;
 def MUL_ri : riinst<0b100, "mul">;
 ...
 

A defm can also be used inside a multiclass, providing several levels of multiclass instantiation.

 class Instruction<bits<4> opc, string Name> {
   bits<4> opcode = opc;
   string name = Name;
 }
 
 multiclass basic_r<bits<4> opc> {
   def rr : Instruction<opc, "rr">;
   def rm : Instruction<opc, "rm">;
 }
 
 multiclass basic_s<bits<4> opc> {
   defm SS : basic_r<opc>;
   defm SD : basic_r<opc>;
   def X : Instruction<opc, "x">;
 }
 
 multiclass basic_p<bits<4> opc> {
   defm PS : basic_r<opc>;
   defm PD : basic_r<opc>;
   def Y : Instruction<opc, "y">;
 }
 
 defm ADD : basic_s<0xf>, basic_p<0xf>;
 ...
 
 // Results
 def ADDPDrm { ...
 def ADDPDrr { ...
 def ADDPSrm { ...
 def ADDPSrr { ...
 def ADDSDrm { ...
 def ADDSDrr { ...
 def ADDY { ...
 def ADDX { ...
 

defm declarations can inherit from classes too. The rule to follow is that the class list must start after the last multiclass, and there must be at least one multiclass before them.

 class XD { bits<4> Prefix = 11; }
 class XS { bits<4> Prefix = 12; }
 
 class I<bits<4> op> {
   bits<4> opcode = op;
 }
 
 multiclass R {
   def rr : I<4>;
   def rm : I<2>;
 }
 
 multiclass Y {
   defm SS : R, XD;
   defm SD : R, XS;
 }
 
 defm Instr : Y;
 
 // Results
 def InstrSDrm {
   bits<4> opcode = { 0, 0, 1, 0 };
   bits<4> Prefix = { 1, 1, 0, 0 };
 }
 ...
 def InstrSSrr {
   bits<4> opcode = { 0, 1, 0, 0 };
   bits<4> Prefix = { 1, 0, 1, 1 };
 }
 

File scope entities

File inclusion

TableGen supports the 'include' token, which textually substitutes the specified file in place of the include directive. The filename should be specified as a double quoted string immediately after the 'include' keyword. Example:

 include "foo.td"
 

'let' expressions

"Let" expressions at file scope are similar to "let" expressions within a record, except they can specify a value binding for multiple records at a time, and may be useful in certain other cases. File-scope let expressions are really just another way that TableGen allows the end-user to factor out commonality from the records.

File-scope "let" expressions take a comma-separated list of bindings to apply, and one or more records to bind the values in. Here are some examples:

 let isTerminator = 1, isReturn = 1, isBarrier = 1, hasCtrlDep = 1 in
   def RET : I<0xC3, RawFrm, (outs), (ins), "ret", [(X86retflag 0)]>;
 
 let isCall = 1 in
   // All calls clobber the non-callee saved registers...
   let Defs = [EAX, ECX, EDX, FP0, FP1, FP2, FP3, FP4, FP5, FP6, ST0,
               MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
               XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7, EFLAGS] in {
     def CALLpcrel32 : Ii32<0xE8, RawFrm, (outs), (ins i32imm:$dst,variable_ops),
                            "call\t${dst:call}", []>;
     def CALL32r     : I<0xFF, MRM2r, (outs), (ins GR32:$dst, variable_ops),
                         "call\t{*}$dst", [(X86call GR32:$dst)]>;
     def CALL32m     : I<0xFF, MRM2m, (outs), (ins i32mem:$dst, variable_ops),
                         "call\t{*}$dst", []>;
   }
 

File-scope "let" expressions are often useful when a couple of definitions need to be added to several records, and the records do not otherwise need to be opened, as in the case with the CALL* instructions above.

It's also possible to use "let" expressions inside multiclasses, providing more ways to factor out commonality from the records, especially when using several levels of multiclass instantiation. This also avoids the need to use "let" expressions within subsequent records inside a multiclass.

 multiclass basic_r<bits<4> opc> {
   let Predicates = [HasSSE2] in {
     def rr : Instruction<opc, "rr">;
     def rm : Instruction<opc, "rm">;
   }
   let Predicates = [HasSSE3] in
     def rx : Instruction<opc, "rx">;
 }
 
 multiclass basic_ss<bits<4> opc> {
   let IsDouble = 0 in
     defm SS : basic_r<opc>;
 
   let IsDouble = 1 in
     defm SD : basic_r<opc>;
 }
 
 defm ADD : basic_ss<0xf>;
 

Code Generator backend info

Expressions used by the code generator to describe instructions and isel patterns:

(implicit a)
an implicitly defined physical register. This tells the dag instruction selection emitter that the input pattern's extra definitions match implicit physical register definitions, as in the fragment below.
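
For example, the ADD32rr definition shown earlier could, in the style of the X86 .td files, list EFLAGS as an extra, implicitly defined result. This fragment is only illustrative and assumes the surrounding X86 definitions (I, MRMDestReg, GR32, EFLAGS, and the add node):

 def ADD32rr : I<0x01, MRMDestReg, (outs GR32:$dst),
                                   (ins GR32:$src1, GR32:$src2),
                 "add{l}\t{$src2, $dst|$dst, $src2}",
                 [(set GR32:$dst, (add GR32:$src1, GR32:$src2)),
                  (implicit EFLAGS)]>;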

TableGen backends

TODO: How they work, how to write one. This section should not contain details about any particular backend, except maybe -print-enums as an example. This should highlight the APIs in TableGen/Record.h.


Chris Lattner
LLVM Compiler Infrastructure
- Last modified: $Date: 2011-10-07 20:25:05 +0200 (Fri, 07 Oct 2011) $ + Last modified: $Date: 2011-11-03 07:43:23 +0100 (Thu, 03 Nov 2011) $
Index: vendor/llvm/dist/docs/TestingGuide.html =================================================================== --- vendor/llvm/dist/docs/TestingGuide.html (revision 228363) +++ vendor/llvm/dist/docs/TestingGuide.html (revision 228364) @@ -1,1212 +1,1213 @@ + LLVM Testing Infrastructure Guide

LLVM Testing Infrastructure Guide

  1. Overview
  2. Requirements
  3. LLVM testing infrastructure organization
  4. Quick start
  5. Regression test structure
  6. Test suite structure
  7. Running the test suite

Written by John T. Criswell, Daniel Dunbar, Reid Spencer, and Tanya Lattner

Overview

This document is the reference manual for the LLVM testing infrastructure. It documents the structure of the LLVM testing infrastructure, the tools needed to use it, and how to add and run tests.

Requirements

In order to use the LLVM testing infrastructure, you will need all of the software required to build LLVM, as well as Python 2.4 or later.

LLVM testing infrastructure organization

The LLVM testing infrastructure contains two major categories of tests: regression tests and whole programs. The regression tests are contained inside the LLVM repository itself under llvm/test and are expected to always pass -- they should be run before every commit. The whole programs tests are referred to as the "LLVM test suite" and are in the test-suite module in subversion.

Regression tests

The regression tests are small pieces of code that test a specific feature of LLVM or trigger a specific bug in LLVM. They are usually written in LLVM assembly language, but can be written in other languages if the test targets a particular language front end (and the appropriate --with-llvmgcc options were used at configure time of the llvm module). These tests are driven by the 'lit' testing tool, which is part of LLVM.

These code fragments are not complete programs. The code generated from them is never executed to determine correct behavior.

These code fragment tests are located in the llvm/test directory.

Typically when a bug is found in LLVM, a regression test containing just enough code to reproduce the problem should be written and placed somewhere underneath this directory. In most cases, this will be a small piece of LLVM assembly language code, often distilled from an actual application or benchmark.

Test suite

The test suite contains whole programs, which are pieces of code which can be compiled and linked into a stand-alone program that can be executed. These programs are generally written in high level languages such as C or C++, but sometimes they are written straight in LLVM assembly.

These programs are compiled and then executed using several different methods (native compiler, LLVM C backend, LLVM JIT, LLVM native code generation, etc). The output of these programs is compared to ensure that LLVM is compiling the program correctly.

In addition to compiling and executing programs, whole program tests serve as a way of benchmarking LLVM performance, both in terms of the efficiency of the programs generated as well as the speed with which LLVM compiles, optimizes, and generates code.

The test-suite is located in the test-suite Subversion module.

Debugging Information tests

The test suite contains tests to check the quality of debugging information. The tests are written in C-based languages or in LLVM assembly language.

These tests are compiled and run under a debugger. The debugger output is checked to validate the debugging information. See README.txt in the test suite for more information. This test suite is located in the debuginfo-tests Subversion module.

Quick start

The tests are located in two separate Subversion modules. The regression tests are in the main "llvm" module under the directory llvm/test (so you get these tests for free with the main llvm tree). The more comprehensive test suite that includes whole programs in C and C++ is in the test-suite module. This module should be checked out to the llvm/projects directory (don't use a name other than the default "test-suite"; otherwise the test suite will be run every time you run make in the main llvm directory). When you configure the llvm module, the test-suite directory will be automatically configured. Alternatively, you can configure the test-suite module manually.

Regression tests

To run all of the LLVM regression tests, use the master Makefile in the llvm/test directory:

 % gmake -C llvm/test
 

or

 % gmake check
 

If you have Clang checked out and built, you can run the LLVM and Clang tests simultaneously using:


 % gmake check-all
 

To run the tests with Valgrind (Memcheck by default), just append VG=1 to the commands above, e.g.:

 % gmake check VG=1
 

To run individual tests or subsets of tests, you can use the 'llvm-lit' script which is built as part of LLVM. For example, to run the 'Integer/BitCast.ll' test by itself you can run:

 % llvm-lit ~/llvm/test/Integer/BitCast.ll 
 

or to run all of the ARM CodeGen tests:

 % llvm-lit ~/llvm/test/CodeGen/ARM
 

For more information on using the 'lit' tool, see 'llvm-lit --help' or the 'lit' man page.

Test suite

To run the comprehensive test suite (tests that compile and execute whole programs), first checkout and setup the test-suite module:

 % cd llvm/projects
 % svn co http://llvm.org/svn/llvm-project/test-suite/trunk test-suite
 % cd ..
 % ./configure --with-llvmgccdir=$LLVM_GCC_DIR
 

where $LLVM_GCC_DIR is the directory where you installed llvm-gcc, not its src or obj dir. The --with-llvmgccdir option assumes that the llvm-gcc-4.2 module was configured with --program-prefix=llvm-, and therefore that the C and C++ compiler drivers are called llvm-gcc and llvm-g++ respectively. If this is not the case, use --with-llvmgcc/--with-llvmgxx to specify each executable's location.

Then, run the entire test suite by running make in the test-suite directory:

 % cd projects/test-suite
 % gmake
 

Usually, running the "nightly" set of tests is a good idea, and you can also let it generate a report by running:

 % cd projects/test-suite
 % gmake TEST=nightly report report.html
 

Any of the above commands can also be run in a subdirectory of projects/test-suite to run the specified test only on the programs in that subdirectory.

Debugging Information tests

To run the debugging information tests, simply check out the tests inside the clang/test directory:

 % cd clang/test
 % svn co http://llvm.org/svn/llvm-project/debuginfo-tests/trunk debuginfo-tests
 

These tests are already set up to run as part of clang regression tests.

Regression test structure

The LLVM regression tests are driven by 'lit' and are located in the llvm/test directory.

This directory contains a large array of small tests that exercise various features of LLVM and ensure that regressions do not occur. The directory is broken into several sub-directories, each focused on a particular area of LLVM.

Writing new regression tests

The regression test structure is very simple, but does require some information to be set. This information is gathered via configure and is written to a file, lit.site.cfg in llvm/test. The llvm/test Makefile does this work for you.

In order for the regression tests to work, each directory of tests must have a dg.exp file. Lit looks for this file to determine how to run the tests. This file is just a Tcl script and it can do anything you want, but we've standardized it for the LLVM regression tests. If you're adding a directory of tests, just copy dg.exp from another directory to get running. The standard dg.exp simply loads a Tcl library (test/lib/llvm.exp) and calls the llvm_runtests function defined in that library with a list of file names to run. The names are obtained by using Tcl's glob command. Any directory that contains only directories does not need the dg.exp file.

The llvm_runtests function looks at each file that is passed to it and gathers any lines together that match "RUN:". These are the "RUN" lines that specify how the test is to be run. So, each test script must contain RUN lines if it is to do anything. If there are no RUN lines, the llvm_runtests function will issue an error and the test will fail.

RUN lines are specified in the comments of the test program using the keyword RUN followed by a colon, and lastly the command (pipeline) to execute. Together, these lines form the "script" that llvm-runtests executes to run the test case. The syntax of the RUN lines is similar to a shell's syntax for pipelines including I/O redirection and variable substitution. However, even though these lines may look like a shell script, they are not. RUN lines are interpreted directly by the Tcl exec command. They are never executed by a shell. Consequently the syntax differs from normal shell script syntax in a few ways. You can specify as many RUN lines as needed.

lit performs substitution on each RUN line to replace LLVM tool names with the full paths to the executable built for each tool (in $(LLVM_OBJ_ROOT)/$(BuildMode)/bin). This ensures that lit does not invoke any stray LLVM tools in the user's path during testing.

Each RUN line is executed on its own, distinct from other lines unless its last character is \. This continuation character causes the RUN line to be concatenated with the next one. In this way you can build up long pipelines of commands without making huge line lengths. The lines ending in \ are concatenated until a RUN line that doesn't end in \ is found. This concatenated set of RUN lines then constitutes one execution. Tcl will substitute variables and arrange for the pipeline to be executed. If any process in the pipeline fails, the entire line (and test case) fails too.

Below is an example of legal RUN lines in a .ll file:

 ; RUN: llvm-as < %s | llvm-dis > %t1
 ; RUN: llvm-dis < %s.bc-13 > %t2
 ; RUN: diff %t1 %t2
 

As with a Unix shell, the RUN: lines permit pipelines and I/O redirection to be used. However, the usage is slightly different than for Bash. To check what's legal, see the documentation for the Tcl exec command and the Tcl tutorial.

There are some quoting rules that you must pay attention to when writing your RUN lines. In general nothing needs to be quoted. Tcl won't strip off any quote characters so they will get passed to the invoked program. For example:

 ... | grep 'find this string'
 

This will fail because the ' characters are passed to grep. This would instruct grep to look for 'find in the files this and string'. To avoid this, use curly braces to tell Tcl that it should treat everything enclosed as one value. So our example would become:

 ... | grep {find this string}
 

Additionally, the characters [ and ] are treated specially by Tcl. They tell Tcl to interpret the content as a command to execute. Since these characters are often used in regular expressions this can have disastrous results and cause the entire test run in a directory to fail. For example, a common idiom is to look for some basicblock number:

 ... | grep bb[2-8]
 

This, however, will cause Tcl to fail because it's going to try to execute a program named "2-8". Instead, what you want is this:

 ... | grep {bb\[2-8\]}
 

Finally, if you need to pass the \ character down to a program, then it must be doubled. This is another Tcl special character. So, suppose you had:

 ... | grep 'i32\*'
 

This will fail to match what you want (a pointer to i32). First, the ' characters do not get stripped off. Second, the \ gets stripped off by Tcl so what grep sees is: 'i32*'. That's not likely to match anything. To resolve this you must use \\ and the {}, like this:

 ... | grep {i32\\*}
 

If your system includes GNU grep, make sure that GREP_OPTIONS is not set in your environment. Otherwise, you may get invalid results (both false positives and false negatives).

The FileCheck utility

A powerful feature of the RUN: lines is that they allow arbitrary commands to be executed as part of the test harness. While standard (portable) unix tools like 'grep' work fine on run lines, as you see above, there are a lot of caveats due to interaction with Tcl syntax, and we want to make sure the run lines are portable to a wide range of systems. Another major problem is that grep is not very good at checking that the output of a tool contains a series of different strings in a specific order. The FileCheck tool was designed to help with these problems.

FileCheck (whose basic command line arguments are described in the FileCheck man page) is designed to read a file to check from standard input, and the set of things to verify from a file specified as a command line argument. A simple example of using FileCheck from a RUN line looks like this:

 ; RUN: llvm-as < %s | llc -march=x86-64 | FileCheck %s
 

This syntax says to pipe the current file ("%s") into llvm-as, pipe that into llc, then pipe the output of llc into FileCheck. This means that FileCheck will be verifying its standard input (the llc output) against the filename argument specified (the original .ll file specified by "%s"). To see how this works, let's look at the rest of the .ll file (after the RUN line):

 define void @sub1(i32* %p, i32 %v) {
 entry:
 ; CHECK: sub1:
 ; CHECK: subl
         %0 = tail call i32 @llvm.atomic.load.sub.i32.p0i32(i32* %p, i32 %v)
         ret void
 }
 
 define void @inc4(i64* %p) {
 entry:
 ; CHECK: inc4:
 ; CHECK: incq
         %0 = tail call i64 @llvm.atomic.load.add.i64.p0i64(i64* %p, i64 1)
         ret void
 }
 

Here you can see some "CHECK:" lines specified in comments. Now you can see how the file is piped into llvm-as, then llc, and the machine code output is what we are verifying. FileCheck checks the machine code output to verify that it matches what the "CHECK:" lines specify.

The syntax of the CHECK: lines is very simple: they are fixed strings that must occur in order. FileCheck defaults to ignoring horizontal whitespace differences (e.g. a space is allowed to match a tab) but otherwise, the contents of the CHECK: line are required to match something in the test file exactly.

One nice thing about FileCheck (compared to grep) is that it allows merging test cases together into logical groups. For example, because the test above is checking for the "sub1:" and "inc4:" labels, it will not match unless there is a "subl" in between those labels. If it existed somewhere else in the file, that would not count: "grep subl" matches if subl exists anywhere in the file.

The FileCheck -check-prefix option

The FileCheck -check-prefix option allows multiple test configurations to be driven from one .ll file. This is useful in many circumstances, for example, testing different architectural variants with llc. Here's a simple example:

 ; RUN: llvm-as < %s | llc -mtriple=i686-apple-darwin9 -mattr=sse41 \
 ; RUN:              | FileCheck %s -check-prefix=X32
 ; RUN: llvm-as < %s | llc -mtriple=x86_64-apple-darwin9 -mattr=sse41 \
 ; RUN:              | FileCheck %s -check-prefix=X64
 
 define <4 x i32> @pinsrd_1(i32 %s, <4 x i32> %tmp) nounwind {
         %tmp1 = insertelement <4 x i32> %tmp, i32 %s, i32 1
         ret <4 x i32> %tmp1
 ; X32: pinsrd_1:
 ; X32:    pinsrd $1, 4(%esp), %xmm0
 
 ; X64: pinsrd_1:
 ; X64:    pinsrd $1, %edi, %xmm0
 }
 

In this case, we're testing that we get the expected code generation with both 32-bit and 64-bit code generation.

The "CHECK-NEXT:" directive

Sometimes you want to match lines and would like to verify that matches happen on exactly consecutive lines with no other lines in between them. In this case, you can use CHECK: and CHECK-NEXT: directives to specify this. If you specified a custom check prefix, just use "<PREFIX>-NEXT:". For example, something like this works as you'd expect:

 define void @t2(<2 x double>* %r, <2 x double>* %A, double %B) {
 	%tmp3 = load <2 x double>* %A, align 16
 	%tmp7 = insertelement <2 x double> undef, double %B, i32 0
 	%tmp9 = shufflevector <2 x double> %tmp3,
                               <2 x double> %tmp7,
                               <2 x i32> < i32 0, i32 2 >
 	store <2 x double> %tmp9, <2 x double>* %r, align 16
 	ret void
         
 ; CHECK: t2:
 ; CHECK: 	movl	8(%esp), %eax
 ; CHECK-NEXT: 	movapd	(%eax), %xmm0
 ; CHECK-NEXT: 	movhpd	12(%esp), %xmm0
 ; CHECK-NEXT: 	movl	4(%esp), %eax
 ; CHECK-NEXT: 	movapd	%xmm0, (%eax)
 ; CHECK-NEXT: 	ret
 }
 

A CHECK-NEXT: directive rejects the input unless there is exactly one newline between it and the previous directive. A CHECK-NEXT cannot be the first directive in a file.

The "CHECK-NOT:" directive

The CHECK-NOT: directive is used to verify that a string doesn't occur between two matches (or the first match and the beginning of the file). For example, to verify that a load is removed by a transformation, a test like this can be used:

 define i8 @coerce_offset0(i32 %V, i32* %P) {
   store i32 %V, i32* %P
    
   %P2 = bitcast i32* %P to i8*
   %P3 = getelementptr i8* %P2, i32 2
 
   %A = load i8* %P3
   ret i8 %A
 ; CHECK: @coerce_offset0
 ; CHECK-NOT: load
 ; CHECK: ret i8
 }
 

FileCheck Pattern Matching Syntax

The CHECK: and CHECK-NOT: directives both take a pattern to match. For most uses of FileCheck, fixed string matching is perfectly sufficient. For some things, a more flexible form of matching is desired. To support this, FileCheck allows you to specify regular expressions in matching strings, surrounded by double braces: {{yourregex}}. Because we want to use fixed string matching for a majority of what we do, FileCheck has been designed to support mixing and matching fixed string matching with regular expressions. This allows you to write things like this:

 ; CHECK: movhpd	{{[0-9]+}}(%esp), {{%xmm[0-7]}}
 

In this case, any offset from the ESP register will be allowed, and any xmm register will be allowed.

Because regular expressions are enclosed with double braces, they are visually distinct, and you don't need to use escape characters within the double braces like you would in C. In the rare case that you want to match double braces explicitly from the input, you can use something ugly like {{[{][{]}} as your pattern.

FileCheck Variables

It is often useful to match a pattern and then verify that it occurs again later in the file. For codegen tests, this can be useful to allow any register, but verify that that register is used consistently later. To do this, FileCheck allows named variables to be defined and substituted into patterns. Here is a simple example:

 ; CHECK: test5:
 ; CHECK:    notw	[[REGISTER:%[a-z]+]]
 ; CHECK:    andw	{{.*}}[[REGISTER]]
 

The first check line matches a regex (%[a-z]+) and captures it into the variable "REGISTER". The second line verifies that whatever is in REGISTER occurs later in the file after an "andw". FileCheck variable references are always contained in [[ ]] pairs, are named, and their names can be formed with the regex "[a-zA-Z][a-zA-Z0-9]*". If a colon follows the name, then it is a definition of the variable; if not, it is a use.

FileCheck variables can be defined multiple times, and uses always get the latest value. Note that variables are all read at the start of a "CHECK" line and are all defined at the end. This means that if you have something like "CHECK: [[XYZ:.*]]x[[XYZ]]", the check line will read the previous value of the XYZ variable and define a new one after the match is performed. If you need to do something like this you can probably take advantage of the fact that FileCheck is not actually line-oriented when it matches; this allows you to define two separate CHECK lines that match on the same line.

Variables and substitutions

With a RUN line there are a number of substitutions that are permitted. In general, any Tcl variable that is available in the substitute function (in test/lib/llvm.exp) can be substituted into a RUN line. To make a substitution just write the variable's name preceded by a $. Additionally, for compatibility reasons with previous versions of the test library, certain names can be accessed with an alternate syntax: a % prefix. These alternates are deprecated and may go away in a future version.

Here are the available variable names. The alternate syntax is listed in parentheses.

$test (%s)
The full path to the test case's source. This is suitable for passing on the command line as the input to an llvm tool.
$srcdir
The source directory from where the "make check" was run.
objdir
The object directory that corresponds to the $srcdir.
subdir
A partial path from the test directory that contains the sub-directory that contains the test source being executed.
srcroot
The root directory of the LLVM src tree.
objroot
The root directory of the LLVM object tree. This could be the same as the srcroot.
path
The path to the directory that contains the test case source. This is for locating any supporting files that are not generated by the test, but used by the test.
tmp
The path to a temporary file name that could be used for this test case. The file name won't conflict with other test cases. You can append to it if you need multiple temporaries. This is useful as the destination of some redirected output.
llvmlibsdir (%llvmlibsdir)
The directory where the LLVM libraries are located.
target_triplet (%target_triplet)
The target triplet that corresponds to the current host machine (the one running the test cases). This should probably be called "host".
llvmgcc (%llvmgcc)
The full path to the llvm-gcc executable as specified in the configured LLVM environment
llvmgxx (%llvmgxx)
The full path to the llvm-gxx executable as specified in the configured LLVM environment
gccpath
The full path to the C compiler used to build LLVM. Note that this might not be gcc.
gxxpath
The full path to the C++ compiler used to build LLVM. Note that this might not be g++.
compile_c (%compile_c)
The full command line used to compile LLVM C source code. This has all the configured -I, -D and optimization options.
compile_cxx (%compile_cxx)
The full command used to compile LLVM C++ source code. This has all the configured -I, -D and optimization options.
link (%link)
The full link command used to link LLVM executables. This has all the configured -I, -L and -l options.
shlibext (%shlibext)
The suffix for the host platform's shared library (dll) files. This includes the period as the first character.

To add more variables, two things need to be changed. First, add a line in the test/Makefile that creates the site.exp file. This will "set" the variable as a global in the site.exp file. Second, in the test/lib/llvm.exp file, in the substitute proc, add the variable name to the list of "global" declarations at the beginning of the proc. That's it, the variable can then be used in test scripts.

Other Features

To make RUN line writing easier, there are several shell scripts located in the llvm/test/Scripts directory. This directory is in the PATH when running tests, so you can just call these scripts using their name. For example:

ignore
This script runs its arguments and then always returns 0. This is useful in cases where the test needs to cause a tool to generate an error (e.g. to check the error output). However, any program in a pipeline that returns a non-zero result will cause the test to fail. This script overcomes that issue and nicely documents that the test case is purposefully ignoring the result code of the tool.
not
This script runs its arguments and then inverts the result code from it. Zero result codes become 1. Non-zero result codes become 0. This is useful to invert the result of a grep. For example "not grep X" means succeed only if you don't find X in the input.
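
For example, the following RUN line (the grep pattern is just a placeholder) passes only if the word "error" does not appear anywhere in the round-tripped output:

 ; RUN: llvm-as < %s | llvm-dis | not grep error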

Sometimes it is necessary to mark a test case as "expected fail" or XFAIL. You can easily mark a test as XFAIL by including XFAIL: on a line near the top of the file. This signals that a failure of the test is expected, and such test cases are counted separately by the testing tool. To specify an expected failure, use the XFAIL keyword in the comments of the test program, followed by a colon and one or more regular expressions (separated by commas). The regular expressions allow you to XFAIL the test conditionally by host platform: the expressions following the colon are matched against the target triplet of the host machine. If there is a match, the test is expected to fail; if not, the test is expected to succeed. To XFAIL everywhere, just specify XFAIL: *. Here is an example of an XFAIL line:

 ; XFAIL: darwin,sun
 

To make the output more useful, the llvm_runtest function will scan the lines of the test case for ones that contain a pattern that matches PR[0-9]+. This is the syntax for specifying a PR (Problem Report) number that is related to the test case. The number after "PR" specifies the LLVM Bugzilla number. When a PR number is specified, it will be used in the pass/fail reporting. This is useful to quickly get some context when a test fails.

Finally, any line that contains "END." will cause the special interpretation of lines to terminate. This is generally done right after the last RUN: line. This has two side effects: (a) it prevents special interpretation of lines that are part of the test program, not the instructions to the test case, and (b) it speeds things up for really big test cases by avoiding interpretation of the remainder of the file.
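
Putting these pieces together, the header of a test file might look something like the following sketch (the PR number and the commands are placeholders, not a real test):

 ; RUN: llvm-as < %s | opt -instcombine | llvm-dis | not grep add
 ; PR1234
 ; END.

Everything after the END. line is treated as ordinary test input rather than as instructions to the test harness.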

Test suite Structure

The test-suite module contains a number of programs that can be compiled with LLVM and executed. These programs are compiled using the native compiler and various LLVM backends. The output from the program compiled with the native compiler is assumed correct; the results from the other programs are compared to the native program output and pass if they match.

When executing tests, it is usually a good idea to start out with a subset of the available tests or programs. This makes test run times smaller at first, and later on it is useful for investigating individual test failures. To run tests on only a subset of programs, simply change directory to the programs you want tested and run gmake there. Alternatively, you can run a different test using the TEST variable to change which tests are run on the selected programs (see below for more information).
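
For example, to run only the SingleSource benchmarks (the directory chosen here is just one possibility), you might do:

 % cd $LLVM_OBJ_ROOT/projects/test-suite/SingleSource/Benchmarks
 % gmake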

In addition to testing correctness, the test-suite directory also performs timing tests of various LLVM optimizations. It also records compilation times for the compilers and the JIT. This information can be used to compare the effectiveness of LLVM's optimizations and code generation.

test-suite tests are divided into three types of tests: MultiSource, SingleSource, and External.

Each tree is then subdivided into several categories, including applications, benchmarks, regression tests, code that is strange grammatically, etc. These organizations should be relatively self-explanatory.

Some tests are known to fail. Some are bugs that we have not fixed yet; others are features that we haven't added yet (or may never add). In the regression tests, the result for such tests will be XFAIL (eXpected FAILure). In this way, you can tell the difference between an expected and unexpected failure.

The tests in the test suite have no such feature at this time. If the test passes, only warnings and other miscellaneous output will be generated. If a test fails, a large <program> FAILED message will be displayed. This will help you separate benign warnings from actual test failures.

Running the test suite

First, all tests are executed within the LLVM object directory tree. They are not executed inside of the LLVM source tree. This is because the test suite creates temporary files during execution.

To run the test suite, you need to use the following steps:

  1. cd into the llvm/projects directory in your source tree.
  2. Check out the test-suite module with:

     % svn co http://llvm.org/svn/llvm-project/test-suite/trunk test-suite
     

    This will get the test suite into llvm/projects/test-suite.

  3. Configure and build llvm.

  4. Configure and build llvm-gcc.

  5. Install llvm-gcc somewhere.

  6. Re-configure llvm from the top level of each build tree (LLVM object directory tree) in which you want to run the test suite, just as you do before building LLVM.

    During the re-configuration, you must either: (1) have the llvm-gcc you just built in your path, or (2) specify the directory where your just-built llvm-gcc is installed using --with-llvmgccdir=$LLVM_GCC_DIR.

    You must also tell the configure machinery that the test suite is available so it can be configured for your build tree:

     % cd $LLVM_OBJ_ROOT ; $LLVM_SRC_ROOT/configure [--with-llvmgccdir=$LLVM_GCC_DIR]
     

    [Remember that $LLVM_GCC_DIR is the directory where you installed llvm-gcc, not its src or obj directory.]

  7. You can now run the test suite from your build tree as follows:

     % cd $LLVM_OBJ_ROOT/projects/test-suite
     % make
     

Note that the second and third steps only need to be done once. After you have the suite checked out and configured, you don't need to do it again (unless the test code or configure script changes).

Configuring External Tests

In order to run the External tests in the test-suite module, you must specify --with-externals. This must be done during the re-configuration step (see above), and the llvm re-configuration must recognize the previously-built llvm-gcc. If any of these is missing or neglected, the External tests won't work.

--with-externals
--with-externals=<directory>
This tells LLVM where to find any external tests. They are expected to be in specifically named subdirectories of <directory>. If directory is left unspecified, configure uses the default value /home/vadve/shared/benchmarks/speccpu2000/benchspec. Subdirectory names known to LLVM include:
spec95
speccpu2000
speccpu2006
povray31
Others are added from time to time, and can be determined from configure.

Running different tests

In addition to the regular "whole program" tests, the test-suite module also provides a mechanism for compiling the programs in different ways. If the variable TEST is defined on the gmake command line, the test system will include a Makefile named TEST.<value of TEST variable>.Makefile. This Makefile can modify build rules to yield different results.

For example, the LLVM nightly tester uses TEST.nightly.Makefile to create the nightly test reports. To run the nightly tests, run gmake TEST=nightly.

There are several TEST Makefiles available in the tree. Some of them are designed for internal LLVM research and will not work outside of the LLVM research group. They may still be valuable, however, as a guide to writing your own TEST Makefile for any optimization or analysis passes that you develop with LLVM.

Generating test output

There are a number of ways to run the tests and generate output. The simplest is to run gmake with no arguments. This will compile and run all programs in the tree using a number of different methods and compare results. Any failures are reported in the output, but are likely drowned in the other output. Passes are not reported explicitly.

Somewhat better is running gmake TEST=sometest test, which runs the specified test and usually adds per-program summaries to the output (depending on which sometest you use). For example, the nightly test explicitly outputs TEST-PASS or TEST-FAIL for every test after each program. Though these lines are still drowned in the output, it's easy to grep the output logs in the Output directories.

Even better are the report and report.format targets (where format is one of html, csv, text or graphs). The exact contents of the report depend on which TEST you are running, but the text results are always shown at the end of the run and the results are always stored in the report.<type>.format file (when running with TEST=<type>). The report also generates a file called report.<type>.raw.out containing the output of the entire test run.
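
For instance (using the nightly test as the example), the plain-text and HTML reports can be produced with:

 % gmake TEST=nightly report
 % gmake TEST=nightly report.html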

Writing custom tests for the test suite

Assuming you can run the test suite (e.g. "gmake TEST=nightly report" should work), it is really easy to run optimizations or code generator components against every program in the tree, collecting statistics or running custom checks for correctness. At base, this is how the nightly tester works; it is just one example of a general framework.

Let's say that you have an LLVM optimization pass, and you want to see how many times it triggers. The first thing you should do is add an LLVM statistic to your pass, which will tally counts of things you care about.
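
A minimal sketch of what this looks like inside a pass follows; the DEBUG_TYPE string, the statistic name, and its description are placeholders, so consult the Programmer's Manual for the exact conventions in your tree:

 #define DEBUG_TYPE "mypass"
 #include "llvm/ADT/Statistic.h"

 STATISTIC(NumTriggered, "Number of times the transformation fired");

 // ... later, in the body of the pass, each time the optimization applies:
 ++NumTriggered;

When the pass is run under opt with -stats, a line for NumTriggered shows up in the statistics output, which is what the report machinery described below greps for.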

Following this, you can set up a test and a report that collects these and formats them for easy viewing. This consists of two files, a "test-suite/TEST.XXX.Makefile" fragment (where XXX is the name of your test) and a "test-suite/TEST.XXX.report" file that indicates how to format the output into a table. There are many example reports of various levels of sophistication included with the test suite, and the framework is very general.

If you are interested in testing an optimization pass, check out the "libcalls" test as an example. It can be run like this:

 % cd llvm/projects/test-suite/MultiSource/Benchmarks  # or some other level
 % make TEST=libcalls report
 

This will do a bunch of stuff, then eventually print a table like this:

 Name                                  | total | #exit |
 ...
 FreeBench/analyzer/analyzer           | 51    | 6     | 
 FreeBench/fourinarow/fourinarow       | 1     | 1     | 
 FreeBench/neural/neural               | 19    | 9     | 
 FreeBench/pifft/pifft                 | 5     | 3     | 
 MallocBench/cfrac/cfrac               | 1     | *     | 
 MallocBench/espresso/espresso         | 52    | 12    | 
 MallocBench/gs/gs                     | 4     | *     | 
 Prolangs-C/TimberWolfMC/timberwolfmc  | 302   | *     | 
 Prolangs-C/agrep/agrep                | 33    | 12    | 
 Prolangs-C/allroots/allroots          | *     | *     | 
 Prolangs-C/assembler/assembler        | 47    | *     | 
 Prolangs-C/bison/mybison              | 74    | *     | 
 ...
 

This is basically grepping the -stats output and displaying it in a table. You can also use the "TEST=libcalls report.html" target to get the table in HTML form; similar targets exist for report.csv and report.tex.

The source for this is in test-suite/TEST.libcalls.*. The format is pretty simple: the Makefile indicates how to run the test (in this case, "opt -simplify-libcalls -stats"), and the report contains one line for each column of the output. The first value is the header for the column and the second is the regex to grep the output of the command for. There are lots of example reports that can do fancy stuff.


John T. Criswell, Daniel Dunbar, Reid Spencer, and Tanya Lattner
The LLVM Compiler Infrastructure
- Last modified: $Date: 2011-05-18 20:07:16 +0200 (Wed, 18 May 2011) $ + Last modified: $Date: 2011-11-03 07:43:23 +0100 (Thu, 03 Nov 2011) $
Index: vendor/llvm/dist/docs/UsingLibraries.html =================================================================== --- vendor/llvm/dist/docs/UsingLibraries.html (revision 228363) +++ vendor/llvm/dist/docs/UsingLibraries.html (revision 228364) @@ -1,447 +1,448 @@ - Using The LLVM Libraries + + Using The LLVM Libraries

Using The LLVM Libraries

  1. Abstract
  2. Introduction
  3. Library Descriptions
  4. Library Dependencies
  5. Linkage Rules Of Thumb
    1. Always link LLVMCore, LLVMSupport, LLVMSystem
    2. Never link both archive and re-linked

Written by Reid Spencer

Warning: This document is out of date; for more information please see llvm-config or, if you use CMake, the CMake LLVM guide.

Abstract

Amongst other things, LLVM is a toolkit for building compilers, linkers, runtime executives, virtual machines, and other program execution related tools. In addition to the LLVM tool set, the functionality of LLVM is available through a set of libraries. To use LLVM as a toolkit for constructing tools, a developer needs to understand what is contained in the various libraries, what they depend on, and how to use them. Fortunately, there is a tool, llvm-config, to aid with this. This document describes the contents of the libraries and how to use llvm-config to generate command line options.

Introduction

If you're writing a compiler, virtual machine, or any other utility based on LLVM, you'll need to figure out which of the many library files you will need to link with to be successful. An understanding of the contents of these libraries will be useful in coming up with an optimal specification for the libraries to link with. The purpose of this document is to reduce some of the trial and error that the author experienced in using LLVM.

LLVM produces two types of libraries: archives (ending in .a) and objects (ending in .o). However, both are libraries. Libraries ending in .o are known as re-linked libraries because they contain all the compilation units of the library linked together as a single .o file. Furthermore, several of the libraries have both forms of library. The re-linked libraries are used whenever you want to include all symbols from the library. The archive libraries are used whenever you want to only resolve outstanding symbols at that point in the link without including everything in the library.

If you're using the LLVM Makefile system to link your tools, you will use the LLVMLIBS make variable (see the Makefile Guide for details). This variable specifies which LLVM libraries to link into your tool and the order in which they will be linked. You specify re-linked libraries by naming the library without a suffix. You specify archive libraries by naming the library with a .a suffix but without the lib prefix. The order in which the libraries appear in the LLVMLIBS variable definition is the order in which they will be linked. Getting this order correct for your tool can sometimes be challenging.
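
As a purely illustrative sketch (the tool name and the particular library selection are hypothetical, and the correct set depends on what your tool actually uses), a tool Makefile in the LLVM Makefile system might set the variable like this:

 LEVEL = ../../..
 TOOLNAME = mytool
 LLVMLIBS = LLVMAnalysis.a LLVMCore.a LLVMSupport.a LLVMSystem.a
 include $(LEVEL)/Makefile.common

A name given with no suffix (for example LLVMX86) would instead pull in the re-linked .o form of that library.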

Library Descriptions

The table below categorizes each library; the form of each library (.a archive or .o re-linked object) is shown by the suffix on its name.
Library  Description
Core Libraries
LLVMArchive.a LLVM archive reading and writing
LLVMAsmParser.a LLVM assembly parsing
LLVMBCReader.a LLVM bitcode reading
LLVMBCWriter.a LLVM bitcode writing
LLVMCore.a LLVM core intermediate representation
LLVMDebugger.a Source level debugging support
LLVMLinker.a Bitcode and archive linking interface
LLVMSupport.a General support utilities
LLVMSystem.a Operating system abstraction layer
LLVMbzip2.a BZip2 compression library
Analysis Libraries
LLVMAnalysis.a Various analysis passes.
LLVMDataStructure.o Data structure analysis passes.
LLVMipa.a Inter-procedural analysis passes.
Transformation Libraries
LLVMInstrumentation.a Instrumentation passes.
LLVMipo.a All inter-procedural optimization passes.
LLVMScalarOpts.a All scalar optimization passes.
LLVMTransformUtils.a Transformation utilities used by many passes.
Code Generation Libraries
LLVMCodeGen.o Native code generation infrastructure
LLVMSelectionDAG.o Aggressive instruction selector for directed acyclic graphs
Target Libraries
LLVMAlpha.o Code generation for Alpha architecture
LLVMARM.o Code generation for ARM architecture
LLVMCBackend.o 'C' language code generator.
LLVMPowerPC.o Code generation for PowerPC architecture
LLVMSparc.o Code generation for Sparc architecture
LLVMTarget.a Generic code generation utilities.
LLVMX86.o Code generation for Intel x86 architecture
Runtime Libraries
LLVMInterpreter.o Bitcode Interpreter
LLVMJIT.o Bitcode JIT Compiler
LLVMExecutionEngine.o Virtual machine engine

Using llvm-config

The llvm-config tool is a Perl script that prints various kinds of information on its output. For example, the source or object directories used to build LLVM can be obtained by passing options to llvm-config. For complete details on this tool, please see the manual page.

To understand the relationships between libraries, the llvm-config tool can be very useful. If all you know is that you want certain libraries to be available, you can generate the complete set of libraries to link with using one of four options, as below (an example invocation follows the list):

  1. --ldflags. This generates the command line options necessary to be passed to the ld tool in order to link with LLVM. Most notably, the -L option is provided to specify a library search directory that contains the LLVM libraries.
  2. --libs. This generates command line options suitable for use with a gcc-style linker. That is, libraries are given with a -l option and object files are given with a full path.
  3. --libnames. This generates a list of just the library file names. If you know the directory in which these files reside (see --ldflags) then you can find the libraries there.
  4. --libfiles. This generates the full path names of the LLVM library files.
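
For instance, a typical link line for a small tool can let llvm-config supply the directory and library options; the tool name and the component names (core, support) below are only an example:

 % g++ mytool.o `llvm-config --ldflags --libs core support` -o mytool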

If you wish to delve further into how llvm-config generates the correct order (based on library dependencies), please see the tool named GenLibDeps.pl in the utils source directory of LLVM.

Dependency Relationships Of Libraries

This graph shows the dependency of archive libraries on other archive libraries or objects. Where a library has both archive and object forms, only the archive form is shown.

Library Dependencies

Dependency Relationships Of Object Files

This graph shows the dependency of object files on archive libraries or other objects. Where a library has both object and archive forms, only the dependency to the archive form is shown.

Object File Dependencies

The following list shows the dependency relationships between libraries in textual form. The information is the same as shown on the graphs but arranged alphabetically.

libLLVMAnalysis.a
  • libLLVMCore.a
  • libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMTarget.a
libLLVMArchive.a
  • libLLVMBCReader.a
  • libLLVMCore.a
  • libLLVMSupport.a
  • libLLVMSystem.a
libLLVMAsmParser.a
  • libLLVMCore.a
  • libLLVMSystem.a
libLLVMBCReader.a
  • libLLVMCore.a
  • libLLVMSupport.a
  • libLLVMSystem.a
libLLVMBCWriter.a
  • libLLVMCore.a
  • libLLVMSupport.a
  • libLLVMSystem.a
libLLVMCodeGen.a
  • libLLVMAnalysis.a
  • libLLVMCore.a
  • libLLVMScalarOpts.a
  • libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMTarget.a
  • libLLVMTransformUtils.a
libLLVMCore.a
  • libLLVMSupport.a
  • libLLVMSystem.a
libLLVMDebugger.a
  • libLLVMBCReader.a
  • libLLVMCore.a
  • libLLVMSupport.a
  • libLLVMSystem.a
libLLVMInstrumentation.a
  • libLLVMCore.a
  • libLLVMScalarOpts.a
  • libLLVMSupport.a
  • libLLVMTransformUtils.a
libLLVMLinker.a
  • libLLVMArchive.a
  • libLLVMBCReader.a
  • libLLVMCore.a
  • libLLVMSupport.a
  • libLLVMSystem.a
libLLVMScalarOpts.a
  • libLLVMAnalysis.a
  • libLLVMCore.a
  • libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMTarget.a
  • libLLVMTransformUtils.a
libLLVMSelectionDAG.a
  • libLLVMAnalysis.a
  • libLLVMCodeGen.a
  • libLLVMCore.a
  • libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMTarget.a
  • libLLVMTransformUtils.a
libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMbzip2.a
libLLVMSystem.a
libLLVMTarget.a
  • libLLVMCore.a
  • libLLVMSupport.a
  • libLLVMSystem.a
libLLVMTransformUtils.a
  • libLLVMAnalysis.a
  • libLLVMCore.a
  • libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMTarget.a
  • libLLVMipa.a
libLLVMbzip2.a
libLLVMipa.a
  • libLLVMAnalysis.a
  • libLLVMCore.a
  • libLLVMSupport.a
  • libLLVMSystem.a
libLLVMipo.a
  • libLLVMAnalysis.a
  • libLLVMCore.a
  • libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMTarget.a
  • libLLVMTransformUtils.a
  • libLLVMipa.a
libLLVMlto.a
  • libLLVMAnalysis.a
  • libLLVMBCReader.a
  • libLLVMBCWriter.a
  • libLLVMCore.a
  • libLLVMLinker.a
  • libLLVMScalarOpts.a
  • libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMTarget.a
  • libLLVMipa.a
  • libLLVMipo.a
LLVMARM.o
  • libLLVMCodeGen.a
  • libLLVMCore.a
  • libLLVMSelectionDAG.a
  • libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMTarget.a
LLVMAlpha.o
  • libLLVMCodeGen.a
  • libLLVMCore.a
  • libLLVMSelectionDAG.a
  • libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMTarget.a
LLVMCBackend.o
  • libLLVMAnalysis.a
  • libLLVMCodeGen.a
  • libLLVMCore.a
  • libLLVMScalarOpts.a
  • libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMTarget.a
  • libLLVMTransformUtils.a
  • libLLVMipa.a
LLVMExecutionEngine.o
  • libLLVMCore.a
  • libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMTarget.a
LLVMInterpreter.o
  • LLVMExecutionEngine.o
  • libLLVMCodeGen.a
  • libLLVMCore.a
  • libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMTarget.a
LLVMJIT.o
  • LLVMExecutionEngine.o
  • libLLVMCore.a
  • libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMTarget.a
LLVMPowerPC.o
  • libLLVMCodeGen.a
  • libLLVMCore.a
  • libLLVMSelectionDAG.a
  • libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMTarget.a
LLVMSparc.o
  • libLLVMCodeGen.a
  • libLLVMCore.a
  • libLLVMSelectionDAG.a
  • libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMTarget.a
LLVMX86.o
  • libLLVMCodeGen.a
  • libLLVMCore.a
  • libLLVMSelectionDAG.a
  • libLLVMSupport.a
  • libLLVMSystem.a
  • libLLVMTarget.a

Linkage Rules Of Thumb

This section contains various "rules of thumb" about what files you should link into your programs.

Always Link LLVMCore, LLVMSupport, and LLVMSystem

No matter what you do with LLVM, the last three entries in the value of your LLVMLIBS make variable should always be: LLVMCore LLVMSupport.a LLVMSystem.a. There are no LLVM programs that don't depend on these three.

Never link both archive and re-linked library

There is never any point to linking both the re-linked (.o) and the archive (.a) versions of a library. Since the re-linked version includes the entire library, the archive version will not resolve any symbols. You could even end up with a link error if you place the archive version before the re-linked version on the linker's command line.


+
Last modified: $Date: 2011-11-03 07:43:23 +0100 (Thu, 03 Nov 2011) $ Index: vendor/llvm/dist/docs/index.html =================================================================== --- vendor/llvm/dist/docs/index.html (revision 228363) +++ vendor/llvm/dist/docs/index.html (revision 228364) @@ -1,291 +1,292 @@ + Documentation for the LLVM System at SVN head

Documentation for the LLVM System at SVN head

If you are using a released version of LLVM, see the download page to find your documentation.


Written by The LLVM Team

LLVM Design & Overview

LLVM User Guides

General LLVM Programming Documentation

LLVM Subsystem Documentation

LLVM Mailing Lists


LLVM Compiler Infrastructure
- Last modified: $Date: 2011-10-11 18:35:07 +0200 (Tue, 11 Oct 2011) $ + Last modified: $Date: 2011-11-03 07:43:23 +0100 (Thu, 03 Nov 2011) $
Index: vendor/llvm/dist/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp =================================================================== --- vendor/llvm/dist/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp (revision 228363) +++ vendor/llvm/dist/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp (revision 228364) @@ -1,1411 +1,1413 @@ //===-- llvm/CodeGen/DwarfCompileUnit.cpp - Dwarf Compile Unit ------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file contains support for writing dwarf compile unit. // //===----------------------------------------------------------------------===// #define DEBUG_TYPE "dwarfdebug" #include "DwarfCompileUnit.h" #include "DwarfDebug.h" #include "llvm/Constants.h" #include "llvm/GlobalVariable.h" #include "llvm/Instructions.h" #include "llvm/Analysis/DIBuilder.h" #include "llvm/Target/Mangler.h" #include "llvm/Target/TargetData.h" #include "llvm/Target/TargetFrameLowering.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Target/TargetRegisterInfo.h" #include "llvm/ADT/APFloat.h" #include "llvm/Support/ErrorHandling.h" using namespace llvm; /// CompileUnit - Compile unit constructor. CompileUnit::CompileUnit(unsigned I, DIE *D, AsmPrinter *A, DwarfDebug *DW) : ID(I), CUDie(D), Asm(A), DD(DW), IndexTyDie(0) { DIEIntegerOne = new (DIEValueAllocator) DIEInteger(1); } /// ~CompileUnit - Destructor for compile unit. CompileUnit::~CompileUnit() { for (unsigned j = 0, M = DIEBlocks.size(); j < M; ++j) DIEBlocks[j]->~DIEBlock(); } /// createDIEEntry - Creates a new DIEEntry to be a proxy for a debug /// information entry. DIEEntry *CompileUnit::createDIEEntry(DIE *Entry) { DIEEntry *Value = new (DIEValueAllocator) DIEEntry(Entry); return Value; } /// addUInt - Add an unsigned integer attribute data and value. /// void CompileUnit::addUInt(DIE *Die, unsigned Attribute, unsigned Form, uint64_t Integer) { if (!Form) Form = DIEInteger::BestForm(false, Integer); DIEValue *Value = Integer == 1 ? DIEIntegerOne : new (DIEValueAllocator) DIEInteger(Integer); Die->addValue(Attribute, Form, Value); } /// addSInt - Add an signed integer attribute data and value. /// void CompileUnit::addSInt(DIE *Die, unsigned Attribute, unsigned Form, int64_t Integer) { if (!Form) Form = DIEInteger::BestForm(true, Integer); DIEValue *Value = new (DIEValueAllocator) DIEInteger(Integer); Die->addValue(Attribute, Form, Value); } /// addString - Add a string attribute data and value. DIEString only /// keeps string reference. void CompileUnit::addString(DIE *Die, unsigned Attribute, unsigned Form, StringRef String) { DIEValue *Value = new (DIEValueAllocator) DIEString(String); Die->addValue(Attribute, Form, Value); } /// addLabel - Add a Dwarf label attribute data and value. /// void CompileUnit::addLabel(DIE *Die, unsigned Attribute, unsigned Form, const MCSymbol *Label) { DIEValue *Value = new (DIEValueAllocator) DIELabel(Label); Die->addValue(Attribute, Form, Value); } /// addDelta - Add a label delta attribute data and value. /// void CompileUnit::addDelta(DIE *Die, unsigned Attribute, unsigned Form, const MCSymbol *Hi, const MCSymbol *Lo) { DIEValue *Value = new (DIEValueAllocator) DIEDelta(Hi, Lo); Die->addValue(Attribute, Form, Value); } /// addDIEEntry - Add a DIE attribute data and value. 
/// void CompileUnit::addDIEEntry(DIE *Die, unsigned Attribute, unsigned Form, DIE *Entry) { Die->addValue(Attribute, Form, createDIEEntry(Entry)); } /// addBlock - Add block data. /// void CompileUnit::addBlock(DIE *Die, unsigned Attribute, unsigned Form, DIEBlock *Block) { Block->ComputeSize(Asm); DIEBlocks.push_back(Block); // Memoize so we can call the destructor later on. Die->addValue(Attribute, Block->BestForm(), Block); } /// addSourceLine - Add location information to specified debug information /// entry. void CompileUnit::addSourceLine(DIE *Die, DIVariable V) { // Verify variable. if (!V.Verify()) return; unsigned Line = V.getLineNumber(); if (Line == 0) return; unsigned FileID = DD->GetOrCreateSourceID(V.getContext().getFilename(), V.getContext().getDirectory()); assert(FileID && "Invalid file id"); addUInt(Die, dwarf::DW_AT_decl_file, 0, FileID); addUInt(Die, dwarf::DW_AT_decl_line, 0, Line); } /// addSourceLine - Add location information to specified debug information /// entry. void CompileUnit::addSourceLine(DIE *Die, DIGlobalVariable G) { // Verify global variable. if (!G.Verify()) return; unsigned Line = G.getLineNumber(); if (Line == 0) return; unsigned FileID = DD->GetOrCreateSourceID(G.getFilename(), G.getDirectory()); assert(FileID && "Invalid file id"); addUInt(Die, dwarf::DW_AT_decl_file, 0, FileID); addUInt(Die, dwarf::DW_AT_decl_line, 0, Line); } /// addSourceLine - Add location information to specified debug information /// entry. void CompileUnit::addSourceLine(DIE *Die, DISubprogram SP) { // Verify subprogram. if (!SP.Verify()) return; // If the line number is 0, don't add it. if (SP.getLineNumber() == 0) return; unsigned Line = SP.getLineNumber(); if (!SP.getContext().Verify()) return; unsigned FileID = DD->GetOrCreateSourceID(SP.getFilename(), SP.getDirectory()); assert(FileID && "Invalid file id"); addUInt(Die, dwarf::DW_AT_decl_file, 0, FileID); addUInt(Die, dwarf::DW_AT_decl_line, 0, Line); } /// addSourceLine - Add location information to specified debug information /// entry. void CompileUnit::addSourceLine(DIE *Die, DIType Ty) { // Verify type. if (!Ty.Verify()) return; unsigned Line = Ty.getLineNumber(); if (Line == 0 || !Ty.getContext().Verify()) return; unsigned FileID = DD->GetOrCreateSourceID(Ty.getFilename(), Ty.getDirectory()); assert(FileID && "Invalid file id"); addUInt(Die, dwarf::DW_AT_decl_file, 0, FileID); addUInt(Die, dwarf::DW_AT_decl_line, 0, Line); } /// addSourceLine - Add location information to specified debug information /// entry. void CompileUnit::addSourceLine(DIE *Die, DINameSpace NS) { // Verify namespace. if (!NS.Verify()) return; unsigned Line = NS.getLineNumber(); if (Line == 0) return; StringRef FN = NS.getFilename(); unsigned FileID = DD->GetOrCreateSourceID(FN, NS.getDirectory()); assert(FileID && "Invalid file id"); addUInt(Die, dwarf::DW_AT_decl_file, 0, FileID); addUInt(Die, dwarf::DW_AT_decl_line, 0, Line); } /// addVariableAddress - Add DW_AT_location attribute for a /// DbgVariable based on provided MachineLocation. void CompileUnit::addVariableAddress(DbgVariable *&DV, DIE *Die, MachineLocation Location) { if (DV->variableHasComplexAddress()) addComplexAddress(DV, Die, dwarf::DW_AT_location, Location); else if (DV->isBlockByrefVariable()) addBlockByrefAddress(DV, Die, dwarf::DW_AT_location, Location); else addAddress(Die, dwarf::DW_AT_location, Location); } /// addRegisterOp - Add register operand. 
void CompileUnit::addRegisterOp(DIE *TheDie, unsigned Reg) { const TargetRegisterInfo *RI = Asm->TM.getRegisterInfo(); unsigned DWReg = RI->getDwarfRegNum(Reg, false); if (DWReg < 32) addUInt(TheDie, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_reg0 + DWReg); else { addUInt(TheDie, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_regx); addUInt(TheDie, 0, dwarf::DW_FORM_udata, DWReg); } } /// addRegisterOffset - Add register offset. void CompileUnit::addRegisterOffset(DIE *TheDie, unsigned Reg, int64_t Offset) { const TargetRegisterInfo *RI = Asm->TM.getRegisterInfo(); unsigned DWReg = RI->getDwarfRegNum(Reg, false); const TargetRegisterInfo *TRI = Asm->TM.getRegisterInfo(); if (Reg == TRI->getFrameRegister(*Asm->MF)) // If variable offset is based in frame register then use fbreg. addUInt(TheDie, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_fbreg); else if (DWReg < 32) addUInt(TheDie, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_breg0 + DWReg); else { addUInt(TheDie, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_bregx); addUInt(TheDie, 0, dwarf::DW_FORM_udata, DWReg); } addSInt(TheDie, 0, dwarf::DW_FORM_sdata, Offset); } /// addAddress - Add an address attribute to a die based on the location /// provided. void CompileUnit::addAddress(DIE *Die, unsigned Attribute, const MachineLocation &Location) { DIEBlock *Block = new (DIEValueAllocator) DIEBlock(); if (Location.isReg()) addRegisterOp(Block, Location.getReg()); else addRegisterOffset(Block, Location.getReg(), Location.getOffset()); // Now attach the location information to the DIE. addBlock(Die, Attribute, 0, Block); } /// addComplexAddress - Start with the address based on the location provided, /// and generate the DWARF information necessary to find the actual variable /// given the extra address information encoded in the DIVariable, starting from /// the starting location. Add the DWARF information to the die. /// void CompileUnit::addComplexAddress(DbgVariable *&DV, DIE *Die, unsigned Attribute, const MachineLocation &Location) { DIEBlock *Block = new (DIEValueAllocator) DIEBlock(); unsigned N = DV->getNumAddrElements(); unsigned i = 0; if (Location.isReg()) { if (N >= 2 && DV->getAddrElement(0) == DIBuilder::OpPlus) { // If first address element is OpPlus then emit // DW_OP_breg + Offset instead of DW_OP_reg + Offset. addRegisterOffset(Block, Location.getReg(), DV->getAddrElement(1)); i = 2; } else addRegisterOp(Block, Location.getReg()); } else addRegisterOffset(Block, Location.getReg(), Location.getOffset()); for (;i < N; ++i) { uint64_t Element = DV->getAddrElement(i); if (Element == DIBuilder::OpPlus) { addUInt(Block, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_plus_uconst); addUInt(Block, 0, dwarf::DW_FORM_udata, DV->getAddrElement(++i)); } else if (Element == DIBuilder::OpDeref) { addUInt(Block, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_deref); } else llvm_unreachable("unknown DIBuilder Opcode"); } // Now attach the location information to the DIE. addBlock(Die, Attribute, 0, Block); } /* Byref variables, in Blocks, are declared by the programmer as "SomeType VarName;", but the compiler creates a __Block_byref_x_VarName struct, and gives the variable VarName either the struct, or a pointer to the struct, as its type. This is necessary for various behind-the-scenes things the compiler needs to do with by-reference variables in Blocks. However, as far as the original *programmer* is concerned, the variable should still have type 'SomeType', as originally declared. 
The function getBlockByrefType dives into the __Block_byref_x_VarName struct to find the original type of the variable, which is then assigned to the variable's Debug Information Entry as its real type. So far, so good. However now the debugger will expect the variable VarName to have the type SomeType. So we need the location attribute for the variable to be an expression that explains to the debugger how to navigate through the pointers and struct to find the actual variable of type SomeType. The following function does just that. We start by getting the "normal" location for the variable. This will be the location of either the struct __Block_byref_x_VarName or the pointer to the struct __Block_byref_x_VarName. The struct will look something like: struct __Block_byref_x_VarName { ... struct __Block_byref_x_VarName *forwarding; ... SomeType VarName; ... }; If we are given the struct directly (as our starting point) we need to tell the debugger to: 1). Add the offset of the forwarding field. 2). Follow that pointer to get the real __Block_byref_x_VarName struct to use (the real one may have been copied onto the heap). 3). Add the offset for the field VarName, to find the actual variable. If we started with a pointer to the struct, then we need to dereference that pointer first, before the other steps. Translating this into DWARF ops, we will need to append the following to the current location description for the variable: DW_OP_deref -- optional, if we start with a pointer DW_OP_plus_uconst DW_OP_deref DW_OP_plus_uconst That is what this function does. */ /// addBlockByrefAddress - Start with the address based on the location /// provided, and generate the DWARF information necessary to find the /// actual Block variable (navigating the Block struct) based on the /// starting location. Add the DWARF information to the die. For /// more information, read large comment just above here. /// void CompileUnit::addBlockByrefAddress(DbgVariable *&DV, DIE *Die, unsigned Attribute, const MachineLocation &Location) { DIType Ty = DV->getType(); DIType TmpTy = Ty; unsigned Tag = Ty.getTag(); bool isPointer = false; StringRef varName = DV->getName(); if (Tag == dwarf::DW_TAG_pointer_type) { DIDerivedType DTy = DIDerivedType(Ty); TmpTy = DTy.getTypeDerivedFrom(); isPointer = true; } DICompositeType blockStruct = DICompositeType(TmpTy); // Find the __forwarding field and the variable field in the __Block_byref // struct. DIArray Fields = blockStruct.getTypeArray(); DIDescriptor varField = DIDescriptor(); DIDescriptor forwardingField = DIDescriptor(); for (unsigned i = 0, N = Fields.getNumElements(); i < N; ++i) { DIDescriptor Element = Fields.getElement(i); DIDerivedType DT = DIDerivedType(Element); StringRef fieldName = DT.getName(); if (fieldName == "__forwarding") forwardingField = Element; else if (fieldName == varName) varField = Element; } // Get the offsets for the forwarding field and the variable field. unsigned forwardingFieldOffset = DIDerivedType(forwardingField).getOffsetInBits() >> 3; unsigned varFieldOffset = DIDerivedType(varField).getOffsetInBits() >> 3; // Decode the original location, and use that as the start of the byref // variable's location. 
const TargetRegisterInfo *RI = Asm->TM.getRegisterInfo(); unsigned Reg = RI->getDwarfRegNum(Location.getReg(), false); DIEBlock *Block = new (DIEValueAllocator) DIEBlock(); if (Location.isReg()) { if (Reg < 32) addUInt(Block, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_reg0 + Reg); else { addUInt(Block, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_regx); addUInt(Block, 0, dwarf::DW_FORM_udata, Reg); } } else { if (Reg < 32) addUInt(Block, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_breg0 + Reg); else { addUInt(Block, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_bregx); addUInt(Block, 0, dwarf::DW_FORM_udata, Reg); } addUInt(Block, 0, dwarf::DW_FORM_sdata, Location.getOffset()); } // If we started with a pointer to the __Block_byref... struct, then // the first thing we need to do is dereference the pointer (DW_OP_deref). if (isPointer) addUInt(Block, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_deref); // Next add the offset for the '__forwarding' field: // DW_OP_plus_uconst ForwardingFieldOffset. Note there's no point in // adding the offset if it's 0. if (forwardingFieldOffset > 0) { addUInt(Block, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_plus_uconst); addUInt(Block, 0, dwarf::DW_FORM_udata, forwardingFieldOffset); } // Now dereference the __forwarding field to get to the real __Block_byref // struct: DW_OP_deref. addUInt(Block, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_deref); // Now that we've got the real __Block_byref... struct, add the offset // for the variable's field to get to the location of the actual variable: // DW_OP_plus_uconst varFieldOffset. Again, don't add if it's 0. if (varFieldOffset > 0) { addUInt(Block, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_plus_uconst); addUInt(Block, 0, dwarf::DW_FORM_udata, varFieldOffset); } // Now attach the location information to the DIE. addBlock(Die, Attribute, 0, Block); } /// isTypeSigned - Return true if the type is signed. static bool isTypeSigned(DIType Ty, int *SizeInBits) { if (Ty.isDerivedType()) return isTypeSigned(DIDerivedType(Ty).getTypeDerivedFrom(), SizeInBits); if (Ty.isBasicType()) if (DIBasicType(Ty).getEncoding() == dwarf::DW_ATE_signed || DIBasicType(Ty).getEncoding() == dwarf::DW_ATE_signed_char) { *SizeInBits = Ty.getSizeInBits(); return true; } return false; } /// addConstantValue - Add constant value entry in variable DIE. bool CompileUnit::addConstantValue(DIE *Die, const MachineOperand &MO, DIType Ty) { assert (MO.isImm() && "Invalid machine operand!"); DIEBlock *Block = new (DIEValueAllocator) DIEBlock(); int SizeInBits = -1; bool SignedConstant = isTypeSigned(Ty, &SizeInBits); unsigned Form = SignedConstant ? dwarf::DW_FORM_sdata : dwarf::DW_FORM_udata; switch (SizeInBits) { case 8: Form = dwarf::DW_FORM_data1; break; case 16: Form = dwarf::DW_FORM_data2; break; case 32: Form = dwarf::DW_FORM_data4; break; case 64: Form = dwarf::DW_FORM_data8; break; default: break; } SignedConstant ? addSInt(Block, 0, Form, MO.getImm()) : addUInt(Block, 0, Form, MO.getImm()); addBlock(Die, dwarf::DW_AT_const_value, 0, Block); return true; } /// addConstantFPValue - Add constant value entry in variable DIE. bool CompileUnit::addConstantFPValue(DIE *Die, const MachineOperand &MO) { assert (MO.isFPImm() && "Invalid machine operand!"); DIEBlock *Block = new (DIEValueAllocator) DIEBlock(); APFloat FPImm = MO.getFPImm()->getValueAPF(); // Get the raw data form of the floating point. const APInt FltVal = FPImm.bitcastToAPInt(); const char *FltPtr = (const char*)FltVal.getRawData(); int NumBytes = FltVal.getBitWidth() / 8; // 8 bits per byte. 
bool LittleEndian = Asm->getTargetData().isLittleEndian(); int Incr = (LittleEndian ? 1 : -1); int Start = (LittleEndian ? 0 : NumBytes - 1); int Stop = (LittleEndian ? NumBytes : -1); // Output the constant to DWARF one byte at a time. for (; Start != Stop; Start += Incr) addUInt(Block, 0, dwarf::DW_FORM_data1, (unsigned char)0xFF & FltPtr[Start]); addBlock(Die, dwarf::DW_AT_const_value, 0, Block); return true; } /// addConstantValue - Add constant value entry in variable DIE. bool CompileUnit::addConstantValue(DIE *Die, const ConstantInt *CI, bool Unsigned) { unsigned CIBitWidth = CI->getBitWidth(); if (CIBitWidth <= 64) { unsigned form = 0; switch (CIBitWidth) { case 8: form = dwarf::DW_FORM_data1; break; case 16: form = dwarf::DW_FORM_data2; break; case 32: form = dwarf::DW_FORM_data4; break; case 64: form = dwarf::DW_FORM_data8; break; default: form = Unsigned ? dwarf::DW_FORM_udata : dwarf::DW_FORM_sdata; } if (Unsigned) addUInt(Die, dwarf::DW_AT_const_value, form, CI->getZExtValue()); else addSInt(Die, dwarf::DW_AT_const_value, form, CI->getSExtValue()); return true; } DIEBlock *Block = new (DIEValueAllocator) DIEBlock(); // Get the raw data form of the large APInt. const APInt Val = CI->getValue(); - const char *Ptr = (const char*)Val.getRawData(); + const uint64_t *Ptr64 = Val.getRawData(); int NumBytes = Val.getBitWidth() / 8; // 8 bits per byte. bool LittleEndian = Asm->getTargetData().isLittleEndian(); - int Incr = (LittleEndian ? 1 : -1); - int Start = (LittleEndian ? 0 : NumBytes - 1); - int Stop = (LittleEndian ? NumBytes : -1); // Output the constant to DWARF one byte at a time. - for (; Start != Stop; Start += Incr) - addUInt(Block, 0, dwarf::DW_FORM_data1, - (unsigned char)0xFF & Ptr[Start]); + for (int i = 0; i < NumBytes; i++) { + uint8_t c; + if (LittleEndian) + c = Ptr64[i / 8] >> (8 * (i & 7)); + else + c = Ptr64[(NumBytes - 1 - i) / 8] >> (8 * ((NumBytes - 1 - i) & 7)); + addUInt(Block, 0, dwarf::DW_FORM_data1, c); + } addBlock(Die, dwarf::DW_AT_const_value, 0, Block); return true; } /// addTemplateParams - Add template parameters in buffer. void CompileUnit::addTemplateParams(DIE &Buffer, DIArray TParams) { // Add template parameters. for (unsigned i = 0, e = TParams.getNumElements(); i != e; ++i) { DIDescriptor Element = TParams.getElement(i); if (Element.isTemplateTypeParameter()) Buffer.addChild(getOrCreateTemplateTypeParameterDIE( DITemplateTypeParameter(Element))); else if (Element.isTemplateValueParameter()) Buffer.addChild(getOrCreateTemplateValueParameterDIE( DITemplateValueParameter(Element))); } } /// addToContextOwner - Add Die into the list of its context owner's children. void CompileUnit::addToContextOwner(DIE *Die, DIDescriptor Context) { if (Context.isType()) { DIE *ContextDIE = getOrCreateTypeDIE(DIType(Context)); ContextDIE->addChild(Die); } else if (Context.isNameSpace()) { DIE *ContextDIE = getOrCreateNameSpace(DINameSpace(Context)); ContextDIE->addChild(Die); } else if (Context.isSubprogram()) { DIE *ContextDIE = getOrCreateSubprogramDIE(DISubprogram(Context)); ContextDIE->addChild(Die); } else if (DIE *ContextDIE = getDIE(Context)) ContextDIE->addChild(Die); else addDie(Die); } /// getOrCreateTypeDIE - Find existing DIE or create new DIE for the /// given DIType. DIE *CompileUnit::getOrCreateTypeDIE(const MDNode *TyNode) { DIType Ty(TyNode); if (!Ty.Verify()) return NULL; DIE *TyDIE = getDIE(Ty); if (TyDIE) return TyDIE; // Create new type. 
TyDIE = new DIE(dwarf::DW_TAG_base_type); insertDIE(Ty, TyDIE); if (Ty.isBasicType()) constructTypeDIE(*TyDIE, DIBasicType(Ty)); else if (Ty.isCompositeType()) constructTypeDIE(*TyDIE, DICompositeType(Ty)); else { assert(Ty.isDerivedType() && "Unknown kind of DIType"); constructTypeDIE(*TyDIE, DIDerivedType(Ty)); } addToContextOwner(TyDIE, Ty.getContext()); return TyDIE; } /// addType - Add a new type attribute to the specified entity. void CompileUnit::addType(DIE *Entity, DIType Ty) { if (!Ty.Verify()) return; // Check for pre-existence. DIEEntry *Entry = getDIEEntry(Ty); // If it exists then use the existing value. if (Entry) { Entity->addValue(dwarf::DW_AT_type, dwarf::DW_FORM_ref4, Entry); return; } // Construct type. DIE *Buffer = getOrCreateTypeDIE(Ty); // Set up proxy. Entry = createDIEEntry(Buffer); insertDIEEntry(Ty, Entry); Entity->addValue(dwarf::DW_AT_type, dwarf::DW_FORM_ref4, Entry); // If this is a complete composite type then include it in the // list of global types. addGlobalType(Ty); } /// addGlobalType - Add a new global type to the compile unit. /// void CompileUnit::addGlobalType(DIType Ty) { DIDescriptor Context = Ty.getContext(); if (Ty.isCompositeType() && !Ty.getName().empty() && !Ty.isForwardDecl() && (!Context || Context.isCompileUnit() || Context.isFile() || Context.isNameSpace())) if (DIEEntry *Entry = getDIEEntry(Ty)) GlobalTypes[Ty.getName()] = Entry->getEntry(); } /// addPubTypes - Add type for pubtypes section. void CompileUnit::addPubTypes(DISubprogram SP) { DICompositeType SPTy = SP.getType(); unsigned SPTag = SPTy.getTag(); if (SPTag != dwarf::DW_TAG_subroutine_type) return; DIArray Args = SPTy.getTypeArray(); for (unsigned i = 0, e = Args.getNumElements(); i != e; ++i) { DIType ATy(Args.getElement(i)); if (!ATy.Verify()) continue; addGlobalType(ATy); } } /// constructTypeDIE - Construct basic type die from DIBasicType. void CompileUnit::constructTypeDIE(DIE &Buffer, DIBasicType BTy) { // Get core information. StringRef Name = BTy.getName(); // Add name if not anonymous or intermediate type. if (!Name.empty()) addString(&Buffer, dwarf::DW_AT_name, dwarf::DW_FORM_string, Name); if (BTy.getTag() == dwarf::DW_TAG_unspecified_type) { Buffer.setTag(dwarf::DW_TAG_unspecified_type); // Unspecified types has only name, nothing else. return; } Buffer.setTag(dwarf::DW_TAG_base_type); addUInt(&Buffer, dwarf::DW_AT_encoding, dwarf::DW_FORM_data1, BTy.getEncoding()); uint64_t Size = BTy.getSizeInBits() >> 3; addUInt(&Buffer, dwarf::DW_AT_byte_size, 0, Size); } /// constructTypeDIE - Construct derived type die from DIDerivedType. void CompileUnit::constructTypeDIE(DIE &Buffer, DIDerivedType DTy) { // Get core information. StringRef Name = DTy.getName(); uint64_t Size = DTy.getSizeInBits() >> 3; unsigned Tag = DTy.getTag(); // FIXME - Workaround for templates. if (Tag == dwarf::DW_TAG_inheritance) Tag = dwarf::DW_TAG_reference_type; Buffer.setTag(Tag); // Map to main type, void will not have a type. DIType FromTy = DTy.getTypeDerivedFrom(); addType(&Buffer, FromTy); // Add name if not anonymous or intermediate type. if (!Name.empty()) addString(&Buffer, dwarf::DW_AT_name, dwarf::DW_FORM_string, Name); // Add size if non-zero (derived types might be zero-sized.) if (Size) addUInt(&Buffer, dwarf::DW_AT_byte_size, 0, Size); // Add source line info if available and TyDesc is not a forward declaration. if (!DTy.isForwardDecl()) addSourceLine(&Buffer, DTy); } /// constructTypeDIE - Construct type DIE from DICompositeType. 
void CompileUnit::constructTypeDIE(DIE &Buffer, DICompositeType CTy) { // Get core information. StringRef Name = CTy.getName(); uint64_t Size = CTy.getSizeInBits() >> 3; unsigned Tag = CTy.getTag(); Buffer.setTag(Tag); switch (Tag) { case dwarf::DW_TAG_vector_type: case dwarf::DW_TAG_array_type: constructArrayTypeDIE(Buffer, &CTy); break; case dwarf::DW_TAG_enumeration_type: { DIArray Elements = CTy.getTypeArray(); // Add enumerators to enumeration type. for (unsigned i = 0, N = Elements.getNumElements(); i < N; ++i) { DIE *ElemDie = NULL; DIDescriptor Enum(Elements.getElement(i)); if (Enum.isEnumerator()) { ElemDie = constructEnumTypeDIE(DIEnumerator(Enum)); Buffer.addChild(ElemDie); } } } break; case dwarf::DW_TAG_subroutine_type: { // Add return type. DIArray Elements = CTy.getTypeArray(); DIDescriptor RTy = Elements.getElement(0); addType(&Buffer, DIType(RTy)); bool isPrototyped = true; // Add arguments. for (unsigned i = 1, N = Elements.getNumElements(); i < N; ++i) { DIDescriptor Ty = Elements.getElement(i); if (Ty.isUnspecifiedParameter()) { DIE *Arg = new DIE(dwarf::DW_TAG_unspecified_parameters); Buffer.addChild(Arg); isPrototyped = false; } else { DIE *Arg = new DIE(dwarf::DW_TAG_formal_parameter); addType(Arg, DIType(Ty)); Buffer.addChild(Arg); } } // Add prototype flag. if (isPrototyped) addUInt(&Buffer, dwarf::DW_AT_prototyped, dwarf::DW_FORM_flag, 1); } break; case dwarf::DW_TAG_structure_type: case dwarf::DW_TAG_union_type: case dwarf::DW_TAG_class_type: { // Add elements to structure type. DIArray Elements = CTy.getTypeArray(); // A forward struct declared type may not have elements available. unsigned N = Elements.getNumElements(); if (N == 0) break; // Add elements to structure type. for (unsigned i = 0; i < N; ++i) { DIDescriptor Element = Elements.getElement(i); DIE *ElemDie = NULL; if (Element.isSubprogram()) { DISubprogram SP(Element); ElemDie = getOrCreateSubprogramDIE(DISubprogram(Element)); if (SP.isProtected()) addUInt(ElemDie, dwarf::DW_AT_accessibility, dwarf::DW_FORM_flag, dwarf::DW_ACCESS_protected); else if (SP.isPrivate()) addUInt(ElemDie, dwarf::DW_AT_accessibility, dwarf::DW_FORM_flag, dwarf::DW_ACCESS_private); else addUInt(ElemDie, dwarf::DW_AT_accessibility, dwarf::DW_FORM_flag, dwarf::DW_ACCESS_public); if (SP.isExplicit()) addUInt(ElemDie, dwarf::DW_AT_explicit, dwarf::DW_FORM_flag, 1); } else if (Element.isVariable()) { DIVariable DV(Element); ElemDie = new DIE(dwarf::DW_TAG_variable); addString(ElemDie, dwarf::DW_AT_name, dwarf::DW_FORM_string, DV.getName()); addType(ElemDie, DV.getType()); addUInt(ElemDie, dwarf::DW_AT_declaration, dwarf::DW_FORM_flag, 1); addUInt(ElemDie, dwarf::DW_AT_external, dwarf::DW_FORM_flag, 1); addSourceLine(ElemDie, DV); } else if (Element.isDerivedType()) ElemDie = createMemberDIE(DIDerivedType(Element)); else continue; Buffer.addChild(ElemDie); } if (CTy.isAppleBlockExtension()) addUInt(&Buffer, dwarf::DW_AT_APPLE_block, dwarf::DW_FORM_flag, 1); unsigned RLang = CTy.getRunTimeLang(); if (RLang) addUInt(&Buffer, dwarf::DW_AT_APPLE_runtime_class, dwarf::DW_FORM_data1, RLang); DICompositeType ContainingType = CTy.getContainingType(); if (DIDescriptor(ContainingType).isCompositeType()) addDIEEntry(&Buffer, dwarf::DW_AT_containing_type, dwarf::DW_FORM_ref4, getOrCreateTypeDIE(DIType(ContainingType))); else { DIDescriptor Context = CTy.getContext(); addToContextOwner(&Buffer, Context); } if (CTy.isObjcClassComplete()) addUInt(&Buffer, dwarf::DW_AT_APPLE_objc_complete_type, dwarf::DW_FORM_flag, 1); if (Tag == 
dwarf::DW_TAG_class_type) addTemplateParams(Buffer, CTy.getTemplateParams()); break; } default: break; } // Add name if not anonymous or intermediate type. if (!Name.empty()) addString(&Buffer, dwarf::DW_AT_name, dwarf::DW_FORM_string, Name); if (Tag == dwarf::DW_TAG_enumeration_type || Tag == dwarf::DW_TAG_class_type || Tag == dwarf::DW_TAG_structure_type || Tag == dwarf::DW_TAG_union_type) { // Add size if non-zero (derived types might be zero-sized.) if (Size) addUInt(&Buffer, dwarf::DW_AT_byte_size, 0, Size); else { // Add zero size if it is not a forward declaration. if (CTy.isForwardDecl()) addUInt(&Buffer, dwarf::DW_AT_declaration, dwarf::DW_FORM_flag, 1); else addUInt(&Buffer, dwarf::DW_AT_byte_size, 0, 0); } // Add source line info if available. if (!CTy.isForwardDecl()) addSourceLine(&Buffer, CTy); } } /// getOrCreateTemplateTypeParameterDIE - Find existing DIE or create new DIE /// for the given DITemplateTypeParameter. DIE * CompileUnit::getOrCreateTemplateTypeParameterDIE(DITemplateTypeParameter TP) { DIE *ParamDIE = getDIE(TP); if (ParamDIE) return ParamDIE; ParamDIE = new DIE(dwarf::DW_TAG_template_type_parameter); addType(ParamDIE, TP.getType()); addString(ParamDIE, dwarf::DW_AT_name, dwarf::DW_FORM_string, TP.getName()); return ParamDIE; } /// getOrCreateTemplateValueParameterDIE - Find existing DIE or create new DIE /// for the given DITemplateValueParameter. DIE * CompileUnit::getOrCreateTemplateValueParameterDIE(DITemplateValueParameter TPV) { DIE *ParamDIE = getDIE(TPV); if (ParamDIE) return ParamDIE; ParamDIE = new DIE(dwarf::DW_TAG_template_value_parameter); addType(ParamDIE, TPV.getType()); if (!TPV.getName().empty()) addString(ParamDIE, dwarf::DW_AT_name, dwarf::DW_FORM_string, TPV.getName()); addUInt(ParamDIE, dwarf::DW_AT_const_value, dwarf::DW_FORM_udata, TPV.getValue()); return ParamDIE; } /// getOrCreateNameSpace - Create a DIE for DINameSpace. DIE *CompileUnit::getOrCreateNameSpace(DINameSpace NS) { DIE *NDie = getDIE(NS); if (NDie) return NDie; NDie = new DIE(dwarf::DW_TAG_namespace); insertDIE(NS, NDie); if (!NS.getName().empty()) addString(NDie, dwarf::DW_AT_name, dwarf::DW_FORM_string, NS.getName()); addSourceLine(NDie, NS); addToContextOwner(NDie, NS.getContext()); return NDie; } /// getRealLinkageName - If special LLVM prefix that is used to inform the asm /// printer to not emit usual symbol prefix before the symbol name is used then /// return linkage name after skipping this special LLVM prefix. static StringRef getRealLinkageName(StringRef LinkageName) { char One = '\1'; if (LinkageName.startswith(StringRef(&One, 1))) return LinkageName.substr(1); return LinkageName; } /// getOrCreateSubprogramDIE - Create new DIE using SP. DIE *CompileUnit::getOrCreateSubprogramDIE(DISubprogram SP) { DIE *SPDie = getDIE(SP); if (SPDie) return SPDie; SPDie = new DIE(dwarf::DW_TAG_subprogram); // DW_TAG_inlined_subroutine may refer to this DIE. insertDIE(SP, SPDie); // Add to context owner. addToContextOwner(SPDie, SP.getContext()); // Add function template parameters. addTemplateParams(*SPDie, SP.getTemplateParams()); StringRef LinkageName = SP.getLinkageName(); if (!LinkageName.empty()) addString(SPDie, dwarf::DW_AT_MIPS_linkage_name, dwarf::DW_FORM_string, getRealLinkageName(LinkageName)); // If this DIE is going to refer declaration info using AT_specification // then there is no need to add other attributes. if (SP.getFunctionDeclaration().isSubprogram()) return SPDie; // Constructors and operators for anonymous aggregates do not have names. 
if (!SP.getName().empty()) addString(SPDie, dwarf::DW_AT_name, dwarf::DW_FORM_string, SP.getName()); addSourceLine(SPDie, SP); if (SP.isPrototyped()) addUInt(SPDie, dwarf::DW_AT_prototyped, dwarf::DW_FORM_flag, 1); // Add Return Type. DICompositeType SPTy = SP.getType(); DIArray Args = SPTy.getTypeArray(); unsigned SPTag = SPTy.getTag(); if (Args.getNumElements() == 0 || SPTag != dwarf::DW_TAG_subroutine_type) addType(SPDie, SPTy); else addType(SPDie, DIType(Args.getElement(0))); unsigned VK = SP.getVirtuality(); if (VK) { addUInt(SPDie, dwarf::DW_AT_virtuality, dwarf::DW_FORM_flag, VK); DIEBlock *Block = getDIEBlock(); addUInt(Block, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_constu); addUInt(Block, 0, dwarf::DW_FORM_udata, SP.getVirtualIndex()); addBlock(SPDie, dwarf::DW_AT_vtable_elem_location, 0, Block); ContainingTypeMap.insert(std::make_pair(SPDie, SP.getContainingType())); } if (!SP.isDefinition()) { addUInt(SPDie, dwarf::DW_AT_declaration, dwarf::DW_FORM_flag, 1); // Add arguments. Do not add arguments for subprogram definition. They will // be handled while processing variables. DICompositeType SPTy = SP.getType(); DIArray Args = SPTy.getTypeArray(); unsigned SPTag = SPTy.getTag(); if (SPTag == dwarf::DW_TAG_subroutine_type) for (unsigned i = 1, N = Args.getNumElements(); i < N; ++i) { DIE *Arg = new DIE(dwarf::DW_TAG_formal_parameter); DIType ATy = DIType(DIType(Args.getElement(i))); addType(Arg, ATy); if (ATy.isArtificial()) addUInt(Arg, dwarf::DW_AT_artificial, dwarf::DW_FORM_flag, 1); SPDie->addChild(Arg); } } if (SP.isArtificial()) addUInt(SPDie, dwarf::DW_AT_artificial, dwarf::DW_FORM_flag, 1); if (!SP.isLocalToUnit()) addUInt(SPDie, dwarf::DW_AT_external, dwarf::DW_FORM_flag, 1); if (SP.isOptimized()) addUInt(SPDie, dwarf::DW_AT_APPLE_optimized, dwarf::DW_FORM_flag, 1); if (unsigned isa = Asm->getISAEncoding()) { addUInt(SPDie, dwarf::DW_AT_APPLE_isa, dwarf::DW_FORM_flag, isa); } return SPDie; } // Return const expression if value is a GEP to access merged global // constant. e.g. // i8* getelementptr ({ i8, i8, i8, i8 }* @_MergedGlobals, i32 0, i32 0) static const ConstantExpr *getMergedGlobalExpr(const Value *V) { const ConstantExpr *CE = dyn_cast_or_null(V); if (!CE || CE->getNumOperands() != 3 || CE->getOpcode() != Instruction::GetElementPtr) return NULL; // First operand points to a global struct. Value *Ptr = CE->getOperand(0); if (!isa(Ptr) || !isa(cast(Ptr->getType())->getElementType())) return NULL; // Second operand is zero. const ConstantInt *CI = dyn_cast_or_null(CE->getOperand(1)); if (!CI || !CI->isZero()) return NULL; // Third operand is offset. if (!isa(CE->getOperand(2))) return NULL; return CE; } /// createGlobalVariableDIE - create global variable DIE. void CompileUnit::createGlobalVariableDIE(const MDNode *N) { // Check for pre-existence. if (getDIE(N)) return; DIGlobalVariable GV(N); if (!GV.Verify()) return; DIE *VariableDIE = new DIE(GV.getTag()); // Add to map. insertDIE(N, VariableDIE); // Add name. addString(VariableDIE, dwarf::DW_AT_name, dwarf::DW_FORM_string, GV.getDisplayName()); StringRef LinkageName = GV.getLinkageName(); bool isGlobalVariable = GV.getGlobal() != NULL; if (!LinkageName.empty() && isGlobalVariable) addString(VariableDIE, dwarf::DW_AT_MIPS_linkage_name, dwarf::DW_FORM_string, getRealLinkageName(LinkageName)); // Add type. DIType GTy = GV.getType(); addType(VariableDIE, GTy); // Add scoping info. if (!GV.isLocalToUnit()) { addUInt(VariableDIE, dwarf::DW_AT_external, dwarf::DW_FORM_flag, 1); // Expose as global. 
addGlobal(GV.getName(), VariableDIE); } // Add line number info. addSourceLine(VariableDIE, GV); // Add to context owner. DIDescriptor GVContext = GV.getContext(); addToContextOwner(VariableDIE, GVContext); // Add location. if (isGlobalVariable) { DIEBlock *Block = new (DIEValueAllocator) DIEBlock(); addUInt(Block, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_addr); addLabel(Block, 0, dwarf::DW_FORM_udata, Asm->Mang->getSymbol(GV.getGlobal())); // Do not create specification DIE if context is either compile unit // or a subprogram. if (GVContext && GV.isDefinition() && !GVContext.isCompileUnit() && !GVContext.isFile() && !isSubprogramContext(GVContext)) { // Create specification DIE. DIE *VariableSpecDIE = new DIE(dwarf::DW_TAG_variable); addDIEEntry(VariableSpecDIE, dwarf::DW_AT_specification, dwarf::DW_FORM_ref4, VariableDIE); addBlock(VariableSpecDIE, dwarf::DW_AT_location, 0, Block); addUInt(VariableDIE, dwarf::DW_AT_declaration, dwarf::DW_FORM_flag, 1); addDie(VariableSpecDIE); } else { addBlock(VariableDIE, dwarf::DW_AT_location, 0, Block); } } else if (const ConstantInt *CI = dyn_cast_or_null(GV.getConstant())) addConstantValue(VariableDIE, CI, GTy.isUnsignedDIType()); else if (const ConstantExpr *CE = getMergedGlobalExpr(N->getOperand(11))) { // GV is a merged global. DIEBlock *Block = new (DIEValueAllocator) DIEBlock(); Value *Ptr = CE->getOperand(0); addUInt(Block, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_addr); addLabel(Block, 0, dwarf::DW_FORM_udata, Asm->Mang->getSymbol(cast(Ptr))); addUInt(Block, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_constu); SmallVector Idx(CE->op_begin()+1, CE->op_end()); addUInt(Block, 0, dwarf::DW_FORM_udata, Asm->getTargetData().getIndexedOffset(Ptr->getType(), Idx)); addUInt(Block, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_plus); addBlock(VariableDIE, dwarf::DW_AT_location, 0, Block); } return; } /// constructSubrangeDIE - Construct subrange DIE from DISubrange. void CompileUnit::constructSubrangeDIE(DIE &Buffer, DISubrange SR, DIE *IndexTy){ DIE *DW_Subrange = new DIE(dwarf::DW_TAG_subrange_type); addDIEEntry(DW_Subrange, dwarf::DW_AT_type, dwarf::DW_FORM_ref4, IndexTy); int64_t L = SR.getLo(); int64_t H = SR.getHi(); // The L value defines the lower bounds which is typically zero for C/C++. The // H value is the upper bounds. Values are 64 bit. H - L + 1 is the size // of the array. If L > H then do not emit DW_AT_lower_bound and // DW_AT_upper_bound attributes. If L is zero and H is also zero then the // array has one element and in such case do not emit lower bound. if (L > H) { Buffer.addChild(DW_Subrange); return; } if (L) addSInt(DW_Subrange, dwarf::DW_AT_lower_bound, 0, L); addSInt(DW_Subrange, dwarf::DW_AT_upper_bound, 0, H); Buffer.addChild(DW_Subrange); } /// constructArrayTypeDIE - Construct array type DIE from DICompositeType. void CompileUnit::constructArrayTypeDIE(DIE &Buffer, DICompositeType *CTy) { Buffer.setTag(dwarf::DW_TAG_array_type); if (CTy->getTag() == dwarf::DW_TAG_vector_type) addUInt(&Buffer, dwarf::DW_AT_GNU_vector, dwarf::DW_FORM_flag, 1); // Emit derived type. addType(&Buffer, CTy->getTypeDerivedFrom()); DIArray Elements = CTy->getTypeArray(); // Get an anonymous type for index type. DIE *IdxTy = getIndexTyDie(); if (!IdxTy) { // Construct an anonymous type for index type. IdxTy = new DIE(dwarf::DW_TAG_base_type); addUInt(IdxTy, dwarf::DW_AT_byte_size, 0, sizeof(int32_t)); addUInt(IdxTy, dwarf::DW_AT_encoding, dwarf::DW_FORM_data1, dwarf::DW_ATE_signed); addDie(IdxTy); setIndexTyDie(IdxTy); } // Add subranges to array type. 
for (unsigned i = 0, N = Elements.getNumElements(); i < N; ++i) { DIDescriptor Element = Elements.getElement(i); if (Element.getTag() == dwarf::DW_TAG_subrange_type) constructSubrangeDIE(Buffer, DISubrange(Element), IdxTy); } } /// constructEnumTypeDIE - Construct enum type DIE from DIEnumerator. DIE *CompileUnit::constructEnumTypeDIE(DIEnumerator ETy) { DIE *Enumerator = new DIE(dwarf::DW_TAG_enumerator); StringRef Name = ETy.getName(); addString(Enumerator, dwarf::DW_AT_name, dwarf::DW_FORM_string, Name); int64_t Value = ETy.getEnumValue(); addSInt(Enumerator, dwarf::DW_AT_const_value, dwarf::DW_FORM_sdata, Value); return Enumerator; } /// constructContainingTypeDIEs - Construct DIEs for types that contain /// vtables. void CompileUnit::constructContainingTypeDIEs() { for (DenseMap::iterator CI = ContainingTypeMap.begin(), CE = ContainingTypeMap.end(); CI != CE; ++CI) { DIE *SPDie = CI->first; const MDNode *N = CI->second; if (!N) continue; DIE *NDie = getDIE(N); if (!NDie) continue; addDIEEntry(SPDie, dwarf::DW_AT_containing_type, dwarf::DW_FORM_ref4, NDie); } } /// constructVariableDIE - Construct a DIE for the given DbgVariable. DIE *CompileUnit::constructVariableDIE(DbgVariable *DV, bool isScopeAbstract) { StringRef Name = DV->getName(); if (Name.empty()) return NULL; // Translate tag to proper Dwarf tag. unsigned Tag = DV->getTag(); // Define variable debug information entry. DIE *VariableDie = new DIE(Tag); DbgVariable *AbsVar = DV->getAbstractVariable(); DIE *AbsDIE = AbsVar ? AbsVar->getDIE() : NULL; if (AbsDIE) addDIEEntry(VariableDie, dwarf::DW_AT_abstract_origin, dwarf::DW_FORM_ref4, AbsDIE); else { addString(VariableDie, dwarf::DW_AT_name, dwarf::DW_FORM_string, Name); addSourceLine(VariableDie, DV->getVariable()); addType(VariableDie, DV->getType()); } if (DV->isArtificial()) addUInt(VariableDie, dwarf::DW_AT_artificial, dwarf::DW_FORM_flag, 1); if (isScopeAbstract) { DV->setDIE(VariableDie); return VariableDie; } // Add variable address. unsigned Offset = DV->getDotDebugLocOffset(); if (Offset != ~0U) { addLabel(VariableDie, dwarf::DW_AT_location, dwarf::DW_FORM_data4, Asm->GetTempSymbol("debug_loc", Offset)); DV->setDIE(VariableDie); return VariableDie; } // Check if variable is described by a DBG_VALUE instruction. 
if (const MachineInstr *DVInsn = DV->getMInsn()) { bool updated = false; if (DVInsn->getNumOperands() == 3) { if (DVInsn->getOperand(0).isReg()) { const MachineOperand RegOp = DVInsn->getOperand(0); const TargetRegisterInfo *TRI = Asm->TM.getRegisterInfo(); if (DVInsn->getOperand(1).isImm() && TRI->getFrameRegister(*Asm->MF) == RegOp.getReg()) { unsigned FrameReg = 0; const TargetFrameLowering *TFI = Asm->TM.getFrameLowering(); int Offset = TFI->getFrameIndexReference(*Asm->MF, DVInsn->getOperand(1).getImm(), FrameReg); MachineLocation Location(FrameReg, Offset); addVariableAddress(DV, VariableDie, Location); } else if (RegOp.getReg()) addVariableAddress(DV, VariableDie, MachineLocation(RegOp.getReg())); updated = true; } else if (DVInsn->getOperand(0).isImm()) updated = addConstantValue(VariableDie, DVInsn->getOperand(0), DV->getType()); else if (DVInsn->getOperand(0).isFPImm()) updated = addConstantFPValue(VariableDie, DVInsn->getOperand(0)); else if (DVInsn->getOperand(0).isCImm()) updated = addConstantValue(VariableDie, DVInsn->getOperand(0).getCImm(), DV->getType().isUnsignedDIType()); } else { addVariableAddress(DV, VariableDie, Asm->getDebugValueLocation(DVInsn)); updated = true; } if (!updated) { // If variableDie is not updated then DBG_VALUE instruction does not // have valid variable info. delete VariableDie; return NULL; } DV->setDIE(VariableDie); return VariableDie; } else { // .. else use frame index. int FI = DV->getFrameIndex(); if (FI != ~0) { unsigned FrameReg = 0; const TargetFrameLowering *TFI = Asm->TM.getFrameLowering(); int Offset = TFI->getFrameIndexReference(*Asm->MF, FI, FrameReg); MachineLocation Location(FrameReg, Offset); addVariableAddress(DV, VariableDie, Location); } } DV->setDIE(VariableDie); return VariableDie; } /// createMemberDIE - Create new member DIE. DIE *CompileUnit::createMemberDIE(DIDerivedType DT) { DIE *MemberDie = new DIE(DT.getTag()); StringRef Name = DT.getName(); if (!Name.empty()) addString(MemberDie, dwarf::DW_AT_name, dwarf::DW_FORM_string, Name); addType(MemberDie, DT.getTypeDerivedFrom()); addSourceLine(MemberDie, DT); DIEBlock *MemLocationDie = new (DIEValueAllocator) DIEBlock(); addUInt(MemLocationDie, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_plus_uconst); uint64_t Size = DT.getSizeInBits(); uint64_t FieldSize = DT.getOriginalTypeSize(); if (Size != FieldSize) { // Handle bitfield. addUInt(MemberDie, dwarf::DW_AT_byte_size, 0, DT.getOriginalTypeSize()>>3); addUInt(MemberDie, dwarf::DW_AT_bit_size, 0, DT.getSizeInBits()); uint64_t Offset = DT.getOffsetInBits(); uint64_t AlignMask = ~(DT.getAlignInBits() - 1); uint64_t HiMark = (Offset + FieldSize) & AlignMask; uint64_t FieldOffset = (HiMark - FieldSize); Offset -= FieldOffset; // Maybe we need to work from the other end. if (Asm->getTargetData().isLittleEndian()) Offset = FieldSize - (Offset + Size); addUInt(MemberDie, dwarf::DW_AT_bit_offset, 0, Offset); // Here WD_AT_data_member_location points to the anonymous // field that includes this bit field. addUInt(MemLocationDie, 0, dwarf::DW_FORM_udata, FieldOffset >> 3); } else // This is not a bitfield. addUInt(MemLocationDie, 0, dwarf::DW_FORM_udata, DT.getOffsetInBits() >> 3); if (DT.getTag() == dwarf::DW_TAG_inheritance && DT.isVirtual()) { // For C++, virtual base classes are not at fixed offset. Use following // expression to extract appropriate offset from vtable. 
// BaseAddr = ObAddr + *((*ObAddr) - Offset) DIEBlock *VBaseLocationDie = new (DIEValueAllocator) DIEBlock(); addUInt(VBaseLocationDie, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_dup); addUInt(VBaseLocationDie, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_deref); addUInt(VBaseLocationDie, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_constu); addUInt(VBaseLocationDie, 0, dwarf::DW_FORM_udata, DT.getOffsetInBits()); addUInt(VBaseLocationDie, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_minus); addUInt(VBaseLocationDie, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_deref); addUInt(VBaseLocationDie, 0, dwarf::DW_FORM_data1, dwarf::DW_OP_plus); addBlock(MemberDie, dwarf::DW_AT_data_member_location, 0, VBaseLocationDie); } else addBlock(MemberDie, dwarf::DW_AT_data_member_location, 0, MemLocationDie); if (DT.isProtected()) addUInt(MemberDie, dwarf::DW_AT_accessibility, dwarf::DW_FORM_flag, dwarf::DW_ACCESS_protected); else if (DT.isPrivate()) addUInt(MemberDie, dwarf::DW_AT_accessibility, dwarf::DW_FORM_flag, dwarf::DW_ACCESS_private); // Otherwise C++ member and base classes are considered public. else addUInt(MemberDie, dwarf::DW_AT_accessibility, dwarf::DW_FORM_flag, dwarf::DW_ACCESS_public); if (DT.isVirtual()) addUInt(MemberDie, dwarf::DW_AT_virtuality, dwarf::DW_FORM_flag, dwarf::DW_VIRTUALITY_virtual); // Objective-C properties. StringRef PropertyName = DT.getObjCPropertyName(); if (!PropertyName.empty()) { addString(MemberDie, dwarf::DW_AT_APPLE_property_name, dwarf::DW_FORM_string, PropertyName); StringRef GetterName = DT.getObjCPropertyGetterName(); if (!GetterName.empty()) addString(MemberDie, dwarf::DW_AT_APPLE_property_getter, dwarf::DW_FORM_string, GetterName); StringRef SetterName = DT.getObjCPropertySetterName(); if (!SetterName.empty()) addString(MemberDie, dwarf::DW_AT_APPLE_property_setter, dwarf::DW_FORM_string, SetterName); unsigned PropertyAttributes = 0; if (DT.isReadOnlyObjCProperty()) PropertyAttributes |= dwarf::DW_APPLE_PROPERTY_readonly; if (DT.isReadWriteObjCProperty()) PropertyAttributes |= dwarf::DW_APPLE_PROPERTY_readwrite; if (DT.isAssignObjCProperty()) PropertyAttributes |= dwarf::DW_APPLE_PROPERTY_assign; if (DT.isRetainObjCProperty()) PropertyAttributes |= dwarf::DW_APPLE_PROPERTY_retain; if (DT.isCopyObjCProperty()) PropertyAttributes |= dwarf::DW_APPLE_PROPERTY_copy; if (DT.isNonAtomicObjCProperty()) PropertyAttributes |= dwarf::DW_APPLE_PROPERTY_nonatomic; if (PropertyAttributes) addUInt(MemberDie, dwarf::DW_AT_APPLE_property_attribute, 0, PropertyAttributes); } return MemberDie; } Index: vendor/llvm/dist/lib/CodeGen/LLVMTargetMachine.cpp =================================================================== --- vendor/llvm/dist/lib/CodeGen/LLVMTargetMachine.cpp (revision 228363) +++ vendor/llvm/dist/lib/CodeGen/LLVMTargetMachine.cpp (revision 228364) @@ -1,496 +1,497 @@ //===-- LLVMTargetMachine.cpp - Implement the LLVMTargetMachine class -----===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file implements the LLVMTargetMachine class. 
//
//===----------------------------------------------------------------------===//

#include "llvm/Target/TargetMachine.h"
#include "llvm/PassManager.h"
#include "llvm/Analysis/Passes.h"
#include "llvm/Analysis/Verifier.h"
#include "llvm/Assembly/PrintModulePass.h"
#include "llvm/CodeGen/AsmPrinter.h"
#include "llvm/CodeGen/MachineFunctionAnalysis.h"
#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/CodeGen/GCStrategy.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/Target/TargetLowering.h"
#include "llvm/Target/TargetOptions.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/Target/TargetData.h"
#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/Target/TargetLowering.h"
#include "llvm/Target/TargetLoweringObjectFile.h"
#include "llvm/Target/TargetRegisterInfo.h"
#include "llvm/Target/TargetSubtargetInfo.h"
#include "llvm/Transforms/Scalar.h"
#include "llvm/ADT/OwningPtr.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/FormattedStream.h"
#include "llvm/Support/TargetRegistry.h"
using namespace llvm;

namespace llvm { bool EnableFastISel; }

static cl::opt<bool> DisablePostRA("disable-post-ra", cl::Hidden,
    cl::desc("Disable Post Regalloc"));
static cl::opt<bool> DisableBranchFold("disable-branch-fold", cl::Hidden,
    cl::desc("Disable branch folding"));
static cl::opt<bool> DisableTailDuplicate("disable-tail-duplicate", cl::Hidden,
    cl::desc("Disable tail duplication"));
static cl::opt<bool> DisableEarlyTailDup("disable-early-taildup", cl::Hidden,
    cl::desc("Disable pre-register allocation tail duplication"));
static cl::opt<bool> DisableCodePlace("disable-code-place", cl::Hidden,
    cl::desc("Disable code placement"));
static cl::opt<bool> DisableSSC("disable-ssc", cl::Hidden,
    cl::desc("Disable Stack Slot Coloring"));
static cl::opt<bool> DisableMachineDCE("disable-machine-dce", cl::Hidden,
    cl::desc("Disable Machine Dead Code Elimination"));
static cl::opt<bool> DisableMachineLICM("disable-machine-licm", cl::Hidden,
    cl::desc("Disable Machine LICM"));
static cl::opt<bool> DisableMachineCSE("disable-machine-cse", cl::Hidden,
    cl::desc("Disable Machine Common Subexpression Elimination"));
static cl::opt<bool> DisablePostRAMachineLICM("disable-postra-machine-licm",
    cl::Hidden, cl::desc("Disable Machine LICM"));
static cl::opt<bool> DisableMachineSink("disable-machine-sink", cl::Hidden,
    cl::desc("Disable Machine Sinking"));
static cl::opt<bool> DisableLSR("disable-lsr", cl::Hidden,
    cl::desc("Disable Loop Strength Reduction Pass"));
static cl::opt<bool> DisableCGP("disable-cgp", cl::Hidden,
    cl::desc("Disable Codegen Prepare"));
static cl::opt<bool> PrintLSR("print-lsr-output", cl::Hidden,
    cl::desc("Print LLVM IR produced by the loop-reduce pass"));
static cl::opt<bool> PrintISelInput("print-isel-input", cl::Hidden,
    cl::desc("Print LLVM IR input to isel pass"));
static cl::opt<bool> PrintGCInfo("print-gc", cl::Hidden,
    cl::desc("Dump garbage collector data"));
static cl::opt<bool> ShowMCEncoding("show-mc-encoding", cl::Hidden,
    cl::desc("Show encoding in .s output"));
static cl::opt<bool> ShowMCInst("show-mc-inst", cl::Hidden,
    cl::desc("Show instruction structure in .s output"));
static cl::opt<bool> EnableMCLogging("enable-mc-api-logging", cl::Hidden,
    cl::desc("Enable MC API logging"));
static cl::opt<bool> VerifyMachineCode("verify-machineinstrs", cl::Hidden,
    cl::desc("Verify generated machine code"),
    cl::init(getenv("LLVM_VERIFY_MACHINEINSTRS")!=NULL));
static cl::opt<cl::boolOrDefault>
AsmVerbose("asm-verbose", cl::desc("Add comments to directives."),
           cl::init(cl::BOU_UNSET));
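// Illustrative sketch (not part of this file or the patch): how a client of
// this era's API might drive addPassesToEmitFile(), defined below, to lower a
// Module to assembly. The names TM, TheModule and the output path are
// hypothetical placeholders; the call returns true on failure, mirroring the
// checks inside addPassesToEmitFile() itself.
//
//   PassManager PM;
//   std::string Err;
//   raw_fd_ostream Raw("out.s", Err);
//   formatted_raw_ostream Out(Raw);
//   if (TM->addPassesToEmitFile(PM, Out, TargetMachine::CGFT_AssemblyFile,
//                               CodeGenOpt::Default))
//     report_fatal_error("target does not support assembly emission");
//   PM.run(*TheModule);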
static bool getVerboseAsm() {
  switch (AsmVerbose) {
  default:
  case cl::BOU_UNSET: return TargetMachine::getAsmVerbosityDefault();
  case cl::BOU_TRUE:  return true;
  case cl::BOU_FALSE: return false;
  }
}

// Enable or disable FastISel. Both options are needed, because
// FastISel is enabled by default with -fast, and we wish to be
// able to enable or disable fast-isel independently from -O0.
static cl::opt<cl::boolOrDefault>
EnableFastISelOption("fast-isel", cl::Hidden,
  cl::desc("Enable the \"fast\" instruction selector"));

LLVMTargetMachine::LLVMTargetMachine(const Target &T, StringRef Triple,
                                     StringRef CPU, StringRef FS,
                                     Reloc::Model RM, CodeModel::Model CM)
  : TargetMachine(T, Triple, CPU, FS) {
  CodeGenInfo = T.createMCCodeGenInfo(Triple, RM, CM);
  AsmInfo = T.createMCAsmInfo(Triple);
  // TargetSelect.h moved to a different directory between LLVM 2.9 and 3.0,
  // and if the old one gets included then MCAsmInfo will be NULL and
  // we'll crash later.
  // Provide the user with a useful error message about what's wrong.
  assert(AsmInfo && "MCAsmInfo not initialized."
-        "Make sure you include the correct TargetSelect.h!");
+        "Make sure you include the correct TargetSelect.h"
+        "and that InitializeAllTargetMCs() is being invoked!");
}

bool LLVMTargetMachine::addPassesToEmitFile(PassManagerBase &PM,
                                            formatted_raw_ostream &Out,
                                            CodeGenFileType FileType,
                                            CodeGenOpt::Level OptLevel,
                                            bool DisableVerify) {
  // Add common CodeGen passes.
  MCContext *Context = 0;
  if (addCommonCodeGenPasses(PM, OptLevel, DisableVerify, Context))
    return true;
  assert(Context != 0 && "Failed to get MCContext");

  if (hasMCSaveTempLabels())
    Context->setAllowTemporaryLabels(false);

  const MCAsmInfo &MAI = *getMCAsmInfo();
  const MCSubtargetInfo &STI = getSubtarget<MCSubtargetInfo>();
  OwningPtr<MCStreamer> AsmStreamer;

  switch (FileType) {
  default: return true;
  case CGFT_AssemblyFile: {
    MCInstPrinter *InstPrinter =
      getTarget().createMCInstPrinter(MAI.getAssemblerDialect(), MAI, STI);

    // Create a code emitter if asked to show the encoding.
    MCCodeEmitter *MCE = 0;
    MCAsmBackend *MAB = 0;
    if (ShowMCEncoding) {
      const MCSubtargetInfo &STI = getSubtarget<MCSubtargetInfo>();
      MCE = getTarget().createMCCodeEmitter(*getInstrInfo(), STI, *Context);
      MAB = getTarget().createMCAsmBackend(getTargetTriple());
    }

    MCStreamer *S = getTarget().createAsmStreamer(*Context, Out,
                                                  getVerboseAsm(),
                                                  hasMCUseLoc(),
                                                  hasMCUseCFI(),
                                                  InstPrinter,
                                                  MCE, MAB,
                                                  ShowMCInst);
    AsmStreamer.reset(S);
    break;
  }
  case CGFT_ObjectFile: {
    // Create the code emitter for the target if it exists.  If not, .o file
    // emission fails.
    MCCodeEmitter *MCE = getTarget().createMCCodeEmitter(*getInstrInfo(), STI,
                                                         *Context);
    MCAsmBackend *MAB = getTarget().createMCAsmBackend(getTargetTriple());
    if (MCE == 0 || MAB == 0)
      return true;

    AsmStreamer.reset(getTarget().createMCObjectStreamer(getTargetTriple(),
                                                          *Context, *MAB, Out,
                                                          MCE, hasMCRelaxAll(),
                                                          hasMCNoExecStack()));
    AsmStreamer.get()->InitSections();
    break;
  }
  case CGFT_Null:
    // The Null output is intended for use for performance analysis and testing,
    // not real users.
    AsmStreamer.reset(createNullStreamer(*Context));
    break;
  }

  if (EnableMCLogging)
    AsmStreamer.reset(createLoggingStreamer(AsmStreamer.take(), errs()));

  // Create the AsmPrinter, which takes ownership of AsmStreamer if successful.
  FunctionPass *Printer = getTarget().createAsmPrinter(*this, *AsmStreamer);
  if (Printer == 0)
    return true;

  // If successful, createAsmPrinter took ownership of AsmStreamer.
AsmStreamer.take(); PM.add(Printer); PM.add(createGCInfoDeleter()); return false; } /// addPassesToEmitMachineCode - Add passes to the specified pass manager to /// get machine code emitted. This uses a JITCodeEmitter object to handle /// actually outputting the machine code and resolving things like the address /// of functions. This method should returns true if machine code emission is /// not supported. /// bool LLVMTargetMachine::addPassesToEmitMachineCode(PassManagerBase &PM, JITCodeEmitter &JCE, CodeGenOpt::Level OptLevel, bool DisableVerify) { // Add common CodeGen passes. MCContext *Ctx = 0; if (addCommonCodeGenPasses(PM, OptLevel, DisableVerify, Ctx)) return true; addCodeEmitter(PM, OptLevel, JCE); PM.add(createGCInfoDeleter()); return false; // success! } /// addPassesToEmitMC - Add passes to the specified pass manager to get /// machine code emitted with the MCJIT. This method returns true if machine /// code is not supported. It fills the MCContext Ctx pointer which can be /// used to build custom MCStreamer. /// bool LLVMTargetMachine::addPassesToEmitMC(PassManagerBase &PM, MCContext *&Ctx, raw_ostream &Out, CodeGenOpt::Level OptLevel, bool DisableVerify) { // Add common CodeGen passes. if (addCommonCodeGenPasses(PM, OptLevel, DisableVerify, Ctx)) return true; if (hasMCSaveTempLabels()) Ctx->setAllowTemporaryLabels(false); // Create the code emitter for the target if it exists. If not, .o file // emission fails. const MCSubtargetInfo &STI = getSubtarget(); MCCodeEmitter *MCE = getTarget().createMCCodeEmitter(*getInstrInfo(),STI, *Ctx); MCAsmBackend *MAB = getTarget().createMCAsmBackend(getTargetTriple()); if (MCE == 0 || MAB == 0) return true; OwningPtr AsmStreamer; AsmStreamer.reset(getTarget().createMCObjectStreamer(getTargetTriple(), *Ctx, *MAB, Out, MCE, hasMCRelaxAll(), hasMCNoExecStack())); AsmStreamer.get()->InitSections(); // Create the AsmPrinter, which takes ownership of AsmStreamer if successful. FunctionPass *Printer = getTarget().createAsmPrinter(*this, *AsmStreamer); if (Printer == 0) return true; // If successful, createAsmPrinter took ownership of AsmStreamer. AsmStreamer.take(); PM.add(Printer); return false; // success! } static void printNoVerify(PassManagerBase &PM, const char *Banner) { if (PrintMachineCode) PM.add(createMachineFunctionPrinterPass(dbgs(), Banner)); } static void printAndVerify(PassManagerBase &PM, const char *Banner) { if (PrintMachineCode) PM.add(createMachineFunctionPrinterPass(dbgs(), Banner)); if (VerifyMachineCode) PM.add(createMachineVerifierPass(Banner)); } /// addCommonCodeGenPasses - Add standard LLVM codegen passes used for both /// emitting to assembly files or machine code output. /// bool LLVMTargetMachine::addCommonCodeGenPasses(PassManagerBase &PM, CodeGenOpt::Level OptLevel, bool DisableVerify, MCContext *&OutContext) { // Standard LLVM-Level Passes. // Basic AliasAnalysis support. // Add TypeBasedAliasAnalysis before BasicAliasAnalysis so that // BasicAliasAnalysis wins if they disagree. This is intended to help // support "obvious" type-punning idioms. PM.add(createTypeBasedAliasAnalysisPass()); PM.add(createBasicAliasAnalysisPass()); // Before running any passes, run the verifier to determine if the input // coming from the front-end and/or optimizer is valid. if (!DisableVerify) PM.add(createVerifierPass()); // Run loop strength reduction before anything else. 
if (OptLevel != CodeGenOpt::None && !DisableLSR) { PM.add(createLoopStrengthReducePass(getTargetLowering())); if (PrintLSR) PM.add(createPrintFunctionPass("\n\n*** Code after LSR ***\n", &dbgs())); } PM.add(createGCLoweringPass()); // Make sure that no unreachable blocks are instruction selected. PM.add(createUnreachableBlockEliminationPass()); // Turn exception handling constructs into something the code generators can // handle. switch (getMCAsmInfo()->getExceptionHandlingType()) { case ExceptionHandling::SjLj: // SjLj piggy-backs on dwarf for this bit. The cleanups done apply to both // Dwarf EH prepare needs to be run after SjLj prepare. Otherwise, // catch info can get misplaced when a selector ends up more than one block // removed from the parent invoke(s). This could happen when a landing // pad is shared by multiple invokes and is also a target of a normal // edge from elsewhere. PM.add(createSjLjEHPass(getTargetLowering())); // FALLTHROUGH case ExceptionHandling::DwarfCFI: case ExceptionHandling::ARM: case ExceptionHandling::Win64: PM.add(createDwarfEHPass(this)); break; case ExceptionHandling::None: PM.add(createLowerInvokePass(getTargetLowering())); // The lower invoke pass may create unreachable code. Remove it. PM.add(createUnreachableBlockEliminationPass()); break; } if (OptLevel != CodeGenOpt::None && !DisableCGP) PM.add(createCodeGenPreparePass(getTargetLowering())); PM.add(createStackProtectorPass(getTargetLowering())); addPreISel(PM, OptLevel); if (PrintISelInput) PM.add(createPrintFunctionPass("\n\n" "*** Final LLVM Code input to ISel ***\n", &dbgs())); // All passes which modify the LLVM IR are now complete; run the verifier // to ensure that the IR is valid. if (!DisableVerify) PM.add(createVerifierPass()); // Standard Lower-Level Passes. // Install a MachineModuleInfo class, which is an immutable pass that holds // all the per-module stuff we're generating, including MCContext. MachineModuleInfo *MMI = new MachineModuleInfo(*getMCAsmInfo(), *getRegisterInfo(), &getTargetLowering()->getObjFileLowering()); PM.add(MMI); OutContext = &MMI->getContext(); // Return the MCContext specifically by-ref. // Set up a MachineFunction for the rest of CodeGen to work on. PM.add(new MachineFunctionAnalysis(*this, OptLevel)); // Enable FastISel with -fast, but allow that to be overridden. if (EnableFastISelOption == cl::BOU_TRUE || (OptLevel == CodeGenOpt::None && EnableFastISelOption != cl::BOU_FALSE)) EnableFastISel = true; // Ask the target for an isel. if (addInstSelector(PM, OptLevel)) return true; // Print the instruction selected machine code... printAndVerify(PM, "After Instruction Selection"); // Expand pseudo-instructions emitted by ISel. PM.add(createExpandISelPseudosPass()); // Pre-ra tail duplication. if (OptLevel != CodeGenOpt::None && !DisableEarlyTailDup) { PM.add(createTailDuplicatePass(true)); printAndVerify(PM, "After Pre-RegAlloc TailDuplicate"); } // Optimize PHIs before DCE: removing dead PHI cycles may make more // instructions dead. if (OptLevel != CodeGenOpt::None) PM.add(createOptimizePHIsPass()); // If the target requests it, assign local variables to stack slots relative // to one another and simplify frame index references where possible. PM.add(createLocalStackSlotAllocationPass()); if (OptLevel != CodeGenOpt::None) { // With optimization, dead code should already be eliminated. 
However // there is one known exception: lowered code for arguments that are only // used by tail calls, where the tail calls reuse the incoming stack // arguments directly (see t11 in test/CodeGen/X86/sibcall.ll). if (!DisableMachineDCE) PM.add(createDeadMachineInstructionElimPass()); printAndVerify(PM, "After codegen DCE pass"); if (!DisableMachineLICM) PM.add(createMachineLICMPass()); if (!DisableMachineCSE) PM.add(createMachineCSEPass()); if (!DisableMachineSink) PM.add(createMachineSinkingPass()); printAndVerify(PM, "After Machine LICM, CSE and Sinking passes"); PM.add(createPeepholeOptimizerPass()); printAndVerify(PM, "After codegen peephole optimization pass"); } // Run pre-ra passes. if (addPreRegAlloc(PM, OptLevel)) printAndVerify(PM, "After PreRegAlloc passes"); // Perform register allocation. PM.add(createRegisterAllocator(OptLevel)); printAndVerify(PM, "After Register Allocation"); // Perform stack slot coloring and post-ra machine LICM. if (OptLevel != CodeGenOpt::None) { // FIXME: Re-enable coloring with register when it's capable of adding // kill markers. if (!DisableSSC) PM.add(createStackSlotColoringPass(false)); // Run post-ra machine LICM to hoist reloads / remats. if (!DisablePostRAMachineLICM) PM.add(createMachineLICMPass(false)); printAndVerify(PM, "After StackSlotColoring and postra Machine LICM"); } // Run post-ra passes. if (addPostRegAlloc(PM, OptLevel)) printAndVerify(PM, "After PostRegAlloc passes"); PM.add(createExpandPostRAPseudosPass()); printAndVerify(PM, "After ExpandPostRAPseudos"); // Insert prolog/epilog code. Eliminate abstract frame index references... PM.add(createPrologEpilogCodeInserter()); printAndVerify(PM, "After PrologEpilogCodeInserter"); // Run pre-sched2 passes. if (addPreSched2(PM, OptLevel)) printAndVerify(PM, "After PreSched2 passes"); // Second pass scheduler. if (OptLevel != CodeGenOpt::None && !DisablePostRA) { PM.add(createPostRAScheduler(OptLevel)); printAndVerify(PM, "After PostRAScheduler"); } // Branch folding must be run after regalloc and prolog/epilog insertion. if (OptLevel != CodeGenOpt::None && !DisableBranchFold) { PM.add(createBranchFoldingPass(getEnableTailMergeDefault())); printNoVerify(PM, "After BranchFolding"); } // Tail duplication. if (OptLevel != CodeGenOpt::None && !DisableTailDuplicate) { PM.add(createTailDuplicatePass(false)); printNoVerify(PM, "After TailDuplicate"); } PM.add(createGCMachineCodeAnalysisPass()); if (PrintGCInfo) PM.add(createGCInfoPrinter(dbgs())); if (OptLevel != CodeGenOpt::None && !DisableCodePlace) { PM.add(createCodePlacementOptPass()); printNoVerify(PM, "After CodePlacementOpt"); } if (addPreEmitPass(PM, OptLevel)) printNoVerify(PM, "After PreEmit passes"); return false; } Index: vendor/llvm/dist/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp =================================================================== --- vendor/llvm/dist/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (revision 228363) +++ vendor/llvm/dist/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (revision 228364) @@ -1,6818 +1,6821 @@ //===-- SelectionDAGBuilder.cpp - Selection-DAG building ------------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This implements routines for translating from LLVM IR into SelectionDAG IR. 
// //===----------------------------------------------------------------------===// #define DEBUG_TYPE "isel" #include "SDNodeDbgValue.h" #include "SelectionDAGBuilder.h" #include "llvm/ADT/BitVector.h" #include "llvm/ADT/PostOrderIterator.h" #include "llvm/ADT/SmallSet.h" #include "llvm/Analysis/AliasAnalysis.h" #include "llvm/Analysis/ConstantFolding.h" #include "llvm/Constants.h" #include "llvm/CallingConv.h" #include "llvm/DerivedTypes.h" #include "llvm/Function.h" #include "llvm/GlobalVariable.h" #include "llvm/InlineAsm.h" #include "llvm/Instructions.h" #include "llvm/Intrinsics.h" #include "llvm/IntrinsicInst.h" #include "llvm/LLVMContext.h" #include "llvm/Module.h" #include "llvm/CodeGen/Analysis.h" #include "llvm/CodeGen/FastISel.h" #include "llvm/CodeGen/FunctionLoweringInfo.h" #include "llvm/CodeGen/GCStrategy.h" #include "llvm/CodeGen/GCMetadata.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/CodeGen/MachineJumpTableInfo.h" #include "llvm/CodeGen/MachineModuleInfo.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/PseudoSourceValue.h" #include "llvm/CodeGen/SelectionDAG.h" #include "llvm/Analysis/DebugInfo.h" #include "llvm/Target/TargetData.h" #include "llvm/Target/TargetFrameLowering.h" #include "llvm/Target/TargetInstrInfo.h" #include "llvm/Target/TargetIntrinsicInfo.h" #include "llvm/Target/TargetLowering.h" #include "llvm/Target/TargetOptions.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/raw_ostream.h" #include using namespace llvm; /// LimitFloatPrecision - Generate low-precision inline sequences for /// some float libcalls (6, 8 or 12 bits). static unsigned LimitFloatPrecision; static cl::opt LimitFPPrecision("limit-float-precision", cl::desc("Generate low-precision inline sequences " "for some float libcalls"), cl::location(LimitFloatPrecision), cl::init(0)); // Limit the width of DAG chains. This is important in general to prevent // prevent DAG-based analysis from blowing up. For example, alias analysis and // load clustering may not complete in reasonable time. It is difficult to // recognize and avoid this situation within each individual analysis, and // future analyses are likely to have the same behavior. Limiting DAG width is // the safe approach, and will be especially important with global DAGs. // // MaxParallelChains default is arbitrarily high to avoid affecting // optimization, but could be lowered to improve compile time. Any ld-ld-st-st // sequence over this should have been converted to llvm.memcpy by the // frontend. It easy to induce this behavior with .ll code such as: // %buffer = alloca [4096 x i8] // %data = load [4096 x i8]* %argPtr // store [4096 x i8] %data, [4096 x i8]* %buffer static const unsigned MaxParallelChains = 64; static SDValue getCopyFromPartsVector(SelectionDAG &DAG, DebugLoc DL, const SDValue *Parts, unsigned NumParts, EVT PartVT, EVT ValueVT); /// getCopyFromParts - Create a value that contains the specified legal parts /// combined into the value they represent. If the parts combine to a type /// larger then ValueVT then AssertOp can be used to specify whether the extra /// bits are known to be zero (ISD::AssertZext) or sign extended from ValueVT /// (ISD::AssertSext). 
static SDValue getCopyFromParts(SelectionDAG &DAG, DebugLoc DL, const SDValue *Parts, unsigned NumParts, EVT PartVT, EVT ValueVT, ISD::NodeType AssertOp = ISD::DELETED_NODE) { if (ValueVT.isVector()) return getCopyFromPartsVector(DAG, DL, Parts, NumParts, PartVT, ValueVT); assert(NumParts > 0 && "No parts to assemble!"); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); SDValue Val = Parts[0]; if (NumParts > 1) { // Assemble the value from multiple parts. if (ValueVT.isInteger()) { unsigned PartBits = PartVT.getSizeInBits(); unsigned ValueBits = ValueVT.getSizeInBits(); // Assemble the power of 2 part. unsigned RoundParts = NumParts & (NumParts - 1) ? 1 << Log2_32(NumParts) : NumParts; unsigned RoundBits = PartBits * RoundParts; EVT RoundVT = RoundBits == ValueBits ? ValueVT : EVT::getIntegerVT(*DAG.getContext(), RoundBits); SDValue Lo, Hi; EVT HalfVT = EVT::getIntegerVT(*DAG.getContext(), RoundBits/2); if (RoundParts > 2) { Lo = getCopyFromParts(DAG, DL, Parts, RoundParts / 2, PartVT, HalfVT); Hi = getCopyFromParts(DAG, DL, Parts + RoundParts / 2, RoundParts / 2, PartVT, HalfVT); } else { Lo = DAG.getNode(ISD::BITCAST, DL, HalfVT, Parts[0]); Hi = DAG.getNode(ISD::BITCAST, DL, HalfVT, Parts[1]); } if (TLI.isBigEndian()) std::swap(Lo, Hi); Val = DAG.getNode(ISD::BUILD_PAIR, DL, RoundVT, Lo, Hi); if (RoundParts < NumParts) { // Assemble the trailing non-power-of-2 part. unsigned OddParts = NumParts - RoundParts; EVT OddVT = EVT::getIntegerVT(*DAG.getContext(), OddParts * PartBits); Hi = getCopyFromParts(DAG, DL, Parts + RoundParts, OddParts, PartVT, OddVT); // Combine the round and odd parts. Lo = Val; if (TLI.isBigEndian()) std::swap(Lo, Hi); EVT TotalVT = EVT::getIntegerVT(*DAG.getContext(), NumParts * PartBits); Hi = DAG.getNode(ISD::ANY_EXTEND, DL, TotalVT, Hi); Hi = DAG.getNode(ISD::SHL, DL, TotalVT, Hi, DAG.getConstant(Lo.getValueType().getSizeInBits(), TLI.getPointerTy())); Lo = DAG.getNode(ISD::ZERO_EXTEND, DL, TotalVT, Lo); Val = DAG.getNode(ISD::OR, DL, TotalVT, Lo, Hi); } } else if (PartVT.isFloatingPoint()) { // FP split into multiple FP parts (for ppcf128) assert(ValueVT == EVT(MVT::ppcf128) && PartVT == EVT(MVT::f64) && "Unexpected split"); SDValue Lo, Hi; Lo = DAG.getNode(ISD::BITCAST, DL, EVT(MVT::f64), Parts[0]); Hi = DAG.getNode(ISD::BITCAST, DL, EVT(MVT::f64), Parts[1]); if (TLI.isBigEndian()) std::swap(Lo, Hi); Val = DAG.getNode(ISD::BUILD_PAIR, DL, ValueVT, Lo, Hi); } else { // FP split into integer parts (soft fp) assert(ValueVT.isFloatingPoint() && PartVT.isInteger() && !PartVT.isVector() && "Unexpected split"); EVT IntVT = EVT::getIntegerVT(*DAG.getContext(), ValueVT.getSizeInBits()); Val = getCopyFromParts(DAG, DL, Parts, NumParts, PartVT, IntVT); } } // There is now one part, held in Val. Correct it to match ValueVT. PartVT = Val.getValueType(); if (PartVT == ValueVT) return Val; if (PartVT.isInteger() && ValueVT.isInteger()) { if (ValueVT.bitsLT(PartVT)) { // For a truncate, see if we have any information to // indicate whether the truncated bits will always be // zero or sign-extension. if (AssertOp != ISD::DELETED_NODE) Val = DAG.getNode(AssertOp, DL, PartVT, Val, DAG.getValueType(ValueVT)); return DAG.getNode(ISD::TRUNCATE, DL, ValueVT, Val); } return DAG.getNode(ISD::ANY_EXTEND, DL, ValueVT, Val); } if (PartVT.isFloatingPoint() && ValueVT.isFloatingPoint()) { // FP_ROUND's are always exact here. 
if (ValueVT.bitsLT(Val.getValueType())) return DAG.getNode(ISD::FP_ROUND, DL, ValueVT, Val, DAG.getIntPtrConstant(1)); return DAG.getNode(ISD::FP_EXTEND, DL, ValueVT, Val); } if (PartVT.getSizeInBits() == ValueVT.getSizeInBits()) return DAG.getNode(ISD::BITCAST, DL, ValueVT, Val); llvm_unreachable("Unknown mismatch!"); return SDValue(); } /// getCopyFromParts - Create a value that contains the specified legal parts /// combined into the value they represent. If the parts combine to a type /// larger then ValueVT then AssertOp can be used to specify whether the extra /// bits are known to be zero (ISD::AssertZext) or sign extended from ValueVT /// (ISD::AssertSext). static SDValue getCopyFromPartsVector(SelectionDAG &DAG, DebugLoc DL, const SDValue *Parts, unsigned NumParts, EVT PartVT, EVT ValueVT) { assert(ValueVT.isVector() && "Not a vector value"); assert(NumParts > 0 && "No parts to assemble!"); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); SDValue Val = Parts[0]; // Handle a multi-element vector. if (NumParts > 1) { EVT IntermediateVT, RegisterVT; unsigned NumIntermediates; unsigned NumRegs = TLI.getVectorTypeBreakdown(*DAG.getContext(), ValueVT, IntermediateVT, NumIntermediates, RegisterVT); assert(NumRegs == NumParts && "Part count doesn't match vector breakdown!"); NumParts = NumRegs; // Silence a compiler warning. assert(RegisterVT == PartVT && "Part type doesn't match vector breakdown!"); assert(RegisterVT == Parts[0].getValueType() && "Part type doesn't match part!"); // Assemble the parts into intermediate operands. SmallVector Ops(NumIntermediates); if (NumIntermediates == NumParts) { // If the register was not expanded, truncate or copy the value, // as appropriate. for (unsigned i = 0; i != NumParts; ++i) Ops[i] = getCopyFromParts(DAG, DL, &Parts[i], 1, PartVT, IntermediateVT); } else if (NumParts > 0) { // If the intermediate type was expanded, build the intermediate // operands from the parts. assert(NumParts % NumIntermediates == 0 && "Must expand into a divisible number of parts!"); unsigned Factor = NumParts / NumIntermediates; for (unsigned i = 0; i != NumIntermediates; ++i) Ops[i] = getCopyFromParts(DAG, DL, &Parts[i * Factor], Factor, PartVT, IntermediateVT); } // Build a vector with BUILD_VECTOR or CONCAT_VECTORS from the // intermediate operands. Val = DAG.getNode(IntermediateVT.isVector() ? ISD::CONCAT_VECTORS : ISD::BUILD_VECTOR, DL, ValueVT, &Ops[0], NumIntermediates); } // There is now one part, held in Val. Correct it to match ValueVT. PartVT = Val.getValueType(); if (PartVT == ValueVT) return Val; if (PartVT.isVector()) { // If the element type of the source/dest vectors are the same, but the // parts vector has more elements than the value vector, then we have a // vector widening case (e.g. <2 x float> -> <4 x float>). Extract the // elements we want. if (PartVT.getVectorElementType() == ValueVT.getVectorElementType()) { assert(PartVT.getVectorNumElements() > ValueVT.getVectorNumElements() && "Cannot narrow, it would be a lossy transformation"); return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ValueVT, Val, DAG.getIntPtrConstant(0)); } // Vector/Vector bitcast. if (ValueVT.getSizeInBits() == PartVT.getSizeInBits()) return DAG.getNode(ISD::BITCAST, DL, ValueVT, Val); assert(PartVT.getVectorNumElements() == ValueVT.getVectorNumElements() && "Cannot handle this kind of promotion"); // Promoted vector extract bool Smaller = ValueVT.bitsLE(PartVT); return DAG.getNode((Smaller ? 
ISD::TRUNCATE : ISD::ANY_EXTEND), DL, ValueVT, Val); } // Trivial bitcast if the types are the same size and the destination // vector type is legal. if (PartVT.getSizeInBits() == ValueVT.getSizeInBits() && TLI.isTypeLegal(ValueVT)) return DAG.getNode(ISD::BITCAST, DL, ValueVT, Val); // Handle cases such as i8 -> <1 x i1> assert(ValueVT.getVectorNumElements() == 1 && "Only trivial scalar-to-vector conversions should get here!"); if (ValueVT.getVectorNumElements() == 1 && ValueVT.getVectorElementType() != PartVT) { bool Smaller = ValueVT.bitsLE(PartVT); Val = DAG.getNode((Smaller ? ISD::TRUNCATE : ISD::ANY_EXTEND), DL, ValueVT.getScalarType(), Val); } return DAG.getNode(ISD::BUILD_VECTOR, DL, ValueVT, Val); } static void getCopyToPartsVector(SelectionDAG &DAG, DebugLoc dl, SDValue Val, SDValue *Parts, unsigned NumParts, EVT PartVT); /// getCopyToParts - Create a series of nodes that contain the specified value /// split into legal parts. If the parts contain more bits than Val, then, for /// integers, ExtendKind can be used to specify how to generate the extra bits. static void getCopyToParts(SelectionDAG &DAG, DebugLoc DL, SDValue Val, SDValue *Parts, unsigned NumParts, EVT PartVT, ISD::NodeType ExtendKind = ISD::ANY_EXTEND) { EVT ValueVT = Val.getValueType(); // Handle the vector case separately. if (ValueVT.isVector()) return getCopyToPartsVector(DAG, DL, Val, Parts, NumParts, PartVT); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); unsigned PartBits = PartVT.getSizeInBits(); unsigned OrigNumParts = NumParts; assert(TLI.isTypeLegal(PartVT) && "Copying to an illegal type!"); if (NumParts == 0) return; assert(!ValueVT.isVector() && "Vector case handled elsewhere"); if (PartVT == ValueVT) { assert(NumParts == 1 && "No-op copy with multiple parts!"); Parts[0] = Val; return; } if (NumParts * PartBits > ValueVT.getSizeInBits()) { // If the parts cover more bits than the value has, promote the value. if (PartVT.isFloatingPoint() && ValueVT.isFloatingPoint()) { assert(NumParts == 1 && "Do not know what to promote to!"); Val = DAG.getNode(ISD::FP_EXTEND, DL, PartVT, Val); } else { assert(PartVT.isInteger() && ValueVT.isInteger() && "Unknown mismatch!"); ValueVT = EVT::getIntegerVT(*DAG.getContext(), NumParts * PartBits); Val = DAG.getNode(ExtendKind, DL, ValueVT, Val); } } else if (PartBits == ValueVT.getSizeInBits()) { // Different types of the same size. assert(NumParts == 1 && PartVT != ValueVT); Val = DAG.getNode(ISD::BITCAST, DL, PartVT, Val); } else if (NumParts * PartBits < ValueVT.getSizeInBits()) { // If the parts cover less bits than value has, truncate the value. assert(PartVT.isInteger() && ValueVT.isInteger() && "Unknown mismatch!"); ValueVT = EVT::getIntegerVT(*DAG.getContext(), NumParts * PartBits); Val = DAG.getNode(ISD::TRUNCATE, DL, ValueVT, Val); } // The value may have changed - recompute ValueVT. ValueVT = Val.getValueType(); assert(NumParts * PartBits == ValueVT.getSizeInBits() && "Failed to tile the value with PartVT!"); if (NumParts == 1) { assert(PartVT == ValueVT && "Type conversion failed!"); Parts[0] = Val; return; } // Expand the value into multiple parts. if (NumParts & (NumParts - 1)) { // The number of parts is not a power of 2. Split off and copy the tail. 
assert(PartVT.isInteger() && ValueVT.isInteger() && "Do not know what to expand to!"); unsigned RoundParts = 1 << Log2_32(NumParts); unsigned RoundBits = RoundParts * PartBits; unsigned OddParts = NumParts - RoundParts; SDValue OddVal = DAG.getNode(ISD::SRL, DL, ValueVT, Val, DAG.getIntPtrConstant(RoundBits)); getCopyToParts(DAG, DL, OddVal, Parts + RoundParts, OddParts, PartVT); if (TLI.isBigEndian()) // The odd parts were reversed by getCopyToParts - unreverse them. std::reverse(Parts + RoundParts, Parts + NumParts); NumParts = RoundParts; ValueVT = EVT::getIntegerVT(*DAG.getContext(), NumParts * PartBits); Val = DAG.getNode(ISD::TRUNCATE, DL, ValueVT, Val); } // The number of parts is a power of 2. Repeatedly bisect the value using // EXTRACT_ELEMENT. Parts[0] = DAG.getNode(ISD::BITCAST, DL, EVT::getIntegerVT(*DAG.getContext(), ValueVT.getSizeInBits()), Val); for (unsigned StepSize = NumParts; StepSize > 1; StepSize /= 2) { for (unsigned i = 0; i < NumParts; i += StepSize) { unsigned ThisBits = StepSize * PartBits / 2; EVT ThisVT = EVT::getIntegerVT(*DAG.getContext(), ThisBits); SDValue &Part0 = Parts[i]; SDValue &Part1 = Parts[i+StepSize/2]; Part1 = DAG.getNode(ISD::EXTRACT_ELEMENT, DL, ThisVT, Part0, DAG.getIntPtrConstant(1)); Part0 = DAG.getNode(ISD::EXTRACT_ELEMENT, DL, ThisVT, Part0, DAG.getIntPtrConstant(0)); if (ThisBits == PartBits && ThisVT != PartVT) { Part0 = DAG.getNode(ISD::BITCAST, DL, PartVT, Part0); Part1 = DAG.getNode(ISD::BITCAST, DL, PartVT, Part1); } } } if (TLI.isBigEndian()) std::reverse(Parts, Parts + OrigNumParts); } /// getCopyToPartsVector - Create a series of nodes that contain the specified /// value split into legal parts. static void getCopyToPartsVector(SelectionDAG &DAG, DebugLoc DL, SDValue Val, SDValue *Parts, unsigned NumParts, EVT PartVT) { EVT ValueVT = Val.getValueType(); assert(ValueVT.isVector() && "Not a vector"); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); if (NumParts == 1) { if (PartVT == ValueVT) { // Nothing to do. } else if (PartVT.getSizeInBits() == ValueVT.getSizeInBits()) { // Bitconvert vector->vector case. Val = DAG.getNode(ISD::BITCAST, DL, PartVT, Val); } else if (PartVT.isVector() && PartVT.getVectorElementType() == ValueVT.getVectorElementType() && PartVT.getVectorNumElements() > ValueVT.getVectorNumElements()) { EVT ElementVT = PartVT.getVectorElementType(); // Vector widening case, e.g. <2 x float> -> <4 x float>. Shuffle in // undef elements. SmallVector Ops; for (unsigned i = 0, e = ValueVT.getVectorNumElements(); i != e; ++i) Ops.push_back(DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ElementVT, Val, DAG.getIntPtrConstant(i))); for (unsigned i = ValueVT.getVectorNumElements(), e = PartVT.getVectorNumElements(); i != e; ++i) Ops.push_back(DAG.getUNDEF(ElementVT)); Val = DAG.getNode(ISD::BUILD_VECTOR, DL, PartVT, &Ops[0], Ops.size()); // FIXME: Use CONCAT for 2x -> 4x. //SDValue UndefElts = DAG.getUNDEF(VectorTy); //Val = DAG.getNode(ISD::CONCAT_VECTORS, DL, PartVT, Val, UndefElts); } else if (PartVT.isVector() && PartVT.getVectorElementType().bitsGE( ValueVT.getVectorElementType()) && PartVT.getVectorNumElements() == ValueVT.getVectorNumElements()) { // Promoted vector extract bool Smaller = PartVT.bitsLE(ValueVT); Val = DAG.getNode((Smaller ? ISD::TRUNCATE : ISD::ANY_EXTEND), DL, PartVT, Val); } else{ // Vector -> scalar conversion. 
assert(ValueVT.getVectorNumElements() == 1 && "Only trivial vector-to-scalar conversions should get here!"); Val = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, PartVT, Val, DAG.getIntPtrConstant(0)); bool Smaller = ValueVT.bitsLE(PartVT); Val = DAG.getNode((Smaller ? ISD::TRUNCATE : ISD::ANY_EXTEND), DL, PartVT, Val); } Parts[0] = Val; return; } // Handle a multi-element vector. EVT IntermediateVT, RegisterVT; unsigned NumIntermediates; unsigned NumRegs = TLI.getVectorTypeBreakdown(*DAG.getContext(), ValueVT, IntermediateVT, NumIntermediates, RegisterVT); unsigned NumElements = ValueVT.getVectorNumElements(); assert(NumRegs == NumParts && "Part count doesn't match vector breakdown!"); NumParts = NumRegs; // Silence a compiler warning. assert(RegisterVT == PartVT && "Part type doesn't match vector breakdown!"); // Split the vector into intermediate operands. SmallVector Ops(NumIntermediates); for (unsigned i = 0; i != NumIntermediates; ++i) { if (IntermediateVT.isVector()) Ops[i] = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, IntermediateVT, Val, DAG.getIntPtrConstant(i * (NumElements / NumIntermediates))); else Ops[i] = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, IntermediateVT, Val, DAG.getIntPtrConstant(i)); } // Split the intermediate operands into legal parts. if (NumParts == NumIntermediates) { // If the register was not expanded, promote or copy the value, // as appropriate. for (unsigned i = 0; i != NumParts; ++i) getCopyToParts(DAG, DL, Ops[i], &Parts[i], 1, PartVT); } else if (NumParts > 0) { // If the intermediate type was expanded, split each the value into // legal parts. assert(NumParts % NumIntermediates == 0 && "Must expand into a divisible number of parts!"); unsigned Factor = NumParts / NumIntermediates; for (unsigned i = 0; i != NumIntermediates; ++i) getCopyToParts(DAG, DL, Ops[i], &Parts[i*Factor], Factor, PartVT); } } namespace { /// RegsForValue - This struct represents the registers (physical or virtual) /// that a particular set of values is assigned, and the type information /// about the value. The most common situation is to represent one value at a /// time, but struct or array values are handled element-wise as multiple /// values. The splitting of aggregates is performed recursively, so that we /// never have aggregate-typed registers. The values at this point do not /// necessarily have legal types, so each value may require one or more /// registers of some legal type. /// struct RegsForValue { /// ValueVTs - The value types of the values, which may not be legal, and /// may need be promoted or synthesized from one or more registers. /// SmallVector ValueVTs; /// RegVTs - The value types of the registers. This is the same size as /// ValueVTs and it records, for each value, what the type of the assigned /// register or registers are. (Individual values are never synthesized /// from more than one type of register.) /// /// With virtual registers, the contents of RegVTs is redundant with TLI's /// getRegisterType member function, however when with physical registers /// it is necessary to have a separate record of the types. /// SmallVector RegVTs; /// Regs - This list holds the registers assigned to the values. /// Each legal or promoted value requires one register, and each /// expanded value requires multiple registers. 
/// SmallVector Regs; RegsForValue() {} RegsForValue(const SmallVector ®s, EVT regvt, EVT valuevt) : ValueVTs(1, valuevt), RegVTs(1, regvt), Regs(regs) {} RegsForValue(LLVMContext &Context, const TargetLowering &tli, unsigned Reg, Type *Ty) { ComputeValueVTs(tli, Ty, ValueVTs); for (unsigned Value = 0, e = ValueVTs.size(); Value != e; ++Value) { EVT ValueVT = ValueVTs[Value]; unsigned NumRegs = tli.getNumRegisters(Context, ValueVT); EVT RegisterVT = tli.getRegisterType(Context, ValueVT); for (unsigned i = 0; i != NumRegs; ++i) Regs.push_back(Reg + i); RegVTs.push_back(RegisterVT); Reg += NumRegs; } } /// areValueTypesLegal - Return true if types of all the values are legal. bool areValueTypesLegal(const TargetLowering &TLI) { for (unsigned Value = 0, e = ValueVTs.size(); Value != e; ++Value) { EVT RegisterVT = RegVTs[Value]; if (!TLI.isTypeLegal(RegisterVT)) return false; } return true; } /// append - Add the specified values to this one. void append(const RegsForValue &RHS) { ValueVTs.append(RHS.ValueVTs.begin(), RHS.ValueVTs.end()); RegVTs.append(RHS.RegVTs.begin(), RHS.RegVTs.end()); Regs.append(RHS.Regs.begin(), RHS.Regs.end()); } /// getCopyFromRegs - Emit a series of CopyFromReg nodes that copies from /// this value and returns the result as a ValueVTs value. This uses /// Chain/Flag as the input and updates them for the output Chain/Flag. /// If the Flag pointer is NULL, no flag is used. SDValue getCopyFromRegs(SelectionDAG &DAG, FunctionLoweringInfo &FuncInfo, DebugLoc dl, SDValue &Chain, SDValue *Flag) const; /// getCopyToRegs - Emit a series of CopyToReg nodes that copies the /// specified value into the registers specified by this object. This uses /// Chain/Flag as the input and updates them for the output Chain/Flag. /// If the Flag pointer is NULL, no flag is used. void getCopyToRegs(SDValue Val, SelectionDAG &DAG, DebugLoc dl, SDValue &Chain, SDValue *Flag) const; /// AddInlineAsmOperands - Add this value to the specified inlineasm node /// operand list. This adds the code marker, matching input operand index /// (if applicable), and includes the number of values added into it. void AddInlineAsmOperands(unsigned Kind, bool HasMatching, unsigned MatchingIdx, SelectionDAG &DAG, std::vector &Ops) const; }; } /// getCopyFromRegs - Emit a series of CopyFromReg nodes that copies from /// this value and returns the result as a ValueVT value. This uses /// Chain/Flag as the input and updates them for the output Chain/Flag. /// If the Flag pointer is NULL, no flag is used. SDValue RegsForValue::getCopyFromRegs(SelectionDAG &DAG, FunctionLoweringInfo &FuncInfo, DebugLoc dl, SDValue &Chain, SDValue *Flag) const { // A Value with type {} or [0 x %t] needs no registers. if (ValueVTs.empty()) return SDValue(); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); // Assemble the legal parts into the final values. SmallVector Values(ValueVTs.size()); SmallVector Parts; for (unsigned Value = 0, Part = 0, e = ValueVTs.size(); Value != e; ++Value) { // Copy the legal parts from the registers. 
EVT ValueVT = ValueVTs[Value]; unsigned NumRegs = TLI.getNumRegisters(*DAG.getContext(), ValueVT); EVT RegisterVT = RegVTs[Value]; Parts.resize(NumRegs); for (unsigned i = 0; i != NumRegs; ++i) { SDValue P; if (Flag == 0) { P = DAG.getCopyFromReg(Chain, dl, Regs[Part+i], RegisterVT); } else { P = DAG.getCopyFromReg(Chain, dl, Regs[Part+i], RegisterVT, *Flag); *Flag = P.getValue(2); } Chain = P.getValue(1); Parts[i] = P; // If the source register was virtual and if we know something about it, // add an assert node. if (!TargetRegisterInfo::isVirtualRegister(Regs[Part+i]) || !RegisterVT.isInteger() || RegisterVT.isVector()) continue; const FunctionLoweringInfo::LiveOutInfo *LOI = FuncInfo.GetLiveOutRegInfo(Regs[Part+i]); if (!LOI) continue; unsigned RegSize = RegisterVT.getSizeInBits(); unsigned NumSignBits = LOI->NumSignBits; unsigned NumZeroBits = LOI->KnownZero.countLeadingOnes(); // FIXME: We capture more information than the dag can represent. For // now, just use the tightest assertzext/assertsext possible. bool isSExt = true; EVT FromVT(MVT::Other); if (NumSignBits == RegSize) isSExt = true, FromVT = MVT::i1; // ASSERT SEXT 1 else if (NumZeroBits >= RegSize-1) isSExt = false, FromVT = MVT::i1; // ASSERT ZEXT 1 else if (NumSignBits > RegSize-8) isSExt = true, FromVT = MVT::i8; // ASSERT SEXT 8 else if (NumZeroBits >= RegSize-8) isSExt = false, FromVT = MVT::i8; // ASSERT ZEXT 8 else if (NumSignBits > RegSize-16) isSExt = true, FromVT = MVT::i16; // ASSERT SEXT 16 else if (NumZeroBits >= RegSize-16) isSExt = false, FromVT = MVT::i16; // ASSERT ZEXT 16 else if (NumSignBits > RegSize-32) isSExt = true, FromVT = MVT::i32; // ASSERT SEXT 32 else if (NumZeroBits >= RegSize-32) isSExt = false, FromVT = MVT::i32; // ASSERT ZEXT 32 else continue; // Add an assertion node. assert(FromVT != MVT::Other); Parts[i] = DAG.getNode(isSExt ? ISD::AssertSext : ISD::AssertZext, dl, RegisterVT, P, DAG.getValueType(FromVT)); } Values[Value] = getCopyFromParts(DAG, dl, Parts.begin(), NumRegs, RegisterVT, ValueVT); Part += NumRegs; Parts.clear(); } return DAG.getNode(ISD::MERGE_VALUES, dl, DAG.getVTList(&ValueVTs[0], ValueVTs.size()), &Values[0], ValueVTs.size()); } /// getCopyToRegs - Emit a series of CopyToReg nodes that copies the /// specified value into the registers specified by this object. This uses /// Chain/Flag as the input and updates them for the output Chain/Flag. /// If the Flag pointer is NULL, no flag is used. void RegsForValue::getCopyToRegs(SDValue Val, SelectionDAG &DAG, DebugLoc dl, SDValue &Chain, SDValue *Flag) const { const TargetLowering &TLI = DAG.getTargetLoweringInfo(); // Get the list of the values's legal parts. unsigned NumRegs = Regs.size(); SmallVector Parts(NumRegs); for (unsigned Value = 0, Part = 0, e = ValueVTs.size(); Value != e; ++Value) { EVT ValueVT = ValueVTs[Value]; unsigned NumParts = TLI.getNumRegisters(*DAG.getContext(), ValueVT); EVT RegisterVT = RegVTs[Value]; getCopyToParts(DAG, dl, Val.getValue(Val.getResNo() + Value), &Parts[Part], NumParts, RegisterVT); Part += NumParts; } // Copy the parts into the registers. SmallVector Chains(NumRegs); for (unsigned i = 0; i != NumRegs; ++i) { SDValue Part; if (Flag == 0) { Part = DAG.getCopyToReg(Chain, dl, Regs[i], Parts[i]); } else { Part = DAG.getCopyToReg(Chain, dl, Regs[i], Parts[i], *Flag); *Flag = Part.getValue(1); } Chains[i] = Part.getValue(0); } if (NumRegs == 1 || Flag) // If NumRegs > 1 && Flag is used then the use of the last CopyToReg is // flagged to it. 
That is the CopyToReg nodes and the user are considered // a single scheduling unit. If we create a TokenFactor and return it as // chain, then the TokenFactor is both a predecessor (operand) of the // user as well as a successor (the TF operands are flagged to the user). // c1, f1 = CopyToReg // c2, f2 = CopyToReg // c3 = TokenFactor c1, c2 // ... // = op c3, ..., f2 Chain = Chains[NumRegs-1]; else Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, &Chains[0], NumRegs); } /// AddInlineAsmOperands - Add this value to the specified inlineasm node /// operand list. This adds the code marker and includes the number of /// values added into it. void RegsForValue::AddInlineAsmOperands(unsigned Code, bool HasMatching, unsigned MatchingIdx, SelectionDAG &DAG, std::vector &Ops) const { const TargetLowering &TLI = DAG.getTargetLoweringInfo(); unsigned Flag = InlineAsm::getFlagWord(Code, Regs.size()); if (HasMatching) Flag = InlineAsm::getFlagWordForMatchingOp(Flag, MatchingIdx); else if (!Regs.empty() && TargetRegisterInfo::isVirtualRegister(Regs.front())) { // Put the register class of the virtual registers in the flag word. That // way, later passes can recompute register class constraints for inline // assembly as well as normal instructions. // Don't do this for tied operands that can use the regclass information // from the def. const MachineRegisterInfo &MRI = DAG.getMachineFunction().getRegInfo(); const TargetRegisterClass *RC = MRI.getRegClass(Regs.front()); Flag = InlineAsm::getFlagWordForRegClass(Flag, RC->getID()); } SDValue Res = DAG.getTargetConstant(Flag, MVT::i32); Ops.push_back(Res); for (unsigned Value = 0, Reg = 0, e = ValueVTs.size(); Value != e; ++Value) { unsigned NumRegs = TLI.getNumRegisters(*DAG.getContext(), ValueVTs[Value]); EVT RegisterVT = RegVTs[Value]; for (unsigned i = 0; i != NumRegs; ++i) { assert(Reg < Regs.size() && "Mismatch in # registers expected"); Ops.push_back(DAG.getRegister(Regs[Reg++], RegisterVT)); } } } void SelectionDAGBuilder::init(GCFunctionInfo *gfi, AliasAnalysis &aa) { AA = &aa; GFI = gfi; TD = DAG.getTarget().getTargetData(); LPadToCallSiteMap.clear(); } /// clear - Clear out the current SelectionDAG and the associated /// state and prepare this SelectionDAGBuilder object to be used /// for a new block. This doesn't clear out information about /// additional blocks that are needed to complete switch lowering /// or PHI node updating; that information is cleared out as it is /// consumed. void SelectionDAGBuilder::clear() { NodeMap.clear(); UnusedArgNodeMap.clear(); PendingLoads.clear(); PendingExports.clear(); CurDebugLoc = DebugLoc(); HasTailCall = false; } /// clearDanglingDebugInfo - Clear the dangling debug information /// map. This function is seperated from the clear so that debug /// information that is dangling in a basic block can be properly /// resolved in a different basic block. This allows the /// SelectionDAG to resolve dangling debug information attached /// to PHI nodes. void SelectionDAGBuilder::clearDanglingDebugInfo() { DanglingDebugInfoMap.clear(); } /// getRoot - Return the current virtual root of the Selection DAG, /// flushing any PendingLoad items. This must be done before emitting /// a store or any other node that may need to be ordered after any /// prior load instructions. 
/// SDValue SelectionDAGBuilder::getRoot() { if (PendingLoads.empty()) return DAG.getRoot(); if (PendingLoads.size() == 1) { SDValue Root = PendingLoads[0]; DAG.setRoot(Root); PendingLoads.clear(); return Root; } // Otherwise, we have to make a token factor node. SDValue Root = DAG.getNode(ISD::TokenFactor, getCurDebugLoc(), MVT::Other, &PendingLoads[0], PendingLoads.size()); PendingLoads.clear(); DAG.setRoot(Root); return Root; } /// getControlRoot - Similar to getRoot, but instead of flushing all the /// PendingLoad items, flush all the PendingExports items. It is necessary /// to do this before emitting a terminator instruction. /// SDValue SelectionDAGBuilder::getControlRoot() { SDValue Root = DAG.getRoot(); if (PendingExports.empty()) return Root; // Turn all of the CopyToReg chains into one factored node. if (Root.getOpcode() != ISD::EntryToken) { unsigned i = 0, e = PendingExports.size(); for (; i != e; ++i) { assert(PendingExports[i].getNode()->getNumOperands() > 1); if (PendingExports[i].getNode()->getOperand(0) == Root) break; // Don't add the root if we already indirectly depend on it. } if (i == e) PendingExports.push_back(Root); } Root = DAG.getNode(ISD::TokenFactor, getCurDebugLoc(), MVT::Other, &PendingExports[0], PendingExports.size()); PendingExports.clear(); DAG.setRoot(Root); return Root; } void SelectionDAGBuilder::AssignOrderingToNode(const SDNode *Node) { if (DAG.GetOrdering(Node) != 0) return; // Already has ordering. DAG.AssignOrdering(Node, SDNodeOrder); for (unsigned I = 0, E = Node->getNumOperands(); I != E; ++I) AssignOrderingToNode(Node->getOperand(I).getNode()); } void SelectionDAGBuilder::visit(const Instruction &I) { // Set up outgoing PHI node register values before emitting the terminator. if (isa(&I)) HandlePHINodesInSuccessorBlocks(I.getParent()); CurDebugLoc = I.getDebugLoc(); visit(I.getOpcode(), I); if (!isa(&I) && !HasTailCall) CopyToExportRegsIfNeeded(&I); CurDebugLoc = DebugLoc(); } void SelectionDAGBuilder::visitPHI(const PHINode &) { llvm_unreachable("SelectionDAGBuilder shouldn't visit PHI nodes!"); } void SelectionDAGBuilder::visit(unsigned Opcode, const User &I) { // Note: this doesn't use InstVisitor, because it has to work with // ConstantExpr's in addition to instructions. switch (Opcode) { default: llvm_unreachable("Unknown instruction type encountered!"); // Build the switch statement using the Instruction.def file. #define HANDLE_INST(NUM, OPCODE, CLASS) \ case Instruction::OPCODE: visit##OPCODE((CLASS&)I); break; #include "llvm/Instruction.def" } // Assign the ordering to the freshly created DAG nodes. if (NodeMap.count(&I)) { ++SDNodeOrder; AssignOrderingToNode(getValue(&I).getNode()); } } // resolveDanglingDebugInfo - if we saw an earlier dbg_value referring to V, // generate the debug data structures now that we've seen its definition. 
void SelectionDAGBuilder::resolveDanglingDebugInfo(const Value *V, SDValue Val) { DanglingDebugInfo &DDI = DanglingDebugInfoMap[V]; if (DDI.getDI()) { const DbgValueInst *DI = DDI.getDI(); DebugLoc dl = DDI.getdl(); unsigned DbgSDNodeOrder = DDI.getSDNodeOrder(); MDNode *Variable = DI->getVariable(); uint64_t Offset = DI->getOffset(); SDDbgValue *SDV; if (Val.getNode()) { if (!EmitFuncArgumentDbgValue(V, Variable, Offset, Val)) { SDV = DAG.getDbgValue(Variable, Val.getNode(), Val.getResNo(), Offset, dl, DbgSDNodeOrder); DAG.AddDbgValue(SDV, Val.getNode(), false); } } else DEBUG(dbgs() << "Dropping debug info for " << DI); DanglingDebugInfoMap[V] = DanglingDebugInfo(); } } /// getValue - Return an SDValue for the given Value. SDValue SelectionDAGBuilder::getValue(const Value *V) { // If we already have an SDValue for this value, use it. It's important // to do this first, so that we don't create a CopyFromReg if we already // have a regular SDValue. SDValue &N = NodeMap[V]; if (N.getNode()) return N; // If there's a virtual register allocated and initialized for this // value, use it. DenseMap::iterator It = FuncInfo.ValueMap.find(V); if (It != FuncInfo.ValueMap.end()) { unsigned InReg = It->second; RegsForValue RFV(*DAG.getContext(), TLI, InReg, V->getType()); SDValue Chain = DAG.getEntryNode(); N = RFV.getCopyFromRegs(DAG, FuncInfo, getCurDebugLoc(), Chain, NULL); resolveDanglingDebugInfo(V, N); return N; } // Otherwise create a new SDValue and remember it. SDValue Val = getValueImpl(V); NodeMap[V] = Val; resolveDanglingDebugInfo(V, Val); return Val; } /// getNonRegisterValue - Return an SDValue for the given Value, but /// don't look in FuncInfo.ValueMap for a virtual register. SDValue SelectionDAGBuilder::getNonRegisterValue(const Value *V) { // If we already have an SDValue for this value, use it. SDValue &N = NodeMap[V]; if (N.getNode()) return N; // Otherwise create a new SDValue and remember it. SDValue Val = getValueImpl(V); NodeMap[V] = Val; resolveDanglingDebugInfo(V, Val); return Val; } /// getValueImpl - Helper function for getValue and getNonRegisterValue. /// Create an SDValue for the given value. SDValue SelectionDAGBuilder::getValueImpl(const Value *V) { if (const Constant *C = dyn_cast(V)) { EVT VT = TLI.getValueType(V->getType(), true); if (const ConstantInt *CI = dyn_cast(C)) return DAG.getConstant(*CI, VT); if (const GlobalValue *GV = dyn_cast(C)) return DAG.getGlobalAddress(GV, getCurDebugLoc(), VT); if (isa(C)) return DAG.getConstant(0, TLI.getPointerTy()); if (const ConstantFP *CFP = dyn_cast(C)) return DAG.getConstantFP(*CFP, VT); if (isa(C) && !V->getType()->isAggregateType()) return DAG.getUNDEF(VT); if (const ConstantExpr *CE = dyn_cast(C)) { visit(CE->getOpcode(), *CE); SDValue N1 = NodeMap[V]; assert(N1.getNode() && "visit didn't populate the NodeMap!"); return N1; } if (isa(C) || isa(C)) { SmallVector Constants; for (User::const_op_iterator OI = C->op_begin(), OE = C->op_end(); OI != OE; ++OI) { SDNode *Val = getValue(*OI).getNode(); // If the operand is an empty aggregate, there are no values. if (!Val) continue; // Add each leaf value from the operand to the Constants list // to form a flattened list of all the values. 
for (unsigned i = 0, e = Val->getNumValues(); i != e; ++i) Constants.push_back(SDValue(Val, i)); } return DAG.getMergeValues(&Constants[0], Constants.size(), getCurDebugLoc()); } if (C->getType()->isStructTy() || C->getType()->isArrayTy()) { assert((isa(C) || isa(C)) && "Unknown struct or array constant!"); SmallVector ValueVTs; ComputeValueVTs(TLI, C->getType(), ValueVTs); unsigned NumElts = ValueVTs.size(); if (NumElts == 0) return SDValue(); // empty struct SmallVector Constants(NumElts); for (unsigned i = 0; i != NumElts; ++i) { EVT EltVT = ValueVTs[i]; if (isa(C)) Constants[i] = DAG.getUNDEF(EltVT); else if (EltVT.isFloatingPoint()) Constants[i] = DAG.getConstantFP(0, EltVT); else Constants[i] = DAG.getConstant(0, EltVT); } return DAG.getMergeValues(&Constants[0], NumElts, getCurDebugLoc()); } if (const BlockAddress *BA = dyn_cast(C)) return DAG.getBlockAddress(BA, VT); VectorType *VecTy = cast(V->getType()); unsigned NumElements = VecTy->getNumElements(); // Now that we know the number and type of the elements, get that number of // elements into the Ops array based on what kind of constant it is. SmallVector Ops; if (const ConstantVector *CP = dyn_cast(C)) { for (unsigned i = 0; i != NumElements; ++i) Ops.push_back(getValue(CP->getOperand(i))); } else { assert(isa(C) && "Unknown vector constant!"); EVT EltVT = TLI.getValueType(VecTy->getElementType()); SDValue Op; if (EltVT.isFloatingPoint()) Op = DAG.getConstantFP(0, EltVT); else Op = DAG.getConstant(0, EltVT); Ops.assign(NumElements, Op); } // Create a BUILD_VECTOR node. return NodeMap[V] = DAG.getNode(ISD::BUILD_VECTOR, getCurDebugLoc(), VT, &Ops[0], Ops.size()); } // If this is a static alloca, generate it as the frameindex instead of // computation. if (const AllocaInst *AI = dyn_cast(V)) { DenseMap::iterator SI = FuncInfo.StaticAllocaMap.find(AI); if (SI != FuncInfo.StaticAllocaMap.end()) return DAG.getFrameIndex(SI->second, TLI.getPointerTy()); } // If this is an instruction which fast-isel has deferred, select it now. if (const Instruction *Inst = dyn_cast(V)) { unsigned InReg = FuncInfo.InitializeRegForValue(Inst); RegsForValue RFV(*DAG.getContext(), TLI, InReg, Inst->getType()); SDValue Chain = DAG.getEntryNode(); return RFV.getCopyFromRegs(DAG, FuncInfo, getCurDebugLoc(), Chain, NULL); } llvm_unreachable("Can't get register for value!"); return SDValue(); } void SelectionDAGBuilder::visitRet(const ReturnInst &I) { SDValue Chain = getControlRoot(); SmallVector Outs; SmallVector OutVals; if (!FuncInfo.CanLowerReturn) { unsigned DemoteReg = FuncInfo.DemoteRegister; const Function *F = I.getParent()->getParent(); // Emit a store of the return value through the virtual register. // Leave Outs empty so that LowerReturn won't try to load return // registers the usual way. SmallVector PtrValueVTs; ComputeValueVTs(TLI, PointerType::getUnqual(F->getReturnType()), PtrValueVTs); SDValue RetPtr = DAG.getRegister(DemoteReg, PtrValueVTs[0]); SDValue RetOp = getValue(I.getOperand(0)); SmallVector ValueVTs; SmallVector Offsets; ComputeValueVTs(TLI, I.getOperand(0)->getType(), ValueVTs, &Offsets); unsigned NumValues = ValueVTs.size(); SmallVector Chains(NumValues); for (unsigned i = 0; i != NumValues; ++i) { SDValue Add = DAG.getNode(ISD::ADD, getCurDebugLoc(), RetPtr.getValueType(), RetPtr, DAG.getIntPtrConstant(Offsets[i])); Chains[i] = DAG.getStore(Chain, getCurDebugLoc(), SDValue(RetOp.getNode(), RetOp.getResNo() + i), // FIXME: better loc info would be nice. 
Add, MachinePointerInfo(), false, false, 0); } Chain = DAG.getNode(ISD::TokenFactor, getCurDebugLoc(), MVT::Other, &Chains[0], NumValues); } else if (I.getNumOperands() != 0) { SmallVector ValueVTs; ComputeValueVTs(TLI, I.getOperand(0)->getType(), ValueVTs); unsigned NumValues = ValueVTs.size(); if (NumValues) { SDValue RetOp = getValue(I.getOperand(0)); for (unsigned j = 0, f = NumValues; j != f; ++j) { EVT VT = ValueVTs[j]; ISD::NodeType ExtendKind = ISD::ANY_EXTEND; const Function *F = I.getParent()->getParent(); if (F->paramHasAttr(0, Attribute::SExt)) ExtendKind = ISD::SIGN_EXTEND; else if (F->paramHasAttr(0, Attribute::ZExt)) ExtendKind = ISD::ZERO_EXTEND; if (ExtendKind != ISD::ANY_EXTEND && VT.isInteger()) VT = TLI.getTypeForExtArgOrReturn(*DAG.getContext(), VT, ExtendKind); unsigned NumParts = TLI.getNumRegisters(*DAG.getContext(), VT); EVT PartVT = TLI.getRegisterType(*DAG.getContext(), VT); SmallVector Parts(NumParts); getCopyToParts(DAG, getCurDebugLoc(), SDValue(RetOp.getNode(), RetOp.getResNo() + j), &Parts[0], NumParts, PartVT, ExtendKind); // 'inreg' on function refers to return value ISD::ArgFlagsTy Flags = ISD::ArgFlagsTy(); if (F->paramHasAttr(0, Attribute::InReg)) Flags.setInReg(); // Propagate extension type if any if (ExtendKind == ISD::SIGN_EXTEND) Flags.setSExt(); else if (ExtendKind == ISD::ZERO_EXTEND) Flags.setZExt(); for (unsigned i = 0; i < NumParts; ++i) { Outs.push_back(ISD::OutputArg(Flags, Parts[i].getValueType(), /*isfixed=*/true)); OutVals.push_back(Parts[i]); } } } } bool isVarArg = DAG.getMachineFunction().getFunction()->isVarArg(); CallingConv::ID CallConv = DAG.getMachineFunction().getFunction()->getCallingConv(); Chain = TLI.LowerReturn(Chain, CallConv, isVarArg, Outs, OutVals, getCurDebugLoc(), DAG); // Verify that the target's LowerReturn behaved as expected. assert(Chain.getNode() && Chain.getValueType() == MVT::Other && "LowerReturn didn't return a valid chain!"); // Update the DAG with the new chain value resulting from return lowering. DAG.setRoot(Chain); } /// CopyToExportRegsIfNeeded - If the given value has virtual registers /// created for it, emit nodes to copy the value into the virtual /// registers. void SelectionDAGBuilder::CopyToExportRegsIfNeeded(const Value *V) { // Skip empty types if (V->getType()->isEmptyTy()) return; DenseMap::iterator VMI = FuncInfo.ValueMap.find(V); if (VMI != FuncInfo.ValueMap.end()) { assert(!V->use_empty() && "Unused value assigned virtual registers!"); CopyValueToVirtualRegister(V, VMI->second); } } /// ExportFromCurrentBlock - If this condition isn't known to be exported from /// the current basic block, add it to ValueMap now so that we'll get a /// CopyTo/FromReg. void SelectionDAGBuilder::ExportFromCurrentBlock(const Value *V) { // No need to export constants. if (!isa(V) && !isa(V)) return; // Already exported? if (FuncInfo.isExportedInst(V)) return; unsigned Reg = FuncInfo.InitializeRegForValue(V); CopyValueToVirtualRegister(V, Reg); } bool SelectionDAGBuilder::isExportableFromCurrentBlock(const Value *V, const BasicBlock *FromBB) { // The operands of the setcc have to be in this block. We don't know // how to export them from some other block. if (const Instruction *VI = dyn_cast(V)) { // Can export from current BB. if (VI->getParent() == FromBB) return true; // Is already exported, noop. return FuncInfo.isExportedInst(V); } // If this is an argument, we can export it if the BB is the entry block or // if it is already exported. 
if (isa(V)) { if (FromBB == &FromBB->getParent()->getEntryBlock()) return true; // Otherwise, can only export this if it is already exported. return FuncInfo.isExportedInst(V); } // Otherwise, constants can always be exported. return true; } /// Return branch probability calculated by BranchProbabilityInfo for IR blocks. uint32_t SelectionDAGBuilder::getEdgeWeight(MachineBasicBlock *Src, MachineBasicBlock *Dst) { BranchProbabilityInfo *BPI = FuncInfo.BPI; if (!BPI) return 0; const BasicBlock *SrcBB = Src->getBasicBlock(); const BasicBlock *DstBB = Dst->getBasicBlock(); return BPI->getEdgeWeight(SrcBB, DstBB); } void SelectionDAGBuilder:: addSuccessorWithWeight(MachineBasicBlock *Src, MachineBasicBlock *Dst, uint32_t Weight /* = 0 */) { if (!Weight) Weight = getEdgeWeight(Src, Dst); Src->addSuccessor(Dst, Weight); } static bool InBlock(const Value *V, const BasicBlock *BB) { if (const Instruction *I = dyn_cast(V)) return I->getParent() == BB; return true; } /// EmitBranchForMergedCondition - Helper method for FindMergedConditions. /// This function emits a branch and is used at the leaves of an OR or an /// AND operator tree. /// void SelectionDAGBuilder::EmitBranchForMergedCondition(const Value *Cond, MachineBasicBlock *TBB, MachineBasicBlock *FBB, MachineBasicBlock *CurBB, MachineBasicBlock *SwitchBB) { const BasicBlock *BB = CurBB->getBasicBlock(); // If the leaf of the tree is a comparison, merge the condition into // the caseblock. if (const CmpInst *BOp = dyn_cast(Cond)) { // The operands of the cmp have to be in this block. We don't know // how to export them from some other block. If this is the first block // of the sequence, no exporting is needed. if (CurBB == SwitchBB || (isExportableFromCurrentBlock(BOp->getOperand(0), BB) && isExportableFromCurrentBlock(BOp->getOperand(1), BB))) { ISD::CondCode Condition; if (const ICmpInst *IC = dyn_cast(Cond)) { Condition = getICmpCondCode(IC->getPredicate()); } else if (const FCmpInst *FC = dyn_cast(Cond)) { Condition = getFCmpCondCode(FC->getPredicate()); } else { Condition = ISD::SETEQ; // silence warning. llvm_unreachable("Unknown compare instruction"); } CaseBlock CB(Condition, BOp->getOperand(0), BOp->getOperand(1), NULL, TBB, FBB, CurBB); SwitchCases.push_back(CB); return; } } // Create a CaseBlock record representing this branch. CaseBlock CB(ISD::SETEQ, Cond, ConstantInt::getTrue(*DAG.getContext()), NULL, TBB, FBB, CurBB); SwitchCases.push_back(CB); } /// FindMergedConditions - If Cond is an expression like void SelectionDAGBuilder::FindMergedConditions(const Value *Cond, MachineBasicBlock *TBB, MachineBasicBlock *FBB, MachineBasicBlock *CurBB, MachineBasicBlock *SwitchBB, unsigned Opc) { // If this node is not part of the or/and tree, emit it as a branch. const Instruction *BOp = dyn_cast(Cond); if (!BOp || !(isa(BOp) || isa(BOp)) || (unsigned)BOp->getOpcode() != Opc || !BOp->hasOneUse() || BOp->getParent() != CurBB->getBasicBlock() || !InBlock(BOp->getOperand(0), CurBB->getBasicBlock()) || !InBlock(BOp->getOperand(1), CurBB->getBasicBlock())) { EmitBranchForMergedCondition(Cond, TBB, FBB, CurBB, SwitchBB); return; } // Create TmpBB after CurBB. MachineFunction::iterator BBI = CurBB; MachineFunction &MF = DAG.getMachineFunction(); MachineBasicBlock *TmpBB = MF.CreateMachineBasicBlock(CurBB->getBasicBlock()); CurBB->getParent()->insert(++BBI, TmpBB); if (Opc == Instruction::Or) { // Codegen X | Y as: // jmp_if_X TBB // jmp TmpBB // TmpBB: // jmp_if_Y TBB // jmp FBB // // Emit the LHS condition. 
FindMergedConditions(BOp->getOperand(0), TBB, TmpBB, CurBB, SwitchBB, Opc); // Emit the RHS condition into TmpBB. FindMergedConditions(BOp->getOperand(1), TBB, FBB, TmpBB, SwitchBB, Opc); } else { assert(Opc == Instruction::And && "Unknown merge op!"); // Codegen X & Y as: // jmp_if_X TmpBB // jmp FBB // TmpBB: // jmp_if_Y TBB // jmp FBB // // This requires creation of TmpBB after CurBB. // Emit the LHS condition. FindMergedConditions(BOp->getOperand(0), TmpBB, FBB, CurBB, SwitchBB, Opc); // Emit the RHS condition into TmpBB. FindMergedConditions(BOp->getOperand(1), TBB, FBB, TmpBB, SwitchBB, Opc); } } /// If the set of cases should be emitted as a series of branches, return true. /// If we should emit this as a bunch of and/or'd together conditions, return /// false. bool SelectionDAGBuilder::ShouldEmitAsBranches(const std::vector &Cases){ if (Cases.size() != 2) return true; // If this is two comparisons of the same values or'd or and'd together, they // will get folded into a single comparison, so don't emit two blocks. if ((Cases[0].CmpLHS == Cases[1].CmpLHS && Cases[0].CmpRHS == Cases[1].CmpRHS) || (Cases[0].CmpRHS == Cases[1].CmpLHS && Cases[0].CmpLHS == Cases[1].CmpRHS)) { return false; } // Handle: (X != null) | (Y != null) --> (X|Y) != 0 // Handle: (X == null) & (Y == null) --> (X|Y) == 0 if (Cases[0].CmpRHS == Cases[1].CmpRHS && Cases[0].CC == Cases[1].CC && isa(Cases[0].CmpRHS) && cast(Cases[0].CmpRHS)->isNullValue()) { if (Cases[0].CC == ISD::SETEQ && Cases[0].TrueBB == Cases[1].ThisBB) return false; if (Cases[0].CC == ISD::SETNE && Cases[0].FalseBB == Cases[1].ThisBB) return false; } return true; } void SelectionDAGBuilder::visitBr(const BranchInst &I) { MachineBasicBlock *BrMBB = FuncInfo.MBB; // Update machine-CFG edges. MachineBasicBlock *Succ0MBB = FuncInfo.MBBMap[I.getSuccessor(0)]; // Figure out which block is immediately after the current one. MachineBasicBlock *NextBlock = 0; MachineFunction::iterator BBI = BrMBB; if (++BBI != FuncInfo.MF->end()) NextBlock = BBI; if (I.isUnconditional()) { // Update machine-CFG edges. BrMBB->addSuccessor(Succ0MBB); // If this is not a fall-through branch, emit the branch. if (Succ0MBB != NextBlock) DAG.setRoot(DAG.getNode(ISD::BR, getCurDebugLoc(), MVT::Other, getControlRoot(), DAG.getBasicBlock(Succ0MBB))); return; } // If this condition is one of the special cases we handle, do special stuff // now. const Value *CondVal = I.getCondition(); MachineBasicBlock *Succ1MBB = FuncInfo.MBBMap[I.getSuccessor(1)]; // If this is a series of conditions that are or'd or and'd together, emit // this as a sequence of branches instead of setcc's with and/or operations. // As long as jumps are not expensive, this should improve performance. // For example, instead of something like: // cmp A, B // C = seteq // cmp D, E // F = setle // or C, F // jnz foo // Emit: // cmp A, B // je foo // cmp D, E // jle foo // if (const BinaryOperator *BOp = dyn_cast(CondVal)) { if (!TLI.isJumpExpensive() && BOp->hasOneUse() && (BOp->getOpcode() == Instruction::And || BOp->getOpcode() == Instruction::Or)) { FindMergedConditions(BOp, Succ0MBB, Succ1MBB, BrMBB, BrMBB, BOp->getOpcode()); // If the compares in later blocks need to use values not currently // exported from this block, export them now. This block should always // be the first entry. assert(SwitchCases[0].ThisBB == BrMBB && "Unexpected lowering!"); // Allow some cases to be rejected. 
if (ShouldEmitAsBranches(SwitchCases)) { for (unsigned i = 1, e = SwitchCases.size(); i != e; ++i) { ExportFromCurrentBlock(SwitchCases[i].CmpLHS); ExportFromCurrentBlock(SwitchCases[i].CmpRHS); } // Emit the branch for this block. visitSwitchCase(SwitchCases[0], BrMBB); SwitchCases.erase(SwitchCases.begin()); return; } // Okay, we decided not to do this, remove any inserted MBB's and clear // SwitchCases. for (unsigned i = 1, e = SwitchCases.size(); i != e; ++i) FuncInfo.MF->erase(SwitchCases[i].ThisBB); SwitchCases.clear(); } } // Create a CaseBlock record representing this branch. CaseBlock CB(ISD::SETEQ, CondVal, ConstantInt::getTrue(*DAG.getContext()), NULL, Succ0MBB, Succ1MBB, BrMBB); // Use visitSwitchCase to actually insert the fast branch sequence for this // cond branch. visitSwitchCase(CB, BrMBB); } /// visitSwitchCase - Emits the necessary code to represent a single node in /// the binary search tree resulting from lowering a switch instruction. void SelectionDAGBuilder::visitSwitchCase(CaseBlock &CB, MachineBasicBlock *SwitchBB) { SDValue Cond; SDValue CondLHS = getValue(CB.CmpLHS); DebugLoc dl = getCurDebugLoc(); // Build the setcc now. if (CB.CmpMHS == NULL) { // Fold "(X == true)" to X and "(X == false)" to !X to // handle common cases produced by branch lowering. if (CB.CmpRHS == ConstantInt::getTrue(*DAG.getContext()) && CB.CC == ISD::SETEQ) Cond = CondLHS; else if (CB.CmpRHS == ConstantInt::getFalse(*DAG.getContext()) && CB.CC == ISD::SETEQ) { SDValue True = DAG.getConstant(1, CondLHS.getValueType()); Cond = DAG.getNode(ISD::XOR, dl, CondLHS.getValueType(), CondLHS, True); } else Cond = DAG.getSetCC(dl, MVT::i1, CondLHS, getValue(CB.CmpRHS), CB.CC); } else { assert(CB.CC == ISD::SETLE && "Can handle only LE ranges now"); const APInt& Low = cast(CB.CmpLHS)->getValue(); const APInt& High = cast(CB.CmpRHS)->getValue(); SDValue CmpOp = getValue(CB.CmpMHS); EVT VT = CmpOp.getValueType(); if (cast(CB.CmpLHS)->isMinValue(true)) { Cond = DAG.getSetCC(dl, MVT::i1, CmpOp, DAG.getConstant(High, VT), ISD::SETLE); } else { SDValue SUB = DAG.getNode(ISD::SUB, dl, VT, CmpOp, DAG.getConstant(Low, VT)); Cond = DAG.getSetCC(dl, MVT::i1, SUB, DAG.getConstant(High-Low, VT), ISD::SETULE); } } // Update successor info addSuccessorWithWeight(SwitchBB, CB.TrueBB, CB.TrueWeight); addSuccessorWithWeight(SwitchBB, CB.FalseBB, CB.FalseWeight); // Set NextBlock to be the MBB immediately after the current one, if any. // This is used to avoid emitting unnecessary branches to the next block. MachineBasicBlock *NextBlock = 0; MachineFunction::iterator BBI = SwitchBB; if (++BBI != FuncInfo.MF->end()) NextBlock = BBI; // If the lhs block is the next block, invert the condition so that we can // fall through to the lhs instead of the rhs block. if (CB.TrueBB == NextBlock) { std::swap(CB.TrueBB, CB.FalseBB); SDValue True = DAG.getConstant(1, Cond.getValueType()); Cond = DAG.getNode(ISD::XOR, dl, Cond.getValueType(), Cond, True); } SDValue BrCond = DAG.getNode(ISD::BRCOND, dl, MVT::Other, getControlRoot(), Cond, DAG.getBasicBlock(CB.TrueBB)); // Insert the false branch. Do this even if it's a fall through branch, // this makes it easier to do DAG optimizations which require inverting // the branch condition. 
BrCond = DAG.getNode(ISD::BR, dl, MVT::Other, BrCond, DAG.getBasicBlock(CB.FalseBB)); DAG.setRoot(BrCond); } /// visitJumpTable - Emit JumpTable node in the current MBB void SelectionDAGBuilder::visitJumpTable(JumpTable &JT) { // Emit the code for the jump table assert(JT.Reg != -1U && "Should lower JT Header first!"); EVT PTy = TLI.getPointerTy(); SDValue Index = DAG.getCopyFromReg(getControlRoot(), getCurDebugLoc(), JT.Reg, PTy); SDValue Table = DAG.getJumpTable(JT.JTI, PTy); SDValue BrJumpTable = DAG.getNode(ISD::BR_JT, getCurDebugLoc(), MVT::Other, Index.getValue(1), Table, Index); DAG.setRoot(BrJumpTable); } /// visitJumpTableHeader - This function emits necessary code to produce index /// in the JumpTable from switch case. void SelectionDAGBuilder::visitJumpTableHeader(JumpTable &JT, JumpTableHeader &JTH, MachineBasicBlock *SwitchBB) { // Subtract the lowest switch case value from the value being switched on and // conditional branch to default mbb if the result is greater than the // difference between smallest and largest cases. SDValue SwitchOp = getValue(JTH.SValue); EVT VT = SwitchOp.getValueType(); SDValue Sub = DAG.getNode(ISD::SUB, getCurDebugLoc(), VT, SwitchOp, DAG.getConstant(JTH.First, VT)); // The SDNode we just created, which holds the value being switched on minus // the smallest case value, needs to be copied to a virtual register so it // can be used as an index into the jump table in a subsequent basic block. // This value may be smaller or larger than the target's pointer type, and // therefore require extension or truncating. SwitchOp = DAG.getZExtOrTrunc(Sub, getCurDebugLoc(), TLI.getPointerTy()); unsigned JumpTableReg = FuncInfo.CreateReg(TLI.getPointerTy()); SDValue CopyTo = DAG.getCopyToReg(getControlRoot(), getCurDebugLoc(), JumpTableReg, SwitchOp); JT.Reg = JumpTableReg; // Emit the range check for the jump table, and branch to the default block // for the switch statement if the value being switched on exceeds the largest // case in the switch. SDValue CMP = DAG.getSetCC(getCurDebugLoc(), TLI.getSetCCResultType(Sub.getValueType()), Sub, DAG.getConstant(JTH.Last-JTH.First,VT), ISD::SETUGT); // Set NextBlock to be the MBB immediately after the current one, if any. // This is used to avoid emitting unnecessary branches to the next block. MachineBasicBlock *NextBlock = 0; MachineFunction::iterator BBI = SwitchBB; if (++BBI != FuncInfo.MF->end()) NextBlock = BBI; SDValue BrCond = DAG.getNode(ISD::BRCOND, getCurDebugLoc(), MVT::Other, CopyTo, CMP, DAG.getBasicBlock(JT.Default)); if (JT.MBB != NextBlock) BrCond = DAG.getNode(ISD::BR, getCurDebugLoc(), MVT::Other, BrCond, DAG.getBasicBlock(JT.MBB)); DAG.setRoot(BrCond); } /// visitBitTestHeader - This function emits necessary code to produce value /// suitable for "bit tests" void SelectionDAGBuilder::visitBitTestHeader(BitTestBlock &B, MachineBasicBlock *SwitchBB) { // Subtract the minimum value SDValue SwitchOp = getValue(B.SValue); EVT VT = SwitchOp.getValueType(); SDValue Sub = DAG.getNode(ISD::SUB, getCurDebugLoc(), VT, SwitchOp, DAG.getConstant(B.First, VT)); // Check range SDValue RangeCmp = DAG.getSetCC(getCurDebugLoc(), TLI.getSetCCResultType(Sub.getValueType()), Sub, DAG.getConstant(B.Range, VT), ISD::SETUGT); // Determine the type of the test operands. bool UsePtrType = false; if (!TLI.isTypeLegal(VT)) UsePtrType = true; else { for (unsigned i = 0, e = B.Cases.size(); i != e; ++i) if (!isUIntN(VT.getSizeInBits(), B.Cases[i].Mask)) { // Switch table case range are encoded into series of masks. 
// Just use pointer type, it's guaranteed to fit. UsePtrType = true; break; } } if (UsePtrType) { VT = TLI.getPointerTy(); Sub = DAG.getZExtOrTrunc(Sub, getCurDebugLoc(), VT); } B.RegVT = VT; B.Reg = FuncInfo.CreateReg(VT); SDValue CopyTo = DAG.getCopyToReg(getControlRoot(), getCurDebugLoc(), B.Reg, Sub); // Set NextBlock to be the MBB immediately after the current one, if any. // This is used to avoid emitting unnecessary branches to the next block. MachineBasicBlock *NextBlock = 0; MachineFunction::iterator BBI = SwitchBB; if (++BBI != FuncInfo.MF->end()) NextBlock = BBI; MachineBasicBlock* MBB = B.Cases[0].ThisBB; addSuccessorWithWeight(SwitchBB, B.Default); addSuccessorWithWeight(SwitchBB, MBB); SDValue BrRange = DAG.getNode(ISD::BRCOND, getCurDebugLoc(), MVT::Other, CopyTo, RangeCmp, DAG.getBasicBlock(B.Default)); if (MBB != NextBlock) BrRange = DAG.getNode(ISD::BR, getCurDebugLoc(), MVT::Other, CopyTo, DAG.getBasicBlock(MBB)); DAG.setRoot(BrRange); } /// visitBitTestCase - this function produces one "bit test" void SelectionDAGBuilder::visitBitTestCase(BitTestBlock &BB, MachineBasicBlock* NextMBB, unsigned Reg, BitTestCase &B, MachineBasicBlock *SwitchBB) { EVT VT = BB.RegVT; SDValue ShiftOp = DAG.getCopyFromReg(getControlRoot(), getCurDebugLoc(), Reg, VT); SDValue Cmp; unsigned PopCount = CountPopulation_64(B.Mask); if (PopCount == 1) { // Testing for a single bit; just compare the shift count with what it // would need to be to shift a 1 bit in that position. Cmp = DAG.getSetCC(getCurDebugLoc(), TLI.getSetCCResultType(VT), ShiftOp, DAG.getConstant(CountTrailingZeros_64(B.Mask), VT), ISD::SETEQ); } else if (PopCount == BB.Range) { // There is only one zero bit in the range, test for it directly. Cmp = DAG.getSetCC(getCurDebugLoc(), TLI.getSetCCResultType(VT), ShiftOp, DAG.getConstant(CountTrailingOnes_64(B.Mask), VT), ISD::SETNE); } else { // Make desired shift SDValue SwitchVal = DAG.getNode(ISD::SHL, getCurDebugLoc(), VT, DAG.getConstant(1, VT), ShiftOp); // Emit bit tests and jumps SDValue AndOp = DAG.getNode(ISD::AND, getCurDebugLoc(), VT, SwitchVal, DAG.getConstant(B.Mask, VT)); Cmp = DAG.getSetCC(getCurDebugLoc(), TLI.getSetCCResultType(VT), AndOp, DAG.getConstant(0, VT), ISD::SETNE); } addSuccessorWithWeight(SwitchBB, B.TargetBB); addSuccessorWithWeight(SwitchBB, NextMBB); SDValue BrAnd = DAG.getNode(ISD::BRCOND, getCurDebugLoc(), MVT::Other, getControlRoot(), Cmp, DAG.getBasicBlock(B.TargetBB)); // Set NextBlock to be the MBB immediately after the current one, if any. // This is used to avoid emitting unnecessary branches to the next block. MachineBasicBlock *NextBlock = 0; MachineFunction::iterator BBI = SwitchBB; if (++BBI != FuncInfo.MF->end()) NextBlock = BBI; if (NextMBB != NextBlock) BrAnd = DAG.getNode(ISD::BR, getCurDebugLoc(), MVT::Other, BrAnd, DAG.getBasicBlock(NextMBB)); DAG.setRoot(BrAnd); } void SelectionDAGBuilder::visitInvoke(const InvokeInst &I) { MachineBasicBlock *InvokeMBB = FuncInfo.MBB; // Retrieve successors. MachineBasicBlock *Return = FuncInfo.MBBMap[I.getSuccessor(0)]; MachineBasicBlock *LandingPad = FuncInfo.MBBMap[I.getSuccessor(1)]; const Value *Callee(I.getCalledValue()); if (isa(Callee)) visitInlineAsm(&I); else LowerCallTo(&I, getValue(Callee), false, LandingPad); // If the value of the invoke is used outside of its defining block, make it // available as a virtual register. CopyToExportRegsIfNeeded(&I); // Update successor info InvokeMBB->addSuccessor(Return); InvokeMBB->addSuccessor(LandingPad); // Drop into normal successor. 
DAG.setRoot(DAG.getNode(ISD::BR, getCurDebugLoc(), MVT::Other, getControlRoot(), DAG.getBasicBlock(Return))); } void SelectionDAGBuilder::visitUnwind(const UnwindInst &I) { } void SelectionDAGBuilder::visitResume(const ResumeInst &RI) { llvm_unreachable("SelectionDAGBuilder shouldn't visit resume instructions!"); } void SelectionDAGBuilder::visitLandingPad(const LandingPadInst &LP) { assert(FuncInfo.MBB->isLandingPad() && "Call to landingpad not in landing pad!"); MachineBasicBlock *MBB = FuncInfo.MBB; MachineModuleInfo &MMI = DAG.getMachineFunction().getMMI(); AddLandingPadInfo(LP, MMI, MBB); SmallVector ValueVTs; ComputeValueVTs(TLI, LP.getType(), ValueVTs); // Insert the EXCEPTIONADDR instruction. assert(FuncInfo.MBB->isLandingPad() && "Call to eh.exception not in landing pad!"); SDVTList VTs = DAG.getVTList(TLI.getPointerTy(), MVT::Other); SDValue Ops[2]; Ops[0] = DAG.getRoot(); SDValue Op1 = DAG.getNode(ISD::EXCEPTIONADDR, getCurDebugLoc(), VTs, Ops, 1); SDValue Chain = Op1.getValue(1); // Insert the EHSELECTION instruction. VTs = DAG.getVTList(TLI.getPointerTy(), MVT::Other); Ops[0] = Op1; Ops[1] = Chain; SDValue Op2 = DAG.getNode(ISD::EHSELECTION, getCurDebugLoc(), VTs, Ops, 2); Chain = Op2.getValue(1); Op2 = DAG.getSExtOrTrunc(Op2, getCurDebugLoc(), MVT::i32); Ops[0] = Op1; Ops[1] = Op2; SDValue Res = DAG.getNode(ISD::MERGE_VALUES, getCurDebugLoc(), DAG.getVTList(&ValueVTs[0], ValueVTs.size()), &Ops[0], 2); std::pair RetPair = std::make_pair(Res, Chain); setValue(&LP, RetPair.first); DAG.setRoot(RetPair.second); } /// handleSmallSwitchCaseRange - Emit a series of specific tests (suitable for /// small case ranges). bool SelectionDAGBuilder::handleSmallSwitchRange(CaseRec& CR, CaseRecVector& WorkList, const Value* SV, MachineBasicBlock *Default, MachineBasicBlock *SwitchBB) { Case& BackCase = *(CR.Range.second-1); // Size is the number of Cases represented by this range. size_t Size = CR.Range.second - CR.Range.first; if (Size > 3) return false; // Get the MachineFunction which holds the current MBB. This is used when // inserting any additional MBBs necessary to represent the switch. MachineFunction *CurMF = FuncInfo.MF; // Figure out which block is immediately after the current one. MachineBasicBlock *NextBlock = 0; MachineFunction::iterator BBI = CR.CaseBB; if (++BBI != FuncInfo.MF->end()) NextBlock = BBI; // If any two of the cases has the same destination, and if one value // is the same as the other, but has one bit unset that the other has set, // use bit manipulation to do two compares at once. For example: // "if (X == 6 || X == 4)" -> "if ((X|2) == 6)" // TODO: This could be extended to merge any 2 cases in switches with 3 cases. // TODO: Handle cases where CR.CaseBB != SwitchBB. if (Size == 2 && CR.CaseBB == SwitchBB) { Case &Small = *CR.Range.first; Case &Big = *(CR.Range.second-1); if (Small.Low == Small.High && Big.Low == Big.High && Small.BB == Big.BB) { const APInt& SmallValue = cast(Small.Low)->getValue(); const APInt& BigValue = cast(Big.Low)->getValue(); // Check that there is only one bit different. if (BigValue.countPopulation() == SmallValue.countPopulation() + 1 && (SmallValue | BigValue) == BigValue) { // Isolate the common bit. 
APInt CommonBit = BigValue & ~SmallValue; assert((SmallValue | CommonBit) == BigValue && CommonBit.countPopulation() == 1 && "Not a common bit?"); SDValue CondLHS = getValue(SV); EVT VT = CondLHS.getValueType(); DebugLoc DL = getCurDebugLoc(); SDValue Or = DAG.getNode(ISD::OR, DL, VT, CondLHS, DAG.getConstant(CommonBit, VT)); SDValue Cond = DAG.getSetCC(DL, MVT::i1, Or, DAG.getConstant(BigValue, VT), ISD::SETEQ); // Update successor info. addSuccessorWithWeight(SwitchBB, Small.BB); addSuccessorWithWeight(SwitchBB, Default); // Insert the true branch. SDValue BrCond = DAG.getNode(ISD::BRCOND, DL, MVT::Other, getControlRoot(), Cond, DAG.getBasicBlock(Small.BB)); // Insert the false branch. BrCond = DAG.getNode(ISD::BR, DL, MVT::Other, BrCond, DAG.getBasicBlock(Default)); DAG.setRoot(BrCond); return true; } } } // Rearrange the case blocks so that the last one falls through if possible. if (NextBlock && Default != NextBlock && BackCase.BB != NextBlock) { // The last case block won't fall through into 'NextBlock' if we emit the // branches in this order. See if rearranging a case value would help. for (CaseItr I = CR.Range.first, E = CR.Range.second-1; I != E; ++I) { if (I->BB == NextBlock) { std::swap(*I, BackCase); break; } } } // Create a CaseBlock record representing a conditional branch to // the Case's target mbb if the value being switched on SV is equal // to C. MachineBasicBlock *CurBlock = CR.CaseBB; for (CaseItr I = CR.Range.first, E = CR.Range.second; I != E; ++I) { MachineBasicBlock *FallThrough; if (I != E-1) { FallThrough = CurMF->CreateMachineBasicBlock(CurBlock->getBasicBlock()); CurMF->insert(BBI, FallThrough); // Put SV in a virtual register to make it available from the new blocks. ExportFromCurrentBlock(SV); } else { // If the last case doesn't match, go to the default block. FallThrough = Default; } const Value *RHS, *LHS, *MHS; ISD::CondCode CC; if (I->High == I->Low) { // This is just small small case range :) containing exactly 1 case CC = ISD::SETEQ; LHS = SV; RHS = I->High; MHS = NULL; } else { CC = ISD::SETLE; LHS = I->Low; MHS = SV; RHS = I->High; } uint32_t ExtraWeight = I->ExtraWeight; CaseBlock CB(CC, LHS, RHS, MHS, /* truebb */ I->BB, /* falsebb */ FallThrough, /* me */ CurBlock, /* trueweight */ ExtraWeight / 2, /* falseweight */ ExtraWeight / 2); // If emitting the first comparison, just call visitSwitchCase to emit the // code into the current block. Otherwise, push the CaseBlock onto the // vector to be later processed by SDISel, and insert the node's MBB // before the next MBB. 
    if (CurBlock == SwitchBB)
      visitSwitchCase(CB, SwitchBB);
    else
      SwitchCases.push_back(CB);

    CurBlock = FallThrough;
  }

  return true;
}

static inline bool areJTsAllowed(const TargetLowering &TLI) {
  return !DisableJumpTables &&
          (TLI.isOperationLegalOrCustom(ISD::BR_JT, MVT::Other) ||
           TLI.isOperationLegalOrCustom(ISD::BRIND, MVT::Other));
}

static APInt ComputeRange(const APInt &First, const APInt &Last) {
  uint32_t BitWidth = std::max(Last.getBitWidth(), First.getBitWidth()) + 1;
  APInt LastExt = Last.sext(BitWidth), FirstExt = First.sext(BitWidth);
  return (LastExt - FirstExt + 1ULL);
}

/// handleJTSwitchCase - Emit jumptable for current switch case range
bool SelectionDAGBuilder::handleJTSwitchCase(CaseRec &CR,
                                             CaseRecVector &WorkList,
                                             const Value *SV,
                                             MachineBasicBlock *Default,
                                             MachineBasicBlock *SwitchBB) {
  Case& FrontCase = *CR.Range.first;
  Case& BackCase  = *(CR.Range.second-1);

  const APInt &First = cast<ConstantInt>(FrontCase.Low)->getValue();
  const APInt &Last  = cast<ConstantInt>(BackCase.High)->getValue();

  APInt TSize(First.getBitWidth(), 0);
  for (CaseItr I = CR.Range.first, E = CR.Range.second; I != E; ++I)
    TSize += I->size();

  if (!areJTsAllowed(TLI) || TSize.ult(4))
    return false;

  APInt Range = ComputeRange(First, Last);
-  double Density = TSize.roundToDouble() / Range.roundToDouble();
-  if (Density < 0.4)
+  // The density is TSize / Range. Require at least 40%.
+  // It should not be possible for IntTSize to saturate for sane code, but make
+  // sure we handle Range saturation correctly.
+  uint64_t IntRange = Range.getLimitedValue(UINT64_MAX/10);
+  uint64_t IntTSize = TSize.getLimitedValue(UINT64_MAX/10);
+  if (IntTSize * 10 < IntRange * 4)
    return false;

  DEBUG(dbgs() << "Lowering jump table\n"
               << "First entry: " << First << ". Last entry: " << Last << '\n'
-               << "Range: " << Range
-               << ". Size: " << TSize << ". Density: " << Density << "\n\n");
+               << "Range: " << Range << ". Size: " << TSize << ".\n\n");

  // Get the MachineFunction which holds the current MBB. This is used when
  // inserting any additional MBBs necessary to represent the switch.
  MachineFunction *CurMF = FuncInfo.MF;

  // Figure out which block is immediately after the current one.
  MachineFunction::iterator BBI = CR.CaseBB;
  ++BBI;

  const BasicBlock *LLVMBB = CR.CaseBB->getBasicBlock();

  // Create a new basic block to hold the code for loading the address
  // of the jump table, and jumping to it. Update successor information;
  // we will either branch to the default case for the switch, or the jump
  // table.
  MachineBasicBlock *JumpTableBB = CurMF->CreateMachineBasicBlock(LLVMBB);
  CurMF->insert(BBI, JumpTableBB);

  addSuccessorWithWeight(CR.CaseBB, Default);
  addSuccessorWithWeight(CR.CaseBB, JumpTableBB);

  // Build a vector of destination BBs, corresponding to each target
  // of the jump table. If the value of the jump table slot corresponds to
  // a case statement, push the case's BB onto the vector, otherwise, push
  // the default BB.
  std::vector<MachineBasicBlock*> DestBBs;
  APInt TEI = First;
  for (CaseItr I = CR.Range.first, E = CR.Range.second; I != E; ++TEI) {
    const APInt &Low = cast<ConstantInt>(I->Low)->getValue();
    const APInt &High = cast<ConstantInt>(I->High)->getValue();

    if (Low.sle(TEI) && TEI.sle(High)) {
      DestBBs.push_back(I->BB);
      if (TEI==High)
        ++I;
    } else {
      DestBBs.push_back(Default);
    }
  }

  // Update successor info. Add one edge to each unique successor.
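  // (Illustrative note on the new density check in handleJTSwitchCase above;
  // the numbers are hypothetical, not taken from this change.)  The integer
  // test "IntTSize * 10 < IntRange * 4" rejects the jump table when
  // TSize / Range falls below 40%, matching the old "Density < 0.4" test:
  // with TSize = 40 cases spread over a Range of 100 values, IntTSize * 10
  // == 400 and IntRange * 4 == 400, so the table is still emitted.  Clamping
  // both operands with getLimitedValue(UINT64_MAX/10) keeps the
  // multiplications by 10 and 4 from overflowing a uint64_t even if Range
  // saturates for a pathological switch.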
BitVector SuccsHandled(CR.CaseBB->getParent()->getNumBlockIDs()); for (std::vector::iterator I = DestBBs.begin(), E = DestBBs.end(); I != E; ++I) { if (!SuccsHandled[(*I)->getNumber()]) { SuccsHandled[(*I)->getNumber()] = true; addSuccessorWithWeight(JumpTableBB, *I); } } // Create a jump table index for this jump table. unsigned JTEncoding = TLI.getJumpTableEncoding(); unsigned JTI = CurMF->getOrCreateJumpTableInfo(JTEncoding) ->createJumpTableIndex(DestBBs); // Set the jump table information so that we can codegen it as a second // MachineBasicBlock JumpTable JT(-1U, JTI, JumpTableBB, Default); JumpTableHeader JTH(First, Last, SV, CR.CaseBB, (CR.CaseBB == SwitchBB)); if (CR.CaseBB == SwitchBB) visitJumpTableHeader(JT, JTH, SwitchBB); JTCases.push_back(JumpTableBlock(JTH, JT)); return true; } /// handleBTSplitSwitchCase - emit comparison and split binary search tree into /// 2 subtrees. bool SelectionDAGBuilder::handleBTSplitSwitchCase(CaseRec& CR, CaseRecVector& WorkList, const Value* SV, MachineBasicBlock *Default, MachineBasicBlock *SwitchBB) { // Get the MachineFunction which holds the current MBB. This is used when // inserting any additional MBBs necessary to represent the switch. MachineFunction *CurMF = FuncInfo.MF; // Figure out which block is immediately after the current one. MachineFunction::iterator BBI = CR.CaseBB; ++BBI; Case& FrontCase = *CR.Range.first; Case& BackCase = *(CR.Range.second-1); const BasicBlock *LLVMBB = CR.CaseBB->getBasicBlock(); // Size is the number of Cases represented by this range. unsigned Size = CR.Range.second - CR.Range.first; const APInt &First = cast(FrontCase.Low)->getValue(); const APInt &Last = cast(BackCase.High)->getValue(); double FMetric = 0; CaseItr Pivot = CR.Range.first + Size/2; // Select optimal pivot, maximizing sum density of LHS and RHS. This will // (heuristically) allow us to emit JumpTable's later. APInt TSize(First.getBitWidth(), 0); for (CaseItr I = CR.Range.first, E = CR.Range.second; I!=E; ++I) TSize += I->size(); APInt LSize = FrontCase.size(); APInt RSize = TSize-LSize; DEBUG(dbgs() << "Selecting best pivot: \n" << "First: " << First << ", Last: " << Last <<'\n' << "LSize: " << LSize << ", RSize: " << RSize << '\n'); for (CaseItr I = CR.Range.first, J=I+1, E = CR.Range.second; J!=E; ++I, ++J) { const APInt &LEnd = cast(I->High)->getValue(); const APInt &RBegin = cast(J->Low)->getValue(); APInt Range = ComputeRange(LEnd, RBegin); assert((Range - 2ULL).isNonNegative() && "Invalid case distance"); // Use volatile double here to avoid excess precision issues on some hosts, // e.g. that use 80-bit X87 registers. volatile double LDensity = (double)LSize.roundToDouble() / (LEnd - First + 1ULL).roundToDouble(); volatile double RDensity = (double)RSize.roundToDouble() / (Last - RBegin + 1ULL).roundToDouble(); double Metric = Range.logBase2()*(LDensity+RDensity); // Should always split in some non-trivial place DEBUG(dbgs() <<"=>Step\n" << "LEnd: " << LEnd << ", RBegin: " << RBegin << '\n' << "LDensity: " << LDensity << ", RDensity: " << RDensity << '\n' << "Metric: " << Metric << '\n'); if (FMetric < Metric) { Pivot = J; FMetric = Metric; DEBUG(dbgs() << "Current metric set to: " << FMetric << '\n'); } LSize += J->size(); RSize -= J->size(); } if (areJTsAllowed(TLI)) { // If our case is dense we *really* should handle it earlier! 
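  // (Illustrative note on the pivot-selection metric computed in the loop
  // above; the case values are hypothetical.)  Metric = log2(gap) *
  // (LDensity + RDensity) favors splitting the case list across a large hole
  // in the value space while keeping both halves dense.  For the cases
  // {0,1,2,3,100,101,102,103}, splitting between 3 and 100 gives
  // LDensity = RDensity = 1.0 and a gap of 98 values (floor log2 = 6), for a
  // metric of about 12; any split inside either dense run scores far lower,
  // so the binary tree branches on the big gap first and each half can later
  // be emitted as a dense jump table.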
assert((FMetric > 0) && "Should handle dense range earlier!"); } else { Pivot = CR.Range.first + Size/2; } CaseRange LHSR(CR.Range.first, Pivot); CaseRange RHSR(Pivot, CR.Range.second); Constant *C = Pivot->Low; MachineBasicBlock *FalseBB = 0, *TrueBB = 0; // We know that we branch to the LHS if the Value being switched on is // less than the Pivot value, C. We use this to optimize our binary // tree a bit, by recognizing that if SV is greater than or equal to the // LHS's Case Value, and that Case Value is exactly one less than the // Pivot's Value, then we can branch directly to the LHS's Target, // rather than creating a leaf node for it. if ((LHSR.second - LHSR.first) == 1 && LHSR.first->High == CR.GE && cast(C)->getValue() == (cast(CR.GE)->getValue() + 1LL)) { TrueBB = LHSR.first->BB; } else { TrueBB = CurMF->CreateMachineBasicBlock(LLVMBB); CurMF->insert(BBI, TrueBB); WorkList.push_back(CaseRec(TrueBB, C, CR.GE, LHSR)); // Put SV in a virtual register to make it available from the new blocks. ExportFromCurrentBlock(SV); } // Similar to the optimization above, if the Value being switched on is // known to be less than the Constant CR.LT, and the current Case Value // is CR.LT - 1, then we can branch directly to the target block for // the current Case Value, rather than emitting a RHS leaf node for it. if ((RHSR.second - RHSR.first) == 1 && CR.LT && cast(RHSR.first->Low)->getValue() == (cast(CR.LT)->getValue() - 1LL)) { FalseBB = RHSR.first->BB; } else { FalseBB = CurMF->CreateMachineBasicBlock(LLVMBB); CurMF->insert(BBI, FalseBB); WorkList.push_back(CaseRec(FalseBB,CR.LT,C,RHSR)); // Put SV in a virtual register to make it available from the new blocks. ExportFromCurrentBlock(SV); } // Create a CaseBlock record representing a conditional branch to // the LHS node if the value being switched on SV is less than C. // Otherwise, branch to LHS. CaseBlock CB(ISD::SETLT, SV, C, NULL, TrueBB, FalseBB, CR.CaseBB); if (CR.CaseBB == SwitchBB) visitSwitchCase(CB, SwitchBB); else SwitchCases.push_back(CB); return true; } /// handleBitTestsSwitchCase - if current case range has few destination and /// range span less, than machine word bitwidth, encode case range into series /// of masks and emit bit tests with these masks. bool SelectionDAGBuilder::handleBitTestsSwitchCase(CaseRec& CR, CaseRecVector& WorkList, const Value* SV, MachineBasicBlock* Default, MachineBasicBlock *SwitchBB){ EVT PTy = TLI.getPointerTy(); unsigned IntPtrBits = PTy.getSizeInBits(); Case& FrontCase = *CR.Range.first; Case& BackCase = *(CR.Range.second-1); // Get the MachineFunction which holds the current MBB. This is used when // inserting any additional MBBs necessary to represent the switch. MachineFunction *CurMF = FuncInfo.MF; // If target does not have legal shift left, do not emit bit tests at all. if (!TLI.isOperationLegal(ISD::SHL, TLI.getPointerTy())) return false; size_t numCmps = 0; for (CaseItr I = CR.Range.first, E = CR.Range.second; I!=E; ++I) { // Single case counts one, case range - two. numCmps += (I->Low == I->High ? 1 : 2); } // Count unique destinations SmallSet Dests; for (CaseItr I = CR.Range.first, E = CR.Range.second; I!=E; ++I) { Dests.insert(I->BB); if (Dests.size() > 3) // Don't bother the code below, if there are too much unique destinations return false; } DEBUG(dbgs() << "Total number of unique destinations: " << Dests.size() << '\n' << "Total number of comparisons: " << numCmps << '\n'); // Compute span of values. 
const APInt& minValue = cast(FrontCase.Low)->getValue(); const APInt& maxValue = cast(BackCase.High)->getValue(); APInt cmpRange = maxValue - minValue; DEBUG(dbgs() << "Compare range: " << cmpRange << '\n' << "Low bound: " << minValue << '\n' << "High bound: " << maxValue << '\n'); if (cmpRange.uge(IntPtrBits) || (!(Dests.size() == 1 && numCmps >= 3) && !(Dests.size() == 2 && numCmps >= 5) && !(Dests.size() >= 3 && numCmps >= 6))) return false; DEBUG(dbgs() << "Emitting bit tests\n"); APInt lowBound = APInt::getNullValue(cmpRange.getBitWidth()); // Optimize the case where all the case values fit in a // word without having to subtract minValue. In this case, // we can optimize away the subtraction. if (minValue.isNonNegative() && maxValue.slt(IntPtrBits)) { cmpRange = maxValue; } else { lowBound = minValue; } CaseBitsVector CasesBits; unsigned i, count = 0; for (CaseItr I = CR.Range.first, E = CR.Range.second; I!=E; ++I) { MachineBasicBlock* Dest = I->BB; for (i = 0; i < count; ++i) if (Dest == CasesBits[i].BB) break; if (i == count) { assert((count < 3) && "Too much destinations to test!"); CasesBits.push_back(CaseBits(0, Dest, 0)); count++; } const APInt& lowValue = cast(I->Low)->getValue(); const APInt& highValue = cast(I->High)->getValue(); uint64_t lo = (lowValue - lowBound).getZExtValue(); uint64_t hi = (highValue - lowBound).getZExtValue(); for (uint64_t j = lo; j <= hi; j++) { CasesBits[i].Mask |= 1ULL << j; CasesBits[i].Bits++; } } std::sort(CasesBits.begin(), CasesBits.end(), CaseBitsCmp()); BitTestInfo BTC; // Figure out which block is immediately after the current one. MachineFunction::iterator BBI = CR.CaseBB; ++BBI; const BasicBlock *LLVMBB = CR.CaseBB->getBasicBlock(); DEBUG(dbgs() << "Cases:\n"); for (unsigned i = 0, e = CasesBits.size(); i!=e; ++i) { DEBUG(dbgs() << "Mask: " << CasesBits[i].Mask << ", Bits: " << CasesBits[i].Bits << ", BB: " << CasesBits[i].BB << '\n'); MachineBasicBlock *CaseBB = CurMF->CreateMachineBasicBlock(LLVMBB); CurMF->insert(BBI, CaseBB); BTC.push_back(BitTestCase(CasesBits[i].Mask, CaseBB, CasesBits[i].BB)); // Put SV in a virtual register to make it available from the new blocks. ExportFromCurrentBlock(SV); } BitTestBlock BTB(lowBound, cmpRange, SV, -1U, MVT::Other, (CR.CaseBB == SwitchBB), CR.CaseBB, Default, BTC); if (CR.CaseBB == SwitchBB) visitBitTestHeader(BTB, SwitchBB); BitTestCases.push_back(BTB); return true; } /// Clusterify - Transform simple list of Cases into list of CaseRange's size_t SelectionDAGBuilder::Clusterify(CaseVector& Cases, const SwitchInst& SI) { size_t numCmps = 0; BranchProbabilityInfo *BPI = FuncInfo.BPI; // Start with "simple" cases for (size_t i = 1; i < SI.getNumSuccessors(); ++i) { BasicBlock *SuccBB = SI.getSuccessor(i); MachineBasicBlock *SMBB = FuncInfo.MBBMap[SuccBB]; uint32_t ExtraWeight = BPI ? BPI->getEdgeWeight(SI.getParent(), SuccBB) : 0; Cases.push_back(Case(SI.getSuccessorValue(i), SI.getSuccessorValue(i), SMBB, ExtraWeight)); } std::sort(Cases.begin(), Cases.end(), CaseCmp()); // Merge case into clusters if (Cases.size() >= 2) // Must recompute end() each iteration because it may be // invalidated by erase if we hold on to it for (CaseItr I = Cases.begin(), J = llvm::next(Cases.begin()); J != Cases.end(); ) { const APInt& nextValue = cast(J->Low)->getValue(); const APInt& currentValue = cast(I->High)->getValue(); MachineBasicBlock* nextBB = J->BB; MachineBasicBlock* currentBB = I->BB; // If the two neighboring cases go to the same destination, merge them // into a single case. 
if ((nextValue - currentValue == 1) && (currentBB == nextBB)) { I->High = J->High; J = Cases.erase(J); if (BranchProbabilityInfo *BPI = FuncInfo.BPI) { uint32_t CurWeight = currentBB->getBasicBlock() ? BPI->getEdgeWeight(SI.getParent(), currentBB->getBasicBlock()) : 16; uint32_t NextWeight = nextBB->getBasicBlock() ? BPI->getEdgeWeight(SI.getParent(), nextBB->getBasicBlock()) : 16; BPI->setEdgeWeight(SI.getParent(), currentBB->getBasicBlock(), CurWeight + NextWeight); } } else { I = J++; } } for (CaseItr I=Cases.begin(), E=Cases.end(); I!=E; ++I, ++numCmps) { if (I->Low != I->High) // A range counts double, since it requires two compares. ++numCmps; } return numCmps; } void SelectionDAGBuilder::UpdateSplitBlock(MachineBasicBlock *First, MachineBasicBlock *Last) { // Update JTCases. for (unsigned i = 0, e = JTCases.size(); i != e; ++i) if (JTCases[i].first.HeaderBB == First) JTCases[i].first.HeaderBB = Last; // Update BitTestCases. for (unsigned i = 0, e = BitTestCases.size(); i != e; ++i) if (BitTestCases[i].Parent == First) BitTestCases[i].Parent = Last; } void SelectionDAGBuilder::visitSwitch(const SwitchInst &SI) { MachineBasicBlock *SwitchMBB = FuncInfo.MBB; // Figure out which block is immediately after the current one. MachineBasicBlock *NextBlock = 0; MachineBasicBlock *Default = FuncInfo.MBBMap[SI.getDefaultDest()]; // If there is only the default destination, branch to it if it is not the // next basic block. Otherwise, just fall through. if (SI.getNumCases() == 1) { // Update machine-CFG edges. // If this is not a fall-through branch, emit the branch. SwitchMBB->addSuccessor(Default); if (Default != NextBlock) DAG.setRoot(DAG.getNode(ISD::BR, getCurDebugLoc(), MVT::Other, getControlRoot(), DAG.getBasicBlock(Default))); return; } // If there are any non-default case statements, create a vector of Cases // representing each one, and sort the vector so that we can efficiently // create a binary search tree from them. CaseVector Cases; size_t numCmps = Clusterify(Cases, SI); DEBUG(dbgs() << "Clusterify finished. Total clusters: " << Cases.size() << ". Total compares: " << numCmps << '\n'); (void)numCmps; // Get the Value to be switched on and default basic blocks, which will be // inserted into CaseBlock records, representing basic blocks in the binary // search tree. const Value *SV = SI.getCondition(); // Push the initial CaseRec onto the worklist CaseRecVector WorkList; WorkList.push_back(CaseRec(SwitchMBB,0,0, CaseRange(Cases.begin(),Cases.end()))); while (!WorkList.empty()) { // Grab a record representing a case range to process off the worklist CaseRec CR = WorkList.back(); WorkList.pop_back(); if (handleBitTestsSwitchCase(CR, WorkList, SV, Default, SwitchMBB)) continue; // If the range has few cases (two or less) emit a series of specific // tests. if (handleSmallSwitchRange(CR, WorkList, SV, Default, SwitchMBB)) continue; // If the switch has more than 5 blocks, and at least 40% dense, and the // target supports indirect branches, then emit a jump table rather than // lowering the switch to a binary tree of conditional branches. if (handleJTSwitchCase(CR, WorkList, SV, Default, SwitchMBB)) continue; // Emit binary tree. We need to pick a pivot, and push left and right ranges // onto the worklist. Leafs are handled via handleSmallSwitchRange() call. 
handleBTSplitSwitchCase(CR, WorkList, SV, Default, SwitchMBB); } } void SelectionDAGBuilder::visitIndirectBr(const IndirectBrInst &I) { MachineBasicBlock *IndirectBrMBB = FuncInfo.MBB; // Update machine-CFG edges with unique successors. SmallVector succs; succs.reserve(I.getNumSuccessors()); for (unsigned i = 0, e = I.getNumSuccessors(); i != e; ++i) succs.push_back(I.getSuccessor(i)); array_pod_sort(succs.begin(), succs.end()); succs.erase(std::unique(succs.begin(), succs.end()), succs.end()); for (unsigned i = 0, e = succs.size(); i != e; ++i) { MachineBasicBlock *Succ = FuncInfo.MBBMap[succs[i]]; addSuccessorWithWeight(IndirectBrMBB, Succ); } DAG.setRoot(DAG.getNode(ISD::BRIND, getCurDebugLoc(), MVT::Other, getControlRoot(), getValue(I.getAddress()))); } void SelectionDAGBuilder::visitFSub(const User &I) { // -0.0 - X --> fneg Type *Ty = I.getType(); if (isa(I.getOperand(0)) && I.getOperand(0) == ConstantFP::getZeroValueForNegation(Ty)) { SDValue Op2 = getValue(I.getOperand(1)); setValue(&I, DAG.getNode(ISD::FNEG, getCurDebugLoc(), Op2.getValueType(), Op2)); return; } visitBinary(I, ISD::FSUB); } void SelectionDAGBuilder::visitBinary(const User &I, unsigned OpCode) { SDValue Op1 = getValue(I.getOperand(0)); SDValue Op2 = getValue(I.getOperand(1)); setValue(&I, DAG.getNode(OpCode, getCurDebugLoc(), Op1.getValueType(), Op1, Op2)); } void SelectionDAGBuilder::visitShift(const User &I, unsigned Opcode) { SDValue Op1 = getValue(I.getOperand(0)); SDValue Op2 = getValue(I.getOperand(1)); MVT ShiftTy = TLI.getShiftAmountTy(Op2.getValueType()); // Coerce the shift amount to the right type if we can. if (!I.getType()->isVectorTy() && Op2.getValueType() != ShiftTy) { unsigned ShiftSize = ShiftTy.getSizeInBits(); unsigned Op2Size = Op2.getValueType().getSizeInBits(); DebugLoc DL = getCurDebugLoc(); // If the operand is smaller than the shift count type, promote it. if (ShiftSize > Op2Size) Op2 = DAG.getNode(ISD::ZERO_EXTEND, DL, ShiftTy, Op2); // If the operand is larger than the shift count type but the shift // count type has enough bits to represent any shift value, truncate // it now. This is a common case and it exposes the truncate to // optimization early. else if (ShiftSize >= Log2_32_Ceil(Op2.getValueType().getSizeInBits())) Op2 = DAG.getNode(ISD::TRUNCATE, DL, ShiftTy, Op2); // Otherwise we'll need to temporarily settle for some other convenient // type. Type legalization will make adjustments once the shiftee is split. else Op2 = DAG.getZExtOrTrunc(Op2, DL, MVT::i32); } setValue(&I, DAG.getNode(Opcode, getCurDebugLoc(), Op1.getValueType(), Op1, Op2)); } void SelectionDAGBuilder::visitSDiv(const User &I) { SDValue Op1 = getValue(I.getOperand(0)); SDValue Op2 = getValue(I.getOperand(1)); // Turn exact SDivs into multiplications. // FIXME: This should be in DAGCombiner, but it doesn't have access to the // exact bit. 
if (isa(&I) && cast(&I)->isExact() && !isa(Op1) && isa(Op2) && !cast(Op2)->isNullValue()) setValue(&I, TLI.BuildExactSDIV(Op1, Op2, getCurDebugLoc(), DAG)); else setValue(&I, DAG.getNode(ISD::SDIV, getCurDebugLoc(), Op1.getValueType(), Op1, Op2)); } void SelectionDAGBuilder::visitICmp(const User &I) { ICmpInst::Predicate predicate = ICmpInst::BAD_ICMP_PREDICATE; if (const ICmpInst *IC = dyn_cast(&I)) predicate = IC->getPredicate(); else if (const ConstantExpr *IC = dyn_cast(&I)) predicate = ICmpInst::Predicate(IC->getPredicate()); SDValue Op1 = getValue(I.getOperand(0)); SDValue Op2 = getValue(I.getOperand(1)); ISD::CondCode Opcode = getICmpCondCode(predicate); EVT DestVT = TLI.getValueType(I.getType()); setValue(&I, DAG.getSetCC(getCurDebugLoc(), DestVT, Op1, Op2, Opcode)); } void SelectionDAGBuilder::visitFCmp(const User &I) { FCmpInst::Predicate predicate = FCmpInst::BAD_FCMP_PREDICATE; if (const FCmpInst *FC = dyn_cast(&I)) predicate = FC->getPredicate(); else if (const ConstantExpr *FC = dyn_cast(&I)) predicate = FCmpInst::Predicate(FC->getPredicate()); SDValue Op1 = getValue(I.getOperand(0)); SDValue Op2 = getValue(I.getOperand(1)); ISD::CondCode Condition = getFCmpCondCode(predicate); EVT DestVT = TLI.getValueType(I.getType()); setValue(&I, DAG.getSetCC(getCurDebugLoc(), DestVT, Op1, Op2, Condition)); } void SelectionDAGBuilder::visitSelect(const User &I) { SmallVector ValueVTs; ComputeValueVTs(TLI, I.getType(), ValueVTs); unsigned NumValues = ValueVTs.size(); if (NumValues == 0) return; SmallVector Values(NumValues); SDValue Cond = getValue(I.getOperand(0)); SDValue TrueVal = getValue(I.getOperand(1)); SDValue FalseVal = getValue(I.getOperand(2)); ISD::NodeType OpCode = Cond.getValueType().isVector() ? ISD::VSELECT : ISD::SELECT; for (unsigned i = 0; i != NumValues; ++i) Values[i] = DAG.getNode(OpCode, getCurDebugLoc(), TrueVal.getNode()->getValueType(TrueVal.getResNo()+i), Cond, SDValue(TrueVal.getNode(), TrueVal.getResNo() + i), SDValue(FalseVal.getNode(), FalseVal.getResNo() + i)); setValue(&I, DAG.getNode(ISD::MERGE_VALUES, getCurDebugLoc(), DAG.getVTList(&ValueVTs[0], NumValues), &Values[0], NumValues)); } void SelectionDAGBuilder::visitTrunc(const User &I) { // TruncInst cannot be a no-op cast because sizeof(src) > sizeof(dest). SDValue N = getValue(I.getOperand(0)); EVT DestVT = TLI.getValueType(I.getType()); setValue(&I, DAG.getNode(ISD::TRUNCATE, getCurDebugLoc(), DestVT, N)); } void SelectionDAGBuilder::visitZExt(const User &I) { // ZExt cannot be a no-op cast because sizeof(src) < sizeof(dest). // ZExt also can't be a cast to bool for same reason. So, nothing much to do SDValue N = getValue(I.getOperand(0)); EVT DestVT = TLI.getValueType(I.getType()); setValue(&I, DAG.getNode(ISD::ZERO_EXTEND, getCurDebugLoc(), DestVT, N)); } void SelectionDAGBuilder::visitSExt(const User &I) { // SExt cannot be a no-op cast because sizeof(src) < sizeof(dest). // SExt also can't be a cast to bool for same reason. 
So, nothing much to do SDValue N = getValue(I.getOperand(0)); EVT DestVT = TLI.getValueType(I.getType()); setValue(&I, DAG.getNode(ISD::SIGN_EXTEND, getCurDebugLoc(), DestVT, N)); } void SelectionDAGBuilder::visitFPTrunc(const User &I) { // FPTrunc is never a no-op cast, no need to check SDValue N = getValue(I.getOperand(0)); EVT DestVT = TLI.getValueType(I.getType()); setValue(&I, DAG.getNode(ISD::FP_ROUND, getCurDebugLoc(), DestVT, N, DAG.getIntPtrConstant(0))); } void SelectionDAGBuilder::visitFPExt(const User &I){ // FPTrunc is never a no-op cast, no need to check SDValue N = getValue(I.getOperand(0)); EVT DestVT = TLI.getValueType(I.getType()); setValue(&I, DAG.getNode(ISD::FP_EXTEND, getCurDebugLoc(), DestVT, N)); } void SelectionDAGBuilder::visitFPToUI(const User &I) { // FPToUI is never a no-op cast, no need to check SDValue N = getValue(I.getOperand(0)); EVT DestVT = TLI.getValueType(I.getType()); setValue(&I, DAG.getNode(ISD::FP_TO_UINT, getCurDebugLoc(), DestVT, N)); } void SelectionDAGBuilder::visitFPToSI(const User &I) { // FPToSI is never a no-op cast, no need to check SDValue N = getValue(I.getOperand(0)); EVT DestVT = TLI.getValueType(I.getType()); setValue(&I, DAG.getNode(ISD::FP_TO_SINT, getCurDebugLoc(), DestVT, N)); } void SelectionDAGBuilder::visitUIToFP(const User &I) { // UIToFP is never a no-op cast, no need to check SDValue N = getValue(I.getOperand(0)); EVT DestVT = TLI.getValueType(I.getType()); setValue(&I, DAG.getNode(ISD::UINT_TO_FP, getCurDebugLoc(), DestVT, N)); } void SelectionDAGBuilder::visitSIToFP(const User &I){ // SIToFP is never a no-op cast, no need to check SDValue N = getValue(I.getOperand(0)); EVT DestVT = TLI.getValueType(I.getType()); setValue(&I, DAG.getNode(ISD::SINT_TO_FP, getCurDebugLoc(), DestVT, N)); } void SelectionDAGBuilder::visitPtrToInt(const User &I) { // What to do depends on the size of the integer and the size of the pointer. // We can either truncate, zero extend, or no-op, accordingly. SDValue N = getValue(I.getOperand(0)); EVT DestVT = TLI.getValueType(I.getType()); setValue(&I, DAG.getZExtOrTrunc(N, getCurDebugLoc(), DestVT)); } void SelectionDAGBuilder::visitIntToPtr(const User &I) { // What to do depends on the size of the integer and the size of the pointer. // We can either truncate, zero extend, or no-op, accordingly. SDValue N = getValue(I.getOperand(0)); EVT DestVT = TLI.getValueType(I.getType()); setValue(&I, DAG.getZExtOrTrunc(N, getCurDebugLoc(), DestVT)); } void SelectionDAGBuilder::visitBitCast(const User &I) { SDValue N = getValue(I.getOperand(0)); EVT DestVT = TLI.getValueType(I.getType()); // BitCast assures us that source and destination are the same size so this is // either a BITCAST or a no-op. if (DestVT != N.getValueType()) setValue(&I, DAG.getNode(ISD::BITCAST, getCurDebugLoc(), DestVT, N)); // convert types. else setValue(&I, N); // noop cast. 
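// visitBitCast emits either an ISD::BITCAST or nothing at all, because an IR
// bitcast never changes the bit pattern, only the type it is viewed as.  A
// small standalone check of that invariant -- bitCast() is an illustrative
// helper, not an LLVM API:
#include <cassert>
#include <cstdint>
#include <cstring>

template <typename To, typename From>
static To bitCast(const From &V) {
  static_assert(sizeof(To) == sizeof(From), "bitcast requires equal sizes");
  To Result;
  std::memcpy(&Result, &V, sizeof(To));  // reinterpret the bits, never convert
  return Result;
}

int main() {
  float F = -0.0f;
  uint32_t Bits = bitCast<uint32_t>(F);
  assert(Bits == 0x80000000u);          // sign bit only
  assert(bitCast<float>(Bits) == F);    // the round trip is lossless
  return 0;
}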
} void SelectionDAGBuilder::visitInsertElement(const User &I) { SDValue InVec = getValue(I.getOperand(0)); SDValue InVal = getValue(I.getOperand(1)); SDValue InIdx = DAG.getNode(ISD::ZERO_EXTEND, getCurDebugLoc(), TLI.getPointerTy(), getValue(I.getOperand(2))); setValue(&I, DAG.getNode(ISD::INSERT_VECTOR_ELT, getCurDebugLoc(), TLI.getValueType(I.getType()), InVec, InVal, InIdx)); } void SelectionDAGBuilder::visitExtractElement(const User &I) { SDValue InVec = getValue(I.getOperand(0)); SDValue InIdx = DAG.getNode(ISD::ZERO_EXTEND, getCurDebugLoc(), TLI.getPointerTy(), getValue(I.getOperand(1))); setValue(&I, DAG.getNode(ISD::EXTRACT_VECTOR_ELT, getCurDebugLoc(), TLI.getValueType(I.getType()), InVec, InIdx)); } // Utility for visitShuffleVector - Returns true if the mask is mask starting // from SIndx and increasing to the element length (undefs are allowed). static bool SequentialMask(SmallVectorImpl &Mask, unsigned SIndx) { unsigned MaskNumElts = Mask.size(); for (unsigned i = 0; i != MaskNumElts; ++i) if ((Mask[i] >= 0) && (Mask[i] != (int)(i + SIndx))) return false; return true; } void SelectionDAGBuilder::visitShuffleVector(const User &I) { SmallVector Mask; SDValue Src1 = getValue(I.getOperand(0)); SDValue Src2 = getValue(I.getOperand(1)); // Convert the ConstantVector mask operand into an array of ints, with -1 // representing undef values. SmallVector MaskElts; cast(I.getOperand(2))->getVectorElements(MaskElts); unsigned MaskNumElts = MaskElts.size(); for (unsigned i = 0; i != MaskNumElts; ++i) { if (isa(MaskElts[i])) Mask.push_back(-1); else Mask.push_back(cast(MaskElts[i])->getSExtValue()); } EVT VT = TLI.getValueType(I.getType()); EVT SrcVT = Src1.getValueType(); unsigned SrcNumElts = SrcVT.getVectorNumElements(); if (SrcNumElts == MaskNumElts) { setValue(&I, DAG.getVectorShuffle(VT, getCurDebugLoc(), Src1, Src2, &Mask[0])); return; } // Normalize the shuffle vector since mask and vector length don't match. if (SrcNumElts < MaskNumElts && MaskNumElts % SrcNumElts == 0) { // Mask is longer than the source vectors and is a multiple of the source // vectors. We can use concatenate vector to make the mask and vectors // lengths match. if (SrcNumElts*2 == MaskNumElts && SequentialMask(Mask, 0)) { // The shuffle is concatenating two vectors together. setValue(&I, DAG.getNode(ISD::CONCAT_VECTORS, getCurDebugLoc(), VT, Src1, Src2)); return; } // Pad both vectors with undefs to make them the same length as the mask. unsigned NumConcat = MaskNumElts / SrcNumElts; bool Src1U = Src1.getOpcode() == ISD::UNDEF; bool Src2U = Src2.getOpcode() == ISD::UNDEF; SDValue UndefVal = DAG.getUNDEF(SrcVT); SmallVector MOps1(NumConcat, UndefVal); SmallVector MOps2(NumConcat, UndefVal); MOps1[0] = Src1; MOps2[0] = Src2; Src1 = Src1U ? DAG.getUNDEF(VT) : DAG.getNode(ISD::CONCAT_VECTORS, getCurDebugLoc(), VT, &MOps1[0], NumConcat); Src2 = Src2U ? DAG.getUNDEF(VT) : DAG.getNode(ISD::CONCAT_VECTORS, getCurDebugLoc(), VT, &MOps2[0], NumConcat); // Readjust mask for new input vector length. SmallVector MappedOps; for (unsigned i = 0; i != MaskNumElts; ++i) { int Idx = Mask[i]; if (Idx < (int)SrcNumElts) MappedOps.push_back(Idx); else MappedOps.push_back(Idx + MaskNumElts - SrcNumElts); } setValue(&I, DAG.getVectorShuffle(VT, getCurDebugLoc(), Src1, Src2, &MappedOps[0])); return; } if (SrcNumElts > MaskNumElts) { // Analyze the access pattern of the vector to see if we can extract // two subvectors and do the shuffle. The analysis is done by calculating // the range of elements the mask access on both vectors. 
int MinRange[2] = { static_cast(SrcNumElts+1), static_cast(SrcNumElts+1)}; int MaxRange[2] = {-1, -1}; for (unsigned i = 0; i != MaskNumElts; ++i) { int Idx = Mask[i]; int Input = 0; if (Idx < 0) continue; if (Idx >= (int)SrcNumElts) { Input = 1; Idx -= SrcNumElts; } if (Idx > MaxRange[Input]) MaxRange[Input] = Idx; if (Idx < MinRange[Input]) MinRange[Input] = Idx; } // Check if the access is smaller than the vector size and can we find // a reasonable extract index. int RangeUse[2] = { 2, 2 }; // 0 = Unused, 1 = Extract, 2 = Can not // Extract. int StartIdx[2]; // StartIdx to extract from for (int Input=0; Input < 2; ++Input) { if (MinRange[Input] == (int)(SrcNumElts+1) && MaxRange[Input] == -1) { RangeUse[Input] = 0; // Unused StartIdx[Input] = 0; } else if (MaxRange[Input] - MinRange[Input] < (int)MaskNumElts) { // Fits within range but we should see if we can find a good // start index that is a multiple of the mask length. if (MaxRange[Input] < (int)MaskNumElts) { RangeUse[Input] = 1; // Extract from beginning of the vector StartIdx[Input] = 0; } else { StartIdx[Input] = (MinRange[Input]/MaskNumElts)*MaskNumElts; if (MaxRange[Input] - StartIdx[Input] < (int)MaskNumElts && StartIdx[Input] + MaskNumElts <= SrcNumElts) RangeUse[Input] = 1; // Extract from a multiple of the mask length. } } } if (RangeUse[0] == 0 && RangeUse[1] == 0) { setValue(&I, DAG.getUNDEF(VT)); // Vectors are not used. return; } else if (RangeUse[0] < 2 && RangeUse[1] < 2) { // Extract appropriate subvector and generate a vector shuffle for (int Input=0; Input < 2; ++Input) { SDValue &Src = Input == 0 ? Src1 : Src2; if (RangeUse[Input] == 0) Src = DAG.getUNDEF(VT); else Src = DAG.getNode(ISD::EXTRACT_SUBVECTOR, getCurDebugLoc(), VT, Src, DAG.getIntPtrConstant(StartIdx[Input])); } // Calculate new mask. SmallVector MappedOps; for (unsigned i = 0; i != MaskNumElts; ++i) { int Idx = Mask[i]; if (Idx < 0) MappedOps.push_back(Idx); else if (Idx < (int)SrcNumElts) MappedOps.push_back(Idx - StartIdx[0]); else MappedOps.push_back(Idx - SrcNumElts - StartIdx[1] + MaskNumElts); } setValue(&I, DAG.getVectorShuffle(VT, getCurDebugLoc(), Src1, Src2, &MappedOps[0])); return; } } // We can't use either concat vectors or extract subvectors so fall back to // replacing the shuffle with extract and build vector. // to insert and build vector. 
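// The fallback just below rewrites the shuffle as per-lane extracts followed
// by a BUILD_VECTOR.  A scalar reference model of that semantics, assuming -1
// marks an undefined lane (illustrative only, not SelectionDAG code):
#include <cassert>
#include <vector>

static std::vector<int> shuffleRef(const std::vector<int> &Src1,
                                   const std::vector<int> &Src2,
                                   const std::vector<int> &Mask) {
  const int SrcNumElts = static_cast<int>(Src1.size());
  std::vector<int> Result;
  for (int Idx : Mask) {
    if (Idx < 0)
      Result.push_back(0);                       // undef lane; any value will do
    else if (Idx < SrcNumElts)
      Result.push_back(Src1[Idx]);               // extract from the first source
    else
      Result.push_back(Src2[Idx - SrcNumElts]);  // extract from the second source
  }
  return Result;
}

int main() {
  std::vector<int> A = {10, 11, 12, 13}, B = {20, 21, 22, 23};
  // shufflevector with mask <0, 5, 2, 7> interleaves lanes of A and B.
  assert(shuffleRef(A, B, {0, 5, 2, 7}) == (std::vector<int>{10, 21, 12, 23}));
  return 0;
}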
EVT EltVT = VT.getVectorElementType(); EVT PtrVT = TLI.getPointerTy(); SmallVector Ops; for (unsigned i = 0; i != MaskNumElts; ++i) { if (Mask[i] < 0) { Ops.push_back(DAG.getUNDEF(EltVT)); } else { int Idx = Mask[i]; SDValue Res; if (Idx < (int)SrcNumElts) Res = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, getCurDebugLoc(), EltVT, Src1, DAG.getConstant(Idx, PtrVT)); else Res = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, getCurDebugLoc(), EltVT, Src2, DAG.getConstant(Idx - SrcNumElts, PtrVT)); Ops.push_back(Res); } } setValue(&I, DAG.getNode(ISD::BUILD_VECTOR, getCurDebugLoc(), VT, &Ops[0], Ops.size())); } void SelectionDAGBuilder::visitInsertValue(const InsertValueInst &I) { const Value *Op0 = I.getOperand(0); const Value *Op1 = I.getOperand(1); Type *AggTy = I.getType(); Type *ValTy = Op1->getType(); bool IntoUndef = isa(Op0); bool FromUndef = isa(Op1); unsigned LinearIndex = ComputeLinearIndex(AggTy, I.getIndices()); SmallVector AggValueVTs; ComputeValueVTs(TLI, AggTy, AggValueVTs); SmallVector ValValueVTs; ComputeValueVTs(TLI, ValTy, ValValueVTs); unsigned NumAggValues = AggValueVTs.size(); unsigned NumValValues = ValValueVTs.size(); SmallVector Values(NumAggValues); SDValue Agg = getValue(Op0); unsigned i = 0; // Copy the beginning value(s) from the original aggregate. for (; i != LinearIndex; ++i) Values[i] = IntoUndef ? DAG.getUNDEF(AggValueVTs[i]) : SDValue(Agg.getNode(), Agg.getResNo() + i); // Copy values from the inserted value(s). if (NumValValues) { SDValue Val = getValue(Op1); for (; i != LinearIndex + NumValValues; ++i) Values[i] = FromUndef ? DAG.getUNDEF(AggValueVTs[i]) : SDValue(Val.getNode(), Val.getResNo() + i - LinearIndex); } // Copy remaining value(s) from the original aggregate. for (; i != NumAggValues; ++i) Values[i] = IntoUndef ? DAG.getUNDEF(AggValueVTs[i]) : SDValue(Agg.getNode(), Agg.getResNo() + i); setValue(&I, DAG.getNode(ISD::MERGE_VALUES, getCurDebugLoc(), DAG.getVTList(&AggValueVTs[0], NumAggValues), &Values[0], NumAggValues)); } void SelectionDAGBuilder::visitExtractValue(const ExtractValueInst &I) { const Value *Op0 = I.getOperand(0); Type *AggTy = Op0->getType(); Type *ValTy = I.getType(); bool OutOfUndef = isa(Op0); unsigned LinearIndex = ComputeLinearIndex(AggTy, I.getIndices()); SmallVector ValValueVTs; ComputeValueVTs(TLI, ValTy, ValValueVTs); unsigned NumValValues = ValValueVTs.size(); // Ignore a extractvalue that produces an empty object if (!NumValValues) { setValue(&I, DAG.getUNDEF(MVT(MVT::Other))); return; } SmallVector Values(NumValValues); SDValue Agg = getValue(Op0); // Copy out the selected value(s). for (unsigned i = LinearIndex; i != LinearIndex + NumValValues; ++i) Values[i - LinearIndex] = OutOfUndef ? 
DAG.getUNDEF(Agg.getNode()->getValueType(Agg.getResNo() + i)) : SDValue(Agg.getNode(), Agg.getResNo() + i); setValue(&I, DAG.getNode(ISD::MERGE_VALUES, getCurDebugLoc(), DAG.getVTList(&ValValueVTs[0], NumValValues), &Values[0], NumValValues)); } void SelectionDAGBuilder::visitGetElementPtr(const User &I) { SDValue N = getValue(I.getOperand(0)); Type *Ty = I.getOperand(0)->getType(); for (GetElementPtrInst::const_op_iterator OI = I.op_begin()+1, E = I.op_end(); OI != E; ++OI) { const Value *Idx = *OI; if (StructType *StTy = dyn_cast(Ty)) { unsigned Field = cast(Idx)->getZExtValue(); if (Field) { // N = N + Offset uint64_t Offset = TD->getStructLayout(StTy)->getElementOffset(Field); N = DAG.getNode(ISD::ADD, getCurDebugLoc(), N.getValueType(), N, DAG.getIntPtrConstant(Offset)); } Ty = StTy->getElementType(Field); } else { Ty = cast(Ty)->getElementType(); // If this is a constant subscript, handle it quickly. if (const ConstantInt *CI = dyn_cast(Idx)) { if (CI->isZero()) continue; uint64_t Offs = TD->getTypeAllocSize(Ty)*cast(CI)->getSExtValue(); SDValue OffsVal; EVT PTy = TLI.getPointerTy(); unsigned PtrBits = PTy.getSizeInBits(); if (PtrBits < 64) OffsVal = DAG.getNode(ISD::TRUNCATE, getCurDebugLoc(), TLI.getPointerTy(), DAG.getConstant(Offs, MVT::i64)); else OffsVal = DAG.getIntPtrConstant(Offs); N = DAG.getNode(ISD::ADD, getCurDebugLoc(), N.getValueType(), N, OffsVal); continue; } // N = N + Idx * ElementSize; APInt ElementSize = APInt(TLI.getPointerTy().getSizeInBits(), TD->getTypeAllocSize(Ty)); SDValue IdxN = getValue(Idx); // If the index is smaller or larger than intptr_t, truncate or extend // it. IdxN = DAG.getSExtOrTrunc(IdxN, getCurDebugLoc(), N.getValueType()); // If this is a multiply by a power of two, turn it into a shl // immediately. This is a very common case. if (ElementSize != 1) { if (ElementSize.isPowerOf2()) { unsigned Amt = ElementSize.logBase2(); IdxN = DAG.getNode(ISD::SHL, getCurDebugLoc(), N.getValueType(), IdxN, DAG.getConstant(Amt, TLI.getPointerTy())); } else { SDValue Scale = DAG.getConstant(ElementSize, TLI.getPointerTy()); IdxN = DAG.getNode(ISD::MUL, getCurDebugLoc(), N.getValueType(), IdxN, Scale); } } N = DAG.getNode(ISD::ADD, getCurDebugLoc(), N.getValueType(), N, IdxN); } } setValue(&I, N); } void SelectionDAGBuilder::visitAlloca(const AllocaInst &I) { // If this is a fixed sized alloca in the entry block of the function, // allocate it statically on the stack. if (FuncInfo.StaticAllocaMap.count(&I)) return; // getValue will auto-populate this. Type *Ty = I.getAllocatedType(); uint64_t TySize = TLI.getTargetData()->getTypeAllocSize(Ty); unsigned Align = std::max((unsigned)TLI.getTargetData()->getPrefTypeAlignment(Ty), I.getAlignment()); SDValue AllocSize = getValue(I.getArraySize()); EVT IntPtr = TLI.getPointerTy(); if (AllocSize.getValueType() != IntPtr) AllocSize = DAG.getZExtOrTrunc(AllocSize, getCurDebugLoc(), IntPtr); AllocSize = DAG.getNode(ISD::MUL, getCurDebugLoc(), IntPtr, AllocSize, DAG.getConstant(TySize, IntPtr)); // Handle alignment. If the requested alignment is less than or equal to // the stack alignment, ignore it. If the size is greater than or equal to // the stack alignment, we note this in the DYNAMIC_STACKALLOC node. unsigned StackAlign = TM.getFrameLowering()->getStackAlignment(); if (Align <= StackAlign) Align = 0; // Round the size of the allocation up to the stack alignment size // by add SA-1 to the size. 
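// The ADD and AND nodes emitted just below implement the usual power-of-two
// round-up: add StackAlign-1, then mask off the low bits.  A scalar sketch of
// the same computation (StackAlign must be a power of two for the mask form
// to be correct):
#include <cassert>
#include <cstdint>

static uint64_t roundUpToAlignment(uint64_t Size, uint64_t StackAlign) {
  return (Size + StackAlign - 1) & ~(StackAlign - 1);
}

int main() {
  assert(roundUpToAlignment(0, 16) == 0);
  assert(roundUpToAlignment(1, 16) == 16);
  assert(roundUpToAlignment(16, 16) == 16);
  assert(roundUpToAlignment(17, 16) == 32);
  return 0;
}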
AllocSize = DAG.getNode(ISD::ADD, getCurDebugLoc(), AllocSize.getValueType(), AllocSize, DAG.getIntPtrConstant(StackAlign-1)); // Mask out the low bits for alignment purposes. AllocSize = DAG.getNode(ISD::AND, getCurDebugLoc(), AllocSize.getValueType(), AllocSize, DAG.getIntPtrConstant(~(uint64_t)(StackAlign-1))); SDValue Ops[] = { getRoot(), AllocSize, DAG.getIntPtrConstant(Align) }; SDVTList VTs = DAG.getVTList(AllocSize.getValueType(), MVT::Other); SDValue DSA = DAG.getNode(ISD::DYNAMIC_STACKALLOC, getCurDebugLoc(), VTs, Ops, 3); setValue(&I, DSA); DAG.setRoot(DSA.getValue(1)); // Inform the Frame Information that we have just allocated a variable-sized // object. FuncInfo.MF->getFrameInfo()->CreateVariableSizedObject(Align ? Align : 1); } void SelectionDAGBuilder::visitLoad(const LoadInst &I) { if (I.isAtomic()) return visitAtomicLoad(I); const Value *SV = I.getOperand(0); SDValue Ptr = getValue(SV); Type *Ty = I.getType(); bool isVolatile = I.isVolatile(); bool isNonTemporal = I.getMetadata("nontemporal") != 0; unsigned Alignment = I.getAlignment(); const MDNode *TBAAInfo = I.getMetadata(LLVMContext::MD_tbaa); SmallVector ValueVTs; SmallVector Offsets; ComputeValueVTs(TLI, Ty, ValueVTs, &Offsets); unsigned NumValues = ValueVTs.size(); if (NumValues == 0) return; SDValue Root; bool ConstantMemory = false; if (I.isVolatile() || NumValues > MaxParallelChains) // Serialize volatile loads with other side effects. Root = getRoot(); else if (AA->pointsToConstantMemory( AliasAnalysis::Location(SV, AA->getTypeStoreSize(Ty), TBAAInfo))) { // Do not serialize (non-volatile) loads of constant memory with anything. Root = DAG.getEntryNode(); ConstantMemory = true; } else { // Do not serialize non-volatile loads against each other. Root = DAG.getRoot(); } SmallVector Values(NumValues); SmallVector Chains(std::min(unsigned(MaxParallelChains), NumValues)); EVT PtrVT = Ptr.getValueType(); unsigned ChainI = 0; for (unsigned i = 0; i != NumValues; ++i, ++ChainI) { // Serializing loads here may result in excessive register pressure, and // TokenFactor places arbitrary choke points on the scheduler. SD scheduling // could recover a bit by hoisting nodes upward in the chain by recognizing // they are side-effect free or do not alias. The optimizer should really // avoid this case by converting large object/array copies to llvm.memcpy // (MaxParallelChains should always remain as failsafe). 
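// A scalar analogue of the chain batching enforced by the
// ChainI == MaxParallelChains check just below: at most MaxParallelChains
// per-element chains are kept live, and whenever the buffer fills they are
// folded into a single token (a TokenFactor in the DAG) that the next batch
// hangs off.  Names and the counting here are purely illustrative:
#include <cassert>
#include <vector>

static const unsigned MaxParallelChains = 64;

static unsigned countTokenFactors(unsigned NumValues) {
  std::vector<unsigned> Chains;
  Chains.reserve(MaxParallelChains);
  unsigned NumTokenFactors = 0;
  for (unsigned i = 0; i != NumValues; ++i) {
    if (Chains.size() == MaxParallelChains) {
      ++NumTokenFactors;        // merge the pending chains into one token
      Chains.clear();           // and start a fresh batch from that token
    }
    Chains.push_back(i);        // the chain produced for element i
  }
  ++NumTokenFactors;            // final merge of whatever is left over
  return NumTokenFactors;
}

int main() {
  assert(countTokenFactors(10) == 1);   // small objects need one TokenFactor
  assert(countTokenFactors(130) == 3);  // 64 + 64 + 2 -> two flushes plus tail
  return 0;
}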
if (ChainI == MaxParallelChains) { assert(PendingLoads.empty() && "PendingLoads must be serialized first"); SDValue Chain = DAG.getNode(ISD::TokenFactor, getCurDebugLoc(), MVT::Other, &Chains[0], ChainI); Root = Chain; ChainI = 0; } SDValue A = DAG.getNode(ISD::ADD, getCurDebugLoc(), PtrVT, Ptr, DAG.getConstant(Offsets[i], PtrVT)); SDValue L = DAG.getLoad(ValueVTs[i], getCurDebugLoc(), Root, A, MachinePointerInfo(SV, Offsets[i]), isVolatile, isNonTemporal, Alignment, TBAAInfo); Values[i] = L; Chains[ChainI] = L.getValue(1); } if (!ConstantMemory) { SDValue Chain = DAG.getNode(ISD::TokenFactor, getCurDebugLoc(), MVT::Other, &Chains[0], ChainI); if (isVolatile) DAG.setRoot(Chain); else PendingLoads.push_back(Chain); } setValue(&I, DAG.getNode(ISD::MERGE_VALUES, getCurDebugLoc(), DAG.getVTList(&ValueVTs[0], NumValues), &Values[0], NumValues)); } void SelectionDAGBuilder::visitStore(const StoreInst &I) { if (I.isAtomic()) return visitAtomicStore(I); const Value *SrcV = I.getOperand(0); const Value *PtrV = I.getOperand(1); SmallVector ValueVTs; SmallVector Offsets; ComputeValueVTs(TLI, SrcV->getType(), ValueVTs, &Offsets); unsigned NumValues = ValueVTs.size(); if (NumValues == 0) return; // Get the lowered operands. Note that we do this after // checking if NumResults is zero, because with zero results // the operands won't have values in the map. SDValue Src = getValue(SrcV); SDValue Ptr = getValue(PtrV); SDValue Root = getRoot(); SmallVector Chains(std::min(unsigned(MaxParallelChains), NumValues)); EVT PtrVT = Ptr.getValueType(); bool isVolatile = I.isVolatile(); bool isNonTemporal = I.getMetadata("nontemporal") != 0; unsigned Alignment = I.getAlignment(); const MDNode *TBAAInfo = I.getMetadata(LLVMContext::MD_tbaa); unsigned ChainI = 0; for (unsigned i = 0; i != NumValues; ++i, ++ChainI) { // See visitLoad comments. 
if (ChainI == MaxParallelChains) { SDValue Chain = DAG.getNode(ISD::TokenFactor, getCurDebugLoc(), MVT::Other, &Chains[0], ChainI); Root = Chain; ChainI = 0; } SDValue Add = DAG.getNode(ISD::ADD, getCurDebugLoc(), PtrVT, Ptr, DAG.getConstant(Offsets[i], PtrVT)); SDValue St = DAG.getStore(Root, getCurDebugLoc(), SDValue(Src.getNode(), Src.getResNo() + i), Add, MachinePointerInfo(PtrV, Offsets[i]), isVolatile, isNonTemporal, Alignment, TBAAInfo); Chains[ChainI] = St; } SDValue StoreNode = DAG.getNode(ISD::TokenFactor, getCurDebugLoc(), MVT::Other, &Chains[0], ChainI); ++SDNodeOrder; AssignOrderingToNode(StoreNode.getNode()); DAG.setRoot(StoreNode); } static SDValue InsertFenceForAtomic(SDValue Chain, AtomicOrdering Order, SynchronizationScope Scope, bool Before, DebugLoc dl, SelectionDAG &DAG, const TargetLowering &TLI) { // Fence, if necessary if (Before) { if (Order == AcquireRelease || Order == SequentiallyConsistent) Order = Release; else if (Order == Acquire || Order == Monotonic) return Chain; } else { if (Order == AcquireRelease) Order = Acquire; else if (Order == Release || Order == Monotonic) return Chain; } SDValue Ops[3]; Ops[0] = Chain; Ops[1] = DAG.getConstant(Order, TLI.getPointerTy()); Ops[2] = DAG.getConstant(Scope, TLI.getPointerTy()); return DAG.getNode(ISD::ATOMIC_FENCE, dl, MVT::Other, Ops, 3); } void SelectionDAGBuilder::visitAtomicCmpXchg(const AtomicCmpXchgInst &I) { DebugLoc dl = getCurDebugLoc(); AtomicOrdering Order = I.getOrdering(); SynchronizationScope Scope = I.getSynchScope(); SDValue InChain = getRoot(); if (TLI.getInsertFencesForAtomic()) InChain = InsertFenceForAtomic(InChain, Order, Scope, true, dl, DAG, TLI); SDValue L = DAG.getAtomic(ISD::ATOMIC_CMP_SWAP, dl, getValue(I.getCompareOperand()).getValueType().getSimpleVT(), InChain, getValue(I.getPointerOperand()), getValue(I.getCompareOperand()), getValue(I.getNewValOperand()), MachinePointerInfo(I.getPointerOperand()), 0 /* Alignment */, TLI.getInsertFencesForAtomic() ? Monotonic : Order, Scope); SDValue OutChain = L.getValue(1); if (TLI.getInsertFencesForAtomic()) OutChain = InsertFenceForAtomic(OutChain, Order, Scope, false, dl, DAG, TLI); setValue(&I, L); DAG.setRoot(OutChain); } void SelectionDAGBuilder::visitAtomicRMW(const AtomicRMWInst &I) { DebugLoc dl = getCurDebugLoc(); ISD::NodeType NT; switch (I.getOperation()) { default: llvm_unreachable("Unknown atomicrmw operation"); return; case AtomicRMWInst::Xchg: NT = ISD::ATOMIC_SWAP; break; case AtomicRMWInst::Add: NT = ISD::ATOMIC_LOAD_ADD; break; case AtomicRMWInst::Sub: NT = ISD::ATOMIC_LOAD_SUB; break; case AtomicRMWInst::And: NT = ISD::ATOMIC_LOAD_AND; break; case AtomicRMWInst::Nand: NT = ISD::ATOMIC_LOAD_NAND; break; case AtomicRMWInst::Or: NT = ISD::ATOMIC_LOAD_OR; break; case AtomicRMWInst::Xor: NT = ISD::ATOMIC_LOAD_XOR; break; case AtomicRMWInst::Max: NT = ISD::ATOMIC_LOAD_MAX; break; case AtomicRMWInst::Min: NT = ISD::ATOMIC_LOAD_MIN; break; case AtomicRMWInst::UMax: NT = ISD::ATOMIC_LOAD_UMAX; break; case AtomicRMWInst::UMin: NT = ISD::ATOMIC_LOAD_UMIN; break; } AtomicOrdering Order = I.getOrdering(); SynchronizationScope Scope = I.getSynchScope(); SDValue InChain = getRoot(); if (TLI.getInsertFencesForAtomic()) InChain = InsertFenceForAtomic(InChain, Order, Scope, true, dl, DAG, TLI); SDValue L = DAG.getAtomic(NT, dl, getValue(I.getValOperand()).getValueType().getSimpleVT(), InChain, getValue(I.getPointerOperand()), getValue(I.getValOperand()), I.getPointerOperand(), 0 /* Alignment */, TLI.getInsertFencesForAtomic() ? 
Monotonic : Order, Scope); SDValue OutChain = L.getValue(1); if (TLI.getInsertFencesForAtomic()) OutChain = InsertFenceForAtomic(OutChain, Order, Scope, false, dl, DAG, TLI); setValue(&I, L); DAG.setRoot(OutChain); } void SelectionDAGBuilder::visitFence(const FenceInst &I) { DebugLoc dl = getCurDebugLoc(); SDValue Ops[3]; Ops[0] = getRoot(); Ops[1] = DAG.getConstant(I.getOrdering(), TLI.getPointerTy()); Ops[2] = DAG.getConstant(I.getSynchScope(), TLI.getPointerTy()); DAG.setRoot(DAG.getNode(ISD::ATOMIC_FENCE, dl, MVT::Other, Ops, 3)); } void SelectionDAGBuilder::visitAtomicLoad(const LoadInst &I) { DebugLoc dl = getCurDebugLoc(); AtomicOrdering Order = I.getOrdering(); SynchronizationScope Scope = I.getSynchScope(); SDValue InChain = getRoot(); EVT VT = EVT::getEVT(I.getType()); if (I.getAlignment() * 8 < VT.getSizeInBits()) report_fatal_error("Cannot generate unaligned atomic load"); SDValue L = DAG.getAtomic(ISD::ATOMIC_LOAD, dl, VT, VT, InChain, getValue(I.getPointerOperand()), I.getPointerOperand(), I.getAlignment(), TLI.getInsertFencesForAtomic() ? Monotonic : Order, Scope); SDValue OutChain = L.getValue(1); if (TLI.getInsertFencesForAtomic()) OutChain = InsertFenceForAtomic(OutChain, Order, Scope, false, dl, DAG, TLI); setValue(&I, L); DAG.setRoot(OutChain); } void SelectionDAGBuilder::visitAtomicStore(const StoreInst &I) { DebugLoc dl = getCurDebugLoc(); AtomicOrdering Order = I.getOrdering(); SynchronizationScope Scope = I.getSynchScope(); SDValue InChain = getRoot(); EVT VT = EVT::getEVT(I.getValueOperand()->getType()); if (I.getAlignment() * 8 < VT.getSizeInBits()) report_fatal_error("Cannot generate unaligned atomic store"); if (TLI.getInsertFencesForAtomic()) InChain = InsertFenceForAtomic(InChain, Order, Scope, true, dl, DAG, TLI); SDValue OutChain = DAG.getAtomic(ISD::ATOMIC_STORE, dl, VT, InChain, getValue(I.getPointerOperand()), getValue(I.getValueOperand()), I.getPointerOperand(), I.getAlignment(), TLI.getInsertFencesForAtomic() ? Monotonic : Order, Scope); if (TLI.getInsertFencesForAtomic()) OutChain = InsertFenceForAtomic(OutChain, Order, Scope, false, dl, DAG, TLI); DAG.setRoot(OutChain); } /// visitTargetIntrinsic - Lower a call of a target intrinsic to an INTRINSIC /// node. void SelectionDAGBuilder::visitTargetIntrinsic(const CallInst &I, unsigned Intrinsic) { bool HasChain = !I.doesNotAccessMemory(); bool OnlyLoad = HasChain && I.onlyReadsMemory(); // Build the operand list. SmallVector Ops; if (HasChain) { // If this intrinsic has side-effects, chainify it. if (OnlyLoad) { // We don't need to serialize loads against other loads. Ops.push_back(DAG.getRoot()); } else { Ops.push_back(getRoot()); } } // Info is set by getTgtMemInstrinsic TargetLowering::IntrinsicInfo Info; bool IsTgtIntrinsic = TLI.getTgtMemIntrinsic(Info, I, Intrinsic); // Add the intrinsic ID as an integer operand if it's not a target intrinsic. if (!IsTgtIntrinsic || Info.opc == ISD::INTRINSIC_VOID || Info.opc == ISD::INTRINSIC_W_CHAIN) Ops.push_back(DAG.getConstant(Intrinsic, TLI.getPointerTy())); // Add all operands of the call to the operand list. 
for (unsigned i = 0, e = I.getNumArgOperands(); i != e; ++i) { SDValue Op = getValue(I.getArgOperand(i)); assert(TLI.isTypeLegal(Op.getValueType()) && "Intrinsic uses a non-legal type?"); Ops.push_back(Op); } SmallVector ValueVTs; ComputeValueVTs(TLI, I.getType(), ValueVTs); #ifndef NDEBUG for (unsigned Val = 0, E = ValueVTs.size(); Val != E; ++Val) { assert(TLI.isTypeLegal(ValueVTs[Val]) && "Intrinsic uses a non-legal type?"); } #endif // NDEBUG if (HasChain) ValueVTs.push_back(MVT::Other); SDVTList VTs = DAG.getVTList(ValueVTs.data(), ValueVTs.size()); // Create the node. SDValue Result; if (IsTgtIntrinsic) { // This is target intrinsic that touches memory Result = DAG.getMemIntrinsicNode(Info.opc, getCurDebugLoc(), VTs, &Ops[0], Ops.size(), Info.memVT, MachinePointerInfo(Info.ptrVal, Info.offset), Info.align, Info.vol, Info.readMem, Info.writeMem); } else if (!HasChain) { Result = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, getCurDebugLoc(), VTs, &Ops[0], Ops.size()); } else if (!I.getType()->isVoidTy()) { Result = DAG.getNode(ISD::INTRINSIC_W_CHAIN, getCurDebugLoc(), VTs, &Ops[0], Ops.size()); } else { Result = DAG.getNode(ISD::INTRINSIC_VOID, getCurDebugLoc(), VTs, &Ops[0], Ops.size()); } if (HasChain) { SDValue Chain = Result.getValue(Result.getNode()->getNumValues()-1); if (OnlyLoad) PendingLoads.push_back(Chain); else DAG.setRoot(Chain); } if (!I.getType()->isVoidTy()) { if (VectorType *PTy = dyn_cast(I.getType())) { EVT VT = TLI.getValueType(PTy); Result = DAG.getNode(ISD::BITCAST, getCurDebugLoc(), VT, Result); } setValue(&I, Result); } } /// GetSignificand - Get the significand and build it into a floating-point /// number with exponent of 1: /// /// Op = (Op & 0x007fffff) | 0x3f800000; /// /// where Op is the hexidecimal representation of floating point value. static SDValue GetSignificand(SelectionDAG &DAG, SDValue Op, DebugLoc dl) { SDValue t1 = DAG.getNode(ISD::AND, dl, MVT::i32, Op, DAG.getConstant(0x007fffff, MVT::i32)); SDValue t2 = DAG.getNode(ISD::OR, dl, MVT::i32, t1, DAG.getConstant(0x3f800000, MVT::i32)); return DAG.getNode(ISD::BITCAST, dl, MVT::f32, t2); } /// GetExponent - Get the exponent: /// /// (float)(int)(((Op & 0x7f800000) >> 23) - 127); /// /// where Op is the hexidecimal representation of floating point value. static SDValue GetExponent(SelectionDAG &DAG, SDValue Op, const TargetLowering &TLI, DebugLoc dl) { SDValue t0 = DAG.getNode(ISD::AND, dl, MVT::i32, Op, DAG.getConstant(0x7f800000, MVT::i32)); SDValue t1 = DAG.getNode(ISD::SRL, dl, MVT::i32, t0, DAG.getConstant(23, TLI.getPointerTy())); SDValue t2 = DAG.getNode(ISD::SUB, dl, MVT::i32, t1, DAG.getConstant(127, MVT::i32)); return DAG.getNode(ISD::SINT_TO_FP, dl, MVT::f32, t2); } /// getF32Constant - Get 32-bit floating point constant. static SDValue getF32Constant(SelectionDAG &DAG, unsigned Flt) { return DAG.getConstantFP(APFloat(APInt(32, Flt)), MVT::f32); } // implVisitAluOverflow - Lower arithmetic overflow instrinsics. const char * SelectionDAGBuilder::implVisitAluOverflow(const CallInst &I, ISD::NodeType Op) { SDValue Op1 = getValue(I.getArgOperand(0)); SDValue Op2 = getValue(I.getArgOperand(1)); SDVTList VTs = DAG.getVTList(Op1.getValueType(), MVT::i1); setValue(&I, DAG.getNode(Op, getCurDebugLoc(), VTs, Op1, Op2)); return 0; } /// visitExp - Lower an exp intrinsic. Handles the special sequences for /// limited-precision mode. 
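// GetSignificand and GetExponent above work directly on the IEEE-754
// single-precision bit layout: the unbiased exponent is
// ((bits >> 23) & 0xff) - 127, and the significand is rebuilt with the
// exponent field of 1.0f so its value lies in [1,2).  A standalone numeric
// check of that decomposition (sketch only; assumes a normal, positive input):
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

int main() {
  float X = 200.0f;
  uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));

  int Exponent = static_cast<int>((Bits & 0x7f800000u) >> 23) - 127;

  uint32_t MantBits = (Bits & 0x007fffffu) | 0x3f800000u;  // value in [1,2)
  float Mantissa;
  std::memcpy(&Mantissa, &MantBits, sizeof(Mantissa));

  assert(Exponent == 7);                        // 200 = 1.5625 * 2^7
  assert(Mantissa == 1.5625f);
  assert(std::ldexp(Mantissa, Exponent) == X);  // x == m * 2^e exactly
  return 0;
}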
void SelectionDAGBuilder::visitExp(const CallInst &I) { SDValue result; DebugLoc dl = getCurDebugLoc(); if (getValue(I.getArgOperand(0)).getValueType() == MVT::f32 && LimitFloatPrecision > 0 && LimitFloatPrecision <= 18) { SDValue Op = getValue(I.getArgOperand(0)); // Put the exponent in the right bit position for later addition to the // final result: // // #define LOG2OFe 1.4426950f // IntegerPartOfX = ((int32_t)(X * LOG2OFe)); SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, Op, getF32Constant(DAG, 0x3fb8aa3b)); SDValue IntegerPartOfX = DAG.getNode(ISD::FP_TO_SINT, dl, MVT::i32, t0); // FractionalPartOfX = (X * LOG2OFe) - (float)IntegerPartOfX; SDValue t1 = DAG.getNode(ISD::SINT_TO_FP, dl, MVT::f32, IntegerPartOfX); SDValue X = DAG.getNode(ISD::FSUB, dl, MVT::f32, t0, t1); // IntegerPartOfX <<= 23; IntegerPartOfX = DAG.getNode(ISD::SHL, dl, MVT::i32, IntegerPartOfX, DAG.getConstant(23, TLI.getPointerTy())); if (LimitFloatPrecision <= 6) { // For floating-point precision of 6: // // TwoToFractionalPartOfX = // 0.997535578f + // (0.735607626f + 0.252464424f * x) * x; // // error 0.0144103317, which is 6 bits SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0x3e814304)); SDValue t3 = DAG.getNode(ISD::FADD, dl, MVT::f32, t2, getF32Constant(DAG, 0x3f3c50c8)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x3f7f5e7e)); SDValue TwoToFracPartOfX = DAG.getNode(ISD::BITCAST, dl,MVT::i32, t5); // Add the exponent into the result in integer domain. SDValue t6 = DAG.getNode(ISD::ADD, dl, MVT::i32, TwoToFracPartOfX, IntegerPartOfX); result = DAG.getNode(ISD::BITCAST, dl, MVT::f32, t6); } else if (LimitFloatPrecision > 6 && LimitFloatPrecision <= 12) { // For floating-point precision of 12: // // TwoToFractionalPartOfX = // 0.999892986f + // (0.696457318f + // (0.224338339f + 0.792043434e-1f * x) * x) * x; // // 0.000107046256 error, which is 13 to 14 bits SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0x3da235e3)); SDValue t3 = DAG.getNode(ISD::FADD, dl, MVT::f32, t2, getF32Constant(DAG, 0x3e65b8f3)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x3f324b07)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); SDValue t7 = DAG.getNode(ISD::FADD, dl, MVT::f32, t6, getF32Constant(DAG, 0x3f7ff8fd)); SDValue TwoToFracPartOfX = DAG.getNode(ISD::BITCAST, dl,MVT::i32, t7); // Add the exponent into the result in integer domain. 
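// The limited-precision expansion above computes exp(x) as 2^I * 2^F with F
// in [0,1), approximating 2^F by the minimax polynomials quoted in the
// comments.  A standalone check of the degree-2 polynomial used for the 6-bit
// case; the comment above quotes an error of 0.0144103317, and the loop below
// measures it against exp2 rather than taking the figure on faith:
#include <cassert>
#include <cmath>

int main() {
  float MaxErr = 0.0f;
  for (int i = 0; i < 1000; ++i) {
    float X = i / 1000.0f;                       // fractional part in [0,1)
    float Approx = 0.997535578f + (0.735607626f + 0.252464424f * X) * X;
    float Err = std::fabs(Approx - std::exp2(X));
    if (Err > MaxErr)
      MaxErr = Err;
  }
  assert(MaxErr < 0.015f);   // consistent with the ~6-bit error bound above
  return 0;
}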
SDValue t8 = DAG.getNode(ISD::ADD, dl, MVT::i32, TwoToFracPartOfX, IntegerPartOfX); result = DAG.getNode(ISD::BITCAST, dl, MVT::f32, t8); } else { // LimitFloatPrecision > 12 && LimitFloatPrecision <= 18 // For floating-point precision of 18: // // TwoToFractionalPartOfX = // 0.999999982f + // (0.693148872f + // (0.240227044f + // (0.554906021e-1f + // (0.961591928e-2f + // (0.136028312e-2f + 0.157059148e-3f *x)*x)*x)*x)*x)*x; // // error 2.47208000*10^(-7), which is better than 18 bits SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0x3924b03e)); SDValue t3 = DAG.getNode(ISD::FADD, dl, MVT::f32, t2, getF32Constant(DAG, 0x3ab24b87)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x3c1d8c17)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); SDValue t7 = DAG.getNode(ISD::FADD, dl, MVT::f32, t6, getF32Constant(DAG, 0x3d634a1d)); SDValue t8 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t7, X); SDValue t9 = DAG.getNode(ISD::FADD, dl, MVT::f32, t8, getF32Constant(DAG, 0x3e75fe14)); SDValue t10 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t9, X); SDValue t11 = DAG.getNode(ISD::FADD, dl, MVT::f32, t10, getF32Constant(DAG, 0x3f317234)); SDValue t12 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t11, X); SDValue t13 = DAG.getNode(ISD::FADD, dl, MVT::f32, t12, getF32Constant(DAG, 0x3f800000)); SDValue TwoToFracPartOfX = DAG.getNode(ISD::BITCAST, dl, MVT::i32, t13); // Add the exponent into the result in integer domain. SDValue t14 = DAG.getNode(ISD::ADD, dl, MVT::i32, TwoToFracPartOfX, IntegerPartOfX); result = DAG.getNode(ISD::BITCAST, dl, MVT::f32, t14); } } else { // No special expansion. result = DAG.getNode(ISD::FEXP, dl, getValue(I.getArgOperand(0)).getValueType(), getValue(I.getArgOperand(0))); } setValue(&I, result); } /// visitLog - Lower a log intrinsic. Handles the special sequences for /// limited-precision mode. void SelectionDAGBuilder::visitLog(const CallInst &I) { SDValue result; DebugLoc dl = getCurDebugLoc(); if (getValue(I.getArgOperand(0)).getValueType() == MVT::f32 && LimitFloatPrecision > 0 && LimitFloatPrecision <= 18) { SDValue Op = getValue(I.getArgOperand(0)); SDValue Op1 = DAG.getNode(ISD::BITCAST, dl, MVT::i32, Op); // Scale the exponent by log(2) [0.69314718f]. SDValue Exp = GetExponent(DAG, Op1, TLI, dl); SDValue LogOfExponent = DAG.getNode(ISD::FMUL, dl, MVT::f32, Exp, getF32Constant(DAG, 0x3f317218)); // Get the significand and build it into a floating-point number with // exponent of 1. 
SDValue X = GetSignificand(DAG, Op1, dl); if (LimitFloatPrecision <= 6) { // For floating-point precision of 6: // // LogofMantissa = // -1.1609546f + // (1.4034025f - 0.23903021f * x) * x; // // error 0.0034276066, which is better than 8 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0xbe74c456)); SDValue t1 = DAG.getNode(ISD::FADD, dl, MVT::f32, t0, getF32Constant(DAG, 0x3fb3a2b1)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); SDValue LogOfMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t2, getF32Constant(DAG, 0x3f949a29)); result = DAG.getNode(ISD::FADD, dl, MVT::f32, LogOfExponent, LogOfMantissa); } else if (LimitFloatPrecision > 6 && LimitFloatPrecision <= 12) { // For floating-point precision of 12: // // LogOfMantissa = // -1.7417939f + // (2.8212026f + // (-1.4699568f + // (0.44717955f - 0.56570851e-1f * x) * x) * x) * x; // // error 0.000061011436, which is 14 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0xbd67b6d6)); SDValue t1 = DAG.getNode(ISD::FADD, dl, MVT::f32, t0, getF32Constant(DAG, 0x3ee4f4b8)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); SDValue t3 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t2, getF32Constant(DAG, 0x3fbc278b)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x40348e95)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); SDValue LogOfMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t6, getF32Constant(DAG, 0x3fdef31a)); result = DAG.getNode(ISD::FADD, dl, MVT::f32, LogOfExponent, LogOfMantissa); } else { // LimitFloatPrecision > 12 && LimitFloatPrecision <= 18 // For floating-point precision of 18: // // LogOfMantissa = // -2.1072184f + // (4.2372794f + // (-3.7029485f + // (2.2781945f + // (-0.87823314f + // (0.19073739f - 0.17809712e-1f * x) * x) * x) * x) * x)*x; // // error 0.0000023660568, which is better than 18 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0xbc91e5ac)); SDValue t1 = DAG.getNode(ISD::FADD, dl, MVT::f32, t0, getF32Constant(DAG, 0x3e4350aa)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); SDValue t3 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t2, getF32Constant(DAG, 0x3f60d3e3)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x4011cdf0)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); SDValue t7 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t6, getF32Constant(DAG, 0x406cfd1c)); SDValue t8 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t7, X); SDValue t9 = DAG.getNode(ISD::FADD, dl, MVT::f32, t8, getF32Constant(DAG, 0x408797cb)); SDValue t10 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t9, X); SDValue LogOfMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t10, getF32Constant(DAG, 0x4006dcab)); result = DAG.getNode(ISD::FADD, dl, MVT::f32, LogOfExponent, LogOfMantissa); } } else { // No special expansion. result = DAG.getNode(ISD::FLOG, dl, getValue(I.getArgOperand(0)).getValueType(), getValue(I.getArgOperand(0))); } setValue(&I, result); } /// visitLog2 - Lower a log2 intrinsic. Handles the special sequences for /// limited-precision mode. 
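// visitLog above and visitLog2/visitLog10 below all share one decomposition:
// write x = m * 2^e with m in [1,2); then log(x) = e*ln2 + log(m),
// log2(x) = e + log2(m) and log10(x) = e*log10(2) + log10(m), so only
// log*(m) over [1,2) needs a polynomial.  A quick numeric check of that
// identity using frexp (sketch only; the 0.69314718f constant is the ln(2)
// value quoted in visitLog above):
#include <cassert>
#include <cmath>

int main() {
  float X = 200.0f;
  int E;
  float M = std::frexp(X, &E);   // X == M * 2^E with M in [0.5, 1)
  M *= 2.0f;                     // renormalize to [1, 2), ...
  E -= 1;                        // ... matching the code above

  float Log2X = static_cast<float>(E) + std::log2(M);
  assert(std::fabs(Log2X - std::log2(X)) < 1e-4f);

  float LogX = static_cast<float>(E) * 0.69314718f + std::log(M);
  assert(std::fabs(LogX - std::log(X)) < 1e-4f);
  return 0;
}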
void SelectionDAGBuilder::visitLog2(const CallInst &I) { SDValue result; DebugLoc dl = getCurDebugLoc(); if (getValue(I.getArgOperand(0)).getValueType() == MVT::f32 && LimitFloatPrecision > 0 && LimitFloatPrecision <= 18) { SDValue Op = getValue(I.getArgOperand(0)); SDValue Op1 = DAG.getNode(ISD::BITCAST, dl, MVT::i32, Op); // Get the exponent. SDValue LogOfExponent = GetExponent(DAG, Op1, TLI, dl); // Get the significand and build it into a floating-point number with // exponent of 1. SDValue X = GetSignificand(DAG, Op1, dl); // Different possible minimax approximations of significand in // floating-point for various degrees of accuracy over [1,2]. if (LimitFloatPrecision <= 6) { // For floating-point precision of 6: // // Log2ofMantissa = -1.6749035f + (2.0246817f - .34484768f * x) * x; // // error 0.0049451742, which is more than 7 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0xbeb08fe0)); SDValue t1 = DAG.getNode(ISD::FADD, dl, MVT::f32, t0, getF32Constant(DAG, 0x40019463)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); SDValue Log2ofMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t2, getF32Constant(DAG, 0x3fd6633d)); result = DAG.getNode(ISD::FADD, dl, MVT::f32, LogOfExponent, Log2ofMantissa); } else if (LimitFloatPrecision > 6 && LimitFloatPrecision <= 12) { // For floating-point precision of 12: // // Log2ofMantissa = // -2.51285454f + // (4.07009056f + // (-2.12067489f + // (.645142248f - 0.816157886e-1f * x) * x) * x) * x; // // error 0.0000876136000, which is better than 13 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0xbda7262e)); SDValue t1 = DAG.getNode(ISD::FADD, dl, MVT::f32, t0, getF32Constant(DAG, 0x3f25280b)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); SDValue t3 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t2, getF32Constant(DAG, 0x4007b923)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x40823e2f)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); SDValue Log2ofMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t6, getF32Constant(DAG, 0x4020d29c)); result = DAG.getNode(ISD::FADD, dl, MVT::f32, LogOfExponent, Log2ofMantissa); } else { // LimitFloatPrecision > 12 && LimitFloatPrecision <= 18 // For floating-point precision of 18: // // Log2ofMantissa = // -3.0400495f + // (6.1129976f + // (-5.3420409f + // (3.2865683f + // (-1.2669343f + // (0.27515199f - // 0.25691327e-1f * x) * x) * x) * x) * x) * x; // // error 0.0000018516, which is better than 18 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0xbcd2769e)); SDValue t1 = DAG.getNode(ISD::FADD, dl, MVT::f32, t0, getF32Constant(DAG, 0x3e8ce0b9)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); SDValue t3 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t2, getF32Constant(DAG, 0x3fa22ae7)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x40525723)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); SDValue t7 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t6, getF32Constant(DAG, 0x40aaf200)); SDValue t8 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t7, X); SDValue t9 = DAG.getNode(ISD::FADD, dl, MVT::f32, t8, getF32Constant(DAG, 0x40c39dad)); SDValue t10 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t9, X); SDValue Log2ofMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t10, getF32Constant(DAG, 0x4042902c)); result = DAG.getNode(ISD::FADD, dl, 
MVT::f32, LogOfExponent, Log2ofMantissa); } } else { // No special expansion. result = DAG.getNode(ISD::FLOG2, dl, getValue(I.getArgOperand(0)).getValueType(), getValue(I.getArgOperand(0))); } setValue(&I, result); } /// visitLog10 - Lower a log10 intrinsic. Handles the special sequences for /// limited-precision mode. void SelectionDAGBuilder::visitLog10(const CallInst &I) { SDValue result; DebugLoc dl = getCurDebugLoc(); if (getValue(I.getArgOperand(0)).getValueType() == MVT::f32 && LimitFloatPrecision > 0 && LimitFloatPrecision <= 18) { SDValue Op = getValue(I.getArgOperand(0)); SDValue Op1 = DAG.getNode(ISD::BITCAST, dl, MVT::i32, Op); // Scale the exponent by log10(2) [0.30102999f]. SDValue Exp = GetExponent(DAG, Op1, TLI, dl); SDValue LogOfExponent = DAG.getNode(ISD::FMUL, dl, MVT::f32, Exp, getF32Constant(DAG, 0x3e9a209a)); // Get the significand and build it into a floating-point number with // exponent of 1. SDValue X = GetSignificand(DAG, Op1, dl); if (LimitFloatPrecision <= 6) { // For floating-point precision of 6: // // Log10ofMantissa = // -0.50419619f + // (0.60948995f - 0.10380950f * x) * x; // // error 0.0014886165, which is 6 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0xbdd49a13)); SDValue t1 = DAG.getNode(ISD::FADD, dl, MVT::f32, t0, getF32Constant(DAG, 0x3f1c0789)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); SDValue Log10ofMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t2, getF32Constant(DAG, 0x3f011300)); result = DAG.getNode(ISD::FADD, dl, MVT::f32, LogOfExponent, Log10ofMantissa); } else if (LimitFloatPrecision > 6 && LimitFloatPrecision <= 12) { // For floating-point precision of 12: // // Log10ofMantissa = // -0.64831180f + // (0.91751397f + // (-0.31664806f + 0.47637168e-1f * x) * x) * x; // // error 0.00019228036, which is better than 12 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0x3d431f31)); SDValue t1 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t0, getF32Constant(DAG, 0x3ea21fb2)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); SDValue t3 = DAG.getNode(ISD::FADD, dl, MVT::f32, t2, getF32Constant(DAG, 0x3f6ae232)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue Log10ofMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t4, getF32Constant(DAG, 0x3f25f7c3)); result = DAG.getNode(ISD::FADD, dl, MVT::f32, LogOfExponent, Log10ofMantissa); } else { // LimitFloatPrecision > 12 && LimitFloatPrecision <= 18 // For floating-point precision of 18: // // Log10ofMantissa = // -0.84299375f + // (1.5327582f + // (-1.0688956f + // (0.49102474f + // (-0.12539807f + 0.13508273e-1f * x) * x) * x) * x) * x; // // error 0.0000037995730, which is better than 18 bits SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0x3c5d51ce)); SDValue t1 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t0, getF32Constant(DAG, 0x3e00685a)); SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t1, X); SDValue t3 = DAG.getNode(ISD::FADD, dl, MVT::f32, t2, getF32Constant(DAG, 0x3efb6798)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FSUB, dl, MVT::f32, t4, getF32Constant(DAG, 0x3f88d192)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); SDValue t7 = DAG.getNode(ISD::FADD, dl, MVT::f32, t6, getF32Constant(DAG, 0x3fc4316c)); SDValue t8 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t7, X); SDValue Log10ofMantissa = DAG.getNode(ISD::FSUB, dl, MVT::f32, t8, getF32Constant(DAG, 0x3f57ce70)); result = DAG.getNode(ISD::FADD, dl, MVT::f32, 
LogOfExponent, Log10ofMantissa); } } else { // No special expansion. result = DAG.getNode(ISD::FLOG10, dl, getValue(I.getArgOperand(0)).getValueType(), getValue(I.getArgOperand(0))); } setValue(&I, result); } /// visitExp2 - Lower an exp2 intrinsic. Handles the special sequences for /// limited-precision mode. void SelectionDAGBuilder::visitExp2(const CallInst &I) { SDValue result; DebugLoc dl = getCurDebugLoc(); if (getValue(I.getArgOperand(0)).getValueType() == MVT::f32 && LimitFloatPrecision > 0 && LimitFloatPrecision <= 18) { SDValue Op = getValue(I.getArgOperand(0)); SDValue IntegerPartOfX = DAG.getNode(ISD::FP_TO_SINT, dl, MVT::i32, Op); // FractionalPartOfX = x - (float)IntegerPartOfX; SDValue t1 = DAG.getNode(ISD::SINT_TO_FP, dl, MVT::f32, IntegerPartOfX); SDValue X = DAG.getNode(ISD::FSUB, dl, MVT::f32, Op, t1); // IntegerPartOfX <<= 23; IntegerPartOfX = DAG.getNode(ISD::SHL, dl, MVT::i32, IntegerPartOfX, DAG.getConstant(23, TLI.getPointerTy())); if (LimitFloatPrecision <= 6) { // For floating-point precision of 6: // // TwoToFractionalPartOfX = // 0.997535578f + // (0.735607626f + 0.252464424f * x) * x; // // error 0.0144103317, which is 6 bits SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0x3e814304)); SDValue t3 = DAG.getNode(ISD::FADD, dl, MVT::f32, t2, getF32Constant(DAG, 0x3f3c50c8)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x3f7f5e7e)); SDValue t6 = DAG.getNode(ISD::BITCAST, dl, MVT::i32, t5); SDValue TwoToFractionalPartOfX = DAG.getNode(ISD::ADD, dl, MVT::i32, t6, IntegerPartOfX); result = DAG.getNode(ISD::BITCAST, dl, MVT::f32, TwoToFractionalPartOfX); } else if (LimitFloatPrecision > 6 && LimitFloatPrecision <= 12) { // For floating-point precision of 12: // // TwoToFractionalPartOfX = // 0.999892986f + // (0.696457318f + // (0.224338339f + 0.792043434e-1f * x) * x) * x; // // error 0.000107046256, which is 13 to 14 bits SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0x3da235e3)); SDValue t3 = DAG.getNode(ISD::FADD, dl, MVT::f32, t2, getF32Constant(DAG, 0x3e65b8f3)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x3f324b07)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); SDValue t7 = DAG.getNode(ISD::FADD, dl, MVT::f32, t6, getF32Constant(DAG, 0x3f7ff8fd)); SDValue t8 = DAG.getNode(ISD::BITCAST, dl, MVT::i32, t7); SDValue TwoToFractionalPartOfX = DAG.getNode(ISD::ADD, dl, MVT::i32, t8, IntegerPartOfX); result = DAG.getNode(ISD::BITCAST, dl, MVT::f32, TwoToFractionalPartOfX); } else { // LimitFloatPrecision > 12 && LimitFloatPrecision <= 18 // For floating-point precision of 18: // // TwoToFractionalPartOfX = // 0.999999982f + // (0.693148872f + // (0.240227044f + // (0.554906021e-1f + // (0.961591928e-2f + // (0.136028312e-2f + 0.157059148e-3f *x)*x)*x)*x)*x)*x; // error 2.47208000*10^(-7), which is better than 18 bits SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0x3924b03e)); SDValue t3 = DAG.getNode(ISD::FADD, dl, MVT::f32, t2, getF32Constant(DAG, 0x3ab24b87)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x3c1d8c17)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); SDValue t7 = DAG.getNode(ISD::FADD, dl, MVT::f32, t6, getF32Constant(DAG, 0x3d634a1d)); SDValue t8 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t7, 
X); SDValue t9 = DAG.getNode(ISD::FADD, dl, MVT::f32, t8, getF32Constant(DAG, 0x3e75fe14)); SDValue t10 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t9, X); SDValue t11 = DAG.getNode(ISD::FADD, dl, MVT::f32, t10, getF32Constant(DAG, 0x3f317234)); SDValue t12 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t11, X); SDValue t13 = DAG.getNode(ISD::FADD, dl, MVT::f32, t12, getF32Constant(DAG, 0x3f800000)); SDValue t14 = DAG.getNode(ISD::BITCAST, dl, MVT::i32, t13); SDValue TwoToFractionalPartOfX = DAG.getNode(ISD::ADD, dl, MVT::i32, t14, IntegerPartOfX); result = DAG.getNode(ISD::BITCAST, dl, MVT::f32, TwoToFractionalPartOfX); } } else { // No special expansion. result = DAG.getNode(ISD::FEXP2, dl, getValue(I.getArgOperand(0)).getValueType(), getValue(I.getArgOperand(0))); } setValue(&I, result); } /// visitPow - Lower a pow intrinsic. Handles the special sequences for /// limited-precision mode with x == 10.0f. void SelectionDAGBuilder::visitPow(const CallInst &I) { SDValue result; const Value *Val = I.getArgOperand(0); DebugLoc dl = getCurDebugLoc(); bool IsExp10 = false; if (getValue(Val).getValueType() == MVT::f32 && getValue(I.getArgOperand(1)).getValueType() == MVT::f32 && LimitFloatPrecision > 0 && LimitFloatPrecision <= 18) { if (Constant *C = const_cast(dyn_cast(Val))) { if (ConstantFP *CFP = dyn_cast(C)) { APFloat Ten(10.0f); IsExp10 = CFP->getValueAPF().bitwiseIsEqual(Ten); } } } if (IsExp10 && LimitFloatPrecision > 0 && LimitFloatPrecision <= 18) { SDValue Op = getValue(I.getArgOperand(1)); // Put the exponent in the right bit position for later addition to the // final result: // // #define LOG2OF10 3.3219281f // IntegerPartOfX = (int32_t)(x * LOG2OF10); SDValue t0 = DAG.getNode(ISD::FMUL, dl, MVT::f32, Op, getF32Constant(DAG, 0x40549a78)); SDValue IntegerPartOfX = DAG.getNode(ISD::FP_TO_SINT, dl, MVT::i32, t0); // FractionalPartOfX = x - (float)IntegerPartOfX; SDValue t1 = DAG.getNode(ISD::SINT_TO_FP, dl, MVT::f32, IntegerPartOfX); SDValue X = DAG.getNode(ISD::FSUB, dl, MVT::f32, t0, t1); // IntegerPartOfX <<= 23; IntegerPartOfX = DAG.getNode(ISD::SHL, dl, MVT::i32, IntegerPartOfX, DAG.getConstant(23, TLI.getPointerTy())); if (LimitFloatPrecision <= 6) { // For floating-point precision of 6: // // twoToFractionalPartOfX = // 0.997535578f + // (0.735607626f + 0.252464424f * x) * x; // // error 0.0144103317, which is 6 bits SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0x3e814304)); SDValue t3 = DAG.getNode(ISD::FADD, dl, MVT::f32, t2, getF32Constant(DAG, 0x3f3c50c8)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x3f7f5e7e)); SDValue t6 = DAG.getNode(ISD::BITCAST, dl, MVT::i32, t5); SDValue TwoToFractionalPartOfX = DAG.getNode(ISD::ADD, dl, MVT::i32, t6, IntegerPartOfX); result = DAG.getNode(ISD::BITCAST, dl, MVT::f32, TwoToFractionalPartOfX); } else if (LimitFloatPrecision > 6 && LimitFloatPrecision <= 12) { // For floating-point precision of 12: // // TwoToFractionalPartOfX = // 0.999892986f + // (0.696457318f + // (0.224338339f + 0.792043434e-1f * x) * x) * x; // // error 0.000107046256, which is 13 to 14 bits SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0x3da235e3)); SDValue t3 = DAG.getNode(ISD::FADD, dl, MVT::f32, t2, getF32Constant(DAG, 0x3e65b8f3)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x3f324b07)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, 
X); SDValue t7 = DAG.getNode(ISD::FADD, dl, MVT::f32, t6, getF32Constant(DAG, 0x3f7ff8fd)); SDValue t8 = DAG.getNode(ISD::BITCAST, dl, MVT::i32, t7); SDValue TwoToFractionalPartOfX = DAG.getNode(ISD::ADD, dl, MVT::i32, t8, IntegerPartOfX); result = DAG.getNode(ISD::BITCAST, dl, MVT::f32, TwoToFractionalPartOfX); } else { // LimitFloatPrecision > 12 && LimitFloatPrecision <= 18 // For floating-point precision of 18: // // TwoToFractionalPartOfX = // 0.999999982f + // (0.693148872f + // (0.240227044f + // (0.554906021e-1f + // (0.961591928e-2f + // (0.136028312e-2f + 0.157059148e-3f *x)*x)*x)*x)*x)*x; // error 2.47208000*10^(-7), which is better than 18 bits SDValue t2 = DAG.getNode(ISD::FMUL, dl, MVT::f32, X, getF32Constant(DAG, 0x3924b03e)); SDValue t3 = DAG.getNode(ISD::FADD, dl, MVT::f32, t2, getF32Constant(DAG, 0x3ab24b87)); SDValue t4 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t3, X); SDValue t5 = DAG.getNode(ISD::FADD, dl, MVT::f32, t4, getF32Constant(DAG, 0x3c1d8c17)); SDValue t6 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t5, X); SDValue t7 = DAG.getNode(ISD::FADD, dl, MVT::f32, t6, getF32Constant(DAG, 0x3d634a1d)); SDValue t8 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t7, X); SDValue t9 = DAG.getNode(ISD::FADD, dl, MVT::f32, t8, getF32Constant(DAG, 0x3e75fe14)); SDValue t10 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t9, X); SDValue t11 = DAG.getNode(ISD::FADD, dl, MVT::f32, t10, getF32Constant(DAG, 0x3f317234)); SDValue t12 = DAG.getNode(ISD::FMUL, dl, MVT::f32, t11, X); SDValue t13 = DAG.getNode(ISD::FADD, dl, MVT::f32, t12, getF32Constant(DAG, 0x3f800000)); SDValue t14 = DAG.getNode(ISD::BITCAST, dl, MVT::i32, t13); SDValue TwoToFractionalPartOfX = DAG.getNode(ISD::ADD, dl, MVT::i32, t14, IntegerPartOfX); result = DAG.getNode(ISD::BITCAST, dl, MVT::f32, TwoToFractionalPartOfX); } } else { // No special expansion. result = DAG.getNode(ISD::FPOW, dl, getValue(I.getArgOperand(0)).getValueType(), getValue(I.getArgOperand(0)), getValue(I.getArgOperand(1))); } setValue(&I, result); } /// ExpandPowI - Expand a llvm.powi intrinsic. static SDValue ExpandPowI(DebugLoc DL, SDValue LHS, SDValue RHS, SelectionDAG &DAG) { // If RHS is a constant, we can expand this out to a multiplication tree, // otherwise we end up lowering to a call to __powidf2 (for example). When // optimizing for size, we only want to do this if the expansion would produce // a small number of multiplies, otherwise we do the full expansion. if (ConstantSDNode *RHSC = dyn_cast(RHS)) { // Get the exponent as a positive value. unsigned Val = RHSC->getSExtValue(); if ((int)Val < 0) Val = -Val; // powi(x, 0) -> 1.0 if (Val == 0) return DAG.getConstantFP(1.0, LHS.getValueType()); const Function *F = DAG.getMachineFunction().getFunction(); if (!F->hasFnAttr(Attribute::OptimizeForSize) || // If optimizing for size, don't insert too many multiplies. This // inserts up to 5 multiplies. CountPopulation_32(Val)+Log2_32(Val) < 7) { // We use the simple binary decomposition method to generate the multiply // sequence. There are more optimal ways to do this (for example, // powi(x,15) generates one more multiply than it should), but this has // the benefit of being both really simple and much better than a libcall. SDValue Res; // Logically starts equal to 1.0 SDValue CurSquare = LHS; while (Val) { if (Val & 1) { if (Res.getNode()) Res = DAG.getNode(ISD::FMUL, DL,Res.getValueType(), Res, CurSquare); else Res = CurSquare; // 1.0*CurSquare. 
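// The loop here in ExpandPowI is plain binary decomposition (square and
// multiply): square the base once per exponent bit and fold the current
// square into the result whenever the bit is set, so the cost is roughly
// CountPopulation(Val) + Log2(Val) multiplies -- the size heuristic checked
// just above.  A scalar version with a multiply counter (illustrative names):
#include <cassert>

static double powiBySquaring(double X, unsigned Val, unsigned &NumMuls) {
  double Res = 1.0;
  double CurSquare = X;
  NumMuls = 0;
  while (Val) {
    if (Val & 1) {
      Res *= CurSquare;          // fold this power-of-two power of X into Res
      ++NumMuls;
    }
    Val >>= 1;
    if (Val) {
      CurSquare *= CurSquare;    // X^1, X^2, X^4, X^8, ...
      ++NumMuls;
    }
  }
  return Res;
}

int main() {
  unsigned NumMuls = 0;
  assert(powiBySquaring(2.0, 15, NumMuls) == 32768.0);
  assert(NumMuls <= 7);          // popcount(15) + log2(15) = 4 + 3
  assert(powiBySquaring(3.0, 5, NumMuls) == 243.0);
  return 0;
}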
} CurSquare = DAG.getNode(ISD::FMUL, DL, CurSquare.getValueType(), CurSquare, CurSquare); Val >>= 1; } // If the original was negative, invert the result, producing 1/(x*x*x). if (RHSC->getSExtValue() < 0) Res = DAG.getNode(ISD::FDIV, DL, LHS.getValueType(), DAG.getConstantFP(1.0, LHS.getValueType()), Res); return Res; } } // Otherwise, expand to a libcall. return DAG.getNode(ISD::FPOWI, DL, LHS.getValueType(), LHS, RHS); } // getTruncatedArgReg - Find underlying register used for an truncated // argument. static unsigned getTruncatedArgReg(const SDValue &N) { if (N.getOpcode() != ISD::TRUNCATE) return 0; const SDValue &Ext = N.getOperand(0); if (Ext.getOpcode() == ISD::AssertZext || Ext.getOpcode() == ISD::AssertSext){ const SDValue &CFR = Ext.getOperand(0); if (CFR.getOpcode() == ISD::CopyFromReg) return cast(CFR.getOperand(1))->getReg(); else if (CFR.getOpcode() == ISD::TRUNCATE) return getTruncatedArgReg(CFR); } return 0; } /// EmitFuncArgumentDbgValue - If the DbgValueInst is a dbg_value of a function /// argument, create the corresponding DBG_VALUE machine instruction for it now. /// At the end of instruction selection, they will be inserted to the entry BB. bool SelectionDAGBuilder::EmitFuncArgumentDbgValue(const Value *V, MDNode *Variable, int64_t Offset, const SDValue &N) { const Argument *Arg = dyn_cast(V); if (!Arg) return false; MachineFunction &MF = DAG.getMachineFunction(); const TargetInstrInfo *TII = DAG.getTarget().getInstrInfo(); const TargetRegisterInfo *TRI = DAG.getTarget().getRegisterInfo(); // Ignore inlined function arguments here. DIVariable DV(Variable); if (DV.isInlinedFnArgument(MF.getFunction())) return false; unsigned Reg = 0; // Some arguments' frame index is recorded during argument lowering. Offset = FuncInfo.getArgumentFrameIndex(Arg); if (Offset) Reg = TRI->getFrameRegister(MF); if (!Reg && N.getNode()) { if (N.getOpcode() == ISD::CopyFromReg) Reg = cast(N.getOperand(1))->getReg(); else Reg = getTruncatedArgReg(N); if (Reg && TargetRegisterInfo::isVirtualRegister(Reg)) { MachineRegisterInfo &RegInfo = MF.getRegInfo(); unsigned PR = RegInfo.getLiveInPhysReg(Reg); if (PR) Reg = PR; } } if (!Reg) { // Check if ValueMap has reg number. DenseMap::iterator VMI = FuncInfo.ValueMap.find(V); if (VMI != FuncInfo.ValueMap.end()) Reg = VMI->second; } if (!Reg && N.getNode()) { // Check if frame index is available. if (LoadSDNode *LNode = dyn_cast(N.getNode())) if (FrameIndexSDNode *FINode = dyn_cast(LNode->getBasePtr().getNode())) { Reg = TRI->getFrameRegister(MF); Offset = FINode->getIndex(); } } if (!Reg) return false; MachineInstrBuilder MIB = BuildMI(MF, getCurDebugLoc(), TII->get(TargetOpcode::DBG_VALUE)) .addReg(Reg, RegState::Debug).addImm(Offset).addMetadata(Variable); FuncInfo.ArgDbgValues.push_back(&*MIB); return true; } // VisualStudio defines setjmp as _setjmp #if defined(_MSC_VER) && defined(setjmp) && \ !defined(setjmp_undefined_for_msvc) # pragma push_macro("setjmp") # undef setjmp # define setjmp_undefined_for_msvc #endif /// visitIntrinsicCall - Lower the call to the specified intrinsic function. If /// we want to emit this as a call to a named external function, return the name /// otherwise lower it and return null. const char * SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, unsigned Intrinsic) { DebugLoc dl = getCurDebugLoc(); SDValue Res; switch (Intrinsic) { default: // By default, turn this into a target intrinsic node. 
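// The setjmp/longjmp cases just below return "_setjmp"+!TLI.usesUnderscoreSetJmp():
// the '+' is ordinary pointer arithmetic on the string literal, so the result
// is either "_setjmp" (offset 0) or "setjmp" (offset 1, skipping the leading
// underscore).  A tiny standalone demonstration of the idiom:
#include <cassert>
#include <cstring>

static const char *setjmpName(bool UsesUnderscore) {
  return "_setjmp" + !UsesUnderscore;   // same trick as the intrinsic lowering
}

int main() {
  assert(std::strcmp(setjmpName(true), "_setjmp") == 0);
  assert(std::strcmp(setjmpName(false), "setjmp") == 0);
  return 0;
}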
visitTargetIntrinsic(I, Intrinsic); return 0; case Intrinsic::vastart: visitVAStart(I); return 0; case Intrinsic::vaend: visitVAEnd(I); return 0; case Intrinsic::vacopy: visitVACopy(I); return 0; case Intrinsic::returnaddress: setValue(&I, DAG.getNode(ISD::RETURNADDR, dl, TLI.getPointerTy(), getValue(I.getArgOperand(0)))); return 0; case Intrinsic::frameaddress: setValue(&I, DAG.getNode(ISD::FRAMEADDR, dl, TLI.getPointerTy(), getValue(I.getArgOperand(0)))); return 0; case Intrinsic::setjmp: return "_setjmp"+!TLI.usesUnderscoreSetJmp(); case Intrinsic::longjmp: return "_longjmp"+!TLI.usesUnderscoreLongJmp(); case Intrinsic::memcpy: { // Assert for address < 256 since we support only user defined address // spaces. assert(cast(I.getArgOperand(0)->getType())->getAddressSpace() < 256 && cast(I.getArgOperand(1)->getType())->getAddressSpace() < 256 && "Unknown address space"); SDValue Op1 = getValue(I.getArgOperand(0)); SDValue Op2 = getValue(I.getArgOperand(1)); SDValue Op3 = getValue(I.getArgOperand(2)); unsigned Align = cast(I.getArgOperand(3))->getZExtValue(); bool isVol = cast(I.getArgOperand(4))->getZExtValue(); DAG.setRoot(DAG.getMemcpy(getRoot(), dl, Op1, Op2, Op3, Align, isVol, false, MachinePointerInfo(I.getArgOperand(0)), MachinePointerInfo(I.getArgOperand(1)))); return 0; } case Intrinsic::memset: { // Assert for address < 256 since we support only user defined address // spaces. assert(cast(I.getArgOperand(0)->getType())->getAddressSpace() < 256 && "Unknown address space"); SDValue Op1 = getValue(I.getArgOperand(0)); SDValue Op2 = getValue(I.getArgOperand(1)); SDValue Op3 = getValue(I.getArgOperand(2)); unsigned Align = cast(I.getArgOperand(3))->getZExtValue(); bool isVol = cast(I.getArgOperand(4))->getZExtValue(); DAG.setRoot(DAG.getMemset(getRoot(), dl, Op1, Op2, Op3, Align, isVol, MachinePointerInfo(I.getArgOperand(0)))); return 0; } case Intrinsic::memmove: { // Assert for address < 256 since we support only user defined address // spaces. assert(cast(I.getArgOperand(0)->getType())->getAddressSpace() < 256 && cast(I.getArgOperand(1)->getType())->getAddressSpace() < 256 && "Unknown address space"); SDValue Op1 = getValue(I.getArgOperand(0)); SDValue Op2 = getValue(I.getArgOperand(1)); SDValue Op3 = getValue(I.getArgOperand(2)); unsigned Align = cast(I.getArgOperand(3))->getZExtValue(); bool isVol = cast(I.getArgOperand(4))->getZExtValue(); DAG.setRoot(DAG.getMemmove(getRoot(), dl, Op1, Op2, Op3, Align, isVol, MachinePointerInfo(I.getArgOperand(0)), MachinePointerInfo(I.getArgOperand(1)))); return 0; } case Intrinsic::dbg_declare: { const DbgDeclareInst &DI = cast(I); MDNode *Variable = DI.getVariable(); const Value *Address = DI.getAddress(); if (!Address || !DIVariable(Variable).Verify()) return 0; // Build an entry in DbgOrdering. Debug info input nodes get an SDNodeOrder // but do not always have a corresponding SDNode built. The SDNodeOrder // absolute, but not relative, values are different depending on whether // debug info exists. ++SDNodeOrder; // Check if address has undef value. if (isa(Address) || (Address->use_empty() && !isa(Address))) { DEBUG(dbgs() << "Dropping debug info for " << DI); return 0; } SDValue &N = NodeMap[Address]; if (!N.getNode() && isa(Address)) // Check unused arguments map. N = UnusedArgNodeMap[Address]; SDDbgValue *SDV; if (N.getNode()) { // Parameters are handled specially. 
bool isParameter = DIVariable(Variable).getTag() == dwarf::DW_TAG_arg_variable; if (const BitCastInst *BCI = dyn_cast(Address)) Address = BCI->getOperand(0); const AllocaInst *AI = dyn_cast(Address); if (isParameter && !AI) { FrameIndexSDNode *FINode = dyn_cast(N.getNode()); if (FINode) // Byval parameter. We have a frame index at this point. SDV = DAG.getDbgValue(Variable, FINode->getIndex(), 0, dl, SDNodeOrder); else { // Address is an argument, so try to emit its dbg value using // virtual register info from the FuncInfo.ValueMap. EmitFuncArgumentDbgValue(Address, Variable, 0, N); return 0; } } else if (AI) SDV = DAG.getDbgValue(Variable, N.getNode(), N.getResNo(), 0, dl, SDNodeOrder); else { // Can't do anything with other non-AI cases yet. DEBUG(dbgs() << "Dropping debug info for " << DI); return 0; } DAG.AddDbgValue(SDV, N.getNode(), isParameter); } else { // If Address is an argument then try to emit its dbg value using // virtual register info from the FuncInfo.ValueMap. if (!EmitFuncArgumentDbgValue(Address, Variable, 0, N)) { // If variable is pinned by a alloca in dominating bb then // use StaticAllocaMap. if (const AllocaInst *AI = dyn_cast(Address)) { if (AI->getParent() != DI.getParent()) { DenseMap::iterator SI = FuncInfo.StaticAllocaMap.find(AI); if (SI != FuncInfo.StaticAllocaMap.end()) { SDV = DAG.getDbgValue(Variable, SI->second, 0, dl, SDNodeOrder); DAG.AddDbgValue(SDV, 0, false); return 0; } } } DEBUG(dbgs() << "Dropping debug info for " << DI); } } return 0; } case Intrinsic::dbg_value: { const DbgValueInst &DI = cast(I); if (!DIVariable(DI.getVariable()).Verify()) return 0; MDNode *Variable = DI.getVariable(); uint64_t Offset = DI.getOffset(); const Value *V = DI.getValue(); if (!V) return 0; // Build an entry in DbgOrdering. Debug info input nodes get an SDNodeOrder // but do not always have a corresponding SDNode built. The SDNodeOrder // absolute, but not relative, values are different depending on whether // debug info exists. ++SDNodeOrder; SDDbgValue *SDV; if (isa(V) || isa(V) || isa(V)) { SDV = DAG.getDbgValue(Variable, V, Offset, dl, SDNodeOrder); DAG.AddDbgValue(SDV, 0, false); } else { // Do not use getValue() in here; we don't want to generate code at // this point if it hasn't been done yet. SDValue N = NodeMap[V]; if (!N.getNode() && isa(V)) // Check unused arguments map. N = UnusedArgNodeMap[V]; if (N.getNode()) { if (!EmitFuncArgumentDbgValue(V, Variable, Offset, N)) { SDV = DAG.getDbgValue(Variable, N.getNode(), N.getResNo(), Offset, dl, SDNodeOrder); DAG.AddDbgValue(SDV, N.getNode(), false); } } else if (!V->use_empty() ) { // Do not call getValue(V) yet, as we don't want to generate code. // Remember it for later. DanglingDebugInfo DDI(&DI, dl, SDNodeOrder); DanglingDebugInfoMap[V] = DDI; } else { // We may expand this to cover more cases. One case where we have no // data available is an unreferenced parameter. DEBUG(dbgs() << "Dropping debug info for " << DI); } } // Build a debug info table entry. if (const BitCastInst *BCI = dyn_cast(V)) V = BCI->getOperand(0); const AllocaInst *AI = dyn_cast(V); // Don't handle byval struct arguments or VLAs, for example. if (!AI) return 0; DenseMap::iterator SI = FuncInfo.StaticAllocaMap.find(AI); if (SI == FuncInfo.StaticAllocaMap.end()) return 0; // VLAs. 
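// Illustrative note: when the dbg.value's value operand is (possibly a
// bitcast of) a static alloca, e.g. an IR local like %x.addr = alloca i32,
// the (Variable, FrameIndex, DebugLoc) triple is also handed to
// MachineModuleInfo just below via setVariableDbgInfo, in addition to any
// SDDbgValue created above.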
int FI = SI->second; MachineModuleInfo &MMI = DAG.getMachineFunction().getMMI(); if (!DI.getDebugLoc().isUnknown() && MMI.hasDebugInfo()) MMI.setVariableDbgInfo(Variable, FI, DI.getDebugLoc()); return 0; } case Intrinsic::eh_exception: { // Insert the EXCEPTIONADDR instruction. assert(FuncInfo.MBB->isLandingPad() && "Call to eh.exception not in landing pad!"); SDVTList VTs = DAG.getVTList(TLI.getPointerTy(), MVT::Other); SDValue Ops[1]; Ops[0] = DAG.getRoot(); SDValue Op = DAG.getNode(ISD::EXCEPTIONADDR, dl, VTs, Ops, 1); setValue(&I, Op); DAG.setRoot(Op.getValue(1)); return 0; } case Intrinsic::eh_selector: { MachineBasicBlock *CallMBB = FuncInfo.MBB; MachineModuleInfo &MMI = DAG.getMachineFunction().getMMI(); if (CallMBB->isLandingPad()) AddCatchInfo(I, &MMI, CallMBB); else { #ifndef NDEBUG FuncInfo.CatchInfoLost.insert(&I); #endif // FIXME: Mark exception selector register as live in. Hack for PR1508. unsigned Reg = TLI.getExceptionSelectorRegister(); if (Reg) FuncInfo.MBB->addLiveIn(Reg); } // Insert the EHSELECTION instruction. SDVTList VTs = DAG.getVTList(TLI.getPointerTy(), MVT::Other); SDValue Ops[2]; Ops[0] = getValue(I.getArgOperand(0)); Ops[1] = getRoot(); SDValue Op = DAG.getNode(ISD::EHSELECTION, dl, VTs, Ops, 2); DAG.setRoot(Op.getValue(1)); setValue(&I, DAG.getSExtOrTrunc(Op, dl, MVT::i32)); return 0; } case Intrinsic::eh_typeid_for: { // Find the type id for the given typeinfo. GlobalVariable *GV = ExtractTypeInfo(I.getArgOperand(0)); unsigned TypeID = DAG.getMachineFunction().getMMI().getTypeIDFor(GV); Res = DAG.getConstant(TypeID, MVT::i32); setValue(&I, Res); return 0; } case Intrinsic::eh_return_i32: case Intrinsic::eh_return_i64: DAG.getMachineFunction().getMMI().setCallsEHReturn(true); DAG.setRoot(DAG.getNode(ISD::EH_RETURN, dl, MVT::Other, getControlRoot(), getValue(I.getArgOperand(0)), getValue(I.getArgOperand(1)))); return 0; case Intrinsic::eh_unwind_init: DAG.getMachineFunction().getMMI().setCallsUnwindInit(true); return 0; case Intrinsic::eh_dwarf_cfa: { SDValue CfaArg = DAG.getSExtOrTrunc(getValue(I.getArgOperand(0)), dl, TLI.getPointerTy()); SDValue Offset = DAG.getNode(ISD::ADD, dl, TLI.getPointerTy(), DAG.getNode(ISD::FRAME_TO_ARGS_OFFSET, dl, TLI.getPointerTy()), CfaArg); SDValue FA = DAG.getNode(ISD::FRAMEADDR, dl, TLI.getPointerTy(), DAG.getConstant(0, TLI.getPointerTy())); setValue(&I, DAG.getNode(ISD::ADD, dl, TLI.getPointerTy(), FA, Offset)); return 0; } case Intrinsic::eh_sjlj_callsite: { MachineModuleInfo &MMI = DAG.getMachineFunction().getMMI(); ConstantInt *CI = dyn_cast(I.getArgOperand(0)); assert(CI && "Non-constant call site value in eh.sjlj.callsite!"); assert(MMI.getCurrentCallSite() == 0 && "Overlapping call sites!"); MMI.setCurrentCallSite(CI->getZExtValue()); return 0; } case Intrinsic::eh_sjlj_functioncontext: { // Get and store the index of the function context. 
MachineFrameInfo *MFI = DAG.getMachineFunction().getFrameInfo(); AllocaInst *FnCtx = cast(I.getArgOperand(0)->stripPointerCasts()); int FI = FuncInfo.StaticAllocaMap[FnCtx]; MFI->setFunctionContextIndex(FI); return 0; } case Intrinsic::eh_sjlj_setjmp: { SDValue Ops[2]; Ops[0] = getRoot(); Ops[1] = getValue(I.getArgOperand(0)); SDValue Op = DAG.getNode(ISD::EH_SJLJ_SETJMP, dl, DAG.getVTList(MVT::i32, MVT::Other), Ops, 2); setValue(&I, Op.getValue(0)); DAG.setRoot(Op.getValue(1)); return 0; } case Intrinsic::eh_sjlj_longjmp: { DAG.setRoot(DAG.getNode(ISD::EH_SJLJ_LONGJMP, dl, MVT::Other, getRoot(), getValue(I.getArgOperand(0)))); return 0; } case Intrinsic::eh_sjlj_dispatch_setup: { DAG.setRoot(DAG.getNode(ISD::EH_SJLJ_DISPATCHSETUP, dl, MVT::Other, getRoot(), getValue(I.getArgOperand(0)))); return 0; } case Intrinsic::x86_mmx_pslli_w: case Intrinsic::x86_mmx_pslli_d: case Intrinsic::x86_mmx_pslli_q: case Intrinsic::x86_mmx_psrli_w: case Intrinsic::x86_mmx_psrli_d: case Intrinsic::x86_mmx_psrli_q: case Intrinsic::x86_mmx_psrai_w: case Intrinsic::x86_mmx_psrai_d: { SDValue ShAmt = getValue(I.getArgOperand(1)); if (isa(ShAmt)) { visitTargetIntrinsic(I, Intrinsic); return 0; } unsigned NewIntrinsic = 0; EVT ShAmtVT = MVT::v2i32; switch (Intrinsic) { case Intrinsic::x86_mmx_pslli_w: NewIntrinsic = Intrinsic::x86_mmx_psll_w; break; case Intrinsic::x86_mmx_pslli_d: NewIntrinsic = Intrinsic::x86_mmx_psll_d; break; case Intrinsic::x86_mmx_pslli_q: NewIntrinsic = Intrinsic::x86_mmx_psll_q; break; case Intrinsic::x86_mmx_psrli_w: NewIntrinsic = Intrinsic::x86_mmx_psrl_w; break; case Intrinsic::x86_mmx_psrli_d: NewIntrinsic = Intrinsic::x86_mmx_psrl_d; break; case Intrinsic::x86_mmx_psrli_q: NewIntrinsic = Intrinsic::x86_mmx_psrl_q; break; case Intrinsic::x86_mmx_psrai_w: NewIntrinsic = Intrinsic::x86_mmx_psra_w; break; case Intrinsic::x86_mmx_psrai_d: NewIntrinsic = Intrinsic::x86_mmx_psra_d; break; default: llvm_unreachable("Impossible intrinsic"); // Can't reach here. } // The vector shift intrinsics with scalars uses 32b shift amounts but // the sse2/mmx shift instructions reads 64 bits. Set the upper 32 bits // to be zero. // We must do this early because v2i32 is not a legal type. 
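// Illustrative example (assumed IR, not taken from a test case): a call
//   %r = call x86_mmx @llvm.x86.mmx.pslli.w(x86_mmx %v, i32 5)
// is rewritten here to the register form of the shift, with the scalar
// amount widened so the instruction sees a 64-bit operand whose upper
// 32 bits are zero, roughly:
//   psll.w(%v, bitcast(<2 x i32> <i32 5, i32 0>))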
DebugLoc dl = getCurDebugLoc(); SDValue ShOps[2]; ShOps[0] = ShAmt; ShOps[1] = DAG.getConstant(0, MVT::i32); ShAmt = DAG.getNode(ISD::BUILD_VECTOR, dl, ShAmtVT, &ShOps[0], 2); EVT DestVT = TLI.getValueType(I.getType()); ShAmt = DAG.getNode(ISD::BITCAST, dl, DestVT, ShAmt); Res = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, DestVT, DAG.getConstant(NewIntrinsic, MVT::i32), getValue(I.getArgOperand(0)), ShAmt); setValue(&I, Res); return 0; } case Intrinsic::convertff: case Intrinsic::convertfsi: case Intrinsic::convertfui: case Intrinsic::convertsif: case Intrinsic::convertuif: case Intrinsic::convertss: case Intrinsic::convertsu: case Intrinsic::convertus: case Intrinsic::convertuu: { ISD::CvtCode Code = ISD::CVT_INVALID; switch (Intrinsic) { case Intrinsic::convertff: Code = ISD::CVT_FF; break; case Intrinsic::convertfsi: Code = ISD::CVT_FS; break; case Intrinsic::convertfui: Code = ISD::CVT_FU; break; case Intrinsic::convertsif: Code = ISD::CVT_SF; break; case Intrinsic::convertuif: Code = ISD::CVT_UF; break; case Intrinsic::convertss: Code = ISD::CVT_SS; break; case Intrinsic::convertsu: Code = ISD::CVT_SU; break; case Intrinsic::convertus: Code = ISD::CVT_US; break; case Intrinsic::convertuu: Code = ISD::CVT_UU; break; } EVT DestVT = TLI.getValueType(I.getType()); const Value *Op1 = I.getArgOperand(0); Res = DAG.getConvertRndSat(DestVT, getCurDebugLoc(), getValue(Op1), DAG.getValueType(DestVT), DAG.getValueType(getValue(Op1).getValueType()), getValue(I.getArgOperand(1)), getValue(I.getArgOperand(2)), Code); setValue(&I, Res); return 0; } case Intrinsic::sqrt: setValue(&I, DAG.getNode(ISD::FSQRT, dl, getValue(I.getArgOperand(0)).getValueType(), getValue(I.getArgOperand(0)))); return 0; case Intrinsic::powi: setValue(&I, ExpandPowI(dl, getValue(I.getArgOperand(0)), getValue(I.getArgOperand(1)), DAG)); return 0; case Intrinsic::sin: setValue(&I, DAG.getNode(ISD::FSIN, dl, getValue(I.getArgOperand(0)).getValueType(), getValue(I.getArgOperand(0)))); return 0; case Intrinsic::cos: setValue(&I, DAG.getNode(ISD::FCOS, dl, getValue(I.getArgOperand(0)).getValueType(), getValue(I.getArgOperand(0)))); return 0; case Intrinsic::log: visitLog(I); return 0; case Intrinsic::log2: visitLog2(I); return 0; case Intrinsic::log10: visitLog10(I); return 0; case Intrinsic::exp: visitExp(I); return 0; case Intrinsic::exp2: visitExp2(I); return 0; case Intrinsic::pow: visitPow(I); return 0; case Intrinsic::fma: setValue(&I, DAG.getNode(ISD::FMA, dl, getValue(I.getArgOperand(0)).getValueType(), getValue(I.getArgOperand(0)), getValue(I.getArgOperand(1)), getValue(I.getArgOperand(2)))); return 0; case Intrinsic::convert_to_fp16: setValue(&I, DAG.getNode(ISD::FP32_TO_FP16, dl, MVT::i16, getValue(I.getArgOperand(0)))); return 0; case Intrinsic::convert_from_fp16: setValue(&I, DAG.getNode(ISD::FP16_TO_FP32, dl, MVT::f32, getValue(I.getArgOperand(0)))); return 0; case Intrinsic::pcmarker: { SDValue Tmp = getValue(I.getArgOperand(0)); DAG.setRoot(DAG.getNode(ISD::PCMARKER, dl, MVT::Other, getRoot(), Tmp)); return 0; } case Intrinsic::readcyclecounter: { SDValue Op = getRoot(); Res = DAG.getNode(ISD::READCYCLECOUNTER, dl, DAG.getVTList(MVT::i64, MVT::Other), &Op, 1); setValue(&I, Res); DAG.setRoot(Res.getValue(1)); return 0; } case Intrinsic::bswap: setValue(&I, DAG.getNode(ISD::BSWAP, dl, getValue(I.getArgOperand(0)).getValueType(), getValue(I.getArgOperand(0)))); return 0; case Intrinsic::cttz: { SDValue Arg = getValue(I.getArgOperand(0)); EVT Ty = Arg.getValueType(); setValue(&I, DAG.getNode(ISD::CTTZ, dl, Ty, Arg)); 
return 0; } case Intrinsic::ctlz: { SDValue Arg = getValue(I.getArgOperand(0)); EVT Ty = Arg.getValueType(); setValue(&I, DAG.getNode(ISD::CTLZ, dl, Ty, Arg)); return 0; } case Intrinsic::ctpop: { SDValue Arg = getValue(I.getArgOperand(0)); EVT Ty = Arg.getValueType(); setValue(&I, DAG.getNode(ISD::CTPOP, dl, Ty, Arg)); return 0; } case Intrinsic::stacksave: { SDValue Op = getRoot(); Res = DAG.getNode(ISD::STACKSAVE, dl, DAG.getVTList(TLI.getPointerTy(), MVT::Other), &Op, 1); setValue(&I, Res); DAG.setRoot(Res.getValue(1)); return 0; } case Intrinsic::stackrestore: { Res = getValue(I.getArgOperand(0)); DAG.setRoot(DAG.getNode(ISD::STACKRESTORE, dl, MVT::Other, getRoot(), Res)); return 0; } case Intrinsic::stackprotector: { // Emit code into the DAG to store the stack guard onto the stack. MachineFunction &MF = DAG.getMachineFunction(); MachineFrameInfo *MFI = MF.getFrameInfo(); EVT PtrTy = TLI.getPointerTy(); SDValue Src = getValue(I.getArgOperand(0)); // The guard's value. AllocaInst *Slot = cast(I.getArgOperand(1)); int FI = FuncInfo.StaticAllocaMap[Slot]; MFI->setStackProtectorIndex(FI); SDValue FIN = DAG.getFrameIndex(FI, PtrTy); // Store the stack protector onto the stack. Res = DAG.getStore(getRoot(), getCurDebugLoc(), Src, FIN, MachinePointerInfo::getFixedStack(FI), true, false, 0); setValue(&I, Res); DAG.setRoot(Res); return 0; } case Intrinsic::objectsize: { // If we don't know by now, we're never going to know. ConstantInt *CI = dyn_cast(I.getArgOperand(1)); assert(CI && "Non-constant type in __builtin_object_size?"); SDValue Arg = getValue(I.getCalledValue()); EVT Ty = Arg.getValueType(); if (CI->isZero()) Res = DAG.getConstant(-1ULL, Ty); else Res = DAG.getConstant(0, Ty); setValue(&I, Res); return 0; } case Intrinsic::var_annotation: // Discard annotate attributes return 0; case Intrinsic::init_trampoline: { const Function *F = cast(I.getArgOperand(1)->stripPointerCasts()); SDValue Ops[6]; Ops[0] = getRoot(); Ops[1] = getValue(I.getArgOperand(0)); Ops[2] = getValue(I.getArgOperand(1)); Ops[3] = getValue(I.getArgOperand(2)); Ops[4] = DAG.getSrcValue(I.getArgOperand(0)); Ops[5] = DAG.getSrcValue(F); Res = DAG.getNode(ISD::INIT_TRAMPOLINE, dl, MVT::Other, Ops, 6); DAG.setRoot(Res); return 0; } case Intrinsic::adjust_trampoline: { setValue(&I, DAG.getNode(ISD::ADJUST_TRAMPOLINE, dl, TLI.getPointerTy(), getValue(I.getArgOperand(0)))); return 0; } case Intrinsic::gcroot: if (GFI) { const Value *Alloca = I.getArgOperand(0); const Constant *TypeMap = cast(I.getArgOperand(1)); FrameIndexSDNode *FI = cast(getValue(Alloca).getNode()); GFI->addStackRoot(FI->getIndex(), TypeMap); } return 0; case Intrinsic::gcread: case Intrinsic::gcwrite: llvm_unreachable("GC failed to lower gcread/gcwrite intrinsics!"); return 0; case Intrinsic::flt_rounds: setValue(&I, DAG.getNode(ISD::FLT_ROUNDS_, dl, MVT::i32)); return 0; case Intrinsic::expect: { // Just replace __builtin_expect(exp, c) with EXP. 
setValue(&I, getValue(I.getArgOperand(0))); return 0; } case Intrinsic::trap: { StringRef TrapFuncName = getTrapFunctionName(); if (TrapFuncName.empty()) { DAG.setRoot(DAG.getNode(ISD::TRAP, dl,MVT::Other, getRoot())); return 0; } TargetLowering::ArgListTy Args; std::pair Result = TLI.LowerCallTo(getRoot(), I.getType(), false, false, false, false, 0, CallingConv::C, /*isTailCall=*/false, /*isReturnValueUsed=*/true, DAG.getExternalSymbol(TrapFuncName.data(), TLI.getPointerTy()), Args, DAG, getCurDebugLoc()); DAG.setRoot(Result.second); return 0; } case Intrinsic::uadd_with_overflow: return implVisitAluOverflow(I, ISD::UADDO); case Intrinsic::sadd_with_overflow: return implVisitAluOverflow(I, ISD::SADDO); case Intrinsic::usub_with_overflow: return implVisitAluOverflow(I, ISD::USUBO); case Intrinsic::ssub_with_overflow: return implVisitAluOverflow(I, ISD::SSUBO); case Intrinsic::umul_with_overflow: return implVisitAluOverflow(I, ISD::UMULO); case Intrinsic::smul_with_overflow: return implVisitAluOverflow(I, ISD::SMULO); case Intrinsic::prefetch: { SDValue Ops[5]; unsigned rw = cast(I.getArgOperand(1))->getZExtValue(); Ops[0] = getRoot(); Ops[1] = getValue(I.getArgOperand(0)); Ops[2] = getValue(I.getArgOperand(1)); Ops[3] = getValue(I.getArgOperand(2)); Ops[4] = getValue(I.getArgOperand(3)); DAG.setRoot(DAG.getMemIntrinsicNode(ISD::PREFETCH, dl, DAG.getVTList(MVT::Other), &Ops[0], 5, EVT::getIntegerVT(*Context, 8), MachinePointerInfo(I.getArgOperand(0)), 0, /* align */ false, /* volatile */ rw==0, /* read */ rw==1)); /* write */ return 0; } case Intrinsic::invariant_start: case Intrinsic::lifetime_start: // Discard region information. setValue(&I, DAG.getUNDEF(TLI.getPointerTy())); return 0; case Intrinsic::invariant_end: case Intrinsic::lifetime_end: // Discard region information. return 0; } } void SelectionDAGBuilder::LowerCallTo(ImmutableCallSite CS, SDValue Callee, bool isTailCall, MachineBasicBlock *LandingPad) { PointerType *PT = cast(CS.getCalledValue()->getType()); FunctionType *FTy = cast(PT->getElementType()); Type *RetTy = FTy->getReturnType(); MachineModuleInfo &MMI = DAG.getMachineFunction().getMMI(); MCSymbol *BeginLabel = 0; TargetLowering::ArgListTy Args; TargetLowering::ArgListEntry Entry; Args.reserve(CS.arg_size()); // Check whether the function can return without sret-demotion. 
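// Sketch of the sret-demotion path that follows (illustrative only):
// when CanLowerReturn is false, e.g. a call returning a large aggregate
// that the target cannot pass back in registers, the call is rewritten
// roughly as
//   <slot>  = stack object for the return value      (DemoteStackSlot)
//   call void @f(<slot>, ...)  ; hidden sret pointer argument, added first
//   result  = loads from <slot>, stitched back together after the call
// The loads and the MERGE_VALUES happen in the !CanLowerReturn branch
// after TLI.LowerCallTo returns.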
SmallVector Outs; SmallVector Offsets; GetReturnInfo(RetTy, CS.getAttributes().getRetAttributes(), Outs, TLI, &Offsets); bool CanLowerReturn = TLI.CanLowerReturn(CS.getCallingConv(), DAG.getMachineFunction(), FTy->isVarArg(), Outs, FTy->getContext()); SDValue DemoteStackSlot; int DemoteStackIdx = -100; if (!CanLowerReturn) { uint64_t TySize = TLI.getTargetData()->getTypeAllocSize( FTy->getReturnType()); unsigned Align = TLI.getTargetData()->getPrefTypeAlignment( FTy->getReturnType()); MachineFunction &MF = DAG.getMachineFunction(); DemoteStackIdx = MF.getFrameInfo()->CreateStackObject(TySize, Align, false); Type *StackSlotPtrType = PointerType::getUnqual(FTy->getReturnType()); DemoteStackSlot = DAG.getFrameIndex(DemoteStackIdx, TLI.getPointerTy()); Entry.Node = DemoteStackSlot; Entry.Ty = StackSlotPtrType; Entry.isSExt = false; Entry.isZExt = false; Entry.isInReg = false; Entry.isSRet = true; Entry.isNest = false; Entry.isByVal = false; Entry.Alignment = Align; Args.push_back(Entry); RetTy = Type::getVoidTy(FTy->getContext()); } for (ImmutableCallSite::arg_iterator i = CS.arg_begin(), e = CS.arg_end(); i != e; ++i) { const Value *V = *i; // Skip empty types if (V->getType()->isEmptyTy()) continue; SDValue ArgNode = getValue(V); Entry.Node = ArgNode; Entry.Ty = V->getType(); unsigned attrInd = i - CS.arg_begin() + 1; Entry.isSExt = CS.paramHasAttr(attrInd, Attribute::SExt); Entry.isZExt = CS.paramHasAttr(attrInd, Attribute::ZExt); Entry.isInReg = CS.paramHasAttr(attrInd, Attribute::InReg); Entry.isSRet = CS.paramHasAttr(attrInd, Attribute::StructRet); Entry.isNest = CS.paramHasAttr(attrInd, Attribute::Nest); Entry.isByVal = CS.paramHasAttr(attrInd, Attribute::ByVal); Entry.Alignment = CS.getParamAlignment(attrInd); Args.push_back(Entry); } if (LandingPad) { // Insert a label before the invoke call to mark the try range. This can be // used to detect deletion of the invoke via the MachineModuleInfo. BeginLabel = MMI.getContext().CreateTempSymbol(); // For SjLj, keep track of which landing pads go with which invokes // so as to maintain the ordering of pads in the LSDA. unsigned CallSiteIndex = MMI.getCurrentCallSite(); if (CallSiteIndex) { MMI.setCallSiteBeginLabel(BeginLabel, CallSiteIndex); LPadToCallSiteMap[LandingPad].push_back(CallSiteIndex); // Now that the call site is handled, stop tracking it. MMI.setCurrentCallSite(0); } // Both PendingLoads and PendingExports must be flushed here; // this call might not return. (void)getRoot(); DAG.setRoot(DAG.getEHLabel(getCurDebugLoc(), getControlRoot(), BeginLabel)); } // Check if target-independent constraints permit a tail call here. // Target-dependent constraints are checked within TLI.LowerCallTo. if (isTailCall && !isInTailCallPosition(CS, CS.getAttributes().getRetAttributes(), TLI)) isTailCall = false; // If there's a possibility that fast-isel has already selected some amount // of the current basic block, don't emit a tail call. 
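// Taken together (illustrative): a call such as
//   %r = tail call i32 @g(i32 %x)
//   ret i32 %r
// keeps its tail-call marking only if it sits in a genuine tail position
// per isInTailCallPosition above; a call whose result is modified before
// the return, or one in a block fast-isel may already have partially
// selected, is demoted to an ordinary call here.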
if (isTailCall && EnableFastISel) isTailCall = false; std::pair Result = TLI.LowerCallTo(getRoot(), RetTy, CS.paramHasAttr(0, Attribute::SExt), CS.paramHasAttr(0, Attribute::ZExt), FTy->isVarArg(), CS.paramHasAttr(0, Attribute::InReg), FTy->getNumParams(), CS.getCallingConv(), isTailCall, !CS.getInstruction()->use_empty(), Callee, Args, DAG, getCurDebugLoc()); assert((isTailCall || Result.second.getNode()) && "Non-null chain expected with non-tail call!"); assert((Result.second.getNode() || !Result.first.getNode()) && "Null value expected with tail call!"); if (Result.first.getNode()) { setValue(CS.getInstruction(), Result.first); } else if (!CanLowerReturn && Result.second.getNode()) { // The instruction result is the result of loading from the // hidden sret parameter. SmallVector PVTs; Type *PtrRetTy = PointerType::getUnqual(FTy->getReturnType()); ComputeValueVTs(TLI, PtrRetTy, PVTs); assert(PVTs.size() == 1 && "Pointers should fit in one register"); EVT PtrVT = PVTs[0]; unsigned NumValues = Outs.size(); SmallVector Values(NumValues); SmallVector Chains(NumValues); for (unsigned i = 0; i < NumValues; ++i) { SDValue Add = DAG.getNode(ISD::ADD, getCurDebugLoc(), PtrVT, DemoteStackSlot, DAG.getConstant(Offsets[i], PtrVT)); SDValue L = DAG.getLoad(Outs[i].VT, getCurDebugLoc(), Result.second, Add, MachinePointerInfo::getFixedStack(DemoteStackIdx, Offsets[i]), false, false, 1); Values[i] = L; Chains[i] = L.getValue(1); } SDValue Chain = DAG.getNode(ISD::TokenFactor, getCurDebugLoc(), MVT::Other, &Chains[0], NumValues); PendingLoads.push_back(Chain); // Collect the legal value parts into potentially illegal values // that correspond to the original function's return values. SmallVector RetTys; RetTy = FTy->getReturnType(); ComputeValueVTs(TLI, RetTy, RetTys); ISD::NodeType AssertOp = ISD::DELETED_NODE; SmallVector ReturnValues; unsigned CurReg = 0; for (unsigned I = 0, E = RetTys.size(); I != E; ++I) { EVT VT = RetTys[I]; EVT RegisterVT = TLI.getRegisterType(RetTy->getContext(), VT); unsigned NumRegs = TLI.getNumRegisters(RetTy->getContext(), VT); SDValue ReturnValue = getCopyFromParts(DAG, getCurDebugLoc(), &Values[CurReg], NumRegs, RegisterVT, VT, AssertOp); ReturnValues.push_back(ReturnValue); CurReg += NumRegs; } setValue(CS.getInstruction(), DAG.getNode(ISD::MERGE_VALUES, getCurDebugLoc(), DAG.getVTList(&RetTys[0], RetTys.size()), &ReturnValues[0], ReturnValues.size())); } // Assign order to nodes here. If the call does not produce a result, it won't // be mapped to a SDNode and visit() will not assign it an order number. if (!Result.second.getNode()) { // As a special case, a null chain means that a tail call has been emitted and // the DAG root is already updated. HasTailCall = true; ++SDNodeOrder; AssignOrderingToNode(DAG.getRoot().getNode()); } else { DAG.setRoot(Result.second); ++SDNodeOrder; AssignOrderingToNode(Result.second.getNode()); } if (LandingPad) { // Insert a label at the end of the invoke call to mark the try range. This // can be used to detect deletion of the invoke via the MachineModuleInfo. MCSymbol *EndLabel = MMI.getContext().CreateTempSymbol(); DAG.setRoot(DAG.getEHLabel(getCurDebugLoc(), getRoot(), EndLabel)); // Inform MachineModuleInfo of range. MMI.addInvoke(LandingPad, BeginLabel, EndLabel); } } /// IsOnlyUsedInZeroEqualityComparison - Return true if it only matters that the /// value is equal or not-equal to zero. 
static bool IsOnlyUsedInZeroEqualityComparison(const Value *V) { for (Value::const_use_iterator UI = V->use_begin(), E = V->use_end(); UI != E; ++UI) { if (const ICmpInst *IC = dyn_cast(*UI)) if (IC->isEquality()) if (const Constant *C = dyn_cast(IC->getOperand(1))) if (C->isNullValue()) continue; // Unknown instruction. return false; } return true; } static SDValue getMemCmpLoad(const Value *PtrVal, MVT LoadVT, Type *LoadTy, SelectionDAGBuilder &Builder) { // Check to see if this load can be trivially constant folded, e.g. if the // input is from a string literal. if (const Constant *LoadInput = dyn_cast(PtrVal)) { // Cast pointer to the type we really want to load. LoadInput = ConstantExpr::getBitCast(const_cast(LoadInput), PointerType::getUnqual(LoadTy)); if (const Constant *LoadCst = ConstantFoldLoadFromConstPtr(const_cast(LoadInput), Builder.TD)) return Builder.getValue(LoadCst); } // Otherwise, we have to emit the load. If the pointer is to unfoldable but // still constant memory, the input chain can be the entry node. SDValue Root; bool ConstantMemory = false; // Do not serialize (non-volatile) loads of constant memory with anything. if (Builder.AA->pointsToConstantMemory(PtrVal)) { Root = Builder.DAG.getEntryNode(); ConstantMemory = true; } else { // Do not serialize non-volatile loads against each other. Root = Builder.DAG.getRoot(); } SDValue Ptr = Builder.getValue(PtrVal); SDValue LoadVal = Builder.DAG.getLoad(LoadVT, Builder.getCurDebugLoc(), Root, Ptr, MachinePointerInfo(PtrVal), false /*volatile*/, false /*nontemporal*/, 1 /* align=1 */); if (!ConstantMemory) Builder.PendingLoads.push_back(LoadVal.getValue(1)); return LoadVal; } /// visitMemCmpCall - See if we can lower a call to memcmp in an optimized form. /// If so, return true and lower it, otherwise return false and it will be /// lowered like a normal call. bool SelectionDAGBuilder::visitMemCmpCall(const CallInst &I) { // Verify that the prototype makes sense. int memcmp(void*,void*,size_t) if (I.getNumArgOperands() != 3) return false; const Value *LHS = I.getArgOperand(0), *RHS = I.getArgOperand(1); if (!LHS->getType()->isPointerTy() || !RHS->getType()->isPointerTy() || !I.getArgOperand(2)->getType()->isIntegerTy() || !I.getType()->isIntegerTy()) return false; const ConstantInt *Size = dyn_cast(I.getArgOperand(2)); // memcmp(S1,S2,2) != 0 -> (*(short*)LHS != *(short*)RHS) != 0 // memcmp(S1,S2,4) != 0 -> (*(int*)LHS != *(int*)RHS) != 0 if (Size && IsOnlyUsedInZeroEqualityComparison(&I)) { bool ActuallyDoIt = true; MVT LoadVT; Type *LoadTy; switch (Size->getZExtValue()) { default: LoadVT = MVT::Other; LoadTy = 0; ActuallyDoIt = false; break; case 2: LoadVT = MVT::i16; LoadTy = Type::getInt16Ty(Size->getContext()); break; case 4: LoadVT = MVT::i32; LoadTy = Type::getInt32Ty(Size->getContext()); break; case 8: LoadVT = MVT::i64; LoadTy = Type::getInt64Ty(Size->getContext()); break; /* case 16: LoadVT = MVT::v4i32; LoadTy = Type::getInt32Ty(Size->getContext()); LoadTy = VectorType::get(LoadTy, 4); break; */ } // This turns into unaligned loads. We only do this if the target natively // supports the MVT we'll be loading or if it is small enough (<= 4) that // we'll only produce a small number of byte loads. // Require that we can find a legal MVT, and only do this if the target // supports unaligned loads of that type. Expanding into byte loads would // bloat the code. if (ActuallyDoIt && Size->getZExtValue() > 4) { // TODO: Handle 5 byte compare as 4-byte + 1 byte. 
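// Worked example (illustrative): for source like
//   if (memcmp(p, q, 4) == 0) ...
// every use of the call result is an equality test against zero, so
// IsOnlyUsedInZeroEqualityComparison returns true and the call is
// replaced by two i32 loads compared with SETNE, zero-extended back to
// the call's integer type (see the ActuallyDoIt block below).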
// TODO: Handle 8 byte compare on x86-32 as two 32-bit loads. if (!TLI.isTypeLegal(LoadVT) ||!TLI.allowsUnalignedMemoryAccesses(LoadVT)) ActuallyDoIt = false; } if (ActuallyDoIt) { SDValue LHSVal = getMemCmpLoad(LHS, LoadVT, LoadTy, *this); SDValue RHSVal = getMemCmpLoad(RHS, LoadVT, LoadTy, *this); SDValue Res = DAG.getSetCC(getCurDebugLoc(), MVT::i1, LHSVal, RHSVal, ISD::SETNE); EVT CallVT = TLI.getValueType(I.getType(), true); setValue(&I, DAG.getZExtOrTrunc(Res, getCurDebugLoc(), CallVT)); return true; } } return false; } void SelectionDAGBuilder::visitCall(const CallInst &I) { // Handle inline assembly differently. if (isa(I.getCalledValue())) { visitInlineAsm(&I); return; } // See if any floating point values are being passed to this function. This is // used to emit an undefined reference to fltused on Windows. FunctionType *FT = cast(I.getCalledValue()->getType()->getContainedType(0)); MachineModuleInfo &MMI = DAG.getMachineFunction().getMMI(); if (FT->isVarArg() && !MMI.callsExternalVAFunctionWithFloatingPointArguments()) { for (unsigned i = 0, e = I.getNumArgOperands(); i != e; ++i) { Type* T = I.getArgOperand(i)->getType(); for (po_iterator i = po_begin(T), e = po_end(T); i != e; ++i) { if (!i->isFloatingPointTy()) continue; MMI.setCallsExternalVAFunctionWithFloatingPointArguments(true); break; } } } const char *RenameFn = 0; if (Function *F = I.getCalledFunction()) { if (F->isDeclaration()) { if (const TargetIntrinsicInfo *II = TM.getIntrinsicInfo()) { if (unsigned IID = II->getIntrinsicID(F)) { RenameFn = visitIntrinsicCall(I, IID); if (!RenameFn) return; } } if (unsigned IID = F->getIntrinsicID()) { RenameFn = visitIntrinsicCall(I, IID); if (!RenameFn) return; } } // Check for well-known libc/libm calls. If the function is internal, it // can't be a library call. if (!F->hasLocalLinkage() && F->hasName()) { StringRef Name = F->getName(); if (Name == "copysign" || Name == "copysignf" || Name == "copysignl") { if (I.getNumArgOperands() == 2 && // Basic sanity checks. I.getArgOperand(0)->getType()->isFloatingPointTy() && I.getType() == I.getArgOperand(0)->getType() && I.getType() == I.getArgOperand(1)->getType()) { SDValue LHS = getValue(I.getArgOperand(0)); SDValue RHS = getValue(I.getArgOperand(1)); setValue(&I, DAG.getNode(ISD::FCOPYSIGN, getCurDebugLoc(), LHS.getValueType(), LHS, RHS)); return; } } else if (Name == "fabs" || Name == "fabsf" || Name == "fabsl") { if (I.getNumArgOperands() == 1 && // Basic sanity checks. I.getArgOperand(0)->getType()->isFloatingPointTy() && I.getType() == I.getArgOperand(0)->getType()) { SDValue Tmp = getValue(I.getArgOperand(0)); setValue(&I, DAG.getNode(ISD::FABS, getCurDebugLoc(), Tmp.getValueType(), Tmp)); return; } } else if (Name == "sin" || Name == "sinf" || Name == "sinl") { if (I.getNumArgOperands() == 1 && // Basic sanity checks. I.getArgOperand(0)->getType()->isFloatingPointTy() && I.getType() == I.getArgOperand(0)->getType() && I.onlyReadsMemory()) { SDValue Tmp = getValue(I.getArgOperand(0)); setValue(&I, DAG.getNode(ISD::FSIN, getCurDebugLoc(), Tmp.getValueType(), Tmp)); return; } } else if (Name == "cos" || Name == "cosf" || Name == "cosl") { if (I.getNumArgOperands() == 1 && // Basic sanity checks. 
I.getArgOperand(0)->getType()->isFloatingPointTy() && I.getType() == I.getArgOperand(0)->getType() && I.onlyReadsMemory()) { SDValue Tmp = getValue(I.getArgOperand(0)); setValue(&I, DAG.getNode(ISD::FCOS, getCurDebugLoc(), Tmp.getValueType(), Tmp)); return; } } else if (Name == "sqrt" || Name == "sqrtf" || Name == "sqrtl") { if (I.getNumArgOperands() == 1 && // Basic sanity checks. I.getArgOperand(0)->getType()->isFloatingPointTy() && I.getType() == I.getArgOperand(0)->getType() && I.onlyReadsMemory()) { SDValue Tmp = getValue(I.getArgOperand(0)); setValue(&I, DAG.getNode(ISD::FSQRT, getCurDebugLoc(), Tmp.getValueType(), Tmp)); return; } } else if (Name == "memcmp") { if (visitMemCmpCall(I)) return; } } } SDValue Callee; if (!RenameFn) Callee = getValue(I.getCalledValue()); else Callee = DAG.getExternalSymbol(RenameFn, TLI.getPointerTy()); // Check if we can potentially perform a tail call. More detailed checking is // be done within LowerCallTo, after more information about the call is known. LowerCallTo(&I, Callee, I.isTailCall()); } namespace { /// AsmOperandInfo - This contains information for each constraint that we are /// lowering. class SDISelAsmOperandInfo : public TargetLowering::AsmOperandInfo { public: /// CallOperand - If this is the result output operand or a clobber /// this is null, otherwise it is the incoming operand to the CallInst. /// This gets modified as the asm is processed. SDValue CallOperand; /// AssignedRegs - If this is a register or register class operand, this /// contains the set of register corresponding to the operand. RegsForValue AssignedRegs; explicit SDISelAsmOperandInfo(const TargetLowering::AsmOperandInfo &info) : TargetLowering::AsmOperandInfo(info), CallOperand(0,0) { } /// MarkAllocatedRegs - Once AssignedRegs is set, mark the assigned registers /// busy in OutputRegs/InputRegs. void MarkAllocatedRegs(bool isOutReg, bool isInReg, std::set &OutputRegs, std::set &InputRegs, const TargetRegisterInfo &TRI) const { if (isOutReg) { for (unsigned i = 0, e = AssignedRegs.Regs.size(); i != e; ++i) MarkRegAndAliases(AssignedRegs.Regs[i], OutputRegs, TRI); } if (isInReg) { for (unsigned i = 0, e = AssignedRegs.Regs.size(); i != e; ++i) MarkRegAndAliases(AssignedRegs.Regs[i], InputRegs, TRI); } } /// getCallOperandValEVT - Return the EVT of the Value* that this operand /// corresponds to. If there is no Value* for this operand, it returns /// MVT::Other. EVT getCallOperandValEVT(LLVMContext &Context, const TargetLowering &TLI, const TargetData *TD) const { if (CallOperandVal == 0) return MVT::Other; if (isa(CallOperandVal)) return TLI.getPointerTy(); llvm::Type *OpTy = CallOperandVal->getType(); // FIXME: code duplicated from TargetLowering::ParseConstraints(). // If this is an indirect operand, the operand is a pointer to the // accessed type. if (isIndirect) { llvm::PointerType *PtrTy = dyn_cast(OpTy); if (!PtrTy) report_fatal_error("Indirect operand for inline asm not a pointer!"); OpTy = PtrTy->getElementType(); } // Look for vector wrapped in a struct. e.g. { <16 x i8> }. if (StructType *STy = dyn_cast(OpTy)) if (STy->getNumElements() == 1) OpTy = STy->getElementType(0); // If OpTy is not a single value, it may be a struct/union that we // can tile with integers. 
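// For instance (illustrative, assuming a typical 32-bit layout): an
// operand whose IR type is a two-field struct of i16s occupies 32 bits,
// so the switch below treats it as an i32 for constraint purposes;
// aggregates whose bit size is not one of the listed widths keep their
// original type.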
if (!OpTy->isSingleValueType() && OpTy->isSized()) { unsigned BitSize = TD->getTypeSizeInBits(OpTy); switch (BitSize) { default: break; case 1: case 8: case 16: case 32: case 64: case 128: OpTy = IntegerType::get(Context, BitSize); break; } } return TLI.getValueType(OpTy, true); } private: /// MarkRegAndAliases - Mark the specified register and all aliases in the /// specified set. static void MarkRegAndAliases(unsigned Reg, std::set &Regs, const TargetRegisterInfo &TRI) { assert(TargetRegisterInfo::isPhysicalRegister(Reg) && "Isn't a physreg"); Regs.insert(Reg); if (const unsigned *Aliases = TRI.getAliasSet(Reg)) for (; *Aliases; ++Aliases) Regs.insert(*Aliases); } }; typedef SmallVector SDISelAsmOperandInfoVector; } // end anonymous namespace /// GetRegistersForValue - Assign registers (virtual or physical) for the /// specified operand. We prefer to assign virtual registers, to allow the /// register allocator to handle the assignment process. However, if the asm /// uses features that we can't model on machineinstrs, we have SDISel do the /// allocation. This produces generally horrible, but correct, code. /// /// OpInfo describes the operand. /// Input and OutputRegs are the set of already allocated physical registers. /// static void GetRegistersForValue(SelectionDAG &DAG, const TargetLowering &TLI, DebugLoc DL, SDISelAsmOperandInfo &OpInfo, std::set &OutputRegs, std::set &InputRegs) { LLVMContext &Context = *DAG.getContext(); // Compute whether this value requires an input register, an output register, // or both. bool isOutReg = false; bool isInReg = false; switch (OpInfo.Type) { case InlineAsm::isOutput: isOutReg = true; // If there is an input constraint that matches this, we need to reserve // the input register so no other inputs allocate to it. isInReg = OpInfo.hasMatchingInput(); break; case InlineAsm::isInput: isInReg = true; isOutReg = false; break; case InlineAsm::isClobber: isOutReg = true; isInReg = true; break; } MachineFunction &MF = DAG.getMachineFunction(); SmallVector Regs; // If this is a constraint for a single physreg, or a constraint for a // register class, find it. std::pair PhysReg = TLI.getRegForInlineAsmConstraint(OpInfo.ConstraintCode, OpInfo.ConstraintVT); unsigned NumRegs = 1; if (OpInfo.ConstraintVT != MVT::Other) { // If this is a FP input in an integer register (or visa versa) insert a bit // cast of the input value. More generally, handle any case where the input // value disagrees with the register class we plan to stick this in. if (OpInfo.Type == InlineAsm::isInput && PhysReg.second && !PhysReg.second->hasType(OpInfo.ConstraintVT)) { // Try to convert to the first EVT that the reg class contains. If the // types are identical size, use a bitcast to convert (e.g. two differing // vector types). EVT RegVT = *PhysReg.second->vt_begin(); if (RegVT.getSizeInBits() == OpInfo.ConstraintVT.getSizeInBits()) { OpInfo.CallOperand = DAG.getNode(ISD::BITCAST, DL, RegVT, OpInfo.CallOperand); OpInfo.ConstraintVT = RegVT; } else if (RegVT.isInteger() && OpInfo.ConstraintVT.isFloatingPoint()) { // If the input is a FP value and we want it in FP registers, do a // bitcast to the corresponding integer type. This turns an f64 value // into i64, which can be passed with two i32 values on a 32-bit // machine. 
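// Concretely (illustrative): an f64 operand constrained to an
// integer-only register class on a 32-bit target reaches this branch;
// it is bitcast to i64 here so the usual splitting of an i64 across two
// 32-bit registers can take over.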
RegVT = EVT::getIntegerVT(Context, OpInfo.ConstraintVT.getSizeInBits()); OpInfo.CallOperand = DAG.getNode(ISD::BITCAST, DL, RegVT, OpInfo.CallOperand); OpInfo.ConstraintVT = RegVT; } } NumRegs = TLI.getNumRegisters(Context, OpInfo.ConstraintVT); } EVT RegVT; EVT ValueVT = OpInfo.ConstraintVT; // If this is a constraint for a specific physical register, like {r17}, // assign it now. if (unsigned AssignedReg = PhysReg.first) { const TargetRegisterClass *RC = PhysReg.second; if (OpInfo.ConstraintVT == MVT::Other) ValueVT = *RC->vt_begin(); // Get the actual register value type. This is important, because the user // may have asked for (e.g.) the AX register in i32 type. We need to // remember that AX is actually i16 to get the right extension. RegVT = *RC->vt_begin(); // This is a explicit reference to a physical register. Regs.push_back(AssignedReg); // If this is an expanded reference, add the rest of the regs to Regs. if (NumRegs != 1) { TargetRegisterClass::iterator I = RC->begin(); for (; *I != AssignedReg; ++I) assert(I != RC->end() && "Didn't find reg!"); // Already added the first reg. --NumRegs; ++I; for (; NumRegs; --NumRegs, ++I) { assert(I != RC->end() && "Ran out of registers to allocate!"); Regs.push_back(*I); } } OpInfo.AssignedRegs = RegsForValue(Regs, RegVT, ValueVT); const TargetRegisterInfo *TRI = DAG.getTarget().getRegisterInfo(); OpInfo.MarkAllocatedRegs(isOutReg, isInReg, OutputRegs, InputRegs, *TRI); return; } // Otherwise, if this was a reference to an LLVM register class, create vregs // for this reference. if (const TargetRegisterClass *RC = PhysReg.second) { RegVT = *RC->vt_begin(); if (OpInfo.ConstraintVT == MVT::Other) ValueVT = RegVT; // Create the appropriate number of virtual registers. MachineRegisterInfo &RegInfo = MF.getRegInfo(); for (; NumRegs; --NumRegs) Regs.push_back(RegInfo.createVirtualRegister(RC)); OpInfo.AssignedRegs = RegsForValue(Regs, RegVT, ValueVT); return; } // Otherwise, we couldn't allocate enough registers for this. } /// visitInlineAsm - Handle a call to an InlineAsm object. /// void SelectionDAGBuilder::visitInlineAsm(ImmutableCallSite CS) { const InlineAsm *IA = cast(CS.getCalledValue()); /// ConstraintOperands - Information about all of the constraints. SDISelAsmOperandInfoVector ConstraintOperands; std::set OutputRegs, InputRegs; TargetLowering::AsmOperandInfoVector TargetConstraints = TLI.ParseConstraints(CS); bool hasMemory = false; unsigned ArgNo = 0; // ArgNo - The argument of the CallInst. unsigned ResNo = 0; // ResNo - The result number of the next output. for (unsigned i = 0, e = TargetConstraints.size(); i != e; ++i) { ConstraintOperands.push_back(SDISelAsmOperandInfo(TargetConstraints[i])); SDISelAsmOperandInfo &OpInfo = ConstraintOperands.back(); EVT OpVT = MVT::Other; // Compute the value type for each operand. switch (OpInfo.Type) { case InlineAsm::isOutput: // Indirect outputs just consume an argument. if (OpInfo.isIndirect) { OpInfo.CallOperandVal = const_cast(CS.getArgument(ArgNo++)); break; } // The return value of the call is this value. As such, there is no // corresponding argument. assert(!CS.getType()->isVoidTy() && "Bad inline asm!"); if (StructType *STy = dyn_cast(CS.getType())) { OpVT = TLI.getValueType(STy->getElementType(ResNo)); } else { assert(ResNo == 0 && "Asm only has one result!"); OpVT = TLI.getValueType(CS.getType()); } ++ResNo; break; case InlineAsm::isInput: OpInfo.CallOperandVal = const_cast(CS.getArgument(ArgNo++)); break; case InlineAsm::isClobber: // Nothing to do. 
break; } // If this is an input or an indirect output, process the call argument. // BasicBlocks are labels, currently appearing only in asm's. if (OpInfo.CallOperandVal) { if (const BasicBlock *BB = dyn_cast(OpInfo.CallOperandVal)) { OpInfo.CallOperand = DAG.getBasicBlock(FuncInfo.MBBMap[BB]); } else { OpInfo.CallOperand = getValue(OpInfo.CallOperandVal); } OpVT = OpInfo.getCallOperandValEVT(*DAG.getContext(), TLI, TD); } OpInfo.ConstraintVT = OpVT; // Indirect operand accesses access memory. if (OpInfo.isIndirect) hasMemory = true; else { for (unsigned j = 0, ee = OpInfo.Codes.size(); j != ee; ++j) { TargetLowering::ConstraintType CType = TLI.getConstraintType(OpInfo.Codes[j]); if (CType == TargetLowering::C_Memory) { hasMemory = true; break; } } } } SDValue Chain, Flag; // We won't need to flush pending loads if this asm doesn't touch // memory and is nonvolatile. if (hasMemory || IA->hasSideEffects()) Chain = getRoot(); else Chain = DAG.getRoot(); // Second pass over the constraints: compute which constraint option to use // and assign registers to constraints that want a specific physreg. for (unsigned i = 0, e = ConstraintOperands.size(); i != e; ++i) { SDISelAsmOperandInfo &OpInfo = ConstraintOperands[i]; // If this is an output operand with a matching input operand, look up the // matching input. If their types mismatch, e.g. one is an integer, the // other is floating point, or their sizes are different, flag it as an // error. if (OpInfo.hasMatchingInput()) { SDISelAsmOperandInfo &Input = ConstraintOperands[OpInfo.MatchingInput]; if (OpInfo.ConstraintVT != Input.ConstraintVT) { std::pair MatchRC = TLI.getRegForInlineAsmConstraint(OpInfo.ConstraintCode, OpInfo.ConstraintVT); std::pair InputRC = TLI.getRegForInlineAsmConstraint(Input.ConstraintCode, Input.ConstraintVT); if ((OpInfo.ConstraintVT.isInteger() != Input.ConstraintVT.isInteger()) || (MatchRC.second != InputRC.second)) { report_fatal_error("Unsupported asm: input constraint" " with a matching output constraint of" " incompatible type!"); } Input.ConstraintVT = OpInfo.ConstraintVT; } } // Compute the constraint code and ConstraintType to use. TLI.ComputeConstraintToUse(OpInfo, OpInfo.CallOperand, &DAG); // If this is a memory input, and if the operand is not indirect, do what we // need to to provide an address for the memory input. if (OpInfo.ConstraintType == TargetLowering::C_Memory && !OpInfo.isIndirect) { assert((OpInfo.isMultipleAlternative || (OpInfo.Type == InlineAsm::isInput)) && "Can only indirectify direct input operands!"); // Memory operands really want the address of the value. If we don't have // an indirect input, put it in the constpool if we can, otherwise spill // it to a stack slot. // TODO: This isn't quite right. We need to handle these according to // the addressing mode that the constraint wants. Also, this may take // an additional register for the computation and we don't want that // either. // If the operand is a float, integer, or vector constant, spill to a // constant pool entry to get its address. const Value *OpVal = OpInfo.CallOperandVal; if (isa(OpVal) || isa(OpVal) || isa(OpVal)) { OpInfo.CallOperand = DAG.getConstantPool(cast(OpVal), TLI.getPointerTy()); } else { // Otherwise, create a stack slot and emit a store to it before the // asm. 
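// Summary of the two cases (no new behavior): an integer, FP, or vector
// constant fed to a memory constraint gets a constant-pool address
// (handled just above), while any other plain SSA value is spilled: a
// stack object is created, the value is stored into it ahead of the
// INLINEASM node, and the frame-index pointer becomes the now-indirect
// operand.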
Type *Ty = OpVal->getType(); uint64_t TySize = TLI.getTargetData()->getTypeAllocSize(Ty); unsigned Align = TLI.getTargetData()->getPrefTypeAlignment(Ty); MachineFunction &MF = DAG.getMachineFunction(); int SSFI = MF.getFrameInfo()->CreateStackObject(TySize, Align, false); SDValue StackSlot = DAG.getFrameIndex(SSFI, TLI.getPointerTy()); Chain = DAG.getStore(Chain, getCurDebugLoc(), OpInfo.CallOperand, StackSlot, MachinePointerInfo::getFixedStack(SSFI), false, false, 0); OpInfo.CallOperand = StackSlot; } // There is no longer a Value* corresponding to this operand. OpInfo.CallOperandVal = 0; // It is now an indirect operand. OpInfo.isIndirect = true; } // If this constraint is for a specific register, allocate it before // anything else. if (OpInfo.ConstraintType == TargetLowering::C_Register) GetRegistersForValue(DAG, TLI, getCurDebugLoc(), OpInfo, OutputRegs, InputRegs); } // Second pass - Loop over all of the operands, assigning virtual or physregs // to register class operands. for (unsigned i = 0, e = ConstraintOperands.size(); i != e; ++i) { SDISelAsmOperandInfo &OpInfo = ConstraintOperands[i]; // C_Register operands have already been allocated, Other/Memory don't need // to be. if (OpInfo.ConstraintType == TargetLowering::C_RegisterClass) GetRegistersForValue(DAG, TLI, getCurDebugLoc(), OpInfo, OutputRegs, InputRegs); } // AsmNodeOperands - The operands for the ISD::INLINEASM node. std::vector AsmNodeOperands; AsmNodeOperands.push_back(SDValue()); // reserve space for input chain AsmNodeOperands.push_back( DAG.getTargetExternalSymbol(IA->getAsmString().c_str(), TLI.getPointerTy())); // If we have a !srcloc metadata node associated with it, we want to attach // this to the ultimately generated inline asm machineinstr. To do this, we // pass in the third operand as this (potentially null) inline asm MDNode. const MDNode *SrcLoc = CS.getInstruction()->getMetadata("srcloc"); AsmNodeOperands.push_back(DAG.getMDNode(SrcLoc)); // Remember the HasSideEffect and AlignStack bits as operand 3. unsigned ExtraInfo = 0; if (IA->hasSideEffects()) ExtraInfo |= InlineAsm::Extra_HasSideEffects; if (IA->isAlignStack()) ExtraInfo |= InlineAsm::Extra_IsAlignStack; AsmNodeOperands.push_back(DAG.getTargetConstant(ExtraInfo, TLI.getPointerTy())); // Loop over all of the inputs, copying the operand values into the // appropriate registers and processing the output regs. RegsForValue RetValRegs; // IndirectStoresToEmit - The set of stores to emit after the inline asm node. std::vector > IndirectStoresToEmit; for (unsigned i = 0, e = ConstraintOperands.size(); i != e; ++i) { SDISelAsmOperandInfo &OpInfo = ConstraintOperands[i]; switch (OpInfo.Type) { case InlineAsm::isOutput: { if (OpInfo.ConstraintType != TargetLowering::C_RegisterClass && OpInfo.ConstraintType != TargetLowering::C_Register) { // Memory output, or 'other' output (e.g. 'X' constraint). assert(OpInfo.isIndirect && "Memory output must be indirect operand"); // Add information to the INLINEASM node to know about this output. unsigned OpFlags = InlineAsm::getFlagWord(InlineAsm::Kind_Mem, 1); AsmNodeOperands.push_back(DAG.getTargetConstant(OpFlags, TLI.getPointerTy())); AsmNodeOperands.push_back(OpInfo.CallOperand); break; } // Otherwise, this is a register or register class output. // Copy the output from the appropriate register. Find a register that // we can use. 
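// Flow note (summary, no new behavior): register outputs are not copied
// here; the chosen registers are only recorded as Kind_RegDef (or
// Kind_RegDefEarlyClobber) operands on the INLINEASM node, and the
// actual CopyFromReg happens after the node is built, via RetValRegs
// for direct results or IndirectStoresToEmit for indirect ones.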
if (OpInfo.AssignedRegs.Regs.empty()) report_fatal_error("Couldn't allocate output reg for constraint '" + Twine(OpInfo.ConstraintCode) + "'!"); // If this is an indirect operand, store through the pointer after the // asm. if (OpInfo.isIndirect) { IndirectStoresToEmit.push_back(std::make_pair(OpInfo.AssignedRegs, OpInfo.CallOperandVal)); } else { // This is the result value of the call. assert(!CS.getType()->isVoidTy() && "Bad inline asm!"); // Concatenate this output onto the outputs list. RetValRegs.append(OpInfo.AssignedRegs); } // Add information to the INLINEASM node to know that this register is // set. OpInfo.AssignedRegs.AddInlineAsmOperands(OpInfo.isEarlyClobber ? InlineAsm::Kind_RegDefEarlyClobber : InlineAsm::Kind_RegDef, false, 0, DAG, AsmNodeOperands); break; } case InlineAsm::isInput: { SDValue InOperandVal = OpInfo.CallOperand; if (OpInfo.isMatchingInputConstraint()) { // Matching constraint? // If this is required to match an output register we have already set, // just use its register. unsigned OperandNo = OpInfo.getMatchedOperand(); // Scan until we find the definition we already emitted of this operand. // When we find it, create a RegsForValue operand. unsigned CurOp = InlineAsm::Op_FirstOperand; for (; OperandNo; --OperandNo) { // Advance to the next operand. unsigned OpFlag = cast(AsmNodeOperands[CurOp])->getZExtValue(); assert((InlineAsm::isRegDefKind(OpFlag) || InlineAsm::isRegDefEarlyClobberKind(OpFlag) || InlineAsm::isMemKind(OpFlag)) && "Skipped past definitions?"); CurOp += InlineAsm::getNumOperandRegisters(OpFlag)+1; } unsigned OpFlag = cast(AsmNodeOperands[CurOp])->getZExtValue(); if (InlineAsm::isRegDefKind(OpFlag) || InlineAsm::isRegDefEarlyClobberKind(OpFlag)) { // Add (OpFlag&0xffff)>>3 registers to MatchedRegs. if (OpInfo.isIndirect) { // This happens on gcc/testsuite/gcc.dg/pr8788-1.c LLVMContext &Ctx = *DAG.getContext(); Ctx.emitError(CS.getInstruction(), "inline asm not supported yet:" " don't know how to handle tied " "indirect register inputs"); } RegsForValue MatchedRegs; MatchedRegs.ValueVTs.push_back(InOperandVal.getValueType()); EVT RegVT = AsmNodeOperands[CurOp+1].getValueType(); MatchedRegs.RegVTs.push_back(RegVT); MachineRegisterInfo &RegInfo = DAG.getMachineFunction().getRegInfo(); for (unsigned i = 0, e = InlineAsm::getNumOperandRegisters(OpFlag); i != e; ++i) MatchedRegs.Regs.push_back (RegInfo.createVirtualRegister(TLI.getRegClassFor(RegVT))); // Use the produced MatchedRegs object to MatchedRegs.getCopyToRegs(InOperandVal, DAG, getCurDebugLoc(), Chain, &Flag); MatchedRegs.AddInlineAsmOperands(InlineAsm::Kind_RegUse, true, OpInfo.getMatchedOperand(), DAG, AsmNodeOperands); break; } assert(InlineAsm::isMemKind(OpFlag) && "Unknown matching constraint!"); assert(InlineAsm::getNumOperandRegisters(OpFlag) == 1 && "Unexpected number of operands"); // Add information to the INLINEASM node to know about this input. // See InlineAsm.h isUseOperandTiedToDef. OpFlag = InlineAsm::getFlagWordForMatchingOp(OpFlag, OpInfo.getMatchedOperand()); AsmNodeOperands.push_back(DAG.getTargetConstant(OpFlag, TLI.getPointerTy())); AsmNodeOperands.push_back(AsmNodeOperands[CurOp+1]); break; } // Treat indirect 'X' constraint as memory. 
if (OpInfo.ConstraintType == TargetLowering::C_Other && OpInfo.isIndirect) OpInfo.ConstraintType = TargetLowering::C_Memory; if (OpInfo.ConstraintType == TargetLowering::C_Other) { std::vector Ops; TLI.LowerAsmOperandForConstraint(InOperandVal, OpInfo.ConstraintCode, Ops, DAG); if (Ops.empty()) report_fatal_error("Invalid operand for inline asm constraint '" + Twine(OpInfo.ConstraintCode) + "'!"); // Add information to the INLINEASM node to know about this input. unsigned ResOpType = InlineAsm::getFlagWord(InlineAsm::Kind_Imm, Ops.size()); AsmNodeOperands.push_back(DAG.getTargetConstant(ResOpType, TLI.getPointerTy())); AsmNodeOperands.insert(AsmNodeOperands.end(), Ops.begin(), Ops.end()); break; } if (OpInfo.ConstraintType == TargetLowering::C_Memory) { assert(OpInfo.isIndirect && "Operand must be indirect to be a mem!"); assert(InOperandVal.getValueType() == TLI.getPointerTy() && "Memory operands expect pointer values"); // Add information to the INLINEASM node to know about this input. unsigned ResOpType = InlineAsm::getFlagWord(InlineAsm::Kind_Mem, 1); AsmNodeOperands.push_back(DAG.getTargetConstant(ResOpType, TLI.getPointerTy())); AsmNodeOperands.push_back(InOperandVal); break; } assert((OpInfo.ConstraintType == TargetLowering::C_RegisterClass || OpInfo.ConstraintType == TargetLowering::C_Register) && "Unknown constraint type!"); assert(!OpInfo.isIndirect && "Don't know how to handle indirect register inputs yet!"); // Copy the input into the appropriate registers. if (OpInfo.AssignedRegs.Regs.empty()) report_fatal_error("Couldn't allocate input reg for constraint '" + Twine(OpInfo.ConstraintCode) + "'!"); OpInfo.AssignedRegs.getCopyToRegs(InOperandVal, DAG, getCurDebugLoc(), Chain, &Flag); OpInfo.AssignedRegs.AddInlineAsmOperands(InlineAsm::Kind_RegUse, false, 0, DAG, AsmNodeOperands); break; } case InlineAsm::isClobber: { // Add the clobbered value to the operand list, so that the register // allocator is aware that the physreg got clobbered. if (!OpInfo.AssignedRegs.Regs.empty()) OpInfo.AssignedRegs.AddInlineAsmOperands(InlineAsm::Kind_Clobber, false, 0, DAG, AsmNodeOperands); break; } } } // Finish up input operands. Set the input chain and add the flag last. AsmNodeOperands[InlineAsm::Op_InputChain] = Chain; if (Flag.getNode()) AsmNodeOperands.push_back(Flag); Chain = DAG.getNode(ISD::INLINEASM, getCurDebugLoc(), DAG.getVTList(MVT::Other, MVT::Glue), &AsmNodeOperands[0], AsmNodeOperands.size()); Flag = Chain.getValue(1); // If this asm returns a register value, copy the result from that register // and set it as the value of the call. if (!RetValRegs.Regs.empty()) { SDValue Val = RetValRegs.getCopyFromRegs(DAG, FuncInfo, getCurDebugLoc(), Chain, &Flag); // FIXME: Why don't we do this for inline asms with MRVs? if (CS.getType()->isSingleValueType() && CS.getType()->isSized()) { EVT ResultType = TLI.getValueType(CS.getType()); // If any of the results of the inline asm is a vector, it may have the // wrong width/num elts. This can happen for register classes that can // contain multiple different value types. The preg or vreg allocated may // not have the same VT as was expected. Convert it to the right type // with bit_convert. 
if (ResultType != Val.getValueType() && Val.getValueType().isVector()) { Val = DAG.getNode(ISD::BITCAST, getCurDebugLoc(), ResultType, Val); } else if (ResultType != Val.getValueType() && ResultType.isInteger() && Val.getValueType().isInteger()) { // If a result value was tied to an input value, the computed result may // have a wider width than the expected result. Extract the relevant // portion. Val = DAG.getNode(ISD::TRUNCATE, getCurDebugLoc(), ResultType, Val); } assert(ResultType == Val.getValueType() && "Asm result value mismatch!"); } setValue(CS.getInstruction(), Val); // Don't need to use this as a chain in this case. if (!IA->hasSideEffects() && !hasMemory && IndirectStoresToEmit.empty()) return; } std::vector > StoresToEmit; // Process indirect outputs, first output all of the flagged copies out of // physregs. for (unsigned i = 0, e = IndirectStoresToEmit.size(); i != e; ++i) { RegsForValue &OutRegs = IndirectStoresToEmit[i].first; const Value *Ptr = IndirectStoresToEmit[i].second; SDValue OutVal = OutRegs.getCopyFromRegs(DAG, FuncInfo, getCurDebugLoc(), Chain, &Flag); StoresToEmit.push_back(std::make_pair(OutVal, Ptr)); } // Emit the non-flagged stores from the physregs. SmallVector OutChains; for (unsigned i = 0, e = StoresToEmit.size(); i != e; ++i) { SDValue Val = DAG.getStore(Chain, getCurDebugLoc(), StoresToEmit[i].first, getValue(StoresToEmit[i].second), MachinePointerInfo(StoresToEmit[i].second), false, false, 0); OutChains.push_back(Val); } if (!OutChains.empty()) Chain = DAG.getNode(ISD::TokenFactor, getCurDebugLoc(), MVT::Other, &OutChains[0], OutChains.size()); DAG.setRoot(Chain); } void SelectionDAGBuilder::visitVAStart(const CallInst &I) { DAG.setRoot(DAG.getNode(ISD::VASTART, getCurDebugLoc(), MVT::Other, getRoot(), getValue(I.getArgOperand(0)), DAG.getSrcValue(I.getArgOperand(0)))); } void SelectionDAGBuilder::visitVAArg(const VAArgInst &I) { const TargetData &TD = *TLI.getTargetData(); SDValue V = DAG.getVAArg(TLI.getValueType(I.getType()), getCurDebugLoc(), getRoot(), getValue(I.getOperand(0)), DAG.getSrcValue(I.getOperand(0)), TD.getABITypeAlignment(I.getType())); setValue(&I, V); DAG.setRoot(V.getValue(1)); } void SelectionDAGBuilder::visitVAEnd(const CallInst &I) { DAG.setRoot(DAG.getNode(ISD::VAEND, getCurDebugLoc(), MVT::Other, getRoot(), getValue(I.getArgOperand(0)), DAG.getSrcValue(I.getArgOperand(0)))); } void SelectionDAGBuilder::visitVACopy(const CallInst &I) { DAG.setRoot(DAG.getNode(ISD::VACOPY, getCurDebugLoc(), MVT::Other, getRoot(), getValue(I.getArgOperand(0)), getValue(I.getArgOperand(1)), DAG.getSrcValue(I.getArgOperand(0)), DAG.getSrcValue(I.getArgOperand(1)))); } /// TargetLowering::LowerCallTo - This is the default LowerCallTo /// implementation, which just calls LowerCall. /// FIXME: When all targets are /// migrated to using LowerCall, this hook should be integrated into SDISel. std::pair TargetLowering::LowerCallTo(SDValue Chain, Type *RetTy, bool RetSExt, bool RetZExt, bool isVarArg, bool isInreg, unsigned NumFixedArgs, CallingConv::ID CallConv, bool isTailCall, bool isReturnValueUsed, SDValue Callee, ArgListTy &Args, SelectionDAG &DAG, DebugLoc dl) const { // Handle all of the outgoing arguments. 
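// Shape of the loop below (illustrative): each argument is first split
// by ComputeValueVTs into its constituent EVTs, and each EVT is then
// split by getCopyToParts into however many legal registers it needs.
// For example, an i64 argument on a 32-bit target becomes two i32
// parts; the first part carries the Split flag and the original
// alignment, and later parts are marked with alignment 1.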
SmallVector Outs; SmallVector OutVals; for (unsigned i = 0, e = Args.size(); i != e; ++i) { SmallVector ValueVTs; ComputeValueVTs(*this, Args[i].Ty, ValueVTs); for (unsigned Value = 0, NumValues = ValueVTs.size(); Value != NumValues; ++Value) { EVT VT = ValueVTs[Value]; Type *ArgTy = VT.getTypeForEVT(RetTy->getContext()); SDValue Op = SDValue(Args[i].Node.getNode(), Args[i].Node.getResNo() + Value); ISD::ArgFlagsTy Flags; unsigned OriginalAlignment = getTargetData()->getABITypeAlignment(ArgTy); if (Args[i].isZExt) Flags.setZExt(); if (Args[i].isSExt) Flags.setSExt(); if (Args[i].isInReg) Flags.setInReg(); if (Args[i].isSRet) Flags.setSRet(); if (Args[i].isByVal) { Flags.setByVal(); PointerType *Ty = cast(Args[i].Ty); Type *ElementTy = Ty->getElementType(); Flags.setByValSize(getTargetData()->getTypeAllocSize(ElementTy)); // For ByVal, alignment should come from FE. BE will guess if this // info is not there but there are cases it cannot get right. unsigned FrameAlign; if (Args[i].Alignment) FrameAlign = Args[i].Alignment; else FrameAlign = getByValTypeAlignment(ElementTy); Flags.setByValAlign(FrameAlign); } if (Args[i].isNest) Flags.setNest(); Flags.setOrigAlign(OriginalAlignment); EVT PartVT = getRegisterType(RetTy->getContext(), VT); unsigned NumParts = getNumRegisters(RetTy->getContext(), VT); SmallVector Parts(NumParts); ISD::NodeType ExtendKind = ISD::ANY_EXTEND; if (Args[i].isSExt) ExtendKind = ISD::SIGN_EXTEND; else if (Args[i].isZExt) ExtendKind = ISD::ZERO_EXTEND; getCopyToParts(DAG, dl, Op, &Parts[0], NumParts, PartVT, ExtendKind); for (unsigned j = 0; j != NumParts; ++j) { // if it isn't first piece, alignment must be 1 ISD::OutputArg MyFlags(Flags, Parts[j].getValueType(), i < NumFixedArgs); if (NumParts > 1 && j == 0) MyFlags.Flags.setSplit(); else if (j != 0) MyFlags.Flags.setOrigAlign(1); Outs.push_back(MyFlags); OutVals.push_back(Parts[j]); } } } // Handle the incoming return values from the call. SmallVector Ins; SmallVector RetTys; ComputeValueVTs(*this, RetTy, RetTys); for (unsigned I = 0, E = RetTys.size(); I != E; ++I) { EVT VT = RetTys[I]; EVT RegisterVT = getRegisterType(RetTy->getContext(), VT); unsigned NumRegs = getNumRegisters(RetTy->getContext(), VT); for (unsigned i = 0; i != NumRegs; ++i) { ISD::InputArg MyFlags; MyFlags.VT = RegisterVT.getSimpleVT(); MyFlags.Used = isReturnValueUsed; if (RetSExt) MyFlags.Flags.setSExt(); if (RetZExt) MyFlags.Flags.setZExt(); if (isInreg) MyFlags.Flags.setInReg(); Ins.push_back(MyFlags); } } SmallVector InVals; Chain = LowerCall(Chain, Callee, CallConv, isVarArg, isTailCall, Outs, OutVals, Ins, dl, DAG, InVals); // Verify that the target's LowerCall behaved as expected. assert(Chain.getNode() && Chain.getValueType() == MVT::Other && "LowerCall didn't return a valid chain!"); assert((!isTailCall || InVals.empty()) && "LowerCall emitted a return value for a tail call!"); assert((isTailCall || InVals.size() == Ins.size()) && "LowerCall didn't emit the correct number of values!"); // For a tail call, the return value is merely live-out and there aren't // any nodes in the DAG representing it. Return a special value to // indicate that a tail call has been emitted and no more Instructions // should be processed in the current block. 
if (isTailCall) { DAG.setRoot(Chain); return std::make_pair(SDValue(), SDValue()); } DEBUG(for (unsigned i = 0, e = Ins.size(); i != e; ++i) { assert(InVals[i].getNode() && "LowerCall emitted a null value!"); assert(EVT(Ins[i].VT) == InVals[i].getValueType() && "LowerCall emitted a value with the wrong type!"); }); // Collect the legal value parts into potentially illegal values // that correspond to the original function's return values. ISD::NodeType AssertOp = ISD::DELETED_NODE; if (RetSExt) AssertOp = ISD::AssertSext; else if (RetZExt) AssertOp = ISD::AssertZext; SmallVector ReturnValues; unsigned CurReg = 0; for (unsigned I = 0, E = RetTys.size(); I != E; ++I) { EVT VT = RetTys[I]; EVT RegisterVT = getRegisterType(RetTy->getContext(), VT); unsigned NumRegs = getNumRegisters(RetTy->getContext(), VT); ReturnValues.push_back(getCopyFromParts(DAG, dl, &InVals[CurReg], NumRegs, RegisterVT, VT, AssertOp)); CurReg += NumRegs; } // For a function returning void, there is no return value. We can't create // such a node, so we just return a null return value in that case. In // that case, nothing will actually look at the value. if (ReturnValues.empty()) return std::make_pair(SDValue(), Chain); SDValue Res = DAG.getNode(ISD::MERGE_VALUES, dl, DAG.getVTList(&RetTys[0], RetTys.size()), &ReturnValues[0], ReturnValues.size()); return std::make_pair(Res, Chain); } void TargetLowering::LowerOperationWrapper(SDNode *N, SmallVectorImpl &Results, SelectionDAG &DAG) const { SDValue Res = LowerOperation(SDValue(N, 0), DAG); if (Res.getNode()) Results.push_back(Res); } SDValue TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const { llvm_unreachable("LowerOperation not implemented for this target!"); return SDValue(); } void SelectionDAGBuilder::CopyValueToVirtualRegister(const Value *V, unsigned Reg) { SDValue Op = getNonRegisterValue(V); assert((Op.getOpcode() != ISD::CopyFromReg || cast(Op.getOperand(1))->getReg() != Reg) && "Copy from a reg to the same reg!"); assert(!TargetRegisterInfo::isPhysicalRegister(Reg) && "Is a physreg"); RegsForValue RFV(V->getContext(), TLI, Reg, V->getType()); SDValue Chain = DAG.getEntryNode(); RFV.getCopyToRegs(Op, DAG, getCurDebugLoc(), Chain, 0); PendingExports.push_back(Chain); } #include "llvm/CodeGen/SelectionDAGISel.h" /// isOnlyUsedInEntryBlock - If the specified argument is only used in the /// entry block, return true. This includes arguments used by switches, since /// the switch may expand into multiple basic blocks. static bool isOnlyUsedInEntryBlock(const Argument *A) { // With FastISel active, we may be splitting blocks, so force creation // of virtual registers for all non-dead arguments. if (EnableFastISel) return A->use_empty(); const BasicBlock *Entry = A->getParent()->begin(); for (Value::const_use_iterator UI = A->use_begin(), E = A->use_end(); UI != E; ++UI) { const User *U = *UI; if (cast(U)->getParent() != Entry || isa(U)) return false; // Use not in entry block. } return true; } void SelectionDAGISel::LowerArguments(const BasicBlock *LLVMBB) { // If this is the entry block, emit arguments. const Function &F = *LLVMBB->getParent(); SelectionDAG &DAG = SDB->DAG; DebugLoc dl = SDB->getCurDebugLoc(); const TargetData *TD = TLI.getTargetData(); SmallVector Ins; // Check whether the function can return without sret-demotion. SmallVector Outs; GetReturnInfo(F.getReturnType(), F.getAttributes().getRetAttributes(), Outs, TLI); if (!FuncInfo->CanLowerReturn) { // Put in an sret pointer parameter before all the other parameters. 
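    // When the target reports that it cannot return this type in registers
    // (CanLowerReturn is false), the return value is demoted to memory: the
    // caller passes a hidden pointer and the callee stores the result through
    // it.  Conceptually, 'struct S f()' is lowered as if it were
    // 'void f(S *sret)'; the extra pointer argument is prepended to Ins here.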
SmallVector ValueVTs; ComputeValueVTs(TLI, PointerType::getUnqual(F.getReturnType()), ValueVTs); // NOTE: Assuming that a pointer will never break down to more than one VT // or one register. ISD::ArgFlagsTy Flags; Flags.setSRet(); EVT RegisterVT = TLI.getRegisterType(*DAG.getContext(), ValueVTs[0]); ISD::InputArg RetArg(Flags, RegisterVT, true); Ins.push_back(RetArg); } // Set up the incoming argument description vector. unsigned Idx = 1; for (Function::const_arg_iterator I = F.arg_begin(), E = F.arg_end(); I != E; ++I, ++Idx) { SmallVector ValueVTs; ComputeValueVTs(TLI, I->getType(), ValueVTs); bool isArgValueUsed = !I->use_empty(); for (unsigned Value = 0, NumValues = ValueVTs.size(); Value != NumValues; ++Value) { EVT VT = ValueVTs[Value]; Type *ArgTy = VT.getTypeForEVT(*DAG.getContext()); ISD::ArgFlagsTy Flags; unsigned OriginalAlignment = TD->getABITypeAlignment(ArgTy); if (F.paramHasAttr(Idx, Attribute::ZExt)) Flags.setZExt(); if (F.paramHasAttr(Idx, Attribute::SExt)) Flags.setSExt(); if (F.paramHasAttr(Idx, Attribute::InReg)) Flags.setInReg(); if (F.paramHasAttr(Idx, Attribute::StructRet)) Flags.setSRet(); if (F.paramHasAttr(Idx, Attribute::ByVal)) { Flags.setByVal(); PointerType *Ty = cast(I->getType()); Type *ElementTy = Ty->getElementType(); Flags.setByValSize(TD->getTypeAllocSize(ElementTy)); // For ByVal, alignment should be passed from FE. BE will guess if // this info is not there but there are cases it cannot get right. unsigned FrameAlign; if (F.getParamAlignment(Idx)) FrameAlign = F.getParamAlignment(Idx); else FrameAlign = TLI.getByValTypeAlignment(ElementTy); Flags.setByValAlign(FrameAlign); } if (F.paramHasAttr(Idx, Attribute::Nest)) Flags.setNest(); Flags.setOrigAlign(OriginalAlignment); EVT RegisterVT = TLI.getRegisterType(*CurDAG->getContext(), VT); unsigned NumRegs = TLI.getNumRegisters(*CurDAG->getContext(), VT); for (unsigned i = 0; i != NumRegs; ++i) { ISD::InputArg MyFlags(Flags, RegisterVT, isArgValueUsed); if (NumRegs > 1 && i == 0) MyFlags.Flags.setSplit(); // if it isn't first piece, alignment must be 1 else if (i > 0) MyFlags.Flags.setOrigAlign(1); Ins.push_back(MyFlags); } } } // Call the target to set up the argument values. SmallVector InVals; SDValue NewRoot = TLI.LowerFormalArguments(DAG.getRoot(), F.getCallingConv(), F.isVarArg(), Ins, dl, DAG, InVals); // Verify that the target's LowerFormalArguments behaved as expected. assert(NewRoot.getNode() && NewRoot.getValueType() == MVT::Other && "LowerFormalArguments didn't return a valid chain!"); assert(InVals.size() == Ins.size() && "LowerFormalArguments didn't emit the correct number of values!"); DEBUG({ for (unsigned i = 0, e = Ins.size(); i != e; ++i) { assert(InVals[i].getNode() && "LowerFormalArguments emitted a null value!"); assert(EVT(Ins[i].VT) == InVals[i].getValueType() && "LowerFormalArguments emitted a value with the wrong type!"); } }); // Update the DAG with the new chain value resulting from argument lowering. DAG.setRoot(NewRoot); // Set up the argument values. unsigned i = 0; Idx = 1; if (!FuncInfo->CanLowerReturn) { // Create a virtual register for the sret pointer, and put in a copy // from the sret argument into it. 
SmallVector ValueVTs; ComputeValueVTs(TLI, PointerType::getUnqual(F.getReturnType()), ValueVTs); EVT VT = ValueVTs[0]; EVT RegVT = TLI.getRegisterType(*CurDAG->getContext(), VT); ISD::NodeType AssertOp = ISD::DELETED_NODE; SDValue ArgValue = getCopyFromParts(DAG, dl, &InVals[0], 1, RegVT, VT, AssertOp); MachineFunction& MF = SDB->DAG.getMachineFunction(); MachineRegisterInfo& RegInfo = MF.getRegInfo(); unsigned SRetReg = RegInfo.createVirtualRegister(TLI.getRegClassFor(RegVT)); FuncInfo->DemoteRegister = SRetReg; NewRoot = SDB->DAG.getCopyToReg(NewRoot, SDB->getCurDebugLoc(), SRetReg, ArgValue); DAG.setRoot(NewRoot); // i indexes lowered arguments. Bump it past the hidden sret argument. // Idx indexes LLVM arguments. Don't touch it. ++i; } for (Function::const_arg_iterator I = F.arg_begin(), E = F.arg_end(); I != E; ++I, ++Idx) { SmallVector ArgValues; SmallVector ValueVTs; ComputeValueVTs(TLI, I->getType(), ValueVTs); unsigned NumValues = ValueVTs.size(); // If this argument is unused then remember its value. It is used to generate // debugging information. if (I->use_empty() && NumValues) SDB->setUnusedArgValue(I, InVals[i]); for (unsigned Val = 0; Val != NumValues; ++Val) { EVT VT = ValueVTs[Val]; EVT PartVT = TLI.getRegisterType(*CurDAG->getContext(), VT); unsigned NumParts = TLI.getNumRegisters(*CurDAG->getContext(), VT); if (!I->use_empty()) { ISD::NodeType AssertOp = ISD::DELETED_NODE; if (F.paramHasAttr(Idx, Attribute::SExt)) AssertOp = ISD::AssertSext; else if (F.paramHasAttr(Idx, Attribute::ZExt)) AssertOp = ISD::AssertZext; ArgValues.push_back(getCopyFromParts(DAG, dl, &InVals[i], NumParts, PartVT, VT, AssertOp)); } i += NumParts; } // We don't need to do anything else for unused arguments. if (ArgValues.empty()) continue; // Note down frame index. if (FrameIndexSDNode *FI = dyn_cast(ArgValues[0].getNode())) FuncInfo->setArgumentFrameIndex(I, FI->getIndex()); SDValue Res = DAG.getMergeValues(&ArgValues[0], NumValues, SDB->getCurDebugLoc()); SDB->setValue(I, Res); if (!EnableFastISel && Res.getOpcode() == ISD::BUILD_PAIR) { if (LoadSDNode *LNode = dyn_cast(Res.getOperand(0).getNode())) if (FrameIndexSDNode *FI = dyn_cast(LNode->getBasePtr().getNode())) FuncInfo->setArgumentFrameIndex(I, FI->getIndex()); } // If this argument is live outside of the entry block, insert a copy from // wherever we got it to the vreg that other BB's will reference it as. if (!EnableFastISel && Res.getOpcode() == ISD::CopyFromReg) { // If we can, though, try to skip creating an unnecessary vreg. // FIXME: This isn't very clean... it would be nice to make this more // general. It's also subtly incompatible with the hacks FastISel // uses with vregs. unsigned Reg = cast(Res.getOperand(1))->getReg(); if (TargetRegisterInfo::isVirtualRegister(Reg)) { FuncInfo->ValueMap[I] = Reg; continue; } } if (!isOnlyUsedInEntryBlock(I)) { FuncInfo->InitializeRegForValue(I); SDB->CopyToExportRegsIfNeeded(I); } } assert(i == InVals.size() && "Argument register count mismatch!"); // Finally, if the target has anything special to do, allow it to do so. // FIXME: this should insert code into the DAG! EmitFunctionEntryCode(); } /// Handle PHI nodes in successor blocks. Emit code into the SelectionDAG to /// ensure constants are generated when needed. Remember the virtual registers /// that need to be added to the Machine PHI nodes as input. We cannot just /// directly add them, because expansion might result in multiple MBB's for one /// BB. As such, the start of the BB might correspond to a different MBB than /// the end. 
/// void SelectionDAGBuilder::HandlePHINodesInSuccessorBlocks(const BasicBlock *LLVMBB) { const TerminatorInst *TI = LLVMBB->getTerminator(); SmallPtrSet SuccsHandled; // Check successor nodes' PHI nodes that expect a constant to be available // from this block. for (unsigned succ = 0, e = TI->getNumSuccessors(); succ != e; ++succ) { const BasicBlock *SuccBB = TI->getSuccessor(succ); if (!isa(SuccBB->begin())) continue; MachineBasicBlock *SuccMBB = FuncInfo.MBBMap[SuccBB]; // If this terminator has multiple identical successors (common for // switches), only handle each succ once. if (!SuccsHandled.insert(SuccMBB)) continue; MachineBasicBlock::iterator MBBI = SuccMBB->begin(); // At this point we know that there is a 1-1 correspondence between LLVM PHI // nodes and Machine PHI nodes, but the incoming operands have not been // emitted yet. for (BasicBlock::const_iterator I = SuccBB->begin(); const PHINode *PN = dyn_cast(I); ++I) { // Ignore dead phi's. if (PN->use_empty()) continue; // Skip empty types if (PN->getType()->isEmptyTy()) continue; unsigned Reg; const Value *PHIOp = PN->getIncomingValueForBlock(LLVMBB); if (const Constant *C = dyn_cast(PHIOp)) { unsigned &RegOut = ConstantsOut[C]; if (RegOut == 0) { RegOut = FuncInfo.CreateRegs(C->getType()); CopyValueToVirtualRegister(C, RegOut); } Reg = RegOut; } else { DenseMap::iterator I = FuncInfo.ValueMap.find(PHIOp); if (I != FuncInfo.ValueMap.end()) Reg = I->second; else { assert(isa(PHIOp) && FuncInfo.StaticAllocaMap.count(cast(PHIOp)) && "Didn't codegen value into a register!??"); Reg = FuncInfo.CreateRegs(PHIOp->getType()); CopyValueToVirtualRegister(PHIOp, Reg); } } // Remember that this register needs to added to the machine PHI node as // the input for this MBB. SmallVector ValueVTs; ComputeValueVTs(TLI, PN->getType(), ValueVTs); for (unsigned vti = 0, vte = ValueVTs.size(); vti != vte; ++vti) { EVT VT = ValueVTs[vti]; unsigned NumRegisters = TLI.getNumRegisters(*DAG.getContext(), VT); for (unsigned i = 0, e = NumRegisters; i != e; ++i) FuncInfo.PHINodesToUpdate.push_back(std::make_pair(MBBI++, Reg+i)); Reg += NumRegisters; } } } ConstantsOut.clear(); } Index: vendor/llvm/dist/lib/CodeGen/TargetLoweringObjectFileImpl.cpp =================================================================== --- vendor/llvm/dist/lib/CodeGen/TargetLoweringObjectFileImpl.cpp (revision 228363) +++ vendor/llvm/dist/lib/CodeGen/TargetLoweringObjectFileImpl.cpp (revision 228364) @@ -1,625 +1,629 @@ //===-- llvm/CodeGen/TargetLoweringObjectFileImpl.cpp - Object File Info --===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file implements classes used to handle lowerings specific to common // object file formats. 
// //===----------------------------------------------------------------------===// #include "llvm/CodeGen/TargetLoweringObjectFileImpl.h" #include "llvm/Constants.h" #include "llvm/DerivedTypes.h" #include "llvm/Function.h" #include "llvm/GlobalVariable.h" #include "llvm/CodeGen/MachineModuleInfoImpls.h" #include "llvm/MC/MCContext.h" #include "llvm/MC/MCExpr.h" #include "llvm/MC/MCSectionMachO.h" #include "llvm/MC/MCSectionELF.h" #include "llvm/MC/MCSectionCOFF.h" #include "llvm/MC/MCStreamer.h" #include "llvm/MC/MCSymbol.h" #include "llvm/Target/Mangler.h" #include "llvm/Target/TargetData.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Target/TargetOptions.h" #include "llvm/Support/Dwarf.h" #include "llvm/Support/ELF.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/raw_ostream.h" #include "llvm/ADT/SmallString.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/Triple.h" using namespace llvm; using namespace dwarf; //===----------------------------------------------------------------------===// // ELF //===----------------------------------------------------------------------===// MCSymbol * TargetLoweringObjectFileELF::getCFIPersonalitySymbol(const GlobalValue *GV, Mangler *Mang, MachineModuleInfo *MMI) const { unsigned Encoding = getPersonalityEncoding(); switch (Encoding & 0x70) { default: report_fatal_error("We do not support this DWARF encoding yet!"); case dwarf::DW_EH_PE_absptr: return Mang->getSymbol(GV); break; case dwarf::DW_EH_PE_pcrel: { return getContext().GetOrCreateSymbol(StringRef("DW.ref.") + Mang->getSymbol(GV)->getName()); break; } } } void TargetLoweringObjectFileELF::emitPersonalityValue(MCStreamer &Streamer, const TargetMachine &TM, const MCSymbol *Sym) const { SmallString<64> NameData("DW.ref."); NameData += Sym->getName(); MCSymbol *Label = getContext().GetOrCreateSymbol(NameData); Streamer.EmitSymbolAttribute(Label, MCSA_Hidden); Streamer.EmitSymbolAttribute(Label, MCSA_Weak); StringRef Prefix = ".data."; NameData.insert(NameData.begin(), Prefix.begin(), Prefix.end()); unsigned Flags = ELF::SHF_ALLOC | ELF::SHF_WRITE | ELF::SHF_GROUP; const MCSection *Sec = getContext().getELFSection(NameData, ELF::SHT_PROGBITS, Flags, SectionKind::getDataRel(), 0, Label->getName()); Streamer.SwitchSection(Sec); Streamer.EmitValueToAlignment(8); Streamer.EmitSymbolAttribute(Label, MCSA_ELF_TypeObject); const MCExpr *E = MCConstantExpr::Create(8, getContext()); Streamer.EmitELFSize(Label, E); Streamer.EmitLabel(Label); unsigned Size = TM.getTargetData()->getPointerSize(); Streamer.EmitSymbolValue(Sym, Size); } static SectionKind getELFKindForNamedSection(StringRef Name, SectionKind K) { // N.B.: The defaults used in here are no the same ones used in MC. // We follow gcc, MC follows gas. For example, given ".section .eh_frame", // both gas and MC will produce a section with no flags. Given // section(".eh_frame") gcc will produce // .section .eh_frame,"a",@progbits if (Name.empty() || Name[0] != '.') return K; // Some lame default implementation based on some magic section names. 
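  // The checks below only recognize the well-known ELF names (.bss, .sbss,
  // .tdata, .tbss and their .gnu.linkonce.* / .llvm.linkonce.* variants), so
  // that a global placed with an explicit section attribute, e.g.
  //   __attribute__((section(".tbss.mine"))) __thread int x;
  // still gets the matching SectionKind and therefore the right section flags.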
if (Name == ".bss" || Name.startswith(".bss.") || Name.startswith(".gnu.linkonce.b.") || Name.startswith(".llvm.linkonce.b.") || Name == ".sbss" || Name.startswith(".sbss.") || Name.startswith(".gnu.linkonce.sb.") || Name.startswith(".llvm.linkonce.sb.")) return SectionKind::getBSS(); if (Name == ".tdata" || Name.startswith(".tdata.") || Name.startswith(".gnu.linkonce.td.") || Name.startswith(".llvm.linkonce.td.")) return SectionKind::getThreadData(); if (Name == ".tbss" || Name.startswith(".tbss.") || Name.startswith(".gnu.linkonce.tb.") || Name.startswith(".llvm.linkonce.tb.")) return SectionKind::getThreadBSS(); return K; } static unsigned getELFSectionType(StringRef Name, SectionKind K) { if (Name == ".init_array") return ELF::SHT_INIT_ARRAY; if (Name == ".fini_array") return ELF::SHT_FINI_ARRAY; if (Name == ".preinit_array") return ELF::SHT_PREINIT_ARRAY; if (K.isBSS() || K.isThreadBSS()) return ELF::SHT_NOBITS; return ELF::SHT_PROGBITS; } static unsigned getELFSectionFlags(SectionKind K) { unsigned Flags = 0; if (!K.isMetadata()) Flags |= ELF::SHF_ALLOC; if (K.isText()) Flags |= ELF::SHF_EXECINSTR; if (K.isWriteable()) Flags |= ELF::SHF_WRITE; if (K.isThreadLocal()) Flags |= ELF::SHF_TLS; // K.isMergeableConst() is left out to honour PR4650 if (K.isMergeableCString() || K.isMergeableConst4() || K.isMergeableConst8() || K.isMergeableConst16()) Flags |= ELF::SHF_MERGE; if (K.isMergeableCString()) Flags |= ELF::SHF_STRINGS; return Flags; } const MCSection *TargetLoweringObjectFileELF:: getExplicitSectionGlobal(const GlobalValue *GV, SectionKind Kind, Mangler *Mang, const TargetMachine &TM) const { StringRef SectionName = GV->getSection(); // Infer section flags from the section name if we can. Kind = getELFKindForNamedSection(SectionName, Kind); return getContext().getELFSection(SectionName, getELFSectionType(SectionName, Kind), getELFSectionFlags(Kind), Kind); } /// getSectionPrefixForGlobal - Return the section prefix name used by options /// FunctionsSections and DataSections. static const char *getSectionPrefixForGlobal(SectionKind Kind) { if (Kind.isText()) return ".text."; if (Kind.isReadOnly()) return ".rodata."; if (Kind.isThreadData()) return ".tdata."; if (Kind.isThreadBSS()) return ".tbss."; if (Kind.isDataNoRel()) return ".data."; if (Kind.isDataRelLocal()) return ".data.rel.local."; if (Kind.isDataRel()) return ".data.rel."; if (Kind.isReadOnlyWithRelLocal()) return ".data.rel.ro.local."; assert(Kind.isReadOnlyWithRel() && "Unknown section kind"); return ".data.rel.ro."; } const MCSection *TargetLoweringObjectFileELF:: SelectSectionForGlobal(const GlobalValue *GV, SectionKind Kind, Mangler *Mang, const TargetMachine &TM) const { // If we have -ffunction-section or -fdata-section then we should emit the // global value to a uniqued section specifically for it. bool EmitUniquedSection; if (Kind.isText()) EmitUniquedSection = TM.getFunctionSections(); else EmitUniquedSection = TM.getDataSections(); // If this global is linkonce/weak and the target handles this by emitting it // into a 'uniqued' section name, create and return the section now. 
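  // The uniqued name is the kind-specific prefix followed by the mangled
  // symbol name (e.g. a function foo emitted under -ffunction-sections ends
  // up in ".text.foo").  Weak globals additionally go into an ELF section
  // group keyed by the symbol name (SHF_GROUP), which lets the linker keep
  // only one copy of the definition.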
if ((GV->isWeakForLinker() || EmitUniquedSection) && !Kind.isCommon() && !Kind.isBSS()) { const char *Prefix; Prefix = getSectionPrefixForGlobal(Kind); SmallString<128> Name(Prefix, Prefix+strlen(Prefix)); MCSymbol *Sym = Mang->getSymbol(GV); Name.append(Sym->getName().begin(), Sym->getName().end()); StringRef Group = ""; unsigned Flags = getELFSectionFlags(Kind); if (GV->isWeakForLinker()) { Group = Sym->getName(); Flags |= ELF::SHF_GROUP; } return getContext().getELFSection(Name.str(), getELFSectionType(Name.str(), Kind), Flags, Kind, 0, Group); } if (Kind.isText()) return TextSection; if (Kind.isMergeable1ByteCString() || Kind.isMergeable2ByteCString() || Kind.isMergeable4ByteCString()) { // We also need alignment here. // FIXME: this is getting the alignment of the character, not the // alignment of the global! unsigned Align = TM.getTargetData()->getPreferredAlignment(cast(GV)); const char *SizeSpec = ".rodata.str1."; if (Kind.isMergeable2ByteCString()) SizeSpec = ".rodata.str2."; else if (Kind.isMergeable4ByteCString()) SizeSpec = ".rodata.str4."; else assert(Kind.isMergeable1ByteCString() && "unknown string width"); std::string Name = SizeSpec + utostr(Align); return getContext().getELFSection(Name, ELF::SHT_PROGBITS, ELF::SHF_ALLOC | ELF::SHF_MERGE | ELF::SHF_STRINGS, Kind); } if (Kind.isMergeableConst()) { if (Kind.isMergeableConst4() && MergeableConst4Section) return MergeableConst4Section; if (Kind.isMergeableConst8() && MergeableConst8Section) return MergeableConst8Section; if (Kind.isMergeableConst16() && MergeableConst16Section) return MergeableConst16Section; return ReadOnlySection; // .const } if (Kind.isReadOnly()) return ReadOnlySection; if (Kind.isThreadData()) return TLSDataSection; if (Kind.isThreadBSS()) return TLSBSSSection; // Note: we claim that common symbols are put in BSSSection, but they are // really emitted with the magic .comm directive, which creates a symbol table // entry but not a section. if (Kind.isBSS() || Kind.isCommon()) return BSSSection; if (Kind.isDataNoRel()) return DataSection; if (Kind.isDataRelLocal()) return DataRelLocalSection; if (Kind.isDataRel()) return DataRelSection; if (Kind.isReadOnlyWithRelLocal()) return DataRelROLocalSection; assert(Kind.isReadOnlyWithRel() && "Unknown section kind"); return DataRelROSection; } /// getSectionForConstant - Given a mergeable constant with the /// specified size and relocation information, return a section that it /// should be placed in. const MCSection *TargetLoweringObjectFileELF:: getSectionForConstant(SectionKind Kind) const { if (Kind.isMergeableConst4() && MergeableConst4Section) return MergeableConst4Section; if (Kind.isMergeableConst8() && MergeableConst8Section) return MergeableConst8Section; if (Kind.isMergeableConst16() && MergeableConst16Section) return MergeableConst16Section; if (Kind.isReadOnly()) return ReadOnlySection; if (Kind.isReadOnlyWithRelLocal()) return DataRelROLocalSection; assert(Kind.isReadOnlyWithRel() && "Unknown section kind"); return DataRelROSection; } const MCExpr *TargetLoweringObjectFileELF:: getExprForDwarfGlobalReference(const GlobalValue *GV, Mangler *Mang, MachineModuleInfo *MMI, unsigned Encoding, MCStreamer &Streamer) const { if (Encoding & dwarf::DW_EH_PE_indirect) { MachineModuleInfoELF &ELFMMI = MMI->getObjFileInfo(); SmallString<128> Name; Mang->getNameWithPrefix(Name, GV, true); Name += ".DW.stub"; // Add information about the stub reference to ELFMMI so that the stub // gets emitted by the asmprinter. 
MCSymbol *SSym = getContext().GetOrCreateSymbol(Name.str()); MachineModuleInfoImpl::StubValueTy &StubSym = ELFMMI.getGVStubEntry(SSym); if (StubSym.getPointer() == 0) { MCSymbol *Sym = Mang->getSymbol(GV); StubSym = MachineModuleInfoImpl::StubValueTy(Sym, !GV->hasLocalLinkage()); } return TargetLoweringObjectFile:: getExprForDwarfReference(SSym, Encoding & ~dwarf::DW_EH_PE_indirect, Streamer); } return TargetLoweringObjectFile:: getExprForDwarfGlobalReference(GV, Mang, MMI, Encoding, Streamer); } //===----------------------------------------------------------------------===// // MachO //===----------------------------------------------------------------------===// const MCSection *TargetLoweringObjectFileMachO:: getExplicitSectionGlobal(const GlobalValue *GV, SectionKind Kind, Mangler *Mang, const TargetMachine &TM) const { // Parse the section specifier and create it if valid. StringRef Segment, Section; unsigned TAA = 0, StubSize = 0; bool TAAParsed; std::string ErrorCode = MCSectionMachO::ParseSectionSpecifier(GV->getSection(), Segment, Section, TAA, TAAParsed, StubSize); if (!ErrorCode.empty()) { // If invalid, report the error with report_fatal_error. report_fatal_error("Global variable '" + GV->getNameStr() + "' has an invalid section specifier '" + GV->getSection()+ "': " + ErrorCode + "."); // Fall back to dropping it into the data section. return DataSection; } // Get the section. const MCSectionMachO *S = getContext().getMachOSection(Segment, Section, TAA, StubSize, Kind); // If TAA wasn't set by ParseSectionSpecifier() above, // use the value returned by getMachOSection() as a default. if (!TAAParsed) TAA = S->getTypeAndAttributes(); // Okay, now that we got the section, verify that the TAA & StubSize agree. // If the user declared multiple globals with different section flags, we need // to reject it here. if (S->getTypeAndAttributes() != TAA || S->getStubSize() != StubSize) { // If invalid, report the error with report_fatal_error. report_fatal_error("Global variable '" + GV->getNameStr() + "' section type or attributes does not match previous" " section specifier"); } return S; } const MCSection *TargetLoweringObjectFileMachO:: SelectSectionForGlobal(const GlobalValue *GV, SectionKind Kind, Mangler *Mang, const TargetMachine &TM) const { // Handle thread local data. if (Kind.isThreadBSS()) return TLSBSSSection; if (Kind.isThreadData()) return TLSDataSection; if (Kind.isText()) return GV->isWeakForLinker() ? TextCoalSection : TextSection; // If this is weak/linkonce, put this in a coalescable section, either in text // or data depending on if it is writable. if (GV->isWeakForLinker()) { if (Kind.isReadOnly()) return ConstTextCoalSection; return DataCoalSection; } // FIXME: Alignment check should be handled by section classifier. if (Kind.isMergeable1ByteCString() && TM.getTargetData()->getPreferredAlignment(cast(GV)) < 32) return CStringSection; // Do not put 16-bit arrays in the UString section if they have an // externally visible label, this runs into issues with certain linker // versions. 
if (Kind.isMergeable2ByteCString() && !GV->hasExternalLinkage() && TM.getTargetData()->getPreferredAlignment(cast(GV)) < 32) return UStringSection; if (Kind.isMergeableConst()) { if (Kind.isMergeableConst4()) return FourByteConstantSection; if (Kind.isMergeableConst8()) return EightByteConstantSection; if (Kind.isMergeableConst16() && SixteenByteConstantSection) return SixteenByteConstantSection; } // Otherwise, if it is readonly, but not something we can specially optimize, // just drop it in .const. if (Kind.isReadOnly()) return ReadOnlySection; // If this is marked const, put it into a const section. But if the dynamic // linker needs to write to it, put it in the data segment. if (Kind.isReadOnlyWithRel()) return ConstDataSection; // Put zero initialized globals with strong external linkage in the // DATA, __common section with the .zerofill directive. if (Kind.isBSSExtern()) return DataCommonSection; // Put zero initialized globals with local linkage in __DATA,__bss directive // with the .zerofill directive (aka .lcomm). if (Kind.isBSSLocal()) return DataBSSSection; // Otherwise, just drop the variable in the normal data section. return DataSection; } const MCSection * TargetLoweringObjectFileMachO::getSectionForConstant(SectionKind Kind) const { // If this constant requires a relocation, we have to put it in the data // segment, not in the text segment. if (Kind.isDataRel() || Kind.isReadOnlyWithRel()) return ConstDataSection; if (Kind.isMergeableConst4()) return FourByteConstantSection; if (Kind.isMergeableConst8()) return EightByteConstantSection; if (Kind.isMergeableConst16() && SixteenByteConstantSection) return SixteenByteConstantSection; return ReadOnlySection; // .const } /// shouldEmitUsedDirectiveFor - This hook allows targets to selectively decide /// not to emit the UsedDirective for some symbols in llvm.used. // FIXME: REMOVE this (rdar://7071300) bool TargetLoweringObjectFileMachO:: shouldEmitUsedDirectiveFor(const GlobalValue *GV, Mangler *Mang) const { /// On Darwin, internally linked data beginning with "L" or "l" does not have /// the directive emitted (this occurs in ObjC metadata). if (!GV) return false; // Check whether the mangled name has the "Private" or "LinkerPrivate" prefix. if (GV->hasLocalLinkage() && !isa(GV)) { // FIXME: ObjC metadata is currently emitted as internal symbols that have // \1L and \0l prefixes on them. Fix them to be Private/LinkerPrivate and // this horrible hack can go away. MCSymbol *Sym = Mang->getSymbol(GV); if (Sym->getName()[0] == 'L' || Sym->getName()[0] == 'l') return false; } return true; } const MCExpr *TargetLoweringObjectFileMachO:: getExprForDwarfGlobalReference(const GlobalValue *GV, Mangler *Mang, MachineModuleInfo *MMI, unsigned Encoding, MCStreamer &Streamer) const { // The mach-o version of this method defaults to returning a stub reference. if (Encoding & DW_EH_PE_indirect) { MachineModuleInfoMachO &MachOMMI = MMI->getObjFileInfo(); SmallString<128> Name; Mang->getNameWithPrefix(Name, GV, true); Name += "$non_lazy_ptr"; // Add information about the stub reference to MachOMMI so that the stub // gets emitted by the asmprinter. MCSymbol *SSym = getContext().GetOrCreateSymbol(Name.str()); - MachineModuleInfoImpl::StubValueTy &StubSym = MachOMMI.getGVStubEntry(SSym); + MachineModuleInfoImpl::StubValueTy &StubSym = + GV->hasHiddenVisibility() ? 
      MachOMMI.getHiddenGVStubEntry(SSym) :
+      MachOMMI.getGVStubEntry(SSym);
    if (StubSym.getPointer() == 0) {
      MCSymbol *Sym = Mang->getSymbol(GV);
      StubSym = MachineModuleInfoImpl::StubValueTy(Sym, !GV->hasLocalLinkage());
    }

    return TargetLoweringObjectFile::
      getExprForDwarfReference(SSym, Encoding & ~dwarf::DW_EH_PE_indirect,
                               Streamer);
  }

  return TargetLoweringObjectFile::
    getExprForDwarfGlobalReference(GV, Mang, MMI, Encoding, Streamer);
}

MCSymbol *TargetLoweringObjectFileMachO::
getCFIPersonalitySymbol(const GlobalValue *GV, Mangler *Mang,
                        MachineModuleInfo *MMI) const {
  // The mach-o version of this method defaults to returning a stub reference.
  MachineModuleInfoMachO &MachOMMI =
    MMI->getObjFileInfo<MachineModuleInfoMachO>();

  SmallString<128> Name;
  Mang->getNameWithPrefix(Name, GV, true);
  Name += "$non_lazy_ptr";

  // Add information about the stub reference to MachOMMI so that the stub
  // gets emitted by the asmprinter.
  MCSymbol *SSym = getContext().GetOrCreateSymbol(Name.str());
-  MachineModuleInfoImpl::StubValueTy &StubSym = MachOMMI.getGVStubEntry(SSym);
+  MachineModuleInfoImpl::StubValueTy &StubSym =
+    GV->hasHiddenVisibility() ? MachOMMI.getHiddenGVStubEntry(SSym) :
+    MachOMMI.getGVStubEntry(SSym);
  if (StubSym.getPointer() == 0) {
    MCSymbol *Sym = Mang->getSymbol(GV);
    StubSym = MachineModuleInfoImpl::StubValueTy(Sym, !GV->hasLocalLinkage());
  }

  return SSym;
}

//===----------------------------------------------------------------------===//
//                                  COFF
//===----------------------------------------------------------------------===//

static unsigned getCOFFSectionFlags(SectionKind K) {
  unsigned Flags = 0;

  if (K.isMetadata())
    Flags |= COFF::IMAGE_SCN_MEM_DISCARDABLE;
  else if (K.isText())
    Flags |= COFF::IMAGE_SCN_MEM_EXECUTE |
             COFF::IMAGE_SCN_MEM_READ |
             COFF::IMAGE_SCN_CNT_CODE;
  else if (K.isBSS ())
    Flags |= COFF::IMAGE_SCN_CNT_UNINITIALIZED_DATA |
             COFF::IMAGE_SCN_MEM_READ |
             COFF::IMAGE_SCN_MEM_WRITE;
  else if (K.isReadOnly())
    Flags |= COFF::IMAGE_SCN_CNT_INITIALIZED_DATA |
             COFF::IMAGE_SCN_MEM_READ;
  else if (K.isWriteable())
    Flags |= COFF::IMAGE_SCN_CNT_INITIALIZED_DATA |
             COFF::IMAGE_SCN_MEM_READ |
             COFF::IMAGE_SCN_MEM_WRITE;

  return Flags;
}

const MCSection *TargetLoweringObjectFileCOFF::
getExplicitSectionGlobal(const GlobalValue *GV, SectionKind Kind,
                         Mangler *Mang, const TargetMachine &TM) const {
  return getContext().getCOFFSection(GV->getSection(),
                                     getCOFFSectionFlags(Kind),
                                     Kind);
}

static const char *getCOFFSectionPrefixForUniqueGlobal(SectionKind Kind) {
  if (Kind.isText())
    return ".text$";
  if (Kind.isBSS ())
    return ".bss$";
  if (Kind.isWriteable())
    return ".data$";
  return ".rdata$";
}

const MCSection *TargetLoweringObjectFileCOFF::
SelectSectionForGlobal(const GlobalValue *GV, SectionKind Kind,
                       Mangler *Mang, const TargetMachine &TM) const {
  assert(!Kind.isThreadLocal() && "Doesn't support TLS");

  // If this global is linkonce/weak and the target handles this by emitting it
  // into a 'uniqued' section name, create and return the section now.
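  // On COFF the uniqued section is named with a '$' suffix (.text$<name>,
  // .data$<name>, ...) and marked IMAGE_SCN_LNK_COMDAT with
  // IMAGE_COMDAT_SELECT_ANY, so the linker folds duplicate definitions that
  // other translation units may emit for the same linkonce/weak global.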
if (GV->isWeakForLinker()) { const char *Prefix = getCOFFSectionPrefixForUniqueGlobal(Kind); SmallString<128> Name(Prefix, Prefix+strlen(Prefix)); MCSymbol *Sym = Mang->getSymbol(GV); Name.append(Sym->getName().begin() + 1, Sym->getName().end()); unsigned Characteristics = getCOFFSectionFlags(Kind); Characteristics |= COFF::IMAGE_SCN_LNK_COMDAT; return getContext().getCOFFSection(Name.str(), Characteristics, COFF::IMAGE_COMDAT_SELECT_ANY, Kind); } if (Kind.isText()) return getTextSection(); return getDataSection(); } Index: vendor/llvm/dist/lib/Target/ARM/ARMBaseRegisterInfo.cpp =================================================================== --- vendor/llvm/dist/lib/Target/ARM/ARMBaseRegisterInfo.cpp (revision 228363) +++ vendor/llvm/dist/lib/Target/ARM/ARMBaseRegisterInfo.cpp (revision 228364) @@ -1,1208 +1,1221 @@ //===- ARMBaseRegisterInfo.cpp - ARM Register Information -------*- C++ -*-===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file contains the base ARM implementation of TargetRegisterInfo class. // //===----------------------------------------------------------------------===// #include "ARM.h" #include "ARMBaseInstrInfo.h" #include "ARMBaseRegisterInfo.h" #include "ARMFrameLowering.h" #include "ARMInstrInfo.h" #include "ARMMachineFunctionInfo.h" #include "ARMSubtarget.h" #include "MCTargetDesc/ARMAddressingModes.h" #include "llvm/Constants.h" #include "llvm/DerivedTypes.h" #include "llvm/Function.h" #include "llvm/LLVMContext.h" #include "llvm/CodeGen/MachineConstantPool.h" #include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/RegisterScavenging.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Target/TargetFrameLowering.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Target/TargetOptions.h" #include "llvm/ADT/BitVector.h" #include "llvm/ADT/SmallVector.h" #include "llvm/Support/CommandLine.h" #define GET_REGINFO_TARGET_DESC #include "ARMGenRegisterInfo.inc" using namespace llvm; static cl::opt ForceAllBaseRegAlloc("arm-force-base-reg-alloc", cl::Hidden, cl::init(false), cl::desc("Force use of virtual base registers for stack load/store")); static cl::opt EnableLocalStackAlloc("enable-local-stack-alloc", cl::init(true), cl::Hidden, cl::desc("Enable pre-regalloc stack frame index allocation")); static cl::opt EnableBasePointer("arm-use-base-pointer", cl::Hidden, cl::init(true), cl::desc("Enable use of a base pointer for complex stack frames")); ARMBaseRegisterInfo::ARMBaseRegisterInfo(const ARMBaseInstrInfo &tii, const ARMSubtarget &sti) : ARMGenRegisterInfo(ARM::LR), TII(tii), STI(sti), FramePtr((STI.isTargetDarwin() || STI.isThumb()) ? ARM::R7 : ARM::R11), BasePtr(ARM::R6) { } const unsigned* ARMBaseRegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const { + bool ghcCall = false; + + if (MF) { + const Function *F = MF->getFunction(); + ghcCall = (F ? 
F->getCallingConv() == CallingConv::GHC : false); + } + static const unsigned CalleeSavedRegs[] = { ARM::LR, ARM::R11, ARM::R10, ARM::R9, ARM::R8, ARM::R7, ARM::R6, ARM::R5, ARM::R4, ARM::D15, ARM::D14, ARM::D13, ARM::D12, ARM::D11, ARM::D10, ARM::D9, ARM::D8, 0 }; static const unsigned DarwinCalleeSavedRegs[] = { // Darwin ABI deviates from ARM standard ABI. R9 is not a callee-saved // register. ARM::LR, ARM::R7, ARM::R6, ARM::R5, ARM::R4, ARM::R11, ARM::R10, ARM::R8, ARM::D15, ARM::D14, ARM::D13, ARM::D12, ARM::D11, ARM::D10, ARM::D9, ARM::D8, 0 }; - return STI.isTargetDarwin() ? DarwinCalleeSavedRegs : CalleeSavedRegs; + + static const unsigned GhcCalleeSavedRegs[] = { + 0 + }; + + return ghcCall ? GhcCalleeSavedRegs : + STI.isTargetDarwin() ? DarwinCalleeSavedRegs : CalleeSavedRegs; } BitVector ARMBaseRegisterInfo:: getReservedRegs(const MachineFunction &MF) const { const TargetFrameLowering *TFI = MF.getTarget().getFrameLowering(); // FIXME: avoid re-calculating this every time. BitVector Reserved(getNumRegs()); Reserved.set(ARM::SP); Reserved.set(ARM::PC); Reserved.set(ARM::FPSCR); if (TFI->hasFP(MF)) Reserved.set(FramePtr); if (hasBasePointer(MF)) Reserved.set(BasePtr); // Some targets reserve R9. if (STI.isR9Reserved()) Reserved.set(ARM::R9); // Reserve D16-D31 if the subtarget doesn't support them. if (!STI.hasVFP3() || STI.hasD16()) { assert(ARM::D31 == ARM::D16 + 15); for (unsigned i = 0; i != 16; ++i) Reserved.set(ARM::D16 + i); } return Reserved; } bool ARMBaseRegisterInfo::isReservedReg(const MachineFunction &MF, unsigned Reg) const { const TargetFrameLowering *TFI = MF.getTarget().getFrameLowering(); switch (Reg) { default: break; case ARM::SP: case ARM::PC: return true; case ARM::R6: if (hasBasePointer(MF)) return true; break; case ARM::R7: case ARM::R11: if (FramePtr == Reg && TFI->hasFP(MF)) return true; break; case ARM::R9: return STI.isR9Reserved(); } return false; } const TargetRegisterClass * ARMBaseRegisterInfo::getMatchingSuperRegClass(const TargetRegisterClass *A, const TargetRegisterClass *B, unsigned SubIdx) const { switch (SubIdx) { default: return 0; case ARM::ssub_0: case ARM::ssub_1: case ARM::ssub_2: case ARM::ssub_3: { // S sub-registers. if (A->getSize() == 8) { if (B == &ARM::SPR_8RegClass) return &ARM::DPR_8RegClass; assert(B == &ARM::SPRRegClass && "Expecting SPR register class!"); if (A == &ARM::DPR_8RegClass) return A; return &ARM::DPR_VFP2RegClass; } if (A->getSize() == 16) { if (B == &ARM::SPR_8RegClass) return &ARM::QPR_8RegClass; return &ARM::QPR_VFP2RegClass; } if (A->getSize() == 32) { if (B == &ARM::SPR_8RegClass) return 0; // Do not allow coalescing! return &ARM::QQPR_VFP2RegClass; } assert(A->getSize() == 64 && "Expecting a QQQQ register class!"); return 0; // Do not allow coalescing! } case ARM::dsub_0: case ARM::dsub_1: case ARM::dsub_2: case ARM::dsub_3: { // D sub-registers. if (A->getSize() == 16) { if (B == &ARM::DPR_VFP2RegClass) return &ARM::QPR_VFP2RegClass; if (B == &ARM::DPR_8RegClass) return 0; // Do not allow coalescing! return A; } if (A->getSize() == 32) { if (B == &ARM::DPR_VFP2RegClass) return &ARM::QQPR_VFP2RegClass; if (B == &ARM::DPR_8RegClass) return 0; // Do not allow coalescing! return A; } assert(A->getSize() == 64 && "Expecting a QQQQ register class!"); if (B != &ARM::DPRRegClass) return 0; // Do not allow coalescing! return A; } case ARM::dsub_4: case ARM::dsub_5: case ARM::dsub_6: case ARM::dsub_7: { // D sub-registers of QQQQ registers. 
if (A->getSize() == 64 && B == &ARM::DPRRegClass) return A; return 0; // Do not allow coalescing! } case ARM::qsub_0: case ARM::qsub_1: { // Q sub-registers. if (A->getSize() == 32) { if (B == &ARM::QPR_VFP2RegClass) return &ARM::QQPR_VFP2RegClass; if (B == &ARM::QPR_8RegClass) return 0; // Do not allow coalescing! return A; } assert(A->getSize() == 64 && "Expecting a QQQQ register class!"); if (B == &ARM::QPRRegClass) return A; return 0; // Do not allow coalescing! } case ARM::qsub_2: case ARM::qsub_3: { // Q sub-registers of QQQQ registers. if (A->getSize() == 64 && B == &ARM::QPRRegClass) return A; return 0; // Do not allow coalescing! } } return 0; } bool ARMBaseRegisterInfo::canCombineSubRegIndices(const TargetRegisterClass *RC, SmallVectorImpl &SubIndices, unsigned &NewSubIdx) const { unsigned Size = RC->getSize() * 8; if (Size < 6) return 0; NewSubIdx = 0; // Whole register. unsigned NumRegs = SubIndices.size(); if (NumRegs == 8) { // 8 D registers -> 1 QQQQ register. return (Size == 512 && SubIndices[0] == ARM::dsub_0 && SubIndices[1] == ARM::dsub_1 && SubIndices[2] == ARM::dsub_2 && SubIndices[3] == ARM::dsub_3 && SubIndices[4] == ARM::dsub_4 && SubIndices[5] == ARM::dsub_5 && SubIndices[6] == ARM::dsub_6 && SubIndices[7] == ARM::dsub_7); } else if (NumRegs == 4) { if (SubIndices[0] == ARM::qsub_0) { // 4 Q registers -> 1 QQQQ register. return (Size == 512 && SubIndices[1] == ARM::qsub_1 && SubIndices[2] == ARM::qsub_2 && SubIndices[3] == ARM::qsub_3); } else if (SubIndices[0] == ARM::dsub_0) { // 4 D registers -> 1 QQ register. if (Size >= 256 && SubIndices[1] == ARM::dsub_1 && SubIndices[2] == ARM::dsub_2 && SubIndices[3] == ARM::dsub_3) { if (Size == 512) NewSubIdx = ARM::qqsub_0; return true; } } else if (SubIndices[0] == ARM::dsub_4) { // 4 D registers -> 1 QQ register (2nd). if (Size == 512 && SubIndices[1] == ARM::dsub_5 && SubIndices[2] == ARM::dsub_6 && SubIndices[3] == ARM::dsub_7) { NewSubIdx = ARM::qqsub_1; return true; } } else if (SubIndices[0] == ARM::ssub_0) { // 4 S registers -> 1 Q register. if (Size >= 128 && SubIndices[1] == ARM::ssub_1 && SubIndices[2] == ARM::ssub_2 && SubIndices[3] == ARM::ssub_3) { if (Size >= 256) NewSubIdx = ARM::qsub_0; return true; } } } else if (NumRegs == 2) { if (SubIndices[0] == ARM::qsub_0) { // 2 Q registers -> 1 QQ register. if (Size >= 256 && SubIndices[1] == ARM::qsub_1) { if (Size == 512) NewSubIdx = ARM::qqsub_0; return true; } } else if (SubIndices[0] == ARM::qsub_2) { // 2 Q registers -> 1 QQ register (2nd). if (Size == 512 && SubIndices[1] == ARM::qsub_3) { NewSubIdx = ARM::qqsub_1; return true; } } else if (SubIndices[0] == ARM::dsub_0) { // 2 D registers -> 1 Q register. if (Size >= 128 && SubIndices[1] == ARM::dsub_1) { if (Size >= 256) NewSubIdx = ARM::qsub_0; return true; } } else if (SubIndices[0] == ARM::dsub_2) { // 2 D registers -> 1 Q register (2nd). if (Size >= 256 && SubIndices[1] == ARM::dsub_3) { NewSubIdx = ARM::qsub_1; return true; } } else if (SubIndices[0] == ARM::dsub_4) { // 2 D registers -> 1 Q register (3rd). if (Size == 512 && SubIndices[1] == ARM::dsub_5) { NewSubIdx = ARM::qsub_2; return true; } } else if (SubIndices[0] == ARM::dsub_6) { // 2 D registers -> 1 Q register (3rd). if (Size == 512 && SubIndices[1] == ARM::dsub_7) { NewSubIdx = ARM::qsub_3; return true; } } else if (SubIndices[0] == ARM::ssub_0) { // 2 S registers -> 1 D register. 
if (SubIndices[1] == ARM::ssub_1) { if (Size >= 128) NewSubIdx = ARM::dsub_0; return true; } } else if (SubIndices[0] == ARM::ssub_2) { // 2 S registers -> 1 D register (2nd). if (Size >= 128 && SubIndices[1] == ARM::ssub_3) { NewSubIdx = ARM::dsub_1; return true; } } } return false; } const TargetRegisterClass* ARMBaseRegisterInfo::getLargestLegalSuperClass(const TargetRegisterClass *RC) const { const TargetRegisterClass *Super = RC; TargetRegisterClass::sc_iterator I = RC->getSuperClasses(); do { switch (Super->getID()) { case ARM::GPRRegClassID: case ARM::SPRRegClassID: case ARM::DPRRegClassID: case ARM::QPRRegClassID: case ARM::QQPRRegClassID: case ARM::QQQQPRRegClassID: return Super; } Super = *I++; } while (Super); return RC; } const TargetRegisterClass * ARMBaseRegisterInfo::getPointerRegClass(unsigned Kind) const { return ARM::GPRRegisterClass; } const TargetRegisterClass * ARMBaseRegisterInfo::getCrossCopyRegClass(const TargetRegisterClass *RC) const { if (RC == &ARM::CCRRegClass) return 0; // Can't copy CCR registers. return RC; } unsigned ARMBaseRegisterInfo::getRegPressureLimit(const TargetRegisterClass *RC, MachineFunction &MF) const { const TargetFrameLowering *TFI = MF.getTarget().getFrameLowering(); switch (RC->getID()) { default: return 0; case ARM::tGPRRegClassID: return TFI->hasFP(MF) ? 4 : 5; case ARM::GPRRegClassID: { unsigned FP = TFI->hasFP(MF) ? 1 : 0; return 10 - FP - (STI.isR9Reserved() ? 1 : 0); } case ARM::SPRRegClassID: // Currently not used as 'rep' register class. case ARM::DPRRegClassID: return 32 - 10; } } /// getRawAllocationOrder - Returns the register allocation order for a /// specified register class with a target-dependent hint. ArrayRef ARMBaseRegisterInfo::getRawAllocationOrder(const TargetRegisterClass *RC, unsigned HintType, unsigned HintReg, const MachineFunction &MF) const { const TargetFrameLowering *TFI = MF.getTarget().getFrameLowering(); // Alternative register allocation orders when favoring even / odd registers // of register pairs. // No FP, R9 is available. static const unsigned GPREven1[] = { ARM::R0, ARM::R2, ARM::R4, ARM::R6, ARM::R8, ARM::R10, ARM::R1, ARM::R3, ARM::R12,ARM::LR, ARM::R5, ARM::R7, ARM::R9, ARM::R11 }; static const unsigned GPROdd1[] = { ARM::R1, ARM::R3, ARM::R5, ARM::R7, ARM::R9, ARM::R11, ARM::R0, ARM::R2, ARM::R12,ARM::LR, ARM::R4, ARM::R6, ARM::R8, ARM::R10 }; // FP is R7, R9 is available. static const unsigned GPREven2[] = { ARM::R0, ARM::R2, ARM::R4, ARM::R8, ARM::R10, ARM::R1, ARM::R3, ARM::R12,ARM::LR, ARM::R5, ARM::R6, ARM::R9, ARM::R11 }; static const unsigned GPROdd2[] = { ARM::R1, ARM::R3, ARM::R5, ARM::R9, ARM::R11, ARM::R0, ARM::R2, ARM::R12,ARM::LR, ARM::R4, ARM::R6, ARM::R8, ARM::R10 }; // FP is R11, R9 is available. static const unsigned GPREven3[] = { ARM::R0, ARM::R2, ARM::R4, ARM::R6, ARM::R8, ARM::R1, ARM::R3, ARM::R10,ARM::R12,ARM::LR, ARM::R5, ARM::R7, ARM::R9 }; static const unsigned GPROdd3[] = { ARM::R1, ARM::R3, ARM::R5, ARM::R6, ARM::R9, ARM::R0, ARM::R2, ARM::R10,ARM::R12,ARM::LR, ARM::R4, ARM::R7, ARM::R8 }; // No FP, R9 is not available. static const unsigned GPREven4[] = { ARM::R0, ARM::R2, ARM::R4, ARM::R6, ARM::R10, ARM::R1, ARM::R3, ARM::R12,ARM::LR, ARM::R5, ARM::R7, ARM::R8, ARM::R11 }; static const unsigned GPROdd4[] = { ARM::R1, ARM::R3, ARM::R5, ARM::R7, ARM::R11, ARM::R0, ARM::R2, ARM::R12,ARM::LR, ARM::R4, ARM::R6, ARM::R8, ARM::R10 }; // FP is R7, R9 is not available. 
static const unsigned GPREven5[] = { ARM::R0, ARM::R2, ARM::R4, ARM::R10, ARM::R1, ARM::R3, ARM::R12,ARM::LR, ARM::R5, ARM::R6, ARM::R8, ARM::R11 }; static const unsigned GPROdd5[] = { ARM::R1, ARM::R3, ARM::R5, ARM::R11, ARM::R0, ARM::R2, ARM::R12,ARM::LR, ARM::R4, ARM::R6, ARM::R8, ARM::R10 }; // FP is R11, R9 is not available. static const unsigned GPREven6[] = { ARM::R0, ARM::R2, ARM::R4, ARM::R6, ARM::R1, ARM::R3, ARM::R10,ARM::R12,ARM::LR, ARM::R5, ARM::R7, ARM::R8 }; static const unsigned GPROdd6[] = { ARM::R1, ARM::R3, ARM::R5, ARM::R7, ARM::R0, ARM::R2, ARM::R10,ARM::R12,ARM::LR, ARM::R4, ARM::R6, ARM::R8 }; // We only support even/odd hints for GPR and rGPR. if (RC != ARM::GPRRegisterClass && RC != ARM::rGPRRegisterClass) return RC->getRawAllocationOrder(MF); if (HintType == ARMRI::RegPairEven) { if (isPhysicalRegister(HintReg) && getRegisterPairEven(HintReg, MF) == 0) // It's no longer possible to fulfill this hint. Return the default // allocation order. return RC->getRawAllocationOrder(MF); if (!TFI->hasFP(MF)) { if (!STI.isR9Reserved()) return makeArrayRef(GPREven1); else return makeArrayRef(GPREven4); } else if (FramePtr == ARM::R7) { if (!STI.isR9Reserved()) return makeArrayRef(GPREven2); else return makeArrayRef(GPREven5); } else { // FramePtr == ARM::R11 if (!STI.isR9Reserved()) return makeArrayRef(GPREven3); else return makeArrayRef(GPREven6); } } else if (HintType == ARMRI::RegPairOdd) { if (isPhysicalRegister(HintReg) && getRegisterPairOdd(HintReg, MF) == 0) // It's no longer possible to fulfill this hint. Return the default // allocation order. return RC->getRawAllocationOrder(MF); if (!TFI->hasFP(MF)) { if (!STI.isR9Reserved()) return makeArrayRef(GPROdd1); else return makeArrayRef(GPROdd4); } else if (FramePtr == ARM::R7) { if (!STI.isR9Reserved()) return makeArrayRef(GPROdd2); else return makeArrayRef(GPROdd5); } else { // FramePtr == ARM::R11 if (!STI.isR9Reserved()) return makeArrayRef(GPROdd3); else return makeArrayRef(GPROdd6); } } return RC->getRawAllocationOrder(MF); } /// ResolveRegAllocHint - Resolves the specified register allocation hint /// to a physical register. Returns the physical register if it is successful. unsigned ARMBaseRegisterInfo::ResolveRegAllocHint(unsigned Type, unsigned Reg, const MachineFunction &MF) const { if (Reg == 0 || !isPhysicalRegister(Reg)) return 0; if (Type == 0) return Reg; else if (Type == (unsigned)ARMRI::RegPairOdd) // Odd register. return getRegisterPairOdd(Reg, MF); else if (Type == (unsigned)ARMRI::RegPairEven) // Even register. return getRegisterPairEven(Reg, MF); return 0; } void ARMBaseRegisterInfo::UpdateRegAllocHint(unsigned Reg, unsigned NewReg, MachineFunction &MF) const { MachineRegisterInfo *MRI = &MF.getRegInfo(); std::pair Hint = MRI->getRegAllocationHint(Reg); if ((Hint.first == (unsigned)ARMRI::RegPairOdd || Hint.first == (unsigned)ARMRI::RegPairEven) && TargetRegisterInfo::isVirtualRegister(Hint.second)) { // If 'Reg' is one of the even / odd register pair and it's now changed // (e.g. coalesced) into a different register. The other register of the // pair allocation hint must be updated to reflect the relationship // change. unsigned OtherReg = Hint.second; Hint = MRI->getRegAllocationHint(OtherReg); if (Hint.second == Reg) // Make sure the pair has not already divorced. MRI->setRegAllocationHint(OtherReg, Hint.first, NewReg); } } bool ARMBaseRegisterInfo::avoidWriteAfterWrite(const TargetRegisterClass *RC) const { // CortexA9 has a Write-after-write hazard for NEON registers. 
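  // A write-after-write hazard means two back-to-back writes to the same NEON
  // register can stall even when the values are unrelated, so reusing a
  // recently written S/D/Q register is discouraged on this core.  The QQ and
  // QQQQ classes are deliberately excluded, since penalizing them would only
  // raise register pressure.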
if (!STI.isCortexA9()) return false; switch (RC->getID()) { case ARM::DPRRegClassID: case ARM::DPR_8RegClassID: case ARM::DPR_VFP2RegClassID: case ARM::QPRRegClassID: case ARM::QPR_8RegClassID: case ARM::QPR_VFP2RegClassID: case ARM::SPRRegClassID: case ARM::SPR_8RegClassID: // Avoid reusing S, D, and Q registers. // Don't increase register pressure for QQ and QQQQ. return true; default: return false; } } bool ARMBaseRegisterInfo::hasBasePointer(const MachineFunction &MF) const { const MachineFrameInfo *MFI = MF.getFrameInfo(); const ARMFunctionInfo *AFI = MF.getInfo(); if (!EnableBasePointer) return false; if (needsStackRealignment(MF) && MFI->hasVarSizedObjects()) return true; // Thumb has trouble with negative offsets from the FP. Thumb2 has a limited // negative range for ldr/str (255), and thumb1 is positive offsets only. // It's going to be better to use the SP or Base Pointer instead. When there // are variable sized objects, we can't reference off of the SP, so we // reserve a Base Pointer. if (AFI->isThumbFunction() && MFI->hasVarSizedObjects()) { // Conservatively estimate whether the negative offset from the frame // pointer will be sufficient to reach. If a function has a smallish // frame, it's less likely to have lots of spills and callee saved // space, so it's all more likely to be within range of the frame pointer. // If it's wrong, the scavenger will still enable access to work, it just // won't be optimal. if (AFI->isThumb2Function() && MFI->getLocalFrameSize() < 128) return false; return true; } return false; } bool ARMBaseRegisterInfo::canRealignStack(const MachineFunction &MF) const { const MachineFrameInfo *MFI = MF.getFrameInfo(); const ARMFunctionInfo *AFI = MF.getInfo(); // We can't realign the stack if: // 1. Dynamic stack realignment is explicitly disabled, // 2. This is a Thumb1 function (it's not useful, so we don't bother), or // 3. There are VLAs in the function and the base pointer is disabled. return (RealignStack && !AFI->isThumb1OnlyFunction() && (!MFI->hasVarSizedObjects() || EnableBasePointer)); } bool ARMBaseRegisterInfo:: needsStackRealignment(const MachineFunction &MF) const { const MachineFrameInfo *MFI = MF.getFrameInfo(); const Function *F = MF.getFunction(); unsigned StackAlign = MF.getTarget().getFrameLowering()->getStackAlignment(); bool requiresRealignment = ((MFI->getLocalFrameMaxAlign() > StackAlign) || F->hasFnAttr(Attribute::StackAlignment)); return requiresRealignment && canRealignStack(MF); } bool ARMBaseRegisterInfo:: cannotEliminateFrame(const MachineFunction &MF) const { const MachineFrameInfo *MFI = MF.getFrameInfo(); if (DisableFramePointerElim(MF) && MFI->adjustsStack()) return true; return MFI->hasVarSizedObjects() || MFI->isFrameAddressTaken() || needsStackRealignment(MF); } unsigned ARMBaseRegisterInfo::getFrameRegister(const MachineFunction &MF) const { const TargetFrameLowering *TFI = MF.getTarget().getFrameLowering(); if (TFI->hasFP(MF)) return FramePtr; return ARM::SP; } unsigned ARMBaseRegisterInfo::getEHExceptionRegister() const { llvm_unreachable("What is the exception register"); return 0; } unsigned ARMBaseRegisterInfo::getEHHandlerRegister() const { llvm_unreachable("What is the exception handler register"); return 0; } unsigned ARMBaseRegisterInfo::getRegisterPairEven(unsigned Reg, const MachineFunction &MF) const { switch (Reg) { default: break; // Return 0 if either register of the pair is a special register. // So no R12, etc. 
case ARM::R1: return ARM::R0; case ARM::R3: return ARM::R2; case ARM::R5: return ARM::R4; case ARM::R7: return (isReservedReg(MF, ARM::R7) || isReservedReg(MF, ARM::R6)) ? 0 : ARM::R6; case ARM::R9: return isReservedReg(MF, ARM::R9) ? 0 :ARM::R8; case ARM::R11: return isReservedReg(MF, ARM::R11) ? 0 : ARM::R10; case ARM::S1: return ARM::S0; case ARM::S3: return ARM::S2; case ARM::S5: return ARM::S4; case ARM::S7: return ARM::S6; case ARM::S9: return ARM::S8; case ARM::S11: return ARM::S10; case ARM::S13: return ARM::S12; case ARM::S15: return ARM::S14; case ARM::S17: return ARM::S16; case ARM::S19: return ARM::S18; case ARM::S21: return ARM::S20; case ARM::S23: return ARM::S22; case ARM::S25: return ARM::S24; case ARM::S27: return ARM::S26; case ARM::S29: return ARM::S28; case ARM::S31: return ARM::S30; case ARM::D1: return ARM::D0; case ARM::D3: return ARM::D2; case ARM::D5: return ARM::D4; case ARM::D7: return ARM::D6; case ARM::D9: return ARM::D8; case ARM::D11: return ARM::D10; case ARM::D13: return ARM::D12; case ARM::D15: return ARM::D14; case ARM::D17: return ARM::D16; case ARM::D19: return ARM::D18; case ARM::D21: return ARM::D20; case ARM::D23: return ARM::D22; case ARM::D25: return ARM::D24; case ARM::D27: return ARM::D26; case ARM::D29: return ARM::D28; case ARM::D31: return ARM::D30; } return 0; } unsigned ARMBaseRegisterInfo::getRegisterPairOdd(unsigned Reg, const MachineFunction &MF) const { switch (Reg) { default: break; // Return 0 if either register of the pair is a special register. // So no R12, etc. case ARM::R0: return ARM::R1; case ARM::R2: return ARM::R3; case ARM::R4: return ARM::R5; case ARM::R6: return (isReservedReg(MF, ARM::R7) || isReservedReg(MF, ARM::R6)) ? 0 : ARM::R7; case ARM::R8: return isReservedReg(MF, ARM::R9) ? 0 :ARM::R9; case ARM::R10: return isReservedReg(MF, ARM::R11) ? 0 : ARM::R11; case ARM::S0: return ARM::S1; case ARM::S2: return ARM::S3; case ARM::S4: return ARM::S5; case ARM::S6: return ARM::S7; case ARM::S8: return ARM::S9; case ARM::S10: return ARM::S11; case ARM::S12: return ARM::S13; case ARM::S14: return ARM::S15; case ARM::S16: return ARM::S17; case ARM::S18: return ARM::S19; case ARM::S20: return ARM::S21; case ARM::S22: return ARM::S23; case ARM::S24: return ARM::S25; case ARM::S26: return ARM::S27; case ARM::S28: return ARM::S29; case ARM::S30: return ARM::S31; case ARM::D0: return ARM::D1; case ARM::D2: return ARM::D3; case ARM::D4: return ARM::D5; case ARM::D6: return ARM::D7; case ARM::D8: return ARM::D9; case ARM::D10: return ARM::D11; case ARM::D12: return ARM::D13; case ARM::D14: return ARM::D15; case ARM::D16: return ARM::D17; case ARM::D18: return ARM::D19; case ARM::D20: return ARM::D21; case ARM::D22: return ARM::D23; case ARM::D24: return ARM::D25; case ARM::D26: return ARM::D27; case ARM::D28: return ARM::D29; case ARM::D30: return ARM::D31; } return 0; } /// emitLoadConstPool - Emits a load from constpool to materialize the /// specified immediate. 
void ARMBaseRegisterInfo:: emitLoadConstPool(MachineBasicBlock &MBB, MachineBasicBlock::iterator &MBBI, DebugLoc dl, unsigned DestReg, unsigned SubIdx, int Val, ARMCC::CondCodes Pred, unsigned PredReg, unsigned MIFlags) const { MachineFunction &MF = *MBB.getParent(); MachineConstantPool *ConstantPool = MF.getConstantPool(); const Constant *C = ConstantInt::get(Type::getInt32Ty(MF.getFunction()->getContext()), Val); unsigned Idx = ConstantPool->getConstantPoolIndex(C, 4); BuildMI(MBB, MBBI, dl, TII.get(ARM::LDRcp)) .addReg(DestReg, getDefRegState(true), SubIdx) .addConstantPoolIndex(Idx) .addImm(0).addImm(Pred).addReg(PredReg) .setMIFlags(MIFlags); } bool ARMBaseRegisterInfo:: requiresRegisterScavenging(const MachineFunction &MF) const { return true; } bool ARMBaseRegisterInfo:: requiresFrameIndexScavenging(const MachineFunction &MF) const { return true; } bool ARMBaseRegisterInfo:: requiresVirtualBaseRegisters(const MachineFunction &MF) const { return EnableLocalStackAlloc; } static void emitSPUpdate(bool isARM, MachineBasicBlock &MBB, MachineBasicBlock::iterator &MBBI, DebugLoc dl, const ARMBaseInstrInfo &TII, int NumBytes, ARMCC::CondCodes Pred = ARMCC::AL, unsigned PredReg = 0) { if (isARM) emitARMRegPlusImmediate(MBB, MBBI, dl, ARM::SP, ARM::SP, NumBytes, Pred, PredReg, TII); else emitT2RegPlusImmediate(MBB, MBBI, dl, ARM::SP, ARM::SP, NumBytes, Pred, PredReg, TII); } void ARMBaseRegisterInfo:: eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const { const TargetFrameLowering *TFI = MF.getTarget().getFrameLowering(); if (!TFI->hasReservedCallFrame(MF)) { // If we have alloca, convert as follows: // ADJCALLSTACKDOWN -> sub, sp, sp, amount // ADJCALLSTACKUP -> add, sp, sp, amount MachineInstr *Old = I; DebugLoc dl = Old->getDebugLoc(); unsigned Amount = Old->getOperand(0).getImm(); if (Amount != 0) { // We need to keep the stack aligned properly. To do this, we round the // amount of space needed for the outgoing arguments up to the next // alignment boundary. unsigned Align = TFI->getStackAlignment(); Amount = (Amount+Align-1)/Align*Align; ARMFunctionInfo *AFI = MF.getInfo(); assert(!AFI->isThumb1OnlyFunction() && "This eliminateCallFramePseudoInstr does not support Thumb1!"); bool isARM = !AFI->isThumbFunction(); // Replace the pseudo instruction with a new instruction... unsigned Opc = Old->getOpcode(); int PIdx = Old->findFirstPredOperandIdx(); ARMCC::CondCodes Pred = (PIdx == -1) ? ARMCC::AL : (ARMCC::CondCodes)Old->getOperand(PIdx).getImm(); if (Opc == ARM::ADJCALLSTACKDOWN || Opc == ARM::tADJCALLSTACKDOWN) { // Note: PredReg is operand 2 for ADJCALLSTACKDOWN. unsigned PredReg = Old->getOperand(2).getReg(); emitSPUpdate(isARM, MBB, I, dl, TII, -Amount, Pred, PredReg); } else { // Note: PredReg is operand 3 for ADJCALLSTACKUP. unsigned PredReg = Old->getOperand(3).getReg(); assert(Opc == ARM::ADJCALLSTACKUP || Opc == ARM::tADJCALLSTACKUP); emitSPUpdate(isARM, MBB, I, dl, TII, Amount, Pred, PredReg); } } } MBB.erase(I); } int64_t ARMBaseRegisterInfo:: getFrameIndexInstrOffset(const MachineInstr *MI, int Idx) const { const MCInstrDesc &Desc = MI->getDesc(); unsigned AddrMode = (Desc.TSFlags & ARMII::AddrModeMask); int64_t InstrOffs = 0;; int Scale = 1; unsigned ImmIdx = 0; switch (AddrMode) { case ARMII::AddrModeT2_i8: case ARMII::AddrModeT2_i12: case ARMII::AddrMode_i12: InstrOffs = MI->getOperand(Idx+1).getImm(); Scale = 1; break; case ARMII::AddrMode5: { // VFP address mode. 
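    // Illustrative annotation: AddrMode5 is the VLDR/VSTR form, whose offset operand
    // encodes a word count plus an add/sub flag. An encoded offset of 3 with the 'sub'
    // operation becomes -3 here and -12 bytes once multiplied by Scale = 4 below.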
const MachineOperand &OffOp = MI->getOperand(Idx+1); InstrOffs = ARM_AM::getAM5Offset(OffOp.getImm()); if (ARM_AM::getAM5Op(OffOp.getImm()) == ARM_AM::sub) InstrOffs = -InstrOffs; Scale = 4; break; } case ARMII::AddrMode2: { ImmIdx = Idx+2; InstrOffs = ARM_AM::getAM2Offset(MI->getOperand(ImmIdx).getImm()); if (ARM_AM::getAM2Op(MI->getOperand(ImmIdx).getImm()) == ARM_AM::sub) InstrOffs = -InstrOffs; break; } case ARMII::AddrMode3: { ImmIdx = Idx+2; InstrOffs = ARM_AM::getAM3Offset(MI->getOperand(ImmIdx).getImm()); if (ARM_AM::getAM3Op(MI->getOperand(ImmIdx).getImm()) == ARM_AM::sub) InstrOffs = -InstrOffs; break; } case ARMII::AddrModeT1_s: { ImmIdx = Idx+1; InstrOffs = MI->getOperand(ImmIdx).getImm(); Scale = 4; break; } default: llvm_unreachable("Unsupported addressing mode!"); break; } return InstrOffs * Scale; } /// needsFrameBaseReg - Returns true if the instruction's frame index /// reference would be better served by a base register other than FP /// or SP. Used by LocalStackFrameAllocation to determine which frame index /// references it should create new base registers for. bool ARMBaseRegisterInfo:: needsFrameBaseReg(MachineInstr *MI, int64_t Offset) const { for (unsigned i = 0; !MI->getOperand(i).isFI(); ++i) { assert(i < MI->getNumOperands() &&"Instr doesn't have FrameIndex operand!"); } // It's the load/store FI references that cause issues, as it can be difficult // to materialize the offset if it won't fit in the literal field. Estimate // based on the size of the local frame and some conservative assumptions // about the rest of the stack frame (note, this is pre-regalloc, so // we don't know everything for certain yet) whether this offset is likely // to be out of range of the immediate. Return true if so. // We only generate virtual base registers for loads and stores, so // return false for everything else. unsigned Opc = MI->getOpcode(); switch (Opc) { case ARM::LDRi12: case ARM::LDRH: case ARM::LDRBi12: case ARM::STRi12: case ARM::STRH: case ARM::STRBi12: case ARM::t2LDRi12: case ARM::t2LDRi8: case ARM::t2STRi12: case ARM::t2STRi8: case ARM::VLDRS: case ARM::VLDRD: case ARM::VSTRS: case ARM::VSTRD: case ARM::tSTRspi: case ARM::tLDRspi: if (ForceAllBaseRegAlloc) return true; break; default: return false; } // Without a virtual base register, if the function has variable sized // objects, all fixed-size local references will be via the frame pointer, // Approximate the offset and see if it's legal for the instruction. // Note that the incoming offset is based on the SP value at function entry, // so it'll be negative. MachineFunction &MF = *MI->getParent()->getParent(); const TargetFrameLowering *TFI = MF.getTarget().getFrameLowering(); MachineFrameInfo *MFI = MF.getFrameInfo(); ARMFunctionInfo *AFI = MF.getInfo(); // Estimate an offset from the frame pointer. // Conservatively assume all callee-saved registers get pushed. R4-R6 // will be earlier than the FP, so we ignore those. // R7, LR int64_t FPOffset = Offset - 8; // ARM and Thumb2 functions also need to consider R8-R11 and D8-D15 if (!AFI->isThumbFunction() || !AFI->isThumb1OnlyFunction()) FPOffset -= 80; // Estimate an offset from the stack pointer. // The incoming offset is relating to the SP at the start of the function, // but when we access the local it'll be relative to the SP after local // allocation, so adjust our SP-relative offset by that allocation size. Offset = -Offset; Offset += MFI->getLocalFrameSize(); // Assume that we'll have at least some spill slots allocated. 
// FIXME: This is a total SWAG number. We should run some statistics // and pick a real one. Offset += 128; // 128 bytes of spill slots // If there is a frame pointer, try using it. // The FP is only available if there is no dynamic realignment. We // don't know for sure yet whether we'll need that, so we guess based // on whether there are any local variables that would trigger it. unsigned StackAlign = TFI->getStackAlignment(); if (TFI->hasFP(MF) && !((MFI->getLocalFrameMaxAlign() > StackAlign) && canRealignStack(MF))) { if (isFrameOffsetLegal(MI, FPOffset)) return false; } // If we can reference via the stack pointer, try that. // FIXME: This (and the code that resolves the references) can be improved // to only disallow SP relative references in the live range of // the VLA(s). In practice, it's unclear how much difference that // would make, but it may be worth doing. if (!MFI->hasVarSizedObjects() && isFrameOffsetLegal(MI, Offset)) return false; // The offset likely isn't legal, we want to allocate a virtual base register. return true; } /// materializeFrameBaseRegister - Insert defining instruction(s) for BaseReg to /// be a pointer to FrameIdx at the beginning of the basic block. void ARMBaseRegisterInfo:: materializeFrameBaseRegister(MachineBasicBlock *MBB, unsigned BaseReg, int FrameIdx, int64_t Offset) const { ARMFunctionInfo *AFI = MBB->getParent()->getInfo(); unsigned ADDriOpc = !AFI->isThumbFunction() ? ARM::ADDri : (AFI->isThumb1OnlyFunction() ? ARM::tADDrSPi : ARM::t2ADDri); MachineBasicBlock::iterator Ins = MBB->begin(); DebugLoc DL; // Defaults to "unknown" if (Ins != MBB->end()) DL = Ins->getDebugLoc(); const MCInstrDesc &MCID = TII.get(ADDriOpc); MachineRegisterInfo &MRI = MBB->getParent()->getRegInfo(); MRI.constrainRegClass(BaseReg, TII.getRegClass(MCID, 0, this)); MachineInstrBuilder MIB = AddDefaultPred(BuildMI(*MBB, Ins, DL, MCID, BaseReg) .addFrameIndex(FrameIdx).addImm(Offset)); if (!AFI->isThumb1OnlyFunction()) AddDefaultCC(MIB); } void ARMBaseRegisterInfo::resolveFrameIndex(MachineBasicBlock::iterator I, unsigned BaseReg, int64_t Offset) const { MachineInstr &MI = *I; MachineBasicBlock &MBB = *MI.getParent(); MachineFunction &MF = *MBB.getParent(); ARMFunctionInfo *AFI = MF.getInfo(); int Off = Offset; // ARM doesn't need the general 64-bit offsets unsigned i = 0; assert(!AFI->isThumb1OnlyFunction() && "This resolveFrameIndex does not support Thumb1!"); while (!MI.getOperand(i).isFI()) { ++i; assert(i < MI.getNumOperands() && "Instr doesn't have FrameIndex operand!"); } bool Done = false; if (!AFI->isThumbFunction()) Done = rewriteARMFrameIndex(MI, i, BaseReg, Off, TII); else { assert(AFI->isThumb2Function()); Done = rewriteT2FrameIndex(MI, i, BaseReg, Off, TII); } assert (Done && "Unable to resolve frame index!"); (void)Done; } bool ARMBaseRegisterInfo::isFrameOffsetLegal(const MachineInstr *MI, int64_t Offset) const { const MCInstrDesc &Desc = MI->getDesc(); unsigned AddrMode = (Desc.TSFlags & ARMII::AddrModeMask); unsigned i = 0; while (!MI->getOperand(i).isFI()) { ++i; assert(i < MI->getNumOperands() &&"Instr doesn't have FrameIndex operand!"); } // AddrMode4 and AddrMode6 cannot handle any offset. 
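  // (Added note, assuming the usual meaning of these modes: AddrMode4 is the LDM/STM
  // load/store-multiple form and AddrMode6 the NEON VLD/VST form; neither accepts an
  // immediate offset, so only a zero offset is legal for them.)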
if (AddrMode == ARMII::AddrMode4 || AddrMode == ARMII::AddrMode6) return Offset == 0; unsigned NumBits = 0; unsigned Scale = 1; bool isSigned = true; switch (AddrMode) { case ARMII::AddrModeT2_i8: case ARMII::AddrModeT2_i12: // i8 supports only negative, and i12 supports only positive, so // based on Offset sign, consider the appropriate instruction Scale = 1; if (Offset < 0) { NumBits = 8; Offset = -Offset; } else { NumBits = 12; } break; case ARMII::AddrMode5: // VFP address mode. NumBits = 8; Scale = 4; break; case ARMII::AddrMode_i12: case ARMII::AddrMode2: NumBits = 12; break; case ARMII::AddrMode3: NumBits = 8; break; case ARMII::AddrModeT1_s: NumBits = 5; Scale = 4; isSigned = false; break; default: llvm_unreachable("Unsupported addressing mode!"); break; } Offset += getFrameIndexInstrOffset(MI, i); // Make sure the offset is encodable for instructions that scale the // immediate. if ((Offset & (Scale-1)) != 0) return false; if (isSigned && Offset < 0) Offset = -Offset; unsigned Mask = (1 << NumBits) - 1; if ((unsigned)Offset <= Mask * Scale) return true; return false; } void ARMBaseRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II, int SPAdj, RegScavenger *RS) const { unsigned i = 0; MachineInstr &MI = *II; MachineBasicBlock &MBB = *MI.getParent(); MachineFunction &MF = *MBB.getParent(); const ARMFrameLowering *TFI = static_cast(MF.getTarget().getFrameLowering()); ARMFunctionInfo *AFI = MF.getInfo(); assert(!AFI->isThumb1OnlyFunction() && "This eliminateFrameIndex does not support Thumb1!"); while (!MI.getOperand(i).isFI()) { ++i; assert(i < MI.getNumOperands() && "Instr doesn't have FrameIndex operand!"); } int FrameIndex = MI.getOperand(i).getIndex(); unsigned FrameReg; int Offset = TFI->ResolveFrameIndexReference(MF, FrameIndex, FrameReg, SPAdj); // Special handling of dbg_value instructions. if (MI.isDebugValue()) { MI.getOperand(i). ChangeToRegister(FrameReg, false /*isDef*/); MI.getOperand(i+1).ChangeToImmediate(Offset); return; } // Modify MI as necessary to handle as much of 'Offset' as possible bool Done = false; if (!AFI->isThumbFunction()) Done = rewriteARMFrameIndex(MI, i, FrameReg, Offset, TII); else { assert(AFI->isThumb2Function()); Done = rewriteT2FrameIndex(MI, i, FrameReg, Offset, TII); } if (Done) return; // If we get here, the immediate doesn't fit into the instruction. We folded // as much as possible above, handle the rest, providing a register that is // SP+LargeImm. assert((Offset || (MI.getDesc().TSFlags & ARMII::AddrModeMask) == ARMII::AddrMode4 || (MI.getDesc().TSFlags & ARMII::AddrModeMask) == ARMII::AddrMode6) && "This code isn't needed if offset already handled!"); unsigned ScratchReg = 0; int PIdx = MI.findFirstPredOperandIdx(); ARMCC::CondCodes Pred = (PIdx == -1) ? ARMCC::AL : (ARMCC::CondCodes)MI.getOperand(PIdx).getImm(); unsigned PredReg = (PIdx == -1) ? 0 : MI.getOperand(PIdx+1).getReg(); if (Offset == 0) // Must be addrmode4/6. MI.getOperand(i).ChangeToRegister(FrameReg, false, false, false); else { ScratchReg = MF.getRegInfo().createVirtualRegister(ARM::GPRRegisterClass); if (!AFI->isThumbFunction()) emitARMRegPlusImmediate(MBB, II, MI.getDebugLoc(), ScratchReg, FrameReg, Offset, Pred, PredReg, TII); else { assert(AFI->isThumb2Function()); emitT2RegPlusImmediate(MBB, II, MI.getDebugLoc(), ScratchReg, FrameReg, Offset, Pred, PredReg, TII); } // Update the original instruction to use the scratch register. 
MI.getOperand(i).ChangeToRegister(ScratchReg, false, false, true); } } Index: vendor/llvm/dist/lib/Target/ARM/ARMCallingConv.td =================================================================== --- vendor/llvm/dist/lib/Target/ARM/ARMCallingConv.td (revision 228363) +++ vendor/llvm/dist/lib/Target/ARM/ARMCallingConv.td (revision 228364) @@ -1,164 +1,183 @@ //===- ARMCallingConv.td - Calling Conventions for ARM -----*- tablegen -*-===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // This describes the calling conventions for ARM architecture. //===----------------------------------------------------------------------===// /// CCIfSubtarget - Match if the current subtarget has a feature F. class CCIfSubtarget: CCIf().", F), A>; /// CCIfAlign - Match of the original alignment of the arg class CCIfAlign: CCIf; //===----------------------------------------------------------------------===// // ARM APCS Calling Convention //===----------------------------------------------------------------------===// def CC_ARM_APCS : CallingConv<[ // Handles byval parameters. CCIfByVal>, CCIfType<[i8, i16], CCPromoteToType>, // Handle all vector types as either f64 or v2f64. CCIfType<[v1i64, v2i32, v4i16, v8i8, v2f32], CCBitConvertToType>, CCIfType<[v2i64, v4i32, v8i16, v16i8, v4f32], CCBitConvertToType>, // f64 and v2f64 are passed in adjacent GPRs, possibly split onto the stack CCIfType<[f64, v2f64], CCCustom<"CC_ARM_APCS_Custom_f64">>, CCIfType<[f32], CCBitConvertToType>, CCIfType<[i32], CCAssignToReg<[R0, R1, R2, R3]>>, CCIfType<[i32], CCAssignToStack<4, 4>>, CCIfType<[f64], CCAssignToStack<8, 4>>, CCIfType<[v2f64], CCAssignToStack<16, 4>> ]>; def RetCC_ARM_APCS : CallingConv<[ CCIfType<[f32], CCBitConvertToType>, // Handle all vector types as either f64 or v2f64. CCIfType<[v1i64, v2i32, v4i16, v8i8, v2f32], CCBitConvertToType>, CCIfType<[v2i64, v4i32, v8i16, v16i8, v4f32], CCBitConvertToType>, CCIfType<[f64, v2f64], CCCustom<"RetCC_ARM_APCS_Custom_f64">>, CCIfType<[i32], CCAssignToReg<[R0, R1, R2, R3]>>, CCIfType<[i64], CCAssignToRegWithShadow<[R0, R2], [R1, R3]>> ]>; //===----------------------------------------------------------------------===// // ARM APCS Calling Convention for FastCC (when VFP2 or later is available) //===----------------------------------------------------------------------===// def FastCC_ARM_APCS : CallingConv<[ // Handle all vector types as either f64 or v2f64. CCIfType<[v1i64, v2i32, v4i16, v8i8, v2f32], CCBitConvertToType>, CCIfType<[v2i64, v4i32, v8i16, v16i8, v4f32], CCBitConvertToType>, CCIfType<[v2f64], CCAssignToReg<[Q0, Q1, Q2, Q3]>>, CCIfType<[f64], CCAssignToReg<[D0, D1, D2, D3, D4, D5, D6, D7]>>, CCIfType<[f32], CCAssignToReg<[S0, S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12, S13, S14, S15]>>, CCDelegateTo ]>; def RetFastCC_ARM_APCS : CallingConv<[ // Handle all vector types as either f64 or v2f64. 
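  // Illustrative annotation (not part of the original .td): under the rules below, a
  // v4i16 return value is first bitconverted to f64 and handed back in D0, while a
  // v4f32 becomes v2f64 and comes back in Q0.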
CCIfType<[v1i64, v2i32, v4i16, v8i8, v2f32], CCBitConvertToType>, CCIfType<[v2i64, v4i32, v8i16, v16i8, v4f32], CCBitConvertToType>, CCIfType<[v2f64], CCAssignToReg<[Q0, Q1, Q2, Q3]>>, CCIfType<[f64], CCAssignToReg<[D0, D1, D2, D3, D4, D5, D6, D7]>>, CCIfType<[f32], CCAssignToReg<[S0, S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12, S13, S14, S15]>>, CCDelegateTo ]>; +//===----------------------------------------------------------------------===// +// ARM APCS Calling Convention for GHC +//===----------------------------------------------------------------------===// + +def CC_ARM_APCS_GHC : CallingConv<[ + // Handle all vector types as either f64 or v2f64. + CCIfType<[v1i64, v2i32, v4i16, v8i8, v2f32], CCBitConvertToType>, + CCIfType<[v2i64, v4i32, v8i16, v16i8, v4f32], CCBitConvertToType>, + + CCIfType<[v2f64], CCAssignToReg<[Q4, Q5]>>, + CCIfType<[f64], CCAssignToReg<[D8, D9, D10, D11]>>, + CCIfType<[f32], CCAssignToReg<[S16, S17, S18, S19, S20, S21, S22, S23]>>, + + // Promote i8/i16 arguments to i32. + CCIfType<[i8, i16], CCPromoteToType>, + + // Pass in STG registers: Base, Sp, Hp, R1, R2, R3, R4, SpLim + CCIfType<[i32], CCAssignToReg<[R4, R5, R6, R7, R8, R9, R10, R11]>> +]>; //===----------------------------------------------------------------------===// // ARM AAPCS (EABI) Calling Convention, common parts //===----------------------------------------------------------------------===// def CC_ARM_AAPCS_Common : CallingConv<[ CCIfType<[i8, i16], CCPromoteToType>, // i64/f64 is passed in even pairs of GPRs // i64 is 8-aligned i32 here, so we may need to eat R1 as a pad register // (and the same is true for f64 if VFP is not enabled) CCIfType<[i32], CCIfAlign<"8", CCAssignToRegWithShadow<[R0, R2], [R0, R1]>>>, CCIfType<[i32], CCIf<"State.getNextStackOffset() == 0 &&" "ArgFlags.getOrigAlign() != 8", CCAssignToReg<[R0, R1, R2, R3]>>>, CCIfType<[i32], CCIfAlign<"8", CCAssignToStackWithShadow<4, 8, R3>>>, CCIfType<[i32, f32], CCAssignToStack<4, 4>>, CCIfType<[f64], CCAssignToStack<8, 8>>, CCIfType<[v2f64], CCAssignToStack<16, 8>> ]>; def RetCC_ARM_AAPCS_Common : CallingConv<[ CCIfType<[i32], CCAssignToReg<[R0, R1, R2, R3]>>, CCIfType<[i64], CCAssignToRegWithShadow<[R0, R2], [R1, R3]>> ]>; //===----------------------------------------------------------------------===// // ARM AAPCS (EABI) Calling Convention //===----------------------------------------------------------------------===// def CC_ARM_AAPCS : CallingConv<[ // Handle all vector types as either f64 or v2f64. CCIfType<[v1i64, v2i32, v4i16, v8i8, v2f32], CCBitConvertToType>, CCIfType<[v2i64, v4i32, v8i16, v16i8, v4f32], CCBitConvertToType>, CCIfType<[f64, v2f64], CCCustom<"CC_ARM_AAPCS_Custom_f64">>, CCIfType<[f32], CCBitConvertToType>, CCDelegateTo ]>; def RetCC_ARM_AAPCS : CallingConv<[ // Handle all vector types as either f64 or v2f64. CCIfType<[v1i64, v2i32, v4i16, v8i8, v2f32], CCBitConvertToType>, CCIfType<[v2i64, v4i32, v8i16, v16i8, v4f32], CCBitConvertToType>, CCIfType<[f64, v2f64], CCCustom<"RetCC_ARM_AAPCS_Custom_f64">>, CCIfType<[f32], CCBitConvertToType>, CCDelegateTo ]>; //===----------------------------------------------------------------------===// // ARM AAPCS-VFP (EABI) Calling Convention // Also used for FastCC (when VFP2 or later is available) //===----------------------------------------------------------------------===// def CC_ARM_AAPCS_VFP : CallingConv<[ // Handle all vector types as either f64 or v2f64. 
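// Illustrative annotation: with these VFP rules the first f64 argument is assigned to
// D0 and the first f32 to S0, while integer arguments fall through the trailing
// CCDelegateTo into the common AAPCS rules above, so the first i32 still lands in R0.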
CCIfType<[v1i64, v2i32, v4i16, v8i8, v2f32], CCBitConvertToType>, CCIfType<[v2i64, v4i32, v8i16, v16i8, v4f32], CCBitConvertToType>, CCIfType<[v2f64], CCAssignToReg<[Q0, Q1, Q2, Q3]>>, CCIfType<[f64], CCAssignToReg<[D0, D1, D2, D3, D4, D5, D6, D7]>>, CCIfType<[f32], CCAssignToReg<[S0, S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12, S13, S14, S15]>>, CCDelegateTo ]>; def RetCC_ARM_AAPCS_VFP : CallingConv<[ // Handle all vector types as either f64 or v2f64. CCIfType<[v1i64, v2i32, v4i16, v8i8, v2f32], CCBitConvertToType>, CCIfType<[v2i64, v4i32, v8i16, v16i8, v4f32], CCBitConvertToType>, CCIfType<[v2f64], CCAssignToReg<[Q0, Q1, Q2, Q3]>>, CCIfType<[f64], CCAssignToReg<[D0, D1, D2, D3, D4, D5, D6, D7]>>, CCIfType<[f32], CCAssignToReg<[S0, S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12, S13, S14, S15]>>, CCDelegateTo ]>; Index: vendor/llvm/dist/lib/Target/ARM/ARMFastISel.cpp =================================================================== --- vendor/llvm/dist/lib/Target/ARM/ARMFastISel.cpp (revision 228363) +++ vendor/llvm/dist/lib/Target/ARM/ARMFastISel.cpp (revision 228364) @@ -1,2120 +1,2125 @@ //===-- ARMFastISel.cpp - ARM FastISel implementation ---------------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file defines the ARM-specific support for the FastISel class. Some // of the target-specific code is generated by tablegen in the file // ARMGenFastISel.inc, which is #included here. // //===----------------------------------------------------------------------===// #include "ARM.h" #include "ARMBaseInstrInfo.h" #include "ARMCallingConv.h" #include "ARMRegisterInfo.h" #include "ARMTargetMachine.h" #include "ARMSubtarget.h" #include "ARMConstantPoolValue.h" #include "MCTargetDesc/ARMAddressingModes.h" #include "llvm/CallingConv.h" #include "llvm/DerivedTypes.h" #include "llvm/GlobalVariable.h" #include "llvm/Instructions.h" #include "llvm/IntrinsicInst.h" #include "llvm/Module.h" #include "llvm/Operator.h" #include "llvm/CodeGen/Analysis.h" #include "llvm/CodeGen/FastISel.h" #include "llvm/CodeGen/FunctionLoweringInfo.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/CodeGen/MachineModuleInfo.h" #include "llvm/CodeGen/MachineConstantPool.h" #include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineMemOperand.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/PseudoSourceValue.h" #include "llvm/Support/CallSite.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/GetElementPtrTypeIterator.h" #include "llvm/Target/TargetData.h" #include "llvm/Target/TargetInstrInfo.h" #include "llvm/Target/TargetLowering.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Target/TargetOptions.h" using namespace llvm; static cl::opt DisableARMFastISel("disable-arm-fast-isel", cl::desc("Turn off experimental ARM fast-isel support"), cl::init(false), cl::Hidden); extern cl::opt EnableARMLongCalls; namespace { // All possible address modes, plus some. typedef struct Address { enum { RegBase, FrameIndexBase } BaseType; union { unsigned Reg; int FI; } Base; int Offset; // Innocuous defaults for our address. 
Address() : BaseType(RegBase), Offset(0) { Base.Reg = 0; } } Address; class ARMFastISel : public FastISel { /// Subtarget - Keep a pointer to the ARMSubtarget around so that we can /// make the right decision when generating code for different targets. const ARMSubtarget *Subtarget; const TargetMachine &TM; const TargetInstrInfo &TII; const TargetLowering &TLI; ARMFunctionInfo *AFI; // Convenience variables to avoid some queries. bool isThumb; LLVMContext *Context; public: explicit ARMFastISel(FunctionLoweringInfo &funcInfo) : FastISel(funcInfo), TM(funcInfo.MF->getTarget()), TII(*TM.getInstrInfo()), TLI(*TM.getTargetLowering()) { Subtarget = &TM.getSubtarget(); AFI = funcInfo.MF->getInfo(); isThumb = AFI->isThumbFunction(); Context = &funcInfo.Fn->getContext(); } // Code from FastISel.cpp. virtual unsigned FastEmitInst_(unsigned MachineInstOpcode, const TargetRegisterClass *RC); virtual unsigned FastEmitInst_r(unsigned MachineInstOpcode, const TargetRegisterClass *RC, unsigned Op0, bool Op0IsKill); virtual unsigned FastEmitInst_rr(unsigned MachineInstOpcode, const TargetRegisterClass *RC, unsigned Op0, bool Op0IsKill, unsigned Op1, bool Op1IsKill); virtual unsigned FastEmitInst_rrr(unsigned MachineInstOpcode, const TargetRegisterClass *RC, unsigned Op0, bool Op0IsKill, unsigned Op1, bool Op1IsKill, unsigned Op2, bool Op2IsKill); virtual unsigned FastEmitInst_ri(unsigned MachineInstOpcode, const TargetRegisterClass *RC, unsigned Op0, bool Op0IsKill, uint64_t Imm); virtual unsigned FastEmitInst_rf(unsigned MachineInstOpcode, const TargetRegisterClass *RC, unsigned Op0, bool Op0IsKill, const ConstantFP *FPImm); virtual unsigned FastEmitInst_rri(unsigned MachineInstOpcode, const TargetRegisterClass *RC, unsigned Op0, bool Op0IsKill, unsigned Op1, bool Op1IsKill, uint64_t Imm); virtual unsigned FastEmitInst_i(unsigned MachineInstOpcode, const TargetRegisterClass *RC, uint64_t Imm); virtual unsigned FastEmitInst_ii(unsigned MachineInstOpcode, const TargetRegisterClass *RC, uint64_t Imm1, uint64_t Imm2); virtual unsigned FastEmitInst_extractsubreg(MVT RetVT, unsigned Op0, bool Op0IsKill, uint32_t Idx); // Backend specific FastISel code. virtual bool TargetSelectInstruction(const Instruction *I); virtual unsigned TargetMaterializeConstant(const Constant *C); virtual unsigned TargetMaterializeAlloca(const AllocaInst *AI); #include "ARMGenFastISel.inc" // Instruction selection routines. private: bool SelectLoad(const Instruction *I); bool SelectStore(const Instruction *I); bool SelectBranch(const Instruction *I); bool SelectCmp(const Instruction *I); bool SelectFPExt(const Instruction *I); bool SelectFPTrunc(const Instruction *I); bool SelectBinaryOp(const Instruction *I, unsigned ISDOpcode); bool SelectSIToFP(const Instruction *I); bool SelectFPToSI(const Instruction *I); bool SelectSDiv(const Instruction *I); bool SelectSRem(const Instruction *I); bool SelectCall(const Instruction *I); bool SelectSelect(const Instruction *I); bool SelectRet(const Instruction *I); bool SelectIntCast(const Instruction *I); // Utility routines. 
private: bool isTypeLegal(Type *Ty, MVT &VT); bool isLoadTypeLegal(Type *Ty, MVT &VT); bool ARMEmitLoad(EVT VT, unsigned &ResultReg, Address &Addr); bool ARMEmitStore(EVT VT, unsigned SrcReg, Address &Addr); bool ARMComputeAddress(const Value *Obj, Address &Addr); void ARMSimplifyAddress(Address &Addr, EVT VT); unsigned ARMMaterializeFP(const ConstantFP *CFP, EVT VT); unsigned ARMMaterializeInt(const Constant *C, EVT VT); unsigned ARMMaterializeGV(const GlobalValue *GV, EVT VT); unsigned ARMMoveToFPReg(EVT VT, unsigned SrcReg); unsigned ARMMoveToIntReg(EVT VT, unsigned SrcReg); unsigned ARMSelectCallOp(const GlobalValue *GV); // Call handling routines. private: bool FastEmitExtend(ISD::NodeType Opc, EVT DstVT, unsigned Src, EVT SrcVT, unsigned &ResultReg); CCAssignFn *CCAssignFnForCall(CallingConv::ID CC, bool Return); bool ProcessCallArgs(SmallVectorImpl &Args, SmallVectorImpl &ArgRegs, SmallVectorImpl &ArgVTs, SmallVectorImpl &ArgFlags, SmallVectorImpl &RegArgs, CallingConv::ID CC, unsigned &NumBytes); bool FinishCall(MVT RetVT, SmallVectorImpl &UsedRegs, const Instruction *I, CallingConv::ID CC, unsigned &NumBytes); bool ARMEmitLibcall(const Instruction *I, RTLIB::Libcall Call); // OptionalDef handling routines. private: bool isARMNEONPred(const MachineInstr *MI); bool DefinesOptionalPredicate(MachineInstr *MI, bool *CPSR); const MachineInstrBuilder &AddOptionalDefs(const MachineInstrBuilder &MIB); void AddLoadStoreOperands(EVT VT, Address &Addr, const MachineInstrBuilder &MIB, unsigned Flags); }; } // end anonymous namespace #include "ARMGenCallingConv.inc" // DefinesOptionalPredicate - This is different from DefinesPredicate in that // we don't care about implicit defs here, just places we'll need to add a // default CCReg argument. Sets CPSR if we're setting CPSR instead of CCR. bool ARMFastISel::DefinesOptionalPredicate(MachineInstr *MI, bool *CPSR) { const MCInstrDesc &MCID = MI->getDesc(); if (!MCID.hasOptionalDef()) return false; // Look to see if our OptionalDef is defining CPSR or CCR. for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) { const MachineOperand &MO = MI->getOperand(i); if (!MO.isReg() || !MO.isDef()) continue; if (MO.getReg() == ARM::CPSR) *CPSR = true; } return true; } bool ARMFastISel::isARMNEONPred(const MachineInstr *MI) { const MCInstrDesc &MCID = MI->getDesc(); // If we're a thumb2 or not NEON function we were handled via isPredicable. if ((MCID.TSFlags & ARMII::DomainMask) != ARMII::DomainNEON || AFI->isThumb2Function()) return false; for (unsigned i = 0, e = MCID.getNumOperands(); i != e; ++i) if (MCID.OpInfo[i].isPredicate()) return true; return false; } // If the machine is predicable go ahead and add the predicate operands, if // it needs default CC operands add those. // TODO: If we want to support thumb1 then we'll need to deal with optional // CPSR defs that need to be added before the remaining operands. See s_cc_out // for descriptions why. const MachineInstrBuilder & ARMFastISel::AddOptionalDefs(const MachineInstrBuilder &MIB) { MachineInstr *MI = &*MIB; // Do we use a predicate? or... // Are we NEON in ARM mode and have a predicate operand? If so, I know // we're not predicable but add it anyways. if (TII.isPredicable(MI) || isARMNEONPred(MI)) AddDefaultPred(MIB); // Do we optionally set a predicate? Preds is size > 0 iff the predicate // defines CPSR. All other OptionalDefines in ARM are the CCR register. 
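  // (Added clarification, hedged: AddDefaultT1CC below supplies CPSR as that optional
  // flag-setting def, as Thumb1-style instructions require, while AddDefaultCC merely
  // appends the usual dummy cc_out operand.)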
bool CPSR = false; if (DefinesOptionalPredicate(MI, &CPSR)) { if (CPSR) AddDefaultT1CC(MIB); else AddDefaultCC(MIB); } return MIB; } unsigned ARMFastISel::FastEmitInst_(unsigned MachineInstOpcode, const TargetRegisterClass* RC) { unsigned ResultReg = createResultReg(RC); const MCInstrDesc &II = TII.get(MachineInstOpcode); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II, ResultReg)); return ResultReg; } unsigned ARMFastISel::FastEmitInst_r(unsigned MachineInstOpcode, const TargetRegisterClass *RC, unsigned Op0, bool Op0IsKill) { unsigned ResultReg = createResultReg(RC); const MCInstrDesc &II = TII.get(MachineInstOpcode); if (II.getNumDefs() >= 1) AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II, ResultReg) .addReg(Op0, Op0IsKill * RegState::Kill)); else { AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II) .addReg(Op0, Op0IsKill * RegState::Kill)); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(TargetOpcode::COPY), ResultReg) .addReg(II.ImplicitDefs[0])); } return ResultReg; } unsigned ARMFastISel::FastEmitInst_rr(unsigned MachineInstOpcode, const TargetRegisterClass *RC, unsigned Op0, bool Op0IsKill, unsigned Op1, bool Op1IsKill) { unsigned ResultReg = createResultReg(RC); const MCInstrDesc &II = TII.get(MachineInstOpcode); if (II.getNumDefs() >= 1) AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II, ResultReg) .addReg(Op0, Op0IsKill * RegState::Kill) .addReg(Op1, Op1IsKill * RegState::Kill)); else { AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II) .addReg(Op0, Op0IsKill * RegState::Kill) .addReg(Op1, Op1IsKill * RegState::Kill)); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(TargetOpcode::COPY), ResultReg) .addReg(II.ImplicitDefs[0])); } return ResultReg; } unsigned ARMFastISel::FastEmitInst_rrr(unsigned MachineInstOpcode, const TargetRegisterClass *RC, unsigned Op0, bool Op0IsKill, unsigned Op1, bool Op1IsKill, unsigned Op2, bool Op2IsKill) { unsigned ResultReg = createResultReg(RC); const MCInstrDesc &II = TII.get(MachineInstOpcode); if (II.getNumDefs() >= 1) AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II, ResultReg) .addReg(Op0, Op0IsKill * RegState::Kill) .addReg(Op1, Op1IsKill * RegState::Kill) .addReg(Op2, Op2IsKill * RegState::Kill)); else { AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II) .addReg(Op0, Op0IsKill * RegState::Kill) .addReg(Op1, Op1IsKill * RegState::Kill) .addReg(Op2, Op2IsKill * RegState::Kill)); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(TargetOpcode::COPY), ResultReg) .addReg(II.ImplicitDefs[0])); } return ResultReg; } unsigned ARMFastISel::FastEmitInst_ri(unsigned MachineInstOpcode, const TargetRegisterClass *RC, unsigned Op0, bool Op0IsKill, uint64_t Imm) { unsigned ResultReg = createResultReg(RC); const MCInstrDesc &II = TII.get(MachineInstOpcode); if (II.getNumDefs() >= 1) AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II, ResultReg) .addReg(Op0, Op0IsKill * RegState::Kill) .addImm(Imm)); else { AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II) .addReg(Op0, Op0IsKill * RegState::Kill) .addImm(Imm)); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(TargetOpcode::COPY), ResultReg) .addReg(II.ImplicitDefs[0])); } return ResultReg; } unsigned ARMFastISel::FastEmitInst_rf(unsigned MachineInstOpcode, const TargetRegisterClass *RC, unsigned Op0, bool Op0IsKill, const ConstantFP *FPImm) { unsigned ResultReg = createResultReg(RC); 
const MCInstrDesc &II = TII.get(MachineInstOpcode); if (II.getNumDefs() >= 1) AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II, ResultReg) .addReg(Op0, Op0IsKill * RegState::Kill) .addFPImm(FPImm)); else { AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II) .addReg(Op0, Op0IsKill * RegState::Kill) .addFPImm(FPImm)); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(TargetOpcode::COPY), ResultReg) .addReg(II.ImplicitDefs[0])); } return ResultReg; } unsigned ARMFastISel::FastEmitInst_rri(unsigned MachineInstOpcode, const TargetRegisterClass *RC, unsigned Op0, bool Op0IsKill, unsigned Op1, bool Op1IsKill, uint64_t Imm) { unsigned ResultReg = createResultReg(RC); const MCInstrDesc &II = TII.get(MachineInstOpcode); if (II.getNumDefs() >= 1) AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II, ResultReg) .addReg(Op0, Op0IsKill * RegState::Kill) .addReg(Op1, Op1IsKill * RegState::Kill) .addImm(Imm)); else { AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II) .addReg(Op0, Op0IsKill * RegState::Kill) .addReg(Op1, Op1IsKill * RegState::Kill) .addImm(Imm)); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(TargetOpcode::COPY), ResultReg) .addReg(II.ImplicitDefs[0])); } return ResultReg; } unsigned ARMFastISel::FastEmitInst_i(unsigned MachineInstOpcode, const TargetRegisterClass *RC, uint64_t Imm) { unsigned ResultReg = createResultReg(RC); const MCInstrDesc &II = TII.get(MachineInstOpcode); if (II.getNumDefs() >= 1) AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II, ResultReg) .addImm(Imm)); else { AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II) .addImm(Imm)); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(TargetOpcode::COPY), ResultReg) .addReg(II.ImplicitDefs[0])); } return ResultReg; } unsigned ARMFastISel::FastEmitInst_ii(unsigned MachineInstOpcode, const TargetRegisterClass *RC, uint64_t Imm1, uint64_t Imm2) { unsigned ResultReg = createResultReg(RC); const MCInstrDesc &II = TII.get(MachineInstOpcode); if (II.getNumDefs() >= 1) AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II, ResultReg) .addImm(Imm1).addImm(Imm2)); else { AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, II) .addImm(Imm1).addImm(Imm2)); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(TargetOpcode::COPY), ResultReg) .addReg(II.ImplicitDefs[0])); } return ResultReg; } unsigned ARMFastISel::FastEmitInst_extractsubreg(MVT RetVT, unsigned Op0, bool Op0IsKill, uint32_t Idx) { unsigned ResultReg = createResultReg(TLI.getRegClassFor(RetVT)); assert(TargetRegisterInfo::isVirtualRegister(Op0) && "Cannot yet extract from physregs"); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(TargetOpcode::COPY), ResultReg) .addReg(Op0, getKillRegState(Op0IsKill), Idx)); return ResultReg; } // TODO: Don't worry about 64-bit now, but when this is fixed remove the // checks from the various callers. 
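// (Added note: the "checks" referred to above are the early
// 'if (VT == MVT::f64) return 0' / 'if (VT == MVT::i64) return 0' guards in the two
// move helpers below, which limit them to single 32-bit VMOV transfers between the
// integer and VFP register files.)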
unsigned ARMFastISel::ARMMoveToFPReg(EVT VT, unsigned SrcReg) { if (VT == MVT::f64) return 0; unsigned MoveReg = createResultReg(TLI.getRegClassFor(VT)); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(ARM::VMOVRS), MoveReg) .addReg(SrcReg)); return MoveReg; } unsigned ARMFastISel::ARMMoveToIntReg(EVT VT, unsigned SrcReg) { if (VT == MVT::i64) return 0; unsigned MoveReg = createResultReg(TLI.getRegClassFor(VT)); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(ARM::VMOVSR), MoveReg) .addReg(SrcReg)); return MoveReg; } // For double width floating point we need to materialize two constants // (the high and the low) into integer registers then use a move to get // the combined constant into an FP reg. unsigned ARMFastISel::ARMMaterializeFP(const ConstantFP *CFP, EVT VT) { const APFloat Val = CFP->getValueAPF(); bool is64bit = VT == MVT::f64; // This checks to see if we can use VFP3 instructions to materialize // a constant, otherwise we have to go through the constant pool. if (TLI.isFPImmLegal(Val, VT)) { int Imm; unsigned Opc; if (is64bit) { Imm = ARM_AM::getFP64Imm(Val); Opc = ARM::FCONSTD; } else { Imm = ARM_AM::getFP32Imm(Val); Opc = ARM::FCONSTS; } unsigned DestReg = createResultReg(TLI.getRegClassFor(VT)); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(Opc), DestReg) .addImm(Imm)); return DestReg; } // Require VFP2 for loading fp constants. if (!Subtarget->hasVFP2()) return false; // MachineConstantPool wants an explicit alignment. unsigned Align = TD.getPrefTypeAlignment(CFP->getType()); if (Align == 0) { // TODO: Figure out if this is correct. Align = TD.getTypeAllocSize(CFP->getType()); } unsigned Idx = MCP.getConstantPoolIndex(cast(CFP), Align); unsigned DestReg = createResultReg(TLI.getRegClassFor(VT)); unsigned Opc = is64bit ? ARM::VLDRD : ARM::VLDRS; // The extra reg is for addrmode5. AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(Opc), DestReg) .addConstantPoolIndex(Idx) .addReg(0)); return DestReg; } unsigned ARMFastISel::ARMMaterializeInt(const Constant *C, EVT VT) { // For now 32-bit only. if (VT != MVT::i32) return false; unsigned DestReg = createResultReg(TLI.getRegClassFor(VT)); // If we can do this in a single instruction without a constant pool entry // do so now. const ConstantInt *CI = cast(C); if (Subtarget->hasV6T2Ops() && isUInt<16>(CI->getSExtValue())) { unsigned Opc = isThumb ? ARM::t2MOVi16 : ARM::MOVi16; AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(Opc), DestReg) .addImm(CI->getSExtValue())); return DestReg; } // MachineConstantPool wants an explicit alignment. unsigned Align = TD.getPrefTypeAlignment(C->getType()); if (Align == 0) { // TODO: Figure out if this is correct. Align = TD.getTypeAllocSize(C->getType()); } unsigned Idx = MCP.getConstantPoolIndex(C, Align); if (isThumb) AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(ARM::t2LDRpci), DestReg) .addConstantPoolIndex(Idx)); else // The extra immediate is for addrmode2. AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(ARM::LDRcp), DestReg) .addConstantPoolIndex(Idx) .addImm(0)); return DestReg; } unsigned ARMFastISel::ARMMaterializeGV(const GlobalValue *GV, EVT VT) { // For now 32-bit only. if (VT != MVT::i32) return 0; Reloc::Model RelocM = TM.getRelocationModel(); // TODO: Need more magic for ARM PIC. if (!isThumb && (RelocM == Reloc::PIC_)) return 0; // MachineConstantPool wants an explicit alignment. 
unsigned Align = TD.getPrefTypeAlignment(GV->getType()); if (Align == 0) { // TODO: Figure out if this is correct. Align = TD.getTypeAllocSize(GV->getType()); } // Grab index. unsigned PCAdj = (RelocM != Reloc::PIC_) ? 0 : (Subtarget->isThumb() ? 4 : 8); unsigned Id = AFI->createPICLabelUId(); ARMConstantPoolValue *CPV = ARMConstantPoolConstant::Create(GV, Id, ARMCP::CPValue, PCAdj); unsigned Idx = MCP.getConstantPoolIndex(CPV, Align); // Load value. MachineInstrBuilder MIB; unsigned DestReg = createResultReg(TLI.getRegClassFor(VT)); if (isThumb) { unsigned Opc = (RelocM != Reloc::PIC_) ? ARM::t2LDRpci : ARM::t2LDRpci_pic; MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(Opc), DestReg) .addConstantPoolIndex(Idx); if (RelocM == Reloc::PIC_) MIB.addImm(Id); } else { // The extra immediate is for addrmode2. MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(ARM::LDRcp), DestReg) .addConstantPoolIndex(Idx) .addImm(0); } AddOptionalDefs(MIB); if (Subtarget->GVIsIndirectSymbol(GV, RelocM)) { unsigned NewDestReg = createResultReg(TLI.getRegClassFor(VT)); if (isThumb) MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(ARM::t2LDRi12), NewDestReg) .addReg(DestReg) .addImm(0); else MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(ARM::LDRi12), NewDestReg) .addReg(DestReg) .addImm(0); DestReg = NewDestReg; AddOptionalDefs(MIB); } return DestReg; } unsigned ARMFastISel::TargetMaterializeConstant(const Constant *C) { EVT VT = TLI.getValueType(C->getType(), true); // Only handle simple types. if (!VT.isSimple()) return 0; if (const ConstantFP *CFP = dyn_cast(C)) return ARMMaterializeFP(CFP, VT); else if (const GlobalValue *GV = dyn_cast(C)) return ARMMaterializeGV(GV, VT); else if (isa(C)) return ARMMaterializeInt(C, VT); return 0; } unsigned ARMFastISel::TargetMaterializeAlloca(const AllocaInst *AI) { // Don't handle dynamic allocas. if (!FuncInfo.StaticAllocaMap.count(AI)) return 0; MVT VT; if (!isLoadTypeLegal(AI->getType(), VT)) return false; DenseMap::iterator SI = FuncInfo.StaticAllocaMap.find(AI); // This will get lowered later into the correct offsets and registers // via rewriteXFrameIndex. if (SI != FuncInfo.StaticAllocaMap.end()) { TargetRegisterClass* RC = TLI.getRegClassFor(VT); unsigned ResultReg = createResultReg(RC); unsigned Opc = isThumb ? ARM::t2ADDri : ARM::ADDri; AddOptionalDefs(BuildMI(*FuncInfo.MBB, *FuncInfo.InsertPt, DL, TII.get(Opc), ResultReg) .addFrameIndex(SI->second) .addImm(0)); return ResultReg; } return 0; } bool ARMFastISel::isTypeLegal(Type *Ty, MVT &VT) { EVT evt = TLI.getValueType(Ty, true); // Only handle simple types. if (evt == MVT::Other || !evt.isSimple()) return false; VT = evt.getSimpleVT(); // Handle all legal types, i.e. a register that will directly hold this // value. return TLI.isTypeLegal(VT); } bool ARMFastISel::isLoadTypeLegal(Type *Ty, MVT &VT) { if (isTypeLegal(Ty, VT)) return true; // If this is a type than can be sign or zero-extended to a basic operation // go ahead and accept it now. if (VT == MVT::i8 || VT == MVT::i16) return true; return false; } // Computes the address to get to an object. bool ARMFastISel::ARMComputeAddress(const Value *Obj, Address &Addr) { // Some boilerplate from the X86 FastISel. const User *U = NULL; unsigned Opcode = Instruction::UserOp1; if (const Instruction *I = dyn_cast(Obj)) { // Don't walk into other basic blocks unless the object is an alloca from // another block, otherwise it may not have a virtual register assigned. 
if (FuncInfo.StaticAllocaMap.count(static_cast(Obj)) || FuncInfo.MBBMap[I->getParent()] == FuncInfo.MBB) { Opcode = I->getOpcode(); U = I; } } else if (const ConstantExpr *C = dyn_cast(Obj)) { Opcode = C->getOpcode(); U = C; } if (PointerType *Ty = dyn_cast(Obj->getType())) if (Ty->getAddressSpace() > 255) // Fast instruction selection doesn't support the special // address spaces. return false; switch (Opcode) { default: break; case Instruction::BitCast: { // Look through bitcasts. return ARMComputeAddress(U->getOperand(0), Addr); } case Instruction::IntToPtr: { // Look past no-op inttoptrs. if (TLI.getValueType(U->getOperand(0)->getType()) == TLI.getPointerTy()) return ARMComputeAddress(U->getOperand(0), Addr); break; } case Instruction::PtrToInt: { // Look past no-op ptrtoints. if (TLI.getValueType(U->getType()) == TLI.getPointerTy()) return ARMComputeAddress(U->getOperand(0), Addr); break; } case Instruction::GetElementPtr: { Address SavedAddr = Addr; int TmpOffset = Addr.Offset; // Iterate through the GEP folding the constants into offsets where // we can. gep_type_iterator GTI = gep_type_begin(U); for (User::const_op_iterator i = U->op_begin() + 1, e = U->op_end(); i != e; ++i, ++GTI) { const Value *Op = *i; if (StructType *STy = dyn_cast(*GTI)) { const StructLayout *SL = TD.getStructLayout(STy); unsigned Idx = cast(Op)->getZExtValue(); TmpOffset += SL->getElementOffset(Idx); } else { uint64_t S = TD.getTypeAllocSize(GTI.getIndexedType()); for (;;) { if (const ConstantInt *CI = dyn_cast(Op)) { // Constant-offset addressing. TmpOffset += CI->getSExtValue() * S; break; } if (isa(Op) && (!isa(Op) || FuncInfo.MBBMap[cast(Op)->getParent()] == FuncInfo.MBB) && isa(cast(Op)->getOperand(1))) { // An add (in the same block) with a constant operand. Fold the // constant. ConstantInt *CI = cast(cast(Op)->getOperand(1)); TmpOffset += CI->getSExtValue() * S; // Iterate on the other operand. Op = cast(Op)->getOperand(0); continue; } // Unsupported goto unsupported_gep; } } } // Try to grab the base operand now. Addr.Offset = TmpOffset; if (ARMComputeAddress(U->getOperand(0), Addr)) return true; // We failed, restore everything and try the other options. Addr = SavedAddr; unsupported_gep: break; } case Instruction::Alloca: { const AllocaInst *AI = cast(Obj); DenseMap::iterator SI = FuncInfo.StaticAllocaMap.find(AI); if (SI != FuncInfo.StaticAllocaMap.end()) { Addr.BaseType = Address::FrameIndexBase; Addr.Base.FI = SI->second; return true; } break; } } // Materialize the global variable's address into a reg which can // then be used later to load the variable. if (const GlobalValue *GV = dyn_cast(Obj)) { unsigned Tmp = ARMMaterializeGV(GV, TLI.getValueType(Obj->getType())); if (Tmp == 0) return false; Addr.Base.Reg = Tmp; return true; } // Try to get this in a register if nothing else has worked. if (Addr.Base.Reg == 0) Addr.Base.Reg = getRegForValue(Obj); return Addr.Base.Reg != 0; } void ARMFastISel::ARMSimplifyAddress(Address &Addr, EVT VT) { assert(VT.isSimple() && "Non-simple types are invalid here!"); bool needsLowering = false; switch (VT.getSimpleVT().SimpleTy) { default: assert(false && "Unhandled load/store type!"); case MVT::i1: case MVT::i8: case MVT::i16: case MVT::i32: // Integer loads/stores handle 12-bit offsets. needsLowering = ((Addr.Offset & 0xfff) != Addr.Offset); break; case MVT::f32: case MVT::f64: // Floating point operands handle 8-bit offsets. 
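    // Worked example (added annotation): the test below rejects any negative offset and
    // anything that does not fit in 8 bits; Addr.Offset = 260, for instance, gives
    // (260 & 0xff) == 4 != 260, so the address is lowered into a separate register add
    // before the VLDR/VSTR.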
needsLowering = ((Addr.Offset & 0xff) != Addr.Offset); break; } // If this is a stack pointer and the offset needs to be simplified then // put the alloca address into a register, set the base type back to // register and continue. This should almost never happen. if (needsLowering && Addr.BaseType == Address::FrameIndexBase) { TargetRegisterClass *RC = isThumb ? ARM::tGPRRegisterClass : ARM::GPRRegisterClass; unsigned ResultReg = createResultReg(RC); unsigned Opc = isThumb ? ARM::t2ADDri : ARM::ADDri; AddOptionalDefs(BuildMI(*FuncInfo.MBB, *FuncInfo.InsertPt, DL, TII.get(Opc), ResultReg) .addFrameIndex(Addr.Base.FI) .addImm(0)); Addr.Base.Reg = ResultReg; Addr.BaseType = Address::RegBase; } // Since the offset is too large for the load/store instruction // get the reg+offset into a register. if (needsLowering) { Addr.Base.Reg = FastEmit_ri_(MVT::i32, ISD::ADD, Addr.Base.Reg, /*Op0IsKill*/false, Addr.Offset, MVT::i32); Addr.Offset = 0; } } void ARMFastISel::AddLoadStoreOperands(EVT VT, Address &Addr, const MachineInstrBuilder &MIB, unsigned Flags) { // addrmode5 output depends on the selection dag addressing dividing the // offset by 4 that it then later multiplies. Do this here as well. if (VT.getSimpleVT().SimpleTy == MVT::f32 || VT.getSimpleVT().SimpleTy == MVT::f64) Addr.Offset /= 4; // Frame base works a bit differently. Handle it separately. if (Addr.BaseType == Address::FrameIndexBase) { int FI = Addr.Base.FI; int Offset = Addr.Offset; MachineMemOperand *MMO = FuncInfo.MF->getMachineMemOperand( MachinePointerInfo::getFixedStack(FI, Offset), Flags, MFI.getObjectSize(FI), MFI.getObjectAlignment(FI)); // Now add the rest of the operands. MIB.addFrameIndex(FI); // ARM halfword load/stores need an additional operand. if (!isThumb && VT.getSimpleVT().SimpleTy == MVT::i16) MIB.addReg(0); MIB.addImm(Addr.Offset); MIB.addMemOperand(MMO); } else { // Now add the rest of the operands. MIB.addReg(Addr.Base.Reg); // ARM halfword load/stores need an additional operand. if (!isThumb && VT.getSimpleVT().SimpleTy == MVT::i16) MIB.addReg(0); MIB.addImm(Addr.Offset); } AddOptionalDefs(MIB); } bool ARMFastISel::ARMEmitLoad(EVT VT, unsigned &ResultReg, Address &Addr) { assert(VT.isSimple() && "Non-simple types are invalid here!"); unsigned Opc; TargetRegisterClass *RC; switch (VT.getSimpleVT().SimpleTy) { // This is mostly going to be Neon/vector support. default: return false; case MVT::i16: Opc = isThumb ? ARM::t2LDRHi12 : ARM::LDRH; RC = ARM::GPRRegisterClass; break; case MVT::i8: Opc = isThumb ? ARM::t2LDRBi12 : ARM::LDRBi12; RC = ARM::GPRRegisterClass; break; case MVT::i32: Opc = isThumb ? ARM::t2LDRi12 : ARM::LDRi12; RC = ARM::GPRRegisterClass; break; case MVT::f32: Opc = ARM::VLDRS; RC = TLI.getRegClassFor(VT); break; case MVT::f64: Opc = ARM::VLDRD; RC = TLI.getRegClassFor(VT); break; } // Simplify this down to something we can handle. ARMSimplifyAddress(Addr, VT); // Create the base instruction, then add the operands. ResultReg = createResultReg(RC); MachineInstrBuilder MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(Opc), ResultReg); AddLoadStoreOperands(VT, Addr, MIB, MachineMemOperand::MOLoad); return true; } bool ARMFastISel::SelectLoad(const Instruction *I) { // Atomic loads need special handling. if (cast(I)->isAtomic()) return false; // Verify we have a legal type before going any further. MVT VT; if (!isLoadTypeLegal(I->getType(), VT)) return false; // See if we can handle this address. 
Address Addr; if (!ARMComputeAddress(I->getOperand(0), Addr)) return false; unsigned ResultReg; if (!ARMEmitLoad(VT, ResultReg, Addr)) return false; UpdateValueMap(I, ResultReg); return true; } bool ARMFastISel::ARMEmitStore(EVT VT, unsigned SrcReg, Address &Addr) { unsigned StrOpc; switch (VT.getSimpleVT().SimpleTy) { // This is mostly going to be Neon/vector support. default: return false; case MVT::i1: { unsigned Res = createResultReg(isThumb ? ARM::tGPRRegisterClass : ARM::GPRRegisterClass); unsigned Opc = isThumb ? ARM::t2ANDri : ARM::ANDri; AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(Opc), Res) .addReg(SrcReg).addImm(1)); SrcReg = Res; } // Fallthrough here. case MVT::i8: StrOpc = isThumb ? ARM::t2STRBi12 : ARM::STRBi12; break; case MVT::i16: StrOpc = isThumb ? ARM::t2STRHi12 : ARM::STRH; break; case MVT::i32: StrOpc = isThumb ? ARM::t2STRi12 : ARM::STRi12; break; case MVT::f32: if (!Subtarget->hasVFP2()) return false; StrOpc = ARM::VSTRS; break; case MVT::f64: if (!Subtarget->hasVFP2()) return false; StrOpc = ARM::VSTRD; break; } // Simplify this down to something we can handle. ARMSimplifyAddress(Addr, VT); // Create the base instruction, then add the operands. MachineInstrBuilder MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(StrOpc)) .addReg(SrcReg, getKillRegState(true)); AddLoadStoreOperands(VT, Addr, MIB, MachineMemOperand::MOStore); return true; } bool ARMFastISel::SelectStore(const Instruction *I) { Value *Op0 = I->getOperand(0); unsigned SrcReg = 0; // Atomic stores need special handling. if (cast(I)->isAtomic()) return false; // Verify we have a legal type before going any further. MVT VT; if (!isLoadTypeLegal(I->getOperand(0)->getType(), VT)) return false; // Get the value to be stored into a register. SrcReg = getRegForValue(Op0); if (SrcReg == 0) return false; // See if we can handle this address. Address Addr; if (!ARMComputeAddress(I->getOperand(1), Addr)) return false; if (!ARMEmitStore(VT, SrcReg, Addr)) return false; return true; } static ARMCC::CondCodes getComparePred(CmpInst::Predicate Pred) { switch (Pred) { // Needs two compares... case CmpInst::FCMP_ONE: case CmpInst::FCMP_UEQ: default: // AL is our "false" for now. The other two need more compares. return ARMCC::AL; case CmpInst::ICMP_EQ: case CmpInst::FCMP_OEQ: return ARMCC::EQ; case CmpInst::ICMP_SGT: case CmpInst::FCMP_OGT: return ARMCC::GT; case CmpInst::ICMP_SGE: case CmpInst::FCMP_OGE: return ARMCC::GE; case CmpInst::ICMP_UGT: case CmpInst::FCMP_UGT: return ARMCC::HI; case CmpInst::FCMP_OLT: return ARMCC::MI; case CmpInst::ICMP_ULE: case CmpInst::FCMP_OLE: return ARMCC::LS; case CmpInst::FCMP_ORD: return ARMCC::VC; case CmpInst::FCMP_UNO: return ARMCC::VS; case CmpInst::FCMP_UGE: return ARMCC::PL; case CmpInst::ICMP_SLT: case CmpInst::FCMP_ULT: return ARMCC::LT; case CmpInst::ICMP_SLE: case CmpInst::FCMP_ULE: return ARMCC::LE; case CmpInst::FCMP_UNE: case CmpInst::ICMP_NE: return ARMCC::NE; case CmpInst::ICMP_UGE: return ARMCC::HS; case CmpInst::ICMP_ULT: return ARMCC::LO; } } bool ARMFastISel::SelectBranch(const Instruction *I) { const BranchInst *BI = cast(I); MachineBasicBlock *TBB = FuncInfo.MBBMap[BI->getSuccessor(0)]; MachineBasicBlock *FBB = FuncInfo.MBBMap[BI->getSuccessor(1)]; // Simple branch support. // If we can, avoid recomputing the compare - redoing it could lead to wonky // behavior. // TODO: Factor this out. 
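  // Illustrative annotation: for a branch on 'icmp slt i32 %a, %b' whose only use is
  // this branch, the code below emits CMPrr (or t2CMPrr) followed directly by a
  // conditional Bcc with the LT predicate, rather than materializing the i1 result and
  // testing it separately.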
if (const CmpInst *CI = dyn_cast(BI->getCondition())) { MVT SourceVT; Type *Ty = CI->getOperand(0)->getType(); if (CI->hasOneUse() && (CI->getParent() == I->getParent()) && isTypeLegal(Ty, SourceVT)) { bool isFloat = (Ty->isDoubleTy() || Ty->isFloatTy()); if (isFloat && !Subtarget->hasVFP2()) return false; unsigned CmpOpc; switch (SourceVT.SimpleTy) { default: return false; // TODO: Verify compares. case MVT::f32: CmpOpc = ARM::VCMPES; break; case MVT::f64: CmpOpc = ARM::VCMPED; break; case MVT::i32: CmpOpc = isThumb ? ARM::t2CMPrr : ARM::CMPrr; break; } // Get the compare predicate. // Try to take advantage of fallthrough opportunities. CmpInst::Predicate Predicate = CI->getPredicate(); if (FuncInfo.MBB->isLayoutSuccessor(TBB)) { std::swap(TBB, FBB); Predicate = CmpInst::getInversePredicate(Predicate); } ARMCC::CondCodes ARMPred = getComparePred(Predicate); // We may not handle every CC for now. if (ARMPred == ARMCC::AL) return false; unsigned Arg1 = getRegForValue(CI->getOperand(0)); if (Arg1 == 0) return false; unsigned Arg2 = getRegForValue(CI->getOperand(1)); if (Arg2 == 0) return false; AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(CmpOpc)) .addReg(Arg1).addReg(Arg2)); // For floating point we need to move the result to a comparison register // that we can then use for branches. if (isFloat) AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(ARM::FMSTAT))); unsigned BrOpc = isThumb ? ARM::t2Bcc : ARM::Bcc; BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(BrOpc)) .addMBB(TBB).addImm(ARMPred).addReg(ARM::CPSR); FastEmitBranch(FBB, DL); FuncInfo.MBB->addSuccessor(TBB); return true; } } else if (TruncInst *TI = dyn_cast(BI->getCondition())) { MVT SourceVT; if (TI->hasOneUse() && TI->getParent() == I->getParent() && (isLoadTypeLegal(TI->getOperand(0)->getType(), SourceVT))) { unsigned TstOpc = isThumb ? ARM::t2TSTri : ARM::TSTri; unsigned OpReg = getRegForValue(TI->getOperand(0)); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(TstOpc)) .addReg(OpReg).addImm(1)); unsigned CCMode = ARMCC::NE; if (FuncInfo.MBB->isLayoutSuccessor(TBB)) { std::swap(TBB, FBB); CCMode = ARMCC::EQ; } unsigned BrOpc = isThumb ? ARM::t2Bcc : ARM::Bcc; BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(BrOpc)) .addMBB(TBB).addImm(CCMode).addReg(ARM::CPSR); FastEmitBranch(FBB, DL); FuncInfo.MBB->addSuccessor(TBB); return true; } } unsigned CmpReg = getRegForValue(BI->getCondition()); if (CmpReg == 0) return false; // We've been divorced from our compare! Our block was split, and // now our compare lives in a predecessor block. We musn't // re-compare here, as the children of the compare aren't guaranteed // live across the block boundary (we *could* check for this). // Regardless, the compare has been done in the predecessor block, // and it left a value for us in a virtual register. Ergo, we test // the one-bit value left in the virtual register. unsigned TstOpc = isThumb ? ARM::t2TSTri : ARM::TSTri; AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(TstOpc)) .addReg(CmpReg).addImm(1)); unsigned CCMode = ARMCC::NE; if (FuncInfo.MBB->isLayoutSuccessor(TBB)) { std::swap(TBB, FBB); CCMode = ARMCC::EQ; } unsigned BrOpc = isThumb ? 
ARM::t2Bcc : ARM::Bcc; BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(BrOpc)) .addMBB(TBB).addImm(CCMode).addReg(ARM::CPSR); FastEmitBranch(FBB, DL); FuncInfo.MBB->addSuccessor(TBB); return true; } bool ARMFastISel::SelectCmp(const Instruction *I) { const CmpInst *CI = cast(I); MVT VT; Type *Ty = CI->getOperand(0)->getType(); if (!isTypeLegal(Ty, VT)) return false; bool isFloat = (Ty->isDoubleTy() || Ty->isFloatTy()); if (isFloat && !Subtarget->hasVFP2()) return false; unsigned CmpOpc; unsigned CondReg; switch (VT.SimpleTy) { default: return false; // TODO: Verify compares. case MVT::f32: CmpOpc = ARM::VCMPES; CondReg = ARM::FPSCR; break; case MVT::f64: CmpOpc = ARM::VCMPED; CondReg = ARM::FPSCR; break; case MVT::i32: CmpOpc = isThumb ? ARM::t2CMPrr : ARM::CMPrr; CondReg = ARM::CPSR; break; } // Get the compare predicate. ARMCC::CondCodes ARMPred = getComparePred(CI->getPredicate()); // We may not handle every CC for now. if (ARMPred == ARMCC::AL) return false; unsigned Arg1 = getRegForValue(CI->getOperand(0)); if (Arg1 == 0) return false; unsigned Arg2 = getRegForValue(CI->getOperand(1)); if (Arg2 == 0) return false; AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(CmpOpc)) .addReg(Arg1).addReg(Arg2)); // For floating point we need to move the result to a comparison register // that we can then use for branches. if (isFloat) AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(ARM::FMSTAT))); // Now set a register based on the comparison. Explicitly set the predicates // here. unsigned MovCCOpc = isThumb ? ARM::t2MOVCCi : ARM::MOVCCi; TargetRegisterClass *RC = isThumb ? ARM::rGPRRegisterClass : ARM::GPRRegisterClass; unsigned DestReg = createResultReg(RC); Constant *Zero = ConstantInt::get(Type::getInt32Ty(*Context), 0); unsigned ZeroReg = TargetMaterializeConstant(Zero); BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(MovCCOpc), DestReg) .addReg(ZeroReg).addImm(1) .addImm(ARMPred).addReg(CondReg); UpdateValueMap(I, DestReg); return true; } bool ARMFastISel::SelectFPExt(const Instruction *I) { // Make sure we have VFP and that we're extending float to double. if (!Subtarget->hasVFP2()) return false; Value *V = I->getOperand(0); if (!I->getType()->isDoubleTy() || !V->getType()->isFloatTy()) return false; unsigned Op = getRegForValue(V); if (Op == 0) return false; unsigned Result = createResultReg(ARM::DPRRegisterClass); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(ARM::VCVTDS), Result) .addReg(Op)); UpdateValueMap(I, Result); return true; } bool ARMFastISel::SelectFPTrunc(const Instruction *I) { // Make sure we have VFP and that we're truncating double to float. if (!Subtarget->hasVFP2()) return false; Value *V = I->getOperand(0); if (!(I->getType()->isFloatTy() && V->getType()->isDoubleTy())) return false; unsigned Op = getRegForValue(V); if (Op == 0) return false; unsigned Result = createResultReg(ARM::SPRRegisterClass); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(ARM::VCVTSD), Result) .addReg(Op)); UpdateValueMap(I, Result); return true; } bool ARMFastISel::SelectSIToFP(const Instruction *I) { // Make sure we have VFP. if (!Subtarget->hasVFP2()) return false; MVT DstVT; Type *Ty = I->getType(); if (!isTypeLegal(Ty, DstVT)) return false; // FIXME: Handle sign-extension where necessary. 
if (!I->getOperand(0)->getType()->isIntegerTy(32)) return false; unsigned Op = getRegForValue(I->getOperand(0)); if (Op == 0) return false; // The conversion routine works on fp-reg to fp-reg and the operand above // was an integer, move it to the fp registers if possible. unsigned FP = ARMMoveToFPReg(MVT::f32, Op); if (FP == 0) return false; unsigned Opc; if (Ty->isFloatTy()) Opc = ARM::VSITOS; else if (Ty->isDoubleTy()) Opc = ARM::VSITOD; else return false; unsigned ResultReg = createResultReg(TLI.getRegClassFor(DstVT)); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(Opc), ResultReg) .addReg(FP)); UpdateValueMap(I, ResultReg); return true; } bool ARMFastISel::SelectFPToSI(const Instruction *I) { // Make sure we have VFP. if (!Subtarget->hasVFP2()) return false; MVT DstVT; Type *RetTy = I->getType(); if (!isTypeLegal(RetTy, DstVT)) return false; unsigned Op = getRegForValue(I->getOperand(0)); if (Op == 0) return false; unsigned Opc; Type *OpTy = I->getOperand(0)->getType(); if (OpTy->isFloatTy()) Opc = ARM::VTOSIZS; else if (OpTy->isDoubleTy()) Opc = ARM::VTOSIZD; else return false; // f64->s32 or f32->s32 both need an intermediate f32 reg. unsigned ResultReg = createResultReg(TLI.getRegClassFor(MVT::f32)); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(Opc), ResultReg) .addReg(Op)); // This result needs to be in an integer register, but the conversion only // takes place in fp-regs. unsigned IntReg = ARMMoveToIntReg(DstVT, ResultReg); if (IntReg == 0) return false; UpdateValueMap(I, IntReg); return true; } bool ARMFastISel::SelectSelect(const Instruction *I) { MVT VT; if (!isTypeLegal(I->getType(), VT)) return false; // Things need to be register sized for register moves. if (VT != MVT::i32) return false; const TargetRegisterClass *RC = TLI.getRegClassFor(VT); unsigned CondReg = getRegForValue(I->getOperand(0)); if (CondReg == 0) return false; unsigned Op1Reg = getRegForValue(I->getOperand(1)); if (Op1Reg == 0) return false; unsigned Op2Reg = getRegForValue(I->getOperand(2)); if (Op2Reg == 0) return false; unsigned CmpOpc = isThumb ? ARM::t2TSTri : ARM::TSTri; AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(CmpOpc)) .addReg(CondReg).addImm(1)); unsigned ResultReg = createResultReg(RC); unsigned MovCCOpc = isThumb ? ARM::t2MOVCCr : ARM::MOVCCr; BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(MovCCOpc), ResultReg) .addReg(Op1Reg).addReg(Op2Reg) .addImm(ARMCC::EQ).addReg(ARM::CPSR); UpdateValueMap(I, ResultReg); return true; } bool ARMFastISel::SelectSDiv(const Instruction *I) { MVT VT; Type *Ty = I->getType(); if (!isTypeLegal(Ty, VT)) return false; // If we have integer div support we should have selected this automagically. // In case we have a real miss go ahead and return false and we'll pick // it up later. if (Subtarget->hasDivide()) return false; // Otherwise emit a libcall. 
RTLIB::Libcall LC = RTLIB::UNKNOWN_LIBCALL; if (VT == MVT::i8) LC = RTLIB::SDIV_I8; else if (VT == MVT::i16) LC = RTLIB::SDIV_I16; else if (VT == MVT::i32) LC = RTLIB::SDIV_I32; else if (VT == MVT::i64) LC = RTLIB::SDIV_I64; else if (VT == MVT::i128) LC = RTLIB::SDIV_I128; assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unsupported SDIV!"); return ARMEmitLibcall(I, LC); } bool ARMFastISel::SelectSRem(const Instruction *I) { MVT VT; Type *Ty = I->getType(); if (!isTypeLegal(Ty, VT)) return false; RTLIB::Libcall LC = RTLIB::UNKNOWN_LIBCALL; if (VT == MVT::i8) LC = RTLIB::SREM_I8; else if (VT == MVT::i16) LC = RTLIB::SREM_I16; else if (VT == MVT::i32) LC = RTLIB::SREM_I32; else if (VT == MVT::i64) LC = RTLIB::SREM_I64; else if (VT == MVT::i128) LC = RTLIB::SREM_I128; assert(LC != RTLIB::UNKNOWN_LIBCALL && "Unsupported SREM!"); return ARMEmitLibcall(I, LC); } bool ARMFastISel::SelectBinaryOp(const Instruction *I, unsigned ISDOpcode) { EVT VT = TLI.getValueType(I->getType(), true); // We can get here in the case when we want to use NEON for our fp // operations, but can't figure out how to. Just use the vfp instructions // if we have them. // FIXME: It'd be nice to use NEON instructions. Type *Ty = I->getType(); bool isFloat = (Ty->isDoubleTy() || Ty->isFloatTy()); if (isFloat && !Subtarget->hasVFP2()) return false; unsigned Op1 = getRegForValue(I->getOperand(0)); if (Op1 == 0) return false; unsigned Op2 = getRegForValue(I->getOperand(1)); if (Op2 == 0) return false; unsigned Opc; bool is64bit = VT == MVT::f64 || VT == MVT::i64; switch (ISDOpcode) { default: return false; case ISD::FADD: Opc = is64bit ? ARM::VADDD : ARM::VADDS; break; case ISD::FSUB: Opc = is64bit ? ARM::VSUBD : ARM::VSUBS; break; case ISD::FMUL: Opc = is64bit ? ARM::VMULD : ARM::VMULS; break; } unsigned ResultReg = createResultReg(TLI.getRegClassFor(VT)); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(Opc), ResultReg) .addReg(Op1).addReg(Op2)); UpdateValueMap(I, ResultReg); return true; } // Call Handling Code bool ARMFastISel::FastEmitExtend(ISD::NodeType Opc, EVT DstVT, unsigned Src, EVT SrcVT, unsigned &ResultReg) { unsigned RR = FastEmit_r(SrcVT.getSimpleVT(), DstVT.getSimpleVT(), Opc, Src, /*TODO: Kill=*/false); if (RR != 0) { ResultReg = RR; return true; } else return false; } // This is largely taken directly from CCAssignFnForNode - we don't support // varargs in FastISel so that part has been removed. // TODO: We may not support all of this. CCAssignFn *ARMFastISel::CCAssignFnForCall(CallingConv::ID CC, bool Return) { switch (CC) { default: llvm_unreachable("Unsupported calling convention"); case CallingConv::Fast: // Ignore fastcc. Silence compiler warnings. (void)RetFastCC_ARM_APCS; (void)FastCC_ARM_APCS; // Fallthrough case CallingConv::C: // Use target triple & subtarget features to do actual dispatch. if (Subtarget->isAAPCS_ABI()) { if (Subtarget->hasVFP2() && FloatABIType == FloatABI::Hard) return (Return ? RetCC_ARM_AAPCS_VFP: CC_ARM_AAPCS_VFP); else return (Return ? RetCC_ARM_AAPCS: CC_ARM_AAPCS); } else return (Return ? RetCC_ARM_APCS: CC_ARM_APCS); case CallingConv::ARM_AAPCS_VFP: return (Return ? RetCC_ARM_AAPCS_VFP: CC_ARM_AAPCS_VFP); case CallingConv::ARM_AAPCS: return (Return ? RetCC_ARM_AAPCS: CC_ARM_AAPCS); case CallingConv::ARM_APCS: return (Return ? 
RetCC_ARM_APCS: CC_ARM_APCS); + case CallingConv::GHC: + if (Return) + llvm_unreachable("Can't return in GHC call convention"); + else + return CC_ARM_APCS_GHC; } } bool ARMFastISel::ProcessCallArgs(SmallVectorImpl &Args, SmallVectorImpl &ArgRegs, SmallVectorImpl &ArgVTs, SmallVectorImpl &ArgFlags, SmallVectorImpl &RegArgs, CallingConv::ID CC, unsigned &NumBytes) { SmallVector ArgLocs; CCState CCInfo(CC, false, *FuncInfo.MF, TM, ArgLocs, *Context); CCInfo.AnalyzeCallOperands(ArgVTs, ArgFlags, CCAssignFnForCall(CC, false)); // Get a count of how many bytes are to be pushed on the stack. NumBytes = CCInfo.getNextStackOffset(); // Issue CALLSEQ_START unsigned AdjStackDown = TII.getCallFrameSetupOpcode(); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(AdjStackDown)) .addImm(NumBytes)); // Process the args. for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i) { CCValAssign &VA = ArgLocs[i]; unsigned Arg = ArgRegs[VA.getValNo()]; MVT ArgVT = ArgVTs[VA.getValNo()]; // We don't handle NEON/vector parameters yet. if (ArgVT.isVector() || ArgVT.getSizeInBits() > 64) return false; // Handle arg promotion, etc. switch (VA.getLocInfo()) { case CCValAssign::Full: break; case CCValAssign::SExt: { bool Emitted = FastEmitExtend(ISD::SIGN_EXTEND, VA.getLocVT(), Arg, ArgVT, Arg); assert(Emitted && "Failed to emit a sext!"); (void)Emitted; Emitted = true; ArgVT = VA.getLocVT(); break; } case CCValAssign::ZExt: { bool Emitted = FastEmitExtend(ISD::ZERO_EXTEND, VA.getLocVT(), Arg, ArgVT, Arg); assert(Emitted && "Failed to emit a zext!"); (void)Emitted; Emitted = true; ArgVT = VA.getLocVT(); break; } case CCValAssign::AExt: { bool Emitted = FastEmitExtend(ISD::ANY_EXTEND, VA.getLocVT(), Arg, ArgVT, Arg); if (!Emitted) Emitted = FastEmitExtend(ISD::ZERO_EXTEND, VA.getLocVT(), Arg, ArgVT, Arg); if (!Emitted) Emitted = FastEmitExtend(ISD::SIGN_EXTEND, VA.getLocVT(), Arg, ArgVT, Arg); assert(Emitted && "Failed to emit a aext!"); (void)Emitted; ArgVT = VA.getLocVT(); break; } case CCValAssign::BCvt: { unsigned BC = FastEmit_r(ArgVT, VA.getLocVT(), ISD::BITCAST, Arg, /*TODO: Kill=*/false); assert(BC != 0 && "Failed to emit a bitcast!"); Arg = BC; ArgVT = VA.getLocVT(); break; } default: llvm_unreachable("Unknown arg promotion!"); } // Now copy/store arg to correct locations. if (VA.isRegLoc() && !VA.needsCustom()) { BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(TargetOpcode::COPY), VA.getLocReg()) .addReg(Arg); RegArgs.push_back(VA.getLocReg()); } else if (VA.needsCustom()) { // TODO: We need custom lowering for vector (v2f64) args. if (VA.getLocVT() != MVT::f64) return false; CCValAssign &NextVA = ArgLocs[++i]; // TODO: Only handle register args for now. if(!(VA.isRegLoc() && NextVA.isRegLoc())) return false; AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(ARM::VMOVRRD), VA.getLocReg()) .addReg(NextVA.getLocReg(), RegState::Define) .addReg(Arg)); RegArgs.push_back(VA.getLocReg()); RegArgs.push_back(NextVA.getLocReg()); } else { assert(VA.isMemLoc()); // Need to store on the stack. 
Address Addr; Addr.BaseType = Address::RegBase; Addr.Base.Reg = ARM::SP; Addr.Offset = VA.getLocMemOffset(); if (!ARMEmitStore(ArgVT, Arg, Addr)) return false; } } return true; } bool ARMFastISel::FinishCall(MVT RetVT, SmallVectorImpl &UsedRegs, const Instruction *I, CallingConv::ID CC, unsigned &NumBytes) { // Issue CALLSEQ_END unsigned AdjStackUp = TII.getCallFrameDestroyOpcode(); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(AdjStackUp)) .addImm(NumBytes).addImm(0)); // Now the return value. if (RetVT != MVT::isVoid) { SmallVector RVLocs; CCState CCInfo(CC, false, *FuncInfo.MF, TM, RVLocs, *Context); CCInfo.AnalyzeCallResult(RetVT, CCAssignFnForCall(CC, true)); // Copy all of the result registers out of their specified physreg. if (RVLocs.size() == 2 && RetVT == MVT::f64) { // For this move we copy into two registers and then move into the // double fp reg we want. EVT DestVT = RVLocs[0].getValVT(); TargetRegisterClass* DstRC = TLI.getRegClassFor(DestVT); unsigned ResultReg = createResultReg(DstRC); AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(ARM::VMOVDRR), ResultReg) .addReg(RVLocs[0].getLocReg()) .addReg(RVLocs[1].getLocReg())); UsedRegs.push_back(RVLocs[0].getLocReg()); UsedRegs.push_back(RVLocs[1].getLocReg()); // Finally update the result. UpdateValueMap(I, ResultReg); } else { assert(RVLocs.size() == 1 &&"Can't handle non-double multi-reg retvals!"); EVT CopyVT = RVLocs[0].getValVT(); TargetRegisterClass* DstRC = TLI.getRegClassFor(CopyVT); unsigned ResultReg = createResultReg(DstRC); BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(TargetOpcode::COPY), ResultReg).addReg(RVLocs[0].getLocReg()); UsedRegs.push_back(RVLocs[0].getLocReg()); // Finally update the result. UpdateValueMap(I, ResultReg); } } return true; } bool ARMFastISel::SelectRet(const Instruction *I) { const ReturnInst *Ret = cast(I); const Function &F = *I->getParent()->getParent(); if (!FuncInfo.CanLowerReturn) return false; if (F.isVarArg()) return false; CallingConv::ID CC = F.getCallingConv(); if (Ret->getNumOperands() > 0) { SmallVector Outs; GetReturnInfo(F.getReturnType(), F.getAttributes().getRetAttributes(), Outs, TLI); // Analyze operands of the call, assigning locations to each operand. SmallVector ValLocs; CCState CCInfo(CC, F.isVarArg(), *FuncInfo.MF, TM, ValLocs,I->getContext()); CCInfo.AnalyzeReturn(Outs, CCAssignFnForCall(CC, true /* is Ret */)); const Value *RV = Ret->getOperand(0); unsigned Reg = getRegForValue(RV); if (Reg == 0) return false; // Only handle a single return value for now. if (ValLocs.size() != 1) return false; CCValAssign &VA = ValLocs[0]; // Don't bother handling odd stuff for now. if (VA.getLocInfo() != CCValAssign::Full) return false; // Only handle register returns for now. if (!VA.isRegLoc()) return false; // TODO: For now, don't try to handle cases where getLocInfo() // says Full but the types don't match. if (TLI.getValueType(RV->getType()) != VA.getValVT()) return false; // Make the copy. unsigned SrcReg = Reg + VA.getValNo(); unsigned DstReg = VA.getLocReg(); const TargetRegisterClass* SrcRC = MRI.getRegClass(SrcReg); // Avoid a cross-class copy. This is very unlikely. if (!SrcRC->contains(DstReg)) return false; BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(TargetOpcode::COPY), DstReg).addReg(SrcReg); // Mark the register as live out of the function. MRI.addLiveOut(VA.getLocReg()); } unsigned RetOpc = isThumb ? 
ARM::tBX_RET : ARM::BX_RET; AddOptionalDefs(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(RetOpc))); return true; } unsigned ARMFastISel::ARMSelectCallOp(const GlobalValue *GV) { // Darwin needs the r9 versions of the opcodes. bool isDarwin = Subtarget->isTargetDarwin(); if (isThumb) { return isDarwin ? ARM::tBLr9 : ARM::tBL; } else { return isDarwin ? ARM::BLr9 : ARM::BL; } } // A quick function that will emit a call for a named libcall in F with the // vector of passed arguments for the Instruction in I. We can assume that we // can emit a call for any libcall we can produce. This is an abridged version // of the full call infrastructure since we won't need to worry about things // like computed function pointers or strange arguments at call sites. // TODO: Try to unify this and the normal call bits for ARM, then try to unify // with X86. bool ARMFastISel::ARMEmitLibcall(const Instruction *I, RTLIB::Libcall Call) { CallingConv::ID CC = TLI.getLibcallCallingConv(Call); // Handle *simple* calls for now. Type *RetTy = I->getType(); MVT RetVT; if (RetTy->isVoidTy()) RetVT = MVT::isVoid; else if (!isTypeLegal(RetTy, RetVT)) return false; // TODO: For now if we have long calls specified we don't handle the call. if (EnableARMLongCalls) return false; // Set up the argument vectors. SmallVector Args; SmallVector ArgRegs; SmallVector ArgVTs; SmallVector ArgFlags; Args.reserve(I->getNumOperands()); ArgRegs.reserve(I->getNumOperands()); ArgVTs.reserve(I->getNumOperands()); ArgFlags.reserve(I->getNumOperands()); for (unsigned i = 0; i < I->getNumOperands(); ++i) { Value *Op = I->getOperand(i); unsigned Arg = getRegForValue(Op); if (Arg == 0) return false; Type *ArgTy = Op->getType(); MVT ArgVT; if (!isTypeLegal(ArgTy, ArgVT)) return false; ISD::ArgFlagsTy Flags; unsigned OriginalAlignment = TD.getABITypeAlignment(ArgTy); Flags.setOrigAlign(OriginalAlignment); Args.push_back(Op); ArgRegs.push_back(Arg); ArgVTs.push_back(ArgVT); ArgFlags.push_back(Flags); } // Handle the arguments now that we've gotten them. SmallVector RegArgs; unsigned NumBytes; if (!ProcessCallArgs(Args, ArgRegs, ArgVTs, ArgFlags, RegArgs, CC, NumBytes)) return false; // Issue the call, BLr9 for darwin, BL otherwise. // TODO: Turn this into the table of arm call ops. MachineInstrBuilder MIB; unsigned CallOpc = ARMSelectCallOp(NULL); if(isThumb) // Explicitly adding the predicate here. MIB = AddDefaultPred(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(CallOpc))) .addExternalSymbol(TLI.getLibcallName(Call)); else // Explicitly adding the predicate here. MIB = AddDefaultPred(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(CallOpc)) .addExternalSymbol(TLI.getLibcallName(Call))); // Add implicit physical register uses to the call. for (unsigned i = 0, e = RegArgs.size(); i != e; ++i) MIB.addReg(RegArgs[i]); // Finish off the call including any return values. SmallVector UsedRegs; if (!FinishCall(RetVT, UsedRegs, I, CC, NumBytes)) return false; // Set all unused physreg defs as dead. static_cast(MIB)->setPhysRegsDeadExcept(UsedRegs, TRI); return true; } bool ARMFastISel::SelectCall(const Instruction *I) { const CallInst *CI = cast(I); const Value *Callee = CI->getCalledValue(); // Can't handle inline asm or worry about intrinsics yet. if (isa(Callee) || isa(CI)) return false; // Only handle global variable Callees. const GlobalValue *GV = dyn_cast(Callee); if (!GV) return false; // Check the calling convention. 
ImmutableCallSite CS(CI); CallingConv::ID CC = CS.getCallingConv(); // TODO: Avoid some calling conventions? // Let SDISel handle vararg functions. PointerType *PT = cast(CS.getCalledValue()->getType()); FunctionType *FTy = cast(PT->getElementType()); if (FTy->isVarArg()) return false; // Handle *simple* calls for now. Type *RetTy = I->getType(); MVT RetVT; if (RetTy->isVoidTy()) RetVT = MVT::isVoid; else if (!isTypeLegal(RetTy, RetVT)) return false; // TODO: For now if we have long calls specified we don't handle the call. if (EnableARMLongCalls) return false; // Set up the argument vectors. SmallVector Args; SmallVector ArgRegs; SmallVector ArgVTs; SmallVector ArgFlags; Args.reserve(CS.arg_size()); ArgRegs.reserve(CS.arg_size()); ArgVTs.reserve(CS.arg_size()); ArgFlags.reserve(CS.arg_size()); for (ImmutableCallSite::arg_iterator i = CS.arg_begin(), e = CS.arg_end(); i != e; ++i) { unsigned Arg = getRegForValue(*i); if (Arg == 0) return false; ISD::ArgFlagsTy Flags; unsigned AttrInd = i - CS.arg_begin() + 1; if (CS.paramHasAttr(AttrInd, Attribute::SExt)) Flags.setSExt(); if (CS.paramHasAttr(AttrInd, Attribute::ZExt)) Flags.setZExt(); // FIXME: Only handle *easy* calls for now. if (CS.paramHasAttr(AttrInd, Attribute::InReg) || CS.paramHasAttr(AttrInd, Attribute::StructRet) || CS.paramHasAttr(AttrInd, Attribute::Nest) || CS.paramHasAttr(AttrInd, Attribute::ByVal)) return false; Type *ArgTy = (*i)->getType(); MVT ArgVT; if (!isTypeLegal(ArgTy, ArgVT)) return false; unsigned OriginalAlignment = TD.getABITypeAlignment(ArgTy); Flags.setOrigAlign(OriginalAlignment); Args.push_back(*i); ArgRegs.push_back(Arg); ArgVTs.push_back(ArgVT); ArgFlags.push_back(Flags); } // Handle the arguments now that we've gotten them. SmallVector RegArgs; unsigned NumBytes; if (!ProcessCallArgs(Args, ArgRegs, ArgVTs, ArgFlags, RegArgs, CC, NumBytes)) return false; // Issue the call, BLr9 for darwin, BL otherwise. // TODO: Turn this into the table of arm call ops. MachineInstrBuilder MIB; unsigned CallOpc = ARMSelectCallOp(GV); // Explicitly adding the predicate here. if(isThumb) // Explicitly adding the predicate here. MIB = AddDefaultPred(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(CallOpc))) .addGlobalAddress(GV, 0, 0); else // Explicitly adding the predicate here. MIB = AddDefaultPred(BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(CallOpc)) .addGlobalAddress(GV, 0, 0)); // Add implicit physical register uses to the call. for (unsigned i = 0, e = RegArgs.size(); i != e; ++i) MIB.addReg(RegArgs[i]); // Finish off the call including any return values. SmallVector UsedRegs; if (!FinishCall(RetVT, UsedRegs, I, CC, NumBytes)) return false; // Set all unused physreg defs as dead. static_cast(MIB)->setPhysRegsDeadExcept(UsedRegs, TRI); return true; } bool ARMFastISel::SelectIntCast(const Instruction *I) { // On ARM, in general, integer casts don't involve legal types; this code // handles promotable integers. The high bits for a type smaller than // the register size are assumed to be undefined. Type *DestTy = I->getType(); Value *Op = I->getOperand(0); Type *SrcTy = Op->getType(); EVT SrcVT, DestVT; SrcVT = TLI.getValueType(SrcTy, true); DestVT = TLI.getValueType(DestTy, true); if (isa(I)) { if (SrcVT != MVT::i32 && SrcVT != MVT::i16 && SrcVT != MVT::i8) return false; if (DestVT != MVT::i16 && DestVT != MVT::i8 && DestVT != MVT::i1) return false; unsigned SrcReg = getRegForValue(Op); if (!SrcReg) return false; // Because the high bits are undefined, a truncate doesn't generate // any code. 
UpdateValueMap(I, SrcReg); return true; } if (DestVT != MVT::i32 && DestVT != MVT::i16 && DestVT != MVT::i8) return false; unsigned Opc; bool isZext = isa(I); bool isBoolZext = false; if (!SrcVT.isSimple()) return false; switch (SrcVT.getSimpleVT().SimpleTy) { default: return false; case MVT::i16: if (!Subtarget->hasV6Ops()) return false; if (isZext) Opc = isThumb ? ARM::t2UXTH : ARM::UXTH; else Opc = isThumb ? ARM::t2SXTH : ARM::SXTH; break; case MVT::i8: if (!Subtarget->hasV6Ops()) return false; if (isZext) Opc = isThumb ? ARM::t2UXTB : ARM::UXTB; else Opc = isThumb ? ARM::t2SXTB : ARM::SXTB; break; case MVT::i1: if (isZext) { Opc = isThumb ? ARM::t2ANDri : ARM::ANDri; isBoolZext = true; break; } return false; } // FIXME: We could save an instruction in many cases by special-casing // load instructions. unsigned SrcReg = getRegForValue(Op); if (!SrcReg) return false; unsigned DestReg = createResultReg(TLI.getRegClassFor(MVT::i32)); MachineInstrBuilder MIB; MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DL, TII.get(Opc), DestReg) .addReg(SrcReg); if (isBoolZext) MIB.addImm(1); else MIB.addImm(0); AddOptionalDefs(MIB); UpdateValueMap(I, DestReg); return true; } // TODO: SoftFP support. bool ARMFastISel::TargetSelectInstruction(const Instruction *I) { switch (I->getOpcode()) { case Instruction::Load: return SelectLoad(I); case Instruction::Store: return SelectStore(I); case Instruction::Br: return SelectBranch(I); case Instruction::ICmp: case Instruction::FCmp: return SelectCmp(I); case Instruction::FPExt: return SelectFPExt(I); case Instruction::FPTrunc: return SelectFPTrunc(I); case Instruction::SIToFP: return SelectSIToFP(I); case Instruction::FPToSI: return SelectFPToSI(I); case Instruction::FAdd: return SelectBinaryOp(I, ISD::FADD); case Instruction::FSub: return SelectBinaryOp(I, ISD::FSUB); case Instruction::FMul: return SelectBinaryOp(I, ISD::FMUL); case Instruction::SDiv: return SelectSDiv(I); case Instruction::SRem: return SelectSRem(I); case Instruction::Call: return SelectCall(I); case Instruction::Select: return SelectSelect(I); case Instruction::Ret: return SelectRet(I); case Instruction::Trunc: case Instruction::ZExt: case Instruction::SExt: return SelectIntCast(I); default: break; } return false; } namespace llvm { llvm::FastISel *ARM::createFastISel(FunctionLoweringInfo &funcInfo) { // Completely untested on non-darwin. const TargetMachine &TM = funcInfo.MF->getTarget(); // Darwin and thumb1 only for now. const ARMSubtarget *Subtarget = &TM.getSubtarget(); if (Subtarget->isTargetDarwin() && !Subtarget->isThumb1Only() && !DisableARMFastISel) return new ARMFastISel(funcInfo); return 0; } } Index: vendor/llvm/dist/lib/Target/ARM/ARMFrameLowering.cpp =================================================================== --- vendor/llvm/dist/lib/Target/ARM/ARMFrameLowering.cpp (revision 228363) +++ vendor/llvm/dist/lib/Target/ARM/ARMFrameLowering.cpp (revision 228364) @@ -1,1087 +1,1097 @@ //=======- ARMFrameLowering.cpp - ARM Frame Information --------*- C++ -*-====// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file contains the ARM implementation of TargetFrameLowering class. 
// //===----------------------------------------------------------------------===// #include "ARMFrameLowering.h" #include "ARMBaseInstrInfo.h" #include "ARMBaseRegisterInfo.h" #include "ARMMachineFunctionInfo.h" +#include "llvm/CallingConv.h" +#include "llvm/Function.h" #include "MCTargetDesc/ARMAddressingModes.h" #include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/RegisterScavenging.h" #include "llvm/Target/TargetOptions.h" using namespace llvm; /// hasFP - Return true if the specified function should have a dedicated frame /// pointer register. This is true if the function has variable sized allocas /// or if frame pointer elimination is disabled. bool ARMFrameLowering::hasFP(const MachineFunction &MF) const { const TargetRegisterInfo *RegInfo = MF.getTarget().getRegisterInfo(); // Mac OS X requires FP not to be clobbered for backtracing purpose. if (STI.isTargetDarwin()) return true; const MachineFrameInfo *MFI = MF.getFrameInfo(); // Always eliminate non-leaf frame pointers. return ((DisableFramePointerElim(MF) && MFI->hasCalls()) || RegInfo->needsStackRealignment(MF) || MFI->hasVarSizedObjects() || MFI->isFrameAddressTaken()); } /// hasReservedCallFrame - Under normal circumstances, when a frame pointer is /// not required, we reserve argument space for call sites in the function /// immediately on entry to the current function. This eliminates the need for /// add/sub sp brackets around call sites. Returns true if the call frame is /// included as part of the stack frame. bool ARMFrameLowering::hasReservedCallFrame(const MachineFunction &MF) const { const MachineFrameInfo *FFI = MF.getFrameInfo(); unsigned CFSize = FFI->getMaxCallFrameSize(); // It's not always a good idea to include the call frame as part of the // stack frame. ARM (especially Thumb) has small immediate offset to // address the stack frame. So a large call frame can cause poor codegen // and may even makes it impossible to scavenge a register. if (CFSize >= ((1 << 12) - 1) / 2) // Half of imm12 return false; return !MF.getFrameInfo()->hasVarSizedObjects(); } /// canSimplifyCallFramePseudos - If there is a reserved call frame, the /// call frame pseudos can be simplified. Unlike most targets, having a FP /// is not sufficient here since we still may reference some objects via SP /// even when FP is available in Thumb2 mode. bool ARMFrameLowering::canSimplifyCallFramePseudos(const MachineFunction &MF) const { return hasReservedCallFrame(MF) || MF.getFrameInfo()->hasVarSizedObjects(); } static bool isCalleeSavedRegister(unsigned Reg, const unsigned *CSRegs) { for (unsigned i = 0; CSRegs[i]; ++i) if (Reg == CSRegs[i]) return true; return false; } static bool isCSRestore(MachineInstr *MI, const ARMBaseInstrInfo &TII, const unsigned *CSRegs) { // Integer spill area is handled with "pop". if (MI->getOpcode() == ARM::LDMIA_RET || MI->getOpcode() == ARM::t2LDMIA_RET || MI->getOpcode() == ARM::LDMIA_UPD || MI->getOpcode() == ARM::t2LDMIA_UPD || MI->getOpcode() == ARM::VLDMDIA_UPD) { // The first two operands are predicates. The last two are // imp-def and imp-use of SP. Check everything in between. 
for (int i = 5, e = MI->getNumOperands(); i != e; ++i) if (!isCalleeSavedRegister(MI->getOperand(i).getReg(), CSRegs)) return false; return true; } if ((MI->getOpcode() == ARM::LDR_POST_IMM || MI->getOpcode() == ARM::LDR_POST_REG || MI->getOpcode() == ARM::t2LDR_POST) && isCalleeSavedRegister(MI->getOperand(0).getReg(), CSRegs) && MI->getOperand(1).getReg() == ARM::SP) return true; return false; } static void emitSPUpdate(bool isARM, MachineBasicBlock &MBB, MachineBasicBlock::iterator &MBBI, DebugLoc dl, const ARMBaseInstrInfo &TII, int NumBytes, unsigned MIFlags = MachineInstr::NoFlags) { if (isARM) emitARMRegPlusImmediate(MBB, MBBI, dl, ARM::SP, ARM::SP, NumBytes, ARMCC::AL, 0, TII, MIFlags); else emitT2RegPlusImmediate(MBB, MBBI, dl, ARM::SP, ARM::SP, NumBytes, ARMCC::AL, 0, TII, MIFlags); } void ARMFrameLowering::emitPrologue(MachineFunction &MF) const { MachineBasicBlock &MBB = MF.front(); MachineBasicBlock::iterator MBBI = MBB.begin(); MachineFrameInfo *MFI = MF.getFrameInfo(); ARMFunctionInfo *AFI = MF.getInfo(); const ARMBaseRegisterInfo *RegInfo = static_cast(MF.getTarget().getRegisterInfo()); const ARMBaseInstrInfo &TII = *static_cast(MF.getTarget().getInstrInfo()); assert(!AFI->isThumb1OnlyFunction() && "This emitPrologue does not support Thumb1!"); bool isARM = !AFI->isThumbFunction(); unsigned VARegSaveSize = AFI->getVarArgsRegSaveSize(); unsigned NumBytes = MFI->getStackSize(); const std::vector &CSI = MFI->getCalleeSavedInfo(); DebugLoc dl = MBBI != MBB.end() ? MBBI->getDebugLoc() : DebugLoc(); unsigned FramePtr = RegInfo->getFrameRegister(MF); // Determine the sizes of each callee-save spill areas and record which frame // belongs to which callee-save spill areas. unsigned GPRCS1Size = 0, GPRCS2Size = 0, DPRCSSize = 0; int FramePtrSpillFI = 0; + // All calls are tail calls in GHC calling conv, and functions have no prologue/epilogue. + if (MF.getFunction()->getCallingConv() == CallingConv::GHC) + return; + // Allocate the vararg register save area. This is not counted in NumBytes. if (VARegSaveSize) emitSPUpdate(isARM, MBB, MBBI, dl, TII, -VARegSaveSize, MachineInstr::FrameSetup); if (!AFI->hasStackFrame()) { if (NumBytes != 0) emitSPUpdate(isARM, MBB, MBBI, dl, TII, -NumBytes, MachineInstr::FrameSetup); return; } for (unsigned i = 0, e = CSI.size(); i != e; ++i) { unsigned Reg = CSI[i].getReg(); int FI = CSI[i].getFrameIdx(); switch (Reg) { case ARM::R4: case ARM::R5: case ARM::R6: case ARM::R7: case ARM::LR: if (Reg == FramePtr) FramePtrSpillFI = FI; AFI->addGPRCalleeSavedArea1Frame(FI); GPRCS1Size += 4; break; case ARM::R8: case ARM::R9: case ARM::R10: case ARM::R11: if (Reg == FramePtr) FramePtrSpillFI = FI; if (STI.isTargetDarwin()) { AFI->addGPRCalleeSavedArea2Frame(FI); GPRCS2Size += 4; } else { AFI->addGPRCalleeSavedArea1Frame(FI); GPRCS1Size += 4; } break; default: AFI->addDPRCalleeSavedAreaFrame(FI); DPRCSSize += 8; } } // Move past area 1. if (GPRCS1Size > 0) MBBI++; // Set FP to point to the stack slot that contains the previous FP. // For Darwin, FP is R7, which has now been stored in spill area 1. // Otherwise, if this is not Darwin, all the callee-saved registers go // into spill area 1, including the FP in R11. In either case, it is // now safe to emit this assignment. bool HasFP = hasFP(MF); if (HasFP) { unsigned ADDriOpc = !AFI->isThumbFunction() ? 
ARM::ADDri : ARM::t2ADDri; MachineInstrBuilder MIB = BuildMI(MBB, MBBI, dl, TII.get(ADDriOpc), FramePtr) .addFrameIndex(FramePtrSpillFI).addImm(0) .setMIFlag(MachineInstr::FrameSetup); AddDefaultCC(AddDefaultPred(MIB)); } // Move past area 2. if (GPRCS2Size > 0) MBBI++; // Determine starting offsets of spill areas. unsigned DPRCSOffset = NumBytes - (GPRCS1Size + GPRCS2Size + DPRCSSize); unsigned GPRCS2Offset = DPRCSOffset + DPRCSSize; unsigned GPRCS1Offset = GPRCS2Offset + GPRCS2Size; if (HasFP) AFI->setFramePtrSpillOffset(MFI->getObjectOffset(FramePtrSpillFI) + NumBytes); AFI->setGPRCalleeSavedArea1Offset(GPRCS1Offset); AFI->setGPRCalleeSavedArea2Offset(GPRCS2Offset); AFI->setDPRCalleeSavedAreaOffset(DPRCSOffset); // Move past area 3. if (DPRCSSize > 0) { MBBI++; // Since vpush register list cannot have gaps, there may be multiple vpush // instructions in the prologue. while (MBBI->getOpcode() == ARM::VSTMDDB_UPD) MBBI++; } NumBytes = DPRCSOffset; if (NumBytes) { // Adjust SP after all the callee-save spills. emitSPUpdate(isARM, MBB, MBBI, dl, TII, -NumBytes, MachineInstr::FrameSetup); if (HasFP && isARM) // Restore from fp only in ARM mode: e.g. sub sp, r7, #24 // Note it's not safe to do this in Thumb2 mode because it would have // taken two instructions: // mov sp, r7 // sub sp, #24 // If an interrupt is taken between the two instructions, then sp is in // an inconsistent state (pointing to the middle of callee-saved area). // The interrupt handler can end up clobbering the registers. AFI->setShouldRestoreSPFromFP(true); } if (STI.isTargetELF() && hasFP(MF)) MFI->setOffsetAdjustment(MFI->getOffsetAdjustment() - AFI->getFramePtrSpillOffset()); AFI->setGPRCalleeSavedArea1Size(GPRCS1Size); AFI->setGPRCalleeSavedArea2Size(GPRCS2Size); AFI->setDPRCalleeSavedAreaSize(DPRCSSize); // If we need dynamic stack realignment, do it here. Be paranoid and make // sure if we also have VLAs, we have a base pointer for frame access. if (RegInfo->needsStackRealignment(MF)) { unsigned MaxAlign = MFI->getMaxAlignment(); assert (!AFI->isThumb1OnlyFunction()); if (!AFI->isThumbFunction()) { // Emit bic sp, sp, MaxAlign AddDefaultCC(AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::BICri), ARM::SP) .addReg(ARM::SP, RegState::Kill) .addImm(MaxAlign-1))); } else { // We cannot use sp as source/dest register here, thus we're emitting the // following sequence: // mov r4, sp // bic r4, r4, MaxAlign // mov sp, r4 // FIXME: It will be better just to find spare register here. AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::R4) .addReg(ARM::SP, RegState::Kill)); AddDefaultCC(AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::t2BICri), ARM::R4) .addReg(ARM::R4, RegState::Kill) .addImm(MaxAlign-1))); AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::SP) .addReg(ARM::R4, RegState::Kill)); } AFI->setShouldRestoreSPFromFP(true); } // If we need a base pointer, set it up here. It's whatever the value // of the stack pointer is at this point. Any variable size objects // will be allocated after this, so we can still use the base pointer // to reference locals. // FIXME: Clarify FrameSetup flags here. if (RegInfo->hasBasePointer(MF)) { if (isARM) BuildMI(MBB, MBBI, dl, TII.get(ARM::MOVr), RegInfo->getBaseRegister()) .addReg(ARM::SP) .addImm((unsigned)ARMCC::AL).addReg(0).addReg(0); else AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), RegInfo->getBaseRegister()) .addReg(ARM::SP)); } // If the frame has variable sized objects then the epilogue must restore // the sp from fp. 
We can assume there's an FP here since hasFP already // checks for hasVarSizedObjects. if (MFI->hasVarSizedObjects()) AFI->setShouldRestoreSPFromFP(true); } void ARMFrameLowering::emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const { MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr(); assert(MBBI->getDesc().isReturn() && "Can only insert epilog into returning blocks"); unsigned RetOpcode = MBBI->getOpcode(); DebugLoc dl = MBBI->getDebugLoc(); MachineFrameInfo *MFI = MF.getFrameInfo(); ARMFunctionInfo *AFI = MF.getInfo(); const TargetRegisterInfo *RegInfo = MF.getTarget().getRegisterInfo(); const ARMBaseInstrInfo &TII = *static_cast(MF.getTarget().getInstrInfo()); assert(!AFI->isThumb1OnlyFunction() && "This emitEpilogue does not support Thumb1!"); bool isARM = !AFI->isThumbFunction(); unsigned VARegSaveSize = AFI->getVarArgsRegSaveSize(); int NumBytes = (int)MFI->getStackSize(); unsigned FramePtr = RegInfo->getFrameRegister(MF); + + // All calls are tail calls in GHC calling conv, and functions have no prologue/epilogue. + if (MF.getFunction()->getCallingConv() == CallingConv::GHC) + return; if (!AFI->hasStackFrame()) { if (NumBytes != 0) emitSPUpdate(isARM, MBB, MBBI, dl, TII, NumBytes); } else { // Unwind MBBI to point to first LDR / VLDRD. const unsigned *CSRegs = RegInfo->getCalleeSavedRegs(); if (MBBI != MBB.begin()) { do --MBBI; while (MBBI != MBB.begin() && isCSRestore(MBBI, TII, CSRegs)); if (!isCSRestore(MBBI, TII, CSRegs)) ++MBBI; } // Move SP to start of FP callee save spill area. NumBytes -= (AFI->getGPRCalleeSavedArea1Size() + AFI->getGPRCalleeSavedArea2Size() + AFI->getDPRCalleeSavedAreaSize()); // Reset SP based on frame pointer only if the stack frame extends beyond // frame pointer stack slot or target is ELF and the function has FP. if (AFI->shouldRestoreSPFromFP()) { NumBytes = AFI->getFramePtrSpillOffset() - NumBytes; if (NumBytes) { if (isARM) emitARMRegPlusImmediate(MBB, MBBI, dl, ARM::SP, FramePtr, -NumBytes, ARMCC::AL, 0, TII); else { // It's not possible to restore SP from FP in a single instruction. // For Darwin, this looks like: // mov sp, r7 // sub sp, #24 // This is bad, if an interrupt is taken after the mov, sp is in an // inconsistent state. // Use the first callee-saved register as a scratch register. assert(MF.getRegInfo().isPhysRegUsed(ARM::R4) && "No scratch register to restore SP from FP!"); emitT2RegPlusImmediate(MBB, MBBI, dl, ARM::R4, FramePtr, -NumBytes, ARMCC::AL, 0, TII); AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::SP) .addReg(ARM::R4)); } } else { // Thumb2 or ARM. if (isARM) BuildMI(MBB, MBBI, dl, TII.get(ARM::MOVr), ARM::SP) .addReg(FramePtr).addImm((unsigned)ARMCC::AL).addReg(0).addReg(0); else AddDefaultPred(BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::SP) .addReg(FramePtr)); } } else if (NumBytes) emitSPUpdate(isARM, MBB, MBBI, dl, TII, NumBytes); // Increment past our save areas. if (AFI->getDPRCalleeSavedAreaSize()) { MBBI++; // Since vpop register list cannot have gaps, there may be multiple vpop // instructions in the epilogue. while (MBBI->getOpcode() == ARM::VLDMDIA_UPD) MBBI++; } if (AFI->getGPRCalleeSavedArea2Size()) MBBI++; if (AFI->getGPRCalleeSavedArea1Size()) MBBI++; } if (RetOpcode == ARM::TCRETURNdi || RetOpcode == ARM::TCRETURNdiND || RetOpcode == ARM::TCRETURNri || RetOpcode == ARM::TCRETURNriND) { // Tail call return: adjust the stack pointer and jump to callee. 
MBBI = MBB.getLastNonDebugInstr(); MachineOperand &JumpTarget = MBBI->getOperand(0); // Jump to label or value in register. if (RetOpcode == ARM::TCRETURNdi || RetOpcode == ARM::TCRETURNdiND) { unsigned TCOpcode = (RetOpcode == ARM::TCRETURNdi) ? (STI.isThumb() ? ARM::tTAILJMPd : ARM::TAILJMPd) : (STI.isThumb() ? ARM::tTAILJMPdND : ARM::TAILJMPdND); MachineInstrBuilder MIB = BuildMI(MBB, MBBI, dl, TII.get(TCOpcode)); if (JumpTarget.isGlobal()) MIB.addGlobalAddress(JumpTarget.getGlobal(), JumpTarget.getOffset(), JumpTarget.getTargetFlags()); else { assert(JumpTarget.isSymbol()); MIB.addExternalSymbol(JumpTarget.getSymbolName(), JumpTarget.getTargetFlags()); } // Add the default predicate in Thumb mode. if (STI.isThumb()) MIB.addImm(ARMCC::AL).addReg(0); } else if (RetOpcode == ARM::TCRETURNri) { BuildMI(MBB, MBBI, dl, TII.get(STI.isThumb() ? ARM::tTAILJMPr : ARM::TAILJMPr)). addReg(JumpTarget.getReg(), RegState::Kill); } else if (RetOpcode == ARM::TCRETURNriND) { BuildMI(MBB, MBBI, dl, TII.get(STI.isThumb() ? ARM::tTAILJMPrND : ARM::TAILJMPrND)). addReg(JumpTarget.getReg(), RegState::Kill); } MachineInstr *NewMI = prior(MBBI); for (unsigned i = 1, e = MBBI->getNumOperands(); i != e; ++i) NewMI->addOperand(MBBI->getOperand(i)); // Delete the pseudo instruction TCRETURN. MBB.erase(MBBI); MBBI = NewMI; } if (VARegSaveSize) emitSPUpdate(isARM, MBB, MBBI, dl, TII, VARegSaveSize); } /// getFrameIndexReference - Provide a base+offset reference to an FI slot for /// debug info. It's the same as what we use for resolving the code-gen /// references for now. FIXME: This can go wrong when references are /// SP-relative and simple call frames aren't used. int ARMFrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI, unsigned &FrameReg) const { return ResolveFrameIndexReference(MF, FI, FrameReg, 0); } int ARMFrameLowering::ResolveFrameIndexReference(const MachineFunction &MF, int FI, unsigned &FrameReg, int SPAdj) const { const MachineFrameInfo *MFI = MF.getFrameInfo(); const ARMBaseRegisterInfo *RegInfo = static_cast(MF.getTarget().getRegisterInfo()); const ARMFunctionInfo *AFI = MF.getInfo(); int Offset = MFI->getObjectOffset(FI) + MFI->getStackSize(); int FPOffset = Offset - AFI->getFramePtrSpillOffset(); bool isFixed = MFI->isFixedObjectIndex(FI); FrameReg = ARM::SP; Offset += SPAdj; if (AFI->isGPRCalleeSavedArea1Frame(FI)) return Offset - AFI->getGPRCalleeSavedArea1Offset(); else if (AFI->isGPRCalleeSavedArea2Frame(FI)) return Offset - AFI->getGPRCalleeSavedArea2Offset(); else if (AFI->isDPRCalleeSavedAreaFrame(FI)) return Offset - AFI->getDPRCalleeSavedAreaOffset(); // When dynamically realigning the stack, use the frame pointer for // parameters, and the stack/base pointer for locals. if (RegInfo->needsStackRealignment(MF)) { assert (hasFP(MF) && "dynamic stack realignment without a FP!"); if (isFixed) { FrameReg = RegInfo->getFrameRegister(MF); Offset = FPOffset; } else if (MFI->hasVarSizedObjects()) { assert(RegInfo->hasBasePointer(MF) && "VLAs and dynamic stack alignment, but missing base pointer!"); FrameReg = RegInfo->getBaseRegister(); } return Offset; } // If there is a frame pointer, use it when we can. if (hasFP(MF) && AFI->hasStackFrame()) { // Use frame pointer to reference fixed objects. Use it for locals if // there are VLAs (and thus the SP isn't reliable as a base). 
if (isFixed || (MFI->hasVarSizedObjects() && !RegInfo->hasBasePointer(MF))) { FrameReg = RegInfo->getFrameRegister(MF); return FPOffset; } else if (MFI->hasVarSizedObjects()) { assert(RegInfo->hasBasePointer(MF) && "missing base pointer!"); if (AFI->isThumb2Function()) { // Try to use the frame pointer if we can, else use the base pointer // since it's available. This is handy for the emergency spill slot, in // particular. if (FPOffset >= -255 && FPOffset < 0) { FrameReg = RegInfo->getFrameRegister(MF); return FPOffset; } } } else if (AFI->isThumb2Function()) { // Use add , sp, # // ldr , [sp, #] // if at all possible to save space. if (Offset >= 0 && (Offset & 3) == 0 && Offset <= 1020) return Offset; // In Thumb2 mode, the negative offset is very limited. Try to avoid // out of range references. ldr ,[, #-] if (FPOffset >= -255 && FPOffset < 0) { FrameReg = RegInfo->getFrameRegister(MF); return FPOffset; } } else if (Offset > (FPOffset < 0 ? -FPOffset : FPOffset)) { // Otherwise, use SP or FP, whichever is closer to the stack slot. FrameReg = RegInfo->getFrameRegister(MF); return FPOffset; } } // Use the base pointer if we have one. if (RegInfo->hasBasePointer(MF)) FrameReg = RegInfo->getBaseRegister(); return Offset; } int ARMFrameLowering::getFrameIndexOffset(const MachineFunction &MF, int FI) const { unsigned FrameReg; return getFrameIndexReference(MF, FI, FrameReg); } void ARMFrameLowering::emitPushInst(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, const std::vector &CSI, unsigned StmOpc, unsigned StrOpc, bool NoGap, bool(*Func)(unsigned, bool), unsigned MIFlags) const { MachineFunction &MF = *MBB.getParent(); const TargetInstrInfo &TII = *MF.getTarget().getInstrInfo(); DebugLoc DL; if (MI != MBB.end()) DL = MI->getDebugLoc(); SmallVector, 4> Regs; unsigned i = CSI.size(); while (i != 0) { unsigned LastReg = 0; for (; i != 0; --i) { unsigned Reg = CSI[i-1].getReg(); if (!(Func)(Reg, STI.isTargetDarwin())) continue; // Add the callee-saved register as live-in unless it's LR and // @llvm.returnaddress is called. If LR is returned for // @llvm.returnaddress then it's already added to the function and // entry block live-in sets. bool isKill = true; if (Reg == ARM::LR) { if (MF.getFrameInfo()->isReturnAddressTaken() && MF.getRegInfo().isLiveIn(Reg)) isKill = false; } if (isKill) MBB.addLiveIn(Reg); // If NoGap is true, push consecutive registers and then leave the rest // for other instructions. e.g. 
// vpush {d8, d10, d11} -> vpush {d8}, vpush {d10, d11} if (NoGap && LastReg && LastReg != Reg-1) break; LastReg = Reg; Regs.push_back(std::make_pair(Reg, isKill)); } if (Regs.empty()) continue; if (Regs.size() > 1 || StrOpc== 0) { MachineInstrBuilder MIB = AddDefaultPred(BuildMI(MBB, MI, DL, TII.get(StmOpc), ARM::SP) .addReg(ARM::SP).setMIFlags(MIFlags)); for (unsigned i = 0, e = Regs.size(); i < e; ++i) MIB.addReg(Regs[i].first, getKillRegState(Regs[i].second)); } else if (Regs.size() == 1) { MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc), ARM::SP) .addReg(Regs[0].first, getKillRegState(Regs[0].second)) .addReg(ARM::SP).setMIFlags(MIFlags) .addImm(-4); AddDefaultPred(MIB); } Regs.clear(); } } void ARMFrameLowering::emitPopInst(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, const std::vector &CSI, unsigned LdmOpc, unsigned LdrOpc, bool isVarArg, bool NoGap, bool(*Func)(unsigned, bool)) const { MachineFunction &MF = *MBB.getParent(); const TargetInstrInfo &TII = *MF.getTarget().getInstrInfo(); ARMFunctionInfo *AFI = MF.getInfo(); DebugLoc DL = MI->getDebugLoc(); unsigned RetOpcode = MI->getOpcode(); bool isTailCall = (RetOpcode == ARM::TCRETURNdi || RetOpcode == ARM::TCRETURNdiND || RetOpcode == ARM::TCRETURNri || RetOpcode == ARM::TCRETURNriND); SmallVector Regs; unsigned i = CSI.size(); while (i != 0) { unsigned LastReg = 0; bool DeleteRet = false; for (; i != 0; --i) { unsigned Reg = CSI[i-1].getReg(); if (!(Func)(Reg, STI.isTargetDarwin())) continue; if (Reg == ARM::LR && !isTailCall && !isVarArg && STI.hasV5TOps()) { Reg = ARM::PC; LdmOpc = AFI->isThumbFunction() ? ARM::t2LDMIA_RET : ARM::LDMIA_RET; // Fold the return instruction into the LDM. DeleteRet = true; } // If NoGap is true, pop consecutive registers and then leave the rest // for other instructions. e.g. // vpop {d8, d10, d11} -> vpop {d8}, vpop {d10, d11} if (NoGap && LastReg && LastReg != Reg-1) break; LastReg = Reg; Regs.push_back(Reg); } if (Regs.empty()) continue; if (Regs.size() > 1 || LdrOpc == 0) { MachineInstrBuilder MIB = AddDefaultPred(BuildMI(MBB, MI, DL, TII.get(LdmOpc), ARM::SP) .addReg(ARM::SP)); for (unsigned i = 0, e = Regs.size(); i < e; ++i) MIB.addReg(Regs[i], getDefRegState(true)); if (DeleteRet) { MIB->copyImplicitOps(&*MI); MI->eraseFromParent(); } MI = MIB; } else if (Regs.size() == 1) { // If we adjusted the reg to PC from LR above, switch it back here. We // only do that for LDM. if (Regs[0] == ARM::PC) Regs[0] = ARM::LR; MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(LdrOpc), Regs[0]) .addReg(ARM::SP, RegState::Define) .addReg(ARM::SP); // ARM mode needs an extra reg0 here due to addrmode2. Will go away once // that refactoring is complete (eventually). if (LdrOpc == ARM::LDR_POST_REG || LdrOpc == ARM::LDR_POST_IMM) { MIB.addReg(0); MIB.addImm(ARM_AM::getAM2Opc(ARM_AM::add, 4, ARM_AM::no_shift)); } else MIB.addImm(4); AddDefaultPred(MIB); } Regs.clear(); } } bool ARMFrameLowering::spillCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, const std::vector &CSI, const TargetRegisterInfo *TRI) const { if (CSI.empty()) return false; MachineFunction &MF = *MBB.getParent(); ARMFunctionInfo *AFI = MF.getInfo(); unsigned PushOpc = AFI->isThumbFunction() ? ARM::t2STMDB_UPD : ARM::STMDB_UPD; unsigned PushOneOpc = AFI->isThumbFunction() ? 
ARM::t2STR_PRE : ARM::STR_PRE_IMM; unsigned FltOpc = ARM::VSTMDDB_UPD; emitPushInst(MBB, MI, CSI, PushOpc, PushOneOpc, false, &isARMArea1Register, MachineInstr::FrameSetup); emitPushInst(MBB, MI, CSI, PushOpc, PushOneOpc, false, &isARMArea2Register, MachineInstr::FrameSetup); emitPushInst(MBB, MI, CSI, FltOpc, 0, true, &isARMArea3Register, MachineInstr::FrameSetup); return true; } bool ARMFrameLowering::restoreCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, const std::vector &CSI, const TargetRegisterInfo *TRI) const { if (CSI.empty()) return false; MachineFunction &MF = *MBB.getParent(); ARMFunctionInfo *AFI = MF.getInfo(); bool isVarArg = AFI->getVarArgsRegSaveSize() > 0; unsigned PopOpc = AFI->isThumbFunction() ? ARM::t2LDMIA_UPD : ARM::LDMIA_UPD; unsigned LdrOpc = AFI->isThumbFunction() ? ARM::t2LDR_POST :ARM::LDR_POST_IMM; unsigned FltOpc = ARM::VLDMDIA_UPD; emitPopInst(MBB, MI, CSI, FltOpc, 0, isVarArg, true, &isARMArea3Register); emitPopInst(MBB, MI, CSI, PopOpc, LdrOpc, isVarArg, false, &isARMArea2Register); emitPopInst(MBB, MI, CSI, PopOpc, LdrOpc, isVarArg, false, &isARMArea1Register); return true; } // FIXME: Make generic? static unsigned GetFunctionSizeInBytes(const MachineFunction &MF, const ARMBaseInstrInfo &TII) { unsigned FnSize = 0; for (MachineFunction::const_iterator MBBI = MF.begin(), E = MF.end(); MBBI != E; ++MBBI) { const MachineBasicBlock &MBB = *MBBI; for (MachineBasicBlock::const_iterator I = MBB.begin(),E = MBB.end(); I != E; ++I) FnSize += TII.GetInstSizeInBytes(I); } return FnSize; } /// estimateStackSize - Estimate and return the size of the frame. /// FIXME: Make generic? static unsigned estimateStackSize(MachineFunction &MF) { const MachineFrameInfo *MFI = MF.getFrameInfo(); const TargetFrameLowering *TFI = MF.getTarget().getFrameLowering(); const TargetRegisterInfo *RegInfo = MF.getTarget().getRegisterInfo(); unsigned MaxAlign = MFI->getMaxAlignment(); int Offset = 0; // This code is very, very similar to PEI::calculateFrameObjectOffsets(). // It really should be refactored to share code. Until then, changes // should keep in mind that there's tight coupling between the two. for (int i = MFI->getObjectIndexBegin(); i != 0; ++i) { int FixedOff = -MFI->getObjectOffset(i); if (FixedOff > Offset) Offset = FixedOff; } for (unsigned i = 0, e = MFI->getObjectIndexEnd(); i != e; ++i) { if (MFI->isDeadObjectIndex(i)) continue; Offset += MFI->getObjectSize(i); unsigned Align = MFI->getObjectAlignment(i); // Adjust to alignment boundary Offset = (Offset+Align-1)/Align*Align; MaxAlign = std::max(Align, MaxAlign); } if (MFI->adjustsStack() && TFI->hasReservedCallFrame(MF)) Offset += MFI->getMaxCallFrameSize(); // Round up the size to a multiple of the alignment. If the function has // any calls or alloca's, align to the target's StackAlignment value to // ensure that the callee's frame or the alloca data is suitably aligned; // otherwise, for leaf functions, align to the TransientStackAlignment // value. unsigned StackAlign; if (MFI->adjustsStack() || MFI->hasVarSizedObjects() || (RegInfo->needsStackRealignment(MF) && MFI->getObjectIndexEnd() != 0)) StackAlign = TFI->getStackAlignment(); else StackAlign = TFI->getTransientStackAlignment(); // If the frame pointer is eliminated, all frame offsets will be relative to // SP not FP. Align to MaxAlign so this works. 
StackAlign = std::max(StackAlign, MaxAlign); unsigned AlignMask = StackAlign - 1; Offset = (Offset + AlignMask) & ~uint64_t(AlignMask); return (unsigned)Offset; } /// estimateRSStackSizeLimit - Look at each instruction that references stack /// frames and return the stack size limit beyond which some of these /// instructions will require a scratch register during their expansion later. // FIXME: Move to TII? static unsigned estimateRSStackSizeLimit(MachineFunction &MF, const TargetFrameLowering *TFI) { const ARMFunctionInfo *AFI = MF.getInfo(); unsigned Limit = (1 << 12) - 1; for (MachineFunction::iterator BB = MF.begin(),E = MF.end(); BB != E; ++BB) { for (MachineBasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) { for (unsigned i = 0, e = I->getNumOperands(); i != e; ++i) { if (!I->getOperand(i).isFI()) continue; // When using ADDri to get the address of a stack object, 255 is the // largest offset guaranteed to fit in the immediate offset. if (I->getOpcode() == ARM::ADDri) { Limit = std::min(Limit, (1U << 8) - 1); break; } // Otherwise check the addressing mode. switch (I->getDesc().TSFlags & ARMII::AddrModeMask) { case ARMII::AddrMode3: case ARMII::AddrModeT2_i8: Limit = std::min(Limit, (1U << 8) - 1); break; case ARMII::AddrMode5: case ARMII::AddrModeT2_i8s4: Limit = std::min(Limit, ((1U << 8) - 1) * 4); break; case ARMII::AddrModeT2_i12: // i12 supports only positive offset so these will be converted to // i8 opcodes. See llvm::rewriteT2FrameIndex. if (TFI->hasFP(MF) && AFI->hasStackFrame()) Limit = std::min(Limit, (1U << 8) - 1); break; case ARMII::AddrMode4: case ARMII::AddrMode6: // Addressing modes 4 & 6 (load/store) instructions can't encode an // immediate offset for stack references. return 0; default: break; } break; // At most one FI per instruction } } } return Limit; } void ARMFrameLowering::processFunctionBeforeCalleeSavedScan(MachineFunction &MF, RegScavenger *RS) const { // This tells PEI to spill the FP as if it is any other callee-save register // to take advantage the eliminateFrameIndex machinery. This also ensures it // is spilled in the order specified by getCalleeSavedRegs() to make it easier // to combine multiple loads / stores. bool CanEliminateFrame = true; bool CS1Spilled = false; bool LRSpilled = false; unsigned NumGPRSpills = 0; SmallVector UnspilledCS1GPRs; SmallVector UnspilledCS2GPRs; const ARMBaseRegisterInfo *RegInfo = static_cast(MF.getTarget().getRegisterInfo()); const ARMBaseInstrInfo &TII = *static_cast(MF.getTarget().getInstrInfo()); ARMFunctionInfo *AFI = MF.getInfo(); MachineFrameInfo *MFI = MF.getFrameInfo(); unsigned FramePtr = RegInfo->getFrameRegister(MF); // Spill R4 if Thumb2 function requires stack realignment - it will be used as // scratch register. Also spill R4 if Thumb2 function has varsized objects, // since it's not always possible to restore sp from fp in a single // instruction. // FIXME: It will be better just to find spare register here. if (AFI->isThumb2Function() && (MFI->hasVarSizedObjects() || RegInfo->needsStackRealignment(MF))) MF.getRegInfo().setPhysRegUsed(ARM::R4); if (AFI->isThumb1OnlyFunction()) { // Spill LR if Thumb1 function uses variable length argument lists. if (AFI->getVarArgsRegSaveSize() > 0) MF.getRegInfo().setPhysRegUsed(ARM::LR); // Spill R4 if Thumb1 epilogue has to restore SP from FP. We don't know // for sure what the stack size will be, but for this, an estimate is good // enough. 
If there anything changes it, it'll be a spill, which implies // we've used all the registers and so R4 is already used, so not marking // it here will be OK. // FIXME: It will be better just to find spare register here. unsigned StackSize = estimateStackSize(MF); if (MFI->hasVarSizedObjects() || StackSize > 508) MF.getRegInfo().setPhysRegUsed(ARM::R4); } // Spill the BasePtr if it's used. if (RegInfo->hasBasePointer(MF)) MF.getRegInfo().setPhysRegUsed(RegInfo->getBaseRegister()); // Don't spill FP if the frame can be eliminated. This is determined // by scanning the callee-save registers to see if any is used. const unsigned *CSRegs = RegInfo->getCalleeSavedRegs(); for (unsigned i = 0; CSRegs[i]; ++i) { unsigned Reg = CSRegs[i]; bool Spilled = false; if (MF.getRegInfo().isPhysRegUsed(Reg)) { Spilled = true; CanEliminateFrame = false; } else { // Check alias registers too. for (const unsigned *Aliases = RegInfo->getAliasSet(Reg); *Aliases; ++Aliases) { if (MF.getRegInfo().isPhysRegUsed(*Aliases)) { Spilled = true; CanEliminateFrame = false; } } } if (!ARM::GPRRegisterClass->contains(Reg)) continue; if (Spilled) { NumGPRSpills++; if (!STI.isTargetDarwin()) { if (Reg == ARM::LR) LRSpilled = true; CS1Spilled = true; continue; } // Keep track if LR and any of R4, R5, R6, and R7 is spilled. switch (Reg) { case ARM::LR: LRSpilled = true; // Fallthrough case ARM::R4: case ARM::R5: case ARM::R6: case ARM::R7: CS1Spilled = true; break; default: break; } } else { if (!STI.isTargetDarwin()) { UnspilledCS1GPRs.push_back(Reg); continue; } switch (Reg) { case ARM::R4: case ARM::R5: case ARM::R6: case ARM::R7: case ARM::LR: UnspilledCS1GPRs.push_back(Reg); break; default: UnspilledCS2GPRs.push_back(Reg); break; } } } bool ForceLRSpill = false; if (!LRSpilled && AFI->isThumb1OnlyFunction()) { unsigned FnSize = GetFunctionSizeInBytes(MF, TII); // Force LR to be spilled if the Thumb function size is > 2048. This enables // use of BL to implement far jump. If it turns out that it's not needed // then the branch fix up path will undo it. if (FnSize >= (1 << 11)) { CanEliminateFrame = false; ForceLRSpill = true; } } // If any of the stack slot references may be out of range of an immediate // offset, make sure a register (or a spill slot) is available for the // register scavenger. Note that if we're indexing off the frame pointer, the // effective stack size is 4 bytes larger since the FP points to the stack // slot of the previous FP. Also, if we have variable sized objects in the // function, stack slot references will often be negative, and some of // our instructions are positive-offset only, so conservatively consider // that case to want a spill slot (or register) as well. Similarly, if // the function adjusts the stack pointer during execution and the // adjustments aren't already part of our stack size estimate, our offset // calculations may be off, so be conservative. // FIXME: We could add logic to be more precise about negative offsets // and which instructions will need a scratch register for them. Is it // worth the effort and added fragility? bool BigStack = (RS && (estimateStackSize(MF) + ((hasFP(MF) && AFI->hasStackFrame()) ? 4:0) >= estimateRSStackSizeLimit(MF, this))) || MFI->hasVarSizedObjects() || (MFI->adjustsStack() && !canSimplifyCallFramePseudos(MF)); bool ExtraCSSpill = false; if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF)) { AFI->setHasStackFrame(true); // If LR is not spilled, but at least one of R4, R5, R6, and R7 is spilled. 
// Spill LR as well so we can fold BX_RET to the registers restore (LDM). if (!LRSpilled && CS1Spilled) { MF.getRegInfo().setPhysRegUsed(ARM::LR); NumGPRSpills++; UnspilledCS1GPRs.erase(std::find(UnspilledCS1GPRs.begin(), UnspilledCS1GPRs.end(), (unsigned)ARM::LR)); ForceLRSpill = false; ExtraCSSpill = true; } if (hasFP(MF)) { MF.getRegInfo().setPhysRegUsed(FramePtr); NumGPRSpills++; } // If stack and double are 8-byte aligned and we are spilling an odd number // of GPRs, spill one extra callee save GPR so we won't have to pad between // the integer and double callee save areas. unsigned TargetAlign = getStackAlignment(); if (TargetAlign == 8 && (NumGPRSpills & 1)) { if (CS1Spilled && !UnspilledCS1GPRs.empty()) { for (unsigned i = 0, e = UnspilledCS1GPRs.size(); i != e; ++i) { unsigned Reg = UnspilledCS1GPRs[i]; // Don't spill high register if the function is thumb1 if (!AFI->isThumb1OnlyFunction() || isARMLowRegister(Reg) || Reg == ARM::LR) { MF.getRegInfo().setPhysRegUsed(Reg); if (!RegInfo->isReservedReg(MF, Reg)) ExtraCSSpill = true; break; } } } else if (!UnspilledCS2GPRs.empty() && !AFI->isThumb1OnlyFunction()) { unsigned Reg = UnspilledCS2GPRs.front(); MF.getRegInfo().setPhysRegUsed(Reg); if (!RegInfo->isReservedReg(MF, Reg)) ExtraCSSpill = true; } } // Estimate if we might need to scavenge a register at some point in order // to materialize a stack offset. If so, either spill one additional // callee-saved register or reserve a special spill slot to facilitate // register scavenging. Thumb1 needs a spill slot for stack pointer // adjustments also, even when the frame itself is small. if (BigStack && !ExtraCSSpill) { // If any non-reserved CS register isn't spilled, just spill one or two // extra. That should take care of it! unsigned NumExtras = TargetAlign / 4; SmallVector Extras; while (NumExtras && !UnspilledCS1GPRs.empty()) { unsigned Reg = UnspilledCS1GPRs.back(); UnspilledCS1GPRs.pop_back(); if (!RegInfo->isReservedReg(MF, Reg) && (!AFI->isThumb1OnlyFunction() || isARMLowRegister(Reg) || Reg == ARM::LR)) { Extras.push_back(Reg); NumExtras--; } } // For non-Thumb1 functions, also check for hi-reg CS registers if (!AFI->isThumb1OnlyFunction()) { while (NumExtras && !UnspilledCS2GPRs.empty()) { unsigned Reg = UnspilledCS2GPRs.back(); UnspilledCS2GPRs.pop_back(); if (!RegInfo->isReservedReg(MF, Reg)) { Extras.push_back(Reg); NumExtras--; } } } if (Extras.size() && NumExtras == 0) { for (unsigned i = 0, e = Extras.size(); i != e; ++i) { MF.getRegInfo().setPhysRegUsed(Extras[i]); } } else if (!AFI->isThumb1OnlyFunction()) { // note: Thumb1 functions spill to R12, not the stack. Reserve a slot // closest to SP or frame pointer. const TargetRegisterClass *RC = ARM::GPRRegisterClass; RS->setScavengingFrameIndex(MFI->CreateStackObject(RC->getSize(), RC->getAlignment(), false)); } } } if (ForceLRSpill) { MF.getRegInfo().setPhysRegUsed(ARM::LR); AFI->setLRIsSpilledForFarJump(true); } } Index: vendor/llvm/dist/lib/Target/ARM/ARMISelLowering.cpp =================================================================== --- vendor/llvm/dist/lib/Target/ARM/ARMISelLowering.cpp (revision 228363) +++ vendor/llvm/dist/lib/Target/ARM/ARMISelLowering.cpp (revision 228364) @@ -1,8835 +1,8837 @@ //===-- ARMISelLowering.cpp - ARM DAG Lowering Implementation -------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. 
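Editorial note on the ARMFrameLowering hunk that ends above, before the diff moves on to ARMISelLowering.cpp: the frame-size computation rounds the running offset up to the final stack alignment with the usual power-of-two align-up idiom, and estimateRSStackSizeLimit caps the usable immediate offset per addressing mode ((1U << 8) - 1 for AddrMode3/T2_i8, four times that for AddrMode5/T2_i8s4, zero for AddrMode4/6). The following is a minimal standalone sketch of that align-up idiom only; the function name alignTo and the numeric values are hypothetical and are not part of the patch.

    #include <cassert>
    #include <cstdint>

    // Round Offset up to the next multiple of Align (a power of two), mirroring
    // the (Offset + AlignMask) & ~AlignMask computation in the hunk above.
    static uint64_t alignTo(uint64_t Offset, uint64_t Align) {
      uint64_t AlignMask = Align - 1;
      return (Offset + AlignMask) & ~AlignMask;
    }

    int main() {
      assert(alignTo(52, 8) == 56);  // hypothetical 52-byte frame, 8-byte alignment
      assert(alignTo(56, 8) == 56);  // already-aligned sizes are unchanged
      return 0;
    }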
// //===----------------------------------------------------------------------===// // // This file defines the interfaces that ARM uses to lower LLVM code into a // selection DAG. // //===----------------------------------------------------------------------===// #define DEBUG_TYPE "arm-isel" #include "ARM.h" #include "ARMCallingConv.h" #include "ARMConstantPoolValue.h" #include "ARMISelLowering.h" #include "ARMMachineFunctionInfo.h" #include "ARMPerfectShuffle.h" #include "ARMRegisterInfo.h" #include "ARMSubtarget.h" #include "ARMTargetMachine.h" #include "ARMTargetObjectFile.h" #include "MCTargetDesc/ARMAddressingModes.h" #include "llvm/CallingConv.h" #include "llvm/Constants.h" #include "llvm/Function.h" #include "llvm/GlobalValue.h" #include "llvm/Instruction.h" #include "llvm/Instructions.h" #include "llvm/Intrinsics.h" #include "llvm/Type.h" #include "llvm/CodeGen/CallingConvLower.h" #include "llvm/CodeGen/IntrinsicLowering.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/CodeGen/MachineModuleInfo.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/PseudoSourceValue.h" #include "llvm/CodeGen/SelectionDAG.h" #include "llvm/MC/MCSectionMachO.h" #include "llvm/Target/TargetOptions.h" #include "llvm/ADT/VectorExtras.h" #include "llvm/ADT/StringExtras.h" #include "llvm/ADT/Statistic.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/raw_ostream.h" #include using namespace llvm; STATISTIC(NumTailCalls, "Number of tail calls"); STATISTIC(NumMovwMovt, "Number of GAs materialized with movw + movt"); // This option should go away when tail calls fully work. static cl::opt EnableARMTailCalls("arm-tail-calls", cl::Hidden, cl::desc("Generate tail calls (TEMPORARY OPTION)."), cl::init(false)); cl::opt EnableARMLongCalls("arm-long-calls", cl::Hidden, cl::desc("Generate calls via indirect call instructions"), cl::init(false)); static cl::opt ARMInterworking("arm-interworking", cl::Hidden, cl::desc("Enable / disable ARM interworking (for debugging only)"), cl::init(true)); namespace llvm { class ARMCCState : public CCState { public: ARMCCState(CallingConv::ID CC, bool isVarArg, MachineFunction &MF, const TargetMachine &TM, SmallVector &locs, LLVMContext &C, ParmContext PC) : CCState(CC, isVarArg, MF, TM, locs, C) { assert(((PC == Call) || (PC == Prologue)) && "ARMCCState users must specify whether their context is call" "or prologue generation."); CallOrPrologue = PC; } }; } // The APCS parameter registers. 
static const unsigned GPRArgRegs[] = { ARM::R0, ARM::R1, ARM::R2, ARM::R3 }; void ARMTargetLowering::addTypeForNEON(EVT VT, EVT PromotedLdStVT, EVT PromotedBitwiseVT) { if (VT != PromotedLdStVT) { setOperationAction(ISD::LOAD, VT.getSimpleVT(), Promote); AddPromotedToType (ISD::LOAD, VT.getSimpleVT(), PromotedLdStVT.getSimpleVT()); setOperationAction(ISD::STORE, VT.getSimpleVT(), Promote); AddPromotedToType (ISD::STORE, VT.getSimpleVT(), PromotedLdStVT.getSimpleVT()); } EVT ElemTy = VT.getVectorElementType(); if (ElemTy != MVT::i64 && ElemTy != MVT::f64) setOperationAction(ISD::SETCC, VT.getSimpleVT(), Custom); setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT.getSimpleVT(), Custom); if (ElemTy != MVT::i32) { setOperationAction(ISD::SINT_TO_FP, VT.getSimpleVT(), Expand); setOperationAction(ISD::UINT_TO_FP, VT.getSimpleVT(), Expand); setOperationAction(ISD::FP_TO_SINT, VT.getSimpleVT(), Expand); setOperationAction(ISD::FP_TO_UINT, VT.getSimpleVT(), Expand); } setOperationAction(ISD::BUILD_VECTOR, VT.getSimpleVT(), Custom); setOperationAction(ISD::VECTOR_SHUFFLE, VT.getSimpleVT(), Custom); setOperationAction(ISD::CONCAT_VECTORS, VT.getSimpleVT(), Legal); setOperationAction(ISD::EXTRACT_SUBVECTOR, VT.getSimpleVT(), Legal); setOperationAction(ISD::SELECT, VT.getSimpleVT(), Expand); setOperationAction(ISD::SELECT_CC, VT.getSimpleVT(), Expand); if (VT.isInteger()) { setOperationAction(ISD::SHL, VT.getSimpleVT(), Custom); setOperationAction(ISD::SRA, VT.getSimpleVT(), Custom); setOperationAction(ISD::SRL, VT.getSimpleVT(), Custom); setLoadExtAction(ISD::SEXTLOAD, VT.getSimpleVT(), Expand); setLoadExtAction(ISD::ZEXTLOAD, VT.getSimpleVT(), Expand); for (unsigned InnerVT = (unsigned)MVT::FIRST_VECTOR_VALUETYPE; InnerVT <= (unsigned)MVT::LAST_VECTOR_VALUETYPE; ++InnerVT) setTruncStoreAction(VT.getSimpleVT(), (MVT::SimpleValueType)InnerVT, Expand); } setLoadExtAction(ISD::EXTLOAD, VT.getSimpleVT(), Expand); // Promote all bit-wise operations. if (VT.isInteger() && VT != PromotedBitwiseVT) { setOperationAction(ISD::AND, VT.getSimpleVT(), Promote); AddPromotedToType (ISD::AND, VT.getSimpleVT(), PromotedBitwiseVT.getSimpleVT()); setOperationAction(ISD::OR, VT.getSimpleVT(), Promote); AddPromotedToType (ISD::OR, VT.getSimpleVT(), PromotedBitwiseVT.getSimpleVT()); setOperationAction(ISD::XOR, VT.getSimpleVT(), Promote); AddPromotedToType (ISD::XOR, VT.getSimpleVT(), PromotedBitwiseVT.getSimpleVT()); } // Neon does not support vector divide/remainder operations. 
setOperationAction(ISD::SDIV, VT.getSimpleVT(), Expand); setOperationAction(ISD::UDIV, VT.getSimpleVT(), Expand); setOperationAction(ISD::FDIV, VT.getSimpleVT(), Expand); setOperationAction(ISD::SREM, VT.getSimpleVT(), Expand); setOperationAction(ISD::UREM, VT.getSimpleVT(), Expand); setOperationAction(ISD::FREM, VT.getSimpleVT(), Expand); } void ARMTargetLowering::addDRTypeForNEON(EVT VT) { addRegisterClass(VT, ARM::DPRRegisterClass); addTypeForNEON(VT, MVT::f64, MVT::v2i32); } void ARMTargetLowering::addQRTypeForNEON(EVT VT) { addRegisterClass(VT, ARM::QPRRegisterClass); addTypeForNEON(VT, MVT::v2f64, MVT::v4i32); } static TargetLoweringObjectFile *createTLOF(TargetMachine &TM) { if (TM.getSubtarget().isTargetDarwin()) return new TargetLoweringObjectFileMachO(); return new ARMElfTargetObjectFile(); } ARMTargetLowering::ARMTargetLowering(TargetMachine &TM) : TargetLowering(TM, createTLOF(TM)) { Subtarget = &TM.getSubtarget(); RegInfo = TM.getRegisterInfo(); Itins = TM.getInstrItineraryData(); setBooleanVectorContents(ZeroOrNegativeOneBooleanContent); if (Subtarget->isTargetDarwin()) { // Uses VFP for Thumb libfuncs if available. if (Subtarget->isThumb() && Subtarget->hasVFP2()) { // Single-precision floating-point arithmetic. setLibcallName(RTLIB::ADD_F32, "__addsf3vfp"); setLibcallName(RTLIB::SUB_F32, "__subsf3vfp"); setLibcallName(RTLIB::MUL_F32, "__mulsf3vfp"); setLibcallName(RTLIB::DIV_F32, "__divsf3vfp"); // Double-precision floating-point arithmetic. setLibcallName(RTLIB::ADD_F64, "__adddf3vfp"); setLibcallName(RTLIB::SUB_F64, "__subdf3vfp"); setLibcallName(RTLIB::MUL_F64, "__muldf3vfp"); setLibcallName(RTLIB::DIV_F64, "__divdf3vfp"); // Single-precision comparisons. setLibcallName(RTLIB::OEQ_F32, "__eqsf2vfp"); setLibcallName(RTLIB::UNE_F32, "__nesf2vfp"); setLibcallName(RTLIB::OLT_F32, "__ltsf2vfp"); setLibcallName(RTLIB::OLE_F32, "__lesf2vfp"); setLibcallName(RTLIB::OGE_F32, "__gesf2vfp"); setLibcallName(RTLIB::OGT_F32, "__gtsf2vfp"); setLibcallName(RTLIB::UO_F32, "__unordsf2vfp"); setLibcallName(RTLIB::O_F32, "__unordsf2vfp"); setCmpLibcallCC(RTLIB::OEQ_F32, ISD::SETNE); setCmpLibcallCC(RTLIB::UNE_F32, ISD::SETNE); setCmpLibcallCC(RTLIB::OLT_F32, ISD::SETNE); setCmpLibcallCC(RTLIB::OLE_F32, ISD::SETNE); setCmpLibcallCC(RTLIB::OGE_F32, ISD::SETNE); setCmpLibcallCC(RTLIB::OGT_F32, ISD::SETNE); setCmpLibcallCC(RTLIB::UO_F32, ISD::SETNE); setCmpLibcallCC(RTLIB::O_F32, ISD::SETEQ); // Double-precision comparisons. setLibcallName(RTLIB::OEQ_F64, "__eqdf2vfp"); setLibcallName(RTLIB::UNE_F64, "__nedf2vfp"); setLibcallName(RTLIB::OLT_F64, "__ltdf2vfp"); setLibcallName(RTLIB::OLE_F64, "__ledf2vfp"); setLibcallName(RTLIB::OGE_F64, "__gedf2vfp"); setLibcallName(RTLIB::OGT_F64, "__gtdf2vfp"); setLibcallName(RTLIB::UO_F64, "__unorddf2vfp"); setLibcallName(RTLIB::O_F64, "__unorddf2vfp"); setCmpLibcallCC(RTLIB::OEQ_F64, ISD::SETNE); setCmpLibcallCC(RTLIB::UNE_F64, ISD::SETNE); setCmpLibcallCC(RTLIB::OLT_F64, ISD::SETNE); setCmpLibcallCC(RTLIB::OLE_F64, ISD::SETNE); setCmpLibcallCC(RTLIB::OGE_F64, ISD::SETNE); setCmpLibcallCC(RTLIB::OGT_F64, ISD::SETNE); setCmpLibcallCC(RTLIB::UO_F64, ISD::SETNE); setCmpLibcallCC(RTLIB::O_F64, ISD::SETEQ); // Floating-point to integer conversions. // i64 conversions are done via library routines even when generating VFP // instructions, so use the same ones. 
setLibcallName(RTLIB::FPTOSINT_F64_I32, "__fixdfsivfp"); setLibcallName(RTLIB::FPTOUINT_F64_I32, "__fixunsdfsivfp"); setLibcallName(RTLIB::FPTOSINT_F32_I32, "__fixsfsivfp"); setLibcallName(RTLIB::FPTOUINT_F32_I32, "__fixunssfsivfp"); // Conversions between floating types. setLibcallName(RTLIB::FPROUND_F64_F32, "__truncdfsf2vfp"); setLibcallName(RTLIB::FPEXT_F32_F64, "__extendsfdf2vfp"); // Integer to floating-point conversions. // i64 conversions are done via library routines even when generating VFP // instructions, so use the same ones. // FIXME: There appears to be some naming inconsistency in ARM libgcc: // e.g., __floatunsidf vs. __floatunssidfvfp. setLibcallName(RTLIB::SINTTOFP_I32_F64, "__floatsidfvfp"); setLibcallName(RTLIB::UINTTOFP_I32_F64, "__floatunssidfvfp"); setLibcallName(RTLIB::SINTTOFP_I32_F32, "__floatsisfvfp"); setLibcallName(RTLIB::UINTTOFP_I32_F32, "__floatunssisfvfp"); } } // These libcalls are not available in 32-bit. setLibcallName(RTLIB::SHL_I128, 0); setLibcallName(RTLIB::SRL_I128, 0); setLibcallName(RTLIB::SRA_I128, 0); if (Subtarget->isAAPCS_ABI()) { // Double-precision floating-point arithmetic helper functions // RTABI chapter 4.1.2, Table 2 setLibcallName(RTLIB::ADD_F64, "__aeabi_dadd"); setLibcallName(RTLIB::DIV_F64, "__aeabi_ddiv"); setLibcallName(RTLIB::MUL_F64, "__aeabi_dmul"); setLibcallName(RTLIB::SUB_F64, "__aeabi_dsub"); setLibcallCallingConv(RTLIB::ADD_F64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::DIV_F64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::MUL_F64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::SUB_F64, CallingConv::ARM_AAPCS); // Double-precision floating-point comparison helper functions // RTABI chapter 4.1.2, Table 3 setLibcallName(RTLIB::OEQ_F64, "__aeabi_dcmpeq"); setCmpLibcallCC(RTLIB::OEQ_F64, ISD::SETNE); setLibcallName(RTLIB::UNE_F64, "__aeabi_dcmpeq"); setCmpLibcallCC(RTLIB::UNE_F64, ISD::SETEQ); setLibcallName(RTLIB::OLT_F64, "__aeabi_dcmplt"); setCmpLibcallCC(RTLIB::OLT_F64, ISD::SETNE); setLibcallName(RTLIB::OLE_F64, "__aeabi_dcmple"); setCmpLibcallCC(RTLIB::OLE_F64, ISD::SETNE); setLibcallName(RTLIB::OGE_F64, "__aeabi_dcmpge"); setCmpLibcallCC(RTLIB::OGE_F64, ISD::SETNE); setLibcallName(RTLIB::OGT_F64, "__aeabi_dcmpgt"); setCmpLibcallCC(RTLIB::OGT_F64, ISD::SETNE); setLibcallName(RTLIB::UO_F64, "__aeabi_dcmpun"); setCmpLibcallCC(RTLIB::UO_F64, ISD::SETNE); setLibcallName(RTLIB::O_F64, "__aeabi_dcmpun"); setCmpLibcallCC(RTLIB::O_F64, ISD::SETEQ); setLibcallCallingConv(RTLIB::OEQ_F64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::UNE_F64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::OLT_F64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::OLE_F64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::OGE_F64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::OGT_F64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::UO_F64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::O_F64, CallingConv::ARM_AAPCS); // Single-precision floating-point arithmetic helper functions // RTABI chapter 4.1.2, Table 4 setLibcallName(RTLIB::ADD_F32, "__aeabi_fadd"); setLibcallName(RTLIB::DIV_F32, "__aeabi_fdiv"); setLibcallName(RTLIB::MUL_F32, "__aeabi_fmul"); setLibcallName(RTLIB::SUB_F32, "__aeabi_fsub"); setLibcallCallingConv(RTLIB::ADD_F32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::DIV_F32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::MUL_F32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::SUB_F32, CallingConv::ARM_AAPCS); // Single-precision 
floating-point comparison helper functions // RTABI chapter 4.1.2, Table 5 setLibcallName(RTLIB::OEQ_F32, "__aeabi_fcmpeq"); setCmpLibcallCC(RTLIB::OEQ_F32, ISD::SETNE); setLibcallName(RTLIB::UNE_F32, "__aeabi_fcmpeq"); setCmpLibcallCC(RTLIB::UNE_F32, ISD::SETEQ); setLibcallName(RTLIB::OLT_F32, "__aeabi_fcmplt"); setCmpLibcallCC(RTLIB::OLT_F32, ISD::SETNE); setLibcallName(RTLIB::OLE_F32, "__aeabi_fcmple"); setCmpLibcallCC(RTLIB::OLE_F32, ISD::SETNE); setLibcallName(RTLIB::OGE_F32, "__aeabi_fcmpge"); setCmpLibcallCC(RTLIB::OGE_F32, ISD::SETNE); setLibcallName(RTLIB::OGT_F32, "__aeabi_fcmpgt"); setCmpLibcallCC(RTLIB::OGT_F32, ISD::SETNE); setLibcallName(RTLIB::UO_F32, "__aeabi_fcmpun"); setCmpLibcallCC(RTLIB::UO_F32, ISD::SETNE); setLibcallName(RTLIB::O_F32, "__aeabi_fcmpun"); setCmpLibcallCC(RTLIB::O_F32, ISD::SETEQ); setLibcallCallingConv(RTLIB::OEQ_F32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::UNE_F32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::OLT_F32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::OLE_F32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::OGE_F32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::OGT_F32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::UO_F32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::O_F32, CallingConv::ARM_AAPCS); // Floating-point to integer conversions. // RTABI chapter 4.1.2, Table 6 setLibcallName(RTLIB::FPTOSINT_F64_I32, "__aeabi_d2iz"); setLibcallName(RTLIB::FPTOUINT_F64_I32, "__aeabi_d2uiz"); setLibcallName(RTLIB::FPTOSINT_F64_I64, "__aeabi_d2lz"); setLibcallName(RTLIB::FPTOUINT_F64_I64, "__aeabi_d2ulz"); setLibcallName(RTLIB::FPTOSINT_F32_I32, "__aeabi_f2iz"); setLibcallName(RTLIB::FPTOUINT_F32_I32, "__aeabi_f2uiz"); setLibcallName(RTLIB::FPTOSINT_F32_I64, "__aeabi_f2lz"); setLibcallName(RTLIB::FPTOUINT_F32_I64, "__aeabi_f2ulz"); setLibcallCallingConv(RTLIB::FPTOSINT_F64_I32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::FPTOUINT_F64_I32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::FPTOSINT_F64_I64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::FPTOUINT_F64_I64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::FPTOSINT_F32_I32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::FPTOUINT_F32_I32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::FPTOSINT_F32_I64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::FPTOUINT_F32_I64, CallingConv::ARM_AAPCS); // Conversions between floating types. // RTABI chapter 4.1.2, Table 7 setLibcallName(RTLIB::FPROUND_F64_F32, "__aeabi_d2f"); setLibcallName(RTLIB::FPEXT_F32_F64, "__aeabi_f2d"); setLibcallCallingConv(RTLIB::FPROUND_F64_F32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::FPEXT_F32_F64, CallingConv::ARM_AAPCS); // Integer to floating-point conversions. 
// RTABI chapter 4.1.2, Table 8 setLibcallName(RTLIB::SINTTOFP_I32_F64, "__aeabi_i2d"); setLibcallName(RTLIB::UINTTOFP_I32_F64, "__aeabi_ui2d"); setLibcallName(RTLIB::SINTTOFP_I64_F64, "__aeabi_l2d"); setLibcallName(RTLIB::UINTTOFP_I64_F64, "__aeabi_ul2d"); setLibcallName(RTLIB::SINTTOFP_I32_F32, "__aeabi_i2f"); setLibcallName(RTLIB::UINTTOFP_I32_F32, "__aeabi_ui2f"); setLibcallName(RTLIB::SINTTOFP_I64_F32, "__aeabi_l2f"); setLibcallName(RTLIB::UINTTOFP_I64_F32, "__aeabi_ul2f"); setLibcallCallingConv(RTLIB::SINTTOFP_I32_F64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::UINTTOFP_I32_F64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::SINTTOFP_I64_F64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::UINTTOFP_I64_F64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::SINTTOFP_I32_F32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::UINTTOFP_I32_F32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::SINTTOFP_I64_F32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::UINTTOFP_I64_F32, CallingConv::ARM_AAPCS); // Long long helper functions // RTABI chapter 4.2, Table 9 setLibcallName(RTLIB::MUL_I64, "__aeabi_lmul"); setLibcallName(RTLIB::SDIV_I64, "__aeabi_ldivmod"); setLibcallName(RTLIB::UDIV_I64, "__aeabi_uldivmod"); setLibcallName(RTLIB::SHL_I64, "__aeabi_llsl"); setLibcallName(RTLIB::SRL_I64, "__aeabi_llsr"); setLibcallName(RTLIB::SRA_I64, "__aeabi_lasr"); setLibcallCallingConv(RTLIB::MUL_I64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::SDIV_I64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::UDIV_I64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::SHL_I64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::SRL_I64, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::SRA_I64, CallingConv::ARM_AAPCS); // Integer division functions // RTABI chapter 4.3.1 setLibcallName(RTLIB::SDIV_I8, "__aeabi_idiv"); setLibcallName(RTLIB::SDIV_I16, "__aeabi_idiv"); setLibcallName(RTLIB::SDIV_I32, "__aeabi_idiv"); setLibcallName(RTLIB::UDIV_I8, "__aeabi_uidiv"); setLibcallName(RTLIB::UDIV_I16, "__aeabi_uidiv"); setLibcallName(RTLIB::UDIV_I32, "__aeabi_uidiv"); setLibcallCallingConv(RTLIB::SDIV_I8, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::SDIV_I16, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::SDIV_I32, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::UDIV_I8, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::UDIV_I16, CallingConv::ARM_AAPCS); setLibcallCallingConv(RTLIB::UDIV_I32, CallingConv::ARM_AAPCS); // Memory operations // RTABI chapter 4.3.4 setLibcallName(RTLIB::MEMCPY, "__aeabi_memcpy"); setLibcallName(RTLIB::MEMMOVE, "__aeabi_memmove"); setLibcallName(RTLIB::MEMSET, "__aeabi_memset"); } // Use divmod compiler-rt calls for iOS 5.0 and later. 
if (Subtarget->getTargetTriple().getOS() == Triple::IOS && !Subtarget->getTargetTriple().isOSVersionLT(5, 0)) { setLibcallName(RTLIB::SDIVREM_I32, "__divmodsi4"); setLibcallName(RTLIB::UDIVREM_I32, "__udivmodsi4"); } if (Subtarget->isThumb1Only()) addRegisterClass(MVT::i32, ARM::tGPRRegisterClass); else addRegisterClass(MVT::i32, ARM::GPRRegisterClass); if (!UseSoftFloat && Subtarget->hasVFP2() && !Subtarget->isThumb1Only()) { addRegisterClass(MVT::f32, ARM::SPRRegisterClass); if (!Subtarget->isFPOnlySP()) addRegisterClass(MVT::f64, ARM::DPRRegisterClass); setTruncStoreAction(MVT::f64, MVT::f32, Expand); } if (Subtarget->hasNEON()) { addDRTypeForNEON(MVT::v2f32); addDRTypeForNEON(MVT::v8i8); addDRTypeForNEON(MVT::v4i16); addDRTypeForNEON(MVT::v2i32); addDRTypeForNEON(MVT::v1i64); addQRTypeForNEON(MVT::v4f32); addQRTypeForNEON(MVT::v2f64); addQRTypeForNEON(MVT::v16i8); addQRTypeForNEON(MVT::v8i16); addQRTypeForNEON(MVT::v4i32); addQRTypeForNEON(MVT::v2i64); // v2f64 is legal so that QR subregs can be extracted as f64 elements, but // neither Neon nor VFP support any arithmetic operations on it. setOperationAction(ISD::FADD, MVT::v2f64, Expand); setOperationAction(ISD::FSUB, MVT::v2f64, Expand); setOperationAction(ISD::FMUL, MVT::v2f64, Expand); setOperationAction(ISD::FDIV, MVT::v2f64, Expand); setOperationAction(ISD::FREM, MVT::v2f64, Expand); setOperationAction(ISD::FCOPYSIGN, MVT::v2f64, Expand); setOperationAction(ISD::SETCC, MVT::v2f64, Expand); setOperationAction(ISD::FNEG, MVT::v2f64, Expand); setOperationAction(ISD::FABS, MVT::v2f64, Expand); setOperationAction(ISD::FSQRT, MVT::v2f64, Expand); setOperationAction(ISD::FSIN, MVT::v2f64, Expand); setOperationAction(ISD::FCOS, MVT::v2f64, Expand); setOperationAction(ISD::FPOWI, MVT::v2f64, Expand); setOperationAction(ISD::FPOW, MVT::v2f64, Expand); setOperationAction(ISD::FLOG, MVT::v2f64, Expand); setOperationAction(ISD::FLOG2, MVT::v2f64, Expand); setOperationAction(ISD::FLOG10, MVT::v2f64, Expand); setOperationAction(ISD::FEXP, MVT::v2f64, Expand); setOperationAction(ISD::FEXP2, MVT::v2f64, Expand); setOperationAction(ISD::FCEIL, MVT::v2f64, Expand); setOperationAction(ISD::FTRUNC, MVT::v2f64, Expand); setOperationAction(ISD::FRINT, MVT::v2f64, Expand); setOperationAction(ISD::FNEARBYINT, MVT::v2f64, Expand); setOperationAction(ISD::FFLOOR, MVT::v2f64, Expand); setTruncStoreAction(MVT::v2f64, MVT::v2f32, Expand); // Neon does not support some operations on v1i64 and v2i64 types. setOperationAction(ISD::MUL, MVT::v1i64, Expand); // Custom handling for some quad-vector types to detect VMULL. setOperationAction(ISD::MUL, MVT::v8i16, Custom); setOperationAction(ISD::MUL, MVT::v4i32, Custom); setOperationAction(ISD::MUL, MVT::v2i64, Custom); // Custom handling for some vector types to avoid expensive expansions setOperationAction(ISD::SDIV, MVT::v4i16, Custom); setOperationAction(ISD::SDIV, MVT::v8i8, Custom); setOperationAction(ISD::UDIV, MVT::v4i16, Custom); setOperationAction(ISD::UDIV, MVT::v8i8, Custom); setOperationAction(ISD::SETCC, MVT::v1i64, Expand); setOperationAction(ISD::SETCC, MVT::v2i64, Expand); // Neon does not have single instruction SINT_TO_FP and UINT_TO_FP with // a destination type that is wider than the source. 
setOperationAction(ISD::SINT_TO_FP, MVT::v4i16, Custom); setOperationAction(ISD::UINT_TO_FP, MVT::v4i16, Custom); setTargetDAGCombine(ISD::INTRINSIC_VOID); setTargetDAGCombine(ISD::INTRINSIC_W_CHAIN); setTargetDAGCombine(ISD::INTRINSIC_WO_CHAIN); setTargetDAGCombine(ISD::SHL); setTargetDAGCombine(ISD::SRL); setTargetDAGCombine(ISD::SRA); setTargetDAGCombine(ISD::SIGN_EXTEND); setTargetDAGCombine(ISD::ZERO_EXTEND); setTargetDAGCombine(ISD::ANY_EXTEND); setTargetDAGCombine(ISD::SELECT_CC); setTargetDAGCombine(ISD::BUILD_VECTOR); setTargetDAGCombine(ISD::VECTOR_SHUFFLE); setTargetDAGCombine(ISD::INSERT_VECTOR_ELT); setTargetDAGCombine(ISD::STORE); setTargetDAGCombine(ISD::FP_TO_SINT); setTargetDAGCombine(ISD::FP_TO_UINT); setTargetDAGCombine(ISD::FDIV); } computeRegisterProperties(); // ARM does not have f32 extending load. setLoadExtAction(ISD::EXTLOAD, MVT::f32, Expand); // ARM does not have i1 sign extending load. setLoadExtAction(ISD::SEXTLOAD, MVT::i1, Promote); // ARM supports all 4 flavors of integer indexed load / store. if (!Subtarget->isThumb1Only()) { for (unsigned im = (unsigned)ISD::PRE_INC; im != (unsigned)ISD::LAST_INDEXED_MODE; ++im) { setIndexedLoadAction(im, MVT::i1, Legal); setIndexedLoadAction(im, MVT::i8, Legal); setIndexedLoadAction(im, MVT::i16, Legal); setIndexedLoadAction(im, MVT::i32, Legal); setIndexedStoreAction(im, MVT::i1, Legal); setIndexedStoreAction(im, MVT::i8, Legal); setIndexedStoreAction(im, MVT::i16, Legal); setIndexedStoreAction(im, MVT::i32, Legal); } } // i64 operation support. setOperationAction(ISD::MUL, MVT::i64, Expand); setOperationAction(ISD::MULHU, MVT::i32, Expand); if (Subtarget->isThumb1Only()) { setOperationAction(ISD::UMUL_LOHI, MVT::i32, Expand); setOperationAction(ISD::SMUL_LOHI, MVT::i32, Expand); } if (Subtarget->isThumb1Only() || !Subtarget->hasV6Ops() || (Subtarget->isThumb2() && !Subtarget->hasThumb2DSP())) setOperationAction(ISD::MULHS, MVT::i32, Expand); setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom); setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom); setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom); setOperationAction(ISD::SRL, MVT::i64, Custom); setOperationAction(ISD::SRA, MVT::i64, Custom); if (!Subtarget->isThumb1Only()) { // FIXME: We should do this for Thumb1 as well. setOperationAction(ISD::ADDC, MVT::i32, Custom); setOperationAction(ISD::ADDE, MVT::i32, Custom); setOperationAction(ISD::SUBC, MVT::i32, Custom); setOperationAction(ISD::SUBE, MVT::i32, Custom); } // ARM does not have ROTL. setOperationAction(ISD::ROTL, MVT::i32, Expand); setOperationAction(ISD::CTTZ, MVT::i32, Custom); setOperationAction(ISD::CTPOP, MVT::i32, Expand); if (!Subtarget->hasV5TOps() || Subtarget->isThumb1Only()) setOperationAction(ISD::CTLZ, MVT::i32, Expand); // Only ARMv6 has BSWAP. if (!Subtarget->hasV6Ops()) setOperationAction(ISD::BSWAP, MVT::i32, Expand); // These are expanded into libcalls. 
if (!Subtarget->hasDivide() || !Subtarget->isThumb2()) { // v7M has a hardware divider setOperationAction(ISD::SDIV, MVT::i32, Expand); setOperationAction(ISD::UDIV, MVT::i32, Expand); } setOperationAction(ISD::SREM, MVT::i32, Expand); setOperationAction(ISD::UREM, MVT::i32, Expand); setOperationAction(ISD::SDIVREM, MVT::i32, Expand); setOperationAction(ISD::UDIVREM, MVT::i32, Expand); setOperationAction(ISD::GlobalAddress, MVT::i32, Custom); setOperationAction(ISD::ConstantPool, MVT::i32, Custom); setOperationAction(ISD::GLOBAL_OFFSET_TABLE, MVT::i32, Custom); setOperationAction(ISD::GlobalTLSAddress, MVT::i32, Custom); setOperationAction(ISD::BlockAddress, MVT::i32, Custom); setOperationAction(ISD::TRAP, MVT::Other, Legal); // Use the default implementation. setOperationAction(ISD::VASTART, MVT::Other, Custom); setOperationAction(ISD::VAARG, MVT::Other, Expand); setOperationAction(ISD::VACOPY, MVT::Other, Expand); setOperationAction(ISD::VAEND, MVT::Other, Expand); setOperationAction(ISD::STACKSAVE, MVT::Other, Expand); setOperationAction(ISD::STACKRESTORE, MVT::Other, Expand); setOperationAction(ISD::EHSELECTION, MVT::i32, Expand); setOperationAction(ISD::EXCEPTIONADDR, MVT::i32, Expand); setExceptionPointerRegister(ARM::R0); setExceptionSelectorRegister(ARM::R1); setOperationAction(ISD::DYNAMIC_STACKALLOC, MVT::i32, Expand); // ARMv6 Thumb1 (except for CPUs that support dmb / dsb) and earlier use // the default expansion. // FIXME: This should be checking for v6k, not just v6. if (Subtarget->hasDataBarrier() || (Subtarget->hasV6Ops() && !Subtarget->isThumb())) { // membarrier needs custom lowering; the rest are legal and handled // normally. setOperationAction(ISD::MEMBARRIER, MVT::Other, Custom); setOperationAction(ISD::ATOMIC_FENCE, MVT::Other, Custom); // Custom lowering for 64-bit ops setOperationAction(ISD::ATOMIC_LOAD_ADD, MVT::i64, Custom); setOperationAction(ISD::ATOMIC_LOAD_SUB, MVT::i64, Custom); setOperationAction(ISD::ATOMIC_LOAD_AND, MVT::i64, Custom); setOperationAction(ISD::ATOMIC_LOAD_OR, MVT::i64, Custom); setOperationAction(ISD::ATOMIC_LOAD_XOR, MVT::i64, Custom); setOperationAction(ISD::ATOMIC_SWAP, MVT::i64, Custom); setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i64, Custom); // Automatically insert fences (dmb ist) around ATOMIC_SWAP etc. setInsertFencesForAtomic(true); } else { // Set them all for expansion, which will force libcalls. setOperationAction(ISD::MEMBARRIER, MVT::Other, Expand); setOperationAction(ISD::ATOMIC_FENCE, MVT::Other, Expand); setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i32, Expand); setOperationAction(ISD::ATOMIC_SWAP, MVT::i32, Expand); setOperationAction(ISD::ATOMIC_LOAD_ADD, MVT::i32, Expand); setOperationAction(ISD::ATOMIC_LOAD_SUB, MVT::i32, Expand); setOperationAction(ISD::ATOMIC_LOAD_AND, MVT::i32, Expand); setOperationAction(ISD::ATOMIC_LOAD_OR, MVT::i32, Expand); setOperationAction(ISD::ATOMIC_LOAD_XOR, MVT::i32, Expand); setOperationAction(ISD::ATOMIC_LOAD_NAND, MVT::i32, Expand); setOperationAction(ISD::ATOMIC_LOAD_MIN, MVT::i32, Expand); setOperationAction(ISD::ATOMIC_LOAD_MAX, MVT::i32, Expand); setOperationAction(ISD::ATOMIC_LOAD_UMIN, MVT::i32, Expand); setOperationAction(ISD::ATOMIC_LOAD_UMAX, MVT::i32, Expand); // Mark ATOMIC_LOAD and ATOMIC_STORE custom so we can handle the // Unordered/Monotonic case. 
setOperationAction(ISD::ATOMIC_LOAD, MVT::i32, Custom); setOperationAction(ISD::ATOMIC_STORE, MVT::i32, Custom); // Since the libcalls include locking, fold in the fences setShouldFoldAtomicFences(true); } setOperationAction(ISD::PREFETCH, MVT::Other, Custom); // Requires SXTB/SXTH, available on v6 and up in both ARM and Thumb modes. if (!Subtarget->hasV6Ops()) { setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16, Expand); setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i8, Expand); } setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i1, Expand); if (!UseSoftFloat && Subtarget->hasVFP2() && !Subtarget->isThumb1Only()) { // Turn f64->i64 into VMOVRRD, i64 -> f64 to VMOVDRR // iff target supports vfp2. setOperationAction(ISD::BITCAST, MVT::i64, Custom); setOperationAction(ISD::FLT_ROUNDS_, MVT::i32, Custom); } // We want to custom lower some of our intrinsics. setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom); if (Subtarget->isTargetDarwin()) { setOperationAction(ISD::EH_SJLJ_SETJMP, MVT::i32, Custom); setOperationAction(ISD::EH_SJLJ_LONGJMP, MVT::Other, Custom); setOperationAction(ISD::EH_SJLJ_DISPATCHSETUP, MVT::Other, Custom); setLibcallName(RTLIB::UNWIND_RESUME, "_Unwind_SjLj_Resume"); } setOperationAction(ISD::SETCC, MVT::i32, Expand); setOperationAction(ISD::SETCC, MVT::f32, Expand); setOperationAction(ISD::SETCC, MVT::f64, Expand); setOperationAction(ISD::SELECT, MVT::i32, Custom); setOperationAction(ISD::SELECT, MVT::f32, Custom); setOperationAction(ISD::SELECT, MVT::f64, Custom); setOperationAction(ISD::SELECT_CC, MVT::i32, Custom); setOperationAction(ISD::SELECT_CC, MVT::f32, Custom); setOperationAction(ISD::SELECT_CC, MVT::f64, Custom); setOperationAction(ISD::BRCOND, MVT::Other, Expand); setOperationAction(ISD::BR_CC, MVT::i32, Custom); setOperationAction(ISD::BR_CC, MVT::f32, Custom); setOperationAction(ISD::BR_CC, MVT::f64, Custom); setOperationAction(ISD::BR_JT, MVT::Other, Custom); // We don't support sin/cos/fmod/copysign/pow setOperationAction(ISD::FSIN, MVT::f64, Expand); setOperationAction(ISD::FSIN, MVT::f32, Expand); setOperationAction(ISD::FCOS, MVT::f32, Expand); setOperationAction(ISD::FCOS, MVT::f64, Expand); setOperationAction(ISD::FREM, MVT::f64, Expand); setOperationAction(ISD::FREM, MVT::f32, Expand); if (!UseSoftFloat && Subtarget->hasVFP2() && !Subtarget->isThumb1Only()) { setOperationAction(ISD::FCOPYSIGN, MVT::f64, Custom); setOperationAction(ISD::FCOPYSIGN, MVT::f32, Custom); } setOperationAction(ISD::FPOW, MVT::f64, Expand); setOperationAction(ISD::FPOW, MVT::f32, Expand); setOperationAction(ISD::FMA, MVT::f64, Expand); setOperationAction(ISD::FMA, MVT::f32, Expand); // Various VFP goodness if (!UseSoftFloat && !Subtarget->isThumb1Only()) { // int <-> fp are custom expanded into bit_convert + ARMISD ops. if (Subtarget->hasVFP2()) { setOperationAction(ISD::SINT_TO_FP, MVT::i32, Custom); setOperationAction(ISD::UINT_TO_FP, MVT::i32, Custom); setOperationAction(ISD::FP_TO_UINT, MVT::i32, Custom); setOperationAction(ISD::FP_TO_SINT, MVT::i32, Custom); } // Special handling for half-precision FP. 
if (!Subtarget->hasFP16()) { setOperationAction(ISD::FP16_TO_FP32, MVT::f32, Expand); setOperationAction(ISD::FP32_TO_FP16, MVT::i32, Expand); } } // We have target-specific dag combine patterns for the following nodes: // ARMISD::VMOVRRD - No need to call setTargetDAGCombine setTargetDAGCombine(ISD::ADD); setTargetDAGCombine(ISD::SUB); setTargetDAGCombine(ISD::MUL); if (Subtarget->hasV6T2Ops() || Subtarget->hasNEON()) setTargetDAGCombine(ISD::OR); if (Subtarget->hasNEON()) setTargetDAGCombine(ISD::AND); setStackPointerRegisterToSaveRestore(ARM::SP); if (UseSoftFloat || Subtarget->isThumb1Only() || !Subtarget->hasVFP2()) setSchedulingPreference(Sched::RegPressure); else setSchedulingPreference(Sched::Hybrid); //// temporary - rewrite interface to use type maxStoresPerMemcpy = maxStoresPerMemcpyOptSize = 1; // On ARM arguments smaller than 4 bytes are extended, so all arguments // are at least 4 bytes aligned. setMinStackArgumentAlignment(4); benefitFromCodePlacementOpt = true; setMinFunctionAlignment(Subtarget->isThumb() ? 1 : 2); } // FIXME: It might make sense to define the representative register class as the // nearest super-register that has a non-null superset. For example, DPR_VFP2 is // a super-register of SPR, and DPR is a superset if DPR_VFP2. Consequently, // SPR's representative would be DPR_VFP2. This should work well if register // pressure tracking were modified such that a register use would increment the // pressure of the register class's representative and all of it's super // classes' representatives transitively. We have not implemented this because // of the difficulty prior to coalescing of modeling operand register classes // due to the common occurrence of cross class copies and subregister insertions // and extractions. std::pair ARMTargetLowering::findRepresentativeClass(EVT VT) const{ const TargetRegisterClass *RRC = 0; uint8_t Cost = 1; switch (VT.getSimpleVT().SimpleTy) { default: return TargetLowering::findRepresentativeClass(VT); // Use DPR as representative register class for all floating point // and vector types. Since there are 32 SPR registers and 32 DPR registers so // the cost is 1 for both f32 and f64. case MVT::f32: case MVT::f64: case MVT::v8i8: case MVT::v4i16: case MVT::v2i32: case MVT::v1i64: case MVT::v2f32: RRC = ARM::DPRRegisterClass; // When NEON is used for SP, only half of the register file is available // because operations that define both SP and DP results will be constrained // to the VFP2 class (D0-D15). We currently model this constraint prior to // coalescing by double-counting the SP regs. See the FIXME above. 
if (Subtarget->useNEONForSinglePrecisionFP()) Cost = 2; break; case MVT::v16i8: case MVT::v8i16: case MVT::v4i32: case MVT::v2i64: case MVT::v4f32: case MVT::v2f64: RRC = ARM::DPRRegisterClass; Cost = 2; break; case MVT::v4i64: RRC = ARM::DPRRegisterClass; Cost = 4; break; case MVT::v8i64: RRC = ARM::DPRRegisterClass; Cost = 8; break; } return std::make_pair(RRC, Cost); } const char *ARMTargetLowering::getTargetNodeName(unsigned Opcode) const { switch (Opcode) { default: return 0; case ARMISD::Wrapper: return "ARMISD::Wrapper"; case ARMISD::WrapperDYN: return "ARMISD::WrapperDYN"; case ARMISD::WrapperPIC: return "ARMISD::WrapperPIC"; case ARMISD::WrapperJT: return "ARMISD::WrapperJT"; case ARMISD::CALL: return "ARMISD::CALL"; case ARMISD::CALL_PRED: return "ARMISD::CALL_PRED"; case ARMISD::CALL_NOLINK: return "ARMISD::CALL_NOLINK"; case ARMISD::tCALL: return "ARMISD::tCALL"; case ARMISD::BRCOND: return "ARMISD::BRCOND"; case ARMISD::BR_JT: return "ARMISD::BR_JT"; case ARMISD::BR2_JT: return "ARMISD::BR2_JT"; case ARMISD::RET_FLAG: return "ARMISD::RET_FLAG"; case ARMISD::PIC_ADD: return "ARMISD::PIC_ADD"; case ARMISD::CMP: return "ARMISD::CMP"; case ARMISD::CMPZ: return "ARMISD::CMPZ"; case ARMISD::CMPFP: return "ARMISD::CMPFP"; case ARMISD::CMPFPw0: return "ARMISD::CMPFPw0"; case ARMISD::BCC_i64: return "ARMISD::BCC_i64"; case ARMISD::FMSTAT: return "ARMISD::FMSTAT"; case ARMISD::CMOV: return "ARMISD::CMOV"; case ARMISD::RBIT: return "ARMISD::RBIT"; case ARMISD::FTOSI: return "ARMISD::FTOSI"; case ARMISD::FTOUI: return "ARMISD::FTOUI"; case ARMISD::SITOF: return "ARMISD::SITOF"; case ARMISD::UITOF: return "ARMISD::UITOF"; case ARMISD::SRL_FLAG: return "ARMISD::SRL_FLAG"; case ARMISD::SRA_FLAG: return "ARMISD::SRA_FLAG"; case ARMISD::RRX: return "ARMISD::RRX"; case ARMISD::ADDC: return "ARMISD::ADDC"; case ARMISD::ADDE: return "ARMISD::ADDE"; case ARMISD::SUBC: return "ARMISD::SUBC"; case ARMISD::SUBE: return "ARMISD::SUBE"; case ARMISD::VMOVRRD: return "ARMISD::VMOVRRD"; case ARMISD::VMOVDRR: return "ARMISD::VMOVDRR"; case ARMISD::EH_SJLJ_SETJMP: return "ARMISD::EH_SJLJ_SETJMP"; case ARMISD::EH_SJLJ_LONGJMP:return "ARMISD::EH_SJLJ_LONGJMP"; case ARMISD::EH_SJLJ_DISPATCHSETUP:return "ARMISD::EH_SJLJ_DISPATCHSETUP"; case ARMISD::TC_RETURN: return "ARMISD::TC_RETURN"; case ARMISD::THREAD_POINTER:return "ARMISD::THREAD_POINTER"; case ARMISD::DYN_ALLOC: return "ARMISD::DYN_ALLOC"; case ARMISD::MEMBARRIER: return "ARMISD::MEMBARRIER"; case ARMISD::MEMBARRIER_MCR: return "ARMISD::MEMBARRIER_MCR"; case ARMISD::PRELOAD: return "ARMISD::PRELOAD"; case ARMISD::VCEQ: return "ARMISD::VCEQ"; case ARMISD::VCEQZ: return "ARMISD::VCEQZ"; case ARMISD::VCGE: return "ARMISD::VCGE"; case ARMISD::VCGEZ: return "ARMISD::VCGEZ"; case ARMISD::VCLEZ: return "ARMISD::VCLEZ"; case ARMISD::VCGEU: return "ARMISD::VCGEU"; case ARMISD::VCGT: return "ARMISD::VCGT"; case ARMISD::VCGTZ: return "ARMISD::VCGTZ"; case ARMISD::VCLTZ: return "ARMISD::VCLTZ"; case ARMISD::VCGTU: return "ARMISD::VCGTU"; case ARMISD::VTST: return "ARMISD::VTST"; case ARMISD::VSHL: return "ARMISD::VSHL"; case ARMISD::VSHRs: return "ARMISD::VSHRs"; case ARMISD::VSHRu: return "ARMISD::VSHRu"; case ARMISD::VSHLLs: return "ARMISD::VSHLLs"; case ARMISD::VSHLLu: return "ARMISD::VSHLLu"; case ARMISD::VSHLLi: return "ARMISD::VSHLLi"; case ARMISD::VSHRN: return "ARMISD::VSHRN"; case ARMISD::VRSHRs: return "ARMISD::VRSHRs"; case ARMISD::VRSHRu: return "ARMISD::VRSHRu"; case ARMISD::VRSHRN: return "ARMISD::VRSHRN"; case ARMISD::VQSHLs: return "ARMISD::VQSHLs"; 
case ARMISD::VQSHLu: return "ARMISD::VQSHLu"; case ARMISD::VQSHLsu: return "ARMISD::VQSHLsu"; case ARMISD::VQSHRNs: return "ARMISD::VQSHRNs"; case ARMISD::VQSHRNu: return "ARMISD::VQSHRNu"; case ARMISD::VQSHRNsu: return "ARMISD::VQSHRNsu"; case ARMISD::VQRSHRNs: return "ARMISD::VQRSHRNs"; case ARMISD::VQRSHRNu: return "ARMISD::VQRSHRNu"; case ARMISD::VQRSHRNsu: return "ARMISD::VQRSHRNsu"; case ARMISD::VGETLANEu: return "ARMISD::VGETLANEu"; case ARMISD::VGETLANEs: return "ARMISD::VGETLANEs"; case ARMISD::VMOVIMM: return "ARMISD::VMOVIMM"; case ARMISD::VMVNIMM: return "ARMISD::VMVNIMM"; case ARMISD::VDUP: return "ARMISD::VDUP"; case ARMISD::VDUPLANE: return "ARMISD::VDUPLANE"; case ARMISD::VEXT: return "ARMISD::VEXT"; case ARMISD::VREV64: return "ARMISD::VREV64"; case ARMISD::VREV32: return "ARMISD::VREV32"; case ARMISD::VREV16: return "ARMISD::VREV16"; case ARMISD::VZIP: return "ARMISD::VZIP"; case ARMISD::VUZP: return "ARMISD::VUZP"; case ARMISD::VTRN: return "ARMISD::VTRN"; case ARMISD::VTBL1: return "ARMISD::VTBL1"; case ARMISD::VTBL2: return "ARMISD::VTBL2"; case ARMISD::VMULLs: return "ARMISD::VMULLs"; case ARMISD::VMULLu: return "ARMISD::VMULLu"; case ARMISD::BUILD_VECTOR: return "ARMISD::BUILD_VECTOR"; case ARMISD::FMAX: return "ARMISD::FMAX"; case ARMISD::FMIN: return "ARMISD::FMIN"; case ARMISD::BFI: return "ARMISD::BFI"; case ARMISD::VORRIMM: return "ARMISD::VORRIMM"; case ARMISD::VBICIMM: return "ARMISD::VBICIMM"; case ARMISD::VBSL: return "ARMISD::VBSL"; case ARMISD::VLD2DUP: return "ARMISD::VLD2DUP"; case ARMISD::VLD3DUP: return "ARMISD::VLD3DUP"; case ARMISD::VLD4DUP: return "ARMISD::VLD4DUP"; case ARMISD::VLD1_UPD: return "ARMISD::VLD1_UPD"; case ARMISD::VLD2_UPD: return "ARMISD::VLD2_UPD"; case ARMISD::VLD3_UPD: return "ARMISD::VLD3_UPD"; case ARMISD::VLD4_UPD: return "ARMISD::VLD4_UPD"; case ARMISD::VLD2LN_UPD: return "ARMISD::VLD2LN_UPD"; case ARMISD::VLD3LN_UPD: return "ARMISD::VLD3LN_UPD"; case ARMISD::VLD4LN_UPD: return "ARMISD::VLD4LN_UPD"; case ARMISD::VLD2DUP_UPD: return "ARMISD::VLD2DUP_UPD"; case ARMISD::VLD3DUP_UPD: return "ARMISD::VLD3DUP_UPD"; case ARMISD::VLD4DUP_UPD: return "ARMISD::VLD4DUP_UPD"; case ARMISD::VST1_UPD: return "ARMISD::VST1_UPD"; case ARMISD::VST2_UPD: return "ARMISD::VST2_UPD"; case ARMISD::VST3_UPD: return "ARMISD::VST3_UPD"; case ARMISD::VST4_UPD: return "ARMISD::VST4_UPD"; case ARMISD::VST2LN_UPD: return "ARMISD::VST2LN_UPD"; case ARMISD::VST3LN_UPD: return "ARMISD::VST3LN_UPD"; case ARMISD::VST4LN_UPD: return "ARMISD::VST4LN_UPD"; } } EVT ARMTargetLowering::getSetCCResultType(EVT VT) const { if (!VT.isVector()) return getPointerTy(); return VT.changeVectorElementTypeToInteger(); } /// getRegClassFor - Return the register class that should be used for the /// specified value type. TargetRegisterClass *ARMTargetLowering::getRegClassFor(EVT VT) const { // Map v4i64 to QQ registers but do not make the type legal. Similarly map // v8i64 to QQQQ registers. v4i64 and v8i64 are only used for REG_SEQUENCE to // load / store 4 to 8 consecutive D registers. if (Subtarget->hasNEON()) { if (VT == MVT::v4i64) return ARM::QQPRRegisterClass; else if (VT == MVT::v8i64) return ARM::QQQQPRRegisterClass; } return TargetLowering::getRegClassFor(VT); } // Create a fast isel object. FastISel * ARMTargetLowering::createFastISel(FunctionLoweringInfo &funcInfo) const { return ARM::createFastISel(funcInfo); } /// getMaximalGlobalOffset - Returns the maximal possible offset which can /// be used for loads / stores from the global. 
unsigned ARMTargetLowering::getMaximalGlobalOffset() const { return (Subtarget->isThumb1Only() ? 127 : 4095); } Sched::Preference ARMTargetLowering::getSchedulingPreference(SDNode *N) const { unsigned NumVals = N->getNumValues(); if (!NumVals) return Sched::RegPressure; for (unsigned i = 0; i != NumVals; ++i) { EVT VT = N->getValueType(i); if (VT == MVT::Glue || VT == MVT::Other) continue; if (VT.isFloatingPoint() || VT.isVector()) return Sched::Latency; } if (!N->isMachineOpcode()) return Sched::RegPressure; // Load are scheduled for latency even if there instruction itinerary // is not available. const TargetInstrInfo *TII = getTargetMachine().getInstrInfo(); const MCInstrDesc &MCID = TII->get(N->getMachineOpcode()); if (MCID.getNumDefs() == 0) return Sched::RegPressure; if (!Itins->isEmpty() && Itins->getOperandCycle(MCID.getSchedClass(), 0) > 2) return Sched::Latency; return Sched::RegPressure; } //===----------------------------------------------------------------------===// // Lowering Code //===----------------------------------------------------------------------===// /// IntCCToARMCC - Convert a DAG integer condition code to an ARM CC static ARMCC::CondCodes IntCCToARMCC(ISD::CondCode CC) { switch (CC) { default: llvm_unreachable("Unknown condition code!"); case ISD::SETNE: return ARMCC::NE; case ISD::SETEQ: return ARMCC::EQ; case ISD::SETGT: return ARMCC::GT; case ISD::SETGE: return ARMCC::GE; case ISD::SETLT: return ARMCC::LT; case ISD::SETLE: return ARMCC::LE; case ISD::SETUGT: return ARMCC::HI; case ISD::SETUGE: return ARMCC::HS; case ISD::SETULT: return ARMCC::LO; case ISD::SETULE: return ARMCC::LS; } } /// FPCCToARMCC - Convert a DAG fp condition code to an ARM CC. static void FPCCToARMCC(ISD::CondCode CC, ARMCC::CondCodes &CondCode, ARMCC::CondCodes &CondCode2) { CondCode2 = ARMCC::AL; switch (CC) { default: llvm_unreachable("Unknown FP condition!"); case ISD::SETEQ: case ISD::SETOEQ: CondCode = ARMCC::EQ; break; case ISD::SETGT: case ISD::SETOGT: CondCode = ARMCC::GT; break; case ISD::SETGE: case ISD::SETOGE: CondCode = ARMCC::GE; break; case ISD::SETOLT: CondCode = ARMCC::MI; break; case ISD::SETOLE: CondCode = ARMCC::LS; break; case ISD::SETONE: CondCode = ARMCC::MI; CondCode2 = ARMCC::GT; break; case ISD::SETO: CondCode = ARMCC::VC; break; case ISD::SETUO: CondCode = ARMCC::VS; break; case ISD::SETUEQ: CondCode = ARMCC::EQ; CondCode2 = ARMCC::VS; break; case ISD::SETUGT: CondCode = ARMCC::HI; break; case ISD::SETUGE: CondCode = ARMCC::PL; break; case ISD::SETLT: case ISD::SETULT: CondCode = ARMCC::LT; break; case ISD::SETLE: case ISD::SETULE: CondCode = ARMCC::LE; break; case ISD::SETNE: case ISD::SETUNE: CondCode = ARMCC::NE; break; } } //===----------------------------------------------------------------------===// // Calling Convention Implementation //===----------------------------------------------------------------------===// #include "ARMGenCallingConv.inc" /// CCAssignFnForNode - Selects the correct CCAssignFn for a the /// given CallingConvention value. CCAssignFn *ARMTargetLowering::CCAssignFnForNode(CallingConv::ID CC, bool Return, bool isVarArg) const { switch (CC) { default: llvm_unreachable("Unsupported calling convention"); case CallingConv::Fast: if (Subtarget->hasVFP2() && !isVarArg) { if (!Subtarget->isAAPCS_ABI()) return (Return ? RetFastCC_ARM_APCS : FastCC_ARM_APCS); // For AAPCS ABI targets, just use VFP variant of the calling convention. return (Return ? 
RetCC_ARM_AAPCS_VFP : CC_ARM_AAPCS_VFP); } // Fallthrough case CallingConv::C: { // Use target triple & subtarget features to do actual dispatch. if (!Subtarget->isAAPCS_ABI()) return (Return ? RetCC_ARM_APCS : CC_ARM_APCS); else if (Subtarget->hasVFP2() && FloatABIType == FloatABI::Hard && !isVarArg) return (Return ? RetCC_ARM_AAPCS_VFP : CC_ARM_AAPCS_VFP); return (Return ? RetCC_ARM_AAPCS : CC_ARM_AAPCS); } case CallingConv::ARM_AAPCS_VFP: return (Return ? RetCC_ARM_AAPCS_VFP : CC_ARM_AAPCS_VFP); case CallingConv::ARM_AAPCS: return (Return ? RetCC_ARM_AAPCS : CC_ARM_AAPCS); case CallingConv::ARM_APCS: return (Return ? RetCC_ARM_APCS : CC_ARM_APCS); + case CallingConv::GHC: + return (Return ? RetCC_ARM_APCS : CC_ARM_APCS_GHC); } } /// LowerCallResult - Lower the result values of a call into the /// appropriate copies out of appropriate physical registers. SDValue ARMTargetLowering::LowerCallResult(SDValue Chain, SDValue InFlag, CallingConv::ID CallConv, bool isVarArg, const SmallVectorImpl &Ins, DebugLoc dl, SelectionDAG &DAG, SmallVectorImpl &InVals) const { // Assign locations to each value returned by this call. SmallVector RVLocs; ARMCCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(), getTargetMachine(), RVLocs, *DAG.getContext(), Call); CCInfo.AnalyzeCallResult(Ins, CCAssignFnForNode(CallConv, /* Return*/ true, isVarArg)); // Copy all of the result registers out of their specified physreg. for (unsigned i = 0; i != RVLocs.size(); ++i) { CCValAssign VA = RVLocs[i]; SDValue Val; if (VA.needsCustom()) { // Handle f64 or half of a v2f64. SDValue Lo = DAG.getCopyFromReg(Chain, dl, VA.getLocReg(), MVT::i32, InFlag); Chain = Lo.getValue(1); InFlag = Lo.getValue(2); VA = RVLocs[++i]; // skip ahead to next loc SDValue Hi = DAG.getCopyFromReg(Chain, dl, VA.getLocReg(), MVT::i32, InFlag); Chain = Hi.getValue(1); InFlag = Hi.getValue(2); Val = DAG.getNode(ARMISD::VMOVDRR, dl, MVT::f64, Lo, Hi); if (VA.getLocVT() == MVT::v2f64) { SDValue Vec = DAG.getNode(ISD::UNDEF, dl, MVT::v2f64); Vec = DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, MVT::v2f64, Vec, Val, DAG.getConstant(0, MVT::i32)); VA = RVLocs[++i]; // skip ahead to next loc Lo = DAG.getCopyFromReg(Chain, dl, VA.getLocReg(), MVT::i32, InFlag); Chain = Lo.getValue(1); InFlag = Lo.getValue(2); VA = RVLocs[++i]; // skip ahead to next loc Hi = DAG.getCopyFromReg(Chain, dl, VA.getLocReg(), MVT::i32, InFlag); Chain = Hi.getValue(1); InFlag = Hi.getValue(2); Val = DAG.getNode(ARMISD::VMOVDRR, dl, MVT::f64, Lo, Hi); Val = DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, MVT::v2f64, Vec, Val, DAG.getConstant(1, MVT::i32)); } } else { Val = DAG.getCopyFromReg(Chain, dl, VA.getLocReg(), VA.getLocVT(), InFlag); Chain = Val.getValue(1); InFlag = Val.getValue(2); } switch (VA.getLocInfo()) { default: llvm_unreachable("Unknown loc info!"); case CCValAssign::Full: break; case CCValAssign::BCvt: Val = DAG.getNode(ISD::BITCAST, dl, VA.getValVT(), Val); break; } InVals.push_back(Val); } return Chain; } /// LowerMemOpCallTo - Store the argument to the stack. 
SDValue ARMTargetLowering::LowerMemOpCallTo(SDValue Chain, SDValue StackPtr, SDValue Arg, DebugLoc dl, SelectionDAG &DAG, const CCValAssign &VA, ISD::ArgFlagsTy Flags) const { unsigned LocMemOffset = VA.getLocMemOffset(); SDValue PtrOff = DAG.getIntPtrConstant(LocMemOffset); PtrOff = DAG.getNode(ISD::ADD, dl, getPointerTy(), StackPtr, PtrOff); return DAG.getStore(Chain, dl, Arg, PtrOff, MachinePointerInfo::getStack(LocMemOffset), false, false, 0); } void ARMTargetLowering::PassF64ArgInRegs(DebugLoc dl, SelectionDAG &DAG, SDValue Chain, SDValue &Arg, RegsToPassVector &RegsToPass, CCValAssign &VA, CCValAssign &NextVA, SDValue &StackPtr, SmallVector &MemOpChains, ISD::ArgFlagsTy Flags) const { SDValue fmrrd = DAG.getNode(ARMISD::VMOVRRD, dl, DAG.getVTList(MVT::i32, MVT::i32), Arg); RegsToPass.push_back(std::make_pair(VA.getLocReg(), fmrrd)); if (NextVA.isRegLoc()) RegsToPass.push_back(std::make_pair(NextVA.getLocReg(), fmrrd.getValue(1))); else { assert(NextVA.isMemLoc()); if (StackPtr.getNode() == 0) StackPtr = DAG.getCopyFromReg(Chain, dl, ARM::SP, getPointerTy()); MemOpChains.push_back(LowerMemOpCallTo(Chain, StackPtr, fmrrd.getValue(1), dl, DAG, NextVA, Flags)); } } /// LowerCall - Lowering a call into a callseq_start <- /// ARMISD:CALL <- callseq_end chain. Also add input and output parameter /// nodes. SDValue ARMTargetLowering::LowerCall(SDValue Chain, SDValue Callee, CallingConv::ID CallConv, bool isVarArg, bool &isTailCall, const SmallVectorImpl &Outs, const SmallVectorImpl &OutVals, const SmallVectorImpl &Ins, DebugLoc dl, SelectionDAG &DAG, SmallVectorImpl &InVals) const { MachineFunction &MF = DAG.getMachineFunction(); bool IsStructRet = (Outs.empty()) ? false : Outs[0].Flags.isSRet(); bool IsSibCall = false; // Disable tail calls if they're not supported. if (!EnableARMTailCalls && !Subtarget->supportsTailCall()) isTailCall = false; if (isTailCall) { // Check if it's really possible to do a tail call. isTailCall = IsEligibleForTailCallOptimization(Callee, CallConv, isVarArg, IsStructRet, MF.getFunction()->hasStructRetAttr(), Outs, OutVals, Ins, DAG); // We don't support GuaranteedTailCallOpt for ARM, only automatically // detected sibcalls. if (isTailCall) { ++NumTailCalls; IsSibCall = true; } } // Analyze operands of the call, assigning locations to each operand. SmallVector ArgLocs; ARMCCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(), getTargetMachine(), ArgLocs, *DAG.getContext(), Call); CCInfo.AnalyzeCallOperands(Outs, CCAssignFnForNode(CallConv, /* Return*/ false, isVarArg)); // Get a count of how many bytes are to be pushed on the stack. unsigned NumBytes = CCInfo.getNextStackOffset(); // For tail calls, memory operands are available in our caller's stack. if (IsSibCall) NumBytes = 0; // Adjust the stack pointer for the new arguments... // These operations are automatically eliminated by the prolog/epilog pass if (!IsSibCall) Chain = DAG.getCALLSEQ_START(Chain, DAG.getIntPtrConstant(NumBytes, true)); SDValue StackPtr = DAG.getCopyFromReg(Chain, dl, ARM::SP, getPointerTy()); RegsToPassVector RegsToPass; SmallVector MemOpChains; // Walk the register/memloc assignments, inserting copies/loads. In the case // of tail call optimization, arguments are handled later. for (unsigned i = 0, realArgIdx = 0, e = ArgLocs.size(); i != e; ++i, ++realArgIdx) { CCValAssign &VA = ArgLocs[i]; SDValue Arg = OutVals[realArgIdx]; ISD::ArgFlagsTy Flags = Outs[realArgIdx].Flags; bool isByVal = Flags.isByVal(); // Promote the value if needed. 
switch (VA.getLocInfo()) { default: llvm_unreachable("Unknown loc info!"); case CCValAssign::Full: break; case CCValAssign::SExt: Arg = DAG.getNode(ISD::SIGN_EXTEND, dl, VA.getLocVT(), Arg); break; case CCValAssign::ZExt: Arg = DAG.getNode(ISD::ZERO_EXTEND, dl, VA.getLocVT(), Arg); break; case CCValAssign::AExt: Arg = DAG.getNode(ISD::ANY_EXTEND, dl, VA.getLocVT(), Arg); break; case CCValAssign::BCvt: Arg = DAG.getNode(ISD::BITCAST, dl, VA.getLocVT(), Arg); break; } // f64 and v2f64 might be passed in i32 pairs and must be split into pieces if (VA.needsCustom()) { if (VA.getLocVT() == MVT::v2f64) { SDValue Op0 = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::f64, Arg, DAG.getConstant(0, MVT::i32)); SDValue Op1 = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::f64, Arg, DAG.getConstant(1, MVT::i32)); PassF64ArgInRegs(dl, DAG, Chain, Op0, RegsToPass, VA, ArgLocs[++i], StackPtr, MemOpChains, Flags); VA = ArgLocs[++i]; // skip ahead to next loc if (VA.isRegLoc()) { PassF64ArgInRegs(dl, DAG, Chain, Op1, RegsToPass, VA, ArgLocs[++i], StackPtr, MemOpChains, Flags); } else { assert(VA.isMemLoc()); MemOpChains.push_back(LowerMemOpCallTo(Chain, StackPtr, Op1, dl, DAG, VA, Flags)); } } else { PassF64ArgInRegs(dl, DAG, Chain, Arg, RegsToPass, VA, ArgLocs[++i], StackPtr, MemOpChains, Flags); } } else if (VA.isRegLoc()) { RegsToPass.push_back(std::make_pair(VA.getLocReg(), Arg)); } else if (isByVal) { assert(VA.isMemLoc()); unsigned offset = 0; // True if this byval aggregate will be split between registers // and memory. if (CCInfo.isFirstByValRegValid()) { EVT PtrVT = DAG.getTargetLoweringInfo().getPointerTy(); unsigned int i, j; for (i = 0, j = CCInfo.getFirstByValReg(); j < ARM::R4; i++, j++) { SDValue Const = DAG.getConstant(4*i, MVT::i32); SDValue AddArg = DAG.getNode(ISD::ADD, dl, PtrVT, Arg, Const); SDValue Load = DAG.getLoad(PtrVT, dl, Chain, AddArg, MachinePointerInfo(), false, false, 0); MemOpChains.push_back(Load.getValue(1)); RegsToPass.push_back(std::make_pair(j, Load)); } offset = ARM::R4 - CCInfo.getFirstByValReg(); CCInfo.clearFirstByValReg(); } unsigned LocMemOffset = VA.getLocMemOffset(); SDValue StkPtrOff = DAG.getIntPtrConstant(LocMemOffset); SDValue Dst = DAG.getNode(ISD::ADD, dl, getPointerTy(), StackPtr, StkPtrOff); SDValue SrcOffset = DAG.getIntPtrConstant(4*offset); SDValue Src = DAG.getNode(ISD::ADD, dl, getPointerTy(), Arg, SrcOffset); SDValue SizeNode = DAG.getConstant(Flags.getByValSize() - 4*offset, MVT::i32); // TODO: Disable AlwaysInline when it becomes possible // to emit a nested call sequence. MemOpChains.push_back(DAG.getMemcpy(Chain, dl, Dst, Src, SizeNode, Flags.getByValAlign(), /*isVolatile=*/false, /*AlwaysInline=*/true, MachinePointerInfo(0), MachinePointerInfo(0))); } else if (!IsSibCall) { assert(VA.isMemLoc()); MemOpChains.push_back(LowerMemOpCallTo(Chain, StackPtr, Arg, dl, DAG, VA, Flags)); } } if (!MemOpChains.empty()) Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, &MemOpChains[0], MemOpChains.size()); // Build a sequence of copy-to-reg nodes chained together with token chain // and flag operands which copy the outgoing args into the appropriate regs. SDValue InFlag; // Tail call byval lowering might overwrite argument registers so in case of // tail call optimization the copies to registers are lowered later. 
if (!isTailCall) for (unsigned i = 0, e = RegsToPass.size(); i != e; ++i) { Chain = DAG.getCopyToReg(Chain, dl, RegsToPass[i].first, RegsToPass[i].second, InFlag); InFlag = Chain.getValue(1); } // For tail calls lower the arguments to the 'real' stack slot. if (isTailCall) { // Force all the incoming stack arguments to be loaded from the stack // before any new outgoing arguments are stored to the stack, because the // outgoing stack slots may alias the incoming argument stack slots, and // the alias isn't otherwise explicit. This is slightly more conservative // than necessary, because it means that each store effectively depends // on every argument instead of just those arguments it would clobber. // Do not flag preceding copytoreg stuff together with the following stuff. InFlag = SDValue(); for (unsigned i = 0, e = RegsToPass.size(); i != e; ++i) { Chain = DAG.getCopyToReg(Chain, dl, RegsToPass[i].first, RegsToPass[i].second, InFlag); InFlag = Chain.getValue(1); } InFlag =SDValue(); } // If the callee is a GlobalAddress/ExternalSymbol node (quite common, every // direct call is) turn it into a TargetGlobalAddress/TargetExternalSymbol // node so that legalize doesn't hack it. bool isDirect = false; bool isARMFunc = false; bool isLocalARMFunc = false; ARMFunctionInfo *AFI = MF.getInfo(); if (EnableARMLongCalls) { assert (getTargetMachine().getRelocationModel() == Reloc::Static && "long-calls with non-static relocation model!"); // Handle a global address or an external symbol. If it's not one of // those, the target's already in a register, so we don't need to do // anything extra. if (GlobalAddressSDNode *G = dyn_cast(Callee)) { const GlobalValue *GV = G->getGlobal(); // Create a constant pool entry for the callee address unsigned ARMPCLabelIndex = AFI->createPICLabelUId(); ARMConstantPoolValue *CPV = ARMConstantPoolConstant::Create(GV, ARMPCLabelIndex, ARMCP::CPValue, 0); // Get the address of the callee into a register SDValue CPAddr = DAG.getTargetConstantPool(CPV, getPointerTy(), 4); CPAddr = DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, CPAddr); Callee = DAG.getLoad(getPointerTy(), dl, DAG.getEntryNode(), CPAddr, MachinePointerInfo::getConstantPool(), false, false, 0); } else if (ExternalSymbolSDNode *S=dyn_cast(Callee)) { const char *Sym = S->getSymbol(); // Create a constant pool entry for the callee address unsigned ARMPCLabelIndex = AFI->createPICLabelUId(); ARMConstantPoolValue *CPV = ARMConstantPoolSymbol::Create(*DAG.getContext(), Sym, ARMPCLabelIndex, 0); // Get the address of the callee into a register SDValue CPAddr = DAG.getTargetConstantPool(CPV, getPointerTy(), 4); CPAddr = DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, CPAddr); Callee = DAG.getLoad(getPointerTy(), dl, DAG.getEntryNode(), CPAddr, MachinePointerInfo::getConstantPool(), false, false, 0); } } else if (GlobalAddressSDNode *G = dyn_cast(Callee)) { const GlobalValue *GV = G->getGlobal(); isDirect = true; bool isExt = GV->isDeclaration() || GV->isWeakForLinker(); bool isStub = (isExt && Subtarget->isTargetDarwin()) && getTargetMachine().getRelocationModel() != Reloc::Static; isARMFunc = !Subtarget->isThumb() || isStub; // ARM call to a local ARM function is predicable. isLocalARMFunc = !Subtarget->isThumb() && (!isExt || !ARMInterworking); // tBX takes a register source operand. 
if (isARMFunc && Subtarget->isThumb1Only() && !Subtarget->hasV5TOps()) { unsigned ARMPCLabelIndex = AFI->createPICLabelUId(); ARMConstantPoolValue *CPV = ARMConstantPoolConstant::Create(GV, ARMPCLabelIndex, ARMCP::CPValue, 4); SDValue CPAddr = DAG.getTargetConstantPool(CPV, getPointerTy(), 4); CPAddr = DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, CPAddr); Callee = DAG.getLoad(getPointerTy(), dl, DAG.getEntryNode(), CPAddr, MachinePointerInfo::getConstantPool(), false, false, 0); SDValue PICLabel = DAG.getConstant(ARMPCLabelIndex, MVT::i32); Callee = DAG.getNode(ARMISD::PIC_ADD, dl, getPointerTy(), Callee, PICLabel); } else { // On ELF targets for PIC code, direct calls should go through the PLT unsigned OpFlags = 0; if (Subtarget->isTargetELF() && getTargetMachine().getRelocationModel() == Reloc::PIC_) OpFlags = ARMII::MO_PLT; Callee = DAG.getTargetGlobalAddress(GV, dl, getPointerTy(), 0, OpFlags); } } else if (ExternalSymbolSDNode *S = dyn_cast(Callee)) { isDirect = true; bool isStub = Subtarget->isTargetDarwin() && getTargetMachine().getRelocationModel() != Reloc::Static; isARMFunc = !Subtarget->isThumb() || isStub; // tBX takes a register source operand. const char *Sym = S->getSymbol(); if (isARMFunc && Subtarget->isThumb1Only() && !Subtarget->hasV5TOps()) { unsigned ARMPCLabelIndex = AFI->createPICLabelUId(); ARMConstantPoolValue *CPV = ARMConstantPoolSymbol::Create(*DAG.getContext(), Sym, ARMPCLabelIndex, 4); SDValue CPAddr = DAG.getTargetConstantPool(CPV, getPointerTy(), 4); CPAddr = DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, CPAddr); Callee = DAG.getLoad(getPointerTy(), dl, DAG.getEntryNode(), CPAddr, MachinePointerInfo::getConstantPool(), false, false, 0); SDValue PICLabel = DAG.getConstant(ARMPCLabelIndex, MVT::i32); Callee = DAG.getNode(ARMISD::PIC_ADD, dl, getPointerTy(), Callee, PICLabel); } else { unsigned OpFlags = 0; // On ELF targets for PIC code, direct calls should go through the PLT if (Subtarget->isTargetELF() && getTargetMachine().getRelocationModel() == Reloc::PIC_) OpFlags = ARMII::MO_PLT; Callee = DAG.getTargetExternalSymbol(Sym, getPointerTy(), OpFlags); } } // FIXME: handle tail calls differently. unsigned CallOpc; if (Subtarget->isThumb()) { if ((!isDirect || isARMFunc) && !Subtarget->hasV5TOps()) CallOpc = ARMISD::CALL_NOLINK; else CallOpc = isARMFunc ? ARMISD::CALL : ARMISD::tCALL; } else { CallOpc = (isDirect || Subtarget->hasV5TOps()) ? (isLocalARMFunc ? ARMISD::CALL_PRED : ARMISD::CALL) : ARMISD::CALL_NOLINK; } std::vector Ops; Ops.push_back(Chain); Ops.push_back(Callee); // Add argument registers to the end of the list so that they are known live // into the call. for (unsigned i = 0, e = RegsToPass.size(); i != e; ++i) Ops.push_back(DAG.getRegister(RegsToPass[i].first, RegsToPass[i].second.getValueType())); if (InFlag.getNode()) Ops.push_back(InFlag); SDVTList NodeTys = DAG.getVTList(MVT::Other, MVT::Glue); if (isTailCall) return DAG.getNode(ARMISD::TC_RETURN, dl, NodeTys, &Ops[0], Ops.size()); // Returns a chain and a flag for retval copy to use. Chain = DAG.getNode(CallOpc, dl, NodeTys, &Ops[0], Ops.size()); InFlag = Chain.getValue(1); Chain = DAG.getCALLSEQ_END(Chain, DAG.getIntPtrConstant(NumBytes, true), DAG.getIntPtrConstant(0, true), InFlag); if (!Ins.empty()) InFlag = Chain.getValue(1); // Handle result values, copying them out of physregs into vregs that we // return. return LowerCallResult(Chain, InFlag, CallConv, isVarArg, Ins, dl, DAG, InVals); } /// HandleByVal - Every parameter *after* a byval parameter is passed /// on the stack. 
Remember the next parameter register to allocate, /// and then confiscate the rest of the parameter registers to insure /// this. void llvm::ARMTargetLowering::HandleByVal(CCState *State, unsigned &size) const { unsigned reg = State->AllocateReg(GPRArgRegs, 4); assert((State->getCallOrPrologue() == Prologue || State->getCallOrPrologue() == Call) && "unhandled ParmContext"); if ((!State->isFirstByValRegValid()) && (ARM::R0 <= reg) && (reg <= ARM::R3)) { State->setFirstByValReg(reg); // At a call site, a byval parameter that is split between // registers and memory needs its size truncated here. In a // function prologue, such byval parameters are reassembled in // memory, and are not truncated. if (State->getCallOrPrologue() == Call) { unsigned excess = 4 * (ARM::R4 - reg); assert(size >= excess && "expected larger existing stack allocation"); size -= excess; } } // Confiscate any remaining parameter registers to preclude their // assignment to subsequent parameters. while (State->AllocateReg(GPRArgRegs, 4)) ; } /// MatchingStackOffset - Return true if the given stack call argument is /// already available in the same position (relatively) of the caller's /// incoming argument stack. static bool MatchingStackOffset(SDValue Arg, unsigned Offset, ISD::ArgFlagsTy Flags, MachineFrameInfo *MFI, const MachineRegisterInfo *MRI, const ARMInstrInfo *TII) { unsigned Bytes = Arg.getValueType().getSizeInBits() / 8; int FI = INT_MAX; if (Arg.getOpcode() == ISD::CopyFromReg) { unsigned VR = cast(Arg.getOperand(1))->getReg(); if (!TargetRegisterInfo::isVirtualRegister(VR)) return false; MachineInstr *Def = MRI->getVRegDef(VR); if (!Def) return false; if (!Flags.isByVal()) { if (!TII->isLoadFromStackSlot(Def, FI)) return false; } else { return false; } } else if (LoadSDNode *Ld = dyn_cast(Arg)) { if (Flags.isByVal()) // ByVal argument is passed in as a pointer but it's now being // dereferenced. e.g. // define @foo(%struct.X* %A) { // tail call @bar(%struct.X* byval %A) // } return false; SDValue Ptr = Ld->getBasePtr(); FrameIndexSDNode *FINode = dyn_cast(Ptr); if (!FINode) return false; FI = FINode->getIndex(); } else return false; assert(FI != INT_MAX); if (!MFI->isFixedObjectIndex(FI)) return false; return Offset == MFI->getObjectOffset(FI) && Bytes == MFI->getObjectSize(FI); } /// IsEligibleForTailCallOptimization - Check whether the call is eligible /// for tail call optimization. Targets which want to do tail call /// optimization should implement this function. bool ARMTargetLowering::IsEligibleForTailCallOptimization(SDValue Callee, CallingConv::ID CalleeCC, bool isVarArg, bool isCalleeStructRet, bool isCallerStructRet, const SmallVectorImpl &Outs, const SmallVectorImpl &OutVals, const SmallVectorImpl &Ins, SelectionDAG& DAG) const { const Function *CallerF = DAG.getMachineFunction().getFunction(); CallingConv::ID CallerCC = CallerF->getCallingConv(); bool CCMatch = CallerCC == CalleeCC; // Look for obvious safe cases to perform tail call optimization that do not // require ABI changes. This is what gcc calls sibcall. // Do not sibcall optimize vararg calls unless the call site is not passing // any arguments. if (isVarArg && !Outs.empty()) return false; // Also avoid sibcall optimization if either caller or callee uses struct // return semantics. if (isCalleeStructRet || isCallerStructRet) return false; // FIXME: Completely disable sibcall for Thumb1 since Thumb1RegisterInfo:: // emitEpilogue is not ready for them. 
Thumb tail calls also use t2B, as // the Thumb1 16-bit unconditional branch doesn't have sufficient relocation // support in the assembler and linker to be used. This would need to be // fixed to fully support tail calls in Thumb1. // // Doing this is tricky, since the LDM/POP instruction on Thumb doesn't take // LR. This means if we need to reload LR, it takes an extra instructions, // which outweighs the value of the tail call; but here we don't know yet // whether LR is going to be used. Probably the right approach is to // generate the tail call here and turn it back into CALL/RET in // emitEpilogue if LR is used. // Thumb1 PIC calls to external symbols use BX, so they can be tail calls, // but we need to make sure there are enough registers; the only valid // registers are the 4 used for parameters. We don't currently do this // case. if (Subtarget->isThumb1Only()) return false; // If the calling conventions do not match, then we'd better make sure the // results are returned in the same way as what the caller expects. if (!CCMatch) { SmallVector RVLocs1; ARMCCState CCInfo1(CalleeCC, false, DAG.getMachineFunction(), getTargetMachine(), RVLocs1, *DAG.getContext(), Call); CCInfo1.AnalyzeCallResult(Ins, CCAssignFnForNode(CalleeCC, true, isVarArg)); SmallVector RVLocs2; ARMCCState CCInfo2(CallerCC, false, DAG.getMachineFunction(), getTargetMachine(), RVLocs2, *DAG.getContext(), Call); CCInfo2.AnalyzeCallResult(Ins, CCAssignFnForNode(CallerCC, true, isVarArg)); if (RVLocs1.size() != RVLocs2.size()) return false; for (unsigned i = 0, e = RVLocs1.size(); i != e; ++i) { if (RVLocs1[i].isRegLoc() != RVLocs2[i].isRegLoc()) return false; if (RVLocs1[i].getLocInfo() != RVLocs2[i].getLocInfo()) return false; if (RVLocs1[i].isRegLoc()) { if (RVLocs1[i].getLocReg() != RVLocs2[i].getLocReg()) return false; } else { if (RVLocs1[i].getLocMemOffset() != RVLocs2[i].getLocMemOffset()) return false; } } } // If the callee takes no arguments then go on to check the results of the // call. if (!Outs.empty()) { // Check if stack adjustment is needed. For now, do not do this if any // argument is passed on the stack. SmallVector ArgLocs; ARMCCState CCInfo(CalleeCC, isVarArg, DAG.getMachineFunction(), getTargetMachine(), ArgLocs, *DAG.getContext(), Call); CCInfo.AnalyzeCallOperands(Outs, CCAssignFnForNode(CalleeCC, false, isVarArg)); if (CCInfo.getNextStackOffset()) { MachineFunction &MF = DAG.getMachineFunction(); // Check if the arguments are already laid out in the right way as // the caller's fixed stack objects. MachineFrameInfo *MFI = MF.getFrameInfo(); const MachineRegisterInfo *MRI = &MF.getRegInfo(); const ARMInstrInfo *TII = ((ARMTargetMachine&)getTargetMachine()).getInstrInfo(); for (unsigned i = 0, realArgIdx = 0, e = ArgLocs.size(); i != e; ++i, ++realArgIdx) { CCValAssign &VA = ArgLocs[i]; EVT RegVT = VA.getLocVT(); SDValue Arg = OutVals[realArgIdx]; ISD::ArgFlagsTy Flags = Outs[realArgIdx].Flags; if (VA.getLocInfo() == CCValAssign::Indirect) return false; if (VA.needsCustom()) { // f64 and vector types are split into multiple registers or // register/stack-slot combinations. The types will not match // the registers; give up on memory f64 refs until we figure // out what to do about this. 
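// Note (added): a custom-lowered f64 occupies two consecutive ArgLocs and a
// v2f64 occupies four; the checks below require all of them to be register
// locations before the call can be turned into a sibcall.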
if (!VA.isRegLoc()) return false; if (!ArgLocs[++i].isRegLoc()) return false; if (RegVT == MVT::v2f64) { if (!ArgLocs[++i].isRegLoc()) return false; if (!ArgLocs[++i].isRegLoc()) return false; } } else if (!VA.isRegLoc()) { if (!MatchingStackOffset(Arg, VA.getLocMemOffset(), Flags, MFI, MRI, TII)) return false; } } } } return true; } SDValue ARMTargetLowering::LowerReturn(SDValue Chain, CallingConv::ID CallConv, bool isVarArg, const SmallVectorImpl &Outs, const SmallVectorImpl &OutVals, DebugLoc dl, SelectionDAG &DAG) const { // CCValAssign - represent the assignment of the return value to a location. SmallVector RVLocs; // CCState - Info about the registers and stack slots. ARMCCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(), getTargetMachine(), RVLocs, *DAG.getContext(), Call); // Analyze outgoing return values. CCInfo.AnalyzeReturn(Outs, CCAssignFnForNode(CallConv, /* Return */ true, isVarArg)); // If this is the first return lowered for this function, add // the regs to the liveout set for the function. if (DAG.getMachineFunction().getRegInfo().liveout_empty()) { for (unsigned i = 0; i != RVLocs.size(); ++i) if (RVLocs[i].isRegLoc()) DAG.getMachineFunction().getRegInfo().addLiveOut(RVLocs[i].getLocReg()); } SDValue Flag; // Copy the result values into the output registers. for (unsigned i = 0, realRVLocIdx = 0; i != RVLocs.size(); ++i, ++realRVLocIdx) { CCValAssign &VA = RVLocs[i]; assert(VA.isRegLoc() && "Can only return in registers!"); SDValue Arg = OutVals[realRVLocIdx]; switch (VA.getLocInfo()) { default: llvm_unreachable("Unknown loc info!"); case CCValAssign::Full: break; case CCValAssign::BCvt: Arg = DAG.getNode(ISD::BITCAST, dl, VA.getLocVT(), Arg); break; } if (VA.needsCustom()) { if (VA.getLocVT() == MVT::v2f64) { // Extract the first half and return it in two registers. SDValue Half = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::f64, Arg, DAG.getConstant(0, MVT::i32)); SDValue HalfGPRs = DAG.getNode(ARMISD::VMOVRRD, dl, DAG.getVTList(MVT::i32, MVT::i32), Half); Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(), HalfGPRs, Flag); Flag = Chain.getValue(1); VA = RVLocs[++i]; // skip ahead to next loc Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(), HalfGPRs.getValue(1), Flag); Flag = Chain.getValue(1); VA = RVLocs[++i]; // skip ahead to next loc // Extract the 2nd half and fall through to handle it as an f64 value. Arg = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::f64, Arg, DAG.getConstant(1, MVT::i32)); } // Legalize ret f64 -> ret 2 x i32. We always have fmrrd if f64 is // available. SDValue fmrrd = DAG.getNode(ARMISD::VMOVRRD, dl, DAG.getVTList(MVT::i32, MVT::i32), &Arg, 1); Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(), fmrrd, Flag); Flag = Chain.getValue(1); VA = RVLocs[++i]; // skip ahead to next loc Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(), fmrrd.getValue(1), Flag); } else Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(), Arg, Flag); // Guarantee that all emitted copies are // stuck together, avoiding something bad. 
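// Note (added): each CopyToReg consumes the glue produced by the previous one,
// so the return-value copies stay adjacent and nothing else can be scheduled
// between them.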
Flag = Chain.getValue(1); } SDValue result; if (Flag.getNode()) result = DAG.getNode(ARMISD::RET_FLAG, dl, MVT::Other, Chain, Flag); else // Return Void result = DAG.getNode(ARMISD::RET_FLAG, dl, MVT::Other, Chain); return result; } bool ARMTargetLowering::isUsedByReturnOnly(SDNode *N) const { if (N->getNumValues() != 1) return false; if (!N->hasNUsesOfValue(1, 0)) return false; unsigned NumCopies = 0; SDNode* Copies[2]; SDNode *Use = *N->use_begin(); if (Use->getOpcode() == ISD::CopyToReg) { Copies[NumCopies++] = Use; } else if (Use->getOpcode() == ARMISD::VMOVRRD) { // f64 returned in a pair of GPRs. for (SDNode::use_iterator UI = Use->use_begin(), UE = Use->use_end(); UI != UE; ++UI) { if (UI->getOpcode() != ISD::CopyToReg) return false; Copies[UI.getUse().getResNo()] = *UI; ++NumCopies; } } else if (Use->getOpcode() == ISD::BITCAST) { // f32 returned in a single GPR. if (!Use->hasNUsesOfValue(1, 0)) return false; Use = *Use->use_begin(); if (Use->getOpcode() != ISD::CopyToReg || !Use->hasNUsesOfValue(1, 0)) return false; Copies[NumCopies++] = Use; } else { return false; } if (NumCopies != 1 && NumCopies != 2) return false; bool HasRet = false; for (unsigned i = 0; i < NumCopies; ++i) { SDNode *Copy = Copies[i]; for (SDNode::use_iterator UI = Copy->use_begin(), UE = Copy->use_end(); UI != UE; ++UI) { if (UI->getOpcode() == ISD::CopyToReg) { SDNode *Use = *UI; if (Use == Copies[0] || Use == Copies[1]) continue; return false; } if (UI->getOpcode() != ARMISD::RET_FLAG) return false; HasRet = true; } } return HasRet; } bool ARMTargetLowering::mayBeEmittedAsTailCall(CallInst *CI) const { if (!EnableARMTailCalls) return false; if (!CI->isTailCall()) return false; return !Subtarget->isThumb1Only(); } // ConstantPool, JumpTable, GlobalAddress, and ExternalSymbol are lowered as // their target counterpart wrapped in the ARMISD::Wrapper node. Suppose N is // one of the above mentioned nodes. It has to be wrapped because otherwise // Select(N) returns N. So the raw TargetGlobalAddress nodes, etc. can only // be used to form addressing mode. These wrapped nodes will be selected // into MOVi. static SDValue LowerConstantPool(SDValue Op, SelectionDAG &DAG) { EVT PtrVT = Op.getValueType(); // FIXME there is no actual debug info here DebugLoc dl = Op.getDebugLoc(); ConstantPoolSDNode *CP = cast(Op); SDValue Res; if (CP->isMachineConstantPoolEntry()) Res = DAG.getTargetConstantPool(CP->getMachineCPVal(), PtrVT, CP->getAlignment()); else Res = DAG.getTargetConstantPool(CP->getConstVal(), PtrVT, CP->getAlignment()); return DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, Res); } unsigned ARMTargetLowering::getJumpTableEncoding() const { return MachineJumpTableInfo::EK_Inline; } SDValue ARMTargetLowering::LowerBlockAddress(SDValue Op, SelectionDAG &DAG) const { MachineFunction &MF = DAG.getMachineFunction(); ARMFunctionInfo *AFI = MF.getInfo(); unsigned ARMPCLabelIndex = 0; DebugLoc DL = Op.getDebugLoc(); EVT PtrVT = getPointerTy(); const BlockAddress *BA = cast(Op)->getBlockAddress(); Reloc::Model RelocM = getTargetMachine().getRelocationModel(); SDValue CPAddr; if (RelocM == Reloc::Static) { CPAddr = DAG.getTargetConstantPool(BA, PtrVT, 4); } else { unsigned PCAdj = Subtarget->isThumb() ? 
4 : 8; ARMPCLabelIndex = AFI->createPICLabelUId(); ARMConstantPoolValue *CPV = ARMConstantPoolConstant::Create(BA, ARMPCLabelIndex, ARMCP::CPBlockAddress, PCAdj); CPAddr = DAG.getTargetConstantPool(CPV, PtrVT, 4); } CPAddr = DAG.getNode(ARMISD::Wrapper, DL, PtrVT, CPAddr); SDValue Result = DAG.getLoad(PtrVT, DL, DAG.getEntryNode(), CPAddr, MachinePointerInfo::getConstantPool(), false, false, 0); if (RelocM == Reloc::Static) return Result; SDValue PICLabel = DAG.getConstant(ARMPCLabelIndex, MVT::i32); return DAG.getNode(ARMISD::PIC_ADD, DL, PtrVT, Result, PICLabel); } // Lower ISD::GlobalTLSAddress using the "general dynamic" model SDValue ARMTargetLowering::LowerToTLSGeneralDynamicModel(GlobalAddressSDNode *GA, SelectionDAG &DAG) const { DebugLoc dl = GA->getDebugLoc(); EVT PtrVT = getPointerTy(); unsigned char PCAdj = Subtarget->isThumb() ? 4 : 8; MachineFunction &MF = DAG.getMachineFunction(); ARMFunctionInfo *AFI = MF.getInfo(); unsigned ARMPCLabelIndex = AFI->createPICLabelUId(); ARMConstantPoolValue *CPV = ARMConstantPoolConstant::Create(GA->getGlobal(), ARMPCLabelIndex, ARMCP::CPValue, PCAdj, ARMCP::TLSGD, true); SDValue Argument = DAG.getTargetConstantPool(CPV, PtrVT, 4); Argument = DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, Argument); Argument = DAG.getLoad(PtrVT, dl, DAG.getEntryNode(), Argument, MachinePointerInfo::getConstantPool(), false, false, 0); SDValue Chain = Argument.getValue(1); SDValue PICLabel = DAG.getConstant(ARMPCLabelIndex, MVT::i32); Argument = DAG.getNode(ARMISD::PIC_ADD, dl, PtrVT, Argument, PICLabel); // call __tls_get_addr. ArgListTy Args; ArgListEntry Entry; Entry.Node = Argument; Entry.Ty = (Type *) Type::getInt32Ty(*DAG.getContext()); Args.push_back(Entry); // FIXME: is there useful debug info available here? std::pair CallResult = LowerCallTo(Chain, (Type *) Type::getInt32Ty(*DAG.getContext()), false, false, false, false, 0, CallingConv::C, false, /*isReturnValueUsed=*/true, DAG.getExternalSymbol("__tls_get_addr", PtrVT), Args, DAG, dl); return CallResult.first; } // Lower ISD::GlobalTLSAddress using the "initial exec" or // "local exec" model. SDValue ARMTargetLowering::LowerToTLSExecModels(GlobalAddressSDNode *GA, SelectionDAG &DAG) const { const GlobalValue *GV = GA->getGlobal(); DebugLoc dl = GA->getDebugLoc(); SDValue Offset; SDValue Chain = DAG.getEntryNode(); EVT PtrVT = getPointerTy(); // Get the Thread Pointer SDValue ThreadPointer = DAG.getNode(ARMISD::THREAD_POINTER, dl, PtrVT); if (GV->isDeclaration()) { MachineFunction &MF = DAG.getMachineFunction(); ARMFunctionInfo *AFI = MF.getInfo(); unsigned ARMPCLabelIndex = AFI->createPICLabelUId(); // Initial exec model. unsigned char PCAdj = Subtarget->isThumb() ? 
4 : 8; ARMConstantPoolValue *CPV = ARMConstantPoolConstant::Create(GA->getGlobal(), ARMPCLabelIndex, ARMCP::CPValue, PCAdj, ARMCP::GOTTPOFF, true); Offset = DAG.getTargetConstantPool(CPV, PtrVT, 4); Offset = DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, Offset); Offset = DAG.getLoad(PtrVT, dl, Chain, Offset, MachinePointerInfo::getConstantPool(), false, false, 0); Chain = Offset.getValue(1); SDValue PICLabel = DAG.getConstant(ARMPCLabelIndex, MVT::i32); Offset = DAG.getNode(ARMISD::PIC_ADD, dl, PtrVT, Offset, PICLabel); Offset = DAG.getLoad(PtrVT, dl, Chain, Offset, MachinePointerInfo::getConstantPool(), false, false, 0); } else { // local exec model ARMConstantPoolValue *CPV = ARMConstantPoolConstant::Create(GV, ARMCP::TPOFF); Offset = DAG.getTargetConstantPool(CPV, PtrVT, 4); Offset = DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, Offset); Offset = DAG.getLoad(PtrVT, dl, Chain, Offset, MachinePointerInfo::getConstantPool(), false, false, 0); } // The address of the thread local variable is the add of the thread // pointer with the offset of the variable. return DAG.getNode(ISD::ADD, dl, PtrVT, ThreadPointer, Offset); } SDValue ARMTargetLowering::LowerGlobalTLSAddress(SDValue Op, SelectionDAG &DAG) const { // TODO: implement the "local dynamic" model assert(Subtarget->isTargetELF() && "TLS not implemented for non-ELF targets"); GlobalAddressSDNode *GA = cast(Op); // If the relocation model is PIC, use the "General Dynamic" TLS Model, // otherwise use the "Local Exec" TLS Model if (getTargetMachine().getRelocationModel() == Reloc::PIC_) return LowerToTLSGeneralDynamicModel(GA, DAG); else return LowerToTLSExecModels(GA, DAG); } SDValue ARMTargetLowering::LowerGlobalAddressELF(SDValue Op, SelectionDAG &DAG) const { EVT PtrVT = getPointerTy(); DebugLoc dl = Op.getDebugLoc(); const GlobalValue *GV = cast(Op)->getGlobal(); Reloc::Model RelocM = getTargetMachine().getRelocationModel(); if (RelocM == Reloc::PIC_) { bool UseGOTOFF = GV->hasLocalLinkage() || GV->hasHiddenVisibility(); ARMConstantPoolValue *CPV = ARMConstantPoolConstant::Create(GV, UseGOTOFF ? ARMCP::GOTOFF : ARMCP::GOT); SDValue CPAddr = DAG.getTargetConstantPool(CPV, PtrVT, 4); CPAddr = DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, CPAddr); SDValue Result = DAG.getLoad(PtrVT, dl, DAG.getEntryNode(), CPAddr, MachinePointerInfo::getConstantPool(), false, false, 0); SDValue Chain = Result.getValue(1); SDValue GOT = DAG.getGLOBAL_OFFSET_TABLE(PtrVT); Result = DAG.getNode(ISD::ADD, dl, PtrVT, Result, GOT); if (!UseGOTOFF) Result = DAG.getLoad(PtrVT, dl, Chain, Result, MachinePointerInfo::getGOT(), false, false, 0); return Result; } // If we have T2 ops, we can materialize the address directly via movt/movw // pair. This is always cheaper. if (Subtarget->useMovt()) { ++NumMovwMovt; // FIXME: Once remat is capable of dealing with instructions with register // operands, expand this into two nodes. 
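// Note (added): the Wrapper node around the TargetGlobalAddress is later
// selected into a movw/movt pair that materializes the address directly,
// avoiding a constant-pool load.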
return DAG.getNode(ARMISD::Wrapper, dl, PtrVT, DAG.getTargetGlobalAddress(GV, dl, PtrVT)); } else { SDValue CPAddr = DAG.getTargetConstantPool(GV, PtrVT, 4); CPAddr = DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, CPAddr); return DAG.getLoad(PtrVT, dl, DAG.getEntryNode(), CPAddr, MachinePointerInfo::getConstantPool(), false, false, 0); } } SDValue ARMTargetLowering::LowerGlobalAddressDarwin(SDValue Op, SelectionDAG &DAG) const { EVT PtrVT = getPointerTy(); DebugLoc dl = Op.getDebugLoc(); const GlobalValue *GV = cast(Op)->getGlobal(); Reloc::Model RelocM = getTargetMachine().getRelocationModel(); MachineFunction &MF = DAG.getMachineFunction(); ARMFunctionInfo *AFI = MF.getInfo(); // FIXME: Enable this for static codegen when tool issues are fixed. if (Subtarget->useMovt() && RelocM != Reloc::Static) { ++NumMovwMovt; // FIXME: Once remat is capable of dealing with instructions with register // operands, expand this into two nodes. if (RelocM == Reloc::Static) return DAG.getNode(ARMISD::Wrapper, dl, PtrVT, DAG.getTargetGlobalAddress(GV, dl, PtrVT)); unsigned Wrapper = (RelocM == Reloc::PIC_) ? ARMISD::WrapperPIC : ARMISD::WrapperDYN; SDValue Result = DAG.getNode(Wrapper, dl, PtrVT, DAG.getTargetGlobalAddress(GV, dl, PtrVT)); if (Subtarget->GVIsIndirectSymbol(GV, RelocM)) Result = DAG.getLoad(PtrVT, dl, DAG.getEntryNode(), Result, MachinePointerInfo::getGOT(), false, false, 0); return Result; } unsigned ARMPCLabelIndex = 0; SDValue CPAddr; if (RelocM == Reloc::Static) { CPAddr = DAG.getTargetConstantPool(GV, PtrVT, 4); } else { ARMPCLabelIndex = AFI->createPICLabelUId(); unsigned PCAdj = (RelocM != Reloc::PIC_) ? 0 : (Subtarget->isThumb()?4:8); ARMConstantPoolValue *CPV = ARMConstantPoolConstant::Create(GV, ARMPCLabelIndex, ARMCP::CPValue, PCAdj); CPAddr = DAG.getTargetConstantPool(CPV, PtrVT, 4); } CPAddr = DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, CPAddr); SDValue Result = DAG.getLoad(PtrVT, dl, DAG.getEntryNode(), CPAddr, MachinePointerInfo::getConstantPool(), false, false, 0); SDValue Chain = Result.getValue(1); if (RelocM == Reloc::PIC_) { SDValue PICLabel = DAG.getConstant(ARMPCLabelIndex, MVT::i32); Result = DAG.getNode(ARMISD::PIC_ADD, dl, PtrVT, Result, PICLabel); } if (Subtarget->GVIsIndirectSymbol(GV, RelocM)) Result = DAG.getLoad(PtrVT, dl, Chain, Result, MachinePointerInfo::getGOT(), false, false, 0); return Result; } SDValue ARMTargetLowering::LowerGLOBAL_OFFSET_TABLE(SDValue Op, SelectionDAG &DAG) const { assert(Subtarget->isTargetELF() && "GLOBAL OFFSET TABLE not implemented for non-ELF targets"); MachineFunction &MF = DAG.getMachineFunction(); ARMFunctionInfo *AFI = MF.getInfo(); unsigned ARMPCLabelIndex = AFI->createPICLabelUId(); EVT PtrVT = getPointerTy(); DebugLoc dl = Op.getDebugLoc(); unsigned PCAdj = Subtarget->isThumb() ? 
4 : 8; ARMConstantPoolValue *CPV = ARMConstantPoolSymbol::Create(*DAG.getContext(), "_GLOBAL_OFFSET_TABLE_", ARMPCLabelIndex, PCAdj); SDValue CPAddr = DAG.getTargetConstantPool(CPV, PtrVT, 4); CPAddr = DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, CPAddr); SDValue Result = DAG.getLoad(PtrVT, dl, DAG.getEntryNode(), CPAddr, MachinePointerInfo::getConstantPool(), false, false, 0); SDValue PICLabel = DAG.getConstant(ARMPCLabelIndex, MVT::i32); return DAG.getNode(ARMISD::PIC_ADD, dl, PtrVT, Result, PICLabel); } SDValue ARMTargetLowering::LowerEH_SJLJ_DISPATCHSETUP(SDValue Op, SelectionDAG &DAG) const { DebugLoc dl = Op.getDebugLoc(); return DAG.getNode(ARMISD::EH_SJLJ_DISPATCHSETUP, dl, MVT::Other, Op.getOperand(0), Op.getOperand(1)); } SDValue ARMTargetLowering::LowerEH_SJLJ_SETJMP(SDValue Op, SelectionDAG &DAG) const { DebugLoc dl = Op.getDebugLoc(); SDValue Val = DAG.getConstant(0, MVT::i32); return DAG.getNode(ARMISD::EH_SJLJ_SETJMP, dl, DAG.getVTList(MVT::i32, MVT::Other), Op.getOperand(0), Op.getOperand(1), Val); } SDValue ARMTargetLowering::LowerEH_SJLJ_LONGJMP(SDValue Op, SelectionDAG &DAG) const { DebugLoc dl = Op.getDebugLoc(); return DAG.getNode(ARMISD::EH_SJLJ_LONGJMP, dl, MVT::Other, Op.getOperand(0), Op.getOperand(1), DAG.getConstant(0, MVT::i32)); } SDValue ARMTargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op, SelectionDAG &DAG, const ARMSubtarget *Subtarget) const { unsigned IntNo = cast(Op.getOperand(0))->getZExtValue(); DebugLoc dl = Op.getDebugLoc(); switch (IntNo) { default: return SDValue(); // Don't custom lower most intrinsics. case Intrinsic::arm_thread_pointer: { EVT PtrVT = DAG.getTargetLoweringInfo().getPointerTy(); return DAG.getNode(ARMISD::THREAD_POINTER, dl, PtrVT); } case Intrinsic::eh_sjlj_lsda: { MachineFunction &MF = DAG.getMachineFunction(); ARMFunctionInfo *AFI = MF.getInfo(); unsigned ARMPCLabelIndex = AFI->createPICLabelUId(); EVT PtrVT = getPointerTy(); DebugLoc dl = Op.getDebugLoc(); Reloc::Model RelocM = getTargetMachine().getRelocationModel(); SDValue CPAddr; unsigned PCAdj = (RelocM != Reloc::PIC_) ? 0 : (Subtarget->isThumb() ? 4 : 8); ARMConstantPoolValue *CPV = ARMConstantPoolConstant::Create(MF.getFunction(), ARMPCLabelIndex, ARMCP::CPLSDA, PCAdj); CPAddr = DAG.getTargetConstantPool(CPV, PtrVT, 4); CPAddr = DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, CPAddr); SDValue Result = DAG.getLoad(PtrVT, dl, DAG.getEntryNode(), CPAddr, MachinePointerInfo::getConstantPool(), false, false, 0); if (RelocM == Reloc::PIC_) { SDValue PICLabel = DAG.getConstant(ARMPCLabelIndex, MVT::i32); Result = DAG.getNode(ARMISD::PIC_ADD, dl, PtrVT, Result, PICLabel); } return Result; } case Intrinsic::arm_neon_vmulls: case Intrinsic::arm_neon_vmullu: { unsigned NewOpc = (IntNo == Intrinsic::arm_neon_vmulls) ? ARMISD::VMULLs : ARMISD::VMULLu; return DAG.getNode(NewOpc, Op.getDebugLoc(), Op.getValueType(), Op.getOperand(1), Op.getOperand(2)); } } } static SDValue LowerMEMBARRIER(SDValue Op, SelectionDAG &DAG, const ARMSubtarget *Subtarget) { DebugLoc dl = Op.getDebugLoc(); if (!Subtarget->hasDataBarrier()) { // Some ARMv6 cpus can support data barriers with an mcr instruction. // Thumb1 and pre-v6 ARM mode use a libcall instead and should never get // here. assert(Subtarget->hasV6Ops() && !Subtarget->isThumb() && "Unexpected ISD::MEMBARRIER encountered. 
Should be libcall!"); return DAG.getNode(ARMISD::MEMBARRIER_MCR, dl, MVT::Other, Op.getOperand(0), DAG.getConstant(0, MVT::i32)); } SDValue Op5 = Op.getOperand(5); bool isDeviceBarrier = cast(Op5)->getZExtValue() != 0; unsigned isLL = cast(Op.getOperand(1))->getZExtValue(); unsigned isLS = cast(Op.getOperand(2))->getZExtValue(); bool isOnlyStoreBarrier = (isLL == 0 && isLS == 0); ARM_MB::MemBOpt DMBOpt; if (isDeviceBarrier) DMBOpt = isOnlyStoreBarrier ? ARM_MB::ST : ARM_MB::SY; else DMBOpt = isOnlyStoreBarrier ? ARM_MB::ISHST : ARM_MB::ISH; return DAG.getNode(ARMISD::MEMBARRIER, dl, MVT::Other, Op.getOperand(0), DAG.getConstant(DMBOpt, MVT::i32)); } static SDValue LowerATOMIC_FENCE(SDValue Op, SelectionDAG &DAG, const ARMSubtarget *Subtarget) { // FIXME: handle "fence singlethread" more efficiently. DebugLoc dl = Op.getDebugLoc(); if (!Subtarget->hasDataBarrier()) { // Some ARMv6 cpus can support data barriers with an mcr instruction. // Thumb1 and pre-v6 ARM mode use a libcall instead and should never get // here. assert(Subtarget->hasV6Ops() && !Subtarget->isThumb() && "Unexpected ISD::MEMBARRIER encountered. Should be libcall!"); return DAG.getNode(ARMISD::MEMBARRIER_MCR, dl, MVT::Other, Op.getOperand(0), DAG.getConstant(0, MVT::i32)); } return DAG.getNode(ARMISD::MEMBARRIER, dl, MVT::Other, Op.getOperand(0), DAG.getConstant(ARM_MB::ISH, MVT::i32)); } static SDValue LowerPREFETCH(SDValue Op, SelectionDAG &DAG, const ARMSubtarget *Subtarget) { // ARM pre v5TE and Thumb1 does not have preload instructions. if (!(Subtarget->isThumb2() || (!Subtarget->isThumb1Only() && Subtarget->hasV5TEOps()))) // Just preserve the chain. return Op.getOperand(0); DebugLoc dl = Op.getDebugLoc(); unsigned isRead = ~cast(Op.getOperand(2))->getZExtValue() & 1; if (!isRead && (!Subtarget->hasV7Ops() || !Subtarget->hasMPExtension())) // ARMv7 with MP extension has PLDW. return Op.getOperand(0); unsigned isData = cast(Op.getOperand(4))->getZExtValue(); if (Subtarget->isThumb()) { // Invert the bits. isRead = ~isRead & 1; isData = ~isData & 1; } return DAG.getNode(ARMISD::PRELOAD, dl, MVT::Other, Op.getOperand(0), Op.getOperand(1), DAG.getConstant(isRead, MVT::i32), DAG.getConstant(isData, MVT::i32)); } static SDValue LowerVASTART(SDValue Op, SelectionDAG &DAG) { MachineFunction &MF = DAG.getMachineFunction(); ARMFunctionInfo *FuncInfo = MF.getInfo(); // vastart just stores the address of the VarArgsFrameIndex slot into the // memory location argument. DebugLoc dl = Op.getDebugLoc(); EVT PtrVT = DAG.getTargetLoweringInfo().getPointerTy(); SDValue FR = DAG.getFrameIndex(FuncInfo->getVarArgsFrameIndex(), PtrVT); const Value *SV = cast(Op.getOperand(2))->getValue(); return DAG.getStore(Op.getOperand(0), dl, FR, Op.getOperand(1), MachinePointerInfo(SV), false, false, 0); } SDValue ARMTargetLowering::GetF64FormalArgument(CCValAssign &VA, CCValAssign &NextVA, SDValue &Root, SelectionDAG &DAG, DebugLoc dl) const { MachineFunction &MF = DAG.getMachineFunction(); ARMFunctionInfo *AFI = MF.getInfo(); TargetRegisterClass *RC; if (AFI->isThumb1OnlyFunction()) RC = ARM::tGPRRegisterClass; else RC = ARM::GPRRegisterClass; // Transform the arguments stored in physical registers into virtual ones. unsigned Reg = MF.addLiveIn(VA.getLocReg(), RC); SDValue ArgValue = DAG.getCopyFromReg(Root, dl, Reg, MVT::i32); SDValue ArgValue2; if (NextVA.isMemLoc()) { MachineFrameInfo *MFI = MF.getFrameInfo(); int FI = MFI->CreateFixedObject(4, NextVA.getLocMemOffset(), true); // Create load node to retrieve arguments from the stack. 
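// Note (added): the second i32 half of the split f64 was assigned to the
// stack; it is loaded from a fixed object at its incoming offset so the
// VMOVDRR below can reassemble the full f64 value.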
SDValue FIN = DAG.getFrameIndex(FI, getPointerTy()); ArgValue2 = DAG.getLoad(MVT::i32, dl, Root, FIN, MachinePointerInfo::getFixedStack(FI), false, false, 0); } else { Reg = MF.addLiveIn(NextVA.getLocReg(), RC); ArgValue2 = DAG.getCopyFromReg(Root, dl, Reg, MVT::i32); } return DAG.getNode(ARMISD::VMOVDRR, dl, MVT::f64, ArgValue, ArgValue2); } void ARMTargetLowering::computeRegArea(CCState &CCInfo, MachineFunction &MF, unsigned &VARegSize, unsigned &VARegSaveSize) const { unsigned NumGPRs; if (CCInfo.isFirstByValRegValid()) NumGPRs = ARM::R4 - CCInfo.getFirstByValReg(); else { unsigned int firstUnalloced; firstUnalloced = CCInfo.getFirstUnallocated(GPRArgRegs, sizeof(GPRArgRegs) / sizeof(GPRArgRegs[0])); NumGPRs = (firstUnalloced <= 3) ? (4 - firstUnalloced) : 0; } unsigned Align = MF.getTarget().getFrameLowering()->getStackAlignment(); VARegSize = NumGPRs * 4; VARegSaveSize = (VARegSize + Align - 1) & ~(Align - 1); } // The remaining GPRs hold either the beginning of variable-argument // data, or the beginning of an aggregate passed by value (usuall // byval). Either way, we allocate stack slots adjacent to the data // provided by our caller, and store the unallocated registers there. // If this is a variadic function, the va_list pointer will begin with // these values; otherwise, this reassembles a (byval) structure that // was split between registers and memory. void ARMTargetLowering::VarArgStyleRegisters(CCState &CCInfo, SelectionDAG &DAG, DebugLoc dl, SDValue &Chain, unsigned ArgOffset) const { MachineFunction &MF = DAG.getMachineFunction(); MachineFrameInfo *MFI = MF.getFrameInfo(); ARMFunctionInfo *AFI = MF.getInfo(); unsigned firstRegToSaveIndex; if (CCInfo.isFirstByValRegValid()) firstRegToSaveIndex = CCInfo.getFirstByValReg() - ARM::R0; else { firstRegToSaveIndex = CCInfo.getFirstUnallocated (GPRArgRegs, sizeof(GPRArgRegs) / sizeof(GPRArgRegs[0])); } unsigned VARegSize, VARegSaveSize; computeRegArea(CCInfo, MF, VARegSize, VARegSaveSize); if (VARegSaveSize) { // If this function is vararg, store any remaining integer argument regs // to their spots on the stack so that they may be loaded by deferencing // the result of va_next. AFI->setVarArgsRegSaveSize(VARegSaveSize); AFI->setVarArgsFrameIndex(MFI->CreateFixedObject(VARegSaveSize, ArgOffset + VARegSaveSize - VARegSize, false)); SDValue FIN = DAG.getFrameIndex(AFI->getVarArgsFrameIndex(), getPointerTy()); SmallVector MemOps; for (; firstRegToSaveIndex < 4; ++firstRegToSaveIndex) { TargetRegisterClass *RC; if (AFI->isThumb1OnlyFunction()) RC = ARM::tGPRRegisterClass; else RC = ARM::GPRRegisterClass; unsigned VReg = MF.addLiveIn(GPRArgRegs[firstRegToSaveIndex], RC); SDValue Val = DAG.getCopyFromReg(Chain, dl, VReg, MVT::i32); SDValue Store = DAG.getStore(Val.getValue(1), dl, Val, FIN, MachinePointerInfo::getFixedStack(AFI->getVarArgsFrameIndex()), false, false, 0); MemOps.push_back(Store); FIN = DAG.getNode(ISD::ADD, dl, getPointerTy(), FIN, DAG.getConstant(4, getPointerTy())); } if (!MemOps.empty()) Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, &MemOps[0], MemOps.size()); } else // This will point to the next argument passed via stack. 
AFI->setVarArgsFrameIndex(MFI->CreateFixedObject(4, ArgOffset, true)); } SDValue ARMTargetLowering::LowerFormalArguments(SDValue Chain, CallingConv::ID CallConv, bool isVarArg, const SmallVectorImpl &Ins, DebugLoc dl, SelectionDAG &DAG, SmallVectorImpl &InVals) const { MachineFunction &MF = DAG.getMachineFunction(); MachineFrameInfo *MFI = MF.getFrameInfo(); ARMFunctionInfo *AFI = MF.getInfo(); // Assign locations to all of the incoming arguments. SmallVector ArgLocs; ARMCCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(), getTargetMachine(), ArgLocs, *DAG.getContext(), Prologue); CCInfo.AnalyzeFormalArguments(Ins, CCAssignFnForNode(CallConv, /* Return*/ false, isVarArg)); SmallVector ArgValues; int lastInsIndex = -1; SDValue ArgValue; for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i) { CCValAssign &VA = ArgLocs[i]; // Arguments stored in registers. if (VA.isRegLoc()) { EVT RegVT = VA.getLocVT(); if (VA.needsCustom()) { // f64 and vector types are split up into multiple registers or // combinations of registers and stack slots. if (VA.getLocVT() == MVT::v2f64) { SDValue ArgValue1 = GetF64FormalArgument(VA, ArgLocs[++i], Chain, DAG, dl); VA = ArgLocs[++i]; // skip ahead to next loc SDValue ArgValue2; if (VA.isMemLoc()) { int FI = MFI->CreateFixedObject(8, VA.getLocMemOffset(), true); SDValue FIN = DAG.getFrameIndex(FI, getPointerTy()); ArgValue2 = DAG.getLoad(MVT::f64, dl, Chain, FIN, MachinePointerInfo::getFixedStack(FI), false, false, 0); } else { ArgValue2 = GetF64FormalArgument(VA, ArgLocs[++i], Chain, DAG, dl); } ArgValue = DAG.getNode(ISD::UNDEF, dl, MVT::v2f64); ArgValue = DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, MVT::v2f64, ArgValue, ArgValue1, DAG.getIntPtrConstant(0)); ArgValue = DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, MVT::v2f64, ArgValue, ArgValue2, DAG.getIntPtrConstant(1)); } else ArgValue = GetF64FormalArgument(VA, ArgLocs[++i], Chain, DAG, dl); } else { TargetRegisterClass *RC; if (RegVT == MVT::f32) RC = ARM::SPRRegisterClass; else if (RegVT == MVT::f64) RC = ARM::DPRRegisterClass; else if (RegVT == MVT::v2f64) RC = ARM::QPRRegisterClass; else if (RegVT == MVT::i32) RC = (AFI->isThumb1OnlyFunction() ? ARM::tGPRRegisterClass : ARM::GPRRegisterClass); else llvm_unreachable("RegVT not supported by FORMAL_ARGUMENTS Lowering"); // Transform the arguments in physical registers into virtual ones. unsigned Reg = MF.addLiveIn(VA.getLocReg(), RC); ArgValue = DAG.getCopyFromReg(Chain, dl, Reg, RegVT); } // If this is an 8 or 16-bit value, it is really passed promoted // to 32 bits. Insert an assert[sz]ext to capture this, then // truncate to the right size. switch (VA.getLocInfo()) { default: llvm_unreachable("Unknown loc info!"); case CCValAssign::Full: break; case CCValAssign::BCvt: ArgValue = DAG.getNode(ISD::BITCAST, dl, VA.getValVT(), ArgValue); break; case CCValAssign::SExt: ArgValue = DAG.getNode(ISD::AssertSext, dl, RegVT, ArgValue, DAG.getValueType(VA.getValVT())); ArgValue = DAG.getNode(ISD::TRUNCATE, dl, VA.getValVT(), ArgValue); break; case CCValAssign::ZExt: ArgValue = DAG.getNode(ISD::AssertZext, dl, RegVT, ArgValue, DAG.getValueType(VA.getValVT())); ArgValue = DAG.getNode(ISD::TRUNCATE, dl, VA.getValVT(), ArgValue); break; } InVals.push_back(ArgValue); } else { // VA.isRegLoc() // sanity check assert(VA.isMemLoc()); assert(VA.getValVT() != MVT::i64 && "i64 should already be lowered"); int index = ArgLocs[i].getValNo(); // Some Ins[] entries become multiple ArgLoc[] entries. // Process them only once. 
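// Note (added): a single incoming argument can be assigned more than one
// ArgLoc; lastInsIndex ensures only one stack object (or byval frame index)
// is created per Ins entry.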
if (index != lastInsIndex) { ISD::ArgFlagsTy Flags = Ins[index].Flags; // FIXME: For now, all byval parameter objects are marked mutable. // This can be changed with more analysis. // In case of tail call optimization mark all arguments mutable. // Since they could be overwritten by lowering of arguments in case of // a tail call. if (Flags.isByVal()) { unsigned VARegSize, VARegSaveSize; computeRegArea(CCInfo, MF, VARegSize, VARegSaveSize); VarArgStyleRegisters(CCInfo, DAG, dl, Chain, 0); unsigned Bytes = Flags.getByValSize() - VARegSize; if (Bytes == 0) Bytes = 1; // Don't create zero-sized stack objects. int FI = MFI->CreateFixedObject(Bytes, VA.getLocMemOffset(), false); InVals.push_back(DAG.getFrameIndex(FI, getPointerTy())); } else { int FI = MFI->CreateFixedObject(VA.getLocVT().getSizeInBits()/8, VA.getLocMemOffset(), true); // Create load nodes to retrieve arguments from the stack. SDValue FIN = DAG.getFrameIndex(FI, getPointerTy()); InVals.push_back(DAG.getLoad(VA.getValVT(), dl, Chain, FIN, MachinePointerInfo::getFixedStack(FI), false, false, 0)); } lastInsIndex = index; } } } // varargs if (isVarArg) VarArgStyleRegisters(CCInfo, DAG, dl, Chain, CCInfo.getNextStackOffset()); return Chain; } /// isFloatingPointZero - Return true if this is +0.0. static bool isFloatingPointZero(SDValue Op) { if (ConstantFPSDNode *CFP = dyn_cast(Op)) return CFP->getValueAPF().isPosZero(); else if (ISD::isEXTLoad(Op.getNode()) || ISD::isNON_EXTLoad(Op.getNode())) { // Maybe this has already been legalized into the constant pool? if (Op.getOperand(1).getOpcode() == ARMISD::Wrapper) { SDValue WrapperOp = Op.getOperand(1).getOperand(0); if (ConstantPoolSDNode *CP = dyn_cast(WrapperOp)) if (const ConstantFP *CFP = dyn_cast(CP->getConstVal())) return CFP->getValueAPF().isPosZero(); } } return false; } /// Returns appropriate ARM CMP (cmp) and corresponding condition code for /// the given operands. SDValue ARMTargetLowering::getARMCmp(SDValue LHS, SDValue RHS, ISD::CondCode CC, SDValue &ARMcc, SelectionDAG &DAG, DebugLoc dl) const { if (ConstantSDNode *RHSC = dyn_cast(RHS.getNode())) { unsigned C = RHSC->getZExtValue(); if (!isLegalICmpImmediate(C)) { // Constant does not fit, try adjusting it by one? switch (CC) { default: break; case ISD::SETLT: case ISD::SETGE: if (C != 0x80000000 && isLegalICmpImmediate(C-1)) { CC = (CC == ISD::SETLT) ? ISD::SETLE : ISD::SETGT; RHS = DAG.getConstant(C-1, MVT::i32); } break; case ISD::SETULT: case ISD::SETUGE: if (C != 0 && isLegalICmpImmediate(C-1)) { CC = (CC == ISD::SETULT) ? ISD::SETULE : ISD::SETUGT; RHS = DAG.getConstant(C-1, MVT::i32); } break; case ISD::SETLE: case ISD::SETGT: if (C != 0x7fffffff && isLegalICmpImmediate(C+1)) { CC = (CC == ISD::SETLE) ? ISD::SETLT : ISD::SETGE; RHS = DAG.getConstant(C+1, MVT::i32); } break; case ISD::SETULE: case ISD::SETUGT: if (C != 0xffffffff && isLegalICmpImmediate(C+1)) { CC = (CC == ISD::SETULE) ? ISD::SETULT : ISD::SETUGE; RHS = DAG.getConstant(C+1, MVT::i32); } break; } } } ARMCC::CondCodes CondCode = IntCCToARMCC(CC); ARMISD::NodeType CompareType; switch (CondCode) { default: CompareType = ARMISD::CMP; break; case ARMCC::EQ: case ARMCC::NE: // Uses only Z Flag CompareType = ARMISD::CMPZ; break; } ARMcc = DAG.getConstant(CondCode, MVT::i32); return DAG.getNode(CompareType, dl, MVT::Glue, LHS, RHS); } /// Returns a appropriate VFP CMP (fcmp{s|d}+fmstat) for the given operands. 
SDValue ARMTargetLowering::getVFPCmp(SDValue LHS, SDValue RHS, SelectionDAG &DAG, DebugLoc dl) const { SDValue Cmp; if (!isFloatingPointZero(RHS)) Cmp = DAG.getNode(ARMISD::CMPFP, dl, MVT::Glue, LHS, RHS); else Cmp = DAG.getNode(ARMISD::CMPFPw0, dl, MVT::Glue, LHS); return DAG.getNode(ARMISD::FMSTAT, dl, MVT::Glue, Cmp); } /// duplicateCmp - Glue values can have only one use, so this function /// duplicates a comparison node. SDValue ARMTargetLowering::duplicateCmp(SDValue Cmp, SelectionDAG &DAG) const { unsigned Opc = Cmp.getOpcode(); DebugLoc DL = Cmp.getDebugLoc(); if (Opc == ARMISD::CMP || Opc == ARMISD::CMPZ) return DAG.getNode(Opc, DL, MVT::Glue, Cmp.getOperand(0),Cmp.getOperand(1)); assert(Opc == ARMISD::FMSTAT && "unexpected comparison operation"); Cmp = Cmp.getOperand(0); Opc = Cmp.getOpcode(); if (Opc == ARMISD::CMPFP) Cmp = DAG.getNode(Opc, DL, MVT::Glue, Cmp.getOperand(0),Cmp.getOperand(1)); else { assert(Opc == ARMISD::CMPFPw0 && "unexpected operand of FMSTAT"); Cmp = DAG.getNode(Opc, DL, MVT::Glue, Cmp.getOperand(0)); } return DAG.getNode(ARMISD::FMSTAT, DL, MVT::Glue, Cmp); } SDValue ARMTargetLowering::LowerSELECT(SDValue Op, SelectionDAG &DAG) const { SDValue Cond = Op.getOperand(0); SDValue SelectTrue = Op.getOperand(1); SDValue SelectFalse = Op.getOperand(2); DebugLoc dl = Op.getDebugLoc(); // Convert: // // (select (cmov 1, 0, cond), t, f) -> (cmov t, f, cond) // (select (cmov 0, 1, cond), t, f) -> (cmov f, t, cond) // if (Cond.getOpcode() == ARMISD::CMOV && Cond.hasOneUse()) { const ConstantSDNode *CMOVTrue = dyn_cast(Cond.getOperand(0)); const ConstantSDNode *CMOVFalse = dyn_cast(Cond.getOperand(1)); if (CMOVTrue && CMOVFalse) { unsigned CMOVTrueVal = CMOVTrue->getZExtValue(); unsigned CMOVFalseVal = CMOVFalse->getZExtValue(); SDValue True; SDValue False; if (CMOVTrueVal == 1 && CMOVFalseVal == 0) { True = SelectTrue; False = SelectFalse; } else if (CMOVTrueVal == 0 && CMOVFalseVal == 1) { True = SelectFalse; False = SelectTrue; } if (True.getNode() && False.getNode()) { EVT VT = Op.getValueType(); SDValue ARMcc = Cond.getOperand(2); SDValue CCR = Cond.getOperand(3); SDValue Cmp = duplicateCmp(Cond.getOperand(4), DAG); assert(True.getValueType() == VT); return DAG.getNode(ARMISD::CMOV, dl, VT, True, False, ARMcc, CCR, Cmp); } } } return DAG.getSelectCC(dl, Cond, DAG.getConstant(0, Cond.getValueType()), SelectTrue, SelectFalse, ISD::SETNE); } SDValue ARMTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const { EVT VT = Op.getValueType(); SDValue LHS = Op.getOperand(0); SDValue RHS = Op.getOperand(1); ISD::CondCode CC = cast(Op.getOperand(4))->get(); SDValue TrueVal = Op.getOperand(2); SDValue FalseVal = Op.getOperand(3); DebugLoc dl = Op.getDebugLoc(); if (LHS.getValueType() == MVT::i32) { SDValue ARMcc; SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32); SDValue Cmp = getARMCmp(LHS, RHS, CC, ARMcc, DAG, dl); return DAG.getNode(ARMISD::CMOV, dl, VT, FalseVal, TrueVal, ARMcc, CCR,Cmp); } ARMCC::CondCodes CondCode, CondCode2; FPCCToARMCC(CC, CondCode, CondCode2); SDValue ARMcc = DAG.getConstant(CondCode, MVT::i32); SDValue Cmp = getVFPCmp(LHS, RHS, DAG, dl); SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32); SDValue Result = DAG.getNode(ARMISD::CMOV, dl, VT, FalseVal, TrueVal, ARMcc, CCR, Cmp); if (CondCode2 != ARMCC::AL) { SDValue ARMcc2 = DAG.getConstant(CondCode2, MVT::i32); // FIXME: Needs another CMP because flag can have but one use. 
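// Note (added): some FP conditions map to two ARM condition codes
// (CondCode2 != AL); in that case the result of the first CMOV is fed through
// a second CMOV that also selects TrueVal when the extra condition holds.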
SDValue Cmp2 = getVFPCmp(LHS, RHS, DAG, dl); Result = DAG.getNode(ARMISD::CMOV, dl, VT, Result, TrueVal, ARMcc2, CCR, Cmp2); } return Result; } /// canChangeToInt - Given the fp compare operand, return true if it is suitable /// to morph to an integer compare sequence. static bool canChangeToInt(SDValue Op, bool &SeenZero, const ARMSubtarget *Subtarget) { SDNode *N = Op.getNode(); if (!N->hasOneUse()) // Otherwise it requires moving the value from fp to integer registers. return false; if (!N->getNumValues()) return false; EVT VT = Op.getValueType(); if (VT != MVT::f32 && !Subtarget->isFPBrccSlow()) // f32 case is generally profitable. f64 case only makes sense when vcmpe + // vmrs are very slow, e.g. cortex-a8. return false; if (isFloatingPointZero(Op)) { SeenZero = true; return true; } return ISD::isNormalLoad(N); } static SDValue bitcastf32Toi32(SDValue Op, SelectionDAG &DAG) { if (isFloatingPointZero(Op)) return DAG.getConstant(0, MVT::i32); if (LoadSDNode *Ld = dyn_cast(Op)) return DAG.getLoad(MVT::i32, Op.getDebugLoc(), Ld->getChain(), Ld->getBasePtr(), Ld->getPointerInfo(), Ld->isVolatile(), Ld->isNonTemporal(), Ld->getAlignment()); llvm_unreachable("Unknown VFP cmp argument!"); } static void expandf64Toi32(SDValue Op, SelectionDAG &DAG, SDValue &RetVal1, SDValue &RetVal2) { if (isFloatingPointZero(Op)) { RetVal1 = DAG.getConstant(0, MVT::i32); RetVal2 = DAG.getConstant(0, MVT::i32); return; } if (LoadSDNode *Ld = dyn_cast(Op)) { SDValue Ptr = Ld->getBasePtr(); RetVal1 = DAG.getLoad(MVT::i32, Op.getDebugLoc(), Ld->getChain(), Ptr, Ld->getPointerInfo(), Ld->isVolatile(), Ld->isNonTemporal(), Ld->getAlignment()); EVT PtrType = Ptr.getValueType(); unsigned NewAlign = MinAlign(Ld->getAlignment(), 4); SDValue NewPtr = DAG.getNode(ISD::ADD, Op.getDebugLoc(), PtrType, Ptr, DAG.getConstant(4, PtrType)); RetVal2 = DAG.getLoad(MVT::i32, Op.getDebugLoc(), Ld->getChain(), NewPtr, Ld->getPointerInfo().getWithOffset(4), Ld->isVolatile(), Ld->isNonTemporal(), NewAlign); return; } llvm_unreachable("Unknown VFP cmp argument!"); } /// OptimizeVFPBrcond - With -enable-unsafe-fp-math, it's legal to optimize some /// f32 and even f64 comparisons to integer ones. SDValue ARMTargetLowering::OptimizeVFPBrcond(SDValue Op, SelectionDAG &DAG) const { SDValue Chain = Op.getOperand(0); ISD::CondCode CC = cast(Op.getOperand(1))->get(); SDValue LHS = Op.getOperand(2); SDValue RHS = Op.getOperand(3); SDValue Dest = Op.getOperand(4); DebugLoc dl = Op.getDebugLoc(); bool SeenZero = false; if (canChangeToInt(LHS, SeenZero, Subtarget) && canChangeToInt(RHS, SeenZero, Subtarget) && // If one of the operand is zero, it's safe to ignore the NaN case since // we only care about equality comparisons. (SeenZero || (DAG.isKnownNeverNaN(LHS) && DAG.isKnownNeverNaN(RHS)))) { // If unsafe fp math optimization is enabled and there are no other uses of // the CMP operands, and the condition code is EQ or NE, we can optimize it // to an integer comparison. 
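// Note (added): e.g. an f32 equality test becomes a plain i32 compare of the
// raw bits (via bitcastf32Toi32), and an f64 equality becomes a BCC_i64 node
// over the two i32 halves produced by expandf64Toi32.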
if (CC == ISD::SETOEQ) CC = ISD::SETEQ; else if (CC == ISD::SETUNE) CC = ISD::SETNE; SDValue ARMcc; if (LHS.getValueType() == MVT::f32) { LHS = bitcastf32Toi32(LHS, DAG); RHS = bitcastf32Toi32(RHS, DAG); SDValue Cmp = getARMCmp(LHS, RHS, CC, ARMcc, DAG, dl); SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32); return DAG.getNode(ARMISD::BRCOND, dl, MVT::Other, Chain, Dest, ARMcc, CCR, Cmp); } SDValue LHS1, LHS2; SDValue RHS1, RHS2; expandf64Toi32(LHS, DAG, LHS1, LHS2); expandf64Toi32(RHS, DAG, RHS1, RHS2); ARMCC::CondCodes CondCode = IntCCToARMCC(CC); ARMcc = DAG.getConstant(CondCode, MVT::i32); SDVTList VTList = DAG.getVTList(MVT::Other, MVT::Glue); SDValue Ops[] = { Chain, ARMcc, LHS1, LHS2, RHS1, RHS2, Dest }; return DAG.getNode(ARMISD::BCC_i64, dl, VTList, Ops, 7); } return SDValue(); } SDValue ARMTargetLowering::LowerBR_CC(SDValue Op, SelectionDAG &DAG) const { SDValue Chain = Op.getOperand(0); ISD::CondCode CC = cast(Op.getOperand(1))->get(); SDValue LHS = Op.getOperand(2); SDValue RHS = Op.getOperand(3); SDValue Dest = Op.getOperand(4); DebugLoc dl = Op.getDebugLoc(); if (LHS.getValueType() == MVT::i32) { SDValue ARMcc; SDValue Cmp = getARMCmp(LHS, RHS, CC, ARMcc, DAG, dl); SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32); return DAG.getNode(ARMISD::BRCOND, dl, MVT::Other, Chain, Dest, ARMcc, CCR, Cmp); } assert(LHS.getValueType() == MVT::f32 || LHS.getValueType() == MVT::f64); if (UnsafeFPMath && (CC == ISD::SETEQ || CC == ISD::SETOEQ || CC == ISD::SETNE || CC == ISD::SETUNE)) { SDValue Result = OptimizeVFPBrcond(Op, DAG); if (Result.getNode()) return Result; } ARMCC::CondCodes CondCode, CondCode2; FPCCToARMCC(CC, CondCode, CondCode2); SDValue ARMcc = DAG.getConstant(CondCode, MVT::i32); SDValue Cmp = getVFPCmp(LHS, RHS, DAG, dl); SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32); SDVTList VTList = DAG.getVTList(MVT::Other, MVT::Glue); SDValue Ops[] = { Chain, Dest, ARMcc, CCR, Cmp }; SDValue Res = DAG.getNode(ARMISD::BRCOND, dl, VTList, Ops, 5); if (CondCode2 != ARMCC::AL) { ARMcc = DAG.getConstant(CondCode2, MVT::i32); SDValue Ops[] = { Res, Dest, ARMcc, CCR, Res.getValue(1) }; Res = DAG.getNode(ARMISD::BRCOND, dl, VTList, Ops, 5); } return Res; } SDValue ARMTargetLowering::LowerBR_JT(SDValue Op, SelectionDAG &DAG) const { SDValue Chain = Op.getOperand(0); SDValue Table = Op.getOperand(1); SDValue Index = Op.getOperand(2); DebugLoc dl = Op.getDebugLoc(); EVT PTy = getPointerTy(); JumpTableSDNode *JT = cast(Table); ARMFunctionInfo *AFI = DAG.getMachineFunction().getInfo(); SDValue UId = DAG.getConstant(AFI->createJumpTableUId(), PTy); SDValue JTI = DAG.getTargetJumpTable(JT->getIndex(), PTy); Table = DAG.getNode(ARMISD::WrapperJT, dl, MVT::i32, JTI, UId); Index = DAG.getNode(ISD::MUL, dl, PTy, Index, DAG.getConstant(4, PTy)); SDValue Addr = DAG.getNode(ISD::ADD, dl, PTy, Index, Table); if (Subtarget->isThumb2()) { // Thumb2 uses a two-level jump. That is, it jumps into the jump table // which does another jump to the destination. This also makes it easier // to translate it to TBB / TBH later. // FIXME: This might not work if the function is extremely large. 
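// Note (added): BR2_JT keeps the computed entry address together with the
// jump-table reference and id, so the sequence can later be rewritten as
// TBB/TBH.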
return DAG.getNode(ARMISD::BR2_JT, dl, MVT::Other, Chain, Addr, Op.getOperand(2), JTI, UId); } if (getTargetMachine().getRelocationModel() == Reloc::PIC_) { Addr = DAG.getLoad((EVT)MVT::i32, dl, Chain, Addr, MachinePointerInfo::getJumpTable(), false, false, 0); Chain = Addr.getValue(1); Addr = DAG.getNode(ISD::ADD, dl, PTy, Addr, Table); return DAG.getNode(ARMISD::BR_JT, dl, MVT::Other, Chain, Addr, JTI, UId); } else { Addr = DAG.getLoad(PTy, dl, Chain, Addr, MachinePointerInfo::getJumpTable(), false, false, 0); Chain = Addr.getValue(1); return DAG.getNode(ARMISD::BR_JT, dl, MVT::Other, Chain, Addr, JTI, UId); } } static SDValue LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG) { DebugLoc dl = Op.getDebugLoc(); unsigned Opc; switch (Op.getOpcode()) { default: assert(0 && "Invalid opcode!"); case ISD::FP_TO_SINT: Opc = ARMISD::FTOSI; break; case ISD::FP_TO_UINT: Opc = ARMISD::FTOUI; break; } Op = DAG.getNode(Opc, dl, MVT::f32, Op.getOperand(0)); return DAG.getNode(ISD::BITCAST, dl, MVT::i32, Op); } static SDValue LowerVectorINT_TO_FP(SDValue Op, SelectionDAG &DAG) { EVT VT = Op.getValueType(); DebugLoc dl = Op.getDebugLoc(); assert(Op.getOperand(0).getValueType() == MVT::v4i16 && "Invalid type for custom lowering!"); if (VT != MVT::v4f32) return DAG.UnrollVectorOp(Op.getNode()); unsigned CastOpc; unsigned Opc; switch (Op.getOpcode()) { default: assert(0 && "Invalid opcode!"); case ISD::SINT_TO_FP: CastOpc = ISD::SIGN_EXTEND; Opc = ISD::SINT_TO_FP; break; case ISD::UINT_TO_FP: CastOpc = ISD::ZERO_EXTEND; Opc = ISD::UINT_TO_FP; break; } Op = DAG.getNode(CastOpc, dl, MVT::v4i32, Op.getOperand(0)); return DAG.getNode(Opc, dl, VT, Op); } static SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG) { EVT VT = Op.getValueType(); if (VT.isVector()) return LowerVectorINT_TO_FP(Op, DAG); DebugLoc dl = Op.getDebugLoc(); unsigned Opc; switch (Op.getOpcode()) { default: assert(0 && "Invalid opcode!"); case ISD::SINT_TO_FP: Opc = ARMISD::SITOF; break; case ISD::UINT_TO_FP: Opc = ARMISD::UITOF; break; } Op = DAG.getNode(ISD::BITCAST, dl, MVT::f32, Op.getOperand(0)); return DAG.getNode(Opc, dl, VT, Op); } SDValue ARMTargetLowering::LowerFCOPYSIGN(SDValue Op, SelectionDAG &DAG) const { // Implement fcopysign with a fabs and a conditional fneg. SDValue Tmp0 = Op.getOperand(0); SDValue Tmp1 = Op.getOperand(1); DebugLoc dl = Op.getDebugLoc(); EVT VT = Op.getValueType(); EVT SrcVT = Tmp1.getValueType(); bool InGPR = Tmp0.getOpcode() == ISD::BITCAST || Tmp0.getOpcode() == ARMISD::VMOVDRR; bool UseNEON = !InGPR && Subtarget->hasNEON(); if (UseNEON) { // Use VBSL to copy the sign bit. unsigned EncodedVal = ARM_AM::createNEONModImm(0x6, 0x80); SDValue Mask = DAG.getNode(ARMISD::VMOVIMM, dl, MVT::v2i32, DAG.getTargetConstant(EncodedVal, MVT::i32)); EVT OpVT = (VT == MVT::f32) ? 
MVT::v2i32 : MVT::v1i64; if (VT == MVT::f64) Mask = DAG.getNode(ARMISD::VSHL, dl, OpVT, DAG.getNode(ISD::BITCAST, dl, OpVT, Mask), DAG.getConstant(32, MVT::i32)); else /*if (VT == MVT::f32)*/ Tmp0 = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, MVT::v2f32, Tmp0); if (SrcVT == MVT::f32) { Tmp1 = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, MVT::v2f32, Tmp1); if (VT == MVT::f64) Tmp1 = DAG.getNode(ARMISD::VSHL, dl, OpVT, DAG.getNode(ISD::BITCAST, dl, OpVT, Tmp1), DAG.getConstant(32, MVT::i32)); } else if (VT == MVT::f32) Tmp1 = DAG.getNode(ARMISD::VSHRu, dl, MVT::v1i64, DAG.getNode(ISD::BITCAST, dl, MVT::v1i64, Tmp1), DAG.getConstant(32, MVT::i32)); Tmp0 = DAG.getNode(ISD::BITCAST, dl, OpVT, Tmp0); Tmp1 = DAG.getNode(ISD::BITCAST, dl, OpVT, Tmp1); SDValue AllOnes = DAG.getTargetConstant(ARM_AM::createNEONModImm(0xe, 0xff), MVT::i32); AllOnes = DAG.getNode(ARMISD::VMOVIMM, dl, MVT::v8i8, AllOnes); SDValue MaskNot = DAG.getNode(ISD::XOR, dl, OpVT, Mask, DAG.getNode(ISD::BITCAST, dl, OpVT, AllOnes)); SDValue Res = DAG.getNode(ISD::OR, dl, OpVT, DAG.getNode(ISD::AND, dl, OpVT, Tmp1, Mask), DAG.getNode(ISD::AND, dl, OpVT, Tmp0, MaskNot)); if (VT == MVT::f32) { Res = DAG.getNode(ISD::BITCAST, dl, MVT::v2f32, Res); Res = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::f32, Res, DAG.getConstant(0, MVT::i32)); } else { Res = DAG.getNode(ISD::BITCAST, dl, MVT::f64, Res); } return Res; } // Bitcast operand 1 to i32. if (SrcVT == MVT::f64) Tmp1 = DAG.getNode(ARMISD::VMOVRRD, dl, DAG.getVTList(MVT::i32, MVT::i32), &Tmp1, 1).getValue(1); Tmp1 = DAG.getNode(ISD::BITCAST, dl, MVT::i32, Tmp1); // Or in the signbit with integer operations. SDValue Mask1 = DAG.getConstant(0x80000000, MVT::i32); SDValue Mask2 = DAG.getConstant(0x7fffffff, MVT::i32); Tmp1 = DAG.getNode(ISD::AND, dl, MVT::i32, Tmp1, Mask1); if (VT == MVT::f32) { Tmp0 = DAG.getNode(ISD::AND, dl, MVT::i32, DAG.getNode(ISD::BITCAST, dl, MVT::i32, Tmp0), Mask2); return DAG.getNode(ISD::BITCAST, dl, MVT::f32, DAG.getNode(ISD::OR, dl, MVT::i32, Tmp0, Tmp1)); } // f64: Or the high part with signbit and then combine two parts. Tmp0 = DAG.getNode(ARMISD::VMOVRRD, dl, DAG.getVTList(MVT::i32, MVT::i32), &Tmp0, 1); SDValue Lo = Tmp0.getValue(0); SDValue Hi = DAG.getNode(ISD::AND, dl, MVT::i32, Tmp0.getValue(1), Mask2); Hi = DAG.getNode(ISD::OR, dl, MVT::i32, Hi, Tmp1); return DAG.getNode(ARMISD::VMOVDRR, dl, MVT::f64, Lo, Hi); } SDValue ARMTargetLowering::LowerRETURNADDR(SDValue Op, SelectionDAG &DAG) const{ MachineFunction &MF = DAG.getMachineFunction(); MachineFrameInfo *MFI = MF.getFrameInfo(); MFI->setReturnAddressIsTaken(true); EVT VT = Op.getValueType(); DebugLoc dl = Op.getDebugLoc(); unsigned Depth = cast(Op.getOperand(0))->getZExtValue(); if (Depth) { SDValue FrameAddr = LowerFRAMEADDR(Op, DAG); SDValue Offset = DAG.getConstant(4, MVT::i32); return DAG.getLoad(VT, dl, DAG.getEntryNode(), DAG.getNode(ISD::ADD, dl, VT, FrameAddr, Offset), MachinePointerInfo(), false, false, 0); } // Return LR, which contains the return address. Mark it an implicit live-in. 
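// Note (added): addLiveIn yields a virtual-register copy of LR taken at
// function entry, so the return address remains available even if LR is
// clobbered later in the function.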
unsigned Reg = MF.addLiveIn(ARM::LR, getRegClassFor(MVT::i32)); return DAG.getCopyFromReg(DAG.getEntryNode(), dl, Reg, VT); } SDValue ARMTargetLowering::LowerFRAMEADDR(SDValue Op, SelectionDAG &DAG) const { MachineFrameInfo *MFI = DAG.getMachineFunction().getFrameInfo(); MFI->setFrameAddressIsTaken(true); EVT VT = Op.getValueType(); DebugLoc dl = Op.getDebugLoc(); // FIXME probably not meaningful unsigned Depth = cast(Op.getOperand(0))->getZExtValue(); unsigned FrameReg = (Subtarget->isThumb() || Subtarget->isTargetDarwin()) ? ARM::R7 : ARM::R11; SDValue FrameAddr = DAG.getCopyFromReg(DAG.getEntryNode(), dl, FrameReg, VT); while (Depth--) FrameAddr = DAG.getLoad(VT, dl, DAG.getEntryNode(), FrameAddr, MachinePointerInfo(), false, false, 0); return FrameAddr; } /// ExpandBITCAST - If the target supports VFP, this function is called to /// expand a bit convert where either the source or destination type is i64 to /// use a VMOVDRR or VMOVRRD node. This should not be done when the non-i64 /// operand type is illegal (e.g., v2f32 for a target that doesn't support /// vectors), since the legalizer won't know what to do with that. static SDValue ExpandBITCAST(SDNode *N, SelectionDAG &DAG) { const TargetLowering &TLI = DAG.getTargetLoweringInfo(); DebugLoc dl = N->getDebugLoc(); SDValue Op = N->getOperand(0); // This function is only supposed to be called for i64 types, either as the // source or destination of the bit convert. EVT SrcVT = Op.getValueType(); EVT DstVT = N->getValueType(0); assert((SrcVT == MVT::i64 || DstVT == MVT::i64) && "ExpandBITCAST called for non-i64 type"); // Turn i64->f64 into VMOVDRR. if (SrcVT == MVT::i64 && TLI.isTypeLegal(DstVT)) { SDValue Lo = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32, Op, DAG.getConstant(0, MVT::i32)); SDValue Hi = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32, Op, DAG.getConstant(1, MVT::i32)); return DAG.getNode(ISD::BITCAST, dl, DstVT, DAG.getNode(ARMISD::VMOVDRR, dl, MVT::f64, Lo, Hi)); } // Turn f64->i64 into VMOVRRD. if (DstVT == MVT::i64 && TLI.isTypeLegal(SrcVT)) { SDValue Cvt = DAG.getNode(ARMISD::VMOVRRD, dl, DAG.getVTList(MVT::i32, MVT::i32), &Op, 1); // Merge the pieces into a single i64 value. return DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i64, Cvt, Cvt.getValue(1)); } return SDValue(); } /// getZeroVector - Returns a vector of specified type with all zero elements. /// Zero vectors are used to represent vector negation and in those cases /// will be implemented with the NEON VNEG instruction. However, VNEG does /// not support i64 elements, so sometimes the zero vectors will need to be /// explicitly constructed. Regardless, use a canonical VMOV to create the /// zero vector. static SDValue getZeroVector(EVT VT, SelectionDAG &DAG, DebugLoc dl) { assert(VT.isVector() && "Expected a vector type"); // The canonical modified immediate encoding of a zero vector is....0! SDValue EncodedVal = DAG.getTargetConstant(0, MVT::i32); EVT VmovVT = VT.is128BitVector() ? MVT::v4i32 : MVT::v2i32; SDValue Vmov = DAG.getNode(ARMISD::VMOVIMM, dl, VmovVT, EncodedVal); return DAG.getNode(ISD::BITCAST, dl, VT, Vmov); } /// LowerShiftRightParts - Lower SRA_PARTS, which returns two /// i32 values and take a 2 x i32 value to shift plus a shift amount. 
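/// The high result word is produced with a single shift of the high input
/// word; the low result word combines pieces of both input words, and an
/// ARMISD::CMOV on (ShAmt - VTBits) >= 0 selects the variant used when the
/// shift amount reaches or exceeds the word size.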
SDValue ARMTargetLowering::LowerShiftRightParts(SDValue Op,
                                                SelectionDAG &DAG) const {
  assert(Op.getNumOperands() == 3 && "Not a double-shift!");
  EVT VT = Op.getValueType();
  unsigned VTBits = VT.getSizeInBits();
  DebugLoc dl = Op.getDebugLoc();
  SDValue ShOpLo = Op.getOperand(0);
  SDValue ShOpHi = Op.getOperand(1);
  SDValue ShAmt  = Op.getOperand(2);
  SDValue ARMcc;
  unsigned Opc = (Op.getOpcode() == ISD::SRA_PARTS) ? ISD::SRA : ISD::SRL;

  assert(Op.getOpcode() == ISD::SRA_PARTS || Op.getOpcode() == ISD::SRL_PARTS);

  SDValue RevShAmt = DAG.getNode(ISD::SUB, dl, MVT::i32,
                                 DAG.getConstant(VTBits, MVT::i32), ShAmt);
  SDValue Tmp1 = DAG.getNode(ISD::SRL, dl, VT, ShOpLo, ShAmt);
  SDValue ExtraShAmt = DAG.getNode(ISD::SUB, dl, MVT::i32, ShAmt,
                                   DAG.getConstant(VTBits, MVT::i32));
  SDValue Tmp2 = DAG.getNode(ISD::SHL, dl, VT, ShOpHi, RevShAmt);
  SDValue FalseVal = DAG.getNode(ISD::OR, dl, VT, Tmp1, Tmp2);
  SDValue TrueVal = DAG.getNode(Opc, dl, VT, ShOpHi, ExtraShAmt);

  SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
  SDValue Cmp = getARMCmp(ExtraShAmt, DAG.getConstant(0, MVT::i32), ISD::SETGE,
                          ARMcc, DAG, dl);
  SDValue Hi = DAG.getNode(Opc, dl, VT, ShOpHi, ShAmt);
  SDValue Lo = DAG.getNode(ARMISD::CMOV, dl, VT, FalseVal, TrueVal, ARMcc,
                           CCR, Cmp);

  SDValue Ops[2] = { Lo, Hi };
  return DAG.getMergeValues(Ops, 2, dl);
}

/// LowerShiftLeftParts - Lower SHL_PARTS, which returns two
/// i32 values and takes a 2 x i32 value to shift plus a shift amount.
SDValue ARMTargetLowering::LowerShiftLeftParts(SDValue Op,
                                               SelectionDAG &DAG) const {
  assert(Op.getNumOperands() == 3 && "Not a double-shift!");
  EVT VT = Op.getValueType();
  unsigned VTBits = VT.getSizeInBits();
  DebugLoc dl = Op.getDebugLoc();
  SDValue ShOpLo = Op.getOperand(0);
  SDValue ShOpHi = Op.getOperand(1);
  SDValue ShAmt  = Op.getOperand(2);
  SDValue ARMcc;

  assert(Op.getOpcode() == ISD::SHL_PARTS);
  SDValue RevShAmt = DAG.getNode(ISD::SUB, dl, MVT::i32,
                                 DAG.getConstant(VTBits, MVT::i32), ShAmt);
  SDValue Tmp1 = DAG.getNode(ISD::SRL, dl, VT, ShOpLo, RevShAmt);
  SDValue ExtraShAmt = DAG.getNode(ISD::SUB, dl, MVT::i32, ShAmt,
                                   DAG.getConstant(VTBits, MVT::i32));
  SDValue Tmp2 = DAG.getNode(ISD::SHL, dl, VT, ShOpHi, ShAmt);
  SDValue Tmp3 = DAG.getNode(ISD::SHL, dl, VT, ShOpLo, ExtraShAmt);

  SDValue FalseVal = DAG.getNode(ISD::OR, dl, VT, Tmp1, Tmp2);
  SDValue CCR = DAG.getRegister(ARM::CPSR, MVT::i32);
  SDValue Cmp = getARMCmp(ExtraShAmt, DAG.getConstant(0, MVT::i32), ISD::SETGE,
                          ARMcc, DAG, dl);
  SDValue Lo = DAG.getNode(ISD::SHL, dl, VT, ShOpLo, ShAmt);
  SDValue Hi = DAG.getNode(ARMISD::CMOV, dl, VT, FalseVal, Tmp3, ARMcc,
                           CCR, Cmp);

  SDValue Ops[2] = { Lo, Hi };
  return DAG.getMergeValues(Ops, 2, dl);
}

SDValue ARMTargetLowering::LowerFLT_ROUNDS_(SDValue Op,
                                            SelectionDAG &DAG) const {
  // The rounding mode is in bits 23:22 of the FPSCR.
  // The ARM rounding mode value to FLT_ROUNDS mapping is 0->1, 1->2, 2->3, 3->0
  // The formula we use to implement this is ((FPSCR + (1 << 22)) >> 22) & 3,
  // so that the shift + and get folded into a bitfield extract.
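  // Worked instance of the mapping: FPSCR[23:22] == 0b00 (round to nearest)
  // gives ((0 + (1 << 22)) >> 22) & 3 == 1, and FPSCR[23:22] == 0b11 (round
  // toward zero) carries out past bit 23, giving 0, matching the 3->0 case.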
  DebugLoc dl = Op.getDebugLoc();
  SDValue FPSCR = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, MVT::i32,
                              DAG.getConstant(Intrinsic::arm_get_fpscr,
                                              MVT::i32));
  SDValue FltRounds = DAG.getNode(ISD::ADD, dl, MVT::i32, FPSCR,
                                  DAG.getConstant(1U << 22, MVT::i32));
  SDValue RMODE = DAG.getNode(ISD::SRL, dl, MVT::i32, FltRounds,
                              DAG.getConstant(22, MVT::i32));
  return DAG.getNode(ISD::AND, dl, MVT::i32, RMODE,
                     DAG.getConstant(3, MVT::i32));
}

static SDValue LowerCTTZ(SDNode *N, SelectionDAG &DAG,
                         const ARMSubtarget *ST) {
  EVT VT = N->getValueType(0);
  DebugLoc dl = N->getDebugLoc();

  if (!ST->hasV6T2Ops())
    return SDValue();

  SDValue rbit = DAG.getNode(ARMISD::RBIT, dl, VT, N->getOperand(0));
  return DAG.getNode(ISD::CTLZ, dl, VT, rbit);
}

static SDValue LowerShift(SDNode *N, SelectionDAG &DAG,
                          const ARMSubtarget *ST) {
  EVT VT = N->getValueType(0);
  DebugLoc dl = N->getDebugLoc();

  if (!VT.isVector())
    return SDValue();

  // Lower vector shifts on NEON to use VSHL.
  assert(ST->hasNEON() && "unexpected vector shift");

  // Left shifts translate directly to the vshiftu intrinsic.
  if (N->getOpcode() == ISD::SHL)
    return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, VT,
                       DAG.getConstant(Intrinsic::arm_neon_vshiftu, MVT::i32),
                       N->getOperand(0), N->getOperand(1));

  assert((N->getOpcode() == ISD::SRA || N->getOpcode() == ISD::SRL) &&
         "unexpected vector shift opcode");

  // NEON uses the same intrinsics for both left and right shifts.  For
  // right shifts, the shift amounts are negative, so negate the vector of
  // shift amounts.
  EVT ShiftVT = N->getOperand(1).getValueType();
  SDValue NegatedCount = DAG.getNode(ISD::SUB, dl, ShiftVT,
                                     getZeroVector(ShiftVT, DAG, dl),
                                     N->getOperand(1));
  Intrinsic::ID vshiftInt = (N->getOpcode() == ISD::SRA ?
                               Intrinsic::arm_neon_vshifts :
                               Intrinsic::arm_neon_vshiftu);
  return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, VT,
                     DAG.getConstant(vshiftInt, MVT::i32),
                     N->getOperand(0), NegatedCount);
}

static SDValue Expand64BitShift(SDNode *N, SelectionDAG &DAG,
                                const ARMSubtarget *ST) {
  EVT VT = N->getValueType(0);
  DebugLoc dl = N->getDebugLoc();

  // We can get here for a node like i32 = ISD::SHL i32, i64
  if (VT != MVT::i64)
    return SDValue();

  assert((N->getOpcode() == ISD::SRL || N->getOpcode() == ISD::SRA) &&
         "Unknown shift to lower!");

  // We only lower SRA, SRL of 1 here, all others use generic lowering.
  if (!isa<ConstantSDNode>(N->getOperand(1)) ||
      cast<ConstantSDNode>(N->getOperand(1))->getZExtValue() != 1)
    return SDValue();

  // If we are in thumb mode, we don't have RRX.
  if (ST->isThumb1Only())
    return SDValue();

  // Okay, we have a 64-bit SRA or SRL of 1.  Lower this to an RRX expr.
  SDValue Lo = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32,
                           N->getOperand(0), DAG.getConstant(0, MVT::i32));
  SDValue Hi = DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32,
                           N->getOperand(0), DAG.getConstant(1, MVT::i32));

  // First, build a SRA_FLAG/SRL_FLAG op, which shifts the top part by one and
  // captures the result into a carry flag.
  unsigned Opc = N->getOpcode() == ISD::SRL ? ARMISD::SRL_FLAG : ARMISD::SRA_FLAG;
  Hi = DAG.getNode(Opc, dl, DAG.getVTList(MVT::i32, MVT::Glue), &Hi, 1);

  // The low part is an ARMISD::RRX operand, which shifts the carry in.
  Lo = DAG.getNode(ARMISD::RRX, dl, MVT::i32, Lo, Hi.getValue(1));

  // Merge the pieces into a single i64 value.
return DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i64, Lo, Hi); } static SDValue LowerVSETCC(SDValue Op, SelectionDAG &DAG) { SDValue TmpOp0, TmpOp1; bool Invert = false; bool Swap = false; unsigned Opc = 0; SDValue Op0 = Op.getOperand(0); SDValue Op1 = Op.getOperand(1); SDValue CC = Op.getOperand(2); EVT VT = Op.getValueType(); ISD::CondCode SetCCOpcode = cast(CC)->get(); DebugLoc dl = Op.getDebugLoc(); if (Op.getOperand(1).getValueType().isFloatingPoint()) { switch (SetCCOpcode) { default: llvm_unreachable("Illegal FP comparison"); break; case ISD::SETUNE: case ISD::SETNE: Invert = true; // Fallthrough case ISD::SETOEQ: case ISD::SETEQ: Opc = ARMISD::VCEQ; break; case ISD::SETOLT: case ISD::SETLT: Swap = true; // Fallthrough case ISD::SETOGT: case ISD::SETGT: Opc = ARMISD::VCGT; break; case ISD::SETOLE: case ISD::SETLE: Swap = true; // Fallthrough case ISD::SETOGE: case ISD::SETGE: Opc = ARMISD::VCGE; break; case ISD::SETUGE: Swap = true; // Fallthrough case ISD::SETULE: Invert = true; Opc = ARMISD::VCGT; break; case ISD::SETUGT: Swap = true; // Fallthrough case ISD::SETULT: Invert = true; Opc = ARMISD::VCGE; break; case ISD::SETUEQ: Invert = true; // Fallthrough case ISD::SETONE: // Expand this to (OLT | OGT). TmpOp0 = Op0; TmpOp1 = Op1; Opc = ISD::OR; Op0 = DAG.getNode(ARMISD::VCGT, dl, VT, TmpOp1, TmpOp0); Op1 = DAG.getNode(ARMISD::VCGT, dl, VT, TmpOp0, TmpOp1); break; case ISD::SETUO: Invert = true; // Fallthrough case ISD::SETO: // Expand this to (OLT | OGE). TmpOp0 = Op0; TmpOp1 = Op1; Opc = ISD::OR; Op0 = DAG.getNode(ARMISD::VCGT, dl, VT, TmpOp1, TmpOp0); Op1 = DAG.getNode(ARMISD::VCGE, dl, VT, TmpOp0, TmpOp1); break; } } else { // Integer comparisons. switch (SetCCOpcode) { default: llvm_unreachable("Illegal integer comparison"); break; case ISD::SETNE: Invert = true; case ISD::SETEQ: Opc = ARMISD::VCEQ; break; case ISD::SETLT: Swap = true; case ISD::SETGT: Opc = ARMISD::VCGT; break; case ISD::SETLE: Swap = true; case ISD::SETGE: Opc = ARMISD::VCGE; break; case ISD::SETULT: Swap = true; case ISD::SETUGT: Opc = ARMISD::VCGTU; break; case ISD::SETULE: Swap = true; case ISD::SETUGE: Opc = ARMISD::VCGEU; break; } // Detect VTST (Vector Test Bits) = icmp ne (and (op0, op1), zero). if (Opc == ARMISD::VCEQ) { SDValue AndOp; if (ISD::isBuildVectorAllZeros(Op1.getNode())) AndOp = Op0; else if (ISD::isBuildVectorAllZeros(Op0.getNode())) AndOp = Op1; // Ignore bitconvert. if (AndOp.getNode() && AndOp.getOpcode() == ISD::BITCAST) AndOp = AndOp.getOperand(0); if (AndOp.getNode() && AndOp.getOpcode() == ISD::AND) { Opc = ARMISD::VTST; Op0 = DAG.getNode(ISD::BITCAST, dl, VT, AndOp.getOperand(0)); Op1 = DAG.getNode(ISD::BITCAST, dl, VT, AndOp.getOperand(1)); Invert = !Invert; } } } if (Swap) std::swap(Op0, Op1); // If one of the operands is a constant vector zero, attempt to fold the // comparison to a specialized compare-against-zero form. 
SDValue SingleOp; if (ISD::isBuildVectorAllZeros(Op1.getNode())) SingleOp = Op0; else if (ISD::isBuildVectorAllZeros(Op0.getNode())) { if (Opc == ARMISD::VCGE) Opc = ARMISD::VCLEZ; else if (Opc == ARMISD::VCGT) Opc = ARMISD::VCLTZ; SingleOp = Op1; } SDValue Result; if (SingleOp.getNode()) { switch (Opc) { case ARMISD::VCEQ: Result = DAG.getNode(ARMISD::VCEQZ, dl, VT, SingleOp); break; case ARMISD::VCGE: Result = DAG.getNode(ARMISD::VCGEZ, dl, VT, SingleOp); break; case ARMISD::VCLEZ: Result = DAG.getNode(ARMISD::VCLEZ, dl, VT, SingleOp); break; case ARMISD::VCGT: Result = DAG.getNode(ARMISD::VCGTZ, dl, VT, SingleOp); break; case ARMISD::VCLTZ: Result = DAG.getNode(ARMISD::VCLTZ, dl, VT, SingleOp); break; default: Result = DAG.getNode(Opc, dl, VT, Op0, Op1); } } else { Result = DAG.getNode(Opc, dl, VT, Op0, Op1); } if (Invert) Result = DAG.getNOT(dl, Result, VT); return Result; } /// isNEONModifiedImm - Check if the specified splat value corresponds to a /// valid vector constant for a NEON instruction with a "modified immediate" /// operand (e.g., VMOV). If so, return the encoded value. static SDValue isNEONModifiedImm(uint64_t SplatBits, uint64_t SplatUndef, unsigned SplatBitSize, SelectionDAG &DAG, EVT &VT, bool is128Bits, NEONModImmType type) { unsigned OpCmode, Imm; // SplatBitSize is set to the smallest size that splats the vector, so a // zero vector will always have SplatBitSize == 8. However, NEON modified // immediate instructions others than VMOV do not support the 8-bit encoding // of a zero vector, and the default encoding of zero is supposed to be the // 32-bit version. if (SplatBits == 0) SplatBitSize = 32; switch (SplatBitSize) { case 8: if (type != VMOVModImm) return SDValue(); // Any 1-byte value is OK. Op=0, Cmode=1110. assert((SplatBits & ~0xff) == 0 && "one byte splat value is too big"); OpCmode = 0xe; Imm = SplatBits; VT = is128Bits ? MVT::v16i8 : MVT::v8i8; break; case 16: // NEON's 16-bit VMOV supports splat values where only one byte is nonzero. VT = is128Bits ? MVT::v8i16 : MVT::v4i16; if ((SplatBits & ~0xff) == 0) { // Value = 0x00nn: Op=x, Cmode=100x. OpCmode = 0x8; Imm = SplatBits; break; } if ((SplatBits & ~0xff00) == 0) { // Value = 0xnn00: Op=x, Cmode=101x. OpCmode = 0xa; Imm = SplatBits >> 8; break; } return SDValue(); case 32: // NEON's 32-bit VMOV supports splat values where: // * only one byte is nonzero, or // * the least significant byte is 0xff and the second byte is nonzero, or // * the least significant 2 bytes are 0xff and the third is nonzero. VT = is128Bits ? MVT::v4i32 : MVT::v2i32; if ((SplatBits & ~0xff) == 0) { // Value = 0x000000nn: Op=x, Cmode=000x. OpCmode = 0; Imm = SplatBits; break; } if ((SplatBits & ~0xff00) == 0) { // Value = 0x0000nn00: Op=x, Cmode=001x. OpCmode = 0x2; Imm = SplatBits >> 8; break; } if ((SplatBits & ~0xff0000) == 0) { // Value = 0x00nn0000: Op=x, Cmode=010x. OpCmode = 0x4; Imm = SplatBits >> 16; break; } if ((SplatBits & ~0xff000000) == 0) { // Value = 0xnn000000: Op=x, Cmode=011x. OpCmode = 0x6; Imm = SplatBits >> 24; break; } // cmode == 0b1100 and cmode == 0b1101 are not supported for VORR or VBIC if (type == OtherModImm) return SDValue(); if ((SplatBits & ~0xffff) == 0 && ((SplatBits | SplatUndef) & 0xff) == 0xff) { // Value = 0x0000nnff: Op=x, Cmode=1100. OpCmode = 0xc; Imm = SplatBits >> 8; SplatBits |= 0xff; break; } if ((SplatBits & ~0xffffff) == 0 && ((SplatBits | SplatUndef) & 0xffff) == 0xffff) { // Value = 0x00nnffff: Op=x, Cmode=1101. 
OpCmode = 0xd; Imm = SplatBits >> 16; SplatBits |= 0xffff; break; } // Note: there are a few 32-bit splat values (specifically: 00ffff00, // ff000000, ff0000ff, and ffff00ff) that are valid for VMOV.I64 but not // VMOV.I32. A (very) minor optimization would be to replicate the value // and fall through here to test for a valid 64-bit splat. But, then the // caller would also need to check and handle the change in size. return SDValue(); case 64: { if (type != VMOVModImm) return SDValue(); // NEON has a 64-bit VMOV splat where each byte is either 0 or 0xff. uint64_t BitMask = 0xff; uint64_t Val = 0; unsigned ImmMask = 1; Imm = 0; for (int ByteNum = 0; ByteNum < 8; ++ByteNum) { if (((SplatBits | SplatUndef) & BitMask) == BitMask) { Val |= BitMask; Imm |= ImmMask; } else if ((SplatBits & BitMask) != 0) { return SDValue(); } BitMask <<= 8; ImmMask <<= 1; } // Op=1, Cmode=1110. OpCmode = 0x1e; SplatBits = Val; VT = is128Bits ? MVT::v2i64 : MVT::v1i64; break; } default: llvm_unreachable("unexpected size for isNEONModifiedImm"); return SDValue(); } unsigned EncodedVal = ARM_AM::createNEONModImm(OpCmode, Imm); return DAG.getTargetConstant(EncodedVal, MVT::i32); } static bool isVEXTMask(const SmallVectorImpl &M, EVT VT, bool &ReverseVEXT, unsigned &Imm) { unsigned NumElts = VT.getVectorNumElements(); ReverseVEXT = false; // Assume that the first shuffle index is not UNDEF. Fail if it is. if (M[0] < 0) return false; Imm = M[0]; // If this is a VEXT shuffle, the immediate value is the index of the first // element. The other shuffle indices must be the successive elements after // the first one. unsigned ExpectedElt = Imm; for (unsigned i = 1; i < NumElts; ++i) { // Increment the expected index. If it wraps around, it may still be // a VEXT but the source vectors must be swapped. ExpectedElt += 1; if (ExpectedElt == NumElts * 2) { ExpectedElt = 0; ReverseVEXT = true; } if (M[i] < 0) continue; // ignore UNDEF indices if (ExpectedElt != static_cast(M[i])) return false; } // Adjust the index value if the source operands will be swapped. if (ReverseVEXT) Imm -= NumElts; return true; } /// isVREVMask - Check if a vector shuffle corresponds to a VREV /// instruction with the specified blocksize. (The order of the elements /// within each block of the vector is reversed.) static bool isVREVMask(const SmallVectorImpl &M, EVT VT, unsigned BlockSize) { assert((BlockSize==16 || BlockSize==32 || BlockSize==64) && "Only possible block sizes for VREV are: 16, 32, 64"); unsigned EltSz = VT.getVectorElementType().getSizeInBits(); if (EltSz == 64) return false; unsigned NumElts = VT.getVectorNumElements(); unsigned BlockElts = M[0] + 1; // If the first shuffle index is UNDEF, be optimistic. if (M[0] < 0) BlockElts = BlockSize / EltSz; if (BlockSize <= EltSz || BlockSize != BlockElts * EltSz) return false; for (unsigned i = 0; i < NumElts; ++i) { if (M[i] < 0) continue; // ignore UNDEF indices if ((unsigned) M[i] != (i - i%BlockElts) + (BlockElts - 1 - i%BlockElts)) return false; } return true; } static bool isVTBLMask(const SmallVectorImpl &M, EVT VT) { // We can handle <8 x i8> vector shuffles. If the index in the mask is out of // range, then 0 is placed into the resulting vector. So pretty much any mask // of 8 elements can work here. 
return VT == MVT::v8i8 && M.size() == 8; } static bool isVTRNMask(const SmallVectorImpl &M, EVT VT, unsigned &WhichResult) { unsigned EltSz = VT.getVectorElementType().getSizeInBits(); if (EltSz == 64) return false; unsigned NumElts = VT.getVectorNumElements(); WhichResult = (M[0] == 0 ? 0 : 1); for (unsigned i = 0; i < NumElts; i += 2) { if ((M[i] >= 0 && (unsigned) M[i] != i + WhichResult) || (M[i+1] >= 0 && (unsigned) M[i+1] != i + NumElts + WhichResult)) return false; } return true; } /// isVTRN_v_undef_Mask - Special case of isVTRNMask for canonical form of /// "vector_shuffle v, v", i.e., "vector_shuffle v, undef". /// Mask is e.g., <0, 0, 2, 2> instead of <0, 4, 2, 6>. static bool isVTRN_v_undef_Mask(const SmallVectorImpl &M, EVT VT, unsigned &WhichResult) { unsigned EltSz = VT.getVectorElementType().getSizeInBits(); if (EltSz == 64) return false; unsigned NumElts = VT.getVectorNumElements(); WhichResult = (M[0] == 0 ? 0 : 1); for (unsigned i = 0; i < NumElts; i += 2) { if ((M[i] >= 0 && (unsigned) M[i] != i + WhichResult) || (M[i+1] >= 0 && (unsigned) M[i+1] != i + WhichResult)) return false; } return true; } static bool isVUZPMask(const SmallVectorImpl &M, EVT VT, unsigned &WhichResult) { unsigned EltSz = VT.getVectorElementType().getSizeInBits(); if (EltSz == 64) return false; unsigned NumElts = VT.getVectorNumElements(); WhichResult = (M[0] == 0 ? 0 : 1); for (unsigned i = 0; i != NumElts; ++i) { if (M[i] < 0) continue; // ignore UNDEF indices if ((unsigned) M[i] != 2 * i + WhichResult) return false; } // VUZP.32 for 64-bit vectors is a pseudo-instruction alias for VTRN.32. if (VT.is64BitVector() && EltSz == 32) return false; return true; } /// isVUZP_v_undef_Mask - Special case of isVUZPMask for canonical form of /// "vector_shuffle v, v", i.e., "vector_shuffle v, undef". /// Mask is e.g., <0, 2, 0, 2> instead of <0, 2, 4, 6>, static bool isVUZP_v_undef_Mask(const SmallVectorImpl &M, EVT VT, unsigned &WhichResult) { unsigned EltSz = VT.getVectorElementType().getSizeInBits(); if (EltSz == 64) return false; unsigned Half = VT.getVectorNumElements() / 2; WhichResult = (M[0] == 0 ? 0 : 1); for (unsigned j = 0; j != 2; ++j) { unsigned Idx = WhichResult; for (unsigned i = 0; i != Half; ++i) { int MIdx = M[i + j * Half]; if (MIdx >= 0 && (unsigned) MIdx != Idx) return false; Idx += 2; } } // VUZP.32 for 64-bit vectors is a pseudo-instruction alias for VTRN.32. if (VT.is64BitVector() && EltSz == 32) return false; return true; } static bool isVZIPMask(const SmallVectorImpl &M, EVT VT, unsigned &WhichResult) { unsigned EltSz = VT.getVectorElementType().getSizeInBits(); if (EltSz == 64) return false; unsigned NumElts = VT.getVectorNumElements(); WhichResult = (M[0] == 0 ? 0 : 1); unsigned Idx = WhichResult * NumElts / 2; for (unsigned i = 0; i != NumElts; i += 2) { if ((M[i] >= 0 && (unsigned) M[i] != Idx) || (M[i+1] >= 0 && (unsigned) M[i+1] != Idx + NumElts)) return false; Idx += 1; } // VZIP.32 for 64-bit vectors is a pseudo-instruction alias for VTRN.32. if (VT.is64BitVector() && EltSz == 32) return false; return true; } /// isVZIP_v_undef_Mask - Special case of isVZIPMask for canonical form of /// "vector_shuffle v, v", i.e., "vector_shuffle v, undef". /// Mask is e.g., <0, 0, 1, 1> instead of <0, 4, 1, 5>. static bool isVZIP_v_undef_Mask(const SmallVectorImpl &M, EVT VT, unsigned &WhichResult) { unsigned EltSz = VT.getVectorElementType().getSizeInBits(); if (EltSz == 64) return false; unsigned NumElts = VT.getVectorNumElements(); WhichResult = (M[0] == 0 ? 
0 : 1); unsigned Idx = WhichResult * NumElts / 2; for (unsigned i = 0; i != NumElts; i += 2) { if ((M[i] >= 0 && (unsigned) M[i] != Idx) || (M[i+1] >= 0 && (unsigned) M[i+1] != Idx)) return false; Idx += 1; } // VZIP.32 for 64-bit vectors is a pseudo-instruction alias for VTRN.32. if (VT.is64BitVector() && EltSz == 32) return false; return true; } // If N is an integer constant that can be moved into a register in one // instruction, return an SDValue of such a constant (will become a MOV // instruction). Otherwise return null. static SDValue IsSingleInstrConstant(SDValue N, SelectionDAG &DAG, const ARMSubtarget *ST, DebugLoc dl) { uint64_t Val; if (!isa(N)) return SDValue(); Val = cast(N)->getZExtValue(); if (ST->isThumb1Only()) { if (Val <= 255 || ~Val <= 255) return DAG.getConstant(Val, MVT::i32); } else { if (ARM_AM::getSOImmVal(Val) != -1 || ARM_AM::getSOImmVal(~Val) != -1) return DAG.getConstant(Val, MVT::i32); } return SDValue(); } // If this is a case we can't handle, return null and let the default // expansion code take care of it. SDValue ARMTargetLowering::LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG, const ARMSubtarget *ST) const { BuildVectorSDNode *BVN = cast(Op.getNode()); DebugLoc dl = Op.getDebugLoc(); EVT VT = Op.getValueType(); APInt SplatBits, SplatUndef; unsigned SplatBitSize; bool HasAnyUndefs; if (BVN->isConstantSplat(SplatBits, SplatUndef, SplatBitSize, HasAnyUndefs)) { if (SplatBitSize <= 64) { // Check if an immediate VMOV works. EVT VmovVT; SDValue Val = isNEONModifiedImm(SplatBits.getZExtValue(), SplatUndef.getZExtValue(), SplatBitSize, DAG, VmovVT, VT.is128BitVector(), VMOVModImm); if (Val.getNode()) { SDValue Vmov = DAG.getNode(ARMISD::VMOVIMM, dl, VmovVT, Val); return DAG.getNode(ISD::BITCAST, dl, VT, Vmov); } // Try an immediate VMVN. uint64_t NegatedImm = (~SplatBits).getZExtValue(); Val = isNEONModifiedImm(NegatedImm, SplatUndef.getZExtValue(), SplatBitSize, DAG, VmovVT, VT.is128BitVector(), VMVNModImm); if (Val.getNode()) { SDValue Vmov = DAG.getNode(ARMISD::VMVNIMM, dl, VmovVT, Val); return DAG.getNode(ISD::BITCAST, dl, VT, Vmov); } } } // Scan through the operands to see if only one value is used. unsigned NumElts = VT.getVectorNumElements(); bool isOnlyLowElement = true; bool usesOnlyOneValue = true; bool isConstant = true; SDValue Value; for (unsigned i = 0; i < NumElts; ++i) { SDValue V = Op.getOperand(i); if (V.getOpcode() == ISD::UNDEF) continue; if (i > 0) isOnlyLowElement = false; if (!isa(V) && !isa(V)) isConstant = false; if (!Value.getNode()) Value = V; else if (V != Value) usesOnlyOneValue = false; } if (!Value.getNode()) return DAG.getUNDEF(VT); if (isOnlyLowElement) return DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, VT, Value); unsigned EltSize = VT.getVectorElementType().getSizeInBits(); // Use VDUP for non-constant splats. For f32 constant splats, reduce to // i32 and try again. 
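  // (The f32 case recurses on the same bits reinterpreted as i32 so that the
  // integer-only constant paths, e.g. IsSingleInstrConstant followed by VDUP,
  // get a chance to match the splatted bit pattern.)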
if (usesOnlyOneValue && EltSize <= 32) { if (!isConstant) return DAG.getNode(ARMISD::VDUP, dl, VT, Value); if (VT.getVectorElementType().isFloatingPoint()) { SmallVector Ops; for (unsigned i = 0; i < NumElts; ++i) Ops.push_back(DAG.getNode(ISD::BITCAST, dl, MVT::i32, Op.getOperand(i))); EVT VecVT = EVT::getVectorVT(*DAG.getContext(), MVT::i32, NumElts); SDValue Val = DAG.getNode(ISD::BUILD_VECTOR, dl, VecVT, &Ops[0], NumElts); Val = LowerBUILD_VECTOR(Val, DAG, ST); if (Val.getNode()) return DAG.getNode(ISD::BITCAST, dl, VT, Val); } SDValue Val = IsSingleInstrConstant(Value, DAG, ST, dl); if (Val.getNode()) return DAG.getNode(ARMISD::VDUP, dl, VT, Val); } // If all elements are constants and the case above didn't get hit, fall back // to the default expansion, which will generate a load from the constant // pool. if (isConstant) return SDValue(); // Empirical tests suggest this is rarely worth it for vectors of length <= 2. if (NumElts >= 4) { SDValue shuffle = ReconstructShuffle(Op, DAG); if (shuffle != SDValue()) return shuffle; } // Vectors with 32- or 64-bit elements can be built by directly assigning // the subregisters. Lower it to an ARMISD::BUILD_VECTOR so the operands // will be legalized. if (EltSize >= 32) { // Do the expansion with floating-point types, since that is what the VFP // registers are defined to use, and since i64 is not legal. EVT EltVT = EVT::getFloatingPointVT(EltSize); EVT VecVT = EVT::getVectorVT(*DAG.getContext(), EltVT, NumElts); SmallVector Ops; for (unsigned i = 0; i < NumElts; ++i) Ops.push_back(DAG.getNode(ISD::BITCAST, dl, EltVT, Op.getOperand(i))); SDValue Val = DAG.getNode(ARMISD::BUILD_VECTOR, dl, VecVT, &Ops[0],NumElts); return DAG.getNode(ISD::BITCAST, dl, VT, Val); } return SDValue(); } // Gather data to see if the operation can be modelled as a // shuffle in combination with VEXTs. SDValue ARMTargetLowering::ReconstructShuffle(SDValue Op, SelectionDAG &DAG) const { DebugLoc dl = Op.getDebugLoc(); EVT VT = Op.getValueType(); unsigned NumElts = VT.getVectorNumElements(); SmallVector SourceVecs; SmallVector MinElts; SmallVector MaxElts; for (unsigned i = 0; i < NumElts; ++i) { SDValue V = Op.getOperand(i); if (V.getOpcode() == ISD::UNDEF) continue; else if (V.getOpcode() != ISD::EXTRACT_VECTOR_ELT) { // A shuffle can only come from building a vector from various // elements of other vectors. return SDValue(); } else if (V.getOperand(0).getValueType().getVectorElementType() != VT.getVectorElementType()) { // This code doesn't know how to handle shuffles where the vector // element types do not match (this happens because type legalization // promotes the return type of EXTRACT_VECTOR_ELT). // FIXME: It might be appropriate to extend this code to handle // mismatched types. return SDValue(); } // Record this extraction against the appropriate vector if possible... SDValue SourceVec = V.getOperand(0); unsigned EltNo = cast(V.getOperand(1))->getZExtValue(); bool FoundSource = false; for (unsigned j = 0; j < SourceVecs.size(); ++j) { if (SourceVecs[j] == SourceVec) { if (MinElts[j] > EltNo) MinElts[j] = EltNo; if (MaxElts[j] < EltNo) MaxElts[j] = EltNo; FoundSource = true; break; } } // Or record a new source if not... if (!FoundSource) { SourceVecs.push_back(SourceVec); MinElts.push_back(EltNo); MaxElts.push_back(EltNo); } } // Currently only do something sane when at most two source vectors // involved. 
if (SourceVecs.size() > 2) return SDValue(); SDValue ShuffleSrcs[2] = {DAG.getUNDEF(VT), DAG.getUNDEF(VT) }; int VEXTOffsets[2] = {0, 0}; // This loop extracts the usage patterns of the source vectors // and prepares appropriate SDValues for a shuffle if possible. for (unsigned i = 0; i < SourceVecs.size(); ++i) { if (SourceVecs[i].getValueType() == VT) { // No VEXT necessary ShuffleSrcs[i] = SourceVecs[i]; VEXTOffsets[i] = 0; continue; } else if (SourceVecs[i].getValueType().getVectorNumElements() < NumElts) { // It probably isn't worth padding out a smaller vector just to // break it down again in a shuffle. return SDValue(); } // Since only 64-bit and 128-bit vectors are legal on ARM and // we've eliminated the other cases... assert(SourceVecs[i].getValueType().getVectorNumElements() == 2*NumElts && "unexpected vector sizes in ReconstructShuffle"); if (MaxElts[i] - MinElts[i] >= NumElts) { // Span too large for a VEXT to cope return SDValue(); } if (MinElts[i] >= NumElts) { // The extraction can just take the second half VEXTOffsets[i] = NumElts; ShuffleSrcs[i] = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, VT, SourceVecs[i], DAG.getIntPtrConstant(NumElts)); } else if (MaxElts[i] < NumElts) { // The extraction can just take the first half VEXTOffsets[i] = 0; ShuffleSrcs[i] = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, VT, SourceVecs[i], DAG.getIntPtrConstant(0)); } else { // An actual VEXT is needed VEXTOffsets[i] = MinElts[i]; SDValue VEXTSrc1 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, VT, SourceVecs[i], DAG.getIntPtrConstant(0)); SDValue VEXTSrc2 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, VT, SourceVecs[i], DAG.getIntPtrConstant(NumElts)); ShuffleSrcs[i] = DAG.getNode(ARMISD::VEXT, dl, VT, VEXTSrc1, VEXTSrc2, DAG.getConstant(VEXTOffsets[i], MVT::i32)); } } SmallVector Mask; for (unsigned i = 0; i < NumElts; ++i) { SDValue Entry = Op.getOperand(i); if (Entry.getOpcode() == ISD::UNDEF) { Mask.push_back(-1); continue; } SDValue ExtractVec = Entry.getOperand(0); int ExtractElt = cast(Op.getOperand(i) .getOperand(1))->getSExtValue(); if (ExtractVec == SourceVecs[0]) { Mask.push_back(ExtractElt - VEXTOffsets[0]); } else { Mask.push_back(ExtractElt + NumElts - VEXTOffsets[1]); } } // Final check before we try to produce nonsense... if (isShuffleMaskLegal(Mask, VT)) return DAG.getVectorShuffle(VT, dl, ShuffleSrcs[0], ShuffleSrcs[1], &Mask[0]); return SDValue(); } /// isShuffleMaskLegal - Targets can use this to indicate that they only /// support *some* VECTOR_SHUFFLE operations, those with specific masks. /// By default, if a target supports the VECTOR_SHUFFLE node, all mask values /// are assumed to be legal. bool ARMTargetLowering::isShuffleMaskLegal(const SmallVectorImpl &M, EVT VT) const { if (VT.getVectorNumElements() == 4 && (VT.is128BitVector() || VT.is64BitVector())) { unsigned PFIndexes[4]; for (unsigned i = 0; i != 4; ++i) { if (M[i] < 0) PFIndexes[i] = 8; else PFIndexes[i] = M[i]; } // Compute the index in the perfect shuffle table. 
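    // Each PFIndexes entry is in [0, 8] (8 standing for an undef lane), so the
    // four entries are combined as base-9 digits to index PerfectShuffleTable.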
unsigned PFTableIndex = PFIndexes[0]*9*9*9+PFIndexes[1]*9*9+PFIndexes[2]*9+PFIndexes[3]; unsigned PFEntry = PerfectShuffleTable[PFTableIndex]; unsigned Cost = (PFEntry >> 30); if (Cost <= 4) return true; } bool ReverseVEXT; unsigned Imm, WhichResult; unsigned EltSize = VT.getVectorElementType().getSizeInBits(); return (EltSize >= 32 || ShuffleVectorSDNode::isSplatMask(&M[0], VT) || isVREVMask(M, VT, 64) || isVREVMask(M, VT, 32) || isVREVMask(M, VT, 16) || isVEXTMask(M, VT, ReverseVEXT, Imm) || isVTBLMask(M, VT) || isVTRNMask(M, VT, WhichResult) || isVUZPMask(M, VT, WhichResult) || isVZIPMask(M, VT, WhichResult) || isVTRN_v_undef_Mask(M, VT, WhichResult) || isVUZP_v_undef_Mask(M, VT, WhichResult) || isVZIP_v_undef_Mask(M, VT, WhichResult)); } /// GeneratePerfectShuffle - Given an entry in the perfect-shuffle table, emit /// the specified operations to build the shuffle. static SDValue GeneratePerfectShuffle(unsigned PFEntry, SDValue LHS, SDValue RHS, SelectionDAG &DAG, DebugLoc dl) { unsigned OpNum = (PFEntry >> 26) & 0x0F; unsigned LHSID = (PFEntry >> 13) & ((1 << 13)-1); unsigned RHSID = (PFEntry >> 0) & ((1 << 13)-1); enum { OP_COPY = 0, // Copy, used for things like to say it is <0,1,2,3> OP_VREV, OP_VDUP0, OP_VDUP1, OP_VDUP2, OP_VDUP3, OP_VEXT1, OP_VEXT2, OP_VEXT3, OP_VUZPL, // VUZP, left result OP_VUZPR, // VUZP, right result OP_VZIPL, // VZIP, left result OP_VZIPR, // VZIP, right result OP_VTRNL, // VTRN, left result OP_VTRNR // VTRN, right result }; if (OpNum == OP_COPY) { if (LHSID == (1*9+2)*9+3) return LHS; assert(LHSID == ((4*9+5)*9+6)*9+7 && "Illegal OP_COPY!"); return RHS; } SDValue OpLHS, OpRHS; OpLHS = GeneratePerfectShuffle(PerfectShuffleTable[LHSID], LHS, RHS, DAG, dl); OpRHS = GeneratePerfectShuffle(PerfectShuffleTable[RHSID], LHS, RHS, DAG, dl); EVT VT = OpLHS.getValueType(); switch (OpNum) { default: llvm_unreachable("Unknown shuffle opcode!"); case OP_VREV: // VREV divides the vector in half and swaps within the half. if (VT.getVectorElementType() == MVT::i32 || VT.getVectorElementType() == MVT::f32) return DAG.getNode(ARMISD::VREV64, dl, VT, OpLHS); // vrev <4 x i16> -> VREV32 if (VT.getVectorElementType() == MVT::i16) return DAG.getNode(ARMISD::VREV32, dl, VT, OpLHS); // vrev <4 x i8> -> VREV16 assert(VT.getVectorElementType() == MVT::i8); return DAG.getNode(ARMISD::VREV16, dl, VT, OpLHS); case OP_VDUP0: case OP_VDUP1: case OP_VDUP2: case OP_VDUP3: return DAG.getNode(ARMISD::VDUPLANE, dl, VT, OpLHS, DAG.getConstant(OpNum-OP_VDUP0, MVT::i32)); case OP_VEXT1: case OP_VEXT2: case OP_VEXT3: return DAG.getNode(ARMISD::VEXT, dl, VT, OpLHS, OpRHS, DAG.getConstant(OpNum-OP_VEXT1+1, MVT::i32)); case OP_VUZPL: case OP_VUZPR: return DAG.getNode(ARMISD::VUZP, dl, DAG.getVTList(VT, VT), OpLHS, OpRHS).getValue(OpNum-OP_VUZPL); case OP_VZIPL: case OP_VZIPR: return DAG.getNode(ARMISD::VZIP, dl, DAG.getVTList(VT, VT), OpLHS, OpRHS).getValue(OpNum-OP_VZIPL); case OP_VTRNL: case OP_VTRNR: return DAG.getNode(ARMISD::VTRN, dl, DAG.getVTList(VT, VT), OpLHS, OpRHS).getValue(OpNum-OP_VTRNL); } } static SDValue LowerVECTOR_SHUFFLEv8i8(SDValue Op, SmallVectorImpl &ShuffleMask, SelectionDAG &DAG) { // Check to see if we can use the VTBL instruction. 
  SDValue V1 = Op.getOperand(0);
  SDValue V2 = Op.getOperand(1);
  DebugLoc DL = Op.getDebugLoc();
  SmallVector<SDValue, 8> VTBLMask;
  for (SmallVectorImpl<int>::iterator
         I = ShuffleMask.begin(), E = ShuffleMask.end(); I != E; ++I)
    VTBLMask.push_back(DAG.getConstant(*I, MVT::i32));

  if (V2.getNode()->getOpcode() == ISD::UNDEF)
    return DAG.getNode(ARMISD::VTBL1, DL, MVT::v8i8, V1,
                       DAG.getNode(ISD::BUILD_VECTOR, DL, MVT::v8i8,
                                   &VTBLMask[0], 8));

  return DAG.getNode(ARMISD::VTBL2, DL, MVT::v8i8, V1, V2,
                     DAG.getNode(ISD::BUILD_VECTOR, DL, MVT::v8i8,
                                 &VTBLMask[0], 8));
}

static SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) {
  SDValue V1 = Op.getOperand(0);
  SDValue V2 = Op.getOperand(1);
  DebugLoc dl = Op.getDebugLoc();
  EVT VT = Op.getValueType();
  ShuffleVectorSDNode *SVN = cast<ShuffleVectorSDNode>(Op.getNode());
  SmallVector<int, 8> ShuffleMask;

  // Convert shuffles that are directly supported on NEON to target-specific
  // DAG nodes, instead of keeping them as shuffles and matching them again
  // during code selection.  This is more efficient and avoids the possibility
  // of inconsistencies between legalization and selection.
  // FIXME: floating-point vectors should be canonicalized to integer vectors
  // of the same type so that they get CSEd properly.
  SVN->getMask(ShuffleMask);

  unsigned EltSize = VT.getVectorElementType().getSizeInBits();
  if (EltSize <= 32) {
    if (ShuffleVectorSDNode::isSplatMask(&ShuffleMask[0], VT)) {
      int Lane = SVN->getSplatIndex();
      // If this is undef splat, generate it via "just" vdup, if possible.
      if (Lane == -1) Lane = 0;

      if (Lane == 0 && V1.getOpcode() == ISD::SCALAR_TO_VECTOR) {
        return DAG.getNode(ARMISD::VDUP, dl, VT, V1.getOperand(0));
      }
      return DAG.getNode(ARMISD::VDUPLANE, dl, VT, V1,
                         DAG.getConstant(Lane, MVT::i32));
    }

    bool ReverseVEXT;
    unsigned Imm;
    if (isVEXTMask(ShuffleMask, VT, ReverseVEXT, Imm)) {
      if (ReverseVEXT)
        std::swap(V1, V2);
      return DAG.getNode(ARMISD::VEXT, dl, VT, V1, V2,
                         DAG.getConstant(Imm, MVT::i32));
    }

    if (isVREVMask(ShuffleMask, VT, 64))
      return DAG.getNode(ARMISD::VREV64, dl, VT, V1);
    if (isVREVMask(ShuffleMask, VT, 32))
      return DAG.getNode(ARMISD::VREV32, dl, VT, V1);
    if (isVREVMask(ShuffleMask, VT, 16))
      return DAG.getNode(ARMISD::VREV16, dl, VT, V1);

    // Check for Neon shuffles that modify both input vectors in place.
    // If both results are used, i.e., if there are two shuffles with the same
    // source operands and with masks corresponding to both results of one of
    // these operations, DAG memoization will ensure that a single node is
    // used for both shuffles.
    unsigned WhichResult;
    if (isVTRNMask(ShuffleMask, VT, WhichResult))
      return DAG.getNode(ARMISD::VTRN, dl, DAG.getVTList(VT, VT),
                         V1, V2).getValue(WhichResult);
    if (isVUZPMask(ShuffleMask, VT, WhichResult))
      return DAG.getNode(ARMISD::VUZP, dl, DAG.getVTList(VT, VT),
                         V1, V2).getValue(WhichResult);
    if (isVZIPMask(ShuffleMask, VT, WhichResult))
      return DAG.getNode(ARMISD::VZIP, dl, DAG.getVTList(VT, VT),
                         V1, V2).getValue(WhichResult);

    if (isVTRN_v_undef_Mask(ShuffleMask, VT, WhichResult))
      return DAG.getNode(ARMISD::VTRN, dl, DAG.getVTList(VT, VT),
                         V1, V1).getValue(WhichResult);
    if (isVUZP_v_undef_Mask(ShuffleMask, VT, WhichResult))
      return DAG.getNode(ARMISD::VUZP, dl, DAG.getVTList(VT, VT),
                         V1, V1).getValue(WhichResult);
    if (isVZIP_v_undef_Mask(ShuffleMask, VT, WhichResult))
      return DAG.getNode(ARMISD::VZIP, dl, DAG.getVTList(VT, VT),
                         V1, V1).getValue(WhichResult);
  }

  // If the shuffle is not directly supported and it has 4 elements, use
  // the PerfectShuffle-generated table to synthesize it from other shuffles.
unsigned NumElts = VT.getVectorNumElements(); if (NumElts == 4) { unsigned PFIndexes[4]; for (unsigned i = 0; i != 4; ++i) { if (ShuffleMask[i] < 0) PFIndexes[i] = 8; else PFIndexes[i] = ShuffleMask[i]; } // Compute the index in the perfect shuffle table. unsigned PFTableIndex = PFIndexes[0]*9*9*9+PFIndexes[1]*9*9+PFIndexes[2]*9+PFIndexes[3]; unsigned PFEntry = PerfectShuffleTable[PFTableIndex]; unsigned Cost = (PFEntry >> 30); if (Cost <= 4) return GeneratePerfectShuffle(PFEntry, V1, V2, DAG, dl); } // Implement shuffles with 32- or 64-bit elements as ARMISD::BUILD_VECTORs. if (EltSize >= 32) { // Do the expansion with floating-point types, since that is what the VFP // registers are defined to use, and since i64 is not legal. EVT EltVT = EVT::getFloatingPointVT(EltSize); EVT VecVT = EVT::getVectorVT(*DAG.getContext(), EltVT, NumElts); V1 = DAG.getNode(ISD::BITCAST, dl, VecVT, V1); V2 = DAG.getNode(ISD::BITCAST, dl, VecVT, V2); SmallVector Ops; for (unsigned i = 0; i < NumElts; ++i) { if (ShuffleMask[i] < 0) Ops.push_back(DAG.getUNDEF(EltVT)); else Ops.push_back(DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, EltVT, ShuffleMask[i] < (int)NumElts ? V1 : V2, DAG.getConstant(ShuffleMask[i] & (NumElts-1), MVT::i32))); } SDValue Val = DAG.getNode(ARMISD::BUILD_VECTOR, dl, VecVT, &Ops[0],NumElts); return DAG.getNode(ISD::BITCAST, dl, VT, Val); } if (VT == MVT::v8i8) { SDValue NewOp = LowerVECTOR_SHUFFLEv8i8(Op, ShuffleMask, DAG); if (NewOp.getNode()) return NewOp; } return SDValue(); } static SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) { // EXTRACT_VECTOR_ELT is legal only for immediate indexes. SDValue Lane = Op.getOperand(1); if (!isa(Lane)) return SDValue(); SDValue Vec = Op.getOperand(0); if (Op.getValueType() == MVT::i32 && Vec.getValueType().getVectorElementType().getSizeInBits() < 32) { DebugLoc dl = Op.getDebugLoc(); return DAG.getNode(ARMISD::VGETLANEu, dl, MVT::i32, Vec, Lane); } return Op; } static SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) { // The only time a CONCAT_VECTORS operation can have legal types is when // two 64-bit vectors are concatenated to a 128-bit vector. assert(Op.getValueType().is128BitVector() && Op.getNumOperands() == 2 && "unexpected CONCAT_VECTORS"); DebugLoc dl = Op.getDebugLoc(); SDValue Val = DAG.getUNDEF(MVT::v2f64); SDValue Op0 = Op.getOperand(0); SDValue Op1 = Op.getOperand(1); if (Op0.getOpcode() != ISD::UNDEF) Val = DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, MVT::v2f64, Val, DAG.getNode(ISD::BITCAST, dl, MVT::f64, Op0), DAG.getIntPtrConstant(0)); if (Op1.getOpcode() != ISD::UNDEF) Val = DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, MVT::v2f64, Val, DAG.getNode(ISD::BITCAST, dl, MVT::f64, Op1), DAG.getIntPtrConstant(1)); return DAG.getNode(ISD::BITCAST, dl, Op.getValueType(), Val); } /// isExtendedBUILD_VECTOR - Check if N is a constant BUILD_VECTOR where each /// element has been zero/sign-extended, depending on the isSigned parameter, /// from an integer type half its size. static bool isExtendedBUILD_VECTOR(SDNode *N, SelectionDAG &DAG, bool isSigned) { // A v2i64 BUILD_VECTOR will have been legalized to a BITCAST from v4i32. EVT VT = N->getValueType(0); if (VT == MVT::v2i64 && N->getOpcode() == ISD::BITCAST) { SDNode *BVN = N->getOperand(0).getNode(); if (BVN->getValueType(0) != MVT::v4i32 || BVN->getOpcode() != ISD::BUILD_VECTOR) return false; unsigned LoElt = DAG.getTargetLoweringInfo().isBigEndian() ? 
1 : 0; unsigned HiElt = 1 - LoElt; ConstantSDNode *Lo0 = dyn_cast(BVN->getOperand(LoElt)); ConstantSDNode *Hi0 = dyn_cast(BVN->getOperand(HiElt)); ConstantSDNode *Lo1 = dyn_cast(BVN->getOperand(LoElt+2)); ConstantSDNode *Hi1 = dyn_cast(BVN->getOperand(HiElt+2)); if (!Lo0 || !Hi0 || !Lo1 || !Hi1) return false; if (isSigned) { if (Hi0->getSExtValue() == Lo0->getSExtValue() >> 32 && Hi1->getSExtValue() == Lo1->getSExtValue() >> 32) return true; } else { if (Hi0->isNullValue() && Hi1->isNullValue()) return true; } return false; } if (N->getOpcode() != ISD::BUILD_VECTOR) return false; for (unsigned i = 0, e = N->getNumOperands(); i != e; ++i) { SDNode *Elt = N->getOperand(i).getNode(); if (ConstantSDNode *C = dyn_cast(Elt)) { unsigned EltSize = VT.getVectorElementType().getSizeInBits(); unsigned HalfSize = EltSize / 2; if (isSigned) { int64_t SExtVal = C->getSExtValue(); if ((SExtVal >> HalfSize) != (SExtVal >> EltSize)) return false; } else { if ((C->getZExtValue() >> HalfSize) != 0) return false; } continue; } return false; } return true; } /// isSignExtended - Check if a node is a vector value that is sign-extended /// or a constant BUILD_VECTOR with sign-extended elements. static bool isSignExtended(SDNode *N, SelectionDAG &DAG) { if (N->getOpcode() == ISD::SIGN_EXTEND || ISD::isSEXTLoad(N)) return true; if (isExtendedBUILD_VECTOR(N, DAG, true)) return true; return false; } /// isZeroExtended - Check if a node is a vector value that is zero-extended /// or a constant BUILD_VECTOR with zero-extended elements. static bool isZeroExtended(SDNode *N, SelectionDAG &DAG) { if (N->getOpcode() == ISD::ZERO_EXTEND || ISD::isZEXTLoad(N)) return true; if (isExtendedBUILD_VECTOR(N, DAG, false)) return true; return false; } /// SkipExtension - For a node that is a SIGN_EXTEND, ZERO_EXTEND, extending /// load, or BUILD_VECTOR with extended elements, return the unextended value. static SDValue SkipExtension(SDNode *N, SelectionDAG &DAG) { if (N->getOpcode() == ISD::SIGN_EXTEND || N->getOpcode() == ISD::ZERO_EXTEND) return N->getOperand(0); if (LoadSDNode *LD = dyn_cast(N)) return DAG.getLoad(LD->getMemoryVT(), N->getDebugLoc(), LD->getChain(), LD->getBasePtr(), LD->getPointerInfo(), LD->isVolatile(), LD->isNonTemporal(), LD->getAlignment()); // Otherwise, the value must be a BUILD_VECTOR. For v2i64, it will // have been legalized as a BITCAST from v4i32. if (N->getOpcode() == ISD::BITCAST) { SDNode *BVN = N->getOperand(0).getNode(); assert(BVN->getOpcode() == ISD::BUILD_VECTOR && BVN->getValueType(0) == MVT::v4i32 && "expected v4i32 BUILD_VECTOR"); unsigned LowElt = DAG.getTargetLoweringInfo().isBigEndian() ? 1 : 0; return DAG.getNode(ISD::BUILD_VECTOR, N->getDebugLoc(), MVT::v2i32, BVN->getOperand(LowElt), BVN->getOperand(LowElt+2)); } // Construct a new BUILD_VECTOR with elements truncated to half the size. 
assert(N->getOpcode() == ISD::BUILD_VECTOR && "expected BUILD_VECTOR"); EVT VT = N->getValueType(0); unsigned EltSize = VT.getVectorElementType().getSizeInBits() / 2; unsigned NumElts = VT.getVectorNumElements(); MVT TruncVT = MVT::getIntegerVT(EltSize); SmallVector Ops; for (unsigned i = 0; i != NumElts; ++i) { ConstantSDNode *C = cast(N->getOperand(i)); const APInt &CInt = C->getAPIntValue(); Ops.push_back(DAG.getConstant(CInt.trunc(EltSize), TruncVT)); } return DAG.getNode(ISD::BUILD_VECTOR, N->getDebugLoc(), MVT::getVectorVT(TruncVT, NumElts), Ops.data(), NumElts); } static bool isAddSubSExt(SDNode *N, SelectionDAG &DAG) { unsigned Opcode = N->getOpcode(); if (Opcode == ISD::ADD || Opcode == ISD::SUB) { SDNode *N0 = N->getOperand(0).getNode(); SDNode *N1 = N->getOperand(1).getNode(); return N0->hasOneUse() && N1->hasOneUse() && isSignExtended(N0, DAG) && isSignExtended(N1, DAG); } return false; } static bool isAddSubZExt(SDNode *N, SelectionDAG &DAG) { unsigned Opcode = N->getOpcode(); if (Opcode == ISD::ADD || Opcode == ISD::SUB) { SDNode *N0 = N->getOperand(0).getNode(); SDNode *N1 = N->getOperand(1).getNode(); return N0->hasOneUse() && N1->hasOneUse() && isZeroExtended(N0, DAG) && isZeroExtended(N1, DAG); } return false; } static SDValue LowerMUL(SDValue Op, SelectionDAG &DAG) { // Multiplications are only custom-lowered for 128-bit vectors so that // VMULL can be detected. Otherwise v2i64 multiplications are not legal. EVT VT = Op.getValueType(); assert(VT.is128BitVector() && "unexpected type for custom-lowering ISD::MUL"); SDNode *N0 = Op.getOperand(0).getNode(); SDNode *N1 = Op.getOperand(1).getNode(); unsigned NewOpc = 0; bool isMLA = false; bool isN0SExt = isSignExtended(N0, DAG); bool isN1SExt = isSignExtended(N1, DAG); if (isN0SExt && isN1SExt) NewOpc = ARMISD::VMULLs; else { bool isN0ZExt = isZeroExtended(N0, DAG); bool isN1ZExt = isZeroExtended(N1, DAG); if (isN0ZExt && isN1ZExt) NewOpc = ARMISD::VMULLu; else if (isN1SExt || isN1ZExt) { // Look for (s/zext A + s/zext B) * (s/zext C). We want to turn these // into (s/zext A * s/zext C) + (s/zext B * s/zext C) if (isN1SExt && isAddSubSExt(N0, DAG)) { NewOpc = ARMISD::VMULLs; isMLA = true; } else if (isN1ZExt && isAddSubZExt(N0, DAG)) { NewOpc = ARMISD::VMULLu; isMLA = true; } else if (isN0ZExt && isAddSubZExt(N1, DAG)) { std::swap(N0, N1); NewOpc = ARMISD::VMULLu; isMLA = true; } } if (!NewOpc) { if (VT == MVT::v2i64) // Fall through to expand this. It is not legal. return SDValue(); else // Other vector multiplications are legal. return Op; } } // Legalize to a VMULL instruction. DebugLoc DL = Op.getDebugLoc(); SDValue Op0; SDValue Op1 = SkipExtension(N1, DAG); if (!isMLA) { Op0 = SkipExtension(N0, DAG); assert(Op0.getValueType().is64BitVector() && Op1.getValueType().is64BitVector() && "unexpected types for extended operands to VMULL"); return DAG.getNode(NewOpc, DL, VT, Op0, Op1); } // Optimizing (zext A + zext B) * C, to (VMULL A, C) + (VMULL B, C) during // isel lowering to take advantage of no-stall back to back vmul + vmla. 
// vmull q0, d4, d6 // vmlal q0, d5, d6 // is faster than // vaddl q0, d4, d5 // vmovl q1, d6 // vmul q0, q0, q1 SDValue N00 = SkipExtension(N0->getOperand(0).getNode(), DAG); SDValue N01 = SkipExtension(N0->getOperand(1).getNode(), DAG); EVT Op1VT = Op1.getValueType(); return DAG.getNode(N0->getOpcode(), DL, VT, DAG.getNode(NewOpc, DL, VT, DAG.getNode(ISD::BITCAST, DL, Op1VT, N00), Op1), DAG.getNode(NewOpc, DL, VT, DAG.getNode(ISD::BITCAST, DL, Op1VT, N01), Op1)); } static SDValue LowerSDIV_v4i8(SDValue X, SDValue Y, DebugLoc dl, SelectionDAG &DAG) { // Convert to float // float4 xf = vcvt_f32_s32(vmovl_s16(a.lo)); // float4 yf = vcvt_f32_s32(vmovl_s16(b.lo)); X = DAG.getNode(ISD::SIGN_EXTEND, dl, MVT::v4i32, X); Y = DAG.getNode(ISD::SIGN_EXTEND, dl, MVT::v4i32, Y); X = DAG.getNode(ISD::SINT_TO_FP, dl, MVT::v4f32, X); Y = DAG.getNode(ISD::SINT_TO_FP, dl, MVT::v4f32, Y); // Get reciprocal estimate. // float4 recip = vrecpeq_f32(yf); Y = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, MVT::v4f32, DAG.getConstant(Intrinsic::arm_neon_vrecpe, MVT::i32), Y); // Because char has a smaller range than uchar, we can actually get away // without any newton steps. This requires that we use a weird bias // of 0xb000, however (again, this has been exhaustively tested). // float4 result = as_float4(as_int4(xf*recip) + 0xb000); X = DAG.getNode(ISD::FMUL, dl, MVT::v4f32, X, Y); X = DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, X); Y = DAG.getConstant(0xb000, MVT::i32); Y = DAG.getNode(ISD::BUILD_VECTOR, dl, MVT::v4i32, Y, Y, Y, Y); X = DAG.getNode(ISD::ADD, dl, MVT::v4i32, X, Y); X = DAG.getNode(ISD::BITCAST, dl, MVT::v4f32, X); // Convert back to short. X = DAG.getNode(ISD::FP_TO_SINT, dl, MVT::v4i32, X); X = DAG.getNode(ISD::TRUNCATE, dl, MVT::v4i16, X); return X; } static SDValue LowerSDIV_v4i16(SDValue N0, SDValue N1, DebugLoc dl, SelectionDAG &DAG) { SDValue N2; // Convert to float. // float4 yf = vcvt_f32_s32(vmovl_s16(y)); // float4 xf = vcvt_f32_s32(vmovl_s16(x)); N0 = DAG.getNode(ISD::SIGN_EXTEND, dl, MVT::v4i32, N0); N1 = DAG.getNode(ISD::SIGN_EXTEND, dl, MVT::v4i32, N1); N0 = DAG.getNode(ISD::SINT_TO_FP, dl, MVT::v4f32, N0); N1 = DAG.getNode(ISD::SINT_TO_FP, dl, MVT::v4f32, N1); // Use reciprocal estimate and one refinement step. // float4 recip = vrecpeq_f32(yf); // recip *= vrecpsq_f32(yf, recip); N2 = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, MVT::v4f32, DAG.getConstant(Intrinsic::arm_neon_vrecpe, MVT::i32), N1); N1 = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, MVT::v4f32, DAG.getConstant(Intrinsic::arm_neon_vrecps, MVT::i32), N1, N2); N2 = DAG.getNode(ISD::FMUL, dl, MVT::v4f32, N1, N2); // Because short has a smaller range than ushort, we can actually get away // with only a single newton step. This requires that we use a weird bias // of 89, however (again, this has been exhaustively tested). // float4 result = as_float4(as_int4(xf*recip) + 0x89); N0 = DAG.getNode(ISD::FMUL, dl, MVT::v4f32, N0, N2); N0 = DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, N0); N1 = DAG.getConstant(0x89, MVT::i32); N1 = DAG.getNode(ISD::BUILD_VECTOR, dl, MVT::v4i32, N1, N1, N1, N1); N0 = DAG.getNode(ISD::ADD, dl, MVT::v4i32, N0, N1); N0 = DAG.getNode(ISD::BITCAST, dl, MVT::v4f32, N0); // Convert back to integer and return. 
// return vmovn_s32(vcvt_s32_f32(result)); N0 = DAG.getNode(ISD::FP_TO_SINT, dl, MVT::v4i32, N0); N0 = DAG.getNode(ISD::TRUNCATE, dl, MVT::v4i16, N0); return N0; } static SDValue LowerSDIV(SDValue Op, SelectionDAG &DAG) { EVT VT = Op.getValueType(); assert((VT == MVT::v4i16 || VT == MVT::v8i8) && "unexpected type for custom-lowering ISD::SDIV"); DebugLoc dl = Op.getDebugLoc(); SDValue N0 = Op.getOperand(0); SDValue N1 = Op.getOperand(1); SDValue N2, N3; if (VT == MVT::v8i8) { N0 = DAG.getNode(ISD::SIGN_EXTEND, dl, MVT::v8i16, N0); N1 = DAG.getNode(ISD::SIGN_EXTEND, dl, MVT::v8i16, N1); N2 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MVT::v4i16, N0, DAG.getIntPtrConstant(4)); N3 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MVT::v4i16, N1, DAG.getIntPtrConstant(4)); N0 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MVT::v4i16, N0, DAG.getIntPtrConstant(0)); N1 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MVT::v4i16, N1, DAG.getIntPtrConstant(0)); N0 = LowerSDIV_v4i8(N0, N1, dl, DAG); // v4i16 N2 = LowerSDIV_v4i8(N2, N3, dl, DAG); // v4i16 N0 = DAG.getNode(ISD::CONCAT_VECTORS, dl, MVT::v8i16, N0, N2); N0 = LowerCONCAT_VECTORS(N0, DAG); N0 = DAG.getNode(ISD::TRUNCATE, dl, MVT::v8i8, N0); return N0; } return LowerSDIV_v4i16(N0, N1, dl, DAG); } static SDValue LowerUDIV(SDValue Op, SelectionDAG &DAG) { EVT VT = Op.getValueType(); assert((VT == MVT::v4i16 || VT == MVT::v8i8) && "unexpected type for custom-lowering ISD::UDIV"); DebugLoc dl = Op.getDebugLoc(); SDValue N0 = Op.getOperand(0); SDValue N1 = Op.getOperand(1); SDValue N2, N3; if (VT == MVT::v8i8) { N0 = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::v8i16, N0); N1 = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::v8i16, N1); N2 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MVT::v4i16, N0, DAG.getIntPtrConstant(4)); N3 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MVT::v4i16, N1, DAG.getIntPtrConstant(4)); N0 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MVT::v4i16, N0, DAG.getIntPtrConstant(0)); N1 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, MVT::v4i16, N1, DAG.getIntPtrConstant(0)); N0 = LowerSDIV_v4i16(N0, N1, dl, DAG); // v4i16 N2 = LowerSDIV_v4i16(N2, N3, dl, DAG); // v4i16 N0 = DAG.getNode(ISD::CONCAT_VECTORS, dl, MVT::v8i16, N0, N2); N0 = LowerCONCAT_VECTORS(N0, DAG); N0 = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, MVT::v8i8, DAG.getConstant(Intrinsic::arm_neon_vqmovnsu, MVT::i32), N0); return N0; } // v4i16 sdiv ... Convert to float. // float4 yf = vcvt_f32_s32(vmovl_u16(y)); // float4 xf = vcvt_f32_s32(vmovl_u16(x)); N0 = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::v4i32, N0); N1 = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::v4i32, N1); N0 = DAG.getNode(ISD::SINT_TO_FP, dl, MVT::v4f32, N0); SDValue BN1 = DAG.getNode(ISD::SINT_TO_FP, dl, MVT::v4f32, N1); // Use reciprocal estimate and two refinement steps. // float4 recip = vrecpeq_f32(yf); // recip *= vrecpsq_f32(yf, recip); // recip *= vrecpsq_f32(yf, recip); N2 = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, MVT::v4f32, DAG.getConstant(Intrinsic::arm_neon_vrecpe, MVT::i32), BN1); N1 = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, MVT::v4f32, DAG.getConstant(Intrinsic::arm_neon_vrecps, MVT::i32), BN1, N2); N2 = DAG.getNode(ISD::FMUL, dl, MVT::v4f32, N1, N2); N1 = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, dl, MVT::v4f32, DAG.getConstant(Intrinsic::arm_neon_vrecps, MVT::i32), BN1, N2); N2 = DAG.getNode(ISD::FMUL, dl, MVT::v4f32, N1, N2); // Simply multiplying by the reciprocal estimate can leave us a few ulps // too low, so we add 2 ulps (exhaustive testing shows that this is enough, // and that it will never cause us to return an answer too large). 
  // float4 result = as_float4(as_int4(xf*recip) + 2);
  N0 = DAG.getNode(ISD::FMUL, dl, MVT::v4f32, N0, N2);
  N0 = DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, N0);
  N1 = DAG.getConstant(2, MVT::i32);
  N1 = DAG.getNode(ISD::BUILD_VECTOR, dl, MVT::v4i32, N1, N1, N1, N1);
  N0 = DAG.getNode(ISD::ADD, dl, MVT::v4i32, N0, N1);
  N0 = DAG.getNode(ISD::BITCAST, dl, MVT::v4f32, N0);

  // Convert back to integer and return.
  // return vmovn_u32(vcvt_s32_f32(result));
  N0 = DAG.getNode(ISD::FP_TO_SINT, dl, MVT::v4i32, N0);
  N0 = DAG.getNode(ISD::TRUNCATE, dl, MVT::v4i16, N0);
  return N0;
}

static SDValue LowerADDC_ADDE_SUBC_SUBE(SDValue Op, SelectionDAG &DAG) {
  EVT VT = Op.getNode()->getValueType(0);
  SDVTList VTs = DAG.getVTList(VT, MVT::i32);

  unsigned Opc;
  bool ExtraOp = false;
  switch (Op.getOpcode()) {
  default: assert(0 && "Invalid code");
  case ISD::ADDC: Opc = ARMISD::ADDC; break;
  case ISD::ADDE: Opc = ARMISD::ADDE; ExtraOp = true; break;
  case ISD::SUBC: Opc = ARMISD::SUBC; break;
  case ISD::SUBE: Opc = ARMISD::SUBE; ExtraOp = true; break;
  }

  if (!ExtraOp)
    return DAG.getNode(Opc, Op->getDebugLoc(), VTs, Op.getOperand(0),
                       Op.getOperand(1));
  return DAG.getNode(Opc, Op->getDebugLoc(), VTs, Op.getOperand(0),
                     Op.getOperand(1), Op.getOperand(2));
}

static SDValue LowerAtomicLoadStore(SDValue Op, SelectionDAG &DAG) {
  // Monotonic load/store is legal for all targets
  if (cast<AtomicSDNode>(Op)->getOrdering() <= Monotonic)
    return Op;

  // Acquire/Release load/store is not legal for targets without a
  // dmb or equivalent available.
  return SDValue();
}

static void
ReplaceATOMIC_OP_64(SDNode *Node, SmallVectorImpl<SDValue>& Results,
                    SelectionDAG &DAG, unsigned NewOp) {
  DebugLoc dl = Node->getDebugLoc();
  assert (Node->getValueType(0) == MVT::i64 &&
          "Only know how to expand i64 atomics");

  SmallVector<SDValue, 6> Ops;
  Ops.push_back(Node->getOperand(0)); // Chain
  Ops.push_back(Node->getOperand(1)); // Ptr
  // Low part of Val1
  Ops.push_back(DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32,
                            Node->getOperand(2), DAG.getIntPtrConstant(0)));
  // High part of Val1
  Ops.push_back(DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32,
                            Node->getOperand(2), DAG.getIntPtrConstant(1)));
  if (NewOp == ARMISD::ATOMCMPXCHG64_DAG) {
    // Low part of Val2
    Ops.push_back(DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32,
                              Node->getOperand(3), DAG.getIntPtrConstant(0)));
    // High part of Val2
    Ops.push_back(DAG.getNode(ISD::EXTRACT_ELEMENT, dl, MVT::i32,
                              Node->getOperand(3), DAG.getIntPtrConstant(1)));
  }
  SDVTList Tys = DAG.getVTList(MVT::i32, MVT::i32, MVT::Other);
  SDValue Result =
    DAG.getMemIntrinsicNode(NewOp, dl, Tys, Ops.data(), Ops.size(), MVT::i64,
                            cast<MemSDNode>(Node)->getMemOperand());
  SDValue OpsF[] = { Result.getValue(0), Result.getValue(1) };
  Results.push_back(DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i64, OpsF, 2));
  Results.push_back(Result.getValue(2));
}

SDValue ARMTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
  switch (Op.getOpcode()) {
  default: llvm_unreachable("Don't know how to custom lower this!");
  case ISD::ConstantPool:  return LowerConstantPool(Op, DAG);
  case ISD::BlockAddress:  return LowerBlockAddress(Op, DAG);
  case ISD::GlobalAddress:
    return Subtarget->isTargetDarwin() ?
      LowerGlobalAddressDarwin(Op, DAG) : LowerGlobalAddressELF(Op, DAG);
  case ISD::GlobalTLSAddress: return LowerGlobalTLSAddress(Op, DAG);
  case ISD::SELECT:        return LowerSELECT(Op, DAG);
  case ISD::SELECT_CC:     return LowerSELECT_CC(Op, DAG);
  case ISD::BR_CC:         return LowerBR_CC(Op, DAG);
  case ISD::BR_JT:         return LowerBR_JT(Op, DAG);
  case ISD::VASTART:       return LowerVASTART(Op, DAG);
  case ISD::MEMBARRIER:    return LowerMEMBARRIER(Op, DAG, Subtarget);
  case ISD::ATOMIC_FENCE:  return LowerATOMIC_FENCE(Op, DAG, Subtarget);
  case ISD::PREFETCH:      return LowerPREFETCH(Op, DAG, Subtarget);
  case ISD::SINT_TO_FP:
  case ISD::UINT_TO_FP:    return LowerINT_TO_FP(Op, DAG);
  case ISD::FP_TO_SINT:
  case ISD::FP_TO_UINT:    return LowerFP_TO_INT(Op, DAG);
  case ISD::FCOPYSIGN:     return LowerFCOPYSIGN(Op, DAG);
  case ISD::RETURNADDR:    return LowerRETURNADDR(Op, DAG);
  case ISD::FRAMEADDR:     return LowerFRAMEADDR(Op, DAG);
  case ISD::GLOBAL_OFFSET_TABLE: return LowerGLOBAL_OFFSET_TABLE(Op, DAG);
  case ISD::EH_SJLJ_SETJMP: return LowerEH_SJLJ_SETJMP(Op, DAG);
  case ISD::EH_SJLJ_LONGJMP: return LowerEH_SJLJ_LONGJMP(Op, DAG);
  case ISD::EH_SJLJ_DISPATCHSETUP: return LowerEH_SJLJ_DISPATCHSETUP(Op, DAG);
  case ISD::INTRINSIC_WO_CHAIN: return LowerINTRINSIC_WO_CHAIN(Op, DAG,
                                                               Subtarget);
  case ISD::BITCAST:       return ExpandBITCAST(Op.getNode(), DAG);
  case ISD::SHL:
  case ISD::SRL:
  case ISD::SRA:           return LowerShift(Op.getNode(), DAG, Subtarget);
  case ISD::SHL_PARTS:     return LowerShiftLeftParts(Op, DAG);
  case ISD::SRL_PARTS:
  case ISD::SRA_PARTS:     return LowerShiftRightParts(Op, DAG);
  case ISD::CTTZ:          return LowerCTTZ(Op.getNode(), DAG, Subtarget);
  case ISD::SETCC:         return LowerVSETCC(Op, DAG);
  case ISD::BUILD_VECTOR:  return LowerBUILD_VECTOR(Op, DAG, Subtarget);
  case ISD::VECTOR_SHUFFLE: return LowerVECTOR_SHUFFLE(Op, DAG);
  case ISD::EXTRACT_VECTOR_ELT: return LowerEXTRACT_VECTOR_ELT(Op, DAG);
  case ISD::CONCAT_VECTORS: return LowerCONCAT_VECTORS(Op, DAG);
  case ISD::FLT_ROUNDS_:   return LowerFLT_ROUNDS_(Op, DAG);
  case ISD::MUL:           return LowerMUL(Op, DAG);
  case ISD::SDIV:          return LowerSDIV(Op, DAG);
  case ISD::UDIV:          return LowerUDIV(Op, DAG);
  case ISD::ADDC:
  case ISD::ADDE:
  case ISD::SUBC:
  case ISD::SUBE:          return LowerADDC_ADDE_SUBC_SUBE(Op, DAG);
  case ISD::ATOMIC_LOAD:
  case ISD::ATOMIC_STORE:  return LowerAtomicLoadStore(Op, DAG);
  }
  return SDValue();
}

/// ReplaceNodeResults - Replace the results of node with an illegal result
/// type with new values built out of custom code.
void ARMTargetLowering::ReplaceNodeResults(SDNode *N,
                                           SmallVectorImpl<SDValue> &Results,
                                           SelectionDAG &DAG) const {
  SDValue Res;
  switch (N->getOpcode()) {
  default:
    llvm_unreachable("Don't know how to custom expand this!");
    break;
  case ISD::BITCAST:
    Res = ExpandBITCAST(N, DAG);
    break;
  case ISD::SRL:
  case ISD::SRA:
    Res = Expand64BitShift(N, DAG, Subtarget);
    break;
  case ISD::ATOMIC_LOAD_ADD:
    ReplaceATOMIC_OP_64(N, Results, DAG, ARMISD::ATOMADD64_DAG);
    return;
  case ISD::ATOMIC_LOAD_AND:
    ReplaceATOMIC_OP_64(N, Results, DAG, ARMISD::ATOMAND64_DAG);
    return;
  case ISD::ATOMIC_LOAD_NAND:
    ReplaceATOMIC_OP_64(N, Results, DAG, ARMISD::ATOMNAND64_DAG);
    return;
  case ISD::ATOMIC_LOAD_OR:
    ReplaceATOMIC_OP_64(N, Results, DAG, ARMISD::ATOMOR64_DAG);
    return;
  case ISD::ATOMIC_LOAD_SUB:
    ReplaceATOMIC_OP_64(N, Results, DAG, ARMISD::ATOMSUB64_DAG);
    return;
  case ISD::ATOMIC_LOAD_XOR:
    ReplaceATOMIC_OP_64(N, Results, DAG, ARMISD::ATOMXOR64_DAG);
    return;
  case ISD::ATOMIC_SWAP:
    ReplaceATOMIC_OP_64(N, Results, DAG, ARMISD::ATOMSWAP64_DAG);
    return;
  case ISD::ATOMIC_CMP_SWAP:
    ReplaceATOMIC_OP_64(N, Results, DAG, ARMISD::ATOMCMPXCHG64_DAG);
    return;
  }
  if (Res.getNode())
    Results.push_back(Res);
}

//===----------------------------------------------------------------------===//
//                           ARM Scheduler Hooks
//===----------------------------------------------------------------------===//

MachineBasicBlock *
ARMTargetLowering::EmitAtomicCmpSwap(MachineInstr *MI,
                                     MachineBasicBlock *BB,
                                     unsigned Size) const {
  unsigned dest    = MI->getOperand(0).getReg();
  unsigned ptr     = MI->getOperand(1).getReg();
  unsigned oldval  = MI->getOperand(2).getReg();
  unsigned newval  = MI->getOperand(3).getReg();
  const TargetInstrInfo *TII = getTargetMachine().getInstrInfo();
  DebugLoc dl = MI->getDebugLoc();
  bool isThumb2 = Subtarget->isThumb2();

  MachineRegisterInfo &MRI = BB->getParent()->getRegInfo();
  unsigned scratch =
    MRI.createVirtualRegister(isThumb2 ? ARM::rGPRRegisterClass
                                       : ARM::GPRRegisterClass);

  if (isThumb2) {
    MRI.constrainRegClass(dest, ARM::rGPRRegisterClass);
    MRI.constrainRegClass(oldval, ARM::rGPRRegisterClass);
    MRI.constrainRegClass(newval, ARM::rGPRRegisterClass);
  }

  unsigned ldrOpc, strOpc;
  switch (Size) {
  default: llvm_unreachable("unsupported size for AtomicCmpSwap!");
  case 1:
    ldrOpc = isThumb2 ? ARM::t2LDREXB : ARM::LDREXB;
    strOpc = isThumb2 ? ARM::t2STREXB : ARM::STREXB;
    break;
  case 2:
    ldrOpc = isThumb2 ? ARM::t2LDREXH : ARM::LDREXH;
    strOpc = isThumb2 ? ARM::t2STREXH : ARM::STREXH;
    break;
  case 4:
    ldrOpc = isThumb2 ? ARM::t2LDREX : ARM::LDREX;
    strOpc = isThumb2 ? ARM::t2STREX : ARM::STREX;
    break;
  }

  MachineFunction *MF = BB->getParent();
  const BasicBlock *LLVM_BB = BB->getBasicBlock();
  MachineFunction::iterator It = BB;
  ++It; // insert the new blocks after the current block

  MachineBasicBlock *loop1MBB = MF->CreateMachineBasicBlock(LLVM_BB);
  MachineBasicBlock *loop2MBB = MF->CreateMachineBasicBlock(LLVM_BB);
  MachineBasicBlock *exitMBB = MF->CreateMachineBasicBlock(LLVM_BB);
  MF->insert(It, loop1MBB);
  MF->insert(It, loop2MBB);
  MF->insert(It, exitMBB);

  // Transfer the remainder of BB and its successor edges to exitMBB.
  exitMBB->splice(exitMBB->begin(), BB,
                  llvm::next(MachineBasicBlock::iterator(MI)),
                  BB->end());
  exitMBB->transferSuccessorsAndUpdatePHIs(BB);

  //  thisMBB:
  //   ...
// fallthrough --> loop1MBB BB->addSuccessor(loop1MBB); // loop1MBB: // ldrex dest, [ptr] // cmp dest, oldval // bne exitMBB BB = loop1MBB; MachineInstrBuilder MIB = BuildMI(BB, dl, TII->get(ldrOpc), dest).addReg(ptr); if (ldrOpc == ARM::t2LDREX) MIB.addImm(0); AddDefaultPred(MIB); AddDefaultPred(BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2CMPrr : ARM::CMPrr)) .addReg(dest).addReg(oldval)); BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2Bcc : ARM::Bcc)) .addMBB(exitMBB).addImm(ARMCC::NE).addReg(ARM::CPSR); BB->addSuccessor(loop2MBB); BB->addSuccessor(exitMBB); // loop2MBB: // strex scratch, newval, [ptr] // cmp scratch, #0 // bne loop1MBB BB = loop2MBB; MIB = BuildMI(BB, dl, TII->get(strOpc), scratch).addReg(newval).addReg(ptr); if (strOpc == ARM::t2STREX) MIB.addImm(0); AddDefaultPred(MIB); AddDefaultPred(BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2CMPri : ARM::CMPri)) .addReg(scratch).addImm(0)); BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2Bcc : ARM::Bcc)) .addMBB(loop1MBB).addImm(ARMCC::NE).addReg(ARM::CPSR); BB->addSuccessor(loop1MBB); BB->addSuccessor(exitMBB); // exitMBB: // ... BB = exitMBB; MI->eraseFromParent(); // The instruction is gone now. return BB; } MachineBasicBlock * ARMTargetLowering::EmitAtomicBinary(MachineInstr *MI, MachineBasicBlock *BB, unsigned Size, unsigned BinOpcode) const { // This also handles ATOMIC_SWAP, indicated by BinOpcode==0. const TargetInstrInfo *TII = getTargetMachine().getInstrInfo(); const BasicBlock *LLVM_BB = BB->getBasicBlock(); MachineFunction *MF = BB->getParent(); MachineFunction::iterator It = BB; ++It; unsigned dest = MI->getOperand(0).getReg(); unsigned ptr = MI->getOperand(1).getReg(); unsigned incr = MI->getOperand(2).getReg(); DebugLoc dl = MI->getDebugLoc(); bool isThumb2 = Subtarget->isThumb2(); MachineRegisterInfo &MRI = BB->getParent()->getRegInfo(); if (isThumb2) { MRI.constrainRegClass(dest, ARM::rGPRRegisterClass); MRI.constrainRegClass(ptr, ARM::rGPRRegisterClass); } unsigned ldrOpc, strOpc; switch (Size) { default: llvm_unreachable("unsupported size for AtomicCmpSwap!"); case 1: ldrOpc = isThumb2 ? ARM::t2LDREXB : ARM::LDREXB; strOpc = isThumb2 ? ARM::t2STREXB : ARM::STREXB; break; case 2: ldrOpc = isThumb2 ? ARM::t2LDREXH : ARM::LDREXH; strOpc = isThumb2 ? ARM::t2STREXH : ARM::STREXH; break; case 4: ldrOpc = isThumb2 ? ARM::t2LDREX : ARM::LDREX; strOpc = isThumb2 ? ARM::t2STREX : ARM::STREX; break; } MachineBasicBlock *loopMBB = MF->CreateMachineBasicBlock(LLVM_BB); MachineBasicBlock *exitMBB = MF->CreateMachineBasicBlock(LLVM_BB); MF->insert(It, loopMBB); MF->insert(It, exitMBB); // Transfer the remainder of BB and its successor edges to exitMBB. exitMBB->splice(exitMBB->begin(), BB, llvm::next(MachineBasicBlock::iterator(MI)), BB->end()); exitMBB->transferSuccessorsAndUpdatePHIs(BB); TargetRegisterClass *TRC = isThumb2 ? ARM::tGPRRegisterClass : ARM::GPRRegisterClass; unsigned scratch = MRI.createVirtualRegister(TRC); unsigned scratch2 = (!BinOpcode) ? incr : MRI.createVirtualRegister(TRC); // thisMBB: // ... 
// fallthrough --> loopMBB BB->addSuccessor(loopMBB); // loopMBB: // ldrex dest, ptr // scratch2, dest, incr // strex scratch, scratch2, ptr // cmp scratch, #0 // bne- loopMBB // fallthrough --> exitMBB BB = loopMBB; MachineInstrBuilder MIB = BuildMI(BB, dl, TII->get(ldrOpc), dest).addReg(ptr); if (ldrOpc == ARM::t2LDREX) MIB.addImm(0); AddDefaultPred(MIB); if (BinOpcode) { // operand order needs to go the other way for NAND if (BinOpcode == ARM::BICrr || BinOpcode == ARM::t2BICrr) AddDefaultPred(BuildMI(BB, dl, TII->get(BinOpcode), scratch2). addReg(incr).addReg(dest)).addReg(0); else AddDefaultPred(BuildMI(BB, dl, TII->get(BinOpcode), scratch2). addReg(dest).addReg(incr)).addReg(0); } MIB = BuildMI(BB, dl, TII->get(strOpc), scratch).addReg(scratch2).addReg(ptr); if (strOpc == ARM::t2STREX) MIB.addImm(0); AddDefaultPred(MIB); AddDefaultPred(BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2CMPri : ARM::CMPri)) .addReg(scratch).addImm(0)); BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2Bcc : ARM::Bcc)) .addMBB(loopMBB).addImm(ARMCC::NE).addReg(ARM::CPSR); BB->addSuccessor(loopMBB); BB->addSuccessor(exitMBB); // exitMBB: // ... BB = exitMBB; MI->eraseFromParent(); // The instruction is gone now. return BB; } MachineBasicBlock * ARMTargetLowering::EmitAtomicBinaryMinMax(MachineInstr *MI, MachineBasicBlock *BB, unsigned Size, bool signExtend, ARMCC::CondCodes Cond) const { const TargetInstrInfo *TII = getTargetMachine().getInstrInfo(); const BasicBlock *LLVM_BB = BB->getBasicBlock(); MachineFunction *MF = BB->getParent(); MachineFunction::iterator It = BB; ++It; unsigned dest = MI->getOperand(0).getReg(); unsigned ptr = MI->getOperand(1).getReg(); unsigned incr = MI->getOperand(2).getReg(); unsigned oldval = dest; DebugLoc dl = MI->getDebugLoc(); bool isThumb2 = Subtarget->isThumb2(); MachineRegisterInfo &MRI = BB->getParent()->getRegInfo(); if (isThumb2) { MRI.constrainRegClass(dest, ARM::rGPRRegisterClass); MRI.constrainRegClass(ptr, ARM::rGPRRegisterClass); } unsigned ldrOpc, strOpc, extendOpc; switch (Size) { default: llvm_unreachable("unsupported size for AtomicCmpSwap!"); case 1: ldrOpc = isThumb2 ? ARM::t2LDREXB : ARM::LDREXB; strOpc = isThumb2 ? ARM::t2STREXB : ARM::STREXB; extendOpc = isThumb2 ? ARM::t2SXTB : ARM::SXTB; break; case 2: ldrOpc = isThumb2 ? ARM::t2LDREXH : ARM::LDREXH; strOpc = isThumb2 ? ARM::t2STREXH : ARM::STREXH; extendOpc = isThumb2 ? ARM::t2SXTH : ARM::SXTH; break; case 4: ldrOpc = isThumb2 ? ARM::t2LDREX : ARM::LDREX; strOpc = isThumb2 ? ARM::t2STREX : ARM::STREX; extendOpc = 0; break; } MachineBasicBlock *loopMBB = MF->CreateMachineBasicBlock(LLVM_BB); MachineBasicBlock *exitMBB = MF->CreateMachineBasicBlock(LLVM_BB); MF->insert(It, loopMBB); MF->insert(It, exitMBB); // Transfer the remainder of BB and its successor edges to exitMBB. exitMBB->splice(exitMBB->begin(), BB, llvm::next(MachineBasicBlock::iterator(MI)), BB->end()); exitMBB->transferSuccessorsAndUpdatePHIs(BB); TargetRegisterClass *TRC = isThumb2 ? ARM::tGPRRegisterClass : ARM::GPRRegisterClass; unsigned scratch = MRI.createVirtualRegister(TRC); unsigned scratch2 = MRI.createVirtualRegister(TRC); // thisMBB: // ... 
// fallthrough --> loopMBB BB->addSuccessor(loopMBB); // loopMBB: // ldrex dest, ptr // (sign extend dest, if required) // cmp dest, incr // cmov.cond scratch2, dest, incr // strex scratch, scratch2, ptr // cmp scratch, #0 // bne- loopMBB // fallthrough --> exitMBB BB = loopMBB; MachineInstrBuilder MIB = BuildMI(BB, dl, TII->get(ldrOpc), dest).addReg(ptr); if (ldrOpc == ARM::t2LDREX) MIB.addImm(0); AddDefaultPred(MIB); // Sign extend the value, if necessary. if (signExtend && extendOpc) { oldval = MRI.createVirtualRegister(ARM::GPRRegisterClass); AddDefaultPred(BuildMI(BB, dl, TII->get(extendOpc), oldval) .addReg(dest) .addImm(0)); } // Build compare and cmov instructions. AddDefaultPred(BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2CMPrr : ARM::CMPrr)) .addReg(oldval).addReg(incr)); BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2MOVCCr : ARM::MOVCCr), scratch2) .addReg(oldval).addReg(incr).addImm(Cond).addReg(ARM::CPSR); MIB = BuildMI(BB, dl, TII->get(strOpc), scratch).addReg(scratch2).addReg(ptr); if (strOpc == ARM::t2STREX) MIB.addImm(0); AddDefaultPred(MIB); AddDefaultPred(BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2CMPri : ARM::CMPri)) .addReg(scratch).addImm(0)); BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2Bcc : ARM::Bcc)) .addMBB(loopMBB).addImm(ARMCC::NE).addReg(ARM::CPSR); BB->addSuccessor(loopMBB); BB->addSuccessor(exitMBB); // exitMBB: // ... BB = exitMBB; MI->eraseFromParent(); // The instruction is gone now. return BB; } MachineBasicBlock * ARMTargetLowering::EmitAtomicBinary64(MachineInstr *MI, MachineBasicBlock *BB, unsigned Op1, unsigned Op2, bool NeedsCarry, bool IsCmpxchg) const { // This also handles ATOMIC_SWAP, indicated by Op1==0. const TargetInstrInfo *TII = getTargetMachine().getInstrInfo(); const BasicBlock *LLVM_BB = BB->getBasicBlock(); MachineFunction *MF = BB->getParent(); MachineFunction::iterator It = BB; ++It; unsigned destlo = MI->getOperand(0).getReg(); unsigned desthi = MI->getOperand(1).getReg(); unsigned ptr = MI->getOperand(2).getReg(); unsigned vallo = MI->getOperand(3).getReg(); unsigned valhi = MI->getOperand(4).getReg(); DebugLoc dl = MI->getDebugLoc(); bool isThumb2 = Subtarget->isThumb2(); MachineRegisterInfo &MRI = BB->getParent()->getRegInfo(); if (isThumb2) { MRI.constrainRegClass(destlo, ARM::rGPRRegisterClass); MRI.constrainRegClass(desthi, ARM::rGPRRegisterClass); MRI.constrainRegClass(ptr, ARM::rGPRRegisterClass); } unsigned ldrOpc = isThumb2 ? ARM::t2LDREXD : ARM::LDREXD; unsigned strOpc = isThumb2 ? ARM::t2STREXD : ARM::STREXD; MachineBasicBlock *loopMBB = MF->CreateMachineBasicBlock(LLVM_BB); MachineBasicBlock *contBB = 0, *cont2BB = 0; if (IsCmpxchg) { contBB = MF->CreateMachineBasicBlock(LLVM_BB); cont2BB = MF->CreateMachineBasicBlock(LLVM_BB); } MachineBasicBlock *exitMBB = MF->CreateMachineBasicBlock(LLVM_BB); MF->insert(It, loopMBB); if (IsCmpxchg) { MF->insert(It, contBB); MF->insert(It, cont2BB); } MF->insert(It, exitMBB); // Transfer the remainder of BB and its successor edges to exitMBB. exitMBB->splice(exitMBB->begin(), BB, llvm::next(MachineBasicBlock::iterator(MI)), BB->end()); exitMBB->transferSuccessorsAndUpdatePHIs(BB); TargetRegisterClass *TRC = isThumb2 ? ARM::tGPRRegisterClass : ARM::GPRRegisterClass; unsigned storesuccess = MRI.createVirtualRegister(TRC); // thisMBB: // ... 
// fallthrough --> loopMBB BB->addSuccessor(loopMBB); // loopMBB: // ldrexd r2, r3, ptr // r0, r2, incr // r1, r3, incr // strexd storesuccess, r0, r1, ptr // cmp storesuccess, #0 // bne- loopMBB // fallthrough --> exitMBB // // Note that the registers are explicitly specified because there is not any // way to force the register allocator to allocate a register pair. // // FIXME: The hardcoded registers are not necessary for Thumb2, but we // need to properly enforce the restriction that the two output registers // for ldrexd must be different. BB = loopMBB; // Load AddDefaultPred(BuildMI(BB, dl, TII->get(ldrOpc)) .addReg(ARM::R2, RegState::Define) .addReg(ARM::R3, RegState::Define).addReg(ptr)); // Copy r2/r3 into dest. (This copy will normally be coalesced.) BuildMI(BB, dl, TII->get(TargetOpcode::COPY), destlo).addReg(ARM::R2); BuildMI(BB, dl, TII->get(TargetOpcode::COPY), desthi).addReg(ARM::R3); if (IsCmpxchg) { // Add early exit for (unsigned i = 0; i < 2; i++) { AddDefaultPred(BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2CMPrr : ARM::CMPrr)) .addReg(i == 0 ? destlo : desthi) .addReg(i == 0 ? vallo : valhi)); BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2Bcc : ARM::Bcc)) .addMBB(exitMBB).addImm(ARMCC::NE).addReg(ARM::CPSR); BB->addSuccessor(exitMBB); BB->addSuccessor(i == 0 ? contBB : cont2BB); BB = (i == 0 ? contBB : cont2BB); } // Copy to physregs for strexd unsigned setlo = MI->getOperand(5).getReg(); unsigned sethi = MI->getOperand(6).getReg(); BuildMI(BB, dl, TII->get(TargetOpcode::COPY), ARM::R0).addReg(setlo); BuildMI(BB, dl, TII->get(TargetOpcode::COPY), ARM::R1).addReg(sethi); } else if (Op1) { // Perform binary operation AddDefaultPred(BuildMI(BB, dl, TII->get(Op1), ARM::R0) .addReg(destlo).addReg(vallo)) .addReg(NeedsCarry ? ARM::CPSR : 0, getDefRegState(NeedsCarry)); AddDefaultPred(BuildMI(BB, dl, TII->get(Op2), ARM::R1) .addReg(desthi).addReg(valhi)).addReg(0); } else { // Copy to physregs for strexd BuildMI(BB, dl, TII->get(TargetOpcode::COPY), ARM::R0).addReg(vallo); BuildMI(BB, dl, TII->get(TargetOpcode::COPY), ARM::R1).addReg(valhi); } // Store AddDefaultPred(BuildMI(BB, dl, TII->get(strOpc), storesuccess) .addReg(ARM::R0).addReg(ARM::R1).addReg(ptr)); // Cmp+jump AddDefaultPred(BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2CMPri : ARM::CMPri)) .addReg(storesuccess).addImm(0)); BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2Bcc : ARM::Bcc)) .addMBB(loopMBB).addImm(ARMCC::NE).addReg(ARM::CPSR); BB->addSuccessor(loopMBB); BB->addSuccessor(exitMBB); // exitMBB: // ... BB = exitMBB; MI->eraseFromParent(); // The instruction is gone now. return BB; } /// EmitBasePointerRecalculation - For functions using a base pointer, we /// rematerialize it (via the frame pointer). 
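/// The recomputed value is written to r6 (the ARM base pointer register) so
/// that stack references stay valid around the SjLj dispatch setup; when the
/// stack is dynamically realigned, the alignment mask is re-applied with BIC.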
void ARMTargetLowering:: EmitBasePointerRecalculation(MachineInstr *MI, MachineBasicBlock *MBB, MachineBasicBlock *DispatchBB) const { const TargetInstrInfo *TII = getTargetMachine().getInstrInfo(); const ARMBaseInstrInfo *AII = static_cast(TII); MachineFunction &MF = *MI->getParent()->getParent(); ARMFunctionInfo *AFI = MF.getInfo(); const ARMBaseRegisterInfo &RI = AII->getRegisterInfo(); if (!RI.hasBasePointer(MF)) return; MachineBasicBlock::iterator MBBI = MI; int32_t NumBytes = AFI->getFramePtrSpillOffset(); unsigned FramePtr = RI.getFrameRegister(MF); assert(MF.getTarget().getFrameLowering()->hasFP(MF) && "Base pointer without frame pointer?"); if (AFI->isThumb2Function()) llvm::emitT2RegPlusImmediate(*MBB, MBBI, MI->getDebugLoc(), ARM::R6, FramePtr, -NumBytes, ARMCC::AL, 0, *AII); else if (AFI->isThumbFunction()) llvm::emitThumbRegPlusImmediate(*MBB, MBBI, MI->getDebugLoc(), ARM::R6, FramePtr, -NumBytes, *AII, RI); else llvm::emitARMRegPlusImmediate(*MBB, MBBI, MI->getDebugLoc(), ARM::R6, FramePtr, -NumBytes, ARMCC::AL, 0, *AII); if (!RI.needsStackRealignment(MF)) return; // If there's dynamic realignment, adjust for it. MachineFrameInfo *MFI = MF.getFrameInfo(); unsigned MaxAlign = MFI->getMaxAlignment(); assert(!AFI->isThumb1OnlyFunction()); // Emit bic r6, r6, MaxAlign unsigned bicOpc = AFI->isThumbFunction() ? ARM::t2BICri : ARM::BICri; AddDefaultCC( AddDefaultPred( BuildMI(*MBB, MBBI, MI->getDebugLoc(), TII->get(bicOpc), ARM::R6) .addReg(ARM::R6, RegState::Kill) .addImm(MaxAlign - 1))); } /// SetupEntryBlockForSjLj - Insert code into the entry block that creates and /// registers the function context. void ARMTargetLowering:: SetupEntryBlockForSjLj(MachineInstr *MI, MachineBasicBlock *MBB, MachineBasicBlock *DispatchBB, int FI) const { const TargetInstrInfo *TII = getTargetMachine().getInstrInfo(); DebugLoc dl = MI->getDebugLoc(); MachineFunction *MF = MBB->getParent(); MachineRegisterInfo *MRI = &MF->getRegInfo(); MachineConstantPool *MCP = MF->getConstantPool(); ARMFunctionInfo *AFI = MF->getInfo(); const Function *F = MF->getFunction(); bool isThumb = Subtarget->isThumb(); bool isThumb2 = Subtarget->isThumb2(); unsigned PCLabelId = AFI->createPICLabelUId(); unsigned PCAdj = (isThumb || isThumb2) ? 4 : 8; ARMConstantPoolValue *CPV = ARMConstantPoolMBB::Create(F->getContext(), DispatchBB, PCLabelId, PCAdj); unsigned CPI = MCP->getConstantPoolIndex(CPV, 4); const TargetRegisterClass *TRC = isThumb ? ARM::tGPRRegisterClass : ARM::GPRRegisterClass; // Grab constant pool and fixed stack memory operands. MachineMemOperand *CPMMO = MF->getMachineMemOperand(MachinePointerInfo::getConstantPool(), MachineMemOperand::MOLoad, 4, 4); MachineMemOperand *FIMMOSt = MF->getMachineMemOperand(MachinePointerInfo::getFixedStack(FI), MachineMemOperand::MOStore, 4, 4); EmitBasePointerRecalculation(MI, MBB, DispatchBB); // Load the address of the dispatch MBB into the jump buffer. if (isThumb2) { // Incoming value: jbuf // ldr.n r5, LCPI1_1 // orr r5, r5, #1 // add r5, pc // str r5, [$jbuf, #+4] ; &jbuf[1] unsigned NewVReg1 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(*MBB, MI, dl, TII->get(ARM::t2LDRpci), NewVReg1) .addConstantPoolIndex(CPI) .addMemOperand(CPMMO)); // Set the low bit because of thumb mode. 
unsigned NewVReg2 = MRI->createVirtualRegister(TRC); AddDefaultCC( AddDefaultPred(BuildMI(*MBB, MI, dl, TII->get(ARM::t2ORRri), NewVReg2) .addReg(NewVReg1, RegState::Kill) .addImm(0x01))); unsigned NewVReg3 = MRI->createVirtualRegister(TRC); BuildMI(*MBB, MI, dl, TII->get(ARM::tPICADD), NewVReg3) .addReg(NewVReg2, RegState::Kill) .addImm(PCLabelId); AddDefaultPred(BuildMI(*MBB, MI, dl, TII->get(ARM::t2STRi12)) .addReg(NewVReg3, RegState::Kill) .addFrameIndex(FI) .addImm(36) // &jbuf[1] :: pc .addMemOperand(FIMMOSt)); } else if (isThumb) { // Incoming value: jbuf // ldr.n r1, LCPI1_4 // add r1, pc // mov r2, #1 // orrs r1, r2 // add r2, $jbuf, #+4 ; &jbuf[1] // str r1, [r2] unsigned NewVReg1 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(*MBB, MI, dl, TII->get(ARM::tLDRpci), NewVReg1) .addConstantPoolIndex(CPI) .addMemOperand(CPMMO)); unsigned NewVReg2 = MRI->createVirtualRegister(TRC); BuildMI(*MBB, MI, dl, TII->get(ARM::tPICADD), NewVReg2) .addReg(NewVReg1, RegState::Kill) .addImm(PCLabelId); // Set the low bit because of thumb mode. unsigned NewVReg3 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(*MBB, MI, dl, TII->get(ARM::tMOVi8), NewVReg3) .addReg(ARM::CPSR, RegState::Define) .addImm(1)); unsigned NewVReg4 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(*MBB, MI, dl, TII->get(ARM::tORR), NewVReg4) .addReg(ARM::CPSR, RegState::Define) .addReg(NewVReg2, RegState::Kill) .addReg(NewVReg3, RegState::Kill)); unsigned NewVReg5 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(*MBB, MI, dl, TII->get(ARM::tADDrSPi), NewVReg5) .addFrameIndex(FI) .addImm(36)); // &jbuf[1] :: pc AddDefaultPred(BuildMI(*MBB, MI, dl, TII->get(ARM::tSTRi)) .addReg(NewVReg4, RegState::Kill) .addReg(NewVReg5, RegState::Kill) .addImm(0) .addMemOperand(FIMMOSt)); } else { // Incoming value: jbuf // ldr r1, LCPI1_1 // add r1, pc, r1 // str r1, [$jbuf, #+4] ; &jbuf[1] unsigned NewVReg1 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(*MBB, MI, dl, TII->get(ARM::LDRi12), NewVReg1) .addConstantPoolIndex(CPI) .addImm(0) .addMemOperand(CPMMO)); unsigned NewVReg2 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(*MBB, MI, dl, TII->get(ARM::PICADD), NewVReg2) .addReg(NewVReg1, RegState::Kill) .addImm(PCLabelId)); AddDefaultPred(BuildMI(*MBB, MI, dl, TII->get(ARM::STRi12)) .addReg(NewVReg2, RegState::Kill) .addFrameIndex(FI) .addImm(36) // &jbuf[1] :: pc .addMemOperand(FIMMOSt)); } } MachineBasicBlock *ARMTargetLowering:: EmitSjLjDispatchBlock(MachineInstr *MI, MachineBasicBlock *MBB) const { const TargetInstrInfo *TII = getTargetMachine().getInstrInfo(); DebugLoc dl = MI->getDebugLoc(); MachineFunction *MF = MBB->getParent(); MachineRegisterInfo *MRI = &MF->getRegInfo(); ARMFunctionInfo *AFI = MF->getInfo(); MachineFrameInfo *MFI = MF->getFrameInfo(); int FI = MFI->getFunctionContextIndex(); const TargetRegisterClass *TRC = Subtarget->isThumb() ? ARM::tGPRRegisterClass : ARM::GPRRegisterClass; // Get a mapping of the call site numbers to all of the landing pads they're // associated with. DenseMap > CallSiteNumToLPad; unsigned MaxCSNum = 0; MachineModuleInfo &MMI = MF->getMMI(); for (MachineFunction::iterator BB = MF->begin(), E = MF->end(); BB != E; ++BB) { if (!BB->isLandingPad()) continue; // FIXME: We should assert that the EH_LABEL is the first MI in the landing // pad. 
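    // Scan the EH labels in this landing pad and, for every call site index
    // attached to a label, record this block as one of that call site's
    // landing pads (tracking the largest call site number seen along the way).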
    for (MachineBasicBlock::iterator
           II = BB->begin(), IE = BB->end(); II != IE; ++II) {
      if (!II->isEHLabel()) continue;

      MCSymbol *Sym = II->getOperand(0).getMCSymbol();
      if (!MMI.hasCallSiteLandingPad(Sym)) continue;

      SmallVectorImpl<unsigned> &CallSiteIdxs = MMI.getCallSiteLandingPad(Sym);
      for (SmallVectorImpl<unsigned>::iterator
             CSI = CallSiteIdxs.begin(), CSE = CallSiteIdxs.end();
           CSI != CSE; ++CSI) {
        CallSiteNumToLPad[*CSI].push_back(BB);
        MaxCSNum = std::max(MaxCSNum, *CSI);
      }
      break;
    }
  }

  // Get an ordered list of the machine basic blocks for the jump table.
  std::vector<MachineBasicBlock*> LPadList;
  SmallPtrSet<MachineBasicBlock*, 64> InvokeBBs;
  LPadList.reserve(CallSiteNumToLPad.size());
  for (unsigned I = 1; I <= MaxCSNum; ++I) {
    SmallVectorImpl<MachineBasicBlock*> &MBBList = CallSiteNumToLPad[I];
    for (SmallVectorImpl<MachineBasicBlock*>::iterator
           II = MBBList.begin(), IE = MBBList.end(); II != IE; ++II) {
      LPadList.push_back(*II);
      InvokeBBs.insert((*II)->pred_begin(), (*II)->pred_end());
    }
  }

  assert(!LPadList.empty() &&
         "No landing pad destinations for the dispatch jump table!");

  // Create the jump table and associated information.
  MachineJumpTableInfo *JTI =
    MF->getOrCreateJumpTableInfo(MachineJumpTableInfo::EK_Inline);
  unsigned MJTI = JTI->createJumpTableIndex(LPadList);
  unsigned UId = AFI->createJumpTableUId();

  // Create the MBBs for the dispatch code.

  // Shove the dispatch's address into the return slot in the function context.
  MachineBasicBlock *DispatchBB = MF->CreateMachineBasicBlock();
  DispatchBB->setIsLandingPad();

  MachineBasicBlock *TrapBB = MF->CreateMachineBasicBlock();
  BuildMI(TrapBB, dl, TII->get(Subtarget->isThumb() ? ARM::tTRAP : ARM::TRAP));
  DispatchBB->addSuccessor(TrapBB);

  MachineBasicBlock *DispContBB = MF->CreateMachineBasicBlock();
  DispatchBB->addSuccessor(DispContBB);

  // Insert and renumber MBBs.
  MachineBasicBlock *Last = &MF->back();
  MF->insert(MF->end(), DispatchBB);
  MF->insert(MF->end(), DispContBB);
  MF->insert(MF->end(), TrapBB);
  MF->RenumberBlocks(Last);

  // Insert code into the entry block that creates and registers the function
  // context.
SetupEntryBlockForSjLj(MI, MBB, DispatchBB, FI); MachineMemOperand *FIMMOLd = MF->getMachineMemOperand(MachinePointerInfo::getFixedStack(FI), MachineMemOperand::MOLoad | MachineMemOperand::MOVolatile, 4, 4); if (Subtarget->isThumb2()) { unsigned NewVReg1 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(DispatchBB, dl, TII->get(ARM::t2LDRi12), NewVReg1) .addFrameIndex(FI) .addImm(4) .addMemOperand(FIMMOLd)); AddDefaultPred(BuildMI(DispatchBB, dl, TII->get(ARM::t2CMPri)) .addReg(NewVReg1) .addImm(LPadList.size())); BuildMI(DispatchBB, dl, TII->get(ARM::t2Bcc)) .addMBB(TrapBB) .addImm(ARMCC::HI) .addReg(ARM::CPSR); unsigned NewVReg2 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(DispContBB, dl, TII->get(ARM::t2LEApcrelJT),NewVReg2) .addJumpTableIndex(MJTI) .addImm(UId)); unsigned NewVReg3 = MRI->createVirtualRegister(TRC); AddDefaultCC( AddDefaultPred( BuildMI(DispContBB, dl, TII->get(ARM::t2ADDrs), NewVReg3) .addReg(NewVReg2, RegState::Kill) .addReg(NewVReg1) .addImm(ARM_AM::getSORegOpc(ARM_AM::lsl, 2)))); BuildMI(DispContBB, dl, TII->get(ARM::t2BR_JT)) .addReg(NewVReg3, RegState::Kill) .addReg(NewVReg1) .addJumpTableIndex(MJTI) .addImm(UId); } else if (Subtarget->isThumb()) { unsigned NewVReg1 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(DispatchBB, dl, TII->get(ARM::tLDRspi), NewVReg1) .addFrameIndex(FI) .addImm(1) .addMemOperand(FIMMOLd)); AddDefaultPred(BuildMI(DispatchBB, dl, TII->get(ARM::tCMPi8)) .addReg(NewVReg1) .addImm(LPadList.size())); BuildMI(DispatchBB, dl, TII->get(ARM::tBcc)) .addMBB(TrapBB) .addImm(ARMCC::HI) .addReg(ARM::CPSR); unsigned NewVReg2 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(DispContBB, dl, TII->get(ARM::tLSLri), NewVReg2) .addReg(ARM::CPSR, RegState::Define) .addReg(NewVReg1) .addImm(2)); unsigned NewVReg3 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(DispContBB, dl, TII->get(ARM::tLEApcrelJT), NewVReg3) .addJumpTableIndex(MJTI) .addImm(UId)); unsigned NewVReg4 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(DispContBB, dl, TII->get(ARM::tADDrr), NewVReg4) .addReg(ARM::CPSR, RegState::Define) .addReg(NewVReg2, RegState::Kill) .addReg(NewVReg3)); MachineMemOperand *JTMMOLd = MF->getMachineMemOperand(MachinePointerInfo::getJumpTable(), MachineMemOperand::MOLoad, 4, 4); unsigned NewVReg5 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(DispContBB, dl, TII->get(ARM::tLDRi), NewVReg5) .addReg(NewVReg4, RegState::Kill) .addImm(0) .addMemOperand(JTMMOLd)); unsigned NewVReg6 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(DispContBB, dl, TII->get(ARM::tADDrr), NewVReg6) .addReg(ARM::CPSR, RegState::Define) .addReg(NewVReg5, RegState::Kill) .addReg(NewVReg3)); BuildMI(DispContBB, dl, TII->get(ARM::tBR_JTr)) .addReg(NewVReg6, RegState::Kill) .addJumpTableIndex(MJTI) .addImm(UId); } else { unsigned NewVReg1 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(DispatchBB, dl, TII->get(ARM::LDRi12), NewVReg1) .addFrameIndex(FI) .addImm(4) .addMemOperand(FIMMOLd)); AddDefaultPred(BuildMI(DispatchBB, dl, TII->get(ARM::CMPri)) .addReg(NewVReg1) .addImm(LPadList.size())); BuildMI(DispatchBB, dl, TII->get(ARM::Bcc)) .addMBB(TrapBB) .addImm(ARMCC::HI) .addReg(ARM::CPSR); unsigned NewVReg2 = MRI->createVirtualRegister(TRC); AddDefaultCC( AddDefaultPred(BuildMI(DispContBB, dl, TII->get(ARM::MOVsi), NewVReg2) .addReg(NewVReg1) .addImm(ARM_AM::getSORegOpc(ARM_AM::lsl, 2)))); unsigned NewVReg3 = MRI->createVirtualRegister(TRC); AddDefaultPred(BuildMI(DispContBB, dl, 
TII->get(ARM::LEApcrelJT), NewVReg3) .addJumpTableIndex(MJTI) .addImm(UId)); MachineMemOperand *JTMMOLd = MF->getMachineMemOperand(MachinePointerInfo::getJumpTable(), MachineMemOperand::MOLoad, 4, 4); unsigned NewVReg4 = MRI->createVirtualRegister(TRC); AddDefaultPred( BuildMI(DispContBB, dl, TII->get(ARM::LDRrs), NewVReg4) .addReg(NewVReg2, RegState::Kill) .addReg(NewVReg3) .addImm(0) .addMemOperand(JTMMOLd)); BuildMI(DispContBB, dl, TII->get(ARM::BR_JTadd)) .addReg(NewVReg4, RegState::Kill) .addReg(NewVReg3) .addJumpTableIndex(MJTI) .addImm(UId); } // Add the jump table entries as successors to the MBB. MachineBasicBlock *PrevMBB = 0; for (std::vector::iterator I = LPadList.begin(), E = LPadList.end(); I != E; ++I) { MachineBasicBlock *CurMBB = *I; if (PrevMBB != CurMBB) DispContBB->addSuccessor(CurMBB); PrevMBB = CurMBB; } const ARMBaseInstrInfo *AII = static_cast(TII); const ARMBaseRegisterInfo &RI = AII->getRegisterInfo(); const unsigned *SavedRegs = RI.getCalleeSavedRegs(MF); for (SmallPtrSet::iterator I = InvokeBBs.begin(), E = InvokeBBs.end(); I != E; ++I) { MachineBasicBlock *BB = *I; // Remove the landing pad successor from the invoke block and replace it // with the new dispatch block. for (MachineBasicBlock::succ_iterator SI = BB->succ_begin(), SE = BB->succ_end(); SI != SE; ++SI) { MachineBasicBlock *SMBB = *SI; if (SMBB->isLandingPad()) { BB->removeSuccessor(SMBB); SMBB->setIsLandingPad(false); } } BB->addSuccessor(DispatchBB); // Find the invoke call and mark all of the callee-saved registers as // 'implicit defined' so that they're spilled. This prevents code from // moving instructions to before the EH block, where they will never be // executed. for (MachineBasicBlock::reverse_iterator II = BB->rbegin(), IE = BB->rend(); II != IE; ++II) { if (!II->getDesc().isCall()) continue; DenseMap DefRegs; for (MachineInstr::mop_iterator OI = II->operands_begin(), OE = II->operands_end(); OI != OE; ++OI) { if (!OI->isReg()) continue; DefRegs[OI->getReg()] = true; } MachineInstrBuilder MIB(&*II); for (unsigned i = 0; SavedRegs[i] != 0; ++i) { if (!TRC->contains(SavedRegs[i])) continue; if (!DefRegs[SavedRegs[i]]) MIB.addReg(SavedRegs[i], RegState::ImplicitDefine | RegState::Dead); } break; } } // The instruction is gone now. MI->eraseFromParent(); return MBB; } static MachineBasicBlock *OtherSucc(MachineBasicBlock *MBB, MachineBasicBlock *Succ) { for (MachineBasicBlock::succ_iterator I = MBB->succ_begin(), E = MBB->succ_end(); I != E; ++I) if (*I != Succ) return *I; llvm_unreachable("Expecting a BB with two successors!"); } MachineBasicBlock * ARMTargetLowering::EmitInstrWithCustomInserter(MachineInstr *MI, MachineBasicBlock *BB) const { const TargetInstrInfo *TII = getTargetMachine().getInstrInfo(); DebugLoc dl = MI->getDebugLoc(); bool isThumb2 = Subtarget->isThumb2(); switch (MI->getOpcode()) { default: { MI->dump(); llvm_unreachable("Unexpected instr type to insert"); } // The Thumb2 pre-indexed stores have the same MI operands, they just // define them differently in the .td files from the isel patterns, so // they need pseudos. case ARM::t2STR_preidx: MI->setDesc(TII->get(ARM::t2STR_PRE)); return BB; case ARM::t2STRB_preidx: MI->setDesc(TII->get(ARM::t2STRB_PRE)); return BB; case ARM::t2STRH_preidx: MI->setDesc(TII->get(ARM::t2STRH_PRE)); return BB; case ARM::STRi_preidx: case ARM::STRBi_preidx: { unsigned NewOpc = MI->getOpcode() == ARM::STRi_preidx ? ARM::STR_PRE_IMM : ARM::STRB_PRE_IMM; // Decode the offset. 
unsigned Offset = MI->getOperand(4).getImm(); bool isSub = ARM_AM::getAM2Op(Offset) == ARM_AM::sub; Offset = ARM_AM::getAM2Offset(Offset); if (isSub) Offset = -Offset; MachineMemOperand *MMO = *MI->memoperands_begin(); BuildMI(*BB, MI, dl, TII->get(NewOpc)) .addOperand(MI->getOperand(0)) // Rn_wb .addOperand(MI->getOperand(1)) // Rt .addOperand(MI->getOperand(2)) // Rn .addImm(Offset) // offset (skip GPR==zero_reg) .addOperand(MI->getOperand(5)) // pred .addOperand(MI->getOperand(6)) .addMemOperand(MMO); MI->eraseFromParent(); return BB; } case ARM::STRr_preidx: case ARM::STRBr_preidx: case ARM::STRH_preidx: { unsigned NewOpc; switch (MI->getOpcode()) { default: llvm_unreachable("unexpected opcode!"); case ARM::STRr_preidx: NewOpc = ARM::STR_PRE_REG; break; case ARM::STRBr_preidx: NewOpc = ARM::STRB_PRE_REG; break; case ARM::STRH_preidx: NewOpc = ARM::STRH_PRE; break; } MachineInstrBuilder MIB = BuildMI(*BB, MI, dl, TII->get(NewOpc)); for (unsigned i = 0; i < MI->getNumOperands(); ++i) MIB.addOperand(MI->getOperand(i)); MI->eraseFromParent(); return BB; } case ARM::ATOMIC_LOAD_ADD_I8: return EmitAtomicBinary(MI, BB, 1, isThumb2 ? ARM::t2ADDrr : ARM::ADDrr); case ARM::ATOMIC_LOAD_ADD_I16: return EmitAtomicBinary(MI, BB, 2, isThumb2 ? ARM::t2ADDrr : ARM::ADDrr); case ARM::ATOMIC_LOAD_ADD_I32: return EmitAtomicBinary(MI, BB, 4, isThumb2 ? ARM::t2ADDrr : ARM::ADDrr); case ARM::ATOMIC_LOAD_AND_I8: return EmitAtomicBinary(MI, BB, 1, isThumb2 ? ARM::t2ANDrr : ARM::ANDrr); case ARM::ATOMIC_LOAD_AND_I16: return EmitAtomicBinary(MI, BB, 2, isThumb2 ? ARM::t2ANDrr : ARM::ANDrr); case ARM::ATOMIC_LOAD_AND_I32: return EmitAtomicBinary(MI, BB, 4, isThumb2 ? ARM::t2ANDrr : ARM::ANDrr); case ARM::ATOMIC_LOAD_OR_I8: return EmitAtomicBinary(MI, BB, 1, isThumb2 ? ARM::t2ORRrr : ARM::ORRrr); case ARM::ATOMIC_LOAD_OR_I16: return EmitAtomicBinary(MI, BB, 2, isThumb2 ? ARM::t2ORRrr : ARM::ORRrr); case ARM::ATOMIC_LOAD_OR_I32: return EmitAtomicBinary(MI, BB, 4, isThumb2 ? ARM::t2ORRrr : ARM::ORRrr); case ARM::ATOMIC_LOAD_XOR_I8: return EmitAtomicBinary(MI, BB, 1, isThumb2 ? ARM::t2EORrr : ARM::EORrr); case ARM::ATOMIC_LOAD_XOR_I16: return EmitAtomicBinary(MI, BB, 2, isThumb2 ? ARM::t2EORrr : ARM::EORrr); case ARM::ATOMIC_LOAD_XOR_I32: return EmitAtomicBinary(MI, BB, 4, isThumb2 ? ARM::t2EORrr : ARM::EORrr); case ARM::ATOMIC_LOAD_NAND_I8: return EmitAtomicBinary(MI, BB, 1, isThumb2 ? ARM::t2BICrr : ARM::BICrr); case ARM::ATOMIC_LOAD_NAND_I16: return EmitAtomicBinary(MI, BB, 2, isThumb2 ? ARM::t2BICrr : ARM::BICrr); case ARM::ATOMIC_LOAD_NAND_I32: return EmitAtomicBinary(MI, BB, 4, isThumb2 ? ARM::t2BICrr : ARM::BICrr); case ARM::ATOMIC_LOAD_SUB_I8: return EmitAtomicBinary(MI, BB, 1, isThumb2 ? ARM::t2SUBrr : ARM::SUBrr); case ARM::ATOMIC_LOAD_SUB_I16: return EmitAtomicBinary(MI, BB, 2, isThumb2 ? ARM::t2SUBrr : ARM::SUBrr); case ARM::ATOMIC_LOAD_SUB_I32: return EmitAtomicBinary(MI, BB, 4, isThumb2 ? 
ARM::t2SUBrr : ARM::SUBrr); case ARM::ATOMIC_LOAD_MIN_I8: return EmitAtomicBinaryMinMax(MI, BB, 1, true, ARMCC::LT); case ARM::ATOMIC_LOAD_MIN_I16: return EmitAtomicBinaryMinMax(MI, BB, 2, true, ARMCC::LT); case ARM::ATOMIC_LOAD_MIN_I32: return EmitAtomicBinaryMinMax(MI, BB, 4, true, ARMCC::LT); case ARM::ATOMIC_LOAD_MAX_I8: return EmitAtomicBinaryMinMax(MI, BB, 1, true, ARMCC::GT); case ARM::ATOMIC_LOAD_MAX_I16: return EmitAtomicBinaryMinMax(MI, BB, 2, true, ARMCC::GT); case ARM::ATOMIC_LOAD_MAX_I32: return EmitAtomicBinaryMinMax(MI, BB, 4, true, ARMCC::GT); case ARM::ATOMIC_LOAD_UMIN_I8: return EmitAtomicBinaryMinMax(MI, BB, 1, false, ARMCC::LO); case ARM::ATOMIC_LOAD_UMIN_I16: return EmitAtomicBinaryMinMax(MI, BB, 2, false, ARMCC::LO); case ARM::ATOMIC_LOAD_UMIN_I32: return EmitAtomicBinaryMinMax(MI, BB, 4, false, ARMCC::LO); case ARM::ATOMIC_LOAD_UMAX_I8: return EmitAtomicBinaryMinMax(MI, BB, 1, false, ARMCC::HI); case ARM::ATOMIC_LOAD_UMAX_I16: return EmitAtomicBinaryMinMax(MI, BB, 2, false, ARMCC::HI); case ARM::ATOMIC_LOAD_UMAX_I32: return EmitAtomicBinaryMinMax(MI, BB, 4, false, ARMCC::HI); case ARM::ATOMIC_SWAP_I8: return EmitAtomicBinary(MI, BB, 1, 0); case ARM::ATOMIC_SWAP_I16: return EmitAtomicBinary(MI, BB, 2, 0); case ARM::ATOMIC_SWAP_I32: return EmitAtomicBinary(MI, BB, 4, 0); case ARM::ATOMIC_CMP_SWAP_I8: return EmitAtomicCmpSwap(MI, BB, 1); case ARM::ATOMIC_CMP_SWAP_I16: return EmitAtomicCmpSwap(MI, BB, 2); case ARM::ATOMIC_CMP_SWAP_I32: return EmitAtomicCmpSwap(MI, BB, 4); case ARM::ATOMADD6432: return EmitAtomicBinary64(MI, BB, isThumb2 ? ARM::t2ADDrr : ARM::ADDrr, isThumb2 ? ARM::t2ADCrr : ARM::ADCrr, /*NeedsCarry*/ true); case ARM::ATOMSUB6432: return EmitAtomicBinary64(MI, BB, isThumb2 ? ARM::t2SUBrr : ARM::SUBrr, isThumb2 ? ARM::t2SBCrr : ARM::SBCrr, /*NeedsCarry*/ true); case ARM::ATOMOR6432: return EmitAtomicBinary64(MI, BB, isThumb2 ? ARM::t2ORRrr : ARM::ORRrr, isThumb2 ? ARM::t2ORRrr : ARM::ORRrr); case ARM::ATOMXOR6432: return EmitAtomicBinary64(MI, BB, isThumb2 ? ARM::t2EORrr : ARM::EORrr, isThumb2 ? ARM::t2EORrr : ARM::EORrr); case ARM::ATOMAND6432: return EmitAtomicBinary64(MI, BB, isThumb2 ? ARM::t2ANDrr : ARM::ANDrr, isThumb2 ? ARM::t2ANDrr : ARM::ANDrr); case ARM::ATOMSWAP6432: return EmitAtomicBinary64(MI, BB, 0, 0, false); case ARM::ATOMCMPXCHG6432: return EmitAtomicBinary64(MI, BB, isThumb2 ? ARM::t2SUBrr : ARM::SUBrr, isThumb2 ? ARM::t2SBCrr : ARM::SBCrr, /*NeedsCarry*/ false, /*IsCmpxchg*/true); case ARM::tMOVCCr_pseudo: { // To "insert" a SELECT_CC instruction, we actually have to insert the // diamond control-flow pattern. The incoming instruction knows the // destination vreg to set, the condition code register to branch on, the // true/false values to select between, and a branch opcode to use. const BasicBlock *LLVM_BB = BB->getBasicBlock(); MachineFunction::iterator It = BB; ++It; // thisMBB: // ... // TrueVal = ... // cmpTY ccX, r1, r2 // bCC copy1MBB // fallthrough --> copy0MBB MachineBasicBlock *thisMBB = BB; MachineFunction *F = BB->getParent(); MachineBasicBlock *copy0MBB = F->CreateMachineBasicBlock(LLVM_BB); MachineBasicBlock *sinkMBB = F->CreateMachineBasicBlock(LLVM_BB); F->insert(It, copy0MBB); F->insert(It, sinkMBB); // Transfer the remainder of BB and its successor edges to sinkMBB. 
sinkMBB->splice(sinkMBB->begin(), BB, llvm::next(MachineBasicBlock::iterator(MI)), BB->end()); sinkMBB->transferSuccessorsAndUpdatePHIs(BB); BB->addSuccessor(copy0MBB); BB->addSuccessor(sinkMBB); BuildMI(BB, dl, TII->get(ARM::tBcc)).addMBB(sinkMBB) .addImm(MI->getOperand(3).getImm()).addReg(MI->getOperand(4).getReg()); // copy0MBB: // %FalseValue = ... // # fallthrough to sinkMBB BB = copy0MBB; // Update machine-CFG edges BB->addSuccessor(sinkMBB); // sinkMBB: // %Result = phi [ %FalseValue, copy0MBB ], [ %TrueValue, thisMBB ] // ... BB = sinkMBB; BuildMI(*BB, BB->begin(), dl, TII->get(ARM::PHI), MI->getOperand(0).getReg()) .addReg(MI->getOperand(1).getReg()).addMBB(copy0MBB) .addReg(MI->getOperand(2).getReg()).addMBB(thisMBB); MI->eraseFromParent(); // The pseudo instruction is gone now. return BB; } case ARM::BCCi64: case ARM::BCCZi64: { // If there is an unconditional branch to the other successor, remove it. BB->erase(llvm::next(MachineBasicBlock::iterator(MI)), BB->end()); // Compare both parts that make up the double comparison separately for // equality. bool RHSisZero = MI->getOpcode() == ARM::BCCZi64; unsigned LHS1 = MI->getOperand(1).getReg(); unsigned LHS2 = MI->getOperand(2).getReg(); if (RHSisZero) { AddDefaultPred(BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2CMPri : ARM::CMPri)) .addReg(LHS1).addImm(0)); BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2CMPri : ARM::CMPri)) .addReg(LHS2).addImm(0) .addImm(ARMCC::EQ).addReg(ARM::CPSR); } else { unsigned RHS1 = MI->getOperand(3).getReg(); unsigned RHS2 = MI->getOperand(4).getReg(); AddDefaultPred(BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2CMPrr : ARM::CMPrr)) .addReg(LHS1).addReg(RHS1)); BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2CMPrr : ARM::CMPrr)) .addReg(LHS2).addReg(RHS2) .addImm(ARMCC::EQ).addReg(ARM::CPSR); } MachineBasicBlock *destMBB = MI->getOperand(RHSisZero ? 3 : 5).getMBB(); MachineBasicBlock *exitMBB = OtherSucc(BB, destMBB); if (MI->getOperand(0).getImm() == ARMCC::NE) std::swap(destMBB, exitMBB); BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2Bcc : ARM::Bcc)) .addMBB(destMBB).addImm(ARMCC::EQ).addReg(ARM::CPSR); if (isThumb2) AddDefaultPred(BuildMI(BB, dl, TII->get(ARM::t2B)).addMBB(exitMBB)); else BuildMI(BB, dl, TII->get(ARM::B)) .addMBB(exitMBB); MI->eraseFromParent(); // The pseudo instruction is gone now. return BB; } case ARM::ABS: case ARM::t2ABS: { // To insert an ABS instruction, we have to insert the // diamond control-flow pattern. The incoming instruction knows the // source vreg to test against 0, the destination vreg to set, // the condition code register to branch on, the // true/false values to select between, and a branch opcode to use. // It transforms // V1 = ABS V0 // into // V2 = MOVS V0 // BCC (branch to SinkBB if V0 >= 0) // RSBBB: V3 = RSBri V2, 0 (compute ABS if V2 < 0) // SinkBB: V1 = PHI(V2, V3) const BasicBlock *LLVM_BB = BB->getBasicBlock(); MachineFunction::iterator BBI = BB; ++BBI; MachineFunction *Fn = BB->getParent(); MachineBasicBlock *RSBBB = Fn->CreateMachineBasicBlock(LLVM_BB); MachineBasicBlock *SinkBB = Fn->CreateMachineBasicBlock(LLVM_BB); Fn->insert(BBI, RSBBB); Fn->insert(BBI, SinkBB); unsigned int ABSSrcReg = MI->getOperand(1).getReg(); unsigned int ABSDstReg = MI->getOperand(0).getReg(); bool isThumb2 = Subtarget->isThumb2(); MachineRegisterInfo &MRI = Fn->getRegInfo(); // In Thumb mode S must not be specified if source register is the SP or // PC and if destination register is the SP, so restrict register class unsigned NewMovDstReg = MRI.createVirtualRegister( isThumb2 ? 
ARM::rGPRRegisterClass : ARM::GPRRegisterClass); unsigned NewRsbDstReg = MRI.createVirtualRegister( isThumb2 ? ARM::rGPRRegisterClass : ARM::GPRRegisterClass); // Transfer the remainder of BB and its successor edges to sinkMBB. SinkBB->splice(SinkBB->begin(), BB, llvm::next(MachineBasicBlock::iterator(MI)), BB->end()); SinkBB->transferSuccessorsAndUpdatePHIs(BB); BB->addSuccessor(RSBBB); BB->addSuccessor(SinkBB); // fall through to SinkMBB RSBBB->addSuccessor(SinkBB); // insert a movs at the end of BB BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2MOVr : ARM::MOVr), NewMovDstReg) .addReg(ABSSrcReg, RegState::Kill) .addImm((unsigned)ARMCC::AL).addReg(0) .addReg(ARM::CPSR, RegState::Define); // insert a bcc with opposite CC to ARMCC::MI at the end of BB BuildMI(BB, dl, TII->get(isThumb2 ? ARM::t2Bcc : ARM::Bcc)).addMBB(SinkBB) .addImm(ARMCC::getOppositeCondition(ARMCC::MI)).addReg(ARM::CPSR); // insert rsbri in RSBBB // Note: BCC and rsbri will be converted into predicated rsbmi // by if-conversion pass BuildMI(*RSBBB, RSBBB->begin(), dl, TII->get(isThumb2 ? ARM::t2RSBri : ARM::RSBri), NewRsbDstReg) .addReg(NewMovDstReg, RegState::Kill) .addImm(0).addImm((unsigned)ARMCC::AL).addReg(0).addReg(0); // insert PHI in SinkBB, // reuse ABSDstReg to not change uses of ABS instruction BuildMI(*SinkBB, SinkBB->begin(), dl, TII->get(ARM::PHI), ABSDstReg) .addReg(NewRsbDstReg).addMBB(RSBBB) .addReg(NewMovDstReg).addMBB(BB); // remove ABS instruction MI->eraseFromParent(); // return last added BB return SinkBB; } } } void ARMTargetLowering::AdjustInstrPostInstrSelection(MachineInstr *MI, SDNode *Node) const { const MCInstrDesc &MCID = MI->getDesc(); if (!MCID.hasPostISelHook()) { assert(!convertAddSubFlagsOpcode(MI->getOpcode()) && "Pseudo flag-setting opcodes must be marked with 'hasPostISelHook'"); return; } // Adjust potentially 's' setting instructions after isel, i.e. ADC, SBC, RSB, // RSC. Coming out of isel, they have an implicit CPSR def, but the optional // operand is still set to noreg. If needed, set the optional operand's // register to CPSR, and remove the redundant implicit def. // // e.g. ADCS (...opt:%noreg, CPSR) -> ADC (... opt:CPSR). // Rename pseudo opcodes. unsigned NewOpc = convertAddSubFlagsOpcode(MI->getOpcode()); if (NewOpc) { const ARMBaseInstrInfo *TII = static_cast(getTargetMachine().getInstrInfo()); MI->setDesc(TII->get(NewOpc)); } unsigned ccOutIdx = MCID.getNumOperands() - 1; // Any ARM instruction that sets the 's' bit should specify an optional // "cc_out" operand in the last operand position. if (!MCID.hasOptionalDef() || !MCID.OpInfo[ccOutIdx].isOptionalDef()) { assert(!NewOpc && "Optional cc_out operand required"); return; } // Look for an implicit def of CPSR added by MachineInstr ctor. Remove it // since we already have an optional CPSR def. 
bool definesCPSR = false; bool deadCPSR = false; for (unsigned i = MCID.getNumOperands(), e = MI->getNumOperands(); i != e; ++i) { const MachineOperand &MO = MI->getOperand(i); if (MO.isReg() && MO.isDef() && MO.getReg() == ARM::CPSR) { definesCPSR = true; if (MO.isDead()) deadCPSR = true; MI->RemoveOperand(i); break; } } if (!definesCPSR) { assert(!NewOpc && "Optional cc_out operand required"); return; } assert(deadCPSR == !Node->hasAnyUseOfValue(1) && "inconsistent dead flag"); if (deadCPSR) { assert(!MI->getOperand(ccOutIdx).getReg() && "expect uninitialized optional cc_out operand"); return; } // If this instruction was defined with an optional CPSR def and its dag node // had a live implicit CPSR def, then activate the optional CPSR def. MachineOperand &MO = MI->getOperand(ccOutIdx); MO.setReg(ARM::CPSR); MO.setIsDef(true); } //===----------------------------------------------------------------------===// // ARM Optimization Hooks //===----------------------------------------------------------------------===// static SDValue combineSelectAndUse(SDNode *N, SDValue Slct, SDValue OtherOp, TargetLowering::DAGCombinerInfo &DCI) { SelectionDAG &DAG = DCI.DAG; const TargetLowering &TLI = DAG.getTargetLoweringInfo(); EVT VT = N->getValueType(0); unsigned Opc = N->getOpcode(); bool isSlctCC = Slct.getOpcode() == ISD::SELECT_CC; SDValue LHS = isSlctCC ? Slct.getOperand(2) : Slct.getOperand(1); SDValue RHS = isSlctCC ? Slct.getOperand(3) : Slct.getOperand(2); ISD::CondCode CC = ISD::SETCC_INVALID; if (isSlctCC) { CC = cast(Slct.getOperand(4))->get(); } else { SDValue CCOp = Slct.getOperand(0); if (CCOp.getOpcode() == ISD::SETCC) CC = cast(CCOp.getOperand(2))->get(); } bool DoXform = false; bool InvCC = false; assert ((Opc == ISD::ADD || (Opc == ISD::SUB && Slct == N->getOperand(1))) && "Bad input!"); if (LHS.getOpcode() == ISD::Constant && cast(LHS)->isNullValue()) { DoXform = true; } else if (CC != ISD::SETCC_INVALID && RHS.getOpcode() == ISD::Constant && cast(RHS)->isNullValue()) { std::swap(LHS, RHS); SDValue Op0 = Slct.getOperand(0); EVT OpVT = isSlctCC ? Op0.getValueType() : Op0.getOperand(0).getValueType(); bool isInt = OpVT.isInteger(); CC = ISD::getSetCCInverse(CC, isInt); if (!TLI.isCondCodeLegal(CC, OpVT)) return SDValue(); // Inverse operator isn't legal. DoXform = true; InvCC = true; } if (DoXform) { SDValue Result = DAG.getNode(Opc, RHS.getDebugLoc(), VT, OtherOp, RHS); if (isSlctCC) return DAG.getSelectCC(N->getDebugLoc(), OtherOp, Result, Slct.getOperand(0), Slct.getOperand(1), CC); SDValue CCOp = Slct.getOperand(0); if (InvCC) CCOp = DAG.getSetCC(Slct.getDebugLoc(), CCOp.getValueType(), CCOp.getOperand(0), CCOp.getOperand(1), CC); return DAG.getNode(ISD::SELECT, N->getDebugLoc(), VT, CCOp, OtherOp, Result); } return SDValue(); } // AddCombineToVPADDL- For pair-wise add on neon, use the vpaddl instruction // (only after legalization). static SDValue AddCombineToVPADDL(SDNode *N, SDValue N0, SDValue N1, TargetLowering::DAGCombinerInfo &DCI, const ARMSubtarget *Subtarget) { // Only perform optimization if after legalize, and if NEON is available. We // also expected both operands to be BUILD_VECTORs. if (DCI.isBeforeLegalize() || !Subtarget->hasNEON() || N0.getOpcode() != ISD::BUILD_VECTOR || N1.getOpcode() != ISD::BUILD_VECTOR) return SDValue(); // Check output type since VPADDL operand elements can only be 8, 16, or 32. 
EVT VT = N->getValueType(0); if (!VT.isInteger() || VT.getVectorElementType() == MVT::i64) return SDValue(); // Check that the vector operands are of the right form. // N0 and N1 are BUILD_VECTOR nodes with N number of EXTRACT_VECTOR // operands, where N is the size of the formed vector. // Each EXTRACT_VECTOR should have the same input vector and odd or even // index such that we have a pair wise add pattern. // Grab the vector that all EXTRACT_VECTOR nodes should be referencing. if (N0->getOperand(0)->getOpcode() != ISD::EXTRACT_VECTOR_ELT) return SDValue(); SDValue Vec = N0->getOperand(0)->getOperand(0); SDNode *V = Vec.getNode(); unsigned nextIndex = 0; // For each operands to the ADD which are BUILD_VECTORs, // check to see if each of their operands are an EXTRACT_VECTOR with // the same vector and appropriate index. for (unsigned i = 0, e = N0->getNumOperands(); i != e; ++i) { if (N0->getOperand(i)->getOpcode() == ISD::EXTRACT_VECTOR_ELT && N1->getOperand(i)->getOpcode() == ISD::EXTRACT_VECTOR_ELT) { SDValue ExtVec0 = N0->getOperand(i); SDValue ExtVec1 = N1->getOperand(i); // First operand is the vector, verify its the same. if (V != ExtVec0->getOperand(0).getNode() || V != ExtVec1->getOperand(0).getNode()) return SDValue(); // Second is the constant, verify its correct. ConstantSDNode *C0 = dyn_cast(ExtVec0->getOperand(1)); ConstantSDNode *C1 = dyn_cast(ExtVec1->getOperand(1)); // For the constant, we want to see all the even or all the odd. if (!C0 || !C1 || C0->getZExtValue() != nextIndex || C1->getZExtValue() != nextIndex+1) return SDValue(); // Increment index. nextIndex+=2; } else return SDValue(); } // Create VPADDL node. SelectionDAG &DAG = DCI.DAG; const TargetLowering &TLI = DAG.getTargetLoweringInfo(); // Build operand list. SmallVector Ops; Ops.push_back(DAG.getConstant(Intrinsic::arm_neon_vpaddls, TLI.getPointerTy())); // Input is the vector. Ops.push_back(Vec); // Get widened type and narrowed type. MVT widenType; unsigned numElem = VT.getVectorNumElements(); switch (VT.getVectorElementType().getSimpleVT().SimpleTy) { case MVT::i8: widenType = MVT::getVectorVT(MVT::i16, numElem); break; case MVT::i16: widenType = MVT::getVectorVT(MVT::i32, numElem); break; case MVT::i32: widenType = MVT::getVectorVT(MVT::i64, numElem); break; default: assert(0 && "Invalid vector element type for padd optimization."); } SDValue tmp = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, N->getDebugLoc(), widenType, &Ops[0], Ops.size()); return DAG.getNode(ISD::TRUNCATE, N->getDebugLoc(), VT, tmp); } /// PerformADDCombineWithOperands - Try DAG combinations for an ADD with /// operands N0 and N1. This is a helper for PerformADDCombine that is /// called with the default operands, and if that fails, with commuted /// operands. static SDValue PerformADDCombineWithOperands(SDNode *N, SDValue N0, SDValue N1, TargetLowering::DAGCombinerInfo &DCI, const ARMSubtarget *Subtarget){ // Attempt to create vpaddl for this add. SDValue Result = AddCombineToVPADDL(N, N0, N1, DCI, Subtarget); if (Result.getNode()) return Result; // fold (add (select cc, 0, c), x) -> (select cc, x, (add, x, c)) if (N0.getOpcode() == ISD::SELECT && N0.getNode()->hasOneUse()) { SDValue Result = combineSelectAndUse(N, N0, N1, DCI); if (Result.getNode()) return Result; } return SDValue(); } /// PerformADDCombine - Target-specific dag combine xforms for ISD::ADD. 
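/// Tries the operands in their original order first and then commuted, so the
/// vpaddl and select-folding patterns can match in either operand position.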
/// static SDValue PerformADDCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI, const ARMSubtarget *Subtarget) { SDValue N0 = N->getOperand(0); SDValue N1 = N->getOperand(1); // First try with the default operand order. SDValue Result = PerformADDCombineWithOperands(N, N0, N1, DCI, Subtarget); if (Result.getNode()) return Result; // If that didn't work, try again with the operands commuted. return PerformADDCombineWithOperands(N, N1, N0, DCI, Subtarget); } /// PerformSUBCombine - Target-specific dag combine xforms for ISD::SUB. /// static SDValue PerformSUBCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) { SDValue N0 = N->getOperand(0); SDValue N1 = N->getOperand(1); // fold (sub x, (select cc, 0, c)) -> (select cc, x, (sub, x, c)) if (N1.getOpcode() == ISD::SELECT && N1.getNode()->hasOneUse()) { SDValue Result = combineSelectAndUse(N, N1, N0, DCI); if (Result.getNode()) return Result; } return SDValue(); } /// PerformVMULCombine /// Distribute (A + B) * C to (A * C) + (B * C) to take advantage of the /// special multiplier accumulator forwarding. /// vmul d3, d0, d2 /// vmla d3, d1, d2 /// is faster than /// vadd d3, d0, d1 /// vmul d3, d3, d2 static SDValue PerformVMULCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI, const ARMSubtarget *Subtarget) { if (!Subtarget->hasVMLxForwarding()) return SDValue(); SelectionDAG &DAG = DCI.DAG; SDValue N0 = N->getOperand(0); SDValue N1 = N->getOperand(1); unsigned Opcode = N0.getOpcode(); if (Opcode != ISD::ADD && Opcode != ISD::SUB && Opcode != ISD::FADD && Opcode != ISD::FSUB) { Opcode = N1.getOpcode(); if (Opcode != ISD::ADD && Opcode != ISD::SUB && Opcode != ISD::FADD && Opcode != ISD::FSUB) return SDValue(); std::swap(N0, N1); } EVT VT = N->getValueType(0); DebugLoc DL = N->getDebugLoc(); SDValue N00 = N0->getOperand(0); SDValue N01 = N0->getOperand(1); return DAG.getNode(Opcode, DL, VT, DAG.getNode(ISD::MUL, DL, VT, N00, N1), DAG.getNode(ISD::MUL, DL, VT, N01, N1)); } static SDValue PerformMULCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI, const ARMSubtarget *Subtarget) { SelectionDAG &DAG = DCI.DAG; if (Subtarget->isThumb1Only()) return SDValue(); if (DCI.isBeforeLegalize() || DCI.isCalledByLegalizer()) return SDValue(); EVT VT = N->getValueType(0); if (VT.is64BitVector() || VT.is128BitVector()) return PerformVMULCombine(N, DCI, Subtarget); if (VT != MVT::i32) return SDValue(); ConstantSDNode *C = dyn_cast(N->getOperand(1)); if (!C) return SDValue(); uint64_t MulAmt = C->getZExtValue(); unsigned ShiftAmt = CountTrailingZeros_64(MulAmt); ShiftAmt = ShiftAmt & (32 - 1); SDValue V = N->getOperand(0); DebugLoc DL = N->getDebugLoc(); SDValue Res; MulAmt >>= ShiftAmt; if (isPowerOf2_32(MulAmt - 1)) { // (mul x, 2^N + 1) => (add (shl x, N), x) Res = DAG.getNode(ISD::ADD, DL, VT, V, DAG.getNode(ISD::SHL, DL, VT, V, DAG.getConstant(Log2_32(MulAmt-1), MVT::i32))); } else if (isPowerOf2_32(MulAmt + 1)) { // (mul x, 2^N - 1) => (sub (shl x, N), x) Res = DAG.getNode(ISD::SUB, DL, VT, DAG.getNode(ISD::SHL, DL, VT, V, DAG.getConstant(Log2_32(MulAmt+1), MVT::i32)), V); } else return SDValue(); if (ShiftAmt != 0) Res = DAG.getNode(ISD::SHL, DL, VT, Res, DAG.getConstant(ShiftAmt, MVT::i32)); // Do not add new nodes to DAG combiner worklist. 
DCI.CombineTo(N, Res, false); return SDValue(); } static SDValue PerformANDCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) { // Attempt to use immediate-form VBIC BuildVectorSDNode *BVN = dyn_cast(N->getOperand(1)); DebugLoc dl = N->getDebugLoc(); EVT VT = N->getValueType(0); SelectionDAG &DAG = DCI.DAG; if(!DAG.getTargetLoweringInfo().isTypeLegal(VT)) return SDValue(); APInt SplatBits, SplatUndef; unsigned SplatBitSize; bool HasAnyUndefs; if (BVN && BVN->isConstantSplat(SplatBits, SplatUndef, SplatBitSize, HasAnyUndefs)) { if (SplatBitSize <= 64) { EVT VbicVT; SDValue Val = isNEONModifiedImm((~SplatBits).getZExtValue(), SplatUndef.getZExtValue(), SplatBitSize, DAG, VbicVT, VT.is128BitVector(), OtherModImm); if (Val.getNode()) { SDValue Input = DAG.getNode(ISD::BITCAST, dl, VbicVT, N->getOperand(0)); SDValue Vbic = DAG.getNode(ARMISD::VBICIMM, dl, VbicVT, Input, Val); return DAG.getNode(ISD::BITCAST, dl, VT, Vbic); } } } return SDValue(); } /// PerformORCombine - Target-specific dag combine xforms for ISD::OR static SDValue PerformORCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI, const ARMSubtarget *Subtarget) { // Attempt to use immediate-form VORR BuildVectorSDNode *BVN = dyn_cast(N->getOperand(1)); DebugLoc dl = N->getDebugLoc(); EVT VT = N->getValueType(0); SelectionDAG &DAG = DCI.DAG; if(!DAG.getTargetLoweringInfo().isTypeLegal(VT)) return SDValue(); APInt SplatBits, SplatUndef; unsigned SplatBitSize; bool HasAnyUndefs; if (BVN && Subtarget->hasNEON() && BVN->isConstantSplat(SplatBits, SplatUndef, SplatBitSize, HasAnyUndefs)) { if (SplatBitSize <= 64) { EVT VorrVT; SDValue Val = isNEONModifiedImm(SplatBits.getZExtValue(), SplatUndef.getZExtValue(), SplatBitSize, DAG, VorrVT, VT.is128BitVector(), OtherModImm); if (Val.getNode()) { SDValue Input = DAG.getNode(ISD::BITCAST, dl, VorrVT, N->getOperand(0)); SDValue Vorr = DAG.getNode(ARMISD::VORRIMM, dl, VorrVT, Input, Val); return DAG.getNode(ISD::BITCAST, dl, VT, Vorr); } } } SDValue N0 = N->getOperand(0); if (N0.getOpcode() != ISD::AND) return SDValue(); SDValue N1 = N->getOperand(1); // (or (and B, A), (and C, ~A)) => (VBSL A, B, C) when A is a constant. if (Subtarget->hasNEON() && N1.getOpcode() == ISD::AND && VT.isVector() && DAG.getTargetLoweringInfo().isTypeLegal(VT)) { APInt SplatUndef; unsigned SplatBitSize; bool HasAnyUndefs; BuildVectorSDNode *BVN0 = dyn_cast(N0->getOperand(1)); APInt SplatBits0; if (BVN0 && BVN0->isConstantSplat(SplatBits0, SplatUndef, SplatBitSize, HasAnyUndefs) && !HasAnyUndefs) { BuildVectorSDNode *BVN1 = dyn_cast(N1->getOperand(1)); APInt SplatBits1; if (BVN1 && BVN1->isConstantSplat(SplatBits1, SplatUndef, SplatBitSize, HasAnyUndefs) && !HasAnyUndefs && SplatBits0 == ~SplatBits1) { // Canonicalize the vector type to make instruction selection simpler. EVT CanonicalVT = VT.is128BitVector() ? MVT::v4i32 : MVT::v2i32; SDValue Result = DAG.getNode(ARMISD::VBSL, dl, CanonicalVT, N0->getOperand(1), N0->getOperand(0), N1->getOperand(0)); return DAG.getNode(ISD::BITCAST, dl, VT, Result); } } } // Try to use the ARM/Thumb2 BFI (bitfield insert) instruction when // reasonable. 
// BFI is only available on V6T2+ if (Subtarget->isThumb1Only() || !Subtarget->hasV6T2Ops()) return SDValue(); DebugLoc DL = N->getDebugLoc(); // 1) or (and A, mask), val => ARMbfi A, val, mask // iff (val & mask) == val // // 2) or (and A, mask), (and B, mask2) => ARMbfi A, (lsr B, amt), mask // 2a) iff isBitFieldInvertedMask(mask) && isBitFieldInvertedMask(~mask2) // && mask == ~mask2 // 2b) iff isBitFieldInvertedMask(~mask) && isBitFieldInvertedMask(mask2) // && ~mask == mask2 // (i.e., copy a bitfield value into another bitfield of the same width) if (VT != MVT::i32) return SDValue(); SDValue N00 = N0.getOperand(0); // The value and the mask need to be constants so we can verify this is // actually a bitfield set. If the mask is 0xffff, we can do better // via a movt instruction, so don't use BFI in that case. SDValue MaskOp = N0.getOperand(1); ConstantSDNode *MaskC = dyn_cast(MaskOp); if (!MaskC) return SDValue(); unsigned Mask = MaskC->getZExtValue(); if (Mask == 0xffff) return SDValue(); SDValue Res; // Case (1): or (and A, mask), val => ARMbfi A, val, mask ConstantSDNode *N1C = dyn_cast(N1); if (N1C) { unsigned Val = N1C->getZExtValue(); if ((Val & ~Mask) != Val) return SDValue(); if (ARM::isBitFieldInvertedMask(Mask)) { Val >>= CountTrailingZeros_32(~Mask); Res = DAG.getNode(ARMISD::BFI, DL, VT, N00, DAG.getConstant(Val, MVT::i32), DAG.getConstant(Mask, MVT::i32)); // Do not add new nodes to DAG combiner worklist. DCI.CombineTo(N, Res, false); return SDValue(); } } else if (N1.getOpcode() == ISD::AND) { // case (2) or (and A, mask), (and B, mask2) => ARMbfi A, (lsr B, amt), mask ConstantSDNode *N11C = dyn_cast(N1.getOperand(1)); if (!N11C) return SDValue(); unsigned Mask2 = N11C->getZExtValue(); // Mask and ~Mask2 (or reverse) must be equivalent for the BFI pattern // as is to match. if (ARM::isBitFieldInvertedMask(Mask) && (Mask == ~Mask2)) { // The pack halfword instruction works better for masks that fit it, // so use that when it's available. if (Subtarget->hasT2ExtractPack() && (Mask == 0xffff || Mask == 0xffff0000)) return SDValue(); // 2a unsigned amt = CountTrailingZeros_32(Mask2); Res = DAG.getNode(ISD::SRL, DL, VT, N1.getOperand(0), DAG.getConstant(amt, MVT::i32)); Res = DAG.getNode(ARMISD::BFI, DL, VT, N00, Res, DAG.getConstant(Mask, MVT::i32)); // Do not add new nodes to DAG combiner worklist. DCI.CombineTo(N, Res, false); return SDValue(); } else if (ARM::isBitFieldInvertedMask(~Mask) && (~Mask == Mask2)) { // The pack halfword instruction works better for masks that fit it, // so use that when it's available. if (Subtarget->hasT2ExtractPack() && (Mask2 == 0xffff || Mask2 == 0xffff0000)) return SDValue(); // 2b unsigned lsb = CountTrailingZeros_32(Mask); Res = DAG.getNode(ISD::SRL, DL, VT, N00, DAG.getConstant(lsb, MVT::i32)); Res = DAG.getNode(ARMISD::BFI, DL, VT, N1.getOperand(0), Res, DAG.getConstant(Mask2, MVT::i32)); // Do not add new nodes to DAG combiner worklist. DCI.CombineTo(N, Res, false); return SDValue(); } } if (DAG.MaskedValueIsZero(N1, MaskC->getAPIntValue()) && N00.getOpcode() == ISD::SHL && isa(N00.getOperand(1)) && ARM::isBitFieldInvertedMask(~Mask)) { // Case (3): or (and (shl A, #shamt), mask), B => ARMbfi B, A, ~mask // where lsb(mask) == #shamt and masked bits of B are known zero. 
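// Worked example for case (3) (illustrative only, not from the original
// source): with Mask = 0x0000ff00 and A shifted left by 8, ~Mask =
// 0xffff00ff is a bitfield-inverted mask, so the low byte of A is inserted
// into bits 8..15 of B by a single BFI, provided those bits of B are known
// to be zero.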
SDValue ShAmt = N00.getOperand(1); unsigned ShAmtC = cast(ShAmt)->getZExtValue(); unsigned LSB = CountTrailingZeros_32(Mask); if (ShAmtC != LSB) return SDValue(); Res = DAG.getNode(ARMISD::BFI, DL, VT, N1, N00.getOperand(0), DAG.getConstant(~Mask, MVT::i32)); // Do not add new nodes to DAG combiner worklist. DCI.CombineTo(N, Res, false); } return SDValue(); } /// PerformBFICombine - (bfi A, (and B, Mask1), Mask2) -> (bfi A, B, Mask2) iff /// the bits being cleared by the AND are not demanded by the BFI. static SDValue PerformBFICombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) { SDValue N1 = N->getOperand(1); if (N1.getOpcode() == ISD::AND) { ConstantSDNode *N11C = dyn_cast(N1.getOperand(1)); if (!N11C) return SDValue(); unsigned InvMask = cast(N->getOperand(2))->getZExtValue(); unsigned LSB = CountTrailingZeros_32(~InvMask); unsigned Width = (32 - CountLeadingZeros_32(~InvMask)) - LSB; unsigned Mask = (1 << Width)-1; unsigned Mask2 = N11C->getZExtValue(); if ((Mask & (~Mask2)) == 0) return DCI.DAG.getNode(ARMISD::BFI, N->getDebugLoc(), N->getValueType(0), N->getOperand(0), N1.getOperand(0), N->getOperand(2)); } return SDValue(); } /// PerformVMOVRRDCombine - Target-specific dag combine xforms for /// ARMISD::VMOVRRD. static SDValue PerformVMOVRRDCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) { // vmovrrd(vmovdrr x, y) -> x,y SDValue InDouble = N->getOperand(0); if (InDouble.getOpcode() == ARMISD::VMOVDRR) return DCI.CombineTo(N, InDouble.getOperand(0), InDouble.getOperand(1)); // vmovrrd(load f64) -> (load i32), (load i32) SDNode *InNode = InDouble.getNode(); if (ISD::isNormalLoad(InNode) && InNode->hasOneUse() && InNode->getValueType(0) == MVT::f64 && InNode->getOperand(1).getOpcode() == ISD::FrameIndex && !cast(InNode)->isVolatile()) { // TODO: Should this be done for non-FrameIndex operands? LoadSDNode *LD = cast(InNode); SelectionDAG &DAG = DCI.DAG; DebugLoc DL = LD->getDebugLoc(); SDValue BasePtr = LD->getBasePtr(); SDValue NewLD1 = DAG.getLoad(MVT::i32, DL, LD->getChain(), BasePtr, LD->getPointerInfo(), LD->isVolatile(), LD->isNonTemporal(), LD->getAlignment()); SDValue OffsetPtr = DAG.getNode(ISD::ADD, DL, MVT::i32, BasePtr, DAG.getConstant(4, MVT::i32)); SDValue NewLD2 = DAG.getLoad(MVT::i32, DL, NewLD1.getValue(1), OffsetPtr, LD->getPointerInfo(), LD->isVolatile(), LD->isNonTemporal(), std::min(4U, LD->getAlignment() / 2)); DAG.ReplaceAllUsesOfValueWith(SDValue(LD, 1), NewLD2.getValue(1)); SDValue Result = DCI.CombineTo(N, NewLD1, NewLD2); DCI.RemoveFromWorklist(LD); DAG.DeleteNode(LD); return Result; } return SDValue(); } /// PerformVMOVDRRCombine - Target-specific dag combine xforms for /// ARMISD::VMOVDRR. This is also used for BUILD_VECTORs with 2 operands. static SDValue PerformVMOVDRRCombine(SDNode *N, SelectionDAG &DAG) { // N=vmovrrd(X); vmovdrr(N:0, N:1) -> bit_convert(X) SDValue Op0 = N->getOperand(0); SDValue Op1 = N->getOperand(1); if (Op0.getOpcode() == ISD::BITCAST) Op0 = Op0.getOperand(0); if (Op1.getOpcode() == ISD::BITCAST) Op1 = Op1.getOperand(0); if (Op0.getOpcode() == ARMISD::VMOVRRD && Op0.getNode() == Op1.getNode() && Op0.getResNo() == 0 && Op1.getResNo() == 1) return DAG.getNode(ISD::BITCAST, N->getDebugLoc(), N->getValueType(0), Op0.getOperand(0)); return SDValue(); } /// PerformSTORECombine - Target-specific dag combine xforms for /// ISD::STORE. static SDValue PerformSTORECombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) { // Bitcast an i64 store extracted from a vector to f64. 
// Otherwise, the i64 value will be legalized to a pair of i32 values. StoreSDNode *St = cast(N); SDValue StVal = St->getValue(); if (!ISD::isNormalStore(St) || St->isVolatile()) return SDValue(); if (StVal.getNode()->getOpcode() == ARMISD::VMOVDRR && StVal.getNode()->hasOneUse() && !St->isVolatile()) { SelectionDAG &DAG = DCI.DAG; DebugLoc DL = St->getDebugLoc(); SDValue BasePtr = St->getBasePtr(); SDValue NewST1 = DAG.getStore(St->getChain(), DL, StVal.getNode()->getOperand(0), BasePtr, St->getPointerInfo(), St->isVolatile(), St->isNonTemporal(), St->getAlignment()); SDValue OffsetPtr = DAG.getNode(ISD::ADD, DL, MVT::i32, BasePtr, DAG.getConstant(4, MVT::i32)); return DAG.getStore(NewST1.getValue(0), DL, StVal.getNode()->getOperand(1), OffsetPtr, St->getPointerInfo(), St->isVolatile(), St->isNonTemporal(), std::min(4U, St->getAlignment() / 2)); } if (StVal.getValueType() != MVT::i64 || StVal.getNode()->getOpcode() != ISD::EXTRACT_VECTOR_ELT) return SDValue(); SelectionDAG &DAG = DCI.DAG; DebugLoc dl = StVal.getDebugLoc(); SDValue IntVec = StVal.getOperand(0); EVT FloatVT = EVT::getVectorVT(*DAG.getContext(), MVT::f64, IntVec.getValueType().getVectorNumElements()); SDValue Vec = DAG.getNode(ISD::BITCAST, dl, FloatVT, IntVec); SDValue ExtElt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, MVT::f64, Vec, StVal.getOperand(1)); dl = N->getDebugLoc(); SDValue V = DAG.getNode(ISD::BITCAST, dl, MVT::i64, ExtElt); // Make the DAGCombiner fold the bitcasts. DCI.AddToWorklist(Vec.getNode()); DCI.AddToWorklist(ExtElt.getNode()); DCI.AddToWorklist(V.getNode()); return DAG.getStore(St->getChain(), dl, V, St->getBasePtr(), St->getPointerInfo(), St->isVolatile(), St->isNonTemporal(), St->getAlignment(), St->getTBAAInfo()); } /// hasNormalLoadOperand - Check if any of the operands of a BUILD_VECTOR node /// are normal, non-volatile loads. If so, it is profitable to bitcast an /// i64 vector to have f64 elements, since the value can then be loaded /// directly into a VFP register. static bool hasNormalLoadOperand(SDNode *N) { unsigned NumElts = N->getValueType(0).getVectorNumElements(); for (unsigned i = 0; i < NumElts; ++i) { SDNode *Elt = N->getOperand(i).getNode(); if (ISD::isNormalLoad(Elt) && !cast(Elt)->isVolatile()) return true; } return false; } /// PerformBUILD_VECTORCombine - Target-specific dag combine xforms for /// ISD::BUILD_VECTOR. static SDValue PerformBUILD_VECTORCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI){ // build_vector(N=ARMISD::VMOVRRD(X), N:1) -> bit_convert(X): // VMOVRRD is introduced when legalizing i64 types. It forces the i64 value // into a pair of GPRs, which is fine when the value is used as a scalar, // but if the i64 value is converted to a vector, we need to undo the VMOVRRD. SelectionDAG &DAG = DCI.DAG; if (N->getNumOperands() == 2) { SDValue RV = PerformVMOVDRRCombine(N, DAG); if (RV.getNode()) return RV; } // Load i64 elements as f64 values so that type legalization does not split // them up into i32 values. EVT VT = N->getValueType(0); if (VT.getVectorElementType() != MVT::i64 || !hasNormalLoadOperand(N)) return SDValue(); DebugLoc dl = N->getDebugLoc(); SmallVector Ops; unsigned NumElts = VT.getVectorNumElements(); for (unsigned i = 0; i < NumElts; ++i) { SDValue V = DAG.getNode(ISD::BITCAST, dl, MVT::f64, N->getOperand(i)); Ops.push_back(V); // Make the DAGCombiner fold the bitcast. 
DCI.AddToWorklist(V.getNode()); } EVT FloatVT = EVT::getVectorVT(*DAG.getContext(), MVT::f64, NumElts); SDValue BV = DAG.getNode(ISD::BUILD_VECTOR, dl, FloatVT, Ops.data(), NumElts); return DAG.getNode(ISD::BITCAST, dl, VT, BV); } /// PerformInsertEltCombine - Target-specific dag combine xforms for /// ISD::INSERT_VECTOR_ELT. static SDValue PerformInsertEltCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) { // Bitcast an i64 load inserted into a vector to f64. // Otherwise, the i64 value will be legalized to a pair of i32 values. EVT VT = N->getValueType(0); SDNode *Elt = N->getOperand(1).getNode(); if (VT.getVectorElementType() != MVT::i64 || !ISD::isNormalLoad(Elt) || cast(Elt)->isVolatile()) return SDValue(); SelectionDAG &DAG = DCI.DAG; DebugLoc dl = N->getDebugLoc(); EVT FloatVT = EVT::getVectorVT(*DAG.getContext(), MVT::f64, VT.getVectorNumElements()); SDValue Vec = DAG.getNode(ISD::BITCAST, dl, FloatVT, N->getOperand(0)); SDValue V = DAG.getNode(ISD::BITCAST, dl, MVT::f64, N->getOperand(1)); // Make the DAGCombiner fold the bitcasts. DCI.AddToWorklist(Vec.getNode()); DCI.AddToWorklist(V.getNode()); SDValue InsElt = DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, FloatVT, Vec, V, N->getOperand(2)); return DAG.getNode(ISD::BITCAST, dl, VT, InsElt); } /// PerformVECTOR_SHUFFLECombine - Target-specific dag combine xforms for /// ISD::VECTOR_SHUFFLE. static SDValue PerformVECTOR_SHUFFLECombine(SDNode *N, SelectionDAG &DAG) { // The LLVM shufflevector instruction does not require the shuffle mask // length to match the operand vector length, but ISD::VECTOR_SHUFFLE does // have that requirement. When translating to ISD::VECTOR_SHUFFLE, if the // operands do not match the mask length, they are extended by concatenating // them with undef vectors. That is probably the right thing for other // targets, but for NEON it is better to concatenate two double-register // size vector operands into a single quad-register size vector. Do that // transformation here: // shuffle(concat(v1, undef), concat(v2, undef)) -> // shuffle(concat(v1, v2), undef) SDValue Op0 = N->getOperand(0); SDValue Op1 = N->getOperand(1); if (Op0.getOpcode() != ISD::CONCAT_VECTORS || Op1.getOpcode() != ISD::CONCAT_VECTORS || Op0.getNumOperands() != 2 || Op1.getNumOperands() != 2) return SDValue(); SDValue Concat0Op1 = Op0.getOperand(1); SDValue Concat1Op1 = Op1.getOperand(1); if (Concat0Op1.getOpcode() != ISD::UNDEF || Concat1Op1.getOpcode() != ISD::UNDEF) return SDValue(); // Skip the transformation if any of the types are illegal. const TargetLowering &TLI = DAG.getTargetLoweringInfo(); EVT VT = N->getValueType(0); if (!TLI.isTypeLegal(VT) || !TLI.isTypeLegal(Concat0Op1.getValueType()) || !TLI.isTypeLegal(Concat1Op1.getValueType())) return SDValue(); SDValue NewConcat = DAG.getNode(ISD::CONCAT_VECTORS, N->getDebugLoc(), VT, Op0.getOperand(0), Op1.getOperand(0)); // Translate the shuffle mask. 
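// For example (a sketch, not in the original comments): with NumElts == 4 and
// HalfElts == 2, a mask of <0, 5, 1, 4> over the two concat(v, undef)
// operands becomes <0, 3, 1, 2> over the single concat(v1, v2) operand.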
SmallVector NewMask; unsigned NumElts = VT.getVectorNumElements(); unsigned HalfElts = NumElts/2; ShuffleVectorSDNode *SVN = cast(N); for (unsigned n = 0; n < NumElts; ++n) { int MaskElt = SVN->getMaskElt(n); int NewElt = -1; if (MaskElt < (int)HalfElts) NewElt = MaskElt; else if (MaskElt >= (int)NumElts && MaskElt < (int)(NumElts + HalfElts)) NewElt = HalfElts + MaskElt - NumElts; NewMask.push_back(NewElt); } return DAG.getVectorShuffle(VT, N->getDebugLoc(), NewConcat, DAG.getUNDEF(VT), NewMask.data()); } /// CombineBaseUpdate - Target-specific DAG combine function for VLDDUP and /// NEON load/store intrinsics to merge base address updates. static SDValue CombineBaseUpdate(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) { if (DCI.isBeforeLegalize() || DCI.isCalledByLegalizer()) return SDValue(); SelectionDAG &DAG = DCI.DAG; bool isIntrinsic = (N->getOpcode() == ISD::INTRINSIC_VOID || N->getOpcode() == ISD::INTRINSIC_W_CHAIN); unsigned AddrOpIdx = (isIntrinsic ? 2 : 1); SDValue Addr = N->getOperand(AddrOpIdx); // Search for a use of the address operand that is an increment. for (SDNode::use_iterator UI = Addr.getNode()->use_begin(), UE = Addr.getNode()->use_end(); UI != UE; ++UI) { SDNode *User = *UI; if (User->getOpcode() != ISD::ADD || UI.getUse().getResNo() != Addr.getResNo()) continue; // Check that the add is independent of the load/store. Otherwise, folding // it would create a cycle. if (User->isPredecessorOf(N) || N->isPredecessorOf(User)) continue; // Find the new opcode for the updating load/store. bool isLoad = true; bool isLaneOp = false; unsigned NewOpc = 0; unsigned NumVecs = 0; if (isIntrinsic) { unsigned IntNo = cast(N->getOperand(1))->getZExtValue(); switch (IntNo) { default: assert(0 && "unexpected intrinsic for Neon base update"); case Intrinsic::arm_neon_vld1: NewOpc = ARMISD::VLD1_UPD; NumVecs = 1; break; case Intrinsic::arm_neon_vld2: NewOpc = ARMISD::VLD2_UPD; NumVecs = 2; break; case Intrinsic::arm_neon_vld3: NewOpc = ARMISD::VLD3_UPD; NumVecs = 3; break; case Intrinsic::arm_neon_vld4: NewOpc = ARMISD::VLD4_UPD; NumVecs = 4; break; case Intrinsic::arm_neon_vld2lane: NewOpc = ARMISD::VLD2LN_UPD; NumVecs = 2; isLaneOp = true; break; case Intrinsic::arm_neon_vld3lane: NewOpc = ARMISD::VLD3LN_UPD; NumVecs = 3; isLaneOp = true; break; case Intrinsic::arm_neon_vld4lane: NewOpc = ARMISD::VLD4LN_UPD; NumVecs = 4; isLaneOp = true; break; case Intrinsic::arm_neon_vst1: NewOpc = ARMISD::VST1_UPD; NumVecs = 1; isLoad = false; break; case Intrinsic::arm_neon_vst2: NewOpc = ARMISD::VST2_UPD; NumVecs = 2; isLoad = false; break; case Intrinsic::arm_neon_vst3: NewOpc = ARMISD::VST3_UPD; NumVecs = 3; isLoad = false; break; case Intrinsic::arm_neon_vst4: NewOpc = ARMISD::VST4_UPD; NumVecs = 4; isLoad = false; break; case Intrinsic::arm_neon_vst2lane: NewOpc = ARMISD::VST2LN_UPD; NumVecs = 2; isLoad = false; isLaneOp = true; break; case Intrinsic::arm_neon_vst3lane: NewOpc = ARMISD::VST3LN_UPD; NumVecs = 3; isLoad = false; isLaneOp = true; break; case Intrinsic::arm_neon_vst4lane: NewOpc = ARMISD::VST4LN_UPD; NumVecs = 4; isLoad = false; isLaneOp = true; break; } } else { isLaneOp = true; switch (N->getOpcode()) { default: assert(0 && "unexpected opcode for Neon base update"); case ARMISD::VLD2DUP: NewOpc = ARMISD::VLD2DUP_UPD; NumVecs = 2; break; case ARMISD::VLD3DUP: NewOpc = ARMISD::VLD3DUP_UPD; NumVecs = 3; break; case ARMISD::VLD4DUP: NewOpc = ARMISD::VLD4DUP_UPD; NumVecs = 4; break; } } // Find the size of memory referenced by the load/store. 
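// Illustrative example (assumption, not from the original comments): for a
// vld2 of two v2i32 vectors, NumVecs == 2 and VecTy is v2i32 (64 bits), so
// NumBytes == 16; a constant post-increment must then be exactly 16 to be
// folded into the load as a VLD2_UPD with register writeback.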
EVT VecTy; if (isLoad) VecTy = N->getValueType(0); else VecTy = N->getOperand(AddrOpIdx+1).getValueType(); unsigned NumBytes = NumVecs * VecTy.getSizeInBits() / 8; if (isLaneOp) NumBytes /= VecTy.getVectorNumElements(); // If the increment is a constant, it must match the memory ref size. SDValue Inc = User->getOperand(User->getOperand(0) == Addr ? 1 : 0); if (ConstantSDNode *CInc = dyn_cast(Inc.getNode())) { uint64_t IncVal = CInc->getZExtValue(); if (IncVal != NumBytes) continue; } else if (NumBytes >= 3 * 16) { // VLD3/4 and VST3/4 for 128-bit vectors are implemented with two // separate instructions that make it harder to use a non-constant update. continue; } // Create the new updating load/store node. EVT Tys[6]; unsigned NumResultVecs = (isLoad ? NumVecs : 0); unsigned n; for (n = 0; n < NumResultVecs; ++n) Tys[n] = VecTy; Tys[n++] = MVT::i32; Tys[n] = MVT::Other; SDVTList SDTys = DAG.getVTList(Tys, NumResultVecs+2); SmallVector Ops; Ops.push_back(N->getOperand(0)); // incoming chain Ops.push_back(N->getOperand(AddrOpIdx)); Ops.push_back(Inc); for (unsigned i = AddrOpIdx + 1; i < N->getNumOperands(); ++i) { Ops.push_back(N->getOperand(i)); } MemIntrinsicSDNode *MemInt = cast(N); SDValue UpdN = DAG.getMemIntrinsicNode(NewOpc, N->getDebugLoc(), SDTys, Ops.data(), Ops.size(), MemInt->getMemoryVT(), MemInt->getMemOperand()); // Update the uses. std::vector NewResults; for (unsigned i = 0; i < NumResultVecs; ++i) { NewResults.push_back(SDValue(UpdN.getNode(), i)); } NewResults.push_back(SDValue(UpdN.getNode(), NumResultVecs+1)); // chain DCI.CombineTo(N, NewResults); DCI.CombineTo(User, SDValue(UpdN.getNode(), NumResultVecs)); break; } return SDValue(); } /// CombineVLDDUP - For a VDUPLANE node N, check if its source operand is a /// vldN-lane (N > 1) intrinsic, and if all the other uses of that intrinsic /// are also VDUPLANEs. If so, combine them to a vldN-dup operation and /// return true. static bool CombineVLDDUP(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) { SelectionDAG &DAG = DCI.DAG; EVT VT = N->getValueType(0); // vldN-dup instructions only support 64-bit vectors for N > 1. if (!VT.is64BitVector()) return false; // Check if the VDUPLANE operand is a vldN-dup intrinsic. SDNode *VLD = N->getOperand(0).getNode(); if (VLD->getOpcode() != ISD::INTRINSIC_W_CHAIN) return false; unsigned NumVecs = 0; unsigned NewOpc = 0; unsigned IntNo = cast(VLD->getOperand(1))->getZExtValue(); if (IntNo == Intrinsic::arm_neon_vld2lane) { NumVecs = 2; NewOpc = ARMISD::VLD2DUP; } else if (IntNo == Intrinsic::arm_neon_vld3lane) { NumVecs = 3; NewOpc = ARMISD::VLD3DUP; } else if (IntNo == Intrinsic::arm_neon_vld4lane) { NumVecs = 4; NewOpc = ARMISD::VLD4DUP; } else { return false; } // First check that all the vldN-lane uses are VDUPLANEs and that the lane // numbers match the load. unsigned VLDLaneNo = cast(VLD->getOperand(NumVecs+3))->getZExtValue(); for (SDNode::use_iterator UI = VLD->use_begin(), UE = VLD->use_end(); UI != UE; ++UI) { // Ignore uses of the chain result. if (UI.getUse().getResNo() == NumVecs) continue; SDNode *User = *UI; if (User->getOpcode() != ARMISD::VDUPLANE || VLDLaneNo != cast(User->getOperand(1))->getZExtValue()) return false; } // Create the vldN-dup node. 
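// Illustrative example (not part of the original comments): a vld2lane whose
// two vector results are each used only by a VDUPLANE of the loaded lane is
// rewritten as a single ARMISD::VLD2DUP node, i.e. an all-lanes load such as
// "vld2.32 {d16[], d17[]}".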
EVT Tys[5]; unsigned n; for (n = 0; n < NumVecs; ++n) Tys[n] = VT; Tys[n] = MVT::Other; SDVTList SDTys = DAG.getVTList(Tys, NumVecs+1); SDValue Ops[] = { VLD->getOperand(0), VLD->getOperand(2) }; MemIntrinsicSDNode *VLDMemInt = cast(VLD); SDValue VLDDup = DAG.getMemIntrinsicNode(NewOpc, VLD->getDebugLoc(), SDTys, Ops, 2, VLDMemInt->getMemoryVT(), VLDMemInt->getMemOperand()); // Update the uses. for (SDNode::use_iterator UI = VLD->use_begin(), UE = VLD->use_end(); UI != UE; ++UI) { unsigned ResNo = UI.getUse().getResNo(); // Ignore uses of the chain result. if (ResNo == NumVecs) continue; SDNode *User = *UI; DCI.CombineTo(User, SDValue(VLDDup.getNode(), ResNo)); } // Now the vldN-lane intrinsic is dead except for its chain result. // Update uses of the chain. std::vector VLDDupResults; for (unsigned n = 0; n < NumVecs; ++n) VLDDupResults.push_back(SDValue(VLDDup.getNode(), n)); VLDDupResults.push_back(SDValue(VLDDup.getNode(), NumVecs)); DCI.CombineTo(VLD, VLDDupResults); return true; } /// PerformVDUPLANECombine - Target-specific dag combine xforms for /// ARMISD::VDUPLANE. static SDValue PerformVDUPLANECombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) { SDValue Op = N->getOperand(0); // If the source is a vldN-lane (N > 1) intrinsic, and all the other uses // of that intrinsic are also VDUPLANEs, combine them to a vldN-dup operation. if (CombineVLDDUP(N, DCI)) return SDValue(N, 0); // If the source is already a VMOVIMM or VMVNIMM splat, the VDUPLANE is // redundant. Ignore bit_converts for now; element sizes are checked below. while (Op.getOpcode() == ISD::BITCAST) Op = Op.getOperand(0); if (Op.getOpcode() != ARMISD::VMOVIMM && Op.getOpcode() != ARMISD::VMVNIMM) return SDValue(); // Make sure the VMOV element size is not bigger than the VDUPLANE elements. unsigned EltSize = Op.getValueType().getVectorElementType().getSizeInBits(); // The canonical VMOV for a zero vector uses a 32-bit element size. unsigned Imm = cast(Op.getOperand(0))->getZExtValue(); unsigned EltBits; if (ARM_AM::decodeNEONModImm(Imm, EltBits) == 0) EltSize = 8; EVT VT = N->getValueType(0); if (EltSize > VT.getVectorElementType().getSizeInBits()) return SDValue(); return DCI.DAG.getNode(ISD::BITCAST, N->getDebugLoc(), VT, Op); } // isConstVecPow2 - Return true if each vector element is a power of 2, all // elements are the same constant, C, and Log2(C) ranges from 1 to 32. static bool isConstVecPow2(SDValue ConstVec, bool isSigned, uint64_t &C) { integerPart cN; integerPart c0 = 0; for (unsigned I = 0, E = ConstVec.getValueType().getVectorNumElements(); I != E; I++) { ConstantFPSDNode *C = dyn_cast(ConstVec.getOperand(I)); if (!C) return false; bool isExact; APFloat APF = C->getValueAPF(); if (APF.convertToInteger(&cN, 64, isSigned, APFloat::rmTowardZero, &isExact) != APFloat::opOK || !isExact) return false; c0 = (I == 0) ? cN : c0; if (!isPowerOf2_64(cN) || c0 != cN || Log2_64(c0) < 1 || Log2_64(c0) > 32) return false; } C = c0; return true; } /// PerformVCVTCombine - VCVT (floating-point to fixed-point, Advanced SIMD) /// can replace combinations of VMUL and VCVT (floating-point to integer) /// when the VMUL has a constant operand that is a power of 2. 
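/// (Illustrative reading, not from the original text: the example below
/// assumes d17 is a splat of 8.0, i.e. 2^3, which is why the fixed-point
/// conversion uses #3.)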
/// /// Example (assume d17 = ): /// vmul.f32 d16, d17, d16 /// vcvt.s32.f32 d16, d16 /// becomes: /// vcvt.s32.f32 d16, d16, #3 static SDValue PerformVCVTCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI, const ARMSubtarget *Subtarget) { SelectionDAG &DAG = DCI.DAG; SDValue Op = N->getOperand(0); if (!Subtarget->hasNEON() || !Op.getValueType().isVector() || Op.getOpcode() != ISD::FMUL) return SDValue(); uint64_t C; SDValue N0 = Op->getOperand(0); SDValue ConstVec = Op->getOperand(1); bool isSigned = N->getOpcode() == ISD::FP_TO_SINT; if (ConstVec.getOpcode() != ISD::BUILD_VECTOR || !isConstVecPow2(ConstVec, isSigned, C)) return SDValue(); unsigned IntrinsicOpcode = isSigned ? Intrinsic::arm_neon_vcvtfp2fxs : Intrinsic::arm_neon_vcvtfp2fxu; return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, N->getDebugLoc(), N->getValueType(0), DAG.getConstant(IntrinsicOpcode, MVT::i32), N0, DAG.getConstant(Log2_64(C), MVT::i32)); } /// PerformVDIVCombine - VCVT (fixed-point to floating-point, Advanced SIMD) /// can replace combinations of VCVT (integer to floating-point) and VDIV /// when the VDIV has a constant operand that is a power of 2. /// /// Example (assume d17 = ): /// vcvt.f32.s32 d16, d16 /// vdiv.f32 d16, d17, d16 /// becomes: /// vcvt.f32.s32 d16, d16, #3 static SDValue PerformVDIVCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI, const ARMSubtarget *Subtarget) { SelectionDAG &DAG = DCI.DAG; SDValue Op = N->getOperand(0); unsigned OpOpcode = Op.getNode()->getOpcode(); if (!Subtarget->hasNEON() || !N->getValueType(0).isVector() || (OpOpcode != ISD::SINT_TO_FP && OpOpcode != ISD::UINT_TO_FP)) return SDValue(); uint64_t C; SDValue ConstVec = N->getOperand(1); bool isSigned = OpOpcode == ISD::SINT_TO_FP; if (ConstVec.getOpcode() != ISD::BUILD_VECTOR || !isConstVecPow2(ConstVec, isSigned, C)) return SDValue(); unsigned IntrinsicOpcode = isSigned ? Intrinsic::arm_neon_vcvtfxs2fp : Intrinsic::arm_neon_vcvtfxu2fp; return DAG.getNode(ISD::INTRINSIC_WO_CHAIN, N->getDebugLoc(), Op.getValueType(), DAG.getConstant(IntrinsicOpcode, MVT::i32), Op.getOperand(0), DAG.getConstant(Log2_64(C), MVT::i32)); } /// Getvshiftimm - Check if this is a valid build_vector for the immediate /// operand of a vector shift operation, where all the elements of the /// build_vector must have the same constant integer value. static bool getVShiftImm(SDValue Op, unsigned ElementBits, int64_t &Cnt) { // Ignore bit_converts. while (Op.getOpcode() == ISD::BITCAST) Op = Op.getOperand(0); BuildVectorSDNode *BVN = dyn_cast(Op.getNode()); APInt SplatBits, SplatUndef; unsigned SplatBitSize; bool HasAnyUndefs; if (! BVN || ! BVN->isConstantSplat(SplatBits, SplatUndef, SplatBitSize, HasAnyUndefs, ElementBits) || SplatBitSize > ElementBits) return false; Cnt = SplatBits.getSExtValue(); return true; } /// isVShiftLImm - Check if this is a valid build_vector for the immediate /// operand of a vector shift left operation. That value must be in the range: /// 0 <= Value < ElementBits for a left shift; or /// 0 <= Value <= ElementBits for a long left shift. static bool isVShiftLImm(SDValue Op, EVT VT, bool isLong, int64_t &Cnt) { assert(VT.isVector() && "vector shift count is not a vector type"); unsigned ElementBits = VT.getVectorElementType().getSizeInBits(); if (! getVShiftImm(Op, ElementBits, Cnt)) return false; return (Cnt >= 0 && (isLong ? Cnt-1 : Cnt) < ElementBits); } /// isVShiftRImm - Check if this is a valid build_vector for the immediate /// operand of a vector shift right operation. 
For a shift opcode, the value /// is positive, but for an intrinsic the value count must be negative. The /// absolute value must be in the range: /// 1 <= |Value| <= ElementBits for a right shift; or /// 1 <= |Value| <= ElementBits/2 for a narrow right shift. static bool isVShiftRImm(SDValue Op, EVT VT, bool isNarrow, bool isIntrinsic, int64_t &Cnt) { assert(VT.isVector() && "vector shift count is not a vector type"); unsigned ElementBits = VT.getVectorElementType().getSizeInBits(); if (! getVShiftImm(Op, ElementBits, Cnt)) return false; if (isIntrinsic) Cnt = -Cnt; return (Cnt >= 1 && Cnt <= (isNarrow ? ElementBits/2 : ElementBits)); } /// PerformIntrinsicCombine - ARM-specific DAG combining for intrinsics. static SDValue PerformIntrinsicCombine(SDNode *N, SelectionDAG &DAG) { unsigned IntNo = cast(N->getOperand(0))->getZExtValue(); switch (IntNo) { default: // Don't do anything for most intrinsics. break; // Vector shifts: check for immediate versions and lower them. // Note: This is done during DAG combining instead of DAG legalizing because // the build_vectors for 64-bit vector element shift counts are generally // not legal, and it is hard to see their values after they get legalized to // loads from a constant pool. case Intrinsic::arm_neon_vshifts: case Intrinsic::arm_neon_vshiftu: case Intrinsic::arm_neon_vshiftls: case Intrinsic::arm_neon_vshiftlu: case Intrinsic::arm_neon_vshiftn: case Intrinsic::arm_neon_vrshifts: case Intrinsic::arm_neon_vrshiftu: case Intrinsic::arm_neon_vrshiftn: case Intrinsic::arm_neon_vqshifts: case Intrinsic::arm_neon_vqshiftu: case Intrinsic::arm_neon_vqshiftsu: case Intrinsic::arm_neon_vqshiftns: case Intrinsic::arm_neon_vqshiftnu: case Intrinsic::arm_neon_vqshiftnsu: case Intrinsic::arm_neon_vqrshiftns: case Intrinsic::arm_neon_vqrshiftnu: case Intrinsic::arm_neon_vqrshiftnsu: { EVT VT = N->getOperand(1).getValueType(); int64_t Cnt; unsigned VShiftOpc = 0; switch (IntNo) { case Intrinsic::arm_neon_vshifts: case Intrinsic::arm_neon_vshiftu: if (isVShiftLImm(N->getOperand(2), VT, false, Cnt)) { VShiftOpc = ARMISD::VSHL; break; } if (isVShiftRImm(N->getOperand(2), VT, false, true, Cnt)) { VShiftOpc = (IntNo == Intrinsic::arm_neon_vshifts ? ARMISD::VSHRs : ARMISD::VSHRu); break; } return SDValue(); case Intrinsic::arm_neon_vshiftls: case Intrinsic::arm_neon_vshiftlu: if (isVShiftLImm(N->getOperand(2), VT, true, Cnt)) break; llvm_unreachable("invalid shift count for vshll intrinsic"); case Intrinsic::arm_neon_vrshifts: case Intrinsic::arm_neon_vrshiftu: if (isVShiftRImm(N->getOperand(2), VT, false, true, Cnt)) break; return SDValue(); case Intrinsic::arm_neon_vqshifts: case Intrinsic::arm_neon_vqshiftu: if (isVShiftLImm(N->getOperand(2), VT, false, Cnt)) break; return SDValue(); case Intrinsic::arm_neon_vqshiftsu: if (isVShiftLImm(N->getOperand(2), VT, false, Cnt)) break; llvm_unreachable("invalid shift count for vqshlu intrinsic"); case Intrinsic::arm_neon_vshiftn: case Intrinsic::arm_neon_vrshiftn: case Intrinsic::arm_neon_vqshiftns: case Intrinsic::arm_neon_vqshiftnu: case Intrinsic::arm_neon_vqshiftnsu: case Intrinsic::arm_neon_vqrshiftns: case Intrinsic::arm_neon_vqrshiftnu: case Intrinsic::arm_neon_vqrshiftnsu: // Narrowing shifts require an immediate right shift. 
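// For example (illustrative, assuming a v4i32 source operand): vqshrn.s32
// narrows to v4i16 and accepts shift amounts 1..16, i.e. at most half the
// source element width, which is what isVShiftRImm checks with isNarrow set.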
if (isVShiftRImm(N->getOperand(2), VT, true, true, Cnt)) break; llvm_unreachable("invalid shift count for narrowing vector shift " "intrinsic"); default: llvm_unreachable("unhandled vector shift"); } switch (IntNo) { case Intrinsic::arm_neon_vshifts: case Intrinsic::arm_neon_vshiftu: // Opcode already set above. break; case Intrinsic::arm_neon_vshiftls: case Intrinsic::arm_neon_vshiftlu: if (Cnt == VT.getVectorElementType().getSizeInBits()) VShiftOpc = ARMISD::VSHLLi; else VShiftOpc = (IntNo == Intrinsic::arm_neon_vshiftls ? ARMISD::VSHLLs : ARMISD::VSHLLu); break; case Intrinsic::arm_neon_vshiftn: VShiftOpc = ARMISD::VSHRN; break; case Intrinsic::arm_neon_vrshifts: VShiftOpc = ARMISD::VRSHRs; break; case Intrinsic::arm_neon_vrshiftu: VShiftOpc = ARMISD::VRSHRu; break; case Intrinsic::arm_neon_vrshiftn: VShiftOpc = ARMISD::VRSHRN; break; case Intrinsic::arm_neon_vqshifts: VShiftOpc = ARMISD::VQSHLs; break; case Intrinsic::arm_neon_vqshiftu: VShiftOpc = ARMISD::VQSHLu; break; case Intrinsic::arm_neon_vqshiftsu: VShiftOpc = ARMISD::VQSHLsu; break; case Intrinsic::arm_neon_vqshiftns: VShiftOpc = ARMISD::VQSHRNs; break; case Intrinsic::arm_neon_vqshiftnu: VShiftOpc = ARMISD::VQSHRNu; break; case Intrinsic::arm_neon_vqshiftnsu: VShiftOpc = ARMISD::VQSHRNsu; break; case Intrinsic::arm_neon_vqrshiftns: VShiftOpc = ARMISD::VQRSHRNs; break; case Intrinsic::arm_neon_vqrshiftnu: VShiftOpc = ARMISD::VQRSHRNu; break; case Intrinsic::arm_neon_vqrshiftnsu: VShiftOpc = ARMISD::VQRSHRNsu; break; } return DAG.getNode(VShiftOpc, N->getDebugLoc(), N->getValueType(0), N->getOperand(1), DAG.getConstant(Cnt, MVT::i32)); } case Intrinsic::arm_neon_vshiftins: { EVT VT = N->getOperand(1).getValueType(); int64_t Cnt; unsigned VShiftOpc = 0; if (isVShiftLImm(N->getOperand(3), VT, false, Cnt)) VShiftOpc = ARMISD::VSLI; else if (isVShiftRImm(N->getOperand(3), VT, false, true, Cnt)) VShiftOpc = ARMISD::VSRI; else { llvm_unreachable("invalid shift count for vsli/vsri intrinsic"); } return DAG.getNode(VShiftOpc, N->getDebugLoc(), N->getValueType(0), N->getOperand(1), N->getOperand(2), DAG.getConstant(Cnt, MVT::i32)); } case Intrinsic::arm_neon_vqrshifts: case Intrinsic::arm_neon_vqrshiftu: // No immediate versions of these to check for. break; } return SDValue(); } /// PerformShiftCombine - Checks for immediate versions of vector shifts and /// lowers them. As with the vector shift intrinsics, this is done during DAG /// combining instead of DAG legalizing because the build_vectors for 64-bit /// vector element shift counts are generally not legal, and it is hard to see /// their values after they get legalized to loads from a constant pool. static SDValue PerformShiftCombine(SDNode *N, SelectionDAG &DAG, const ARMSubtarget *ST) { EVT VT = N->getValueType(0); // Nothing to be done for scalar shifts. const TargetLowering &TLI = DAG.getTargetLoweringInfo(); if (!VT.isVector() || !TLI.isTypeLegal(VT)) return SDValue(); assert(ST->hasNEON() && "unexpected vector shift"); int64_t Cnt; switch (N->getOpcode()) { default: llvm_unreachable("unexpected shift opcode"); case ISD::SHL: if (isVShiftLImm(N->getOperand(1), VT, false, Cnt)) return DAG.getNode(ARMISD::VSHL, N->getDebugLoc(), VT, N->getOperand(0), DAG.getConstant(Cnt, MVT::i32)); break; case ISD::SRA: case ISD::SRL: if (isVShiftRImm(N->getOperand(1), VT, false, false, Cnt)) { unsigned VShiftOpc = (N->getOpcode() == ISD::SRA ? 
ARMISD::VSHRs : ARMISD::VSHRu); return DAG.getNode(VShiftOpc, N->getDebugLoc(), VT, N->getOperand(0), DAG.getConstant(Cnt, MVT::i32)); } } return SDValue(); } /// PerformExtendCombine - Target-specific DAG combining for ISD::SIGN_EXTEND, /// ISD::ZERO_EXTEND, and ISD::ANY_EXTEND. static SDValue PerformExtendCombine(SDNode *N, SelectionDAG &DAG, const ARMSubtarget *ST) { SDValue N0 = N->getOperand(0); // Check for sign- and zero-extensions of vector extract operations of 8- // and 16-bit vector elements. NEON supports these directly. They are // handled during DAG combining because type legalization will promote them // to 32-bit types and it is messy to recognize the operations after that. if (ST->hasNEON() && N0.getOpcode() == ISD::EXTRACT_VECTOR_ELT) { SDValue Vec = N0.getOperand(0); SDValue Lane = N0.getOperand(1); EVT VT = N->getValueType(0); EVT EltVT = N0.getValueType(); const TargetLowering &TLI = DAG.getTargetLoweringInfo(); if (VT == MVT::i32 && (EltVT == MVT::i8 || EltVT == MVT::i16) && TLI.isTypeLegal(Vec.getValueType()) && isa(Lane)) { unsigned Opc = 0; switch (N->getOpcode()) { default: llvm_unreachable("unexpected opcode"); case ISD::SIGN_EXTEND: Opc = ARMISD::VGETLANEs; break; case ISD::ZERO_EXTEND: case ISD::ANY_EXTEND: Opc = ARMISD::VGETLANEu; break; } return DAG.getNode(Opc, N->getDebugLoc(), VT, Vec, Lane); } } return SDValue(); } /// PerformSELECT_CCCombine - Target-specific DAG combining for ISD::SELECT_CC /// to match f32 max/min patterns to use NEON vmax/vmin instructions. static SDValue PerformSELECT_CCCombine(SDNode *N, SelectionDAG &DAG, const ARMSubtarget *ST) { // If the target supports NEON, try to use vmax/vmin instructions for f32 // selects like "x < y ? x : y". Unless the NoNaNsFPMath option is set, // be careful about NaNs: NEON's vmax/vmin return NaN if either operand is // a NaN; only do the transformation when it matches that behavior. // For now only do this when using NEON for FP operations; if using VFP, it // is not obvious that the benefit outweighs the cost of switching to the // NEON pipeline. if (!ST->hasNEON() || !ST->useNEONForSinglePrecisionFP() || N->getValueType(0) != MVT::f32) return SDValue(); SDValue CondLHS = N->getOperand(0); SDValue CondRHS = N->getOperand(1); SDValue LHS = N->getOperand(2); SDValue RHS = N->getOperand(3); ISD::CondCode CC = cast(N->getOperand(4))->get(); unsigned Opcode = 0; bool IsReversed; if (DAG.isEqualTo(LHS, CondLHS) && DAG.isEqualTo(RHS, CondRHS)) { IsReversed = false; // x CC y ? x : y } else if (DAG.isEqualTo(LHS, CondRHS) && DAG.isEqualTo(RHS, CondLHS)) { IsReversed = true ; // x CC y ? y : x } else { return SDValue(); } bool IsUnordered; switch (CC) { default: break; case ISD::SETOLT: case ISD::SETOLE: case ISD::SETLT: case ISD::SETLE: case ISD::SETULT: case ISD::SETULE: // If LHS is NaN, an ordered comparison will be false and the result will // be the RHS, but vmin(NaN, RHS) = NaN. Avoid this by checking that LHS // != NaN. Likewise, for unordered comparisons, check for RHS != NaN. IsUnordered = (CC == ISD::SETULT || CC == ISD::SETULE); if (!DAG.isKnownNeverNaN(IsUnordered ? RHS : LHS)) break; // For less-than-or-equal comparisons, "+0 <= -0" will be true but vmin // will return -0, so vmin can only be used for unsafe math or if one of // the operands is known to be nonzero. if ((CC == ISD::SETLE || CC == ISD::SETOLE || CC == ISD::SETULE) && !UnsafeFPMath && !(DAG.isKnownNeverZero(LHS) || DAG.isKnownNeverZero(RHS))) break; Opcode = IsReversed ? 
ARMISD::FMAX : ARMISD::FMIN; break; case ISD::SETOGT: case ISD::SETOGE: case ISD::SETGT: case ISD::SETGE: case ISD::SETUGT: case ISD::SETUGE: // If LHS is NaN, an ordered comparison will be false and the result will // be the RHS, but vmax(NaN, RHS) = NaN. Avoid this by checking that LHS // != NaN. Likewise, for unordered comparisons, check for RHS != NaN. IsUnordered = (CC == ISD::SETUGT || CC == ISD::SETUGE); if (!DAG.isKnownNeverNaN(IsUnordered ? RHS : LHS)) break; // For greater-than-or-equal comparisons, "-0 >= +0" will be true but vmax // will return +0, so vmax can only be used for unsafe math or if one of // the operands is known to be nonzero. if ((CC == ISD::SETGE || CC == ISD::SETOGE || CC == ISD::SETUGE) && !UnsafeFPMath && !(DAG.isKnownNeverZero(LHS) || DAG.isKnownNeverZero(RHS))) break; Opcode = IsReversed ? ARMISD::FMIN : ARMISD::FMAX; break; } if (!Opcode) return SDValue(); return DAG.getNode(Opcode, N->getDebugLoc(), N->getValueType(0), LHS, RHS); } /// PerformCMOVCombine - Target-specific DAG combining for ARMISD::CMOV. SDValue ARMTargetLowering::PerformCMOVCombine(SDNode *N, SelectionDAG &DAG) const { SDValue Cmp = N->getOperand(4); if (Cmp.getOpcode() != ARMISD::CMPZ) // Only looking at EQ and NE cases. return SDValue(); EVT VT = N->getValueType(0); DebugLoc dl = N->getDebugLoc(); SDValue LHS = Cmp.getOperand(0); SDValue RHS = Cmp.getOperand(1); SDValue FalseVal = N->getOperand(0); SDValue TrueVal = N->getOperand(1); SDValue ARMcc = N->getOperand(2); ARMCC::CondCodes CC = (ARMCC::CondCodes)cast(ARMcc)->getZExtValue(); // Simplify // mov r1, r0 // cmp r1, x // mov r0, y // moveq r0, x // to // cmp r0, x // movne r0, y // // mov r1, r0 // cmp r1, x // mov r0, x // movne r0, y // to // cmp r0, x // movne r0, y /// FIXME: Turn this into a target neutral optimization? SDValue Res; if (CC == ARMCC::NE && FalseVal == RHS && FalseVal != LHS) { Res = DAG.getNode(ARMISD::CMOV, dl, VT, LHS, TrueVal, ARMcc, N->getOperand(3), Cmp); } else if (CC == ARMCC::EQ && TrueVal == RHS) { SDValue ARMcc; SDValue NewCmp = getARMCmp(LHS, RHS, ISD::SETNE, ARMcc, DAG, dl); Res = DAG.getNode(ARMISD::CMOV, dl, VT, LHS, FalseVal, ARMcc, N->getOperand(3), NewCmp); } if (Res.getNode()) { APInt KnownZero, KnownOne; APInt Mask = APInt::getAllOnesValue(VT.getScalarType().getSizeInBits()); DAG.ComputeMaskedBits(SDValue(N,0), Mask, KnownZero, KnownOne); // Capture demanded bits information that would be otherwise lost. 
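// For instance (illustrative): when every bit except bit 0 is known zero,
// the CMOV result is wrapped in an AssertZext asserting an i1 value, so
// later combines still see that the value is either 0 or 1.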
if (KnownZero == 0xfffffffe) Res = DAG.getNode(ISD::AssertZext, dl, MVT::i32, Res, DAG.getValueType(MVT::i1)); else if (KnownZero == 0xffffff00) Res = DAG.getNode(ISD::AssertZext, dl, MVT::i32, Res, DAG.getValueType(MVT::i8)); else if (KnownZero == 0xffff0000) Res = DAG.getNode(ISD::AssertZext, dl, MVT::i32, Res, DAG.getValueType(MVT::i16)); } return Res; } SDValue ARMTargetLowering::PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const { switch (N->getOpcode()) { default: break; case ISD::ADD: return PerformADDCombine(N, DCI, Subtarget); case ISD::SUB: return PerformSUBCombine(N, DCI); case ISD::MUL: return PerformMULCombine(N, DCI, Subtarget); case ISD::OR: return PerformORCombine(N, DCI, Subtarget); case ISD::AND: return PerformANDCombine(N, DCI); case ARMISD::BFI: return PerformBFICombine(N, DCI); case ARMISD::VMOVRRD: return PerformVMOVRRDCombine(N, DCI); case ARMISD::VMOVDRR: return PerformVMOVDRRCombine(N, DCI.DAG); case ISD::STORE: return PerformSTORECombine(N, DCI); case ISD::BUILD_VECTOR: return PerformBUILD_VECTORCombine(N, DCI); case ISD::INSERT_VECTOR_ELT: return PerformInsertEltCombine(N, DCI); case ISD::VECTOR_SHUFFLE: return PerformVECTOR_SHUFFLECombine(N, DCI.DAG); case ARMISD::VDUPLANE: return PerformVDUPLANECombine(N, DCI); case ISD::FP_TO_SINT: case ISD::FP_TO_UINT: return PerformVCVTCombine(N, DCI, Subtarget); case ISD::FDIV: return PerformVDIVCombine(N, DCI, Subtarget); case ISD::INTRINSIC_WO_CHAIN: return PerformIntrinsicCombine(N, DCI.DAG); case ISD::SHL: case ISD::SRA: case ISD::SRL: return PerformShiftCombine(N, DCI.DAG, Subtarget); case ISD::SIGN_EXTEND: case ISD::ZERO_EXTEND: case ISD::ANY_EXTEND: return PerformExtendCombine(N, DCI.DAG, Subtarget); case ISD::SELECT_CC: return PerformSELECT_CCCombine(N, DCI.DAG, Subtarget); case ARMISD::CMOV: return PerformCMOVCombine(N, DCI.DAG); case ARMISD::VLD2DUP: case ARMISD::VLD3DUP: case ARMISD::VLD4DUP: return CombineBaseUpdate(N, DCI); case ISD::INTRINSIC_VOID: case ISD::INTRINSIC_W_CHAIN: switch (cast(N->getOperand(1))->getZExtValue()) { case Intrinsic::arm_neon_vld1: case Intrinsic::arm_neon_vld2: case Intrinsic::arm_neon_vld3: case Intrinsic::arm_neon_vld4: case Intrinsic::arm_neon_vld2lane: case Intrinsic::arm_neon_vld3lane: case Intrinsic::arm_neon_vld4lane: case Intrinsic::arm_neon_vst1: case Intrinsic::arm_neon_vst2: case Intrinsic::arm_neon_vst3: case Intrinsic::arm_neon_vst4: case Intrinsic::arm_neon_vst2lane: case Intrinsic::arm_neon_vst3lane: case Intrinsic::arm_neon_vst4lane: return CombineBaseUpdate(N, DCI); default: break; } break; } return SDValue(); } bool ARMTargetLowering::isDesirableToTransformToIntegerOp(unsigned Opc, EVT VT) const { return (VT == MVT::f32) && (Opc == ISD::LOAD || Opc == ISD::STORE); } bool ARMTargetLowering::allowsUnalignedMemoryAccesses(EVT VT) const { if (!Subtarget->allowsUnalignedMem()) return false; switch (VT.getSimpleVT().SimpleTy) { default: return false; case MVT::i8: case MVT::i16: case MVT::i32: return true; // FIXME: VLD1 etc with standard alignment is legal. 
} } static bool isLegalT1AddressImmediate(int64_t V, EVT VT) { if (V < 0) return false; unsigned Scale = 1; switch (VT.getSimpleVT().SimpleTy) { default: return false; case MVT::i1: case MVT::i8: // Scale == 1; break; case MVT::i16: // Scale == 2; Scale = 2; break; case MVT::i32: // Scale == 4; Scale = 4; break; } if ((V & (Scale - 1)) != 0) return false; V /= Scale; return V == (V & ((1LL << 5) - 1)); } static bool isLegalT2AddressImmediate(int64_t V, EVT VT, const ARMSubtarget *Subtarget) { bool isNeg = false; if (V < 0) { isNeg = true; V = - V; } switch (VT.getSimpleVT().SimpleTy) { default: return false; case MVT::i1: case MVT::i8: case MVT::i16: case MVT::i32: // + imm12 or - imm8 if (isNeg) return V == (V & ((1LL << 8) - 1)); return V == (V & ((1LL << 12) - 1)); case MVT::f32: case MVT::f64: // Same as ARM mode. FIXME: NEON? if (!Subtarget->hasVFP2()) return false; if ((V & 3) != 0) return false; V >>= 2; return V == (V & ((1LL << 8) - 1)); } } /// isLegalAddressImmediate - Return true if the integer value can be used /// as the offset of the target addressing mode for load / store of the /// given type. static bool isLegalAddressImmediate(int64_t V, EVT VT, const ARMSubtarget *Subtarget) { if (V == 0) return true; if (!VT.isSimple()) return false; if (Subtarget->isThumb1Only()) return isLegalT1AddressImmediate(V, VT); else if (Subtarget->isThumb2()) return isLegalT2AddressImmediate(V, VT, Subtarget); // ARM mode. if (V < 0) V = - V; switch (VT.getSimpleVT().SimpleTy) { default: return false; case MVT::i1: case MVT::i8: case MVT::i32: // +- imm12 return V == (V & ((1LL << 12) - 1)); case MVT::i16: // +- imm8 return V == (V & ((1LL << 8) - 1)); case MVT::f32: case MVT::f64: if (!Subtarget->hasVFP2()) // FIXME: NEON? return false; if ((V & 3) != 0) return false; V >>= 2; return V == (V & ((1LL << 8) - 1)); } } bool ARMTargetLowering::isLegalT2ScaledAddressingMode(const AddrMode &AM, EVT VT) const { int Scale = AM.Scale; if (Scale < 0) return false; switch (VT.getSimpleVT().SimpleTy) { default: return false; case MVT::i1: case MVT::i8: case MVT::i16: case MVT::i32: if (Scale == 1) return true; // r + r << imm Scale = Scale & ~1; return Scale == 2 || Scale == 4 || Scale == 8; case MVT::i64: // r + r if (((unsigned)AM.HasBaseReg + Scale) <= 2) return true; return false; case MVT::isVoid: // Note, we allow "void" uses (basically, uses that aren't loads or // stores), because arm allows folding a scale into many arithmetic // operations. This should be made more precise and revisited later. // Allow r << imm, but the imm has to be a multiple of two. if (Scale & 1) return false; return isPowerOf2_32(Scale); } } /// isLegalAddressingMode - Return true if the addressing mode represented /// by AM is legal for this target, for a load/store of the specified type. bool ARMTargetLowering::isLegalAddressingMode(const AddrMode &AM, Type *Ty) const { EVT VT = getValueType(Ty, true); if (!isLegalAddressImmediate(AM.BaseOffs, VT, Subtarget)) return false; // Can never fold addr of global into load/store. if (AM.BaseGV) return false; switch (AM.Scale) { case 0: // no scale reg, must be "r+i" or "r", or "i". break; case 1: if (Subtarget->isThumb1Only()) return false; // FALL THROUGH. default: // ARM doesn't support any R+R*scale+imm addr modes. 
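// For example (illustrative): "ldr r0, [r1, r2, lsl #2]" is a legal ARM
// address (r + r << imm), but there is no form that also adds an immediate
// offset on top of the scaled register, hence the BaseOffs check below.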
if (AM.BaseOffs) return false; if (!VT.isSimple()) return false; if (Subtarget->isThumb2()) return isLegalT2ScaledAddressingMode(AM, VT); int Scale = AM.Scale; switch (VT.getSimpleVT().SimpleTy) { default: return false; case MVT::i1: case MVT::i8: case MVT::i32: if (Scale < 0) Scale = -Scale; if (Scale == 1) return true; // r + r << imm return isPowerOf2_32(Scale & ~1); case MVT::i16: case MVT::i64: // r + r if (((unsigned)AM.HasBaseReg + Scale) <= 2) return true; return false; case MVT::isVoid: // Note, we allow "void" uses (basically, uses that aren't loads or // stores), because arm allows folding a scale into many arithmetic // operations. This should be made more precise and revisited later. // Allow r << imm, but the imm has to be a multiple of two. if (Scale & 1) return false; return isPowerOf2_32(Scale); } break; } return true; } /// isLegalICmpImmediate - Return true if the specified immediate is legal /// icmp immediate, that is the target has icmp instructions which can compare /// a register against the immediate without having to materialize the /// immediate into a register. bool ARMTargetLowering::isLegalICmpImmediate(int64_t Imm) const { if (!Subtarget->isThumb()) return ARM_AM::getSOImmVal(Imm) != -1; if (Subtarget->isThumb2()) return ARM_AM::getT2SOImmVal(Imm) != -1; return Imm >= 0 && Imm <= 255; } /// isLegalAddImmediate - Return true if the specified immediate is legal /// add immediate, that is the target has add instructions which can add /// a register with the immediate without having to materialize the /// immediate into a register. bool ARMTargetLowering::isLegalAddImmediate(int64_t Imm) const { return ARM_AM::getSOImmVal(Imm) != -1; } static bool getARMIndexedAddressParts(SDNode *Ptr, EVT VT, bool isSEXTLoad, SDValue &Base, SDValue &Offset, bool &isInc, SelectionDAG &DAG) { if (Ptr->getOpcode() != ISD::ADD && Ptr->getOpcode() != ISD::SUB) return false; if (VT == MVT::i16 || ((VT == MVT::i8 || VT == MVT::i1) && isSEXTLoad)) { // AddressingMode 3 Base = Ptr->getOperand(0); if (ConstantSDNode *RHS = dyn_cast(Ptr->getOperand(1))) { int RHSC = (int)RHS->getZExtValue(); if (RHSC < 0 && RHSC > -256) { assert(Ptr->getOpcode() == ISD::ADD); isInc = false; Offset = DAG.getConstant(-RHSC, RHS->getValueType(0)); return true; } } isInc = (Ptr->getOpcode() == ISD::ADD); Offset = Ptr->getOperand(1); return true; } else if (VT == MVT::i32 || VT == MVT::i8 || VT == MVT::i1) { // AddressingMode 2 if (ConstantSDNode *RHS = dyn_cast(Ptr->getOperand(1))) { int RHSC = (int)RHS->getZExtValue(); if (RHSC < 0 && RHSC > -0x1000) { assert(Ptr->getOpcode() == ISD::ADD); isInc = false; Offset = DAG.getConstant(-RHSC, RHS->getValueType(0)); Base = Ptr->getOperand(0); return true; } } if (Ptr->getOpcode() == ISD::ADD) { isInc = true; ARM_AM::ShiftOpc ShOpcVal= ARM_AM::getShiftOpcForNode(Ptr->getOperand(0).getOpcode()); if (ShOpcVal != ARM_AM::no_shift) { Base = Ptr->getOperand(1); Offset = Ptr->getOperand(0); } else { Base = Ptr->getOperand(0); Offset = Ptr->getOperand(1); } return true; } isInc = (Ptr->getOpcode() == ISD::ADD); Base = Ptr->getOperand(0); Offset = Ptr->getOperand(1); return true; } // FIXME: Use VLDM / VSTM to emulate indexed FP load / store. 
return false; } static bool getT2IndexedAddressParts(SDNode *Ptr, EVT VT, bool isSEXTLoad, SDValue &Base, SDValue &Offset, bool &isInc, SelectionDAG &DAG) { if (Ptr->getOpcode() != ISD::ADD && Ptr->getOpcode() != ISD::SUB) return false; Base = Ptr->getOperand(0); if (ConstantSDNode *RHS = dyn_cast(Ptr->getOperand(1))) { int RHSC = (int)RHS->getZExtValue(); if (RHSC < 0 && RHSC > -0x100) { // 8 bits. assert(Ptr->getOpcode() == ISD::ADD); isInc = false; Offset = DAG.getConstant(-RHSC, RHS->getValueType(0)); return true; } else if (RHSC > 0 && RHSC < 0x100) { // 8 bit, no zero. isInc = Ptr->getOpcode() == ISD::ADD; Offset = DAG.getConstant(RHSC, RHS->getValueType(0)); return true; } } return false; } /// getPreIndexedAddressParts - returns true by value, base pointer and /// offset pointer and addressing mode by reference if the node's address /// can be legally represented as pre-indexed load / store address. bool ARMTargetLowering::getPreIndexedAddressParts(SDNode *N, SDValue &Base, SDValue &Offset, ISD::MemIndexedMode &AM, SelectionDAG &DAG) const { if (Subtarget->isThumb1Only()) return false; EVT VT; SDValue Ptr; bool isSEXTLoad = false; if (LoadSDNode *LD = dyn_cast(N)) { Ptr = LD->getBasePtr(); VT = LD->getMemoryVT(); isSEXTLoad = LD->getExtensionType() == ISD::SEXTLOAD; } else if (StoreSDNode *ST = dyn_cast(N)) { Ptr = ST->getBasePtr(); VT = ST->getMemoryVT(); } else return false; bool isInc; bool isLegal = false; if (Subtarget->isThumb2()) isLegal = getT2IndexedAddressParts(Ptr.getNode(), VT, isSEXTLoad, Base, Offset, isInc, DAG); else isLegal = getARMIndexedAddressParts(Ptr.getNode(), VT, isSEXTLoad, Base, Offset, isInc, DAG); if (!isLegal) return false; AM = isInc ? ISD::PRE_INC : ISD::PRE_DEC; return true; } /// getPostIndexedAddressParts - returns true by value, base pointer and /// offset pointer and addressing mode by reference if this node can be /// combined with a load / store to form a post-indexed load / store. bool ARMTargetLowering::getPostIndexedAddressParts(SDNode *N, SDNode *Op, SDValue &Base, SDValue &Offset, ISD::MemIndexedMode &AM, SelectionDAG &DAG) const { if (Subtarget->isThumb1Only()) return false; EVT VT; SDValue Ptr; bool isSEXTLoad = false; if (LoadSDNode *LD = dyn_cast(N)) { VT = LD->getMemoryVT(); Ptr = LD->getBasePtr(); isSEXTLoad = LD->getExtensionType() == ISD::SEXTLOAD; } else if (StoreSDNode *ST = dyn_cast(N)) { VT = ST->getMemoryVT(); Ptr = ST->getBasePtr(); } else return false; bool isInc; bool isLegal = false; if (Subtarget->isThumb2()) isLegal = getT2IndexedAddressParts(Op, VT, isSEXTLoad, Base, Offset, isInc, DAG); else isLegal = getARMIndexedAddressParts(Op, VT, isSEXTLoad, Base, Offset, isInc, DAG); if (!isLegal) return false; if (Ptr != Base) { // Swap base ptr and offset to catch more post-index load / store when // it's legal. In Thumb2 mode, offset must be an immediate. if (Ptr == Offset && Op->getOpcode() == ISD::ADD && !Subtarget->isThumb2()) std::swap(Base, Offset); // Post-indexed load / store update the base pointer. if (Ptr != Base) return false; } AM = isInc ? ISD::POST_INC : ISD::POST_DEC; return true; } void ARMTargetLowering::computeMaskedBitsForTargetNode(const SDValue Op, const APInt &Mask, APInt &KnownZero, APInt &KnownOne, const SelectionDAG &DAG, unsigned Depth) const { KnownZero = KnownOne = APInt(Mask.getBitWidth(), 0); switch (Op.getOpcode()) { default: break; case ARMISD::CMOV: { // Bits are known zero/one if known on the LHS and RHS. 
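// For example (illustrative): if both CMOV operands have their upper 24 bits
// known to be zero, the intersection below lets the result report the same,
// even though which operand is selected is unknown at compile time.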
DAG.ComputeMaskedBits(Op.getOperand(0), Mask, KnownZero, KnownOne, Depth+1); if (KnownZero == 0 && KnownOne == 0) return; APInt KnownZeroRHS, KnownOneRHS; DAG.ComputeMaskedBits(Op.getOperand(1), Mask, KnownZeroRHS, KnownOneRHS, Depth+1); KnownZero &= KnownZeroRHS; KnownOne &= KnownOneRHS; return; } } } //===----------------------------------------------------------------------===// // ARM Inline Assembly Support //===----------------------------------------------------------------------===// bool ARMTargetLowering::ExpandInlineAsm(CallInst *CI) const { // Looking for "rev" which is V6+. if (!Subtarget->hasV6Ops()) return false; InlineAsm *IA = cast(CI->getCalledValue()); std::string AsmStr = IA->getAsmString(); SmallVector AsmPieces; SplitString(AsmStr, AsmPieces, ";\n"); switch (AsmPieces.size()) { default: return false; case 1: AsmStr = AsmPieces[0]; AsmPieces.clear(); SplitString(AsmStr, AsmPieces, " \t,"); // rev $0, $1 if (AsmPieces.size() == 3 && AsmPieces[0] == "rev" && AsmPieces[1] == "$0" && AsmPieces[2] == "$1" && IA->getConstraintString().compare(0, 4, "=l,l") == 0) { IntegerType *Ty = dyn_cast(CI->getType()); if (Ty && Ty->getBitWidth() == 32) return IntrinsicLowering::LowerToByteSwap(CI); } break; } return false; } /// getConstraintType - Given a constraint letter, return the type of /// constraint it is for this target. ARMTargetLowering::ConstraintType ARMTargetLowering::getConstraintType(const std::string &Constraint) const { if (Constraint.size() == 1) { switch (Constraint[0]) { default: break; case 'l': return C_RegisterClass; case 'w': return C_RegisterClass; case 'h': return C_RegisterClass; case 'x': return C_RegisterClass; case 't': return C_RegisterClass; case 'j': return C_Other; // Constant for movw. // An address with a single base register. Due to the way we // currently handle addresses it is the same as an 'r' memory constraint. case 'Q': return C_Memory; } } else if (Constraint.size() == 2) { switch (Constraint[0]) { default: break; // All 'U+' constraints are addresses. case 'U': return C_Memory; } } return TargetLowering::getConstraintType(Constraint); } /// Examine constraint type and operand type and determine a weight value. /// This object must already have been set up with the operand type /// and the current alternative constraint selected. TargetLowering::ConstraintWeight ARMTargetLowering::getSingleConstraintMatchWeight( AsmOperandInfo &info, const char *constraint) const { ConstraintWeight weight = CW_Invalid; Value *CallOperandVal = info.CallOperandVal; // If we don't have a value, we can't do a match, // but allow it at the lowest weight. if (CallOperandVal == NULL) return CW_Default; Type *type = CallOperandVal->getType(); // Look at the constraint type. switch (*constraint) { default: weight = TargetLowering::getSingleConstraintMatchWeight(info, constraint); break; case 'l': if (type->isIntegerTy()) { if (Subtarget->isThumb()) weight = CW_SpecificReg; else weight = CW_Register; } break; case 'w': if (type->isFloatingPointTy()) weight = CW_Register; break; } return weight; } typedef std::pair RCPair; RCPair ARMTargetLowering::getRegForInlineAsmConstraint(const std::string &Constraint, EVT VT) const { if (Constraint.size() == 1) { // GCC ARM Constraint Letters switch (Constraint[0]) { case 'l': // Low regs or general regs. if (Subtarget->isThumb()) return RCPair(0U, ARM::tGPRRegisterClass); else return RCPair(0U, ARM::GPRRegisterClass); case 'h': // High regs or no regs. 
if (Subtarget->isThumb()) return RCPair(0U, ARM::hGPRRegisterClass); break; case 'r': return RCPair(0U, ARM::GPRRegisterClass); case 'w': if (VT == MVT::f32) return RCPair(0U, ARM::SPRRegisterClass); if (VT.getSizeInBits() == 64) return RCPair(0U, ARM::DPRRegisterClass); if (VT.getSizeInBits() == 128) return RCPair(0U, ARM::QPRRegisterClass); break; case 'x': if (VT == MVT::f32) return RCPair(0U, ARM::SPR_8RegisterClass); if (VT.getSizeInBits() == 64) return RCPair(0U, ARM::DPR_8RegisterClass); if (VT.getSizeInBits() == 128) return RCPair(0U, ARM::QPR_8RegisterClass); break; case 't': if (VT == MVT::f32) return RCPair(0U, ARM::SPRRegisterClass); break; } } if (StringRef("{cc}").equals_lower(Constraint)) return std::make_pair(unsigned(ARM::CPSR), ARM::CCRRegisterClass); return TargetLowering::getRegForInlineAsmConstraint(Constraint, VT); } /// LowerAsmOperandForConstraint - Lower the specified operand into the Ops /// vector. If it is invalid, don't add anything to Ops. void ARMTargetLowering::LowerAsmOperandForConstraint(SDValue Op, std::string &Constraint, std::vector&Ops, SelectionDAG &DAG) const { SDValue Result(0, 0); // Currently only support length 1 constraints. if (Constraint.length() != 1) return; char ConstraintLetter = Constraint[0]; switch (ConstraintLetter) { default: break; case 'j': case 'I': case 'J': case 'K': case 'L': case 'M': case 'N': case 'O': ConstantSDNode *C = dyn_cast(Op); if (!C) return; int64_t CVal64 = C->getSExtValue(); int CVal = (int) CVal64; // None of these constraints allow values larger than 32 bits. Check // that the value fits in an int. if (CVal != CVal64) return; switch (ConstraintLetter) { case 'j': // Constant suitable for movw, must be between 0 and // 65535. if (Subtarget->hasV6T2Ops()) if (CVal >= 0 && CVal <= 65535) break; return; case 'I': if (Subtarget->isThumb1Only()) { // This must be a constant between 0 and 255, for ADD // immediates. if (CVal >= 0 && CVal <= 255) break; } else if (Subtarget->isThumb2()) { // A constant that can be used as an immediate value in a // data-processing instruction. if (ARM_AM::getT2SOImmVal(CVal) != -1) break; } else { // A constant that can be used as an immediate value in a // data-processing instruction. if (ARM_AM::getSOImmVal(CVal) != -1) break; } return; case 'J': if (Subtarget->isThumb()) { // FIXME thumb2 // This must be a constant between -255 and -1, for negated ADD // immediates. This can be used in GCC with an "n" modifier that // prints the negated value, for use with SUB instructions. It is // not useful otherwise but is implemented for compatibility. if (CVal >= -255 && CVal <= -1) break; } else { // This must be a constant between -4095 and 4095. It is not clear // what this constraint is intended for. Implemented for // compatibility with GCC. if (CVal >= -4095 && CVal <= 4095) break; } return; case 'K': if (Subtarget->isThumb1Only()) { // A 32-bit value where only one byte has a nonzero value. Exclude // zero to match GCC. This constraint is used by GCC internally for // constants that can be loaded with a move/shift combination. // It is not useful otherwise but is implemented for compatibility. if (CVal != 0 && ARM_AM::isThumbImmShiftedVal(CVal)) break; } else if (Subtarget->isThumb2()) { // A constant whose bitwise inverse can be used as an immediate // value in a data-processing instruction. This can be used in GCC // with a "B" modifier that prints the inverted value, for use with // BIC and MVN instructions. It is not useful otherwise but is // implemented for compatibility. 
if (ARM_AM::getT2SOImmVal(~CVal) != -1) break; } else { // A constant whose bitwise inverse can be used as an immediate // value in a data-processing instruction. This can be used in GCC // with a "B" modifier that prints the inverted value, for use with // BIC and MVN instructions. It is not useful otherwise but is // implemented for compatibility. if (ARM_AM::getSOImmVal(~CVal) != -1) break; } return; case 'L': if (Subtarget->isThumb1Only()) { // This must be a constant between -7 and 7, // for 3-operand ADD/SUB immediate instructions. if (CVal >= -7 && CVal < 7) break; } else if (Subtarget->isThumb2()) { // A constant whose negation can be used as an immediate value in a // data-processing instruction. This can be used in GCC with an "n" // modifier that prints the negated value, for use with SUB // instructions. It is not useful otherwise but is implemented for // compatibility. if (ARM_AM::getT2SOImmVal(-CVal) != -1) break; } else { // A constant whose negation can be used as an immediate value in a // data-processing instruction. This can be used in GCC with an "n" // modifier that prints the negated value, for use with SUB // instructions. It is not useful otherwise but is implemented for // compatibility. if (ARM_AM::getSOImmVal(-CVal) != -1) break; } return; case 'M': if (Subtarget->isThumb()) { // FIXME thumb2 // This must be a multiple of 4 between 0 and 1020, for // ADD sp + immediate. if ((CVal >= 0 && CVal <= 1020) && ((CVal & 3) == 0)) break; } else { // A power of two or a constant between 0 and 32. This is used in // GCC for the shift amount on shifted register operands, but it is // useful in general for any shift amounts. if ((CVal >= 0 && CVal <= 32) || ((CVal & (CVal - 1)) == 0)) break; } return; case 'N': if (Subtarget->isThumb()) { // FIXME thumb2 // This must be a constant between 0 and 31, for shift amounts. if (CVal >= 0 && CVal <= 31) break; } return; case 'O': if (Subtarget->isThumb()) { // FIXME thumb2 // This must be a multiple of 4 between -508 and 508, for // ADD/SUB sp = sp + immediate. if ((CVal >= -508 && CVal <= 508) && ((CVal & 3) == 0)) break; } return; } Result = DAG.getTargetConstant(CVal, Op.getValueType()); break; } if (Result.getNode()) { Ops.push_back(Result); return; } return TargetLowering::LowerAsmOperandForConstraint(Op, Constraint, Ops, DAG); } bool ARMTargetLowering::isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const { // The ARM target isn't yet aware of offsets. return false; } bool ARM::isBitFieldInvertedMask(unsigned v) { if (v == 0xffffffff) return 0; // there can be 1's on either or both "outsides", all the "inside" // bits must be 0's unsigned int lsb = 0, msb = 31; while (v & (1 << msb)) --msb; while (v & (1 << lsb)) ++lsb; for (unsigned int i = lsb; i <= msb; ++i) { if (v & (1 << i)) return 0; } return 1; } /// isFPImmLegal - Returns true if the target can instruction select the /// specified FP immediate natively. If false, the legalizer will /// materialize the FP immediate as a load from a constant pool. bool ARMTargetLowering::isFPImmLegal(const APFloat &Imm, EVT VT) const { if (!Subtarget->hasVFP3()) return false; if (VT == MVT::f32) return ARM_AM::getFP32Imm(Imm) != -1; if (VT == MVT::f64) return ARM_AM::getFP64Imm(Imm) != -1; return false; } /// getTgtMemIntrinsic - Represent NEON load and store intrinsics as /// MemIntrinsicNodes. The associated MachineMemOperands record the alignment /// specified in the intrinsic calls. 
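// For example, a call to llvm.arm.neon.vld2.v4i32 returns two <4 x i32> vectors
// (32 bytes in total), so the code below conservatively records memVT as a vector of
// four i64 elements covering the whole 32-byte access, with the alignment taken from
// the intrinsic's trailing argument.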
bool ARMTargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info, const CallInst &I, unsigned Intrinsic) const { switch (Intrinsic) { case Intrinsic::arm_neon_vld1: case Intrinsic::arm_neon_vld2: case Intrinsic::arm_neon_vld3: case Intrinsic::arm_neon_vld4: case Intrinsic::arm_neon_vld2lane: case Intrinsic::arm_neon_vld3lane: case Intrinsic::arm_neon_vld4lane: { Info.opc = ISD::INTRINSIC_W_CHAIN; // Conservatively set memVT to the entire set of vectors loaded. uint64_t NumElts = getTargetData()->getTypeAllocSize(I.getType()) / 8; Info.memVT = EVT::getVectorVT(I.getType()->getContext(), MVT::i64, NumElts); Info.ptrVal = I.getArgOperand(0); Info.offset = 0; Value *AlignArg = I.getArgOperand(I.getNumArgOperands() - 1); Info.align = cast(AlignArg)->getZExtValue(); Info.vol = false; // volatile loads with NEON intrinsics not supported Info.readMem = true; Info.writeMem = false; return true; } case Intrinsic::arm_neon_vst1: case Intrinsic::arm_neon_vst2: case Intrinsic::arm_neon_vst3: case Intrinsic::arm_neon_vst4: case Intrinsic::arm_neon_vst2lane: case Intrinsic::arm_neon_vst3lane: case Intrinsic::arm_neon_vst4lane: { Info.opc = ISD::INTRINSIC_VOID; // Conservatively set memVT to the entire set of vectors stored. unsigned NumElts = 0; for (unsigned ArgI = 1, ArgE = I.getNumArgOperands(); ArgI < ArgE; ++ArgI) { Type *ArgTy = I.getArgOperand(ArgI)->getType(); if (!ArgTy->isVectorTy()) break; NumElts += getTargetData()->getTypeAllocSize(ArgTy) / 8; } Info.memVT = EVT::getVectorVT(I.getType()->getContext(), MVT::i64, NumElts); Info.ptrVal = I.getArgOperand(0); Info.offset = 0; Value *AlignArg = I.getArgOperand(I.getNumArgOperands() - 1); Info.align = cast(AlignArg)->getZExtValue(); Info.vol = false; // volatile stores with NEON intrinsics not supported Info.readMem = false; Info.writeMem = true; return true; } case Intrinsic::arm_strexd: { Info.opc = ISD::INTRINSIC_W_CHAIN; Info.memVT = MVT::i64; Info.ptrVal = I.getArgOperand(2); Info.offset = 0; Info.align = 8; Info.vol = true; Info.readMem = false; Info.writeMem = true; return true; } case Intrinsic::arm_ldrexd: { Info.opc = ISD::INTRINSIC_W_CHAIN; Info.memVT = MVT::i64; Info.ptrVal = I.getArgOperand(0); Info.offset = 0; Info.align = 8; Info.vol = true; Info.readMem = true; Info.writeMem = false; return true; } default: break; } return false; } Index: vendor/llvm/dist/lib/Target/ARM/ARMInstrThumb2.td =================================================================== --- vendor/llvm/dist/lib/Target/ARM/ARMInstrThumb2.td (revision 228363) +++ vendor/llvm/dist/lib/Target/ARM/ARMInstrThumb2.td (revision 228364) @@ -1,4022 +1,4018 @@ //===- ARMInstrThumb2.td - Thumb2 support for ARM -------------------------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file describes the Thumb2 instruction set. 
// //===----------------------------------------------------------------------===// // IT block predicate field def it_pred_asmoperand : AsmOperandClass { let Name = "ITCondCode"; let ParserMethod = "parseITCondCode"; } def it_pred : Operand { let PrintMethod = "printMandatoryPredicateOperand"; let ParserMatchClass = it_pred_asmoperand; } // IT block condition mask def it_mask_asmoperand : AsmOperandClass { let Name = "ITMask"; } def it_mask : Operand { let PrintMethod = "printThumbITMask"; let ParserMatchClass = it_mask_asmoperand; } // t2_shift_imm: An integer that encodes a shift amount and the type of shift // (asr or lsl). The 6-bit immediate encodes as: // {5} 0 ==> lsl // 1 asr // {4-0} imm5 shift amount. // asr #32 not allowed def t2_shift_imm : Operand { let PrintMethod = "printShiftImmOperand"; let ParserMatchClass = ShifterImmAsmOperand; let DecoderMethod = "DecodeT2ShifterImmOperand"; } // Shifted operands. No register controlled shifts for Thumb2. // Note: We do not support rrx shifted operands yet. def t2_so_reg : Operand, // reg imm ComplexPattern { let EncoderMethod = "getT2SORegOpValue"; let PrintMethod = "printT2SOOperand"; let DecoderMethod = "DecodeSORegImmOperand"; let ParserMatchClass = ShiftedImmAsmOperand; let MIOperandInfo = (ops rGPR, i32imm); } // t2_so_imm_not_XFORM - Return the complement of a t2_so_imm value def t2_so_imm_not_XFORM : SDNodeXFormgetTargetConstant(~((uint32_t)N->getZExtValue()), MVT::i32); }]>; // t2_so_imm_neg_XFORM - Return the negation of a t2_so_imm value def t2_so_imm_neg_XFORM : SDNodeXFormgetTargetConstant(-((int)N->getZExtValue()), MVT::i32); }]>; // t2_so_imm - Match a 32-bit immediate operand, which is an // 8-bit immediate rotated by an arbitrary number of bits, or an 8-bit // immediate splatted into multiple bytes of the word. def t2_so_imm_asmoperand : AsmOperandClass { let Name = "T2SOImm"; } def t2_so_imm : Operand, ImmLeaf { let ParserMatchClass = t2_so_imm_asmoperand; let EncoderMethod = "getT2SOImmOpValue"; let DecoderMethod = "DecodeT2SOImm"; } // t2_so_imm_not - Match an immediate that is a complement // of a t2_so_imm. def t2_so_imm_not : Operand, PatLeaf<(imm), [{ return ARM_AM::getT2SOImmVal(~((uint32_t)N->getZExtValue())) != -1; }], t2_so_imm_not_XFORM>; // t2_so_imm_neg - Match an immediate that is a negation of a t2_so_imm. def t2_so_imm_neg : Operand, PatLeaf<(imm), [{ return ARM_AM::getT2SOImmVal(-((uint32_t)N->getZExtValue())) != -1; }], t2_so_imm_neg_XFORM>; /// imm0_4095 predicate - True if the 32-bit immediate is in the range [0.4095]. def imm0_4095 : Operand, ImmLeaf= 0 && Imm < 4096; }]>; def imm0_4095_neg : PatLeaf<(i32 imm), [{ return (uint32_t)(-N->getZExtValue()) < 4096; }], imm_neg_XFORM>; def imm0_255_neg : PatLeaf<(i32 imm), [{ return (uint32_t)(-N->getZExtValue()) < 255; }], imm_neg_XFORM>; def imm0_255_not : PatLeaf<(i32 imm), [{ return (uint32_t)(~N->getZExtValue()) < 255; }], imm_comp_XFORM>; def lo5AllOne : PatLeaf<(i32 imm), [{ // Returns true if all low 5-bits are 1. return (((uint32_t)N->getZExtValue()) & 0x1FUL) == 0x1FUL; }]>; // Define Thumb2 specific addressing modes. 
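// Most of these pair an AsmOperandClass (for the assembly parser) with an Operand
// carrying the print/encode/decode hooks; for example, t2addrmode_imm12 below accepts
// "[r0, #4092]" and is represented as a base register plus a 12-bit unsigned offset.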
// t2addrmode_imm12 := reg + imm12 def t2addrmode_imm12_asmoperand : AsmOperandClass {let Name="MemUImm12Offset";} def t2addrmode_imm12 : Operand, ComplexPattern { let PrintMethod = "printAddrModeImm12Operand"; let EncoderMethod = "getAddrModeImm12OpValue"; let DecoderMethod = "DecodeT2AddrModeImm12"; let ParserMatchClass = t2addrmode_imm12_asmoperand; let MIOperandInfo = (ops GPR:$base, i32imm:$offsimm); } // t2ldrlabel := imm12 def t2ldrlabel : Operand { let EncoderMethod = "getAddrModeImm12OpValue"; let PrintMethod = "printT2LdrLabelOperand"; } // ADR instruction labels. def t2adrlabel : Operand { let EncoderMethod = "getT2AdrLabelOpValue"; } // t2addrmode_posimm8 := reg + imm8 def MemPosImm8OffsetAsmOperand : AsmOperandClass {let Name="MemPosImm8Offset";} def t2addrmode_posimm8 : Operand { let PrintMethod = "printT2AddrModeImm8Operand"; let EncoderMethod = "getT2AddrModeImm8OpValue"; let DecoderMethod = "DecodeT2AddrModeImm8"; let ParserMatchClass = MemPosImm8OffsetAsmOperand; let MIOperandInfo = (ops GPR:$base, i32imm:$offsimm); } // t2addrmode_negimm8 := reg - imm8 def MemNegImm8OffsetAsmOperand : AsmOperandClass {let Name="MemNegImm8Offset";} def t2addrmode_negimm8 : Operand, ComplexPattern { let PrintMethod = "printT2AddrModeImm8Operand"; let EncoderMethod = "getT2AddrModeImm8OpValue"; let DecoderMethod = "DecodeT2AddrModeImm8"; let ParserMatchClass = MemNegImm8OffsetAsmOperand; let MIOperandInfo = (ops GPR:$base, i32imm:$offsimm); } // t2addrmode_imm8 := reg +/- imm8 def MemImm8OffsetAsmOperand : AsmOperandClass { let Name = "MemImm8Offset"; } def t2addrmode_imm8 : Operand, ComplexPattern { let PrintMethod = "printT2AddrModeImm8Operand"; let EncoderMethod = "getT2AddrModeImm8OpValue"; let DecoderMethod = "DecodeT2AddrModeImm8"; let ParserMatchClass = MemImm8OffsetAsmOperand; let MIOperandInfo = (ops GPR:$base, i32imm:$offsimm); } def t2am_imm8_offset : Operand, ComplexPattern { let PrintMethod = "printT2AddrModeImm8OffsetOperand"; let EncoderMethod = "getT2AddrModeImm8OffsetOpValue"; let DecoderMethod = "DecodeT2Imm8"; } // t2addrmode_imm8s4 := reg +/- (imm8 << 2) def MemImm8s4OffsetAsmOperand : AsmOperandClass {let Name = "MemImm8s4Offset";} def t2addrmode_imm8s4 : Operand { let PrintMethod = "printT2AddrModeImm8s4Operand"; let EncoderMethod = "getT2AddrModeImm8s4OpValue"; let DecoderMethod = "DecodeT2AddrModeImm8s4"; let ParserMatchClass = MemImm8s4OffsetAsmOperand; let MIOperandInfo = (ops GPR:$base, i32imm:$offsimm); } def t2am_imm8s4_offset_asmoperand : AsmOperandClass { let Name = "Imm8s4"; } def t2am_imm8s4_offset : Operand { let PrintMethod = "printT2AddrModeImm8s4OffsetOperand"; let EncoderMethod = "getT2Imm8s4OpValue"; let DecoderMethod = "DecodeT2Imm8S4"; } // t2addrmode_imm0_1020s4 := reg + (imm8 << 2) def MemImm0_1020s4OffsetAsmOperand : AsmOperandClass { let Name = "MemImm0_1020s4Offset"; } def t2addrmode_imm0_1020s4 : Operand { let PrintMethod = "printT2AddrModeImm0_1020s4Operand"; let EncoderMethod = "getT2AddrModeImm0_1020s4OpValue"; let DecoderMethod = "DecodeT2AddrModeImm0_1020s4"; let ParserMatchClass = MemImm0_1020s4OffsetAsmOperand; let MIOperandInfo = (ops GPRnopc:$base, i32imm:$offsimm); } // t2addrmode_so_reg := reg + (reg << imm2) def t2addrmode_so_reg_asmoperand : AsmOperandClass {let Name="T2MemRegOffset";} def t2addrmode_so_reg : Operand, ComplexPattern { let PrintMethod = "printT2AddrModeSoRegOperand"; let EncoderMethod = "getT2AddrModeSORegOpValue"; let DecoderMethod = "DecodeT2AddrModeSOReg"; let ParserMatchClass = t2addrmode_so_reg_asmoperand; let 
MIOperandInfo = (ops GPR:$base, rGPR:$offsreg, i32imm:$offsimm); } // Addresses for the TBB/TBH instructions. def addrmode_tbb_asmoperand : AsmOperandClass { let Name = "MemTBB"; } def addrmode_tbb : Operand { let PrintMethod = "printAddrModeTBB"; let ParserMatchClass = addrmode_tbb_asmoperand; let MIOperandInfo = (ops GPR:$Rn, rGPR:$Rm); } def addrmode_tbh_asmoperand : AsmOperandClass { let Name = "MemTBH"; } def addrmode_tbh : Operand { let PrintMethod = "printAddrModeTBH"; let ParserMatchClass = addrmode_tbh_asmoperand; let MIOperandInfo = (ops GPR:$Rn, rGPR:$Rm); } //===----------------------------------------------------------------------===// // Multiclass helpers... // class T2OneRegImm pattern> : T2I { bits<4> Rd; bits<12> imm; let Inst{11-8} = Rd; let Inst{26} = imm{11}; let Inst{14-12} = imm{10-8}; let Inst{7-0} = imm{7-0}; } class T2sOneRegImm pattern> : T2sI { bits<4> Rd; bits<4> Rn; bits<12> imm; let Inst{11-8} = Rd; let Inst{26} = imm{11}; let Inst{14-12} = imm{10-8}; let Inst{7-0} = imm{7-0}; } class T2OneRegCmpImm pattern> : T2I { bits<4> Rn; bits<12> imm; let Inst{19-16} = Rn; let Inst{26} = imm{11}; let Inst{14-12} = imm{10-8}; let Inst{7-0} = imm{7-0}; } class T2OneRegShiftedReg pattern> : T2I { bits<4> Rd; bits<12> ShiftedRm; let Inst{11-8} = Rd; let Inst{3-0} = ShiftedRm{3-0}; let Inst{5-4} = ShiftedRm{6-5}; let Inst{14-12} = ShiftedRm{11-9}; let Inst{7-6} = ShiftedRm{8-7}; } class T2sOneRegShiftedReg pattern> : T2sI { bits<4> Rd; bits<12> ShiftedRm; let Inst{11-8} = Rd; let Inst{3-0} = ShiftedRm{3-0}; let Inst{5-4} = ShiftedRm{6-5}; let Inst{14-12} = ShiftedRm{11-9}; let Inst{7-6} = ShiftedRm{8-7}; } class T2OneRegCmpShiftedReg pattern> : T2I { bits<4> Rn; bits<12> ShiftedRm; let Inst{19-16} = Rn; let Inst{3-0} = ShiftedRm{3-0}; let Inst{5-4} = ShiftedRm{6-5}; let Inst{14-12} = ShiftedRm{11-9}; let Inst{7-6} = ShiftedRm{8-7}; } class T2TwoReg pattern> : T2I { bits<4> Rd; bits<4> Rm; let Inst{11-8} = Rd; let Inst{3-0} = Rm; } class T2sTwoReg pattern> : T2sI { bits<4> Rd; bits<4> Rm; let Inst{11-8} = Rd; let Inst{3-0} = Rm; } class T2TwoRegCmp pattern> : T2I { bits<4> Rn; bits<4> Rm; let Inst{19-16} = Rn; let Inst{3-0} = Rm; } class T2TwoRegImm pattern> : T2I { bits<4> Rd; bits<4> Rn; bits<12> imm; let Inst{11-8} = Rd; let Inst{19-16} = Rn; let Inst{26} = imm{11}; let Inst{14-12} = imm{10-8}; let Inst{7-0} = imm{7-0}; } class T2sTwoRegImm pattern> : T2sI { bits<4> Rd; bits<4> Rn; bits<12> imm; let Inst{11-8} = Rd; let Inst{19-16} = Rn; let Inst{26} = imm{11}; let Inst{14-12} = imm{10-8}; let Inst{7-0} = imm{7-0}; } class T2TwoRegShiftImm pattern> : T2I { bits<4> Rd; bits<4> Rm; bits<5> imm; let Inst{11-8} = Rd; let Inst{3-0} = Rm; let Inst{14-12} = imm{4-2}; let Inst{7-6} = imm{1-0}; } class T2sTwoRegShiftImm pattern> : T2sI { bits<4> Rd; bits<4> Rm; bits<5> imm; let Inst{11-8} = Rd; let Inst{3-0} = Rm; let Inst{14-12} = imm{4-2}; let Inst{7-6} = imm{1-0}; } class T2ThreeReg pattern> : T2I { bits<4> Rd; bits<4> Rn; bits<4> Rm; let Inst{11-8} = Rd; let Inst{19-16} = Rn; let Inst{3-0} = Rm; } class T2sThreeReg pattern> : T2sI { bits<4> Rd; bits<4> Rn; bits<4> Rm; let Inst{11-8} = Rd; let Inst{19-16} = Rn; let Inst{3-0} = Rm; } class T2TwoRegShiftedReg pattern> : T2I { bits<4> Rd; bits<4> Rn; bits<12> ShiftedRm; let Inst{11-8} = Rd; let Inst{19-16} = Rn; let Inst{3-0} = ShiftedRm{3-0}; let Inst{5-4} = ShiftedRm{6-5}; let Inst{14-12} = ShiftedRm{11-9}; let Inst{7-6} = ShiftedRm{8-7}; } class T2sTwoRegShiftedReg pattern> : T2sI { bits<4> Rd; bits<4> Rn; bits<12> ShiftedRm; 
let Inst{11-8} = Rd; let Inst{19-16} = Rn; let Inst{3-0} = ShiftedRm{3-0}; let Inst{5-4} = ShiftedRm{6-5}; let Inst{14-12} = ShiftedRm{11-9}; let Inst{7-6} = ShiftedRm{8-7}; } class T2FourReg pattern> : T2I { bits<4> Rd; bits<4> Rn; bits<4> Rm; bits<4> Ra; let Inst{19-16} = Rn; let Inst{15-12} = Ra; let Inst{11-8} = Rd; let Inst{3-0} = Rm; } class T2MulLong opc22_20, bits<4> opc7_4, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : T2I { bits<4> RdLo; bits<4> RdHi; bits<4> Rn; bits<4> Rm; let Inst{31-23} = 0b111110111; let Inst{22-20} = opc22_20; let Inst{19-16} = Rn; let Inst{15-12} = RdLo; let Inst{11-8} = RdHi; let Inst{7-4} = opc7_4; let Inst{3-0} = Rm; } /// T2I_bin_irs - Defines a set of (op reg, {so_imm|r|so_reg}) patterns for a /// binary operation that produces a value. These are predicable and can be /// changed to modify CPSR. multiclass T2I_bin_irs opcod, string opc, InstrItinClass iii, InstrItinClass iir, InstrItinClass iis, PatFrag opnode, string baseOpc, bit Commutable = 0, string wide = ""> { // shifted imm def ri : T2sTwoRegImm< (outs rGPR:$Rd), (ins rGPR:$Rn, t2_so_imm:$imm), iii, opc, "\t$Rd, $Rn, $imm", [(set rGPR:$Rd, (opnode rGPR:$Rn, t2_so_imm:$imm))]> { let Inst{31-27} = 0b11110; let Inst{25} = 0; let Inst{24-21} = opcod; let Inst{15} = 0; } // register def rr : T2sThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), iir, opc, !strconcat(wide, "\t$Rd, $Rn, $Rm"), [(set rGPR:$Rd, (opnode rGPR:$Rn, rGPR:$Rm))]> { let isCommutable = Commutable; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; let Inst{14-12} = 0b000; // imm3 let Inst{7-6} = 0b00; // imm2 let Inst{5-4} = 0b00; // type } // shifted register def rs : T2sTwoRegShiftedReg< (outs rGPR:$Rd), (ins rGPR:$Rn, t2_so_reg:$ShiftedRm), iis, opc, !strconcat(wide, "\t$Rd, $Rn, $ShiftedRm"), [(set rGPR:$Rd, (opnode rGPR:$Rn, t2_so_reg:$ShiftedRm))]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; } // Assembly aliases for optional destination operand when it's the same // as the source operand. def : t2InstAlias(!strconcat(baseOpc, "ri")) rGPR:$Rdn, rGPR:$Rdn, t2_so_imm:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias(!strconcat(baseOpc, "rr")) rGPR:$Rdn, rGPR:$Rdn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias(!strconcat(baseOpc, "rs")) rGPR:$Rdn, rGPR:$Rdn, t2_so_reg:$shift, pred:$p, cc_out:$s)>; } /// T2I_bin_w_irs - Same as T2I_bin_irs except these operations need // the ".w" suffix to indicate that they are wide. multiclass T2I_bin_w_irs opcod, string opc, InstrItinClass iii, InstrItinClass iir, InstrItinClass iis, PatFrag opnode, string baseOpc, bit Commutable = 0> : T2I_bin_irs { // Assembler aliases w/o the ".w" suffix. def : t2InstAlias(!strconcat(baseOpc, "rr")) rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias(!strconcat(baseOpc, "rs")) rGPR:$Rd, rGPR:$Rn, t2_so_reg:$shift, pred:$p, cc_out:$s)>; // and with the optional destination operand, too. def : t2InstAlias(!strconcat(baseOpc, "rr")) rGPR:$Rdn, rGPR:$Rdn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias(!strconcat(baseOpc, "rs")) rGPR:$Rdn, rGPR:$Rdn, t2_so_reg:$shift, pred:$p, cc_out:$s)>; } /// T2I_rbin_is - Same as T2I_bin_irs except the order of operands are /// reversed. The 'rr' form is only defined for the disassembler; for codegen /// it is equivalent to the T2I_bin_irs counterpart. 
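/// For example, "rsb r0, r1, #0" computes 0 - r1, which is the usual way integer
/// negation is materialized.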
multiclass T2I_rbin_irs opcod, string opc, PatFrag opnode> { // shifted imm def ri : T2sTwoRegImm< (outs rGPR:$Rd), (ins rGPR:$Rn, t2_so_imm:$imm), IIC_iALUi, opc, ".w\t$Rd, $Rn, $imm", [(set rGPR:$Rd, (opnode t2_so_imm:$imm, rGPR:$Rn))]> { let Inst{31-27} = 0b11110; let Inst{25} = 0; let Inst{24-21} = opcod; let Inst{15} = 0; } // register def rr : T2sThreeReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iALUr, opc, "\t$Rd, $Rn, $Rm", [/* For disassembly only; pattern left blank */]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; let Inst{14-12} = 0b000; // imm3 let Inst{7-6} = 0b00; // imm2 let Inst{5-4} = 0b00; // type } // shifted register def rs : T2sTwoRegShiftedReg< (outs rGPR:$Rd), (ins rGPR:$Rn, t2_so_reg:$ShiftedRm), IIC_iALUsir, opc, "\t$Rd, $Rn, $ShiftedRm", [(set rGPR:$Rd, (opnode t2_so_reg:$ShiftedRm, rGPR:$Rn))]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; } } /// T2I_bin_s_irs - Similar to T2I_bin_irs except it sets the 's' bit so the /// instruction modifies the CPSR register. /// /// These opcodes will be converted to the real non-S opcodes by /// AdjustInstrPostInstrSelection after giving then an optional CPSR operand. let hasPostISelHook = 1, isCodeGenOnly = 1, isPseudo = 1, Defs = [CPSR] in { multiclass T2I_bin_s_irs opcod, string opc, InstrItinClass iii, InstrItinClass iir, InstrItinClass iis, PatFrag opnode, bit Commutable = 0> { // shifted imm def ri : T2sTwoRegImm< (outs rGPR:$Rd), (ins GPR:$Rn, t2_so_imm:$imm), iii, opc, ".w\t$Rd, $Rn, $imm", [(set rGPR:$Rd, CPSR, (opnode GPR:$Rn, t2_so_imm:$imm))]>; // register def rr : T2sThreeReg< (outs rGPR:$Rd), (ins GPR:$Rn, rGPR:$Rm), iir, opc, ".w\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, CPSR, (opnode GPR:$Rn, rGPR:$Rm))]>; // shifted register def rs : T2sTwoRegShiftedReg< (outs rGPR:$Rd), (ins GPR:$Rn, t2_so_reg:$ShiftedRm), iis, opc, ".w\t$Rd, $Rn, $ShiftedRm", [(set rGPR:$Rd, CPSR, (opnode GPR:$Rn, t2_so_reg:$ShiftedRm))]>; } } /// T2I_bin_ii12rs - Defines a set of (op reg, {so_imm|imm0_4095|r|so_reg}) /// patterns for a binary operation that produces a value. multiclass T2I_bin_ii12rs op23_21, string opc, PatFrag opnode, bit Commutable = 0> { // shifted imm // The register-immediate version is re-materializable. This is useful // in particular for taking the address of a local. let isReMaterializable = 1 in { def ri : T2sTwoRegImm< (outs GPRnopc:$Rd), (ins GPRnopc:$Rn, t2_so_imm:$imm), IIC_iALUi, opc, ".w\t$Rd, $Rn, $imm", [(set GPRnopc:$Rd, (opnode GPRnopc:$Rn, t2_so_imm:$imm))]> { let Inst{31-27} = 0b11110; let Inst{25} = 0; let Inst{24} = 1; let Inst{23-21} = op23_21; let Inst{15} = 0; } } // 12-bit imm def ri12 : T2I< (outs GPRnopc:$Rd), (ins GPR:$Rn, imm0_4095:$imm), IIC_iALUi, !strconcat(opc, "w"), "\t$Rd, $Rn, $imm", [(set GPRnopc:$Rd, (opnode GPR:$Rn, imm0_4095:$imm))]> { bits<4> Rd; bits<4> Rn; bits<12> imm; let Inst{31-27} = 0b11110; let Inst{26} = imm{11}; let Inst{25-24} = 0b10; let Inst{23-21} = op23_21; let Inst{20} = 0; // The S bit. 
let Inst{19-16} = Rn; let Inst{15} = 0; let Inst{14-12} = imm{10-8}; let Inst{11-8} = Rd; let Inst{7-0} = imm{7-0}; } // register def rr : T2sThreeReg<(outs GPRnopc:$Rd), (ins GPRnopc:$Rn, rGPR:$Rm), IIC_iALUr, opc, ".w\t$Rd, $Rn, $Rm", [(set GPRnopc:$Rd, (opnode GPRnopc:$Rn, rGPR:$Rm))]> { let isCommutable = Commutable; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24} = 1; let Inst{23-21} = op23_21; let Inst{14-12} = 0b000; // imm3 let Inst{7-6} = 0b00; // imm2 let Inst{5-4} = 0b00; // type } // shifted register def rs : T2sTwoRegShiftedReg< (outs GPRnopc:$Rd), (ins GPRnopc:$Rn, t2_so_reg:$ShiftedRm), IIC_iALUsi, opc, ".w\t$Rd, $Rn, $ShiftedRm", [(set GPRnopc:$Rd, (opnode GPRnopc:$Rn, t2_so_reg:$ShiftedRm))]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24} = 1; let Inst{23-21} = op23_21; } } /// T2I_adde_sube_irs - Defines a set of (op reg, {so_imm|r|so_reg}) patterns /// for a binary operation that produces a value and use the carry /// bit. It's not predicable. let Defs = [CPSR], Uses = [CPSR] in { multiclass T2I_adde_sube_irs opcod, string opc, PatFrag opnode, bit Commutable = 0> { // shifted imm def ri : T2sTwoRegImm<(outs rGPR:$Rd), (ins rGPR:$Rn, t2_so_imm:$imm), IIC_iALUi, opc, "\t$Rd, $Rn, $imm", [(set rGPR:$Rd, CPSR, (opnode rGPR:$Rn, t2_so_imm:$imm, CPSR))]>, Requires<[IsThumb2]> { let Inst{31-27} = 0b11110; let Inst{25} = 0; let Inst{24-21} = opcod; let Inst{15} = 0; } // register def rr : T2sThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iALUr, opc, ".w\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, CPSR, (opnode rGPR:$Rn, rGPR:$Rm, CPSR))]>, Requires<[IsThumb2]> { let isCommutable = Commutable; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; let Inst{14-12} = 0b000; // imm3 let Inst{7-6} = 0b00; // imm2 let Inst{5-4} = 0b00; // type } // shifted register def rs : T2sTwoRegShiftedReg< (outs rGPR:$Rd), (ins rGPR:$Rn, t2_so_reg:$ShiftedRm), IIC_iALUsi, opc, ".w\t$Rd, $Rn, $ShiftedRm", [(set rGPR:$Rd, CPSR, (opnode rGPR:$Rn, t2_so_reg:$ShiftedRm, CPSR))]>, Requires<[IsThumb2]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; } } } /// T2I_rbin_s_is - Same as T2I_rbin_irs except sets 's' bit and the register /// version is not needed since this is only for codegen. /// /// These opcodes will be converted to the real non-S opcodes by /// AdjustInstrPostInstrSelection after giving then an optional CPSR operand. let hasPostISelHook = 1, isCodeGenOnly = 1, isPseudo = 1, Defs = [CPSR] in { multiclass T2I_rbin_s_is opcod, string opc, PatFrag opnode> { // shifted imm def ri : T2sTwoRegImm< (outs rGPR:$Rd), (ins rGPR:$Rn, t2_so_imm:$imm), IIC_iALUi, opc, ".w\t$Rd, $Rn, $imm", [(set rGPR:$Rd, CPSR, (opnode t2_so_imm:$imm, rGPR:$Rn))]>; // shifted register def rs : T2sTwoRegShiftedReg< (outs rGPR:$Rd), (ins rGPR:$Rn, t2_so_reg:$ShiftedRm), IIC_iALUsi, opc, "\t$Rd, $Rn, $ShiftedRm", [(set rGPR:$Rd, CPSR, (opnode t2_so_reg:$ShiftedRm, rGPR:$Rn))]>; } } /// T2I_sh_ir - Defines a set of (op reg, {so_imm|r}) patterns for a shift / // rotate operation that produces a value. 
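// For example, "lsl.w r0, r1, #3" uses the 5-bit immediate form below, while
// "lsl.w r0, r1, r2" uses the register form.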
multiclass T2I_sh_ir opcod, string opc, Operand ty, PatFrag opnode, string baseOpc> { // 5-bit imm def ri : T2sTwoRegShiftImm< (outs rGPR:$Rd), (ins rGPR:$Rm, ty:$imm), IIC_iMOVsi, opc, ".w\t$Rd, $Rm, $imm", [(set rGPR:$Rd, (opnode rGPR:$Rm, (i32 ty:$imm)))]> { let Inst{31-27} = 0b11101; let Inst{26-21} = 0b010010; let Inst{19-16} = 0b1111; // Rn let Inst{5-4} = opcod; } // register def rr : T2sThreeReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMOVsr, opc, ".w\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (opnode rGPR:$Rn, rGPR:$Rm))]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0100; let Inst{22-21} = opcod; let Inst{15-12} = 0b1111; let Inst{7-4} = 0b0000; } // Optional destination register def : t2InstAlias(!strconcat(baseOpc, "ri")) rGPR:$Rdn, rGPR:$Rdn, ty:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias(!strconcat(baseOpc, "rr")) rGPR:$Rdn, rGPR:$Rdn, rGPR:$Rm, pred:$p, cc_out:$s)>; // Assembler aliases w/o the ".w" suffix. def : t2InstAlias(!strconcat(baseOpc, "ri")) rGPR:$Rd, rGPR:$Rn, ty:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias(!strconcat(baseOpc, "rr")) rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, pred:$p, cc_out:$s)>; // and with the optional destination operand, too. def : t2InstAlias(!strconcat(baseOpc, "ri")) rGPR:$Rdn, rGPR:$Rdn, ty:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias(!strconcat(baseOpc, "rr")) rGPR:$Rdn, rGPR:$Rdn, rGPR:$Rm, pred:$p, cc_out:$s)>; } /// T2I_cmp_irs - Defines a set of (op r, {so_imm|r|so_reg}) cmp / test /// patterns. Similar to T2I_bin_irs except the instruction does not produce /// a explicit result, only implicitly set CPSR. multiclass T2I_cmp_irs opcod, string opc, InstrItinClass iii, InstrItinClass iir, InstrItinClass iis, PatFrag opnode, string baseOpc> { let isCompare = 1, Defs = [CPSR] in { // shifted imm def ri : T2OneRegCmpImm< (outs), (ins GPRnopc:$Rn, t2_so_imm:$imm), iii, opc, ".w\t$Rn, $imm", [(opnode GPRnopc:$Rn, t2_so_imm:$imm)]> { let Inst{31-27} = 0b11110; let Inst{25} = 0; let Inst{24-21} = opcod; let Inst{20} = 1; // The S bit. let Inst{15} = 0; let Inst{11-8} = 0b1111; // Rd } // register def rr : T2TwoRegCmp< (outs), (ins GPRnopc:$Rn, rGPR:$Rm), iir, opc, ".w\t$Rn, $Rm", [(opnode GPRnopc:$Rn, rGPR:$Rm)]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; let Inst{20} = 1; // The S bit. let Inst{14-12} = 0b000; // imm3 let Inst{11-8} = 0b1111; // Rd let Inst{7-6} = 0b00; // imm2 let Inst{5-4} = 0b00; // type } // shifted register def rs : T2OneRegCmpShiftedReg< (outs), (ins GPRnopc:$Rn, t2_so_reg:$ShiftedRm), iis, opc, ".w\t$Rn, $ShiftedRm", [(opnode GPRnopc:$Rn, t2_so_reg:$ShiftedRm)]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; let Inst{20} = 1; // The S bit. let Inst{11-8} = 0b1111; // Rd } } // Assembler aliases w/o the ".w" suffix. // No alias here for 'rr' version as not all instantiations of this // multiclass want one (CMP in particular, does not). def : t2InstAlias(!strconcat(baseOpc, "ri")) GPRnopc:$Rn, t2_so_imm:$imm, pred:$p)>; def : t2InstAlias(!strconcat(baseOpc, "rs")) GPRnopc:$Rn, t2_so_reg:$shift, pred:$p)>; } /// T2I_ld - Defines a set of (op r, {imm12|imm8|so_reg}) load patterns. 
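/// The three forms correspond to, e.g., "ldr.w r0, [r1, #4092]" (12-bit offset),
/// "ldr r0, [r1, #-8]" (negative 8-bit offset), and "ldr.w r0, [r1, r2, lsl #2]"
/// (register offset with an optional shift of 0-3), plus a PC-relative literal form.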
multiclass T2I_ld opcod, string opc, InstrItinClass iii, InstrItinClass iis, RegisterClass target, PatFrag opnode> { def i12 : T2Ii12<(outs target:$Rt), (ins t2addrmode_imm12:$addr), iii, opc, ".w\t$Rt, $addr", [(set target:$Rt, (opnode t2addrmode_imm12:$addr))]> { bits<4> Rt; bits<17> addr; let Inst{31-25} = 0b1111100; let Inst{24} = signed; let Inst{23} = 1; let Inst{22-21} = opcod; let Inst{20} = 1; // load let Inst{19-16} = addr{16-13}; // Rn let Inst{15-12} = Rt; let Inst{11-0} = addr{11-0}; // imm } def i8 : T2Ii8 <(outs target:$Rt), (ins t2addrmode_negimm8:$addr), iii, opc, "\t$Rt, $addr", [(set target:$Rt, (opnode t2addrmode_negimm8:$addr))]> { bits<4> Rt; bits<13> addr; let Inst{31-27} = 0b11111; let Inst{26-25} = 0b00; let Inst{24} = signed; let Inst{23} = 0; let Inst{22-21} = opcod; let Inst{20} = 1; // load let Inst{19-16} = addr{12-9}; // Rn let Inst{15-12} = Rt; let Inst{11} = 1; // Offset: index==TRUE, wback==FALSE let Inst{10} = 1; // The P bit. let Inst{9} = addr{8}; // U let Inst{8} = 0; // The W bit. let Inst{7-0} = addr{7-0}; // imm } def s : T2Iso <(outs target:$Rt), (ins t2addrmode_so_reg:$addr), iis, opc, ".w\t$Rt, $addr", [(set target:$Rt, (opnode t2addrmode_so_reg:$addr))]> { let Inst{31-27} = 0b11111; let Inst{26-25} = 0b00; let Inst{24} = signed; let Inst{23} = 0; let Inst{22-21} = opcod; let Inst{20} = 1; // load let Inst{11-6} = 0b000000; bits<4> Rt; let Inst{15-12} = Rt; bits<10> addr; let Inst{19-16} = addr{9-6}; // Rn let Inst{3-0} = addr{5-2}; // Rm let Inst{5-4} = addr{1-0}; // imm let DecoderMethod = "DecodeT2LoadShift"; } // FIXME: Is the pci variant actually needed? def pci : T2Ipc <(outs target:$Rt), (ins t2ldrlabel:$addr), iii, opc, ".w\t$Rt, $addr", [(set target:$Rt, (opnode (ARMWrapper tconstpool:$addr)))]> { let isReMaterializable = 1; let Inst{31-27} = 0b11111; let Inst{26-25} = 0b00; let Inst{24} = signed; let Inst{23} = ?; // add = (U == '1') let Inst{22-21} = opcod; let Inst{20} = 1; // load let Inst{19-16} = 0b1111; // Rn bits<4> Rt; bits<12> addr; let Inst{15-12} = Rt{3-0}; let Inst{11-0} = addr{11-0}; } } /// T2I_st - Defines a set of (op r, {imm12|imm8|so_reg}) store patterns. multiclass T2I_st opcod, string opc, InstrItinClass iii, InstrItinClass iis, RegisterClass target, PatFrag opnode> { def i12 : T2Ii12<(outs), (ins target:$Rt, t2addrmode_imm12:$addr), iii, opc, ".w\t$Rt, $addr", [(opnode target:$Rt, t2addrmode_imm12:$addr)]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0001; let Inst{22-21} = opcod; let Inst{20} = 0; // !load bits<4> Rt; let Inst{15-12} = Rt; bits<17> addr; let addr{12} = 1; // add = TRUE let Inst{19-16} = addr{16-13}; // Rn let Inst{23} = addr{12}; // U let Inst{11-0} = addr{11-0}; // imm } def i8 : T2Ii8 <(outs), (ins target:$Rt, t2addrmode_negimm8:$addr), iii, opc, "\t$Rt, $addr", [(opnode target:$Rt, t2addrmode_negimm8:$addr)]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0000; let Inst{22-21} = opcod; let Inst{20} = 0; // !load let Inst{11} = 1; // Offset: index==TRUE, wback==FALSE let Inst{10} = 1; // The P bit. let Inst{8} = 0; // The W bit. 
bits<4> Rt; let Inst{15-12} = Rt; bits<13> addr; let Inst{19-16} = addr{12-9}; // Rn let Inst{9} = addr{8}; // U let Inst{7-0} = addr{7-0}; // imm } def s : T2Iso <(outs), (ins target:$Rt, t2addrmode_so_reg:$addr), iis, opc, ".w\t$Rt, $addr", [(opnode target:$Rt, t2addrmode_so_reg:$addr)]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0000; let Inst{22-21} = opcod; let Inst{20} = 0; // !load let Inst{11-6} = 0b000000; bits<4> Rt; let Inst{15-12} = Rt; bits<10> addr; let Inst{19-16} = addr{9-6}; // Rn let Inst{3-0} = addr{5-2}; // Rm let Inst{5-4} = addr{1-0}; // imm } } /// T2I_ext_rrot - A unary operation with two forms: one whose operand is a /// register and one whose operand is a register rotated by 8/16/24. class T2I_ext_rrot opcod, string opc, PatFrag opnode> : T2TwoReg<(outs rGPR:$Rd), (ins rGPR:$Rm, rot_imm:$rot), IIC_iEXTr, opc, ".w\t$Rd, $Rm$rot", [(set rGPR:$Rd, (opnode (rotr rGPR:$Rm, rot_imm:$rot)))]>, Requires<[IsThumb2]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0100; let Inst{22-20} = opcod; let Inst{19-16} = 0b1111; // Rn let Inst{15-12} = 0b1111; let Inst{7} = 1; bits<2> rot; let Inst{5-4} = rot{1-0}; // rotate } // UXTB16 - Requres T2ExtractPack, does not need the .w qualifier. class T2I_ext_rrot_uxtb16 opcod, string opc, PatFrag opnode> : T2TwoReg<(outs rGPR:$Rd), (ins rGPR:$Rm, rot_imm:$rot), IIC_iEXTr, opc, "\t$Rd, $Rm$rot", [(set rGPR:$Rd, (opnode (rotr rGPR:$Rm, rot_imm:$rot)))]>, Requires<[HasT2ExtractPack, IsThumb2]> { bits<2> rot; let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0100; let Inst{22-20} = opcod; let Inst{19-16} = 0b1111; // Rn let Inst{15-12} = 0b1111; let Inst{7} = 1; let Inst{5-4} = rot; } // SXTB16 - Requres T2ExtractPack, does not need the .w qualifier, no pattern // supported yet. class T2I_ext_rrot_sxtb16 opcod, string opc> : T2TwoReg<(outs rGPR:$Rd), (ins rGPR:$Rm, rot_imm:$rot), IIC_iEXTr, opc, "\t$Rd, $Rm$rot", []>, Requires<[IsThumb2, HasT2ExtractPack]> { bits<2> rot; let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0100; let Inst{22-20} = opcod; let Inst{19-16} = 0b1111; // Rn let Inst{15-12} = 0b1111; let Inst{7} = 1; let Inst{5-4} = rot; } /// T2I_exta_rrot - A binary operation with two forms: one whose operand is a /// register and one whose operand is a register rotated by 8/16/24. class T2I_exta_rrot opcod, string opc, PatFrag opnode> : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rot_imm:$rot), IIC_iEXTAsr, opc, "\t$Rd, $Rn, $Rm$rot", [(set rGPR:$Rd, (opnode rGPR:$Rn, (rotr rGPR:$Rm,rot_imm:$rot)))]>, Requires<[HasT2ExtractPack, IsThumb2]> { bits<2> rot; let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0100; let Inst{22-20} = opcod; let Inst{15-12} = 0b1111; let Inst{7} = 1; let Inst{5-4} = rot; } class T2I_exta_rrot_np opcod, string opc> : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm,rot_imm:$rot), IIC_iEXTAsr, opc, "\t$Rd, $Rn, $Rm$rot", []> { bits<2> rot; let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0100; let Inst{22-20} = opcod; let Inst{15-12} = 0b1111; let Inst{7} = 1; let Inst{5-4} = rot; } //===----------------------------------------------------------------------===// // Instructions //===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===// // Miscellaneous Instructions. 
// class T2PCOneRegImm pattern> : T2XI { bits<4> Rd; bits<12> label; let Inst{11-8} = Rd; let Inst{26} = label{11}; let Inst{14-12} = label{10-8}; let Inst{7-0} = label{7-0}; } // LEApcrel - Load a pc-relative address into a register without offending the // assembler. def t2ADR : T2PCOneRegImm<(outs rGPR:$Rd), (ins t2adrlabel:$addr, pred:$p), IIC_iALUi, "adr{$p}.w\t$Rd, $addr", []> { let Inst{31-27} = 0b11110; let Inst{25-24} = 0b10; // Inst{23:21} = '11' (add = FALSE) or '00' (add = TRUE) let Inst{22} = 0; let Inst{20} = 0; let Inst{19-16} = 0b1111; // Rn let Inst{15} = 0; bits<4> Rd; bits<13> addr; let Inst{11-8} = Rd; let Inst{23} = addr{12}; let Inst{21} = addr{12}; let Inst{26} = addr{11}; let Inst{14-12} = addr{10-8}; let Inst{7-0} = addr{7-0}; let DecoderMethod = "DecodeT2Adr"; } let neverHasSideEffects = 1, isReMaterializable = 1 in def t2LEApcrel : t2PseudoInst<(outs rGPR:$Rd), (ins i32imm:$label, pred:$p), 4, IIC_iALUi, []>; def t2LEApcrelJT : t2PseudoInst<(outs rGPR:$Rd), (ins i32imm:$label, nohash_imm:$id, pred:$p), 4, IIC_iALUi, []>; //===----------------------------------------------------------------------===// // Load / store Instructions. // // Load let canFoldAsLoad = 1, isReMaterializable = 1 in defm t2LDR : T2I_ld<0, 0b10, "ldr", IIC_iLoad_i, IIC_iLoad_si, GPR, UnOpFrag<(load node:$Src)>>; // Loads with zero extension defm t2LDRH : T2I_ld<0, 0b01, "ldrh", IIC_iLoad_bh_i, IIC_iLoad_bh_si, rGPR, UnOpFrag<(zextloadi16 node:$Src)>>; defm t2LDRB : T2I_ld<0, 0b00, "ldrb", IIC_iLoad_bh_i, IIC_iLoad_bh_si, rGPR, UnOpFrag<(zextloadi8 node:$Src)>>; // Loads with sign extension defm t2LDRSH : T2I_ld<1, 0b01, "ldrsh", IIC_iLoad_bh_i, IIC_iLoad_bh_si, rGPR, UnOpFrag<(sextloadi16 node:$Src)>>; defm t2LDRSB : T2I_ld<1, 0b00, "ldrsb", IIC_iLoad_bh_i, IIC_iLoad_bh_si, rGPR, UnOpFrag<(sextloadi8 node:$Src)>>; let mayLoad = 1, neverHasSideEffects = 1, hasExtraDefRegAllocReq = 1 in { // Load doubleword def t2LDRDi8 : T2Ii8s4<1, 0, 1, (outs rGPR:$Rt, rGPR:$Rt2), (ins t2addrmode_imm8s4:$addr), IIC_iLoad_d_i, "ldrd", "\t$Rt, $Rt2, $addr", "", []>; } // mayLoad = 1, neverHasSideEffects = 1, hasExtraDefRegAllocReq = 1 // zextload i1 -> zextload i8 def : T2Pat<(zextloadi1 t2addrmode_imm12:$addr), (t2LDRBi12 t2addrmode_imm12:$addr)>; def : T2Pat<(zextloadi1 t2addrmode_negimm8:$addr), (t2LDRBi8 t2addrmode_negimm8:$addr)>; def : T2Pat<(zextloadi1 t2addrmode_so_reg:$addr), (t2LDRBs t2addrmode_so_reg:$addr)>; def : T2Pat<(zextloadi1 (ARMWrapper tconstpool:$addr)), (t2LDRBpci tconstpool:$addr)>; // extload -> zextload // FIXME: Reduce the number of patterns by legalizing extload to zextload // earlier? 
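// An anyext load leaves the upper bits undefined, so selecting the zero-extending
// LDRB/LDRH forms for it is always safe; the patterns below simply reuse the
// zextload instructions.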
def : T2Pat<(extloadi1 t2addrmode_imm12:$addr), (t2LDRBi12 t2addrmode_imm12:$addr)>; def : T2Pat<(extloadi1 t2addrmode_negimm8:$addr), (t2LDRBi8 t2addrmode_negimm8:$addr)>; def : T2Pat<(extloadi1 t2addrmode_so_reg:$addr), (t2LDRBs t2addrmode_so_reg:$addr)>; def : T2Pat<(extloadi1 (ARMWrapper tconstpool:$addr)), (t2LDRBpci tconstpool:$addr)>; def : T2Pat<(extloadi8 t2addrmode_imm12:$addr), (t2LDRBi12 t2addrmode_imm12:$addr)>; def : T2Pat<(extloadi8 t2addrmode_negimm8:$addr), (t2LDRBi8 t2addrmode_negimm8:$addr)>; def : T2Pat<(extloadi8 t2addrmode_so_reg:$addr), (t2LDRBs t2addrmode_so_reg:$addr)>; def : T2Pat<(extloadi8 (ARMWrapper tconstpool:$addr)), (t2LDRBpci tconstpool:$addr)>; def : T2Pat<(extloadi16 t2addrmode_imm12:$addr), (t2LDRHi12 t2addrmode_imm12:$addr)>; def : T2Pat<(extloadi16 t2addrmode_negimm8:$addr), (t2LDRHi8 t2addrmode_negimm8:$addr)>; def : T2Pat<(extloadi16 t2addrmode_so_reg:$addr), (t2LDRHs t2addrmode_so_reg:$addr)>; def : T2Pat<(extloadi16 (ARMWrapper tconstpool:$addr)), (t2LDRHpci tconstpool:$addr)>; // FIXME: The destination register of the loads and stores can't be PC, but // can be SP. We need another regclass (similar to rGPR) to represent // that. Not a pressing issue since these are selected manually, // not via pattern. // Indexed loads let mayLoad = 1, neverHasSideEffects = 1 in { def t2LDR_PRE : T2Ipreldst<0, 0b10, 1, 1, (outs GPR:$Rt, GPR:$Rn_wb), (ins t2addrmode_imm8:$addr), AddrModeT2_i8, IndexModePre, IIC_iLoad_iu, "ldr", "\t$Rt, $addr!", "$addr.base = $Rn_wb", []> { let AsmMatchConverter = "cvtLdWriteBackRegT2AddrModeImm8"; } def t2LDR_POST : T2Ipostldst<0, 0b10, 1, 0, (outs GPR:$Rt, GPR:$Rn_wb), (ins addr_offset_none:$Rn, t2am_imm8_offset:$offset), AddrModeT2_i8, IndexModePost, IIC_iLoad_iu, "ldr", "\t$Rt, $Rn$offset", "$Rn = $Rn_wb", []>; def t2LDRB_PRE : T2Ipreldst<0, 0b00, 1, 1, (outs GPR:$Rt, GPR:$Rn_wb), (ins t2addrmode_imm8:$addr), AddrModeT2_i8, IndexModePre, IIC_iLoad_bh_iu, "ldrb", "\t$Rt, $addr!", "$addr.base = $Rn_wb", []> { let AsmMatchConverter = "cvtLdWriteBackRegT2AddrModeImm8"; } def t2LDRB_POST : T2Ipostldst<0, 0b00, 1, 0, (outs GPR:$Rt, GPR:$Rn_wb), (ins addr_offset_none:$Rn, t2am_imm8_offset:$offset), AddrModeT2_i8, IndexModePost, IIC_iLoad_bh_iu, "ldrb", "\t$Rt, $Rn$offset", "$Rn = $Rn_wb", []>; def t2LDRH_PRE : T2Ipreldst<0, 0b01, 1, 1, (outs GPR:$Rt, GPR:$Rn_wb), (ins t2addrmode_imm8:$addr), AddrModeT2_i8, IndexModePre, IIC_iLoad_bh_iu, "ldrh", "\t$Rt, $addr!", "$addr.base = $Rn_wb", []> { let AsmMatchConverter = "cvtLdWriteBackRegT2AddrModeImm8"; } def t2LDRH_POST : T2Ipostldst<0, 0b01, 1, 0, (outs GPR:$Rt, GPR:$Rn_wb), (ins addr_offset_none:$Rn, t2am_imm8_offset:$offset), AddrModeT2_i8, IndexModePost, IIC_iLoad_bh_iu, "ldrh", "\t$Rt, $Rn$offset", "$Rn = $Rn_wb", []>; def t2LDRSB_PRE : T2Ipreldst<1, 0b00, 1, 1, (outs GPR:$Rt, GPR:$Rn_wb), (ins t2addrmode_imm8:$addr), AddrModeT2_i8, IndexModePre, IIC_iLoad_bh_iu, "ldrsb", "\t$Rt, $addr!", "$addr.base = $Rn_wb", []> { let AsmMatchConverter = "cvtLdWriteBackRegT2AddrModeImm8"; } def t2LDRSB_POST : T2Ipostldst<1, 0b00, 1, 0, (outs GPR:$Rt, GPR:$Rn_wb), (ins addr_offset_none:$Rn, t2am_imm8_offset:$offset), AddrModeT2_i8, IndexModePost, IIC_iLoad_bh_iu, "ldrsb", "\t$Rt, $Rn$offset", "$Rn = $Rn_wb", []>; def t2LDRSH_PRE : T2Ipreldst<1, 0b01, 1, 1, (outs GPR:$Rt, GPR:$Rn_wb), (ins t2addrmode_imm8:$addr), AddrModeT2_i8, IndexModePre, IIC_iLoad_bh_iu, "ldrsh", "\t$Rt, $addr!", "$addr.base = $Rn_wb", []> { let AsmMatchConverter = "cvtLdWriteBackRegT2AddrModeImm8"; } def t2LDRSH_POST : 
T2Ipostldst<1, 0b01, 1, 0, (outs GPR:$Rt, GPR:$Rn_wb), (ins addr_offset_none:$Rn, t2am_imm8_offset:$offset), AddrModeT2_i8, IndexModePost, IIC_iLoad_bh_iu, "ldrsh", "\t$Rt, $Rn$offset", "$Rn = $Rn_wb", []>; } // mayLoad = 1, neverHasSideEffects = 1 // LDRT, LDRBT, LDRHT, LDRSBT, LDRSHT all have offset mode (PUW=0b110). // Ref: A8.6.57 LDR (immediate, Thumb) Encoding T4 class T2IldT type, string opc, InstrItinClass ii> : T2Ii8<(outs rGPR:$Rt), (ins t2addrmode_posimm8:$addr), ii, opc, "\t$Rt, $addr", []> { bits<4> Rt; bits<13> addr; let Inst{31-27} = 0b11111; let Inst{26-25} = 0b00; let Inst{24} = signed; let Inst{23} = 0; let Inst{22-21} = type; let Inst{20} = 1; // load let Inst{19-16} = addr{12-9}; let Inst{15-12} = Rt; let Inst{11} = 1; let Inst{10-8} = 0b110; // PUW. let Inst{7-0} = addr{7-0}; } def t2LDRT : T2IldT<0, 0b10, "ldrt", IIC_iLoad_i>; def t2LDRBT : T2IldT<0, 0b00, "ldrbt", IIC_iLoad_bh_i>; def t2LDRHT : T2IldT<0, 0b01, "ldrht", IIC_iLoad_bh_i>; def t2LDRSBT : T2IldT<1, 0b00, "ldrsbt", IIC_iLoad_bh_i>; def t2LDRSHT : T2IldT<1, 0b01, "ldrsht", IIC_iLoad_bh_i>; // Store defm t2STR :T2I_st<0b10,"str", IIC_iStore_i, IIC_iStore_si, GPR, BinOpFrag<(store node:$LHS, node:$RHS)>>; defm t2STRB:T2I_st<0b00,"strb", IIC_iStore_bh_i, IIC_iStore_bh_si, rGPR, BinOpFrag<(truncstorei8 node:$LHS, node:$RHS)>>; defm t2STRH:T2I_st<0b01,"strh", IIC_iStore_bh_i, IIC_iStore_bh_si, rGPR, BinOpFrag<(truncstorei16 node:$LHS, node:$RHS)>>; // Store doubleword let mayLoad = 1, neverHasSideEffects = 1, hasExtraSrcRegAllocReq = 1 in def t2STRDi8 : T2Ii8s4<1, 0, 0, (outs), (ins GPR:$Rt, GPR:$Rt2, t2addrmode_imm8s4:$addr), IIC_iStore_d_r, "strd", "\t$Rt, $Rt2, $addr", "", []>; // Indexed stores def t2STR_PRE : T2Ipreldst<0, 0b10, 0, 1, (outs GPRnopc:$Rn_wb), (ins rGPR:$Rt, t2addrmode_imm8:$addr), AddrModeT2_i8, IndexModePre, IIC_iStore_iu, "str", "\t$Rt, $addr!", "$addr.base = $Rn_wb,@earlyclobber $Rn_wb", []> { let AsmMatchConverter = "cvtStWriteBackRegT2AddrModeImm8"; } def t2STRH_PRE : T2Ipreldst<0, 0b01, 0, 1, (outs GPRnopc:$Rn_wb), (ins rGPR:$Rt, t2addrmode_imm8:$addr), AddrModeT2_i8, IndexModePre, IIC_iStore_iu, "strh", "\t$Rt, $addr!", "$addr.base = $Rn_wb,@earlyclobber $Rn_wb", []> { let AsmMatchConverter = "cvtStWriteBackRegT2AddrModeImm8"; } def t2STRB_PRE : T2Ipreldst<0, 0b00, 0, 1, (outs GPRnopc:$Rn_wb), (ins rGPR:$Rt, t2addrmode_imm8:$addr), AddrModeT2_i8, IndexModePre, IIC_iStore_bh_iu, "strb", "\t$Rt, $addr!", "$addr.base = $Rn_wb,@earlyclobber $Rn_wb", []> { let AsmMatchConverter = "cvtStWriteBackRegT2AddrModeImm8"; } def t2STR_POST : T2Ipostldst<0, 0b10, 0, 0, (outs GPRnopc:$Rn_wb), (ins rGPR:$Rt, addr_offset_none:$Rn, t2am_imm8_offset:$offset), AddrModeT2_i8, IndexModePost, IIC_iStore_iu, "str", "\t$Rt, $Rn$offset", "$Rn = $Rn_wb,@earlyclobber $Rn_wb", [(set GPRnopc:$Rn_wb, (post_store rGPR:$Rt, addr_offset_none:$Rn, t2am_imm8_offset:$offset))]>; def t2STRH_POST : T2Ipostldst<0, 0b01, 0, 0, (outs GPRnopc:$Rn_wb), (ins rGPR:$Rt, addr_offset_none:$Rn, t2am_imm8_offset:$offset), AddrModeT2_i8, IndexModePost, IIC_iStore_bh_iu, "strh", "\t$Rt, $Rn$offset", "$Rn = $Rn_wb,@earlyclobber $Rn_wb", [(set GPRnopc:$Rn_wb, (post_truncsti16 rGPR:$Rt, addr_offset_none:$Rn, t2am_imm8_offset:$offset))]>; def t2STRB_POST : T2Ipostldst<0, 0b00, 0, 0, (outs GPRnopc:$Rn_wb), (ins rGPR:$Rt, addr_offset_none:$Rn, t2am_imm8_offset:$offset), AddrModeT2_i8, IndexModePost, IIC_iStore_bh_iu, "strb", "\t$Rt, $Rn$offset", "$Rn = $Rn_wb,@earlyclobber $Rn_wb", [(set GPRnopc:$Rn_wb, (post_truncsti8 rGPR:$Rt, 
addr_offset_none:$Rn, t2am_imm8_offset:$offset))]>; // Pseudo-instructions for pattern matching the pre-indexed stores. We can't // put the patterns on the instruction definitions directly as ISel wants // the address base and offset to be separate operands, not a single // complex operand like we represent the instructions themselves. The // pseudos map between the two. let usesCustomInserter = 1, Constraints = "$Rn = $Rn_wb,@earlyclobber $Rn_wb" in { def t2STR_preidx: t2PseudoInst<(outs GPRnopc:$Rn_wb), (ins rGPR:$Rt, GPRnopc:$Rn, t2am_imm8_offset:$offset, pred:$p), 4, IIC_iStore_ru, [(set GPRnopc:$Rn_wb, (pre_store rGPR:$Rt, GPRnopc:$Rn, t2am_imm8_offset:$offset))]>; def t2STRB_preidx: t2PseudoInst<(outs GPRnopc:$Rn_wb), (ins rGPR:$Rt, GPRnopc:$Rn, t2am_imm8_offset:$offset, pred:$p), 4, IIC_iStore_ru, [(set GPRnopc:$Rn_wb, (pre_truncsti8 rGPR:$Rt, GPRnopc:$Rn, t2am_imm8_offset:$offset))]>; def t2STRH_preidx: t2PseudoInst<(outs GPRnopc:$Rn_wb), (ins rGPR:$Rt, GPRnopc:$Rn, t2am_imm8_offset:$offset, pred:$p), 4, IIC_iStore_ru, [(set GPRnopc:$Rn_wb, (pre_truncsti16 rGPR:$Rt, GPRnopc:$Rn, t2am_imm8_offset:$offset))]>; } // STRT, STRBT, STRHT all have offset mode (PUW=0b110) and are for disassembly // only. // Ref: A8.6.193 STR (immediate, Thumb) Encoding T4 class T2IstT type, string opc, InstrItinClass ii> : T2Ii8<(outs rGPR:$Rt), (ins t2addrmode_imm8:$addr), ii, opc, "\t$Rt, $addr", []> { let Inst{31-27} = 0b11111; let Inst{26-25} = 0b00; let Inst{24} = 0; // not signed let Inst{23} = 0; let Inst{22-21} = type; let Inst{20} = 0; // store let Inst{11} = 1; let Inst{10-8} = 0b110; // PUW bits<4> Rt; bits<13> addr; let Inst{15-12} = Rt; let Inst{19-16} = addr{12-9}; let Inst{7-0} = addr{7-0}; } def t2STRT : T2IstT<0b10, "strt", IIC_iStore_i>; def t2STRBT : T2IstT<0b00, "strbt", IIC_iStore_bh_i>; def t2STRHT : T2IstT<0b01, "strht", IIC_iStore_bh_i>; // ldrd / strd pre / post variants // For disassembly only. def t2LDRD_PRE : T2Ii8s4<1, 1, 1, (outs rGPR:$Rt, rGPR:$Rt2, GPR:$wb), (ins t2addrmode_imm8s4:$addr), IIC_iLoad_d_ru, "ldrd", "\t$Rt, $Rt2, $addr!", "$addr.base = $wb", []> { let AsmMatchConverter = "cvtT2LdrdPre"; let DecoderMethod = "DecodeT2LDRDPreInstruction"; } def t2LDRD_POST : T2Ii8s4post<0, 1, 1, (outs rGPR:$Rt, rGPR:$Rt2, GPR:$wb), (ins addr_offset_none:$addr, t2am_imm8s4_offset:$imm), IIC_iLoad_d_ru, "ldrd", "\t$Rt, $Rt2, $addr$imm", "$addr.base = $wb", []>; def t2STRD_PRE : T2Ii8s4<1, 1, 0, (outs GPR:$wb), (ins rGPR:$Rt, rGPR:$Rt2, t2addrmode_imm8s4:$addr), IIC_iStore_d_ru, "strd", "\t$Rt, $Rt2, $addr!", "$addr.base = $wb", []> { let AsmMatchConverter = "cvtT2StrdPre"; let DecoderMethod = "DecodeT2STRDPreInstruction"; } def t2STRD_POST : T2Ii8s4post<0, 1, 0, (outs GPR:$wb), (ins rGPR:$Rt, rGPR:$Rt2, addr_offset_none:$addr, t2am_imm8s4_offset:$imm), IIC_iStore_d_ru, "strd", "\t$Rt, $Rt2, $addr$imm", "$addr.base = $wb", []>; // T2Ipl (Preload Data/Instruction) signals the memory system of possible future // data/instruction access. These are for disassembly only. // instr_write is inverted for Thumb mode: (prefetch 3) -> (preload 0), // (prefetch 1) -> (preload 2), (prefetch 2) -> (preload 1). 
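// Roughly, at the source level __builtin_prefetch(p) selects PLD,
// __builtin_prefetch(p, 1) selects PLDW when the MP extension is available, and
// llvm.prefetch calls marked as instruction prefetches select PLI.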
multiclass T2Ipl write, bits<1> instr, string opc> { def i12 : T2Ii12<(outs), (ins t2addrmode_imm12:$addr), IIC_Preload, opc, "\t$addr", [(ARMPreload t2addrmode_imm12:$addr, (i32 write), (i32 instr))]> { let Inst{31-25} = 0b1111100; let Inst{24} = instr; let Inst{22} = 0; let Inst{21} = write; let Inst{20} = 1; let Inst{15-12} = 0b1111; bits<17> addr; let addr{12} = 1; // add = TRUE let Inst{19-16} = addr{16-13}; // Rn let Inst{23} = addr{12}; // U let Inst{11-0} = addr{11-0}; // imm12 } def i8 : T2Ii8<(outs), (ins t2addrmode_negimm8:$addr), IIC_Preload, opc, "\t$addr", [(ARMPreload t2addrmode_negimm8:$addr, (i32 write), (i32 instr))]> { let Inst{31-25} = 0b1111100; let Inst{24} = instr; let Inst{23} = 0; // U = 0 let Inst{22} = 0; let Inst{21} = write; let Inst{20} = 1; let Inst{15-12} = 0b1111; let Inst{11-8} = 0b1100; bits<13> addr; let Inst{19-16} = addr{12-9}; // Rn let Inst{7-0} = addr{7-0}; // imm8 } def s : T2Iso<(outs), (ins t2addrmode_so_reg:$addr), IIC_Preload, opc, "\t$addr", [(ARMPreload t2addrmode_so_reg:$addr, (i32 write), (i32 instr))]> { let Inst{31-25} = 0b1111100; let Inst{24} = instr; let Inst{23} = 0; // add = TRUE for T1 let Inst{22} = 0; let Inst{21} = write; let Inst{20} = 1; let Inst{15-12} = 0b1111; let Inst{11-6} = 0000000; bits<10> addr; let Inst{19-16} = addr{9-6}; // Rn let Inst{3-0} = addr{5-2}; // Rm let Inst{5-4} = addr{1-0}; // imm2 let DecoderMethod = "DecodeT2LoadShift"; } } defm t2PLD : T2Ipl<0, 0, "pld">, Requires<[IsThumb2]>; defm t2PLDW : T2Ipl<1, 0, "pldw">, Requires<[IsThumb2,HasV7,HasMP]>; defm t2PLI : T2Ipl<0, 1, "pli">, Requires<[IsThumb2,HasV7]>; //===----------------------------------------------------------------------===// // Load / store multiple Instructions. // multiclass thumb2_ld_mult { def IA : T2XI<(outs), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), itin, !strconcat(asm, "${p}.w\t$Rn, $regs"), []> { bits<4> Rn; bits<16> regs; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b00; let Inst{24-23} = 0b01; // Increment After let Inst{22} = 0; let Inst{21} = 0; // No writeback let Inst{20} = L_bit; let Inst{19-16} = Rn; - let Inst{15} = 0; - let Inst{14-0} = regs{14-0}; + let Inst{15-0} = regs; } def IA_UPD : T2XIt<(outs GPR:$wb), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), itin_upd, !strconcat(asm, "${p}.w\t$Rn!, $regs"), "$Rn = $wb", []> { bits<4> Rn; bits<16> regs; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b00; let Inst{24-23} = 0b01; // Increment After let Inst{22} = 0; let Inst{21} = 1; // Writeback let Inst{20} = L_bit; let Inst{19-16} = Rn; - let Inst{15} = 0; - let Inst{14-0} = regs{14-0}; + let Inst{15-0} = regs; } def DB : T2XI<(outs), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), itin, !strconcat(asm, "db${p}\t$Rn, $regs"), []> { bits<4> Rn; bits<16> regs; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b00; let Inst{24-23} = 0b10; // Decrement Before let Inst{22} = 0; let Inst{21} = 0; // No writeback let Inst{20} = L_bit; let Inst{19-16} = Rn; - let Inst{15} = 0; - let Inst{14-0} = regs{14-0}; + let Inst{15-0} = regs; } def DB_UPD : T2XIt<(outs GPR:$wb), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), itin_upd, !strconcat(asm, "db${p}\t$Rn!, $regs"), "$Rn = $wb", []> { bits<4> Rn; bits<16> regs; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b00; let Inst{24-23} = 0b10; // Decrement Before let Inst{22} = 0; let Inst{21} = 1; // Writeback let Inst{20} = L_bit; let Inst{19-16} = Rn; - let Inst{15} = 0; - let Inst{14-0} = regs{14-0}; + let Inst{15-0} = regs; } } let neverHasSideEffects = 1 in { let 
mayLoad = 1, hasExtraDefRegAllocReq = 1 in defm t2LDM : thumb2_ld_mult<"ldm", IIC_iLoad_m, IIC_iLoad_mu, 1>; multiclass thumb2_st_mult { def IA : T2XI<(outs), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), itin, !strconcat(asm, "${p}.w\t$Rn, $regs"), []> { bits<4> Rn; bits<16> regs; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b00; let Inst{24-23} = 0b01; // Increment After let Inst{22} = 0; let Inst{21} = 0; // No writeback let Inst{20} = L_bit; let Inst{19-16} = Rn; let Inst{15} = 0; let Inst{14} = regs{14}; let Inst{13} = 0; let Inst{12-0} = regs{12-0}; } def IA_UPD : T2XIt<(outs GPR:$wb), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), itin_upd, !strconcat(asm, "${p}.w\t$Rn!, $regs"), "$Rn = $wb", []> { bits<4> Rn; bits<16> regs; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b00; let Inst{24-23} = 0b01; // Increment After let Inst{22} = 0; let Inst{21} = 1; // Writeback let Inst{20} = L_bit; let Inst{19-16} = Rn; let Inst{15} = 0; let Inst{14} = regs{14}; let Inst{13} = 0; let Inst{12-0} = regs{12-0}; } def DB : T2XI<(outs), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), itin, !strconcat(asm, "db${p}\t$Rn, $regs"), []> { bits<4> Rn; bits<16> regs; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b00; let Inst{24-23} = 0b10; // Decrement Before let Inst{22} = 0; let Inst{21} = 0; // No writeback let Inst{20} = L_bit; let Inst{19-16} = Rn; let Inst{15} = 0; let Inst{14} = regs{14}; let Inst{13} = 0; let Inst{12-0} = regs{12-0}; } def DB_UPD : T2XIt<(outs GPR:$wb), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), itin_upd, !strconcat(asm, "db${p}\t$Rn!, $regs"), "$Rn = $wb", []> { bits<4> Rn; bits<16> regs; let Inst{31-27} = 0b11101; let Inst{26-25} = 0b00; let Inst{24-23} = 0b10; // Decrement Before let Inst{22} = 0; let Inst{21} = 1; // Writeback let Inst{20} = L_bit; let Inst{19-16} = Rn; let Inst{15} = 0; let Inst{14} = regs{14}; let Inst{13} = 0; let Inst{12-0} = regs{12-0}; } } let mayStore = 1, hasExtraSrcRegAllocReq = 1 in defm t2STM : thumb2_st_mult<"stm", IIC_iStore_m, IIC_iStore_mu, 0>; } // neverHasSideEffects //===----------------------------------------------------------------------===// // Move Instructions. // let neverHasSideEffects = 1 in def t2MOVr : T2sTwoReg<(outs GPRnopc:$Rd), (ins GPR:$Rm), IIC_iMOVr, "mov", ".w\t$Rd, $Rm", []> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = 0b0010; let Inst{19-16} = 0b1111; // Rn let Inst{14-12} = 0b000; let Inst{7-4} = 0b0000; } def : t2InstAlias<"movs${p}.w $Rd, $Rm", (t2MOVr GPRnopc:$Rd, GPR:$Rm, pred:$p, CPSR)>; def : t2InstAlias<"movs${p} $Rd, $Rm", (t2MOVr GPRnopc:$Rd, GPR:$Rm, pred:$p, CPSR)>; // AddedComplexity to ensure isel tries t2MOVi before t2MOVi16. let isReMaterializable = 1, isAsCheapAsAMove = 1, isMoveImm = 1, AddedComplexity = 1 in def t2MOVi : T2sOneRegImm<(outs rGPR:$Rd), (ins t2_so_imm:$imm), IIC_iMOVi, "mov", ".w\t$Rd, $imm", [(set rGPR:$Rd, t2_so_imm:$imm)]> { let Inst{31-27} = 0b11110; let Inst{25} = 0; let Inst{24-21} = 0b0010; let Inst{19-16} = 0b1111; // Rn let Inst{15} = 0; } // cc_out is handled as part of the explicit mnemonic in the parser for 'mov'. // Use aliases to get that to play nice here. 
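// For example, "movs.w r0, #1" matches the alias below and becomes a t2MOVi whose
// optional cc_out operand is CPSR, while plain "mov.w r0, #1" maps to the zero_reg
// (flag-preserving) form.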
def : t2InstAlias<"movs${p}.w $Rd, $imm", (t2MOVi rGPR:$Rd, t2_so_imm:$imm, pred:$p, CPSR)>; def : t2InstAlias<"movs${p} $Rd, $imm", (t2MOVi rGPR:$Rd, t2_so_imm:$imm, pred:$p, CPSR)>; def : t2InstAlias<"mov${p}.w $Rd, $imm", (t2MOVi rGPR:$Rd, t2_so_imm:$imm, pred:$p, zero_reg)>; def : t2InstAlias<"mov${p} $Rd, $imm", (t2MOVi rGPR:$Rd, t2_so_imm:$imm, pred:$p, zero_reg)>; let isReMaterializable = 1, isAsCheapAsAMove = 1, isMoveImm = 1 in def t2MOVi16 : T2I<(outs rGPR:$Rd), (ins imm0_65535_expr:$imm), IIC_iMOVi, "movw", "\t$Rd, $imm", [(set rGPR:$Rd, imm0_65535:$imm)]> { let Inst{31-27} = 0b11110; let Inst{25} = 1; let Inst{24-21} = 0b0010; let Inst{20} = 0; // The S bit. let Inst{15} = 0; bits<4> Rd; bits<16> imm; let Inst{11-8} = Rd; let Inst{19-16} = imm{15-12}; let Inst{26} = imm{11}; let Inst{14-12} = imm{10-8}; let Inst{7-0} = imm{7-0}; let DecoderMethod = "DecodeT2MOVTWInstruction"; } def t2MOVi16_ga_pcrel : PseudoInst<(outs rGPR:$Rd), (ins i32imm:$addr, pclabel:$id), IIC_iMOVi, []>; let Constraints = "$src = $Rd" in { def t2MOVTi16 : T2I<(outs rGPR:$Rd), (ins rGPR:$src, imm0_65535_expr:$imm), IIC_iMOVi, "movt", "\t$Rd, $imm", [(set rGPR:$Rd, (or (and rGPR:$src, 0xffff), lo16AllZero:$imm))]> { let Inst{31-27} = 0b11110; let Inst{25} = 1; let Inst{24-21} = 0b0110; let Inst{20} = 0; // The S bit. let Inst{15} = 0; bits<4> Rd; bits<16> imm; let Inst{11-8} = Rd; let Inst{19-16} = imm{15-12}; let Inst{26} = imm{11}; let Inst{14-12} = imm{10-8}; let Inst{7-0} = imm{7-0}; let DecoderMethod = "DecodeT2MOVTWInstruction"; } def t2MOVTi16_ga_pcrel : PseudoInst<(outs rGPR:$Rd), (ins rGPR:$src, i32imm:$addr, pclabel:$id), IIC_iMOVi, []>; } // Constraints def : T2Pat<(or rGPR:$src, 0xffff0000), (t2MOVTi16 rGPR:$src, 0xffff)>; //===----------------------------------------------------------------------===// // Extend Instructions. // // Sign extenders def t2SXTB : T2I_ext_rrot<0b100, "sxtb", UnOpFrag<(sext_inreg node:$Src, i8)>>; def t2SXTH : T2I_ext_rrot<0b000, "sxth", UnOpFrag<(sext_inreg node:$Src, i16)>>; def t2SXTB16 : T2I_ext_rrot_sxtb16<0b010, "sxtb16">; def t2SXTAB : T2I_exta_rrot<0b100, "sxtab", BinOpFrag<(add node:$LHS, (sext_inreg node:$RHS, i8))>>; def t2SXTAH : T2I_exta_rrot<0b000, "sxtah", BinOpFrag<(add node:$LHS, (sext_inreg node:$RHS,i16))>>; def t2SXTAB16 : T2I_exta_rrot_np<0b010, "sxtab16">; // Zero extenders let AddedComplexity = 16 in { def t2UXTB : T2I_ext_rrot<0b101, "uxtb", UnOpFrag<(and node:$Src, 0x000000FF)>>; def t2UXTH : T2I_ext_rrot<0b001, "uxth", UnOpFrag<(and node:$Src, 0x0000FFFF)>>; def t2UXTB16 : T2I_ext_rrot_uxtb16<0b011, "uxtb16", UnOpFrag<(and node:$Src, 0x00FF00FF)>>; // FIXME: This pattern incorrectly assumes the shl operator is a rotate. // The transformation should probably be done as a combiner action // instead so we can include a check for masking back in the upper // eight bits of the source into the lower eight bits of the result. //def : T2Pat<(and (shl rGPR:$Src, (i32 8)), 0xFF00FF), // (t2UXTB16 rGPR:$Src, 3)>, // Requires<[HasT2ExtractPack, IsThumb2]>; def : T2Pat<(and (srl rGPR:$Src, (i32 8)), 0xFF00FF), (t2UXTB16 rGPR:$Src, 1)>, Requires<[HasT2ExtractPack, IsThumb2]>; def t2UXTAB : T2I_exta_rrot<0b101, "uxtab", BinOpFrag<(add node:$LHS, (and node:$RHS, 0x00FF))>>; def t2UXTAH : T2I_exta_rrot<0b001, "uxtah", BinOpFrag<(add node:$LHS, (and node:$RHS, 0xFFFF))>>; def t2UXTAB16 : T2I_exta_rrot_np<0b011, "uxtab16">; } //===----------------------------------------------------------------------===// // Arithmetic Instructions. 
// defm t2ADD : T2I_bin_ii12rs<0b000, "add", BinOpFrag<(add node:$LHS, node:$RHS)>, 1>; defm t2SUB : T2I_bin_ii12rs<0b101, "sub", BinOpFrag<(sub node:$LHS, node:$RHS)>>; // ADD and SUB with 's' bit set. No 12-bit immediate (T4) variants. // // Currently, t2ADDS/t2SUBS are pseudo opcodes that exist only in the // selection DAG. They are "lowered" to real t2ADD/t2SUB opcodes by // AdjustInstrPostInstrSelection where we determine whether or not to // set the "s" bit based on CPSR liveness. // // FIXME: Eliminate t2ADDS/t2SUBS pseudo opcodes after adding tablegen // support for an optional CPSR definition that corresponds to the DAG // node's second value. We can then eliminate the implicit def of CPSR. defm t2ADDS : T2I_bin_s_irs <0b1000, "add", IIC_iALUi, IIC_iALUr, IIC_iALUsi, BinOpFrag<(ARMaddc node:$LHS, node:$RHS)>, 1>; defm t2SUBS : T2I_bin_s_irs <0b1101, "sub", IIC_iALUi, IIC_iALUr, IIC_iALUsi, BinOpFrag<(ARMsubc node:$LHS, node:$RHS)>>; let hasPostISelHook = 1 in { defm t2ADC : T2I_adde_sube_irs<0b1010, "adc", BinOpWithFlagFrag<(ARMadde node:$LHS, node:$RHS, node:$FLAG)>, 1>; defm t2SBC : T2I_adde_sube_irs<0b1011, "sbc", BinOpWithFlagFrag<(ARMsube node:$LHS, node:$RHS, node:$FLAG)>>; } // RSB defm t2RSB : T2I_rbin_irs <0b1110, "rsb", BinOpFrag<(sub node:$LHS, node:$RHS)>>; // FIXME: Eliminate them if we can write def : Pat patterns which defines // CPSR and the implicit def of CPSR is not needed. defm t2RSBS : T2I_rbin_s_is <0b1110, "rsb", BinOpFrag<(ARMsubc node:$LHS, node:$RHS)>>; // (sub X, imm) gets canonicalized to (add X, -imm). Match this form. // The assume-no-carry-in form uses the negation of the input since add/sub // assume opposite meanings of the carry flag (i.e., carry == !borrow). // See the definition of AddWithCarry() in the ARM ARM A2.2.1 for the gory // details. // The AddedComplexity preferences the first variant over the others since // it can be shrunk to a 16-bit wide encoding, while the others cannot. let AddedComplexity = 1 in def : T2Pat<(add GPR:$src, imm0_255_neg:$imm), (t2SUBri GPR:$src, imm0_255_neg:$imm)>; def : T2Pat<(add GPR:$src, t2_so_imm_neg:$imm), (t2SUBri GPR:$src, t2_so_imm_neg:$imm)>; def : T2Pat<(add GPR:$src, imm0_4095_neg:$imm), (t2SUBri12 GPR:$src, imm0_4095_neg:$imm)>; let AddedComplexity = 1 in def : T2Pat<(ARMaddc rGPR:$src, imm0_255_neg:$imm), (t2SUBSri rGPR:$src, imm0_255_neg:$imm)>; def : T2Pat<(ARMaddc rGPR:$src, t2_so_imm_neg:$imm), (t2SUBSri rGPR:$src, t2_so_imm_neg:$imm)>; // The with-carry-in form matches bitwise not instead of the negation. // Effectively, the inverse interpretation of the carry flag already accounts // for part of the negation. 
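// Illustration: an IR-level (sub x, imm) arrives here as (add x, -imm), and the
// patterns above select t2SUBri / t2SUBri12; for the carry-in case, adding the
// bitwise-not of the immediate (ARMadde) is matched below as t2SBCri.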
let AddedComplexity = 1 in def : T2Pat<(ARMadde rGPR:$src, imm0_255_not:$imm, CPSR), (t2SBCri rGPR:$src, imm0_255_not:$imm)>; def : T2Pat<(ARMadde rGPR:$src, t2_so_imm_not:$imm, CPSR), (t2SBCri rGPR:$src, t2_so_imm_not:$imm)>; // Select Bytes -- for disassembly only def t2SEL : T2ThreeReg<(outs GPR:$Rd), (ins GPR:$Rn, GPR:$Rm), NoItinerary, "sel", "\t$Rd, $Rn, $Rm", []>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-24} = 0b010; let Inst{23} = 0b1; let Inst{22-20} = 0b010; let Inst{15-12} = 0b1111; let Inst{7} = 0b1; let Inst{6-4} = 0b000; } // A6.3.13, A6.3.14, A6.3.15 Parallel addition and subtraction (signed/unsigned) // And Miscellaneous operations -- for disassembly only class T2I_pam op22_20, bits<4> op7_4, string opc, list pat = [/* For disassembly only; pattern left blank */], dag iops = (ins rGPR:$Rn, rGPR:$Rm), string asm = "\t$Rd, $Rn, $Rm"> : T2I<(outs rGPR:$Rd), iops, NoItinerary, opc, asm, pat>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0101; let Inst{22-20} = op22_20; let Inst{15-12} = 0b1111; let Inst{7-4} = op7_4; bits<4> Rd; bits<4> Rn; bits<4> Rm; let Inst{11-8} = Rd; let Inst{19-16} = Rn; let Inst{3-0} = Rm; } // Saturating add/subtract -- for disassembly only def t2QADD : T2I_pam<0b000, 0b1000, "qadd", [(set rGPR:$Rd, (int_arm_qadd rGPR:$Rn, rGPR:$Rm))], (ins rGPR:$Rm, rGPR:$Rn), "\t$Rd, $Rm, $Rn">; def t2QADD16 : T2I_pam<0b001, 0b0001, "qadd16">; def t2QADD8 : T2I_pam<0b000, 0b0001, "qadd8">; def t2QASX : T2I_pam<0b010, 0b0001, "qasx">; def t2QDADD : T2I_pam<0b000, 0b1001, "qdadd", [], (ins rGPR:$Rm, rGPR:$Rn), "\t$Rd, $Rm, $Rn">; def t2QDSUB : T2I_pam<0b000, 0b1011, "qdsub", [], (ins rGPR:$Rm, rGPR:$Rn), "\t$Rd, $Rm, $Rn">; def t2QSAX : T2I_pam<0b110, 0b0001, "qsax">; def t2QSUB : T2I_pam<0b000, 0b1010, "qsub", [(set rGPR:$Rd, (int_arm_qsub rGPR:$Rn, rGPR:$Rm))], (ins rGPR:$Rm, rGPR:$Rn), "\t$Rd, $Rm, $Rn">; def t2QSUB16 : T2I_pam<0b101, 0b0001, "qsub16">; def t2QSUB8 : T2I_pam<0b100, 0b0001, "qsub8">; def t2UQADD16 : T2I_pam<0b001, 0b0101, "uqadd16">; def t2UQADD8 : T2I_pam<0b000, 0b0101, "uqadd8">; def t2UQASX : T2I_pam<0b010, 0b0101, "uqasx">; def t2UQSAX : T2I_pam<0b110, 0b0101, "uqsax">; def t2UQSUB16 : T2I_pam<0b101, 0b0101, "uqsub16">; def t2UQSUB8 : T2I_pam<0b100, 0b0101, "uqsub8">; // Signed/Unsigned add/subtract -- for disassembly only def t2SASX : T2I_pam<0b010, 0b0000, "sasx">; def t2SADD16 : T2I_pam<0b001, 0b0000, "sadd16">; def t2SADD8 : T2I_pam<0b000, 0b0000, "sadd8">; def t2SSAX : T2I_pam<0b110, 0b0000, "ssax">; def t2SSUB16 : T2I_pam<0b101, 0b0000, "ssub16">; def t2SSUB8 : T2I_pam<0b100, 0b0000, "ssub8">; def t2UASX : T2I_pam<0b010, 0b0100, "uasx">; def t2UADD16 : T2I_pam<0b001, 0b0100, "uadd16">; def t2UADD8 : T2I_pam<0b000, 0b0100, "uadd8">; def t2USAX : T2I_pam<0b110, 0b0100, "usax">; def t2USUB16 : T2I_pam<0b101, 0b0100, "usub16">; def t2USUB8 : T2I_pam<0b100, 0b0100, "usub8">; // Signed/Unsigned halving add/subtract -- for disassembly only def t2SHASX : T2I_pam<0b010, 0b0010, "shasx">; def t2SHADD16 : T2I_pam<0b001, 0b0010, "shadd16">; def t2SHADD8 : T2I_pam<0b000, 0b0010, "shadd8">; def t2SHSAX : T2I_pam<0b110, 0b0010, "shsax">; def t2SHSUB16 : T2I_pam<0b101, 0b0010, "shsub16">; def t2SHSUB8 : T2I_pam<0b100, 0b0010, "shsub8">; def t2UHASX : T2I_pam<0b010, 0b0110, "uhasx">; def t2UHADD16 : T2I_pam<0b001, 0b0110, "uhadd16">; def t2UHADD8 : T2I_pam<0b000, 0b0110, "uhadd8">; def t2UHSAX : T2I_pam<0b110, 0b0110, "uhsax">; def t2UHSUB16 : T2I_pam<0b101, 0b0110, "uhsub16">; def 
t2UHSUB8 : T2I_pam<0b100, 0b0110, "uhsub8">; // Helper class for disassembly only // A6.3.16 & A6.3.17 // T2Imac - Thumb2 multiply [accumulate, and absolute difference] instructions. class T2ThreeReg_mac op22_20, bits<4> op7_4, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : T2ThreeReg { let Inst{31-27} = 0b11111; let Inst{26-24} = 0b011; let Inst{23} = long; let Inst{22-20} = op22_20; let Inst{7-4} = op7_4; } class T2FourReg_mac op22_20, bits<4> op7_4, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : T2FourReg { let Inst{31-27} = 0b11111; let Inst{26-24} = 0b011; let Inst{23} = long; let Inst{22-20} = op22_20; let Inst{7-4} = op7_4; } // Unsigned Sum of Absolute Differences [and Accumulate]. def t2USAD8 : T2ThreeReg_mac<0, 0b111, 0b0000, (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), NoItinerary, "usad8", "\t$Rd, $Rn, $Rm", []>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{15-12} = 0b1111; } def t2USADA8 : T2FourReg_mac<0, 0b111, 0b0000, (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), NoItinerary, "usada8", "\t$Rd, $Rn, $Rm, $Ra", []>, Requires<[IsThumb2, HasThumb2DSP]>; // Signed/Unsigned saturate. class T2SatI pattern> : T2I { bits<4> Rd; bits<4> Rn; bits<5> sat_imm; bits<7> sh; let Inst{11-8} = Rd; let Inst{19-16} = Rn; let Inst{4-0} = sat_imm; let Inst{21} = sh{5}; let Inst{14-12} = sh{4-2}; let Inst{7-6} = sh{1-0}; } def t2SSAT: T2SatI< (outs rGPR:$Rd), (ins imm1_32:$sat_imm, rGPR:$Rn, t2_shift_imm:$sh), NoItinerary, "ssat", "\t$Rd, $sat_imm, $Rn$sh", []> { let Inst{31-27} = 0b11110; let Inst{25-22} = 0b1100; let Inst{20} = 0; let Inst{15} = 0; let Inst{5} = 0; } def t2SSAT16: T2SatI< (outs rGPR:$Rd), (ins imm1_16:$sat_imm, rGPR:$Rn), NoItinerary, "ssat16", "\t$Rd, $sat_imm, $Rn", []>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11110; let Inst{25-22} = 0b1100; let Inst{20} = 0; let Inst{15} = 0; let Inst{21} = 1; // sh = '1' let Inst{14-12} = 0b000; // imm3 = '000' let Inst{7-6} = 0b00; // imm2 = '00' let Inst{5-4} = 0b00; } def t2USAT: T2SatI< (outs rGPR:$Rd), (ins imm0_31:$sat_imm, rGPR:$Rn, t2_shift_imm:$sh), NoItinerary, "usat", "\t$Rd, $sat_imm, $Rn$sh", []> { let Inst{31-27} = 0b11110; let Inst{25-22} = 0b1110; let Inst{20} = 0; let Inst{15} = 0; } def t2USAT16: T2SatI<(outs rGPR:$Rd), (ins imm0_15:$sat_imm, rGPR:$Rn), NoItinerary, "usat16", "\t$Rd, $sat_imm, $Rn", []>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-22} = 0b1111001110; let Inst{20} = 0; let Inst{15} = 0; let Inst{21} = 1; // sh = '1' let Inst{14-12} = 0b000; // imm3 = '000' let Inst{7-6} = 0b00; // imm2 = '00' let Inst{5-4} = 0b00; } def : T2Pat<(int_arm_ssat GPR:$a, imm:$pos), (t2SSAT imm:$pos, GPR:$a, 0)>; def : T2Pat<(int_arm_usat GPR:$a, imm:$pos), (t2USAT imm:$pos, GPR:$a, 0)>; //===----------------------------------------------------------------------===// // Shift and rotate Instructions. 
// defm t2LSL : T2I_sh_ir<0b00, "lsl", imm0_31, BinOpFrag<(shl node:$LHS, node:$RHS)>, "t2LSL">; defm t2LSR : T2I_sh_ir<0b01, "lsr", imm_sr, BinOpFrag<(srl node:$LHS, node:$RHS)>, "t2LSR">; defm t2ASR : T2I_sh_ir<0b10, "asr", imm_sr, BinOpFrag<(sra node:$LHS, node:$RHS)>, "t2ASR">; defm t2ROR : T2I_sh_ir<0b11, "ror", imm0_31, BinOpFrag<(rotr node:$LHS, node:$RHS)>, "t2ROR">; // (rotr x, (and y, 0x...1f)) ==> (ROR x, y) def : Pat<(rotr rGPR:$lhs, (and rGPR:$rhs, lo5AllOne)), (t2RORrr rGPR:$lhs, rGPR:$rhs)>; let Uses = [CPSR] in { def t2RRX : T2sTwoReg<(outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iMOVsi, "rrx", "\t$Rd, $Rm", [(set rGPR:$Rd, (ARMrrx rGPR:$Rm))]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = 0b0010; let Inst{19-16} = 0b1111; // Rn let Inst{14-12} = 0b000; let Inst{7-4} = 0b0011; } } let isCodeGenOnly = 1, Defs = [CPSR] in { def t2MOVsrl_flag : T2TwoRegShiftImm< (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iMOVsi, "lsrs", ".w\t$Rd, $Rm, #1", [(set rGPR:$Rd, (ARMsrl_flag rGPR:$Rm))]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = 0b0010; let Inst{20} = 1; // The S bit. let Inst{19-16} = 0b1111; // Rn let Inst{5-4} = 0b01; // Shift type. // Shift amount = Inst{14-12:7-6} = 1. let Inst{14-12} = 0b000; let Inst{7-6} = 0b01; } def t2MOVsra_flag : T2TwoRegShiftImm< (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iMOVsi, "asrs", ".w\t$Rd, $Rm, #1", [(set rGPR:$Rd, (ARMsra_flag rGPR:$Rm))]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = 0b0010; let Inst{20} = 1; // The S bit. let Inst{19-16} = 0b1111; // Rn let Inst{5-4} = 0b10; // Shift type. // Shift amount = Inst{14-12:7-6} = 1. let Inst{14-12} = 0b000; let Inst{7-6} = 0b01; } } //===----------------------------------------------------------------------===// // Bitwise Instructions. // defm t2AND : T2I_bin_w_irs<0b0000, "and", IIC_iBITi, IIC_iBITr, IIC_iBITsi, BinOpFrag<(and node:$LHS, node:$RHS)>, "t2AND", 1>; defm t2ORR : T2I_bin_w_irs<0b0010, "orr", IIC_iBITi, IIC_iBITr, IIC_iBITsi, BinOpFrag<(or node:$LHS, node:$RHS)>, "t2ORR", 1>; defm t2EOR : T2I_bin_w_irs<0b0100, "eor", IIC_iBITi, IIC_iBITr, IIC_iBITsi, BinOpFrag<(xor node:$LHS, node:$RHS)>, "t2EOR", 1>; defm t2BIC : T2I_bin_w_irs<0b0001, "bic", IIC_iBITi, IIC_iBITr, IIC_iBITsi, BinOpFrag<(and node:$LHS, (not node:$RHS))>, "t2BIC">; class T2BitFI pattern> : T2I { bits<4> Rd; bits<5> msb; bits<5> lsb; let Inst{11-8} = Rd; let Inst{4-0} = msb{4-0}; let Inst{14-12} = lsb{4-2}; let Inst{7-6} = lsb{1-0}; } class T2TwoRegBitFI pattern> : T2BitFI { bits<4> Rn; let Inst{19-16} = Rn; } let Constraints = "$src = $Rd" in def t2BFC : T2BitFI<(outs rGPR:$Rd), (ins rGPR:$src, bf_inv_mask_imm:$imm), IIC_iUNAsi, "bfc", "\t$Rd, $imm", [(set rGPR:$Rd, (and rGPR:$src, bf_inv_mask_imm:$imm))]> { let Inst{31-27} = 0b11110; let Inst{26} = 0; // should be 0. let Inst{25} = 1; let Inst{24-20} = 0b10110; let Inst{19-16} = 0b1111; // Rn let Inst{15} = 0; let Inst{5} = 0; // should be 0. 
bits<10> imm; let msb{4-0} = imm{9-5}; let lsb{4-0} = imm{4-0}; } def t2SBFX: T2TwoRegBitFI< (outs rGPR:$Rd), (ins rGPR:$Rn, imm0_31:$lsb, imm1_32:$msb), IIC_iUNAsi, "sbfx", "\t$Rd, $Rn, $lsb, $msb", []> { let Inst{31-27} = 0b11110; let Inst{25} = 1; let Inst{24-20} = 0b10100; let Inst{15} = 0; } def t2UBFX: T2TwoRegBitFI< (outs rGPR:$Rd), (ins rGPR:$Rn, imm0_31:$lsb, imm1_32:$msb), IIC_iUNAsi, "ubfx", "\t$Rd, $Rn, $lsb, $msb", []> { let Inst{31-27} = 0b11110; let Inst{25} = 1; let Inst{24-20} = 0b11100; let Inst{15} = 0; } // A8.6.18 BFI - Bitfield insert (Encoding T1) let Constraints = "$src = $Rd" in { def t2BFI : T2TwoRegBitFI<(outs rGPR:$Rd), (ins rGPR:$src, rGPR:$Rn, bf_inv_mask_imm:$imm), IIC_iBITi, "bfi", "\t$Rd, $Rn, $imm", [(set rGPR:$Rd, (ARMbfi rGPR:$src, rGPR:$Rn, bf_inv_mask_imm:$imm))]> { let Inst{31-27} = 0b11110; let Inst{26} = 0; // should be 0. let Inst{25} = 1; let Inst{24-20} = 0b10110; let Inst{15} = 0; let Inst{5} = 0; // should be 0. bits<10> imm; let msb{4-0} = imm{9-5}; let lsb{4-0} = imm{4-0}; } } defm t2ORN : T2I_bin_irs<0b0011, "orn", IIC_iBITi, IIC_iBITr, IIC_iBITsi, BinOpFrag<(or node:$LHS, (not node:$RHS))>, "t2ORN", 0, "">; /// T2I_un_irs - Defines a set of (op reg, {so_imm|r|so_reg}) patterns for a /// unary operation that produces a value. These are predicable and can be /// changed to modify CPSR. multiclass T2I_un_irs opcod, string opc, InstrItinClass iii, InstrItinClass iir, InstrItinClass iis, PatFrag opnode, bit Cheap = 0, bit ReMat = 0> { // shifted imm def i : T2sOneRegImm<(outs rGPR:$Rd), (ins t2_so_imm:$imm), iii, opc, "\t$Rd, $imm", [(set rGPR:$Rd, (opnode t2_so_imm:$imm))]> { let isAsCheapAsAMove = Cheap; let isReMaterializable = ReMat; let Inst{31-27} = 0b11110; let Inst{25} = 0; let Inst{24-21} = opcod; let Inst{19-16} = 0b1111; // Rn let Inst{15} = 0; } // register def r : T2sTwoReg<(outs rGPR:$Rd), (ins rGPR:$Rm), iir, opc, ".w\t$Rd, $Rm", [(set rGPR:$Rd, (opnode rGPR:$Rm))]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; let Inst{19-16} = 0b1111; // Rn let Inst{14-12} = 0b000; // imm3 let Inst{7-6} = 0b00; // imm2 let Inst{5-4} = 0b00; // type } // shifted register def s : T2sOneRegShiftedReg<(outs rGPR:$Rd), (ins t2_so_reg:$ShiftedRm), iis, opc, ".w\t$Rd, $ShiftedRm", [(set rGPR:$Rd, (opnode t2_so_reg:$ShiftedRm))]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = opcod; let Inst{19-16} = 0b1111; // Rn } } // Prefer over of t2EORri ra, rb, -1 because mvn has 16-bit version let AddedComplexity = 1 in defm t2MVN : T2I_un_irs <0b0011, "mvn", IIC_iMVNi, IIC_iMVNr, IIC_iMVNsi, UnOpFrag<(not node:$Src)>, 1, 1>; let AddedComplexity = 1 in def : T2Pat<(and rGPR:$src, t2_so_imm_not:$imm), (t2BICri rGPR:$src, t2_so_imm_not:$imm)>; // FIXME: Disable this pattern on Darwin to workaround an assembler bug. def : T2Pat<(or rGPR:$src, t2_so_imm_not:$imm), (t2ORNri rGPR:$src, t2_so_imm_not:$imm)>, Requires<[IsThumb2]>; def : T2Pat<(t2_so_imm_not:$src), (t2MVNi t2_so_imm_not:$src)>; //===----------------------------------------------------------------------===// // Multiply Instructions. 
// let isCommutable = 1 in def t2MUL: T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMUL32, "mul", "\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (mul rGPR:$Rn, rGPR:$Rm))]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b000; let Inst{15-12} = 0b1111; // Ra = 0b1111 (no accumulate) let Inst{7-4} = 0b0000; // Multiply } def t2MLA: T2FourReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC32, "mla", "\t$Rd, $Rn, $Rm, $Ra", [(set rGPR:$Rd, (add (mul rGPR:$Rn, rGPR:$Rm), rGPR:$Ra))]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b000; let Inst{7-4} = 0b0000; // Multiply } def t2MLS: T2FourReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC32, "mls", "\t$Rd, $Rn, $Rm, $Ra", [(set rGPR:$Rd, (sub rGPR:$Ra, (mul rGPR:$Rn, rGPR:$Rm)))]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b000; let Inst{7-4} = 0b0001; // Multiply and Subtract } // Extra precision multiplies with low / high results let neverHasSideEffects = 1 in { let isCommutable = 1 in { def t2SMULL : T2MulLong<0b000, 0b0000, (outs rGPR:$RdLo, rGPR:$RdHi), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMUL64, "smull", "\t$RdLo, $RdHi, $Rn, $Rm", []>; def t2UMULL : T2MulLong<0b010, 0b0000, (outs rGPR:$RdLo, rGPR:$RdHi), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMUL64, "umull", "\t$RdLo, $RdHi, $Rn, $Rm", []>; } // isCommutable // Multiply + accumulate def t2SMLAL : T2MulLong<0b100, 0b0000, (outs rGPR:$RdLo, rGPR:$RdHi), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMAC64, "smlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>; def t2UMLAL : T2MulLong<0b110, 0b0000, (outs rGPR:$RdLo, rGPR:$RdHi), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMAC64, "umlal", "\t$RdLo, $RdHi, $Rn, $Rm", []>; def t2UMAAL : T2MulLong<0b110, 0b0110, (outs rGPR:$RdLo, rGPR:$RdHi), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMAC64, "umaal", "\t$RdLo, $RdHi, $Rn, $Rm", []>, Requires<[IsThumb2, HasThumb2DSP]>; } // neverHasSideEffects // Rounding variants of the below included for disassembly only // Most significant word multiply def t2SMMUL : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMUL32, "smmul", "\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (mulhs rGPR:$Rn, rGPR:$Rm))]>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b101; let Inst{15-12} = 0b1111; // Ra = 0b1111 (no accumulate) let Inst{7-4} = 0b0000; // No Rounding (Inst{4} = 0) } def t2SMMULR : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMUL32, "smmulr", "\t$Rd, $Rn, $Rm", []>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b101; let Inst{15-12} = 0b1111; // Ra = 0b1111 (no accumulate) let Inst{7-4} = 0b0001; // Rounding (Inst{4} = 1) } def t2SMMLA : T2FourReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC32, "smmla", "\t$Rd, $Rn, $Rm, $Ra", [(set rGPR:$Rd, (add (mulhs rGPR:$Rm, rGPR:$Rn), rGPR:$Ra))]>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b101; let Inst{7-4} = 0b0000; // No Rounding (Inst{4} = 0) } def t2SMMLAR: T2FourReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC32, "smmlar", "\t$Rd, $Rn, $Rm, $Ra", []>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b101; let Inst{7-4} = 0b0001; // Rounding (Inst{4} = 1) } def t2SMMLS: T2FourReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC32, "smmls", "\t$Rd, $Rn, $Rm, $Ra", [(set rGPR:$Rd, (sub rGPR:$Ra, (mulhs 
rGPR:$Rn, rGPR:$Rm)))]>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b110; let Inst{7-4} = 0b0000; // No Rounding (Inst{4} = 0) } def t2SMMLSR:T2FourReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC32, "smmlsr", "\t$Rd, $Rn, $Rm, $Ra", []>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b110; let Inst{7-4} = 0b0001; // Rounding (Inst{4} = 1) } multiclass T2I_smul { def BB : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMUL16, !strconcat(opc, "bb"), "\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (opnode (sext_inreg rGPR:$Rn, i16), (sext_inreg rGPR:$Rm, i16)))]>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b001; let Inst{15-12} = 0b1111; // Ra = 0b1111 (no accumulate) let Inst{7-6} = 0b00; let Inst{5-4} = 0b00; } def BT : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMUL16, !strconcat(opc, "bt"), "\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (opnode (sext_inreg rGPR:$Rn, i16), (sra rGPR:$Rm, (i32 16))))]>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b001; let Inst{15-12} = 0b1111; // Ra = 0b1111 (no accumulate) let Inst{7-6} = 0b00; let Inst{5-4} = 0b01; } def TB : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMUL16, !strconcat(opc, "tb"), "\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (opnode (sra rGPR:$Rn, (i32 16)), (sext_inreg rGPR:$Rm, i16)))]>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b001; let Inst{15-12} = 0b1111; // Ra = 0b1111 (no accumulate) let Inst{7-6} = 0b00; let Inst{5-4} = 0b10; } def TT : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMUL16, !strconcat(opc, "tt"), "\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (opnode (sra rGPR:$Rn, (i32 16)), (sra rGPR:$Rm, (i32 16))))]>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b001; let Inst{15-12} = 0b1111; // Ra = 0b1111 (no accumulate) let Inst{7-6} = 0b00; let Inst{5-4} = 0b11; } def WB : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMUL16, !strconcat(opc, "wb"), "\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (sra (opnode rGPR:$Rn, (sext_inreg rGPR:$Rm, i16)), (i32 16)))]>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b011; let Inst{15-12} = 0b1111; // Ra = 0b1111 (no accumulate) let Inst{7-6} = 0b00; let Inst{5-4} = 0b00; } def WT : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMUL16, !strconcat(opc, "wt"), "\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (sra (opnode rGPR:$Rn, (sra rGPR:$Rm, (i32 16))), (i32 16)))]>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b011; let Inst{15-12} = 0b1111; // Ra = 0b1111 (no accumulate) let Inst{7-6} = 0b00; let Inst{5-4} = 0b01; } } multiclass T2I_smla { def BB : T2FourReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC16, !strconcat(opc, "bb"), "\t$Rd, $Rn, $Rm, $Ra", [(set rGPR:$Rd, (add rGPR:$Ra, (opnode (sext_inreg rGPR:$Rn, i16), (sext_inreg rGPR:$Rm, i16))))]>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b001; let Inst{7-6} = 0b00; let Inst{5-4} = 0b00; } def BT : T2FourReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC16, !strconcat(opc, "bt"), "\t$Rd, $Rn, $Rm, 
$Ra", [(set rGPR:$Rd, (add rGPR:$Ra, (opnode (sext_inreg rGPR:$Rn, i16), (sra rGPR:$Rm, (i32 16)))))]>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b001; let Inst{7-6} = 0b00; let Inst{5-4} = 0b01; } def TB : T2FourReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC16, !strconcat(opc, "tb"), "\t$Rd, $Rn, $Rm, $Ra", [(set rGPR:$Rd, (add rGPR:$Ra, (opnode (sra rGPR:$Rn, (i32 16)), (sext_inreg rGPR:$Rm, i16))))]>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b001; let Inst{7-6} = 0b00; let Inst{5-4} = 0b10; } def TT : T2FourReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC16, !strconcat(opc, "tt"), "\t$Rd, $Rn, $Rm, $Ra", [(set rGPR:$Rd, (add rGPR:$Ra, (opnode (sra rGPR:$Rn, (i32 16)), (sra rGPR:$Rm, (i32 16)))))]>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b001; let Inst{7-6} = 0b00; let Inst{5-4} = 0b11; } def WB : T2FourReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC16, !strconcat(opc, "wb"), "\t$Rd, $Rn, $Rm, $Ra", [(set rGPR:$Rd, (add rGPR:$Ra, (sra (opnode rGPR:$Rn, (sext_inreg rGPR:$Rm, i16)), (i32 16))))]>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b011; let Inst{7-6} = 0b00; let Inst{5-4} = 0b00; } def WT : T2FourReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC16, !strconcat(opc, "wt"), "\t$Rd, $Rn, $Rm, $Ra", [(set rGPR:$Rd, (add rGPR:$Ra, (sra (opnode rGPR:$Rn, (sra rGPR:$Rm, (i32 16))), (i32 16))))]>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{31-27} = 0b11111; let Inst{26-23} = 0b0110; let Inst{22-20} = 0b011; let Inst{7-6} = 0b00; let Inst{5-4} = 0b01; } } defm t2SMUL : T2I_smul<"smul", BinOpFrag<(mul node:$LHS, node:$RHS)>>; defm t2SMLA : T2I_smla<"smla", BinOpFrag<(mul node:$LHS, node:$RHS)>>; // Halfword multiple accumulate long: SMLAL def t2SMLALBB : T2FourReg_mac<1, 0b100, 0b1000, (outs rGPR:$Ra,rGPR:$Rd), (ins rGPR:$Rn,rGPR:$Rm), IIC_iMAC64, "smlalbb", "\t$Ra, $Rd, $Rn, $Rm", [/* For disassembly only; pattern left blank */]>, Requires<[IsThumb2, HasThumb2DSP]>; def t2SMLALBT : T2FourReg_mac<1, 0b100, 0b1001, (outs rGPR:$Ra,rGPR:$Rd), (ins rGPR:$Rn,rGPR:$Rm), IIC_iMAC64, "smlalbt", "\t$Ra, $Rd, $Rn, $Rm", [/* For disassembly only; pattern left blank */]>, Requires<[IsThumb2, HasThumb2DSP]>; def t2SMLALTB : T2FourReg_mac<1, 0b100, 0b1010, (outs rGPR:$Ra,rGPR:$Rd), (ins rGPR:$Rn,rGPR:$Rm), IIC_iMAC64, "smlaltb", "\t$Ra, $Rd, $Rn, $Rm", [/* For disassembly only; pattern left blank */]>, Requires<[IsThumb2, HasThumb2DSP]>; def t2SMLALTT : T2FourReg_mac<1, 0b100, 0b1011, (outs rGPR:$Ra,rGPR:$Rd), (ins rGPR:$Rn,rGPR:$Rm), IIC_iMAC64, "smlaltt", "\t$Ra, $Rd, $Rn, $Rm", [/* For disassembly only; pattern left blank */]>, Requires<[IsThumb2, HasThumb2DSP]>; // Dual halfword multiple: SMUAD, SMUSD, SMLAD, SMLSD, SMLALD, SMLSLD def t2SMUAD: T2ThreeReg_mac< 0, 0b010, 0b0000, (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMAC32, "smuad", "\t$Rd, $Rn, $Rm", []>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{15-12} = 0b1111; } def t2SMUADX:T2ThreeReg_mac< 0, 0b010, 0b0001, (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMAC32, "smuadx", "\t$Rd, $Rn, $Rm", []>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{15-12} = 0b1111; } def t2SMUSD: T2ThreeReg_mac< 0, 0b100, 0b0000, (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMAC32, "smusd", "\t$Rd, $Rn, $Rm", []>, 
Requires<[IsThumb2, HasThumb2DSP]> { let Inst{15-12} = 0b1111; } def t2SMUSDX:T2ThreeReg_mac< 0, 0b100, 0b0001, (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMAC32, "smusdx", "\t$Rd, $Rn, $Rm", []>, Requires<[IsThumb2, HasThumb2DSP]> { let Inst{15-12} = 0b1111; } def t2SMLAD : T2FourReg_mac< 0, 0b010, 0b0000, (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC32, "smlad", "\t$Rd, $Rn, $Rm, $Ra", []>, Requires<[IsThumb2, HasThumb2DSP]>; def t2SMLADX : T2FourReg_mac< 0, 0b010, 0b0001, (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC32, "smladx", "\t$Rd, $Rn, $Rm, $Ra", []>, Requires<[IsThumb2, HasThumb2DSP]>; def t2SMLSD : T2FourReg_mac<0, 0b100, 0b0000, (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC32, "smlsd", "\t$Rd, $Rn, $Rm, $Ra", []>, Requires<[IsThumb2, HasThumb2DSP]>; def t2SMLSDX : T2FourReg_mac<0, 0b100, 0b0001, (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, rGPR:$Ra), IIC_iMAC32, "smlsdx", "\t$Rd, $Rn, $Rm, $Ra", []>, Requires<[IsThumb2, HasThumb2DSP]>; def t2SMLALD : T2FourReg_mac<1, 0b100, 0b1100, (outs rGPR:$Ra,rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iMAC64, "smlald", "\t$Ra, $Rd, $Rn, $Rm", []>, Requires<[IsThumb2, HasThumb2DSP]>; def t2SMLALDX : T2FourReg_mac<1, 0b100, 0b1101, (outs rGPR:$Ra,rGPR:$Rd), (ins rGPR:$Rn,rGPR:$Rm), IIC_iMAC64, "smlaldx", "\t$Ra, $Rd, $Rn, $Rm", []>, Requires<[IsThumb2, HasThumb2DSP]>; def t2SMLSLD : T2FourReg_mac<1, 0b101, 0b1100, (outs rGPR:$Ra,rGPR:$Rd), (ins rGPR:$Rn,rGPR:$Rm), IIC_iMAC64, "smlsld", "\t$Ra, $Rd, $Rn, $Rm", []>, Requires<[IsThumb2, HasThumb2DSP]>; def t2SMLSLDX : T2FourReg_mac<1, 0b101, 0b1101, (outs rGPR:$Ra,rGPR:$Rd), (ins rGPR:$Rm,rGPR:$Rn), IIC_iMAC64, "smlsldx", "\t$Ra, $Rd, $Rn, $Rm", []>, Requires<[IsThumb2, HasThumb2DSP]>; //===----------------------------------------------------------------------===// // Division Instructions. // Signed and unsigned division on v7-M // def t2SDIV : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iALUi, "sdiv", "\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (sdiv rGPR:$Rn, rGPR:$Rm))]>, Requires<[HasDivide, IsThumb2]> { let Inst{31-27} = 0b11111; let Inst{26-21} = 0b011100; let Inst{20} = 0b1; let Inst{15-12} = 0b1111; let Inst{7-4} = 0b1111; } def t2UDIV : T2ThreeReg<(outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm), IIC_iALUi, "udiv", "\t$Rd, $Rn, $Rm", [(set rGPR:$Rd, (udiv rGPR:$Rn, rGPR:$Rm))]>, Requires<[HasDivide, IsThumb2]> { let Inst{31-27} = 0b11111; let Inst{26-21} = 0b011101; let Inst{20} = 0b1; let Inst{15-12} = 0b1111; let Inst{7-4} = 0b1111; } //===----------------------------------------------------------------------===// // Misc. Arithmetic Instructions. 
// class T2I_misc op1, bits<2> op2, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : T2ThreeReg { let Inst{31-27} = 0b11111; let Inst{26-22} = 0b01010; let Inst{21-20} = op1; let Inst{15-12} = 0b1111; let Inst{7-6} = 0b10; let Inst{5-4} = op2; let Rn{3-0} = Rm; } def t2CLZ : T2I_misc<0b11, 0b00, (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iUNAr, "clz", "\t$Rd, $Rm", [(set rGPR:$Rd, (ctlz rGPR:$Rm))]>; def t2RBIT : T2I_misc<0b01, 0b10, (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iUNAr, "rbit", "\t$Rd, $Rm", [(set rGPR:$Rd, (ARMrbit rGPR:$Rm))]>; def t2REV : T2I_misc<0b01, 0b00, (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iUNAr, "rev", ".w\t$Rd, $Rm", [(set rGPR:$Rd, (bswap rGPR:$Rm))]>; def t2REV16 : T2I_misc<0b01, 0b01, (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iUNAr, "rev16", ".w\t$Rd, $Rm", [(set rGPR:$Rd, (rotr (bswap rGPR:$Rm), (i32 16)))]>; def t2REVSH : T2I_misc<0b01, 0b11, (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iUNAr, "revsh", ".w\t$Rd, $Rm", [(set rGPR:$Rd, (sra (bswap rGPR:$Rm), (i32 16)))]>; def : T2Pat<(or (sra (shl rGPR:$Rm, (i32 24)), (i32 16)), (and (srl rGPR:$Rm, (i32 8)), 0xFF)), (t2REVSH rGPR:$Rm)>; def t2PKHBT : T2ThreeReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, pkh_lsl_amt:$sh), IIC_iBITsi, "pkhbt", "\t$Rd, $Rn, $Rm$sh", [(set rGPR:$Rd, (or (and rGPR:$Rn, 0xFFFF), (and (shl rGPR:$Rm, pkh_lsl_amt:$sh), 0xFFFF0000)))]>, Requires<[HasT2ExtractPack, IsThumb2]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-20} = 0b01100; let Inst{5} = 0; // BT form let Inst{4} = 0; bits<5> sh; let Inst{14-12} = sh{4-2}; let Inst{7-6} = sh{1-0}; } // Alternate cases for PKHBT where identities eliminate some nodes. def : T2Pat<(or (and rGPR:$src1, 0xFFFF), (and rGPR:$src2, 0xFFFF0000)), (t2PKHBT rGPR:$src1, rGPR:$src2, 0)>, Requires<[HasT2ExtractPack, IsThumb2]>; def : T2Pat<(or (and rGPR:$src1, 0xFFFF), (shl rGPR:$src2, imm16_31:$sh)), (t2PKHBT rGPR:$src1, rGPR:$src2, imm16_31:$sh)>, Requires<[HasT2ExtractPack, IsThumb2]>; // Note: Shifts of 1-15 bits will be transformed to srl instead of sra and // will match the pattern below. def t2PKHTB : T2ThreeReg< (outs rGPR:$Rd), (ins rGPR:$Rn, rGPR:$Rm, pkh_asr_amt:$sh), IIC_iBITsi, "pkhtb", "\t$Rd, $Rn, $Rm$sh", [(set rGPR:$Rd, (or (and rGPR:$Rn, 0xFFFF0000), (and (sra rGPR:$Rm, pkh_asr_amt:$sh), 0xFFFF)))]>, Requires<[HasT2ExtractPack, IsThumb2]> { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-20} = 0b01100; let Inst{5} = 1; // TB form let Inst{4} = 0; bits<5> sh; let Inst{14-12} = sh{4-2}; let Inst{7-6} = sh{1-0}; } // Alternate cases for PKHTB where identities eliminate some nodes. Note that // a shift amount of 0 is *not legal* here, it is PKHBT instead. def : T2Pat<(or (and rGPR:$src1, 0xFFFF0000), (srl rGPR:$src2, imm16_31:$sh)), (t2PKHTB rGPR:$src1, rGPR:$src2, imm16_31:$sh)>, Requires<[HasT2ExtractPack, IsThumb2]>; def : T2Pat<(or (and rGPR:$src1, 0xFFFF0000), (and (srl rGPR:$src2, imm1_15:$sh), 0xFFFF)), (t2PKHTB rGPR:$src1, rGPR:$src2, imm1_15:$sh)>, Requires<[HasT2ExtractPack, IsThumb2]>; //===----------------------------------------------------------------------===// // Comparison Instructions... 
// defm t2CMP : T2I_cmp_irs<0b1101, "cmp", IIC_iCMPi, IIC_iCMPr, IIC_iCMPsi, BinOpFrag<(ARMcmp node:$LHS, node:$RHS)>, "t2CMP">; def : T2Pat<(ARMcmpZ GPRnopc:$lhs, t2_so_imm:$imm), (t2CMPri GPRnopc:$lhs, t2_so_imm:$imm)>; def : T2Pat<(ARMcmpZ GPRnopc:$lhs, rGPR:$rhs), (t2CMPrr GPRnopc:$lhs, rGPR:$rhs)>; def : T2Pat<(ARMcmpZ GPRnopc:$lhs, t2_so_reg:$rhs), (t2CMPrs GPRnopc:$lhs, t2_so_reg:$rhs)>; //FIXME: Disable CMN, as CCodes are backwards from compare expectations // Compare-to-zero still works out, just not the relationals //defm t2CMN : T2I_cmp_irs<0b1000, "cmn", // BinOpFrag<(ARMcmp node:$LHS,(ineg node:$RHS))>>; defm t2CMNz : T2I_cmp_irs<0b1000, "cmn", IIC_iCMPi, IIC_iCMPr, IIC_iCMPsi, BinOpFrag<(ARMcmpZ node:$LHS,(ineg node:$RHS))>, "t2CMNz">; //def : T2Pat<(ARMcmp GPR:$src, t2_so_imm_neg:$imm), // (t2CMNri GPR:$src, t2_so_imm_neg:$imm)>; def : T2Pat<(ARMcmpZ GPRnopc:$src, t2_so_imm_neg:$imm), (t2CMNzri GPRnopc:$src, t2_so_imm_neg:$imm)>; defm t2TST : T2I_cmp_irs<0b0000, "tst", IIC_iTSTi, IIC_iTSTr, IIC_iTSTsi, BinOpFrag<(ARMcmpZ (and_su node:$LHS, node:$RHS), 0)>, "t2TST">; defm t2TEQ : T2I_cmp_irs<0b0100, "teq", IIC_iTSTi, IIC_iTSTr, IIC_iTSTsi, BinOpFrag<(ARMcmpZ (xor_su node:$LHS, node:$RHS), 0)>, "t2TEQ">; // Conditional moves // FIXME: should be able to write a pattern for ARMcmov, but can't use // a two-value operand where a dag node expects two operands. :( let neverHasSideEffects = 1 in { def t2MOVCCr : t2PseudoInst<(outs rGPR:$Rd), (ins rGPR:$false, rGPR:$Rm, pred:$p), 4, IIC_iCMOVr, [/*(set rGPR:$Rd, (ARMcmov rGPR:$false, rGPR:$Rm, imm:$cc, CCR:$ccr))*/]>, RegConstraint<"$false = $Rd">; let isMoveImm = 1 in def t2MOVCCi : t2PseudoInst<(outs rGPR:$Rd), (ins rGPR:$false, t2_so_imm:$imm, pred:$p), 4, IIC_iCMOVi, [/*(set rGPR:$Rd,(ARMcmov rGPR:$false,t2_so_imm:$imm, imm:$cc, CCR:$ccr))*/]>, RegConstraint<"$false = $Rd">; // FIXME: Pseudo-ize these. For now, just mark codegen only. let isCodeGenOnly = 1 in { let isMoveImm = 1 in def t2MOVCCi16 : T2I<(outs rGPR:$Rd), (ins rGPR:$false, imm0_65535_expr:$imm), IIC_iCMOVi, "movw", "\t$Rd, $imm", []>, RegConstraint<"$false = $Rd"> { let Inst{31-27} = 0b11110; let Inst{25} = 1; let Inst{24-21} = 0b0010; let Inst{20} = 0; // The S bit. let Inst{15} = 0; bits<4> Rd; bits<16> imm; let Inst{11-8} = Rd; let Inst{19-16} = imm{15-12}; let Inst{26} = imm{11}; let Inst{14-12} = imm{10-8}; let Inst{7-0} = imm{7-0}; } let isMoveImm = 1 in def t2MOVCCi32imm : PseudoInst<(outs rGPR:$dst), (ins rGPR:$false, i32imm:$src, pred:$p), IIC_iCMOVix2, []>, RegConstraint<"$false = $dst">; let isMoveImm = 1 in def t2MVNCCi : T2OneRegImm<(outs rGPR:$Rd), (ins rGPR:$false, t2_so_imm:$imm), IIC_iCMOVi, "mvn", ".w\t$Rd, $imm", [/*(set rGPR:$Rd,(ARMcmov rGPR:$false,t2_so_imm_not:$imm, imm:$cc, CCR:$ccr))*/]>, RegConstraint<"$false = $Rd"> { let Inst{31-27} = 0b11110; let Inst{25} = 0; let Inst{24-21} = 0b0011; let Inst{20} = 0; // The S bit. let Inst{19-16} = 0b1111; // Rn let Inst{15} = 0; } class T2I_movcc_sh opcod, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : T2TwoRegShiftImm { let Inst{31-27} = 0b11101; let Inst{26-25} = 0b01; let Inst{24-21} = 0b0010; let Inst{20} = 0; // The S bit. let Inst{19-16} = 0b1111; // Rn let Inst{5-4} = opcod; // Shift type. 
} def t2MOVCClsl : T2I_movcc_sh<0b00, (outs rGPR:$Rd), (ins rGPR:$false, rGPR:$Rm, i32imm:$imm), IIC_iCMOVsi, "lsl", ".w\t$Rd, $Rm, $imm", []>, RegConstraint<"$false = $Rd">; def t2MOVCClsr : T2I_movcc_sh<0b01, (outs rGPR:$Rd), (ins rGPR:$false, rGPR:$Rm, i32imm:$imm), IIC_iCMOVsi, "lsr", ".w\t$Rd, $Rm, $imm", []>, RegConstraint<"$false = $Rd">; def t2MOVCCasr : T2I_movcc_sh<0b10, (outs rGPR:$Rd), (ins rGPR:$false, rGPR:$Rm, i32imm:$imm), IIC_iCMOVsi, "asr", ".w\t$Rd, $Rm, $imm", []>, RegConstraint<"$false = $Rd">; def t2MOVCCror : T2I_movcc_sh<0b11, (outs rGPR:$Rd), (ins rGPR:$false, rGPR:$Rm, i32imm:$imm), IIC_iCMOVsi, "ror", ".w\t$Rd, $Rm, $imm", []>, RegConstraint<"$false = $Rd">; } // isCodeGenOnly = 1 } // neverHasSideEffects //===----------------------------------------------------------------------===// // Atomic operations intrinsics // // memory barriers protect the atomic sequences let hasSideEffects = 1 in { def t2DMB : AInoP<(outs), (ins memb_opt:$opt), ThumbFrm, NoItinerary, "dmb", "\t$opt", [(ARMMemBarrier (i32 imm:$opt))]>, Requires<[IsThumb, HasDB]> { bits<4> opt; let Inst{31-4} = 0xf3bf8f5; let Inst{3-0} = opt; } } def t2DSB : AInoP<(outs), (ins memb_opt:$opt), ThumbFrm, NoItinerary, "dsb", "\t$opt", []>, Requires<[IsThumb, HasDB]> { bits<4> opt; let Inst{31-4} = 0xf3bf8f4; let Inst{3-0} = opt; } def t2ISB : AInoP<(outs), (ins memb_opt:$opt), ThumbFrm, NoItinerary, "isb", "\t$opt", []>, Requires<[IsThumb2, HasDB]> { bits<4> opt; let Inst{31-4} = 0xf3bf8f6; let Inst{3-0} = opt; } class T2I_ldrex opcod, dag oops, dag iops, AddrMode am, int sz, InstrItinClass itin, string opc, string asm, string cstr, list pattern, bits<4> rt2 = 0b1111> : Thumb2I { let Inst{31-27} = 0b11101; let Inst{26-20} = 0b0001101; let Inst{11-8} = rt2; let Inst{7-6} = 0b01; let Inst{5-4} = opcod; let Inst{3-0} = 0b1111; bits<4> addr; bits<4> Rt; let Inst{19-16} = addr; let Inst{15-12} = Rt; } class T2I_strex opcod, dag oops, dag iops, AddrMode am, int sz, InstrItinClass itin, string opc, string asm, string cstr, list pattern, bits<4> rt2 = 0b1111> : Thumb2I { let Inst{31-27} = 0b11101; let Inst{26-20} = 0b0001100; let Inst{11-8} = rt2; let Inst{7-6} = 0b01; let Inst{5-4} = opcod; bits<4> Rd; bits<4> addr; bits<4> Rt; let Inst{3-0} = Rd; let Inst{19-16} = addr; let Inst{15-12} = Rt; } let mayLoad = 1 in { def t2LDREXB : T2I_ldrex<0b00, (outs rGPR:$Rt), (ins addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "ldrexb", "\t$Rt, $addr", "", []>; def t2LDREXH : T2I_ldrex<0b01, (outs rGPR:$Rt), (ins addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "ldrexh", "\t$Rt, $addr", "", []>; def t2LDREX : Thumb2I<(outs rGPR:$Rt), (ins t2addrmode_imm0_1020s4:$addr), AddrModeNone, 4, NoItinerary, "ldrex", "\t$Rt, $addr", "", []> { bits<4> Rt; bits<12> addr; let Inst{31-27} = 0b11101; let Inst{26-20} = 0b0000101; let Inst{19-16} = addr{11-8}; let Inst{15-12} = Rt; let Inst{11-8} = 0b1111; let Inst{7-0} = addr{7-0}; } let hasExtraDefRegAllocReq = 1 in def t2LDREXD : T2I_ldrex<0b11, (outs rGPR:$Rt, rGPR:$Rt2), (ins addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "ldrexd", "\t$Rt, $Rt2, $addr", "", [], {?, ?, ?, ?}> { bits<4> Rt2; let Inst{11-8} = Rt2; } } let mayStore = 1, Constraints = "@earlyclobber $Rd" in { def t2STREXB : T2I_strex<0b00, (outs rGPR:$Rd), (ins rGPR:$Rt, addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "strexb", "\t$Rd, $Rt, $addr", "", []>; def t2STREXH : T2I_strex<0b01, (outs rGPR:$Rd), (ins rGPR:$Rt, addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "strexh", "\t$Rd, $Rt, 
$addr", "", []>; def t2STREX : Thumb2I<(outs rGPR:$Rd), (ins rGPR:$Rt, t2addrmode_imm0_1020s4:$addr), AddrModeNone, 4, NoItinerary, "strex", "\t$Rd, $Rt, $addr", "", []> { bits<4> Rd; bits<4> Rt; bits<12> addr; let Inst{31-27} = 0b11101; let Inst{26-20} = 0b0000100; let Inst{19-16} = addr{11-8}; let Inst{15-12} = Rt; let Inst{11-8} = Rd; let Inst{7-0} = addr{7-0}; } } let hasExtraSrcRegAllocReq = 1, Constraints = "@earlyclobber $Rd" in def t2STREXD : T2I_strex<0b11, (outs rGPR:$Rd), (ins rGPR:$Rt, rGPR:$Rt2, addr_offset_none:$addr), AddrModeNone, 4, NoItinerary, "strexd", "\t$Rd, $Rt, $Rt2, $addr", "", [], {?, ?, ?, ?}> { bits<4> Rt2; let Inst{11-8} = Rt2; } def t2CLREX : T2I<(outs), (ins), NoItinerary, "clrex", "", []>, Requires<[IsThumb2, HasV7]> { let Inst{31-16} = 0xf3bf; let Inst{15-14} = 0b10; let Inst{13} = 0; let Inst{12} = 0; let Inst{11-8} = 0b1111; let Inst{7-4} = 0b0010; let Inst{3-0} = 0b1111; } //===----------------------------------------------------------------------===// // SJLJ Exception handling intrinsics // eh_sjlj_setjmp() is an instruction sequence to store the return // address and save #0 in R0 for the non-longjmp case. // Since by its nature we may be coming from some other function to get // here, and we're using the stack frame for the containing function to // save/restore registers, we can't keep anything live in regs across // the eh_sjlj_setjmp(), else it will almost certainly have been tromped upon // when we get here from a longjmp(). We force everything out of registers // except for our own input by listing the relevant registers in Defs. By // doing so, we also cause the prologue/epilogue code to actively preserve // all of the callee-saved resgisters, which is exactly what we want. // $val is a scratch register for our use. let Defs = [ R0, R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, LR, CPSR, QQQQ0, QQQQ1, QQQQ2, QQQQ3 ], hasSideEffects = 1, isBarrier = 1, isCodeGenOnly = 1 in { def t2Int_eh_sjlj_setjmp : Thumb2XI<(outs), (ins tGPR:$src, tGPR:$val), AddrModeNone, 0, NoItinerary, "", "", [(set R0, (ARMeh_sjlj_setjmp tGPR:$src, tGPR:$val))]>, Requires<[IsThumb2, HasVFP2]>; } let Defs = [ R0, R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, LR, CPSR ], hasSideEffects = 1, isBarrier = 1, isCodeGenOnly = 1 in { def t2Int_eh_sjlj_setjmp_nofp : Thumb2XI<(outs), (ins tGPR:$src, tGPR:$val), AddrModeNone, 0, NoItinerary, "", "", [(set R0, (ARMeh_sjlj_setjmp tGPR:$src, tGPR:$val))]>, Requires<[IsThumb2, NoVFP]>; } //===----------------------------------------------------------------------===// // Control-Flow Instructions // // FIXME: remove when we have a way to marking a MI with these properties. // FIXME: Should pc be an implicit operand like PICADD, etc? 
let isReturn = 1, isTerminator = 1, isBarrier = 1, mayLoad = 1, hasExtraDefRegAllocReq = 1, isCodeGenOnly = 1 in def t2LDMIA_RET: t2PseudoExpand<(outs GPR:$wb), (ins GPR:$Rn, pred:$p, reglist:$regs, variable_ops), 4, IIC_iLoad_mBr, [], (t2LDMIA_UPD GPR:$wb, GPR:$Rn, pred:$p, reglist:$regs)>, RegConstraint<"$Rn = $wb">; let isBranch = 1, isTerminator = 1, isBarrier = 1 in { let isPredicable = 1 in def t2B : T2I<(outs), (ins uncondbrtarget:$target), IIC_Br, "b", ".w\t$target", [(br bb:$target)]> { let Inst{31-27} = 0b11110; let Inst{15-14} = 0b10; let Inst{12} = 1; bits<20> target; let Inst{26} = target{19}; let Inst{11} = target{18}; let Inst{13} = target{17}; let Inst{21-16} = target{16-11}; let Inst{10-0} = target{10-0}; } let isNotDuplicable = 1, isIndirectBranch = 1 in { def t2BR_JT : t2PseudoInst<(outs), (ins GPR:$target, GPR:$index, i32imm:$jt, i32imm:$id), 0, IIC_Br, [(ARMbr2jt GPR:$target, GPR:$index, tjumptable:$jt, imm:$id)]>; // FIXME: Add a non-pc based case that can be predicated. def t2TBB_JT : t2PseudoInst<(outs), (ins GPR:$index, i32imm:$jt, i32imm:$id), 0, IIC_Br, []>; def t2TBH_JT : t2PseudoInst<(outs), (ins GPR:$index, i32imm:$jt, i32imm:$id), 0, IIC_Br, []>; def t2TBB : T2I<(outs), (ins addrmode_tbb:$addr), IIC_Br, "tbb", "\t$addr", []> { bits<4> Rn; bits<4> Rm; let Inst{31-20} = 0b111010001101; let Inst{19-16} = Rn; let Inst{15-5} = 0b11110000000; let Inst{4} = 0; // B form let Inst{3-0} = Rm; let DecoderMethod = "DecodeThumbTableBranch"; } def t2TBH : T2I<(outs), (ins addrmode_tbh:$addr), IIC_Br, "tbh", "\t$addr", []> { bits<4> Rn; bits<4> Rm; let Inst{31-20} = 0b111010001101; let Inst{19-16} = Rn; let Inst{15-5} = 0b11110000000; let Inst{4} = 1; // H form let Inst{3-0} = Rm; let DecoderMethod = "DecodeThumbTableBranch"; } } // isNotDuplicable, isIndirectBranch } // isBranch, isTerminator, isBarrier // FIXME: should be able to write a pattern for ARMBrcond, but can't use // a two-value operand where a dag node expects ", "two operands. :( let isBranch = 1, isTerminator = 1 in def t2Bcc : T2I<(outs), (ins brtarget:$target), IIC_Br, "b", ".w\t$target", [/*(ARMbrcond bb:$target, imm:$cc)*/]> { let Inst{31-27} = 0b11110; let Inst{15-14} = 0b10; let Inst{12} = 0; bits<4> p; let Inst{25-22} = p; bits<21> target; let Inst{26} = target{20}; let Inst{11} = target{19}; let Inst{13} = target{18}; let Inst{21-16} = target{17-12}; let Inst{10-0} = target{11-1}; let DecoderMethod = "DecodeThumb2BCCInstruction"; } // Tail calls. The Darwin version of thumb tail calls uses a t2 branch, so // it goes here. let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1 in { // Darwin version. let Defs = [R0, R1, R2, R3, R9, R12, QQQQ0, QQQQ2, QQQQ3, PC], Uses = [SP] in def tTAILJMPd: tPseudoExpand<(outs), (ins uncondbrtarget:$dst, pred:$p, variable_ops), 4, IIC_Br, [], (t2B uncondbrtarget:$dst, pred:$p)>, Requires<[IsThumb2, IsDarwin]>; } // IT block let Defs = [ITSTATE] in def t2IT : Thumb2XI<(outs), (ins it_pred:$cc, it_mask:$mask), AddrModeNone, 2, IIC_iALUx, "it$mask\t$cc", "", []> { // 16-bit instruction. 
let Inst{31-16} = 0x0000; let Inst{15-8} = 0b10111111; bits<4> cc; bits<4> mask; let Inst{7-4} = cc; let Inst{3-0} = mask; let DecoderMethod = "DecodeIT"; } // Branch and Exchange Jazelle -- for disassembly only // Rm = Inst{19-16} def t2BXJ : T2I<(outs), (ins rGPR:$func), NoItinerary, "bxj", "\t$func", []> { bits<4> func; let Inst{31-27} = 0b11110; let Inst{26} = 0; let Inst{25-20} = 0b111100; let Inst{19-16} = func; let Inst{15-0} = 0b1000111100000000; } // Compare and branch on zero / non-zero let isBranch = 1, isTerminator = 1 in { def tCBZ : T1I<(outs), (ins tGPR:$Rn, t_cbtarget:$target), IIC_Br, "cbz\t$Rn, $target", []>, T1Misc<{0,0,?,1,?,?,?}>, Requires<[IsThumb2]> { // A8.6.27 bits<6> target; bits<3> Rn; let Inst{9} = target{5}; let Inst{7-3} = target{4-0}; let Inst{2-0} = Rn; } def tCBNZ : T1I<(outs), (ins tGPR:$Rn, t_cbtarget:$target), IIC_Br, "cbnz\t$Rn, $target", []>, T1Misc<{1,0,?,1,?,?,?}>, Requires<[IsThumb2]> { // A8.6.27 bits<6> target; bits<3> Rn; let Inst{9} = target{5}; let Inst{7-3} = target{4-0}; let Inst{2-0} = Rn; } } // Change Processor State is a system instruction. // FIXME: Since the asm parser has currently no clean way to handle optional // operands, create 3 versions of the same instruction. Once there's a clean // framework to represent optional operands, change this behavior. class t2CPS : T2XI<(outs), iops, NoItinerary, !strconcat("cps", asm_op), []> { bits<2> imod; bits<3> iflags; bits<5> mode; bit M; let Inst{31-27} = 0b11110; let Inst{26} = 0; let Inst{25-20} = 0b111010; let Inst{19-16} = 0b1111; let Inst{15-14} = 0b10; let Inst{12} = 0; let Inst{10-9} = imod; let Inst{8} = M; let Inst{7-5} = iflags; let Inst{4-0} = mode; let DecoderMethod = "DecodeT2CPSInstruction"; } let M = 1 in def t2CPS3p : t2CPS<(ins imod_op:$imod, iflags_op:$iflags, i32imm:$mode), "$imod.w\t$iflags, $mode">; let mode = 0, M = 0 in def t2CPS2p : t2CPS<(ins imod_op:$imod, iflags_op:$iflags), "$imod.w\t$iflags">; let imod = 0, iflags = 0, M = 1 in def t2CPS1p : t2CPS<(ins imm0_31:$mode), "\t$mode">; // A6.3.4 Branches and miscellaneous control // Table A6-14 Change Processor State, and hint instructions class T2I_hint op7_0, string opc, string asm> : T2I<(outs), (ins), NoItinerary, opc, asm, []> { let Inst{31-20} = 0xf3a; let Inst{19-16} = 0b1111; let Inst{15-14} = 0b10; let Inst{12} = 0; let Inst{10-8} = 0b000; let Inst{7-0} = op7_0; } def t2NOP : T2I_hint<0b00000000, "nop", ".w">; def t2YIELD : T2I_hint<0b00000001, "yield", ".w">; def t2WFE : T2I_hint<0b00000010, "wfe", ".w">; def t2WFI : T2I_hint<0b00000011, "wfi", ".w">; def t2SEV : T2I_hint<0b00000100, "sev", ".w">; def t2DBG : T2I<(outs), (ins imm0_15:$opt), NoItinerary, "dbg", "\t$opt", []> { bits<4> opt; let Inst{31-20} = 0b111100111010; let Inst{19-16} = 0b1111; let Inst{15-8} = 0b10000000; let Inst{7-4} = 0b1111; let Inst{3-0} = opt; } // Secure Monitor Call is a system instruction. // Option = Inst{19-16} def t2SMC : T2I<(outs), (ins imm0_15:$opt), NoItinerary, "smc", "\t$opt", []> { let Inst{31-27} = 0b11110; let Inst{26-20} = 0b1111111; let Inst{15-12} = 0b1000; bits<4> opt; let Inst{19-16} = opt; } class T2SRS Op, bit W, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : T2I { bits<5> mode; let Inst{31-25} = 0b1110100; let Inst{24-23} = Op; let Inst{22} = 0; let Inst{21} = W; let Inst{20-16} = 0b01101; let Inst{15-5} = 0b11000000000; let Inst{4-0} = mode{4-0}; } // Store Return State is a system instruction. 
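// The W bit (second template operand of T2SRS) selects SP writeback: the _UPD
// forms print "sp!" and update SP, the plain forms leave SP unchanged.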
def t2SRSDB_UPD : T2SRS<0b00, 1, (outs), (ins imm0_31:$mode), NoItinerary, "srsdb", "\tsp!, $mode", []>; def t2SRSDB : T2SRS<0b00, 0, (outs), (ins imm0_31:$mode), NoItinerary, "srsdb","\tsp, $mode", []>; def t2SRSIA_UPD : T2SRS<0b11, 1, (outs), (ins imm0_31:$mode), NoItinerary, "srsia","\tsp!, $mode", []>; def t2SRSIA : T2SRS<0b11, 0, (outs), (ins imm0_31:$mode), NoItinerary, "srsia","\tsp, $mode", []>; // Return From Exception is a system instruction. class T2RFE op31_20, dag oops, dag iops, InstrItinClass itin, string opc, string asm, list pattern> : T2I { let Inst{31-20} = op31_20{11-0}; bits<4> Rn; let Inst{19-16} = Rn; let Inst{15-0} = 0xc000; } def t2RFEDBW : T2RFE<0b111010000011, (outs), (ins GPR:$Rn), NoItinerary, "rfedb", "\t$Rn!", [/* For disassembly only; pattern left blank */]>; def t2RFEDB : T2RFE<0b111010000001, (outs), (ins GPR:$Rn), NoItinerary, "rfedb", "\t$Rn", [/* For disassembly only; pattern left blank */]>; def t2RFEIAW : T2RFE<0b111010011011, (outs), (ins GPR:$Rn), NoItinerary, "rfeia", "\t$Rn!", [/* For disassembly only; pattern left blank */]>; def t2RFEIA : T2RFE<0b111010011001, (outs), (ins GPR:$Rn), NoItinerary, "rfeia", "\t$Rn", [/* For disassembly only; pattern left blank */]>; //===----------------------------------------------------------------------===// // Non-Instruction Patterns // // 32-bit immediate using movw + movt. // This is a single pseudo instruction to make it re-materializable. // FIXME: Remove this when we can do generalized remat. let isReMaterializable = 1, isMoveImm = 1 in def t2MOVi32imm : PseudoInst<(outs rGPR:$dst), (ins i32imm:$src), IIC_iMOVix2, [(set rGPR:$dst, (i32 imm:$src))]>, Requires<[IsThumb, HasV6T2]>; // Pseudo instruction that combines movw + movt + add pc (if pic). // It also makes it possible to rematerialize the instructions. // FIXME: Remove this when we can do generalized remat and when machine licm // can properly the instructions. let isReMaterializable = 1 in { def t2MOV_ga_pcrel : PseudoInst<(outs rGPR:$dst), (ins i32imm:$addr), IIC_iMOVix2addpc, [(set rGPR:$dst, (ARMWrapperPIC tglobaladdr:$addr))]>, Requires<[IsThumb2, UseMovt]>; def t2MOV_ga_dyn : PseudoInst<(outs rGPR:$dst), (ins i32imm:$addr), IIC_iMOVix2, [(set rGPR:$dst, (ARMWrapperDYN tglobaladdr:$addr))]>, Requires<[IsThumb2, UseMovt]>; } // ConstantPool, GlobalAddress, and JumpTable def : T2Pat<(ARMWrapper tglobaladdr :$dst), (t2LEApcrel tglobaladdr :$dst)>, Requires<[IsThumb2, DontUseMovt]>; def : T2Pat<(ARMWrapper tconstpool :$dst), (t2LEApcrel tconstpool :$dst)>; def : T2Pat<(ARMWrapper tglobaladdr :$dst), (t2MOVi32imm tglobaladdr :$dst)>, Requires<[IsThumb2, UseMovt]>; def : T2Pat<(ARMWrapperJT tjumptable:$dst, imm:$id), (t2LEApcrelJT tjumptable:$dst, imm:$id)>; // Pseudo instruction that combines ldr from constpool and add pc. This should // be expanded into two instructions late to allow if-conversion and // scheduling. 
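// Roughly: a pc-relative constant-pool load followed by an ARMpic_add of the
// pic label, kept as a single pseudo so if-conversion and scheduling see one
// instruction until late expansion.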
let canFoldAsLoad = 1, isReMaterializable = 1 in def t2LDRpci_pic : PseudoInst<(outs rGPR:$dst), (ins i32imm:$addr, pclabel:$cp), IIC_iLoadiALU, [(set rGPR:$dst, (ARMpic_add (load (ARMWrapper tconstpool:$addr)), imm:$cp))]>, Requires<[IsThumb2]>; // Pseudo isntruction that combines movs + predicated rsbmi // to implement integer ABS let usesCustomInserter = 1, Defs = [CPSR] in { def t2ABS : PseudoInst<(outs rGPR:$dst), (ins rGPR:$src), NoItinerary, []>, Requires<[IsThumb2]>; } //===----------------------------------------------------------------------===// // Coprocessor load/store -- for disassembly only // class T2CI op31_28, dag oops, dag iops, string opc, string asm> : T2I { let Inst{31-28} = op31_28; let Inst{27-25} = 0b110; } multiclass t2LdStCop op31_28, bit load, bit Dbit, string asm> { def _OFFSET : T2CI { bits<13> addr; bits<4> cop; bits<4> CRd; let Inst{24} = 1; // P = 1 let Inst{23} = addr{8}; let Inst{22} = Dbit; let Inst{21} = 0; // W = 0 let Inst{20} = load; let Inst{19-16} = addr{12-9}; let Inst{15-12} = CRd; let Inst{11-8} = cop; let Inst{7-0} = addr{7-0}; let DecoderMethod = "DecodeCopMemInstruction"; } def _PRE : T2CI { bits<13> addr; bits<4> cop; bits<4> CRd; let Inst{24} = 1; // P = 1 let Inst{23} = addr{8}; let Inst{22} = Dbit; let Inst{21} = 1; // W = 1 let Inst{20} = load; let Inst{19-16} = addr{12-9}; let Inst{15-12} = CRd; let Inst{11-8} = cop; let Inst{7-0} = addr{7-0}; let DecoderMethod = "DecodeCopMemInstruction"; } def _POST: T2CI { bits<9> offset; bits<4> addr; bits<4> cop; bits<4> CRd; let Inst{24} = 0; // P = 0 let Inst{23} = offset{8}; let Inst{22} = Dbit; let Inst{21} = 1; // W = 1 let Inst{20} = load; let Inst{19-16} = addr; let Inst{15-12} = CRd; let Inst{11-8} = cop; let Inst{7-0} = offset{7-0}; let DecoderMethod = "DecodeCopMemInstruction"; } def _OPTION : T2CI { bits<8> option; bits<4> addr; bits<4> cop; bits<4> CRd; let Inst{24} = 0; // P = 0 let Inst{23} = 1; // U = 1 let Inst{22} = Dbit; let Inst{21} = 0; // W = 0 let Inst{20} = load; let Inst{19-16} = addr; let Inst{15-12} = CRd; let Inst{11-8} = cop; let Inst{7-0} = option; let DecoderMethod = "DecodeCopMemInstruction"; } } defm t2LDC : t2LdStCop<0b1110, 1, 0, "ldc">; defm t2LDCL : t2LdStCop<0b1110, 1, 1, "ldcl">; defm t2STC : t2LdStCop<0b1110, 0, 0, "stc">; defm t2STCL : t2LdStCop<0b1110, 0, 1, "stcl">; defm t2LDC2 : t2LdStCop<0b1111, 1, 0, "ldc2">; defm t2LDC2L : t2LdStCop<0b1111, 1, 1, "ldc2l">; defm t2STC2 : t2LdStCop<0b1111, 0, 0, "stc2">; defm t2STC2L : t2LdStCop<0b1111, 0, 1, "stc2l">; //===----------------------------------------------------------------------===// // Move between special register and ARM core register -- for disassembly only // // Move to ARM core register from Special Register // A/R class MRS. // // A/R class can only move from CPSR or SPSR. def t2MRS_AR : T2I<(outs GPR:$Rd), (ins), NoItinerary, "mrs", "\t$Rd, apsr", []>, Requires<[IsThumb2,IsARClass]> { bits<4> Rd; let Inst{31-12} = 0b11110011111011111000; let Inst{11-8} = Rd; let Inst{7-0} = 0b0000; } def : t2InstAlias<"mrs${p} $Rd, cpsr", (t2MRS_AR GPR:$Rd, pred:$p)>; def t2MRSsys_AR: T2I<(outs GPR:$Rd), (ins), NoItinerary, "mrs", "\t$Rd, spsr", []>, Requires<[IsThumb2,IsARClass]> { bits<4> Rd; let Inst{31-12} = 0b11110011111111111000; let Inst{11-8} = Rd; let Inst{7-0} = 0b0000; } // M class MRS. // // This MRS has a mask field in bits 7-0 and can take more values than // the A/R class (a full msr_mask). 
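// The $mask operand supplies the M-class special-register number (SYSm)
// directly in bits 7-0 of the encoding.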
def t2MRS_M : T2I<(outs rGPR:$Rd), (ins msr_mask:$mask), NoItinerary, "mrs", "\t$Rd, $mask", []>, Requires<[IsThumb2,IsMClass]> { bits<4> Rd; bits<8> mask; let Inst{31-12} = 0b11110011111011111000; let Inst{11-8} = Rd; let Inst{19-16} = 0b1111; let Inst{7-0} = mask; } // Move from ARM core register to Special Register // // A/R class MSR. // // No need to have both system and application versions, the encodings are the // same and the assembly parser has no way to distinguish between them. The mask // operand contains the special register (R Bit) in bit 4 and bits 3-0 contains // the mask with the fields to be accessed in the special register. def t2MSR_AR : T2I<(outs), (ins msr_mask:$mask, rGPR:$Rn), NoItinerary, "msr", "\t$mask, $Rn", []>, Requires<[IsThumb2,IsARClass]> { bits<5> mask; bits<4> Rn; let Inst{31-21} = 0b11110011100; let Inst{20} = mask{4}; // R Bit let Inst{19-16} = Rn; let Inst{15-12} = 0b1000; let Inst{11-8} = mask{3-0}; let Inst{7-0} = 0; } // M class MSR. // // Move from ARM core register to Special Register def t2MSR_M : T2I<(outs), (ins msr_mask:$SYSm, rGPR:$Rn), NoItinerary, "msr", "\t$SYSm, $Rn", []>, Requires<[IsThumb2,IsMClass]> { bits<8> SYSm; bits<4> Rn; let Inst{31-21} = 0b11110011100; let Inst{20} = 0b0; let Inst{19-16} = Rn; let Inst{15-12} = 0b1000; let Inst{7-0} = SYSm; } //===----------------------------------------------------------------------===// // Move between coprocessor and ARM core register // class t2MovRCopro Op, string opc, bit direction, dag oops, dag iops, list pattern> : T2Cop { let Inst{27-24} = 0b1110; let Inst{20} = direction; let Inst{4} = 1; bits<4> Rt; bits<4> cop; bits<3> opc1; bits<3> opc2; bits<4> CRm; bits<4> CRn; let Inst{15-12} = Rt; let Inst{11-8} = cop; let Inst{23-21} = opc1; let Inst{7-5} = opc2; let Inst{3-0} = CRm; let Inst{19-16} = CRn; } class t2MovRRCopro Op, string opc, bit direction, list pattern = []> : T2Cop { let Inst{27-24} = 0b1100; let Inst{23-21} = 0b010; let Inst{20} = direction; bits<4> Rt; bits<4> Rt2; bits<4> cop; bits<4> opc1; bits<4> CRm; let Inst{15-12} = Rt; let Inst{19-16} = Rt2; let Inst{11-8} = cop; let Inst{7-4} = opc1; let Inst{3-0} = CRm; } /* from ARM core register to coprocessor */ def t2MCR : t2MovRCopro<0b1110, "mcr", 0, (outs), (ins p_imm:$cop, imm0_7:$opc1, GPR:$Rt, c_imm:$CRn, c_imm:$CRm, imm0_7:$opc2), [(int_arm_mcr imm:$cop, imm:$opc1, GPR:$Rt, imm:$CRn, imm:$CRm, imm:$opc2)]>; def t2MCR2 : t2MovRCopro<0b1111, "mcr2", 0, (outs), (ins p_imm:$cop, imm0_7:$opc1, GPR:$Rt, c_imm:$CRn, c_imm:$CRm, imm0_7:$opc2), [(int_arm_mcr2 imm:$cop, imm:$opc1, GPR:$Rt, imm:$CRn, imm:$CRm, imm:$opc2)]>; /* from coprocessor to ARM core register */ def t2MRC : t2MovRCopro<0b1110, "mrc", 1, (outs GPR:$Rt), (ins p_imm:$cop, imm0_7:$opc1, c_imm:$CRn, c_imm:$CRm, imm0_7:$opc2), []>; def t2MRC2 : t2MovRCopro<0b1111, "mrc2", 1, (outs GPR:$Rt), (ins p_imm:$cop, imm0_7:$opc1, c_imm:$CRn, c_imm:$CRm, imm0_7:$opc2), []>; def : T2v6Pat<(int_arm_mrc imm:$cop, imm:$opc1, imm:$CRn, imm:$CRm, imm:$opc2), (t2MRC imm:$cop, imm:$opc1, imm:$CRn, imm:$CRm, imm:$opc2)>; def : T2v6Pat<(int_arm_mrc2 imm:$cop, imm:$opc1, imm:$CRn, imm:$CRm, imm:$opc2), (t2MRC2 imm:$cop, imm:$opc1, imm:$CRn, imm:$CRm, imm:$opc2)>; /* from ARM core register to coprocessor */ def t2MCRR : t2MovRRCopro<0b1110, "mcrr", 0, [(int_arm_mcrr imm:$cop, imm:$opc1, GPR:$Rt, GPR:$Rt2, imm:$CRm)]>; def t2MCRR2 : t2MovRRCopro<0b1111, "mcrr2", 0, [(int_arm_mcrr2 imm:$cop, imm:$opc1, GPR:$Rt, GPR:$Rt2, imm:$CRm)]>; /* from coprocessor to ARM core register */ def t2MRRC 
: t2MovRRCopro<0b1110, "mrrc", 1>; def t2MRRC2 : t2MovRRCopro<0b1111, "mrrc2", 1>; //===----------------------------------------------------------------------===// // Other Coprocessor Instructions. // def tCDP : T2Cop<0b1110, (outs), (ins p_imm:$cop, imm0_15:$opc1, c_imm:$CRd, c_imm:$CRn, c_imm:$CRm, imm0_7:$opc2), "cdp\t$cop, $opc1, $CRd, $CRn, $CRm, $opc2", [(int_arm_cdp imm:$cop, imm:$opc1, imm:$CRd, imm:$CRn, imm:$CRm, imm:$opc2)]> { let Inst{27-24} = 0b1110; bits<4> opc1; bits<4> CRn; bits<4> CRd; bits<4> cop; bits<3> opc2; bits<4> CRm; let Inst{3-0} = CRm; let Inst{4} = 0; let Inst{7-5} = opc2; let Inst{11-8} = cop; let Inst{15-12} = CRd; let Inst{19-16} = CRn; let Inst{23-20} = opc1; } def t2CDP2 : T2Cop<0b1111, (outs), (ins p_imm:$cop, imm0_15:$opc1, c_imm:$CRd, c_imm:$CRn, c_imm:$CRm, imm0_7:$opc2), "cdp2\t$cop, $opc1, $CRd, $CRn, $CRm, $opc2", [(int_arm_cdp2 imm:$cop, imm:$opc1, imm:$CRd, imm:$CRn, imm:$CRm, imm:$opc2)]> { let Inst{27-24} = 0b1110; bits<4> opc1; bits<4> CRn; bits<4> CRd; bits<4> cop; bits<3> opc2; bits<4> CRm; let Inst{3-0} = CRm; let Inst{4} = 0; let Inst{7-5} = opc2; let Inst{11-8} = cop; let Inst{15-12} = CRd; let Inst{19-16} = CRn; let Inst{23-20} = opc1; } //===----------------------------------------------------------------------===// // Non-Instruction Patterns // // SXT/UXT with no rotate let AddedComplexity = 16 in { def : T2Pat<(and rGPR:$Rm, 0x000000FF), (t2UXTB rGPR:$Rm, 0)>, Requires<[IsThumb2]>; def : T2Pat<(and rGPR:$Rm, 0x0000FFFF), (t2UXTH rGPR:$Rm, 0)>, Requires<[IsThumb2]>; def : T2Pat<(and rGPR:$Rm, 0x00FF00FF), (t2UXTB16 rGPR:$Rm, 0)>, Requires<[HasT2ExtractPack, IsThumb2]>; def : T2Pat<(add rGPR:$Rn, (and rGPR:$Rm, 0x00FF)), (t2UXTAB rGPR:$Rn, rGPR:$Rm, 0)>, Requires<[HasT2ExtractPack, IsThumb2]>; def : T2Pat<(add rGPR:$Rn, (and rGPR:$Rm, 0xFFFF)), (t2UXTAH rGPR:$Rn, rGPR:$Rm, 0)>, Requires<[HasT2ExtractPack, IsThumb2]>; } def : T2Pat<(sext_inreg rGPR:$Src, i8), (t2SXTB rGPR:$Src, 0)>, Requires<[IsThumb2]>; def : T2Pat<(sext_inreg rGPR:$Src, i16), (t2SXTH rGPR:$Src, 0)>, Requires<[IsThumb2]>; def : T2Pat<(add rGPR:$Rn, (sext_inreg rGPR:$Rm, i8)), (t2SXTAB rGPR:$Rn, rGPR:$Rm, 0)>, Requires<[HasT2ExtractPack, IsThumb2]>; def : T2Pat<(add rGPR:$Rn, (sext_inreg rGPR:$Rm, i16)), (t2SXTAH rGPR:$Rn, rGPR:$Rm, 0)>, Requires<[HasT2ExtractPack, IsThumb2]>; // Atomic load/store patterns def : T2Pat<(atomic_load_8 t2addrmode_imm12:$addr), (t2LDRBi12 t2addrmode_imm12:$addr)>; def : T2Pat<(atomic_load_8 t2addrmode_negimm8:$addr), (t2LDRBi8 t2addrmode_negimm8:$addr)>; def : T2Pat<(atomic_load_8 t2addrmode_so_reg:$addr), (t2LDRBs t2addrmode_so_reg:$addr)>; def : T2Pat<(atomic_load_16 t2addrmode_imm12:$addr), (t2LDRHi12 t2addrmode_imm12:$addr)>; def : T2Pat<(atomic_load_16 t2addrmode_negimm8:$addr), (t2LDRHi8 t2addrmode_negimm8:$addr)>; def : T2Pat<(atomic_load_16 t2addrmode_so_reg:$addr), (t2LDRHs t2addrmode_so_reg:$addr)>; def : T2Pat<(atomic_load_32 t2addrmode_imm12:$addr), (t2LDRi12 t2addrmode_imm12:$addr)>; def : T2Pat<(atomic_load_32 t2addrmode_negimm8:$addr), (t2LDRi8 t2addrmode_negimm8:$addr)>; def : T2Pat<(atomic_load_32 t2addrmode_so_reg:$addr), (t2LDRs t2addrmode_so_reg:$addr)>; def : T2Pat<(atomic_store_8 t2addrmode_imm12:$addr, GPR:$val), (t2STRBi12 GPR:$val, t2addrmode_imm12:$addr)>; def : T2Pat<(atomic_store_8 t2addrmode_negimm8:$addr, GPR:$val), (t2STRBi8 GPR:$val, t2addrmode_negimm8:$addr)>; def : T2Pat<(atomic_store_8 t2addrmode_so_reg:$addr, GPR:$val), (t2STRBs GPR:$val, t2addrmode_so_reg:$addr)>; def : T2Pat<(atomic_store_16 
t2addrmode_imm12:$addr, GPR:$val), (t2STRHi12 GPR:$val, t2addrmode_imm12:$addr)>; def : T2Pat<(atomic_store_16 t2addrmode_negimm8:$addr, GPR:$val), (t2STRHi8 GPR:$val, t2addrmode_negimm8:$addr)>; def : T2Pat<(atomic_store_16 t2addrmode_so_reg:$addr, GPR:$val), (t2STRHs GPR:$val, t2addrmode_so_reg:$addr)>; def : T2Pat<(atomic_store_32 t2addrmode_imm12:$addr, GPR:$val), (t2STRi12 GPR:$val, t2addrmode_imm12:$addr)>; def : T2Pat<(atomic_store_32 t2addrmode_negimm8:$addr, GPR:$val), (t2STRi8 GPR:$val, t2addrmode_negimm8:$addr)>; def : T2Pat<(atomic_store_32 t2addrmode_so_reg:$addr, GPR:$val), (t2STRs GPR:$val, t2addrmode_so_reg:$addr)>; //===----------------------------------------------------------------------===// // Assembler aliases // // Aliases for ADC without the ".w" optional width specifier. def : t2InstAlias<"adc${s}${p} $Rd, $Rn, $Rm", (t2ADCrr rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias<"adc${s}${p} $Rd, $Rn, $ShiftedRm", (t2ADCrs rGPR:$Rd, rGPR:$Rn, t2_so_reg:$ShiftedRm, pred:$p, cc_out:$s)>; // Aliases for SBC without the ".w" optional width specifier. def : t2InstAlias<"sbc${s}${p} $Rd, $Rn, $Rm", (t2SBCrr rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias<"sbc${s}${p} $Rd, $Rn, $ShiftedRm", (t2SBCrs rGPR:$Rd, rGPR:$Rn, t2_so_reg:$ShiftedRm, pred:$p, cc_out:$s)>; // Aliases for ADD without the ".w" optional width specifier. def : t2InstAlias<"add${s}${p} $Rd, $Rn, $imm", (t2ADDri GPRnopc:$Rd, GPRnopc:$Rn, t2_so_imm:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias<"add${p} $Rd, $Rn, $imm", (t2ADDri12 GPRnopc:$Rd, GPR:$Rn, imm0_4095:$imm, pred:$p)>; def : t2InstAlias<"add${s}${p} $Rd, $Rn, $Rm", (t2ADDrr GPRnopc:$Rd, GPRnopc:$Rn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias<"add${s}${p} $Rd, $Rn, $ShiftedRm", (t2ADDrs GPRnopc:$Rd, GPRnopc:$Rn, t2_so_reg:$ShiftedRm, pred:$p, cc_out:$s)>; // Aliases for SUB without the ".w" optional width specifier. def : t2InstAlias<"sub${s}${p} $Rd, $Rn, $imm", (t2SUBri GPRnopc:$Rd, GPRnopc:$Rn, t2_so_imm:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias<"sub${p} $Rd, $Rn, $imm", (t2SUBri12 GPRnopc:$Rd, GPR:$Rn, imm0_4095:$imm, pred:$p)>; def : t2InstAlias<"sub${s}${p} $Rd, $Rn, $Rm", (t2SUBrr GPRnopc:$Rd, GPRnopc:$Rn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias<"sub${s}${p} $Rd, $Rn, $ShiftedRm", (t2SUBrs GPRnopc:$Rd, GPRnopc:$Rn, t2_so_reg:$ShiftedRm, pred:$p, cc_out:$s)>; // Alias for compares without the ".w" optional width specifier. def : t2InstAlias<"cmn${p} $Rn, $Rm", (t2CMNzrr GPRnopc:$Rn, rGPR:$Rm, pred:$p)>; def : t2InstAlias<"teq${p} $Rn, $Rm", (t2TEQrr GPRnopc:$Rn, rGPR:$Rm, pred:$p)>; def : t2InstAlias<"tst${p} $Rn, $Rm", (t2TSTrr GPRnopc:$Rn, rGPR:$Rm, pred:$p)>; // Memory barriers def : InstAlias<"dmb", (t2DMB 0xf)>, Requires<[IsThumb2, HasDB]>; def : InstAlias<"dsb", (t2DSB 0xf)>, Requires<[IsThumb2, HasDB]>; def : InstAlias<"isb", (t2ISB 0xf)>, Requires<[IsThumb2, HasDB]>; // Alias for LDR, LDRB, LDRH, LDRSB, and LDRSH without the ".w" optional // width specifier. 
def : t2InstAlias<"ldr${p} $Rt, $addr", (t2LDRi12 GPR:$Rt, t2addrmode_imm12:$addr, pred:$p)>; def : t2InstAlias<"ldrb${p} $Rt, $addr", (t2LDRBi12 rGPR:$Rt, t2addrmode_imm12:$addr, pred:$p)>; def : t2InstAlias<"ldrh${p} $Rt, $addr", (t2LDRHi12 rGPR:$Rt, t2addrmode_imm12:$addr, pred:$p)>; def : t2InstAlias<"ldrsb${p} $Rt, $addr", (t2LDRSBi12 rGPR:$Rt, t2addrmode_imm12:$addr, pred:$p)>; def : t2InstAlias<"ldrsh${p} $Rt, $addr", (t2LDRSHi12 rGPR:$Rt, t2addrmode_imm12:$addr, pred:$p)>; def : t2InstAlias<"ldr${p} $Rt, $addr", (t2LDRs GPR:$Rt, t2addrmode_so_reg:$addr, pred:$p)>; def : t2InstAlias<"ldrb${p} $Rt, $addr", (t2LDRBs rGPR:$Rt, t2addrmode_so_reg:$addr, pred:$p)>; def : t2InstAlias<"ldrh${p} $Rt, $addr", (t2LDRHs rGPR:$Rt, t2addrmode_so_reg:$addr, pred:$p)>; def : t2InstAlias<"ldrsb${p} $Rt, $addr", (t2LDRSBs rGPR:$Rt, t2addrmode_so_reg:$addr, pred:$p)>; def : t2InstAlias<"ldrsh${p} $Rt, $addr", (t2LDRSHs rGPR:$Rt, t2addrmode_so_reg:$addr, pred:$p)>; // Alias for MVN without the ".w" optional width specifier. def : t2InstAlias<"mvn${s}${p} $Rd, $Rm", (t2MVNr rGPR:$Rd, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias<"mvn${s}${p} $Rd, $ShiftedRm", (t2MVNs rGPR:$Rd, t2_so_reg:$ShiftedRm, pred:$p, cc_out:$s)>; // PKHBT/PKHTB with default shift amount. PKHTB is equivalent to PKHBT when the // shift amount is zero (i.e., unspecified). def : InstAlias<"pkhbt${p} $Rd, $Rn, $Rm", (t2PKHBT rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, 0, pred:$p)>, Requires<[HasT2ExtractPack, IsThumb2]>; def : InstAlias<"pkhtb${p} $Rd, $Rn, $Rm", (t2PKHBT rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, 0, pred:$p)>, Requires<[HasT2ExtractPack, IsThumb2]>; // PUSH/POP aliases for STM/LDM def : t2InstAlias<"push${p}.w $regs", (t2STMDB_UPD SP, pred:$p, reglist:$regs)>; def : t2InstAlias<"push${p} $regs", (t2STMDB_UPD SP, pred:$p, reglist:$regs)>; def : t2InstAlias<"pop${p}.w $regs", (t2LDMIA_UPD SP, pred:$p, reglist:$regs)>; def : t2InstAlias<"pop${p} $regs", (t2LDMIA_UPD SP, pred:$p, reglist:$regs)>; // Alias for REV/REV16/REVSH without the ".w" optional width specifier. def : t2InstAlias<"rev${p} $Rd, $Rm", (t2REV rGPR:$Rd, rGPR:$Rm, pred:$p)>; def : t2InstAlias<"rev16${p} $Rd, $Rm", (t2REV16 rGPR:$Rd, rGPR:$Rm, pred:$p)>; def : t2InstAlias<"revsh${p} $Rd, $Rm", (t2REVSH rGPR:$Rd, rGPR:$Rm, pred:$p)>; // Alias for RSB without the ".w" optional width specifier, and with optional // implied destination register. def : t2InstAlias<"rsb${s}${p} $Rd, $Rn, $imm", (t2RSBri rGPR:$Rd, rGPR:$Rn, t2_so_imm:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias<"rsb${s}${p} $Rdn, $imm", (t2RSBri rGPR:$Rdn, rGPR:$Rdn, t2_so_imm:$imm, pred:$p, cc_out:$s)>; def : t2InstAlias<"rsb${s}${p} $Rdn, $Rm", (t2RSBrr rGPR:$Rdn, rGPR:$Rdn, rGPR:$Rm, pred:$p, cc_out:$s)>; def : t2InstAlias<"rsb${s}${p} $Rdn, $ShiftedRm", (t2RSBrs rGPR:$Rdn, rGPR:$Rdn, t2_so_reg:$ShiftedRm, pred:$p, cc_out:$s)>; // SSAT/USAT optional shift operand. def : t2InstAlias<"ssat${p} $Rd, $sat_imm, $Rn", (t2SSAT rGPR:$Rd, imm1_32:$sat_imm, rGPR:$Rn, 0, pred:$p)>; def : t2InstAlias<"usat${p} $Rd, $sat_imm, $Rn", (t2USAT rGPR:$Rd, imm0_31:$sat_imm, rGPR:$Rn, 0, pred:$p)>; // STM w/o the .w suffix. def : t2InstAlias<"stm${p} $Rn, $regs", (t2STMIA GPR:$Rn, pred:$p, reglist:$regs)>; // Alias for STR, STRB, and STRH without the ".w" optional // width specifier. 
def : t2InstAlias<"str${p} $Rt, $addr", (t2STRi12 GPR:$Rt, t2addrmode_imm12:$addr, pred:$p)>; def : t2InstAlias<"strb${p} $Rt, $addr", (t2STRBi12 rGPR:$Rt, t2addrmode_imm12:$addr, pred:$p)>; def : t2InstAlias<"strh${p} $Rt, $addr", (t2STRHi12 rGPR:$Rt, t2addrmode_imm12:$addr, pred:$p)>; def : t2InstAlias<"str${p} $Rt, $addr", (t2STRs GPR:$Rt, t2addrmode_so_reg:$addr, pred:$p)>; def : t2InstAlias<"strb${p} $Rt, $addr", (t2STRBs rGPR:$Rt, t2addrmode_so_reg:$addr, pred:$p)>; def : t2InstAlias<"strh${p} $Rt, $addr", (t2STRHs rGPR:$Rt, t2addrmode_so_reg:$addr, pred:$p)>; // Extend instruction optional rotate operand. def : t2InstAlias<"sxtab${p} $Rd, $Rn, $Rm", (t2SXTAB rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"sxtah${p} $Rd, $Rn, $Rm", (t2SXTAH rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"sxtab16${p} $Rd, $Rn, $Rm", (t2SXTAB16 rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"sxtb${p} $Rd, $Rm", (t2SXTB rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"sxtb16${p} $Rd, $Rm", (t2SXTB16 rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"sxth${p} $Rd, $Rm", (t2SXTH rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"sxtb${p}.w $Rd, $Rm", (t2SXTB rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"sxth${p}.w $Rd, $Rm", (t2SXTH rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"uxtab${p} $Rd, $Rn, $Rm", (t2UXTAB rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"uxtah${p} $Rd, $Rn, $Rm", (t2UXTAH rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"uxtab16${p} $Rd, $Rn, $Rm", (t2UXTAB16 rGPR:$Rd, rGPR:$Rn, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"uxtb${p} $Rd, $Rm", (t2UXTB rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"uxtb16${p} $Rd, $Rm", (t2UXTB16 rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"uxth${p} $Rd, $Rm", (t2UXTH rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"uxtb${p}.w $Rd, $Rm", (t2UXTB rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; def : t2InstAlias<"uxth${p}.w $Rd, $Rm", (t2UXTH rGPR:$Rd, rGPR:$Rm, 0, pred:$p)>; // Extend instruction w/o the ".w" optional width specifier. def : t2InstAlias<"uxtb${p} $Rd, $Rm$rot", (t2UXTB rGPR:$Rd, rGPR:$Rm, rot_imm:$rot, pred:$p)>; def : t2InstAlias<"uxtb16${p} $Rd, $Rm$rot", (t2UXTB16 rGPR:$Rd, rGPR:$Rm, rot_imm:$rot, pred:$p)>; def : t2InstAlias<"uxth${p} $Rd, $Rm$rot", (t2UXTH rGPR:$Rd, rGPR:$Rm, rot_imm:$rot, pred:$p)>; def : t2InstAlias<"sxtb${p} $Rd, $Rm$rot", (t2SXTB rGPR:$Rd, rGPR:$Rm, rot_imm:$rot, pred:$p)>; def : t2InstAlias<"sxtb16${p} $Rd, $Rm$rot", (t2SXTB16 rGPR:$Rd, rGPR:$Rm, rot_imm:$rot, pred:$p)>; def : t2InstAlias<"sxth${p} $Rd, $Rm$rot", (t2SXTH rGPR:$Rd, rGPR:$Rm, rot_imm:$rot, pred:$p)>; Index: vendor/llvm/dist/lib/Target/CppBackend/CPPBackend.cpp =================================================================== --- vendor/llvm/dist/lib/Target/CppBackend/CPPBackend.cpp (revision 228363) +++ vendor/llvm/dist/lib/Target/CppBackend/CPPBackend.cpp (revision 228364) @@ -1,1978 +1,2073 @@ //===-- CPPBackend.cpp - Library for converting LLVM code to C++ code -----===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file implements the writing of the LLVM IR as a set of C++ calls to the // LLVM IR interface. The input module is assumed to be verified. 
// //===----------------------------------------------------------------------===// #include "CPPTargetMachine.h" #include "llvm/CallingConv.h" #include "llvm/Constants.h" #include "llvm/DerivedTypes.h" #include "llvm/InlineAsm.h" #include "llvm/Instruction.h" #include "llvm/Instructions.h" #include "llvm/Module.h" #include "llvm/Pass.h" #include "llvm/PassManager.h" #include "llvm/MC/MCAsmInfo.h" #include "llvm/MC/MCInstrInfo.h" #include "llvm/MC/MCSubtargetInfo.h" #include "llvm/ADT/SmallPtrSet.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/FormattedStream.h" #include "llvm/Support/TargetRegistry.h" #include "llvm/ADT/StringExtras.h" #include "llvm/Config/config.h" #include #include #include using namespace llvm; static cl::opt FuncName("cppfname", cl::desc("Specify the name of the generated function"), cl::value_desc("function name")); enum WhatToGenerate { GenProgram, GenModule, GenContents, GenFunction, GenFunctions, GenInline, GenVariable, GenType }; static cl::opt GenerationType("cppgen", cl::Optional, cl::desc("Choose what kind of output to generate"), cl::init(GenProgram), cl::values( clEnumValN(GenProgram, "program", "Generate a complete program"), clEnumValN(GenModule, "module", "Generate a module definition"), clEnumValN(GenContents, "contents", "Generate contents of a module"), clEnumValN(GenFunction, "function", "Generate a function definition"), clEnumValN(GenFunctions,"functions", "Generate all function definitions"), clEnumValN(GenInline, "inline", "Generate an inline function"), clEnumValN(GenVariable, "variable", "Generate a variable definition"), clEnumValN(GenType, "type", "Generate a type definition"), clEnumValEnd ) ); static cl::opt NameToGenerate("cppfor", cl::Optional, cl::desc("Specify the name of the thing to generate"), cl::init("!bad!")); extern "C" void LLVMInitializeCppBackendTarget() { // Register the target. RegisterTargetMachine X(TheCppBackendTarget); } namespace { typedef std::vector TypeList; typedef std::map TypeMap; typedef std::map ValueMap; typedef std::set NameSet; typedef std::set TypeSet; typedef std::set ValueSet; typedef std::map ForwardRefMap; /// CppWriter - This class is the main chunk of code that converts an LLVM /// module to a C++ translation unit. 
class CppWriter : public ModulePass { formatted_raw_ostream &Out; const Module *TheModule; uint64_t uniqueNum; TypeMap TypeNames; ValueMap ValueNames; NameSet UsedNames; TypeSet DefinedTypes; ValueSet DefinedValues; ForwardRefMap ForwardRefs; bool is_inline; unsigned indent_level; public: static char ID; explicit CppWriter(formatted_raw_ostream &o) : ModulePass(ID), Out(o), uniqueNum(0), is_inline(false), indent_level(0){} virtual const char *getPassName() const { return "C++ backend"; } bool runOnModule(Module &M); void printProgram(const std::string& fname, const std::string& modName ); void printModule(const std::string& fname, const std::string& modName ); void printContents(const std::string& fname, const std::string& modName ); void printFunction(const std::string& fname, const std::string& funcName ); void printFunctions(); void printInline(const std::string& fname, const std::string& funcName ); void printVariable(const std::string& fname, const std::string& varName ); void printType(const std::string& fname, const std::string& typeName ); void error(const std::string& msg); formatted_raw_ostream& nl(formatted_raw_ostream &Out, int delta = 0); inline void in() { indent_level++; } inline void out() { if (indent_level >0) indent_level--; } private: void printLinkageType(GlobalValue::LinkageTypes LT); void printVisibilityType(GlobalValue::VisibilityTypes VisTypes); void printCallingConv(CallingConv::ID cc); void printEscapedString(const std::string& str); void printCFP(const ConstantFP* CFP); std::string getCppName(Type* val); inline void printCppName(Type* val); std::string getCppName(const Value* val); inline void printCppName(const Value* val); void printAttributes(const AttrListPtr &PAL, const std::string &name); void printType(Type* Ty); void printTypes(const Module* M); void printConstant(const Constant *CPV); void printConstants(const Module* M); void printVariableUses(const GlobalVariable *GV); void printVariableHead(const GlobalVariable *GV); void printVariableBody(const GlobalVariable *GV); void printFunctionUses(const Function *F); void printFunctionHead(const Function *F); void printFunctionBody(const Function *F); void printInstruction(const Instruction *I, const std::string& bbname); std::string getOpName(const Value*); void printModuleBody(); }; } // end anonymous namespace. formatted_raw_ostream &CppWriter::nl(formatted_raw_ostream &Out, int delta) { Out << '\n'; if (delta >= 0 || indent_level >= unsigned(-delta)) indent_level += delta; Out.indent(indent_level); return Out; } static inline void sanitize(std::string &str) { for (size_t i = 0; i < str.length(); ++i) if (!isalnum(str[i]) && str[i] != '_') str[i] = '_'; } static std::string getTypePrefix(Type *Ty) { switch (Ty->getTypeID()) { case Type::VoidTyID: return "void_"; case Type::IntegerTyID: return "int" + utostr(cast(Ty)->getBitWidth()) + "_"; case Type::FloatTyID: return "float_"; case Type::DoubleTyID: return "double_"; case Type::LabelTyID: return "label_"; case Type::FunctionTyID: return "func_"; case Type::StructTyID: return "struct_"; case Type::ArrayTyID: return "array_"; case Type::PointerTyID: return "ptr_"; case Type::VectorTyID: return "packed_"; default: return "other_"; } return "unknown_"; } void CppWriter::error(const std::string& msg) { report_fatal_error(msg); } // printCFP - Print a floating point constant .. very carefully :) // This makes sure that conversion to/from floating yields the same binary // result so that we don't lose precision. 
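[Editor's note] The comment above printCFP states the key requirement: the emitted constant must convert back to exactly the same binary value. A minimal, self-contained sketch of the bit-pattern fallback that guarantees this (illustration only, not part of the patch; BitsToDouble here is a local stand-in for LLVM's helper of the same name):

// Sketch: emit a double as its raw IEEE-754 bit pattern so it round-trips
// exactly, which a decimal rendering cannot guarantee.
#include <cstdint>
#include <cstdio>
#include <cstring>

// Stand-in for LLVM's BitsToDouble: reinterpret a 64-bit pattern as a double.
static double BitsToDouble(uint64_t Bits) {
  double D;
  std::memcpy(&D, &Bits, sizeof(D));
  return D;
}

int main() {
  double Val = 0.1;                       // not exactly representable in binary
  uint64_t Bits;
  std::memcpy(&Bits, &Val, sizeof(Bits));
  // This is the form printCFP falls back to: the hex bits plus a comment.
  std::printf("BitsToDouble(0x%llXULL) /* %g */\n",
              (unsigned long long)Bits, Val);
  return BitsToDouble(Bits) == Val ? 0 : 1; // 0: the value round-tripped
}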
void CppWriter::printCFP(const ConstantFP *CFP) { bool ignored; APFloat APF = APFloat(CFP->getValueAPF()); // copy if (CFP->getType() == Type::getFloatTy(CFP->getContext())) APF.convert(APFloat::IEEEdouble, APFloat::rmNearestTiesToEven, &ignored); Out << "ConstantFP::get(mod->getContext(), "; Out << "APFloat("; #if HAVE_PRINTF_A char Buffer[100]; sprintf(Buffer, "%A", APF.convertToDouble()); if ((!strncmp(Buffer, "0x", 2) || !strncmp(Buffer, "-0x", 3) || !strncmp(Buffer, "+0x", 3)) && APF.bitwiseIsEqual(APFloat(atof(Buffer)))) { if (CFP->getType() == Type::getDoubleTy(CFP->getContext())) Out << "BitsToDouble(" << Buffer << ")"; else Out << "BitsToFloat((float)" << Buffer << ")"; Out << ")"; } else { #endif std::string StrVal = ftostr(CFP->getValueAPF()); while (StrVal[0] == ' ') StrVal.erase(StrVal.begin()); // Check to make sure that the stringized number is not some string like // "Inf" or NaN. Check that the string matches the "[-+]?[0-9]" regex. if (((StrVal[0] >= '0' && StrVal[0] <= '9') || ((StrVal[0] == '-' || StrVal[0] == '+') && (StrVal[1] >= '0' && StrVal[1] <= '9'))) && (CFP->isExactlyValue(atof(StrVal.c_str())))) { if (CFP->getType() == Type::getDoubleTy(CFP->getContext())) Out << StrVal; else Out << StrVal << "f"; } else if (CFP->getType() == Type::getDoubleTy(CFP->getContext())) Out << "BitsToDouble(0x" << utohexstr(CFP->getValueAPF().bitcastToAPInt().getZExtValue()) << "ULL) /* " << StrVal << " */"; else Out << "BitsToFloat(0x" << utohexstr((uint32_t)CFP->getValueAPF(). bitcastToAPInt().getZExtValue()) << "U) /* " << StrVal << " */"; Out << ")"; #if HAVE_PRINTF_A } #endif Out << ")"; } void CppWriter::printCallingConv(CallingConv::ID cc){ // Print the calling convention. switch (cc) { case CallingConv::C: Out << "CallingConv::C"; break; case CallingConv::Fast: Out << "CallingConv::Fast"; break; case CallingConv::Cold: Out << "CallingConv::Cold"; break; case CallingConv::FirstTargetCC: Out << "CallingConv::FirstTargetCC"; break; default: Out << cc; break; } } void CppWriter::printLinkageType(GlobalValue::LinkageTypes LT) { switch (LT) { case GlobalValue::InternalLinkage: Out << "GlobalValue::InternalLinkage"; break; case GlobalValue::PrivateLinkage: Out << "GlobalValue::PrivateLinkage"; break; case GlobalValue::LinkerPrivateLinkage: Out << "GlobalValue::LinkerPrivateLinkage"; break; case GlobalValue::LinkerPrivateWeakLinkage: Out << "GlobalValue::LinkerPrivateWeakLinkage"; break; case GlobalValue::LinkerPrivateWeakDefAutoLinkage: Out << "GlobalValue::LinkerPrivateWeakDefAutoLinkage"; break; case GlobalValue::AvailableExternallyLinkage: Out << "GlobalValue::AvailableExternallyLinkage "; break; case GlobalValue::LinkOnceAnyLinkage: Out << "GlobalValue::LinkOnceAnyLinkage "; break; case GlobalValue::LinkOnceODRLinkage: Out << "GlobalValue::LinkOnceODRLinkage "; break; case GlobalValue::WeakAnyLinkage: Out << "GlobalValue::WeakAnyLinkage"; break; case GlobalValue::WeakODRLinkage: Out << "GlobalValue::WeakODRLinkage"; break; case GlobalValue::AppendingLinkage: Out << "GlobalValue::AppendingLinkage"; break; case GlobalValue::ExternalLinkage: Out << "GlobalValue::ExternalLinkage"; break; case GlobalValue::DLLImportLinkage: Out << "GlobalValue::DLLImportLinkage"; break; case GlobalValue::DLLExportLinkage: Out << "GlobalValue::DLLExportLinkage"; break; case GlobalValue::ExternalWeakLinkage: Out << "GlobalValue::ExternalWeakLinkage"; break; case GlobalValue::CommonLinkage: Out << "GlobalValue::CommonLinkage"; break; } } void CppWriter::printVisibilityType(GlobalValue::VisibilityTypes 
VisType) { switch (VisType) { default: llvm_unreachable("Unknown GVar visibility"); case GlobalValue::DefaultVisibility: Out << "GlobalValue::DefaultVisibility"; break; case GlobalValue::HiddenVisibility: Out << "GlobalValue::HiddenVisibility"; break; case GlobalValue::ProtectedVisibility: Out << "GlobalValue::ProtectedVisibility"; break; } } // printEscapedString - Print each character of the specified string, escaping // it if it is not printable or if it is an escape char. void CppWriter::printEscapedString(const std::string &Str) { for (unsigned i = 0, e = Str.size(); i != e; ++i) { unsigned char C = Str[i]; if (isprint(C) && C != '"' && C != '\\') { Out << C; } else { Out << "\\x" << (char) ((C/16 < 10) ? ( C/16 +'0') : ( C/16 -10+'A')) << (char)(((C&15) < 10) ? ((C&15)+'0') : ((C&15)-10+'A')); } } } std::string CppWriter::getCppName(Type* Ty) { // First, handle the primitive types .. easy if (Ty->isPrimitiveType() || Ty->isIntegerTy()) { switch (Ty->getTypeID()) { case Type::VoidTyID: return "Type::getVoidTy(mod->getContext())"; case Type::IntegerTyID: { unsigned BitWidth = cast(Ty)->getBitWidth(); return "IntegerType::get(mod->getContext(), " + utostr(BitWidth) + ")"; } case Type::X86_FP80TyID: return "Type::getX86_FP80Ty(mod->getContext())"; case Type::FloatTyID: return "Type::getFloatTy(mod->getContext())"; case Type::DoubleTyID: return "Type::getDoubleTy(mod->getContext())"; case Type::LabelTyID: return "Type::getLabelTy(mod->getContext())"; case Type::X86_MMXTyID: return "Type::getX86_MMXTy(mod->getContext())"; default: error("Invalid primitive type"); break; } // shouldn't be returned, but make it sensible return "Type::getVoidTy(mod->getContext())"; } // Now, see if we've seen the type before and return that TypeMap::iterator I = TypeNames.find(Ty); if (I != TypeNames.end()) return I->second; // Okay, let's build a new name for this type. 
Start with a prefix const char* prefix = 0; switch (Ty->getTypeID()) { case Type::FunctionTyID: prefix = "FuncTy_"; break; case Type::StructTyID: prefix = "StructTy_"; break; case Type::ArrayTyID: prefix = "ArrayTy_"; break; case Type::PointerTyID: prefix = "PointerTy_"; break; case Type::VectorTyID: prefix = "VectorTy_"; break; default: prefix = "OtherTy_"; break; // prevent breakage } // See if the type has a name in the symboltable and build accordingly std::string name; if (StructType *STy = dyn_cast(Ty)) if (STy->hasName()) name = STy->getName(); if (name.empty()) name = utostr(uniqueNum++); name = std::string(prefix) + name; sanitize(name); // Save the name return TypeNames[Ty] = name; } void CppWriter::printCppName(Type* Ty) { printEscapedString(getCppName(Ty)); } std::string CppWriter::getCppName(const Value* val) { std::string name; ValueMap::iterator I = ValueNames.find(val); if (I != ValueNames.end() && I->first == val) return I->second; if (const GlobalVariable* GV = dyn_cast(val)) { name = std::string("gvar_") + getTypePrefix(GV->getType()->getElementType()); } else if (isa(val)) { name = std::string("func_"); } else if (const Constant* C = dyn_cast(val)) { name = std::string("const_") + getTypePrefix(C->getType()); } else if (const Argument* Arg = dyn_cast(val)) { if (is_inline) { unsigned argNum = std::distance(Arg->getParent()->arg_begin(), Function::const_arg_iterator(Arg)) + 1; name = std::string("arg_") + utostr(argNum); NameSet::iterator NI = UsedNames.find(name); if (NI != UsedNames.end()) name += std::string("_") + utostr(uniqueNum++); UsedNames.insert(name); return ValueNames[val] = name; } else { name = getTypePrefix(val->getType()); } } else { name = getTypePrefix(val->getType()); } if (val->hasName()) name += val->getName(); else name += utostr(uniqueNum++); sanitize(name); NameSet::iterator NI = UsedNames.find(name); if (NI != UsedNames.end()) name += std::string("_") + utostr(uniqueNum++); UsedNames.insert(name); return ValueNames[val] = name; } void CppWriter::printCppName(const Value* val) { printEscapedString(getCppName(val)); } void CppWriter::printAttributes(const AttrListPtr &PAL, const std::string &name) { Out << "AttrListPtr " << name << "_PAL;"; nl(Out); if (!PAL.isEmpty()) { Out << '{'; in(); nl(Out); Out << "SmallVector Attrs;"; nl(Out); Out << "AttributeWithIndex PAWI;"; nl(Out); for (unsigned i = 0; i < PAL.getNumSlots(); ++i) { unsigned index = PAL.getSlot(i).Index; Attributes attrs = PAL.getSlot(i).Attrs; Out << "PAWI.Index = " << index << "U; PAWI.Attrs = 0 "; #define HANDLE_ATTR(X) \ if (attrs & Attribute::X) \ Out << " | Attribute::" #X; \ attrs &= ~Attribute::X; HANDLE_ATTR(SExt); HANDLE_ATTR(ZExt); HANDLE_ATTR(NoReturn); HANDLE_ATTR(InReg); HANDLE_ATTR(StructRet); HANDLE_ATTR(NoUnwind); HANDLE_ATTR(NoAlias); HANDLE_ATTR(ByVal); HANDLE_ATTR(Nest); HANDLE_ATTR(ReadNone); HANDLE_ATTR(ReadOnly); HANDLE_ATTR(NoInline); HANDLE_ATTR(AlwaysInline); HANDLE_ATTR(OptimizeForSize); HANDLE_ATTR(StackProtect); HANDLE_ATTR(StackProtectReq); HANDLE_ATTR(NoCapture); HANDLE_ATTR(NoRedZone); HANDLE_ATTR(NoImplicitFloat); HANDLE_ATTR(Naked); HANDLE_ATTR(InlineHint); HANDLE_ATTR(ReturnsTwice); HANDLE_ATTR(UWTable); HANDLE_ATTR(NonLazyBind); #undef HANDLE_ATTR if (attrs & Attribute::StackAlignment) Out << " | Attribute::constructStackAlignmentFromInt(" << Attribute::getStackAlignmentFromAttrs(attrs) << ")"; attrs &= ~Attribute::StackAlignment; assert(attrs == 0 && "Unhandled attribute!"); Out << ";"; nl(Out); Out << "Attrs.push_back(PAWI);"; nl(Out); } Out << name 
<< "_PAL = AttrListPtr::get(Attrs.begin(), Attrs.end());"; nl(Out); out(); nl(Out); Out << '}'; nl(Out); } } void CppWriter::printType(Type* Ty) { // We don't print definitions for primitive types if (Ty->isPrimitiveType() || Ty->isIntegerTy()) return; // If we already defined this type, we don't need to define it again. if (DefinedTypes.find(Ty) != DefinedTypes.end()) return; // Everything below needs the name for the type so get it now. std::string typeName(getCppName(Ty)); // Print the type definition switch (Ty->getTypeID()) { case Type::FunctionTyID: { FunctionType* FT = cast(Ty); Out << "std::vector" << typeName << "_args;"; nl(Out); FunctionType::param_iterator PI = FT->param_begin(); FunctionType::param_iterator PE = FT->param_end(); for (; PI != PE; ++PI) { Type* argTy = static_cast(*PI); printType(argTy); std::string argName(getCppName(argTy)); Out << typeName << "_args.push_back(" << argName; Out << ");"; nl(Out); } printType(FT->getReturnType()); std::string retTypeName(getCppName(FT->getReturnType())); Out << "FunctionType* " << typeName << " = FunctionType::get("; in(); nl(Out) << "/*Result=*/" << retTypeName; Out << ","; nl(Out) << "/*Params=*/" << typeName << "_args,"; nl(Out) << "/*isVarArg=*/" << (FT->isVarArg() ? "true" : "false") << ");"; out(); nl(Out); break; } case Type::StructTyID: { StructType* ST = cast(Ty); if (!ST->isLiteral()) { Out << "StructType *" << typeName << " = mod->getTypeByName(\""; printEscapedString(ST->getName()); Out << "\");"; nl(Out); Out << "if (!" << typeName << ") {"; nl(Out); Out << typeName << " = "; Out << "StructType::create(mod->getContext(), \""; printEscapedString(ST->getName()); Out << "\");"; nl(Out); Out << "}"; nl(Out); // Indicate that this type is now defined. DefinedTypes.insert(Ty); } Out << "std::vector" << typeName << "_fields;"; nl(Out); StructType::element_iterator EI = ST->element_begin(); StructType::element_iterator EE = ST->element_end(); for (; EI != EE; ++EI) { Type* fieldTy = static_cast(*EI); printType(fieldTy); std::string fieldName(getCppName(fieldTy)); Out << typeName << "_fields.push_back(" << fieldName; Out << ");"; nl(Out); } if (ST->isLiteral()) { Out << "StructType *" << typeName << " = "; Out << "StructType::get(" << "mod->getContext(), "; } else { Out << "if (" << typeName << "->isOpaque()) {"; nl(Out); Out << typeName << "->setBody("; } Out << typeName << "_fields, /*isPacked=*/" << (ST->isPacked() ? 
"true" : "false") << ");"; nl(Out); if (!ST->isLiteral()) { Out << "}"; nl(Out); } break; } case Type::ArrayTyID: { ArrayType* AT = cast(Ty); Type* ET = AT->getElementType(); printType(ET); if (DefinedTypes.find(Ty) == DefinedTypes.end()) { std::string elemName(getCppName(ET)); Out << "ArrayType* " << typeName << " = ArrayType::get(" << elemName << ", " << utostr(AT->getNumElements()) << ");"; nl(Out); } break; } case Type::PointerTyID: { PointerType* PT = cast(Ty); Type* ET = PT->getElementType(); printType(ET); if (DefinedTypes.find(Ty) == DefinedTypes.end()) { std::string elemName(getCppName(ET)); Out << "PointerType* " << typeName << " = PointerType::get(" << elemName << ", " << utostr(PT->getAddressSpace()) << ");"; nl(Out); } break; } case Type::VectorTyID: { VectorType* PT = cast(Ty); Type* ET = PT->getElementType(); printType(ET); if (DefinedTypes.find(Ty) == DefinedTypes.end()) { std::string elemName(getCppName(ET)); Out << "VectorType* " << typeName << " = VectorType::get(" << elemName << ", " << utostr(PT->getNumElements()) << ");"; nl(Out); } break; } default: error("Invalid TypeID"); } // Indicate that this type is now defined. DefinedTypes.insert(Ty); // Finally, separate the type definition from other with a newline. nl(Out); } void CppWriter::printTypes(const Module* M) { // Add all of the global variables to the value table. for (Module::const_global_iterator I = TheModule->global_begin(), E = TheModule->global_end(); I != E; ++I) { if (I->hasInitializer()) printType(I->getInitializer()->getType()); printType(I->getType()); } // Add all the functions to the table for (Module::const_iterator FI = TheModule->begin(), FE = TheModule->end(); FI != FE; ++FI) { printType(FI->getReturnType()); printType(FI->getFunctionType()); // Add all the function arguments for (Function::const_arg_iterator AI = FI->arg_begin(), AE = FI->arg_end(); AI != AE; ++AI) { printType(AI->getType()); } // Add all of the basic blocks and instructions for (Function::const_iterator BB = FI->begin(), E = FI->end(); BB != E; ++BB) { printType(BB->getType()); for (BasicBlock::const_iterator I = BB->begin(), E = BB->end(); I!=E; ++I) { printType(I->getType()); for (unsigned i = 0; i < I->getNumOperands(); ++i) printType(I->getOperand(i)->getType()); } } } } // printConstant - Print out a constant pool entry... void CppWriter::printConstant(const Constant *CV) { // First, if the constant is actually a GlobalValue (variable or function) // or its already in the constant list then we've printed it already and we // can just return. 
if (isa(CV) || ValueNames.find(CV) != ValueNames.end()) return; std::string constName(getCppName(CV)); std::string typeName(getCppName(CV->getType())); if (isa(CV)) { // Skip variables and functions, we emit them elsewhere return; } if (const ConstantInt *CI = dyn_cast(CV)) { std::string constValue = CI->getValue().toString(10, true); Out << "ConstantInt* " << constName << " = ConstantInt::get(mod->getContext(), APInt(" << cast(CI->getType())->getBitWidth() << ", StringRef(\"" << constValue << "\"), 10));"; } else if (isa(CV)) { Out << "ConstantAggregateZero* " << constName << " = ConstantAggregateZero::get(" << typeName << ");"; } else if (isa(CV)) { Out << "ConstantPointerNull* " << constName << " = ConstantPointerNull::get(" << typeName << ");"; } else if (const ConstantFP *CFP = dyn_cast(CV)) { Out << "ConstantFP* " << constName << " = "; printCFP(CFP); Out << ";"; } else if (const ConstantArray *CA = dyn_cast(CV)) { if (CA->isString() && CA->getType()->getElementType() == Type::getInt8Ty(CA->getContext())) { Out << "Constant* " << constName << " = ConstantArray::get(mod->getContext(), \""; std::string tmp = CA->getAsString(); bool nullTerminate = false; if (tmp[tmp.length()-1] == 0) { tmp.erase(tmp.length()-1); nullTerminate = true; } printEscapedString(tmp); // Determine if we want null termination or not. if (nullTerminate) Out << "\", true"; // Indicate that the null terminator should be // added. else Out << "\", false";// No null terminator Out << ");"; } else { Out << "std::vector " << constName << "_elems;"; nl(Out); unsigned N = CA->getNumOperands(); for (unsigned i = 0; i < N; ++i) { printConstant(CA->getOperand(i)); // recurse to print operands Out << constName << "_elems.push_back(" << getCppName(CA->getOperand(i)) << ");"; nl(Out); } Out << "Constant* " << constName << " = ConstantArray::get(" << typeName << ", " << constName << "_elems);"; } } else if (const ConstantStruct *CS = dyn_cast(CV)) { Out << "std::vector " << constName << "_fields;"; nl(Out); unsigned N = CS->getNumOperands(); for (unsigned i = 0; i < N; i++) { printConstant(CS->getOperand(i)); Out << constName << "_fields.push_back(" << getCppName(CS->getOperand(i)) << ");"; nl(Out); } Out << "Constant* " << constName << " = ConstantStruct::get(" << typeName << ", " << constName << "_fields);"; } else if (const ConstantVector *CP = dyn_cast(CV)) { Out << "std::vector " << constName << "_elems;"; nl(Out); unsigned N = CP->getNumOperands(); for (unsigned i = 0; i < N; ++i) { printConstant(CP->getOperand(i)); Out << constName << "_elems.push_back(" << getCppName(CP->getOperand(i)) << ");"; nl(Out); } Out << "Constant* " << constName << " = ConstantVector::get(" << typeName << ", " << constName << "_elems);"; } else if (isa(CV)) { Out << "UndefValue* " << constName << " = UndefValue::get(" << typeName << ");"; } else if (const ConstantExpr *CE = dyn_cast(CV)) { if (CE->getOpcode() == Instruction::GetElementPtr) { Out << "std::vector " << constName << "_indices;"; nl(Out); printConstant(CE->getOperand(0)); for (unsigned i = 1; i < CE->getNumOperands(); ++i ) { printConstant(CE->getOperand(i)); Out << constName << "_indices.push_back(" << getCppName(CE->getOperand(i)) << ");"; nl(Out); } Out << "Constant* " << constName << " = ConstantExpr::getGetElementPtr(" << getCppName(CE->getOperand(0)) << ", " << constName << "_indices);"; } else if (CE->isCast()) { printConstant(CE->getOperand(0)); Out << "Constant* " << constName << " = ConstantExpr::getCast("; switch (CE->getOpcode()) { default: llvm_unreachable("Invalid 
cast opcode"); case Instruction::Trunc: Out << "Instruction::Trunc"; break; case Instruction::ZExt: Out << "Instruction::ZExt"; break; case Instruction::SExt: Out << "Instruction::SExt"; break; case Instruction::FPTrunc: Out << "Instruction::FPTrunc"; break; case Instruction::FPExt: Out << "Instruction::FPExt"; break; case Instruction::FPToUI: Out << "Instruction::FPToUI"; break; case Instruction::FPToSI: Out << "Instruction::FPToSI"; break; case Instruction::UIToFP: Out << "Instruction::UIToFP"; break; case Instruction::SIToFP: Out << "Instruction::SIToFP"; break; case Instruction::PtrToInt: Out << "Instruction::PtrToInt"; break; case Instruction::IntToPtr: Out << "Instruction::IntToPtr"; break; case Instruction::BitCast: Out << "Instruction::BitCast"; break; } Out << ", " << getCppName(CE->getOperand(0)) << ", " << getCppName(CE->getType()) << ");"; } else { unsigned N = CE->getNumOperands(); for (unsigned i = 0; i < N; ++i ) { printConstant(CE->getOperand(i)); } Out << "Constant* " << constName << " = ConstantExpr::"; switch (CE->getOpcode()) { case Instruction::Add: Out << "getAdd("; break; case Instruction::FAdd: Out << "getFAdd("; break; case Instruction::Sub: Out << "getSub("; break; case Instruction::FSub: Out << "getFSub("; break; case Instruction::Mul: Out << "getMul("; break; case Instruction::FMul: Out << "getFMul("; break; case Instruction::UDiv: Out << "getUDiv("; break; case Instruction::SDiv: Out << "getSDiv("; break; case Instruction::FDiv: Out << "getFDiv("; break; case Instruction::URem: Out << "getURem("; break; case Instruction::SRem: Out << "getSRem("; break; case Instruction::FRem: Out << "getFRem("; break; case Instruction::And: Out << "getAnd("; break; case Instruction::Or: Out << "getOr("; break; case Instruction::Xor: Out << "getXor("; break; case Instruction::ICmp: Out << "getICmp(ICmpInst::ICMP_"; switch (CE->getPredicate()) { case ICmpInst::ICMP_EQ: Out << "EQ"; break; case ICmpInst::ICMP_NE: Out << "NE"; break; case ICmpInst::ICMP_SLT: Out << "SLT"; break; case ICmpInst::ICMP_ULT: Out << "ULT"; break; case ICmpInst::ICMP_SGT: Out << "SGT"; break; case ICmpInst::ICMP_UGT: Out << "UGT"; break; case ICmpInst::ICMP_SLE: Out << "SLE"; break; case ICmpInst::ICMP_ULE: Out << "ULE"; break; case ICmpInst::ICMP_SGE: Out << "SGE"; break; case ICmpInst::ICMP_UGE: Out << "UGE"; break; default: error("Invalid ICmp Predicate"); } break; case Instruction::FCmp: Out << "getFCmp(FCmpInst::FCMP_"; switch (CE->getPredicate()) { case FCmpInst::FCMP_FALSE: Out << "FALSE"; break; case FCmpInst::FCMP_ORD: Out << "ORD"; break; case FCmpInst::FCMP_UNO: Out << "UNO"; break; case FCmpInst::FCMP_OEQ: Out << "OEQ"; break; case FCmpInst::FCMP_UEQ: Out << "UEQ"; break; case FCmpInst::FCMP_ONE: Out << "ONE"; break; case FCmpInst::FCMP_UNE: Out << "UNE"; break; case FCmpInst::FCMP_OLT: Out << "OLT"; break; case FCmpInst::FCMP_ULT: Out << "ULT"; break; case FCmpInst::FCMP_OGT: Out << "OGT"; break; case FCmpInst::FCMP_UGT: Out << "UGT"; break; case FCmpInst::FCMP_OLE: Out << "OLE"; break; case FCmpInst::FCMP_ULE: Out << "ULE"; break; case FCmpInst::FCMP_OGE: Out << "OGE"; break; case FCmpInst::FCMP_UGE: Out << "UGE"; break; case FCmpInst::FCMP_TRUE: Out << "TRUE"; break; default: error("Invalid FCmp Predicate"); } break; case Instruction::Shl: Out << "getShl("; break; case Instruction::LShr: Out << "getLShr("; break; case Instruction::AShr: Out << "getAShr("; break; case Instruction::Select: Out << "getSelect("; break; case Instruction::ExtractElement: Out << "getExtractElement("; break; case 
Instruction::InsertElement: Out << "getInsertElement("; break; case Instruction::ShuffleVector: Out << "getShuffleVector("; break; default: error("Invalid constant expression"); break; } Out << getCppName(CE->getOperand(0)); for (unsigned i = 1; i < CE->getNumOperands(); ++i) Out << ", " << getCppName(CE->getOperand(i)); Out << ");"; } } else if (const BlockAddress *BA = dyn_cast(CV)) { Out << "Constant* " << constName << " = "; Out << "BlockAddress::get(" << getOpName(BA->getBasicBlock()) << ");"; } else { error("Bad Constant"); Out << "Constant* " << constName << " = 0; "; } nl(Out); } void CppWriter::printConstants(const Module* M) { // Traverse all the global variables looking for constant initializers for (Module::const_global_iterator I = TheModule->global_begin(), E = TheModule->global_end(); I != E; ++I) if (I->hasInitializer()) printConstant(I->getInitializer()); // Traverse the LLVM functions looking for constants for (Module::const_iterator FI = TheModule->begin(), FE = TheModule->end(); FI != FE; ++FI) { // Add all of the basic blocks and instructions for (Function::const_iterator BB = FI->begin(), E = FI->end(); BB != E; ++BB) { for (BasicBlock::const_iterator I = BB->begin(), E = BB->end(); I!=E; ++I) { for (unsigned i = 0; i < I->getNumOperands(); ++i) { if (Constant* C = dyn_cast(I->getOperand(i))) { printConstant(C); } } } } } } void CppWriter::printVariableUses(const GlobalVariable *GV) { nl(Out) << "// Type Definitions"; nl(Out); printType(GV->getType()); if (GV->hasInitializer()) { const Constant *Init = GV->getInitializer(); printType(Init->getType()); if (const Function *F = dyn_cast(Init)) { nl(Out)<< "/ Function Declarations"; nl(Out); printFunctionHead(F); } else if (const GlobalVariable* gv = dyn_cast(Init)) { nl(Out) << "// Global Variable Declarations"; nl(Out); printVariableHead(gv); nl(Out) << "// Global Variable Definitions"; nl(Out); printVariableBody(gv); } else { nl(Out) << "// Constant Definitions"; nl(Out); printConstant(Init); } } } void CppWriter::printVariableHead(const GlobalVariable *GV) { nl(Out) << "GlobalVariable* " << getCppName(GV); if (is_inline) { Out << " = mod->getGlobalVariable(mod->getContext(), "; printEscapedString(GV->getName()); Out << ", " << getCppName(GV->getType()->getElementType()) << ",true)"; nl(Out) << "if (!" 
<< getCppName(GV) << ") {"; in(); nl(Out) << getCppName(GV); } Out << " = new GlobalVariable(/*Module=*/*mod, "; nl(Out) << "/*Type=*/"; printCppName(GV->getType()->getElementType()); Out << ","; nl(Out) << "/*isConstant=*/" << (GV->isConstant()?"true":"false"); Out << ","; nl(Out) << "/*Linkage=*/"; printLinkageType(GV->getLinkage()); Out << ","; nl(Out) << "/*Initializer=*/0, "; if (GV->hasInitializer()) { Out << "// has initializer, specified below"; } nl(Out) << "/*Name=*/\""; printEscapedString(GV->getName()); Out << "\");"; nl(Out); if (GV->hasSection()) { printCppName(GV); Out << "->setSection(\""; printEscapedString(GV->getSection()); Out << "\");"; nl(Out); } if (GV->getAlignment()) { printCppName(GV); Out << "->setAlignment(" << utostr(GV->getAlignment()) << ");"; nl(Out); } if (GV->getVisibility() != GlobalValue::DefaultVisibility) { printCppName(GV); Out << "->setVisibility("; printVisibilityType(GV->getVisibility()); Out << ");"; nl(Out); } if (GV->isThreadLocal()) { printCppName(GV); Out << "->setThreadLocal(true);"; nl(Out); } if (is_inline) { out(); Out << "}"; nl(Out); } } void CppWriter::printVariableBody(const GlobalVariable *GV) { if (GV->hasInitializer()) { printCppName(GV); Out << "->setInitializer("; Out << getCppName(GV->getInitializer()) << ");"; nl(Out); } } std::string CppWriter::getOpName(const Value* V) { if (!isa(V) || DefinedValues.find(V) != DefinedValues.end()) return getCppName(V); // See if its alread in the map of forward references, if so just return the // name we already set up for it ForwardRefMap::const_iterator I = ForwardRefs.find(V); if (I != ForwardRefs.end()) return I->second; // This is a new forward reference. Generate a unique name for it std::string result(std::string("fwdref_") + utostr(uniqueNum++)); // Yes, this is a hack. An Argument is the smallest instantiable value that // we can make as a placeholder for the real value. We'll replace these // Argument instances later. Out << "Argument* " << result << " = new Argument(" << getCppName(V->getType()) << ");"; nl(Out); ForwardRefs[V] = result; return result; } +static StringRef ConvertAtomicOrdering(AtomicOrdering Ordering) { + switch (Ordering) { + case NotAtomic: return "NotAtomic"; + case Unordered: return "Unordered"; + case Monotonic: return "Monotonic"; + case Acquire: return "Acquire"; + case Release: return "Release"; + case AcquireRelease: return "AcquireRelease"; + case SequentiallyConsistent: return "SequentiallyConsistent"; + } + llvm_unreachable("Unknown ordering"); +} + +static StringRef ConvertAtomicSynchScope(SynchronizationScope SynchScope) { + switch (SynchScope) { + case SingleThread: return "SingleThread"; + case CrossThread: return "CrossThread"; + } + llvm_unreachable("Unknown synch scope"); +} + // printInstruction - This member is called for each Instruction in a function. void CppWriter::printInstruction(const Instruction *I, const std::string& bbname) { std::string iName(getCppName(I)); // Before we emit this instruction, we need to take care of generating any // forward references. So, we get the names of all the operands in advance const unsigned Ops(I->getNumOperands()); std::string* opNames = new std::string[Ops]; for (unsigned i = 0; i < Ops; i++) opNames[i] = getOpName(I->getOperand(i)); switch (I->getOpcode()) { default: error("Invalid instruction"); break; case Instruction::Ret: { const ReturnInst* ret = cast(I); Out << "ReturnInst::Create(mod->getContext(), " << (ret->getReturnValue() ? 
opNames[0] + ", " : "") << bbname << ");"; break; } case Instruction::Br: { const BranchInst* br = cast(I); Out << "BranchInst::Create(" ; if (br->getNumOperands() == 3) { Out << opNames[2] << ", " << opNames[1] << ", " << opNames[0] << ", "; } else if (br->getNumOperands() == 1) { Out << opNames[0] << ", "; } else { error("Branch with 2 operands?"); } Out << bbname << ");"; break; } case Instruction::Switch: { const SwitchInst *SI = cast(I); Out << "SwitchInst* " << iName << " = SwitchInst::Create(" << getOpName(SI->getCondition()) << ", " << getOpName(SI->getDefaultDest()) << ", " << SI->getNumCases() << ", " << bbname << ");"; nl(Out); unsigned NumCases = SI->getNumCases(); for (unsigned i = 1; i < NumCases; ++i) { const ConstantInt* CaseVal = SI->getCaseValue(i); const BasicBlock* BB = SI->getSuccessor(i); Out << iName << "->addCase(" << getOpName(CaseVal) << ", " << getOpName(BB) << ");"; nl(Out); } break; } case Instruction::IndirectBr: { const IndirectBrInst *IBI = cast(I); Out << "IndirectBrInst *" << iName << " = IndirectBrInst::Create(" << opNames[0] << ", " << IBI->getNumDestinations() << ");"; nl(Out); for (unsigned i = 1; i != IBI->getNumOperands(); ++i) { Out << iName << "->addDestination(" << opNames[i] << ");"; nl(Out); } break; } case Instruction::Resume: { Out << "ResumeInst::Create(mod->getContext(), " << opNames[0] << ", " << bbname << ");"; break; } case Instruction::Invoke: { const InvokeInst* inv = cast(I); Out << "std::vector " << iName << "_params;"; nl(Out); for (unsigned i = 0; i < inv->getNumArgOperands(); ++i) { Out << iName << "_params.push_back(" << getOpName(inv->getArgOperand(i)) << ");"; nl(Out); } // FIXME: This shouldn't use magic numbers -3, -2, and -1. Out << "InvokeInst *" << iName << " = InvokeInst::Create(" << getOpName(inv->getCalledFunction()) << ", " << getOpName(inv->getNormalDest()) << ", " << getOpName(inv->getUnwindDest()) << ", " << iName << "_params, \""; printEscapedString(inv->getName()); Out << "\", " << bbname << ");"; nl(Out) << iName << "->setCallingConv("; printCallingConv(inv->getCallingConv()); Out << ");"; printAttributes(inv->getAttributes(), iName); Out << iName << "->setAttributes(" << iName << "_PAL);"; nl(Out); break; } case Instruction::Unwind: { Out << "new UnwindInst(" << bbname << ");"; break; } case Instruction::Unreachable: { Out << "new UnreachableInst(" << "mod->getContext(), " << bbname << ");"; break; } case Instruction::Add: case Instruction::FAdd: case Instruction::Sub: case Instruction::FSub: case Instruction::Mul: case Instruction::FMul: case Instruction::UDiv: case Instruction::SDiv: case Instruction::FDiv: case Instruction::URem: case Instruction::SRem: case Instruction::FRem: case Instruction::And: case Instruction::Or: case Instruction::Xor: case Instruction::Shl: case Instruction::LShr: case Instruction::AShr:{ Out << "BinaryOperator* " << iName << " = BinaryOperator::Create("; switch (I->getOpcode()) { case Instruction::Add: Out << "Instruction::Add"; break; case Instruction::FAdd: Out << "Instruction::FAdd"; break; case Instruction::Sub: Out << "Instruction::Sub"; break; case Instruction::FSub: Out << "Instruction::FSub"; break; case Instruction::Mul: Out << "Instruction::Mul"; break; case Instruction::FMul: Out << "Instruction::FMul"; break; case Instruction::UDiv:Out << "Instruction::UDiv"; break; case Instruction::SDiv:Out << "Instruction::SDiv"; break; case Instruction::FDiv:Out << "Instruction::FDiv"; break; case Instruction::URem:Out << "Instruction::URem"; break; case Instruction::SRem:Out << 
"Instruction::SRem"; break; case Instruction::FRem:Out << "Instruction::FRem"; break; case Instruction::And: Out << "Instruction::And"; break; case Instruction::Or: Out << "Instruction::Or"; break; case Instruction::Xor: Out << "Instruction::Xor"; break; case Instruction::Shl: Out << "Instruction::Shl"; break; case Instruction::LShr:Out << "Instruction::LShr"; break; case Instruction::AShr:Out << "Instruction::AShr"; break; default: Out << "Instruction::BadOpCode"; break; } Out << ", " << opNames[0] << ", " << opNames[1] << ", \""; printEscapedString(I->getName()); Out << "\", " << bbname << ");"; break; } case Instruction::FCmp: { Out << "FCmpInst* " << iName << " = new FCmpInst(*" << bbname << ", "; switch (cast(I)->getPredicate()) { case FCmpInst::FCMP_FALSE: Out << "FCmpInst::FCMP_FALSE"; break; case FCmpInst::FCMP_OEQ : Out << "FCmpInst::FCMP_OEQ"; break; case FCmpInst::FCMP_OGT : Out << "FCmpInst::FCMP_OGT"; break; case FCmpInst::FCMP_OGE : Out << "FCmpInst::FCMP_OGE"; break; case FCmpInst::FCMP_OLT : Out << "FCmpInst::FCMP_OLT"; break; case FCmpInst::FCMP_OLE : Out << "FCmpInst::FCMP_OLE"; break; case FCmpInst::FCMP_ONE : Out << "FCmpInst::FCMP_ONE"; break; case FCmpInst::FCMP_ORD : Out << "FCmpInst::FCMP_ORD"; break; case FCmpInst::FCMP_UNO : Out << "FCmpInst::FCMP_UNO"; break; case FCmpInst::FCMP_UEQ : Out << "FCmpInst::FCMP_UEQ"; break; case FCmpInst::FCMP_UGT : Out << "FCmpInst::FCMP_UGT"; break; case FCmpInst::FCMP_UGE : Out << "FCmpInst::FCMP_UGE"; break; case FCmpInst::FCMP_ULT : Out << "FCmpInst::FCMP_ULT"; break; case FCmpInst::FCMP_ULE : Out << "FCmpInst::FCMP_ULE"; break; case FCmpInst::FCMP_UNE : Out << "FCmpInst::FCMP_UNE"; break; case FCmpInst::FCMP_TRUE : Out << "FCmpInst::FCMP_TRUE"; break; default: Out << "FCmpInst::BAD_ICMP_PREDICATE"; break; } Out << ", " << opNames[0] << ", " << opNames[1] << ", \""; printEscapedString(I->getName()); Out << "\");"; break; } case Instruction::ICmp: { Out << "ICmpInst* " << iName << " = new ICmpInst(*" << bbname << ", "; switch (cast(I)->getPredicate()) { case ICmpInst::ICMP_EQ: Out << "ICmpInst::ICMP_EQ"; break; case ICmpInst::ICMP_NE: Out << "ICmpInst::ICMP_NE"; break; case ICmpInst::ICMP_ULE: Out << "ICmpInst::ICMP_ULE"; break; case ICmpInst::ICMP_SLE: Out << "ICmpInst::ICMP_SLE"; break; case ICmpInst::ICMP_UGE: Out << "ICmpInst::ICMP_UGE"; break; case ICmpInst::ICMP_SGE: Out << "ICmpInst::ICMP_SGE"; break; case ICmpInst::ICMP_ULT: Out << "ICmpInst::ICMP_ULT"; break; case ICmpInst::ICMP_SLT: Out << "ICmpInst::ICMP_SLT"; break; case ICmpInst::ICMP_UGT: Out << "ICmpInst::ICMP_UGT"; break; case ICmpInst::ICMP_SGT: Out << "ICmpInst::ICMP_SGT"; break; default: Out << "ICmpInst::BAD_ICMP_PREDICATE"; break; } Out << ", " << opNames[0] << ", " << opNames[1] << ", \""; printEscapedString(I->getName()); Out << "\");"; break; } case Instruction::Alloca: { const AllocaInst* allocaI = cast(I); Out << "AllocaInst* " << iName << " = new AllocaInst(" << getCppName(allocaI->getAllocatedType()) << ", "; if (allocaI->isArrayAllocation()) Out << opNames[0] << ", "; Out << "\""; printEscapedString(allocaI->getName()); Out << "\", " << bbname << ");"; if (allocaI->getAlignment()) nl(Out) << iName << "->setAlignment(" << allocaI->getAlignment() << ");"; break; } case Instruction::Load: { const LoadInst* load = cast(I); Out << "LoadInst* " << iName << " = new LoadInst(" << opNames[0] << ", \""; printEscapedString(load->getName()); Out << "\", " << (load->isVolatile() ? 
"true" : "false" ) << ", " << bbname << ");"; + if (load->getAlignment()) + nl(Out) << iName << "->setAlignment(" + << load->getAlignment() << ");"; + if (load->isAtomic()) { + StringRef Ordering = ConvertAtomicOrdering(load->getOrdering()); + StringRef CrossThread = ConvertAtomicSynchScope(load->getSynchScope()); + nl(Out) << iName << "->setAtomic(" + << Ordering << ", " << CrossThread << ");"; + } break; } case Instruction::Store: { const StoreInst* store = cast(I); - Out << " new StoreInst(" + Out << "StoreInst* " << iName << " = new StoreInst(" << opNames[0] << ", " << opNames[1] << ", " << (store->isVolatile() ? "true" : "false") << ", " << bbname << ");"; + if (store->getAlignment()) + nl(Out) << iName << "->setAlignment(" + << store->getAlignment() << ");"; + if (store->isAtomic()) { + StringRef Ordering = ConvertAtomicOrdering(store->getOrdering()); + StringRef CrossThread = ConvertAtomicSynchScope(store->getSynchScope()); + nl(Out) << iName << "->setAtomic(" + << Ordering << ", " << CrossThread << ");"; + } break; } case Instruction::GetElementPtr: { const GetElementPtrInst* gep = cast(I); if (gep->getNumOperands() <= 2) { Out << "GetElementPtrInst* " << iName << " = GetElementPtrInst::Create(" << opNames[0]; if (gep->getNumOperands() == 2) Out << ", " << opNames[1]; } else { Out << "std::vector " << iName << "_indices;"; nl(Out); for (unsigned i = 1; i < gep->getNumOperands(); ++i ) { Out << iName << "_indices.push_back(" << opNames[i] << ");"; nl(Out); } Out << "Instruction* " << iName << " = GetElementPtrInst::Create(" << opNames[0] << ", " << iName << "_indices"; } Out << ", \""; printEscapedString(gep->getName()); Out << "\", " << bbname << ");"; break; } case Instruction::PHI: { const PHINode* phi = cast(I); Out << "PHINode* " << iName << " = PHINode::Create(" << getCppName(phi->getType()) << ", " << phi->getNumIncomingValues() << ", \""; printEscapedString(phi->getName()); Out << "\", " << bbname << ");"; nl(Out); for (unsigned i = 0; i < phi->getNumIncomingValues(); ++i) { Out << iName << "->addIncoming(" << opNames[PHINode::getOperandNumForIncomingValue(i)] << ", " << getOpName(phi->getIncomingBlock(i)) << ");"; nl(Out); } break; } case Instruction::Trunc: case Instruction::ZExt: case Instruction::SExt: case Instruction::FPTrunc: case Instruction::FPExt: case Instruction::FPToUI: case Instruction::FPToSI: case Instruction::UIToFP: case Instruction::SIToFP: case Instruction::PtrToInt: case Instruction::IntToPtr: case Instruction::BitCast: { const CastInst* cst = cast(I); Out << "CastInst* " << iName << " = new "; switch (I->getOpcode()) { case Instruction::Trunc: Out << "TruncInst"; break; case Instruction::ZExt: Out << "ZExtInst"; break; case Instruction::SExt: Out << "SExtInst"; break; case Instruction::FPTrunc: Out << "FPTruncInst"; break; case Instruction::FPExt: Out << "FPExtInst"; break; case Instruction::FPToUI: Out << "FPToUIInst"; break; case Instruction::FPToSI: Out << "FPToSIInst"; break; case Instruction::UIToFP: Out << "UIToFPInst"; break; case Instruction::SIToFP: Out << "SIToFPInst"; break; case Instruction::PtrToInt: Out << "PtrToIntInst"; break; case Instruction::IntToPtr: Out << "IntToPtrInst"; break; case Instruction::BitCast: Out << "BitCastInst"; break; default: assert(0 && "Unreachable"); break; } Out << "(" << opNames[0] << ", " << getCppName(cst->getType()) << ", \""; printEscapedString(cst->getName()); Out << "\", " << bbname << ");"; break; } case Instruction::Call: { const CallInst* call = cast(I); if (const InlineAsm* ila = 
dyn_cast(call->getCalledValue())) { Out << "InlineAsm* " << getCppName(ila) << " = InlineAsm::get(" << getCppName(ila->getFunctionType()) << ", \"" << ila->getAsmString() << "\", \"" << ila->getConstraintString() << "\"," << (ila->hasSideEffects() ? "true" : "false") << ");"; nl(Out); } if (call->getNumArgOperands() > 1) { Out << "std::vector " << iName << "_params;"; nl(Out); for (unsigned i = 0; i < call->getNumArgOperands(); ++i) { Out << iName << "_params.push_back(" << opNames[i] << ");"; nl(Out); } Out << "CallInst* " << iName << " = CallInst::Create(" << opNames[call->getNumArgOperands()] << ", " << iName << "_params, \""; } else if (call->getNumArgOperands() == 1) { Out << "CallInst* " << iName << " = CallInst::Create(" << opNames[call->getNumArgOperands()] << ", " << opNames[0] << ", \""; } else { Out << "CallInst* " << iName << " = CallInst::Create(" << opNames[call->getNumArgOperands()] << ", \""; } printEscapedString(call->getName()); Out << "\", " << bbname << ");"; nl(Out) << iName << "->setCallingConv("; printCallingConv(call->getCallingConv()); Out << ");"; nl(Out) << iName << "->setTailCall(" << (call->isTailCall() ? "true" : "false"); Out << ");"; nl(Out); printAttributes(call->getAttributes(), iName); Out << iName << "->setAttributes(" << iName << "_PAL);"; nl(Out); break; } case Instruction::Select: { const SelectInst* sel = cast(I); Out << "SelectInst* " << getCppName(sel) << " = SelectInst::Create("; Out << opNames[0] << ", " << opNames[1] << ", " << opNames[2] << ", \""; printEscapedString(sel->getName()); Out << "\", " << bbname << ");"; break; } case Instruction::UserOp1: /// FALL THROUGH case Instruction::UserOp2: { /// FIXME: What should be done here? break; } case Instruction::VAArg: { const VAArgInst* va = cast(I); Out << "VAArgInst* " << getCppName(va) << " = new VAArgInst(" << opNames[0] << ", " << getCppName(va->getType()) << ", \""; printEscapedString(va->getName()); Out << "\", " << bbname << ");"; break; } case Instruction::ExtractElement: { const ExtractElementInst* eei = cast(I); Out << "ExtractElementInst* " << getCppName(eei) << " = new ExtractElementInst(" << opNames[0] << ", " << opNames[1] << ", \""; printEscapedString(eei->getName()); Out << "\", " << bbname << ");"; break; } case Instruction::InsertElement: { const InsertElementInst* iei = cast(I); Out << "InsertElementInst* " << getCppName(iei) << " = InsertElementInst::Create(" << opNames[0] << ", " << opNames[1] << ", " << opNames[2] << ", \""; printEscapedString(iei->getName()); Out << "\", " << bbname << ");"; break; } case Instruction::ShuffleVector: { const ShuffleVectorInst* svi = cast(I); Out << "ShuffleVectorInst* " << getCppName(svi) << " = new ShuffleVectorInst(" << opNames[0] << ", " << opNames[1] << ", " << opNames[2] << ", \""; printEscapedString(svi->getName()); Out << "\", " << bbname << ");"; break; } case Instruction::ExtractValue: { const ExtractValueInst *evi = cast(I); Out << "std::vector " << iName << "_indices;"; nl(Out); for (unsigned i = 0; i < evi->getNumIndices(); ++i) { Out << iName << "_indices.push_back(" << evi->idx_begin()[i] << ");"; nl(Out); } Out << "ExtractValueInst* " << getCppName(evi) << " = ExtractValueInst::Create(" << opNames[0] << ", " << iName << "_indices, \""; printEscapedString(evi->getName()); Out << "\", " << bbname << ");"; break; } case Instruction::InsertValue: { const InsertValueInst *ivi = cast(I); Out << "std::vector " << iName << "_indices;"; nl(Out); for (unsigned i = 0; i < ivi->getNumIndices(); ++i) { Out << iName << 
"_indices.push_back(" << ivi->idx_begin()[i] << ");"; nl(Out); } Out << "InsertValueInst* " << getCppName(ivi) << " = InsertValueInst::Create(" << opNames[0] << ", " << opNames[1] << ", " << iName << "_indices, \""; printEscapedString(ivi->getName()); Out << "\", " << bbname << ");"; break; } + case Instruction::Fence: { + const FenceInst *fi = cast(I); + StringRef Ordering = ConvertAtomicOrdering(fi->getOrdering()); + StringRef CrossThread = ConvertAtomicSynchScope(fi->getSynchScope()); + Out << "FenceInst* " << iName + << " = new FenceInst(mod->getContext(), " + << Ordering << ", " << CrossThread << ", " << bbname + << ");"; + break; } + case Instruction::AtomicCmpXchg: { + const AtomicCmpXchgInst *cxi = cast(I); + StringRef Ordering = ConvertAtomicOrdering(cxi->getOrdering()); + StringRef CrossThread = ConvertAtomicSynchScope(cxi->getSynchScope()); + Out << "AtomicCmpXchgInst* " << iName + << " = new AtomicCmpXchgInst(" + << opNames[0] << ", " << opNames[1] << ", " << opNames[2] << ", " + << Ordering << ", " << CrossThread << ", " << bbname + << ");"; + nl(Out) << iName << "->setName(\""; + printEscapedString(cxi->getName()); + Out << "\");"; + break; + } + case Instruction::AtomicRMW: { + const AtomicRMWInst *rmwi = cast(I); + StringRef Ordering = ConvertAtomicOrdering(rmwi->getOrdering()); + StringRef CrossThread = ConvertAtomicSynchScope(rmwi->getSynchScope()); + StringRef Operation; + switch (rmwi->getOperation()) { + case AtomicRMWInst::Xchg: Operation = "AtomicRMWInst::Xchg"; break; + case AtomicRMWInst::Add: Operation = "AtomicRMWInst::Add"; break; + case AtomicRMWInst::Sub: Operation = "AtomicRMWInst::Sub"; break; + case AtomicRMWInst::And: Operation = "AtomicRMWInst::And"; break; + case AtomicRMWInst::Nand: Operation = "AtomicRMWInst::Nand"; break; + case AtomicRMWInst::Or: Operation = "AtomicRMWInst::Or"; break; + case AtomicRMWInst::Xor: Operation = "AtomicRMWInst::Xor"; break; + case AtomicRMWInst::Max: Operation = "AtomicRMWInst::Max"; break; + case AtomicRMWInst::Min: Operation = "AtomicRMWInst::Min"; break; + case AtomicRMWInst::UMax: Operation = "AtomicRMWInst::UMax"; break; + case AtomicRMWInst::UMin: Operation = "AtomicRMWInst::UMin"; break; + case AtomicRMWInst::BAD_BINOP: llvm_unreachable("Bad atomic operation"); + } + Out << "AtomicRMWInst* " << iName + << " = new AtomicRMWInst(" + << Operation << ", " + << opNames[0] << ", " << opNames[1] << ", " + << Ordering << ", " << CrossThread << ", " << bbname + << ");"; + nl(Out) << iName << "->setName(\""; + printEscapedString(rmwi->getName()); + Out << "\");"; + break; + } + } DefinedValues.insert(I); nl(Out); delete [] opNames; } // Print out the types, constants and declarations needed by one function void CppWriter::printFunctionUses(const Function* F) { nl(Out) << "// Type Definitions"; nl(Out); if (!is_inline) { // Print the function's return type printType(F->getReturnType()); // Print the function's function type printType(F->getFunctionType()); // Print the types of each of the function's arguments for (Function::const_arg_iterator AI = F->arg_begin(), AE = F->arg_end(); AI != AE; ++AI) { printType(AI->getType()); } } // Print type definitions for every type referenced by an instruction and // make a note of any global values or constants that are referenced SmallPtrSet gvs; SmallPtrSet consts; for (Function::const_iterator BB = F->begin(), BE = F->end(); BB != BE; ++BB){ for (BasicBlock::const_iterator I = BB->begin(), E = BB->end(); I != E; ++I) { // Print the type of the instruction itself 
printType(I->getType()); // Print the type of each of the instruction's operands for (unsigned i = 0; i < I->getNumOperands(); ++i) { Value* operand = I->getOperand(i); printType(operand->getType()); // If the operand references a GVal or Constant, make a note of it if (GlobalValue* GV = dyn_cast(operand)) { gvs.insert(GV); if (GenerationType != GenFunction) if (GlobalVariable *GVar = dyn_cast(GV)) if (GVar->hasInitializer()) consts.insert(GVar->getInitializer()); } else if (Constant* C = dyn_cast(operand)) { consts.insert(C); for (unsigned j = 0; j < C->getNumOperands(); ++j) { // If the operand references a GVal or Constant, make a note of it Value* operand = C->getOperand(j); printType(operand->getType()); if (GlobalValue* GV = dyn_cast(operand)) { gvs.insert(GV); if (GenerationType != GenFunction) if (GlobalVariable *GVar = dyn_cast(GV)) if (GVar->hasInitializer()) consts.insert(GVar->getInitializer()); } } } } } } // Print the function declarations for any functions encountered nl(Out) << "// Function Declarations"; nl(Out); for (SmallPtrSet::iterator I = gvs.begin(), E = gvs.end(); I != E; ++I) { if (Function* Fun = dyn_cast(*I)) { if (!is_inline || Fun != F) printFunctionHead(Fun); } } // Print the global variable declarations for any variables encountered nl(Out) << "// Global Variable Declarations"; nl(Out); for (SmallPtrSet::iterator I = gvs.begin(), E = gvs.end(); I != E; ++I) { if (GlobalVariable* F = dyn_cast(*I)) printVariableHead(F); } // Print the constants found nl(Out) << "// Constant Definitions"; nl(Out); for (SmallPtrSet::iterator I = consts.begin(), E = consts.end(); I != E; ++I) { printConstant(*I); } // Process the global variables definitions now that all the constants have // been emitted. These definitions just couple the gvars with their constant // initializers. if (GenerationType != GenFunction) { nl(Out) << "// Global Variable Definitions"; nl(Out); for (SmallPtrSet::iterator I = gvs.begin(), E = gvs.end(); I != E; ++I) { if (GlobalVariable* GV = dyn_cast(*I)) printVariableBody(GV); } } } void CppWriter::printFunctionHead(const Function* F) { nl(Out) << "Function* " << getCppName(F); Out << " = mod->getFunction(\""; printEscapedString(F->getName()); Out << "\");"; nl(Out) << "if (!" << getCppName(F) << ") {"; nl(Out) << getCppName(F); Out<< " = Function::Create("; nl(Out,1) << "/*Type=*/" << getCppName(F->getFunctionType()) << ","; nl(Out) << "/*Linkage=*/"; printLinkageType(F->getLinkage()); Out << ","; nl(Out) << "/*Name=*/\""; printEscapedString(F->getName()); Out << "\", mod); " << (F->isDeclaration()? "// (external, no body)" : ""); nl(Out,-1); printCppName(F); Out << "->setCallingConv("; printCallingConv(F->getCallingConv()); Out << ");"; nl(Out); if (F->hasSection()) { printCppName(F); Out << "->setSection(\"" << F->getSection() << "\");"; nl(Out); } if (F->getAlignment()) { printCppName(F); Out << "->setAlignment(" << F->getAlignment() << ");"; nl(Out); } if (F->getVisibility() != GlobalValue::DefaultVisibility) { printCppName(F); Out << "->setVisibility("; printVisibilityType(F->getVisibility()); Out << ");"; nl(Out); } if (F->hasGC()) { printCppName(F); Out << "->setGC(\"" << F->getGC() << "\");"; nl(Out); } Out << "}"; nl(Out); printAttributes(F->getAttributes(), getCppName(F)); printCppName(F); Out << "->setAttributes(" << getCppName(F) << "_PAL);"; nl(Out); } void CppWriter::printFunctionBody(const Function *F) { if (F->isDeclaration()) return; // external functions have no bodies. 
// Clear the DefinedValues and ForwardRefs maps because we can't have // cross-function forward refs ForwardRefs.clear(); DefinedValues.clear(); // Create all the argument values if (!is_inline) { if (!F->arg_empty()) { Out << "Function::arg_iterator args = " << getCppName(F) << "->arg_begin();"; nl(Out); } for (Function::const_arg_iterator AI = F->arg_begin(), AE = F->arg_end(); AI != AE; ++AI) { Out << "Value* " << getCppName(AI) << " = args++;"; nl(Out); if (AI->hasName()) { - Out << getCppName(AI) << "->setName(\"" << AI->getName() << "\");"; + Out << getCppName(AI) << "->setName(\""; + printEscapedString(AI->getName()); + Out << "\");"; nl(Out); } } } // Create all the basic blocks nl(Out); for (Function::const_iterator BI = F->begin(), BE = F->end(); BI != BE; ++BI) { std::string bbname(getCppName(BI)); Out << "BasicBlock* " << bbname << " = BasicBlock::Create(mod->getContext(), \""; if (BI->hasName()) printEscapedString(BI->getName()); Out << "\"," << getCppName(BI->getParent()) << ",0);"; nl(Out); } // Output all of its basic blocks... for the function for (Function::const_iterator BI = F->begin(), BE = F->end(); BI != BE; ++BI) { std::string bbname(getCppName(BI)); nl(Out) << "// Block " << BI->getName() << " (" << bbname << ")"; nl(Out); // Output all of the instructions in the basic block... for (BasicBlock::const_iterator I = BI->begin(), E = BI->end(); I != E; ++I) { printInstruction(I,bbname); } } // Loop over the ForwardRefs and resolve them now that all instructions // are generated. if (!ForwardRefs.empty()) { nl(Out) << "// Resolve Forward References"; nl(Out); } while (!ForwardRefs.empty()) { ForwardRefMap::iterator I = ForwardRefs.begin(); Out << I->second << "->replaceAllUsesWith(" << getCppName(I->first) << "); delete " << I->second << ";"; nl(Out); ForwardRefs.erase(I); } } void CppWriter::printInline(const std::string& fname, const std::string& func) { const Function* F = TheModule->getFunction(func); if (!F) { error(std::string("Function '") + func + "' not found in input module"); return; } if (F->isDeclaration()) { error(std::string("Function '") + func + "' is external!"); return; } nl(Out) << "BasicBlock* " << fname << "(Module* mod, Function *" << getCppName(F); unsigned arg_count = 1; for (Function::const_arg_iterator AI = F->arg_begin(), AE = F->arg_end(); AI != AE; ++AI) { Out << ", Value* arg_" << arg_count; } Out << ") {"; nl(Out); is_inline = true; printFunctionUses(F); printFunctionBody(F); is_inline = false; Out << "return " << getCppName(F->begin()) << ";"; nl(Out) << "}"; nl(Out); } void CppWriter::printModuleBody() { // Print out all the type definitions nl(Out) << "// Type Definitions"; nl(Out); printTypes(TheModule); // Functions can call each other and global variables can reference them so // define all the functions first before emitting their function bodies. nl(Out) << "// Function Declarations"; nl(Out); for (Module::const_iterator I = TheModule->begin(), E = TheModule->end(); I != E; ++I) printFunctionHead(I); // Process the global variables declarations. We can't initialze them until // after the constants are printed so just print a header for each global nl(Out) << "// Global Variable Declarations\n"; nl(Out); for (Module::const_global_iterator I = TheModule->global_begin(), E = TheModule->global_end(); I != E; ++I) { printVariableHead(I); } // Print out all the constants definitions. Constants don't recurse except // through GlobalValues. All GlobalValues have been declared at this point // so we can proceed to generate the constants. 
nl(Out) << "// Constant Definitions"; nl(Out); printConstants(TheModule); // Process the global variables definitions now that all the constants have // been emitted. These definitions just couple the gvars with their constant // initializers. nl(Out) << "// Global Variable Definitions"; nl(Out); for (Module::const_global_iterator I = TheModule->global_begin(), E = TheModule->global_end(); I != E; ++I) { printVariableBody(I); } // Finally, we can safely put out all of the function bodies. nl(Out) << "// Function Definitions"; nl(Out); for (Module::const_iterator I = TheModule->begin(), E = TheModule->end(); I != E; ++I) { if (!I->isDeclaration()) { nl(Out) << "// Function: " << I->getName() << " (" << getCppName(I) << ")"; nl(Out) << "{"; nl(Out,1); printFunctionBody(I); nl(Out,-1) << "}"; nl(Out); } } } void CppWriter::printProgram(const std::string& fname, const std::string& mName) { Out << "#include \n"; Out << "#include \n"; Out << "#include \n"; Out << "#include \n"; Out << "#include \n"; Out << "#include \n"; Out << "#include \n"; Out << "#include \n"; Out << "#include \n"; Out << "#include \n"; Out << "#include \n"; Out << "#include \n"; Out << "#include \n"; Out << "#include \n"; Out << "#include \n"; Out << "#include \n"; Out << "#include \n"; Out << "#include \n"; Out << "using namespace llvm;\n\n"; Out << "Module* " << fname << "();\n\n"; Out << "int main(int argc, char**argv) {\n"; Out << " Module* Mod = " << fname << "();\n"; Out << " verifyModule(*Mod, PrintMessageAction);\n"; Out << " PassManager PM;\n"; Out << " PM.add(createPrintModulePass(&outs()));\n"; Out << " PM.run(*Mod);\n"; Out << " return 0;\n"; Out << "}\n\n"; printModule(fname,mName); } void CppWriter::printModule(const std::string& fname, const std::string& mName) { nl(Out) << "Module* " << fname << "() {"; nl(Out,1) << "// Module Construction"; nl(Out) << "Module* mod = new Module(\""; printEscapedString(mName); Out << "\", getGlobalContext());"; if (!TheModule->getTargetTriple().empty()) { nl(Out) << "mod->setDataLayout(\"" << TheModule->getDataLayout() << "\");"; } if (!TheModule->getTargetTriple().empty()) { nl(Out) << "mod->setTargetTriple(\"" << TheModule->getTargetTriple() << "\");"; } if (!TheModule->getModuleInlineAsm().empty()) { nl(Out) << "mod->setModuleInlineAsm(\""; printEscapedString(TheModule->getModuleInlineAsm()); Out << "\");"; } nl(Out); // Loop over the dependent libraries and emit them. 
Module::lib_iterator LI = TheModule->lib_begin(); Module::lib_iterator LE = TheModule->lib_end(); while (LI != LE) { Out << "mod->addLibrary(\"" << *LI << "\");"; nl(Out); ++LI; } printModuleBody(); nl(Out) << "return mod;"; nl(Out,-1) << "}"; nl(Out); } void CppWriter::printContents(const std::string& fname, const std::string& mName) { Out << "\nModule* " << fname << "(Module *mod) {\n"; Out << "\nmod->setModuleIdentifier(\""; printEscapedString(mName); Out << "\");\n"; printModuleBody(); Out << "\nreturn mod;\n"; Out << "\n}\n"; } void CppWriter::printFunction(const std::string& fname, const std::string& funcName) { const Function* F = TheModule->getFunction(funcName); if (!F) { error(std::string("Function '") + funcName + "' not found in input module"); return; } Out << "\nFunction* " << fname << "(Module *mod) {\n"; printFunctionUses(F); printFunctionHead(F); printFunctionBody(F); Out << "return " << getCppName(F) << ";\n"; Out << "}\n"; } void CppWriter::printFunctions() { const Module::FunctionListType &funcs = TheModule->getFunctionList(); Module::const_iterator I = funcs.begin(); Module::const_iterator IE = funcs.end(); for (; I != IE; ++I) { const Function &func = *I; if (!func.isDeclaration()) { std::string name("define_"); name += func.getName(); printFunction(name, func.getName()); } } } void CppWriter::printVariable(const std::string& fname, const std::string& varName) { const GlobalVariable* GV = TheModule->getNamedGlobal(varName); if (!GV) { error(std::string("Variable '") + varName + "' not found in input module"); return; } Out << "\nGlobalVariable* " << fname << "(Module *mod) {\n"; printVariableUses(GV); printVariableHead(GV); printVariableBody(GV); Out << "return " << getCppName(GV) << ";\n"; Out << "}\n"; } void CppWriter::printType(const std::string &fname, const std::string &typeName) { Type* Ty = TheModule->getTypeByName(typeName); if (!Ty) { error(std::string("Type '") + typeName + "' not found in input module"); return; } Out << "\nType* " << fname << "(Module *mod) {\n"; printType(Ty); Out << "return " << getCppName(Ty) << ";\n"; Out << "}\n"; } bool CppWriter::runOnModule(Module &M) { TheModule = &M; // Emit a header Out << "// Generated by llvm2cpp - DO NOT MODIFY!\n\n"; // Get the name of the function we're supposed to generate std::string fname = FuncName.getValue(); // Get the name of the thing we are to generate std::string tgtname = NameToGenerate.getValue(); if (GenerationType == GenModule || GenerationType == GenContents || GenerationType == GenProgram || GenerationType == GenFunctions) { if (tgtname == "!bad!") { if (M.getModuleIdentifier() == "-") tgtname = ""; else tgtname = M.getModuleIdentifier(); } } else if (tgtname == "!bad!") error("You must use the -for option with -gen-{function,variable,type}"); switch (WhatToGenerate(GenerationType)) { case GenProgram: if (fname.empty()) fname = "makeLLVMModule"; printProgram(fname,tgtname); break; case GenModule: if (fname.empty()) fname = "makeLLVMModule"; printModule(fname,tgtname); break; case GenContents: if (fname.empty()) fname = "makeLLVMModuleContents"; printContents(fname,tgtname); break; case GenFunction: if (fname.empty()) fname = "makeLLVMFunction"; printFunction(fname,tgtname); break; case GenFunctions: printFunctions(); break; case GenInline: if (fname.empty()) fname = "makeLLVMInline"; printInline(fname,tgtname); break; case GenVariable: if (fname.empty()) fname = "makeLLVMVariable"; printVariable(fname,tgtname); break; case GenType: if (fname.empty()) fname = "makeLLVMType"; 
printType(fname,tgtname); break; default: error("Invalid generation option"); } return false; } char CppWriter::ID = 0; //===----------------------------------------------------------------------===// // External Interface declaration //===----------------------------------------------------------------------===// bool CPPTargetMachine::addPassesToEmitFile(PassManagerBase &PM, formatted_raw_ostream &o, CodeGenFileType FileType, CodeGenOpt::Level OptLevel, bool DisableVerify) { if (FileType != TargetMachine::CGFT_AssemblyFile) return true; PM.add(new CppWriter(o)); return false; } Index: vendor/llvm/dist/lib/Target/Mips/CMakeLists.txt =================================================================== --- vendor/llvm/dist/lib/Target/Mips/CMakeLists.txt (revision 228363) +++ vendor/llvm/dist/lib/Target/Mips/CMakeLists.txt (revision 228364) @@ -1,46 +1,47 @@ set(LLVM_TARGET_DEFINITIONS Mips.td) llvm_tablegen(MipsGenRegisterInfo.inc -gen-register-info) llvm_tablegen(MipsGenInstrInfo.inc -gen-instr-info) +llvm_tablegen(MipsGenCodeEmitter.inc -gen-emitter) llvm_tablegen(MipsGenAsmWriter.inc -gen-asm-writer) llvm_tablegen(MipsGenDAGISel.inc -gen-dag-isel) llvm_tablegen(MipsGenCallingConv.inc -gen-callingconv) llvm_tablegen(MipsGenSubtargetInfo.inc -gen-subtarget) add_public_tablegen_target(MipsCommonTableGen) add_llvm_target(MipsCodeGen MipsAsmPrinter.cpp MipsCodeEmitter.cpp MipsDelaySlotFiller.cpp MipsEmitGPRestore.cpp MipsExpandPseudo.cpp MipsJITInfo.cpp MipsInstrInfo.cpp MipsISelDAGToDAG.cpp MipsISelLowering.cpp MipsFrameLowering.cpp MipsMCInstLower.cpp MipsMCSymbolRefExpr.cpp MipsRegisterInfo.cpp MipsSubtarget.cpp MipsTargetMachine.cpp MipsTargetObjectFile.cpp MipsSelectionDAGInfo.cpp ) add_llvm_library_dependencies(LLVMMipsCodeGen LLVMAsmPrinter LLVMCodeGen LLVMCore LLVMMC LLVMMipsAsmPrinter LLVMMipsDesc LLVMMipsInfo LLVMSelectionDAG LLVMSupport LLVMTarget ) add_subdirectory(InstPrinter) add_subdirectory(TargetInfo) add_subdirectory(MCTargetDesc) Index: vendor/llvm/dist/lib/Target/Mips/Makefile =================================================================== --- vendor/llvm/dist/lib/Target/Mips/Makefile (revision 228363) +++ vendor/llvm/dist/lib/Target/Mips/Makefile (revision 228364) @@ -1,23 +1,23 @@ ##===- lib/Target/Mips/Makefile ----------------------------*- Makefile -*-===## # # The LLVM Compiler Infrastructure # # This file is distributed under the University of Illinois Open Source # License. See LICENSE.TXT for details. # ##===----------------------------------------------------------------------===## LEVEL = ../../.. LIBRARYNAME = LLVMMipsCodeGen TARGET = Mips # Make sure that tblgen is run, first thing. BUILT_SOURCES = MipsGenRegisterInfo.inc MipsGenInstrInfo.inc \ - MipsGenAsmWriter.inc \ + MipsGenAsmWriter.inc MipsGenCodeEmitter.inc \ MipsGenDAGISel.inc MipsGenCallingConv.inc \ MipsGenSubtargetInfo.inc DIRS = InstPrinter TargetInfo MCTargetDesc include $(LEVEL)/Makefile.common Index: vendor/llvm/dist/lib/Target/Mips/Mips64InstrInfo.td =================================================================== --- vendor/llvm/dist/lib/Target/Mips/Mips64InstrInfo.td (revision 228363) +++ vendor/llvm/dist/lib/Target/Mips/Mips64InstrInfo.td (revision 228364) @@ -1,214 +1,214 @@ //===- Mips64InstrInfo.td - Mips64 Instruction Information -*- tablegen -*-===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. 
// //===----------------------------------------------------------------------===// // // This file describes Mips64 instructions. // //===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===// // Mips Operand, Complex Patterns and Transformations Definitions. //===----------------------------------------------------------------------===// // Instruction operand types def shamt_64 : Operand; // Unsigned Operand def uimm16_64 : Operand { let PrintMethod = "printUnsignedImm"; } // Transformation Function - get Imm - 32. def Subtract32 : SDNodeXFormgetZExtValue() - 32); }]>; // imm32_63 predicate - True if imm is in range [32, 63]. def imm32_63 : ImmLeaf= 32 && (int32_t)Imm < 64;}], Subtract32>; //===----------------------------------------------------------------------===// // Instructions specific format //===----------------------------------------------------------------------===// // Shifts class LogicR_shift_rotate_imm64 func, bits<5> _rs, string instr_asm, SDNode OpNode, PatFrag PF>: - FR<0x00, func, (outs CPU64Regs:$dst), (ins CPU64Regs:$b, shamt_64:$c), - !strconcat(instr_asm, "\t$dst, $b, $c"), - [(set CPU64Regs:$dst, (OpNode CPU64Regs:$b, (i64 PF:$c)))], + FR<0x00, func, (outs CPU64Regs:$rd), (ins CPU64Regs:$rt, shamt_64:$shamt), + !strconcat(instr_asm, "\t$rd, $rt, $shamt"), + [(set CPU64Regs:$rd, (OpNode CPU64Regs:$rt, (i64 PF:$shamt)))], IIAlu> { let rs = _rs; } class LogicR_shift_rotate_reg64 func, bits<5> _shamt, string instr_asm, SDNode OpNode>: - FR<0x00, func, (outs CPU64Regs:$dst), (ins CPU64Regs:$c, CPU64Regs:$b), - !strconcat(instr_asm, "\t$dst, $b, $c"), - [(set CPU64Regs:$dst, (OpNode CPU64Regs:$b, CPU64Regs:$c))], IIAlu> { + FR<0x00, func, (outs CPU64Regs:$rd), (ins CPU64Regs:$rs, CPU64Regs:$rt), + !strconcat(instr_asm, "\t$rd, $rt, $rs"), + [(set CPU64Regs:$rd, (OpNode CPU64Regs:$rt, CPU64Regs:$rs))], IIAlu> { let shamt = _shamt; } // Mul, Div -let Defs = [HI64, LO64] in { +let rd = 0, shamt = 0, Defs = [HI64, LO64] in { let isCommutable = 1 in class Mul64 func, string instr_asm, InstrItinClass itin>: - FR<0x00, func, (outs), (ins CPU64Regs:$a, CPU64Regs:$b), - !strconcat(instr_asm, "\t$a, $b"), [], itin>; + FR<0x00, func, (outs), (ins CPU64Regs:$rs, CPU64Regs:$rt), + !strconcat(instr_asm, "\t$rs, $rt"), [], itin>; class Div64 func, string instr_asm, InstrItinClass itin>: - FR<0x00, func, (outs), (ins CPU64Regs:$a, CPU64Regs:$b), - !strconcat(instr_asm, "\t$$zero, $a, $b"), - [(op CPU64Regs:$a, CPU64Regs:$b)], itin>; + FR<0x00, func, (outs), (ins CPU64Regs:$rs, CPU64Regs:$rt), + !strconcat(instr_asm, "\t$$zero, $rs, $rt"), + [(op CPU64Regs:$rs, CPU64Regs:$rt)], itin>; } // Move from Hi/Lo let shamt = 0 in { let rs = 0, rt = 0 in class MoveFromLOHI64 func, string instr_asm>: - FR<0x00, func, (outs CPU64Regs:$dst), (ins), - !strconcat(instr_asm, "\t$dst"), [], IIHiLo>; + FR<0x00, func, (outs CPU64Regs:$rd), (ins), + !strconcat(instr_asm, "\t$rd"), [], IIHiLo>; let rt = 0, rd = 0 in class MoveToLOHI64 func, string instr_asm>: - FR<0x00, func, (outs), (ins CPU64Regs:$src), - !strconcat(instr_asm, "\t$src"), [], IIHiLo>; + FR<0x00, func, (outs), (ins CPU64Regs:$rs), + !strconcat(instr_asm, "\t$rs"), [], IIHiLo>; } // Count Leading Ones/Zeros in Word class CountLeading64 func, string instr_asm, list pattern>: - FR<0x1c, func, (outs CPU64Regs:$dst), (ins CPU64Regs:$src), - !strconcat(instr_asm, "\t$dst, $src"), pattern, IIAlu>, + FR<0x1c, func, (outs CPU64Regs:$rd), (ins 
CPU64Regs:$rs), + !strconcat(instr_asm, "\t$rd, $rs"), pattern, IIAlu>, Requires<[HasBitCount]> { let shamt = 0; let rt = rd; } //===----------------------------------------------------------------------===// // Instruction definition //===----------------------------------------------------------------------===// /// Arithmetic Instructions (ALU Immediate) def DADDiu : ArithLogicI<0x19, "daddiu", add, simm16_64, immSExt16, CPU64Regs>; def DANDi : ArithLogicI<0x0c, "andi", and, uimm16_64, immZExt16, CPU64Regs>; def SLTi64 : SetCC_I<0x0a, "slti", setlt, simm16_64, immSExt16, CPU64Regs>; def SLTiu64 : SetCC_I<0x0b, "sltiu", setult, simm16_64, immSExt16, CPU64Regs>; def ORi64 : ArithLogicI<0x0d, "ori", or, uimm16_64, immZExt16, CPU64Regs>; def XORi64 : ArithLogicI<0x0e, "xori", xor, uimm16_64, immZExt16, CPU64Regs>; /// Arithmetic Instructions (3-Operand, R-Type) def DADDu : ArithLogicR<0x00, 0x2d, "daddu", add, IIAlu, CPU64Regs, 1>; def DSUBu : ArithLogicR<0x00, 0x2f, "dsubu", sub, IIAlu, CPU64Regs>; def SLT64 : SetCC_R<0x00, 0x2a, "slt", setlt, CPU64Regs>; def SLTu64 : SetCC_R<0x00, 0x2b, "sltu", setult, CPU64Regs>; def AND64 : ArithLogicR<0x00, 0x24, "and", and, IIAlu, CPU64Regs, 1>; def OR64 : ArithLogicR<0x00, 0x25, "or", or, IIAlu, CPU64Regs, 1>; def XOR64 : ArithLogicR<0x00, 0x26, "xor", xor, IIAlu, CPU64Regs, 1>; def NOR64 : LogicNOR<0x00, 0x27, "nor", CPU64Regs>; /// Shift Instructions def DSLL : LogicR_shift_rotate_imm64<0x38, 0x00, "dsll", shl, immZExt5>; def DSRL : LogicR_shift_rotate_imm64<0x3a, 0x00, "dsrl", srl, immZExt5>; def DSRA : LogicR_shift_rotate_imm64<0x3b, 0x00, "dsra", sra, immZExt5>; def DSLL32 : LogicR_shift_rotate_imm64<0x3c, 0x00, "dsll32", shl, imm32_63>; def DSRL32 : LogicR_shift_rotate_imm64<0x3e, 0x00, "dsrl32", srl, imm32_63>; def DSRA32 : LogicR_shift_rotate_imm64<0x3f, 0x00, "dsra32", sra, imm32_63>; def DSLLV : LogicR_shift_rotate_reg64<0x24, 0x00, "dsllv", shl>; def DSRLV : LogicR_shift_rotate_reg64<0x26, 0x00, "dsrlv", srl>; def DSRAV : LogicR_shift_rotate_reg64<0x27, 0x00, "dsrav", sra>; // Rotate Instructions let Predicates = [HasMips64r2] in { def DROTR : LogicR_shift_rotate_imm64<0x3a, 0x01, "drotr", rotr, immZExt5>; def DROTR32 : LogicR_shift_rotate_imm64<0x3e, 0x01, "drotr32", rotr, imm32_63>; def DROTRV : LogicR_shift_rotate_reg64<0x16, 0x01, "drotrv", rotr>; } /// Load and Store Instructions /// aligned defm LB64 : LoadM64<0x20, "lb", sextloadi8>; defm LBu64 : LoadM64<0x24, "lbu", zextloadi8>; defm LH64 : LoadM64<0x21, "lh", sextloadi16_a>; defm LHu64 : LoadM64<0x25, "lhu", zextloadi16_a>; defm LW64 : LoadM64<0x23, "lw", sextloadi32_a>; defm LWu64 : LoadM64<0x27, "lwu", zextloadi32_a>; defm SB64 : StoreM64<0x28, "sb", truncstorei8>; defm SH64 : StoreM64<0x29, "sh", truncstorei16_a>; defm SW64 : StoreM64<0x2b, "sw", truncstorei32_a>; defm LD : LoadM64<0x37, "ld", load_a>; defm SD : StoreM64<0x3f, "sd", store_a>; /// unaligned defm ULH64 : LoadM64<0x21, "ulh", sextloadi16_u, 1>; defm ULHu64 : LoadM64<0x25, "ulhu", zextloadi16_u, 1>; defm ULW64 : LoadM64<0x23, "ulw", sextloadi32_u, 1>; defm USH64 : StoreM64<0x29, "ush", truncstorei16_u, 1>; defm USW64 : StoreM64<0x2b, "usw", truncstorei32_u, 1>; defm ULD : LoadM64<0x37, "uld", load_u, 1>; defm USD : StoreM64<0x3f, "usd", store_u, 1>; /// Jump and Branch Instructions def BEQ64 : CBranch<0x04, "beq", seteq, CPU64Regs>; def BNE64 : CBranch<0x05, "bne", setne, CPU64Regs>; def BGEZ64 : CBranchZero<0x01, 1, "bgez", setge, CPU64Regs>; def BGTZ64 : CBranchZero<0x07, 0, "bgtz", setgt, CPU64Regs>; def BLEZ64 
: CBranchZero<0x07, 0, "blez", setle, CPU64Regs>; def BLTZ64 : CBranchZero<0x01, 0, "bltz", setlt, CPU64Regs>; /// Multiply and Divide Instructions. def DMULT : Mul64<0x1c, "dmult", IIImul>; def DMULTu : Mul64<0x1d, "dmultu", IIImul>; def DSDIV : Div64; def DUDIV : Div64; let Defs = [HI64] in def MTHI64 : MoveToLOHI64<0x11, "mthi">; let Defs = [LO64] in def MTLO64 : MoveToLOHI64<0x13, "mtlo">; let Uses = [HI64] in def MFHI64 : MoveFromLOHI64<0x10, "mfhi">; let Uses = [LO64] in def MFLO64 : MoveFromLOHI64<0x12, "mflo">; /// Count Leading def DCLZ : CountLeading64<0x24, "dclz", - [(set CPU64Regs:$dst, (ctlz CPU64Regs:$src))]>; + [(set CPU64Regs:$rd, (ctlz CPU64Regs:$rs))]>; def DCLO : CountLeading64<0x25, "dclo", - [(set CPU64Regs:$dst, (ctlz (not CPU64Regs:$src)))]>; + [(set CPU64Regs:$rd, (ctlz (not CPU64Regs:$rs)))]>; //===----------------------------------------------------------------------===// // Arbitrary patterns that map to one or more instructions //===----------------------------------------------------------------------===// // Small immediates def : Pat<(i64 immSExt16:$in), (DADDiu ZERO_64, imm:$in)>; def : Pat<(i64 immZExt16:$in), (ORi64 ZERO_64, imm:$in)>; // zextloadi32_u def : Pat<(zextloadi32_u addr:$a), (DSRL (DSLL (ULW64_P8 addr:$a), 32), 32)>, Requires<[IsN64]>; def : Pat<(zextloadi32_u addr:$a), (DSRL (DSLL (ULW64 addr:$a), 32), 32)>, Requires<[NotN64]>; // hi/lo relocs def : Pat<(i64 (MipsLo tglobaladdr:$in)), (DADDiu ZERO_64, tglobaladdr:$in)>; defm : BrcondPats; // setcc patterns defm : SeteqPats; defm : SetlePats; defm : SetgtPats; defm : SetgePats; defm : SetgeImmPats; Index: vendor/llvm/dist/lib/Target/Mips/MipsCodeEmitter.cpp =================================================================== --- vendor/llvm/dist/lib/Target/Mips/MipsCodeEmitter.cpp (revision 228363) +++ vendor/llvm/dist/lib/Target/Mips/MipsCodeEmitter.cpp (revision 228364) @@ -1,245 +1,266 @@ //===-- Mips/MipsCodeEmitter.cpp - Convert Mips code to machine code -----===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===---------------------------------------------------------------------===// // // This file contains the pass that transforms the Mips machine instructions // into relocatable machine code. 
// //===---------------------------------------------------------------------===// #define DEBUG_TYPE "jit" #include "Mips.h" #include "MipsInstrInfo.h" #include "MipsRelocations.h" #include "MipsSubtarget.h" #include "MipsTargetMachine.h" #include "llvm/Constants.h" #include "llvm/DerivedTypes.h" #include "llvm/Function.h" #include "llvm/PassManager.h" #include "llvm/CodeGen/JITCodeEmitter.h" #include "llvm/CodeGen/MachineConstantPool.h" #include "llvm/CodeGen/MachineFunctionPass.h" #include "llvm/CodeGen/MachineInstr.h" #include "llvm/CodeGen/MachineJumpTableInfo.h" #include "llvm/CodeGen/MachineModuleInfo.h" #include "llvm/CodeGen/Passes.h" #include "llvm/ADT/Statistic.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/raw_ostream.h" #ifndef NDEBUG #include #endif #include "llvm/CodeGen/MachineOperand.h" using namespace llvm; STATISTIC(NumEmitted, "Number of machine instructions emitted"); namespace { class MipsCodeEmitter : public MachineFunctionPass { MipsJITInfo *JTI; const MipsInstrInfo *II; const TargetData *TD; const MipsSubtarget *Subtarget; TargetMachine &TM; JITCodeEmitter &MCE; const std::vector *MCPEs; const std::vector *MJTEs; bool IsPIC; void getAnalysisUsage(AnalysisUsage &AU) const { AU.addRequired (); MachineFunctionPass::getAnalysisUsage(AU); } static char ID; public: MipsCodeEmitter(TargetMachine &tm, JITCodeEmitter &mce) : MachineFunctionPass(ID), JTI(0), II((const MipsInstrInfo *) tm.getInstrInfo()), TD(tm.getTargetData()), TM(tm), MCE(mce), MCPEs(0), MJTEs(0), IsPIC(TM.getRelocationModel() == Reloc::PIC_) { } bool runOnMachineFunction(MachineFunction &MF); virtual const char *getPassName() const { return "Mips Machine Code Emitter"; } /// getBinaryCodeForInstr - This function, generated by the /// CodeEmitterGenerator using TableGen, produces the binary encoding for /// machine instructions. unsigned getBinaryCodeForInstr(const MachineInstr &MI) const; void emitInstruction(const MachineInstr &MI); private: void emitWordLE(unsigned Word); /// Routines that handle operands which add machine relocations which are /// fixed up by the relocation stage. void emitGlobalAddress(const GlobalValue *GV, unsigned Reloc, bool MayNeedFarStub) const; void emitExternalSymbolAddress(const char *ES, unsigned Reloc) const; void emitConstPoolAddress(unsigned CPI, unsigned Reloc) const; void emitJumpTableAddress(unsigned JTIndex, unsigned Reloc) const; void emitMachineBasicBlock(MachineBasicBlock *BB, unsigned Reloc) const; /// getMachineOpValue - Return binary encoding of operand. If the machine /// operand requires relocation, record the relocation and return zero. 
unsigned getMachineOpValue(const MachineInstr &MI, const MachineOperand &MO) const; unsigned getRelocation(const MachineInstr &MI, const MachineOperand &MO) const; + unsigned getMemEncoding(const MachineInstr &MI, unsigned OpNo) const; + unsigned getSizeExtEncoding(const MachineInstr &MI, unsigned OpNo) const; + unsigned getSizeInsEncoding(const MachineInstr &MI, unsigned OpNo) const; }; } char MipsCodeEmitter::ID = 0; bool MipsCodeEmitter::runOnMachineFunction(MachineFunction &MF) { JTI = ((MipsTargetMachine&) MF.getTarget()).getJITInfo(); II = ((const MipsTargetMachine&) MF.getTarget()).getInstrInfo(); TD = ((const MipsTargetMachine&) MF.getTarget()).getTargetData(); Subtarget = &TM.getSubtarget (); MCPEs = &MF.getConstantPool()->getConstants(); MJTEs = 0; if (MF.getJumpTableInfo()) MJTEs = &MF.getJumpTableInfo()->getJumpTables(); JTI->Initialize(MF, IsPIC); MCE.setModuleInfo(&getAnalysis ()); do { DEBUG(errs() << "JITTing function '" << MF.getFunction()->getName() << "'\n"); MCE.startFunction(MF); for (MachineFunction::iterator MBB = MF.begin(), E = MF.end(); MBB != E; ++MBB){ MCE.StartMachineBasicBlock(MBB); for (MachineBasicBlock::const_iterator I = MBB->begin(), E = MBB->end(); I != E; ++I) emitInstruction(*I); } } while (MCE.finishFunction(MF)); return false; } unsigned MipsCodeEmitter::getRelocation(const MachineInstr &MI, const MachineOperand &MO) const { // NOTE: This relocations are for static. uint64_t TSFlags = MI.getDesc().TSFlags; uint64_t Form = TSFlags & MipsII::FormMask; if (Form == MipsII::FrmJ) return Mips::reloc_mips_26; if ((Form == MipsII::FrmI || Form == MipsII::FrmFI) && MI.getDesc().isBranch()) return Mips::reloc_mips_branch; if (Form == MipsII::FrmI && MI.getOpcode() == Mips::LUi) return Mips::reloc_mips_hi; return Mips::reloc_mips_lo; } +unsigned MipsCodeEmitter::getMemEncoding(const MachineInstr &MI, + unsigned OpNo) const { + // Base register is encoded in bits 20-16, offset is encoded in bits 15-0. + assert(MI.getOperand(OpNo).isReg()); + unsigned RegBits = getMachineOpValue(MI, MI.getOperand(OpNo)) << 16; + return + (getMachineOpValue(MI, MI.getOperand(OpNo+1)) & 0xFFFF) | RegBits; +} + +unsigned MipsCodeEmitter::getSizeExtEncoding(const MachineInstr &MI, + unsigned OpNo) const { + // size is encoded as size-1. + return getMachineOpValue(MI, MI.getOperand(OpNo)) - 1; +} + +unsigned MipsCodeEmitter::getSizeInsEncoding(const MachineInstr &MI, + unsigned OpNo) const { + // size is encoded as pos+size-1. + return getMachineOpValue(MI, MI.getOperand(OpNo-1)) + + getMachineOpValue(MI, MI.getOperand(OpNo)) - 1; +} + /// getMachineOpValue - Return binary encoding of operand. If the machine /// operand requires relocation, record the relocation and return zero. 
unsigned MipsCodeEmitter::getMachineOpValue(const MachineInstr &MI, const MachineOperand &MO) const { if (MO.isReg()) return MipsRegisterInfo::getRegisterNumbering(MO.getReg()); else if (MO.isImm()) return static_cast(MO.getImm()); else if (MO.isGlobal()) emitGlobalAddress(MO.getGlobal(), getRelocation(MI, MO), true); else if (MO.isSymbol()) emitExternalSymbolAddress(MO.getSymbolName(), getRelocation(MI, MO)); else if (MO.isCPI()) emitConstPoolAddress(MO.getIndex(), getRelocation(MI, MO)); else if (MO.isJTI()) emitJumpTableAddress(MO.getIndex(), getRelocation(MI, MO)); else if (MO.isMBB()) emitMachineBasicBlock(MO.getMBB(), getRelocation(MI, MO)); else llvm_unreachable("Unable to encode MachineOperand!"); return 0; } void MipsCodeEmitter::emitGlobalAddress(const GlobalValue *GV, unsigned Reloc, bool MayNeedFarStub) const { MCE.addRelocation(MachineRelocation::getGV(MCE.getCurrentPCOffset(), Reloc, const_cast(GV), 0, MayNeedFarStub)); } void MipsCodeEmitter:: emitExternalSymbolAddress(const char *ES, unsigned Reloc) const { MCE.addRelocation(MachineRelocation::getExtSym(MCE.getCurrentPCOffset(), Reloc, ES, 0, 0, false)); } void MipsCodeEmitter::emitConstPoolAddress(unsigned CPI, unsigned Reloc) const { MCE.addRelocation(MachineRelocation::getConstPool(MCE.getCurrentPCOffset(), Reloc, CPI, 0, false)); } void MipsCodeEmitter:: emitJumpTableAddress(unsigned JTIndex, unsigned Reloc) const { MCE.addRelocation(MachineRelocation::getJumpTable(MCE.getCurrentPCOffset(), Reloc, JTIndex, 0, false)); } void MipsCodeEmitter::emitMachineBasicBlock(MachineBasicBlock *BB, unsigned Reloc) const { MCE.addRelocation(MachineRelocation::getBB(MCE.getCurrentPCOffset(), Reloc, BB)); } void MipsCodeEmitter::emitInstruction(const MachineInstr &MI) { DEBUG(errs() << "JIT: " << (void*)MCE.getCurrentPCValue() << ":\t" << MI); MCE.processDebugLoc(MI.getDebugLoc(), true); // Skip pseudo instructions. if ((MI.getDesc().TSFlags & MipsII::FormMask) == MipsII::Pseudo) return; ++NumEmitted; // Keep track of the # of mi's emitted switch (MI.getOpcode()) { default: emitWordLE(getBinaryCodeForInstr(MI)); break; } MCE.processDebugLoc(MI.getDebugLoc(), false); } void MipsCodeEmitter::emitWordLE(unsigned Word) { DEBUG(errs() << " 0x"; errs().write_hex(Word) << "\n"); MCE.emitWordLE(Word); } /// createMipsJITCodeEmitterPass - Return a pass that emits the collected Mips /// code to the specified MCE object. FunctionPass *llvm::createMipsJITCodeEmitterPass(MipsTargetMachine &TM, JITCodeEmitter &JCE) { return new MipsCodeEmitter(TM, JCE); } -unsigned MipsCodeEmitter::getBinaryCodeForInstr(const MachineInstr &MI) const { - // this function will be automatically generated by the CodeEmitterGenerator - // using TableGen - return 0; -} +#include "MipsGenCodeEmitter.inc" Index: vendor/llvm/dist/lib/Target/Mips/MipsInstrFPU.td =================================================================== --- vendor/llvm/dist/lib/Target/Mips/MipsInstrFPU.td (revision 228363) +++ vendor/llvm/dist/lib/Target/Mips/MipsInstrFPU.td (revision 228364) @@ -1,362 +1,376 @@ //===- MipsInstrFPU.td - Mips FPU Instruction Information --*- tablegen -*-===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file describes the Mips FPU instruction set. 
// //===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===// // Floating Point Instructions // ------------------------ // * 64bit fp: // - 32 64-bit registers (default mode) // - 16 even 32-bit registers (32-bit compatible mode) for // single and double access. // * 32bit fp: // - 16 even 32-bit registers - single and double (aliased) // - 32 32-bit registers (within single-only mode) //===----------------------------------------------------------------------===// // Floating Point Compare and Branch def SDT_MipsFPBrcond : SDTypeProfile<0, 2, [SDTCisInt<0>, SDTCisVT<1, OtherVT>]>; def SDT_MipsFPCmp : SDTypeProfile<0, 3, [SDTCisSameAs<0, 1>, SDTCisFP<1>, SDTCisVT<2, i32>]>; def SDT_MipsCMovFP : SDTypeProfile<1, 2, [SDTCisSameAs<0, 1>, SDTCisSameAs<1, 2>]>; def SDT_MipsBuildPairF64 : SDTypeProfile<1, 2, [SDTCisVT<0, f64>, SDTCisVT<1, i32>, SDTCisSameAs<1, 2>]>; def SDT_MipsExtractElementF64 : SDTypeProfile<1, 2, [SDTCisVT<0, i32>, SDTCisVT<1, f64>, SDTCisVT<2, i32>]>; def MipsFPCmp : SDNode<"MipsISD::FPCmp", SDT_MipsFPCmp, [SDNPOutGlue]>; def MipsCMovFP_T : SDNode<"MipsISD::CMovFP_T", SDT_MipsCMovFP, [SDNPInGlue]>; def MipsCMovFP_F : SDNode<"MipsISD::CMovFP_F", SDT_MipsCMovFP, [SDNPInGlue]>; def MipsFPBrcond : SDNode<"MipsISD::FPBrcond", SDT_MipsFPBrcond, [SDNPHasChain, SDNPOptInGlue]>; def MipsBuildPairF64 : SDNode<"MipsISD::BuildPairF64", SDT_MipsBuildPairF64>; def MipsExtractElementF64 : SDNode<"MipsISD::ExtractElementF64", SDT_MipsExtractElementF64>; // Operand for printing out a condition code. let PrintMethod = "printFCCOperand" in def condcode : Operand; //===----------------------------------------------------------------------===// // Feature predicates. //===----------------------------------------------------------------------===// def IsFP64bit : Predicate<"Subtarget.isFP64bit()">; def NotFP64bit : Predicate<"!Subtarget.isFP64bit()">; def IsSingleFloat : Predicate<"Subtarget.isSingleFloat()">; def IsNotSingleFloat : Predicate<"!Subtarget.isSingleFloat()">; //===----------------------------------------------------------------------===// // Instruction Class Templates // // A set of multiclasses is used to address the register usage. // // S32 - single precision in 16 32bit even fp registers // single precision in 32 32bit fp registers in SingleOnly mode // S64 - single precision in 32 64bit fp registers (In64BitMode) // D32 - double precision in 16 32bit even fp registers // D64 - double precision in 32 64bit fp registers (In64BitMode) // // Only S32 and D32 are supported right now. //===----------------------------------------------------------------------===// // FP load. class FPLoad op, string opstr, PatFrag FOp, RegisterClass RC, Operand MemOpnd>: - FFI; + FMem; // FP store. class FPStore op, string opstr, PatFrag FOp, RegisterClass RC, Operand MemOpnd>: - FFI; + FMem; // Instructions that convert an FP value to 32-bit fixed point. multiclass FFR1_W_M funct, string opstr> { def _S : FFR1; def _D32 : FFR1, Requires<[NotFP64bit]>; def _D64 : FFR1, Requires<[IsFP64bit]>; } // Instructions that convert an FP value to 64-bit fixed point. let Predicates = [IsFP64bit] in multiclass FFR1_L_M funct, string opstr> { def _S : FFR1; def _D64 : FFR1; } // FP-to-FP conversion instructions. 
multiclass FFR1P_M funct, string opstr, SDNode OpNode> { def _S : FFR1P; def _D32 : FFR1P, Requires<[NotFP64bit]>; def _D64 : FFR1P, Requires<[IsFP64bit]>; } multiclass FFR2P_M funct, string opstr, SDNode OpNode, bit isComm = 0> { let isCommutable = isComm in { def _S : FFR2P; def _D32 : FFR2P, Requires<[NotFP64bit]>; def _D64 : FFR2P, Requires<[IsFP64bit]>; } } //===----------------------------------------------------------------------===// // Floating Point Instructions //===----------------------------------------------------------------------===// defm ROUND_W : FFR1_W_M<0xc, "round">; defm ROUND_L : FFR1_L_M<0x8, "round">; defm TRUNC_W : FFR1_W_M<0xd, "trunc">; defm TRUNC_L : FFR1_L_M<0x9, "trunc">; defm CEIL_W : FFR1_W_M<0xe, "ceil">; defm CEIL_L : FFR1_L_M<0xa, "ceil">; defm FLOOR_W : FFR1_W_M<0xf, "floor">; defm FLOOR_L : FFR1_L_M<0xb, "floor">; defm CVT_W : FFR1_W_M<0x24, "cvt">; defm CVT_L : FFR1_L_M<0x25, "cvt">; def CVT_S_W : FFR1<0x20, 20, "cvt", "s.w", FGR32, FGR32>; let Predicates = [NotFP64bit] in { def CVT_S_D32 : FFR1<0x20, 17, "cvt", "s.d", FGR32, AFGR64>; def CVT_D32_W : FFR1<0x21, 20, "cvt", "d.w", AFGR64, FGR32>; def CVT_D32_S : FFR1<0x21, 16, "cvt", "d.s", AFGR64, FGR32>; } let Predicates = [IsFP64bit] in { def CVT_S_D64 : FFR1<0x20, 17, "cvt", "s.d", FGR32, FGR64>; def CVT_S_L : FFR1<0x20, 21, "cvt", "s.l", FGR32, FGR64>; def CVT_D64_W : FFR1<0x21, 20, "cvt", "d.w", FGR64, FGR32>; def CVT_D64_S : FFR1<0x21, 16, "cvt", "d.s", FGR64, FGR32>; def CVT_D64_L : FFR1<0x21, 21, "cvt", "d.l", FGR64, FGR64>; } defm FABS : FFR1P_M<0x5, "abs", fabs>; defm FNEG : FFR1P_M<0x7, "neg", fneg>; defm FSQRT : FFR1P_M<0x4, "sqrt", fsqrt>; // The odd-numbered registers are only referenced when doing loads, // stores, and moves between floating-point and integer registers. // When defining instructions, we reference all 32-bit registers, // regardless of register aliasing. 
-let fd = 0 in { - /// Move Control Registers From/To CPU Registers - def CFC1 : FFR<0x11, 0x0, 0x2, (outs CPURegs:$rt), (ins CCR:$fs), + +class FFRGPR _fmt, dag outs, dag ins, string asmstr, list pattern>: + FFR<0x11, 0x0, _fmt, outs, ins, asmstr, pattern> { + bits<5> rt; + let ft = rt; + let fd = 0; +} + +/// Move Control Registers From/To CPU Registers +def CFC1 : FFRGPR<0x2, (outs CPURegs:$rt), (ins CCR:$fs), "cfc1\t$rt, $fs", []>; - def CTC1 : FFR<0x11, 0x0, 0x6, (outs CCR:$rt), (ins CPURegs:$fs), - "ctc1\t$fs, $rt", []>; +def CTC1 : FFRGPR<0x6, (outs CCR:$fs), (ins CPURegs:$rt), + "ctc1\t$rt, $fs", []>; - def MFC1 : FFR<0x11, 0x00, 0x00, (outs CPURegs:$rt), (ins FGR32:$fs), +def MFC1 : FFRGPR<0x00, (outs CPURegs:$rt), (ins FGR32:$fs), "mfc1\t$rt, $fs", [(set CPURegs:$rt, (bitconvert FGR32:$fs))]>; - def MTC1 : FFR<0x11, 0x00, 0x04, (outs FGR32:$fs), (ins CPURegs:$rt), +def MTC1 : FFRGPR<0x04, (outs FGR32:$fs), (ins CPURegs:$rt), "mtc1\t$rt, $fs", [(set FGR32:$fs, (bitconvert CPURegs:$rt))]>; -} def FMOV_S : FFR1<0x6, 16, "mov", "s", FGR32, FGR32>; def FMOV_D32 : FFR1<0x6, 17, "mov", "d", AFGR64, AFGR64>, Requires<[NotFP64bit]>; def FMOV_D64 : FFR1<0x6, 17, "mov", "d", FGR64, FGR64>, Requires<[IsFP64bit]>; /// Floating Point Memory Instructions let Predicates = [IsN64] in { def LWC1_P8 : FPLoad<0x31, "lwc1", load, FGR32, mem64>; def SWC1_P8 : FPStore<0x39, "swc1", store, FGR32, mem64>; def LDC164_P8 : FPLoad<0x35, "ldc1", load, FGR64, mem64>; def SDC164_P8 : FPStore<0x3d, "sdc1", store, FGR64, mem64>; } let Predicates = [NotN64] in { def LWC1 : FPLoad<0x31, "lwc1", load, FGR32, mem>; def SWC1 : FPStore<0x39, "swc1", store, FGR32, mem>; let Predicates = [HasMips64] in { def LDC164 : FPLoad<0x35, "ldc1", load, FGR64, mem>; def SDC164 : FPStore<0x3d, "sdc1", store, FGR64, mem>; } let Predicates = [NotMips64] in { def LDC1 : FPLoad<0x35, "ldc1", load, AFGR64, mem>; def SDC1 : FPStore<0x3d, "sdc1", store, AFGR64, mem>; } } /// Floating-point Aritmetic -defm FADD : FFR2P_M<0x10, "add", fadd, 1>; +defm FADD : FFR2P_M<0x00, "add", fadd, 1>; defm FDIV : FFR2P_M<0x03, "div", fdiv>; defm FMUL : FFR2P_M<0x02, "mul", fmul, 1>; defm FSUB : FFR2P_M<0x01, "sub", fsub>; //===----------------------------------------------------------------------===// // Floating Point Branch Codes //===----------------------------------------------------------------------===// // Mips branch codes. These correspond to condcode in MipsInstrInfo.h. // They must be kept in synch. def MIPS_BRANCH_F : PatLeaf<(i32 0)>; def MIPS_BRANCH_T : PatLeaf<(i32 1)>; /// Floating Point Branch of False/True (Likely) let isBranch=1, isTerminator=1, hasDelaySlot=1, base=0x8, Uses=[FCR31] in - class FBRANCH : FFI<0x11, (outs), - (ins brtarget:$dst), !strconcat(asmstr, "\t$dst"), - [(MipsFPBrcond op, bb:$dst)]>; + class FBRANCH nd, bits<1> tf, PatLeaf op, string asmstr> : + FFI<0x11, (outs), (ins brtarget:$dst), !strconcat(asmstr, "\t$dst"), + [(MipsFPBrcond op, bb:$dst)]> { + let Inst{20-18} = 0; + let Inst{17} = nd; + let Inst{16} = tf; +} -def BC1F : FBRANCH; -def BC1T : FBRANCH; +def BC1F : FBRANCH<0, 0, MIPS_BRANCH_F, "bc1f">; +def BC1T : FBRANCH<0, 1, MIPS_BRANCH_T, "bc1t">; //===----------------------------------------------------------------------===// // Floating Point Flag Conditions //===----------------------------------------------------------------------===// // Mips condition codes. They must correspond to condcode in MipsInstrInfo.h. // They must be kept in synch. 
def MIPS_FCOND_F : PatLeaf<(i32 0)>; def MIPS_FCOND_UN : PatLeaf<(i32 1)>; def MIPS_FCOND_OEQ : PatLeaf<(i32 2)>; def MIPS_FCOND_UEQ : PatLeaf<(i32 3)>; def MIPS_FCOND_OLT : PatLeaf<(i32 4)>; def MIPS_FCOND_ULT : PatLeaf<(i32 5)>; def MIPS_FCOND_OLE : PatLeaf<(i32 6)>; def MIPS_FCOND_ULE : PatLeaf<(i32 7)>; def MIPS_FCOND_SF : PatLeaf<(i32 8)>; def MIPS_FCOND_NGLE : PatLeaf<(i32 9)>; def MIPS_FCOND_SEQ : PatLeaf<(i32 10)>; def MIPS_FCOND_NGL : PatLeaf<(i32 11)>; def MIPS_FCOND_LT : PatLeaf<(i32 12)>; def MIPS_FCOND_NGE : PatLeaf<(i32 13)>; def MIPS_FCOND_LE : PatLeaf<(i32 14)>; def MIPS_FCOND_NGT : PatLeaf<(i32 15)>; /// Floating Point Compare let Defs=[FCR31] in { - def FCMP_S32 : FCC<0x0, (outs), (ins FGR32:$fs, FGR32:$ft, condcode:$cc), + def FCMP_S32 : FCC<0x10, (outs), (ins FGR32:$fs, FGR32:$ft, condcode:$cc), "c.$cc.s\t$fs, $ft", [(MipsFPCmp FGR32:$fs, FGR32:$ft, imm:$cc)]>; - def FCMP_D32 : FCC<0x1, (outs), (ins AFGR64:$fs, AFGR64:$ft, condcode:$cc), + def FCMP_D32 : FCC<0x11, (outs), (ins AFGR64:$fs, AFGR64:$ft, condcode:$cc), "c.$cc.d\t$fs, $ft", [(MipsFPCmp AFGR64:$fs, AFGR64:$ft, imm:$cc)]>, Requires<[NotFP64bit]>; } // Conditional moves: // These instructions are expanded in // MipsISelLowering::EmitInstrWithCustomInserter if target does not have // conditional move instructions. // flag:int, data:float let usesCustomInserter = 1, Constraints = "$F = $dst" in class CondMovIntFP fmt, bits<6> func, string instr_asm> : FFR<0x11, func, fmt, (outs RC:$dst), (ins RC:$T, CPURegs:$cond, RC:$F), !strconcat(instr_asm, "\t$dst, $T, $cond"), []>; def MOVZ_S : CondMovIntFP; def MOVN_S : CondMovIntFP; let Predicates = [NotFP64bit] in { def MOVZ_D : CondMovIntFP; def MOVN_D : CondMovIntFP; } defm : MovzPats; defm : MovnPats; let Predicates = [NotFP64bit] in { defm : MovzPats; defm : MovnPats; } -let usesCustomInserter = 1, Uses = [FCR31], Constraints = "$F = $dst" in { +let cc = 0, usesCustomInserter = 1, Uses = [FCR31], + Constraints = "$F = $dst" in { // flag:float, data:int class CondMovFPInt tf, string instr_asm> : FCMOV; // flag:float, data:float +let cc = 0 in class CondMovFPFP fmt, bits<1> tf, string instr_asm> : FFCMOV; } def MOVT : CondMovFPInt; def MOVF : CondMovFPInt; def MOVT_S : CondMovFPFP; def MOVF_S : CondMovFPFP; let Predicates = [NotFP64bit] in { def MOVT_D : CondMovFPFP; def MOVF_D : CondMovFPFP; } //===----------------------------------------------------------------------===// // Floating Point Pseudo-Instructions //===----------------------------------------------------------------------===// def MOVCCRToCCR : MipsPseudo<(outs CCR:$dst), (ins CCR:$src), "# MOVCCRToCCR", []>; // This pseudo instr gets expanded into 2 mtc1 instrs after register // allocation. def BuildPairF64 : MipsPseudo<(outs AFGR64:$dst), (ins CPURegs:$lo, CPURegs:$hi), "", [(set AFGR64:$dst, (MipsBuildPairF64 CPURegs:$lo, CPURegs:$hi))]>; // This pseudo instr gets expanded into 2 mfc1 instrs after register // allocation. // if n is 0, lower part of src is extracted. // if n is 1, higher part of src is extracted. 
def ExtractElementF64 : MipsPseudo<(outs CPURegs:$dst), (ins AFGR64:$src, i32imm:$n), "", [(set CPURegs:$dst, (MipsExtractElementF64 AFGR64:$src, imm:$n))]>; //===----------------------------------------------------------------------===// // Floating Point Patterns //===----------------------------------------------------------------------===// def fpimm0 : PatLeaf<(fpimm), [{ return N->isExactlyValue(+0.0); }]>; def fpimm0neg : PatLeaf<(fpimm), [{ return N->isExactlyValue(-0.0); }]>; def : Pat<(f32 fpimm0), (MTC1 ZERO)>; def : Pat<(f32 fpimm0neg), (FNEG_S (MTC1 ZERO))>; def : Pat<(f32 (sint_to_fp CPURegs:$src)), (CVT_S_W (MTC1 CPURegs:$src))>; def : Pat<(f64 (sint_to_fp CPURegs:$src)), (CVT_D32_W (MTC1 CPURegs:$src))>; def : Pat<(i32 (fp_to_sint FGR32:$src)), (MFC1 (TRUNC_W_S FGR32:$src))>; def : Pat<(i32 (fp_to_sint AFGR64:$src)), (MFC1 (TRUNC_W_D32 AFGR64:$src))>; let Predicates = [NotFP64bit] in { def : Pat<(f32 (fround AFGR64:$src)), (CVT_S_D32 AFGR64:$src)>; def : Pat<(f64 (fextend FGR32:$src)), (CVT_D32_S FGR32:$src)>; } Index: vendor/llvm/dist/lib/Target/Mips/MipsInstrFormats.td =================================================================== --- vendor/llvm/dist/lib/Target/Mips/MipsInstrFormats.td (revision 228363) +++ vendor/llvm/dist/lib/Target/Mips/MipsInstrFormats.td (revision 228364) @@ -1,267 +1,292 @@ //===- MipsInstrFormats.td - Mips Instruction Formats ------*- tablegen -*-===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===// // Describe MIPS instructions format // // CPU INSTRUCTION FORMATS // // opcode - operation code. // rs - src reg. // rt - dst reg (on a 2 regs instr) or src reg (on a 3 reg instr). // rd - dst reg, only used on 3 regs instr. // shamt - only used on shift instructions, contains the shift amount. // funct - combined with opcode field give us an operation code. // //===----------------------------------------------------------------------===// +// Format specifies the encoding used by the instruction. This is part of the +// ad-hoc solution used to emit machine instruction encodings by our machine +// code emitter. +class Format val> { + bits<4> Value = val; +} + +def Pseudo : Format<0>; +def FrmR : Format<1>; +def FrmI : Format<2>; +def FrmJ : Format<3>; +def FrmFR : Format<4>; +def FrmFI : Format<5>; +def FrmOther : Format<6>; // Instruction w/ a custom format + // Generic Mips Format class MipsInst pattern, - InstrItinClass itin>: Instruction + InstrItinClass itin, Format f>: Instruction { field bits<32> Inst; + Format Form = f; let Namespace = "Mips"; - bits<6> opcode; + bits<6> Opcode = 0; - // Top 5 bits are the 'opcode' field - let Inst{31-26} = opcode; + // Top 6 bits are the 'opcode' field + let Inst{31-26} = Opcode; - dag OutOperandList = outs; - dag InOperandList = ins; + let OutOperandList = outs; + let InOperandList = ins; let AsmString = asmstr; let Pattern = pattern; let Itinerary = itin; + + // + // Attributes specific to Mips instructions... + // + bits<4> FormBits = Form.Value; + + // TSFlags layout should be kept in sync with MipsInstrInfo.h. 
+ let TSFlags{3-0} = FormBits; } // Mips Pseudo Instructions Format class MipsPseudo pattern>: - MipsInst { + MipsInst { + let isCodeGenOnly = 1; let isPseudo = 1; } //===----------------------------------------------------------------------===// // Format R instruction class in Mips : <|opcode|rs|rt|rd|shamt|funct|> //===----------------------------------------------------------------------===// class FR op, bits<6> _funct, dag outs, dag ins, string asmstr, list pattern, InstrItinClass itin>: - MipsInst + MipsInst { bits<5> rd; bits<5> rs; bits<5> rt; bits<5> shamt; bits<6> funct; - let opcode = op; + let Opcode = op; let funct = _funct; let Inst{25-21} = rs; let Inst{20-16} = rt; let Inst{15-11} = rd; let Inst{10-6} = shamt; let Inst{5-0} = funct; } //===----------------------------------------------------------------------===// // Format I instruction class in Mips : <|opcode|rs|rt|immediate|> //===----------------------------------------------------------------------===// class FI op, dag outs, dag ins, string asmstr, list pattern, - InstrItinClass itin>: MipsInst + InstrItinClass itin>: MipsInst { bits<5> rt; bits<5> rs; bits<16> imm16; - let opcode = op; + let Opcode = op; let Inst{25-21} = rs; let Inst{20-16} = rt; let Inst{15-0} = imm16; } class CBranchBase op, dag outs, dag ins, string asmstr, list pattern, InstrItinClass itin>: - MipsInst + MipsInst { bits<5> rs; bits<5> rt; bits<16> imm16; - let opcode = op; + let Opcode = op; let Inst{25-21} = rs; let Inst{20-16} = rt; let Inst{15-0} = imm16; } //===----------------------------------------------------------------------===// // Format J instruction class in Mips : <|opcode|address|> //===----------------------------------------------------------------------===// class FJ op, dag outs, dag ins, string asmstr, list pattern, - InstrItinClass itin>: MipsInst + InstrItinClass itin>: MipsInst { bits<26> addr; - let opcode = op; + let Opcode = op; let Inst{25-0} = addr; } //===----------------------------------------------------------------------===// // // FLOATING POINT INSTRUCTION FORMATS // // opcode - operation code. // fs - src reg. // ft - dst reg (on a 2 regs instr) or src reg (on a 3 reg instr). // fd - dst reg, only used on 3 regs instr. // fmt - double or single precision. // funct - combined with opcode field give us an operation code. 
// //===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===// // Format FR instruction class in Mips : <|opcode|fmt|ft|fs|fd|funct|> //===----------------------------------------------------------------------===// class FFR op, bits<6> _funct, bits<5> _fmt, dag outs, dag ins, string asmstr, list pattern> : - MipsInst + MipsInst { bits<5> fd; bits<5> fs; bits<5> ft; bits<5> fmt; bits<6> funct; - let opcode = op; + let Opcode = op; let funct = _funct; let fmt = _fmt; let Inst{25-21} = fmt; let Inst{20-16} = ft; let Inst{15-11} = fs; let Inst{10-6} = fd; let Inst{5-0} = funct; } //===----------------------------------------------------------------------===// // Format FI instruction class in Mips : <|opcode|base|ft|immediate|> //===----------------------------------------------------------------------===// class FFI op, dag outs, dag ins, string asmstr, list pattern>: - MipsInst + MipsInst { bits<5> ft; bits<5> base; bits<16> imm16; - let opcode = op; + let Opcode = op; let Inst{25-21} = base; let Inst{20-16} = ft; let Inst{15-0} = imm16; } //===----------------------------------------------------------------------===// // Compare instruction class in Mips : <|010001|fmt|ft|fs|0000011|condcode|> //===----------------------------------------------------------------------===// class FCC _fmt, dag outs, dag ins, string asmstr, list pattern> : - MipsInst + MipsInst { bits<5> fs; bits<5> ft; bits<4> cc; bits<5> fmt; - let opcode = 0x11; + let Opcode = 0x11; let fmt = _fmt; let Inst{25-21} = fmt; let Inst{20-16} = ft; let Inst{15-11} = fs; let Inst{10-6} = 0; let Inst{5-4} = 0b11; let Inst{3-0} = cc; } class FCMOV _tf, dag outs, dag ins, string asmstr, list pattern> : - MipsInst + MipsInst { bits<5> rd; bits<5> rs; - bits<3> N; + bits<3> cc; bits<1> tf; - let opcode = 0; + let Opcode = 0; let tf = _tf; let Inst{25-21} = rs; - let Inst{20-18} = N; + let Inst{20-18} = cc; let Inst{17} = 0; let Inst{16} = tf; let Inst{15-11} = rd; let Inst{10-6} = 0; let Inst{5-0} = 1; } class FFCMOV _fmt, bits<1> _tf, dag outs, dag ins, string asmstr, list pattern> : - MipsInst + MipsInst { bits<5> fd; bits<5> fs; - bits<3> N; + bits<3> cc; bits<5> fmt; bits<1> tf; - let opcode = 17; + let Opcode = 17; let fmt = _fmt; let tf = _tf; let Inst{25-21} = fmt; - let Inst{20-18} = N; + let Inst{20-18} = cc; let Inst{17} = 0; let Inst{16} = tf; let Inst{15-11} = fs; let Inst{10-6} = fd; let Inst{5-0} = 17; } // FP unary instructions without patterns. class FFR1 funct, bits<5> fmt, string opstr, string fmtstr, RegisterClass DstRC, RegisterClass SrcRC> : FFR<0x11, funct, fmt, (outs DstRC:$fd), (ins SrcRC:$fs), !strconcat(opstr, ".", fmtstr, "\t$fd, $fs"), []> { let ft = 0; } // FP unary instructions with patterns. 
class FFR1P funct, bits<5> fmt, string opstr, string fmtstr, RegisterClass DstRC, RegisterClass SrcRC, SDNode OpNode> : FFR<0x11, funct, fmt, (outs DstRC:$fd), (ins SrcRC:$fs), !strconcat(opstr, ".", fmtstr, "\t$fd, $fs"), [(set DstRC:$fd, (OpNode SrcRC:$fs))]> { let ft = 0; } class FFR2P funct, bits<5> fmt, string opstr, string fmtstr, RegisterClass RC, SDNode OpNode> : FFR<0x11, funct, fmt, (outs RC:$fd), (ins RC:$fs, RC:$ft), !strconcat(opstr, ".", fmtstr, "\t$fd, $fs, $ft"), [(set RC:$fd, (OpNode RC:$fs, RC:$ft))]>; Index: vendor/llvm/dist/lib/Target/Mips/MipsInstrInfo.td =================================================================== --- vendor/llvm/dist/lib/Target/Mips/MipsInstrInfo.td (revision 228363) +++ vendor/llvm/dist/lib/Target/Mips/MipsInstrInfo.td (revision 228364) @@ -1,1017 +1,1038 @@ //===- MipsInstrInfo.td - Target Description for Mips Target -*- tablegen -*-=// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file contains the Mips implementation of the TargetInstrInfo class. // //===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===// // Instruction format superclass //===----------------------------------------------------------------------===// include "MipsInstrFormats.td" //===----------------------------------------------------------------------===// // Mips profiles and nodes //===----------------------------------------------------------------------===// def SDT_MipsRet : SDTypeProfile<0, 1, [SDTCisInt<0>]>; def SDT_MipsJmpLink : SDTypeProfile<0, 1, [SDTCisVT<0, iPTR>]>; def SDT_MipsCMov : SDTypeProfile<1, 4, [SDTCisSameAs<0, 1>, SDTCisSameAs<1, 2>, SDTCisSameAs<3, 4>, SDTCisInt<4>]>; def SDT_MipsCallSeqStart : SDCallSeqStart<[SDTCisVT<0, i32>]>; def SDT_MipsCallSeqEnd : SDCallSeqEnd<[SDTCisVT<0, i32>, SDTCisVT<1, i32>]>; def SDT_MipsMAddMSub : SDTypeProfile<0, 4, [SDTCisVT<0, i32>, SDTCisSameAs<0, 1>, SDTCisSameAs<1, 2>, SDTCisSameAs<2, 3>]>; def SDT_MipsDivRem : SDTypeProfile<0, 2, [SDTCisInt<0>, SDTCisSameAs<0, 1>]>; def SDT_MipsThreadPointer : SDTypeProfile<1, 0, [SDTCisPtrTy<0>]>; def SDT_MipsDynAlloc : SDTypeProfile<1, 1, [SDTCisVT<0, i32>, SDTCisVT<1, iPTR>]>; def SDT_Sync : SDTypeProfile<0, 1, [SDTCisVT<0, i32>]>; def SDT_Ext : SDTypeProfile<1, 3, [SDTCisInt<0>, SDTCisSameAs<0, 1>, SDTCisVT<2, i32>, SDTCisSameAs<2, 3>]>; def SDT_Ins : SDTypeProfile<1, 4, [SDTCisInt<0>, SDTCisSameAs<0, 1>, SDTCisVT<2, i32>, SDTCisSameAs<2, 3>, SDTCisSameAs<0, 4>]>; // Call def MipsJmpLink : SDNode<"MipsISD::JmpLink",SDT_MipsJmpLink, [SDNPHasChain, SDNPOutGlue, SDNPOptInGlue, SDNPVariadic]>; // Hi and Lo nodes are used to handle global addresses. Used on // MipsISelLowering to lower stuff like GlobalAddress, ExternalSymbol // static model. 
(nothing to do with Mips Registers Hi and Lo) def MipsHi : SDNode<"MipsISD::Hi", SDTIntUnaryOp>; def MipsLo : SDNode<"MipsISD::Lo", SDTIntUnaryOp>; def MipsGPRel : SDNode<"MipsISD::GPRel", SDTIntUnaryOp>; // TlsGd node is used to handle General Dynamic TLS def MipsTlsGd : SDNode<"MipsISD::TlsGd", SDTIntUnaryOp>; // TprelHi and TprelLo nodes are used to handle Local Exec TLS def MipsTprelHi : SDNode<"MipsISD::TprelHi", SDTIntUnaryOp>; def MipsTprelLo : SDNode<"MipsISD::TprelLo", SDTIntUnaryOp>; // Thread pointer def MipsThreadPointer: SDNode<"MipsISD::ThreadPointer", SDT_MipsThreadPointer>; // Return def MipsRet : SDNode<"MipsISD::Ret", SDT_MipsRet, [SDNPHasChain, SDNPOptInGlue]>; // These are target-independent nodes, but have target-specific formats. def callseq_start : SDNode<"ISD::CALLSEQ_START", SDT_MipsCallSeqStart, [SDNPHasChain, SDNPOutGlue]>; def callseq_end : SDNode<"ISD::CALLSEQ_END", SDT_MipsCallSeqEnd, [SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>; // MAdd*/MSub* nodes def MipsMAdd : SDNode<"MipsISD::MAdd", SDT_MipsMAddMSub, [SDNPOptInGlue, SDNPOutGlue]>; def MipsMAddu : SDNode<"MipsISD::MAddu", SDT_MipsMAddMSub, [SDNPOptInGlue, SDNPOutGlue]>; def MipsMSub : SDNode<"MipsISD::MSub", SDT_MipsMAddMSub, [SDNPOptInGlue, SDNPOutGlue]>; def MipsMSubu : SDNode<"MipsISD::MSubu", SDT_MipsMAddMSub, [SDNPOptInGlue, SDNPOutGlue]>; // DivRem(u) nodes def MipsDivRem : SDNode<"MipsISD::DivRem", SDT_MipsDivRem, [SDNPOutGlue]>; def MipsDivRemU : SDNode<"MipsISD::DivRemU", SDT_MipsDivRem, [SDNPOutGlue]>; // Target constant nodes that are not part of any isel patterns and remain // unchanged can cause instructions with illegal operands to be emitted. // Wrapper node patterns give the instruction selector a chance to replace // target constant nodes that would otherwise remain unchanged with ADDiu // nodes. Without these wrapper node patterns, the following conditional move // instrucion is emitted when function cmov2 in test/CodeGen/Mips/cmov.ll is // compiled: // movn %got(d)($gp), %got(c)($gp), $4 // This instruction is illegal since movn can take only register operands. def MipsWrapperPIC : SDNode<"MipsISD::WrapperPIC", SDTIntUnaryOp>; // Pointer to dynamically allocated stack area. def MipsDynAlloc : SDNode<"MipsISD::DynAlloc", SDT_MipsDynAlloc, [SDNPHasChain, SDNPInGlue]>; def MipsSync : SDNode<"MipsISD::Sync", SDT_Sync, [SDNPHasChain]>; def MipsExt : SDNode<"MipsISD::Ext", SDT_Ext>; def MipsIns : SDNode<"MipsISD::Ins", SDT_Ins>; //===----------------------------------------------------------------------===// // Mips Instruction Predicate Definitions. //===----------------------------------------------------------------------===// def HasSEInReg : Predicate<"Subtarget.hasSEInReg()">; def HasBitCount : Predicate<"Subtarget.hasBitCount()">; def HasSwap : Predicate<"Subtarget.hasSwap()">; def HasCondMov : Predicate<"Subtarget.hasCondMov()">; def HasMips32 : Predicate<"Subtarget.hasMips32()">; def HasMips32r2 : Predicate<"Subtarget.hasMips32r2()">; def HasMips64 : Predicate<"Subtarget.hasMips64()">; def NotMips64 : Predicate<"!Subtarget.hasMips64()">; def HasMips64r2 : Predicate<"Subtarget.hasMips64r2()">; def IsN64 : Predicate<"Subtarget.isABI_N64()">; def NotN64 : Predicate<"!Subtarget.isABI_N64()">; //===----------------------------------------------------------------------===// // Mips Operand, Complex Patterns and Transformations Definitions. 
//===----------------------------------------------------------------------===// // Instruction operand types def brtarget : Operand; def calltarget : Operand; def simm16 : Operand; def simm16_64 : Operand; def shamt : Operand; // Unsigned Operand def uimm16 : Operand { let PrintMethod = "printUnsignedImm"; } // Address operand def mem : Operand { let PrintMethod = "printMemOperand"; let MIOperandInfo = (ops CPURegs, simm16); + let EncoderMethod = "getMemEncoding"; } def mem64 : Operand { let PrintMethod = "printMemOperand"; let MIOperandInfo = (ops CPU64Regs, simm16_64); } def mem_ea : Operand { let PrintMethod = "printMemOperandEA"; let MIOperandInfo = (ops CPURegs, simm16); + let EncoderMethod = "getMemEncoding"; } +// size operand of ext instruction +def size_ext : Operand { + let EncoderMethod = "getSizeExtEncoding"; +} + +// size operand of ins instruction +def size_ins : Operand { + let EncoderMethod = "getSizeInsEncoding"; +} + // Transformation Function - get the lower 16 bits. def LO16 : SDNodeXFormgetZExtValue() & 0xFFFF); }]>; // Transformation Function - get the higher 16 bits. def HI16 : SDNodeXFormgetZExtValue() >> 16); }]>; // Node immediate fits as 16-bit sign extended on target immediate. // e.g. addi, andi def immSExt16 : PatLeaf<(imm), [{ return isInt<16>(N->getSExtValue()); }]>; // Node immediate fits as 16-bit zero extended on target immediate. // The LO16 param means that only the lower 16 bits of the node // immediate are caught. // e.g. addiu, sltiu def immZExt16 : PatLeaf<(imm), [{ if (N->getValueType(0) == MVT::i32) return (uint32_t)N->getZExtValue() == (unsigned short)N->getZExtValue(); else return (uint64_t)N->getZExtValue() == (unsigned short)N->getZExtValue(); }], LO16>; // shamt field must fit in 5 bits. def immZExt5 : PatLeaf<(imm), [{ return N->getZExtValue() == ((N->getZExtValue()) & 0x1f) ; }]>; // Mips Address Mode! SDNode frameindex could possibily be a match // since load and store instructions from stack used it. def addr : ComplexPattern; //===----------------------------------------------------------------------===// // Pattern fragment for load/store //===----------------------------------------------------------------------===// class UnalignedLoad : PatFrag<(ops node:$ptr), (Node node:$ptr), [{ LoadSDNode *LD = cast(N); return LD->getMemoryVT().getSizeInBits()/8 > LD->getAlignment(); }]>; class AlignedLoad : PatFrag<(ops node:$ptr), (Node node:$ptr), [{ LoadSDNode *LD = cast(N); return LD->getMemoryVT().getSizeInBits()/8 <= LD->getAlignment(); }]>; class UnalignedStore : PatFrag<(ops node:$val, node:$ptr), (Node node:$val, node:$ptr), [{ StoreSDNode *SD = cast(N); return SD->getMemoryVT().getSizeInBits()/8 > SD->getAlignment(); }]>; class AlignedStore : PatFrag<(ops node:$val, node:$ptr), (Node node:$val, node:$ptr), [{ StoreSDNode *SD = cast(N); return SD->getMemoryVT().getSizeInBits()/8 <= SD->getAlignment(); }]>; // Load/Store PatFrags. 
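// The size_ext/size_ins operands above name EncoderMethod hooks because the
// machine-level size operand is not what the hardware encodes: ext stores
// msbd = size - 1 and ins stores msb = pos + size - 1 in the rd field of the
// instruction word. A sketch of that mapping (illustrative C++ only, not the
// actual MipsMCCodeEmitter implementation):
//
//   // ext rt, rs, pos, size  ->  field value size - 1
//   static unsigned encodeExtSize(unsigned Size) { return Size - 1; }
//   // ins rt, rs, pos, size  ->  field value pos + size - 1
//   static unsigned encodeInsSize(unsigned Pos, unsigned Size) {
//     return Pos + Size - 1;
//   }
//   // e.g. "ins $t0, $t1, 2, 5" encodes msb = 6 and lsb = 2.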
def sextloadi16_a : AlignedLoad; def zextloadi16_a : AlignedLoad; def extloadi16_a : AlignedLoad; def load_a : AlignedLoad; def sextloadi32_a : AlignedLoad; def zextloadi32_a : AlignedLoad; def extloadi32_a : AlignedLoad; def truncstorei16_a : AlignedStore; def store_a : AlignedStore; def truncstorei32_a : AlignedStore; def sextloadi16_u : UnalignedLoad; def zextloadi16_u : UnalignedLoad; def extloadi16_u : UnalignedLoad; def load_u : UnalignedLoad; def sextloadi32_u : UnalignedLoad; def zextloadi32_u : UnalignedLoad; def extloadi32_u : UnalignedLoad; def truncstorei16_u : UnalignedStore; def store_u : UnalignedStore; def truncstorei32_u : UnalignedStore; //===----------------------------------------------------------------------===// // Instructions specific format //===----------------------------------------------------------------------===// // Arithmetic and logical instructions with 3 register operands. class ArithLogicR op, bits<6> func, string instr_asm, SDNode OpNode, InstrItinClass itin, RegisterClass RC, bit isComm = 0>: FR { let shamt = 0; let isCommutable = isComm; } class ArithOverflowR op, bits<6> func, string instr_asm, InstrItinClass itin, RegisterClass RC, bit isComm = 0>: FR { let shamt = 0; let isCommutable = isComm; } // Arithmetic and logical instructions with 2 register operands. class ArithLogicI op, string instr_asm, SDNode OpNode, Operand Od, PatLeaf imm_type, RegisterClass RC> : - FI; + FI; class ArithOverflowI op, string instr_asm, SDNode OpNode, Operand Od, PatLeaf imm_type, RegisterClass RC> : - FI; + FI; // Arithmetic Multiply ADD/SUB let rd = 0, shamt = 0, Defs = [HI, LO], Uses = [HI, LO] in class MArithR func, string instr_asm, SDNode op, bit isComm = 0> : FR<0x1c, func, (outs), (ins CPURegs:$rs, CPURegs:$rt), !strconcat(instr_asm, "\t$rs, $rt"), [(op CPURegs:$rs, CPURegs:$rt, LO, HI)], IIImul> { let rd = 0; let shamt = 0; let isCommutable = isComm; } // Logical class LogicNOR op, bits<6> func, string instr_asm, RegisterClass RC>: FR { let shamt = 0; let isCommutable = 1; } // Shifts class LogicR_shift_rotate_imm func, bits<5> _rs, string instr_asm, SDNode OpNode>: FR<0x00, func, (outs CPURegs:$rd), (ins CPURegs:$rt, shamt:$shamt), !strconcat(instr_asm, "\t$rd, $rt, $shamt"), [(set CPURegs:$rd, (OpNode CPURegs:$rt, (i32 immZExt5:$shamt)))], IIAlu> { let rs = _rs; } class LogicR_shift_rotate_reg func, bits<5> isRotate, string instr_asm, SDNode OpNode>: FR<0x00, func, (outs CPURegs:$rd), (ins CPURegs:$rs, CPURegs:$rt), !strconcat(instr_asm, "\t$rd, $rt, $rs"), [(set CPURegs:$rd, (OpNode CPURegs:$rt, CPURegs:$rs))], IIAlu> { let shamt = isRotate; } // Load Upper Imediate class LoadUpper op, string instr_asm>: - FI { + FI { let rs = 0; } +class FMem op, dag outs, dag ins, string asmstr, list pattern, + InstrItinClass itin>: FFI { + bits<21> addr; + let Inst{25-21} = addr{20-16}; + let Inst{15-0} = addr{15-0}; +} + // Memory Load/Store let canFoldAsLoad = 1 in class LoadM op, string instr_asm, PatFrag OpNode, RegisterClass RC, Operand MemOpnd, bit Pseudo>: - FI { let isPseudo = Pseudo; } class StoreM op, string instr_asm, PatFrag OpNode, RegisterClass RC, Operand MemOpnd, bit Pseudo>: - FI { let isPseudo = Pseudo; } // 32-bit load. multiclass LoadM32 op, string instr_asm, PatFrag OpNode, bit Pseudo = 0> { def #NAME# : LoadM, Requires<[NotN64]>; def _P8 : LoadM, Requires<[IsN64]>; } // 64-bit load. 
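// The FMem class above folds the mem operand into a single 21-bit value,
// base register in bits 20-16 and the signed 16-bit offset in bits 15-0,
// which the two `let Inst{...}` assignments then split back into the base
// and immediate fields. A sketch of that flattening (illustrative C++ only;
// the real conversion is done by the getMemEncoding hook named on the
// operand):
//
//   #include <cstdint>
//   static uint32_t packMemOperand(unsigned BaseReg, int16_t Offset) {
//     return (BaseReg & 0x1f) << 16 | (uint16_t)Offset;
//   }
//   // e.g. "8($sp)" with $sp being register 29 is packMemOperand(29, 8).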
multiclass LoadM64 op, string instr_asm, PatFrag OpNode, bit Pseudo = 0> { def #NAME# : LoadM, Requires<[NotN64]>; def _P8 : LoadM, Requires<[IsN64]>; } // 32-bit store. multiclass StoreM32 op, string instr_asm, PatFrag OpNode, bit Pseudo = 0> { def #NAME# : StoreM, Requires<[NotN64]>; def _P8 : StoreM, Requires<[IsN64]>; } // 64-bit store. multiclass StoreM64 op, string instr_asm, PatFrag OpNode, bit Pseudo = 0> { def #NAME# : StoreM, Requires<[NotN64]>; def _P8 : StoreM, Requires<[IsN64]>; } // Conditional Branch class CBranch op, string instr_asm, PatFrag cond_op, RegisterClass RC>: - CBranchBase { + CBranchBase { let isBranch = 1; let isTerminator = 1; let hasDelaySlot = 1; } class CBranchZero op, bits<5> _rt, string instr_asm, PatFrag cond_op, RegisterClass RC>: - CBranchBase { + CBranchBase { let rt = _rt; let isBranch = 1; let isTerminator = 1; let hasDelaySlot = 1; } // SetCC class SetCC_R op, bits<6> func, string instr_asm, PatFrag cond_op, RegisterClass RC>: FR { let shamt = 0; } class SetCC_I op, string instr_asm, PatFrag cond_op, Operand Od, PatLeaf imm_type, RegisterClass RC>: - FI; // Unconditional branch let isBranch=1, isTerminator=1, isBarrier=1, hasDelaySlot = 1 in class JumpFJ op, string instr_asm>: FJ; let isBranch=1, isTerminator=1, isBarrier=1, rd=0, hasDelaySlot = 1 in class JumpFR op, bits<6> func, string instr_asm>: FR { let rt = 0; let rd = 0; let shamt = 0; } // Jump and Link (Call) let isCall=1, hasDelaySlot=1, // All calls clobber the non-callee saved registers... Defs = [AT, V0, V1, A0, A1, A2, A3, T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, K0, K1, D0, D1, D2, D3, D4, D5, D6, D7, D8, D9], Uses = [GP] in { class JumpLink op, string instr_asm>: FJ; class JumpLinkReg op, bits<6> func, string instr_asm>: FR { let rt = 0; let rd = 31; let shamt = 0; } class BranchLink: - FI<0x1, (outs), (ins CPURegs:$rs, brtarget:$target, variable_ops), - !strconcat(instr_asm, "\t$rs, $target"), [], IIBranch> { - let rt = 0; - } + FI<0x1, (outs), (ins CPURegs:$rs, brtarget:$imm16, variable_ops), + !strconcat(instr_asm, "\t$rs, $imm16"), [], IIBranch>; } // Mul, Div class Mul func, string instr_asm, InstrItinClass itin>: FR<0x00, func, (outs), (ins CPURegs:$rs, CPURegs:$rt), !strconcat(instr_asm, "\t$rs, $rt"), [], itin> { let rd = 0; let shamt = 0; let isCommutable = 1; let Defs = [HI, LO]; } class Div func, string instr_asm, InstrItinClass itin>: FR<0x00, func, (outs), (ins CPURegs:$rs, CPURegs:$rt), !strconcat(instr_asm, "\t$$zero, $rs, $rt"), [(op CPURegs:$rs, CPURegs:$rt)], itin> { let rd = 0; let shamt = 0; let Defs = [HI, LO]; } // Move from Hi/Lo class MoveFromLOHI func, string instr_asm>: FR<0x00, func, (outs CPURegs:$rd), (ins), !strconcat(instr_asm, "\t$rd"), [], IIHiLo> { let rs = 0; let rt = 0; let shamt = 0; } class MoveToLOHI func, string instr_asm>: FR<0x00, func, (outs), (ins CPURegs:$rs), !strconcat(instr_asm, "\t$rs"), [], IIHiLo> { let rt = 0; let rd = 0; let shamt = 0; } class EffectiveAddress : - FI<0x09, (outs CPURegs:$rt), (ins mem_ea:$addr), + FMem<0x09, (outs CPURegs:$rt), (ins mem_ea:$addr), instr_asm, [(set CPURegs:$rt, addr:$addr)], IIAlu>; // Count Leading Ones/Zeros in Word class CountLeading func, string instr_asm, list pattern>: FR<0x1c, func, (outs CPURegs:$rd), (ins CPURegs:$rs), !strconcat(instr_asm, "\t$rd, $rs"), pattern, IIAlu>, Requires<[HasBitCount]> { let shamt = 0; let rt = rd; } // Sign Extend in Register. 
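// The seb/seh definitions below select sext_inreg; the operation itself is
// just an in-register sign extension of the low byte or halfword, as in this
// reference illustration (C++, not LLVM code):
//
//   #include <cstdint>
//   static int32_t seb(int32_t x) { return (int32_t)(int8_t)x;  }  // low byte
//   static int32_t seh(int32_t x) { return (int32_t)(int16_t)x; }  // low half
//   // e.g. seb(0x000000ff) == -1 and seh(0x00008000) == -32768.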
class SignExtInReg sa, string instr_asm, ValueType vt>: - FR<0x3f, 0x20, (outs CPURegs:$rd), (ins CPURegs:$rt), + FR<0x1f, 0x20, (outs CPURegs:$rd), (ins CPURegs:$rt), !strconcat(instr_asm, "\t$rd, $rt"), [(set CPURegs:$rd, (sext_inreg CPURegs:$rt, vt))], NoItinerary> { let rs = 0; let shamt = sa; let Predicates = [HasSEInReg]; } // Byte Swap class ByteSwap func, bits<5> sa, string instr_asm>: FR<0x1f, func, (outs CPURegs:$rd), (ins CPURegs:$rt), !strconcat(instr_asm, "\t$rd, $rt"), [(set CPURegs:$rd, (bswap CPURegs:$rt))], NoItinerary> { let rs = 0; let shamt = sa; let Predicates = [HasSwap]; } // Read Hardware class ReadHardware: FR<0x1f, 0x3b, (outs CPURegs:$rt), (ins HWRegs:$rd), "rdhwr\t$rt, $rd", [], IIAlu> { let rs = 0; let shamt = 0; } // Ext and Ins class ExtIns _funct, string instr_asm, dag outs, dag ins, list pattern, InstrItinClass itin>: FR<0x1f, _funct, outs, ins, !strconcat(instr_asm, " $rt, $rs, $pos, $sz"), pattern, itin>, Requires<[HasMips32r2]> { bits<5> pos; bits<5> sz; let rd = sz; let shamt = pos; } // Atomic instructions with 2 source operands (ATOMIC_SWAP & ATOMIC_LOAD_*). class Atomic2Ops : MipsPseudo<(outs CPURegs:$dst), (ins CPURegs:$ptr, CPURegs:$incr), !strconcat("atomic_", Opstr, "\t$dst, $ptr, $incr"), [(set CPURegs:$dst, (Op CPURegs:$ptr, CPURegs:$incr))]>; // Atomic Compare & Swap. class AtomicCmpSwap : MipsPseudo<(outs CPURegs:$dst), (ins CPURegs:$ptr, CPURegs:$cmp, CPURegs:$swap), !strconcat("atomic_cmp_swap_", Width, "\t$dst, $ptr, $cmp, $swap"), [(set CPURegs:$dst, (Op CPURegs:$ptr, CPURegs:$cmp, CPURegs:$swap))]>; //===----------------------------------------------------------------------===// // Pseudo instructions //===----------------------------------------------------------------------===// // As stack alignment is always done with addiu, we need a 16-bit immediate let Defs = [SP], Uses = [SP] in { def ADJCALLSTACKDOWN : MipsPseudo<(outs), (ins uimm16:$amt), "!ADJCALLSTACKDOWN $amt", [(callseq_start timm:$amt)]>; def ADJCALLSTACKUP : MipsPseudo<(outs), (ins uimm16:$amt1, uimm16:$amt2), "!ADJCALLSTACKUP $amt1", [(callseq_end timm:$amt1, timm:$amt2)]>; } // Some assembly macros need to avoid pseudoinstructions and assembler // automatic reodering, we should reorder ourselves. def MACRO : MipsPseudo<(outs), (ins), ".set\tmacro", []>; def REORDER : MipsPseudo<(outs), (ins), ".set\treorder", []>; def NOMACRO : MipsPseudo<(outs), (ins), ".set\tnomacro", []>; def NOREORDER : MipsPseudo<(outs), (ins), ".set\tnoreorder", []>; // These macros are inserted to prevent GAS from complaining // when using the AT register. def NOAT : MipsPseudo<(outs), (ins), ".set\tnoat", []>; def ATMACRO : MipsPseudo<(outs), (ins), ".set\tat", []>; // When handling PIC code the assembler needs .cpload and .cprestore // directives. If the real instructions corresponding these directives // are used, we have the same behavior, but get also a bunch of warnings // from the assembler. 
def CPLOAD : MipsPseudo<(outs), (ins CPURegs:$picreg), ".cpload\t$picreg", []>; def CPRESTORE : MipsPseudo<(outs), (ins i32imm:$loc), ".cprestore\t$loc", []>; let usesCustomInserter = 1 in { def ATOMIC_LOAD_ADD_I8 : Atomic2Ops; def ATOMIC_LOAD_ADD_I16 : Atomic2Ops; def ATOMIC_LOAD_ADD_I32 : Atomic2Ops; def ATOMIC_LOAD_SUB_I8 : Atomic2Ops; def ATOMIC_LOAD_SUB_I16 : Atomic2Ops; def ATOMIC_LOAD_SUB_I32 : Atomic2Ops; def ATOMIC_LOAD_AND_I8 : Atomic2Ops; def ATOMIC_LOAD_AND_I16 : Atomic2Ops; def ATOMIC_LOAD_AND_I32 : Atomic2Ops; def ATOMIC_LOAD_OR_I8 : Atomic2Ops; def ATOMIC_LOAD_OR_I16 : Atomic2Ops; def ATOMIC_LOAD_OR_I32 : Atomic2Ops; def ATOMIC_LOAD_XOR_I8 : Atomic2Ops; def ATOMIC_LOAD_XOR_I16 : Atomic2Ops; def ATOMIC_LOAD_XOR_I32 : Atomic2Ops; def ATOMIC_LOAD_NAND_I8 : Atomic2Ops; def ATOMIC_LOAD_NAND_I16 : Atomic2Ops; def ATOMIC_LOAD_NAND_I32 : Atomic2Ops; def ATOMIC_SWAP_I8 : Atomic2Ops; def ATOMIC_SWAP_I16 : Atomic2Ops; def ATOMIC_SWAP_I32 : Atomic2Ops; def ATOMIC_CMP_SWAP_I8 : AtomicCmpSwap; def ATOMIC_CMP_SWAP_I16 : AtomicCmpSwap; def ATOMIC_CMP_SWAP_I32 : AtomicCmpSwap; } //===----------------------------------------------------------------------===// // Instruction definition //===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===// // MipsI Instructions //===----------------------------------------------------------------------===// /// Arithmetic Instructions (ALU Immediate) def ADDiu : ArithLogicI<0x09, "addiu", add, simm16, immSExt16, CPURegs>; def ADDi : ArithOverflowI<0x08, "addi", add, simm16, immSExt16, CPURegs>; def SLTi : SetCC_I<0x0a, "slti", setlt, simm16, immSExt16, CPURegs>; def SLTiu : SetCC_I<0x0b, "sltiu", setult, simm16, immSExt16, CPURegs>; def ANDi : ArithLogicI<0x0c, "andi", and, uimm16, immZExt16, CPURegs>; def ORi : ArithLogicI<0x0d, "ori", or, uimm16, immZExt16, CPURegs>; def XORi : ArithLogicI<0x0e, "xori", xor, uimm16, immZExt16, CPURegs>; def LUi : LoadUpper<0x0f, "lui">; /// Arithmetic Instructions (3-Operand, R-Type) def ADDu : ArithLogicR<0x00, 0x21, "addu", add, IIAlu, CPURegs, 1>; def SUBu : ArithLogicR<0x00, 0x23, "subu", sub, IIAlu, CPURegs>; def ADD : ArithOverflowR<0x00, 0x20, "add", IIAlu, CPURegs, 1>; def SUB : ArithOverflowR<0x00, 0x22, "sub", IIAlu, CPURegs>; def SLT : SetCC_R<0x00, 0x2a, "slt", setlt, CPURegs>; def SLTu : SetCC_R<0x00, 0x2b, "sltu", setult, CPURegs>; def AND : ArithLogicR<0x00, 0x24, "and", and, IIAlu, CPURegs, 1>; def OR : ArithLogicR<0x00, 0x25, "or", or, IIAlu, CPURegs, 1>; def XOR : ArithLogicR<0x00, 0x26, "xor", xor, IIAlu, CPURegs, 1>; def NOR : LogicNOR<0x00, 0x27, "nor", CPURegs>; /// Shift Instructions def SLL : LogicR_shift_rotate_imm<0x00, 0x00, "sll", shl>; def SRL : LogicR_shift_rotate_imm<0x02, 0x00, "srl", srl>; def SRA : LogicR_shift_rotate_imm<0x03, 0x00, "sra", sra>; def SLLV : LogicR_shift_rotate_reg<0x04, 0x00, "sllv", shl>; def SRLV : LogicR_shift_rotate_reg<0x06, 0x00, "srlv", srl>; def SRAV : LogicR_shift_rotate_reg<0x07, 0x00, "srav", sra>; // Rotate Instructions let Predicates = [HasMips32r2] in { def ROTR : LogicR_shift_rotate_imm<0x02, 0x01, "rotr", rotr>; def ROTRV : LogicR_shift_rotate_reg<0x06, 0x01, "rotrv", rotr>; } /// Load and Store Instructions /// aligned defm LB : LoadM32<0x20, "lb", sextloadi8>; defm LBu : LoadM32<0x24, "lbu", zextloadi8>; defm LH : LoadM32<0x21, "lh", sextloadi16_a>; defm LHu : LoadM32<0x25, "lhu", zextloadi16_a>; defm LW : LoadM32<0x23, "lw", load_a>; defm SB : 
StoreM32<0x28, "sb", truncstorei8>; defm SH : StoreM32<0x29, "sh", truncstorei16_a>; defm SW : StoreM32<0x2b, "sw", store_a>; /// unaligned defm ULH : LoadM32<0x21, "ulh", sextloadi16_u, 1>; defm ULHu : LoadM32<0x25, "ulhu", zextloadi16_u, 1>; defm ULW : LoadM32<0x23, "ulw", load_u, 1>; defm USH : StoreM32<0x29, "ush", truncstorei16_u, 1>; defm USW : StoreM32<0x2b, "usw", store_u, 1>; let hasSideEffects = 1 in def SYNC : MipsInst<(outs), (ins i32imm:$stype), "sync $stype", - [(MipsSync imm:$stype)], NoItinerary> + [(MipsSync imm:$stype)], NoItinerary, FrmOther> { - let opcode = 0; + bits<5> stype; + let Opcode = 0; let Inst{25-11} = 0; + let Inst{10-6} = stype; let Inst{5-0} = 15; } /// Load-linked, Store-conditional let mayLoad = 1 in - def LL : FI<0x30, (outs CPURegs:$dst), (ins mem:$addr), - "ll\t$dst, $addr", [], IILoad>; -let mayStore = 1, Constraints = "$src = $dst" in - def SC : FI<0x38, (outs CPURegs:$dst), (ins CPURegs:$src, mem:$addr), - "sc\t$src, $addr", [], IIStore>; + def LL : FMem<0x30, (outs CPURegs:$rt), (ins mem:$addr), + "ll\t$rt, $addr", [], IILoad>; +let mayStore = 1, Constraints = "$rt = $dst" in + def SC : FMem<0x38, (outs CPURegs:$dst), (ins CPURegs:$rt, mem:$addr), + "sc\t$rt, $addr", [], IIStore>; /// Jump and Branch Instructions def J : JumpFJ<0x02, "j">; let isIndirectBranch = 1 in def JR : JumpFR<0x00, 0x08, "jr">; def JAL : JumpLink<0x03, "jal">; def JALR : JumpLinkReg<0x00, 0x09, "jalr">; def BEQ : CBranch<0x04, "beq", seteq, CPURegs>; def BNE : CBranch<0x05, "bne", setne, CPURegs>; def BGEZ : CBranchZero<0x01, 1, "bgez", setge, CPURegs>; def BGTZ : CBranchZero<0x07, 0, "bgtz", setgt, CPURegs>; -def BLEZ : CBranchZero<0x07, 0, "blez", setle, CPURegs>; +def BLEZ : CBranchZero<0x06, 0, "blez", setle, CPURegs>; def BLTZ : CBranchZero<0x01, 0, "bltz", setlt, CPURegs>; -def BGEZAL : BranchLink<"bgezal">; -def BLTZAL : BranchLink<"bltzal">; +let rt=0x11 in + def BGEZAL : BranchLink<"bgezal">; +let rt=0x10 in + def BLTZAL : BranchLink<"bltzal">; let isReturn=1, isTerminator=1, hasDelaySlot=1, - isBarrier=1, hasCtrlDep=1, rs=0, rt=0, shamt=0 in - def RET : FR <0x00, 0x02, (outs), (ins CPURegs:$target), + isBarrier=1, hasCtrlDep=1, rd=0, rt=0, shamt=0 in + def RET : FR <0x00, 0x08, (outs), (ins CPURegs:$target), "jr\t$target", [(MipsRet CPURegs:$target)], IIBranch>; /// Multiply and Divide Instructions. def MULT : Mul<0x18, "mult", IIImul>; def MULTu : Mul<0x19, "multu", IIImul>; def SDIV : Div; def UDIV : Div; let Defs = [HI] in def MTHI : MoveToLOHI<0x11, "mthi">; let Defs = [LO] in def MTLO : MoveToLOHI<0x13, "mtlo">; let Uses = [HI] in def MFHI : MoveFromLOHI<0x10, "mfhi">; let Uses = [LO] in def MFLO : MoveFromLOHI<0x12, "mflo">; /// Sign Ext In Register Instructions. def SEB : SignExtInReg<0x10, "seb", i8>; def SEH : SignExtInReg<0x18, "seh", i16>; /// Count Leading def CLZ : CountLeading<0x20, "clz", [(set CPURegs:$rd, (ctlz CPURegs:$rs))]>; def CLO : CountLeading<0x21, "clo", [(set CPURegs:$rd, (ctlz (not CPURegs:$rs)))]>; /// Byte Swap def WSBW : ByteSwap<0x20, 0x2, "wsbw">; // Conditional moves: // These instructions are expanded in // MipsISelLowering::EmitInstrWithCustomInserter if target does not have // conditional move instructions. 
// flag:int, data:int class CondMovIntInt funct, string instr_asm> : FR<0, funct, (outs CPURegs:$rd), (ins CPURegs:$rs, CPURegs:$rt, CPURegs:$F), !strconcat(instr_asm, "\t$rd, $rs, $rt"), [], NoItinerary> { let shamt = 0; let usesCustomInserter = 1; let Constraints = "$F = $rd"; } def MOVZ_I : CondMovIntInt<0x0a, "movz">; def MOVN_I : CondMovIntInt<0x0b, "movn">; /// No operation let addr=0 in def NOP : FJ<0, (outs), (ins), "nop", [], IIAlu>; // FrameIndexes are legalized when they are operands from load/store // instructions. The same not happens for stack address copies, so an // add op with mem ComplexPattern is used and the stack address copy // can be matched. It's similar to Sparc LEA_ADDRi def LEA_ADDiu : EffectiveAddress<"addiu\t$rt, $addr">; // DynAlloc node points to dynamically allocated stack space. // $sp is added to the list of implicitly used registers to prevent dead code // elimination from removing instructions that modify $sp. let Uses = [SP] in def DynAlloc : EffectiveAddress<"addiu\t$rt, $addr">; // MADD*/MSUB* def MADD : MArithR<0, "madd", MipsMAdd, 1>; def MADDU : MArithR<1, "maddu", MipsMAddu, 1>; def MSUB : MArithR<4, "msub", MipsMSub>; def MSUBU : MArithR<5, "msubu", MipsMSubu>; // MUL is a assembly macro in the current used ISAs. In recent ISA's // it is a real instruction. def MUL : ArithLogicR<0x1c, 0x02, "mul", mul, IIImul, CPURegs, 1>, Requires<[HasMips32]>; def RDHWR : ReadHardware; def EXT : ExtIns<0, "ext", (outs CPURegs:$rt), - (ins CPURegs:$rs, uimm16:$pos, uimm16:$sz), + (ins CPURegs:$rs, uimm16:$pos, size_ext:$sz), [(set CPURegs:$rt, (MipsExt CPURegs:$rs, immZExt5:$pos, immZExt5:$sz))], NoItinerary>; let Constraints = "$src = $rt" in def INS : ExtIns<4, "ins", (outs CPURegs:$rt), - (ins CPURegs:$rs, uimm16:$pos, uimm16:$sz, CPURegs:$src), + (ins CPURegs:$rs, uimm16:$pos, size_ins:$sz, CPURegs:$src), [(set CPURegs:$rt, (MipsIns CPURegs:$rs, immZExt5:$pos, immZExt5:$sz, CPURegs:$src))], NoItinerary>; //===----------------------------------------------------------------------===// // Arbitrary patterns that map to one or more instructions //===----------------------------------------------------------------------===// // Small immediates def : Pat<(i32 immSExt16:$in), (ADDiu ZERO, imm:$in)>; def : Pat<(i32 immZExt16:$in), (ORi ZERO, imm:$in)>; // Arbitrary immediates def : Pat<(i32 imm:$imm), (ORi (LUi (HI16 imm:$imm)), (LO16 imm:$imm))>; // Carry patterns def : Pat<(subc CPURegs:$lhs, CPURegs:$rhs), (SUBu CPURegs:$lhs, CPURegs:$rhs)>; def : Pat<(addc CPURegs:$lhs, CPURegs:$rhs), (ADDu CPURegs:$lhs, CPURegs:$rhs)>; def : Pat<(addc CPURegs:$src, immSExt16:$imm), (ADDiu CPURegs:$src, imm:$imm)>; // Call def : Pat<(MipsJmpLink (i32 tglobaladdr:$dst)), (JAL tglobaladdr:$dst)>; def : Pat<(MipsJmpLink (i32 texternalsym:$dst)), (JAL texternalsym:$dst)>; //def : Pat<(MipsJmpLink CPURegs:$dst), // (JALR CPURegs:$dst)>; // hi/lo relocs def : Pat<(MipsHi tglobaladdr:$in), (LUi tglobaladdr:$in)>; def : Pat<(MipsHi tblockaddress:$in), (LUi tblockaddress:$in)>; def : Pat<(MipsLo tglobaladdr:$in), (ADDiu ZERO, tglobaladdr:$in)>; def : Pat<(MipsLo tblockaddress:$in), (ADDiu ZERO, tblockaddress:$in)>; def : Pat<(add CPURegs:$hi, (MipsLo tglobaladdr:$lo)), (ADDiu CPURegs:$hi, tglobaladdr:$lo)>; def : Pat<(add CPURegs:$hi, (MipsLo tblockaddress:$lo)), (ADDiu CPURegs:$hi, tblockaddress:$lo)>; def : Pat<(MipsHi tjumptable:$in), (LUi tjumptable:$in)>; def : Pat<(MipsLo tjumptable:$in), (ADDiu ZERO, tjumptable:$in)>; def : Pat<(add CPURegs:$hi, (MipsLo tjumptable:$lo)), (ADDiu 
CPURegs:$hi, tjumptable:$lo)>; def : Pat<(MipsHi tconstpool:$in), (LUi tconstpool:$in)>; def : Pat<(MipsLo tconstpool:$in), (ADDiu ZERO, tconstpool:$in)>; def : Pat<(add CPURegs:$hi, (MipsLo tconstpool:$lo)), (ADDiu CPURegs:$hi, tconstpool:$lo)>; // gp_rel relocs def : Pat<(add CPURegs:$gp, (MipsGPRel tglobaladdr:$in)), (ADDiu CPURegs:$gp, tglobaladdr:$in)>; def : Pat<(add CPURegs:$gp, (MipsGPRel tconstpool:$in)), (ADDiu CPURegs:$gp, tconstpool:$in)>; // tlsgd def : Pat<(add CPURegs:$gp, (MipsTlsGd tglobaltlsaddr:$in)), (ADDiu CPURegs:$gp, tglobaltlsaddr:$in)>; // tprel hi/lo def : Pat<(MipsTprelHi tglobaltlsaddr:$in), (LUi tglobaltlsaddr:$in)>; def : Pat<(MipsTprelLo tglobaltlsaddr:$in), (ADDiu ZERO, tglobaltlsaddr:$in)>; def : Pat<(add CPURegs:$hi, (MipsTprelLo tglobaltlsaddr:$lo)), (ADDiu CPURegs:$hi, tglobaltlsaddr:$lo)>; // wrapper_pic class WrapperPICPat: Pat<(MipsWrapperPIC node:$in), (ADDiu GP, node:$in)>; def : WrapperPICPat; def : WrapperPICPat; def : WrapperPICPat; def : WrapperPICPat; def : WrapperPICPat; // Mips does not have "not", so we expand our way def : Pat<(not CPURegs:$in), (NOR CPURegs:$in, ZERO)>; // extended load and stores def : Pat<(extloadi1 addr:$src), (LBu addr:$src)>; def : Pat<(extloadi8 addr:$src), (LBu addr:$src)>; def : Pat<(extloadi16_a addr:$src), (LHu addr:$src)>; def : Pat<(extloadi16_u addr:$src), (ULHu addr:$src)>; // peepholes def : Pat<(store (i32 0), addr:$dst), (SW ZERO, addr:$dst)>; // brcond patterns multiclass BrcondPats { def : Pat<(brcond (i32 (setne RC:$lhs, 0)), bb:$dst), (BNEOp RC:$lhs, ZEROReg, bb:$dst)>; def : Pat<(brcond (i32 (seteq RC:$lhs, 0)), bb:$dst), (BEQOp RC:$lhs, ZEROReg, bb:$dst)>; def : Pat<(brcond (i32 (setge RC:$lhs, RC:$rhs)), bb:$dst), (BEQ (SLTOp RC:$lhs, RC:$rhs), ZERO, bb:$dst)>; def : Pat<(brcond (i32 (setuge RC:$lhs, RC:$rhs)), bb:$dst), (BEQ (SLTuOp RC:$lhs, RC:$rhs), ZERO, bb:$dst)>; def : Pat<(brcond (i32 (setge RC:$lhs, immSExt16:$rhs)), bb:$dst), (BEQ (SLTiOp RC:$lhs, immSExt16:$rhs), ZERO, bb:$dst)>; def : Pat<(brcond (i32 (setuge RC:$lhs, immSExt16:$rhs)), bb:$dst), (BEQ (SLTiuOp RC:$lhs, immSExt16:$rhs), ZERO, bb:$dst)>; def : Pat<(brcond (i32 (setle RC:$lhs, RC:$rhs)), bb:$dst), (BEQ (SLTOp RC:$rhs, RC:$lhs), ZERO, bb:$dst)>; def : Pat<(brcond (i32 (setule RC:$lhs, RC:$rhs)), bb:$dst), (BEQ (SLTuOp RC:$rhs, RC:$lhs), ZERO, bb:$dst)>; def : Pat<(brcond RC:$cond, bb:$dst), (BNEOp RC:$cond, ZEROReg, bb:$dst)>; } defm : BrcondPats; // select patterns multiclass MovzPats { def : Pat<(select (i32 (setge CPURegs:$lhs, CPURegs:$rhs)), RC:$T, RC:$F), (MOVZInst RC:$T, (SLT CPURegs:$lhs, CPURegs:$rhs), RC:$F)>; def : Pat<(select (i32 (setuge CPURegs:$lhs, CPURegs:$rhs)), RC:$T, RC:$F), (MOVZInst RC:$T, (SLTu CPURegs:$lhs, CPURegs:$rhs), RC:$F)>; def : Pat<(select (i32 (setge CPURegs:$lhs, immSExt16:$rhs)), RC:$T, RC:$F), (MOVZInst RC:$T, (SLTi CPURegs:$lhs, immSExt16:$rhs), RC:$F)>; def : Pat<(select (i32 (setuge CPURegs:$lh, immSExt16:$rh)), RC:$T, RC:$F), (MOVZInst RC:$T, (SLTiu CPURegs:$lh, immSExt16:$rh), RC:$F)>; def : Pat<(select (i32 (setle CPURegs:$lhs, CPURegs:$rhs)), RC:$T, RC:$F), (MOVZInst RC:$T, (SLT CPURegs:$rhs, CPURegs:$lhs), RC:$F)>; def : Pat<(select (i32 (setule CPURegs:$lhs, CPURegs:$rhs)), RC:$T, RC:$F), (MOVZInst RC:$T, (SLTu CPURegs:$rhs, CPURegs:$lhs), RC:$F)>; def : Pat<(select (i32 (seteq CPURegs:$lhs, CPURegs:$rhs)), RC:$T, RC:$F), (MOVZInst RC:$T, (XOR CPURegs:$lhs, CPURegs:$rhs), RC:$F)>; def : Pat<(select (i32 (seteq CPURegs:$lhs, 0)), RC:$T, RC:$F), (MOVZInst RC:$T, CPURegs:$lhs, 
RC:$F)>; } multiclass MovnPats { def : Pat<(select (i32 (setne CPURegs:$lhs, CPURegs:$rhs)), RC:$T, RC:$F), (MOVNInst RC:$T, (XOR CPURegs:$lhs, CPURegs:$rhs), RC:$F)>; def : Pat<(select CPURegs:$cond, RC:$T, RC:$F), (MOVNInst RC:$T, CPURegs:$cond, RC:$F)>; def : Pat<(select (i32 (setne CPURegs:$lhs, 0)), RC:$T, RC:$F), (MOVNInst RC:$T, CPURegs:$lhs, RC:$F)>; } defm : MovzPats; defm : MovnPats; // setcc patterns multiclass SeteqPats { def : Pat<(seteq RC:$lhs, RC:$rhs), (SLTiuOp (XOROp RC:$lhs, RC:$rhs), 1)>; def : Pat<(setne RC:$lhs, RC:$rhs), (SLTuOp ZEROReg, (XOROp RC:$lhs, RC:$rhs))>; } multiclass SetlePats { def : Pat<(setle RC:$lhs, RC:$rhs), (XORi (SLTOp RC:$rhs, RC:$lhs), 1)>; def : Pat<(setule RC:$lhs, RC:$rhs), (XORi (SLTuOp RC:$rhs, RC:$lhs), 1)>; } multiclass SetgtPats { def : Pat<(setgt RC:$lhs, RC:$rhs), (SLTOp RC:$rhs, RC:$lhs)>; def : Pat<(setugt RC:$lhs, RC:$rhs), (SLTuOp RC:$rhs, RC:$lhs)>; } multiclass SetgePats { def : Pat<(setge RC:$lhs, RC:$rhs), (XORi (SLTOp RC:$lhs, RC:$rhs), 1)>; def : Pat<(setuge RC:$lhs, RC:$rhs), (XORi (SLTuOp RC:$lhs, RC:$rhs), 1)>; } multiclass SetgeImmPats { def : Pat<(setge RC:$lhs, immSExt16:$rhs), (XORi (SLTiOp RC:$lhs, immSExt16:$rhs), 1)>; def : Pat<(setuge RC:$lhs, immSExt16:$rhs), (XORi (SLTiuOp RC:$lhs, immSExt16:$rhs), 1)>; } defm : SeteqPats; defm : SetlePats; defm : SetgtPats; defm : SetgePats; defm : SetgeImmPats; // select MipsDynAlloc def : Pat<(MipsDynAlloc addr:$f), (DynAlloc addr:$f)>; //===----------------------------------------------------------------------===// // Floating Point Support //===----------------------------------------------------------------------===// include "MipsInstrFPU.td" include "Mips64InstrInfo.td" Index: vendor/llvm/dist/lib/Target/Mips/MipsJITInfo.cpp =================================================================== --- vendor/llvm/dist/lib/Target/Mips/MipsJITInfo.cpp (revision 228363) +++ vendor/llvm/dist/lib/Target/Mips/MipsJITInfo.cpp (revision 228364) @@ -1,230 +1,230 @@ //===- MipsJITInfo.cpp - Implement the JIT interfaces for the Mips target -===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file implements the JIT interfaces for the Mips target. // //===----------------------------------------------------------------------===// #define DEBUG_TYPE "jit" #include "MipsJITInfo.h" #include "MipsInstrInfo.h" #include "MipsRelocations.h" #include "MipsSubtarget.h" #include "llvm/Function.h" #include "llvm/CodeGen/JITCodeEmitter.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Support/Memory.h" #include using namespace llvm; void MipsJITInfo::replaceMachineCodeForFunction(void *Old, void *New) { report_fatal_error("MipsJITInfo::replaceMachineCodeForFunction"); } /// JITCompilerFunction - This contains the address of the JIT function used to /// compile a function lazily. static TargetJITInfo::JITCompilerFn JITCompilerFunction; // Get the ASMPREFIX for the current host. This is often '_'. 
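// The GETASMPREFIX macros that follow rely on two-step stringification: the
// extra level of indirection forces __USER_LABEL_PREFIX__ to be expanded
// before '#' turns it into a string literal. A standalone illustration with
// hypothetical names (PREFIX, STR and STR1 are not the macros used below):

#define PREFIX _
#define STR1(X) #X        // stringify the already-expanded token
#define STR(X) STR1(X)    // expand X first, then stringify
// STR1(PREFIX) yields "PREFIX", but STR(PREFIX) yields "_", so
// STR(PREFIX) "MipsCompilationCallback" concatenates to
// "_MipsCompilationCallback".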
#ifndef __USER_LABEL_PREFIX__ #define __USER_LABEL_PREFIX__ #endif #define GETASMPREFIX2(X) #X #define GETASMPREFIX(X) GETASMPREFIX2(X) #define ASMPREFIX GETASMPREFIX(__USER_LABEL_PREFIX__) // CompilationCallback stub - We can't use a C function with inline assembly in // it, because the prolog/epilog inserted by GCC won't work for us. Instead, // write our own wrapper, which does things our way, so we have complete control // over register saving and restoring. This code saves registers, calls // MipsCompilationCallbackC and restores registers. extern "C" { #if defined (__mips__) void MipsCompilationCallback(); asm( ".text\n" ".align 2\n" ".globl " ASMPREFIX "MipsCompilationCallback\n" ASMPREFIX "MipsCompilationCallback:\n" ".ent " ASMPREFIX "MipsCompilationCallback\n" - ".frame $29, 32, $31\n" + ".frame $sp, 32, $ra\n" ".set noreorder\n" ".cpload $t9\n" - "addiu $sp, $sp, -60\n" + "addiu $sp, $sp, -64\n" ".cprestore 16\n" // Save argument registers a0, a1, a2, a3, f12, f14 since they may contain // stuff for the real target function right now. We have to act as if this // whole compilation callback doesn't exist as far as the caller is // concerned. We also need to save the ra register since it contains the // original return address, and t8 register since it contains the address // of the end of function stub. "sw $a0, 20($sp)\n" "sw $a1, 24($sp)\n" "sw $a2, 28($sp)\n" "sw $a3, 32($sp)\n" "sw $ra, 36($sp)\n" "sw $t8, 40($sp)\n" - "sdc1 $f12, 44($sp)\n" - "sdc1 $f14, 52($sp)\n" + "sdc1 $f12, 48($sp)\n" + "sdc1 $f14, 56($sp)\n" // t8 points at the end of function stub. Pass the beginning of the stub // to the MipsCompilationCallbackC. "addiu $a0, $t8, -16\n" "jal " ASMPREFIX "MipsCompilationCallbackC\n" "nop\n" // Restore registers. "lw $a0, 20($sp)\n" "lw $a1, 24($sp)\n" "lw $a2, 28($sp)\n" "lw $a3, 32($sp)\n" "lw $ra, 36($sp)\n" "lw $t8, 40($sp)\n" - "ldc1 $f12, 44($sp)\n" - "ldc1 $f14, 52($sp)\n" - "addiu $sp, $sp, 60\n" + "ldc1 $f12, 48($sp)\n" + "ldc1 $f14, 56($sp)\n" + "addiu $sp, $sp, 64\n" // Jump to the (newly modified) stub to invoke the real function. "addiu $t8, $t8, -16\n" "jr $t8\n" "nop\n" ".set reorder\n" ".end " ASMPREFIX "MipsCompilationCallback\n" ); #else // host != Mips void MipsCompilationCallback() { llvm_unreachable( "Cannot call MipsCompilationCallback() on a non-Mips arch!"); } #endif } /// MipsCompilationCallbackC - This is the target-specific function invoked /// by the function stub when we did not know the real target of a call. /// This function must locate the start of the stub or call site and pass /// it into the JIT compiler function. extern "C" void MipsCompilationCallbackC(intptr_t StubAddr) { // Get the address of the compiled code for this function. intptr_t NewVal = (intptr_t) JITCompilerFunction((void*) StubAddr); // Rewrite the function stub so that we don't end up here every time we // execute the call. 
We're replacing the first four instructions of the // stub with code that jumps to the compiled function: // lui $t9, %hi(NewVal) // addiu $t9, $t9, %lo(NewVal) // jr $t9 // nop int Hi = ((unsigned)NewVal & 0xffff0000) >> 16; if ((NewVal & 0x8000) != 0) Hi++; int Lo = (int)(NewVal & 0xffff); *(intptr_t *)(StubAddr) = 0xf << 26 | 25 << 16 | Hi; *(intptr_t *)(StubAddr + 4) = 9 << 26 | 25 << 21 | 25 << 16 | Lo; *(intptr_t *)(StubAddr + 8) = 25 << 21 | 8; *(intptr_t *)(StubAddr + 12) = 0; sys::Memory::InvalidateInstructionCache((void*) StubAddr, 16); } TargetJITInfo::LazyResolverFn MipsJITInfo::getLazyResolverFunction( JITCompilerFn F) { JITCompilerFunction = F; return MipsCompilationCallback; } TargetJITInfo::StubLayout MipsJITInfo::getStubLayout() { // The stub contains 4 4-byte instructions, aligned at 4 bytes. See // emitFunctionStub for details. StubLayout Result = { 4*4, 4 }; return Result; } void *MipsJITInfo::emitFunctionStub(const Function* F, void *Fn, JITCodeEmitter &JCE) { JCE.emitAlignment(4); void *Addr = (void*) (JCE.getCurrentPCValue()); if (!sys::Memory::setRangeWritable(Addr, 16)) llvm_unreachable("ERROR: Unable to mark stub writable."); intptr_t EmittedAddr; if (Fn != (void*)(intptr_t)MipsCompilationCallback) EmittedAddr = (intptr_t)Fn; else EmittedAddr = (intptr_t)MipsCompilationCallback; int Hi = ((unsigned)EmittedAddr & 0xffff0000) >> 16; if ((EmittedAddr & 0x8000) != 0) Hi++; int Lo = (int)(EmittedAddr & 0xffff); // lui t9, %hi(EmittedAddr) // addiu t9, t9, %lo(EmittedAddr) // jalr t8, t9 // nop JCE.emitWordLE(0xf << 26 | 25 << 16 | Hi); JCE.emitWordLE(9 << 26 | 25 << 21 | 25 << 16 | Lo); JCE.emitWordLE(25 << 21 | 24 << 11 | 9); JCE.emitWordLE(0); sys::Memory::InvalidateInstructionCache(Addr, 16); if (!sys::Memory::setRangeExecutable(Addr, 16)) llvm_unreachable("ERROR: Unable to mark stub executable."); return Addr; } /// relocate - Before the JIT can run a block of code that has been emitted, /// it must rewrite the code to contain the actual addresses of any /// referenced global symbols. 
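// The stub rewriting above and the reloc_mips_hi case below use the same
// trick: addiu sign-extends its 16-bit immediate, so whenever bit 15 of the
// target address is set the high half must be incremented by one, otherwise
// the lui/addiu pair would come up 0x10000 short. A self-contained check of
// that arithmetic (illustrative only, not LLVM code):

#include <cassert>
#include <cstdint>

static void splitHiLo(uint32_t Addr, uint32_t &Hi, int32_t &Lo) {
  Hi = Addr >> 16;
  if (Addr & 0x8000) ++Hi;           // compensate for sign extension of Lo
  Lo = (int16_t)(Addr & 0xffff);     // the value addiu actually adds
}

static void checkSplit(uint32_t Addr) {
  uint32_t Hi; int32_t Lo;
  splitHiLo(Addr, Hi, Lo);
  assert((uint32_t)((Hi << 16) + Lo) == Addr && "hi/lo split must round-trip");
}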
void MipsJITInfo::relocate(void *Function, MachineRelocation *MR, unsigned NumRelocs, unsigned char* GOTBase) { for (unsigned i = 0; i != NumRelocs; ++i, ++MR) { void *RelocPos = (char*) Function + MR->getMachineCodeOffset(); intptr_t ResultPtr = (intptr_t) MR->getResultPointer(); switch ((Mips::RelocationType) MR->getRelocationType()) { case Mips::reloc_mips_branch: ResultPtr = (((ResultPtr - (intptr_t) RelocPos) - 4) >> 2) & 0xffff; *((unsigned*) RelocPos) |= (unsigned) ResultPtr; break; case Mips::reloc_mips_26: ResultPtr = (ResultPtr & 0x0fffffff) >> 2; *((unsigned*) RelocPos) |= (unsigned) ResultPtr; break; case Mips::reloc_mips_hi: ResultPtr = ResultPtr >> 16; if ((((intptr_t) (MR->getResultPointer()) & 0xffff) >> 15) == 1) { ResultPtr += 1; } *((unsigned*) RelocPos) |= (unsigned) ResultPtr; break; case Mips::reloc_mips_lo: ResultPtr = ResultPtr & 0xffff; *((unsigned*) RelocPos) |= (unsigned) ResultPtr; break; default: llvm_unreachable("ERROR: Unknown Mips relocation."); } } } Index: vendor/llvm/dist/lib/Target/PowerPC/PPCFrameLowering.cpp =================================================================== --- vendor/llvm/dist/lib/Target/PowerPC/PPCFrameLowering.cpp (revision 228363) +++ vendor/llvm/dist/lib/Target/PowerPC/PPCFrameLowering.cpp (revision 228364) @@ -1,971 +1,969 @@ //=====- PPCFrameLowering.cpp - PPC Frame Information -----------*- C++ -*-===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file contains the PPC implementation of TargetFrameLowering class. // //===----------------------------------------------------------------------===// #include "PPCFrameLowering.h" #include "PPCInstrInfo.h" #include "PPCMachineFunctionInfo.h" #include "llvm/Function.h" #include "llvm/CodeGen/MachineFrameInfo.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineInstrBuilder.h" #include "llvm/CodeGen/MachineModuleInfo.h" #include "llvm/CodeGen/MachineRegisterInfo.h" #include "llvm/CodeGen/RegisterScavenging.h" #include "llvm/Target/TargetOptions.h" using namespace llvm; // FIXME This disables some code that aligns the stack to a boundary bigger than // the default (16 bytes on Darwin) when there is a stack local of greater // alignment. This does not currently work, because the delta between old and // new stack pointers is added to offsets that reference incoming parameters // after the prolog is generated, and the code that does that doesn't handle a // variable delta. You don't want to do that anyway; a better approach is to // reserve another register that retains to the incoming stack pointer, and // reference parameters relative to that. #define ALIGN_STACK 0 /// VRRegNo - Map from a numbered VR register to its enum value. /// static const unsigned short VRRegNo[] = { PPC::V0 , PPC::V1 , PPC::V2 , PPC::V3 , PPC::V4 , PPC::V5 , PPC::V6 , PPC::V7 , PPC::V8 , PPC::V9 , PPC::V10, PPC::V11, PPC::V12, PPC::V13, PPC::V14, PPC::V15, PPC::V16, PPC::V17, PPC::V18, PPC::V19, PPC::V20, PPC::V21, PPC::V22, PPC::V23, PPC::V24, PPC::V25, PPC::V26, PPC::V27, PPC::V28, PPC::V29, PPC::V30, PPC::V31 }; /// RemoveVRSaveCode - We have found that this function does not need any code /// to manipulate the VRSAVE register, even though it uses vector registers. /// This can happen when the only registers used are known to be live in or out /// of the function. 
Remove all of the VRSAVE related code from the function. static void RemoveVRSaveCode(MachineInstr *MI) { MachineBasicBlock *Entry = MI->getParent(); MachineFunction *MF = Entry->getParent(); // We know that the MTVRSAVE instruction immediately follows MI. Remove it. MachineBasicBlock::iterator MBBI = MI; ++MBBI; assert(MBBI != Entry->end() && MBBI->getOpcode() == PPC::MTVRSAVE); MBBI->eraseFromParent(); bool RemovedAllMTVRSAVEs = true; // See if we can find and remove the MTVRSAVE instruction from all of the // epilog blocks. for (MachineFunction::iterator I = MF->begin(), E = MF->end(); I != E; ++I) { // If last instruction is a return instruction, add an epilogue if (!I->empty() && I->back().getDesc().isReturn()) { bool FoundIt = false; for (MBBI = I->end(); MBBI != I->begin(); ) { --MBBI; if (MBBI->getOpcode() == PPC::MTVRSAVE) { MBBI->eraseFromParent(); // remove it. FoundIt = true; break; } } RemovedAllMTVRSAVEs &= FoundIt; } } // If we found and removed all MTVRSAVE instructions, remove the read of // VRSAVE as well. if (RemovedAllMTVRSAVEs) { MBBI = MI; assert(MBBI != Entry->begin() && "UPDATE_VRSAVE is first instr in block?"); --MBBI; assert(MBBI->getOpcode() == PPC::MFVRSAVE && "VRSAVE instrs wandered?"); MBBI->eraseFromParent(); } // Finally, nuke the UPDATE_VRSAVE. MI->eraseFromParent(); } // HandleVRSaveUpdate - MI is the UPDATE_VRSAVE instruction introduced by the // instruction selector. Based on the vector registers that have been used, // transform this into the appropriate ORI instruction. static void HandleVRSaveUpdate(MachineInstr *MI, const TargetInstrInfo &TII) { MachineFunction *MF = MI->getParent()->getParent(); DebugLoc dl = MI->getDebugLoc(); unsigned UsedRegMask = 0; for (unsigned i = 0; i != 32; ++i) if (MF->getRegInfo().isPhysRegUsed(VRRegNo[i])) UsedRegMask |= 1 << (31-i); // Live in and live out values already must be in the mask, so don't bother // marking them. for (MachineRegisterInfo::livein_iterator I = MF->getRegInfo().livein_begin(), E = MF->getRegInfo().livein_end(); I != E; ++I) { unsigned RegNo = getPPCRegisterNumbering(I->first); if (VRRegNo[RegNo] == I->first) // If this really is a vector reg. UsedRegMask &= ~(1 << (31-RegNo)); // Doesn't need to be marked. } for (MachineRegisterInfo::liveout_iterator I = MF->getRegInfo().liveout_begin(), E = MF->getRegInfo().liveout_end(); I != E; ++I) { unsigned RegNo = getPPCRegisterNumbering(*I); if (VRRegNo[RegNo] == *I) // If this really is a vector reg. UsedRegMask &= ~(1 << (31-RegNo)); // Doesn't need to be marked. } // If no registers are used, turn this into a copy. if (UsedRegMask == 0) { // Remove all VRSAVE code. 
RemoveVRSaveCode(MI); return; } unsigned SrcReg = MI->getOperand(1).getReg(); unsigned DstReg = MI->getOperand(0).getReg(); if ((UsedRegMask & 0xFFFF) == UsedRegMask) { if (DstReg != SrcReg) BuildMI(*MI->getParent(), MI, dl, TII.get(PPC::ORI), DstReg) .addReg(SrcReg) .addImm(UsedRegMask); else BuildMI(*MI->getParent(), MI, dl, TII.get(PPC::ORI), DstReg) .addReg(SrcReg, RegState::Kill) .addImm(UsedRegMask); } else if ((UsedRegMask & 0xFFFF0000) == UsedRegMask) { if (DstReg != SrcReg) BuildMI(*MI->getParent(), MI, dl, TII.get(PPC::ORIS), DstReg) .addReg(SrcReg) .addImm(UsedRegMask >> 16); else BuildMI(*MI->getParent(), MI, dl, TII.get(PPC::ORIS), DstReg) .addReg(SrcReg, RegState::Kill) .addImm(UsedRegMask >> 16); } else { if (DstReg != SrcReg) BuildMI(*MI->getParent(), MI, dl, TII.get(PPC::ORIS), DstReg) .addReg(SrcReg) .addImm(UsedRegMask >> 16); else BuildMI(*MI->getParent(), MI, dl, TII.get(PPC::ORIS), DstReg) .addReg(SrcReg, RegState::Kill) .addImm(UsedRegMask >> 16); BuildMI(*MI->getParent(), MI, dl, TII.get(PPC::ORI), DstReg) .addReg(DstReg, RegState::Kill) .addImm(UsedRegMask & 0xFFFF); } // Remove the old UPDATE_VRSAVE instruction. MI->eraseFromParent(); } /// determineFrameLayout - Determine the size of the frame and maximum call /// frame size. void PPCFrameLowering::determineFrameLayout(MachineFunction &MF) const { MachineFrameInfo *MFI = MF.getFrameInfo(); // Get the number of bytes to allocate from the FrameInfo unsigned FrameSize = MFI->getStackSize(); // Get the alignments provided by the target, and the maximum alignment // (if any) of the fixed frame objects. unsigned MaxAlign = MFI->getMaxAlignment(); unsigned TargetAlign = getStackAlignment(); unsigned AlignMask = TargetAlign - 1; // // If we are a leaf function, and use up to 224 bytes of stack space, // don't have a frame pointer, calls, or dynamic alloca then we do not need // to adjust the stack pointer (we fit in the Red Zone). bool DisableRedZone = MF.getFunction()->hasFnAttr(Attribute::NoRedZone); // FIXME SVR4 The 32-bit SVR4 ABI has no red zone. if (!DisableRedZone && FrameSize <= 224 && // Fits in red zone. !MFI->hasVarSizedObjects() && // No dynamic alloca. !MFI->adjustsStack() && // No calls. (!ALIGN_STACK || MaxAlign <= TargetAlign)) { // No special alignment. // No need for frame MFI->setStackSize(0); return; } // Get the maximum call frame size of all the calls. unsigned maxCallFrameSize = MFI->getMaxCallFrameSize(); // Maximum call frame needs to be at least big enough for linkage and 8 args. unsigned minCallFrameSize = getMinCallFrameSize(Subtarget.isPPC64(), Subtarget.isDarwinABI()); maxCallFrameSize = std::max(maxCallFrameSize, minCallFrameSize); // If we have dynamic alloca then maxCallFrameSize needs to be aligned so // that allocations will be aligned. if (MFI->hasVarSizedObjects()) maxCallFrameSize = (maxCallFrameSize + AlignMask) & ~AlignMask; // Update maximum call frame size. MFI->setMaxCallFrameSize(maxCallFrameSize); // Include call frame size in total. FrameSize += maxCallFrameSize; // Make sure the frame is aligned. FrameSize = (FrameSize + AlignMask) & ~AlignMask; // Update frame info. MFI->setStackSize(FrameSize); } // hasFP - Return true if the specified function actually has a dedicated frame // pointer register. 
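// determineFrameLayout() above relies on the usual power-of-two rounding
// trick: with AlignMask = Align - 1, (Size + AlignMask) & ~AlignMask rounds
// Size up to the next multiple of Align. A small standalone helper showing
// the same arithmetic (illustrative, not the LLVM utility):

#include <cassert>
#include <cstdint>

static uint64_t alignUp(uint64_t Size, uint64_t Align) {
  assert(Align && (Align & (Align - 1)) == 0 && "Align must be a power of two");
  uint64_t Mask = Align - 1;
  return (Size + Mask) & ~Mask;
}
// e.g. alignUp(100, 16) == 112 and alignUp(112, 16) == 112.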
bool PPCFrameLowering::hasFP(const MachineFunction &MF) const { const MachineFrameInfo *MFI = MF.getFrameInfo(); // FIXME: This is pretty much broken by design: hasFP() might be called really // early, before the stack layout was calculated and thus hasFP() might return // true or false here depending on the time of call. return (MFI->getStackSize()) && needsFP(MF); } // needsFP - Return true if the specified function should have a dedicated frame // pointer register. This is true if the function has variable sized allocas or // if frame pointer elimination is disabled. bool PPCFrameLowering::needsFP(const MachineFunction &MF) const { const MachineFrameInfo *MFI = MF.getFrameInfo(); // Naked functions have no stack frame pushed, so we don't have a frame // pointer. if (MF.getFunction()->hasFnAttr(Attribute::Naked)) return false; return DisableFramePointerElim(MF) || MFI->hasVarSizedObjects() || (GuaranteedTailCallOpt && MF.getInfo()->hasFastCall()); } void PPCFrameLowering::emitPrologue(MachineFunction &MF) const { MachineBasicBlock &MBB = MF.front(); // Prolog goes in entry BB MachineBasicBlock::iterator MBBI = MBB.begin(); MachineFrameInfo *MFI = MF.getFrameInfo(); const PPCInstrInfo &TII = *static_cast(MF.getTarget().getInstrInfo()); MachineModuleInfo &MMI = MF.getMMI(); DebugLoc dl; bool needsFrameMoves = MMI.hasDebugInfo() || MF.getFunction()->needsUnwindTableEntry(); // Prepare for frame info. MCSymbol *FrameLabel = 0; // Scan the prolog, looking for an UPDATE_VRSAVE instruction. If we find it, // process it. for (unsigned i = 0; MBBI != MBB.end(); ++i, ++MBBI) { if (MBBI->getOpcode() == PPC::UPDATE_VRSAVE) { HandleVRSaveUpdate(MBBI, TII); break; } } // Move MBBI back to the beginning of the function. MBBI = MBB.begin(); // Work out frame sizes. // FIXME: determineFrameLayout() may change the frame size. This should be // moved upper, to some hook. determineFrameLayout(MF); unsigned FrameSize = MFI->getStackSize(); int NegFrameSize = -FrameSize; // Get processor type. bool isPPC64 = Subtarget.isPPC64(); // Get operating system bool isDarwinABI = Subtarget.isDarwinABI(); // Check if the link register (LR) must be saved. PPCFunctionInfo *FI = MF.getInfo(); bool MustSaveLR = FI->mustSaveLR(); // Do we have a frame pointer for this function? bool HasFP = hasFP(MF); int LROffset = PPCFrameLowering::getReturnSaveOffset(isPPC64, isDarwinABI); int FPOffset = 0; if (HasFP) { if (Subtarget.isSVR4ABI()) { MachineFrameInfo *FFI = MF.getFrameInfo(); int FPIndex = FI->getFramePointerSaveIndex(); assert(FPIndex && "No Frame Pointer Save Slot!"); FPOffset = FFI->getObjectOffset(FPIndex); } else { FPOffset = PPCFrameLowering::getFramePointerSaveOffset(isPPC64, isDarwinABI); } } if (isPPC64) { if (MustSaveLR) BuildMI(MBB, MBBI, dl, TII.get(PPC::MFLR8), PPC::X0); if (HasFP) BuildMI(MBB, MBBI, dl, TII.get(PPC::STD)) .addReg(PPC::X31) .addImm(FPOffset/4) .addReg(PPC::X1); if (MustSaveLR) BuildMI(MBB, MBBI, dl, TII.get(PPC::STD)) .addReg(PPC::X0) .addImm(LROffset / 4) .addReg(PPC::X1); } else { if (MustSaveLR) BuildMI(MBB, MBBI, dl, TII.get(PPC::MFLR), PPC::R0); if (HasFP) BuildMI(MBB, MBBI, dl, TII.get(PPC::STW)) .addReg(PPC::R31) .addImm(FPOffset) .addReg(PPC::R1); if (MustSaveLR) BuildMI(MBB, MBBI, dl, TII.get(PPC::STW)) .addReg(PPC::R0) .addImm(LROffset) .addReg(PPC::R1); } // Skip if a leaf routine. if (!FrameSize) return; // Get stack alignments. unsigned TargetAlign = getStackAlignment(); unsigned MaxAlign = MFI->getMaxAlignment(); // Adjust stack pointer: r1 += NegFrameSize. 
// If there is a preferred stack alignment, align R1 now if (!isPPC64) { // PPC32. if (ALIGN_STACK && MaxAlign > TargetAlign) { assert(isPowerOf2_32(MaxAlign) && isInt<16>(MaxAlign) && "Invalid alignment!"); assert(isInt<16>(NegFrameSize) && "Unhandled stack size and alignment!"); BuildMI(MBB, MBBI, dl, TII.get(PPC::RLWINM), PPC::R0) .addReg(PPC::R1) .addImm(0) .addImm(32 - Log2_32(MaxAlign)) .addImm(31); BuildMI(MBB, MBBI, dl, TII.get(PPC::SUBFIC) ,PPC::R0) .addReg(PPC::R0, RegState::Kill) .addImm(NegFrameSize); BuildMI(MBB, MBBI, dl, TII.get(PPC::STWUX)) .addReg(PPC::R1) .addReg(PPC::R1) .addReg(PPC::R0); } else if (isInt<16>(NegFrameSize)) { BuildMI(MBB, MBBI, dl, TII.get(PPC::STWU), PPC::R1) .addReg(PPC::R1) .addImm(NegFrameSize) .addReg(PPC::R1); } else { BuildMI(MBB, MBBI, dl, TII.get(PPC::LIS), PPC::R0) .addImm(NegFrameSize >> 16); BuildMI(MBB, MBBI, dl, TII.get(PPC::ORI), PPC::R0) .addReg(PPC::R0, RegState::Kill) .addImm(NegFrameSize & 0xFFFF); BuildMI(MBB, MBBI, dl, TII.get(PPC::STWUX)) .addReg(PPC::R1) .addReg(PPC::R1) .addReg(PPC::R0); } } else { // PPC64. if (ALIGN_STACK && MaxAlign > TargetAlign) { assert(isPowerOf2_32(MaxAlign) && isInt<16>(MaxAlign) && "Invalid alignment!"); assert(isInt<16>(NegFrameSize) && "Unhandled stack size and alignment!"); BuildMI(MBB, MBBI, dl, TII.get(PPC::RLDICL), PPC::X0) .addReg(PPC::X1) .addImm(0) .addImm(64 - Log2_32(MaxAlign)); BuildMI(MBB, MBBI, dl, TII.get(PPC::SUBFIC8), PPC::X0) .addReg(PPC::X0) .addImm(NegFrameSize); BuildMI(MBB, MBBI, dl, TII.get(PPC::STDUX)) .addReg(PPC::X1) .addReg(PPC::X1) .addReg(PPC::X0); } else if (isInt<16>(NegFrameSize)) { BuildMI(MBB, MBBI, dl, TII.get(PPC::STDU), PPC::X1) .addReg(PPC::X1) .addImm(NegFrameSize / 4) .addReg(PPC::X1); } else { BuildMI(MBB, MBBI, dl, TII.get(PPC::LIS8), PPC::X0) .addImm(NegFrameSize >> 16); BuildMI(MBB, MBBI, dl, TII.get(PPC::ORI8), PPC::X0) .addReg(PPC::X0, RegState::Kill) .addImm(NegFrameSize & 0xFFFF); BuildMI(MBB, MBBI, dl, TII.get(PPC::STDUX)) .addReg(PPC::X1) .addReg(PPC::X1) .addReg(PPC::X0); } } std::vector &Moves = MMI.getFrameMoves(); // Add the "machine moves" for the instructions we generated above, but in // reverse order. if (needsFrameMoves) { // Mark effective beginning of when frame pointer becomes valid. FrameLabel = MMI.getContext().CreateTempSymbol(); BuildMI(MBB, MBBI, dl, TII.get(PPC::PROLOG_LABEL)).addSym(FrameLabel); // Show update of SP. if (NegFrameSize) { MachineLocation SPDst(MachineLocation::VirtualFP); MachineLocation SPSrc(MachineLocation::VirtualFP, NegFrameSize); Moves.push_back(MachineMove(FrameLabel, SPDst, SPSrc)); } else { MachineLocation SP(isPPC64 ? PPC::X31 : PPC::R31); Moves.push_back(MachineMove(FrameLabel, SP, SP)); } if (HasFP) { MachineLocation FPDst(MachineLocation::VirtualFP, FPOffset); MachineLocation FPSrc(isPPC64 ? PPC::X31 : PPC::R31); Moves.push_back(MachineMove(FrameLabel, FPDst, FPSrc)); } if (MustSaveLR) { MachineLocation LRDst(MachineLocation::VirtualFP, LROffset); MachineLocation LRSrc(isPPC64 ? PPC::LR8 : PPC::LR); Moves.push_back(MachineMove(FrameLabel, LRDst, LRSrc)); } } MCSymbol *ReadyLabel = 0; // If there is a frame pointer, copy R1 into R31 if (HasFP) { if (!isPPC64) { BuildMI(MBB, MBBI, dl, TII.get(PPC::OR), PPC::R31) .addReg(PPC::R1) .addReg(PPC::R1); } else { BuildMI(MBB, MBBI, dl, TII.get(PPC::OR8), PPC::X31) .addReg(PPC::X1) .addReg(PPC::X1); } if (needsFrameMoves) { ReadyLabel = MMI.getContext().CreateTempSymbol(); // Mark effective beginning of when frame pointer is ready. 
BuildMI(MBB, MBBI, dl, TII.get(PPC::PROLOG_LABEL)).addSym(ReadyLabel); MachineLocation FPDst(HasFP ? (isPPC64 ? PPC::X31 : PPC::R31) : (isPPC64 ? PPC::X1 : PPC::R1)); MachineLocation FPSrc(MachineLocation::VirtualFP); Moves.push_back(MachineMove(ReadyLabel, FPDst, FPSrc)); } } if (needsFrameMoves) { MCSymbol *Label = HasFP ? ReadyLabel : FrameLabel; // Add callee saved registers to move list. const std::vector &CSI = MFI->getCalleeSavedInfo(); for (unsigned I = 0, E = CSI.size(); I != E; ++I) { int Offset = MFI->getObjectOffset(CSI[I].getFrameIdx()); unsigned Reg = CSI[I].getReg(); if (Reg == PPC::LR || Reg == PPC::LR8 || Reg == PPC::RM) continue; // This is a bit of a hack: CR2LT, CR2GT, CR2EQ and CR2UN are just // subregisters of CR2. We just need to emit a move of CR2. - if (Reg == PPC::CR2LT || Reg == PPC::CR2GT || Reg == PPC::CR2EQ) + if (PPC::CRBITRCRegisterClass->contains(Reg)) continue; - if (Reg == PPC::CR2UN) - Reg = PPC::CR2; MachineLocation CSDst(MachineLocation::VirtualFP, Offset); MachineLocation CSSrc(Reg); Moves.push_back(MachineMove(Label, CSDst, CSSrc)); } } } void PPCFrameLowering::emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const { MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr(); assert(MBBI != MBB.end() && "Returning block has no terminator"); const PPCInstrInfo &TII = *static_cast(MF.getTarget().getInstrInfo()); unsigned RetOpcode = MBBI->getOpcode(); DebugLoc dl; assert((RetOpcode == PPC::BLR || RetOpcode == PPC::TCRETURNri || RetOpcode == PPC::TCRETURNdi || RetOpcode == PPC::TCRETURNai || RetOpcode == PPC::TCRETURNri8 || RetOpcode == PPC::TCRETURNdi8 || RetOpcode == PPC::TCRETURNai8) && "Can only insert epilog into returning blocks"); // Get alignment info so we know how to restore r1 const MachineFrameInfo *MFI = MF.getFrameInfo(); unsigned TargetAlign = getStackAlignment(); unsigned MaxAlign = MFI->getMaxAlignment(); // Get the number of bytes allocated from the FrameInfo. int FrameSize = MFI->getStackSize(); // Get processor type. bool isPPC64 = Subtarget.isPPC64(); // Get operating system bool isDarwinABI = Subtarget.isDarwinABI(); // Check if the link register (LR) has been saved. PPCFunctionInfo *FI = MF.getInfo(); bool MustSaveLR = FI->mustSaveLR(); // Do we have a frame pointer for this function? bool HasFP = hasFP(MF); int LROffset = PPCFrameLowering::getReturnSaveOffset(isPPC64, isDarwinABI); int FPOffset = 0; if (HasFP) { if (Subtarget.isSVR4ABI()) { MachineFrameInfo *FFI = MF.getFrameInfo(); int FPIndex = FI->getFramePointerSaveIndex(); assert(FPIndex && "No Frame Pointer Save Slot!"); FPOffset = FFI->getObjectOffset(FPIndex); } else { FPOffset = PPCFrameLowering::getFramePointerSaveOffset(isPPC64, isDarwinABI); } } bool UsesTCRet = RetOpcode == PPC::TCRETURNri || RetOpcode == PPC::TCRETURNdi || RetOpcode == PPC::TCRETURNai || RetOpcode == PPC::TCRETURNri8 || RetOpcode == PPC::TCRETURNdi8 || RetOpcode == PPC::TCRETURNai8; if (UsesTCRet) { int MaxTCRetDelta = FI->getTailCallSPDelta(); MachineOperand &StackAdjust = MBBI->getOperand(1); assert(StackAdjust.isImm() && "Expecting immediate value."); // Adjust stack pointer. int StackAdj = StackAdjust.getImm(); int Delta = StackAdj - MaxTCRetDelta; assert((Delta >= 0) && "Delta must be positive"); if (MaxTCRetDelta>0) FrameSize += (StackAdj +Delta); else FrameSize += StackAdj; } if (FrameSize) { // The loaded (or persistent) stack pointer value is offset by the 'stwu' // on entry to the function. Add this offset back now. 
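// Roughly, the restore paths below are (PPC32 shown; PPC64 uses the 8-byte
// opcodes and ld instead of lwz):
//   addi r1, r31, FrameSize                  fastcc + 16-bit frame: SP from FP
//   lis/ori r0, FrameSize; add r1, r31, r0   fastcc + larger frame
//   addi r1, r1, FrameSize                   plain fixed-size small frame
//   lwz  r1, 0(r1)                           otherwise: reload the back chain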
if (!isPPC64) { // If this function contained a fastcc call and GuaranteedTailCallOpt is // enabled (=> hasFastCall()==true) the fastcc call might contain a tail // call which invalidates the stack pointer value in SP(0). So we use the // value of R31 in this case. if (FI->hasFastCall() && isInt<16>(FrameSize)) { assert(hasFP(MF) && "Expecting a valid the frame pointer."); BuildMI(MBB, MBBI, dl, TII.get(PPC::ADDI), PPC::R1) .addReg(PPC::R31).addImm(FrameSize); } else if(FI->hasFastCall()) { BuildMI(MBB, MBBI, dl, TII.get(PPC::LIS), PPC::R0) .addImm(FrameSize >> 16); BuildMI(MBB, MBBI, dl, TII.get(PPC::ORI), PPC::R0) .addReg(PPC::R0, RegState::Kill) .addImm(FrameSize & 0xFFFF); BuildMI(MBB, MBBI, dl, TII.get(PPC::ADD4)) .addReg(PPC::R1) .addReg(PPC::R31) .addReg(PPC::R0); } else if (isInt<16>(FrameSize) && (!ALIGN_STACK || TargetAlign >= MaxAlign) && !MFI->hasVarSizedObjects()) { BuildMI(MBB, MBBI, dl, TII.get(PPC::ADDI), PPC::R1) .addReg(PPC::R1).addImm(FrameSize); } else { BuildMI(MBB, MBBI, dl, TII.get(PPC::LWZ),PPC::R1) .addImm(0).addReg(PPC::R1); } } else { if (FI->hasFastCall() && isInt<16>(FrameSize)) { assert(hasFP(MF) && "Expecting a valid the frame pointer."); BuildMI(MBB, MBBI, dl, TII.get(PPC::ADDI8), PPC::X1) .addReg(PPC::X31).addImm(FrameSize); } else if(FI->hasFastCall()) { BuildMI(MBB, MBBI, dl, TII.get(PPC::LIS8), PPC::X0) .addImm(FrameSize >> 16); BuildMI(MBB, MBBI, dl, TII.get(PPC::ORI8), PPC::X0) .addReg(PPC::X0, RegState::Kill) .addImm(FrameSize & 0xFFFF); BuildMI(MBB, MBBI, dl, TII.get(PPC::ADD8)) .addReg(PPC::X1) .addReg(PPC::X31) .addReg(PPC::X0); } else if (isInt<16>(FrameSize) && TargetAlign >= MaxAlign && !MFI->hasVarSizedObjects()) { BuildMI(MBB, MBBI, dl, TII.get(PPC::ADDI8), PPC::X1) .addReg(PPC::X1).addImm(FrameSize); } else { BuildMI(MBB, MBBI, dl, TII.get(PPC::LD), PPC::X1) .addImm(0).addReg(PPC::X1); } } } if (isPPC64) { if (MustSaveLR) BuildMI(MBB, MBBI, dl, TII.get(PPC::LD), PPC::X0) .addImm(LROffset/4).addReg(PPC::X1); if (HasFP) BuildMI(MBB, MBBI, dl, TII.get(PPC::LD), PPC::X31) .addImm(FPOffset/4).addReg(PPC::X1); if (MustSaveLR) BuildMI(MBB, MBBI, dl, TII.get(PPC::MTLR8)).addReg(PPC::X0); } else { if (MustSaveLR) BuildMI(MBB, MBBI, dl, TII.get(PPC::LWZ), PPC::R0) .addImm(LROffset).addReg(PPC::R1); if (HasFP) BuildMI(MBB, MBBI, dl, TII.get(PPC::LWZ), PPC::R31) .addImm(FPOffset).addReg(PPC::R1); if (MustSaveLR) BuildMI(MBB, MBBI, dl, TII.get(PPC::MTLR)).addReg(PPC::R0); } // Callee pop calling convention. Pop parameter/linkage area. Used for tail // call optimization if (GuaranteedTailCallOpt && RetOpcode == PPC::BLR && MF.getFunction()->getCallingConv() == CallingConv::Fast) { PPCFunctionInfo *FI = MF.getInfo(); unsigned CallerAllocatedAmt = FI->getMinReservedArea(); unsigned StackReg = isPPC64 ? PPC::X1 : PPC::R1; unsigned FPReg = isPPC64 ? PPC::X31 : PPC::R31; unsigned TmpReg = isPPC64 ? PPC::X0 : PPC::R0; unsigned ADDIInstr = isPPC64 ? PPC::ADDI8 : PPC::ADDI; unsigned ADDInstr = isPPC64 ? PPC::ADD8 : PPC::ADD4; unsigned LISInstr = isPPC64 ? PPC::LIS8 : PPC::LIS; unsigned ORIInstr = isPPC64 ? 
PPC::ORI8 : PPC::ORI; if (CallerAllocatedAmt && isInt<16>(CallerAllocatedAmt)) { BuildMI(MBB, MBBI, dl, TII.get(ADDIInstr), StackReg) .addReg(StackReg).addImm(CallerAllocatedAmt); } else { BuildMI(MBB, MBBI, dl, TII.get(LISInstr), TmpReg) .addImm(CallerAllocatedAmt >> 16); BuildMI(MBB, MBBI, dl, TII.get(ORIInstr), TmpReg) .addReg(TmpReg, RegState::Kill) .addImm(CallerAllocatedAmt & 0xFFFF); BuildMI(MBB, MBBI, dl, TII.get(ADDInstr)) .addReg(StackReg) .addReg(FPReg) .addReg(TmpReg); } } else if (RetOpcode == PPC::TCRETURNdi) { MBBI = MBB.getLastNonDebugInstr(); MachineOperand &JumpTarget = MBBI->getOperand(0); BuildMI(MBB, MBBI, dl, TII.get(PPC::TAILB)). addGlobalAddress(JumpTarget.getGlobal(), JumpTarget.getOffset()); } else if (RetOpcode == PPC::TCRETURNri) { MBBI = MBB.getLastNonDebugInstr(); assert(MBBI->getOperand(0).isReg() && "Expecting register operand."); BuildMI(MBB, MBBI, dl, TII.get(PPC::TAILBCTR)); } else if (RetOpcode == PPC::TCRETURNai) { MBBI = MBB.getLastNonDebugInstr(); MachineOperand &JumpTarget = MBBI->getOperand(0); BuildMI(MBB, MBBI, dl, TII.get(PPC::TAILBA)).addImm(JumpTarget.getImm()); } else if (RetOpcode == PPC::TCRETURNdi8) { MBBI = MBB.getLastNonDebugInstr(); MachineOperand &JumpTarget = MBBI->getOperand(0); BuildMI(MBB, MBBI, dl, TII.get(PPC::TAILB8)). addGlobalAddress(JumpTarget.getGlobal(), JumpTarget.getOffset()); } else if (RetOpcode == PPC::TCRETURNri8) { MBBI = MBB.getLastNonDebugInstr(); assert(MBBI->getOperand(0).isReg() && "Expecting register operand."); BuildMI(MBB, MBBI, dl, TII.get(PPC::TAILBCTR8)); } else if (RetOpcode == PPC::TCRETURNai8) { MBBI = MBB.getLastNonDebugInstr(); MachineOperand &JumpTarget = MBBI->getOperand(0); BuildMI(MBB, MBBI, dl, TII.get(PPC::TAILBA8)).addImm(JumpTarget.getImm()); } } static bool spillsCR(const MachineFunction &MF) { const PPCFunctionInfo *FuncInfo = MF.getInfo(); return FuncInfo->isCRSpilled(); } /// MustSaveLR - Return true if this function requires that we save the LR /// register onto the stack in the prolog and restore it in the epilog of the /// function. static bool MustSaveLR(const MachineFunction &MF, unsigned LR) { const PPCFunctionInfo *MFI = MF.getInfo(); // We need a save/restore of LR if there is any def of LR (which is // defined by calls, including the PIC setup sequence), or if there is // some use of the LR stack slot (e.g. for builtin_return_address). // (LR comes in 32 and 64 bit versions.) MachineRegisterInfo::def_iterator RI = MF.getRegInfo().def_begin(LR); return RI !=MF.getRegInfo().def_end() || MFI->isLRStoreRequired(); } void PPCFrameLowering::processFunctionBeforeCalleeSavedScan(MachineFunction &MF, RegScavenger *RS) const { const TargetRegisterInfo *RegInfo = MF.getTarget().getRegisterInfo(); // Save and clear the LR state. PPCFunctionInfo *FI = MF.getInfo(); unsigned LR = RegInfo->getRARegister(); FI->setMustSaveLR(MustSaveLR(MF, LR)); MF.getRegInfo().setPhysRegUnused(LR); // Save R31 if necessary int FPSI = FI->getFramePointerSaveIndex(); bool isPPC64 = Subtarget.isPPC64(); bool isDarwinABI = Subtarget.isDarwinABI(); MachineFrameInfo *MFI = MF.getFrameInfo(); // If the frame pointer save index hasn't been defined yet. if (!FPSI && needsFP(MF)) { // Find out what the fix offset of the frame pointer save area. int FPOffset = getFramePointerSaveOffset(isPPC64, isDarwinABI); // Allocate the frame index for frame pointer save area. FPSI = MFI->CreateFixedObject(isPPC64? 8 : 4, FPOffset, true); // Save the result. 
FI->setFramePointerSaveIndex(FPSI); } // Reserve stack space to move the linkage area to in case of a tail call. int TCSPDelta = 0; if (GuaranteedTailCallOpt && (TCSPDelta = FI->getTailCallSPDelta()) < 0) { MFI->CreateFixedObject(-1 * TCSPDelta, TCSPDelta, true); } // Reserve a slot closest to SP or frame pointer if we have a dynalloc or // a large stack, which will require scavenging a register to materialize a // large offset. // FIXME: this doesn't actually check stack size, so is a bit pessimistic // FIXME: doesn't detect whether or not we need to spill vXX, which requires // r0 for now. if (RegInfo->requiresRegisterScavenging(MF)) // FIXME (64-bit): Enable. if (needsFP(MF) || spillsCR(MF)) { const TargetRegisterClass *GPRC = &PPC::GPRCRegClass; const TargetRegisterClass *G8RC = &PPC::G8RCRegClass; const TargetRegisterClass *RC = isPPC64 ? G8RC : GPRC; RS->setScavengingFrameIndex(MFI->CreateStackObject(RC->getSize(), RC->getAlignment(), false)); } } void PPCFrameLowering::processFunctionBeforeFrameFinalized(MachineFunction &MF) const { // Early exit if not using the SVR4 ABI. if (!Subtarget.isSVR4ABI()) return; // Get callee saved register information. MachineFrameInfo *FFI = MF.getFrameInfo(); const std::vector &CSI = FFI->getCalleeSavedInfo(); // Early exit if no callee saved registers are modified! if (CSI.empty() && !needsFP(MF)) { return; } unsigned MinGPR = PPC::R31; unsigned MinG8R = PPC::X31; unsigned MinFPR = PPC::F31; unsigned MinVR = PPC::V31; bool HasGPSaveArea = false; bool HasG8SaveArea = false; bool HasFPSaveArea = false; bool HasCRSaveArea = false; bool HasVRSAVESaveArea = false; bool HasVRSaveArea = false; SmallVector GPRegs; SmallVector G8Regs; SmallVector FPRegs; SmallVector VRegs; for (unsigned i = 0, e = CSI.size(); i != e; ++i) { unsigned Reg = CSI[i].getReg(); if (PPC::GPRCRegisterClass->contains(Reg)) { HasGPSaveArea = true; GPRegs.push_back(CSI[i]); if (Reg < MinGPR) { MinGPR = Reg; } } else if (PPC::G8RCRegisterClass->contains(Reg)) { HasG8SaveArea = true; G8Regs.push_back(CSI[i]); if (Reg < MinG8R) { MinG8R = Reg; } } else if (PPC::F8RCRegisterClass->contains(Reg)) { HasFPSaveArea = true; FPRegs.push_back(CSI[i]); if (Reg < MinFPR) { MinFPR = Reg; } // FIXME SVR4: Disable CR save area for now. } else if (PPC::CRBITRCRegisterClass->contains(Reg) || PPC::CRRCRegisterClass->contains(Reg)) { // HasCRSaveArea = true; } else if (PPC::VRSAVERCRegisterClass->contains(Reg)) { HasVRSAVESaveArea = true; } else if (PPC::VRRCRegisterClass->contains(Reg)) { HasVRSaveArea = true; VRegs.push_back(CSI[i]); if (Reg < MinVR) { MinVR = Reg; } } else { llvm_unreachable("Unknown RegisterClass!"); } } PPCFunctionInfo *PFI = MF.getInfo(); int64_t LowerBound = 0; // Take into account stack space reserved for tail calls. int TCSPDelta = 0; if (GuaranteedTailCallOpt && (TCSPDelta = PFI->getTailCallSPDelta()) < 0) { LowerBound = TCSPDelta; } // The Floating-point register save area is right below the back chain word // of the previous stack frame. if (HasFPSaveArea) { for (unsigned i = 0, e = FPRegs.size(); i != e; ++i) { int FI = FPRegs[i].getFrameIdx(); FFI->setObjectOffset(FI, LowerBound + FFI->getObjectOffset(FI)); } LowerBound -= (31 - getPPCRegisterNumbering(MinFPR) + 1) * 8; } // Check whether the frame pointer register is allocated. If so, make sure it // is spilled to the correct offset. 
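// (The dedicated R31 save slot created in processFunctionBeforeCalleeSavedScan
//  is re-based by the same LowerBound here and counted as part of the general
//  register save area.)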
if (needsFP(MF)) { HasGPSaveArea = true; int FI = PFI->getFramePointerSaveIndex(); assert(FI && "No Frame Pointer Save Slot!"); FFI->setObjectOffset(FI, LowerBound + FFI->getObjectOffset(FI)); } // General register save area starts right below the Floating-point // register save area. if (HasGPSaveArea || HasG8SaveArea) { // Move general register save area spill slots down, taking into account // the size of the Floating-point register save area. for (unsigned i = 0, e = GPRegs.size(); i != e; ++i) { int FI = GPRegs[i].getFrameIdx(); FFI->setObjectOffset(FI, LowerBound + FFI->getObjectOffset(FI)); } // Move general register save area spill slots down, taking into account // the size of the Floating-point register save area. for (unsigned i = 0, e = G8Regs.size(); i != e; ++i) { int FI = G8Regs[i].getFrameIdx(); FFI->setObjectOffset(FI, LowerBound + FFI->getObjectOffset(FI)); } unsigned MinReg = std::min(getPPCRegisterNumbering(MinGPR), getPPCRegisterNumbering(MinG8R)); if (Subtarget.isPPC64()) { LowerBound -= (31 - MinReg + 1) * 8; } else { LowerBound -= (31 - MinReg + 1) * 4; } } // The CR save area is below the general register save area. if (HasCRSaveArea) { // FIXME SVR4: Is it actually possible to have multiple elements in CSI // which have the CR/CRBIT register class? // Adjust the frame index of the CR spill slot. for (unsigned i = 0, e = CSI.size(); i != e; ++i) { unsigned Reg = CSI[i].getReg(); if (PPC::CRBITRCRegisterClass->contains(Reg) || PPC::CRRCRegisterClass->contains(Reg)) { int FI = CSI[i].getFrameIdx(); FFI->setObjectOffset(FI, LowerBound + FFI->getObjectOffset(FI)); } } LowerBound -= 4; // The CR save area is always 4 bytes long. } if (HasVRSAVESaveArea) { // FIXME SVR4: Is it actually possible to have multiple elements in CSI // which have the VRSAVE register class? // Adjust the frame index of the VRSAVE spill slot. for (unsigned i = 0, e = CSI.size(); i != e; ++i) { unsigned Reg = CSI[i].getReg(); if (PPC::VRSAVERCRegisterClass->contains(Reg)) { int FI = CSI[i].getFrameIdx(); FFI->setObjectOffset(FI, LowerBound + FFI->getObjectOffset(FI)); } } LowerBound -= 4; // The VRSAVE save area is always 4 bytes long. } if (HasVRSaveArea) { // Insert alignment padding, we need 16-byte alignment. LowerBound = (LowerBound - 15) & ~(15); for (unsigned i = 0, e = VRegs.size(); i != e; ++i) { int FI = VRegs[i].getFrameIdx(); FFI->setObjectOffset(FI, LowerBound + FFI->getObjectOffset(FI)); } } } Index: vendor/llvm/dist/lib/Target/X86/X86CodeEmitter.cpp =================================================================== --- vendor/llvm/dist/lib/Target/X86/X86CodeEmitter.cpp (revision 228363) +++ vendor/llvm/dist/lib/Target/X86/X86CodeEmitter.cpp (revision 228364) @@ -1,999 +1,1014 @@ //===-- X86/X86CodeEmitter.cpp - Convert X86 code to machine code ---------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // This file contains the pass that transforms the X86 machine instructions into // relocatable machine code. 
// //===----------------------------------------------------------------------===// #define DEBUG_TYPE "x86-emitter" #include "X86InstrInfo.h" #include "X86JITInfo.h" #include "X86Subtarget.h" #include "X86TargetMachine.h" #include "X86Relocations.h" #include "X86.h" #include "llvm/LLVMContext.h" #include "llvm/PassManager.h" #include "llvm/CodeGen/JITCodeEmitter.h" #include "llvm/CodeGen/MachineFunctionPass.h" #include "llvm/CodeGen/MachineInstr.h" #include "llvm/CodeGen/MachineModuleInfo.h" #include "llvm/CodeGen/Passes.h" #include "llvm/Function.h" #include "llvm/ADT/Statistic.h" #include "llvm/MC/MCCodeEmitter.h" #include "llvm/MC/MCExpr.h" #include "llvm/MC/MCInst.h" #include "llvm/Support/Debug.h" #include "llvm/Support/ErrorHandling.h" #include "llvm/Support/raw_ostream.h" #include "llvm/Target/TargetOptions.h" using namespace llvm; STATISTIC(NumEmitted, "Number of machine instructions emitted"); namespace { template class Emitter : public MachineFunctionPass { const X86InstrInfo *II; const TargetData *TD; X86TargetMachine &TM; CodeEmitter &MCE; MachineModuleInfo *MMI; intptr_t PICBaseOffset; bool Is64BitMode; bool IsPIC; public: static char ID; explicit Emitter(X86TargetMachine &tm, CodeEmitter &mce) : MachineFunctionPass(ID), II(0), TD(0), TM(tm), MCE(mce), PICBaseOffset(0), Is64BitMode(false), IsPIC(TM.getRelocationModel() == Reloc::PIC_) {} Emitter(X86TargetMachine &tm, CodeEmitter &mce, const X86InstrInfo &ii, const TargetData &td, bool is64) : MachineFunctionPass(ID), II(&ii), TD(&td), TM(tm), MCE(mce), PICBaseOffset(0), Is64BitMode(is64), IsPIC(TM.getRelocationModel() == Reloc::PIC_) {} bool runOnMachineFunction(MachineFunction &MF); virtual const char *getPassName() const { return "X86 Machine Code Emitter"; } void emitInstruction(MachineInstr &MI, const MCInstrDesc *Desc); void getAnalysisUsage(AnalysisUsage &AU) const { AU.setPreservesAll(); AU.addRequired(); MachineFunctionPass::getAnalysisUsage(AU); } private: void emitPCRelativeBlockAddress(MachineBasicBlock *MBB); void emitGlobalAddress(const GlobalValue *GV, unsigned Reloc, intptr_t Disp = 0, intptr_t PCAdj = 0, bool Indirect = false); void emitExternalSymbolAddress(const char *ES, unsigned Reloc); void emitConstPoolAddress(unsigned CPI, unsigned Reloc, intptr_t Disp = 0, intptr_t PCAdj = 0); void emitJumpTableAddress(unsigned JTI, unsigned Reloc, intptr_t PCAdj = 0); void emitDisplacementField(const MachineOperand *RelocOp, int DispVal, intptr_t Adj = 0, bool IsPCRel = true); void emitRegModRMByte(unsigned ModRMReg, unsigned RegOpcodeField); void emitRegModRMByte(unsigned RegOpcodeField); void emitSIBByte(unsigned SS, unsigned Index, unsigned Base); void emitConstant(uint64_t Val, unsigned Size); void emitMemModRMByte(const MachineInstr &MI, unsigned Op, unsigned RegOpcodeField, intptr_t PCAdj = 0); }; template char Emitter::ID = 0; } // end anonymous namespace. /// createX86CodeEmitterPass - Return a pass that emits the collected X86 code /// to the specified templated MachineCodeEmitter object. 
FunctionPass *llvm::createX86JITCodeEmitterPass(X86TargetMachine &TM, JITCodeEmitter &JCE) { return new Emitter(TM, JCE); } template bool Emitter::runOnMachineFunction(MachineFunction &MF) { MMI = &getAnalysis(); MCE.setModuleInfo(MMI); II = TM.getInstrInfo(); TD = TM.getTargetData(); Is64BitMode = TM.getSubtarget().is64Bit(); IsPIC = TM.getRelocationModel() == Reloc::PIC_; do { DEBUG(dbgs() << "JITTing function '" << MF.getFunction()->getName() << "'\n"); MCE.startFunction(MF); for (MachineFunction::iterator MBB = MF.begin(), E = MF.end(); MBB != E; ++MBB) { MCE.StartMachineBasicBlock(MBB); for (MachineBasicBlock::iterator I = MBB->begin(), E = MBB->end(); I != E; ++I) { const MCInstrDesc &Desc = I->getDesc(); emitInstruction(*I, &Desc); // MOVPC32r is basically a call plus a pop instruction. if (Desc.getOpcode() == X86::MOVPC32r) emitInstruction(*I, &II->get(X86::POP32r)); ++NumEmitted; // Keep track of the # of mi's emitted } } } while (MCE.finishFunction(MF)); return false; } /// determineREX - Determine if the MachineInstr has to be encoded with a X86-64 /// REX prefix which specifies 1) 64-bit instructions, 2) non-default operand /// size, and 3) use of X86-64 extended registers. static unsigned determineREX(const MachineInstr &MI) { unsigned REX = 0; const MCInstrDesc &Desc = MI.getDesc(); // Pseudo instructions do not need REX prefix byte. if ((Desc.TSFlags & X86II::FormMask) == X86II::Pseudo) return 0; if (Desc.TSFlags & X86II::REX_W) REX |= 1 << 3; unsigned NumOps = Desc.getNumOperands(); if (NumOps) { bool isTwoAddr = NumOps > 1 && Desc.getOperandConstraint(1, MCOI::TIED_TO) != -1; // If it accesses SPL, BPL, SIL, or DIL, then it requires a 0x40 REX prefix. unsigned i = isTwoAddr ? 1 : 0; for (unsigned e = NumOps; i != e; ++i) { const MachineOperand& MO = MI.getOperand(i); if (MO.isReg()) { unsigned Reg = MO.getReg(); if (X86II::isX86_64NonExtLowByteReg(Reg)) REX |= 0x40; } } switch (Desc.TSFlags & X86II::FormMask) { case X86II::MRMInitReg: if (X86InstrInfo::isX86_64ExtendedReg(MI.getOperand(0))) REX |= (1 << 0) | (1 << 2); break; case X86II::MRMSrcReg: { if (X86InstrInfo::isX86_64ExtendedReg(MI.getOperand(0))) REX |= 1 << 2; i = isTwoAddr ? 2 : 1; for (unsigned e = NumOps; i != e; ++i) { const MachineOperand& MO = MI.getOperand(i); if (X86InstrInfo::isX86_64ExtendedReg(MO)) REX |= 1 << 0; } break; } case X86II::MRMSrcMem: { if (X86InstrInfo::isX86_64ExtendedReg(MI.getOperand(0))) REX |= 1 << 2; unsigned Bit = 0; i = isTwoAddr ? 2 : 1; for (; i != NumOps; ++i) { const MachineOperand& MO = MI.getOperand(i); if (MO.isReg()) { if (X86InstrInfo::isX86_64ExtendedReg(MO)) REX |= 1 << Bit; Bit++; } } break; } case X86II::MRM0m: case X86II::MRM1m: case X86II::MRM2m: case X86II::MRM3m: case X86II::MRM4m: case X86II::MRM5m: case X86II::MRM6m: case X86II::MRM7m: case X86II::MRMDestMem: { unsigned e = (isTwoAddr ? X86::AddrNumOperands+1 : X86::AddrNumOperands); i = isTwoAddr ? 1 : 0; if (NumOps > e && X86InstrInfo::isX86_64ExtendedReg(MI.getOperand(e))) REX |= 1 << 2; unsigned Bit = 0; for (; i != e; ++i) { const MachineOperand& MO = MI.getOperand(i); if (MO.isReg()) { if (X86InstrInfo::isX86_64ExtendedReg(MO)) REX |= 1 << Bit; Bit++; } } break; } default: { if (X86InstrInfo::isX86_64ExtendedReg(MI.getOperand(0))) REX |= 1 << 0; i = isTwoAddr ? 
2 : 1; for (unsigned e = NumOps; i != e; ++i) { const MachineOperand& MO = MI.getOperand(i); if (X86InstrInfo::isX86_64ExtendedReg(MO)) REX |= 1 << 2; } break; } } } return REX; } /// emitPCRelativeBlockAddress - This method keeps track of the information /// necessary to resolve the address of this block later and emits a dummy /// value. /// template void Emitter::emitPCRelativeBlockAddress(MachineBasicBlock *MBB) { // Remember where this reference was and where it is to so we can // deal with it later. MCE.addRelocation(MachineRelocation::getBB(MCE.getCurrentPCOffset(), X86::reloc_pcrel_word, MBB)); MCE.emitWordLE(0); } /// emitGlobalAddress - Emit the specified address to the code stream assuming /// this is part of a "take the address of a global" instruction. /// template void Emitter::emitGlobalAddress(const GlobalValue *GV, unsigned Reloc, intptr_t Disp /* = 0 */, intptr_t PCAdj /* = 0 */, bool Indirect /* = false */) { intptr_t RelocCST = Disp; if (Reloc == X86::reloc_picrel_word) RelocCST = PICBaseOffset; else if (Reloc == X86::reloc_pcrel_word) RelocCST = PCAdj; MachineRelocation MR = Indirect ? MachineRelocation::getIndirectSymbol(MCE.getCurrentPCOffset(), Reloc, const_cast(GV), RelocCST, false) : MachineRelocation::getGV(MCE.getCurrentPCOffset(), Reloc, const_cast(GV), RelocCST, false); MCE.addRelocation(MR); // The relocated value will be added to the displacement if (Reloc == X86::reloc_absolute_dword) MCE.emitDWordLE(Disp); else MCE.emitWordLE((int32_t)Disp); } /// emitExternalSymbolAddress - Arrange for the address of an external symbol to /// be emitted to the current location in the function, and allow it to be PC /// relative. template void Emitter::emitExternalSymbolAddress(const char *ES, unsigned Reloc) { intptr_t RelocCST = (Reloc == X86::reloc_picrel_word) ? PICBaseOffset : 0; // X86 never needs stubs because instruction selection will always pick // an instruction sequence that is large enough to hold any address // to a symbol. // (see X86ISelLowering.cpp, near 2039: X86TargetLowering::LowerCall) bool NeedStub = false; MCE.addRelocation(MachineRelocation::getExtSym(MCE.getCurrentPCOffset(), Reloc, ES, RelocCST, 0, NeedStub)); if (Reloc == X86::reloc_absolute_dword) MCE.emitDWordLE(0); else MCE.emitWordLE(0); } /// emitConstPoolAddress - Arrange for the address of an constant pool /// to be emitted to the current location in the function, and allow it to be PC /// relative. template void Emitter::emitConstPoolAddress(unsigned CPI, unsigned Reloc, intptr_t Disp /* = 0 */, intptr_t PCAdj /* = 0 */) { intptr_t RelocCST = 0; if (Reloc == X86::reloc_picrel_word) RelocCST = PICBaseOffset; else if (Reloc == X86::reloc_pcrel_word) RelocCST = PCAdj; MCE.addRelocation(MachineRelocation::getConstPool(MCE.getCurrentPCOffset(), Reloc, CPI, RelocCST)); // The relocated value will be added to the displacement if (Reloc == X86::reloc_absolute_dword) MCE.emitDWordLE(Disp); else MCE.emitWordLE((int32_t)Disp); } /// emitJumpTableAddress - Arrange for the address of a jump table to /// be emitted to the current location in the function, and allow it to be PC /// relative. 
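/// As with emitConstPoolAddress, the relocation addend is the PIC base offset
/// for picrel_word relocations and the PCAdj value for pcrel_word ones.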
template void Emitter::emitJumpTableAddress(unsigned JTI, unsigned Reloc, intptr_t PCAdj /* = 0 */) { intptr_t RelocCST = 0; if (Reloc == X86::reloc_picrel_word) RelocCST = PICBaseOffset; else if (Reloc == X86::reloc_pcrel_word) RelocCST = PCAdj; MCE.addRelocation(MachineRelocation::getJumpTable(MCE.getCurrentPCOffset(), Reloc, JTI, RelocCST)); // The relocated value will be added to the displacement if (Reloc == X86::reloc_absolute_dword) MCE.emitDWordLE(0); else MCE.emitWordLE(0); } inline static unsigned char ModRMByte(unsigned Mod, unsigned RegOpcode, unsigned RM) { assert(Mod < 4 && RegOpcode < 8 && RM < 8 && "ModRM Fields out of range!"); return RM | (RegOpcode << 3) | (Mod << 6); } template void Emitter::emitRegModRMByte(unsigned ModRMReg, unsigned RegOpcodeFld){ MCE.emitByte(ModRMByte(3, RegOpcodeFld, X86_MC::getX86RegNum(ModRMReg))); } template void Emitter::emitRegModRMByte(unsigned RegOpcodeFld) { MCE.emitByte(ModRMByte(3, RegOpcodeFld, 0)); } template void Emitter::emitSIBByte(unsigned SS, unsigned Index, unsigned Base) { // SIB byte is in the same format as the ModRMByte... MCE.emitByte(ModRMByte(SS, Index, Base)); } template void Emitter::emitConstant(uint64_t Val, unsigned Size) { // Output the constant in little endian byte order... for (unsigned i = 0; i != Size; ++i) { MCE.emitByte(Val & 255); Val >>= 8; } } /// isDisp8 - Return true if this signed displacement fits in a 8-bit /// sign-extended field. static bool isDisp8(int Value) { return Value == (signed char)Value; } static bool gvNeedsNonLazyPtr(const MachineOperand &GVOp, const TargetMachine &TM) { // For Darwin-64, simulate the linktime GOT by using the same non-lazy-pointer // mechanism as 32-bit mode. if (TM.getSubtarget().is64Bit() && !TM.getSubtarget().isTargetDarwin()) return false; // Return true if this is a reference to a stub containing the address of the // global, not the global itself. return isGlobalStubReference(GVOp.getTargetFlags()); } template void Emitter::emitDisplacementField(const MachineOperand *RelocOp, int DispVal, intptr_t Adj /* = 0 */, bool IsPCRel /* = true */) { // If this is a simple integer displacement that doesn't require a relocation, // emit it now. if (!RelocOp) { emitConstant(DispVal, 4); return; } // Otherwise, this is something that requires a relocation. Emit it as such // now. unsigned RelocType = Is64BitMode ? (IsPCRel ? X86::reloc_pcrel_word : X86::reloc_absolute_word_sext) : (IsPIC ? X86::reloc_picrel_word : X86::reloc_absolute_word); if (RelocOp->isGlobal()) { // In 64-bit static small code model, we could potentially emit absolute. // But it's probably not beneficial. If the MCE supports using RIP directly // do it, otherwise fallback to absolute (this is determined by IsPCRel). 
// 89 05 00 00 00 00 mov %eax,0(%rip) # PC-relative // 89 04 25 00 00 00 00 mov %eax,0x0 # Absolute bool Indirect = gvNeedsNonLazyPtr(*RelocOp, TM); emitGlobalAddress(RelocOp->getGlobal(), RelocType, RelocOp->getOffset(), Adj, Indirect); } else if (RelocOp->isSymbol()) { emitExternalSymbolAddress(RelocOp->getSymbolName(), RelocType); } else if (RelocOp->isCPI()) { emitConstPoolAddress(RelocOp->getIndex(), RelocType, RelocOp->getOffset(), Adj); } else { assert(RelocOp->isJTI() && "Unexpected machine operand!"); emitJumpTableAddress(RelocOp->getIndex(), RelocType, Adj); } } template void Emitter::emitMemModRMByte(const MachineInstr &MI, unsigned Op,unsigned RegOpcodeField, intptr_t PCAdj) { const MachineOperand &Op3 = MI.getOperand(Op+3); int DispVal = 0; const MachineOperand *DispForReloc = 0; // Figure out what sort of displacement we have to handle here. if (Op3.isGlobal()) { DispForReloc = &Op3; } else if (Op3.isSymbol()) { DispForReloc = &Op3; } else if (Op3.isCPI()) { if (!MCE.earlyResolveAddresses() || Is64BitMode || IsPIC) { DispForReloc = &Op3; } else { DispVal += MCE.getConstantPoolEntryAddress(Op3.getIndex()); DispVal += Op3.getOffset(); } } else if (Op3.isJTI()) { if (!MCE.earlyResolveAddresses() || Is64BitMode || IsPIC) { DispForReloc = &Op3; } else { DispVal += MCE.getJumpTableEntryAddress(Op3.getIndex()); } } else { DispVal = Op3.getImm(); } const MachineOperand &Base = MI.getOperand(Op); const MachineOperand &Scale = MI.getOperand(Op+1); const MachineOperand &IndexReg = MI.getOperand(Op+2); unsigned BaseReg = Base.getReg(); // Handle %rip relative addressing. if (BaseReg == X86::RIP || (Is64BitMode && DispForReloc)) { // [disp32+RIP] in X86-64 mode assert(IndexReg.getReg() == 0 && Is64BitMode && "Invalid rip-relative address"); MCE.emitByte(ModRMByte(0, RegOpcodeField, 5)); emitDisplacementField(DispForReloc, DispVal, PCAdj, true); return; } // Indicate that the displacement will use an pcrel or absolute reference // by default. MCEs able to resolve addresses on-the-fly use pcrel by default // while others, unless explicit asked to use RIP, use absolute references. bool IsPCRel = MCE.earlyResolveAddresses() ? true : false; // Is a SIB byte needed? // If no BaseReg, issue a RIP relative instruction only if the MCE can // resolve addresses on-the-fly, otherwise use SIB (Intel Manual 2A, table // 2-7) and absolute references. unsigned BaseRegNo = -1U; if (BaseReg != 0 && BaseReg != X86::RIP) BaseRegNo = X86_MC::getX86RegNum(BaseReg); if (// The SIB byte must be used if there is an index register. IndexReg.getReg() == 0 && // The SIB byte must be used if the base is ESP/RSP/R12, all of which // encode to an R/M value of 4, which indicates that a SIB byte is // present. BaseRegNo != N86::ESP && // If there is no base register and we're in 64-bit mode, we need a SIB // byte to emit an addr that is just 'disp32' (the non-RIP relative form). (!Is64BitMode || BaseReg != 0)) { if (BaseReg == 0 || // [disp32] in X86-32 mode BaseReg == X86::RIP) { // [disp32+RIP] in X86-64 mode MCE.emitByte(ModRMByte(0, RegOpcodeField, 5)); emitDisplacementField(DispForReloc, DispVal, PCAdj, true); return; } // If the base is not EBP/ESP and there is no displacement, use simple // indirect register encoding, this handles addresses like [EAX]. The // encoding for [EBP] with no displacement means [disp32] so we handle it // by emitting a displacement of 0 below. 
if (!DispForReloc && DispVal == 0 && BaseRegNo != N86::EBP) { MCE.emitByte(ModRMByte(0, RegOpcodeField, BaseRegNo)); return; } // Otherwise, if the displacement fits in a byte, encode as [REG+disp8]. if (!DispForReloc && isDisp8(DispVal)) { MCE.emitByte(ModRMByte(1, RegOpcodeField, BaseRegNo)); emitConstant(DispVal, 1); return; } // Otherwise, emit the most general non-SIB encoding: [REG+disp32] MCE.emitByte(ModRMByte(2, RegOpcodeField, BaseRegNo)); emitDisplacementField(DispForReloc, DispVal, PCAdj, IsPCRel); return; } // Otherwise we need a SIB byte, so start by outputting the ModR/M byte first. assert(IndexReg.getReg() != X86::ESP && IndexReg.getReg() != X86::RSP && "Cannot use ESP as index reg!"); bool ForceDisp32 = false; bool ForceDisp8 = false; if (BaseReg == 0) { // If there is no base register, we emit the special case SIB byte with // MOD=0, BASE=4, to JUST get the index, scale, and displacement. MCE.emitByte(ModRMByte(0, RegOpcodeField, 4)); ForceDisp32 = true; } else if (DispForReloc) { // Emit the normal disp32 encoding. MCE.emitByte(ModRMByte(2, RegOpcodeField, 4)); ForceDisp32 = true; } else if (DispVal == 0 && BaseRegNo != N86::EBP) { // Emit no displacement ModR/M byte MCE.emitByte(ModRMByte(0, RegOpcodeField, 4)); } else if (isDisp8(DispVal)) { // Emit the disp8 encoding... MCE.emitByte(ModRMByte(1, RegOpcodeField, 4)); ForceDisp8 = true; // Make sure to force 8 bit disp if Base=EBP } else { // Emit the normal disp32 encoding... MCE.emitByte(ModRMByte(2, RegOpcodeField, 4)); } // Calculate what the SS field value should be... static const unsigned SSTable[] = { ~0U, 0, 1, ~0U, 2, ~0U, ~0U, ~0U, 3 }; unsigned SS = SSTable[Scale.getImm()]; if (BaseReg == 0) { // Handle the SIB byte for the case where there is no base, see Intel // Manual 2A, table 2-7. The displacement has already been output. unsigned IndexRegNo; if (IndexReg.getReg()) IndexRegNo = X86_MC::getX86RegNum(IndexReg.getReg()); else // Examples: [ESP+1*+4] or [scaled idx]+disp32 (MOD=0,BASE=5) IndexRegNo = 4; emitSIBByte(SS, IndexRegNo, 5); } else { unsigned BaseRegNo = X86_MC::getX86RegNum(BaseReg); unsigned IndexRegNo; if (IndexReg.getReg()) IndexRegNo = X86_MC::getX86RegNum(IndexReg.getReg()); else IndexRegNo = 4; // For example [ESP+1*+4] emitSIBByte(SS, IndexRegNo, BaseRegNo); } // Do we need to output a displacement? if (ForceDisp8) { emitConstant(DispVal, 1); } else if (DispVal != 0 || ForceDisp32) { emitDisplacementField(DispForReloc, DispVal, PCAdj, IsPCRel); } } +static const MCInstrDesc *UpdateOp(MachineInstr &MI, const X86InstrInfo *II, + unsigned Opcode) { + const MCInstrDesc *Desc = &II->get(Opcode); + MI.setDesc(*Desc); + return Desc; +} + template void Emitter::emitInstruction(MachineInstr &MI, const MCInstrDesc *Desc) { DEBUG(dbgs() << MI); // If this is a pseudo instruction, lower it. 
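// (The ADD*_DB "disjoint bits" pseudos can be encoded as the corresponding OR,
// since their operands are known to share no set bits, and the ACQUIRE_/
// RELEASE_ mov pseudos become plain MOVs; on x86 an ordinary load or store
// already provides the required acquire/release ordering.  UpdateOp just swaps
// in the new MCInstrDesc and returns it.)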
switch (Desc->getOpcode()) { - case X86::ADD16rr_DB: Desc = &II->get(X86::OR16rr); MI.setDesc(*Desc);break; - case X86::ADD32rr_DB: Desc = &II->get(X86::OR32rr); MI.setDesc(*Desc);break; - case X86::ADD64rr_DB: Desc = &II->get(X86::OR64rr); MI.setDesc(*Desc);break; - case X86::ADD16ri_DB: Desc = &II->get(X86::OR16ri); MI.setDesc(*Desc);break; - case X86::ADD32ri_DB: Desc = &II->get(X86::OR32ri); MI.setDesc(*Desc);break; - case X86::ADD64ri32_DB:Desc = &II->get(X86::OR64ri32);MI.setDesc(*Desc);break; - case X86::ADD16ri8_DB: Desc = &II->get(X86::OR16ri8);MI.setDesc(*Desc);break; - case X86::ADD32ri8_DB: Desc = &II->get(X86::OR32ri8);MI.setDesc(*Desc);break; - case X86::ADD64ri8_DB: Desc = &II->get(X86::OR64ri8);MI.setDesc(*Desc);break; + case X86::ADD16rr_DB: Desc = UpdateOp(MI, II, X86::OR16rr); break; + case X86::ADD32rr_DB: Desc = UpdateOp(MI, II, X86::OR32rr); break; + case X86::ADD64rr_DB: Desc = UpdateOp(MI, II, X86::OR64rr); break; + case X86::ADD16ri_DB: Desc = UpdateOp(MI, II, X86::OR16ri); break; + case X86::ADD32ri_DB: Desc = UpdateOp(MI, II, X86::OR32ri); break; + case X86::ADD64ri32_DB: Desc = UpdateOp(MI, II, X86::OR64ri32); break; + case X86::ADD16ri8_DB: Desc = UpdateOp(MI, II, X86::OR16ri8); break; + case X86::ADD32ri8_DB: Desc = UpdateOp(MI, II, X86::OR32ri8); break; + case X86::ADD64ri8_DB: Desc = UpdateOp(MI, II, X86::OR64ri8); break; + case X86::ACQUIRE_MOV8rm: Desc = UpdateOp(MI, II, X86::MOV8rm); break; + case X86::ACQUIRE_MOV16rm: Desc = UpdateOp(MI, II, X86::MOV16rm); break; + case X86::ACQUIRE_MOV32rm: Desc = UpdateOp(MI, II, X86::MOV32rm); break; + case X86::ACQUIRE_MOV64rm: Desc = UpdateOp(MI, II, X86::MOV64rm); break; + case X86::RELEASE_MOV8mr: Desc = UpdateOp(MI, II, X86::MOV8mr); break; + case X86::RELEASE_MOV16mr: Desc = UpdateOp(MI, II, X86::MOV16mr); break; + case X86::RELEASE_MOV32mr: Desc = UpdateOp(MI, II, X86::MOV32mr); break; + case X86::RELEASE_MOV64mr: Desc = UpdateOp(MI, II, X86::MOV64mr); break; } MCE.processDebugLoc(MI.getDebugLoc(), true); unsigned Opcode = Desc->Opcode; // Emit the lock opcode prefix as needed. if (Desc->TSFlags & X86II::LOCK) MCE.emitByte(0xF0); // Emit segment override opcode prefix as needed. switch (Desc->TSFlags & X86II::SegOvrMask) { case X86II::FS: MCE.emitByte(0x64); break; case X86II::GS: MCE.emitByte(0x65); break; default: llvm_unreachable("Invalid segment!"); case 0: break; // No segment override! } // Emit the repeat opcode prefix as needed. if ((Desc->TSFlags & X86II::Op0Mask) == X86II::REP) MCE.emitByte(0xF3); // Emit the operand size opcode prefix as needed. if (Desc->TSFlags & X86II::OpSize) MCE.emitByte(0x66); // Emit the address size opcode prefix as needed. if (Desc->TSFlags & X86II::AdSize) MCE.emitByte(0x67); bool Need0FPrefix = false; switch (Desc->TSFlags & X86II::Op0Mask) { case X86II::TB: // Two-byte opcode prefix case X86II::T8: // 0F 38 case X86II::TA: // 0F 3A case X86II::A6: // 0F A6 case X86II::A7: // 0F A7 Need0FPrefix = true; break; case X86II::TF: // F2 0F 38 MCE.emitByte(0xF2); Need0FPrefix = true; break; case X86II::REP: break; // already handled. 
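// The remaining Op0 forms emit their mandatory prefix byte here (F3 for XS,
// F2 for XD, D8..DF for the x87 escapes) and, where applicable, request the
// 0x0F escape byte that is emitted just before the opcode further below.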
case X86II::XS: // F3 0F MCE.emitByte(0xF3); Need0FPrefix = true; break; case X86II::XD: // F2 0F MCE.emitByte(0xF2); Need0FPrefix = true; break; case X86II::D8: case X86II::D9: case X86II::DA: case X86II::DB: case X86II::DC: case X86II::DD: case X86II::DE: case X86II::DF: MCE.emitByte(0xD8+ (((Desc->TSFlags & X86II::Op0Mask)-X86II::D8) >> X86II::Op0Shift)); break; // Two-byte opcode prefix default: llvm_unreachable("Invalid prefix!"); case 0: break; // No prefix! } // Handle REX prefix. if (Is64BitMode) { if (unsigned REX = determineREX(MI)) MCE.emitByte(0x40 | REX); } // 0x0F escape code must be emitted just before the opcode. if (Need0FPrefix) MCE.emitByte(0x0F); switch (Desc->TSFlags & X86II::Op0Mask) { case X86II::TF: // F2 0F 38 case X86II::T8: // 0F 38 MCE.emitByte(0x38); break; case X86II::TA: // 0F 3A MCE.emitByte(0x3A); break; case X86II::A6: // 0F A6 MCE.emitByte(0xA6); break; case X86II::A7: // 0F A7 MCE.emitByte(0xA7); break; } // If this is a two-address instruction, skip one of the register operands. unsigned NumOps = Desc->getNumOperands(); unsigned CurOp = 0; if (NumOps > 1 && Desc->getOperandConstraint(1, MCOI::TIED_TO) != -1) ++CurOp; else if (NumOps > 2 && Desc->getOperandConstraint(NumOps-1,MCOI::TIED_TO)== 0) // Skip the last source operand that is tied_to the dest reg. e.g. LXADD32 --NumOps; unsigned char BaseOpcode = X86II::getBaseOpcodeFor(Desc->TSFlags); switch (Desc->TSFlags & X86II::FormMask) { default: llvm_unreachable("Unknown FormMask value in X86 MachineCodeEmitter!"); case X86II::Pseudo: // Remember the current PC offset, this is the PIC relocation // base address. switch (Opcode) { default: llvm_unreachable("pseudo instructions should be removed before code" " emission"); break; // Do nothing for Int_MemBarrier - it's just a comment. Add a debug // to make it slightly easier to see. case X86::Int_MemBarrier: DEBUG(dbgs() << "#MEMBARRIER\n"); break; case TargetOpcode::INLINEASM: // We allow inline assembler nodes with empty bodies - they can // implicitly define registers, which is ok for JIT. if (MI.getOperand(0).getSymbolName()[0]) report_fatal_error("JIT does not support inline asm!"); break; case TargetOpcode::PROLOG_LABEL: case TargetOpcode::GC_LABEL: case TargetOpcode::EH_LABEL: MCE.emitLabel(MI.getOperand(0).getMCSymbol()); break; case TargetOpcode::IMPLICIT_DEF: case TargetOpcode::KILL: break; case X86::MOVPC32r: { // This emits the "call" portion of this pseudo instruction. MCE.emitByte(BaseOpcode); emitConstant(0, X86II::getSizeOfImm(Desc->TSFlags)); // Remember PIC base. PICBaseOffset = (intptr_t) MCE.getCurrentPCOffset(); X86JITInfo *JTI = TM.getJITInfo(); JTI->setPICBase(MCE.getCurrentPCValue()); break; } } CurOp = NumOps; break; case X86II::RawFrm: { MCE.emitByte(BaseOpcode); if (CurOp == NumOps) break; const MachineOperand &MO = MI.getOperand(CurOp++); DEBUG(dbgs() << "RawFrm CurOp " << CurOp << "\n"); DEBUG(dbgs() << "isMBB " << MO.isMBB() << "\n"); DEBUG(dbgs() << "isGlobal " << MO.isGlobal() << "\n"); DEBUG(dbgs() << "isSymbol " << MO.isSymbol() << "\n"); DEBUG(dbgs() << "isImm " << MO.isImm() << "\n"); if (MO.isMBB()) { emitPCRelativeBlockAddress(MO.getMBB()); break; } if (MO.isGlobal()) { emitGlobalAddress(MO.getGlobal(), X86::reloc_pcrel_word, MO.getOffset(), 0); break; } if (MO.isSymbol()) { emitExternalSymbolAddress(MO.getSymbolName(), X86::reloc_pcrel_word); break; } // FIXME: Only used by hackish MCCodeEmitter, remove when dead. 
if (MO.isJTI()) { emitJumpTableAddress(MO.getIndex(), X86::reloc_pcrel_word); break; } assert(MO.isImm() && "Unknown RawFrm operand!"); if (Opcode == X86::CALLpcrel32 || Opcode == X86::CALL64pcrel32 || Opcode == X86::WINCALL64pcrel32) { // Fix up immediate operand for pc relative calls. intptr_t Imm = (intptr_t)MO.getImm(); Imm = Imm - MCE.getCurrentPCValue() - 4; emitConstant(Imm, X86II::getSizeOfImm(Desc->TSFlags)); } else emitConstant(MO.getImm(), X86II::getSizeOfImm(Desc->TSFlags)); break; } case X86II::AddRegFrm: { MCE.emitByte(BaseOpcode + X86_MC::getX86RegNum(MI.getOperand(CurOp++).getReg())); if (CurOp == NumOps) break; const MachineOperand &MO1 = MI.getOperand(CurOp++); unsigned Size = X86II::getSizeOfImm(Desc->TSFlags); if (MO1.isImm()) { emitConstant(MO1.getImm(), Size); break; } unsigned rt = Is64BitMode ? X86::reloc_pcrel_word : (IsPIC ? X86::reloc_picrel_word : X86::reloc_absolute_word); if (Opcode == X86::MOV64ri64i32) rt = X86::reloc_absolute_word; // FIXME: add X86II flag? // This should not occur on Darwin for relocatable objects. if (Opcode == X86::MOV64ri) rt = X86::reloc_absolute_dword; // FIXME: add X86II flag? if (MO1.isGlobal()) { bool Indirect = gvNeedsNonLazyPtr(MO1, TM); emitGlobalAddress(MO1.getGlobal(), rt, MO1.getOffset(), 0, Indirect); } else if (MO1.isSymbol()) emitExternalSymbolAddress(MO1.getSymbolName(), rt); else if (MO1.isCPI()) emitConstPoolAddress(MO1.getIndex(), rt); else if (MO1.isJTI()) emitJumpTableAddress(MO1.getIndex(), rt); break; } case X86II::MRMDestReg: { MCE.emitByte(BaseOpcode); emitRegModRMByte(MI.getOperand(CurOp).getReg(), X86_MC::getX86RegNum(MI.getOperand(CurOp+1).getReg())); CurOp += 2; if (CurOp != NumOps) emitConstant(MI.getOperand(CurOp++).getImm(), X86II::getSizeOfImm(Desc->TSFlags)); break; } case X86II::MRMDestMem: { MCE.emitByte(BaseOpcode); emitMemModRMByte(MI, CurOp, X86_MC::getX86RegNum(MI.getOperand(CurOp + X86::AddrNumOperands) .getReg())); CurOp += X86::AddrNumOperands + 1; if (CurOp != NumOps) emitConstant(MI.getOperand(CurOp++).getImm(), X86II::getSizeOfImm(Desc->TSFlags)); break; } case X86II::MRMSrcReg: MCE.emitByte(BaseOpcode); emitRegModRMByte(MI.getOperand(CurOp+1).getReg(), X86_MC::getX86RegNum(MI.getOperand(CurOp).getReg())); CurOp += 2; if (CurOp != NumOps) emitConstant(MI.getOperand(CurOp++).getImm(), X86II::getSizeOfImm(Desc->TSFlags)); break; case X86II::MRMSrcMem: { int AddrOperands = X86::AddrNumOperands; intptr_t PCAdj = (CurOp + AddrOperands + 1 != NumOps) ? X86II::getSizeOfImm(Desc->TSFlags) : 0; MCE.emitByte(BaseOpcode); emitMemModRMByte(MI, CurOp+1, X86_MC::getX86RegNum(MI.getOperand(CurOp).getReg()),PCAdj); CurOp += AddrOperands + 1; if (CurOp != NumOps) emitConstant(MI.getOperand(CurOp++).getImm(), X86II::getSizeOfImm(Desc->TSFlags)); break; } case X86II::MRM0r: case X86II::MRM1r: case X86II::MRM2r: case X86II::MRM3r: case X86II::MRM4r: case X86II::MRM5r: case X86II::MRM6r: case X86II::MRM7r: { MCE.emitByte(BaseOpcode); emitRegModRMByte(MI.getOperand(CurOp++).getReg(), (Desc->TSFlags & X86II::FormMask)-X86II::MRM0r); if (CurOp == NumOps) break; const MachineOperand &MO1 = MI.getOperand(CurOp++); unsigned Size = X86II::getSizeOfImm(Desc->TSFlags); if (MO1.isImm()) { emitConstant(MO1.getImm(), Size); break; } unsigned rt = Is64BitMode ? X86::reloc_pcrel_word : (IsPIC ? X86::reloc_picrel_word : X86::reloc_absolute_word); if (Opcode == X86::MOV64ri32) rt = X86::reloc_absolute_word_sext; // FIXME: add X86II flag? 
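// Anything that is not a plain immediate is emitted as a relocation:
// pc-relative in 64-bit mode, pic-relative under PIC, absolute otherwise
// (with MOV64ri32 forced to a sign-extended absolute word, per the override
// above).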
if (MO1.isGlobal()) { bool Indirect = gvNeedsNonLazyPtr(MO1, TM); emitGlobalAddress(MO1.getGlobal(), rt, MO1.getOffset(), 0, Indirect); } else if (MO1.isSymbol()) emitExternalSymbolAddress(MO1.getSymbolName(), rt); else if (MO1.isCPI()) emitConstPoolAddress(MO1.getIndex(), rt); else if (MO1.isJTI()) emitJumpTableAddress(MO1.getIndex(), rt); break; } case X86II::MRM0m: case X86II::MRM1m: case X86II::MRM2m: case X86II::MRM3m: case X86II::MRM4m: case X86II::MRM5m: case X86II::MRM6m: case X86II::MRM7m: { intptr_t PCAdj = (CurOp + X86::AddrNumOperands != NumOps) ? (MI.getOperand(CurOp+X86::AddrNumOperands).isImm() ? X86II::getSizeOfImm(Desc->TSFlags) : 4) : 0; MCE.emitByte(BaseOpcode); emitMemModRMByte(MI, CurOp, (Desc->TSFlags & X86II::FormMask)-X86II::MRM0m, PCAdj); CurOp += X86::AddrNumOperands; if (CurOp == NumOps) break; const MachineOperand &MO = MI.getOperand(CurOp++); unsigned Size = X86II::getSizeOfImm(Desc->TSFlags); if (MO.isImm()) { emitConstant(MO.getImm(), Size); break; } unsigned rt = Is64BitMode ? X86::reloc_pcrel_word : (IsPIC ? X86::reloc_picrel_word : X86::reloc_absolute_word); if (Opcode == X86::MOV64mi32) rt = X86::reloc_absolute_word_sext; // FIXME: add X86II flag? if (MO.isGlobal()) { bool Indirect = gvNeedsNonLazyPtr(MO, TM); emitGlobalAddress(MO.getGlobal(), rt, MO.getOffset(), 0, Indirect); } else if (MO.isSymbol()) emitExternalSymbolAddress(MO.getSymbolName(), rt); else if (MO.isCPI()) emitConstPoolAddress(MO.getIndex(), rt); else if (MO.isJTI()) emitJumpTableAddress(MO.getIndex(), rt); break; } case X86II::MRMInitReg: MCE.emitByte(BaseOpcode); // Duplicate register, used by things like MOV8r0 (aka xor reg,reg). emitRegModRMByte(MI.getOperand(CurOp).getReg(), X86_MC::getX86RegNum(MI.getOperand(CurOp).getReg())); ++CurOp; break; case X86II::MRM_C1: MCE.emitByte(BaseOpcode); MCE.emitByte(0xC1); break; case X86II::MRM_C8: MCE.emitByte(BaseOpcode); MCE.emitByte(0xC8); break; case X86II::MRM_C9: MCE.emitByte(BaseOpcode); MCE.emitByte(0xC9); break; case X86II::MRM_E8: MCE.emitByte(BaseOpcode); MCE.emitByte(0xE8); break; case X86II::MRM_F0: MCE.emitByte(BaseOpcode); MCE.emitByte(0xF0); break; } if (!Desc->isVariadic() && CurOp != NumOps) { #ifndef NDEBUG dbgs() << "Cannot encode all operands of: " << MI << "\n"; #endif llvm_unreachable(0); } MCE.processDebugLoc(MI.getDebugLoc(), false); } Index: vendor/llvm/dist/lib/Transforms/InstCombine/InstructionCombining.cpp =================================================================== --- vendor/llvm/dist/lib/Transforms/InstCombine/InstructionCombining.cpp (revision 228363) +++ vendor/llvm/dist/lib/Transforms/InstCombine/InstructionCombining.cpp (revision 228364) @@ -1,2087 +1,2088 @@ //===- InstructionCombining.cpp - Combine multiple instructions -----------===// // // The LLVM Compiler Infrastructure // // This file is distributed under the University of Illinois Open Source // License. See LICENSE.TXT for details. // //===----------------------------------------------------------------------===// // // InstructionCombining - Combine instructions to form fewer, simple // instructions. This pass does not modify the CFG. This pass is where // algebraic simplification happens. // // This pass combines things like: // %Y = add i32 %X, 1 // %Z = add i32 %Y, 1 // into: // %Z = add i32 %X, 2 // // This is a simple worklist driven algorithm. // // This pass guarantees that the following canonicalizations are performed on // the program: // 1. If a binary operator has a constant operand, it is moved to the RHS // 2. 
Bitwise operators with constant operands are always grouped so that // shifts are performed first, then or's, then and's, then xor's. // 3. Compare instructions are converted from <,>,<=,>= to ==,!= if possible // 4. All cmp instructions on boolean values are replaced with logical ops // 5. add X, X is represented as (X*2) => (X << 1) // 6. Multiplies with a power-of-two constant argument are transformed into // shifts. // ... etc. // //===----------------------------------------------------------------------===// #define DEBUG_TYPE "instcombine" #include "llvm/Transforms/Scalar.h" #include "InstCombine.h" #include "llvm/IntrinsicInst.h" #include "llvm/Analysis/ConstantFolding.h" #include "llvm/Analysis/InstructionSimplify.h" #include "llvm/Analysis/MemoryBuiltins.h" #include "llvm/Target/TargetData.h" #include "llvm/Transforms/Utils/Local.h" #include "llvm/Support/CFG.h" #include "llvm/Support/Debug.h" #include "llvm/Support/GetElementPtrTypeIterator.h" #include "llvm/Support/PatternMatch.h" #include "llvm/Support/ValueHandle.h" #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/Statistic.h" #include "llvm/ADT/StringSwitch.h" #include "llvm-c/Initialization.h" #include #include using namespace llvm; using namespace llvm::PatternMatch; STATISTIC(NumCombined , "Number of insts combined"); STATISTIC(NumConstProp, "Number of constant folds"); STATISTIC(NumDeadInst , "Number of dead inst eliminated"); STATISTIC(NumSunkInst , "Number of instructions sunk"); STATISTIC(NumExpand, "Number of expansions"); STATISTIC(NumFactor , "Number of factorizations"); STATISTIC(NumReassoc , "Number of reassociations"); // Initialization Routines void llvm::initializeInstCombine(PassRegistry &Registry) { initializeInstCombinerPass(Registry); } void LLVMInitializeInstCombine(LLVMPassRegistryRef R) { initializeInstCombine(*unwrap(R)); } char InstCombiner::ID = 0; INITIALIZE_PASS(InstCombiner, "instcombine", "Combine redundant instructions", false, false) void InstCombiner::getAnalysisUsage(AnalysisUsage &AU) const { AU.setPreservesCFG(); } /// ShouldChangeType - Return true if it is desirable to convert a computation /// from 'From' to 'To'. We don't want to convert from a legal to an illegal /// type for example, or from a smaller to a larger illegal type. bool InstCombiner::ShouldChangeType(Type *From, Type *To) const { assert(From->isIntegerTy() && To->isIntegerTy()); // If we don't have TD, we don't know if the source/dest are legal. if (!TD) return false; unsigned FromWidth = From->getPrimitiveSizeInBits(); unsigned ToWidth = To->getPrimitiveSizeInBits(); bool FromLegal = TD->isLegalInteger(FromWidth); bool ToLegal = TD->isLegalInteger(ToWidth); // If this is a legal integer from type, and the result would be an illegal // type, don't do the transformation. if (FromLegal && !ToLegal) return false; // Otherwise, if both are illegal, do not increase the size of the result. We // do allow things like i160 -> i64, but not i64 -> i160. if (!FromLegal && !ToLegal && ToWidth > FromWidth) return false; return true; } // Return true, if No Signed Wrap should be maintained for I. // The No Signed Wrap flag can be kept if the operation "B (I.getOpcode) C", // where both B and C should be ConstantInts, results in a constant that does // not overflow. This function only handles the Add and Sub opcodes. For // all other opcodes, the function conservatively returns false. 
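// For example, when reassociating (X +nsw 1) +nsw 2 into X +nsw 3 the nsw
// flag may be preserved because 1 + 2 does not overflow; if combining the two
// constants did overflow, the flag would have to be dropped.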
static bool MaintainNoSignedWrap(BinaryOperator &I, Value *B, Value *C) { OverflowingBinaryOperator *OBO = dyn_cast(&I); if (!OBO || !OBO->hasNoSignedWrap()) { return false; } // We reason about Add and Sub Only. Instruction::BinaryOps Opcode = I.getOpcode(); if (Opcode != Instruction::Add && Opcode != Instruction::Sub) { return false; } ConstantInt *CB = dyn_cast(B); ConstantInt *CC = dyn_cast(C); if (!CB || !CC) { return false; } const APInt &BVal = CB->getValue(); const APInt &CVal = CC->getValue(); bool Overflow = false; if (Opcode == Instruction::Add) { BVal.sadd_ov(CVal, Overflow); } else { BVal.ssub_ov(CVal, Overflow); } return !Overflow; } /// SimplifyAssociativeOrCommutative - This performs a few simplifications for /// operators which are associative or commutative: // // Commutative operators: // // 1. Order operands such that they are listed from right (least complex) to // left (most complex). This puts constants before unary operators before // binary operators. // // Associative operators: // // 2. Transform: "(A op B) op C" ==> "A op (B op C)" if "B op C" simplifies. // 3. Transform: "A op (B op C)" ==> "(A op B) op C" if "A op B" simplifies. // // Associative and commutative operators: // // 4. Transform: "(A op B) op C" ==> "(C op A) op B" if "C op A" simplifies. // 5. Transform: "A op (B op C)" ==> "B op (C op A)" if "C op A" simplifies. // 6. Transform: "(A op C1) op (B op C2)" ==> "(A op B) op (C1 op C2)" // if C1 and C2 are constants. // bool InstCombiner::SimplifyAssociativeOrCommutative(BinaryOperator &I) { Instruction::BinaryOps Opcode = I.getOpcode(); bool Changed = false; do { // Order operands such that they are listed from right (least complex) to // left (most complex). This puts constants before unary operators before // binary operators. if (I.isCommutative() && getComplexity(I.getOperand(0)) < getComplexity(I.getOperand(1))) Changed = !I.swapOperands(); BinaryOperator *Op0 = dyn_cast(I.getOperand(0)); BinaryOperator *Op1 = dyn_cast(I.getOperand(1)); if (I.isAssociative()) { // Transform: "(A op B) op C" ==> "A op (B op C)" if "B op C" simplifies. if (Op0 && Op0->getOpcode() == Opcode) { Value *A = Op0->getOperand(0); Value *B = Op0->getOperand(1); Value *C = I.getOperand(1); // Does "B op C" simplify? if (Value *V = SimplifyBinOp(Opcode, B, C, TD)) { // It simplifies to V. Form "A op V". I.setOperand(0, A); I.setOperand(1, V); // Conservatively clear the optional flags, since they may not be // preserved by the reassociation. if (MaintainNoSignedWrap(I, B, C) && (!Op0 || (isa(Op0) && Op0->hasNoSignedWrap()))) { // Note: this is only valid because SimplifyBinOp doesn't look at // the operands to Op0. I.clearSubclassOptionalData(); I.setHasNoSignedWrap(true); } else { I.clearSubclassOptionalData(); } Changed = true; ++NumReassoc; continue; } } // Transform: "A op (B op C)" ==> "(A op B) op C" if "A op B" simplifies. if (Op1 && Op1->getOpcode() == Opcode) { Value *A = I.getOperand(0); Value *B = Op1->getOperand(0); Value *C = Op1->getOperand(1); // Does "A op B" simplify? if (Value *V = SimplifyBinOp(Opcode, A, B, TD)) { // It simplifies to V. Form "V op C". I.setOperand(0, V); I.setOperand(1, C); // Conservatively clear the optional flags, since they may not be // preserved by the reassociation. I.clearSubclassOptionalData(); Changed = true; ++NumReassoc; continue; } } } if (I.isAssociative() && I.isCommutative()) { // Transform: "(A op B) op C" ==> "(C op A) op B" if "C op A" simplifies. 
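// (For instance, with C == -A the inner "C op A" folds to 0, so an expression
//  like "(A + B) + (0 - A)" can be rewritten as "0 + B".)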
if (Op0 && Op0->getOpcode() == Opcode) { Value *A = Op0->getOperand(0); Value *B = Op0->getOperand(1); Value *C = I.getOperand(1); // Does "C op A" simplify? if (Value *V = SimplifyBinOp(Opcode, C, A, TD)) { // It simplifies to V. Form "V op B". I.setOperand(0, V); I.setOperand(1, B); // Conservatively clear the optional flags, since they may not be // preserved by the reassociation. I.clearSubclassOptionalData(); Changed = true; ++NumReassoc; continue; } } // Transform: "A op (B op C)" ==> "B op (C op A)" if "C op A" simplifies. if (Op1 && Op1->getOpcode() == Opcode) { Value *A = I.getOperand(0); Value *B = Op1->getOperand(0); Value *C = Op1->getOperand(1); // Does "C op A" simplify? if (Value *V = SimplifyBinOp(Opcode, C, A, TD)) { // It simplifies to V. Form "B op V". I.setOperand(0, B); I.setOperand(1, V); // Conservatively clear the optional flags, since they may not be // preserved by the reassociation. I.clearSubclassOptionalData(); Changed = true; ++NumReassoc; continue; } } // Transform: "(A op C1) op (B op C2)" ==> "(A op B) op (C1 op C2)" // if C1 and C2 are constants. if (Op0 && Op1 && Op0->getOpcode() == Opcode && Op1->getOpcode() == Opcode && isa(Op0->getOperand(1)) && isa(Op1->getOperand(1)) && Op0->hasOneUse() && Op1->hasOneUse()) { Value *A = Op0->getOperand(0); Constant *C1 = cast(Op0->getOperand(1)); Value *B = Op1->getOperand(0); Constant *C2 = cast(Op1->getOperand(1)); Constant *Folded = ConstantExpr::get(Opcode, C1, C2); BinaryOperator *New = BinaryOperator::Create(Opcode, A, B); InsertNewInstWith(New, I); New->takeName(Op1); I.setOperand(0, New); I.setOperand(1, Folded); // Conservatively clear the optional flags, since they may not be // preserved by the reassociation. I.clearSubclassOptionalData(); Changed = true; continue; } } // No further simplifications. return Changed; } while (1); } /// LeftDistributesOverRight - Whether "X LOp (Y ROp Z)" is always equal to /// "(X LOp Y) ROp (X LOp Z)". static bool LeftDistributesOverRight(Instruction::BinaryOps LOp, Instruction::BinaryOps ROp) { switch (LOp) { default: return false; case Instruction::And: // And distributes over Or and Xor. switch (ROp) { default: return false; case Instruction::Or: case Instruction::Xor: return true; } case Instruction::Mul: // Multiplication distributes over addition and subtraction. switch (ROp) { default: return false; case Instruction::Add: case Instruction::Sub: return true; } case Instruction::Or: // Or distributes over And. switch (ROp) { default: return false; case Instruction::And: return true; } } } /// RightDistributesOverLeft - Whether "(X LOp Y) ROp Z" is always equal to /// "(X ROp Z) LOp (Y ROp Z)". static bool RightDistributesOverLeft(Instruction::BinaryOps LOp, Instruction::BinaryOps ROp) { if (Instruction::isCommutative(ROp)) return LeftDistributesOverRight(ROp, LOp); // TODO: It would be nice to handle division, aka "(X + Y)/Z = X/Z + Y/Z", // but this requires knowing that the addition does not overflow and other // such subtleties. return false; } /// SimplifyUsingDistributiveLaws - This tries to simplify binary operations /// which some other binary operation distributes over either by factorizing /// out common terms (eg "(A*B)+(A*C)" -> "A*(B+C)") or expanding out if this /// results in simplifications (eg: "A & (B | C) -> (A&B) | (A&C)" if this is /// a win). Returns the simplified value, or null if it didn't simplify. 
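/// Factorization below handles the "(A op' B) op (C op' D)" shape and fires
/// when the re-associated inner operation simplifies or when both original
/// inner operations would become dead; expansion handles "(A op' B) op C" and
/// "A op (B op' C)" and fires only when both distributed halves simplify.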
Value *InstCombiner::SimplifyUsingDistributiveLaws(BinaryOperator &I) { Value *LHS = I.getOperand(0), *RHS = I.getOperand(1); BinaryOperator *Op0 = dyn_cast(LHS); BinaryOperator *Op1 = dyn_cast(RHS); Instruction::BinaryOps TopLevelOpcode = I.getOpcode(); // op // Factorization. if (Op0 && Op1 && Op0->getOpcode() == Op1->getOpcode()) { // The instruction has the form "(A op' B) op (C op' D)". Try to factorize // a common term. Value *A = Op0->getOperand(0), *B = Op0->getOperand(1); Value *C = Op1->getOperand(0), *D = Op1->getOperand(1); Instruction::BinaryOps InnerOpcode = Op0->getOpcode(); // op' // Does "X op' Y" always equal "Y op' X"? bool InnerCommutative = Instruction::isCommutative(InnerOpcode); // Does "X op' (Y op Z)" always equal "(X op' Y) op (X op' Z)"? if (LeftDistributesOverRight(InnerOpcode, TopLevelOpcode)) // Does the instruction have the form "(A op' B) op (A op' D)" or, in the // commutative case, "(A op' B) op (C op' A)"? if (A == C || (InnerCommutative && A == D)) { if (A != C) std::swap(C, D); // Consider forming "A op' (B op D)". // If "B op D" simplifies then it can be formed with no cost. Value *V = SimplifyBinOp(TopLevelOpcode, B, D, TD); // If "B op D" doesn't simplify then only go on if both of the existing // operations "A op' B" and "C op' D" will be zapped as no longer used. if (!V && Op0->hasOneUse() && Op1->hasOneUse()) V = Builder->CreateBinOp(TopLevelOpcode, B, D, Op1->getName()); if (V) { ++NumFactor; V = Builder->CreateBinOp(InnerOpcode, A, V); V->takeName(&I); return V; } } // Does "(X op Y) op' Z" always equal "(X op' Z) op (Y op' Z)"? if (RightDistributesOverLeft(TopLevelOpcode, InnerOpcode)) // Does the instruction have the form "(A op' B) op (C op' B)" or, in the // commutative case, "(A op' B) op (B op' D)"? if (B == D || (InnerCommutative && B == C)) { if (B != D) std::swap(C, D); // Consider forming "(A op C) op' B". // If "A op C" simplifies then it can be formed with no cost. Value *V = SimplifyBinOp(TopLevelOpcode, A, C, TD); // If "A op C" doesn't simplify then only go on if both of the existing // operations "A op' B" and "C op' D" will be zapped as no longer used. if (!V && Op0->hasOneUse() && Op1->hasOneUse()) V = Builder->CreateBinOp(TopLevelOpcode, A, C, Op0->getName()); if (V) { ++NumFactor; V = Builder->CreateBinOp(InnerOpcode, V, B); V->takeName(&I); return V; } } } // Expansion. if (Op0 && RightDistributesOverLeft(Op0->getOpcode(), TopLevelOpcode)) { // The instruction has the form "(A op' B) op C". See if expanding it out // to "(A op C) op' (B op C)" results in simplifications. Value *A = Op0->getOperand(0), *B = Op0->getOperand(1), *C = RHS; Instruction::BinaryOps InnerOpcode = Op0->getOpcode(); // op' // Do "A op C" and "B op C" both simplify? if (Value *L = SimplifyBinOp(TopLevelOpcode, A, C, TD)) if (Value *R = SimplifyBinOp(TopLevelOpcode, B, C, TD)) { // They do! Return "L op' R". ++NumExpand; // If "L op' R" equals "A op' B" then "L op' R" is just the LHS. if ((L == A && R == B) || (Instruction::isCommutative(InnerOpcode) && L == B && R == A)) return Op0; // Otherwise return "L op' R" if it simplifies. if (Value *V = SimplifyBinOp(InnerOpcode, L, R, TD)) return V; // Otherwise, create a new instruction. C = Builder->CreateBinOp(InnerOpcode, L, R); C->takeName(&I); return C; } } if (Op1 && LeftDistributesOverRight(TopLevelOpcode, Op1->getOpcode())) { // The instruction has the form "A op (B op' C)". See if expanding it out // to "(A op B) op' (A op C)" results in simplifications. 
Value *A = LHS, *B = Op1->getOperand(0), *C = Op1->getOperand(1); Instruction::BinaryOps InnerOpcode = Op1->getOpcode(); // op' // Do "A op B" and "A op C" both simplify? if (Value *L = SimplifyBinOp(TopLevelOpcode, A, B, TD)) if (Value *R = SimplifyBinOp(TopLevelOpcode, A, C, TD)) { // They do! Return "L op' R". ++NumExpand; // If "L op' R" equals "B op' C" then "L op' R" is just the RHS. if ((L == B && R == C) || (Instruction::isCommutative(InnerOpcode) && L == C && R == B)) return Op1; // Otherwise return "L op' R" if it simplifies. if (Value *V = SimplifyBinOp(InnerOpcode, L, R, TD)) return V; // Otherwise, create a new instruction. A = Builder->CreateBinOp(InnerOpcode, L, R); A->takeName(&I); return A; } } return 0; } // dyn_castNegVal - Given a 'sub' instruction, return the RHS of the instruction // if the LHS is a constant zero (which is the 'negate' form). // Value *InstCombiner::dyn_castNegVal(Value *V) const { if (BinaryOperator::isNeg(V)) return BinaryOperator::getNegArgument(V); // Constants can be considered to be negated values if they can be folded. if (ConstantInt *C = dyn_cast(V)) return ConstantExpr::getNeg(C); if (ConstantVector *C = dyn_cast(V)) if (C->getType()->getElementType()->isIntegerTy()) return ConstantExpr::getNeg(C); return 0; } // dyn_castFNegVal - Given a 'fsub' instruction, return the RHS of the // instruction if the LHS is a constant negative zero (which is the 'negate' // form). // Value *InstCombiner::dyn_castFNegVal(Value *V) const { if (BinaryOperator::isFNeg(V)) return BinaryOperator::getFNegArgument(V); // Constants can be considered to be negated values if they can be folded. if (ConstantFP *C = dyn_cast(V)) return ConstantExpr::getFNeg(C); if (ConstantVector *C = dyn_cast(V)) if (C->getType()->getElementType()->isFloatingPointTy()) return ConstantExpr::getFNeg(C); return 0; } static Value *FoldOperationIntoSelectOperand(Instruction &I, Value *SO, InstCombiner *IC) { if (CastInst *CI = dyn_cast(&I)) { return IC->Builder->CreateCast(CI->getOpcode(), SO, I.getType()); } // Figure out if the constant is the left or the right argument. bool ConstIsRHS = isa(I.getOperand(1)); Constant *ConstOperand = cast(I.getOperand(ConstIsRHS)); if (Constant *SOC = dyn_cast(SO)) { if (ConstIsRHS) return ConstantExpr::get(I.getOpcode(), SOC, ConstOperand); return ConstantExpr::get(I.getOpcode(), ConstOperand, SOC); } Value *Op0 = SO, *Op1 = ConstOperand; if (!ConstIsRHS) std::swap(Op0, Op1); if (BinaryOperator *BO = dyn_cast(&I)) return IC->Builder->CreateBinOp(BO->getOpcode(), Op0, Op1, SO->getName()+".op"); if (ICmpInst *CI = dyn_cast(&I)) return IC->Builder->CreateICmp(CI->getPredicate(), Op0, Op1, SO->getName()+".cmp"); if (FCmpInst *CI = dyn_cast(&I)) return IC->Builder->CreateICmp(CI->getPredicate(), Op0, Op1, SO->getName()+".cmp"); llvm_unreachable("Unknown binary instruction type!"); } // FoldOpIntoSelect - Given an instruction with a select as one operand and a // constant as the other operand, try to fold the binary operator into the // select arguments. This also works for Cast instructions, which obviously do // not have a second operand. Instruction *InstCombiner::FoldOpIntoSelect(Instruction &Op, SelectInst *SI) { // Don't modify shared select instructions if (!SI->hasOneUse()) return 0; Value *TV = SI->getOperand(1); Value *FV = SI->getOperand(2); if (isa(TV) || isa(FV)) { // Bool selects with constant operands can be folded to logical ops. 
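    // A representative case (constants chosen only for illustration): folding
    // "add (select i1 %c, i32 1, i32 2), i32 10" through the select produces
    // "select i1 %c, i32 11, i32 12".  Boolean (i1) selects are excluded below
    // because they are better handled as logical operations.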
if (SI->getType()->isIntegerTy(1)) return 0; // If it's a bitcast involving vectors, make sure it has the same number of // elements on both sides. if (BitCastInst *BC = dyn_cast(&Op)) { VectorType *DestTy = dyn_cast(BC->getDestTy()); VectorType *SrcTy = dyn_cast(BC->getSrcTy()); // Verify that either both or neither are vectors. if ((SrcTy == NULL) != (DestTy == NULL)) return 0; // If vectors, verify that they have the same number of elements. if (SrcTy && SrcTy->getNumElements() != DestTy->getNumElements()) return 0; } Value *SelectTrueVal = FoldOperationIntoSelectOperand(Op, TV, this); Value *SelectFalseVal = FoldOperationIntoSelectOperand(Op, FV, this); return SelectInst::Create(SI->getCondition(), SelectTrueVal, SelectFalseVal); } return 0; } /// FoldOpIntoPhi - Given a binary operator, cast instruction, or select which /// has a PHI node as operand #0, see if we can fold the instruction into the /// PHI (which is only possible if all operands to the PHI are constants). /// Instruction *InstCombiner::FoldOpIntoPhi(Instruction &I) { PHINode *PN = cast(I.getOperand(0)); unsigned NumPHIValues = PN->getNumIncomingValues(); if (NumPHIValues == 0) return 0; // We normally only transform phis with a single use. However, if a PHI has // multiple uses and they are all the same operation, we can fold *all* of the // uses into the PHI. if (!PN->hasOneUse()) { // Walk the use list for the instruction, comparing them to I. for (Value::use_iterator UI = PN->use_begin(), E = PN->use_end(); UI != E; ++UI) { Instruction *User = cast(*UI); if (User != &I && !I.isIdenticalTo(User)) return 0; } // Otherwise, we can replace *all* users with the new PHI we form. } // Check to see if all of the operands of the PHI are simple constants // (constantint/constantfp/undef). If there is one non-constant value, // remember the BB it is in. If there is more than one or if *it* is a PHI, // bail out. We don't do arbitrary constant expressions here because moving // their computation can be expensive without a cost model. BasicBlock *NonConstBB = 0; for (unsigned i = 0; i != NumPHIValues; ++i) { Value *InVal = PN->getIncomingValue(i); if (isa(InVal) && !isa(InVal)) continue; if (isa(InVal)) return 0; // Itself a phi. if (NonConstBB) return 0; // More than one non-const value. NonConstBB = PN->getIncomingBlock(i); // If the InVal is an invoke at the end of the pred block, then we can't // insert a computation after it without breaking the edge. if (InvokeInst *II = dyn_cast(InVal)) if (II->getParent() == NonConstBB) return 0; // If the incoming non-constant value is in I's block, we will remove one // instruction, but insert another equivalent one, leading to infinite // instcombine. if (NonConstBB == I.getParent()) return 0; } // If there is exactly one non-constant value, we can insert a copy of the // operation in that block. However, if this is a critical edge, we would be // inserting the computation one some other paths (e.g. inside a loop). Only // do this if the pred block is unconditionally branching into the phi block. if (NonConstBB != 0) { BranchInst *BI = dyn_cast(NonConstBB->getTerminator()); if (!BI || !BI->isUnconditional()) return 0; } // Okay, we can do the transformation: create the new PHI node. PHINode *NewPN = PHINode::Create(I.getType(), PN->getNumIncomingValues()); InsertNewInstBefore(NewPN, *PN); NewPN->takeName(PN); // If we are going to have to insert a new computation, do so right before the // predecessors terminator. 
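  // As an illustration (block and value names are invented): folding
  //   %p = phi i32 [ 7, %a ], [ %x, %b ]
  //   %r = add i32 %p, 1
  // produces a new phi of [ 8, %a ] and [ %x + 1, %b ], where the "%x + 1"
  // computation is emitted in %b just before its (unconditional) terminator.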
if (NonConstBB) Builder->SetInsertPoint(NonConstBB->getTerminator()); // Next, add all of the operands to the PHI. if (SelectInst *SI = dyn_cast(&I)) { // We only currently try to fold the condition of a select when it is a phi, // not the true/false values. Value *TrueV = SI->getTrueValue(); Value *FalseV = SI->getFalseValue(); BasicBlock *PhiTransBB = PN->getParent(); for (unsigned i = 0; i != NumPHIValues; ++i) { BasicBlock *ThisBB = PN->getIncomingBlock(i); Value *TrueVInPred = TrueV->DoPHITranslation(PhiTransBB, ThisBB); Value *FalseVInPred = FalseV->DoPHITranslation(PhiTransBB, ThisBB); Value *InV = 0; if (Constant *InC = dyn_cast(PN->getIncomingValue(i))) InV = InC->isNullValue() ? FalseVInPred : TrueVInPred; else InV = Builder->CreateSelect(PN->getIncomingValue(i), TrueVInPred, FalseVInPred, "phitmp"); NewPN->addIncoming(InV, ThisBB); } } else if (CmpInst *CI = dyn_cast(&I)) { Constant *C = cast(I.getOperand(1)); for (unsigned i = 0; i != NumPHIValues; ++i) { Value *InV = 0; if (Constant *InC = dyn_cast(PN->getIncomingValue(i))) InV = ConstantExpr::getCompare(CI->getPredicate(), InC, C); else if (isa(CI)) InV = Builder->CreateICmp(CI->getPredicate(), PN->getIncomingValue(i), C, "phitmp"); else InV = Builder->CreateFCmp(CI->getPredicate(), PN->getIncomingValue(i), C, "phitmp"); NewPN->addIncoming(InV, PN->getIncomingBlock(i)); } } else if (I.getNumOperands() == 2) { Constant *C = cast(I.getOperand(1)); for (unsigned i = 0; i != NumPHIValues; ++i) { Value *InV = 0; if (Constant *InC = dyn_cast(PN->getIncomingValue(i))) InV = ConstantExpr::get(I.getOpcode(), InC, C); else InV = Builder->CreateBinOp(cast(I).getOpcode(), PN->getIncomingValue(i), C, "phitmp"); NewPN->addIncoming(InV, PN->getIncomingBlock(i)); } } else { CastInst *CI = cast(&I); Type *RetTy = CI->getType(); for (unsigned i = 0; i != NumPHIValues; ++i) { Value *InV; if (Constant *InC = dyn_cast(PN->getIncomingValue(i))) InV = ConstantExpr::getCast(CI->getOpcode(), InC, RetTy); else InV = Builder->CreateCast(CI->getOpcode(), PN->getIncomingValue(i), I.getType(), "phitmp"); NewPN->addIncoming(InV, PN->getIncomingBlock(i)); } } for (Value::use_iterator UI = PN->use_begin(), E = PN->use_end(); UI != E; ) { Instruction *User = cast(*UI++); if (User == &I) continue; ReplaceInstUsesWith(*User, NewPN); EraseInstFromFunction(*User); } return ReplaceInstUsesWith(I, NewPN); } /// FindElementAtOffset - Given a type and a constant offset, determine whether /// or not there is a sequence of GEP indices into the type that will land us at /// the specified offset. If so, fill them into NewIndices and return the /// resultant element type, otherwise return null. Type *InstCombiner::FindElementAtOffset(Type *Ty, int64_t Offset, SmallVectorImpl &NewIndices) { if (!TD) return 0; if (!Ty->isSized()) return 0; // Start with the index over the outer type. Note that the type size // might be zero (even if the offset isn't zero) if the indexed type // is something like [0 x {int, int}] Type *IntPtrTy = TD->getIntPtrType(Ty->getContext()); int64_t FirstIdx = 0; if (int64_t TySize = TD->getTypeAllocSize(Ty)) { FirstIdx = Offset/TySize; Offset -= FirstIdx*TySize; // Handle hosts where % returns negative instead of values [0..TySize). if (Offset < 0) { --FirstIdx; Offset += TySize; assert(Offset >= 0); } assert((uint64_t)Offset < (uint64_t)TySize && "Out of range offset"); } NewIndices.push_back(ConstantInt::get(IntPtrTy, FirstIdx)); // Index into the types. If we fail, set OrigBase to null. 
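  // Worked example (assuming a typical data layout; the type is invented):
  // for Ty = {i32, [2 x i16]} and Offset = 6, the loop below produces the
  // indices [0, 1, 1] and returns i16, because byte 6 falls in the
  // [2 x i16] member (at offset 4) and, within it, in element 1.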
while (Offset) { // Indexing into tail padding between struct/array elements. if (uint64_t(Offset*8) >= TD->getTypeSizeInBits(Ty)) return 0; if (StructType *STy = dyn_cast(Ty)) { const StructLayout *SL = TD->getStructLayout(STy); assert(Offset < (int64_t)SL->getSizeInBytes() && "Offset must stay within the indexed type"); unsigned Elt = SL->getElementContainingOffset(Offset); NewIndices.push_back(ConstantInt::get(Type::getInt32Ty(Ty->getContext()), Elt)); Offset -= SL->getElementOffset(Elt); Ty = STy->getElementType(Elt); } else if (ArrayType *AT = dyn_cast(Ty)) { uint64_t EltSize = TD->getTypeAllocSize(AT->getElementType()); assert(EltSize && "Cannot index into a zero-sized array"); NewIndices.push_back(ConstantInt::get(IntPtrTy,Offset/EltSize)); Offset %= EltSize; Ty = AT->getElementType(); } else { // Otherwise, we can't index into the middle of this atomic type, bail. return 0; } } return Ty; } static bool shouldMergeGEPs(GEPOperator &GEP, GEPOperator &Src) { // If this GEP has only 0 indices, it is the same pointer as // Src. If Src is not a trivial GEP too, don't combine // the indices. if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() && !Src.hasOneUse()) return false; return true; } Instruction *InstCombiner::visitGetElementPtrInst(GetElementPtrInst &GEP) { SmallVector Ops(GEP.op_begin(), GEP.op_end()); if (Value *V = SimplifyGEPInst(Ops, TD)) return ReplaceInstUsesWith(GEP, V); Value *PtrOp = GEP.getOperand(0); // Eliminate unneeded casts for indices, and replace indices which displace // by multiples of a zero size type with zero. if (TD) { bool MadeChange = false; Type *IntPtrTy = TD->getIntPtrType(GEP.getContext()); gep_type_iterator GTI = gep_type_begin(GEP); for (User::op_iterator I = GEP.op_begin() + 1, E = GEP.op_end(); I != E; ++I, ++GTI) { // Skip indices into struct types. SequentialType *SeqTy = dyn_cast(*GTI); if (!SeqTy) continue; // If the element type has zero size then any index over it is equivalent // to an index of zero, so replace it with zero if it is not zero already. if (SeqTy->getElementType()->isSized() && TD->getTypeAllocSize(SeqTy->getElementType()) == 0) if (!isa(*I) || !cast(*I)->isNullValue()) { *I = Constant::getNullValue(IntPtrTy); MadeChange = true; } if ((*I)->getType() != IntPtrTy) { // If we are using a wider index than needed for this platform, shrink // it to what we need. If narrower, sign-extend it to what we need. // This explicit cast can make subsequent optimizations more obvious. *I = Builder->CreateIntCast(*I, IntPtrTy, true); MadeChange = true; } } if (MadeChange) return &GEP; } // Combine Indices - If the source pointer to this getelementptr instruction // is a getelementptr instruction, combine the indices of the two // getelementptr instructions into a single instruction. // if (GEPOperator *Src = dyn_cast(PtrOp)) { if (!shouldMergeGEPs(*cast(&GEP), *Src)) return 0; // Note that if our source is a gep chain itself that we wait for that // chain to be resolved before we perform this transformation. This // avoids us creating a TON of code in some cases. if (GEPOperator *SrcGEP = dyn_cast(Src->getOperand(0))) if (SrcGEP->getNumOperands() == 2 && shouldMergeGEPs(*Src, *SrcGEP)) return 0; // Wait until our source is folded to completion. SmallVector Indices; // Find out whether the last index in the source GEP is a sequential idx. 
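    // For example (names invented): "gep (gep %P, i64 %a), i64 %b" becomes
    // "gep %P, i64 %t" with "%t = add i64 %a, %b", folding the two offsets
    // into a single index when the inner GEP ends in a sequential
    // (array/pointer) index.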
bool EndsWithSequential = false; for (gep_type_iterator I = gep_type_begin(*Src), E = gep_type_end(*Src); I != E; ++I) EndsWithSequential = !(*I)->isStructTy(); // Can we combine the two pointer arithmetics offsets? if (EndsWithSequential) { // Replace: gep (gep %P, long B), long A, ... // With: T = long A+B; gep %P, T, ... // Value *Sum; Value *SO1 = Src->getOperand(Src->getNumOperands()-1); Value *GO1 = GEP.getOperand(1); if (SO1 == Constant::getNullValue(SO1->getType())) { Sum = GO1; } else if (GO1 == Constant::getNullValue(GO1->getType())) { Sum = SO1; } else { // If they aren't the same type, then the input hasn't been processed // by the loop above yet (which canonicalizes sequential index types to // intptr_t). Just avoid transforming this until the input has been // normalized. if (SO1->getType() != GO1->getType()) return 0; Sum = Builder->CreateAdd(SO1, GO1, PtrOp->getName()+".sum"); } // Update the GEP in place if possible. if (Src->getNumOperands() == 2) { GEP.setOperand(0, Src->getOperand(0)); GEP.setOperand(1, Sum); return &GEP; } Indices.append(Src->op_begin()+1, Src->op_end()-1); Indices.push_back(Sum); Indices.append(GEP.op_begin()+2, GEP.op_end()); } else if (isa(*GEP.idx_begin()) && cast(*GEP.idx_begin())->isNullValue() && Src->getNumOperands() != 1) { // Otherwise we can do the fold if the first index of the GEP is a zero Indices.append(Src->op_begin()+1, Src->op_end()); Indices.append(GEP.idx_begin()+1, GEP.idx_end()); } if (!Indices.empty()) return (GEP.isInBounds() && Src->isInBounds()) ? GetElementPtrInst::CreateInBounds(Src->getOperand(0), Indices, GEP.getName()) : GetElementPtrInst::Create(Src->getOperand(0), Indices, GEP.getName()); } // Handle gep(bitcast x) and gep(gep x, 0, 0, 0). Value *StrippedPtr = PtrOp->stripPointerCasts(); PointerType *StrippedPtrTy =cast(StrippedPtr->getType()); if (StrippedPtr != PtrOp && StrippedPtrTy->getAddressSpace() == GEP.getPointerAddressSpace()) { bool HasZeroPointerIndex = false; if (ConstantInt *C = dyn_cast(GEP.getOperand(1))) HasZeroPointerIndex = C->isZero(); // Transform: GEP (bitcast [10 x i8]* X to [0 x i8]*), i32 0, ... // into : GEP [10 x i8]* X, i32 0, ... // // Likewise, transform: GEP (bitcast i8* X to [0 x i8]*), i32 0, ... // into : GEP i8* X, ... // // This occurs when the program declares an array extern like "int X[];" if (HasZeroPointerIndex) { PointerType *CPTy = cast(PtrOp->getType()); if (ArrayType *CATy = dyn_cast(CPTy->getElementType())) { // GEP (bitcast i8* X to [0 x i8]*), i32 0, ... ? if (CATy->getElementType() == StrippedPtrTy->getElementType()) { // -> GEP i8* X, ... SmallVector Idx(GEP.idx_begin()+1, GEP.idx_end()); GetElementPtrInst *Res = GetElementPtrInst::Create(StrippedPtr, Idx, GEP.getName()); Res->setIsInBounds(GEP.isInBounds()); return Res; } if (ArrayType *XATy = dyn_cast(StrippedPtrTy->getElementType())){ // GEP (bitcast [10 x i8]* X to [0 x i8]*), i32 0, ... ? if (CATy->getElementType() == XATy->getElementType()) { // -> GEP [10 x i8]* X, i32 0, ... // At this point, we know that the cast source type is a pointer // to an array of the same type as the destination pointer // array. Because the array type is never stepped over (there // is a leading zero) we can fold the cast into this GEP. 
GEP.setOperand(0, StrippedPtr); return &GEP; } } } } else if (GEP.getNumOperands() == 2) { // Transform things like: // %t = getelementptr i32* bitcast ([2 x i32]* %str to i32*), i32 %V // into: %t1 = getelementptr [2 x i32]* %str, i32 0, i32 %V; bitcast Type *SrcElTy = StrippedPtrTy->getElementType(); Type *ResElTy=cast(PtrOp->getType())->getElementType(); if (TD && SrcElTy->isArrayTy() && TD->getTypeAllocSize(cast(SrcElTy)->getElementType()) == TD->getTypeAllocSize(ResElTy)) { Value *Idx[2]; Idx[0] = Constant::getNullValue(Type::getInt32Ty(GEP.getContext())); Idx[1] = GEP.getOperand(1); Value *NewGEP = GEP.isInBounds() ? Builder->CreateInBoundsGEP(StrippedPtr, Idx, GEP.getName()) : Builder->CreateGEP(StrippedPtr, Idx, GEP.getName()); // V and GEP are both pointer types --> BitCast return new BitCastInst(NewGEP, GEP.getType()); } // Transform things like: // getelementptr i8* bitcast ([100 x double]* X to i8*), i32 %tmp // (where tmp = 8*tmp2) into: // getelementptr [100 x double]* %arr, i32 0, i32 %tmp2; bitcast if (TD && SrcElTy->isArrayTy() && ResElTy->isIntegerTy(8)) { uint64_t ArrayEltSize = TD->getTypeAllocSize(cast(SrcElTy)->getElementType()); // Check to see if "tmp" is a scale by a multiple of ArrayEltSize. We // allow either a mul, shift, or constant here. Value *NewIdx = 0; ConstantInt *Scale = 0; if (ArrayEltSize == 1) { NewIdx = GEP.getOperand(1); Scale = ConstantInt::get(cast(NewIdx->getType()), 1); } else if (ConstantInt *CI = dyn_cast(GEP.getOperand(1))) { NewIdx = ConstantInt::get(CI->getType(), 1); Scale = CI; } else if (Instruction *Inst =dyn_cast(GEP.getOperand(1))){ if (Inst->getOpcode() == Instruction::Shl && isa(Inst->getOperand(1))) { ConstantInt *ShAmt = cast(Inst->getOperand(1)); uint32_t ShAmtVal = ShAmt->getLimitedValue(64); Scale = ConstantInt::get(cast(Inst->getType()), 1ULL << ShAmtVal); NewIdx = Inst->getOperand(0); } else if (Inst->getOpcode() == Instruction::Mul && isa(Inst->getOperand(1))) { Scale = cast(Inst->getOperand(1)); NewIdx = Inst->getOperand(0); } } // If the index will be to exactly the right offset with the scale taken // out, perform the transformation. Note, we don't know whether Scale is // signed or not. We'll use unsigned version of division/modulo // operation after making sure Scale doesn't have the sign bit set. if (ArrayEltSize && Scale && Scale->getSExtValue() >= 0LL && Scale->getZExtValue() % ArrayEltSize == 0) { Scale = ConstantInt::get(Scale->getType(), Scale->getZExtValue() / ArrayEltSize); if (Scale->getZExtValue() != 1) { Constant *C = ConstantExpr::getIntegerCast(Scale, NewIdx->getType(), false /*ZExt*/); NewIdx = Builder->CreateMul(NewIdx, C, "idxscale"); } // Insert the new GEP instruction. Value *Idx[2]; Idx[0] = Constant::getNullValue(Type::getInt32Ty(GEP.getContext())); Idx[1] = NewIdx; Value *NewGEP = GEP.isInBounds() ? Builder->CreateInBoundsGEP(StrippedPtr, Idx, GEP.getName()): Builder->CreateGEP(StrippedPtr, Idx, GEP.getName()); // The NewGEP must be pointer typed, so must the old one -> BitCast return new BitCastInst(NewGEP, GEP.getType()); } } } } /// See if we can simplify: /// X = bitcast A* to B* /// Y = gep X, <...constant indices...> /// into a gep of the original struct. This is important for SROA and alias /// analysis of unions. If "A" is also a bitcast, wait for A/X to be merged. if (BitCastInst *BCI = dyn_cast(PtrOp)) { if (TD && !isa(BCI->getOperand(0)) && GEP.hasAllConstantIndices() && StrippedPtrTy->getAddressSpace() == GEP.getPointerAddressSpace()) { // Determine how much the GEP moves the pointer. 
We are guaranteed to get // a constant back from EmitGEPOffset. ConstantInt *OffsetV = cast(EmitGEPOffset(&GEP)); int64_t Offset = OffsetV->getSExtValue(); // If this GEP instruction doesn't move the pointer, just replace the GEP // with a bitcast of the real input to the dest type. if (Offset == 0) { // If the bitcast is of an allocation, and the allocation will be // converted to match the type of the cast, don't touch this. if (isa(BCI->getOperand(0)) || isMalloc(BCI->getOperand(0))) { // See if the bitcast simplifies, if so, don't nuke this GEP yet. if (Instruction *I = visitBitCast(*BCI)) { if (I != BCI) { I->takeName(BCI); BCI->getParent()->getInstList().insert(BCI, I); ReplaceInstUsesWith(*BCI, I); } return &GEP; } } return new BitCastInst(BCI->getOperand(0), GEP.getType()); } // Otherwise, if the offset is non-zero, we need to find out if there is a // field at Offset in 'A's type. If so, we can pull the cast through the // GEP. SmallVector NewIndices; Type *InTy = cast(BCI->getOperand(0)->getType())->getElementType(); if (FindElementAtOffset(InTy, Offset, NewIndices)) { Value *NGEP = GEP.isInBounds() ? Builder->CreateInBoundsGEP(BCI->getOperand(0), NewIndices) : Builder->CreateGEP(BCI->getOperand(0), NewIndices); if (NGEP->getType() == GEP.getType()) return ReplaceInstUsesWith(GEP, NGEP); NGEP->takeName(&GEP); return new BitCastInst(NGEP, GEP.getType()); } } } return 0; } static bool IsOnlyNullComparedAndFreed(Value *V, SmallVectorImpl &Users, int Depth = 0) { if (Depth == 8) return false; for (Value::use_iterator UI = V->use_begin(), UE = V->use_end(); UI != UE; ++UI) { User *U = *UI; if (isFreeCall(U)) { Users.push_back(U); continue; } if (ICmpInst *ICI = dyn_cast(U)) { if (ICI->isEquality() && isa(ICI->getOperand(1))) { Users.push_back(ICI); continue; } } if (BitCastInst *BCI = dyn_cast(U)) { if (IsOnlyNullComparedAndFreed(BCI, Users, Depth+1)) { Users.push_back(BCI); continue; } } if (GetElementPtrInst *GEPI = dyn_cast(U)) { if (IsOnlyNullComparedAndFreed(GEPI, Users, Depth+1)) { Users.push_back(GEPI); continue; } } if (IntrinsicInst *II = dyn_cast(U)) { if (II->getIntrinsicID() == Intrinsic::lifetime_start || II->getIntrinsicID() == Intrinsic::lifetime_end) { Users.push_back(II); continue; } } return false; } return true; } Instruction *InstCombiner::visitMalloc(Instruction &MI) { // If we have a malloc call which is only used in any amount of comparisons // to null and free calls, delete the calls and replace the comparisons with // true or false as appropriate. SmallVector Users; if (IsOnlyNullComparedAndFreed(&MI, Users)) { for (unsigned i = 0, e = Users.size(); i != e; ++i) { Instruction *I = cast_or_null(&*Users[i]); if (!I) continue; if (ICmpInst *C = dyn_cast(I)) { ReplaceInstUsesWith(*C, ConstantInt::get(Type::getInt1Ty(C->getContext()), C->isFalseWhenEqual())); } else if (isa(I) || isa(I)) { ReplaceInstUsesWith(*I, UndefValue::get(I->getType())); } EraseInstFromFunction(*I); } return EraseInstFromFunction(MI); } return 0; } Instruction *InstCombiner::visitFree(CallInst &FI) { Value *Op = FI.getArgOperand(0); // free undef -> unreachable. if (isa(Op)) { // Insert a new store to null because we cannot modify the CFG here. Builder->CreateStore(ConstantInt::getTrue(FI.getContext()), UndefValue::get(Type::getInt1PtrTy(FI.getContext()))); return EraseInstFromFunction(FI); } // If we have 'free null' delete the instruction. This can happen in stl code // when lots of inlining happens. 
if (isa(Op)) return EraseInstFromFunction(FI); return 0; } Instruction *InstCombiner::visitBranchInst(BranchInst &BI) { // Change br (not X), label True, label False to: br X, label False, True Value *X = 0; BasicBlock *TrueDest; BasicBlock *FalseDest; if (match(&BI, m_Br(m_Not(m_Value(X)), TrueDest, FalseDest)) && !isa(X)) { // Swap Destinations and condition... BI.setCondition(X); BI.swapSuccessors(); return &BI; } // Cannonicalize fcmp_one -> fcmp_oeq FCmpInst::Predicate FPred; Value *Y; if (match(&BI, m_Br(m_FCmp(FPred, m_Value(X), m_Value(Y)), TrueDest, FalseDest)) && BI.getCondition()->hasOneUse()) if (FPred == FCmpInst::FCMP_ONE || FPred == FCmpInst::FCMP_OLE || FPred == FCmpInst::FCMP_OGE) { FCmpInst *Cond = cast(BI.getCondition()); Cond->setPredicate(FCmpInst::getInversePredicate(FPred)); // Swap Destinations and condition. BI.swapSuccessors(); Worklist.Add(Cond); return &BI; } // Cannonicalize icmp_ne -> icmp_eq ICmpInst::Predicate IPred; if (match(&BI, m_Br(m_ICmp(IPred, m_Value(X), m_Value(Y)), TrueDest, FalseDest)) && BI.getCondition()->hasOneUse()) if (IPred == ICmpInst::ICMP_NE || IPred == ICmpInst::ICMP_ULE || IPred == ICmpInst::ICMP_SLE || IPred == ICmpInst::ICMP_UGE || IPred == ICmpInst::ICMP_SGE) { ICmpInst *Cond = cast(BI.getCondition()); Cond->setPredicate(ICmpInst::getInversePredicate(IPred)); // Swap Destinations and condition. BI.swapSuccessors(); Worklist.Add(Cond); return &BI; } return 0; } Instruction *InstCombiner::visitSwitchInst(SwitchInst &SI) { Value *Cond = SI.getCondition(); if (Instruction *I = dyn_cast(Cond)) { if (I->getOpcode() == Instruction::Add) if (ConstantInt *AddRHS = dyn_cast(I->getOperand(1))) { // change 'switch (X+4) case 1:' into 'switch (X) case -3' unsigned NumCases = SI.getNumCases(); // Skip the first item since that's the default case. for (unsigned i = 1; i < NumCases; ++i) { ConstantInt* CaseVal = SI.getCaseValue(i); Constant* NewCaseVal = ConstantExpr::getSub(cast(CaseVal), AddRHS); assert(isa(NewCaseVal) && "Result of expression should be constant"); SI.setSuccessorValue(i, cast(NewCaseVal)); } SI.setCondition(I->getOperand(0)); Worklist.Add(I); return &SI; } } return 0; } Instruction *InstCombiner::visitExtractValueInst(ExtractValueInst &EV) { Value *Agg = EV.getAggregateOperand(); if (!EV.hasIndices()) return ReplaceInstUsesWith(EV, Agg); if (Constant *C = dyn_cast(Agg)) { if (isa(C)) return ReplaceInstUsesWith(EV, UndefValue::get(EV.getType())); if (isa(C)) return ReplaceInstUsesWith(EV, Constant::getNullValue(EV.getType())); if (isa(C) || isa(C)) { // Extract the element indexed by the first index out of the constant Value *V = C->getOperand(*EV.idx_begin()); if (EV.getNumIndices() > 1) // Extract the remaining indices out of the constant indexed by the // first index return ExtractValueInst::Create(V, EV.getIndices().slice(1)); else return ReplaceInstUsesWith(EV, V); } return 0; // Can't handle other constants } if (InsertValueInst *IV = dyn_cast(Agg)) { // We're extracting from an insertvalue instruction, compare the indices const unsigned *exti, *exte, *insi, *inse; for (exti = EV.idx_begin(), insi = IV->idx_begin(), exte = EV.idx_end(), inse = IV->idx_end(); exti != exte && insi != inse; ++exti, ++insi) { if (*insi != *exti) // The insert and extract both reference distinctly different elements. // This means the extract is not influenced by the insert, and we can // replace the aggregate operand of the extract with the aggregate // operand of the insert. 
i.e., replace // %I = insertvalue { i32, { i32 } } %A, { i32 } { i32 42 }, 1 // %E = extractvalue { i32, { i32 } } %I, 0 // with // %E = extractvalue { i32, { i32 } } %A, 0 return ExtractValueInst::Create(IV->getAggregateOperand(), EV.getIndices()); } if (exti == exte && insi == inse) // Both iterators are at the end: Index lists are identical. Replace // %B = insertvalue { i32, { i32 } } %A, i32 42, 1, 0 // %C = extractvalue { i32, { i32 } } %B, 1, 0 // with "i32 42" return ReplaceInstUsesWith(EV, IV->getInsertedValueOperand()); if (exti == exte) { // The extract list is a prefix of the insert list. i.e. replace // %I = insertvalue { i32, { i32 } } %A, i32 42, 1, 0 // %E = extractvalue { i32, { i32 } } %I, 1 // with // %X = extractvalue { i32, { i32 } } %A, 1 // %E = insertvalue { i32 } %X, i32 42, 0 // by switching the order of the insert and extract (though the // insertvalue should be left in, since it may have other uses). Value *NewEV = Builder->CreateExtractValue(IV->getAggregateOperand(), EV.getIndices()); return InsertValueInst::Create(NewEV, IV->getInsertedValueOperand(), makeArrayRef(insi, inse)); } if (insi == inse) // The insert list is a prefix of the extract list // We can simply remove the common indices from the extract and make it // operate on the inserted value instead of the insertvalue result. // i.e., replace // %I = insertvalue { i32, { i32 } } %A, { i32 } { i32 42 }, 1 // %E = extractvalue { i32, { i32 } } %I, 1, 0 // with // %E extractvalue { i32 } { i32 42 }, 0 return ExtractValueInst::Create(IV->getInsertedValueOperand(), makeArrayRef(exti, exte)); } if (IntrinsicInst *II = dyn_cast(Agg)) { // We're extracting from an intrinsic, see if we're the only user, which // allows us to simplify multiple result intrinsics to simpler things that // just get one value. if (II->hasOneUse()) { // Check if we're grabbing the overflow bit or the result of a 'with // overflow' intrinsic. If it's the latter we can remove the intrinsic // and replace it with a traditional binary instruction. switch (II->getIntrinsicID()) { case Intrinsic::uadd_with_overflow: case Intrinsic::sadd_with_overflow: if (*EV.idx_begin() == 0) { // Normal result. Value *LHS = II->getArgOperand(0), *RHS = II->getArgOperand(1); ReplaceInstUsesWith(*II, UndefValue::get(II->getType())); EraseInstFromFunction(*II); return BinaryOperator::CreateAdd(LHS, RHS); } // If the normal result of the add is dead, and the RHS is a constant, // we can transform this into a range comparison. // overflow = uadd a, -4 --> overflow = icmp ugt a, 3 if (II->getIntrinsicID() == Intrinsic::uadd_with_overflow) if (ConstantInt *CI = dyn_cast(II->getArgOperand(1))) return new ICmpInst(ICmpInst::ICMP_UGT, II->getArgOperand(0), ConstantExpr::getNot(CI)); break; case Intrinsic::usub_with_overflow: case Intrinsic::ssub_with_overflow: if (*EV.idx_begin() == 0) { // Normal result. Value *LHS = II->getArgOperand(0), *RHS = II->getArgOperand(1); ReplaceInstUsesWith(*II, UndefValue::get(II->getType())); EraseInstFromFunction(*II); return BinaryOperator::CreateSub(LHS, RHS); } break; case Intrinsic::umul_with_overflow: case Intrinsic::smul_with_overflow: if (*EV.idx_begin() == 0) { // Normal result. 
Value *LHS = II->getArgOperand(0), *RHS = II->getArgOperand(1); ReplaceInstUsesWith(*II, UndefValue::get(II->getType())); EraseInstFromFunction(*II); return BinaryOperator::CreateMul(LHS, RHS); } break; default: break; } } } if (LoadInst *L = dyn_cast(Agg)) // If the (non-volatile) load only has one use, we can rewrite this to a // load from a GEP. This reduces the size of the load. // FIXME: If a load is used only by extractvalue instructions then this // could be done regardless of having multiple uses. if (L->isSimple() && L->hasOneUse()) { // extractvalue has integer indices, getelementptr has Value*s. Convert. SmallVector Indices; // Prefix an i32 0 since we need the first element. Indices.push_back(Builder->getInt32(0)); for (ExtractValueInst::idx_iterator I = EV.idx_begin(), E = EV.idx_end(); I != E; ++I) Indices.push_back(Builder->getInt32(*I)); // We need to insert these at the location of the old load, not at that of // the extractvalue. Builder->SetInsertPoint(L->getParent(), L); Value *GEP = Builder->CreateInBoundsGEP(L->getPointerOperand(), Indices); // Returning the load directly will cause the main loop to insert it in // the wrong spot, so use ReplaceInstUsesWith(). return ReplaceInstUsesWith(EV, Builder->CreateLoad(GEP)); } // We could simplify extracts from other values. Note that nested extracts may // already be simplified implicitly by the above: extract (extract (insert) ) // will be translated into extract ( insert ( extract ) ) first and then just // the value inserted, if appropriate. Similarly for extracts from single-use // loads: extract (extract (load)) will be translated to extract (load (gep)) // and if again single-use then via load (gep (gep)) to load (gep). // However, double extracts from e.g. function arguments or return values // aren't handled yet. return 0; } enum Personality_Type { Unknown_Personality, GNU_Ada_Personality, GNU_CXX_Personality }; /// RecognizePersonality - See if the given exception handling personality /// function is one that we understand. If so, return a description of it; /// otherwise return Unknown_Personality. static Personality_Type RecognizePersonality(Value *Pers) { Function *F = dyn_cast(Pers->stripPointerCasts()); if (!F) return Unknown_Personality; return StringSwitch(F->getName()) .Case("__gnat_eh_personality", GNU_Ada_Personality) .Case("__gxx_personality_v0", GNU_CXX_Personality) .Default(Unknown_Personality); } /// isCatchAll - Return 'true' if the given typeinfo will match anything. static bool isCatchAll(Personality_Type Personality, Constant *TypeInfo) { switch (Personality) { case Unknown_Personality: return false; case GNU_Ada_Personality: // While __gnat_all_others_value will match any Ada exception, it doesn't // match foreign exceptions (or didn't, before gcc-4.7). return false; case GNU_CXX_Personality: return TypeInfo->isNullValue(); } llvm_unreachable("Unknown personality!"); } static bool shorter_filter(const Value *LHS, const Value *RHS) { return cast(LHS->getType())->getNumElements() < cast(RHS->getType())->getNumElements(); } Instruction *InstCombiner::visitLandingPadInst(LandingPadInst &LI) { // The logic here should be correct for any real-world personality function. // However if that turns out not to be true, the offending logic can always // be conditioned on the personality function, like the catch-all logic is. 
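  // A few concrete effects of the simplifications below (typeinfo names are
  // illustrative): a repeated "catch i8* @ti" clause is dropped; for the C++
  // personality, "catch i8* null" is a catch-all, so any clauses after it and
  // the cleanup flag become redundant; runs of filters are sorted shortest
  // first; and a later filter is removed when an earlier filter is a subset
  // of it, e.g. "filter [1 x i8*] [i8* @ti]" makes a following
  // "filter [2 x i8*] [i8* @ti, i8* @ti2]" unnecessary.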
Personality_Type Personality = RecognizePersonality(LI.getPersonalityFn()); // Simplify the list of clauses, eg by removing repeated catch clauses // (these are often created by inlining). bool MakeNewInstruction = false; // If true, recreate using the following: SmallVector NewClauses; // - Clauses for the new instruction; bool CleanupFlag = LI.isCleanup(); // - The new instruction is a cleanup. SmallPtrSet AlreadyCaught; // Typeinfos known caught already. for (unsigned i = 0, e = LI.getNumClauses(); i != e; ++i) { bool isLastClause = i + 1 == e; if (LI.isCatch(i)) { // A catch clause. Value *CatchClause = LI.getClause(i); Constant *TypeInfo = cast(CatchClause->stripPointerCasts()); // If we already saw this clause, there is no point in having a second // copy of it. if (AlreadyCaught.insert(TypeInfo)) { // This catch clause was not already seen. NewClauses.push_back(CatchClause); } else { // Repeated catch clause - drop the redundant copy. MakeNewInstruction = true; } // If this is a catch-all then there is no point in keeping any following // clauses or marking the landingpad as having a cleanup. if (isCatchAll(Personality, TypeInfo)) { if (!isLastClause) MakeNewInstruction = true; CleanupFlag = false; break; } } else { // A filter clause. If any of the filter elements were already caught // then they can be dropped from the filter. It is tempting to try to // exploit the filter further by saying that any typeinfo that does not // occur in the filter can't be caught later (and thus can be dropped). // However this would be wrong, since typeinfos can match without being // equal (for example if one represents a C++ class, and the other some // class derived from it). assert(LI.isFilter(i) && "Unsupported landingpad clause!"); Value *FilterClause = LI.getClause(i); ArrayType *FilterType = cast(FilterClause->getType()); unsigned NumTypeInfos = FilterType->getNumElements(); // An empty filter catches everything, so there is no point in keeping any // following clauses or marking the landingpad as having a cleanup. By // dealing with this case here the following code is made a bit simpler. if (!NumTypeInfos) { NewClauses.push_back(FilterClause); if (!isLastClause) MakeNewInstruction = true; CleanupFlag = false; break; } bool MakeNewFilter = false; // If true, make a new filter. SmallVector NewFilterElts; // New elements. if (isa(FilterClause)) { // Not an empty filter - it contains at least one null typeinfo. assert(NumTypeInfos > 0 && "Should have handled empty filter already!"); Constant *TypeInfo = Constant::getNullValue(FilterType->getElementType()); // If this typeinfo is a catch-all then the filter can never match. if (isCatchAll(Personality, TypeInfo)) { // Throw the filter away. MakeNewInstruction = true; continue; } // There is no point in having multiple copies of this typeinfo, so // discard all but the first copy if there is more than one. NewFilterElts.push_back(TypeInfo); if (NumTypeInfos > 1) MakeNewFilter = true; } else { ConstantArray *Filter = cast(FilterClause); SmallPtrSet SeenInFilter; // For uniquing the elements. NewFilterElts.reserve(NumTypeInfos); // Remove any filter elements that were already caught or that already // occurred in the filter. While there, see if any of the elements are // catch-alls. If so, the filter can be discarded. bool SawCatchAll = false; for (unsigned j = 0; j != NumTypeInfos; ++j) { Value *Elt = Filter->getOperand(j); Constant *TypeInfo = cast(Elt->stripPointerCasts()); if (isCatchAll(Personality, TypeInfo)) { // This element is a catch-all. 
Bail out, noting this fact. SawCatchAll = true; break; } if (AlreadyCaught.count(TypeInfo)) // Already caught by an earlier clause, so having it in the filter // is pointless. continue; // There is no point in having multiple copies of the same typeinfo in // a filter, so only add it if we didn't already. if (SeenInFilter.insert(TypeInfo)) NewFilterElts.push_back(cast(Elt)); } // A filter containing a catch-all cannot match anything by definition. if (SawCatchAll) { // Throw the filter away. MakeNewInstruction = true; continue; } // If we dropped something from the filter, make a new one. if (NewFilterElts.size() < NumTypeInfos) MakeNewFilter = true; } if (MakeNewFilter) { FilterType = ArrayType::get(FilterType->getElementType(), NewFilterElts.size()); FilterClause = ConstantArray::get(FilterType, NewFilterElts); MakeNewInstruction = true; } NewClauses.push_back(FilterClause); // If the new filter is empty then it will catch everything so there is // no point in keeping any following clauses or marking the landingpad // as having a cleanup. The case of the original filter being empty was // already handled above. if (MakeNewFilter && !NewFilterElts.size()) { assert(MakeNewInstruction && "New filter but not a new instruction!"); CleanupFlag = false; break; } } } // If several filters occur in a row then reorder them so that the shortest // filters come first (those with the smallest number of elements). This is // advantageous because shorter filters are more likely to match, speeding up // unwinding, but mostly because it increases the effectiveness of the other // filter optimizations below. for (unsigned i = 0, e = NewClauses.size(); i + 1 < e; ) { unsigned j; // Find the maximal 'j' s.t. the range [i, j) consists entirely of filters. for (j = i; j != e; ++j) if (!isa(NewClauses[j]->getType())) break; // Check whether the filters are already sorted by length. We need to know // if sorting them is actually going to do anything so that we only make a // new landingpad instruction if it does. for (unsigned k = i; k + 1 < j; ++k) if (shorter_filter(NewClauses[k+1], NewClauses[k])) { // Not sorted, so sort the filters now. Doing an unstable sort would be // correct too but reordering filters pointlessly might confuse users. std::stable_sort(NewClauses.begin() + i, NewClauses.begin() + j, shorter_filter); MakeNewInstruction = true; break; } // Look for the next batch of filters. i = j + 1; } // If typeinfos matched if and only if equal, then the elements of a filter L // that occurs later than a filter F could be replaced by the intersection of // the elements of F and L. In reality two typeinfos can match without being // equal (for example if one represents a C++ class, and the other some class // derived from it) so it would be wrong to perform this transform in general. // However the transform is correct and useful if F is a subset of L. In that // case L can be replaced by F, and thus removed altogether since repeating a // filter is pointless. So here we look at all pairs of filters F and L where // L follows F in the list of clauses, and remove L if every element of F is // an element of L. This can occur when inlining C++ functions with exception // specifications. for (unsigned i = 0; i + 1 < NewClauses.size(); ++i) { // Examine each filter in turn. Value *Filter = NewClauses[i]; ArrayType *FTy = dyn_cast(Filter->getType()); if (!FTy) // Not a filter - skip it. continue; unsigned FElts = FTy->getNumElements(); // Examine each filter following this one. 
Doing this backwards means that // we don't have to worry about filters disappearing under us when removed. for (unsigned j = NewClauses.size() - 1; j != i; --j) { Value *LFilter = NewClauses[j]; ArrayType *LTy = dyn_cast(LFilter->getType()); if (!LTy) // Not a filter - skip it. continue; // If Filter is a subset of LFilter, i.e. every element of Filter is also // an element of LFilter, then discard LFilter. SmallVector::iterator J = NewClauses.begin() + j; // If Filter is empty then it is a subset of LFilter. if (!FElts) { // Discard LFilter. NewClauses.erase(J); MakeNewInstruction = true; // Move on to the next filter. continue; } unsigned LElts = LTy->getNumElements(); // If Filter is longer than LFilter then it cannot be a subset of it. if (FElts > LElts) // Move on to the next filter. continue; // At this point we know that LFilter has at least one element. if (isa(LFilter)) { // LFilter only contains zeros. // Filter is a subset of LFilter iff Filter contains only zeros (as we // already know that Filter is not longer than LFilter). if (isa(Filter)) { assert(FElts <= LElts && "Should have handled this case earlier!"); // Discard LFilter. NewClauses.erase(J); MakeNewInstruction = true; } // Move on to the next filter. continue; } ConstantArray *LArray = cast(LFilter); if (isa(Filter)) { // Filter only contains zeros. // Since Filter is non-empty and contains only zeros, it is a subset of // LFilter iff LFilter contains a zero. assert(FElts > 0 && "Should have eliminated the empty filter earlier!"); for (unsigned l = 0; l != LElts; ++l) if (LArray->getOperand(l)->isNullValue()) { // LFilter contains a zero - discard it. NewClauses.erase(J); MakeNewInstruction = true; break; } // Move on to the next filter. continue; } // At this point we know that both filters are ConstantArrays. Loop over // operands to see whether every element of Filter is also an element of // LFilter. Since filters tend to be short this is probably faster than // using a method that scales nicely. ConstantArray *FArray = cast(Filter); bool AllFound = true; for (unsigned f = 0; f != FElts; ++f) { Value *FTypeInfo = FArray->getOperand(f)->stripPointerCasts(); AllFound = false; for (unsigned l = 0; l != LElts; ++l) { Value *LTypeInfo = LArray->getOperand(l)->stripPointerCasts(); if (LTypeInfo == FTypeInfo) { AllFound = true; break; } } if (!AllFound) break; } if (AllFound) { // Discard LFilter. NewClauses.erase(J); MakeNewInstruction = true; } // Move on to the next filter. } } // If we changed any of the clauses, replace the old landingpad instruction // with a new one. if (MakeNewInstruction) { LandingPadInst *NLI = LandingPadInst::Create(LI.getType(), LI.getPersonalityFn(), NewClauses.size()); for (unsigned i = 0, e = NewClauses.size(); i != e; ++i) NLI->addClause(NewClauses[i]); // A landing pad with no clauses must have the cleanup flag set. It is // theoretically possible, though highly unlikely, that we eliminated all // clauses. If so, force the cleanup flag to true. if (NewClauses.empty()) CleanupFlag = true; NLI->setCleanup(CleanupFlag); return NLI; } // Even if none of the clauses changed, we may nonetheless have understood // that the cleanup flag is pointless. Clear it if so. 
if (LI.isCleanup() != CleanupFlag) { assert(!CleanupFlag && "Adding a cleanup, not removing one?!"); LI.setCleanup(CleanupFlag); return &LI; } return 0; } /// TryToSinkInstruction - Try to move the specified instruction from its /// current block into the beginning of DestBlock, which can only happen if it's /// safe to move the instruction past all of the instructions between it and the /// end of its block. static bool TryToSinkInstruction(Instruction *I, BasicBlock *DestBlock) { assert(I->hasOneUse() && "Invariants didn't hold!"); // Cannot move control-flow-involving, volatile loads, vaarg, etc. if (isa(I) || isa(I) || I->mayHaveSideEffects() || isa(I)) return false; // Do not sink alloca instructions out of the entry block. if (isa(I) && I->getParent() == &DestBlock->getParent()->getEntryBlock()) return false; // We can only sink load instructions if there is nothing between the load and // the end of block that could change the value. if (I->mayReadFromMemory()) { for (BasicBlock::iterator Scan = I, E = I->getParent()->end(); Scan != E; ++Scan) if (Scan->mayWriteToMemory()) return false; } BasicBlock::iterator InsertPos = DestBlock->getFirstInsertionPt(); I->moveBefore(InsertPos); ++NumSunkInst; return true; } /// AddReachableCodeToWorklist - Walk the function in depth-first order, adding /// all reachable code to the worklist. /// /// This has a couple of tricks to make the code faster and more powerful. In /// particular, we constant fold and DCE instructions as we go, to avoid adding /// them to the worklist (this significantly speeds up instcombine on code where /// many instructions are dead or constant). Additionally, if we find a branch /// whose condition is a known constant, we only visit the reachable successors. /// static bool AddReachableCodeToWorklist(BasicBlock *BB, SmallPtrSet &Visited, InstCombiner &IC, const TargetData *TD) { bool MadeIRChange = false; SmallVector Worklist; Worklist.push_back(BB); SmallVector InstrsForInstCombineWorklist; DenseMap FoldedConstants; do { BB = Worklist.pop_back_val(); // We have now visited this block! If we've already been here, ignore it. if (!Visited.insert(BB)) continue; for (BasicBlock::iterator BBI = BB->begin(), E = BB->end(); BBI != E; ) { Instruction *Inst = BBI++; // DCE instruction if trivially dead. if (isInstructionTriviallyDead(Inst)) { ++NumDeadInst; DEBUG(errs() << "IC: DCE: " << *Inst << '\n'); Inst->eraseFromParent(); continue; } // ConstantProp instruction if trivially constant. if (!Inst->use_empty() && isa(Inst->getOperand(0))) if (Constant *C = ConstantFoldInstruction(Inst, TD)) { DEBUG(errs() << "IC: ConstFold to: " << *C << " from: " << *Inst << '\n'); Inst->replaceAllUsesWith(C); ++NumConstProp; Inst->eraseFromParent(); continue; } if (TD) { // See if we can constant fold its operands. for (User::op_iterator i = Inst->op_begin(), e = Inst->op_end(); i != e; ++i) { ConstantExpr *CE = dyn_cast(i); if (CE == 0) continue; Constant*& FoldRes = FoldedConstants[CE]; if (!FoldRes) FoldRes = ConstantFoldConstantExpression(CE, TD); if (!FoldRes) FoldRes = CE; if (FoldRes != CE) { *i = FoldRes; MadeIRChange = true; } } } InstrsForInstCombineWorklist.push_back(Inst); } // Recursively visit successors. If this is a branch or switch on a // constant, only visit the reachable successor. 
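    // For example, given "br i1 true, label %live, label %dead", only %live is
    // pushed onto the worklist here; blocks like %dead that are never reached
    // are cleaned up later by the caller (label names are illustrative).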
TerminatorInst *TI = BB->getTerminator(); if (BranchInst *BI = dyn_cast(TI)) { if (BI->isConditional() && isa(BI->getCondition())) { bool CondVal = cast(BI->getCondition())->getZExtValue(); BasicBlock *ReachableBB = BI->getSuccessor(!CondVal); Worklist.push_back(ReachableBB); continue; } } else if (SwitchInst *SI = dyn_cast(TI)) { if (ConstantInt *Cond = dyn_cast(SI->getCondition())) { // See if this is an explicit destination. for (unsigned i = 1, e = SI->getNumSuccessors(); i != e; ++i) if (SI->getCaseValue(i) == Cond) { BasicBlock *ReachableBB = SI->getSuccessor(i); Worklist.push_back(ReachableBB); continue; } // Otherwise it is the default destination. Worklist.push_back(SI->getSuccessor(0)); continue; } } for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i) Worklist.push_back(TI->getSuccessor(i)); } while (!Worklist.empty()); // Once we've found all of the instructions to add to instcombine's worklist, // add them in reverse order. This way instcombine will visit from the top // of the function down. This jives well with the way that it adds all uses // of instructions to the worklist after doing a transformation, thus avoiding // some N^2 behavior in pathological cases. IC.Worklist.AddInitialGroup(&InstrsForInstCombineWorklist[0], InstrsForInstCombineWorklist.size()); return MadeIRChange; } bool InstCombiner::DoOneIteration(Function &F, unsigned Iteration) { MadeIRChange = false; DEBUG(errs() << "\n\nINSTCOMBINE ITERATION #" << Iteration << " on " << F.getNameStr() << "\n"); { // Do a depth-first traversal of the function, populate the worklist with // the reachable instructions. Ignore blocks that are not reachable. Keep // track of which blocks we visit. SmallPtrSet Visited; MadeIRChange |= AddReachableCodeToWorklist(F.begin(), Visited, *this, TD); // Do a quick scan over the function. If we find any blocks that are // unreachable, remove any instructions inside of them. This prevents // the instcombine code from having to deal with some bad special cases. for (Function::iterator BB = F.begin(), E = F.end(); BB != E; ++BB) { if (Visited.count(BB)) continue; // Delete the instructions backwards, as it has a reduced likelihood of // having to update as many def-use and use-def chains. Instruction *EndInst = BB->getTerminator(); // Last not to be deleted. while (EndInst != BB->begin()) { // Delete the next to last instruction. BasicBlock::iterator I = EndInst; Instruction *Inst = --I; if (!Inst->use_empty()) Inst->replaceAllUsesWith(UndefValue::get(Inst->getType())); if (isa(Inst)) { EndInst = Inst; continue; } if (!isa(Inst)) { ++NumDeadInst; MadeIRChange = true; } Inst->eraseFromParent(); } } } while (!Worklist.isEmpty()) { Instruction *I = Worklist.RemoveOne(); if (I == 0) continue; // skip null values. // Check to see if we can DCE the instruction. if (isInstructionTriviallyDead(I)) { DEBUG(errs() << "IC: DCE: " << *I << '\n'); EraseInstFromFunction(*I); ++NumDeadInst; MadeIRChange = true; continue; } // Instruction isn't dead, see if we can constant propagate it. if (!I->use_empty() && isa(I->getOperand(0))) if (Constant *C = ConstantFoldInstruction(I, TD)) { DEBUG(errs() << "IC: ConstFold to: " << *C << " from: " << *I << '\n'); // Add operands to the worklist. ReplaceInstUsesWith(*I, C); ++NumConstProp; EraseInstFromFunction(*I); MadeIRChange = true; continue; } // See if we can trivially sink this instruction to a successor basic block. 
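    // For instance, an instruction whose only use lives in a successor block
    // that has no other predecessors gets moved down to that successor's first
    // insertion point (loads only sink if nothing after them in their original
    // block may write to memory).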
if (I->hasOneUse()) { BasicBlock *BB = I->getParent(); Instruction *UserInst = cast(I->use_back()); BasicBlock *UserParent; // Get the block the use occurs in. if (PHINode *PN = dyn_cast(UserInst)) UserParent = PN->getIncomingBlock(I->use_begin().getUse()); else UserParent = UserInst->getParent(); if (UserParent != BB) { bool UserIsSuccessor = false; // See if the user is one of our successors. for (succ_iterator SI = succ_begin(BB), E = succ_end(BB); SI != E; ++SI) if (*SI == UserParent) { UserIsSuccessor = true; break; } // If the user is one of our immediate successors, and if that successor // only has us as a predecessors (we'd have to split the critical edge // otherwise), we can keep going. if (UserIsSuccessor && UserParent->getSinglePredecessor()) // Okay, the CFG is simple enough, try to sink this instruction. MadeIRChange |= TryToSinkInstruction(I, UserParent); } } // Now that we have an instruction, try combining it to simplify it. Builder->SetInsertPoint(I->getParent(), I); Builder->SetCurrentDebugLocation(I->getDebugLoc()); #ifndef NDEBUG std::string OrigI; #endif DEBUG(raw_string_ostream SS(OrigI); I->print(SS); OrigI = SS.str();); DEBUG(errs() << "IC: Visiting: " << OrigI << '\n'); if (Instruction *Result = visit(*I)) { ++NumCombined; // Should we replace the old instruction with a new one? if (Result != I) { DEBUG(errs() << "IC: Old = " << *I << '\n' << " New = " << *Result << '\n'); if (!I->getDebugLoc().isUnknown()) Result->setDebugLoc(I->getDebugLoc()); // Everything uses the new instruction now. I->replaceAllUsesWith(Result); // Move the name to the new instruction first. Result->takeName(I); // Push the new instruction and any users onto the worklist. Worklist.Add(Result); Worklist.AddUsersToWorkList(*Result); // Insert the new instruction into the basic block... BasicBlock *InstParent = I->getParent(); BasicBlock::iterator InsertPos = I; - if (!isa(Result)) // If combining a PHI, don't insert - while (isa(InsertPos)) // middle of a block of PHIs. - ++InsertPos; + // If we replace a PHI with something that isn't a PHI, fix up the + // insertion point. + if (!isa(Result) && isa(InsertPos)) + InsertPos = InstParent->getFirstInsertionPt(); InstParent->getInstList().insert(InsertPos, Result); EraseInstFromFunction(*I); } else { #ifndef NDEBUG DEBUG(errs() << "IC: Mod = " << OrigI << '\n' << " New = " << *I << '\n'); #endif // If the instruction was modified, it's possible that it is now dead. // if so, remove it. if (isInstructionTriviallyDead(I)) { EraseInstFromFunction(*I); } else { Worklist.Add(I); Worklist.AddUsersToWorkList(*I); } } MadeIRChange = true; } } Worklist.Zap(); return MadeIRChange; } bool InstCombiner::runOnFunction(Function &F) { TD = getAnalysisIfAvailable(); /// Builder - This is an IRBuilder that automatically inserts new /// instructions into the worklist when they are created. IRBuilder TheBuilder(F.getContext(), TargetFolder(TD), InstCombineIRInserter(Worklist)); Builder = &TheBuilder; bool EverMadeChange = false; // Lower dbg.declare intrinsics otherwise their value may be clobbered // by instcombiner. EverMadeChange = LowerDbgDeclare(F); // Iterate while there is work to do. 
  unsigned Iteration = 0;
  while (DoOneIteration(F, Iteration++))
    EverMadeChange = true;

  Builder = 0;
  return EverMadeChange;
}

FunctionPass *llvm::createInstructionCombiningPass() {
  return new InstCombiner();
}
Index: vendor/llvm/dist/test/CodeGen/ARM/gv-stubs-crash.ll
===================================================================
--- vendor/llvm/dist/test/CodeGen/ARM/gv-stubs-crash.ll (nonexistent)
+++ vendor/llvm/dist/test/CodeGen/ARM/gv-stubs-crash.ll (revision 228364)
@@ -0,0 +1,36 @@
+; RUN: llc < %s -mtriple=thumbv7-apple-ios -relocation-model=pic
+;
+
+@Exn = external hidden unnamed_addr constant { i8*, i8* }
+
+define hidden void @func(i32* %this, i32* %e) optsize align 2 {
+  %e.ld = load i32* %e, align 4
+  %inv = invoke zeroext i1 @func2(i32* %this, i32 %e.ld) optsize
+          to label %ret unwind label %lpad
+
+ret:
+  ret void
+
+lpad:
+  %lp = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gxx_personality_sj0 to i8*)
+          catch i8* bitcast ({ i8*, i8* }* @Exn to i8*)
+  br label %.loopexit4
+
+.loopexit4:
+  %exn = call i8* @__cxa_allocate_exception(i32 8) nounwind
+  call void @__cxa_throw(i8* %exn, i8* bitcast ({ i8*, i8* }* @Exn to i8*), i8* bitcast (void (i32*)* @dtor to i8*)) noreturn
+  unreachable
+
+resume:
+  resume { i8*, i32 } %lp
+}
+
+declare hidden zeroext i1 @func2(i32*, i32) optsize align 2
+
+declare i8* @__cxa_allocate_exception(i32)
+
+declare i32 @__gxx_personality_sj0(...)
+
+declare void @dtor(i32*) optsize
+
+declare void @__cxa_throw(i8*, i8*, i8*)
Index: vendor/llvm/dist/test/CodeGen/PowerPC/2008-10-17-AsmMatchingOperands.ll
===================================================================
--- vendor/llvm/dist/test/CodeGen/PowerPC/2008-10-17-AsmMatchingOperands.ll (revision 228363)
+++ vendor/llvm/dist/test/CodeGen/PowerPC/2008-10-17-AsmMatchingOperands.ll (revision 228364)
@@ -1,11 +1,15 @@
+; PR11218
+; FIXME: This depends on assertion failure for now.
+; REQUIRES: asserts + ; RUN: llc < %s ; XFAIL: * ; PR2356 target datalayout = "E-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f128:64:128" target triple = "powerpc-apple-darwin9" define i32 @test(i64 %x, i32* %p) nounwind { %asmtmp = call i32 asm "", "=r,0"(i64 0) nounwind ; [#uses=0] %y = add i32 %asmtmp, 1 ret i32 %y } Index: vendor/llvm/dist/test/CodeGen/X86/dbg-i128-const.ll =================================================================== --- vendor/llvm/dist/test/CodeGen/X86/dbg-i128-const.ll (revision 228363) +++ vendor/llvm/dist/test/CodeGen/X86/dbg-i128-const.ll (revision 228364) @@ -1,26 +1,26 @@ -; RUN: llc < %s | FileCheck %s +; RUN: llc -mtriple=x86_64-linux < %s | FileCheck %s ; CHECK: DW_AT_const_value ; CHECK-NEXT: 42 define i128 @__foo(i128 %a, i128 %b) nounwind { entry: tail call void @llvm.dbg.value(metadata !0, i64 0, metadata !1), !dbg !11 %add = add i128 %a, %b, !dbg !11 ret i128 %add, !dbg !11 } declare void @llvm.dbg.value(metadata, i64, metadata) nounwind readnone !0 = metadata !{i128 42 } !1 = metadata !{i32 524544, metadata !2, metadata !"MAX", metadata !4, i32 29, metadata !8} ; [ DW_TAG_auto_variable ] !2 = metadata !{i32 524299, metadata !3, i32 26, i32 0} ; [ DW_TAG_lexical_block ] !3 = metadata !{i32 524334, i32 0, metadata !4, metadata !"__foo", metadata !"__foo", metadata !"__foo", metadata !4, i32 26, metadata !6, i1 false, i1 true, i32 0, i32 0, null, i1 false} ; [ DW_TAG_subprogram ] !4 = metadata !{i32 524329, metadata !"foo.c", metadata !"/tmp", metadata !5} ; [ DW_TAG_file_type ] !5 = metadata !{i32 524305, i32 0, i32 1, metadata !"foo.c", metadata !"/tmp", metadata !"clang", i1 true, i1 true, metadata !"", i32 0} ; [ DW_TAG_compile_unit ] !6 = metadata !{i32 524309, metadata !4, metadata !"", metadata !4, i32 0, i64 0, i64 0, i64 0, i32 0, null, metadata !7, i32 0, null} ; [ DW_TAG_subroutine_type ] !7 = metadata !{metadata !8, metadata !8, metadata !8} !8 = metadata !{i32 524310, metadata !4, metadata !"ti_int", metadata !9, i32 78, i64 0, i64 0, i64 0, i32 0, metadata !10} ; [ DW_TAG_typedef ] !9 = metadata !{i32 524329, metadata !"myint.h", metadata !"/tmp", metadata !5} ; [ DW_TAG_file_type ] !10 = metadata !{i32 524324, metadata !4, metadata !"", metadata !4, i32 0, i64 128, i64 128, i64 0, i32 0, i32 5} ; [ DW_TAG_base_type ] !11 = metadata !{i32 29, i32 0, metadata !2, null} Index: vendor/llvm/dist/test/MC/ARM/elf-thumbfunc-reloc.ll =================================================================== --- vendor/llvm/dist/test/MC/ARM/elf-thumbfunc-reloc.ll (revision 228363) +++ vendor/llvm/dist/test/MC/ARM/elf-thumbfunc-reloc.ll (revision 228364) @@ -1,37 +1,37 @@ ; RUN: llc %s -mtriple=thumbv7-linux-gnueabi -relocation-model=pic \ ; RUN: -filetype=obj -o - | elf-dump --dump-section-data | \ ; RUN: FileCheck %s ; FIXME: This file needs to be in .s form! ; We wanna test relocatable thumb function call, ; but ARMAsmParser cannot handle "bl foo(PLT)" yet target datalayout = "e-p:32:32:32-i1:8:32-i8:8:32-i16:16:32-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:64:128-a0:0:32-n32" target triple = "thumbv7-none--gnueabi" define void @foo() nounwind { entry: ret void } define void @bar() nounwind { entry: call void @foo() ret void } ; make sure that bl 0 (fff7feff) is correctly encoded -; CHECK: '_section_data', '704700bf 2de90048 fff7feff bde80008' +; CHECK: '_section_data', '704700bf 2de90048 fff7feff bde80088' ; Offset Info Type Sym.Value Sym. 
Name ; 00000008 0000070a R_ARM_THM_CALL 00000001 foo ; CHECK: Relocation 0 ; CHECK-NEXT: 'r_offset', 0x00000008 ; CHECK-NEXT: 'r_sym', 0x000007 ; CHECK-NEXT: 'r_type', 0x0a ; make sure foo is thumb function: bit 0 = 1 ; CHECK: Symbol 7 ; CHECK-NEXT: 'foo' ; CHECK-NEXT: 'st_value', 0x00000001 Index: vendor/llvm/dist/test/MC/AsmParser/2011-09-06-NoNewline.s =================================================================== --- vendor/llvm/dist/test/MC/AsmParser/2011-09-06-NoNewline.s (revision 228363) +++ vendor/llvm/dist/test/MC/AsmParser/2011-09-06-NoNewline.s (revision 228364) @@ -1,6 +1,7 @@ -// RUN: llvm-mc %s +// RUN: llvm-mc -triple i386-unknown-unknown %s movl %gs:8, %eax -// RUN: llvm-mc %s +// RUN: llvm-mc -triple i386-unknown-unknown %s movl %gs:8, %eax -// RUN: llvm-mc %s -movl %gs:8, %eax \ No newline at end of file +// RUN: llvm-mc -triple i386-unknown-unknown %s +movl %gs:8, %eax + \ No newline at end of file Index: vendor/llvm/dist/test/Transforms/InstCombine/crash.ll =================================================================== --- vendor/llvm/dist/test/Transforms/InstCombine/crash.ll (revision 228363) +++ vendor/llvm/dist/test/Transforms/InstCombine/crash.ll (revision 228364) @@ -1,376 +1,401 @@ ; RUN: opt < %s -instcombine -S target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:128:128:n8:16:32" target triple = "i386-apple-darwin10.0" define i32 @test0(i8 %tmp2) ssp { entry: %tmp3 = zext i8 %tmp2 to i32 %tmp8 = lshr i32 %tmp3, 6 %tmp9 = lshr i32 %tmp3, 7 %tmp10 = xor i32 %tmp9, 67108858 %tmp11 = xor i32 %tmp10, %tmp8 %tmp12 = xor i32 %tmp11, 0 ret i32 %tmp12 } ; PR4905 define <2 x i64> @test1(<2 x i64> %x, <2 x i64> %y) nounwind { entry: %conv.i94 = bitcast <2 x i64> %y to <4 x i32> ; <<4 x i32>> [#uses=1] %sub.i97 = sub <4 x i32> %conv.i94, undef ; <<4 x i32>> [#uses=1] %conv3.i98 = bitcast <4 x i32> %sub.i97 to <2 x i64> ; <<2 x i64>> [#uses=2] %conv2.i86 = bitcast <2 x i64> %conv3.i98 to <4 x i32> ; <<4 x i32>> [#uses=1] %cmp.i87 = icmp sgt <4 x i32> undef, %conv2.i86 ; <<4 x i1>> [#uses=1] %sext.i88 = sext <4 x i1> %cmp.i87 to <4 x i32> ; <<4 x i32>> [#uses=1] %conv3.i89 = bitcast <4 x i32> %sext.i88 to <2 x i64> ; <<2 x i64>> [#uses=1] %and.i = and <2 x i64> %conv3.i89, %conv3.i98 ; <<2 x i64>> [#uses=1] %or.i = or <2 x i64> zeroinitializer, %and.i ; <<2 x i64>> [#uses=1] %conv2.i43 = bitcast <2 x i64> %or.i to <4 x i32> ; <<4 x i32>> [#uses=1] %sub.i = sub <4 x i32> zeroinitializer, %conv2.i43 ; <<4 x i32>> [#uses=1] %conv3.i44 = bitcast <4 x i32> %sub.i to <2 x i64> ; <<2 x i64>> [#uses=1] ret <2 x i64> %conv3.i44 } ; PR4908 define void @test2(<1 x i16>* nocapture %b, i32* nocapture %c) nounwind ssp { entry: %arrayidx = getelementptr inbounds <1 x i16>* %b, i64 undef ; <<1 x i16>*> %tmp2 = load <1 x i16>* %arrayidx ; <<1 x i16>> [#uses=1] %tmp6 = bitcast <1 x i16> %tmp2 to i16 ; [#uses=1] %tmp7 = zext i16 %tmp6 to i32 ; [#uses=1] %ins = or i32 0, %tmp7 ; [#uses=1] %arrayidx20 = getelementptr inbounds i32* %c, i64 undef ; [#uses=1] store i32 %ins, i32* %arrayidx20 ret void } ; PR5262 @tmp2 = global i64 0 ; [#uses=1] declare void @use(i64) nounwind define void @foo(i1) nounwind align 2 { ;